Google is embarrassed about its AI Overviews, too. After a deluge of dunks and memes over the past week that mocked the poor quality and outright misinformation produced by the tech giant’s underbaked new AI-powered search feature, the company on Thursday issued a mea culpa of sorts. Google, a company whose name is synonymous with searching the web and whose brand focuses on “organizing the world’s information” and putting it at users’ fingertips, actually wrote in a blog post that “some odd, inaccurate or unhelpful AI Overviews certainly did show up.”
That’s putting it mildly.
The admission of failure, penned by Google VP and Head of Search Liz Reid, reads as a testament to how the drive to mash AI technology into everything has somehow made Google Search worse.
In the post titled “About last week” (this got past PR?), Reid spells out the many ways AI Overviews make mistakes. While they don’t “hallucinate” or make things up the way other large language models (LLMs) may, she says, they can get things wrong for “other reasons,” like “misinterpreting queries, misinterpreting a nuance of language on the web, or not having a lot of great information available.”
Reid also noted that some of the screenshots shared on social media over the past week were faked, while others were for nonsensical queries that no one had ever really searched for before, like “How many rocks should I eat?” Since there’s little factual information on this topic, Google’s AI guided a user to satirical content. (In the case of the rocks, the satirical content had been published on a geological software provider’s website.)
It’s worth pointing out that if you had Googled “How many rocks should I eat?” and were presented with a set of unhelpful links, or even a jokey article, you wouldn’t be surprised. What people are reacting to is the confidence with which the AI spouted back that “geologists recommend eating at least one small rock per day,” as if it were a factual answer. It may not be a “hallucination,” in technical terms, but the end user doesn’t care. It’s insane.
What’s unsettling, too, is that Reid claims Google “tested the feature extensively before launch,” including with “robust red-teaming efforts.”
Does no one at Google have a sense of humor then? No one thought of prompts that would generate poor results?
In addition, Google downplayed the AI feature’s reliance on Reddit user data as a source of knowledge and truth. People have appended “Reddit” to their searches for so long that Google finally made it a built-in search filter, but Reddit is not a body of factual knowledge. And yet the AI would point to Reddit forum posts to answer questions without understanding when first-hand Reddit knowledge is helpful and when it is not, or worse, when it is a troll.
Reddit today is making bank by offering its data to companies like Google, OpenAI and others to train their models, but that doesn’t mean users want Google’s AI deciding when to search Reddit for an answer, or suggesting that someone’s opinion is a fact. There’s nuance to knowing when to search Reddit, and Google’s AI doesn’t understand that yet.
As Reid admits, “forums are often a great source of authentic, first-hand information, but in some cases can lead to less-than-helpful advice, like using glue to get cheese to stick to pizza.” She was referring to one of the AI feature’s more spectacular failures over the past week.
Google AI overview suggests adding glue to get cheese to stick to pizza, and it turns out the source is an 11 year old Reddit comment from user F*cksmith 😂 pic.twitter.com/uDPAbsAKeO
— Peter Yang (@petergyang) May 23, 2024
If last week was a disaster, though, at least Google is iterating quickly as a result — or so it says.
The company says it has looked at examples from AI Overviews and identified patterns where it could do better. Its fixes include building better detection mechanisms for nonsensical queries, limiting the use of user-generated content in responses that could offer misleading advice, adding triggering restrictions for queries where AI Overviews were not helpful, not showing AI Overviews for hard news topics “where freshness and factuality are important,” and adding further triggering refinements to its protections for health searches.
With AI companies building ever-improving chatbots every day, the question is not whether they will ever outperform Google Search at helping us understand the world’s information, but whether Google Search will ever get up to speed on AI to challenge them in return.
As ridiculous as Google’s mistakes may be, it’s too soon to count it out of the race, especially given the massive scale of Google’s beta-testing crew, which is essentially anybody who uses search.
“There’s nothing quite like having millions of people using the feature with many novel searches,” says Reid.