Zach Jackson

Google’s Search Generative Experience (SGE) is poised to change the way many people interact with search engine results pages (SERPs). Understandably, this has site owners worried that their hard-won visibility may be undermined. However, as impressive as SGE is, it’s still a fundamentally flawed tool.

In this article, we’ll shed light on the imperfections of SGE that suggest Google users will still be heavily reliant on traditional SERPs and their familiar features.

AI hallucination

AI hallucinations are instances of generative artificial intelligence systems (particularly large language models) producing information, images, or text that isn’t accurate or real. Rather than being drawn from reliable source material, these responses are fabricated by the model’s own algorithms. Hallucinations can happen when the AI makes mistakes or misinterprets data, leading it to produce content that isn’t based on factual information.

Any AI that utilises machine learning and deep learning algorithms can hallucinate. Even the most advanced AI systems are still occasionally plagued by this form of limiting artefact, and Google’s SGE is no different.

Hallucinations can be problematic, especially when artificial intelligence is used for tasks that require accurate and reliable information, such as medical diagnosis. This is why it’s important to carefully validate any information served by Google’s AI assistant for Search. Failure to do so could result in reputational damage or worse.

Misinterpretation during corroboration

Misinterpretation during corroboration is another limitation of large language models. Like hallucination, it leads to an incorrect response from Google’s Search Generative Experience; however, the misunderstanding is rooted in a different stage of the dialogue.

As mentioned earlier, a hallucination occurs when large language models confuse elements of the data they're using to assemble an output (a response to the user’s prompt), but misinterpretations during corroboration occur during the analysis of the prompt itself.

This misunderstanding immediately steers AI in the wrong direction when analysing data, which, in turn, skews output assembly. It might be that SGE doesn’t quite grasp the correct context of a query, or that it’s not sure of the query intent.

For example, someone might ask Google about the best things to do in Baltimore. SGE could well respond with information about Baltimore, Maryland, unaware that the user actually needed information about Baltimore, Ireland.

But a query doesn’t have to be this obviously ambiguous to cause a problem.

Even though one of SGE’s strengths is its ability to understand extended, conversational language, it’s currently not quite as powerful as it could be in this respect and might confuse elements of the interaction.

The chance for misinterpretation is an issue, but the good news about corroboration errors (and another thing that separates them from hallucinations) is that they’re relatively easy to identify in the output.

As SGE has the wrong end of the stick in these scenarios, its response will be factual but irrelevant, whereas a hallucination can look like a perfectly logical output while being, at least in part, an accidental fabrication.

Furthermore, corroboration issues can typically be resolved by providing an AI with further context. Continuing the Baltimore example, our user could have specified that they meant the village in the Republic of Ireland rather than the city in Maryland.

Can hallucinations and corroboration misinterpretations happen simultaneously?

Though a rare occurrence, it’s technically possible for LLM-powered tools like SGE to misinterpret a prompt during corroboration and then misread the data being combed through, producing a hybrid error that includes both irrelevant and factually incorrect information.

Thankfully, the nonsensical nature of such a response will alert most users to the issue, ensuring the information isn’t shared.

AI Bias

Bias may seem like a strange label to apply to a non-organic entity, but because Google leverages human-created content to train its AI assistant, human bias can seep into the data behind the AI’s responses.

Biases can present themselves in a variety of ways in an AI’s response to a query:

  • Data imbalances: If there is an imbalance in the representation of different groups in the training data, AI may have a skewed understanding of those groups. This can lead to biases in how it responds to questions or comments about those groups.
  • Contextual biases: Biases can also emerge based on the context of the input. For instance, if a user asks a politically sensitive question, the AI may respond based on prevalent political biases present in its training data.
  • Amplification of existing biases: AI models aim to predict the next word in a sentence based on what they’ve seen before in training data. If the training data is biased, the model may generate responses that amplify or reinforce existing biases, even if the AI isn’t explicitly trying to be biased (there’s a simple sketch of this after the list).
  • Ambiguity and lack of nuance: AI systems may lack the nuanced understanding that humans possess, leading them to produce responses that oversimplify complex issues and perpetuate stereotypes.
  • Implicit biases in language: Certain words or phrases carry historical biases, and AI models may inadvertently use these biased linguistic constructs.
  • Feedback loop biases: When users provide feedback on AI-generated content, it can create a feedback loop that further reinforces biases. If users accept or endorse biased content, the AI may produce more of it.
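
To make the “data imbalances” and “amplification” points above more concrete, here’s a minimal, hypothetical sketch (it bears no resemblance to Google’s actual models). It uses a made-up, deliberately skewed ten-sentence corpus and a crude bigram counter to show how an imbalance in training data becomes an even harder skew in a model’s output.

```python
from collections import Counter, defaultdict

# Toy, hypothetical training corpus: "he" follows "said" far more often than
# "she", purely because of how these example sentences were written.
corpus = (
    ["the engineer said he would fix it"] * 9
    + ["the engineer said she would fix it"] * 1
)

# Count which word follows each word across the corpus (a crude bigram model).
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

# A model that always picks the most frequent continuation turns a 90/10
# imbalance in its data into a 100/0 imbalance in its output: the bias is
# not just reproduced, it is amplified.
print(next_word_counts["said"])                       # Counter({'he': 9, 'she': 1})
print(next_word_counts["said"].most_common(1)[0][0])  # 'he'
```

Real large language models sample from learned probabilities rather than counting bigrams, but the underlying dynamic is the same: whatever skew exists in the training data tends to be reflected, and often sharpened, in what the model says back.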

An acknowledged & persistent problem

Bias in SERPs is nothing new. SGE’s bias is simply an extension of a problem already shared by all search engines because of certain cultural and linguistic factors. Google offers the following example to demonstrate how this bias can occur:

‘Authoritative media entities and data providers often do not add a qualifier of “men’s” when writing about men’s sports, and generic queries about that sport may thus bias towards men’s players or teams, even if the information about women players or teams is an equally or perhaps even more accurate response.’

Researchers and developers at Google and beyond are working on bias detection and reduction techniques, including adjusting training data, fine-tuning models, and implementing fairness and ethics guidelines. However, bias in SERPs and AI SERP features is extremely hard to remedy, as it’s not a technical issue but a social one: it’s about human inequalities and the way they bleed into language and culture.

Speed

SGE is a powerful tool, but it’s not exactly the most efficient way to source information on Google. SGE snapshots take between five and seven seconds to generate, while traditional SERP features are often presented in less than a tenth of a second.

Given that people are demanding greater efficiency from search engines as the years go by, it stands to reason that this will be a significant limitation in the eyes of users. In many instances, Google will have delivered enough information to satisfy a query before SGE has had a chance to formulate a snapshot.

Absent attributions

One of SGE’s technical issues that’s causing quite a stir is the lack of proper source attribution in the AI snapshot. This omission is a significant concern for several reasons:

  • Transparency: Source attribution is crucial for transparency and accountability. When AI-generated content lacks clear citations or attributions, it becomes challenging for users to verify the information or gauge its reliability.
  • Credibility: Proper attribution of sources is a fundamental aspect of credible information. Without clear citations, users may question the accuracy and trustworthiness of the information provided by the AI. It becomes difficult to distinguish between well-established facts and unverified claims or opinions.
  • Plagiarism concerns: The absence of source attribution raises concerns about potential plagiarism. AI-generated responses that lack proper citations might inadvertently use text or ideas from copyrighted or proprietary sources without giving credit. This can potentially lead to legal and ethical issues.
  • Educational value: For educational purposes, source attribution is essential. Students, researchers, and learners rely on accurate citations to trace the origins of information, verify facts, and build upon existing knowledge. The absence of clear source information can hinder these educational pursuits.
  • Misinformation control: In the age of online misinformation, it is important to have a clear trail to follow back to the source of any given piece of information. Proper source attribution is a valuable tool for identifying and addressing the spread of false or misleading information.

Final thoughts

Powerful though it may be, SGE has its limitations, meaning users will continue to seek the comfort of conventional search engine results and their familiar features (bar a few downgraded rich snippets), especially when precision and depth of information are paramount. 

SGE will of course get better over time, but remember: it’s not just a tool for Google users; it also holds promise for site owners and content creators, providing opportunities to boost traffic and visibility in its own distinctive way. Understanding and harnessing the power of this technology, while remaining aware of its limitations, can place your content in prime position on the SERPs.

If you’d like assistance embracing this change and seizing the SEO opportunities it presents, contact us today.