Recent research indicates that AI-powered search engines cite “less popular” sources than traditional search engines surface. This shift has raised questions about the quality and reliability of information provided by AI-driven tools such as Google’s AI Overviews, Gemini, and GPT-4o. A study has quantified the differences, showing that AI search engines frequently cite websites that would not appear even in the top 100 links of a standard Google search.
Background Context
A research paper titled “Characterizing Web Search in The Age of Generative AI,” written by researchers from Ruhr University in Bochum, Germany, and the Max Planck Institute for Software Systems, compared traditional Google search results with those generated by AI Overviews, Gemini-2.5-Flash, and GPT-4o. The study compared the sources cited by these AI-powered tools with the top links returned by Google’s standard search algorithm. The queries were drawn from diverse sources, including questions submitted to ChatGPT in the WildChat dataset, general political topics listed on AllSides, and products on the list of the 100 most-searched Amazon products. The findings reveal a notable discrepancy in the popularity and ranking of the cited sources.
One key finding is that AI-powered search engines rely on “less popular” sources, as measured by the domain-ranking service Tranco. Sources cited by the AI engines were more likely than traditional search results to fall outside both the top 1,000 and the top 1,000,000 domains tracked by Tranco. Gemini search in particular tended to cite less popular domains, with the median source falling outside Tranco’s top 1,000 across all results. This suggests that AI-driven platforms draw information from a different pool of sources than traditional search engines, which may affect the comprehensiveness and reliability of their results.
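To make the popularity measure concrete, here is a minimal Python sketch of how such figures could be computed. It is not the researchers' code. The rank table, the domain names, and the helper names `share_outside` and `median_rank` are all hypothetical, and the sketch assumes that a domain missing from the ranking counts as falling outside every cutoff.

```python
from statistics import median

# Hypothetical rank lookup: domain -> popularity rank (1 = most popular).
# These values are invented for illustration and are not real Tranco ranks.
TOY_RANKS = {
    "example-encyclopedia.org": 12,
    "example-news.com": 850,
    "example-blog.net": 45_000,
    "example-forum.io": 720_000,
}

# Assumption: a domain absent from the ranking is treated as unranked,
# so it falls outside any cutoff.
UNRANKED = float("inf")


def share_outside(cited_domains, ranks, cutoff):
    """Fraction of cited domains ranked worse than `cutoff` or not ranked at all."""
    if not cited_domains:
        return 0.0
    outside = sum(1 for d in cited_domains if ranks.get(d, UNRANKED) > cutoff)
    return outside / len(cited_domains)


def median_rank(cited_domains, ranks):
    """Median popularity rank of the cited domains (unranked domains count as infinity)."""
    return median(ranks.get(d, UNRANKED) for d in cited_domains)


cited = [
    "example-encyclopedia.org",
    "example-blog.net",
    "example-forum.io",
    "unknown.example",
]
print(share_outside(cited, TOY_RANKS, 1_000))      # share outside the top 1,000
print(share_outside(cited, TOY_RANKS, 1_000_000))  # share outside the top 1,000,000
print(median_rank(cited, TOY_RANKS))               # median rank of cited domains
```

Running the same functions over the citations of each engine, and over Google's organic links for the same queries, would give the kind of side-by-side comparison the study describes.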
Source Diversity and Ranking Discrepancies

The research further found that a significant portion of the sources cited by AI-powered search engines would not appear in the top results of a standard Google query. For Google’s AI Overviews, 53% of cited sources were not among the top 10 Google links for the same query, and 40% did not fall within the top 100. This indicates a considerable divergence between the source selection of traditional search algorithms and that of AI-driven platforms.
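Percentages like these can be read as a simple overlap measure between two lists. The sketch below is an illustration, not the study's method: the function name `share_missing_from_top_k` and the sample URLs are hypothetical, and matching on exact URL strings is an assumption (the study may match by domain or by normalized URL).

```python
def share_missing_from_top_k(cited_urls, organic_urls, k):
    """Fraction of cited URLs that do not appear among the first k organic results.

    Assumption: a citation matches an organic result only if the URL strings
    are identical.
    """
    if not cited_urls:
        return 0.0
    top_k = set(organic_urls[:k])
    missing = sum(1 for url in cited_urls if url not in top_k)
    return missing / len(cited_urls)


# Hypothetical data: 100 ranked organic links and 4 sources cited by an AI answer.
organic = [f"https://site{i}.example/" for i in range(1, 101)]
cited = [
    "https://site3.example/",
    "https://site50.example/",
    "https://elsewhere.example/",
    "https://site7.example/",
]

print(share_missing_from_top_k(cited, organic, 10))   # share not in the top 10
print(share_missing_from_top_k(cited, organic, 100))  # share not in the top 100
```

Averaging this value over many queries would yield figures of the same kind as the 53% and 40% reported for AI Overviews.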
While the use of “less popular” sources by AI-powered search engines might raise concerns, the researchers also noted some potential benefits. For example, GPT-based searches were more likely to cite sources such as corporate entities and encyclopedias, while almost never citing social media websites. This suggests that AI-driven platforms might be prioritizing certain types of sources over others, potentially leading to more authoritative or reliable information. However, the study also found that “generative engines tend to compress information, sometimes omitting secondary or ambiguous aspects that traditional search retains,” which could be a drawback in certain contexts.
Conceptual Coverage and Information Compression

An analysis tool based on large language models (LLMs) found that AI-powered search results tended to cover a similar number of identifiable “concepts” as the traditional top 10 links. This suggests that AI-driven platforms can provide a comparable level of detail, diversity, and novelty in their results. However, the study also noted that AI-powered search engines tend to compress information, which can lead to the omission of secondary or ambiguous aspects that traditional search retains. This was particularly evident for more ambiguous search terms, such as names shared by different people, where traditional search results provided better coverage.
The ability of AI search engines to integrate pre-trained “internal knowledge” with data culled from cited websites is another distinguishing factor. This is especially true for GPT-4o with Search Tool, which often didn’t cite any web sources and simply provided a direct response based on its pre-training. However, this reliance on pre-trained data can be a limitation when searching for timely information. For search terms pulled from Google’s list of Trending Queries, GPT-4o with Search Tool often responded with requests for more information rather than actively searching the web for up-to-date data. These findings highlight the trade-offs between leveraging pre-trained knowledge and accessing real-time information.
Implications and Future Research Directions
The research did not reach a definitive conclusion on whether AI-based search engines are “better” or “worse” overall than traditional search results. However, it emphasized the need for future research on “new evaluation methods that jointly consider source diversity, conceptual coverage, and synthesis behavior in generative search systems.” Because AI-powered search engines rely on “less popular” sources, metrics and methodologies that can accurately assess the quality, reliability, and comprehensiveness of AI-generated search results become increasingly important.
Ultimately, the study underscores the evolving landscape of web search and the need for ongoing evaluation and refinement of AI-driven search platforms. While AI-powered search engines offer the potential for more efficient and personalized information retrieval, it is crucial to understand the implications of their reliance on “less popular” sources and their tendency to compress information, so that users seeking reliable and comprehensive information can trust the results.

