Fatal: The Gemini series (especially Deep Research) hallucinates severely in patent retrieval tasks

Recently, I’ve been using the Deep Research feature of the Gemini web app for patent-related research tasks, generating reports with conclusions, and the results have looked quite good. I really like it.

I assume this is because Google has its own patent database (Google Patents), which is free, open, and easy to access. No special access is required; as long as you have a patent’s publication number, you can retrieve the full text. This should give the Gemini series a real advantage in patent-related tasks.

However, today I tried to have Deep Research complete some specific patent tasks, such as finding the existing document closest to a target patent K (also known as patent search, one of the most basic and common tasks in the patent industry). Deep Research completed the report as usual, stating that it had found a reference document A, provided a detailed comparison of the contents of patent A and patent K, and concluded that A could invalidate K, which is a very important conclusion.

Deep Research also provided all the source links, which looked reliable, and in the past, clicking these links has usually worked.

But this time, when I went to verify its conclusion, I found that the patent A it cited simply did not exist. No matter how I searched using the other descriptive information it provided (patent title, applicant, abstract, etc.), I couldn’t find it.
I then asked it to verify its own answer, but it repeatedly failed to provide a correct number. Since it had described the content of the patent, I searched for that content myself and confirmed that it was also a hallucination.
The link it provided for patent A (a very short and direct Google Patents link) was also completely wrong.
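
For reference, the kind of link check I mean is easy to script. Below is a minimal sketch (assuming the standard Google Patents URL pattern https://patents.google.com/patent/&lt;publication number&gt;/&lt;language&gt;; the publication number in the example is a placeholder, not one from my test):

```python
import requests

def patent_page_exists(pub_number: str, lang: str = "en") -> bool:
    """Check whether a Google Patents page exists for this publication number."""
    url = f"https://patents.google.com/patent/{pub_number}/{lang}"
    # Google Patents appears to return 404 for unknown publication numbers,
    # so a 200 response is a reasonable (if not airtight) existence check.
    resp = requests.get(url, allow_redirects=True, timeout=15)
    return resp.status_code == 200

# Placeholder number, for illustration only:
print(patent_page_exists("US9876543B2"))
```

This is the kind of check the hallucinated numbers and links fail.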

Next, I enabled Google Search grounding in AI Studio and tested a similar task with Gemini 2.5 Pro; the result was the same. It appeared to complete the task with great confidence, but the content, number, and link of the patent it provided were all hallucinations.
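
(For anyone who wants to reproduce this outside the AI Studio UI, something like the sketch below should be roughly equivalent. It uses the google-genai Python SDK with Google Search grounding enabled; the prompt and the publication number are illustrative placeholders, not my actual test inputs.)

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=(
        "Find the existing patent closest to US9876543B2 and give its "  # placeholder number
        "publication number, title, applicant, and Google Patents link."
    ),
    config=types.GenerateContentConfig(
        # Grounding with Google Search, the API-side counterpart of the
        # grounding toggle in AI Studio.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(response.text)
# The grounding metadata lists the web sources the model actually consulted,
# which helps when checking whether a cited patent link is real.
print(response.candidates[0].grounding_metadata)
```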

I’m very frustrated because I frequently verify links in Deep Research and AI Studio, and errors are rare. However, I’ve tested the patent tasks multiple times, and only once did it provide the correct patent number and content. Every other time the result was a hallucination, and even after I pointed it out, it repeatedly retraced its reasoning but could not eliminate the error and kept giving the same results.

I think this might be some kind of bug: in the background it performed the patent search correctly, but some mechanism caused it to mistake content adjacent to the result on the page for the content of the correct patent. After all, Google, with its own patent database, should have a significant advantage in such tasks.

Below is a link to an AI Studio session from my testing. In it, I provided a Chinese patent-search document and asked, with Chinese prompts, for a patent search on the material described. It gave incorrect patent numbers, content, and other information, and even after I pointed this out, it insisted on the same conclusion. The session may be useful for investigation.

https://aistudio.google.com/app/prompts?state={"ids":["1aEMEr9g2VphnHpv9WnaqljeW9ndxfsir"],"action":"open","userId":"109508823435168261523","resourceKeys":{}}&usp=sharing
Attached document: 河南中心第九届知豫杯检索大赛答题卡试题-机械1.docx (Converted - 2025-07-09 15:40)

Hello,

What you are describing is a case of “confabulation”, in which the model generates content
(in your case, sources) that is not real. There can be multiple reasons for confabulation, and prompting can be one of them, so you can try experimenting with your prompts and being more precise and informative to avoid this issue. A human check is always recommended for research-related topics.

I am sharing some sources where you can read more about it:
1. Paper: Confabulation: The Surprising Value of Large Language Model Hallucinations
2. Medium blog post: LLM Hallucinations Vs. LLM Confabulations
3. Towards Data Science blog post: A New Method to Detect “Confabulations” Hallucinated by Large Language Models

I understand the current tendency of large models to confabulate, and I accept this while remaining vigilant. However, my point is that the patent search task I tested is a “tool call” task rather than a “content generation” task. While confabulation may be unavoidable in content generation tasks, it is highly inappropriate that the Gemini series, on a patent search task, still returns confabulated results even when it can query Google’s own fully accessible patent database.
I believe this is an issue worth investigating; perhaps it’s just an easily fixable but overlooked bug.

Hello, we have already shared your feedback with our internal team.
Thank you for your patience.