Is there any way to reduce the number of hallucinated URLs when Gemini sources its answers? The worst part is that quite often Gemini seems to find the correct information and give the correct answer, but all the sources it cites are hallucinated, so it's extremely hard to validate the information without doing my own search, which basically makes it pointless in the first place. Am I misunderstanding why it does this? I need it to either give a response along with the source of the information, or just say it didn't find any information. Using different models doesn't really help either; the outcome is more or less the same with both Flash and Pro.
I’ve encountered this too. Recently, I found ChatGPT 4o-search quite good.
Hi @mjmak, welcome to the forum.
Could you please share the exact prompt you are using when you get these hallucinated URLs or sources?
Thanks
Hi, sure, see below. Just to add: I feel the output in AI Studio is slightly better than through the API, but I don't think I have enough data to be 100% sure. I did try strengthening the negative wording around hallucinated URLs, but it didn't really help. Sometimes all the URLs are hallucinated, sometimes just a few, sometimes none, but generally, for roughly 80% of the results, at least one URL is hallucinated.
— Prompt 1 —
SYSTEM (one line, no extras)
Answer ONLY with facts you can trace to the URLs you cite; if none exist, return NOT_FOUND.
USER
Research public fundraising for the company exactly named “name of company here”.
- Identify the most recent funding round (newest by announced date).
- Summarize: amount, currency, round stage, announced date, lead + other investors.
- Decide if the round is significant — round stage ≥ Series A OR amount ≥ 2 000 000 USD (after currency conversion).
- Provide a “Sources” section containing at least one working URLs.
– Use primary sources (regulatory filings, press releases, reputable media) when available.
– If no qualifying sources exist, output “NOT_FOUND” in both the summary and JSON and skip step 3.
After the prose summary, output exactly the JSON object below and nothing else:
json
{
“company_description”: “”,
“website_url”: “<url or "n/a">”,
“hq_location”: “<City, Country or "n/a">”,
“fundraising_report_date”: “”,
“last_round_date”: “<YYYY-MM-DD or "n/a">”,
“last_round_stage”: “<pre-seed | seed | Series A | Series B | Series C+ | venture/growth | "n/a">”,
“round_type”: “<equity | SAFE | convertible_note | grant | debt | "n/a">”,
“fundraised_amount”: ,
“fundraised_currency”: “<ISO 4217 or "n/a">”,
“fundraised_amount_usd”: ,
“post_money_valuation”: ,
“lead_investors”: [“”, …] | ,
“other_notable_investors”: [“”, …] | ,
“sources”: [“”, “”, …] | ,
“confidence_score”: “<High | Medium | Low>”
}
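As a side note, the significance rule in the prompt (round stage ≥ Series A OR amount ≥ 2 000 000 USD) could be checked in code after parsing the JSON rather than left to the model. A minimal sketch, assuming the stage labels from the template above are ordered as listed and that "venture/growth" counts as later than Series A; the function name `is_significant` is made up for illustration:

```python
from typing import Optional

# Stage labels copied from the prompt's JSON template.
# ASSUMPTION: they are in ascending order, and "venture/growth" ranks above Series A.
STAGE_ORDER = ["pre-seed", "seed", "Series A", "Series B", "Series C+", "venture/growth"]
THRESHOLD_USD = 2_000_000


def is_significant(stage: Optional[str], amount_usd: Optional[float]) -> bool:
    """Return True if stage >= Series A OR amount >= 2,000,000 USD."""
    # Unknown stages such as "n/a" or None never satisfy the stage test.
    stage_ok = stage in STAGE_ORDER and (
        STAGE_ORDER.index(stage) >= STAGE_ORDER.index("Series A")
    )
    # A missing amount never satisfies the amount test.
    amount_ok = amount_usd is not None and amount_usd >= THRESHOLD_USD
    return stage_ok or amount_ok
```

For example, `is_significant("seed", 2_500_000)` passes on the amount alone, while `is_significant("seed", 500_000)` fails both tests.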
I haven't tried OpenAI yet. I did try Perplexity deep research (I can't remember the exact name) with all the settings on low, and it worked quite well, but it's extremely expensive: circa 1.20+ USD per 5 prompts, which generally makes it more expensive than hiring an intern to do this, so it's not really worth it.
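One possible stopgap for the hallucinated URLs is to post-check the `sources` array from the JSON before trusting it. The sketch below is only an illustration, not a fix for the underlying behaviour: it uses the Python standard library, the names `url_resolves` and `split_sources` are made up here, and a URL that resolves still does not prove the page supports the claim.

```python
from typing import Callable, Dict, List
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds with a status below 400.

    Caveat: some sites reject HEAD requests or block scripts, so a False
    here means "suspect", not "definitely hallucinated".
    """
    try:
        req = Request(url, method="HEAD", headers={"User-Agent": "source-check/0.1"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # HTTPError/URLError (4xx, 5xx, DNS failures) are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def split_sources(
    sources: List[str],
    check: Callable[[str], bool] = url_resolves,
) -> Dict[str, List[str]]:
    """Split cited URLs into those that pass the check and those that do not."""
    result: Dict[str, List[str]] = {"ok": [], "suspect": []}
    for url in sources:
        parsed = urlparse(url)
        # Only well-formed http(s) URLs are worth a network request.
        well_formed = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        key = "ok" if well_formed and check(url) else "suspect"
        result[key].append(url)
    return result
```

Called as `split_sources(report["sources"])`, where `report` is the parsed JSON object from the model's reply, an empty `"ok"` list could then be treated the same as NOT_FOUND. The `check` parameter is injectable so the logic can be tested without network access.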