Gemini 2.5 Pro and Flash both support URL context. This works fine in the consumer app, which can open any URL I give it. Through the API, though, things get patchy: it is consistently unable to open certain URLs.
For context, I have a site (.archive.site.com/files for example) that hosts PDFs (.archive.site.com/files/1.pdf , .archive.site.com/files/2.pdf etc). (These aren’t real links, just an example.) I’m using these PDFs as reference. The consumer app can access all of them easily. The API can only access about half, for no apparent reason. This doesn’t change after days of repeated requests, and it’s always the same files that fail, so it isn’t anything temporary. I checked robots.txt, but that doesn’t seem to be the issue.
“Why don’t you download the PDFs?” Because they’re large and I don’t have the storage space for half a thousand of them.
“Why don’t you parse the text with Beautiful Soup or something?” Because the integral tabular and image data would be lost that way.
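On the robots.txt check mentioned above, here is a minimal sketch of how the failing files could be tested offline against the site's rules, using Python's standard `urllib.robotparser`. The host `archive.example.com`, the sample rule text, and the user-agent string `ExampleFetcher` are all placeholders: they are not the real site, and not the agent name that Gemini's URL fetcher actually sends.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str) -> list[str]:
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /files/2.pdf\n"
SAMPLE_URLS = [
    "https://archive.example.com/files/1.pdf",
    "https://archive.example.com/files/2.pdf",
]

if __name__ == "__main__":
    for url in blocked_urls(SAMPLE_ROBOTS, SAMPLE_URLS, "ExampleFetcher"):
        print("disallowed:", url)
```

If the list comes back empty for the files that fail, robots.txt is probably not what separates the working half from the failing half.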
Is there any way to fix this issue with Gemini’s API? It can access random Wikipedia pages fine if I change the link list. I know it isn’t a context-size issue, because I can provide just the 3-5 relevant PDFs instead of all of them, and even then it often can’t access them.
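One way to narrow this down is to log which URLs the tool reports as not retrieved on each request. The sketch below assumes the response exposes per-URL results as `candidates[*].url_context_metadata.url_metadata`, with `retrieved_url` and `url_retrieval_status` on each entry, as in the google-genai Python SDK; treat those field names as an assumption to verify against the current docs. The stand-in response is built with `SimpleNamespace` so the snippet runs without an API key, and the status strings in it are illustrative.

```python
from types import SimpleNamespace


def failed_urls(response) -> list[tuple[str, str]]:
    """Collect (url, status) pairs for every URL not reported as a success."""
    failures = []
    for candidate in getattr(response, "candidates", None) or []:
        meta = getattr(candidate, "url_context_metadata", None)
        for entry in getattr(meta, "url_metadata", None) or []:
            # str() keeps this working for both enum values and plain strings.
            status = str(getattr(entry, "url_retrieval_status", ""))
            if "SUCCESS" not in status:
                failures.append((getattr(entry, "retrieved_url", ""), status))
    return failures


# Stand-in for an API response (assumed shape, hypothetical URLs).
fake_response = SimpleNamespace(candidates=[SimpleNamespace(
    url_context_metadata=SimpleNamespace(url_metadata=[
        SimpleNamespace(retrieved_url="https://archive.example.com/files/1.pdf",
                        url_retrieval_status="URL_RETRIEVAL_STATUS_SUCCESS"),
        SimpleNamespace(retrieved_url="https://archive.example.com/files/2.pdf",
                        url_retrieval_status="URL_RETRIEVAL_STATUS_ERROR"),
    ]))])

if __name__ == "__main__":
    for url, status in failed_urls(fake_response):
        print(url, status)
```

Comparing the failing set across runs, and against file size or response headers, would show whether the same files fail for the same reported reason each time.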
I do appreciate how well it works when it does work, though.
I know URL context is still in the works and experimental, but I didn’t expect it to be this unstable.
I saw a similar thread for the 2.0 model, but it was reported as fixed, with no specific explanation beyond that.