Hi there,
I’m experimenting with an app that uses Gemini to process files (PNGs, PDFs, etc.). I assign each file a unique ID before uploading it via the Google Files API, then pass the resulting API file ID to Gemini. Each subsequent request sends the chat history to Gemini and references the uploaded files by their Google file IDs. This works, and the files are correctly not re-uploaded to Google Files each time.
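To make the setup concrete, here is a minimal sketch of the upload-once pattern described above. It is not the actual app code: `UploadOnceRegistry` and `upload_fn` are hypothetical names, and `upload_fn` stands in for whatever Files API call returns the remote file ID.

```python
from typing import Callable, Dict


class UploadOnceRegistry:
    """Maps a local unique ID to the remote file ID returned by an upload."""

    def __init__(self, upload_fn: Callable[[str], str]) -> None:
        # upload_fn is a hypothetical stand-in for the real Files API call:
        # it takes a local path and returns the remote file ID.
        self._upload_fn = upload_fn
        self._remote_ids: Dict[str, str] = {}

    def get_remote_id(self, local_id: str, path: str) -> str:
        # Upload only the first time this local ID is seen; later requests
        # reuse the stored remote ID instead of uploading again.
        if local_id not in self._remote_ids:
            self._remote_ids[local_id] = self._upload_fn(path)
        return self._remote_ids[local_id]
```

With this shape, every request in the conversation looks up the remote ID by local ID, so only the reference is sent, never the file bytes.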
However, I’ve noticed that when I have, say, 5 PDFs (each ~5 MB) in the chat, subsequent queries in the same conversation become extremely slow, even if the query isn’t related to the PDFs at all. For example, saying “Hello” when no PDFs are in the chat normally takes ~2 seconds to get the first token. When PDFs are in the chat, it can take up to 15 seconds.
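For anyone who wants to reproduce the comparison, this is a minimal sketch of how time to first token could be measured. It assumes the response is consumed as a stream of chunks; `time_to_first_chunk` and `start_stream` are hypothetical names, and `start_stream` stands in for whatever call issues the streaming request.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_chunk(
    start_stream: Callable[[], Iterable[str]],
) -> Tuple[Optional[str], float]:
    """Return the first streamed chunk and the seconds elapsed until it arrived."""
    # Start the clock before the request is issued, so that request setup
    # and server-side processing are both included in the measurement.
    start = time.perf_counter()
    stream = start_stream()
    # next() blocks until the first chunk arrives; None means an empty stream.
    first = next(iter(stream), None)
    elapsed = time.perf_counter() - start
    return first, elapsed
```

Running this once against a conversation with no PDFs and once against one that references the 5 PDFs, with the same “Hello” prompt, would give directly comparable numbers.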
The Gemini site itself does not have this problem. There, I can upload 5 PDFs, ask a question about them, then ask an unrelated question, and it takes about 3 seconds to get the first token.
I am aware that caching is not yet available in Gemini 2.0 Flash, but at the same time, I don’t think the number of PDFs I have would exceed the minimum threshold for being able to use caching. Therefore, it seems that this problem is unrelated to caching.