Environment
- Model: gemini-3-flash-preview
- Chunk size: 512, overlap: 50, top_k: 3
Observed behavior
I ran three tests and tracked tool_use_prompt_token_count:
| Case | Files in store | Query | k | Tool tokens |
|---|---|---|---|---|
| 1 | hwpx only | Related to hwpx | 5 | 3,250 |
| 2 | xlsx only | Unrelated to xlsx | 3 | 355,631 |
| 3 | hwpx + xlsx | Same query as Case 1 | 3 | 19,960 |
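For context, this is roughly how I'm collecting these numbers (a minimal sketch using the google-genai Python SDK; the store name is a placeholder and the tool wiring follows the File Search docs as I understand them, so treat the exact config as an assumption):

```python
# Sketch of the measurement call. Assumes the google-genai Python SDK
# and an existing File Search store; "fileSearchStores/my-store" is a
# placeholder name.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="<query related or unrelated to the stored docs>",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=["fileSearchStores/my-store"]
                )
            )
        ]
    ),
)

# usage_metadata carries the per-call token accounting, including the
# tool_use_prompt_token_count field reported in the table above.
usage = response.usage_metadata
print("tool tokens:", usage.tool_use_prompt_token_count)
```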
Two things stand out:
- Case 1 vs Case 3: same document, same query, and k actually dropped from 5 to 3. The only other change is that an unrelated Excel file was added to the store, yet tool tokens increased ~6x (3,250 → 19,960).
- Case 2: when the query is completely unrelated to the only stored document, tool tokens jumped to 355,631, more than 100x Case 1.
Why this is confusing
File Search should work via vector similarity search:
- Embed the query
- Retrieve the top-k chunks: 512 tokens per chunk × k=3 ≈ 1,536 tokens max
- Pass those chunks to the model as context
Under this model, adding an unrelated file to the store should have zero effect on tool token count, because vector similarity search never needs to read non-matching documents. Yet Case 1 vs Case 3 clearly shows otherwise.
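To make the gap concrete, here is the back-of-the-envelope comparison in plain Python (numbers straight from the table; treating one chunk token as one prompt token is my simplification):

```python
# Expected tool-token cost under a pure vector-search model vs. observed.
# Assumes retrieved chunks are the only thing counted against
# tool_use_prompt_token_count, which is exactly the assumption in question.
CHUNK_SIZE = 512  # tokens per chunk, per the store config

def expected_tool_tokens(k: int) -> int:
    """Upper bound if only the top-k retrieved chunks enter the prompt."""
    return CHUNK_SIZE * k

observed = {  # case -> (k, tool_use_prompt_token_count from the table)
    1: (5, 3_250),
    2: (3, 355_631),
    3: (3, 19_960),
}

for case, (k, actual) in observed.items():
    bound = expected_tool_tokens(k)
    print(f"Case {case}: expected <= {bound:,}, "
          f"observed {actual:,} (~{actual / bound:.0f}x the bound)")
```

Case 1 lands close to the bound, which is what makes Cases 2 and 3 look so anomalous.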
What I can’t figure out
The internal retrieval process is completely opaque. Is there re-ranking happening? Full document scanning? Something else? Without understanding what’s driving these token counts, it’s impossible to predict costs or confidently adopt File Search in production.
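If anyone wants to reproduce or narrow this down, a controlled probe like the one below might help: grow the store one unrelated file at a time and re-run a fixed query. The upload call and its signature are taken from the File Search docs as I understand them, so verify them locally before relying on this sketch:

```python
# Probe sketch: add unrelated files one by one and record
# tool_use_prompt_token_count for the same query after each upload.
# Roughly linear growth would point at per-file scanning or wide
# re-ranking; flat counts would match pure top-k retrieval.
import time

from google import genai
from google.genai import types

client = genai.Client()
STORE = "fileSearchStores/my-store"  # placeholder store name
QUERY = "Related to hwpx"            # same query as Case 1

for path in ["noise_01.xlsx", "noise_02.xlsx", "noise_03.xlsx"]:
    client.file_search_stores.upload_to_file_search_store(
        file=path, file_search_store_name=STORE
    )
    time.sleep(30)  # crude wait: uploads index asynchronously, so give
                    # the store time to finish before querying
    response = client.models.generate_content(
        model="gemini-3-flash-preview",
        contents=QUERY,
        config=types.GenerateContentConfig(
            tools=[types.Tool(file_search=types.FileSearch(
                file_search_store_names=[STORE]
            ))]
        ),
    )
    print(path, response.usage_metadata.tool_use_prompt_token_count)
```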
Has anyone managed to get clarity on this? Any insight would be appreciated.