I’m working on a search engine that sends the retrieved documents to the Flash model for summarization.
When I retrieve the docs and send them to the model, I get: google.api_core.exceptions.InvalidArgument: 400 The input token count (34108) exceeds the maximum number of tokens allowed (32767).
According to the limits listed online, this should be well within the model’s capabilities (I also tested the Pro model, only to get the same error). Can I define the input token limit myself? Is it reduced by default?
P.S.
The error occurs only when I use grounding (Vertex AI Search). It works fine when grounding is not used.
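For reference, the call looks roughly like this (a minimal sketch using the vertexai SDK; the project, location, model version, and data store path are placeholders, not my real setup):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

# Placeholder project and location.
vertexai.init(project="my-project", location="us-central1")

# Ground the model on a Vertex AI Search data store (placeholder resource name).
search_tool = Tool.from_retrieval(
    grounding.Retrieval(
        grounding.VertexAISearch(
            datastore=(
                "projects/my-project/locations/global/collections/"
                "default_collection/dataStores/my-datastore"
            )
        )
    )
)

model = GenerativeModel("gemini-1.5-flash-002")

# This call raises the 400 InvalidArgument error once the combined input
# exceeds ~32k tokens; dropping tools=[search_tool] makes it work.
response = model.generate_content(
    "Summarize the retrieved documents.",
    tools=[search_tool],
)
print(response.text)
```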
Grounding is not supported for non-text input with the 1.5 models. Without the retrieval tool it will work. Use the latest model, “2.0-flash-exp”, with grounding if you want to upload files or other multimodal input.
You can also try this in Vertex AI Studio or Google AI Studio; it will throw the same error.
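A rough sketch of the same call pointed at the newer model (untested here, same vertexai SDK; the data store path is again a placeholder):

```python
from vertexai.generative_models import GenerativeModel, Tool, grounding

# Placeholder Vertex AI Search data store resource name.
DATASTORE = (
    "projects/my-project/locations/global/collections/"
    "default_collection/dataStores/my-datastore"
)

# Same grounding tool as before, just targeting the newer experimental model.
search_tool = Tool.from_retrieval(
    grounding.Retrieval(grounding.VertexAISearch(datastore=DATASTORE))
)

model = GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content(
    "Summarize the retrieved documents.",
    tools=[search_tool],
)
print(response.text)
```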
If anyone has similar issues with Gemini 2.0 Flash and structured output: I solved it by adding more tokens(!). By adding ~2000 extra tokens to my system prompt, marked as padding so as not to confuse the model, the requests worked again!
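Roughly what that workaround looks like (a sketch only; the padding text, model name, and response schema are illustrative, not the exact prompt used):

```python
from vertexai.generative_models import GenerativeModel, GenerationConfig

# Hypothetical padding block: roughly 2000 extra tokens of filler, clearly
# labelled so the model knows to ignore it.
PADDING = "PADDING, IGNORE THIS SECTION: " + ("lorem ipsum dolor sit amet " * 300)

# Illustrative structured-output schema.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {"summary": {"type": "string"}},
    "required": ["summary"],
}

model = GenerativeModel(
    "gemini-2.0-flash",
    system_instruction=["You summarize documents as JSON.", PADDING],
)

response = model.generate_content(
    "Summarize the retrieved documents.",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=RESPONSE_SCHEMA,
    ),
)
print(response.text)
```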