I’ve noticed that the maximum output length in Google AI Studio appears to be capped at 8192 tokens, and this value seems fixed rather than configurable.
My specific use case involves creating comprehensive “super prompts”—prompts generated programmatically by another AI—which often exceed 20,000 tokens once all the research, RAG context, and output templates with examples are included.
While an 8k-token limit might be sufficient for simpler scenarios, my application specifically relies on the ability to generate significantly longer prompts programmatically through the Gemini API.
Could someone clarify:
- Is the 8192-token output limitation also enforced when accessing Gemini through the programmatic API, or is it only a restriction of the AI Studio UI?
- Does this limitation apply equally to the most recent models, or are there newer models or configurations that support longer outputs?
- Are there recommended workarounds (e.g., chunking, pagination, or streaming) for generating outputs larger than the current token limit, or is Google considering increasing this limit in the foreseeable future?
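On the workaround question, the chunking approach I have in mind looks roughly like the sketch below: repeatedly prompt the model to continue from where the previous response left off until it emits a completion marker. The `generate` call here is a hypothetical stand-in for a real Gemini API request (e.g. `generate_content` with `max_output_tokens` set in the generation config); the loop logic itself is independent of the SDK.

```python
# Sketch of a continuation loop for producing output longer than one
# response allows. DONE_MARKER and generate() are illustrative names,
# not part of any Gemini SDK.

DONE_MARKER = "<<END>>"


def generate_long(task: str, generate_fn, max_rounds: int = 10) -> str:
    """Accumulate output across several model calls.

    generate_fn(prompt) -> str is assumed to wrap a real API call
    (stubbed here so the control flow can be shown on its own).
    """
    parts = []
    prompt = (
        f"{task}\n\n"
        f"Write the full answer. If you run out of room, stop mid-thought; "
        f"when the answer is fully complete, end with {DONE_MARKER}."
    )
    for _ in range(max_rounds):
        chunk = generate_fn(prompt)
        if chunk.endswith(DONE_MARKER):
            # Strip the marker from the final chunk and stop.
            parts.append(chunk[: -len(DONE_MARKER)])
            break
        parts.append(chunk)
        # Feed back the tail of the draft so the model can continue
        # seamlessly without repeating earlier text.
        prompt = (
            f"{task}\n\n"
            f"Continue exactly where this draft leaves off, without "
            f"repeating any of it:\n...{chunk[-500:]}"
        )
    return "".join(parts)
```

The main design question with this pattern is detecting true completion versus truncation, which is why the sketch relies on an explicit end marker rather than response length alone.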
Any insights or suggestions would be greatly appreciated!