Clarification on Gemini Output Limit (8192 tokens) for API Access and Latest Models — Need 20k+ Tokens

I’ve noticed that the maximum output length in Google AI Studio appears limited to 8192 tokens, and this value seems fixed and not configurable.

My specific use case involves creating comprehensive “super prompts”—prompts generated by another AI from a meta-prompt, often exceeding 20,000 tokens in length when all research, RAG context, and output templates with examples are included.

While an 8k-token limit might be sufficient for simpler scenarios, my application specifically relies on the ability to generate significantly longer prompts programmatically through the Gemini API.

Could someone clarify:

  1. Is the 8192-token output limitation also enforced when accessing Gemini through the programmatic API, or is it only a restriction of the AI Studio UI?
  2. Does this limitation apply equally to the most recent models, or are there newer models or configurations that support longer outputs?
  3. Are there recommended workarounds (e.g., chunking, pagination, or streaming) for generating outputs larger than the current token limit, or is Google considering increasing this limit in the foreseeable future?

Any insights or suggestions would be greatly appreciated!

Hi @pieterkuppens, welcome to the forum.

The output token limit of 8,192 remains the same whether you access the model through the API or AI Studio.

As a workaround, you can explore prompt chaining and iterative generation techniques, where you break the larger task into smaller subtasks and build the desired output iteratively across multiple requests.
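A minimal sketch of that chaining approach is below. It is a generic pattern, not an official Gemini recipe: the `generate` callable is a placeholder that you would wrap around your actual model call (e.g. `model.generate_content(...).text` from the `google-generativeai` SDK), and the section prompts and instruction wording are illustrative assumptions.

```python
from typing import Callable, List

def generate_in_sections(
    section_prompts: List[str],
    generate: Callable[[str], str],
) -> str:
    """Assemble one long output by generating each section in a
    separate request, feeding prior output back in as context so
    each section stays under the per-request output limit."""
    output_so_far = ""
    for prompt in section_prompts:
        # Include everything produced so far so the model keeps
        # each new section consistent with the previous ones.
        full_prompt = (
            "You are assembling a long document in sections.\n"
            f"Document so far:\n{output_so_far}\n\n"
            f"Now write the next section:\n{prompt}"
        )
        # In practice, `generate` would be a thin wrapper around
        # the Gemini API call (a placeholder here).
        output_so_far += generate(full_prompt) + "\n"
    return output_so_far
```

Each individual request stays within the 8,192-token output cap, while the concatenated result can grow well beyond it; the trade-off is extra latency and input-token cost, since earlier output is re-sent as context on every call.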