Can I set max_output_tokens higher than 8192 in this config?

generation_config = {
    "temperature": 2,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,  # --> can this be higher?
    "response_mime_type": "text/plain",
}
8192 still appears to be the maximum output length; setting it any higher has no effect. One way to work around this limitation is prompt chaining: tell the model to output [CONTINUE] at the end of its response whenever the complete answer needs more tokens than a single reply allows. Then just send "continue" as your next message and the model should pick up where it left off (assuming the context window is large enough to include the previous chat history).
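Something like this rough sketch with the google-generativeai Python SDK (the model name and the exact [CONTINUE] marker are just placeholders, not anything official):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you already have an API key

model = genai.GenerativeModel(
    "gemini-1.5-flash",  # placeholder model name
    generation_config={"max_output_tokens": 8192},
)
chat = model.start_chat()

# Ask the model to flag truncated answers with a sentinel it can continue from.
parts = []
response = chat.send_message(
    "Write the full report. If you run out of space, end your reply with [CONTINUE]."
)
parts.append(response.text)

# Keep sending "continue" while the model signals it was cut off.
while parts[-1].rstrip().endswith("[CONTINUE]"):
    response = chat.send_message("continue")
    parts.append(response.text)

full_text = "".join(p.replace("[CONTINUE]", "") for p in parts)
print(full_text)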
Hi,
Welcome to the forum.
The limit is model-dependent; you should inspect the value of the outputTokenLimit property. Some earlier models have lower limits such as 1024, 2048, or 4096. Use the listModels functionality of an SDK, or use the REST API call.
# Get a list of available models.
GET https://generativelanguage.googleapis.com/v1beta/models
x-goog-api-key: {{apiKey}}
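If you prefer an SDK, a quick sketch with the google-generativeai Python package looks like this (adjust to whichever SDK you actually use):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Print the output token limit reported for each available model.
for m in genai.list_models():
    print(m.name, m.output_token_limit)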