Can I set max_output_tokens higher than 8192 in this config?

generation_config = {
    "temperature": 2,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,  # --> can this be higher?
    "response_mime_type": "text/plain",
}
8192 still appears to be the maximum output length; setting it any higher has no effect. One way to work around this limitation is prompt chaining: tell the model to output [CONTINUE] at the end of its response whenever the complete answer needs more tokens than a single reply allows. Then just send "continue" as your next message and the model should pick up where it left off (assuming the context window is large enough to include the previous chat history).
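Something like this rough sketch with the google-generativeai Python SDK (the model name and the exact [CONTINUE] marker are just placeholders, not anything official):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you already have an API key

model = genai.GenerativeModel(
    "gemini-1.5-flash",  # placeholder model name
    generation_config={"max_output_tokens": 8192},
)
chat = model.start_chat()

# Ask the model to flag truncated answers with a sentinel it can continue from.
parts = []
response = chat.send_message(
    "Write the full report. If you run out of space, end your reply with [CONTINUE]."
)
parts.append(response.text)

# Keep sending "continue" while the model signals it was cut off.
while parts[-1].rstrip().endswith("[CONTINUE]"):
    response = chat.send_message("continue")
    parts.append(response.text)

full_text = "".join(p.replace("[CONTINUE]", "") for p in parts)
print(full_text)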
Hi,
Welcome to the forum.
The limit is model-dependent; you should inspect the value of the outputTokenLimit property. Some earlier models have lower limits such as 1024, 2048, or 4096. Use the listModels functionality of an SDK, or use the REST API call.
# Get a list of available models.
GET https://generativelanguage.googleapis.com/v1beta/models
x-goog-api-key: {{apiKey}}
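If you prefer an SDK, a quick sketch with the google-generativeai Python package looks like this (adjust to whichever SDK you actually use):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Print the output token limit reported for each available model.
for m in genai.list_models():
    print(m.name, m.output_token_limit)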