Gemini 1.5 Flash keeps generating the same text until it hits the max token limit

For some prompts, the model falls into an infinite loop and generates a very long response full of repetitive lines until it hits the max token limit.

It is very annoying and a waste of tokens and money. Is there any solution for it?

Welcome to the forums!

What models are you seeing this in?
Can you give any example prompts that are causing the issue?

A workaround is to set the maxOutputTokens parameter to a lower value than the default 8k, but I assume that’s not what you’re asking about.
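For example, with the google-generativeai Python SDK, a minimal sketch might look like this (the cap of 512 and the prompt are just illustrative values, not recommendations):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your key

# Cap the response length so a runaway generation stops early.
# 512 is an arbitrary illustrative value; tune it to your use case.
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(max_output_tokens=512),
)

response = model.generate_content("Summarize the plot of Hamlet.")
print(response.text)
```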

In the model gemini-1.5-flash-8b, why is the max token limit “max_output_tokens”: 8192? Can I increase it?

There are several reasons, many of them somewhat technical, why 8k is the current output limit for the Gemini 1.5 models. It is worth noting that most other models don’t offer more than 8k either (the GPT-4 models currently have 16k output versions, but were at 8k until recently), and many offer fewer than that.

A few big reasons there is a limit:

  • As noted above, sometimes models start to “run away”. A max limit tends to stop that before it happens.
  • The models are tuned to produce at most this much output, and they tend to behave poorly (in terms of resources, time, and quality of response) if you push them further.
  • Relatedly, beyond a certain output length they tend to “lose attention”.

Since they are trained on a much larger input context, it is often useful to take the capped output and send it back, along with the rest of the context, together with an instruction such as “continue”.
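A minimal sketch of that pattern, assuming the google-generativeai Python SDK and using a chat session so the earlier turns (including the truncated output) are resent as context. The five-round cap and the bare “continue” message are arbitrary choices, not the only way to do it:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your key

model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat()  # the chat history carries prior turns as context

response = chat.send_message("Write a detailed outline of a mystery novel.")
parts = [response.text]

# If generation stopped because it hit the output cap, ask it to continue.
# The safety cap keeps a truly runaway generation from looping forever.
for _ in range(5):
    if response.candidates[0].finish_reason.name != "MAX_TOKENS":
        break
    response = chat.send_message("continue")
    parts.append(response.text)

full_text = "".join(parts)
print(full_text)
```

Checking finish_reason before re-prompting matters: if the model stopped on its own (STOP), sending “continue” would just prompt it to pad the answer.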