Gemini 1.5 Flash keeps generating the same text until it hits the max token limit

For some prompts, the model falls into an infinite loop and generates a very long response full of repetitive lines until it hits the max token limit.

It is very annoying and a waste of tokens and money. Is there any solution for it?

Welcome to the forums!

What models are you seeing this in?
Can you give any example prompts that are causing the issue?

A workaround is to set the maxOutputTokens parameter to a lower value than the default 8k, but I assume that’s not what you’re asking about.
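For example, with the google-generativeai Python SDK, a minimal sketch might look like this (the cap of 512 and the prompt are just illustrative values, not recommendations):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your key

# Cap the response length so a runaway generation stops early.
# 512 is an arbitrary illustrative value; tune it to your use case.
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(max_output_tokens=512),
)

response = model.generate_content("Summarize the plot of Hamlet.")
print(response.text)
```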

In the model gemini-1.5-flash-8b, why is the max token limit “max_output_tokens”: 8192? Can I increase it?

There are several reasons, many of them somewhat technical, why 8k is the current output limit for the Gemini 1.5 models. It is worth noting that most other models don’t offer more than 8k either (the GPT-4 models currently have 16k output versions, but were at 8k until recently), and many offer fewer than that.

A few big reasons there is a limit:

  • As noted above, sometimes models start to “run away”. A max limit tends to stop that before it happens.
  • The models are tuned to produce at most this much output, and they tend to behave poorly (in terms of resources, time, and quality of response) if you push them further.
  • Relatedly, beyond a certain output length they tend to “lose attention”.

Since they are trained on a much larger input context, it is often useful to take the capped output and send it back, along with the rest of the context, together with an instruction such as “continue”.
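A minimal sketch of that pattern, assuming the google-generativeai Python SDK and using a chat session so the earlier turns (including the truncated output) are resent as context. The five-round cap and the bare “continue” message are arbitrary choices, not the only way to do it:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your key

model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat()  # the chat history carries prior turns as context

response = chat.send_message("Write a detailed outline of a mystery novel.")
parts = [response.text]

# If generation stopped because it hit the output cap, ask it to continue.
# The safety cap keeps a truly runaway generation from looping forever.
for _ in range(5):
    if response.candidates[0].finish_reason.name != "MAX_TOKENS":
        break
    response = chat.send_message("continue")
    parts.append(response.text)

full_text = "".join(parts)
print(full_text)
```

Checking finish_reason before re-prompting matters: if the model stopped on its own (STOP), sending “continue” would just prompt it to pad the answer.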