For some prompts, the model falls into an infinite loop and generates a very long response full of repetitive lines until it hits the maximum token limit.
It is very annoying and a waste of tokens and money. Is there any solution for this?
Welcome to the forums!
What models are you seeing this in?
Can you give any example prompts that are causing the issue?
A workaround is to set the maxTokens parameter to a lower value than the default 8k, but I assume that’s not what you’re asking about.
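If a lower cap is acceptable, here is a minimal sketch of how that might look with the Python google-generativeai package. The prompt and the 1024 cap are just placeholders, and other SDKs expose the same setting under slightly different names (e.g. maxOutputTokens):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-flash-8b")

# Cap the output well below the 8k default so a runaway, repetitive
# response burns far fewer tokens before it gets cut off.
response = model.generate_content(
    "Summarize the report below in five bullet points: ...",
    generation_config={"max_output_tokens": 1024},
)

print(response.text)
```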
For the gemini-1.5-flash-8b model, why is the limit set to “max_output_tokens”: 8192? Can I increase it?
There are several reasons, many of them somewhat technical, why 8k is the current limit for the Gemini 1.5 models. It is worth noting, though, that most other models don’t offer more than 8k output either (the GPT-4 models currently have 16k versions, but were at 8k until recently), and many have fewer than that.
A few big reasons there is a limit:
Since the models are trained on a much larger input context, it is often useful to take the limited output and send it back, along with the rest of the context, with an instruction such as “continue”, so a larger single-response limit is rarely necessary. A sketch of that pattern follows.
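Here is a rough illustration of that continuation approach, again assuming the Python google-generativeai package; the chat interface keeps the earlier turns in context for you, and I'm assuming the value 2 for finish_reason, which corresponds to MAX_TOKENS in the API's FinishReason enum:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash-8b")

MAX_TOKENS = 2  # FinishReason.MAX_TOKENS in the Gemini API enum (assumed value)

chat = model.start_chat()
response = chat.send_message("Write a detailed outline for a 20-page report on ...")
parts = [response.text]

# If the reply was cut off at the output limit, ask the model to pick up
# where it stopped. Cap the retries so a truly runaway response still ends.
for _ in range(5):
    if response.candidates[0].finish_reason != MAX_TOKENS:
        break
    response = chat.send_message("continue")
    parts.append(response.text)

full_text = "".join(parts)
print(full_text)
```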