I’m running into an issue with Gemini 2.5 Flash when using structured JSON output through LiteLLM. In some cases, the model enters a repetition loop and keeps generating duplicated tokens or JSON fragments until the max output token limit is reached. The final response is often malformed or incomplete JSON.
Has anyone seen this specifically with Gemini 2.5 Flash + LiteLLM?
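For anyone trying to catch this automatically, here is a minimal sketch of a check for the loop described above. It is only a heuristic, and `ends_in_loop`, `max_period`, and `min_repeats` are names and thresholds made up for illustration, not part of LiteLLM or the Gemini API. It reports whether the text ends with the same chunk repeated many times back to back.

```python
def ends_in_loop(text: str, max_period: int = 200, min_repeats: int = 8) -> bool:
    """Heuristic: True if `text` ends with one chunk repeated `min_repeats` times in a row.

    `max_period` caps the chunk length that is tried. Both defaults are
    arbitrary assumptions and may need tuning for real outputs.
    """
    longest = min(max_period, len(text) // min_repeats)
    for period in range(1, longest + 1):
        chunk = text[-period:]
        if not chunk.strip():
            # Ignore runs of pure whitespace (e.g. pretty-printed indentation).
            continue
        if text.endswith(chunk * min_repeats):
            return True
    return False
```

This could be run on the response text (or on the accumulated text while streaming) to flag a looping response early. It can produce false positives on legitimately repetitive JSON, so it probably works best combined with a JSON parse check.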
I'm using the same stack and facing the same issue. I suspect it's an intermittent issue, since it happened even before I integrated LiteLLM into my stack. The model kept repeating tokens/outputs until it hit the max_token limit, which resulted in malformed JSON. I tested mainly with 2.5 flash-lite, as well as 3.1 flash and 2.5 flash, and all of them had the same issue. This might be related: I was fixing temperature=0 to get close reproducibility, and raising it seems to reduce the repetition, though it makes the output less accurate.
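A possible workaround based on the temperature observation, as a rough sketch only: start at temperature 0 and retry at a higher temperature only when the output was cut off or is not valid JSON. Everything named here is hypothetical: `generate_json_with_retry`, the `generate` callable (which you would write yourself to call the model), and the temperature ladder. It also assumes a response that hit the token limit reports a finish reason of `"length"`, which is the OpenAI-style value; check what your setup actually returns.

```python
import json
from typing import Any, Callable, List, Optional, Sequence, Tuple


def generate_json_with_retry(
    generate: Callable[[float], Tuple[str, Optional[str]]],
    temperatures: Sequence[float] = (0.0, 0.3, 0.7),
) -> Any:
    """Call `generate(temperature)` until it yields parseable JSON.

    `generate` is a caller-supplied function (an assumption of this sketch)
    that returns (response_text, finish_reason).
    """
    failures: List[str] = []
    for temperature in temperatures:
        text, finish_reason = generate(temperature)
        if finish_reason == "length":
            # Output was truncated at the token limit, likely a repetition loop.
            failures.append(f"temperature={temperature}: hit the token limit")
            continue
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            failures.append(f"temperature={temperature}: invalid JSON ({exc.msg})")
    raise RuntimeError("no valid JSON produced; " + "; ".join(failures))
```

With LiteLLM, `generate` could wrap `litellm.completion(...)`, pass the temperature through, and return `response.choices[0].message.content` together with `response.choices[0].finish_reason`. This keeps temperature 0 for the common case and only gives up some accuracy on the requests that actually loop.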