Response Generation Stops Prematurely

Dalgakiran · February 27, 2025, 1:38pm

Hello everyone,

I am using the new Gemini API (google-genai) for Python to send chat message using the send_message_streaming function of the chat class. During the conversation, the model suddenly starts to send incomplete responses. The response structure is correct, but the candidate text contains either a word or just the first syllable.

When this happens, if I reply it like “What?”, “Continue.”, or “Go on” etc., the model keeps sending incomplete responses. But if I reply with an irrelevent message or question, the model stops acting like this and send a complete response.

In the chunk data, the finising reason is normal. There are no errors or no quota-related problems. The model is gemini-2.0-flash. What could be the problem?

Dalgakiran · March 1, 2025, 9:30pm

What an inactive forum it is. Anyway, it looks like switching to the OpenAI library has solved my problem. I’m still using the Gemini 2.0 Flash model, but with OpenAI’s Python package. Don’t use the google-genai package’s chat class for streamed responses, because it is broken.

Topic		Replies	Views
Get 'No generation chunks were returned' when use gemini-2.5-flash-preview-05-20 Gemini API bug , models	3	99	June 12, 2025
Out of nowhere...previously working code now gives Multiturn chat is not enabled for models/gemini-1.5-flash-002 Gemini API model-code , gemini-flash	8	255	January 10, 2025
AttributeError: Unknown field for Candidate: finish_message Gemini API bug , api , models	2	390	October 3, 2024
Streaming not working with OpenAI contract Gemini API api	1	80	May 21, 2025
Gemini flash 2.0 API sometimes would stop outputting (paused) Gemini API feedback , prompt	18	1334	March 6, 2025

Response Generation Stops Prematurely

Related topics