Gemini 2.5 Pro with empty response.text

Ah thanks, now I know what you mean!

For the past two weeks, these three types of errors—empty responses, 500 errors, and 429 errors—have almost paralyzed API requests for gemini-2.5-pro. I hope the Google team fixes these problems as soon as possible.

429 means you’ve exceeded a rate limit for your tier (TPM, RPM or RPD). It’s connected to the 500s and empty responses, but it isn’t really an API bug itself, I suppose.

Haha, thank you. I know what a 429 is. I meant the 429 is caused by retrying after too many empty responses and 500 errors.

Hey guys, I’m wondering: are failed or empty requests billed by the API?

I’ve done some tests today and I noticed I was only charged for successful ones.

Does anyone have an idea of how long they will take to fix this issue? I’m relying on this model to run a client’s system, and I’ve tried OpenAI and Claude, but neither has the reasoning capability to generate the kind of output I need.

I think it will get better, i.e. about 85-90% of requests will be processed smoothly, within a day or two. Firstly, the problem is completely undermining the product’s credibility; secondly, it is already getting attention; and thirdly, Google will soon start losing money as developers switch to competitors. These three factors are amplified by the current AI race, and Google simply cannot allow its flagship model to behave this way. In addition, Logan Kilpatrick has appeared in the thread, so the problem is clearly a priority for the Gemini team - P0. They will most likely allocate more resources and roll back the updates that caused such a massive drop, which will be like taking an antipyretic: the fever goes down, but the disease remains. And judging by the fact that the problem has been known about for several months, it is extremely difficult to detect and debug, and Google is a giant company where changes, including to the Gemini API, take time. I believe it will take months to fully resolve the issue. However, this is just my opinion.

I’m absolutely furious about this API mess that’s been dragging on for months. It’s beyond frustrating that basic functionality has been broken for so long with no end in sight. How can anyone seriously promote this as a leading, or even viable, option when the API is unusable week after week? It’s disappointing on every level, almost comical in how unreliable it is, and it feels like total disregard for the users who depend on it. This kind of ongoing failure is hard to wrap my head around in 2025. Please, just get it sorted out already.

I have been using this model for two months, and the issue has been happening for two weeks. I don’t know whether it has been going on for months.

It is hilarious how bad this is. More than half of my requests simply return something like this:

    {'candidates': [{'content': {'role': 'model'}, 'finishReason': 'STOP', 'index': 0}],
     'usageMetadata': {'promptTokenCount': 2477,
                       'totalTokenCount': 2496,
                       'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 1961},
                                               {'modality': 'IMAGE', 'tokenCount': 516}],
                       'thoughtsTokenCount': 19},
     'modelVersion': 'gemini-2.5-pro',
     'responseId': 'Sh-oaKrbMqGqkdUPyJuduAQ'}

And no amount of retrying or changing the prompt fixes it. It even happens with two-turn chats and “Hello / How are you doing?”-type questions.

And they unironically call this “Production ready”…
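If you want to filter these out before burning retries, the broken replies are easy to spot programmatically: the candidate content carries only a role and no parts. A quick check along these lines (just a sketch; the helper name is mine, the field names come from the raw response above):

    def is_empty_reply(resp: dict) -> bool:
        """Return True when a Gemini response carries no text parts (hypothetical helper)."""
        candidates = resp.get("candidates") or []
        if not candidates:
            return True
        content = candidates[0].get("content") or {}
        parts = content.get("parts") or []
        # A healthy reply has at least one part with text; the broken replies above
        # contain only {'role': 'model'} with no 'parts' key at all.
        return not any(part.get("text") for part in parts)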

I just found this post and a few others. It looks like I am having the same issue with 2.5 Pro. Good to know it’s not just me; bad to know it’s a Gemini issue. I have posted about it here, and it looks like the same issue: Gemini 2.5 Pro - Empty Response-Status 200

Yeah, a complete turn-off for Gemini.

Somebody (Mohamed, but I can’t find his post, so apologies to him and credit to him) suggested using 2.5 Pro in Vertex AI instead of the end-to-end API, and it worked. I ran multiple requests with no issue, then flipped back and forth between the end-to-end API and Vertex with 2.5 Pro, and only the Vertex one worked.

Setting up Vertex was a little tricky, especially finding my way around the Google Cloud settings. Aside from setting up the service account for your project, you have to ensure the IAM gserviceaccount is set up and also ENABLED, which did not seem to happen by default. You also have to make sure the “Vertex AI API” has a user permission set up, or set it up as Vertex AI Administrator, in the IAM roles. It was tricky. Cursor AI managed to make the necessary integration after half a dozen attempts, including one where it decided to put a super-low token limit in my request!

Vertex also has security settings enabled by default that actually stop it working out of the box, so while setting it up I did indeed hit this issue. A couple of videos helped me set it up and fix the security issue.

It looks like Google does also charge a tiny, tiny amount per 1,000 tokens for using Vertex.
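For anyone wanting to try the same switch, here is roughly what it looks like in code. This is only a sketch using the google-genai Python SDK (an assumption on my part; use whatever client you prefer). The project ID and region are placeholders, and it assumes your service account credentials are already configured (e.g. via GOOGLE_APPLICATION_CREDENTIALS):

    # Sketch only: same model, two backends, using the google-genai Python SDK.
    from google import genai

    # Direct Gemini API (the one that kept returning empty responses for me).
    direct_client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

    # Vertex AI backend. "my-gcp-project" and "us-central1" are placeholders, and
    # credentials come from the service account you set up in Google Cloud
    # (e.g. GOOGLE_APPLICATION_CREDENTIALS pointing at its JSON key file).
    vertex_client = genai.Client(
        vertexai=True, project="my-gcp-project", location="us-central1"
    )

    for name, client in (("direct", direct_client), ("vertex", vertex_client)):
        response = client.models.generate_content(
            model="gemini-2.5-pro", contents="Hello, are you there?"
        )
        print(name, "->", repr(response.text))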

Anyway, it looks like this is a way to get 2.5 Pro working. Hope the above is helpful to those not too familiar with the Vertex and Cloud setup.

Switching to Pro 2.5 via Vertex AI worked perfectly for me — no more failures.
Appreciate the tip!

Subject: Bug Report: Gemini 2.5 Pro returns empty response via OpenAI-compatible API due to suspected silent tool call failure.

Model: gemini-2.5-pro

Environment: Accessed via an OpenAI-compatible API layer.

Problem Description:
When sending simple, open-ended prompts (e.g., “Hello”, “What should I do now?”), the API returns a response with an empty choices array, completion_tokens: 0, and a finish_reason of “stop”, but total_tokens shows a non-zero value.

Steps to Reproduce:

1. Make a standard /v1/chat/completions call to the model.
2. Use a simple, non-specific prompt.
3. Do not include the tool_choice parameter.

Workaround:
The issue is consistently resolved by adding “tool_choice”: “none” to the JSON request body. This forces the model to avoid using tools and generates a proper text response.
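For illustration, here is what that request looks like with the openai Python SDK (a sketch; the base URL below is Google’s documented OpenAI-compatibility endpoint and may differ from your own compatibility layer, and whether tool_choice is accepted without a tools array can depend on that layer):

    # Sketch of the workaround request via the OpenAI-compatible surface.
    # The base_url is Google's documented compatibility endpoint; swap in your own
    # proxy layer's URL if you go through one.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_GEMINI_API_KEY",  # placeholder
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    response = client.chat.completions.create(
        model="gemini-2.5-pro",
        messages=[{"role": "user", "content": "Hello"}],
        tool_choice="none",  # the workaround: forbid the suspected silent tool call
    )
    print(response.choices[0].message.content)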

Hypothesis:
The behavior suggests that the model is attempting a default, silent tool call which fails, causing the generation process to terminate prematurely without an error message.

That is for the OpenAI-compatible API, right? Do you happen to know the equivalent for the native Gemini REST API?

Will this turn off the thinking? How does this work in a large context?

Hi everyone,

I’ve encountered an issue where gemini-2.5-pro (and occasionally gemini-1.5-flash) returns an empty response.text on the very first message of a chat when a system_instruction (containing a system prompt and/or long-term memory) is provided in the request. Notably, this occurs even when internet access is disabled, suggesting the problem is distinct from issues related to grounded search.

My hypothesis is that there might be an internal initialization bug within gemini-2.5-pro when the system_instruction is passed via its dedicated API parameter during the model’s initial setup for a new conversation.

To address this, I’ve implemented a workaround that has successfully resolved the empty response issue for me.

My Solution (Workaround):
Instead of passing the system_instruction directly to the genai.GenerativeModel constructor, I inject its content (which for me includes both the system prompt and long-term memory, combined into one string in my application logic) directly into the contents (message history) as a dummy user/model message pair.

Here’s the relevant code snippet from my GeminiProvider (specifically the _build_initial_context method):

    def _build_initial_context(self, model_name: str, prepared_messages: List[Dict[str, Any]], system_prompt: str):
        """
        Applies the bug workaround only to the gemini-2.5-pro model.
        """
        effective_system_prompt = system_prompt if system_prompt else None
        is_new_chat = len(prepared_messages) == 1 and prepared_messages[0]['role'] == 'user'
        
        is_bugged_model = model_name == 'gemini-2.5-pro' # Or 'gemini-1.5-flash' if the issue occurs there too
        
        # Apply the workaround only if: it's a new chat AND it's the specific bugged model AND there's a system prompt/memory.
        # Note: 'system_prompt' here already contains combined system instructions and long-term memory from my app's logic.
        if is_new_chat and is_bugged_model and effective_system_prompt:
            initial_context = []
            
            # Inject the system prompt/memory into the message history
            initial_context.extend([
                {'role': 'user', 'parts': [effective_system_prompt]},
                {'role': 'model', 'parts': ['Okay, I understand and will follow these instructions.']}
            ])

            final_contents = initial_context + prepared_messages
            return final_contents, None # system_instruction is now None, as it's handled in contents
        else:
            # In all other cases (other models, not a new chat), use the standard path.
            return prepared_messages, effective_system_prompt # system_instruction is passed as usual

How it works:
This method checks if the current request is the first in a chat for gemini-2.5-pro and if a system prompt (including long-term memory) is present. If so, it bypasses the system_instruction parameter in genai.GenerativeModel and instead prepends the system prompt to the contents (message history) as a dummy user message followed by a model response. This seems to make the model process the initial context more reliably.

This workaround has completely eliminated the empty response problem on the first turn for gemini-2.5-pro in my application. It’s important to note this is a temporary workaround and not a fix for the underlying API/model bug.

I hope this helps others facing similar challenges. Has anyone else encountered this or tried similar approaches?

Thanks!

It’s strange, but after posting this here, the method stopped working.

This gives me bad vibes.

I am a paying user of GitHub Copilot and Cursor. Because of this issue, many of my paid Gemini 2.5 Pro API requests were consumed by repeated retries. At first I thought the problem was with GitHub or Cursor, but after I could no longer tolerate it I subscribed to the Gemini 2.5 Pro API for debugging and discovered the same issue occurs even with raw API requests. It’s hard to imagine such a serious production incident persisting at a company like Google for so long without any acknowledgement or resolution.

I’m following up on my previous posts regarding gemini-2.5-pro issues. The situation has unfortunately worsened significantly.

Previously, I reported an “empty response.text” issue with gemini-2.5-pro on the initial chat turn when using system_instruction. I developed a workaround that involved injecting the system prompt (and long-term memory) directly into contents as a user/model message pair. This workaround successfully mitigated the problem for gemini-2.5-pro and is still working perfectly for gemini-2.5-flash with identical prompt structures (including very long ones).

However, as of very recently, gemini-2.5-pro has become completely unresponsive for me. It returns a StopIteration error immediately when attempting to read the first chunk from the generate_content stream. This happens consistently, regardless of prompt length (even with very short prompts), and my previous workaround is now also failing in this new, more severe way.

Key observations:

  1. Complete Unresponsiveness: model.generate_content(stream=True) for gemini-2.5-pro yields no chunks at all, resulting in an immediate StopIteration when next(iterator) is called.
  2. Affects All Prompts: This is happening even with a very short system prompt (e.g., 126 characters, as shown in the log below). This contradicts my earlier hypothesis about prompt length being the sole cause.
  3. Workaround Failure: My previously working workaround (injecting system prompt into contents) is no longer effective for gemini-2.5-pro.
  4. Model Specificity: gemini-2.5-flash continues to work perfectly fine with the exact same application logic and prompt structures (both short and long system prompts, with or without the workaround active).

This indicates a critical regression or a major, undocumented change in gemini-2.5-pro’s behavior or API validation rules, rendering the model completely unusable for my application. It implies that neither the direct system_instruction parameter (due to the original bug) nor the contents injection workaround is now viable for this specific model.

Could the Google AI team please investigate this as an urgent regression?
Are there any known service degradations, new undocumented validation rules for gemini-2.5-pro, or changes in how contents are processed for system-like instructions that would lead to this behavior? What is the currently recommended and working method to provide system instructions/context to gemini-2.5-pro?

Here is the relevant log from a run with a short system prompt (“Ассистент.txt” - 126 characters) and the workaround active:

--- DEBUG: Starting streaming response for model 'gemini-2.5-pro' ---
DEBUG: get_chat_response_stream: Config: {
  "api_key": "xxxxxxxxxxxxxxxxxxx",
  "model": "gemini-2.5-pro",
  "temperature": 1.0,
  "top_p": 0.95,
  "selected_prompt": "\u0410\u0441\u0441\u0438\u0441\u0442\u0435\u043d\u0442.txt"
}
DEBUG: get_chat_response_stream: Raw messages count: 1
DEBUG: get_chat_response_stream: System Prompt (initial): 'Ты умный и честный ассистент.
Отвечай чётко, подробно и по теме.
Отвечай только на основе фактов.
Не выдумывай ничего от себя....' (Length: 126)
DEBUG: get_chat_response_stream: Initial Long Term Memory: None
DEBUG: _prepare_messages: Preparing 1 raw messages for Gemini API.
DEBUG: _prepare_messages: Message 0 part 0 is text.
DEBUG: _prepare_messages: Finished. Prepared 1 messages for Gemini API.
DEBUG: _build_initial_context: Called for model 'gemini-2.5-pro'.
DEBUG: _build_initial_context: Initial prepared_messages count: 1
DEBUG: _build_initial_context: Has system_prompt: True. Has long_term_memory: False.
DEBUG: _build_initial_context: Activating Gemini 2.5 Pro initial context workaround for model 'gemini-2.5-pro'.
DEBUG: _build_initial_context: Adding effective_system_prompt to initial context. (Length: 126).
DEBUG: _build_initial_context: Workaround applied. Final contents length: 3. System instruction: None.
DEBUG: get_chat_response_stream: Final System Instruction: None
DEBUG: get_chat_response_stream: Final Contents (first 2 messages): [
  {
    "role": "user",
    "parts": [
      "\u0422\u044b \u0443\u043c\u043d\u044b\u0439 \u0438 \u0447\u0435\u0441\u0442\u043d\u044b\u0439 \u0430\u0441\u0441\u0438\u0441\u0442\u0435\u043d\u0442.\n\u041e\u0442\u0432\u0435\u0447\u0430\u0439 \u0447\u0451\u0442\u043a\u043e, \u043f\u043e\u0434\u0440\u043e\u0431\u043d\u043e \u0438 \u043f\u043e \u0442\u0435\u043c\u0435.\n\u041e\u0442\u0432\u0435\u0447\u0430\u0439 \u0442\u043e\u043b\u044c\u043a\u043e \u043d\u0430 \u043e\u0441\u043d\u043e\u0432\u0435 \u0444\u0430\u043a\u0442\u043e\u0432.\n\u041d\u0435 \u0432\u044b\u0434\u0443\u043c\u044b\u0432\u0430\u0439 \u043d\u0438\u0447\u0435\u0433\u043e \u043e\u0442 \u0441\u0435\u0431\u044f."
    ]
  },
  {
    "role": "model",
    "parts": [
      "\u0425\u043e\u0440\u043e\u0448\u043e, \u044f \u043f\u043e\u043d\u044f\u043b \u0438 \u0431\u0443\u0434\u0443 \u0441\u043b\u0435\u0434\u043e\u0432\u0430\u0442\u044c \u044d\u0442\u0438\u043c \u0438\u043d\u0441\u0442\u0440\u0443\u043a\u0446\u0438\u044f\u043c."
    ]
  }
]
DEBUG: get_chat_response_stream: Total final_contents count: 3
DEBUG: get_chat_response_stream: Generation Config: GenerationConfig(candidate_count=None, stop_sequences=None, max_output_tokens=None, temperature=1.0, top_p=0.95, top_k=None, response_mime_type=None, response_schema=None, presence_penalty=None, frequency_penalty=None)
DEBUG: get_chat_response_stream: Safety Settings: {'HARM_CATEGORY_HARASSMENT': 'BLOCK_NONE', 'HARM_CATEGORY_HATE_SPEECH': 'BLOCK_NONE', 'HARM_CATEGORY_SEXUALLY_EXPLICIT': 'BLOCK_NONE', 'HARM_CATEGORY_DANGEROUS_CONTENT': 'BLOCK_NONE'}
DEBUG: get_chat_response_stream: Calling model.generate_content(stream=True)...
Window event: WindowEventType.BLUR
# ... (My expanded logging should appear here, but it's not. This suggests the StopIteration happens almost immediately after the call returns, before the iteration loop in ai_client.py gets to log "Received chunk")

--- ДЕТАЛЬНАЯ КРИТИЧЕСКАЯ ОШИБКА ---
Traceback (most recent call last):
  File "D:\DATA\FletChat Development V2 (New TTS)\ai_client.py", line 175, in get_chat_response_stream
    response_stream = model.generate_content(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\google\generativeai\generative_models.py", line 329, in generate_content
    return generation_types.GenerateContentResponse.from_iterator(iterator)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\google\generativeai\types\generation_types.py", line 634, in from_iterator
    response = next(iterator)
               ^^^^^^^^^^^^^^
  File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\google\api_core\grpc_helpers.py", line 116, in __next__
    return next(self._wrapped)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\grpc\_channel.py", line 543, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "D:\DATA\FletChat Development V2 (New TTS)\python_embeded\Lib\site-packages\grpc\_channel.py", line 950, in _next
    raise StopIteration()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\DATA\FletChat Development V2 (New TTS)\main.py", line 1050, in stream_bot_response
    for chunk in response_stream:
                 ^^^^^^^^^^^^^^^
  File "D:\DATA\FletChat Development V2 (New TTS)\ai_client.py", line 228, in get_chat_response_stream
    except genai.types.BrokenStreamError as bse:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'google.generativeai.types' has no attribute 'BrokenStreamError'
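Side note on the second traceback above: google.generativeai.types has no BrokenStreamError attribute, so my own except clause crashes before it can report anything useful. A defensive wrapper along these lines (only a sketch; the function name and retry constants are hypothetical, not part of the SDK or my original code) catches the StopIteration from the eager first-chunk read instead and retries:

    import time

    import google.generativeai as genai  # legacy SDK, as in the traceback above

    MAX_ATTEMPTS = 3      # hypothetical retry budget
    RETRY_DELAY_S = 2.0   # hypothetical pause between attempts


    def safe_stream_text(
        model: "genai.GenerativeModel",
        contents,
        generation_config=None,
        safety_settings=None,
    ):
        """Yield text chunks, retrying when the stream dies before the first chunk."""
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                # generate_content(stream=True) reads the first chunk eagerly, so a
                # dead stream surfaces right here as StopIteration (see traceback).
                response = model.generate_content(
                    contents,
                    generation_config=generation_config,
                    safety_settings=safety_settings,
                    stream=True,
                )
                produced_text = False
                for chunk in response:
                    # Guard against candidates with no parts before touching .text.
                    if chunk.candidates and chunk.candidates[0].content.parts:
                        produced_text = True
                        yield chunk.text
                if produced_text:
                    return
                # The stream finished but contained no text: treat it like an
                # empty response and retry.
            except StopIteration:
                pass  # nothing came back at all; fall through and retry
            if attempt < MAX_ATTEMPTS:
                time.sleep(RETRY_DELAY_S)
        raise RuntimeError("gemini-2.5-pro produced no content after retries")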