I’m looking for some guidance regarding an issue with Gemini 3.0 Pro Preview (and 2.5 Pro) where the API returns a successful finishReason: STOP, but the content.parts field is empty. This happens even though usageMetadata often shows that tokens (including reasoning tokens) were consumed generating the response.
This behavior occurs specifically when processing unified code diffs from well-known open-source repositories. The model appears to suppress the output without returning a specific error like RECITATION or SAFETY.
Safety Settings: All categories are set to BLOCK_NONE.
Environment: Python Google GenAI SDK.
Account Type: Tier with billing enabled.
Is there any known technical reason why the model would return a successful status but no content for these types of inputs? Any help on how to troubleshoot this would be appreciated.
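For context, this is roughly the check that flags the problem on my side (client setup omitted; the attribute access follows the shape of google-genai SDK response objects, so treat it as a sketch):

```python
def is_empty_success(response) -> bool:
    """True when the call 'succeeded' (finishReason STOP) but no text
    parts came back -- the exact symptom described above."""
    candidate = response.candidates[0]
    # Compare by enum name so this works across SDK versions/mocks.
    finish = getattr(candidate.finish_reason, "name", str(candidate.finish_reason))
    parts = candidate.content.parts if candidate.content else None
    has_text = any(getattr(p, "text", None) for p in (parts or []))
    return finish == "STOP" and not has_text
```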
I’m experiencing the same issue as you.
I’m using gemini-3-flash-preview.
I’m sending thousands of images for OCR, and many of them are returned with an empty response from the model.
From what I’ve seen online, this is a known issue, and Google still hasn’t fixed it.
In the meantime, we’re just burning tokens.
It appears you are hitting an internal output-suppression event. When usageMetadata shows consumed tokens but parts is empty, the model is generating a response (and likely reasoning through it), but a secondary validation layer is intercepting the output before it reaches you. You can experiment with wrapping the diff in a more explicit container (e.g., <git_diff> ... </git_diff>) and setting a high max_output_tokens (e.g., 8192) so reasoning doesn’t crowd out the final answer. If the diff contains PII (emails, IDs), try redacting it, as that can trigger silent blocks even with BLOCK_NONE.
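A minimal sketch of that mitigation with the google-genai SDK (the model name, prompt wording, and the <git_diff> wrapper are illustrative, not an official fix):

```python
def wrap_diff(diff: str) -> str:
    # Wrap the raw diff in an explicit container so the model treats it
    # as quoted data rather than content to echo back verbatim.
    return f"<git_diff>\n{diff}\n</git_diff>"

def review_diff(client, diff: str, model: str = "gemini-3-pro-preview"):
    # Lazy import so wrap_diff stays usable without the SDK installed.
    from google.genai import types

    config = types.GenerateContentConfig(
        # Leave headroom so reasoning tokens don't crowd out the answer.
        max_output_tokens=8192,
        safety_settings=[
            types.SafetySetting(category=c, threshold="BLOCK_NONE")
            for c in (
                "HARM_CATEGORY_HARASSMENT",
                "HARM_CATEGORY_HATE_SPEECH",
                "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "HARM_CATEGORY_DANGEROUS_CONTENT",
            )
        ],
    )
    return client.models.generate_content(
        model=model,
        contents=f"Summarize this change:\n{wrap_diff(diff)}",
        config=config,
    )
```

If the suppression is recitation-related, redacting emails/IDs from the diff before calling wrap_diff is worth trying as well.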
It sounds like the ‘Confidence Dropout’ behavior specific to the Flash models. When Flash is unsure of OCR accuracy, due to image quality or internal filters, it defaults to a successful STOP with no content. You can try adding a JSON-mode constraint or a system instruction that forces the model to return at least a status field (e.g., {"text": "none_detected"}); this often prevents the empty-parts return. Additionally, check whether your images contain PII or faces, as these can trigger silent post-generation blocks even with BLOCK_NONE.
I opened a new Gemini account using a different email and set up a paid Tier 1 account, identical to my current one. I generated a new API key and ran the exact same code in parallel using the two different keys.
The results are baffling: the code running on the new API key receives responses from the model about five times faster, and out of 1000 frames processed, I didn’t get a single empty or ‘NONE’ response.
In contrast, using the old API key, the model’s response time was extremely slow, and I received hundreds of empty/NONE responses.
How can this be explained? I am using the exact same code for both.
I’m experiencing the ‘Confidence Dropout’ issue as well. Around 20-30% of my JSON-constrained runs on Gemini 3 Flash Preview end with a STOP and no response after burning ~800 thinking tokens each. Gemini 2.5 Flash works perfectly with the same prompt. It’s bad enough that I’m considering swapping models entirely.