[Moderate Bug] JSON Leakage When Generating Text + Image in Same Response

Severity: P2 - Moderate (Frequent failure of intended functionality)

Product: Gemini 3 (Free Tier)

Summary:
When Gemini attempts to generate both text and an image in the same response, it frequently outputs the raw JSON of the image generation tool call instead of actually executing the tool. The failure rate is approximately 75-80%.

Reproduction Steps:

  1. Start conversation with Gemini
  2. Ask Gemini to generate an image with accompanying description
  3. Observe that ~75-80% of the time, Gemini outputs raw JSON like:
{
  "action": "image_generation",
  "action_input": "A blue circle on a black background"
}

instead of actually generating the image.

Expected Behavior:
When Gemini decides to generate an image alongside text, the image generation tool should execute successfully and the image should be displayed to the user.

Actual Behavior:

  • Approximately 75-80% failure rate
  • Raw JSON tool call appears in the response instead of image
  • In its next response, Gemini typically recognizes the error and apologizes
  • Retrying usually results in the same JSON leakage repeatedly
  • After anywhere from 1 to 5 attempts, image generation eventually succeeds

Impact:

  • Poor user experience requiring multiple retry attempts
  • Makes combined text+image generation unreliable
  • Frustrating workflow interruptions
  • Contradicts the intended seamless multimodal experience

Technical Analysis:
This suggests a parsing/execution failure in the tool-calling infrastructure: when Gemini generates both text and a tool call in the same response, the parser may fail to properly extract and execute the tool call, instead rendering it as literal text.

Possible causes:

  • Improper delimiter/boundary detection between text and tool call
  • Parsing logic that fails when tool call isn’t the sole content
  • State machine issue in processing mixed-modal responses
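To make the second hypothesis concrete, here is a speculative sketch (not Gemini's actual code; the function names and the "action"/"action_input" shape are assumptions based on the leaked JSON above). A parser that only recognizes a tool call when it is the entire response will fall through to rendering the JSON as literal text whenever prose surrounds it, while a boundary-aware parser would still find it:

```python
import json

def parse_response_naive(response: str):
    """Hypothetical naive parser: treats the response as a tool call
    only if the WHOLE response body is valid JSON."""
    try:
        call = json.loads(response)
        if isinstance(call, dict) and "action" in call:
            return ("tool_call", call)
    except json.JSONDecodeError:
        pass
    # Mixed text + JSON falls through here, so the JSON leaks as text.
    return ("text", response)

def parse_response_robust(response: str):
    """Sketch of boundary-aware parsing: scan for a balanced {...}
    span and check whether it decodes to a tool call.
    (Illustrative only: brace counting ignores braces inside JSON
    string values, which a real parser would have to handle.)"""
    start = response.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(response[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    candidate = response[start:i + 1]
                    try:
                        call = json.loads(candidate)
                        if isinstance(call, dict) and "action" in call:
                            text = (response[:start] + response[i + 1:]).strip()
                            return ("text+tool_call", text, call)
                    except json.JSONDecodeError:
                        pass
                    break
        start = response.find("{", start + 1)
    return ("text", response)
```

For a mixed response like `"Here is your image!\n{\"action\": \"image_generation\", ...}"`, the naive parser returns the whole thing as text (the observed leak), while the robust one separates the prose from the tool call.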

Workarounds:

  • Generate images in separate responses from text (but this contradicts natural conversation flow)
  • Retry multiple times until tool executes successfully
  • Note: This affects the workaround for Bug #1, where generating text+image in same response sometimes helps with image retrieval
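The "retry until it works" workaround can be automated for anyone hitting this through an API wrapper. The sketch below is a client-side workaround only, under the assumption that a leaked response contains the literal `"action": "image_generation"` JSON; `generate` is a hypothetical callable wrapping whatever chat API is in use, not a real Gemini SDK function:

```python
import re

# Heuristic: the reply contains the raw image_generation tool-call JSON
# instead of an attached image.
TOOL_CALL_RE = re.compile(r'\{\s*"action"\s*:\s*"image_generation"')

def looks_like_leaked_tool_call(response_text: str) -> bool:
    return bool(TOOL_CALL_RE.search(response_text))

def generate_with_retry(generate, prompt: str, max_attempts: int = 5) -> str:
    """Re-send the prompt until the reply no longer leaks the tool-call
    JSON, mirroring the 1-5 manual retries reported above."""
    for attempt in range(1, max_attempts + 1):
        reply = generate(prompt)
        if not looks_like_leaked_tool_call(reply):
            return reply
    raise RuntimeError(
        f"still leaking tool-call JSON after {max_attempts} attempts"
    )
```

This obviously burns extra requests per image, which is exactly the poor experience described under Impact; it is a stopgap, not a fix.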

Reproducible: Yes, ~75-80% failure rate

Test Conversation Links:

Hi @Michael_Bowerman,

Thanks for taking the time to provide detailed insight.

Unfortunately, when I followed the exact prompts from the chat history you shared, I got the expected results: both text and an image in the same response, with no output in JSON format.

Yes, there was an issue a few days back where the model responded in JSON format even for normal natural-language text, but it has since been fixed.

So it would be really helpful if you could provide more logs/prompts you have tried and the corresponding results you got, so that we can try to reproduce the issue and escalate it to the internal team.

Thank you so much!

Interesting, I had encountered it pretty consistently up until last night. I managed to reproduce it approximately 5 minutes before creating this thread. However, when I tried again a few hours later, I was not able to reproduce it (Gemini successfully output a response that included text + image).

If I were to try to come up with any sort of consistent pattern as to when this happens, I’d say it’s usually when the model doesn’t immediately “know” from my input that it will be generating an image, but decides to do so after its initial scan of my input. But that’s just a guess. The conversation at https://gemini.google.com/share/c62e203d11a5 might seem to go against this pattern, but I’d say it fits, since it leaked the JSON in response to “Did you follow my instructions?” rather than a direct request to generate another image.

Below are some more links where I encountered the issue. Just search for “image_generation” in the transcript to jump to where it started leaking raw JSON. I believe these are all from within the last 72 hours. I have three more, but apparently I’m only allowed to include two links in a post (why? what a crazy restriction… how am I supposed to provide good evidence with only two links?), so I will make a second post below for the other two links.

  • https://gemini.google.com/share/02a236fdb68e (Happened 5 times in a row at one point, including times when Gemini was only trying to generate an image without any accompanying text. Please ignore the image I uploaded at the end. Is there a way I can truncate my conversations before sharing them?)

Let me know if you need more. I’m sure I can find more, but it’s unfortunately a little cumbersome using the search tool to try to find them, because it seems to use a fuzzy search and I’m not sure how to do an exact phrase match in the search. And also, I cannot open the results in a new tab, so I have to redo the search each time. And I can’t see in the search UI for sure whether the JSON leakage really did happen in that chat or not, until I open up the conversation. It makes it difficult to track which conversations I’ve already tried and which ones I haven’t.

I also want to say that the restriction that I can only include two links in a post just caused me a lot of annoyance. I hadn’t saved my originally planned response (with all the links) in a separate text editor, and I messed up the copy before deleting the links, so I had to go hunt down the links again and rewrite my commentary on them.

Why does this restriction exist? Is one of the main purposes of this forum not to enable us to report bugs in Gemini? As Gemini produces stochastic outputs, I think it’d be ideal to link many conversations that demonstrate the bug, so that the developers can try to find a consistent pattern.

I just reproduced this again. Here is a link to the conversation: https://gemini.google.com/share/345d54646ac5. The first JSON leakage in this conversation was reproduced by the “Thinking” model on the Android app. The subsequent three leakages were reproduced by the “Fast” model on web.

Hi @Michael_Bowerman,

Thank you for bringing this to our attention. We truly appreciate you flagging this issue, we will file a bug internally.

Here’s a conversation I just had a few minutes ago. It leaked the JSON for its image tool call 10 times in a row before it finally managed to successfully produce an image: https://gemini.google.com/share/3c5cc883ab5b

I’m no expert, but I use Claude (the $100 Pro plan) and ChatGPT (paid) on a regular basis. Gemini has yet to deliver a single usable asset for me; at best, deliveries are unformatted text in a Google Sheet. When I prompt Gemini to “create a storyboard like the attached layout, replacing the images with the provided images attached”,
I get:
{ "action": "image_generation", "action_input": "{'prompt': "A professional graphic design for a 'Social Campaign Storyboard' in an 8.5x11 landscape layout. The document has a white background with neutral and black typography. At the top: 'BRAND CAMPAIGN' in large black letters, and 'SOCIAL CAMPAIGN STORYBOARD' beneath it…… (etc code code code)

I tried the Fast, Thinking, and Pro models. Gemini can’t deliver a single usable asset. How are people using this AI when it fails at a single deliverable? I went in circles for a full hour, and not a single image was delivered. If I give no image instructions or examples at all, Nano Banana will deliver a standalone “fantasy image” of its own, but prompt it to create any sort of document out of it and I get code.

Def not worth paying for a subscription as a “normie” who isn’t using it for code.

I have a similar experience. In 80% of cases, the LLM just outputs the raw JSON. I can’t understand how such basic functionality is still broken.