[Moderate Bug] JSON Leakage When Generating Text + Image in Same Response

Severity: P2 - Moderate (Frequent failure of intended functionality)

Product: Gemini 3 (Free Tier)

Summary:
When Gemini attempts to generate both text and an image in the same response, it frequently outputs the raw JSON of the image generation tool call instead of actually executing the tool. The failure rate is approximately 75-80%.

Reproduction Steps:

  1. Start conversation with Gemini
  2. Ask Gemini to generate an image with accompanying description
  3. Observe that ~75-80% of the time, Gemini outputs raw JSON like:
{
  "action": "image_generation",
  "action_input": "A blue circle on a black background"
}

instead of actually generating the image.

Expected Behavior:
When Gemini decides to generate an image alongside text, the image generation tool should execute successfully and the image should be displayed to the user.

Actual Behavior:

  • Approximately 75-80% failure rate
  • Raw JSON tool call appears in the response instead of image
  • In its next response, Gemini typically recognizes the error and apologizes
  • Retrying usually results in the same JSON leakage repeatedly
  • After anywhere from 1 to 5 attempts, image generation eventually succeeds

Impact:

  • Poor user experience requiring multiple retry attempts
  • Makes combined text+image generation unreliable
  • Frustrating workflow interruptions
  • Contradicts the intended seamless multimodal experience

Technical Analysis:
This suggests a parsing/execution failure in the tool-calling infrastructure: when Gemini generates both text and a tool call in the same response, the parser may fail to properly extract and execute the tool call, instead rendering it as literal text.

Possible causes:

  • Improper delimiter/boundary detection between text and tool call
  • Parsing logic that fails when tool call isn’t the sole content
  • State machine issue in processing mixed-modal responses
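To make the second hypothesis concrete, here is a speculative sketch (not Gemini's actual code; the function names and the "action"/"action_input" shape are assumptions based on the leaked JSON above). A parser that only recognizes a tool call when it is the entire response will fall through to rendering the JSON as literal text whenever prose surrounds it, while a boundary-aware parser would still find it:

```python
import json

def parse_response_naive(response: str):
    """Hypothetical naive parser: treats the response as a tool call
    only if the WHOLE response body is valid JSON."""
    try:
        call = json.loads(response)
        if isinstance(call, dict) and "action" in call:
            return ("tool_call", call)
    except json.JSONDecodeError:
        pass
    # Mixed text + JSON falls through here, so the JSON leaks as text.
    return ("text", response)

def parse_response_robust(response: str):
    """Sketch of boundary-aware parsing: scan for a balanced {...}
    span and check whether it decodes to a tool call.
    (Illustrative only: brace counting ignores braces inside JSON
    string values, which a real parser would have to handle.)"""
    start = response.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(response[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    candidate = response[start:i + 1]
                    try:
                        call = json.loads(candidate)
                        if isinstance(call, dict) and "action" in call:
                            text = (response[:start] + response[i + 1:]).strip()
                            return ("text+tool_call", text, call)
                    except json.JSONDecodeError:
                        pass
                    break
        start = response.find("{", start + 1)
    return ("text", response)
```

For a mixed response like `"Here is your image!\n{\"action\": \"image_generation\", ...}"`, the naive parser returns the whole thing as text (the observed leak), while the robust one separates the prose from the tool call.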

Workarounds:

  • Generate images in separate responses from text (but this contradicts natural conversation flow)
  • Retry multiple times until tool executes successfully
  • Note: This affects the workaround for Bug #1, where generating text+image in same response sometimes helps with image retrieval
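The "retry until it works" workaround can be automated for anyone hitting this through an API wrapper. The sketch below is a client-side workaround only, under the assumption that a leaked response contains the literal `"action": "image_generation"` JSON; `generate` is a hypothetical callable wrapping whatever chat API is in use, not a real Gemini SDK function:

```python
import re

# Heuristic: the reply contains the raw image_generation tool-call JSON
# instead of an attached image.
TOOL_CALL_RE = re.compile(r'\{\s*"action"\s*:\s*"image_generation"')

def looks_like_leaked_tool_call(response_text: str) -> bool:
    return bool(TOOL_CALL_RE.search(response_text))

def generate_with_retry(generate, prompt: str, max_attempts: int = 5) -> str:
    """Re-send the prompt until the reply no longer leaks the tool-call
    JSON, mirroring the 1-5 manual retries reported above."""
    for attempt in range(1, max_attempts + 1):
        reply = generate(prompt)
        if not looks_like_leaked_tool_call(reply):
            return reply
    raise RuntimeError(
        f"still leaking tool-call JSON after {max_attempts} attempts"
    )
```

This obviously burns extra requests per image, which is exactly the poor experience described under Impact; it is a stopgap, not a fix.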

Reproducible: Yes, ~75-80% failure rate

Test Conversation Links:

Hi @Michael_Bowerman,

Thanks for taking the time to provide detailed insight.

Unfortunately, when I followed the exact prompts from the chat history you shared, I got the expected results: both text and an image in the same response, with no output in JSON format.

Yes, there was an issue a few days back where the model responded in JSON format even for normal natural-language text, but it has since been fixed.

So it would be really helpful if you could provide more logs/prompts you have tried and the corresponding results you got, so that we can try to reproduce the issue and escalate it to the internal team.

Thank you so much!

Interesting, I had encountered it pretty consistently up until last night. I managed to reproduce it approximately 5 minutes before creating this thread. However, when I tried again a few hours later, I was not able to reproduce it (Gemini successfully output a response that included text + image).

If I were to try to come up with any sort of consistent pattern as to when this happens, I’d say it’s usually when the model doesn’t immediately “know” from my input that it will be generating an image, but decides to do so after its initial scan of my input. But that’s just a guess. The conversation at https://gemini.google.com/share/c62e203d11a5 might seem to go against this pattern, but I’d say it fits, since it leaked the JSON in response to “Did you follow my instructions?” rather than a direct request to generate another image.

Below are some more links where I encountered the issue. Just search for “image_generation” in the transcript to jump to where it started leaking raw JSON. I believe these are all from within the last 72 hours. I have three more, but apparently I’m only allowed to include two links in a post (why? what a crazy restriction… how am I supposed to provide good evidence with only two links?), so I will make a second post below for the other two links.

  • https://gemini.google.com/share/02a236fdb68e (Happened 5 times in a row at one point, including times when Gemini was only trying to generate an image without any accompanying text. Please ignore the image I uploaded at the end. Is there a way I can truncate my conversations before sharing them?)

Let me know if you need more. I’m sure I can find more, but it’s unfortunately a little cumbersome using the search tool to try to find them, because it seems to use a fuzzy search and I’m not sure how to do an exact phrase match in the search. And also, I cannot open the results in a new tab, so I have to redo the search each time. And I can’t see in the search UI for sure whether the JSON leakage really did happen in that chat or not, until I open up the conversation. It makes it difficult to track which conversations I’ve already tried and which ones I haven’t.

I also want to say that the restriction that I can only include two links in a post just caused me a lot of annoyance. I hadn’t saved my originally planned response (with all the links) in a separate text editor, and I messed up the copy before deleting the links, so I had to go hunt down the links again and rewrite my commentary on them.

Why does this restriction exist? Is one of the main purposes of this forum not to enable us to report bugs in Gemini? As Gemini produces stochastic outputs, I think it’d be ideal to link many conversations that demonstrate the bug, so that the developers can try to find a consistent pattern.

I just reproduced this again. Here is a link to the conversation: https://gemini.google.com/share/345d54646ac5. The first JSON leakage in this conversation was reproduced by the “Thinking” model on the Android app. The subsequent three leakages were reproduced by the “Fast” model on web.

Hi @Michael_Bowerman,

Thank you for bringing this to our attention. We truly appreciate you flagging this issue, we will file a bug internally.

Here’s a conversation I just had a few minutes ago. It leaked the JSON for its image tool call 10 times in a row before it finally managed to successfully produce an image: https://gemini.google.com/share/3c5cc883ab5b

I’m no expert, but I use Claude (the $100 Pro plan) and ChatGPT (paid) on a regular basis. Gemini has yet to deliver a single usable asset for me; at best, deliveries are unformatted text in a Google Sheet. When I prompt Gemini to “create a storyboard like the attached layout, replacing the images with the provided images attached”,
I get:
{ "action": "image_generation", "action_input": "{'prompt': "A professional graphic design for a 'Social Campaign Storyboard' in an 8.5x11 landscape layout. The document has a white background with neutral and black typography. At the top: 'BRAND CAMPAIGN' in large black letters, and 'SOCIAL CAMPAIGN STORYBOARD' beneath it…… (etc code code code)

I tried the Fast, Thinking, and Pro models. Gemini can’t deliver a single usable asset. How are people using this AI when it fails at a single deliverable? I went in circles for a full hour, and not a single image was delivered. If I give no image instructions or examples at all, Nano Banana will deliver a standalone “fantasy image” of its own, but prompt it to create any sort of document out of it and I get code.

Def not worth paying for a subscription as a “normie” who isn’t using it for code.

I have a similar experience. In 80% of cases, the LLM just outputs the raw JSON. I can’t understand how such basic functionality is still broken.