[Critical Bug] Image Vision Retrieval Always Returns First Generated Image

Severity: P0 - Critical (Core feature completely broken)

Product: Gemini 3 (Free Tier)

Summary:
When Gemini generates multiple images in a conversation and attempts to view them using its vision capabilities, it consistently returns the first generated image regardless of which image is requested. This makes it impossible for Gemini to accurately describe or analyze any images it generates after the first one.

Reproduction Steps:

  1. Start a new conversation with Gemini
  2. Ask Gemini to generate an image (e.g., “Generate a red square”)
  3. Ask Gemini to generate a second, different image (e.g., “Generate a blue circle”)
  4. Ask Gemini to look at and then describe the second image it just generated
  5. Observe that Gemini describes the first image (red square) instead of the second (blue circle)
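
The steps above can be turned into a small automated check. The sketch below is only an illustration: `describe_image` stands in for whatever mechanism asks the model to describe the image at a given position, and `buggy_describe` is a stand-in that imitates the reported behavior. Neither is a real Gemini API, and the zero-based positions are an assumption.

```python
from typing import Callable, Dict, List


def find_mismatches(describe_image: Callable[[int], str],
                    expected_keywords: Dict[int, str]) -> List[int]:
    """Return the positions whose description lacks the expected keyword."""
    mismatches = []
    for position, keyword in sorted(expected_keywords.items()):
        description = describe_image(position).lower()
        if keyword.lower() not in description:
            mismatches.append(position)
    return mismatches


# Hypothetical stand-in that imitates the reported bug:
# every request is answered with a description of the first image.
def buggy_describe(position: int) -> str:
    return "A solid red square on a white background."


# Position 0 is the first generated image, position 1 the second.
expected = {0: "red square", 1: "blue circle"}
mismatched_positions = find_mismatches(buggy_describe, expected)
print(mismatched_positions)
```

With the bug present, the second position is flagged because its description still mentions the red square.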

Expected Behavior:
Gemini should be able to view and accurately describe each image it generates, referencing the correct image data for each position in the conversation history.

Actual Behavior:

  • Single-image vision tool: Returns the first generated image 100% of the time when viewing subsequent Gemini-generated images in standalone outputs (no accompanying text)
  • Multi-image vision tool: Inconsistently returns either correct images or n copies of the first image (no clear pattern identified)
  • Exception: Single-image tool works correctly for user-uploaded images
  • Exception: If Gemini generates both text and an image in the same response, it can see that specific image correctly

Impact:

  • Gemini cannot reliably describe, analyze, or reference its own generated images
  • Users receive hallucinated descriptions of images
  • Multi-turn image generation workflows are completely broken
  • This undermines trust in Gemini’s multimodal capabilities

Technical Analysis:
The bug appears to be in the image retrieval backend, likely in how images are indexed/cached in conversation history:

# Current (broken) behavior appears to be equivalent to:
def get_conversation_image_broken(conversation, image_index):
    return conversation.images[0]  # Ignores image_index; always returns the first image

# Expected behavior:
def get_conversation_image(conversation, image_index):
    return conversation.images[image_index]  # Returns the requested image
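
If the hypothesis above is right, a regression test that stores distinguishable images and requests each one by index would catch the problem. The following is a minimal sketch under that assumption. `Conversation`, `get_image_broken`, `get_image_fixed`, and `retrieval_is_correct` are hypothetical names used for illustration and do not reflect Gemini's actual backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Conversation:
    """Hypothetical container for the images generated in one conversation."""
    images: List[bytes] = field(default_factory=list)


def get_image_broken(conversation: Conversation, image_index: int) -> bytes:
    # Mirrors the suspected bug: the requested index is ignored.
    return conversation.images[0]


def get_image_fixed(conversation: Conversation, image_index: int) -> bytes:
    # Honors the requested index; raises IndexError if it is out of range.
    return conversation.images[image_index]


def retrieval_is_correct(getter: Callable[[Conversation, int], bytes],
                         conversation: Conversation) -> bool:
    """True if every index returns the image stored at that index."""
    return all(getter(conversation, i) == image
               for i, image in enumerate(conversation.images))


conv = Conversation(images=[b"red-square", b"blue-circle"])
print(retrieval_is_correct(get_image_broken, conv))
print(retrieval_is_correct(get_image_fixed, conv))
```

The check only works if the stored images differ from one another, so test fixtures should use visibly distinct content such as the red square and blue circle from the reproduction steps.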

Workarounds:

  • None reliable for single-image retrieval
  • Multi-image retrieval sometimes works but is inconsistent
  • User-uploaded images can be viewed correctly
  • Generating an image together with text in the same response may work (but triggers a separate JSON leakage bug ~75% of the time)

Additional Context:
When the multi-image retrieval tool is used, Gemini’s internal reasoning shows that it correctly receives metadata indicating that image_generation_content/0 is being returned. This suggests the bug is in the backend API that retrieves images, not in Gemini’s tool-calling logic.
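
One way to confirm this from the tool output is to compare the index that was requested with the index in the returned reference. The sketch below assumes, beyond what this report establishes, that references have the form `image_generation_content/<n>` and that `<n>` is the zero-based position of the image. `returned_index` and `is_mismatch` are hypothetical helper names.

```python
import re

# Assumed reference format, based on the image_generation_content/0 value quoted above.
REFERENCE_PATTERN = re.compile(r"^image_generation_content/(\d+)$")


def returned_index(reference: str) -> int:
    """Extract the numeric suffix from a reference such as 'image_generation_content/0'."""
    match = REFERENCE_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"Unrecognized image reference: {reference!r}")
    return int(match.group(1))


def is_mismatch(requested_index: int, reference: str) -> bool:
    """True if the returned reference points at a different image than requested."""
    return returned_index(reference) != requested_index


# Second image requested (index 1), but the first image's reference came back.
print(is_mismatch(1, "image_generation_content/0"))
# First image requested and returned.
print(is_mismatch(0, "image_generation_content/0"))
```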

Reproducible: Yes, 100% reproducible for the single-image tool on Gemini-generated images after the first one

Test Conversation Links:

Hi @Michael_Bowerman,

Thank you for bringing this to our attention. We truly appreciate you flagging this issue, and we will file a bug internally.