Title: Critical Inconsistency: Gemini 3 Pro Image (Nano Banana Pro) Editing Performance Disparity (Web UI vs. API)

Hello everyone,

I am experiencing a severe inconsistency when performing image-to-image edits using the gemini-3-pro-image-preview model (referred to as “Nano Banana Pro” in some contexts) via the API, compared to the perfect results I achieve in the web interface.

The issue centers on the model’s inability to “lock” the foreground subject when asked to replace only the background, which is crucial for product visualization and consistency.

1. The Core Problem: Foreground Drift

  • Goal: Replace ONLY the background with a detailed scene (e.g., Japanese garden).

  • Result (API): The foreground object (a complex hot tub, in my case) is slightly but consistently altered. It changes in subtle details, internal reflections, shadow intensity, or minor geometric shapes, making the output unusable for high-fidelity product rendering.

  • Result (Web UI): The web interface performs this task perfectly. The foreground subject is absolutely locked, and only the background is regenerated.

2. My Setup and Configuration

  • Model: gemini-3-pro-image-preview

  • Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent

I have maximized stability and troubleshooting by:

  • Configuration: Removed unsupported parameters (thinkingConfig, mediaResolution) that caused 400 Bad Request errors.

  • Stability Settings: Set the temperature to the lowest stable value (0.1 or 0.01) and imageSize to "4K".

The generationConfig used:

JSON

{
  "generationConfig": {
    "temperature": 0.01,
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "imageSize": "4K"
    }
  }
}
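Assembled into the full request body, the payload I send looks like this (a minimal stdlib-only Python sketch; `<PROMPT>` and `<BASE64_STRING>` are placeholders for my actual prompt and base64-encoded reference image):

```python
import json

def build_request(prompt: str, image_b64: str) -> dict:
    # Full JSON body for gemini-3-pro-image-preview:generateContent,
    # combining the inline image part with the generationConfig above.
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": "image/jpeg", "data": image_b64}},
            ]
        }],
        "generationConfig": {
            "temperature": 0.01,
            "responseModalities": ["IMAGE"],
            "imageConfig": {"imageSize": "4K"},
        },
    }

body = build_request("<PROMPT>", "<BASE64_STRING>")
print(json.dumps(body)[:60])
```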

3. Exhaustive Prompt Engineering

I used an extremely detailed, high-fidelity prompt to enforce foreground preservation, using multiple “DO NOT” clauses and explicit segmentation instructions, but the model still fails to adhere to the rigid lock instruction:

// --- SNIPPET OF MY PROMPT ---
REFERENCE (LOCKED — ABSOLUTE):
Use the uploaded image as the ONLY reference (Image A).
LOCK Image A completely.
DO NOT change ANYTHING from the smallest to the biggest detail.
DO NOT modify materials, textures, colors, lighting on the product, reflections, or shadows.
...
ONLY PERMITTED CHANGE (BACKGROUND ONLY):
Replace ONLY the plain white background with a realistic Japanese spring outdoor environment.
...
FAIL CONDITIONS:
If the angle, position, proportions, or any physical detail of the swim spa changes in any way, the result is INVALID.
// -----------------------------

4. Architectural Hypothesis and Request

My hypothesis: the API is failing to achieve the consistency seen on the website because it relies on text inference alone to define the foreground region, which is imprecise.

The web UI is likely performing automated semantic segmentation to generate a mask before calling the underlying image model.

My main question to the community and Google engineers is:

Is there an undocumented or separate dedicated image editing endpoint (e.g., inpaint or editImage) for the Gemini API that allows us to explicitly pass a foreground mask or use a specific parameter to activate the same high-fidelity foreground-locking logic utilized by the web interface?

We need a documented method to match the web UI’s performance for consistent, professional product editing via the API.
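Until such an endpoint is documented, one client-side workaround (my own sketch, not a Google-provided feature) is to compute a foreground mask ourselves and paste the original foreground pixels back over the API output, discarding any drift inside the masked region. Toy example with 2x2 "images" as nested pixel lists; a real pipeline would load images with PIL/OpenCV and derive the mask from a segmentation model:

```python
def composite_foreground(reference, generated, mask):
    # Wherever mask is 1 (foreground), keep the reference pixel;
    # elsewhere keep the generated pixel (the new background).
    return [
        [ref if m else gen for ref, gen, m in zip(r_row, g_row, m_row)]
        for r_row, g_row, m_row in zip(reference, generated, mask)
    ]

reference = [[10, 20], [30, 40]]   # original product shot
generated = [[11, 99], [31, 88]]   # API output: new background, drifted foreground
mask      = [[1, 0], [1, 0]]       # 1 = foreground (product), 0 = background

print(composite_foreground(reference, generated, mask))  # [[10, 99], [30, 88]]
```

This only works when the camera angle and framing are preserved, which is exactly what my prompt tries to enforce.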

Thank you for any insight or guidance on this critical consistency issue.


Hi! This is exactly what I'm seeing with gemini-2.5-flash-image: very consistent, better-quality results in Google AI Studio versus what I get through the API (with the same input data in the form of an image and a prompt). Setting generationConfig to match what we can set in Google AI Studio doesn't make much difference. This definitely looks like something is happening behind the scenes. And I get it: we don't need to know every Google secret. What I don't get is that AI Studio is advertised as a playground for devs to test models before implementing them through the API. Without this consistency, that doesn't make sense.

Hi @OpenYourEyes @Michal_Roki , Thanks for reaching out to us.

Could you please share the original reference image you are using along with the specific API-generated output that shows the drift?

@Sonali_Kumari1 this is the original reference I use: https://imgur.com/a/obBmACG (a picture of a swim spa product),

and this is the prompt: USE THE UPLOADED SWIM SPA IMAGE AS THE ONLY REFERENCE
Lock the appearance, camera angle, perspective, scale, proportions, and aspect ratio exactly as in the uploaded reference image.
Do not alter, crop, rotate, stretch, zoom, reframe, or reposition the swim spa in any way.
The swim spa’s shape, size, materials, colors, jets, seating, and exterior panels must remain 100% identical to the reference image.
Maintain the original lighting, reflections, and shadows on the swim spa itself.

ONLY PERMITTED CHANGE:
Replace the background with a sunny outdoor Japan setting (clear blue sky, bright daylight).
The new background must be naturally composited and must not affect the swim spa’s geometry, orientation, or scale.

Any change beyond the background replacement is strictly prohibited.

and this is the output: https://imgur.com/a/obBmACG (the image with the new background).

I already used a strict prompt, but the result is still inconsistent. I only experience this when I use the API, not when I use Nano Banana Pro on the website with multi-turn editing. The problem is that even though only one element is being changed, the result is inconsistent.

Hi @OpenYourEyes , I have used the image from imgur.com/a/obBmACG along with the prompt shared by you to generate an image using Nano Banana Pro. Please find the generated image below.

Reference image:

Generated Image:

Did you generate it using the Nano Banana Pro API via a raw HTTP request? That is where the inconsistency happens.

We’re seeing an inconsistency that appears to occur at the raw REST API level, not within any SDK or wrapper.

The issue happens when making a direct HTTP POST request to the Nano Banana Pro (Gemini image) API endpoint. Under the same request structure and parameters, the API intermittently returns different results, which suggests the behavior is originating from the underlying REST service rather than client-side handling.

This is the affected endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent

I’m trying to automate things in Make (formerly Integromat), where I use the API endpoint to generate and edit an image with the prompt and reference image shown above, but the output isn’t consistent.

https://imgur.com/a/EZxfJxX - screenshot of the HTTP request Make module
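For anyone who wants to reproduce the same call outside Make, this is roughly what the module does, sketched with Python's stdlib urllib (the API key and body here are placeholders; the real body contains the prompt and base64 image parts):

```python
import json
import urllib.request

ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/models/"
            "gemini-3-pro-image-preview:generateContent")

def make_request(api_key: str, body: dict) -> urllib.request.Request:
    # Raw HTTP POST, equivalent to what the Make module sends.
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key,
                 "Content-Type": "application/json"},
        method="POST",
    )

req = make_request("YOUR_API_KEY", {"contents": []})
# urllib.request.urlopen(req) would actually send it; omitted here.
print(req.get_method(), req.full_url)
```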

Hello,

Based on the description provided, we were unable to reproduce the issue. The model appears to successfully modify the provided image when using the REST API. We utilized the following cURL command (generate_image.sh) and input JSON (request.json) for our test:

response=$(curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
    -H "x-goog-api-key: $GOOGLE_API_KEY" \
    -H 'Content-Type: application/json' \
    -d @request.json)

echo "$response" | jq -r '.candidates[0].content.parts[] | select(has("inlineData")) | .inlineData.data' | base64 --decode > inputs/image4.jpeg

echo "Image successfully saved to inputs/image4.jpeg"
request.json:

{
  "contents": [{
    "parts": [
      {"text": "<PROMPT>"},
      {
        "inline_data": {
          "mime_type": "image/jpeg",
          "data": "<BASE64_STRING>"
        }
      }
    ]
  }],
  "generationConfig": {
    "temperature": 0.01,
    "responseModalities": ["Image"],
    "imageConfig": {
      "imageSize": "4K"
    }
  }
}

Input & Output Images:


If you continue to experience this issue, could you please provide further details, such as the specific method used to call the REST API and a comparison of the expected versus observed output?
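To make that expected-versus-observed comparison concrete, the drift can be quantified rather than eyeballed: measure the fraction of foreground pixels that differ between the reference and the API output. A toy sketch with grayscale pixels as nested lists (a real comparison would load both images with PIL/NumPy and use a proper foreground mask):

```python
def foreground_drift(reference, output, mask, tol=0):
    # Fraction of foreground pixels (mask == 1) whose value differs
    # from the reference by more than tol.
    changed = total = 0
    for r_row, o_row, m_row in zip(reference, output, mask):
        for r, o, m in zip(r_row, o_row, m_row):
            if m:
                total += 1
                if abs(r - o) > tol:
                    changed += 1
    return changed / total if total else 0.0

reference = [[10, 20], [30, 40]]
output    = [[10, 99], [35, 88]]   # one of two foreground pixels drifted
mask      = [[1, 0], [1, 0]]       # 1 = foreground, 0 = background
print(foreground_drift(reference, output, mask))  # 0.5
```

Reporting a number like this for the web UI output versus the API output would make the disparity easy to verify independently.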