Hello everyone,
I am experiencing a severe inconsistency when performing image-to-image edits using the gemini-3-pro-image-preview model (referred to as “Nano Banana Pro” in some contexts) via the API, compared to the perfect results I achieve in the web interface.
The issue centers on the model’s inability to “lock” the foreground subject when asked to replace only the background, which is crucial for product visualization and consistency.
1. The Core Problem: Foreground Drift
- Goal: Replace ONLY the background with a detailed scene (e.g., a Japanese garden).
- Result (API): The foreground object (a complex hot tub, in my case) is slightly but consistently altered. Subtle details change — internal reflections, shadow intensity, minor geometry — making the output unusable for high-fidelity product rendering.
- Result (Web UI): The web interface performs this task perfectly. The foreground subject is absolutely locked, and only the background is regenerated.
2. My Setup and Configuration
- Model: gemini-3-pro-image-preview
- Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent

While troubleshooting, I have already done the following to maximize stability:

- Configuration: Removed unsupported parameters (thinkingConfig, mediaResolution) that caused 400 Bad Request errors.
- Stability Settings: Set the temperature to the lowest stable value (0.1 or 0.01) and imageSize to "4K".
The generationConfig used:
```json
{
  "generationConfig": {
    "temperature": 0.01,
    "responseModalities": ["IMAGE"],
    "imageConfig": {
      "imageSize": "4K"
    }
  }
}
```
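For context, this is roughly how I assemble the full request body in Python. The helper function and placeholder values are mine, not from any SDK; the part field names (`inlineData`, `mimeType`) follow the REST examples I have seen, and the actual POST is only sketched in the comment:

```python
import base64

# Endpoint from my setup above.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
           "gemini-3-pro-image-preview:generateContent")

def build_request(image_bytes: bytes, prompt: str) -> dict:
    """Assemble the generateContent body: the reference product photo as an
    inline image part, followed by the text prompt, plus the stability
    settings I described above."""
    return {
        "contents": [{
            "parts": [
                {"inlineData": {
                    "mimeType": "image/png",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }],
        "generationConfig": {
            "temperature": 0.01,
            "responseModalities": ["IMAGE"],
            "imageConfig": {"imageSize": "4K"},
        },
    }

# The call itself would then be something like:
#   requests.post(API_URL, headers={"x-goog-api-key": API_KEY},
#                 json=build_request(image_bytes, PROMPT))
```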
3. Exhaustive Prompt Engineering
I used an extremely detailed, high-fidelity prompt to enforce foreground preservation, using multiple “DO NOT” clauses and explicit segmentation instructions, but the model still fails to adhere to the rigid lock instruction:
```
// --- SNIPPET OF MY PROMPT ---
REFERENCE (LOCKED — ABSOLUTE):
Use the uploaded image as the ONLY reference (Image A).
LOCK Image A completely.
DO NOT change ANYTHING from the smallest to the biggest detail.
DO NOT modify materials, textures, colors, lighting on the product, reflections, or shadows.
...
ONLY PERMITTED CHANGE (BACKGROUND ONLY):
Replace ONLY the plain white background with a realistic Japanese spring outdoor environment.
...
FAIL CONDITIONS:
If the angle, position, proportions, or any physical detail of the swim spa changes in any way, the result is INVALID.
// -----------------------------
```
4. Architectural Hypothesis and Request
My hypothesis: the API path fails to match the website's consistency because the model must infer the foreground region from text alone, which is imprecise at the pixel level.
The web UI is likely performing automated semantic segmentation to generate a mask before calling the underlying image model.
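Until a documented mask parameter exists, the closest client-side approximation of that hypothesized pipeline I can think of is to composite the original foreground back over the generated output using my own mask. A minimal Pillow sketch follows; the mask is assumed to be produced separately (e.g., by a local segmentation tool such as rembg), white for foreground, black for background:

```python
from PIL import Image

def relock_foreground(original: Image.Image,
                      generated: Image.Image,
                      mask: Image.Image) -> Image.Image:
    """Paste the untouched foreground pixels from the original product shot
    over the model's output. `mask` is grayscale: white = keep original
    (foreground), black = keep generated (new background)."""
    generated = generated.resize(original.size)
    mask = mask.convert("L").resize(original.size)
    result = generated.copy()
    result.paste(original, (0, 0), mask)  # mask acts as per-pixel alpha
    return result
```

This guarantees a pixel-identical foreground, but at the cost of possible edge artifacts where the mask boundary meets the regenerated background — which is exactly why a server-side mask parameter would be preferable.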
My main question to the community and Google engineers is:
Is there an undocumented or dedicated image-editing endpoint (e.g., inpaint or editImage) for the Gemini API that lets us explicitly pass a foreground mask, or a parameter that activates the same high-fidelity foreground-locking logic the web interface uses?
We need a documented method to match the web UI’s performance for consistent, professional product editing via the API.
Thank you for any insight or guidance on this critical consistency issue.


