The Gemini API is exhibiting non-deterministic behavior for the `gemini-2.5-pro` model. It produces different outputs for identical requests, even when a fixed `seed` is provided along with a constant `temperature`.

This behavior has been reliably reproduced and violates the API’s core contract for deterministic generation, making it unreliable for production use.

Steps to Reproduce:

  1. API Call: Make an API call using the Gemini API (via Google AI Studio paid tier).
  2. Model: gemini-2.5-pro
  3. Generation Config:
    • temperature: 0.1
    • thinking_budget: 256
    • seed: 42
    • response_mime_type: "application/json"
    • response_schema: list[str]
  4. Contents:
    • Prompt: The full prompt text is provided below.
    • Image: The image file is attached as IMG_701015.JPG.
  5. First Execution: Execute the API call. The request successfully returns the expected, accurate JSON output ([]).
  6. Second Execution: Execute the exact same API call again with no changes.
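For reference, the reproduction request can be sketched as a raw REST-style payload. The field names below (`generationConfig`, `responseMimeType`, `thinkingConfig`, and so on) are assumptions based on the public generateContent request shape; verify them against the current API reference before use:

```python
import json

# Hedged sketch of the repro request as a raw generateContent payload.
# Field names are assumptions based on the public REST API shape.
def build_request(prompt_text: str, image_b64: str) -> dict:
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt_text},
                {"inline_data": {"mime_type": "image/jpeg", "data": image_b64}},
            ],
        }],
        "generationConfig": {
            "temperature": 0.1,
            "seed": 42,
            "responseMimeType": "application/json",
            "responseSchema": {"type": "ARRAY", "items": {"type": "STRING"}},
            "thinkingConfig": {"thinkingBudget": 256},
        },
    }

# Serializing the same inputs twice yields byte-identical request bodies,
# so any difference in the responses must originate server-side:
a = json.dumps(build_request("<prompt>", "<image>"), sort_keys=True)
b = json.dumps(build_request("<prompt>", "<image>"), sort_keys=True)
assert a == b
```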

Observed Result:
The second execution produces a different, incorrect JSON output (["11"]).

Expected Result:
The output of the first and second executions must be absolutely identical. The seed parameter must ensure a fully deterministic and repeatable outcome. The correct output for this specific image and prompt is [].

Full Prompt Text:
You are a hyper-precise visual analysis system with a single function: to return a JSON array of motorcycle racing numbers that meet a strict, non-negotiable standard of quality.

To ensure 100% accuracy, you must follow a new, two-stage protocol. This protocol is absolute.

INTERNAL PROTOCOL (DO NOT OUTPUT)


STAGE 1: FORENSIC QUALITY VERDICT (Prerequisite Stage)

This is your first and most important task. For every potential number candidate on a validly oriented motorcycle, you must render a binary verdict.

  1. Isolate the Candidate Area: Look ONLY at the front number plate area.
  2. Ask the Critical Question: “Is there a numerical figure in this area that is perfectly sharp, with clear, unambiguous edges, free of significant motion blur or compression artifacts?”
  3. Render the Verdict: Based on the question above, your internal verdict for the candidate MUST be one of two options:
    • VERDICT: PASS (The number is of forensic quality, 100% readable without guessing).
    • VERDICT: FAIL (The number is blurry, indistinct, artifacted, or in any way ambiguous. Any doubt whatsoever means it is a FAIL).

This stage is absolute. If the verdict for a candidate is FAIL, it is immediately and permanently rejected. You will not proceed to Stage 2 for that candidate.


STAGE 2: DIGIT EXTRACTION (Conditional Stage)

You will only ever perform this stage if a candidate received a VERDICT: PASS in Stage 1.

  1. Extract Digits: For the candidate that passed, identify and record the digits.
  2. Final Check: Ensure the extracted digits are consistent with the high-quality image that was approved.

FINAL OUTPUT REQUIREMENT

Your entire output must be a single, valid JSON array of strings. It will contain ONLY the numbers from candidates that received a VERDICT: PASS in Stage 1 and were successfully extracted in Stage 2. If no candidates pass Stage 1, return an empty array []. Do not include any explanatory text, markdown, or any characters outside of the final JSON object.
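As a side note, the output contract above is easy to check client-side. A minimal sketch, assuming the contract is exactly "one JSON array containing only strings, nothing else":

```python
import json

def valid_output(raw: str) -> bool:
    # Contract: a single valid JSON array containing only strings,
    # with no explanatory text, markdown, or extra characters.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, list) and all(isinstance(x, str) for x in data)

# Both observed responses satisfy the schema; the bug is about
# determinism, not formatting:
assert valid_output("[]")
assert valid_output('["11"]')
assert not valid_output('```json\n[]\n```')
```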

Hello,

The configuration settings you mentioned may not be sufficient to guarantee completely deterministic output from the model. This is because the model might have other internal state mechanisms that are not fully controlled by the exposed parameters.

However, you can make certain modifications to the configuration to significantly increase the likelihood of receiving consistent responses:

seed = 42 # Use a constant seed for reproducibility.
temperature = 0 # A zero temperature minimizes randomness, making it more consistent.
top_k = 1 # Forces the model to only consider the single most probable token.
top_p = 0 # Complements the above settings to narrow the token selection.

Depending on your specific use case and requirements, you can also experiment with additional parameters like max_output_tokens, stop_sequences, frequency_penalty, and presence_penalty to further improve the model’s consistency.

Please, everybody, ignore the above response, a four-month bump that is no better than an AI fabrication. It doesn’t even use correct Gemini API parameters.

The seed parameter replays the exact randomness, regardless of any sampling parameter.

If the model is deterministic, then seed in conjunction with any sampling parameters will produce the same results across multiple trials, even if the sampling is set to produce wild garbage (by a higher temperature than Google even allows to be sent).
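The point can be illustrated with any seeded sampler: even at an absurdly high temperature, fixing the seed replays the exact same draws. A local sketch using Python’s `random` module (not the Gemini sampler itself):

```python
import math
import random

def sample(logits, temperature, seed, n=20):
    # Temperature-scaled softmax sampling with an explicitly seeded RNG.
    rng = random.Random(seed)
    weights = [math.exp(l / temperature) for l in logits]
    return [rng.choices(range(len(logits)), weights=weights)[0]
            for _ in range(n)]

logits = [2.0, 1.0, 0.5, -1.0]
# Temperature 50 flattens the distribution toward "wild garbage",
# yet the same seed replays the exact same token sequence:
assert sample(logits, 50.0, seed=42) == sample(logits, 50.0, seed=42)
```

If the deployed model were deterministic, the same property would hold end to end; the report above shows it does not.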

This forum topic is a report of the model being non-deterministic, which is the same behavior as ALL OpenAI models now, too. Google (real Google ML staff) would know this is not a bug; they could answer that it is intended or unavoidable on the inference platform.

"topP": 0 is a nonsense value, meaning including 0% of the predicted probability distribution. The API will have it be a switch to greedy sampling (“topK”:1)
"temperature":0 is a nonsense value, a divide-by-zero in the temperature formula. The API will have it be a switch to greedy sampling (“topK”:1)

The suggestion to limit the output by length or a stop sequence is even more nonsensical. Sure, half a response might show less variation.

Short answer: non-determinism from unstable logits seems to be today’s paradigm, traded away for some efficiency on the GPU hardware.