The Gemini API is exhibiting non-deterministic behavior for the `gemini-2.5-pro` model. It is producing different outputs for identical requests, even when a fixed `seed` is provided along with a constant `temperature`.

This behavior has been reliably reproduced and violates the API’s core contract for deterministic generation, making it unreliable for production use.

Steps to Reproduce:

  1. API Call: Make an API call using the Gemini API (via Google AI Studio paid tier).
  2. Model: gemini-2.5-pro
  3. Generation Config:
    • temperature: 0.1
    • thinking_budget: 256
    • seed: 42
    • response_mime_type: "application/json"
    • response_schema: list[str]
  4. Contents:
    • Prompt: The full prompt text is provided below.
    • Image: The image file is attached as IMG_701015.JPG.
  5. First Execution: Execute the API call. The request successfully returns the expected, accurate JSON output ([]).
  6. Second Execution: Execute the exact same API call again with no changes.
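
For anyone repeating these steps more than twice, a small harness can tally how many distinct outputs come back. This is only a sketch: `tally_outputs`, `is_deterministic`, and the `call_model` callable are hypothetical names, and the real API request is replaced here by a stand-in that replays the two outputs reported in this post.

```python
import json
from collections import Counter
from typing import Callable


def tally_outputs(call_model: Callable[[], str], runs: int = 5) -> Counter:
    """Call the model `runs` times and count each distinct output."""
    counts: Counter = Counter()
    for _ in range(runs):
        raw = call_model()
        # Round-trip through JSON so whitespace differences are not counted as drift.
        normalized = json.dumps(json.loads(raw), sort_keys=True)
        counts[normalized] += 1
    return counts


def is_deterministic(counts: Counter) -> bool:
    """True when every run produced the same normalized output."""
    return len(counts) == 1


if __name__ == "__main__":
    # Stand-in for the real request (assumption): replays the two reported outputs.
    replies = iter(['[]', '["11"]'])
    counts = tally_outputs(lambda: next(replies), runs=2)
    print(counts)
    print("deterministic:", is_deterministic(counts))
```

In a real run, `call_model` would wrap the actual request with the generation config listed above and return the response text.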

Observed Result:
The second execution produces a different, incorrect JSON output (["11"]).

Expected Result:
The output of the first and second executions must be absolutely identical. The seed parameter must ensure a fully deterministic and repeatable outcome. The correct output for this specific image and prompt is [].

Full Prompt Text:
You are a hyper-precise visual analysis system with a single function: to return a JSON array of motorcycle racing numbers that meet a strict, non-negotiable standard of quality.

To ensure 100% accuracy, you must follow a new, two-stage protocol. This protocol is absolute.

INTERNAL PROTOCOL (DO NOT OUTPUT)


STAGE 1: FORENSIC QUALITY VERDICT (Prerequisite Stage)

This is your first and most important task. For every potential number candidate on a validly oriented motorcycle, you must render a binary verdict.

  1. Isolate the Candidate Area: Look ONLY at the front number plate area.
  2. Ask the Critical Question: “Is there a numerical figure in this area that is perfectly sharp, with clear, unambiguous edges, free of significant motion blur or compression artifacts?”
  3. Render the Verdict: Based on the question above, your internal verdict for the candidate MUST be one of two options:
    • VERDICT: PASS (The number is of forensic quality, 100% readable without guessing).
    • VERDICT: FAIL (The number is blurry, indistinct, artifacted, or in any way ambiguous. Any doubt whatsoever means it is a FAIL).

This stage is absolute. If the verdict for a candidate is FAIL, it is immediately and permanently rejected. You will not proceed to Stage 2 for that candidate.


STAGE 2: DIGIT EXTRACTION (Conditional Stage)

You will only ever perform this stage if a candidate received a VERDICT: PASS in Stage 1.

  1. Extract Digits: For the candidate that passed, identify and record the digits.
  2. Final Check: Ensure the extracted digits are consistent with the high-quality image that was approved.

FINAL OUTPUT REQUIREMENT

Your entire output must be a single, valid JSON array of strings. It will contain ONLY the numbers from candidates that received a VERDICT: PASS in Stage 1 and were successfully extracted in Stage 2. If no candidates pass Stage 1, return an empty array []. Do not include any explanatory text, markdown, or any characters outside of the final JSON object.

Hello,

The configuration settings you mentioned may not be sufficient to guarantee fully deterministic output from the model. The model may have other internal sources of variation that the exposed parameters do not control.

However, you can make certain modifications to the configuration to significantly increase the likelihood of receiving consistent responses:

seed = 42 # Use a constant seed for reproducibility.
temperature = 0 # A zero temperature minimizes randomness, making it more consistent.
top_k = 1 # Forces the model to only consider the single most probable token.
top_p = 0 # Complements the above settings to narrow the token selection.
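
As a rough illustration, the settings above could be layered over the configuration from the original report like this. It is a sketch under assumptions: `build_config` and the two constant names are hypothetical, the config is a plain dictionary, and nesting `thinking_budget` under a `thinking_config` key is assumed from the Python SDK's conventions rather than stated in this thread.

```python
# Settings from the original report. The thinking_config nesting is an assumption.
REPORTED_CONFIG = {
    "temperature": 0.1,
    "seed": 42,
    "thinking_config": {"thinking_budget": 256},
    "response_mime_type": "application/json",
    "response_schema": list[str],
}

# Overrides suggested in this reply.
GREEDY_OVERRIDES = {"seed": 42, "temperature": 0, "top_k": 1, "top_p": 0}


def build_config(base: dict, overrides: dict) -> dict:
    """Return a new config with the overrides applied; inputs are left unchanged."""
    return {**base, **overrides}


config = build_config(REPORTED_CONFIG, GREEDY_OVERRIDES)
print(config)
```

The merged dictionary keeps the JSON response settings from the report and only changes the sampling parameters. Even with these values, identical output across runs may still not be guaranteed.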

Depending on your specific use case and requirements, you can also experiment with additional parameters like max_output_tokens, stop_sequences, frequency_penalty, and presence_penalty to further improve the model’s consistency.