The results from the Gemini 2.0 Flash website don't match the API results for same thing

yang_fa · March 20, 2025, 5:33am

The model and parameters are identical to those on the official website.

Govind_Keshari · March 20, 2025, 5:46am

Hi @yang_fa, Welcome to the forum!!

Are you using Google AI Studio?? Can you please provide any example screenshot to understand the situation better.

Thanks

yang_fa · March 20, 2025, 6:03am

yes, I’m using Gemini 2.0 flash.
An example is here:

prompt:

Instruction:
Given the coordinates of text boxes in an image, determine whether adjacent boxes belong to the same paragraph by considering both visual and semantic information. Follow these steps:
Analyze Visual Position:
Note that vertical stacking alone does not imply the same paragraph. The visual layout must "look like a paragraph" based on typical paragraph formatting conventions.
The ID of each box is located on the left side of the box.

Evaluate Text Semantic Content:
Check if the text in adjacent boxes is logically coherent (e.g., continuation of the same sentence, same theme, or related context).
Look for punctuation marks (e.g., periods, commas) that might indicate sentence boundaries.
Ensure that the text forms a fluent paragraph with appropriate connecting words.
Remember that content relevance does not necessarily mean they belong to the same paragraph in semantics.
The text should not be treated as individual information units but as parts of a coherent paragraph.
[important!]The text should be treated as parts of a coherent paragraph, not as individual information units.
[important!]When assessing semantic coherence, try concatenating the text from adjacent boxes to see if it forms a fluent sentence or paragraph.
Combine Information:
Use both visual and semantic cues to decide if adjacent boxes belong to the same paragraph.
Boxes with aligned right edges and coherent text content are likely part of the same paragraph.
[important!]Even if boxes are vertically stacked and content-related, they may not belong to the same paragraph unless they visually "look like a paragraph" and form a coherent text.
Output Format:
Return the results exclusively in the format [[0], [1, 2], [3, 4, 5], ...], where each sublist contains the IDs of boxes belonging to the same paragraph.
Do not include any additional text, explanations, or content.
Ensure that all text boxes in the image are accounted for in the output.
Note:
Even if boxes are visually stacked and contain similar content, they should not be considered part of the same paragraph if there is no logical connection between them.
A paragraph should have a fluent flow of sentences and connecting words, not just individual pieces of information.
You should not overly focus on textual coherence and neglect visual formatting cues.
Consider all text boxes in the image and ensure none are omitted from the output.
[important!]Special attention should be given to author names and affiliations: even if they are visually close, they should be treated as separate paragraphs due to their typical similar formatting (font, size, etc.) and semantic independence.
[important!]Special attention should be given to author names: even if they are visually close, they should be treated as separate paragraphs due to their typical similar formatting (font, size, etc.) and semantic independence.
[important!]Do not rely overly on thematic relevance; instead, consider semantic coherence and visual continuity.
Please group title texts together. Titles are typically characterized by larger font sizes, bold formatting, and text that summarizes the main content of a document or section.
Example:
Example 1:
Input: Two adjacent boxes with text "John Doe" and "ABC Corporation".
Output: [[0], [1]]
Explanation: Even though the boxes are adjacent and contain related content (a name and an institution), they do not form a coherent paragraph and should be treated as separate.
Example 2:
Input: Two adjacent boxes with text "Yuan Liu1,2∗ Cheng Lin2∗ Zijiao Zeng2 Xiaoxiao Long1† Lingjie Liu3" and "Taku Komura1 Wenping Wang4".
Output: [[0], [1]]
Explanation: Even though the boxes are adjacent and contain related content (name), they do not form a coherent paragraph and should be treated as separate.

The output of the Google AI studio is completely correct. But the output from API is: [[0], [1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11], [12, 13, 14], [15, 16, 17, 18], [19, 20, 21, 22, 23], [24, 25, 26], [27]]

Govind_Keshari · March 20, 2025, 6:09am

Hey @yang_fa, Let me get back to you on this.

Juan_Mugica · April 6, 2025, 11:10am

Hello, any news on this? I’m also getting very different results on API vs Google Studio for the same exact instructions, tools, and parameters.

Govind_Keshari · June 19, 2025, 9:07am

I checked with the team, It should not vary so much. I checked from my side as well with different prompts but it’s working fine. Not getting much reports on this so assuming it might be a hallucination and intermittent. Can you please check with latest more powerful models like 2.5 Flash and pro.

Let me know if this is the same case.

Thanks.

Topic		Replies	Views
Flash 2.5 PDF Analysis - AI Studio vs API Gemini API ai-studio , api	3	176	April 19, 2025
Significant Difference in Response Quality between Google AI Studio and Gemini 2.5 Pro API (gemini-2.5-pro-03-25) Gemini API feedback , api , gemini-25 , gemini-2-5	7	481	June 4, 2025
Inaccurate Bounding Box for forms Gemini API api	9	201	June 20, 2025
Using Grounding with Apps Script Google AI Studio gemini-15 , api	7	315	December 11, 2024
Issue with Gemini 1.5 Pro EXP API: Getting Different Results Compared to AI Studio Playground Gemini API gemini-15 , api , models	0	179	October 25, 2024

The results from the Gemini 2.0 Flash website don't match the API results for same thing

Related topics