Gemini-2.0-flash-thinking-exp model returns incorrect answer in 1 out of 5 runs

ramkumarkoppu · December 25, 2024, 5:08pm

The gemini-2.0-flash-thinking-exp model answers correctly most of the time, but gives an incorrect answer in roughly 1 out of 5 runs.

For example, with the following code fragment:

from google import genai
from IPython.display import Markdown, display
from PIL import Image

MODEL_ID = "gemini-2.0-flash-thinking-exp"
client = genai.Client(http_options={'api_version':'v1alpha'})
img = Image.open("pool.png")
display(img)

# Print each section header only once while streaming
thoughts_header_displayed = False
answer_header_displayed = False

for chunk in client.models.generate_content_stream(model=MODEL_ID, contents=[img, "How do I use three of these numbers to sum up to 30"]):
    for part in chunk.candidates[0].content.parts:
        if part.thought:
            if not thoughts_header_displayed:
                display(Markdown("## Thoughts"))
                thoughts_header_displayed = True
        else:
            if not answer_header_displayed:
                display(Markdown("## Answer"))
                answer_header_displayed = True

        # Show every streamed part, not just the last one in the chunk
        display(Markdown(part.text))

it usually gets the correct answer:

You can use the numbers 11, 13, and by turning the number 9 upside down to make it a 6.

11 + 13 + 6 = 30

but sometimes it returns something like this:

It's not possible to use three of these numbers (7, 9, 11, 13) to sum up to exactly 30 using standard addition. Let's check all the combinations:

* 7 + 9 + 11 = 27
* 7 + 9 + 13 = 29
* 7 + 11 + 13 = 31
* 9 + 11 + 13 = 33

None of these combinations result in a sum of 30. Perhaps this is a bit of a trick question!
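
For reference, both answers can be checked by brute force. The snippet below is only a minimal sketch: it assumes the numbers in the image are 7, 9, 11 and 13 (as the model reads them), and the helper name triples_summing_to is made up for illustration. It shows that no three of the numbers as printed add up to 30, while the set with the 9 turned into a 6 has exactly one solution.

```python
from itertools import combinations


def triples_summing_to(numbers, target):
    """Return every 3-number combination of `numbers` whose sum equals `target`."""
    return [combo for combo in combinations(numbers, 3) if sum(combo) == target]


# Assumption: these are the numbers shown in pool.png, as listed in the model output.
as_printed = [7, 9, 11, 13]
# Same set with the 9 read upside down as a 6.
with_nine_flipped = [7, 6, 11, 13]

print(triples_summing_to(as_printed, 30))         # []
print(triples_summing_to(with_nine_flipped, 30))  # [(6, 11, 13)]
```

So the second response's arithmetic is right as far as it goes (three odd numbers always sum to an odd number), but it stops short of the upside-down 9 trick that the first response finds.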
