New streaming models not performing well with code

Benjamin_Hughes · August 20, 2025, 12:12pm

I have an application that simulates an exam. In the exam the student has to share their screen and solve a task with code. gemini-2.0-flash-live-001 did okay on this: most of the time it understood the syntax, and it was strict with the student if they did not complete the task.

The new streaming models (gemini-2.5-flash-live-preview and gemini-2.5-flash-preview-native-audio-dialog) do not know basic HTML or JavaScript syntax. They are also way too sycophantic. Where gemini-2.0-flash-live-001 would actually spot errors and make the student fix them, the new models just hallucinate a correct answer if you say “Now the task is done”, even though the JavaScript file is empty.

Lalit_Kumar · August 25, 2025, 6:17am

This seems like an important insight. Since you are working within your own app, I may not be able to fully reproduce the issue on my side. Could you please help by providing the responses you get for the same prompt when tested across different models?

I would like to compile a report and share it with the concerned team. The more details you can provide, the more helpful it will be. So kindly share as much information as possible.

Thank you for your support and understanding.

Benjamin_Hughes · August 25, 2025, 1:22pm

I have made two videos showing the problems:

gemini-2.0-flash-live-001 video. Strict: it won’t let me continue before I have done the assignment. It can see when the syntax is wrong (at least most of the time).
gemini-2.5-flash-preview-native-audio-dialog video. Very loose: it will just accept anything. It can see my screen, but it hallucinates that I have done the task when I have actually done nothing. It is also not good with coding syntax.

Lalit_Kumar · August 28, 2025, 5:48am

That is very interesting. Could you try tightening your system instructions a bit more, being very precise about how you want the model to behave, and then check which model follows your instructions more closely?

Benjamin_Hughes · August 28, 2025, 12:36pm

I have tried lots of different prompts and approaches. The only model that follows my instructions strictly is gemini-2.0-flash-live-001.

Right now the last part of my prompt looks like this:

Important notes about conducting the exam:
- When giving a grade: You can only evaluate on what the student did during the exam.
- Ask about the student's thinking, encourage them to think aloud
- examine if the student understands the code he/she is writing
- Please never explain what code is doing. You are running an exam so you need to focus on evaluating the students competencies within the learning goals!
- Dont say what the student have done. Just say things like: "that looks good"
- If the student is doing well ask harder questions. If the student is struggling ask easier questions.
- If the student is stuck, give hints to help the student move forward.
- Make sure that the syntax is correct! This is super important!
- Never answer as if you were the student!
- Always wait for an answer from the student!
- Dont explain concepts! When the student is done answering, just move on
- Talk English
- Give compliments when they are earned. You can say things like "You are doing great", "Well done", "What a great answer", but only when it makes sense!

And remember most importantly! You are an examiner running an exam. Your goal is to evaluate the students competencies throughly. Dont take the students word for something! Make sure that
1. You can see what the student has done. Be sure to check that what the studnents say is actually correct
2. That the syntax is correct
3. that the student understands the code!

Spend time on making sure the syntax is correct!!

Never give feedback and grade before you are told to. Do not give a grade or feedback before you are told to.

When you get a message sent that looks like this: <system_message>System message here</system_message>. Then its only for you as a system and should not be told to the user
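
For illustration, the system-message convention in the last line of the prompt could be handled on the application side roughly like this. This is only a minimal sketch: the helper names (wrap_system_message, extract_system_messages, leaks_system_message) are hypothetical and not taken from the actual app, and the leak check is a crude verbatim match rather than a reliable detector.

```python
import re

# Tag name taken from the prompt above; everything else here is an assumption.
SYSTEM_TAG = "system_message"
_TAG_RE = re.compile(rf"<{SYSTEM_TAG}>(.*?)</{SYSTEM_TAG}>", re.DOTALL)


def wrap_system_message(text: str) -> str:
    """Wrap an app-generated notice in the tags the prompt marks as private."""
    return f"<{SYSTEM_TAG}>{text}</{SYSTEM_TAG}>"


def extract_system_messages(message: str) -> list[str]:
    """Return the contents of every <system_message> block in a message."""
    return [m.strip() for m in _TAG_RE.findall(message)]


def leaks_system_message(reply: str, system_text: str) -> bool:
    """Rough check: does the reply repeat the tag or the private text verbatim?"""
    lowered = reply.lower()
    if f"<{SYSTEM_TAG}>" in lowered:
        return True
    needle = system_text.strip().lower()
    # An empty needle would match everything, so treat it as "no leak".
    return bool(needle) and needle in lowered
```

A check like this only catches verbatim repetition; a model that paraphrases the system message to the student would still pass it.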

Lalit_Kumar · September 3, 2025, 9:34am

Hello,

We have noted your feedback and shared it with the concerned team.

Thank you for the insightful feedback.

Benjamin_Hughes · September 3, 2025, 10:55am

Thanks, appreciate it
