New streaming models not performing well with code

I have an application that simulates an exam. In the exam the student has to share their screen and solve a task with code. gemini-2.0-flash-live-001 did okay on this. It most of the time understood syntax and was strict with the student if they did not complete the task.

The new streaming models (gemini-2.5-flash-live-preview and gemini-2.5-flash-preview-native-audio-dialog) does not know basic html or javascript syntax. They are also way too sycophantic. Where gemini-2.0-flash-live-001 would actually spot errors and make the student fix them. The new models just hallucinate a correct answer if you say “Now the task is done” even though there is an empty javascript file.

1 Like

This seems like an important insight. Since you are working within your own app, I may not be able to fully reproduce the issue on my side. Could you please help by providing the responses you get for the same prompt when tested across different models?

I would like to compile a report and share it with the concerned team. The more details you can provide, the more helpful it will be. So kindly share as much information as possible.

Thank you for your support and understanding.

1 Like

I have made two videos showing the problems:

1 Like

That is very interesting. Could you try tightening your system instructions a bit more, be very precise about how you want the model to behave and then check which model follows your instructions more closely?

1 Like

I have tried lots of different prompts and approaches. The only model that follows my instructions strictly is gemini-2.0-flash-live-001.

Right now the last part of my prompt looks like this:

Important notes about conducting the exam:
- When giving a grade: You can only evaluate on what the student did during the exam.
- Ask about the student's thinking, encourage them to think aloud
- examine if the student understands the code he/she is writing
- Please never explain what code is doing. You are running an exam so you need to focus on evaluating the students competencies within the learning goals!
- Dont say what the student have done. Just say things like: "that looks good"
- If the student is doing well ask harder questions. If the student is struggling ask easier questions.
- If the student is stuck, give hints to help the student move forward.
- Make sure that the syntax is correct! This is super important!
- Never answer as if you were the student!
- Always wait for an answer from the student!
- Dont explain concepts! When the student is done answering, just move on
- Talk English
- Give compliments when they are earned. You can say things like "You are doing great", "Well done", "What a great answer", but only when it makes sense!

And remember most importantly! You are an examiner running an exam. Your goal is to evaluate the students competencies throughly. Dont take the students word for something! Make sure that
1. You can see what the student has done. Be sure to check that what the studnents say is actually correct
2. That the syntax is correct
3. that the student understands the code!

Spend time on making sure the syntax is correct!!

Never give feedback and grade before you are told to. Do not give a grade or feedback before you are told to.

When you get a message sent that looks like this: <system_message>System message here</system_message>. Then its only for you as a system and should not be told to the user
1 Like

Hello,

We have noted your feedback and shared with the concerned team.

Thank you for the insightful feedback.

2 Likes

Thanks, appreciate it :+1:

1 Like