The Flash models are the only ones that work for my use case, which is running software-development exams. Here it is important that the model understands syntax and that it corrects the student.
The Native Audio models will hallucinate code that the student did not write. They don't understand coding syntax that well and are too friendly/sycophantic.
You can see the difference in these two short videos:
These models are still available for use; you can refer to the Gemini API documentation for details about them.
If you go through the section of the Gemini documentation mentioned below, you should find more information:
I’m completely confused too… While the older ‘2.5-flash-live-preview’ model works fine, the new ‘gemini-2.5-flash-native-audio-preview-09-2025’ is a no-go for my use case because it simply cannot reliably understand what I’m saying over the phone. It’s a real disappointment, and now I have to seriously consider alternative platforms.
I have not tested function calling with the native models.
For me it's not the audio understanding that is the problem; it's the way the model behaves. The native models are too sycophantic. They just agree with everything the user says, at least from my testing.
Sycophantic, I believe, is what you meant to say. However, I am unclear on what you mean by this in relation to Gemini; could you expound on it further? Please and thank you in advance. P.S. Not saying this is or is not happening, just needing a better understanding.
Yes, I can, at least for my use case. I have Gemini act as an examiner for a software-development exam. The student has to write code that solves a small exercise while sharing his/her screen. In the exam it is important that the student's code is correct. The Flash models will not let a student continue if they see an error in the code.
The native models do not do this. They just accept pretty much anything the student says, even hallucinating code the student has not written. You can see that in the second video I linked to.
In the prompt I write things like the following (see the sketch after the list for how this ends up in the session config):
- Examine whether the student understands the code he/she is writing.
- Make sure that the syntax is correct.
- Remember to check the output of the code!
- And remember most importantly! You are a strict examiner running an exam. Your goal is to evaluate the student's competencies thoroughly. Don't take the student's word for anything! Make sure the syntax is correct and that the student understands the code!
- Spend time on making sure the syntax is correct!!
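
For context, this is roughly how such an instruction is wired into a Live API session. This is only a minimal sketch, assuming the google-genai Python SDK; the prompt text is paraphrased from the list above, and `run_exam_session` is just an illustrative name, not my actual code.

```python
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

EXAMINER_PROMPT = """You are a strict examiner running a software-development exam.
- Examine whether the student understands the code he/she is writing.
- Make sure the syntax is correct before letting the student continue.
- Remember to check the output of the code.
- Do not take the student's word for anything; evaluate their competencies thoroughly."""

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    system_instruction=types.Content(parts=[types.Part(text=EXAMINER_PROMPT)]),
)

async def run_exam_session():
    # "gemini-2.5-flash-live-preview" is the half-cascade Live model discussed above.
    async with client.aio.live.connect(
        model="gemini-2.5-flash-live-preview", config=config
    ) as session:
        # Stream the student's shared screen and microphone here and play back
        # the examiner's audio responses; omitted for brevity.
        ...
```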
But the native models do not really follow these instructions. They are just friendly helpers that say "That looks good" even though there are multiple clear syntax errors in the code. That's what I mean when I say sycophantic: they agree too much with the student. This is a big problem in an exam situation.
That makes perfect sense! Thank you for elaborating and sharing your specific use case.
That's a fantastic real-world test. Your discovery that the native model's tendency to be a "friendly helper" makes it sycophantic is highly instructive. It highlights a critical challenge, and I'm optimistic that the developers will use this kind of feedback to make Gemini amazing down the road! I absolutely agree that this kind of precise, hands-on feedback is invaluable for the developers.
From my tests comparing the half-cascade model (Gemini 2.5 Flash Live) vs. the new native models for the AUDIO modality:
- The new model asks two questions in the same utterance much more frequently (which usually overwhelms the user and feels unnatural).
- The new model does not send SessionResumptionUpdate as frequently as the half-cascade: it sends one at random roughly every 30-40 seconds, while the half-cascade sends one each time the bot speaks. This leads to losing context on reconnects (a sketch of carrying the handle across reconnects follows below).
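
Here is a minimal sketch of carrying the resumption handle across reconnects, assuming the google-genai Python SDK; `connect_once` is an illustrative helper name, and the field names (`session_resumption_update`, `new_handle`, `resumable`) follow my reading of the Live API session-resumption docs.

```python
from google import genai
from google.genai import types

client = genai.Client()

async def connect_once(model: str, previous_handle: str | None = None) -> str | None:
    """Open one Live connection, resuming from previous_handle if we have one,
    and return the newest resumption handle seen before the connection ends."""
    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        # Passing the last handle we received asks the server to restore the
        # prior conversation context after a dropped connection.
        session_resumption=types.SessionResumptionConfig(handle=previous_handle),
    )
    latest_handle = previous_handle
    async with client.aio.live.connect(model=model, config=config) as session:
        # In a real client you would keep receiving across turns; this loop only
        # illustrates where the resumption updates show up.
        async for message in session.receive():
            update = message.session_resumption_update
            if update and update.resumable and update.new_handle:
                # The half-cascade model sends these after each bot turn; the
                # native-audio model reportedly only every 30-40 s, so a reconnect
                # can lose more of the recent context.
                latest_handle = update.new_handle
            # ... handle the audio/text parts of the message here ...
    return latest_handle  # feed into the next connect_once() call on reconnect
```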