Gemini 2.5 Flash says it can't process audio, but why?

I was testing my work when Gemini 2.5 Flash suddenly responded in a very strange way, saying that it cannot process audio. At first I couldn't believe it, since audio is a basic data type that most Gemini models should handle. But after more testing, both via API calls and through the Google AI Studio UI, it really is TRUE. What!? Why can 1.5 Flash, 2.0 Flash Lite, and everything up to 2.5 Pro all understand audio EXCEPT 2.5 Flash (both of the released preview versions)? A rough sketch of the kind of API call I was making is below.
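For reference, this is roughly the kind of request that intermittently fails for me, using the google-genai Python SDK (the model name and file path here are placeholders, not my exact setup):

```python
# Rough sketch of an audio request to Gemini via the Files API.
# Model name and file path are placeholders for illustration only.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Upload the audio clip, then ask the model about it.
audio_file = client.files.upload(file="clip.m4a")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",  # placeholder preview model name
    contents=["Transcribe this audio clip.", audio_file],
)

# Sometimes this returns a transcript, sometimes a reply that it
# "can't process audio".
print(response.text)
```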


Update: sometimes it says "Yes, I can" and sometimes it says "No, I can't". Here's what I think: even though the Gemini models are advertised as having native multimodal processing, I suspect they actually call other internal tools on Google's side to provide this "multimodal" capability (the outcome still counts as a genuinely multimodal-capable AI model, just not so "native" to me). If the video/image processing or audio processing backend is down, the model loses the ability to "understand" the corresponding data type. What do you think?

Hey @NguyenfromVN, I have been unable to reproduce the audio processing problem you encountered. According to the official documentation, the m4a format isn't supported.
Could you try using a different MIME type and see if the issue persists? A rough sketch follows.
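If it helps, here's a minimal sketch of sending the audio inline with an explicitly documented MIME type such as audio/mp3 instead of m4a (the file path and model name below are placeholders, not values from your setup):

```python
# Sketch: pass the audio inline with an explicit, documented MIME type.
# File path and model name are placeholders for illustration only.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("clip.mp3", "rb") as f:
    audio_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",  # placeholder preview model name
    contents=[
        "Describe this audio clip.",
        types.Part.from_bytes(data=audio_bytes, mime_type="audio/mp3"),
    ],
)
print(response.text)
```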

Hi, I think I know the reason. It's not about the m4a file; Gemini 2.5 Flash does understand that one. Rather, sometimes it says it can process audio and sometimes it says it can't, which suggests something is going wrong on Gemini's side: its audio processing is occasionally not ready to use, which produces these responses. That also makes the issue inconsistent and hard to reproduce. Thanks for your response. I hope the Google team can improve the availability of the internal audio processing; that's all I wish for. Have a nice day!