Gemini 1.5 refuses to process audio files

I’m using the Gemini API from Flutter, more specifically the google_generative_ai Dart package.
I tried these models (under the hood the package uses the v1beta endpoint):

  • gemini-1.5-pro
  • gemini-1.5-flash
  • gemini-1.5-flash-preview (which worked two weeks ago, but now it doesn’t find the model)
  • gemini-1.5-flash-latest (which didn’t work two weeks ago but now this model is found)

I tried the following file formats:

  • ogg (Ogg Opus mono, 22kHz, ~128 kbit)
  • m4a (AAC LC, mono, 22kHz)
  • wav (mono 16 bit PCM, 22kHz)
  • mp3 (mono, 128kbit rate average)

For all of these I attach the file to the API call as a Part with the proper MIME type; a simplified sketch of how I build the request is below the list. For all of the attempts (except when the model is not found right off the bat), I get something along the lines of:

  • I am sorry, I cannot process any audio you share with me.
  • I am sorry, I cannot process any audio files.
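
For reference, this is roughly how I build the request (simplified sketch; the exact prompt and file handling in my app differ, and I’m pasting this from memory):

```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> askAboutAudio(String apiKey, String path) async {
  final model = GenerativeModel(model: 'gemini-1.5-flash', apiKey: apiKey);
  final audioBytes = await File(path).readAsBytes();

  final response = await model.generateContent([
    Content.multi([
      TextPart('What music is playing in this clip?'),
      // The audio travels inline with the request, tagged with its MIME type.
      DataPart('audio/ogg', audioBytes),
    ]),
  ]);

  print(response.text);
}
```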

Two weeks ago I remember getting back things like (Add "Shazam mode" · Issue #38 · CsabaConsulting/InspectorGadgetApp · GitHub):

  • I can’t hear any music playing right now.
  • I can’t hear any audio right now.

This suggested that the model is capable of processing audio files and that I was just prompting it badly or hitting some other problem. With the latest responses, where the model flat out states it cannot process audio files, I’m stuck.

Is that true? I thought Gemini 1.5 can process video, and in that case I don’t see why it couldn’t process audio, which is technically just a stream within a video file container. Has anyone been able to perform multi-modal operations involving audio files?

Or will I have to somehow bundle the audio stream with a dummy, empty video stream to “trick” the model and overcome this limitation?

Another note: I tried to attach the files in the Gemini Advanced web app for good measure; however, it doesn’t seem to handle any audio files.

Gemini Advanced also cannot take any video. I went to AI Studio, and it was not able to receive the ogg file from Google Drive (I got a 500 error); however, when I uploaded the file directly from my machine it was able to process it. Then I could instruct it to recognize the music playing. It didn’t guess right, but at least it seems to work there. I wonder if what I’m experiencing is some Dart Gemini API related quirk?

Is it using an inlineData part or a fileData part under the hood?

I didn’t think that audio was ever accepted using inlineData, you needed to upload the audio first and then use fileData.

Ooooh! I’m using inline data. I’ll try file data next then. Should I do that for any other file, such as images or PDFs, as well?
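
If I understand correctly, the change on my side would be roughly the following (untested sketch; I’m assuming FilePart takes the URI that the Files API upload returns):

```dart
import 'dart:typed_data';

import 'package:google_generative_ai/google_generative_ai.dart';

// What I do today: the bytes travel inline inside the request.
Content inlinePrompt(Uint8List audioBytes) => Content.multi([
      TextPart('What music is playing in this clip?'),
      DataPart('audio/ogg', audioBytes),
    ]);

// What I should try: reference a file previously uploaded through the
// Files API; uploadedFileUri would be the uri the upload response returns.
Content fileReferencePrompt(Uri uploadedFileUri) => Content.multi([
      TextPart('What music is playing in this clip?'),
      FilePart(uploadedFileUri),
    ]);
```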

Inline works for images. Less sure about PDF (particularly since it extracts it into multiple parts).

At this point, I do almost everything using fileData. Not much reason not to, and there are several advantages to doing so beyond the prompt itself.

Good luck!

Aaand now I remember why I didn’t use FileData (generative-ai-dart/pkgs/google_generative_ai/lib/src/content.dart at ec5a820166fdb05fb5b387efab31eccce9d4072f · google-gemini/generative-ai-dart · GitHub). It’s part of the package’s API surface, but the “Google AI File Service API” needed to actually upload the file is nowhere to be found:
Files API · Issue #211 · google-gemini/generative-ai-dart · GitHub

Doh! And you and I had that conversation already this week and I forgot. Sorry. /:

Doing the POST isn’t that difficult, though, if you want to look into that route. Happy to give you some straight REST examples, but I don’t know Dart.

You are being very helpful, and your advice is appreciated. There’s the REST call as a workaround (my rough understanding of it is sketched below); however, I’m revisiting the firebase_vertexai Flutter package and will probably switch over from the BYO API key to the BYO Firebase project approach. My complete solution will contain some Cloud Functions anyway (for TTS / STT), so from a technical user’s perspective it may even be better to deal only with Firebase rather than a hodge-podge of other things. The project in its current form is not for grandma anyway.
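
For anyone else landing here, my rough understanding of the manual Files API upload from Dart is something like the sketch below (untested; the endpoint, headers, and response fields reflect my reading of the REST docs and may well be off):

```dart
import 'dart:convert';
import 'dart:io';

import 'package:http/http.dart' as http;

/// Rough sketch of the two-step resumable upload to the Files API.
/// Returns the uri of the uploaded file, which would then be referenced
/// from a fileData / FilePart part in a later generateContent call.
Future<String> uploadAudio(String apiKey, String path, String mimeType) async {
  final bytes = await File(path).readAsBytes();

  // Step 1: start a resumable upload session; the upload URL comes back
  // in the x-goog-upload-url response header.
  final start = await http.post(
    Uri.parse(
        'https://generativelanguage.googleapis.com/upload/v1beta/files?key=$apiKey'),
    headers: {
      'X-Goog-Upload-Protocol': 'resumable',
      'X-Goog-Upload-Command': 'start',
      'X-Goog-Upload-Header-Content-Length': '${bytes.length}',
      'X-Goog-Upload-Header-Content-Type': mimeType,
      'Content-Type': 'application/json',
    },
    body: jsonEncode({
      'file': {'display_name': 'audio clip'}
    }),
  );
  final uploadUrl = start.headers['x-goog-upload-url']!;

  // Step 2: send the bytes and finalize; the JSON response describes the
  // stored file, including the uri to use in subsequent prompts.
  final finish = await http.post(
    Uri.parse(uploadUrl),
    headers: {
      'X-Goog-Upload-Offset': '0',
      'X-Goog-Upload-Command': 'upload, finalize',
    },
    body: bytes,
  );
  return (jsonDecode(finish.body)['file'] as Map)['uri'] as String;
}
```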
