Gemini 1.5 refuses to process audio files

I’m using the Gemini API from Flutter, more specifically the google_generative_ai Dart package.
I tried these models (under the hood the package uses the v1beta endpoint):

  • gemini-1.5-pro
  • gemini-1.5-flash
  • gemini-1.5-flash-preview (which worked two weeks ago, but now it doesn’t find the model)
  • gemini-1.5-flash-latest (which didn’t work two weeks ago but now this model is found)

I tried the following file formats:

  • ogg (Ogg Opus mono, 22kHz, ~128 kbit)
  • m4a (AAC LC, mono, 22kHz)
  • wav (mono 16 bit PCM, 22kHz)
  • mp3 (mono, 128kbit rate average)

For all of these I attach the file to the API call as a Part with the proper MIME type; a simplified sketch of how I build the request is below the list. For all of the attempts (except when the model is not found right off the bat), I get something along the lines of:

  • I am sorry, I cannot process any audio you share with me.
  • I am sorry, I cannot process any audio files.
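
For reference, this is roughly how I build the request (simplified sketch; the exact prompt and file handling in my app differ, and I’m pasting this from memory):

```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> askAboutAudio(String apiKey, String path) async {
  final model = GenerativeModel(model: 'gemini-1.5-flash', apiKey: apiKey);
  final audioBytes = await File(path).readAsBytes();

  final response = await model.generateContent([
    Content.multi([
      TextPart('What music is playing in this clip?'),
      // The audio travels inline with the request, tagged with its MIME type.
      DataPart('audio/ogg', audioBytes),
    ]),
  ]);

  print(response.text);
}
```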

Two weeks ago I remember getting back things like (Add "Shazam mode" · Issue #38 · CsabaConsulting/InspectorGadgetApp · GitHub):

  • I can’t hear any music playing right now.
  • I can’t hear any audio right now.

This suggested that the model is capable of processing audio files and that I was just prompting it badly or hitting some other problem. With the latest responses, where the model flat out states it cannot process audio files, I’m stuck.

Is that true? I thought Gemini 1.5 can process video, and in that case I don’t see why it couldn’t process audio, which is technically just a stream within a video file container. Has anyone been able to perform multi-modal operations involving audio files?

Or will I have to somehow bundle the audio stream with a dummy, empty video stream to “trick” the model and overcome this limitation?

Another note: I tried to attach the files in the Gemini Advanced web app for good measure; however, it doesn’t seem to handle any audio files.

Gemini Advanced also cannot take any video. I went to AI Studio, and it was not able to receive the ogg file from Google Drive (I got a 500 error); however, when I uploaded the file directly from my machine it was able to process it. Then I could instruct it to recognize the music playing. It didn’t guess right, but at least it seems to work there. I wonder if what I’m experiencing is some Dart Gemini API related quirk?

Is it using an inlineData part or a fileData part under the hood?

I didn’t think that audio was ever accepted using inlineData, you needed to upload the audio first and then use fileData.

Ooooh! I’m using inline data. I’ll try file data next then. Should I do that for any other file, such as images or PDFs, as well?
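
If I understand correctly, the change on my side would be roughly the following (untested sketch; I’m assuming FilePart takes the URI that the Files API upload returns):

```dart
import 'dart:typed_data';

import 'package:google_generative_ai/google_generative_ai.dart';

// What I do today: the bytes travel inline inside the request.
Content inlinePrompt(Uint8List audioBytes) => Content.multi([
      TextPart('What music is playing in this clip?'),
      DataPart('audio/ogg', audioBytes),
    ]);

// What I should try: reference a file previously uploaded through the
// Files API; uploadedFileUri would be the uri the upload response returns.
Content fileReferencePrompt(Uri uploadedFileUri) => Content.multi([
      TextPart('What music is playing in this clip?'),
      FilePart(uploadedFileUri),
    ]);
```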

Inline works for images. Less sure about PDF (particularly since it extracts it into multiple parts).

At this point, I do almost everything using fileData. Not much reason not to, and there are several advantages to doing so beyond the prompt itself.

Good luck!

Aaand now I remember why I didn’t use FileData (generative-ai-dart/pkgs/google_generative_ai/lib/src/content.dart at ec5a820166fdb05fb5b387efab31eccce9d4072f · google-gemini/generative-ai-dart · GitHub). It’s part of the package’s API surface, but the “Google AI File Service API” needed to actually upload the file is nowhere to be found:
Files API · Issue #211 · google-gemini/generative-ai-dart · GitHub

Doh! And you and I had that conversation already this week and I forgot. Sorry. /:

Doing the POST isn’t that difficult, though, if you want to look into that route. Happy to give you some straight REST examples, but I don’t know Dart.

You are being very helpful, and your advice is appreciated. There’s the REST call as a workaround (my rough understanding of it is sketched below); however, I’m revisiting the firebase_vertexai Flutter package and will probably switch over from the BYO API key to the BYO Firebase project approach. My complete solution will contain some Cloud Functions anyway (for TTS / STT), so from a technical user’s perspective it may even be better to deal only with Firebase rather than a hodge-podge of other things. The project in its current form is not for grandma anyway.
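
For anyone else landing here, my rough understanding of the manual Files API upload from Dart is something like the sketch below (untested; the endpoint, headers, and response fields reflect my reading of the REST docs and may well be off):

```dart
import 'dart:convert';
import 'dart:io';

import 'package:http/http.dart' as http;

/// Rough sketch of the two-step resumable upload to the Files API.
/// Returns the uri of the uploaded file, which would then be referenced
/// from a fileData / FilePart part in a later generateContent call.
Future<String> uploadAudio(String apiKey, String path, String mimeType) async {
  final bytes = await File(path).readAsBytes();

  // Step 1: start a resumable upload session; the upload URL comes back
  // in the x-goog-upload-url response header.
  final start = await http.post(
    Uri.parse(
        'https://generativelanguage.googleapis.com/upload/v1beta/files?key=$apiKey'),
    headers: {
      'X-Goog-Upload-Protocol': 'resumable',
      'X-Goog-Upload-Command': 'start',
      'X-Goog-Upload-Header-Content-Length': '${bytes.length}',
      'X-Goog-Upload-Header-Content-Type': mimeType,
      'Content-Type': 'application/json',
    },
    body: jsonEncode({
      'file': {'display_name': 'audio clip'}
    }),
  );
  final uploadUrl = start.headers['x-goog-upload-url']!;

  // Step 2: send the bytes and finalize; the JSON response describes the
  // stored file, including the uri to use in subsequent prompts.
  final finish = await http.post(
    Uri.parse(uploadUrl),
    headers: {
      'X-Goog-Upload-Offset': '0',
      'X-Goog-Upload-Command': 'upload, finalize',
    },
    body: bytes,
  );
  return (jsonDecode(finish.body)['file'] as Map)['uri'] as String;
}
```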
