I’m using the Gemini API from Flutter, more specifically the google_generative_ai Dart package.
I tried these models (under the hood the package uses the v1beta endpoint):
gemini-1.5-pro
gemini-1.5-flash
gemini-1.5-flash-preview (which worked two weeks ago, but now it doesn’t find the model)
gemini-1.5-flash-latest (which didn’t work two weeks ago but now this model is found)
I tried the following file formats:
ogg (Ogg Opus mono, 22kHz, ~128 kbit)
m4a (AAC LC, mono, 22kHz)
wav (mono 16 bit PCM, 22kHz)
mp3 (mono, 128kbit rate average)
For all of these I attach the file to the API call as a Part with the proper MIME type. For all of the tries (except when the model is not found right off the bat), I get something along the lines of:
I am sorry, I cannot process any audio you share with me.
I can’t hear any audio right now.
This suggested that the model is capable of processing audio files and I’m either prompting it badly or hitting some other problem. With the latest responses, where the model flat out states it cannot process audio files, I’m stumped.
Is that true? I thought Gemini 1.5 could process video, and in that case I don’t see why it couldn’t process audio, which is technically just a stream within a video file container. Has anyone been able to perform multi-modal operations involving audio files?
Or will I have to somehow add a dummy empty video stream and bundle the audio stream with it to “trick” the model and overcome this limitation?
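For reference, a minimal sketch of how I’m attaching the audio as a Part with google_generative_ai (the file path, prompt text, and API key source are placeholders, not my exact code):

```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> main() async {
  // Placeholder: supply your own API key.
  final apiKey = Platform.environment['GEMINI_API_KEY']!;

  final model = GenerativeModel(
    model: 'gemini-1.5-flash-latest',
    apiKey: apiKey,
  );

  // Placeholder path; read the audio file as raw bytes.
  final audioBytes = await File('clip.ogg').readAsBytes();

  final response = await model.generateContent([
    Content.multi([
      TextPart('Describe what you hear in this audio clip.'),
      // DataPart takes the MIME type and the raw bytes.
      DataPart('audio/ogg', audioBytes),
    ]),
  ]);

  print(response.text);
}
```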
Gemini Advanced also cannot take any video. I went to AI Studio and it was not able to receive the ogg file from Google Drive (I got a 500 error); however, when I uploaded it directly from my machine it was able to process it. Then I could instruct it to recognize the music playing. It didn’t guess right, but at least it seems to work there. I wonder if what I’m experiencing is some Dart Gemini API quirk?
Doh! And you and I had that conversation already this week and I forgot. Sorry. /:
Doing the POST isn’t that difficult, though, if you want to look into that route. Happy to give you some straight REST examples if you want, but I don’t know Dart.
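As a starting point, here’s roughly what that raw REST call looks like with curl against the same v1beta endpoint (model name, file name, and prompt are placeholders; assumes GEMINI_API_KEY is set in the environment):

```shell
#!/bin/sh
# Base64-encode the audio file. Note: GNU base64 needs -w 0 to avoid
# line wrapping; on macOS plain `base64 -i clip.ogg` is unwrapped.
AUDIO_B64=$(base64 -w 0 clip.ogg)

# POST the prompt plus the audio as an inline_data part.
curl -s \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=${GEMINI_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d @- <<EOF
{
  "contents": [{
    "parts": [
      {"text": "Describe what you hear in this audio clip."},
      {"inline_data": {"mime_type": "audio/ogg", "data": "${AUDIO_B64}"}}
    ]
  }]
}
EOF
```

If this works where the Dart package doesn’t, that would point at a package-level quirk rather than a model limitation.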
You are being very helpful with your advice. There’s the REST call as a workaround; however, I’m revisiting the firebase_vertexai Flutter package and will probably switch over from the BYO API key to the BYO Firebase project method. My complete solution will contain some Cloud Functions anyway (for TTS / STT), so maybe it’s even better from a technical user’s perspective to deal with only Firebase rather than a hodge-podge of other things as well. The project in its current form is not for grandma anyway.
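In case it helps anyone following the same route, a minimal sketch of the firebase_vertexai equivalent (model name, file path, and prompt are placeholders; assumes the app is already wired to a Firebase project, e.g. via `flutterfire configure`):

```dart
import 'dart:io';

import 'package:firebase_core/firebase_core.dart';
import 'package:firebase_vertexai/firebase_vertexai.dart';

Future<void> main() async {
  // Assumes default Firebase options are already configured for this app.
  await Firebase.initializeApp();

  // No API key here: auth comes from the Firebase project itself.
  final model = FirebaseVertexAI.instance.generativeModel(
    model: 'gemini-1.5-flash',
  );

  // Placeholder path; the Content/Part classes mirror google_generative_ai.
  final audioBytes = await File('clip.ogg').readAsBytes();

  final response = await model.generateContent([
    Content.multi([
      TextPart('Describe what you hear in this audio clip.'),
      DataPart('audio/ogg', audioBytes),
    ]),
  ]);

  print(response.text);
}
```

The appeal of this route is exactly what’s described above: one Firebase project covers the model calls and the Cloud Functions, with no separate API key to manage.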