What is the limit on audio length when using the Gemini API for an ASR task?

Honglei_Zhang · July 2, 2024, 6:49am

The question is in the title. The call looks like this:

response = model.generate_content([prompt, audio_file])

When the audio is longer than a few minutes (maybe 10 minutes?), a large part of the transcription is lost.
So, does the API have an audio length limit? If so, what is the limit?

Thanks.

afirstenberg · July 2, 2024, 1:59pm

While Gemini can accept audio input, it isn’t necessarily the best solution for this problem.

You may want to consider something like the Google Speech to Text API, which does have models that are tuned for long input and transcription.
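
If you stay with Gemini anyway, one workaround that may be worth trying is to split the recording into shorter, slightly overlapping segments and transcribe each one separately. The sketch below covers only the planning step. The helper name `plan_chunks`, the 300-second window, and the 5-second overlap are illustrative assumptions, not documented Gemini limits; cutting the audio and calling the model are left out.

```python
def plan_chunks(total_seconds, chunk_seconds=300.0, overlap_seconds=5.0):
    """Return (start, end) windows in seconds that cover the whole recording.

    chunk_seconds and overlap_seconds are assumed values for illustration,
    not limits published for the Gemini API.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap_seconds must be >= 0 and < chunk_seconds")

    total = float(total_seconds)
    windows = []
    start = 0.0
    while start < total:
        end = min(start + chunk_seconds, total)
        windows.append((start, end))
        if end >= total:
            break  # the last window reached the end of the recording
        # Step back by the overlap so words cut at a boundary appear in both windows.
        start = end - overlap_seconds
    return windows


if __name__ == "__main__":
    # A 12-minute recording split into windows of at most 5 minutes.
    for start, end in plan_chunks(12 * 60):
        print(f"{start:.0f}s to {end:.0f}s")
```

Each window could then be cut out with an audio tool of your choice and passed to `model.generate_content([prompt, audio_file])` as in the question. The overlap means some words are transcribed twice, so the pieces need de-duplicating when they are joined.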
