Hi everyone,
I’m using the Gemini Pro model with the Google GenAI SDK for asynchronous audio transcription. When I upload and transcribe a small number of files, everything works great — I get accurate and complete transcriptions for each file.
However, when I upload and process multiple files, some of the files — usually the ones uploaded last — return incomplete or inaccurate transcriptions. The same files work fine when transcribed individually.
I’m using the official async SDK flow for transcription. My environment variables and model configuration are correctly set.
It seems like the issue may relate to:
- Token limits or truncation during batch processing
- Possible concurrency or rate-limiting behavior in the Gemini API
- SDK handling of multiple parallel requests
Has anyone else run into similar problems with Gemini Pro or found best practices for handling batch audio transcription reliably?
Also, is there any guidance on optimal batch size or recommended concurrency limits for transcription workloads?
Hi @Vaibhav_Sharma1
Welcome to the forum!!!
Could you please share the complete payload details along with the steps to reproduce the issue? This will help us investigate it more accurately and provide a more precise response.
Hi @Shivam_Singh2
Thank you for the warm welcome!
I am sending the following payload.
{
  "audio_file_path": "s3://your-bucket/path/to/audio.mp3",
  "prompt": "Prompt",
  "generation_config": {
    "temperature": 0.3,
    "response_mime_type": "application/json",
    "max_output_tokens": ""
  },
  "model": "gemini-2.5-pro"
}
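In the payload above, `max_output_tokens` is an empty string. If it is forwarded to the SDK unchanged, it is not a valid token count. The sketch below normalizes it to an integer before the request is built. The helper name `normalize_generation_config` and the default of 8192 are hypothetical choices (8192 just satisfies the "4096 or higher" suggestion later in this thread); they are not part of any SDK.

```python
from typing import Any

# Hypothetical default; any positive value that fits your transcripts will do.
DEFAULT_MAX_OUTPUT_TOKENS = 8192


def normalize_generation_config(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of payload["generation_config"] with an integer max_output_tokens.

    A missing key, None, or "" falls back to DEFAULT_MAX_OUTPUT_TOKENS;
    numeric strings are converted; anything else raises ValueError.
    """
    config = dict(payload.get("generation_config", {}))  # copy, don't mutate the payload
    raw = config.get("max_output_tokens")
    if raw is None or raw == "":
        value = DEFAULT_MAX_OUTPUT_TOKENS
    elif isinstance(raw, bool):  # bool is a subclass of int; reject it explicitly
        raise ValueError(f"invalid max_output_tokens: {raw!r}")
    elif isinstance(raw, int):
        value = raw
    elif isinstance(raw, str) and raw.strip().isdigit():
        value = int(raw.strip())
    else:
        raise ValueError(f"invalid max_output_tokens: {raw!r}")
    if value <= 0:
        raise ValueError(f"max_output_tokens must be positive, got {value}")
    config["max_output_tokens"] = value
    return config
```

The returned keys (`temperature`, `response_mime_type`, `max_output_tokens`) match field names of `types.GenerateContentConfig` in the `google-genai` Python SDK, so the dict should be usable as `types.GenerateContentConfig(**config)`. Check this against the SDK version you have installed.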
Hi @Vaibhav_Sharma1
Thank you for your patience.
We tested the behavior on our side using the same Gemini Pro model and the official async SDK, and were able to transcribe batches of 6–10 audio files successfully, without any truncation or accuracy issues.
To improve reliability on your end, we recommend:
- Limiting concurrent transcription requests to around 3–5 at a time.
- Explicitly setting the max_output_tokens parameter (e.g., to 4096 or higher) to prevent output truncation.
- Adding retry logic with exponential backoff for any failed or incomplete responses.
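A minimal sketch of bounded concurrency with retries, using only `asyncio`. It assumes a hypothetical coroutine `transcribe(path) -> str` that wraps your actual SDK call (for example `client.aio.models.generate_content` in `google-genai`) and returns the response text. It also assumes that, because `response_mime_type` is `application/json`, a response that fails to parse as JSON can be treated as incomplete. Checking the candidate's finish reason for a token-limit stop would be an alternative signal. `IncompleteTranscription`, `transcribe_with_retry`, and `transcribe_all` are illustrative names, not SDK APIs.

```python
import asyncio
import json
import random
from typing import Any, Awaitable, Callable, Sequence


class IncompleteTranscription(Exception):
    """Raised when a response looks truncated (here: it is not valid JSON)."""


def parse_complete(text: str) -> Any:
    # With response_mime_type="application/json", a truncated answer usually
    # fails to parse, so a parse error is treated as "incomplete" and retried.
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise IncompleteTranscription(str(exc)) from exc


async def transcribe_with_retry(
    transcribe: Callable[[str], Awaitable[str]],
    path: str,
    semaphore: asyncio.Semaphore,
    max_attempts: int = 4,
    base_delay: float = 1.0,
) -> Any:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        async with semaphore:  # caps the number of requests in flight
            try:
                return parse_complete(await transcribe(path))
            except Exception:  # in real code, narrow this to retryable errors
                attempt += 1
                if attempt >= max_attempts:
                    raise
        # Exponential backoff with jitter; sleeping outside the semaphore
        # frees the slot for other files while this one waits.
        await asyncio.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


async def transcribe_all(
    transcribe: Callable[[str], Awaitable[str]],
    paths: Sequence[str],
    max_concurrency: int = 4,
    **retry_kwargs: Any,
) -> list[Any]:
    semaphore = asyncio.Semaphore(max_concurrency)
    # return_exceptions=True keeps one failed file from hiding the others' results;
    # results come back in the same order as `paths`.
    return await asyncio.gather(
        *(transcribe_with_retry(transcribe, p, semaphore, **retry_kwargs) for p in paths),
        return_exceptions=True,
    )
```

Usage would look like `results = asyncio.run(transcribe_all(transcribe, paths, max_concurrency=4))`. Any entry that is an exception instance marks a file that still failed after all retries. The semaphore, not the batch size, is what enforces the 3–5 concurrent-request limit.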
Also, make sure your audio files use a consistent format (e.g., MP3 at 16 kHz), and consider splitting larger batches into smaller groups for better stability.
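Splitting a large batch into smaller groups can be sketched as below. Files within a group run concurrently, and the next group starts only after the current one has finished. `chunked` and `transcribe_in_groups` are hypothetical helpers, `transcribe` is again an assumed coroutine wrapping the SDK call, and the default group size of 5 simply matches the 3–5 range suggested above.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def transcribe_in_groups(
    transcribe: Callable[[str], Awaitable[T]],
    paths: Sequence[str],
    group_size: int = 5,
    pause_s: float = 0.0,
) -> list:
    results: list = []
    for group in chunked(paths, group_size):
        # Run one group concurrently and wait for all of it before moving on.
        results.extend(
            await asyncio.gather(*(transcribe(p) for p in group), return_exceptions=True)
        )
        if pause_s:  # optional breather between groups
            await asyncio.sleep(pause_s)
    return results
```

This bounds load more coarsely than a semaphore: one slow file holds up the whole group. In exchange, each group's results are complete before the next group starts, which can make it easier to spot whether the later files are the ones going wrong.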