Resuming structured output after MAX_TOKENS cut-off

Is there a recommended approach for resuming structured output from the Gemini 1.5 Flash model after generation is cut off with finish reason MAX_TOKENS?

I’m transcribing audio with speaker diarization using structured outputs, but the transcription is too long to fit in a single 8,192-token response.

Hi @sps

Not that I’m aware of.
I’d rather chunk the audio input into smaller pieces, get the structured output for each, and then piece them back together. Inconvenient, but probably faster than waiting for an increased output token limit.

Cheers


Hi @jkirstaetter

Thanks for the idea. The thought of breaking the audio file into smaller chunks did cross my mind after I posted. However, splitting audio files can remove context needed for speaker identification and throw off the timestamps returned for spoken dialogue.

So, I settled on chunking the transcriptions.

Here’s what I’m currently doing:

  1. Create a cache with the large audio file.
  2. Prompt the model to return the transcription for a 10-minute window, repeating until the end of the audio is reached.
  3. Combine the transcriptions once I have all of them.
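
The windowing loop in steps 2 and 3 can be sketched as follows. This is a minimal, runnable illustration of the chunk-and-combine logic only: `transcribe_window` is a hypothetical stand-in for the real Gemini call (which would use a model built from the cached audio content with a JSON response schema), and `WINDOW_MIN` / `AUDIO_LEN_MIN` are assumed values for the sketch.

```python
import json

WINDOW_MIN = 10      # step 2: size of each transcription window, in minutes
AUDIO_LEN_MIN = 25   # assumed total audio length for this sketch

def transcribe_window(start_min, end_min):
    """Placeholder for the real model call against the cached audio.

    In practice this would prompt a model created from the cached
    content (step 1) and return structured diarized segments; here it
    returns dummy segments so the chunking loop itself is runnable.
    """
    return [{"speaker": "A", "start": start_min, "end": end_min,
             "text": f"segment {start_min}-{end_min}"}]

def transcribe_all(total_min):
    """Step 2: walk the audio in fixed windows; step 3: combine results."""
    segments = []
    start = 0
    while start < total_min:
        end = min(start + WINDOW_MIN, total_min)
        segments.extend(transcribe_window(start, end))
        start = end
    return segments

transcript = transcribe_all(AUDIO_LEN_MIN)
print(json.dumps(transcript, indent=2))
```

The last window is clipped to the audio length, so a 25-minute file yields windows of 0–10, 10–20, and 20–25 minutes.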

I have also found this approach to be more economical than trying to resume the structured output.
