Resuming structured output after MAX_TOKENS cut-off

Is there a recommended approach for resuming structured output from the Gemini 1.5 Flash model after generation is cut off with finish reason MAX_TOKENS?

I’m transcribing audio with speaker diarization using structured outputs, but the transcription is too long to fit in a single 8,192-token response.

Hi @sps

Not that I’m aware of.
I’d rather chunk the audio input into smaller pieces, get the structured output for each, and then piece them back together. Inconvenient, but probably faster than waiting for an increased output token limit.

Cheers


Hi @jkirstaetter

Thanks for the idea. The thought of breaking the audio file into smaller chunks did cross my mind after I posted. However, splitting audio files can remove context needed for speaker identification and throw off the timestamps returned for spoken dialogue.

So, I settled on chunking the transcriptions.

Here’s what I’m currently doing:

  1. Create a cache with the large audio file.
  2. Prompt the model to return the transcription for a 10-minute window, repeating until the end of the audio is reached.
  3. Combine the transcriptions once I have all of them.
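
The windowing loop in steps 2 and 3 can be sketched as follows. This is a minimal, runnable illustration of the chunk-and-combine logic only: `transcribe_window` is a hypothetical stand-in for the real Gemini call (which would use a model built from the cached audio content with a JSON response schema), and `WINDOW_MIN` / `AUDIO_LEN_MIN` are assumed values for the sketch.

```python
import json

WINDOW_MIN = 10      # step 2: size of each transcription window, in minutes
AUDIO_LEN_MIN = 25   # assumed total audio length for this sketch

def transcribe_window(start_min, end_min):
    """Placeholder for the real model call against the cached audio.

    In practice this would prompt a model created from the cached
    content (step 1) and return structured diarized segments; here it
    returns dummy segments so the chunking loop itself is runnable.
    """
    return [{"speaker": "A", "start": start_min, "end": end_min,
             "text": f"segment {start_min}-{end_min}"}]

def transcribe_all(total_min):
    """Step 2: walk the audio in fixed windows; step 3: combine results."""
    segments = []
    start = 0
    while start < total_min:
        end = min(start + WINDOW_MIN, total_min)
        segments.extend(transcribe_window(start, end))
        start = end
    return segments

transcript = transcribe_all(AUDIO_LEN_MIN)
print(json.dumps(transcript, indent=2))
```

The last window is clipped to the audio length, so a 25-minute file yields windows of 0–10, 10–20, and 20–25 minutes.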

I have also found this approach to be more economical than trying to resume the structured output.
