Flash 2.5 (Vertex AI) transcription degrades with repeated [unclear], exhausting output tokens (recent regression)

I’m seeing a reliability regression with Flash 2.5 on Vertex AI, used via batch requests for audio transcription through the google-genai package.

Over the last 1–1.5 months, the model has begun repeatedly emitting the literal token [unclear] for unclear or noisy portions of audio. This repetition consumes the full max_output_tokens budget before the transcription completes, which results in:

  • Truncated or malformed JSON output

  • Failure to satisfy the configured response_schema

  • High overall transcription error rates

The same pipeline was noticeably more stable before this timeframe, with far fewer [unclear] repetitions and consistently complete structured JSON responses.


Setup details:

  • Platform: Vertex AI

  • Model: Flash 2.5

  • Task: Audio transcription

  • Output format: application/json

  • Response schema: Enabled (JSON Schema)

  • Thinking mode: Enabled


Observed behavior:

  • [unclear] is emitted repeatedly instead of being minimized or aggregated

  • Output tokens are exhausted before transcription completes

  • JSON response is cut off mid-generation

Increasing max_output_tokens reduces but does not fully resolve the issue.
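As a partial workaround I’ve been post-processing raw outputs to detect this failure mode and salvage readable text. A minimal sketch (helper names and the threshold are my own, not from the SDK):

```python
import re

# Matches runs of 3 or more consecutive [unclear] markers.
UNCLEAR_RUN = re.compile(r"(\[unclear\]\s*){3,}")

def has_unclear_runaway(text: str, threshold: int = 50) -> bool:
    """Flag outputs where [unclear] dominates, a sign the token
    budget was likely exhausted by repetition."""
    return text.count("[unclear]") >= threshold

def collapse_unclear(text: str) -> str:
    """Collapse runs of repeated [unclear] markers into a single one,
    so a salvaged partial transcript stays readable."""
    return UNCLEAR_RUN.sub("[unclear] ", text)
```

This only cleans up the symptom; flagged items still have to be retried, since the repetition itself consumes the output budget during generation.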


Expected behavior:

  • [unclear] output should be limited or aggregated

  • The model should prioritize completing a valid JSON response

  • Transcription should complete reliably within the token budget


Questions for the community / Google team:

  1. Has there been a recent change or regression in Flash 2.5 transcription behavior?

  2. Is repeated [unclear] output expected for this model?

  3. Are there recommended mitigations (prompting, schema changes, chunking, model choice)?

  4. Is Flash 2.5 currently recommended for transcription workloads on Vertex AI?
