Flash 2.5 (Vertex AI) transcription degrades with repeated [unclear], exhausting output tokens (recent regression)

I’m seeing a reliability regression when using Flash 2.5 on Vertex AI using batching for audio transcription via the google-genai package.

Over the last 1–1.5 months, the model has begun repeatedly emitting the literal token [unclear] for unclear or noisy portions of audio. This repetition consumes the full max_output_tokens budget before the transcription completes, which results in:

  • Truncated or malformed JSON output

  • Failure to satisfy the configured response_schema

  • High overall transcription error rates

This same pipeline was noticeably more stable prior to this timeframe, with significantly fewer [unclear] repetitions and successful completion of structured JSON responses.


Setup details:

  • Platform: Vertex AI

  • Model: Flash 2.5

  • Task: Audio transcription

  • Output format: application/json

  • Response schema: Enabled (JSON Schema)

  • Thinking mode: Enabled


Observed behavior:

  • [unclear] is emitted repeatedly instead of being minimized or aggregated

  • Output tokens are exhausted before transcription completes

  • JSON response is cut off mid-generation

Increasing max_output_tokens reduces but does not fully resolve the issue.


Expected behavior:

  • [unclear] output should be limited or aggregated

  • The model should prioritize completing a valid JSON response

  • Transcription should complete reliably within the token budget


Questions for the community / Google team:

  1. Has there been a recent change or regression in Flash 2.5 transcription behavior?

  2. Is repeated [unclear] output expected for this model?

  3. Are there recommended mitigations (prompting, schema changes, chunking, model choice)?

  4. Is Flash 2.5 currently recommended for transcription workloads on Vertex AI?

Hey Google Team, Can anyone help in this issue as I am facing this issue as well