I’m seeing a reliability regression when using Gemini 2.5 Flash on Vertex AI with batching for audio transcription via the `google-genai` package.
Over the last 1–1.5 months, the model has begun repeatedly emitting the literal token `[unclear]` for unclear or noisy portions of audio. This repetition consumes the full `max_output_tokens` budget before the transcription completes, which results in:

- Truncated or malformed JSON output
- Failure to satisfy the configured `response_schema`
- High overall transcription error rates
This same pipeline was noticeably more stable prior to this timeframe, with significantly fewer `[unclear]` repetitions and successful completion of structured JSON responses.
Setup details:

- Platform: Vertex AI
- Model: Gemini 2.5 Flash
- Task: Audio transcription
- Output format: `application/json`
- Response schema: Enabled (JSON Schema)
- Thinking mode: Enabled
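For reference, a minimal sketch of how each batch item is generated. The project ID, bucket path, schema, and token/thinking budgets below are placeholder values for illustration, not my exact configuration:

```python
from google import genai
from google.genai import types

# Placeholder identifiers for illustration only.
PROJECT = "my-project"
AUDIO_URI = "gs://my-bucket/audio.wav"

# Mirrors the setup above: JSON output, a response schema,
# thinking enabled, and a capped output-token budget.
config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema={
        "type": "OBJECT",
        "properties": {"transcript": {"type": "STRING"}},
        "required": ["transcript"],
    },
    max_output_tokens=8192,
    thinking_config=types.ThinkingConfig(thinking_budget=1024),
)

client = genai.Client(vertexai=True, project=PROJECT, location="us-central1")
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_uri(file_uri=AUDIO_URI, mime_type="audio/wav"),
        "Transcribe this audio.",
    ],
    config=config,
)
print(response.text)  # often truncated mid-JSON when [unclear] repeats
```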
Observed behavior:

- `[unclear]` is emitted repeatedly instead of being minimized or aggregated
- Output tokens are exhausted before the transcription completes
- The JSON response is cut off mid-generation

Increasing `max_output_tokens` reduces but does not fully resolve the issue.
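As a partial workaround I post-process the outputs that do parse, collapsing runs of the marker; a minimal sketch (the exact marker string and spacing are assumptions about the output format, and this only helps when the JSON survives intact):

```python
import re

def collapse_unclear(text: str) -> str:
    # Collapse consecutive "[unclear]" markers, and any whitespace
    # between them, into a single marker.
    return re.sub(r"(?:\[unclear\]\s*){2,}", "[unclear] ", text).strip()

collapse_unclear("word [unclear] [unclear] [unclear] word")
# -> "word [unclear] word"
```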
Expected behavior:

- `[unclear]` output should be limited or aggregated
- The model should prioritize completing a valid JSON response
- Transcription should complete reliably within the token budget
Questions for the community / Google team:

- Has there been a recent change or regression in Gemini 2.5 Flash transcription behavior?
- Is repeated `[unclear]` output expected for this model?
- Are there recommended mitigations (prompting, schema changes, chunking, model choice)?
- Is Gemini 2.5 Flash currently recommended for transcription workloads on Vertex AI?
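On the chunking question: for context, this is the kind of fixed-length split I would try, using only the stdlib `wave` module. The 60-second chunk length is an arbitrary choice, and a production split would probably want overlap so words aren't cut at chunk boundaries:

```python
import wave

def chunk_wav_frames(path: str, chunk_seconds: int = 60) -> list[bytes]:
    """Split a WAV file into fixed-length runs of raw PCM frames."""
    chunks = []
    with wave.open(path, "rb") as wav:
        frames_per_chunk = wav.getframerate() * chunk_seconds
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            chunks.append(frames)
    return chunks
```

Each chunk would then be submitted as its own batch item, which keeps any single response well inside the token budget even when `[unclear]` repetition occurs.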