Model: gemini-3-flash-preview
Environment: Production (via API)
Issue: Sporadic insertion of HTML tags at the end of assistant turns.
Description: We are developing an AI-powered voice agent using the Gemini 3 Flash Preview model. While the model generally follows our plain-text system instructions, it sporadically appends an HTML tag to the end of its responses.
Impact: Because this output is fed directly into a Text-to-Speech (TTS) engine, the engine tries to pronounce the tag phonetically, producing a distinct, unprofessional “g” or glottal-stop sound at the end of the agent’s speech.
Observations:
- Environment Discrepancy: The issue occurs in our Production environment but is much less common in our Local testing environment (possibly due to the high concurrent volume in Production).
- Sporadic Nature: It does not happen on every turn; it seems to occur more frequently in longer conversations or when the model is reviewing structured lists (e.g., medication schedules).
- Example: Assistant: I see, you had some tests done but haven’t received the results yet; do you remember which facility did those tests so we can help track them down for you?
Configurations Used:
- model: gemini-3-flash-preview
- temperature: 1.0
Questions for the Community:
- Is this a known “Formatting Leakage” regression in the Gemini 3 preview models?
- Why would this manifest in Production but stay relatively clean in Local testing (possibly server-side versioning or differences in default parameters)?
- Besides regex post-processing, are there specific safety_settings or stop_sequences that have successfully suppressed these web-markup hallucinations for other developers?
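For reference, the regex fallback mentioned in the last question could look roughly like the sketch below. It is a minimal illustration, not what we currently run: the function name `strip_trailing_tags` and the sample tags are hypothetical, and it assumes the stray markup always sits at the very end of the turn, as described above.

```python
import re

# One or more HTML-like tags (opening, closing, or self-closing) at the very
# end of the text, with optional whitespace before, between, and after them.
_TRAILING_TAGS = re.compile(
    r"(?:\s*</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>)+\s*$"
)


def strip_trailing_tags(text: str) -> str:
    """Remove HTML-like tags from the end of a model reply before TTS.

    Tags in the middle of the text are left alone, since only the
    end-of-turn case is reported in this thread.
    """
    return _TRAILING_TAGS.sub("", text).rstrip()


if __name__ == "__main__":
    # "</span>" is a placeholder tag chosen for illustration only.
    reply = "Do you remember which facility did those tests?</span>"
    print(strip_trailing_tags(reply))
```

One caveat with this approach: a legitimate reply that happens to end in angle-bracketed text (for example a placeholder such as `<name>`) would also be trimmed, so it is a workaround rather than a fix for the underlying behavior.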