Gemini-3-flash-preview: Truncated/Garbage Output, Hallucination, and Incomplete Tool Calls in Production Testing

I’m testing gemini-3-flash-preview for a customer support AI agent (accessed via Google Generative AI API through n8n). I hit several issues that make it unusable for production.

Main issue: Garbage final output

In a multi-step agentic workflow, the model generated a correct response in one execution run (289 completion tokens). On the very next run with the same setup, it returned just `**` (two asterisks) as its final output, with only 3 completion tokens. No error, just garbage. This looks similar to issues reported in the gemini-cli GitHub repo (#10665, #7851) about empty responses and invalid chunks.

Other issues I encountered:

  1. Hallucinated data - The model made up a phone number that doesn’t exist. For customer support, this is a dealbreaker.

  2. Skipped tool calls - Sometimes the model skips tools it should call, jumping straight to a response without retrieving the data it needs.
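For the skipped tool calls specifically, the public Gemini API exposes a function-calling mode that forces the model to call a tool rather than answer directly; I haven't verified whether n8n's Gemini Chat Model node surfaces this setting. The request fragment (the function name here is a placeholder for my own tools) looks roughly like:

```json
{
  "tool_config": {
    "function_calling_config": {
      "mode": "ANY",
      "allowed_function_names": ["lookup_customer"]
    }
  }
}
```

Per the API docs, `"AUTO"` is the default (the model decides whether to call a function), while `"ANY"` forces it to call one of the allowed functions.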

What I tried:

  • Temperature at 1.0 (per docs)

  • Multiple test runs

Happy to share logs if useful.
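One workaround I'm considering while this is investigated: a small sanity check between the agent and the WhatsApp send, so obviously broken outputs like `**` trigger a retry instead of reaching the customer. A minimal sketch (the helper names, threshold, and fallback text are my own, not anything from n8n or the Gemini SDK):

```python
def looks_like_garbage(text: str, min_chars: int = 10) -> bool:
    """Heuristic: flag outputs that are empty or just stray markdown punctuation."""
    # Strip common markdown tokens the model sometimes emits alone ("**", "*", "```")
    cleaned = text.strip().strip("*`#_- \n")
    return len(cleaned) < min_chars

def reply_with_retry(generate, max_retries: int = 3,
                     fallback: str = "Sorry, something went wrong. Please try again.") -> str:
    """Call the model (via any zero-arg callable) and retry when the output looks broken."""
    for _ in range(max_retries):
        text = generate()
        if not looks_like_garbage(text):
            return text
    return fallback

# The failure case from this thread is caught, a normal reply passes:
assert looks_like_garbage("**")
assert not looks_like_garbage("Hi! Your order #1234 ships tomorrow.")
```

This obviously doesn't fix the underlying model behavior, but it keeps the `**` output from ever being sent to a customer.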


Hi @dretana, thanks for reaching out to us.

To help us diagnose this issue, could you share a bit more detail about how the final response is being generated, along with a screenshot of any relevant output or logs?

Hello! Thanks for looking into this. Here’s more context:

Setup:

  • n8n workflow with an AI Agent node running gemini-3-flash-preview through the Google Gemini Chat Model node (Google Generative AI API)

  • Agent has tools connected (for retrieving customer data, knowledge base lookups, etc.)

  • WhatsApp Business integration receives customer messages and triggers the workflow

  • Agent response is sent back to customer via WhatsApp

What happened: The workflow executed 3 runs for a single customer query:

  • Run 1: Tool calls executed

  • Run 2: Model generated a correct, complete response (289 tokens)

  • Run 3: Model output just `**` (3 tokens) - this is what got sent to the customer

The problem is that the workflow forwarded the output of Run 3 to the customer instead of the complete response from Run 2.

Attaching:

  1. Screenshot of the WhatsApp conversation showing the `**` output received by the customer

  2. Screenshot of n8n execution showing the 3 runs and their outputs