Has anybody succeeded in getting Gemini to say something before calling functions?

In Voice AI applications, latency is critical, so whenever the model calls a function (e.g., an API request), the trick is to instruct it to say a filler like “Let me check that for you.” first. This works well for OpenAI’s and Anthropic’s models, but with Gemini it has never been reliable enough for me: the model often calls the function first and only then outputs the filler.

I am wondering whether somebody managed to make this work.

Thanks a lot in advance for your discussion!

A brief description of how you use the API would help people give useful answers.
The answer to this question may differ depending on the scenario.

Since you’re describing Voice AI, let’s assume you’re using the Gemini Live API and need to make a long-running tool call.

You’d like to provide a response to the user first.

A simple approach in this scenario is asynchronous tool calls.

Upon receiving the tool call, immediately send back a tool response message: “Please wait.”

Then, after execution, send the result to the Gemini Live API in text format.

This way, the Gemini Live API will ask the user to wait before describing the tool’s result.
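
The flow above can be sketched roughly as follows. This is only an illustration of the ordering (placeholder response first, real result later as text), not Gemini Live API code: the `send` callable, the message dictionaries, and `slow_tool` are all hypothetical stand-ins, and the real wire format will differ.

```python
import asyncio

PLACEHOLDER = "Please wait."


async def slow_tool(city: str) -> str:
    """Hypothetical stand-in for a long-running API request."""
    await asyncio.sleep(0.05)
    return f"Weather in {city}: sunny"


async def handle_tool_call(call_id: str, city: str, send) -> None:
    # 1. Answer the tool call right away so the model can talk to the user.
    #    (Hypothetical message shape, not the Live API format.)
    await send({"type": "tool_response", "id": call_id, "output": PLACEHOLDER})
    # 2. Do the slow work after the placeholder has been sent.
    result = await slow_tool(city)
    # 3. Deliver the real result as an ordinary text message.
    await send({"type": "text", "text": f"Result for {call_id}: {result}"})


async def demo() -> list:
    sent = []

    async def send(message: dict) -> None:
        # Record messages instead of writing to a real session.
        sent.append(message)

    await handle_tool_call("call-1", "Paris", send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(demo()):
        print(message)
```

In a real session you would replace `send` with whatever your client uses to write to the connection; the point is only that the placeholder goes out before the slow call starts.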

Thanks. I’m not even talking about Gemini Live.

Please try running the following minimal reproducible example several times. You’ll see that Gemini very often does not say anything before calling the functions:

import os

from openai import OpenAI

from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

weather_description = (
    'Retrieves current weather for the given location. Before calling this function, '
    'it says "Let me check the weather for you..."'
)
population_description = (
    "Retrieves current population for the given city. Before calling this function, "
    "it says 'Let me check the population for you...' Args: city (str): Name of the "
    "city. Returns: float: The population in millions of people."
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": weather_description,
            "parameters": {
                "type": "object",
                "properties": {
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                },
                "required": ["latitude", "longitude"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_population",
            "description": population_description,
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]

messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant.\n"
            "You have the following tools at your disposal:\n"
            f"- get_weather: {weather_description}\n"
            f"- get_population: {population_description}\n"
            "\nIMPORTANT: You must call multiple functions at the same time. "
            "EXTREMELY IMPORTANT: BEFORE calling the tools, say 'Let me check that for you...'."
        ),
    },
    {
        "role": "user",
        "content": "Please check the weather and the current population in Paris.",
    },
]

stream = client.chat.completions.create(
    model="gemini-3.1-flash-lite",
    messages=messages,
    tools=tools,
    stream=True,
)

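for chunk in stream:
    # Some chunks may arrive without any choices; skip them.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Text printed before any tool call is the filler we are looking for.
    if delta.content:
        print("** TEXT:", repr(delta.content))
    for tool_call in delta.tool_calls or []:
        print("** TOOL:", tool_call)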

In my personal experience, Gemini does tend to avoid outputting text during tool use rounds.

One workable approach I’ve found:

Add a tool such as progress_update({message: ...}).

Then, in the prompt, add a rule for parallel tool use requiring that progress_update be called alongside any other tool to report progress.

This way, Gemini calls progress_update in parallel with the real tool.

You can then read a message like “Please wait” from progress_update.args.message and display it to the user.
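
Here is a minimal sketch of that idea in the same OpenAI-compatible format as the example above. The tool description wording and the `split_tool_calls` helper are my own assumptions, and the sketch assumes the tool calls have already been fully assembled (with streaming, the `arguments` string arrives in fragments that you need to concatenate first).

```python
import json

# Hypothetical extra tool whose only job is to carry the filler sentence.
progress_update_tool = {
    "type": "function",
    "function": {
        "name": "progress_update",
        "description": (
            "Reports progress to the user. Always call it in parallel "
            "with any other tool."
        ),
        "parameters": {
            "type": "object",
            "properties": {"message": {"type": "string"}},
            "required": ["message"],
        },
    },
}


def split_tool_calls(tool_calls):
    """Separate filler messages from the calls that need real execution."""
    fillers, real_calls = [], []
    for call in tool_calls:
        name = call["function"]["name"]
        # In the chat completions format, arguments are a JSON string.
        args = json.loads(call["function"]["arguments"] or "{}")
        if name == "progress_update":
            fillers.append(args.get("message", ""))
        else:
            real_calls.append((name, args))
    return fillers, real_calls


# Made-up tool calls, shaped like an assembled model response.
example_calls = [
    {"function": {"name": "progress_update",
                  "arguments": '{"message": "Let me check that for you..."}'}},
    {"function": {"name": "get_weather",
                  "arguments": '{"latitude": 48.85, "longitude": 2.35}'}},
]

fillers, real_calls = split_tool_calls(example_calls)
for text in fillers:
    print("SAY:", text)
for name, args in real_calls:
    print("RUN:", name, args)
```

You would append `progress_update_tool` to the `tools` list and speak the filler as soon as it is parsed, before the real tools finish.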