Gemini 3 uses Thought signatures to maintain reasoning context across API calls. These signatures are encrypted representations of the model’s internal thought process. To ensure the model maintains its reasoning capabilities you must return these signatures back to the model in your request exactly as they were received:
- Function Calling (Strict): The API enforces strict validation on the “Current Turn”. Missing signatures will result in a 400 error.
A stateless API should NOT be forcing the “thought_signature” to be returned. There should be no server-side memory of that call, of its expected shape, or of whether the AI even did any thinking. That makes it essentially impossible to show you code for “how to return a function”. That’s what they did, though. It’s “-preview” time, so you can complain.
How about I get the AI to emit a tool call that it didn’t think too hard about, and replay the most minimal one I can find? Is the tool_calls content also inside the encrypted signature itself, so that the server can reject a mismatch?
What is important is that you capture an additional, undocumented key from the assistant message, wherever it appears. The AI first “reasons” and then proceeds to transmitted output, and Chat Completions only generates events when it has something to send, so it stands to reason that the key shows up in the first delta chunk if you don’t have summaries, as in the SSE captured here:
response to user
We get “extra_content” in the first “delta”
data: {"choices":[{"delta":{"content":"I can call the function get_current_weather.\n\nThis function allows me to retrieve the current weather conditions for a specified location in continental US or S Canada, in either Celsius or Fahrenheit.","extra_content":{"google":{"thought_signature":"CrICAdHtim827fQ...
with summary
This should also be just a delta like above, because the thought summaries for the UI are transmitted in-band. The complete encrypted “extra_content” cannot arrive until that thinking is done, though.
That signals that you must look for this key anywhere. You then also have to figure out what future discipline Google may have for other “google” fields: whether they are keys to be collected or text to be appended.
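Since the key can turn up in more than one place, a minimal sketch of “look for this anywhere” is a recursive walk over each parsed chunk. The function name `find_thought_signatures` and the `SIG_TEXT` / `SIG_TOOL` placeholder strings are my own for illustration; the only things taken from the captures are the `extra_content` → `google` → `thought_signature` nesting and the two places it was seen.

```python
def find_thought_signatures(node, path=()):
    """Yield (path, signature) for every extra_content.google.thought_signature
    found anywhere under `node` (a parsed chunk, delta, or message)."""
    if isinstance(node, dict):
        extra = node.get("extra_content")
        google = extra.get("google") if isinstance(extra, dict) else None
        if isinstance(google, dict) and "thought_signature" in google:
            yield path, google["thought_signature"]
        for key, value in node.items():
            if key != "extra_content":
                yield from find_thought_signatures(value, path + (key,))
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from find_thought_signatures(value, path + (position,))


# Placeholder signatures; the real ones are long opaque strings.
text_chunk = {"choices": [{"delta": {
    "content": "I can call the function get_current_weather.",
    "extra_content": {"google": {"thought_signature": "SIG_TEXT"}},
}}]}
tool_chunk = {"choices": [{"delta": {"role": "assistant", "tool_calls": [{
    "extra_content": {"google": {"thought_signature": "SIG_TOOL"}},
    "function": {"arguments": "{}", "name": "get_current_weather"},
    "id": "function-call-1",
    "type": "function",
}]}, "index": 0}]}

for chunk in (text_chunk, tool_chunk):
    for path, signature in find_thought_signatures(chunk):
        print(path, signature)
```

The returned path tells you where the signature lived, which matters because it apparently has to go back in the same position it came from.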
tool call
Bizarro land: the extra content has moved inside the list of tool calls. You’ll see that tool call 0 carries it now.
data: {"choices":[{"delta":{"role":"assistant","tool_calls":[{"extra_content":{"google":{"thought_signature":"CvcQAdHtim/pKv/c...0ClPFkYA=="}},"function":{"arguments":"{\"location\":\"Intercourse, PA\",\"unit\":\"fahrenheit\"}","name":"get_current_weather"},"id":"function-call-5873527561210830497","type":"function"}]},"index":0}],"created":1763623955,"id":"E8QeabuMK8WMjMcPsam5sQw","model":"gemini-flash-latest","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":116,"total_tokens":660}}
data: {"choices":[{"delta":{"role":"assistant"},"finish_reason":"stop","index":0}],"created":1763623955,"id":"E8QeabuMK8WMjMcPsam5sQw","model":"gemini-flash-latest","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":116,"total_tokens":660}}
data: [DONE]
Google also doesn’t comply with the Chat Completions spec here, because the streamed tool_calls items don’t have an “index” key. The OpenAI SDK could start validating this tomorrow, just as my own raw code did before it had to be altered, and break every Google user of the compatibility mode.
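For raw code that collects streamed tool calls, one way to tolerate the missing “index” is sketched below. This is not the author’s actual change, just an illustration: `merge_tool_call_deltas` is a hypothetical helper, and the fallback rule (group by “index” when present, otherwise start a new call whenever a fragment carries an “id”) is an assumption that fits the capture above, where the whole call arrives in one chunk.

```python
def merge_tool_call_deltas(deltas):
    """Merge streamed tool_call fragments into complete tool calls.

    Groups fragments by "index" when the server sends one; otherwise a
    fragment with an "id" starts a new call, and a fragment with neither
    is appended to the most recent call.
    """
    calls = {}       # insertion-ordered: grouping key -> accumulated call
    last_key = None
    for delta in deltas:
        for fragment in delta.get("tool_calls") or []:
            if "index" in fragment:
                key = ("index", fragment["index"])
            elif fragment.get("id"):
                key = ("id", fragment["id"])
            else:
                key = last_key
            if key is None:
                continue  # continuation fragment with nothing to attach to
            last_key = key
            call = calls.setdefault(key, {
                "id": None,
                "type": "function",
                "function": {"name": "", "arguments": ""},
            })
            if fragment.get("id"):
                call["id"] = fragment["id"]
            if fragment.get("type"):
                call["type"] = fragment["type"]
            function = fragment.get("function") or {}
            if function.get("name"):
                call["function"]["name"] = function["name"]
            call["function"]["arguments"] += function.get("arguments") or ""
            if "extra_content" in fragment:
                # Keep the signature attached to the call it arrived with.
                call["extra_content"] = fragment["extra_content"]
    return list(calls.values())


# Deltas shaped like the capture above (signature replaced by a placeholder).
google_deltas = [
    {"role": "assistant", "tool_calls": [{
        "extra_content": {"google": {"thought_signature": "SIG_TOOL"}},
        "function": {
            "arguments": "{\"location\":\"Intercourse, PA\",\"unit\":\"fahrenheit\"}",
            "name": "get_current_weather",
        },
        "id": "function-call-1",
        "type": "function",
    }]},
    {"role": "assistant"},
]
merged = merge_tool_call_deltas(google_deltas)
print(merged[0]["function"]["name"], merged[0]["extra_content"])
```

The same function still handles fragments that do carry an “index” and split their arguments across chunks, so one code path can serve both kinds of endpoint.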
Tool call made pretty, 2800 characters of encrypted “thoughts” removed:
{
  "choices": [
    {
      "delta": {
        "role": "assistant",
        "tool_calls": [
          {
            "extra_content": {
              "google": {
                "thought_signature": "CvcQAdHN2OekY10ClPFkYA=="
              }
            },
            "function": {
              "arguments": "{\"location\":\"Intercourse, PA\",\"unit\":\"fahrenheit\"}",
              "name": "get_current_weather"
            },
            "id": "function-call-5873527561210830497",
            "type": "function"
          }
        ]
      },
      "index": 0
    }
  ],
  "created": 1763623955,
  "id": "E8QeabuMK8WMjMcPsam5sQw",
  "model": "gemini-flash-latest",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 28,
    "prompt_tokens": 116,
    "total_tokens": 660
  }
}
The AI reasoned (about getting the weather for the two funniest city names) and didn’t make parallel calls, which both it and my code can do. After I returned the result, it made another tool call for “Hell, MI”, with different encrypted content.
You’ll see that the “tool_calls” is now what contains the “extra_content”. Not “delta”.
Send it all right back as an “assistant” message in the same shape, with the streamed chunks collected to emulate the non-streaming response.
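A rough sketch of that replay, assuming the request side accepts “extra_content” in the same positions it arrived in (inside the tool call, or on the message itself for a plain text reply). The helper names, the placeholder signature, and the made-up tool output are all hypothetical; nothing here is a documented request schema.

```python
def build_assistant_message(content=None, tool_calls=None, extra_content=None):
    """Rebuild the assistant turn from collected stream data, leaving any
    extra_content exactly where it was received."""
    message = {"role": "assistant", "content": content or None}
    if extra_content:
        # Seen on the delta itself when the model answered with plain text.
        message["extra_content"] = extra_content
    if tool_calls:
        # Each tool call keeps its own extra_content untouched.
        message["tool_calls"] = tool_calls
    return message


def build_tool_result(tool_call, result_text):
    """Standard Chat Completions tool message answering one tool call."""
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result_text}


collected_call = {
    "id": "function-call-1",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\":\"Intercourse, PA\",\"unit\":\"fahrenheit\"}",
    },
    # Placeholder; pass the real signature through byte for byte.
    "extra_content": {"google": {"thought_signature": "SIG_TOOL"}},
}

messages = [
    {"role": "user", "content": "Weather for the two funniest city names?"},
    build_assistant_message(tool_calls=[collected_call]),
    build_tool_result(collected_call, "66F and Sunny"),  # made-up tool output
]
print(messages[1]["tool_calls"][0]["extra_content"])
```

The point of the design is that the code never inspects or rewrites the signature; it only carries it along with the call it belongs to.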
That’s a fun request! I’ve checked the weather for two US cities with
wonderfully silly names.
- Intercourse, Pennsylvania: The current weather in Intercourse, PA is 66°F and Sunny.
- Hell, Michigan: The current weather in Hell, MI is reported to be 666°F and Fiery.
Enjoy the juxtaposition of a pleasant time in Intercourse and a very bad
time in Hell!
(Using flash because I haven’t sorted out why my API key is nowhere to be found on the site now that it has “projects”. The non-tool extras were coming yesterday with no extra request parameters.)