I am using raw API JSON calls via Python with httpx and json, not the SDK, and my response-parsing code was developed against the OpenAI API.
My streaming code is fluent in sending and handling anything the API might do: strict and non-strict tools (functions), parallel tool calls, structured response_format as json_schema in both strict and non-strict modes, and json_object. It is written as a chatbot example.
However, I turn all that structured stuff off here to just use function calling (not with the deprecated functions parameter, though, which Gemini's OpenAI compatibility refuses).
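For context, the request itself is nothing exotic. A minimal sketch of the setup, assuming the `GEMINI_API_KEY` environment variable and a simplified payload (my real code builds these dynamically):

```python
import json
import os

import httpx

# The Gemini OpenAI-compatibility endpoint and the standard GEMINI_API_KEY
# environment variable; adjust to your own configuration.
API_URL = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['GEMINI_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "gemini-2.0-flash-exp",
    "stream": True,
    "messages": [{"role": "user", "content": "What's the weather in Miami today?"}],
    # "tools": [...]  # tool definitions go here; see the sketch in the reproduction steps
}

with httpx.stream("POST", API_URL, headers=headers, json=payload, timeout=60) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        print(line)  # raw SSE lines: "data: {...}" chunks, then "data: [DONE]"
```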
Firstly: the documentation here is inadequate.
- There is no REST specification for structured outputs. Just Pydantic. No plain Python dict input to the SDK as response_format. No JSON schema shown as raw RESTful JSON. No strict: false, which cannot be sent via a Pydantic BaseModel as the SDK's response_format. (A raw-JSON example of what I mean follows this list.)
- There is no tool call response object example.
- There are no delta stream examples, either with or without a function being called.
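For the record, this is the REST-level body the docs never show: the raw `response_format` JSON as the OpenAI API accepts it, including `strict: false`. The schema contents here are just an illustration of mine, not from any Gemini documentation:

```json
{
  "model": "gemini-2.0-flash-exp",
  "messages": [{"role": "user", "content": "Extract the city."}],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "city_extraction",
      "strict": false,
      "schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": false
      }
    }
  }
}
```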
Then the fault: sure, I can chat with the AI after changing the API endpoint and model in my code and switching to the standard Gemini environment variable.
But: parsing the tool_call stream fails.
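For reference, this is roughly how my code turns the SSE stream into the chunk dicts that the parser below receives; a minimal sketch, with a hypothetical helper name (`iter_chunks`) standing in for my real loop, which has more error handling:

```python
import json

def iter_chunks(response):
    """Decode an SSE chat-completions stream into chunk dicts."""
    for line in response.iter_lines():
        if not line or not line.startswith("data: "):
            continue  # skip keep-alives and non-data lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(data)

# each yielded dict is then passed to process_chunk(chunk, state)
```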
Here is gemini-2.0-flash-exp's own analysis, doing the write-up work for me:
You’ve done a fantastic job of isolating the problem and providing all the necessary information. The issue is clear: the Gemini OpenAI compatibility mode is sending back a streaming response for `tool_calls` that deviates from the expected OpenAI format, specifically by omitting the `index` within each `tool_call` object.
Let’s break this down into a bug report and then propose a solution.
**Bug Report: Gemini OpenAI Compatibility Mode - Incorrect `tool_calls` Streaming Format**
**Summary:**
The Google Gemini OpenAI compatibility mode API, when returning a streaming response (`stream: true`) that includes a `tool_calls` (function call), sends a `delta` chunk that does not conform to the expected OpenAI format. Specifically, the `index` key is missing within each `tool_call` object in the `delta`. This causes parsing errors in code that correctly handles OpenAI’s streaming responses.
**Environment:**
- API: Google Gemini OpenAI compatibility mode (e.g., `https://generativelanguage.googleapis.com/v1beta/openai/chat/completions`)
- Model: `gemini-2.0-flash-exp` (and likely others)
- Streaming: `stream: true`
- Trigger: API response includes `tool_calls`
**Steps to Reproduce:**
1. Use the provided Python code with all advanced feature flags set to `False` (as described in the previous turn).
2. Set the `ai_model` to `"gemini-2.0-flash-exp"`.
3. Send a chat message that will trigger a tool call (e.g., “What’s the weather in Miami today?” when the `get_current_weather` tool is defined; a sketch of that tool definition follows this list).
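The tool definition used in the reproduction is the standard weather example. A sketch of the `tools` array as sent in the request body (property names and descriptions are my own example values):

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name, e.g. Miami"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```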
**Expected Behavior:**
The API should return a streaming `delta` chunk for `tool_calls` that conforms to the OpenAI format. Each `tool_call` object within the `delta` should have an `index` key, similar to this (inferred from OpenAI’s non-streaming response structure):
```json
{
  "delta": {
    "role": "assistant",
    "tool_calls": [
      {
        "index": 0,  // <-- Index is present in expected behavior
        "function": {
          "arguments": "{\"location\":\"Miami\",\"unit\":\"fahrenheit\"}",
          "name": "get_current_weather"
        },
        "id": "0",
        "type": "function"
      }
    ]
  },
  "index": 0
}
```
**Actual Behavior:**
The API returns a streaming `delta` chunk for `tool_calls` where the `index` key is missing within each `tool_call` object:
```json
{
  "delta": {
    "role": "assistant",
    "tool_calls": [
      {
        // "index": 0,  <-- Index is missing!
        "function": {
          "arguments": "{\"location\":\"Miami\",\"unit\":\"fahrenheit\"}",
          "name": "get_current_weather"
        },
        "id": "0",
        "type": "function"
      }
    ]
  },
  "index": 0
}
```
This causes a `KeyError: 'index'` in the `process_chunk` function when it tries to access `tool_call['index']`.
**Impact:**
Code that correctly parses OpenAI’s streaming responses will fail to parse streaming responses from the Gemini OpenAI compatibility mode when `tool_calls` are present. This breaks compatibility and requires code modifications to handle the discrepancy.
**Additional Notes:**
- The code works correctly with the actual OpenAI API.
- The Gemini OpenAI compatibility mode documentation does not specify the format of the streaming response for `tool_calls`.
**Proposed Solution:**
(The brute-force approach keyed on the model name is the AI’s idea, not mine.)
This is specific to my code and my tool event handler class, which is itself written as example usage, but you get some free code dump here.
We need to modify the `process_chunk` function to gracefully handle the missing `index` key in the `tool_calls` `delta` when using the Gemini API. We can do this by detecting the Gemini API and then providing a default index, or by using a different method to accumulate the `tool_calls` data.
Here’s the modified code:
```python
import os
from typing import Any

# ResponseState and Printer come from the surrounding chatbot example
# (a sketch of ResponseState follows this block); ai_model is a module global.

def process_chunk(
    chunk: dict[str, Any],
    state: ResponseState,
    print_client: Printer | None = None,
) -> None:
    """Processes a chunk of the streaming response and updates the state."""
    CHUNK_LOG_ENABLED = True  # Set to True to enable chunk logging
    if 'error' in chunk:
        # API returned an error
        raise Exception(f"API Error: {chunk['error']}")
    if 'usage' in chunk and not chunk.get('choices'):
        # Final usage-only chunk with no choices
        state.usage = chunk['usage']
        return
    choices = chunk.get('choices', [])
    if not choices:
        return
    choice = choices[0]
    print(choice)  # Keep the chunk logging for debugging
    delta = choice.get('delta', {})
    if 'content' in delta and delta['content'] is not None:
        content_piece = delta['content']
        state.content += content_piece
        if CHUNK_LOG_ENABLED:
            # Ensure 'token_log.txt' exists and initialize it with '[\n' if it doesn't
            if not os.path.exists('token_log.txt'):
                with open('token_log.txt', 'w', encoding='utf-8') as f:
                    f.write('[\n')
            # Append the content_piece
            with open('token_log.txt', 'a', encoding='utf-8') as f:
                f.write(f' {repr(content_piece)},\n')
        # This is where the output happens
        if not print_client:
            print(content_piece, end='', flush=True)
        else:
            print_client.word(content_piece)  # line printer with wrapping
    if 'function_call' in delta and delta['function_call']:
        # Accumulate the function_call parts as they may be streamed in chunks
        for key, value in delta['function_call'].items():
            state.function_call[key] = state.function_call.get(key, '') + value
    if 'tool_calls' in delta and delta['tool_calls']:
        for tool_call in delta['tool_calls']:
            # Check if we are using the Gemini API
            if "gemini" in ai_model:  # Use the global ai_model to detect
                # Gemini API: assume index 0 because it is not provided
                index = 0
            else:
                # OpenAI API: expect index to be present
                index = tool_call['index']
            accumulated_tool_call = state.tool_calls.get(index, {
                'index': index,
                'id': tool_call.get('id'),
                'type': tool_call.get('type'),
                'function': {},
            })
            if 'id' in tool_call:
                accumulated_tool_call['id'] = tool_call['id']
            if 'type' in tool_call:
                accumulated_tool_call['type'] = tool_call['type']
            if 'function' in tool_call and tool_call['function']:
                # Concatenate streamed name/arguments fragments
                for k, v in tool_call['function'].items():
                    accumulated_tool_call['function'][k] = (
                        accumulated_tool_call['function'].get(k, '') + v
                    )
            state.tool_calls[index] = accumulated_tool_call
    if choice.get('finish_reason'):
        state.finish_reason = choice['finish_reason']
        if print_client:
            print_client.reset()  # clear position tracking
```
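The `ResponseState` accumulator referenced above is part of my chatbot example and not shown in full here; a minimal sketch of what the function assumes it looks like, with field names inferred from the code above:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ResponseState:
    """Accumulates one assistant turn across streamed chunks (sketch)."""
    content: str = ""  # concatenated text deltas
    function_call: dict[str, str] = field(default_factory=dict)  # legacy function_call parts
    tool_calls: dict[int, dict[str, Any]] = field(default_factory=dict)  # keyed by index
    finish_reason: str | None = None
    usage: dict[str, Any] | None = None
```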
**Explanation of Changes:**
- API Detection: We’re using a simple heuristic: checking if `"gemini"` is present in the `ai_model` string. This is a reasonable assumption for now, but a more robust method might involve checking the API endpoint URL or adding a separate global variable to explicitly specify the API being used.
- Conditional Index Handling:
  - Gemini API: If we detect the Gemini API, we assume an `index` of `0` because it’s missing. Since Gemini doesn’t seem to support multiple tool calls in a single response (at least not yet), this assumption is safe for now. We then proceed to accumulate the `tool_call` data as before.
  - OpenAI API: If it’s not the Gemini API (i.e., it’s the real OpenAI API), we use the existing logic that expects the `index` to be present.
**Advantages of this Solution:**
- Compatibility: It maintains compatibility with both the real OpenAI API and the Gemini OpenAI compatibility mode.
- Minimal Changes: It only modifies the `process_chunk` function, keeping the changes localized.
- Handles Missing Index: It gracefully handles the missing `index` in the Gemini response.
**Further Improvements:**
- Robust API Detection: As mentioned, a more robust API detection mechanism might be desirable in the future.
- Multiple Tool Calls (Future-Proofing): If Gemini starts supporting multiple tool calls in a single response, you’ll need to revise the logic to handle multiple indices even when they are not explicitly provided. You might need to infer the index based on the order of the `tool_calls` in the `delta` or use a different accumulation strategy (see the sketch after this list).
- Error Handling: You could add more specific error handling in the Gemini case. For example, if you encounter a situation where you need to infer an index other than 0 and it leads to unexpected behavior, you could raise a custom exception or log a warning.
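One possible shape for that order-based inference, a sketch only (the helper name `infer_indices` is my own), assuming Gemini keeps emitting complete tool calls without indices:

```python
def infer_indices(
    delta_tool_calls: list[dict], next_index: int
) -> list[tuple[int, dict]]:
    """Pair each tool_call with an index, inferring by position when absent.

    Uses the explicit 'index' when present (OpenAI); otherwise falls back
    to arrival order within this delta, offset by a running counter the
    caller maintains, e.g. len(state.tool_calls) (Gemini assumption).
    """
    paired = []
    for offset, tool_call in enumerate(delta_tool_calls):
        index = tool_call.get('index', next_index + offset)
        paired.append((index, tool_call))
    return paired
```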
This revised code, along with the detailed bug report, provides a solid foundation for working with both OpenAI and Gemini APIs while addressing the specific issue you encountered. Remember to thoroughly test your code after implementing these changes!
I haven’t dumped out the full set of Gemini stream chunks here as API-compatibility documentation and a side-by-side comparison.
**Actual solution:**
Google ups its compatibility game to emulate OpenAI API responses completely: streaming included, tool calls included, parallel-tool-call readiness included.
The OpenAI SDK also has a beta streaming helper, a parser and collector that emits events, which would also need to be satisfied when the SDK version being targeted for compatibility is that recent.
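For anyone matching that newer SDK surface, the helper in question looks roughly like this; a sketch assuming a recent openai-python release (event type names quoted from memory, so verify against the version you pin):

```python
from openai import OpenAI

client = OpenAI()  # or base_url/api_key pointed at a compatibility endpoint

# The SDK's beta streaming helper wraps the raw chunk stream in typed events.
with client.beta.chat.completions.stream(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Miami today?"}],
) as stream:
    for event in stream:
        if event.type == "content.delta":
            print(event.delta, end="", flush=True)  # incremental text
    # Assembled final ChatCompletion, tool_calls included, built by the helper
    completion = stream.get_final_completion()
```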