I see a few issues with this API.
- The API does not conform to the spec ( Chat | OpenAI API Reference ). The spec states in bold that usage data will optionally be present only in the last chunk, but this API returns usage data in every chunk. Applications that are OpenAI API compliant and allow switching models therefore report inflated token usage with Gemini; in my application I see 50-60x inflation.
- completion_tokens, prompt_tokens, and total_tokens do not add up. In the final chunk below, 3 completion tokens + 9 prompt tokens = 12, yet total_tokens is 105.
- Token counts are roughly expected to be character count / 4, but sometimes the reported token count equals the character count.
I can reproduce this with a simple curl command:
curl -s "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" -H "Authorization: Bearer <Insert Your API Key Here>" -H "Content-Type: application/json" -d '{"model": "gemini-3-flash-preview", "messages": [{"role": "user", "content": "Count from 1 to 5."}], "stream": true, "stream_options": {"include_usage": true}, "max_tokens": 100}'
I see 3 chunks each containing usage data:
data: {"choices":[{"delta":{"content":"1","role":"assistant"},"index":0}],"created":1772149571,"id":"Q9ugadePBtbAqtsPx5e1iAE","model":"gemini-3-flash-preview","object":"chat.completion.chunk","usage":{"completion_tokens":1,"prompt_tokens":9,"total_tokens":103}}
data: {"choices":[{"delta":{"content":", ","role":"assistant"},"index":0}],"created":1772149571,"id":"Q9ugadePBtbAqtsPx5e1iAE","model":"gemini-3-flash-preview","object":"chat.completion.chunk","usage":{"completion_tokens":3,"prompt_tokens":9,"total_tokens":105}}
data: {"choices":[{"delta":{"extra_content":{"google":{"thought_signature":"<redacted>"}},"role":"assistant"},"finish_reason":"length","index":0}],"created":1772149571,"id":"Q9ugadePBtbAqtsPx5e1iAE","model":"gemini-3-flash-preview","object":"chat.completion.chunk","usage":{"completion_tokens":3,"prompt_tokens":9,"total_tokens":105}}
data: [DONE]
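To make the inflation concrete, here is a minimal Python sketch (usage values copied from the chunks above) of a client that sums every usage object it sees. Under the OpenAI spec that is safe, because only the final chunk carries usage; with this API it over-counts:

```python
# Usage objects from the three chunks in the curl output above.
chunk_usages = [
    {"completion_tokens": 1, "prompt_tokens": 9, "total_tokens": 103},
    {"completion_tokens": 3, "prompt_tokens": 9, "total_tokens": 105},
    {"completion_tokens": 3, "prompt_tokens": 9, "total_tokens": 105},
]

# A spec-compliant accumulator: add usage whenever a chunk includes it.
# With OpenAI only the last chunk has usage, so this sum is correct there.
summed_total = sum(u["total_tokens"] for u in chunk_usages)
print(summed_total)  # 313 -- inflated, because every chunk carried usage

# The value the API presumably intends is whatever the last chunk reports.
final_total = chunk_usages[-1]["total_tokens"]
print(final_total)  # 105
```

Even the single-request inflation here is ~3x; over long streams with many chunks it compounds to the 50-60x I observe in practice.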
Am I missing something?