I see a few issues with this API.
- The API does not conform to the spec ( Chat | OpenAI API Reference ). The spec states in bold that usage data will optionally be present only in the last chunk, but this API returns usage data in every chunk. Applications that are OpenAI API compliant and allow switching models therefore report inflated token usage with Gemini; in my application I see 50-60x inflation.
- completion_tokens, prompt_tokens, and total_tokens do not add up. In the final chunk below, 3 completion tokens + 9 prompt tokens = 12, yet total_tokens is 105.
- Token counts are roughly expected to be character count / 4, but sometimes the reported token count equals the character count.
I can reproduce this with a simple curl command:
curl -s "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" -H "Authorization: Bearer <Insert Your API Key Here>" -H "Content-Type: application/json" -d '{"model": "gemini-3-flash-preview", "messages": [{"role": "user", "content": "Count from 1 to 5."}], "stream": true, "stream_options": {"include_usage": true}, "max_tokens": 100}'
I see 3 chunks each containing usage data:
data: {"choices":[{"delta":{"content":"1","role":"assistant"},"index":0}],"created":1772149571,"id":"Q9ugadePBtbAqtsPx5e1iAE","model":"gemini-3-flash-preview","object":"chat.completion.chunk","usage":{"completion_tokens":1,"prompt_tokens":9,"total_tokens":103}}
data: {"choices":[{"delta":{"content":", ","role":"assistant"},"index":0}],"created":1772149571,"id":"Q9ugadePBtbAqtsPx5e1iAE","model":"gemini-3-flash-preview","object":"chat.completion.chunk","usage":{"completion_tokens":3,"prompt_tokens":9,"total_tokens":105}}
data: {"choices":[{"delta":{"extra_content":{"google":{"thought_signature":"<redacted>"}},"role":"assistant"},"finish_reason":"length","index":0}],"created":1772149571,"id":"Q9ugadePBtbAqtsPx5e1iAE","model":"gemini-3-flash-preview","object":"chat.completion.chunk","usage":{"completion_tokens":3,"prompt_tokens":9,"total_tokens":105}}
data: [DONE]
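To make the inflation concrete, here is a minimal Python sketch (usage values copied from the chunks above) of a client that sums every usage object it sees. Under the OpenAI spec that is safe, because only the final chunk carries usage; with this API it over-counts:

```python
# Usage objects from the three chunks in the curl output above.
chunk_usages = [
    {"completion_tokens": 1, "prompt_tokens": 9, "total_tokens": 103},
    {"completion_tokens": 3, "prompt_tokens": 9, "total_tokens": 105},
    {"completion_tokens": 3, "prompt_tokens": 9, "total_tokens": 105},
]

# A spec-compliant accumulator: add usage whenever a chunk includes it.
# With OpenAI only the last chunk has usage, so this sum is correct there.
summed_total = sum(u["total_tokens"] for u in chunk_usages)
print(summed_total)  # 313 -- inflated, because every chunk carried usage

# The value the API presumably intends is whatever the last chunk reports.
final_total = chunk_usages[-1]["total_tokens"]
print(final_total)  # 105
```

Even the single-request inflation here is ~3x; over long streams with many chunks it compounds to the 50-60x I observe in practice.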
Am I missing something?