I am developing a product that uses a Gemini API key through the OpenAI SDK. To calculate cost, I read prompt_tokens, completion_tokens, and total_tokens from each API call, expecting prompt_tokens + completion_tokens = total_tokens. However, total_tokens is much larger than the sum of the other two. I have used two models, gemini-2.5-pro and gemini-2.5-flash, with structured output. Is there a mistake on my side, or does Google include thinking tokens in the output count?
Hi @hung_hoang_dinh ,
Refer to: Understand and count tokens | Gemini API | Google AI for Developers
Thanks!
It turns out there are also reasoning (thinking) tokens. When using the OpenAI-compatible format, completion_tokens doesn't include the reasoning tokens, even though they are counted as output for billing, so total_tokens ends up larger than prompt_tokens + completion_tokens. It's probably a minor compatibility quirk.
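For anyone hitting the same discrepancy, the thinking-token count can be estimated directly from the usage object. A minimal sketch (field names follow the OpenAI SDK's usage payload; the sample numbers are made up for illustration, not real billing data):

```python
def estimate_thinking_tokens(usage: dict) -> int:
    """Tokens billed as output but missing from completion_tokens,
    i.e. the model's hidden reasoning/thinking tokens."""
    return usage["total_tokens"] - usage["prompt_tokens"] - usage["completion_tokens"]

# Hypothetical usage payload from one chat.completions call:
usage = {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 310}
print(estimate_thinking_tokens(usage))  # 145
```

For cost calculation, the output side would then be completion_tokens plus this estimate, which is simply total_tokens - prompt_tokens.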