Where exactly can I quickly check cost and usage for Gemini API calls (only)? Google Cloud Console is such a maze; I click around a lot but am never sure what I am finding. Can you please provide an exact breadcrumb path?
I have the same problem. The OpenAI billing interface is much more comfortable to use.
Hi Fred,
I believe Vertex Cloud includes it.
All I see is pricing. That is not the same as usage.
This is what I have been able to come up with so far.
It is not very detailed. The error chart is detailed enough that you can find all your HTTP 500 errors reported there, but the usage you can only guesstimate.
An obvious workaround is to set up an HTTP proxy server in your home or small office to monitor the traffic, then count occurrences of proxied requests per whatever time period you choose in the proxy server's access log. In other words, we can count them ourselves with relative ease.
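As a rough sketch of that counting step, assuming the proxy writes Common Log Format lines (the hostname and sample lines below are illustrative, not from any real log):

```python
# Count proxied Gemini API requests per hour from proxy access-log lines.
# Assumption: Common Log Format timestamps like [01/Mar/2025:09:12:44 +0000].
import re
from collections import Counter

LOG_TS = re.compile(r'\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2}')

def count_requests_per_hour(lines, host_filter="generativelanguage.googleapis.com"):
    """Return a Counter keyed by (date, hour) for lines mentioning the API host."""
    counts = Counter()
    for line in lines:
        if host_filter not in line:
            continue  # skip non-Gemini traffic
        m = LOG_TS.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

sample = [
    '10.0.0.5 - - [01/Mar/2025:09:12:44 +0000] "POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent HTTP/1.1" 200 512',
    '10.0.0.5 - - [01/Mar/2025:09:47:02 +0000] "POST https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent HTTP/1.1" 200 498',
    '10.0.0.7 - - [01/Mar/2025:10:03:19 +0000] "GET https://example.com/ HTTP/1.1" 200 1024',
]
print(count_requests_per_hour(sample))
# → Counter({('01/Mar/2025', '09'): 2})
```

In practice you would read the lines from the proxy's log file instead of a hard-coded list.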
Try https://console.cloud.google.com/billing/, then go to Billing and select “All projects” and “Current month”.
The usage does not show up immediately; you need to wait for it to be updated. I'm not sure how long the delay is, but I'm pretty sure it is updated within 24 hours.
I’m still on the free tier, so I incorporated this simple solution into the system instructions.
Output
- Prompt Cost: At the end of each result, calculate the cost of the request (Input & Output tokens) based on the pricing model provided below.
Pricing:
- Input: $0.35 / 1 million tokens (for prompts up to 128K tokens), $0.70 / 1 million tokens (for prompts longer than 128K tokens)
- Output: $1.05 / 1 million tokens (for prompts up to 128K tokens), $2.10 / 1 million tokens (for prompts longer than 128K tokens)
The output would look like:
Prompt Cost Calculation:
- Input Tokens: 1686
- Output Tokens: 931
- Total Tokens: 2617
- Input Cost: $0.0006 ($0.35 / 1 million tokens * 1686 tokens)
- Output Cost: $0.0098 ($1.05 / 1 million tokens * 931 tokens)
- Total Cost: $0.0104
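The pricing above is simple enough to recompute independently; a minimal check, using the token counts from the example (prompts under 128K tokens):

```python
# Recompute the example's cost from the stated pricing (prompts <= 128K tokens).
INPUT_RATE = 0.35 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.05 / 1_000_000  # USD per output token

def prompt_cost(input_tokens, output_tokens):
    """Return (input_cost, output_cost, total_cost) in USD."""
    input_cost = input_tokens * INPUT_RATE
    output_cost = output_tokens * OUTPUT_RATE
    return input_cost, output_cost, input_cost + output_cost

inp, out, total = prompt_cost(1686, 931)
print(f"Input: ${inp:.4f}  Output: ${out:.4f}  Total: ${total:.4f}")
# → Input: $0.0006  Output: $0.0010  Total: $0.0016
```

Note that the recomputed output cost ($0.0010) differs from the model-reported $0.0098 above, which is exactly why an independent check like this is worth running.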
Bear in mind that I don't have the means to verify the accuracy; at least, nothing exists for that purpose to my knowledge.
After initial testing, the cost output will be in JSON format and loaded into a database for additional processing, calculations, and potentially applying user quotas.
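One way that JSON-to-database step could look; the record fields, table name, and quota value below are all my invention, not anything from the thread:

```python
import json
import sqlite3

# Hypothetical JSON cost record the model would be instructed to emit.
record_json = '{"user": "fred", "input_tokens": 1686, "output_tokens": 931, "total_cost_usd": 0.00157}'

conn = sqlite3.connect(":memory:")  # use a file path in real use
conn.execute("""CREATE TABLE IF NOT EXISTS prompt_costs (
    user TEXT, input_tokens INTEGER, output_tokens INTEGER,
    total_cost_usd REAL, ts DATETIME DEFAULT CURRENT_TIMESTAMP)""")

rec = json.loads(record_json)
conn.execute(
    "INSERT INTO prompt_costs (user, input_tokens, output_tokens, total_cost_usd)"
    " VALUES (?, ?, ?, ?)",
    (rec["user"], rec["input_tokens"], rec["output_tokens"], rec["total_cost_usd"]),
)

# Quota check: total spend per user (simplified to all-time here).
spent, = conn.execute(
    "SELECT COALESCE(SUM(total_cost_usd), 0) FROM prompt_costs WHERE user = ?",
    ("fred",),
).fetchone()
MONTHLY_QUOTA_USD = 5.00  # made-up quota
print(f"fred has spent ${spent:.5f} of ${MONTHLY_QUOTA_USD:.2f}")
```

A real deployment would add a month filter on `ts` and reject requests once `spent` exceeds the quota.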
Great idea! I implemented it. I keep a JSON file of system-instruction “blocks” that I can add depending on the task.
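A minimal sketch of that blocks approach, assuming a flat JSON object of named instruction snippets (the layout and block names here are guesses, not the poster's actual file):

```python
import json

# Hypothetical layout for a file of reusable system-instruction blocks.
blocks_json = """{
  "base": "You are a helpful assistant.",
  "cost_report": "- Prompt Cost: At the end of each result, calculate the cost of the request (Input & Output tokens) based on the pricing model provided below.",
  "json_output": "Respond only with valid JSON."
}"""

def build_system_instruction(block_names, blocks):
    """Concatenate the selected blocks, in order, into one instruction string."""
    return "\n\n".join(blocks[name] for name in block_names)

blocks = json.loads(blocks_json)  # in practice: json.load(open("blocks.json"))
instruction = build_system_instruction(["base", "cost_report"], blocks)
print(instruction)
```

The same file can then drive different tasks just by changing the list of block names.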
I found this thread while looking for my current token usage. The lack (or complete obscurity?) of this feature makes Google Gemini LLMs harder to use compared to competitors.
+1
By the time I checked the usage status today, it was too late. Due to the reporting delay, I wasn't able to catch the high usage yesterday, and when I reviewed the dashboard this morning, I found a bill exceeding 10k USD.
Additionally, I discovered that fine-tuning is quite expensive because it trains the model for 10 epochs by default. I estimated the token usage from the numbers shown in the dashboard, which were for only 1 epoch. As a result, the bill I received was 10x what I expected.
Just FYI, OpenAI provides close-to-real-time usage reports.