Calculating cost for a single Gemini 2 request with audio and text

rossmacarthur · April 30, 2025, 7:51am

I’m trying to calculate the cost for a single multi-modal request for Gemini 2.0 Flash, because I need to charge this amount to my user.

The documentation (Vertex AI Pricing | Generative AI on Vertex AI | Google Cloud) lists the following prices:

1M input tokens cost $0.15
1M input audio tokens cost $1.00
1M output text tokens cost $0.60

The SDK returns the following usage metadata:

prompt_token_count
candidates_token_count
total_token_count

My understanding is that if the input was only text, then the cost would be

cost = (0.15 / 1000000) * prompt_token_count + (0.6 / 1000000) * candidates_token_count

My question is:

Does the prompt_token_count include both the text input tokens and the audio input tokens? If so, how do I know how many tokens are due to text input vs audio input?

mberta · April 30, 2025, 1:24pm

Assuming you’re using the Python SDK, you should be able to get both the text and audio token counts this way:

    for prompt_tokens_detail in response.usage_metadata.prompt_tokens_details:
        print(f"Media: {prompt_tokens_detail.modality.name} token count: {prompt_tokens_detail.token_count}")
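With the per-modality counts from the loop above, the cost could be estimated along these lines. This is only a rough sketch: the rates are the ones quoted in the question (check the current pricing page), `request_cost` is a hypothetical helper, the modality name "AUDIO" is assumed to match what `modality.name` returns, and billing every non-audio input modality at the $0.15 rate is also an assumption.

```python
# Rates in USD per 1M tokens, copied from the question; verify against current pricing.
INPUT_RATE_PER_M = 0.15        # text input (assumed to apply to any non-audio input)
AUDIO_INPUT_RATE_PER_M = 1.00  # audio input
OUTPUT_RATE_PER_M = 0.60       # text output


def request_cost(prompt_details, candidates_token_count):
    """Estimate the USD cost of one request (hypothetical helper).

    prompt_details: iterable of (modality_name, token_count) pairs.
    candidates_token_count: number of output tokens.
    """
    cost = 0.0
    for modality_name, token_count in prompt_details:
        # Audio input has its own rate; everything else uses the base input rate.
        if modality_name == "AUDIO":
            rate = AUDIO_INPUT_RATE_PER_M
        else:
            rate = INPUT_RATE_PER_M
        # Treat a missing count (None) as zero tokens.
        cost += rate / 1_000_000 * (token_count or 0)
    cost += OUTPUT_RATE_PER_M / 1_000_000 * (candidates_token_count or 0)
    return cost


# Made-up token counts, for illustration only.
example = request_cost([("TEXT", 1_000), ("AUDIO", 2_000)], 500)
print(f"${example:.6f}")
```

The pairs could be built from the response with something like `[(d.modality.name, d.token_count) for d in response.usage_metadata.prompt_tokens_details]`, passing `response.usage_metadata.candidates_token_count` as the second argument.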
