I am confused about pricing for Nano Banana Pro: I am getting ~2000 output tokens instead of 1120 for a 1K image

I am using gemini-3-pro-image-preview with the aspect ratio set to 1:1 and the resolution set to 1K, and I get the 1K image back as expected. It's the output token count that confuses me: I am getting in the range of 1850-2000 output tokens, while the Gemini pricing for this model says a 1K image should cost 1120 tokens. I am also getting some thought tokens separately.

Why am I getting close to 2000 tokens (the figure the pricing page lists for a 4K image) on a 1K image that should be 1120 tokens?


Hi William!

Apologies for the confusion with the token count on your recent image generations. It definitely looks a bit strange to see a 1K image hitting the 2000 token mark. I would love to help get to the bottom of this.

The total output token count is the sum of two distinct parts of the generation process:

  1. Fixed Image Tokens: For a 1K resolution image, the model always uses 1120 tokens to render the final pixels.

  2. Dynamic Thinking Tokens: Gemini 3 Pro uses a reasoning step to “think” through the composition, lighting, and prompt adherence. This process typically generates between 700 and 900 extra tokens.

As for 4K pricing: 2000 tokens is the base rate for a 4K image, but a 4K image also has its own thinking tokens on top of that, which pushes its total count higher. For your 1K images, you should still only be charged the 1K image rate for the 1120 image tokens; the additional thinking tokens are billed at the standard (and much cheaper) text output rate.
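The billing split described above can be sketched as a small cost function: image tokens priced at an image-output rate, thinking tokens at a text-output rate. Both rates are supplied by the caller; the values in the example are placeholders assumed for illustration (check the current pricing page), and `split_output_cost` is a hypothetical helper, not part of any SDK.

```python
def split_output_cost(image_tokens: int, thinking_tokens: int,
                      image_rate_per_m: float, text_rate_per_m: float) -> dict:
    """Hypothetical helper: price image tokens and thinking tokens separately.

    Rates are USD per 1M tokens and must be supplied by the caller;
    no real pricing is hard-coded here.
    """
    image_cost = image_tokens * image_rate_per_m / 1_000_000
    thinking_cost = thinking_tokens * text_rate_per_m / 1_000_000
    return {
        "image": image_cost,
        "thinking": thinking_cost,
        "total": image_cost + thinking_cost,
    }


# Placeholder rates (assumed, not quoted in this thread):
# 1120 image tokens for a 1K image plus 800 thinking tokens.
breakdown = split_output_cost(1120, 800, image_rate_per_m=120.0, text_rate_per_m=12.0)
print(breakdown)
```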

You can also check the usage_metadata field in your API response. It lists a thoughts_token_count separately. If you add that number to 1120, it should align perfectly with the total output you are seeing.
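A minimal sketch of the check suggested above, assuming the google-genai Python SDK's usage field names: `candidates_token_count` for output tokens and `thoughts_token_count` for thinking tokens (the REST API reports them as `candidatesTokenCount` and `thoughtsTokenCount`). The helper name is hypothetical, and the 1120-token default is the 1K figure quoted in this thread. It is duck-typed, so it should accept `response.usage_metadata` directly; here it is fed a stand-in object built from the gemini-3.1-flash-image-preview log posted later in this thread.

```python
from types import SimpleNamespace


def check_nominal_plus_thoughts(usage, nominal_image_tokens: int = 1120) -> dict:
    """Compare reported output tokens with nominal image tokens + thinking tokens.

    `usage` is anything with `candidates_token_count` and
    `thoughts_token_count` attributes (assumed to match the google-genai SDK's
    usage_metadata). Missing counts (None) are treated as 0.
    """
    output = usage.candidates_token_count or 0
    thoughts = usage.thoughts_token_count or 0
    expected = nominal_image_tokens + thoughts
    return {
        "output": output,
        "thoughts": thoughts,
        "nominal_plus_thoughts": expected,
        "matches": output == expected,
        "excess_over_nominal": output - nominal_image_tokens,
    }


# Stand-in for response.usage_metadata, using the 1K log's output and thoughts counts.
usage = SimpleNamespace(candidates_token_count=1848, thoughts_token_count=2970)
print(check_nominal_plus_thoughts(usage))
```

With those counts, 1120 plus the thoughts count does not equal the reported output, which is the mismatch William describes.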

Alisa,

Thanks for the reply! I am logging all usage_metadata for each image generation. Here are some examples of 1K image generations at two aspect ratios (16:9 and 1:1):

[Screenshots: usage_metadata for three 1K generations]

I also did a 4K generation test and got this:

[Screenshot: usage_metadata for the 4K generation]

Thought tokens are recorded separately from output tokens, but as you can see, all of the 1K generations use ~800 more output tokens than the quoted 1120 for a 1K image, and the 4K image uses 700 more output tokens than the quoted 2000 for a 4K image. These numbers come straight from usage_metadata, not from my own math.
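To track the excess described above across many logged generations, one option is to average (reported output tokens minus nominal image tokens) per resolution. This is a sketch under stated assumptions: the nominal table holds only the two figures quoted in this thread (1K = 1120, 4K = 2000), `mean_excess_by_resolution` is a hypothetical helper, and the example records are illustrative values at the edges of the ranges described here, not the screenshot data.

```python
from collections import defaultdict
from statistics import mean

# Nominal output tokens per image, as quoted in this thread (1K and 4K only).
NOMINAL_IMAGE_TOKENS = {"1K": 1120, "4K": 2000}


def mean_excess_by_resolution(records):
    """Average (reported output tokens - nominal image tokens) per resolution.

    `records` is an iterable of (resolution, output_tokens) pairs, e.g. taken
    from your own usage_metadata logs. An unknown resolution raises KeyError.
    """
    excess = defaultdict(list)
    for resolution, output_tokens in records:
        excess[resolution].append(output_tokens - NOMINAL_IMAGE_TOKENS[resolution])
    return {res: mean(values) for res, values in excess.items()}


# Illustrative records only; these are NOT the values from the screenshots.
records = [("1K", 1850), ("1K", 2000), ("4K", 2700)]
print(mean_excess_by_resolution(records))
```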

Any help would be greatly appreciated!

William

Hi William! Thank you for the details on this. We are debugging on our end and will get back to you with more info.

Thanks Alisa, I appreciate it. If you need any more details, let me know.

I haven’t seen anyone else report this issue, so I am still uncertain if it is an API issue or something on my end.

This is very important for me because I am trying to price out an image generation service I am working on: the expected cost per image was $0.13, but it is now coming in at about $0.23.
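The jump from $0.13 to $0.23 follows from the token counts if every output token is billed at the image-output rate. The rate below is an assumption, chosen because it reproduces the ~$0.13 quoted for 1120 tokens; `image_cost` is a hypothetical helper, and 1900 tokens is simply a midpoint of the 1850-2000 range reported in this thread.

```python
# Assumed image-output rate in USD per 1M tokens, chosen to match the
# ~$0.13 per 1K image (1120 tokens) figure in this thread. Verify it
# against the current pricing page before using it for real quotes.
IMAGE_RATE_PER_M = 120.0


def image_cost(output_tokens: int, rate_per_m: float = IMAGE_RATE_PER_M) -> float:
    """Cost in USD if every output token is billed at the image-output rate."""
    return output_tokens * rate_per_m / 1_000_000


expected = image_cost(1120)  # nominal 1K image
observed = image_cost(1900)  # midpoint of the 1850-2000 range reported
print(f"expected ${expected:.3f}, observed ${observed:.3f}, "
      f"increase {observed / expected - 1:.0%}")
```

At 1900 tokens the estimate lands at about $0.23, roughly 70% above the nominal 1K price.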

Hi William!
Wanted to follow up here and let you know that our engineering team is still looking into this issue. I have an open bug for it and will make sure to correct any issues on our end.

Thank you for your patience!

Thanks Alisa, I really appreciate it!

Alisa, congrats on the release of Nano Banana 2. I am working on integrating it with my apps. I’m hoping it solves whatever issue I have been having here.

But I’d still like some resolution on this. If you find time, could you check with the team and see if any progress has been made?

Thanks, William

Hi Alisa, I implemented gemini-3.1-flash-image-preview and I have the same problem. It must be something on my end since no one else has complained about this, but I simply can't figure it out. Here are some results showing that 1848 output tokens were used instead of the quoted 1120 for a 1K image. What am I missing??

=======================================================================
IMAGE GENERATION REQUEST
=======================================================================
Model: gemini-3.1-flash-image-preview
Aspect Ratio: 16:9
Resolution: 1K
Modalities: IMAGE
Thinking Budget: 2048
Dry Run: False
=======================================================================

=======================================================================
TOKEN USAGE METADATA
=======================================================================
Model: gemini-3.1-flash-image-preview
Input tokens: 742
Output tokens: 1848
Thoughts tokens: 2970
Cached tokens: None
Total tokens: 5560
Image equivalents (@ 1120 tokens/1K image): 1.65
Cost breakdown: input=$0.0002, output=$0.1109, thoughts=$0.0045
Total estimated cost: $0.1155
=======================================================================
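The derived lines of the log above can be re-checked from the raw counts. This sketch assumes, as the log itself appears to, that the total is input + output + thoughts (no cached tokens) and that "image equivalents" means output tokens divided by 1120; `summarize_usage` is a hypothetical helper, not part of any SDK.

```python
def summarize_usage(input_tokens: int, output_tokens: int, thoughts_tokens: int,
                    tokens_per_1k_image: int = 1120) -> dict:
    """Recompute the log's derived lines from its raw token counts."""
    return {
        # Assumes no cached tokens, matching "Cached tokens: None" in the log.
        "total_tokens": input_tokens + output_tokens + thoughts_tokens,
        # How many nominal 1K images the output tokens would pay for.
        "image_equivalents": round(output_tokens / tokens_per_1k_image, 2),
    }


summary = summarize_usage(742, 1848, 2970)
print(summary)
```

Both derived values agree with the log, so the arithmetic itself is consistent; the open question is only why the output count exceeds 1120.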

I changed the output modality from ["text", "image"] to ["image"], and then the output token count went back to normal. I don't know why.

Thank you. Yes, I have only the ["image"] modality set. It would make sense that I'd be using more tokens if I had both set, but I have double-checked, and I only have ["image"].

I hit the same token consumption (exactly 2K) with no text output, but it worked after I changed the modality setting. I don't know why; it works like a black box.
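For comparing the two modality settings discussed above, here is a minimal sketch of a request body built with the standard library, so it can be logged and diffed before sending. The field names (`responseModalities`, `imageConfig`, `aspectRatio`, `imageSize`) are assumed to match the Gemini REST `generateContent` image-generation docs and should be checked against the current API reference; `build_image_request` and the prompt are hypothetical. This only shows the configuration; it does not explain the token-count difference.

```python
import json


def build_image_request(prompt: str, aspect_ratio: str = "1:1",
                        image_size: str = "1K", include_text: bool = False) -> dict:
    """Build a generateContent request body (field names assumed from the REST docs).

    include_text=False requests image-only output; True requests text and image,
    which makes it easy to log both variants side by side.
    """
    modalities = ["TEXT", "IMAGE"] if include_text else ["IMAGE"]
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": modalities,
            "imageConfig": {"aspectRatio": aspect_ratio, "imageSize": image_size},
        },
    }


body = build_image_request("a red bicycle", aspect_ratio="16:9")
print(json.dumps(body["generationConfig"], indent=2))
```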