Do I get charged for generated tokens if client disconnects during a Vertex AI streaming response?

Hi, I’m using Vertex AI’s generative models (e.g., Gemini) with the streaming API in a production backend environment.

I wanted to clarify how billing works when a client disconnects before the entire response is received. Specifically:

  • If the client initiates a request to the model (via streaming),
  • But then disconnects midway (due to timeout, network error, or user cancellation),
  • Will I still be charged for all tokens that the model had already generated, even if they were not delivered to the client?

From what I understand, OpenAI and other providers charge based on generated tokens regardless of whether they were actually received by the client. Does Vertex AI behave the same way?

Any clarification would be appreciated, especially for cost optimization and error-handling strategies in production.
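To make the scenario concrete, here is roughly the consumption pattern in question. This is a minimal sketch: the stream is stubbed out with a plain generator, since the point is the early disconnect, not the exact SDK surface (in the real code the stream would come from something like `model.generate_content(prompt, stream=True)`).

```python
import itertools

def fake_stream():
    """Stand-in for a streaming model response: yields text
    chunks as the model generates them, indefinitely."""
    for i in itertools.count():
        yield f"chunk-{i} "

def consume(stream, max_chunks):
    """Client-side consumption that disconnects (breaks) midway,
    e.g. due to a timeout or user cancellation."""
    received = []
    for chunk in stream:
        received.append(chunk)
        if len(received) >= max_chunks:
            break  # client disconnects here; the server may have generated more
    return received

received = consume(fake_stream(), max_chunks=3)
print(len(received))  # 3 chunks delivered; the billing question is about tokens beyond this point
```

The question is what happens to tokens the model had already produced (or goes on to produce) past that `break`.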
Thanks in advance!

Hello,

Yes, you will be charged for all tokens the model had already generated at the time of the disconnect, even if they were not delivered to the client. For detailed information, you can refer to the Vertex AI pricing page.

Thanks.

I have another question.

In that case, for issues like Gemini accidentally outputting infinite newlines—as described in https://discuss.ai.google.dev/t/random-endless-n-output-in-gemini-api-1-5-pro-responses/52757—will we be billed for all the tokens, even if the client cuts off the stream?

I've run into the same issue, especially when the model outputs tables. It's very prone to infinite loops, and we then get billed for output all the way up to the maximum output token limit, which is quite frustrating.

Hello,

Could you please share your prompt and model details for the cases where you are seeing this issue, so that we can try to recreate it and discuss it with our team internally?