Do I get charged for generated tokens if client disconnects during a Vertex AI streaming response?

Hi, I’m using Vertex AI’s generative models (e.g., Gemini) with the streaming API in a production backend environment.

I wanted to clarify how billing works when a client disconnects before the entire response is received. Specifically:

  • If the client initiates a request to the model (via streaming),
  • But then disconnects midway (due to timeout, network error, or user cancellation),
  • Will I still be charged for all tokens that the model had already generated, even if they were not delivered to the client?

From what I understand, OpenAI and other providers charge based on generated tokens regardless of whether they were actually received by the client. Does Vertex AI behave the same way?

Any clarification would be appreciated, especially for cost optimization and error-handling strategies in production.
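To make the scenario concrete, here is roughly the consumption pattern in question. This is a minimal sketch: the stream is stubbed out with a plain generator, since the point is the early disconnect, not the exact SDK surface (in the real code the stream would come from something like `model.generate_content(prompt, stream=True)`).

```python
import itertools

def fake_stream():
    """Stand-in for a streaming model response: yields text
    chunks as the model generates them, indefinitely."""
    for i in itertools.count():
        yield f"chunk-{i} "

def consume(stream, max_chunks):
    """Client-side consumption that disconnects (breaks) midway,
    e.g. due to a timeout or user cancellation."""
    received = []
    for chunk in stream:
        received.append(chunk)
        if len(received) >= max_chunks:
            break  # client disconnects here; the server may have generated more
    return received

received = consume(fake_stream(), max_chunks=3)
print(len(received))  # 3 chunks delivered; the billing question is about tokens beyond this point
```

The question is what happens to tokens the model had already produced (or goes on to produce) past that `break`.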
Thanks in advance!

Hello,

Yes, you will be charged for all tokens the model had already generated at the time of the disconnect, even if they were not delivered to the client. For detailed information, you can refer to the Vertex AI pricing page.

Thanks.

I have another question.

In that case, for issues like Gemini accidentally outputting infinite newlines—as described in https://discuss.ai.google.dev/t/random-endless-n-output-in-gemini-api-1-5-pro-responses/52757—will we be billed for all the tokens, even if the client cuts off the stream?

I've run into the same issue, especially when the model outputs tables. It's very prone to infinite loops, and we then get billed for output all the way up to the maximum output token limit, which is quite frustrating.

Hello,

Could you please share your prompt and model details for the cases where you are seeing this issue, so that we can try to recreate it and discuss it with our team internally?