Hi, I’m using Vertex AI’s generative models (e.g., Gemini) with the streaming API in a production backend environment.
I'd like to clarify how billing works when a client disconnects before the full response has been received. Specifically:
- If the client initiates a request to the model (via streaming),
- But then disconnects midway (due to timeout, network error, or user cancellation),
- Will I still be charged for all tokens the model had already generated by that point, even if they were never delivered to the client?
From what I understand, OpenAI and other providers charge based on generated tokens regardless of whether they were actually received by the client. Does Vertex AI behave the same way?
Any clarification would be appreciated, especially for cost-optimization and error-handling strategies in production.
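For context, here is the client-side pattern I'm concerned about: stopping consumption of the stream early once enough output has arrived, on the assumption that an early disconnect might cap the output tokens billed. This is a minimal sketch using a fake generator in place of the real stream iterator (e.g., the one returned by `generate_content(..., stream=True)` in the Vertex AI Python SDK); the chunk contents, token counting by whitespace split, and the `max_tokens` cutoff are all illustrative assumptions, not SDK behavior.

```python
def fake_stream(chunks):
    """Stand-in for a streaming response iterator; each yielded
    chunk carries a piece of the generated text."""
    for chunk in chunks:
        yield chunk

def consume_until(stream, max_tokens):
    """Client-side early cancellation: stop pulling chunks once we
    have enough tokens. With a real HTTP stream, you would also
    close the underlying response so the server can observe the
    disconnect promptly (whether that halts generation, and thus
    billing, is exactly my question)."""
    received = []
    total = 0
    for chunk in stream:
        received.append(chunk)
        total += len(chunk.split())  # crude token proxy for the sketch
        if total >= max_tokens:
            break  # stop consuming; close the stream in real code
    return received, total

chunks = ["one two three", "four five six", "seven eight nine"]
got, n = consume_until(fake_stream(chunks), max_tokens=5)
print(got, n)  # → ['one two three', 'four five six'] 6
```

If disconnects only stop delivery but not server-side generation, this pattern saves bandwidth but not cost, which is why I'd like to know how Vertex AI handles it.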
Thanks in advance!