Unusually high response times in sequential requests

Hi folks,

I have to send a lot of fast requests to Flash in a row. On average they take between 1.5 and 2 s, but sometimes one of these requests either takes an unusually long time (think 2 minutes for the same context length) or crashes with something like the error below:

{ kind: Request, url: "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent", source: hyper_util::client::legacy::Error(SendRequest, hyper::Error(Io, Custom { kind: BrokenPipe, error: "stream closed because of a broken pipe" })) }

Any idea why that happens?
This completely breaks my product, and we could spend a LOT on Flash 2.0 if it actually worked reliably and didn’t crash every 5 minutes.

Hey @palpapeen
Welcome to the community!
Sorry for addressing the issue so late.
Please let us know whether you are using the free or the paid tier.
Below are a few suggestions:

  • Rate limits apply to both the free and paid tiers. Ensure you are not exceeding them.
  • Implement error handling and retries in your code.
  • Ensure your HTTP client is using “keep-alive” connections.
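
For the retry suggestion, here is a minimal sketch in Rust (the error above comes from hyper_util, so a Rust client is assumed). It uses only the standard library. The names backoff_delay and retry_with_backoff are hypothetical helpers, not part of any Gemini SDK, and the closure stands in for whatever call actually sends the request. Treating every error as retryable is also an assumption; in practice you would probably retry only transient failures such as a broken pipe, a timeout, or a rate-limit response.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // If the multiplication overflows, fall back to the cap.
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Calls `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping with exponential backoff between failed attempts.
/// `op` receives the 0-based attempt number and is always called at least once.
/// Returns the first `Ok`, or the error from the last attempt.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: give the last error back to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            // Assumption: every error is treated as transient and retried.
            Err(_) => {
                sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

You would wrap the generateContent call in the closure passed to retry_with_backoff, for example with 5 attempts, a 200 ms base delay, and a 5 s cap. Adding random jitter to the delay is a common refinement when many clients retry at the same time.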

If you have any questions or need assistance, feel free to ask!
Thank you!