Gemini API is very slow. Am I doing something wrong?

Hey, I am trying to build an engine to rewrite some articles, so I have a Python script that sends a prompt to Gemini asking it to rewrite each article.

Sometimes I have to wait up to 22 minutes (1300 seconds) to get 1300 characters back.

Attempting with region: europe-west2 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-west4 | Input Characters: 2356 | Output Characters: 1543 | Time Taken: 55.79 seconds
Region: europe-west1 | Input Characters: 1463 | Output Characters: 1492 | Time Taken: 92.18 seconds
Attempting with region: us-east1 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west1 | Input Characters: 1634 | Output Characters: 1174 | Time Taken: 172.57 seconds
Attempting with region: europe-west4 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west2 | Input Characters: 1633 | Output Characters: 1474 | Time Taken: 352.86 seconds
Attempting with region: europe-north1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: us-east1 | Input Characters: 1197 | Output Characters: 1419 | Time Taken: 636.26 seconds
Attempting with region: europe-central2 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-west4 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-north1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: us-east1 | Input Characters: 1043 | Output Characters: 1129 | Time Taken: 8.32 seconds
Attempting with region: europe-central2 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west2 | Input Characters: 1006 | Output Characters: 979 | Time Taken: 7.57 seconds
Region: europe-north1 | Input Characters: 1505 | Output Characters: 1346 | Time Taken: 9.41 seconds
Region: europe-west4 | Input Characters: 1384 | Output Characters: 1205 | Time Taken: 8.89 seconds
Region: europe-north1 | Input Characters: 1490 | Output Characters: 1481 | Time Taken: 11.07 seconds
Attempting with region: europe-west4 / gemini-1.5-flash-002 (Attempt 1/3)
Attempting with region: europe-west4 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-central2 | Input Characters: 1560 | Output Characters: 1368 | Time Taken: 11.20 seconds
Attempting with region: europe-west1 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-west1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-west4 | Input Characters: 1936 | Output Characters: 1634 | Time Taken: 24.61 seconds

Region: europe-west4 | Input Characters: 1482 | Output Characters: 1526 | Time Taken: 1337.62 seconds
Attempting with region: europe-west2 / gemini-1.5-pro-002 (Attempt 1/3)

Hi @Cristian_Ditoiu

Welcome to the community!

There are several ways to reduce latency:

  1. Batch multiple articles into a single request rather than sending each one individually.
  2. Shorten the prompt where possible; fewer input tokens means a faster response.
  3. Use streamGenerateContent to process the output as it is generated instead of waiting for the entire response.
  4. If you reuse the same input tokens across requests, enabling context caching can improve performance; see the caching documentation for more information.
  5. Keep in mind that periods of high API traffic may occasionally lead to increased latency.
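Point 3 can be sketched roughly like this, using the Vertex AI Python SDK (this assumes the google-cloud-aiplatform package and GCP credentials are set up; the function name, project ID, and region here are just placeholders):

```python
def stream_rewrite(article_text: str, project_id: str,
                   location: str = "europe-west4") -> str:
    """Stream a Gemini rewrite so chunks can be consumed as they arrive.

    Hypothetical helper: project_id and location are placeholders
    you must replace with your own values.
    """
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project=project_id, location=location)
    model = GenerativeModel("gemini-1.5-flash-002")

    pieces = []
    # stream=True corresponds to streamGenerateContent: each chunk is
    # yielded as soon as it is generated, so downstream processing can
    # start well before the full response is complete.
    for chunk in model.generate_content(
        f"Rewrite the following article:\n\n{article_text}", stream=True
    ):
        pieces.append(chunk.text)
    return "".join(pieces)
```

With streaming, the first characters typically arrive within a few seconds, even when the full generation takes much longer, so your pipeline is not blocked for the whole duration.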

Thank you