Number of parallel or concurrent requests for Gemini 1.5 Pro

How many parallel requests can I safely make against this API with the code below?

from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

# Process one record: fill the prompt template and call the model
def process_item(i):
    try:
        tagged_sentence = gemini_model_generative(prompt.format(i.tagged_source, i.temp_target))
        return tagged_sentence
    except Exception as e:
        return f"Failed ID {i.id}: {e}"

# Main execution
if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as executor:  # Adjust max_workers based on your quota
        # executor.map returns results (not futures) in input order; tqdm shows progress
        results = list(tqdm(executor.map(process_item, with_tagged_seg), total=len(with_tagged_seg)))

This depends on the Google Generative AI API's usage quotas, which vary by platform and billing tier:

- Vertex AI + Gemini models: the per-region concurrency limit is typically in the range of 100 to 300 requests/sec, and short bursts of 50 to 100 concurrent requests are generally safe.
- Gemini Pro via AI Studio / the REST API: roughly 50 to 60 requests/min for text-bison or gemini-pro on the free tier.

So as a rule of thumb: 2 to 5 workers on the free tier, 10 to 20 on a paid plan, and 20 to 80 for high-throughput Vertex AI. Quotas change over time, so check the current limits in your console before scaling up.
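Whatever worker count you pick, it is worth making the worker resilient to quota errors so an occasional 429 slows the batch down instead of producing failed rows. Below is a minimal sketch with exponential backoff; gemini_model_generative, prompt, and with_tagged_seg are your own objects from the question, and matching "429" in the exception text is an assumption, so swap in the concrete exception class your client library actually raises.

import random
import time
from concurrent.futures import ThreadPoolExecutor

from tqdm import tqdm

MAX_RETRIES = 4  # retry attempts when a rate-limit error is hit

def process_item_with_backoff(i):
    for attempt in range(MAX_RETRIES + 1):
        try:
            return gemini_model_generative(prompt.format(i.tagged_source, i.temp_target))
        except Exception as e:
            # Assumption: quota errors mention "429" in their message; replace
            # this check with the specific exception your client raises.
            if "429" in str(e) and attempt < MAX_RETRIES:
                # Exponential backoff with jitter: ~1s, 2s, 4s, 8s
                time.sleep(2 ** attempt + random.random())
            else:
                return f"Failed ID {i.id}: {e}"

if __name__ == "__main__":
    # Pick max_workers to match your tier: 2-5 free, 10-20 paid, more on Vertex AI
    with ThreadPoolExecutor(max_workers=5) as executor:
        results = list(tqdm(executor.map(process_item_with_backoff, with_tagged_seg),
                            total=len(with_tagged_seg)))

With backoff in place you can probe higher worker counts safely: if you overshoot the quota, throughput degrades gracefully instead of the run ending with a pile of failure strings.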