We have a Python script that calls the Gemini API to get explanations of queries, and we need to show the estimated time to complete the whole process.
To compute that estimate, we are looking for the average response time of the Gemini API.
Below are the technical details:
Average number of tokens per request: 500
Model used: Gemini 1.5 Flash
We have observed that Gemini takes 10 to 12 seconds to process a 500-token request. What should be the expected time for processing 500 tokens?
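For context, here is a minimal sketch of how we time each call and project the remaining time. It assumes the google-generativeai Python SDK; the API key placeholder and the sample query list are illustrative, not our real values.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")

queries = ["SELECT ...", "UPDATE ..."]  # hypothetical query list
latencies = []

for i, query in enumerate(queries):
    start = time.monotonic()
    model.generate_content(f"Explain this query: {query}")
    latencies.append(time.monotonic() - start)

    # Estimated time remaining = running average latency * queries left.
    avg = sum(latencies) / len(latencies)
    print(f"{i + 1}/{len(queries)} done, ~{avg * (len(queries) - i - 1):.0f}s remaining")
```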
Several factors can affect the speed of a language model’s response. Processing times can also differ depending on whether you are using function calling or just producing plain text, as well as on other features and modalities.
I think you’re already doing the right thing: observe the speed for your own use case and use that as your baseline for expected time, then experiment further to see whether you can improve it.
Thanks,
We are just asking for an explanation of a query as pure text, and we are targeting the base 1.5 Flash model. We are not using any other features.
Based on this, it currently takes 10 to 12 seconds per response. We just wanted to confirm whether this time range is similar for other people as well, or whether anyone is seeing a lower response time for requests like ours (500 tokens per request).
On another note, we also wanted to know whether we can provide proxy support through the SDK.
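In case it helps: as far as we can tell, the SDK has no dedicated proxy option, so the sketch below relies on the standard proxy environment variable that HTTP transports generally honor. The proxy URL is a placeholder, and this behavior is an assumption on our part, not documented SDK support.

```python
import os
import google.generativeai as genai

# Assumption: route traffic through a proxy via the standard environment
# variable; set it before the client is configured. The proxy URL is a
# placeholder.
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:3128"

# Forcing the REST transport keeps proxy handling in the HTTP layer,
# which honors HTTPS_PROXY; the default gRPC transport handles proxies
# differently.
genai.configure(api_key="YOUR_API_KEY", transport="rest")

model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Explain this query: SELECT 1").text)
```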
In the past two days, both the API and AI Studio in the Singapore region have experienced extremely slow response times for 1.5 Flash. The 2.0 exp response speed is normal. Please fix this as soon as possible!!!
The status page at http://status.cloud.google.com/ shows a slow-response problem with Gemini-1.5-flash at this time. It notes:
QUOTE
13 Dec 2024 06:12 PST We will provide an update by Friday, 2024-12-13 10:00 US/Pacific with current details.
END QUOTE