It takes 190s to finish the task (see the attached image for reference). I personally think that is really slow. Note that I changed the model from Gemini 1.5 Flash to Gemini 2.0 Flash Experimental; the prompts and responses shown were from before I switched models.
If switching between models is the reason it is slow, then users should not be allowed to switch models within an existing conversation.
Welcome to the forums!
Switching, itself, is unlikely to be the problem.
However, the model is pretty popular, so it seems more likely that you’re just seeing load.
My tests so far have suggested that Gemini 2.0 is significantly faster.
Thanks for your response.
Yes, in my personal experience Gemini 2.0 Flash is faster than Gemini 1.5 Flash.
But I don't know why responses become much slower when I switch from 1.5 to 2.0 in an existing conversation/prompt.
Hey, thanks for raising this. I’ve got the same issue here. Have you managed to sort it out? Is this happening because it is trending now and too many people are using it, causing the resources to be overloaded?
I agree that responses used to be faster (I tried it when it first launched, and everything was faster than Gemini 1.5 Flash), but today I noticed it is much slower when calling the API from Python.
I tested a ~60,000-token Chinese-context Q&A in Google AI Studio. Here are the times for the whole response to complete (not time to first token):
gemini-1.5-flash: 20s
gemini-1.5-pro: 50s
gemini-1.5-flash-8b: 11s
gemini-2.0-flash-exp: 130s
gemini-exp-1206: 103s
I am not sure, but it seems the experimental models are not yet optimized for long contexts.
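In case it helps anyone reproduce the comparison, here is a minimal timing sketch using the google-generativeai Python SDK. The API key, the prompt file, and the model list are placeholders, not part of my original test setup:

```python
# Minimal sketch: time the full response for several models.
# Assumes the google-generativeai package; prompt file and key are placeholders.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Hypothetical ~60k-token prompt loaded from a local file.
prompt = open("long_context_prompt.txt", encoding="utf-8").read()

for model_name in ["gemini-1.5-flash", "gemini-1.5-flash-8b",
                   "gemini-1.5-pro", "gemini-2.0-flash-exp"]:
    model = genai.GenerativeModel(model_name)
    start = time.perf_counter()
    # generate_content blocks until the whole response is returned,
    # so this measures total completion time, not time to first token.
    response = model.generate_content(prompt)
    elapsed = time.perf_counter() - start
    print(f"{model_name}: {elapsed:.1f}s, {len(response.text)} chars")
```

Streaming (`generate_content(prompt, stream=True)`) would let you measure time to first token separately, if that is the number you care about.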