When we call Gemini’s new video understanding model, we need to pass in the video URL every time. Does that mean the model re-analyzes the whole video on every request, whenever we send a different prompt or want to keep talking about the same video? If so, that would consume a lot of tokens and get very expensive. Does anyone have experience working with this model?
Hi @Sohaib_Sajid, welcome to the forum.
The Gemini API is stateless, so it processes the full input, including the video, fresh with every request. To reduce costs, you can use context caching to avoid reprocessing the same data repeatedly. There is also a cookbook available that you can refer to.
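To make the cost point concrete, here is a rough Python sketch comparing input charges for several prompts about the same video, with and without caching. It is only an illustration: the tokens-per-second rate, the price per token and the cached-token discount are hypothetical placeholders, not Gemini's actual rates, and the model ignores cache creation and storage fees. Check the current pricing and token-counting docs before relying on any numbers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CostModel:
    """Hypothetical billing parameters; replace with current published values."""
    video_tokens_per_second: int   # assumed tokenization rate for video input
    price_per_input_token: float   # assumed price of one uncached input token
    cached_price_ratio: float      # assumed discount factor applied to cached tokens


def video_tokens(video_seconds: int, m: CostModel) -> int:
    # Tokens the video contributes to a single request's input.
    return video_seconds * m.video_tokens_per_second


def cost_without_cache(video_seconds: int, prompts: Sequence[int], m: CostModel) -> float:
    # Stateless API: every request re-sends the whole video with the new prompt,
    # so the video tokens are billed at the full rate once per prompt.
    total = sum(video_tokens(video_seconds, m) + p for p in prompts)
    return total * m.price_per_input_token


def cost_with_cache(video_seconds: int, prompts: Sequence[int], m: CostModel) -> float:
    # With a cache, each request references the stored video: its tokens are
    # billed at the discounted rate and only the prompt is billed at full rate.
    # Simplification: ignores cache creation, storage (TTL) fees and minimum sizes.
    cached = video_tokens(video_seconds, m) * m.cached_price_ratio * len(prompts)
    fresh = sum(prompts)
    return (cached + fresh) * m.price_per_input_token


if __name__ == "__main__":
    m = CostModel(video_tokens_per_second=300, price_per_input_token=1.0, cached_price_ratio=0.25)
    prompts = [20, 20, 20]  # three follow-up questions of ~20 tokens each
    print(cost_without_cache(60, prompts, m))  # 54060.0
    print(cost_with_cache(60, prompts, m))     # 13560.0
```

With these placeholder numbers, three questions about a 60-second clip cost 54060 units uncached versus 13560 units cached. Because storage charges are left out, the sketch overstates the benefit for one-off questions; caching tends to pay off when the same video is queried repeatedly.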
Thanks