I wrote a script that uses the Gemini API to OCR the text in my daily videos (1 minute each) from a DJI camera. The videos mostly show me working on my laptop. It used to work fine: gemini-2.5-pro correctly extracted most of the text from the videos.
But lately I've noticed it returns nonsense for most videos; the extracted text is sometimes completely unrelated to the actual text in the video. For instance, in one video I was planning a trip with ChatGPT on my laptop, and the model gave me a long poem in which only one word, ‘Italy’, was correct… However, if I take some screenshots from the same video and send them to the model, it returns all the text correctly.
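A minimal sketch of how the screenshot workaround could be scripted, assuming ffmpeg is installed and on the PATH. The function names, the 10-second interval, and the file names are hypothetical placeholders for illustration only. It only builds the ffmpeg commands that grab still frames; sending the frames to the model is left out.

```python
def frame_timestamps(duration_s: float, every_s: float = 10.0) -> list[float]:
    """Return the timestamps (in seconds) at which to grab a still frame.

    Hypothetical helper: samples one frame every `every_s` seconds,
    starting at 0 and stopping before the end of the video.
    """
    if duration_s <= 0 or every_s <= 0:
        raise ValueError("duration_s and every_s must be positive")
    count = int(duration_s // every_s)
    stamps = [round(i * float(every_s), 3) for i in range(count + 1)]
    # A timestamp exactly at the end of the video has no frame to grab.
    return [t for t in stamps if t < duration_s]


def ffmpeg_frame_command(video_path: str, timestamp_s: float, out_path: str) -> list[str]:
    """Build an ffmpeg command that saves a single frame as an image.

    -ss before -i seeks in the input, -frames:v 1 writes one video frame,
    -y overwrites an existing output file.
    """
    return [
        "ffmpeg", "-y",
        "-ss", f"{timestamp_s:.3f}",
        "-i", video_path,
        "-frames:v", "1",
        out_path,
    ]


# Assumed example: a 1-minute clip named clip.mp4 (placeholder name).
commands = [
    ffmpeg_frame_command("clip.mp4", t, f"frame_{i:03d}.jpg")
    for i, t in enumerate(frame_timestamps(60.0))
]
```

Each command in `commands` can then be run with `subprocess.run(cmd, check=True)`, and the resulting images sent to the model one by one instead of the video.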
I’ve tried the Gemini API, AI Studio (web), the Vertex AI API, Vertex AI Studio, Flash, and Flash-Lite, and they all behave the same…
Has anyone else run into the same issue?