I am using the Gemini API on Vertex AI with Gemini 3.0 Pro. The problem is that neither the thinking budget nor the thinking level stops the model from spending roughly 2,000 to 3,000 tokens on thinking per response. This causes real problems for us: it degrades OCR quality and blows up our token budget. I hope the internal team can fix this so the model is reliable enough to use in business settings. Thanks.
(For context: I'm using image input and explicit context caching.)
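For reference, here is a minimal sketch of the kind of request I'm making, written against the google-genai Python SDK. This is not my exact production code: the project, location, model ID, cache contents, and image path are placeholders, and `thinking_level` assumes a recent SDK version that supports Gemini 3 thinking controls.

```python
from google import genai
from google.genai import types

# Vertex AI client; project and location are placeholders.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

MODEL = "gemini-3-pro-preview"  # assumed model ID for Gemini 3.0 Pro

# Explicit context cache holding the shared OCR instruction.
cache = client.caches.create(
    model=MODEL,
    config=types.CreateCachedContentConfig(
        system_instruction="You are an OCR assistant. Transcribe the document exactly.",
    ),
)

with open("page.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model=MODEL,
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Transcribe all text in this image.",
    ],
    config=types.GenerateContentConfig(
        cached_content=cache.name,
        # Either of these is supposed to cap thinking, but in practice the
        # model still emits ~2,000-3,000 thinking tokens per response.
        thinking_config=types.ThinkingConfig(
            thinking_level="low",    # Gemini 3 style control
            # thinking_budget=128,   # Gemini 2.5 style control; same result
        ),
    ),
)

# Shows the thinking-token count that ignores the configured limit.
print(response.usage_metadata.thoughts_token_count)
```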