I have a bunch of scripts still using gemini-1.5-flash-002. I’ve tried to migrate to 2.5 Flash before, but because thinking mode is enabled by default, the cost is much higher than I can afford. The problem is that trying to disable thinking mode doesn’t work.
There is already a GitHub issue:
Even the example in the repo is throwing errors:
google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'The model does not support setting thinking_budget to 0.', 'status': 'INVALID_ARGUMENT'}}
Has anyone successfully disabled thinking mode?
As mentioned in the "Gemini thinking" page of the Gemini API documentation (Google AI for Developers), you can disable thinking mode in the Gemini 2.5 Flash model by setting the thinking_budget parameter to 0 in the API request. Please see the gist. Its example output confirms this: thinking_budget=0 is set, and the response metadata contains no thoughts_token_count, indicating that no reasoning tokens were generated.
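A minimal sketch of that request, assuming the google-genai Python SDK (pip install google-genai) and an API key in the GEMINI_API_KEY or GOOGLE_API_KEY environment variable. The model ID "gemini-2.5-flash" and the helper names generate_without_thinking and used_thinking are illustrative assumptions; substitute the 2.5 Flash model ID your scripts actually use.

```python
def generate_without_thinking(prompt: str, model: str = "gemini-2.5-flash"):
    """Call a Gemini 2.5 Flash model with thinking disabled (thinking_budget=0).

    The model ID default is an assumption; pass the 2.5 Flash ID you use.
    """
    # Imported here so the used_thinking() helper works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment
    return client.models.generate_content(
        model=model,
        contents=prompt,
        config=types.GenerateContentConfig(
            # A budget of 0 asks the model to skip reasoning entirely.
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )


def used_thinking(usage_metadata) -> bool:
    """Return True if the response reports any reasoning ("thought") tokens."""
    # thoughts_token_count is None (absent) when no thinking tokens were generated.
    return bool(getattr(usage_metadata, "thoughts_token_count", None))
```

For example, response = generate_without_thinking("Explain how AI works in a few words") followed by used_thinking(response.usage_metadata) should return False when thinking was actually disabled, which is a cheap way to confirm the change in each migrated script.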
With the Gemini 2.5 Pro model, by contrast, thinking cannot be disabled; setting thinking_budget=0 results in a 400 INVALID_ARGUMENT error.
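If the same scripts may target either model, a small local guard can turn the Pro case into a clear error before any request is sent. This is a hypothetical sketch: MODELS_THAT_CANNOT_DISABLE_THINKING and check_thinking_budget are illustrative names, and the prefix list encodes only the Pro restriction described in this thread.

```python
# Hypothetical list of model-ID prefixes that reject thinking_budget=0,
# based only on the Gemini 2.5 Pro behavior described in this thread.
MODELS_THAT_CANNOT_DISABLE_THINKING = ("gemini-2.5-pro",)


def check_thinking_budget(model: str, thinking_budget: int) -> None:
    """Fail fast locally instead of receiving 400 INVALID_ARGUMENT from the API."""
    # str.startswith accepts a tuple, so any listed prefix matches.
    if thinking_budget == 0 and model.startswith(MODELS_THAT_CANNOT_DISABLE_THINKING):
        raise ValueError(
            f"{model} does not support setting thinking_budget to 0; "
            "use a 2.5 Flash model if thinking must be disabled."
        )
```

Calling check_thinking_budget(model, budget) just before generate_content keeps the failure in your own code, with a message that says which model to switch to.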