Gemini 2.5 Flash performs worse at bounding box detection when thinking is enabled

For our bounding box detection tasks on documents, we found that Gemini 2.5 Flash performs quite well, but only with thinking_budget=0.

With thinking_budget>0, the bounding boxes are much worse and are sometimes far from the object we are trying to detect.
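
In case anyone wants to compare numbers rather than eyeball the results, here is a minimal sketch of how the difference could be measured with intersection over union (IoU) against hand-labelled boxes. It assumes the model returns boxes as [ymin, xmin, ymax, xmax] on a 0-1000 scale, which should be checked against the actual responses; the helper names `to_pixels` and `iou` are made up for illustration and are not part of any SDK.

```python
from typing import Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def to_pixels(box_2d: Sequence[float], width: int, height: int) -> Box:
    """Convert a box to pixel coordinates.

    Assumption: box_2d is [ymin, xmin, ymax, xmax] normalized to 0-1000.
    """
    y_min, x_min, y_max, x_max = box_2d
    return (
        x_min / 1000 * width,
        y_min / 1000 * height,
        x_max / 1000 * width,
        y_max / 1000 * height,
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two pixel boxes; 0.0 if they do not overlap."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Averaging `iou(to_pixels(predicted, w, h), ground_truth)` over the same set of documents, once with thinking_budget=0 and once with thinking_budget>0, would give one number per setting to compare.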

Has anybody made similar observations?