Bounding Box Alignment Problems and Image Rescaling

Kevin_Dragan · September 10, 2024, 1:01pm

I’m trying to use bounding boxes to locate number labels on images so I can paste a word label next to each one. A standalone OCR approach wasn’t adaptive enough. Gemini Pro seems to be detecting the items, but when I try to draw the bounding boxes, the locations are off in ways that point to a systematic transformation error: a shift, a compression or expansion, or both. I have tried multiple conversion techniques for the 4-value output (per the guideline that coordinates are based on a 1000x1000 image) to generate boxes against the sample image, but the boxes are always off in location by a shift, a compression, or both. I can tell the correct labels are being identified because the LLM returns the right number of labels, the correct label instances, and the general top-to-bottom sequence on the page. Still, the locations are significantly off. Can someone help with plotting bounding boxes, and with bounding box accuracy, from the gemini-1.5-pro model?
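For reference, here is a minimal sketch of the kind of conversion involved. It assumes the four values come back ordered as [ymin, xmin, ymax, xmax] on a 0-1000 grid, which is my reading of the guidelines and not something confirmed in this thread. The function name `to_pixel_box` and the sample numbers are made up for illustration. If the order is read as x-first, or both axes are scaled by the same factor on a non-square image, the result would show the sort of shift and compression described above.

```python
def to_pixel_box(box, width, height, scale=1000):
    """Map a box on a 0..scale grid to pixel (left, top, right, bottom).

    Assumption: `box` is ordered [ymin, xmin, ymax, xmax].
    `width` and `height` must be the size of the image being drawn on.
    """
    ymin, xmin, ymax, xmax = box
    # x values scale with the image width, y values with the image height.
    left = round(xmin * width / scale)
    top = round(ymin * height / scale)
    right = round(xmax * width / scale)
    bottom = round(ymax * height / scale)
    return left, top, right, bottom


# Hypothetical box on a 1600x1200 image.
print(to_pixel_box([250, 100, 500, 400], width=1600, height=1200))
# → (160, 300, 640, 600)
```

The returned tuple is in (left, top, right, bottom) order, so it can be passed directly to a drawing call such as Pillow's `ImageDraw.rectangle`. If the image was resized before being sent to the model, the width and height used here should still be those of the image the boxes are drawn on.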
