Bounding Box Alignment Problems and Image Rescaling

I’m trying to use bounding boxes to locate number labels on images so I can paste a word label next to each one. A standalone OCR approach wasn’t adaptive enough. Gemini-pro seems to be detecting the items, but when I draw the bounding boxes, the locations are off in ways that suggest a systematic transformation error: a shift, a compression or expansion, or both. I have tried multiple techniques for converting the four-value output (per the guidelines, based on a 1000x1000 image) to the sample image’s coordinates, but the boxes are always off by a shift, a compression, or both. I can tell the correct labels are being identified because the LLM generates the right number of labels, the correct instances of the labels, and the general top-to-bottom sequence on the page. Still, the locations are significantly off. Can someone help with plotting bounding boxes from the gemini-1.5-pro model and improving their accuracy?

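Hi @Kevin_Dragan,
The bounding boxes are most likely misaligned because of mismatched image scaling or coordinate normalization. The model's coordinates are relative to a 1000x1000 grid, not to your image's pixels, so rescale each coordinate using the actual width and height of the image you draw on: x values by width / 1000 and y values by height / 1000. If you resize the image before rendering, use the resized dimensions.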
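
Here is a minimal sketch of that rescaling in Python. It assumes the model returns each box as [ymin, xmin, ymax, xmax] on a 0 to 1000 grid (y first), which is worth checking against your own output, since swapping x and y produces exactly the kind of shift-plus-stretch you describe. The function name `to_pixel_box` and its parameters are made up for illustration.

```python
def to_pixel_box(box, image_width, image_height, scale=1000):
    """Convert a normalized box to pixel coordinates.

    Assumption: `box` is [ymin, xmin, ymax, xmax] on a 0..scale grid.
    Returns (left, top, right, bottom) in pixels of the target image.
    """
    ymin, xmin, ymax, xmax = box
    # x values scale with the image width, y values with the image height.
    left = round(xmin * image_width / scale)
    top = round(ymin * image_height / scale)
    right = round(xmax * image_width / scale)
    bottom = round(ymax * image_height / scale)
    return (left, top, right, bottom)


# Example: a box on a 2000x1000 pixel (width x height) image.
print(to_pixel_box([100, 200, 300, 400], image_width=2000, image_height=1000))
```

The returned tuple is in (left, top, right, bottom) order, which is the order most drawing libraries expect for rectangles. Pass the width and height of the exact image you are drawing on, not those of a differently sized copy.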

Thanks!