Data Extraction Accuracy Issues from Documents due to Image Orientation and OCR

I’m encountering recurring errors during structured data extraction when the source document images have incorrect orientation (skewed or rotated).

These errors are not related to LLM logic or prompt instructions. When I manually align the images to the correct orientation before processing, the errors disappear.

This suggests that the core issue lies in the image pre-processing and/or OCR stage rather than in the LLM’s text interpretation. The LLM receives text that the OCR has already distorted or structured incorrectly, so accurate data extraction is impossible even with detailed instructions in the prompt.
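To illustrate what the alignment step amounts to in the simplest case (rotation by a multiple of 90 degrees), here is a minimal sketch in plain Python. It assumes the required rotation is already known, for example from the image's EXIF Orientation tag, and it does not handle arbitrary skew. The nested-list pixel grid and the names `EXIF_TO_CW_DEGREES`, `rotate_cw`, and `correct_orientation` are hypothetical and only for illustration; in practice an imaging library would do this on real image files.

```python
# Assumption: the image is a nested list of pixel values (rows of columns),
# and the needed correction is a multiple of 90 degrees.

# EXIF Orientation tag values mapped to the clockwise rotation (in degrees)
# needed to display the image upright. Mirrored values (2, 4, 5, 7) are
# deliberately left out of this sketch.
EXIF_TO_CW_DEGREES = {1: 0, 3: 180, 6: 90, 8: 270}


def rotate_cw(pixels, degrees):
    """Return a copy of the pixel grid rotated clockwise by `degrees`."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    rotated = [list(row) for row in pixels]
    for _ in range((degrees // 90) % 4):
        # Reversing the row order and transposing is one clockwise quarter turn.
        rotated = [list(row) for row in zip(*rotated[::-1])]
    return rotated


def correct_orientation(pixels, exif_orientation):
    """Rotate the grid upright based on an EXIF Orientation value."""
    if exif_orientation not in EXIF_TO_CW_DEGREES:
        raise ValueError(f"unsupported EXIF orientation: {exif_orientation}")
    return rotate_cw(pixels, EXIF_TO_CW_DEGREES[exif_orientation])


if __name__ == "__main__":
    page = [[1, 2],
            [3, 4]]
    # Orientation 6 means the stored image needs a 90-degree clockwise turn.
    print(correct_orientation(page, 6))
```

A correction like this only covers images that carry reliable orientation metadata; scans that are skewed by a few degrees or have no such tag would still need the orientation to be detected from the content itself, which is the part I am hoping can be handled on the model side.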

I’d prefer not to integrate a third-party OCR service/library before interacting with the API. Is this something Gemini developers can address?
I am currently using gemini-2.5-flash.
