Data Extraction Accuracy Issues from Documents due to Image Orientation and OCR

I’m encountering recurring errors during structured data extraction when the source document images have incorrect orientation (skewed or rotated).

These errors are not related to LLM logic or prompt instructions. When I manually align the images to the correct orientation before processing, the errors disappear.

This suggests that the core issue lies in the image pre-processing and/or OCR stage rather than in the LLM’s text interpretation. The model receives text that the OCR has already distorted or structured incorrectly, which makes accurate data extraction impossible even with detailed instructions in the prompt.

I’d prefer not to integrate a third-party OCR service/library before interacting with the API. Is this something Gemini developers can address?
I am currently using gemini-2.5-flash.

Hello,

Welcome to the Forum!!

Just to be clear: as I understand it, you cannot extract correct data when the image is rotated or skewed, and you are requesting a feature that integrates an external tool into Gemini so that it reads such images better. Did I understand your issue correctly?

Yes, it would be great if Gemini processed rotated or skewed images the same way as properly oriented ones.

Would it be possible for you to share your code and the images that you are passing so that I can recreate your issue?

Example:
I am sending a JPEG with the document and an example JSON structure. Prompt: “Convert PDF file to JSON. I am attaching the JSON structure description.”
The document in the image is rotated 90 degrees. The result of processing the document table is not entirely correct. If the JPEG is rotated to the correct position before sending, everything is fine.
Additional instructions in the prompt can only slightly improve the result.

Model: gemini-2.5-flash;
Temperature: 0.1;
Thinking mode: off.
JSON:

[
  {
    "DocumentName": "doc name",
    "DocumentHeader": {
      "Number": "doc number",
      "Date": "document date (YYYY-MM-DD)"
    },
    "DocumentParties": [
      {
        "Role": "company role",
        "Name": "company name",
        "Addresses": [
          {
            "StreetAndNumber": "street",
            "CityName": "city",
            "Region": "region",
            "PostalCode": "zip"
          }
        ]
      }
    ],
    "DocumentLines": [
      {
        "LineNumber": "line number",
        "BuyerItemCode": "item code",
        "EAN": "EAN-8 or EAN-13",
        "ItemDescription": "item description",
        "InvoiceQuantity": "quantity",
        "UnitGrossPrice": "gross price",
        "GrossAmount": "total"
      }
    ]
  }
]
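
For anyone reproducing this outside AI Studio, the settings above can be expressed as a request body roughly like the sketch below. This is only an illustration, not the poster's code (which was not shared): the endpoint URL, the `inline_data` part and the `thinkingConfig.thinkingBudget = 0` setting for "thinking off" are assumptions based on the public REST API, and `build_request` is a hypothetical helper. Nothing is sent over the network here.

```python
import base64
import json

# Assumed REST endpoint for the model named in the post; check the API reference.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.5-flash:generateContent"
)


def build_request(image_bytes: bytes, prompt: str, template: str) -> dict:
    """Build a generateContent request body (hypothetical helper, sends nothing)."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    # The JSON structure description is attached as plain text.
                    {"text": template},
                    {
                        "inline_data": {
                            "mime_type": "image/jpeg",
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ],
        "generationConfig": {
            "temperature": 0.1,
            # Assumption: a thinking budget of 0 corresponds to "Thinking mode: off".
            "thinkingConfig": {"thinkingBudget": 0},
        },
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the rotated JPEG, which is not included here.
    body = build_request(
        b"\xff\xd8\xff",
        "Convert PDF file to JSON. I am attaching the JSON structure description.",
        '[{"DocumentName": "doc name"}]',
    )
    print(json.dumps(body["generationConfig"], indent=2))
```

Sending the same body twice, once with the rotated image and once with the upright one, keeps everything except orientation constant.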

Hi,

Would it be possible for you to share the part of your code where you call the Gemini API?

Hi,
Sadly, I can’t show the code.
The bug can be reproduced in Google AI Studio with the settings I described above.

Hello,

I would need your document to reproduce the issue. A few samples where you are facing the problem should be enough. Is it possible for you to share the document?

Hello,
This document can be used to reproduce the error.

Hi,

Apologies, I could not find any document attached to your comment.

Hello.
I added the document twice. It was removed twice by amit_rana. I don’t know why he does that. I’m adding the document again.

It would be better if you DM me this picture with your prompt. And you might want to remove this here.

The document does not contain any confidential information. No need to delete.
Prompt that reproduces the error:
“Convert PDF file to JSON. I am attaching the JSON structure description.”
Model settings:
Model: gemini-2.5-flash;
Temperature: 0.1;
Thinking mode: off.
JSON:

[
  {
    "DocumentName": "doc name",
    "DocumentHeader": {
      "Number": "doc number",
      "Date": "document date (YYYY-MM-DD)"
    },
    "DocumentParties": [
      {
        "Role": "company role",
        "Name": "company name",
        "Addresses": [
          {
            "StreetAndNumber": "street",
            "CityName": "city",
            "Region": "region",
            "PostalCode": "zip"
          }
        ]
      }
    ],
    "DocumentLines": [
      {
        "LineNumber": "line number",
        "BuyerItemCode": "item code",
        "EAN": "EAN-8 or EAN-13",
        "ItemDescription": "item description",
        "InvoiceQuantity": "quantity",
        "UnitGrossPrice": "gross price",
        "GrossAmount": "total"
      }
    ]
  }
]

If you send the file to the model in exactly the orientation I posted, BuyerItemCode will not contain the complete item code.
If the image is rotated to the correct orientation, there is no error.

Hi,

This document has buyer and seller information, so we will have to remove it from the public forum.

The buyer and seller information is not real data.

Hello,

I passed the image you shared (dummy data) to AI Studio in the same orientation. I turned on Structured output but did not specify any structure, and I received this output:

{
  "invoice_number": "123456",
  "invoice_date": "27/05/2024",
  "seller": {
    "name": "LLC \"Mega Company 1\"",
    "address": "Address: 1111, Zakarpattia Region, Tyachiv, str. Svobody, 9-b"
  },
  "buyer": {
    "name": "LLC \"Super Company 2\"",
    "address": "Address: 22222, Luhansk Region, Lysychansk, str. Zelena, 17"
  },
  "sales_point": {
    "name": "Supermarket \"Loneliness\"",
    "address": "Address: 33333, Kyiv Region, Obuhiv, str. Chervona, 22"
  },
  "items": [
    {
      "no": 1,
      "item_code": "1223334",
      "item_description": "Gillette Shaving Foam Sensitive Skin 200ml",
      "ean": "7126352817290",
      "quantity": 10,
      "gross_price": 5,
      "total": 50
    },
    {
      "no": 2,
      "item_code": "5342125",
      "item_description": "Head & Shoulders Anti-Dandruff Shampoo Menthol Fresh 400ml",
      "ean": "5432716290012",
      "quantity": 5,
      "gross_price": 33,
      "total": 165
    }
  ]
}

I believe this is the correct information that we wanted to extract from the image. I used Gemini 2.5 Flash with Temperature = 0.1, Thinking mode = off, Structured output = on.

This shows that Gemini 2.5 Flash is capable of reading rotated images. Next, you might want to explore how to structure your output better.

You can find relevant information about this in the Gemini API docs.
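
As a rough illustration of specifying a structure for Structured output, the sketch below builds a response schema for the DocumentLines part of the template posted earlier. The `responseMimeType` and `responseSchema` field names and the upper-case type names are assumptions based on the public REST API, and `make_line_schema` is a hypothetical helper. A schema only constrains the shape of the answer; it would not by itself make the model read a rotated column correctly.

```python
import json

# Field names are taken from the DocumentLines template posted in this thread.
LINE_FIELDS = [
    "LineNumber",
    "BuyerItemCode",
    "EAN",
    "ItemDescription",
    "InvoiceQuantity",
    "UnitGrossPrice",
    "GrossAmount",
]


def make_line_schema(fields: list[str]) -> dict:
    """Build an array-of-objects schema (hypothetical helper)."""
    return {
        "type": "ARRAY",
        "items": {
            "type": "OBJECT",
            # Codes stay strings so values such as "53421253/832141234"
            # or codes with leading zeros are not coerced to numbers.
            "properties": {name: {"type": "STRING"} for name in fields},
            "required": list(fields),
        },
    }


# Assumed generationConfig keys for JSON output constrained by a schema.
generation_config = {
    "temperature": 0.1,
    "responseMimeType": "application/json",
    "responseSchema": make_line_schema(LINE_FIELDS),
}

if __name__ == "__main__":
    print(json.dumps(generation_config, indent=2))
```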

The result contains errors: item_code was not extracted correctly. For position 1, the expected result is 12233344. For position 2, it is 53421253/832141234.
If you repeat the test after rotating the image back to the correct position, everything is OK.
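
The mismatch is easy to miss by eye because the extracted codes look plausible. A minimal sketch of an automated check, using only the values quoted in this thread (`find_truncated` is a hypothetical helper name), flags codes that came back as a shortened prefix of the expected value:

```python
def find_truncated(
    expected: list[str], extracted: list[str]
) -> list[tuple[int, str, str]]:
    """Return (position, expected, extracted) for every extracted code
    that is a strict prefix of the expected code."""
    problems = []
    for position, (want, got) in enumerate(zip(expected, extracted), start=1):
        if got != want and want.startswith(got):
            problems.append((position, want, got))
    return problems


# Expected item codes, as stated in the post above.
expected_codes = ["12233344", "53421253/832141234"]
# item_code values returned for the rotated image in the earlier output.
rotated_codes = ["1223334", "5342125"]

if __name__ == "__main__":
    for position, want, got in find_truncated(expected_codes, rotated_codes):
        print(f"position {position}: expected {want!r}, got {got!r}")
```

Both positions are reported as truncated for the rotated image, which matches the description above.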

OK, now I understand your issue: when the image is rotated, Gemini struggles to read multiple lines within a column. This seems like a model performance issue.
Thank you for raising this. I have noted your input and will forward it to the concerned team.
