Inconsistent Data Extraction and Skipped Content Using Gemini API Models

Mohammed_Anas_M · February 5, 2025, 8:51am

Hello,

I am using Gemini API models to extract information from PRD (Product Requirement Document) files. Our goal is to extract all relevant details in text form, including text information from images, flowcharts, descriptions, tables, diagrams, annotations, and any other structured or unstructured elements within the document.

However, we are encountering the following issues:

Inconsistent Data Extraction:

Some flowcharts, tables, or diagrams are not fully processed or extracted. The text information from certain sections (like images, annotations, or descriptions) is incomplete or missing entirely.

Skipped Content in Longer Documents:

For longer PRD documents, the models often fail to read the complete content and skip significant portions of the document.

Has anyone experienced similar challenges with Gemini API models? Are there any recommended configurations, preprocessing techniques, or strategies to ensure the models process the entire document and improve the accuracy of the extracted data?
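
To make "preprocessing techniques" concrete, one strategy of the kind meant here is to split a long document into overlapping page ranges and send each range as its own request, so that no single call has to cover the whole file. The snippet below is only a minimal sketch of the range planning. The function name `page_ranges` and the default chunk size and overlap are assumptions for illustration, not recommended values, and it deliberately contains no Gemini API calls.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, chunk_size: int = 20, overlap: int = 1) -> List[Tuple[int, int]]:
    """Plan 1-indexed, inclusive (start, end) page ranges that cover a document.

    chunk_size and overlap are illustrative defaults (assumptions), not tuned values.
    Consecutive ranges share `overlap` pages so content that spans a page
    boundary (for example a table continued on the next page) appears in both.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    ranges: List[Tuple[int, int]] = []
    start = 1
    while start <= total_pages:
        end = min(start + chunk_size - 1, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break
        # Step back by `overlap` pages so the next range re-reads the boundary.
        start = end - overlap + 1
    return ranges


def covers_all_pages(ranges: List[Tuple[int, int]], total_pages: int) -> bool:
    """Check that every page from 1 to total_pages falls in at least one range."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return covered == set(range(1, total_pages + 1))


if __name__ == "__main__":
    plan = page_ranges(45, chunk_size=20, overlap=1)
    print(plan)  # [(1, 20), (20, 39), (39, 45)]
    print(covers_all_pages(plan, 45))  # True
```

Each planned range would then be extracted and processed separately, and the coverage check gives a simple way to confirm that no pages were dropped before the per-range results are merged.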

Any guidance, best practices, or workarounds would be greatly appreciated.

Thank you!

Mrinal_Ghosh · June 12, 2025, 10:30am

Hi @Mohammed_Anas_M,

Welcome to the forum!

Sorry for the late response.
Could you please let me know which Gemini model you are using?
