File Search Tool fails to process medical PDF documents with complex table structures and embedded charts

John_Mavic · November 11, 2025, 12:24pm

Problematic File Characteristics:

I have identified a specific PDF type that the File Search Tool consistently fails to process: medical cardiopulmonary stress test reports from hospital systems. Due to data privacy regulations, I cannot share the actual file, but I can provide detailed characteristics that likely contribute to the failure:

Complex Mixed Content Structure:
- Dense multi-column tables with hundreds of rows of numerical data
- Multiple embedded charts/graphs on the same pages as tabular data
- Overlapping content layers (text, tables, and vector graphics)

Special Characters and Encoding:
- Medical measurement units with special symbols (°C, L/min, mmHg, mL/min/kg)
- German umlauts (ä, ö, ü) in patient data
- Mathematical symbols (ø, ², %, )

High Data Density:
- 5 pages containing over 300 rows of time-series measurement data
- 18+ columns per table with mixed data types (text, integers, floats)
- Multiple nested header rows

Embedded Visualizations:
- 15+ line charts showing physiological parameters over time
- Flow-volume loops
- Charts with gridlines, legends, and multiple data series

Expected Behavior: The File Search Tool should successfully parse, chunk, and index the PDF file, extracting both the tabular data and recognizing the presence of charts.

Actual Behavior: The upload process fails, causing the entire batch upload operation to stall and eventually fail.

Impact: This issue is critical for enterprise use cases where:

- Users need to upload large document sets containing medical reports, technical specifications, or scientific papers
- A single problematic file blocks processing of all subsequent files
- No clear error message indicates which aspect of the file caused the failure
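
As a client-side stopgap for the blocking behavior, each file can be submitted on its own so that one failure is recorded and the rest of the batch continues. The following is only a minimal sketch: `upload_each`, `upload_one`, and `fake_upload` are hypothetical names, and `upload_one` stands in for whatever call actually wraps the File Search upload. It does not handle an upload that hangs without raising; that would need a timeout inside `upload_one`.

```python
from typing import Callable, Dict, Iterable


def upload_each(paths: Iterable[str],
                upload_one: Callable[[str], None]) -> Dict[str, str]:
    """Upload files one at a time and record a per-file outcome.

    `upload_one` is a placeholder (assumption) for the real upload call.
    An exception from one file is captured so later files still run.
    """
    results: Dict[str, str] = {}
    for path in paths:
        try:
            upload_one(path)
            results[path] = "ok"
        except Exception as exc:  # keep going; note which file failed
            results[path] = f"failed: {type(exc).__name__}: {exc}"
    return results


def fake_upload(path: str) -> None:
    """Stand-in uploader for illustration: rejects one specific file."""
    if path.endswith("stress_test.pdf"):
        raise ValueError("could not process file")


report = upload_each(["a.pdf", "stress_test.pdf", "b.pdf"], fake_upload)
for path, status in report.items():
    print(f"{path}: {status}")
```

The per-file report at least identifies which document failed, even when the service itself returns no detailed error.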

Suspected Root Cause: The File Search Tool’s chunking algorithm likely encounters issues when processing PDFs that contain:

- Overlapping bounding boxes from tables and embedded graphics
- Complex text extraction where column boundaries are ambiguous
- Special Unicode characters that may not be handled correctly during the embedding generation phase
- Documents where content is rendered as a combination of text objects and vector graphics
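
To help narrow down the Unicode hypothesis, the non-ASCII characters in text extracted from an affected file can be listed together with their NFKC-normalized forms. This is a rough diagnostic sketch using only the Python standard library; `profile_non_ascii` is a hypothetical helper, the sample string is made up, and nothing here reflects how the File Search Tool handles text internally.

```python
import unicodedata
from typing import Dict, Tuple


def profile_non_ascii(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each distinct non-ASCII character to (Unicode name, NFKC form)."""
    profile: Dict[str, Tuple[str, str]] = {}
    for ch in text:
        if ord(ch) > 127 and ch not in profile:
            name = unicodedata.name(ch, "UNNAMED")
            profile[ch] = (name, unicodedata.normalize("NFKC", ch))
    return profile


# Made-up sample resembling the units and symbols listed in this report.
sample = "37 °C, 12 mL/min/kg, m², ø 5, Müller"
for ch, (name, nfkc) in profile_non_ascii(sample).items():
    print(f"{ch!r}: {name}; NFKC -> {nfkc!r}")
```

Re-uploading a copy of the text with these characters normalized or replaced would show whether the encoding is involved at all or whether the layout is the more likely cause.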

Request: If needed, I can provide a sample file that reproduces this issue through a secure channel, but due to the sensitive nature of medical data, I cannot attach it to this public bug report.

Mahesh_Sutar · December 30, 2025, 12:24pm

Hello @John_Mavic

I tried to reproduce this on my end using a sample chemical document that includes dense multi-column tables, pharmacokinetic time-series charts, special-character encoding, and layered vector graphics.

It works as expected with the File Search API for me.

Could you DM me the sample code snippet, including the prompts, and a sample document so I can check it on my end?
