Bug Report: Consistent PDF Upload/Processing Failure with Gemini 3.x Models in Google AI Studio (Affects Python SDK too)
Platform: Google AI Studio + Python SDK (google-generative-ai)
Duration: Ongoing for approximately 2 months
Summary
Gemini 3.x generation models (Gemini 3.1 Pro, Gemini 3.5 Flash, Gemini 3.1 Flash-Lite) consistently fail to process a specific subset of standard, well-formed PDF documents. The model ingests the file without raising any error, but responds as if no document was provided. The same PDFs and prompts work flawlessly with Gemini 2.5 Flash. Through technical analysis of four failing files (vs. two working references), I have identified what appears to be the root cause, which follows a reproducible and consistent pattern across all of them.
Affected Models
- Gemini 3.1 Pro
- Gemini 3.5 Flash
- Gemini 3.1 Flash-Lite
Unaffected Models (Working Correctly)
- Gemini 2.5 Flash: reliably processes the same PDFs with no issues
Steps to Reproduce
In Google AI Studio:
- Open Google AI Studio and select a Gemini 3.x model (e.g., Gemini 3.5 Flash).
- Attach a PDF with the technical characteristics described in the Root Cause section below (Type1 fonts with custom /Differences encoding and no /ToUnicode map).
- Submit a prompt such as: “Extract all data from this PDF.”
- Observe the response: the model behaves as if no PDF was received and asks for the content to be pasted as plain text.
Via Python SDK:
Using Part.from_file() to pass PDF data to a Gemini 3.x model via generate_content() produces the same behavior — the model does not process the file content.
Observed Behavior
- No error is raised during file upload or ingestion.
- The model responds as if the PDF was never attached, asking the user to paste the document content manually.
- The model’s internal reasoning (visible via the thinking field) explicitly states: “the actual text of the financial document has not been provided in the prompt.”
Expected Behavior
Gemini 3.x models should reliably parse and extract information from uploaded PDF documents, at parity with — or better than — Gemini 2.5 Flash, including applying a fallback to the /Differences encoding array when a /ToUnicode map is absent (see Technical Recommendation below).
Root Cause Analysis
I analysed four failing files and two working reference files to isolate the cause. The pattern is clear and fully reproducible.
File matrix
| File | Producer | /ToUnicode missing on | Ingestion result |
|---|---|---|---|
| LU0089290844_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts (R10, R12, R14) | Fails |
| LU2533812058_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts (R10, R12, R14, R35) | Fails |
| LU2314312922_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts (R10, R12, R14, R39) | Fails |
| LU2526007799_KID.pdf | Neevia docCreator v5.0 | /R39 on page 2 only | Partial failure (page 2 corrupted) |
| PRIIP_KID_F0GBR04BQM_299.pdf | Neevia docCreator v5.0 | None | Works |
Primary cause — Missing /ToUnicode maps on Type1 fonts
All failing files share the same structural defect: their Type1 fonts use a custom /Encoding with a /Differences array (a non-standard glyph mapping where character codes do not correspond to Unicode code points) but do not include a /ToUnicode map.
Example font table from LU0089290844_KID.pdf (page 1):
| Font | Subtype | Encoding | /ToUnicode |
|---|---|---|---|
| /R10 | Type1 | Custom (WinAnsiEncoding + /Differences) | ABSENT |
| /R12 | Type1 | Custom (WinAnsiEncoding + /Differences) | ABSENT |
| /R14 | Type1 | Custom (WinAnsiEncoding + /Differences) | ABSENT |
| /R7 | TrueType | - | Present |
Without a /ToUnicode map, the text renders visually correctly but cannot be extracted as Unicode text unless the extractor falls back to the glyph names in /Differences. An extractor that relies solely on /ToUnicode obtains unmappable glyphs and effectively sees an empty document, which is precisely the behavior observed with Gemini 3.x.
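To make the failing condition concrete, here is a minimal sketch of the check I mean. It works on plain dictionaries that mimic PDF font dictionaries; the function names and the sample data are hypothetical (the /Differences contents in particular are made up for illustration), and with a real PDF library the dictionaries would come from each page's /Resources /Font entry.

```python
from collections.abc import Mapping


def font_extraction_risk(font: Mapping) -> str:
    """Classify one font dictionary by how its text can be mapped to Unicode."""
    if "/ToUnicode" in font:
        return "ok: /ToUnicode present"
    encoding = font.get("/Encoding")
    # /Encoding is either a name (e.g. "/WinAnsiEncoding") or a dictionary
    # that may carry a /Differences array.
    if isinstance(encoding, Mapping) and "/Differences" in encoding:
        return "at risk: /Differences without /ToUnicode"
    return "fallback: no /ToUnicode, no /Differences"


def at_risk_fonts(fonts: Mapping) -> list:
    """Return the names of fonts matching the failing pattern, sorted."""
    return sorted(
        name for name, font in fonts.items()
        if font_extraction_risk(font).startswith("at risk")
    )


# Hypothetical stand-in for page 1 of LU0089290844_KID.pdf, following the
# font table above. The /Differences values are illustrative only.
custom = {"/BaseEncoding": "/WinAnsiEncoding", "/Differences": [1, "/T", "/h"]}
PAGE1_FONTS = {
    "/R10": {"/Subtype": "/Type1", "/Encoding": custom},
    "/R12": {"/Subtype": "/Type1", "/Encoding": custom},
    "/R14": {"/Subtype": "/Type1", "/Encoding": custom},
    "/R7": {"/Subtype": "/TrueType", "/ToUnicode": "<stream>"},
}

if __name__ == "__main__":
    print(at_risk_fonts(PAGE1_FONTS))
```

Running the same check per page would also flag the partial case, where only a single font on a single page matches the pattern.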
The partial-failure case strengthens the diagnosis
LU2526007799_KID.pdf (produced by the newer v5.0) is particularly revealing: only font /R39 on page 2 lacks /ToUnicode. The rest of the document is extracted correctly, but page 2 produces corrupted output (unresolved placeholders such as |num07070oneoffcostsportfolioentrycost| and disordered rows in the cost section). This confirms that the failure is per-font and proportional to the missing maps — not a binary all-or-nothing parser failure.
Correlation with the PDF generator
All files produced by Neevia docCreator v4.5 systematically omit /ToUnicode on Type1 fonts with custom encoding. Files produced by v5.0 include it, with the one exception noted above (/R39 on page 2 of LU2526007799_KID.pdf). This is a defect in the upstream generation tool, but the issue reported here remains valid on Google’s side: the Gemini 3.x ingestion pipeline does not apply the ISO 32000-compliant fallback when /ToUnicode is absent, whereas Gemini 2.5 Flash does.
Secondary factor — Compressed cross-reference stream
The v4.5 files use a compressed xref stream instead of a plain xref table:
/Type /XRef, /Filter /FlateDecode, /DecodeParms << /Columns 5 /Predictor 12 >>, /W [1 3 1]
This alone does not cause the failure, but may reduce tolerance in less robust parsers.
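For reference, this is roughly what a parser has to do to read such an xref stream. It is a sketch under stated assumptions, not a full implementation: the function names are mine, only PNG filter types 0 (None) and 2 (Up) are handled, and it assumes /Columns equals the sum of the /W widths, as it does in the dictionary quoted above.

```python
import zlib


def undo_png_predictor(data: bytes, columns: int) -> bytes:
    """Reverse PNG row prediction as used with /Predictor 12 (None and Up rows)."""
    row_len = columns + 1  # one filter-type byte precedes each row
    if len(data) % row_len:
        raise ValueError("stream length is not a whole number of rows")
    prev = bytes(columns)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data), row_len):
        tag = data[start]
        raw = data[start + 1:start + row_len]
        if tag == 0:    # None: bytes stored as-is
            row = raw
        elif tag == 2:  # Up: each byte is a delta against the row above
            row = bytes((a + b) & 0xFF for a, b in zip(raw, prev))
        else:
            raise ValueError(f"unsupported PNG filter type {tag}")
        out += row
        prev = row
    return bytes(out)


def parse_xref_entries(stream: bytes, widths=(1, 3, 1), columns=5):
    """Inflate, un-predict, and split an xref stream into (type, field2, field3)."""
    rows = undo_png_predictor(zlib.decompress(stream), columns)
    size = sum(widths)
    entries = []
    for start in range(0, len(rows), size):
        fields, pos = [], start
        for width in widths:
            # Every field is a big-endian unsigned integer of the given width.
            fields.append(int.from_bytes(rows[pos:pos + width], "big"))
            pos += width
        entries.append(tuple(fields))
    return entries
```

With /W [1 3 1], each entry is a 1-byte type, a 3-byte offset (or object stream number), and a 1-byte generation (or index within the object stream). Nothing here is exotic, which is why I treat this as a secondary factor only.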
Note on structural tagging
None of the files are tagged PDFs (/MarkInfo, /StructTreeRoot, and /Lang are absent in all of them). This is therefore not a discriminating factor, but it rules out a structural tree as an alternative text extraction fallback.
Technical Recommendation
When a font has /Encoding with a /Differences array but no /ToUnicode map, the extractor should reconstruct the character-to-Unicode mapping from the glyph names listed in /Differences (e.g., /T, /h, /period, /space) via the standard Adobe Glyph List. This is the fallback implemented by libraries such as pypdf, and it is precisely what allows those files to be extracted correctly outside of Gemini 3.x. Implementing this fallback would resolve all the failure cases described above.
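A minimal sketch of that fallback, to show how little is involved. Assumptions: the glyph table is a tiny hand-picked excerpt rather than the full Adobe Glyph List, cp1252 stands in for WinAnsiEncoding (they are close but not identical), only the single-group uniXXXX name form is handled, and all function names are hypothetical.

```python
import re

# Tiny excerpt of the Adobe Glyph List; a real extractor would load the full list.
AGL_EXCERPT = {
    "space": " ", "period": ".", "comma": ",", "hyphen": "-",
    "T": "T", "h": "h", "e": "e", "Euro": "\u20ac",
}


def parse_differences(differences):
    """Turn [code, /name, /name, code, /name, ...] into {code: glyph_name}."""
    mapping = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item  # an integer restarts the running character code
        else:
            if code is None:
                raise ValueError("glyph name appears before any character code")
            mapping[code] = str(item).lstrip("/")
            code += 1    # consecutive names take consecutive codes
    return mapping


def glyph_to_unicode(name):
    """Map a glyph name to text via the excerpt, then the uniXXXX convention."""
    if name in AGL_EXCERPT:
        return AGL_EXCERPT[name]
    match = re.fullmatch(r"uni([0-9A-F]{4})", name)
    if match:
        return chr(int(match.group(1), 16))
    return "\ufffd"  # unknown glyph name: emit the replacement character


def decode_bytes(data, differences, base_encoding="cp1252"):
    """Decode a string shown in a font with /Differences and no /ToUnicode."""
    overrides = parse_differences(differences)
    out = []
    for byte in data:
        if byte in overrides:
            out.append(glyph_to_unicode(overrides[byte]))
        else:
            # Codes not listed in /Differences keep the base encoding.
            out.append(bytes([byte]).decode(base_encoding, errors="replace"))
    return "".join(out)


if __name__ == "__main__":
    sample = [1, "/T", "/h", "/e", 32, "/space", 46, "/period"]
    print(decode_bytes(b"\x01\x02\x03 end.", sample))  # prints "The end."
```

The sample remaps codes 1 to 3 to T, h, e, so the raw bytes are meaningless without the /Differences array but decode cleanly once it is applied.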
Impact
This is a production blocker for data extraction pipelines currently being migrated from Gemini 2.x to Gemini 3.x. The affected PDFs are legitimate, publicly distributed financial documents (Key Information Documents / KIDs) compliant with EU regulatory standards (PRIIPs Regulation).
Happy to share the PDF files or SDK code snippets directly if that helps with reproduction. Thank you for looking into this.