Update
We have identified that the issue is related to how the PDF is being inserted. We created a reproducible code sample showing that it works fine with gemini-2.5-flash-lite, but fails with gemini-2.5-flash.

Important: This exact code has been working for weeks without any problems, but since Saturday it no longer works with gemini-2.5-flash.
Below is the code:
try:
    prompt = "Summarize the document."

    # Read the PDF and attach it inline as raw bytes
    with open("6200195.pdf", "rb") as file:
        binary_data = file.read()

    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_bytes(
                    data=binary_data,
                    mime_type='application/pdf',
                ),
                types.Part(text=prompt.strip())
            ]
        )
    ]

    config = types.GenerateContentConfig(
        temperature=0.3,
        max_output_tokens=40000,
        thinking_config=types.ThinkingConfig(
            thinking_budget=0,
        ),
        seed=0,
        top_p=0.5,
        response_modalities=["TEXT"],
        system_instruction=None,
        safety_settings=[
            types.SafetySetting(category="HARM_CATEGORY_HATE_SPEECH", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold="OFF"),
            types.SafetySetting(category="HARM_CATEGORY_HARASSMENT", threshold="OFF"),
        ]
    )

    # Generate response with streaming
    response_text = ""
    for chunk in client.models.generate_content_stream(
        model=model_name,
        contents=contents,
        config=config
    ):
        # print(f"CHUNK: #{chunk}")
        if hasattr(chunk, 'candidates') and chunk.candidates:
            for candidate in chunk.candidates:
                if hasattr(candidate, 'content') and candidate.content:
                    for part in candidate.content.parts:
                        if hasattr(part, 'text') and part.text:
                            response_text += part.text

    if response_text:
        print(f"✅ SUCCESS with {model_name}!")
        print(f"Response: {response_text}")
        return True, model_name
    else:
        print(f"❌ No text response from {model_name}")
except Exception as e:
    print(f"❌ Error with {model_name}: {e}")
    continue
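For completeness, the snippet references `client`, `types`, and `model_name`, which are defined outside it. Below is a minimal sketch of the assumed surrounding setup (the model list, API-key client, and loop are reconstructed from the output below, not taken from our original script):

from google import genai
from google.genai import types

# Assumed setup, shown only so the snippet above can run standalone.
client = genai.Client(api_key="YOUR_API_KEY")

def summarize_with_first_working_model():
    for model_name in ("gemini-2.5-flash", "gemini-2.5-flash-lite"):
        print(f"Trying {model_name}...")
        # ... the try/except block shown above goes here ...
    return False, None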
Response
Trying gemini-2.5-flash...
❌ Error with gemini-2.5-flash: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Request contains an invalid argument.', 'status': 'INVALID_ARGUMENT'}}
Trying gemini-2.5-flash-lite...
✅ SUCCESS with gemini-2.5-flash-lite!
Response: This document explains how to use the Gemini API for document comprehension, focusing on PDF processing. Here's a breakdown of the key points:
**Gemini's Document Comprehension Capabilities:**
* **Beyond Text Extraction:** Gemini can analyze and interpret various content types within documents, including text, images, diagrams, graphs, and tables, even in documents up to 1,000 pages long.
* **Structured Output:** It can extract information in structured formats.
* **Summarization and Q&A:** Gemini can summarize documents and answer questions based on both visual and textual elements.
* **Content Transcription:** It can transcribe document content (e.g., to HTML) while preserving layout and formatting for later use.
**How to Pass PDF Data:**
* **Interleaved PDFs:** You can pass interleaved PDF data in your `generateContent` requests.
* **Small PDFs (under 20 MB):**
    * **Base64 Encoding:** Upload documents encoded in Base64.
    * **Local Files:** Upload files stored locally.
* **Large PDFs (over 20 MB):**
    * **File API:** Use the File API for uploading larger documents. This is recommended when the total request size (including files, text instructions, etc.) exceeds 20 MB.
    * **File API Storage:** The File API allows storing up to 50 MB of PDF files for 48 hours. You can access them with your API key during this period, but not download them. The File API is free in all regions where the Gemini API is available.
    * **Uploading via File API:** Use `media.upload` to upload a document file and then use it in a `models.generateContent` call.
    * **Large PDFs from URLs:** The File API can also simplify uploading and processing large PDFs from URLs.
    * **Large PDFs Stored Locally:** You can upload large PDFs stored locally using the File API.
    * **Verifying Uploads:** You can verify successful uploads and retrieve metadata by calling `files.get`.
**Processing Multiple PDFs:**
* The Gemini API can process multiple PDF documents (up to 1,000 pages) in a single request, provided the combined size of the documents and the text prompt fit within the model's context window.
**Technical Details:**
* **Page Limit:** Gemini supports up to 1,000 pages per document, with each page equating to 258 tokens.
* **Image Resolution:** While there are no strict pixel limits, larger pages are downscaled to a maximum of 3,072 x 3,072, and smaller pages are upscaled to 768 x 768, maintaining their aspect ratio.
* **Document Types:** While you can pass other MIME types (TXT, Markdown, HTML, XML, etc.), Gemini's document vision primarily understands PDFs meaningfully. Other types will be extracted as plain text, and the model won't interpret their rendered appearance (graphics, diagrams, HTML tags, etc.).
**Best Practices:**
* Rotate pages to the correct orientation before uploading.
* Avoid blurry pages.
* For single-page documents, place the text prompt after the page.
**Further Resources:**
* **Prompting Strategies with Files:** Learn about multimodal prompting with text, image, audio, and video data.
* **System Instructions:** Direct the model's behavior based on your specific needs.
The content is subject to a Creative Commons Attribution 4.0 license, and code examples are under the Apache 2.0 license.
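Since the summary above mentions the File API as an alternative to inline bytes, a useful next test for isolating the regression is whether the same request succeeds on gemini-2.5-flash when the PDF is uploaded first instead of passed inline. A rough sketch of that check (the client setup is assumed, the file is the same sample PDF):

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # assumed setup

# Upload the PDF via the File API instead of sending the bytes inline
uploaded = client.files.upload(file="6200195.pdf")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[uploaded, "Summarize the document."],
)
print(response.text)

If this path works while the inline-bytes request keeps returning 400 INVALID_ARGUMENT, that would narrow the problem down to how inline PDF data is handled by gemini-2.5-flash.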