Gemini 2.0 and PDF OCR Fine-tuning

Jesse_Merrigan · March 18, 2025, 3:45pm

Hi everyone, I’m currently experimenting with Gemini 2.0 Flash for OCR on PDF documents. I have a relatively long prompt that tells the model how to process the document: deciphering the relevant text, extracting handwritten text, formatting the output in a specific way, and using confidence scores to judge when to correct OCR-based inaccuracies vs. retaining genuine spelling errors from the original handwritten text.
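
To make the confidence-score rule concrete, here is a rough Python sketch of the decision the prompt is asking the model to make. It is only an illustration under assumptions: the `Token` type, the `resolve` function, the per-token confidence values, and the 0.85 threshold are all hypothetical and are not part of the Gemini API or of the actual prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One OCR'd word (hypothetical structure, for illustration only)."""
    text: str                         # text as read from the page
    confidence: float                 # assumed 0.0-1.0 recognition confidence
    suggestion: Optional[str] = None  # dictionary correction, if one exists


def resolve(token: Token, threshold: float = 0.85) -> str:
    """Decide whether to keep the OCR text or apply the suggested correction.

    High confidence + a "misspelling" -> the writer probably spelled it that
    way, so keep it. Low confidence -> more likely an OCR misread, so correct.
    """
    if token.suggestion is None:
        return token.text          # nothing to correct
    if token.confidence >= threshold:
        return token.text          # retain the genuine spelling error
    return token.suggestion        # treat as an OCR inaccuracy


tokens = [
    Token("recieve", 0.97, "receive"),   # confidently read, genuinely misspelled
    Token("rnorning", 0.41, "morning"),  # low confidence, likely "m" misread as "rn"
    Token("today", 0.99),                # no correction candidate
]
print(" ".join(resolve(t) for t in tokens))
```

With these made-up inputs the first word keeps its original misspelling while the second is corrected; in practice the threshold would need tuning against real documents.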

However, I feel I could improve the one-shot outputs I’m getting if I could attach .pdf documents to the user inputs in fine-tuning jobs. Does anyone know whether this is possible now or likely to be supported soon, or whether there’s an equivalent workaround?

Cheers!

GUNAND_MAYANGLAMBAM · June 12, 2025, 10:14am

Hey @Jesse_Merrigan

Unfortunately, direct fine-tuning of Gemini 2.0 Flash with PDF files isn’t currently possible.
