I have been using “Gemini 1.5 Flash” via the API to extract data from PDF files, together with Structured Output, because I expect specific fields in the response.
After receiving an email from Google titled “We’re discontinuing certain Gemini 1.5 models starting May 2025,” I started exploring other models.
I tried “Gemini 2.0 Flash”, but after simply replacing the model, the response contained far fewer fields (2 of the 7 defined in the schema in the example below). This is strange, considering Gemini 2.0 Flash was announced as an improved version of 1.5.
At the same time, “Gemini 2.0 Flash-Lite” still returns good data.
Example PDF file: https://publicity.businessportal.gr/api/download/YMSdata/100784?companyId=154490904000 (a public company registration file, in Greek)
Prompt:
Extract data from the document following the descriptions in json schema
Structured Output Schema:
{
  "type": "object",
  "description": "This is a json schema that defines how to extract company details from a document that includes data from a Company Registration Service or a similar governmental or state agency Service.",
  "properties": {
    "company_name": {
      "type": "string",
      "description": "Identify and provide the Name of the Company that this registry document is referring to."
    },
    "company_trade_name": {
      "type": "string",
      "description": "Identify and provide the Company Trade Name of the Company."
    },
    "company_address": {
      "type": "string",
      "description": "Identify and provide the registered address of the Company."
    },
    "company_registration_number": {
      "type": "string",
      "description": "Identify and provide the Registration Number of the Company."
    },
    "company_registration_country": {
      "type": "string",
      "description": "Identify and provide the registration Country of the Company."
    },
    "company_file_clarifications": {
      "type": "string",
      "description": "If the document is unclear or ambiguous regarding any of the above, please state the specific ambiguity and where it occurs in the document. Explain any related information that might be helpful."
    },
    "company_file_summarized_text": {
      "type": "string",
      "description": "Make a summary of the document in English."
    }
  }
}
Result for Gemini 2.0 Flash:
{
  "company_name": "ΕΤΑΙΡΙΑ ΜΕΛΕΤΩΝ ΥΠΗΡΕΣΙΩΝ ΚΑΙ ΛΟΓΙΣΜΙΚΟΥ ΓΕΩΧΩΡΙΚΗΣ ΠΛΗΡΟΦΟΡΙΑΣ Ε.Ε.",
  "company_trade_name": "KIKLO "
}
Result for Gemini 2.0 Flash-Lite:
{
  "company_address": "Εγνατίας 154, ΔΕΘ Περίπτερο 1, Θεσσαλονίκη, 54636",
  "company_file_clarifications": "The document refers to an \"ΕΤΑΙΡΙΑ ΜΕΛΕΤΩΝ ΥΠΗΡΕΣΙΩΝ ΚΑΙ ΛΟΓΙΣΜΙΚΟΥ ΓΕΩΧΩΡΙΚΗΣ ΠΛΗΡΟΦΟΡΙΑΣ Ε.Ε.\" (Company of Studies, Services, and Software of Geospatial Information E.E.) and its trade name is \"KIKLO\".",
  "company_file_summarized_text": "This document is a registration of the company \"ΕΤΑΙΡΙΑ ΜΕΛΕΤΩΝ ΥΠΗΡΕΣΙΩΝ ΚΑΙ ΛΟΓΙΣΜΙΚΟΥ ΓΕΩΧΩΡΙΚΗΣ ΠΛΗΡΟΦΟΡΙΑΣ Ε.Ε.\" (Company of Studies, Services, and Software of Geospatial Information E.E.) !!!! CUT manually by me !!!! ",
  "company_name": "ΕΤΑΙΡΙΑ ΜΕΛΕΤΩΝ ΥΠΗΡΕΣΙΩΝ ΚΑΙ ΛΟΓΙΣΜΙΚΟΥ ΓΕΩΧΩΡΙΚΗΣ ΠΛΗΡΟΦΟΡΙΑΣ Ε.Ε.",
  "company_registration_country": "Greece",
  "company_registration_number": "154490904000",
  "company_trade_name": "KIKLO"
}
Gemini 1.5 Flash gave a result similar to the Flash-Lite one, with all fields filled in.
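To make the difference between the models easy to see, here is a minimal sketch that lists which schema properties are missing from a response. The helper name missing_fields is purely illustrative, and the script only uses Python's standard json module; it does not call the Gemini API.

```python
import json

# Property names copied from the Structured Output schema in this post.
SCHEMA_FIELDS = [
    "company_name",
    "company_trade_name",
    "company_address",
    "company_registration_number",
    "company_registration_country",
    "company_file_clarifications",
    "company_file_summarized_text",
]


def missing_fields(response_text, expected=SCHEMA_FIELDS):
    """Return the expected property names that are absent from a JSON response."""
    data = json.loads(response_text)
    return [name for name in expected if name not in data]


# The Gemini 2.0 Flash result quoted above.
flash_20_result = """
{
  "company_name": "ΕΤΑΙΡΙΑ ΜΕΛΕΤΩΝ ΥΠΗΡΕΣΙΩΝ ΚΑΙ ΛΟΓΙΣΜΙΚΟΥ ΓΕΩΧΩΡΙΚΗΣ ΠΛΗΡΟΦΟΡΙΑΣ Ε.Ε.",
  "company_trade_name": "KIKLO "
}
"""

print(missing_fields(flash_20_result))
```

For the Gemini 2.0 Flash result this reports five of the seven properties as missing, while the Flash-Lite result contains all seven.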
The same problem occurs with other document types and different response schemas. It looks like Gemini 2.0 Flash ignores the “description” fields in the schema.
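One thing that may be worth trying, and this is an untested assumption rather than a confirmed fix: the schema above has no "required" list, so every property is optional and the model is allowed to omit fields. The sketch below adds such a list for all top-level properties. The function name with_required is hypothetical, and the patched schema would still have to be passed to the API in the usual way.

```python
import copy


def with_required(schema):
    """Return a copy of the schema with every top-level property marked as required."""
    patched = copy.deepcopy(schema)  # leave the caller's schema untouched
    patched["required"] = list(patched.get("properties", {}))
    return patched


# Shortened version of the schema from this post, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "company_address": {"type": "string"},
    },
}

patched = with_required(schema)
print(patched["required"])
```

Whether this changes the behaviour of Gemini 2.0 Flash is something I have not verified; it only rules out the possibility that the fields are dropped because they are optional.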