Summary
We request either:
- An extension of EOL for gemini-2.0-flash and gemini-2.0-flash-lite, or
- A clear roadmap and timeline for a cost-effective successor (e.g. gemini-3-flash-lite) before current low-cost models reach shutdown.
Our production workloads are OCR, data extraction, and summarization, none of which require thinking/reasoning.
Impacted models & shutdown dates
Gemini 2.0
- gemini-2.0-flash — March 31, 2026
- gemini-2.0-flash-lite — March 31, 2026
Gemini 2.5
- gemini-2.5-flash — June 17, 2026
- gemini-2.5-flash-lite — July 22, 2026
Why current replacements are not equivalent
1. Loss of cost-effective non-thinking pricing
Gemini 2.0 Flash was ideal for OCR/extraction/summarization and cost-effective for us:
- ~$0.10 / 1M input tokens
- ~$0.40 / 1M output tokens
With gemini-2.5-flash-preview, non-thinking output pricing was ~$0.60 / 1M, with thinking at ~$2.50 / 1M.
After the stable release, the non-thinking pricing path was removed, leaving a single rate of ~$2.50 / 1M output tokens, even though fine-tuning guidance for OCR/extraction recommends the non-thinking variant.
At $2.50 / 1M output tokens, many production OCR/extraction/summarization workloads are no longer economically viable.
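To make the impact concrete, here is a rough sketch of the monthly output-token bill at each of the three output prices quoted above. The monthly volume (`MONTHLY_OUTPUT_TOKENS`) is a made-up figure for illustration, not our actual usage, and input-token costs are left out because only output pricing is compared here.

```python
# Output prices in USD per 1M tokens, taken from the figures quoted above.
OUTPUT_PRICE_PER_M = {
    "gemini-2.0-flash": 0.40,
    "gemini-2.5-flash-preview (non-thinking)": 0.60,
    "gemini-2.5-flash (stable)": 2.50,
}

# ASSUMPTION: hypothetical monthly volume, for illustration only.
MONTHLY_OUTPUT_TOKENS = 500_000_000


def monthly_output_cost(price_per_million: float, tokens: int) -> float:
    """Return the output-token cost in USD for the given token count."""
    return price_per_million * tokens / 1_000_000


costs = {
    name: monthly_output_cost(price, MONTHLY_OUTPUT_TOKENS)
    for name, price in OUTPUT_PRICE_PER_M.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f} per month (output tokens only)")
```

Swapping in a real monthly token count gives the actual delta for a given workload; the ratio between the rows stays the same regardless of volume.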
2. Benchmarking shows higher cost does not improve accuracy
Our internal benchmarking on OCR + data extraction + summarization:
Base models
- gemini-2.0-flash: 91%
- gemini-2.5-flash: 91%
- gemini-2.5-flash-lite: 87%
Fine-tuned models
- gemini-2.0-flash-finetuned: 99%
- gemini-2.5-flash-finetuned: 96.5%
- gemini-2.5-flash-lite-finetuned: 95%
For our use cases:
- gemini-2.0-flash is cheaper and more accurate after fine-tuning.
- gemini-2.5-flash is ~6× more expensive on output tokens with lower accuracy.
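The comparison above can be checked with a few lines of arithmetic. This sketch uses only the prices and fine-tuned accuracy figures already listed; treating `100 - accuracy` as an error rate is an assumption made here for illustration, since the exact benchmark metric is internal.

```python
# Output prices (USD per 1M tokens) and fine-tuned accuracy (%) from the post.
output_price = {"gemini-2.0-flash": 0.40, "gemini-2.5-flash": 2.50}
finetuned_accuracy = {"gemini-2.0-flash": 99.0, "gemini-2.5-flash": 96.5}

# How many times more expensive 2.5 Flash output tokens are.
price_ratio = output_price["gemini-2.5-flash"] / output_price["gemini-2.0-flash"]

# ASSUMPTION: error rate is taken as 100 minus the reported accuracy.
error_rate = {model: 100.0 - acc for model, acc in finetuned_accuracy.items()}
error_ratio = error_rate["gemini-2.5-flash"] / error_rate["gemini-2.0-flash"]

print(f"Output price ratio (2.5 / 2.0): {price_ratio:.2f}x")
print(f"Error rate ratio (2.5 / 2.0):   {error_ratio:.2f}x")
```

Under that assumption, the fine-tuned gemini-2.5-flash costs about 6.25× as much per output token while producing roughly 3.5× as many errors on our benchmark.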
3. Roadmap gap blocks long-term planning
We planned to move from gemini-2.0-flash → gemini-2.5-flash-lite → gemini-3-flash-lite when available, accepting some accuracy loss.
However:
- gemini-2.5-flash-lite is already scheduled for shutdown (July 22, 2026)
- gemini-3-flash-lite has not been announced and has no public timeline
This makes long-term planning and client commitments extremely difficult.
Requests
- Extend EOL by 6–12 months, or until a comparable-cost successor is generally available, for:
  - gemini-2.0-flash / gemini-2.0-flash-001
  - gemini-2.0-flash-lite / gemini-2.0-flash-lite-001
- Provide an official roadmap and timeline for gemini-3-flash-lite (or equivalent), including expected pricing for non-thinking use cases.
- Consider restoring or offering a low-cost non-thinking tier for flash-class OCR/extraction/summarization workloads, closer to:
  - $0.10–$0.20 / 1M input tokens
  - $0.40–$0.60 / 1M output tokens