Extend EOL for Gemini Flash cost-effective models

csekas · February 7, 2026, 10:45am

Summary

We request either:

An extension of EOL for gemini-2.0-flash and gemini-2.0-flash-lite, or
A clear roadmap and timeline for a cost-effective successor (e.g. gemini-3-flash-lite) before current low-cost models reach shutdown.

Our production workloads are OCR, data extraction, and summarization and do not require thinking/reasoning.

Impacted models & shutdown dates

Gemini 2.0

gemini-2.0-flash — March 31, 2026
gemini-2.0-flash-lite — March 31, 2026

Gemini 2.5

gemini-2.5-flash — June 17, 2026
gemini-2.5-flash-lite — July 22, 2026

Why current replacements are not equivalent

1. Loss of cost-effective non-thinking pricing

Gemini 2.0 Flash was ideal for OCR/extraction/summarization and cost-effective for us:

~$0.10 / 1M input tokens
~$0.40 / 1M output tokens

With gemini-2.5-flash-preview, non-thinking output pricing was ~$0.60 / 1M, with thinking at ~$2.50 / 1M.
After the stable release, the non-thinking pricing path was removed, leaving ~$2.50 / 1M output tokens, even though fine-tuning guidance for OCR/extraction recommends the non-thinking variant.

At $2.50 / 1M output tokens, many production OCR/extraction/summarization workloads are no longer economically viable.
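
To make the scale concrete, here is a rough sketch of the output-token cost at the three price points quoted above. The monthly volume of 500M output tokens is a hypothetical figure chosen for illustration, not a number from our workloads, and the prices are the approximate ones cited in this post.

```python
# Approximate output prices in USD per 1M tokens, as quoted in this post.
OUTPUT_PRICE_PER_MILLION = {
    "gemini-2.0-flash": 0.40,
    "gemini-2.5-flash-preview (non-thinking)": 0.60,
    "gemini-2.5-flash (stable)": 2.50,
}

# Hypothetical monthly volume, for illustration only.
MONTHLY_OUTPUT_TOKENS = 500_000_000


def output_cost(tokens: int, price_per_million: float) -> float:
    """Return the USD cost of `tokens` output tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


monthly_costs = {
    model: output_cost(MONTHLY_OUTPUT_TOKENS, price)
    for model, price in OUTPUT_PRICE_PER_MILLION.items()
}

for model, cost in monthly_costs.items():
    print(f"{model}: ${cost:,.2f} per month")
```

Input-token cost is left out because this post only quotes an input price for gemini-2.0-flash.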

2. Benchmarking shows higher cost does not improve accuracy

Our internal benchmarking on OCR + data extraction + summarization:

Base models

gemini-2.0-flash: 91%
gemini-2.5-flash: 91%
gemini-2.5-flash-lite: 87%

Fine-tuned models

gemini-2.0-flash-finetuned: 99%
gemini-2.5-flash-finetuned: 96.5%
gemini-2.5-flash-lite-finetuned: 95%

For our use cases:

gemini-2.0-flash is cheaper and more accurate after fine-tuning.
gemini-2.5-flash is ~6× more expensive on output tokens with lower accuracy.
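
The two comparisons above can be checked with a few lines of arithmetic. This is only a sketch using the output prices and fine-tuned accuracies quoted in this post; the error-rate ratio is derived from those accuracies and is not a separately measured result.

```python
# Figures quoted in this post (prices in USD per 1M output tokens).
OUTPUT_PRICE = {"gemini-2.0-flash": 0.40, "gemini-2.5-flash": 2.50}
FINETUNED_ACCURACY = {"gemini-2.0-flash": 0.99, "gemini-2.5-flash": 0.965}

# How many times more expensive 2.5 Flash output tokens are.
price_ratio = OUTPUT_PRICE["gemini-2.5-flash"] / OUTPUT_PRICE["gemini-2.0-flash"]

# Error rate is 1 - accuracy; the ratio shows how many more errors
# the fine-tuned 2.5 Flash makes on the same benchmark.
error_ratio = (1 - FINETUNED_ACCURACY["gemini-2.5-flash"]) / (
    1 - FINETUNED_ACCURACY["gemini-2.0-flash"]
)

print(f"Output price ratio: {price_ratio:.2f}x")
print(f"Fine-tuned error-rate ratio: {error_ratio:.1f}x")
```

On these numbers the output price ratio is about 6.25, and the fine-tuned 2.5 Flash makes roughly 3.5 times as many errors (3.5% versus 1%).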

3. Roadmap gap blocks long-term planning

We planned to move from gemini-2.0-flash → gemini-2.5-flash-lite → gemini-3-flash-lite when available, accepting some accuracy loss.

However:

gemini-2.5-flash-lite is already scheduled for shutdown (July 22, 2026)
gemini-3-flash-lite has not been announced and has no public timeline

This makes long-term planning and client commitments extremely difficult.

Requests

Extend EOL for:
- gemini-2.0-flash / gemini-2.0-flash-001
- gemini-2.0-flash-lite / gemini-2.0-flash-lite-001
  by 6–12 months, or until a comparable-cost successor is generally available.
Provide an official roadmap and timeline for gemini-3-flash-lite (or equivalent), including expected pricing for non-thinking use cases.
Consider restoring or offering a low-cost non-thinking tier for flash-class OCR/extraction/summarization workloads, closer to:
- $0.10–$0.20 / 1M input tokens
- $0.40–$0.60 / 1M output tokens

ai92 · February 8, 2026, 8:37am

Yes, gemini-2.0-flash-lite or gemini-2.5-flash-lite should be continued until gemini-3.0-flash-lite is available at similar pricing.

Low-cost, fast, non-thinking models like Flash-Lite are critical for multiple production use cases. Discontinuing them without an equivalent replacement will have a major impact on services that currently depend on Flash-Lite.

This change would also negatively affect long-term trust in Google’s model roadmap. Additionally, it makes it difficult to transition clients to higher-cost alternatives when there is no tangible gain for tasks that Flash-Lite models can already perform effectively and often with greater accuracy.

Niko_Kovacic · February 19, 2026, 8:17pm

We have the exact same use case (non-thinking image recognition).
We have now migrated to Gemini 2.5 Flash Lite, but the deprecations of low-cost models seem to be coming too quickly.

A Gemini 3 Flash Lite release would be essential to giving us the confidence to stay in the Gemini model ecosystem.

csekas · May 29, 2026, 9:00am

Coming back to this.
We received an email saying:

On June 15, 2026, we will remove access to the following Generative AI models on Gemini Enterprise Agent Platform for new and inactive projects only. We will also turn off model tuning on this date.

gemini-2.5-flash
gemini-2.5-flash-lite
gemini-3-flash-preview

Yet gemini-3.1-flash-lite has not been released for tuning in Vertex AI.
Can someone from @Google advise what we should do?
