Extend EOL for cost-effective Gemini Flash models

Summary

We request either:

  1. An extension of EOL for gemini-2.0-flash and gemini-2.0-flash-lite, or
  2. A clear roadmap and timeline for a cost-effective successor (e.g. gemini-3-flash-lite) before current low-cost models reach shutdown.

Our production workloads (OCR, data extraction, and summarization) do not require thinking/reasoning.


Impacted models & shutdown dates

Gemini 2.0

  • gemini-2.0-flash — March 31, 2026
  • gemini-2.0-flash-lite — March 31, 2026

Gemini 2.5

  • gemini-2.5-flash — June 17, 2026
  • gemini-2.5-flash-lite — July 22, 2026

Why current replacements are not equivalent

1. Loss of cost-effective non-thinking pricing

Gemini 2.0 Flash has been ideal for our OCR/extraction/summarization workloads, at a cost we can sustain:

  • ~$0.10 / 1M input tokens
  • ~$0.40 / 1M output tokens

With gemini-2.5-flash-preview, non-thinking output pricing was ~$0.60 / 1M, with thinking at ~$2.50 / 1M.
After the stable release, the non-thinking pricing path was removed, leaving ~$2.50 / 1M output tokens, even though the fine-tuning guidance for OCR/extraction recommends the non-thinking variant.

At $2.50 / 1M output tokens, many production OCR/extraction/summarization workloads are no longer economically viable.

2. Benchmarking shows higher cost does not improve accuracy

Our internal benchmark accuracy on OCR, data extraction, and summarization:

Base models

  • gemini-2.0-flash: 91%
  • gemini-2.5-flash: 91%
  • gemini-2.5-flash-lite: 87%

Fine-tuned models

  • gemini-2.0-flash-finetuned: 99%
  • gemini-2.5-flash-finetuned: 96.5%
  • gemini-2.5-flash-lite-finetuned: 95%

For our use cases:

  • gemini-2.0-flash is cheaper and more accurate after fine-tuning.
  • gemini-2.5-flash is ~6× more expensive on output tokens ($2.50 vs $0.40 / 1M) and less accurate after fine-tuning (96.5% vs 99%).
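
The ~6× figure follows from output-token prices alone: $2.50 / $0.40 = 6.25. Below is a minimal Python sketch of that comparison, assuming the approximate prices quoted above and a hypothetical volume of 500M output tokens per month. Input-token costs are deliberately left out, because the gemini-2.5-flash input price is not quoted in this request.

```python
# Approximate output-token prices (USD per 1M tokens) as quoted in this request.
# Illustrative figures from this discussion, not an official price sheet.
OUTPUT_PRICE_PER_1M = {
    "gemini-2.0-flash": 0.40,
    "gemini-2.5-flash-preview (non-thinking)": 0.60,
    "gemini-2.5-flash (stable)": 2.50,
}


def output_cost_usd(output_tokens: int, price_per_1m: float) -> float:
    """USD cost of `output_tokens` at a given price per 1M output tokens."""
    return output_tokens / 1_000_000 * price_per_1m


def compare(monthly_output_tokens: int, baseline: str = "gemini-2.0-flash") -> dict:
    """Map each model to (monthly output cost in USD, cost multiple vs. baseline)."""
    base_price = OUTPUT_PRICE_PER_1M[baseline]
    return {
        model: (output_cost_usd(monthly_output_tokens, price), price / base_price)
        for model, price in OUTPUT_PRICE_PER_1M.items()
    }


if __name__ == "__main__":
    # Hypothetical volume of 500M output tokens per month, for illustration only.
    for model, (cost, multiple) in compare(500_000_000).items():
        print(f"{model:42s} ${cost:>9,.2f}  {multiple:.2f}x")
```

With the hypothetical 500M-token volume, the sketch prints monthly output costs of $200.00, $300.00 and $1,250.00, i.e. multiples of 1.00x, 1.50x and 6.25x relative to gemini-2.0-flash.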

3. Roadmap gap blocks long-term planning

We planned to move from gemini-2.0-flash → gemini-2.5-flash-lite → gemini-3-flash-lite once available, accepting some accuracy loss.

However:

  • gemini-2.5-flash-lite is already scheduled for shutdown (July 22, 2026)
  • gemini-3-flash-lite has not been announced and has no public timeline

This makes long-term planning and client commitments extremely difficult.


Requests

  1. Extend EOL by 6–12 months, or until a comparable-cost successor is generally available, for:

    • gemini-2.0-flash / gemini-2.0-flash-001
    • gemini-2.0-flash-lite / gemini-2.0-flash-lite-001
  2. Provide an official roadmap and timeline for gemini-3-flash-lite (or equivalent), including expected pricing for non-thinking use cases.

  3. Consider restoring or offering a low-cost non-thinking tier for flash-class OCR/extraction/summarization workloads, priced closer to:

    • $0.10–$0.20 / 1M input tokens
    • $0.40–$0.60 / 1M output tokens

Yes, gemini-2.0-flash-lite or gemini-2.5-flash-lite should remain available until gemini-3.0-flash-lite is available at similar pricing.

Low-cost, fast, non-thinking models like Flash-Lite are critical for multiple production use cases. Discontinuing them without an equivalent replacement will have a major impact on services that currently depend on Flash-Lite.

This change would also undermine long-term trust in Google’s model roadmap. It would also make it difficult to move clients to higher-cost alternatives when there is no tangible gain for tasks that Flash-Lite models already perform effectively, often with greater accuracy.

We have the exact same use case (non-thinking image recognition).
We have already migrated to Gemini 2.5 Flash Lite, but deprecations of low-cost models seem to come too quickly.

A Gemini 3 Flash Lite release would be essential to give us the confidence to stay in the Gemini model ecosystem.

Following up on this.
We received an email saying:

On June 15, 2026, we will remove access to the following Generative AI models on Gemini Enterprise Agent Platform for new and inactive projects only. We will also turn off model tuning on this date.

gemini-2.5-flash
gemini-2.5-flash-lite
gemini-3-flash-preview

Yet gemini-3.1-flash-lite is not yet available for tuning in Vertex AI.
Can someone from @Google advise what we should do?