Extend EOL for Gemini Flash cost-effective models

Summary

We request either:

  1. An extension of EOL for gemini-2.0-flash and gemini-2.0-flash-lite, or
  2. A clear roadmap and timeline for a cost-effective successor (e.g. gemini-3-flash-lite) before current low-cost models reach shutdown.

Our production workloads (OCR, data extraction, and summarization) do not require thinking/reasoning.


Impacted models & shutdown dates

Gemini 2.0

  • gemini-2.0-flash — March 31, 2026
  • gemini-2.0-flash-lite — March 31, 2026

Gemini 2.5

  • gemini-2.5-flash — June 17, 2026
  • gemini-2.5-flash-lite — July 22, 2026

Why current replacements are not equivalent

1. Loss of cost-effective non-thinking pricing

Gemini 2.0 Flash was ideal for OCR/extraction/summarization and cost-effective for us:

  • ~$0.10 / 1M input tokens
  • ~$0.40 / 1M output tokens

With gemini-2.5-flash-preview, non-thinking output pricing was ~$0.60 / 1M tokens, with thinking at ~$2.50 / 1M.
After the stable release, the non-thinking pricing path was removed, leaving ~$2.50 / 1M output tokens, even though fine-tuning guidance for OCR/extraction recommends the non-thinking variant.

At $2.50 / 1M output tokens, many production OCR/extraction/summarization workloads are no longer economically viable.
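As a quick back-of-the-envelope illustration of this gap, using only the output-token list prices quoted above (the monthly token volume below is a hypothetical example, not our actual usage):

```python
# Output-token cost comparison at the list prices quoted above.
# MONTHLY_OUTPUT_M is a hypothetical volume, chosen for illustration.
OUTPUT_PRICE_PER_M = {                    # USD per 1M output tokens
    "gemini-2.0-flash": 0.40,
    "gemini-2.5-flash (thinking-only)": 2.50,
}
MONTHLY_OUTPUT_M = 500                    # assumed: 500M output tokens / month

for model, price in OUTPUT_PRICE_PER_M.items():
    print(f"{model}: ${price * MONTHLY_OUTPUT_M:,.2f} / month")

ratio = (OUTPUT_PRICE_PER_M["gemini-2.5-flash (thinking-only)"]
         / OUTPUT_PRICE_PER_M["gemini-2.0-flash"])
print(f"Output-token cost multiplier: {ratio:.2f}x")   # 6.25x
```

At this assumed volume, the same workload goes from $200 to $1,250 per month on output tokens alone.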

2. Benchmarking shows higher cost does not improve accuracy

Our internal benchmarking on OCR + data extraction + summarization:

Base models

  • gemini-2.0-flash: 91%
  • gemini-2.5-flash: 91%
  • gemini-2.5-flash-lite: 87%

Fine-tuned models

  • gemini-2.0-flash-finetuned: 99%
  • gemini-2.5-flash-finetuned: 96.5%
  • gemini-2.5-flash-lite-finetuned: 95%

For our use cases:

  • gemini-2.0-flash is cheaper and more accurate after fine-tuning.
  • gemini-2.5-flash is ~6× more expensive on output tokens ($2.50 vs. $0.40 / 1M) and less accurate after fine-tuning.

3. Roadmap gap blocks long-term planning

We planned to move from gemini-2.0-flash → gemini-2.5-flash-lite → gemini-3-flash-lite when available, accepting some accuracy loss.

However:

  • gemini-2.5-flash-lite is already scheduled for shutdown (July 22, 2026)
  • gemini-3-flash-lite is not announced or publicly timed

This makes long-term planning and client commitments extremely difficult.


Requests

  1. Extend EOL for:

    • gemini-2.0-flash / gemini-2.0-flash-001
    • gemini-2.0-flash-lite / gemini-2.0-flash-lite-001

    by 6–12 months, or until a comparable-cost successor is generally available.
  2. Provide an official roadmap and timeline for gemini-3-flash-lite (or equivalent), including expected pricing for non-thinking use cases.

  3. Consider restoring or offering a low-cost non-thinking tier for flash-class OCR/extraction/summarization workloads, closer to:

    • $0.10–$0.20 / 1M input tokens
    • $0.40–$0.60 / 1M output tokens

Yes, gemini-2.0-flash-lite or gemini-2.5-flash-lite should be continued until gemini-3.0-flash-lite is available at similar pricing.

Low-cost, fast, non-thinking models like Flash-Lite are critical for multiple production use cases. Discontinuing them without an equivalent replacement will have a major impact on services that currently depend on Flash-Lite.

This change would also negatively affect long-term trust in Google’s model roadmap. Additionally, it makes it difficult to transition clients to higher-cost alternatives when there is no tangible gain for tasks that Flash-Lite models can already perform effectively and often with greater accuracy.


We have the exact same use case (non-thinking image recognition).
We recently migrated to Gemini 2.5 Flash Lite, but the deprecation cycles for low-cost models are too short.

A Gemini 3 Flash Lite release would be essential to giving us the confidence to stay in the Gemini model ecosystem.