Posting because the operational reality of running production workloads on Gemini has reached the point where we, and other teams I’ve spoken to, are seriously evaluating whether to migrate off the platform entirely. Hoping Google’s product and Dev Relations teams see this, because the issues are structural and they are not going to be solved by another preview-model release.
What happened:
We run an enterprise production chat application. Our model was set to gemini-3.1-flash-lite-preview, a model Google actively marketed as the cost-efficient choice for high-volume agentic workloads, with documentation, code examples, and pricing positioning identical to those of a GA model.
Starting the first week of May 2026, we began seeing rising rates of 503 “Service Unavailable” errors. These were not 429s: the problem was server capacity, not quota. Larger requests were hit disproportionately. LangSmith traces correlate cleanly with the GA transition window: on May 7, Google released gemini-3.1-flash-lite as GA and announced that the preview model would be deprecated on May 11 (today), with full shutdown on May 25.
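For anyone who wants to check the same split in their own traces, here is a minimal sketch of separating capacity failures (503) from quota failures (429) and comparing small versus large requests. The record fields (`status`, `prompt_tokens`) and the 8,000-token threshold are assumptions made for illustration; they are not a LangSmith export schema, and this is not the exact analysis we ran.

```python
from collections import Counter


def classify(status: int) -> str:
    """Map an HTTP status code to a coarse failure class."""
    if status == 429:
        return "quota"      # Too Many Requests: caller-side rate limit or quota
    if status == 503:
        return "capacity"   # Service Unavailable: provider-side availability
    if 200 <= status < 300:
        return "ok"
    return "other"


def failure_rates(records, size_threshold=8000):
    """Return {bucket: {class: fraction}} for small vs. large requests.

    `records` is an iterable of dicts with hypothetical keys
    "status" (int) and "prompt_tokens" (int).
    """
    counts = {"small": Counter(), "large": Counter()}
    for rec in records:
        bucket = "large" if rec["prompt_tokens"] >= size_threshold else "small"
        counts[bucket][classify(rec["status"])] += 1

    rates = {}
    for bucket, counter in counts.items():
        total = sum(counter.values())
        # An empty bucket yields an empty dict instead of dividing by zero.
        rates[bucket] = {k: v / total for k, v in counter.items()} if total else {}
    return rates
```

If the "capacity" fraction is clearly higher in the "large" bucket while "quota" stays flat, that is consistent with the pattern described above.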
14 days from deprecation to shutdown, and only 18 from the May 7 announcement. That’s the formal policy. In practice, capacity was being wound down for weeks before the notice ever appeared.
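The window is easy to verify from the dates quoted above. A small sketch; the variable names are mine, and the dates are the ones stated in this post:

```python
from datetime import date

announced = date(2026, 5, 7)    # GA release and deprecation announcement
deprecated = date(2026, 5, 11)  # preview model marked deprecated
shutdown = date(2026, 5, 25)    # preview model fully shut down

# Subtracting two dates gives a timedelta; .days is the whole-day difference.
days_from_announcement = (shutdown - announced).days
days_from_deprecation = (shutdown - deprecated).days

print(days_from_announcement, days_from_deprecation)  # prints "18 14"
```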
The structural issues this exposes:
- 14-day deprecation windows are operationally hostile to production teams. That is not enough time to validate a replacement on real traffic, run staging cycles, pass change-management gates, deploy, and monitor, especially for teams shipping to paying customers. OpenAI typically gives 6–12 months. Anthropic gives a minimum of 6 months. Google’s 2-week window is an industry outlier, and it’s not defensible for any model that was marketed for production use.
- Capacity is silently reallocated before the formal deprecation notice. Our 503s started before May 7. The capacity wind-down happens in the dark: teams find out from their error monitoring, not from Google’s release notes. This inverts how lifecycle changes should be communicated.
- Preview models are marketed as production-ready, then treated as expendable. gemini-3.1-flash-lite-preview had a full launch announcement, dedicated documentation pages, use case guidance, and pricing positioning. Production teams reasonably adopted it. Google then forced them into a migration on Google’s schedule, not theirs.
- There is no transparent capacity SLA, and no advance signal of capacity changes. The published docs mention preview models “may have more restrictive rate limits.” They do not disclose that capacity is shared in a pool that gets actively reallocated during transitions, with predictable degradation. Teams cannot plan around what they cannot see.
- The release cadence is fundamentally incompatible with production stability. Every Gemini model we’ve evaluated has had a 6–10 week useful lifespan before being deprecated, redirected, or capacity-throttled. Compare that to GPT-4 (released March 2023, still in production use) and Claude 3.5 Sonnet (released June 2024, still in production use). Production teams need predictability. Gemini’s cadence delivers the opposite.
The business consequence:
We are a pre-launch consumer application. 503 spikes hours before launch, on a model we adopted weeks ago based on Google’s own marketing, are the kind of incident that causes founders to seriously question whether Gemini belongs in the stack at all. Our embeddings, our Live API integration, and our image generation are all on Google products. The cost of leaving is non-trivial. The operational instability is forcing exactly that internal conversation right now, and the math is moving against Google week by week.
We are not the only team having this conversation. Multiple threads on this forum from the last 60 days report identical patterns on gemini-3.1-pro-preview, on the earlier gemini-3-pro-preview, and on gemini-3.1-flash-lite-preview. The “Gemini is faster/cheaper” pitch increasingly fails the operational reality test when weighed against the engineering cost of repeated forced migrations and the customer-trust cost of intermittent failures.
Questions for Google product and DevRel:
- Will Google commit to a longer minimum deprecation window for any model that has been marketed for production use? 60 or 90 days would meaningfully change the calculus.
- Will Google publish honest capacity expectations for both preview and GA models, including what production teams should expect during transition windows?
- Will Google provide advance signal, visible to customers, when capacity allocation for a model is being reduced, before customer error rates rise?
- For the teams already burned: what specifically has changed at Google to ensure this pattern doesn’t repeat at the next GA transition (Gemini 3.1 Pro, Gemini 3 Flash, and whatever comes after)?
- What is the official guidance for production teams that need stable, predictable model lifecycles? “Use GA models only” doesn’t answer this when GA models also sit on 6–10 month deprecation timelines.
Open to responses from Googlers and from other teams that have navigated this. We are actively evaluating whether to remain on Gemini for the chat workload or migrate to a more operationally predictable provider, and the answers in this thread will materially inform that decision.