When will gemini-flash-lite-preview be supported for batching? There shouldn't be a disconnect between the batching and non-batching options for that model.
400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Do not support publisher model gemini-2.5-flash-lite-preview-09-2025', 'status': 'INVALID_ARGUMENT'}}
Thanks
Hi @OnQuestionAtaTime, welcome to the community!
I can see you are using a preview model. gemini-2.5-flash and gemini-2.5-flash-lite are available globally; for production usage, it's recommended to use the stable models.
Please try using those instead.
Thank you!
Thank you
So you're confirming that flash-lite and flash-lite-preview-09-2025 are now the same? That wasn't the case before.
Thanks
According to the web, there is a significant speed and cost improvement from using the preview. Can you please confirm whether the previews are the same as the globally available models, or when they will be made available?
Thank you
Hey,
They are not the same! What I meant is, you are using preview models, and gemini-2.5-flash and gemini-2.5-flash-lite are available globally as well.
Though preview models can be used in production as well, they come with more restricted rate limits and will be deprecated with at least two weeks' notice. Hence, I suggest using the stable models.
Please refer to the attached hyperlinks for complete details.
Thank you!
Thank you for your reply
But they can’t be used for batching and the previews offer significant improvements.
Hi,
Any updates on this, please? I am running batches, but the quality dropped considerably compared to using the API without batches, which makes the cost saving pointless.
You can easily compare the models' intelligence here:
https://artificialanalysis.ai/models/gemini-2-5-flash-lite-preview-09-2025
https://artificialanalysis.ai/models/gemini-2-5-flash-lite
It's a bit unfair that batching is unavailable for the preview version. Or maybe it's only available in some specific geographic regions.
Thanks
Trying to debug: generally the preview models are supported, but the error message doesn't seem to align with what we produce in the Gemini API; it might be that you are trying to call Vertex AI model serving. Vertex has a separate batch offering from the Gemini API:
Gemini API Batch: https://ai.google.dev/gemini-api/docs/batch-api
Vertex AI Batch: Batch inference with Gemini | Generative AI on Vertex AI | Google Cloud Documentation.
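As a side note on the Gemini API Batch docs linked above: batch input there is prepared as a JSONL file, one request per line. Below is a minimal, hedged sketch of building such a file offline (no API call is made); the `key`/`request` field names follow the linked Gemini API batch documentation, and the prompts are purely illustrative:

```python
import json

# Illustrative prompts; each becomes one line (one request) in the JSONL file.
prompts = ["Summarize the Gemini batch docs.", "List three uses of batching."]

lines = []
for i, text in enumerate(prompts):
    # "key" identifies the request in the batch results;
    # "request" mirrors a generateContent request body.
    lines.append(json.dumps({
        "key": f"request-{i}",
        "request": {"contents": [{"parts": [{"text": text}]}]},
    }))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines))
```

The resulting file would then be uploaded and referenced when creating the batch job, per the Gemini API Batch docs; Vertex AI's batch inference expects its own input format, which is one reason the two offerings are not interchangeable.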
Thanks for that. It’s clear now that Vertex AI for the model “gemini-2.5-flash-lite-preview-09-2025” does not support batching “yet”:
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-lite#2.5-flash-lite-preview
While the Gemini API has batching available for the gemini-2.5-flash-lite-preview-09-2025 version:
https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash-lite-preview
I am not sure what the reason is, but the time to completion of a simple batch request is an order of magnitude faster on Vertex AI compared to the Gemini API, even though Vertex AI is part of the Google Cloud infrastructure.
Is Vertex AI more appropriate for scaling applications? If so, when do you plan to make the model choice for batching in Vertex AI less restrictive compared to the Gemini API, please?
Thank you