Can I generate images or audio in Google AI Studio?

I know there is a model available for image generation through the API.

I want to know whether Google AI Studio has a feature for generating images. I would like to experiment with prompts for generating more realistic images.

I would also like to know whether Google has an API that takes audio as input and returns audio as output. I am aware that we can chain separate speech models (such as Whisper) around Gemini's text response to produce audio, but that adds a lot of latency, which is not suitable for a chat-like system.

Thank you.

No. The Imagen models aren’t accessible through the AI Studio API. You’ll need to use the Vertex AI API.

While you can do audio input with Gemini, it doesn’t do audio output. You’ll need to use the Google Text-to-Speech (TTS) API for that.

The largest latency I tend to notice is in the LLM portion itself, not in the STT or TTS portions. What latency numbers are you getting for each?
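If you don't have per-stage numbers yet, here is a minimal sketch of how you might collect them. The three stage functions (`fake_stt`, `fake_llm`, `fake_tts`) are hypothetical stand-ins that just sleep; they are not real API calls, and the sleep durations are made up. Swap in your actual STT, Gemini, and TTS calls to see where the time really goes.

```python
import time
from typing import Callable, Dict, Tuple


def timed(stage: str, fn: Callable, arg, timings: Dict[str, float]):
    """Call fn(arg) and record its wall-clock duration under `stage`."""
    start = time.perf_counter()
    result = fn(arg)
    timings[stage] = time.perf_counter() - start
    return result


# Hypothetical placeholders: replace each one with your real service call.
def fake_stt(audio: bytes) -> str:
    time.sleep(0.01)  # pretend speech-to-text takes 10 ms
    return "hello"


def fake_llm(text: str) -> str:
    time.sleep(0.05)  # pretend the LLM takes 50 ms
    return "reply to: " + text


def fake_tts(text: str) -> bytes:
    time.sleep(0.01)  # pretend text-to-speech takes 10 ms
    return text.encode("utf-8")


def run_pipeline(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run STT -> LLM -> TTS and return the audio plus per-stage timings."""
    timings: Dict[str, float] = {}
    text = timed("stt", stt, audio, timings)
    reply = timed("llm", llm, text, timings)
    speech = timed("tts", tts, reply, timings)
    return speech, timings


if __name__ == "__main__":
    _, stage_times = run_pipeline(b"\x00\x01", fake_stt, fake_llm, fake_tts)
    for stage, seconds in stage_times.items():
        print(f"{stage}: {seconds * 1000:.1f} ms")
```

Timing each stage separately like this makes it easier to tell whether the delay comes from the speech models or from the LLM call itself.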
