Hello,
according to the documentation, the TTS models support streaming output.
However, after trying this feature with both the Python and Java SDKs, I only ever receive a single chunk containing the complete audio, and only after a long wait. This happens even for a 90-second speech (single-speaker TTS).
Is there anything to watch out for that is not mentioned in the guide? Has anyone succeeded in streaming smaller chunks of audio?
The documentation is also quite vague, as it does not show the full interaction with the client. I assumed you need to use the generate_content_stream method, as you would for streaming text responses (maybe a wrong assumption?).
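To make the symptom easier to reproduce, here is a minimal sketch of how chunk arrival could be measured. It is SDK-agnostic: `log_chunk_arrivals` and `fake_stream` are hypothetical names used only for illustration, and `fake_stream` is a stand-in for whatever iterator the streaming call returns. With a real SDK you would pass in the audio bytes taken from each streamed response instead.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def log_chunk_arrivals(chunks: Iterable[bytes]) -> List[Tuple[float, int]]:
    """Record (seconds since start, size in bytes) for each chunk as it arrives."""
    start = time.monotonic()
    arrivals = []
    for chunk in chunks:
        arrivals.append((time.monotonic() - start, len(chunk)))
    return arrivals


def fake_stream(n_chunks: int = 3, delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for an SDK streaming iterator (not a real API)."""
    for _ in range(n_chunks):
        time.sleep(delay)       # simulate the server producing audio over time
        yield b"\x00" * 4800    # dummy bytes standing in for an audio chunk


if __name__ == "__main__":
    for elapsed, size in log_chunk_arrivals(fake_stream()):
        print(f"{elapsed:.3f}s  {size} bytes")
```

If streaming works, this should print several lines with increasing timestamps; what I see instead corresponds to a single line with one large chunk after a long delay.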
Thank you all!