Hi everyone,
I’m running into a challenging issue while using the Gemini streaming API to implement a feature that streams mixed text and images.
My Goal:
I want the model to stream text for immediate user feedback, while also generating and displaying images inline as part of the response, like a real-time tutorial.
The Problem:
The text streaming works perfectly, but as soon as the model sends an image within the stream, the program crashes: the underlying HTTP client raises a `Chunk too big` error. This makes the streaming API feel unreliable for any use case that involves image generation.
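For context on the size mismatch (the numbers below are illustrative assumptions, not values from the API): an inline image arrives as one large base64-encoded part, which dwarfs the read buffer a typical HTTP client allocates per parsed chunk.

```python
import math

# Hypothetical image size, for illustration only.
image_bytes = 1_000_000                    # ~1 MB PNG returned inline by the model
b64_len = math.ceil(image_bytes / 3) * 4   # base64 inflates payloads by ~4/3
default_bufsize = 2 ** 16                  # aiohttp's default read_bufsize (64 KiB)

print(b64_len)                    # ~1.33 MB of base64 text in a single part
print(b64_len / default_bufsize)  # roughly 20x larger than the default buffer
```

So unless the client either raises its buffer or the server splits the image across many small chunks, a single inline image can exceed what the parser is willing to hold.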
My Questions:
- Is this a known design limitation? Is streaming inherently unsuitable for handling large, monolithic data chunks like images?
- Aside from abandoning the streaming experience entirely (the non-streaming API has poor UX) or implementing a more complex tool-calling / function-calling pattern to handle images separately, is there a more direct or officially recommended approach?
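One middle-ground pattern I've been sketching (all names here are hypothetical, not a Gemini API): have the model stream plain text containing placeholder tokens, render the text immediately, and then resolve each placeholder with a separate non-streaming image request. A minimal simulation:

```python
import re
from typing import Callable, Iterator

# Hypothetical placeholder format the prompt would ask the model to emit.
IMG_TOKEN = re.compile(r"\[\[IMAGE:(\w+)\]\]")

def render_stream(chunks: Iterator[str], fetch_image: Callable[[str], bytes]) -> list:
    """Collect streamed text, then swap placeholders for separately fetched images."""
    text = "".join(chunks)  # in a real UI the text would render incrementally
    parts, last = [], 0
    for m in IMG_TOKEN.finditer(text):
        if m.start() > last:
            parts.append(("text", text[last:m.start()]))
        # Each image comes from its own (non-streaming) request, so no
        # oversized chunk ever travels through the streaming connection.
        parts.append(("image", fetch_image(m.group(1))))
        last = m.end()
    if last < len(text):
        parts.append(("text", text[last:]))
    return parts

# Simulated stream and image fetcher:
chunks = iter(["Step 1: chop the ", "onions. [[IMAGE:onions]] ", "Step 2: ..."])
parts = render_stream(chunks, fetch_image=lambda ref: b"<png for " + ref.encode() + b">")
print(parts[1])  # ('image', b'<png for onions>')
```

This keeps the streaming UX for text while sidestepping the chunk-size limit, at the cost of teaching the model the placeholder convention and making one extra request per image.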
I’ve considered manually increasing the buffer size in the underlying library (like aiohttp), but this feels like a temporary workaround that doesn’t address the root cause and introduces memory risks.
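For completeness, the buffer change I experimented with looks roughly like this. `read_bufsize` is a real `aiohttp.ClientSession` parameter; whether the Gemini SDK can be made to use a session you construct yourself is an assumption on my part:

```python
import aiohttp

async def make_session() -> aiohttp.ClientSession:
    # read_bufsize defaults to 2**16 (64 KiB). Raising it lets the parser
    # accept much larger chunks, but the cost is paid per connection in
    # memory -- exactly the risk mentioned above.
    return aiohttp.ClientSession(read_bufsize=2 ** 22)  # 4 MiB
```

Even if this works, it only moves the ceiling: a sufficiently large image would still blow past any fixed buffer, which is why it feels like a workaround rather than a fix.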
I’m very interested to hear how others in the community are handling this scenario, or whether there’s a best practice recommended by the official team. Thanks!