I’m using Gemini Live (Flash 3.1) over WebSockets for an Android voice app. I’m facing a specific “silent failure” issue:
When the network is slow or has high jitter, the WebSocket remains connected and no errors are thrown, but speech is not recognized at all. On a stable network, it works perfectly.
Current Setup:
- 16 kHz, mono, 16-bit PCM.
- Audio frames are sent immediately after capture.
- Basic reconnect logic (only triggers on socket close).
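For reference, the capture format above works out to 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second, so a 100 ms chunk would be 3,200 bytes. Below is a minimal sketch of how captured buffers could be regrouped into fixed-duration chunks before sending, instead of forwarding each frame as it arrives. The class name `PcmChunker` and the 100 ms figure are placeholders of mine, not something the API prescribes, and I have not confirmed that this chunk size helps on a jittery link.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: regroups arbitrarily sized capture buffers into fixed-duration PCM chunks. */
final class PcmChunker {
    private static final int SAMPLE_RATE_HZ = 16_000;
    private static final int BYTES_PER_SAMPLE = 2; // 16-bit PCM
    private static final int CHANNELS = 1;         // mono

    private final int chunkBytes;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    PcmChunker(int chunkMillis) {
        // 32,000 bytes per second at this format, i.e. 32 bytes per millisecond.
        this.chunkBytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunkMillis / 1000;
    }

    int chunkBytes() {
        return chunkBytes;
    }

    /** Appends captured bytes and returns every complete chunk that is now available. */
    List<byte[]> push(byte[] captured, int length) {
        pending.write(captured, 0, length);
        byte[] all = pending.toByteArray();
        List<byte[]> chunks = new ArrayList<>();
        int offset = 0;
        while (all.length - offset >= chunkBytes) {
            byte[] chunk = new byte[chunkBytes];
            System.arraycopy(all, offset, chunk, 0, chunkBytes);
            chunks.add(chunk);
            offset += chunkBytes;
        }
        // Keep the incomplete tail for the next call.
        pending.reset();
        pending.write(all, offset, all.length - offset);
        return chunks;
    }
}
```

Each returned chunk would then go out as one WebSocket message. The trade-off is that larger chunks mean fewer messages but add up to one chunk duration of latency.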
Questions:
- Are there recommended buffering or smoothing strategies (e.g., specific chunk sizes) to handle unstable throughput?
- Should I consider moving to a different transport to mitigate this?
Any insights on making the audio stream more resilient to “slow-but-connected” networks would be appreciated!
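Since my reconnect logic only fires on socket close, one idea I am considering is an application-level stall check: if audio has been going out recently but nothing has come back from the server for some timeout, treat the session as stalled and reconnect. A rough sketch follows. `StallWatchdog` and the 5-second timeout in the usage note are hypothetical, and it assumes the server normally sends some message while speech is being streamed. A user who is simply silent would look the same, so this would probably need to be combined with local voice activity detection.

```java
/** Hypothetical helper: flags a "connected but silent" stream (audio going out, nothing coming back). */
final class StallWatchdog {
    private final long timeoutMillis;
    private long lastServerMessageAt;
    private long lastAudioSentAt = -1; // -1 means no audio has been sent yet

    StallWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastServerMessageAt = nowMillis;
    }

    /** Call after each audio chunk is handed to the socket. */
    synchronized void onAudioSent(long nowMillis) {
        lastAudioSentAt = nowMillis;
    }

    /** Call for every message received from the server. */
    synchronized void onServerMessage(long nowMillis) {
        lastServerMessageAt = nowMillis;
    }

    /** True when audio was sent within the timeout window but the server has been quiet for at least that long. */
    synchronized boolean isStalled(long nowMillis) {
        boolean sendingRecently =
                lastAudioSentAt >= 0 && nowMillis - lastAudioSentAt < timeoutMillis;
        boolean serverQuiet = nowMillis - lastServerMessageAt >= timeoutMillis;
        return sendingRecently && serverQuiet;
    }
}
```

In the app I would create it with something like `new StallWatchdog(5000, SystemClock.elapsedRealtime())`, poll `isStalled(...)` on a timer, and close and reopen the socket when it returns true. The timestamps are passed in rather than read inside the class so the logic can be tested without a real clock.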