[Merge Fellas] Debugging an "Unlimited Shake" with Gemini API?

Hi everyone,

First-time poster here, though I’ve been learning from this community for a while. I’m hoping to get some advice on a perplexing issue I’m facing while integrating the Gemini API into an Android project.

The Context: I’m working on a proof-of-concept feature I’ve codenamed Merge Fellas. The goal is to use the Gemini Pro Vision model (via the Android SDK) to analyze a real-time camera feed and suggest placement for interactive UI elements.

The Problem: The “Unlimited Shake”. The core issue is a visual instability I can only describe as an “unlimited shake”. When the model returns coordinates for an overlay, the element jitters rapidly in a tight loop on the screen instead of staying put. It feels like a runaway feedback loop between the camera input and the model’s response.

What I’ve Tried So Far: I’ve spent a good amount of time trying to isolate the cause. Here’s what I’ve done:

  • Simplified the Prompt: I’ve stripped my prompt down to the bare minimum to ensure the model’s response structure isn’t overly complex.
  • Throttled API Calls: I implemented a simple throttling mechanism to limit requests to one per second, but the overlay still shakes at each new position the model returns.
  • Checked Threading: I’ve confirmed that the Gemini API calls are on a background thread and the UI updates are correctly posted to the main thread.
  • Tested with Static Images: When I feed the model a static image instead of the camera stream, the returned coordinates are perfect and the UI element is completely stable. This strongly suggests the problem is in the real-time pipeline.
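
For anyone who wants to reproduce the throttling step, here is a minimal sketch of the kind of throttle I mean. It is plain Java with no Android or Gemini SDK dependencies; `RequestThrottle` and `tryAcquire` are hypothetical names I made up for illustration, and the caller is assumed to pass in a monotonic millisecond timestamp.

```java
// Hypothetical helper (not part of any SDK): lets at most one request
// through per interval and drops everything that arrives in between.
final class RequestThrottle {
    private final long minIntervalMs;
    private long lastAcceptedMs;
    private boolean hasAccepted = false;

    RequestThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /**
     * Returns true if a request may be sent at nowMs and records it;
     * returns false if the previous accepted request was too recent.
     * nowMs is assumed to come from a monotonic clock.
     */
    synchronized boolean tryAcquire(long nowMs) {
        if (hasAccepted && nowMs - lastAcceptedMs < minIntervalMs) {
            return false; // too soon: skip this camera frame
        }
        lastAcceptedMs = nowMs;
        hasAccepted = true;
        return true;
    }
}
```

On Android the timestamp could come from `SystemClock.elapsedRealtime()`, with the frame analyzer calling `tryAcquire` before each request and skipping the frame when it returns false. Note that a throttle like this only limits how often a new position arrives; it does nothing to damp the difference between consecutive positions.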

To keep this post tidy, I’ve put my Gemini API configuration, build.gradle dependencies, and a sanitized Logcat snippet showing the coordinate updates into a Gist.

My Question: I’m starting to wonder if I’m missing a fundamental pattern for managing this kind of real-time generative feedback.

  1. Has anyone experienced similar visual instability or “shaking” when processing a real-time stream with Gemini?
  2. Is there a recommended best practice for smoothing or stabilizing the output from the model for AR/overlay applications?
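
To make question 2 concrete, this is the sort of smoothing I have in mind: an exponential moving average combined with a small dead zone, so tiny coordinate changes are ignored and larger ones are eased in. It is only a sketch under my own assumptions, not something taken from the Gemini SDK or its docs; `OverlaySmoother`, `alpha`, and `deadZonePx` are hypothetical names, and coordinates are assumed to be screen pixels.

```java
// Hypothetical smoothing helper (not part of any SDK).
// Combines a dead zone with an exponential moving average (EMA).
final class OverlaySmoother {
    private final float alpha;      // 0..1; higher follows new values faster
    private final float deadZonePx; // movements smaller than this are ignored
    private float x;
    private float y;
    private boolean initialized = false;

    OverlaySmoother(float alpha, float deadZonePx) {
        this.alpha = alpha;
        this.deadZonePx = deadZonePx;
    }

    /** Feeds one raw coordinate pair and returns the smoothed {x, y}. */
    float[] update(float rawX, float rawY) {
        if (!initialized) {
            // First sample: nothing to smooth against, so adopt it as-is.
            x = rawX;
            y = rawY;
            initialized = true;
            return new float[] {x, y};
        }
        float dx = rawX - x;
        float dy = rawY - y;
        // Only move when the raw point is at least deadZonePx away.
        if (Math.hypot(dx, dy) >= deadZonePx) {
            x += alpha * dx; // EMA step toward the new raw value
            y += alpha * dy;
        }
        return new float[] {x, y};
    }
}
```

With `alpha = 0.5` and a 2 px dead zone, a raw jump from x = 100 to x = 101 leaves the overlay at 100, while a jump to x = 110 moves it to 105. The trade-off is that the overlay lags behind real movement and can settle up to `deadZonePx` away from the latest raw position. I don’t know whether this is the recommended approach, which is part of what I’m asking.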

Any advice on what to check next would be a huge help. Thanks in advance!