Will it be possible to use fully local/offline Gemini Live AI models on devices in the future?

Hi Gemma team and community,
First of all, huge congrats on the Gemma 4 series — the on-device performance and native multimodal support are game-changing!
I just saw this incredible demo by @rohanpaul_ai (https://x.com/rohanpaul_ai/status/2040731218157941100):
Gemma 4 E2B running fully offline on an iPhone 17 Pro at ~40 tokens/s (MLX-optimized for Apple Silicon), with 128K context, SOTA coding and math, and a “thinking mode”. The model already supports audio and video frames natively.
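For reference, the text-only part of that setup is already easy to reproduce locally with the mlx-lm package; here's a minimal sketch. The checkpoint id is hypothetical (substitute whatever quantized build is actually published):

```python
# Plain offline generation on Apple Silicon with mlx-lm (pip install mlx-lm).
# The repo id below is hypothetical -- use whatever quantized Gemma
# checkpoint actually ships; everything runs locally once weights are cached.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-e2b-4bit")  # hypothetical id
print(generate(model, tokenizer,
               prompt="Explain KV-cache quantization in one paragraph.",
               max_tokens=256))
```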
This immediately made me wonder: Is there a realistic near-term path to turn this kind of on-device multimodal model into a complete local/offline version of Gemini Live?
Specifically, I’m thinking about:
- Real-time, full-duplex voice conversation (rough capture-loop sketch after this list)
- Continuous live camera input or screen sharing
- Everything processed 100% on-device, with naturally low latency
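To make the voice point concrete, here's a sketch of the kind of streaming capture loop this implies, assuming a 16 kHz mono pipeline and ~200 ms frames. The `local_model_step()` function is purely hypothetical; I'm not aware of any public streaming-audio API into Gemma on-device yet, so it's a stand-in for whatever that ends up being:

```python
# Sketch: microphone audio is chunked on a background audio thread and
# handed frame-by-frame to an on-device model.
import queue

import numpy as np
import sounddevice as sd  # pip install sounddevice

SAMPLE_RATE = 16_000    # 16 kHz mono, typical for speech models
FRAME_MS = 200          # smaller frames = lower latency, more overhead
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000

audio_frames: "queue.Queue[np.ndarray]" = queue.Queue()

def on_audio(indata, frames, time, status):
    # Runs on the audio thread: copy the chunk out and return quickly.
    audio_frames.put(indata[:, 0].copy())

def local_model_step(chunk: np.ndarray) -> str | None:
    # Hypothetical stand-in: feed the audio chunk to the local model and
    # return any newly generated response text (None while just listening).
    return None

with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                    blocksize=FRAME_SAMPLES, callback=on_audio):
    while True:
        reply = local_model_step(audio_frames.get())
        if reply:
            print(reply, end="", flush=True)
```

The capture side is the easy part; the open question is what the model-facing half of that loop should look like for true full-duplex (listening while speaking).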
Since Gemma 4 E2B/E4B already handles audio and video frames natively, what do you see as the biggest remaining challenges?
Would love to hear any thoughts from the Gemma team or the community on:
- Current technical hurdles (e.g. real-time streaming pipelines, latency, power efficiency; see the back-of-envelope math after this list)
- Existing prototypes or open-source efforts
- Future roadmap possibilities
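On the latency point, a quick back-of-envelope using the ~40 tokens/s decode speed from the demo suggests conversational latency is within reach; every number here other than the decode speed is my assumption:

```python
# Back-of-envelope latency budget for one spoken turn.
DECODE_TPS = 40        # tokens/s, from the demo
REPLY_TOKENS = 40      # a short spoken reply (assumed)
TTFT_S = 0.3           # assumed prefill / time-to-first-token
TTS_START_S = 0.15     # assumed on-device TTS spin-up

# With streaming TTS you start speaking after the first few tokens,
# so perceived latency beats waiting for the full reply to decode:
first_audio = TTFT_S + (5 / DECODE_TPS) + TTS_START_S   # ~0.58 s
full_reply = TTFT_S + (REPLY_TOKENS / DECODE_TPS)       # ~1.3 s
print(f"first audible audio ~{first_audio:.2f}s, "
      f"full reply decoded ~{full_reply:.2f}s")
```

If those assumptions are even roughly right, sub-second perceived response seems plausible, which makes power efficiency and the streaming pipeline feel like the harder problems.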
Thanks in advance — this feels like the next exciting frontier for on-device AI!