Hi everyone,
I’m currently building a proof of concept (POC) for a hyperlocal taxi booking app, and I’m exploring how to integrate Google’s Gemini API to make the system more intelligent, especially during the ride request phase.
The goal is to enhance the app’s ability to:
- Detect the user’s intended destination from natural language (even if it’s vague, like “I need to catch a flight soon”).
- Understand ride preferences, e.g., quiet ride, AC on/off, preferred driver gender.
- Pick up emotional or safety signals (e.g., “I don’t feel safe”, “I’m being followed”).
- Convert voice or free-text inputs into structured data the backend can use for decision-making, such as driver assignment or emergency escalation (a rough example of the payload I have in mind is below).
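Here’s roughly the shape I’d like the extraction to produce. The field names and values are placeholders I made up for the POC, not a fixed contract:

```python
# Example of the structured payload I'd want Gemini to hand back to the backend.
# Field names are POC placeholders, not a final schema.
ride_intent_example = {
    "destination": "airport",               # resolved from "I need to catch a flight soon"
    "urgency": "high",                      # low | medium | high
    "preferences": ["quiet_ride", "ac_on"],
    "safety_concern": False,                # True if the message hints at danger
    "emotional_state": "stressed",
}
```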
What I’m Trying with Gemini:
I’m thinking of passing the user input (voice → text) into the Gemini API with a prompt like:
“Extract destination, urgency, preferences, and emotional state from this user input…”
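For context, this is the minimal call I’ve sketched so far (Python with the google-generativeai SDK and its JSON output mode; the model name, field list, and API-key handling are just placeholders for the POC):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; in practice the key would stay server-side

# gemini-1.5-flash chosen here only because it's the lighter/faster option;
# response_mime_type asks the model to reply with JSON instead of free text.
model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config={"response_mime_type": "application/json"},
)

def extract_ride_intent(user_text: str) -> str:
    prompt = (
        "Extract the following fields from the rider's message and return only JSON: "
        "destination (string or null), urgency (low|medium|high), preferences (list of strings), "
        "safety_concern (boolean), emotional_state (string).\n\n"
        f"Rider message: {user_text}"
    )
    response = model.generate_content(prompt)
    return response.text  # expected to be a JSON string

print(extract_ride_intent("I need to catch a flight soon, please keep the AC on"))
```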
But I’m unsure about:
1- Prompt engineering – What’s the most effective way to structure prompts for such multi-field extraction?
2- Latency – How responsive is Gemini in a mobile context? Any best practices for optimizing speed?
3- Security & privacy – Any tips on handling sensitive user input responsibly when using large language models?
4- Fallback strategies – How do I gracefully handle cases where Gemini fails to extract useful info? (My rough first attempt is sketched below.)
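On point 4, the direction I’m leaning toward is to validate whatever comes back and drop to simple keyword rules (plus an explicit clarifying question in the UI) if parsing fails, so the booking flow never blocks on the model. A rough sketch, with made-up helper names and keyword lists:

```python
import json
from typing import Optional

REQUIRED_KEYS = {"destination", "urgency", "preferences", "safety_concern", "emotional_state"}

def parse_ride_intent(raw_reply: str) -> Optional[dict]:
    """Return the parsed intent dict, or None if the model reply isn't usable."""
    try:
        data = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data

def ride_intent_or_fallback(raw_reply: str, original_text: str) -> dict:
    intent = parse_ride_intent(raw_reply)
    if intent is not None:
        return intent
    # Fallback: crude keyword rules so we can still create a ride request
    # and ask the rider to confirm the destination in the UI.
    lowered = original_text.lower()
    return {
        "destination": None,  # prompt the rider explicitly
        "urgency": "high" if any(w in lowered for w in ("flight", "hurry", "asap")) else "medium",
        "preferences": [],
        "safety_concern": any(w in lowered for w in ("unsafe", "followed", "help")),
        "emotional_state": "unknown",
    }
```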
Has anyone tried using Gemini or similar LLMs in a real-time mobile app setting like this?
Would love to hear:
- Your experiences
- Prompt techniques
- API optimization tips
- Gotchas to avoid
Thanks