Hello Google AI Developer Community,
I’m currently working on a project that generates game descriptions themed around “Actions and Stuff APK” using the Gemini API. My goal is to use the model’s multimodal capabilities to create engaging, context-aware descriptions by combining text, images, and gameplay context. However, I’ve run into several challenges:
1. Inconsistent Output
When using the Gemini API (specifically Gemini 1.5 Pro), the descriptions vary in tone, length, and coherence from one call to the next. Some outputs sound too technical or generic, while others miss the casual, chaotic nature of Actions and Stuff, a sandbox-style game. I’m aiming for output that captures the spontaneous, user-driven fun of the game, but that isn’t always reflected in the results.
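For reference, here’s a minimal sketch of how I’m currently trying to pin down tone and length; the model name, system instruction wording, and parameter values are just my own guesses, not anything official:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your key

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    # Pinning tone and length in a system instruction seems to reduce drift
    # across calls more than per-prompt wording alone.
    system_instruction=(
        "You write short, playful app-store descriptions for casual sandbox games. "
        "Tone: casual, chaotic, fun. Length: 2-3 sentences. Avoid technical jargon."
    ),
    generation_config=genai.GenerationConfig(
        temperature=0.7,        # lower values trade creativity for consistency
        top_p=0.9,
        max_output_tokens=120,  # hard cap keeps output lengths comparable
    ),
)

response = model.generate_content("Describe the sandbox gameplay of Actions and Stuff.")
print(response.text)
```

Even with this, the tone still wobbles between runs, which is what prompted this post.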
2. Difficulty with Visual Input
I’ve uploaded screenshots from the game, including scenes with random action setups and character behavior, to help the API better understand the context. Unfortunately, the generated descriptions tend to misinterpret objects or fail to grasp the open-ended gameplay style. Is there a way to improve how the model handles visual cues, especially from sandbox-style games?
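This is roughly how I pass the screenshots in. The two-step “list what you see, then describe” prompt is an idea I’m experimenting with to reduce misread objects, not a documented recipe, and the file name is a placeholder:

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

screenshot = PIL.Image.open("screenshot.png")  # hypothetical gameplay capture

prompt = (
    "Step 1: List the objects, characters, and actions visible in this screenshot. "
    "Step 2: Using only what you listed, write a fun, lighthearted description of "
    "what a player can do in this open-ended sandbox scene."
)

response = model.generate_content([prompt, screenshot])
print(response.text)
```

Grounding the description in an explicit object list helps a little, but the model still invents goals and objectives that sandbox games don’t have.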
3. Prompt Structuring
I’ve tried prompts like:
- “Write a short game description for Actions and Stuff APK, highlighting player freedom and sandbox-style gameplay.”
- “Based on this image, describe what a user can do in Actions and Stuff — be creative, fun, and lighthearted.”
However, the results still feel off. Should I be using few-shot prompting with more examples, or try more structured prompt templates?
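For context, this is the kind of few-shot structure I’ve been considering; the example descriptions below are placeholders I’d swap for ones whose tone I actually like:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Two hand-written exemplars anchor tone and length before the real request.
few_shot_prompt = """You write game descriptions. Match the tone and length of these examples.

Game: a physics sandbox
Description: Stack it, smash it, launch it into orbit. No rules, no goals, just a pile of ragdolls begging for chaos.

Game: a creature playground
Description: Spawn a hundred chickens. Give them jetpacks. See what happens. That's the whole game, and it's glorious.

Game: Actions and Stuff, an open-ended sandbox
Description:"""

print(model.generate_content(few_shot_prompt).text)
```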
4. Language and Context
Even though the game uses simple English, the API sometimes misses the slang or playful tone that’s central to Actions and Stuff. This can lead to robotic-sounding descriptions. Are there best practices for making AI-generated content feel more natural and culturally aligned with younger audiences?
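One thing I’ve started trying here (no idea if it’s an actual best practice) is encoding the voice as explicit do/don’t rules in the system instruction rather than a single adjective like “casual”:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Encode the voice as concrete rules instead of a single adjective.
style_guide = (
    "Voice rules:\n"
    "- Write like a player hyping the game to a friend, not like a press release.\n"
    "- Short punchy sentences. Contractions and mild slang are fine.\n"
    "- Never use words like 'utilize', 'immersive', or 'experience the thrill'.\n"
    "- A little absurdity is good; the game is about unscripted chaos."
)

model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=style_guide)
print(model.generate_content("Describe Actions and Stuff in two sentences.").text)
```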
I’ve also experimented with parameters like temperature and top-p, and tried enabling Grounding with Google Search for improved relevance. But the results still don’t capture the quirky, user-generated energy that defines Actions and Stuff APK.
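To compare settings instead of tweaking one value at a time, I run a small sweep like this (the parameter pairs are arbitrary starting points):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Sweep a few temperature/top_p pairs to see how tone and variety shift.
for temp, top_p in [(0.4, 0.8), (0.9, 0.95), (1.2, 1.0)]:
    cfg = genai.GenerationConfig(temperature=temp, top_p=top_p, max_output_tokens=100)
    r = model.generate_content(
        "Write a two-sentence, playful description of a sandbox game.",
        generation_config=cfg,
    )
    print(f"temp={temp}, top_p={top_p}:\n{r.text}\n")
```

Higher temperatures do loosen the tone, but they also increase the number of outputs that drift off-topic entirely.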
I’d love to hear from others in the community:
If you’ve worked on similar projects or have tips for better multimodal content generation, especially for sandbox or simulation games, I’d greatly appreciate your insights. And if anyone has tried Gemini 2.5 Pro, or fine-tuning with examples from niche games, I’d be curious to hear about that too.
Thanks in advance!