Hello Google AI Developer Community,
I’m currently working on a project that generates game descriptions themed around “Actions and Stuff APK” using the Gemini API. My goal is to use the model’s multimodal capabilities to create engaging, context-aware descriptions by combining text, images, and gameplay context. However, I’ve run into several challenges:
1. Inconsistent Output
When using the Gemini API (specifically Gemini 1.5 Pro), the descriptions vary in tone, length, and coherence from one call to the next. Some outputs sound too technical or generic, while others miss the casual, chaotic nature of Actions and Stuff, a sandbox-style game. I’m aiming for output that captures the spontaneous, user-driven fun of the game, but that isn’t always reflected in the results.
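For reference, here’s a minimal sketch of how I’m currently trying to pin down tone and length; the model name, system instruction wording, and parameter values are just my own guesses, not anything official:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # replace with your key

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    # Pinning tone and length in a system instruction seems to reduce drift
    # across calls more than per-prompt wording alone.
    system_instruction=(
        "You write short, playful app-store descriptions for casual sandbox games. "
        "Tone: casual, chaotic, fun. Length: 2-3 sentences. Avoid technical jargon."
    ),
    generation_config=genai.GenerationConfig(
        temperature=0.7,        # lower values trade creativity for consistency
        top_p=0.9,
        max_output_tokens=120,  # hard cap keeps output lengths comparable
    ),
)

response = model.generate_content("Describe the sandbox gameplay of Actions and Stuff.")
print(response.text)
```

Even with this, the tone still wobbles between runs, which is what prompted this post.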
2. Difficulty with Visual Input
I’ve uploaded screenshots from the game, including scenes with random action setups and character behavior, to help the API better understand the context. Unfortunately, the generated descriptions tend to misinterpret objects or fail to grasp the open-ended gameplay style. Is there a way to improve how the model handles visual cues, especially from sandbox-style games?
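This is roughly how I pass the screenshots in. The two-step “list what you see, then describe” prompt is an idea I’m experimenting with to reduce misread objects, not a documented recipe, and the file name is a placeholder:

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

screenshot = PIL.Image.open("screenshot.png")  # hypothetical gameplay capture

prompt = (
    "Step 1: List the objects, characters, and actions visible in this screenshot. "
    "Step 2: Using only what you listed, write a fun, lighthearted description of "
    "what a player can do in this open-ended sandbox scene."
)

response = model.generate_content([prompt, screenshot])
print(response.text)
```

Grounding the description in an explicit object list helps a little, but the model still invents goals and objectives that sandbox games don’t have.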
3. Prompt Structuring
I’ve tried prompts like:
- “Write a short game description for Actions and Stuff APK, highlighting player freedom and sandbox-style gameplay.”
- “Based on this image, describe what a user can do in Actions and Stuff — be creative, fun, and lighthearted.”
However, the results still feel off. Should I be using few-shot prompting with more examples, or try more structured prompt templates?
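For context, this is the kind of few-shot structure I’ve been considering; the example descriptions below are placeholders I’d swap for ones whose tone I actually like:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Two hand-written exemplars anchor tone and length before the real request.
few_shot_prompt = """You write game descriptions. Match the tone and length of these examples.

Game: a physics sandbox
Description: Stack it, smash it, launch it into orbit. No rules, no goals, just a pile of ragdolls begging for chaos.

Game: a creature playground
Description: Spawn a hundred chickens. Give them jetpacks. See what happens. That's the whole game, and it's glorious.

Game: Actions and Stuff, an open-ended sandbox
Description:"""

print(model.generate_content(few_shot_prompt).text)
```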
4. Language and Context
Even though the game uses simple English, the API sometimes misses the slang or playful tone that’s central to Actions and Stuff. This can lead to robotic-sounding descriptions. Are there best practices for making AI-generated content feel more natural and culturally aligned with younger audiences?
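One thing I’ve started trying here (no idea if it’s an actual best practice) is encoding the voice as explicit do/don’t rules in the system instruction rather than a single adjective like “casual”:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Encode the voice as concrete rules instead of a single adjective.
style_guide = (
    "Voice rules:\n"
    "- Write like a player hyping the game to a friend, not like a press release.\n"
    "- Short punchy sentences. Contractions and mild slang are fine.\n"
    "- Never use words like 'utilize', 'immersive', or 'experience the thrill'.\n"
    "- A little absurdity is good; the game is about unscripted chaos."
)

model = genai.GenerativeModel("gemini-1.5-pro", system_instruction=style_guide)
print(model.generate_content("Describe Actions and Stuff in two sentences.").text)
```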
I’ve also experimented with parameters like temperature and top-p, and tried enabling Grounding with Google Search for improved relevance. But the results still don’t capture the quirky, user-generated energy that defines Actions and Stuff APK.
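To compare settings instead of tweaking one value at a time, I run a small sweep like this (the parameter pairs are arbitrary starting points):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

# Sweep a few temperature/top_p pairs to see how tone and variety shift.
for temp, top_p in [(0.4, 0.8), (0.9, 0.95), (1.2, 1.0)]:
    cfg = genai.GenerationConfig(temperature=temp, top_p=top_p, max_output_tokens=100)
    r = model.generate_content(
        "Write a two-sentence, playful description of a sandbox game.",
        generation_config=cfg,
    )
    print(f"temp={temp}, top_p={top_p}:\n{r.text}\n")
```

Higher temperatures do loosen the tone, but they also increase the number of outputs that drift off-topic entirely.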
I’d love to hear from others in the community:
If you’ve worked on similar projects or have tips for better multimodal content generation, especially for sandbox or simulation games, I’d greatly appreciate your insights. And if anyone has tried Gemini 2.5 Pro, or fine-tuning with examples from niche games, I’d be curious to hear about that too.
Thanks in advance!