Request for Multimodal Audio Output Allowlisting - Project: xavier-488808

Model: gemini-2.5-flash
Region: us-central1

Use Case Description: I am developing a high-fidelity soundpack generator for flight simulation (SLC, Self Loading Cargo). The application generates realistic airline crew announcements from real-time flight data (weather, location, destination).

Why I need Native Audio Output:

  1. Bilingual Fluidity: The native multimodal output of Gemini 2.5 Flash allows seamless transitions between languages (e.g., Japanese and English) within a single audio stream while preserving the character’s voice persona (Puck/Aoife).

  2. Contextual Prosody: The model’s ability to adjust tone and emphasis based on the generated text (e.g., safety warnings vs. welcome messages) is critical for simulation immersion.

  3. Workflow Optimization: Direct audio generation removes the separate text-to-speech step, which should reduce latency compared to a traditional pipeline that generates text first and synthesizes speech afterwards.

Target Audience: This is a local simulation tool used by the flight simulation community. No automated public-facing bot is involved, and all generated content consists of simulated crew announcements, including safety briefings, intended solely for entertainment and simulation purposes.

Thank you for your review.