How to achieve AI Studio-like multi-turn image consistency with Gemini 3 Pro Image API in automation workflows?

Background

I’m building an automated visual book-to-graphic-novel pipeline using n8n workflow automation and the Gemini 3 Pro Image API. I write books and want to convert them into 20-30 page visual graphic novels with consistent characters, style, and visual continuity.

What Works (AI Studio)

In AI Studio, the workflow is seamless:

  1. I provide the full book context/storyline upfront

  2. I say: “Generate the cover for this book” → AI generates it

  3. I say: “Generate page 1” → AI generates it with consistent style and characters

  4. I say: “Generate page 2” → AI maintains context of previous pages and style

  5. This continues for 20+ pages with perfect consistency

The key: AI Studio seems to maintain persistent visual context throughout the entire conversation without me needing to re-upload reference images or previous pages.

What Doesn’t Work (API + n8n Automation)

When building this via the Gemini API with n8n automation, I cannot replicate this behavior. Here’s what I’ve tried:

Attempt 1: Multi-turn conversation with contents array

  • Following the multi-turn image editing docs

  • ive tried Manually building a conversationHistory array with all previous messages

  • also tried attaching each image problem: Hitting the 14-image attachment limit around page 12-14 when including all generated pages in history

Attempt 2: Context Caching

  • Attempted to use context caching to store character references and style

  • Problem: Context caching is not available for gemini-3-pro-image-preview model (only text models )

My Questions

  1. How does AI Studio maintain visual context across 20+ image generations? Is it using a different API endpoint or method that’s not documented in the public API docs?

  2. Is there a way to use the Chat/multi-turn functionality that maintains visual memory without hitting attachment limits?

  3. Should I be using a different model or approach for this use case? The goal is: upload context once → generate 20-30 sequential images with consistent characters/style.

What I Need

A method to replicate AI Studio’s behavior programmatically:

  • Provide full story context + character references once

  • Generate pages 1-30 sequentially via API calls

  • Each page maintains visual consistency with previous pages

  • No need to re-upload references or regenerated pages with each request

Any guidance, code examples, or documentation pointers would be incredibly helpful!