How to achieve AI Studio-like multi-turn image consistency with Gemini 3 Pro Image API in automation workflows?

Background

I’m building an automated visual book-to-graphic-novel pipeline using n8n workflow automation and the Gemini 3 Pro Image API. I write books and want to convert them into 20-30 page visual graphic novels with consistent characters, style, and visual continuity.

What Works (AI Studio)

In AI Studio, the workflow is seamless:

  1. I provide the full book context/storyline upfront

  2. I say: “Generate the cover for this book” → AI generates it

  3. I say: “Generate page 1” → AI generates it with consistent style and characters

  4. I say: “Generate page 2” → AI maintains context of previous pages and style

  5. This continues for 20+ pages with perfect consistency

The key: AI Studio seems to maintain persistent visual context throughout the entire conversation without me needing to re-upload reference images or previous pages.

What Doesn’t Work (API + n8n Automation)

When building this via the Gemini API with n8n automation, I cannot replicate this behavior. Here’s what I’ve tried:

Attempt 1: Multi-turn conversation with contents array

  • Following the multi-turn image editing docs

  • ive tried Manually building a conversationHistory array with all previous messages

  • also tried attaching each image problem: Hitting the 14-image attachment limit around page 12-14 when including all generated pages in history

Attempt 2: Context Caching

  • Attempted to use context caching to store character references and style

  • Problem: Context caching is not available for gemini-3-pro-image-preview model (only text models )

My Questions

  1. How does AI Studio maintain visual context across 20+ image generations? Is it using a different API endpoint or method that’s not documented in the public API docs?

  2. Is there a way to use the Chat/multi-turn functionality that maintains visual memory without hitting attachment limits?

  3. Should I be using a different model or approach for this use case? The goal is: upload context once → generate 20-30 sequential images with consistent characters/style.

What I Need

A method to replicate AI Studio’s behavior programmatically:

  • Provide full story context + character references once

  • Generate pages 1-30 sequentially via API calls

  • Each page maintains visual consistency with previous pages

  • No need to re-upload references or regenerated pages with each request

Any guidance, code examples, or documentation pointers would be incredibly helpful!

Hi @Samuel2, apologies for the delayed response.

You can try to use FileAPI and utilize file_uri to maintain context across image generations and to avoid a massive payload overhead. You can also try to have a sliding window approach for URIs as well. File API

You can use System instructions to define requirements; this can work as caching, based on how you are providing the instructions.

Thank you!

FWIW, we are not doing anything magic in the AI Studio UI vs API. We are using the raw API.