Which API method is better for RAG-based chatbots in Gemini — generate_content() or chat.send_message()?

Hi Gemini team and community,

I’m building a RAG-based chatbot using the Gemini API.

I’m currently deciding between two approaches provided by the SDK:

  1. generate_content() with a structured contents = [...] list (manual management of history and context), or
  2. chat = client.chats.create(...) followed by chat.send_message(...)?

Thank you in advance for your responses.

Hi @sol-ruh-1, welcome to the forum.

It depends on your use case. If you need fine-grained control over history and context, especially for RAG implementations, generate_content() is the better choice. If you prefer a simpler, higher-level interface with automatic history management, chat.send_message() is more convenient. We also have a cookbook for document question answering that you can refer to for guidance.
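For illustration only, here is a minimal sketch of both approaches using the google-genai SDK; the model name and the example turns are placeholders, not recommendations:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Approach 1: generate_content() with an explicit contents list that you manage yourself.
history = [
    types.Content(role="user", parts=[types.Part(text="What is retrieval-augmented generation?")]),
    types.Content(role="model", parts=[types.Part(text="RAG combines retrieval with generation ...")]),
    types.Content(role="user", parts=[types.Part(text="How would I use it for support tickets?")]),
]
response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents=history,          # you decide what goes in, e.g. trimmed history plus retrieved chunks
)
print(response.text)

# Approach 2: chats.create() / send_message(), where the SDK keeps the history for you.
chat = client.chats.create(model="gemini-2.0-flash")
first = chat.send_message("What is retrieval-augmented generation?")
second = chat.send_message("How would I use it for support tickets?")  # previous turn is included automatically
print(second.text)
```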

Hello @GUNAND_MAYANGLAMBAM, thank you for the helpful response. I have actually seen the cookbook before, and it’s a very simple implementation, so I’m asking about the following specifics:

As I’m using generate_content() for a RAG-based chatbot, I’d like to clarify a few specifics around structuring the contents argument for long-running, multi-turn conversations. In my case, I manually manage chat history and retrieved documents from a vector store.

  1. What role should I use for the system-level prompt?
    In OpenAI, we usually start with a system role to define the assistant’s behavior (e.g., “You are a helpful assistant that answers concisely using retrieved documents.”).
    In Gemini’s generate_content(), can I use "role": "system" in the same way, or is there a better/more appropriate method to specify that?
  2. Best practices for context management in long chats:
    Since Gemini models have a fixed context window, what’s the recommended way to preserve context without exceeding the context window limit?

Are there any patterns, utilities, or suggested strategies from the Gemini team for keeping the prompt within limits without losing essential context for follow-up questions?

If this is also addressed in the Document QA cookbook or any RAG-specific guidance, I’d love a pointer to the relevant section.

Thanks again! This forum has been super valuable.

To define a system prompt, you can include the system instruction in the config; please refer to this cookbook for guidance. Regarding context management in long conversations, all LLMs have context length limits, so it’s good practice to summarize earlier turns into a concise recap to preserve continuity.
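As a quick, unofficial sketch of that config (the model name and example turn are placeholders):

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents=[
        types.Content(
            role="user",
            parts=[types.Part(text="Answer using the retrieved documents: ...")],
        ),
    ],
    config=types.GenerateContentConfig(
        # The system-level prompt lives in the config, not in the contents list.
        system_instruction="You are a helpful assistant that answers concisely using retrieved documents."
    ),
)
print(response.text)
```

For long chats, one common pattern is to replace the oldest turns in your contents list with a single short recap turn before each call, so the history stays within the context window.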

Thank you

Thank you for the clarification and the cookbook reference; that was helpful.

Building on the previous two questions, I have some follow-ups:

1. Are there any cookbook examples for building a RAG chatbot using generate_content() with an explicit contents[] array?

Most examples I’ve seen focus on simple one-off prompts. I’m looking for a more detailed structure — one that:

  • Builds a contents array with multi-turn context,
  • Injects retrieved documents (from a vector store or search),
  • And clearly separates system instructions, user input, and model response.

Any official examples or best practices for this setup would be appreciated.

2. In a RAG chatbot, I typically have:

  • A system-level prompt (instruction),
  • A retrieved context (from search),
  • A user query.

How should these be structured in the contents[]?

Specifically:

  • Should the system prompt say something like:
    “Use the retrieved content to answer the user’s question precisely”,
    and then place both the retrieved context and the user question inside a single user role?

Or is it better to:

  • Keep the system prompt more general (e.g., “You are a helpful assistant”), and
  • Put a single user message like:

“Content:\n[retrieved context]\n\nQuestion:\n[user question]”?

A small example or recommended structure would be really helpful.
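For reference, here is a rough sketch of what I currently do, so you can see the structure I mean (the helper, the model name, and the retrieval step are placeholders on my side, not anything official):

```python
from google import genai
from google.genai import types

client = genai.Client()

SYSTEM_PROMPT = "You are a helpful assistant. Answer precisely using the retrieved documents."

def build_contents(history, retrieved_chunks, user_query):
    # history: earlier user/model turns as types.Content objects
    # retrieved_chunks: plain-text chunks from my vector store
    context_block = "\n\n".join(retrieved_chunks)
    final_turn = types.Content(
        role="user",
        parts=[types.Part(text=f"Context:\n{context_block}\n\nQuestion:\n{user_query}")],
    )
    return history + [final_turn]

history = []  # earlier turns would accumulate here between calls
retrieved_chunks = ["<chunk returned by my vector store>"]

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents=build_contents(history, retrieved_chunks, "How do refunds work?"),
    config=types.GenerateContentConfig(system_instruction=SYSTEM_PROMPT),
)
print(response.text)
```

Is keeping the retrieved context and the question together in one user turn like this the recommended structure, or should they be separated somehow?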

Thanks again!

Hi @sol-ruh-1, unfortunately there is no official cookbook for RAG chatbots with multi-turn context at present. We’ve reported this to our internal team. Thank you for bringing this up.
