Which API method is better for RAG-based chatbots in Gemini — generate_content() or chat.send_message()?

Hi Gemini team and community,

I’m building a RAG-based chatbot using the Gemini API.

I’m currently deciding between two approaches provided by the SDK:

  1. generate_content() with a structured contents = [...] list (manual management of history and context), or
  2. chat = client.chats.create(...) followed by chat.send_message(...)?

Thank you in advance for your responses.

Hi @sol-ruh-1, welcome to the forum.

It depends on your use case. If you need fine-grained control over history and context, especially for RAG implementations, generate_content() is the better choice. If you prefer a simpler, higher-level interface with automatic history management, chat.send_message() is more convenient. We also have a cookbook for document question answering that you can refer to for guidance.
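For illustration only, here is a minimal sketch of both approaches using the google-genai SDK; the model name and the example turns are placeholders, not recommendations:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Approach 1: generate_content() with an explicit contents list that you manage yourself.
history = [
    types.Content(role="user", parts=[types.Part(text="What is retrieval-augmented generation?")]),
    types.Content(role="model", parts=[types.Part(text="RAG combines retrieval with generation ...")]),
    types.Content(role="user", parts=[types.Part(text="How would I use it for support tickets?")]),
]
response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents=history,          # you decide what goes in, e.g. trimmed history plus retrieved chunks
)
print(response.text)

# Approach 2: chats.create() / send_message(), where the SDK keeps the history for you.
chat = client.chats.create(model="gemini-2.0-flash")
first = chat.send_message("What is retrieval-augmented generation?")
second = chat.send_message("How would I use it for support tickets?")  # previous turn is included automatically
print(second.text)
```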

Hello @GUNAND_MAYANGLAMBAM, thank you for the helpful response. I have actually seen the cookbook before, and it’s a very simple implementation, so I’m asking about the following specifics:

As I’m using generate_content() for a RAG-based chatbot, I’d like to clarify a few specifics around structuring the contents argument for long-running, multi-turn conversations. In my case, I manually manage chat history and retrieved documents from a vector store.

  1. What role should I use for the system-level prompt?
    In OpenAI, we usually start with a system role to define the assistant’s behavior (e.g., “You are a helpful assistant that answers concisely using retrieved documents.”).
    In Gemini’s generate_content(), can I use "role": "system" in the same way, or is there a better/more appropriate method to specify that?
  2. Best practices for context management in long chats:
    Since Gemini models have a fixed context window, what’s the recommended way to preserve context without exceeding the context window limit?

Are there any patterns, utilities, or suggested strategies from the Gemini team for keeping the prompt within limits without losing essential context for follow-up questions?

If this is also addressed in the Document QA cookbook or any RAG-specific guidance, I’d love a pointer to the relevant section.

Thanks again! This forum has been super valuable.

To define a system prompt, you can include the system instruction in the config; please refer to this cookbook for guidance. Regarding context management in long conversations, all LLMs have context length limits, so it’s good practice to summarize earlier turns into a concise recap to preserve continuity.
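As a quick, unofficial sketch of that config (the model name and example turn are placeholders):

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents=[
        types.Content(
            role="user",
            parts=[types.Part(text="Answer using the retrieved documents: ...")],
        ),
    ],
    config=types.GenerateContentConfig(
        # The system-level prompt lives in the config, not in the contents list.
        system_instruction="You are a helpful assistant that answers concisely using retrieved documents."
    ),
)
print(response.text)
```

For long chats, one common pattern is to replace the oldest turns in your contents list with a single short recap turn before each call, so the history stays within the context window.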

Thank you

Thank you for the clarification and the cookbook reference; that was helpful.

Building on the previous two questions, I have some follow-ups:

1. Are there any cookbook examples for building a RAG chatbot using generate_content() with an explicit contents[] array?

Most examples I’ve seen focus on simple one-off prompts. I’m looking for a more detailed structure — one that:

  • Builds a contents array with multi-turn context,
  • Injects retrieved documents (from a vector store or search),
  • And clearly separates system instructions, user input, and model response.

Any official examples or best practices for this setup would be appreciated.

2. In a RAG chatbot, I typically have:

  • A system-level prompt (instruction),
  • A retrieved context (from search),
  • A user query.

How should these be structured in the contents[]?

Specifically:

  • Should the system prompt say something like:
    “Use the retrieved content to answer the user’s question precisely”,
    and then place both the retrieved context and the user question inside a single user role?

Or is it better to:

  • Keep the system prompt more general (e.g., “You are a helpful assistant”), and
  • Put a single user message like:

“Content:\n[retrieved context]\n\nQuestion:\n[user question]”?

A small example or recommended structure would be really helpful.
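For reference, here is a rough sketch of what I currently do, so you can see the structure I mean (the helper, the model name, and the retrieval step are placeholders on my side, not anything official):

```python
from google import genai
from google.genai import types

client = genai.Client()

SYSTEM_PROMPT = "You are a helpful assistant. Answer precisely using the retrieved documents."

def build_contents(history, retrieved_chunks, user_query):
    # history: earlier user/model turns as types.Content objects
    # retrieved_chunks: plain-text chunks from my vector store
    context_block = "\n\n".join(retrieved_chunks)
    final_turn = types.Content(
        role="user",
        parts=[types.Part(text=f"Context:\n{context_block}\n\nQuestion:\n{user_query}")],
    )
    return history + [final_turn]

history = []  # earlier turns would accumulate here between calls
retrieved_chunks = ["<chunk returned by my vector store>"]

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents=build_contents(history, retrieved_chunks, "How do refunds work?"),
    config=types.GenerateContentConfig(system_instruction=SYSTEM_PROMPT),
)
print(response.text)
```

Is keeping the retrieved context and the question together in one user turn like this the recommended structure, or should they be separated somehow?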

Thanks again!

Hi @sol-ruh-1, unfortunately there is no official cookbook for RAG chatbots with multi-turn context at present. We’ve reported this to our internal team. Thank you for bringing this up.
