I’m building a RAG pipeline with gemini-embedding-001 and the RETRIEVAL_DOCUMENT task type. My documents are chunked, and each chunk has a short title (1-5 words).
I have two approaches and would appreciate clarity on which is optimal:
1. Native title parameter:
    config = types.EmbedContentConfig(
        task_type="RETRIEVAL_DOCUMENT",
        title=content_title,
        output_dimensionality=768,
    )
    contents = [chunk_text]
2. Manual prepend:
I don’t pass the title in the config; instead I prepend it to the content string:
    config = types.EmbedContentConfig(
        task_type="RETRIEVAL_DOCUMENT",
        output_dimensionality=768,
    )
    contents = [f"Title: {title}\nContent: {chunk_text}"]
Questions:
- Does the native title parameter handle the title differently from simply prepending it to the content string?
- For short titles (1-5 words), is the native param meaningfully better than manual prepend?
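
In case it helps frame an answer, here is a minimal sketch of how the two variants could be compared empirically on a small labeled query set. It does not call the embedding API: the vectors are assumed to have been produced separately (one set of chunk embeddings per approach, plus query embeddings). The helper names manual_prepend, cosine, and recall_at_k are hypothetical and only for illustration.

```python
import math
from typing import Sequence


def manual_prepend(title: str, chunk_text: str) -> str:
    """Build the approach-2 input string, using the same format as in the post."""
    return f"Title: {title}\nContent: {chunk_text}"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; divides by both norms, so inputs need not be unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(
    query_vecs: Sequence[Sequence[float]],
    doc_vecs: Sequence[Sequence[float]],
    relevant: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of queries whose relevant chunk index appears in the top-k results.

    relevant[i] is the index into doc_vecs of the chunk that should be
    retrieved for query i (an assumption: one relevant chunk per query).
    """
    if not query_vecs:
        return 0.0
    hits = 0
    for q, rel in zip(query_vecs, relevant):
        # Rank all chunk indices by similarity to the query, best first.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda i: cosine(q, doc_vecs[i]),
            reverse=True,
        )
        hits += rel in ranked[:k]
    return hits / len(query_vecs)
```

The idea would be to embed the same chunks twice (once with the native title parameter, once with the output of manual_prepend), run the same query embeddings against both sets, and compare recall_at_k. A difference that small test sets cannot resolve would itself suggest the choice matters little for short titles.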