Payload Size Limit Error with embed_content API

I am currently using the Google AI Python SDK to generate embeddings for some markdown with the following code:

import google.generativeai as genai

model = "models/embedding-001"
embedding = genai.embed_content(
    model=model,
    content=text,
    task_type="retrieval_document",
)

However, I am encountering the following error:

InvalidArgument: 400 Request payload size exceeds the limit: 10000 bytes.

I estimated the size of the text to be 4945 tokens using the following code:

model = genai.GenerativeModel("models/gemini-1.5-flash")
print(model.count_tokens(text))

Is that right? The token count doesn't seem large enough that I should need to chunk my markdown. How can I resolve this?

I would appreciate any insights or guidance on how to handle this issue or whether the limit can be adjusted in some way.

Thank you in advance for your support!

As the error message says, the limit is 10000 bytes, not tokens.
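The distinction matters because a markdown document's UTF-8 byte count is usually much larger than its token count. A quick sanity check might look like this (the sample text here is hypothetical, not the original document):

```python
# The 10000-byte limit applies to the UTF-8-encoded request payload,
# not to the token count. Measure the byte size of the text directly:
text = "# My Markdown Document\n" * 500  # hypothetical sample text
size_bytes = len(text.encode("utf-8"))
print(f"{size_bytes} bytes")
if size_bytes > 10_000:
    print("too large for a single embed_content request")
```

If the byte size is over the limit, the text has to be shortened or split before embedding, regardless of its token count.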

Where is this code from? I'm asking because I thought there was no such model as embedding-001, or if there is, it's very old.
See Get text embeddings  |  Generative AI on Vertex AI  |  Google Cloud

I see textembedding-gecko@001. I'd try a newer model, like the suggested 004 or multilingual-002. Newer models have lower dimensionality as well.

Conceptual thinking: for embeddings, you want each vector to capture one well-rounded concept rather than a jumble of concepts when indexing into that high-dimensional latent embedding space. This is why RAG frameworks do chunking and embed each chunk separately, hoping each chunk is a whole concept rather than a mix of several. 5k tokens seems way too big for an ideal chunk size; it's usually around 150 characters or 80-100 tokens, possibly even less. So consider that when architecting your generative AI pipeline.
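As a minimal sketch of that idea, here is a fixed-size chunker. It uses whitespace splitting as a rough stand-in for real tokenization; a production pipeline would use the model's tokenizer and typically overlap adjacent chunks:

```python
def chunk_text(text: str, max_tokens: int = 100) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Whitespace words are only a rough proxy for model tokens; this is a
    sketch, not a production chunker.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

# Each chunk would then be embedded with a separate API call.
chunks = chunk_text("word " * 250)
print(len(chunks))  # 250 words -> chunks of 100, 100, 50
```

Smarter strategies split on paragraph or heading boundaries first, so chunks line up with the "whole concepts" mentioned above.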

Welcome to the forum @diego_mattozo

The embedding-001 model has an input token limit of 2048 tokens.
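Given that limit, a simple pre-flight guard can be sketched like this (the 2048-token limit is taken from this thread, and the token count is the value measured above; the constant name is illustrative):

```python
# Illustrative guard: compare the measured token count against the
# embedding model's input limit before calling the API.
EMBEDDING_INPUT_TOKEN_LIMIT = 2048  # limit for embedding-001, per this thread
token_count = 4945  # value reported by count_tokens above

if token_count > EMBEDDING_INPUT_TOKEN_LIMIT:
    print("text exceeds the embedding model's input limit; chunk or truncate first")
```

In this case the text is over both the 2048-token model limit and the 10000-byte request limit, so chunking (or truncation) is needed either way.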


Hi, thanks for your response. I was following this tutorial. I just thought the API limitation of 10k bytes (~1500 words) was a little odd. Shouldn't it just truncate my text, or at least give me the option to? For my PoC I think it's OK; I didn't want to deal with chunking yet. After I changed my code to use another SDK, it worked (with truncation):

from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

texts = [...]
# task_type is assumed to be defined earlier, e.g. "RETRIEVAL_DOCUMENT"
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
inputs = [TextEmbeddingInput(text, task_type=task_type) for text in texts]
embeddings = model.get_embeddings(inputs)