EmbeddingGemma indexing is extremely slow when used for knowledge indexing in Dify (Docker setup)

Hello,

I am using EmbeddingGemma for knowledge indexing in Dify, and the indexing process is taking an extremely long time.
I would appreciate any advice on possible causes and how to improve performance.

Here is my setup and situation:

Dify is deployed using Docker (self-hosted)

Embedding model: EmbeddingGemma

Use case: Knowledge indexing (document ingestion / vectorization)

The system works functionally, but indexing speed is much slower than expected

Even relatively small documents take a long time to complete indexing

Things I am wondering about:

Is EmbeddingGemma known to be slower for batch or large-scale embedding tasks?

Are there recommended Docker resource settings (CPU, memory, threads) for EmbeddingGemma?

Does EmbeddingGemma run fully locally, or could there be hidden bottlenecks (e.g., model loading, single-thread execution)?

Are there best practices for chunk size, batch size, or parallelism when using EmbeddingGemma with Dify?

Is this behavior expected compared to other embedding models?

Any guidance, documentation references, or tuning tips would be greatly appreciated.

Thank you in advance.

Hi @K_F
Thanks for reaching out! To answer your questions:
EmbeddingGemma is designed to run locally and can be slower than API-based embedding services, especially without GPU acceleration. It's a transformer-based model, so every chunk of text is passed through the full network.
EmbeddingGemma runs fully locally within your Docker container. Model loading happens once at startup, but inference can be single-threaded depending on your configuration. It also supports batch processing, which is crucial for throughput.

To diagnose your issue better, could you please let us know:

  1. Are you running on CPU only or do you have GPU available?
  2. What size documents are you indexing (number of chunks/tokens)?
  3. How long does it take to index, say, a 1-page document?
  4. Can you also explain how exactly you are running EmbeddingGemma with Dify?
  5. Are you seeing any error logs or warnings in the Dify/Docker logs?

Some general optimizations you can try:

  1. Ensure your Docker container has adequate CPU and memory allocation.
  2. Optimize chunk size: smaller chunks (256–512 tokens) may process faster than very large ones.
  3. Check whether CPU or memory is maxing out during indexing.
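As a sketch of point 1, resource limits can be set directly in the compose file. The service name and values below are illustrative assumptions, not Dify defaults, so adjust them to your host:

```yaml
# docker-compose.yaml (fragment) — hypothetical values for illustration.
services:
  api:                # the service that runs the embedding/indexing work
    cpus: "4.0"       # allow up to 4 CPU cores
    mem_limit: 8g     # cap memory at 8 GB
```

You can confirm the limits are actually in effect with `docker stats` while indexing runs.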

Thanks

Thank you for your reply!

1. Are you running on CPU only, or do you have a GPU available?

→ Yes, a GPU should be available.

2. What size documents are you indexing (number of chunks/tokens)?

→ A 15 MB file.

3. How long does it take to index, say, a 1-page document?

→ I don't know the exact time, but I would say it takes 3 or 4 hours for a simple text file.

4. Can you also explain how exactly you are running EmbeddingGemma with Dify?

→ I run Dify on Docker, completely offline. I don't know the details.

5. Are you seeing any error logs or warnings in the Dify/Docker logs?

→ Not in the Docker logs. There was an error message in the Dify knowledge UI: "Server Unavailable Error when calling embedding API. Connection refused on port 11434, retries exceeded."

Hi @K_F
Thanks for the detailed answers, that helps a lot.
A "connection refused" error on port 11434 means the embedding service is not reachable: Dify is trying to call Ollama on that port. Please check the embedding service connectivity, and make sure Ollama is running and bound to 0.0.0.0, not localhost, so the Dify containers can reach it.
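For example (the exact layout depends on how you deployed Ollama, so treat this as an assumed setup), running Ollama as its own container bound to all interfaces looks roughly like:

```yaml
# docker-compose.yaml (fragment) — assumed service layout, not from your setup.
services:
  ollama:
    image: ollama/ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434   # listen on all interfaces, not just localhost
    ports:
      - "11434:11434"
```

Also note that inside the Dify container, `localhost` refers to the Dify container itself, so the model provider's base URL in Dify should point at the Ollama container name (or `http://host.docker.internal:11434` if Ollama runs on the host), not `http://localhost:11434`.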

If indexing takes hours for a small text file, EmbeddingGemma is almost certainly running on CPU, not GPU, even if a GPU exists on the machine. Please run `nvidia-smi` while indexing to check GPU utilization.
If you want to use the GPU, you must include GPU access in your docker-compose.yaml file.
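A minimal sketch of granting GPU access in Compose, assuming the NVIDIA Container Toolkit is installed on the host and the embedding model is served from an `ollama` service (adjust the service name to yours):

```yaml
# docker-compose.yaml (fragment) — requires the NVIDIA Container Toolkit.
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or an integer to reserve specific GPUs
              capabilities: [gpu]
```

After restarting the stack, `nvidia-smi` should show the serving process using GPU memory while indexing.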

Bad chunking can also make indexing extremely slow. My recommendation is a chunk size of 256–512 tokens, with batching enabled if possible.
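To get a feel for why a 15 MB file takes so long, here is a rough back-of-the-envelope sketch. The ~4 characters per token figure is only a heuristic for English text, not a measured value from your setup:

```python
# Rough estimate of how many embedding calls a document needs.
# Assumption (heuristic, not from this thread): ~4 characters per token.

def estimate_chunks(file_size_bytes: int, chunk_tokens: int = 512,
                    chars_per_token: int = 4) -> int:
    """Approximate the number of chunks for a plain-text file."""
    approx_tokens = file_size_bytes // chars_per_token
    # Ceiling division: a partial final chunk still costs one embedding call.
    return -(-approx_tokens // chunk_tokens)

# A 15 MB text file at 512-token chunks:
print(estimate_chunks(15 * 1024 * 1024))  # → 7680
```

So that one file is on the order of thousands of embedding calls; if each call runs single-threaded on CPU with no batching, per-call overhead alone explains multi-hour indexing times.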

Please give these steps a try and let me know if the issue persists.