EmbeddingGemma indexing is extremely slow when used for knowledge indexing in Dify (Docker setup)

Hello,

I am using EmbeddingGemma for knowledge indexing in Dify, and the indexing process is taking an extremely long time.
I would appreciate any advice on possible causes and how to improve performance.

Here is my setup and situation:

Dify is deployed using Docker (self-hosted)

Embedding model: EmbeddingGemma

Use case: Knowledge indexing (document ingestion / vectorization)

The system works functionally, but indexing speed is much slower than expected

Even relatively small documents take a long time to complete indexing

Things I am wondering about:

Is EmbeddingGemma known to be slower for batch or large-scale embedding tasks?

Are there recommended Docker resource settings (CPU, memory, threads) for EmbeddingGemma?

Does EmbeddingGemma run fully locally, or could there be hidden bottlenecks (e.g., model loading, single-thread execution)?

Are there best practices for chunk size, batch size, or parallelism when using EmbeddingGemma with Dify?

Is this behavior expected compared to other embedding models?

Any guidance, documentation references, or tuning tips would be greatly appreciated.

Thank you in advance.

Hi @K_F
Thanks for reaching out! To answer your questions:
EmbeddingGemma is designed to run locally and can be slower than API-based embedding services, especially without GPU acceleration. It's a transformer-based model, so every chunk of text is passed through the full network.
EmbeddingGemma runs fully locally within your Docker container. Model loading happens once at startup, but inference can be single-threaded depending on your configuration. It also supports batch processing, which is crucial for throughput.

To diagnose your issue better, could you please let us know:

  1. Are you running on CPU only or do you have GPU available?
  2. What size documents are you indexing (number of chunks/tokens)?
  3. How long does it take to index, say, a 1-page document?
  4. Can you also explain how exactly you are running EmbeddingGemma with Dify?
  5. Are you seeing any error logs or warnings in the Dify/Docker logs?

Some general optimizations you can try:

  1. Ensure your Docker container has adequate CPU and memory allocation.
  2. Optimize chunk size: smaller chunks (256–512 tokens) may process faster than very large ones.
  3. Check whether CPU or memory is maxing out during indexing.
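As a sketch of point 1, resource limits can be set directly in the compose file. The service name and values below are illustrative assumptions, not Dify defaults, so adjust them to your host:

```yaml
# docker-compose.yaml (fragment) — hypothetical values for illustration.
services:
  api:                # the service that runs the embedding/indexing work
    cpus: "4.0"       # allow up to 4 CPU cores
    mem_limit: 8g     # cap memory at 8 GB
```

You can confirm the limits are actually in effect with `docker stats` while indexing runs.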

Thanks

Thank you for your reply!

1. Are you running on CPU only, or do you have a GPU available?

→ Yes, a GPU should be available.

2. What size documents are you indexing (number of chunks/tokens)?

→ A 15 MB file.

3. How long does it take to index, say, a 1-page document?

→ I don't know the exact time, but I would say it takes 3 or 4 hours for a simple text file.

4. Can you also explain how exactly you are running EmbeddingGemma with Dify?

→ I run Dify on Docker, completely offline. I don't know the details.

5. Are you seeing any error logs or warnings in the Dify/Docker logs?

→ Not in the Docker logs. There was an error message in the Dify knowledge UI: "Server Unavailable Error when calling embedding API. Connection refused on port 11434, retries exceeded."

Hi @K_F
Thanks for the detailed answers, that helps a lot.
A "connection refused" error on port 11434 means the embedding service is not reachable: Dify is trying to call Ollama on that port. Please check the embedding service connectivity, and make sure Ollama is running and bound to 0.0.0.0, not localhost, so the Dify containers can reach it.
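For example (the exact layout depends on how you deployed Ollama, so treat this as an assumed setup), running Ollama as its own container bound to all interfaces looks roughly like:

```yaml
# docker-compose.yaml (fragment) — assumed service layout, not from your setup.
services:
  ollama:
    image: ollama/ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434   # listen on all interfaces, not just localhost
    ports:
      - "11434:11434"
```

Also note that inside the Dify container, `localhost` refers to the Dify container itself, so the model provider's base URL in Dify should point at the Ollama container name (or `http://host.docker.internal:11434` if Ollama runs on the host), not `http://localhost:11434`.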

If indexing takes hours for a small text file, EmbeddingGemma is almost certainly running on CPU, not GPU, even if a GPU exists on the machine. Please run `nvidia-smi` while indexing to check GPU utilization.
If you want to use the GPU, you must include GPU access in your docker-compose.yaml file.
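A minimal sketch of granting GPU access in Compose, assuming the NVIDIA Container Toolkit is installed on the host and the embedding model is served from an `ollama` service (adjust the service name to yours):

```yaml
# docker-compose.yaml (fragment) — requires the NVIDIA Container Toolkit.
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or an integer to reserve specific GPUs
              capabilities: [gpu]
```

After restarting the stack, `nvidia-smi` should show the serving process using GPU memory while indexing.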

Bad chunking can also make indexing extremely slow. My recommendation is a chunk size of 256–512 tokens, with batching enabled if possible.
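To get a feel for why a 15 MB file takes so long, here is a rough back-of-the-envelope sketch. The ~4 characters per token figure is only a heuristic for English text, not a measured value from your setup:

```python
# Rough estimate of how many embedding calls a document needs.
# Assumption (heuristic, not from this thread): ~4 characters per token.

def estimate_chunks(file_size_bytes: int, chunk_tokens: int = 512,
                    chars_per_token: int = 4) -> int:
    """Approximate the number of chunks for a plain-text file."""
    approx_tokens = file_size_bytes // chars_per_token
    # Ceiling division: a partial final chunk still costs one embedding call.
    return -(-approx_tokens // chunk_tokens)

# A 15 MB text file at 512-token chunks:
print(estimate_chunks(15 * 1024 * 1024))  # → 7680
```

So that one file is on the order of thousands of embedding calls; if each call runs single-threaded on CPU with no batching, per-call overhead alone explains multi-hour indexing times.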

Please give these steps a try and let me know if the issue persists.