Hi all,
We are seeking investment for a LegalTech RAG project and need a realistic budget estimate for scaling it up.
The Context:
- Target Scale: ~15 million text files (avg. 120k chars/file), ~1.8 TB of raw text in total.
- Requirement: High precision. Must support continuous data updates.
- MVP Status: We achieved successful results on a small scale using gemini-embedding-001 + ChromaDB.
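For anyone who wants to sanity-check the scale, here is a rough back-of-envelope sketch of the sizing arithmetic. Only the file count and average file length come from the numbers above. The chars-per-token ratio, chunk size, embedding dimension, and float width are placeholder assumptions, not measured values from our MVP, so please swap in whatever you think is realistic.

```python
# Back-of-envelope corpus sizing.
# Only NUM_FILES and AVG_CHARS_PER_FILE come from the post; every other
# constant is an assumption and should be replaced with measured values.
NUM_FILES = 15_000_000
AVG_CHARS_PER_FILE = 120_000

CHARS_PER_TOKEN = 4    # assumption: common rule of thumb for English text
CHUNK_TOKENS = 512     # assumption: hypothetical chunk size, no overlap
EMBEDDING_DIM = 3072   # assumption: verify against the configured output dimension
BYTES_PER_FLOAT = 4    # assumption: vectors stored as float32, uncompressed

total_chars = NUM_FILES * AVG_CHARS_PER_FILE
total_tokens = total_chars // CHARS_PER_TOKEN
total_chunks = -(-total_tokens // CHUNK_TOKENS)  # ceiling division
raw_vector_bytes = total_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT

print(f"raw text:      {total_chars / 1e12:.1f} TB (at 1 byte per char)")
print(f"tokens:        {total_tokens / 1e9:.0f} billion")
print(f"chunks:        {total_chunks / 1e6:.0f} million")
print(f"raw vectors:   {raw_vector_bytes / 1e12:.1f} TB (before index overhead)")
```

Under these assumptions the uncompressed vectors come out several times larger than the raw text itself, which is why chunk size, dimension, and quantization matter so much for the storage part of the budget. Index overhead, metadata, and replication are not included.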
Questions:
- Moving from MVP to 15 million docs: What is a realistic OpEx range (Embedding + Storage + Inference) to present to investors?
- Is our MVP stack scalable and cost-efficient at this magnitude?
Thanks!