====== Embedding Models Comparison ======

Embedding models convert text (and sometimes images) into dense numerical vectors that capture semantic meaning, enabling similarity search, retrieval-augmented generation (RAG), and clustering. The choice of embedding model significantly impacts retrieval quality, cost, and latency. ((https://www.stackai.com/insights/best-embedding-models-for-rag-in-2026-a-comparison-guide|StackAI: Best Embedding Models for RAG 2026))

===== OpenAI Embeddings =====

**text-embedding-3-small**:
  * Dimensions: 1,536
  * MTEB retrieval score: ~60-62
  * Pricing: $0.02 per million input tokens
  * Best for: budget-friendly, high-throughput applications

**text-embedding-3-large**:
  * Dimensions: 3,072
  * MTEB retrieval score: ~62+ (top proprietary baseline)
  * Pricing: $0.13 per million input tokens
  * Best for: maximum quality when cost is secondary

Both models are English-focused, with limited multilingual optimization. ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Best Embedding Models 2026))

===== Cohere Embeddings =====

**embed-v3**:
  * Dimensions: 1,024
  * MTEB retrieval score: 60-64
  * Strong multilingual support (100+ languages)
  * Pricing: ~$0.10 per million tokens

**embed-v4**:
  * Enterprise-tuned, with improved performance over v3
  * Matryoshka representation support for dimension reduction
  * Pairs well with Cohere's reranker for two-stage retrieval ((https://www.stackai.com/insights/best-embedding-models-for-rag-in-2026-a-comparison-guide|StackAI: Embedding Models Comparison))

===== BGE Models (BAAI) =====

**BGE-M3** is the leading open-source embedding model:
  * Dimensions: 1,024
  * MTEB retrieval score: 62-64
  * Supports 100+ languages
  * Matryoshka representation learning support
  * Free to self-host
  * Best for: multilingual RAG with full data sovereignty ((https://milvus.io/blog/choose-embedding-model-rag-2026.md|Milvus: Choosing Embedding Models for RAG 2026))

===== E5 Models (Microsoft) =====
E5 models range from small (384 dimensions, 118M parameters) to large (4,096 dimensions):
  * E5-small achieves 100% Top-5 RAG accuracy while being 14x faster than large models
  * MTEB retrieval score: 60-62
  * Free and open-source
  * Best for: efficiency-critical deployments where speed matters ((https://aimultiple.com/open-source-embedding-models|AI Multiple: Open Source Embedding Models))

===== Jina Embeddings =====

**Jina v2**: 768-1,024 dimensions, ~$0.05/M tokens via API, self-hostable

**Jina v3/v4**:
  * Up to 2,048 dimensions (v4)
  * MTEB score: 62+
  * LoRA adapters for domain specialization
  * Matryoshka support (correlation coefficient rho=0.833)
  * Multimodal support (text, image, PDF)
  * 3.8B parameters for self-hosting ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Best Embedding Models 2026))

===== Voyage AI Embeddings =====

**voyage-3-large** and **Voyage Multimodal 3.5**:
  * Dimensions: 1,024-3,072 (compressible to 512 via Matryoshka)
  * MTEB score: 62+ (beats OpenAI at int8-compressed 512 dimensions)
  * 89+ languages supported
  * Matryoshka correlation: rho=0.880 (highest among tested models)
  * Cross-modal retrieval R@1=0.900
  * At compressed dimensions, outperforms full OpenAI vectors by 1.16% at 200x lower storage ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Best Embedding Models 2026))

===== MTEB Benchmark Rankings =====

The Massive Text Embedding Benchmark (MTEB) evaluates embedding models across retrieval, classification, clustering, and other tasks.
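The retrieval task these benchmarks score ultimately reduces to nearest-neighbor search over normalized vectors, and Matryoshka-style compression is mechanically just truncate-and-renormalize. A minimal numpy sketch, with random toy vectors standing in for real model output (real MRL models are trained so truncated rankings stay close to full-dimension rankings; random vectors only illustrate the mechanics):

```python
import numpy as np

def normalize(v):
    # L2-normalize so that a dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy stand-ins for 1,024-dimensional document and query embeddings.
docs = normalize(rng.standard_normal((5, 1024)))
query = normalize(rng.standard_normal(1024))

# Full-dimension retrieval: rank documents by cosine similarity.
full_scores = docs @ query
full_ranking = np.argsort(-full_scores)

# Matryoshka-style truncation: keep the first 512 dimensions, renormalize,
# and rank again over the smaller (cheaper-to-store) vectors.
docs_512 = normalize(docs[:, :512])
query_512 = normalize(query[:512])
trunc_scores = docs_512 @ query_512
trunc_ranking = np.argsort(-trunc_scores)

print("full:", full_ranking, "truncated:", trunc_ranking)
```

With an MRL-trained model, the agreement between the two rankings is what the rho values quoted on this page (e.g. Voyage's 0.880) measure.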
As of 2025-2026: ((https://milvus.io/blog/choose-embedding-model-rag-2026.md|Milvus: Choosing Embedding Models 2026))

  * **Top proprietary**: OpenAI text-embedding-3-large, Voyage-3-large, Gemini Embedding 2
  * **Top open-source**: BGE-M3, Jina v4, llama-embed-nemotron-8b
  * **Multilingual leaders**: Cohere v4, BGE-M3 (cross-lingual R@1 > 0.98)
  * **Efficiency leaders**: E5-small (14x faster, 100% Top-5 accuracy)

No single model dominates all categories; the best choice depends on language requirements, budget, latency constraints, and whether self-hosting is needed.

===== Matryoshka Representations =====

Matryoshka Representation Learning (MRL) trains embeddings so that truncating dimensions preserves most of the original performance, allowing storage and compute to be traded against quality: ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Embedding Models))

  * Voyage achieves rho=0.880 correlation between full and truncated dimensions
  * Jina v4 achieves rho=0.833
  * Enables 200x storage savings with minimal recall degradation
  * Supported by Voyage, Jina, BGE-M3, and Cohere v4

===== Choosing an Embedding Model =====

^ Use Case ^ Recommended Models ^
| General RAG (English) | OpenAI text-embedding-3-large or -small |
| Multilingual RAG | Cohere embed-v4 or BGE-M3 |
| Cost-sensitive / self-hosting | BGE-M3 or E5-small |
| Storage-constrained | Voyage (MRL at 512 dims) or Jina v4 |
| Multimodal (image/PDF) | Jina v4 or Voyage Multimodal 3.5 |
| Enterprise two-stage | Cohere v4 + Cohere Reranker |

Pairing embeddings with a reranker typically adds a further 5-10% retrieval quality gain. ((https://www.stackai.com/insights/best-embedding-models-for-rag-in-2026-a-comparison-guide|StackAI: Embedding Models Comparison))

===== See Also =====

  * [[semantic_search|Semantic Search]]
  * [[reranking|Reranking]]
  * [[retrieval_strategies|Retrieval Strategies]]

===== References =====