====== Embedding Models Comparison ======

Embedding models convert text (and sometimes images) into dense numerical vectors that capture semantic meaning, enabling similarity search, retrieval-augmented generation (RAG), and clustering. The choice of embedding model significantly impacts retrieval quality, cost, and latency. ((https://www.stackai.com/insights/best-embedding-models-for-rag-in-2026-a-comparison-guide|StackAI: Best Embedding Models for RAG 2026))

===== OpenAI Embeddings =====

**text-embedding-3-small**:
  * Dimensions: 1,536
  * MTEB retrieval score: ~60-62
  * Pricing: $0.02 per million input tokens
  * Best for: budget-friendly, high-throughput applications

**text-embedding-3-large**:
  * Dimensions: 3,072
  * MTEB retrieval score: ~62+ (top proprietary baseline)
  * Pricing: $0.13 per million input tokens
  * Best for: maximum quality when cost is secondary

Both models are English-focused, with limited multilingual optimization. ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Best Embedding Models 2026))

===== Cohere Embeddings =====

**embed-v3**:
  * Dimensions: 1,024
  * MTEB retrieval score: 60-64
  * Strong multilingual support (100+ languages)
  * Pricing: ~$0.10 per million tokens

**embed-v4**:
  * Enterprise-tuned, with improved performance over v3
  * Matryoshka representation support for dimension reduction
  * Pairs well with Cohere's reranker for two-stage retrieval ((https://www.stackai.com/insights/best-embedding-models-for-rag-in-2026-a-comparison-guide|StackAI: Embedding Models Comparison))

===== BGE Models (BAAI) =====

**BGE-M3** is the leading open-source embedding model:
  * Dimensions: 1,024
  * MTEB retrieval score: 62-64
  * Supports 100+ languages
  * Matryoshka representation learning support
  * Free to self-host
  * Best for: multilingual RAG with full data sovereignty ((https://milvus.io/blog/choose-embedding-model-rag-2026.md|Milvus: Choosing Embedding Models for RAG 2026))

===== E5 Models (Microsoft) =====
E5 models range from small (384 dimensions, 118M parameters) to large (4,096 dimensions):
  * E5-small achieves 100% Top-5 RAG accuracy while being 14x faster than large models
  * MTEB retrieval score: 60-62
  * Free and open-source
  * Best for: efficiency-critical deployments where speed matters ((https://aimultiple.com/open-source-embedding-models|AI Multiple: Open Source Embedding Models))

===== Jina Embeddings =====

**Jina v2**: 768-1,024 dimensions, ~$0.05/M tokens via API, self-hostable

**Jina v3/v4**:
  * Up to 2,048 dimensions (v4)
  * MTEB score: 62+
  * LoRA adapters for domain specialization
  * Matryoshka support (correlation coefficient rho=0.833)
  * Multimodal support (text, image, PDF)
  * 3.8B parameters for self-hosting ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Best Embedding Models 2026))

===== Voyage AI Embeddings =====

**voyage-3-large** and **Voyage Multimodal 3.5**:
  * Dimensions: 1,024-3,072 (compressible to 512 via Matryoshka)
  * MTEB score: 62+ (beats OpenAI at int8-compressed 512 dimensions)
  * 89+ languages supported
  * Matryoshka correlation: rho=0.880 (highest among tested models)
  * Cross-modal retrieval R@1=0.900
  * At compressed dimensions, outperforms full OpenAI vectors by 1.16% at 200x lower storage ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Best Embedding Models 2026))

===== MTEB Benchmark Rankings =====

The Massive Text Embedding Benchmark (MTEB) evaluates embedding models across retrieval, classification, clustering, and other tasks.
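The retrieval task these benchmarks score ultimately reduces to nearest-neighbor search over normalized vectors, and Matryoshka-style compression is mechanically just truncate-and-renormalize. A minimal numpy sketch, with random toy vectors standing in for real model output (real MRL models are trained so truncated rankings stay close to full-dimension rankings; random vectors only illustrate the mechanics):

```python
import numpy as np

def normalize(v):
    # L2-normalize so that a dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy stand-ins for 1,024-dimensional document and query embeddings.
docs = normalize(rng.standard_normal((5, 1024)))
query = normalize(rng.standard_normal(1024))

# Full-dimension retrieval: rank documents by cosine similarity.
full_scores = docs @ query
full_ranking = np.argsort(-full_scores)

# Matryoshka-style truncation: keep the first 512 dimensions, renormalize,
# and rank again over the smaller (cheaper-to-store) vectors.
docs_512 = normalize(docs[:, :512])
query_512 = normalize(query[:512])
trunc_scores = docs_512 @ query_512
trunc_ranking = np.argsort(-trunc_scores)

print("full:", full_ranking, "truncated:", trunc_ranking)
```

With an MRL-trained model, the agreement between the two rankings is what the rho values quoted on this page (e.g. Voyage's 0.880) measure.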
As of 2025-2026: ((https://milvus.io/blog/choose-embedding-model-rag-2026.md|Milvus: Choosing Embedding Models 2026))

  * **Top proprietary**: OpenAI text-embedding-3-large, Voyage-3-large, Gemini Embedding 2
  * **Top open-source**: BGE-M3, Jina v4, llama-embed-nemotron-8b
  * **Multilingual leaders**: Cohere v4, BGE-M3 (cross-lingual R@1 > 0.98)
  * **Efficiency leaders**: E5-small (14x faster, 100% Top-5 accuracy)

No single model dominates all categories; the best choice depends on language requirements, budget, latency constraints, and whether self-hosting is needed.

===== Matryoshka Representations =====

Matryoshka Representation Learning (MRL) trains embeddings so that truncating dimensions preserves most of the original performance, allowing storage and compute to be traded against quality: ((https://blog.premai.io/best-embedding-models-for-rag-2026-ranked-by-mteb-score-cost-and-self-hosting/|PremAI: Embedding Models))

  * Voyage achieves rho=0.880 correlation between full and truncated dimensions
  * Jina v4 achieves rho=0.833
  * Enables 200x storage savings with minimal recall degradation
  * Supported by Voyage, Jina, BGE-M3, and Cohere v4

===== Choosing an Embedding Model =====

^ Use Case ^ Recommended Models ^
| General RAG (English) | OpenAI text-embedding-3-large or -small |
| Multilingual RAG | Cohere embed-v4 or BGE-M3 |
| Cost-sensitive / self-hosting | BGE-M3 or E5-small |
| Storage-constrained | Voyage (MRL at 512 dims) or Jina v4 |
| Multimodal (image/PDF) | Jina v4 or Voyage Multimodal 3.5 |
| Enterprise two-stage | Cohere v4 + Cohere Reranker |

Pairing embeddings with a reranker typically adds a further 5-10% retrieval quality gain. ((https://www.stackai.com/insights/best-embedding-models-for-rag-in-2026-a-comparison-guide|StackAI: Embedding Models Comparison))

===== See Also =====

  * [[semantic_search|Semantic Search]]
  * [[reranking|Reranking]]
  * [[retrieval_strategies|Retrieval Strategies]]

===== References =====