Core Concepts
Reasoning Techniques
Memory Systems
Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools & Products
Safety & Governance
Evaluation
Research
Development
Meta
Embeddings are dense vector representations that capture semantic meaning of text, images, and other data in continuous high-dimensional space. They are the foundation of semantic search, retrieval-augmented generation, clustering, and classification in AI agent systems. Choosing the right embedding model directly impacts retrieval quality, agent accuracy, and operational costs.
Embedding models transform input data into fixed-size numerical vectors $\mathbf{x} \in \mathbb{R}^d$ where semantically similar items are positioned close together in vector space. The similarity between two embeddings is typically measured using cosine similarity:
$$\text{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{||\mathbf{a}|| \cdot ||\mathbf{b}||}$$
This enables semantic search, clustering, and nearest-neighbor retrieval over embedded content. Commonly used embedding models include:
| Model | Provider | Dimensions | MTEB Score | Best For |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | ~70% | General-purpose agent retrieval |
| text-embedding-3-small | OpenAI | 1536 | ~65% | Cost-effective applications |
| Embed v4 | Cohere | 1024-4096 | ~68% | Multilingual enterprise agents |
| BGE-M3 | BAAI | 768-1024 | ~68% | Open-source RAG, cost-sensitive |
| nomic-embed-text | Nomic | 768 | ~66% | Low-latency, edge deployment |
| Voyage 3 | Voyage AI | 1024 | ~69% | Long-context retrieval |
| jina-embeddings-v3 | Jina AI | 1024 | ~67% | Multilingual, code embeddings |
```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed_texts(texts: list[str], model: str = "text-embedding-3-large") -> np.ndarray:
    """Embed a batch of texts using the OpenAI API."""
    response = client.embeddings.create(input=texts, model=model)
    return np.array([item.embedding for item in response.data])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Embed documents and query
documents = [
    "RAG combines retrieval with generation for grounded responses",
    "Fine-tuning adapts model weights on domain-specific data",
    "Prompt engineering designs inputs to guide model behavior",
]
doc_embeddings = embed_texts(documents)
query_embedding = embed_texts(["How do I ground agent responses in facts?"])[0]

# Find the most relevant document
similarities = [cosine_similarity(query_embedding, doc) for doc in doc_embeddings]
best_match = documents[np.argmax(similarities)]
print(f"Most relevant: {best_match}")  # RAG document
```
The number of dimensions $d$ in an embedding governs the trade-off between semantic precision and computational cost: higher-dimensional vectors capture finer distinctions but cost more to store, index, and compare.
Practical guidance: Start with medium dimensions (768-1024). Only scale up if retrieval quality benchmarks show meaningful improvement. Use dimensionality reduction (PCA, Matryoshka embeddings) to test whether lower dimensions maintain acceptable recall.
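The truncation test in the guidance above can be sketched in plain NumPy. Matryoshka-style embeddings (e.g. OpenAI's text-embedding-3 family) are trained so that a prefix of the vector is itself a usable embedding, so dimensionality reduction can be as simple as slicing and renormalizing. This sketch uses random vectors as stand-ins for real document embeddings; an actual evaluation would compare recall on your retrieval benchmark.

```python
import numpy as np

def truncate_and_renormalize(embs: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions and re-normalize to unit length
    (valid for Matryoshka-style embeddings)."""
    truncated = embs[:, :k]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Hypothetical stand-ins for real document embeddings
rng = np.random.default_rng(0)
full = rng.normal(size=(100, 1024))
full /= np.linalg.norm(full, axis=1, keepdims=True)

reduced = truncate_and_renormalize(full, 256)
print(reduced.shape)  # (100, 256)
# Each reduced vector is unit length, so dot product equals cosine similarity
print(np.allclose(np.linalg.norm(reduced, axis=1), 1.0))  # True
```

Because the reduced vectors are re-normalized, downstream cosine-similarity code needs no changes; only index storage and comparison cost shrink.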
Multi-modal embedding models project different data types (text, images, audio) into a shared vector space $\mathbb{R}^d$, enabling cross-modal search.
Cross-modal search enables agents to find relevant images from text queries or match text descriptions to visual content.
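Assuming image and text embeddings already live in the same space (as produced by a multi-modal encoder such as CLIP), cross-modal retrieval reduces to ordinary nearest-neighbor search over the shared vectors. A minimal sketch, with synthetic unit vectors standing in for real model outputs:

```python
import numpy as np

rng = np.random.default_rng(42)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Synthetic stand-ins for image embeddings from a multi-modal encoder
image_embeddings = normalize(rng.normal(size=(5, 512)))
image_labels = ["cat", "sunset", "diagram", "receipt", "city skyline"]

# A text query embedded into the same space; here we fake one that is
# semantically close to the "sunset" image by perturbing its vector
text_query = normalize(image_embeddings[1] + 0.05 * rng.normal(size=512))

# On unit vectors, the dot product equals cosine similarity
scores = image_embeddings @ text_query
print(image_labels[int(np.argmax(scores))])  # sunset
```

In a real system the `image_embeddings` matrix would come from encoding an image collection once at ingest time, while text queries are encoded on the fly.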
The Massive Text Embedding Benchmark (MTEB) evaluates embedding models across 56+ tasks spanning retrieval, classification, clustering, reranking, and semantic similarity. Key findings for 2026:
Embeddings are stored and queried in vector databases using approximate nearest neighbor (ANN) algorithms. The most common distance metrics are:
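Three metrics commonly exposed by vector databases are cosine similarity, Euclidean (L2) distance, and inner (dot) product; which of these a given database supports varies, so treat this list as an assumption rather than a catalog. Each can be computed directly:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Angle-based similarity, invariant to vector magnitude
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_dist(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line (L2) distance between the vectors
    return float(np.linalg.norm(a - b))

def dot_product(a: np.ndarray, b: np.ndarray) -> float:
    # Magnitude-sensitive similarity (inner product)
    return float(np.dot(a, b))

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

print(cosine_sim(a, b))      # 0.5
print(euclidean_dist(a, b))  # ~1.414
print(dot_product(a, b))     # 1.0
```

For unit-normalized embeddings the three metrics rank neighbors identically (cosine equals the dot product, and squared Euclidean distance is $2 - 2\,\text{sim}$), so the choice mainly matters when vectors are not normalized.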
Key ANN implementations include: