Embeddings

Embeddings are dense vector representations that capture the semantic meaning of text, images, and other data in a continuous high-dimensional space. They are the foundation of semantic search, retrieval-augmented generation, clustering, and classification in AI agent systems. Choosing the right embedding model directly impacts retrieval quality, agent accuracy, and operational costs.

How Embeddings Work

Embedding models transform input data into fixed-size numerical vectors $\mathbf{x} \in \mathbb{R}^d$ where semantically similar items are positioned close together in vector space. The similarity between two embeddings is typically measured using cosine similarity:

$$\text{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{||\mathbf{a}|| \cdot ||\mathbf{b}||}$$

This enables semantic operations to be expressed as vector arithmetic: nearest-neighbor search, clustering, deduplication, and classification all reduce to similarity computations over embeddings.
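
As a concrete illustration, here is the cosine formula above computed directly in NumPy on two toy 4-dimensional vectors (the values are arbitrary and chosen only for the example):

import numpy as np

# Two toy "embeddings" with arbitrary example values
a = np.array([0.2, 0.8, 0.1, 0.5])
b = np.array([0.3, 0.7, 0.0, 0.6])

# Cosine similarity: dot product normalized by the vectors' magnitudes
sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"cosine similarity: {sim:.3f}")  # ~0.98, i.e. nearly parallel vectors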

Text Embedding Models

| Model | Provider | Dimensions | MTEB Score | Best For |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | ~70% | General-purpose agent retrieval |
| text-embedding-3-small | OpenAI | 1536 | ~65% | Cost-effective applications |
| Embed v4 | Cohere | 1024-4096 | ~68% | Multilingual enterprise agents |
| BGE-M3 | BAAI | 768-1024 | ~68% | Open-source RAG, cost-sensitive |
| nomic-embed-text | Nomic | 768 | ~66% | Low-latency, edge deployment |
| Voyage 3 | Voyage AI | 1024 | ~69% | Long-context retrieval |
| jina-embeddings-v3 | Jina AI | 1024 | ~67% | Multilingual, code embeddings |

Example: Embedding Pipeline for Agents

import numpy as np
from openai import OpenAI
 
client = OpenAI()
 
def embed_texts(texts: list[str], model: str = "text-embedding-3-large") -> np.ndarray:
    """Embed a batch of texts using OpenAI API."""
    response = client.embeddings.create(input=texts, model=model)
    return np.array([item.embedding for item in response.data])
 
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
 
# Embed documents and query
documents = [
    "RAG combines retrieval with generation for grounded responses",
    "Fine-tuning adapts model weights on domain-specific data",
    "Prompt engineering designs inputs to guide model behavior"
]
doc_embeddings = embed_texts(documents)
query_embedding = embed_texts(["How do I ground agent responses in facts?"])[0]
 
# Find most relevant document
similarities = [cosine_similarity(query_embedding, emb) for emb in doc_embeddings]
best_match = documents[np.argmax(similarities)]
print(f"Most relevant: {best_match}")  # RAG document

Dimensionality Considerations

The number of dimensions $d$ in an embedding determines the trade-off between semantic precision and computational cost: higher-dimensional vectors capture finer semantic distinctions but cost more to store, transfer, and search, while lower-dimensional vectors are cheaper and faster but may conflate meanings that should stay separate.

Practical guidance: Start with medium dimensions (768-1024). Only scale up if retrieval quality benchmarks show meaningful improvement. Use dimensionality reduction (PCA, Matryoshka embeddings) to test whether lower dimensions maintain acceptable recall.
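
One way to run that test is to request reduced dimensions at embedding time. The sketch below assumes OpenAI's text-embedding-3 models, which accept a dimensions parameter for Matryoshka-style truncation; the query string and the 256-dimension choice are placeholders for your own evaluation data:

from openai import OpenAI

client = OpenAI()

def embed_at_dim(texts: list[str], dim: int) -> list[list[float]]:
    """Embed texts at reduced dimensionality via the dimensions parameter."""
    response = client.embeddings.create(
        input=texts,
        model="text-embedding-3-large",
        dimensions=dim,  # Matryoshka-style truncation
    )
    return [item.embedding for item in response.data]

# Re-run your retrieval benchmark at 256 vs. the full 3072 dimensions
# and keep the smaller size only if recall stays acceptable
reduced = embed_at_dim(["How do I ground agent responses in facts?"], dim=256)
print(len(reduced[0]))  # 256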

Multi-Modal Embeddings

Multi-modal embedding models project different data types (text, images, audio) into a shared vector space $\mathbb{R}^d$, so that semantically related items land near each other regardless of modality.

Cross-modal search enables agents to find relevant images from text queries or match text descriptions to visual content.
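
As a sketch, the snippet below uses the CLIP checkpoint exposed through sentence-transformers; the model name clip-ViT-B-32 and the image path are illustrative choices, and any CLIP-style model with a shared text/image space works the same way:

from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP encodes text and images into the same vector space
model = SentenceTransformer("clip-ViT-B-32")

image_emb = model.encode(Image.open("dog.jpg"))  # hypothetical image file
text_embs = model.encode([
    "a photo of a dog",
    "a diagram of a database schema",
])

# Cosine similarity is meaningful across modalities in the shared space
scores = util.cos_sim(image_emb, text_embs)
print(scores)  # the dog caption should score noticeably higher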

MTEB Benchmark

The Massive Text Embedding Benchmark (MTEB) evaluates embedding models across 56+ tasks spanning retrieval, classification, clustering, reranking, and semantic similarity. Its public leaderboard is the standard starting point for comparing models.
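
Running a model against a benchmark task is straightforward with the mteb package; this is a minimal sketch, where the task name SciFact is just one retrieval task from the suite (the exact API varies somewhat across mteb versions):

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Evaluate an open-source embedding model on a single retrieval task
model = SentenceTransformer("BAAI/bge-m3")
evaluation = MTEB(tasks=["SciFact"])
results = evaluation.run(model, output_folder="mteb_results")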

Embedding Selection Criteria

Model quality is only part of the selection decision; how embeddings will be stored and queried matters too. Embeddings are stored and queried in vector databases using approximate nearest neighbor (ANN) algorithms. The most common distance metrics are:

- Cosine similarity, which compares vector direction and ignores magnitude
- Dot (inner) product, which is equivalent to cosine similarity for unit-normalized vectors
- Euclidean (L2) distance, which compares absolute positions in the space

Key ANN implementations include (a minimal FAISS example follows this list):

- HNSW (hierarchical navigable small world graphs), the default in many vector databases
- IVF (inverted file) indexes, often combined with product quantization (PQ), as in FAISS
- DiskANN and ScaNN for billion-scale or memory-constrained workloads
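
The sketch below builds an HNSW index with FAISS over random stand-in vectors; the dimensionality, neighbor count, and efConstruction values are illustrative starting points, not tuned recommendations:

import faiss
import numpy as np

d = 1024  # embedding dimensionality
xb = np.random.rand(10_000, d).astype("float32")  # stand-in for real embeddings

# HNSW graph index; 32 neighbors per node is a common starting point
index = faiss.IndexHNSWFlat(d, 32)
index.hnsw.efConstruction = 200  # build-time quality/speed trade-off
index.add(xb)

# Approximate 5 nearest neighbors for one query vector (L2 distance by default)
xq = np.random.rand(1, d).astype("float32")
distances, ids = index.search(xq, 5)
print(ids)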
