ChromaDB

ChromaDB is an open-source, AI-native embedding database designed to make it easy to build LLM applications with embeddings. With over 27,000 GitHub stars, it provides a simple API for storing, querying, and managing vector embeddings with metadata filtering, making it the go-to choice for rapid prototyping and production RAG applications.

Repository	github.com/chroma-core/chroma
License	Apache 2.0
Language	Python, Rust
Stars	27K+
Category	Embedding Database

Key Features

Simple API – NumPy-like simplicity for adding, querying, and managing embeddings
AI-Native Design – Purpose-built for embeddings and RAG workflows
Pluggable Embedding Functions – Built-in support for OpenAI, Sentence Transformers, Cohere, Gemini, Jina AI, Ollama, and custom functions
Hybrid Search – Vector similarity combined with metadata filtering, regex, and full-text search
Multiple Deployment Modes – In-memory, persistent (embedded), and client-server
Framework Integration – Native support for LangChain, LlamaIndex, and other RAG frameworks
Metadata Arrays – Support for string, number, and boolean arrays in metadata filtering (added 2026)

Architecture

ChromaDB uses a collection-based structure separating vector storage, metadata storage, and embedding generation:

Collection Layer – Logical groupings of embeddings, documents, IDs, and metadata
Embedding Functions – Modular, pluggable components that generate vectors from text or images
Vector Store – HNSW indexing for approximate nearest neighbor search with configurable distance metrics
Metadata Store – Separate storage for structured metadata enabling efficient filtering
Storage Backends – DuckDB+Parquet (default persistent), SQLite, or PostgreSQL

graph TB subgraph Client["Client Layer"] PyClient[Python Client] JSClient[JavaScript Client] HTTP[HTTP Client] end subgraph Core["ChromaDB Core"] Collections[Collection Manager] EF[Embedding Functions] QE[Query Engine] end subgraph Search["Search Layer"] HNSW[HNSW Vector Index] MetaFilter[Metadata Filter] FTS[Full-Text Search] Hybrid[Hybrid Ranker] end subgraph Storage["Storage Backends"] Memory[In-Memory] DuckDB[DuckDB + Parquet] SQLite[SQLite] PG[PostgreSQL] end subgraph Embeddings["Embedding Providers"] OpenAI[OpenAI] ST[Sentence Transformers] Cohere[Cohere] Custom[Custom Functions] end Client --> Core EF --> Embeddings Core --> Search Search --> Storage

Deployment Modes

ChromaDB supports three deployment modes for different use cases:

Mode	Description	Use Case
In-Memory	Fully ephemeral, embedded in app	Prototyping, testing, MVPs
Persistent (Embedded)	Disk-based via DuckDB+Parquet	Local apps, development
Client-Server	HTTP API, multi-tenant, scalable	Production, distributed systems

Metadata Filtering

ChromaDB combines vector similarity with exact metadata filters using where clauses:

Equality matching: { “source”: “wiki” }
Range queries: { “score”: { “$gt”: 0.8 } }
Set membership: { “topic”: { “$in”: [“ai”, “ml”] } }
Array metadata (2026): Complex multi-value filters on array fields

Code Example

import chromadb
from chromadb.utils import embedding_functions
 
# Initialize persistent client
client = chromadb.PersistentClient(path="./chroma_db")
 
# Configure embedding function
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)
 
# Create or get collection
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"}
)
 
# Add documents with metadata
collection.add(
    documents=[
        "RAG combines retrieval with generation for grounded answers",
        "Vector databases store high-dimensional embeddings",
        "Knowledge graphs capture entity relationships"
    ],
    metadatas=[
        {"source": "tutorial", "topic": "rag"},
        {"source": "docs", "topic": "database"},
        {"source": "paper", "topic": "knowledge_graph"}
    ],
    ids=["doc1", "doc2", "doc3"]
)
 
# Query with metadata filter
results = collection.query(
    query_texts=["How does retrieval augmented generation work?"],
    n_results=3,
    where={"topic": {"$in": ["rag", "database"]}},
    include=["documents", "metadatas", "distances"]
)
 
for doc, meta, dist in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
):
    print(f"[{dist:.4f}] ({meta['source']}) {doc[:60]}...")

AI Agent Knowledge Base

Sidebar

Table of Contents

ChromaDB

Key Features

Architecture

Deployment Modes

Metadata Filtering

Code Example

References

See Also

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

ChromaDB

Key Features

Architecture

Deployment Modes

Metadata Filtering

Code Example

References

See Also

Page Tools