ChromaDB is an open-source, AI-native embedding database for building LLM applications with embeddings. It has over 27,000 GitHub stars and provides a simple API for storing, querying, and managing vector embeddings with metadata filtering, which makes it a popular choice for rapid prototyping and for production retrieval-augmented generation (RAG) applications.
| Property | Value |
|---|---|
| Repository | github.com/chroma-core/chroma |
| License | Apache 2.0 |
| Language | Python, Rust |
| Stars | 27K+ |
| Category | Embedding Database |
ChromaDB organizes data into collections and treats vector storage, metadata storage, and embedding generation as separate concerns.
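To make that separation concrete, here is a minimal, standard-library-only sketch. It is not ChromaDB's implementation: `ToyCollection` and `toy_embed` are hypothetical names invented for illustration, and the "embedding" is a deliberately trivial stand-in for a real model. The point is only that vectors, metadata, and documents live in separate stores keyed by the same record id, and that embedding generation is a pluggable step that runs only when no vectors are supplied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


def toy_embed(texts: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for an embedding model: [length, vowel count]."""
    return [
        [float(len(t)), float(sum(c in "aeiou" for c in t.lower()))]
        for t in texts
    ]


@dataclass
class ToyCollection:
    """Illustrative only: three stores that share the same record ids."""

    name: str
    embed: Callable[[Sequence[str]], List[Vector]] = toy_embed
    vectors: Dict[str, Vector] = field(default_factory=dict)    # vector storage
    metadata: Dict[str, dict] = field(default_factory=dict)     # metadata storage
    documents: Dict[str, str] = field(default_factory=dict)     # raw text

    def add(
        self,
        ids: Sequence[str],
        documents: Sequence[str],
        metadatas: Optional[Sequence[dict]] = None,
        embeddings: Optional[Sequence[Vector]] = None,
    ) -> None:
        # Embedding generation is separate from storage: it only runs
        # when the caller has not supplied precomputed vectors.
        if embeddings is None:
            embeddings = self.embed(documents)
        if metadatas is None:
            metadatas = [{} for _ in ids]
        for rid, doc, meta, vec in zip(ids, documents, metadatas, embeddings):
            self.vectors[rid] = list(vec)
            self.metadata[rid] = dict(meta)
            self.documents[rid] = doc


col = ToyCollection(name="documents")
col.add(
    ids=["doc1", "doc2"],
    documents=["Vector databases store embeddings", "Knowledge graphs"],
    metadatas=[{"source": "docs"}, {"source": "paper"}],
)
```

Because the stores are independent, a caller can pass precomputed `embeddings` and skip the embedding step entirely, which mirrors how a collection can be used with or without a configured embedding function.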
ChromaDB supports three deployment modes for different use cases:
| Mode | Description | Use Case |
|---|---|---|
| In-Memory | Fully ephemeral, embedded in app | Prototyping, testing, MVPs |
| Persistent (Embedded) | Disk-based via SQLite (releases before 0.4 used DuckDB+Parquet) | Local apps, development |
| Client-Server | HTTP API, multi-tenant, scalable | Production, distributed systems |
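As a rough sketch of how the three modes map to client constructors in recent `chromadb` releases, the helper below picks one by name. `make_client` and its mode strings are hypothetical names chosen for this example; the default path, host, and port are assumptions, and the client-server mode presumes a Chroma server is already running at that address. The import is deferred so the helper can be defined even where `chromadb` is not installed.

```python
def make_client(
    mode: str,
    path: str = "./chroma_db",   # assumed location for the on-disk store
    host: str = "localhost",     # assumed server address
    port: int = 8000,            # assumed server port
):
    """Return a Chroma client for one of the three deployment modes."""
    if mode not in ("memory", "persistent", "http"):
        raise ValueError(f"unknown mode: {mode!r}")

    import chromadb  # deferred import; requires `pip install chromadb`

    if mode == "memory":
        # Ephemeral: data disappears when the process exits.
        return chromadb.EphemeralClient()
    if mode == "persistent":
        # Embedded, disk-backed: data is saved under `path`.
        return chromadb.PersistentClient(path=path)
    # Client-server: talks to a separately running Chroma server over HTTP.
    return chromadb.HttpClient(host=host, port=port)
```

All three clients expose the same collection API, so code written against the in-memory client during prototyping should carry over to the other modes by changing only this construction step.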
ChromaDB combines vector similarity with exact metadata filters using where clauses:
- Exact match: `{"source": "wiki"}`
- Comparison: `{"score": {"$gt": 0.8}}`
- Set membership: `{"topic": {"$in": ["ai", "ml"]}}`

```python
import chromadb
from chromadb.utils import embedding_functions

# Initialize persistent client
client = chromadb.PersistentClient(path="./chroma_db")

# Configure embedding function
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-api-key",
    model_name="text-embedding-3-small"
)

# Create or get collection
collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"}
)

# Add documents with metadata
collection.add(
    documents=[
        "RAG combines retrieval with generation for grounded answers",
        "Vector databases store high-dimensional embeddings",
        "Knowledge graphs capture entity relationships"
    ],
    metadatas=[
        {"source": "tutorial", "topic": "rag"},
        {"source": "docs", "topic": "database"},
        {"source": "paper", "topic": "knowledge_graph"}
    ],
    ids=["doc1", "doc2", "doc3"]
)

# Query with metadata filter
results = collection.query(
    query_texts=["How does retrieval augmented generation work?"],
    n_results=3,
    where={"topic": {"$in": ["rag", "database"]}},
    include=["documents", "metadatas", "distances"]
)

for doc, meta, dist in zip(
    results["documents"][0],
    results["metadatas"][0],
    results["distances"][0]
):
    print(f"[{dist:.4f}] ({meta['source']}) {doc[:60]}...")
```
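To show what combining an exact metadata filter with vector similarity means, the following toy uses only the standard library. It is a sketch of the semantics, not ChromaDB's query engine: `matches`, `cosine_distance`, and `filtered_query` are hypothetical helpers, the vectors are made up, and the handling of missing keys (a record without the key never matches) is a simplification assumed for this example.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

# Comparison operators as they appear in where clauses.
_OPS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$gt": lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$lt": lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
}


def matches(meta: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """True if every condition in `where` holds for this metadata dict."""
    for key, cond in where.items():
        if key not in meta:
            return False  # simplification: missing keys never match
        value = meta[key]
        if isinstance(cond, dict):
            # Operator form, e.g. {"$gt": 0.8}
            if not all(_OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # Shorthand form, e.g. {"source": "wiki"} means equality
            return False
    return True


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


Record = Tuple[str, List[float], Dict[str, Any]]


def filtered_query(
    records: Sequence[Record],
    query_vec: Sequence[float],
    where: Dict[str, Any],
    n_results: int,
) -> List[Tuple[str, float]]:
    """Keep records passing the filter, then rank them by cosine distance."""
    hits = [
        (rid, cosine_distance(vec, query_vec))
        for rid, vec, meta in records
        if matches(meta, where)
    ]
    hits.sort(key=lambda pair: pair[1])
    return hits[:n_results]


records: List[Record] = [
    ("doc1", [1.0, 0.0], {"topic": "rag", "score": 0.9}),
    ("doc2", [0.0, 1.0], {"topic": "database", "score": 0.5}),
    ("doc3", [1.0, 1.0], {"topic": "knowledge_graph", "score": 0.95}),
]

by_topic = filtered_query(records, [1.0, 0.1], {"topic": {"$in": ["rag", "database"]}}, 2)
by_score = filtered_query(records, [1.0, 0.1], {"score": {"$gt": 0.8}}, 2)
print([rid for rid, _ in by_topic])
print([rid for rid, _ in by_score])
```

The filter is exact: `doc3` is excluded from the first query regardless of how close its vector is, and similarity only orders the records that pass.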