A vector database is the foundational storage layer in a Retrieval-Augmented Generation architecture. It stores high-dimensional vector embeddings of document chunks and enables efficient semantic similarity searches that retrieve relevant context for LLMs, improving response accuracy and reducing hallucinations.
In the RAG workflow, data is chunked during ingestion, converted to embeddings via transformer models, and stored in the vector database during the indexing phase. At query time, the user question is embedded into the same vector space, and the database retrieves the top-K most similar embeddings. These retrieved chunks then provide context to the LLM for grounded generation.
This architecture addresses critical LLM limitations including outdated training data, context window constraints, and the inability to access proprietary or domain-specific knowledge.
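The end-to-end flow above can be sketched in a few lines. This is a structural illustration only: the `embed` function below is a hypothetical stand-in (a hash-seeded random projection with no semantic meaning), where a real pipeline would call a transformer encoder, and numpy is assumed to be available.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic embedding -- a stand-in for a transformer encoder."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)          # unit-normalize for cosine similarity

# Ingestion/indexing: chunk documents and store their embeddings.
chunks = ["Paris is the capital of France.",
          "The Eiffel Tower is in Paris.",
          "Photosynthesis converts light to chemical energy."]
index = np.stack([embed(c) for c in chunks])

# Query time: embed the question in the same vector space, retrieve top-K.
query = embed("What is the capital of France?")
scores = index @ query                    # cosine similarity (unit vectors)
top_k = np.argsort(scores)[::-1][:2]
context = [chunks[i] for i in top_k]      # passed to the LLM as grounding
```

With a real embedding model, `context` would contain the semantically relevant chunks; with the toy embedding the ranking is arbitrary, but the shape of the pipeline is the same.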
Vector databases store dense vector embeddings – compact numerical arrays (typically 768-1536 dimensions) that capture semantic meaning from text, images, or other data modalities. Unlike traditional databases built around structured rows and exact keyword matches, vector databases are purpose-built for similarity search in high-dimensional spaces.
Each entry in the database typically includes the vector embedding itself, the original text chunk for retrieval and display, and associated metadata (source file, page number, timestamps, categories) for filtering.
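A minimal sketch of such an entry, assuming a hypothetical `VectorRecord` layout (real schemas vary by database):

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    # Hypothetical record layout; field names are illustrative.
    id: str
    embedding: list[float]      # the dense vector (e.g. 768-1536 floats)
    text: str                   # original chunk, returned for display
    metadata: dict = field(default_factory=dict)  # source, page, timestamps...

records = [
    VectorRecord("doc1#0", [0.1, 0.9], "Q3 revenue grew 12%.",
                 {"source": "report.pdf", "page": 4}),
    VectorRecord("doc2#0", [0.8, 0.2], "Onboarding checklist.",
                 {"source": "wiki.md", "page": 1}),
]

# Metadata filtering narrows the candidate set before (or after) the
# similarity search itself.
pdf_only = [r for r in records if r.metadata["source"].endswith(".pdf")]
```

Storing the original text alongside the vector avoids a second lookup at retrieval time, at the cost of duplicating the source content.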
When a query arrives, it is embedded into a vector using the same model that encoded the documents. The vector database then computes distances between the query vector and all stored vectors to find the nearest neighbors.
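The exact (brute-force) version of this search is a single dot product per stored vector. A minimal numpy sketch, using cosine similarity:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Exact nearest-neighbor search by cosine similarity."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                          # one dot product per stored vector
    return np.argsort(sims)[::-1][:k]    # indices of the k best matches

vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx = top_k_cosine(np.array([1.0, 0.1]), vectors, k=2)
# → [0, 2]: the x-axis vector first, then the diagonal
```

This linear scan is what the indexing algorithms in the next section exist to avoid: its cost grows with every stored vector.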
Efficient retrieval at scale requires specialized indexing algorithms that avoid brute-force comparison of every stored vector.
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure in which sparse upper layers allow long-range hops and denser lower layers provide increasingly fine-grained navigation to the nearest neighbors. It offers an excellent balance of speed and recall for dynamic datasets where new vectors are frequently inserted, and it is the most widely supported algorithm across vector databases.
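The core idea can be illustrated with a simplified, single-layer sketch: greedily walk a proximity graph toward the query instead of scanning every vector. Real HNSW stacks several such layers (coarse to fine) and uses a more sophisticated construction; this is only a toy model of the navigation step.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((200, 16))

# Crude proximity graph: each node links to its 8 nearest neighbors.
# (HNSW builds this incrementally; here we cheat with a full distance matrix.)
d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
neighbors = np.argsort(d2, axis=1)[:, 1:9]

def greedy_search(query: np.ndarray, entry: int = 0) -> int:
    """Hop to whichever neighbor is closest to the query; stop at a local minimum."""
    current = entry
    while True:
        cand = neighbors[current]
        dists = ((points[cand] - query) ** 2).sum(-1)
        best = int(cand[np.argmin(dists)])
        if ((points[best] - query) ** 2).sum() >= ((points[current] - query) ** 2).sum():
            return current               # no neighbor is closer: local minimum
        current = best

query = rng.standard_normal(16)
approx = greedy_search(query)
exact = int(np.argmin(((points - query) ** 2).sum(-1)))
```

The greedy walk visits only a small fraction of the nodes; HNSW's extra layers exist precisely to make the entry point good enough that this walk rarely gets stuck far from the true nearest neighbor.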
IVF (Inverted File index) partitions vectors into clusters (Voronoi cells) during index construction. At query time, only the most relevant clusters are searched, dramatically reducing computation. IVF is highly scalable for massive datasets and is often combined with product quantization for memory-efficient search.
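A minimal IVF sketch in numpy, under stated simplifications: a single k-means-style refinement pass stands in for the real centroid training, and `n_probe` (the number of cells searched per query) is the knob that trades recall for speed.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 32))
n_clusters, n_probe = 16, 2

# Build: pick centroids, assign points, refine once (toy training step).
centroids = data[rng.choice(len(data), n_clusters, replace=False)]
assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
for c in range(n_clusters):
    members = data[assign == c]
    if len(members):
        centroids[c] = members.mean(0)
assign = np.argmin(((data[:, None] - centroids[None]) ** 2).sum(-1), axis=1)

def ivf_search(query: np.ndarray, k: int = 5) -> np.ndarray:
    # Probe only the n_probe cells whose centroids are nearest the query,
    # then scan just the vectors assigned to those cells.
    cells = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
    cand = np.where(np.isin(assign, cells))[0]
    order = np.argsort(((data[cand] - query) ** 2).sum(-1))[:k]
    return cand[order]

hits = ivf_search(rng.standard_normal(32))
```

With 16 cells and 2 probes, each query scans roughly an eighth of the data; production systems add product quantization on top to compress the vectors inside each cell.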
| Database | Type | Key Strengths | Deployment | Notable Features |
|---|---|---|---|---|
| Pinecone | Managed cloud | Fully managed, easy scaling | Cloud-only | Serverless, auto-indexing, hybrid search |
| Weaviate | Open-source | GraphQL API, modular architecture | Self-hosted or cloud | HNSW, semantic hybrid search, modules for embeddings |
| Milvus | Open-source | High scalability (billions of vectors) | Self-hosted, Kubernetes | IVF, HNSW, distributed architecture, multi-modal |
| Qdrant | Open-source | Fast ANN, Rust-based performance | Self-hosted or cloud | HNSW, cosine similarity, billions-scale, filtering |
| ChromaDB | Open-source | Lightweight, Python-native | Local, embedded | Simple API, HNSW, ideal for prototyping |
| pgvector | PostgreSQL extension | SQL integration, no separate DB | Self-hosted (PostgreSQL) | HNSW and IVF via extension, ACID compliant |
Pinecone suits production deployments without operational overhead. Open-source options like Milvus and Qdrant offer customization and cost control for self-hosted environments. ChromaDB excels for rapid prototyping, while pgvector integrates vector search into existing PostgreSQL infrastructure.
Production vector databases face several challenges at scale: