====== Role of a Vector Database in AI RAG Architecture ======

A vector database is the foundational storage layer in a Retrieval-Augmented Generation architecture. It stores high-dimensional vector embeddings of document chunks and enables efficient semantic similarity searches that retrieve relevant context for LLMs, improving response accuracy and reducing hallucinations. ((source [[https://writer.com/engineering/rag-vector-database/|Writer - RAG Vector Database]]))

===== How Vector Databases Fit Into RAG =====

In the RAG workflow, data is chunked during ingestion, converted to embeddings via transformer models, and stored in the vector database during the indexing phase. At query time, the user question is embedded into the same vector space, and the database retrieves the top-K most similar embeddings. These retrieved chunks then provide context to the LLM for grounded generation. ((source [[https://qdrant.tech/articles/what-is-rag-in-ai/|Qdrant - What is RAG in AI]]))

This architecture addresses critical LLM limitations, including outdated training data, context window constraints, and the inability to access proprietary or domain-specific knowledge. ((source [[https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/|Cloudflare - What is a Vector Database]]))

===== How Embeddings Are Stored =====

Vector databases store **dense vector embeddings** -- compact numerical arrays (typically 768-1536 dimensions) that capture semantic meaning from text, images, or other data modalities. ((source [[https://medium.com/@iamanraghuvanshi/vector-embeddings-and-vector-databases-0cd0e2a8d95b|Raghuvanshi - Vector Embeddings and Vector Databases]])) Unlike traditional databases built around structured rows and exact keyword matches, vector databases are purpose-built for **similarity search in high-dimensional spaces**.
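The ingest-and-query flow described above can be sketched end to end in a few lines of Python. The embed function below is a deliberately crude stand-in (character-trigram hashing) for a real transformer embedding model, and the corpus is invented for illustration; only the shape of the pipeline -- embed and store chunks, embed the query into the same space, rank by cosine similarity -- mirrors a real RAG system.

```python
import math
import zlib

DIM = 64  # real embedding models emit 768-1536 dims; 64 keeps the toy small

def embed(text: str) -> list[float]:
    """Toy 'embedding': hash character trigrams into a fixed-size vector.
    A real RAG pipeline calls a transformer embedding model here; this
    stand-in exists only to make the sketch self-contained."""
    vec = [0.0] * DIM
    t = text.lower()
    for i in range(len(t) - 2):
        vec[zlib.crc32(t[i:i + 3].encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalize

def cosine(a: list[float], b: list[float]) -> float:
    # For unit vectors, cosine similarity reduces to the dot product.
    return sum(x * y for x, y in zip(a, b))

# Indexing phase: chunk -> embed -> store (embedding + original text).
chunks = [
    "Vector databases store dense embeddings for similarity search.",
    "PostgreSQL is a relational database queried with SQL.",
    "RAG retrieves relevant chunks to ground LLM answers.",
]
index = [(embed(c), c) for c in chunks]

# Query phase: embed the question into the same space, return top-K chunks.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve("How do vector databases support similarity search?", k=1))
```

In a production system, the stored entries would also carry metadata for filtering, and the ranking would run against an ANN index rather than this linear scan.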
((source [[https://medium.com/@yaroslavzhbankov/vector-databases-searching-by-meaning-the-essential-engine-of-the-llm-era-1982794e7542|Zhbankov - Vector Databases: Searching by Meaning]]))

Each entry in the database typically includes the vector embedding itself, the original text chunk for retrieval and display, and associated metadata (source file, page number, timestamps, categories) for filtering. ((source [[https://qdrant.tech/articles/what-is-rag-in-ai/|Qdrant - What is RAG in AI]]))

===== Similarity Search =====

When a query arrives, it is embedded into a vector using the same model that encoded the documents. The vector database then computes distances between the query vector and the stored vectors to find the nearest neighbors. ((source [[https://redis.io/blog/vector-database-use-cases/|Redis - Vector Database Use Cases]]))

==== Distance Metrics ====

  * **Cosine similarity**: Measures the cosine of the angle between vectors, focusing on directional alignment while ignoring magnitude. Most commonly used for text embeddings.
  * **Euclidean distance**: Measures straight-line distance between vector endpoints. Useful when magnitude matters.
  * **Dot product**: Combines magnitude and direction. Fast to compute and effective when vectors are normalized. ((source [[https://redis.io/blog/vector-database-use-cases/|Redis - Vector Database Use Cases]]))

===== Indexing Algorithms =====

Efficient retrieval at scale requires specialized indexing algorithms that avoid brute-force comparison of every stored vector. ((source [[https://medium.com/@tararoutray/the-architecture-behind-vector-databases-in-modern-ai-systems-17a6c8a19095|Routray - Vector Database Architecture]]))

==== HNSW (Hierarchical Navigable Small World) ====

HNSW builds a multi-layer graph structure where each layer provides increasingly fine-grained navigation to nearest neighbors. It offers an excellent balance of speed and recall for dynamic datasets where new vectors are frequently inserted.
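HNSW's core primitive -- greedily walking a proximity graph toward the query -- can be illustrated with a single-layer sketch. This is not a faithful HNSW implementation: there is no layer hierarchy, and the graph is built by brute force rather than by HNSW's insertion heuristic. The dataset and parameters are invented for illustration.

```python
import random

random.seed(7)

def dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance (monotone in the true distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy dataset plus a brute-force k-NN proximity graph. Real HNSW builds its
# layered graph incrementally with insertion heuristics; this naive graph
# only illustrates the greedy navigation HNSW performs within each layer.
points = [[random.random() for _ in range(8)] for _ in range(200)]
K = 8  # out-degree of each node, loosely analogous to HNSW's M parameter
neighbors = [
    sorted(range(len(points)), key=lambda j: dist(points[i], points[j]))[1:K + 1]
    for i in range(len(points))
]

def greedy_search(query: list[float], entry: int = 0) -> int:
    """Hop to whichever neighbor is closest to the query; stop when no
    neighbor improves on the current node (a local minimum of distance)."""
    current = entry
    while True:
        best = min(neighbors[current], key=lambda j: dist(query, points[j]))
        if dist(query, points[best]) >= dist(query, points[current]):
            return current
        current = best

query = [0.5] * 8
found = greedy_search(query)
exact = min(range(len(points)), key=lambda i: dist(query, points[i]))
# Compare the greedy result with the exact nearest neighbor.
print(dist(query, points[found]), dist(query, points[exact]))
```

The greedy walk can stall in a local minimum; HNSW mitigates this with its upper layers (coarse long-range hops) and by keeping a candidate beam rather than a single current node.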
HNSW is the most widely supported algorithm across vector databases. ((source [[https://qdrant.tech/articles/what-is-rag-in-ai/|Qdrant - What is RAG in AI]]))

==== IVF (Inverted File Index) ====

IVF partitions vectors into clusters (Voronoi cells) during index construction. At query time, only the most relevant clusters are searched, dramatically reducing computation. IVF is highly scalable for massive datasets and is often combined with product quantization for memory-efficient search. ((source [[https://qdrant.tech/articles/what-is-rag-in-ai/|Qdrant - What is RAG in AI]]))

===== Comparison of Popular Vector Databases =====

^ Database ^ Type ^ Key Strengths ^ Deployment ^ Notable Features ^
| **Pinecone** | Managed cloud | Fully managed, easy scaling | Cloud-only | Serverless, auto-indexing, hybrid search |
| **Weaviate** | Open-source | GraphQL API, modular architecture | Self-hosted or cloud | HNSW, semantic hybrid search, modules for embeddings |
| **Milvus** | Open-source | High scalability (billions of vectors) | Self-hosted, Kubernetes | IVF, HNSW, distributed architecture, multi-modal |
| **Qdrant** | Open-source | Fast ANN, Rust-based performance | Self-hosted or cloud | HNSW, cosine similarity, billions-scale, filtering |
| **ChromaDB** | Open-source | Lightweight, Python-native | Local, embedded | Simple API, HNSW, ideal for prototyping |
| **pgvector** | PostgreSQL extension | SQL integration, no separate DB | Self-hosted (PostgreSQL) | HNSW and IVF via extension, ACID compliant |

((source [[https://qdrant.tech/articles/what-is-rag-in-ai/|Qdrant - What is RAG in AI]]))

Pinecone suits production deployments without operational overhead. Open-source options like Milvus and Qdrant offer customization and cost control for self-hosted environments. ChromaDB excels for rapid prototyping, while pgvector integrates vector search into existing PostgreSQL infrastructure.
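A minimal sketch of the IVF idea described above, assuming toy random data: vectors are partitioned into cells with a few rounds of k-means, and a query scans only the nprobe cells whose centroids lie closest to it. Real implementations add refinements such as product quantization, which this sketch omits.

```python
import random

random.seed(0)

def dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vecs):
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(len(vecs[0]))]

vectors = [[random.random() for _ in range(4)] for _ in range(300)]

# Index construction: partition the vectors into Voronoi cells with a few
# rounds of k-means, keeping an inverted list of vector ids per centroid.
NUM_CLUSTERS = 10
centroids = random.sample(vectors, NUM_CLUSTERS)
for _ in range(5):
    lists = {c: [] for c in range(NUM_CLUSTERS)}
    for i, v in enumerate(vectors):
        lists[min(range(NUM_CLUSTERS), key=lambda c: dist(v, centroids[c]))].append(i)
    centroids = [mean([vectors[i] for i in lists[c]]) if lists[c] else centroids[c]
                 for c in range(NUM_CLUSTERS)]

# Final assignment so the inverted lists match the updated centroids.
lists = {c: [] for c in range(NUM_CLUSTERS)}
for i, v in enumerate(vectors):
    lists[min(range(NUM_CLUSTERS), key=lambda c: dist(v, centroids[c]))].append(i)

def ivf_search(query, k=3, nprobe=2):
    """Scan only the nprobe cells whose centroids are closest to the query;
    raising nprobe trades query speed for recall."""
    probed = sorted(range(NUM_CLUSTERS),
                    key=lambda c: dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probed for i in lists[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]

query = [0.5, 0.5, 0.5, 0.5]
print(ivf_search(query, k=3, nprobe=2))
```

With nprobe equal to the cluster count the search degenerates to an exact scan; the speedup comes from probing only a few cells, at the risk of missing a true neighbor that fell into an unprobed cell.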
((source [[https://writer.com/engineering/rag-vector-database/|Writer - RAG Vector Database]]))

===== Scaling Challenges =====

Production vector databases face several challenges at scale:

  * **Memory requirements**: High-dimensional vectors consume significant RAM, especially with HNSW indexes
  * **Update costs**: Re-indexing when embedding models change can require processing millions of documents
  * **Multi-tenancy**: Isolating data between users or organizations adds architectural complexity
  * **Latency vs. accuracy**: ANN algorithms trade precision for speed, requiring careful tuning of parameters like ef_search and nprobe ((source [[https://writer.com/engineering/rag-vector-database/|Writer - RAG Vector Database]]))

===== See Also =====

  * [[retrieval_augmented_generation|Retrieval-Augmented Generation]]
  * [[how_to_build_a_rag_pipeline|How to Build a RAG Pipeline]]
  * [[agentic_rag|Agentic RAG]]
  * [[vector_db_comparison|Vector Database Comparison]]
  * [[rag_phases|Phases of a RAG System]]
  * [[rag_ingestion_phase|What Happens During the Ingestion Phase of RAG]]
  * [[rag_retrieval_phase|How Does the Retrieval Phase Work in RAG]]

===== References =====