====== FAISS ======

FAISS (Facebook AI Similarity Search) is an open-source library developed by [[meta|Meta]] AI Research for efficient similarity search and clustering of dense vectors. Written in C++ with Python and C wrappers, it provides GPU-accelerated implementations of several indexing strategies, including inverted file indexes, product quantization, and HNSW, enabling billion-scale nearest neighbor search on commodity hardware. Originally described by [[https://arxiv.org/abs/1702.08734|Johnson et al., 2017]], FAISS has become the de facto standard library for vector similarity search in both research and production systems, and is integrated into vector databases including Milvus, OpenSearch, and Vearch.(([[https://arxiv.org/abs/1702.08734|arXiv:1702.08734]]))

===== Index Types and Quantization =====

FAISS provides a comprehensive set of index types that trade off search accuracy, speed, and memory usage:

**Flat Indexes** (IndexFlatL2, IndexFlatIP) perform exhaustive brute-force comparison against all vectors. IndexFlatL2 computes the L2 distance $||\mathbf{x} - \mathbf{q}||_2$ between the query and every database vector. Flat indexes return exact results and serve as accuracy baselines, but their cost scales linearly with dataset size. IndexFlatIP supports [[maximum_inner_product_search|MIPS]] directly by computing $\langle \mathbf{q}, \mathbf{x} \rangle$ for each database vector.

**Inverted File Indexes (IVF)** partition vectors into clusters using k-means, then search only the nearest clusters at query time. IVFFlat stores full vectors within clusters; IVFPQ combines clustering with product quantization for large memory savings. The nprobe parameter controls the speed-accuracy tradeoff by setting how many clusters are visited per query.

**Product Quantization (PQ)** compresses vectors by splitting them into $m$ subvectors and quantizing each independently using a small codebook of $k^*$ centroids per subspace. A $d$-dimensional vector is decomposed as $\mathbf{x} = [\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^m]$, each subvector $\mathbf{x}^j$ is stored as the index $i_j$ of its nearest codeword $\mathbf{c}_{i_j}^j$, and the approximate (asymmetric) distance to a query $\mathbf{q} = [\mathbf{q}^1, \ldots, \mathbf{q}^m]$ is computed as:

$$\tilde{d}(\mathbf{q}, \mathbf{x}) = \sqrt{\sum_{j=1}^{m} d(\mathbf{q}^j, \mathbf{c}_{i_j}^j)^2}$$

This reduces memory by 10-100x while enabling fast approximate distance computation via lookup tables; a minimal sketch of this computation appears at the end of this section. PQ is the foundation for scaling to billions of vectors on a single machine.

**Scalar Quantization (SQ)** reduces the precision of individual vector components (e.g., float32 to int8), offering a simpler compression scheme with less accuracy loss than PQ at moderate compression ratios.

**[[hnsw_graphs|HNSW]] Index** (IndexHNSWFlat) builds a hierarchical navigable small world graph whose query time scales roughly as $O(\log N)$. The FAISS HNSW implementation stores flat vectors and can be combined with quantization for memory efficiency.

**Binary Indexes** operate on binary vectors using the Hamming distance $d_H(\mathbf{x}, \mathbf{y}) = \sum_{i} x_i \oplus y_i$, providing extremely fast search over binary hash codes.
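To make the lookup-table computation concrete, here is a NumPy-only sketch of PQ encoding and asymmetric distance computation for a toy configuration. It does not use the FAISS API; the dimensions, codebooks, and names below are purely illustrative, and real codebooks would be trained with k-means.

<code python>
import numpy as np

# Toy PQ parameters: d = 8 dims, m = 2 subvectors of d_sub = 4 dims each,
# k* = 4 centroids per sub-codebook (tiny values chosen for readability).
d, m, k_star = 8, 2, 4
d_sub = d // m
rng = np.random.default_rng(0)

# One codebook per subspace, shape (m, k*, d_sub); illustrative random
# centroids stand in for k-means-trained ones.
codebooks = rng.random((m, k_star, d_sub)).astype("float32")

def encode(x):
    """Store each subvector as the index of its nearest codeword."""
    subvectors = x.reshape(m, d_sub)
    return np.array([
        np.argmin(((codebooks[j] - subvectors[j]) ** 2).sum(axis=1))
        for j in range(m)
    ])

def adc_distance(q, codes):
    """Asymmetric distance: query subvectors vs. stored codewords."""
    q_sub = q.reshape(m, d_sub)
    # Per-subspace tables of squared distances, built once per query;
    # each encoded vector then costs only m table lookups.
    tables = ((codebooks - q_sub[:, None, :]) ** 2).sum(axis=2)  # (m, k*)
    return np.sqrt(sum(tables[j, codes[j]] for j in range(m)))

x = rng.random(d).astype("float32")
q = rng.random(d).astype("float32")
codes = encode(x)
print("codes:", codes, "approx distance:", adc_distance(q, codes))
print("exact distance:", np.linalg.norm(q - x))
</code>

FAISS's IndexPQ and IndexIVFPQ perform this same table-based computation with k-means-trained codebooks and heavily optimized scanning code.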
===== Python Example =====

<code python>
import numpy as np
import faiss

# Generate sample data: 10000 database vectors and 5 queries, 128 dimensions
np.random.seed(42)
dimension = 128
num_vectors = 10000
database_vectors = np.random.random((num_vectors, dimension)).astype("float32")
query_vectors = np.random.random((5, dimension)).astype("float32")

# Build an IVF index with flat storage (faster than brute-force)
num_clusters = 100
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, num_clusters)

# Train the index on the data, then add vectors
index.train(database_vectors)
index.add(database_vectors)

# Search: find the 4 nearest neighbors for each query
index.nprobe = 10  # number of clusters to visit (speed vs accuracy)
distances, indices = index.search(query_vectors, 4)
print(f"Nearest neighbor indices:\n{indices}")
print(f"L2 distances:\n{distances}")

# Use the index factory for a composite index (IVF + PQ compression)
index_pq = faiss.index_factory(dimension, "IVF256,PQ16")
index_pq.train(database_vectors)
index_pq.add(database_vectors)
distances_pq, indices_pq = index_pq.search(query_vectors, 4)
print(f"PQ-compressed search results:\n{indices_pq}")
</code>

===== GPU Acceleration =====

FAISS achieves significant performance gains through GPU optimization, with reported speedups of 8.5x over the previous state of the art. The GPU implementation uses optimized k-selection algorithms and keeps intermediate state in GPU registers for maximum throughput. Key GPU features include:

  * Batch query processing that saturates GPU parallelism
  * GPU-native IVF and PQ implementations
  * Multi-GPU sharding for datasets exceeding single-GPU memory
  * Hybrid CPU/GPU pipelines where coarse quantization runs on the GPU and refinement on the CPU

[[meta|Meta]] has reported indexing 1.5 trillion 144-dimensional vectors for internal applications, demonstrating FAISS at production scale.
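Moving a trained index onto a GPU is a one-line conversion. The following sketch assumes a CUDA-enabled faiss-gpu build, at least one visible GPU, and the `index` and `query_vectors` objects from the Python example above.

<code python>
import faiss

# Allocate scratch/temporary memory for a single GPU.
res = faiss.StandardGpuResources()

# Copy the trained CPU index to GPU device 0.
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)

# Searches now run on the GPU; results match the CPU index up to
# floating-point reduction-order differences.
distances, indices = gpu_index.search(query_vectors, 4)

# To spread one index across every visible GPU instead:
# gpu_index_all = faiss.index_cpu_to_all_gpus(index)
</code>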
===== Composite Indexes and Index Factory =====

The FAISS index [[factory|factory]] string syntax allows composing complex indexes declaratively. For example, "IVF4096,PQ32" creates an inverted file index with 4096 clusters and 32-byte product quantization. Common production configurations include:

  * **IVF + PQ**: The workhorse for billion-scale search, balancing memory and accuracy
  * **IVF + HNSW coarse quantizer**: Uses HNSW to speed up cluster assignment in IVF
  * **OPQ + IVF + PQ**: Adds an optimized product quantization rotation for better PQ accuracy
  * **IVF + SQ**: Scalar quantization for moderate compression with a simpler implementation

FAISS also provides preprocessing tools: PCA for dimensionality reduction, random rotation matrices for spreading variance across PQ subspaces, and k-means clustering for data analysis.

===== Benchmarks and Scalability =====

On standard benchmarks (SIFT1M, Deep1B, BIGANN), FAISS consistently ranks among the top libraries for speed-recall tradeoffs, particularly in GPU-accelerated and memory-constrained configurations. Key performance characteristics:

  * **Billion-scale search**: IVF+PQ indexes can search over 1B vectors in milliseconds on a single machine
  * **Memory efficiency**: PQ compression reduces a 1B-vector dataset from ~512GB (128-dim float32) to ~32GB
  * **Throughput**: GPU implementations achieve tens of thousands of queries per second on batch workloads

Compared to [[hnsw_graphs|HNSW]]-only solutions such as hnswlib, FAISS offers more flexibility in memory-accuracy tradeoffs through quantization but requires more tuning. Compared to [[scann|ScaNN]], FAISS has broader index type coverage and GPU support, while ScaNN may offer better CPU performance for inner product search due to anisotropic quantization.

===== Usage in Agent and RAG Pipelines =====

In 2025, FAISS commonly serves as the low-level similarity search engine powering [[memory_augmentation_strategies|retrieval-augmented generation]] architectures. Its roles in agent systems include:

  * **Embedding retrieval**: Answering "which stored documents are most similar to this query?" for [[long_term_memory|long-term memory]] access
  * **[[semantic_search|Semantic search]]**: Powering the retrieval step in RAG pipelines for LangChain, [[llamaindex|LlamaIndex]], and custom agent frameworks
  * **Deduplication**: Identifying near-duplicate content using approximate search
  * **Clustering**: Grouping [[embeddings|embeddings]] for memory organization and topic modeling

FAISS is integrated into higher-level vector databases ([[milvus|Milvus]] uses FAISS indexes internally) and directly into agent frameworks: [[langchain|LangChain]] and [[llamaindex|LlamaIndex]] both provide FAISS vector store integrations for rapid prototyping and production deployment.

===== See Also =====

  * [[milvus|Milvus]]
  * [[scann|ScaNN]]
  * [[image_similarity_search|Image Similarity Search]]

===== References =====