FAISS (Facebook AI Similarity Search) is an open-source library developed by Meta AI Research for efficient similarity search and clustering of dense vectors. Written in C++ with Python and C wrappers, it provides GPU-accelerated implementations of several indexing strategies, including inverted file indexes, product quantization, and HNSW, enabling billion-scale nearest neighbor search on commodity hardware. Originally described by Johnson et al. (2017), FAISS has become the de facto standard library for vector similarity search in both research and production systems, and is integrated into vector databases including Milvus, OpenSearch, and Vearch.
FAISS provides a comprehensive set of index types that trade off between search accuracy, speed, and memory usage:
Flat Indexes (IndexFlatL2, IndexFlatIP) perform exhaustive brute-force comparison against all vectors. IndexFlatL2 computes the L2 distance $||\mathbf{x} - \mathbf{q}||_2$ between the query and every database vector. Flat indexes return exact results and serve as accuracy baselines, but their cost scales linearly with dataset size. IndexFlatIP supports maximum inner product search (MIPS) directly by computing $\langle \mathbf{q}, \mathbf{x} \rangle$ for each database vector.
Inverted File Indexes (IVF) partition vectors into clusters using k-means, then search only the nearest clusters at query time. IVFFlat stores full vectors within clusters; IVFPQ combines clustering with product quantization for massive memory savings. The nprobe parameter controls the speed-accuracy tradeoff by setting how many clusters to search.
Product Quantization (PQ) compresses vectors by splitting them into $m$ subvectors and quantizing each independently using small sub-codebooks of size $k^*$. A $d$-dimensional vector is decomposed as $\mathbf{x} = [\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^m]$, each subvector $\mathbf{x}^j$ is encoded by its nearest centroid $\mathbf{c}_{i_j}^j$, and the approximate (asymmetric) distance to a query $\mathbf{q}$ is computed as:
$$\tilde{d}(\mathbf{x}, \mathbf{q}) = \sqrt{\sum_{j=1}^{m} d(\mathbf{q}^j, \mathbf{c}_{i_j}^j)^2}$$
This reduces memory by 10-100x while enabling fast approximate distance computation via lookup tables. PQ is the foundation for scaling to billions of vectors on a single machine.
Scalar Quantization (SQ) reduces precision of vector components (e.g., float32 to int8), offering a simpler compression scheme with less accuracy loss than PQ for moderate compression ratios.
HNSW Index (IndexHNSWFlat) builds a hierarchical navigable small world graph, giving approximately $O(\log N)$ query complexity. The FAISS HNSW implementation supports flat vector storage and can be combined with quantization for memory efficiency.
Binary Indexes operate on binary vectors using Hamming distance $d_H(\mathbf{x}, \mathbf{y}) = \sum_{i} x_i \oplus y_i$, providing extremely fast search for binary hash codes.
```python
import numpy as np
import faiss

# Generate sample data: 10,000 database vectors and 5 queries, 128 dimensions
np.random.seed(42)
dimension = 128
num_vectors = 10000
num_queries = 5
database_vectors = np.random.random((num_vectors, dimension)).astype("float32")
query_vectors = np.random.random((num_queries, dimension)).astype("float32")

# Build an IVF index with flat storage (faster than brute-force)
num_clusters = 100
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, num_clusters)

# Train the index on the data, then add vectors
index.train(database_vectors)
index.add(database_vectors)

# Search: find the 4 nearest neighbors of each query
index.nprobe = 10  # number of clusters to visit (speed vs. accuracy)
distances, indices = index.search(query_vectors, 4)
print(f"Nearest neighbor indices:\n{indices}")
print(f"L2 distances:\n{distances}")

# Use the index factory for a composite index (IVF + PQ compression)
index_pq = faiss.index_factory(dimension, "IVF256,PQ16")
index_pq.train(database_vectors)
index_pq.add(database_vectors)
distances_pq, indices_pq = index_pq.search(query_vectors, 4)
print(f"PQ-compressed search results:\n{indices_pq}")
```
FAISS achieves significant performance gains through GPU optimization, with reported speeds up to 8.5x faster than prior state-of-the-art approaches. The GPU implementation uses optimized k-selection algorithms and keeps intermediate state in GPU registers for maximum throughput.
Meta has reported indexing 1.5 trillion 144-dimensional vectors for internal applications, demonstrating the library's production scale.
The FAISS index factory string syntax allows composing complex indexes declaratively. For example, "IVF4096,PQ32" creates an inverted file index with 4096 clusters and 32-byte product quantization codes.
FAISS also provides preprocessing tools: PCA for dimensionality reduction, random rotation matrices for variance spreading across PQ subspaces, and k-means clustering for data analysis.
On standard benchmarks (SIFT1M, Deep1B, BIGANN), FAISS consistently ranks among the top libraries for speed-recall tradeoffs, particularly in GPU-accelerated and memory-constrained configurations.
Compared to HNSW-only solutions (hnswlib), FAISS offers more flexibility in memory-accuracy tradeoffs through quantization but requires more tuning. Compared to ScaNN, FAISS has broader index type coverage and GPU support, while ScaNN may offer better CPU performance for inner product search due to anisotropic quantization.
In 2025, FAISS commonly serves as the low-level similarity search engine powering retrieval-augmented generation (RAG) architectures and agent systems.
FAISS is integrated into higher-level vector databases (Milvus uses FAISS indexes internally) and directly into agent frameworks. LangChain and LlamaIndex both provide FAISS vector store integrations for rapid prototyping and production deployment.