====== FAISS ======

FAISS (Facebook AI Similarity Search) is an open-source library developed by [[meta|Meta]] AI Research for efficient similarity search and clustering of dense vectors. Written in C++ with Python and C wrappers, it provides GPU-accelerated implementations of several indexing strategies, including inverted file indexes, product quantization, and HNSW, enabling billion-scale nearest neighbor search on commodity hardware. Originally described by [[https://arxiv.org/abs/1702.08734|Johnson et al., 2017]], FAISS has become the de facto standard library for vector similarity search in both research and production systems, and is integrated into vector databases including Milvus, OpenSearch, and Vearch.(([[https://arxiv.org/abs/1702.08734|arXiv:1702.08734]]))

===== Index Types and Quantization =====

FAISS provides a comprehensive set of index types that trade off search accuracy, speed, and memory usage:

**Flat Indexes** (IndexFlatL2, IndexFlatIP) perform exhaustive brute-force comparison against all vectors. IndexFlatL2 computes the L2 distance $||\mathbf{x} - \mathbf{q}||_2$ between the query and every database vector. Flat indexes return exact results and serve as accuracy baselines, but their cost scales linearly with dataset size. IndexFlatIP supports [[maximum_inner_product_search|MIPS]] directly by computing $\langle \mathbf{q}, \mathbf{x} \rangle$ for each database vector.

**Inverted File Indexes (IVF)** partition vectors into clusters using k-means, then search only the nearest clusters at query time. IVFFlat stores full vectors within clusters; IVFPQ combines clustering with product quantization for large memory savings. The nprobe parameter controls the speed-accuracy tradeoff by setting how many clusters are visited per query.

**Product Quantization (PQ)** compresses vectors by splitting them into $m$ subvectors and quantizing each independently using a small codebook of $k^*$ centroids per subspace. A $d$-dimensional vector is decomposed as $\mathbf{x} = [\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^m]$, each subvector $\mathbf{x}^j$ is stored as the index $i_j$ of its nearest codeword $\mathbf{c}_{i_j}^j$, and the approximate (asymmetric) distance to a query $\mathbf{q} = [\mathbf{q}^1, \ldots, \mathbf{q}^m]$ is computed as:

$$\tilde{d}(\mathbf{q}, \mathbf{x}) = \sqrt{\sum_{j=1}^{m} d(\mathbf{q}^j, \mathbf{c}_{i_j}^j)^2}$$

This reduces memory by 10-100x while enabling fast approximate distance computation via lookup tables; a minimal sketch of this computation appears at the end of this section. PQ is the foundation for scaling to billions of vectors on a single machine.

**Scalar Quantization (SQ)** reduces the precision of individual vector components (e.g., float32 to int8), offering a simpler compression scheme with less accuracy loss than PQ at moderate compression ratios.

**[[hnsw_graphs|HNSW]] Index** (IndexHNSWFlat) builds a hierarchical navigable small world graph whose query time scales roughly as $O(\log N)$. The FAISS HNSW implementation stores flat vectors and can be combined with quantization for memory efficiency.

**Binary Indexes** operate on binary vectors using the Hamming distance $d_H(\mathbf{x}, \mathbf{y}) = \sum_{i} x_i \oplus y_i$, providing extremely fast search over binary hash codes.
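To make the lookup-table computation concrete, here is a NumPy-only sketch of PQ encoding and asymmetric distance computation for a toy configuration. It does not use the FAISS API; the dimensions, codebooks, and names below are purely illustrative, and real codebooks would be trained with k-means.

<code python>
import numpy as np

# Toy PQ parameters: d = 8 dims, m = 2 subvectors of d_sub = 4 dims each,
# k* = 4 centroids per sub-codebook (tiny values chosen for readability).
d, m, k_star = 8, 2, 4
d_sub = d // m
rng = np.random.default_rng(0)

# One codebook per subspace, shape (m, k*, d_sub); illustrative random
# centroids stand in for k-means-trained ones.
codebooks = rng.random((m, k_star, d_sub)).astype("float32")

def encode(x):
    """Store each subvector as the index of its nearest codeword."""
    subvectors = x.reshape(m, d_sub)
    return np.array([
        np.argmin(((codebooks[j] - subvectors[j]) ** 2).sum(axis=1))
        for j in range(m)
    ])

def adc_distance(q, codes):
    """Asymmetric distance: query subvectors vs. stored codewords."""
    q_sub = q.reshape(m, d_sub)
    # Per-subspace tables of squared distances, built once per query;
    # each encoded vector then costs only m table lookups.
    tables = ((codebooks - q_sub[:, None, :]) ** 2).sum(axis=2)  # (m, k*)
    return np.sqrt(sum(tables[j, codes[j]] for j in range(m)))

x = rng.random(d).astype("float32")
q = rng.random(d).astype("float32")
codes = encode(x)
print("codes:", codes, "approx distance:", adc_distance(q, codes))
print("exact distance:", np.linalg.norm(q - x))
</code>

FAISS's IndexPQ and IndexIVFPQ perform this same table-based computation with k-means-trained codebooks and heavily optimized scanning code.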
===== Python Example =====

<code python>
import numpy as np
import faiss

# Generate sample data: 10000 database vectors and 5 queries, 128 dimensions
np.random.seed(42)
dimension = 128
num_vectors = 10000
database_vectors = np.random.random((num_vectors, dimension)).astype("float32")
query_vectors = np.random.random((5, dimension)).astype("float32")

# Build an IVF index with flat storage (faster than brute-force)
num_clusters = 100
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, num_clusters)

# Train the index on the data, then add vectors
index.train(database_vectors)
index.add(database_vectors)

# Search: find the 4 nearest neighbors for each query
index.nprobe = 10  # number of clusters to visit (speed vs accuracy)
distances, indices = index.search(query_vectors, 4)
print(f"Nearest neighbor indices:\n{indices}")
print(f"L2 distances:\n{distances}")

# Use the index factory for a composite index (IVF + PQ compression)
index_pq = faiss.index_factory(dimension, "IVF256,PQ16")
index_pq.train(database_vectors)
index_pq.add(database_vectors)
distances_pq, indices_pq = index_pq.search(query_vectors, 4)
print(f"PQ-compressed search results:\n{indices_pq}")
</code>

===== GPU Acceleration =====

FAISS achieves significant performance gains through GPU optimization, with reported speedups of 8.5x over the previous state of the art. The GPU implementation uses optimized k-selection algorithms and keeps intermediate state in GPU registers for maximum throughput. Key GPU features include:

  * Batch query processing that saturates GPU parallelism
  * GPU-native IVF and PQ implementations
  * Multi-GPU sharding for datasets exceeding single-GPU memory
  * Hybrid CPU/GPU pipelines where coarse quantization runs on the GPU and refinement on the CPU

[[meta|Meta]] has reported indexing 1.5 trillion 144-dimensional vectors for internal applications, demonstrating FAISS at production scale.
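Moving a trained index onto a GPU is a one-line conversion. The following sketch assumes a CUDA-enabled faiss-gpu build, at least one visible GPU, and the `index` and `query_vectors` objects from the Python example above.

<code python>
import faiss

# Allocate scratch/temporary memory for a single GPU.
res = faiss.StandardGpuResources()

# Copy the trained CPU index to GPU device 0.
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)

# Searches now run on the GPU; results match the CPU index up to
# floating-point reduction-order differences.
distances, indices = gpu_index.search(query_vectors, 4)

# To spread one index across every visible GPU instead:
# gpu_index_all = faiss.index_cpu_to_all_gpus(index)
</code>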
===== Composite Indexes and Index Factory =====

The FAISS index [[factory|factory]] string syntax allows composing complex indexes declaratively. For example, "IVF4096,PQ32" creates an inverted file index with 4096 clusters and 32-byte product quantization. Common production configurations include:

  * **IVF + PQ**: The workhorse for billion-scale search, balancing memory and accuracy
  * **IVF + HNSW coarse quantizer**: Uses HNSW to speed up cluster assignment in IVF
  * **OPQ + IVF + PQ**: Adds an optimized product quantization rotation for better PQ accuracy
  * **IVF + SQ**: Scalar quantization for moderate compression with a simpler implementation

FAISS also provides preprocessing tools: PCA for dimensionality reduction, random rotation matrices for spreading variance across PQ subspaces, and k-means clustering for data analysis.

===== Benchmarks and Scalability =====

On standard benchmarks (SIFT1M, Deep1B, BIGANN), FAISS consistently ranks among the top libraries for speed-recall tradeoffs, particularly in GPU-accelerated and memory-constrained configurations. Key performance characteristics:

  * **Billion-scale search**: IVF+PQ indexes can search over 1B vectors in milliseconds on a single machine
  * **Memory efficiency**: PQ compression reduces a 1B-vector dataset from ~512GB (128-dim float32) to ~32GB
  * **Throughput**: GPU implementations achieve tens of thousands of queries per second on batch workloads

Compared to [[hnsw_graphs|HNSW]]-only solutions such as hnswlib, FAISS offers more flexibility in memory-accuracy tradeoffs through quantization but requires more tuning. Compared to [[scann|ScaNN]], FAISS has broader index type coverage and GPU support, while ScaNN may offer better CPU performance for inner product search due to anisotropic quantization.

===== Usage in Agent and RAG Pipelines =====

In 2025, FAISS commonly serves as the low-level similarity search engine powering [[memory_augmentation_strategies|retrieval-augmented generation]] architectures. Its roles in agent systems include:

  * **Embedding retrieval**: Answering "which stored documents are most similar to this query?" for [[long_term_memory|long-term memory]] access
  * **[[semantic_search|Semantic search]]**: Powering the retrieval step in RAG pipelines for LangChain, [[llamaindex|LlamaIndex]], and custom agent frameworks
  * **Deduplication**: Identifying near-duplicate content using approximate search
  * **Clustering**: Grouping [[embeddings|embeddings]] for memory organization and topic modeling

FAISS is integrated into higher-level vector databases ([[milvus|Milvus]] uses FAISS indexes internally) and directly into agent frameworks: [[langchain|LangChain]] and [[llamaindex|LlamaIndex]] both provide FAISS vector store integrations for rapid prototyping and production deployment.

===== See Also =====

  * [[milvus|Milvus]]
  * [[scann|ScaNN]]
  * [[image_similarity_search|Image Similarity Search]]

===== References =====