ScaNN

ScaNN (Scalable Nearest Neighbors) is a vector similarity search library developed by Google Research that implements state-of-the-art techniques for maximum inner product search and nearest neighbor retrieval. Introduced by Guo et al., 2020 in the paper “Accelerating Large-Scale Inference with Anisotropic Vector Quantization,” ScaNN's key innovation is anisotropic vector quantization (AVQ), which weights quantization error by its effect on inner product scores rather than minimizing plain reconstruction error when compressing vectors. This approach yields superior recall at a given latency budget, and ScaNN consistently achieves top results on the ann-benchmarks leaderboard across multiple dataset configurations.

Anisotropic Vector Quantization

Standard product quantization (PQ), as used in FAISS, minimizes reconstruction error: it finds codebook entries that minimize the L2 distance $||\mathbf{x} - \tilde{\mathbf{x}}||_2$ between original and quantized vectors. However, for inner product search, reconstruction error is not the right objective. The component of the quantization error parallel to the datapoint matters far more for inner product scoring than the perpendicular component, because the queries for which $\mathbf{x}$ should rank highly are nearly aligned with $\mathbf{x}$.
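To see why, write the quantization error as $\mathbf{e} = \mathbf{x} - \tilde{\mathbf{x}} = \mathbf{e}_\parallel + \mathbf{e}_\perp$, split relative to $\mathbf{x}$. The scoring error for a query $\mathbf{q}$ is then

$$\langle \mathbf{q}, \mathbf{x} \rangle - \langle \mathbf{q}, \tilde{\mathbf{x}} \rangle = \langle \mathbf{q}, \mathbf{e}_\parallel \rangle + \langle \mathbf{q}, \mathbf{e}_\perp \rangle$$

and when $\mathbf{q}$ is nearly parallel to $\mathbf{x}$, the second term vanishes, leaving the parallel error as the dominant source of scoring error.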

Anisotropic Vector Quantization (AVQ) addresses this by applying direction-dependent weighting during quantization. Given a vector $\mathbf{x}$ and its quantized approximation $\tilde{\mathbf{x}}$, AVQ minimizes a weighted loss:

$$\mathcal{L}_{\text{AVQ}} = w_\parallel \cdot ||\mathbf{e}_\parallel||^2 + w_\perp \cdot ||\mathbf{e}_\perp||^2$$

where $\mathbf{e}_\parallel$ and $\mathbf{e}_\perp$ are the parallel and perpendicular components of the quantization error relative to $\mathbf{x}$, and $w_\parallel > w_\perp$. Guo et al., 2020 derive these weights from a single tunable anisotropy threshold that controls the parallel-to-perpendicular error ratio; this threshold surfaces in the library as the anisotropic_quantization_threshold parameter used in the example below.
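As an illustration, here is a minimal NumPy sketch of this loss for a single vector. The weights are illustrative placeholders, not the values the paper derives from its threshold parameter:

import numpy as np

def anisotropic_loss(x, x_quantized, w_parallel=10.0, w_perp=1.0):
    # Decompose the quantization error relative to the datapoint x.
    error = x - x_quantized
    x_unit = x / np.linalg.norm(x)
    e_parallel = np.dot(error, x_unit) * x_unit  # component along x
    e_perp = error - e_parallel                  # component orthogonal to x
    # Penalize the parallel component more heavily (w_parallel > w_perp).
    return (w_parallel * np.dot(e_parallel, e_parallel)
            + w_perp * np.dot(e_perp, e_perp))

x = np.array([1.0, 2.0, 3.0])
x_q = np.array([1.1, 1.9, 2.7])  # a hypothetical quantized approximation
print(anisotropic_loss(x, x_q))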

The result is that ScaNN achieves higher recall than standard PQ at the same compression ratio for inner product and cosine similarity search, the dominant similarity measures in embedding-based retrieval for AI agents.

Partitioning and Scoring

ScaNN uses a three-stage pipeline that progressively refines search results:

Stage 1: Tree-Based Partitioning. The dataset is partitioned into clusters using k-means with adaptive leaf sizes. At query time, the system identifies the nearest clusters and only searches within them, pruning the vast majority of the dataset. The num_leaves parameter controls granularity: more leaves means smaller clusters (faster per-cluster search but more clusters to check). Score-based pruning determines how many clusters to search, dynamically adjusting based on the query's distance to the cluster centroids.

Stage 2: Asymmetric Hashing (score-ah). Within selected clusters, ScaNN computes approximate inner products using asymmetric hashing. The database vectors are quantized (compressed), but the query vector remains in full precision. The approximate score is $\langle \mathbf{q}, \tilde{\mathbf{x}} \rangle \approx \langle \mathbf{q}, \mathbf{x} \rangle$. This asymmetric approach preserves more information than symmetric quantization where both sides are compressed. The dimensions_per_block parameter controls the quantization granularity.

Stage 3: Exact Reranking. The top candidates from approximate scoring are reranked using exact distance computation $\langle \mathbf{q}, \mathbf{x} \rangle$ on the original (uncompressed) vectors. The reranking depth parameter controls how many candidates are rescored, trading latency for accuracy.
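The following NumPy sketch walks through the three stages conceptually. The random centroids, the rounding-based "quantization," and all parameter values are simplifications standing in for ScaNN's trained k-means partitions and asymmetric hashing codes:

import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 32
db = rng.random((n, d)).astype("float32")
q = rng.random(d).astype("float32")

# Stage 1: partition the database and search only the closest partitions.
# Real ScaNN trains centroids with k-means; here they are random db rows.
num_leaves, leaves_to_search = 64, 4
centroids = db[rng.choice(n, num_leaves, replace=False)]
assignment = np.argmax(db @ centroids.T, axis=1)
best_leaves = np.argsort(-(centroids @ q))[:leaves_to_search]
candidates = np.flatnonzero(np.isin(assignment, best_leaves))

# Stage 2: approximate scores against compressed database vectors while
# the query stays in full precision (asymmetric). Rounding to a coarse
# grid stands in for real asymmetric hashing codes.
db_quantized = np.round(db[candidates] * 4) / 4
approx_scores = db_quantized @ q

# Stage 3: exact rerank of the top approximate candidates on the
# original, uncompressed vectors.
rerank = candidates[np.argsort(-approx_scores)[:100]]
exact_scores = db[rerank] @ q
top10 = rerank[np.argsort(-exact_scores)[:10]]
print(top10)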

Python Example

import numpy as np
import scann
 
# Generate sample data: 10000 vectors of 128 dimensions
np.random.seed(42)
dimension = 128
num_vectors = 10000
database = np.random.random((num_vectors, dimension)).astype("float32")
# Normalize for cosine similarity (dot product on unit vectors)
database /= np.linalg.norm(database, axis=1, keepdims=True)
 
queries = np.random.random((5, dimension)).astype("float32")
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
 
# Build a ScaNN index using the builder API
searcher = (
    scann.scann_ops_pybind.builder(database, num_neighbors=10, distance_measure="dot_product")
    .tree(num_leaves=100, num_leaves_to_search=10, training_sample_size=num_vectors)
    .score_ah(dimensions_per_block=2, anisotropic_quantization_threshold=0.2)
    .reorder(reordering_num_neighbors=100)
    .build()
)
 
# Search for the 10 nearest neighbors of each query
neighbors, distances = searcher.search_batched(queries, final_num_neighbors=10)
 
print(f"Nearest neighbor indices:\n{neighbors}")
print(f"Dot product scores:\n{distances}")
 
# Single query search
single_neighbors, single_distances = searcher.search(queries[0], final_num_neighbors=5)
print(f"Single query top-5: {single_neighbors}")

Performance on ANN-Benchmarks

ScaNN leads ann-benchmarks.com on several datasets, particularly glove-100-angular, where its anisotropic quantization provides the strongest advantage. Key performance characteristics:

Compared to HNSW, ScaNN uses less memory (compressed vectors rather than graph edges) and provides faster search at high recall on dense vector workloads, though HNSW supports dynamic insertions more naturally. Compared to FAISS IVF+PQ, ScaNN's anisotropic quantization provides measurably better recall at the same compression level for inner product search.
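A simple way to quantify the recall side of this trade-off, reusing the database, queries, and searcher objects from the Python example above:

import numpy as np

# Exact top-10 neighbors by brute-force dot product, as ground truth.
exact = np.argsort(-(queries @ database.T), axis=1)[:, :10]
approx, _ = searcher.search_batched(queries, final_num_neighbors=10)
# recall@10: fraction of true neighbors recovered per query, averaged.
recall = np.mean([len(set(a) & set(e)) / 10 for a, e in zip(approx, exact)])
print(f"recall@10: {recall:.3f}")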

Recent Improvements (Post-2020)

KScaNN (2025) extends ScaNN with hybrid intra-cluster search combining graph-based navigation and brute-force scanning, ML-driven per-query parameter tuning, and ARM processor optimizations. KScaNN achieves a 1.06x improvement on angular benchmarks like GloVe-100.

SOAR Integration (Spilling with Orthogonality-Amplified Residuals) enhances ScaNN's indexing by assigning each vector to a secondary partition whose residual is encouraged to be orthogonal to the primary one, improving recall in low-memory configurations because the redundant assignments better preserve distance relationships.

Hardware Optimizations include SIMD kernels tuned for AVX2 (x86) and NEON (ARM), batch query parallelism, and memory-mapped index support for datasets larger than RAM.
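For batched workloads on multi-core CPUs, the standalone package exposes a parallel variant of batched search; a short usage sketch, continuing the earlier example:

# Multi-threaded batched search; same semantics as search_batched,
# parallelized across queries.
neighbors, distances = searcher.search_batched_parallel(queries, final_num_neighbors=10)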

Integration with TensorFlow and JAX

ScaNN integrates natively with the TensorFlow ecosystem: the scann.scann_ops module exposes the searcher as TensorFlow ops that can be embedded in a SavedModel, and TensorFlow Recommenders wraps ScaNN as a retrieval layer (tfrs.layers.factorized_top_k.ScaNN) for serving learned two-tower models.

ScaNN is also available as a standalone Python package (pip install scann) with a simple API for index construction, search, and serialization.
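A minimal serialization round-trip, continuing the earlier example (the index directory path is illustrative):

import os
import scann  # pip install scann

path = "/tmp/scann_index"
os.makedirs(path, exist_ok=True)
searcher.serialize(path)  # writes index artifacts to the directory
restored = scann.scann_ops_pybind.load_searcher(path)
neighbors, distances = restored.search_batched(queries, final_num_neighbors=10)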

Production Deployments

Google Cloud Spanner integrated ScaNN for ANN search (GA in 2025), enabling vector similarity queries alongside traditional SQL in Google's globally distributed database. This powers recommendation, semantic search, and generative AI features within Spanner's existing infrastructure.

Google Search and Ads use ScaNN internally for real-time embedding retrieval in latency-sensitive services where CPU efficiency is critical.

For AI agent systems, ScaNN serves as the retrieval engine for RAG pipelines, indexing document and conversation embeddings for fast long-term memory access. Its Keras integration makes it particularly suitable for agents built within the TensorFlow/Google Cloud ecosystem.
