Maximum Inner Product Search (MIPS) is a fundamental retrieval operation that finds the vector in a collection which maximizes the inner product (dot product) with a given query vector. Unlike nearest neighbor search based on Euclidean distance, MIPS directly optimizes for the dot product similarity, making it particularly relevant for recommendation systems and retrieval-augmented generation where learned embeddings encode relevance as inner product magnitude. As of 2025, theoretical breakthroughs and new graph-based algorithms have significantly advanced MIPS efficiency.
Given a query vector $\mathbf{q}$ and a database of $n$ vectors $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ in $d$-dimensional space, MIPS seeks to find:
$$\mathbf{x}^* = \arg\max_{\mathbf{x} \in X} \langle \mathbf{q}, \mathbf{x} \rangle$$
where $\langle \mathbf{q}, \mathbf{x} \rangle = \sum_{i=1}^{d} q_i x_i$ denotes the inner product. Unlike Euclidean distance or cosine similarity, the inner product incorporates vector magnitudes, which encode valuable signals. In recommendation systems, a user embedding's magnitude may reflect engagement intensity, while an item embedding's magnitude may encode popularity. Cosine similarity normalizes these away (computing $\frac{\langle \mathbf{q}, \mathbf{x} \rangle}{||\mathbf{q}|| \cdot ||\mathbf{x}||}$ instead), losing information that MIPS preserves.
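For a concrete illustration, take $\mathbf{q} = (1, 1)$, $\mathbf{x}_1 = (2, 0)$, and $\mathbf{x}_2 = (0.6, 0.6)$. The inner products are $\langle \mathbf{q}, \mathbf{x}_1 \rangle = 2$ and $\langle \mathbf{q}, \mathbf{x}_2 \rangle = 1.2$, so MIPS returns $\mathbf{x}_1$; the cosine similarities are roughly $0.71$ and exactly $1$, so cosine search returns $\mathbf{x}_2$. The longer vector wins under MIPS precisely because its magnitude counts.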
The key challenge is that the inner product is not a metric: there is no triangle inequality relating $\langle \mathbf{a}, \mathbf{c} \rangle$ to $\langle \mathbf{a}, \mathbf{b} \rangle$ and $\langle \mathbf{b}, \mathbf{c} \rangle$, and a vector need not be its own maximizer, since $\langle \mathbf{x}, \mathbf{x} \rangle = ||\mathbf{x}||^2$ can be exceeded by $\langle \mathbf{x}, \mathbf{y} \rangle$ for a longer vector $\mathbf{y}$. Standard metric-space pruning techniques used in Euclidean search therefore cannot be applied directly, which makes MIPS fundamentally harder than standard nearest neighbor search and has motivated specialized algorithms.
A foundational theoretical insight is that MIPS can be reduced to Euclidean nearest neighbor search (NNS) through vector augmentation. By appending extra dimensions to both query and database vectors, one can transform inner product maximization into distance minimization. The classic approach first rescales the database so that every $||\mathbf{x}|| < 1$, then transforms each database vector to $\tilde{\mathbf{x}} = [\mathbf{x}; ||\mathbf{x}||^2; ||\mathbf{x}||^4; \ldots; ||\mathbf{x}||^{2^m}]$ and the query to $\tilde{\mathbf{q}} = [\mathbf{q}; \frac{1}{2}; \frac{1}{2}; \ldots; \frac{1}{2}]$ (with $m$ appended halves). Expanding the squared distance gives $||\tilde{\mathbf{q}} - \tilde{\mathbf{x}}||^2 = ||\mathbf{q}||^2 + \frac{m}{4} - 2\langle \mathbf{q}, \mathbf{x} \rangle + ||\mathbf{x}||^{2^{m+1}}$, so minimizing the augmented Euclidean distance maximizes the inner product up to a residual term that vanishes as $m$ grows.1)
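A minimal NumPy sketch of this augmentation follows; the function names, the choice of $m$, and the rescaling constant are illustrative, and real systems build an LSH or graph index over the augmented vectors rather than scanning them.

```python
import numpy as np

def augment_database(X, m=5):
    """Append ||x||^2, ||x||^4, ..., ||x||^(2^m) to each database vector.
    Rows are first rescaled so every norm is below 1, which preserves the MIPS
    argmax and makes the residual term ||x||^(2^(m+1)) vanish as m grows."""
    X = 0.83 * X / np.linalg.norm(X, axis=1).max()
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    extras = np.concatenate([norms ** (2 ** i) for i in range(1, m + 1)], axis=1)
    return np.hstack([X, extras])

def augment_query(q, m=5):
    """Pad the query with m constant 1/2 entries."""
    return np.concatenate([q, np.full(m, 0.5)])

# Euclidean argmin over the augmented vectors approximates the MIPS argmax.
X = np.random.randn(1000, 64)
q = np.random.randn(64)
Xa, qa = augment_database(X), augment_query(q)
nearest = np.argmin(((Xa - qa) ** 2).sum(axis=1))
print(nearest, np.argmax(X @ q))  # agree except on near-ties, since the residual is tiny for m=5
```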
A 2025 theoretical breakthrough (Chen et al., arXiv, March/April 2025) proves that MIPS is equivalent to Euclidean NNS by simply scaling the query vector by a sufficiently large factor. This enables direct use of optimized NNS graph indices (SSG, MRNG) without the distortion introduced by earlier augmentation schemes. Their MAG (Metric-Amphibious Graph) algorithm blends inner-product and Euclidean edges for optimal connectivity, yielding order-of-magnitude speedups over prior MIPS-specific methods.
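The intuition behind query scaling can be sketched from the squared distance (the scaling factor $c$ below is notation introduced here for illustration):
$$||c\,\mathbf{q} - \mathbf{x}||^2 = c^2||\mathbf{q}||^2 + ||\mathbf{x}||^2 - 2c\langle \mathbf{q}, \mathbf{x} \rangle$$
The first term is constant across the database, so minimizing the distance means minimizing $||\mathbf{x}||^2 - 2c\langle \mathbf{q}, \mathbf{x} \rangle$; once $c$ is large enough, the inner-product term dominates the bounded $||\mathbf{x}||^2$ term and the Euclidean nearest neighbor of the scaled query coincides with the MIPS answer.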
Asymmetric LSH for MIPS. Standard (symmetric) LSH assumes a similarity under which every point is its own nearest neighbor, a property the unnormalized inner product lacks. Asymmetric LSH works around this by applying different hash functions to queries and database vectors, compensating for the effect of vector norms.2) While it provides sublinear query-time guarantees, it has been largely surpassed by graph- and quantization-based methods for practical high-dimensional workloads.
Graph-Based Methods. These dominate the 2025 MIPS landscape. ip-NSW+ (Liu et al., 2019) uses angular (cosine) graphs with two-phase search, achieving 11x speedup over inner-product-only graph walks. SSG+PSP and MAG (Chen et al., 2025) deliver order-of-magnitude gains through adaptive edge pruning and query scaling. Graph-based methods benefit from the navigable small-world property and can leverage existing HNSW implementations with minor modifications.
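As a sketch of that reuse, hnswlib exposes an inner-product space directly; the parameters below (M, ef_construction, ef) are illustrative defaults, and ip-NSW+'s two-phase search and MAG's edge blending are not reproduced here.

```python
import numpy as np
import hnswlib  # assumes hnswlib is installed; its "ip" space reports distance = 1 - <q, x>

dim, n = 128, 10_000
data = np.random.randn(n, dim).astype(np.float32)

index = hnswlib.Index(space='ip', dim=dim)                # inner-product space
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))
index.set_ef(64)                                          # search-time beam width

query = np.random.randn(1, dim).astype(np.float32)
labels, distances = index.knn_query(query, k=10)
inner_products = 1.0 - distances                          # undo hnswlib's 1 - <q, x> convention
```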
Quantization-Based Methods. ScaNN uses anisotropic vector quantization specifically optimized for inner-product scoring, achieving state-of-the-art speed-recall tradeoffs.3) FAISS supports MIPS through its IndexFlatIP index and through IVF indexes configured with the inner-product metric (METRIC_INNER_PRODUCT).
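A minimal FAISS sketch of both options; the sizes and the nlist/nprobe values are illustrative placeholders.

```python
import numpy as np
import faiss  # assumes faiss-cpu (or faiss-gpu) is installed

d, n = 128, 100_000
xb = np.random.randn(n, d).astype('float32')
xq = np.random.randn(5, d).astype('float32')

# Exact MIPS: brute-force inner-product scoring.
flat = faiss.IndexFlatIP(d)
flat.add(xb)
scores, ids = flat.search(xq, 10)            # scores are raw inner products, larger is better

# Approximate MIPS: inverted-file index with the inner-product metric.
nlist = 256
quantizer = faiss.IndexFlatIP(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 16                              # clusters scanned per query
scores_ivf, ids_ivf = ivf.search(xq, 10)
```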
BanditMIPS takes a fundamentally different approach, treating MIPS as a multi-armed bandit problem. It achieves $O(1)$ complexity in dimension $d$ (versus prior $O(\sqrt{d})$) by using randomized coordinate sampling to estimate inner products, dramatically speeding up search in very high dimensions.4)
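A toy successive-elimination sketch of the coordinate-sampling idea follows. It illustrates the bandit framing only; it is not the published BanditMIPS algorithm, and the batch size and confidence constant are chosen loosely.

```python
import numpy as np

def bandit_style_mips(X, q, batch=16, z=3.0, seed=0):
    """Maintain an unbiased running estimate of each <q, x_i> from random coordinates,
    plus a CLT-style confidence radius, and drop candidates whose upper bound falls
    below the best lower bound. Survivors are rescored exactly."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    candidates = np.arange(n)
    s1 = np.zeros(n)   # running sum of per-coordinate estimates d * x_ij * q_j
    s2 = np.zeros(n)   # running sum of their squares, for a variance estimate
    count = 0

    while len(candidates) > 1 and count < d:
        coords = rng.integers(0, d, size=batch)
        samples = d * X[candidates][:, coords] * q[coords]   # unbiased single-coordinate estimates
        s1[candidates] += samples.sum(axis=1)
        s2[candidates] += (samples ** 2).sum(axis=1)
        count += batch
        means = s1[candidates] / count
        var = s2[candidates] / count - means ** 2
        radius = z * np.sqrt(np.maximum(var, 0) / count)     # loose CLT-style confidence radius
        best_lower = (means - radius).max()
        candidates = candidates[means + radius >= best_lower]

    return candidates[np.argmax(X[candidates] @ q)]          # exact scores among survivors

X = np.random.randn(2000, 4096)
q = np.random.randn(4096)
print(bandit_style_mips(X, q), np.argmax(X @ q))
```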
Sparse MIPS. SINDI (Li et al., 2025) addresses sparse vector MIPS with SIMD-optimized inverted indexes and aggressive pruning, achieving >99% recall at high queries-per-second. This is critical for hybrid dense-sparse retrieval in production RAG systems.
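A minimal sketch of the inverted-index scoring that sparse MIPS engines build on; SINDI's SIMD layout and pruning are not reproduced, and the dict-based representation is purely illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: list of sparse vectors, each a {term_id: weight} dict."""
    postings = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for term, weight in doc.items():
            postings[term].append((doc_id, weight))
    return postings

def sparse_mips(postings, query, k=10):
    """Accumulate q_t * w_t over the posting lists of query terms only."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in postings.get(term, ()):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

docs = [{3: 0.5, 17: 1.2}, {3: 0.9, 42: 0.4}, {17: 0.7}]
print(sparse_mips(build_inverted_index(docs), {3: 1.0, 17: 0.5}))  # doc 0 scores highest
```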
MIPS is the core operation underlying retrieval-augmented generation for AI agents. When an agent encodes a query and searches its long-term memory for relevant documents, it performs MIPS (or approximate MIPS) against its embedding database. The choice of inner product over cosine similarity matters when embedding magnitudes carry meaning, such as document importance scores or confidence levels.
In recommendation systems, MIPS maximizes user-item inner products $\langle \mathbf{u}, \mathbf{v} \rangle$ where vector magnitudes reflect preference strength or item popularity. Production systems at companies like Shopee and Ant Group use VSAG (Vector Search Approximate Graph) implementations for $k$-MIPS to retrieve diverse top-$k$ items.
Hybrid dense-sparse MIPS (as in SINDI) supports multilingual production retrieval by combining dense embedding similarity with sparse term-matching signals, integrating with inverted indexes for information retrieval at scale.
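A hypothetical late-fusion sketch of the hybrid idea; the mixing weight alpha and the function name are illustrative, not taken from SINDI or any production system.

```python
import numpy as np

def hybrid_score(dense_q, dense_x, sparse_q, sparse_x, alpha=0.7):
    """Weighted sum of the dense and sparse inner products for one candidate."""
    dense = float(np.dot(dense_q, dense_x))
    sparse = sum(w * sparse_x.get(t, 0.0) for t, w in sparse_q.items())
    return alpha * dense + (1.0 - alpha) * sparse
```

Typically the dense and sparse candidate lists are retrieved by separate indexes (a graph or IVF index for the dense part, an inverted index for the sparse part) and merged before this final scoring.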