Table of Contents

pgvector

pgvector is an open-source PostgreSQL extension that adds native support for storing, indexing, and querying high-dimensional vectors. It enables efficient similarity search within existing PostgreSQL databases, allowing vector operations alongside traditional relational data.1)

Setup

Install pgvector from source or via package managers:2)

Enable in a database with:

CREATE EXTENSION vector;

Create a vector column with a fixed dimension count (e.g., 1536 for OpenAI embeddings):

CREATE TABLE items (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  embedding VECTOR(1536)
);

Vector Data Types

pgvector introduces the vector data type for fixed-length arrays of floating-point numbers:3)

Distance Operators

pgvector provides operators for different similarity metrics:4)

Operator Metric Usage
L2 (Euclidean) distance General-purpose similarity
Cosine distance Normalized angle-based similarity
<#> Negative inner product Maximum inner product search

Example query:

SELECT * FROM items
ORDER BY embedding <=> query_embedding
LIMIT 10;

IVFFlat Indexing

IVFFlat (Inverted File with Flat compression) partitions the vector space into clusters using k-means:5)

CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

HNSW Indexing

HNSW (Hierarchical Navigable Small World) uses a graph structure for high-recall approximate nearest neighbor search:6)

CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

Performance Tuning

Key tuning strategies for production pgvector deployments:7)

pgvector vs Dedicated Vector Databases

Feature pgvector Pinecone/Weaviate
Deployment Extension in existing PostgreSQL Managed SaaS or self-hosted
Cost Free/open-source Subscription-based
Integration Native SQL, ACID transactions, joins Vector-focused; requires data sync
Scale Millions of vectors with tuning Billions with horizontal sharding
Best For Unified relational + vector workloads Pure vector-heavy applications

pgvector excels when vectors need to live alongside relational data, avoiding data silos and synchronization overhead.8)

See Also

References