Setup
Vector Data Types
Distance Operators
IVFFlat Indexing
HNSW Indexing
Performance Tuning
pgvector vs Dedicated Vector Databases
See Also
References

pgvector

pgvector is an open-source PostgreSQL extension that adds native support for storing, indexing, and querying high-dimensional vectors. It enables efficient similarity search within existing PostgreSQL databases, allowing vector operations alongside traditional relational data.¹⁾

Setup

Install pgvector from source or via package managers:²⁾

Linux – apt install postgresql-16-pgvector
macOS – brew install pgvector
Managed services – pre-installed on AWS RDS, Google Cloud SQL, Supabase, Neon

Enable in a database with:

CREATE EXTENSION vector;

Create a vector column with a fixed dimension count (e.g., 1536 for OpenAI embeddings):

CREATE TABLE items (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  embedding VECTOR(1536)
);

Vector Data Types

pgvector introduces the vector data type for fixed-length arrays of floating-point numbers:³⁾

Dimensions specified at column creation
Integrity validation on insert
Aggregate functions: avg(vector), sum(vector)
Combinable with SQL joins, filters, and aggregations

Distance Operators

pgvector provides operators for different similarity metrics:⁴⁾

Operator	Metric	Usage
`↔`	L2 (Euclidean) distance	General-purpose similarity
`⇔`	Cosine distance	Normalized angle-based similarity
`<#>`	Negative inner product	Maximum inner product search

Example query:

SELECT * FROM items
ORDER BY embedding <=> query_embedding
LIMIT 10;

IVFFlat Indexing

IVFFlat (Inverted File with Flat compression) partitions the vector space into clusters using k-means:⁵⁾

Searches only the closest cluster(s) instead of all vectors
lists parameter controls the number of clusters (recommended: sqrt(n) for n rows)
ivfflat.probes at query time controls how many clusters to search
Requires data before index creation (needs data for centroid calculation)
Lower memory usage than HNSW
Best for smaller datasets or when memory is constrained

CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

HNSW Indexing

HNSW (Hierarchical Navigable Small World) uses a graph structure for high-recall approximate nearest neighbor search:⁶⁾

Builds a multi-layer graph of connected nodes
Higher layers for coarse navigation, lower layers for precise search
Key tunable parameters:
- m – max connections per node (default 16; higher improves recall, increases index size)
- ef_construction – search depth during build (default 64; higher improves quality)
hnsw.ef_search at query time controls search effort (default 40)
Supports parallel index builds
Best for large datasets requiring high recall and low latency

CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

Performance Tuning

Key tuning strategies for production pgvector deployments:⁷⁾

Increase maintenance_work_mem for faster index builds (2-4GB recommended)
Increase max_parallel_maintenance_workers for parallel HNSW builds
Normalize vectors for cosine distance operations
Use table partitioning for very large datasets
Monitor with standard PostgreSQL tools
Set hnsw.ef_search based on recall requirements

pgvector vs Dedicated Vector Databases

Feature	pgvector	Pinecone/Weaviate
Deployment	Extension in existing PostgreSQL	Managed SaaS or self-hosted
Cost	Free/open-source	Subscription-based
Integration	Native SQL, ACID transactions, joins	Vector-focused; requires data sync
Scale	Millions of vectors with tuning	Billions with horizontal sharding
Best For	Unified relational + vector workloads	Pure vector-heavy applications