pgvector is an open-source PostgreSQL extension that adds native support for storing, indexing, and querying high-dimensional vectors. It enables efficient similarity search within existing PostgreSQL databases, allowing vector operations alongside traditional relational data.1)
Install pgvector from source or via package managers:2)
- Debian/Ubuntu (PostgreSQL 16): apt install postgresql-16-pgvector
- macOS (Homebrew): brew install pgvector

Enable it in each database that needs it with:
CREATE EXTENSION vector;
Create a vector column with a fixed dimension count (e.g., 1536 for OpenAI embeddings):
CREATE TABLE items ( id BIGSERIAL PRIMARY KEY, content TEXT, embedding VECTOR(1536) );
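Applications usually send vectors to pgvector in its text input format, a bracketed, comma-separated list such as '[1,2,3]'. The sketch below uses a hypothetical helper, to_pgvector_literal, which is not part of pgvector or any client library. It checks the length against the column's declared dimension before serializing, so a mismatch fails in the application rather than at INSERT time.

```python
import math
from typing import Sequence


def to_pgvector_literal(values: Sequence[float], dim: int) -> str:
    """Serialize floats into pgvector's text input format, e.g. '[1.0,2.5,-3.0]'.

    Hypothetical helper for illustration; `dim` should match the column
    definition, such as 1536 for VECTOR(1536).
    """
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        # pgvector rejects NaN and infinite elements, so fail early here too.
        raise ValueError("vector elements must be finite")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


print(to_pgvector_literal([1, 2.5, -3], 3))  # [1.0,2.5,-3.0]
```

You can pass the resulting string as an ordinary query parameter and cast it with ::vector in the SQL statement.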
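pgvector introduces the vector data type for fixed-length arrays of single-precision floating-point numbers.3) It also provides aggregate functions over vectors, such as avg(vector) and sum(vector), which operate element-wise.
pgvector provides operators for different similarity metrics:4)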
| Operator | Metric | Usage |
|---|---|---|
| <-> | L2 (Euclidean) distance | General-purpose similarity |
| <=> | Cosine distance | Normalized, angle-based similarity |
| <#> | Negative inner product | Maximum inner product search |
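The plain-Python sketch below computes the same quantities as the three operators for small vectors. The function names are illustrative and not part of pgvector. <#> returns the *negative* inner product because PostgreSQL index scans only support ascending order on operators, so for all three metrics a smaller value means "more similar". Cosine distance is undefined for a zero vector, and this sketch returns NaN in that case.

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # pgvector raises an error when vector dimensions differ; mirror that here.
    if len(a) != len(b):
        raise ValueError("different vector dimensions")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Equivalent of a <-> b: Euclidean distance."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Equivalent of a <=> b: 1 - cosine similarity (NaN here if a vector is zero)."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norms == 0:
        return math.nan
    similarity = max(-1.0, min(1.0, dot / norms))  # clamp floating-point rounding
    return 1.0 - similarity


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Equivalent of a <#> b: the inner product with its sign flipped."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(l2_distance(a, b))             # sqrt(14), about 3.742
print(cosine_distance(a, b))         # 0.0 (same direction)
print(negative_inner_product(a, b))  # -28.0
```

If the embeddings are already normalized to unit length, cosine distance and negative inner product rank results identically. That is why some deployments prefer <#> for normalized vectors.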
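Example query returning the 10 nearest rows by cosine distance (query_embedding stands for the query vector, supplied for example as a bound parameter):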
SELECT * FROM items ORDER BY embedding <=> query_embedding LIMIT 10;
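Without a vector index, this query is an exact scan: PostgreSQL computes the distance from query_embedding to every row, sorts the results, and keeps the first 10. The in-memory sketch below reproduces that behavior with heapq.nsmallest. It provides an exact baseline to compare against the approximate IVFFlat and HNSW indexes. The (id, embedding) row layout and the name exact_knn are assumptions for illustration.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Row = Tuple[int, Sequence[float]]  # hypothetical (id, embedding) pairs


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity, matching the <=> operator for non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def exact_knn(rows: List[Row], query: Sequence[float], k: int) -> List[int]:
    """In-memory analogue of: SELECT id FROM items ORDER BY embedding <=> query LIMIT k."""
    nearest = heapq.nsmallest(k, rows, key=lambda row: cosine_distance(row[1], query))
    return [row_id for row_id, _ in nearest]


rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
print(exact_knn(rows, [1.0, 0.1], k=2))  # [1, 3]
```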
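IVFFlat (inverted file with flat, i.e. uncompressed, vector storage) partitions the vector space into clusters, called lists, using k-means. A query then scans only the lists whose centroids are nearest to the query vector:5)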
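- lists sets the number of clusters; pgvector's README suggests starting with rows / 1000 for up to 1M rows and sqrt(rows) for larger tables
- ivfflat.probes, set at query time, controls how many clusters are searched (default 1); higher values improve recall at the cost of speed, with sqrt(lists) as a suggested starting point
- Build the index after the table contains data, since the clusters are computed from existing rows

CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);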
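The pure-Python sketch below illustrates the idea behind IVFFlat. k-means assigns every vector to its nearest centroid, giving one inverted list per centroid. A query then scans only the `probes` lists with the closest centroids and ranks their uncompressed vectors by exact distance.

Some caveats apply:
- It uses L2 distance for simplicity.
- The names ToyIVFFlat, kmeans, suggested_lists and suggested_probes are hypothetical.
- pgvector's C implementation differs in details such as sampling and on-disk storage.
- The two suggestion helpers only encode the README starting points quoted above.

```python
import math
import random
from typing import List, Sequence


def suggested_lists(n_rows: int) -> int:
    # pgvector README starting point: rows / 1000 up to 1M rows, sqrt(rows) above.
    if n_rows <= 1_000_000:
        return max(1, n_rows // 1000)
    return int(math.sqrt(n_rows))


def suggested_probes(lists: int) -> int:
    # README starting point for ivfflat.probes: sqrt(lists).
    return max(1, round(math.sqrt(lists)))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 10, seed: int = 0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda c: math.dist(p, centers[c]))].append(p)
        # Move each centroid to the mean of its bucket (keep it if the bucket is empty).
        centers = [
            [sum(col) / len(b) for col in zip(*b)] if b else centers[i]
            for i, b in enumerate(buckets)
        ]
    return centers


class ToyIVFFlat:
    def __init__(self, points: List[Sequence[float]], lists: int):
        self.centers = kmeans(points, lists)
        # One inverted list per centroid, holding (row index, vector) pairs.
        self.inverted = [[] for _ in self.centers]
        for idx, p in enumerate(points):
            self.inverted[self._nearest_center(p)].append((idx, p))

    def _nearest_center(self, v: Sequence[float]) -> int:
        return min(range(len(self.centers)), key=lambda c: math.dist(v, self.centers[c]))

    def search(self, query: Sequence[float], k: int, probes: int = 1) -> List[int]:
        # Visit only the `probes` lists whose centroids are closest to the query.
        order = sorted(range(len(self.centers)), key=lambda c: math.dist(query, self.centers[c]))
        candidates = [item for c in order[:probes] for item in self.inverted[c]]
        # "Flat": vectors are stored uncompressed, so candidates get exact distances.
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [idx for idx, _ in candidates[:k]]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
index = ToyIVFFlat(points, lists=2)
print(index.search((10.2, 10.1), k=2, probes=1))
```

With probes equal to lists, every list is scanned and the result matches an exact search. Lowering probes trades recall for speed, which is the same tradeoff ivfflat.probes controls.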
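HNSW (Hierarchical Navigable Small World) builds a multilayer graph for high-recall approximate nearest neighbor search. Compared with IVFFlat, it offers a better speed-recall tradeoff. However, it builds more slowly and uses more memory. It does not need existing rows in order to build. Two build parameters control the graph. m sets the maximum number of connections per node in each layer (default 16). ef_construction sets the size of the candidate list used while building the graph (default 64):6)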
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 128);
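HNSW's core operation is a best-first search over a proximity graph that keeps the ef closest nodes found so far. The sketch below implements that search on a single layer. To stay short, it builds the graph by brute force, linking each point to its m nearest neighbours plus reverse edges. It does not use HNSW's incremental, multilayer construction. So it illustrates one layer of the algorithm, not pgvector's implementation. The names build_graph and search_layer are hypothetical. Here ef plays the role that hnsw.ef_search plays at query time, and m only loosely mirrors the index's m parameter.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set


def build_graph(points: List[Sequence[float]], m: int) -> Dict[int, Set[int]]:
    """Link each point to its m nearest neighbours, adding reverse edges."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in graph if j != i), key=lambda j: math.dist(p, points[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search_layer(points, graph, query, entry: int, ef: int) -> List[int]:
    """Best-first search returning up to ef nodes, nearest first."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nearest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negated distances: worst result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # nearest candidate is farther than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = math.dist(query, points[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current farthest result
    return [node for _, node in sorted((-neg_d, node) for neg_d, node in results)]


points = [(float(i), 0.0) for i in range(20)]
graph = build_graph(points, m=2)
print(search_layer(points, graph, query=(13.2, 0.0), entry=0, ef=4)[:2])  # [13, 14]
```

A larger ef explores more of the graph before stopping. This raises recall and cost together, which is why hnsw.ef_search is tuned against recall requirements.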
Key tuning strategies for production pgvector deployments:7)
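- Increase maintenance_work_mem for faster index builds (2-4GB recommended)
- Increase max_parallel_maintenance_workers to parallelize HNSW builds
- Tune hnsw.ef_search (default 40) to match recall requirements; higher values improve recall at the cost of query speed

| Feature | pgvector | Pinecone/Weaviate |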
|---|---|---|
| Deployment | Extension in existing PostgreSQL | Managed SaaS or self-hosted |
| Cost | Free/open-source | Subscription-based for managed services (Weaviate is also open source) |
| Integration | Native SQL, ACID transactions, joins | Vector-focused; requires data sync |
| Scale | Millions of vectors with tuning | Billions with horizontal sharding |
| Best For | Unified relational + vector workloads | Pure vector-heavy applications |
pgvector excels when vectors need to live alongside relational data, avoiding data silos and synchronization overhead.8)