====== Milvus ======

**Milvus** is a cloud-native, open-source vector database built primarily in Go and C++ and optimized for high-performance similarity search on massive-scale embedding datasets. With over **43,000 GitHub stars** and graduated-project status in the LF AI & Data Foundation, it is one of the most mature and battle-tested vector databases available.(([[https://github.com/milvus-io/milvus|Milvus GitHub Repository]]))

| **Repository** | [[https://github.com/milvus-io/milvus|github.com/milvus-io/milvus]] |
| **License** | Apache 2.0 |
| **Language** | Go, C++ |
| **Stars** | 43K+ |
| **Category** | Vector Database |

===== Key Features =====

  * **Billion-Scale Search**: Handles billion-scale vector datasets with millisecond latency
  * **Disaggregated Architecture**: Independent scaling of storage and compute with a microservices design
  * **Multiple Index Types**: HNSW, IVF, and DiskANN, with Int8 compression for memory efficiency
  * **[[hybrid_search|Hybrid Search]]**: Combines vector similarity with scalar filtering, BM25 full-text search, and JSON path indexing
  * **Hot/Cold Tiering**: Automatic data tiering between memory/SSD and object storage based on access patterns
  * **Hardware Acceleration**: AVX512 and ARM NEON SIMD, quantization, and GPU support
  * **Multi-Tenancy**: Supports up to 100K collections per cluster
  * **LF AI & Data Graduated**: Production-proven with enterprise-grade reliability

===== Architecture =====

Milvus employs a multi-layered, microservices-based architecture separating access, coordination, worker, and storage layers:(([[https://milvus.io/docs/architecture_overview.md|Milvus Architecture Documentation]]))

  * **Access Layer**: Handles client requests, validation, and routing to internal nodes
  * **Coordinator Service**: The brain of the system; manages load balancing, data coordination, and query planning, with metadata stored in etcd
  * **Streaming Node**: Go-based component built on the Woodpecker WAL for real-time ingestion, eliminating the need for an external message queue
  * **Data Node and Query Node**: C++ worker nodes handling compaction, indexing on sealed segments, and query execution
  * **Object Storage**: All data persists in S3/MinIO, giving a diskless architecture that simplifies geo-distributed deployments

<code mermaid>
graph TB
    subgraph Access["Access Layer"]
        SDK[Client SDKs]
        Proxy[Proxy / Load Balancer]
    end
    subgraph Coord["Coordinator Service"]
        Root[Root Coord]
        Data[Data Coord]
        Query[Query Coord]
        Index[Index Coord]
    end
    subgraph Workers["Worker Nodes"]
        SN[Streaming Node - Go]
        DN[Data Node - C++]
        QN[Query Node - C++]
    end
    subgraph Storage["Storage Layer"]
        WAL[Woodpecker WAL]
        ObjStore[(Object Storage - S3/MinIO)]
        Etcd[(etcd - Metadata)]
    end
    Access --> Coord
    Coord --> Workers
    SN --> WAL
    DN --> ObjStore
    QN --> ObjStore
    Coord --> Etcd
</code>

===== Indexing Algorithms =====

Milvus supports multiple indexing strategies:

  * **HNSW**: [[hnsw_graphs|Hierarchical Navigable Small World graphs]] for high-accuracy approximate nearest neighbor search; supports Int8 compression to reduce memory while preserving accuracy
  * **IVF**: Inverted File index for partitioned search with configurable accuracy/speed tradeoffs
  * **DiskANN**: Disk-optimized approximate nearest neighbor search for datasets larger than memory
  * **BM25**: Full-text search index, reported to be about 4x faster than Elasticsearch, with a configurable ''drop_ratio_search'' parameter

===== Storage and Tiering =====

Milvus 2.6 introduced tiered hot/cold separation:(([[https://milvus.io/blog/introduce-milvus-2-6-built-for-scale-designed-to-reduce-costs.md|Milvus 2.6 Release Blog]]))

  * **Hot data**: Frequently accessed data stays in memory/SSD for low-latency queries
  * **Cold data**: Infrequently accessed data migrates to object storage (S3/MinIO/NetApp)
  * **Dynamic migration**: Automatic tiering based on access patterns, reportedly reducing storage costs by 50%
  * **Storage v2**: Parquet/Vortex formats for Spark compatibility and reduced IOPS/memory usage

===== Code Example =====

<code python>
from pymilvus import MilvusClient

# Connect to a Milvus instance (assumes one is listening on the default port)
client = MilvusClient(uri="http://localhost:19530")

# Quick-setup collection: creates an "id" primary key and a "vector" field.
# Extra keys such as "title" and "category" are stored as dynamic fields.
client.create_collection(
    collection_name="articles",
    dimension=768,
    metric_type="COSINE",
    auto_id=True,
)

# Insert vectors with metadata
data = [
    {"vector": [0.1] * 768, "title": "RAG Guide", "category": "ai"},
    {"vector": [0.2] * 768, "title": "Vector DBs", "category": "database"},
]
client.insert(collection_name="articles", data=data)

# Filtered search: vector similarity + scalar filter
results = client.search(
    collection_name="articles",
    data=[[0.15] * 768],
    filter='category == "ai"',
    limit=10,
    output_fields=["title", "category"],
)

# One list of hits is returned per query vector
for hits in results:
    for hit in hits:
        print(f"{hit['entity']['title']}: {hit['distance']:.4f}")
</code>

===== See Also =====

  * [[weaviate|Weaviate]]
  * [[qdrant|Qdrant]]
  * [[supabase_vector|Supabase Vector]]
  * [[lateon|LateOn]]
  * [[vector_db_comparison|Vector Database Comparison]]

===== References =====