====== LightRAG ======

**LightRAG** is a knowledge graph-based retrieval-augmented generation framework that integrates graph structures with vector representations to enable efficient, dual-level information retrieval from document collections. With over **30,000 GitHub stars** and an associated EMNLP 2025 paper, it represents a significant evolution beyond classical RAG approaches.

| **Repository** | [[https://github.com/HKUDS/LightRAG|github.com/HKUDS/LightRAG]] |
| **License** | MIT |
| **Language** | Python |
| **Stars** | 30K+ |
| **Category** | Knowledge Graph RAG |

===== Key Features =====

  * **Knowledge Graph Construction** -- Automatically extracts entities and relationships from documents using LLMs, building a comprehensive knowledge graph
  * **Dual-Level Retrieval** -- Low-level retrieval for precise entity/edge queries and high-level retrieval for broad topic aggregation
  * **Five Query Modes** -- Naive, Local, Global, Hybrid, and Mix retrieval strategies
  * **Incremental Updates** -- Seamlessly integrates new documents by merging graph data without rebuilding the entire index
  * **Web UI** -- Built-in web server with knowledge graph visualization and API support
  * **Multi-Hop Reasoning** -- Graph structures enable extraction of global information from multi-hop subgraphs

===== Architecture =====

LightRAG's architecture consists of three primary components that work together to provide graph-enhanced retrieval:

**1. Graph-Based Text Indexing:** Documents are segmented into chunks, and LLMs extract entities (names, dates, locations, events) and their relationships. This constructs a comprehensive knowledge graph with key-value data structures for optimized retrieval.

**2. Dual-Level Retrieval Paradigm:** The system generates query keys at both detailed and abstract levels to accommodate diverse query types.

**3. Retrieval-Augmented Answer Generation:** A general-purpose LLM generates answers by processing concatenated values from relevant entities, relations, names, descriptions, and text excerpts.

<code mermaid>
graph TB
    subgraph Indexing["Graph-Based Indexing"]
        Docs[Document Chunks]
        Extract[Entity Extraction via LLM]
        RelEx[Relationship Extraction]
        KG[(Knowledge Graph)]
        KV[Key-Value Index]
    end
    subgraph Retrieval["Dual-Level Retrieval"]
        Query[User Query]
        Low["Low-Level: Entity/Edge Lookup"]
        High["High-Level: Topic Aggregation"]
        Vec[Vector Similarity Search]
        Graph[Graph Traversal]
    end
    subgraph Generation["Answer Generation"]
        Merge[Context Merging]
        LLM[LLM Generation]
        Answer[Final Answer]
    end
    Docs --> Extract
    Extract --> RelEx
    RelEx --> KG
    KG --> KV
    Query --> Low
    Query --> High
    Low --> Vec
    Low --> Graph
    High --> Vec
    High --> Graph
    Vec --> Merge
    Graph --> Merge
    Merge --> LLM
    LLM --> Answer
</code>

===== Dual-Level Retrieval =====

The retrieval paradigm operates at two distinct levels:

  * **Low-Level Retrieval** -- Detail-oriented queries that extract precise information about specific nodes or edges within the graph. Example: "Who wrote Pride and Prejudice?" targets a specific entity-relationship pair.
  * **High-Level Retrieval** -- Broader topic queries that aggregate information across multiple related entities and relationships, providing insights into higher-level concepts and summaries.

By generating query keys at both levels, LightRAG ensures comprehensive, contextually relevant responses regardless of query complexity.
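The distinction between the two levels can be illustrated with a small self-contained sketch in plain Python. This is a conceptual toy, not LightRAG's internals: the graph data, entity names, and helper functions below are all hypothetical.

<code python>
# Toy knowledge graph: entities with descriptions, plus typed edges.
# All names and data here are illustrative, not part of LightRAG's API.
ENTITIES = {
    "Jane Austen": {"type": "person", "desc": "English novelist"},
    "Pride and Prejudice": {"type": "book", "desc": "1813 novel of manners"},
    "Sense and Sensibility": {"type": "book", "desc": "1811 novel"},
}
EDGES = [
    ("Jane Austen", "wrote", "Pride and Prejudice"),
    ("Jane Austen", "wrote", "Sense and Sensibility"),
]

def low_level_retrieve(entity: str, relation: str) -> list[str]:
    """Detail-oriented lookup: match a specific entity/relation pair."""
    return [dst for src, rel, dst in EDGES
            if src == entity and rel == relation]

def high_level_retrieve(entity: str) -> dict[str, str]:
    """Topic aggregation: gather descriptions across the entity's
    neighbourhood (one hop here, for brevity)."""
    neighbours = {dst for src, _, dst in EDGES if src == entity}
    neighbours |= {src for src, _, dst in EDGES if dst == entity}
    return {n: ENTITIES[n]["desc"] for n in neighbours}

# Low-level: "What did Jane Austen write?" -> precise edge matches
print(low_level_retrieve("Jane Austen", "wrote"))

# High-level: aggregated context around the entity for broader questions
print(high_level_retrieve("Jane Austen"))
</code>

In LightRAG itself the two levels are driven by LLM-extracted keywords and combined with vector similarity search, but the shape of the operation is the same: precise edge lookups for detail queries, neighbourhood aggregation for topic queries.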
===== Query Modes =====

LightRAG supports five distinct query modes:

  * **Naive** -- Simple retrieval without graph traversal (baseline)
  * **Local** -- Uses local subgraph around query entities for focused retrieval
  * **Global** -- Considers the entire knowledge graph for comprehensive answers
  * **Hybrid** -- Combines local and global approaches
  * **Mix** -- Advanced combination strategy for optimal results

===== Code Example =====

<code python>
from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete, openai_embedding
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

# Initialize LightRAG with a working directory
rag = LightRAG(
    working_dir="./lightrag_data",
    llm_model_func=openai_complete,
    llm_model_name="gpt-4o",
    embedding_func=openai_embedding,
    embedding_model_name="text-embedding-3-small",
)

# Index documents (builds the knowledge graph automatically)
with open("research_paper.txt", "r") as f:
    rag.insert(f.read())

# Query with different modes
result_naive = rag.query("What are the main findings?",
                         param=QueryParam(mode="naive"))
result_local = rag.query("What did the authors conclude about X?",
                         param=QueryParam(mode="local"))
result_hybrid = rag.query("How does this relate to the broader field?",
                          param=QueryParam(mode="hybrid"))

print(result_hybrid)
</code>

===== References =====

  * [[https://github.com/HKUDS/LightRAG|LightRAG GitHub Repository]]
  * [[https://arxiv.org/html/2410.05779v1|LightRAG Paper (arXiv)]]
  * [[https://lightrag.github.io|LightRAG Project Page]]

===== See Also =====

  * [[ragflow|RAGFlow]] -- RAG engine with deep document understanding
  * [[dify|Dify]] -- Agentic workflow platform with RAG
  * [[milvus|Milvus]] -- Vector database for embedding storage
  * [[chromadb|ChromaDB]] -- AI-native embedding database
  * [[qdrant|Qdrant]] -- High-performance vector database