====== LightRAG ======
**LightRAG** is a knowledge graph-based retrieval-augmented generation framework that integrates graph structures with vector representations to enable efficient, dual-level information retrieval from document collections. With over **30,000 GitHub stars** and an associated EMNLP 2025 paper, it represents a significant evolution beyond classical RAG approaches.
| **Repository** | [[https://github.com/HKUDS/LightRAG|github.com/HKUDS/LightRAG]] |
| **License** | MIT |
| **Language** | Python |
| **Stars** | 30K+ |
| **Category** | Knowledge Graph RAG |
===== Key Features =====
* **Knowledge Graph Construction** -- Automatically extracts entities and relationships from documents using LLMs, building a comprehensive knowledge graph
* **Dual-Level Retrieval** -- Low-level retrieval for precise entity/edge queries and high-level retrieval for broad topic aggregation
* **Five Query Modes** -- Naive, Local, Global, Hybrid, and Mix retrieval strategies
* **Incremental Updates** -- Seamlessly integrates new documents by merging graph data without rebuilding the entire index
* **Web UI** -- Built-in web server with knowledge graph visualization and API support
* **Multi-Hop Reasoning** -- Graph structures enable extraction of global information from multi-hop subgraphs
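The incremental-update behavior can be illustrated with a toy graph store. Plain dicts stand in for LightRAG's storage backends here; the merge logic is a simplified sketch of the idea, not the library's actual implementation:

```python
# Toy illustration of incremental graph merging: extraction results from a
# new document are upserted into an existing graph instead of rebuilding it.

def merge_into_graph(graph, new_nodes, new_edges):
    """Upsert nodes and edges; node descriptions accumulate, they are not replaced."""
    for name, desc in new_nodes.items():
        if name in graph["nodes"]:
            if desc not in graph["nodes"][name]:
                graph["nodes"][name].append(desc)
        else:
            graph["nodes"][name] = [desc]
    for (src, dst), rel in new_edges.items():
        graph["edges"][(src, dst)] = rel  # latest relation description wins
    return graph

graph = {"nodes": {}, "edges": {}}
merge_into_graph(graph,
                 {"Jane Austen": "English novelist"},
                 {("Jane Austen", "Pride and Prejudice"): "wrote"})
# A later document extends the same graph in place:
merge_into_graph(graph,
                 {"Jane Austen": "born 1775", "Emma": "1815 novel"},
                 {("Jane Austen", "Emma"): "wrote"})
```

In LightRAG itself, a repeated `rag.insert(...)` call triggers this kind of merge against the persisted graph in the working directory.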
===== Architecture =====
LightRAG's architecture consists of three primary components that work together to provide graph-enhanced retrieval:
**1. Graph-Based Text Indexing:** Documents are segmented into chunks, and LLMs extract entities (names, dates, locations, events) and their relationships. This constructs a comprehensive knowledge graph with key-value data structures for optimized retrieval.
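The indexing flow can be sketched end to end with a stub in place of the LLM. Everything here (`chunk_text`, `extract_triples`, `build_index`, the chunk sizes) is illustrative, not LightRAG's internal API:

```python
# Sketch of the indexing pipeline: chunk -> extract -> knowledge graph + KV index.
# extract_triples is a stand-in for the LLM entity/relationship extraction step.

def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def extract_triples(chunk):
    """Placeholder for LLM extraction; returns (entity, relation, entity) triples."""
    return [("Jane Austen", "wrote", "Pride and Prejudice")] if "Austen" in chunk else []

def build_index(text):
    graph, kv = {"nodes": set(), "edges": []}, {}
    for i, chunk in enumerate(chunk_text(text)):
        kv[f"chunk-{i}"] = chunk                      # raw text retrievable by key
        for subj, rel, obj in extract_triples(chunk):
            graph["nodes"].update({subj, obj})
            graph["edges"].append((subj, rel, obj))
    return graph, kv

graph, kv = build_index("Jane Austen wrote Pride and Prejudice in 1813. " * 10)
```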
**2. Dual-Level Retrieval Paradigm:** The system generates query keys at both detailed and abstract levels to accommodate diverse query types.
**3. Retrieval-Augmented Answer Generation:** A general-purpose LLM generates answers by processing concatenated values from relevant entities, relations, names, descriptions, and text excerpts.
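The concatenation step can be sketched as a prompt builder. The section layout and function name below are hypothetical; LightRAG uses its own prompt templates:

```python
# Sketch of retrieval-augmented answer generation: retrieved entity
# descriptions, relations, and text excerpts are concatenated into a single
# prompt, which any completion function can then answer from.

def build_prompt(question, entities, relations, excerpts):
    parts = ["Answer using only the context below.", "## Entities"]
    parts += [f"- {name}: {desc}" for name, desc in entities]
    parts.append("## Relations")
    parts += [f"- {s} --{r}--> {o}" for s, r, o in relations]
    parts.append("## Excerpts")
    parts += [f"> {e}" for e in excerpts]
    parts.append(f"## Question\n{question}")
    return "\n".join(parts)

prompt = build_prompt(
    "Who wrote Pride and Prejudice?",
    entities=[("Jane Austen", "English novelist")],
    relations=[("Jane Austen", "wrote", "Pride and Prejudice")],
    excerpts=["Pride and Prejudice was published in 1813."],
)
```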
<code mermaid>
graph TB
    subgraph Indexing["Graph-Based Indexing"]
        Docs[Document Chunks]
        Extract[Entity Extraction via LLM]
        RelEx[Relationship Extraction]
        KG[(Knowledge Graph)]
        KV[Key-Value Index]
    end
    subgraph Retrieval["Dual-Level Retrieval"]
        Query[User Query]
        Low[Low-Level: Entity/Edge Lookup]
        High[High-Level: Topic Aggregation]
        Vec[Vector Similarity Search]
        Graph[Graph Traversal]
    end
    subgraph Generation["Answer Generation"]
        Merge[Context Merging]
        LLM[LLM Generation]
        Answer[Final Answer]
    end
    Docs --> Extract
    Extract --> RelEx
    RelEx --> KG
    KG --> KV
    Query --> Low
    Query --> High
    Low --> Vec
    Low --> Graph
    High --> Vec
    High --> Graph
    Vec --> Merge
    Graph --> Merge
    Merge --> LLM
    LLM --> Answer
</code>
===== Dual-Level Retrieval =====
The retrieval paradigm operates at two distinct levels:
* **Low-Level Retrieval** -- Detail-oriented queries that extract precise information about specific nodes or edges within the graph. Example: "Who wrote Pride and Prejudice?" targets a specific entity-relationship pair.
* **High-Level Retrieval** -- Broader topic queries that aggregate information across multiple related entities and relationships, providing insights into higher-level concepts and summaries.
By generating query keys at both levels, LightRAG ensures comprehensive, contextually relevant responses regardless of query complexity.
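The two-level keying can be sketched with a toy keyword classifier. In LightRAG this classification is done by an LLM prompt; the capitalization heuristic below is purely illustrative:

```python
# Sketch of dual-level query keying: one query yields low-level (entity)
# keywords for precise node/edge lookup and high-level (topic) keywords
# for broader aggregation. classify_keywords stands in for the LLM step.

def classify_keywords(query):
    """Toy heuristic: capitalized words become low-level entity keys,
    remaining content words become high-level topic keys."""
    stop = {"who", "what", "the", "is", "are", "of", "and", "about"}
    low, high = [], []
    for word in query.strip("?").split():
        if word[0].isupper() and word.lower() not in stop:
            low.append(word)
        elif word.lower() not in stop:
            high.append(word.lower())
    return low, high

low, high = classify_keywords("Who wrote Pride and Prejudice?")
```

Here the low-level keys would drive entity/edge lookup in the graph, while the high-level keys would drive topic-level aggregation.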
===== Query Modes =====
LightRAG supports five distinct query modes:
* **Naive** -- Simple retrieval without graph traversal (baseline)
* **Local** -- Uses local subgraph around query entities for focused retrieval
* **Global** -- Considers the entire knowledge graph for comprehensive answers
* **Hybrid** -- Combines local and global approaches
* **Mix** -- Integrates knowledge graph retrieval with vector retrieval over raw text chunks
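The mode selection amounts to dispatching on a strategy name. This is a hypothetical sketch of that dispatch, with placeholder bodies rather than real retrieval logic; only the mode names mirror LightRAG's:

```python
# Sketch of query-mode dispatch: each mode string selects a retrieval strategy.

def retrieve(query, mode="hybrid"):
    strategies = {
        "naive":  lambda q: f"vector search over chunks: {q}",
        "local":  lambda q: f"entity subgraph around: {q}",
        "global": lambda q: f"graph-wide topic aggregation: {q}",
        "hybrid": lambda q: f"local + global results merged for: {q}",
        "mix":    lambda q: f"graph + raw-chunk vector retrieval for: {q}",
    }
    if mode not in strategies:
        raise ValueError(f"unknown mode: {mode!r}")
    return strategies[mode](query)
```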
===== Code Example =====
<code python>
from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete, openai_embedding
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

# Initialize LightRAG with a working directory
rag = LightRAG(
    working_dir="./lightrag_data",
    llm_model_func=openai_complete,
    llm_model_name="gpt-4o",
    embedding_func=openai_embedding,
    embedding_model_name="text-embedding-3-small",
)

# Index documents (builds the knowledge graph automatically)
with open("research_paper.txt", "r") as f:
    rag.insert(f.read())

# Query with different modes
result_naive = rag.query("What are the main findings?",
                         param=QueryParam(mode="naive"))
result_local = rag.query("What did the authors conclude about X?",
                         param=QueryParam(mode="local"))
result_hybrid = rag.query("How does this relate to the broader field?",
                          param=QueryParam(mode="hybrid"))

print(result_hybrid)
</code>
===== References =====
* [[https://github.com/HKUDS/LightRAG|LightRAG GitHub Repository]]
* [[https://arxiv.org/html/2410.05779v1|LightRAG Paper (arXiv)]]
* [[https://lightrag.github.io|LightRAG Project Page]]
===== See Also =====
* [[ragflow|RAGFlow]] -- RAG engine with deep document understanding
* [[dify|Dify]] -- Agentic workflow platform with RAG
* [[milvus|Milvus]] -- Vector database for embedding storage
* [[chromadb|ChromaDB]] -- AI-native embedding database
* [[qdrant|Qdrant]] -- High-performance vector database