====== LightRAG ======

**LightRAG** is a knowledge graph-based retrieval-augmented generation framework that integrates graph structures with vector representations to enable efficient, dual-level information retrieval from document collections. With over **30,000 GitHub stars** and an associated EMNLP 2025 paper, it represents a significant evolution beyond classical RAG approaches.

| **Repository** | [[https://github.com/HKUDS/LightRAG|github.com/HKUDS/LightRAG]] |
| **License** | MIT |
| **Language** | Python |
| **Stars** | 30K+ |
| **Category** | Knowledge Graph RAG |

===== Key Features =====

  * **Knowledge Graph Construction** -- Automatically extracts entities and relationships from documents using LLMs, building a comprehensive knowledge graph
  * **Dual-Level Retrieval** -- Low-level retrieval for precise entity/edge queries and high-level retrieval for broad topic aggregation
  * **Five Query Modes** -- Naive, Local, Global, Hybrid, and Mix retrieval strategies
  * **Incremental Updates** -- Seamlessly integrates new documents by merging graph data without rebuilding the entire index
  * **Web UI** -- Built-in web server with knowledge graph visualization and API support
  * **Multi-Hop Reasoning** -- Graph structures enable extraction of global information from multi-hop subgraphs

===== Architecture =====

LightRAG's architecture consists of three primary components that work together to provide graph-enhanced retrieval:

**1. Graph-Based Text Indexing:** Documents are segmented into chunks, and LLMs extract entities (names, dates, locations, events) and their relationships. This constructs a comprehensive knowledge graph with key-value data structures for optimized retrieval.

**2. Dual-Level Retrieval Paradigm:** The system generates query keys at both detailed and abstract levels to accommodate diverse query types.

**3. Retrieval-Augmented Answer Generation:** A general-purpose LLM generates answers by processing concatenated values from relevant entities, relations, names, descriptions, and text excerpts.

<code mermaid>
graph TB
    subgraph Indexing["Graph-Based Indexing"]
        Docs[Document Chunks]
        Extract[Entity Extraction via LLM]
        RelEx[Relationship Extraction]
        KG[(Knowledge Graph)]
        KV[Key-Value Index]
    end
    subgraph Retrieval["Dual-Level Retrieval"]
        Query[User Query]
        Low["Low-Level: Entity/Edge Lookup"]
        High["High-Level: Topic Aggregation"]
        Vec[Vector Similarity Search]
        Graph[Graph Traversal]
    end
    subgraph Generation["Answer Generation"]
        Merge[Context Merging]
        LLM[LLM Generation]
        Answer[Final Answer]
    end
    Docs --> Extract
    Extract --> RelEx
    RelEx --> KG
    KG --> KV
    Query --> Low
    Query --> High
    Low --> Vec
    Low --> Graph
    High --> Vec
    High --> Graph
    Vec --> Merge
    Graph --> Merge
    Merge --> LLM
    LLM --> Answer
</code>

===== Dual-Level Retrieval =====

The retrieval paradigm operates at two distinct levels:

  * **Low-Level Retrieval** -- Detail-oriented queries that extract precise information about specific nodes or edges within the graph. Example: "Who wrote Pride and Prejudice?" targets a specific entity-relationship pair.
  * **High-Level Retrieval** -- Broader topic queries that aggregate information across multiple related entities and relationships, providing insights into higher-level concepts and summaries.

By generating query keys at both levels, LightRAG ensures comprehensive, contextually relevant responses regardless of query complexity.
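The distinction between the two levels can be illustrated with a small self-contained sketch in plain Python. This is a conceptual toy, not LightRAG's internals: the graph data, entity names, and helper functions below are all hypothetical.

<code python>
# Toy knowledge graph: entities with descriptions, plus typed edges.
# All names and data here are illustrative, not part of LightRAG's API.
ENTITIES = {
    "Jane Austen": {"type": "person", "desc": "English novelist"},
    "Pride and Prejudice": {"type": "book", "desc": "1813 novel of manners"},
    "Sense and Sensibility": {"type": "book", "desc": "1811 novel"},
}
EDGES = [
    ("Jane Austen", "wrote", "Pride and Prejudice"),
    ("Jane Austen", "wrote", "Sense and Sensibility"),
]

def low_level_retrieve(entity: str, relation: str) -> list[str]:
    """Detail-oriented lookup: match a specific entity/relation pair."""
    return [dst for src, rel, dst in EDGES
            if src == entity and rel == relation]

def high_level_retrieve(entity: str) -> dict[str, str]:
    """Topic aggregation: gather descriptions across the entity's
    neighbourhood (one hop here, for brevity)."""
    neighbours = {dst for src, _, dst in EDGES if src == entity}
    neighbours |= {src for src, _, dst in EDGES if dst == entity}
    return {n: ENTITIES[n]["desc"] for n in neighbours}

# Low-level: "What did Jane Austen write?" -> precise edge matches
print(low_level_retrieve("Jane Austen", "wrote"))

# High-level: aggregated context around the entity for broader questions
print(high_level_retrieve("Jane Austen"))
</code>

In LightRAG itself the two levels are driven by LLM-extracted keywords and combined with vector similarity search, but the shape of the operation is the same: precise edge lookups for detail queries, neighbourhood aggregation for topic queries.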
===== Query Modes =====

LightRAG supports five distinct query modes:

  * **Naive** -- Simple retrieval without graph traversal (baseline)
  * **Local** -- Uses local subgraph around query entities for focused retrieval
  * **Global** -- Considers the entire knowledge graph for comprehensive answers
  * **Hybrid** -- Combines local and global approaches
  * **Mix** -- Advanced combination strategy for optimal results

===== Code Example =====

<code python>
from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete, openai_embedding
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

# Initialize LightRAG with a working directory
rag = LightRAG(
    working_dir="./lightrag_data",
    llm_model_func=openai_complete,
    llm_model_name="gpt-4o",
    embedding_func=openai_embedding,
    embedding_model_name="text-embedding-3-small",
)

# Index documents (builds the knowledge graph automatically)
with open("research_paper.txt", "r") as f:
    rag.insert(f.read())

# Query with different modes
result_naive = rag.query("What are the main findings?",
                         param=QueryParam(mode="naive"))
result_local = rag.query("What did the authors conclude about X?",
                         param=QueryParam(mode="local"))
result_hybrid = rag.query("How does this relate to the broader field?",
                          param=QueryParam(mode="hybrid"))

print(result_hybrid)
</code>

===== References =====

  * [[https://github.com/HKUDS/LightRAG|LightRAG GitHub Repository]]
  * [[https://arxiv.org/html/2410.05779v1|LightRAG Paper (arXiv)]]
  * [[https://lightrag.github.io|LightRAG Project Page]]

===== See Also =====

  * [[ragflow|RAGFlow]] -- RAG engine with deep document understanding
  * [[dify|Dify]] -- Agentic workflow platform with RAG
  * [[milvus|Milvus]] -- Vector database for embedding storage
  * [[chromadb|ChromaDB]] -- AI-native embedding database
  * [[qdrant|Qdrant]] -- High-performance vector database