LightRAG

LightRAG is a knowledge-graph-based retrieval-augmented generation (RAG) framework that integrates graph structures with vector representations to enable efficient, dual-level information retrieval over document collections. With over 30,000 GitHub stars and an associated EMNLP 2025 paper, it represents a significant evolution beyond classical RAG approaches.

Repository: github.com/HKUDS/LightRAG
License: MIT
Language: Python
Stars: 30K+
Category: Knowledge Graph RAG

Architecture

LightRAG's architecture consists of three primary components that work together to provide graph-enhanced retrieval:

1. Graph-Based Text Indexing: Documents are segmented into chunks, and LLMs extract entities (names, dates, locations, events) and their relationships. This constructs a comprehensive knowledge graph with key-value data structures for optimized retrieval.

2. Dual-Level Retrieval Paradigm: The system generates query keys at both detailed and abstract levels to accommodate diverse query types.

3. Retrieval-Augmented Answer Generation: A general-purpose LLM generates answers by processing concatenated values from relevant entities, relations, names, descriptions, and text excerpts.
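The indexing step above can be sketched in simplified form. The structures and the fake extraction output below are illustrative assumptions, not LightRAG's actual API: in the real system an LLM performs the extraction and the index also stores vector embeddings and generated descriptions.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # entity name -> list of descriptions (key-value pairs)
    entities: dict = field(default_factory=dict)
    # (source, target) -> list of relationship descriptions
    relations: dict = field(default_factory=dict)

    def add_chunk(self, extracted):
        """Merge one chunk's extracted entities/relations into the graph."""
        for name, desc in extracted["entities"]:
            # Entities seen in multiple chunks accumulate descriptions.
            self.entities.setdefault(name, []).append(desc)
        for src, tgt, desc in extracted["relations"]:
            self.relations.setdefault((src, tgt), []).append(desc)

# In LightRAG an LLM produces this structure; here we hard-code a toy output.
chunk_extraction = {
    "entities": [("LightRAG", "a graph-based RAG framework"),
                 ("EMNLP", "an NLP conference")],
    "relations": [("LightRAG", "EMNLP", "paper published at")],
}

kg = KnowledgeGraph()
kg.add_chunk(chunk_extraction)
print(sorted(kg.entities))                    # ['EMNLP', 'LightRAG']
print(kg.relations[("LightRAG", "EMNLP")])    # ['paper published at']
```

Merging per-chunk extractions into one graph is what lets entities mentioned across many chunks accumulate context, which the retrieval stage then exploits.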

graph TB
    subgraph Indexing["Graph-Based Indexing"]
        Docs[Document Chunks]
        Extract[Entity Extraction via LLM]
        RelEx[Relationship Extraction]
        KG[(Knowledge Graph)]
        KV[Key-Value Index]
    end
    subgraph Retrieval["Dual-Level Retrieval"]
        Query[User Query]
        Low[Low-Level: Entity/Edge Lookup]
        High[High-Level: Topic Aggregation]
        Vec[Vector Similarity Search]
        Graph[Graph Traversal]
    end
    subgraph Generation["Answer Generation"]
        Merge[Context Merging]
        LLM[LLM Generation]
        Answer[Final Answer]
    end
    Docs --> Extract
    Extract --> RelEx
    RelEx --> KG
    KG --> KV
    Query --> Low
    Query --> High
    Low --> Vec
    Low --> Graph
    High --> Vec
    High --> Graph
    Vec --> Merge
    Graph --> Merge
    Merge --> LLM
    LLM --> Answer

Dual-Level Retrieval

The retrieval paradigm operates at two distinct levels:

- Low-level retrieval targets specific entities and their relations (graph edges), serving detail-oriented questions about particular names, dates, events, or facts.
- High-level retrieval aggregates information across broader topics and themes, serving abstract or summary-style questions that span many entities.

By generating query keys at both levels, LightRAG ensures comprehensive, contextually relevant responses regardless of query complexity.
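The dual-level lookup can be illustrated with a toy sketch. The function below is a hypothetical simplification: LightRAG matches extracted query keywords against the graph and key-value index using vector similarity, whereas this sketch uses plain substring overlap.

```python
def dual_level_retrieve(query_keywords, entities, relations):
    """Toy dual-level lookup: the low level matches specific entity names,
    the high level matches broader relation/topic descriptions.
    (LightRAG uses vector similarity; plain keyword overlap here.)"""
    low = {name: desc for name, desc in entities.items()
           if any(kw.lower() in name.lower() for kw in query_keywords)}
    high = {pair: desc for pair, desc in relations.items()
            if any(kw.lower() in desc.lower() for kw in query_keywords)}
    return low, high

# Hypothetical graph contents for illustration.
entities = {"LightRAG": "graph-based RAG framework",
            "GraphRAG": "Microsoft's graph RAG system"}
relations = {("LightRAG", "GraphRAG"): "both use knowledge graphs for retrieval"}

low, high = dual_level_retrieve(["LightRAG", "retrieval"], entities, relations)
print(list(low))    # entity-level hits: ['LightRAG']
print(list(high))   # topic-level hits: [('LightRAG', 'GraphRAG')]
```

Note that the two levels fire on different parts of the query: the specific keyword "LightRAG" hits an entity, while the abstract keyword "retrieval" hits a relation description. Merging both result sets is what lets one mechanism serve both detail-oriented and thematic questions.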

Query Modes

LightRAG supports five distinct query modes, selected via QueryParam(mode=...):

- naive: standard vector-similarity search over text chunks, without using the graph.
- local: low-level retrieval centered on specific entities and their immediate neighbors.
- global: high-level retrieval over relationships and broader topics.
- hybrid: combines local and global retrieval.
- mix: integrates graph-based retrieval with vector search over the raw chunks.

Code Example

from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete, openai_embedding
import os
 
os.environ["OPENAI_API_KEY"] = "your-api-key"
 
# Initialize LightRAG with a working directory for its graph and index files
# (constructor parameters vary between versions; see the repository README
# for the initialization pattern matching your installed release)
rag = LightRAG(
    working_dir="./lightrag_data",
    llm_model_func=openai_complete,
    llm_model_name="gpt-4o",
    embedding_func=openai_embedding,
)
 
# Index documents (builds knowledge graph automatically)
with open("research_paper.txt", "r") as f:
    rag.insert(f.read())
 
# Query with different modes
result_naive = rag.query("What are the main findings?",
                          param=QueryParam(mode="naive"))
result_local = rag.query("What did the authors conclude about X?",
                          param=QueryParam(mode="local"))
result_hybrid = rag.query("How does this relate to the broader field?",
                           param=QueryParam(mode="hybrid"))
 
print(result_hybrid)
