RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine developed by InfiniFlow. It specializes in deep document understanding, with parsing capabilities that include OCR, table structure recognition, and document layout analysis. With over 76,000 GitHub stars, it is aimed at complex documents that other RAG systems struggle with.
| Field | Value |
|---|---|
| Repository | github.com/infiniflow/ragflow |
| License | Apache 2.0 |
| Language | Python |
| Stars | 76K+ |
| Category | RAG Engine |
RAGFlow decouples data extraction from chunking (since v0.17.0), allowing independent selection of visual models for each processing task. The pipeline flows through ingestion, parsing, embedding, retrieval, and generation stages.
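To make the five stages concrete, here is a toy sketch of the flow in plain Python. It is not RAGFlow's implementation: every function name is hypothetical, the sample text is made up, the "embedding" is a bag-of-words count rather than a model, and generation only assembles a prompt. Parsing is split into `extract` and `chunk` to mirror the decoupling described above, so either step could be swapped without touching the other.

```python
import math
import re
from collections import Counter


def ingest(raw: bytes) -> str:
    """Ingestion: decode raw bytes (stand-in for a file upload)."""
    return raw.decode("utf-8")


def extract(text: str) -> list[str]:
    """Parsing, step 1 (extraction): split the document into paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def chunk(paragraphs: list[str], max_words: int = 40) -> list[str]:
    """Parsing, step 2 (chunking): pack whole paragraphs into chunks.

    A paragraph longer than max_words is kept whole in its own chunk.
    """
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Counter:
    """Embedding: bag-of-words counts (a real system would call a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def generate(question: str, context: list[str]) -> str:
    """Generation: build the prompt an LLM would receive (no model call)."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"


# Made-up sample document with two paragraphs.
raw = b"Q3 revenue rose to 12 million dollars.\n\nThe office moved to a new building in May."
question = "What was the Q3 revenue?"

chunks = chunk(extract(ingest(raw)), max_words=8)
index = [(c, embed(c)) for c in chunks]
top = retrieve(question, index)
prompt = generate(question, top)
print(prompt)
```

Because `extract` and `chunk` only share a list of paragraphs, a layout-aware extractor could replace the paragraph splitter while the chunking policy stays the same.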
RAGFlow's parsing capabilities (OCR, table structure recognition, and document layout analysis) are its core differentiator.
The following example creates a dataset, uploads a document, and queries it through the HTTP API:

```python
import requests

RAGFLOW_API = "http://localhost:9380/api/v1"
API_KEY = "ragflow-your-api-key"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Create a knowledge base (dataset)
dataset = requests.post(
    f"{RAGFLOW_API}/datasets",
    headers=HEADERS,
    json={"name": "technical_docs", "chunk_method": "naive"},
).json()
dataset_id = dataset["data"]["id"]

# Upload a document (multipart upload, so only the Authorization header is sent)
with open("complex_report.pdf", "rb") as f:
    upload = requests.post(
        f"{RAGFLOW_API}/datasets/{dataset_id}/documents",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
    ).json()

# Query the knowledge base with RAG
answer = requests.post(
    f"{RAGFLOW_API}/chats",
    headers=HEADERS,
    json={"question": "What were the Q3 revenue figures?", "dataset_ids": [dataset_id]},
).json()
print(answer["data"]["answer"])
```
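The example above indexes straight into `["data"]`, so a failed call surfaces as a `KeyError` or `TypeError` rather than a readable message. A small guard can make failures clearer. This is a minimal sketch: `unwrap` is a hypothetical helper, not part of RAGFlow, and the error payload used in the demonstration is invented for illustration. It assumes only what the example already relies on, namely that a successful response carries its payload under a `"data"` key.

```python
def unwrap(body: object) -> object:
    """Return body["data"], or raise a readable error if it is missing."""
    if not isinstance(body, dict) or body.get("data") is None:
        raise RuntimeError(f"API call returned no data: {body!r}")
    return body["data"]


# Hypothetical wiring with the requests calls from the example:
#   dataset_id = unwrap(requests.post(...).json())["id"]

# Demonstration with hand-written payloads (both are made up).
print(unwrap({"data": {"id": "abc123"}})["id"])
try:
    unwrap({"message": "invalid API key"})
except RuntimeError as err:
    print("failed:", err)
```

Calling `raise_for_status()` on the `requests` response before parsing the JSON would additionally catch HTTP-level errors.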