====== Knowledge Graphs ======

Knowledge graphs provide structured representations of entities and their relationships, enabling AI agents to reason over interconnected data rather than processing information in isolation. By combining knowledge graphs with LLMs through patterns like GraphRAG, agents gain the ability to perform multi-hop reasoning, track entity relationships, and maintain factual grounding that flat vector search cannot achieve.

===== What is a Knowledge Graph? =====

A knowledge graph represents information as a network of **entities** (nodes) connected by **relationships** (edges), with properties on both. Unlike tabular databases, knowledge graphs excel at representing complex, interconnected domains where the relationships between things are as important as the things themselves.

  * **Nodes** represent entities: people, companies, concepts, documents
  * **Edges** represent relationships: "works_at", "cites", "related_to"
  * **Properties** store attributes on both nodes and edges

Formally, a knowledge graph is a set of triples $G = \{(h, r, t) \mid h, t \in \mathcal{E}, r \in \mathcal{R}\}$, where $\mathcal{E}$ is the set of entities, $\mathcal{R}$ is the set of relation types, $h$ is the head entity, $r$ is the relation, and $t$ is the tail entity.

===== Property Graphs vs. RDF =====

| **Model** | **Structure** | **Query Language** | **Best For** |
| Property Graph | Nodes and edges with key-value properties | Cypher (Neo4j), GQL | Application development, flexible schemas |
| RDF (Triples) | Subject-predicate-object triples | SPARQL | Linked data, ontology-driven reasoning |

Property graphs (used by [[https://neo4j.com|Neo4j]], Amazon Neptune, Memgraph) dominate agent applications due to their flexibility and developer-friendly query languages.

===== Knowledge Graph Embeddings =====

Knowledge graph embedding methods learn low-dimensional vector representations for entities and relations, enabling link prediction and reasoning.
Common scoring functions include:

  * **TransE**: Models relations as translations in the embedding space, scoring $f(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$, where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the vector embeddings of the head, relation, and tail
  * **DistMult**: Uses bilinear scoring $f(h, r, t) = \mathbf{h}^\top \text{diag}(\mathbf{r}) \, \mathbf{t} = \sum_i h_i \cdot r_i \cdot t_i$. The score is symmetric in $h$ and $t$, so DistMult cannot distinguish the direction of a relation
  * **ComplEx**: Extends DistMult to complex-valued embeddings, $f(h, r, t) = \text{Re}(\sum_i h_i \cdot r_i \cdot \bar{t}_i)$, which allows it to model asymmetric relations

===== Knowledge Graphs for AI Agents =====

Knowledge graphs serve as centralized, dynamic hubs that allow specialized agents to access and share contextual data. Key capabilities include:

  * **Structured retrieval**: Agents traverse relationships to find connected information that vector similarity search misses
  * **Multi-hop reasoning**: Following chains of relationships to answer complex questions (e.g., "Which suppliers serve customers in regions with regulatory changes?")
  * **Entity resolution**: Disambiguating references to the same entity across different data sources
  * **Temporal tracking**: Recording when facts were true, enabling reasoning about change over time
  * **Shared context**: Multiple agents access the same knowledge graph as a shared source of truth

===== GraphRAG =====

[[https://microsoft.github.io/graphrag/|Microsoft's GraphRAG]] combines knowledge graph construction with retrieval-augmented generation:

  - **Entity extraction**: LLMs extract entities and relationships from source documents
  - **Graph construction**: Extracted entities are organized into a knowledge graph with community detection
  - **Hierarchical summarization**: Communities are summarized at multiple levels of abstraction
  - **Graph-enhanced retrieval**: Queries traverse the graph structure for context, supplementing vector search

GraphRAG excels on complex analytical queries that require understanding entity relationships across many documents.
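
The graph construction, community grouping, and graph-enhanced retrieval steps above can be illustrated without a database. The following is a minimal sketch under stated assumptions and is not Microsoft GraphRAG's implementation: the triples are invented for illustration, connected components stand in for a real community detection algorithm, and the function names (''build_adjacency'', ''connected_components'', ''k_hop_context'') are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical toy triples (head, relation, tail); not from any real dataset.
TRIPLES = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
    ("Bob", "WORKS_AT", "Acme"),
    ("Carol", "WORKS_AT", "Globex"),
    ("Globex", "LOCATED_IN", "Paris"),
]


def build_adjacency(triples):
    """Index triples by entity, treating edges as undirected for traversal."""
    adjacency = defaultdict(list)
    for triple in triples:
        head, _, tail = triple
        adjacency[head].append((tail, triple))
        adjacency[tail].append((head, triple))
    return adjacency


def connected_components(adjacency):
    """Group entities into communities.

    Connected components are a deliberately simple stand-in for a real
    community detection algorithm.
    """
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor, _ in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        communities.append(component)
    return communities


def k_hop_context(adjacency, entity, max_hops=2):
    """Collect every triple touched within max_hops of the starting entity."""
    visited = {entity}
    frontier = deque([(entity, 0)])
    context = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, triple in adjacency[node]:
            if triple not in context:
                context.append(triple)
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return context


adjacency = build_adjacency(TRIPLES)
communities = connected_components(adjacency)
context = k_hop_context(adjacency, "Alice", max_hops=2)
```

With two hops, the context for ''Alice'' includes the fact that Acme is located in Berlin, which a one-hop lookup would miss. In a fuller pipeline, the collected triples (or summaries of the matching community) would be passed to the LLM as retrieval context alongside vector search results.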
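===== Example: Agent with Knowledge Graph =====

<code python>
import re

from neo4j import GraphDatabase

# Matches one "(subject, relationship, object)" triple per match.
TRIPLE_PATTERN = re.compile(r"\(([^,()]+),([^,()]+),([^,()]+)\)")


def parse_triples(text: str) -> list[tuple[str, str, str]]:
    """Parse "(subject, relationship, object)" triples from LLM output.

    Illustrative helper; assumes the LLM follows the requested format
    and that entity names contain no commas or parentheses.
    """
    return [
        tuple(part.strip().strip("'\"") for part in match)
        for match in TRIPLE_PATTERN.findall(text)
    ]


class KnowledgeGraphAgent:
    def __init__(self, uri, auth, llm_client):
        # llm_client is assumed to expose invoke(prompt: str) -> str
        self.driver = GraphDatabase.driver(uri, auth=auth)
        self.llm = llm_client

    def close(self):
        self.driver.close()

    def query_graph(self, question: str) -> str:
        # The LLM generates a Cypher query from natural language.
        # The schema matches what add_knowledge() writes below.
        cypher = self.llm.invoke(
            f"Convert this question to a Neo4j Cypher query. "
            f"Return only the query.\n{question}\n"
            "Schema: (:Entity {name})-[:RELATES {type}]->(:Entity {name})"
        )
        # Execute the generated query. LLM-written Cypher is untrusted:
        # use a read-only database user or validate it before running.
        with self.driver.session() as session:
            data = [record.data() for record in session.run(cypher)]
        # Generate a natural language response from the graph data
        return self.llm.invoke(
            f"Question: {question}\n"
            f"Graph results: {data}\n"
            f"Provide a clear answer based on these results."
        )

    def add_knowledge(self, text: str) -> None:
        # Extract entities and relationships from the text
        raw_triples = self.llm.invoke(
            "Extract entities and relationships from this text as "
            f"(subject, relationship, object) triples, one per line:\n{text}"
        )
        # Store each triple; MERGE avoids duplicate nodes and edges
        with self.driver.session() as session:
            for subject, rel, obj in parse_triples(raw_triples):
                session.run(
                    "MERGE (a:Entity {name: $subj}) "
                    "MERGE (b:Entity {name: $obj}) "
                    "MERGE (a)-[r:RELATES {type: $rel}]->(b)",
                    subj=subject, obj=obj, rel=rel,
                )


# Requires a running Neo4j instance; llm_client is any object with invoke().
agent = KnowledgeGraphAgent(
    "bolt://localhost:7687", ("neo4j", "password"), llm_client
)
agent.add_knowledge("Anthropic, founded by Dario Amodei, is headquartered in San Francisco.")
answer = agent.query_graph("Where is the company founded by Dario Amodei located?")
agent.close()
</code>

===== Entity Extraction =====

Modern entity extraction for knowledge graphs uses LLMs to identify entities and relationships from unstructured text.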
Key approaches:

  * **Zero-shot extraction**: Prompt LLMs to extract entities without examples
  * **NER + relation extraction**: Combine named entity recognition with relationship classification
  * **Iterative refinement**: Multiple passes to resolve coreferences and disambiguate entities
  * **Schema-guided extraction**: Constrain extraction to a predefined ontology

===== Graph Databases =====

| **Database** | **Type** | **Query Language** | **Agent Integration** |
| [[https://neo4j.com|Neo4j]] | Property Graph | Cypher | LangChain, LlamaIndex, direct Bolt protocol |
| Amazon Neptune | Property Graph + RDF | Gremlin, openCypher, SPARQL | AWS ecosystem, SageMaker |
| Memgraph | Property Graph | Cypher | Real-time streaming, in-memory |
| FalkorDB | Property Graph | Cypher | Redis-compatible, low latency |

===== References =====

  * [[https://microsoft.github.io/graphrag/|Microsoft GraphRAG]]
  * [[https://neo4j.com|Neo4j Graph Database]]
  * [[https://beam.ai/agentic-insights/5-ways-knowledge-graphs-are-quietly-reshaping-ai-workflows-in-2026|Knowledge Graphs Reshaping AI Workflows 2026]]

===== See Also =====

  * [[retrieval_augmented_generation]]: RAG patterns including GraphRAG
  * [[embeddings]]: Vector representations complementing graph retrieval
  * [[agent_memory_frameworks]]: Memory systems using knowledge graphs
  * [[agent_orchestration]]: Multi-agent systems sharing knowledge graphs