Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
Causal analysis – determining what causes what – is one of the most important yet technically demanding tasks in data science. Causal-Copilot (2025) is an LLM-powered autonomous agent that automates the entire causal analysis pipeline: from data ingestion and causal discovery through identification, estimation, and interpretation, all driven by natural language interaction.
Causal-Copilot operates through a modular pipeline where each stage is orchestrated by the LLM agent:
1. User Interaction: The user uploads data and specifies causal questions in natural language. The system parses queries, incorporates domain knowledge, and supports interactive feedback at every stage.
2. Preprocessing: Automatic data cleaning, schema extraction, and diagnostic analysis including tests for linearity, stationarity, and heterogeneity across subpopulations.
3. Algorithm Selection: The LLM evaluates data characteristics and selects from 20+ algorithms, then configures hyperparameters. This replaces the traditional expert-driven process of manually choosing between methods.
4. Core Analysis: Executes the selected algorithms for causal discovery, causal inference, and auxiliary analyses.
5. Postprocessing: Bootstrap evaluation for robustness, LLM-guided graph refinement, and support for user revisions to the causal graph.
6. Report Generation: Produces visualizations, natural language interpretations, and LaTeX reports.
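The algorithm-selection step (stage 3) can be sketched as a mapping from data diagnostics to a candidate method. The diagnostic fields and selection rules below are illustrative assumptions for a rule-based fallback, not Causal-Copilot's actual policy, which is decided by the LLM:

```python
# Illustrative sketch: mapping data diagnostics to a discovery algorithm.
# Field names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Diagnostics:
    linear: bool              # relationships approximately linear?
    gaussian_noise: bool      # residuals approximately Gaussian?
    latent_confounders: bool  # unmeasured common causes suspected?
    n_variables: int

def select_discovery_algorithm(d: Diagnostics) -> str:
    if d.latent_confounders:
        return "FCI"      # constraint-based, handles latent confounders
    if d.linear and not d.gaussian_noise:
        return "LiNGAM"   # non-Gaussianity makes the linear model identifiable
    if d.linear and d.n_variables > 30:
        return "NOTEARS"  # continuous optimization scales to larger graphs
    return "PC"           # general-purpose constraint-based default

print(select_discovery_algorithm(
    Diagnostics(linear=True, gaussian_noise=False,
                latent_confounders=False, n_variables=10)))  # LiNGAM
```

In the actual system, the LLM reasons over these diagnostics plus the user's question in a prompt; a deterministic rule table like this is only a lower bound on that behavior.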
Causal-Copilot integrates methods across the full spectrum of causal analysis:
Causal Discovery (Graph Structure Learning):
| Family | Methods |
|---|---|
| Constraint-based | PC, FCI (handles latent confounders) |
| Score-based | GES (Greedy Equivalence Search) |
| Optimization-based | NOTEARS (continuous optimization for DAGs) |
| Functional | LiNGAM family (non-Gaussian identification) |
The NOTEARS optimization formulates DAG learning as a continuous problem:
$$\min_{W} \frac{1}{2n} \|X - XW\|_F^2 + \lambda \|W\|_1 \quad \text{s.t.} \quad h(W) = 0$$
where $h(W) = \text{tr}(e^{W \circ W}) - d$ is the acyclicity constraint.
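The acyclicity term can be checked numerically: $h(W)$ is zero exactly when the weighted adjacency matrix $W$ encodes a DAG, and positive otherwise. A minimal pure-Python sketch (matrix exponential via truncated Taylor series, fine for small matrices; all helper names are ours):

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(M, terms=30):
    """Matrix exponential e^M via truncated Taylor series."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    power = [row[:] for row in result]
    for k in range(1, terms):
        power = mat_mul(power, M)  # M^k
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j] / math.factorial(k)
    return result

def acyclicity(W):
    """h(W) = tr(exp(W ∘ W)) - d; zero iff W encodes a DAG."""
    d = len(W)
    M = [[W[i][j] ** 2 for j in range(d)] for i in range(d)]  # Hadamard W ∘ W
    return sum(mat_exp(M)[i][i] for i in range(d)) - d

dag = [[0.0, 1.0], [0.0, 0.0]]    # edge 0 -> 1, acyclic
cycle = [[0.0, 1.0], [1.0, 0.0]]  # 0 <-> 1, cyclic
print(acyclicity(dag))    # 0.0
print(acyclicity(cycle))  # 2·cosh(1) − 2 ≈ 1.086
```

The squaring inside $W \circ W$ makes the penalty differentiable and sign-independent, which is what lets NOTEARS enforce the DAG constraint with smooth optimization rather than combinatorial search.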
Beyond structure learning, the integrated methods also cover causal inference (effect estimation) and auxiliary analyses.
```python
class CausalCopilot:
    def __init__(self, llm, method_registry):
        self.llm = llm
        self.methods = method_registry

    def analyze(self, data, question):
        # Stages 2-3: diagnose data, let the LLM pick algorithms
        diagnostics = self.preprocess(data)
        selected_methods = self.select_algorithms(diagnostics, question)
        # Stage 4: causal discovery, then LLM-guided graph refinement
        causal_graph = self.discover(data, selected_methods["discovery"])
        causal_graph = self.refine_graph(causal_graph, question)
        effects = self.estimate(
            data,
            causal_graph,
            selected_methods["inference"],
            treatment=question.treatment,
            outcome=question.outcome,
        )
        # Stages 5-6: bootstrap robustness check, then report generation
        robust = self.bootstrap_evaluate(effects, n_iterations=500)
        report = self.generate_report(causal_graph, robust, question)
        return report

    def select_algorithms(self, diagnostics, question):
        prompt = self.build_selection_prompt(diagnostics, question)
        selection = self.llm.reason(prompt)
        return {
            "discovery": self.methods.get(selection.discovery_method),
            "inference": self.methods.get(selection.inference_method),
        }

    def discover(self, data, method):
        raw_graph = method.fit(data)
        # Refit on 100 bootstrap subsamples to score edge stability
        bootstrap_graphs = [
            method.fit(data.sample(frac=0.8)) for _ in range(100)
        ]
        edge_confidence = self.compute_edge_stability(bootstrap_graphs)
        return self.prune_unstable_edges(raw_graph, edge_confidence)

    def refine_graph(self, graph, question):
        refinement = self.llm.evaluate_graph(graph, question.domain)
        return graph.apply_refinements(refinement)
```
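The bootstrap edge-stability helpers used in `discover` are not spelled out above. One plausible standalone implementation, representing each graph as a set of directed `(parent, child)` edges (that representation and the 0.5 threshold are our assumptions, not Causal-Copilot's API):

```python
from collections import Counter

def compute_edge_stability(bootstrap_graphs):
    """Fraction of bootstrap graphs in which each directed edge appears."""
    counts = Counter(edge for g in bootstrap_graphs for edge in g)
    n = len(bootstrap_graphs)
    return {edge: c / n for edge, c in counts.items()}

def prune_unstable_edges(graph, edge_confidence, threshold=0.5):
    """Keep only edges appearing in at least `threshold` of the bootstraps."""
    return {e for e in graph if edge_confidence.get(e, 0.0) >= threshold}

boots = [{("X", "Y"), ("Y", "Z")}, {("X", "Y")}, {("X", "Y"), ("Z", "Y")}]
conf = compute_edge_stability(boots)
print(prune_unstable_edges({("X", "Y"), ("Y", "Z")}, conf))
# ("X","Y") survives (3/3 bootstraps); ("Y","Z") is pruned (1/3 < 0.5)
```

Pruning edges by bootstrap frequency trades a little recall for precision, which is why the postprocessing stage pairs it with LLM-guided refinement and user revision.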
Causal-Copilot consistently outperforms individual algorithms across diverse scenarios:
Tabular Data (F1 Score):
| Scenario | Causal-Copilot | PC | FCI | GES |
|---|---|---|---|---|
| Dense Graph (p=0.5) | 0.65 | 0.41 | 0.44 | 0.40 |
| Large Scale (p=50) | 0.94 | 0.70 | 0.79 | N/A |
| Non-Gaussian Noise | 0.97 | 0.84 | 0.85 | 0.86 |
| Heterogeneous Domains | 0.77 | 0.51 | 0.62 | 0.40 |
Time Series Data (F1 Score):
| Scenario | Causal-Copilot | PCMCI | DYNOTEARS |
|---|---|---|---|
| Small (p=5, lag=3) | 0.98 | 0.92 | 0.97 |
| Large Lag (lag=20) | 0.85 | 0.84 | 0.77 |
The agent's advantage is largest in the challenging scenarios (extreme scale, non-Gaussian noise, heterogeneous domains) where choosing the right algorithm matters most.
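The F1 scores in the tables above measure how well the recovered graph matches the ground truth. A minimal sketch of that metric over directed edges (whether the benchmark scores directed or undirected adjacencies is our assumption here):

```python
def edge_f1(true_edges, predicted_edges):
    """F1 of edge recovery: harmonic mean of precision and recall."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    tp = len(true_edges & predicted_edges)  # correctly recovered edges
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)

truth = {("X", "Y"), ("Y", "Z"), ("X", "Z")}
pred = {("X", "Y"), ("Y", "Z"), ("Z", "X")}  # one edge reversed
print(edge_f1(truth, pred))  # 2/3 ≈ 0.667
```

Note that a reversed edge is penalized twice under this directed convention: once as a false positive and once as a missed true edge.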