
Causal Reasoning Agents: Causal-Copilot

Causal analysis – determining what causes what – is one of the most important yet technically demanding tasks in data science. Causal-Copilot (2025) is an LLM-powered autonomous agent that automates the entire causal analysis pipeline: from data ingestion and causal discovery through identification, estimation, and interpretation, all driven by natural language interaction.

End-to-End Causal Analysis Pipeline

Causal-Copilot operates through a modular pipeline where each stage is orchestrated by the LLM agent:

1. User Interaction: The user uploads data and specifies causal questions in natural language. The system parses queries, incorporates domain knowledge, and supports interactive feedback at every stage.

2. Preprocessing: Automatic data cleaning, schema extraction, and diagnostic analysis including tests for linearity, stationarity, and heterogeneity across subpopulations.

3. Algorithm Selection: The LLM evaluates data characteristics and selects from 20+ algorithms, then configures hyperparameters. This replaces the traditional expert-driven process of manually choosing between methods.

4. Core Analysis: Executes the selected algorithms for causal discovery, causal inference, and auxiliary analyses.

5. Postprocessing: Bootstrap evaluation for robustness, LLM-guided graph refinement, and support for user revisions to the causal graph.

6. Report Generation: Produces visualizations, natural language interpretations, and LaTeX reports.
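Steps 2 and 3 can be illustrated with a small rule-based stand-in for the LLM's decision. The sketch below is not Causal-Copilot's code. It assumes numeric tabular data in a NumPy array and covers only one diagnostic: it regresses each variable on the others and applies SciPy's D'Agostino-Pearson `normaltest` to the residuals to flag non-Gaussian noise. It then maps the diagnostics to an algorithm family using hand-written rules. `diagnose`, `select_discovery_method`, the 0.5 threshold, and the rules themselves are hypothetical. The real system delegates this choice to the LLM and also considers linearity, stationarity, and heterogeneity.

```python
import numpy as np
from scipy import stats


def diagnose(X: np.ndarray, alpha: float = 0.01) -> dict:
    """Hypothetical diagnostics for an (n_samples, n_vars) numeric array."""
    n, d = X.shape
    non_gaussian = []
    for j in range(d):
        # Regress variable j on all other variables (OLS with intercept)
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        # D'Agostino-Pearson test: small p-value => residuals look non-Gaussian
        non_gaussian.append(stats.normaltest(resid).pvalue < alpha)
    return {"n_samples": n, "n_vars": d,
            "frac_non_gaussian": float(np.mean(non_gaussian))}


def select_discovery_method(diag: dict, latent_confounders: bool = False) -> str:
    """Hand-written rules standing in for the LLM's choice (illustrative only)."""
    if latent_confounders:
        return "FCI"      # FCI allows for hidden common causes
    if diag["frac_non_gaussian"] > 0.5:
        return "LiNGAM"   # non-Gaussian noise can identify edge directions (linear case)
    return "PC"           # default constraint-based search


# Usage: linear chain with uniform (non-Gaussian) noise
rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(-1, 1, n)
x2 = 0.8 * x1 + rng.uniform(-1, 1, n)
x3 = -0.5 * x2 + rng.uniform(-1, 1, n)
diag = diagnose(np.column_stack([x1, x2, x3]))
print(select_discovery_method(diag))
```

On this data the residuals of all three variables are clearly non-Gaussian, so the rules pick LiNGAM. Passing `latent_confounders=True` (for example, from user-supplied domain knowledge) switches the choice to FCI.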

Supported Causal Methods

Causal-Copilot integrates methods across the full spectrum of causal analysis:

Causal Discovery (Graph Structure Learning):

| Family | Methods |
|---|---|
| Constraint-based | PC, FCI (handles latent confounders) |
| Score-based | GES (Greedy Equivalence Search) |
| Optimization-based | NOTEARS (continuous optimization for DAGs) |
| Functional | LiNGAM family (identification from non-Gaussian noise) |
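The constraint-based family is easiest to see through the skeleton phase of PC. The search starts from a complete undirected graph and deletes the edge between X and Y whenever some conditioning set S, drawn from the current neighbours, makes X and Y conditionally independent. The sketch below assumes linear-Gaussian data and tests independence with a Fisher-z test on partial correlations. It omits the orientation phase (v-structures and Meek rules). `pc_skeleton` and `fisher_z_independent` are illustrative names, not Causal-Copilot's implementation.

```python
import itertools

import numpy as np
from scipy import stats


def fisher_z_independent(C, n, i, j, S, alpha):
    """Return True if X_i and X_j look independent given X_S (Fisher-z test)."""
    idx = [i, j, *S]
    P = np.linalg.inv(C[np.ix_(idx, idx)])            # precision of the sub-block
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])         # partial correlation
    r = float(np.clip(r, -0.999999, 0.999999))
    z = 0.5 * np.log((1 + r) / (1 - r))               # Fisher z-transform
    stat = np.sqrt(n - len(S) - 3) * abs(z)
    p_value = 2 * stats.norm.sf(stat)
    return p_value > alpha                            # cannot reject independence


def pc_skeleton(X, alpha=0.001):
    """Skeleton phase of PC: returns undirected edges and separating sets."""
    n, d = X.shape
    C = np.corrcoef(X, rowvar=False)
    adj = {i: set(range(d)) - {i} for i in range(d)}  # start fully connected
    sepsets = {}
    level = 0                                         # conditioning-set size
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i, j in itertools.permutations(range(d), 2):
            if j not in adj[i]:
                continue
            # Condition only on current neighbours of i (other than j)
            for S in itertools.combinations(sorted(adj[i] - {j}), level):
                if fisher_z_independent(C, n, i, j, S, alpha):
                    adj[i].discard(j)
                    adj[j].discard(i)
                    sepsets[frozenset((i, j))] = set(S)
                    break
        level += 1
    edges = {frozenset((i, j)) for i in adj for j in adj[i]}
    return edges, sepsets


# Usage: chain X0 -> X1 -> X2, so X0 and X2 are independent given X1
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
edges, sepsets = pc_skeleton(np.column_stack([x0, x1, x2]))
print(sorted(tuple(sorted(e)) for e in edges), sepsets)
```

The recovered skeleton keeps X0-X1 and X1-X2 and records {X1} as the separating set for X0 and X2. The orientation phase uses these separating sets to decide directions. FCI extends the same idea to allow for latent confounders.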

The NOTEARS optimization formulates DAG learning as a continuous problem:

$$\min_{W} \frac{1}{2n} \|X - XW\|_F^2 + \lambda \|W\|_1 \quad \text{s.t.} \quad h(W) = 0$$

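where $X \in \mathbb{R}^{n \times d}$ holds $n$ samples of $d$ variables, $W \in \mathbb{R}^{d \times d}$ is the weighted adjacency matrix, $\circ$ denotes the elementwise (Hadamard) product, and $h(W) = \text{tr}(e^{W \circ W}) - d$ is the acyclicity constraint, which equals zero exactly when $W$ encodes a DAG.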
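The acyclicity function can be checked numerically. The sketch below assumes NumPy and SciPy's `scipy.linalg.expm`. It evaluates $h(W)$, its gradient $(e^{W \circ W})^\top \circ 2W$ (the expression used in the NOTEARS paper), and the penalized least-squares score. The function names are hypothetical, and no optimizer is included: NOTEARS itself solves the constrained problem with an augmented Lagrangian, which this sketch does not reproduce.

```python
import numpy as np
from scipy.linalg import expm


def acyclicity(W: np.ndarray):
    """h(W) = tr(exp(W o W)) - d and its gradient (exp(W o W))^T o 2W."""
    d = W.shape[0]
    E = expm(W * W)              # W * W is the elementwise (Hadamard) square
    h = np.trace(E) - d
    grad = E.T * 2 * W
    return float(h), grad


def notears_score(X: np.ndarray, W: np.ndarray, lam: float) -> float:
    """Penalized least-squares score: 1/(2n) ||X - XW||_F^2 + lam * ||W||_1."""
    n = X.shape[0]
    resid = X - X @ W
    return float(0.5 / n * np.sum(resid ** 2) + lam * np.abs(W).sum())


W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])   # X0 -> X1 -> X2
W_cyc = np.array([[0.0, 1.0],
                  [1.0, 0.0]])        # X0 <-> X1 (a 2-cycle)
print(acyclicity(W_dag)[0])   # ~0.0 up to floating-point error
print(acyclicity(W_cyc)[0])   # 2*cosh(1) - 2, about 1.0862
```

For a DAG, $W \circ W$ is nilpotent, so every diagonal entry of its matrix exponential is 1 and $h(W) = 0$. Any directed cycle contributes positive diagonal terms, so $h(W) > 0$. This smooth measure of cyclicity is what allows gradient-based optimization.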

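Causal Inference (Effect Estimation):

Estimators used in this stage include double machine learning (DML), doubly robust (DR) estimation, instrumental variables (IV), and propensity score matching (PSM).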
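Doubly robust (DR) estimation is one of the effect estimators in the pipeline diagram. It shows why combining two models helps: the augmented inverse-propensity-weighted (AIPW) estimate of the average treatment effect, $\hat\tau = \frac{1}{n}\sum_i \big[\hat\mu_1(x_i) - \hat\mu_0(x_i) + \frac{t_i (y_i - \hat\mu_1(x_i))}{\hat e(x_i)} - \frac{(1-t_i)(y_i - \hat\mu_0(x_i))}{1-\hat e(x_i)}\big]$, stays consistent if either the outcome models $\hat\mu$ or the propensity model $\hat e$ is correctly specified. The sketch below is a textbook AIPW estimator under stated assumptions: binary treatment, logistic propensity fitted by Newton's method, and a linear outcome model per arm, using NumPy only. It is not the DR implementation Causal-Copilot dispatches to, and the function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Logistic regression via Newton-Raphson; returns fitted propensities."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        H = A.T @ (A * (p * (1 - p))[:, None])   # negative Hessian of log-likelihood
        beta += np.linalg.solve(H, A.T @ (t - p))
    return 1.0 / (1.0 + np.exp(-A @ beta))


def fit_linear(X, y):
    """OLS with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def aipw_ate(X, t, y, clip=1e-3):
    """AIPW estimate of the ATE and its standard error."""
    e = np.clip(fit_logistic(X, t), clip, 1 - clip)   # propensity scores
    mu1 = fit_linear(X[t == 1], y[t == 1])(X)         # E[Y | X, T=1]
    mu0 = fit_linear(X[t == 0], y[t == 0])(X)         # E[Y | X, T=0]
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1 - t) * (y - mu0) / (1 - e))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))


# Usage: confounded data with a true ATE of 2.0
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
e_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, e_true)
y = 2.0 * t + 1.5 * X[:, 0] + X[:, 1] + rng.normal(size=n)
ate, se = aipw_ate(X, t, y)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"AIPW {ate:.3f} (se {se:.3f}), naive difference {naive:.3f}")
```

Because $X_0$ raises both the treatment probability and the outcome, the naive difference in means is biased upward. The AIPW estimate lands close to 2.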

Auxiliary Analysis:

This stage covers SHAP-based feature attribution and anomaly attribution.

Code Example: Causal Analysis Agent

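```python
class CausalCopilot:
    """Skeleton of the agent loop. Helper hooks such as preprocess, estimate,
    bootstrap_evaluate and generate_report are left to concrete subclasses."""

    def __init__(self, llm, method_registry, n_bootstrap=100, seed=0):
        self.llm = llm                  # LLM client used for selection and refinement
        self.methods = method_registry  # dict-like: method name -> estimator with .fit()
        self.n_bootstrap = n_bootstrap  # resamples used to score edge stability
        self.seed = seed

    def analyze(self, data, question):
        # Cleaning and diagnostics (linearity, stationarity, heterogeneity)
        diagnostics = self.preprocess(data)
        # LLM chooses the discovery and inference methods
        selected_methods = self.select_algorithms(diagnostics, question)
        # Discovery with bootstrap pruning, then LLM-guided refinement
        causal_graph = self.discover(data, selected_methods["discovery"])
        causal_graph = self.refine_graph(causal_graph, question)
        # Effect estimation on the refined graph
        effects = self.estimate(
            data, causal_graph,
            selected_methods["inference"],
            treatment=question.treatment,
            outcome=question.outcome,
        )
        # Robustness of the effect estimates, then the final report
        robust = self.bootstrap_evaluate(effects, n_iterations=500)
        return self.generate_report(causal_graph, robust, question)

    def select_algorithms(self, diagnostics, question):
        prompt = self.build_selection_prompt(diagnostics, question)
        selection = self.llm.reason(prompt)
        selected = {
            "discovery": self.methods.get(selection.discovery_method),
            "inference": self.methods.get(selection.inference_method),
        }
        # Fail early if the LLM named a method that is not in the registry
        missing = [stage for stage, method in selected.items() if method is None]
        if missing:
            raise ValueError(f"LLM selected unregistered method(s) for: {missing}")
        return selected

    def discover(self, data, method):
        raw_graph = method.fit(data)
        # Bootstrap: resample rows with replacement at the original sample size
        bootstrap_graphs = [
            method.fit(data.sample(n=len(data), replace=True,
                                   random_state=self.seed + i))
            for i in range(self.n_bootstrap)
        ]
        edge_confidence = self.compute_edge_stability(bootstrap_graphs)
        return self.prune_unstable_edges(raw_graph, edge_confidence)

    def refine_graph(self, graph, question):
        # LLM checks edges against domain knowledge (e.g. implausible directions)
        refinement = self.llm.evaluate_graph(graph, question.domain)
        return graph.apply_refinements(refinement)
```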
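The class above leaves `compute_edge_stability` and `prune_unstable_edges` abstract. One possible implementation, written here as standalone functions, assumes each fitted graph is a d×d binary adjacency matrix (entry [i, j] = 1 for an edge i -> j). It counts how often each edge reappears across bootstrap fits and keeps an edge of the full-data graph only if its frequency reaches a threshold. The 0.5 threshold is a hypothetical default, not a value taken from Causal-Copilot.

```python
import numpy as np


def compute_edge_stability(bootstrap_graphs):
    """Fraction of bootstrap fits in which each directed edge i -> j appears."""
    stack = np.stack([np.asarray(g) != 0 for g in bootstrap_graphs])
    return stack.mean(axis=0)


def prune_unstable_edges(raw_graph, edge_confidence, threshold=0.5):
    """Keep edges of the full-data graph whose bootstrap frequency >= threshold."""
    raw = np.asarray(raw_graph) != 0
    return (raw & (edge_confidence >= threshold)).astype(int)


raw = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]])
boots = [np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]),
         np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]),
         np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])]
conf = compute_edge_stability(boots)   # edge 0 -> 2 appears in 1 of 3 fits
print(prune_unstable_edges(raw, conf))
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
```

This version only removes edges. A variant could also add edges that are frequent across resamples but missing from the full-data fit. That would make the bootstrap consensus, rather than the single fit, the reference graph.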

Benchmark Results

In the reported benchmarks, Causal-Copilot achieves a higher F1 score than each individual baseline algorithm in every scenario:

Tabular Data (F1 Score):

| Scenario | Causal-Copilot | PC | FCI | GES |
|---|---|---|---|---|
| Dense Graph (p=0.5) | 0.65 | 0.41 | 0.44 | 0.40 |
| Large Scale (p=50) | 0.94 | 0.70 | 0.79 | N/A |
| Non-Gaussian Noise | 0.97 | 0.84 | 0.85 | 0.86 |
| Heterogeneous Domains | 0.77 | 0.51 | 0.62 | 0.40 |

Time Series Data (F1 Score):

| Scenario | Causal-Copilot | PCMCI | DYNOTEARS |
|---|---|---|---|
| Small (p=5, lag=3) | 0.98 | 0.92 | 0.97 |
| Large Lag (lag=20) | 0.85 | 0.84 | 0.77 |

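On tabular data, the largest margins over the best baseline appear on dense graphs (+0.21 F1), large-scale graphs (+0.15), and heterogeneous domains (+0.15), followed by non-Gaussian noise (+0.11). These are settings where algorithm selection is critical. On the time-series benchmarks the gain over the best baseline is small (+0.01 in both settings).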
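The F1 scores above compare an estimated graph against the ground-truth graph. The exact convention (directed edges, skeleton only, or lagged edges for time series) is not stated here. The sketch below assumes directed-edge F1 on binary adjacency matrices. Precision is the share of predicted edges that are true, recall is the share of true edges that are recovered, and F1 is their harmonic mean. `edge_f1` is a hypothetical helper.

```python
import numpy as np


def edge_f1(true_adj, est_adj) -> float:
    """Directed-edge F1 between binary adjacency matrices ([i, j] = 1 means i -> j)."""
    true_e = np.asarray(true_adj) != 0
    est_e = np.asarray(est_adj) != 0
    tp = np.sum(true_e & est_e)        # correctly recovered directed edges
    fp = np.sum(~true_e & est_e)       # predicted edges absent from the truth
    fn = np.sum(true_e & ~est_e)       # true edges that were missed
    if tp == 0:
        return 0.0                     # no overlap: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


true_adj = np.array([[0, 1, 1],
                     [0, 0, 1],
                     [0, 0, 0]])       # 0->1, 0->2, 1->2
est_adj = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [1, 0, 0]])        # 0->1, 1->2, and a reversed 2->0
print(round(edge_f1(true_adj, est_adj), 3))   # 0.667
```

Here two of three predicted edges are correct and two of three true edges are found, so precision, recall, and F1 all equal 2/3. Under this convention a reversed edge counts as both a false positive and a false negative.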

Pipeline Diagram

```mermaid
flowchart TD
    A[User: Data + Natural Language Question] --> B[Preprocessing Agent]
    B --> C[Data Cleaning & Diagnostics]
    C --> D[Algorithm Selection Agent]
    D --> E[Method Configuration]
    E --> F{Analysis Type}
    F --> G[Causal Discovery]
    F --> H[Causal Inference]
    F --> I[Auxiliary Analysis]
    G --> J[Graph: PC / FCI / GES / NOTEARS / LiNGAM]
    H --> K[Effects: DML / DR / IV / PSM]
    I --> L[SHAP / Anomaly Attribution]
    J --> M[Postprocessing Agent]
    K --> M
    L --> M
    M --> N[Bootstrap Evaluation]
    N --> O[LLM Graph Refinement]
    O --> P[Report Generation]
    P --> Q[Visualizations + LaTeX Report]
```
