====== Causal Reasoning Agents: Causal-Copilot ======

Causal analysis -- determining what causes what -- is one of the most important yet technically demanding tasks in data science. **Causal-Copilot** (2025) is an LLM-powered autonomous agent that automates the entire causal analysis pipeline: from data ingestion and causal discovery through identification, estimation, and interpretation, all driven by natural language interaction.

===== End-to-End Causal Analysis Pipeline =====

Causal-Copilot operates through a modular pipeline where each stage is orchestrated by the LLM agent:

**1. User Interaction:** The user uploads data and specifies causal questions in natural language. The system parses queries, incorporates domain knowledge, and supports interactive feedback at every stage.

**2. Preprocessing:** Automatic data cleaning, schema extraction, and diagnostic analysis, including tests for linearity, stationarity, and heterogeneity across subpopulations.

**3. Algorithm Selection:** The LLM evaluates data characteristics and selects from 20+ algorithms, then configures hyperparameters. This replaces the traditional expert-driven process of manually choosing between methods.

**4. Core Analysis:** Executes the selected algorithms for causal discovery, causal inference, and auxiliary analyses.

**5. Postprocessing:** Bootstrap evaluation for robustness, LLM-guided graph refinement, and support for user revisions to the causal graph.

**6. Report Generation:** Produces visualizations, natural language interpretations, and LaTeX reports.
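The link between the preprocessing diagnostics and algorithm selection can be made concrete with a small sketch. It pairs two simple data checks with a hand-written selection rule. This is a stand-in under stated assumptions, not Causal-Copilot's logic, because the system delegates the choice to the LLM. The function names, the polynomial F-test for linearity, the Shapiro-Wilk check on regression residuals, and the three rules are all hypothetical illustrations built on NumPy and SciPy.

<code python>
import numpy as np
from scipy import stats


def nonlinearity_pvalue(cause, effect):
    """F-test: do cause**2 and cause**3 improve on a linear fit of effect on cause?"""
    n = len(cause)
    X_lin = np.column_stack([np.ones(n), cause])
    X_poly = np.column_stack([X_lin, cause**2, cause**3])
    rss = []
    for X in (X_lin, X_poly):
        coef, *_ = np.linalg.lstsq(X, effect, rcond=None)
        rss.append(np.sum((effect - X @ coef) ** 2))
    df_num = X_poly.shape[1] - X_lin.shape[1]  # two added polynomial terms
    df_den = n - X_poly.shape[1]
    f_stat = ((rss[0] - rss[1]) / df_num) / (rss[1] / df_den)
    return float(stats.f.sf(f_stat, df_num, df_den))


def residuals_non_gaussian(cause, effect, alpha=0.01):
    """Shapiro-Wilk test on the residuals of a linear fit of effect on cause."""
    slope, intercept = np.polyfit(cause, effect, 1)
    residuals = effect - (slope * cause + intercept)
    return bool(stats.shapiro(residuals)[1] < alpha)


def select_discovery_method(linear, non_gaussian, latent_confounders=False):
    """Hypothetical hand-written rules standing in for the LLM's choice."""
    if latent_confounders:
        return "FCI"     # FCI's output stays valid with latent confounders
    if linear and non_gaussian:
        return "LiNGAM"  # linear non-Gaussian models identify the full DAG
    return "PC"          # general default, paired with a suitable CI test


rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, size=500)
v = 2 * u + rng.uniform(-1, 1, size=500)  # linear relation, uniform (non-Gaussian) noise
method = select_discovery_method(
    linear=nonlinearity_pvalue(u, v) > 0.01,
    non_gaussian=residuals_non_gaussian(u, v),
)
print(method)
</code>

The rules encode standard textbook assumptions (FCI when latent confounders are suspected, LiNGAM for linear non-Gaussian data). An LLM-driven selector can also weigh signals that such rules ignore, for example sample size, domain hints, and the wording of the user's question.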

===== Supported Causal Methods =====

Causal-Copilot integrates methods across the full spectrum of causal analysis:

**Causal Discovery (Graph Structure Learning):**

^ Family ^ Methods ^
| Constraint-based | PC, FCI (handles latent confounders) |
| Score-based | GES (Greedy Equivalence Search) |
| Optimization-based | NOTEARS (continuous optimization for DAGs) |
| Functional | LiNGAM family (non-Gaussian identification) |
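The constraint-based family can be illustrated with the skeleton phase of the PC algorithm. It starts from a complete undirected graph and deletes the edge i - j as soon as some subset of i's current neighbours makes X_i and X_j conditionally independent. The sketch below is a textbook version, not Causal-Copilot's implementation. It assumes roughly Gaussian, linear data so that the Fisher-z test of partial correlation is appropriate, takes a correlation matrix and sample size as input, and omits the later orientation phase (v-structures and Meek rules). All names are illustrative.

<code python>
import itertools
import math

import numpy as np
from scipy.stats import norm


def fisher_z_pvalue(corr, n, i, j, cond):
    """p-value for H0: X_i is independent of X_j given X_cond (Gaussian assumption)."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(max(r, -0.999999), 0.999999)                  # keep atanh finite
    z = math.sqrt(n - len(cond) - 3) * math.atanh(r)      # Fisher z-transform
    return 2 * norm.sf(abs(z))


def pc_skeleton(corr, n, alpha=0.01):
    """Skeleton phase of PC; returns adjacency sets and separating sets."""
    d = corr.shape[0]
    adj = {i: set(range(d)) - {i} for i in range(d)}  # start fully connected
    sepsets = {}
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):  # snapshot, since adj[i] may shrink below
                for cond in itertools.combinations(sorted(adj[i] - {j}), level):
                    if fisher_z_pvalue(corr, n, i, j, cond) > alpha:
                        adj[i].discard(j)  # independence accepted: remove i - j
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    return adj, sepsets


rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 0.8 * x0 + rng.normal(size=2000)  # x0 -> x1
x2 = 0.8 * x1 + rng.normal(size=2000)  # x1 -> x2
data = np.column_stack([x0, x1, x2])
adj, sepsets = pc_skeleton(np.corrcoef(data, rowvar=False), n=len(data))
print(adj, sepsets)
</code>

For this chain, the dependence between x0 and x2 vanishes once x1 is conditioned on, so with high probability the 0 - 2 edge is removed with separating set {1}. The recorded separating sets drive the orientation phase: for non-adjacent i and j with a common neighbour k, the v-structure i → k ← j is oriented when k is not in the separating set of i and j. FCI extends this procedure with further tests and orientation rules so that its output remains valid under latent confounding.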

The NOTEARS optimization formulates DAG learning as a continuous problem:

$$\min_{W} \frac{1}{2n} \|X - XW\|_F^2 + \lambda \|W\|_1 \quad \text{s.t.} \quad h(W) = 0$$

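where $X \in \mathbb{R}^{n \times d}$ holds $n$ samples of $d$ variables, $W \in \mathbb{R}^{d \times d}$ is the weighted adjacency matrix, $\lambda$ controls sparsity, $\circ$ denotes the elementwise (Hadamard) product, and $h(W) = \text{tr}(e^{W \circ W}) - d$ is the acyclicity constraint: $h(W) = 0$ exactly when $W$ encodes a DAG. This turns the combinatorial search over DAGs into a smooth, equality-constrained optimization problem.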
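Both ingredients of the NOTEARS program can be evaluated directly. The sketch below uses NumPy and SciPy's `scipy.linalg.expm`, and its function names are illustrative. It computes the penalized least-squares score, the acyclicity function $h(W)$, and its gradient $\nabla h(W) = (e^{W \circ W})^\top \circ 2W$, which lets gradient-based solvers enforce the constraint. Solving the constrained problem itself (NOTEARS uses an augmented Lagrangian method) is not shown.

<code python>
import numpy as np
from scipy.linalg import expm


def notears_objective(W, X, lam):
    """Least-squares score (1/2n)||X - XW||_F^2 plus an elementwise L1 penalty."""
    n = X.shape[0]
    residual = X - X @ W
    return 0.5 / n * np.sum(residual**2) + lam * np.abs(W).sum()


def acyclicity(W):
    """Return h(W) = tr(exp(W o W)) - d and its gradient exp(W o W)^T o 2W."""
    E = expm(W * W)  # W * W is the elementwise (Hadamard) square
    h = np.trace(E) - W.shape[0]
    grad = E.T * 2 * W
    return h, grad


dag = np.array([[0.0, 1.5], [0.0, 0.0]])    # single edge 1 -> 2
cycle = np.array([[0.0, 1.0], [1.0, 0.0]])  # 1 -> 2 and 2 -> 1
print(acyclicity(dag)[0])    # approximately 0: no cycles
print(acyclicity(cycle)[0])  # 2*cosh(1) - 2, about 1.086
</code>

For the DAG, $W \circ W$ is nilpotent, so $e^{W \circ W} = I + W \circ W$ and the trace equals $d$. Any cycle adds positive terms to the trace, so $h$ penalizes cycles smoothly rather than through a discrete check.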

**Causal Inference (Effect Estimation):**

  * Double Machine Learning (DML)
  * Doubly Robust estimation
  * Instrumental Variables (IV, DRIV)
  * Propensity Score Matching (PSM)
  * Counterfactual estimation
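Double Machine Learning can be illustrated with the standard cross-fitted partialling-out estimator for the partially linear model $Y = \theta T + g(X) + \varepsilon$. The outcome and the treatment are each residualized on the covariates using models fit on held-out folds, and $\theta$ is then obtained by regressing the outcome residuals on the treatment residuals. This is a minimal NumPy sketch, not Causal-Copilot's configuration. The least-squares nuisance learner is a placeholder for any ML regressor, and the function names are hypothetical.

<code python>
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Placeholder nuisance learner: least squares with an intercept."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def dml_plr(Y, T, X, n_folds=5, fit_predict=ols_fit_predict, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*T + g(X) + noise."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # balanced fold labels
    y_res, t_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisance models are fit on the other folds only (cross-fitting)
        y_res[test] = Y[test] - fit_predict(X[train], Y[train], X[test])
        t_res[test] = T[test] - fit_predict(X[train], T[train], X[test])
    theta = np.sum(t_res * y_res) / np.sum(t_res**2)  # regress Y residuals on T residuals
    psi = (y_res - theta * t_res) * t_res              # orthogonal (Neyman) score
    se = np.sqrt(np.mean(psi**2) / np.mean(t_res**2) ** 2 / n)
    return float(theta), float(se)


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
T = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=2000)
Y = 2.0 * T + X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=2000)
theta, se = dml_plr(Y, T, X)
print(f"theta = {theta:.3f} +/- {1.96 * se:.3f}")
</code>

With the simulated true effect of 2, the estimate should land close to 2. Cross-fitting is the key design choice: it lets flexible learners estimate the nuisance functions without their overfitting biasing $\hat\theta$.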

**Auxiliary Analysis:**
  * SHAP feature importance
  * Anomaly attribution

===== Code Example: Causal Analysis Agent =====

<code python>
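import numpy as np
import pandas as pd

class CausalCopilot:
    """Illustrative orchestration sketch, not Causal-Copilot's actual code.

    Hypothetical interfaces assumed here:
      llm.reason(prompt) -> object with .discovery_method / .inference_method names
      llm.evaluate_graph(adj, domain) -> iterable of (i, j) edges judged implausible
      method_registry: dict of name -> object with .fit(df) returning a d x d 0/1
        adjacency matrix (discovery) or .estimate(df, adj, treatment, outcome) -> float
      question: object with .text, .treatment, .outcome, .domain
    """

    def __init__(self, llm, method_registry):
        self.llm = llm
        self.methods = method_registry

    def analyze(self, data: pd.DataFrame, question):
        data, diagnostics = self.preprocess(data)
        selected = self.select_algorithms(diagnostics, question)
        graph = self.discover(data, selected["discovery"])
        graph = self.refine_graph(graph, question)
        # Bootstrap the effect estimate by resampling rows of the data
        robust = self.bootstrap_evaluate(data, graph, selected["inference"],
                                         question, n_iterations=500)
        return self.generate_report(graph, robust, question)

    def preprocess(self, data):
        # Toy cleaning and diagnostics; the real system also tests linearity, stationarity, heterogeneity
        data = data.dropna()
        diagnostics = {"n_rows": len(data), "n_vars": data.shape[1],
                       "max_abs_skew": float(data.skew(numeric_only=True).abs().max())}
        return data, diagnostics

    def select_algorithms(self, diagnostics, question):
        prompt = (f"Diagnostics: {diagnostics}\nQuestion: {question.text}\n"
                  f"Choose one discovery and one inference method from {sorted(self.methods)}")
        selection = self.llm.reason(prompt)
        return {  # [] raises KeyError on an unknown name instead of returning None
            "discovery": self.methods[selection.discovery_method],
            "inference": self.methods[selection.inference_method],
        }

    def discover(self, data, method, n_subsamples=100, threshold=0.5):
        raw_graph = np.asarray(method.fit(data))
        # Refit on 80% subsamples (drawn without replacement) to measure edge stability
        subsample_graphs = [np.asarray(method.fit(data.sample(frac=0.8, random_state=i)))
                            for i in range(n_subsamples)]
        edge_confidence = self.compute_edge_stability(subsample_graphs)
        return self.prune_unstable_edges(raw_graph, edge_confidence, threshold)

    @staticmethod
    def compute_edge_stability(graphs):
        # Fraction of refits in which each directed edge i -> j appears
        return np.mean([g != 0 for g in graphs], axis=0)

    @staticmethod
    def prune_unstable_edges(graph, confidence, threshold):
        # Keep only edges present in at least `threshold` of the refits
        return np.where(confidence >= threshold, graph, 0)

    def refine_graph(self, graph, question):
        refined = graph.copy()
        for i, j in self.llm.evaluate_graph(graph, question.domain):
            refined[i, j] = 0  # drop edges the LLM judges implausible for the domain
        return refined

    def bootstrap_evaluate(self, data, graph, estimator, question, n_iterations=500):
        # Nonparametric bootstrap: resample rows with replacement and re-estimate
        estimates = np.array([
            estimator.estimate(data.sample(frac=1.0, replace=True, random_state=i), graph,
                               treatment=question.treatment, outcome=question.outcome)
            for i in range(n_iterations)])
        lo, hi = np.percentile(estimates, [2.5, 97.5])
        return {"mean": float(estimates.mean()), "ci95": (float(lo), float(hi))}

    def generate_report(self, graph, robust, question):
        # Placeholder: the real system renders plots, prose, and a LaTeX report
        return {"question": question.text, "graph": graph, "effect": robust}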
</code>

===== Benchmark Results =====

In the reported benchmarks, Causal-Copilot achieves a higher F1 score than every individual baseline algorithm in each of the scenarios below:

**Tabular Data (F1 Score):**

^ Scenario ^ Causal-Copilot ^ PC ^ FCI ^ GES ^
| Dense Graph (p=0.5) | **0.65** | 0.41 | 0.44 | 0.40 |
| Large Scale (p=50) | **0.94** | 0.70 | 0.79 | N/A |
| Non-Gaussian Noise | **0.97** | 0.84 | 0.85 | 0.86 |
| Heterogeneous Domains | **0.77** | 0.51 | 0.62 | 0.40 |

**Time Series Data (F1 Score):**

^ Scenario ^ Causal-Copilot ^ PCMCI ^ DYNOTEARS ^
| Small (p=5, lag=3) | **0.98** | 0.92 | 0.97 |
| Large Lag (lag=20) | **0.85** | 0.84 | 0.77 |

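The margin over the best baseline is largest in the challenging scenarios where algorithm selection is critical: +0.21 F1 on dense graphs, +0.15 at large scale and on heterogeneous domains, and +0.11 under non-Gaussian noise. On time series the gains are small, +0.01 over the best baseline in both scenarios.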
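The tables report F1, but the section does not state how it is computed. A common convention for graph recovery compares the estimated and true adjacency matrices edge by edge. The sketch below shows that calculation for both directed edges and undirected skeletons. It is an assumption for illustration, and the benchmark's exact convention (for example, how undirected edges in an equivalence class are scored) may differ. The function name is hypothetical.

<code python>
import numpy as np


def edge_f1(true_adj, est_adj, directed=True):
    """F1 over edges of two 0/1 adjacency matrices (adj[i, j] = 1 means i -> j)."""
    true_adj = np.asarray(true_adj) != 0
    est_adj = np.asarray(est_adj) != 0
    if not directed:  # compare skeletons: each pair i - j counts once, ignoring direction
        true_adj = np.triu(true_adj | true_adj.T, k=1)
        est_adj = np.triu(est_adj | est_adj.T, k=1)
    tp = np.sum(true_adj & est_adj)
    fp = np.sum(~true_adj & est_adj)
    fn = np.sum(true_adj & ~est_adj)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # 0 -> 1 -> 2
est = np.array([[0, 1, 1], [0, 0, 0], [0, 1, 0]])   # 0 -> 1, 0 -> 2, 2 -> 1
print(edge_f1(true, est), edge_f1(true, est, directed=False))
</code>

Here the reversed edge 2 -> 1 counts as both a false positive and a false negative under the directed convention (F1 = 0.4), but as correct under the skeleton convention (F1 = 0.8). This is why the choice of convention matters when comparing reported scores.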

===== Pipeline Diagram =====

<mermaid>
flowchart TD
    A[User: Data + Natural Language Question] --> B[Preprocessing Agent]
    B --> C[Data Cleaning & Diagnostics]
    C --> D[Algorithm Selection Agent]
    D --> E[Method Configuration]
    E --> F{Analysis Type}
    F --> G[Causal Discovery]
    F --> H[Causal Inference]
    F --> I[Auxiliary Analysis]
    G --> J[Graph: PC / FCI / GES / NOTEARS / LiNGAM]
    H --> K[Effects: DML / DR / IV / PSM]
    I --> L[SHAP / Anomaly Attribution]
    J --> M[Postprocessing Agent]
    K --> M
    L --> M
    M --> N[Bootstrap Evaluation]
    N --> O[LLM Graph Refinement]
    O --> P[Report Generation]
    P --> Q[Visualizations + LaTeX Report]
</mermaid>

===== Key Capabilities =====

  * **Natural language interface:** Users describe causal questions in plain English instead of choosing and configuring statistical methods themselves
  * **Automatic method selection:** The LLM chooses appropriate algorithms based on data characteristics, eliminating the need for manual algorithm comparison
  * **Scalability:** Handles datasets with up to 500 variables and complex time-series with long lags
  * **Robustness:** Bootstrap evaluation and LLM-guided graph refinement improve the reliability of the discovered graphs and estimated effects
  * **Interpretability:** Generated reports explain findings in accessible language with supporting visualizations

===== References =====

  * [[https://arxiv.org/abs/2504.13263|Causal-Copilot: An Autonomous Agent for End-to-End Causal Analysis (arXiv:2504.13263)]]
  * [[https://www.charonwangg.com/project/copilot/|Causal-Copilot Project Page]]

===== See Also =====

  * [[data_science_agents|Data Science Agents: DatawiseAgent]]
  * [[clinical_diagnosis_agents|Clinical Diagnosis Agents: MACD]]
  * [[knowledge_graph_world_models|Knowledge Graph World Models: AriGraph]]