====== Data Science Agents: DatawiseAgent ======
LLM-based agents are transforming data science workflows by autonomously executing end-to-end machine learning pipelines. **DatawiseAgent** (2025) introduces a notebook-centric framework that mimics how human data scientists work -- iteratively planning, coding, debugging, and refining within Jupyter notebooks.
===== Architecture: Finite State Transducer =====
DatawiseAgent models the data science workflow as a **Finite State Transducer (FST)** with four orchestrated states connected by a transition function $\delta$ that responds to action signals and computational feedback:
$$Q = \{q_{plan}, q_{inc}, q_{debug}, q_{filter}\}$$
$$\delta(q, \text{signal}) \rightarrow q'$$
The state transition function ensures that errors from planning or execution trigger appropriate repair sequences while constraints prevent infinite loops.
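The transition function can be sketched as a lookup table over (state, signal) pairs. The signal names below are illustrative assumptions, not the paper's exact vocabulary; raising on undefined pairs is one simple way to enforce the no-infinite-loop constraint:

<code python>
# Sketch of the FST transition function delta; signal names are assumed.
TRANSITIONS = {
    ("q_plan", "subtask_ready"): "q_inc",
    ("q_inc", "exec_ok"): "q_plan",
    ("q_inc", "exec_error"): "q_debug",
    ("q_debug", "fixed"): "q_filter",
    ("q_debug", "still_failing"): "q_debug",
    ("q_filter", "clean"): "q_plan",
}

def delta(state: str, signal: str) -> str:
    """Return the successor state; undefined transitions raise."""
    if (state, signal) not in TRANSITIONS:
        raise ValueError(f"no transition from {state} on {signal}")
    return TRANSITIONS[(state, signal)]
</code>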
===== Core Components =====
**DFS-like Adaptive Re-Planning:** The agent explores the solution space as a tree structure. After completing a subtask, it evaluates whether to backtrack (explore sibling nodes), advance deeper, or terminate. This enables dynamic adaptation when initial strategies fail.
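The tree exploration described above can be sketched as a recursive depth-first traversal. The node structure and the 0.5 score threshold are assumptions for illustration, not the framework's actual evaluation criteria:

<code python>
# Sketch of DFS-like adaptive re-planning over a subtask tree.
# SubtaskNode and the threshold are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class SubtaskNode:
    name: str
    score: float                      # evaluation of the completed subtask
    children: list = field(default_factory=list)

def dfs_plan(node, threshold=0.5, path=None):
    """Advance into promising children; backtrack (let the caller try a
    sibling) when a subtask scores below the threshold."""
    path = [] if path is None else path
    if node.score < threshold:
        return None                   # backtrack: caller tries a sibling
    path = path + [node.name]
    for child in node.children:
        result = dfs_plan(child, threshold, path)
        if result is not None:
            return result             # advancing deeper succeeded
    return path                       # no viable child: terminate here
</code>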
**Incremental Execution:** Rather than generating entire solutions at once, the agent progressively produces text and code step-by-step for each subtask, incorporating real-time execution feedback to handle LLM limitations and task interdependencies.
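A minimal sketch of this step-by-step loop, assuming a hypothetical generator ''gen_step'' that conditions on accumulated execution feedback and signals completion by returning ''None'':

<code python>
# Sketch of incremental execution: generate and run one small step at a
# time, feeding each cell's output back into the next generation call.
# gen_step and run_cell are hypothetical stand-ins, not the paper's API.
def incremental_execute(subtask, gen_step, run_cell, max_steps=10):
    feedback = []
    for _ in range(max_steps):
        step = gen_step(subtask, feedback)   # conditioned on prior outputs
        if step is None:                     # generator signals completion
            break
        output = run_cell(step)
        feedback.append(output)              # real-time execution feedback
    return feedback
</code>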
**Self-Debugging:** When execution errors occur, the agent analyzes faulty code using fine-grained execution feedback. It iteratively refines code through LLM-based diagnosis, handling both syntax and logic errors across multiple repair attempts.
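The repair loop amounts to execute-diagnose-retry with a bounded attempt budget. The executor and ''llm_fix'' interfaces below are hypothetical stand-ins for the agent's actual components:

<code python>
# Sketch of the self-debugging retry loop; the dict-based result format
# and llm_fix signature are assumptions for illustration.
def self_debug(code, execute, llm_fix, max_attempts=3):
    """Run code; on failure, ask the LLM for a repaired version and retry."""
    for _ in range(max_attempts):
        result = execute(code)
        if not result.get("error"):
            return code, result       # fixed (or was already correct)
        code = llm_fix(code, result["error"])  # fine-grained feedback in
    return code, execute(code)        # final attempt after budget exhausted
</code>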
**Post-Filtering:** After debugging, this cleanup stage removes errors and redundancies to produce clean, executable code. It learns from past mistakes to prevent error accumulation in subsequent cells.
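One plausible filtering criterion -- keep only cells that executed cleanly and drop exact duplicates -- can be sketched as follows; the real framework's criteria are richer than this assumption:

<code python>
# Sketch of post-filtering: retain clean, non-duplicate code cells and
# consolidate them into one script (the selection criteria are assumed).
def post_filter(cells):
    seen, clean = set(), []
    for cell in cells:                # cell: dict with "code" and "error"
        if cell["error"] or cell["code"] in seen:
            continue                  # drop failed or redundant cells
        seen.add(cell["code"])
        clean.append(cell["code"])
    return "\n".join(clean)
</code>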
===== Notebook-Centric Design =====
All agent-environment interactions occur through Jupyter notebook cells -- markdown cells for planning and observations, code cells for execution. This unified medium supports:
* Flexible planning with documented reasoning
* Progressive development with real-time feedback
* Failure recovery through cell-level rollback
* Context management via markdown observations
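The cell-level rollback listed above can be illustrated with a minimal in-memory notebook abstraction; the real framework drives a live Jupyter kernel rather than a plain list:

<code python>
# Minimal sketch of a notebook with cell-level rollback (assumed
# interface, not the framework's actual Notebook class).
class Notebook:
    def __init__(self):
        self.cells = []               # (kind, content) pairs, in order

    def add_markdown(self, text):
        self.cells.append(("markdown", text))

    def add_code(self, code):
        self.cells.append(("code", code))
        return len(self.cells) - 1    # index usable as a rollback point

    def rollback_to(self, index):
        """Discard every cell after index (failure recovery)."""
        self.cells = self.cells[: index + 1]
</code>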
===== Formal State Transition Model =====
The trajectory through the FST can be expressed as:
$$\tau = (q_0, a_0, q_1, a_1, \ldots, q_n)$$
where each state $q_i$ represents one of the four processing stages. The transition function incorporates execution signals:
$$\delta(q_{plan}, \text{error}) \rightarrow q_{debug}$$
$$\delta(q_{debug}, \text{fixed}) \rightarrow q_{filter}$$
$$\delta(q_{filter}, \text{clean}) \rightarrow q_{plan}$$
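A concrete trajectory $\tau$ can be traced by iterating exactly the three transitions above (signal names as given in the formulas):

<code python>
# Trace one example trajectory tau through the three transitions above.
delta = {
    ("q_plan", "error"): "q_debug",
    ("q_debug", "fixed"): "q_filter",
    ("q_filter", "clean"): "q_plan",
}

state, trace = "q_plan", ["q_plan"]
for signal in ("error", "fixed", "clean"):
    state = delta[(state, signal)]
    trace.append(state)
# trace == ["q_plan", "q_debug", "q_filter", "q_plan"]
</code>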
===== Code Example: Agent Interaction Loop =====
<code python>
class DatawiseAgent:
    def __init__(self, notebook, llm, max_turns=20):
        self.notebook = notebook
        self.llm = llm
        self.state = "plan"
        self.max_turns = max_turns
        self.history = []

    def run(self, task_description):
        self.notebook.add_markdown(task_description)
        for turn in range(self.max_turns):
            if self.state == "plan":
                # Evaluate the last subtask; backtrack or advance (DFS).
                subtask = self.llm.plan(self.history, task_description)
                action = "backtrack" if subtask.score < 0.3 else "advance"
                if action == "backtrack":
                    self.notebook.rollback_to(subtask.branch_point)
                self.state = "execute"
            elif self.state == "execute":
                # Incremental execution with real-time feedback.
                code = self.llm.generate_code(subtask)
                result = self.notebook.execute_cell(code)
                self.state = "debug" if result.has_error else "plan"
            elif self.state == "debug":
                # Self-debugging: LLM-based diagnosis and repair.
                fix = self.llm.diagnose_and_fix(result.error, code)
                result = self.notebook.execute_cell(fix)
                self.state = "filter" if not result.has_error else "debug"
            elif self.state == "filter":
                # Post-filtering: consolidate clean, executable code.
                clean_code = self.llm.post_filter(self.notebook.cells)
                self.notebook.consolidate(clean_code)
                self.state = "plan"
            self.history.append((self.state, turn))
        return self.notebook
</code>
===== Workflow Diagram =====
<code>
stateDiagram-v2
    [*] --> Planning
    Planning --> Execution : subtask generated
    Execution --> Planning : success (advance/backtrack)
    Execution --> SelfDebugging : execution error
    SelfDebugging --> PostFiltering : fix successful
    SelfDebugging --> SelfDebugging : retry fix
    PostFiltering --> Planning : clean code produced
    PostFiltering --> [*] : task complete
</code>
===== Key Results =====
* Published at EMNLP 2025 (Main Conference)
* Ablation studies confirm that both DFS-planning and code repair modules contribute significantly to performance
* The FST-based approach mitigates cascading failures common in prior linear agent pipelines
* Outperforms single-pass generation approaches on complex, multi-step data science tasks
===== References =====
* [[https://arxiv.org/abs/2503.07044|DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science (arXiv:2503.07044)]]
* [[https://aclanthology.org/2025.emnlp-main.58/|EMNLP 2025 Proceedings]]
===== See Also =====
* [[agent_rl_training|Agent RL Training: Agent-R1 and RAGEN]]
* [[api_tool_generation|API Tool Generation: Doc2Agent and LRASGen]]
* [[causal_reasoning_agents|Causal Reasoning Agents: Causal-Copilot]]