====== Data Science Agents: DatawiseAgent ======

LLM-based agents are transforming data science workflows by autonomously executing end-to-end machine learning pipelines. **DatawiseAgent** (2025) introduces a notebook-centric framework that mimics how human data scientists work -- iteratively planning, coding, debugging, and refining within Jupyter notebooks.

===== Architecture: Finite State Transducer =====

DatawiseAgent models the data science workflow as a **Finite State Transducer (FST)** with four orchestrated states connected by a transition function $\delta$ that responds to action signals and computational feedback:

<code>
States:     {q_plan, q_inc, q_debug, q_filter}
Transition: δ(q, signal) → q'
</code>

The state transition function ensures that errors from planning or execution trigger appropriate repair sequences, while transition constraints prevent infinite loops.

===== Core Components =====

**DFS-like Adaptive Re-Planning:** The agent explores the solution space as a tree structure. After completing a subtask, it evaluates whether to backtrack (explore sibling nodes), advance deeper, or terminate. This enables dynamic adaptation when initial strategies fail.

**Incremental Execution:** Rather than generating entire solutions at once, the agent progressively produces text and code step-by-step for each subtask, incorporating real-time execution feedback to handle LLM limitations and task interdependencies.

**Self-Debugging:** When execution errors occur, the agent analyzes faulty code using fine-grained execution feedback. It iteratively refines code through LLM-based diagnosis, handling both syntax and logic errors across multiple repair attempts.

**Post-Filtering:** After debugging, this cleanup stage removes errors and redundancies to produce clean, executable code. It learns from past mistakes to prevent error accumulation in subsequent cells.
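The four-state transducer can be sketched as a lookup table with a bounded-retry guard. This is a minimal illustration: the three transitions listed in the formal model section below are reproduced as-is, while the remaining signal names (''success'') and the retry cap are assumptions not specified verbatim in the paper.

```python
# Sketch of the FST transition function delta. Entries marked "assumed"
# are illustrative additions; the others come from the article's model.

TRANSITIONS = {
    ("q_plan", "error"): "q_debug",     # planning error triggers repair
    ("q_plan", "success"): "q_inc",     # assumed: plan ok -> incremental execution
    ("q_inc", "error"): "q_debug",      # assumed: execution error -> repair
    ("q_inc", "success"): "q_plan",     # assumed: subtask done -> re-plan
    ("q_debug", "fixed"): "q_filter",   # successful fix -> post-filtering
    ("q_debug", "error"): "q_debug",    # retry repair (bounded below)
    ("q_filter", "clean"): "q_plan",    # clean cell -> resume planning
}

def delta(state, signal, debug_attempts=0, max_debug=3):
    """Look up the next state; cap repeated debug attempts to avoid loops."""
    if state == "q_debug" and debug_attempts >= max_debug:
        return "q_filter"  # stop retrying; let post-filtering prune the cell
    return TRANSITIONS[(state, signal)]
```

The retry cap is one way to realize the "constraints prevent infinite loops" property: the self-loop on ''q_debug'' is only taken a bounded number of times before control is forced forward.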
===== Notebook-Centric Design =====

All agent-environment interactions occur through Jupyter notebook cells -- markdown for planning and observations, code cells for execution. This unifies communication and supports:

* Flexible planning with documented reasoning
* Progressive development with real-time feedback
* Failure recovery through cell-level rollback
* Context management via markdown observations

===== Formal State Transition Model =====

The trajectory through the FST can be expressed as:

$$\tau = (q_0, a_0, q_1, a_1, \ldots, q_n)$$

where each state $q_i$ represents one of the four processing stages. The transition function incorporates execution signals:

$$\delta(q_{plan}, \text{error}) \rightarrow q_{debug}$$
$$\delta(q_{debug}, \text{fixed}) \rightarrow q_{filter}$$
$$\delta(q_{filter}, \text{clean}) \rightarrow q_{plan}$$

===== Code Example: Agent Interaction Loop =====

<code python>
class DatawiseAgent:
    def __init__(self, notebook, llm, max_turns=20):
        self.notebook = notebook
        self.llm = llm
        self.state = "plan"
        self.max_turns = max_turns
        self.history = []

    def run(self, task_description):
        self.notebook.add_markdown(task_description)
        for turn in range(self.max_turns):
            if self.state == "plan":
                # DFS-like re-planning: a low-scoring subtask triggers
                # rollback to an earlier branch point in the notebook
                subtask = self.llm.plan(self.history, task_description)
                action = "backtrack" if subtask.score < 0.3 else "advance"
                if action == "backtrack":
                    self.notebook.rollback_to(subtask.branch_point)
                self.state = "execute"
            elif self.state == "execute":
                # Incremental execution with real-time feedback
                code = self.llm.generate_code(subtask)
                result = self.notebook.execute_cell(code)
                self.state = "debug" if result.has_error else "plan"
            elif self.state == "debug":
                # Self-debugging: diagnose, patch, and re-run until fixed
                code = self.llm.diagnose_and_fix(result.error, code)
                result = self.notebook.execute_cell(code)
                self.state = "filter" if not result.has_error else "debug"
            elif self.state == "filter":
                # Post-filtering: consolidate cells into clean, executable code
                clean_code = self.llm.post_filter(self.notebook.cells)
                self.notebook.consolidate(clean_code)
                self.state = "plan"
            self.history.append((self.state, turn))
        return self.notebook
</code>

===== Workflow Diagram =====
<code>
stateDiagram-v2
    [*] --> Planning
    Planning --> Execution : subtask generated
    Execution --> Planning : success (advance/backtrack)
    Execution --> SelfDebugging : execution error
    SelfDebugging --> PostFiltering : fix successful
    SelfDebugging --> SelfDebugging : retry fix
    PostFiltering --> Planning : clean code produced
    PostFiltering --> [*] : task complete
</code>

===== Key Results =====

* Published at EMNLP 2025 (Main Conference)
* Ablation studies confirm that both the DFS-planning and code-repair modules contribute significantly to performance
* The FST-based approach mitigates cascading failures common in prior linear agent pipelines
* Outperforms single-pass generation approaches on complex, multi-step data science tasks

===== References =====

* [[https://arxiv.org/abs/2503.07044|DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science (arXiv:2503.07044)]]
* [[https://aclanthology.org/2025.emnlp-main.58/|EMNLP 2025 Proceedings]]

===== See Also =====

* [[agent_rl_training|Agent RL Training: Agent-R1 and RAGEN]]
* [[api_tool_generation|API Tool Generation: Doc2Agent and LRASGen]]
* [[causal_reasoning_agents|Causal Reasoning Agents: Causal-Copilot]]