====== TaskWeaver ======

TaskWeaver is a code-first agent framework developed by Microsoft Research (Qiao et al., 2023) that converts natural language user requests into executable Python code. Unlike text-centric frameworks that chain LLM calls with predefined tools, TaskWeaver relies on LLM code generation to handle complex logic, rich data structures, and domain-specific analytics tasks through a stateful, plugin-extensible architecture.

===== Core Design Philosophy =====

TaskWeaver's central principle is **code-first execution**: every user request ultimately becomes runnable Python code. This design choice provides several advantages:

  * **Expressiveness**: Arbitrary logic can be encoded in code rather than being constrained to predefined tool chains
  * **Rich data structures**: Native support for DataFrames, arrays, dictionaries, and complex objects
  * **Composability**: Plugins are callable functions that can be combined with custom code
  * **Verifiability**: Generated code can be inspected, tested, and constrained by security rules

===== Architecture =====

TaskWeaver uses a multi-agent architecture with three primary roles that follow a ReAct (reasoning and acting) pattern:

**Planner**
  * Decomposes user requests into sub-tasks
  * Creates and updates execution plans
  * Delegates sub-tasks to the Code Interpreter
  * Reflects on results and iterates until completion

**Code Generator (CG)**
  * Part of the Code Interpreter component
  * Generates Python code snippets for each sub-task
  * Incorporates plugin schemas and domain examples
  * Can produce pure code, plugin calls, or both

**Code Executor**
  * Runs generated code in a sandboxed environment
  * Captures outputs, errors, and state changes
  * Maintains persistent session state across interactions

The workflow is:

$$\text{User Request} \xrightarrow{\text{Planner}} \text{Sub-tasks} \xrightarrow{\text{CG}} \text{Python Code} \xrightarrow{\text{Executor}} \text{Results} \xrightarrow{\text{Planner}} \text{Response}$$

===== Plugin System =====

Plugins are the extensibility mechanism of TaskWeaver. Each plugin pairs a Python implementation with a YAML schema that describes its parameters and return values. The example below is simplified: the schema is shown as comments above a plain function.

<code python>
# Plugin definition: anomaly_detection.yaml
# name: anomaly_detection
# description: Detect anomalies in time series data
# parameters:
#   - name: data
#     type: pd.DataFrame
#     description: Input DataFrame with timestamp and value columns
#   - name: threshold
#     type: float
#     description: Z-score threshold for anomaly detection
# returns:
#   - name: result
#     type: pd.DataFrame
#     description: DataFrame with anomaly flags added

import pandas as pd


def anomaly_detection(data: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is not modified in place
    result = data.copy()
    mean = result["value"].mean()
    std = result["value"].std()
    # Z-score: distance from the mean in units of standard deviation
    result["z_score"] = (result["value"] - mean) / std
    result["is_anomaly"] = result["z_score"].abs() > threshold
    return result
</code>

Key plugin features:

  * **Dynamic selection**: Only plugins relevant to the request are included in the LLM prompt, avoiding prompt bloat
  * **Schema-driven**: YAML schemas tell the LLM how to call each plugin correctly
  * **Plugin-only mode**: Optional restriction that limits code generation to plugin calls
  * **Scalable**: New plugins can be added without modifying the core framework

===== Stateful Execution =====

Unlike frameworks that reset state between turns, TaskWeaver maintains a **persistent execution environment**:

  * Variables (e.g., DataFrames) persist across sub-tasks within a session
  * The Code Executor tracks both chat history and code execution history
  * In-memory data structures are preserved, enabling iterative refinement

This is critical for data analytics workflows, where the standard pattern is to load data once and then run multiple analyses on it.

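The persistent execution environment described above can be illustrated with a minimal sketch. This is not TaskWeaver's implementation: the ''SessionExecutor'' class and its ''run'' method are hypothetical names introduced here, and the sketch uses Python's built-in ''exec'' with a shared namespace and no sandboxing. It only shows the idea that each snippet runs against the same session state and that outputs and errors are recorded in an execution history.

```python
import contextlib
import io


class SessionExecutor:
    """Toy stand-in for a stateful code executor (hypothetical, not TaskWeaver's API)."""

    def __init__(self):
        # One namespace per session; every snippet runs against it,
        # so variables defined in one turn are visible in later turns.
        self.namespace = {}
        # Execution history: (code, captured stdout, error message or None).
        self.history = []

    def run(self, code):
        buffer = io.StringIO()
        error = None
        try:
            # Capture anything the snippet prints instead of writing to the console.
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception as exc:
            # Report the failure rather than raising, so a planner could react to it.
            error = f"{type(exc).__name__}: {exc}"
        output = buffer.getvalue()
        self.history.append((code, output, error))
        return output, error


session = SessionExecutor()
session.run("rows = [3, 5, 8]")              # turn 1: load data once
out, err = session.run("print(sum(rows))")   # turn 2: reuse it without reloading
```

The second call succeeds only because ''rows'' is still present in the session namespace. An executor that created a fresh namespace per call would report a ''NameError'' instead.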
===== Handling Rich Data Structures =====

TaskWeaver natively supports complex data types that text-only agents struggle with:

<code python>
import matplotlib.pyplot as plt
import pandas as pd

# Example: multi-step data analytics with TaskWeaver

# Step 1: Plugin loads data from a SQL database and returns a DataFrame
df = sql_pull_data(query="SELECT * FROM sales WHERE year=2024")

# Step 2: Generated code performs the analysis (not a plugin)
df["date"] = pd.to_datetime(df["date"])  # Grouper needs a datetime column
# "M" groups by month end; pandas 2.2+ prefers the alias "ME"
monthly = df.groupby(pd.Grouper(key="date", freq="M")).agg({
    "revenue": "sum",
    "units": "sum",
})
monthly["avg_price"] = monthly["revenue"] / monthly["units"]

# Step 3: Plugin detects anomalies on the derived DataFrame.
# The plugin reads a "value" column, so revenue is exposed under that name.
anomalies = anomaly_detection(
    monthly.rename(columns={"revenue": "value"}), threshold=2.5
)

# Step 4: Generated code creates the visualization
flagged = anomalies[anomalies["is_anomaly"]]
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(anomalies.index, anomalies["value"], label="Revenue")
ax.scatter(flagged.index, flagged["value"], color="red", s=100, label="Anomaly")
ax.set_title("Monthly Revenue with Anomalies")
ax.legend()
fig.savefig("revenue_anomalies.png")
</code>

The DataFrame flows through plugins and custom code without serialization, so its schema, types, and in-memory efficiency are preserved.

===== Security and Verification =====

TaskWeaver verifies generated code before executing it:

  * Configurable rules can block disallowed imports (e.g., ''subprocess'')
  * Code is inspected for unsafe function calls (e.g., ''os.system'')
  * Plugin-only mode restricts generated code to vetted function calls
  * A sandboxed execution environment isolates code from the host system

===== Comparison to Other Frameworks =====

^ Feature ^ TaskWeaver ^ LangChain ^ AutoGen ^
| Paradigm | Code-first generation | Prompt chaining with tools | Multi-agent conversations |
| Data Structures | Native (DataFrame, etc.) | Text serialization | Text-based |
| State Management | Persistent execution env | Chain state / memory | Conversation history |
| Extensibility | Plugin YAML schemas | Tool/chain definitions | Agent role definitions |
| Code Execution | Built-in sandboxed executor | Requires external setup | Code execution agent |
| Complex Logic | Arbitrary Python code | Limited to tool chains | Agent negotiation |
| Domain Adaptation | Examples + plugins | Prompt engineering | Agent specialization |

===== References =====

  * [[https://arxiv.org/abs/2311.17541|Qiao et al. (2023) - TaskWeaver: A Code-First Agent Framework]]
  * [[https://github.com/microsoft/TaskWeaver|Official TaskWeaver Repository (Microsoft)]]
  * [[https://www.microsoft.com/en-us/research/blog/taskweaver-a-code-first-agent-framework-for-efficient-data-analytics-and-domain-adaptation/|Microsoft Research Blog: TaskWeaver]]

===== See Also =====

  * [[agentbench|AgentBench]] - Benchmark for evaluating LLM agents in interactive environments
  * [[aider|Aider]] - AI pair programming tool with a different approach to code generation
  * [[tau_bench|tau-bench]] - Benchmark for tool-agent-user interaction