TaskWeaver is a code-first agent framework developed by Microsoft Research (Qiao et al., 2023) that converts natural language user requests into executable Python code. Unlike text-centric frameworks that chain LLM calls with predefined tools, TaskWeaver leverages LLM code generation capabilities to handle complex logic, rich data structures, and domain-specific analytics tasks through a stateful, plugin-extensible architecture.
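TaskWeaver's central principle is code-first execution: every user request ultimately becomes runnable Python code. Because the LLM writes code instead of picking from a fixed menu of tool calls, it can express arbitrary control flow, operate directly on in-memory data structures such as pandas DataFrames, and combine domain-specific plugins with custom logic in a single step.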
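TaskWeaver uses a multi-agent architecture with three primary roles, following a ReAct (Reason + Act) pattern:

- **Planner**: decomposes the user request into sub-tasks and composes the final response from the execution results.
- **Code Generator (CG)**: translates each sub-task into Python code, calling plugins where they apply.
- **Code Executor**: runs the generated code in a stateful session and returns the results to the Planner.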
The end-to-end workflow is:
<latex> \text{User Request} \xrightarrow{\text{Planner}} \text{Sub-tasks} \xrightarrow{\text{CG}} \text{Python Code} \xrightarrow{\text{Executor}} \text{Results} \xrightarrow{\text{Planner}} \text{Response} </latex>
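The loop above can be sketched in plain Python. This is a toy model under stated assumptions, not TaskWeaver's implementation: `planner_decompose`, `code_generator`, `planner_respond`, and `CodeExecutor` are hypothetical names, and the LLM-backed roles are replaced by hard-coded stubs so that only the control flow remains.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical stand-ins for the three roles. In TaskWeaver the Planner and
# Code Generator are LLM-backed; here they are stubs so the flow is visible.


def planner_decompose(request: str) -> list[str]:
    """Planner (stub): split the request into ordered sub-tasks."""
    return ["load the numbers", "compute their mean"]


def code_generator(sub_task: str) -> str:
    """Code Generator (stub): map a sub-task to Python source code."""
    snippets = {
        "load the numbers": "numbers = [3, 5, 7, 9]",
        "compute their mean": "result = sum(numbers) / len(numbers)",
    }
    return snippets[sub_task]


@dataclass
class CodeExecutor:
    """Code Executor: runs code in one namespace that persists across calls."""
    namespace: dict[str, Any] = field(default_factory=dict)

    def run(self, code: str) -> Any:
        exec(code, self.namespace)
        # Convention assumed by this sketch: code stores its output in `result`
        return self.namespace.get("result")


def planner_respond(request: str, results: list[Any]) -> str:
    """Planner (stub): turn the last execution result into a reply."""
    return f"Answer to {request!r}: {results[-1]}"


def handle(request: str) -> str:
    """User request -> sub-tasks -> code -> results -> response."""
    executor = CodeExecutor()
    results = []
    for sub_task in planner_decompose(request):
        code = code_generator(sub_task)
        results.append(executor.run(code))
    return planner_respond(request, results)


print(handle("What is the mean of my numbers?"))
```

The sketch runs sub-tasks in a fixed order. Because results flow back to the Planner, a fuller implementation could inspect them and revise the plan between steps.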
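Plugins are TaskWeaver's extensibility mechanism. Each plugin pairs a YAML schema, which describes its name, purpose, parameters, and return values, with a Python implementation: a class that subclasses `Plugin`, is registered with the `@register_plugin` decorator, and does its work in `__call__`. Generated code then calls the plugin like an ordinary function, using the name from the schema.

```yaml
# anomaly_detection.yaml
name: anomaly_detection
enabled: true
required: false
description: Detect anomalies in time series data
parameters:
  - name: data
    type: DataFrame
    required: true
    description: Input DataFrame with timestamp and value columns
  - name: threshold
    type: float
    required: false
    description: Z-score threshold for anomaly detection
returns:
  - name: result
    type: DataFrame
    description: DataFrame with anomaly flags added
```

```python
# anomaly_detection.py
import pandas as pd
from taskweaver.plugin import Plugin, register_plugin


@register_plugin
class AnomalyDetection(Plugin):
    def __call__(self, data: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
        result = data.copy()  # leave the caller's DataFrame untouched
        mean = result["value"].mean()
        std = result["value"].std()
        # Guard against a constant (or single-row) series, where std is 0 or NaN
        result["z_score"] = (result["value"] - mean) / std if std > 0 else 0.0
        result["is_anomaly"] = result["z_score"].abs() > threshold
        return result
```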
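Written out, the detection rule this plugin applies to each row i of the `value` column is:

<latex> z_i = \frac{x_i - \bar{x}}{s}, \qquad \text{is\_anomaly}_i = \big[\, |z_i| > \tau \,\big] </latex>

Here x̄ and s are the mean and sample standard deviation of the column (pandas' `Series.std()` uses `ddof=1` by default), and τ is the `threshold` argument. A row is flagged when its value lies more than τ standard deviations from the mean.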
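The schema is what the LLM sees when deciding whether and how to use a plugin: the name and description tell it when the plugin applies, and the typed parameter and return entries tell it how to call the plugin and what it gets back.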
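One plausible way to present such a schema to the LLM is as a Python function stub it can call. The sketch below assumes that approach; `render_signature` and the `ANOMALY_SCHEMA` dict (with illustrative key names, used instead of YAML so no parser is needed) are hypothetical and do not reproduce TaskWeaver's actual prompt format.

```python
# Hypothetical plugin schema as a plain dict; key names are illustrative.
ANOMALY_SCHEMA = {
    "name": "anomaly_detection",
    "description": "Detect anomalies in time series data",
    "parameters": [
        {"name": "data", "type": "DataFrame",
         "description": "Input DataFrame with timestamp and value columns"},
        {"name": "threshold", "type": "float",
         "description": "Z-score threshold for anomaly detection"},
    ],
    "returns": [
        {"name": "result", "type": "DataFrame",
         "description": "DataFrame with anomaly flags added"},
    ],
}


def render_signature(schema: dict) -> str:
    """Render a plugin schema as a typed function stub with a docstring."""
    params = ", ".join(f"{p['name']}: {p['type']}" for p in schema["parameters"])
    returns = ", ".join(r["type"] for r in schema["returns"])
    lines = [
        f"def {schema['name']}({params}) -> {returns}:",
        f'    """{schema["description"]}',
        "",
    ]
    # One docstring line per parameter so the LLM sees what each argument means
    for p in schema["parameters"]:
        lines.append(f"    {p['name']}: {p['description']}")
    lines.append('    """')
    return "\n".join(lines)


print(render_signature(ANOMALY_SCHEMA))
```

A typed stub like this is compact and close to the code the LLM is asked to write, which is one reason to prefer it over dumping the raw schema into the prompt.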
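Unlike frameworks that reset state between turns, TaskWeaver maintains a persistent execution environment: variables, imported modules, and DataFrames created by code in one turn remain available to code generated in later turns of the same session.
This is critical for data analytics workflows, where the standard pattern is to load data once and then run several analyses on it.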
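The difference between stateless and persistent execution can be shown with Python's built-in `exec`. This is a minimal sketch, assuming each turn is one string of generated code; `run_session` and `TURNS` are hypothetical names, not part of TaskWeaver.

```python
# Two turns of generated code: the second reuses a variable from the first.
TURNS = [
    "rows = [4, 8, 15, 16, 23, 42]  # turn 1: load data once",
    "total = sum(rows)              # turn 2: analyze it later",
]


def run_session(turns: list[str], persistent: bool) -> tuple[list[str], dict]:
    """Run each turn; share one namespace only when `persistent` is True."""
    shared: dict = {}
    outcomes = []
    for code in turns:
        namespace = shared if persistent else {}  # fresh dict = stateless turn
        try:
            exec(code, namespace)
            outcomes.append("ok")
        except NameError as err:
            outcomes.append(f"NameError: {err}")
    return outcomes, shared


print(run_session(TURNS, persistent=False)[0])  # turn 2 cannot see `rows`
print(run_session(TURNS, persistent=True)[0])   # both turns succeed
```

In the stateless run, the second turn would have to reload the data; in the persistent run it reuses `rows` directly, which is the pattern the paragraph above describes.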
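TaskWeaver natively supports rich data structures such as pandas DataFrames, which text-only agents must serialize to strings before passing them between steps. The following example chains plugins and generated code in a multi-step analysis: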
```python
# Example: multi-step data analytics with TaskWeaver
import matplotlib.pyplot as plt
import pandas as pd

# Step 1: a plugin (available in the session) pulls data from SQL -> DataFrame
df = sql_pull_data(query="SELECT * FROM sales WHERE year=2024")

# Step 2: generated code (not a plugin) aggregates revenue and units by month
df["date"] = pd.to_datetime(df["date"])
monthly = df.groupby(pd.Grouper(key="date", freq="MS")).agg({  # month-start labels
    "revenue": "sum",
    "units": "sum",
})
monthly["avg_price"] = monthly["revenue"] / monthly["units"]

# Step 3: a plugin flags anomalies on the derived DataFrame;
# anomaly_detection scores the "value" column, so expose revenue under that name
monthly["value"] = monthly["revenue"]
anomalies = anomaly_detection(monthly, threshold=2.5)

# Step 4: generated code plots monthly revenue and highlights anomalous months
flagged = anomalies[anomalies["is_anomaly"]]
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(anomalies.index, anomalies["revenue"], label="Revenue")
ax.scatter(flagged.index, flagged["revenue"], color="red", s=100, label="Anomaly")
ax.set_title("Monthly Revenue with Anomalies")
ax.legend()
fig.savefig("revenue_anomalies.png")
```
The DataFrame passes between plugins and generated code as an in-memory Python object rather than being serialized to text, so its column names and dtypes are preserved and no parsing step is needed.
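What a text hand-off loses can be seen by round-tripping a DataFrame through CSV, as a text-only pipeline might. This is a minimal illustration using standard pandas calls; the column names and values are made up for the example.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-31", "2024-02-29"]),
    "region": pd.Categorical(["east", "west"]),
    "revenue": [100.0, 120.5],
})

# Text path: serialize to CSV and parse it back without any type hints.
restored = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(df.dtypes)        # datetime, category, float
print(restored.dtypes)  # dates and regions come back as plain strings
```

The numeric column survives, but the datetime and categorical columns come back as strings unless the receiver re-specifies their types (for example with `parse_dates`). Passing the object itself avoids that bookkeeping.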
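TaskWeaver also verifies generated code before executing it, rejecting code that uses disallowed operations (for example, `os.system` or `subprocess`).

How TaskWeaver compares with LangChain and AutoGen:

| Feature | TaskWeaver | LangChain | AutoGen |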
|---|---|---|---|
| Paradigm | Code-first generation | Prompt chaining with tools | Multi-agent conversations |
| Data Structures | Native (DataFrame, etc.) | Text serialization | Text-based |
| State Management | Persistent execution env | Chain state / memory | Conversation history |
| Extensibility | Plugin YAML schemas | Tool/chain definitions | Agent role definitions |
| Code Execution | Built-in sandboxed executor | Requires external setup | Code execution agent |
| Complex Logic | Arbitrary Python code | Limited to tool chains | Agent negotiation |
| Domain Adaptation | Examples + plugins | Prompt engineering | Agent specialization |
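The pre-execution code verification mentioned above can be approximated with a static pass over the generated code's syntax tree using Python's `ast` module. This is a sketch under assumptions: `verify_code`, `BLOCKED_MODULES`, and `BLOCKED_CALLS` are hypothetical, and the block lists are illustrative rather than TaskWeaver's actual configuration.

```python
import ast
from typing import Optional

# Illustrative block lists (assumptions, not TaskWeaver's configuration).
BLOCKED_MODULES = {"subprocess"}
BLOCKED_CALLS = {"os.system", "eval", "exec"}


def _dotted_name(node: ast.AST) -> Optional[str]:
    """Turn `os.system` style call targets into a dotted string, if possible."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def verify_code(source: str) -> list[str]:
    """Return a list of violations; an empty list means the code may run."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name in BLOCKED_CALLS:
                violations.append(f"call {name}()")
    return violations


print(verify_code("import os\nos.system('ls')"))
```

A static check like this catches direct uses but is easy to evade (for example via `getattr(os, "system")`), so it complements, rather than replaces, running code in an isolated environment.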