====== TaskWeaver ======
TaskWeaver is a code-first agent framework developed by Microsoft Research (Qiao et al., 2023) that converts natural language user requests into executable Python code. Unlike text-centric frameworks that chain LLM calls with predefined tools, TaskWeaver leverages LLM code generation capabilities to handle complex logic, rich data structures, and domain-specific analytics tasks through a stateful, plugin-extensible architecture.
===== Core Design Philosophy =====
TaskWeaver's central principle is **code-first execution**: every user request ultimately becomes runnable Python code. This design choice provides several advantages:
* **Expressiveness**: Arbitrary logic can be encoded in code, not constrained to predefined tool chains
* **Rich data structures**: Native support for DataFrames, arrays, dictionaries, and complex objects
* **Composability**: Plugins are callable functions that can be combined with custom code
* **Verifiability**: Generated code can be inspected, tested, and constrained by security rules
===== Architecture =====
TaskWeaver uses a multi-agent architecture with three primary roles that follow a ReAct (reasoning and acting) pattern:
**Planner**
* Decomposes user requests into sub-tasks
* Creates and updates execution plans
* Delegates sub-tasks to the Code Interpreter
* Reflects on results and iterates until completion
**Code Generator (CG)**
* Part of the Code Interpreter component
* Generates Python code snippets for each sub-task
* Incorporates plugin schemas and domain examples
* Can produce pure code, plugin calls, or both
**Code Executor**
* Runs generated code in a sandboxed environment
* Captures outputs, errors, and state changes
* Maintains persistent session state across interactions
A request moves through these roles as follows:
$$\text{User Request} \xrightarrow{\text{Planner}} \text{Sub-tasks} \xrightarrow{\text{CG}} \text{Python Code} \xrightarrow{\text{Executor}} \text{Results} \xrightarrow{\text{Planner}} \text{Response}$$
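The workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TaskWeaver's implementation: ''plan'', ''generate_code'', ''execute'', and ''handle'' are hypothetical names, and the two LLM-backed roles are replaced by hard-coded stubs so the example runs offline.

```python
# Hypothetical sketch of the Planner -> Code Generator -> Executor loop.
# None of these names come from TaskWeaver; the LLM-backed roles are stubbed.


def plan(request: str) -> list[str]:
    """Stub Planner: split a request into ordered sub-tasks."""
    return [part.strip() for part in request.split(" then ") if part.strip()]


# Stub Code Generator: a fixed lookup table stands in for LLM code generation.
CODE_FOR_SUBTASK = {
    "load numbers": "numbers = [3, 1, 4, 1, 5]",
    "sum them": "result = sum(numbers)",
}


def generate_code(subtask: str) -> str:
    """Return the Python snippet for one sub-task."""
    return CODE_FOR_SUBTASK[subtask]


def execute(code: str, session: dict) -> None:
    """Stub Executor: run code in a namespace shared across sub-tasks."""
    exec(code, session)


def handle(request: str) -> object:
    """Drive one request through plan -> generate -> execute."""
    session: dict = {}
    for subtask in plan(request):
        execute(generate_code(subtask), session)
    return session.get("result")


answer = handle("load numbers then sum them")
print(answer)
```

The second sub-task only works because it runs in the same namespace as the first, which is the property the Code Executor provides in the real system.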
===== Plugin System =====
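Plugins are TaskWeaver's extensibility mechanism. Each plugin pairs a Python implementation with a YAML schema that describes its name, arguments, and return values to the LLM. The simplified example below shows the schema as comments above the function it describes: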
<code python>
# Plugin definition: anomaly_detection.yaml
# name: anomaly_detection
# description: Detect anomalies in time series data
# args:
#   - name: data
#     type: pd.DataFrame
#     description: Input DataFrame with timestamp and value columns
#   - name: threshold
#     type: float
#     description: Z-score threshold for anomaly detection
# returns:
#   - name: result
#     type: pd.DataFrame
#     description: DataFrame with anomaly flags added

import pandas as pd


def anomaly_detection(data, threshold=2.0):
    """Flag rows whose "value" lies more than `threshold` standard deviations from the mean."""
    data = data.copy()  # avoid mutating the caller's DataFrame
    mean = data["value"].mean()
    std = data["value"].std()
    data["z_score"] = (data["value"] - mean) / std
    data["is_anomaly"] = data["z_score"].abs() > threshold
    return data
</code>
Key plugin features:
* **Dynamic selection**: Only relevant plugins are included in the LLM prompt per request, avoiding prompt bloat
* **Schema-driven**: YAML schemas tell the LLM how to call each plugin correctly
* **Plugin-only mode**: Optional restriction that limits code generation to plugin calls only
* **Scalable**: New plugins can be added without modifying the core framework
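Dynamic selection can be pictured as a relevance filter that runs before the prompt is built. The sketch below is illustrative only: it scores plugins by word overlap between the request and each description, which is a stand-in chosen for simplicity and not a claim about how TaskWeaver ranks plugins. The ''select_plugins'' helper, the ''send_email'' plugin, and the description text for ''sql_pull_data'' are assumptions.

```python
# Hypothetical sketch of dynamic plugin selection. Word overlap is a
# stand-in relevance score chosen for illustration only.
import re

PLUGINS = {
    "anomaly_detection": "Detect anomalies in time series data",
    "sql_pull_data": "Pull data from a SQL database into a DataFrame",  # assumed text
    "send_email": "Send an email message to a recipient",  # hypothetical plugin
}


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_plugins(request: str, top_k: int = 2) -> list[str]:
    """Return up to top_k plugin names whose descriptions share words with the request."""
    words = tokenize(request)
    scored = [(len(words & tokenize(desc)), name) for name, desc in PLUGINS.items()]
    relevant = [(score, name) for score, name in scored if score > 0]
    # Highest overlap first; ties broken alphabetically for determinism.
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in relevant[:top_k]]


selected = select_plugins("Find anomalies in the sales time series")
print(selected)
```

Only the schemas of the selected plugins would then be placed in the prompt, so the prompt size depends on ''top_k'' and not on how many plugins are installed.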
===== Stateful Execution =====
Unlike frameworks that reset state between turns, TaskWeaver maintains a **persistent execution environment**:
* Variables (e.g., DataFrames) persist across sub-tasks within a session
* The Code Executor tracks both chat history and code execution history
* In-memory data structures are preserved, enabling iterative refinement
This is critical for data analytics workflows where loading data once and performing multiple analyses is the standard pattern.
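The idea of a persistent environment can be shown with a small session object that keeps one namespace alive between code snippets. This is a sketch under assumptions, not TaskWeaver's executor: the ''Session'' class and its ''run'' method are hypothetical, and plain lists of dicts stand in for DataFrames.

```python
# Hypothetical sketch of a persistent execution session (not TaskWeaver's API).
import contextlib
import io


class Session:
    """Keeps one namespace alive so later snippets can reuse earlier variables."""

    def __init__(self) -> None:
        self.namespace: dict = {}
        self.history: list[str] = []  # code execution history

    def run(self, code: str) -> str:
        """Execute code in the shared namespace and return captured stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        self.history.append(code)
        return buffer.getvalue()


session = Session()
# First turn: load the data once.
session.run("rows = [{'month': 1, 'revenue': 120}, {'month': 2, 'revenue': 80}]")
# A later turn reuses `rows` without reloading it.
output = session.run("print(sum(r['revenue'] for r in rows))")
print(output)
```

Because ''rows'' lives in the session namespace, the second snippet can refer to it directly, which mirrors loading a dataset once and analyzing it over several turns.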
===== Handling Rich Data Structures =====
TaskWeaver natively supports complex data types that text-only agents struggle with:
<code python>
# Example: Multi-step data analytics with TaskWeaver

# Step 1: Plugin loads data from SQL database -> returns DataFrame
df = sql_pull_data(query="SELECT * FROM sales WHERE year=2024")

# Step 2: Generated code performs analysis (not a plugin)
import pandas as pd

# "M" groups by month end; pandas 2.2+ prefers the alias "ME"
monthly = df.groupby(pd.Grouper(key="date", freq="M")).agg({
    "revenue": "sum",
    "units": "sum",
})
monthly["avg_price"] = monthly["revenue"] / monthly["units"]

# Step 3: Plugin detects anomalies on the derived DataFrame
# The plugin reads a "value" column, so expose revenue under that name
anomalies = anomaly_detection(monthly.assign(value=monthly["revenue"]), threshold=2.5)

# Step 4: Generated code creates visualization
import matplotlib.pyplot as plt

flagged = anomalies[anomalies["is_anomaly"]]
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(anomalies.index, anomalies["revenue"], label="Revenue")
ax.scatter(flagged.index, flagged["revenue"], color="red", s=100, label="Anomaly")
ax.set_title("Monthly Revenue with Anomalies")
ax.legend()
fig.savefig("revenue_anomalies.png")
</code>
The DataFrame flows through plugins and custom code without serialization, maintaining schema, types, and in-memory efficiency.
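The cost of serialization is easy to see with the standard library alone. The following sketch is illustrative and uses a plain dict as a stand-in for a DataFrame row; the variable names are assumptions made for the example.

```python
# Illustrative only: passing objects in memory preserves types that a
# text round trip loses. A plain dict stands in for a DataFrame row.
import json
from datetime import date

row = {"date": date(2024, 1, 31), "revenue": 1250.0}

# In-memory hand-off: the consumer receives the very same object.
in_memory = row

# Text hand-off: dates must be stringified and come back as plain strings.
as_text = json.dumps(row, default=str)
round_tripped = json.loads(as_text)

print(type(in_memory["date"]).__name__)      # date
print(type(round_tripped["date"]).__name__)  # str
```

After the text round trip, any date arithmetic or time-based grouping would first need the column to be parsed again, which is the overhead an in-memory hand-off avoids.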
===== Security and Verification =====
TaskWeaver implements code verification before execution:
* Configurable rules can block the import of specified modules (e.g., ''os'', ''subprocess'')
* Code is inspected for unsafe function calls (e.g., ''os.system'')
* Plugin-only mode restricts to vetted function calls
* Sandboxed execution environment isolates code from the host system
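A rule-based check of this kind can be sketched with Python's standard ''ast'' module. The example is a minimal illustration under assumptions: the ''verify'' function and the two rule sets are hypothetical and are not TaskWeaver's defaults or its verification code.

```python
# Hypothetical sketch of rule-based code verification using the standard
# `ast` module. The rule sets below are illustrative, not TaskWeaver defaults.
import ast

BLOCKED_MODULES = {"os", "subprocess"}
BLOCKED_CALLS = {"eval", "exec"}


def verify(code: str) -> list[str]:
    """Return a list of rule violations; an empty list means the code passed."""
    violations = []
    for node in ast.walk(ast.parse(code)):
        # Collect module names from `import x` and `from x import y`.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] in BLOCKED_MODULES:
                violations.append(f"blocked import: {module}")
        # Flag direct calls to blocked built-in functions.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"blocked call: {node.func.id}")
    return violations


safe = verify("import math\nprint(math.sqrt(2))")
unsafe = verify("import subprocess\neval('1 + 1')")
print(safe, unsafe)
```

Static checks like this are only a first filter, since dynamic tricks can evade name-based rules; that is why the list above pairs verification with a sandboxed execution environment.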
===== Comparison to Other Frameworks =====
^ Feature ^ TaskWeaver ^ LangChain ^ AutoGen ^
| Paradigm | Code-first generation | Prompt chaining with tools | Multi-agent conversations |
| Data Structures | Native (DataFrame, etc.) | Text serialization | Text-based |
| State Management | Persistent execution env | Chain state / memory | Conversation history |
| Extensibility | Plugin YAML schemas | Tool/chain definitions | Agent role definitions |
| Code Execution | Built-in sandboxed executor | Requires external setup | Code execution agent |
| Complex Logic | Arbitrary Python code | Limited to tool chains | Agent negotiation |
| Domain Adaptation | Examples + plugins | Prompt engineering | Agent specialization |
===== References =====
* [[https://arxiv.org/abs/2311.17541|Qiao et al. (2023) - TaskWeaver: A Code-First Agent Framework]]
* [[https://github.com/microsoft/TaskWeaver|Official TaskWeaver Repository (Microsoft)]]
* [[https://www.microsoft.com/en-us/research/blog/taskweaver-a-code-first-agent-framework-for-efficient-data-analytics-and-domain-adaptation/|Microsoft Research Blog: TaskWeaver]]
===== See Also =====
* [[agentbench|AgentBench]] - Benchmark for evaluating LLM agents in interactive environments
* [[aider|Aider]] - AI pair programming tool with a different approach to code generation
* [[tau_bench|tau-bench]] - Benchmark for tool-agent-user interaction