TaskWeaver

TaskWeaver is a code-first agent framework developed by Microsoft Research (Qiao et al., 2023) that converts natural language user requests into executable Python code. Unlike text-centric frameworks that chain LLM calls with predefined tools, TaskWeaver leverages LLM code generation capabilities to handle complex logic, rich data structures, and domain-specific analytics tasks through a stateful, plugin-extensible architecture.

Core Design Philosophy

TaskWeaver's central principle is code-first execution: every user request ultimately becomes runnable Python code. This design choice provides several advantages:

Expressiveness: Arbitrary logic can be encoded in code, not constrained to predefined tool chains
Rich data structures: Native support for DataFrames, arrays, dictionaries, and complex objects
Composability: Plugins are callable functions that can be combined with custom code
Verifiability: Generated code can be inspected, tested, and constrained by security rules

Architecture

TaskWeaver uses a multi-agent architecture with three primary roles that follow a ReAct (reasoning and acting) pattern:

Planner

Decomposes user requests into sub-tasks
Creates and updates execution plans
Delegates sub-tasks to the Code Interpreter
Reflects on results and iterates until completion

Code Generator (CG)

Part of the Code Interpreter component
Generates Python code snippets for each sub-task
Incorporates plugin schemas and domain examples
Can produce pure code, plugin calls, or both

Code Executor

Runs generated code in a sandboxed environment
Captures outputs, errors, and state changes
Maintains persistent session state across interactions

The workflow proceeds as follows:

<latex> \text{User Request} \xrightarrow{\text{Planner}} \text{Sub-tasks} \xrightarrow{\text{CG}} \text{Python Code} \xrightarrow{\text{Executor}} \text{Results} \xrightarrow{\text{Planner}} \text{Response} </latex>
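The pipeline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Session`, `plan`, `generate_code`, `execute`, and `handle` are hypothetical names, not TaskWeaver's API, and the Planner and Code Generator are replaced by hard-coded stand-ins so the example runs without an LLM.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Session:
    """Hypothetical container for state shared across sub-tasks."""
    namespace: Dict[str, Any] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


def plan(request: str) -> List[str]:
    # Stand-in for the Planner: a real one would ask an LLM to decompose the request.
    return [step.strip() for step in request.split(" then ") if step.strip()]


def generate_code(sub_task: str) -> str:
    # Stand-in for the Code Generator: maps known sub-tasks to canned snippets.
    snippets = {
        "load numbers": "numbers = [3, 1, 4, 1, 5]",
        "sum them": "result = sum(numbers)",
    }
    return snippets[sub_task]


def execute(code: str, session: Session) -> None:
    # Stand-in for the Code Executor: runs code against the session namespace.
    exec(code, session.namespace)
    session.history.append(code)


def handle(request: str, session: Session) -> Any:
    # Planner -> Code Generator -> Code Executor, one sub-task at a time.
    for sub_task in plan(request):
        execute(generate_code(sub_task), session)
    return session.namespace.get("result")


session = Session()
answer = handle("load numbers then sum them", session)
print(answer)
```

A real Planner would also inspect each result and revise the plan before continuing; that reflection step is omitted here for brevity.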

Plugin System

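Plugins are the extensibility mechanism of TaskWeaver. Each plugin pairs a Python implementation with a YAML schema that describes its name, arguments, and return values. In the example, the schema is shown as comments above the function: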

# Plugin definition: anomaly_detection.yaml
# name: anomaly_detection
# description: Detect anomalies in time series data
# args:
#   - name: data
#     type: pd.DataFrame
#     description: Input DataFrame with timestamp and value columns
#   - name: threshold
#     type: float
#     description: Z-score threshold for anomaly detection
# returns:
#   - name: result
#     type: pd.DataFrame
#     description: DataFrame with anomaly flags added
 
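import pandas as pd

def anomaly_detection(data, threshold=2.0):
    # Work on a copy so the caller's DataFrame is not modified in place
    result = data.copy()
    mean = result["value"].mean()
    std = result["value"].std()
    # Z-score: distance from the mean in units of standard deviation.
    # If std is 0 or NaN (constant or single-row input), no row is flagged.
    result["z_score"] = (result["value"] - mean) / std
    result["is_anomaly"] = result["z_score"].abs() > threshold
    return result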

Key plugin features:

Dynamic selection: Only relevant plugins are included in the LLM prompt per request, avoiding prompt bloat
Schema-driven: YAML schemas tell the LLM how to call each plugin correctly
Plugin-only mode: Optional restriction that limits code generation to plugin calls only
Scalable: New plugins can be added without modifying the core framework

Stateful Execution

Unlike frameworks that reset state between turns, TaskWeaver maintains a persistent execution environment:

Variables (e.g., DataFrames) persist across sub-tasks within a session
The Code Executor tracks both chat history and code execution history
In-memory data structures are preserved, enabling iterative refinement

This is critical for data analytics workflows where loading data once and performing multiple analyses is the standard pattern.
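A minimal sketch of the difference between stateful and stateless execution follows. `StatefulExecutor` and `run_stateless` are hypothetical names used for illustration; plain `exec` with a shared dictionary stands in for the framework's execution environment.

```python
from typing import Any, Dict, List


class StatefulExecutor:
    """Hypothetical executor that keeps one namespace alive for a whole session."""

    def __init__(self) -> None:
        self.namespace: Dict[str, Any] = {}
        self.code_history: List[str] = []

    def run(self, code: str) -> None:
        # Every snippet sees the variables left behind by earlier snippets.
        exec(code, self.namespace)
        self.code_history.append(code)


def run_stateless(code: str) -> None:
    # A fresh namespace per call: nothing survives between turns.
    exec(code, {})


executor = StatefulExecutor()
executor.run("sales = [120, 135, 150]")       # turn 1: load data once
executor.run("total = sum(sales)")            # turn 2: reuse it
executor.run("average = total / len(sales)")  # turn 3: build on turn 2
print(executor.namespace["total"], executor.namespace["average"])

try:
    run_stateless("total = sum(sales)")       # 'sales' was never defined here
    stateless_failed = False
except NameError:
    stateless_failed = True
print(stateless_failed)
```

In the stateless variant, every turn would have to reload the data before doing any work, which is exactly the overhead a persistent session avoids.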

Handling Rich Data Structures

TaskWeaver natively supports complex data types that text-only agents struggle with:

# Example: Multi-step data analytics with TaskWeaver
import matplotlib.pyplot as plt
import pandas as pd

# Step 1: Plugin loads data from SQL database -> returns DataFrame
df = sql_pull_data(query="SELECT * FROM sales WHERE year=2024")

# Step 2: Generated code performs analysis (not a plugin)
df["date"] = pd.to_datetime(df["date"])  # Grouper needs a datetime column
# freq="M" groups by month end; pandas 2.2+ prefers the alias "ME"
monthly = df.groupby(pd.Grouper(key="date", freq="M")).agg({
    "revenue": "sum",
    "units": "sum"
})
monthly["avg_price"] = monthly["revenue"] / monthly["units"]

# Step 3: Plugin detects anomalies on the derived DataFrame
# The plugin reads a "value" column, so expose revenue under that name
anomalies = anomaly_detection(
    monthly.assign(value=monthly["revenue"]), threshold=2.5
)

# Step 4: Generated code creates visualization
flagged = anomalies[anomalies["is_anomaly"]]
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(anomalies.index, anomalies["revenue"], label="Revenue")
ax.scatter(flagged.index, flagged["revenue"], color="red", s=100, label="Anomaly")
ax.set_title("Monthly Revenue with Anomalies")
ax.legend()
fig.savefig("revenue_anomalies.png")

The DataFrame flows through plugins and custom code without serialization, maintaining schema, types, and in-memory efficiency.
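To see why avoiding serialization matters, consider what a text round trip does to typed values. The sketch below uses only the standard library, with a dictionary holding a `date` standing in for a DataFrame row; `record` and `add_margin` are hypothetical. Passing the object directly preserves its type, while a trip through JSON text silently turns the date into a string.

```python
import json
from datetime import date

# A typed record standing in for one row of a DataFrame.
record = {"date": date(2024, 1, 31), "revenue": 1000.0}


def add_margin(row: dict) -> dict:
    # Downstream step that receives the object itself, not a text rendering of it.
    return {**row, "margin": row["revenue"] * 0.2}


# In-memory hand-off: the date is still a date afterwards.
in_memory = add_margin(record)

# Text hand-off: dates are not JSON-native, so they are written as strings
# and come back as strings.
as_text = json.dumps(record, default=str)
round_tripped = json.loads(as_text)

print(type(in_memory["date"]).__name__)
print(type(round_tripped["date"]).__name__)
```

A text-based agent would need to re-parse such fields at every step, whereas code that shares one process can keep using date arithmetic, grouping, and indexing on the original objects.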

Security and Verification

TaskWeaver implements code verification before execution:

Configurable rules can block imports of specified modules (e.g., os, subprocess)
Code is inspected for calls to disallowed functions (e.g., os.system)
Plugin-only mode restricts to vetted function calls
Sandboxed execution environment isolates code from the host system
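One common way to implement checks of this kind is to walk the abstract syntax tree of the generated code before running it. The following is a minimal sketch using Python's standard `ast` module, not TaskWeaver's actual verifier; the rule sets `BANNED_MODULES` and `BANNED_CALLS` and the helpers `dotted_name` and `verify` are assumptions made for illustration.

```python
import ast
from typing import List, Set

BANNED_MODULES: Set[str] = {"os", "subprocess"}          # assumed rule set
BANNED_CALLS: Set[str] = {"eval", "exec", "os.system"}   # assumed rule set


def dotted_name(node: ast.AST) -> str:
    """Rebuild names such as 'os.system' from Name/Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = dotted_name(node.value)
        return f"{prefix}.{node.attr}" if prefix else node.attr
    return ""


def verify(code: str) -> List[str]:
    """Return a list of rule violations; an empty list means the code passed."""
    violations: List[str] = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_MODULES:
                    violations.append(f"import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BANNED_MODULES:
                violations.append(f"import from '{node.module}'")
        elif isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in BANNED_CALLS:
                violations.append(f"call to '{name}'")
    return violations


safe = verify("import math\nx = math.sqrt(16)")
unsafe = verify("import subprocess\nsubprocess.run(['ls'])\neval('1+1')")
print(safe)
print(unsafe)
```

Static checks like this are easy to evade (for example through aliasing or `getattr`), which is why they complement rather than replace an isolated execution environment.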

Comparison to Other Frameworks

Feature	TaskWeaver	LangChain	AutoGen
Paradigm	Code-first generation	Prompt chaining with tools	Multi-agent conversations
Data Structures	Native (DataFrame, etc.)	Text serialization	Text-based
State Management	Persistent execution env	Chain state / memory	Conversation history
Extensibility	Plugin YAML schemas	Tool/chain definitions	Agent role definitions
Code Execution	Built-in sandboxed executor	Requires external setup	Code execution agent
Complex Logic	Arbitrary Python code	Limited to tool chains	Agent negotiation
Domain Adaptation	Examples + plugins	Prompt engineering	Agent specialization
