====== TaskWeaver ======

TaskWeaver is a code-first agent framework developed by Microsoft Research (Qiao et al., 2023) that converts natural language user requests into executable Python code. Unlike text-centric frameworks that chain LLM calls with predefined tools, TaskWeaver leverages LLM code generation capabilities to handle complex logic, rich data structures, and domain-specific analytics tasks through a stateful, plugin-extensible architecture.

===== Core Design Philosophy =====

TaskWeaver's central principle is **code-first execution**: every user request ultimately becomes runnable Python code. This design choice provides several advantages:

  * **Expressiveness**: Arbitrary logic can be encoded in code, not constrained to predefined tool chains
  * **Rich data structures**: Native support for DataFrames, arrays, dictionaries, and complex objects
  * **Composability**: Plugins are callable functions that can be combined with custom code
  * **Verifiability**: Generated code can be inspected, tested, and constrained by security rules

===== Architecture =====

TaskWeaver uses a multi-agent architecture with three primary roles that follow a ReAct (reasoning and acting) pattern. The Code Generator and Code Executor together form the Code Interpreter:

**Planner**
  * Decomposes user requests into sub-tasks
  * Creates and updates execution plans
  * Delegates sub-tasks to the Code Interpreter
  * Reflects on results and iterates until completion

**Code Generator (CG)**
  * Part of the Code Interpreter component
  * Generates Python code snippets for each sub-task
  * Incorporates plugin schemas and domain examples
  * Can produce pure code, plugin calls, or both

**Code Executor**
  * Runs generated code in a sandboxed environment
  * Captures outputs, errors, and state changes
  * Maintains persistent session state across interactions

The workflow proceeds as follows:

<latex>
\text{User Request} \xrightarrow{\text{Planner}} \text{Sub-tasks} \xrightarrow{\text{CG}} \text{Python Code} \xrightarrow{\text{Executor}} \text{Results} \xrightarrow{\text{Planner}} \text{Response}
</latex>
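
The pipeline above can be sketched as a small control loop. This is a minimal illustration, not TaskWeaver's implementation: ''Session'', ''plan'', ''generate_code'', ''execute'', and ''run'' are hypothetical names, and the Planner and Code Generator are replaced by a string split and a lookup table so the example runs without an LLM.

<code python>
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical container for state shared across sub-tasks."""
    namespace: dict = field(default_factory=dict)  # variables kept by the executor
    log: list = field(default_factory=list)        # (sub_task, code) pairs


def plan(request: str) -> list[str]:
    # Stand-in for the Planner: a real planner would ask an LLM to decompose.
    return [step.strip() for step in request.split(" then ")]


def generate_code(sub_task: str) -> str:
    # Stand-in for the Code Generator: a lookup table instead of an LLM call.
    templates = {
        "load numbers": "numbers = [3, 1, 4, 1, 5]",
        "sum them": "result = sum(numbers)",
    }
    return templates[sub_task]


def execute(code: str, session: Session) -> None:
    # Stand-in for the Code Executor: runs code against the shared namespace.
    exec(code, session.namespace)


def run(request: str, session: Session) -> object:
    # Planner -> Code Generator -> Code Executor, one sub-task at a time.
    for sub_task in plan(request):
        code = generate_code(sub_task)
        execute(code, session)
        session.log.append((sub_task, code))
    return session.namespace.get("result")


session = Session()
answer = run("load numbers then sum them", session)
print(answer)
</code>

The second sub-task can read ''numbers'' only because both snippets run against the same namespace, which is the property the Code Executor's persistent session state provides.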

===== Plugin System =====

Plugins are TaskWeaver's extensibility mechanism. Each plugin pairs a Python implementation with a YAML schema that describes its name, arguments, and return values:

<code python>
# Plugin definition (simplified): anomaly_detection.yaml
# name: anomaly_detection
# description: Detect anomalies in time series data
# args:
#   - name: data
#     type: pd.DataFrame
#     description: Input DataFrame with timestamp and value columns
#   - name: threshold
#     type: float
#     description: Z-score threshold for anomaly detection
# returns:
#   - name: result
#     type: pd.DataFrame
#     description: DataFrame with anomaly flags added

import pandas as pd


def anomaly_detection(data: pd.DataFrame, threshold: float = 2.0) -> pd.DataFrame:
    """Flag rows whose "value" z-score exceeds the threshold."""
    result = data.copy()  # avoid mutating the caller's DataFrame
    mean = result["value"].mean()
    std = result["value"].std()
    # A constant or single-row series has no usable std; treat z-scores as 0
    if pd.isna(std) or std == 0:
        result["z_score"] = 0.0
    else:
        result["z_score"] = (result["value"] - mean) / std
    result["is_anomaly"] = result["z_score"].abs() > threshold
    return result
</code>

Key plugin features:
  * **Dynamic selection**: Only relevant plugins are included in the LLM prompt per request, avoiding prompt bloat
  * **Schema-driven**: YAML schemas tell the LLM how to call each plugin correctly
  * **Plugin-only mode**: Optional restriction that limits code generation to plugin calls only
  * **Scalable**: New plugins can be added without modifying the core framework
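
Dynamic selection can be illustrated with a small ranking function. This is a sketch under stated assumptions: keyword overlap stands in for whatever relevance measure the framework actually uses (which may differ, for example embedding similarity), ''select_plugins'' and ''tokenize'' are hypothetical helpers, and the descriptions for ''sql_pull_data'' and ''send_email'' are invented for the example.

<code python>
import re

# Plugin name -> description. Only the anomaly_detection text comes from the
# schema above; the other two entries are made up for illustration.
PLUGINS = {
    "anomaly_detection": "Detect anomalies in time series data",
    "sql_pull_data": "Pull data from a SQL database",
    "send_email": "Send an email message to a recipient",
}

STOPWORDS = {"a", "an", "the", "in", "to", "from", "of"}


def tokenize(text: str) -> set[str]:
    # Lowercase words with common stopwords removed.
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def select_plugins(request: str, plugins: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k plugin names ranked by word overlap with the request."""
    request_tokens = tokenize(request)
    scored = []
    for name, description in plugins.items():
        plugin_tokens = tokenize(description + " " + name.replace("_", " "))
        overlap = len(request_tokens & plugin_tokens)
        if overlap > 0:  # plugins with no overlap never reach the prompt
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


selected = select_plugins("Find anomalies in the sales time series", PLUGINS)
print(selected)
</code>

Only the schemas of the selected plugins would then be rendered into the prompt, so prompt size is bounded by ''top_k'' rather than by the total number of installed plugins.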

===== Stateful Execution =====

Unlike frameworks that reset state between turns, TaskWeaver maintains a **persistent execution environment**:

  * Variables (e.g., DataFrames) persist across sub-tasks within a session
  * The Code Executor tracks both chat history and code execution history
  * In-memory data structures are preserved, enabling iterative refinement

This is critical for data analytics workflows where loading data once and performing multiple analyses is the standard pattern.
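
A toy version of this idea keeps a single namespace alive between code snippets. The sketch below is an assumption-laden illustration: ''StatefulExecutor'' is a hypothetical class, and an in-process dictionary stands in for whatever runtime the framework actually uses.

<code python>
class StatefulExecutor:
    """Toy executor that keeps one namespace alive for the whole session."""

    def __init__(self) -> None:
        self.namespace: dict = {}     # variables survive between run() calls
        self.history: list[str] = []  # code execution history

    def run(self, code: str) -> None:
        exec(code, self.namespace)
        self.history.append(code)


executor = StatefulExecutor()
executor.run("data = [10, 20, 30, 40]")   # load once
executor.run("total = sum(data)")         # later snippets reuse `data`
executor.run("mean = total / len(data)")
print(executor.namespace["mean"])
</code>

Each call sees the variables defined by earlier calls, so the data is loaded once and analyzed several times without being re-created or serialized.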

===== Handling Rich Data Structures =====

TaskWeaver natively supports complex data types that text-only agents struggle with:

<code python>
# Example: Multi-step data analytics with TaskWeaver
# Step 1: Plugin loads data from SQL database -> returns DataFrame
df = sql_pull_data(query="SELECT * FROM sales WHERE year=2024")

# Step 2: Generated code performs analysis (not a plugin)
import pandas as pd
df["date"] = pd.to_datetime(df["date"])  # Grouper needs a datetime column
# freq="M" is month-end; pandas >= 2.2 prefers the alias "ME"
monthly = df.groupby(pd.Grouper(key="date", freq="M")).agg({
    "revenue": "sum",
    "units": "sum"
})
monthly["avg_price"] = monthly["revenue"] / monthly["units"]

# Step 3: Plugin detects anomalies on the derived DataFrame
# The plugin reads a "value" column, so expose revenue under that name
anomalies = anomaly_detection(
    monthly.rename(columns={"revenue": "value"}), threshold=2.5
)

# Step 4: Generated code creates visualization
import matplotlib.pyplot as plt
flagged = anomalies[anomalies["is_anomaly"]]
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(anomalies.index, anomalies["value"], label="Revenue")
ax.scatter(flagged.index, flagged["value"], color="red", s=100, label="Anomaly")
ax.set_title("Monthly Revenue with Anomalies")
ax.legend()
fig.savefig("revenue_anomalies.png")
</code>

The DataFrame flows through plugins and custom code without serialization, maintaining schema, types, and in-memory efficiency.

===== Security and Verification =====

TaskWeaver implements code verification before execution:

  * Configurable rules can block disallowed imports (e.g., ''os'', ''subprocess'')
  * Code is inspected for unsafe function calls (e.g., ''os.system'')
  * Plugin-only mode restricts to vetted function calls
  * Sandboxed execution environment isolates code from the host system
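
A pre-execution check of this kind can be sketched with Python's standard ''ast'' module. This is an illustrative sketch, not TaskWeaver's verifier: ''BANNED_MODULES'', ''BANNED_CALLS'', and ''find_violations'' are hypothetical names, and the rule sets are examples only.

<code python>
import ast

# Example rule sets; a real deployment would make these configurable.
BANNED_MODULES = {"os", "subprocess"}
BANNED_CALLS = {"eval", "exec"}


def find_violations(code: str) -> list[str]:
    """Return a description of each banned import or call found in the code."""
    violations = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BANNED_MODULES:
                    violations.append(f"import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BANNED_MODULES:
                violations.append(f"import from '{node.module}'")
        elif isinstance(node, ast.Call):
            # Only direct calls by bare name are checked in this sketch.
            if isinstance(node.func, ast.Name) and node.func.id in BANNED_CALLS:
                violations.append(f"call to '{node.func.id}'")
    return violations


safe = find_violations("import pandas as pd\ndf = pd.DataFrame()")
unsafe = find_violations("import subprocess\neval('1 + 1')")
print(safe)
print(unsafe)
</code>

The code is parsed but never run, so the check itself cannot trigger the behavior it looks for. A static check like this is easy to evade (for example through ''%%__import__%%'' or attribute lookups), which is why it complements sandboxed execution rather than replacing it.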

===== Comparison to Other Frameworks =====

^ Feature ^ TaskWeaver ^ LangChain ^ AutoGen ^
| Paradigm | Code-first generation | Prompt chaining with tools | Multi-agent conversations |
| Data Structures | Native (DataFrame, etc.) | Text serialization | Text-based |
| State Management | Persistent execution env | Chain state / memory | Conversation history |
| Extensibility | Plugin YAML schemas | Tool/chain definitions | Agent role definitions |
| Code Execution | Built-in sandboxed executor | Requires external setup | Code execution agent |
| Complex Logic | Arbitrary Python code | Limited to tool chains | Agent negotiation |
| Domain Adaptation | Examples + plugins | Prompt engineering | Agent specialization |

===== References =====

  * [[https://arxiv.org/abs/2311.17541|Qiao et al. (2023) - TaskWeaver: A Code-First Agent Framework]]
  * [[https://github.com/microsoft/TaskWeaver|Official TaskWeaver Repository (Microsoft)]]
  * [[https://www.microsoft.com/en-us/research/blog/taskweaver-a-code-first-agent-framework-for-efficient-data-analytics-and-domain-adaptation/|Microsoft Research Blog: TaskWeaver]]

===== See Also =====

  * [[agentbench|AgentBench]] - Benchmark for evaluating LLM agents in interactive environments
  * [[aider|Aider]] - AI pair programming tool with a different approach to code generation
  * [[tau_bench|tau-bench]] - Benchmark for tool-agent-user interaction