====== Financial Trading Agents ======
LLM-powered agents for financial trading fuse language understanding with reinforcement learning for sequential decision-making; at the same time, benchmarks reveal that most LLM agents struggle to outperform simple buy-and-hold strategies under realistic market conditions.
===== Overview =====
Financial trading demands reasoning over multimodal data (price time series, fundamentals, news), sequential decision-making under uncertainty, and risk management. Three research threads address this: FLAG-Trader fuses LLMs with gradient-based RL for policy optimization, StockBench provides a contamination-free benchmark for realistic multi-month trading evaluation, and multi-agent investment teams deploy collaborative agent architectures for portfolio management.(([[https://arxiv.org/abs/2502.11433|Xiong et al. "FLAG-Trader: Fusion LLM Agent with Gradient-based RL for Financial Trading" (2025)]]))(([[https://arxiv.org/abs/2510.02209|Chen et al. "StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?" (2025)]]))(([[https://arxiv.org/abs/2602.23330|"Multi-Agent Investment Teams for Portfolio Management" (2026)]]))
===== FLAG-Trader: LLM + RL Fusion =====
FLAG-Trader uses a partially fine-tuned LLM as the policy network within a reinforcement learning framework:
**Architecture**: A parameter-efficient fine-tuning (PEFT) module encodes market data into textual state representations fed to the LLM policy network. Only a subset of LLM parameters is updated to balance domain adaptation with preservation of pre-trained knowledge.
**State Representation**: Temporal market data and textual streams (news, reports) are jointly processed into unified inputs:
$$ s_t = \text{Encode}\left(x_t^{\text{price}},\; x_t^{\text{fund}},\; x_t^{\text{text}}\right) $$
**Policy Optimization**: The LLM serves as policy $\pi_\theta(a|s)$ and is trained via policy gradient:
$$ \theta \leftarrow \theta + \alpha \nabla_\theta J(\theta), \quad J(\theta) = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_t\right] $$
where $r_t$ captures trading rewards (returns, risk-adjusted metrics) and $\gamma$ is the discount factor.
**Key Result**: A 135M-parameter open-source model with RL fine-tuning surpasses larger proprietary models (e.g., GPT-o1-preview) in cumulative return and Sharpe ratio.
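The policy-gradient update above can be sketched with a minimal REINFORCE-style loop. The linear softmax policy, synthetic one-step rewards, and the `good`-action rule below are illustrative stand-ins for the LLM policy network and the trading environment, not FLAG-Trader's actual training code:

<code python>
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 3        # toy state features; actions: buy / sell / hold
theta = np.zeros((n_features, n_actions))
alpha = 0.1                         # learning rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(500):
    s = rng.normal(size=n_features)           # synthetic market state
    probs = softmax(s @ theta)                # pi_theta(a | s)
    a = rng.choice(n_actions, p=probs)
    # Synthetic reward: "buy" (0) pays off in up-states, "sell" (1) in down-states.
    good = 0 if s[0] > 0 else 1
    r = 1.0 if a == good else -1.0
    grad_log = -np.outer(s, probs)            # d log pi(a|s) / d theta
    grad_log[:, a] += s
    theta += alpha * r * grad_log             # theta <- theta + alpha * r * grad
</code>

After training, the weight coupling the up-state feature to "buy" exceeds the one coupling it to "sell", i.e. the policy has learned the synthetic reward structure; FLAG-Trader applies the same gradient to a subset of LLM parameters instead of a linear layer.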
===== StockBench: Real-Market Benchmark =====
StockBench evaluates LLM agents in realistic, multi-month stock trading environments:
* **Contamination-free**: Uses recent market data to prevent data leakage
* **Daily decision cycle**: Agents receive prices, fundamentals, and news daily, making buy/sell/hold decisions
* **Financial metrics**: Cumulative return, maximum drawdown, Sortino ratio
$$ \text{Sortino} = \frac{R_p - R_f}{\sigma_d} $$
where $R_p$ is portfolio return, $R_f$ is risk-free rate, and $\sigma_d$ is downside deviation.
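Assuming daily returns and an annualization factor of 252 trading days (a convention, not specified by StockBench), the formula translates directly to code; the `1e-8` floor guards against division by zero when there are no down days:

<code python>
import numpy as np

def sortino_ratio(returns: np.ndarray, risk_free: float = 0.02,
                  periods: int = 252) -> float:
    """Annualized Sortino ratio from per-period returns."""
    excess = returns - risk_free / periods                # per-period excess return
    downside = np.sqrt(np.mean(np.minimum(excess, 0) ** 2))  # downside deviation
    return float(np.mean(excess) / max(downside, 1e-8) * np.sqrt(periods))

rng = np.random.default_rng(1)
daily = rng.normal(0.0005, 0.01, size=252)                # one synthetic year
score = sortino_ratio(daily)
</code>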
**Key Finding**: Most LLMs struggle to outperform buy-and-hold, revealing that strong static QA performance does not translate to effective trading behavior. Only select models (DeepSeek-V3 with lowest return variance, some GPT variants) show potential for higher risk-adjusted returns.
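A StockBench-style comparison can be sketched by scoring the agent and a buy-and-hold baseline over the same window. The prices are synthetic and the agent's daily exposure is random here, purely for illustration; a real agent would set exposure from its buy/sell/hold decisions:

<code python>
import numpy as np

rng = np.random.default_rng(2)
# ~6 months of synthetic daily prices from a random walk with slight drift.
prices = 100 * np.cumprod(1 + rng.normal(0.0004, 0.012, size=126))

# Buy-and-hold baseline: fully invested for the whole window.
buy_hold_return = prices[-1] / prices[0] - 1

# Placeholder agent: daily exposure in [0, 1] (stand-in for real decisions).
exposure = rng.uniform(0, 1, size=125)
daily_moves = np.diff(prices) / prices[:-1]
agent_return = np.prod(1 + exposure * daily_moves) - 1
</code>

Running both strategies through identical data is what makes the underperformance finding measurable rather than anecdotal.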
===== Multi-Agent Investment Teams =====
Collaborative multi-agent architectures deploy specialized roles for portfolio management:
* **Analyst Agent**: Processes fundamentals and generates research reports
* **Strategist Agent**: Formulates trading strategies based on market regime
* **Risk Manager Agent**: Monitors portfolio exposure and enforces risk limits
* **Executor Agent**: Optimizes trade execution timing and sizing
$$ w^* = \arg\max_w \frac{\mathbb{E}[R_w] - R_f}{\sqrt{\text{Var}(R_w)}} \quad \text{s.t.} \quad \sum_i w_i = 1, \; w_i \geq 0 $$
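One simple way to approximate this constrained maximization is Monte Carlo sampling over the weight simplex, shown below with assumed expected returns and covariance (a sketch; a production system would use a convex solver instead):

<code python>
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.08, 0.12, 0.05])                 # assumed expected asset returns
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.03]])              # assumed return covariance
risk_free = 0.02

best_w, best_sharpe = None, -np.inf
for _ in range(20000):
    w = rng.dirichlet(np.ones(3))                 # samples satisfy w_i >= 0, sum = 1
    sharpe = (w @ mu - risk_free) / np.sqrt(w @ cov @ w)
    if sharpe > best_sharpe:
        best_w, best_sharpe = w, sharpe
</code>

The Dirichlet draw enforces both constraints by construction, so every candidate is feasible and the search reduces to picking the highest Sharpe ratio.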
===== Code Example =====
The sketch below restores the example's indentation and fills in the previously undefined helpers (''parse_action'', ''execute'', ''calc_max_drawdown'') with illustrative stubs so the class is self-contained:

<code python>
from dataclasses import dataclass
import numpy as np


@dataclass
class MarketState:
    prices: np.ndarray          # daily closing prices, oldest first
    fundamentals: dict
    news: list[str]
    timestamp: str


class FLAGTrader:
    def __init__(self, llm_policy, risk_threshold: float = 0.05):
        self.policy = llm_policy                # LLM policy network (PEFT-tuned)
        self.risk_threshold = risk_threshold
        self.portfolio = {"cash": 100000.0, "holdings": {}}

    def encode_state(self, state: MarketState) -> str:
        """Render market data as the textual state fed to the LLM policy."""
        price_summary = f"Prices: {state.prices[-5:].tolist()}"
        news_summary = " | ".join(state.news[:3])
        return (
            f"{price_summary}\n"
            f"Fundamentals: {state.fundamentals}\n"
            f"News: {news_summary}"
        )

    def decide(self, state: MarketState) -> dict:
        encoded = self.encode_state(state)
        action = self.policy.generate(
            f"Market state:\n{encoded}\n"
            f"Portfolio: {self.portfolio}\n"
            f"Decision (buy/sell/hold with sizing):"
        )
        return self.parse_action(action)

    def parse_action(self, action: str) -> dict:
        """Illustrative stub: map the LLM's free text to {action, size}."""
        action = action.lower()
        for verb in ("buy", "sell", "hold"):
            if verb in action:
                return {"action": verb, "size": self.risk_threshold}
        return {"action": "hold", "size": 0.0}          # unparseable -> no trade

    def execute(self, action: dict, state: MarketState) -> float:
        """Illustrative stub: return the day's portfolio return for the action."""
        daily_move = (state.prices[-1] - state.prices[-2]) / state.prices[-2]
        exposure = {"buy": 1.0, "sell": -1.0, "hold": 0.0}[action["action"]]
        return exposure * action["size"] * daily_move

    def calculate_sortino(self, returns: np.ndarray,
                          risk_free: float = 0.02) -> float:
        excess = returns - risk_free / 252              # daily excess returns
        downside = np.sqrt(np.mean(np.minimum(excess, 0) ** 2))
        return np.mean(excess) / max(downside, 1e-8) * np.sqrt(252)

    def calc_max_drawdown(self, returns: np.ndarray) -> float:
        wealth = np.cumprod(1 + returns)                # cumulative wealth curve
        peak = np.maximum.accumulate(wealth)            # running maximum
        return float(np.max((peak - wealth) / peak))

    def backtest(self, market_data: list[MarketState]) -> dict:
        daily_returns = []
        for state in market_data:
            action = self.decide(state)
            pnl = self.execute(action, state)
            daily_returns.append(pnl)
        returns = np.array(daily_returns)
        return {
            "cumulative_return": np.prod(1 + returns) - 1,
            "max_drawdown": self.calc_max_drawdown(returns),
            "sortino_ratio": self.calculate_sortino(returns),
        }
</code>
===== Architecture =====
<code mermaid>
graph TD
    A[Daily Market Data] --> B[Price Encoder]
    A --> C[Fundamentals Encoder]
    A --> D[News/Text Encoder]
    B --> E[Unified State Representation]
    C --> E
    D --> E
    E --> F[LLM Policy Network - PEFT]
    F --> G[Action: Buy/Sell/Hold + Size]
    G --> H[Risk Manager]
    H --> I{Within Risk Limits?}
    I -->|Yes| J[Execute Trade]
    I -->|No| K[Adjust Position Size]
    K --> J
    J --> L[Portfolio Update]
    L --> M[Reward Calculation]
    M --> N[RL Policy Gradient Update]
    N --> F
    subgraph Evaluation - StockBench
        O[Multi-Month Simulation] --> P[Cumulative Return]
        O --> Q[Max Drawdown]
        O --> R[Sortino Ratio]
    end
</code>
===== Key Results =====
^ System ^ Metric ^ Finding ^
| FLAG-Trader (135M) | Sharpe ratio | Outperforms GPT-o1-preview |
| FLAG-Trader | Cumulative return | Best across trading scenarios |
| StockBench | Buy-and-hold comparison | Most LLMs underperform |
| StockBench | Best performer | DeepSeek-V3 (lowest variance) |
| Multi-agent teams | Portfolio management | Role specialization improves risk-adjusted returns |
===== See Also =====
* [[budget_aware_reasoning|Budget-Aware Reasoning]]
* [[database_tuning_agents|Database Tuning Agents]]
* [[devops_incident_agents|DevOps Incident Agents]]
===== References =====