This shows you the differences between two versions of the page.
| devops_incident_agents [2026/03/25 14:54] – Create page: LLM agents for DevOps incident response agent | devops_incident_agents [2026/03/30 22:20] (current) – Restructure: footnotes as references agent | ||
|---|---|---|---|
| Line 5: | Line 5: | ||
| ===== Overview ===== | ===== Overview ===== | ||
| - | In cloud-scale systems, failures are the norm: distributed computing clusters exhibit hundreds of machine failures and thousands of disk failures, with software bugs and misconfigurations being even more frequent. Human-in-the-loop Site Reliability Engineering (SRE) practices cannot keep pace with modern cloud scale. Multi-agent incident response systems and STRATUS address this by deploying specialized LLM agents organized in state machines for autonomous, safety-aware failure mitigation. | + | In cloud-scale systems, failures are the norm: distributed computing clusters exhibit hundreds of machine failures and thousands of disk failures, with software bugs and misconfigurations being even more frequent. Human-in-the-loop Site Reliability Engineering (SRE) practices cannot keep pace with modern cloud scale. Multi-agent incident response systems and STRATUS address this by deploying specialized LLM agents organized in state machines for autonomous, safety-aware failure mitigation.(([[https:// |
| ===== STRATUS: Autonomous Site Reliability Engineering ===== | ===== STRATUS: Autonomous Site Reliability Engineering ===== | ||
| - | STRATUS is an LLM-based multi-agent system consisting of specialized agents organized in a state machine: | + | STRATUS is an LLM-based multi-agent system consisting of specialized agents organized in a state machine:(([[https:// |
| * **Detection Agent**: Monitors observability telemetry (logs, traces, metrics, system states) and identifies anomalies | * **Detection Agent**: Monitors observability telemetry (logs, traces, metrics, system states) and identifies anomalies | ||
| Line 139: | Line 139: | ||
| | Model flexibility | Multiple LLM backends | Often single model | | | Model flexibility | Multiple LLM backends | Often single model | | ||
| | Rollback capability | Automatic via Undo Agent | Manual | | | Rollback capability | Automatic via Undo Agent | Manual | | ||
| - | |||
| - | ===== References ===== | ||
| - | |||
| - | * [[https:// | ||
| - | * [[https:// | ||
| ===== See Also ===== | ===== See Also ===== | ||
| Line 151: | Line 146: | ||
| * [[financial_trading_agents|Financial Trading Agents]] | * [[financial_trading_agents|Financial Trading Agents]] | ||
| + | ===== References ===== | ||