====== Droid (Factory) ======
**Droid** is Factory.ai's multi-model CLI coding agent that achieved the **#1 ranking on Terminal-Bench** with a score of 58.75%.(([[https://factory.ai|Factory.ai Website]]))(([[https://factory.ai/product/ide|Droid IDE Integration]])) Unlike open-source alternatives, Droid is a commercial product with a proprietary architecture optimized for enterprise software development. It features specialized "droids" for different task types and runs across terminals, IDEs (VS Code, JetBrains, Vim), web browsers, and integrations like Slack and Jira.
Website: [[https://factory.ai]] | Benchmark: [[https://factory.ai/news/terminal-bench]]
===== Key Features =====
* **Terminal-Bench #1** — Scored 58.75% on Terminal-Bench, the top result at the time of Factory's announcement and ahead of agents built by the model labs themselves
* **Specialized Droids** — Pre-built specialized agents (Code, Review, QA, Security, etc.) optimized for specific task types
* **Multi-Model Flexibility** — Uses any model (Claude, GPT, Gemini, custom) per task with no vendor lock-in
* **Adjustable Autonomy** — Levels from low (manual approval) to high (full autonomy), starting supervised for safety
* **Large Codebase Handling** — Agentic search is designed to locate relevant code in million-line repositories without reading them end to end
* **Cross-Platform** — Runs in terminal, VS Code, JetBrains, Vim, web browser, Slack, and Jira
* **Background Execution** — Supports long-running tasks with process management and cleanup
* **Custom Droids** — Define specialized agents in ''.factory/droids/'' using YAML/Markdown configuration
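
The background-execution feature can be pictured as a small process registry. The following is a minimal sketch, not Factory's implementation: the class ''BackgroundTasks'' and its method names are hypothetical, and only Python's standard library is used.

```python
import subprocess
import sys


class BackgroundTasks:
    """Hypothetical registry for long-running child processes (not Droid's real API)."""

    def __init__(self):
        self._procs = {}

    def start(self, name, cmd):
        # Launch without blocking; output is discarded to keep the sketch short.
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        self._procs[name] = proc
        return proc.pid

    def running(self):
        # poll() returns None while the process is still alive.
        return [name for name, proc in self._procs.items() if proc.poll() is None]

    def cleanup(self, grace_s=2.0):
        # Ask each live process to terminate, then force-kill any stragglers.
        for proc in self._procs.values():
            if proc.poll() is None:
                proc.terminate()
        for proc in self._procs.values():
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        self._procs.clear()


if __name__ == "__main__":
    tasks = BackgroundTasks()
    tasks.start("sleeper", [sys.executable, "-c", "import time; time.sleep(30)"])
    print(tasks.running())
    tasks.cleanup()
    print(tasks.running())
```

Terminating first and killing only after a grace period gives child processes a chance to flush their state, which is the usual reason to separate the two steps.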
===== Specialized Droids =====
^ Droid ^ Purpose ^ Optimization ^
| Code Droid | Feature development, refactoring, bug fixes | Full tool access, implementation focus(([[https://factory.ai/news/code-droid-technical-report|Code Droid Technical Report]])) |
| Review Droid | Pull request analysis and feedback | Code quality patterns, security checks |
| QA Droid | Testing and quality assurance | Test generation, coverage analysis |
| Security Droid | Security auditing and vulnerability detection | OWASP patterns, dependency scanning |
| Custom Droids | User-defined specialized behaviors | Configurable via YAML/Markdown files in ''.factory/droids/'' |
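
To make the Custom Droids row more concrete, here is a rough sketch of how a Markdown definition with a YAML-style header might be read. Everything about the format is an assumption made for illustration: the ''---'' delimited header, the field names ''name'', ''model'' and ''tools'', and the ''DroidSpec'' type are hypothetical and are not taken from Factory's documentation. The parser handles only flat ''key: value'' pairs.

```python
from dataclasses import dataclass, field


@dataclass
class DroidSpec:
    """Hypothetical in-memory form of a custom droid definition."""
    name: str
    model: str = "inherit"
    tools: list = field(default_factory=list)
    prompt: str = ""


def parse_droid(text):
    """Parse '---' delimited 'key: value' header lines plus a prompt body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("definition must start with a '---' header")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("header is not closed with '---'") from None
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    if "name" not in meta:
        raise ValueError("definition needs a 'name' field")
    # Tools are written as a comma-separated list in this illustrative format.
    tools = [t.strip() for t in meta.get("tools", "").split(",") if t.strip()]
    prompt = "\n".join(lines[end + 1:]).strip()
    return DroidSpec(meta["name"], meta.get("model", "inherit"), tools, prompt)


# Illustrative definition only; field names and values are assumptions.
EXAMPLE = """\
---
name: security-auditor
model: inherit
tools: read, search
---
Audit the code for injection flaws and unsafe secrets handling.
"""

spec = parse_droid(EXAMPLE)
print(spec.name, spec.tools)
```

Keeping the prompt in the body and the settings in the header means a definition stays readable as plain Markdown, which fits the YAML/Markdown split the table describes.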
===== Architecture =====
Droid's architecture is optimized for speed and accuracy:
* **Main Agent** — Central orchestrator that delegates to specialized droids based on task analysis
* **Droid System** — Each specialized droid has its own optimized prompts, tool configurations, and model preferences
* **System Bootstrap** — Automatically gathers environment context (languages, git state, env vars, running processes) at session start
* **Speed Optimizations** — Uses ripgrep for fast code search, short timeouts for rapid iteration, efficient tool implementations
* **Context Layers** — Hierarchical context management for maintaining awareness across complex multi-step tasks
* **Subagent Delegation** — Main droid can spawn and coordinate specialized sub-droids for parallel work
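
The System Bootstrap step can be sketched as a function that snapshots the project before any task runs. This is an illustration under stated assumptions rather than Droid's actual code: the ''LANGUAGES'' table, the function names, and the choice of environment variables are all hypothetical, and the running-process scan mentioned above is left out for brevity.

```python
import os
import subprocess
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language table; a real agent would know many more.
LANGUAGES = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}


def git_branch(root):
    """Return the current branch name, or None outside a usable git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=root, capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing git binary, a non-zero exit, and a timeout.
        return None
    return out.stdout.strip() or None


def bootstrap(root, env_keys=("SHELL", "VIRTUAL_ENV")):
    """Collect a small snapshot of the environment at session start."""
    root = Path(root)
    counts = Counter(
        LANGUAGES[p.suffix]
        for p in root.rglob("*")
        if p.is_file() and p.suffix in LANGUAGES
    )
    return {
        "languages": dict(counts),
        "git_branch": git_branch(root),
        "env": {k: os.environ[k] for k in env_keys if k in os.environ},
    }


if __name__ == "__main__":
    print(bootstrap("."))
```

Reading only a fixed list of environment variables, rather than dumping all of them, keeps secrets out of the collected context; a production version would also skip directories such as ''.git''.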
===== Usage Example =====
<code bash>
# Install Droid CLI
curl -fsSL https://factory.ai/install.sh | sh

# Start interactive session
droid

# Use with a direct task
droid "implement rate limiting for the API endpoints"

# Switch modes (shift-tab to cycle: default, automatic, planning)
# In planning mode, Droid analyzes without making changes

# Use a specialized droid
droid review   # Review Droid for PR analysis
droid qa       # QA Droid for test generation

# Create a custom droid
# Add .factory/droids/security-auditor.yaml to your project
droid security-auditor "audit the authentication module"
</code>
===== How It Works =====
<mermaid>
graph TD
    A[User Task] --> B[Main Droid Agent]
    B --> C[System Bootstrap]
    C --> D[Gather Environment Context]
    D --> E[Task Analysis]
    E --> F{Delegation Decision}
    F --> G[Code Droid]
    F --> H[Review Droid]
    F --> I[QA Droid]
    F --> J[Custom Droid]
    G --> K[Implementation]
    H --> L[PR Analysis]
    I --> M[Test Generation]
    J --> N[Specialized Task]
    K --> O[Optimized Tool Execution]
    L --> O
    M --> O
    N --> O
    O --> P[ripgrep Search]
    O --> Q[File Operations]
    O --> R[Shell Commands]
    P --> S[Result Aggregation]
    Q --> S
    R --> S
    S --> T[User Review]
    T --> U{Autonomy Level}
    U -->|Low| V[Manual Approval]
    U -->|High| W[Auto-Execute]
    V --> X[Apply Changes]
    W --> X
</mermaid>
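
Two decision points in this flow, the delegation decision and the autonomy check, can be sketched in a few lines. This is a toy model under stated assumptions: the keyword routing in ''ROUTES'' stands in for a task analysis whose real logic is not public, and the names ''choose_droid'', ''Change'' and ''apply_changes'' are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    LOW = "low"    # every change waits for manual approval
    HIGH = "high"  # changes are applied without asking


# Hypothetical keyword routing; the real task analysis is not public.
ROUTES = [
    (("review", "pull request"), "review"),
    (("test", "coverage"), "qa"),
]


def choose_droid(task):
    """Pick a droid by naive keyword match, defaulting to the code droid."""
    lowered = task.lower()
    for keywords, droid in ROUTES:
        if any(k in lowered for k in keywords):
            return droid
    return "code"


@dataclass
class Change:
    path: str
    summary: str


def apply_changes(changes, autonomy, approve=lambda change: False):
    """Return the changes that may be applied under the given autonomy level."""
    if autonomy is Autonomy.HIGH:
        return list(changes)
    # Low autonomy: keep only what the approval callback accepts.
    return [c for c in changes if approve(c)]


if __name__ == "__main__":
    print(choose_droid("Review the open pull request"))
    pending = [Change("api.py", "add rate limiter")]
    print(apply_changes(pending, Autonomy.LOW))
```

Defaulting the approval callback to rejection means that a low-autonomy session applies nothing unless a reviewer explicitly accepts a change, which matches the supervised starting point described under Key Features.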
===== Terminal-Bench Results =====
Terminal-Bench is an open benchmark with 80+ human-verified, Dockerized tasks covering coding, build/test, dependency management, and data/ML workflows.(([[https://factory.ai/news/terminal-bench|Terminal-Bench Results]])) Droid's #1 ranking demonstrates:
* **Agent Design Matters** — Droid outperformed other agents regardless of their underlying model, which suggests that agent architecture can matter as much as, or more than, model choice
* **Environment Awareness** — System bootstrapping gives Droid superior understanding of project context
* **Speed Optimization** — Fast tool execution (ripgrep, short timeouts) reduces iteration time
* **Practical Tasks** — Terminal-Bench tasks resemble real-world terminal work, so the score reflects practical performance rather than results on isolated synthetic problems
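
The speed point above combines a fast search tool with short timeouts. A minimal sketch of that pattern follows, assuming ''rg'' (ripgrep) is installed and on ''PATH''; the helper names ''rg_command'' and ''run_with_timeout'' are hypothetical, and the 5-second default is an illustrative value rather than a documented Droid setting.

```python
import subprocess
import sys


def rg_command(pattern, path="."):
    """Build a ripgrep invocation; assumes `rg` is installed and on PATH."""
    # "--" ends option parsing so patterns starting with "-" are safe.
    return ["rg", "--line-number", "--no-heading", "--", pattern, path]


def run_with_timeout(cmd, timeout_s=5.0):
    """Run cmd; return (returncode, stdout), or (None, "") if it timed out."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return None, ""
    return done.returncode, done.stdout


if __name__ == "__main__":
    # Demonstrated with the Python interpreter so the sketch runs without rg.
    print(run_with_timeout([sys.executable, "-c", "print('hi')"]))
    print(rg_command("TODO", "src"))
```

Returning ''None'' on timeout lets the caller tell a slow command apart from one that ran and found nothing, since ripgrep exits with status 1 when there are no matches.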
===== See Also =====
* [[gemini_cli|Gemini CLI]] — Google's terminal agent
* [[cline|Cline]] — Model-agnostic autonomous coding agent
* [[roo_code|Roo Code]] — Multi-mode coding agent with orchestrator
* [[amazon_q_cli|Amazon Q CLI]] — AWS's agentic terminal
* [[trae_agent|Trae Agent]] — ByteDance's research-friendly CLI agent
===== References =====