Droid (Factory)
Droid is Factory.ai's multi-model CLI coding agent that achieved the #1 ranking on Terminal-Bench with a score of 58.75%.1)2) Unlike open-source alternatives, Droid is a commercial product with a proprietary architecture optimized for enterprise software development. It features specialized “droids” for different task types and runs across terminals, IDEs (VS Code, JetBrains, Vim), web browsers, and integrations like Slack and Jira.
Website: https://factory.ai | Benchmark: https://factory.ai/news/terminal-bench
Key Features
Terminal-Bench #1 — Scored a state-of-the-art 58.75% on Terminal-Bench, ahead of every other agent evaluated, including those built by model labs
Specialized Droids — Pre-built specialized agents (Code, Review, QA, Security, etc.) optimized for specific task types
Multi-Model Flexibility — Uses any model (Claude, GPT, Gemini, custom) per task with no vendor lock-in
Adjustable Autonomy — Autonomy levels range from low (manual approval) to high (full autonomy); sessions start supervised for safety
Large Codebase Handling — Agentic search is built to navigate million-line repositories quickly
Cross-Platform — Runs in terminal, VS Code, JetBrains, Vim, web browser, Slack, and Jira
Background Execution — Supports long-running tasks with process management and cleanup
Custom Droids — Define specialized agents in .factory/droids/ using YAML/Markdown configuration
Specialized Droids
| Droid | Purpose | Optimization |
| Code Droid | Feature development, refactoring, bug fixes | Full tool access, implementation focus |
| Review Droid | Pull request analysis and feedback | Code quality patterns, security checks |
| QA Droid | Testing and quality assurance | Test generation, coverage analysis |
| Security Droid | Security auditing and vulnerability detection | OWASP patterns, dependency scanning |
| Custom Droids | User-defined specialized behaviors | Configurable via YAML/MD in .factory/droids/ |
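The mapping from a task to one of the droids in the table can be pictured with a toy dispatcher. This is a minimal sketch under stated assumptions: the keyword table, the `pick_droid` function, and the substring-matching rule are all hypothetical and chosen only for illustration. Factory's actual routing is proprietary and is described here only as "task analysis".

```python
# Hypothetical keyword table. The droid names come from the table above;
# the keywords and the first-match rule are illustrative assumptions.
DROID_KEYWORDS = {
    "Security Droid": ("security", "vulnerability", "audit"),
    "Review Droid": ("review", "pull request"),
    "QA Droid": ("test", "coverage"),
}
DEFAULT_DROID = "Code Droid"  # assumed fallback for implementation work


def pick_droid(task: str) -> str:
    """Return the first droid whose keywords appear in the task text."""
    text = task.lower()
    for droid, keywords in DROID_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return droid
    return DEFAULT_DROID


if __name__ == "__main__":
    for task in (
        "implement rate limiting for the API endpoints",
        "audit the authentication module",
    ):
        print(f"{task!r} -> {pick_droid(task)}")
```

Substring matching is deliberately naive; it only shows the shape of a dispatch step in which one entry point hands work to a specialized handler.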
Architecture
Droid's architecture is optimized for speed and accuracy:
Main Agent — Central orchestrator that delegates to specialized droids based on task analysis
Droid System — Each specialized droid has its own optimized prompts, tool configurations, and model preferences
System Bootstrap — Automatically gathers environment context (languages, git state, env vars, running processes) at session start
Speed Optimizations — Uses ripgrep for fast code search, short timeouts for rapid iteration, efficient tool implementations
Context Layers — Hierarchical context management for maintaining awareness across complex multi-step tasks
Subagent Delegation — Main droid can spawn and coordinate specialized sub-droids for parallel work
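The System Bootstrap step above can be approximated in a few lines. The following is a rough sketch, not Droid's implementation: the function names, the `LANG_BY_EXT` extension map, and the `ENV_KEYS` allow-list are assumptions made for this example, and process inspection is left out to keep it dependency-free. It collects a language histogram, basic git state, and a few environment variables into one dictionary that an agent could place in its initial context.

```python
import os
import subprocess
from collections import Counter
from pathlib import Path
from typing import Dict, Optional

# Hypothetical extension map and env allow-list, for illustration only.
LANG_BY_EXT = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".rs": "Rust"}
ENV_KEYS = ("SHELL", "VIRTUAL_ENV", "NODE_ENV")


def detect_languages(root: Path) -> Dict[str, int]:
    """Count files per language, judged by file extension alone."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in LANG_BY_EXT:
            counts[LANG_BY_EXT[path.suffix]] += 1
    return dict(counts)


def _git(root: Path, *args: str) -> Optional[str]:
    """Run a git command in root; return stdout, or None on any failure."""
    try:
        proc = subprocess.run(
            ["git", "-C", str(root), *args],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # git missing or too slow
    return proc.stdout.strip() if proc.returncode == 0 else None


def git_state(root: Path) -> Optional[Dict[str, object]]:
    """Return branch name and dirty flag, or None outside a usable repo."""
    branch = _git(root, "rev-parse", "--abbrev-ref", "HEAD")
    status = _git(root, "status", "--porcelain")
    if branch is None or status is None:
        return None
    return {"branch": branch, "dirty": bool(status)}


def gather_context(root: Path) -> Dict[str, object]:
    """Bundle everything into one snapshot taken at session start."""
    return {
        "languages": detect_languages(root),
        "git": git_state(root),
        "env": {k: os.environ[k] for k in ENV_KEYS if k in os.environ},
    }
```

Calling `gather_context(Path("."))` once at startup yields a snapshot that can be serialized into the first prompt, so the agent does not have to spend tool calls rediscovering the same facts.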
Usage Example
# Install Droid CLI
curl -fsSL https://factory.ai/install.sh | sh
# Start interactive session
droid
# Use with a direct task
droid "implement rate limiting for the API endpoints"
# Switch modes (shift-tab to cycle: default, automatic, planning)
# In planning mode, Droid analyzes without making changes
# Use a specialized droid
droid review # Review Droid for PR analysis
droid qa # QA Droid for test generation
# Create a custom droid
# Add .factory/droids/security-auditor.yaml to your project
droid security-auditor "audit the authentication module"
How It Works
graph TD
A[User Task] --> B[Main Droid Agent]
B --> C[System Bootstrap]
C --> D[Gather Environment Context]
D --> E[Task Analysis]
E --> F{Delegation Decision}
F --> G[Code Droid]
F --> H[Review Droid]
F --> I[QA Droid]
F --> J[Custom Droid]
G --> K[Implementation]
H --> L[PR Analysis]
I --> M[Test Generation]
J --> N[Specialized Task]
K --> O[Optimized Tool Execution]
L --> O
M --> O
N --> O
O --> P[ripgrep Search]
O --> Q[File Operations]
O --> R[Shell Commands]
P --> S[Result Aggregation]
Q --> S
R --> S
S --> T[User Review]
T --> U{Autonomy Level}
U -->|Low| V[Manual Approval]
U -->|High| W[Auto-Execute]
V --> X[Apply Changes]
W --> X
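The last branch of the flow above, where the autonomy level decides between manual approval and automatic execution, can be sketched as a small gate. This is an illustrative sketch only: the `Autonomy` enum, `apply_change`, and the `approve` callback are hypothetical names, and only the two levels shown in the diagram are modeled.

```python
from enum import Enum
from typing import Callable, List


class Autonomy(Enum):
    LOW = "low"    # every change needs explicit approval
    HIGH = "high"  # changes are applied without asking


def apply_change(
    description: str,
    autonomy: Autonomy,
    approve: Callable[[str], bool],
    applied: List[str],
) -> bool:
    """Apply a change if the autonomy level allows it; return True if applied."""
    if autonomy is Autonomy.LOW and not approve(description):
        return False  # the user declined, nothing is applied
    # A real agent would write files or run commands here;
    # this sketch only records the change.
    applied.append(description)
    return True
```

Passing the approval step in as a callback keeps the gate independent of the interface, so the same logic could sit behind a terminal prompt, an IDE dialog, or a chat message.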
Terminal-Bench Results
Terminal-Bench is an open benchmark with 80+ human-verified, Dockerized tasks covering coding, build/test, dependency management, and data/ML workflows.3) Droid's #1 ranking demonstrates:
Agent Design Matters — Droid outperformed all other agents regardless of underlying model, which suggests that agent architecture can matter more than model choice
Environment Awareness — System bootstrapping gives Droid superior understanding of project context
Speed Optimization — Fast tool execution (ripgrep, short timeouts) reduces iteration time
Practical Tasks — Strong performance on real-world tasks, not just synthetic benchmarks
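The speed point above, ripgrep combined with short timeouts, can be illustrated with a small wrapper. This is a sketch under assumptions rather than Droid's actual tooling: `files_containing` and its 5-second default are hypothetical, and the pure-Python fallback exists only so the example still runs where `rg` is not installed. The ripgrep flags used (`--files-with-matches`, `--fixed-strings`) are standard options.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def files_containing(literal: str, root: str, timeout_s: float = 5.0) -> List[str]:
    """Return sorted paths of files under root that contain the literal string."""
    if shutil.which("rg"):
        try:
            proc = subprocess.run(
                ["rg", "--files-with-matches", "--fixed-strings", "--", literal, root],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return []  # give up quickly so the caller can narrow the query
        # rg exits with 0 when something matched and 1 when nothing did.
        return sorted(proc.stdout.splitlines()) if proc.returncode == 0 else []

    # Fallback without ripgrep: much slower, and unlike rg it does not
    # honor ignore files or skip hidden and binary files.
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            if literal in path.read_text(errors="ignore"):
                hits.append(str(path))
        except OSError:
            continue  # unreadable file, skip it
    return sorted(hits)
```

Returning an empty list on timeout trades completeness for responsiveness: the caller gets control back quickly and can retry with a narrower pattern or directory.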
See Also
Cline — Model-agnostic autonomous coding agent
Roo Code — Multi-mode CLI agent with orchestrator
Trae Agent — ByteDance's research-friendly CLI agent
References