Droid is Factory.ai's multi-model CLI coding agent that achieved the #1 ranking on Terminal-Bench with a score of 58.75%.1)2) Unlike open-source alternatives, Droid is a commercial product with a proprietary architecture optimized for enterprise software development. It features specialized “droids” for different task types and runs across terminals, IDEs (VS Code, JetBrains, Vim), web browsers, and integrations like Slack and Jira.
Website: https://factory.ai | Benchmark: https://factory.ai/news/terminal-bench
Custom droids are defined in .factory/droids/ using YAML/Markdown configuration.

| Droid | Purpose | Optimization |
|---|---|---|
| Code Droid | Feature development, refactoring, bug fixes | Full tool access, implementation focus |
| Review Droid | Pull request analysis and feedback | Code quality patterns, security checks |
| QA Droid | Testing and quality assurance | Test generation, coverage analysis |
| Security Droid | Security auditing and vulnerability detection | OWASP patterns, dependency scanning |
| Custom Droids | User-defined specialized behaviors | Configurable via YAML/MD in .factory/droids/ |
Droid's architecture is optimized for speed and accuracy:

    # Install Droid CLI
    curl -fsSL https://factory.ai/install.sh | sh

    # Start interactive session
    droid

    # Use with a direct task
    droid "implement rate limiting for the API endpoints"

    # Switch modes (shift-tab to cycle: default, automatic, planning)
    # In planning mode, Droid analyzes without making changes

    # Use a specialized droid
    droid review  # Review Droid for PR analysis
    droid qa      # QA Droid for test generation

    # Create a custom droid
    # Add .factory/droids/security-auditor.yaml to your project
    droid security-auditor "audit the authentication module"

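The custom droid step in the commands above only names the file to add, so the following is a minimal sketch of how that file might be scaffolded. The path .factory/droids/security-auditor.yaml comes from the text; every field name inside the file (name, description, tools, prompt) is a hypothetical placeholder, not Factory's documented schema, and should be checked against the official documentation before use.

```shell
#!/bin/sh
# Scaffold a project-level custom droid definition.
# ASSUMPTION: the YAML keys written below are hypothetical placeholders,
# not Factory's documented schema. Only the file path comes from the text.
set -eu

droid_dir=".factory/droids"
droid_file="$droid_dir/security-auditor.yaml"

# Create the droids directory if it does not exist yet.
mkdir -p "$droid_dir"

# The quoted heredoc delimiter prevents the shell from expanding the content.
cat > "$droid_file" <<'EOF'
# Hypothetical schema, for illustration only
name: security-auditor
description: Audits code for common vulnerabilities
tools:
  - read
  - grep
prompt: |
  Review the given module for OWASP Top 10 issues and
  report findings without modifying any files.
EOF

echo "wrote $droid_file"
```

Keeping the definition in the repository means it can be reviewed and versioned alongside the code it audits.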
Terminal-Bench is an open benchmark with 80+ human-verified, Dockerized tasks covering coding, build/test, dependency management, and data/ML workflows.3) Droid's #1 ranking demonstrates: