Droid (Factory)

Droid is Factory.ai's multi-model CLI coding agent that achieved the #1 ranking on Terminal-Bench with a score of 58.75%.1)2) Unlike open-source alternatives, Droid is a commercial product with a proprietary architecture optimized for enterprise software development. It features specialized “droids” for different task types and runs across terminals, IDEs (VS Code, JetBrains, Vim), web browsers, and integrations like Slack and Jira.

Website: https://factory.ai | Benchmark: https://factory.ai/news/terminal-bench

Key Features

Specialized Droids

| Droid | Purpose | Optimization |
|---|---|---|
| Code Droid | Feature development, refactoring, bug fixes | Full tool access, implementation focus |
| Review Droid | Pull request analysis and feedback | Code quality patterns, security checks |
| QA Droid | Testing and quality assurance | Test generation, coverage analysis |
| Security Droid | Security auditing and vulnerability detection | OWASP patterns, dependency scanning |
| Custom Droids | User-defined specialized behaviors | Configurable via YAML/MD in .factory/droids/ |

Architecture

Droid's architecture is optimized for speed and accuracy:

Usage Example

# Install Droid CLI
curl -fsSL https://factory.ai/install.sh | sh
 
# Start interactive session
droid
 
# Use with a direct task
droid "implement rate limiting for the API endpoints"
 
# Switch modes (shift-tab to cycle: default, automatic, planning)
# In planning mode, Droid analyzes without making changes
 
# Use a specialized droid
droid review   # Review Droid for PR analysis
droid qa       # QA Droid for test generation
 
# Create a custom droid
# Add .factory/droids/security-auditor.yaml to your project
droid security-auditor "audit the authentication module"
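
The custom droid step above points at a definition file under .factory/droids/ but does not show its contents. The following is a minimal sketch of scaffolding such a file, assuming a simple YAML layout. The field names (name, description, instructions) are hypothetical placeholders chosen for illustration, not Factory's documented schema, so check the official documentation before relying on them.

```shell
# Hypothetical sketch: scaffold a custom droid definition in the current project.
# ASSUMPTION: the keys below (name, description, instructions) are illustrative
# only and are not taken from Factory's documented schema.
droid_file=".factory/droids/security-auditor.yaml"

# Create the directory that the article says holds custom droid definitions.
mkdir -p "$(dirname "$droid_file")"

# Write the definition; the quoted 'EOF' prevents shell expansion in the body.
cat > "$droid_file" <<'EOF'
name: security-auditor
description: Audits authentication code for common vulnerabilities
instructions: |
  Review the requested module for OWASP-style issues.
  Report findings and do not modify files.
EOF

echo "wrote $droid_file"
```

Keeping the definition in the repository means it can be versioned and reviewed alongside the code it audits.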

How It Works

graph TD
    A[User Task] --> B[Main Droid Agent]
    B --> C[System Bootstrap]
    C --> D[Gather Environment Context]
    D --> E[Task Analysis]
    E --> F{Delegation Decision}
    F --> G[Code Droid]
    F --> H[Review Droid]
    F --> I[QA Droid]
    F --> J[Custom Droid]
    G --> K[Implementation]
    H --> L[PR Analysis]
    I --> M[Test Generation]
    J --> N[Specialized Task]
    K --> O[Optimized Tool Execution]
    L --> O
    M --> O
    N --> O
    O --> P[ripgrep Search]
    O --> Q[File Operations]
    O --> R[Shell Commands]
    P --> S[Result Aggregation]
    Q --> S
    R --> S
    S --> T[User Review]
    T --> U{Autonomy Level}
    U -->|Low| V[Manual Approval]
    U -->|High| W[Auto-Execute]
    V --> X[Apply Changes]
    W --> X
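
The two decision points in the flow above (delegation and autonomy level) can be illustrated with a small shell sketch. Droid's actual routing logic is proprietary and not described in the source, so the keyword matching here is purely an assumption, and the function names route_task and approval_mode are hypothetical.

```shell
# Hypothetical delegation step: map a task description to a droid type.
# ASSUMPTION: simple keyword matching stands in for Droid's real task analysis.
route_task() {
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$task" in
    *"pull request"*|*review*) echo "review" ;;
    *test*|*coverage*)         echo "qa" ;;
    *)                         echo "code" ;;
  esac
}

# Autonomy gate from the diagram: high autonomy auto-executes,
# anything else falls back to manual approval before changes are applied.
approval_mode() {
  case "$1" in
    high) echo "auto-execute" ;;
    *)    echo "manual-approval" ;;
  esac
}

route_task "Review the open pull request"
route_task "implement rate limiting for the API endpoints"
approval_mode low
```

Defaulting to the code droid and to manual approval keeps the sketch conservative: an unrecognized task or autonomy level never results in unreviewed changes.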

Terminal-Bench Results

Terminal-Bench is an open benchmark with 80+ human-verified, Dockerized tasks covering coding, build/test, dependency management, and data/ML workflows.3) Droid's #1 ranking demonstrates:

See Also

References