
Droid (Factory)

Droid is Factory.ai's multi-model CLI coding agent that achieved the #1 ranking on Terminal-Bench with a score of 58.75%. Unlike open-source alternatives, Droid is a commercial product with a proprietary architecture optimized for enterprise software development. It features specialized “droids” for different task types and runs across terminals, IDEs (VS Code, JetBrains, Vim), web browsers, and integrations like Slack and Jira.

Website: https://factory.ai | Benchmark: https://factory.ai/news/terminal-bench

Key Features

Terminal-Bench #1 — Achieved state-of-the-art 58.75% on Terminal-Bench, outperforming all other agents including those from model labs
Specialized Droids — Pre-built specialized agents (Code, Review, QA, Security, etc.) optimized for specific task types
Multi-Model Flexibility — Uses any model (Claude, GPT, Gemini, custom) per task with no vendor lock-in
Adjustable Autonomy — Levels from low (manual approval) to high (full autonomy), starting supervised for safety
Large Codebase Handling — Agentic search locates relevant code quickly, even in million-line repositories
Cross-Platform — Runs in terminal, VS Code, JetBrains, Vim, web browser, Slack, and Jira
Background Execution — Supports long-running tasks with process management and cleanup
Custom Droids — Define specialized agents in .factory/droids/ using YAML/Markdown configuration

Specialized Droids

Droid	Purpose	Optimization
Code Droid	Feature development, refactoring, bug fixes	Full tool access, implementation focus
Review Droid	Pull request analysis and feedback	Code quality patterns, security checks
QA Droid	Testing and quality assurance	Test generation, coverage analysis
Security Droid	Security auditing and vulnerability detection	OWASP patterns, dependency scanning
Custom Droids	User-defined specialized behaviors	Configurable via YAML/MD in .factory/droids/
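To make the division of labor in the table concrete, here is a minimal sketch of routing a task description to a droid. It is not Factory's implementation: the function name `pick_droid`, the short droid labels, and the keyword rules are all hypothetical, and the real agent presumably uses model-driven task analysis rather than string matching.

```shell
# Hypothetical keyword router: maps a task description to one of the
# droid types from the table. The keywords are illustrative assumptions.
pick_droid() (
  # Lowercase the task so matching is case-insensitive.
  task=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$task" in
    *vulnerab*|*security*|*audit*) echo "security" ;;
    *"pull request"*|*review*)     echo "review" ;;
    *test*|*coverage*)             echo "qa" ;;
    *)                             echo "code" ;;  # default: implementation work
  esac
)
```

For example, `pick_droid "Add tests for the parser"` prints `qa`, while a task that matches no keyword falls through to `code`.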

Architecture

Droid's architecture is optimized for speed and accuracy:

Main Agent — Central orchestrator that delegates to specialized droids based on task analysis
Droid System — Each specialized droid has its own optimized prompts, tool configurations, and model preferences
System Bootstrap — Automatically gathers environment context (languages, git state, env vars, running processes) at session start
Speed Optimizations — Uses ripgrep for fast code search, short timeouts for rapid iteration, efficient tool implementations
Context Layers — Hierarchical context management for maintaining awareness across complex multi-step tasks
Subagent Delegation — Main droid can spawn and coordinate specialized sub-droids for parallel work
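The System Bootstrap step can be approximated with ordinary shell tools. The sketch below is an assumption about what such a step might collect, not Factory's actual code: the function name `bootstrap_context` and the output format are hypothetical, and it covers only languages, git state, and environment variables (not running processes).

```shell
# Sketch of a session-start bootstrap: print a small environment summary
# for a project directory. The output format is a hypothetical choice.
bootstrap_context() (
  cd "${1:-.}" || exit 1

  # Languages: infer from file extensions, most common first (top 5).
  # Dotfiles and the .git directory are skipped.
  langs=$(find . -type f ! -path './.git/*' -name '*.*' ! -name '.*' 2>/dev/null \
    | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -n 5 \
    | awk '{print $2}' | tr '\n' ' ')
  echo "languages: ${langs% }"

  # Git state: branch and number of uncommitted changes, if this is a repo.
  if command -v git >/dev/null 2>&1 &&
     git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "git_branch: $(git rev-parse --abbrev-ref HEAD 2>/dev/null)"
    echo "git_dirty_files: $(git status --porcelain | wc -l | tr -d ' ')"
  else
    echo "git_branch: none"
  fi

  # Environment variables: report only a count, never values,
  # so secrets are not copied into the agent's context.
  echo "env_var_count: $(env | grep -c '^[A-Za-z_][A-Za-z0-9_]*=')"
)
```

Running `bootstrap_context path/to/project` once at session start would give an agent a compact summary to place in its context before it plans any work.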

Usage Example

# Install Droid CLI
curl -fsSL https://factory.ai/install.sh | sh
 
# Start interactive session
droid
 
# Use with a direct task
droid "implement rate limiting for the API endpoints"
 
# Switch modes (shift-tab to cycle: default, automatic, planning)
# In planning mode, Droid analyzes without making changes
 
# Use a specialized droid
droid review   # Review Droid for PR analysis
droid qa       # QA Droid for test generation
 
# Create a custom droid
# Add .factory/droids/security-auditor.yaml to your project
droid security-auditor "audit the authentication module"
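
The last step above assumes a definition file already exists. As a rough sketch, a helper like the following could scaffold one. The helper name `scaffold_droid` and every field it writes (`name`, `description`, `model`, `tools`) are hypothetical placeholders, so consult Factory's documentation for the actual schema before relying on them.

```shell
# Scaffold a custom droid definition under .factory/droids/.
# Usage: scaffold_droid NAME [PROJECT_DIR]
# All field names written below are hypothetical, for illustration only.
scaffold_droid() (
  name=$1
  dir="${2:-.}/.factory/droids"
  file="$dir/$name.yaml"

  mkdir -p "$dir" || exit 1
  {
    printf '# Hypothetical schema, for illustration only.\n'
    printf 'name: %s\n' "$name"
    printf 'description: Audits authentication code for common security flaws\n'
    printf 'model: inherit\n'
    printf 'tools: read-only\n'
  } > "$file"

  # Print the path so callers can open or edit the file.
  echo "$file"
)
```

Calling `scaffold_droid security-auditor` from the project root would create `.factory/droids/security-auditor.yaml`, matching the path used in the example above.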

How It Works

graph TD
    A[User Task] --> B[Main Droid Agent]
    B --> C[System Bootstrap]
    C --> D[Gather Environment Context]
    D --> E[Task Analysis]
    E --> F{Delegation Decision}
    F --> G[Code Droid]
    F --> H[Review Droid]
    F --> I[QA Droid]
    F --> J[Custom Droid]
    G --> K[Implementation]
    H --> L[PR Analysis]
    I --> M[Test Generation]
    J --> N[Specialized Task]
    K --> O[Optimized Tool Execution]
    L --> O
    M --> O
    N --> O
    O --> P[ripgrep Search]
    O --> Q[File Operations]
    O --> R[Shell Commands]
    P --> S[Result Aggregation]
    Q --> S
    R --> S
    S --> T[User Review]
    T --> U{Autonomy Level}
    U -->|Low| V[Manual Approval]
    U -->|High| W[Auto-Execute]
    V --> X[Apply Changes]
    W --> X
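
The final branch of this flow, where the autonomy level decides between manual approval and automatic execution, might be sketched as follows. Only the level names `low` and `high` come from the text. The function names, the return strings, and the fallback for unknown levels are assumptions made for illustration.

```shell
# Sketch of the autonomy gate at the end of the flow.
# Level names (low, high) come from the text; everything else is assumed.
autonomy_decision() {
  case "$1" in
    high) echo "auto-execute" ;;
    low)  echo "manual-approval" ;;
    *)    echo "manual-approval" ;;  # unknown levels fall back to supervised
  esac
}

# apply_change LEVEL COMMAND...: run the command only when the level allows it.
apply_change() (
  level=$1
  shift
  if [ "$(autonomy_decision "$level")" = "auto-execute" ]; then
    "$@"
  else
    # A real agent would prompt the user here; this sketch only reports.
    echo "pending approval: $*"
  fi
)
```

Defaulting unknown levels to manual approval mirrors the "starting supervised for safety" behavior described under Key Features.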

Terminal-Bench Results

Terminal-Bench is an open benchmark with 80+ human-verified, Dockerized tasks covering coding, build/test, dependency management, and data/ML workflows. Droid's #1 ranking demonstrates:

Agent Design Matters — Droid outperformed the other listed agents regardless of underlying model, which suggests that agent architecture matters at least as much as model choice
Environment Awareness — System bootstrapping gives Droid superior understanding of project context
Speed Optimization — Fast tool execution (ripgrep, short timeouts) reduces iteration time
Practical Tasks — Strong performance on real-world tasks, not just synthetic benchmarks
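
As an illustration of the speed optimization mentioned above, the following sketch wraps a code search in a short time limit. It is a guess at the general technique rather than Droid's tooling: the function name `fast_search` and the 5-second default are hypothetical, `timeout` is assumed to be the GNU coreutils command, and the script falls back to `grep` when ripgrep is not installed.

```shell
# Sketch: bounded code search. Prefers ripgrep, falls back to grep -rn,
# and wraps the call in `timeout` when that command is available.
# Usage: fast_search PATTERN [DIR] [SECONDS]
fast_search() (
  pattern=$1
  dir=${2:-.}
  limit=${3:-5}   # seconds; hypothetical default

  # Build the search command as the positional parameters.
  if command -v rg >/dev/null 2>&1; then
    set -- rg --line-number --no-heading --fixed-strings -- "$pattern" "$dir"
  else
    set -- grep -rnF -- "$pattern" "$dir"
  fi

  # Enforce the time limit if possible; otherwise run unbounded.
  if command -v timeout >/dev/null 2>&1; then
    timeout "$limit" "$@"
  else
    "$@"
  fi
)
```

A short limit keeps a slow or runaway search from stalling the agent loop: the caller sees a non-zero exit status and can retry with a narrower pattern or directory.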

AI Agent Knowledge Base
