Core Concepts
Reasoning
Memory & Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools
Safety & Security
Evaluation
Meta
An agentic AI workflow is a process where AI agents autonomously perceive, decide, reason through multi-step tasks, and execute actions with minimal human intervention. Unlike single-shot prompting, agentic workflows place a model inside a loop — it plans, acts, observes results, and iterates until a goal is satisfied.1)
Ng's key insight captures the paradigm shift: "Better workflows beat better models."
A workflow earns the label agentic when it exhibits all of the following properties:
These properties together allow a small model running inside a well-designed loop to outperform a much larger model running in zero-shot mode — a finding repeatedly confirmed on coding benchmarks (see Benchmarks below).
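The core loop described above can be sketched in a few lines. This is a minimal illustration, not any framework's API: the "model" is a stub function, and the names (`run_agent`, `is_done`) are invented for the example.

```python
# Minimal sketch of an agentic loop: plan/act, observe, iterate until
# a goal check passes or the step budget runs out. In practice `act`
# would wrap an LLM call; here it is a deterministic stub.

def run_agent(goal, act, is_done, max_steps=10):
    """Iterate act->observe until is_done(observation) or max_steps."""
    history = []
    for _ in range(max_steps):
        observation = act(goal, history)   # plan + act in one stubbed call
        history.append(observation)        # observe: record the result
        if is_done(observation):           # goal satisfied -> stop iterating
            return observation, history
    return None, history

# Toy task: keep doubling until the value reaches the goal of 100.
def double_last(goal, history):
    return (history[-1] if history else 1) * 2

result, trace = run_agent(100, double_last, lambda x: x >= 100)
```

The point of the loop is that termination is decided by checking the world (the observation), not by the model's first answer.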
Andrew Ng identified four foundational design patterns for agentic systems at the Sequoia AI Ascent conference in March 2024.2)
The agent critiques its own output and iterates. A separate critic call (which may use the same model) scores or annotates the draft; the generator then revises. This simple loop pushed GPT-4 from 67% to 88% on the HumanEval coding benchmark — matching or exceeding human performance without any change to the underlying model weights.
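A generator-critic loop of this kind reduces to a simple control structure. The sketch below stubs both roles with plain functions standing in for LLM calls; the names and the docstring-checking "critic" are invented for illustration.

```python
# Hedged sketch of the Reflection pattern: a generator drafts, a critic
# returns feedback (or None when satisfied), and the generator revises.

def reflect_loop(generate, critique, revise, max_rounds=5):
    draft = generate()
    for _ in range(max_rounds):
        feedback = critique(draft)
        if feedback is None:          # critic found nothing to fix
            return draft
        draft = revise(draft, feedback)
    return draft                      # budget exhausted: return best effort

# Toy example: the "critic" flags drafts that lack a docstring.
generate = lambda: "def add(a, b): return a + b"
critique = lambda d: None if '"""' in d else "add a docstring"
revise = lambda d, fb: d.replace(
    "): ", '):\n    """Add two numbers."""\n    '
)

final = reflect_loop(generate, critique, revise)
```

Note that the critic can use the same underlying model as the generator; the leverage comes from splitting the roles, not from extra capability.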
The agent interleaves Reasoning and Acting steps (the ReAct pattern): it emits a thought, calls a tool (web search, Python interpreter, SQL query, external API), observes the result, and continues reasoning. Tool use breaks the knowledge-cutoff barrier and enables real-time data access.
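The thought-act-observe cycle can be made concrete with a toy tool registry. This is a schematic of the ReAct shape, not the original prompting technique: tools are plain Python callables, and the reasoning trace is hard-coded rather than produced by a model.

```python
# Sketch of a ReAct-style loop: each step pairs a "thought" with a tool
# call, and the observation from one step can feed the next.

TOOLS = {
    "add":    lambda a, b: a + b,
    "square": lambda a: a * a,
}

def react(steps):
    """Run a trace of (thought, tool, args) steps, threading observations."""
    obs = None
    for thought, tool, args in steps:
        # "$prev" marks an argument that reuses the last observation.
        args = [obs if a == "$prev" else a for a in args]
        obs = TOOLS[tool](*args)       # Act, then Observe
    return obs

# "Reasoning" trace: compute (3 + 4) squared via two tool calls.
answer = react([
    ("add the operands first", "add", [3, 4]),
    ("square the sum",         "square", ["$prev"]),
])
```

In a real system the next `(thought, tool, args)` triple would be generated by the model after each observation, which is what lets tool use break the knowledge-cutoff barrier.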
The agent decomposes a high-level goal into an ordered sequence of sub-tasks, executes them, and replans when a step fails or returns unexpected results. Planning agents can handle objectives that span dozens of tool calls and require backtracking.
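The decompose-execute-replan cycle can be sketched as follows. The step functions and the `replan` hook are stubs invented for the example; a real planner would call a model to repair the plan.

```python
# Sketch of the Planning pattern: run sub-tasks in order and, when one
# fails, splice in a repaired plan segment and continue.

def execute_plan(plan, replan, max_replans=3):
    done = []
    while plan:
        step = plan.pop(0)
        try:
            done.append(step())            # execute the next sub-task
        except Exception:
            if max_replans == 0:
                raise                      # give up after repeated failures
            max_replans -= 1
            plan = replan(done) + plan     # patch the plan, keep going

    return done

# Toy scenario: the second step fails once, then succeeds on retry.
state = {"tries": 0}
def flaky_fetch():
    state["tries"] += 1
    if state["tries"] == 1:
        raise RuntimeError("transient failure")
    return "fetched"

plan = [lambda: "parsed", flaky_fetch, lambda: "summarised"]
results = execute_plan(plan, replan=lambda done: [flaky_fetch])
```

The replanning hook is what distinguishes this from a fixed pipeline: the agent recovers from a failed step instead of aborting.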
Specialised agents collaborate: one orchestrates, others execute domain-specific subtasks (coding, research, QA, summarisation). Systems such as ChatDev model an entire software company as a society of agents with distinct roles. Multi-agent architectures increase parallelism and allow each agent to stay within a focused context window.
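At its simplest, the orchestrator-worker shape is a routing table. The roles and messages below are invented for illustration; frameworks like those in the table further down add messaging, memory, and per-agent context on top of this skeleton.

```python
# Hedged sketch of a multi-agent setup: an orchestrator dispatches
# sub-tasks to specialised "agents" (stub functions keyed by role)
# and collects their outputs.

AGENTS = {
    "researcher": lambda task: f"notes on {task}",
    "coder":      lambda task: f"code for {task}",
    "reviewer":   lambda task: f"review of {task}",
}

def orchestrate(subtasks):
    """Dispatch each (role, task) pair to its agent, in order."""
    return [AGENTS[role](task) for role, task in subtasks]

outputs = orchestrate([
    ("researcher", "rate limits"),
    ("coder",      "retry logic"),
    ("reviewer",   "retry logic"),
])
```

Because each agent sees only its own task, every role stays within a focused context window, which is the practical motivation cited above.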
Practitioners increasingly treat deliberate human checkpoints as a first-class design element rather than an afterthought. The agent pauses at high-stakes decision points, presents its reasoning, and waits for approval before proceeding. This pattern is especially prominent in enterprise deployments where auditability and compliance are required.
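A checkpoint of this kind amounts to gating high-stakes actions behind an injected approver. The sketch below uses a plain function as the approver so it is testable; in production that hook might be a console prompt or a ticketing system.

```python
# Sketch of a human-in-the-loop checkpoint: the agent pauses before
# high-stakes actions and asks an approver; denied actions are logged
# rather than executed, which supports auditability.

def guarded_execute(actions, approve):
    log = []
    for action, high_stakes in actions:
        if high_stakes and not approve(action):
            log.append(("skipped", action))   # denied: record and move on
            continue
        log.append(("done", action))          # safe or approved: execute
    return log

log = guarded_execute(
    [("read report", False), ("delete records", True)],
    approve=lambda a: False,   # stand-in approver that denies everything
)
```

The audit log is the point: every pause, approval, and denial is recorded, which is what compliance-sensitive deployments require.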
HumanEval pass@1 results illustrate the leverage that agentic scaffolding adds on top of raw model capability:
| Setup | HumanEval pass@1 |
|---|---|
| GPT-3.5, zero-shot | 48% |
| GPT-4, zero-shot | 67% |
| GPT-3.5, agentic (iterative) | >67% (exceeded GPT-4 zero-shot) |
| GPT-4, agentic (AlphaCodium flow) | 95.1% |
The AlphaCodium result — achieved by wrapping GPT-4 in a multi-step code-generation and test-driven refinement loop — exceeds the zero-shot score by more than 28 percentage points without fine-tuning.
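The general shape of such a flow, generating candidates and using test failures to drive the next attempt, can be sketched as below. This is a loose illustration in the spirit of test-driven refinement, not AlphaCodium's actual algorithm; the candidates here are a fixed list rather than model outputs.

```python
# Loose sketch of a test-driven refinement loop: try successive
# candidate solutions until one passes every test.

def refine(candidates, tests, max_iters=5):
    """Return the first candidate that passes all tests, else None."""
    for candidate in candidates[:max_iters]:
        failures = [t for t in tests if not t(candidate)]
        if not failures:
            return candidate        # all tests green: accept
        # In a real flow, `failures` would be fed back to the model
        # to generate the next candidate.
    return None

# Toy task: find a function that computes absolute value.
candidates = [lambda x: x, lambda x: abs(x)]
tests = [lambda f: f(3) == 3, lambda f: f(-3) == 3]
solution = refine(candidates, tests)
```

The key ingredient is that the tests, not the model's confidence, decide when to stop, which is why the loop adds capability without touching model weights.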
The following open-source and managed frameworks are widely used to build agentic AI workflows:
| Framework | Maintainer | Primary abstraction | Notes |
|---|---|---|---|
| LangGraph | LangChain | Stateful directed graph | Fine-grained control over agent loops; see LangGraph |
| CrewAI | CrewAI Inc. | Role-based crew of agents | High-level; see CrewAI |
| AutoGen | Microsoft | Conversational multi-agent | Research-grade; see AutoGen |
| LlamaIndex | LlamaIndex | Data-centric agent pipelines | Strong RAG integration |
| OpenAI Agents SDK | OpenAI | Handoffs and guardrails | First-party SDK for GPT models |
| Amazon Bedrock Agents | AWS | Managed agent runtime | Enterprise-managed; native AWS tooling |
| Google Vertex AI Agent Builder | Google Cloud | Managed agent runtime | Integrates Gemini models and Google Search |
Adoption has moved from research to production faster than most technology cycles.
The cost-reduction figures from AT&T underline Ng's workflow-over-model thesis: infrastructure and orchestration design matter as much as model selection.