Semi-Formal Code Reasoning

Semi-Formal Code Reasoning is a structured prompting methodology for LLM agents that bridges the gap between unstructured chain-of-thought reasoning and full formal verification. Introduced by Ugare and Chandra at Meta (2026), the approach requires agents to construct explicit premises, trace execution paths, and derive formal conclusions when analyzing code semantics — all without executing the code.

Motivation

LLM-based coding agents increasingly generate patches, fix bugs, and modify codebases. A critical bottleneck is verifying correctness — determining whether two patches are semantically equivalent, localizing faults, or answering deep questions about code behavior. Traditional approaches fall into two extremes:

Unstructured CoT: The model reasons freely but can skip cases, make unsupported claims, or hallucinate behavior
Formal verification: Mathematically rigorous but requires specialized tooling and is impractical at scale

Semi-formal reasoning occupies the productive middle ground: structured enough to serve as a verifiable certificate, flexible enough to apply broadly.

The Semi-Formal Reasoning Methodology

The methodology imposes a three-phase structure on the agent's reasoning process:

Phase 1: Premise Construction. The agent explicitly identifies and states all relevant premises — variable states, control flow paths, type constraints, and environmental assumptions. This forces the agent to ground its reasoning in concrete code facts rather than general intuition.

Phase 2: Execution Tracing. The agent systematically traces execution paths through the code, tracking state transformations at each step. Unlike informal reasoning, every state transition must be justified by a specific code construct.

Phase 3: Formal Conclusion. The agent derives a conclusion that logically follows from the traced premises. The conclusion must address all identified paths — the agent cannot cherry-pick favorable cases.

# Example: Semi-formal reasoning for patch equivalence
# Original code:
def get_value(d, key):
    if key in d:
        return d[key]
    return None
 
# Patched code:
def get_value(d, key):
    return d.get(key)
 
# Semi-formal reasoning trace:
# PREMISES:
#   P1: d is a dict, key is hashable
#   P2: dict.get(key) returns d[key] if key in d, else None
#   P3: Original: if key in d -> return d[key]; else -> return None
# EXECUTION TRACE:
#   Case 1 (key in d): Original returns d[key], Patch returns d.get(key) = d[key]
#   Case 2 (key not in d): Original returns None, Patch returns d.get(key) = None
# CONCLUSION: Patches are semantically equivalent for all inputs
#   where d is a dict and key is hashable (EQUIVALENT)

Agentic Code Reasoning

The paper frames this within a broader capability called agentic code reasoning — an agent's ability to navigate files, trace dependencies, and gather context iteratively to perform deep semantic analysis without execution. The agent actively explores the codebase, building up context before applying semi-formal reasoning to specific questions.

This is distinct from single-pass analysis: the agent may need to read multiple files, follow import chains, and understand class hierarchies before it has sufficient premises to reason formally.

Evaluation Results

Semi-formal reasoning was evaluated across three distinct tasks:

Patch Equivalence Verification:

Standard agentic reasoning: 78% accuracy
Semi-formal reasoning: 88% on curated examples
On real-world agent-generated patches: 93% accuracy
This approaches the reliability needed for execution-free RL reward signals

Code Question Answering (RubberDuckBench):

Standard agentic reasoning: 78% accuracy
Semi-formal reasoning: 87% accuracy (+9 percentage points)

Fault Localization (Defects4J):

Semi-formal reasoning improves Top-5 accuracy by 5 percentage points over standard reasoning

Implications for RL Training Pipelines

A key practical application is using semi-formal reasoning as an execution-free reward signal for reinforcement learning. Current RL training for code agents requires running test suites to determine if patches are correct — this is expensive and slow. At 93% accuracy on real-world patches, semi-formal reasoning approaches the reliability threshold for replacing execution-based verification in RL reward computation.

$$ ext{Reward}(p) pprox ext{SemiFormal}(p_{ ext{original}}, p_{ ext{patched}}) \quad ext{where} \quad P( ext{correct}) \geq 0.93$$

The Certificate Property

The critical distinction from standard CoT is the certificate property: a semi-formal reasoning trace can be independently verified. Because the agent must state all premises, trace all paths, and derive conclusions that cover all cases, a reviewer (human or automated) can check:

Are all premises grounded in actual code?
Does the execution trace cover all control flow paths?
Does the conclusion follow from the traced paths?

This transforms reasoning from an opaque black box into an auditable chain of evidence.

AI Agent Knowledge Base

Sidebar

Table of Contents

Semi-Formal Code Reasoning

Motivation

The Semi-Formal Reasoning Methodology

Agentic Code Reasoning

Evaluation Results

Implications for RL Training Pipelines

The Certificate Property

References

See Also

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Semi-Formal Code Reasoning

Motivation

The Semi-Formal Reasoning Methodology

Agentic Code Reasoning

Evaluation Results

Implications for RL Training Pipelines

The Certificate Property

References

See Also

Page Tools