====== Guidance ======
**Guidance** is an open-source Python library by **Microsoft** for controlling and constraining the outputs of large language models. With over **21,000 stars** on GitHub, it implements **constrained decoding** — steering token generation at the inference layer to guarantee outputs match specified formats like JSON, Python, HTML, SQL, and more.(([[https://github.com/guidance-ai/guidance|GitHub Repository]]))(([[https://www.microsoft.com/en-us/research/project/guidance-control-lm-output/|Microsoft Research Project Page]]))(([[https://github.com/guidance-ai/llguidance|Low-Level Guidance Engine (llguidance)]]))
Rather than relying on prompt engineering, retry loops, or post-processing, Guidance enforces structural constraints directly during model inference, guaranteeing output structure while the project reports 30-50% reductions in latency and cost compared to conventional prompting techniques.
===== How Constrained Decoding Works =====
Guidance implements constrained decoding by steering the language model **token by token** during inference. Instead of generating text freely and hoping it matches a desired format, Guidance manipulates the token probability distribution at each step to ensure only valid tokens can be selected.
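The core mechanism can be illustrated without the library itself. In this minimal sketch (a hypothetical illustration, not Guidance's actual internals), tokens the constraint forbids are masked out of the logits before sampling, so an invalid token can never be chosen:

```python
import math

def masked_argmax(logits, valid_ids):
    """Pick the most probable token among those the constraint allows.

    Masking happens before sampling, so forbidden tokens get exactly
    zero probability -- the structural guarantee is absolute, not statistical.
    """
    masked = {i: logits[i] for i in valid_ids}
    z = sum(math.exp(v) for v in masked.values())
    probs = {i: math.exp(v) / z for i, v in masked.items()}
    return max(probs, key=probs.get)

# Toy vocabulary of six tokens; suppose the format only allows ids 2 and 5 here.
logits = [3.0, 1.5, 0.2, 2.7, 0.9, 0.4]
print(masked_argmax(logits, {2, 5}))  # token 0 has the highest raw logit, but 5 wins
```

Because the mask is recomputed at every step, the model's preferences still matter among the valid tokens; only the invalid ones are excluded.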
As a guidance program executes, the library batches the literal text the program appends together with its generation calls, so the whole program runs as a **single API call** rather than a sequence of round trips. This eliminates expensive retries and removes the need for fine-tuning.
===== Key Features =====
* **Guaranteed output structure** — 100% compliance with specified formats
* **Multiple constraint types** — Select (choices), regular expressions, context-free grammars
* **Format flexibility** — JSON, Python, HTML, SQL, and any custom format
* **30-50% cost reduction** — Fewer API calls and faster inference
* **Pure Python syntax** — Uses f-strings and prebuilt components
* **Composable programs** — Guidance functions and rich templates combine into larger workflows
* **Local model support** — Compatible with most open-source LLMs
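Constraints such as regex work by prefix viability: at each step a token is allowed only if the text so far plus that token can still be completed into a full match. A stdlib-only sketch for the pattern `[0-9]{1,2}` (a hypothetical illustration, not the library's matcher):

```python
def allowed_tokens(prefix, vocab):
    """Tokens that keep the output a viable prefix of the pattern [0-9]{1,2}."""
    def viable(s):
        # every viable string under this pattern is 1-2 characters, all digits
        return 0 < len(s) <= 2 and s.isdigit()
    return [t for t in vocab if viable(prefix + t)]

vocab = ["1", "7", "42", "a", " ", "100"]
print(allowed_tokens("", vocab))   # ['1', '7', '42']
print(allowed_tokens("4", vocab))  # ['1', '7'] -- appending '42' would exceed 2 chars
```

A real tokenizer complicates this (one token can span several characters, as `"42"` does above), which is why the engine checks viability per token rather than per character.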
===== Installation and Usage =====
<code python>
# Install Guidance:  pip install guidance
import guidance
from guidance import gen, select

# Load a local model
model = guidance.models.LlamaCpp("path/to/model.gguf")

# Basic constrained generation
@guidance
def character_creator(lm):
    lm += "Name: " + gen("name", max_tokens=20, stop="\n")
    lm += "\nClass: " + select(["Warrior", "Mage", "Rogue", "Cleric"], name="class")
    lm += "\nStrength: " + gen("strength", regex="[0-9]{1,2}")
    lm += "\nBackstory: " + gen("backstory", max_tokens=100)
    return lm

result = model + character_creator()
print(result["name"], result["class"], result["strength"])
</code>
<code python>
# JSON output with guaranteed structure
@guidance
def json_extractor(lm, text):
    lm += f"Extract entities from: {text}\n"
    lm += '{"name": "' + gen("name", stop='"') + '", '
    lm += '"age": ' + gen("age", regex="[0-9]+") + ', '
    lm += '"role": "' + select(["engineer", "manager", "designer"], name="role") + '"}'
    return lm

result = model + json_extractor(text="Alice is a 30-year-old engineer")
</code>
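To see why `json_extractor` always yields parseable JSON, note that the braces, keys, and commas are literal text; the model only fills the constrained gaps, and each gap is kept well-formed (the name stops before a closing quote, the age matches `[0-9]+`, the role is a fixed choice). A stdlib-only sketch with hypothetical captured values:

```python
import json

def render(name, age, role):
    # Mirrors json_extractor's fixed scaffolding; only the three captured
    # spans vary, and the constraints above keep each one well-formed.
    return '{"name": "' + name + '", "age": ' + age + ', "role": "' + role + '"}'

out = render("Alice", "30", "engineer")
print(json.loads(out))  # parses on the first try; no retry loop needed
```

Contrast this with free-form generation, where a stray quote or trailing comma forces a retry or a post-processing repair step.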
===== Architecture =====
<code mermaid>
%%{init: {'theme': 'dark'}}%%
graph TB
    Dev([Developer]) -->|Guidance Program| GP[Guidance Engine]
    GP -->|Token-by-Token Steering| LLM[Language Model]
    LLM -->|Logits| CD[Constrained Decoder]
    CD -->|Valid Token Mask| LLM
    GP -->|Constraints| CT{Constraint Type}
    CT -->|Choices| Sel[Select]
    CT -->|Pattern| Reg[Regex]
    CT -->|Grammar| CFG[Context-Free Grammar]
    CT -->|Free Text| Gen[gen with stop tokens]
    GP -->|Single API Call| Result[Structured Output]
    Result -->|Guaranteed Format| App[Application]
    subgraph Inference Layer
        LLM
        CD
    end
</code>
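The decoder feedback loop in the diagram (logits → valid-token mask → next token → repeat) can be sketched for a toy context-free grammar, balanced parentheses under a length budget (a hypothetical illustration, not llguidance's algorithm):

```python
def valid_next(prefix, budget):
    """Tokens that can still be extended to a balanced string within budget."""
    depth = prefix.count("(") - prefix.count(")")
    allowed = []
    # opening is viable only if enough room remains to close every open paren
    if depth + 1 <= budget - len(prefix) - 1:
        allowed.append("(")
    if depth > 0:
        allowed.append(")")
    return allowed

def generate(budget):
    out = ""
    while len(out) < budget:
        choices = valid_next(out, budget)
        if not choices:
            break
        out += choices[0]  # stand-in for "highest-probability valid token"
    return out

print(generate(6))  # ((())) -- every prefix was forced to stay completable
```

The key property is that viability is checked before each token is emitted, so the loop can never paint itself into a corner that would require backtracking or a retry.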
===== Comparison with Prompt Engineering =====
^ Approach ^ Structure Guarantee ^ Retries Needed ^ Latency ^ Cost ^
| Prompt Engineering | No guarantee | Often | High | High |
| Output Parsing | No guarantee | Sometimes | Medium | Medium |
| Fine-tuning | Partial | Rarely | Low | Very High (training) |
| **Guidance** | **100% guaranteed** | **Never** | **Low** | **Low** |
===== See Also =====
* [[outlines|Outlines — Structured Output via Constrained Decoding]]
* [[promptfoo|Promptfoo — LLM Evaluation and Red Teaming]]
* [[deepeval|DeepEval — Unit-Test Style LLM Evaluation]]
===== References =====