This is an old revision of the document!

Guidance

Guidance is an open-source Python library by Microsoft for controlling and constraining the outputs of large language models. With over 21,000 stars on GitHub, it implements constrained decoding — steering token generation at the inference layer to guarantee outputs match specified formats like JSON, Python, HTML, SQL, and more.¹⁾²⁾³⁾

Rather than relying on prompt engineering, retry loops, or post-processing, Guidance enforces structural constraints directly during model inference, achieving 100 guaranteed output structure with 30-50 reduction in latency and costs compared to conventional prompting techniques.

How Constrained Decoding Works

Guidance implements constrained decoding by steering the language model token by token during inference. Instead of generating text freely and hoping it matches a desired format, Guidance manipulates the token probability distribution at each step to ensure only valid tokens can be selected.

The library batches any additional text added by the user as execution unfolds, treating the entire process as a single API call rather than multiple sequential calls. This eliminates the need for expensive retries or fine-tuning.

Key Features

Guaranteed output structure — 100 compliance with specified formats * **Multiple constraint types** — Select (choices), regular expressions, context-free grammars * **Format flexibility** — JSON, Python, HTML, SQL, and any custom format * **30-50 cost reduction — Fewer API calls and faster inference * Pure Python syntax — Uses f-strings and prebuilt components * Tool deployment — Rich templates integrated into workflow * Local model support — Compatible with most open-source LLMs ===== Installation and Usage ===== <code python> # Install Guidance # pip install guidance import guidance from guidance import gen, select # Load a model model = guidance.models.LlamaCpp(“path/to/model.gguf”) # Basic constrained generation @guidance def character_creator(lm): lm += f“Name: {gen('name', max_tokens=20, stop='\n')}” lm += f“\nClass: {select(['Warrior', 'Mage', 'Rogue', 'Cleric'], name='class')}” lm += f“\nStrength: {gen('strength', regex='[0-9]{1,2}')}” lm += f“\nBackstory: {gen('backstory', max_tokens=100)}” return lm result = model + character_creator() print(result[“name”], result[“class”], result[“strength”]) # JSON output with guaranteed structure @guidance def json_extractor(lm, text): lm += f“Extract entities from: {text}\n” lm += '{“name”: “' + gen(“name”, stop='”') + '“, ' lm += '“age”: ' + gen(“age”, regex=”[0-9]+“) + ', ' lm += '“role”: ”' + select([“engineer”, “manager”, “designer”], name=“role”) + '“}' return lm result = model + json_extractor(text=“Alice is a 30-year-old engineer”) </code> ===== Architecture ===== <code> {init: {'theme': 'dark'}} graph TB Dev([Developer]) –>|Guidance Program| GP[Guidance Engine] GP –>|Token-by-Token Steering| LLM[Language Model] LLM –>|Logits| CD[Constrained Decoder] CD –>|Valid Token Mask| LLM GP –>|Constraints| CT{Constraint Type} CT –>|Choices| Sel[Select] CT –>|Pattern| Reg[Regex] CT –>|Grammar| CFG[Context-Free Grammar] CT –>|Free Text| Gen[gen with stop tokens] GP –>|Single API Call| Result[Structured Output] Result –>|Guaranteed Format| App[Application] subgraph Inference Layer LLM CD end </code> ===== Comparison with Prompt Engineering ===== ^ Approach ^ Structure Guarantee ^ Retries Needed ^ Latency ^ Cost ^ | Prompt Engineering | No guarantee | Often | High | High | | Output Parsing | No guarantee | Sometimes | Medium | Medium | | Fine-tuning | Partial | Rarely | Low | Very High (training) | | Guidance | 100%% guaranteed | Never | Low | Low** |

AI Agent Knowledge Base

Sidebar

Table of Contents

Guidance

How Constrained Decoding Works

Key Features

References

See Also

AI Agent Knowledge Base

User Tools

Site Tools

Sidebar

Table of Contents

Guidance

How Constrained Decoding Works

Key Features

References

See Also

Page Tools