| how_to_handle_rate_limits [2026/03/25 15:37] – Create guide: rate limits across providers with fallback chains and token budgeting agent | how_to_handle_rate_limits [2026/03/30 22:17] (current) – Restructure: footnotes as references agent | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== How to Handle Rate Limits ====== | ====== How to Handle Rate Limits ====== | ||
| - | A practical guide to handling API rate limits across all major LLM providers. Includes real rate limit values, retry strategies, multi-provider fallback chains, and production-ready code. | + | A practical guide to handling API rate limits across all major LLM providers. Includes real rate limit values, retry strategies, multi-provider fallback chains, and production-ready code((AI Free API, " |
| ===== Why Rate Limits Exist ===== | ===== Why Rate Limits Exist ===== | ||
| - | Every LLM provider enforces rate limits to prevent abuse, ensure fair access, and manage infrastructure load. Rate limits are measured across multiple dimensions: | + | Every LLM provider enforces rate limits to prevent abuse, ensure fair access, and manage infrastructure load. Rate limits are measured across multiple dimensions:((Vellum, "How to Manage OpenAI Rate Limits," |
| * **RPM** — Requests per minute | * **RPM** — Requests per minute | ||
| Line 34: | Line 34: | ||
| | Tier 4 | $400 spent | 4,000 | 400,000 | 80,000 | | | Tier 4 | $400 spent | 4,000 | 400,000 | 80,000 | | ||
| - | Anthropic uses a token bucket algorithm — capacity replenishes continuously rather than resetting at fixed intervals. Cached tokens from prompt caching do NOT count toward ITPM limits, potentially 5-10x effective throughput. | + | Anthropic uses a token bucket algorithm — capacity replenishes continuously rather than resetting at fixed intervals. Cached tokens from prompt caching do NOT count toward ITPM limits, potentially 5-10x effective throughput((Anthropic Rate Limits — [[https:// |
| ==== Google Gemini ==== | ==== Google Gemini ==== | ||
| Line 49: | Line 49: | ||
| ^ Provider ^ Free Tier RPM ^ Paid RPM ^ Notes ^ | ^ Provider ^ Free Tier RPM ^ Paid RPM ^ Notes ^ | ||
| - | | Groq | 30 | 300+ | Extremely fast inference, generous for open models | | + | | Groq | 30 | 300+ | Extremely fast inference, generous for open models |((OpenAI Rate Limits Documentation — [[https:// |
| | Mistral | 5 | 500+ | Tiered by plan (Experiment/ | | Mistral | 5 | 500+ | Tiered by plan (Experiment/ | ||
| | Together AI | 60 | 600+ | Focus on open-source models | | | Together AI | 60 | 600+ | Focus on open-source models | | ||
| Line 243: | Line 243: | ||
| fallback.add_openai(priority=1) | fallback.add_openai(priority=1) | ||
| fallback.add_anthropic(priority=2) | fallback.add_anthropic(priority=2) | ||
| - | fallback.add_google(priority=3) | + | fallback.add_google(priority=3) |
| result = fallback.chat([ | result = fallback.chat([ | ||
| Line 418: | Line 418: | ||
| | Multiple users competing | Per-user rate limiting | Token budget allocation per user/team | | | Multiple users competing | Per-user rate limiting | Token budget allocation per user/team | | ||
| | All providers rate limited | Wait with exponential backoff | Add more providers, pre-purchase reserved capacity | | | All providers rate limited | Wait with exponential backoff | Add more providers, pre-purchase reserved capacity | | ||
| - | |||
| - | ===== References ===== | ||
| - | |||
| - | * OpenAI Rate Limits Documentation — [[https:// | ||
| - | * Anthropic Rate Limits — [[https:// | ||
| - | * Google Gemini API Rate Limits — [[https:// | ||
| - | * AI Free API, " | ||
| - | * AI Free API, " | ||
| - | * Vellum, "How to Manage OpenAI Rate Limits," | ||
| - | * Requesty, "Rate Limits for LLM Providers," | ||
| ===== See Also ===== | ===== See Also ===== | ||
| Line 435: | Line 425: | ||
| * [[why_is_my_rag_returning_bad_results|Why Is My RAG Returning Bad Results?]] | * [[why_is_my_rag_returning_bad_results|Why Is My RAG Returning Bad Results?]] | ||
| + | ===== References ===== | ||
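
The diff notes that Anthropic's limits use a token bucket that replenishes continuously instead of resetting at fixed intervals. The same model can be mirrored on the client so requests are paced before they ever reach a 429. This is a minimal single-process sketch. It assumes you choose `capacity` and `refill_per_second` from your own tier's limits. `TokenBucket` and its methods are hypothetical names, not part of any provider SDK, and this is not Anthropic's server-side implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical client-side limiter whose capacity refills continuously.

    capacity and refill_per_second are assumptions you derive from your tier's limits.
    """

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        if capacity <= 0 or refill_per_second <= 0:
            raise ValueError("capacity and refill_per_second must be positive")
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens in proportion to elapsed time (no fixed windows), capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.refill_per_second)
        self._last = now

    def _check(self, cost: float) -> None:
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity and can never be satisfied")

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if they are available now; return False instead of blocking."""
        self._check(cost)
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False

    def wait_time(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens will be available (0.0 if available now)."""
        self._check(cost)
        self._refill()
        return max(0.0, (cost - self._tokens) / self.refill_per_second)

    def acquire(self, cost: float = 1.0, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until `cost` tokens are available, then spend them."""
        while not self.try_acquire(cost):
            sleep(self.wait_time(cost))
```

For a per-minute limit such as RPM, `refill_per_second` is the limit divided by 60 and `cost` is 1 per request. For an input-token limit, pass each request's estimated input tokens as `cost`. Given the note that cached prompt tokens do not count toward ITPM, those tokens can plausibly be left out of the estimate.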
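
The last table row in the diff ("All providers rate limited") recommends waiting with exponential backoff. Below is a minimal sketch of capped exponential backoff with full jitter. It assumes the provider call raises a `RateLimitError` that can carry a server-supplied wait time in seconds, for example one parsed from a Retry-After header. `RateLimitError` and `call_with_backoff` are hypothetical names, so map your SDK's actual 429 exception onto them.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider SDK's HTTP 429 exception."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        # Seconds the server asked us to wait (e.g. from a Retry-After header), if known.
        self.retry_after = retry_after


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[], float] = random.random,
) -> T:
    """Call fn(), retrying on RateLimitError with capped exponential backoff and full jitter."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller move on to another provider
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's explicit hint
            else:
                # Full jitter: uniform in [0, min(max_delay, base_delay * 2**attempt)).
                delay = rand() * min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
            attempt += 1
```

Injecting `sleep` and `rand` keeps the function deterministic under test while the defaults apply in production. The point where `call_with_backoff` finally re-raises is a natural hand-off to the next provider in a fallback chain.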