LLM API rate limits cheatsheet
RPM and TPM by provider and tier. Bookmark this. Avoid the 429.
Hit a 429? Here’s where each provider’s tiers sit, what you spend to move up, and how to plan around them. Numbers below come from each provider’s public rate-limit docs, focused on the most-used model per provider. Most providers cap on three axes at once — RPM, TPM, and a daily or spend ceiling — so the actual limit you hit is whichever runs out first.
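To see which cap binds for your workload, divide TPM by your average tokens per request and compare the result with RPM. A minimal sketch, with hypothetical function names, using the OpenAI Tier 1 figures from this page (500 RPM, 30,000 TPM) and an assumed average of 1,000 tokens per request:

```python
def effective_rpm(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> float:
    """Requests per minute you can sustain when both RPM and TPM apply."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # The token budget allows tpm_limit / tokens_per_request requests per minute;
    # whichever cap is lower is the one you actually hit.
    return min(rpm_limit, tpm_limit / tokens_per_request)


def binding_limit(rpm_limit: int, tpm_limit: int, tokens_per_request: int) -> str:
    """Name the cap that runs out first: 'RPM' or 'TPM'."""
    return "TPM" if tpm_limit / tokens_per_request < rpm_limit else "RPM"


# Assumed workload: 1,000 tokens per request (prompt + completion).
print(effective_rpm(500, 30_000, 1_000))  # 30.0
print(binding_limit(500, 30_000, 1_000))  # TPM
```

At that request size the 500 RPM figure is irrelevant: the token budget stops you at 30 requests per minute. Daily or spend ceilings are not modeled here.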
OpenAI
Showing limits for GPT-4o. o1 / o3 models have lower per-model RPM caps (e.g., o1 starts at 500 RPM on Tier 1). Mini variants get higher TPM.
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Tier 1 | $5 spent + 7 days | 500 | 30,000 | 200 requests | |
| Tier 2 | $50 spent + 7 days | 5,000 | 450,000 | — | |
| Tier 3 | $100 spent + 7 days | 5,000 | 800,000 | — | |
| Tier 4 | $250 spent + 14 days | 10,000 | 2,000,000 | — | |
| Tier 5 | $1,000 spent + 30 days | 30,000 | 30,000,000 | — | |
Anthropic
Showing limits for Claude Sonnet 4.6. Opus has half the TPM at the same tier. Haiku has 2× TPM.
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Build Tier 1 | Credit card added | 50 | 40,000 | — | |
| Build Tier 2 | $40 spent + 7 days | 1,000 | 80,000 | — | |
| Build Tier 3 | $200 spent + 7 days | 2,000 | 160,000 | — | |
| Build Tier 4 | $400 spent + 14 days | 4,000 | 400,000 | — | |
| Scale | Sales contract | — | — | — | Custom limits, negotiated. |
Google
Showing limits for Gemini 2.0 Flash. Gemini 1.5 Pro has a lower RPM cap (300 on Tier 1).
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Free | Google account | 15 | 1,000,000 | 1,500 requests | Used for training by default. |
| Tier 1 | Billing enabled | 2,000 | 4,000,000 | — | |
| Tier 2 | $250 spent + 30 days | 10,000 | 10,000,000 | — | |
Groq
Showing limits for Llama 3.3 70B.
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Free | Sign up | 30 | 6,000 | 14,400 requests | |
| Pay-as-you-go | Billing enabled | 1,000 | 300,000 | — | |
DeepSeek
Showing limits for DeepSeek V3 (deepseek-chat). No published per-tier limits; concurrency-based.
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Pay-as-you-go | Billing enabled | — | — | — | DeepSeek doesn't publish RPM/TPM — concurrency-controlled. Expect ~60 concurrent requests. |
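When a provider limits concurrency rather than RPM/TPM, the client-side equivalent is to cap in-flight requests. A rough sketch using `asyncio.Semaphore`; `run_bounded` and `fake_request` are hypothetical names, and the default of 60 simply mirrors the approximate figure in the table above rather than a documented limit:

```python
import asyncio

# Assumption: ~60 concurrent requests, per the table above. Tune to what you observe.
MAX_CONCURRENT = 60


async def run_bounded(coros, limit=MAX_CONCURRENT):
    """Await all coroutines with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        # Each coroutine waits here until a slot is free.
        async with sem:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))


async def fake_request(i: int) -> int:
    # Hypothetical stand-in for a real API call.
    await asyncio.sleep(0.01)
    return i * 2


if __name__ == "__main__":
    results = asyncio.run(run_bounded([fake_request(i) for i in range(200)]))
    print(len(results))  # 200
```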
Mistral
Showing limits for Mistral Large 2.
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Free (Experiment) | Sign up | 1 | 500,000 | 1B tokens / month | |
| Production | Billing enabled | 200 | 2,000,000 | — | |
xAI
Showing limits for Grok-3.
| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Pay-as-you-go | Billing enabled | 60 | 240,000 | — | Higher tiers negotiated. |
How to avoid rate limits in production
- Exponential backoff on 429s: most providers tell you how many seconds to wait via the Retry-After header.
- Rotate across multiple API keys (and providers). Tokenwise can do this automatically with a fallback rule.
- Cross-provider fallback (Anthropic → OpenAI → Gemini) so a single provider outage doesn’t take you down.
- Batch what you can — OpenAI, Anthropic, and Mistral offer batch endpoints at 50% cost with a 24-hour SLA.
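The first and third items above can be sketched together. This is an illustration under stated assumptions, not any provider's SDK: `RateLimited` is a hypothetical exception your own wrapper would raise on a 429, `retry_after` is assumed to hold the Retry-After value already parsed as seconds (the header can also be an HTTP date, which is not handled here), and each provider is just a zero-argument callable.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by your provider wrapper on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from Retry-After, if the provider sent one


def call_with_backoff(fn, max_attempts=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on RateLimited, wait and retry, honoring Retry-After when present."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                # Exponential backoff with full jitter, capped at `cap` seconds.
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)


def call_with_fallback(providers, **backoff_kwargs):
    """Try each (name, fn) pair in order; move on when one stays rate limited."""
    last_error = None
    for name, fn in providers:
        try:
            return name, call_with_backoff(fn, **backoff_kwargs)
        except RateLimited as err:
            last_error = err
    if last_error is None:
        raise ValueError("no providers given")
    raise last_error
```

Usage would look like `call_with_fallback([("anthropic", call_anthropic), ("openai", call_openai)])`, where both callables are placeholders for your own request functions. Jitter is included so that many clients hitting the same limit do not all retry at the same instant.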
Source: provider docs, last verified May 24, 2026. Limits change — check the provider’s page for canonical numbers, or pull the same data from /api/llm-prices.json.
Tokenwise fallbacks route around rate limits automatically — try it.