
LLM API rate limits cheatsheet

RPM (requests per minute) and TPM (tokens per minute) by provider and tier. Bookmark this. Avoid the 429.


Hit a 429 (Too Many Requests)? Here’s where each provider’s tiers sit, what it takes to move up, and how to plan around them. The numbers below come from each provider’s public rate-limit docs and cover the provider's most-used model. Most providers cap usage on three axes at once: RPM, TPM, and a daily or spend ceiling. The limit you actually hit is whichever runs out first.
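To find which cap binds for a given workload, convert each ceiling into a sustainable requests-per-minute rate and take the smallest. Below is a minimal Python sketch. It assumes traffic is spread evenly over the day, and that `tokens_per_request` is your average of whatever the provider meters against TPM. `Limits` and `binding_limit` are hypothetical names. The example uses Groq's free-tier numbers from the table below.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Limits:
    rpm: int                   # requests per minute
    tpm: int                   # tokens per minute
    rpd: Optional[int] = None  # requests per day, if the tier has a daily cap

def binding_limit(limits: Limits, tokens_per_request: int) -> Tuple[str, float]:
    """Return (axis, requests per minute) for the cap that runs out first,
    assuming traffic is spread evenly across the whole day."""
    caps = {
        "RPM": float(limits.rpm),
        "TPM": limits.tpm / tokens_per_request,
    }
    if limits.rpd is not None:
        caps["daily"] = limits.rpd / (24 * 60)  # daily cap spread over 1,440 minutes
    axis = min(caps, key=lambda k: caps[k])
    return axis, caps[axis]

groq_free = Limits(rpm=30, tpm=6_000, rpd=14_400)
print(binding_limit(groq_free, 1_000))  # ('TPM', 6.0)
print(binding_limit(groq_free, 100))    # ('daily', 10.0)
```

With ~1,000-token requests, the TPM ceiling binds long before RPM does. With short prompts, the daily request cap becomes the constraint. Bursts can briefly run above the daily average, so treat the result as a planning figure rather than a guarantee.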

OpenAI

Showing limits for GPT-4o. o1 and o3 models have their own per-model caps (e.g., o1 starts at 500 RPM on Tier 1). Mini variants get higher TPM.

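| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Tier 1 | $5 spent + 7 days | 500 | 30,000 | 200 requests | |
| Tier 2 | $50 spent + 7 days | 5,000 | 450,000 | | |
| Tier 3 | $100 spent + 7 days | 5,000 | 800,000 | | |
| Tier 4 | $250 spent + 14 days | 10,000 | 2,000,000 | | |
| Tier 5 | $1,000 spent + 30 days | 30,000 | 30,000,000 | | |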
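The "what do I need to move up" question reduces to a threshold lookup. Below is a small Python sketch with hypothetical helpers `openai_tier` and `next_tier_gap`. The thresholds are copied from the table above. It assumes "+ N days" counts days since your first successful payment.

```python
from typing import Optional, Tuple

# GPT-4o tiers from the table above: (tier, min cumulative spend USD, min days, RPM, TPM).
# A snapshot of this cheatsheet, not an API contract; thresholds change.
OPENAI_TIERS = [
    (1, 5, 7, 500, 30_000),
    (2, 50, 7, 5_000, 450_000),
    (3, 100, 7, 5_000, 800_000),
    (4, 250, 14, 10_000, 2_000_000),
    (5, 1_000, 30, 30_000, 30_000_000),
]

def openai_tier(spent_usd: float, days: int) -> int:
    """Highest tier whose spend and waiting-period thresholds are both met (0 = none)."""
    reached = 0
    for tier, min_spend, min_days, _rpm, _tpm in OPENAI_TIERS:
        if spent_usd >= min_spend and days >= min_days:
            reached = tier
    return reached

def next_tier_gap(spent_usd: float, days: int) -> Optional[Tuple[int, float, int]]:
    """(next tier, extra USD to spend, extra days to wait), or None at the top tier."""
    current = openai_tier(spent_usd, days)
    if current == len(OPENAI_TIERS):
        return None
    tier, min_spend, min_days, _rpm, _tpm = OPENAI_TIERS[current]  # index = next tier - 1
    return tier, max(0.0, min_spend - spent_usd), max(0, min_days - days)

print(openai_tier(300, 10))    # 3  (Tier 4 also needs 14 days)
print(next_tier_gap(300, 10))  # (4, 0.0, 4)
```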

Anthropic

Showing limits for Claude Sonnet 4.6. Opus has half the TPM at the same tier. Haiku has 2× TPM.

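| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Build Tier 1 | Credit card added | 50 | 40,000 | | |
| Build Tier 2 | $40 spent + 7 days | 1,000 | 80,000 | | |
| Build Tier 3 | $200 spent + 7 days | 2,000 | 160,000 | | |
| Build Tier 4 | $400 spent + 14 days | 4,000 | 400,000 | | |
| Scale | Sales contract | Custom | Custom | | Limits negotiated. |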
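Only Sonnet's numbers are tabulated; the note above scales the other models from them. Below is a tiny Python sketch of that arithmetic. It assumes the half/double rule applies at every Build tier, and `anthropic_tpm` is a hypothetical helper. It scales TPM only, since the note says nothing about per-model RPM.

```python
# Sonnet TPM by Build tier, from the table above.
SONNET_TPM = {1: 40_000, 2: 80_000, 3: 160_000, 4: 400_000}

# Assumption from the note above: Opus gets half of Sonnet's TPM, Haiku double.
TPM_MULTIPLIER = {"opus": 0.5, "sonnet": 1.0, "haiku": 2.0}

def anthropic_tpm(model: str, tier: int) -> int:
    """Estimated TPM for `model` at Build `tier` (Scale limits are negotiated)."""
    return int(SONNET_TPM[tier] * TPM_MULTIPLIER[model])

for model in TPM_MULTIPLIER:
    print(model, [anthropic_tpm(model, tier) for tier in sorted(SONNET_TPM)])
# opus [20000, 40000, 80000, 200000]
# sonnet [40000, 80000, 160000, 400000]
# haiku [80000, 160000, 320000, 800000]
```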

Google

Showing limits for Gemini 2.0 Flash. Gemini 1.5 Pro has lower RPM (300 on Tier 1).

| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Free | Google account | 15 | 1,000,000 | 1,500 requests | Used for training by default. |
| Tier 1 | Billing enabled | 2,000 | 4,000,000 | | |
| Tier 2 | $250 spent + 30 days | 10,000 | 10,000,000 | | |
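At 15 RPM, the free tier is tight enough that throttling client-side beats waiting for 429s. Below is a minimal sliding-window sketch in Python that enforces RPM and TPM together. `MinuteWindowLimiter` is a hypothetical class. It assumes the provider measures a trailing 60-second window; providers may meter differently, so leave some headroom. It does not track the daily request cap.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

class MinuteWindowLimiter:
    """Client-side throttle keeping the trailing 60 seconds under both an RPM
    and a TPM ceiling (a sliding-window log; providers may meter differently)."""

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens), oldest first

    def _prune(self, now: float) -> None:
        # Drop requests that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def wait_time(self, tokens: int) -> float:
        """Seconds until a request using `tokens` would fit in the window."""
        now = self.clock()
        self._prune(now)
        count, used = len(self.events), sum(t for _, t in self.events)
        if count < self.rpm and used + tokens <= self.tpm:
            return 0.0
        # Walk forward through the oldest events until enough of them have expired.
        for ts, t in self.events:
            count -= 1
            used -= t
            if count < self.rpm and used + tokens <= self.tpm:
                return ts + 60 - now
        return float("inf")  # larger than the TPM ceiling on its own

    def acquire(self, tokens: int) -> None:
        """Block until the request fits, then record it."""
        delay = self.wait_time(tokens)
        if delay == float("inf"):
            raise ValueError("request exceeds the TPM ceiling by itself")
        if delay > 0:
            time.sleep(delay)
        self.events.append((self.clock(), tokens))
```

Usage: create `MinuteWindowLimiter(rpm=15, tpm=1_000_000)` once and call `acquire(estimated_tokens)` before each request. It is not thread-safe; wrap it in a lock if several threads share one instance.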

Groq

Showing limits for Llama 3.3 70B.

| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Free | Sign up | 30 | 6,000 | 14,400 requests | |
| Pay-as-you-go | Billing enabled | 1,000 | 300,000 | | |

DeepSeek

Showing limits for DeepSeek V3 (deepseek-chat). No published per-tier limits; concurrency-based.

| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Pay-as-you-go | Billing enabled | Not published | Not published | | Concurrency-controlled. Expect ~60 concurrent requests. |
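Because DeepSeek throttles on concurrency rather than per-minute counts, the matching client-side control is a cap on in-flight requests. Below is a minimal asyncio sketch. `run_bounded` and `demo` are hypothetical. `MAX_IN_FLIGHT = 60` mirrors the approximate figure above rather than a published limit, and the sleep stands in for a real API call.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

MAX_IN_FLIGHT = 60  # assumption: the approximate DeepSeek figure quoted above

async def run_bounded(jobs: List[Callable[[], Awaitable[T]]],
                      limit: int = MAX_IN_FLIGHT) -> List[T]:
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # waits here while `limit` requests are already running
            return await job()

    # gather returns results in the same order as the input jobs.
    return list(await asyncio.gather(*(guarded(j) for j in jobs)))

async def demo() -> int:
    """Fire 200 fake requests and report the peak concurrency observed."""
    in_flight = peak = 0

    async def fake_call() -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for a real API request
        in_flight -= 1

    await run_bounded([fake_call] * 200)
    return peak

print(asyncio.run(demo()))  # 60
```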

Mistral

Showing limits for Mistral Large 2.

| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Free (Experiment) | Sign up | 1 | 500,000 | 1B tokens / month | |
| Production | Billing enabled | 200 | 2,000,000 | | |

xAI

Showing limits for Grok-3.

| Tier | Qualification | RPM | TPM | Daily limit | Notes |
|---|---|---|---|---|---|
| Pay-as-you-go | Billing enabled | 60 | 240,000 | | Higher tiers negotiated. |

How to avoid rate limits in production

  • Exponential backoff on 429s. Most providers tell you how many seconds to wait via the Retry-After header; honor it when it is present.
  • Rotate across multiple API keys (and providers). Tokenwise can do this automatically with a fallback rule.
  • Cross-provider fallback (Anthropic → OpenAI → Gemini) so a single provider outage doesn’t take you down.
  • Batch what you can: OpenAI, Anthropic, and Mistral offer batch endpoints at 50% of the standard price, with results returned within a 24-hour window.
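The first and third bullets compose naturally: retry one provider with backoff, then move down the fallback chain. Below is a minimal Python sketch. It assumes your client raises a hypothetical `RateLimited` exception on HTTP 429, carrying the Retry-After value in seconds; map your SDK's own 429 error onto it. When no Retry-After is given, it falls back to exponential backoff with full jitter. `call_with_backoff` and `with_fallback` are hypothetical helpers, not any SDK's API.

```python
import random
import time
from typing import Any, Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Hypothetical exception: raise it from your client wrapper on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, from Retry-After if the provider sent it

def call_with_backoff(call: Callable[[], T], max_retries: int = 5,
                      base: float = 1.0, cap: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited: honor Retry-After, else full-jitter backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited as err:
            if attempt >= max_retries:
                raise  # out of retries; let the caller (or the fallback chain) handle it
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
            attempt += 1

def with_fallback(providers: Sequence[Callable[[], T]], **backoff_kwargs: Any) -> T:
    """Try each provider in order (e.g. Anthropic -> OpenAI -> Gemini)."""
    last_err: Optional[RateLimited] = None
    for provider in providers:
        try:
            return call_with_backoff(provider, **backoff_kwargs)
        except RateLimited as err:
            last_err = err  # still limited after retries; try the next provider
    if last_err is None:
        raise ValueError("no providers given")
    raise last_err

# Demo with stand-in providers: the first is always limited, the second works.
def always_limited() -> str:
    raise RateLimited(retry_after=0.0)

def healthy() -> str:
    return "ok from provider B"

print(with_fallback([always_limited, healthy], max_retries=2, sleep=lambda s: None))
# ok from provider B
```

To fall back on outages as well as 429s, catch your SDK's server-error and timeout exceptions in `with_fallback` too.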

Source: provider docs, last verified May 24, 2026. Limits change, so check each provider’s page for the canonical numbers, or pull the same data from /api/llm-prices.json.
