Free tool

LLM context window comparison

What fits in 128K vs 1M vs 2M tokens? A visual reference for picking the right model.

How big is your context, really?

A token is roughly three-quarters of an English word: context_words ≈ context_tokens × 0.75. So a 200K-token window holds about 150,000 words: one full-length novel, or every README in a mid-sized monorepo, or roughly 600 pages of dense technical PDFs. A 2M-token window starts to feel ridiculous. At about 1.5M words, that’s the full Harry Potter series, then nearly half of it again.
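The conversion above is simple enough to sketch in a few lines. This is a rough illustration only: the function names are made up for this example, and the 0.75 ratio is the rule of thumb from the text, not a property of any particular tokenizer. Code, non-English text, and dense markup usually tokenize less efficiently.

```python
import math

# Rule of thumb from the text; real ratios vary by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(context_tokens: int) -> int:
    """Approximate English word capacity of a context window."""
    return round(context_tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate token count needed to hold a given number of English words."""
    return math.ceil(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    # Window sizes that appear on this page.
    for window in (32_000, 128_000, 200_000, 1_000_000, 2_000_000):
        print(f"{window:>9,} tokens ~ {tokens_to_words(window):>9,} words")
```

For an exact count, run the text through the tokenizer of the model you plan to call instead of relying on the ratio.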

Context size matters in three places. RAG depth: the more chunks you can stuff in, the less your retrieval has to be perfect. Agent step count: every tool call, observation, and reasoning step accumulates — agents that think for 20+ turns burn through 32K windows before they finish the task. Long-document work: summarizing a contract, refactoring across files, or answering questions over a research corpus is bottlenecked by the longest single thing you can show the model.
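To get a feel for the agent step-count point, here is a back-of-the-envelope sketch. Every number in it is a hypothetical assumption (system prompt size, tokens added per turn, output reserve), and it assumes the full history is resent on every turn with no summarization or truncation.

```python
def turns_until_full(
    window_tokens: int,
    system_tokens: int,
    tokens_per_turn: int,
    reserve_for_output: int = 0,
) -> int:
    """Number of complete agent turns that fit before the history overflows.

    Assumes each turn (tool call + observation + reasoning) appends a fixed
    number of tokens and nothing is ever dropped from the history.
    """
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    budget = window_tokens - system_tokens - reserve_for_output
    return max(budget, 0) // tokens_per_turn


if __name__ == "__main__":
    # Hypothetical agent: 2,000-token system prompt, 1,500 tokens per turn,
    # 1,000 tokens held back for the final answer.
    for window in (32_000, 128_000, 200_000):
        turns = turns_until_full(window, 2_000, 1_500, 1_000)
        print(f"{window:>7,}-token window: about {turns} turns")
```

Under these assumed numbers a 32K window runs out after 19 turns, which is the kind of ceiling the paragraph above describes.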

Bigger isn’t free. Most providers charge per input token, so a 1M-token prompt at $3 per million tokens costs $3 per call before you’ve read a word of output. Recall also tends to degrade past ~64K tokens for most models: “needle in a haystack” benchmarks test simple retrieval, and they hide the fact that real reasoning over 500K tokens is still hit-or-miss. Pick the smallest window that fits your job.
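The cost arithmetic and the "smallest window that fits" advice can be sketched as follows. The window tiers are the sizes listed on this page; the 20% headroom default and the function names are assumptions made for illustration, and the price is the example figure from the text rather than any provider's actual rate.

```python
from typing import Optional

# Window sizes that appear on this page, smallest first.
WINDOW_TIERS = [32_000, 128_000, 200_000, 1_000_000, 2_000_000]


def input_cost_usd(prompt_tokens: int, price_per_million_usd: float) -> float:
    """Input-side cost of a single call, ignoring output tokens."""
    return prompt_tokens / 1_000_000 * price_per_million_usd


def smallest_window_that_fits(
    required_tokens: int, headroom: float = 0.2
) -> Optional[int]:
    """Smallest listed window that holds the prompt plus some headroom.

    `headroom` is an assumed safety margin for output and estimation error.
    Returns None if nothing on the list is large enough.
    """
    needed = required_tokens * (1 + headroom)
    for window in sorted(WINDOW_TIERS):
        if window >= needed:
            return window
    return None


if __name__ == "__main__":
    print(input_cost_usd(1_000_000, 3.0))      # the $3-per-call example
    print(smallest_window_that_fits(90_000))   # prompt of ~90K tokens
    print(smallest_window_that_fits(150_000))  # prompt of ~150K tokens
```

Multiplying the per-call figure by expected call volume is usually what makes the case for a smaller window.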

Mistral Small 3

Mistral

32,000 tokens ≈ 24K words

~24,000 words — a long magazine feature or a short novella.

GPT-4o

OpenAI

128,000 tokens ≈ 96K words

~96,000 words — The Great Gatsby plus Animal Farm.

GPT-4o mini

OpenAI

128,000 tokens ≈ 96K words

~96,000 words — a full short novel with room to spare.

Grok-2

xAI

128,000 tokens ≈ 96K words

~96,000 words — a full short novel with room to spare.

DeepSeek V3

DeepSeek

128,000 tokens ≈ 96K words

~96,000 words — Slaughterhouse-Five end to end, twice.

DeepSeek R1

DeepSeek

128,000 tokens ≈ 96K words

~96,000 words — one mid-sized novel.

Mistral Large 2

Mistral

128,000 tokens ≈ 96K words

~96,000 words — about one mid-length novel.

Llama 3.3 70B (Groq)

Groq

128,000 tokens ≈ 96K words

~96,000 words — a paperback novel cover to cover.

Llama 3.1 8B (Groq)

Groq

128,000 tokens ≈ 96K words

~96,000 words — a paperback novel cover to cover.

o1

OpenAI

200,000 tokens ≈ 150K words

~150,000 words: a long, full-length novel.

o3-mini

OpenAI

200,000 tokens ≈ 150K words

~150,000 words — a novel-length input with room to think.

Claude Sonnet 4.6

Anthropic

200,000 tokens ≈ 150K words

~150,000 words — a novel, or every Stripe API doc combined.

Claude Haiku 4.5

Anthropic

200,000 tokens ≈ 150K words

~150,000 words — a novel-length input at draft-speed cost.

Claude 3.5 Sonnet

Anthropic

200,000 tokens ≈ 150K words

~150,000 words — one full novel.

GPT-4.1

OpenAI

1,000,000 tokens ≈ 750K words

~750,000 words: War and Peace, with room to spare.

GPT-4.1 mini

OpenAI

1,000,000 tokens ≈ 750K words

~750,000 words: War and Peace, with room to spare.

Claude Opus 4.7

Anthropic

1,000,000 tokens ≈ 750K words

~750,000 words: Anna Karenina twice over in one prompt.

Gemini 2.0 Flash

Google

1,000,000 tokens ≈ 750K words

~750,000 words: the entire Lord of the Rings trilogy, about one and a half times over.

Gemini 1.5 Flash

Google

1,000,000 tokens ≈ 750K words

~750,000 words — Lord of the Rings + The Hobbit, with margin.

Grok-3

xAI

1,000,000 tokens ≈ 750K words

~750,000 words — every Sherlock Holmes story written, plus extras.

Gemini 1.5 Pro

Google

2,000,000 tokens ≈ 1.5M words

~1.5M words — all seven Harry Potter books × 1.5.

Source: provider docs, last verified May 24, 2026. 1 token ≈ 0.75 English words. Bars use a log scale from 8K to 2M tokens. Same data as JSON.
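For readers curious how a log-scale bar from 8K to 2M tokens might be computed, here is one possible sketch. The function name and the clamping behavior are assumptions for illustration; this is not necessarily how the chart on this page is drawn.

```python
import math

MIN_TOKENS = 8_000      # left edge of the scale, per the note above
MAX_TOKENS = 2_000_000  # right edge of the scale, per the note above


def bar_fraction(context_tokens: int) -> float:
    """Fraction of the full bar width (0.0 to 1.0) on a log scale.

    Values outside the 8K to 2M range are clamped to the nearest edge.
    """
    clamped = min(max(context_tokens, MIN_TOKENS), MAX_TOKENS)
    return math.log(clamped / MIN_TOKENS) / math.log(MAX_TOKENS / MIN_TOKENS)


if __name__ == "__main__":
    for window in (32_000, 128_000, 200_000, 1_000_000, 2_000_000):
        print(f"{window:>9,} tokens -> {bar_fraction(window):.0%} of bar width")
```

On a log scale a 128K window fills about half the bar even though it is only 6.4% of 2M tokens, which is why the smaller windows stay visible next to the largest ones.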

Hitting context limits in production? Tokenwise shows you which calls bloat the most — and which ones could move to a smaller, cheaper window without losing quality.