
Structured outputs support matrix

Which models support JSON mode, strict schema, function calling, parallel tools, and reasoning tokens.

JSON mode is not strict schema

The terms get used interchangeably in marketing copy, but they behave very differently in production. JSON object mode (response_format: { type: 'json_object' } on OpenAI-style APIs) guarantees only that the response parses as JSON. The model still picks its own keys, types, and shape, so you still need a runtime validator. Strict JSON schema goes further: you supply a JSON Schema, and the provider constrains decoding so every response conforms exactly, with no missing fields, no extra keys, and no enum values outside the ones you declared. If your code does JSON.parse(...).user.email without checking, you want strict schema.
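
A minimal TypeScript sketch of the difference. The UserPayload shape, the isUserPayload guard, and the sample strings are hypothetical. The strictResponseFormat object follows the json_schema response_format shape documented for OpenAI's Chat Completions API (other providers use different field names), and it is only built here, not sent. The guard is the kind of runtime validator that JSON object mode still requires.

```typescript
// Hypothetical response shape for this example.
interface UserPayload {
  user: { email: string; name: string };
}

// Strict schema (OpenAI-style json_schema response_format): the schema travels with
// the request. Strict mode expects additionalProperties: false and every property
// listed in required. JSON object mode, by contrast, is just { type: "json_object" }.
// Built here for illustration only; no request is sent.
const strictResponseFormat = {
  type: "json_schema",
  json_schema: {
    name: "user_payload",
    strict: true,
    schema: {
      type: "object",
      properties: {
        user: {
          type: "object",
          properties: { email: { type: "string" }, name: { type: "string" } },
          required: ["email", "name"],
          additionalProperties: false,
        },
      },
      required: ["user"],
      additionalProperties: false,
    },
  },
};

// Runtime validator: narrow an unknown value to UserPayload by checking every field we use.
function isUserPayload(value: unknown): value is UserPayload {
  if (typeof value !== "object" || value === null) return false;
  const user = (value as { user?: unknown }).user;
  if (typeof user !== "object" || user === null) return false;
  const u = user as { email?: unknown; name?: unknown };
  return typeof u.email === "string" && typeof u.name === "string";
}

// JSON object mode only promises that this parse succeeds; the shape is still unchecked.
function parseUser(raw: string): UserPayload {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (!isUserPayload(data)) {
    throw new Error("Parsed as JSON but does not match UserPayload");
  }
  return data;
}

// Valid JSON with the wrong key ("mail"): something JSON object mode permits.
const jsonModeOutput = '{"user": {"mail": "a@example.com", "name": "Ada"}}';
const conformingOutput = '{"user": {"email": "a@example.com", "name": "Ada"}}';

console.log(parseUser(conformingOutput).user.email); // a@example.com
try {
  parseUser(jsonModeOutput);
} catch (err) {
  console.log((err as Error).message); // Parsed as JSON but does not match UserPayload
}
```

With strict schema enforced by the provider, the jsonModeOutput case should not occur, but keeping the guard costs little and catches provider-side regressions.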

Function calling and tool calling are the same feature. OpenAI renamed it in late 2023 and everyone else followed. The model returns a structured payload ({ name, arguments }) instead of free text, and you decide whether to actually run the function. Parallel tool calls let the model request several tools in one turn (e.g. fetch the weather and a stock price at the same time); without them, you're back to sequential round-trips.
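
The sketch below shows the caller's side of parallel tool calling under simplifying assumptions: the ToolCall type mirrors the { name, arguments } payload described above (with arguments as a JSON-encoded string, as on OpenAI-style APIs), and get_weather and get_stock_price are hypothetical stubs. Every call in the turn is dispatched concurrently, and names the application doesn't recognize are refused rather than run.

```typescript
// Simplified tool-call shape modeled on the { name, arguments } payload.
// Assumption: arguments arrives as a JSON-encoded string, as on OpenAI-style APIs.
interface ToolCall {
  name: string;
  arguments: string;
}

type ToolFn = (args: Record<string, unknown>) => Promise<string>;

// Hypothetical stub tools; real ones would call external services.
const tools = new Map<string, ToolFn>([
  ["get_weather", async (args) => `Sunny in ${String(args["city"])}`],
  ["get_stock_price", async (args) => `${String(args["ticker"])}: 123.45`],
]);

// The application, not the model, decides what runs: unknown names are refused.
async function runToolCall(call: ToolCall): Promise<string> {
  const fn = tools.get(call.name);
  if (fn === undefined) return `error: unknown tool ${call.name}`;
  const args = JSON.parse(call.arguments) as Record<string, unknown>;
  return fn(args);
}

// One assistant turn can carry several calls; Promise.all runs them concurrently
// and returns results in the same order as the calls.
async function runParallel(calls: ToolCall[]): Promise<string[]> {
  return Promise.all(calls.map(runToolCall));
}

const turn: ToolCall[] = [
  { name: "get_weather", arguments: '{"city":"Paris"}' },
  { name: "get_stock_price", arguments: '{"ticker":"ACME"}' },
];

runParallel(turn).then((results) => console.log(results));
// [ 'Sunny in Paris', 'ACME: 123.45' ]
```

Without parallel tool calls, the same two lookups would take two model turns, each waiting on the previous tool result before the model asks for the next one.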

Reasoning tokens are the chain-of-thought tokens that o1, o3-mini, Claude with extended thinking, Gemini Thinking, Grok-3, and DeepSeek R1 generate before they answer. OpenAI hides them from the response body and Anthropic's newer models return only a summary, but both bill the full reasoning count at the output-token rate, which is easy to forget when forecasting cost. DeepSeek R1 returns them in the response so you can read the reasoning, which is sometimes useful for debugging and sometimes a leak you don't want.
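
A back-of-the-envelope cost estimator makes the billing point concrete. The prices ($1 per million input tokens, $4 per million output tokens) and the token counts are made up for illustration; the only assumption carried over from the text is that reasoning tokens are billed at the output rate. Providers report these counts differently, so check whether your usage data already folds reasoning into the output total before adding it again.

```typescript
// Illustrative prices in USD per million tokens; not any provider's actual rates.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface Usage {
  inputTokens: number;
  visibleOutputTokens: number; // tokens you actually see in the answer
  reasoningTokens: number; // may be hidden, but billed at the output rate
}

function estimateCostUsd(usage: Usage, pricing: Pricing): number {
  const billedOutput = usage.visibleOutputTokens + usage.reasoningTokens;
  // Multiply first and divide once at the end to keep floating-point noise down.
  return (
    (usage.inputTokens * pricing.inputPerMillion +
      billedOutput * pricing.outputPerMillion) /
    1e6
  );
}

const pricing: Pricing = { inputPerMillion: 1, outputPerMillion: 4 };
const base = { inputTokens: 1000, visibleOutputTokens: 250 };

const plain = estimateCostUsd({ ...base, reasoningTokens: 0 }, pricing);
const withReasoning = estimateCostUsd({ ...base, reasoningTokens: 4500 }, pricing);

console.log(plain.toFixed(3), withReasoning.toFixed(3)); // 0.002 0.020
```

Under these assumed numbers, 4,500 reasoning tokens the user never sees make the call ten times more expensive, even though the visible answer is identical.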

| Model | Provider | JSON object mode (response_format: json_object) | Strict JSON schema (guaranteed schema conformance) | Streaming JSON (SSE chunks while generating) | Notes |
|---|---|---|---|---|---|
| GPT-4o | OpenAI | | | | |
| GPT-4o mini | OpenAI | | | | |
| GPT-4.1 | OpenAI | | | | |
| GPT-4.1 mini | OpenAI | | | | |
| o1 | OpenAI | | | Partial | Reasoning tokens are hidden but billed. |
| o3-mini | OpenAI | | | | |
| Claude Opus 4.7 | Anthropic | Partial | | | JSON via prefill / tool use; extended thinking is opt-in. |
| Claude Sonnet 4.6 | Anthropic | Partial | | | |
| Claude Haiku 4.5 | Anthropic | Partial | | | |
| Claude 3.5 Sonnet | Anthropic | Partial | | | |
| Gemini 2.0 Flash | Google | | | | Thinking variant (Flash Thinking) supports reasoning tokens. |
| Gemini 1.5 Pro | Google | | | | |
| Gemini 1.5 Flash | Google | | | | |
| Grok-3 | xAI | | | | |
| Grok-2 | xAI | | | | |
| DeepSeek V3 | DeepSeek | | | | |
| DeepSeek R1 | DeepSeek | | | | Reasoning chain is returned in the response (not hidden). |
| Mistral Large 2 | Mistral | | | | |
| Mistral Small 3 | Mistral | | | | |
| Llama 3.3 70B (Groq) | Groq | | Partial | | Behaviour varies by host (Together, Fireworks, Groq, Bedrock). |
| Llama 3.1 8B (Groq) | Groq | | Partial | | Behaviour varies by host (Together, Fireworks, Groq, Bedrock). |

Picking a model for structured output

If you're building agents that call tools and parse JSON in tight loops, GPT-4o, GPT-4.1, and Claude Sonnet 4.6 are the safe defaults: all three support strict schema, parallel tools, and streaming, and they fail in predictable ways. Reach for a reasoning model (o3-mini, Claude Sonnet 4.6 with extended thinking, Grok-3) only when the task actually benefits from deliberation; the hidden reasoning tokens can turn a $0.002 call into a $0.02 call without warning. Avoid Grok-2, DeepSeek, and self-hosted Llama for production JSON pipelines unless you wrap them in your own validator and retry loop, because they will happily return malformed objects on edge cases. Whatever you pick, validate at runtime: providers ship regressions, schema enforcement has bugs, and a single bad response can crash a downstream worker.
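
One way to build the validator-and-retry loop recommended above, sketched under assumptions: callModel stands in for whatever client call you use (hypothetical here), and validate is any function that returns the typed value or null. Malformed JSON and shape mismatches both trigger another attempt, up to a fixed limit, after which an error surfaces instead of a bad object reaching downstream workers.

```typescript
// Hypothetical: any function that asks a model for JSON and returns the raw text.
type CallModel = (attempt: number) => Promise<string>;

// A validator returns the typed value, or null when the shape is wrong.
type Validate<T> = (value: unknown) => T | null;

async function getValidatedJson<T>(
  callModel: CallModel,
  validate: Validate<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(attempt);
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      lastError = `attempt ${attempt}: invalid JSON`;
      continue; // malformed output: ask again
    }
    const result = validate(parsed);
    if (result !== null) return result;
    lastError = `attempt ${attempt}: JSON did not match the expected shape`;
  }
  throw new Error(`Giving up after ${maxAttempts} attempts (${lastError})`);
}

// Demo: a fake model that returns truncated JSON, then a wrong type, then a valid object.
const responses = ['{"total": ', '{"total": "12"}', '{"total": 12}'];
const fakeModel: CallModel = async (attempt) => responses[attempt - 1] ?? "";

const validateTotal: Validate<{ total: number }> = (value) =>
  typeof value === "object" &&
  value !== null &&
  typeof (value as { total?: unknown }).total === "number"
    ? (value as { total: number })
    : null;

getValidatedJson(fakeModel, validateTotal).then((r) => console.log(r.total)); // 12
```

A production version would likely also feed the validation error back into the next prompt and add backoff between attempts; both are left out here to keep the loop readable.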

Source: provider docs, last verified May 24, 2026. The same data is also available as JSON.
