What It Is

Mistral Small 4 dropped March 16, 2026. It's the first model to unify three previously separate products into one: Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding). One model, three modes.

The headline feature is configurable reasoning. Set reasoning_effort="none" for fast chat. Set it to "high" for deep chain-of-thought. Same deployment, adjustable per request.
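
A sketch of what that per-request control could look like client-side; the model identifier and the helper function here are assumptions for illustration, only the reasoning_effort parameter comes from the release itself:

```python
def build_request(prompt, effort="none"):
    """Build a chat-completions JSON body for one request.

    effort maps straight onto reasoning_effort: "none" for fast
    chat, "high" for deep chain-of-thought, chosen per request.
    """
    return {
        "model": "mistral-small-4",  # assumed identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

# Same deployment, two very different requests:
fast = build_request("Summarize this support ticket.")
deep = build_request("Prove this loop invariant holds.", effort="high")
```

The point of the design is that `fast` and `deep` hit the same endpoint; only the one parameter differs.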

Key Insight: This isn't just cost optimization. It's architectural innovation. MoE models with configurable routing have been theorized for years. Mistral shipped it.

Technical Specifications

Spec                    Value
Total Parameters        119B
Active Params/Token     ~6.5B
Architecture            MoE (128 experts, 4 active)
Context Window          256K tokens
Modalities              Text + Image
License                 Apache 2.0 (fully open)
Output Speed            137-177 tokens/sec
Time to First Token     0.97-4.84s
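
The parameter gap in the table (119B total, ~6.5B active) comes from top-k expert routing: a router scores all 128 experts for each token and only the top 4 actually run. A minimal sketch of that gating step, with made-up scores (not Mistral's actual router):

```python
import math

def route_token(router_logits, k=4):
    # Indices of the k highest-scoring experts for this token.
    top = sorted(range(len(router_logits)), key=router_logits.__getitem__)[-k:]
    # Softmax over only the selected experts' scores.
    mx = max(router_logits[i] for i in top)
    w = [math.exp(router_logits[i] - mx) for i in top]
    total = sum(w)
    return top, [x / total for x in w]

# 128 router scores, with 4 experts clearly ahead of the rest:
logits = [0.0] * 128
for i, score in [(3, 2.0), (17, 1.0), (42, 3.0), (99, 0.5)]:
    logits[i] = score

experts, weights = route_token(logits)  # set(experts) == {3, 17, 42, 99}
```

Only those four experts' parameters are touched for this token, which is how a 119B model runs at roughly dense-7B cost per token.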

Hardware Requirements

  • Minimum: 4x NVIDIA H100, 2x H200, or 1x DGX B200 at 16-bit; a single 80GB H100 suffices with 4-bit quantization
  • VRAM: ~60-70GB (4-bit quantized), ~240GB (16-bit)
  • Supported runtimes: vLLM, llama.cpp, SGLang, Transformers
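
Those VRAM figures fall out of simple arithmetic on the 119B parameter count. A back-of-envelope check (weights only; this sanity check is mine, not from the spec sheet, and KV cache plus activations add overhead on top):

```python
BYTES_PER_PARAM = {"fp16": 2.0, "int4": 0.5}

def weight_vram_gb(total_params_billion, fmt):
    # Weights-only footprint in GB; runtime overhead (KV cache,
    # activations) is why the quoted range sits above the raw number.
    return total_params_billion * BYTES_PER_PARAM[fmt]

print(weight_vram_gb(119, "int4"))  # 59.5 -> consistent with the ~60-70GB figure
print(weight_vram_gb(119, "fp16"))  # 238.0 -> consistent with ~240GB
```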

Benchmarks

Benchmark             Mistral Small 4    GPT-OSS 120B    DeepSeek R1
AIME 2025             93%                ~85%            76.0%
LiveCodeBench         64%                63%             77.0%
GPQA Diamond          71.2%              -               81.3%
Intelligence Index    27.8               -               -
AA LCR                0.72               -               -

Efficiency: On AA LCR, Mistral scores 0.72 while emitting about 1.6K characters. Qwen needs 5.8-6.1K characters for comparable performance, roughly 3.6-3.8x more output.
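
That ratio is just the character counts divided out, and because output is billed per token, output spend scales down by roughly the same factor:

```python
mistral_kchars = 1.6          # thousands of characters per response
qwen_kchars = (5.8, 6.1)      # Qwen's range for comparable AA LCR performance

ratios = [q / mistral_kchars for q in qwen_kchars]  # roughly 3.6x and 3.8x
```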

DeepSeek R1 leads on most raw-performance benchmarks, though Mistral takes AIME 2025. But Mistral's input pricing is 9x lower. Different tradeoffs for different workflows.

Pricing

Model                Input ($/1M)        Output ($/1M)
Mistral Small 4      $0.15               $0.60
GPT-5.4 Mini         $0.75 (5x more)     $4.50 (7.5x more)
DeepSeek R1          $1.35 (9x more)     $4.20 (7x more)
Gemini Flash-Lite    $0.075              -

The value proposition: at $0.15/M input, Mistral Small 4 is among the cheapest multimodal reasoning models available. Flash-Lite is cheaper but lacks configurable reasoning.

Community Sentiment

Reddit r/LocalLLaMA (PROS):

  • "Best open-weight small model for combined workloads"
  • "$0.60/1M output is a steal"
  • Apache 2.0 praised for commercial freedom

Reddit r/MistralAI (CONS):

  • "Kind of awful with images" (API testing feedback)
  • "Lost to Chinese/Korean/Saudi models badly"
  • Document OCR: Qwen 85.5 vs Mistral 66 (math OCR weakest)

Hacker News (#47404575):

  • "MoE models keep beating much larger dense ones"
  • "Just enough to fit onto single H100 with 4-bit quant"
  • Mixed views on benchmark trustworthiness

Known Limitations

  1. Image handling: Multiple reports of poor multimodal performance
  2. Spatial reasoning: SVG generation failures in testing
  3. Context limit: 256K vs competitors' 400K-1M+
  4. Math OCR: 66 vs Qwen 85.5 on document math
  5. Benchmark transparency: Selective publishing vs DeepSeek

Real-World Use Cases

Best For:

  • Cost-conscious high-volume deployments
  • Single-model simplicity requirements
  • Open-source/self-hosting needs (Apache 2.0)
  • EU-hosted inference (data sovereignty)
  • Variable-complexity pipelines (configurable reasoning)

Not Best For:

  • Maximum reasoning performance (use DeepSeek R1)
  • Image-intensive workflows (reported issues)
  • Contexts beyond 256K tokens
  • Computer use/autonomous agents

The Bottom Line

Mistral Small 4 isn't trying to beat DeepSeek on raw benchmarks. It's trying to win on value: 5x cheaper input and 7.5x cheaper output than GPT-5.4 Mini, 9x cheaper input than DeepSeek R1, an Apache 2.0 license, and the first configurable reasoning architecture shipped to production.

For enterprise buyers running millions of tokens daily, the math is straightforward. DeepSeek R1 costs $1.35/M input. Mistral Small 4 costs $0.15/M. That's $1.20 saved per million tokens. Scale that across a year.
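
The per-token delta scales linearly with volume. With an assumed 10M input tokens/day (the volume is illustrative; the prices are from the table above):

```python
def annual_input_savings(mtok_per_day, price_per_m_a, price_per_m_b):
    # Yearly savings from moving input traffic from price b to price a,
    # in dollars, given daily volume in millions of tokens.
    return mtok_per_day * (price_per_m_b - price_per_m_a) * 365

# Mistral Small 4 ($0.15/M) vs DeepSeek R1 ($1.35/M) at 10M tokens/day:
print(annual_input_savings(10, 0.15, 1.35))  # about $4,380/year, input side only
```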

The configurable reasoning feature is the real innovation. One model handles both fast chat and deep reasoning. No need to maintain separate deployments. No need to route requests between Magistral and Small 3.2. Same API endpoint, different reasoning_effort parameter.

March 2026 was a blitz for Mistral: 6 products in 15 days. Small 4, Voxtral TTS, Leanstral, Forge, Spaces CLI, and founding membership in the NVIDIA Nemotron Coalition. ARR hit $400M. Valuation $13.8B. The "European OpenAI" label is starting to look less like hype.