What It Is
Mistral Small 4 dropped March 16, 2026. It's the first Mistral release to unify three previously separate products into one: Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding). One model, three modes.
The headline feature is configurable reasoning. Set reasoning_effort="none" for fast chat. Set it to "high" for deep chain-of-thought. Same deployment, adjustable per request.
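To make the per-request knob concrete, here is a minimal sketch of building a chat request with a reasoning-effort field. This assumes an OpenAI-style chat-completions payload; the model id and the exact field name are illustrative assumptions, not confirmed API details.

```python
# Sketch: per-request reasoning control on an OpenAI-style payload.
# "mistral-small-4" and the "reasoning_effort" field name are assumptions.

def build_request(prompt: str, effort: str) -> dict:
    """Build a chat-completion payload with a reasoning-effort knob."""
    if effort not in ("none", "low", "medium", "high"):
        raise ValueError(f"unknown reasoning_effort: {effort}")
    return {
        "model": "mistral-small-4",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "none" = fast chat, "high" = deep CoT
    }

fast = build_request("Summarize this ticket.", "none")
deep = build_request("Prove the bound holds for n > 3.", "high")
```

The point is that both payloads hit the same deployment; only the effort field changes.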
Key Insight: This isn't just cost optimization. It's architectural innovation. MoE models with configurable routing have been theorized for years. Mistral shipped it.
Technical Specifications
| Spec | Value |
|---|---|
| Total Parameters | 119B |
| Active Params/Token | ~6.5B |
| Architecture | MoE (128 experts, 4 active) |
| Context Window | 256K tokens |
| Modalities | Text + Image |
| License | Apache 2.0 (fully open) |
| Output Speed | 137-177 tokens/sec |
| Time to First Token | 0.97-4.84s |
Hardware Requirements
- Minimum: 4x NVIDIA HGX H100, 2x H200, or 1x DGX B200
- VRAM: ~60-70GB (4-bit quantized), ~240GB (16-bit)
- Supported: vLLM, llama.cpp, SGLang, Transformers
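The VRAM figures above follow directly from the parameter count. A back-of-envelope check, counting weights only (KV cache and activations add overhead, which is why the quoted ranges run above these raw numbers):

```python
# Weights-only VRAM estimate: params * bits-per-param / 8 bytes.

def weight_vram_gb(params_b: float, bits_per_param: float) -> float:
    """VRAM for model weights alone, in decimal GB."""
    bytes_total = params_b * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(weight_vram_gb(119, 4))   # 59.5 GB -> consistent with the ~60-70GB 4-bit figure
print(weight_vram_gb(119, 16))  # 238 GB  -> consistent with the ~240GB 16-bit figure
```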
Benchmarks
| Benchmark | Mistral Small 4 | GPT-OSS 120B | DeepSeek R1 |
|---|---|---|---|
| AIME 2025 | 93.0% | ~85% | 76.0% |
| LiveCodeBench | 64.0% | 63.0% | 77.0% |
| GPQA Diamond | 71.2% | - | 81.3% |
| Intelligence Index | 27.8 | - | - |
| AA LCR | 0.72 | - | - |
Efficiency: On AA LCR, Mistral scores 0.72 with 1.6K characters. Qwen needs 5.8-6.1K characters for comparable performance. That's 3.5-4x shorter output.
On raw performance, DeepSeek R1 wins most head-to-head benchmarks. But Mistral costs 9x less on input. Different tradeoffs for different workflows.
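The verbosity gap compounds the price gap: output is billed per token, so answers that are 3.5-4x shorter multiply the savings. A rough sketch, assuming ~4 characters per token for both models (an illustrative assumption, not a measured ratio), and applying the same $0.60/M output rate to both since Qwen's pricing isn't in the table:

```python
# Effective output cost per answer = (chars / chars-per-token) * price.
# CHARS_PER_TOKEN = 4 is an assumption for illustration only.

CHARS_PER_TOKEN = 4.0

def cost_per_answer_usd(chars: float, price_per_m_tokens: float) -> float:
    tokens = chars / CHARS_PER_TOKEN
    return tokens / 1e6 * price_per_m_tokens

mistral = cost_per_answer_usd(1_600, 0.60)    # ~1.6K chars per answer
qwen_like = cost_per_answer_usd(5_800, 0.60)  # 5.8K chars at the same rate
print(mistral, qwen_like)  # the second is ~3.6x the first
```

Even at identical per-token prices, the terser model is several times cheaper per answer.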
Pricing
| Model | Input ($/1M) | Output ($/1M) |
|---|---|---|
| Mistral Small 4 | $0.15 | $0.60 |
| GPT-5.4 Mini | $0.75 (5x more) | $4.50 (7.5x more) |
| DeepSeek R1 | $1.35 (9x more) | $4.20 |
| Gemini Flash-Lite | $0.075 | - |
The value proposition: at $0.15/M input, this is among the cheapest multimodal reasoning models available. Flash-Lite is cheaper per token but lacks configurable reasoning.
Community Sentiment
Reddit r/LocalLLaMA (PROS):
- "Best open-weight small model for combined workloads"
- "$0.60/1M output is a steal"
- Apache 2.0 praised for commercial freedom
Reddit r/MistralAI (CONS):
- "Kind of awful with images" (API testing feedback)
- "Lost to Chinese/Korean/Saudi models badly"
- Document OCR: Qwen 85.5 vs Mistral 66 (math OCR weakest)
Hacker News (#47404575):
- "MoE models keep beating much larger dense ones"
- "Just enough to fit onto single H100 with 4-bit quant"
- Mixed views on benchmark trustworthiness
Known Limitations
- Image handling: Multiple reports of poor multimodal performance
- Spatial reasoning: SVG generation failures in testing
- Context limit: 256K vs competitors' 400K-1M+
- Math OCR: 66 vs Qwen 85.5 on document math
- Benchmark transparency: selective result publishing compared to DeepSeek
Real-World Use Cases
Best For:
- Cost-conscious high-volume deployments
- Single-model simplicity requirements
- Open-source/self-hosting needs (Apache 2.0)
- EU-hosted inference (data sovereignty)
- Variable-complexity pipelines (configurable reasoning)
Not Best For:
- Maximum reasoning performance (use DeepSeek R1)
- Image-intensive workflows (reported issues)
- Contexts beyond 256K tokens
- Computer use/autonomous agents
The Bottom Line
Mistral Small 4 isn't trying to beat DeepSeek on raw benchmarks. It's trying to win on value: 9x cheaper input and 7x cheaper output than DeepSeek R1 (5x and 7.5x cheaper than GPT-5.4 Mini), an Apache 2.0 license, and the first configurable reasoning architecture shipped to production.
For enterprise buyers running millions of tokens daily, the math is straightforward. DeepSeek R1 costs $1.35/M input. Mistral Small 4 costs $0.15/M. That's $1.20 saved per million tokens. Scale that across a year.
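The savings arithmetic above, worked out under an assumed volume (the 10M input tokens/day figure is an assumption for illustration):

```python
# Annual input-token savings at an assumed 10M input tokens/day.

DEEPSEEK_INPUT = 1.35   # $/1M input tokens
MISTRAL_INPUT = 0.15    # $/1M input tokens
DAILY_TOKENS_M = 10     # assumed volume: 10M input tokens per day

saving_per_m = DEEPSEEK_INPUT - MISTRAL_INPUT   # $1.20 per million tokens
annual = saving_per_m * DAILY_TOKENS_M * 365    # ~$4,380/year on input alone
print(annual)
```

Output pricing widens the gap further, since answers are billed per output token too.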
The configurable reasoning feature is the real innovation. One model handles both fast chat and deep reasoning. No need to maintain separate deployments. No need to route requests between Magistral and Small 3.2. Same API endpoint, different reasoning_effort parameter.
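In practice, the routing that used to span two deployments collapses into a per-request decision. A minimal sketch of effort routing in a mixed pipeline; the keyword heuristic is purely illustrative (a real system would use a classifier or explicit task metadata):

```python
# Sketch: pick reasoning_effort per request instead of routing between
# two separate model deployments. The keyword list is a toy heuristic.

def pick_effort(task: str) -> str:
    """Map a task description to a reasoning_effort value."""
    hard_markers = ("prove", "debug", "multi-step", "plan")
    if any(marker in task.lower() for marker in hard_markers):
        return "high"   # deep chain-of-thought path
    return "none"       # fast-chat path for everything else

print(pick_effort("Summarize this email"))        # none
print(pick_effort("Debug this race condition"))   # high
```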
March 2026 was a blitz for Mistral: 6 products in 15 days. Small 4, Voxtral TTS, Leanstral, Forge, Spaces CLI, and NVIDIA Nemotron Coalition founding membership. ARR hit $400M. Valuation $13.8B. The European OpenAI label is starting to look less like hype.