Google DeepMind just dropped Gemma 4, and for the first time you can run a frontier-level multimodal model on your phone. Apache 2.0 licensed.
Released April 2, 2026, Gemma 4 comes in four variants: two Edge models for mobile (E2B, E4B) and two Workstation models for servers (26B MoE, 31B Dense).
## The technical breakthrough
Gemma 4 introduces Per-Layer Embeddings (PLE) on the Edge models, adding representational depth without increasing active compute: the E2B runs with 2.3B active parameters while maintaining 5.1B total capacity.
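The idea behind PLE can be sketched in a few lines: each transformer layer gets its own small per-token embedding table that is folded into the hidden state, and those tables can live off-accelerator (e.g. in CPU RAM), so they add capacity without counting toward active parameters. The shapes and the projection below are illustrative, not Gemma 4's actual architecture.

```python
# Illustrative sketch of per-layer embeddings (hypothetical shapes,
# not Gemma 4's real code or dimensions).
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_MODEL, D_PLE, N_LAYERS = 1000, 64, 16, 4

# One small embedding table *per layer*. These can be streamed from
# CPU RAM, so they add total capacity without adding active compute.
ple_tables = [rng.normal(size=(VOCAB, D_PLE)) for _ in range(N_LAYERS)]
ple_proj = [rng.normal(size=(D_PLE, D_MODEL)) * 0.02 for _ in range(N_LAYERS)]

def layer(h, layer_idx, token_ids):
    ple = ple_tables[layer_idx][token_ids]  # (seq, D_PLE) per-token lookup
    h = h + ple @ ple_proj[layer_idx]       # fold into the hidden state
    return h                                # (a real block adds attention/MLP here)

token_ids = np.array([1, 5, 9])
h = rng.normal(size=(len(token_ids), D_MODEL))
for i in range(N_LAYERS):
    h = layer(h, i, token_ids)
print(h.shape)  # (3, 64)
```

The key property: the per-layer tables are gathered one row per token, so the memory traffic per step is tiny even though the tables themselves hold billions of extra parameters.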
Key specs:
- E2B/E4B: 128K context, text + image + audio (up to 30s)
- 26B MoE: 3.8B active params, 256K context, text + image + video (60s)
- 31B Dense: Full 31B active, same modalities
All models use Hybrid Attention (alternating local and global layers) and a Dual RoPE Regime for efficient long-context handling, plus a built-in thinking mode for chain-of-thought reasoning.
## Benchmarks that matter
| Benchmark | Gemma 4 31B | Gemma 3 27B |
|---|---|---|
| MMLU Pro | 85.2% | 67.6% |
| AIME 2026 | 89.2% | 20.8% |
The 31B model ranks #3 on the Arena AI Leaderboard among open models.
## Why it matters
140+ languages. Native function calling. Structured JSON output. System role support. Video understanding on Workstation models.
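Native function calling plus structured JSON output is what makes the model usable as an agent backend: the model emits a JSON tool call, and your code validates and dispatches it. The payload shape and tool registry below are hypothetical (Gemma 4's actual format may differ); only the dispatch pattern is the point.

```python
import json

# Hypothetical tool registry; get_weather is a stand-in, not a real API.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    # Parse the model's structured JSON output and route it to a tool.
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]           # KeyError here = unknown tool
    return fn(**call["arguments"])     # invoke with model-provided args

reply = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(reply)  # Sunny in Paris
```

Because the output is constrained to JSON, the failure modes shrink to two checkable ones (bad JSON, unknown tool), which is the practical win of structured output over free-text parsing.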
But the real story is deployment: E2B runs on phones. E4B runs on laptops. Apache 2.0 means no commercial restrictions.
Google just made frontier multimodal AI accessible to everyone, not just those with GPU clusters.