Google DeepMind just dropped Gemma 4—and for the first time, you can run a frontier-level multimodal model on your phone. Apache 2.0 licensed.

Released April 2, 2026, Gemma 4 comes in four variants: two Edge models for mobile (E2B, E4B) and two Workstation models for servers (26B MoE, 31B Dense).

The technical breakthrough

Gemma 4 introduces Per-Layer Embeddings (PLE) on the Edge models, adding representational depth without increasing active compute: the E2B runs with 2.3B active parameters while holding 5.1B total capacity.
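
Mechanically, you can picture PLE as a small per-layer embedding table whose lookup is added into the hidden state at each transformer layer: table lookups are nearly free, so total parameter count grows while per-token compute stays flat. Below is a minimal PyTorch sketch of that idea; all names and dimensions are illustrative assumptions, not Gemma 4's actual implementation.

    import torch
    import torch.nn as nn

    class PerLayerEmbedding(nn.Module):
        """Illustrative per-layer embedding (not Gemma 4's real code).

        Each layer owns a small embedding table indexed by token id.
        The tables add *stored* parameters, but a lookup plus a tiny
        projection is cheap, so *active* compute barely changes.
        """
        def __init__(self, vocab_size: int, ple_dim: int, hidden_dim: int):
            super().__init__()
            self.table = nn.Embedding(vocab_size, ple_dim)           # stored capacity
            self.proj = nn.Linear(ple_dim, hidden_dim, bias=False)   # tiny projection

        def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
            # Cheap additive signal folded into the layer's hidden state.
            return hidden + self.proj(self.table(token_ids))

Give every transformer layer one of these tables and the model holds far more parameters than it activates per token, which is one plausible way to square "5.1B total" with "2.3B active".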

Key specs:

  • E2B/E4B: 128K context, text + image + audio (up to 30s)
  • 26B MoE: 3.8B active params, 256K context, text + image + video (60s)
  • 31B Dense: all 31B parameters active, same modalities as the 26B MoE
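
If Gemma 4 ships with a transformers integration like recent Gemma releases, running the Edge model on an image-plus-text prompt could look roughly like the sketch below. The model ID is hypothetical, and the task and message format assume the existing image-text-to-text pipeline API; check the actual release for the real identifiers.

    from transformers import pipeline

    # Hypothetical model ID -- verify against the actual release.
    pipe = pipeline("image-text-to-text", model="google/gemma-4-e2b-it")

    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }]
    out = pipe(text=messages, max_new_tokens=64)
    print(out[0]["generated_text"][-1]["content"])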

All models feature Hybrid Attention (alternating local and global layers) and a Dual RoPE Regime for efficient context handling, plus a built-in thinking mode for chain-of-thought reasoning.
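
In an alternating layout, most layers attend over a short sliding window while periodic layers see the full context, and each kind can use its own RoPE base frequency. The sketch below shows one way to express such a layer plan; the 5:1 ratio and base frequencies are assumptions borrowed from earlier Gemma releases, not confirmed Gemma 4 values.

    # Hypothetical hybrid-attention layer plan. All constants are assumed.
    LOCAL_PER_GLOBAL = 5          # assumed local:global interleave ratio
    ROPE_BASE_LOCAL = 10_000      # short-range positional base (assumed)
    ROPE_BASE_GLOBAL = 1_000_000  # long-range base for global layers (assumed)

    def layer_plan(num_layers: int) -> list[tuple[str, int]]:
        """Return (attention_kind, rope_base) for each layer index."""
        plan = []
        for i in range(num_layers):
            if (i + 1) % (LOCAL_PER_GLOBAL + 1) == 0:
                plan.append(("global", ROPE_BASE_GLOBAL))  # full-context attention
            else:
                plan.append(("local", ROPE_BASE_LOCAL))    # sliding-window attention
        return plan

    print(layer_plan(12))  # 10 local layers, 2 global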

Benchmarks that matter

Benchmark    Gemma 4 31B    Gemma 3 27B
MMLU Pro     85.2%          67.6%
AIME 2026    89.2%          20.8%

The 31B model ranks #3 among open models on the Arena AI Leaderboard.

Why it matters

140+ languages. Native function calling. Structured JSON output. System role support. Video understanding on Workstation models.
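
System-role support plus structured output means you can pin down behavior and response shape directly in the chat template. Here is a minimal sketch using the standard transformers chat API, assuming an instruction-tuned checkpoint; the model ID is hypothetical.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "google/gemma-4-e2b-it"  # hypothetical -- check the real release
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [
        {"role": "system",
         "content": 'Reply only with JSON of the form {"city": ..., "country": ...}.'},
        {"role": "user", "content": "Where is the Eiffel Tower?"},
    ]
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=48)
    print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))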

But the real story is deployment: E2B runs on phones. E4B runs on laptops. Apache 2.0 means no commercial restrictions.

Google just made frontier multimodal AI accessible to everyone—not just those with GPU clusters.