Google DeepMind just dropped Gemma 4, and for the first time you can run a frontier-level multimodal model on your phone. Apache 2.0 licensed.
Released April 2, 2026, Gemma 4 comes in four variants: two Edge models for mobile (E2B, E4B) and two Workstation models for servers (26B MoE, 31B Dense).
## The technical breakthrough
Gemma 4 introduces Per-Layer Embeddings (PLE) on the Edge models, adding representational depth without increasing active compute: the E2B runs with 2.3B active parameters while maintaining 5.1B total capacity.
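The idea behind PLE can be sketched in a few lines: each transformer layer gets its own small per-token embedding table that is folded into the hidden state, and those tables can live off-accelerator (e.g. in CPU RAM), so they add capacity without counting toward active parameters. The shapes and the projection below are illustrative, not Gemma 4's actual architecture.

```python
# Illustrative sketch of per-layer embeddings (hypothetical shapes,
# not Gemma 4's real code or dimensions).
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_MODEL, D_PLE, N_LAYERS = 1000, 64, 16, 4

# One small embedding table *per layer*. These can be streamed from
# CPU RAM, so they add total capacity without adding active compute.
ple_tables = [rng.normal(size=(VOCAB, D_PLE)) for _ in range(N_LAYERS)]
ple_proj = [rng.normal(size=(D_PLE, D_MODEL)) * 0.02 for _ in range(N_LAYERS)]

def layer(h, layer_idx, token_ids):
    ple = ple_tables[layer_idx][token_ids]  # (seq, D_PLE) per-token lookup
    h = h + ple @ ple_proj[layer_idx]       # fold into the hidden state
    return h                                # (a real block adds attention/MLP here)

token_ids = np.array([1, 5, 9])
h = rng.normal(size=(len(token_ids), D_MODEL))
for i in range(N_LAYERS):
    h = layer(h, i, token_ids)
print(h.shape)  # (3, 64)
```

The key property: the per-layer tables are gathered one row per token, so the memory traffic per step is tiny even though the tables themselves hold billions of extra parameters.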
Key specs:
- E2B/E4B: 128K context, text + image + audio (up to 30s)
- 26B MoE: 3.8B active params, 256K context, text + image + video (60s)
- 31B Dense: Full 31B active, same modalities
All models use Hybrid Attention (alternating local and global layers) and a Dual RoPE Regime for efficient long-context handling, plus a built-in thinking mode for chain-of-thought reasoning.
## Benchmarks that matter
| Benchmark | Gemma 4 31B | Gemma 3 27B |
|---|---|---|
| MMLU Pro | 85.2% | 67.6% |
| AIME 2026 | 89.2% | 20.8% |
The 31B model ranks #3 on the Arena AI Leaderboard among open models.
## Why it matters
140+ languages. Native function calling. Structured JSON output. System role support. Video understanding on Workstation models.
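Native function calling plus structured JSON output is what makes the model usable as an agent backend: the model emits a JSON tool call, and your code validates and dispatches it. The payload shape and tool registry below are hypothetical (Gemma 4's actual format may differ); only the dispatch pattern is the point.

```python
import json

# Hypothetical tool registry; get_weather is a stand-in, not a real API.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    # Parse the model's structured JSON output and route it to a tool.
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]           # KeyError here = unknown tool
    return fn(**call["arguments"])     # invoke with model-provided args

reply = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(reply)  # Sunny in Paris
```

Because the output is constrained to JSON, the failure modes shrink to two checkable ones (bad JSON, unknown tool), which is the practical win of structured output over free-text parsing.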
But the real story is deployment: E2B runs on phones. E4B runs on laptops. Apache 2.0 means no commercial restrictions.
Google just made frontier multimodal AI accessible to everyone, not just those with GPU clusters.