xAI dropped Grok 4.3 beta on April 17 with almost no announcement—no press release, no blog post, just a new option in the model selector. The silence doesn't match what's under the hood.

Architecture & Specs

Grok 4.3 runs on a confirmed 0.5-trillion-parameter checkpoint with a 2-million-token context window. A 1T checkpoint was reportedly "five days away" as of launch. The model uses a 16-agent "Heavy" architecture in which a leader agent coordinates sub-agents for parallel research and cross-checking; reasoning is always active, with no toggling between thinking and output modes.
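
To make the leader-and-sub-agents idea concrete, here is a minimal sketch of what a Heavy-style fan-out could look like from the caller's side, assuming an OpenAI-compatible chat endpoint at api.x.ai and a hypothetical grok-4.3 model id. xAI hasn't documented the Heavy orchestration, so the decompose/parallelize/cross-check split below is illustrative rather than the actual implementation.

```python
# Illustrative only: the Heavy tier's real orchestration is server-side and
# undocumented. Assumes an OpenAI-compatible endpoint and a hypothetical
# "grok-4.3" model id.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_API_KEY")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="grok-4.3",  # hypothetical model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def heavy_style_answer(question: str, n_workers: int = 4) -> str:
    # Leader decomposes the question into independent sub-tasks...
    lines = ask(
        f"Split this question into {n_workers} independent research sub-tasks, "
        f"one per line:\n{question}"
    ).splitlines()
    subtasks = [s for s in lines if s.strip()][:n_workers]

    # ...sub-agents research them in parallel...
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        findings = list(pool.map(ask, subtasks))

    # ...and the leader cross-checks and merges the findings.
    return ask(
        "Cross-check these findings for contradictions and merge them into one answer:\n"
        + "\n---\n".join(findings)
        + f"\n\nQuestion: {question}"
    )
```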

Spec | Value
Parameters | ~0.5T (confirmed); 1T in training
Context window | 2M tokens (standard); 2M (Heavy tier)
Architecture | 16-agent Heavy mode
Speed | 209 tok/s output
Input modalities | Text, image, video
Output modalities | Text, PDF, PPTX, XLSX
Release | April 17, 2026 (beta)

Benchmarks

Grok 4.3 scores 53 on the Artificial Analysis Intelligence Index v4.0—a composite of 10 evals. That puts it ahead of Muse Spark and Claude Sonnet 4.6, 4 points ahead of Grok 4.20, but trailing GPT-5.5 (xhigh) at 60.

Benchmark | Grok 4.3 | Grok 4.20 | Delta
GDPval-AA (Elo) | 1500 | 1179 | +321
τ²-Bench Telecom | 98% | ~92% | +6 pts
IFBench | 81% | 81% | 0
AA-Omniscience | +8 pts | baseline | +8
Intelligence Index | 53 | 49 | +4

That GDPval-AA jump is the story. A gain of 321 Elo points means the multi-agent architecture isn't just marketing; it's translating into measurable gains on agentic tasks. Grok 4.3 now surpasses Gemini 3.1 Pro Preview, Muse Spark, and GPT-5.4 mini on that specific eval.
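
For a sense of what a 321-point gap implies, the standard Elo expected-score formula converts a rating difference into a pairwise win expectation. The snippet below is just that textbook formula applied to the two reported ratings; whether Artificial Analysis computes GDPval-AA exactly this way is an assumption.

```python
# Standard Elo expected score: chance the higher-rated model is preferred
# in a single pairwise comparison, given the reported GDPval-AA ratings.
def elo_expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

print(round(elo_expected(1500, 1179), 3))  # 0.864: preferred in ~86% of head-to-heads
```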

Pricing

Tier | Input (per 1M tok) | Output (per 1M tok)
Standard | $1.25 | $2.50
>200K context surcharge | $2.50 | $5.00

At $1.25/$2.50, Grok 4.3 costs roughly a quarter of Claude Opus 4.7's input price and half of GPT-5.5. The caveat: costs double if you exceed 200K tokens in a single request. The model is also notably verbose—88M tokens generated during Intelligence Index testing versus 35M average—which eats into the per-token savings.
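
As a back-of-the-envelope check on what those rates mean per request, the sketch below prices a single call under one explicit assumption: once input exceeds 200K tokens, the doubled rates apply to the whole request. That threshold behavior is inferred from the caveat above, not from published billing rules.

```python
# Rough cost estimate for one request, using the rates quoted above.
# Assumption: the surcharge rates ($2.50 in / $5.00 out per 1M tokens)
# apply to the entire request once input exceeds 200K tokens.
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > 200_000:
        in_rate, out_rate = 2.50, 5.00   # surcharge tier
    else:
        in_rate, out_rate = 1.25, 2.50   # standard tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(estimate_cost(150_000, 4_000))   # 0.1975 -> ~$0.20
print(estimate_cost(400_000, 4_000))   # 1.02   -> surcharge kicks in
```

Given the verbosity noted above, the output side of that sum can erode more of the savings than the headline input rate suggests.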

What the Community Says

"Grok seems in general better at being 'human' in ways that are hard to define... while ChatGPT would write a dissertation on the message that still doesn't clear anything up."

Hacker News spent 470+ comments dissecting Grok 4.3. The consensus splits cleanly:

What works:

  • Tone and register matching. ESL users report Grok captures formality levels better than any competitor.
  • Speed at 209 tok/s genuinely surprised people given the 0.5T parameter count.
  • Voice dictation accuracy—one user reported 98% accuracy with their accent versus 90-95% for ChatGPT.

What doesn't:

  • No persistent memory across sessions. At $300/month for SuperGrok Heavy, this is a glaring gap.
  • No MCP support, no connected apps, no exportable artifacts.
  • API access is locked behind the enterprise product—consumer SuperGrok doesn't include it.

"ChatGPT has like 90-95% accuracy with my accent, the speech input on Android's Gboard something like 75%, Grok surprisingly gets something like 98% of my words correct."

The Elephant in the Room

Grok 4.3 arrived the day after Claude Opus 4.7. The comparison is unavoidable.

Area | Grok 4.3 | Claude Opus 4.7
SWE-bench | ~79% | 87.6%
Intelligence Index | 53 | ~58 (est.)
Context window | 2M tokens | 200K tokens
Speed | 209 tok/s | ~80 tok/s
Input cost | $1.25/M | $5.00/M

Grok 4.3 wins on context, speed, and price. Claude wins on coding and raw intelligence. The gap is closing but hasn't closed.

"Grok 4.3 is not a leap of that magnitude. It's an incremental update: more cost-efficient, better at agentic tasks, and marginally stronger on the composite index."

What's Missing

  • No model card or benchmark report. xAI hasn't published official numbers.
  • The 1T checkpoint is speculative. Musk confirmed it's training but didn't ship it.
  • Video understanding is beta quality. It works for timestamped queries but isn't production-ready for complex analysis.
  • All 11 original co-founders have left. The pipeline functions, but institutional memory is gone.

The verdict: Grok 4.3 isn't winning every benchmark, but it's competitive at a significantly lower price point. If your workload is agentic or needs wide context, it's worth testing. If you need top-tier coding or a complete ecosystem, Claude and GPT still hold the edge.

Sources

https://awesomeagents.ai/models/grok-4-3/
https://artificialanalysis.ai/models/grok-4-3
https://news.ycombinator.com/item?id=47972447
https://officechai.com/ai/grok-4-3-benchmarks/
https://docs.x.ai/developers/models
https://docsbot.ai/models/grok-4-3