xAI dropped Grok 4.3 beta on April 17 with almost no announcement—no press release, no blog post, just a new option in the model selector. The silence doesn't match what's under the hood.
## Architecture & Specs
Grok 4.3 runs on a confirmed 0.5 trillion parameter checkpoint with a 2-million-token context window. A 1T checkpoint was reportedly "five days away" as of launch. The model uses a 16-agent "Heavy" architecture where a leader agent coordinates sub-agents for parallel research and cross-checking—reasoning is always active, no toggling between thinking and output modes.
| Spec | Value |
|---|---|
| Parameters | ~0.5T (confirmed); 1T in training |
| Context window | 2M tokens (standard and Heavy tiers) |
| Architecture | 16-agent Heavy mode |
| Speed | 209 tok/s output |
| Input modalities | Text, Image, Video |
| Output modalities | Text, PDF, PPTX, XLSX |
| Release | April 17, 2026 (beta) |
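xAI hasn't published Heavy mode's internals, so the leader/sub-agent pattern described above can only be sketched from the outside. Here is a minimal illustration of the shape, with the agent count taken from the reported spec and everything else (the prompts, the reconcile step, the function names) assumed rather than documented:

```python
# Minimal sketch of a leader/sub-agent "Heavy" pattern as described above.
# xAI has not published Grok 4.3's internals; everything below except the
# 16-agent count is an illustrative assumption, not the real design.
import asyncio

N_SUBAGENTS = 16  # reported Heavy-mode agent count

async def sub_agent(agent_id: int, task: str) -> str:
    """Stand-in for one sub-agent researching the task independently."""
    await asyncio.sleep(0)  # placeholder for a real model or tool call
    return f"agent-{agent_id} findings on: {task}"

async def leader(task: str) -> str:
    """Leader fans the task out in parallel, then reconciles the drafts."""
    drafts = await asyncio.gather(
        *(sub_agent(i, task) for i in range(N_SUBAGENTS))
    )
    # In a real system the reconcile step would be another model pass that
    # cross-checks the drafts; here they are simply concatenated.
    return "\n".join(drafts)

if __name__ == "__main__":
    print(asyncio.run(leader("compare benchmark deltas")))
```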
## Benchmarks
Grok 4.3 scores 53 on the Artificial Analysis Intelligence Index v4.0, a composite of 10 evals. That puts it ahead of Muse Spark and Claude Sonnet 4.6 and 4 points ahead of Grok 4.20, but still behind GPT-5.5 (xhigh) at 60.
| Benchmark | Grok 4.3 | Grok 4.20 | Delta |
|---|---|---|---|
| GDPval-AA (ELO) | 1500 | 1179 | +321 |
| τ²-Bench Telecom | 98% | ~92% | +6% |
| IFBench | 81% | 81% | 0 |
| AA-Omniscience | +8 pts | baseline | +8 |
| Intelligence Index | 53 | 49 | +4 |
That GDPval-AA jump is the story. +321 ELO points means the multi-agent architecture isn't just marketing—it's translating into measurable gains on agentic tasks. Grok 4.3 now surpasses Gemini 3.1 Pro Preview, Muse Spark, and GPT-5.4 mini on that specific eval.
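For scale: if GDPval-AA uses the standard Elo scale (an assumption; the cited pages don't spell out the rating formula), a 321-point gap implies Grok 4.3 would be expected to beat Grok 4.20 head-to-head roughly 86% of the time.

```python
# Expected head-to-head win rate implied by an Elo gap. Assumes GDPval-AA
# uses the standard 400-point logistic Elo curve, which the cited sources
# do not confirm; treat the number as a rough intuition pump.
def elo_win_prob(delta: float) -> float:
    return 1.0 / (1.0 + 10 ** (-delta / 400))

print(f"{elo_win_prob(1500 - 1179):.1%}")  # ~86.4%
```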
## Pricing
| Tier | Input (per 1M tok) | Output (per 1M tok) |
|---|---|---|
| Standard | $1.25 | $2.50 |
| >200K-token requests | $2.50 | $5.00 |
At $1.25/$2.50, Grok 4.3 costs roughly a quarter of Claude Opus 4.7's input price and half of GPT-5.5. The caveat: costs double if you exceed 200K tokens in a single request. The model is also notably verbose—88M tokens generated during Intelligence Index testing versus 35M average—which eats into the per-token savings.
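A quick back-of-envelope sketch makes both effects concrete. The rates and token counts come from the figures above; the assumption that the 200K threshold keys off total tokens per request is mine, so check xAI's pricing docs before relying on it.

```python
# Back-of-envelope cost sketch using only the prices and token counts quoted
# above. Assumption: the 200K threshold applies to total tokens in a request;
# check xAI's pricing docs for the exact trigger.
def grok_cost(input_tok: int, output_tok: int) -> float:
    """Estimated request cost in USD, with rates doubling past 200K tokens."""
    doubled = (input_tok + output_tok) > 200_000
    rate_in, rate_out = (2.50, 5.00) if doubled else (1.25, 2.50)
    return input_tok / 1e6 * rate_in + output_tok / 1e6 * rate_out

print(f"${grok_cost(150_000, 4_000):.2f}")   # standard-tier request
print(f"${grok_cost(400_000, 20_000):.2f}")  # doubled rates past the threshold

# Verbosity effect: ~2.5x the average token output (88M vs 35M) pushes the
# effective output price from $2.50 to roughly $6.29 per million
# "average-model" tokens.
print(f"${2.50 * 88 / 35:.2f}/M effective")
```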
## What the Community Says
"Grok seems in general better at being 'human' in ways that are hard to define... while ChatGPT would write a dissertation on the message that still doesn't clear anything up."
Hacker News spent 470+ comments dissecting Grok 4.3. The consensus splits cleanly:
**What works:**
- Tone and register matching. ESL users report Grok captures formality levels better than any competitor.
- Speed at 209 tok/s genuinely surprised people given the 0.5T parameter count.
- Voice dictation accuracy—one user reported 98% accuracy with their accent versus 90-95% for ChatGPT.
**What doesn't:**
- No persistent memory across sessions. At $300/month for SuperGrok Heavy, this is a glaring gap.
- No MCP support, no connected apps, no exportable artifacts.
- API access is locked behind the enterprise product—consumer SuperGrok doesn't include it.
"ChatGPT has like 90-95% accuracy with my accent, the speech input on Android's Gboard something like 75%, Grok surprisingly gets something like 98% of my words correct."
## The Elephant in the Room
Grok 4.3 arrived the day after Claude Opus 4.7. The comparison is unavoidable.
| Area | Grok 4.3 | Claude Opus 4.7 |
|---|---|---|
| SWE-bench | ~79% | 87.6% |
| Intelligence Index | 53 | ~58 (est.) |
| Context window | 2M tokens | 200K tokens |
| Speed | 209 tok/s | ~80 tok/s |
| Input cost | $1.25/M | $5.00/M |
Grok 4.3 wins on context, speed, and price. Claude wins on coding and raw intelligence. The gap is closing but hasn't closed.
"Grok 4.3 is not a leap of that magnitude. It's an incremental update: more cost-efficient, better at agentic tasks, and marginally stronger on the composite index."
## What's Missing
- No model card or benchmark report. xAI hasn't published official numbers.
- No 1T checkpoint yet. Musk confirmed it's training, but nothing has shipped.
- Video understanding is beta quality. It works for timestamped queries but isn't production-ready for complex analysis.
- All 11 original co-founders have left. The pipeline functions, but institutional memory is gone.
The verdict: Grok 4.3 isn't winning every benchmark, but it's competitive at a significantly lower price point. If your workload is agentic or needs wide context, it's worth testing. If you need top-tier coding or a complete ecosystem, Claude and GPT still hold the edge.
## Sources
- https://awesomeagents.ai/models/grok-4-3/
- https://artificialanalysis.ai/models/grok-4-3
- https://news.ycombinator.com/item?id=47972447
- https://officechai.com/ai/grok-4-3-benchmarks/
- https://docs.x.ai/developers/models
- https://docsbot.ai/models/grok-4-3