xAI dropped Grok 4.3 beta on April 17 with almost no announcement—no press release, no blog post, just a new option in the model selector. The silence doesn't match what's under the hood.

Architecture & Specs

Grok 4.3 runs on a confirmed 0.5-trillion-parameter checkpoint with a 2-million-token context window. A 1T checkpoint was reportedly "five days away" as of launch. The model uses a 16-agent "Heavy" architecture in which a leader agent coordinates sub-agents for parallel research and cross-checking; reasoning is always active, with no toggling between thinking and output modes.
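
To make the leader-and-sub-agents idea concrete, here is a minimal sketch of what a Heavy-style fan-out could look like from the caller's side, assuming an OpenAI-compatible chat endpoint at api.x.ai and a hypothetical grok-4.3 model id. xAI hasn't documented the Heavy orchestration, so the decompose/parallelize/cross-check split below is illustrative rather than the actual implementation.

```python
# Illustrative only: the Heavy tier's real orchestration is server-side and
# undocumented. Assumes an OpenAI-compatible endpoint and a hypothetical
# "grok-4.3" model id.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_API_KEY")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="grok-4.3",  # hypothetical model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def heavy_style_answer(question: str, n_workers: int = 4) -> str:
    # Leader decomposes the question into independent sub-tasks...
    lines = ask(
        f"Split this question into {n_workers} independent research sub-tasks, "
        f"one per line:\n{question}"
    ).splitlines()
    subtasks = [s for s in lines if s.strip()][:n_workers]

    # ...sub-agents research them in parallel...
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        findings = list(pool.map(ask, subtasks))

    # ...and the leader cross-checks and merges the findings.
    return ask(
        "Cross-check these findings for contradictions and merge them into one answer:\n"
        + "\n---\n".join(findings)
        + f"\n\nQuestion: {question}"
    )
```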

Spec | Value
Parameters | ~0.5T (confirmed); 1T in training
Context window | 2M tokens (standard); 2M (Heavy tier)
Architecture | 16-agent Heavy mode
Speed | 209 tok/s output
Input modalities | Text, image, video
Output modalities | Text, PDF, PPTX, XLSX
Release | April 17, 2026 (beta)

Benchmarks

Grok 4.3 scores 53 on the Artificial Analysis Intelligence Index v4.0—a composite of 10 evals. That puts it ahead of Muse Spark and Claude Sonnet 4.6, 4 points ahead of Grok 4.20, but trailing GPT-5.5 (xhigh) at 60.

Benchmark | Grok 4.3 | Grok 4.20 | Delta
GDPval-AA (Elo) | 1500 | 1179 | +321
τ²-Bench Telecom | 98% | ~92% | +6 pts
IFBench | 81% | 81% | 0
AA-Omniscience | +8 pts | baseline | +8
Intelligence Index | 53 | 49 | +4

That GDPval-AA jump is the story. A gain of 321 Elo points means the multi-agent architecture isn't just marketing; it's translating into measurable gains on agentic tasks. Grok 4.3 now surpasses Gemini 3.1 Pro Preview, Muse Spark, and GPT-5.4 mini on that specific eval.
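
For a sense of what a 321-point gap implies, the standard Elo expected-score formula converts a rating difference into a pairwise win expectation. The snippet below is just that textbook formula applied to the two reported ratings; whether Artificial Analysis computes GDPval-AA exactly this way is an assumption.

```python
# Standard Elo expected score: chance the higher-rated model is preferred
# in a single pairwise comparison, given the reported GDPval-AA ratings.
def elo_expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

print(round(elo_expected(1500, 1179), 3))  # 0.864: preferred in ~86% of head-to-heads
```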

Pricing

Tier | Input (per 1M tok) | Output (per 1M tok)
Standard | $1.25 | $2.50
>200K context surcharge | $2.50 | $5.00

At $1.25/$2.50, Grok 4.3 costs roughly a quarter of Claude Opus 4.7's input price and half of GPT-5.5. The caveat: costs double if you exceed 200K tokens in a single request. The model is also notably verbose—88M tokens generated during Intelligence Index testing versus 35M average—which eats into the per-token savings.
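
As a back-of-the-envelope check on what those rates mean per request, the sketch below prices a single call under one explicit assumption: once input exceeds 200K tokens, the doubled rates apply to the whole request. That threshold behavior is inferred from the caveat above, not from published billing rules.

```python
# Rough cost estimate for one request, using the rates quoted above.
# Assumption: the surcharge rates ($2.50 in / $5.00 out per 1M tokens)
# apply to the entire request once input exceeds 200K tokens.
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    if input_tokens > 200_000:
        in_rate, out_rate = 2.50, 5.00   # surcharge tier
    else:
        in_rate, out_rate = 1.25, 2.50   # standard tier
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(estimate_cost(150_000, 4_000))   # 0.1975 -> ~$0.20
print(estimate_cost(400_000, 4_000))   # 1.02   -> surcharge kicks in
```

Given the verbosity noted above, the output side of that sum can erode more of the savings than the headline input rate suggests.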

What the Community Says

"Grok seems in general better at being 'human' in ways that are hard to define... while ChatGPT would write a dissertation on the message that still doesn't clear anything up."

Hacker News spent 470+ comments dissecting Grok 4.3. The consensus splits cleanly:

What works:

  • Tone and register matching. ESL users report Grok captures formality levels better than any competitor.
  • Speed at 209 tok/s genuinely surprised people given the 0.5T parameter count.
  • Voice dictation accuracy—one user reported 98% accuracy with their accent versus 90-95% for ChatGPT.

What doesn't:

  • No persistent memory across sessions. At $300/month for SuperGrok Heavy, this is a glaring gap.
  • No MCP support, no connected apps, no exportable artifacts.
  • API access is locked behind the enterprise product—consumer SuperGrok doesn't include it.

"ChatGPT has like 90-95% accuracy with my accent, the speech input on Android's Gboard something like 75%, Grok surprisingly gets something like 98% of my words correct."

The Elephant in the Room

Grok 4.3 arrived the day after Claude Opus 4.7. The comparison is unavoidable.

Area | Grok 4.3 | Claude Opus 4.7
SWE-bench | ~79% | 87.6%
Intelligence Index | 53 | ~58 (est.)
Context window | 2M tokens | 200K tokens
Speed | 209 tok/s | ~80 tok/s
Input cost | $1.25/M | $5.00/M

Grok 4.3 wins on context, speed, and price. Claude wins on coding and raw intelligence. The gap is closing but hasn't closed.

"Grok 4.3 is not a leap of that magnitude. It's an incremental update: more cost-efficient, better at agentic tasks, and marginally stronger on the composite index."

What's Missing

  • No model card or benchmark report. xAI hasn't published official numbers.
  • The 1T checkpoint is speculative. Musk confirmed it's training but didn't ship it.
  • Video understanding is beta quality. It works for timestamped queries but isn't production-ready for complex analysis.
  • All 11 original co-founders have left. The pipeline functions, but institutional memory is gone.

The verdict: Grok 4.3 isn't winning every benchmark, but it's competitive at a significantly lower price point. If your workload is agentic or needs wide context, it's worth testing. If you need top-tier coding or a complete ecosystem, Claude and GPT still hold the edge.

Sources

https://awesomeagents.ai/models/grok-4-3/
https://artificialanalysis.ai/models/grok-4-3
https://news.ycombinator.com/item?id=47972447
https://officechai.com/ai/grok-4-3-benchmarks/
https://docs.x.ai/developers/models
https://docsbot.ai/models/grok-4-3