
Poolside just emerged from stealth with Laguna XS.2 and M.1, two MoE language models built specifically for agentic coding workflows. The headline innovation isn't the architecture itself but the Muon optimizer, which reaches the same loss as AdamW in roughly 15% fewer training steps.
Technical Specs
Laguna XS.2 (33B-A3B) runs 33B total parameters with only 3B activated per token, making it local-ready on consumer hardware with 36GB of RAM. It ships with an Apache 2.0 license, a 128K context window, and an FP8-quantized KV cache. The architecture uses 256 experts plus one shared expert across 40 layers, mixing sliding-window attention (30 layers, 512-token window) with global attention (10 layers).
Laguna M.1 (225B-A23B) is the flagship: 225B total parameters, 23B active per token. Same 128K context, but the weights remain closed (available on request for researchers). Both models were trained on 30T+ tokens, including ~4.4T synthetic tokens (13% of the mix).
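To make the sparsity concrete, here's a quick back-of-the-envelope in Python using the numbers above; `active_fraction` is just illustrative arithmetic, not anything from Poolside's stack.

```python
# Back-of-the-envelope arithmetic for the two Laguna configurations.
# Parameter counts come from the published specs; everything else is
# illustrative only.

def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_b / total_b

models = {
    "Laguna XS.2": (33, 3),    # 33B total, 3B active per token
    "Laguna M.1":  (225, 23),  # 225B total, 23B active per token
}

for name, (total, active) in models.items():
    frac = active_fraction(total, active)
    print(f"{name}: {active}B/{total}B active = {frac:.1%} of weights per token")

# Laguna XS.2: 3B/33B active = 9.1% of weights per token
# Laguna M.1: 23B/225B active = 10.2% of weights per token
```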
The Muon Optimizer
Muon replaces AdamW's two-state bookkeeping (momentum plus second-moment variance) with a single momentum state, applying Newton-Schulz orthogonalization to the momentum-smoothed gradient before each update. The orthogonalization keeps update directions diverse during training, preventing the collapse that commonly occurs in MoE architectures.
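For reference, the open-source Muon implementation orthogonalizes the momentum buffer with a quintic Newton-Schulz iteration. The sketch below follows that public recipe (coefficients from the open-source release); Poolside's internal version presumably differs in its distributed details, and the learning-rate scaling here is simplified.

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D gradient/momentum matrix.

    Quintic Newton-Schulz iteration with the coefficients used in the
    open-source Muon optimizer; converges toward an orthogonal matrix
    sharing G's row/column space.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)            # normalize so the iteration converges
    transposed = G.shape[0] > G.shape[1]
    if transposed:
        X = X.T                          # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One single-state Muon update: accumulate momentum, then orthogonalize.

    Note the single buffer `momentum` -- this is the entire optimizer state,
    versus AdamW's two buffers per parameter.
    """
    momentum.mul_(beta).add_(grad)       # the only optimizer state
    update = newton_schulz(momentum)
    param.add_(update, alpha=-lr)
    return param, momentum
```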
Key differences:
| Aspect | AdamW | Muon |
|---|---|---|
| States per parameter | 2 | 1 |
| Optimizer state memory | Baseline | ~50% lower |
| Update mechanism | Adaptive LR | Gradient orthogonalization |
The compute overhead for orthogonalization stays under 1% of training step time, and the checkpoint sizes drop significantly. Poolside's distributed implementation batches Newton-Schulz operations across ranks with communication-compute overlap and CUDA graphs for efficiency.
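Poolside hasn't published that code, but the pattern they describe (batching the orthogonalization across ranks so communication overlaps compute) can be sketched in a few lines of torch.distributed. This is purely hypothetical, not their implementation: each rank owns a round-robin slice of the 2-D matrices, runs Newton-Schulz locally, and async-broadcasts each result so earlier transfers overlap later compute.

```python
import torch
import torch.distributed as dist

def batched_orthogonalize(momenta: list, ns_fn) -> list:
    """Hypothetical sketch of rank-parallel Newton-Schulz.

    Requires an initialized process group. Each rank orthogonalizes the
    matrices it owns (round-robin by index) and posts an async broadcast
    immediately, so communication for matrix i overlaps compute for i+1.
    Not Poolside's actual implementation.
    """
    rank, world = dist.get_rank(), dist.get_world_size()
    out, handles = [], []
    for i, m in enumerate(momenta):
        if i % world == rank:
            out.append(ns_fn(m))                 # local compute for owned slice
        else:
            out.append(torch.empty_like(m))      # placeholder to receive into
        # All ranks post broadcasts in the same index order, which keeps
        # the collectives matched across the process group.
        handles.append(dist.broadcast(out[i], src=i % world, async_op=True))
    for h in handles:
        h.wait()
    return out
```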
Benchmarks
| Benchmark | Laguna M.1 | Laguna XS.2 |
|---|---|---|
| SWE-bench Verified | 72.5% | 68.2% |
| SWE-bench Pro | 46.9% | 44.5% |
| Terminal-Bench 2.0 | 40.7% | 30.1% |
The honest reporting stands out. Poolside openly acknowledges that Qwen3.6-35B-A3B (73.4% SWE-bench Verified, 51.5% Terminal-Bench 2.0) outperforms Laguna XS.2, and that DeepSeek-V4-Flash (79.0% SWE-bench) leads the category. Terminal-Bench 2.0 shows the widest gap: Laguna XS.2 scores 30.1% against Qwen's 51.5%.
The positioning is clear: this is a Western open-weights alternative to Chinese model dominance, built for agent-first workflows with native ACP spec support.
Agent RL Training
Poolside built a fully asynchronous online RL system for long-horizon coding agents. The architecture decouples actors (running sandboxed tasks) from trainers (consuming trajectories), with GPUDirect RDMA weight transfers moving hundreds of GB in ~5 seconds. The system uses a variant of CISPO for off-policy stability across multi-day training runs.
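That actor/trainer decoupling is a classic producer-consumer shape. Here's a deliberately tiny, hypothetical single-machine sketch of it; the helpers (`run_sandboxed_task`, `update_policy`) are stubs I've made up to show the asynchrony, nothing Poolside-specific.

```python
import queue
import threading
import time

trajectories: queue.Queue = queue.Queue(maxsize=1024)

def run_sandboxed_task(env_id: int) -> dict:
    """Hypothetical stand-in for a sandboxed agent rollout."""
    time.sleep(0.01)                       # pretend to run a coding task
    return {"env": env_id, "reward": 0.0}

def update_policy(batch: list) -> None:
    """Hypothetical stand-in for a CISPO-style off-policy update."""
    pass

def actor(env_id: int) -> None:
    # Actors push trajectories continuously and never block on the trainer.
    while True:
        trajectories.put(run_sandboxed_task(env_id))

def trainer(steps: int = 10) -> None:
    # The trainer consumes whatever trajectories exist, tolerating the
    # off-policy staleness that the CISPO variant is there to handle.
    for _ in range(steps):
        batch = [trajectories.get() for _ in range(32)]
        update_policy(batch)

for i in range(8):
    threading.Thread(target=actor, args=(i,), daemon=True).start()
trainer()
```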
Availability
OpenRouter: both models run on the free tier (`poolside/laguna-xs.2:free`, `poolside/laguna-m.1:free`); see the example below
Ollama: `ollama run laguna-xs.2` for local inference
HuggingFace: `poolside/Laguna-XS.2` with FP8, NVFP4, and INT4 variants
Poolside Platform: free API access at platform.poolside.ai
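Since OpenRouter exposes an OpenAI-compatible endpoint, trying the free tier takes only a few lines; the model slug is the one listed above, and everything else is standard OpenRouter usage.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # create one at openrouter.ai
)

resp = client.chat.completions.create(
    model="poolside/laguna-xs.2:free",  # free-tier slug from the listing above
    messages=[
        {"role": "user", "content": "Write a binary search in Python."},
    ],
)
print(resp.choices[0].message.content)
```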
Community Sentiment
Hacker News reception was mixed. Users praised the fast inference and ACP spec adherence; one commenter noted it works better than Codex or OpenCode in Zed. Others criticized the benchmark position: "not winning any popular benchmark" and "quite a huge lead for Qwen" on Terminal-Bench. The consensus view: it's good to see a Western lab emerge from stealth with competitive models, even if they're not leading the leaderboard.
The real question is whether Muon's training efficiency translates to faster iteration cycles for future releases. Matching AdamW's loss in 15% fewer steps is a genuine contribution; optimizer research has been largely stagnant since AdamW's 2019 introduction.
Sources:
https://poolside.ai/blog/laguna-a-deeper-dive
https://poolside.ai/blog/introducing-laguna-xs2-m1
https://huggingface.co/poolside/Laguna-XS.2
https://news.ycombinator.com/item?id=47936511
https://openrouter.ai/models/poolside/laguna-xs.2
https://ollama.com/library/laguna-xs.2
https://github.com/poolsideai/pool