You burn premium model tokens reading files. Your main agent is stuck doing file I/O when it should be reasoning.
Before: I'd ask Claude to "analyze this project" and watch $0.50 of tokens burn before it even started thinking. Heavy weeks meant hitting Pro limits by Thursday.
After: I send ~50 tokens to a $0.02/call Kimi K2.5 coworker. It reads files, maps structure, returns a compact summary. Claude Opus sees a 2-page brief instead of 50 pages of raw code.
What broke: One time I delegated 8 tasks to Kimi in batch without checking mid-way. 30 minutes later I had 8,500 lines of output. 61% was garbage because Kimi followed my extraction rules too literally—it pulled the wrong fields from malformed CSV headers. The fix: check outputs every 2-3 delegated tasks.
What I learned: Delegation works for mechanical work (file reads, pattern matching, scaffolding). It fails when you batch without feedback. The sweet spot: one task at a time, verify, then move on.
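That sweet spot can be sketched as a loop: delegate one task, verify it, then move on. Everything below is illustrative, not a real client — `run_cheap_model` is a fake stand-in for whatever code actually calls your $0.02 model.

```python
# Illustrative sketch only: run_cheap_model fakes a call to the cheap model.
def run_cheap_model(task: str) -> str:
    # In practice this would hit the cheap model's endpoint; here we just echo.
    return f"summary of {task}"

def delegate_one_at_a_time(tasks, verify):
    """Delegate tasks one by one, verifying each result before moving on."""
    results = []
    for task in tasks:
        out = run_cheap_model(task)
        if not verify(out):
            # Stop immediately instead of letting garbage accumulate.
            raise RuntimeError(f"bad output for {task!r}; fix before continuing")
        results.append(out)
    return results
```

The point of the structure is the early exit: one bad output halts everything, so you never end the day with 8,500 lines to triage.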
The Setup: Kimi K2.5 as Your $0.02/Call Assistant
Moonshot's Kimi K2.5 runs $0.60/M input tokens, $2.50/M output. That's 90% cheaper than Claude Opus and 5x cheaper than Sonnet.
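To see where the "$0.02/call" figure comes from, the arithmetic is simple. The token counts below are assumptions for a typical summarize-the-repo call, not measured numbers:

```python
# Kimi K2.5 list prices from above, in USD per token.
IN_PRICE = 0.60 / 1_000_000
OUT_PRICE = 2.50 / 1_000_000

def call_cost(in_tokens: int, out_tokens: int) -> float:
    """Estimated cost of one call at the list prices above."""
    return in_tokens * IN_PRICE + out_tokens * OUT_PRICE

# Hypothetical call: ~50 prompt tokens in, an ~8,000-token brief out.
print(round(call_cost(50, 8_000), 4))  # ≈ 0.02
```

Output cost dominates: the 50-token prompt is effectively free, and the brief itself is what you pay two cents for.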
Configuration options:
Option A — Environment variables (quickest):
```bash
export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="${YOUR_MOONSHOT_API_KEY}"
export ANTHROPIC_MODEL="kimi-k2.5"
export ANTHROPIC_SMALL_FAST_MODEL="kimi-k2.5"
```
Option B — Persistent settings (~/.claude/settings.json):
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.moonshot.ai/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "YOUR_KEY",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```
What surprised me: The environment variable method worked instantly. No config file edits. Just set and restart Claude Code.
Tasks That Belong on the Cheap Coworker
Good for delegation:
- File reading and exploration — mechanical, no reasoning needed
- Codebase mapping — pattern matching, not architecture decisions
- Dependency analysis — follow import chains, report back
- Test scaffolding — template-based, low creativity
- Documentation summaries — formatting work
Keep on Opus/Sonnet:
- Architecture decisions
- Complex debugging with security implications
- Multi-file refactoring that needs coordination
- Code review where context matters
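One way to enforce that split is a tiny router in whatever orchestration script you use. The task labels and model names here are illustrative placeholders, not a real API:

```python
# Hypothetical task labels; adjust to however you categorize your own work.
MECHANICAL = {"file_read", "codebase_map", "dependency_scan",
              "test_scaffold", "doc_summary"}

def pick_model(task_type: str) -> str:
    """Route mechanical work to the cheap model, reasoning work to premium."""
    return "kimi-k2.5" if task_type in MECHANICAL else "claude-opus"
```

Anything unrecognized falls through to the premium model, which is the safe failure mode: you overspend slightly instead of handing reasoning work to the cheap tier.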
Real cost impact: 100 coding sessions cost me $25.80 with Claude Sonnet. Switching mechanical tasks to Kimi dropped that to $3.00. Same output quality for the reasoning parts.
What Actually Broke (The Honest Part)
The 8,500-line disaster taught me three things:
Batch delegation without checkpoints = garbage accumulation. I asked Kimi to extract data from 8 similar CSVs. It pulled wrong fields from malformed headers. By task 5, the schema drift was severe.
AI follows rules literally, even wrong ones. My extraction regex was slightly off. Kimi applied it perfectly—to the wrong columns.
Single-agent consistency often beats division of labor. For complex extraction, one careful pass by Opus was faster than debugging Kimi's batch outputs.
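The "check every 2-3 tasks" fix can be wired into the batch loop directly. This is a sketch that assumes you have some automated check (a schema validation, a row count) — `run` and `verify` are placeholders you supply:

```python
def batch_with_checkpoints(tasks, run, verify, every=3):
    """Run delegated tasks, stopping at the first failed checkpoint.

    Verifies the most recent outputs after every `every` tasks, so a bad
    extraction rule is caught early instead of after 8,500 lines.
    """
    outputs = []
    for i, task in enumerate(tasks, start=1):
        outputs.append(run(task))
        if i % every == 0 and not all(verify(o) for o in outputs[-every:]):
            raise RuntimeError(f"checkpoint failed after task {i}; halt the batch")
    return outputs
```

With `every=2` on 8 CSV tasks, a wrong-column extraction surfaces at task 2, not task 8.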
So What
The takeaway isn't "use Kimi everywhere." It's: stop burning premium reasoning tokens on mechanical tasks.
Your Opus/Sonnet quota is the expensive resource. Protect it. Send file reads, searches, and scaffolding to cheaper models. Keep the architecture decisions, debugging, and code review where context matters.
Try this: next time Claude starts reading 30 files for a simple refactor, stop it. Send a cheap coworker to summarize. Then give Claude the brief. Watch your token usage.
Sources
- https://www.reddit.com/r/ClaudeAI/comments/1t1o43w/i_gave_claude_code_a_002call_coworker_and_stopped/
- https://platform.moonshot.ai/docs/guide/agent-support
- https://kimi-k25.com/blog/kimi-k2-5-claude-code
- https://dev.to/shimo4228/kimi-wrote-8500-lines