The Three-Model Stack: A Practical Setup

All my systems & weekly discoveries here 👇:

Artemis
Artemis is where I share the AI discoveries, workflows, and systems actually worth using, for creators, founders, and operators who want leverage.
https://www.skool.com/artemis-1201/about

Setup

Three frontier models dropped in one week:

April 16: Claude Opus 4.7 (Anthropic)

April 20: Kimi K2.6 (Moonshot AI, open weights)

April 23: GPT-5.5 (OpenAI)

Most people pick one and move on. That's the wrong move. The actual win is routing each task to the right model. This doc strips out the hype and gives you the practical setup.

My honest take upfront: The "300 parallel agents, 4,000 coordinated steps, replace a team of four" framing is mostly marketing. The real value is more boring and more useful: each model is genuinely best at different things, and switching between them cuts cost without losing quality on most workflows. That's worth setting up. The agentic swarm stuff is real but fragile, treat it as bonus, not core.

The Only Rule That Matters

Task	Model	Why
Bulk coding, scaffolding, overnight runs, batch generation	Kimi K2.6	Cheapest by a lot, open weights, strong on long-horizon coding
Production code, vision, legal/enterprise docs, anything precision-sensitive	Claude Opus 4.7	Leads on SWE-Bench Pro and most real-world code quality tests
Math, deep web research, computer use / GUI navigation	GPT-5.5	Leads on FrontierMath, Terminal-Bench 2.0, OSWorld-Verified

Five seconds of routing per task. That's the whole strategy.

What Each Model Actually Is

Kimi K2.6 (the cheap workhorse)

Released: April 20, 2026, open weights under modified MIT

Pricing: ~$0.60 input / $2.50 output per 1M tokens (roughly 8x cheaper than Opus)

Architecture: 1T total parameters, 32B active per token, 256k context

Strengths: Long-horizon coding, agent swarm coordination, front-end generation from sketches, native video input

Weaknesses: Lags on single-turn high-stakes reasoning, weaker on math (GPQA, AIME) than GPT-5.5, vendor jurisdiction matters for some compliance contexts

Claude Opus 4.7 (the senior engineer)

Released: April 16, 2026

Pricing: $5 input / $25 output per 1M tokens

Strengths: Leads SWE-Bench Pro at 64.3% (~6 pts ahead of Kimi and GPT-5.5), strong vision, best-in-class on legal and enterprise documents, fewer errors than predecessor on source-grounded work

Weaknesses: Expensive, slightly weaker on web research vs GPT-5.5

GPT-5.5 (the researcher and operator)

Released: April 23, 2026

Pricing: $5 input / $30 output per 1M tokens (Pro: $30/$180)

Context: 1M tokens in API

Strengths: Leads on Terminal-Bench 2.0 (82.7% vs Opus 69.4%), OSWorld-Verified for computer use (78.7%), FrontierMath, long-context retrieval

Weaknesses: Per Artificial Analysis, hallucination rate is significantly higher than Opus 4.7 on AA-Omniscience. Loses to Opus on real-world code quality.

How to Actually Set This Up

Three options, ranked by effort.

Option 1: Manual routing (free, works today)

Three questions before every task:

Bulk coding or autonomous work? → Kimi

Production-perfect, vision, or legal? → Opus 4.7

Math, web research, or computer use? → GPT-5.5

That's it. Five seconds per task. Most people stop here and that's fine.

Option 2: Claude Code Router

Run Claude Code's interface but route requests to whichever model fits.

Repo: github.com/musistudio/claude-code-router

Lets you keep the Claude Code agent loop and swap the brain underneath

Routes through OpenRouter so you can hit Kimi, GPT-5.5, or any model with one config

Option 3: Auto-routing service

coderouter.io automatically picks the model per API call. No configuration. Useful if you don't want to think about routing at all and you trust someone else's heuristics.

Repos Worth Bookmarking

Cutting the original list down to what actually matters:

For Kimi K2.6:

github.com/moonshotai/Kimi-K2 — official repo, weights, deployment guides

github.com/chongdashu/cc-kimi-k2-thinking-prompts — use Kimi through Claude Code's CLI by swapping one env var

For Opus 4.7:

github.com/Piebald-AI/claude-code-system-prompts — full Claude Code system prompt + 24 built-in tool descriptions

For routing across all three:

github.com/musistudio/claude-code-router — single interface, three brains

github.com/asgeirtj/system_prompts_leaks — leaked system prompts for all three models, useful for understanding how each company shapes behavior

The other repos in the original article are mostly noise. These five cover 90% of what you'll actually use.

Three System Prompts to Install

One per model. Save somewhere accessible. Paste at session start or set as persistent system prompt.

For Kimi K2.6 (bulk work and agents)

You are a senior engineer focused on implementation speed and correctness.

Your job: build exactly what is asked, nothing more, nothing less.

Rules:
- Read the full context before writing a single line
- Make surgical changes only, touch nothing adjacent to the task
- If you see a better approach, say so before building
- Validate your changes against existing logic before responding
- Every changed line must trace directly to the task
- For long-horizon tasks, state your plan and verify each step before moving to the next

When running as an agent:
- Report progress every 30 steps
- Flag blockers immediately instead of working around them silently
- If a subtask fails, pause and surface it rather than continuing

Success: the change works, nothing else broke, every step is traceable.

For Claude Opus 4.7 (production work)

You are a senior engineer and architect working on production systems where correctness matters more than speed.

Your job: produce work that is right on the first pass, not fast on the first draft.

Rules:
- Identify what is actually being asked, not just what was literally said
- If multiple interpretations exist, name them and ask before proceeding
- Apply the simplest solution that fully solves the problem
- Flag assumptions explicitly before building on them
- If the approach is wrong, say so before building it
- Touch only what the task requires, no drive-by improvements
- Verify your output against the existing logic before responding

For documents and legal content:
- Flag any claim that requires a specific source to be trustworthy
- Distinguish clearly between what is established and what is interpretation
- Never soften or hedge a clear finding to avoid discomfort

Success: the output could go directly to production or publication without revision.

For GPT-5.5 (research and computer use)

You are a senior research analyst and systems operator.

Your job: find the right answer fast and act on it without hand-holding.

Rules:
- Lead with the answer or finding, support it after
- Use specific numbers and named sources, not generalities
- Distinguish clearly between established fact, contested claim, and your interpretation
- Flag uncertainty explicitly, never bury gaps in vague language
- Do not include information that does not serve the question
- When operating tools or interfaces, state your action before taking it and report the result after

For web research:
- Prioritize primary sources over aggregators
- If sources conflict, name the conflict and explain which you trust more
- Do not present a single source as settled consensus

Success: I can explain the key finding to someone else after reading it and act on it without searching for anything else.

What to Actually Do This Week

Don't overthink the setup. Pick one workflow you do often, route it, see if quality holds.

Day 1: Sign up for Kimi API at platform.moonshot.ai. Run one task you'd normally give Opus through Kimi instead. Compare output. (Most likely outcome: 80% as good for 12% of the cost on that task.)

Day 2: Add the three system prompts above to your tool of choice (Cursor, Claude Code, ChatGPT custom instructions, whatever).

Day 3: Pick one bulk task you've been putting off (research, scaffolding, batch generation). Run it through Kimi overnight. Review in the morning.

Day 4-7: Use the routing table. Notice when you reach for the wrong model out of habit.

That's it. No "your workflow is permanently different" promises. Just: you'll spend less, and you'll have the right tool for each kind of work.