The Scale Brief · Issue #147

Your agent is paying frontier-model rates to do intent classification.

Pull your agent's last week of LLM-API spend and bucket it by turn-type. I'll wait.

Most of your agent's calls don't need a frontier model. They need a router. The user says "what's my balance" — that's a 50-token decision: which tool to invoke + which arguments to pass. You're paying gpt-4-class rates to make a decision a Haiku-class model would get right almost every time.

The pattern that fixes this names itself: cheap planner, expensive executor.

The shape

Two LLM calls per turn instead of one:

  1. Planner: small fast model (claude-haiku, gpt-4.1-nano, gemini-flash). Input is the user's message + a compressed system prompt that lists available intents/tools. Output is a structured JSON object: { intent, tool, args, needs_reasoning }. ~200-500 tokens out.
  2. Executor: depends on needs_reasoning. If false, run the deterministic tool directly + format the response with a cheap-model pass. If true, hand off to the frontier model with only the relevant slice of context.
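
To make the planner's contract concrete, here is a minimal sketch of checking its output before the executor trusts it. The intent and tool names come from the planner prompt in this issue. The `validatePlan` helper and its fail-safe fallback are illustrative assumptions, not part of the original implementation.

```javascript
// Hypothetical helper: checks that a planner result matches the
// { intent, tool, args, needs_reasoning } shape before anything executes it.
const INTENTS = ["balance_query", "schedule_call", "cancel_order", "billing_question", "free_form"];
const TOOLS = ["get_account", "book_calendar", "cancel_order", "lookup_billing", "none"];

function validatePlan(plan) {
  const ok =
    plan !== null &&
    typeof plan === "object" &&
    INTENTS.includes(plan.intent) &&
    TOOLS.includes(plan.tool) &&
    plan.args !== null &&
    typeof plan.args === "object" &&
    !Array.isArray(plan.args) &&
    typeof plan.needs_reasoning === "boolean";

  // Assumption: a malformed plan escalates to the expensive path instead of guessing.
  return ok ? plan : { intent: "free_form", tool: "none", args: {}, needs_reasoning: true };
}

console.log(validatePlan({ intent: "balance_query", tool: "get_account", args: {}, needs_reasoning: false }));
console.log(validatePlan({ intent: "made_up", tool: "get_account", args: {} }));
```

Escalating on a bad plan costs one frontier call, but it keeps a routing mistake from silently running the wrong tool.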

The math:
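
One way to run the arithmetic yourself is sketched below. Every price and the routing split are placeholder assumptions, not measurements. The `blendedCostPerTurn` function and its parameter names are hypothetical, so swap in the numbers from your own billing export.

```javascript
// All prices and the routing split below are placeholder assumptions, not measurements.
function blendedCostPerTurn({ plannerCost, cheapPathCost, frontierCost, reasoningShare }) {
  // Every turn pays the planner; the executor cost depends on which path the turn takes.
  return plannerCost + (1 - reasoningShare) * cheapPathCost + reasoningShare * frontierCost;
}

const assumed = {
  plannerCost: 0.001,    // USD per planner call (hypothetical)
  cheapPathCost: 0.001,  // USD per cheap formatting pass (hypothetical)
  frontierCost: 0.02,    // USD per frontier call (hypothetical)
  reasoningShare: 0.2    // fraction of turns with needs_reasoning=true (hypothetical)
};

const baseline = assumed.frontierCost;  // one frontier call on every turn
const routed = blendedCostPerTurn(assumed);
console.log(`baseline $${baseline.toFixed(4)} vs routed $${routed.toFixed(4)} per turn`);
console.log(`savings: ${((1 - routed / baseline) * 100).toFixed(0)}%`);
```

The same formula shows where the pattern stops paying off. As `reasoningShare` approaches 1, the routed cost exceeds the baseline, because every turn then pays the planner on top of the frontier call.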

The implementation

Here's the planner in 30 lines. The trick is the structured-output format: tool-use schema or a strict JSON schema. Don't free-text it — the planner is supposed to be cheap and predictable, not creative.

const PLANNER_PROMPT = `You route user requests to the right handler.
Output ONLY a JSON object: {"intent": "...", "tool": "...", "args": {...}, "needs_reasoning": bool}.

Intents: balance_query, schedule_call, cancel_order, billing_question, free_form.
Tools: get_account, book_calendar, cancel_order, lookup_billing, none.

Set needs_reasoning=true ONLY when the user asks "why", "explain", or "what should I do".`;

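const KEY = process.env.ANTHROPIC_API_KEY;  // read the API key from the environment

// sessionContext is accepted for symmetry with handleTurn; the planner only needs the message.
async function planTurn(userMessage, sessionContext) {
  const r = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: { "x-api-key": KEY, "content-type": "application/json", "anthropic-version": "2023-06-01" },
    body: JSON.stringify({
      model: "claude-haiku-4-5",  // ~10-15× cheaper than frontier
      max_tokens: 300,
      system: PLANNER_PROMPT,
      messages: [{ role: "user", content: userMessage }]
    })
  });
  if (!r.ok) throw new Error(`Planner call failed: HTTP ${r.status}`);
  const data = await r.json();
  try {
    return JSON.parse(data.content[0].text);
  } catch {
    // Unparseable plan: fail safe by escalating to the expensive path.
    return { intent: "free_form", tool: "none", args: {}, needs_reasoning: true };
  }
}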

async function handleTurn(userMessage, sessionContext) {
  const plan = await planTurn(userMessage, sessionContext);

  if (plan.needs_reasoning) {
    return await frontierModel(userMessage, sessionContext, plan);  // expensive path
  }

  const toolResult = await runTool(plan.tool, plan.args);
  return await cheapFormatter(toolResult, plan.intent);  // also haiku-class
}

The flow: every turn pays the planner cost (small). Most turns finish in the cheap path. The minority that need real reasoning still get it, with focused context.
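
The tool-use variant of the structured-output advice can be sketched as follows. It assumes the Anthropic Messages API's `tools` and `tool_choice` request fields and its `tool_use` response blocks. The tool name `route_turn`, the shortened system prompt, and both helper functions are hypothetical.

```javascript
// Sketch: force the planner to answer through a tool call so the plan arrives as parsed JSON.
// The tool name "route_turn" is hypothetical; the field names follow the planner prompt.
const ROUTE_TOOL = {
  name: "route_turn",
  description: "Record the routing decision for the user's message.",
  input_schema: {
    type: "object",
    properties: {
      intent: { type: "string", enum: ["balance_query", "schedule_call", "cancel_order", "billing_question", "free_form"] },
      tool: { type: "string", enum: ["get_account", "book_calendar", "cancel_order", "lookup_billing", "none"] },
      args: { type: "object" },
      needs_reasoning: { type: "boolean" }
    },
    required: ["intent", "tool", "args", "needs_reasoning"]
  }
};

// Builds the JSON body for a POST to https://api.anthropic.com/v1/messages.
function buildPlannerRequest(userMessage) {
  return {
    model: "claude-haiku-4-5",
    max_tokens: 300,
    system: "You route user requests to the right handler.",  // shortened for the sketch
    tools: [ROUTE_TOOL],
    tool_choice: { type: "tool", name: ROUTE_TOOL.name },  // require this specific tool
    messages: [{ role: "user", content: userMessage }]
  };
}

// Pulls the plan out of a Messages API response; returns null if no tool call is present.
function extractPlan(data) {
  const block = (data.content || []).find((b) => b.type === "tool_use" && b.name === ROUTE_TOOL.name);
  return block ? block.input : null;
}

// Example with a hand-written response shaped like the API's tool_use output.
const sample = {
  content: [{
    type: "tool_use",
    id: "toolu_example",
    name: "route_turn",
    input: { intent: "balance_query", tool: "get_account", args: {}, needs_reasoning: false }
  }]
};
console.log(extractPlan(sample));
```

With a forced tool call the plan arrives as an object in `input`, so no `JSON.parse` step over free text is needed.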

The wins beyond cost

Cost reduction is the obvious win. Two less-obvious ones matter just as much:

The edge cases

The dashboard signal

Instrument three numbers per turn: planner cost, executor cost, total latency. Bucket by intent. The shape of cost-per-intent will tell you which intents are over- or under-routed. Want us to audit yours? Apply for the audit.
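
A minimal sketch of that instrumentation, assuming an in-memory array in place of a real metrics store. `recordTurn`, `summarizeByIntent`, and the sample figures are all made up for illustration.

```javascript
// Hypothetical in-memory recorder; a real deployment would ship these numbers to its metrics store.
const turns = [];

function recordTurn({ intent, plannerCost, executorCost, latencyMs }) {
  turns.push({ intent, plannerCost, executorCost, latencyMs });
}

// Groups recorded turns by intent and averages the three numbers.
function summarizeByIntent(records) {
  const buckets = new Map();
  for (const t of records) {
    const b = buckets.get(t.intent) || { count: 0, plannerCost: 0, executorCost: 0, latencyMs: 0 };
    b.count += 1;
    b.plannerCost += t.plannerCost;
    b.executorCost += t.executorCost;
    b.latencyMs += t.latencyMs;
    buckets.set(t.intent, b);
  }
  return [...buckets].map(([intent, b]) => ({
    intent,
    turns: b.count,
    avgPlannerCost: b.plannerCost / b.count,
    avgExecutorCost: b.executorCost / b.count,
    avgLatencyMs: b.latencyMs / b.count
  }));
}

// Made-up sample data, for illustration only.
recordTurn({ intent: "balance_query", plannerCost: 0.001, executorCost: 0.001, latencyMs: 400 });
recordTurn({ intent: "balance_query", plannerCost: 0.001, executorCost: 0.003, latencyMs: 600 });
recordTurn({ intent: "free_form", plannerCost: 0.001, executorCost: 0.02, latencyMs: 2500 });
console.table(summarizeByIntent(turns));
```

If an intent you expected to be simple shows an average executor cost near the frontier-call price, check whether the planner is sending it down the expensive path.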

The one-line summary

Most of your agent's turns are routing decisions, not reasoning. Pay routing rates for routing turns. The cheap-model planner pattern compounds with the heartbeat fix in Issue #144 and the context-window fix in Issue #146 — all three are about not paying frontier rates for non-frontier work.

