The default in most agent frameworks is to hand the full tool list to the model on every turn. The model picks. This is the "we trust the model" school of thought. It works fine until it doesn't.
Two failure modes show up reliably:
- Quality drift. The model sees 12 tools but the current turn only needs one of them. It picks a similar-looking wrong tool. The user gets a polite but irrelevant response. This happens far more than agent builders admit.
- Privilege escalation. A jailbroken or confused user gets the model to invoke cancel_subscription from a billing-inquiry conversation. Or worse: send_email_to(arbitrary_address) with content the user dictated. The tools were "available," so the model used them.
Both go away under least-privilege tool access: each turn gets only the tools that match the detected intent. Default deny. Explicit grants. The same rule that secured Unix syscalls 50 years ago.
The pattern
One intent classifier + one tool gate. The classifier was the subject of Issue #147 — a cheap-model planner that outputs structured JSON: { intent, tool, args, needs_reasoning }. Today's pattern is what happens between the classifier and the executor.
The gate is a small map: intent → allowed tools. Before the executor runs, it filters the full tool registry down to only what the gate allows for the classified intent. Anything outside the allowlist is unavailable to that turn's model call: the model is never shown it, so there is nothing to refuse.
// Map each classified intent to the only tools that turn may use.
const INTENT_TOOLS = {
  balance_query: ['get_account'],
  schedule_call: ['book_calendar', 'list_calendar_slots'],
  cancel_order: ['lookup_order', 'cancel_order'],
  billing_question: ['lookup_billing', 'get_account'],
  greeting: [], // no tools, agent just talks
  free_form: ['get_account', 'lookup_order', 'lookup_billing'] // fallback, narrow
};

// log and runWithTools are assumed to be defined elsewhere in the agent.
async function gatedExecutor(plan, fullToolRegistry) {
  // Own-property lookup: an unknown intent (or an inherited key such as
  // 'constructor') gets an empty allowlist, i.e. default deny.
  const allowed = Object.hasOwn(INTENT_TOOLS, plan.intent)
    ? INTENT_TOOLS[plan.intent]
    : [];
  const scopedTools = fullToolRegistry.filter(t => allowed.includes(t.name));

  // Audit log: every tool-gate decision logged for review
  log({ event: 'tool_gate', intent: plan.intent, allowed, requested: plan.tool });

  if (plan.tool && !allowed.includes(plan.tool)) {
    // Planner asked for a tool that's not allowed for this intent.
    // Don't silently retry; surface it so the gate map can be refined.
    return { error: 'gate_violation', intent: plan.intent, tool: plan.tool };
  }
  return await runWithTools(plan, scopedTools);
}
The greeting intent has zero tools. The balance_query intent has exactly one. cancel_order gets two (lookup and cancel), so the agent can look an order up before cancelling it. The allowlist itself does not enforce that order: if lookup-before-cancel must be guaranteed, the executor needs a separate check.
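Since the gate map only controls which tools are visible, making lookup-before-cancel a hard guarantee takes a separate check in front of each tool call. A minimal sketch, under assumptions: TOOL_PRECONDITIONS, checkPrecondition, and the per-turn callHistory array are hypothetical names and not part of the executor above.

```javascript
// Hypothetical map: tools that must already have run in this turn
// before the keyed tool may be called.
const TOOL_PRECONDITIONS = {
  cancel_order: ['lookup_order'],
};

// callHistory: names of the tools already executed in this turn, in order.
function checkPrecondition(toolName, callHistory) {
  const required = Object.hasOwn(TOOL_PRECONDITIONS, toolName)
    ? TOOL_PRECONDITIONS[toolName]
    : [];
  const missing = required.filter(name => !callHistory.includes(name));
  return missing.length === 0
    ? { ok: true }
    : { ok: false, error: 'precondition_failed', tool: toolName, missing };
}

// Cancel with no prior lookup is rejected; cancel after lookup passes.
console.log(checkPrecondition('cancel_order', []));
console.log(checkPrecondition('cancel_order', ['lookup_order']));
```

This only checks that a lookup happened, not that it was for the same order; tying the two together would mean recording the looked-up order ID as well.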
What this fixes
- The wrong-tool-picked failure from quality drift disappears. The model can't pick a wrong tool that isn't visible to it.
- The escalation surface shrinks. A jailbroken prompt that tries to invoke cancel_subscription during a balance_query gets caught at the gate, not at the LLM.
- The audit log becomes meaningful. Every tool_gate log line tells you what the classifier thought + what tool was attempted. Misclassifications become visible the moment the classifier picks a wrong intent and the gate rejects.
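The first two points can be seen in a small self-contained run. The gate logic below is condensed from the executor above; log and runWithTools are stand-in stubs (assumptions, not the real implementations), and the registry entries are placeholder objects with only a name.

```javascript
const INTENT_TOOLS = {
  balance_query: ['get_account'],
  greeting: [],
};

// Stubs (assumptions): the real log and runWithTools live elsewhere.
const auditLog = [];
const log = entry => auditLog.push(entry);
const runWithTools = async (plan, tools) => ({
  visibleTools: tools.map(t => t.name),
});

async function gatedExecutor(plan, fullToolRegistry) {
  // Own-property lookup so an unknown intent falls back to default deny.
  const allowed = Object.hasOwn(INTENT_TOOLS, plan.intent)
    ? INTENT_TOOLS[plan.intent]
    : [];
  const scopedTools = fullToolRegistry.filter(t => allowed.includes(t.name));
  log({ event: 'tool_gate', intent: plan.intent, allowed, requested: plan.tool });
  if (plan.tool && !allowed.includes(plan.tool)) {
    return { error: 'gate_violation', intent: plan.intent, tool: plan.tool };
  }
  return runWithTools(plan, scopedTools);
}

const registry = [{ name: 'get_account' }, { name: 'cancel_subscription' }];

async function demo() {
  // Normal turn: the model call is only ever handed get_account.
  const ok = await gatedExecutor(
    { intent: 'balance_query', tool: 'get_account' }, registry);
  // Escalation attempt: rejected at the gate, before any model call.
  const blocked = await gatedExecutor(
    { intent: 'balance_query', tool: 'cancel_subscription' }, registry);
  return { ok, blocked };
}

demo().then(result => console.log(JSON.stringify(result, null, 2)));
```

Both calls leave a tool_gate entry in auditLog, so the rejected attempt is recorded alongside the legitimate one.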
The edge cases
- Multi-intent turns. A user asks "what's my balance and can you cancel my last order?" — that's two intents. The planner pattern (Issue #147) should emit a plan-list; the gate is applied per step.
- Intent drift mid-conversation. The user starts at balance_query, then says "actually never mind, cancel everything." Re-classify per turn. Don't carry the previous intent forward as a privilege.
- The free_form fallback. Real conversations have ambiguous turns. Don't make the fallback "all tools" — keep it narrow. List only the read-only tools that can't cause damage. Writes always require a specific intent.
- Tool definitions drift. When you add a new tool, ask first: which intents should be allowed to use it? Add to INTENT_TOOLS in the same commit. CI should flag tools that exist in the registry but not in any intent allowlist.
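Two of these edge cases are mechanical enough to sketch. The first function applies the gate to each step of a multi-intent plan; the second is the CI check for registry tools that appear in no allowlist. The names allowedFor, gatePlanSteps, and findUngatedTools, and the plan-list shape (an array of { intent, tool } steps), are assumptions for illustration; the plan format from Issue #147 may differ.

```javascript
const INTENT_TOOLS = {
  balance_query: ['get_account'],
  cancel_order: ['lookup_order', 'cancel_order'],
};

// Look up an intent's allowlist; unknown intents get nothing (default deny).
function allowedFor(intent) {
  return Object.hasOwn(INTENT_TOOLS, intent) ? INTENT_TOOLS[intent] : [];
}

// Gate each step against its own intent, never against a neighbour's.
function gatePlanSteps(steps) {
  return steps.map(step => ({
    ...step,
    permitted: !step.tool || allowedFor(step.intent).includes(step.tool),
  }));
}

// CI check: registry tools that no intent is allowed to use.
function findUngatedTools(registry) {
  const granted = new Set(Object.values(INTENT_TOOLS).flat());
  return registry.map(t => t.name).filter(name => !granted.has(name));
}

const steps = gatePlanSteps([
  { intent: 'balance_query', tool: 'get_account' },
  { intent: 'balance_query', tool: 'cancel_order' }, // borrowed privilege
  { intent: 'cancel_order', tool: 'cancel_order' },
]);
console.log(steps.map(s => s.permitted)); // [ true, false, true ]

const registry = [{ name: 'get_account' }, { name: 'send_email_to' }];
console.log(findUngatedTools(registry)); // [ 'send_email_to' ]
```

In CI, a non-empty result from findUngatedTools would fail the build, which forces the "which intents may use this?" question at review time.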
The dashboard signal
One number tells you whether the gate is well-tuned: gate-violation rate per intent. High violation rate on a specific intent means the classifier is misclassifying that intent (or the gate is too narrow). Zero violations across the board for a week of traffic means either the gate is correctly calibrated, or your classifier is so cautious it never recommends a tool — both worth investigating.
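The rate can be computed directly from the tool_gate audit entries, since each one records both the allowlist and the requested tool. A minimal sketch, assuming the entries have already been loaded into an array; violationRateByIntent is a hypothetical helper name, and turns where no tool was requested count toward the denominator.

```javascript
// Returns a Map of intent -> fraction of gate decisions that were violations.
function violationRateByIntent(entries) {
  const stats = new Map();
  for (const e of entries) {
    if (e.event !== 'tool_gate') continue;
    const s = stats.get(e.intent) ?? { total: 0, violations: 0 };
    s.total += 1;
    // Same test the gate applies: a tool was requested and is not allowed.
    if (e.requested && !e.allowed.includes(e.requested)) s.violations += 1;
    stats.set(e.intent, s);
  }
  const rates = new Map();
  for (const [intent, s] of stats) rates.set(intent, s.violations / s.total);
  return rates;
}

// Made-up sample entries in the shape the executor logs.
const sample = [
  { event: 'tool_gate', intent: 'balance_query', allowed: ['get_account'], requested: 'get_account' },
  { event: 'tool_gate', intent: 'balance_query', allowed: ['get_account'], requested: 'cancel_subscription' },
  { event: 'tool_gate', intent: 'greeting', allowed: [], requested: undefined },
];
console.log(Object.fromEntries(violationRateByIntent(sample)));
// { balance_query: 0.5, greeting: 0 }
```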
For every agent we deploy at AutomateScale: explicit INTENT_TOOLS map + tool-gate executor + audit log of every gate decision. The gate sits between the planner from Issue #147 and the executor. It's where security review happens. Want us to audit yours? Apply for the audit.
The one-line summary
Every tool your agent has is a tool a jailbroken prompt can invoke. Default deny + explicit per-intent grants is what 50 years of Unix taught us about least-privilege — and what most agent frameworks unlearned. Ship the gate before the next CVE-shaped surprise.