Your Agent Picks the Wrong Tool. It’s Not the Agent’s Fault.
Lesson 2.1: When your agent reaches for the wrong tool, the answer isn’t a routing classifier. It’s a description that says when NOT to use the tool.
Architecture Series · Domain 2 · Lesson 2.1 · Tool Design
Domain 1 (Lessons 1.1 – 1.7 + Capstone Parts 1 & 2) is complete. The full series index also appears at the end of this piece.
👋 Welcome! I’m Daniel Williams. I write Claude Code for Non-Coders for senior technical professionals who built their careers on technical judgment, stopped writing code years ago, and are now figuring out how AI and coding agents will change their work.
The goal is to keep you as the operator, not the AI’s assistant (”reverse-centaur“), by helping you decide which tasks to automate and which require the judgment that made you valuable in the first place.
I advise clients on AI tools, strategy, and human resilience at dewilliams.co. This newsletter is where I document the patterns, commands, and operator habits that help you grow from babysitting prompts to building reliable systems.
Join 33,000+ senior technical professionals learning the operating discipline that keeps your judgment valuable.
tl;dr Tool selection is a description problem before it is an architecture problem. The first fix when your agent picks the wrong tool is to rewrite the two competing descriptions so each one names the other and says when NOT to use it.
Your agent picks the wrong tool. You ask about a specific order, and it fetches the customer profile. You ask about a refund, and it lists recent orders. The reflex is to blame Claude. The reflex is wrong.
Tool selection is not a model problem. The model reads each tool’s name and description and picks based on what it sees. When two descriptions say roughly the same thing, the model picks one, and you hope for the best.
For operators who aren’t writing tools themselves, the judgment that matters is what to do about it. You can stack architecture on top, a classifier, a router, or a block of few-shot examples that show correct routing. Or you can fix the descriptions that made the choice ambiguous in the first place. Operators who reach for the classifier turn a thirty-line fix into a three-hundred-line system, and the system still fails at the edges where the descriptions still overlap. Operators who fix the descriptions ship reliable agents and stop fighting their own architecture. This is the kind of choice that separates operators from prompters, the operating discipline this newsletter exists to teach.
The Misrouting Moment
Let’s look at an example in which a customer support agent has five tools: lookup_customer, lookup_order, list_recent_orders, list_refunds, and issue_refund. The user prompts, “Check the status of order #12345.” The agent reaches for lookup_customer. The two descriptions it had to pick between read:
lookup_customer: Retrieves customer information.
lookup_order: Retrieves order information.The boundary between them is implied, not stated. The model is being asked to infer where one tool ends and the other begins, based on two short sentences that mirror each other almost word for word. Models infer poorly at the edges of overlapping definitions, and overlapping definitions are the rule, not the exception, in tool surfaces that grow over time.
Vague or overlapping descriptions produce unreliable selection. Not occasionally, not on edge cases. Reliably unreliable, wherever the descriptions don’t draw clear lines.
The Reflexes That Make It Worse
I made this mistake on an MCP server I was building before I had a name for it. A tool would get picked at the wrong moment, and I would reach for the architecture itself: add a mode parameter, split one tool into two, consolidate two others. Architectural moves, sometimes a whole afternoon of refactoring. The descriptions stayed roughly the same, and that was the actual problem. Every time, the fix I needed was just twenty minutes of writing.
Instinctively, most builders gravitate toward architecture before descriptions. But the right move is usually to fix the descriptions BEFORE tinkering with the tool architecture.
A classifier in front of the agent is the most common reach. A small model reads the user’s message and decides which tool to call. It sometimes works, at the cost of doubling the architecture. You now have two systems to maintain and two systems that can disagree. The classifier routes around broken descriptions rather than fixing them.
Few-shot patches are the next instinct. A block of examples in the system prompt shows the right tool for various inputs. They sometimes work, at the cost of tokens per call and a system prompt that grows whenever a new ambiguity surfaces. The descriptions are still wrong, and you have patched the symptom.
Consolidation is the move that feels architecturally clean. Collapse the two tools into one with an entity_type parameter. This is the right move for five GitHub tools that all handle issues in similar ways. It is the wrong move to use lookup_customer instead of lookup_order. Those tools operate on different resources, return different shapes, and have different consequences when called incorrectly. A bad consolidation produces a single tool that requires more documentation than the two clean ones it replaced.
The fix most builders skip is the one the official docs call “by far the most important factor in tool performance.” Rewrite both descriptions so each one explicitly names the other and says when NOT to use it.



