Picking MCP Servers for an Agent Without Drowning the Context Window: A Selection Heuristic for 2026
An MCP server is roughly 500-1,000 tokens of context per tool, billed every turn forever. The right number for a production agent is almost always 3-5, not 15. Here's the heuristic I use and the math behind it.
An MCP (Model Context Protocol) server is roughly 500-1,000 tokens of prompt context per exposed tool, billed on every single turn for the lifetime of the agent. Anthropic's own production agents reportedly carried 134K tokens of tool definitions before optimization. For a typical production agent in 2026 the right number of MCP servers is almost always 3-5, not 15 — and the second one matters more than the tenth. This post is the heuristic I use to decide which MCP servers belong on an agent and which ones should stay off.
I run this exercise at the start of every new agent project (most recently for a Travel MCP server I open-sourced — flights, hotels, weather, with a MongoDB-backed per-tool TTL cache). It is the single highest-ROI design decision you make in the first hour, because every tool you put on the agent compounds in three ways across the agent's lifetime: token cost, selection accuracy, and security surface area.
What does each MCP server actually cost you?
Three costs, all paid forever:
- ▸Token cost. ~500-1,000 tokens per tool description, plus tool name, plus argument schema, plus example. Five servers each exposing 15 tools is 50,000-75,000 tokens before the user has typed a single word. At Sonnet pricing in 2026 that's a non-trivial bill at scale.
- ▸Selection cost. Every additional tool in the surface raises the probability the agent picks the wrong tool. Anthropic's internal benchmarks show selection accuracy degrades non-linearly past about 20 tools. Past 30, you essentially require a tool-search indirection (Anthropic's Tool Search Tool, which loads tool definitions on demand and reportedly preserves ~191K tokens of context vs. ~123K with the traditional all-tools-in-prompt approach).
- ▸Security cost. A January 2026 audit found 66% of scanned community MCP servers had security findings, with 30+ CVEs in the first two months of the year alone. Each MCP server you mount is an external supply-chain dependency that can read your tool calls. Treat the install list like an npm install — vet, version-pin, and minimize.
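The per-turn token tax is simple arithmetic. A minimal sketch, using this post's rough figure of 500-1,000 tokens per tool definition (the 750-token midpoint below is an illustrative assumption, not a measured value):

```python
# Back-of-envelope cost of a tool surface, billed on every turn.
# tokens_per_tool = 750 is the midpoint of the post's 500-1,000 estimate.

def tool_surface_tokens(servers: int, tools_per_server: int,
                        tokens_per_tool: int = 750) -> int:
    """Prompt tokens spent on tool definitions, paid every single turn."""
    return servers * tools_per_server * tokens_per_tool

# Five kitchen-sink servers, 15 tools each:
print(tool_surface_tokens(5, 15))   # -> 56250

# Three domain-shaped servers, 5 tools each:
print(tool_surface_tokens(3, 5))    # -> 11250
```

The difference — roughly 45K tokens per turn — is the gap between a lean agent and one that spends a third of its context window before the user types a word.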
The selection heuristic
I run every candidate MCP server through these five questions, in order. If a server doesn't pass the first three, I don't install it. If it doesn't pass all five, I keep looking.
- ▸Does this solve a weekly problem, or a once-a-quarter problem? Install only servers that earn back their token cost on a weekly cadence. If you'd touch the tool less than once a week, the agent shouldn't pay for it every turn.
- ▸Is the tool surface domain-shaped or kitchen-sink? A good MCP server exposes 3-7 narrow, well-named tools that compose to cover its domain. A bad one exposes 30 tools that mirror its underlying REST API one-to-one and force the agent to figure out which one to call. Prefer the former even when it covers fewer capabilities.
- ▸Does the server cache? If the upstream is rate-limited (which most third-party APIs are) and the server doesn't cache, the agent will eat rate-limit errors at exactly the moments you most want it to be reliable. The Travel MCP I shipped uses MongoDB with per-tool TTLs tuned to each upstream's reliability — long for slow-moving weather, short for live flight pricing.
- ▸Does the server have a clean error contract? When the upstream fails, does the tool return a structured error the agent can route on ("upstream is rate-limited; retry in 30s") or a stack trace? Stack traces blow your context budget and confuse the model. Engineered error responses cost a few extra dev-hours and save you debugging in production.
- ▸Is the maintainer responsive to security issues? Check the issue tracker, the time-to-CVE-fix history, and the GitHub Security Advisories tab. The MCP ecosystem is young enough that maintenance discipline varies wildly. A flashy server with no commits in six months is a liability you cannot accept on a production agent.
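Questions 3 and 4 can be sketched together. This is a hedged approximation of the pattern, not the Travel MCP's actual implementation: it swaps the MongoDB-backed cache for an in-memory dict, and the tool names, TTLs, and error fields are all illustrative.

```python
import time

# Illustrative per-tool TTLs in seconds: long for slow-moving weather,
# short for live flight pricing. Real values belong in config.
TTL = {"weather_forecast": 3600, "flight_prices": 60}

_cache: dict[tuple, tuple[float, dict]] = {}  # key -> (stored_at, result)

def call_tool(name: str, args: tuple, upstream) -> dict:
    """Serve from cache while fresh; on upstream failure, return a
    structured error the agent can route on instead of a stack trace."""
    key = (name, args)
    hit = _cache.get(key)
    if hit and time.monotonic() - hit[0] < TTL.get(name, 0):
        return hit[1]
    try:
        result = {"ok": True, "data": upstream(*args)}
        _cache[key] = (time.monotonic(), result)
        return result
    except Exception as exc:  # production code would catch narrower errors
        return {"ok": False,
                "error": "upstream_unavailable",
                "detail": str(exc),
                "retry_after_s": 30}
```

The point of the error shape is that `retry_after_s` and a stable `error` code give the model something to branch on, where a raw traceback gives it a few hundred tokens of noise.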
What the starting set usually looks like
For most agents I've worked on in 2026, the right starting set is exactly three MCP servers, expanded one at a time only when the absence is felt:
- ▸A code/repo server (the official GitHub MCP server is the obvious default) — eliminates the most common context-switching tax for any developer-flavored agent.
- ▸A docs server (Context7 or similar) — fast-moving frameworks change quickly enough that without docs-on-demand, the agent will confidently hallucinate API signatures.
- ▸One domain-specific server — the actual reason this agent exists. Travel MCP for a travel-booking concierge. A CRM MCP for a sales agent. A medical-records MCP for a clinical scribe. Pick the one server that maps directly to the agent's job.
When you actually need 30 tools
There's exactly one scenario where the "3-5 servers" rule breaks: when the agent's job is to do many small actions across a heterogeneous surface (think IT-helpdesk agents that touch 20 internal systems, or sales-ops agents that route across CRM + email + calendar + comms). For those, do not stuff all tools into the prompt. Use Anthropic's Tool Search Tool, or a similar tool-on-demand pattern, where the agent's first move is always to query a search index of available tools, load only the ones it needs, and then act. The pattern preserves ~70K tokens of context per turn at the cost of one extra round-trip.
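The indirection above can be sketched without any particular SDK. This is a hedged approximation of the tool-on-demand idea, not Anthropic's actual API — every name here is hypothetical, and a real index would use embeddings or BM25 rather than keyword matching:

```python
# Instead of putting all 30+ tool definitions in the prompt, expose one
# search tool; the agent queries it, and only the matching definitions
# ever get loaded into context.

TOOL_INDEX = {
    "jira_create_ticket": "Create an issue in the helpdesk tracker",
    "crm_update_contact": "Update a contact record in the CRM",
    "calendar_find_slot": "Find a free meeting slot on a calendar",
    # ...imagine ~30 more entries; only matches reach the prompt
}

def search_tools(query: str, limit: int = 3) -> list[str]:
    """Naive keyword match over names and descriptions."""
    words = query.lower().split()
    hits = [name for name, desc in TOOL_INDEX.items()
            if any(w in desc.lower() or w in name for w in words)]
    return hits[:limit]

def load_definitions(names: list[str]) -> list[dict]:
    """Only the selected tools' schemas are paid for in context."""
    return [{"name": n, "description": TOOL_INDEX[n]} for n in names]
```

The trade is explicit: one extra round-trip (search, then load, then act) in exchange for a prompt that carries three tool definitions instead of thirty.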
Where to find good MCP servers
The official MCP Registry (registry.modelcontextprotocol.io) is the canonical source as of 2026 and is what Claude Code, Cursor, and the major MCP clients pull from. Beyond the registry, two community lists worth watching: punkpeye/awesome-mcp-servers (~85K stars on GitHub) and the official Anthropic MCP servers repo. For freshly-built servers that haven't hit the registry yet, search GitHub for mcp-server followed by your domain — there's usually something within a week of any new vertical taking off.
What this is really about
Tool selection on an agent is the same problem as dependency selection on a backend service. The right answer is never "add everything that might be useful." The right answer is "add the smallest set that lets the system do its job, and remove the ones that don't earn their keep." The 2026 difference is that on an agent, every dependency is also a prompt tax — paid every turn, charged in tokens, billed in dollars, audited by your model's selection accuracy. Token budget compounds. Selection accuracy compounds. Security debt compounds. Pick three servers. Earn the fourth.
If you're hiring an AI engineer or founding engineer in NYC and want to talk through tool-surface design, agent-infrastructure tradeoffs, or anything in this post, I'm at tanay@tanayshah.dev. I've been building production agent systems for the last year — happy to compare notes.
// RELATED READING
- POST: Designing Tool Surfaces for LLM Agents
- POST: Multi-Vendor Agent Design: Why One Model Isn't Enough in 2026
- POST: What the Bubblewrap Sandbox Escape Tells Us About Agent Runtime Hardening in 2026
- CASE STUDY: Travel MCP Server — flights, hotels, weather as MCP tools
- CASE STUDY: Real-Time Sales-Conversation Coaching Agent