Why Every Tool in Your MCP Server Needs a Different TTL
Most MCP server templates default to uniform caching: one defaultTTL of an hour, applied across every tool. That default is wrong because every tool exposes data with a different volatility profile and a different upstream reliability profile. Here's the two-axis framework I use to pick TTLs, the three real tools that ended up at five minutes, four hours, and a day, and what the June 2026 MCP spec roadmap does (and doesn't) solve.
Pick any open-source MCP server template from 2026 and you'll find the same default in the config: defaultTTL: 3600. One hour. For every tool. The template authors aren't wrong to pick a default; they have to pick one. But the default is also the answer that fits the fewest real situations, because the tools you expose through an MCP server have wildly different freshness needs and wildly different upstream reliability profiles. A flight-price tool that returns hour-old prices is broken. A weather-forecast tool that calls a stable public API every five minutes is wasteful. Same one-hour default, two different production failures.
Two axes, not one
The right TTL for a tool isn't a function of how fast the answer changes alone. It's a function of two things, weighed against each other:
- ▸Volatility: how fast does the underlying truth move? Flight prices move minute-to-minute. Hotel rates move day-to-day. Historical weather is immutable; forecast weather is stable on the order of hours.
- ▸Upstream reliability: how often will the upstream simply refuse you? Google Flights aggressively blocks scrapers; you might be locked out for hours. Booking.com's HTML changes occasionally but rarely refuses a single page request. Open-Meteo (free public API) almost never refuses, but you don't want to hammer it just because you can.
Volatility says how fast the cache goes stale. Reliability says how expensive (in fallback cost or human attention) a cache miss is. The right TTL is roughly the time before the data goes stale, capped by how often you can afford to get blocked. A tool with high volatility AND low upstream reliability is the worst combination: you need fresh data and you can't easily get it. That tool gets the most aggressive cache plus a fallback strategy plus telemetry on miss rate. A tool with low volatility AND high reliability gets the loosest cache; it doesn't matter much either way.
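That decision rule can be written down in a few lines. This is my own framing of it (the function name, the two-tuple return, and the parameter names are mine, not from any spec or library):

```python
def pick_ttl(staleness_tolerance_s: int, safe_call_interval_s: int) -> tuple[int, bool]:
    """Choose a cache TTL from the two axes.

    staleness_tolerance_s: how long a cached answer stays usable (volatility axis).
    safe_call_interval_s:  how often you can hit upstream without being
                           blocked (reliability axis).
    Returns (ttl_seconds, needs_fallback).
    """
    # TTL follows volatility: never serve data older than its tolerance.
    ttl = staleness_tolerance_s
    # Fallback follows reliability: if upstream can lock you out for longer
    # than the data stays fresh, a cache miss can't always be refilled live.
    needs_fallback = safe_call_interval_s > staleness_tolerance_s
    return ttl, needs_fallback

# Flights: prices drift in ~5 minutes, but an aggressive upstream can
# block you for hours -- shortest TTL, and a fallback is mandatory.
pick_ttl(300, 4 * 3600)   # → (300, True)

# Weather: a forecast is stable for a day; the free API would tolerate
# far more traffic than we send -- loosest TTL, no fallback needed.
pick_ttl(86400, 60)       # → (86400, False)
```

The point of separating the two return values is that volatility alone sets the TTL, while reliability alone decides whether a miss needs a fallback path.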
Three tools, three TTL decisions
The MCP server I'll talk about (Travel MCP, public on GitHub) exposes three tools: get_latest_flight_data, get_latest_hotel_data, get_weather_on_dates. Each lives in a different quadrant of the volatility-reliability plane. Here's the decision in one diagram:
```
HIGH volatility
      ▲
      │
      │   ┌──────────────────────┐
      │   │ FLIGHTS  TTL = 5min  │ ◄── high volatility +
      │   │ upstream blocks      │     low upstream reliability:
      │   │ → fallback to        │     shortest TTL,
      │   │   sample data        │     explicit fallback path,
      │   └──────────────────────┘     track miss rate
      │
      │   ┌──────────────────────┐
      │   │ HOTELS   TTL = 4hr   │ ◄── medium volatility +
      │   │ Playwright scraper   │     medium reliability:
      │   │ → live, can drift    │     moderate TTL,
      │   └──────────────────────┘     schema-drift watch
      │
      │   ┌──────────────────────┐
      │   │ WEATHER  TTL = 24hr  │ ◄── low volatility +
      │   │ free public API      │     high reliability:
      │   │ → no fallback needed │     loosest TTL,
      │   └──────────────────────┘     simplest path
      │
      └──────────────────────────────────────►
  LOW reliability                HIGH reliability
```

Three tools, three TTLs, none of them the template default of one hour. The TTL spread (5 minutes to 24 hours, ~290x range) is what you'd expect across realistic tools. Anyone who runs a production MCP server with defaultTTL: 3600 for everything has either three tools that happen to all want one hour, or (more likely) two tools serving stale data and one tool wasting upstream calls.
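As configuration, the whole decision collapses to a small per-tool table. The tool names below are the server's real ones; the dict shape and constant name are my own illustration (the actual server reads TTLs from env vars):

```python
# Per-tool TTLs in seconds -- one entry per tool, no shared default.
TOOL_TTL_S = {
    "get_latest_flight_data": 5 * 60,        # volatile, unreliable upstream
    "get_latest_hotel_data":  4 * 60 * 60,   # medium on both axes
    "get_weather_on_dates":   24 * 60 * 60,  # stable, reliable upstream
}
```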
Why MongoDB and not Redis
Most production MCP server caching guides reach for Redis. Redis is correct for the case where you want sub-millisecond cache lookups for very hot keys with simple key-value shape. The Travel MCP server's tool results are not that shape. They're large structured documents (a flight result is a few KB of JSON, hotel results are larger), they don't need sub-millisecond access (the upstream call they're caching takes 200-2000ms anyway, so 5-15ms cache lookup is fine), and they benefit from per-tool collection isolation with their own indexes.
I picked MongoDB with one collection per tool, a TTL index on the document expiresAt field, and per-tool query indexes for the search keys (origin/destination + date for flights, city + checkin date for hotels, city + date range for weather). MongoDB's TTL index handles eviction natively (a background sweeper drops expired docs); the per-collection setup makes it trivial to set different TTL behavior per tool. The total operational cost of running a small MongoDB instance for this is comparable to a small Redis instance, and the document-shaped storage matches the data better.
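A sketch of that setup. The expiresAt field name is from the design above; the helper function and its name are mine, and the pymongo index calls are shown as comments since they need a live database:

```python
from datetime import datetime, timedelta, timezone

# With pymongo, per tool-collection (run once at startup):
#   db.flights_cache.create_index("expiresAt", expireAfterSeconds=0)
#   db.flights_cache.create_index([("origin", 1), ("destination", 1), ("date", 1)])
#
# expireAfterSeconds=0 tells MongoDB's background sweeper to evict each
# document as soon as the wall clock passes its stored expiresAt value,
# so the per-document TTL lives in the document itself.

def cache_doc(result: dict, ttl_s: int) -> dict:
    """Wrap a tool result in the cached-document shape (naming is mine)."""
    return {
        "result": result,
        "expiresAt": datetime.now(timezone.utc) + timedelta(seconds=ttl_s),
    }
```

Because the expiry timestamp is computed per document, the same collection machinery serves a 5-minute flight TTL and a 24-hour weather TTL with no extra logic.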
The general rule: Redis when you're caching short keys for a high-RPS hot path. Document store when you're caching tool results that are themselves documents. The 'Redis everywhere' default in most caching guides ignores this distinction.
The June 2026 MCP spec roadmap
The official MCP roadmap for June 2026 includes adding TTL and ETag values at the protocol level. This is the right direction. It moves the cache decision out of each individual server's config and into a typed contract the client can reason about: a server can advertise 'this tool's results are valid for 300 seconds; here's an ETag for cheap revalidation,' and the client can cache, revalidate, or bypass without per-server bespoke logic.
What it doesn't solve: the per-tool decision still has to be made by the server author. Picking a TTL for your flight tool isn't a problem the protocol can solve for you. The protocol can give you the right place to write the answer down. You still have to know what the answer is.
The protocol-level TTL also doesn't model upstream reliability. A tool with a 5-minute TTL and an upstream that's blocked half the time is still a hard production problem. The fallback strategy (sample data, last-known-good cache, graceful empty result) is a server-side concern that lives outside the protocol. The June 2026 spec gets you halfway; the rest is still your design call.
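That server-side fallback ladder (live call, then last-known-good cache, then sample data) can be sketched like this. All names are mine, and the cache here is a plain dict standing in for the MongoDB collection:

```python
import time

GRACE_S = 3600  # how long past expiry we'll still serve last-known-good

def flight_lookup(key, cache, fetch_live, sample_data):
    """Fallback ladder: live result -> last-known-good cache -> sample data."""
    try:
        return fetch_live(key)          # preferred: fresh upstream call
    except Exception:                   # blocked / rate-limited / down
        doc = cache.get(key)
        if doc and doc["expiresAt"] > time.time() - GRACE_S:
            return doc["result"]        # stale, but recent enough to serve
        return sample_data(key)         # last resort: bundled sample data
```

The ordering matters: the cache is consulted only after the live call fails, because for a high-volatility tool a slightly-stale answer is better than no answer but worse than a fresh one.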
What I would change in the Travel MCP design
- ▸Add a stale-while-revalidate path for flights. The current design serves a fresh result or a fallback. The middle ground (serve the slightly-stale result while kicking off a background refresh) is operationally simpler and gives the user a faster response. The cost is ~30 lines of code and a refresh queue I haven't built yet.
- ▸Track per-tool miss rate and surface it as an MCP server metric. Today the MongoDB TTL index quietly evicts expired docs; the only signal that a TTL is too short is when the upstream rate-limits me. A counter of 'cache miss + upstream call' versus 'cache hit' per tool would let me tune TTLs based on data, not gut feel.
- ▸Add a tool-level config schema so the TTL and fallback strategy travel with the tool definition, not with the server's environment variables. Today the TTL is a MONGO_TTL_FLIGHTS=300 env var. That works for one server. The moment a second consumer wants a different TTL profile, this design breaks.
- ▸Move the schema-drift detector into the cache layer for the hotel scraper. When Booking.com changes its page structure, the scraper starts returning empty results and the cache cheerfully serves them. A 'this looks empty / wrong' detector at write time would keep the broken results out of the cache instead of serving them until they expire.
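The stale-while-revalidate change in the first bullet could look roughly like this on the read path. The cache shape and names are mine, and the background worker that drains the refresh queue is the part I haven't built:

```python
import queue
import time

refresh_queue: "queue.Queue[str]" = queue.Queue()  # drained by a background worker (not shown)
SWR_WINDOW_S = 600  # how stale a result may be while we refresh behind it

def get_cached(key: str, cache: dict):
    """Stale-while-revalidate read path (a sketch of the proposed change)."""
    doc = cache.get(key)
    now = time.time()
    if doc is None:
        return None                      # hard miss: caller must fetch inline
    if doc["expiresAt"] > now:
        return doc["result"]             # fresh hit
    if doc["expiresAt"] > now - SWR_WINDOW_S:
        refresh_queue.put(key)           # kick off a background refresh...
        return doc["result"]             # ...but answer immediately, slightly stale
    return None                          # too stale to serve at all
```

The user gets a fast (if slightly stale) answer, and the next request finds a fresh document, which is exactly the middle ground between "fresh or fallback" that the current design lacks.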
The bigger lesson
The mistake most MCP servers make is treating cache as an afterthought ('add a TTL on the response, ship it'). The mistake the prevailing 2026 caching guides make is the opposite ('add semantic caching, multi-tier L1 / L2, distributed cache invalidation'). The right answer is in the middle and depends entirely on the shape of your tools: high-volatility / low-reliability tools need careful design, low-volatility / high-reliability tools need almost nothing, and most MCP servers will have a mix.
If a hiring manager asks me how I think about MCP server design, the cache is the part I'd point at. Not because it's the most interesting part of the protocol, but because it's the part most people get wrong by default, and getting it right is what turns a demo MCP server into one you can run for a year without staring at it.
References
- ▸Model Context Protocol roadmap, June 2026 spec target (blog.modelcontextprotocol.io)
- ▸ChatForest: 'MCP Caching Strategies: Prompt, Server-Side, Semantic, Gateway' (2026)
- ▸MakeAIHQ: 'MCP Caching Strategies: Redis, CDN & Semantic Caching Guide'
- ▸AWS Database blog: 'Optimize LLM response costs and latency with effective caching'
- ▸PyImageSearch: 'Semantic Caching for LLMs: TTLs, Confidence, and Cache Safety' (May 2026)
- ▸MongoDB TTL indexes: native expiration documentation (mongodb.com/docs)
- ▸Travel MCP server — public repo + architecture write-up (tanayshah.dev/projects/travel-mcp-server/)