Agent Infrastructure
The engineering patterns I use when building agent runtimes for seed-stage AI startups — runtime sandboxing, MCP servers, tool surfaces, eval harnesses, durable workflows, and the prompt-cache hierarchy that keeps agents fast and cheap in production.
Most of what passes for “agent infrastructure” in public conversation is the model layer: which provider, which model size, which prompt-engineering trick. The model is the easy part. The infrastructure that decides whether your agent survives contact with real users is everywhere else — the sandbox the agent executes inside, the tool surface it sees, the cache it hits on every turn, the eval harness that catches regressions before they ship, the workflow engine that resumes after a deploy. This is a hub for the field notes I have written on each layer, organized in the order I think about them when designing a new system.
Each section below links to the full posts. The posts are written from production reps — every recommendation has been run against real customer load, and every anti-pattern is one I have either shipped or watched a peer ship. If you are hiring an engineer to build any of this, the playbook for that hire lives at How to Evaluate a Founding Engineer in 2026.
Runtime sandboxing & isolation
How to keep an autonomous coding agent inside its sandbox in 2026 — the layered model after the Bubblewrap escape, comparison of bwrap / Landlock / gVisor / Firecracker / microVM approaches, and the bridge between agent and sandbox.
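A minimal sketch of the namespace layer, assuming a Node supervisor that runs the agent's commands through bwrap. The mount paths, workspace location, and entrypoint are placeholders, and this is one layer of the stack the post describes, not the whole defense:

```ts
import { spawn } from "node:child_process";

// One layer only: namespace isolation via bubblewrap. Paths below are
// illustrative and vary by distro.
const bwrapArgs = [
  "--unshare-all",     // fresh user/pid/net/ipc/uts/mount namespaces
  "--die-with-parent", // tear the sandbox down if the supervisor dies
  "--new-session",     // block TIOCSTI-style terminal injection
  "--ro-bind", "/usr", "/usr", // read-only system directories
  "--proc", "/proc",
  "--dev", "/dev",
  "--tmpfs", "/tmp",
  // The only writable mount: this task's scratch workspace.
  "--bind", "/srv/agent-workspaces/task-42", "/workspace",
  "--chdir", "/workspace",
  "/usr/bin/python3", "agent_task.py",
];

const child = spawn("bwrap", bwrapArgs, { stdio: ["ignore", "pipe", "pipe"] });
child.stdout?.on("data", (chunk) => process.stdout.write(chunk));
child.on("exit", (code) => console.log(`sandbox exited with code ${code}`));
```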
Tool surface & MCP design
MCP server selection without drowning the context window, how to design tool schemas that don't blow your prompt cache, deferred-loading tool indexes, and per-tool TTLs for caching; a minimal deferred-loading sketch follows the post list.
- Designing Tool Surfaces for LLM Agents: What Goes On the Tool, What Stays In the Loop (7 min)
- Picking MCP Servers for an Agent Without Drowning the Context Window: A Selection Heuristic for 2026 (6 min)
- Shipping 100+ Tools to Claude Without Bloating the Cache: Anthropic Tool Search and Deferred Loading (7 min)
- Why Every Tool in Your MCP Server Needs a Different TTL (7 min)
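The deferred-loading idea in miniature: the model sees one search meta-tool up front, and full schemas enter the context only for tools it actually asks about. `ToolSpec` and `search_tools` are illustrative names, not an MCP SDK API:

```ts
// Deferred-loading tool index: only `search_tools` is in the prompt by
// default; full schemas are returned on demand. Names are illustrative.
interface ToolSpec {
  name: string;
  description: string; // one line; full docs stay server-side
  inputSchema: object; // JSON Schema, sent only when the tool is selected
}

const registry = new Map<string, ToolSpec>(/* 100+ tools registered here */);

// The only schema the model sees up front.
const searchToolsSchema = {
  name: "search_tools",
  description: "Find tools by keyword; returns full schemas for matches.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};

function handleSearchTools(query: string): ToolSpec[] {
  const q = query.toLowerCase();
  return [...registry.values()]
    .filter((t) => t.name.includes(q) || t.description.toLowerCase().includes(q))
    .slice(0, 5); // cap matches so a broad query cannot flood the context
}
```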
Prompt cache & context engineering
What 1M-context actually buys you in production, the prompt cache hierarchy across tools / system / messages, skill and memory injection patterns for long-running loops, and how to keep the cache hot through pivots.
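A minimal sketch of the hierarchy using the Anthropic Messages API's `cache_control` breakpoints: stable bytes first (tools, then system), volatile bytes last (messages). The model name and prompt text are placeholders:

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5", // placeholder; pick your production model
  max_tokens: 1024,
  // Tools are the most stable prefix: serialize them deterministically
  // so the cached prefix is byte-identical across turns.
  tools: [/* ...tool schemas, identical bytes every request... */],
  system: [
    {
      type: "text",
      text: "You are a coding agent. <long, stable instructions>",
      // Breakpoint: everything up to here (tools + system) is cached
      // and re-read at the discounted cache-hit price each turn.
      cache_control: { type: "ephemeral" },
    },
  ],
  // Only the message tail changes turn to turn, so only it is billed
  // at the full input-token price.
  messages: [{ role: "user", content: "Refactor the auth middleware." }],
});
console.log(response.usage); // cache_read_input_tokens confirms the hit
```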
Eval harnesses & observability
Why I built my own eval harness instead of reaching for LangSmith, agent reliability patterns from real production incidents, streaming anomaly detection via flight recorders, and what to grade when there is no ground truth.
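The skeleton of such a harness, assuming an existing `runAgent` entrypoint and a recorded case set; the deterministic `check` here stands in for an LLM rubric grader when there is no ground truth. All names are illustrative:

```ts
// Minimal eval loop: run every recorded case, grade, report, and fail
// CI on any regression. Swap `check` for an LLM rubric grader where no
// deterministic oracle exists.
interface EvalCase {
  id: string;
  input: string;
  check: (output: string) => boolean;
}

async function runEvals(
  cases: EvalCase[],
  runAgent: (input: string) => Promise<string>,
): Promise<boolean> {
  const failures: string[] = [];
  for (const c of cases) {
    const output = await runAgent(c.input);
    if (!c.check(output)) failures.push(c.id);
  }
  console.log(`${cases.length - failures.length}/${cases.length} passed`);
  if (failures.length > 0) console.log("failed:", failures.join(", "));
  return failures.length === 0; // wire into CI to block the deploy
}
```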
Durable workflows & coordination
Picking a durable workflow engine for AI agents in 2026 — replay-based vs checkpoint-based vs event-driven — and the decision tree I use across Temporal, Trigger.dev, Inngest, and LangGraph.
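What the replay-based style looks like, sketched with the shape of Temporal's TypeScript SDK: the workflow function must be deterministic, so every model and tool call lives in an activity the engine records and replays after a crash or deploy. The activity names and loop structure are illustrative:

```ts
import { proxyActivities } from "@temporalio/workflow";

// Non-deterministic work (model calls, tool execution) lives in
// activities; the workflow itself only sequences them.
const { callModel, runTool } = proxyActivities<{
  callModel(prompt: string): Promise<string>;
  runTool(name: string, args: string): Promise<string>;
}>({
  startToCloseTimeout: "2 minutes",
  retry: { maximumAttempts: 3 },
});

export async function agentWorkflow(task: string): Promise<string> {
  let context = task;
  for (let turn = 0; turn < 20; turn++) {
    const reply = await callModel(context); // recorded; replayed on resume
    if (!reply.startsWith("TOOL:")) return reply; // agent is done
    const result = await runTool("shell", reply.slice(5));
    context += `\n${reply}\n${result}`; // state survives deploys via replay
  }
  return context; // turn budget exhausted
}
```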
Streaming & client patterns
SSE versus WebSockets for agent streaming, partial-JSON tool-call parsing, iOS chat-buffer debouncing, and the Postgres append-only event log that lets web / iOS / CLI clients share one source of truth; a minimal SSE endpoint sketch follows the post list.
- When to Use SSE vs WebSocket for AI Agent Streaming (and Why I Use Both) (7 min)
- Why Your Agent's UI Lags Behind Its Tool Calls (and the Streaming JSON Parser That Fixes It) (7 min)
- Why Your iOS Streaming Chat Is Cooking the GPU (and the 30-Line Debounce Buffer That Fixes It) (7 min)
- Why I Use a Postgres Append-Only Log for Agent Chat (Not Redis Streams) (8 min)
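The SSE half in miniature, using Node's `http` module with no framework; the endpoint path, event names, and the fake token loop are placeholders, and the WebSocket side is covered in the first post above:

```ts
import http from "node:http";

http.createServer((req, res) => {
  if (req.url !== "/agent/stream") {
    res.writeHead(404);
    res.end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  // One SSE event per model delta; the `id:` field lets the browser's
  // EventSource resume from the last event after a dropped connection.
  let seq = 0;
  const timer = setInterval(() => {
    seq += 1;
    res.write(`id: ${seq}\nevent: token\ndata: ${JSON.stringify({ text: "chunk" })}\n\n`);
    if (seq === 5) { // stand-in for "model stream finished"
      res.write("event: done\ndata: {}\n\n");
      clearInterval(timer);
      res.end();
    }
  }, 100);
  req.on("close", () => clearInterval(timer)); // client went away
}).listen(3000);
```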
Multi-vendor & resilience
Designing agent layers to swap models without rewriting them, sync-to-async ML pipeline migrations, container startup migrations, and zero-data-retention configurations for regulated workloads.
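The swap layer reduced to its core: the agent loop codes against one narrow interface, each vendor SDK gets an adapter behind it, and failover becomes a wrapper rather than a rewrite. The interface and class names are illustrative assumptions:

```ts
type ChatMessage = { role: "user" | "assistant"; content: string };

// The only surface the agent loop ever sees; each vendor SDK is
// wrapped in an adapter that implements it.
interface ChatModel {
  complete(messages: ChatMessage[]): Promise<string>;
}

// Resilience as composition: try the primary vendor, fail over on error.
class FailoverModel implements ChatModel {
  constructor(
    private primary: ChatModel,
    private fallback: ChatModel,
  ) {}

  async complete(messages: ChatMessage[]): Promise<string> {
    try {
      return await this.primary.complete(messages);
    } catch (err) {
      console.warn("primary vendor failed, failing over:", err);
      return this.fallback.complete(messages);
    }
  }
}
```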
Security & adversarial patterns
Defense-in-depth for prompt injection, blast-radius scoping for agent tool surfaces, and the client-side PII anonymization layer that keeps user data out of model context entirely.
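The anonymization layer in miniature, assuming regex-detectable PII (emails only here; production needs a real detector) and a placeholder token format of my own invention. The mapping never leaves the client, so the model only ever sees tokens:

```ts
// Replace PII with stable placeholders before the text enters model
// context; restore on the way back. Email-only for brevity.
function anonymize(text: string): { redacted: string; mapping: Map<string, string> } {
  const mapping = new Map<string, string>();
  let n = 0;
  const redacted = text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, (email) => {
    n += 1;
    const token = `<EMAIL_${n}>`;
    mapping.set(token, email);
    return token;
  });
  return { redacted, mapping };
}

function deanonymize(text: string, mapping: Map<string, string>): string {
  let out = text;
  for (const [token, original] of mapping) out = out.replaceAll(token, original);
  return out;
}

const { redacted, mapping } = anonymize("Email alice@example.com about the invoice.");
// `redacted` goes into model context; `mapping` stays client-side.
```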
RAG & retrieval architecture
Two-stage retrieval-plus-rerank pipelines that actually work in 2026, the reranker decision tree per workload, and the per-stage metric set you need to debug regressions.
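The two-stage shape in code: a recall-oriented vector search over-fetches, a precision-oriented reranker reorders, and only the top slice reaches the prompt. `vectorSearch` and `rerank` are assumed helpers, not a library API:

```ts
// Stage 1 over-fetches for recall; stage 2 reranks for precision.
// Instrument each stage separately or you cannot localize regressions.
async function retrieve(query: string, k = 8): Promise<string[]> {
  const candidates = await vectorSearch(query, 50); // stage 1: wide net
  const scored = await rerank(query, candidates);   // stage 2: cross-encoder
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.text);
}

// Assumed helpers; bind these to your vector store and reranker of choice.
declare function vectorSearch(query: string, topN: number): Promise<string[]>;
declare function rerank(
  query: string,
  docs: string[],
): Promise<{ text: string; score: number }[]>;
```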