TANAY.SHAH
← FIELD REPORT/AGENT INFRASTRUCTURE
// TOPICAL HUB · 2026

Agent Infrastructure

The engineering patterns I use when building agent runtimes for seed-stage AI startups — runtime sandboxing, MCP servers, tool surfaces, eval harnesses, durable workflows, and the prompt-cache hierarchy that keeps agents fast and cheap in production.

Most of what passes for “agent infrastructure” in public conversation is the model layer: which provider, which model size, which prompt-engineering trick. The model is the easy part. The infrastructure that decides whether your agent survives contact with real users is everywhere else — the sandbox the agent executes inside, the tool surface it sees, the cache it hits on every turn, the eval harness that catches regressions before they ship, the workflow engine that resumes after a deploy. This is a hub for the field notes I have written on each layer, organized in the order I think about them when I am designing a new system.

Each section below links to the full posts. The posts are written from production reps — every recommendation has been run against real customer load, and every anti-pattern is one I have either shipped or watched a peer ship. If you are hiring an engineer to build any of this, the playbook for that hire lives at How to Evaluate a Founding Engineer in 2026.

// 01 · RUNTIME SANDBOXING & ISOLATION

Runtime sandboxing & isolation

How to keep an autonomous coding agent inside its sandbox in 2026 — the layered model after the Bubblewrap escape, comparison of bwrap / Landlock / gVisor / Firecracker / microVM approaches, and the bridge between agent and sandbox.
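To make the layered model concrete, here is a minimal sketch of the innermost layer: constructing a `bwrap` invocation that gives the agent a read-only toolchain, one writable scratch mount, and no network. The specific mounts and the `task.py` entrypoint are illustrative, not a prescription — the full posts cover the layers above this one.

```python
# Sketch: build a bwrap argv for one agent task. Paths are illustrative;
# tune mounts per workload and add Landlock/seccomp layers on top.
def bwrap_argv(workdir: str, cmd: list[str]) -> list[str]:
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",     # read-only toolchain
        "--ro-bind", "/lib", "/lib",
        "--bind", workdir, "/work",      # the only writable mount
        "--chdir", "/work",
        "--unshare-all",                 # fresh pid/net/ipc/uts/user namespaces
        "--die-with-parent",             # sandbox dies with the bridge process
        "--",
        *cmd,
    ]

argv = bwrap_argv("/tmp/agent-scratch", ["python3", "task.py"])
```

The bridge process on the host side holds the other end of this: it spawns the argv, streams stdout/stderr back to the agent loop, and enforces timeouts.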

// 02 · TOOL SURFACE & MCP DESIGN

Tool surface & MCP design

MCP server selection without drowning the context window, how to design tool schemas that don't blow your prompt cache, deferred-loading tool indexes, and per-tool TTLs for caching.
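The deferred-loading idea can be sketched in a few lines — assuming a hypothetical `ToolIndex` where the system prompt only ever sees tool names, and full JSON schemas are fetched on first use and cached with a per-tool TTL:

```python
import time

# Sketch of a deferred-loading tool index. The agent's prompt carries
# only summary() output; schema() loads lazily and caches with a TTL.
# All names here are illustrative, not an MCP API.
class ToolIndex:
    def __init__(self, loaders, default_ttl=300.0):
        self.loaders = loaders          # name -> callable returning full schema
        self.default_ttl = default_ttl
        self._cache = {}                # name -> (schema, expires_at)

    def summary(self):
        """Cheap listing for the system prompt: names only, no schemas."""
        return sorted(self.loaders)

    def schema(self, name, ttl=None):
        schema, expires = self._cache.get(name, (None, 0.0))
        if time.monotonic() < expires:
            return schema               # cache hit: the prompt prefix stays stable
        schema = self.loaders[name]()   # deferred load on first real use
        self._cache[name] = (schema, time.monotonic() + (ttl or self.default_ttl))
        return schema

index = ToolIndex({"search_docs": lambda: {"type": "object", "properties": {}}})
```

The point of the TTL is cache stability: a schema that mutates mid-session invalidates every cached prompt prefix after it.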

// 03 · PROMPT CACHE & CONTEXT ENGINEERING

Prompt cache & context engineering

What 1M-context actually buys you in production, the prompt cache hierarchy across tools / system / messages, skill and memory injection patterns for long-running loops, and how to keep the cache hot through pivots.
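The hierarchy is easiest to see as request assembly order: most-stable content first, with a cache breakpoint after each tier so a change late in the prompt only invalidates the suffix. This sketch uses Anthropic's `cache_control` marker shape as an example; other providers mark breakpoints differently.

```python
# Sketch of the tools -> system -> messages ordering. The helper and its
# arguments are illustrative; only the ordering principle is the point.
def build_request(tools, system_text, history, user_turn):
    system = [{
        "type": "text",
        "text": system_text,
        "cache_control": {"type": "ephemeral"},  # breakpoint after the stable tiers
    }]
    messages = history + [{"role": "user", "content": user_turn}]
    return {"tools": tools, "system": system, "messages": messages}

req = build_request(
    tools=[{"name": "search", "input_schema": {"type": "object"}}],
    system_text="You are a careful coding agent.",
    history=[],
    user_turn="List the repo's test commands.",
)
```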

// 04 · EVAL HARNESSES & OBSERVABILITY

Eval harnesses & observability

Why I built my own eval harness instead of reaching for LangSmith, agent reliability patterns from real production incidents, streaming anomaly detection via flight recorders, and what to grade when there is no ground truth.
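The "no ground truth" grading shape reduces to something like this — each case pairs an input with programmatic checks rather than an exact expected output. The agent stub and check names are placeholders:

```python
# Minimal eval-harness sketch: grade open-ended output with predicate
# checks instead of exact-match ground truth.
def run_evals(agent, cases):
    results = []
    for case in cases:
        output = agent(case["input"])
        failures = [name for name, check in case["checks"].items()
                    if not check(output)]
        results.append({
            "input": case["input"],
            "pass": not failures,
            "failures": failures,
        })
    return results

cases = [{
    "input": "summarize the incident",
    "checks": {
        "nonempty": lambda out: len(out.strip()) > 0,
        "mentions_incident": lambda out: "incident" in out.lower(),
    },
}]
report = run_evals(lambda prompt: f"Incident summary for: {prompt}", cases)
```

Named checks matter more than the pass/fail bit: when a regression ships, you want to know *which* property broke, not just that something did.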

// 05 · DURABLE WORKFLOWS & COORDINATION

Durable workflows & coordination

Picking a durable workflow engine for AI agents in 2026 — replay-based vs checkpoint-based vs event-driven — and the decision tree I use across Temporal, Trigger.dev, Inngest, and LangGraph.
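To anchor the checkpoint-based branch of that comparison, here is a toy version of the mechanic: persist each step's output keyed by step name, so a crashed or redeployed worker replays completed steps from storage instead of re-running them. Real engines persist to a service, not a local JSON file — this is only the shape.

```python
import json
import pathlib

# Toy checkpoint-based durability: completed steps are skipped on replay.
class Checkpointer:
    def __init__(self, path):
        self.path = pathlib.Path(path)
        self.state = json.loads(self.path.read_text()) if self.path.exists() else {}

    def step(self, name, fn):
        if name in self.state:           # completed before a restart: skip
            return self.state[name]
        result = fn()                    # result must be JSON-serializable
        self.state[name] = result
        self.path.write_text(json.dumps(self.state))  # durable before next step
        return result
```

Replay-based engines invert this: they re-execute the workflow function and substitute recorded results, which constrains the code to be deterministic.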

// 06 · STREAMING & CLIENT PATTERNS

Streaming & client patterns

SSE versus WebSockets for agent streaming, partial-JSON tool-call parsing, iOS chat-buffer debouncing, and the Postgres append-only event log that lets web / iOS / CLI clients share one source of truth.
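Partial-JSON parsing is the least obvious item on that list, so here is a best-effort sketch: as tool-call argument deltas stream in, close any unterminated string and any open brackets so the client can render fields before the call finishes. This handles the common cases; a production parser treats string escapes and truncated keys more carefully.

```python
import json

# Best-effort repair of a truncated JSON prefix from a streaming tool call.
def parse_partial_json(buf: str):
    stack, in_string, escaped = [], False, False
    for ch in buf:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]":
            if stack:
                stack.pop()
    # Close the open string, then the open containers, innermost first.
    repaired = buf + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None                      # prefix not yet renderable
```

Returning `None` for unrenderable prefixes matters: the client keeps showing the last good parse instead of flickering through garbage.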

// 07 · MULTI-VENDOR & RESILIENCE

Multi-vendor & resilience

Designing agent layers to swap models without rewriting them, sync-to-async ML pipeline migrations, container startup migrations, and zero-data-retention configurations for regulated workloads.
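The swap-without-rewrite property comes from keeping the model seam thin. A minimal sketch, with entirely illustrative names: the agent loop depends on a `Protocol`, and each vendor gets a small adapter behind it, so changing models is a config change rather than a rewrite.

```python
from typing import Protocol

# Thin model seam: the agent loop only knows this interface.
class ChatModel(Protocol):
    def complete(self, messages: list[dict], tools: list[dict]) -> dict: ...

class EchoModel:
    """Stand-in adapter; a real one wraps a provider SDK and maps
    its message/tool formats to this shape."""
    def complete(self, messages, tools):
        return {"role": "assistant", "content": messages[-1]["content"]}

def run_turn(model: ChatModel, messages, tools=()):
    return model.complete(messages, list(tools))

reply = run_turn(EchoModel(), [{"role": "user", "content": "ping"}])
```

The adapters are also where zero-data-retention headers and per-vendor retry policy live, so regulated-workload config never leaks into agent logic.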

// 08 · SECURITY & ADVERSARIAL PATTERNS

Security & adversarial patterns

Defense-in-depth for prompt injection, blast-radius scoping for agent tool surfaces, and the client-side PII anonymization layer that keeps user data out of model context entirely.
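The anonymization layer's core move fits in a dozen lines: replace PII with stable placeholders *before* text enters model context, keeping a local mapping so responses can be re-identified client-side. The regexes here are deliberately narrow examples — real PII detection needs far broader coverage.

```python
import re

# Illustrative patterns only; production coverage is much wider.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def anonymize(text: str):
    mapping = {}                         # placeholder -> original, kept client-side
    for label, pattern in PATTERNS.items():
        def repl(m, label=label):
            token = f"<{label}_{len(mapping)}>"
            mapping[token] = m.group(0)
            return token
        text = pattern.sub(repl, text)
    return text, mapping

safe, mapping = anonymize("mail ada@example.com or call +1 415 555 0100")
```

Because the mapping never leaves the client, the model-side logs, caches, and vendor retention policies all see only placeholders.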

// 09 · RAG & RETRIEVAL ARCHITECTURE

RAG & retrieval architecture

Two-stage retrieval-plus-rerank pipelines that actually work in 2026, the reranker decision tree per workload, and the per-stage metric set you need to debug regressions.
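The two-stage shape, stripped to its skeleton: a cheap wide first stage, a narrower second stage, and both candidate lists kept visible so recall@k and rerank lift can be measured per stage. The scoring functions here are trivial stand-ins — lexical overlap for the retriever, an exact-phrase bonus for the reranker.

```python
# Stage 1: wide, cheap candidate pull (a vector search in production).
def retrieve(query, corpus, k=50):
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

# Stage 2: re-score the shortlist (a cross-encoder in production).
def rerank(query, candidates, k=5):
    def score(doc):
        return 1 if query.lower() in doc.lower() else 0
    return sorted(candidates, key=score, reverse=True)[:k]

corpus = [
    "prompt cache hierarchy notes",
    "reranker decision tree",
    "unrelated memo",
]
candidates = retrieve("reranker decision tree", corpus)
hits = rerank("decision tree", candidates, k=1)
```

Keeping `candidates` as a separate, loggable artifact is the debugging payoff: when quality drops, you can tell whether stage one stopped recalling the answer or stage two stopped ranking it first.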