The notes.
Technical notes on what shipping production AI infrastructure in 2026 actually looks like — agent compliance, multi-vendor orchestration, MCP, the trade-offs nobody warns you about.
Shipping an Agent iOS App From Zero in Two Weeks: What Survived, What Didn't
Native iOS clients are still the highest-fidelity surface for an AI agent — push, haptics, secure enclave, real local state. Here are the five engineering calls that survived contact with production, and the two I'd undo if I were starting over.
Why I Built My Own Agent Eval Harness Instead of Reaching for LangSmith
The off-the-shelf agent observability tools (LangSmith, Braintrust, Phoenix, Langfuse) are excellent for what they do. They are not eval harnesses. Here's the difference, why it matters, and the ~200-line Redis-coordinated wave scheduler I wrote when I needed a real one.
Why I Use gRPC for the Agent-to-Sandbox Bridge (and JSON-RPC Inside It)
Most teams pick one wire protocol and use it everywhere. The right answer is to pick by trust boundary: gRPC + Protobuf for the API-to-pod hop where you control both ends, JSON-RPC over a subprocess pipe inside the sandbox where the network surface has to be zero. Here's the two-tier design and the math behind it.
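The inner tier of that design can be sketched in a few lines: JSON-RPC 2.0 framed over a subprocess's stdin/stdout, so the sandboxed process needs no network surface at all. This is a minimal illustration, not the article's implementation; the method name and payload are hypothetical.

```python
import json
import subprocess

def rpc_call(proc: subprocess.Popen, method: str, params: dict, req_id: int) -> dict:
    """Send one JSON-RPC 2.0 request down the sandbox's stdin pipe and
    read one newline-delimited response back from its stdout."""
    request = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    proc.stdin.write((json.dumps(request) + "\n").encode())
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())
```

Because the transport is a pipe the sandbox inherits at spawn time, there is no port to firewall and nothing for the sandboxed code to dial out to.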
What I Learned About Anthropic's Prompt Cache From Running an Agent Loop in Production
Prompt caching is sold as a 90% cost cut. In production agent loops it can quietly become a 30% cost increase, depending on five things the docs do not put in big letters. Here are the patterns that made the math actually work for me, including the on-demand tool loading trick that keeps the cache alive.
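The "quiet cost increase" is simple arithmetic. Using Anthropic's published multipliers for the 5-minute cache (a cache write bills at roughly 1.25x the base input rate, a cache read at roughly 0.1x), a prefix that gets invalidated every turn pays the write premium on every turn and never collects the read discount. A back-of-envelope sketch:

```python
# Relative cost of a cached prompt prefix vs. not caching it at all,
# using Anthropic's published 5-minute-cache multipliers.
WRITE_MULT = 1.25  # cache write: ~1.25x base input price
READ_MULT = 0.10   # cache read:  ~0.1x base input price

def relative_cost(turns: int, hits: int) -> float:
    """Cost of the prefix over `turns` turns, relative to uncached (1.0).
    Every turn that misses the cache re-writes it at the write premium."""
    misses = turns - hits
    return (misses * WRITE_MULT + hits * READ_MULT) / turns

# relative_cost(10, 9) -> 0.215  (cache stays alive: big savings)
# relative_cost(10, 0) -> 1.25   (cache busted every turn: 25% MORE than uncached)
```

The break-even is brutal: a single reuse already pays for the write, but anything that mutates the cached prefix mid-loop (tool definitions, system prompt, injected state) flips the sign.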
Why I Run Postgres Migrations on Container Startup, Not From CI
The internet consensus is that database migrations belong in your CI/CD pipeline, never in your container's entrypoint. The consensus is right for the wrong reasons. Here are the four coordination problems people are actually trying to avoid, the 30-line Postgres advisory-lock pattern that solves all four, and why container-startup migrations are the simplest deploy story for a small team that doesn't have a release engineer.
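The core of the advisory-lock pattern fits in a screenful. This is a hedged sketch, not the article's 30 lines: it assumes a psycopg2-style connection and a `run_migrations` callable you supply; the lock name is an arbitrary constant your replicas agree on.

```python
import hashlib

def advisory_lock_key(name: str) -> int:
    """Derive a stable signed 64-bit key for pg_advisory_lock from a name,
    so every replica computes the same lock without hardcoding a magic int."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def migrate_on_startup(conn, run_migrations):
    """Entrypoint guard: exactly one replica runs migrations; the rest block
    on the lock, then wake up to an already-migrated schema and no-op."""
    key = advisory_lock_key("schema-migrations")
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_lock(%s)", (key,))
        try:
            run_migrations(conn)  # your migration runner; must be idempotent
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
```

The session-level advisory lock is the whole trick: it serializes N replicas starting at once without any external coordinator, and Postgres releases it automatically if the holder's connection dies mid-migration.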
What the Bubblewrap Sandbox Escape Tells Us About Agent Runtime Hardening in 2026
An autonomous agent that can disable its own sandbox is a sandbox you no longer have. Lessons from a real 2026 escape — and the four-layer model I use to reason about agent runtime isolation in production.
Picking MCP Servers for an Agent Without Drowning the Context Window: A Selection Heuristic for 2026
An MCP server costs roughly 500-1,000 tokens of context per tool it exposes, billed on every turn, forever. The right number for a production agent is almost always 3-5 servers, not 15. Here's the heuristic I use and the math behind it.
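The arithmetic behind that claim is easy to sketch, assuming the midpoint figure of 750 tokens per tool and that tool schemas are resent on every turn of the loop:

```python
# Back-of-envelope cost of MCP tool schemas in an agent loop.
TOKENS_PER_TOOL = 750  # midpoint of the 500-1,000 range above

def context_overhead(num_tools: int, turns: int) -> int:
    """Tokens spent on tool schemas alone across a run: the schemas ride
    along in the context window on every single turn."""
    return TOKENS_PER_TOOL * num_tools * turns

# 15 tools over a 40-turn run:
# context_overhead(15, 40) -> 450_000 tokens on schemas alone.
# A curated 4 tools over the same run:
# context_overhead(4, 40)  -> 120_000 tokens.
```

That's input tokens you pay for before the agent has read a single byte of actual task state, which is why the curation heuristic matters more than any per-tool optimization.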
Designing Tool Surfaces for LLM Agents: What Goes On the Tool, What Stays In the Loop
Tool-surface design is the highest-leverage knob in production agent infrastructure — and the one most engineers underweight. Here's the design language for cache-friendly, token-minimal, domain-shaped tools that scale past the demo.
Multi-Vendor Agent Design: Why One Model Isn't Enough in 2026
Single-vendor agent architectures are a 2024 pattern. In 2026, the right move is splitting the loop — Claude for reasoning, Gemini for high-resolution vision, Llama (via Groq) for sub-100ms hot paths. Here's the orchestration shape that actually ships.
Building a Zero-Data-Retention Layer for Production LLM Agents
Anthropic's hosted Programmatic Tool Calling is fast, accurate, and absolutely incompatible with Zero Data Retention. Here's the request-interception pattern enterprise teams use to keep customer data on-prem while preserving model code quality.