The notes.
Technical notes on what shipping production AI infrastructure in 2026 actually looks like — agent compliance, multi-vendor orchestration, MCP, the trade-offs nobody warns you about.
Shipping an Agent iOS App From Zero in Two Weeks: What Survived, What Didn't
Native iOS clients are still the highest-fidelity surface for an AI agent — push, haptics, secure enclave, real local state. Here are the five engineering calls that survived contact with production, and the two I'd undo if I were starting over.
Why I Built My Own Agent Eval Harness Instead of Reaching for LangSmith
The off-the-shelf agent observability tools (LangSmith, Braintrust, Phoenix, Langfuse) are excellent for what they do. They are not eval harnesses. Here's the difference, why it matters, and the ~200-line Redis-coordinated wave scheduler I wrote when I needed a real one.
Why I Use gRPC for the Agent-to-Sandbox Bridge (and JSON-RPC Inside It)
Most teams pick one wire protocol and use it everywhere. The right answer is to pick by trust boundary: gRPC + Protobuf for the API-to-pod hop where you control both ends, JSON-RPC over a subprocess pipe inside the sandbox where the network surface has to be zero. Here's the two-tier design and the math behind it.
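The inner tier of that design can be sketched in a few lines: JSON-RPC 2.0 framed over a subprocess's stdin/stdout, so the sandboxed process needs no network surface at all. This is a minimal illustration, not the article's implementation; the method name and payload are hypothetical.

```python
import json
import subprocess

def rpc_call(proc: subprocess.Popen, method: str, params: dict, req_id: int) -> dict:
    """Send one JSON-RPC 2.0 request down the sandbox's stdin pipe and
    read one newline-delimited response back from its stdout."""
    request = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    proc.stdin.write((json.dumps(request) + "\n").encode())
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())
```

Because the transport is a pipe the sandbox inherits at spawn time, there is no port to firewall and nothing for the sandboxed code to dial out to.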
What I Learned About Anthropic's Prompt Cache From Running an Agent Loop in Production
Prompt caching is sold as a 90% cost cut. In production agent loops it can quietly become a 30% cost increase, depending on five things the docs do not put in big letters. Here are the patterns that made the math actually work for me, including the on-demand tool loading trick that keeps the cache alive.
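The "quiet cost increase" is simple arithmetic. Using Anthropic's published multipliers for the 5-minute cache (a cache write bills at roughly 1.25x the base input rate, a cache read at roughly 0.1x), a prefix that gets invalidated every turn pays the write premium on every turn and never collects the read discount. A back-of-envelope sketch:

```python
# Relative cost of a cached prompt prefix vs. not caching it at all,
# using Anthropic's published 5-minute-cache multipliers.
WRITE_MULT = 1.25  # cache write: ~1.25x base input price
READ_MULT = 0.10   # cache read:  ~0.1x base input price

def relative_cost(turns: int, hits: int) -> float:
    """Cost of the prefix over `turns` turns, relative to uncached (1.0).
    Every turn that misses the cache re-writes it at the write premium."""
    misses = turns - hits
    return (misses * WRITE_MULT + hits * READ_MULT) / turns

# relative_cost(10, 9) -> 0.215  (cache stays alive: big savings)
# relative_cost(10, 0) -> 1.25   (cache busted every turn: 25% MORE than uncached)
```

The break-even is brutal: a single reuse already pays for the write, but anything that mutates the cached prefix mid-loop (tool definitions, system prompt, injected state) flips the sign.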
Why I Run Postgres Migrations on Container Startup, Not From CI
The internet consensus is that database migrations belong in your CI/CD pipeline, never in your container's entrypoint. The consensus is right for the wrong reasons. Here are the four coordination problems people are actually trying to avoid, the 30-line Postgres advisory-lock pattern that solves all four, and why container-startup migrations are the simplest deploy story for a small team that doesn't have a release engineer.
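The core of the advisory-lock pattern fits in a screenful. This is a hedged sketch, not the article's 30 lines: it assumes a psycopg2-style connection and a `run_migrations` callable you supply; the lock name is an arbitrary constant your replicas agree on.

```python
import hashlib

def advisory_lock_key(name: str) -> int:
    """Derive a stable signed 64-bit key for pg_advisory_lock from a name,
    so every replica computes the same lock without hardcoding a magic int."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def migrate_on_startup(conn, run_migrations):
    """Entrypoint guard: exactly one replica runs migrations; the rest block
    on the lock, then wake up to an already-migrated schema and no-op."""
    key = advisory_lock_key("schema-migrations")
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_lock(%s)", (key,))
        try:
            run_migrations(conn)  # your migration runner; must be idempotent
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
```

The session-level advisory lock is the whole trick: it serializes N replicas starting at once without any external coordinator, and Postgres releases it automatically if the holder's connection dies mid-migration.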
What the Bubblewrap Sandbox Escape Tells Us About Agent Runtime Hardening in 2026
An autonomous agent that can disable its own sandbox is a sandbox you no longer have. Lessons from a real 2026 escape — and the four-layer model I use to reason about agent runtime isolation in production.
Picking MCP Servers for an Agent Without Drowning the Context Window: A Selection Heuristic for 2026
An MCP server costs roughly 500-1,000 tokens of context per tool it exposes, billed on every turn, forever. The right number for a production agent is almost always 3-5 servers, not 15. Here's the heuristic I use and the math behind it.
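The arithmetic behind that claim is easy to sketch, assuming the midpoint figure of 750 tokens per tool and that tool schemas are resent on every turn of the loop:

```python
# Back-of-envelope cost of MCP tool schemas in an agent loop.
TOKENS_PER_TOOL = 750  # midpoint of the 500-1,000 range above

def context_overhead(num_tools: int, turns: int) -> int:
    """Tokens spent on tool schemas alone across a run: the schemas ride
    along in the context window on every single turn."""
    return TOKENS_PER_TOOL * num_tools * turns

# 15 tools over a 40-turn run:
# context_overhead(15, 40) -> 450_000 tokens on schemas alone.
# A curated 4 tools over the same run:
# context_overhead(4, 40)  -> 120_000 tokens.
```

That's input tokens you pay for before the agent has read a single byte of actual task state, which is why the curation heuristic matters more than any per-tool optimization.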
Designing Tool Surfaces for LLM Agents: What Goes On the Tool, What Stays In the Loop
Tool-surface design is the highest-leverage knob in production agent infrastructure — and the one most engineers underweight. Here's the design language for cache-friendly, token-minimal, domain-shaped tools that scale past the demo.
Multi-Vendor Agent Design: Why One Model Isn't Enough in 2026
Single-vendor agent architectures are a 2024 pattern. In 2026, the right move is splitting the loop — Claude for reasoning, Gemini for high-resolution vision, Llama (via Groq) for sub-100ms hot paths. Here's the orchestration shape that actually ships.
Building a Zero-Data-Retention Layer for Production LLM Agents
Anthropic's hosted Programmatic Tool Calling is fast, accurate, and absolutely incompatible with Zero Data Retention. Here's the request-interception pattern enterprise teams use to keep customer data on-prem while preserving model code quality.