// NOTES · 30 POSTS

The notes.

Technical notes on what shipping production AI infrastructure in 2026 actually looks like — agent compliance, multi-vendor orchestration, MCP, the trade-offs nobody warns you about.

01 · 2026-06-05 · 8 MIN READ

Tessen: Building the Harness for AI Agents — From Forensic Capture to Runtime Control

Traditional observability treats an agent call like a web request. But an agent is a program that thinks, and it fails in ways a span can't show you. Tessen is the harness I'm building: two lines of code to capture everything your agent actually does in production, plus runtime control that catches the runaway loop before the bill does.


Why Vector Similarity Alone Lies in RAG (and the Rerank Step Most Pipelines Skip)

How to Evaluate a Founding Engineer in 2026: A Playbook for Seed-Stage Founders

Structured Outputs vs Tool Calling for LLM Data Extraction: Pick by Intent, Not by Habit

Why Prompt-Injection Filters Don't Save You (and What Actually Limits the Blast Radius)

Why Your Agent's UI Lags Behind Its Tool Calls (and the Streaming JSON Parser That Fixes It)

Picking a Durable Workflow Engine for AI Agents in 2026: Trigger.dev v4 vs Inngest vs Temporal

Why Your iOS Streaming Chat Is Cooking the GPU (and the 30-Line Debounce Buffer That Fixes It)

Shipping 100+ Tools to Claude Without Bloating the Cache: Anthropic Tool Search and Deferred Loading

When to Use SSE vs WebSocket for AI Agent Streaming (and Why I Use Both)

Building a Sub-2-Second Sales Coach: Two-Path Architecture for Real-Time Conversation AI

Anonymizing PII Client-Side Before It Reaches the LLM (Why I Don't Trust the Gateway)

Skill and Memory Injection for Agent Loops: Why I Don't Let the Agent Page Its Own Memory

What 1M Context Actually Buys You (and What It Doesn't): Production Patterns from a 2026 Agent Loop

Why I Use a Postgres Append-Only Log for Agent Chat (Not Redis Streams)

Bubblewrap, Landlock, gVisor, Firecracker: Choosing a Sandbox for AI Agent Code Execution in 2026

Four Production Reliability Patterns for AI Agents (Beyond Retry-With-Backoff)

Why Every Tool in Your MCP Server Needs a Different TTL

Building a Black-Box Flight Recorder for Streaming Anomalies

From Days to Hours: Migrating a 20M-Record Wikipedia ML Pipeline From Sync to Async

Why I Run Postgres Migrations on Container Startup, Not From CI

What I Learned About Anthropic's Prompt Cache From Running an Agent Loop in Production

Why I Use gRPC for the Agent-to-Sandbox Bridge (and JSON-RPC Inside It)

Why I Built My Own Agent Eval Harness Instead of Reaching for LangSmith

Shipping an Agent iOS App From Zero in Two Weeks: What Survived, What Didn't

Building a Zero-Data-Retention Layer for Production LLM Agents

Multi-Vendor Agent Design: Why One Model Isn't Enough in 2026

Designing Tool Surfaces for LLM Agents: What Goes On the Tool, What Stays In the Loop

Picking MCP Servers for an Agent Without Drowning the Context Window: A Selection Heuristic for 2026

What the Bubblewrap Sandbox Escape Tells Us About Agent Runtime Hardening in 2026