Structured AI · Founding Engineer.
Fourteen weeks of agent infrastructure, sandboxed runtimes, an iOS app, and durable chat — all in production.
Joined at the founding-engineering stage with a clear mandate: ship the agent infrastructure, the iOS client, and the deploy pipeline that production needed. Most of the engineering decisions had compound constraints — enterprise compliance, cross-platform parity, deploy speed — that rewarded engineers who could move fast across the stack.
Fourteen weeks of velocity: 1,035 commits, ~74/week average, peak week of 145. The work spanned backend (FastAPI + Postgres + Redis + gRPC bridge), agent loop (LangGraph + Claude with on-demand tool loading), code execution (an in-house sandbox built for Zero Data Retention compliance), the iOS / iPadOS client from zero, and CI work that cut staging deploys to under seven minutes.
Most architectural detail beyond the public timeline is NDA-bound. The deeper version is available in 1:1 conversation under appropriate scope.
In-house sandbox over hosted PTC
Anthropic's hosted Programmatic Tool Calling doesn't support Zero Data Retention, and enterprise clients required ZDR. Built a request-interception layer that handles PTC traffic locally: untrusted code runs in a hardened in-house pod (JSON-RPC over a child subprocess, AST validation, bubblewrap + seccomp), and customer data is wiped immediately and never reaches the model provider.
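The interception layer itself is NDA-bound, but the pre-execution gate is a standard pattern. A minimal sketch of the kind of AST check that runs before intercepted code reaches the sandboxed subprocess; the allowlist and rules here are illustrative, not the production policy:

```python
import ast

# Hypothetical allowlist for illustration; the real policy is NDA-bound.
ALLOWED_IMPORTS = {"json", "math", "re", "datetime"}

class CodePolicyError(Exception):
    """Raised when intercepted PTC code violates the static policy."""

def validate_tool_code(source: str) -> ast.Module:
    """Statically vet intercepted tool code before it is handed to the sandbox."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Block imports outside the allowlist.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if isinstance(node, ast.Import):
                roots = [alias.name.split(".")[0] for alias in node.names]
            else:
                roots = [(node.module or "").split(".")[0]]
            for root in roots:
                if root not in ALLOWED_IMPORTS:
                    raise CodePolicyError(f"import of {root!r} is not allowed")
        # Block dunder attribute access, a common sandbox-escape vector.
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise CodePolicyError(f"access to {node.attr!r} is not allowed")
    return tree
```

Anything that passes the static gate still runs inside the sandboxed subprocess; the AST check is a cheap first filter, not the isolation boundary.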
Multi-vendor agent (Claude reasoning + Gemini vision)
Anthropic Claude drives the reasoning loop; high-resolution document inspection is delegated to Google Gemini's vision capability through a multi-tile rendering pipeline that yields significantly higher effective resolution than naive full-page rendering. Different providers, different strengths: the agent uses each where it wins.
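The production pipeline is NDA-bound; a minimal sketch of the tiling idea, with an illustrative grid size and overlap:

```python
from PIL import Image

def tile_sheet(page: Image.Image, grid: tuple[int, int] = (3, 2), overlap_px: int = 64):
    """Split a large drawing sheet into overlapping tiles so each tile lands
    near the vision model's native input resolution instead of one downscaled page."""
    cols, rows = grid
    width, height = page.size
    tile_w, tile_h = width // cols, height // rows
    tiles = []
    for row in range(rows):
        for col in range(cols):
            # Overlap keeps details near tile boundaries from being cut in half.
            left = max(col * tile_w - overlap_px, 0)
            top = max(row * tile_h - overlap_px, 0)
            right = min((col + 1) * tile_w + overlap_px, width)
            bottom = min((row + 1) * tile_h + overlap_px, height)
            tiles.append(((row, col), page.crop((left, top, right, bottom))))
    return tiles
```

Each tile is sent with its grid coordinates, so whatever the vision model finds can be mapped back to a location on the original sheet.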
Synthesis over framework adoption
Built three prior agents on LangGraph Deep Agents, raw LangGraph nodes, and Anthropic's Claude Code SDK loop. None hit the bar. The shipped agent is a synthesis: it takes the strongest patterns from each (prefetch-backed tools and server-side compaction, the workspace + execute model, a minimal-tool philosophy, adaptive thinking) and strips out every line of framework overhead.
Modular agent loop with on-demand tool loading and persistent skill / memory injection. Cache-friendly via Anthropic's ephemeral cache_control directive — the right primitive for cost-sensitive long-context agents.
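A minimal sketch of the caching pattern against the public Anthropic SDK; the prompt, skills, and model choice are placeholders, not the production values:

```python
import anthropic

client = anthropic.Anthropic()

# Illustrative only: the real system prompt and injected skills are NDA-bound.
SYSTEM_PROMPT = "You are the document agent."
SKILL_SNIPPETS = "Persistent skill / memory text injected each turn."

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # model choice is illustrative
    max_tokens=2048,
    system=[
        {
            # Stable prefix (system prompt + skills) is marked with the
            # ephemeral cache_control block, so long-context turns reuse it
            # instead of paying for the full prefix every call.
            "type": "text",
            "text": SYSTEM_PROMPT + "\n\n" + SKILL_SNIPPETS,
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize sheet A-101."}],
)
```

The same idea extends to the tool definitions: keep the stable, bulky parts of the context in the cached prefix and let only the per-turn messages vary.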
Construction drawings are 36×24-inch sheets — naive full-page rendering loses critical detail. Multi-tile rendering yields significantly higher effective resolution at the same per-tile cost. Multi-vendor design where each provider plays its strength.
Linux user-namespace isolation + syscall filtering + pre-execution AST static analysis. Same-class hardening as production code-execution platforms (Replit / E2B), built in-house for ZDR-compliant enterprise deployment.
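The production sandbox profile and seccomp filter are NDA-bound; a minimal sketch of a bubblewrap launch with illustrative flags:

```python
import subprocess

def run_in_sandbox(entrypoint: str, workdir: str, timeout_s: int = 30):
    """Launch untrusted code under bubblewrap with user-namespace isolation.
    Flags are illustrative; the production profile and seccomp policy differ."""
    argv = [
        "bwrap",
        "--unshare-all",        # fresh user / pid / net / ipc namespaces
        "--die-with-parent",    # kill the sandbox if the supervisor dies
        "--ro-bind", "/usr", "/usr",
        "--ro-bind", "/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", workdir, "/workspace",
        "--chdir", "/workspace",
        "python3", entrypoint,
    ]
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
```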
Anthropic's hosted PTC doesn't support Zero Data Retention. Custom interception layer keeps execution local while preserving Claude's PTC code-quality. ZDR compliance without sacrificing model capability.
Cross-process typed RPC at scale. Protobuf schemas double as the API contract — no drift between client and server.
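A minimal sketch of what the bridge looks like from the Python side, assuming hypothetical protoc-generated modules (chat_pb2, chat_pb2_grpc) standing in for the real NDA-bound schema:

```python
import grpc

# Hypothetical modules generated by protoc from the shared .proto contract.
import chat_pb2
import chat_pb2_grpc

def send_message(conversation_id: str, text: str):
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = chat_pb2_grpc.ChatServiceStub(channel)
        request = chat_pb2.SendRequest(conversation_id=conversation_id, text=text)
        # The generated stub enforces field names and types on both sides,
        # so the client and server cannot drift on the contract.
        return stub.SendMessage(request, timeout=5.0)
```

The same .proto file drives the Swift client's generated types, which is what makes the schema the single source of truth.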
Chat survives network blips, pod restarts, and deploys without lost messages. Replay-on-reconnect is the modern primitive for production conversational UI.
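The production mechanism is NDA-bound; a minimal sketch of replay-on-reconnect using Redis Streams (Redis is already in the stack, but treat the specifics here as illustrative):

```python
import redis

r = redis.Redis()

def append_message(conversation_id: str, role: str, text: str) -> str:
    """Persist each chat event to a per-conversation stream; the returned
    stream ID doubles as the client's resume cursor."""
    return r.xadd(f"chat:{conversation_id}", {"role": role, "text": text}).decode()

def replay_since(conversation_id: str, last_seen_id: str = "0-0", block_ms: int = 5000):
    """On reconnect the client sends its last acked ID; everything after it
    is replayed, so a dropped socket or pod restart loses nothing."""
    streams = r.xread({f"chat:{conversation_id}": last_seen_id}, block=block_ms)
    for _stream, entries in streams or []:
        for entry_id, fields in entries:
            yield entry_id.decode(), {k.decode(): v.decode() for k, v in fields.items()}
```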
Container-native serverless compute for the agent pod. ACR Tasks build with Blacksmith BuildKit cache. CPU-only torch + blobless wipe-guard clone cut staging deploys from 10+ min to under 7.
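Two of the concrete levers, sketched as an illustrative deploy-script excerpt (the repo URL and paths are placeholders, and the rest of the pipeline is NDA-bound):

```python
import subprocess

REPO = "https://example.com/org/agent.git"  # placeholder

def fast_checkout(dest: str = "/build/src") -> None:
    """Blobless clone: fetches commits and trees up front but pulls file
    contents lazily, keeping the checkout step small."""
    subprocess.run(["git", "clone", "--filter=blob:none", REPO, dest], check=True)

def install_cpu_torch() -> None:
    """CPU-only torch wheel: skips the multi-GB CUDA payload the agent pod
    never uses, which is most of the image-build savings."""
    subprocess.run(
        ["pip", "install", "torch", "--index-url", "https://download.pytorch.org/whl/cpu"],
        check=True,
    )
```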
- Founding-engineering velocity isn't typing speed; it's the ability to make 'good enough now, refactor next sprint' calls without stopping the line. Every week I shipped, I also paid down a known piece of debt I'd taken the prior week. Net: forward motion every Monday.
- Multi-vendor agent design is increasingly important: single-provider lock-in is a liability for both compliance and capability. The pattern of 'reasoning model + delegated vision model' will keep generalizing.
- The most valuable infrastructure I shipped was the CI work. Cutting staging deploys from 10+ minutes to under 7 changed how the whole team iterated.