TANAY.SHAH
// Document — Field Report
FILED 2026.05.08 / NEW YORK, NY

Tanay Shah.

I'm an AI engineer. I build the systems behind production chat apps and agent platforms — backends, sandboxes, iOS clients, fast.

CS Honors + Statistics · UMD · grad. Dec 2025
4 roles · 4 years · NYC-based
Open to: founding · early-stage · applied AI
Most recent: Founding Engineer @ Structured AI

// IF YOU'RE COUNTING

Most recent 14-week window: 1,035 commits, 30 PRs merged, work spanning backend, frontend, iOS, and infra at near-100% blame-ownership. Four production roles across four years. The numbers are reproducible — charts in §03, timeline in §04.

§01.5// transmission

I'm Tanay. I build production AI infrastructure: agent loops, sandboxed runtimes, durable event-sourced chat, and the iOS clients that hang off them. The goal is agents that don't just work but work well, so I think about evals as much as I think about loops.

The way I work: pick a hard problem, draw the diagram, ship the first version in days, instrument it, iterate. I think about agent systems for fun — LangGraph + Claude with on-demand tool loading; gRPC bridges between API and slim agent pods; Python sandboxes built on JSON-RPC, AST validation, and seccomp-locked bubblewrap. The diagrams in §02 are real, the numbers in §03 are measured.

// MOST RECENT CHAPTER
Fourteen weeks as Founding Engineer at Structured AI (NYC). 1,035 commits across backend, frontend, iOS, plugin work, and infra. Shipped the production document-analysis agent (the shape in §02), a Zero Data Retention (ZDR)-compliant code-execution sandbox built on the Programmatic Tool Calling (PTC) interception pattern, an in-house eval harness wired to a public domain benchmark, and the iOS app from zero. Before that: distributed systems at Intuitive Labs SF; two years of ML research at UMD's iSchool fine-tuning Llama on twenty million Wikipedia AfD comments. I started writing CNNs at sixteen.

Graduate of the University of Maryland — departmental honors in CS, Statistics minor, December 2025. The page below is instrumented; you can read it like a spec.

§02// architecture — how i build agent systems

How I architect
agent systems.

Same shape every time: a thin client, an API gateway, a slim agent pod that runs a graph runtime + LLM with on-demand tool loading, retrieval over a vector store, durable state in Postgres + Redis. The diagram below is the reference pattern — portable across agent products.
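To make "on-demand tool loading" concrete, here is a minimal sketch, not the production code. It assumes a keyword-overlap ranking; the `ToolSpec` type, the registry contents, and `search_tools` are all hypothetical names chosen for illustration. The idea is that the agent pod only puts the tools relevant to the current request into the prompt instead of every tool it knows about.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    keywords: frozenset = field(default_factory=frozenset)


# Hypothetical registry; a real one might be populated from MCP servers at startup.
REGISTRY = [
    ToolSpec("vector_rag", "Retrieve passages from the vector store",
             frozenset({"search", "document", "retrieve"})),
    ToolSpec("web_search", "Search the public web",
             frozenset({"search", "web", "news"})),
    ToolSpec("run_python", "Execute Python in the sandbox",
             frozenset({"compute", "code", "table"})),
    ToolSpec("doc_render", "Render a document to PDF",
             frozenset({"render", "pdf", "document"})),
]


def search_tools(query: str, limit: int = 2) -> list:
    """Rank tools by keyword overlap with the query; skip tools with no overlap."""
    words = set(query.lower().split())
    scored = [(len(words & tool.keywords), tool) for tool in REGISTRY]
    scored = [(score, tool) for score, tool in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps registry order on ties
    return [tool for _, tool in scored[:limit]]


loaded = search_tools("render the document as pdf")
print([tool.name for tool in loaded])  # ['doc_render', 'vector_rag']
```

A real system would more likely rank by embedding similarity than by keyword overlap; the part that carries over is that only the returned tools' schemas are added to the model's context for that turn.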

// THE SANDBOX TRICK
The code sandbox isn't there for safety alone; it's also a request interceptor. Hosted code-execution offerings typically ship customer payloads through a third-party endpoint. This sandbox lets the model believe it's executing upstream while the run actually happens on hardened in-house infra, which gives you full Zero Data Retention without giving up tool quality. Payloads are wiped immediately; the upstream provider never sees them.

  • Layered shape · portable across agent products
  • Streaming end-to-end · WSS · SSE · JSON-RPC
  • Tools discovered at runtime · MCP-shaped
  • Sandbox = interceptor · ZDR by default
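The interceptor idea can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped sandbox: the tool name `code_execution`, the `dispatch` function, and the shape of the tool-call dict are hypothetical, and the child interpreter here is not hardened (the bwrap + seccomp layer is deliberately left out).

```python
import subprocess
import sys

INTERCEPTED_TOOLS = {"code_execution"}  # hypothetical tool name


def run_locally(code: str, timeout_s: float = 5.0) -> dict:
    """Run code in a child interpreter on local infra.

    A hardened deployment would wrap this child in bwrap + seccomp;
    that layer is omitted from this sketch.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}


def dispatch(tool_call: dict, forward_upstream) -> dict:
    """Route intercepted tools to the local runner; forward everything else."""
    if tool_call["name"] in INTERCEPTED_TOOLS:
        return run_locally(tool_call["input"]["code"])
    return forward_upstream(tool_call)


result = dispatch(
    {"name": "code_execution", "input": {"code": "print(2 + 2)"}},
    forward_upstream=lambda call: {"forwarded": call["name"]},
)
print(result["stdout"].strip())  # 4
```

The result dict goes back into the agent loop in the same slot the hosted tool's result would have occupied, so the model's side of the conversation does not change while the payload stays on local infra.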
// FIG. 02 · AGENT PLATFORM · LAYERED VIEW · SCALE 1:N · PROD · 2026.04
  • CLIENTS · web client (reactive, streaming chat) · mobile client (native, streaming) · desktop plugin (host-embedded, IPC)
  • API · TRANSPORT · API gateway (auth, sessions, routes, RPC client)
  • AGENT POD · graph runtime, LLM, prompt cache, on-demand tool search, skill / memory injection · tools: vector RAG, web search, code registry, doc render
  • CODE SANDBOX (hardened · ZDR path) · bwrap, seccomp, AST validation, child subproc
  • STATE · STORE · SQL + vector (async driver, vector ext, pooling) · KV / queues (sessions, locks, DLQ) · container platform (managed jobs, registry, object store)
  • EVAL HARNESS · task suite, ground-truth → per-release tuning
  • Edge labels · WSS · SSE, HTTPS · WS, gRPC · Protobuf, JSON-RPC, asyncpg · pgvector, locks · queues, sessions
  • Legend · critical path, aux flow, streamed
§03// velocity — output, sampled

What fourteen weeks of me looks like.

1,035 commits, ~74/week on average (median 70). The chart below is a single 14-week sample of my output cadence, author-filtered to a single email. Calendar dates and milestone labels are stripped out for confidentiality; the bar shape is real.

WEEKS: 14
TOTAL: 1,035 commits
MEAN: ~74 commits / week (median 70)
PEAK WK: 145 commits
MIN WK: 30 commits
AUTHOR: single email · public git
// FIG. 03 · WEEKLY COMMIT VOLUME · N = 14
COMMITS PER WEEK: W1 30 · W2 40 · W3 35 · W4 40 · W5 60 · W6 70 · W7 85 · W8 80 · W9 70 · W10 110 · W11 140 · W12 145 (peak) · W13 80 · W14 50
// SHAPE: ramp up over the first three weeks, sustained mid-cadence through the middle, peak in the final third around a major ship, taper through the wrap. Mean ~74/week (median 70), peak 145, min 30. Counts and shape are real; calendar dates and milestone labels are out of scope here.
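The summary figures can be checked directly from the fourteen bar values in Fig. 03. A short sketch using only the standard library; the `weekly` list is read off the chart labels (W1 through W14). Computed this way, the ~74/week figure corresponds to the arithmetic mean, and the median comes out at 70.

```python
from statistics import mean, median

# Weekly commit counts W1..W14, read off the bars in Fig. 03.
weekly = [30, 40, 35, 40, 60, 70, 85, 80, 70, 110, 140, 145, 80, 50]

total = sum(weekly)
print(total)                    # 1035
print(round(mean(weekly), 1))   # 73.9
print(median(weekly))           # 70.0
print(max(weekly), min(weekly)) # 145 30
```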
§04// timeline · career
§06// capability · year × technology

Surface area —
instrumented.

A skill cloud lies. This grid tells the truth: which years used what, how deeply, and where things accelerated. 2026 is the steepest column on the page — that's by design.

// LEGEND
not used / before time
touched · learning
production-grade
shipped at scale · owned
// TECH · 2022 · 2023 · 2024 · 2025 · 2026
LANGUAGES
Python
TypeScript
Swift / SwiftUI
C# / XAML
C / C++
SQL
AGENTS · AI
Anthropic Claude
LangGraph / Deep Agents
MCP
Gemini Pro / Flash
PyTorch / TF
RAG · pgvector
Eval harness · benchmark
Prompt eng · cache · 1M ctx
BACKEND · INFRA
FastAPI
PostgreSQL · asyncpg
Redis
gRPC + Protobuf
Docker · multi-stage
Azure (Container Apps)
AWS (Fargate · Lambda · S3 · SQS)
Kafka / Spark
SANDBOXING · SECURITY
bwrap / seccomp
AST validation
FRONTEND · MOBILE
SvelteKit · Svelte 5
React / Next.js
iOS · MarkdownUI
OTHER
Typst
SearXNG · self-host
§07// faq · for recruiters & co-founders

The
quick answers.

Recruiters and co-founders ask the same questions on the first call. Here are the answers: concise enough to scan, complete enough to pre-qualify.

Q01 · Who is Tanay Shah?
Tanay Shah is an AI engineer in New York City — Founding Engineer at Structured AI (Y Combinator F25 batch), where over a 14-week tenure he shipped a production document-analysis agent, an in-house ZDR-compliant code-execution sandbox, an iOS / iPadOS client from zero, and CI work that materially cut staging deploy time (1,035 commits, author-filtered, public-git verifiable). University of Maryland Computer Science graduate (departmental honors, Statistics minor, December 2025). Open to founding and early-stage AI engineering roles in 2026. Distinct from the other public Tanay Shahs: the Senior Product Manager at Amazon in Boston, the corporate-securities attorney at Baker McKenzie in Toronto, the diagnostic-radiology professor at the University of Florida, and the M&A principal at Deloitte Consulting.
Q02 · What does Tanay build?
Production AI infrastructure: agent loops on LangGraph + Anthropic Claude with on-demand tool loading and prompt caching; sandboxed Python runtimes built on JSON-RPC + AST validation + bubblewrap + seccomp; durable event-sourced chat backends with replay-on-reconnect cursors and dead-letter queues; iOS / iPadOS clients with streaming markdown chat and Apple Sign-In OAuth; real-time data pipelines with bounded queues, drop-oldest backpressure, and on-anomaly black-box flight recorders.
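As a rough illustration of a replay-on-reconnect cursor (an in-memory sketch, not the production backend; `ChatEventLog` and its methods are hypothetical names): every event gets a monotonically increasing sequence number, the client remembers the last one it saw, and on reconnect the server replays everything after that cursor.

```python
from dataclasses import dataclass, field


@dataclass
class ChatEventLog:
    """Append-only per-conversation log; sequence numbers start at 1."""
    events: list = field(default_factory=list)

    def append(self, payload: dict) -> int:
        seq = len(self.events) + 1
        self.events.append({"seq": seq, **payload})
        return seq

    def replay_after(self, cursor: int) -> list:
        """Return every event with seq > cursor (cursor 0 replays everything)."""
        return self.events[cursor:]


log = ChatEventLog()
log.append({"type": "token", "text": "Hel"})
log.append({"type": "token", "text": "lo"})
log.append({"type": "done"})

# The client saw seq 1, dropped the connection, and reconnects with cursor=1.
missed = log.replay_after(1)
print([event["seq"] for event in missed])  # [2, 3]
```

A durable version would keep the log in a database table keyed by conversation and sequence number; the cursor contract stays the same.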
Q03 · Where is Tanay based?
New York City. Open to relocation for the right team.
Q04 · What kinds of roles is Tanay looking for?
Founding / early-stage engineer roles at applied-AI startups, infrastructure / platform engineering at scale, and big-tech SWE positions where the team ships hard problems. He's particularly drawn to teams working on agent infrastructure, MCP, multi-vendor LLM orchestration, durable chat, and AI safety / sandboxing.
Q05 · What's Tanay's strongest technical area?
AI agent infrastructure. He's shipped LangGraph + Anthropic Claude (Opus / Sonnet) agents in production, designed an in-house Programmatic-Tool-Calling sandbox that satisfies Zero Data Retention compliance for enterprise clients, built multi-vendor agent designs (Claude for reasoning, Gemini Vision for high-resolution document inspection), and contributed to the Model Context Protocol (MCP) ecosystem with a 4-star Travel MCP server.
Q06 · What was Tanay's role at Structured AI?
Founding Engineer (NYC, February 2026 – May 2026). 1,035 commits in 14 weeks across backend, frontend, iOS, plugin work, and infra — author-filtered, public-git verifiable. Shipped the production document-analysis agent (graph-runtime + LLM), an in-house ZDR-compliant code-execution sandbox, the iOS app from zero, and CI work that materially cut staging deploy time.
Q07 · How fast does Tanay ship?
Sustained: ~74 commits per week on average (median 70) over a documented 14-week sample. Peak week: 145 commits. The sample is author-filtered to a single email and reproducible from public git. The cadence holds across backend, frontend, iOS, and infra work simultaneously.
Q08 · What's Tanay's tech stack?
Languages: Python (3.12, asyncio TaskGroup / ExceptionGroup), TypeScript (5.5+), Swift / SwiftUI, C# / XAML, SQL, C / C++.
Agents / AI: Anthropic Claude (Opus / Sonnet), Google Gemini 3.1 Pro Vision, OpenAI GPT-4o, Llama 3.3 70B (via Groq), LangGraph, Deep Agents, Anthropic Claude Code SDK, MCP.
Backend: FastAPI, PostgreSQL (async drivers, pgvector, connection pooling), Redis, gRPC + Protobuf, Trigger.dev v4.
Storage: Drift (Flutter SQLite), ObjectBox vectors, TensorFlow Lite, Apache AGE (Postgres graph), MongoDB.
Frontend / Mobile: SvelteKit + Svelte 5, React + Next.js, Tailwind, Swift / SwiftUI, Flutter / Dart.
Infra: Azure (Container Apps + Jobs, container registry, build cache), AWS (Fargate, Lambda), Vercel, Docker (multi-stage).
Security: bubblewrap, seccomp, AST validation.
Q09 · How can I reach Tanay?
Email tanayshah2024@gmail.com (fastest), LinkedIn at linkedin.com/in/tanayshah11, or GitHub at github.com/tanayshah11. His résumé is downloadable from tanayshah.dev. He responds to recruiter outreach within 24 hours.
Q10 · Does Tanay have open-source work?
Yes. Notable repos at github.com/tanayshah11: travel-mcp-server (4 stars — MCP server for AI travel search), ai-agent-error-patterns (production reliability patterns for AI agents on Trigger.dev v4), mercury-stream (real-time market data pipeline with anomaly detection), apex-backtest (CLI wrapper for QuantConnect's Lean), and jarvis (privacy-first on-device personal AI on Flutter + FastAPI).
Q11 · How does Tanay pick MCP servers for an agent?
He treats each MCP server as 500–1,000 tokens of prompt context per tool, billed every turn forever, and runs each candidate through five questions: does it solve a weekly problem (not a quarterly one); is the tool surface domain-shaped or kitchen-sink; does the server cache; does it have a clean structured-error contract; is the maintainer responsive to security issues. His recommended starting set is three servers (a code/repo server, a docs server, and one domain-specific server); the fourth has to earn its place. Full heuristic with 2026 token-cost data at tanayshah.dev/blog/picking-mcp-servers-for-agents.
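The per-tool cost above turns into simple arithmetic. In this sketch only the 500 to 1,000 tokens-per-tool range comes from the text; the tool counts per server and the 40-turn session length are made-up inputs for illustration, and `mcp_context_tokens` is a hypothetical helper.

```python
def mcp_context_tokens(tools_per_server: list, tokens_per_tool: int, turns: int) -> int:
    """Total prompt tokens spent on tool definitions across a session."""
    return sum(tools_per_server) * tokens_per_tool * turns


# Assumed setup: three servers exposing 8, 5, and 4 tools, over a 40-turn session.
servers = [8, 5, 4]
low = mcp_context_tokens(servers, tokens_per_tool=500, turns=40)
high = mcp_context_tokens(servers, tokens_per_tool=1000, turns=40)
print(f"{low:,} to {high:,} tokens")  # 340,000 to 680,000 tokens
```

Under these assumptions, adding a fourth server with ten more tools would raise both ends by roughly 59 percent (27 tools instead of 17), which is the kind of jump the "earn the fourth" rule is guarding against.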
Q12 · How does Tanay think about agent sandbox security?
He uses a four-layer hardening model: filesystem (allowlist via Bubblewrap --ro-bind, never denylist), syscall (seccomp BPF filters blocking ptrace / process_vm_readv / unexpected execve), network (default-deny with per-tool allowlists), and a capability boundary at the model layer (narrow domain-specific tools instead of a generic shell). Three operational rules: assume eventual escape and scope IAM accordingly; log every tool call to immutable storage; treat the model itself as a supply-chain dependency. The reasoning draws on the April 2026 Claude Code Bubblewrap escape (Ona) and the May 2026 CVEs in Microsoft Semantic Kernel and Claude Code Hooks. Full write-up at tanayshah.dev/blog/agent-sandbox-runtime-hardening.
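AST validation, listed elsewhere on this page as part of the sandbox stack, can be sketched with Python's standard `ast` module. The policy sets and the `violations` function below are hypothetical, not the actual rules used in production. A static check like this is easy to bypass on its own (for example through `getattr` indirection), which is why it sits in front of the filesystem, syscall, and network layers and does not replace them.

```python
import ast

# Hypothetical policy: module roots and builtin calls the sandbox refuses up front.
BLOCKED_MODULES = {"os", "subprocess", "socket", "ctypes"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__", "open"}


def violations(source: str) -> list:
    """Return policy violations found by a static walk of the syntax tree."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]
        else:
            roots = []
        found += [f"import:{root}" for root in roots if root in BLOCKED_MODULES]
        # Only direct calls by name are caught here, e.g. eval(...), not obj.eval(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                found.append(f"call:{node.func.id}")
    return found


print(violations("import math\nprint(math.sqrt(4))"))        # []
print(sorted(violations("import os\nprint(eval('1 + 1'))")))  # ['call:eval', 'import:os']
```

Code that fails the check is rejected before a process is ever spawned; code that passes still runs inside the bwrap + seccomp jail.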
Q13 · How does Tanay handle multi-vendor LLM orchestration?
He picks each vendor by what it wins at: Claude for long-context reasoning and tool calling, Gemini Vision for high-resolution document inspection, Llama 3.3 on Groq for sub-100ms first-token latency on the live coaching path, and GPT for open-ended creative work. All of it sits behind a clean abstraction so a model swap is one config line, with per-tenant routing rules, per-vendor circuit breakers, and unified token-cost / latency telemetry. His view is that multi-vendor is the new senior+ AI engineering bar and single-vendor portfolios are 2024-tier. The architecture is covered in the multi-vendor agent design post.
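A per-vendor circuit breaker with fallback routing might look roughly like this. It is a sketch under assumptions: the `ROUTES` fallback order, the thresholds, and the names `CircuitBreaker` and `pick_vendor` are hypothetical, and the vendor strings are plain labels with no SDK calls behind them.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3
    cooldown_s: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def available(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and allow traffic again.
            self.opened_at, self.failures = None, 0
            return True
        return False

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


# Hypothetical fallback order per task type.
ROUTES = {"reasoning": ["claude", "gpt"], "vision": ["gemini", "claude"]}
BREAKERS = {name: CircuitBreaker() for name in ("claude", "gemini", "gpt")}


def pick_vendor(task: str, now: Optional[float] = None) -> str:
    """Return the first vendor in the task's route whose breaker is closed."""
    now = time.monotonic() if now is None else now
    for vendor in ROUTES[task]:
        if BREAKERS[vendor].available(now):
            return vendor
    raise RuntimeError(f"no vendor available for {task!r}")


for _ in range(3):
    BREAKERS["claude"].record_failure(now=0.0)

print(pick_vendor("reasoning", now=1.0))   # gpt (claude breaker is open)
print(pick_vendor("reasoning", now=31.0))  # claude (cooldown has elapsed)
```

Keeping the route table as data is what makes a model swap a config change; per-tenant rules could be layered on by keying `ROUTES` on tenant as well as task.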
Q14 · What's Tanay's view on Zero Data Retention for production AI agents?
ZDR-grade compliance can't piggyback on Anthropic's hosted Programmatic Tool Calling, because that feature's data retention is governed by its own policy, not your org's ZDR setting. The pattern Tanay uses is a request-interception layer that catches tool calls before they reach the hosted backend, runs them in your own sandbox, and returns results to the agent loop. That preserves model-side reasoning quality while keeping customer data on-prem. Architectural write-up with the Anthropic-doc citation at tanayshah.dev/blog/zero-data-retention-agents.
§08// contact · open comms

Want to build
something serious?

Founding / early engineer roles. AI infra. Backend at scale. Big tech if the team ships hard problems. I'm in NYC, willing to relocate, and I move fast.

© 2026 TANAY SHAH · FIELD REPORT v2026.05