✓ SHIPPED · 2025 · REFERENCE / OPEN-SOURCE LIBRARY · UPDATED 2026-05-10

AI Agent Error-Handling Patterns.

Stop your AI agents from failing silently. Four production reliability patterns, with tests and upgrade paths.

// 01 — WHY I BUILT IT
THE PROBLEM

Most AI agent tutorials show the happy path: the LLM responds, the task succeeds, everyone's happy. Real production systems need to survive:

  • Cascading failures when the LLM provider is down.
  • Partial batch failures (95 items succeed and 5 fail — now what?).
  • Edge cases where the AI can't decide and needs human judgment.
  • Rate limits that force you to fall back to cheaper models.

I kept seeing teams ship AI features that worked beautifully in demo and broke catastrophically in week 2 — silent retry storms burning through OpenAI credits, batches that quietly dropped 5% of items, agents that stalled on ambiguous input. The patterns to fix all of this are well-known in distributed systems but mostly absent from the AI-agent literature. So I codified them.

// 02 — THE APPROACH
THE WORK

The repo implements all four patterns on Trigger.dev v4 (a durable task runner whose primitives map naturally onto agentic workflows), in TypeScript. Each pattern has a standalone CLI test that runs in ~3 ms with no server needed, so you can validate behavior in CI.

The patterns are designed to compose: a single agent task can use circuit breaker + partial success + graceful degradation simultaneously. Each pattern documents a clear production upgrade path (Redis-backed circuit breaker state, Postgres-backed batch tracking, Slack-backed human escalation, Sentry-backed observability) so the example code is the starting point, not the end state.
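To make the composition concrete, here is a minimal self-contained sketch: a circuit breaker guarding a (stubbed) model call inside a partial-success batch loop. `CircuitBreaker`, `callModel`, and `processBatch` are illustrative names, not the repo's actual exports.

```typescript
// Illustrative stand-ins for the repo's exports, not its actual API.

class CircuitBreaker {
  private failures = 0;
  constructor(private readonly threshold = 3) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    // Open circuit: fail fast instead of hammering a dead provider.
    if (this.failures >= this.threshold) throw new Error("circuit open");
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      throw err;
    }
  }
}

// Stub for the LLM call; the provider-chain sketch later shows the
// graceful-degradation version.
async function callModel(input: string): Promise<string> {
  return `summary of: ${input}`;
}

// Partial success: per-item outcomes, so 95 good items aren't thrown away
// because 5 failed.
async function processBatch(items: string[]) {
  const breaker = new CircuitBreaker();
  const succeeded: Array<{ item: string; output: string }> = [];
  const failed: Array<{ item: string; error: string }> = [];
  for (const item of items) {
    try {
      const output = await breaker.exec(() => callModel(item));
      succeeded.push({ item, output });
    } catch (err) {
      failed.push({ item, error: String(err) });
    }
  }
  return { succeeded, failed };
}
```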

// 03 — KEY DECISIONS
WHAT I CHOSE & WHY
DECISION · 01

Trigger.dev v4 over LangGraph / Temporal

LangGraph is the right tool for in-process agent loops; Temporal is the right tool for sprawling enterprise workflows. For most agent teams, the durable-task primitive sits in between — and that's exactly Trigger.dev's shape. Picking it forced each pattern to be expressed at the right level of abstraction.
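For flavor, here is roughly what that primitive looks like. The import path, `retry` options, and task shape reflect my reading of the v4 SDK, so treat this as a sketch to check against the current docs.

```typescript
// Sketch of a durable task on Trigger.dev v4 (API details per my reading
// of the v4 SDK; verify against the current documentation).
import { task } from "@trigger.dev/sdk";

export const classifyTicket = task({
  id: "classify-ticket",
  retry: { maxAttempts: 3 }, // durable, platform-managed retries
  run: async (payload: { ticketId: string; body: string }) => {
    // A crash or deploy mid-run resumes the task instead of dropping it:
    // the property every pattern in the repo leans on.
    return { ticketId: payload.ticketId, label: "triage-needed" };
  },
});
```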

DECISION · 02

Standalone CLI tests, not just integration tests

If the patterns require a server, a database, and an API key to validate, they won't get used. The CLI test runs all four patterns in ~3 ms with zero setup — so the project demos itself in the time it takes to clone.
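The shape of such a check, with a trivial stub standing in for the real pattern under test:

```typescript
// Zero-setup check: plain node:assert, no server, no API key.
import assert from "node:assert/strict";

// Trivial stub in place of the real pattern under test.
async function processBatch(items: string[]) {
  return { succeeded: items.slice(0, -1), failed: items.slice(-1) };
}

async function main() {
  const { succeeded, failed } = await processBatch(["a", "b", "c"]);
  // The invariant that matters: nothing is silently dropped.
  assert.equal(succeeded.length + failed.length, 3, "items went missing");
  console.log("partial-success: OK");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```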

DECISION · 03

Document the upgrade path, not just the demo

Most error-handling examples are toys. The README explicitly maps each demo decision to its production version: in-memory state → Redis; console alerts → Slack/PagerDuty; mock LLM → real LLM with cost tracking. Readers can see exactly what to change when they adopt the patterns.
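One concrete way to keep that swap cheap is to put state behind an interface. The names below are mine, not the repo's:

```typescript
// Hypothetical seam for the in-memory → Redis upgrade; names are mine.
interface BreakerStateStore {
  getFailures(key: string): Promise<number>;
  recordFailure(key: string): Promise<void>;
  reset(key: string): Promise<void>;
}

// Demo version: fine in one process, lost on restart, invisible to peers.
class InMemoryBreakerState implements BreakerStateStore {
  private counts = new Map<string, number>();
  async getFailures(key: string) {
    return this.counts.get(key) ?? 0;
  }
  async recordFailure(key: string) {
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }
  async reset(key: string) {
    this.counts.delete(key);
  }
}

// Production version: the same interface backed by Redis INCR + EXPIRE,
// so every worker shares one view of the circuit.
```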

// 04 — STATE OF THE ART
2026 BLEEDING-EDGE TECH
Trigger.dev v4 (durable agentic runtime)

Purpose-built for long-running agent tasks: lighter than Temporal for AI-shaped workflows, and more durable than a bare LangGraph loop once orchestration sprawls. The right primitive for production agents in 2026.

Multi-provider graceful degradation (GPT-4 → Claude → template)

Vendor lock-in is a 2026 liability. The graceful-degradation pattern lets agents survive provider outages, rate limits, and cost spikes by falling back automatically through a provider chain.
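A minimal sketch of the chain, with placeholder `complete()` callables standing in for real SDK clients:

```typescript
// Provider names and `complete()` are placeholders, not real SDK calls.
type Provider = {
  name: string;
  complete: (prompt: string) => Promise<string>;
};

async function completeWithDegradation(
  prompt: string,
  chain: Provider[],
  template: (prompt: string) => string,
): Promise<{ text: string; via: string }> {
  for (const provider of chain) {
    try {
      return { text: await provider.complete(prompt), via: provider.name };
    } catch {
      // Outage, rate limit, or cost guard tripped: fall through to the next.
    }
  }
  // Last resort: a static template, so the agent degrades instead of dying.
  return { text: template(prompt), via: "template" };
}
```

Returning `via` alongside the text is deliberate: dashboards and downstream consumers need to know when a response came from a fallback tier.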

Resume-token-based human-in-the-loop

When an agent gets stuck, durable resume tokens beat polling-based HITL by orders of magnitude in cost and latency. Standard primitive for serious AI products.
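A framework-agnostic, in-memory analog of the control flow (a durable runner persists the token, so the suspended run survives restarts and costs nothing while waiting):

```typescript
// In-memory analog of resume tokens; only the control flow is shown here.
import { randomUUID } from "node:crypto";

const pending = new Map<string, (answer: string) => void>();

// Agent side: mint a token, alert a human, suspend until resumed.
function escalateToHuman(question: string): Promise<string> {
  const token = randomUUID();
  console.log(`[escalation] ${question} (resume token: ${token})`);
  return new Promise((resolve) => pending.set(token, resolve));
}

// Human side (Slack action, webhook, CLI): complete the token.
function resume(token: string, answer: string): boolean {
  const resolve = pending.get(token);
  if (!resolve) return false;
  pending.delete(token);
  resolve(answer);
  return true;
}
```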

TypeScript 5.5 + Zod runtime validation

Schema-first agent design — every tool input/output is Zod-validated, every error path has a typed shape. No stringly-typed data in production agents.
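A small example of what that boundary looks like; the schema itself is invented for illustration:

```typescript
// Example schema (invented for illustration); the point is the shape of
// the boundary: parse, don't trust.
import { z } from "zod";

const ClassifyOutput = z.object({
  label: z.enum(["billing", "bug", "feature"]),
  confidence: z.number().min(0).max(1),
});
type ClassifyOutput = z.infer<typeof ClassifyOutput>;

function parseModelReply(raw: string): ClassifyOutput {
  let json: unknown;
  try {
    json = JSON.parse(raw);
  } catch {
    throw new Error("model reply was not valid JSON");
  }
  // safeParse yields a typed failure instead of throwing mid-agent, so the
  // error path has a shape too.
  const result = ClassifyOutput.safeParse(json);
  if (!result.success) {
    throw new Error(`model reply had an invalid shape: ${result.error.message}`);
  }
  return result.data;
}
```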

// 05 — MEASURED
NUMBERS THAT MATTER
Tests · 4/4 passing (standalone CLI)
Test duration · ~3 ms (all four patterns combined)
Patterns · 4 (circuit breaker · partial success · HITL · graceful degradation)
// 06 — IF I DID IT AGAIN
LESSONS · WHAT I'D CHANGE
  • The most underrated pattern of the four is partial success. Teams underestimate how often batch operations are 95/5 splits, and how much pain comes from treating those as binary success/failure.
  • Graceful degradation across providers (GPT-4 → Claude → template) is more nuanced than it looks — different providers have different output formats, so the fallback chain has to either normalize outputs or accept lossy responses. Worth a separate post; a minimal normalization sketch follows this list.
  • Human-in-the-loop with resume tokens turns out to be the right primitive for most 'agent gets stuck' situations. The pattern is well-suited to durable-task runners and surprisingly hard to retrofit onto stateless agent loops.
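To make the second lesson concrete, here is a sketch of the normalization step. The response shapes are simplified from the real OpenAI and Anthropic payloads, and the `lossy` flag is my illustration of "accept lossy responses":

```typescript
// Response shapes simplified from the real provider payloads.
type NormalizedReply = { text: string; lossy: boolean };

// OpenAI chat completions put text at choices[n].message.content.
function normalizeOpenAI(reply: {
  choices: Array<{ message: { content: string | null } }>;
}): NormalizedReply {
  return { text: reply.choices[0]?.message.content ?? "", lossy: false };
}

// Anthropic messages return an array of typed content blocks.
function normalizeAnthropic(reply: {
  content: Array<{ type: string; text?: string }>;
}): NormalizedReply {
  const text = reply.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  return { text, lossy: false };
}

// The static template can't carry model reasoning, so mark it lossy and
// let downstream consumers decide whether to re-run later.
function normalizeTemplate(prompt: string): NormalizedReply {
  return { text: `We received your request: ${prompt}`, lossy: true };
}
```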
// 07 — STACK
THE TOOLS
LANGUAGE · TypeScript 5.5
RUNTIME · Trigger.dev v4
TOOLING · pnpm
SCHEMA · Zod