AI Agent Error-Handling Patterns
Stop your AI agents from failing silently. Four production reliability patterns, with tests and upgrade paths.
Most AI agent tutorials show the happy path: LLM responds, task succeeds, everyone's happy. Real production systems need to survive cascading failures when the LLM provider is down, partial batch failures (95 items succeed and 5 fail — now what?), edge cases where the AI can't decide and needs human judgment, and rate limits that force you to fall back to cheaper models.
I kept seeing teams ship AI features that worked beautifully in demo and broke catastrophically in week 2 — silent retry storms burning through OpenAI credits, batches that quietly dropped 5% of items, agents that stalled on ambiguous input. The patterns to fix all of this are well-known in distributed systems but mostly absent from the AI-agent literature. So I codified them.
The repo implements all four patterns on Trigger.dev v4 (a durable task runner, native to the agentic-workflow shape) in TypeScript. Each pattern has a standalone CLI test that runs in ~3ms, no server needed, so you can validate behavior in CI.
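For flavor, here is the rough shape of one of those tasks. This is a minimal sketch assuming the v4 SDK's `task` export and its declarative retry options; the task id and payload are invented for illustration, not taken from the repo.

```ts
import { task } from "@trigger.dev/sdk";

// Hypothetical payload for illustration.
type EnrichLeadPayload = { leadId: string };

// A durable task: the runner persists each attempt, so the retry policy
// below survives process restarts, unlike an in-process agent loop.
export const enrichLead = task({
  id: "enrich-lead",
  retry: {
    maxAttempts: 3,
    factor: 2, // exponential backoff between attempts
    minTimeoutInMs: 1_000,
    maxTimeoutInMs: 10_000,
  },
  run: async (payload: EnrichLeadPayload) => {
    // Call the LLM here; a thrown error hands control to the retry policy.
    return { leadId: payload.leadId, enriched: true };
  },
});
```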
The patterns are designed to compose: a single agent task can use circuit breaker + partial success + graceful degradation simultaneously. Each pattern documents a clear production upgrade path (Redis-backed circuit breaker state, Postgres-backed batch tracking, Slack-backed human escalation, Sentry-backed observability) so the example code is the starting point, not the end state.
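To make "compose" concrete, here is an illustrative sketch (not the repo's code) of two patterns layered in one call path: a shared circuit breaker guards every item of a batch, and the batch reports a partial-success split instead of a boolean.

```ts
// Minimal circuit breaker: fail fast while open, probe after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private isOpen = false;
  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  async exec<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen && Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error("circuit open: failing fast");
    }
    // Either closed, or half-open: the next call is a probe.
    try {
      const result = await fn();
      this.isOpen = false;
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.isOpen || this.failures >= this.threshold) {
        this.isOpen = true;
        this.openedAt = Date.now(); // (re)open and restart the cooldown
      }
      throw err;
    }
  }
}

// Partial success: per-item outcomes instead of all-or-nothing.
async function runBatch<I, O>(items: I[], work: (item: I) => Promise<O>) {
  const breaker = new CircuitBreaker();
  const succeeded: O[] = [];
  const failed: { item: I; error: unknown }[] = [];
  for (const item of items) {
    try {
      succeeded.push(await breaker.exec(() => work(item)));
    } catch (error) {
      failed.push({ item, error }); // record the failure, keep the batch going
    }
  }
  return { succeeded, failed };
}
```

Once the breaker opens, the remaining items fail fast into the `failed` bucket instead of hammering a downed provider, which is exactly the kind of composition the patterns are meant for.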
Trigger.dev v4 over LangGraph / Temporal
LangGraph is the right tool for in-process agent loops; Temporal is the right tool for sprawling enterprise workflows. For most agent teams, the durable-task primitive sits in between — and that's exactly Trigger.dev's shape. Picking it forced the patterns to express themselves at the right level of abstraction.
Standalone CLI tests, not just integration tests
If the patterns require a server, a database, and an API key to validate, they won't get used. The CLI test runs all four patterns in ~3ms with zero setup, so the project demos itself in the time it takes to clone.
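Here is the shape of such a test, sketched with Node's built-in test runner so nothing beyond `node` is required; the import path and class are stand-ins for the repo's actual exports.

```ts
import test from "node:test";
import assert from "node:assert/strict";

// Hypothetical import path standing in for the repo's real export.
import { CircuitBreaker } from "./circuit-breaker";

test("breaker opens after the failure threshold", async () => {
  const breaker = new CircuitBreaker(2 /* threshold */, 60_000 /* cooldown */);
  const boom = () => Promise.reject(new Error("provider down"));

  await assert.rejects(breaker.exec(boom)); // failure 1
  await assert.rejects(breaker.exec(boom)); // failure 2: breaker opens
  // Now it fails fast without touching the provider at all.
  await assert.rejects(breaker.exec(boom), /circuit open/);
});
```

Run it with `node --test`: no server, no database, no API key.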
Document the upgrade path, not just the demo
Most error-handling examples are toys. The README explicitly maps each demo decision to its production version: in-memory state → Redis; console alerts → Slack/PagerDuty; mock LLM → real LLM with cost tracking. Readers can see exactly what to change when they adopt the patterns.
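One way to keep that swap mechanical is to hide each demo decision behind an interface. A sketch for the circuit-breaker state (names are invented):

```ts
type BreakerState = { failures: number; openedAt: number };

// The breaker reads and writes state through this seam, so swapping
// in-memory state for Redis is a one-line change at the call site.
interface BreakerStateStore {
  get(key: string): Promise<BreakerState | null>;
  set(key: string, state: BreakerState): Promise<void>;
}

// Demo version: process-local, resets on restart.
class InMemoryStateStore implements BreakerStateStore {
  private states = new Map<string, BreakerState>();
  async get(key: string) {
    return this.states.get(key) ?? null;
  }
  async set(key: string, state: BreakerState) {
    this.states.set(key, state);
  }
}

// The production version implements the same interface over Redis
// (a hash per breaker key, with a TTL), shared across all workers.
```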
Trigger.dev v4 is purpose-built for long-running agent tasks. It beats LangGraph for orchestration sprawl and beats Temporal for AI-shaped workflows: the right primitive for production agents in 2026.
Vendor lock-in is a 2026 liability. The graceful-degradation pattern lets agents survive provider outages, rate limits, and cost spikes by automatic fallback through a provider chain.
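A minimal sketch of such a chain, with `declare`d stand-ins for the real SDK calls:

```ts
// Hypothetical stand-ins for real provider SDK calls.
declare function callOpenAI(prompt: string): Promise<string>;
declare function callAnthropic(prompt: string): Promise<string>;
declare function renderTemplate(prompt: string): string;

type Provider = { name: string; complete: (prompt: string) => Promise<string> };

// Ordered by preference: quality first, then cost, then a non-LLM floor
// that always succeeds (a canned template response).
const chain: Provider[] = [
  { name: "gpt-4", complete: callOpenAI },
  { name: "claude", complete: callAnthropic },
  { name: "template", complete: async (p) => renderTemplate(p) },
];

async function completeWithFallback(prompt: string): Promise<string> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      return await provider.complete(prompt);
    } catch (err) {
      // Outage, rate limit, or budget cap: degrade to the next rung.
      errors.push(`${provider.name}: ${String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```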
When an agent gets stuck, durable resume tokens beat polling-based human-in-the-loop (HITL) by orders of magnitude in cost and latency: a suspended run consumes nothing until a human responds. Standard primitive for serious AI products.
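The core of the pattern in plain TypeScript, as an in-memory sketch; production would persist the token durably and deliver it through Slack, per the upgrade paths above.

```ts
import { randomUUID } from "node:crypto";

type Decision = "approve" | "reject";

// In-memory registry of paused runs. Production: durable storage keyed
// by token, with the token embedded in a Slack message's action URL.
const pending = new Map<string, (decision: Decision) => void>();

// Agent side: mint a token, notify a human, suspend until resumed.
function escalateToHuman(question: string): Promise<Decision> {
  const token = randomUUID();
  console.log(`Human input needed: ${question} (resume token: ${token})`);
  return new Promise((resolve) => pending.set(token, resolve));
}

// Human side (e.g. a webhook handler): one call resumes the exact run.
// No polling loop, so the suspended agent does no work while it waits.
function resume(token: string, decision: Decision): boolean {
  const resolve = pending.get(token);
  if (!resolve) return false; // unknown or already-used token
  pending.delete(token);
  resolve(decision);
  return true;
}
```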
Schema-first agent design: every tool input/output is Zod-validated, every error path has a typed shape. No stringly-typed data in production agents.
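What that looks like with Zod; the tool and its fields are invented for illustration.

```ts
import { z } from "zod";

// Every tool input is validated before the agent acts on it.
const SearchInput = z.object({
  query: z.string().min(1),
  maxResults: z.number().int().positive().default(10),
});

// Every error path has a typed shape: a discriminated union forces the
// caller to handle each outcome instead of matching on error strings.
const SearchResult = z.discriminatedUnion("status", [
  z.object({ status: z.literal("ok"), urls: z.array(z.string()) }),
  z.object({ status: z.literal("rate_limited"), retryAfterMs: z.number() }),
  z.object({ status: z.literal("failed"), reason: z.string() }),
]);
type SearchResult = z.infer<typeof SearchResult>;

// A malformed tool call becomes a typed failure the LLM can react to,
// not a thrown string.
function parseToolInput(raw: unknown) {
  const parsed = SearchInput.safeParse(raw);
  return parsed.success
    ? parsed.data
    : ({ status: "failed", reason: parsed.error.message } as const);
}
```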
- The most underrated pattern of the four is partial success. Teams underestimate how often batch operations are 95/5 splits, and how much pain comes from treating those as binary success/failure (see the sketch after this list).
- Graceful degradation across providers (GPT-4 → Claude → template) is more nuanced than it looks: different providers have different output formats, so the fallback chain has to either normalize outputs or accept lossy responses. Worth a separate post.
- Human-in-the-loop with resume tokens turns out to be the right primitive for most "agent gets stuck" situations. The pattern is well-suited to durable-task runners and surprisingly hard to retrofit onto stateless agent loops.
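Here is the partial-success sketch referenced above: `Promise.allSettled` plus an explicit split, so a 95/5 batch reports exactly which five items failed instead of collapsing to a boolean. Unlike the sequential composition sketch earlier, this settles the whole batch concurrently; the item and error types are illustrative.

```ts
type BatchReport<I, O> = {
  succeeded: { item: I; output: O }[];
  failed: { item: I; error: string }[];
};

async function settleBatch<I, O>(
  items: I[],
  work: (item: I) => Promise<O>,
): Promise<BatchReport<I, O>> {
  const results = await Promise.allSettled(items.map((item) => work(item)));
  const report: BatchReport<I, O> = { succeeded: [], failed: [] };
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      report.succeeded.push({ item: items[i], output: result.value });
    } else {
      report.failed.push({ item: items[i], error: String(result.reason) });
    }
  });
  // Downstream can now retry exactly the failed 5 of 100, not all 100
  // and not zero, which is the whole point of the pattern.
  return report;
}
```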