TANAY.SHAH
// PUBLISHED 2026-05-10 · 8 MIN READ

Why I Use a Postgres Append-Only Log for Agent Chat (Not Redis Streams)

The 2026 default for resumable LLM chat is Redis Streams behind Vercel AI SDK 5's resumeStream. It works, and it leaves you with two systems of record: Postgres for messages, Redis for resumption. The version I shipped uses one append-only Postgres table for both. Here's the schema, the seq-number contract, and why I'd defend the choice in front of a hiring manager.

The standard 2026 recipe for an LLM chat backend that survives reconnects looks something like this: persist messages to Postgres (or Mongo), track in-progress streams in Redis Streams, expose a resumeStream endpoint that tails the Redis stream by ID, and use Vercel AI SDK 5's useChat with resume: true on the client. The recipe is documented, well-supported, and works. It also gives you two systems of record for the same conversation: Postgres holds the messages, Redis holds the in-progress chunks, and the operational reality of keeping them in sync is a tax you pay forever.

I picked a different shape and shipped it on iOS, web, and a CLI client off the same backend: a single Postgres table holding the event log, a monotonic seq number per chat, and a tiny events_since(chat_id, after_seq) query that every client uses to catch up after a disconnect. One storage system, one source of truth, three clients reading from it. This post is about why, what the schema looks like, and where the design's edges are.

The two-systems problem with the default

Redis Streams is a great primitive for what it is: a fast in-memory append-only log with consumer groups, blocking reads, and minimal overhead. The reason it's the default in the Vercel AI SDK 5/6 resumable-streams pattern is that the latency profile (sub-millisecond writes, blocking reads for new chunks) is ideal for the streaming-chunk use case. The reason it's not the right answer for me: it's a separate persistence tier that has to coordinate with the messages I'm already storing in Postgres, and the coordination has its own failure modes.

  • TTL bites. The resumable-stream pattern keeps in-progress stream state in Redis under a TTL so memory stays bounded. If a user closes their laptop for two hours, the stream may already be gone and the resume hits a cold path. You can extend the TTL, but at memory cost.
  • Two backups, two restores. Postgres is backed up every hour with point-in-time recovery; Redis is in-memory with optional AOF persistence. After an incident the two stores can be at different points in time. Reconciling them is bespoke work.
  • Two failure modes. The chat is fully durable only when both systems are healthy. When Redis is down, new messages still write to Postgres but resume breaks. When Postgres is down, nothing works: messages aren't persisted, so there is nothing to resume or reply to.
  • Cross-client sync needs a bridge. Web reads from Redis Streams; the iOS client wants the same stream; the CLI consumer is a third client. Each surface has to know about both Postgres and Redis or you have to build a fan-out service in front of the streams.

None of these are show-stoppers. Individually, each is about a day of engineering work. Cumulatively they are the operational tax that eventually makes you wonder whether the second system was worth it.

The single-table design

CREATE TABLE chat_events (
  chat_id     UUID        NOT NULL,
  seq         BIGINT      NOT NULL,
  event_type  TEXT        NOT NULL,    -- 'user_msg', 'assistant_chunk', 'tool_call', 'tool_result', 'error'
  payload     JSONB       NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (chat_id, seq)
);

-- Per-chat seq comes from a per-chat counter, not a global sequence:
-- a deferred trigger or a CTE assigns seq = COALESCE(MAX(seq), 0) + 1
-- atomically per insert. Global sequences are tempting but they couple
-- write throughput across all chats; per-chat counters scale linearly.
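
-- One way to spell that CTE insert (a sketch; $1 = chat_id,
-- $2 = event_type, $3 = payload):
WITH next AS (
  SELECT COALESCE(MAX(seq), 0) + 1 AS seq
  FROM chat_events
  WHERE chat_id = $1
)
INSERT INTO chat_events (chat_id, seq, event_type, payload)
SELECT $1::uuid, next.seq, $2::text, $3::jsonb
FROM next
RETURNING seq;
-- Two concurrent writers can compute the same seq; the (chat_id, seq)
-- primary key rejects the loser, which retries. The application-side
-- wrapper is sketched further down.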

-- No separate index needed: the (chat_id, seq) primary key already
-- provides the index that backs the catch-up query below.

-- The query every client runs on reconnect:
SELECT seq, event_type, payload
FROM chat_events
WHERE chat_id = $1 AND seq > $2
ORDER BY seq;

Three things make this work as the only chat persistence layer:

  • Append-only schema. Once a row is written it never changes. No UPDATE statements. Edits to a previous message are new events (event_type = 'edit') that reference the original seq. This is the event-sourcing discipline that makes replay-from-log meaningful.
  • Per-chat monotonic seq. Each chat has its own sequence space starting at 1. The (chat_id, seq) primary key gives uniqueness without a global sequence bottleneck. The insert is a single INSERT ... RETURNING seq whose CTE computes the next seq; when two writers race to the same number, the primary key rejects one and it retries (see the sketch after this list).
  • JSONB payload. The shape of an event varies by type (chunks have text, tool calls have args, errors have stack traces). JSONB keeps schema evolution cheap; you don't need a migration every time you add a new event field. The payload structure is documented in code, validated at the application boundary, and asserted via Postgres jsonb_typeof checks if you want to be strict.
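
In application code the write path is that CTE insert wrapped in a short retry loop. A minimal sketch assuming psycopg 3; append_event and the payload shape are my names, not a fixed API:

import uuid

import psycopg
from psycopg.errors import UniqueViolation
from psycopg.types.json import Jsonb

APPEND_SQL = """
    WITH next AS (
        SELECT COALESCE(MAX(seq), 0) + 1 AS seq
        FROM chat_events
        WHERE chat_id = %(chat_id)s
    )
    INSERT INTO chat_events (chat_id, seq, event_type, payload)
    SELECT %(chat_id)s, next.seq, %(event_type)s, %(payload)s
    FROM next
    RETURNING seq
"""

def append_event(conn: psycopg.Connection, chat_id: uuid.UUID,
                 event_type: str, payload: dict, retries: int = 3) -> int:
    """Append one event to the chat's log and return the seq it was assigned."""
    for _ in range(retries):
        try:
            with conn.transaction():
                cur = conn.execute(APPEND_SQL, {
                    "chat_id": chat_id,
                    "event_type": event_type,
                    "payload": Jsonb(payload),
                })
                return cur.fetchone()[0]
        except UniqueViolation:
            # A concurrent writer claimed the same seq and the primary key
            # rejected us; recompute MAX(seq) + 1 and try again.
            continue
    raise RuntimeError(f"could not assign a seq for chat {chat_id}")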

The seq-number contract for clients

Every client tracks last_seq it has rendered for each chat. On any state-disturbing event (open the chat, app foreground, network reconnect, push notification), the client sends GET /chats/{id}/events?since={last_seq} and receives the deterministic ordered list of new events. The client applies them to local state, updates last_seq to the highest seq it received, and the UI reflects exactly what's on the server.
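
The client side of that contract is small enough to show whole. A sketch in Python; httpx and the apply_event reducer are stand-ins I'm assuming, not part of the contract:

import httpx

def catch_up(client: httpx.Client, base_url: str, chat_id: str,
             last_seq: int, state: dict) -> int:
    """Fetch everything after last_seq, apply it, and return the new last_seq."""
    resp = client.get(f"{base_url}/chats/{chat_id}/events",
                      params={"since": last_seq})
    resp.raise_for_status()
    for event in resp.json():
        # apply_event is the app's reducer; applying an already-seen
        # seq must be a no-op, which makes the whole loop idempotent.
        apply_event(state, event)
        last_seq = max(last_seq, event["seq"])
    return last_seq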

The contract is one query, deterministic ordering by seq, idempotent application on the client side. There's no concept of 'live' versus 'historical' events; both are reads from the same table. Streaming, when the client wants real-time updates, layers SSE on top: the server pushes new events as they're written, the client extends last_seq as it consumes them, and on disconnect the catch-up query is the same one it ran on first load. There is no second pathway. The streaming pathway and the cold-load pathway are the same code on both ends.

Mobile network drops, app backgrounding, and laptop-lid-close are the same case from the protocol's perspective: the SSE connection ends, the client reopens it later (or the user navigates back), the catch-up query runs, the UI is consistent. Nothing is lost because nothing was held in volatile state on the server.
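
Concretely, the server side of that single pathway can be one SSE endpoint that starts with the catch-up query and keeps running it. A sketch assuming FastAPI; fetch_events_since is the events_since query wrapped in an async helper, and the 250 ms poll stands in for LISTEN/NOTIFY or whatever wake-up you prefer:

import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chats/{chat_id}/events/stream")
async def stream_events(chat_id: str, since: int = 0):
    async def gen():
        last_seq = since
        while True:
            # The same events_since(chat_id, after_seq) query as the cold load.
            for seq, event_type, payload in await fetch_events_since(chat_id, last_seq):
                last_seq = seq
                yield "data: " + json.dumps(
                    {"seq": seq, "type": event_type, "payload": payload}
                ) + "\n\n"
            await asyncio.sleep(0.25)  # or block on Postgres LISTEN/NOTIFY

    return StreamingResponse(gen(), media_type="text/event-stream")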

Multi-client (web + iOS + CLI) consumption

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Web client  │    │  iOS client  │    │  CLI client  │
│  (Next.js)   │    │  (Swift)     │    │  (Python)    │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       │  GET /events?since=N (same query)     │
       ▼                   ▼                   ▼
       ┌─────────────────────────────────────────┐
       │   FastAPI (or any thin REST/SSE layer)  │
       └────────────────────┬────────────────────┘
                            │
                            ▼
                   ┌─────────────────┐
                   │   chat_events   │   ◄── single source of truth
                   │   (Postgres)    │
                   │   append-only   │
                   └─────────────────┘

The web client uses Server-Sent Events. The iOS client uses URLSession streaming with the same SSE format. The CLI client polls the catch-up endpoint every second. Every client uses the same ?since=N pagination contract. There is no client-specific server logic, no per-client persistence shape, no multi-vendor protocol negotiation. The simplest design that satisfies the requirements.
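
The CLI's one-second poll is the catch_up sketch from earlier on a timer, nothing more:

import time

def follow(client, base_url: str, chat_id: str) -> None:
    """Tail a chat over plain polling; same ?since=N contract as web and iOS."""
    state: dict = {}
    last_seq = 0
    while True:
        last_seq = catch_up(client, base_url, chat_id, last_seq, state)
        time.sleep(1.0)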

Trade-offs I accept

  • Latency floor. Postgres insert + SSE forward is in the 5-15 ms range on a healthy cluster. Redis Streams can do this in 1-3 ms. For chat, the difference is invisible: the LLM's per-chunk latency is 30-100 ms, dominating any infrastructure delta. For ultra-low-latency interactive workloads (game state synchronization, financial tickers), Redis is the right answer.
  • Per-chat seq assignment serializes writes. A single chat with high-frequency tool calls funnels every insert through the same next-seq computation and can become a Postgres write hotspot. In practice the rate is low (an active chat is doing maybe 10-30 events/second tops); for chats orders of magnitude busier, you'd shard by chat_id or move seq assignment to a dedicated counter row.
  • Backups carry the chat history. The append-only log is the chat history. Privacy / compliance / GDPR right-to-erasure is implemented as a hard delete that removes the chat row and all of its events; soft delete with retention is implemented as a tombstone event. Both are explicit, both are audit-loggable (sketched below).
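
Both deletion paths sit on top of the earlier sketches; 'tombstone' as an event name and the chats parent table are my assumptions, not something the schema above mandates:

def erase_chat(conn, chat_id) -> None:
    """GDPR right-to-erasure: hard-delete every event the chat owns."""
    with conn.transaction():
        conn.execute("DELETE FROM chat_events WHERE chat_id = %(id)s",
                     {"id": chat_id})
        # Also delete the parent chats row here, if you keep one.

def soft_delete_chat(conn, chat_id) -> None:
    """Retention-friendly delete: append a tombstone; readers filter on it."""
    append_event(conn, chat_id, "tombstone", {"reason": "user_delete"})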

When Redis Streams is the right answer

Two cases:

  • Sub-5ms is a product requirement. If the chat is feeding a live trading dashboard or multiplayer game state, the latency floor of a Postgres insert is too high. Redis Streams (or another low-latency log such as NATS JetStream) earns its keep.
  • You're already on the Vercel platform with the AI SDK, and your team's tolerance for Postgres operational depth is low. The default recipe is a sane default. Don't fight it on a small team that has more important things to ship; the operational tax is real but manageable, and the time you'd spend rolling your own append-only log is time not spent on the agent.

What I would change if I rebuilt

  • Add a materialized projection table for fast UI list rendering. The append-only log is the source of truth, but rendering 'all my chats sorted by last activity' requires reading all events. A small materialized table (chat_id, title, last_event_at, message_count) refreshed by trigger gives the list view sub-10ms latency without compromising the log's append-only property.
  • Move seq assignment into a per-chat row in a separate counters table, so that next-seq is one atomic UPDATE ... RETURNING instead of a MAX() scan, which dodges the write hotspot from the trade-offs list. This is invisible to clients (seq is still per-chat-monotonic) and lets the events table be a pure append (see the sketch after this list).
  • Versioned event payloads. Today the payload schema is documented in code; an integer payload_version field would make schema evolution explicit and let consumers handle older versions gracefully without runtime guessing.
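
For the first two items, the shape would be roughly this. A sketch, not a migration I've run; psycopg 3 again, and every table, column, and trigger name is illustrative:

DDL = """
CREATE TABLE chat_counters (
    chat_id  UUID   PRIMARY KEY,
    last_seq BIGINT NOT NULL DEFAULT 0
);

CREATE TABLE chat_summaries (
    chat_id       UUID PRIMARY KEY,
    title         TEXT,
    last_event_at TIMESTAMPTZ,
    message_count BIGINT NOT NULL DEFAULT 0
);

-- Projection maintenance: one upsert per event insert, so the list
-- view never scans the log.
CREATE FUNCTION bump_chat_summary() RETURNS trigger AS $$
BEGIN
    INSERT INTO chat_summaries (chat_id, last_event_at, message_count)
    VALUES (NEW.chat_id, NEW.created_at, 1)
    ON CONFLICT (chat_id) DO UPDATE
        SET last_event_at = NEW.created_at,
            message_count = chat_summaries.message_count + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER chat_summary_bump
    AFTER INSERT ON chat_events
    FOR EACH ROW EXECUTE FUNCTION bump_chat_summary();
"""

# Atomic next-seq: the upsert takes a row lock on the chat's counter,
# so concurrent writers queue per chat instead of retrying on the PK.
NEXT_SEQ = """
    INSERT INTO chat_counters (chat_id, last_seq)
    VALUES (%(chat_id)s, 1)
    ON CONFLICT (chat_id)
    DO UPDATE SET last_seq = chat_counters.last_seq + 1
    RETURNING last_seq
"""

The events insert then writes the returned last_seq as its seq inside the same transaction, and append_event loses its retry loop.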

The bigger lesson

The default architecture diagram for an LLM chat in 2026 has too many boxes. Two storage systems, three queues, a separate websocket layer, a fan-out service. Each box is a defensible decision. Cumulatively they're an operational burden that doesn't pay for itself for most teams. The append-only-Postgres-as-event-log shape is the simplest design that satisfies durable + resumable + multi-client + replay-able, and it leans on the database you already have for everything else.

Picking the right storage shape is where 'production-ready' is decided for a chat backend, not in the agent loop or the prompt engineering. The agent code can be replaced; the storage shape lives forever. If a hiring manager asks me about chat persistence design, this is what I describe and why.

References

  • Vercel AI SDK: Chatbot Resume Streams (ai-sdk.dev/docs/ai-sdk-ui/chatbot-resume-streams)
  • Stardrift: 'Is resumable LLM streaming hard? No, it's just annoying.' (stardrift.ai/blog)
  • Upstash: 'How to Build LLM Streams That Survive Reconnects, Refreshes, and Crashes'
  • Postgres docs: deferrable triggers, jsonb_typeof, and append-only patterns
  • Vercel Workflow DevKit blog drafts on event-sourced execution
  • Couchbase XDCR: monotonic sequence number conflict resolution (docs.couchbase.com)
  • AgentRR: 'Get Experience from Practice: LLM Agents with Record & Replay' (arXiv 2505.17716)