TANAY.SHAH
// PUBLISHED 2026-05-10 · 7 MIN READ

Why Your Agent's UI Lags Behind Its Tool Calls (and the Streaming JSON Parser That Fixes It)

Anthropic streams tool-call arguments as `input_json_delta` events that don't form valid JSON until the block closes. Most agent UIs wait for the close, parse the full string, then render the result, which means the user stares at 'agent is calling a tool…' for two seconds while the args stream in. The fix is a partial-mode JSON parser that emits valid intermediate states. That parser already ships in the Anthropic SDK; the production patterns around using it correctly are what this post is about.

Open the network tab on any production agent UI in 2026 and watch a tool call land. The model decides to invoke a tool. The transport (SSE, usually) emits a sequence of events: content_block_start with an empty input, then dozens of input_json_delta events each carrying a partial_json fragment, and finally content_block_stop. Most teams wait for that final event, parse the complete JSON, and render the tool invocation in their UI. That's a 1-3 second window where the user sees 'agent is doing something' with no detail. Parsing the partial JSON as it arrives and streaming meaningful UI updates against it closes that window; it's also a place where it's easy to ship subtle bugs, and those bugs are what the rest of this post covers.

Anthropic's wire format, briefly

When Claude decides to call a tool during streaming, the API emits a content_block_start for the tool_use block (with input: {} as a placeholder), then a sequence of content_block_delta events of type input_json_delta. Each delta carries a partial_json string fragment. Concatenated in order, these fragments form the complete JSON body of the tool's arguments. The block ends with content_block_stop, after which the agent loop is expected to actually invoke the tool with the parsed arguments.

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"...","name":"search","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"query\":"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":" \"agen"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"t inf"}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"ra NYC\""}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":", \"top_k\": 5}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":1}

The naive consumer accumulates the partial_json strings into a buffer and calls JSON.parse / json.loads on the buffer at content_block_stop. Works, but loses the ~500-2000ms of streaming time during which you could be showing the user 'searching for: agent infra NYC' in real time.
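Concretely, the naive consumer looks something like this (a minimal Python sketch; the events list stands in for decoded SSE payloads):

import json

events = [
    {"type": "input_json_delta", "partial_json": '{"query":'},
    {"type": "input_json_delta", "partial_json": ' "agen'},
    {"type": "input_json_delta", "partial_json": 't inf'},
    {"type": "input_json_delta", "partial_json": 'ra NYC"'},
    {"type": "input_json_delta", "partial_json": ', "top_k": 5}'},
    {"type": "content_block_stop"},
]

buffer = ""
for event in events:
    if event["type"] == "input_json_delta":
        buffer += event["partial_json"]    # nothing rendered during the stream
    elif event["type"] == "content_block_stop":
        args = json.loads(buffer)          # the one and only parse
        print("invoke tool with:", args)   # {'query': 'agent infra NYC', 'top_k': 5}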

What 'parse partial JSON' actually means

A standard JSON parser fails on incomplete input. A partial-mode parser is one that returns a best-effort object given an incomplete string by closing open structures and dropping incomplete trailing tokens. There's a small tower of correctness levels:

  • trailing-strings: the parser closes incomplete strings at the current position and returns the partial value. Useful when you want to render the in-progress text immediately. Trade-off: you'll see partial words during streaming.
  • partial-mode (jiter's partial_mode=True): the parser closes open objects and arrays but treats incomplete strings as not-yet-present. The query field appears in the parsed object only when its closing quote arrives. Trade-off: each field 'pops in' as a complete value rather than streaming character-by-character.
  • schema-aware: the parser knows the expected schema and emits intermediate objects whose shape matches. The query field is a string; the top_k field is an integer; the parser refuses to emit the integer until the closing brace or comma confirms it's a complete number, not a partial one (12 vs 120 vs 1200).

Each of these is right for a different UI shape. For a search box that streams the query as the model types it, trailing-strings is correct (the user sees the query forming). For a structured tool call where partial values would mislead the user (a delete_count field showing '5' while the model is mid-typing '500'), partial-mode is correct. For both at once, schema-aware is correct, with per-field policy.
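In jiter, the first two levels are one argument apart (a minimal sketch against the buffer from the transcript above; from_json takes bytes, and partial_mode accepts True or 'trailing-strings'):

import jiter

buf = b'{"query": "agen'   # cut off mid-string, as it looks mid-stream

# partial-mode: open structures are closed, the incomplete string is
# treated as not-yet-present.
print(jiter.from_json(buf, partial_mode=True))
# -> {}

# trailing-strings: the incomplete string is closed at the current
# position and included.
print(jiter.from_json(buf, partial_mode="trailing-strings"))
# -> {'query': 'agen'}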

What I ship in production

┌────────────────────────────────────────────────────┐
│  SSE event reader (one event at a time)            │
└────────────────────┬───────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────┐
│  Buffer:  pending_json: str  (mutated on each      │
│                              input_json_delta)     │
└────────────────────┬───────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────┐
│  Parser:  jiter.from_json(pending_json,            │
│             partial_mode="trailing-strings")       │
│           → returns dict-with-partials             │
└────────────────────┬───────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────┐
│  Schema gate:  for each field, decide:             │
│   - emit immediately (search.query, free-text)     │
│   - wait for completion (limits, integers, enums)  │
│   - structural changes only (lists, nested)        │
└────────────────────┬───────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────┐
│  Debounced UI update (50-100ms, similar to the     │
│   MarkdownUI streaming buffer pattern)             │
└────────────────────────────────────────────────────┘

The pipeline runs on every input_json_delta. The buffer grows linearly. The parser is called repeatedly on the buffer (each call O(n) on the current buffer length). The schema gate decides which fields to expose to the UI. The UI debouncer prevents the React reconciler from re-rendering 50 times a second during a fast stream.
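Condensed to code, the middle three stages look something like this (a sketch, not the full implementation: EMIT_POLICY is the hand-coded gate for the search tool, and the debounced renderer is elided):

import jiter

# Hand-coded per-field policy for the search tool from earlier.
EMIT_POLICY = {"query": "stream", "top_k": "on_complete"}

class ToolCallView:
    def __init__(self) -> None:
        self.pending_json = ""   # the buffer stage
        self.visible: dict = {}  # what the UI is allowed to render

    def on_delta(self, fragment: str) -> None:
        self.pending_json += fragment
        partial = jiter.from_json(self.pending_json.encode(),
                                  partial_mode="trailing-strings")
        strict = jiter.from_json(self.pending_json.encode(),
                                 partial_mode=True)
        for field, value in partial.items():
            if EMIT_POLICY.get(field) == "stream":
                self.visible[field] = value           # partials are fine
            elif field in strict:
                # Present in the strict partial parse. Note: a trailing
                # number can still be ambiguous (15 vs 1500); a fully
                # schema-aware gate would also require a terminator.
                self.visible[field] = strict[field]
        # hand self.visible to the debounced renderer here

    def on_stop(self) -> None:
        # Force-flush on content_block_stop, bypassing the debouncer.
        self.visible = jiter.from_json(self.pending_json.encode())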

Why I parse the buffer each time, not incrementally

There are libraries (VectorJSON, JSON River) that maintain parser state across deltas to avoid re-parsing the whole buffer. They claim O(n) total parse time across the stream, vs O(n²) for repeated parses. The math is right, and for very long buffers it would matter. In practice, tool-call argument JSON is small (most tool calls have under 2KB of arguments): re-parsing a 2KB buffer on each of 50 deltas costs on the order of half a millisecond per delta on a laptop. Not the bottleneck.

The bottleneck is React. Re-rendering the agent UI 50 times during a 1-second stream blows past the frame budget; the UI feels janky long before the parser becomes the issue. Solving the React side first (via the debounce) pays for the 'wasted' re-parse cost a hundred times over. I'd revisit this for stream-heavy use cases (tool calls with multi-MB schemas, code-generation tools that stream long blocks), but for the median agent tool call the simple repeated-parse is correct.
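The cost claim is cheap to verify on your own hardware (a sketch; the sizes are illustrative):

import time
import jiter

body = ('{"query": "' + "x" * 2000 + '"}').encode()    # ~2KB of arguments
start = time.perf_counter()
for i in range(1, 51):                                  # 50 simulated deltas
    prefix = body[: len(body) * i // 50]                # growing buffer
    jiter.from_json(prefix, partial_mode="trailing-strings")
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"50 re-parses: {elapsed_ms:.2f} ms total")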

The bugs you avoid by doing this right

  • Showing wrong intermediate values. If the model is streaming 1500 and the parser tries to emit at 15 (after the first two characters), the UI flashes 15, then 150, then 1500. Schema-aware partial parsing waits for the next character that isn't a digit before emitting an integer. The user only ever sees the final value.
  • Choking on escapes. partial_json fragments can split mid-escape ('\u00' in one delta, '63' in the next). A parser that doesn't handle this rejects the buffer until both arrive. jiter's partial_mode handles it; some hand-rolled parsers don't. Test with adversarial inputs (see the sketch after this list).
  • Stale renders after stream end. If the UI debounces at 80ms and the stream ends at 75ms, the final render hasn't happened yet. Always force a flush on content_block_stop, the same way the iOS streaming-buffer pattern flushes on stream end.
  • Race conditions on the buffer. If the SSE reader and the parser are on different async tasks, two deltas can interleave with two parses in the wrong order. Single-threaded buffer mutation, then parser call, then schema gate, then UI update is the right ordering; doing any of it in parallel is the bug surface.
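The escape-splitting bullet deserves a test. A minimal one (the field and value are hypothetical; the invariant under test is that no step raises):

import jiter

# A \u escape split across deltas: "\u00" arrives before "fc". The
# complete value decodes to {"city": "Zürich"}.
deltas = ['{"city": "Z\\u00', 'fc', 'rich"}']

buf = ""
for d in deltas:
    buf += d
    # Must never raise mid-escape; the parser withholds the incomplete
    # escape rather than rejecting the whole buffer.
    print(jiter.from_json(buf.encode(), partial_mode="trailing-strings"))
# the final iteration prints {'city': 'Zürich'}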

Built-in support across the SDKs

  • Anthropic Python SDK: jiter with partial_mode=True is built in for streaming tool use. Fine-grained tool streaming (the fine-grained-tool-streaming beta, consumed with partial_mode='trailing-strings') is opt-in.
  • Vercel AI SDK: streamObject's partialObjectStream is the corresponding TypeScript primitive. useObject is the React hook that wires it up. Tool-call streaming is on the stable surface in v5+.
  • LangChain JsonOutputParser: yields partial dicts when used in streaming mode; works across providers.
  • Direct partial parsers per language: partial-json-parser (Python + JS), PartialJSON (Swift), JsonCompleter (Ruby), VectorJSON (high-performance WASM). Use when you're not on the SDK's happy path or when the SDK's parser doesn't expose the field-level granularity you want.
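On the Python SDK's happy path you don't even own the buffer: the streaming helper emits input_json events whose snapshot is the jiter-parsed partial input so far (a sketch; the model id and tool schema are illustrative):

import anthropic

client = anthropic.Anthropic()

search_tool = {
    "name": "search",
    "description": "Search the index.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "top_k": {"type": "integer"},
        },
    },
}

with client.messages.stream(
    model="claude-sonnet-4-5",   # illustrative model id
    max_tokens=1024,
    tools=[search_tool],
    messages=[{"role": "user", "content": "Find agent infra meetups in NYC"}],
) as stream:
    for event in stream:
        if event.type == "input_json":
            # event.snapshot is the partially-parsed input dict so far;
            # feed it to the schema gate instead of a hand-rolled buffer.
            print(event.snapshot)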

What I would change if I rebuilt the parsing layer

  • Per-field debounce rates. A free-text field (search query) should update at 30-60Hz; a structural field (selected_items array) can update at 5-10Hz with no UX loss. Today my debouncer is one global rate; per-field rates are a small refinement that ships better-feeling UI.
  • Schema-driven 'safe to emit' decisions. Today the gate logic is hand-coded per tool; a schema-derived gate (Zod on the web, JSON Schema elsewhere) would be the cleaner abstraction, as sketched after this list. The trade-off is that the schema itself becomes another artifact you maintain.
  • Lift the parser into a shared worker on the web. Today it runs on the main thread. For the median tool call this is fine; for the long-tail tool with a multi-KB streaming schema, offloading to a Web Worker would keep the React main thread cleaner.
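A first cut at that schema-derived gate, in Python against the tool's JSON Schema rather than Zod (the policy rules here are assumptions, not a library API):

def emit_policy(input_schema: dict) -> dict[str, str]:
    """Derive a per-field emit policy from a tool's JSON Schema.

    Returns {"field": "stream" | "on_complete"}. The rules are a starting
    point, not a standard: free text streams, everything ambiguous waits.
    """
    policy = {}
    for name, spec in input_schema.get("properties", {}).items():
        if spec.get("type") == "string" and "enum" not in spec:
            policy[name] = "stream"         # safe to render mid-flight
        else:
            policy[name] = "on_complete"    # numbers, enums, arrays, objects
    return policy

# For the search tool: emit_policy(...) -> {'query': 'stream', 'top_k': 'on_complete'}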

The bigger lesson

Streaming UIs for AI agents are mostly not about LLM-side optimizations. They're about consuming the LLM's output stream efficiently and rendering safe intermediate states. Partial JSON parsing is the unglamorous infrastructure that makes the difference between an agent that feels slow and an agent that feels live. The libraries exist; the SDK support is there; what's missing in most teams' implementations is the schema-aware emit policy that decides which fields are safe to show in flight. That decision is per-tool and per-field, not a global setting.

If a hiring manager asks me how I think about agent UI infrastructure, this is the framing. Not 'we use the streaming SDK,' but 'here's the wire format, here's the partial parser configuration, here's the per-field emit gate that decides what the user sees while the model is mid-stream.' That level of plumbing literacy is the thing that separates an agent product that demos from one that ships.

References

  • Anthropic: 'Streaming messages' and tool-use streaming (platform.claude.com/docs/build-with-claude/streaming)
  • Anthropic: 'Fine-grained tool streaming' (platform.claude.com/docs/agents-and-tools/tool-use/fine-grained-tool-streaming)
  • jiter: Rust JSON parser used by the Anthropic Python SDK (github.com/pydantic/jiter)
  • Vercel AI SDK: streamObject + partialObjectStream + useObject (ai-sdk.dev)
  • promplate/partial-json-parser (Python + JavaScript)
  • teamchong/vectorjson: WASM SIMD streaming parser for tool calls
  • JacksonKearl/gjp-4-gpt: streaming JSON parser for live LLM output
  • aha.io engineering: 'Streaming AI responses and the incomplete JSON problem'