Why Your iOS Streaming Chat Is Cooking the GPU (and the 30-Line Debounce Buffer That Fixes It)
MarkdownUI plus AsyncSequence is the obvious way to render a streaming Claude or GPT response in SwiftUI. It is also the way I shipped a chat app where the phone got noticeably warm during long replies. The fix is a 30-line debounce buffer that re-parses on newlines or every 80ms, whichever comes first. Same UX, GPU stays cool, the bug stops showing up in the App Store reviews.
Open any tutorial on building an iOS chat that streams Claude or GPT responses and you'll see the same code shape: an AsyncSequence of token chunks pumping into an @Observable view model, the view model's message string updating on each chunk, and a MarkdownUI view re-rendering as the string grows. It works. It also has a property the tutorials don't mention: every appended token re-parses the entire markdown string from scratch, every SwiftUI body update re-builds the rich-text layout, and a 4,000-token response means roughly 4,000 parses of progressively longer strings. On modern A17/M-series silicon you can almost feel the phone heat up. On older devices the chat starts dropping frames. The fix is a debounce buffer; the post is about what it actually has to do to be correct.
Why MarkdownUI is the right starting point
A small clarification before the criticism. swift-markdown-ui (gonzalezreal/swift-markdown-ui) is the right rendering library for streaming agent output on iOS in 2026. It handles GFM tables, fenced code blocks with syntax highlighting via Splash, links with proper accessibility, lists, blockquotes, and inline code. The successor project (Textual, by the same author, now listed as the active development home) is solving the same problems with better internals; both are good choices. The performance issue is not the library's fault; it's a fundamental property of any rich-text renderer asked to re-render its entire input on every token. The 30-line buffer in this post applies just as well to Textual.
The naive pattern, with a name attached to the bug
import Observation
import SwiftUI
import MarkdownUI

@Observable
final class ChatViewModel {
    var streamingMessage: String = ""

    func receive(stream: AsyncThrowingStream<String, Error>) async throws {
        for try await chunk in stream {
            streamingMessage += chunk   // ← re-parses on EVERY token
        }
    }
}

struct ChatBubble: View {
    let model: ChatViewModel

    var body: some View {
        Markdown(model.streamingMessage)   // ← MarkdownUI re-renders here
    }
}

The bug is the for-await loop appending one token, the view tracking that property, the view body re-running, and MarkdownUI re-parsing the whole string so far. At 50 tokens/second from Claude or GPT, that's 50 full parses per second of an ever-growing string. The parse cost is roughly O(n) in string length; cumulatively over a 4K-token response, you're doing O(n²) work to render what should be O(n) text.
The 30-line fix
import Observation

@Observable
final class ChatViewModel {
    var renderedMessage: String = ""         // what MarkdownUI sees
    private var pendingTokens: String = ""   // what the stream is filling
    private var flushTask: Task<Void, Never>?

    func receive(stream: AsyncThrowingStream<String, Error>) async throws {
        for try await chunk in stream {
            pendingTokens += chunk
            // Newline = natural paragraph boundary, flush immediately
            if chunk.contains("\n") {
                await flushNow()
            } else {
                scheduleFlush(after: .milliseconds(80))
            }
        }
        await flushNow()   // ensure final flush after the stream ends
    }

    private func scheduleFlush(after delay: Duration) {
        flushTask?.cancel()
        flushTask = Task { [weak self] in
            try? await Task.sleep(for: delay)
            guard !Task.isCancelled else { return }
            await self?.flushNow()   // hop back to the main actor for the write
        }
    }

    @MainActor
    private func flushNow() {
        flushTask?.cancel()
        renderedMessage = pendingTokens   // single property update,
                                          // single MarkdownUI re-parse
    }
}

Three things make this work:
- ▸Two separate properties: pendingTokens (mutated on every chunk, NOT observed by any view) and renderedMessage (mutated only on flush, observed by MarkdownUI). The view only reads the property that changes infrequently; SwiftUI's @Observable granularity does the rest: views that don't read pendingTokens don't re-render on its updates. The view-side counterpart is sketched just after this list.
- ▸Newline-or-80ms flush. Newlines are natural paragraph boundaries; flushing on them keeps the visible text in sync with the semantic structure. The 80ms timer covers long runs of inline tokens between newlines (a code block, a long URL). 80ms is roughly 12 flushes/second, frequent enough that streaming still reads as smooth, and 4-5x fewer re-parses than 50 tokens/second.
- ▸MainActor for the property write. SwiftUI must read view-model state on the main actor; mutating an observed property from a background task without bridging is a runtime warning at best and a crash at worst. The Task.sleep happens off the main actor; the write hops back because flushNow() is @MainActor-isolated, so every path that touches renderedMessage (newline flush, timer flush, final flush) lands on main.
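For completeness, here is the view side, which the post above only shows for the naive version: a sketch of the same ChatBubble pointed at renderedMessage instead of a per-token property.

import SwiftUI
import MarkdownUI

struct ChatBubble: View {
    let model: ChatViewModel

    var body: some View {
        // Reads only renderedMessage, which changes at most ~12 times/second.
        // pendingTokens is never read here, so per-token appends don't
        // invalidate this body or trigger a MarkdownUI re-parse.
        Markdown(model.renderedMessage)
    }
}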
Why this is correct, not just fast
The version of this fix that bites you in code review is the one that flushes only on time. Pure 80ms debounce loses the 'finished thought' coherence: a paragraph break that arrives at millisecond 5 of an 80ms window doesn't render until millisecond 80. The user sees text appear in clumps that don't match the semantic structure of the content. Flushing on newline-or-time gives the perceptual smoothness of time-based debounce plus the semantic correctness of newline-based flush.
There's a similar trap with the cancel-and-replace pattern on the Task. If the timer fires and the flush completes, but a new chunk arrived during the flush, the new chunk has to be visible in the next flush. The pendingTokens / renderedMessage split makes this trivial: the next flush copies whatever has accumulated. The version where you mutate one string and try to debounce the view updates leaves the door open for races; the two-string version closes it.
Measuring the difference (rough numbers)
- ▸Naive version on iPhone 15 Pro, 4K-token Claude response: ~9 seconds of sustained ~80% GPU utilization, phone runs warm, occasional dropped frames in scroll-to-bottom.
- ▸Buffered version, same response: GPU peaks under 30%, phone stays cool, no dropped frames.
- ▸Battery drain difference, same response: roughly 2x improvement (this is hard to measure precisely; my proxy was 'time before the device temperature warning kicks in during a 30-message benchmark').
- ▸Perceived speed: indistinguishable. The user reads text at well under 12 paragraphs/second; flushing 12 times/second feels native.
These numbers are specific to my benchmark and not a published result. The shape, however, is robust: any version of the buffer that flushes 10-20 times per second will dramatically outperform the naive version, because it caps the number of re-parses at a constant per-second rate regardless of token velocity.
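The post contains no measurement code; if you want to reproduce the shape of these numbers yourself, one low-effort option (my addition, with placeholder subsystem and category strings) is to emit an os_signpost event on every flush and line the cadence up against the Time Profiler and SwiftUI instruments in Instruments.

import os

// Placeholder subsystem/category names.
private let signposter = OSSignposter(subsystem: "com.example.chat",
                                      category: "streaming")

// Call from flushNow(). Each flush shows up as a point event in the
// os_signpost instrument, so you can count flushes per second and see how
// much work clusters around each one in the naive vs. buffered build.
func markFlush() {
    signposter.emitEvent("flush")
}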
Common mistakes I see in this pattern
- ▸Debouncing inside the view (.onChange with debounce). Tempting because it keeps the view-model simple. Wrong because the view-model is still mutating the observed property on every token, so the view body re-runs and MarkdownUI re-parses on every token anyway; the debounce only delays whatever you do inside the onChange closure. Move the debounce into the view-model.
- ▸One AsyncStream → multiple views observing. If the chat list and the open thread both render the streaming message, both pay the parse cost. Confine the streaming message to one view; the list view shows a stable preview ('AI is typing...').
- ▸Forgetting Task cancellation. If the user navigates away mid-stream, the receive loop keeps appending to pendingTokens and the flush task keeps firing. Tie the receive Task to the view's life cycle (or a chat-session id) and cancel on dismiss; a .task(id:) sketch follows this list.
- ▸Bridging through Combine instead of @Observable. Combine works but the @Observable macro (iOS 17+) has finer-grained dependency tracking. Combine publishers force everything that subscribes to re-render; @Observable only re-renders views that read the changed property.
- ▸Re-implementing markdown parsing inline. The temptation when fighting parse cost is to do less parsing yourself. Don't; MarkdownUI / Textual handle the long tail of edge cases (mid-token bold, partial code fences, RTL text) better than a hand-roll. Buffer the input, not the parser.
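Here is a minimal sketch of the cancellation wiring mentioned above. ChatSession and tokenStream are illustrative names introduced here, not types from the post; all the pattern needs is a stable id plus the token stream for the in-flight response.

import SwiftUI

// Hypothetical session type carrying a stable id and the response stream.
struct ChatSession: Identifiable {
    let id: UUID
    let tokenStream: AsyncThrowingStream<String, Error>
}

struct ChatThreadView: View {
    let model: ChatViewModel
    let session: ChatSession

    var body: some View {
        ChatBubble(model: model)
            // .task(id:) starts the receive loop when the view appears and
            // cancels it when the view disappears or the session id changes,
            // so navigating away mid-stream stops the per-token appends.
            .task(id: session.id) {
                try? await model.receive(stream: session.tokenStream)
            }
    }
}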
What I would change if I rebuilt the streaming layer
- ▸Migrate to Textual. The successor project to MarkdownUI has internal improvements specifically targeting the streaming case. The buffer pattern still applies, but the absolute parse cost should drop further.
- ▸Add a 'final flush' on every exit path. The implementation above flushes once more after the for-await loop completes, which covers normal completion; if the stream throws or the receive task is cancelled mid-response, that trailing flush is skipped and you're relying on the last scheduled 80ms timer to have survived. An explicit terminal flush on every exit path guarantees the final rendered state matches the stream.
- ▸Tunable flush cadence per device tier. iPhone 15 Pro can absorb 20 flushes/second comfortably; an older iPhone XR cannot. A simple tier check (UIDevice model) and a per-tier flush interval give the older devices breathing room without slowing down the newer ones; a sketch follows this list.
- ▸Backpressure-aware. If MarkdownUI is still parsing the previous flush when the next is scheduled, skip the next flush. Today the pattern doesn't observe parse completion; in practice this is rare, but a queue-depth check would close the gap.
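The tier check itself isn't in the post; below is an illustrative sketch. The RAM-based tier test is a stand-in of mine (the post suggests keying off the UIDevice model, which needs a model-identifier table), and the thresholds and intervals are assumptions, not measured values.

import Foundation

enum DeviceTier {
    case high, low

    // Crude proxy: treat devices with 6 GB+ of RAM (6 * 1_073_741_824 bytes)
    // as high tier.
    static var current: DeviceTier {
        ProcessInfo.processInfo.physicalMemory >= 6 * 1_073_741_824 ? .high : .low
    }

    // Flush cadence to pass to scheduleFlush(after:) instead of the
    // hard-coded 80ms.
    var flushInterval: Duration {
        switch self {
        case .high: return .milliseconds(50)    // ~20 flushes/second
        case .low:  return .milliseconds(120)   // ~8 flushes/second
        }
    }
}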
The bigger lesson
Performance bugs in iOS streaming chat are mostly not about Swift, MarkdownUI, or even SwiftUI. They're about the rate at which observed state changes. The same lesson applies to web (re-rendering React components on every token), to terminal apps (re-rendering ANSI escape sequences), to anywhere a stream feeds a renderer. The fix is small and well-known to people who've shipped streaming UIs before; the surprise is how often it gets skipped because the tutorial example has a 12-token response and the bug doesn't show up.
If a hiring manager asks me how I think about iOS LLM client engineering, this is the kind of thing I'd want to talk about. Not 'we use MarkdownUI,' but 'here's the parse-cost shape, here's the debounce that fixes it, here's why it's an architectural concern of the view-model and not a styling concern of the view.' Native craft on iOS in 2026 is the difference between an app that demos and an app that ships.
References
- ▸gonzalezreal/swift-markdown-ui (GitHub): now in maintenance mode
- ▸Textual (gonzalezreal): the active successor project
- ▸swift-markdown-ui issue #426: documented streaming performance issue
- ▸swift-markdown-ui discussion #261: 'how to optimize performance for repeated rendering on live chat messages?'
- ▸Apple SwiftUI: @Observable macro and granular dependency tracking (iOS 17+)
- ▸Apple SwiftUI: AsyncSequence + AsyncStream patterns
- ▸Yan Zaitsev: 'From Stream to Screen: Handling GenAI Rich Responses in SwiftUI' (Medium / SAFE Engineering)