Stateful Continuation for AI Agents: Why Transport Layers Now Matter
Author: Anirudh Mendiratta
Introduction
As AI coding agents mature, the transport layer is becoming a first-order concern for agentic workflows. This article examines why stateful continuation, as implemented in OpenAI's WebSocket mode, improves performance: it removes the need to retransmit the growing conversation with every turn.
The Problem: Overhead in Agentic Workflows
Agent workflows involve multiple turns, each requiring tool calls and context. In single-turn LLM use, overhead is negligible. However, in multi-turn agentic coding sessions, the growing context becomes a bottleneck. Retransmitting the entire conversation history with each turn leads to linear payload growth and increased latency.
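A toy calculation makes the growth concrete. The per-turn sizes below are illustrative placeholders, not measurements:

```python
# Toy model of payload growth over a session. Sizes are illustrative
# placeholders, not benchmark numbers.
turn_delta_kb = 12   # hypothetical new content (prompt, tool output) per turn
turns = 20

# Stateless: turn t resends the whole history, roughly t * delta.
stateless_total = sum(turn_delta_kb * t for t in range(1, turns + 1))
# Stateful: each turn sends only the new delta.
stateful_total = turn_delta_kb * turns

print(f"resend-everything: {stateless_total} KB")  # 2520 KB
print(f"delta-only:        {stateful_total} KB")   # 240 KB
```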
The Solution: Stateful Continuation
OpenAI's WebSocket mode introduces server-side caching of conversation history. In our benchmarks, this cut client-sent data by more than 80% and shortened end-to-end execution time by 15-29%.
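In sketch form, a stateful session looks like this. The endpoint, message types, and field names below are placeholders for illustration, not OpenAI's actual wire format:

```python
# Hypothetical sketch of stateful continuation over a WebSocket.
# URL and message schema are made up for illustration.
import asyncio
import json

import websockets  # pip install websockets

async def session():
    async with websockets.connect("wss://example.com/v1/agent") as ws:
        # First turn: ship the full context once.
        await ws.send(json.dumps({"type": "create",
                                  "input": "Fix the failing unit test"}))
        print(json.loads(await ws.recv()))

        # Follow-up turns: only the delta travels; the server already
        # holds the conversation state for this connection.
        await ws.send(json.dumps({"type": "continue",
                                  "input": "Now run the full suite"}))
        print(json.loads(await ws.recv()))

asyncio.run(session())
```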
The Benefits
- Architectural Advantage: Stateful continuation is not protocol-specific; any approach that avoids retransmitting context can achieve similar gains.
- Performance Trade-offs: While stateful designs deliver performance gains, they complicate reliability, observability, and portability.
The Airplane Problem
A real-world example illustrates the issue: during a flight, a Claude Code session over in-flight Wi-Fi began timing out as its payload grew. Retransmitting the entire conversation history with each turn turned a bandwidth-constrained link into a bottleneck.
The Agentic Coding Loop
AI coding agents, like Claude Code, OpenAI Codex, Cursor, and Cline, perform multi-file edits, run test suites, and iterate on failing builds. The core is the agent loop, a cycle of model inference and tool execution.
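Stripped to its essentials, the loop looks like this (a schematic sketch; the model and tool calls are stubs, not any particular vendor's API):

```python
# Schematic agent loop: model inference alternating with tool execution.
# `model` and `tools` are stand-ins, not a specific vendor API.
def agent_loop(task: str, model, tools: dict, max_turns: int = 20) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = model(history)                  # inference over the full context
        if reply.get("tool_call") is None:
            return reply["content"]             # no tool requested: done
        name, args = reply["tool_call"]
        result = tools[name](**args)            # edit files, run tests, build
        history.append(reply)
        history.append({"role": "tool", "content": result})
        # `history` grows every iteration; whatever the transport, the model
        # needs all of it on the next inference call.
    return "stopped: max turns reached"
```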
The HTTP Overhead Problem
HTTP-based APIs, including OpenAI's Responses API, are stateless by default. Each turn retransmits the system instructions, tool definitions, user prompts, and all prior model outputs, so payload size grows linearly with session length.
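The consequence is easy to see in the request body itself (a generic sketch, not any specific API's schema):

```python
# Generic sketch of a stateless request body. Every field is resent
# on every turn, so size tracks session length.
import json

def build_turn_request(system_prompt, tool_defs, history, new_message):
    return json.dumps({
        "instructions": system_prompt,        # identical every turn
        "tools": tool_defs,                   # identical every turn
        "input": history + [new_message],     # grows linearly with the session
    })
```

Only the last field changes between turns, yet all three go over the wire every time.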
Benchmarking Results
Our benchmarks, using GPT-5.4 and GPT-4o-mini, demonstrated significant performance improvements with WebSocket mode:
- Data Reduction: WebSocket reduced client-sent data by 80-86%.
- Execution Time: WebSocket delivered 15-29% faster end-to-end execution.
- First-Turn TTFT: The WebSocket handshake added no meaningful overhead to time-to-first-token on the opening turn.
Why It's Faster: The Architecture
WebSocket's speed stems from server-side state management, caching the most recent response in connection-local memory.
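Conceptually, the server side reduces to something like this (an assumed design for illustration, not OpenAI's implementation):

```python
# Assumed server-side design: per-connection state so follow-up turns
# carry only the delta. Not OpenAI's actual implementation.
def run_inference(context):
    """Stub standing in for the model's forward pass."""
    return {"role": "assistant", "content": f"(reply over {len(context)} items)"}

class Connection:
    def __init__(self):
        self.context = []          # lives only as long as the socket does

    def handle_turn(self, delta):
        self.context.extend(delta)             # append just the new input
        output = run_inference(self.context)   # full context, never re-uploaded
        self.context.append(output)
        return output
```

The flip side is visible here too: if the socket drops, the context goes with it, which feeds the reliability trade-offs discussed below.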
The Bandwidth Math
Our benchmarks showed a 144 KB reduction in client-sent data per task. Extrapolated across a major provider's request volume, that is on the order of a 29 Gbps reduction in ingress traffic.
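The aggregate number implies a particular task volume. Here is the back-of-envelope check; the 144 KB saving is the measured figure, while the task rate is derived, not observed:

```python
# Back-of-envelope: what task volume turns 144 KB/task into 29 Gbps?
saving_bytes = 144_000        # measured saving per task (~144 KB)
target_bps = 29e9             # 29 Gbps of avoided ingress

tasks_per_second = target_bps / 8 / saving_bytes
print(f"{tasks_per_second:,.0f} tasks/s")   # ~25,000 tasks/s
```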
Server-Side State: The Real Innovation
The key insight is that WebSocket's speed isn't due to the protocol itself but to server-side state management, which enables near-instant continuation without re-tokenizing the full context.
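The same idea is available over plain HTTP: OpenAI's Responses API can chain turns by reference via previous_response_id instead of resending history. A minimal sketch, with error handling omitted:

```python
# Minimal sketch: stateful continuation over HTTP with the Responses API.
# Assumes responses are stored server-side (the API default) and that
# OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-4o-mini",
    input="Summarize the failing test output.",
)
follow_up = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,   # server rehydrates the stored context
    input="Now propose a fix.",
)
print(follow_up.output_text)
```

This reinforces the point above: the win comes from where the state lives, not from which protocol carries the bytes.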
The Statefulness Spectrum
Approaches to context accumulation sit on a spectrum, trading off where state lives, how durable it is, and the latency and bandwidth cost of each turn.
Parallel Execution
For parallel tasks, separate WebSocket connections are needed, as each connection handles one response at a time.
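A fan-out sketch under that constraint; the URL and message shape are the same placeholders as above:

```python
# Hypothetical fan-out: one connection per parallel subtask, since a
# connection serves one in-flight response at a time. Placeholder URL/schema.
import asyncio
import json

import websockets

async def run_subtask(prompt: str) -> str:
    async with websockets.connect("wss://example.com/v1/agent") as ws:
        await ws.send(json.dumps({"type": "create", "input": prompt}))
        return await ws.recv()

async def main():
    results = await asyncio.gather(
        run_subtask("Refactor module A"),
        run_subtask("Add tests for module B"),
    )
    print(results)

asyncio.run(main())
```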
When HTTP Is Still the Right Choice
HTTP remains the right choice for simple or one-shot interactions, multi-provider support, stateless infrastructure, and easier debugging.
Conclusion
WebSocket mode delivers real gains for agentic coding workflows: less data on the wire and faster end-to-end execution. However, it is currently OpenAI-specific, which creates provider lock-in. The industry's challenge now is to converge on a standard for stateful LLM continuation.