GPT-5.3-Codex-Spark targets agentic coding speed with 1,000+ tokens/sec and 128k context
OpenAI released GPT-5.3-Codex-Spark, a smaller GPT-5.3-Codex variant tuned for ultra-low-latency, real-time coding. It pairs a 128k context window with system optimizations aimed at faster agentic workflows.
Running on specialized ultra-low-latency hardware, Spark can surpass 1,000 tokens per second while keeping the full 128k context window.
- Spark is designed specifically for near-instant coding interactions and is currently text-only, despite its large 128k context window.
- Persistent WebSocket connections cut client/server roundtrip overhead by 80%, reducing latency in long-running agent sessions.
- System optimizations halve time-to-first-token and reduce per-token overhead by 30% to keep iterative edits responsive.
- Inference is powered by Cerebras Wafer Scale Engine 3, a purpose-built accelerator focused on high-speed serving.
- During the research preview, Spark uses separate rate limits and doesn’t count toward standard ChatGPT Pro quotas.
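To see why persistent connections matter for agent sessions, here is a back-of-the-envelope latency model. The handshake and round-trip figures below are illustrative assumptions, not OpenAI's measurements: a per-request HTTP call pays connection setup (TCP + TLS, several round trips) on every call, while a persistent WebSocket pays it once.

```python
def session_latency_ms(requests: int, rtt_ms: float,
                       handshake_rtts: int, persistent: bool) -> float:
    """Total network latency for a session of `requests` calls.

    Per-request HTTP: each call pays connection setup (handshake_rtts
    round trips for TCP + TLS) plus one request/response round trip.
    Persistent WebSocket: setup is paid once, then each call costs one RTT.
    """
    setup = handshake_rtts * rtt_ms
    if persistent:
        return setup + requests * rtt_ms
    return requests * (setup + rtt_ms)

# Illustrative numbers: 50 ms RTT, 3 round trips for TCP + TLS setup,
# 100 calls in a long-running agent session.
per_request = session_latency_ms(100, 50.0, 3, persistent=False)  # 20,000 ms
over_socket = session_latency_ms(100, 50.0, 3, persistent=True)   # 5,150 ms
```

Under these assumed numbers the persistent connection removes roughly three quarters of the network latency; the exact savings depend on RTT, handshake cost, and session length, which is consistent in spirit with the ~80% roundtrip-overhead reduction the article cites.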