argbe.tech - news

GPT-5.3-Codex-Spark targets agentic coding speed with 1,000+ tokens/sec and 128k context

OpenAI has released GPT-5.3-Codex-Spark, a smaller GPT-5.3-Codex variant tuned for ultra-low-latency, real-time coding. Running on specialized hardware, it can surpass 1,000 tokens per second while retaining a 128k context window, and it pairs that throughput with system optimizations aimed at faster agentic workflows.

  • Designed specifically for near-instant coding interactions, Spark is currently text-only, even though it retains the full 128k context window.
  • Persistent WebSocket connections cut client/server roundtrip overhead by 80%, reducing latency in long-running agent sessions.
  • System optimizations halve time-to-first-token and reduce per-token overhead by 30% to keep iterative edits responsive.
  • Inference is powered by Cerebras Wafer Scale Engine 3, a purpose-built accelerator focused on high-speed serving.
  • During the research preview, Spark uses separate rate limits and doesn’t count toward standard ChatGPT Pro quotas.
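To see how the headline figures compound, here is a back-of-envelope latency model for a single agent turn. The baseline numbers (100 ms roundtrip, 400 ms time-to-first-token, ~500 tok/s) are illustrative assumptions, not OpenAI measurements; only the relative improvements (roundtrip −80%, TTFT halved, >1,000 tok/s treated as ~1 ms/token) come from the article.

```python
def turn_latency_ms(ttft_ms: float, per_token_ms: float,
                    roundtrip_ms: float, n_tokens: int) -> float:
    """Total wall-clock time for one request/response turn:
    connection roundtrip + time-to-first-token + streaming the reply."""
    return roundtrip_ms + ttft_ms + per_token_ms * n_tokens

# Assumed baseline for a 500-token reply (illustrative numbers).
baseline = turn_latency_ms(ttft_ms=400, per_token_ms=2.0,
                           roundtrip_ms=100, n_tokens=500)

# Spark, applying the article's figures: persistent WebSocket cuts
# roundtrip 80%, TTFT is halved, >1,000 tok/s ≈ 1 ms/token.
spark = turn_latency_ms(ttft_ms=200, per_token_ms=1.0,
                        roundtrip_ms=20, n_tokens=500)

print(f"baseline: {baseline:.0f} ms, spark: {spark:.0f} ms")
# → baseline: 1500 ms, spark: 720 ms
```

Under these assumptions the turn time roughly halves, and the streaming term dominates, which is why per-token speed matters more than connection overhead for long replies.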