OpenAI taps Cerebras for 750MW of low-latency inference compute
Yesterday (January 14, 2026), OpenAI announced a partnership with Cerebras to add 750MW of ultra low-latency AI compute to its platform. OpenAI says the capacity will roll out in phases, coming online in multiple tranches through 2028.
OpenAI describes Cerebras as a maker of purpose-built AI systems aimed at accelerating long outputs, and attributes the speed to a design that concentrates compute, memory, and bandwidth on a single large chip to reduce inference bottlenecks.
OpenAI says the Cerebras capacity will be integrated into its inference stack in phases, with expansion planned across workloads including code generation, image generation, and AI agent use cases.
OpenAI framed the goal around the interactive loop of request, model processing, and response, saying the added low-latency inference capacity is intended to make that cycle feel faster in real-time use.
OpenAI executive Sachin Katti described the deal as adding a dedicated low-latency inference option within OpenAI’s compute portfolio, and Cerebras CEO Andrew Feldman framed the focus as enabling real-time inference with OpenAI models on Cerebras hardware.