OpenAI Now Using Cerebras’ AI Chips To Code At 1,000 Tokens Per Second


Just posted to my Forbes column about one of the biggest — and boldest — bets yet in the AI chip wars.

Cerebras Systems just closed a staggering $1 billion Series H round, pushing its valuation to $23 billion and putting it squarely in the ring against Nvidia’s dominance of AI compute. The company’s Wafer Scale Engine 3 isn’t just another chip: it’s a monster in the most physical sense, 56 times larger than the biggest GPU and packing more than four trillion transistors. Cerebras claims it delivers over 20x the performance per watt of competitors, and it’s now powering OpenAI’s GPT-5.3-Codex-Spark model at an eye-popping 1,000 tokens per second, roughly 15x faster than before. If you’ve ever stared at a blinking cursor waiting for an LLM to finish coding, this is the kind of speed that changes the experience.
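For a sense of scale, here’s a minimal back-of-the-envelope sketch of what that throughput means. The completion size is my own illustrative assumption, not a figure from the article, and the baseline throughput is simply inferred from the "roughly 15x faster" claim:

```python
# Back-of-the-envelope: what 1,000 tokens/second feels like in practice.
# Assumptions (illustrative only): a ~2,000-token code completion, and a
# baseline of 1,000 / 15 ≈ 67 tokens/second implied by the "15x" claim.

COMPLETION_TOKENS = 2_000          # hypothetical size of a generated code file
CEREBRAS_TPS = 1_000               # tokens/second, per the article
BASELINE_TPS = CEREBRAS_TPS / 15   # implied prior throughput (~67 tok/s)

print(f"Baseline wait: {COMPLETION_TOKENS / BASELINE_TPS:.0f} s")  # ~30 s
print(f"Cerebras wait: {COMPLETION_TOKENS / CEREBRAS_TPS:.0f} s")  # ~2 s
```

In other words, under these assumptions a wait long enough to break your concentration collapses to one that barely registers.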

“It’s the fastest inference in the world,” a Cerebras executive told me last year — and with a $10 billion OpenAI deal to deliver 750 megawatts of wafer-scale systems, this is no longer just an ambitious hardware story. It’s about who controls the infrastructure behind the most hyped technology on the planet.

The bigger picture? AI demand is exploding, power and cooling constraints are real, and hyperscalers — and even governments — are racing to avoid single-supplier risk while securing sovereign compute capacity. Capital is clearly chasing that urgency.

Read the full post here …

Subscribe to my Substack