Just posted to my Forbes column about a fascinating twist in the AI arms race — and it’s not about building bigger models or more massive data centers.
In a world so AI-hungry that tech giants are investing in nuclear power just to keep up, Microsoft-backed startup d-Matrix is asking a different question: what if we don’t need more raw power at all? CEO Sid Sheth argues that the real opportunity isn’t in training ever-larger models; it’s in running them more efficiently. His view is simple but disruptive: “Training is all performance, inference is all efficiency.” Instead of using training chips like GPUs to answer AI queries (essentially cleaning a house with a hammer just because a hammer built it), d-Matrix has designed chips specifically for inference, where memory speed and latency matter just as much as compute.
The result? By tightly integrating memory with compute and rethinking chip architecture from the ground up, the company claims it can run inference at roughly 90% lower cost than traditional GPUs, with lower latency and far better energy efficiency. If that holds at scale, AI’s future may depend less on who trains the biggest model and more on who can answer your question fastest and cheapest.
“You’re going to see it in mass production this year,” Sheth told me.