AI is devouring the planet’s electricity, already consuming up to 2% of the world’s supply and projected to hit 5% by 2030. But a Spanish-Canadian company, Multiverse Computing, says it can slash that energy footprint by up to 95% without sacrificing performance.
In this episode, we chat with Samuel Mugel, Multiverse’s CTO, about how quantum-inspired algorithms can drastically compress large language models while keeping them smart, useful, and fast. Mugel explains how their approach, intelligently pruning and reorganizing model weights, lets them fit fully functioning AI models onto hardware as small as a Raspberry Pi, with parameter counts comparable to a fly’s brain.
We also explore how small language models could power Edge AI, smart appliances, and robots that work offline and in real time, while also making AI more sustainable, accessible, and affordable. Mugel also discusses how ideas from quantum tensor networks help identify only the most relevant parts of a model, an “intelligently destructive” approach that saves massive amounts of compute and power.
Check it out … and subscribe to my YouTube channel:
Tiny AI: key points and takeaways:
The problem
- AI currently consumes 1–2% of the world’s electricity, projected to reach 5% by 2030.
- Current large models like LLaMA or GPT have billions of parameters, driving unsustainable energy and compute costs.
- Most AI runs in the cloud, adding latency, network dependence, and massive power draw.
Multiverse’s approach
- Multiverse builds small language models (SLMs) with under 100 million parameters, 10–30× smaller than standard LLMs.
- Their smallest model, Superfly, has the parameter count equivalent of two fruit flies’ brains.
- Using quantum-inspired tensor network methods, they reorganize and compress AI models while maintaining performance.
- The approach combines quantization (reducing weight precision) and intelligent pruning (removing unneeded connections) to produce efficient models.
- This approach can cut energy and compute use by up to 95%.
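To make the two generic techniques named above concrete, here is a minimal sketch in plain Python. It is illustrative only: Multiverse’s actual quantum-inspired method is proprietary, and the `quantize` and `prune` helpers, bit widths, and toy weights are all assumptions for demonstration, not their implementation.

```python
def quantize(weights, bits=8):
    """Map float weights to low-precision integer codes plus one scale factor."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 127 for 8-bit signed
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) for w in weights], scale

def prune(weights, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights (simple magnitude pruning)."""
    k = int(len(weights) * keep_ratio)
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.05]    # toy "layer"
pruned = prune(weights, keep_ratio=0.5)          # half the connections removed
codes, scale = quantize(pruned)                  # small ints + one float
restored = [c * scale for c in codes]            # approximate reconstruction
```

After pruning, half the weights are exact zeros (cheap to skip), and after quantization each surviving weight is stored as a small integer instead of a 32-bit float, which is where the compute and memory savings come from.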
Why small models matter
- Edge AI: enables smart appliances, vehicles, and devices to work offline with onboard compute.
- Faster, cheaper inference: critical for AI agents, IoT, and embedded robotics.
- Resilience: edge-based models continue functioning even without cloud connectivity.
- Sustainability: helps slow AI’s growing power consumption while improving accessibility.
Quantum connection
- Multiverse’s compression techniques are inspired by quantum computing, particularly tensor networks, which identify and retain only relevant parts of complex systems.
- The company started in quantum computing and still plans to port AI compression to quantum processors in the future.
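The intuition behind the tensor-network idea can be sketched with its simplest case: replacing a big weight matrix with two thin low-rank factors, keeping only the most relevant directions. The layer size and rank below are made-up numbers for illustration, not Multiverse’s actual architecture.

```python
# Simplest tensor-network-style compression: an m x n weight matrix W is
# approximated by two thin factors A (m x r) and B (r x n), where r is the
# number of "relevant" components retained.

def full_params(m, n):
    return m * n                      # parameters in the original matrix

def factored_params(m, n, r):
    return m * r + r * n              # parameters in the two thin factors

m, n, r = 4096, 4096, 64              # hypothetical layer size and retained rank
before = full_params(m, n)            # 16,777,216 weights
after = factored_params(m, n, r)      # 524,288 weights
print(f"compression: {before / after:.0f}x")   # prints "compression: 32x"
```

The smaller the retained rank relative to the original dimensions, the larger the savings, which is why identifying only the most relevant parts of the model matters so much.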
Broader implications
- AI’s “compute explosion” is unsustainable without innovations like this.
- Small, task-specific models may soon run directly on phones, cars, and robots, reducing cloud dependence.
- The next wave of AI might not be bigger, but smarter and smaller—and possibly quantum-inspired.
Subscribe to TechFirst on all the major podcasting channels …