AI models are powerful, but they don’t forget. And that’s a problem.
They hallucinate. They inherit bias. They absorb sensitive data. And once they’re trained, fixing those issues is painfully expensive. Retraining takes weeks and maybe tens of millions of dollars. And any guardrails the AI company puts up are brittle.
What if you could perform surgery on the model itself?
In this episode of TechFirst, John Koetsier sits down with Ben Luria, co-founder of Hirundo, to explore machine unlearning, a new approach that selectively removes unwanted data, behaviors, and vulnerabilities from trained AI systems.
Hirundo claims it can:
- Cut hallucinations in half
- Massively reduce bias
- Reduce successful prompt injection attacks by over 90%
- Do it in under an hour on a single GPU
- Preserve benchmark performance
Instead of adding more guardrails, machine unlearning works inside the model, identifying problematic weights, isolating behavioral vectors, and surgically removing risks without degrading quality.
If AI is going mainstream in enterprises, it needs a remediation layer. Is machine unlearning the missing piece?
Transcript: machine unlearning
Note: this transcript may not be 100% correct. Check the video for exact quotations.
John Koetsier:
Machine learning, machine learning, machine learning. We hear a lot about machine learning. Is it time for us to start talking about machine unlearning? Hello and welcome to TechFirst. My name is John Koetsier. We have AI everywhere these days. Obviously super powerful models doing incredible things with impressive results.
We’ve also got hallucinations. We’ve got some bias. We have unethical use. Has anyone seen Grok lately? Prompt injection attacks are all over the place. How do we fix it all? Retraining is expensive. Fine-tuning is slow. Guardrails can be brittle. Here’s a new idea: machine unlearning — selective removal, surgical precision, erasing bad behaviors while preserving performance.
Today I’m joined by Ben Luria from Hirundo. Hirundo is working on machine unlearning — that’s technology that can remove unwanted data, behaviors, and vulnerabilities from already trained models. Testing across both closed and open models, they’ve cut hallucinations in half, reduced bias massively, and shut down most prompt injection attacks — all without degrading model quality.
Welcome, Ben. How are you doing?
Ben Luria:
Great. Thanks for this great introduction.
John Koetsier:
Awesome. Well, thanks for being here. I know it’s late for you, early for me. Hey, this is life. Let’s jump right in. What the heck is machine unlearning?
Ben Luria:
All right. When we’re talking about machine unlearning, basically we’re talking about almost a neurosurgery approach for remediating AI models.
Before we dive into what machine unlearning is and how it's done, a bit about why it's needed. We read a lot about the huge expense of data centers and the time it takes to train the large language models that took the world by storm a few years ago and improved our lives, and definitely made them more interesting.
The analogy that I like to give is that we call it a neural network, right? It’s supposed to resemble the human mind, and it carries very similar traits. It’s easy to learn. It’s nearly impossible to forget. After you spend sometimes months training your models, what’s there is there to remain.
If you accidentally included in the training data things that shouldn’t be there — PII, noncompliant data, and so forth — it’s there. It’s entangled. It’s embedded in the neurons. If the model has developed certain traits that are problematic — we mentioned some of them: vulnerabilities, biases, tendencies to hallucinate — they’re also there to remain.
Just like if I present you with a video of a pink elephant and then tell you not to remember this pink elephant, you’ll still remember it. It’s the same with AI — until machine unlearning kicks in.
Unlearning has been a field of research in academia and in big tech for the last 10 years, even prior to LLMs. It’s basically about mechanisms that allow remediation of AI models in a deep way — meaning the ability to undo those effects, to teach AI how to forget, to remove after the fact either unwanted data or unwanted traits from models.
The main clarification worth adding is that we’re talking about solving these things at the model level itself.
Existing solutions in the market around AI trustworthiness, AI safety, and AI security focus almost exclusively on outside defense — perimeter defense. Usually we call them guardrails, a sort of firewall to prevent or filter problematic inputs and outputs. These solutions are needed, and I believe they’ll remain part of the AI stack, but they can’t be the only solution.
Unlearning provides a deeper way of remediation in the model itself.
John Koetsier:
It’s pretty interesting. I think when GPT-4 launched, there was a story about the system prompt that came out, right? It became available. It wasn’t massively revealing or a huge scoop, but there were clearly attempts in the system prompt to put guardrails around what ChatGPT can say and what it can’t say, what it can do and what it can’t do.
That’s kind of an after-the-fact, locking-the-barn-door-after-the-cows-are-out type of thing, right? Machine unlearning is actually taking that out of the model. You’re still going to have some system prompt, probably, but you’re not going to have to try and protect against things that you know are already embedded in your model. Correct?
Ben Luria:
Yeah, absolutely. I believe a lot of the things that we’re seeing at the moment will remain. But when we think about the AI stack in enterprises and current post-training operations in AI labs, that’s the missing piece.
There’s no real remediation available for AI. If there’s something problematic in there that poses a business risk, reputational risk, or legal risk, right now the existing approaches are more similar to band-aids rather than surgery to take the problematic piece out.
John Koetsier:
And let’s be honest. We don’t know everything that’s in every model that’s out there, but we know people are grabbing the world, right? As much data as they can get, wherever they can get it. So you’re pretty much guaranteed to have stuff in there that, if you had some time and thought and energy to think about it for half a second, you wouldn’t want. Correct?
Ben Luria:
Absolutely. It’s a bit of a Wild West moment across the ecosystem, right? Technology always comes before regulation, I’d say. So we’re still adapting. We still don’t know how to control these tools. I think right now everyone is trying to grab as much as they can. We’re seeing a lot of out-of-court settlements around IP, a lot of different claims about whether someone used my data or my creation in AI models.
We’re aiming to solve that, but also the operational risks of deploying AI. A lot of reports talk about this conundrum: everyone wants to use AI in enterprises and businesses, but at the same time so many projects are stuck in the playground because of the risks that deploying an AI model at scale still poses to an organization.
John Koetsier:
Let’s talk about how it works. We’ve all heard the numbers in terms of parameters and the amount of data going into building some of these large LLMs. I can only imagine how much is there. How on earth do you find what you don’t want there and remove it? What does that process look like?
Ben Luria:
Right. Maybe a bit of background about the team behind Hirundo and why we’re best suited to tackle this challenge.
I call myself the pretty face of the company because I come from a less technical background, which surprises people. I’m one of the first Rhodes Scholars from Israel. I researched at Oxford in public policy and innovation. But that’s not enough to deliver such cutting-edge technology.
Our chief scientist is one of the most acclaimed computer scientists from Israel. His name is Professor Roded Sharan. He was the Dean of Computer Science and Executive VP at the Technion, Israel’s leading technical university. He started working on AI in the late 1980s and is approaching 65 US patents — the archetype of the crazy scientist.
Our CTO was an award-winning R&D officer for the Israeli government and a researcher at the Technion. We have a bunch of very talented PhDs in our research team. We’re obsessed with this problem. We have eight value stats around this technology, and we’re the first bringing this to market, even though it’s been a field of research for the last decade.
In a nutshell, the process has roughly three stages.
The first is detection — where are things at? From a product point of view, the input could be as simple as clicking a button to detect and diagnose biases in a model against gender, age, race. It could detect tendencies to hallucinate, detect vulnerabilities, or if you know there’s PII that you accidentally fine-tuned the model on, that PII can be the input.
Almost like a detective, our algorithm detects where these things are represented. If it’s problematic data, we detect it at the weight or parameter or neuron level — very specific. If it’s a tendency or behavior, we find it across model vectors in the latent space. Where are the directions that represent tendencies to be successfully hacked, or tendencies to be biased against an age or race category?
So detection is one. Two is isolation.
The reason there are no other unlearning solutions in the market right now is twofold. A, sometimes existing research doesn’t actually remove what you want to remove. But B, and just as important, a lot of times you end up removing more than you wanted.
You try to make the model more secure and resilient to attacks, but now it’s not following instructions properly and not engaging with users even when they’re asking reasonable questions.
The second step in our IP and engine is isolating the detected risks using different mechanisms. I won’t dive too deep, but it involves understanding behaviorally which vectors to preserve and not steer away from, or how to disentangle the information we want to remove at the weight level from other information.
Then the third step is the actual remediation or mitigation. Either steering the model away from bad vector directions or editing out specific weights after they’ve been disentangled.
This complex process works in an automated system, given a model and what you want to remove. One of the main advantages is that it’s extremely lean and fast on resources.
In models with fewer than 20 billion parameters, the process usually takes less than an hour on a single GPU, even from the previous generation. Compared to fine-tuning or retraining — which can take weeks — that’s a significant gift back to your data science team.
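The three stages Ben describes for behaviors (detect a direction in latent space, isolate it from what should be preserved, steer away from it) resemble published activation-steering techniques. As a toy illustration only, with synthetic data and a made-up "behavior direction" (Hirundo has not published its actual method), a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden dimension (toy size)

# Hypothetical hidden-state activations: one batch collected while the model
# exhibits the unwanted behavior, one batch while it behaves normally. Here the
# "bad" batch is artificially shifted along the first axis to plant a direction.
bad_acts = rng.normal(size=(200, d)) + 3.0 * np.eye(d)[0]
good_acts = rng.normal(size=(200, d))

# 1) Detection: estimate the behavior direction as a difference of means.
direction = bad_acts.mean(axis=0) - good_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

# 2) Isolation: build a projector that removes only this one direction and
#    leaves the orthogonal complement (everything else the model knows) intact.
projector = np.eye(d) - np.outer(direction, direction)

# 3) Remediation: steer activations away from the unwanted direction.
steered = bad_acts @ projector

# The component along the bad direction is (numerically) zero after steering.
print(np.abs(steered @ direction).max())
```

The key property, and the point of the isolation step, is that the projection only touches a one-dimensional subspace: activations orthogonal to the detected direction pass through unchanged, which is how quality on unrelated tasks can be preserved.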
John Koetsier:
Wow. That’s kind of insane. It’s mind-blowing in a sense. I’m trying to compare it on a human level. It’s like brain surgery, right?
You want to take out the pink elephant but not remove information about regular elephants in Africa. That is insanely challenging, I would assume. Are you using AI to fix AI?
Ben Luria:
Like every good development team right now, we use coding agents and so on. But I wouldn’t call it using AI to fix AI.
Our algorithm isn’t just putting an AI model on top of another AI model to filter things — that’s basically what guardrails are. To build the best algorithms, we sometimes take assistance from coding agents, but that’s not the core of what we’re doing.
John Koetsier:
So you’re the one startup in the world that’s modest about AI use and doesn’t say AI-powered, AI-infused, AI everything.
Ben Luria:
Don’t get me wrong — if you go to our website, you’ll see the word AI a lot. But what we do is so science fiction that it even goes against existing literature on these ideas.
Most people don’t know about machine unlearning as a concept. The few who do think of it as a field that’s impossible to solve at scale. So the initial response we get is, “Oh, you’re just putting a layer on top.” That’s why I say it’s not just AI on AI.
John Koetsier:
It’s fascinating. When you think about human unlearning, we forget stuff, but when things happen to us it’s almost impossible to unlearn patterns and neural pathways. Really challenging.
You’re talking about eliminating hallucinations. That’s critical. We’re very familiar with AI being confidently incorrect. How do you do that? What do you focus on?
Ben Luria:
First, there’s confusion between hallucinations and inaccuracies.
If a model has grounded data that’s plainly wrong and outputs something based on that data — whether from training, fine-tuning, RAG, online articles, or user input — that’s an inaccuracy. Garbage in, garbage out. The model did what it was supposed to do.
What we define as hallucinations — and many researchers agree — is when the model either derives information and confidently outputs something wrong, or doesn’t have enough information but fills in the gap confidently, making stuff up.
The difference is that here you can spot a behavioral pattern. The model is overconfident without relying on information it has. Instead of saying “I don’t know” or asking for more information, it starts generating.
In both cases, we detect behavioral tendencies in the model’s vectors. Just like we hunt for bias representations, we can find behavioral patterns that lead to hallucinations — overconfidence, ignoring context, and so forth.
Once isolated and mitigated, we see reductions in hallucinations in summarization, RAG outputs, and more. We’re not teaching the model new information — we’re making it more grounded in itself.
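One way to make the "confident without information" pattern concrete, purely as a toy illustration and not Hirundo's detection method, is to compare the entropy of the model's next-token distribution when it has supporting context versus when it doesn't (the distributions below are invented):

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token distributions over four candidate answers.
grounded = [0.90, 0.05, 0.03, 0.02]    # supporting context: one clear answer
ungrounded = [0.30, 0.28, 0.22, 0.20]  # no real evidence either way

print(entropy(grounded))    # low: confidence matches the information
print(entropy(ungrounded))  # high: internally, the model is uncertain

# A hallucination in the behavioral sense above: the model is in the
# high-entropy state, yet still emits one definite answer instead of
# saying "I don't know" or asking for more information.
```

The behavioral signature is the mismatch: internal uncertainty paired with an outwardly confident generation, rather than the output simply being wrong.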
John Koetsier:
Very cool. Let’s talk about prompt injection. Some security experts say it can’t be 100% solved and will become a bigger problem as LLMs enter more areas of the world.
You said you can reduce it by over 90%. How?
Ben Luria:
One of the main use cases for our platform is AI security and reducing prompt injection.
I agree — there’s no silver bullet. What makes AI, AI is that it behaves somewhat like the human mind. It’s a numbers game. There will always be edge cases.
We don’t guarantee 100%, especially with behaviors.
We’ve tested across many models — Gemma, GPT-4o, Neuron, Mistral, LLaMA, Qwen, DeepSeek, and others. We run tests that trigger vulnerability patterns and identify where those behaviors are represented.
Using our system, we can compose improvements — less bias and more security. We’ve seen consistent reductions. Recently we presented to a major AI lab a case study showing 90.8% reduction in successful prompt injections without guardrails and without harming key performance metrics like MMLU and other benchmarks labs care about.
We don’t position ourselves as a replacement for guardrails. Enterprises should use guardrails. But guardrails are the outside fence. If we make the model 90% safer internally, attacks that pass the guardrails can still fail at the model level.
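The stacked-defenses point can be made concrete with back-of-envelope arithmetic. Assuming the two layers fail independently (a simplification) and a hypothetical 90% guardrail catch rate, alongside the 90.8% model-level figure from the case study above:

```python
# Layered defense under an independence assumption (illustrative only: the
# guardrail catch rate is hypothetical; 90.8% is the case-study figure).
guardrail_pass = 0.10        # guardrail lets ~10% of injections through
model_pass = 1 - 0.908       # unlearned model stops 90.8% of what remains

# An attack succeeds only if it slips past BOTH layers.
combined = guardrail_pass * model_pass
print(f"{combined:.2%} of attacks succeed end to end")  # 0.92%
```

Under these assumptions, roughly one attack in a hundred gets through end to end, which is the argument for treating model-level remediation as a complement to guardrails rather than a replacement.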
John Koetsier:
Perfect. Are you also working with the OpenAIs and Geminis of the world and helping them get it right at the source?
Ben Luria:
We’re having conversations with most of the big AI labs. We’re targeting both labs and enterprises.
For enterprises, it’s about safer deployment of AI models at scale in high-stakes missions — banks, financial institutions, consumer apps.
For AI labs, we position this as another powerful tool in their toolkit, alongside reinforcement learning, supervised fine-tuning, and safety datasets.
John Koetsier:
It’s critical. Companies like OpenAI are getting sued for things they put out. There’s a lot at stake.
Ben Luria:
Absolutely. It’s not just regulatory or reputational risk. We also offer the ability to deliver models faster while acing benchmarks if they adopt us in post-training operations.
Reinforcement learning and supervised fine-tuning are here to stay. Unlearning is a unique approach that acts faster and can freeze performance on benchmarks while fixing weaker aspects.
John Koetsier:
Do you think machine unlearning will become a standard core component of the AI stack?
Ben Luria:
That’s what I’m here for. We’re here to spread the word and build a unicorn. We have an amazing team and backers. It’s an uphill battle — we’re a small startup from Israel, not based in the Valley.
But the proposition makes sense to everyone who hears it: how come AI can’t forget, and what are the opportunities if we make it forget selectively?
I think when we look back from 2030, we’ll see that today’s AI stack is just an early version. In the absence of a remediation layer, it’s still evolving. By 2030, unlearning will be such an obvious standard that we’ll wonder how it wasn’t already.
John Koetsier:
Super cool. Ben, thank you for your time.
Ben Luria:
Thank you. Really appreciate it.