Imagine teaching a robot 1,000 tasks in just 24 hours. Imagine teaching robots the way you teach humans. In fact, what if teaching a robot were as easy as showing it once?
Humans can learn new skills almost instantly by watching, trying, or receiving a quick explanation. Robots, historically, haven’t been so lucky. Training them often requires huge datasets with real or virtual data, massive engineering effort, and weeks or months of experimentation.
But that may be changing.
In this episode of TechFirst, host John Koetsier talks with Edward Johns, Director of the Robot Learning Lab at Imperial College London, about a breakthrough in efficient imitation learning that allowed a robot to learn 1,000 different tasks in just 24 hours.
Instead of collecting huge datasets, Johns’ team combines simulation training, clever algorithm design, and single demonstrations to dramatically speed up how robots learn.
We discuss:
- How robots can learn from just one demonstration
- Why breaking tasks into “reach” and “interact” phases makes learning faster
- The role of simulation data in robotics AI
- Why robotics doesn’t have the same data advantage as large language models
- The future of prompt-like robot training
- Whether humanoid robots will actually learn like humans
As robotics hardware rapidly improves and costs fall, breakthroughs like this could be the key to making robots truly useful in homes, factories, and everyday life.
If robots are going to become real collaborators with humans, they’ll need to learn quickly … just like we do.
Transcript: teaching robots like humans
Note: this is a partially AI-generated transcript. It may not be 100% correct. Check the video for exact quotations.
John Koetsier
Can we teach a robot as fast as a human? Hello and welcome to TechFirst. My name is John Koetsier. As much as we love tech around here, humans are actually pretty amazing. We can often learn a task just by seeing it or being told about it. It’s one-shot learning, single-shot learning. Sometimes we can even figure it out on our own without any insight, guidance, or instruction. Robots, on the other hand, tend to need huge amounts of data, which can be painful, slow, and expensive to get. Maybe not anymore. Someone just trained a robot on a thousand different tasks in only 24 hours. To learn more, we’re chatting with a professor at Imperial College London. He formerly worked for Dyson. He’s now the director of the Robot Learning Lab there. His name is Edward Johns. Welcome, Edward. How are you doing?
Edward Johns
Hi, John. Yeah, thanks so much. Really great to be here and chat with you about efficient imitation learning in robotics. Thanks a lot for having me.
John Koetsier
Awesome, super pumped to have you. Let’s just start with the big obvious question: how did you teach your robot a thousand things to do in 24 hours?
Edward Johns
Well, it wasn’t me, it was my students. So I should really congratulate them on actually physically providing all of those thousand demonstrations. We actually have a video of it on our website, which you can go and watch to see them all providing the thousand demonstrations. But in terms of the robot learning algorithm, what we did is we thought carefully about how to decompose robot trajectories into two sequential phases. So imagine, say with my phone here, that you wanted a robot to grab a phone. Rather than having to learn all of these actions in one monolithic policy, you would decompose it into two different policies: reaching the object and then interacting with the object. And we found that that was an inductive bias that made things much more efficient. And we also used a lot of simulation data as part of the robot’s brain and combined that with one demonstration for each task. So it’s a combination of those two ideas really that led to us getting this massive speed-up in efficiency.
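The reach/interact decomposition Johns describes can be sketched in a few lines. This is a toy illustration, not the lab’s actual code: in the real method the reaching phase is a policy learned from simulation data, whereas here it is a simple proportional controller, and every name below is hypothetical.

```python
import numpy as np

def reach_policy(ee_pos, target_pos, gain=0.2):
    """Reaching phase: step the end-effector toward the demo's start pose.
    (In the real method, generalizing this phase is learned in simulation.)"""
    return gain * (target_pos - ee_pos)

def interact_policy(step, demo_actions):
    """Interaction phase: replay the actions from the single demonstration."""
    return demo_actions[min(step, len(demo_actions) - 1)]

def run_episode(start, demo_start, demo_actions, tol=0.05, max_steps=200):
    """Run the reach policy until near the demo's start pose, then
    switch to the interact policy and replay the demonstration."""
    pos = np.asarray(start, float)
    demo_start = np.asarray(demo_start, float)
    phases, step, phase = [], 0, "reach"
    for _ in range(max_steps):
        if phase == "reach" and np.linalg.norm(pos - demo_start) < tol:
            phase = "interact"
        if phase == "reach":
            action = reach_policy(pos, demo_start)
        else:
            action = np.asarray(interact_policy(step, demo_actions), float)
            step += 1
        pos = pos + action
        phases.append(phase)
        if phase == "interact" and step >= len(demo_actions):
            break
    return pos, phases
```

Started from a new initial state, the episode first reaches toward the demonstration’s start pose (the part that must generalize), then replays the demonstrated interaction, which is the inductive bias that makes one demonstration enough in this sketch.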
John Koetsier
Help me understand that difference. I can assume that if you’re going super old school in terms of controlling a robot, you have to give it tons of spatial coordinates—move this way, do that, and everything. How did you simplify that? Go into some more detail if you would.
Edward Johns
Yeah. So I suppose the really old-fashioned way would be that for each individual task—let’s say it was a grasping task or opening a door, whatever it is—you would, as an engineer, specify the coordinates that the gripper has to go to and the trajectory that it needs to move through. And that does work. If you define a task well enough, you can get an engineer to design a robot that will perform that task. The problem is it’s not scalable.
And there are infinite tasks out there, and we don’t have infinite time to design those controllers. So a more scalable solution is to use machine learning. And that’s where it’s quicker and easier to collect the data and then let an algorithm learn from that data than to manually design the solutions.
The problem with machine learning and imitation learning in robotics is that collecting that data is very expensive and slow, and robotics doesn’t have very much data to begin with. So it’s a really big problem that needs to be solved. The way we solved it was, first of all, to think about what parts of the trajectory—this is really focusing on manipulation specifically, so interactions with robot grippers and objects—really need to be learned.
What are the different ways in which approaching the object can be modeled compared to interacting with the object? That was one inductive bias. The other inductive bias is whether some parts of the approaching reasoning can be trained in simulation rather than requiring real-world data collection.
As an example, if you’ve got a robot gripper here trying to grasp my fist, you wouldn’t need to go and collect data for all of these different trajectories like you would with traditional imitation learning. You provide one trajectory, and then in simulation it’s seen how to approach different types of objects—random objects—from different initial states. It’s able to combine that simulation reasoning with that single demonstration in order to know how to approach that object from a new state that it hasn’t actually seen any real data for.
John Koetsier
Super interesting, super interesting. How complex does that get in a real-world environment where things are moving, changing? Let’s say a humanoid robot in a human space.
Edward Johns
Yeah. So if you’ve got a complex environment where objects are moving, that becomes very difficult. I think the first thing the community is trying to solve is just dealing with static objects and making sure the robot can do something even when the object isn’t moving.
If you’re talking about objects that actually move around and you’ve got to catch something that’s flying or moving around in front of you, I would leave that for another year or so before that becomes the challenge in robotics. The more complex the environments become, the more difficult the challenges and the more careful we have to think about our solutions.
But the more complex the environment becomes, the more data you would need if you were to go through the old-fashioned route of just collecting demonstrations. So as we get into more complex environments, that approach of simply collecting demonstrations to cover all of those edge cases might not be the right solution in the long term.
John Koetsier
It is really interesting right now because we’re entering a period where we’re getting more and more data. Whether they’ve used it well or not, if you look at Tesla, they’ve gathered enormous amounts of data from millions of vehicles out on the road. If you look at humanoid robots or other robot form factors, we’re starting to see those getting deployed in test situations doing actual jobs.
There are some in homes, actually. They’re not talking about it a lot, but there are some humanoid robots in homes. You see Figure robots running around with the team in the parking lot. We’re actually at the stage where we’re capturing more and more data, and we’re just at the cusp of allowing that to tsunami into enormous amounts of it. But you made a good point earlier. There are infinite tasks. There’s so much to learn. We need faster ways.
Edward Johns
Yeah. And I think particularly when you think of robotics as being something that the end user should have control over. If you’re buying a robot and it can only do a fixed number of tasks, you’re going to get to the point quite quickly where that’s not very useful for you.
You’d have to call up the company that sold you the robot and say, “Hey, can you spend another month training it on this one task that I want it to do?” And they’ll say, “No, we’re not going to do that for you.”
So really you want a robot that you can teach yourself. That’s probably going to need an efficient and easy way to teach the robot. In an ideal world, we could have access to effectively infinite data, which is how large language models emerged, because there’s not infinite language data but certainly a huge amount of language data that’s constantly growing.
But we don’t have that in robotics. So it’s a much harder problem. We probably can’t just use the same ideologies that we used in training large language models in robotics. We’ve got to think more carefully about the efficiency of those algorithms.
John Koetsier
For LLMs, we have this technology called the internet where there are trillions or quadrillions of words out there that people can grab and start learning from. It’s a little different in robotics. That data is a little closer held.
Edward Johns
Yeah. With language models, the data was already there before people found a use for it in AI. In robotics, the data is only going to be there after people have found a use for robotics, if you’re going to go down that route of deploying robots in the real world to collect that data.
That approach creates a bit of a chicken-and-egg problem. How do you get that robot deployed meaningfully and at scale in order to collect enough data to do the task that the person would have bought the robot for in the first place? It’s not guaranteed that that approach will happen.
A more guaranteed approach may be to make it easier for end users to deploy robots in the first place without having to solve that chicken-and-egg problem themselves.
John Koetsier
If you think of an employer-employee or person-to-person analogy, I can train somebody to do something that I know how to do, and I should be able to train my robot to do something that it didn’t know how to do when it left the factory. This would allow that. And your method of training is somewhat prompt-based, correct?
Edward Johns
There are a couple of works coming out from our lab over the last year. One of them we’ve just mentioned, which is the “Learning 1,000 Tasks in a Day” project. Another work that we published last year was called Instant Policy. This is a method based on in-context learning.
It’s a big neural network that we’ve trained with simulated data, and we can prompt that network with a demonstration. A little bit like you would prompt a language model with some text, we can prompt that network with a demonstration. It does a forward pass through the network and outputs the appropriate action.
Because it learns instantly, you as the end user are in control of it, and you can get creative and playful with all the different applications you find for it. It’s a sort of ChatGPT version of imitation learning where you can prompt it with your own demonstration and the robot will learn that task right away.
The thousand-tasks work was a different kind of method. It wasn’t really based on prompting in that same way. It was more based on decomposing trajectories into alignment and interaction phases.
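The prompt-then-act interface Johns describes for Instant Policy can be illustrated with a toy stand-in. Here the "network" is just a nearest-neighbour lookup over the prompt demonstration; the actual method is a large neural network trained on simulated data, and the function and variable names below are hypothetical.

```python
import numpy as np

def in_context_policy(demo_states, demo_actions, obs):
    """Prompt with a demonstration (state-action pairs), then predict an
    action for the current observation in a single forward 'pass'.
    Toy version: return the action whose demo state is nearest to obs."""
    demo_states = np.asarray(demo_states, float)
    dists = np.linalg.norm(demo_states - np.asarray(obs, float), axis=1)
    return demo_actions[int(np.argmin(dists))]
```

The point of the interface is that nothing is retrained at deployment: the demonstration is data passed in at inference time, the way text is passed to a language model, so the robot "learns" the task in one forward pass.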
John Koetsier
If you go prompt-based, does that mean it’s probabilistic rather than deterministic, and could that cause some challenges in the future?
Edward Johns
It is probabilistic to a certain extent. But as long as your training data captures the right kinds of distributions that you want the robot to output, then that’s fine.
All methods based on deep learning have the argument that you’re not fully in control of what the robot is going to predict. But if you are in control of the demonstration you’re providing, if you’re in control of the training data, and if you’re in control of the empirical evaluation that proves it is robust and safe, then I think that’s sufficient.
Going down the route of only deploying robots if we can prove fundamentally that they are safe is going to make it very difficult to deploy robots in many scenarios. We don’t do that with humans either. We don’t prove that a human is safe before we allow them to drive us in their taxi.
So I think we can prove things empirically in robotics, and that should be fine.
John Koetsier
In fact, I think we’ve proven that cars are not safe in all cases, 100 percent of cases. The same will be true for most machines. What tasks can this work for? Essentially anything?
Edward Johns
There’s what tasks it can work for now and what tasks it can theoretically work for in a couple of years. Right now, the tasks it can perform are things like picking up objects, placing them somewhere, opening doors and drawers—tasks where the trajectories are relatively smooth and the precision is maybe half a centimeter or so.
With more simulation training data, we should be able to have much more versatile trajectories and much more precise tasks. To get there, one part of solving the problem is difficult research and development, but another part is scaling up simulation data.
The nice thing about simulation-based robot foundation models is that it’s quite quick and cheap to get to the scale you need for them to start working, compared to having to collect all of that data in the real world. You’re going to reach that critical point much more quickly.
John Koetsier
You mentioned precision. That’s not only a function of your software and intelligence but also your hardware. There are different levels of precision depending on your actuators, gears, and so on. Some robots are designed to be 100 percent accurate every time, like robots welding cars in assembly lines. Others are designed to get into the area and then narrow down.
Edward Johns
Yes. A robot that’s really precise and stiff, meaning it goes exactly where you want it to go, works well if the place the robot thinks it needs to put its hand is actually the correct place.
In traditional automation like car assembly lines, that’s fine because the robot goes back and forth to the same point over and over again. There’s no uncertainty about what it has to do.
In real robotics there is uncertainty. You’re passing an image through a neural network that’s predicting things that might not be micron-level accurate. Sometimes you actually want a little compliance between the gripper and the environment so that if the prediction isn’t perfect and the gripper needs to rub against the edge of something to complete the task, that compliance allows it.
But if the robot knows exactly what it’s supposed to do, compliance could be bad. Imagine a robot arm trying to drill a hole and the drill just flopped down because of compliance. It would be a terrible drilling robot.
You have to design that compliance around how much trust you have in whether the robot is predicting the right actions. There are also different types of robots based on quality. A very cheap robot arm will probably not be very accurate. That might be fine if the AI isn’t very accurate either. What’s the point in spending a lot of money on a robot if the AI isn’t good?
But if the AI is good, then you can afford to have a more precise robot and match the two together. Conversely, some robot learning algorithms are good enough to deal with the inaccuracies of a cheap robot arm. Depending on the controller design, you can exploit cheap arms or expensive ones.
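The stiffness-versus-compliance trade-off Johns describes is the core idea behind impedance-style control. A minimal, hypothetical sketch follows; the gains and names are illustrative and not drawn from any real controller.

```python
def compliant_step(pos, target, contact_force, stiffness=1.0, compliance=0.5):
    """One control step along a single axis: the stiffness term pulls the
    gripper toward where the robot *thinks* it should go, while the
    compliance term yields to sensed contact forces instead of fighting
    them. High stiffness + zero compliance is the drilling robot; lower
    stiffness + some compliance tolerates imperfect predictions."""
    tracking = stiffness * (target - pos)      # go where the AI predicts
    yielding = compliance * contact_force      # deflect under contact
    return pos + 0.1 * (tracking + yielding)   # 0.1 = arbitrary time step
```

With no contact, the controller tracks its target; under an opposing contact force, the compliance term lets it give way, which is exactly the behavior you want when the prediction may be off by a few millimetres.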
John Koetsier
When can we start seeing the results of this in the real world?
Edward Johns
I think soon. It depends on the applications, but in tasks like manufacturing where somebody wants to show a demonstration of a task and then step away and let the robot perform it autonomously, this is something I see happening quite quickly.
End users don’t want to spend days collecting data and waiting days for a neural network to train. So in manufacturing and industrial environments there’s a real pull for this technology. That will likely be the first market.
Ultimately, any robot that can learn a task very quickly will be better than one that learns slowly. The way I see the field going is that robots will become very quick at learning, whether in manufacturing or at home.
If robots are to become genuine alternatives to humans, they probably have to learn quickly like humans in order to be as useful as humans. Right now we have humanoid robots that look like humans and have the form factor of humans, but they are certainly not learning like humans.
The videos you see might give the impression that they learned like humans, but you don’t see all the months of data collection behind that video. So the robot looks like a human, but it isn’t learning like one, which means it’s not a real alternative to human labor yet.
John Koetsier
Maybe let’s zoom out a little bit. It must be a fascinating time for you. You’re deep into robotics, building technology and the AI behind it. We’re seeing an explosion in robotics right now across all form factors, especially humanoids.
There are around 150 humanoid companies in China. The government there is even saying there are too many. There are dozens in the U.S., some in Europe, and others around the world. What are your thoughts seeing all this? When do you think we’re going to get to the point where this really changes the world?
Edward Johns
I think it’s great that a lot of money is being put into robotics. It’s also great for me as an AI person in robotics that a lot of hardware is coming out now that is much cheaper and much better than before.
In some ways the AI and the hardware have to improve together. Developments in deep learning started showing intelligent behavior even on older hardware. That encouraged companies to build more robots because they saw a market for combining hardware with AI.
Now AI researchers see lots of cheap robots available, which means they can start companies and experiments more easily. These two fields are pushing each other forward.
It’s great that there are many hardware companies because most things become cheap once they are manufactured at scale. One reason robots have been expensive is simply that they haven’t been manufactured at scale.
The same goes for tactile sensors. High-quality ones are very expensive today, but eventually they will become cheap.
Certain regions of the world have different strengths. There’s a lot of very high-quality and good-value hardware coming out of China, including many humanoid companies and robot arms. Some of the best robot arms in the world are coming from China right now.
On the AI side, Europe and the U.S. are very strong. The robot arms manufactured there are often very good but also very expensive. That can make it hard to justify them when cheaper alternatives exist.
Overall, it’s a great time to work in robotics. There’s much more support in the community, and many people are releasing open-source models. It’s much easier for students to get started now. If you have only a hundred pounds, you can buy a cheap robot arm, often even a 3D-printable one, and start training it on basic tasks.
Things have moved forward very quickly in the last couple of years in terms of enabling people to enter the field.
John Koetsier
It’s going to be a super interesting next couple of years because you have companies in China producing thousands of humanoid robots a year and planning to scale. In the U.S., companies like Foundation are just starting but moving quickly with huge scale plans.
Tesla obviously has manufacturing capacity, and companies like Figure and Apptronik are moving forward as well. Somebody is going to scale in the Western world as well as in China to tens of thousands of units a year.
The question will be: will the hardware be good enough? Will the AI be good enough to justify that? Or will it fall on its face because it’s too early? That’s always the question. When do you scale? When is there enough functionality? When is it good enough to be a commercial success? Really challenging question to answer. Any thoughts?
Edward Johns
Going back to the work we’ve been doing on efficient robot learning, it somewhat changes that question. It’s less about when we’ve collected enough real-world data and more about when we’ve developed algorithms that are efficient enough to learn from the small amount of data we already have.
There are two approaches to solving this problem. One is brute force: collect more data and hope that it works. The problem is that robotics data isn’t shared between companies.
Unlike the ChatGPT moment with language models, where everyone trains on similar datasets from the internet, robotics companies all collect their own data. We’re very inefficient as a planet in terms of sharing robotics data. Every company ends up collecting the same types of data independently.
That approach might eventually lead to widespread deployment, but another approach is solving the foundational problem of how robots can learn very quickly. That’s the type of deeper academic research we’re doing at Imperial College.
If we focus on foundational algorithms that allow efficient learning and focus on the user experience—making it easy and quick for end users to teach robots—then I think that’s the real route to deploying robots at scale in the short term.
With language and images there was so much data that throwing it into neural networks worked quickly. Robotics might eventually follow that path, but it will likely take longer. Efficient learning methods may get us there sooner.
John Koetsier
Maybe we should end here. What’s your next step? What are you working on next? What’s the next big problem you’re solving?
Edward Johns
We’re continuing this line of research in efficient imitation learning. The goal is to reach a point where someone could walk into one of our labs, think of a task off the top of their head, teach the robot that task, and then step back and watch the robot perform it.
You can’t do that with traditional imitation learning because you’d have to collect hundreds of demonstrations and then wait a couple of days for the neural network to train. You would come back at the end of the week to see the robot perform the task you wanted.
If we can get to the point where someone teaches a robot instantly and sees it perform the task right away, that would be amazing. That’s what we’re aiming for right now.
Thanks so much for having me, John.