How MIT’s Cheetah robot teaches itself to walk in 3 hours

Programming robots is so 2010. Providing the AI framework within which they can teach themselves is accelerating training and development of new behaviors from 100 days to 3 hours.

In this TechFirst, we meet 2 of the researchers behind making MIT’s mini-Cheetah robot learn to run … and run fast. Professor Pulkit Agrawal and grad student Gabriel Margolis share how fast it can go, how it teaches itself to run with both rewards and “punishments,” and what this means for future robots in the home and workplace.



TechFirst podcast: how MIT’s mini-cheetah teaches itself to run

 

Full transcript: MIT’s running robot

(This transcript has been lightly edited for length and clarity.)

John Koetsier: Can robots run? Well, not easily, and usually not without a ton of training. MIT, however, has built a Cheetah, which recently broke the record for fastest robot, and perhaps even more interestingly, taught itself how to run. To learn just how it did that and what it means for the future, we’re chatting with MIT professor, Pulkit Agrawal, and PhD student, Gabriel Margolis. Welcome to TechFirst.

Pulkit Agrawal: Thank you, John, for having us. We are excited to be with you.

John Koetsier: Excellent. Tell me about the robot. What is Cheetah?

Pulkit Agrawal: So, Cheetah is a robot which was developed by the Biomimetics Lab at MIT, in the hope that someday we can have robots like the real cheetah, right? And it has gone through multiple iterations. And, you know, what we have now is a miniature version of the bigger Cheetah, and this is what is called the Mini-Cheetah. So think of it as a robotic platform with which we can study how we can get to the real cheetah-like locomotion.

John Koetsier: Tell me about the size and the speed. It’s got four legs … what kind of weight are we looking at here?

Gabriel Margolis: Yeah. So, the robot, it’s 9 kilograms, so, around 20 pounds. And, right, four legs. Each leg actually has three motors in it, so two in the shoulder and one in the knee. And so there are a total of 12 motors on the robot, and it stands about 30 centimeters tall.

John Koetsier: That’s actually really interesting that you say there’s three motors per leg, because I’m trying to imagine how many muscles I might have in my leg, and I’m guessing it’s [laughing] maybe an order of magnitude more?

Gabriel Margolis: Yeah, yeah. So, in some ways, well, definitely this robot is a very simplified version of what we would find in nature. And with three degrees of freedom, what we can get is a good range of motion of the foot to be able to, like, put it in different locations and apply forces in different directions. But a whole lot of stuff can happen between the foot and the shoulder, that may or may not be captured by this robot, but could occur in biological systems. Yeah.

John Koetsier: So, I want to talk about how it taught itself to run, which is super cool, super interesting, and super relevant to future robotic design and delivery. But first off, how fast does it go?

Gabriel Margolis: So, it can go up to…well, the fastest that we’ve recorded it using the controller we designed is 3.9 meters per second, which, to our knowledge, is the fastest that anyone has made it go. So, yeah, for reference, that’s probably like a pretty fast human running speed. I mean, not like Usain Bolt fast sprinting, but a typical person. Like, I would have to…I’d be a little short of breath running beside it, for sure.

John Koetsier: Okay, interesting. So, we’re not quite at cheetah speed here, you know, 70 kilometers an hour or something like that. But, maybe we’re approaching 25 or something like that?

Gabriel Margolis: Yeah, yeah. And of course, one thing to note is that this robot stands much shorter than a human or cheetah, has much shorter legs, right? So, one important thing to consider is adjusting for size when we think about the speed of these robots. And of course, there have been other legged robots too, which maybe have run faster in the past, but were much larger.
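
[Editor's sketch: a quick arithmetic check of the figures quoted above, written as a few lines of Python. The 3.9 meters per second and the 30-centimeter standing height come from the interview. Using standing height as the yardstick for "adjusting for size" is our own assumption, not a metric the researchers name.]

```python
# Figures quoted in the interview.
SPEED_M_PER_S = 3.9   # fastest recorded run
HEIGHT_M = 0.30       # standing height of the Mini-Cheetah


def to_km_per_h(speed_m_per_s: float) -> float:
    """Convert meters per second to kilometers per hour."""
    return speed_m_per_s * 3.6


def heights_per_second(speed_m_per_s: float, height_m: float) -> float:
    """Size-adjusted speed: how many of its own heights the robot covers each second."""
    return speed_m_per_s / height_m


speed_kmh = to_km_per_h(SPEED_M_PER_S)
relative_speed = heights_per_second(SPEED_M_PER_S, HEIGHT_M)

print(f"{speed_kmh:.1f} km/h")
print(f"{relative_speed:.0f} body heights per second")
```

[In absolute terms that is about 14 kilometers an hour, and about 13 times the robot's own standing height every second.]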

John Koetsier: Interesting. So you scale up. You scale up your levers, you scale up your speed as well. Well, Pulkit, let’s get into one of the most interesting things about this. It taught itself how to run. How’d that happen?

Pulkit Agrawal: Yeah, no, great question, right? So, what we did was, we made up some environments where we thought the cheetah should be able to perform very well. So, that means that we set, you know, how the ground should vary, what should be the friction of the ground, so on and so forth, and then our algorithm pretty much outputs a sequence of commands, which are a sequence of joint positions. And then we would evaluate if this sequence of actions is good or bad by measuring how fast the cheetah walked, right?

And then, whatever actions led to faster motion, we would prioritize them more and more. So, what’s essentially happening is trial and error learning. And things which are winning get incentivized more, and the agent tries them more and more, right? So, that’s pretty much, at a high level, what’s happening.
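
[Editor's sketch: the trial-and-error idea described above, reduced to a toy in Python. This is not the researchers' algorithm. The three gaits, their speeds, and the simple "mostly repeat the best, sometimes explore" rule are all hypothetical, chosen only to show how actions that lead to faster motion end up being tried more and more.]

```python
import random

# Hypothetical gaits and the speed (m/s) each one produces; numbers are made up.
GAIT_SPEEDS = {"shuffle": 0.8, "trot": 2.1, "gallop": 3.9}


def learn_by_trial_and_error(gait_speeds, trials=200, explore_prob=0.1, seed=0):
    """Try gaits repeatedly, favoring whichever has looked fastest so far."""
    rng = random.Random(seed)
    gaits = list(gait_speeds)
    estimates = {g: 0.0 for g in gaits}   # running average of measured speed
    tries = {g: 0 for g in gaits}         # how often each gait was attempted

    def attempt(gait):
        measured_speed = gait_speeds[gait]  # in reality: run the simulator
        tries[gait] += 1
        estimates[gait] += (measured_speed - estimates[gait]) / tries[gait]

    # Try everything once so each gait has a measurement.
    for gait in gaits:
        attempt(gait)

    for _ in range(trials):
        if rng.random() < explore_prob:
            gait = rng.choice(gaits)               # occasional exploration
        else:
            gait = max(gaits, key=estimates.get)   # repeat what is winning
        attempt(gait)

    return estimates, tries


estimates, tries = learn_by_trial_and_error(GAIT_SPEEDS)
print("tries per gait:", tries)
print("preferred gait:", max(estimates, key=estimates.get))
```

[The real system searches over sequences of joint positions rather than three named gaits, but the feedback loop has the same shape: measure speed, then favor what worked.]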

John Koetsier: So, that’s super interesting. I’m wondering if it’s more complex than that, or if you think you’ll make it more complex than that as you go, because you basically said, “Hey, you’re rewarded, it’s a positive when you go faster.” There isn’t like…is there like a safety thing? You’re rewarded as you hit fewer things? Or you’re rewarded as you fall down less or something like that?

Pulkit Agrawal: Definitely, right. So I think you raised two very important points. So, definitely, you’re given a negative reward if you fall down.

John Koetsier: Otherwise known as a punishment [laughter].

Pulkit Agrawal: Yes, you need to punish it for doing bad things. And, you know, we also penalize it. I mean, don’t make an action which consumes too much energy, for example, right? So, there are some things that you want the robot to do, and some things we don’t want the robot to do. And, you know, the whole art is in saying the minimal amount of things so that the robot also has the flexibility to come up with its own behaviors to go fast. I mean, we don’t want to trust the human to say exactly how you should move the leg, right? But we still want to put constraints like, “Okay, don’t consume too much energy, and still, you know, run fast.”
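
[Editor's sketch: one way the rewards and "punishments" described above could be combined into a single score, in Python. The three terms (speed, falling, energy) come from the interview. The class name, the weights, and the linear form are our assumptions, not the values or formula the MIT team used.]

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    forward_speed: float  # meters per second achieved during this step
    energy_used: float    # energy spent during this step (arbitrary units)
    fell_down: bool       # True if the robot fell


# Hypothetical weights; tuning these is the "art" mentioned in the interview.
SPEED_WEIGHT = 1.0
ENERGY_WEIGHT = 0.01
FALL_PENALTY = 10.0


def reward(outcome: StepOutcome) -> float:
    """Reward speed, penalize energy use, and punish falling."""
    score = SPEED_WEIGHT * outcome.forward_speed
    score -= ENERGY_WEIGHT * outcome.energy_used
    if outcome.fell_down:
        score -= FALL_PENALTY
    return score


efficient_run = StepOutcome(forward_speed=3.0, energy_used=50.0, fell_down=False)
wasteful_run = StepOutcome(forward_speed=3.0, energy_used=200.0, fell_down=False)
fall = StepOutcome(forward_speed=3.0, energy_used=50.0, fell_down=True)

for name, outcome in [("efficient", efficient_run), ("wasteful", wasteful_run), ("fall", fall)]:
    print(name, round(reward(outcome), 2))
```

[Note that nothing here says how to move a leg. The score only describes what is wanted, which leaves the learner free to find its own way of running.]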

John Koetsier: That is really interesting, because you can imagine tweaking those parameters as you have different applications. You can also imagine a learning system that learns to tweak its own parameters as it, you know, looks at a future journey that it has to take, for instance, or something… “I need to get 100 miles over there. I have this much energy. I better go at that speed in order to get there. And that’s probably slow, but I’ll get there.” Talk about why it’s important that this robot teaches itself and what that will do in the future.

Pulkit Agrawal: Yeah, again, an excellent question. So, what we’re really looking for is a scalable way to construct robotic systems, right? If I said, “Hey, John, here’s a robot, and I want to make it walk, but come back after a year,” right? And then it’s like, “Okay, now it needs, you know, it also has a hand and it needs to open doors, give me another year, or a couple of years,” right? Now, you can kind of add up the number of years that would be required before we have a robot in the house, or to develop a particular application where the need is like, “Oh, we need this robot in a day.” Or, you know, some disaster happens and it’s a specific kind of manipulation required.

Now, traditionally, the process that people have been using requires you to study the actual system and manually design models which are used for doing control, right? And this process is good, it’s well established, but it’s not very scalable, it has this human in the loop, right? What we’re trying to do is to find the right balance of human-in-the-loop so that we can train these systems quickly.

So we are still building a simulation environment. But we are removing the human from designing the specific behaviors. The human doesn’t need to design the particular model of the robot which is used to come up with action, right? So, essentially, we can use this algorithm and within three hours come up with, you know, it could walk, but we could also have it jump. But we also have other places where we use similar frameworks to, say, have a hand manipulate an object. So I could pick up an object and it can start reorienting like this, right? With the same framework. So essentially, I think, the reason we are doing this is so we can make robot learning scalable so we can go to applications faster.

John Koetsier: And so, that’s part of what you talked about before, you said accelerating 100 days of learning into three hours. That’s a massive speed-up factor. I mean, that’s huge, right?

Pulkit Agrawal: Yes. So, maybe I will try to put that in context. So, I think when we are saying, you know, 100 days to three hours, I think this is because in simulation, things can go much faster than real time, right? And over here, we are actually leveraging advances from Nvidia and other companies which have built good simulators. Now, when we talk about scalability, I think the scalability we are talking about is the amount of human time required to engineer a behavior. So, I think those are the two different aspects, right?

John Koetsier: Yes.

Pulkit Agrawal: So, the cost is, if I remove a human designer, if I’m designing behaviors, I need to pay a cost. And that cost is we’re doing trial and error learning, but then it requires a lot more data, right? Now, if you were to do this in the real world, this would be very expensive because it would require 100 days of real-world experience. And if the robot falls down, that’s not very good. So, what the simulator offers is kind of a safe playground, where it’s okay for the robot to fall down and go back, but also the simulator can run much faster than real time.
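
[Editor's sketch: the speed-up implied by "100 days to three hours," worked out in Python. Both numbers come from the interview. Whether the gain comes from faster physics, from many simulated robots running side by side, or from both is not stated here, so the code only computes the overall ratio.]

```python
REAL_WORLD_DAYS = 100   # experience needed, measured in real-world time
TRAINING_HOURS = 3      # wall-clock time spent training in simulation

real_world_hours = REAL_WORLD_DAYS * 24        # 100 days expressed in hours
speedup = real_world_hours / TRAINING_HOURS    # simulated time per unit of wall-clock time

print(f"{real_world_hours} hours of experience in {TRAINING_HOURS} hours")
print(f"about {speedup:.0f}x faster than real time")
```

[That works out to 2,400 hours of experience in 3 hours, or roughly 800 times faster than collecting the same data on a physical robot.]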

John Koetsier: That makes a ton of sense. Yeah, absolutely. I totally get that. And, you know, after 100 days of real-world testing and having the robot fall down here and fall downstairs there, you probably wouldn’t have a robot left to do more training and testing with. Which brings up an interesting point. What perceptual systems have you built into Cheetah? How does it perceive the world around it? Where the ground is, what the environment it’s in, how not to run into things and how to maybe act differently depending on what it sees on the ground or feels on the ground?

Pulkit Agrawal: You want to take this one, Gabe?

Gabriel Margolis: Yeah. So, actually, right now, or in the system that we deployed, there are no cameras whatsoever. So, actually, all of the behaviors that we’ve shown have been achieved, essentially, blindly. So, what the Mini-Cheetah, what the robot is doing is it’s feeling the environment through…it doesn’t have touch sensors, either, in its feet, but it’s just feeling the environment through the motion of its joints.

So, for example, when you walk, think about how you walk over an ordinary indoor floor versus how you might walk across an icy pond. If you try to walk the same way, you might experience a very different feeling and find yourself in a very different position on these two different surfaces. So, even if you had your eyes closed, you would probably be able to tell the difference between the two surfaces that you were crossing as you cross them. And so, that’s actually all that this robot is doing right now to adapt to different terrains: it’s feeling what happens to its own body over time. So, in the future, we’re definitely interested in maybe adding more sensors, but all of these behaviors that we’ve shown were actually achieved without them.

John Koetsier: That’s amazing. It kind of reminds me of a long time ago. I went camping early spring, and we had to ford across a cold river and I lost feeling in my feet. I had to just shove my foot down to feel was there a rock? Could I take another step, move forward? So it’s just feeling where its own limbs are basically. I understand there’s probably extra…well, there’s computational cost, there’s sort of world modeling cost and energy cost by building in more perceptions over time. But I’m guessing you’ll do that and make something that could do more sophisticated things. It’s amazing that you’ve done it without any of that so far.

Gabriel Margolis: Yeah, yeah. So there are certainly some advantages to using vision, but there are a lot of drawbacks too. One is that it can be a lot slower to simulate, actually. So you might not be able to get this massive speed up in the same way using just the regular, like, trying to simulate what would the robot see as it crosses different terrains. Another is actually that there’s some aspect of the robot that it can’t see, for example, its own mass. Like, if I put a payload on the robot’s back, maybe the camera isn’t pointed in a place where it’s going to be able to see that, but it will be able to feel it through its joints. So, I do think this is a really powerful framework of, in particular, adapting to what the robot feels.

John Koetsier: I’m not sure it’s going to work for making Teslas self-driving, [laughing] it might cost a lot and kill a few people on the streets, but I totally get it. Let’s talk about… Go ahead.

Pulkit Agrawal: There’s also an interesting parallel to what you mentioned with like Tesla and self-driving, and the role of vision over here, right? So, you don’t want to use vision to teach a car how to move left or right. Right? That, you can still learn. What you want to use vision for is to say when I should move left, and when I should move right. Right? So, essentially, you want this right kind of modularity to build a scalable system.

So, what our Cheetah is doing is, you know, it can walk and it can run. Now, what it needs to learn is when, if I have to go somewhere, I should avoid ice, or I should maybe jump over something, or crouch under something, right? So, this design choice is not…it also has that element of putting modularity in the right way.

John Koetsier: I love that. I love that because that’s very human. That’s very kind of biological. We have different systems, like, there’s a part of my brain that says, “I wanna go to McDonald’s.” That’s not the part of the brain that says, “Okay, start moving, lean forward, lift a leg, get the arms going,” all that stuff, that happens almost autonomically, right? I mean, we don’t even think about it, subconscious is happening there, right?

So, that makes a ton of sense. You’ve got modularity, and you can add different components based on need and what you’ve got available. So, this has been… This is a pretty fast robot. And sometimes I’m wondering what you would do if you see this running to you at high speed in a dark alley after midnight [laughing]. But, how fast do you think you can get?

Pulkit Agrawal: So, I think, the speed is proportional to how much power we’re going to put in, right? So, if we increase the motors, you know, if we increase the size of the robot, of course, it can go faster and faster, right? So, I think it is pretty application-driven. If you said, “Well, I want a robot which goes as fast as Usain Bolt,” you know, I’m sure one can build such a robot.

In fact, Boston Dynamics had a demonstration way back in 2012 that they had a robot going as fast as Usain Bolt. But the difference was that this robot was on a treadmill, it was externally powered, and it had a support system. And what we were trying for is, yes, we want to be fast, but we also want to be on natural terrains, as a real cheetah is. So, I think, John, to answer your question, it depends on what approaches you will give me, and we’ll make it as fast as you want it.

John Koetsier: Always faster, always faster. But that brings up a…that’s a great segue actually, it brings up a good question, which is the power problem, right? When we saw those robots from Boston Dynamics, you know, there’s an external power source, and that’s a big deal, right? Because power is hard to build in efficiently for a decent amount of time. How do you solve that? How do we solve being able to give an autonomous robot at least hours of power?

Pulkit Agrawal: Yeah, again, an excellent question. I think this is where, you know, what you’ll find is non-linear movement in the research space. So, some people might come in and say, “Hey, what about a solar-powered skin on the robot?” [laughter]

John Koetsier: It only has to move very slowly.

Pulkit Agrawal: Right. I think there are things you can do in terms of batteries, and we are seeing a lot of movement happening in the space. Then there are things that we can do in terms of the control design, right? I mean, right now, I said we are minimizing energy, but we could do it much better, right? Then there are things in the hardware design, for example, like, if you look at this robot, it has these, you know, feet, which are like spheres, but it doesn’t really have an ankle. So imagine that if you had to walk but without an ankle or run without an ankle, it would be incredibly hard. And if you have an ankle which is compliant, you could generate energy by pressing into the ankle and then leveraging that.

So, I think that this is a multi-dimensional problem that we will solve by better ways of harvesting energy, for example, solar energy, better batteries, which are going to come in. You know, just better mechanical design, right? Maybe if we build, like lighter robots instead of using what materials we are using. And so, yeah, so I think it’s going to be a convergence of many things.

John Koetsier: That makes a lot of sense. And I think that there’s some robots, I believe, some military ones that are using like small internal combustion engines as well, because gas is a pretty efficient fuel, is a pretty efficient collector of energy in a small amount of space. So that makes some sense. Super interesting. Also an ankle, I think, would make your robot last longer because there’s this cushioning factor as well, right? So, that would make a lot of sense.

Let’s talk about where this will be useful. So, we talked about speed. That’s cool. We talked about learning. That’s essential if you want robots to enter and be somewhat adaptable. It’s not like a single purpose, “I stamp out that part.” Or, “I weld that thing,” right? If you want an adaptable robot, it needs to learn. Where will this be useful? Where do you see it going?

Pulkit Agrawal: Gonna take this, Gabe?

Gabriel Margolis: Yeah. So, I guess, specifically on the subject of legged robots, I think that what’s potentially useful about them is that they can go a lot of places that wheeled robots can’t. And of course, we use wheeled robots all the time. We may not always think of them as wheeled robots, but, you know, they’re cars out on the roads, and they perform all sorts of useful tasks for us such as delivering goods, like emergency rescue and response, and taking us places, transportation. But they’re constrained to operating on these surfaces that are specifically designed for wheeled robots. They can’t come into human environments and interact with us in our homes, in our offices, in our factories, as easily.

So, really, all of those domains are places where legs can produce benefit. We can have emergency response vehicles that actually come into a home and save someone. We can have delivery services that bring something up your stairs onto your porch or even into your house. And I think that this expansion of robot mobility will be really cool for all of these applications.

I would also say that the learning-based techniques that are behind the control of these legged robots can then be extended to robots with other form factors and other interesting functionality. In particular, robots that can use hands and fingers to manipulate the world. It’s actually, in some ways a similar control problem, of course, with other details involved. But we’re optimistic that maybe some of these learning techniques that are useful in legged robots can also be useful over there, and we can build robots that can actually interact with objects in their environment and perform tasks that way as well.

John Koetsier: Super cool. I love it. I mean, like, you mentioned coming into the house last mile for delivery. Those make tons of sense. Even search and rescue, going in the forest, on the mountain somewhere, something like that, right? You know, a rescue on Everest, you don’t want to send a human up there because they’re in the death zone. Maybe you can send a robot at some point, who knows. So, many applications there. I wanted to ask you to follow-up, Gabriel, on that just briefly, because you mentioned cars in the context of robots. How do you define robots?

Gabriel Margolis: So, it’s difficult to define a robot. One definition that I like is that a robot is a mechanical system that doesn’t work yet… 

John Koetsier: [Laughter] I don’t like that one.

Gabriel Margolis: Maybe that’s too broad. But I think that our definition of autonomy, in some ways, it keeps shifting and it’s not well-defined. That’s just my opinion. Because we have a lot of machines that can do a lot of interesting things autonomously for us. For example, we have washing machines in our homes that will wash our dishes, or clothes, and automate a lot of things that used to take a lot of like human manual labor. But we don’t typically think of those as robots. Maybe some people do, but typically, robotics researchers are not thinking anymore about dishwashers because it’s sort of a solved problem. And we’re thinking about what are all of the edge cases that a dishwasher couldn’t handle in a kitchen, which, of course, there are many.

John Koetsier: Is it really a solved problem though? Big pots don’t fit in, somebody has to rinse them, it has to go in, it has to come out, it has to be placed somewhere. All those problems are not solved. I mean, sure, the actual washing, but, I mean, what about…I’m just bugging you.

Gabriel Margolis: Sure. Sure. [Laughter]. But I guess think of a more formal definition. I think a robot is like a machine that perceives the world using some sort of sensors and then takes action on the world using actuators. Which, I mean, in legged robots, we would think of as just the motors of the robot. For a washing machine, of course, it might be some nozzles and things like that. And in between the sensors and actuators, there will be some processing that decides, “Okay, given what I’m seeing in the world, how should I operate on it?”
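
[Editor's sketch: the sense, decide, act loop from the definition above, as a toy in Python. Everything in it is hypothetical: a single simulated joint, a made-up target angle, and a simple proportional rule standing in for the "processing" between sensors and actuators.]

```python
from typing import Callable


def control_step(read_sensors: Callable[[], float],
                 decide: Callable[[float], float],
                 drive_actuators: Callable[[float], None]) -> float:
    """One pass of the loop: perceive, decide what to do, then act."""
    observation = read_sensors()
    command = decide(observation)
    drive_actuators(command)
    return command


# A toy one-joint "robot": its whole world is a single joint angle.
state = {"joint_angle": 0.0}
TARGET_ANGLE = 1.0   # hypothetical goal, in radians
GAIN = 0.5           # hypothetical controller gain


def read_joint_angle() -> float:
    return state["joint_angle"]                 # sensor


def proportional_policy(angle: float) -> float:
    return GAIN * (TARGET_ANGLE - angle)        # processing


def move_joint(command: float) -> None:
    state["joint_angle"] += command             # actuator


for _ in range(10):
    control_step(read_joint_angle, proportional_policy, move_joint)

print(f"joint angle after 10 steps: {state['joint_angle']:.4f}")
```

[In a learned controller like the Mini-Cheetah's, the middle function would be a trained policy rather than a hand-written rule, but the sensors-in, actuators-out shape stays the same.]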

John Koetsier: So, that’s super interesting. Under that definition, my Tesla Model Y is a robot, because it will drive itself under certain conditions, and on others, I will not allow it. So that is interesting. That brings up another question though, because…and, Pulkit, I’m going to direct it your way, because you’re building some really amazing tech here, just in terms of what this robot can do, how it acts, but especially the learning capabilities that we need, we need in a lot of different places.

We are in an interesting place as a world. We have a ton of environmental issues. We have potential workforce issues as many populations around the globe are aging, as some populations are potentially even decreasing. I mean, just as we want to apply human ingenuity and creativity to higher-order problems, we want to allow machines to do some things that we really wouldn’t want a human to have to do. They do it right now, but it’d be nicer if they don’t. How do MIT innovations like this get commercialized? How do they get out in the real world? How do you get this in the hands of companies that are building robots right now and need better ways to train them and get them smart and useful in the world?

Pulkit Agrawal: John, this is an excellent question. Right, and how do we…so, we think of research as a technology demonstration so that people can start imagining what is possible, right? And then there are different ways in which we can put this technology in real hands. I mean, one way is, you know, Gabe decides he wants to do a startup and maybe commercializes this technology. I mean, we have had past instances, like, a couple of my students last year, they were working on some robotic systems and now they have their own company. And I think, at MIT, entrepreneurship is highly encouraged, and that is one avenue, right?

Another avenue is to license such technology, but licensing and filing patents is a double-edged sword. In some ways, you know, some companies could pick it up, but you’ll also be stymying the open source build-up which can happen from our work, right? So, I mean, for example, we have released this work in open source. So, if you want to download the code, play around with it, replicate the results, you can do it.

And the good thing is, we are really doing this research on a low-cost platform, and even if you build your own quadruped, you can deploy our system on it because we don’t use any specialized or expensive equipment to do it. So, in some ways, our research is very democratic, right?

And then, there’s always this third option where some company comes across our work and they’re like, “Hey, this technology is something that we can incorporate.” We go give talks, and, you know, MIT has its own technology office which helps take this research and figure out who the right partners are. But from our perspective, what, as researchers, we try to do is to make it as open source as possible. And sometimes, you know, some people may not see the vision, and we are able to see the vision, and then we go out, open our own companies.

John Koetsier: Excellent, excellent. Well, I want to thank both of you. This has been super interesting. The Cheetah is very cool, and the speed that you’ve been able to get it up to is impressive, as are the self-learning capabilities that you’ve given it that are, in some sense, kind of human-like. Thank you so much for this time.

Pulkit Agrawal: Thanks a lot, John, for having us.

Gabriel Margolis: Yeah. Thank you, John.

TechFirst is about smart matter … drones, AI, robots, and other cutting-edge tech

The TechFirst with John Koetsier podcast is about tech that is changing the world, including wearable tech, and innovators who are shaping the future. Guests include former Apple CEO John Sculley. The head of Facebook gaming. Amazon’s head of robotics. GitHub’s CTO. Twitter’s chief information security officer. Scientists inventing smart contact lenses. Startup entrepreneurs. Google executives. Former Microsoft CTO Nathan Myhrvold. And much, much more.
