Helm.ai is making AI that teaches itself how to drive a car


Can we design AI that will teach itself how to drive a car?

Self-driving cars will unlock trillions in market value and probably change our lives with fractional car ownership, better ride-sharing … maybe even cars that pay for themselves. Also … they’ll give us back months if not years of our time that we currently spend driving.

In this episode of TechFirst with John Koetsier we chat with Helm.ai CEO Vlad Voroninski.

Helm.ai has developed a new AI technology it calls “Deep Teaching” which it says will make it 100,000 times cheaper to train self-driving car AIs.

Listen: AI that teaches itself to drive

Don’t forget to subscribe to TechFirst with John Koetsier wherever podcasts are published:

Watch: AI that teaches itself to drive

Subscribe to my YouTube channel so you’ll get notified when I go live with future guests, or see the videos later.

Read: AI that teaches itself to drive

John Koetsier: Can we design AI that will teach itself how to drive a car?

Welcome to TechFirst with John Koetsier. Self-driving cars will unlock trillions in market value and probably change our lives, right, opening the door to fractional car ownership, probably better ride sharing, maybe even cars that pay for themselves. They’ll also give us back months, if not years, of our time that we currently spend sitting in a car going places.

But they’re hard to train. Tesla might have the most data. Google’s Waymo is a leader as well. What if we built an AI that learns unsupervised, maybe based on dashcam video?

Helm.ai is pioneering what it calls Deep Teaching and we’re going to chat with the CEO, Vlad Voroninski. Welcome!

Vlad Voroninski: Hi John. Thanks for having me. 

John Koetsier: Absolutely. Your self-driving system is driving on roads that it’s never seen before. How’s it doing that? 

The bottleneck to training these systems is that annotated data is very expensive

Vlad Voroninski: Yeah. It’s similar to how humans are able to drive on roads they’ve never seen, in the sense that deep neural networks are capable of learning arbitrary concepts, including, to some degree, generalizing to new environments. The bottleneck to training these systems is that annotated data is very expensive, but if you can train them on enough data you actually get pretty good generalization performance.


Vladislav Voroninski, CEO of Helm.ai

At Helm.ai we’re able to achieve that using our Deep Teaching technology, which trains without human annotation or simulation. And it’s on a similar level of effectiveness as supervised learning, which allows us to achieve higher accuracy and better generalization than the traditional methods.

John Koetsier: Right, right. Before we dive into what exactly that technology looks like and how it works, maybe give a general comment to kick off — how hard a problem, how thorny a problem is it to create a car that’s self-driving?

Vlad Voroninski: Absolutely. So, simply achieving a system that has safety levels on par with a human is actually fairly tractable, in part because human failure modes are somewhat preventable, you know, things like inattention or aggressive driving, etc. But the truth is that even achieving that level of safety is not sufficient to launch a scalable fleet.

Really what you need is something that’s much safer than a human.

Really what you need is something that’s much safer than a human. It needs to be fully interpretable for liability reasons, highly scalable, meaning the ability to go into new places very quickly, and highly cost effective. And achieving all of these things simultaneously is quite hard.

John Koetsier: Yeah.

Vlad Voroninski: Really, the primary driving factor for all of these variables is the sophistication of your AI stack. You know, the more accuracy it has, the safer the end system will be; the broader the capability, the more interpretable it becomes; and the more flexible the AI is, the more scalable it is.

And being able to do things without teleoperation, or using computer vision instead of LIDAR, allows you to build an ultimately cheaper stack. And so really AI is the bottleneck, and in particular the bottleneck is this huge number of AI subproblems that have to be solved at human levels of accuracy. And each one of them is really too expensive to be solved completely using the well-known approaches, so, you know, cutting-edge R&D is inherently required there.

And so humans, when they’re actually paying attention and not being too aggressive or inebriated, are actually quite good at driving, in the sense that we’re able to …

John Koetsier: Good to know.

Vlad Voroninski: Right, empirically. You know, we’re able to basically observe entirely unforeseen situations, interpret them kind of in the nick of time, and apply our knowledge of the world to kind of make an optimal decision. Right?

Even on a budget of like tens of billions of dollars, it’s not clear how to do that …

And to anticipate situations like that, and to build a fully self-driving car product that’s scalable, would require a similar level of sophistication from the AI system. And so even on a budget of like tens of billions of dollars, it’s not clear how to do that when you’re using the traditional methods.

John Koetsier: Yes. So talk about how your software is different than your competitors. 

Vlad Voroninski: Yeah, absolutely. So yeah, I mean the software that we develop, as a function of our proprietary training methodology called “Deep Teaching,” ends up having higher levels of accuracy, can generalize better to new situations, and handles more corner cases.

For example, if you’ve ever been on Page Mill Road near Skyline Boulevard in the Bay area, it’s a highly curvy, steep mountain road. We’re able to drive that with just one camera and one GPU.

You know, essentially because the approach is highly capital efficient, we can build many more features than our competitors can. One example I can provide is that we can drive certain roads that standard production systems still cannot. For example, if you’ve ever been on Page Mill Road near Skyline Boulevard in the Bay Area, it’s a highly curvy, steep mountain road. We’re able to drive that with just one camera and one GPU, and the neural network that we trained to understand that road was actually never trained on data from that road, nor did we use any human annotation or simulation.

And that certainly goes beyond the state of the art of today’s production systems. And so really, you know, the difference at the end of the day is the real-time perception accuracy, the ability to interpret sensor data very quickly and very accurately. And those advantages carry across the board to many other functionalities; beyond, you know, lane detection, our understanding can really be used for any kind of object category.

John Koetsier: And talk about, you mentioned capital efficiency of your model, can you go into some more detail there? How is it so capital efficient? How have you been able to get so much training data? 

Vlad Voroninski: Absolutely. Yeah. So, you know, typically the way that an AI system is trained, right, there’s some kind of broad data coming in, let’s say it’s images. There are examples of some task being performed, like annotations provided along with the image, where maybe some human actually labeled certain pixels for what they are, and we train on that data. Now the cost of the annotation becomes the bottleneck for improving the system very quickly, because the cost of annotation is roughly dollars per image.

And these systems continue to get better into the billions or trillions of images, so it’s not really possible to get the accuracy you want that way. And so the cost of annotation is about a hundred thousand X more than the cost of simply processing an image through a GPU. 

The cost of annotation is about a hundred thousand X more than the cost of simply processing an image through a GPU.

John Koetsier: Wow.

Vlad Voroninski: So, yeah, it’s really quite a big difference. And so if you can come up with a learning technology that is on par with supervised training but doesn’t require the annotation, you’re talking about a hundred thousand X reduction in cost.

John Koetsier: A hundred thousand X reduction in cost. 

Vlad Voroninski: That’s right. 

John Koetsier: Wow. 

Vlad Voroninski: Yeah. It’s similar to what we’ve seen, for example, in the biotechnology industry, right? So the cost of mapping the human genome dropped a hundred thousand X over the course of 20 years, right? So we’ve seen examples of that before.

In AI, I believe it’s going to happen a lot more quickly, in the next few years, and it’s going to have even broader impacts, right, because it’s going to apply to many industries simultaneously.
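As a back-of-the-envelope check on that hundred-thousand-X figure (using illustrative prices I’ve assumed here, not Helm.ai’s actual numbers): if human annotation costs on the order of a dollar per image while GPU processing costs on the order of a thousandth of a cent, the ratio and its effect on a billion-image training budget fall out directly.

```python
# Back-of-the-envelope check on the 100,000x claim.
# These prices are illustrative assumptions, not Helm.ai's figures.
annotation_cost_per_image = 1.0    # dollars for a human to label one image
gpu_cost_per_image = 0.00001       # dollars to process one image on a GPU

ratio = annotation_cost_per_image / gpu_cost_per_image
print(f"annotation is {ratio:,.0f}x more expensive per image")

# At a billion images, annotation dominates the budget entirely:
images = 1_000_000_000
print(f"labeling cost: ${annotation_cost_per_image * images:,.0f}")
print(f"GPU-only cost: ${gpu_cost_per_image * images:,.0f}")
```

At these assumed prices, removing the annotation requirement turns a billion-dollar labeling bill into a five-figure compute bill.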

John Koetsier: Talk a bit about Deep Teaching, how you developed it, how it works, and how are you using it? 

Vlad Voroninski: Absolutely. Yeah, so Deep Teaching is really a technology that we invented at Helm.ai that combines certain tools and insights from applied mathematics with deep neural networks, to effectively teach them how to perform certain tasks without requiring human annotation or simulation.

So, you know, the motivation actually stems to some degree from an area that I worked in during my academic career called “compressive sensing.”

So to give an example, you know, science is full of these kinds of reconstruction problems, where you observe indirect information about some object of interest and you want to recover the structure of that object from that indirect information. So for example, a diffraction pattern in X-ray crystallography can be used to recover the electron density of a protein you’re looking at, something like that. That’s how the structure of DNA was discovered, for example.

So, compressive sensing is an area of research which solves these reconstruction problems with a lot less data than people previously thought possible, by incorporating certain structural assumptions about the object of interest into the reconstruction process. You know, these techniques have actually been used to speed up MRI by a factor of 10, and when combined with deep learning, more like a factor of a hundred. So it’s a proven technology with far-reaching implications in and of itself.

These techniques have been actually used to speed up MRI by a factor of 10, and when combined with deep learning more like a factor of a hundred …
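The idea is easy to demonstrate in miniature. The sketch below is a generic compressive-sensing toy, not Helm.ai’s method: it recovers a 3-sparse signal of length 100 from only 40 random linear measurements via orthogonal matching pursuit, with sparsity serving as the structural assumption that makes recovery possible from so little data.

```python
import numpy as np

# Toy compressive sensing: recover a sparse signal from far fewer
# measurements than its length, using sparsity as the prior.
rng = np.random.default_rng(0)
n, m, k = 100, 40, 3                            # length, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)    # random sensing matrix
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
y = A @ x_true                                  # indirect measurements, m << n

# Orthogonal matching pursuit: greedily pick the column most
# correlated with the residual, then refit on all chosen columns.
residual, chosen = y.copy(), []
for _ in range(k):
    chosen.append(int(np.argmax(np.abs(A.T @ residual))))
    coef, *_ = np.linalg.lstsq(A[:, chosen], y, rcond=None)
    residual = y - A[:, chosen] @ coef

x_hat = np.zeros(n)
x_hat[chosen] = coef
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

Without the sparsity assumption, 40 equations in 100 unknowns would be hopelessly underdetermined; with it, the signal is recovered essentially exactly.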

And with AI, what we have, right, is we’re looking for a neural network that will perform some task well. That’s the object of interest. And with supervised learning you have these training examples; it’s kind of like indirect information about how the task is performed. And deep learning effectively uses that to find the neural network. Now, Deep Teaching is basically something that is able to still find this neural network without those training examples. And you have to bring in new information somehow, so the information that we bring in is essentially structural assumptions about the data at hand.

Certainly, if you’re familiar with computer vision, there are certain Gestalt principles; really, anything that we know about the world can work, as long as it strikes a good balance of being general enough to be useful every time, but also quite informative. So it’s similar to compressive sensing in that sense, on a very high level. And yeah, so the challenge really is that it’s not just removing the need for annotation, it’s also ensuring that there’s enough learned per image, right?

John Koetsier: Yeah.

Vlad Voroninski: So you really have to bring in these kinds of more sophisticated priors, but if you can do that, then you’re really kind of approaching that cost reduction that we talked about earlier …

John Koetsier: Can you give some examples of those priors? 

Vlad Voroninski: Yeah, sure. I mean, essentially it can be anything that’s about the world.

I mean, I’ll talk about some basic stuff. I can’t get into the very detailed things, but the world is three dimensional, right? We know that there’s temporal contiguity, spatial contiguity, essentially anything intuitive that you know about the world to be roughly true. And there are roughly 20 different things that the brain uses during the course of visual inference that are well-known principles.

There are roughly 20 different things that the brain uses during the course of visual inference that are well-known principles.

And so in some ways, I can say anything is up for grabs; any information you incorporate is up for grabs. The challenge, again, is how to strike a balance: how to get something that’s useful, and repeatably useful, across many different object categories and tasks, and that is simultaneously very informative. And that’s the thing that I think we really cracked with Deep Teaching.
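Helm.ai hasn’t disclosed how it encodes these priors, but the general mechanism in unsupervised learning is to turn them into loss terms that a network can be trained against. A minimal, hypothetical illustration (function names and numbers are mine): temporal contiguity becomes a penalty on predictions that change between consecutive frames, and spatial contiguity becomes a penalty on predictions that change between neighboring pixels.

```python
import numpy as np

def temporal_contiguity_loss(p_t, p_t1):
    """Penalize per-pixel class probabilities that change between
    consecutive frames: the world rarely changes class in 33 ms."""
    return float(np.mean((p_t - p_t1) ** 2))

def spatial_contiguity_loss(p):
    """Penalize class probabilities that change between neighboring
    pixels: objects are mostly contiguous regions."""
    return float(np.mean(np.diff(p, axis=0) ** 2)
                 + np.mean(np.diff(p, axis=1) ** 2))

# Toy per-pixel class probabilities (H x W x C) via a softmax.
rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 4, 3))
p = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Identical consecutive frames incur zero temporal penalty ...
assert temporal_contiguity_loss(p, p.copy()) == 0.0
# ... while noisy, fragmented predictions are penalized spatially.
assert spatial_contiguity_loss(p) > 0.0
```

In a real system such terms would be differentiable and summed into the training objective, so that the priors, rather than human labels, supply the learning signal.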

John Koetsier: Interesting, and so when I first heard you talk about those priors, I was thinking simpler, like objects or something like that. But you’re actually talking about principles, maybe like object permanence or something like that. Is that correct? 

Vlad Voroninski: Absolutely. That’s a good example. Yeah. 

John Koetsier: Okay, good. Now you’re not building a car yourself, right? What’s your model? 

Vlad Voroninski: That’s right. What we’re looking to do is really to solve the critical AI piece of the puzzle for self-driving cars and license the resulting software to auto manufacturers and fleets. So we’re like a typical supplier, but we only license software.

John Koetsier: Yes. 

Vlad Voroninski: Which means we’re agnostic to the sensor configuration and compute platform. And we partner closely with our customers, which is really required for bringing sophisticated autonomous driving features into production. So you can think of what we’re doing as kind of an Android model for self-driving cars.

You can sort of think about what we’re doing as kind of an Android model for self-driving cars.

John Koetsier: Interesting. Can you talk about your customers? Are they public? 

Vlad Voroninski: Yeah, I mean, it’s some of the usual suspects. We have interest from many OEMs, and we’re working closely with several of them as well as some fleets. You know, we’re not able to discuss these customer engagements, but we look forward to making announcements at the appropriate time.

John Koetsier: Interesting, interesting. Well, how would you rate yourself or compare yourself? I mean, obviously this is challenging for somebody who’s running and building — running a company and building the technology. How would you kind of rate some of the contenders in the field?  Maybe the Waymos, maybe even Tesla, slightly different proposition, others like that?  Where would you rank them? Have you ranked them? Do you do competitive analysis and where would you put yourself in that pecking order? 

Vlad Voroninski: Yeah, I mean, certainly we’ve done certain types of benchmarking and of course we go to trade shows and we see kind of everybody’s demos and they see our demos. So, you know, people can sort of get a sense. So I don’t want to make any blanket statements and certainly nobody has complete access to some other company’s technology internally, so it’s hard to know for sure, but as far as what’s out there externally, I think we had a very successful demo at CES. I think, I can dare say we had the best perception demo at CES. 

John Koetsier: Nice.

Vlad Voroninski: So that was, I think, pretty successful. 

John Koetsier: What was your demo? What did you do? 

Vlad Voroninski: Yeah, so, actually a lot of these videos are now up online on our website. But these were kind of semantic segmentation tasks: essentially predicting what every single pixel in an image corresponds to, which kind of object it is. These are pretty challenging, very detailed computer vision problems where you’re essentially getting a complete understanding of an image, in a 2D sense at least; we also have 3D information being predicted.
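For readers unfamiliar with the term: semantic segmentation reduces to a classification problem at every pixel. A toy sketch of the output side (the shapes and class count here are arbitrary, not Helm.ai’s):

```python
import numpy as np

# A segmentation network emits C class scores per pixel; the
# predicted label map is just the per-pixel argmax over them.
H, W, C = 2, 3, 4                        # tiny image, 4 object classes
rng = np.random.default_rng(0)
scores = rng.standard_normal((H, W, C))  # stand-in for network output
label_map = scores.argmax(axis=-1)       # H x W array of class indices

assert label_map.shape == (H, W)
assert label_map.max() < C               # every pixel got a valid class
```

The difficulty Voroninski describes is not this final argmax but producing scores accurate enough, at every pixel and in real time, to trust on the road.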

We did some highway driving on Vegas roads that had some challenging stretches and aggressive Vegas drivers. So yeah, I mean, one example I can give is that when you train neural networks on a lot of data — and I’m talking like tens of millions of images or more for semantic segmentation, which is really something that’s pretty hard to accomplish using traditional methods because it would be far too expensive — what you get is something much more crisp and stable in your predictions.

What you get is something kind of much more crisp and stable in your predictions.

And so that’s one thing that we can definitely tell we’re doing better than some of the other competitors, and I think people that work on the problem can also tell right away when they look at the predictions. But again, maybe that’s a bit of a technical point. We’ve certainly done competitive benchmarking internally.

We’re not going to be talking about that yet, but you can look forward maybe to certain announcements from us in the near future about that. 

John Koetsier: Okay, I look forward to that, definitely. Now, what you’re doing is not just about autonomous cars on public roads. You’re actually building software that can be used for autonomous robots and different types of vehicles as well. Is that correct?

Vlad Voroninski: That’s right. You know, when we first started Helm.ai a few years ago, we saw the potential for many markets, including self-driving cars, drones, consumer robots, robotics at large, and even something like retail, like the Amazon Go concept, or fields like medical imaging.

So there’s a lot of applications.

We picked autonomous driving because of the market size and clear need …

I mean, we picked autonomous driving because of the market size and clear need for advanced technology. And we also saw certain specialized problems at the time that we were confident we could solve. We didn’t know exactly how general our technology would be, but it turned out to be quite general. And so there’s actually no difference in applying it to arbitrary object categories, for perception or intent prediction, kind of across the board. And so, you know, that opens up a lot of possibilities, and I think advances in unsupervised learning like Deep Teaching, or other approaches, can be highly disruptive technologies across several trillion-dollar markets.

John Koetsier: Talk about some of those. I mean, obviously there’s a big need for delivery robots in the near future and other things like that, but talk about some of those trillion dollar markets. 

Vlad Voroninski: Yeah. I mean, I think that’s a really good example, right?

Delivery, both land-based and air-based, is certainly going to be a market that gets disrupted as autonomy comes to fruition. And, you know, the bottlenecks there are essentially the same thing that I mentioned before, right: the sophistication of your AI stack will determine whether these technologies can be used at scale to actually handle all of the — I mean, there are massive liability issues with deploying at-scale fleets.

John Koetsier: Yes, yes.

Eventually we’ll have all sorts of robots helping us.

Vlad Voroninski: And so you have to be ready. So yeah, safety is going to be king and so is interpretability and cost. So yeah, I mean, we see quite a lot of potential there. And you know, just general robotics, like consumer robotics, right? Eventually we’ll have all sorts of robots helping us in various ways, or there’s even ways to kind of automate manufacturing, right, which is certainly relevant today. So yeah, I think we’re going to see quite a lot of activity in these markets this decade. 

John Koetsier: Interesting, interesting. And now of course, the question that every self-driving startup hates, the timeline question.

Give us your current kind of best guess estimate for Level 5 autonomy for self-driving cars, basically going just about anywhere they want to go, wherever their owner wants them to go, by themselves. 

Vlad Voroninski: So, yeah. I mean, I think Level 5 kind of depends on how you interpret it, right? But I’ll answer the question. So if you mean Level 5 as literally going anywhere, in the sense of being able to go off-roading in a jungle or driving on the moon without knowing ahead of time what the task is, then I think that an AI system that can do that would be on par with a human in many ways.

If you mean Level 5 like literally going anywhere in a sense of being able to go like off-roading in a jungle or driving on the moon without knowing ahead of time what the task is, then I think that an AI system that can do that would be on par with a human in many ways.

And potentially could be AI-complete, meaning that it could be as hard as solving general intelligence. So, you know, that’s kind of a controversial topic, and nobody knows for sure, but it could happen within 15 years. I don’t know. Certainly I think computer vision will be solved to human levels of performance, maybe within a few years, along the way, and unsupervised learning is certainly going to play a pretty big role in that.

But I do want to make a distinction between L5 and L4; I think they’re different, right? L4 is saying, okay, we’re going to drive on just certain controlled-access highways. That’s going to happen well before 15 years, right?

I mean, from a safety perspective, I think that’s already achievable quite soon, but really the bottleneck is going to be liability and how to tackle that. And if the government steps up and actually puts in place laws similar to what the aviation industry has, then it could really take off sooner because …

John Koetsier: Interesting.

Vlad Voroninski: Because then you’re defining what exactly it means, because inevitably accidents will happen, right? And you’d even have to go to a court of law and explain what exactly happened, and if an AI system is not fully prepared for some corner case, that’s going to be a very, very tough situation. And so that’s really where the bottleneck will be, I think, for when we can deploy these systems safely.

John Koetsier: Yeah. Interesting, interesting. Excellent. Well, I want to thank you for your time. It’s been interesting. 

Vlad Voroninski: Thanks for having me. 

John Koetsier: It’s been a real pleasure and thank you as well, listeners and viewers, for joining us on TechFirst. Whatever platform you’re on, please like, subscribe, share, comment. If you’re on the podcast, you like it, please rate it, review it, and thanks.

Until next time, this is John Koetsier with TechFirst.