‘We like to say that we are not building a vehicle, we are building a driver,’ says Dmitri Dolgov, co-CEO of Waymo.
Dmitri helps lead the autonomous driving company with a mission to make it safe and easy for people and things to get where they’re going. Dmitri was also one of the founders of the Google Self-Driving Car Project, which began in 2009 and became Waymo in 2016.
Waymo's approach to self-driving cars involves building a highly predictive, generalized driving model, which they refer to as the Waymo Driver. They have deployed it extensively in Waymo One and Waymo Via, their autonomous ride-hailing and delivery services, respectively.
Waymo began in 2009 as the Google Self-Driving Car Project and has since been spun off into its own company. Waymo’s mission is to make it safe and easy for people and things to get where they’re going. To this end, as Dmitri explains, Waymo has driven more autonomous miles than anyone else in the industry. Currently they drive more than 100,000 miles a week. Notably, their ride-hailing service, previously deployed in Phoenix, Arizona, has recently been expanded to San Francisco.
A fully autonomous driver, at a high level, must have similar capabilities to a human driver. It must have knowledge of its state, location, and goals. It must be able to sense or see the surroundings. Furthermore, it should be able to understand the intent of and interactions between other actors (e.g. other drivers, pedestrians, and animals) in the world, and use this knowledge to predict what these actors will do in the future. Combining these capabilities is necessary to make safe, knowledgeable driving decisions. Finally, the driver must actuate the vehicle to actually drive around.
Dolgov explains that there are three main areas of technology that must come together to create such an autonomous system. The first is the self-driving hardware: the sensor platform and the compute hardware. The second is the AI itself, the model that processes the data from the sensors and makes driving decisions. The third is the off-board infrastructure that enables effective, large-scale training, deployment, and evaluation of the neural-network models and of the system as a whole. Waymo has chosen to invest in all three areas. It has built its own hardware suite with three sensing modalities (cameras, lidars, and radars), on-board compute designed for fast, real-time inference with AI models for perception and scene prediction, and significant machine learning and simulation infrastructure.
As Dmitri explains, San Francisco is great for testing Waymo’s autonomous cars due to the richness of the environment. The company began testing in San Francisco as early as 2009, and now returns in full force with a large-scale deployment of its ride-hailing fleet. While the Golden City is a key part of the company’s testing, Waymo has tested in more than 25 cities, each of which provides useful data.
While there are lessons to learn in each location, the fundamental capabilities needed to drive in all of them are very similar. The distribution and frequencies of different encounters change, but the fundamentals remain the same. In San Francisco, for example, one is more likely to encounter crowds of pedestrians on ordinary streets than in much of Phoenix. However, around crowded shopping malls or school zones, such encounters are common in Phoenix and other locations too. Learnings in one location can translate well to many others.
Deploying an autonomous vehicle at large scale came with its own unique challenges, such as maintenance, the update pipeline, and other scaffolding. Having seen the system work effectively, efficiently, and safely throughout such a large deployment, Waymo became much more confident in its capabilities and robustness. They also learned how difficult it is to iterate quickly while verifying that they are actually improving on their metrics and are not caught in local minima. In order to regularly deploy the system, they created a large framework for evaluation and deployment, enabling researchers and engineers to focus on advancing the technology.
With all of this background and learning, the speed with which Waymo has been able to ramp up in San Francisco demonstrates well their ability to translate their capabilities to new domains.
Interested readers can sign up for Waymo’s Trusted Tester program to experience the ‘Waymo Driver’ for themselves.
Waymo’s goal is to build a driver that is capable of adapting to changes in the environment and in its own form (e.g. car or truck). Waymo One is Waymo’s ride-hailing service, while Waymo Via is their trucking and local delivery service. The driving tasks in local delivery are, in many ways, a strict subset of those in ride-hailing, so a large portion of learnings can translate from one to the other. While long-haul trucking operates in different environments than ride-hailing (more highway as opposed to more local/urban streets), the basic capabilities of the AI are very similar.
“Unify and simplify.” Dolgov explains that, while there are some differences in specialization, the fundamentals of the three major areas he described (hardware, AI, and infrastructure) carry over well. Waymo focuses on principled solutions to the core driving problems, simplifying and streamlining its architecture, and investing heavily in shared infrastructure. Supporting the different specializations forces the core stack to be more generalizable and robust. This will, in the future, enable them to scale more easily to more locations and to more product applications.
In the early days, much work was done in perception and using supervised models, requiring human labeling. As models and neural networks got bigger, the data requirements increased. Waymo leverages offboard perception to augment human labels with more powerful models which cannot be run in real time. Furthermore, data augmentation and synthetic data generation play a big role.
As more advanced models are used in other parts of the stack, such as semantics (i.e. understanding scenes as a whole), behavior prediction, imitation learning, and motion planning and prediction, the data strategies must change to meet the different data needs of these new model types. Generative approaches such as auto-labeling have an application here.
It’s also necessary to expand data in the tail of the distribution: the rare encounters and situations which, by nature, don’t have much data. Gathering more data in the head, or average case, of the distribution has limited value, while adding data where there is little or none can drastically change how the model performs in certain situations. Waymo has invested heavily in automating this task, so that the dataset is expanded automatically where it is sparse. As Dolgov says, the data-feedback loop is a first-class object in Waymo’s design methodology.
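One simple way to picture "expanding the dataset where it is sparse" is to resample training examples so that rare scenario types are drawn more often. The sketch below is only an illustration of that general idea using inverse-frequency weighting; it is not Waymo's pipeline, and the function name `tail_sampling_weights` and the scenario labels are hypothetical.

```python
from collections import Counter


def tail_sampling_weights(scenario_labels):
    """Return one sampling probability per example, inversely
    proportional to how common its scenario label is."""
    if not scenario_labels:
        return []
    counts = Counter(scenario_labels)
    raw = [1.0 / counts[label] for label in scenario_labels]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical labels: 8 common examples and 2 rare ones.
labels = ["routine_driving"] * 8 + ["construction_zone"] * 2
weights = tail_sampling_weights(labels)

# Each scenario type now carries equal total probability mass,
# so a single rare example is drawn more often than a common one.
rare_mass = sum(
    w for w, label in zip(weights, labels) if label == "construction_zone"
)
```

With these weights, the two rare examples together are sampled as often as the eight common ones. In practice, reweighting only reuses data that already exists; the article's point is that new tail examples also have to be mined, augmented, or simulated.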
Focusing on either the head or the tail of the distribution is much simpler than handling both. A driver-assist system that focuses on the average case and relies on the human to address the long-tail cases is much simpler than a more robust system. There are many such off-the-shelf systems available today. At the other extreme, overemphasizing the tail without paying attention to the head results in an autonomous vehicle that never leaves the parking lot. There is no magic solution to this problem. It requires a lot of work and creativity to balance these two competing demands effectively.
As much of the head, or average case, as possible should be handled by the primary solution. Long-tail areas for one method might be easily addressed with another. Combining models, using data augmentation, and relying on a robust sensor suite can “make your big hammer as big as possible.”
Afterwards, one must evaluate the system and understand how it performs in the tail. Having systems that are robust to various corner cases, coupled with anomaly and outlier detection, can provide solutions for the tail end of cases.
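As a rough illustration of outlier detection in this setting, the sketch below flags examples whose score (for instance, a per-example model error) sits far from the rest, using the modified z-score based on the median absolute deviation. This is a generic statistical technique chosen for illustration, not a description of Waymo's system; the function name `flag_outliers` and the sample scores are hypothetical.

```python
import statistics


def flag_outliers(scores, threshold=3.5):
    """Return indices of scores whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by the outliers themselves than mean and std-dev.
    """
    if not scores:
        return []
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0:
        # All deviations are (almost) identical; nothing to flag.
        return []
    return [
        i
        for i, s in enumerate(scores)
        if 0.6745 * abs(s - med) / mad > threshold
    ]


# Hypothetical per-example error scores; the last one is unusual.
example_scores = [1.0, 1.1, 0.9, 1.05, 0.95, 9.0]
flagged = flag_outliers(example_scores)
```

Examples flagged this way are candidates for closer review or for targeted data collection, which ties back to the data-feedback loop described earlier.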
“Perception is just the start of the challenge.” Dolgov explains that it is necessary but insufficient. Behavior and decision-making add a lot more complexity. In perception, one must identify objects. While this is technically complex, fundamentally it is just identifying objects like pedestrians, cars, etc. However, the more complex tasks require scene-level understanding and context. Recent advancements in machine learning have enabled tackling these problems.
Models that can represent the entire scene use efficient, well-structured representations of heterogeneous features in the scene, based on appearance, the structure of the world, and so on. Waymo has published work on hierarchical neural networks that model both static and dynamic parts of the environment.
Through his 15 years in the industry, Dmitri Dolgov has seen autonomous vehicles go from “mostly science-fiction” to their current state of being deployed at scale in multiple cities. The foundational technology has now been developed, and autonomous vehicles are maturing. However, there is much left to be solved, and there is great momentum in the industry. What we are seeing is only the beginning.
I'm excited to welcome our next speaker, Dmitri Dolgov. Dmitri is the co-CEO of Waymo, an autonomous driving technology company with the mission to make it safe and easy for people and things to get where they're going. Dmitri is one of the founders of the Google Self-Driving Car Project, which began in 2009 and became Waymo in 2016. At Waymo, Dmitri's responsible for overall company strategy, with his primary focus on the development and deployment of the Waymo Driver. Prior to Waymo, Dmitri worked on autonomous driving efforts at Toyota and Stanford as part of Stanford's DARPA Urban Challenge team. Dmitri received his bachelor's and master's degrees in physics and math from Moscow Institute of Physics and Technology and a PhD in computer science from the University of Michigan. Welcome, Dmitri. Thank you so much for sitting down with us today, Dmitri.
Thanks, Alex. It's great to be here.
Alexandr Wang (01:16):
So I want to start out actually... Waymo recently announced its Trusted Tester program in San Francisco, and I live in San Francisco. For most of us in San Francisco, we know there's a unique topology, it's a busy city, there's always unique challenges associated with it. One of the things I'm really curious to hear about from you all... You started in Arizona and now have a program in San Francisco. Can you share a little bit about what your strategy has been for testing the AI and ensuring that it can be deployed safely?
Yeah, San Francisco is great. It's great for testing and technology development. In fact, because of the richness of the environment that we see there, we've tested there as early as 2009. I actually remember one of our early milestones was autonomously driving through Lombard Street, which I think was something we did back in 2010. We've been in the city for many years. We have more recently ramped up the testing and data collection that we're doing there, especially this year, to the point where today we're driving more than 100,000 miles per week, and it's giving us incredibly valuable experience.
It is, from the technical perspective, a key part of our testing and data collection portfolio. Of course, we test in a lot more places than just San Francisco or Arizona. Over the years, we've been in more than 25 cities across the country, and each location usually offers us something very valuable, very unique. For example, driving in Michigan in the winter gives you super useful snow data in dealing with the diverse weather conditions. And from that perspective, similarly, San Francisco is great in terms of helping us refine the performance and the capabilities of our driver when it comes to urban settings, things like dealing with dense crowds of pedestrians, narrow passages, all those interesting things that urban settings offer. And more broadly, driving in California has been incredibly valuable. So over the years, we've accumulated more autonomous miles than anyone in the industry and that has greatly informed and guided our technology development.
Besides the technical perspective, urban areas like San Francisco actually offer really valuable insights on the product and commercial side of things. And as I mentioned, this is why last month we started our Trusted Tester program, which allows residents of San Francisco to participate in this program and then hail an autonomous ride in one of our vehicles, and they will be experiencing the latest version of our technology. It's on the fully electric Jaguar I-PACEs with the latest... the fifth generation of our driver. That includes the latest hardware as well as the latest software. So we're getting super valuable insights from our riders and that's been tremendously valuable in helping us refine the technology and the product.
Yeah, that's super cool. I'd love to figure out a way to become one of those trusted riders. I want to actually take a big step back for a second. You've been working on the autonomous vehicle probably for many, many years now, over a decade, and I'm curious what originally drew you to the problem, and what sort of kept you in it for so long in such an exciting way?
You're making me feel old. Yeah, it's been quite a journey. I actually consider myself incredibly lucky. I see this as a once-in-a-lifetime opportunity. What has drawn me to this problem, and what continues to keep me incredibly excited about it, is a combination of things that I think come together in a way that's unique to the space. One is just the incredible opportunity to have a positive transformative impact on society. Not that often do you get a chance to transform an industry that is so huge and have a positive impact on safety, on removing any friction out of transportation, of people getting from here to there as well as goods. That's number one.
Number two is technology. It's incredibly exciting, and it's a number of different areas that will have to come together to enable us to build and deploy fully autonomous vehicles, from sensing, compute, ML and AI, infrastructure, and there's just so many... And all of those things have to be kind of state of the art, and it's incredibly exciting to be on that bleeding edge of technology in all of those areas and be pushing it forward. And then, finally, I found that this space draws some of the most interesting, most talented people that I really, really enjoyed working with.
Yeah, that's so awesome. I wanted to actually... kind of touch on the technology. Waymo operates a fully self-driving, ride-hailing service in Arizona. And to be able to operate that service in the first place, there's just a huge number of technical challenges that I imagine... or that I know Waymo has had to overcome over time. Maybe explain some of the technology that enables the Waymo Driver and how that's evolved over the course of the past 10 years. What are sort of the big things that you have learned?
It's a complex problem. And if we're talking about building a fully autonomous driver, a system that is capable of taking care of the entire task of driving end-to-end, the capability that you have to build at a high level is not that different from what a human has to do while driving around. So at a high level, you have to know where you are and where you're going. You have to be able to see your surroundings and understand what's going on. You need to understand the intent and be able to understand the interactions between other actors, from pedestrians and cyclists to other drivers, and make predictions, like what they're going to do jointly in the future. And finally, all of that enables the safe and predictable and comfortable driving decisions that you yourself have to make, and then, of course, you have to actually actuate the vehicle to drive around.
Obviously, there's a tremendous amount of technology that goes into creating all of that capability. I would, at a high level, highlight three big areas of technology that have to come together to create that system. One is self-driving hardware, sensing, compute. Number two is autonomy AI and on-board software that runs in the car that processes all the data and makes driving decisions. Number three is all of the off-board infrastructure that allows you to effectively and at large scale train your system, train your ML models, evaluate them, and deploy them, and evaluate the entire system, things like the simulator.
So on the hardware side, our approach has been to build our own hardware suite. We use three different sensing modalities: cameras, lidars, and radars. They all have high resolution, long range, and 360-degree coverage around the vehicle, and all of that is powered by our on-board compute system, where the emphasis has been on fast real-time inference in high-capacity ML models.
On the autonomy AI side of things and on-board software, our strategy has always been to leverage state-of-the-art ML in all parts of our stack, from basic perception to semantic understanding to behavior prediction to planning. And finally, we have been investing very, very heavily into all of the infrastructure, in particular, ML infrastructure and simulation.
Yeah. One of the questions that naturally comes up is... Obviously, AI is a very fast-evolving field but, notably, AI has lots of limitations. The state-of-the-art AI that you'd get from the rest of the world has natural limitations. What are some of those limitations, and how have you all at Waymo thought about what needs to be innovated, what needs to be researched, what needs to be built, to then enable the safest possible driver?
Yeah, it's been a super fast-moving field. Honestly, over the last decade, things have changed drastically. The AI and ML that we have today is very different from the AI and ML that we had back in 2009 or even around 2013, 2014, in the early days of ConvNets. That was a big step function but, even then, what we have today is drastically different.
So the limitations... The capabilities change, the limitations change. I would characterize it more as a continuous, ongoing, almost daily push to improve the capabilities of AI and ML as it applies to autonomous vehicles and iterating on the architecture of your entire system to leverage the full power of the most advanced state-of-the-art ML. That breaks down into a few dimensions, things like accuracy, obviously, efficiency, leveraging better structured representations, dealing with sparse data, and so forth and so on. We've been doing a lot of work across all of those fronts at Waymo, and we work with the research community. In fact, we contribute back to the research community through some efforts, like our Waymo Open Dataset, so that we, together jointly as the community and the industry, can move things forward along all of those dimensions. And that's really what we need, to iterate as fast as possible on all of these problems.
Yeah, truly. One thing that I think is an underratedly challenging part of self-driving is sort of being able to scale across the entire United States or over time across the whole globe effectively. So how does your experience in Arizona help you scale to new environments where there might be different challenges or different lessons to be learned or different qualities, like San Francisco, for example? And what are some of the technical challenges in being able to scale the driver across geographies in the United States?
Yeah, great question. So I'll make a couple of points here. First of all, the fundamental capabilities that you need to build to drive in those different environments are not that dissimilar. Of course, the distribution of contexts and scenarios that you encounter and the frequencies change, but a lot of the fundamentals are shared. For example, in San Francisco and other urban areas, you are fairly likely to encounter crowds of pedestrians that you have to navigate through, and that happens fairly frequently on roads and at intersections in this context. In Phoenix, that happens less frequently on large multilane roads, but when you're navigating through a shopping mall at peak density time or you're driving through a school zone at pickup or drop-off hours, you face many of the same challenges.
Now, the second point I'm going to make is that... As we discussed, our experience, and the testing and data collection that drive the development of our system, come from many more places than just Arizona. We've tested in more than 25 cities over the years in this country. And while we chose to focus on Phoenix for the purposes of deploying our first ride-hailing service, there's been a lot more that has been happening in the background that's been driving the development of the system. In particular, we had many multi-year research and engineering efforts that are now coming together in the fifth generation of our system that is now driving in San Francisco.
But to the core of your question of what is that experience in Phoenix and how does that help, it's actually incredibly valuable. That unique experience that we have of taking the technology all the way to drivers to operating a driver regularly, as opposed to just a one-off demo or one-off pilot, and actually standing up a full service with these fully rider-only vehicles that have no human drivers in them, and all of the scaffolding that we have to build around it, has been incredibly valuable.
We knew that it was going to be very difficult, and we knew the experience was going to be very valuable, and that's why we picked Phoenix as our first deployment. And, oh, boy, did we underestimate how much we were going to learn. It helped us evaluate the system with confidence. This task of evaluating the capabilities of autonomous vehicles, in many ways, is just as hard as building it in the first place. Phoenix taught us how to develop at speed, iterate on our performance, hill-climb on all the metrics that we care about, and make sure we're not going in circles or getting stuck in local minima.
We learned to regularly deploy the system. Just imagine the first time you want to get a system out the door, that's all that your engineering team does, trying to get the first release out the door, and nobody has any cycles to do anything else. That's not sustainable. That's not how you go fast, right? We've invested a tremendous amount in building this whole machine that helps us evaluate and deploy the system, so that our researchers and engineers can actually focus on advancing this technology, and then the deployment happens in the background. So that's incredibly valuable.
And finally, we learned to operate this fleet of fully autonomous vehicles 24/7 and, of course, there's a lot more that goes into operating a fleet of fully autonomous rider-only vehicles that you don't have to do if you're operating a fleet with human drivers in it. And that experience will translate really well. Carrying that technology all the way to a fully launched commercial service has been incredibly difficult and it's incredibly valuable experience, and all of that is very, very hard to do, and all of that experience, of course, will translate really well to future deployments in new domains. In particular, as we discussed, recently we started offering our rides in San Francisco through our Trusted Tester program, and the speed with which we were able to get to that milestone after we ramped up our testing on the latest generation platform in the city early in the year is very positive evidence of that experience that we've built over the years translating seamlessly to new domains.
Yeah, no, I mean, I think a lot of what you just mentioned... These are the problems that people don't think about when you think about, okay, what actually is involved in launching a fully autonomous ride-hailing service, but there's a lot, just in terms of how you have to operate, the systems you have to build, the infrastructure you have to build. That is very hard. You're right, it's very unique that Waymo's built all that, operating a service and now shipping that to a bunch of other geographies.
One other thing that is exciting about Waymo, or that I'm sure that you're excited about, is operating in different business domains as well, so not only operating in ride-hailing but also having long-haul trucking and local delivery programs. One thing that I'm sure you think a lot about is how do you design the AI capabilities, so that you're able to leverage the same AI capability and the same technology across each of these business domains. How do you do that? How do you think about an overall system design problem as well as how you enable the organization to be able to do that?
Yeah, that's absolutely correct. We like to say that we are not building a vehicle, we are building a driver, a system that's capable of fully autonomous driving and will support multiple applications and multiple commercial lines. Right now, we have two main business lines, Waymo One is for moving people for our ride-hailing service, and Waymo Via for moving goods through our trucking and local delivery efforts.
Now, in terms of building a driver that generalizes across the domains, depending on which combinations of products and deployments you take, the amount of overlap varies. For example, if you look at local deliveries, actually the environments where the market is and the driving challenges that we have to solve are, in many ways, just a strict subset of what you need to do for ride-hailing, so the driving task kind of basically follows from that. Trucking's a little bit different. As we discussed, some of the shared capabilities translate between the domains. Clearly, there's some specialization and differences when driving a little car versus a big truck. The amount of time that you spend driving on urban low-speed roads versus freeways is different. You're, of course, much more likely to encounter some situations, like a person opening the door of their car and hopping out right in front of you, in a dense urban setting as opposed to on a highway, but it still happens. You still have to be ready for it.
So at a high level, while there is some specialization, the fundamental capabilities... We talked about hardware, autonomy AI, and all of the infrastructure in the backline. In all of those areas, the fundamentals, the really hard, first-order problems in research and engineering, carry over really well. If you look at sensing, the hardware platform, or compute generally, the configuration of sensors and some specifications might be a little different but, fundamentally, what goes into building high-resolution, long-range, reliable sensors is shared, and that's why we have our fifth-generation hardware on both the Jaguar I-PACEs and the trucks. The same goes for compute.
In terms of autonomy AI, again, the distribution of context is a little bit different, but the fundamental capabilities... what does it take to build a good perception system, good understanding of semantics and prediction and planning, that is shared. And finally, on infrastructure side of things and the data science, all of the data management for training and evaluating your ML, the simulator, the data part is different, right? So the parameters might be different but the fundamentals, all that multi-year investment into the tooling of all of the frameworks... it carries over really well.
Because of that, really, our strategy when it comes to technology development and team organization is to unify and simplify. So we focus on principled solutions to the core driving problems. We put a lot of effort into simplifying and streamlining our architecture and investing very heavily in shared infrastructure and developer productivity that enables all of those platforms. So if you do it right, what we're actually finding is that even though there is, of course, a cost to specialization, you'll also get very positive feedback loops between those domains. It forces you to build a more robust, more generalizable core stack, and then that actually serves you really well in the long term.
Yeah, I mean, even just drawing an analogy to the trends in AI recently, where the big learnings you have... If you just build one big model and then you learn how to express that model in various ways, that's how you get the best possible performance, right? Obviously, this is probably literally true in terms of the models that you're producing but also maybe at more organizational levels. You invest a lot into developing core capabilities that can be adapted to all different kinds of situations.
That's absolutely right. I think I really like the analogy. You invest in the proper way of solving the fundamentals of a problem without building a lot of fragmented little pieces, and that's the right solution that enables any of your deployments, any operation, any of the environments.
Yeah, no, so this investment into this core fundamental system makes a lot of sense. You've already seen benefits through it across your different business models. How do you think it helps you as you decide to create potential other vehicle fleets or other market opportunities, other use cases, on top of the same core technology?
Oh, the main thing is that... The reason we're doing it is that it helps us build robust, generalizable solutions, and it is serving us really well in the two main applications that we have today. That's that same muscle, and that's that same shared core that we'll leverage and that will enable us to pursue additional vehicle platforms and additional scaling to new environments and new cities and supporting multiple product applications.
Yeah, awesome. Yeah, I want to switch gears now to a component of the tech strategy that we talked a little about, which is the fact that, as you mentioned, Waymo's driven more miles with autonomous vehicles than any other player to date. And so, there's been a huge investment into getting a lot of testing, getting a lot of data, and being able to use that to inform great algorithms. You and I both know, at its core, that the data's what sort of programs these algorithms, and so that's a key part of any sort of AI-centric strategy. How has your data strategy evolved over time at Waymo, from maybe the early days where the [inaudible 00:25:02] was started pre... a lot of the neural network craze, and I'm sure you've adapted to that and you've adapted to a lot of the recent advancements in the technology?
Yeah. I would say that evaluation and data-driven development has been a part of our strategy from the earliest days. But, of course, what that means and what type of data you have and what kind of ML models you have and how that data powers those models is drastically different between what we do today and what we did, well, even a couple years ago... let alone 2009.
In terms of things that changed... Clearly, along one dimension, there's been a significant evolution of the ML models themselves. In the earlier days, a lot of the ML work was primarily in perception and supervised ML and, of course, that has certain data strategies. Of course, as the models get bigger, you need more data, but a lot of it was supervised ML with human labeling. Over time, as we're starting to see more advanced ML models find applications across other parts of the stack, like semantics, understanding scenes as a whole, behavior prediction, imitation learning, motion planning, and decision making, that has an impact on the data strategy. In that space, there is a lot more you can do with auto labeling, and other generative approaches to data management become more relevant.
And that's been very important for us to leverage these advances to the max is the data part of expanding into the tail of the distribution. ML models operate in distributions. Any cases or examples in the tail of the distribution, all of the distribution present challenges. When you look at the data aspect of it, investing and gathering more data into the head or the average case of the distribution has limited value, so you really want to go after the tail. And that has been a key part of our strategy over the years as well.
There, you need to be able to evaluate the performance of your system. You need to invest in data mining to find interesting examples that are representative of that tail part of the distribution. And then, use a variety of techniques, like data augmentation or simulation, to kind of leverage those examples to the max, so that you can allow your models to get farther into the tail. And, of course, you want to automate that whole process, and that's been another big area of our investment: frameworks and all of the ML infrastructure for closing the loop on that data mining and training cycle.
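As an illustration of the mining step described above, here is a minimal Python sketch. It is not Waymo's pipeline: the `Example` record, the per-example `loss` score, the quantile cutoff, and the `augment` stand-in are all hypothetical names chosen for this example. The idea it shows is simply to rank logged examples by how badly the current model handles them, keep the worst ones, and expand each into several variants for the next training round.

```python
from dataclasses import dataclass


@dataclass
class Example:
    scenario: str  # hypothetical scenario tag
    loss: float    # hypothetical per-example evaluation loss (higher = harder)


def mine_hard_examples(examples, quantile=0.9):
    """Return the examples whose loss is at or above the given quantile cutoff."""
    if not examples:
        return []
    losses = sorted(e.loss for e in examples)
    cutoff_index = min(int(quantile * len(losses)), len(losses) - 1)
    cutoff = losses[cutoff_index]
    return [e for e in examples if e.loss >= cutoff]


def augment(example, copies=3):
    """Stand-in for augmentation or simulation: emit tagged variants of one example."""
    return [
        Example(f"{example.scenario}#variant{i}", example.loss)
        for i in range(copies)
    ]


logged = [
    Example("straight road", 0.02),
    Example("four-way stop", 0.05),
    Example("construction zone", 0.40),
    Example("cyclist with stop sign", 0.95),
]

hard = mine_hard_examples(logged, quantile=0.75)
training_additions = [variant for e in hard for variant in augment(e)]
```

In a real system the loss would come from an evaluation framework and the variants from a simulator; here both are placeholders so the control flow of the loop is visible.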
One thing that I think... I remember seeing a Waymo presentation that I think is profound. It's like investing in this data feedback loop as a first-class object in the development life cycle, right? How do you go from data mining to training to using that performance in your car to then get more examples, and how do you make that loop travel as quickly as possible?
That's exactly right. How do you make it as fast as possible, and how do you make it as automated and low in human engineering cost as possible? We talked a little bit earlier about building that machine for deploying this system, evaluating and deploying a system so that it happens almost automatically. Same goes for the ML infrastructure, right? Really, that full machine and that full cycle, you want it to happen quickly and kind of automatically in the background, so your talented researchers and engineers can focus on other things.
Yeah. I think one of the fun things... When you say long tail, obviously, we can visualize what that means in the distribution. But I think the long tail circumstances in self-driving are actually really quite novel or quite surprising. What are some fun, interesting long tail examples that you all have discovered? As we mentioned before, you have driven the most miles among any self-driving effort. And so, what are kind of the most surprising examples or some fun ones?
Oh, man. We see a lot. Yeah, there are some things that we maybe don't see every mile, and sometimes they get characterized as the long tail. I wouldn't call them the long tail: things like construction zones, other vehicles running stop signs and red lights, aggressively jaywalking pedestrians or cyclists breaking laws, stuff falling off of trucks. You don't see that every mile, but it is part of the distribution.
And in terms of really rare stuff, we see a lot: things like a drunk cyclist weaving through traffic with a stop sign on his back. Halloween always is a good source of interesting data. You see people wearing Halloween costumes, witches, ghosts, spiders, dinosaurs, all kinds of animals, animals on the road, horses, other animals doing animal things. We recently saw a Bubble Truck. It's a truck that drives around making bubbles... pretty helpful. All of those examples turn out to be very interesting, very informative. We pay close attention to them. Most of the time, whenever we find them, that's part of the data mining and the hard example mining strategy. You bring them into your data sets, and this is where the power of our simulator comes in. Each of those examples, we're able to expand into a family of more scenarios that then guides the training and evaluation of our system.
Yeah. How do you balance, holistically as well as from a data perspective, this challenge of solving for the long tail as well as the average case or the head of the distribution? Obviously, literally from a neural network perspective, there is some trade-off to those optimization functions. Is it challenging to optimize for both? How do you all think about it? How do you ensure that that results in the best possible performance?
Yeah, it is a challenge, both at the model level, the component level, and the system level. The trade-off is complicated. As you pointed out, Alex, if you just focus on one of those, let's say you want to focus on the average case, maybe for the purpose of building a driver-assist system where you kind of hope to rely on the human to help you out in the long tail, that simplifies things tremendously. In fact, there's a lot of off-the-shelf stuff that exists today that gets you most of the way there. It's not good enough for a fully autonomous [inaudible 00:32:53] type system where, since you're operating in a safety-critical environment, you really need to have a very high performance in the tail. But then, of course, if you overemphasize the tail and you don't have a good graceful solution to it, you end up with an autonomous vehicle that never leaves the parking lot. That's not particularly useful either.
Unfortunately, there are no silver bullets to this problem, and it's actually one of the dimensions that makes this whole space so complicated when you want to go to full autonomy. It's so hard. It does require a lot of hard work and a lot of creativity in research and engineering along multiple dimensions.
So what we find particularly important to do this well is the following. First of all, modeling and data. You want to bite off as much of that head or torso of the distribution as possible with your primary solution. And what might be the long tail for one sensor might be a piece of cake for another sensor. What might be the long tail for one approach to data and ML modeling might be the head of the distribution for another one. This goes into the various things which we discussed earlier, on modeling, on data augmentation. So you employ all of those to basically make your big hammer as big as possible, to take as much of a bite off the distribution as possible.
Then, the other key component that you need in order to even talk about this optimization and this trade-off is evaluation. You need to know your performance in the tail. That's kind of your first order of business, right? If you can't evaluate, there's no meaningful optimization strategy that you can pursue. So that goes back to the various approaches and various techniques for evaluating the system, in particular, simulation.
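To make the point about knowing your performance in the tail concrete, here is a small Python sketch under stated assumptions: the "head" and "tail" slice names, the pass/fail results, and the `success_rate_by_slice` helper are all hypothetical, not anything described in the interview. It shows how a single aggregate number can look excellent while the tail slice is failing half the time.

```python
from collections import defaultdict


def success_rate_by_slice(results):
    """Map each slice name to its pass rate, given (slice_name, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for slice_name, passed in results:
        totals[slice_name] += 1
        passes[slice_name] += int(passed)
    return {name: passes[name] / totals[name] for name in totals}


# Hypothetical evaluation results: 98 common scenarios and 2 rare ones.
results = [("head", True)] * 98 + [("tail", True), ("tail", False)]

overall = sum(passed for _, passed in results) / len(results)
per_slice = success_rate_by_slice(results)

print(f"overall pass rate: {overall:.2f}")
for name, rate in per_slice.items():
    print(f"{name} pass rate: {rate:.2f}")
```

The aggregate comes out at 0.99 even though the tail slice passes only 0.50 of the time, which is why the trade-off discussed here cannot be reasoned about without slice-level evaluation.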
And finally, system-level capabilities are really important. You can design or you can architect your system overall to have redundancy and to be robust to various corner cases, so that it enables a robust reaction or robust handling of cases that you've maybe never even seen before. And then, you couple that with various mechanisms for anomaly detection and outlier detection.
Yeah. I mean, how do you test the system? To your point, how do you design and build a system? One of the unique challenges of self-driving is sort of this open-ended long tail problem, which is like you could encounter scenarios that no vehicle has ever encountered in quite the same way in the past. How do you think about that from a development perspective? How do you develop the system? You mentioned part of it is building a system that's robust. How do you develop it so it can handle these scenarios, as well as evaluate its performance on these open-ended long tail problems?
It's a huge question, right? I mean, you touched on a few themes. You talked a little bit about events that are rare, like people running red lights or people jumping out from cars. There, you can talk about the distribution and characterize it and try to climb into the long tail. As you pointed out, there are some cases that are so strange that it doesn't make sense to talk about characterizing the distribution.
For example, we, a little while ago, saw a motorcycle traveling at 70 miles an hour in our path without a rider, because the rider happened to fall off a while ago. The system still has to be able to handle it. Again, for these one-off cases, a lot of the work that you do happens at the system level. You kind of have to work in a systematic way throughout the system. It starts with perception, right? There are various techniques, starting from sensors but all the way to perception models and algorithms, that you can employ where even if you don't understand something, even if you can't classify something, you can still reason about it. For example, in this case with the motorcycle that doesn't have a rider, maybe that's not something you trained your system on and you can't effectively classify the object. You still see it as a moving object, so you propagate it through your entire stack, and so forth and so on.
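The idea of reasoning about an object you cannot classify can be sketched in a few lines of Python. This is an illustrative toy, not a description of the Waymo Driver: the `Track` record, the `min_confidence` threshold, the class scores, and the constant-velocity extrapolation are all assumptions made for the example. The point is that the class label is allowed to fall back to "unknown" while position and velocity are still estimated and passed downstream.

```python
from dataclasses import dataclass


@dataclass
class Track:
    label: str        # semantic class, or "unknown" when the classifier is unsure
    position: tuple   # (x, y) in meters
    velocity: tuple   # (vx, vy) in meters per second


def build_track(class_scores, prev_pos, curr_pos, dt, min_confidence=0.6):
    """Keep the motion estimate even when no class is confident enough."""
    best_label, best_score = max(class_scores.items(), key=lambda kv: kv[1])
    label = best_label if best_score >= min_confidence else "unknown"
    velocity = (
        (curr_pos[0] - prev_pos[0]) / dt,
        (curr_pos[1] - prev_pos[1]) / dt,
    )
    return Track(label, curr_pos, velocity)


def predict_position(track, horizon_s):
    """Constant-velocity extrapolation; it does not depend on the class label."""
    return (
        track.position[0] + track.velocity[0] * horizon_s,
        track.position[1] + track.velocity[1] * horizon_s,
    )


# Hypothetical scores for a riderless motorcycle: no class clears the threshold.
scores = {"motorcyclist": 0.35, "car": 0.20}
track = build_track(scores, prev_pos=(0.0, 0.0), curr_pos=(15.0, 0.0), dt=0.5)
future = predict_position(track, horizon_s=2.0)
```

Here the object ends up labeled "unknown" but still carries a velocity of 30 m/s (roughly highway speed), so a downstream planner can treat it as a fast-moving obstacle.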
Yeah. How do you think about the technology approach to solve these prediction and planning problems that are maybe more open-ended in nature, with broader distributions, et cetera?
Yeah, you're absolutely right. Perception is just the start of the challenge. It is very important to do a very good job at perception. If you can't see what's happening around you, you're not going to be able to solve all of those other challenges, so it is table stakes. It is necessary but insufficient to do well at those other tasks.
When it comes to these challenges of behavior prediction, semantic understanding, decision making, the dimensionality of the problem explodes. For perception tasks that are object-level perception, yes, the sensing dimensionality is very, very high but, at least, it's specific and local to an object. A pedestrian's a pedestrian and a car is a car, regardless of what other objects you have around it. When you get into the domain of semantic understanding, understanding intent, making predictions, making decisions, the entire context of the scene becomes very important.
Fortunately, some of the advances in recent ML and AI around high capacity models have been very powerful in allowing us to tackle some of those challenges. What has been very useful is creating architectures... The data strategy is also a part of it. Models that can effectively represent the entire scene, the entire context that you're operating in, and use efficient small-structured representations to model the interactions between all of the components that matter. And often, those are very heterogeneous signals: some are based on appearance, some are based on behavior, some are based on the static structure of the world. That allows us to create those models that can be deployed in production.
Yeah. And I'm curious to dig in to the data strategy. Obviously, for the perception problem, it's in some ways straightforward. You label a bunch of cars, then you can predict a car in an image. What does the data strategy look like? How do you have to then think about your datasets? What datasets are you collecting? What information are you trying to pull out of those datasets to be able to then drive? And by the way, how do you do data augmentations, how might you generate new data in a way to build these finer-grained representations, as you mentioned, of the scenes?
Yeah. So a lot of these techniques actually do come into play even when we're talking about the core perception tasks. A part of the evolution of our data strategy there has been to leverage things like offboard perception. With the computational power that we now have, we can build more powerful models that help us not purely rely on human labels but also augment that with the more powerful models that we can run not in real time, leveraging massive compute and leveraging various tricks like going back and forth in time. That's part of the strategy.
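One way to picture the "going back and forth in time" trick is to compare an estimator that may only use past frames with one that may also look at future frames of a recorded log. The Python sketch below is a deliberately simple stand-in, a moving average over hypothetical one-dimensional position measurements, and is not the offboard perception models referred to here; the function names and window size are assumptions.

```python
def causal_estimate(measurements, t, window=2):
    """Onboard-style estimate: average of frames up to and including t."""
    start = max(0, t - window)
    chunk = measurements[start:t + 1]
    return sum(chunk) / len(chunk)


def offboard_estimate(measurements, t, window=2):
    """Offboard-style estimate: average of frames before AND after t."""
    start = max(0, t - window)
    end = min(len(measurements), t + window + 1)
    chunk = measurements[start:end]
    return sum(chunk) / len(chunk)


# Hypothetical logged positions (meters) of an object moving at constant speed.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]

onboard = causal_estimate(positions, t=2)     # lags behind the true value 2.0
offboard = offboard_estimate(positions, t=2)  # centered window removes the lag
```

Because the log is already recorded, the offboard estimate can center its window on the frame of interest, which is one reason labels produced offline can be better than what a real-time system sees.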
Data augmentation has all kinds of techniques. We actually published some of our work around data augmentation and synthetic data generation that also helps. When it comes to things like behavior prediction, the nice thing... Dimensionality explodes, complexity of the modeling of all of the interactions explodes, but the nice thing is that you can leverage auto labeling: you see what happens in the future, and those are your labels in that domain. It simplifies things somewhat. Models there become really important. In part, we've talked a little just now about structured representations that allow us to model interactions. We published some work on hierarchical graph neural nets that model all of those interactions between static parts of the environment as well as dynamic parts. The model that we had for behavior prediction there is something we call VectorNet, which can leverage some of those most advanced techniques.
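The auto-labeling idea for behavior prediction, where what happened next in the log serves as the label, can be illustrated with a short Python sketch. The function name, the window lengths, and the toy track are hypothetical; this shows only the slicing of a recorded trajectory into (history, future) pairs, not VectorNet or any published model.

```python
def make_prediction_examples(trajectory, history_len, future_len):
    """Slice one logged trajectory into (observed history, future label) pairs."""
    examples = []
    last_start = len(trajectory) - history_len - future_len
    for start in range(last_start + 1):
        split = start + history_len
        history = trajectory[start:split]             # model input
        future = trajectory[split:split + future_len]  # label, taken from the log
        examples.append((history, future))
    return examples


# Hypothetical logged (x, y) positions of another road user, one per time step.
logged_track = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2)]

examples = make_prediction_examples(logged_track, history_len=2, future_len=2)
for history, future in examples:
    print("input:", history, "label:", future)
```

No human annotation is needed for these labels, since the future positions are already in the recording; a trajectory that is too short for the requested windows simply yields no examples.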
Similarly, if you're doing planning or imitation learning, you can do things... Auto labeling is another thing that also helps you in this domain. And then, you can do other things like leveraging your simulator to just play in that environment that opens up possibilities like reinforcement learning.
Yeah. For example, what we saw in large language models, or NLP in general, is that the role of the human teacher evolved from one of explicitly producing labels to one of producing nudges or guidance or ratings or whatnot that then allow us to figure out, okay, what is it that the human teacher is getting at and what is it that is more reasonable. How do you think that kind of paradigm plays into the evolution of how a lot of these models develop in self-driving?
It depends on what kind of teaching you are doing. I find that purely doing imitation learning in an environment that is as complex as driving is very powerful, but it's not enough. You need to augment it with something. You end up facing issues whenever you're out of the distribution like that, or the classical [inaudible 00:43:57] and things like that. So a lot of our focus in this space has been to leverage those techniques around high capacity models and imitation learning and human teachers, but then augment them with some additional boost that you get from injecting some bias into your system through some structured representations and leveraging the simulator to allow you to explore all the parts of the space that are not well-represented in the distribution that you see from human to human examples or human teaching.
Yep, yep. I want to pivot now to sensors. You've mentioned it a few times, kind of the sensing problem. One of the unique things about Waymo's strategy is its decision to build its own LIDAR sensors and, in some cases, even sell those to other companies. Can you share why Waymo strategically decided to build its own LIDAR sensors?
Yeah, there are actually maybe two parts to this question. One is why use LIDARs at all, and the second part is... should you build your own or buy something off the shelf.
On the first one, again, if we're talking about building a fully autonomous system, one that's capable of operating and performing safely in all of the complex scenarios that we might encounter, then... While perception, as we discussed, is just the start of the problem, being very good at it really matters. And different sensing modalities, LIDARs, cameras, radars, they have different physical characteristics that complement each other very nicely. LIDARs, for example, compared to cameras, are active sensors. They bring their own energy, so they work just as well in pitch darkness as they do during the day. Radars use different wavelengths that help them work better in other environmental conditions. They really complement each other nicely, so our strategy is better sensors, better data, more powerful system. There is no reason, the way we see it, to handicap yourself by not using the power that those different sensing modalities provide. And LIDAR is one of the really important ones.
And to the second part of the question, why build our own versus using something that exists, again, it comes down to the level of performance. We are now on our fifth generation of the hardware suite. We've accumulated over the years a lot of experience in building the LIDARs themselves and understanding what really matters for our entire stack, and all of that experience went into the design of our fifth generation LIDARs, which are much more advanced than anything that you see in the rest of the industry or much more advanced than our fourth generation system.
We talked a little bit about scaling. That is another reason why we build our own. You really have to design for manufacturability at scale. Building self-driving hardware, for example, is a very different problem than building cars at scale. There are great companies that are incredible at manufacturing vehicles, which is a very difficult problem, but it's a very different one from manufacturing these complex electro-optical systems that go into a self-driving vehicle or autonomous vehicle. So we've designed our fifth generation hardware from the ground up, including the LIDARs, to be manufactured at scale, to provide the capability that we need, to provide the level of reliability that is required, and to have the right unit economics.
That is part of the strategy. It's not just LIDARs; it's also what we are doing with the rest of the hardware. Radars, cameras, LIDARs can be a platform. It allows us to build what we need for where we're going at scale. And when you're designing the whole hardware stack, you also have some pretty interesting opportunities to optimize things jointly, not just optimize one of the sensing modalities for the needs of your software stack but to optimize the entire system, the sensing and the compute, so you can provide the capability that gives you the best bang for the buck on the entire stack.
I want to close by just asking a couple of big picture questions. Dmitri, you've been pioneering autonomous vehicles and working in the field since the very, very early days, and obviously continued to today where you're running Waymo as co-CEO. If you take a big step back and look at sort of all the progress that's been made to date, what's kind of your big picture analysis on how far the industry has come so far and how far we have left to go?
Yeah, a lot has changed. I've been, myself, lucky to be a part of this journey for more than a decade now. I've been working on autonomous vehicles for about 15 years now, so we've definitely gone a very, very long way. Even when we started at Google in 2009, we had a vision. We believed in the technology. We believed in the transformative potential of it, but it was still mostly science fiction, right? And it's been an incredible journey, and it's amazing and rewarding to see this thing become real, going from a vision to now a product that's deployed. We have a fleet of fully autonomous vehicles that are operating 24/7, that are open to the public, and anybody can download an app, the Waymo app, hail a vehicle, and a completely empty car shows up and takes you anywhere you want to go in our service territory. So it is incredibly exciting to see us get to this point, and it's a very, very meaningful product and technology proof point. The cars are here. They're driving around. They are moving people.
As a phase for us as a company and in terms of where the industry is as a whole, I think right now is actually an incredibly exciting time. On one hand, there is this proof point. These cars are driving around with nobody in them as we speak, and it gives us that solid footing. It gives us that foundation that we can build upon as we go forward. And that experience, as you and I discussed, has actually been incredibly valuable for Waymo. There's no beating doing this for real.
But going forward, many very interesting, very exciting challenges remain. We're going to see many breakthroughs in a number of areas. I talked about some of the very exciting stuff that's happening in high capacity ML models and how you can leverage it in this domain. So it's this very nice mix of this thing is real, you can drive around and experience the product but, at the same time, all of the dynamic nature and all of the technical momentum that's happening in the industry is still there. So I'm very excited about where we are. I'm super excited about where we're going as we take this technology to more places and more people, both Waymo One and Waymo Via, for moving people and goods.
Yeah. What's next for Waymo?
Well, we have a lot going on. So in the coming months, in ride-hailing, you will see us do more in our Trusted Tester program. If you live in San Francisco, make sure to apply and check out the rides in our autonomous vehicles. And then, you'll see more news in Waymo Via as we expand our testing and expand our operations on the trucking side.
Super exciting. Well, you have to get me off that wait list for the Trusted Tester program in San Francisco. Dmitri, thank you so much again for sitting down with us today and for this super interesting conversation.
Thank you for having me. I really enjoyed it.