ML at Waymo: Building a Scalable Autonomous Driving Stack with Drago Anguelov

Posted Oct 06, 2021 | Views 35.2K

# TransformX 2021

# Keynote

speaker

Dragomir Anguelov

Distinguished Scientist and Head of Research @ Waymo

Drago joined Waymo in 2018 to lead the Research team, which focuses on pushing the state of the art in autonomous driving using machine learning. Earlier in his career he spent eight years at Google, first working on 3D vision and pose estimation for Street View, and later leading a research team which developed computer vision systems for annotating Google Photos. The team also invented popular methods such as the Inception neural network architecture and the SSD detector, which helped win the ImageNet 2014 Classification and Detection challenges. Prior to joining Waymo, Drago led the 3D Perception team at Zoox.

SUMMARY

In this keynote, Drago Anguelov, Head of Research at Waymo, discusses Waymo’s progress towards building a scalable technology stack for autonomous driving vehicles. With more than a decade of experience in solving autonomous driving, Waymo is now operating the world’s first commercial ride-hailing service Waymo One in Phoenix and has recently welcomed its first riders in San Francisco by kicking off the Trusted Tester program. Drago will give an overview of the key autonomous driving challenges and describe how Waymo is leveraging the cutting edge ML systems across the stack to handle them. He will also outline promising avenues to keep expanding the scope of ML in the stack in the future and showcase some of Waymo’s work in the space.

TRANSCRIPT

Speaker 1 (00:15): Next up, we're thrilled to welcome Drago Anguelov. Drago is the head of research, and a distinguished scientist at Waymo. He leads the research on pushing boundaries of autonomous driving, using machine learning.

Speaker 1 (00:31): Prior to joining Waymo, Drago led the 3D perception team at Zoox. Before Zoox, Drago was at Google, first working on 3D vision and pose estimation for Street View, and later leading a research team which developed computer vision systems for annotating Google Photos. The team also invented popular methods, such as the Inception neural network architecture and the SSD detector, which helped win the ImageNet 2014 classification and detection challenges. Drago, over to you.

Drago Anguelov (01:10): Hi. I'm Drago Anguelov. I lead the research team at Waymo, and today I will tell you about how we leverage machine learning to build our autonomous driving stack. A short intro for those that are not familiar with Waymo. We are Alphabet's autonomous driving company. We are building the world's most experienced driver, focusing on two categories: moving people with our Waymo One service, and moving goods with Waymo Via, our commercial delivery arm that features these Class 8 trucks and delivery trucks.

Drago Anguelov (01:45): Over more than a decade, we have driven more than 20 million miles on public roads in autonomous mode, over 20 billion miles in simulation, and across more than 25 cities in the United States. And all this experience that we have collected is benefiting all the drivers in that fleet.

Drago Anguelov (02:06): Since October 2020, we have been offering Waymo One, the world's first fully autonomous service, with 100% of rides currently with no human driver in Phoenix, Arizona. Waymo One riders are now able to hail a vehicle on the app, once they download the app, and invite guests to join them, and tweet and blog about their experiences on social media. And let's see what some of their experiences have been like.

Speaker 3 (02:44): Hey guys. My ride is here. Here we go. Check this out.

Speaker 4 (02:50): Human power. This car is all yours with no one upfront.

Speaker 5 (02:55): Wait a minute. We're going red. No, we're cruising right on red. How about that? Right on. Good. Well, he gets with it, doesn't he? He steps on the gas pretty good. Or she. It.

Speaker 6 (03:08): See? No one up there. So crazy.

Speaker 7 (03:16): Yes. Let me tell you why. We've been doing this since 2018, when they had drivers in the front seat that could manually take over just in case. So, yes. I'm quite comfortable with this.

Speaker 8 (03:33): Work zone, speed limit 35. Yep. Nailing that 35 no problem.

Speaker 9 (03:34): Thank you.

Speaker 10 (03:43): Oh, we're coming to a stop sign. See what it going to be. Oh, just getting ready. Aye. Whipping it. Whipping it. This car drive better than some of y'all.

Drago Anguelov (04:01): So, this is some of the experience people have been having with our first and only autonomous service car open to the public. And we're working hard to keep improving the service, and of course to bring it a lot more places.

Drago Anguelov (04:16): In more recent news, earlier this year, we announced that we have significantly expanded our testing in San Francisco, and last month we announced that we have kicked off our Waymo One Trusted Tester program. It allows San Franciscans who are part of the program to hail an autonomous ride on our new all-electric platform, based on the Jaguar I-PACE and featuring our fifth-generation sensors, and get rides in the city.

Drago Anguelov (04:51): This new fifth-generation sensor suite of ours includes powerful state-of-the-art lidars that can see up to 300 meters, and cameras with very high resolution that can see even longer range. They give 360-degree surround coverage of the vehicle and can see around occlusions, like in the front parts on the side. And we have a radar imaging system that offers great field of view and resolution, compared to any in its class.

Drago Anguelov (05:26): So, before I show you our Waymo Driver on some San Francisco streets, I want to do an autonomous vehicle driving 101 introduction, to describe roughly what the stack needs to be able to do in order to drive safely on the street. And this is to benefit people that are not that familiar with the autonomous vehicle driving domain.

Drago Anguelov (05:51): And so, imagine you're at this intersection that I'm showing you, and the first question the system needs to address is, what is around me? It can leverage our rich suite of sensors, and potentially some prior map knowledge, and detect a set of agents in the scene. The police car in front, the various vehicles, pedestrians. And also detect traffic lights, various other map elements. Construction and the like.

Drago Anguelov (06:24): And so, once we have done this, the next question to ask is, well, what is likely to happen next? This involves predicting the behavior of other agents. What are they likely to do in the scene, based on the context and some of their current actions?

Drago Anguelov (06:45): And given that we have these predictions, the next question is, what should I do? It's about deciding how to make progress in the scene. And that is planning. It requires planning a trajectory that is safe in this environment. And let's say we pick this green trajectory. Then ultimately a controller executes the necessary steering wheel turns, throttling and braking to take us on that trajectory, right?
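The three questions above (what is around me, what is likely to happen next, what should I do) can be illustrated with a toy pipeline. This is only a sketch under stated assumptions, not Waymo's stack: the `Agent` type, the constant-velocity predictor, the hand-written candidate trajectories and the 2-meter clearance are all hypothetical choices made for illustration, and the controller step is left out.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in meters, in a shared map frame


@dataclass
class Agent:
    """A detected road user (hypothetical, simplified representation)."""
    name: str
    position: Point
    velocity: Point  # meters per second


def perceive(raw_detections: List[dict]) -> List[Agent]:
    """'What is around me?' Wrap hypothetical detector output as agents."""
    return [Agent(d["name"], d["position"], d["velocity"]) for d in raw_detections]


def predict(agents: List[Agent], steps: int, dt: float) -> Dict[str, List[Point]]:
    """'What is likely to happen next?' Constant-velocity rollout (an assumption)."""
    return {
        a.name: [
            (a.position[0] + a.velocity[0] * dt * k,
             a.position[1] + a.velocity[1] * dt * k)
            for k in range(1, steps + 1)
        ]
        for a in agents
    }


def plan(candidates: Dict[str, List[Point]],
         futures: Dict[str, List[Point]],
         min_gap_m: float) -> Optional[str]:
    """'What should I do?' Return the first candidate that stays clear of all agents."""
    for name, trajectory in candidates.items():
        clear = all(
            math.dist(ego, future[k]) >= min_gap_m
            for k, ego in enumerate(trajectory)
            for future in futures.values()
        )
        if clear:
            return name
    return None


# A stationary pedestrian 10 m ahead, and two candidate ego trajectories.
agents = perceive([{"name": "pedestrian", "position": (10.0, 0.0), "velocity": (0.0, 0.0)}])
futures = predict(agents, steps=2, dt=1.0)
candidates = {
    "go_straight": [(5.0, 0.0), (10.0, 0.0)],
    "nudge_left": [(5.0, 1.5), (10.0, 3.0)],
}
chosen = plan(candidates, futures, min_gap_m=2.0)
print(chosen)  # nudge_left
```

In a real system the chosen trajectory would then be handed to a controller that turns it into steering, throttle and brake commands.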

Drago Anguelov (07:16): So, these are all the tasks that an autonomous vehicle needs to do. Let's see some examples from us, doing fully autonomous driving in San Francisco. So, this is in autonomous mode. You can see on the top right there is a green square saying, "Auto." And here you can see some interesting interactions in dense urban scenarios, or even in these narrow streets. There are a lot of interactions that we need to be able to successfully handle.

Drago Anguelov (07:45): Here we're negotiating in the narrow streets with another vehicle going over a bump. Before we had a bicyclist signaling to us here. Pedestrian really wishes to go in front of us, and the car just let him.

Drago Anguelov (08:06): There is a lot of busy intersections, with potentially cutting in bicyclists and busy traffic that we need to pass. More bicyclist action. You can see us letting them go in front of us after they signal to us that they intend to do so.

Drago Anguelov (08:28): This is us driving at night. You can see in the camera, it's not too clear. In the lidar, you can clearly see the door is open, and we'll let the construction person... or is it a [inaudible 00:08:39] person, do what they're up to, and then safely pass this vehicle. We can tell that this vehicle has an open door, and that it's parked there for a long time.

Drago Anguelov (08:48): Dealing with double parked vehicles is an important part of driving. Here's a more complex double parked vehicle scenario. People loading things. We can tell. We can see by the items, and automatically we predict that all of these vehicles are parked there for a while, and so we plan a safe maneuver to go by them. Right? And so, these scenes illustrate some typical scenarios in San Francisco driving, and a lot of them, as you saw, involve all kinds of interactions with agents, with vehicles on the road.

Drago Anguelov (09:21): Now, one of the challenges of autonomous driving and scaling is that our driver needs to handle a variety of different scenes and domains, beyond dense urban. Suburban, freeway, and of course different kinds of weather, you know? And that makes the problem additionally interesting and challenging.

Drago Anguelov (09:42): Also, the San Francisco examples I showed you are quite common. They happen all the time. There are also certain scenarios that happen quite rarely, that we need to handle well too. This scenario is reasonably standard. So, he's going to start going forward. Look on the left here. There's a vehicle that comes in on the red, and will just blow completely through this intersection, even though the green was for us. And so, we want to deal with such scenarios reasonably as well.

Drago Anguelov (10:17): Now, I'm going to show you an even longer long-tail scenario in the next example. I think this is an example that Dmitri mentioned in his fireside chat with Alex at this conference. And I will emphasize, no one was harmed in this example. Let me show you what a really rare case is.

Drago Anguelov (10:44): So, here we are starting to go. You can see something happened and blew out the trailer behind these cars. So, one more time. There is a vehicle coming, going by. Oh, it takes out the trailer. So, maybe a closer view. Enhance, computer. With some stabilization, you can see a motorcycle plows into the trailer. Luckily no one was injured. But events like this, even though it's hard to imagine they could happen, do occasionally happen. And even though you cannot avoid all of them, it helps to do reasonable things when such things transpire.

Drago Anguelov (11:30): So, how does Waymo deal with the complexity of autonomous driving? I'm going to try to illustrate that for you. From its foundation, we've emphasized a thoughtful and safety conscious approach to doing this. It is also an integrated approach, where we have designed our hardware and software in house. We leverage an array of very powerful sensors: camera, lidar and radar. Specifically, lidar and radar are active sensors, and they're a great safety feature. They ensure that if there is some object out there, we will see it.

Drago Anguelov (12:06): And so, also with lidar and radar, the signal typically has less variance than in the camera. For example, the shapes of cars are pretty consistent in the lidar, even though their appearance in camera, the pixels, can change dramatically in various conditions. And so, typically you need less data in lidar than in camera to achieve robust detection results. So, ultimately this is a great safety feature.

Drago Anguelov (12:29): And we have built into perception this additional attention and focus, because perception is at the foundation of your whole system. And if you cannot detect the initial objects, or the semantic attributes of the scene, then all of your high level reasoning beyond that point can suffer, because you have not even addressed the basics.

Drago Anguelov (12:55): This is a short video of our fifth-generation lidar that is on the I-PACEs and the trucks. It can see up to 300 meters in dramatically better resolution, with a lot better field of view. A lot more signal than our previous lidar stack. And it's a fantastic sensor to work with.

Drago Anguelov (13:16): So, let's go on to the next point, which is that we also use high definition maps as a prior. Not as an immutable truth. Maps are a great safety feature, because they let us anticipate what happens in the occluded parts of the scene. Furthermore, the map is a dynamic entity, and it's a mechanism where our fleet can share observations between vehicles, in some cases in real time, about features that they're likely to encounter.

Drago Anguelov (13:49): Maps are especially important when parts of the scene are occluded, either by vehicles in front of you or by a hill or by the intersection geometry. And it's quite helpful to know what to anticipate there. It adds into our system stack.

Drago Anguelov (14:09): Also, of course, we leverage machine learning very heavily at Waymo. It's a great tool to handle the world complexity of the type that I showed you. Of course, assuming that we have the prerequisite data to learn from. But we have this data. We have thousands of years of driving experience, and we have been capturing these diverse scenes in high resolution with multiple sensors over time. So, that's great.

Drago Anguelov (14:35): What's also really great in our autonomous vehicle domain, that some other robotics domains do not share, is that when we drive, we capture the behavior of all these humans that do the same task that we're trying to perfect. You can capture tens to hundreds of humans driving, and showing you how they do it. Of course, some are real experts, and some not quite so much. But it's all very useful signal for machine learning, right?

Drago Anguelov (14:59): At the same time, there is a requirement for robustness. You need to be able to also do reasonable things in very rare cases of the kinds I showed you. And so, in that case, it does help for machine learning to be complemented with expert domain knowledge. Because having machine learning deal robustly with cases when there are almost no examples is still an open research problem, right? And so, an autonomous driving stack needs to be designed to leverage as fully as possible these trends in machine learning, while mitigating its weaknesses, right? And that is how we've tried to build our stack.

Drago Anguelov (15:32): Now, since I started with Waymo, the scope of ML has been dramatically expanding. And so, by now, there are large machine learning components in every major production system. And each of these models benefits from cutting edge architectures and state-of-the-art research.

Drago Anguelov (15:54): And for example, in perception, we have even published a lot of work in various conferences. I'll just give a couple of examples of some state-of-the-art systems that we have shown. So, on the left is an object detector in lidar that we published this summer, running multiple times faster than real time on the full range of the lidar. We cut it off at 200 meters here. Really accurate. This is frame-by-frame detection. If you combine them, you get even higher-quality tracking.

Drago Anguelov (16:29): And on the right is a model that I'm showing called [Wider 00:16:32]. This is an image model. It predicts the depth of every pixel in the scene's images, along with the uncertainty in that depth, and it also predicts semantics that I have not shown. And basically, from the predictions of this model, you can make a really nice, pretty point cloud. And it shows how well our models can estimate depth, and this state-of-the-art depth estimation allows them to lift the cameras to 3D. It leverages the fact that we also have a collocated lidar sensor, which helps train our camera models. And so, we can get results like this.

Drago Anguelov (17:12): Now, in the last couple of years, deep learning has also really pushed the state of the art in behavior prediction and general behavior modeling. And at Waymo, we have published some strong work in this space, like MultiPath, VectorNet, TNT. These are all various neural network architectures for behavior modeling, for increasing quality in this case.

Drago Anguelov (17:38): And on this slide, I'm showing you one of our more recent architectures. It's called MultiPath++. It's a behavior prediction net. We presented it in the summer at the CVPR conference. And it's currently the leader by a margin on our Waymo Open Motion Dataset, and was also in first place on another popular benchmark called Argoverse, for a while.

Drago Anguelov (18:06): So, in these examples I show you of this prediction model, there are various agents in the scene, and we predict a distribution over possible behaviors up to eight seconds into the future. And the brighter the colors are, the more likely the predictions are. And you can see there are some interesting intersection predictions, multimodal. And here is a parking lot with pedestrians going by and vehicles driving by and so on, right? It gives you an idea of the kind of models we have in these architectures for predicting and modeling behavior.
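A "distribution over possible behaviors" can be pictured as a handful of candidate trajectories (modes), each with a probability. The sketch below is a minimal illustration of that idea, not the MultiPath++ architecture: the mode names, the raw scores and the use of a softmax to turn scores into probabilities are assumptions made for the example.

```python
import math
from typing import List, Tuple


def mode_probabilities(scores: List[float]) -> List[float]:
    """Softmax: turn unnormalized mode scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_mode(names: List[str], scores: List[float]) -> Tuple[str, float]:
    """Return the name and probability of the highest-probability mode."""
    probs = mode_probabilities(scores)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], probs[best]


# Hypothetical scores for three behaviors of one agent at an intersection.
mode_names = ["turn_left", "go_straight", "turn_right"]
mode_scores = [0.5, 2.0, 0.1]

probs = mode_probabilities(mode_scores)
best_name, best_prob = most_likely_mode(mode_names, mode_scores)
print(best_name)
```

In the visualization described in the talk, the brightness of each predicted trajectory would correspond to a probability like `probs[i]`.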

Drago Anguelov (18:41): And I'm showing you perception and prediction examples, but deep learning is also transforming planning, simulation and evaluation. At Waymo, we have major machine learning components in these systems in production, and I will talk a bit more about those topics later in the talk.

Drago Anguelov (19:01): But before I do that, I wanted to highlight a dataset that we have made. The Waymo Open Dataset. We created it two years ago, and continue expanding it. If you're interested in doing autonomous vehicle research on topics such as perception, behaviors, simulation, agents and so on, and you want to train models like the ones you just saw, you may want to consider our dataset.

Drago Anguelov (19:31): And we continue expanding and enriching it, trying to make it applicable to more tasks in autonomous driving, and running challenges already for three years straight with workshop talks to highlight the winning entries. So, research in autonomous vehicle modeling is, I mean, quite an exciting field right now. And if you want to join it, please consider the dataset and reach out. We will continue expanding it.

Drago Anguelov (20:08): So, I talked a bit about existing models. In this part of the talk, I want to discuss ways in which we can further grow the scope of the machine learning enabled systems. And one promising direction is to leverage the recent large capacity models trends.

Drago Anguelov (20:28): So, in the last couple of years, there has been dramatic improvement in the field of natural language processing. A lot of you may have heard of this model, GPT-3 by OpenAI. I think there, starting with a very high-performing transformer architecture, they showed that scaling it and training it on really large web datasets continues pushing the state of the art and the quality of the model.

Drago Anguelov (20:56): We see some similar effects in computer vision, with models such as EfficientNet and also CoAtNet more recently, by the Google Brain team. Again, similar trends. As the number of parameters grows for good architecture families, accuracy continues growing. Generalization power continues growing.

Drago Anguelov (21:18): So, to me, I believe these learnings transfer to our domain. We have been collecting thousands of years of driving data, with now different platforms, including trucks and passenger cars, in many urban scenarios. And in some sense, I presented as many urban scenarios and diversity of scenes and conditions as a challenge to be solved, but also it's an opportunity, because now with these large model trends handling such diverse datasets may become also strengths, because models improve being fed data from potentially different environments and domains.

Drago Anguelov (21:59): And we're already leveraging some of these synergies in scale at Waymo now. So, for example, compared to driving on surface streets in cities, truck driving is less eventful. We do it mostly on the highway. Now, we have this rich Phoenix and San Francisco data though, and so if we leverage a lot of it for behavior prediction models, we find that we can get significant gains by mixing car and truck data. And our current, say, behavior prediction dataset for trucks contains 85% of car data, in a beneficial way. So, that's one example of synergies across domains.
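One simple way to build a mixed training set at a fixed ratio, such as the 85% car data mentioned above, is to sample from each source in proportion. This is only a sketch: the function name `mix_datasets`, sampling with replacement, and the placeholder example tuples are assumptions for illustration, and the talk does not describe how the mixing is actually done.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (source label, example id); a stand-in for real data


def mix_datasets(car: List[Example], truck: List[Example],
                 car_fraction: float, total: int, seed: int = 0) -> List[Example]:
    """Sample `total` examples so that about `car_fraction` of them come from car data."""
    rng = random.Random(seed)
    n_car = round(total * car_fraction)
    n_truck = total - n_car
    # Sampling with replacement lets a small truck set still fill its share.
    mixed = rng.choices(car, k=n_car) + rng.choices(truck, k=n_truck)
    rng.shuffle(mixed)
    return mixed


car_examples = [("car", i) for i in range(5000)]
truck_examples = [("truck", i) for i in range(300)]

training_set = mix_datasets(car_examples, truck_examples, car_fraction=0.85, total=1000)
car_count = sum(1 for source, _ in training_set if source == "car")
print(car_count, len(training_set) - car_count)  # 850 150
```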

Drago Anguelov (22:44): And I think there's also a trend in our domain, going from maybe dozens of smaller, specialist models, each maybe trained for a specific task with its own ground truth, to fewer, larger models that are trained to produce multiple outputs, ultimately benefiting from more supervision from these different outputs.

Drago Anguelov (23:07): And so, at the more general level, if you want to summarize what the recent trends are, better model architectures help. Specifically, model architectures that contain transformer layers do particularly well. And within a given model architecture family, quality is a function of scaled up models, but also scaled up datasets. So, you can't just do one without the other. That does not quite work. So, you need to work both sides of the equation, right?

Drago Anguelov (23:36): And so, we have our fleet. It's very sizable, with all these rich sensors. That means it's a lot of data. And of course, we also have been doing research to even further augment what we can do with this data, and I will highlight a few examples of research that we have published.

Drago Anguelov (23:57): So, one thing that you will see here in this video is a scene from our Waymo Open Motion Dataset. This scene was auto-labeled by an algorithm, a model, that we published at the CVPR conference this summer. That model is trained and run offboard. So, it's not limited by latency constraints on the vehicle. And offboard, it also benefits from seeing the future. Right? Because we captured the full scene. We drove through all of it. Now for every point in time, you can look both backwards and forwards into the future, and estimate a lot more accurately the shape and trajectory of the vehicles and so on.

Drago Anguelov (24:39): And I think this clip, and yes, you can check out the dataset itself, shows the kind of quality that can be achieved. And the significance of this is that now we can use methods like this to go label a lot of data automatically, and then train our onboard model on that data, making it a lot more powerful as a result.

Drago Anguelov (25:04): Another method that is very popular for augmenting your data is, of course, applying various input perturbations. And here, I've just illustrated some of the perturbations you can apply to objects in the lidar point clouds. So, you can scale, drop out points, cut out pieces and so on.
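The three perturbations named here (scaling, dropping out points, cutting out pieces) can be sketched on a tiny point cloud. This is an illustrative sketch only: the function names, scaling about the origin rather than an object center, and the axis-aligned cut-out box are simplifying assumptions, not the augmentations used in the published work.

```python
import random
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def scale_points(points: List[Point3], factor: float) -> List[Point3]:
    """Scale every point about the origin (a simplification of per-object scaling)."""
    return [(x * factor, y * factor, z * factor) for x, y, z in points]


def drop_points(points: List[Point3], keep_prob: float, rng: random.Random) -> List[Point3]:
    """Randomly keep each point with probability keep_prob."""
    return [p for p in points if rng.random() < keep_prob]


def cut_out(points: List[Point3], lo: Point3, hi: Point3) -> List[Point3]:
    """Remove every point that lies inside the axis-aligned box [lo, hi]."""
    def inside(p: Point3) -> bool:
        return all(l <= c <= h for c, l, h in zip(p, lo, hi))
    return [p for p in points if not inside(p)]


cloud = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (2.0, 1.0, 0.4), (3.0, 1.5, 0.6)]

scaled = scale_points(cloud, 2.0)
thinned = drop_points(cloud, keep_prob=0.5, rng=random.Random(0))
holed = cut_out(cloud, lo=(0.5, 0.0, 0.0), hi=(2.5, 2.0, 1.0))
print(holed)  # [(0.0, 0.0, 0.0), (3.0, 1.5, 0.6)]
```

Each function has at least one tunable parameter (`factor`, `keep_prob`, the box corners), which is why the number of settings to choose grows quickly once several perturbations are combined.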

Drago Anguelov (25:26): Now, that's all great, but you want to... These perturbations have a certain set of parameters. It's quite a large set of parameters. It's not that easy to guess them, so ideally we want to automate this process, and we had a paper last year where we showed how to search through the space of maybe a dozen or several dozen parameters for these augmentations, and achieve significantly higher quality and generalization for the models that leverage this technique on labeled data.

Drago Anguelov (26:00): And this year, we expanded it to do the same kind of techniques in a beneficial manner also on unlabeled data, where we automatically label the data and perturb it, to eke out maximum gains. And with techniques like this, you can potentially decrease your data needs, here from black to orange is the example, maybe by an order of magnitude at least, right? So, that's a nice technique to have.

Drago Anguelov (26:31): Another technique that is very popular and relevant in our domain is self supervision. So, our models and our perception system perform multiple tasks, that all contribute to a holistic understanding of the scene. In a paper jointly with Google Brain this year called Taskology, we explored some of the beneficial aspects of this rich structure in the scene. It's a spatial temporal structure. We showed how to train potentially asynchronously in a very scalable way, several different models that accomplish different related tasks.

Drago Anguelov (27:07): And the outputs of these tasks are related by some geometric and temporal constraints. And so, whenever those constraints are violated, we can push gradient through the models, and improve them to be more consistent in ways that we expect. And so, that's quite effective too.

Drago Anguelov (27:24): I'll give you an example of what exactly I mean. So, take the computer vision scenario. If you have two images, and you predict with a model how you move between them, the depth in the first image, which objects move and how they move, you essentially can reconstruct the second image given your predictions. And then you can check if in the second image the pixels look like you would expect. And if they don't, you can push gradient to tell the model, no, you predicted something off. Let's try to adjust only the prediction stack to predict the next image. And that's a powerful supervision signal. It helps train a lot of outputs, such as 3D flow and so on.

Drago Anguelov (28:01): Similarly in lidar, a very similar story unfolds. So, if you detect objects in a frame, and give their bounding boxes and velocity, you expect the detections in the second frame to be consistent with the predictions from the first frame. And based on this, you can push gradient, and again improve your models. So, those are good techniques to have as well.
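The lidar consistency check can be written as a small loss: move the box center from the first frame forward by its predicted velocity, and penalize the distance to the center detected in the second frame. This is a minimal sketch assuming 2D box centers, constant velocity between frames and a squared-error penalty; the function names are hypothetical, and in real training the gradient of this loss would flow back into the detection and velocity models.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def propagate(center: Vec2, velocity: Vec2, dt: float) -> Vec2:
    """Predict where a box center should be after dt seconds at constant velocity."""
    return (center[0] + velocity[0] * dt, center[1] + velocity[1] * dt)


def consistency_loss(center_t: Vec2, velocity_t: Vec2, center_next: Vec2, dt: float) -> float:
    """Squared distance between the propagated center and the next-frame detection."""
    predicted = propagate(center_t, velocity_t, dt)
    return sum((p - q) ** 2 for p, q in zip(predicted, center_next))


# Frame 1: box at (12, 2) moving at 4 m/s along x. Frame 2 arrives 0.5 s later.
consistent = consistency_loss((12.0, 2.0), (4.0, 0.0), (14.0, 2.0), dt=0.5)
inconsistent = consistency_loss((12.0, 2.0), (4.0, 0.0), (15.0, 2.0), dt=0.5)
print(consistent, inconsistent)  # 0.0 1.0
```

A loss of zero means the two frames agree; a positive loss is the signal that would be used to adjust the models, with no human label needed.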

Drago Anguelov (28:26): And last but not least, I want to highlight a recent work of ours that will appear... It's actually on arXiv now, but it will appear at the ICCV conference later this year. And it's called Semantic Point Generation, and it's this ingenious idea to predict an auxiliary output in the model that can then be leveraged to improve 3D detection quality and robustness.

Drago Anguelov (28:52): And the way this works is, we train the model first to predict these red semantic points from the black real points. And the semantic points are meant to cover the full shape of the object. And then we take this whole augmented point cloud, the real and the augmented points, and from it we predict the 3D box in the model.

Drago Anguelov (29:13): And so, what this does, it homogenizes a lot the input to the box predictor. It helps you handle different detection cases, such as long range where there's not many points, because you then imagine a set of points a certain way. Or in rain, where the lidar patterns are different, it homogenizes the point patterns inside the model.

Drago Anguelov (29:34): And it also allows a more ingenious way of data augmentation, because now you can start transforming this augmented semantic point cloud, as opposed to just the visible points. And so, this method currently tops the KITTI benchmark. This is maybe still the most popular benchmark. Not sure. But certainly a good one to run on, and we're first on the hard and medium examples on KITTI. And this method also tops our own Waymo Open Dataset domain adaptation benchmark, which is the challenge of, okay, can you train on San Francisco and Phoenix in clear weather, and then predict in Kirkland in rainy weather? This type of model significantly improves the state of the art in that task.

Drago Anguelov (30:25): So, so far, I discussed approaches that work very well for the parts of the system that are trained in a more supervised or semi-supervised manner. These are common tasks for a lot of domains, like perception. However, in the autonomous vehicle domain, our ultimate task is to train an AI agent. And to fully leverage the power of machine learning there, it is key to be able to have good evaluation of this full driving agent behavior, which we can do ideally efficiently and without much human supervision. And that would unlock machine learning.

Drago Anguelov (31:03): And so, a key direction to keep growing the impact of machine learning is to emphasize scalable full-stack testing. Right? And how do you do scalable full-stack testing? There's actually a very rich set of techniques. At Waymo, we published our safety framework in October of 2020, the first company to do so. So, if you want to learn more about testing, I refer you to that publication. However, let me recap some of the main options.

Drago Anguelov (31:34): So, one of them is, we can test our software on public roads under the supervision of a trained human operator. And so, that really tests the software wholesale. You can check how it really interacts with other road users, and you can gain experience in the world. However, if you change the system, even if you continue driving, it can take a while for your updated system to really see some of the rarer situations. So, this critical and important way of testing only scales up so much. Right?

Drago Anguelov (32:07): And one thing that we can do to help with very rare scenarios is we can stage them ourselves. And we do so at our former Air Force base, nicknamed Castle. So, we have studied typical rare scenarios from the literature that one needs to be concerned about and staged them. And then we can check how our driver responds in situations staged at Castle.

Drago Anguelov (32:34): However, right, those things are also great, but it's also quite effort intensive, and requires a lot of thought. How to set up these scenarios? That leads us to simulation, right? And simulation at Waymo is used extensively for validating the performance of new releases at scale. Simulated scenarios do not really expire too, and so you can use them to evaluate successive versions of the stack. And of course, simulation can provide the important supervision signal for machine learning algorithms.

Drago Anguelov (33:05): Now, before we continue talking about simulation, maybe one way to look at it is to say, "Well, what do recent DeepMind AI successes have in common?" Apart from the fact that a lot of really smart DeepMind AI people do great things? Right? Well, various systems of theirs benefit from learning how to maximize rewards at scale by running millions or billions of simulations for the task at hand, and achieve really great results doing it.

Drago Anguelov (33:38): Now, the question for us is, right, well, where is the perfect self driving simulator? First we need to build that simulator, and then we need to use it to define and solve the autonomous driving stack. Right? So, that's great.

Drago Anguelov (33:56): Right? So, what do we want to build? We want to build Simulation City. And this is a picture from the SimCity game, for those that don't recognize it, as a symbolic representation of Simulation City. Right? And the purpose of this city will be to train autonomous vehicles so that they can learn much faster from simulation than from real world driving.

Drago Anguelov (34:18): And you want this city to be built on the data that you've collected from your 20 million plus real world miles driven. It'll benefit from the experience. You want to do, if possible, full trips in all kinds of environments and traffic. You want, ideally, to take all kinds of scenarios and situations encountered in the real world, and replay each real world scenario up to 10 or 100 X different ways with agents, right? So, this project is ongoing. At Waymo, we're exploring Simulation City.

Drago Anguelov (34:49): One key prerequisite for this is simulation realism. Simulation realism is important, because you want the outcomes in the simulator and in the real world to be highly correlated. The more there is a discrepancy, the more you need to reconcile this gap with human expert inspection and judgment. So, the more you can minimize it, the more you can just purely rely on simulator driving.
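
One way to make "highly correlated" concrete is to compute a correlation coefficient between the same outcome measured in logged drives and in their simulated counterparts. The following is a minimal sketch under stated assumptions, not Waymo's method: the outcome metric (minimum gap to the nearest agent) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-scenario outcome: minimum gap (meters) to the nearest agent,
# measured once in the real drive and once in the simulated re-run.
real_min_gap = [2.1, 0.8, 3.5, 1.2, 4.0]
sim_min_gap = [2.0, 1.0, 3.3, 1.1, 4.2]

r = pearson(real_min_gap, sim_min_gap)
print(f"sim-to-real correlation: {r:.3f}")
```

A value close to 1 would suggest the simulator ranks scenarios the way the real world does; a lower value points at scenarios that still need human inspection.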

Drago Anguelov (35:20): And so, perception realism... So, simulation realism has several aspects, and one is sensor or data or perception realism. So, first we need to... If we drive in this simulator, we need to be able to simulate the vehicle sensors, or produce the outputs of the perception stack. I mean, ultimately it depends how you design your system. But one of the two.

Drago Anguelov (35:43): And Waymo has a key advantage in this space, because we have powerful 3D sensors. We can reconstruct the environment with really high fidelity. The background of this image is an example of a Waymo centimeter accurate map reconstructed with our lidars. And of course, we also have really powerful perception and auto labeling algorithms to reconstruct very accurately the movements of the dozens or hundreds of agents, as I showed you in the scene.

Drago Anguelov (36:12): And so, based on these environments, you can run simulation. And of course, we do a lot of simulation of all our sensors, lidar included. I will show camera simulation, because I think people can relate and have the most intuitions about it. So, here is a scene we drove in San Francisco. And we have a reconstruction of it. On the right, half of the image is a simulated camera output. And so, if you look closer, the shadows for the building and the light poles, right, will start aligning, because one is reconstructed. The scene is reconstructed based on what we detected.

Drago Anguelov (36:47): The vehicles are placed where we detected vehicles. Poles and lights are placed in the locations where our perception system detected them, right? And the buildings were leveraged from Google Geo, who have gone, with efforts like Street View, collected data, and reconstructed nice models of the buildings in the scene. Right?

Drago Anguelov (37:08): And so, then we can continue playing the same scenarios in different conditions. So, you can play the scenario in the fog or in the rain, or you can keep changing the time of day. And that allows you to multiply your learnings, and experience from these scenes.

Drago Anguelov (37:26): So, that's one simulator that was based on more traditional area reconstruction type of techniques. But in this next video, I want to show you some early results that we are getting in collaboration with Google Research on building a large scale simulation environment that is generated by a trained machine learning model. Right? And so, this is a learned model showing the simulation environment.

Drago Anguelov (37:54): And here, we gave a trajectory that is very unrealistic, that no vehicle can drive, just to prove that this is a simulated environment. Because we fly through vehicles and parking lots as we go. But this kind of tells you about the fidelity that is possible by using machine learning to build the simulator. And so, I'm very excited about this direction.

Drago Anguelov (38:19): And in addition to reconstructing the scene, you can also reconstruct various vehicles that you observe in the scene. And these are some I'm showing with fairly complex shapes that we can then pan around, and show you quite realistically how they change. And so, this is still a bit early work, and there is room for improvement here. But I think it's quite promising.

Drago Anguelov (38:50): Let's move from perception realism and sensor realism to behavior realism, which actually potentially is even more important. And this is my classic example that I show to motivate the need for behavior realism. So, this is a scenario we captured at some point, when the Waymo vehicle was the green box, and we drove through the scene and recorded how a set of objects behaved.

Drago Anguelov (39:14): And now we start driving in this simulator, and do something different. And we end up being the cyan box. And so, if we play the other agents the way we collected them, they start, well, pretty much smashing into us, because I mean, in the new environment, they're completely unaware that we are now in this new position, right? And what is needed is for these agents themselves to react to the things we do. The moment we deviate a lot from what we did when we initially captured this scenario, we need a simulator where the agents react to our behaviors in a realistic way. Right?
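
The failure mode described here can be shown with a toy one-dimensional example. This is a minimal sketch under stated assumptions, not Waymo's simulator: a single follower drives behind the ego in one lane, the "log" is a constant 10 m/s, and the reactive rule (keep a 2 m buffer) is a hypothetical stand-in for a learned agent model.

```python
DT = 0.5          # seconds per simulation step
CAR_LENGTH = 5.0  # meters, same for both vehicles
LOGGED_SPEED = 10.0  # m/s, speed of both vehicles in the original log


def logged_follower_position(step):
    """Position the follower had in the log: started at 0 m, drove 10 m/s."""
    return LOGGED_SPEED * DT * step


def rollout(ego_speed, reactive, steps=20):
    """Simulate and return the minimum bumper-to-bumper gap in meters.

    A negative gap means the follower has driven into the ego vehicle.
    """
    ego_pos = 20.0      # ego starts 20 m ahead of the follower
    follower_pos = 0.0
    min_gap = ego_pos - follower_pos - CAR_LENGTH
    for step in range(1, steps + 1):
        ego_pos += ego_speed * DT
        if reactive:
            # Hypothetical toy rule: drive at the logged speed, but never
            # close to within a 2 m buffer of the ego vehicle.
            gap = ego_pos - follower_pos - CAR_LENGTH
            speed = min(LOGGED_SPEED, max(0.0, (gap - 2.0) / DT))
            follower_pos += speed * DT
        else:
            # Log replay: ignore the ego and follow the recorded positions.
            follower_pos = logged_follower_position(step)
        min_gap = min(min_gap, ego_pos - follower_pos - CAR_LENGTH)
    return min_gap


# Ego drives as in the log: replay is harmless.
print(rollout(ego_speed=10.0, reactive=False))
# Ego deviates (drives slowly): the replayed follower drives through it.
print(rollout(ego_speed=2.0, reactive=False))
# Same deviation with a reactive follower: the gap stays non-negative.
print(rollout(ego_speed=2.0, reactive=True))
```

Replay is only valid while the ego stays close to its logged behavior; once it deviates, the other agents have to respond to where it actually is.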

Drago Anguelov (39:54): And we have been building agent models at Waymo even from before I joined it. So, three years now. We have a variety of different simulated agents, based even on different models. Here I decided to show you a couple of our recent fully machine learnt agents. So, these eight green objects are all embodied by agent models. And you can see the vehicle here decides to lurch in... well, lets the first two vehicles pass, and then it decides to merge. And it starts going for this lane, but they come to an interesting situation, which is kind of the type of thing you want to test.

Drago Anguelov (40:42): On the bottom is the same starting condition, but this time the vehicle just merges really quickly, and a completely different scenario ensues, right? And I wanted to illustrate with this the power of agents to multiply your learnings. Even a single scenario, you can play 10 to 100 different ways and check what happens with your autonomous driver, how it will handle them.

Drago Anguelov (41:09): And now that you've seen examples of our agents, the question is, okay, well, what is a good objective for behavior realism? Right? And one criterion for realistic behavior is when you observe it, like in the example here, can you tell that it's actually an agent versus the real thing?

Drago Anguelov (41:30): And so here, one of green and blue is the real thing, and one of them is an agent that we have replaced the real thing with. You can try to guess which one is which. Right? And humans are quite good, in a specific instantiation of a scenario, at telling you what is realistic and what is not. However, that also... doesn't quite tell you about the distribution of behaviors. The full distribution.

Drago Anguelov (41:54): And so, for this, you can use distribution matching metrics to measure whether certain aspects of the agent's behavior, such as, for example, how often it runs off road or how often it collides with other objects, are realistic compared to the expectations we have. And it's generally a very complex distribution over a range of behaviors. But you can slice it along verticals that you care about, specifically ones that relate to your safety expectations, for example. Or you can check that the accelerations and the amount of turning and cut-ins an agent does match what you see in real scenarios. These are examples of ways to ground the development of agents in reality.
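
A distribution matching metric of this kind can be sketched as follows. This is an illustration under stated assumptions, not the metric Waymo uses: it compares histograms of accelerations with the Jensen-Shannon divergence and compares off-road event rates directly. The bin layout, the choice of divergence, and all sample values are hypothetical.

```python
import math


def histogram(values, lo, hi, bin_width):
    """Normalized histogram over fixed-width bins; out-of-range values are clipped."""
    if not values:
        raise ValueError("need at least one value")
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / bin_width)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def event_rate(flags):
    """Fraction of rollouts in which an event (e.g. running off road) occurred."""
    return sum(flags) / len(flags)


# Hypothetical accelerations (m/s^2) from logged drivers and from a sim agent.
logged_accel = [-1.2, -0.4, 0.0, 0.3, 0.5, 0.9, 1.1, -0.2, 0.1, 0.6]
agent_accel = [-2.8, -0.5, 0.1, 0.2, 0.4, 2.7, 1.0, -0.1, 0.0, 2.9]

p = histogram(logged_accel, lo=-3.0, hi=3.0, bin_width=1.0)
q = histogram(agent_accel, lo=-3.0, hi=3.0, bin_width=1.0)
accel_divergence = js_divergence(p, q)

# Hypothetical per-rollout flags: did the vehicle leave the road?
logged_offroad = event_rate([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
agent_offroad = event_rate([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])

print(f"acceleration JS divergence: {accel_divergence:.3f}")
print(f"off-road rate, logged vs agent: {logged_offroad:.2f} vs {agent_offroad:.2f}")
```

Each slice (accelerations, off-road rate, collision rate, cut-ins) gets its own comparison, which is what makes it possible to focus on the verticals tied to safety expectations.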

Drago Anguelov (42:37): And planning agents are quite related by design. Typically a planning agent is given a route and tries to execute it. This is an example of a fully learnt planning model that navigates quite successfully. I'll play it again. Narrow environment here they're doing [inaudible 00:42:56] with this agent, and then later our vehicle will stop and let the other agent pass.

Drago Anguelov (43:03): And while for sim agents, imitation and realism is what you want. For planning agents, we need to go beyond imitation level of performance, which presents an interesting opportunity to start combining imitation learning with reinforcement learning, right? And here's an interesting type of agent that I'm showing here, of the kind that we're trying in our simulators.

Drago Anguelov (43:30): So, let's zoom back out and see how some of this discussion fits into the big picture. So, ultimately, when you build AI agents, you can use ML to pull yourself by the bootstraps. And so, the process looks a bit like this. So, we go and collect data in the real world with our platform. We build a simulator with that data. And we also use imitation learning to learn simulated agent models and populate the simulator with such models.

Drago Anguelov (44:07): Then we can train in that simulator, and also leveraging real life experience, an autonomous vehicle model. We can evaluate it. Potentially you can also leverage some humans in the loop. And then once we have AV models we like, we can go out in the real world, collect more data with these models, and close the loop.

Drago Anguelov (44:34): And so, we are actively working on setting up and iterating this virtuous cycle, and I look forward to sharing our ongoing progress on these topics in future events. Thank you very much.
