How Waymo Is Using ML to Build a Scalable, Autonomous ‘Driver’

Machine learning feeds into, and is trained by, a fleet of vehicles that drives a combined 100,000 miles each week.

Johanna Ambrosio

Waymo, the autonomous driving technology company, is essentially building a human on wheels, says Dmitri Dolgov, the company’s co-CEO. The goal: for the company’s automated vehicles to see, reason, and drive like a person, only completely safely. Doing so successfully, however, involves multiple types of machine learning (ML) and a good deal of other technology.

Dolgov explained some of the company’s ML efforts at the recent artificial intelligence (AI) conference Scale TransformX 2021.

The AI and ML available today are very different from what Waymo used when the company first started, Dolgov said. “The capabilities change, and the limitations change,” and it’s been a continuous, almost daily push to improve the capabilities of AI and ML.

The idea is to iterate as quickly as possible on all the technology needed—for improving accuracy, maximizing efficiency, leveraging better-structured representations with sometimes sparse data, and doing other tasks. Here are the highlights from his talk.

In the early days, a lot of Waymo’s ML work involved perception and supervised ML with human-powered labeling. Now there is “a lot more you can do with auto labeling,” Dolgov said, “and other approaches to data management become more relevant.”

Waymo’s approach to self-driving cars is to build a high-capacity, predictive driving model called the Waymo Driver. It can handle both “usual” conditions, such as driving on a highway and navigating around pedestrians on city streets, and rarer situations, called long-tail cases.

The latter include real-world scenarios that Waymo vehicles have experienced: drunken bicyclists weaving in and out of traffic, items falling off trucks, or people walking around in elaborate costumes, some looking decidedly un-human, especially during Halloween. And then there was the so-called bubble truck, a vehicle that drives around making bubbles, and a motorcycle traveling unattended at 70 mph after the rider had fallen off. “You might not recognize it as a motorcycle because it doesn’t have a rider, but you can still see it as a moving object and propagate that throughout the entire stack,” Dolgov said.

The Waymo Driver model is used in Waymo One, the firm’s autonomous ride-hailing service, and Waymo Via, the company’s long-haul delivery service. While long-haul trucking operates in different environments than ride hailing—more highway than local/urban streets—the basic capabilities of the underlying ML and AI models are similar, Dolgov said.

While some situations, such as a person opening a car door and hopping out in front of you, are more likely on a city street than on a highway, he said, you still have to be ready for pretty much anything to happen in any environment.

Gathering Data from Around the Country

Waymo, begun in 2009 as the Google Self-Driving Car Project, was spun off in 2016 as a separate Alphabet subsidiary. Waymo’s mission is to make it safe and easy for people and things to get where they’re going. To this end, Waymo has driven more autonomous miles than any other company in the industry, Dolgov said. Its ride-hailing service, initially deployed only in the Phoenix area, recently expanded in pilot form to San Francisco, where the vehicle fleet drives 100,000 miles each week.

In Chandler, Arizona, people can sign up for the commercial ride-hailing service, and in San Francisco, they can try the Trusted Tester program. All told, Waymo has operated its vehicles in 25 locales around the country, with each city adding to its research trove and data store, and the company is deploying the fifth generation of its Driver in both its cars and its trucks.

“San Francisco is great … when it comes to urban settings, things like dealing with dense crowds and narrow passages. Driving in Michigan in the winter gives you super-useful data in dealing with snow and the diverse weather conditions.” —Dmitri Dolgov

Data augmentation and synthetic data generation play a big role. As more advanced models are used in various parts of the stack—including semantics to help understand scenes as a whole, behavior prediction, imitation learning, and motion planning and prediction—data strategies must change to meet the needs of these emerging model types.

Gathering more data about the “head,” or average use, case has limited value, while expanding the “tail” areas, which have limited or no data, can drastically change how the model performs in certain situations. Waymo has invested heavily in automatically expanding the dataset where it is sparse. “You need to invest in data mining to find interesting examples that are representative of the tail part of the distribution,” Dolgov said.

It requires a lot of work and creativity to balance the head and tail cases effectively. Building a driver-assist system that focuses on the average case and relies on a human to address the long-tail areas is much simpler than building a more robust system that also incorporates the tail cases. At the other extreme, Dolgov said, overemphasizing the tail without paying attention to the head results in an autonomous vehicle that “never leaves the parking lot.”
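One simple way to picture the head-versus-tail trade-off is as a resampling knob. The sketch below is an illustration only, not Waymo’s method: it assumes each training example carries a hypothetical scenario tag and weights examples by the inverse of how common that tag is, with an exponent `alpha` controlling how far the balance shifts toward the tail.

```python
from collections import Counter


def sampling_weights(scenario_tags, alpha=0.5):
    """Return per-example sampling weights proportional to count(tag) ** -alpha.

    alpha = 0 keeps the original, head-heavy distribution;
    alpha = 1 gives every scenario type the same total weight.
    """
    counts = Counter(scenario_tags)
    raw = [counts[tag] ** -alpha for tag in scenario_tags]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical tags: 8 routine examples and 2 rare ones.
tags = ["highway_cruise"] * 8 + ["unattended_motorcycle"] * 2

for alpha in (0.0, 0.5, 1.0):
    weights = sampling_weights(tags, alpha)
    tail_share = sum(w for w, t in zip(weights, tags) if t == "unattended_motorcycle")
    print(f"alpha={alpha}: tail share of samples = {tail_share:.2f}")
```

With these made-up counts, the rare scenario’s share of sampled examples rises from 0.20 to about 0.33 to 0.50 as `alpha` goes from 0 to 1. Pushing the knob all the way toward the tail is the toy analogue of the car that “never leaves the parking lot.”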

Pulling Together the Whole Package

A fully autonomous driver, at a high level, must have knowledge about its current state, current and future locations, and goals. It must be able to sense or see its surroundings. And it should be able to understand the intent of and interactions among other actors—drivers, pedestrians, animals, etc.—with which it shares the road and use this knowledge to predict what these actors will do in the short- and longer-term future. Finally, the driver must operate the vehicle and actually drive.
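The sense, predict, and act steps described above can be sketched as a toy loop. Everything here is an assumption made for illustration: the one-dimensional road, the constant-velocity prediction, and the names `Actor`, `predict`, and `plan_speed` are hypothetical and stand in for far richer learned models.

```python
from dataclasses import dataclass


@dataclass
class Actor:
    x: float   # position along the road, in metres
    vx: float  # speed along the road, in metres per second


def predict(actor, horizon_s):
    """Constant-velocity guess at an actor's position after horizon_s seconds."""
    return actor.x + actor.vx * horizon_s


def plan_speed(ego, others, horizon_s=2.0, min_gap_m=10.0, cruise=15.0):
    """Choose a target speed: cruise unless a predicted gap ahead gets too small."""
    ego_future = ego.x + cruise * horizon_s
    target = cruise
    for other in others:
        if other.x <= ego.x:
            continue  # ignore actors behind the ego vehicle
        gap = predict(other, horizon_s) - ego_future
        if gap < min_gap_m:
            # Fall back to the slower actor's speed (never below zero).
            target = min(target, max(other.vx, 0.0))
    return target


ego = Actor(x=0.0, vx=15.0)
slow_lead = Actor(x=20.0, vx=5.0)
far_lead = Actor(x=100.0, vx=15.0)

print(plan_speed(ego, [slow_lead]))  # 5.0: the predicted gap closes, so slow down
print(plan_speed(ego, [far_lead]))   # 15.0: plenty of room, keep cruising
```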

There are three main areas of technology needed to create this type of autonomous system, Dolgov said:

The self-driving hardware, including the lidar and other sensor platforms, radar system, compute hardware, etc.
The AI itself, the model that processes the data from the sensors and makes driving decisions.
Off-board infrastructure for effective, large-scale model training, deployment, and evaluation of the neural-network models and the system as a whole.

Waymo has invested in all three areas, creating its own hardware suite with three sensing modalities, infrastructure for testing and simulation, and fast, real-time inference using AI models for perception and scene prediction with significant ML capabilities.

By developing pretty much every major component itself, Waymo gains the benefits of integration, better performance, and “very positive feedback loops” among domains, Dolgov said. There is a cost to specialization, he acknowledged, but Waymo’s strategy is to “unify and simplify” when it comes to technology development.

The idea is to create the “proper way of solving the fundamentals of a problem without building a lot of fragmented little pieces.” It’s what will allow Waymo to pursue additional vehicle platforms and additional scaling to new environments and new cities, he said.

To deploy the autonomous system as a full-scale commercial product, Waymo created a large framework for evaluation and deployment, allowing researchers and engineers to focus on advancing the technology instead of worrying about each individual release cycle.

Further, “we’ve learned to operate this fleet of fully autonomous vehicles 24/7. Carrying that technology all the way to a fully launched commercial service has been incredibly difficult,” Dolgov said.

Perception and Prediction Go Hand in Hand

“Our strategy has always been to leverage state-of-the-art ML in all parts of our stack, from basic perception to semantic understanding to behavior prediction to planning,” Dolgov said. Evaluation is also key: “If you can’t evaluate, there’s no meaningful optimization strategy that you can pursue.”

The requirement to mimic human behavior and decision making creates even more complexity. In perception, one must identify objects. While this is technically complicated, fundamentally it is just identifying things such as pedestrians, cars, etc. However, the more intricate tasks require scene-level understanding and context. Recent advancements in ML have helped tackle these problems.

“You employ all of these to make your big hammer as big as possible, enough to take as much of a bite of the distribution as possible” at both the head and tail ends, Dolgov said.

Models that can represent the entire scene use efficient, well-structured representations of heterogeneous features in the scene, based on appearance, structure of the world, and so on. Waymo has published its work on hierarchical neural networks that model both static and dynamic parts of the environment, and it released its Waymo Open Dataset in August 2019.

The Open Dataset is a perception dataset comprising high-resolution sensor data and labels for 1,950 segments. It now also includes a motion dataset with object trajectories that correspond to 3D maps for 103,354 segments. The goal is to help the research community advance machine perception and autonomous driving technology.

Waymo found it beneficial to mix and match ML techniques. Doing imitation learning in an environment as complex as driving is “very powerful, but it’s not enough. You need to augment it with something,” Dolgov said. This includes injecting some bias into the system through structured representations and leveraging the simulator to allow you to explore parts of the space you can’t understand well from human examples or by human teaching.

With the computational power now available, it is possible to build more powerful models that don’t rely on human labels, and Waymo can leverage “various tricks,” such as going back and forth in time, Dolgov said.
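Dolgov did not spell out those tricks, so the following is only one plausible reading of “going back and forth in time,” not a description of Waymo’s system. Because labeling happens offline, a labeler can look at an object’s whole track: if the object is recognized confidently at any moment, that label can be carried to the earlier and later frames where it was uncertain. The function `relabel_track` and its `(label, confidence)` input format are hypothetical.

```python
def relabel_track(frames, min_conf=0.9):
    """Propagate the most confident label along one tracked object.

    frames: list of (label, confidence) pairs in time order.
    Low-confidence frames borrow the label of the most confident frame
    anywhere on the track, whether it lies in the past or the future.
    """
    best_label, best_conf = max(frames, key=lambda frame: frame[1])
    if best_conf < min_conf:
        # Nothing on this track is trustworthy enough to propagate.
        return [label for label, _ in frames]
    return [label if conf >= min_conf else best_label for label, conf in frames]


# A distant object that only becomes recognizable as it gets closer.
track = [("unknown", 0.30), ("unknown", 0.40), ("cyclist", 0.95), ("cyclist", 0.97)]
print(relabel_track(track))  # ['cyclist', 'cyclist', 'cyclist', 'cyclist']
```

An onboard model cannot do this in real time, since it never sees the future, but the cleaned-up labels can still be used to train it.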

“It’s been incredibly exciting to be on that bleeding edge of technology in all these areas and be pushing it forward.”

Learn More

For more details about Waymo’s ML strategy, watch Dolgov’s talk from TransformX, “Data-Driven AI for Autonomous Vehicles,” or read the full transcript.

Image credit: Dllu, CC BY-SA 4.0, via Wikimedia Commons