This year’s Computer Vision and Pattern Recognition Conference (CVPR 2022) came and went, but the research presented there is here to stay, and it sets the stage for the year ahead. One theme that showed up in many papers was “data-centric AI,” meaning AI research that focuses not so much on the model architectures as on the quality of the data used to train those models. While massive strides have been made in improving the performance of deep learning models over the past decade, research into how to improve the quality of the data underlying these models is relatively new.
Much of this research focuses on unique ways to curate, generate, or label data in resource-constrained environments, using techniques such as generative modeling or weak supervision. Because deep neural networks (DNNs) produce staggering improvements on various tasks but require a large amount of labeled data to train, these methods are becoming a priority in AI.
Real-world data—especially edge-case data—is usually unannotated and can be expensive and difficult to gather and label, which makes synthetic data generation an appealing avenue for creating or augmenting the large datasets you need to train DNNs. Read on to learn about four papers, presented at CVPR 2022, that exemplify this new data-centric AI approach and how you can use the insights they bring in your day-to-day engineering and research work.
Autonomous driving is one of the most exciting applications of AI, and the world in which autonomous vehicles operate is complex and ever-changing. Self-driving cars routinely encounter environments that are absent from their training data, such as a foggy country road in the pre-dawn morning or the dimly lit interior of a parking garage. It can be incredibly challenging to deploy cameras in the world to capture every scenario, but techniques such as synthetic data generation can alleviate some of the burdens of manual data collection and curation.
This is the approach that SHIFT takes. By generating synthetic images of eight different driving locations, SHIFT provides a highly accessible and useful dataset for manufacturers that want to train their autonomous vehicles to operate in various environments. The dataset provides a collection of synthetically generated sensor data, including views from an RGB camera set with five different cameras and a LiDAR sensor. It also “supports 13 perception tasks for multi-task driving systems: semantic/instance segmentation, monocular/stereo depth regression, 2D/3D object detection, 2D/3D multiple object tracking (MOT), optical flow estimation, point cloud registration, visual odometry, trajectory forecasting and human pose estimation,” the paper says.
While it’s not the first synthetic driving dataset, SHIFT notably improves on what its predecessors offered by providing continuously varying environmental data (see Figure 1). Previous synthetic datasets captured environmental conditions at discrete points in time—for example, two images of a two-lane highway: one in the morning and one at night.
Figure 1: The SHIFT dataset is designed to capture both discrete domain shifts (e.g., complete changes of scene) and continuous domain shifts (e.g., a gradual transition from day to night within a single video sequence). Source: “SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation”
However, the authors of SHIFT realized that this approach is insufficient for training vehicles that can smoothly adapt to changes in their environment. Day fades gradually into night—there is no on-off switch that immediately extinguishes the sun. Similarly, fog may gradually roll in during early-morning hours to obscure a country road. SHIFT models these continuous phenomena by providing sensor captures of these states at time-varying positions during their onset and cessation, providing autonomous vehicles with a more nuanced understanding of their surrounding world.
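To make the distinction between discrete and continuous shifts concrete, here is a minimal sketch of a per-frame condition schedule. It is not taken from SHIFT or its tooling: the function name `continuous_shift_schedule`, the parameter names `sun_altitude_deg` and `fog_density`, and the use of simple linear interpolation are all assumptions made for illustration.

```python
import numpy as np

def continuous_shift_schedule(num_frames, start, end):
    """Linearly interpolate every condition parameter from `start` to `end`.

    Returns one dict of conditions per frame, so a renderer could be driven
    with slightly different conditions at each time step.
    """
    steps = np.linspace(0.0, 1.0, num_frames)
    return [
        {key: float((1.0 - t) * start[key] + t * end[key]) for key in start}
        for t in steps
    ]

# Hypothetical condition parameters, chosen only for illustration.
day = {"sun_altitude_deg": 60.0, "fog_density": 0.0}
night = {"sun_altitude_deg": -10.0, "fog_density": 0.3}

# A discrete shift offers only the two endpoints.
discrete = [day, night]

# A continuous shift also covers the states in between.
continuous = continuous_shift_schedule(5, day, night)
for frame_index, conditions in enumerate(continuous):
    print(frame_index, conditions)
```

A model trained only on the two endpoints never sees dusk. The continuous schedule exposes it to the intermediate states it will meet on the road.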
While SHIFT provides continuously varying synthetic data for common environmental conditions, manufacturers and researchers alike need to collect data on the most extreme weather conditions that an autonomous vehicle might encounter. A self-driving car that fails to navigate properly during a hurricane evacuation or that skids off the road during a blizzard could cause harm to drivers or pedestrians. Furthermore, it can be difficult to acquire sufficient real-world training data for these rare situations, since severe storms are both unpredictable and relatively infrequent. Synthetic data generation is especially well suited to this task.
Although SHIFT samples data from many sensors, the paper “LiDAR Snowfall Simulation for Robust 3D Object Detection” focuses exclusively on LiDAR data, which provides some of the most robust 3D information available to autonomous vehicles. Because LiDAR uses a pulsed laser to measure the distance from the sensor to surrounding objects, it is vulnerable to environmental factors that scatter or refract the laser beam, in particular rain, snow, and fog. This results in extremely noisy data that often leads to inaccurate estimates of object distances, which can be catastrophic for the vehicle as it navigates its environment.
By modeling snowfall physically, which amounts to treating snowflakes as individual spheres, the researchers simulate how intense snowfall distorts LiDAR measurements, giving LiDAR-based detection models realistic snowy data to train on (see Figure 2). The simulation also accounts for the wetness of roads, another source of disruption for LiDAR systems. When used to train several 3D object detection models, the researchers’ simulated data provided a performance lift of up to 2.1% over the best existing detection model, a significant improvement.
Figure 2: For these three road scenes under heavy snowfall, the four columns to the right of the images show the performance of the model as trained using different data augmentations. The “snow + wet” augmentation is the only method that successfully trains the model to produce correct object detections for each of the three scenes. Source: LiDAR Snowfall Simulation for Robust 3D Object Detection
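The intuition behind simulating snow on LiDAR data can be shown with a deliberately crude, geometry-only sketch. This is not the paper's method: it ignores flake size, beam divergence, received power, and road wetness. It assumes only that snowflakes are scattered along each beam as a Poisson process, so the distance to the first flake is exponentially distributed. The function name `add_snow_to_ranges` and the parameter `flakes_per_meter` are hypothetical.

```python
import numpy as np

def add_snow_to_ranges(ranges, flakes_per_meter, rng):
    """Shorten LiDAR returns whose beam hits a simulated snowflake first.

    ranges: clear-weather distance (meters) measured by each beam.
    flakes_per_meter: assumed average number of flakes a beam crosses per meter.
    Returns the corrupted ranges and a boolean mask of the affected beams.
    """
    ranges = np.asarray(ranges, dtype=float)
    # Distance along each beam to the first snowflake (Poisson-process assumption).
    first_flake = rng.exponential(scale=1.0 / flakes_per_meter, size=ranges.shape)
    # A beam is blocked when a flake sits in front of the real object.
    blocked = first_flake < ranges
    noisy = np.where(blocked, first_flake, ranges)
    return noisy, blocked

rng = np.random.default_rng(seed=0)
clean = np.full(1000, 40.0)  # 1,000 beams that all hit an object 40 m away
noisy, blocked = add_snow_to_ranges(clean, flakes_per_meter=0.01, rng=rng)
print(f"{blocked.mean():.0%} of beams returned early because of snow")
```

Even this toy version shows why distant objects suffer most: the longer the beam travels, the more likely a flake intercepts it before it reaches its target.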
Neural networks are quite sensitive to small perturbations of their inputs, which is why they are vulnerable to adversarial examples and data privacy attacks. One way to improve the robustness and accuracy of deep learning models is to train them using data augmentations in which filters or corruptions are applied to samples in the training set (see Figure 3). This makes the model better at extracting the core features essential to its semantic understanding of the image while leaving it less sensitive to extraneous noise.
Figure 3: Examples of the 3D augmentations and corruptions proposed in the paper. Source: 3D Common Corruptions and Data Augmentation
For computer vision models, the augmentations applied to images have traditionally been 2D in nature: for example, applying image blur, adding Gaussian noise to the pixels, changing the lighting of a scene, or occluding specific objects. This paper takes things a step further by introducing 3D augmentations. The idea is to make computer vision models more robust to the 3D geometry of a scene. A model, for example, should be able to detect the presence of a couch in a living room, regardless of how the camera is oriented relative to the couch, the depth of field, the lighting in the room, and so on.
Specifically, the paper introduces 20 three-dimensional corruptions relating to scene attributes such as depth of field, camera motion, lighting, video, weather, view changes, semantics, and noise. Most of the corruptions require only an RGB camera image and some notion of scene depth to be applied, although a few also require a 3D mesh. For datasets that do not have these attributes, many of the corruptions can still be applied using approximation techniques. This paper points to an interesting direction for 3D robustness research by demonstrating the usefulness of 3D corruptions for model benchmarking and training.
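Fog is a convenient example of a corruption that needs nothing more than an RGB image and a depth map. The sketch below uses the standard atmospheric scattering model, in which the fraction of light that survives the trip to the camera falls off exponentially with depth. It is not the paper's implementation: `add_fog`, `beta` (fog density), and `airlight` (fog brightness) are names and defaults chosen here for illustration.

```python
import numpy as np

def add_fog(rgb, depth, beta=0.1, airlight=0.8):
    """Blend an image toward a uniform fog color according to scene depth.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    depth: float array of shape (H, W), distance from the camera in meters.
    beta: assumed fog density; larger values mean thicker fog.
    airlight: assumed brightness of the fog itself, in [0, 1].
    """
    rgb = np.asarray(rgb, dtype=float)
    depth = np.asarray(depth, dtype=float)
    # Fraction of the original light that reaches the camera at each pixel.
    transmission = np.exp(-beta * depth)[..., None]
    return rgb * transmission + airlight * (1.0 - transmission)

rng = np.random.default_rng(seed=0)
image = rng.random((4, 4, 3))             # stand-in for a real photograph
depth_map = np.linspace(0, 50, 16).reshape(4, 4)
foggy = add_fog(image, depth_map)
print(foggy.shape)
```

Unlike a 2D filter that lightens every pixel equally, this corruption leaves nearby pixels almost untouched and washes out distant ones, which is what makes it consistent with the scene's geometry.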
Many synthetic dataset generation techniques are task-specific. Kubric aims to change this. This scalable Python library interfaces between Blender, the popular open-source 3D modeling tool, and PyBullet, a physics simulation engine, to allow for the rapid generation of random, photorealistic 3D images, videos, and scenes (see Figure 4). By leveraging freely available 3D assets and textures, users of Kubric can generate terabytes of synthetic training data with just a few simple Python commands.
Figure 4: Example scene created and rendered with Kubric, along with some of the automatically generated annotations. Source: Kubric: A Scalable Dataset Generator
Perhaps just as important as the software itself is the suite of benchmark datasets and tasks the paper’s researchers introduced for computer vision models using data generated by the Kubric tool. The four tasks are object discovery from video, optical flow, texture-structure approximation using NeRF (neural radiance fields), and pose estimation. The richly annotated and large-scale datasets associated with each of these tasks give researchers new challenges to tackle, with much more supervised data than is likely to be available from a hand-labeled dataset.
The benchmark tasks show the power of Kubric to rapidly scale data in order to empower the next generation of deep learning models. Kubric’s developers plan to add more advanced capabilities in the future, including the ability to “include volumetric effects like fog or fire, soft-body and cloth simulations, and advanced camera effects such as depth of field and motion blur.” They also hope to incorporate more freely available 3D assets.
Data-centric AI is a rapidly growing subset of machine learning research and promises new ways to harness high-quality data for more accurate and generalizable deep learning models. At CVPR 2022, researchers demonstrated the rapid rate at which this new paradigm is catching on and showcased an incredible diversity of research into this emerging subfield. Synthetic data generation is at the heart of these applications, providing fine-grained control over the training dataset.
Synthetic data generation allows ML practitioners to design datasets that match their application’s unique needs, address problematic edge cases, and scale datasets to train large networks. Recent improvements in generative modeling techniques such as GANs, NeRFs, flow-based models, and diffusion models have made large-scale, photorealistic synthetic image generation possible. These papers, as well as their freely available code and datasets, should be useful to researchers and engineers developing the next generation of deep learning applications.