January 19, 2022

Data Augmentation in Computer Vision: A Guide

Adding more data to an existing dataset improves the performance of computer vision systems.

Mehreen Saeed

Many probabilistic machine learning algorithms and deep learning methods require large amounts of data to learn and build models. In many real-world situations, however, data may be scarce or too expensive to collect. Data augmentation addresses the problem of limited data by supplementing the original, incomplete dataset with additional training examples.

What Is Overfitting in Machine Learning?

In the field of computer vision, many ML models for object recognition and classification require large amounts of data for training. With few or limited examples, these methods tend to overfit the training data. Overfitting refers to a model specializing or adapting itself perfectly to the training examples, including the noise in those examples. Hence, the model loses its generalization capability and performs poorly on unseen data or test examples.

For example, if you train a neural network on only a few images of handwritten digits, it may learn to recognize every digit in the training set correctly. However, it is likely to have a high error rate on images of handwritten digits it has not seen before.
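The same effect is easy to reproduce on a toy problem. The sketch below swaps the digit classifier for a small regression task (an assumption made purely for illustration): it fits polynomials of degree 3 and degree 9 to only 10 noisy samples of a sine wave and compares training and test error. The names `true_fn`, `mse`, and `fit_and_score` are hypothetical helpers, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_fn(x):
    # Underlying signal that a well-generalizing model should recover.
    return np.sin(2 * np.pi * x)


# A tiny training set: 10 noisy samples of the signal.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.shape)

# Unseen test points drawn from the same noisy process.
x_test = rng.uniform(0.0, 1.0, size=200)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.shape)


def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


def fit_and_score(degree):
    # Fit a polynomial of the given degree to the training set only,
    # then report (train error, test error).
    coeffs = np.polyfit(x_train, y_train, degree)
    return mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test)


for degree in (3, 9):
    train_err, test_err = fit_and_score(degree)
    print(f"degree {degree}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

With 10 training points, the degree-9 polynomial passes through every training sample, noise included, so its training error is essentially zero. Its test error, which includes the noise in the test data plus any oscillation between training points, is typically much higher: the model has memorized rather than generalized.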

Why Is Data Augmentation Required?

One way to reduce overfitting is to add more data to the training set and train the model on the larger set. The additional data can be generated synthetically from existing examples, or it can be acquired through other means, such as open-source datasets for the same application. In this article, we’ll focus on artificially generated data.

Experiments have shown that data augmentation can improve the generalization ability of a learning system and can significantly increase its accuracy.

When Is Data Augmentation Required?

Data augmentation is required when the training set has few or limited training examples. Here are a few possible scenarios:

Data is not available. For example, a system learning to classify rare diseases from digital images requires real-life examples from people who have those diseases, but those people might be few in number. Data may also be difficult to obtain due to considerations such as patient privacy.

Data is too expensive to acquire. A typical example is medical imaging, where each high-resolution scan can be costly to obtain.

Annotating and labeling data is expensive. Even in instances where a large number of digital images are available, getting the data annotated at high enough quality can be costly without some automation. Some niche applications may also require expensive, skilled personnel, making annotation more costly.

The ML model requires a lot of data. You might have a training set with thousands of examples, but your ML model may need hundreds of thousands of examples to learn and generalize.

Data is imbalanced. An example is a system that classifies tree species from digital images, where the training set contains very few images of rare species but many images of the more common ones.

What Are the Types of Data Augmentation Techniques in Computer Vision?

Many data augmentation techniques have been applied successfully to problems in computer vision. A useful taxonomy of these methods is given by Connor Shorten and Taghi Khoshgoftaar in their survey on image data augmentation for deep learning. Figure 1 below is simplified and adapted from their paper. It shows some of the more common and important methods, and I’ve added sampling/probabilistic methods to it. You can read their paper for more details.

Figure 1: A taxonomy of data augmentation techniques (simplified and adapted from “A Survey on Image Data Augmentation for Deep Learning,” by Shorten and Khoshgoftaar, in the Journal of Big Data, 2019)

Basic Image Manipulations

Basic image manipulations include simple techniques (see Figure 2) to derive a new image from one or more images. There are many ways to do this, including:

Transformation of original images: The original images are rotated, flipped, scaled, or passed through different filters (e.g., blurring or sharpening) to create new images.

Color transformations: Additional images can also be generated by changing the original image’s colors or altering its intensity and brightness.

Mixing images: A new image can be obtained as a function of two images, for example, a pixel-wise weighted average of them. The resulting images may not look meaningful to humans, but they have been shown to improve the accuracy of recognition systems.

Random erasing: A randomly selected rectangular region of an existing image is filled with black, white, or random pixel values.

Figure 2: Various geometric transformations. Source: Mehreen Saeed
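The basic manipulations above are simple array operations. The sketch below shows one way to write a few of them with NumPy, assuming images are floating-point arrays with values in [0, 1] and shape (height, width) or (height, width, channels). All function names are hypothetical, and the box blur stands in for the more general filtering step; libraries such as torchvision or Albumentations offer tuned versions of these operations.

```python
import numpy as np


def flip_horizontal(img):
    # Mirror the image left to right (axis 1 is the width axis).
    return img[:, ::-1]


def rotate_90(img, k=1):
    # Rotate by k * 90 degrees counterclockwise in the height/width plane.
    return np.rot90(img, k, axes=(0, 1))


def box_blur(img, size=3):
    # Average each pixel with its neighbours in a size x size window,
    # padding the borders by repeating edge pixels.
    pad = size // 2
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad_width, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)


def adjust_brightness(img, delta):
    # Shift all intensities by delta and clip back to the valid [0, 1] range.
    return np.clip(img + delta, 0.0, 1.0)


def mix_images(img_a, img_b, lam=0.5):
    # Pixel-wise weighted average of two images of the same shape.
    return lam * img_a + (1.0 - lam) * img_b


def random_erase(img, h, w, rng):
    # Fill a randomly placed h x w rectangle with random values in [0, 1].
    out = img.copy()
    top = rng.integers(0, img.shape[0] - h + 1)
    left = rng.integers(0, img.shape[1] - w + 1)
    out[top:top + h, left:left + w] = rng.random((h, w) + img.shape[2:])
    return out


rng = np.random.default_rng(42)
img = rng.random((32, 32, 3))
augmented = [
    flip_horizontal(img),
    rotate_90(img),
    box_blur(img),
    adjust_brightness(img, 0.2),
    mix_images(img, rng.random((32, 32, 3)), lam=0.7),
    random_erase(img, 8, 8, rng),
]
```

Each call returns a new array, so one source image yields several training examples. In practice these operations are usually applied randomly on the fly during training rather than precomputed.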

Deep Learning Methods

Deep learning architectures are an extension of neural networks with many layers. Neural networks are ML models inspired by the workings of the human brain. They can be trained to learn a nonlinear function that maps an input image to an output image.

Many deep learning architectures, such as convolutional neural networks (CNNs), implicitly generate new representations of an image within the model itself: the image is convolved with different filters to produce new images, or feature maps, which are then passed to further layers for learning.
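Convolving one image with different filters to produce several new images can be sketched in a few lines of NumPy. In a CNN the filter weights are learned during training; the edge and blur filters below are hand-picked assumptions for illustration, and `conv2d_valid` is a hypothetical helper. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped) and uses no padding.

```python
import numpy as np


def conv2d_valid(img, kernel):
    # Slide the kernel over the image and take a weighted sum at each position.
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


# Hand-picked 3x3 filters; a trained CNN would learn its own.
filters = {
    "vertical_edges": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
    "horizontal_edges": np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float),
    "blur": np.full((3, 3), 1.0 / 9.0),
}

# Toy 6x6 grayscale image: dark left half, bright right half.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# One new "image" (feature map) per filter.
feature_maps = {name: conv2d_valid(img, k) for name, k in filters.items()}
```

Here the vertical-edge filter responds strongly along the boundary between the two halves, the horizontal-edge filter produces all zeros because the image has no horizontal edges, and the blur filter yields a smoothed ramp. Each feature map is a different view of the same input.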

Below are some of the data augmentation techniques that explicitly generate new images and are based on deep learning architectures:

Adversarial training: This approach employs two or more networks with “rival,” or contrasting, objectives. A rival network learns to augment data by creating images that cause the other network to misclassify them.

Generative adversarial networks (GANs): This framework uses two neural networks: a generative network and a discriminative network. The former creates synthetic images, and the latter evaluates them. Many architectures build on the original (“vanilla”) GAN, including CycleGANs, progressively growing GANs, and conditional GANs.

Neural style transfer: A neural network consists of layers of perceptrons, including an input layer and an output layer. The layers in between hold hidden representations of the input images, which often lie in a lower-dimensional space. New images can be created from these representations by passing them through further layers of the network, so the new images are an adapted or modified form of the originals. In neural style transfer, for instance, the content representation of one image is combined with the style (textures and colors) of another, rendering the original image in a new style. Variational autoencoders are a related kind of network that can synthesize new images from given input images.
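The idea of encoding images into a lower-dimensional space, modifying the codes, and decoding them back into images can be illustrated without a deep network. The sketch below uses principal component analysis (PCA) as a linear stand-in for an encoder and decoder. It is not a variational autoencoder or a style-transfer network; it only shows the encode, perturb, decode pattern. The function names and the noise scaling are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_linear_codec(images, n_components):
    # Flatten and center the images, then keep the top principal directions.
    x = images.reshape(len(images), -1)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(images, mean, components):
    # Project each flattened image onto the low-dimensional subspace.
    return (images.reshape(len(images), -1) - mean) @ components.T


def decode(codes, mean, components, shape):
    # Map low-dimensional codes back to pixel space and restore image shape.
    return (codes @ components + mean).reshape((len(codes),) + shape)


def augment_in_latent_space(images, n_components=5, noise_scale=0.1, rng=rng):
    mean, components = fit_linear_codec(images, n_components)
    codes = encode(images, mean, components)
    # Perturb each code with noise proportional to that dimension's spread.
    noise = rng.normal(size=codes.shape) * codes.std(axis=0) * noise_scale
    return decode(codes + noise, mean, components, images.shape[1:])


images = rng.random((20, 8, 8))
new_images = augment_in_latent_space(images)
```

Because the codes keep only a few directions of variation, the decoded images are smoothed versions of the originals with small, structured changes rather than pixel-level noise. A deep autoencoder replaces the linear projection with learned nonlinear layers.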

Methods Based on Sampling or Probability Distributions

Other data augmentation methods include:

Sampling: Simple sampling methods (Figure 3) such as oversampling (reusing minority-class examples) and undersampling (discarding majority-class examples) can be used to rebalance a dataset. Synthetic minority oversampling technique (SMOTE) is a well-known method for synthesizing new data based on sampling. SMOTE generates each new example by interpolating between an existing minority-class example and one of its nearest minority-class neighbors.

Probabilistic methods: Synthetic or new images can also be generated by directly estimating the data distributions of existing images and randomly sampling from those distributions to create new images.

Figure 3: Synthetic face images generated via sampling from Gaussian mixtures. Source: Mehreen Saeed
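Both ideas above can be sketched with NumPy, treating each image as a flattened vector of pixel values (an assumption; in practice SMOTE is often applied to extracted features rather than raw pixels). The `smote` function follows the core SMOTE interpolation step, with a randomly chosen base sample for each new point. `sample_gaussian` fits a single multivariate Gaussian rather than the Gaussian mixture behind Figure 3; a mixture would combine several such components. Both names are hypothetical, and the imbalanced-learn library provides a maintained SMOTE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def smote(minority, n_new, k=5, rng=rng):
    # minority: (n_samples, n_features) array of minority-class vectors, n_samples >= 2.
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    # Pick a base sample and one of its k nearest neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    # Interpolate at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])


def sample_gaussian(images, n_new, rng=rng):
    # Fit one multivariate Gaussian (sample mean and covariance) to the
    # flattened images and draw new samples from it.
    x = images.reshape(len(images), -1)
    mean = x.mean(axis=0)
    centered = x - mean
    # If z ~ N(0, I), then mean + z @ centered / sqrt(n - 1) has covariance
    # centered.T @ centered / (n - 1), without forming that large matrix.
    z = rng.standard_normal((n_new, len(x)))
    samples = mean + z @ centered / np.sqrt(len(x) - 1)
    return samples.reshape((n_new,) + images.shape[1:])
```

Every SMOTE sample lies on a line segment between two real minority examples, so it stays close to the existing data. The Gaussian sampler instead draws from an estimated distribution, which can produce more varied but less realistic images.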

Which Technique Is Right for You?

Data augmentation techniques can improve the accuracy of computer vision models. Using additional images during the training phase adds variety and more features to your existing data, which your model can use to generalize better and overfit less. Interestingly, many augmented images are not meaningful to humans, and it is not fully understood why such images improve the performance of the system.

You can choose a data augmentation technique based on traditional methods of manipulating images, or you can go with a more sophisticated strategy, such as one based on neural networks.

The method you choose should depend on your resources and your application. For example, you can use SMOTE to deal with class imbalance, or use traditional geometric transformations, which are easier to understand, interpret, and generate.

Alternatively, you can use deep learning methods to create a larger dataset when you have a lot of processing power and memory resources available to you.