How to Improve Content Moderation on Inputs to Large ML Models with Synthetic Data

Suggestions for making DALL·E 2 give even better results by extending the training dataset with synthetic data

Harish Kamath

Sam Denton

Google researchers introduced the Transformer in 2017 and showed that it could generate high-quality text from an input sequence, starting with machine translation. In 2020, an overlapping team successfully classified images with a Vision Transformer (ViT). In early 2022, OpenAI released DALL·E 2, demonstrating a model that could reliably generate high-resolution, sophisticated images in a wide variety of learned styles from a simple text prompt. The OpenAI team described three different ways they used data selection to reduce certain harmful content. They used it to:

filter out training data to remove graphic content;

re-weight the remaining dataset to reduce bias; and

use clustering to remove duplicate images to avoid memorization/regurgitation of images.

This is a great general framework for improving content moderation of large ML models. Read on to learn how each of those three steps could be supplemented with synthetic data to improve the results even further.

Filter Out ‘Harmful’ Training Data

DALL·E 2’s developers used an active learning approach to build classifiers that find examples of the image categories they wanted to label, for example, images that are not safe for work (NSFW). They started with small datasets of positive and negative examples (a few hundred of each), but accurate, robust models typically require closer to 1,000 samples of each.

At this stage, procedurally generated synthetic positive and negative examples could make these classifiers considerably more robust before the active learning stage even begins. In an ideal scenario, the classifiers would start out more accurate, and the active learning stage could reach the same results in fewer iterations, making the whole process faster and cheaper.
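The sketch below illustrates that idea under heavy simplifying assumptions: images are stood in for by 2-D feature vectors, the hypothetical `make_synthetic` function plays the role of a procedural generator by jittering a class prototype, and a nearest-centroid rule substitutes for a real classifier. It only shows how a handful of real seed examples per class might be padded with synthetic ones before the first active-learning round.

```python
import random

Vector = tuple[float, float]


def make_synthetic(prototype: Vector, n: int, spread: float,
                   rng: random.Random) -> list[Vector]:
    """Hypothetical procedural generator: jitter a class prototype in feature space."""
    return [(prototype[0] + rng.gauss(0, spread), prototype[1] + rng.gauss(0, spread))
            for _ in range(n)]


def centroid(points: list[Vector]) -> Vector:
    """Mean of a list of 2-D feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def build_seed_set(real_pos, real_neg, pos_proto, neg_proto, n_synth, seed=0):
    """Pad the small real seed sets with synthetic positives and negatives."""
    rng = random.Random(seed)
    pos = list(real_pos) + make_synthetic(pos_proto, n_synth, 0.5, rng)
    neg = list(real_neg) + make_synthetic(neg_proto, n_synth, 0.5, rng)
    return pos, neg


def is_positive(x: Vector, pos_c: Vector, neg_c: Vector) -> bool:
    """Nearest-centroid rule: a stand-in for the real first-round classifier."""
    d_pos = (x[0] - pos_c[0]) ** 2 + (x[1] - pos_c[1]) ** 2
    d_neg = (x[0] - neg_c[0]) ** 2 + (x[1] - neg_c[1]) ** 2
    return d_pos < d_neg


# Usage: three real examples per class, each padded with 200 synthetic ones.
real_pos = [(2.8, 3.1), (3.2, 2.9), (3.0, 3.3)]
real_neg = [(-3.1, -2.7), (-2.9, -3.2), (-3.0, -3.0)]
pos, neg = build_seed_set(real_pos, real_neg, (3.0, 3.0), (-3.0, -3.0), n_synth=200)
pos_c, neg_c = centroid(pos), centroid(neg)
print(len(pos), len(neg), is_positive((2.5, 2.5), pos_c, neg_c))
```

In practice the generator would render images and the classifier would work on learned features; only the seeding step is meant to carry over.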

Furthermore, humans are perhaps better at creatively “defining” categories that aren’t work-suitable than at searching a large database of images for the NSFW needles in the haystack. Active learning lets humans reinforce the algorithm’s signal about what is in bounds and what is out: additional positive and negative examples help the model improve in accuracy, eliminating the need to assess images one at a time across the massive dataset that OpenAI trained DALL·E 2 on. OpenAI’s active learning approach consisted of two main steps:

First, the decision thresholds of the harmful/benign binary classifier were tuned so that recall was close to 100%, at the cost of a high false-positive rate. As a result, OpenAI’s annotation team was mostly labeling (or confirming) truly negative cases. This technique reduced the overall time required to label images, but it did not expand the model’s search space to cover new classes or clusters of harmful images that the classifier had not already captured.
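As a rough sketch of this first step, the function below picks the highest decision threshold that still flags a target fraction of the known positives on a labeled validation set. The function names and toy scores are illustrative assumptions, not OpenAI’s code.

```python
import math


def threshold_for_recall(scores, labels, target_recall=0.99):
    """Highest threshold t such that flagging every score >= t reaches target recall."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not pos_scores:
        raise ValueError("need at least one positive example")
    # Number of known positives that must be flagged to hit the target recall.
    k = max(1, math.ceil(target_recall * len(pos_scores)))
    return pos_scores[k - 1]


def recall(scores, labels, t):
    """Fraction of positives whose score is at or above the threshold."""
    flagged = [s >= t for s, y in zip(scores, labels) if y]
    return sum(flagged) / len(flagged)


def false_positive_rate(scores, labels, t):
    """Fraction of negatives that the threshold also flags for review."""
    flagged = [s >= t for s, y in zip(scores, labels) if not y]
    return sum(flagged) / len(flagged)


# Usage: classifier scores on a labeled validation set (1 = harmful).
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
t = threshold_for_recall(scores, labels, target_recall=1.0)
print(t, recall(scores, labels, t), false_positive_rate(scores, labels, t))
```

In the toy data, catching every positive forces the threshold down to 0.3, which also flags one of the three benign images: the recall-for-false-positives trade described above.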

Second, the team ran many-fold cross-validation (also known as k-fold or n-fold cross-validation) to find positive samples in the existing labeled dataset that the model tended to misclassify as negative. This involved multiple training runs with different train-validation splits. The team then searched their large remaining pool of unlabeled images for nearest neighbors of these samples in a perceptual feature space, and only then assigned human labelers to classify this newly discovered set of images.
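A minimal sketch of this second step follows, assuming images have already been embedded in a feature space. A nearest-centroid classifier stands in for the real harmful/benign model, folds are assigned by index for simplicity, and `nearest_unlabeled` returns the unlabeled images that would be queued for human review. All names here are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train(X, y):
    """Stand-in classifier: one centroid per class in feature space."""
    pos = [x for x, lab in zip(X, y) if lab]
    neg = [x for x, lab in zip(X, y) if not lab]
    return mean(pos), mean(neg)


def predict(model, x):
    """True if x is closer to the positive centroid."""
    c_pos, c_neg = model
    return math.dist(x, c_pos) < math.dist(x, c_neg)


def cross_val_false_negatives(X, y, k=5):
    """Indices of labeled positives misclassified as negative when held out."""
    n = len(X)
    missed = set()
    for fold in range(k):  # one training run per train/validation split
        held_out = set(range(fold, n, k))
        train_idx = [i for i in range(n) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        missed.update(i for i in held_out if y[i] and not predict(model, X[i]))
    return sorted(missed)


def nearest_unlabeled(queries, pool, m=3):
    """Indices of the m nearest unlabeled points to each query, for human review."""
    candidates = set()
    for q in queries:
        ranked = sorted(range(len(pool)), key=lambda j: math.dist(q, pool[j]))
        candidates.update(ranked[:m])
    return sorted(candidates)


# Usage: four typical positives, one atypical positive (index 4), five negatives.
X = [(5, 5), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3), (0.5, 0.2),
     (0, 0), (0.2, -0.1), (-0.3, 0.1), (0.1, 0.3), (-0.2, -0.2)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pool = [(0.6, 0.1), (0.4, 0.5), (5, 5), (-3, -3), (0.55, 0.25)]
missed = cross_val_false_negatives(X, y, k=5)
print(missed, nearest_unlabeled([X[i] for i in missed], pool, m=2))
```

When held out, the atypical positive at index 4 lands closer to the benign centroid, so cross-validation flags it, and its two nearest unlabeled neighbors become the next batch for labelers.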

Even with this cross-validation-driven exploration, however, the clusters identified in feature space might have been ideally suited to synthetic generation: new examples could be generated around each cluster rather than found by searching. This additional step might have reduced how much sexual or violent content human labelers had to search through.
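The following sketch shows one way that could look in feature space, assuming the misclassified positives from cross-validation are available as embeddings. It groups them with a simple greedy distance rule and samples Gaussian points around each group’s mean. The hypothetical `synthesize_around` only stands in for a generator that would produce actual images, and labeling its output positive is itself an assumption.

```python
import math
import random


def group_by_distance(points, radius):
    """Greedy clustering: a point joins the first cluster whose seed is within radius."""
    clusters = []
    for p in points:
        for cluster in clusters:
            if math.dist(p, cluster[0]) <= radius:
                cluster.append(p)
                break
        else:  # no existing cluster was close enough
            clusters.append([p])
    return clusters


def synthesize_around(cluster, n, rng, min_spread=0.05):
    """Hypothetical generator: Gaussian samples centered on the cluster mean."""
    center = [sum(col) / len(cluster) for col in zip(*cluster)]
    spread = max(min_spread, max(math.dist(p, center) for p in cluster))
    return [tuple(c + rng.gauss(0, spread) for c in center) for _ in range(n)]


def synthetic_positives(missed_positives, radius=1.0, per_cluster=50, seed=0):
    """Generate labeled samples for each cluster of hard-to-find positives.

    Assumes points generated near known positives are themselves positive.
    """
    rng = random.Random(seed)
    out = []
    for cluster in group_by_distance(missed_positives, radius):
        out.extend((x, 1) for x in synthesize_around(cluster, per_cluster, rng))
    return out


# Usage: two clusters of positives that cross-validation found hard to classify.
missed = [(0.5, 0.2), (0.6, 0.3), (9.0, 9.0)]
print(len(group_by_distance(missed, 1.0)), len(synthetic_positives(missed)))
```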

Re-Weighting the Remainder of the Dataset to Reduce Unintended Bias

The OpenAI team next assigned a weight to the loss of every image in the training set. To compute it, they estimated the ratio of the likelihood that an image came from the unfiltered dataset to the likelihood that it came from the filtered dataset. A higher ratio implies that the filtered dataset underrepresents the cluster the image belongs to, so the loss for that sample is weighted more heavily. (In their blog post, they mentioned that the filtered dataset contained fewer images of women.)
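One way to estimate the ratio is with a classifier trained to tell filtered from unfiltered images. The sketch below is a hedged illustration, not OpenAI’s implementation: it converts such a classifier’s probability into a density ratio with Bayes’ rule (dividing out the dataset-size prior is a standard step we assume here) and applies it as a per-sample loss weight. Normalizing the weights to average 1 is our own choice to keep the loss scale stable.

```python
def density_ratio_weight(p_unfiltered, n_unfiltered, n_filtered, eps=1e-6):
    """Estimate p(x | unfiltered) / p(x | filtered) from a domain classifier.

    p_unfiltered is the classifier's probability that x came from the
    unfiltered set. Bayes' rule turns the classifier's odds into a density
    ratio once the class prior (the two dataset sizes) is divided out.
    """
    p = min(max(p_unfiltered, eps), 1 - eps)  # avoid division by zero
    odds = p / (1 - p)
    return odds * n_filtered / n_unfiltered


def weighted_mean_loss(losses, weights):
    """Scale each sample's loss by its weight; normalize so weights average to 1."""
    mean_w = sum(weights) / len(weights)
    return sum(loss * w / mean_w for loss, w in zip(losses, weights)) / len(losses)


# Usage: equal-sized datasets, so the prior correction is 1.
probs = [0.5, 0.75, 0.2]  # classifier outputs for three filtered images
weights = [density_ratio_weight(p, 1000, 1000) for p in probs]
print(weights, weighted_mean_loss([1.0, 1.0, 1.0], weights))
```

The image the classifier thinks looks most like unfiltered data (p = 0.75) gets three times the weight of a neutral one, which is the upweighting of underrepresented clusters described above.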

Alternatively, rather than artificially inflating the loss of a specific image, synthetic data could be generated around high-ratio samples to add points to the nearby cluster. We could then convert the ratio into a probability and use it to decide whether to sample one of the k nearest neighbors of this image from a synthetic dataset. This would reduce the risk of overfitting to particular examples with very high loss weights, and it would also reduce the model’s bias by providing more examples of the underrepresented class.
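Here is a minimal sketch of that alternative. Both the mapping from ratio to probability (zero at or below a ratio of 1, approaching 1 as the ratio grows) and the pre-built pool of synthetic feature vectors are assumptions for illustration. Real samples are kept unweighted, and high-ratio samples probabilistically gain a nearby synthetic neighbor instead.

```python
import math
import random


def ratio_to_probability(ratio):
    """Hypothetical mapping: 0 for ratio <= 1, rising toward 1 as the ratio grows."""
    if ratio <= 1.0:
        return 0.0
    return 1.0 - 1.0 / ratio


def k_nearest(query, pool, k):
    """The k synthetic samples closest to query in feature space."""
    return sorted(pool, key=lambda s: math.dist(query, s))[:k]


def augment_with_synthetic(samples, ratios, synthetic_pool, k=5, seed=0):
    """Keep every real sample; for high-ratio ones, maybe add a synthetic neighbor."""
    rng = random.Random(seed)
    augmented = list(samples)
    for x, r in zip(samples, ratios):
        if rng.random() < ratio_to_probability(r):
            augmented.append(rng.choice(k_nearest(x, synthetic_pool, k)))
    return augmented


# Usage: the second image is heavily underrepresented after filtering.
samples = [(0.0, 0.0), (10.0, 10.0)]
ratios = [1.0, 50.0]
pool = [(10.0, 11.0), (9.0, 10.0), (0.0, 1.0), (50.0, 50.0)]
print(augment_with_synthetic(samples, ratios, pool, k=2))
```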

Clustering to Remove Duplicate Images

The next step the OpenAI team took was to deduplicate their dataset. Using random sampling, they examined subsets of the dataset and created clusters based on five parameter sets, then searched those clusters for duplicate images.

However, simply discarding the duplicate images isn’t necessarily ideal. It should be feasible to expand the dataset by replacing duplicates with synthetic equivalents: assess the class of the image, choose a random seed, and procedurally generate a new image of that class. Even if the classification is uncertain, replacing each removed duplicate with a new synthetic image should have two benefits: the resulting dataset would contain only unique images, and it would keep its original size.

Additionally, setting up a generative adversarial network (GAN) to synthesize images that the trained model must classify as in-class or as impostors might further sharpen the boundary between safe and unsafe content.
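The sketch below combines cluster-based deduplication with the proposed replacement step. It assumes images are already embedded, that each of several clusterings is given as a list of centroids (mirroring the multiple parameter sets described above), and that a hypothetical `generate(label, seed)` function returns a new synthetic sample of a class. Only items that share a cluster are compared, which keeps the pairwise search manageable.

```python
import math


def nearest_centroid(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))


def find_duplicates(embeddings, clusterings, threshold):
    """Flag near-duplicates, comparing only items that share a cluster.

    Using several clusterings catches duplicate pairs that one clustering
    happens to split across a cluster boundary.
    """
    dupes = set()
    for centroids in clusterings:
        buckets = {}
        for i, x in enumerate(embeddings):
            buckets.setdefault(nearest_centroid(x, centroids), []).append(i)
        for members in buckets.values():
            kept = []
            for i in members:
                if i in dupes:
                    continue  # already flagged under an earlier clustering
                if any(math.dist(embeddings[i], embeddings[j]) < threshold for j in kept):
                    dupes.add(i)
                else:
                    kept.append(i)
    return dupes


def dedupe_and_replace(embeddings, labels, clusterings, threshold, generate):
    """Swap each duplicate for a synthetic sample of the same class, keeping size."""
    dupes = find_duplicates(embeddings, clusterings, threshold)
    return [(generate(lab, i), lab) if i in dupes else (x, lab)
            for i, (x, lab) in enumerate(zip(embeddings, labels))]


# Usage with a placeholder generator that just encodes its seed.
def fake_generate(label, seed):
    return (-1.0, float(seed))


emb = [(0.0, 0.0), (0.0, 0.01), (5.0, 5.0), (5.0, 5.001), (10.0, 0.0)]
labels = ["safe", "safe", "unsafe", "unsafe", "safe"]
clusterings = [[(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)], [(1.0, 1.0), (8.0, 3.0)]]
print(dedupe_and_replace(emb, labels, clusterings, 0.1, fake_generate))
```

Indices 1 and 3 are near-copies of indices 0 and 2, so they are swapped for synthetic samples of the same class while the dataset keeps all five slots.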

Applicable Takeaways

Looking back, OpenAI’s DALL·E 2 was a breakthrough in highly convincing image synthesis, and perhaps more importantly, it introduced three groundbreaking techniques for improving harmful-content filtering via tweaks, or maybe even “hacks,” to its training data:

Filter out training data to remove graphic content.

Re-weight the remaining dataset to reduce bias.

Use clustering to remove duplicate images to avoid memorization/regurgitation of images.

What’s somewhat surprising about these three innovations is that synthetic data offers an opportunity to improve on all of them. Because large machine learning models can, generally speaking, be steered or even enhanced through the data you feed them, synthetic data can be a relatively cheap way to tune model performance to suit your needs. And particularly for “unsafe” classes, synthetic data reduces how often human annotators must curate, or even look at, these harmful images.
