Machine learning has become a common part of data-driven technology across a wide variety of industries. Production-level ML pipelines, however, can be incredibly complex. Production models can involve complex interactions such as model chaining, where one model is used to generate data for another model. Additionally, these models often require periodic retraining and deployment to incorporate new data.
Because these models require huge amounts of data, organizations can save large amounts of time and money by optimizing their pipelines and eliminating unnecessary computation. The complexity of these pipelines, however, can make pipeline optimization similarly complicated. To better understand production pipelines, researchers at UC Berkeley and Google analyzed a collection of 3,000 ML pipelines at Google consisting of over 450,000 trained models. They examined common characteristics across these pipelines and explored how these pipelines could be adjusted to optimize model performance.
Here’s what their study revealed about popular ML pipelines and how you can use the results to improve your own pipelines.
The study began by examining high-level characteristics of ML pipelines across Google, including pipeline complexity, model architecture, and resource consumption. The researchers found that around 60% of the models at Google were deep neural networks, while the remaining 40% were other types of ML models. This suggests that there is often value in using simpler model architectures; deep neural networks are not the best solution for every problem. Organizations should establish pipelines that can handle diverse model types.
When examining resource consumption, the researchers found that training accounted for only 20% of total computation time, emphasizing the importance of other aspects of the ML pipeline. A large portion of resources was devoted to ingesting data, transforming data for feature engineering, and validating data distributions. When optimizing a production pipeline, it’s important to consider how these stages of the pipeline can be improved, instead of focusing only on the training process.
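One way to see where your own pipeline spends its time is to compute each stage's share of total compute. The sketch below is a minimal illustration: the stage names and timings are hypothetical, chosen so that training lands at 20%, and are not figures from the study.

```python
def stage_shares(stage_seconds):
    """Return each stage's fraction of total pipeline compute time."""
    total = sum(stage_seconds.values())
    if total <= 0:
        raise ValueError("total compute time must be positive")
    return {stage: secs / total for stage, secs in stage_seconds.items()}


# Hypothetical timings (seconds) for one pipeline run; not data from the study.
example_run = {
    "data_ingestion": 300.0,
    "feature_transform": 350.0,
    "data_validation": 150.0,
    "training": 200.0,
}

shares = stage_shares(example_run)

# Print stages from most to least expensive.
for stage, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:18s} {share:.0%}")
```

If a breakdown like this shows that training is a minority of the cost, the data stages are likely the better place to start optimizing.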
After analyzing the high-level features of ML pipelines, the study explored the process of fine-tuning these models. Model training is an iterative process, with training repetitions playing a vital role in converging on a result. Additionally, complex ML pipelines often involve chains of models, where the output of one model is given as input to the next. These pipelines also need to retrain their models as new data arrives.
Fine-tuning these models can require a great amount of computational power. In their study, the researchers examined the benefit gained from each repetition in the ML pipeline. They examined how much the data distribution changed with each repetition and how frequently the trained models were actually deployed.
To look at this granularly, the study divided each pipeline into subgraphs, or “model graphlets.” Each model graphlet represented a single end-to-end pipeline run for an individual model. When examining the pipelines in the dataset, the study found that, on average, models were updated seven times per day, with over 1% of pipelines updating models more than 100 times a day. The input data for consecutive model updates typically overlapped heavily, but the overall data distribution still varied, underscoring the need to retrain models to keep pace with data drift.
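To make the idea of input overlap between consecutive updates concrete, here is a small sketch that scores it with Jaccard similarity. This metric is an assumption chosen for illustration, not necessarily the measure the researchers used, and the span identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty inputs are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical identifiers for the data spans read by two consecutive updates.
prev_inputs = ["span_1", "span_2", "span_3", "span_4"]
curr_inputs = ["span_2", "span_3", "span_4", "span_5"]

overlap = jaccard(prev_inputs, curr_inputs)
print(f"input overlap: {overlap:.2f}")
```

A sliding window like this one shares three of five distinct spans, so most of the input is reused even though the newest span may shift the overall distribution.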
When examining the role of these graphlets in the deployed models, the study found that a great deal of processing power was going to models that were never deployed. In fact, only one in four training runs resulted in a model deployment; the other three amounted to wasted computation. Identifying and skipping these graphlets before they run could therefore save a substantial amount of resources.
After discovering that unused models consumed a large share of computation, the study examined ways to predict whether a model training run would result in a deployed model. If the runs that would not lead to a deployment could be identified before they executed, they could be skipped in the pipeline.
The research team tackled this problem by developing an ML-based method to predict whether a model would be deployed. Each model was described by features including the average number of inputs and outputs for each execution, its model architecture, the similarity of input data between the current model and preceding models, and a binary indicator denoting whether the training operation was different from the previous model.
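As a rough illustration of how such a predictor might be built, the sketch below trains a tiny logistic regression on two of the listed features. Everything here is an assumption for illustration: the `GraphletFeatures` class, the choice of logistic regression, and the training rows are hypothetical, and the study's actual model and feature encoding may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class GraphletFeatures:
    """Hypothetical per-graphlet record using two of the features the study lists."""
    input_similarity: float  # overlap with the previous graphlet's input data, 0..1
    code_changed: bool       # whether the training operation differs from the previous one
    deployed: bool           # label: did this training run lead to a deployment?

    def vector(self):
        # The leading 1.0 is the bias term.
        return [1.0, self.input_similarity, 1.0 if self.code_changed else 0.0]


def predict_proba(weights, x):
    """Predicted probability of deployment for feature vector x."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, rows):
    """Mean binary cross-entropy over the rows."""
    eps = 1e-12
    total = 0.0
    for r in rows:
        p = predict_proba(weights, r.vector())
        y = 1.0 if r.deployed else 0.0
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total / len(rows)


def train(rows, lr=0.5, epochs=200):
    """Fit weights with full-batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for r in rows:
            x = r.vector()
            err = predict_proba(weights, x) - (1.0 if r.deployed else 0.0)
            for j in range(3):
                grad[j] += err * x[j] / len(rows)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical training history; not data from the study.
history = [
    GraphletFeatures(0.95, False, False),
    GraphletFeatures(0.90, False, False),
    GraphletFeatures(0.85, False, False),
    GraphletFeatures(0.88, False, False),
    GraphletFeatures(0.40, False, True),
    GraphletFeatures(0.92, True, True),
]

weights = train(history)
candidate = GraphletFeatures(0.93, False, False)  # label is unused at prediction time
print(f"P(deployed) = {predict_proba(weights, candidate.vector()):.2f}")
```

A real system would need far more history, the remaining features (input and output counts, model architecture), and a held-out evaluation before its predictions could be trusted to skip work.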
The ML-based approach proved incredibly effective at predicting whether a model would be deployed. It achieved high accuracy and saved up to 50% of wasted computation without compromising model performance. By targeting the removal of wasted resource usage, the team was able to greatly reduce the resources required by ML pipelines.
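The trade-off behind that result can be sketched as a simple skip policy: skip any run whose predicted deployment probability falls below a threshold, then measure how much wasted compute was avoided and how many real deployments were missed. The function name, threshold, and run data below are hypothetical, picked so that one in four runs is deployed; they are not taken from the paper.

```python
def evaluate_skip_policy(runs, threshold):
    """Evaluate skipping runs whose predicted deployment probability is below threshold.

    runs: list of (predicted_probability, compute_cost, was_deployed) tuples.
    Returns (fraction of wasted compute saved, number of deployments missed).
    """
    wasted_total = sum(cost for _, cost, deployed in runs if not deployed)
    wasted_saved = 0.0
    missed = 0
    for prob, cost, deployed in runs:
        if prob < threshold:  # the policy would skip this run
            if deployed:
                missed += 1
            else:
                wasted_saved += cost
    saved_fraction = wasted_saved / wasted_total if wasted_total else 0.0
    return saved_fraction, missed


# Hypothetical runs: (predicted probability, compute cost, was deployed).
runs = [
    (0.10, 10.0, False),
    (0.20, 10.0, False),
    (0.30, 10.0, False),
    (0.60, 10.0, False),
    (0.70, 10.0, False),
    (0.55, 10.0, False),
    (0.90, 10.0, True),
    (0.80, 10.0, True),
]

saved, missed = evaluate_skip_policy(runs, threshold=0.5)
print(f"wasted compute saved: {saved:.0%}, deployments missed: {missed}")
```

Raising the threshold saves more compute but risks skipping runs that would have been deployed, so the threshold is the knob that balances savings against model freshness.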
Because ML production pipelines are composed of a variety of interlocking steps, it can be difficult to parse out methods for performance improvement. However, even complex pipelines can be optimized by leveraging ML technology. This study demonstrates that ML techniques can be used to identify and remove sources of wasted computation, making it possible to improve the efficiency of an ML pipeline without sacrificing performance.
For more information about this study’s analysis of Google’s ML pipelines, as well as the techniques used to identify and remove wasted computation, read the full paper, “Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities.”