Machine learning has come a long way since IBM engineer Arthur Samuel coined the term in 1959 on his way to building the first successful game-playing program for checkers. One fairly recent evolutionary step is combining machine learning with DevOps to create a new discipline called “machine learning operations,” or MLOps.
MLOps encompasses the best practices for efficiently deploying and maintaining machine learning models in production. Here's what you need to know about MLOps and how it relates to its cousin, DevOps.
DevOps, for those who have spent more time with AI and ML than with modern production software pipelines, is the marriage of development and operations.
As Damon Edwards, senior director of product at PagerDuty and an early DevOps pioneer, explained, “Development-centric folks tend to come from a mindset where change is the thing that they are paid to accomplish. The business depends on them to respond to changing needs. … Operations folks come from a mindset where change is the enemy. The business depends on them to keep the lights on and deliver the services that make the business money today.”
With the rise of virtual machines (VMs), containers, Kubernetes, and cloud, DevOps bridged this gap between developers and operations professionals. Nowadays, they often work on the same team.
Just as DevOps fostered communication between programmers and operators, MLOps encourages collaboration between DevOps teams and ML-savvy data scientists. And just as DevOps led to automating programming, deployment, and management in production pipelines, MLOps leads to automating the deployment, maintenance, and monitoring of ML models in production.
Deploying MLOps practices and ML models isn’t easy. The ML lifecycle incorporates many complex components, including data ingestion, data preparation, model training, model deployment, model monitoring, and more. These processes require collaboration among teams such as data engineering, data science, and ML engineering.
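To make those lifecycle stages concrete, here is a toy sketch in plain Python. The function names and the tiny threshold-based “model” are invented for illustration; no real MLOps framework looks like this, but the stages map one-to-one onto the list above.

```python
"""Illustrative, hypothetical sketch of the ML lifecycle stages:
ingest -> prepare -> train -> deploy -> monitor."""

def ingest_data():
    # Stand-in for pulling raw (feature, label) records from a source system.
    return [(0.2, 0), (0.8, 1), (0.5, 1), (0.1, 0)]

def prepare_data(rows):
    # Stand-in for cleaning and feature engineering: split features and labels.
    features = [x for x, _ in rows]
    labels = [y for _, y in rows]
    return features, labels

def train_model(features, labels):
    # Toy "training": pick the threshold that best separates the labels.
    best = max(features, key=lambda t: sum(
        int((x >= t) == bool(y)) for x, y in zip(features, labels)))
    return {"threshold": best}

def deploy_model(model):
    # Stand-in for packaging the model behind a prediction endpoint.
    return lambda x: int(x >= model["threshold"])

def monitor(predict, features, labels):
    # Stand-in for production monitoring: track accuracy on incoming data.
    correct = sum(int(predict(x) == y) for x, y in zip(features, labels))
    return correct / len(labels)

rows = ingest_data()
features, labels = prepare_data(rows)
model = train_model(features, labels)
predict = deploy_model(model)
accuracy = monitor(predict, features, labels)
```

The point of MLOps is that each of these hand-offs, which here are just function calls, becomes an automated, monitored step shared across the teams the article names.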
Jean-Christophe Petkovich, vice president and engineering manager at Two Sigma Investments, has offered a high-level summary of how his team approaches MLOps.
This approach will work for almost anyone trying to get their mind around MLOps.
That said, MLOps is far from an exact science today. MLOps development and deployment are still in the early stages of maturity.
It’s time to start learning about MLOps. Today, ML is still seen by many as a domain unto itself, but it won’t stay that way for long. To be successful, ML must become part of the modern software development pipeline.
That, of course, means you’ll need to learn it as well. No matter where you work in ML, you will be affected by it. If most of your work involves acquiring or cleaning data, you won’t need to deal with it as much as others, at least not at first. But as MLOps shifts left, toward the earliest parts of the software development lifecycle, these early stages will become part of the MLOps process as well. In other words, as MLOps evolves, some operational aspects of production/automation will be built earlier into the app development stage.
In the meantime, the closer your work is to the “right”—e.g., training the model, testing the model, improving it, and putting it into a production application—the sooner you’ll need to come to grips with MLOps methodologies.
When MLOps works well, it delivers the same advantages as modern cloud-native software pipelines: speedier delivery and expanded scalability. In other words, MLOps allows data teams to develop models faster, deliver better ML models, and move them into production more quickly. In addition, MLOps ensures that the model keeps doing what it was designed to do.
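Verifying that a model “is doing what it was designed to do” usually means automating a production check. As a hypothetical sketch, one of the simplest such checks compares live accuracy against the training-time baseline and flags the model for retraining when the gap grows too large; the function name and tolerance here are invented for illustration.

```python
"""Toy sketch of an automated model-health check of the kind
MLOps pipelines run in production. Names and thresholds are
hypothetical, not from any specific tool."""

def needs_retraining(train_accuracy, live_accuracy, tolerance=0.05):
    # Flag the model when live performance drifts too far below
    # its training-time baseline.
    return (train_accuracy - live_accuracy) > tolerance

small_dip = needs_retraining(0.92, 0.90)   # within tolerance
large_drop = needs_retraining(0.92, 0.80)  # beyond tolerance
```

In a real pipeline this check would run on a schedule against fresh labeled data, and a positive result would trigger an alert or an automated retraining job rather than a manual review.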
MLOps will also allow teams to scale and manage thousands of models. In short, MLOps will vastly speed up the shift of ML in many enterprises from early-stage adoption into full-scale production.
There are numerous, incompatible MLOps approaches at the moment, although, of course, you’d normally choose just one. At one end of the spectrum, you have MLOps platforms such as Canonical’s Charmed Kubeflow 1.4; at the other are complete soup-to-nuts offerings such as Amazon Web Services’ SageMaker, a fully managed, end-to-end, cloud-based ML platform.
In addition, there are those who take a do-it-yourself approach to MLOps using such open-source programs as Apache Airflow, Kubeflow, MLflow, and TensorFlow with Apache Beam. While these programs can capture data science workflows and pipelines end to end, other commonly used applications, such as Comet, Data Version Control (DVC), and Neptune, contribute individual pieces of MLOps, such as experiment tracking and data versioning.
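One of those pieces, experiment tracking, is easy to illustrate with a toy. The sketch below is emphatically not the API of MLflow, Comet, or Neptune; the `Tracker` class and its methods are invented to show the core idea those tools share, recording each run’s parameters and metrics so models stay reproducible and comparable.

```python
"""Toy, in-memory stand-in for an experiment tracker.
Hypothetical class and method names, invented for illustration."""

class Tracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        # One record per training run: what we tried and how it scored.
        self.runs.append({"params": params, "metrics": metrics})

    def best_run(self, metric):
        # Pick the run with the highest value for a given metric.
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = Tracker()
tracker.log_run({"lr": 0.1}, {"accuracy": 0.88})
tracker.log_run({"lr": 0.01}, {"accuracy": 0.93})
best = tracker.best_run("accuracy")
```

The real tools add what a toy cannot: persistent storage, a UI for comparing runs, and artifact versioning, which is precisely why teams adopt them instead of rolling their own.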
It’s too early to know for sure which MLOps approach will win out and which tools or frameworks will set the standard. Yet it’s clear that the commercial future of AI and ML lies in MLOps adoption. It’s time to start learning and working on your MLOps approach. But stay flexible, so you can respond to changes in direction or technology as the discipline matures.