The transition from Machine Learning research and experimentation to production deployment of ML models involves many challenges.
These include provisioning and scaling compute power for training, developing the ML application’s API, serving the model, monitoring the health of the ML application, and ensuring high availability. In many production use cases, model training must also be an iterative process, in which models are continuously retrained on ingested user data.
The natural solution to these challenges is the MLOps methodology, which seeks to increase automation and efficiency of ML workflows, from development and testing to deployment.
In this blog, we discuss how Kubeflow automates ML pipelines by leveraging Kubernetes for ML containerization, declarative management of application deployments, automatic disaster recovery, custom APIs and resources, and many other features. We also show how Kubeflow helps orchestrate every step of the ML workflow, from initial experimentation to production deployment, by providing a set of ML tools and enabling easy integration of ML pipelines with Kubernetes.
How Kubernetes Enables MLOps
Kubernetes is a container orchestration platform that facilitates the deployment and management of containerized applications in any environment. More specifically, it allows cluster administrators to define the desired state of their deployments using declarative manifests that configure all aspects of the application’s lifecycle, such as storage, networking, load balancing, health checks, and updates. The Kubernetes control plane then runs control loops that continuously compare the actual state of the cluster with the desired state and correct any drift.
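To make the desired-state idea concrete, here is a minimal sketch of a reconciliation loop in plain Python. It is a toy model rather than Kubernetes code: the `DeploymentState` class and the "start pod" / "stop pod" actions are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentState:
    """Toy stand-in for a workload: only a replica count."""
    replicas: int


def reconcile(desired: DeploymentState, actual: DeploymentState) -> List[str]:
    """Return the actions needed to move `actual` toward `desired`."""
    diff = desired.replicas - actual.replicas
    if diff > 0:
        return ["start pod"] * diff
    if diff < 0:
        return ["stop pod"] * (-diff)
    return []


def control_loop(desired: DeploymentState, actual: DeploymentState,
                 max_iterations: int = 10) -> DeploymentState:
    """Apply reconcile() repeatedly until nothing is left to do."""
    for _ in range(max_iterations):
        actions = reconcile(desired, actual)
        if not actions:
            break  # actual state already matches desired state
        for action in actions:
            actual.replicas += 1 if action == "start pod" else -1
    return actual


final_state = control_loop(DeploymentState(replicas=3), DeploymentState(replicas=1))
print(final_state.replicas)
```

The point of the pattern is that the operator only declares the target (three replicas) and the loop works out the steps, which is the same division of labor that Kubernetes manifests rely on.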
Kubernetes is widely regarded as the platform of choice for implementing DevOps methodologies and is also well-suited for managing ML workflows per the MLOps methodology.
For ML projects, Kubernetes can provide a variety of benefits:
- Portability of containerized deployments, that is, the ability to run on any infrastructure supporting your container runtime
- Automatic management of ML deployments (scaling, rolling updates, health checks)
- Easy integration with cloud networking and storage
- Advanced cluster networking with support for Ingress
- Load balancing and microservices architecture, which makes it easy to implement the API for accessing your ML models in production
All of these features make Kubernetes an attractive environment for running ML workloads in production. The challenge, however, is how to integrate ML libraries and processes into the K8s platform so as to use them seamlessly. Kubeflow is a set of tools designed precisely to address this challenge.
What Is Kubeflow?
Kubeflow is a cloud-native application that facilitates the deployment of ML workflows in Kubernetes. Kubeflow’s main purpose is to automate each step of the ML workflow, making it reproducible in any environment and trainable on Kubernetes. It provides all necessary integration points for popular ML libraries and coding environments and ships with tools and services for transforming ML models into containerized applications served on Kubernetes. The solution addresses the needs of ML engineers and MLOps practitioners who want to deploy ML systems to various environments for development, testing, and production-level serving. By offering various automation tools, Kubeflow also bridges the gap between the scientific side of ML and data science, on the one hand, and the engineering side (application deployment and management), on the other.
Before delving deeper into how Kubeflow works, let’s start with a concrete example.
Let’s say you need to classify medical images of patients to identify certain skin diseases. Kubeflow provides all the necessary tools to do this:
- Jupyter notebook server and popular ML libraries to experiment with your medical data and ML algorithms
- Kubeflow Pipelines to schedule an ML workflow and configure interaction between its components
- Automatic hyperparameter optimization for your Convolutional Neural Networks (CNN)
- Easy containerization and deployment of image classification code in Kubernetes
- Distributed training on Kubernetes
- Serving frameworks such as TF Serving, KFServing, and Seldon to expose the model to the rest of the world
- Monitoring and performance analysis using Metadata
To get a deeper understanding of how Kubeflow works, let’s briefly describe each step of the ML workflow.
ML Workflow Management with Kubeflow
Kubeflow enables full automation of the ML workflow via the Kubeflow Pipelines tool. ML developers can define a pipeline as a multi-step process, and Kubeflow will orchestrate it end to end, from initial training to serving and monitoring. Pipelines are represented as graphs that consist of multiple components. For example, one component may be responsible for data preprocessing, another for data transformation, another for model training, and so on.
Kubeflow Pipelines lets developers schedule an ML workflow, monitor its progress, quickly adjust code, and improve the model. Workflows can also be saved and reused, so end-to-end solutions can be assembled quickly without rebuilding them each time. Developers can additionally define how often pipelines should run (one-off or recurring), monitor model predictions, and quickly tune experiments to create new models.
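The idea of a pipeline as a graph of components can be sketched in a few lines of plain Python. This is not the Kubeflow Pipelines SDK: the step names and the `execution_order` helper are hypothetical, and the sketch only shows how dependencies between components determine the order in which they run.

```python
from typing import Dict, List

# Each step maps to the steps it depends on (hypothetical step names).
PIPELINE: Dict[str, List[str]] = {
    "preprocess": [],
    "transform": ["preprocess"],
    "train": ["transform"],
    "evaluate": ["train"],
    "serve": ["evaluate"],
}


def execution_order(graph: Dict[str, List[str]]) -> List[str]:
    """Return the steps in an order where every dependency runs first."""
    order: List[str] = []
    done = set()
    remaining = dict(graph)
    while remaining:
        # A step is ready once all of its dependencies have completed.
        ready = sorted(
            step for step, deps in remaining.items()
            if all(dep in done for dep in deps)
        )
        if not ready:
            raise ValueError("pipeline has a cycle or a missing dependency")
        for step in ready:
            order.append(step)
            done.add(step)
            del remaining[step]
    return order


print(execution_order(PIPELINE))
```

A real pipeline engine also passes artifacts between steps and runs each step in its own container, but the dependency graph is what makes scheduling and reuse possible.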
Experimentation: Model Selection and Initial Training
During the experimentation phase, ML developers interact with various ML libraries to run different algorithms and experiment with a neural network structure, weights initialization, learning rate, etc. To make it easy for developers to use familiar libraries, Kubeflow provides integrations with TensorFlow, PyTorch, XGBoost, and MXNet. These tools are containerized and pre-configured as part of the workflow and provided in a Kubernetes-native way.
The Kubeflow installation also includes the Jupyter notebook server, an interactive Python environment for fast prototyping and testing of ML models. In Kubeflow, Jupyter notebooks run in pods, which means they can be easily integrated with other components to convert model code into training jobs.
For example, developers can use the Kubeflow Fairing library, which allows you to package the code of Jupyter notebooks into Docker container images and run them as training jobs on Kubeflow. With Fairing, there is no need to manually transform model code into containerized versions, and if the model run is successful, the Fairing library can be used for model deployment.
Tuning Model Hyperparameters
Hyperparameters, such as learning rate and the number of neural net layers, significantly affect the performance of the ML model. Searching for the optimal hyperparameter configuration manually is a difficult and tedious task. Kubeflow ships with the Katib tool that helps automate the tuning of the ML model’s hyperparameters and neural net architecture.
With Katib, the job of the ML developer is reduced to formulating an objective metric (e.g., model accuracy), defining a search space (e.g., min and max hyperparameter values), and selecting a search algorithm. Guided by this configuration, Katib then runs multiple trials to find the best combination. It can tune several hyperparameters, such as the learning rate and the number of layers, at the same time.
Also, Katib offers a neural architecture search (NAS) feature, a more comprehensive approach to model design that optimizes not only hyperparameters but also a model’s structure and node weights.
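The three ingredients of a hyperparameter search described above (objective metric, search space, search algorithm) can be illustrated with a small random search in plain Python. This is only a sketch of the concept, not Katib’s API: the `objective` function is a made-up stand-in for validation accuracy, and the bounds in `SEARCH_SPACE` are illustrative numbers.

```python
import random
from typing import Dict, Tuple

# Search space: (min, max) per hyperparameter. Values are illustrative.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "learning_rate": (0.001, 0.1),
    "num_layers": (2, 8),
}


def objective(learning_rate: float, num_layers: int) -> float:
    """Made-up stand-in for validation accuracy.

    A real trial would train and evaluate a model. This function simply
    peaks at learning_rate=0.01 and num_layers=5.
    """
    return 1.0 - abs(learning_rate - 0.01) * 5 - abs(num_layers - 5) * 0.05


def sample(space, rng: random.Random) -> dict:
    """Draw one hyperparameter combination from the search space."""
    lr_min, lr_max = space["learning_rate"]
    layers_min, layers_max = space["num_layers"]
    return {
        "learning_rate": rng.uniform(lr_min, lr_max),
        "num_layers": rng.randint(int(layers_min), int(layers_max)),
    }


def random_search(space, trials: int, seed: int = 0):
    """Run `trials` random trials and keep the best-scoring combination."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = sample(space, rng)
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(SEARCH_SPACE, trials=50)
print(best_params, round(best_score, 3))
```

Random search is used here because it is the simplest algorithm to show. Swapping in a smarter strategy would only change how `sample` picks the next combination.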
After the ML developer is satisfied with the model, it’s time to deploy it to production. Many challenges arise here, like the need to select the right environment, train the model for production using GPUs, create a model access API, integrate with storage infrastructure, and ensure continuous improvement of the model based on the ingested data.
Kubeflow takes care of all of these challenges with its advanced model training and serving features.
Initial training of the model during the experimentation stage might not be enough to achieve a production-grade level of accuracy and performance. ML developers will often need to train the model on powerful GPUs and distributed infrastructure on more data.
Kubeflow provides various operators and custom resources to easily run training jobs on Kubernetes. It supports PyTorch jobs, TFJobs, and MXNet training jobs. If you want to train your model in a distributed way to achieve higher performance, you can use the MPI Operator that ships with Kubeflow.
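As a rough illustration of what a training job looks like as a custom resource, the sketch below builds a TFJob manifest as a Python dictionary and prints it as JSON. The job name and image are hypothetical placeholders, and the field names follow the `kubeflow.org/v1` TFJob schema as we understand it, so check them against the operator version installed in your cluster.

```python
import json


def tfjob_manifest(name: str, image: str, workers: int,
                   gpus_per_worker: int = 1) -> dict:
    """Build a TFJob custom resource as a plain dict (sketch, not validated)."""
    container = {
        # The TFJob operator looks for a container named "tensorflow".
        "name": "tensorflow",
        "image": image,
        "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


# Hypothetical job name and image, used only for illustration.
manifest = tfjob_manifest(
    name="skin-classifier-train",
    image="registry.example.com/skin-classifier:latest",
    workers=2,
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, output like this could be saved to a file and applied to a cluster that has the training operator installed.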
Serving ML Models
After the model has achieved high accuracy, it’s time to serve it to end customers. The main challenge here is creating a model API for model users, ensuring high availability and fast model prediction.
The Kubernetes API and cloud-native ecosystem really shine when it comes to serving ML models. For example, to use TensorFlow Serving, you just need to create a K8s Deployment with the TensorFlow Serving container and a K8s Service for this Deployment. Kubeflow ships with the Istio service mesh by default, so you can also use Istio virtual services to route traffic to the model and expose it to the world through the Istio gateway. You can also split traffic between different model instances, for example to compare versions, using weighted VirtualService routes together with the subsets defined in an Istio DestinationRule. K8s and Kubeflow will manage the model to ensure that the number of pods matches the desired state and that the model is always healthy and available.
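The Deployment-plus-Service setup for TensorFlow Serving can be sketched as two manifests, built here as Python dictionaries. The model name is a hypothetical placeholder, and the volume that would hold the exported model under /models is omitted for brevity. The ports and the MODEL_NAME environment variable follow the conventions of the public tensorflow/serving image.

```python
import json

MODEL_NAME = "skin-classifier"  # hypothetical model name
LABELS = {"app": MODEL_NAME}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": MODEL_NAME},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                "containers": [{
                    "name": "tf-serving",
                    "image": "tensorflow/serving",
                    # The image serves the model found at /models/$MODEL_NAME.
                    # Mounting the model files is left out of this sketch.
                    "env": [{"name": "MODEL_NAME", "value": MODEL_NAME}],
                    "ports": [
                        {"containerPort": 8500},  # gRPC
                        {"containerPort": 8501},  # REST
                    ],
                }]
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": MODEL_NAME},
    "spec": {
        # Must match the pod labels so the Service finds the serving pods.
        "selector": LABELS,
        "ports": [
            {"name": "grpc", "port": 8500, "targetPort": 8500},
            {"name": "http", "port": 8501, "targetPort": 8501},
        ],
    },
}

for manifest in (deployment, service):
    print(json.dumps(manifest, indent=2))
```

The Service selector and the pod labels share the same `LABELS` dictionary on purpose: a mismatch between the two is a common reason a Service ends up with no endpoints.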
In addition to framework-specific serving options (e.g., TensorFlow Serving and PyTorch model serving), Kubeflow ships with multi-framework model serving tools such as KFServing and Seldon Core. These tools support the most popular ML frameworks and offer some advanced model serving features. For example, KFServing supports GPU autoscaling, canary rollouts, model health checks, model metrics observability, and post-processing of predictions for performance assessment.
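To give a feel for what a canary rollout does, the following sketch simulates weighted routing between a stable and a canary model version in plain Python. It illustrates the concept only and is not how KFServing or Seldon Core implement it. The version names and the 90/10 split are assumptions.

```python
import random
from collections import Counter
from typing import Dict


def pick_version(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose a model version with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def simulate(weights: Dict[str, int], requests: int, seed: int = 0) -> Counter:
    """Route `requests` simulated requests and count hits per version."""
    rng = random.Random(seed)
    return Counter(pick_version(weights, rng) for _ in range(requests))


# Send roughly 10% of traffic to the new version (illustrative split).
counts = simulate({"stable": 90, "canary": 10}, requests=10_000)
print(counts)
```

If the canary version behaves well on its small share of traffic, its weight is raised step by step until it receives everything. If it misbehaves, setting its weight back to zero rolls the change back.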
Monitoring Model Performance with Metadata
The Kubeflow Metadata project supports model auditing and monitoring by tracking and managing the metadata generated by the ML workflows. Accessible via the Kubeflow UI, this metadata provides information about model execution (runs), model design, datasets used, model metrics (e.g., accuracy), and other artifacts. These types of metadata can be programmatically generated by using Metadata SDK calls in your ML code and then accessed for audit or regulatory compliance in the Artifact Store in the Kubeflow UI.
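The kind of record keeping described above can be approximated with a small in-memory store. This sketch does not use the Kubeflow Metadata SDK. The `RunRecord` and `InMemoryMetadataStore` classes, their fields, and the sample values are all hypothetical and exist only to show which information is worth tracking per run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    """Metadata for one training run (all field names are illustrative)."""
    run_id: str
    dataset: str
    model_version: str
    metrics: Dict[str, float] = field(default_factory=dict)


class InMemoryMetadataStore:
    """Hypothetical store; a real setup would persist records in a database."""

    def __init__(self) -> None:
        self._runs: List[RunRecord] = []

    def log_run(self, record: RunRecord) -> None:
        self._runs.append(record)

    def runs_for_dataset(self, dataset: str) -> List[RunRecord]:
        """Audit helper: which runs consumed a given dataset?"""
        return [r for r in self._runs if r.dataset == dataset]

    def best_run(self, metric: str) -> Optional[RunRecord]:
        """Return the run with the highest value for `metric`, if any."""
        candidates = [r for r in self._runs if metric in r.metrics]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.metrics[metric])


store = InMemoryMetadataStore()
store.log_run(RunRecord("run-1", "skin-images-v1", "cnn-a", {"accuracy": 0.91}))
store.log_run(RunRecord("run-2", "skin-images-v1", "cnn-b", {"accuracy": 0.94}))
store.log_run(RunRecord("run-3", "skin-images-v2", "cnn-b", {"accuracy": 0.93}))

best = store.best_run("accuracy")
print(best.run_id, best.metrics["accuracy"])
```

Linking every run to its dataset and model version is what makes later audits possible: given a deployed model, you can trace back to the data and metrics that produced it.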
Wrapping up, Kubeflow brings the power of Kubernetes to ML workflows and provides all the necessary tools at each stage of the MLOps process. Among the best features Kubeflow has to offer:
- Defining and orchestrating the entire ML workflow using Kubeflow Pipelines
- Seamless integration between ML libraries, Jupyter notebooks, ML code, and Kubernetes for easy containerization of ML models
- Automated hyperparameter and neural architecture search using state-of-the-art algorithms
- Fast execution of training jobs in a distributed environment
- Automatic deployment and serving of the model using ML serving providers (KFServing, Seldon), K8s Services, and the Istio service mesh
- Monitoring the health of ML models and auto-scaling models in the cloud environment using Seldon and KFServing
- Enriching the ML pipeline with metadata about inputs, execution, and performance
These features make Kubeflow a valuable tool for businesses seeking fast and efficient deployment of ML models in production. The platform can significantly reduce time-to-market for ML products.