Machine Learning Abstractions with Kubeflow

Kubeflow is an open-source platform for developing, deploying and managing machine learning systems. Built on top of Kubernetes, it is designed to be both portable and scalable. A Kubeflow installation is made up of many components, and we have recently been working with a number of customers to help them evaluate its potential. In this post, however, we focus specifically on how Kubeflow uses the extensibility of Kubernetes to implement machine-learning-specific functionality.

Machine Learning

A machine learning system is one that builds and trains a predictive model by processing input data. Typically this involves iteratively modifying the model’s parameters based on the input data, saving the trained model to disk and then applying the trained model to new input data to make predictions (also known as inference).
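The train, save and predict cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a linear model fitted by gradient descent; the function names, the toy data and the model.json file name are all hypothetical, and a real system would use a library such as TensorFlow.

```python
import json
import os
import tempfile


def train(xs, ys, steps=2000, lr=0.05):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Iteratively modify the parameters based on the input data.
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}


def save(model, path):
    """Persist the trained parameters to disk."""
    with open(path, "w") as f:
        json.dump(model, f)


def load(path):
    """Load previously trained parameters from disk."""
    with open(path) as f:
        return json.load(f)


def predict(model, x):
    """Apply the trained model to new input data (inference)."""
    return model["w"] * x + model["b"]


# Toy training data (hypothetical), generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

model = train(xs, ys)
path = os.path.join(tempfile.mkdtemp(), "model.json")
save(model, path)

restored = load(path)
prediction = predict(restored, 4.0)
print(round(prediction, 2))
```

Training and inference only share the saved file, which is why the two stages can be run, scaled and operated separately.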

However, a data scientist skilled at designing and building these models may not have the operational knowledge needed to productionise them, for example running distributed training at scale or resiliently serving the model so that predictions can be made on demand. This is where Kubeflow’s Kubernetes extensions can help.

Custom Resources

Kubeflow uses Kubernetes custom resources to extend the Kubernetes API with machine-learning-specific resources. These resources are designed to present an interface that is familiar to data scientists, while abstracting away operational details. Custom controllers are then deployed to maintain the desired state specified by these resources.
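To illustrate what "maintaining the desired state" means, here is a rough sketch of the reconcile pattern that controllers follow. It does not talk to Kubernetes at all; the ModelServer type, the action names and the in-memory observed dictionary are hypothetical stand-ins for a custom resource, the controller's API calls and the cluster state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelServer:
    """Hypothetical custom resource: the desired state declared by a user."""
    name: str
    replicas: int


def reconcile(desired, observed):
    """Return the actions needed to move the observed state to the desired one.

    `observed` maps a resource name to the number of replicas currently running.
    """
    running = observed.get(desired.name, 0)
    if running < desired.replicas:
        return [("scale_up", desired.name, desired.replicas - running)]
    if running > desired.replicas:
        return [("scale_down", desired.name, running - desired.replicas)]
    return []  # Nothing to do: observed state already matches.


def apply_actions(actions, observed):
    """Stand-in for the API calls a real controller would make."""
    for verb, name, count in actions:
        delta = count if verb == "scale_up" else -count
        observed[name] = observed.get(name, 0) + delta


desired = ModelServer(name="demo", replicas=3)
observed = {}

first_pass = reconcile(desired, observed)
apply_actions(first_pass, observed)
second_pass = reconcile(desired, observed)

print(first_pass)
print(second_pass)
```

The second pass returns no actions because reconciliation is idempotent: the controller can be run repeatedly and only acts when the cluster has drifted from what the resource specifies.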


InferenceService is an example of a custom resource installed by Kubeflow. It is implemented by the KFServing component and allows data scientists to serve trained models for inference.

The following simple example serves a TensorFlow model over HTTP. TensorFlow is a machine learning platform which provides libraries for building and training machine learning models. It includes a serving component which supports serving a trained model saved in a standard format called SavedModel.

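apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: demo
  namespace: default
spec:
  default:
    predictor:
      serviceAccountName: demo
      tensorflow:
        resources:
          requests:
            cpu: 1
            memory: 1Gi
        runtimeVersion: 1.14.0
        storageUri: gs://demo/model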

We define the InferenceService with the location at which the model is saved in the SavedModel format (gs://demo/model). We can also specify the serving framework (tensorflow) and its version (1.14.0), as well as the compute resources to allocate. The serviceAccountName field can be used to grant the permissions needed to serve the model. For example, with GKE Workload Identity the service can be allowed to read from Google Cloud Storage in order to pull down the saved model.

When this resource is applied to the cluster, the custom controller is triggered and orchestrates the necessary Kubernetes resources to serve the model; this involves interacting with the Istio and Knative Serving APIs, which are both dependencies of KFServing.

Once the InferenceService is ready, the status of the resource will contain an HTTP URL which has been generated dynamically (by default using the resource’s name and Namespace to make it unique across the cluster). We can wait for readiness and then retrieve the URL with kubectl, using the Namespace in which the resource was created:

  kubectl wait --for=condition=Ready inferenceservice/demo -n default --timeout=300s
  kubectl get inferenceservice demo -n default -o jsonpath='{.status.url}'

Using a standard API, new input data can be POSTed to this endpoint to return model predictions. Note that gRPC support is on the roadmap.
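As a rough sketch of what such a request might look like, the snippet below builds (but does not send) a prediction request. It assumes the endpoint follows the TensorFlow Serving REST convention of POSTing a JSON body with an "instances" list to /v1/models/<name>:predict. The host name and the input values are hypothetical, and depending on the KFServing version the URL in the status may already include the /v1/models/<name> path.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances):
    """Build an HTTP POST request in the TensorFlow Serving REST style."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; in practice this comes from the InferenceService status.
request = build_predict_request(
    "http://demo.default.example.com",
    "demo",
    [[1.0, 2.0, 5.0]],  # Hypothetical input; the shape depends on the model.
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it against a reachable endpoint:
#   with urllib.request.urlopen(request) as response:
#       predictions = json.load(response)["predictions"]
```

When the cluster is reached through an ingress gateway IP rather than DNS, the generated host name typically has to be supplied as the Host header instead.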

Using InferenceServices to manage model serving in this way allows operational knowledge to be captured by the KFServing custom controller, turning Kubernetes into a serverless platform for serving trained models. Additional features such as canary rollouts, pre/post-processing, explainability support and autoscaling (by default) make KFServing a powerful tool, with plenty more to look forward to.


TFJob is another custom resource installed by Kubeflow. It is implemented by the TFJob operator and can be used to run distributed training of TensorFlow models across a Kubernetes cluster.

This works by running each task (Pod) in a training cluster with the correct TF_CONFIG environment variable set. The variable tells the task its role in the cluster and the endpoints of the other tasks to connect to, and TensorFlow libraries can parse it to wire up the interactions between tasks automatically. This is similar to the capability offered by Google Cloud's AI Platform, except that the TFJob implementation uses Pods instead of VMs.
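To make this more concrete, the sketch below parses a TF_CONFIG value by hand. TF_CONFIG is a JSON document with a "cluster" section (task type to list of addresses) and a "task" section (the type and index of the current task). The helper name, the host names and the port are hypothetical, and in practice TensorFlow's own libraries do this parsing for you.

```python
import json
import os


def parse_tf_config(environ=None):
    """Work out this task's role, its own address and its peers."""
    environ = os.environ if environ is None else environ
    config = json.loads(environ.get("TF_CONFIG", "{}"))
    cluster = config.get("cluster", {})
    task = config.get("task", {})

    task_type = task.get("type")
    task_index = task.get("index", 0)

    # Our own address is the entry at our index within our task type.
    own_address = None
    if task_type in cluster:
        own_address = cluster[task_type][task_index]

    # Every other address in the cluster is a peer to connect to.
    peers = [
        address
        for addresses in cluster.values()
        for address in addresses
        if address != own_address
    ]
    return {
        "role": task_type,
        "index": task_index,
        "address": own_address,
        "peers": peers,
    }


# Hypothetical environment, as one worker Pod in the training cluster might see it.
example_env = {
    "TF_CONFIG": json.dumps(
        {
            "cluster": {
                "ps": ["demo-ps-0:2222"],
                "worker": ["demo-worker-0:2222", "demo-worker-1:2222"],
            },
            "task": {"type": "worker", "index": 1},
        }
    )
}

info = parse_tf_config(example_env)
print(info)
```

Every Pod runs the same training code; only the injected TF_CONFIG differs, which is what lets the operator scale the number of workers without changes to the model code.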


Kubeflow Pipelines provides a Python SDK that can be used to generate definitions of your machine learning training pipelines. By default, these pipelines are defined as Argo Workflows, which are themselves implemented as a Kubernetes custom resource and controller. Tekton pipelines are also supported as an alternative.

Pipelines can be used to orchestrate graphs of containers, but in particular they can be used to orchestrate other custom resources using custom pipeline tasks (for example InferenceServices and TFJobs).
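To give a feel for what a generated pipeline definition looks like, the sketch below hand-builds a small Argo Workflow manifest as a Python dictionary. It deliberately does not use the Kubeflow Pipelines SDK, whose output is considerably richer. The build_workflow helper, the step names and the container images are hypothetical.

```python
import json


def build_workflow(name, steps):
    """Build an Argo Workflow manifest describing a graph of containers.

    `steps` is a list of (step_name, image, dependencies) tuples.
    """
    # One container template per step.
    templates = [
        {"name": step, "container": {"image": image}}
        for step, image, _ in steps
    ]

    # A DAG template that wires the steps together by their dependencies.
    dag_tasks = []
    for step, _, dependencies in steps:
        task = {"name": step, "template": step}
        if dependencies:
            task["dependencies"] = list(dependencies)
        dag_tasks.append(task)
    templates.append({"name": "main", "dag": {"tasks": dag_tasks}})

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name}-"},
        "spec": {"entrypoint": "main", "templates": templates},
    }


# Hypothetical three-step pipeline: preprocess, then train, then deploy.
workflow = build_workflow(
    "demo-pipeline",
    [
        ("preprocess", "example.com/demo/preprocess:latest", []),
        ("train", "example.com/demo/train:latest", ["preprocess"]),
        ("deploy", "example.com/demo/deploy:latest", ["train"]),
    ],
)

print(json.dumps(workflow, indent=2))
```

Because the result is just another Kubernetes resource, it is applied to the cluster and reconciled by a controller in the same way as an InferenceService or TFJob.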


Even looking at only a fraction of the capabilities offered by Kubeflow, we can start to understand some of its major benefits. Using the Kubernetes API and machinery to abstract away and manage key elements of the lifecycle of a machine learning system gives a lot of control to data scientists, who may not have the operational knowledge (or time) to configure these setups themselves.

Get in Touch

If you are interested in leveraging Kubernetes and Kubeflow for managing machine learning workloads, Jetstack offers consulting and subscription to guide you on your journey. Get in touch to find out how we can help!
