Cluster API: Past, Present and Future

The Cluster API is a Kubernetes project that brings declarative, Kubernetes-style APIs to cluster creation. It does this by using CustomResourceDefinitions to extend the API exposed by the Kubernetes API Server, allowing users to create new resources such as Clusters (representing a Kubernetes cluster) and Machines (representing the machines that make up the Nodes that form the cluster). A controller for each resource is then responsible for reacting to changes to these resources to bring up the cluster. The API is designed in such a way that different infrastructure providers can integrate with it to provide their environment specific logic.

The Cluster API project is still in the early stages, but what is currently possible already demonstrates the enormous power it brings. The aim of this post is to summarise the capabilities of the project to date and to look ahead to what is in store for subsequent releases.

Past, Present and Future

At the time of writing, the most recent release of the Cluster API implements the v1alpha2 version. Here we discuss the evolution of this API and how providers can integrate with it.

Past: v1alpha1

The initial v1alpha1 implementation of the Cluster API requires providers to include the Cluster API controller code in their project and to implement actuators (interfaces) to handle their environment specific logic (for example calls to cloud provider APIs). The code runs as a single provider specific manager binary which manages a controller for each of the resources required to manage a cluster.

Present: v1alpha2

One of the pain points of the v1alpha1 method of consuming the Cluster API is that it requires each provider to implement a certain amount of bootstrap boilerplate code, typically using kubeadm. To remedy this, v1alpha2 introduces bootstrap providers which are responsible for generating the data required to turn a Machine into a Kubernetes Node. The kubeadm bootstrap provider is a bootstrap provider implementation that is able to handle this task for all environments using kubeadm. Its default behaviour is to generate a cloud-config script for each Machine which can be used to bootstrap the Node.
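For illustration, the cloud-config generated for a Machine might look something like the following. This is a simplified, hypothetical sketch; the real output is considerably longer and depends on the KubeadmConfig specification.

#cloud-config
write_files:
# kubeadm configuration rendered from the KubeadmConfig resource
- path: /tmp/kubeadm.yaml
  owner: root:root
  permissions: '0640'
  content: |
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: InitConfiguration
runcmd:
# bootstrap the Node using the rendered configuration
- kubeadm init --config /tmp/kubeadm.yaml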

Another change introduced by v1alpha2 is that providers no longer need to include Cluster API controller code in their projects. Instead, Cluster API offers independent controllers responsible for the core types. For further details on the motivations behind these changes, see the proposal.

For this version there are now three managers (instead of one) that need to be deployed:

  • Cluster API manager: to manage core v1alpha2 resources
  • Bootstrap provider manager: to manage resources to generate the data to turn a Machine into a Kubernetes Node
  • Infrastructure provider manager: to manage resources that provide the infrastructure required to run the cluster

For example, if I wanted to create a cluster on GCP configured using kubeadm, I would deploy the Cluster API manager (to reconcile core resources, for example Cluster and Machine resources), the kubeadm bootstrap provider (to reconcile KubeadmConfig resources, for example) and the GCP infrastructure provider (to reconcile environment specific resources, for example GCPClusters and GCPMachines).

To see how these resources should be applied, we will run through a cluster deployment using a Kubernetes infrastructure provider implementation that I wrote — that is, a provider where the infrastructure is provided by Kubernetes itself. Kubernetes Nodes run as Kubernetes Pods using kind images.

To start, we need to create a base cluster to provide the infrastructure for our Cluster API cluster. We will be using GKE here. The following commands assume you have gcloud installed with a GCP project and billing account set up.

WARNING: the gcloud commands will cost money — consider using the GCP Free Tier.

Calico will be used as the CNI solution for the Cluster API cluster. This requires some particular configuration when provisioning the GKE cluster so that IPv4-in-IPv4 (IPIP) encapsulated packets are routed correctly. To avoid distracting from the Cluster API behaviour, we run these commands here without explanation; refer to the Kubernetes infrastructure provider repository for details.

gcloud container clusters create management-cluster --cluster-version=1.14 --image-type=UBUNTU
CLUSTER_CIDR=$(gcloud container clusters describe management-cluster --format="value(clusterIpv4Cidr)")
gcloud compute firewall-rules create allow-management-cluster-pods-ipip --source-ranges=$CLUSTER_CIDR --allow=ipip
kubectl apply -f <(cat <<EOF
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: forward-ipencap
  namespace: kube-system
  labels:
    app: forward-ipencap
spec:
  selector:
    matchLabels:
      name: forward-ipencap
  template:
    metadata:
      labels:
        name: forward-ipencap
    spec:
      hostNetwork: true
      initContainers:
      - name: forward-ipencap
        command:
        - sh
        - -c
        - |
          apk add iptables
          iptables -C FORWARD -p ipencap -j ACCEPT || iptables -A FORWARD -p ipencap -j ACCEPT
        image: alpine:3.11
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
      containers:
      - name: sleep-forever
        image: alpine:3.11
        command: ["tail"]
        args: ["-f", "/dev/null"]
EOF
)

With the GKE cluster provisioned, we can now deploy the necessary managers.

# Install cluster api manager
kubectl apply -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.8/cluster-api-components.yaml

# Install kubeadm bootstrap provider
kubectl apply -f https://github.com/kubernetes-sigs/cluster-api-bootstrap-provider-kubeadm/releases/download/v0.1.5/bootstrap-components.yaml

# Install kubernetes infrastructure provider
kubectl apply -f https://github.com/dippynark/cluster-api-provider-kubernetes/releases/download/v0.2.1/provider-components.yaml

# Allow cluster api controller to interact with kubernetes infrastructure resources
# If the kubernetes provider were SIG-sponsored this would not be necessary ;)
# https://cluster-api.sigs.k8s.io/providers/v1alpha1-to-v1alpha2.html#the-new-api-groups
kubectl apply -f https://github.com/dippynark/cluster-api-provider-kubernetes/releases/download/v0.2.1/capi-kubernetes-rbac.yaml

We can now deploy our cluster.

kubectl apply -f <(cat <<EOF
apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
kind: KubernetesCluster
metadata:
  name: example
spec:
  controlPlaneServiceType: LoadBalancer
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["172.16.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  infrastructureRef:
    apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
    kind: KubernetesCluster
    name: example
EOF
)

Here we define our environment specific KubernetesCluster resource. This is expected to provision the necessary infrastructure components needed to run a Kubernetes cluster. For example, a GCPCluster might provision a VPC, firewall rules and a load balancer to reach the API Server(s). Here our KubernetesCluster just provisions a Kubernetes Service of type LoadBalancer for the API Server. We can query the KubernetesCluster to see its status.

$ kubectl get kubernetescluster
NAME      PHASE         HOST             PORT   AGE
example   Provisioned   35.205.255.206   443    51s

We reference our provider specific cluster resource from the core Cluster resource, which provides networking details for the cluster. Cluster API will modify the KubernetesCluster to be owned by the Cluster resource.
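Once adopted, the KubernetesCluster's metadata should contain an ownerReference along the following lines (a hypothetical sketch; the uid is that of the owning Cluster resource):

metadata:
  name: example
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha2
    kind: Cluster
    name: example
    uid: ... # uid of the owning Cluster resource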

We are now ready to deploy our Machines. Here we create a controller Machine which references the infrastructure provider specific KubernetesMachine resource together with a bootstrap provider specific KubeadmConfig resource.

kubectl apply -f <(cat <<EOF
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: controller
spec:
  initConfiguration:
    nodeRegistration:
      kubeletExtraArgs:
        eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
        cgroups-per-qos: "false"
        enforce-node-allocatable: ""
  clusterConfiguration:
    controllerManager:
      extraArgs:
        enable-hostpath-provisioner: "true"
---
apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
kind: KubernetesMachine
metadata:
  name: controller
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Machine
metadata:
  name: controller
  labels:
    cluster.x-k8s.io/cluster-name: example
    cluster.x-k8s.io/control-plane: "true"
spec:
  version: "v1.17.0"
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
      kind: KubeadmConfig
      name: controller
  infrastructureRef:
    apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
    kind: KubernetesMachine
    name: controller
EOF
)

The kubeadm bootstrap provider turns the KubeadmConfig resource into a cloud-config script which is consumed by the Kubernetes infrastructure provider to bootstrap a Kubernetes Pod to form the control plane for the new cluster.

The Kubernetes infrastructure provider does this by leaning on systemd, which runs as part of the kind image. A bash script is generated from the cloud-config script to create and run the specified files and commands. The script is mounted into the Pod using a Kubernetes Secret and is triggered by a systemd path unit once the containerd socket is available. You can exec into the controller Pod and run journalctl -u cloud-init to see the output of this script; cat /opt/cloud-init/bootstrap.sh will show the full script.
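As a rough sketch of this mechanism, the path unit might look something like the following (hypothetical; see the Kubernetes infrastructure provider repository for the actual units):

# cloud-init.path
[Unit]
Description=Run bootstrap script once the containerd socket is available

[Path]
PathExists=/run/containerd/containerd.sock

[Install]
WantedBy=multi-user.target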

Once the kubelet is running, it registers itself with the cluster by creating a controller Node object through the API Server (backed by etcd, which is also running on the controller Pod).

We can now deploy our worker Machines. This looks quite similar to the controller Machine provisioning except we make use of a MachineDeployment, KubeadmConfigTemplate and KubernetesMachineTemplate to request multiple replicas of a worker Node.

kubectl apply -f <(cat <<EOF
apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
kind: KubernetesMachineTemplate
metadata:
  name: worker
spec:
  template:
    spec: {}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfigTemplate
metadata:
  name: worker
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
            cgroups-per-qos: "false"
            enforce-node-allocatable: ""
---
apiVersion: cluster.x-k8s.io/v1alpha2
kind: MachineDeployment
metadata:
  name: worker
  labels:
    cluster.x-k8s.io/cluster-name: example
    nodepool: default
spec:
  replicas: 3
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: example
      nodepool: default
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: example
        nodepool: default
    spec:
      version: "v1.17.0"
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
          kind: KubeadmConfigTemplate
          name: worker
      infrastructureRef:
        apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
        kind: KubernetesMachineTemplate
        name: worker
EOF
)

MachineDeployments work similarly to Kubernetes Deployments in that they manage MachineSets which in turn manage the desired number of replicas of Machines.

We should now be able to query the Machines we have provisioned to see their status.

$ kubectl get machines
NAME                      PROVIDERID                                          PHASE
controller                kubernetes://871cde5a-3159-11ea-a1c6-42010a840084   provisioning
worker-6c498c48db-4grxq                                                       pending
worker-6c498c48db-66zk7                                                       pending
worker-6c498c48db-k5kkp                                                       pending

We can also see the corresponding KubernetesMachines.

$ kubectl get kubernetesmachines
NAME           PROVIDER-ID                                         PHASE          AGE
controller     kubernetes://871cde5a-3159-11ea-a1c6-42010a840084   Provisioning   53s
worker-cs95w                                                       Pending        35s
worker-kpbhm                                                       Pending        35s
worker-pxsph                                                       Pending        35s

Soon all KubernetesMachines should be in a Running state.

$ kubectl get kubernetesmachines
NAME           PROVIDER-ID                                         PHASE     AGE
controller     kubernetes://871cde5a-3159-11ea-a1c6-42010a840084   Running   2m
worker-cs95w   kubernetes://bcd10f28-3159-11ea-a1c6-42010a840084   Running   1m
worker-kpbhm   kubernetes://bcd4ef33-3159-11ea-a1c6-42010a840084   Running   1m
worker-pxsph   kubernetes://bccd1af4-3159-11ea-a1c6-42010a840084   Running   1m

We can also see the Pods corresponding to our KubernetesMachines.

$ kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
controller     1/1     Running   0          2m11s
worker-cs95w   1/1     Running   0          111s
worker-kpbhm   1/1     Running   0          111s
worker-pxsph   1/1     Running   0          111s

The Cluster API manager generates a kubeconfig and stores it as a Kubernetes Secret called <clusterName>-kubeconfig. We can retrieve that and access the cluster.

$ kubectl get secret example-kubeconfig -o jsonpath='{.data.value}' | base64 --decode > example-kubeconfig
$ export KUBECONFIG=example-kubeconfig
$ kubectl get nodes
NAME           STATUS     ROLES    AGE     VERSION
controller     NotReady   master   3m16s   v1.17.0
worker-cs95w   NotReady   <none>   2m34s   v1.17.0
worker-kpbhm   NotReady   <none>   2m32s   v1.17.0
worker-pxsph   NotReady   <none>   2m34s   v1.17.0

Finally, we can apply our Calico CNI solution. The Nodes should soon become Ready.

$ kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
$ kubectl get nodes
NAME           STATUS   ROLES    AGE     VERSION
controller     Ready    master   5m8s    v1.17.0
worker-cs95w   Ready    <none>   4m26s   v1.17.0
worker-kpbhm   Ready    <none>   4m24s   v1.17.0
worker-pxsph   Ready    <none>   4m26s   v1.17.0

We can now run workloads on our brand new cluster!

kubectl run nginx --image=nginx --replicas=3

This flow would be similar for other infrastructure providers. Many other examples can be found in the Cluster API quick start.

Future: v1alpha3 and Beyond

We are only just scratching the surface of the capabilities Cluster API has the potential to provide. We will go over some of the other cool things that are on the roadmap.

MachineHealthCheck

In v1alpha2 an infrastructure specific Machine can mark itself as failed and the status will bubble up to the owning Machine, but no action is taken by an owning MachineSet. The reason for this is that resources other than a MachineSet could own the Machine and so it makes sense for Machine remediation logic to be decoupled from MachineSets.

MachineHealthCheck is a proposed resource to describe failure scenarios for Nodes and to delete the corresponding Machine should one occur. This would trigger the appropriate deletion behaviour (for example draining the Node) and allow any controlling resource to bring up a replacement Machine.
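Based on the proposal, a MachineHealthCheck might look something like the following (the exact schema is subject to change before release):

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: worker-healthcheck
spec:
  clusterName: example
  selector:
    matchLabels:
      nodepool: default
  unhealthyConditions:
  # consider a Machine unhealthy if its Node's Ready condition
  # has been Unknown or False for more than 5 minutes
  - type: Ready
    status: Unknown
    timeout: 5m
  - type: Ready
    status: "False"
    timeout: 5m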

KubeadmControlPlane

Currently, creating a HA control plane and managing the control plane in general requires carefully configuring independent controller Machines with the correct bootstrap configuration (which need to come up in the correct order). v1alpha3 looks to support control plane providers with an initial kubeadm control plane implementation. This will require few changes from an infrastructure provider perspective but will allow users to manage the instantiation and scaling of the control plane without manually creating the corresponding Machines. The kubeadm control plane proposal provides further details.
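Based on the proposal, declaring a highly available control plane might look something like the following (hypothetical; the schema is subject to change, and the KubernetesMachineTemplate referenced here is an infrastructure provider template like the worker one above):

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: controller
spec:
  replicas: 3
  version: v1.17.0
  infrastructureTemplate:
    apiVersion: infrastructure.lukeaddison.co.uk/v1alpha2
    kind: KubernetesMachineTemplate
    name: controller
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cgroups-per-qos: "false"
          enforce-node-allocatable: ""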

Together with MachineHealthChecks, automatic control plane remediation would be possible using the Cluster API.

Cluster Autoscaler

Cluster Autoscaler is one example of a project that can leverage Cluster API. The current implementation requires each supported cloud provider to implement the CloudProvider and NodeGroup interfaces necessary for scaling groups of instances in their environment. With the advent of Cluster API, autoscaling logic could be implemented in a provider agnostic way by interacting with Cluster API resources instead of directly with provider specific APIs.

Summary

We have taken quite an in-depth look at the current capabilities of the Cluster API and what to look forward to in the near future. It’s a very exciting time for the project as it looks to reach completeness. As with almost anything Kubernetes related, opportunities to contribute are open and numerous.