Container-Native Multi-Cluster Global Load Balancing With Cloud Armor on Google Cloud Platform

During a recent project, a Jetstack customer wanted to load balance global traffic to multiple Google Kubernetes Engine (GKE) clusters, while also benefiting from Google’s Cloud Armor to protect against denial of service (DoS) attacks. Additionally, they wanted to make use of container-native load balancing for improved traffic visibility and network performance.

Google Cloud Platform (GCP) offers various load balancing solutions which are generally well documented and easy to use. However, with our specific set of requirements, we found that things got a bit more difficult. We worked with the customer’s infrastructure team to design and implement a custom global load balancer.

This blog post explains the various challenges we encountered along the way, and how we eventually managed to get everything we wanted and simplify the load balancing infrastructure.

This post also has an accompanying demo of the solution we designed, which is linked in a later section.

Background

Previously the customer had made use of Kubernetes Services of type LoadBalancer. This triggers the cloud-controller, a Kubernetes controller designed to integrate with cloud providers, to create a GCP network load balancer for each of the applications they needed to expose.

Before the Services were created, Terraform was used to reserve Google compute IP addresses for each application. These IPs were then specified in the Service manifest to be used by the cloud-controller when creating the load balancer.
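As a rough illustration of the reservation step, a regional address for one application might be declared along these lines. This is a minimal sketch rather than the customer's actual configuration: the resource name, the address name, and the region are all assumptions, and it presumes Terraform 0.12 or later with the google provider already configured.

```hcl
# Hypothetical example: reserve a regional external IP address for one
# application in one region. The names and region are placeholders.
resource "google_compute_address" "app1_eu" {
  name   = "app1-eu"
  region = "europe-west2"
}

# Expose the reserved address so it can be copied into the Service manifest.
output "app1_eu_address" {
  value = google_compute_address.app1_eu.address
}
```

The output value is what would then be placed in the Service manifest, for example through the loadBalancerIP field of the Service spec.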

This process was completed for each of their clusters in different regions. DNS A records with regional policies were created for each application, with all the IP addresses for that application in different regions. Users were then directed to the right IP address, and thus the right cluster, based on geolocation.

Diagram of load balancer using regional DNS records and Services of type LoadBalancer

This diagram shows the load balancing configuration and how traffic flows from the DNS records to the correct Pods.

The EU and US ovals show the effect of the regional DNS, routing traffic to a different compute IP address in a different GCP region based on where the request originated from.

The manually reserved compute IP addresses are shown in blue, and the load balancer components and health check created by the cloud-controller are shown in green.

Only two applications are shown here, but the customer has many more, each with their own DNS record, compute IP address, and load balancer.

Inside the Load Balancer

With the existing setup, the load balancers created by the cloud-controller for Services of type LoadBalancer are L4 TCP network load balancers. As shown in the previous diagram, these are actually composed of a forwarding rule and a target pool. Internally the load balancer isn’t a fixed thing, but part of the configuration of GCP’s software defined network.

Different types of GCP load balancer are composed of different network components, which this post will explore.

All load balancers also require a health check. This is external to the load balancer, but is required to determine which instances can have traffic routed to them. This is also created by the cloud-controller.

Version 1 - HTTP(S) Load Balancers With Cloud Armor

Unfortunately the L4 TCP network load balancers don’t support Google Cloud Armor. To make use of this protection the infrastructure would need to switch to using L7 HTTP(S) load balancers.
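To give an idea of what is being attached, a Cloud Armor security policy can be described in Terraform roughly as follows. This is only a sketch under assumptions: the policy name and the blocked source range (a documentation-only range) are made up for illustration and are not the customer's rules.

```hcl
# Hypothetical Cloud Armor policy: deny one example range, allow the rest.
resource "google_compute_security_policy" "apps" {
  name = "apps-policy"

  # Deny requests from an example source range with a 403 response.
  rule {
    action      = "deny(403)"
    priority    = 1000
    description = "Example deny rule"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["203.0.113.0/24"]
      }
    }
  }

  # Default rule, evaluated last: allow all other traffic.
  rule {
    action      = "allow"
    priority    = 2147483647
    description = "Default allow rule"
    match {
      versioned_expr = "SRC_IPS_V1"
      config {
        src_ip_ranges = ["*"]
      }
    }
  }
}
```

A policy like this is attached through the security_policy argument of a backend service, a component that only the HTTP(S) load balancers use.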

Ingress

In the same way that a network load balancer is created for Kubernetes Services of type LoadBalancer, a HTTP(S) load balancer is created automatically for Kubernetes Ingress resources. These load balancers are set up by a separate ingress-gce controller.

While the customer did not choose to use an Ingress approach in the end, it was explored as an option, and was the first step on the way to the final design. As the HTTP(S) load balancer was still required, all the changes discussed here (changing from an L4 to an L7 load balancer, using URL maps, and terminating TLS connections) still needed to be resolved.

L4 to L7

A major challenge of switching to L7 HTTP(S) load balancers was the restriction on which ports could be used. The L4 TCP network load balancers used previously allowed any port to be used. However, HTTP load balancers only allow TCP ports 80 and 8080, and HTTPS load balancers only allow TCP port 443.

The customer’s applications used a range of ports to separate HTTP and gRPC traffic. Work had to be done to verify that each application could be changed to work with the limited range of ports. Client applications also required updates, and this change meant that the old load balancer would have to be left in place for a while after the change to give users time to update.

URL Maps

Another important change was brought about by the L7 HTTP(S) load balancer’s use of a URL map. This component of the load balancer can be used to direct traffic to different backends depending on the domain or path that is queried.

This means that the customer’s users can still access different applications from different domains. But now, rather than these domains pointing to a different external compute IP address and load balancer for each application, there is just a single address and load balancer. The URL map then handles routing traffic to the correct Pods based on the domain that has been queried.

Having only one compute IP address and load balancer in each region simplified the customer’s infrastructure and minimised the attack surface. It also allowed DNS records to be simplified.
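The host-based routing described above can be sketched in Terraform as a URL map with one host rule per application. This is an illustrative sketch, not the customer's configuration: the two input variables are hypothetical and stand in for backend services that are assumed to be defined elsewhere, while the hostnames reuse the example domains from the Ingress sample in this post.

```hcl
# Hypothetical inputs: self links of existing backend services.
variable "app1_backend_service" {
  type        = string
  description = "Self link of the backend service for app1"
}

variable "app2_backend_service" {
  type        = string
  description = "Self link of the backend service for app2"
}

resource "google_compute_url_map" "apps" {
  name = "apps"

  # Used when no host rule matches the request.
  default_service = var.app1_backend_service

  # Route each domain to its own path matcher.
  host_rule {
    hosts        = ["app1.customer.io"]
    path_matcher = "app1"
  }

  host_rule {
    hosts        = ["app2.customer.io"]
    path_matcher = "app2"
  }

  # Each path matcher sends all paths for its domain to one backend.
  path_matcher {
    name            = "app1"
    default_service = var.app1_backend_service
  }

  path_matcher {
    name            = "app2"
    default_service = var.app2_backend_service
  }
}
```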

HTTPS

Previously the customer’s applications supported HTTP and HTTPS over the same load balancer. TLS termination was handled by the applications themselves, using a single shared certificate that covered all the subdomains in use.

Switching to L7 HTTP(S) load balancing means that to support HTTPS the load balancer would now need to terminate the TLS connection and would therefore need to have access to the certificates used.

When using a load balancer set up from a Kubernetes Ingress this can be done either by creating a Kubernetes Secret containing the relevant key and certificate, then referencing it in the Ingress, or by creating a GCP SSL certificate and referencing it in an annotation on the Ingress. The customer decided to use a GCP SSL certificate as Kubernetes Secrets are less secure, and this made it easier to restrict access using GCP IAM roles.
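For illustration, a GCP SSL certificate of this kind could be created with Terraform along the following lines. This is a sketch under assumptions: the file paths are hypothetical, and the name simply matches the apps-ssl-cert value used in the pre-shared-cert annotation elsewhere in this post.

```hcl
# Hypothetical example: upload an existing key and certificate as a GCP
# SSL certificate resource. The file paths are placeholders.
resource "google_compute_ssl_certificate" "apps" {
  name        = "apps-ssl-cert"
  private_key = file("certs/apps.key")
  certificate = file("certs/apps.crt")
}
```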

Additionally, supporting both HTTP and HTTPS requires two versions of the load balancer’s forwarding rule and target proxy, as the HTTP versions can only use port 80 or 8080 and the HTTPS versions can only use port 443. When using Ingress resources configured for HTTPS both of these component types are created automatically.

A sample ingress that could be used to create this load balancer is given below.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: apps
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: "apps-ssl-cert"
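spec:
  rules:
  # Requests for app1.customer.io are routed to the app1 Service.
  - host: app1.customer.io
    http:
      paths:
      - backend:
          serviceName: app1
          servicePort: 80
  # Requests for app2.customer.io are routed to the app2 Service.
  - host: app2.customer.io
    http:
      paths:
      - backend:
          serviceName: app2
          servicePort: 80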

Note the ingress.gcp.kubernetes.io/pre-shared-cert annotation used to point the ingress-gce controller at the GCP SSL certificate.

Diagram of load balancer using regional DNS records and Ingress resources

This diagram shows the proposed setup, with the load balancer components created by the ingress-gce controller shown in green. While it is not so clear with only two applications in the previous diagram, this reduces the number of load balancers the customer had quite substantially.

Version 2 - Container Native

Standard load balancers on GCP use instance groups as the backend, directing traffic to the compute instances operating as Nodes in the cluster. These then use iptables rules to direct traffic to the relevant Pod on that Node.

This adds at least one extra network hop, and by default the traffic could then be sent to a Pod on another Node which would add further hops. This can be prevented by setting the Kubernetes Service field externalTrafficPolicy to Local, though this can degrade load balancing effectiveness and cause scheduling problems.

Container native load balancing uses a component called Network Endpoint Groups (NEGs) to target Kubernetes Pods directly as the backend of the load balancer. This improves network efficiency, reduces hops, and increases visibility. It also allows for more future integrations with Cloud Armor and other load balancer enhancements.

NEG Annotation

On GKE the NEGs can be created and managed automatically by adding an annotation to a Kubernetes Service. While there are some examples of how this annotation could be used, there wasn’t much general documentation for how the annotation worked. In the end we found more details in the source code for the ingress-gce controller on GitHub.

The annotation key is cloud.google.com/neg. The value is then a small blob of JSON with two possible keys: "ingress" and "exposed_ports".

The "ingress" key is boolean, and should be set to true if the NEG is going to be used with an Ingress resource. It causes the ingress-gce controller to wait on creating NEGs until the Ingress is created.

The "exposed_ports" key specifies which ports the Service exposes. Each port number is given as a key with an empty JSON object as its value, for example "80":{}. Note that the NEG annotation can be used standalone in this way, without an Ingress resource.

Altogether, the annotation would look like this:

apiVersion: v1
kind: Service
metadata:
  name: app1
  labels:
    app: app1
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"80":{}}}'
...

Examples of this in use can be found in the manifests in the demo, linked later in this post.

Diagram of load balancer using regional DNS records and Ingress resources with Network Endpoint Groups

This diagram shows the new proposed setup. It now includes the network endpoint groups, shown in green, which are created by the ingress-gce controller from the annotations on the Services. The network endpoint groups are actually zonal resources, but have the same name for each zone in a region. They then only point to Pods running on Nodes in that zone. This detail is omitted from the diagram to avoid overcomplicating it, but a complete example of how it works can be seen in the demo below.

The load balancer created by the ingress-gce controller is also shown in green. The Kubernetes Nodes are no longer shown as they have been removed from traffic flow.

Version 3 - Going Global

Using Kubernetes Ingress resources to trigger the creation of L7 HTTP(S) load balancers would have allowed us to use Cloud Armor and container native load balancing. However, we also decided that as part of this change we wanted to move to global load balancing on Google Cloud Platform, rather than relying on regional DNS records to direct traffic to clusters in different regions.

This would simplify the load balancing by removing the need for a full load balancer in every region. Visibility would also improve, as GCP exposes data about the volume of traffic being directed to different regions. It would also simplify the DNS records, which were being created by a large and hard to maintain Terraform project.

Service and Ingress

Currently it’s not possible to make a load balancer created automatically for a Kubernetes Ingress or Service route traffic to multiple clusters. A corresponding resource would need to be present in each cluster, but each cluster has its own cloud-controller or ingress-gce controller, which would try to create its own load balancer. There is no mechanism to designate one cluster as the master in the configuration.

The kubemci Tool

Google provide a multi-cluster ingress tool called kubemci, which can be used to automate the setup of a global load balancer from a Kubernetes Ingress resource. It essentially completes the same steps that the ingress-gce controller would, but as it is external to any cluster it can coordinate between them. When using it, an annotation must be added to Ingress resources to tell the in-cluster ingress-gce controller not to create a load balancer.

We chose not to use this tool for several reasons. Firstly, it did not support using NEGs for container native load balancing. It also didn’t seem very actively maintained, and as the customer already had a lot of tooling in place for creating infrastructure, we felt it would be better to make use of that, and create the global load balancer ourselves with Terraform.

The Custom Solution

Creating the load balancer ‘manually’ with Terraform took a bit of experimentation to get right. The GCP load balancer API combined with Terraform led to some odd situations at times. For example, the port_name field in the backend_service is redundant when using NEGs, but still required. Changing some fields on resources can also change the type of thing they’re required to refer to, or be referred to by, which Terraform doesn’t handle very well. Overall, when making substantial changes to the configuration, it was often easier to just tear the whole thing down and recreate it rather than navigate the delicate chain of dependencies.
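To make the port_name point concrete, a backend service that targets a NEG might look roughly like this. It is a minimal sketch, not the configuration used in the project: the app1_neg variable is a hypothetical input holding the self link of a NEG, and the health check settings and rate limit are arbitrary example values.

```hcl
# Hypothetical input: self link of a NEG created for the app1 Service.
variable "app1_neg" {
  type        = string
  description = "Self link of a network endpoint group for app1"
}

# Health check used to decide which endpoints can receive traffic.
resource "google_compute_health_check" "app1" {
  name = "app1"

  http_health_check {
    port         = 80
    request_path = "/"
  }
}

resource "google_compute_backend_service" "app1" {
  name     = "app1"
  protocol = "HTTP"

  # Set explicitly even though the NEG already identifies the serving port,
  # reflecting the behaviour described in this post.
  port_name = "http"

  health_checks = [google_compute_health_check.app1.self_link]

  # NEG backends are balanced by request rate per endpoint (Pod).
  backend {
    group                 = var.app1_neg
    balancing_mode        = "RATE"
    max_rate_per_endpoint = 100
  }
}
```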

Eventually we reached a good configuration which mirrored the load balancer created for Kubernetes Ingress resources, but used global resources to balance traffic between multiple clusters in different regions.
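The front end of such a load balancer, mirroring what an Ingress configured for HTTPS produces, can be sketched as one global address shared by an HTTP and an HTTPS target proxy, each with its own forwarding rule. This is an assumption-laden sketch: the two variables are hypothetical inputs for a URL map and an SSL certificate defined elsewhere, and all resource names are placeholders.

```hcl
# Hypothetical inputs: self links of an existing URL map and certificate.
variable "url_map" {
  type        = string
  description = "Self link of the URL map shared by both proxies"
}

variable "ssl_certificate" {
  type        = string
  description = "Self link of the GCP SSL certificate used for HTTPS"
}

# A single global anycast address for all applications and regions.
resource "google_compute_global_address" "apps" {
  name = "apps"
}

resource "google_compute_target_http_proxy" "apps" {
  name    = "apps-http"
  url_map = var.url_map
}

resource "google_compute_target_https_proxy" "apps" {
  name             = "apps-https"
  url_map          = var.url_map
  ssl_certificates = [var.ssl_certificate]
}

# One forwarding rule per protocol, both on the same address.
resource "google_compute_global_forwarding_rule" "apps_http" {
  name       = "apps-http"
  ip_address = google_compute_global_address.apps.address
  port_range = "80"
  target     = google_compute_target_http_proxy.apps.self_link
}

resource "google_compute_global_forwarding_rule" "apps_https" {
  name       = "apps-https"
  ip_address = google_compute_global_address.apps.address
  port_range = "443"
  target     = google_compute_target_https_proxy.apps.self_link
}
```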

Terraform NEGs

A key step was referencing the NEGs, created for Kubernetes Services by the ingress-gce controller, in the Terraform project. This is done by creating a google_compute_network_endpoint_group data resource.

The name of each NEG is generated randomly by the ingress-gce controller. This means the Service must be created first, then when the corresponding NEG is created the name can be queried and added to the Terraform project. The name of the NEG is added with the cloud.google.com/neg-status annotation to the Kubernetes Service, which makes it easy to find.

For example, after NEG creation the Service will look like this:

apiVersion: v1
kind: Service
metadata:
  name: app1
  labels:
    app: app1
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"80":{}}}'
    cloud.google.com/neg-status: 'k8s1-1f6cdaf7-default-app1-80-5101da04'

The Terraform data resource will then look something like this:

data "google_compute_network_endpoint_group" "app1_neg_eu_1" {
  name = "k8s1-1f6cdaf7-default-app1-80-5101da04"
  zone = "europe-west2-a"
}

As mentioned previously, the NEG resources are actually zonal. But for a regional cluster they will have the same name for each zone in the region. This means it’s easy to create multiple Terraform data resources, one for each zone.
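One way to express the per-zone lookups without repeating the block is to use for_each on the data resource. The following is a sketch assuming Terraform 0.12.6 or later and a regional cluster spanning the three europe-west2 zones; the zone list and the resource and output names are assumptions, while the NEG name is the example one from this post.

```hcl
locals {
  # Assumed zones of a regional cluster in europe-west2.
  eu_zones = ["europe-west2-a", "europe-west2-b", "europe-west2-c"]
}

# Look up the same NEG name once per zone.
data "google_compute_network_endpoint_group" "app1_neg_eu" {
  for_each = toset(local.eu_zones)

  name = "k8s1-1f6cdaf7-default-app1-80-5101da04"
  zone = each.value
}

# Collect the self links of all the zonal NEGs into one list.
output "app1_neg_eu_self_links" {
  value = [
    for neg in data.google_compute_network_endpoint_group.app1_neg_eu :
    neg.self_link
  ]
}
```

Each self link in the resulting list could then be used as the group of a backend in a backend service.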

Full Control

Now that we have full control of this load balancer it is also easier to adjust, and to add or remove clusters as required. This is particularly useful when adding new clusters, as it allows them to be introduced to the environment by updating the global load balancer, and easily removed if they cause problems. It also allows new clusters to be added in other regions with traffic being directed to them automatically.
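One possible way to make adding and removing clusters a small change is to drive the backends of a backend service from a map keyed by cluster. This is a sketch of that idea rather than the approach taken in the project: the app1_negs and app1_health_check variables, the cluster keys, and the rate limit are all hypothetical.

```hcl
# Hypothetical input: NEG self links for app1, grouped by cluster name,
# for example { "eu" = [...], "us" = [...] }.
variable "app1_negs" {
  type        = map(list(string))
  description = "NEG self links for app1, keyed by cluster"
  default     = {}
}

# Hypothetical input: self link of an existing health check.
variable "app1_health_check" {
  type        = string
  description = "Self link of the health check for app1"
}

resource "google_compute_backend_service" "app1_global" {
  name          = "app1-global"
  protocol      = "HTTP"
  port_name     = "http"
  health_checks = [var.app1_health_check]

  # Generate one backend block per NEG across all clusters.
  dynamic "backend" {
    for_each = flatten(values(var.app1_negs))

    content {
      group                 = backend.value
      balancing_mode        = "RATE"
      max_rate_per_endpoint = 100
    }
  }
}
```

With this shape, introducing a cluster means adding an entry to the map, and removing the entry takes the cluster's Pods out of the traffic flow again.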

Diagram of load balancer using global load balancer created with Terraform and Services of type NodePort with Network Endpoint Groups

This diagram shows the load balancer components, now created by Terraform, in blue. The network endpoint groups are still created by the ingress-gce controller, and are shown in green.

Demo

While initially testing load balancer configurations, a proof of concept was created. This has been converted into a demo to accompany this blog post. It uses Terraform to create two clusters in different regions, add some sample workloads, and then create a global load balancer. The demo includes instructions on how to set everything up and test the load balancer.

https://github.com/wwwil/glb-demo

Conclusion

Switching from having many automatically created load balancers and performing region balancing at the DNS level, to creating a single, global, multi-cluster load balancer with Terraform, required a bit of learning and experimentation. But overall it gave a valuable insight into the internals of GCP load balancers, and allowed the customer to get everything they required for their infrastructure.

In future it would be great to see better support for automatic multi-cluster load balancing on GKE. This could be achieved by providing some way for the cloud-controller or ingress-gce controllers to communicate across clusters, with one designated as master that would create the GCP load balancer components, and the others just providing updates of the NEGs or instances that should be routed to.

Much of the challenge in this project came from moving to HTTP(S) load balancing, but the multi-cluster requirement still presented a lot of difficulty and didn’t feel well supported in terms of the tools and information made available. Needing to move from having load balancers created automatically to creating them with Terraform took a reasonable amount of time in research and experimentation to gain confidence that our solution would work.

Hopefully this post and demo can provide some useful information and a template for creating your own solutions.