GKE on AWS

This is the first in a series of posts looking at Google Cloud Anthos: how it seeks to facilitate digital transformation and to become the management plane for enterprise workloads across hybrid and multi-cloud environments. We start with GKE on AWS, which has recently become generally available.

The value proposition of Anthos is environment agnosticism, with containers and Kubernetes as the common denominator for our workloads. This gives a degree of portability, letting us use Anthos to manage workload deployments and lifecycles across multiple clouds (GCP, AWS and Azure) as well as on-prem data centres (VMware and bare metal).

At Jetstack, we’re seeing an increasing number of clients seeking to adopt or mature their Kubernetes offering, whilst leveraging the advantages of cloud-native principles in line with their business requirements and existing infrastructure investments. The Anthos initiative addresses this need by laying the foundations to migrate and align workloads across organisational boundaries.

To start the series, we’ll cover the facet of Anthos that brings the GKE experience to your AWS environment.

Anthos

Anthos is a framework of software components orientated around managing the deployment and lifecycle of infrastructure and workloads across multiple environments and locations. Many enterprises already have established infrastructure across several cloud providers and on-premises locations. Anthos centralises the orchestration of these segregated clusters and workloads and provides a single pane of glass across hybrid and multi-cloud topologies. This consolidates operations and provides consistency across cloud providers, whilst embracing existing infrastructure investments and unlocking new possibilities for hybrid and multi-cloud compositions. It also allows companies to modernise in place: they can keep running workloads on-prem or on their own infrastructure while adopting Kubernetes and cloud-native principles.

As well as a Kubernetes distribution, Anthos provides ways to simplify hybrid and multi-cloud consistency and compliance through Config Management and Policy Controller. Config Management adopts the GitOps methodology, in which an in-cluster Operator reconciles the observed state of Kubernetes objects with the desired state held in source control. Policy Controller facilitates conformance at scale by building on Gatekeeper. It provides a library of constraint templates that keep configuration consistent and compliant, and you can extend it with your own policies written in Rego for Open Policy Agent (OPA).

Anthos Service Mesh is core to the proposition of running hybrid Kubernetes across cloud and on-premises infrastructure. Built using Istio, it enhances our experience by abstracting and automating cross-cutting concerns, such as issuing workload identities via X.509 certificates to facilitate automatic mutual TLS across our workloads and clusters, and provides mechanisms for layer 7 traffic routing within the mesh.

Anthos Service Mesh also centralises certificate issuance and renewal. This lets segregated clusters establish cross-boundary trust, so that services can mutually authenticate when communicating across those boundaries.

GKE on AWS

Following GKE On-Prem, GKE on AWS is the next step in bringing the GKE experience to your own infrastructure. We can integrate with existing AWS environments and leverage the Anthos stack to provide consistency across our clusters and workloads, whilst operations remain centralised in the GCP Console.

Bringing the GKE experience to AWS lets developers, administrators and architects incorporate GKE, as we know it on Google Cloud, into their existing infrastructure. They can also be selective about workload placement, basing it on business decisions to maximise business value.

This unlocks many of the advantages of multi-cloud. Because the runtime environments are homogeneous, we can focus on the business logic of our applications and deploy anywhere. Highly portable workloads also make it possible to adopt placement strategies that meet high-availability or scaling requirements.

We can also take advantage of the proprietary managed services offered by each cloud provider. This gives us flexibility when adopting a multi-cloud strategy, along with interoperability between our workloads and their infrastructure requirements.

Architecture

GKE on AWS provides the tooling required to deploy GKE into your AWS environment, either creating new resources or working alongside existing ones. Its design reuses the concept seen in the GKE On-Prem architecture: a hierarchical model of management and user clusters. The management cluster uses an AWS Controller to bootstrap the creation of user clusters and to manage their lifecycle.

[Figure: GKE on AWS architecture]

Through the management cluster, we can create GKE clusters with the AWSCluster and AWSNodePool custom resource definitions. It is then the responsibility of the Cluster Operator controller to provision the necessary resources for the user clusters through the AWS APIs. This is implemented through the gke-aws-cluster-operator static pod, which is an application that contains the core cluster management logic.

A single management cluster can administer multiple downstream user clusters. Control plane configuration is stored in etcd, which is persisted on an AWS EBS volume.

Deployment

Management cluster

To begin, we’ll deploy our management cluster using the anthos-gke CLI. This generates the Terraform configuration that provisions the infrastructure hosting our management plane in AWS. That infrastructure includes a dedicated VPC and subnets (or an existing VPC can be used instead), as well as security groups that allow inbound and outbound SSH and HTTPS traffic for the Kubernetes master.

The management cluster lets us administer GKE on AWS ‘user’ clusters, which run our workloads and are provisioned through Kubernetes objects. This hierarchical model of management and user clusters is a core principle of Anthos. Whether with GKE On-Prem on VMware or GKE on AWS, the orchestration of downstream clusters is done entirely through the Kubernetes API and CRDs.

We can begin our management cluster provisioning using anthos-gke. The prerequisites for the installation are:

  • AWS KMS key for encrypting cluster secrets
  • AWS KMS key for securing the management service’s etcd database
  • Inbound SSH CIDR range for the bastion host, which is created in the AWS public subnet and used to access the GKE nodes
  • GCP project with access to Anthos
  • Service accounts and keys to:
    • manage GKE on AWS membership of GKE Hub
    • set up Connect between GKE on AWS and GKE Hub
    • access the gcr.io repository from GKE on AWS nodes

With these prerequisites, we can populate the required config file to configure the management service.

apiVersion: multicloud.cluster.gke.io/v1
kind: AWSManagementService
metadata:
  name: management
spec:
  version: aws-1.4.1-gke.15
  region: eu-west-1
  authentication:
    awsIAM:
      adminIdentityARNs:
      - arn:aws:iam::0123456789012:user/gke-aws-admin
  kmsKeyARN: arn:aws:kms:eu-west-1:0123456789012:key/4172f749-702e-41fe-ac3f-898d21930cb6
  databaseEncryption:
    kmsKeyARN: arn:aws:kms:eu-west-1:0123456789012:key/12560928-59cf-4157-a29f-eecb4ec93fd2
  googleCloud:
    projectID: jetstack-anthos
    serviceAccountKeys:
      managementService: management-key.json
      connectAgent: hub-key.json
      node: node-key.json
  dedicatedVPC:
    vpcCIDRBlock: 10.0.0.0/16
    availabilityZones:
    - eu-west-1a
    - eu-west-1b
    - eu-west-1c
    privateSubnetCIDRBlocks:
    - 10.0.1.0/24
    - 10.0.2.0/24
    - 10.0.3.0/24
    publicSubnetCIDRBlocks:
    - 10.0.4.0/24
    - 10.0.5.0/24
    - 10.0.6.0/24
    bastionAllowedSSHCIDRBlocks:
    - 198.51.100.0/24
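The dedicated VPC and its subnets are declared up front, so it can be worth sanity-checking the address plan before running anthos-gke. The sketch below is a hypothetical pre-flight check of our own and is not part of anthos-gke. The function names ip_to_int, cidr_range, cidr_contains, cidr_overlaps and check_subnets are illustrative. The check confirms that each subnet CIDR falls inside vpcCIDRBlock and that no two subnets overlap. It assumes IPv4 CIDRs written in canonical dotted-quad form.

```shell
# Hypothetical pre-flight check, not part of anthos-gke.
# Assumes IPv4 CIDRs in canonical dotted-quad form, e.g. 10.0.1.0/24.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local ip=$1 a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  local ip=${1%/*} bits=${1#*/} start mask
  start=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  start=$(( start & mask ))
  echo "$start $(( start | (~mask & 0xFFFFFFFF) ))"
}

# Succeed if CIDR $2 lies entirely inside CIDR $1.
cidr_contains() {
  local outer=$1 inner=$2
  set -- $(cidr_range "$outer") $(cidr_range "$inner")
  [ "$3" -ge "$1" ] && [ "$4" -le "$2" ]
}

# Succeed if CIDRs $1 and $2 share at least one address.
cidr_overlaps() {
  local a=$1 b=$2
  set -- $(cidr_range "$a") $(cidr_range "$b")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Check every subnet against the VPC CIDR and against every other subnet.
# Prints one line per problem and returns non-zero if any were found.
check_subnets() {
  local vpc=$1 rc=0 s first other
  shift
  for s in "$@"; do
    cidr_contains "$vpc" "$s" || { echo "outside VPC: $s"; rc=1; }
  done
  while [ "$#" -gt 0 ]; do
    first=$1
    shift
    for other in "$@"; do
      if cidr_overlaps "$first" "$other"; then
        echo "overlap: $first $other"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

For the configuration above, run `check_subnets 10.0.0.0/16 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24 10.0.4.0/24 10.0.5.0/24 10.0.6.0/24`. It prints nothing and returns zero. The same cidr_overlaps helper can also confirm that the pod and service ranges used for the user cluster later on do not collide with the VPC.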

Running anthos-gke aws management init encrypts our service account keys and generates a root CA, writing these values to anthos-gke.status.yaml.

$ anthos-gke aws management init
generating cluster ID
encrypting Google Cloud service account key (Management Service)
encrypting Google Cloud service account key (Connect Agent)
encrypting Google Cloud service account key (Node)
generating root certificate authority (CA)
writing file: anthos-gke.status.yaml

To create the cluster, we need to apply the generated configuration.

$ anthos-gke aws management apply
creating S3 bucket: gke-jetstack-anthos-eu-west-1-bootstrap
writing file: README.md
writing file: backend.tf
writing file: main.tf
writing file: outputs.tf
writing file: variables.tf
writing file: vpc.tf
writing file: terraform.tfvars.json
Initializing modules...
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_bastion_security_group_rules...
- gke_bastion_security_group_rules in .terraform/modules/gke_bastion_security_group_rules/modules/gke-bastion-security-group-rules
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_controlplane_iam_policies...
- gke_controlplane_iam_policies in .terraform/modules/gke_controlplane_iam_policies/modules/gke-controlplane-iam-policies
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_controlplane_iam_role...
- gke_controlplane_iam_role in .terraform/modules/gke_controlplane_iam_role/modules/gke-controlplane-iam-role
Downloading gcs::https://www.googleapis.com/storage/v1/gke-multi-cloud-release/aws/aws-1.4.1-gke.15/modules/terraform-aws-gke.tar.gz for gke_management...
- gke_management in .terraform/modules/gke_management/modules/gke-management
...
Apply complete! Resources: 61 added, 0 changed, 0 destroyed.

This creates an S3 bucket for the gke-aws-node-agent binary, and initialises the Terraform modules which will provision the necessary infrastructure to host the GKE on AWS management cluster.

Once Terraform has successfully completed provisioning the infrastructure, we can see that a bastion host, as well as the necessary EC2 instances, ELBs and security groups have been created.

[Screenshots: management cluster instances, load balancers, security groups, VPC and subnets]

These are Ubuntu 18.04 instances. They are launched into an Auto Scaling group from a launch template whose user data runs gke-aws-node-agent, which bootstraps each instance to function as the management plane.

To reach the management cluster’s Kubernetes API, we open an SSH tunnel through the bastion host. The tunnel also allows anthos-gke to complete the setup by fetching the cluster credentials.

terraform output bastion_tunnel > bastion-tunnel.sh
chmod 755 bastion-tunnel.sh
./bastion-tunnel.sh -N &
anthos-gke aws management get-credentials

After this we can connect to the management cluster using kubectl.

$ kubectx gke_aws_management_gke-404767c1 && \
   env HTTP_PROXY=http://127.0.0.1:8118 kubectl cluster-info
Kubernetes master is running at https://gke-404767c1-management-06afb2d2341f17cb.elb.eu-west-1.amazonaws.com
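Every kubectl call against the management cluster in this walkthrough is prefixed with env HTTP_PROXY=http://127.0.0.1:8118, the local proxy address used with the bastion tunnel. To save typing, a small shell function can add the prefix for us. This is a sketch of our own: via_tunnel and the optional TUNNEL_PROXY override are hypothetical names, and the default simply mirrors the address used throughout this post.

```shell
# Hypothetical convenience wrapper, not part of anthos-gke.
# Runs the given command with HTTP_PROXY pointing at the bastion tunnel proxy.
# TUNNEL_PROXY (our own variable) can override the default address.
via_tunnel() {
  env HTTP_PROXY="${TUNNEL_PROXY:-http://127.0.0.1:8118}" "$@"
}
```

With this defined, `via_tunnel kubectl get awsclusters,awsnodepools` is equivalent to the longer env form used in the following sections.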

AWSCluster

Now that we have our management cluster, we can provision user clusters to run our workloads. On AWS, user clusters are represented by the AWSCluster and AWSNodePool custom resources. These provide a declarative, Kubernetes-style API for cluster creation, configuration and management.

If we take a look at the CRDs currently on the management cluster, we can see the two resources available to provision AWS clusters.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get crd
NAME                                     CREATED AT
awsclusters.multicloud.cluster.gke.io    2020-07-27T09:57:57Z
awsnodepools.multicloud.cluster.gke.io   2020-07-27T09:57:57Z

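Our previous Terraform deployment can also generate a configuration for a basic user cluster, which we write to cluster-0.yaml.

terraform output cluster_example > cluster-0.yaml

The generated file defines one AWSCluster and three AWSNodePools, each node pool in a different subnet: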
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSCluster
metadata:
  name: cluster-0
spec:
  region: eu-west-1
  authentication:
    awsIAM:
      adminIdentityARNs:
      - arn:aws:iam::0123456789012:user/gke-aws-admin
  networking:
    vpcID: vpc-027d53db543f33b56
    serviceAddressCIDRBlocks:
    - 10.1.0.0/16
    podAddressCIDRBlocks:
    - 10.2.0.0/16
    serviceLoadBalancerSubnetIDs:
    - subnet-08fdd9360320f5f27
    - subnet-03babff7f9b5d4a7b
    - subnet-072949cee3d082184
    - subnet-06efa94e317c13730
    - subnet-03808c78800c82a9d
    - subnet-07b471ccadb908e9b
  controlPlane:
    version: 1.16.9-gke.12
    keyName: gke-404767c1-keypair
    instanceType: t3.medium
    iamInstanceProfile: gke-404767c1-controlplane
    securityGroupIDs:
    - sg-0372ceaae4fc17084
    subnetIDs:
    - subnet-08fdd9360320f5f27
    - subnet-03babff7f9b5d4a7b
    - subnet-072949cee3d082184
    rootVolume:
      sizeGiB: 10
    etcd:
      mainVolume:
        sizeGiB: 10
    databaseEncryption:
      kmsKeyARN: arn:aws:kms:eu-west-1:0123456789012:key/12560928-59cf-4157-a29f-eecb4ec93fd2
    hub:
      membershipName: projects/jetstack-anthos/locations/global/memberships/cluster-0
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
  name: cluster-0-pool-0
spec:
  clusterName: cluster-0
  version: 1.16.9-gke.12
  region: eu-west-1
  subnetID: subnet-08fdd9360320f5f27
  minNodeCount: 3
  maxNodeCount: 5
  instanceType: t3.medium
  keyName: gke-404767c1-keypair
  iamInstanceProfile: gke-404767c1-nodepool
  maxPodsPerNode: 100
  securityGroupIDs:
  - sg-0372ceaae4fc17084
  rootVolume:
    sizeGiB: 10
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
  name: cluster-0-pool-1
spec:
  clusterName: cluster-0
  version: 1.16.9-gke.12
  region: eu-west-1
  subnetID: subnet-03babff7f9b5d4a7b
  minNodeCount: 3
  maxNodeCount: 5
  instanceType: t3.medium
  keyName: gke-404767c1-keypair
  iamInstanceProfile: gke-404767c1-nodepool
  maxPodsPerNode: 100
  securityGroupIDs:
  - sg-0372ceaae4fc17084
  rootVolume:
    sizeGiB: 10
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
  name: cluster-0-pool-2
spec:
  clusterName: cluster-0
  version: 1.16.9-gke.12
  region: eu-west-1
  subnetID: subnet-072949cee3d082184
  minNodeCount: 3
  maxNodeCount: 5
  instanceType: t3.medium
  keyName: gke-404767c1-keypair
  iamInstanceProfile: gke-404767c1-nodepool
  maxPodsPerNode: 100
  securityGroupIDs:
  - sg-0372ceaae4fc17084
  rootVolume:
    sizeGiB: 10
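The three AWSNodePool resources above differ only in their name and subnetID. If you want one pool per subnet without copying YAML by hand, a heredoc inside a loop can generate them. This is a hypothetical helper of our own, not output from anthos-gke or Terraform. The name nodepool_manifests is illustrative, and every field other than the name, clusterName and subnetID is copied verbatim from the generated example, so adjust those values for your environment.

```shell
# Hypothetical generator, not part of anthos-gke.
# Emits one AWSNodePool per subnet ID given after the cluster name.
# All other fields are copied from the Terraform-generated example.
nodepool_manifests() {
  local cluster=$1 i=0 subnet
  shift
  for subnet in "$@"; do
    cat <<EOF
---
apiVersion: multicloud.cluster.gke.io/v1
kind: AWSNodePool
metadata:
  name: ${cluster}-pool-${i}
spec:
  clusterName: ${cluster}
  version: 1.16.9-gke.12
  region: eu-west-1
  subnetID: ${subnet}
  minNodeCount: 3
  maxNodeCount: 5
  instanceType: t3.medium
  keyName: gke-404767c1-keypair
  iamInstanceProfile: gke-404767c1-nodepool
  maxPodsPerNode: 100
  securityGroupIDs:
  - sg-0372ceaae4fc17084
  rootVolume:
    sizeGiB: 10
EOF
    i=$((i + 1))
  done
}
```

For example, `nodepool_manifests cluster-0 subnet-08fdd9360320f5f27 subnet-03babff7f9b5d4a7b subnet-072949cee3d082184 > cluster-0-pools.yaml` reproduces the three pools above. The leading `---` separator on each document is valid multi-document YAML.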

We can submit the AWSCluster and AWSNodePool resources to initiate the cluster creation.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl apply -f cluster-0.yaml
awscluster.multicloud.cluster.gke.io/cluster-0 created
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 created
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-1 created
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-2 created

Because these are standard Kubernetes custom resources, we can query them with kubectl to see their state.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get AWSClusters,AWSNodePools
NAME                                             STATE          AGE   VERSION         ENDPOINT
awscluster.multicloud.cluster.gke.io/cluster-0   Provisioning   22s   1.16.9-gke.12   gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com

NAME                                                     CLUSTER     STATE          AGE   VERSION
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0   cluster-0   Provisioning   22s   1.16.9-gke.12
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-1   cluster-0   Provisioning   22s   1.16.9-gke.12
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-2   cluster-0   Provisioning   22s   1.16.9-gke.12

At this point, the AWS Controller is provisioning the AWS resources that make up our user cluster. The default GKE on AWS installation creates an AWSCluster with three control plane replicas. These are placed in the private subnets listed under controlPlane.subnetIDs, behind an AWS Network Load Balancer (NLB). The management cluster communicates with the control plane through that NLB.
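Provisioning takes a few minutes, so rather than re-running kubectl get by hand we can poll until the STATE column reads Provisioned. The following is a sketch under our own assumptions. The names wait_for_state, awscluster_state, TRIES and INTERVAL are hypothetical. awscluster_state relies on STATE being the second column of the kubectl get table shown above, so check the column layout for your version before relying on it.

```shell
# Hypothetical polling helper, not part of anthos-gke.
# Re-runs a command until its output equals the wanted value.
# TRIES (default 60) and INTERVAL in seconds (default 10) are our own knobs.
wait_for_state() {
  local want=$1 tries=${TRIES:-60} interval=${INTERVAL:-10} n=0 got=
  shift
  while [ "$n" -lt "$tries" ]; do
    # Treat a failed command (e.g. a dropped tunnel) as "no state yet".
    got=$("$@") || got=
    [ "$got" = "$want" ] && return 0
    n=$((n + 1))
    sleep "$interval"
  done
  echo "gave up after $tries attempts; last state: ${got:-unknown}" >&2
  return 1
}

# Print the STATE column for one AWSCluster, assuming the column order
# NAME STATE AGE VERSION ENDPOINT seen in the kubectl output above.
awscluster_state() {
  env HTTP_PROXY=http://127.0.0.1:8118 \
    kubectl get awscluster "$1" --no-headers | awk '{print $2}'
}
```

Running `wait_for_state Provisioned awscluster_state cluster-0` then blocks until the cluster reports Provisioned, or gives up after roughly ten minutes with the default settings.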

[Screenshots: user cluster instances, management cluster load balancers and security groups]

Once the cluster bootstrap process is complete, we can inspect the events recorded on the management cluster.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get events
LAST SEEN   TYPE     REASON                        OBJECT                         MESSAGE
4m56s       Normal   StartedNodePoolProvisioning   awsnodepool/cluster-0-pool-0   Started node pool provisioning
4m45s       Normal   CreatedLaunchTemplate         awsnodepool/cluster-0-pool-0   Created launch template named "gke-dfccaa67-nodepool-3125db34-1.16.9-gke.12"
4m41s       Normal   CreatedAutoScalingGroup       awsnodepool/cluster-0-pool-0   Created auto scaling group named "gke-dfccaa67-nodepool-3125db34"
4m56s       Normal   StartedNodePoolProvisioning   awsnodepool/cluster-0-pool-1   Started node pool provisioning
4m44s       Normal   CreatedLaunchTemplate         awsnodepool/cluster-0-pool-1   Created launch template named "gke-dfccaa67-nodepool-fdca9ec5-1.16.9-gke.12"
4m40s       Normal   CreatedAutoScalingGroup       awsnodepool/cluster-0-pool-1   Created auto scaling group named "gke-dfccaa67-nodepool-fdca9ec5"
4m56s       Normal   StartedNodePoolProvisioning   awsnodepool/cluster-0-pool-2   Started node pool provisioning
4m43s       Normal   CreatedLaunchTemplate         awsnodepool/cluster-0-pool-2   Created launch template named "gke-dfccaa67-nodepool-76e54252-1.16.9-gke.12"
4m38s       Normal   CreatedAutoScalingGroup       awsnodepool/cluster-0-pool-2   Created auto scaling group named "gke-dfccaa67-nodepool-76e54252"
4m56s       Normal   CreatingCluster               awscluster/cluster-0           Cluster version 1.16.9-gke.12 is being created
4m52s       Normal   TagSubnets                    awscluster/cluster-0           Tagged subnets ["subnet-08fdd9360320f5f27" "subnet-03babff7f9b5d4a7b" "subnet-072949cee3d082184" "subnet-06efa94e317c13730" "subnet-03808c78800c82a9d" "subnet-07b471ccadb908e9b"] with tags map["kubernetes.io/cluster/gke-dfccaa67":"shared"]
4m51s       Normal   CreatedSecurityGroup          awscluster/cluster-0           Created security group named "gke-dfccaa67-controlplane"
4m51s       Normal   CreatedSecurityGroup          awscluster/cluster-0           Created security group named "gke-dfccaa67-nodepool"
4m51s       Normal   CreatedEtcdVolume             awscluster/cluster-0           Created etcd volume on replica 0
4m50s       Normal   CreatedEtcdVolume             awscluster/cluster-0           Created etcd volume on replica 1
4m50s       Normal   CreatedEtcdVolume             awscluster/cluster-0           Created etcd volume on replica 2
4m50s       Normal   CreatedNetworkLoadBalancer    awscluster/cluster-0           Created network load balancer named "gke-dfccaa67-controlplane"
4m49s       Normal   CreatedTargetGroup            awscluster/cluster-0           Created target group named "gke-dfccaa67-controlplane"
4m48s       Normal   CreatedNetworkInterface       awscluster/cluster-0           Created network interface on replica 0
4m48s       Normal   CreatedNetworkInterface       awscluster/cluster-0           Created network interface on replica 1
4m47s       Normal   CreatedNetworkInterface       awscluster/cluster-0           Created network interface on replica 2
4m47s       Normal   CreatedListener               awscluster/cluster-0           Created listener on load balancer with ARN "arn:aws:elasticloadbalancing:eu-west-1:0123456789012:loadbalancer/net/gke-dfccaa67-controlplane/f26573d5bef4bba0"
4m46s       Normal   CreatedLaunchTemplate         awscluster/cluster-0           Created launch template named "gke-dfccaa67-controlplane-0-1.16.9-gke.12"
4m46s       Normal   CreatedLaunchTemplate         awscluster/cluster-0           Created launch template named "gke-dfccaa67-controlplane-1-1.16.9-gke.12"
4m46s       Normal   CreatedLaunchTemplate         awscluster/cluster-0           Created launch template named "gke-dfccaa67-controlplane-2-1.16.9-gke.12"
4m43s       Normal   CreatedAutoScalingGroup       awscluster/cluster-0           Created auto scaling group named "gke-dfccaa67-controlplane-0"
4m42s       Normal   CreatedAutoScalingGroup       awscluster/cluster-0           Created auto scaling group named "gke-dfccaa67-controlplane-1"
4m41s       Normal   CreatedAutoScalingGroup       awscluster/cluster-0           Created auto scaling group named "gke-dfccaa67-controlplane-2"
0s          Normal   RegisteredGKEHubMembership    awscluster/cluster-0           Registered to GKE Hub using membership "projects/jetstack-anthos/locations/global/memberships/cluster-0"
0s          Normal   AddonsApplied                 awscluster/cluster-0           Addons applied for version 1.16.9-gke.12
0s          Normal   InstalledGKEHubAgent          awscluster/cluster-0           Installed GKE Hub agent
0s          Normal   ClusterProvisioned            awscluster/cluster-0           Cluster version 1.16.9-gke.12 has been provisioned
0s          Normal   ProvisionedNodePool           awsnodepool/cluster-0-pool-1   Node pool provisioned
0s          Normal   ProvisionedNodePool           awsnodepool/cluster-0-pool-2   Node pool provisioned
0s          Normal   ProvisionedNodePool           awsnodepool/cluster-0-pool-0   Node pool provisioned
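The events name most of the AWS resources the controller created, such as launch templates, Auto Scaling groups, security groups and the control plane NLB. These names are handy when cross-referencing with the AWS console, and a small awk filter can pull them out. This is a hypothetical helper of our own (created_aws_resources is an illustrative name). It assumes the default kubectl get events column layout shown above, where REASON is the third column and each resource name follows the word named in double quotes.

```shell
# Hypothetical helper, not part of anthos-gke.
# Reads 'kubectl get events' table output on stdin and prints
# "<REASON> <resource name>" for every event that names a resource.
created_aws_resources() {
  awk '{
    # REASON is the third column: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE.
    if (match($0, /named "[^"]*"/)) {
      # Skip the 7 characters of: named " and drop the closing quote.
      print $3, substr($0, RSTART + 7, RLENGTH - 8)
    }
  }'
}
```

Usage: `env HTTP_PROXY=http://127.0.0.1:8118 kubectl get events | created_aws_resources`. Lines without a quoted name, such as the etcd volume events, are skipped.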

We can also view the gke-aws-cluster-operator logs directly on a management cluster instance, using crictl to locate the operator’s container.

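# Find the most recent gke-aws-cluster-operator pod sandbox on this instance
export POD_ID=$(sudo crictl pods --name gke-aws-cluster-operator --latest --quiet)
# Find the most recent container within that pod
export CONTAINER_ID=$(sudo crictl ps --pod "$POD_ID" --latest --quiet)
# Print the operator container's logs
sudo crictl logs "$CONTAINER_ID"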
{"level":"info","ts":1594307993.1424375,"logger":"setup","msg":"starting cluster controller","version":"aws-0.2.1-gke.7"}
{"level":"info","ts":1594307993.2432592,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"awsnodepool-reconciler"}
{"level":"info","ts":1594307993.243259,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"awscluster-reconciler"}
{"level":"info","ts":1594307993.343698,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"awsnodepool-reconciler","worker count":1}
{"level":"info","ts":1594307993.3439271,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"awscluster-reconciler","worker count":1}
{"level":"info","ts":1594308345.2519808,"msg":"Validating AWSCluster create"}
{"level":"info","ts":1594308345.257065,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308345.2595742,"logger":"controlplane-reconciler","msg":"adding finalizer","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308345.2624917,"msg":"Validating AWSCluster update"}
{"level":"info","ts":1594308345.264924,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308345.376223,"msg":"Validating AWSNodePool create"}
{"level":"info","ts":1594308345.3793032,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308345.382418,"logger":"nodepool-reconciler","msg":"adding finalizer","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308345.384661,"msg":"Validating AWSNodePool update"}
{"level":"info","ts":1594308345.387612,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308346.1197634,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.SetProvisioningStateCommand"}
{"level":"info","ts":1594308346.1245346,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308346.125778,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308346.1586978,"msg":"Planning provisioning"}
{"level":"info","ts":1594308346.1587348,"logger":"nodepool-reconciler","msg":"executing command","command":"*nodepool.SetProvisioningStateCommand"}
{"level":"info","ts":1594308346.1637373,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308346.1641495,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308346.6994834,"msg":"Planning provisioning"}
{"level":"info","ts":1594308346.6995149,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308350.6837356,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateSecurityGroupCommand"}
{"level":"info","ts":1594308351.0741904,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateSecurityGroupCommand"}
{"level":"info","ts":1594308351.6007662,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateEtcdVolumeCommand"}
{"level":"info","ts":1594308351.7904568,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateEtcdVolumeCommand"}
{"level":"info","ts":1594308351.9867935,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateEtcdVolumeCommand"}
{"level":"info","ts":1594308352.1732028,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkLoadBalancerCommand"}
{"level":"info","ts":1594308352.6873991,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateTargetGroupCommand"}
{"level":"info","ts":1594308352.8467724,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateRootCASecretCommand"}
{"level":"info","ts":1594308352.864171,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateKeyPairSecretCommand"}
{"level":"info","ts":1594308352.8721669,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateSSHKeySecretCommand"}
{"level":"info","ts":1594308352.8793178,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateHubMembershipCommand"}
{"level":"info","ts":1594308353.4512975,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308353.4515018,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308354.3009841,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupIngressCommand"}
{"level":"info","ts":1594308354.4307039,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupEgressCommand"}
{"level":"info","ts":1594308354.568091,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupIngressCommand"}
{"level":"info","ts":1594308354.7273095,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.AddSecurityGroupEgressCommand"}
{"level":"info","ts":1594308354.881284,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkInterfaceCommand"}
{"level":"info","ts":1594308355.101913,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkInterfaceCommand"}
{"level":"info","ts":1594308355.3225775,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateNetworkInterfaceCommand"}
{"level":"info","ts":1594308355.5158548,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateListenerCommand"}
{"level":"info","ts":1594308355.5431557,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAdminCertSecretCommand"}
{"level":"info","ts":1594308355.5547044,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308355.554765,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308356.164133,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308356.523212,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308356.633643,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308356.747063,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308356.8503606,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308356.8505132,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308357.0783868,"msg":"Planning provisioning"}
{"level":"info","ts":1594308357.2415674,"logger":"nodepool-reconciler","msg":"executing command","command":"*nodepool.CreateLaunchTemplateCommand"}
{"level":"info","ts":1594308357.3434715,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308357.3441882,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308357.5997462,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308358.355025,"msg":"Planning provisioning"}
{"level":"info","ts":1594308358.3550591,"logger":"nodepool-reconciler","msg":"executing command","command":"*nodepool.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308358.8569715,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308359.884949,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.CreateAutoScalingGroupCommand"}
{"level":"info","ts":1594308597.8016522,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308598.6790671,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.ApplyAddonsCommand"}
{"level":"info","ts":1594308604.1369157,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308606.138593,"msg":"Planning provisioning"}
{"level":"info","ts":1594308606.1387239,"msg":"Waiting for nodes to join the cluster","readyNodes":0,"expected":3}
{"level":"info","ts":1594308606.1387453,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308616.1390402,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308617.4870696,"msg":"Planning provisioning"}
{"level":"info","ts":1594308617.4871018,"msg":"Waiting for nodes to join the cluster","readyNodes":0,"expected":3}
{"level":"info","ts":1594308617.4871092,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308618.3437061,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.InstallConnectAgentCommand"}
{"level":"info","ts":1594308623.9253933,"logger":"controlplane-reconciler","msg":"reconciliation finished but more work is needed","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308623.925449,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308624.7936692,"logger":"controlplane-reconciler","msg":"executing command","command":"*controlplane.SetProvisionedStateCommand"}
{"level":"info","ts":1594308624.799627,"logger":"controlplane-reconciler","msg":"reconciliation finished","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308624.7996898,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308625.5904305,"logger":"controlplane-reconciler","msg":"reconciliation finished","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308627.4873798,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308628.643484,"msg":"Planning provisioning"}
{"level":"info","ts":1594308628.6435158,"msg":"Waiting for nodes to join the cluster","readyNodes":1,"expected":3}
{"level":"info","ts":1594308628.6435242,"logger":"nodepool-reconciler","msg":"reconciliation finished but more work is needed","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308633.9257207,"logger":"controlplane-reconciler","msg":"starting reconciliation","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308634.866224,"logger":"controlplane-reconciler","msg":"reconciliation finished","awscluster":"default/cluster-0"}
{"level":"info","ts":1594308638.6436768,"logger":"nodepool-reconciler","msg":"starting reconciliation","awsnodepool":"default/pool-0"}
{"level":"info","ts":1594308639.952066,"msg":"Planning provisioning"}
{"level":"info","ts":1594308639.9520943,"msg":"All nodes ready, switching to 'Provisioned' state"}
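
Because the operator logs one JSON object per line, its progress can be summarised with standard tools. The following is a minimal sketch, assuming the log lines have already been captured (for example, saved to a file) and are fed in on stdin. `node_join_progress` is a hypothetical helper name. It relies only on the `readyNodes`/`expected` fields and the 'Provisioned' message visible above.

```shell
# Hypothetical helper: summarise node-join progress from gke-aws-cluster-operator
# JSON log lines (one object per line) read on stdin.
# Prints "ready/expected" for every "Waiting for nodes" entry, then
# "provisioned" if the final state transition appears in the log.
node_join_progress() {
  awk '
    # Extract the readyNodes/expected pair from lines that carry it.
    match($0, /"readyNodes":[0-9]+,"expected":[0-9]+/) {
      pair = substr($0, RSTART, RLENGTH)
      gsub(/"readyNodes":|"expected":/, "", pair)
      sub(/,/, "/", pair)
      print pair
    }
    # The operator logs this once every node has joined. The dots stand in
    # for the single quotes around Provisioned, which cannot appear here.
    /switching to .Provisioned. state/ { print "provisioned" }
  '
}
```

For example, `node_join_progress < operator.log` (file name hypothetical) prints one ready/expected pair per waiting entry, followed by `provisioned` once the state transition has been logged.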

With our GKE on AWS user cluster fully provisioned, we will see in the GCP console that the cluster has automatically been registered with the GKE Hub.

[Figure: user cluster registered]

As part of the bootstrap process, our cluster is registered with the GKE Hub using the service account key provided, and a Connect agent is deployed into the user cluster. Once the connection is established, the Connect agent can exchange account credentials, technical details, and metadata about the connected infrastructure and workloads that Google Cloud needs in order to manage them, including details of resources, applications, and hardware.

All that remains is to obtain our kubeconfig for the user cluster.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   anthos-gke aws clusters get-credentials cluster-0

$ kubectx gke_aws_default_cluster-0_gke-dfccaa67 && \
   env HTTP_PROXY=http://127.0.0.1:8118 kubectl cluster-info
Switched to context "gke_aws_default_cluster-0_gke-dfccaa67".
Kubernetes master is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com
CoreDNS is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
KubeDNSUpstream is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns-upstream:dns/proxy
Metrics-server is running at https://gke-dfccaa67-controlplane-f26573d5bef4bba0.elb.eu-west-1.amazonaws.com/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
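
Every command against either cluster has to go through the local proxy tunnel on 127.0.0.1:8118, hence the repeated env prefix. A small wrapper keeps the proxy scoped to the commands that need it. This is a sketch, not part of the anthos-gke tooling: `with_tunnel` and the `ANTHOS_PROXY` override are hypothetical names. It sets HTTPS_PROXY as well as HTTP_PROXY because Go programs such as kubectl take the proxy for https:// endpoints from HTTPS_PROXY.

```shell
# Hypothetical wrapper: run any command through the local tunnel used in
# this walkthrough. ANTHOS_PROXY is an assumed override variable, not
# something anthos-gke or kubectl reads.
with_tunnel() {
  _proxy="${ANTHOS_PROXY:-http://127.0.0.1:8118}"
  # Go clients pick the proxy by URL scheme: HTTP_PROXY for http://,
  # HTTPS_PROXY for https:// (which the Kubernetes API endpoints use).
  env HTTP_PROXY="$_proxy" HTTPS_PROXY="$_proxy" "$@"
}

# Usage:
#   with_tunnel anthos-gke aws clusters get-credentials cluster-0
#   with_tunnel kubectl cluster-info
```

Using a function rather than exporting the variables means other tools in the same shell, such as the aws CLI, are not silently routed through the tunnel.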

Workloads

Now that our user cluster is deployed, we can start deploying workloads and see how the AWS Controller dynamically provisions AWS resources in response to our Kubernetes objects.

For this demonstration, we’ll use the Online Boutique sample application: a microservices application that exercises the underlying integrations for persistent storage, load balancing and routing on GKE on AWS.

env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml

Recall that the Connect agent running in the user cluster sends details and metadata about the cluster’s infrastructure and workloads to the GKE Hub. Consequently, all of our workloads are visible in the Kubernetes Engine dashboard, just as if they were running in any other GKE cluster.

[Figure: management cluster SGs]

Load Balancing

With our Online Boutique application deployed, a LoadBalancer Service has been created to make it publicly available. However, a few steps are necessary before the AWS Controller can route traffic to the service.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get svc frontend-external
NAME                TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
frontend-external   LoadBalancer   10.1.113.141   <pending>     80:31128/TCP   17s

First, for the AWS Controller to configure our load balancer, we need to tag the subnet with the cluster ID to ensure correct placement. Whether we want our load balancer to be public or private, these subnet tags allow the AWS Controller to find the subnets in which the GKE cluster has been deployed. By default, the AWS Controller creates a Classic ELB in the public subnet corresponding to the GKE cluster’s placement. If we want to deploy a Network Load Balancer, or to create a load balancer in the private subnet, we can annotate the Service with service.beta.kubernetes.io/aws-load-balancer-type: "nlb" or service.beta.kubernetes.io/aws-load-balancer-internal: "true" respectively.

aws ec2 create-tags \
   --resources $SUBNET_ID \
   --tags Key=kubernetes.io/cluster/$CLUSTER_ID,Value=shared
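
The tag key embeds the cluster ID. Assuming that ID is the final underscore-separated field of the user cluster’s kubeconfig context name (gke-dfccaa67 in gke_aws_default_cluster-0_gke-dfccaa67, which also prefixes the control-plane ELB hostname above), it can be derived rather than copied by hand. This is an inference from the names in this walkthrough, so confirm it against your own cluster; both helper names are hypothetical.

```shell
# Hypothetical helpers for building the subnet tag used above.
# Assumption: the user cluster's context name ends in "_<cluster-id>",
# e.g. gke_aws_default_cluster-0_gke-dfccaa67 -> gke-dfccaa67.
cluster_id_from_context() {
  printf '%s\n' "${1##*_}"   # strip everything up to the last underscore
}

# Render the --tags argument for aws ec2 create-tags.
subnet_tag_spec() {
  printf 'Key=kubernetes.io/cluster/%s,Value=shared\n' "$1"
}

# Usage, with the user cluster context selected and SUBNET_ID set:
#   CLUSTER_ID=$(cluster_id_from_context "$(kubectl config current-context)")
#   aws ec2 create-tags --resources "$SUBNET_ID" --tags "$(subnet_tag_spec "$CLUSTER_ID")"
```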

Now, if we watch our frontend-external service, we get a public endpoint, with an ELB created in AWS:

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get svc frontend-external
NAME                TYPE           CLUSTER-IP     EXTERNAL-IP                                                              PORT(S)        AGE
frontend-external   LoadBalancer   10.1.113.141   a52ebaf96215a4c489f7b47c0eafb4f1-523240740.eu-west-1.elb.amazonaws.com   80:31128/TCP   18m
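
Rather than watching the Service by eye, a script can poll until the AWS Controller publishes the ELB hostname in the Service status (`.status.loadBalancer.ingress[0].hostname`). The sketch below is a generic retry loop; `wait_for_output` is a hypothetical helper, and the attempt count and interval in the example are arbitrary.

```shell
# Hypothetical helper: re-run a command until it prints something
# non-empty, up to a fixed number of attempts. Returns 1 on timeout.
# Usage: wait_for_output <attempts> <interval-seconds> <command...>
wait_for_output() {
  tries="$1"; interval="$2"; shift 2
  while [ "$tries" -gt 0 ]; do
    # Treat a failing command (e.g. status not populated yet) as empty output.
    out=$("$@" 2>/dev/null) || out=""
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}

# Example: wait up to ~10 minutes for the ELB hostname.
#   wait_for_output 60 10 env HTTP_PROXY=http://127.0.0.1:8118 \
#     kubectl get svc frontend-external \
#     -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```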

[Figure: frontend-external load balancer]

[Figure: Online Boutique]
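
To try the Network Load Balancer or internal variants described earlier without touching the original frontend-external Service, a second Service can be created with the annotations applied. A minimal sketch: `render_frontend_lb` is a hypothetical helper, and its selector (app: frontend) and ports (80 to 8080) are assumed to mirror the upstream Online Boutique frontend-external Service, so check them against the manifest you applied. The subnet tagging above is still required.

```shell
# Hypothetical generator for an extra LoadBalancer Service in front of the
# Online Boutique frontend, with the AWS annotations discussed above.
# Usage: render_frontend_lb <name> <nlb|elb> <internal|public>
render_frontend_lb() {
  name="$1"; lb_type="$2"; scheme="$3"
  printf 'apiVersion: v1\nkind: Service\nmetadata:\n  name: %s\n' "$name"
  if [ "$lb_type" = "nlb" ] || [ "$scheme" = "internal" ]; then
    printf '  annotations:\n'
    if [ "$lb_type" = "nlb" ]; then
      # Request a Network Load Balancer instead of the default Classic ELB.
      printf '    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"\n'
    fi
    if [ "$scheme" = "internal" ]; then
      # Place the load balancer in the private subnet.
      printf '    service.beta.kubernetes.io/aws-load-balancer-internal: "true"\n'
    fi
  fi
  # Selector and ports assumed to match the upstream frontend-external Service.
  cat <<'EOF'
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
  - name: http
    port: 80
    targetPort: 8080
EOF
}

# Usage:
#   render_frontend_lb frontend-nlb nlb public | \
#     env HTTP_PROXY=http://127.0.0.1:8118 kubectl apply -f -
```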

Storage

Persistent storage can be created for workloads within GKE on AWS using PersistentVolume, PersistentVolumeClaim and StorageClass resources, providing persistent file and block storage.

Creating a PersistentVolumeClaim without the field spec.storageClassName set provisions a gp2 volume using the default GKE on AWS EBS CSI Driver StorageClass.
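
To check which StorageClass a claim without spec.storageClassName will use, look for the class kubectl marks as (default), i.e. the one annotated storageclass.kubernetes.io/is-default-class: "true". The sketch below picks it out of the table output; `default_storageclass` is a hypothetical name. Judging by the claim output later in this section, on this cluster it would be expected to report standard-rwo.

```shell
# Hypothetical helper: print the name of the StorageClass marked
# "(default)" in `kubectl get storageclass` table output read on stdin.
default_storageclass() {
  # Skip the header row; kubectl appends " (default)" to the default's name.
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage:
#   env HTTP_PROXY=http://127.0.0.1:8118 kubectl get storageclass | default_storageclass
```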

First, let’s create our PersistentVolumeClaim:

env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
 name: redis-pvc
spec:
 accessModes:
   - ReadWriteOnce
 resources:
   requests:
     storage: 1Gi
EOF

Then patch our redis deployment:

env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl patch deploy redis-cart --patch '{
 "spec": {
     "template": {
         "spec": {
             "containers": [{
                 "name": "redis",
                 "volumeMounts": [{
                     "mountPath": "/data",
                     "name": "redis-pvc"
                 }]
             }],
             "volumes": [{
                 "name": "redis-pvc",
                 "persistentVolumeClaim": {
                     "claimName": "redis-pvc"
                 }
             }]
         }
     }
 }
}'

Now we can see that the PersistentVolumeClaim is bound to a volume, with an EBS volume created in AWS.

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get pvc,pv
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/redis-pvc   Bound    pvc-78366255-3a17-4d16-8665-c9d5c6c8a88e   1Gi        RWO            standard-rwo   35m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
persistentvolume/pvc-78366255-3a17-4d16-8665-c9d5c6c8a88e   1Gi        RWO            Delete           Bound    default/redis-pvc   standard-rwo            13m

[Figure: persistent volume]

Autoscaling

Anthos GKE on AWS also supports cluster autoscaling, benefiting from the elasticity of dynamically provisioning nodes in accordance with demand. This ensures that resources are available to meet the resource requests of workloads, and that infrastructure is scaled in to optimise cost. Again, this is all achieved through custom resources: the AWSNodePool resource defines minNodeCount and maxNodeCount, and the gke-aws-cluster-operator adjusts the capacity of the AWS Auto Scaling group. All this aims to simplify scaling logic and reduce cognitive overhead when it comes to cost-efficient and performant compute infrastructure.
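
Both autoscaling bounds can be changed in a single JSON Patch. This sketch assumes maxNodeCount sits at /spec/maxNodeCount, next to the /spec/minNodeCount path used in the patch that follows; `nodepool_bounds_patch` is a hypothetical helper that also rejects a minimum above the maximum before anything reaches the API server. Note that a JSON Patch "replace" fails if the target field is not already set.

```shell
# Hypothetical helper: render a JSON Patch that sets both autoscaling
# bounds on an AWSNodePool. Assumes the field paths /spec/minNodeCount
# and /spec/maxNodeCount.
# Usage: nodepool_bounds_patch <min> <max>
nodepool_bounds_patch() {
  min="$1"; max="$2"
  # Reject bounds the autoscaler could never satisfy.
  if [ "$min" -lt 0 ] || [ "$min" -gt "$max" ]; then
    echo "invalid bounds: min=$min max=$max" >&2
    return 1
  fi
  printf '[{"op": "replace", "path": "/spec/minNodeCount", "value": %d},' "$min"
  printf ' {"op": "replace", "path": "/spec/maxNodeCount", "value": %d}]\n' "$max"
}

# Usage, against the management cluster:
#   kubectx gke_aws_management_gke-404767c1 && \
#     env HTTP_PROXY=http://127.0.0.1:8118 kubectl patch AWSNodePool cluster-0-pool-0 \
#     --type=json -p "$(nodepool_bounds_patch 2 5)"
```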

$ kubectx gke_aws_management_gke-404767c1 && \
   env HTTP_PROXY=http://127.0.0.1:8118 kubectl patch AWSNodePool cluster-0-pool-0 --type=json -p='[{"op": "replace", "path": "/spec/minNodeCount", "value": 4}]'
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 patched

$ env HTTP_PROXY=http://127.0.0.1:8118 \
   kubectl get AWSNodepool
NAME               CLUSTER     STATE         AGE   VERSION
cluster-0-pool-0   cluster-0   Resizing      38m   1.16.9-gke.12
cluster-0-pool-1   cluster-0   Provisioned   38m   1.16.9-gke.12
cluster-0-pool-2   cluster-0   Provisioned   38m   1.16.9-gke.12

$ kubectx gke_aws_default_cluster-0_gke-dfccaa67 && \
   env HTTP_PROXY=http://127.0.0.1:8118 kubectl get nodes
Switched to context "gke_aws_default_cluster-0_gke-dfccaa67".
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-172.eu-west-1.compute.internal   Ready    <none>   35m   v1.16.9-gke.12
ip-10-0-1-204.eu-west-1.compute.internal   Ready    <none>   35m   v1.16.9-gke.12
ip-10-0-1-59.eu-west-1.compute.internal    Ready    <none>   34m   v1.16.9-gke.12
ip-10-0-1-78.eu-west-1.compute.internal    Ready    <none>   53s   v1.16.9-gke.12
ip-10-0-2-92.eu-west-1.compute.internal    Ready    <none>   35m   v1.16.9-gke.12
ip-10-0-2-93.eu-west-1.compute.internal    Ready    <none>   35m   v1.16.9-gke.12
ip-10-0-2-94.eu-west-1.compute.internal    Ready    <none>   35m   v1.16.9-gke.12
ip-10-0-3-139.eu-west-1.compute.internal   Ready    <none>   34m   v1.16.9-gke.12
ip-10-0-3-192.eu-west-1.compute.internal   Ready    <none>   35m   v1.16.9-gke.12
ip-10-0-3-208.eu-west-1.compute.internal   Ready    <none>   34m   v1.16.9-gke.12

Raising minNodeCount to 4 scales the NodePool out, launching an additional EC2 instance; it appears above as ip-10-0-1-78, only 53s old.

[Figure: user cluster scale-out]

Reducing minNodeCount lets our NodePool scale in, as the resources currently requested by the workloads do not require as many nodes as are currently deployed.

$ kubectx gke_aws_management_gke-404767c1 && \
   env HTTP_PROXY=http://127.0.0.1:8118 kubectl patch AWSNodePool cluster-0-pool-0 --type=json -p='[{"op": "replace", "path": "/spec/minNodeCount", "value": 1}]'
Switched to context "gke_aws_management_gke-404767c1".
awsnodepool.multicloud.cluster.gke.io/cluster-0-pool-0 patched

Now, we see the Cluster Autoscaler kick in and scale our NodePool down by terminating a node to optimise utilisation.

[Figure: user cluster scale-in]
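
The intuition behind the scale-in can be approximated with simple arithmetic: the pool needs roughly ceil(total CPU requested / allocatable CPU per node) nodes, clamped to [minNodeCount, maxNodeCount]. This is a deliberate simplification, not the Cluster Autoscaler’s algorithm, which simulates scheduling and also weighs memory, affinity and disruption constraints. `estimate_nodes` and the figures below are hypothetical.

```shell
# Hypothetical back-of-the-envelope estimate: nodes needed for a total CPU
# request, clamped to the pool's autoscaling bounds. CPU is in millicores.
# Usage: estimate_nodes <requested_mcpu> <allocatable_mcpu_per_node> <min> <max>
estimate_nodes() {
  requested="$1"; per_node="$2"; min="$3"; max="$4"
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for a >= 0, b > 0.
  needed=$(( (requested + per_node - 1) / per_node ))
  if [ "$needed" -lt "$min" ]; then needed="$min"; fi
  if [ "$needed" -gt "$max" ]; then needed="$max"; fi
  echo "$needed"
}
```

For instance, `estimate_nodes 2500 1930 4 10` returns 4, because the floor of 4 wins even though two nodes’ worth of CPU would do; with the floor lowered to 1, `estimate_nodes 2500 1930 1 10` returns 2.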

Logging and Monitoring

We saw earlier how the Connect agent sends metadata about workloads running in the cluster back to GCP. As this metadata is collated, we get a view of all of our workloads running across all of our GKE Hub-registered clusters. This is a key value-add feature of Anthos, as we are able to consolidate management and orchestration of our clusters and workloads through a single-pane-of-glass.

GKE On-Prem provides in-cluster Stackdriver agents (a combination of Fluent StatefulSets and DaemonSets, as well as Prometheus and Stackdriver sidecars) which collect logs and metrics for system components and send this data to Cloud Logging and Cloud Monitoring.

At the time of writing, logging and monitoring for GKE on AWS is yet to be fully implemented. Support is due in a future release, with an implementation similar to GKE’s: a gke-metrics-agent will be deployed as a DaemonSet on the user cluster to scrape metrics from the kubelet and push them to the Cloud Monitoring API, and a Fluent Bit pipeline will collect logs from all containerd components and the kubelet. We can already see some of this in action, with our application logs from AWS appearing in Cloud Logging.

[Figure: GCP frontend logs]

This means we get telemetry data and logs from both on-premises and cloud deployments, all viewable through the GCP Console. When it comes to performing root cause analysis, access to logs is critical, so consolidating logs from multiple clusters removes the overhead of managing and traversing multiple systems to identify issues and correlate cascading errors.

As well as Cloud Logging & Monitoring, Anthos also integrates with Elastic Stack, Splunk and Datadog, offering a variety of options to customise your approach to observability, complementing existing solutions in on-premises or cloud environments.

You might opt to disable GCP Logging and Monitoring and instead use Grafana and Prometheus as your monitoring solution. These are currently available as optional parts of the GKE On-Prem installation, so integrating these OSS solutions with GKE on AWS is also an option, depending on your use case. You must weigh the trade-offs of opting out of GCP’s supported logging and monitoring solutions and maintaining your own monitoring stack against the costs of transferring telemetry data from AWS into GCP. It would be possible to gain multi-cluster observability on top of Prometheus and Grafana through Thanos, but again this would require investment in implementing and maintaining such a deployment.

Future

GKE on AWS offers the first opportunity to experience GKE on another cloud provider. Later this year we should see a preview of GKE running on Azure, pushing the boundaries of running applications on GKE outside of GCP. In addition, the on-premises offering will be extended with a bare-metal deployment option to run Anthos on physical servers. Again, this allows companies to use existing hardware investments, paving the way for the adoption and standardisation of Kubernetes, with the option of running alongside a cloud provider to optimise capex and opex.

With competition from other solutions such as VMware Tanzu and Azure Arc, as well as managed OpenShift offerings, the hybrid and multi-cloud orchestration marketplace is maturing to the point where enterprises have a multitude of options when it comes to leveraging the cloud providers of their choice and consolidating management and observability of workloads across disparate environments.

Anthos is not only a delivery mechanism for running Kubernetes on your infrastructure, but also provides the components required to optimise the experience of Kubernetes. Whether it’s improving continuous delivery and environmental consistency through Config Management and Policy Controller, or having fine-grained control over traffic shaping and centralised SLO and SLA orchestration using Anthos Service Mesh, Anthos provides enterprises with a repertoire of tools with which they can migrate to, and leverage the power of, Kubernetes.

Coming up, we’ll have more content on Anthos around:

  • Running applications across multi-cloud with GKE and Anthos Service Mesh
  • How to register EKS and AKS clusters with the GKE Hub via attached clusters
  • Ensuring conformance through Policy Controller and OPA
  • Anthos on Azure

Get in touch

If you want to know more about Anthos or running hybrid and multi-cloud Kubernetes, Jetstack offers consulting and subscription services that can help with your investigation and adoption in a variety of ways. Let us know if you’re interested in a workshop or in working together to dive deeper into Anthos.