Terraforming the EKS cluster
[This post is aimed at EKS cluster administrators who need to set up a cluster alongside existing AWS resources that the cluster must interact with, and at those who want post-installation automation (or workflow) built into their Terraform workflow in a controlled fashion.]
Note: “Terraform” will be used both as a noun (Terraform, capital T) and a verb (terraform, small t) in this document.
From the moment we started our journey toward Kubernetes for our flagship product Freshdesk, one very high priority was to automate whatever could be automated, right from the word go. Terraform gave us the ability to automate the infrastructure. To run any app on Kubernetes we first needed a cluster, and we chose EKS (Amazon Elastic Kubernetes Service) because it provides a managed control plane. Being new to Kubernetes meant we needed to iterate on things like instance types, security groups, IAM roles, bootstrap scripts, subnets, and AMIs, and to do that fast we needed to automate it with Terraform.
A couple of things to know before we move on: Freshdesk being a large product, we could not afford to move lock, stock, and barrel to Kubernetes from our existing infrastructure, which was running on AWS OpsWorks. We took a staggered approach instead, so we knew that at some point we would be running OpsWorks and Kubernetes workloads side by side. This led us to model our Terraform templates to reuse some already created resources like the VPC, IAM roles, etc.
Being new to EKS (and Kubernetes), we wanted to get to know the resources Terraform provides for creating an EKS cluster, and their attributes. So we first worked with the EKS-specific Terraform resources directly and only then broke them up into modules (or used third-party modules). The sections below describe, in a fair amount of detail, the path toward this end.
In the beginning
We started to terraform the EKS cluster setup with the aim of getting the cluster up and running with self-managed Autoscaling node groups, plus security groups and roles tailored to our needs. So version 1.0 of the EKS Terraform template had everything in it. Here are the comments from the first Terraform template.
Notice how we crammed everything into one large main.tf file. This worked because we were just starting off and we had to:
- Use the same VPC that we already had
- Create subnets using the prefixes
- Create an EKS cluster with no public access
- Create an Autoscaling group that spans multiple subnets and hence multiple AZs
- Create an NLB for the private link creation
- Run a Lambda that periodically checks for IP changes on the API server and updates the target groups of the NLB
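To illustrate the "EKS cluster with no public access" item in the list above, a private-only cluster can be sketched roughly as follows. This is a minimal sketch rather than the original template; the variable names (cluster_name, cluster_role_arn, subnet_ids) are assumptions made for illustration.

```hcl
# Hypothetical inputs; the names are illustrative, not taken from the original template.
variable "cluster_name" { type = string }
variable "cluster_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn

  vpc_config {
    # Subnets are assumed to live in the pre-existing VPC.
    subnet_ids = var.subnet_ids

    # Keep the API server reachable only from inside the VPC.
    endpoint_private_access = true
    endpoint_public_access  = false
  }
}
```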
To make the Kubernetes cluster further usable for our purpose we had to utilize the output of this template and work on it further. Here is how it panned out.
Cluster Autoscaler
We needed the Cluster Autoscaler component from the start, as we would otherwise have had to scale the nodes manually. We created the Cluster Autoscaler YAML from a template file (cluster_autoscaler.yml.tpl) whose placeholders are populated by Terraform’s template_file data source, like the following:
And then used that data source to create the actual cluster_autoscaler.yml file.
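The rendering step described above could look something like the sketch below. It assumes the template uses two placeholders, cluster_name and aws_region; those placeholder names, the variables, and the output path are illustrative assumptions, not the original code.

```hcl
# Hypothetical inputs used to fill the template placeholders.
variable "cluster_name" { type = string }
variable "aws_region" { type = string }

# Render cluster_autoscaler.yml.tpl with the placeholder values.
data "template_file" "cluster_autoscaler" {
  template = file("${path.module}/cluster_autoscaler.yml.tpl")

  vars = {
    cluster_name = var.cluster_name
    aws_region   = var.aws_region
  }
}

# Write the rendered YAML to disk so that kubectl can apply it later.
resource "local_file" "cluster_autoscaler" {
  content  = data.template_file.cluster_autoscaler.rendered
  filename = "${path.module}/cluster_autoscaler.yml"
}
```

On recent Terraform versions, the built-in templatefile() function is the usual replacement for the template_file data source.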
Setup script
There are some post-creation steps to get our desired working state. These steps, which need to be performed after the cluster is created, include:
- Disabling SNAT
- Adding Prometheus annotations for scraping AWS CNI metrics
The setup script actually did more than that: it also ran terraform (plan and apply) and set up the environment for Kubernetes CLI (kubectl) access.
These templates and scripts enabled us to hit the ground running, but they had some shortcomings.
- They were tailored to our specific needs and did not follow the DRY philosophy
- The ASGs’ own operations conflicted with the Cluster Autoscaler’s operations
Going modular
Terraform modules allow the single monolithic template described above to be generalized into reusable, self-contained templates. These self-contained templates need a well-defined interface, meaning Terraform variables and outputs, which allows you to chain the modules together to get to the desired template. This is how we broke down that template into modules.
Subnet creation module
As mentioned earlier, we had to create subnets in the existing VPC so that the cluster could co-exist with the other infra in that VPC. So we had to squeeze subnets into the available CIDR gaps in the VPC. We used vpc-free to look for the gaps of available CIDRs. We took some of these CIDRs and gave them to this subnet creation module, which iterates over the available AZs and creates subnets in them.
This module also creates the essential Kubernetes tags for the subnets by merging them into the user-provided tags.
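A subnet module along these lines might be sketched as follows. The variable names are assumptions; the two tag keys are the standard ones EKS uses to discover subnets, with internal-elb assumed here because the subnets are private.

```hcl
# Hypothetical module inputs.
variable "vpc_id" { type = string }
variable "cluster_name" { type = string }
variable "subnet_cidrs" { type = list(string) }

variable "tags" {
  type    = map(string)
  default = {}
}

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "this" {
  count = length(var.subnet_cidrs)

  vpc_id     = var.vpc_id
  cidr_block = var.subnet_cidrs[count.index]

  # element() wraps around if there are more CIDRs than AZs.
  availability_zone = element(data.aws_availability_zones.available.names, count.index)

  # Merge the Kubernetes tags into the user-provided tags.
  tags = merge(var.tags, {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  })
}

output "subnet_ids" {
  value = aws_subnet.this[*].id
}
```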
We also have a module that would create VPC and subnets from the ground up but that was not usable here.
EKS master module
The EKS master template is fairly straightforward. This module also provides download URLs for kubectl and aws-iam-authenticator: since we pass in the cluster version, we keep a version-based map of those URLs here. The reason for having these tools’ URLs here is to use them in the post-setup or update scripts, which we discuss below.
A few things to notice in the above output are:
- cluster_scaler_yaml in the outputs section contains the entire YAML of the Cluster Autoscaler deployment. We moved the YAML generation here because the Cluster Autoscaler version, too, tracks the version of the cluster we are running.
- This module creates nothing but a basic EKS cluster, so any additional policies or security groups are passed in as inputs, for which the input variables are already defined.
- The output of the subnet creation module, which is a list of subnet IDs, can be passed directly into this module as input.
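The chaining described in the last bullet can be sketched as below. The module paths and input names are hypothetical; only the cluster_scaler_yaml output name comes from the text above.

```hcl
# Hypothetical root-level inputs.
variable "vpc_id" { type = string }
variable "cluster_name" { type = string }
variable "cluster_version" { type = string }
variable "subnet_cidrs" { type = list(string) }

module "subnets" {
  source = "./modules/subnets" # hypothetical path

  vpc_id       = var.vpc_id
  cluster_name = var.cluster_name
  subnet_cidrs = var.subnet_cidrs
}

module "eks_master" {
  source = "./modules/eks-master" # hypothetical path

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  # The subnet module's output feeds straight into the master module.
  subnet_ids = module.subnets.subnet_ids
}

# The rendered Cluster Autoscaler YAML is exposed as a module output.
output "cluster_autoscaler_yaml" {
  value = module.eks_master.cluster_scaler_yaml
}
```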
EKS nodes module
We faced a problem when a single Autoscaling Group (ASG) spanned multiple AZs and the Cluster Autoscaler was enabled to manage the ASG’s desired capacity. At times the Cluster Autoscaler would scale nodes in one AZ in a way that left the ASG unbalanced across AZs, and the ASG would then rebalance the instances according to its own modus operandi. This terminated nodes, and thrashed the pods on them, without the graceful shutdown we had configured. It turns out this is documented in the Cluster Autoscaler’s readme for the AWS provider:
Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the --balance-similar-node-groups feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the rebalancing feature.
This led to a lot of surprises when we were scaling our application pods, because the AWS Autoscaling Groups’ rebalancing feature interfered. We therefore created a module for the EKS nodes that gives us one ASG per AZ.
The number of Autoscaling Groups is defined by the vpc_zone_identifier list, which is in turn a list of subnets; hence the distribution of the Autoscaling Groups depends on the distribution of the subnets.
Also, note that we ignored changes to desired_capacity and vpc_zone_identifier. First, desired_capacity: since the Cluster Autoscaler is responsible for managing the desired capacity of the ASGs, it was important for Terraform to stay out of the way. Second, vpc_zone_identifier: when we run out of IPs on an assigned subnet, we sometimes create additional subnets and associate them with the ASG, and this "ignore clause" ensures that the subnet list on the ASG is not reset by a Terraform run.
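The one-ASG-per-subnet layout and the ignore clause described above can be sketched like this. It is a minimal sketch under stated assumptions: every subnet in the list sits in a different AZ, a launch template already exists, and all variable names other than vpc_zone_identifier and desired_capacity are illustrative.

```hcl
# Hypothetical module inputs.
variable "cluster_name" { type = string }
variable "launch_template_id" { type = string }
variable "vpc_zone_identifier" { type = list(string) }

variable "min_size" {
  type    = number
  default = 1
}

variable "max_size" {
  type    = number
  default = 3
}

# One ASG per subnet, so no ASG ever spans more than one AZ.
resource "aws_autoscaling_group" "nodes" {
  count = length(var.vpc_zone_identifier)

  name                = "${var.cluster_name}-nodes-${count.index}"
  vpc_zone_identifier = [var.vpc_zone_identifier[count.index]]
  min_size            = var.min_size
  max_size            = var.max_size
  desired_capacity    = var.min_size

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  tag {
    key                 = "kubernetes.io/cluster/${var.cluster_name}"
    value               = "owned"
    propagate_at_launch = true
  }

  lifecycle {
    # The Cluster Autoscaler owns desired_capacity; subnets may be added by hand.
    ignore_changes = [desired_capacity, vpc_zone_identifier]
  }
}
```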
So eventually, we settled for the following inputs and outputs for the EKS nodes module.
At its most basic, the EKS nodes module just creates node groups (ASGs) in the subnets provided and registers them with the EKS cluster, whose details are provided as inputs. We might also want to attach other policies to the nodes’ IAM role; these can be provided through node_associated_policies. Additional security groups can be provided too.
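As a sketch of how an input like node_associated_policies could be consumed inside the nodes module, the extra policy ARNs can be attached to the nodes' role one by one. The node_role_name variable is an assumption for illustration; in a real module the role would more likely be created in the same module.

```hcl
# Hypothetical input: name of the IAM role used by the worker nodes.
variable "node_role_name" { type = string }

# Extra policy ARNs to attach on top of the baseline EKS worker policies.
variable "node_associated_policies" {
  type    = list(string)
  default = []
}

resource "aws_iam_role_policy_attachment" "extra" {
  count = length(var.node_associated_policies)

  role       = var.node_role_name
  policy_arn = var.node_associated_policies[count.index]
}
```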
NLB for private access
We have internal operations tools that interact with the Kubernetes cluster, and since our cluster is private by default, we need to provide access for these tools. The tools might be in another VPC or in another AWS account altogether, so to provide such access we create an NLB and an endpoint service for every cluster. This allows us to set up PrivateLink access to our Kubernetes cluster through these NLBs.
For this purpose, we have a module that creates an NLB and an Endpoint service.
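Such a module might look roughly like the sketch below: an internal NLB listening on 443, an IP-based target group, and an endpoint service in front of it. All names and the tag key are assumptions, and no targets are registered here.

```hcl
# Hypothetical module inputs.
variable "cluster_name" { type = string }
variable "vpc_id" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_lb" "api" {
  name               = "${var.cluster_name}-api"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.subnet_ids

  tags = {
    # Hypothetical tag key that ties the NLB to its cluster.
    cluster = var.cluster_name
  }
}

# IP targets, because the API server is reached through private IPs.
resource "aws_lb_target_group" "api" {
  name        = "${var.cluster_name}-api"
  port        = 443
  protocol    = "TCP"
  target_type = "ip"
  vpc_id      = var.vpc_id
}

resource "aws_lb_listener" "api" {
  load_balancer_arn = aws_lb.api.arn
  port              = 443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}

# Exposes the NLB to other VPCs or accounts over PrivateLink.
resource "aws_vpc_endpoint_service" "api" {
  acceptance_required        = true
  network_load_balancer_arns = [aws_lb.api.arn]
}
```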
The targets in the NLB’s target group are added by a Lambda: it matches a tag on the NLB to the cluster’s name, gets the private IPs of the corresponding EKS cluster master, and updates the targets. Though it would be interesting to talk about here, it is really out of scope for this topic.
Why not community modules?
Yes, we did start with what was available in the community! The initial monolithic template was itself inspired by what was documented here; this document was an excellent resource for us to understand how to use Terraform to build an EKS cluster from the ground up. And when we were ready to modularize, we were heavily inspired by CloudPosse’s suite of Terraform templates; the idea of breaking things down into separate modules for the EKS master and EKS nodes is drawn from theirs, as you can see by clicking those links. These modules are awesome, but we felt it was better to have more control over what our modules do, as third-party modules at times add internal complexity in trying to be generic.
Support scripts
We have two inline scripts that run as part of the Terraform run. One script, setup.sh, runs only the first time the Terraform template runs; the other, update.sh, runs the first time and every time we want to update something in the EKS cluster.
Here is how we run the setup.sh script: we set it up as a Terraform null_resource with a local-exec provisioner, as you can see below. This script sets up all the defaults for a production-ready Freshworks Kubernetes cluster:
- Disable SNAT
- Set up Prometheus annotations for CNI plugin
- Deploy Cluster Autoscaler
- Deploy Gatekeeper
- Deploy Metrics Server
In the same way, we run the update.sh script too; on the first run, update.sh sets up all the required DaemonSets for logging, monitoring, etc. Notice the “_1” at the end of the null_resource name; we change or increment this number whenever we want to update something.
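The two script hooks described in this section can be sketched as follows. The script paths, the CLUSTER_NAME environment variable, and the resource names are assumptions for illustration; only the null_resource plus local-exec pattern and the "_1" suffix come from the text.

```hcl
# Hypothetical input passed to the scripts.
variable "cluster_name" { type = string }

# Runs once, when the resource is first created.
resource "null_resource" "setup" {
  provisioner "local-exec" {
    command = "${path.module}/scripts/setup.sh"

    environment = {
      CLUSTER_NAME = var.cluster_name
    }
  }
}

# Renaming this to update_2 makes Terraform replace it and run update.sh again.
resource "null_resource" "update_1" {
  depends_on = [null_resource.setup]

  provisioner "local-exec" {
    command = "${path.module}/scripts/update.sh"

    environment = {
      CLUSTER_NAME = var.cluster_name
    }
  }
}
```

Renaming the resource works because Terraform treats the new name as a new resource and destroys the old one; the triggers argument of null_resource is another common way to force a re-run.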
Putting it all together and workflow
So to create an EKS cluster, we put together all the modules described above into one template that suits our requirements. Usually, we also create some IAM policies specific to a product and pass them to the modules through this template. There might also be some security groups that we want whitelisted on the nodes’ Security Group. On the whole, most of our EKS cluster creation templates have these modules at the very basic level.
One important thing to state here is that all our module references are pinned to Git tags, so that changes to the modules themselves do not affect our current state; this also allows the module developers to iterate on the modules independently. Since we have staging and production environments both managed by Terraform, we can upgrade to newer versions of these modules in a phased manner with little breakage.
We use Terraform Enterprise for all our Terraform workflows, so the complete infrastructure is managed the GitOps way. Apart from GitOps, Terraform Enterprise also provides user management and secrets management, which gives us control over who actually does what to the infra.
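Pinning a module to a Git tag uses the ref query parameter on a git:: source. In the sketch below, the repository URL, the tag, and the module inputs are all placeholders rather than our real ones.

```hcl
# Hypothetical root-level inputs.
variable "cluster_name" { type = string }
variable "subnet_ids" { type = list(string) }

module "eks_nodes" {
  # Placeholder repository and tag; ?ref= pins the module to that tag.
  source = "git::https://example.com/your-org/terraform-eks-nodes.git?ref=v1.2.0"

  cluster_name        = var.cluster_name
  vpc_zone_identifier = var.subnet_ids
}
```

Staging can then move to a newer tag first, while production stays on the older one until the change has proven itself.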