DNS and Ingress Implementation
To understand the design decisions made regarding DNS and Ingress implementation, we'll first explain a few Kubernetes objects relevant to this discussion.
A Kubernetes service is a way to expose an application running on a group of Pods as a network service. Since Pods get their own IP address in the cluster and the addresses can change as Pods come and go, Services provide a layer of abstraction.
A Service gets its own IP address at the time of creation, and that address does not change for the life of the Service. The traffic forwarding rules for a Service are updated dynamically as Pods are created and deleted, with requests always being forwarded towards a running Pod.
A Kubernetes component, kube-proxy, enables this communication by managing forwarding rules using IPVS or iptables. Two Kubernetes Service types are explained below, namely,
- ClusterIP (default)
- NodePort
A ClusterIP service exposes the application on an internal IP of the cluster and is only reachable from within the cluster.
An example Service definition with the name sample-service that listens on TCP port 80 and forwards traffic to Pods with the label app: sample-app, with a default Service Type of ClusterIP, is defined below -
apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  selector:
    app: sample-app
  ports:
    - protocol: TCP
      port: 80
  type: ClusterIP
A NodePort service causes all nodes in the cluster to listen on a specific port defined in the Service definition. Traffic received on that port is forwarded to the application matching the label defined in the selector, regardless of which node the application is running on. The port serves as a proxy to the application. If the port is not specified in the Service definition, Kubernetes chooses one from the range 30000-32767.
An example Service definition with the name sample-service that listens on TCP port 80 and forwards traffic to Pods with the label app: sample-app, with a Service Type of NodePort (meaning, the nodes in the cluster listen on a specific port), is defined below -
apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  selector:
    app: sample-app
  ports:
    - port: 80
      targetPort: 80
  type: NodePort
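The definition above leaves the node port for Kubernetes to choose. As a minimal sketch, the same Service could pin the port explicitly with the nodePort field; the value 30080 is a hypothetical choice for illustration and is not taken from the original design.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sample-service
spec:
  type: NodePort
  selector:
    app: sample-app
  ports:
    - protocol: TCP
      port: 80         # port exposed on the Service's cluster IP
      targetPort: 80   # port the Pods listen on
      nodePort: 30080  # hypothetical; must fall within 30000-32767 by default
```

Pinning the port can make it easier to configure a load balancer in front of the cluster, at the cost of having to avoid port collisions between Services.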
With a load balancer (LB) in front of the cluster, any hostname-based or path-based routing and/or TLS termination can be handled at the LB. The LB examines requests and forwards them to the appropriate port on a node in the cluster.
Ingress is a Kubernetes object that defines HTTP-based routing rules for forwarding traffic to services. The Ingress definition only includes the configuration of these rules, while a separate component called an Ingress Controller watches Ingress objects and updates its own configuration as they are created, updated, or deleted.
An example Ingress definition with the name example-ingress, with a rule for HTTP requests to host foo.vt.edu to be forwarded to a service app1 on port 80, is defined below -
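apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
    - host: foo.vt.edu
      http:
        paths:
          # networking.k8s.io/v1 requires pathType and the service.name / service.port form
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80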
Proxy Ingress Controllers
A Proxy Ingress Controller, such as a Traefik or Nginx Ingress Controller, is itself a reverse proxy and is exposed as a NodePort service. This controller watches for Ingress objects and updates its routing configuration based on the rules defined in those Ingress objects. The traffic received by the controller is then forwarded to the respective ClusterIP services in the backend.
With a load balancer in front of the cluster, HTTP routing moves into the cluster: the load balancer sends traffic to nodes in the cluster using TCP passthrough, and TLS termination occurs within the cluster.
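As a rough sketch of how a proxy ingress controller might be exposed, the Service below publishes the controller on fixed node ports for HTTP and HTTPS so that a TCP load balancer can forward to them. The Service name, namespace, label, and node port values are all hypothetical and would depend on how the chosen controller is deployed.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-controller    # hypothetical name
  namespace: ingress          # hypothetical namespace
spec:
  type: NodePort
  selector:
    app: ingress-controller   # hypothetical label on the controller Pods
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
      nodePort: 30080         # hypothetical; LB listener 80 forwards here
    - name: https
      protocol: TCP
      port: 443
      targetPort: 443
      nodePort: 30443         # hypothetical; LB listener 443 passes TLS through
```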
Application Load Balancer Ingress Controllers
Unlike Proxy Ingress Controllers, an Application Load Balancer (ALB) Ingress Controller does not run as a reverse proxy in the cluster. The controller still watches for Ingress objects, but it interacts with the AWS API to create and configure ALBs external to the cluster, which forward traffic into the cluster.
By default, each Ingress object creates its own ALB. The Ingress object defines forwarding rules, and each rule in turn creates a separate target group in the ALB for forwarding traffic to either the correct instance or IP address, as explained below -
- Instance mode: Traffic is forwarded to the correct EC2 instance where services are exposed with the type of NodePort. The ALB controller maintains the port mapping configuration.
- IP mode: Traffic is sent directly to the pod without using a service. This mode requires using the AWS VPC CNI (Container Network Interface) driver so that pods can be given an Elastic Network Interface (ENI) and delegated an IP address in the VPC. A downside to this approach is running into ENI usage limits for a particular instance type.
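For illustration, the target mode is typically selected with an annotation on the Ingress object. The sketch below assumes the annotation names used by the AWS ALB ingress controller (alb.ingress.kubernetes.io/...) and an ingress class named alb; older controller versions may select the class through an annotation instead. The object name is hypothetical, and the host and service reuse the earlier example.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-alb-ingress   # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    # "instance" targets node ports on EC2 instances; "ip" targets Pod IPs
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb       # assumed class name for the ALB controller
  rules:
    - host: foo.vt.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80
```

Even in IP mode the Ingress still names a Service; the controller uses it to discover which Pod IPs to register, while the traffic itself goes straight to the Pods.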
The following example shows an Ingress object with three forwarding rules, one of which matches /accounts. The three rules create three target groups, the last of which is ServiceC. Two of the target groups, including ServiceB, use Instance mode and forward traffic to services of type NodePort. Target group ServiceC uses IP mode and forwards traffic directly to the pods for the application. The ALB Ingress Controller also watches for Ingress objects and provisions ALBs.
TLS/SSL Certificate Management
A few mechanisms exist to automate the provisioning, renewal, and use of certificates.
InCommon provides identity and access services for many higher ed institutions, including Virginia Tech. All TLS certificates provided by Secure Identity Services (SIS) are issued by InCommon. InCommon does have ACME support, but requires authentication using External Account Binding (EAB). While the official ACME clients support EAB authentication, other tools that provide ACME/LetsEncrypt certificate support may have limited support.
LetsEncrypt is a public service that provides free, signed SSL/TLS certificates. It leverages the ACME protocol to request certificates, prove domain ownership, and issue the signed certificates.
AWS Certificate Manager
AWS Certificate Manager (ACM) is an AWS service that provides free certificate storage and issuance. The private keys for all TLS certificates are stored securely and are inaccessible to customers, so the certificates cannot be used in custom applications. However, ACM certs can be leveraged in many other AWS services, such as all load balancers. ACM does not leverage the ACME protocol, but it has APIs and is supported by higher-level tools, such as Terraform.
Domain Validation Methods
All automated certificate providers require validation that the requester has ownership of the requested names. There are three main methods used across the various providers.
- HTTP-based challenge: The requester is given a token and is required to place it at a known location on the web server for the requested name. Once placed, the provider attempts to retrieve the file.
- DNS-based challenge: The requester is required to create a DNS record with a provider-provided name and value. Once created, the provider fetches the DNS record.
- Email-based challenge: The provider sends emails to one or more email addresses for the requested name (or parent addresses) with a link to a website in which the receiver can validate the request.
The following table outlines which provider supports which challenge methods:
At Virginia Tech
At the time of writing, Virginia Tech’s DNS service is managed in two ways - centralized or delegated. For the vast majority of DNS records, DNS is managed by a core DNS team. Requests are made by designated network liaisons, validated by the DNS team, and then applied at the next restart of the nameservers. At a minimum, these restarts occur on every Tuesday and Thursday. History has shown they occur much more frequently.
In a limited number of instances, zone delegation has been authorized to allow DNS to be managed by external DNS providers, including AWS Route53. With very few exceptions, the delegated zones are under the *.cloud.vt.edu namespace. Once delegated, the owning AWS account can create, update, or delete records through the Route53 service and see them on the global internet almost instantly.
DNS management in AWS is provided by the AWS Route53 service. To fully control the records, domains or subdomains are delegated from a parent zone. As mentioned earlier, most of the names have come from the *.cloud.vt.edu namespace.
A useful Route53 feature is the ability to create “alias” records. This allows a public name, such as app1.vt.edu, to resolve directly to the IP addresses used by a load balancer without using CNAME records. This saves a DNS round trip and helps “hide” the fact that an AWS load balancer is being used.
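As one possible illustration, an alias record can be declared with infrastructure-as-code tooling. The sketch below uses a CloudFormation template, which is an assumption (the original does not say which tool manages the records); the zone example.cloud.vt.edu, the record name, and the parameter names are hypothetical.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Hypothetical alias record pointing an app name at a load balancer

Parameters:
  LoadBalancerDnsName:
    Type: String   # DNS name AWS assigned to the load balancer
  LoadBalancerHostedZoneId:
    Type: String   # hosted zone ID of the load balancer itself, not of our zone

Resources:
  AppAliasRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.cloud.vt.edu.   # hypothetical delegated zone
      Name: app1.example.cloud.vt.edu.        # hypothetical record name
      Type: A                                 # alias records answer as A, not CNAME
      AliasTarget:
        DNSName: !Ref LoadBalancerDnsName
        HostedZoneId: !Ref LoadBalancerHostedZoneId
        EvaluateTargetHealth: false
```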
Route53 also supports geolocation-based responses, allowing customers in one location to get different DNS answers from those in another location. This would be important for applications that need quick response times and have customers across the country or worldwide.
Several major problems must be solved when operating a Kubernetes cluster in the public cloud, including -
- Getting traffic to the load balancer
- Where TLS termination occurs
Getting traffic to the load balancer
Getting traffic from the internet to the load balancer in front of the cluster is solved by managing DNS. To manage DNS, the Platform team has two options -
- Controlling the requested names
- Use CNAMEs
Controlling the requested names
If the Platform team controls the DNS records for applications, we can ensure the names resolve to the AWS load balancers. If the name is delegated to AWS Route53, the application records can be alias records to the AWS load balancers in front of the cluster.
The advantage of this approach is that changes to the application DNS records can be made instantly, without depending on external teams, if load balancers need to be recreated, rebalanced, and so on.
While complete control of the records is great, this approach would increase costs, since a hosted zone is created in Route53 for each application that moves to the platform. We might also run into complex delegation structures where an application team wants one name to be used on the platform and another managed by VT. For example, app1.vt.edu is hosted on the platform, while app2.app1.vt.edu is hosted on premises.
Use CNAMEs
The second option is for the app to use a CNAME to a cluster-level hostname, which points to the AWS load balancers in front of the cluster. The records could use the following pattern -
- App record (app1.vt.edu) - has no A/AAAA records, but a CNAME record pointing to a cluster record
- Cluster record (k8s-1.aws.clusters.platform.it.vt.edu) - this name would be managed in AWS and resolves to the load balancers in front of the cluster.
This approach has several advantages compared to controlling records for applications. VT Hostmaster maintains control over the application records while the Platform team only maintains control over the cluster records. As a result, the number of delegated zones needed is greatly reduced making this design cheaper. Complex delegation structures are also supported because the Platform team does not own the application records.
The cluster record structure identifies the cluster (test, dev or prod) and the cloud provider. It also supports the possibility of multiple clusters in multiple clouds. The cloud-provider level also allows that zone to be delegated to that cloud’s DNS service, helping streamline DNS updates. In the example above, k8s-1 identifies the cluster and aws identifies the cloud provider.
Where TLS Termination occurs
There are two main options for TLS termination -
- External to the cluster
- Internal to the cluster
External to the cluster
TLS termination external to the cluster would have to be done on the AWS load balancers in front of the cluster. To support TLS termination on the load balancers, AWS requires the certificates to be loaded into AWS Certificate Manager (ACM).
There are two options to get the certificates into ACM -
- Use certificates provisioned by ACM
- Import externally provisioned certificates into ACM
Because AWS only supports DNS-based and email-based challenges for domain validation, using ACM-provisioned certificates would be complicated: no existing tooling automates domain validation when application names are not delegated to the Platform team.
While automating the import of externally provisioned certificates into ACM is an option, service quota limits would be an issue, since only 25 certificates can be associated with a load balancer. With every 26th certificate requiring a new load balancer, it would be difficult to coordinate which app names should point to which load balancers and to share new load balancers with the ingress controller.
Internal to the cluster
TLS termination internal to the cluster would be accomplished at the reverse proxies acting as ingress controllers. With this approach, we can also use certificates provisioned externally by either InCommon or LetsEncrypt.
Tooling already exists to provision certificates and make them available to ingress controllers. Many proxies have the ability to provision the certs themselves. The DNS and load balancer structure would also be simpler. The downside is that private keys would theoretically be accessible on the file system, but access to them can be locked down using policies.
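To make this concrete, the sketch below assumes cert-manager as the provisioning tool; that is only one possible choice and is not named in the original design. A ClusterIssuer requests certificates from the LetsEncrypt ACME endpoint using the HTTP-based challenge, and an Ingress asks for a certificate through an annotation and terminates TLS with the resulting Secret. The issuer name, email address, ingress class, and Secret names are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01              # hypothetical issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform-team@example.com    # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-account-key     # Secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx                # assumed ingress class that serves the challenge
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-http01
spec:
  tls:
    - hosts:
        - foo.vt.edu
      secretName: foo-vt-edu-tls        # hypothetical Secret the certificate is written to
  rules:
    - host: foo.vt.edu
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app1
                port:
                  number: 80
```

An issuer for InCommon would presumably look similar but would also need the External Account Binding credentials mentioned earlier.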
Preferred methods for DNS and TLS termination
Leveraging CNAMEs to get traffic to the cluster load balancers should be preferred. There are many valid reasons for the VT DNS team to keep tight control over DNS. Giving full ownership to the Platform Team for all app names is simply too risky.
While the ALB integrations theoretically support sending traffic directly to a pod, doing so bypasses Kubernetes services, which removes many other integration points such as advanced routing mechanisms with Ingress.
Load balancers charge both a per-hour and per-usage cost. The amount of in-cluster infrastructure would remain the same to support in-cluster routing. Moving TLS termination external to the cluster would only increase costs, both in monthly infrastructure costs and in development time. As such, using “dumb” load balancers and having both TLS termination and routing within the cluster itself should be preferred.
Combining the outcomes from the previous sections, we arrive at a Kubernetes platform that is commonly seen in many deployment environments. This model uses the CNAME approach to get traffic to the cluster and then supports TLS termination and routing at the ingress level within the cluster itself. This helps keep costs down, encourages TLS everywhere, and provides better support for Kubernetes-native tooling and application deployment models.
As such, the final structure could be represented using the diagram below. Note that while application-specific names (e.g., app1.vt.edu) will be fully supported, cluster-specific wildcarded “vanity URLs” will also be available for teams to use for quick usage and prototyping without needing to further configure DNS. That is also represented below.