Containers, Docker, and Kubernetes Part 2
In Part 1 of this series I touched on containers, Docker, and how these technologies are rapidly redefining operations and infrastructure across the industry. However, just knowing about containers and Docker isn’t enough to know how to apply these technologies to your stack. Here, in Part 2 of this blog series, I will go over Kubernetes, the container orchestration tool I’ve chosen to provide the support structure for fully moving to a containerized infrastructure.
Google is an avid user of containers, running billions of them on hundreds of thousands of servers over many years. Over time they’ve built up internal tools to help manage this massive infrastructure, a tool suite they call Borg. Over the past few years, many on the Borg team have taken the lessons learned along the way and applied them in a new orchestration tool they call Kubernetes, releasing it to the public as an open source project.
Kubernetes, like Borg, is a suite of tools and services that work together to provide answers to all of the questions I posed at the end of Part 1. This is a complex system with many moving parts, but it is production ready and has already been heavily tested and used by many companies outside of Google. Learning such a system is not trivial, but the Kubernetes project has some fantastic documentation. Every aspect of the system is covered including multiple examples, suggestions, and possible error cases. They even provide an in-browser interactive tutorial that I highly recommend running through.
All that said, you’ll quickly realize that there are many parts of Kubernetes to grok, but that’s ok. You don’t need to understand every aspect of the system to successfully use it. For this post I’m going to go over what I’ve learned about Kubernetes, its basic concepts, and what pieces I currently use.
At the highest level, a Kubernetes cluster will consist of many Resources. These resources can be defined in JSON or YAML, though I personally prefer YAML as I find it easier to both read and write, and it supports commenting sections of the configuration.
I won’t cover all of the available Resource types, as that list is quite large and growing, but will instead cover the Resources that you need to know to set up an application in the Kubernetes way. Specifically, the resources I’ll cover here are: Pod, Deployment, Service, and Namespace. I’ll also cover a few other concepts and layers that these resources manage or make use of.
First off is the Node. This is nothing more than the server or virtual machine on which the Kubernetes cluster is running. A Node provides the computing resources necessary to run your containers.
The lowest level of abstraction that you’ll work with in Kubernetes is the Pod. A Pod consists of one or more containers that run on the same Node and have shared resources. Containers in a pod are able to communicate with each other via localhost, providing a way to run an application consisting of several tightly-knit containers in a scalable fashion.
The Pod is the immutable layer of Kubernetes. Pods are never updated, but instead are shut down, thrown away, and replaced. They can be started and stopped manually, but that is not common in practice. The configuration and management of Pods in the cluster will almost always be managed by a Deployment.
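Even though you’ll rarely create one by hand, it helps to see what a bare Pod looks like. Here’s a minimal sketch of a standalone Pod manifest; the name, label, and nginx image are just illustrative:

```yaml
# A minimal standalone Pod. In practice you'd let a Deployment create
# and manage Pods like this one rather than defining it directly.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    role: web
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
```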
Deployments are the workhorse of managing and running a Kubernetes cluster. A Deployment is where all of the heavy lifting happens to run and manage Pods. Deployments are configured with how many Pods need to run, what those Pods look like, and how Pods are started and shut down during rollouts or in response to issues with a Node or the cluster.
(Technically, a Deployment hands off some of that work to a ReplicaSet that it creates for you, but that isn’t necessary to understand at this time.)
An example Deployment that brings up three nginx Pods:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        role: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
```
Having multiple Deployments each managing many Pods and containers is great, but how do you get a web request from the Internet into your Rails application? This is where Services come in. Services provide the entry point and exposure for your Deployments and Pods, both to other containers in your cluster and to the Internet itself. A `NodePort` Service is used for internal access, but can also be used to expose a high-numbered port (30000 - 32767 by default) outside of the cluster that maps down into containers. If you are on Amazon (AWS) or Google’s Container Engine (GKE), you can instead make use of the `LoadBalancer` Service type. `LoadBalancer` Services work with your cloud provider to provision an actual load balancer configured with the proper rules to forward traffic into your cluster.
This may be hard to follow, so here’s an example use case to show how Services are used. Let’s say we are running `nginx` as a front end to our `rails` application, and we need `redis` to be available internally only. Assuming we’re on GKE or AWS, we want a load balancer pointing at `nginx`, which forwards to `rails`, which in turn has access to `redis`. The labels in each Service’s `selector` field are used by Kubernetes to hook the Service up to its matching Deployment.
```yaml
##
# nginx
# Listen to the world on port 80 and 443
##
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    role: web
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: https
    port: 443
    targetPort: 443
  selector:
    # Find all Resources that are tagged with the "role: web" label
    # In our case, it will find the nginx Deployment mentioned above
    role: web
---
##
# Rails
# Listen for traffic on 8080 so we don't have to run as root.
##
apiVersion: v1
kind: Service
metadata:
  name: rails
  labels:
    role: rails
spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    role: rails
---
##
# Redis
##
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    role: redis
spec:
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    role: redis
```
On top of all of this you can provide a Namespace. A Namespace is nothing more than a text identifier you can use to encapsulate your infrastructure. Kubernetes makes use of Namespaces internally to segregate its own services (kubedns, kube-proxy, etc) from your application by putting them in the `kube-system` namespace. If you don’t provide a Namespace, Kubernetes will put your resources in the `default` namespace, and for most use cases that will suffice. However, if you are running, for example, infrastructure used by multiple different teams, Namespaces can provide a logical separation as well as prevent collisions and confusion.
An example Namespace:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
```
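To place a resource into a Namespace, set `metadata.namespace` on the resource itself. As a quick sketch, here is how the nginx Deployment’s metadata might look once it belongs to the `my-app` Namespace (assuming that Namespace has already been created):

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
  # Without this field, the Deployment would land in "default"
  namespace: my-app
spec:
  # ... same spec as the earlier Deployment example
```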
Kubernetes makes heavy use of labels to both tag and find resources across your clusters. You can see the use of labels in the examples above, where the Deployment has a `labels` section and the Services have matching `selector`s. With labels, Kubernetes will link up matching Resources into the full stack for you, resulting in: Service -> Deployment -> Pod -> Container.
Like I said at the beginning, Kubernetes is a large ecosystem that’s getting larger with each release, but once you understand these initial Resources and how to use them, continuing your education on the other Resources that Kubernetes provides gets much easier. For some next steps, I recommend looking into the following:
- Every application has information that needs to stay secure (database credentials, etc). Secrets are how Kubernetes takes your sensitive information and makes it available to Pods and containers.
- DaemonSet for when you want to make sure that some or all Nodes run a copy of a Pod.
- Job for when you have one-off tasks that you need to run on the cluster (for example, we use a Job to run database migrations).
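As a taste of that first item, here is a hedged sketch of a Secret and a container consuming it through an environment variable. The names, key, and value here are made up for illustration:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  # Values under "data" must be base64 encoded,
  # e.g. `echo -n "s3cr3t" | base64`
  password: czNjcjN0
---
# A container spec fragment that reads the Secret into an
# environment variable the application can use.
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password
```

Secrets can also be mounted into containers as files via volumes, which works better for things like TLS certificates.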
In Part 3 of this series I will dive into more technical details of setting up, configuring, and managing your own Kubernetes cluster!