We’ve exited Kubernetes after 20 months. Here’s everything we wish we had known before using the platform.
This article is part of the “Goodbye K8s” series, looking at the reasons that eventually drove us away from one of the hottest platforms on the market:
A Whole New World
As a spare-time blog, we may have properly missed out on (some would say, ignored) the opportunity to build a solid business case for using K8s.
But we certainly had plenty of time to get acquainted with the technical nuances (read: annoyances) of running a production app in K8s for 20 months. Our experience of leveraging K8s as the platform of choice is probably best summed up as:
K8s is a whole new world. And ecosystem.
This means that by leveraging K8s, we were taking on an additional level of abstraction and complexity as set out by the platform. Plus, all the new tooling and support processes required for it.
Moreover, K8s is effectively an entirely service oriented and software defined abstraction on a data centre that specialises in containers. Lots of things are possible. Especially stupid ones.
But it doesn’t stop there. K8s is just a platform to run containers. And those containers have to come from somewhere (more on that later). Thankfully, ours were readily available. The picture — and thus the complexity — would have been quite different if we had had to build our own.
What’s the lesson, you ask?! Well.
Depending on where you’re coming from, leveraging K8s might mean taking on as much as two levels of additional abstraction layers — and ultimately complexity.
Those are essentially two new worlds you’re setting out to explore. In order to leverage K8s.
Still here? Then let’s start with the K8s cluster.
The K8s Cluster
By deliberately choosing to leverage a managed K8s service in GCP, we were able to outsource the entire problem of creating and operating a K8s cluster to GKE.
Creating the K8s Cluster
Despite GKE making it extremely easy to spin up new clusters in Terraform, our final technology stack still had many moving parts that supplied us with hours of phun on end.
An endless source of phun was the fact that different parts of the stack required their very own Terraform provider to work. GCP required the GCP provider. K8s required the K8s provider. Helm (a K8s package manager) required the Helm provider. So far, so good.
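In Terraform terms, the stack looked roughly like this. A minimal sketch, assuming a GKE cluster resource named `primary`; the project ID, region, and resource names are illustrative, not our actual configuration:

```hcl
# Three providers for one stack: GCP itself, the K8s API, and Helm.
provider "google" {
  project = "my-blog-project"   # hypothetical project ID
  region  = "europe-west1"
}

data "google_client_config" "default" {}

# Both of the following need credentials for the *same* GKE cluster.
provider "kubernetes" {
  host                   = "https://${google_container_cluster.primary.endpoint}"
  token                  = data.google_client_config.default.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth[0].cluster_ca_certificate)
}

provider "helm" {
  kubernetes {
    host                   = "https://${google_container_cluster.primary.endpoint}"
    token                  = data.google_client_config.default.access_token
    cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth[0].cluster_ca_certificate)
  }
}
```

Note how the cluster credentials already have to be wired into two places, and we haven’t even hit the CRD problem yet.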
The point where it started to fall apart was when we tried to deploy the custom resource definitions (CRDs for short) required by the jetstack/cert-manager Helm chart into the GKE cluster using the standard Terraform K8s provider.
In early 2019, CRDs were unsupported by the official Terraform K8s provider. At the time of writing this article in late 2020, they still are. However, there are announcements indicating raw YAML support in a potential future release.
Even though almost everything can (but shouldn’t necessarily) be done in Bash, we opted to remain within the Terraform universe. A lengthier search revealed an alternative — and at that time already unsupported — legacy Terraform K8s provider, which was essentially a front to an underlying kubectl invocation.
So, at this point we had successfully introduced two Terraform providers for exactly the same K8s cluster. And we needed both.
More moving parts that were rapidly evolving or already legacy! Not to mention the endless hours spent trying to eliminate race conditions between provider hand-offs or passing credentials between them.
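For illustration, the split looked something like this. The legacy provider’s name and resource schema are reproduced from memory and may not match the exact provider we used:

```hcl
# Provider no. 1: the official K8s provider, for everything it supports
provider "kubernetes" {
  host  = "https://${google_container_cluster.primary.endpoint}"
  token = data.google_client_config.default.access_token
}

# Provider no. 2: the legacy community provider, solely for raw CRD YAML,
# with the exact same cluster credentials wired in a second time
provider "k8s" {
  host  = "https://${google_container_cluster.primary.endpoint}"
  token = data.google_client_config.default.access_token
}

# Push the cert-manager CRDs as raw YAML, bypassing the official provider
resource "k8s_manifest" "cert_manager_crds" {
  content = file("${path.module}/manifests/cert-manager-crds.yaml")
}
```

Two providers, one cluster, and every credential change now has to be propagated twice.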
Even when applying industry best practices such as modularisation and loose coupling, it took significant effort to keep our Terraform stack stable and maintainable, let alone upgradeable. It’s also why we never open sourced it, as we do with many of our other projects.
Managing the K8s Cluster
As with many other things, management of the K8s cluster began with its creation.
For under-provisioning our K8s cluster, we paid the Unschedulable price, as discovered during an outage that could only be resolved by manually scaling up the cluster. And yes: we should probably have enabled auto-scaling right from the start. But then, we’re cheap, er, we spend our money wisely.
For not enabling the auto-repair flag, we paid the price during a similar outage when the cluster simply wouldn’t repair itself.
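In GKE’s Terraform resources, both knobs live on the node pool, and enabling them is only a handful of lines. A sketch with illustrative names and sizing, not our production values:

```hcl
resource "google_container_node_pool" "primary" {
  name    = "primary-pool"
  cluster = google_container_cluster.primary.name

  # What we skipped at first: let GKE add and remove nodes as needed
  autoscaling {
    min_node_count = 1
    max_node_count = 3
  }

  # And the flag we left off: let GKE replace broken nodes automatically
  management {
    auto_repair  = true
    auto_upgrade = true
  }
}
```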
Right-sizing our cluster took a while, especially as the failures also took a while to manifest themselves. Over-provisioning a cluster can quickly become a financial nightmare as we soon realised.
Even though they are still visible in the console, GKE does a fairly good job of abstracting away the underlying machines and volumes. However, for transferring state between clusters, we mostly relied on support from the applications.
Fun fact: for transferring assets such as images between installations, we actually used the kubectl cp command, which we never ended up fully automating.
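The manual ritual looked roughly like this; the cluster contexts, namespace, pod name, and paths are hypothetical:

```shell
# Pull the uploads out of the old cluster's pod...
kubectl --context old-cluster cp blog/wordpress-0:/var/www/html/wp-content/uploads ./uploads

# ...and push them into the new one. Repeat for every stateful asset.
kubectl --context new-cluster cp ./uploads blog/wordpress-0:/var/www/html/wp-content/uploads
```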
In general, we get the feeling that state in a K8s cluster is still a problem at times. For example, a Helm chart we used leveraged a StatefulSet of size one, effectively limiting future scaling.
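Stripped down, the pattern in question looks like this. The manifest is illustrative, not an excerpt from the actual chart we used:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: blog-db
spec:
  serviceName: blog-db
  replicas: 1   # size one: scaling out later means re-engineering the chart
  selector:
    matchLabels:
      app: blog-db
  template:
    metadata:
      labels:
        app: blog-db
    spec:
      containers:
        - name: db
          image: mariadb:10.5   # hypothetical image
```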
Overall, we were always dealing with an abstraction layer and had to work with the endpoints and tools available at that point in time. No endpoint? No tool? No clue!
Up to this point, we’ve not spent much time on another elephant in the room: The actual containers running in the K8s cluster. They certainly need to be taken into account as well.
Before you can take on K8s, you need to be able to create and manage containerised applications. At scale.
As with K8s itself and depending on the individual starting point, this might result in again taking on a whole new world. A whole ecosystem, in fact. Not to mention the deployment mechanism. Helm is all the rage. And ever changing.
Creating Containerised Apps
Managing Containerised Apps
Helm makes it incredibly easy to deploy things into a K8s cluster. But it also makes it very easy to lose track of installations. It ships with its own database, but that has its limitations, too.
Again, you’re dealing with an abstraction layer, having to work with the endpoints and the tooling available. Install the wrong Helm client on your machine by accidentally upgrading it, and you’ll struggle to communicate with its counterpart in the cluster.
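With Helm 2 in particular, the client had to match the in-cluster Tiller component. A sketch of the failure mode, with illustrative version numbers:

```shell
# After an accidental client upgrade, commands against the cluster fail
helm version
# typically reports mismatched Client/Server versions and errors out with
# something like: incompatible versions client[v2.16.x] server[v2.14.x]
```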
A Brave New World
K8s opens the door wide to the vision of a service-oriented and software-defined abstraction on a data centre. Also in the cloud. Especially in the cloud.
Maybe K8s is the cloud we always wanted.
K8s is fairly universal and can run pretty much anything (within limits) in a cluster of the right size — just ask Google about it. This reminds me of a quote about Python that might just as well apply to K8s:
K8s is like a Swiss-army knife. Extremely versatile. But you also need to know what you’re doing. Otherwise you may find yourself hammering a nail into the wall with a toothpick.
K8s is a cloud agnostic IaaS for containers with sensible default services such as real DNS and graceful container handling. Everything else can be added.
However, all that comes at a price. Complications and problems are simply pushed up the stack or hidden away behind layers of abstraction.
At the same time, the platform as well as the ecosystem surrounding it seem to be evolving at the speed of light. K8s APIs get deprecated and new ones introduced with seemingly every new release. The managed K8s service providers are doing a great job providing stability and continuity in a universe of rapid change.
However, this hasn’t stopped us from running into numerous situations where upgrades weren’t backwards compatible, broke required functionality, and we had to spend hours on end trying to come up with either a solution to the problem or a suitable workaround (read: technical debt).
Moreover, yesterday’s cool tool might easily turn out to be today’s unsupported legacy misadventure, as in our case of the two Terraform K8s providers. Picking a winning toolchain from an ever-growing sea of competing tools can feel a bit like playing the lottery.
We’ll Meet Again K8s!
The final article in this series explores the lessons for the enterprise use of K8s as well as the overall potential we can see in the platform. And why we’ll most likely meet again in the future.
Subscribe to How Hard Can It Be?!
Get the latest posts by following our LinkedIn page