The advantages of Kubernetes for stateless services are well documented. However, stateful workloads have particular requirements that have not yet been fully addressed by the Kubernetes ecosystem, according to James Munnelly, solutions engineer, and Matt Bates, co-founder of Jetstack, who presented at QCon London this year.
InfoQ took the chance to ask Munnelly and Bates about their view of, and ongoing work on, being able to configure, deploy, monitor, scale and auto-heal stateful services in Kubernetes in the same way as stateless services. In particular, we asked them about the approach and implementation of Navigator, an open source Kubernetes extension that Munnelly and Bates have been developing.
InfoQ: Why did you decide on a Kubernetes-only platform instead of a hybrid, with Kubernetes for stateless services and cloud provider services for data storage?
Matt Bates and James Munnelly: Where managed cloud provider services exist, such as Cloud SQL, there is a good case for using them in the pattern you describe. However, as more enterprises look to use Kubernetes in multiple environments, there is a desire for deployment and operational consistency, and this is not always achievable with different flavours of managed service. It is also the case that in some environments, especially on-premises, such managed services simply do not exist. This is the situation for many of our enterprise customers.
There are already some efforts, such as the Service Catalog project, to integrate cloud provider managed services into cloud-native applications. We see Navigator integrating with the Service Catalog to provide higher-level layers of abstraction than we already offer today.
InfoQ: What are the fundamental difficulties you faced managing stateful workloads on Kubernetes?
Bates and Munnelly: Kubernetes provides many benefits to industry in terms of development velocity, resource utilisation and automated operations; however, it's fair to say that these benefits have not yet carried over to stateful workloads.
Many common database systems assume they will run on machines with fixed software versions, persistent disks and a stable network identity: pets, essentially. Few systems are designed for highly dynamic environments like Kubernetes where, for instance, pods can come and go and change identity, and services are round-robin load balanced.
Moving database systems to Kubernetes is also problematic because Kubernetes itself lacks the complex, application-specific operational knowledge needed to respond appropriately to the many types of failure. So during these events, human intervention is often still required to ‘operate’ the database in question, which erodes the time and efficiency savings that automation was expected to bring.
InfoQ: In your talk you highlighted how Kubernetes has been adding features over time that help manage stateful workloads. Could you expand on that?
Bates and Munnelly: Since the very early days of the project, there have been efforts to introduce and mature features that enable workloads with state. Persistent Volumes, dynamic volume provisioning and StatefulSet provide building blocks that help run applications which require persistent disks or a stable network identity, such as databases. These are brilliant tools, but on their own they can be difficult to use and understand, and they cannot do everything needed to automate the operation of the many flavours of distributed database systems.
InfoQ: Despite that evolution, you've opted to develop a Kubernetes extension called Navigator. Could you tell us what features Navigator provides and how you see the tool fitting into the existing Kubernetes ecosystem?
Bates and Munnelly: We're very much building on this evolution. Resources such as StatefulSet and PersistentVolume, and their controllers, have brought about the building blocks for distributed stateful systems on Kubernetes. But by themselves these primitives are not quite enough, as they do not take into account the application-specific behaviour for bootstrap, scale-up/down, backup and restore, and more. We are building extensions to Kubernetes to fill the gaps between platform functionality and user experience.
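To make "stable network identity" and "persistent disks" concrete, the sketch below reproduces the naming conventions a StatefulSet applies to its pods, their DNS records and their volume claims. It assumes a StatefulSet named `cassandra` governed by a headless Service of the same name in the `default` namespace, a volume claim template called `data`, and the default `cluster.local` cluster domain; all of these names are illustrative. Note what it does not capture: nothing here knows how a Cassandra node should join or leave a ring, which is exactly the application-specific knowledge these primitives lack.

```python
from typing import List


def statefulset_pod_names(statefulset: str, replicas: int) -> List[str]:
    """Pods are named <statefulset>-<ordinal>, with ordinals 0..replicas-1."""
    return [f"{statefulset}-{i}" for i in range(replicas)]


def pod_dns_name(pod: str, service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Each pod gets a stable DNS record through the governing headless Service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def pvc_name(claim_template: str, pod: str) -> str:
    """volumeClaimTemplates yield one PVC per pod, named <template>-<pod>.

    The claim outlives the pod, so a rescheduled pod reattaches to the same disk.
    """
    return f"{claim_template}-{pod}"


if __name__ == "__main__":
    for pod in statefulset_pod_names("cassandra", 3):
        print(pod, pod_dns_name(pod, "cassandra", "default"), pvc_name("data", pod))
```

Because these identities survive rescheduling, a database peer list can be written in terms of DNS names rather than pod IPs.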
InfoQ: In your talk you mentioned the operator pattern. Could you summarize how this pattern can help or hinder the operation of Kubernetes stateful workloads?
Bates and Munnelly: The Operator pattern was introduced by the folks at CoreOS, and they have led the way in adopting it to orchestrate and manage the likes of etcd and Prometheus. In Navigator, we follow a similar pattern, but we also add a co-located binary (a ‘Pilot’) that wraps each deployed database process. It's our eyes and ears for determining the database node's state, which is reported back to the status of the corresponding Pilot resource in the Navigator API server (built on the Kubernetes API machinery).
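The Pilot's role as "eyes and ears" can be sketched as one iteration of an observe-and-report loop. This is a hypothetical illustration, not Navigator's code: `PilotStatus` and its fields are invented, the `probe` callable stands in for whatever local health check the database offers, and `InMemoryStatusStore` stands in for writing to a Pilot resource's status in the Navigator API server.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PilotStatus:
    """Hypothetical fields a Pilot might publish; not Navigator's real schema."""
    node: str
    healthy: bool
    version: str
    last_seen: float


class InMemoryStatusStore:
    """Stand-in for the Pilot resources held by the Navigator API server."""

    def __init__(self) -> None:
        self.statuses: Dict[str, PilotStatus] = {}

    def update_status(self, status: PilotStatus) -> None:
        self.statuses[status.node] = status


def pilot_tick(node: str, probe: Callable[[], dict], store: InMemoryStatusStore,
               clock: Callable[[], float] = time.time) -> PilotStatus:
    """One pass of a Pilot loop: observe the local database process and report.

    The Pilot records what it sees; it does not change cluster topology itself.
    """
    try:
        info = probe()  # e.g. a query against the database's local health endpoint
        status = PilotStatus(node, bool(info.get("healthy", False)),
                             str(info.get("version", "unknown")), clock())
    except Exception:
        # An unreachable process is reported as unhealthy rather than silently skipped.
        status = PilotStatus(node, False, "unknown", clock())
    store.update_status(status)
    return status
```

A real Pilot would call something like `pilot_tick` periodically, for example `pilot_tick("es-data-0", probe, store)` every few seconds, so that the reported state stays fresh.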
InfoQ: How does Helm, with its native Kubernetes application management, fit in? What are its main shortcomings when it comes to applications with strong data storage requirements?
Bates and Munnelly: It's great to see such an extensive and ever-growing library of Helm charts. Most applications can now be easily deployed from a ready-made chart, and that includes stateful systems such as MySQL, MongoDB and Elasticsearch; the list goes on. However, many of these charts still require point-in-time management and lack the operational knowledge for proactive management and failure recovery. A chart will spin up an Elasticsearch cluster for you, say, but it won't be able to handle a scale-down gracefully.
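Graceful scale-down of Elasticsearch shows the kind of operational knowledge a chart lacks. A StatefulSet removes pods from the highest ordinal down, so before shrinking it an operator has to move shards off the departing nodes, for example with Elasticsearch's `cluster.routing.allocation.exclude._name` shard-allocation filter, and wait until those nodes hold no shards. The sketch below plans those steps. It assumes Elasticsearch node names match pod names; the helper names and the `es-data` StatefulSet are hypothetical, and this is one possible approach rather than a description of how Navigator implements scale-down.

```python
from typing import Dict, List


def pods_removed_on_scale_down(statefulset: str, current: int, desired: int) -> List[str]:
    """A StatefulSet deletes the highest ordinals first: current-1 down to desired."""
    return [f"{statefulset}-{i}" for i in range(current - 1, desired - 1, -1)]


def drain_settings(pods: List[str]) -> dict:
    """Body for PUT _cluster/settings asking Elasticsearch to move shards off these nodes.

    Assumes each Elasticsearch node name equals its pod name.
    """
    return {"persistent": {"cluster.routing.allocation.exclude._name": ",".join(pods)}}


def safe_to_scale_down(pods: List[str], shards_per_node: Dict[str, int]) -> bool:
    """Only shrink the StatefulSet once every departing node reports zero shards.

    A node missing from the report is treated as not yet drained.
    """
    return all(shards_per_node.get(p) == 0 for p in pods)
```

An operator would apply the drain settings, poll shard counts (for instance from the `_cat/allocation` API) until `safe_to_scale_down` returns true, and only then lower the StatefulSet's replica count.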
InfoQ: How does the Navigator extension work, in a nutshell?
Bates and Munnelly: Navigator introduces new API types (such as ‘ElasticsearchCluster’ and ‘CassandraCluster’) that represent higher-level constructs for users to interact with.
We have then created an ‘operator’ that is responsible for creating and manipulating other Kubernetes resources in order to realise the ‘desired state’ (e.g. a valid Cassandra deployment). This controller continually watches the deployment and takes corrective action in response to failures, as well as driving operational tasks such as upgrades and scale-up/down.
To facilitate data collection from the databases being deployed, ‘Pilots’, small applications that run alongside your database processes, collect information and store it back in the Navigator API to inform decisions made by the controller.
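The desired-state loop described here can be sketched as a pure planning function. Everything in it is hypothetical: `ClusterSpec` and `NodePool` are a simplified stand-in for a ‘CassandraCluster’ resource, the `<cluster>-<pool>` StatefulSet naming is invented, and actions are returned as strings instead of being sent to the Kubernetes API. The property it illustrates is that reconciliation is level-triggered: given the same spec and observations it always plans the same actions, and when the two agree it plans nothing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodePool:
    """Hypothetical slice of a CassandraCluster spec; not Navigator's schema."""
    name: str
    replicas: int


@dataclass
class ClusterSpec:
    name: str
    version: str
    node_pools: List[NodePool]


@dataclass
class ObservedStatefulSet:
    """What the controller currently sees in the cluster for one node pool."""
    replicas: int
    version: str


def reconcile(spec: ClusterSpec, observed: Dict[str, ObservedStatefulSet]) -> List[str]:
    """Compare desired with observed state and plan corrective actions."""
    actions: List[str] = []
    for pool in spec.node_pools:
        sts = f"{spec.name}-{pool.name}"  # hypothetical naming scheme
        current = observed.get(sts)
        if current is None:
            # Missing resource (never created, or deleted by mistake): create it.
            actions.append(f"create StatefulSet {sts} replicas={pool.replicas} version={spec.version}")
            continue
        if current.version != spec.version:
            actions.append(f"upgrade StatefulSet {sts} {current.version}->{spec.version}")
        if current.replicas != pool.replicas:
            actions.append(f"scale StatefulSet {sts} {current.replicas}->{pool.replicas}")
    return actions
```

A controller would run a function like `reconcile` on every watch event and on a periodic resync, so that a missed event is corrected on the next pass.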
This separation of collection from action has been a key success for the project so far.
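With collection separated from action, the controller decides purely from what Pilots have reported. The sketch below shows one such decision, picking the next node in a rolling upgrade. `PilotReport` and its fields, the 30-second staleness threshold, and the helper names are assumptions for illustration; the highest-ordinal-first order mirrors how a StatefulSet RollingUpdate proceeds. Because it reads only reports, the logic can be tested without a running database, and a stale or unhealthy report simply pauses the rollout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PilotReport:
    """Hypothetical status written by a Pilot; the controller only reads these."""
    pod: str
    healthy: bool
    version: str
    age_seconds: float  # time since this Pilot last reported


def ordinal(pod: str) -> int:
    """StatefulSet pods end in -<ordinal>, e.g. cass-10 -> 10."""
    return int(pod.rsplit("-", 1)[1])


def next_pod_to_upgrade(reports: List[PilotReport], target_version: str,
                        max_age_seconds: float = 30.0) -> Optional[str]:
    """Return the next pod to upgrade, or None if the rollout should wait or is done."""
    # Any stale or unhealthy report pauses the rollout rather than risking availability.
    if any(not r.healthy or r.age_seconds > max_age_seconds for r in reports):
        return None
    pending = [r.pod for r in reports if r.version != target_version]
    # Highest ordinal first, matching the order of a StatefulSet RollingUpdate.
    # Ordinals are compared numerically so that cass-10 sorts after cass-9.
    return max(pending, key=ordinal) if pending else None
```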
InfoQ: Navigator makes use of the Kubernetes CustomResourceDefinition (CRD), correct? But it also extends the Kubernetes API. Why? Could you provide an example?
Bates and Munnelly: Navigator extends the API by including its own API server that can run alongside an existing Kubernetes control plane in order to provide the API extensions.
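Some background on the CRD-versus-API-server distinction: with a CRD, the kube-apiserver itself stores and serves the new type, whereas with API aggregation an APIService object tells the kube-apiserver to proxy an entire API group to a separate server, which can then implement its own storage, validation and subresources. The sketch below builds such an APIService manifest. The group `navigator.jetstack.io`, version `v1alpha1`, Service `navigator-apiserver` in namespace `navigator`, and the priority values are illustrative assumptions, not taken from Navigator's deployment files.

```python
import json


def api_service(group: str, version: str, service: str, namespace: str,
                ca_bundle_b64: str) -> dict:
    """Build an APIService registering an aggregated API server.

    Once applied, requests to /apis/<group>/<version> on the kube-apiserver are
    proxied to the given Service, whose TLS certificate is verified against the
    base64-encoded CA bundle.
    """
    return {
        "apiVersion": "apiregistration.k8s.io/v1",
        "kind": "APIService",
        # The object must be named <version>.<group>.
        "metadata": {"name": f"{version}.{group}"},
        "spec": {
            "group": group,
            "version": version,
            "service": {"name": service, "namespace": namespace},
            "caBundle": ca_bundle_b64,
            # Illustrative priorities; they order groups and versions in API discovery.
            "groupPriorityMinimum": 1000,
            "versionPriority": 15,
        },
    }


if __name__ == "__main__":
    manifest = api_service("navigator.jetstack.io", "v1alpha1",
                           "navigator-apiserver", "navigator", "<base64 CA>")
    print(json.dumps(manifest, indent=2))  # kubectl apply -f accepts JSON as well as YAML
```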