Home United States USA — software Using Machine Learning to Automate Kubernetes Optimization

Using Machine Learning to Automate Kubernetes Optimization

August 29, 2022

123

This article provides an explanation of how machine learning can be used to manage Kubernetes resources efficiently.
Join the DZone community and get the full member experience.
Note: This is the third article of a series covering Kubernetes resource management and optimization. In this article, we explain how machine learning can be used to manage Kubernetes resources efficiently. Previous articles explained Kubernetes resource types and requests and limits.
As Kubernetes has become the de-facto standard for application container orchestration, it has also raised vital questions about optimization strategies and best practices. One of the reasons organizations adopt Kubernetes is to improve efficiency, even while scaling up and down to accommodate changing workloads; however, the same fine-grained control that makes Kubernetes so flexible also makes it challenging to effectively tune and optimize.
In this article, we’ll explain how machine learning can be used to automate the tuning of these resources and ensure efficient scaling for variable workloads.
Optimizing applications for Kubernetes is largely a matter of ensuring that the code uses its underlying resources — namely CPU and memory — as efficiently as possible. That means ensuring performance that meets or exceeds service-level objectives at the lowest possible cost and with minimal effort.
When creating a cluster, we can configure the use of two primary resources — memory and CPU — at the container level. Namely, we can set limits as to how much of these resources our application can use and request. We can think of those resource settings as our input variables, and the output in terms of performance, reliability, and resource usage (or cost) of running our application. As the number of containers increases, the number of variables also increases, and with that, the overall complexity of cluster management and system optimization increases exponentially.
To further complicate matters, different resource parameters are interdependent. Changing one parameter may have unexpected effects on cluster performance and efficiency. This means that manually determining the precise configurations for optimal performance is an impossible task unless you have unlimited time and Kubernetes experts.
If we do not set custom values for resources during the container deployment, Kubernetes automatically assigns these values. The challenge here is that Kubernetes is quite generous with its resources to prevent two situations: service failure due to an out-of-memory (OOM) error and unreasonably slow performance due to CPU throttling. However, using the default configurations to create a cloud-based cluster will result in unreasonably high cloud costs without guaranteeing sufficient performance.
This all becomes even more complex when we seek to manage multiple parameters for several clusters. For optimizing an environment’s worth of metrics, a machine learning system can be an integral addition.
There are two general approaches to machine learning-based optimization, each of which provides value in a different way. First, experimentation-based optimization can be done in a non-prod environment using a variety of scenarios to emulate possible production scenarios. Second, observation-based optimization can be performed either in prod or non-prod by observing actual system behavior.