This article reviews how Kubernetes provides the platform capabilities for dynamic deployment, scaling, and management in cloud-native applications.
In the era of web-scale, every organization wants to scale its applications on demand while minimizing infrastructure expenditure. Cloud-native applications, such as microservices, are designed and implemented with scale in mind, and Kubernetes provides the platform capabilities for dynamic deployment, scaling, and management.
Autoscaling and scale to zero are critical functional requirements for all serverless platforms, as well as for platform-as-a-service (PaaS) providers, because they help minimize infrastructure costs. For example, the following graph shows how Microsoft Azure saves money by offering autoscaling and snooze (scale to zero) in the Azure PaaS, and how its customers have directly benefited from those savings. The example is not specific to Kubernetes deployments, but it shows how important it is to have autoscaling and scale to zero in your solutions.
Kubernetes' Horizontal Pod Autoscaler (HPA) automatically scales an application workload by adjusting the number of Pods in a Deployment (or ReplicationController, ReplicaSet, or StatefulSet) based on observed metrics such as CPU utilization, memory consumption, or custom metrics provided by the application.
Horizontal Pod Autoscaler
HPA uses the following simple algorithm to make its scaling decision, and it scales the workload within the defined minimum and maximum number of replicas:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
Kubernetes' default HPA is based on CPU utilization, and desiredReplicas never goes lower than 1: CPU utilization cannot be zero for a running Pod, so the ratio in the formula stays positive and its ceiling is always at least 1. The same holds for memory-based autoscaling, so neither metric lets you achieve scale to zero. However, it is possible to scale to zero replicas if you ignore CPU and memory utilization and use other metrics to determine whether the application is idle.
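A minimal Python sketch of the replica calculation above follows. The function name, its parameters, and the default minimum and maximum are illustrative assumptions. The 10% tolerance mirrors the default described in the Kubernetes HPA documentation, and the sketch deliberately ignores stabilization windows, Pod readiness, and missing metrics, which the real controller also handles. It also makes the scale-to-zero limitation concrete: the ceiling of any positive product is at least 1.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA formula: ceil(current * current/target), clamped to [min, max].

    Illustrative only; the defaults for min/max replicas are assumptions.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the ratio is already close to 1.0
    # (0.1 mirrors the default tolerance in the Kubernetes docs).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Any positive metric gives a positive product, so ceil() returns >= 1.
    raw = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))
```

For example, desired_replicas(4, 90, 50) returns 8 (the ceiling of 7.2), while desired_replicas(2, 5, 50, min_replicas=0) still returns 1 even though the minimum allows zero. Only a metric of exactly 0 produces 0 replicas, which a running Pod's CPU utilization never reports.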
As an example of such an idle signal, a workload that only consumes and processes messages from a queue can scale to zero if queue length is used as the metric and the queue has been empty for a given period of time. Of course, other factors also matter for a smooth user experience, such as how tolerant the workload is of added latency and how quickly it boots (its warm-up time).
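The queue-driven policy described above might be sketched as follows. This is a hypothetical decision function, not a Kubernetes API: QueueScaler, the per-replica backlog target, the 300-second idle period, and the replica cap are all assumptions chosen for illustration. While messages are waiting, it sizes the workload to the backlog; it returns 0 only after the queue has stayed empty for the whole idle period.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueScaler:
    """Hypothetical scale-to-zero policy driven by queue length, not CPU or memory."""
    target_messages_per_replica: int = 100  # assumed backlog each replica should handle
    idle_period_seconds: float = 300.0      # assumed time the queue must stay empty
    max_replicas: int = 10                  # assumed upper bound
    _empty_since: Optional[float] = None    # when the queue was first seen empty

    def decide(self, queue_length: int, current_replicas: int, now: float) -> int:
        if queue_length > 0:
            # Work is waiting: reset the idle timer and size replicas to the backlog.
            self._empty_since = None
            wanted = math.ceil(queue_length / self.target_messages_per_replica)
            return max(1, min(self.max_replicas, wanted))
        # Queue is empty: start the idle timer if it is not already running.
        if self._empty_since is None:
            self._empty_since = now
        if now - self._empty_since >= self.idle_period_seconds:
            return 0  # idle for the whole period: scale to zero
        return current_replicas  # empty, but not for long enough yet
```

The idle period is the knob that trades cost for latency: a longer idle_period_seconds keeps a warm replica around for bursty traffic, at the price of paying for it while it sits idle, which ties back to the latency and warm-up considerations above.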