Kubernetes Vertical Pod Autoscaling Explained: When and Why to Scale Vertically

In its 1.27 release, Kubernetes introduced, as an alpha feature, the Vertical Pod Autoscaling capability, which gives Kubernetes workloads the option to scale using the “vertical” approach: adding more resources to an existing pod. This extends the autoscaling capabilities you already have at your disposal for your Kubernetes workloads, such as KEDA or Horizontal Pod Autoscaling.

Vertical Scaling vs Horizontal Scaling

Vertical and horizontal scaling are two approaches used in scaling up the performance and capacity of computer systems, particularly in distributed systems and cloud computing. Vertical scaling, also known as scaling up or scaling vertically, involves adding more resources, such as processing power, memory, or storage, to a single instance or server. This means upgrading the existing compute components or migrating to a more powerful infrastructure. Vertical scaling is often straightforward to implement and requires minimal changes to the software architecture. It is commonly used when the system demands can be met by a single, more powerful infrastructure.

On the other hand, horizontal scaling, also called scaling out or scaling horizontally, involves adding more instances or servers to distribute the workload. Instead of upgrading a single instance, multiple instances are employed, each handling a portion of the workload. Horizontal scaling offers the advantage of increased redundancy and fault tolerance since multiple instances can share the load. Additionally, it provides the ability to handle larger workloads by simply adding more machines to the cluster. However, horizontal scaling often requires more complex software architectures, such as load balancing and distributed file systems, to efficiently distribute and manage the workload across the machines.

In summary, vertical scaling involves enhancing the capabilities of a single instance, while horizontal scaling involves distributing the workload across multiple instances. Vertical scaling is easier to implement but is limited by the maximum resources available on a single machine. Horizontal scaling provides better scalability and fault tolerance but requires more complex software infrastructure. The choice between vertical and horizontal scaling depends on factors such as the specific requirements of the system, the expected workload, and the available resources.

Why Kubernetes Vertical Autoscaling?

This is an interesting topic because we have long lived in a world where the accepted wisdom was that it is always better to scale out (horizontal scaling) than to scale up (vertical scaling); this was one of the mantras of cloud-native development in particular. And that hasn’t changed: horizontal scaling still provides far more benefits than vertical scaling, and it is well covered by the built-in autoscaling capabilities or side projects such as KEDA. So why is Kubernetes including this feature, and why are we using this site to discuss it?

Because as Kubernetes has become the de-facto target for almost any deployment nowadays, the characteristics and capabilities of the workloads you need to handle have broadened, and that is why you need different techniques to provide the best experience for each workload type.

How Does Kubernetes Vertical Autoscaling Work?

As mentioned, this new feature is still in the “alpha” stage, so it is something to try in an experimental mode rather than use at the production level. You will find all the documentation about it here: HPA Documentation

Vertical scaling works by letting you change the resources assigned to the pod, CPU and memory, without needing to restart the pod, and that is a clear benefit of this approach. As you know, until now, if you wanted to change the resources applied to a workload, you had to update the manifest and restart the pod to apply the new changes.

To define this, you need to specify the resizePolicy by adding a new section to the pod manifest, as you can see here:

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-5
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-ctr-5
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: RestartContainer
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"

In this example, we define, for each resource name, the policy that we want to apply: if we change the CPU assigned, it won’t require a restart, but if we change the memory, it will restart the container.

That means that if you would like to change the CPU assigned, you can directly patch the manifest, as you can see in the snippet below, and the assigned resources are updated in place:

 kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'
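
To check that the resize really happened in place, you can inspect the pod status afterwards. A minimal sketch, assuming the InPlacePodVerticalScaling feature gate is enabled on your 1.27+ cluster:

# Resources currently allocated to the container (alpha status field)
kubectl -n qos-example get pod qos-demo-5 \
  -o jsonpath='{.status.containerStatuses[0].allocatedResources}'
# Restart count should still be 0, as the CPU resizePolicy is NotRequired
kubectl -n qos-example get pod qos-demo-5 \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'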

When to Use Vertical Scaling: What Are the Target Scenarios?

Which of these capabilities you can apply depends on many factors, from the use case to the technology stack your workload is using. As a rule of thumb, a CPU change is easy for any technology to adapt to, but a memory change is more difficult depending on the technology used, because in most technologies the memory available is fixed at startup time (a JVM heap sized with -Xmx, for example).

This will help to update components whose requirements have changed over time, to test new workloads with live load without disrupting the application’s current processing, or simply to resize workloads that don’t support horizontal scaling because they are designed to run as a single replica.

Conclusion

In conclusion, Kubernetes has introduced Vertical Pod Autoscaling, enabling workloads to scale vertically by adding resources to existing pods. Vertical autoscaling allows resource changes without restarting pods, providing flexibility in managing CPU and memory allocations.

Kubernetes Vertical autoscaling offers a valuable option for adapting to evolving workload needs. It complements horizontal scaling by providing flexibility without the need for complex software architectures. By combining vertical and horizontal scaling approaches, Kubernetes users can optimize their deployments based on specific workload characteristics and available resources.

Kubernetes Autoscaling 1.26 Explained: HPA v2 Changes and Impact on KEDA

Introduction

Kubernetes autoscaling has undergone a dramatic change. Since the Kubernetes 1.26 release, all components should migrate their HorizontalPodAutoscaler objects from v1 to the new v2 release, which has been available since Kubernetes 1.23.

HorizontalPodAutoscaler is a crucial component in any workload deployed on a Kubernetes cluster, as the scalability of this solution is one of the great benefits and key features of this kind of environment.

A little bit of History

Kubernetes introduced a solution for autoscaling a long time ago, in version 1.3, back in 2016. The solution is based on a control loop that runs at a specific interval, which you can configure with the --horizontal-pod-autoscaler-sync-period parameter of the kube-controller-manager.

Once during each period, the loop fetches the metrics and evaluates them against the conditions defined on the HorizontalPodAutoscaler object. Initially, it was based on the compute resources used by the pod: main memory and CPU.
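
The scaling decision itself follows a simple proportional rule, described in the official documentation: desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]. For example, if 2 replicas are running at 200m CPU against a 100m target, the controller proposes ceil(2 × 200 / 100) = 4 replicas.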

Kubernetes Autoscaling 1.26: A Game-Changer for KEDA Users?

This provided an excellent feature but, over time and with the growing adoption of Kubernetes, it has proven a little narrow to handle all the scenarios we face, and this is where other awesome projects we have discussed here, such as KEDA, come into the picture to provide a much more flexible set of features.

Kubernetes Autoscaling Capabilities Introduced in v2

The release of v2 of the autoscaling API objects includes a range of capabilities that upgrade the flexibility and the options available. The most relevant ones are the following:

  • Scaling on custom metrics: With the new release, you can configure a HorizontalPodAutoscaler object to scale using custom metrics, that is, any metric generated from Kubernetes. You can see a detailed walkthrough about using custom metrics in the official documentation
  • Scaling on multiple metrics: With the new release, you also have the option to scale based on more than one metric. The HorizontalPodAutoscaler will evaluate each scaling rule, propose a new scale value for each of them, and take the maximum value as the final one.
  • Support for the Metrics API: With the new release, the HorizontalPodAutoscaler controller retrieves metrics from a series of registered APIs, such as metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io. For more information on the different metrics available, you can take a look at the design proposal
  • Configurable scaling behavior: With the new release, you have a new field, behavior, that allows you to configure how the component behaves when scaling up or down. You can define different policies for scaling up and others for scaling down, and you can limit the maximum number of replicas that can be added or removed in a specific period, to handle spikes in components such as Java workloads, among others. You can also define a stabilization window to avoid stress while the metric is still fluctuating (see the example manifest after this list).
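
To illustrate these options, here is a minimal HorizontalPodAutoscaler manifest combining several of them; the Deployment name sample-app and the custom metric http_requests_per_second are hypothetical placeholders:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Multiple metrics: the rule proposing the most replicas wins
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleDown:
      # Wait 5 minutes of stable metrics before removing replicas
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60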

Kubernetes Autoscaling v2 vs KEDA

We have seen all the new benefits that Autoscaling v2 provides, so I’m sure that most of you are asking the same question: Is Kubernetes Autoscaling v2 killing KEDA?

In its latest releases, KEDA already includes the new objects under the autoscaling/v2 group as part of its development, as KEDA relies on the native Kubernetes objects. It also simplifies part of the process of using custom or external metrics, since it has scalers available for pretty much everything you could need now or even in the future.

But, even with that, there are still features that KEDA provides that are not covered here, such as the scaling “from zero” and “to zero” capabilities that are very relevant for specific kinds of workloads and to get a very optimized use of resources. Still, it’s safe to say that with the new features included in the autoscaling/v2 release, the gap is now smaller. Depending on your needs, you can go with the out-of-the-box capabilities without including a new component in your architecture.
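
For instance, scale-to-zero in KEDA is just a matter of setting minReplicaCount on the ScaledObject. A minimal sketch, where the Deployment name consumer-app and the Kafka connection details are hypothetical:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-app-scaler
spec:
  scaleTargetRef:
    name: consumer-app
  minReplicaCount: 0   # scale down to zero when no work is pending
  maxReplicaCount: 10
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: consumer-app
      topic: orders
      lagThreshold: "50"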

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)

KEDA provides a rich environment to scale your applications beyond the traditional HPA approach based on CPU and memory

Autoscaling is one of the great features of cloud-native environments and helps us make an optimized use of our resources. Kubernetes provides many options to do that, one of them being the Horizontal Pod Autoscaler (HPA) approach.

HPA is the mechanism Kubernetes uses to detect whether any of the pods need to be scaled, and it is based on metrics such as CPU usage or memory.

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Sometimes those metrics are not enough to decide whether the number of replicas we have available is sufficient. Other metrics, such as the number of requests or the number of pending events, can provide a better perspective.

Kubernetes Event-Driven Autoscaling (KEDA)

Here is where KEDA comes to help. KEDA stands for Kubernetes Event-Driven Autoscaling and provides a more flexible approach to scale our pods inside a Kubernetes cluster.

It is based on scalers that integrate different sources to measure the number of requests or events received from messaging systems such as Apache Kafka, AWS Kinesis, or Azure Event Hubs, and from other systems such as InfluxDB or Prometheus.

KEDA works as it is shown in the picture below:

[Figure: How KEDA works]

We create a ScaledObject that links our external event source (e.g., Apache Kafka, Prometheus) with the Kubernetes Deployment we would like to scale, and we register it in the Kubernetes cluster.

KEDA will monitor the external source and, based on the metrics gathered, will tell the Horizontal Pod Autoscaler to scale the workload as defined.

Testing the Approach with a Use-Case

So, now that we know how that works, we will do some tests to see it live. We are going to show how we can quickly scale one of our applications using this technology. And to do that, the first thing we need to do is to define our scenario.

In our case, the scenario will be a simple cloud-native application developed using a Flogo application exposing a REST service.

The first step we need to take is to deploy KEDA in our Kubernetes cluster, and there are several options to do that: Helm charts, an Operator, or plain YAML files. In this case, we are going to use the Helm chart approach.

So, we are going to type the following commands to add the helm repository and update the charts available, and then deploy KEDA as part of our cluster configuration:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda

After running these commands, KEDA is deployed in our K8s cluster, and typing kubectl get all will show a situation similar to this one:

pod/keda-operator-66db4bc7bb-nttpz 2/2 Running 1 10m
pod/keda-operator-metrics-apiserver-5945c57f94-dhxth 2/2 Running 1 10m

Now, we are going to deploy our application. As already mentioned, we are going to use our Flogo application, and the flow will be as simple as this one:

[Figure: Flogo application listening to the requests]
  • The application exposes a REST service using the /hello as the resource.
  • Received requests are printed to the standard output, and a message is returned to the requester

Once we have our application deployed on our Kubernetes cluster, we need to create a ScaledObject that is responsible for managing the scalability of that component:

[Figure: ScaledObject configuration for the application]

We use Prometheus as a trigger, and because of that, we need to configure where our Prometheus server is hosted and what query we would like to run to manage the scalability of our component.

In our sample, we will use flogo_flow_execution_count, the metric that counts the number of requests received by this component; when its rate goes higher than 100, a new replica will be launched.
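
Since that configuration is shown above only as an image, here is a sketch of what such a ScaledObject could look like; the Deployment name flogo-app and the Prometheus server address are assumptions for illustration:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: flogo-app-scaler
spec:
  scaleTargetRef:
    name: flogo-app
  triggers:
  - type: prometheus
    metadata:
      # Where our Prometheus server is hosted (assumed address)
      serverAddress: http://prometheus-server.monitoring.svc:9090
      metricName: flogo_flow_execution_count
      # Launch a new replica when the request rate goes above 100
      query: sum(rate(flogo_flow_execution_count[2m]))
      threshold: "100"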

After hitting the service with a load test, we can see that as soon as the service reaches the threshold, it launches a new replica to start handling requests, as expected.

[Figure: Autoscaling being done using Prometheus metrics]

All of the code and resources are hosted in the GitHub repository that accompanies this article.


Summary

This post has shown that we have many options when deciding how to scale our workloads. We can use the standard metrics like CPU and memory, but if we need to go beyond that, we can use different external sources of information to trigger the autoscaling.

Kubernetes Autoscaling: Learn How to Scale Your Kubernetes Deployments Dynamically

Discover the different options to scale your platform based on the traffic load you receive

When talking about Kubernetes, you’re always talking about the flexibility options that it provides. Usually, one of the topics that come into the discussion is the elasticity options that come with the platform — especially when working on a public cloud provider. But how can we really implement it?

Before we start to show how to scale our Kubernetes platform, we need to do a quick recap of the options that are available to us:

  • Cluster Autoscaler: When the load on the whole infrastructure reaches its peak, we can respond by creating new worker nodes to host more service instances.
  • Horizontal Pod Autoscaling: When the load on a specific pod or set of pods reaches its peak, we deploy new instances to ensure the global availability that we need.

Let’s see how we can implement these using one of the most popular managed Kubernetes services, Amazon’s Elastic Kubernetes Service (EKS).


Setup

The first thing that we’re going to do is create a cluster with a single worker node to demonstrate the scalability behavior easily. To do that, we’re going to use eksctl, a command-line tool that makes it easy to manage an EKS cluster.

We’re going to create the cluster with the following command:

eksctl create cluster --name=eks-scalability --nodes=1 --region=eu-west-2 --node-type=m5.large --version 1.17 --managed --asg-access

After a few minutes, we will have our own Kubernetes cluster with a single node to deploy applications on top of it.

Now we’re going to create a sample application to generate load. We’re going to use TIBCO BusinessWorks Application Container Edition to generate a simple application. It will be a REST API that will execute a loop of 100,000 iterations acting as a counter and return a result.

[Figure: BusinessWorks sample application to show the scalability options]

And we will use the resources available in this GitHub repository: https://github.com/alexandrev/testeks

We will build the container image and push it to a container registry. In my case, I am going to use my Amazon ECR instance to do so, and I will use the following commands:

docker build -t testeks:1.0 .
docker tag testeks:1.0 938784100097.dkr.ecr.eu-west-2.amazonaws.com/testeks:1.0
docker push 938784100097.dkr.ecr.eu-west-2.amazonaws.com/testeks:1.0

And once the image is pushed to the registry, we will deploy the application on top of the Kubernetes cluster using this command:

kubectl apply -f .\testeks.yaml

After that, we will have our application deployed there, as you can see in the picture below:

[Figure: Image deployed on the Kubernetes cluster]

So, now we can test the application. To do so, I will make port 8080 available using a port-forward command like this one:

kubectl port-forward pod/testeks-v1-869948fbb-j5jh7 8080:8080

With that, I can see and test the sample application using the browser, as shown below:

[Figure: Swagger UI tester for the Kubernetes sample application]

Horizontal pod autoscaling

Now, we need to start defining the autoscale rules, and we will start with the Horizontal Pod Autoscaler (HPA) rule. We will need to choose the resource that we would like to use to scale our pod. In this test, I will use the CPU utilization to do so, and I will use the following command:

kubectl autoscale deployment testeks-v1 --min=1 --max=5 --cpu-percent=80

That command will scale the deployment testeks-v1 from one (1) to five (5) instances when the CPU utilization is higher than 80%.
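
For reference, that imperative command creates an HPA object roughly equivalent to this declarative manifest (a sketch using the autoscaling/v1 API that kubectl autoscale generates):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: testeks-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: testeks-v1
  minReplicas: 1
  maxReplicas: 5
  # Trigger scaling when average CPU utilization exceeds 80%
  targetCPUUtilizationPercentage: 80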

If we now check the status of the components with kubectl get hpa, we will get something similar to the image below:

[Figure: HPA rule definition for the application using CPU utilization as the key metric]

If we check the TARGETS column, we will see this value: <unknown>/80%. That means that 80% is the target to trigger the new instances and the current usage is <unknown>.

We do not have anything deployed on the cluster yet to gather the metrics for each of the pods. To solve that, we need to deploy the Metrics Server, and to do so, we will follow the Amazon AWS documentation.

So, running the following command, we will have the Metrics Server installed.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml
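
Before relying on it, it is worth confirming that the Metrics Server is up; a quick check (the deployment lives in the kube-system namespace):

kubectl get deployment metrics-server -n kube-system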

And after doing that, if we check again, we can see that the current usage has replaced the <unknown> value:

[Figure: Current resource utilization after installing the Metrics Server on the Kubernetes cluster]

Once that works, I am going to start sending requests using a load test running inside the cluster. I will use the sample app defined below:

To deploy, we will use a YAML file with the following content:

https://gist.github.com/BetterProgramming/53181f3aa7bee7b7e3adda7c4ed8ca40#file-deploy-yaml

And we will deploy it using the following command:

kubectl apply -f tester.yaml

After doing that, we will see that the current utilization increases. After a few seconds, the HPA will start spinning up new instances until it reaches the maximum number of pods defined in the HPA rule.

[Figure: Pods increasing when the load exceeds the target defined in previous steps]

Then, as soon as the load decreases, the extra instances are deleted.

[Figure: Pods deleted as soon as the load decreases]

Cluster autoscaling

Now, we need to see how we can implement the Cluster Autoscaler on EKS. We will follow the guidance that Amazon provides in its documentation.

The first step is to deploy the Cluster Autoscaler, and we will do it using the following command:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

Then we will run this command:

kubectl -n kube-system annotate deployment.apps/cluster-autoscaler cluster-autoscaler.kubernetes.io/safe-to-evict="false"

Then we will edit the deployment to provide the name of the cluster that we are managing. To do that, we will run the following command:

kubectl -n kube-system edit deployment.apps/cluster-autoscaler

When your default text editor opens with the text content, you need to make the following changes:

  • Set your cluster name in the placeholder available.
  • Add these additional properties:
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false
[Figure: Deployment edits needed to configure the Cluster Autoscaler]
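
After the edit, the container’s command section should look roughly like the sketch below, based on the standard autodiscover manifest, with our cluster name (eks-scalability) filled into the node-group auto-discovery tag:

    spec:
      containers:
      - command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        # The cluster name replaces the <YOUR CLUSTER NAME> placeholder
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks-scalability
        # The two additional properties from the step above
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false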

Now we need to run the following command:

kubectl -n kube-system set image deployment.apps/cluster-autoscaler cluster-autoscaler=eu.gcr.io/k8s-artifacts-prod/autoscaling/cluster-autoscaler:v1.17.4

The only thing left is to define the Auto Scaling policy. To do that, we will use the AWS console:

  • Enter the EC2 service page in the region in which we have deployed the cluster.
  • Select the Auto Scaling Group options.
  • Select the Auto Scaling Group that has been created as part of the EKS cluster-creating process.
  • Go to the Automatic Scaling tab and click on the Add Policy button available.
[Figure: Auto Scaling policy option in the EC2 service console]

Then we should define the policy. We will use the Average CPU utilization as the metric and set the target value to 50%:

[Figure: Auto Scaling policy creation dialog]
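
The same target-tracking policy can also be created from the command line; a sketch with the AWS CLI, where the Auto Scaling group name is a placeholder you would take from your own EKS cluster:

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name <your-eks-node-group-asg> \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{"PredefinedMetricSpecification":{"PredefinedMetricType":"ASGAverageCPUUtilization"},"TargetValue":50.0}'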

To validate the behavior, we will generate load using the tester as we did in the previous test and validate the node load using the following command:

kubectl top nodes
[Figure: kubectl top nodes sample output]

Now we deploy the tester again. As we already have it deployed in this cluster, we need to delete it first to deploy it again:

kubectl delete -f .\tester.yaml
kubectl apply -f .\tester.yaml

As soon as the load starts, new nodes are created, as shown in the image below:

[Figure: kubectl top nodes showing how nodes have been scaled up]

After the load finishes, we go back to the previous situation:

[Figure: kubectl top nodes showing how nodes have been scaled down]

Summary

In this article, we have shown how we can scale a Kubernetes cluster in a dynamic way both at the worker node level using the Cluster Autoscaler capability and at the pod level using the Horizontal Pod Autoscaler. That gives us all the options needed to create a truly elastic and flexible environment able to adapt to each moment’s needs with the most efficient approach.
