Ensuring Kubernetes Security: A Collaborative Journey for Developers and Operators

Kubernetes security is one of the most critical concerns in today's IT world. Kubernetes has become the backbone of modern infrastructure management, allowing organizations to scale and deploy containerized applications with ease. However, the power of Kubernetes also brings the responsibility of ensuring robust security measures are in place. This responsibility cannot rest solely on the shoulders of developers or operators alone. It demands a collaborative effort where both parties work together to mitigate potential risks and vulnerabilities.

Even though DevOps and Platform Engineering approaches are now fairly standard, and most organizations have platform and project teams, there are still tasks that different teams are responsible for.

Here you will see three easy ways to improve your Kubernetes security from both dev and ops perspectives:

No Vulnerabilities in Container Images

A vulnerability scan on container images is crucial in today's development because the number of components deployed on a system has grown exponentially, and so has their opacity. Scanning with tools such as Trivy, or with the options integrated into local environments such as Docker Desktop or Rancher Desktop, is mandatory. But how can you use it to make your application more secure?

  • Developer’s responsibility:
    • Use only allowed, well-known standard base images
    • Reduce to a minimum the number of components and packages installed with your application (better Alpine than Debian)
    • Use a multi-stage build approach to include only what you need in your images.
    • Run a vulnerability scan locally before pushing (see the example after this list)
  • Operator’s responsibility:
    • Force all base images to be downloaded from the corporate container registry
    • Enforce vulnerability scan on push, generating alerts and avoiding deployment if the quality criteria are unmet.
    • Perform regular vulnerability scans for runtime images and generate incidents for the development teams based on the issues discovered.
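For reference, this is roughly what a local scan before a push could look like. It is a sketch that assumes Trivy is installed and uses registry.example.com/my-app:1.0.0 as a placeholder image name:

 # Scan the locally built image and fail (exit code 1) on HIGH or CRITICAL findings
 trivy image --exit-code 1 --severity HIGH,CRITICAL registry.example.com/my-app:1.0.0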

No Additional Privileges in Container Images

Now that our application doesn't include any known vulnerability, we need to ensure the image is not allowed to do what it shouldn't, such as elevating privileges. See what you can do depending on your role:

  • Developer responsibility:
    • Never create images that run as the root user, and use the security context options in your Kubernetes manifest files (see the sketch after this list)
    • Test your images with all possible capabilities dropped unless some are needed for a specific reason
    • Make your filesystem read-only and use volumes for the folders your application needs to write to.
  • Operator’s responsibility:
    • Enforce these rules at the cluster level, for example with Pod Security Admission or an admission policy engine, so pods running as root, escalating privileges, or requesting extra capabilities are rejected at deploy time.
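As a reference for the developer-side items above, here is a minimal sketch of a pod manifest applying those ideas; the pod name, image, and user ID are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: secure-app                               # placeholder name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                             # any non-root UID baked into the image
  containers:
  - name: app
    image: registry.example.com/my-app:1.0.0     # placeholder image
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: tmp                                  # writable volume for the folders the app needs
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}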

Restrict visibility between components

When we design applications nowadays, they are expected to connect to other applications and components, and the service discovery capabilities in Kubernetes make that interaction very easy. However, this also allows other apps to connect to services they maybe shouldn't reach. See what you can do to help on that aspect depending on your role and responsibility:

  • Developer responsibility:
    • Ensure your application has proper authentication and authorization policies in place to avoid any unauthorized use of your application.
  • Operator’s responsibility:
    • Manage the network visibility of the components at the platform level: deny all traffic by default and allow only the connections required by design, using Network Policies (see the sample after this list).
    • Use Service Mesh tools to have a central approach for authentication and authorization.
    • Use tools like Kiali to monitor the network traffic and detect unreasonable traffic patterns.
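As a reference for the deny-by-default approach mentioned above, here is a minimal sketch of a pair of Network Policies; the namespace and the frontend/backend labels are placeholders:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app            # placeholder namespace
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: backend             # placeholder label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend        # only the frontend pods may connect
    ports:
    - protocol: TCP
      port: 8080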

Conclusion

In conclusion, the importance of Kubernetes security cannot be overstated. It requires collaboration and shared responsibility between developers and operators. By focusing on practices such as vulnerability scanning, restricting additional privileges, and restricting visibility between components, organizations can create a more secure Kubernetes environment. By working together, developers and operators can fortify the container ecosystem, safeguarding applications, data, and critical business assets from potential security breaches. With a collaborative approach to Kubernetes security, organizations can confidently leverage the full potential of this powerful orchestration platform while maintaining the highest standards of security.

Unlocking Performance and Adaptability: Exploring Kubernetes Vertical Autoscaling

Kubernetes has introduced, as an alpha feature in its 1.27 release, the Vertical Pod Autoscaling capability, which gives Kubernetes workloads the option to scale using the "vertical" approach by adding more resources to an existing pod. This extends the autoscaling capabilities you already have at your disposal for your Kubernetes workloads, such as KEDA or Horizontal Pod Autoscaling.

Vertical Scaling vs Horizontal Scaling

Vertical and horizontal scaling are two approaches used in scaling up the performance and capacity of computer systems, particularly in distributed systems and cloud computing. Vertical scaling, also known as scaling up or scaling vertically, involves adding more resources, such as processing power, memory, or storage, to a single instance or server. This means upgrading the existing compute components or migrating to a more powerful infrastructure. Vertical scaling is often straightforward to implement and requires minimal changes to the software architecture. It is commonly used when the system demands can be met by a single, more powerful infrastructure.

On the other hand, horizontal scaling, also called scaling out or scaling horizontally, involves adding more instances or servers to distribute the workload. Instead of upgrading a single instance, multiple instances are employed, each handling a portion of the workload. Horizontal scaling offers the advantage of increased redundancy and fault tolerance since multiple instances can share the load. Additionally, it provides the ability to handle larger workloads by simply adding more machines to the cluster. However, horizontal scaling often requires more complex software architectures, such as load balancing and distributed file systems, to efficiently distribute and manage the workload across the machines.

In summary, vertical scaling involves enhancing the capabilities of a single instance, while horizontal scaling involves distributing the workload across multiple instances. Vertical scaling is easier to implement but may have limitations in terms of the maximum resources available on a single machine. Horizontal scaling provides better scalability and fault tolerance but requires more complex software infrastructure. The choice between vertical and horizontal scaling depends on factors such as the specific requirements of the system, the expected workload, and the available resources.

Why Kubernetes Vertical AutoScaling?

This is an interesting topic because we have been living in a world where the accepted wisdom was that it is always better to scale out (using horizontal scaling) rather than scale up (using vertical scaling), and this was especially one of the mantras you heard in cloud-native development. And that hasn't changed: horizontal scaling provides many more benefits than vertical scaling, and it is well covered by the built-in autoscaling capabilities or side projects such as KEDA. So why is Kubernetes including this feature, and why are we using this site to discuss it?

Because with Kubernetes becoming the de-facto choice for almost any deployment you do nowadays, the characteristics and capabilities of the workloads you need to handle have expanded, and that's why you need different techniques to provide the best experience to each type of workload.

How Does Kubernetes Vertical Autoscaling Work?

Here you will find all the documentation about this new feature. As mentioned, it is still in the "alpha" stage, so it is something to try in an experimental mode rather than use at the production level: see the official documentation.

Vertical scaling works in such a way that you can change the resources assigned to the pod (CPU and memory) without needing to restart the pod, and that is the clear benefit of this approach. As you know, until now, if you wanted to change the resources applied to a workload, you needed to update the manifest and restart the pod for the new values to apply.

To define this, you need to specify the resizePolicy by adding a new section to the pod manifest, as you can see here:

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-5
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-ctr-5
    image: nginx
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: RestartContainer
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"

In this example, we define, for each resource name, the policy we want to apply: if we change the assigned CPU, it won't require a restart, but if we change the memory, the container will be restarted.

That implies that if you would like to change the assigned CPU, you can directly patch the pod as you can see in the snippet below, and that updates the assigned resources:

kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'
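To verify that the change was applied in place (without the pod being recreated), you can inspect the container resources again; this is just one way to check it:

 kubectl -n qos-example get pod qos-demo-5 -o jsonpath='{.spec.containers[0].resources}'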

When to Use Vertical Scaling? What Are the Target Scenarios?

Which of these capabilities applies will depend on many factors, from the use case to the technology stack your workload uses. As a general rule, a CPU change is easy for any technology to adapt to, but a memory change can be more difficult depending on the technology, as in many stacks the memory assigned is defined at startup time.

This will help to update components whose requirements have changed over time, or to test new workloads with live load without disrupting the current processing of the application, or simply to handle workloads that don't support horizontal scaling because they are designed in a single-replica mode.

Conclusion

In conclusion, Kubernetes has introduced Vertical Pod Autoscaling, enabling Kubernetes vertical autoscaling of workloads by adding resources to existing pods. Kubernetes Vertical autoscaling allows for resource changes without restarting pods, providing flexibility in managing CPU and memory allocations.

Kubernetes Vertical autoscaling offers a valuable option for adapting to evolving workload needs. It complements horizontal scaling by providing flexibility without the need for complex software architectures. By combining vertical and horizontal scaling approaches, Kubernetes users can optimize their deployments based on specific workload characteristics and available resources.

How to Use SoapUI Integrated with Maven for Automation Testing


SoapUI is a popular open-source tool used for testing SOAP and REST APIs. It comes with a user-friendly interface and a variety of features to help you test API requests and responses. In this article, we will explore how to use SoapUI integrated with Maven for automation testing.

Why Use SoapUI with Maven?

Maven is a popular build automation tool that simplifies building and managing Java projects. It is widely used in the industry, and it has many features that make it an ideal choice for automation testing with SoapUI.

By integrating SoapUI with Maven, you can easily run your SoapUI tests as part of your Maven build process. This will help you to automate your testing process, reduce the time required to test your APIs, and ensure that your tests are always up-to-date.

Setting Up SoapUI and Maven

Before we can start using SoapUI with Maven, we must set up both tools on our system. First, download and install SoapUI from the official website. Once SoapUI is installed, we can proceed with installing Maven.

To install Maven, follow these steps:

  1. Download the latest version of Maven from the official website.
  2. Extract the downloaded file to a directory on your system.
  3. Add the bin directory of the extracted folder to your system’s PATH environment variable.
  4. Verify that Maven is installed by opening a terminal or command prompt and running the command mvn -version.

Creating a Maven Project for SoapUI Tests

Now that we have both SoapUI and Maven installed, we can create a Maven project for our SoapUI tests. To create a new Maven project, follow these steps:

  1. Open a terminal or command prompt and navigate to the directory where you want to create your project.
  2. Run the following command: mvn archetype:generate -DgroupId=com.example -DartifactId=my-soapui-project -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
  3. This will create a new Maven project with the group ID com.example and the artifact ID my-soapui-project.
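If everything went well, the generated project should follow the standard quickstart layout, roughly like this:

 my-soapui-project/
 ├── pom.xml
 └── src/
     ├── main/java/com/example/App.java
     └── test/java/com/example/AppTest.java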

Adding SoapUI Tests to the Maven Project

Now that we have a Maven project, we can add our SoapUI tests to the project. To do this, follow these steps:

  1. Create a new SoapUI project by opening SoapUI and selecting File > New SOAP Project.
  2. Follow the prompts to create a new project, including specifying the WSDL file and endpoint for your API.
  3. Once your project is created, create a new test suite and add your test cases.
  4. Save your SoapUI project.

Next, we need to add our SoapUI project to our Maven project. To do this, follow these steps:

  1. In your Maven project directory, create a new directory called src/test/resources.
  2. Copy your SoapUI project file (.xml) to this directory.
  3. In the pom.xml file of your Maven project, add the following code:
<build>
  <plugins>
    <plugin>
      <groupId>com.smartbear.soapui</groupId>
      <artifactId>soapui-maven-plugin</artifactId>
      <version>5.6.0</version>
      <configuration>
        <projectFile>${project.basedir}/src/test/resources/my-soapui-project.xml</projectFile>
        <outputFolder>${project.basedir}/target/surefire-reports</outputFolder>
        <junitReport>true</junitReport>
        <exportAll>true</exportAll>
      </configuration>
      <executions>
        <execution>
          <phase>test</phase>
          <goals>
            <goal>test</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

This code configures the SoapUI Maven plugin to run our SoapUI tests during the test phase of the Maven build process.

Creating Assertions in SoapUI Projects

Now that we have our SoapUI tests added to our Maven project, we can create assertions to validate the responses of our API calls. To create assertions in SoapUI, follow these steps:

  1. Open your SoapUI project and navigate to the test case where you want to create an assertion.
  2. Right-click on the step that you want to validate and select Add Assertion.
  3. Choose the type of assertion that you want to create (e.g. Contains, XPath Match, Valid HTTP Status Codes, etc.).
  4. Configure the assertion according to your needs.
  5. Save your SoapUI project.

Running SoapUI Tests with Assertions Using Maven

Now that we have our SoapUI tests and assertions added to our Maven project, we can run them using Maven. To run your SoapUI tests with Maven and validate the responses using assertions, follow these steps:

  1. Open a terminal or command prompt and navigate to your Maven project directory.
  2. Run the following command: mvn clean test
  3. This will run your SoapUI tests and generate a report in the target/surefire-reports directory of your Maven project.

During the test execution, if any assertion fails, the test will fail and an error message will be displayed in the console. By creating assertions, we can ensure that our API calls are returning the expected responses.

Conclusion

In this article, we have learned how to use SoapUI integrated with Maven for automation testing, including how to create assertions in SoapUI projects. By using these two tools together, we can automate our testing process, reduce the time required to test our APIs, and ensure that our tests are always up-to-date. If you are looking to get started with automation testing using SoapUI and Maven, give this tutorial a try!

How To Enable Sticky Session on Your Kubernetes Workloads using Istio?

Istio allows you to configure sticky sessions, among other network features, for your Kubernetes workloads. As we have discussed in several posts about Istio, Istio deploys a service mesh that provides a central control plane for all the configuration regarding the network aspects of your Kubernetes workloads. This covers many different aspects of the communication inside the container platform, such as security (transport security, authentication, or authorization) and, at the same time, network features such as routing and traffic distribution, which is the main topic of today's article.

These routing capabilities are similar to what a traditional Layer 7 load balancer can provide. When we talk about Layer 7, we're referring to the conventional layers that compose the OSI stack, where layer 7 is the application layer.

A sticky session, or session affinity, configuration is one of the most common features you may need to implement in this scenario. The use case is the following:


You have several instances of your workload, so several pod replicas in a Kubernetes scenario, all of them behind the same Service. By default, the Service redirects requests in a round-robin fashion among the pod replicas in a Ready state (the ones Kubernetes understands are ready to receive requests), unless you define it differently.

But in some cases, mainly when you are dealing with a web application or any stateful application that handles the concept of a session, you may want the replica that processed the first request to also handle the rest of the requests during the lifetime of the session.

Of course, you could achieve that just by routing all traffic to a single replica, but in that case, we lose other features such as traffic load balancing and HA. So this is usually implemented using session affinity or sticky session policies, which provide the best of both worlds: the same replica handles all the requests from a given user, but traffic is still distributed between different users.

How Does a Sticky Session Work?

The behavior behind this is relatively easy. Let’s see how it works.

First, the important thing is that you need "something" as part of your network requests that identifies all the requests belonging to the same session, so the routing component (in this case, that role is played by Istio) can determine which replica needs to handle those requests.

That "something" can differ depending on your configuration, but usually it is a cookie or an HTTP header sent in each request. Based on it, we know which replica handles all the requests of that specific session.

How does Istio implement Sticky Session support?

When using Istio in this role, we can implement this with a specific DestinationRule, which allows us, among other capabilities, to define the traffic policy that determines how traffic is split. To implement the sticky session, we need to use the consistentHash feature, which ensures that all the requests that compute to the same hash are sent to the same replica.

When we define the consistentHash feature, we can specify how the hash is created, in other words, which component is used to generate it. This can be one of the following options:

  • httpHeaderName: Uses an HTTP Header to do the traffic distribution
  • httpCookie: Uses an HTTP Cookie to do the traffic distribution
  • httpQueryParameterName: Uses a Query String to do the traffic Distribution.
  • maglev: Uses Google’s Maglev consistent-hashing algorithm to make the determination. You can read more about Maglev in the article from Google.
  • ringHash: Uses a ring-based consistent-hashing approach to load balance between the available pods.

So, as you can see, you have many options. Still, only the first three are commonly used to implement a sticky session, and usually the HTTP cookie (httpCookie) option is the preferred one, as it relies on the standard HTTP approach to manage sessions between clients and servers.

Sticky Session Implementation Sample using TIBCO BW

We will define a very simple TIBCO BW workload that implements a REST service serving a GET reply with a hardcoded value. To simplify the validation process, the application logs the pod's hostname, so we can quickly see which replica is handling each request:


We deploy this in our Kubernetes cluster and expose it using a Kubernetes Service; in our case, the name of this Service will be test2-bwce-srv.

On top of that, we apply the Istio configuration, which requires three (3) Istio objects: a Gateway, a VirtualService, and a DestinationRule. As our focus is on the DestinationRule, we will keep the other two objects as simple as possible:

apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: default-gw
spec:
  selector:
    istio: ingressgateway
  servers:
  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP

Virtual Service:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-vs
spec:
  gateways:
  - default-gw
  hosts:
  - test.com
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: test2-bwce-srv
        port:
          number: 8080

And finally, the DestinationRule will use an httpCookie that we will name ISTIOID, as you can see in the snippet below:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
    name: default-sticky-dr
    namespace: default
spec:
    host: test2-bwce-srv.default.svc.cluster.local
    trafficPolicy:
      loadBalancer:
        consistentHash:
          httpCookie: 
            name: ISTIOID
            ttl: 60s

Now we can start our test. After launching the first request, we get a new cookie generated by Istio itself, which is shown in the Postman response window:

This request has been handled by one of the available replicas of the service, as you can see here:

All subsequent requests from Postman already include the cookie, and all of them are handled by the same pod:

Meanwhile, the other replica's log is empty, as all the requests have been routed to the first pod.
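If you prefer the command line to Postman, the same validation can be sketched with curl. This assumes the ingress gateway address is stored in $GATEWAY_URL and that /resource is just a placeholder path of the service:

 # First request: store the cookie that Istio generates for the consistent-hash lookup
 curl -s -c cookies.txt -H "Host: test.com" http://$GATEWAY_URL/resource

 # Subsequent requests: send the stored cookie, so they hash to the same replica
 curl -s -b cookies.txt -H "Host: test.com" http://$GATEWAY_URL/resource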

Summary

In this article, we covered the reasons behind the need for sticky sessions in Kubernetes workloads and how we can achieve them using the capabilities of the Istio Service Mesh. I hope this helps you implement this configuration on the workloads that need it, today or in the future.

Kubernetes Autoscaling 1.26: A Game-Changer for KEDA Users?

Introduction

Kubernetes autoscaling has undergone a dramatic change. Since the Kubernetes 1.26 release, all components should migrate their HorizontalPodAutoscaler objects from the deprecated beta versions to autoscaling/v2, which has been stable since Kubernetes 1.23.

HorizontalPodAutoscaler is a crucial component in any workload deployed on a Kubernetes cluster, as the scalability of this solution is one of the great benefits and key features of this kind of environment.

A little bit of History

Kubernetes introduced a solution for autoscaling back in version 1.3, a long time ago, in 2016. The solution is based on a control loop that runs at a specific interval, which you can configure with the --horizontal-pod-autoscaler-sync-period parameter of the kube-controller-manager.

So, once per period, it gets the metrics and evaluates them against the conditions defined on the HorizontalPodAutoscaler object. Initially, it was based on the compute resources used by the pod: CPU and memory.


This provided an excellent feature, but with the passage of time and the adoption of Kubernetes, it has proven a little narrow to handle all the scenarios we face, and this is where other awesome projects we have discussed here, such as KEDA, come into the picture to provide a much more flexible set of features.

Kubernetes Autoscaling Capabilities Introduced in v2

With the release of v2 of the autoscaling API objects, a range of capabilities has been included to improve the flexibility and options now available. The most relevant ones are the following:

  • Scaling on custom metrics: With the new release, you can configure a HorizontalPodAutoscaler object to scale using custom metrics, meaning metrics other than the built-in CPU and memory resource metrics. You can see a detailed walkthrough about using custom metrics in the official documentation.
  • Scaling on multiple metrics: With the new release, you also have the option to scale based on more than one metric. The HorizontalPodAutoscaler will evaluate each scaling rule, propose a new scale value for each of them, and take the maximum value as the final one.
  • Support for the Metrics API: With the new release, the HorizontalPodAutoscaler controller retrieves metrics from a series of registered APIs, such as metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io. For more information on the different metrics available, you can take a look at the design proposal.
  • Configurable scaling behavior: With the new release, you have a new field, behavior, that allows configuring how the component behaves when scaling up or scaling down. You can define separate policies for scaling up and scaling down, and limit the maximum number of replicas that can be added or removed in a given period, to handle spikes in components such as Java workloads, among others. You can also define a stabilization window to avoid churn while the metric is still fluctuating. All of this is combined in the sample after this list.
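To make these options more tangible, here is a minimal sketch of an autoscaling/v2 HorizontalPodAutoscaler combining a resource metric, a custom pods metric, and a behavior section. The Deployment name and the requests_per_second metric are placeholders, and the custom metric assumes you have a custom metrics adapter installed:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                          # placeholder workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests_per_second      # placeholder custom metric
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleUp:
      policies:
      - type: Percent                  # at most double the replicas per minute
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60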

Kubernetes Autoscaling v2 vs KEDA

We have seen all the new benefits that Autoscaling v2 provides, so I’m sure that most of you are asking the same question: Is Kubernetes Autoscaling v2 killing KEDA?

Since its latest releases, KEDA already includes the new objects under the autoscaling/v2 group as part of its implementation, as KEDA relies on the native Kubernetes objects, and it simplifies part of the process you need to go through when you want to use custom or external metrics, as it has scalers available for pretty much everything you could need now or even in the future.

But even with that, there are still features that KEDA provides that are not covered here, such as the scaling "from zero" and "to zero" capabilities, which are very relevant for specific kinds of workloads and for a very optimized use of resources. Still, it is safe to say that with the new features included in autoscaling/v2, the gap is now smaller. Depending on your needs, you can go with the out-of-the-box capabilities without adding a new component to your architecture.

Grafana Alerting vs AlertManager: A Comparison of Two Leading Monitoring Tools

Introduction

Grafana Alerting capabilities continue to improve with each new release the Grafana Labs team ships. Especially since the changes made in Grafana 8 and Grafana 9, many questions have been raised regarding its usage, the capabilities supported, and how it compares with the alternatives.

We want to start by setting the context for Grafana Alerting based on the usual stack we deploy to improve the observability of our workloads. Grafana can be used for any workload, but it is the most used solution when we talk about Kubernetes workloads.

In this kind of deployment, the stack we usually deploy is Grafana as the visualization tool and Prometheus as the core that gathers all metrics, so responsibilities are clearly separated: Grafana draws all the information using its excellent dashboarding capabilities, fetching the data from Prometheus.


When we plan to start including alerts, since we cannot expect a dedicated team to just watch dashboards to detect when something is going wrong, we need to implement a way to push alerts.

Alerting capabilities have been present in Grafana since the beginning, but in the early stages they were limited to graphical alerts focused on the dashboards. Instead, Prometheus, acting as the brain, includes a companion component called Alertmanager that can handle the creation and notification of alerts generated from all the information stored in Prometheus.

The main capabilities Alertmanager provides are the definition of alerts, grouping of alerts, silence and inhibition rules to mute some notifications, and, finally, the ability to send the alerts to almost any system, based on a set of built-in receivers and a webhook to extend it to any component available. A minimal configuration is sketched below.
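As an illustration of the grouping, muting, and webhook capabilities mentioned above, here is a minimal sketch of an Alertmanager configuration; the receiver name and webhook URL are placeholders:

route:
  receiver: team-webhook
  group_by: ['alertname', 'namespace']   # group related alerts into one notification
  group_wait: 30s
  repeat_interval: 4h
receivers:
- name: team-webhook
  webhook_configs:
  - url: https://hooks.example.com/alerts   # placeholder endpoint
inhibit_rules:
- source_matchers:
  - 'severity="critical"'                # mute warnings when a critical alert is already firing
  target_matchers:
  - 'severity="warning"'
  equal: ['alertname', 'namespace']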


So, this is the initial stage, but things have changed with the latest Grafana releases in recent months, as mentioned, and now the boundary between both components is much blurrier, as we are going to see.

What are the main capabilities Grafana Alerting provides today?

Grafana Alerting allows you to define alert rules specifying the criteria under which an alert should fire. A rule can have different queries, conditions, an evaluation frequency, and the duration over which the condition must be met.

These alerts can be generated from any of the data sources supported in Grafana, and that is a very relevant point, as it is not limited to Prometheus data. With the expansion of the Grafana Labs stack with many new products such as Grafana Loki and Grafana Mimir, among others, this is especially relevant.

Once an alert fires, you can define a notification policy to decide where, when, and how the alert is routed. A notification policy also has a contact point associated with one or more notifiers.

Additionally, you can silence alerts to stop receiving notifications for a specific alert instance, and you can define mute timings, periods in which new alerts will not be generated or notified.

All of that comes with powerful dashboarding capabilities, using all the power of Grafana's dashboard features.


Grafana Alerting vs Prometheus AlertManager

After reading the previous section, you are probably confused, because most of the new features added are very similar to the ones available in Prometheus AlertManager.

So, in that case, what tool should we use? Should we replace Prometheus AlertManager and start using Grafana Alerting? Should we use both? As you can imagine, this is one of these questions that doesn’t have clear answers as it will depend a lot on the context and your specific scenario, but let me give you some pointers around it.

  • Grafana Alerting can be very powerful if you are already inside the Grafana stack. If you are already using Grafana Loki (and need to generate alerts from it), Grafana Mimir, or Grafana Cloud directly, Grafana Alerting will probably provide a better fit for your ecosystem.
  • If you require alerts defined with complex queries and calculations, Prometheus AlertManager provides a more powerful and richer ecosystem to generate them.
  • If you are looking for a SaaS approach, Grafana Alerting is also provided as part of Grafana Cloud, so it can be used without having to install it in your ecosystem.
  • If you are using Grafana Alerting, you need to consider that the same component serving the dashboards is computing and generating the alerts, which may require additional HA capabilities, and there is an unavoidable coupling between both features (dashboards and alerts). Suppose that does not resonate well with you, because the criticality of your dashboards is not the same as that of your alerts, or you think dashboard usage can affect alerting performance. In that case, Prometheus AlertManager provides a better approach, as it runs in its own pod, in isolation.
  • At this moment, Grafana Alerting uses a SQL database to manage deduplication, among other features, so depending on the number of alerts you need to handle, it may not be enough in terms of performance, and the time series database from Prometheus can be a better fit.

Summary

Grafana Alerting is incredible progress in the journey of the Grafana Labs team to provide an end-to-end observability stack, with a great fit with the rest of their ecosystem, the option to run it in SaaS mode, and a focus on ease of use. But whether it is the best option depends on your needs.

Understanding Istio ServiceEntry: How to Extend Your Service Mesh to External Endpoints

What Is An Istio ServiceEntry?

An Istio ServiceEntry is the way to define an endpoint that doesn't belong to the Istio service registry. Once the ServiceEntry is part of the registry, you can define rules and enforce policies for that endpoint as if it belonged to the mesh.

Istio ServiceEntry answers the question you have probably asked several times when using a service mesh: how can I do the same magic with external endpoints that I can do when everything is under my service mesh's scope? And Istio ServiceEntry objects provide precisely that:

A way to have an extended mesh managing another kind of workload or, even better, in Istio’s own words:

ServiceEntry enables adding additional entries into Istio’s internal service registry so that auto-discovered services in the mesh can access/route to these manually specified services.

These services could be external to the mesh (e.g., web APIs) or mesh-internal services that are not part of the platform’s service registry (e.g., a set of VMs talking to services in Kubernetes).

What are the main capabilities of Istio ServiceEntry?

Here you can see a sample of the YAML definition of a Service Entry:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-svc-redirect
spec:
  hosts:
  - wikipedia.org
  - "*.wikipedia.org"
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: NONE

In this case, we have an external-svc-redirect ServiceEntry object that handles all calls going to wikipedia.org; we define the port and protocol to be used (TLS on 443) and classify this service as external to the mesh (MESH_EXTERNAL), as it is an external web page.

You can also specify more details inside the ServiceEntry configuration. For example, you can define a hostname or IP and translate it to a different hostname and port, because you can also specify the resolution mode you want to use for this specific ServiceEntry. If you look at the snippet above, you will find a resolution field with the value NONE, which means no particular resolution will be made. The other valid values are the following:

  • NONE: Assume that incoming connections have already been resolved (to a specific destination IP address).
  • STATIC: Use the static IP addresses specified in endpoints as the backing instances associated with the service.
  • DNS: Attempt to resolve the IP address by querying the ambient DNS asynchronously.
  • DNS_ROUND_ROBIN: Attempts to resolve the IP address by querying the ambient DNS asynchronously. Unlike DNS, it only uses the first IP address returned when a new connection needs to be initiated, without relying on the complete results of the DNS resolution, and connections made to hosts will be retained even if DNS records change frequently, eliminating draining connection pools and connection cycling.

To define the target of the ServiceEntry, you specify its endpoints using WorkloadEntry entries. For each one, you can provide the following data (combined in the sample after this list):

  • address: Address associated with the network endpoint without the port.
  • ports: Set of ports associated with the endpoint
  • weight: The load balancing weight associated with the endpoint.
  • locality: The locality associated with the endpoint. A locality corresponds to a failure domain (e.g., country/region/zone).
  • network: Network enables Istio to group endpoints resident in the same L3 domain/network.
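Putting those fields together, here is a sketch of a ServiceEntry with STATIC resolution and two manually specified endpoints; the hostname, IP addresses, weights, and locality are placeholders:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-legacy-api
spec:
  hosts:
  - legacy-api.internal.example.com   # placeholder hostname used by the mesh clients
  location: MESH_EXTERNAL
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  resolution: STATIC                  # use the endpoints below instead of DNS
  endpoints:
  - address: 10.0.0.11                # placeholder VM or external host
    ports:
      http: 8080
    locality: us-east-1/zone-a
    weight: 80                        # receives roughly 80% of the traffic
  - address: 10.0.0.12
    ports:
      http: 8080
    weight: 20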

What Can You Do With Istio ServiceEntry?

The number of use cases is enormous. Since a ServiceEntry behaves similarly to a service defined inside the mesh, you can apply any DestinationRule to it to configure load balancing, a protocol switch, or any logic that can be done with the DestinationRule object. The same applies to the rest of the Istio CRDs, such as RequestAuthentication and PeerAuthentication, among others.

You can also have a graphical representation of the ServiceEntry inside Kiali, the visualization tool for the Istio Service Mesh, as you can see in the picture below:


As you can see, an extended mesh with endpoints outside the Kubernetes cluster is becoming more common with the explosion of available clusters and hybrid environments, where you need to manage clusters of different topologies without losing the centralized, policy-based network management that the Istio Service Mesh provides to your platform.

Secure Your Services with Istio: A Step-by-Step Guide to Setting up Istio TLS Connections

Introduction

Istio TLS configuration is one of the essential features when we enable a service mesh. The Istio Service Mesh provides many features to define, in a centralized, policy-driven way, how transport security, among other characteristics, is handled for the different workloads you have deployed on your Kubernetes cluster.

One of the main advantages of this approach is that your applications can focus on the business logic they need to implement. These security aspects can be externalized and centralized without adding extra effort to each application you deploy. This is especially relevant if you are following a polyglot approach (as you should) across your Kubernetes cluster workloads.

So, this time we are going to have our applications handle plain HTTP traffic, both internal and external, and depending on where the traffic is going, we will force that connection to be TLS without the workload needing to be aware of it. Let's see how we can enable this Istio TLS configuration.

Scenario View

We will use the picture below to keep in mind the concepts and components that will interact as part of the different configurations we will apply:

  • We will use the ingress gateway to handle all incoming traffic to the Kubernetes cluster and the egress gateway to handle all outcoming traffic from the cluster.
  • We will have a sidecar container deployed in each application to handle the communication from the gateways or the pod-to-pod communication.

To simplify the testing applications, we will use the default sample applications Istio provides, which you can find here.

How to Expose TLS in Istio?

This is the easiest part, as all the incoming communication you receive from the outside enters the cluster through the Istio ingress gateway, so it is this component that needs to handle the TLS connection and then use the usual security approach to talk to the pod exposing the logic.

By default, the Istio Ingress Gateway already exposes a TLS port, as you can see in the picture below:


So we need to define a Gateway that receives all this traffic over HTTPS and redirects it to the pods, and we will do it as you can see here:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway-https
  namespace: default
spec:
  selector:
    istio: ingressgateway
  servers:
    - hosts:
        - '*'
      port:
        name: https
        number: 443
        protocol: HTTPS
      tls:
        mode: SIMPLE # enables HTTPS on this port
        credentialName: httpbin-credential 

As we can see, it is a straightforward configuration, just adding the HTTPS port on 443 and providing the TLS configuration.

And with that, we can already reach the same pages over TLS:

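A quick way to validate this from the command line could look like the following sketch; it assumes the ingress gateway is exposed through a LoadBalancer service and uses -k to skip certificate verification just for the test:

 # Grab the external IP of the ingress gateway (assumes a LoadBalancer service)
 INGRESS_IP=$(kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

 # Call one of the bookinfo sample pages over TLS; replace -k with --cacert in a real setup
 curl -vk https://$INGRESS_IP/productpage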

How To Consume SSL from Istio?

Now that we have handled TLS incoming requests without the application knowing anything about it, we will go one step beyond and do the most challenging configuration: we will set up a TLS/SSL connection for any outgoing communication outside the Kubernetes cluster, again without the application knowing anything about it.

To do so, we will use one of the Istio concepts we have already covered in a specific article: the Istio ServiceEntry, which allows us to define an endpoint so it is managed inside the mesh.

Here we can see the Wikipedia endpoint added to the Service Mesh registry:

apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: se-app
  namespace: default
spec:
  hosts:
  - wikipedia.org
  ports:
  - name: https
    number: 443
    protocol: HTTPS
  resolution: DNS

Once we have configured the ServiceEntry, we can define a DestinationRule to force all connections to wikipedia.org to use the TLS configuration:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: tls-app
  namespace: default
spec:
  host: wikipedia.org
  trafficPolicy:
    tls:
      mode: SIMPLE

Kiali 101: Understanding and Utilizing this Essential Istio Service Mesh Management Tool

What Is Kiali?

Kiali is an open-source project that provides observability for your Istio service mesh. Developed by Red Hat, Kiali helps users understand the structure and behavior of their mesh and any issues that may arise.

Kiali provides a graphical representation of your mesh, showing the relationships between the various service mesh components, such as services, virtual services, destination rules, and more. It also displays vital metrics, such as request and error rates, to help you monitor the health of your mesh and identify potential issues.

What are Kiali's Main Capabilities?

One of the key features of Kiali is its ability to visualize service-to-service communication within a mesh. This lets users quickly see how services are connected and how requests are routed through the mesh. It is particularly useful for troubleshooting, as it can help you quickly identify problems with service communication, such as misconfigured routing rules or slow response times.


Kiali also provides several tools for monitoring the health of your mesh. For example, it can alert you to potential problems, such as a high error rate or a service not responding to requests. It also provides detailed tracing information, allowing you to see the exact path a request took through the mesh and where any issues may have occurred.

In addition to its observability features, Kiali provides several other tools for managing your service mesh. For example, it includes a traffic management module, which allows you to control the flow of traffic through your mesh easily, and a configuration management module, which helps you manage and maintain the various components of your mesh.

Overall, Kiali is an essential tool for anyone using an Istio service mesh. It provides valuable insights into the structure and behavior of your mesh, as well as powerful monitoring and management tools. Whether you are starting with Istio or are an experienced user, Kiali can help ensure that your service mesh runs smoothly and efficiently.

What are the main benefits of using Kiali?

The main benefits of using Kiali are:

  • Improved observability of your Istio service mesh. Kiali provides a graphical representation of your mesh, showing the relationships between different service mesh components and displaying key metrics. This allows you to quickly understand the structure and behavior of your mesh and identify potential issues.
  • Easier troubleshooting. Kiali’s visualization of service-to-service communication and detailed tracing information make it easy to identify problems with service communication and pinpoint the source of any issues.
  • Enhanced traffic management. Kiali includes a traffic management module allowing you to control traffic flow through your mesh easily.
  • Improved configuration management. Kiali’s configuration management module helps you manage and maintain the various components of your mesh.

How To Install Kiali?

There are several ways to install Kiali as part of your service mesh deployment, the preferred option being the Operator model available here.

You can install this operator using Helm or OperatorHub. To install it using Helm Charts, you need to add the following repository using this command:

helm repo add kiali https://kiali.org/helm-charts

Note: remember that once you add a new repo, you need to run the following command to update the available charts:

helm repo update

Now, you can install it using the helm install command, as in the following sample:

helm install \
    --set cr.create=true \
    --set cr.namespace=istio-system \
    --namespace kiali-operator \
    --create-namespace \
    kiali-operator \
    kiali/kiali-operator

If you prefer to go down the OperatorHub route, you can use the following URL. Then, by clicking on the Install button, you will see the steps to get the component installed in your Kubernetes environment.


If you want a simple installation of Kiali, you can also use the sample YAML available inside the Istio installation folder with the following command:

kubectl apply -f $ISTIO_HOME/samples/addons/kiali.yaml

How does Kiali work?

Kiali is just the graphical representation of the information available about how the service mesh works. It is not Kiali's responsibility to store those metrics, but to retrieve them and draw them in a way that is relevant for the user of the tool.

Prometheus stores this data, so Kiali uses the Prometheus REST API to retrieve the information and draw it graphically. The resulting graph shows several relevant things:

  • It shows the selected namespace and, inside it, the different apps (an app is detected when the workload has a label named app). Inside each app, it adds the different services and pods with distinct icons (triangles for services and squares for pods).
  • It also shows how traffic reaches the cluster through the different ingress gateways, and how it leaves the cluster in case any egress gateway is configured.
  • It shows the kind of traffic being handled and the error rates per protocol (TCP, HTTP, and so on), as you can see in the picture below. The protocol is decided based on a naming convention for the port name of the Service, with the expected format protocol-name (see the sample Service after this list).
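As a reference for that naming convention, here is a sketch of a Service whose port name lets Istio (and therefore Kiali) classify the traffic as HTTP; the service name and port are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: reviews
  labels:
    app: reviews          # Kiali groups workloads by the "app" label
spec:
  selector:
    app: reviews
  ports:
  - name: http-web        # "<protocol>-<suffix>" naming lets the traffic be classified as HTTP
    port: 8080
    targetPort: 8080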

Can Kiali be used with any service mesh?

No, Kiali is specifically designed for use with Istio service meshes.

It provides observability, monitoring, and management tools for Istio service meshes but is incompatible with other service mesh technologies.

If you use a different service mesh, you will need to find an additional tool for managing and monitoring it.

Are there other alternatives to Kiali?

Even though there are no direct alternatives to Kiali for visualizing your workloads and the traffic flowing through the Istio Service Mesh, you can use other tools to grab the metrics that feed Kiali and build a custom visualization using more generic tools such as Grafana, among others.

Regarding similar tools for other service meshes, such as Linkerd, Consul Connect, or even Kuma, most follow a different approach where the visualization part is not a separate "project" but relies on a standard visualization tool. That gives you much more flexibility, but at the same time, it lacks most of the excellent traffic visualizations that Kiali provides, such as the graph views or the ability to modify the traffic directly from the graph view.

Helm Templates in Files: How To Customize ConfigMaps Content Simplified in 10 Minutes

Helm templates in files, such as ConfigMap or Secret content, are one of the most common requirements when you are creating a new Helm chart. As you already know, a Helm chart is how we package our Kubernetes application resources and YAML in a single component that we can manage at once, to ease the maintenance and operation process.

Helm Templates Overview

By default, the template process works with YAML files, allowing us to use some variables and some logic functions to customize and templatize our Kubernetes YAML resources to our needs.

So, in a nutshell, we can only have YAML files inside the templates folder of a chart. But sometimes we would like to apply the same process to ConfigMaps or Secrets or, to be more concrete, to the content of those ConfigMaps: for example, properties files and so on.

Helm templates overview: the files outside the templates folder that are usually required.

As you can see, it is quite normal to have different files such as JSON configuration files, properties files, or shell scripts as part of your Helm chart, and most of the time you want to give some dynamic behavior to their content. That is why using Helm templates in files is the main focus of this article.

Helm Helper Functions to Manage Files

By default, Helm provides us with a set of functions to manage files as part of the helm chart to simplify the process of including them as part of the chart, such as the content of ConfigMap or Secret. Some of these functions are the following:

  • .Files.Glob: This function finds all internal files that match a given pattern, such as in the following example:
    {{ range $path, $_ := .Files.Glob "**.yaml" }}
  • .Files.Get: This is the simplest option to get the content of a specific file when you know its full path inside your Helm chart, such as in the following sample: {{ .Files.Get "config1.toml" | b64enc }}

You can even combine both functions, as in the following sample:

{{ range $path, $_ :=  .Files.Glob  "**.yaml" }}
      {{ $.Files.Get $path }}
{{ end }}

Then, once you have the files you want to use, you can combine that with some helper functions to easily introduce them in a ConfigMap or a Secret, as explained below:

  • .AsConfig: Uses the file content as ConfigMap data, following the pattern file-name: file-content
  • .AsSecrets: Similar to the previous one, but base64-encoding the data.

Here you can see a real example of using this approach in an actual helm chart situation:

apiVersion: v1
kind: Secret
metadata:
  name: zones-property
  namespace: {{ $.Release.Namespace }}
data: 
{{ ( $.Files.Glob "tml_zones_properties.json").AsSecrets | indent 2 }} 

You can find more information about that here. But this only allows us to grab the file as-is and include it in a ConfigMap; it does not allow us to apply any logic or substitution to the content as part of that process. So, if we want to modify the content, this is not a valid approach.

How To Use Helm Templates in Files Such as ConfigMaps or Secrets?

If we want to make some modifications to the content, we need to use the following formula:

apiVersion: v1
kind: Secret
metadata:
  name: papi-property
  namespace: {{ $.Release.Namespace }}
data:
{{- range $path, $bytes := .Files.Glob "tml_papi_properties.json" }}
{{ base $path | indent 2 }}: {{ tpl ($.Files.Get $path) $ | b64enc }}
{{ end }}

So, what we are doing here is, first, iterating over the files that match the pattern using the .Files.Glob function explained before, in case we have more than one. Then we manually create the structure following the pattern file-name: file-content.

To do that, we use the base function to get just the filename from a full path (and add the proper indentation), and then we use .Files.Get to grab the file's content and do the base64 encoding with the b64enc function because, in this case, we are handling a Secret.

The trick here is adding the tpl function, which makes this file's content go through the template process; this is how all the modifications we need and the variables referenced from the .Values object are properly replaced, giving you all the power and flexibility of the Helm chart inside text files such as properties files, JSON files, and much more. You can see a small example of such a templated file below.
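As an illustration, this is roughly what such a templated file and the values feeding it could look like; the file name, keys, and values are placeholders:

# tml_papi_properties.json (placed inside the chart, outside the templates folder)
{
  "environment": "{{ .Values.environment }}",
  "endpointUrl": "https://{{ .Values.host }}:{{ .Values.port }}/api"
}

# values.yaml
environment: production
host: api.example.com
port: 8443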

I hope this is as useful for you as it has been for me when creating new Helm charts! And look here for other tricks using loops or dependencies.