Best Kubernetes Management Tools: Dashboard, CLI, and GUI Options Compared

Best Kubernetes Management Tools: Dashboard, CLI, and GUI Options Compared

Maximizing the productivity of working with Kubernetes Environment with a tool for each persona

We all know that Kubernetes is the default environment for all our new applications we developed we will build. The flavor of that Kubernetes platform can be of different ways and forms, but one thing is clear, it is complex.

The reasons behind this complexity is being able to provide all the flexibility but it is also true that the k8s project never has put much effort to provide a simple way to manage your clusters and kubectl is the point of access to send commands leaving this door open to the community to provide its own solution and these are the things that we are going to discuss today.

Kubernetes Dashboard: The Default Option

Kubernetes Dashboard is the default option for most of the installations. It is a web-based interface that is part of the K8s project but not deployed by default when you install the cluster

Best Kubernetes Management Tools: Dashboard, CLI, and GUI Options Compared

K9S: The CLI option

K9S is one of the most common options for the ones that love a very powerful command-line interface with a lot of options at your disposal

Best Kubernetes Management Tools: Dashboard, CLI, and GUI Options Compared

It is a mix between all the power of a command-line interface with all the keyboard options at your disposals with a very fancy graphical view to have a quick overview of the status of your cluster at glance.

Lens — The Graphical Option

The lens is a very vitaminized GUI option that goes beyond that just showing the status of the K8S cluster or allowing modifications on the components. With integration with other projects such as Helm or support for the CRD. It provides a very pleasant experience of managing clusters with multi-cluster support as well. To know more about Lens you can take a look at this article that we cover its main features:

Octant — The Web Option

Octant provides an improved experience compared with the default web option discussed in this article using the Kubernetes dashboard. Built for extension with a plug-in system that allows you to extend or customize the behavior of octant to maximize your productivity managing K8S clusters. Including CRD support and graphical visualization of dependencies provides an awesome experience.

Best Kubernetes Management Tools: Dashboard, CLI, and GUI Options Compared

Summary

The have provided in this article different tools that will help you during the important task to manage or inspect your Kubernetes cluster. Each of them with its own characteristics and each of them focuses on different ways to provide the information (CLI, GUI and Web) so you can always find one that works best for your situation and preferences.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)

KEDA provides a rich environment to scale your application apart from the traditional HPA approach using CPU and Memory

Autoscaling is one of the great things of cloud-native environments and helps us to provide an optimized use of the operations. Kubernetes provides many options to do that being one of those the Horizontal Pod Autoscaler (HPA) approach.

HPA is the way Kubernetes has to detect if it is needed to scale any of the pods, and it is based on the metrics such as CPU usage or memory.

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Sometimes those metrics are not enough to decide if the number of replicas we have available is enough. Other metrics can provide a better perspective, such as the number of requests or the number of pending events.

Kubernetes Event-Driven Autoscaling (KEDA)

Here is where KEDA comes to help. KEDA stands for Kubernetes Event-Driven Autoscaling and provides a more flexible approach to scale our pods inside a Kubernetes cluster.

It is based on scalers that can implement different sources to measure the number of requests or events that we receive from different messaging systems such as Apache Kafka, AWS Kinesis, Azure EventHub, and other systems as InfluxDB or Prometheus.

KEDA works as it is shown in the picture below:

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)

We have our ScaledObject that links our external event source (i.e., Apache Kafka, Prometheus ..) with the Kubernetes Deployment we would like to scale and register that in the Kubernetes cluster.

KEDA will monitor the external source, and based on the metrics gathered, will communicate the Horizontal Pod Autoscaler to scale the workload as defined.

Testing the Approach with a Use-Case

So, now that we know how that works, we will do some tests to see it live. We are going to show how we can quickly scale one of our applications using this technology. And to do that, the first thing we need to do is to define our scenario.

In our case, the scenario will be a simple cloud-native application developed using a Flogo application exposing a REST service.

The first step we need to do is to deploy KEDA in our Kubernetes cluster, and there are several options to do that: Helm charts, Operation, or YAML files. In this case, we are going to use the Helm charts approach.

So, we are going to type the following commands to add the helm repository and update the charts available, and then deploy KEDA as part of our cluster configuration:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda

After running this command, KEDA is deployed in our K8S cluster, and it types the following command kubectl get all will provide a situation similar to this one:

pod/keda-operator-66db4bc7bb-nttpz 2/2 Running 1 10m
pod/keda-operator-metrics-apiserver-5945c57f94-dhxth 2/2 Running 1 10m

Now, we are going to deploy our application. As already commented to do that we are going to use our Flogo Application, and the flow will be as simple as this one:

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)
Flogo application listening to the requests
  • The application exposes a REST service using the /hello as the resource.
  • Received requests are printed to the standard output and returned a message to the requester

Once we have our application deployed on our Kubernetes application, we need to create a ScaledObject that is responsible for managing the scalability of that component:

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)
ScaleObject configuration for the application

We use Prometheus as a trigger, and because of that, we need to configure where our Prometheus server is hosted and what query we would like to do to manage the scalability of our component.

In our sample, we will use the flogo_flow_execution_count that is the metric that counts the number of requests that are received by this component, and when this has a rate higher than 100, it will launch a new replica.

After hitting the service with a Load Test, we can see that as soon as the service reaches the threshold, it launch a new replica to start handling requests as expected.

Event-Driven Autoscaling in Kubernetes with KEDA (Beyond CPU and Memory Metrics)
Autoscaling being done using Prometheus metrics.

All of the code and resources are hosted in the GitHub repository shown below:

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/


Summary

This post has shown that we have unlimited options in deciding the scalability options for our workloads. We can use the standard metrics like CPU and memory, but if we need to go beyond that, we can use different external sources of information to trigger that autoscaling.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Kubernetes Health Checks Explained: Simplify Cluster Diagnostics with KubeEye

Kubernetes Health Checks Explained: Simplify Cluster Diagnostics with KubeEye

KubeEye supports you in the task of ensuring that your cluster is performing well and ensure all your best practices are being followed.

Kubernetes has become the new normal to deploy our applications and other serverless options, so the administration of these clusters has become critical for most enterprises, and doing a proper Kubernetes Health Check is becoming critical.

This task is clear that it is not an easy task. As always, the flexibility and power that technology provides to the users (in this case, the developers) also came with a trade-off with the operation and management’s complexity. And this is not an exception to that.

We have evolved, including managed options that simplify all the underlying setup and low-level management of the infrastructure behind it. However, many things need to be done for the cluster administration to have a happy experience in the journey of a Kubernetes Administrator.

A lot of concepts to deal with: namespaces, resource limits, quotas, ingress, services, routes, crd… Any help that we can get is welcome. And with this purpose in mind, KubeEye has been born.

KubeEye is an open-source project that helps to identify some issues in our Kubernetes Clusters. Using their creators’ words:

KubeEye aims to find various problems on Kubernetes, such as application misconfiguration(using Polaris), cluster components unhealthy and node problems(using Node-Problem-Detector). Besides predefined rules, it also supports custom defined rules.

So we can think like a buddy that is checking the environment to make sure that everything is well configured and healthy. Also, it allows us to define custom rules to make sure that all the actions that the different dev teams are doing are according to the predefined standards and best practices.

So let’s see how we can include KubeEye to do a health check of our environment. The first thing we need to do is to install it. At this moment, KubeEye only offers a release for Linux-based system, so if you are using other systems like me, you need to follow another approach and type the following commands:

git clone https://github.com/kubesphere/kubeeye.git
cd kubeeye
make install

After doing that, we end up with a new binary in our PATH named `ke`, and this is the only component needed to work with the app. The second step we need to do to get more detail on those diagnostics is to install the node problem detector component.

This component is a component installed in each node of the cluster. It helps to make more visible to the upstream layers issues regarding the behavior of the Kubernetes cluster. This is an optional step, but it will provide more meaningful data, and install that, we need to run the following command.

ke install npd

And now we’re ready to start checking our environment, and the order is as easy as this one.

ke diag

This will provide an output similar to this that is compounded by two different tables. The first one will be focused on the Pod and the issues and events raised as part of the platform’s status, and the other will focus on the rest of the elements and kinds of objects for the Kubernetes Clusters.

Kubernetes Health Checks Explained: Simplify Cluster Diagnostics with KubeEye
Output from the ke diag command

The table for the issues at the pod level has the following fields:

  • Namespace where the pod belongs to.
  • Severity of the issue.
  • Pod Name that is responsible for the issue
  • EventTime of where this event has been raised
  • Reason for the issue
  • Message with the detailed description of the issue

The second table for the other objects has the following structure:

  • Namespace where the object that has an issue that is being detected is deployed.
  • Severity of the issue.
  • Name of the component
  • Kind of the component
  • Time of where this issue has been raised
  • Message with the detailed description of the issue

Command’s output can also show other tables if some issues are detected at the node level.


Today we cover a fascinating topic as it is the Kubernetes Administration and introduce a new tool that helps your daily task.

I truly expect that this tool can be added to your toolbox and ease the path for a happy and healthy Kubernetes Cluster administration!

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Improving Development Security with Open Source DevSecOps Tools (Syft & Grype)

Improving Development Security with Open Source DevSecOps Tools (Syft & Grype)

Discover how Anchore can help you to keep your software safe and secure without losing agility.

Development Security is one of the big topics of today’s development practice. All the improvements that we got following the DevOps practices have generated many issues and concerns from the security perspective.

The explosion of components that the security teams need to deal with, container approaches, and polyglot environments gave us many benefits from the development and the operational perspective. Still, it made the security side of it more complex.

This is why there have been many movements regarding the “Shift left” approach and including security as part of the DevOps process creating the new term for DevSecOps that is becoming the new normal.

So, today what I would like to bring to you is a set of tools that I have just discovered that are created with the approach of making your life easier from the development security perspective because also developers need to be part of this and not leave all the responsibility to a different team.

This set of tools is name Anchore Toolbox, and they are open source and free to use, as you can see on the official webpage (https://anchore.com/opensource/)

So, what Anchore can provide to us? At the moment, we are talking about two different applications: Syft and Grype.

Syft

Syft is a CLI tool and go library for generating a Software Bill of Materials (SBOM) from container images and filesystems. Installation is as easy as just executing the following command:

curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

And after doing that, we need to type syft to see all the options at our disposal:

Improving Development Security with Open Source DevSecOps Tools (Syft & Grype)
Syft help menu with all the options available

So, in our case, I will use to generate a bill of materials from an existing Docker image from bitnami/kafka to show how this works. I need to type the following command:

syft bitnami/kafka

And after a few seconds to have the image loaded and analyzed, I get as the output the list of all and each of the packages that this image has installed and the version of each of them as shown in the picture below. One great thing is that it shows not only the operating system packages like what we have installed using apk or apt but also other components like java libraries as well so we can have a complete bill of materials for this container image.

 ✔ Loaded image
 ✔ Parsed image
 ✔ Cataloged image [204 packages]
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-javadoc.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-javadoc.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/scala-java8-compat_2.12–0.9.1.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/scala-java8-compat_2.12–0.9.1.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-test-sources.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-test-sources.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/jackson-module-scala_2.12–2.10.5.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/jackson-module-scala_2.12–2.10.5.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka-streams-scala_2.12–2.7.0.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka-streams-scala_2.12–2.7.0.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-test.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-test.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/scala-collection-compat_2.12–2.2.0.jar’
[0019] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/scala-collection-compat_2.12–2.2.0.jar’
[0020] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0.jar’
[0020] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0.jar’
[0020] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-sources.jar’
[0020] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/kafka_2.12–2.7.0-sources.jar’
[0020] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/scala-logging_2.12–3.9.2.jar’
[0020] WARN unexpectedly empty matches for archive ‘/opt/bitnami/kafka/libs/scala-logging_2.12–3.9.2.jar’
NAME VERSION TYPE
 java-archive
acl 2.2.53–4 deb
activation 1.1.1 java-archive
adduser 3.118 deb
aopalliance-repackaged 2.6.1 java-archive
apt 1.8.2.2 deb
argparse4j 0.7.0 java-archive
audience-annotations 0.5.0 java-archive
base-files 10.3+deb10u8 deb
base-passwd 3.5.46 deb
bash 5.0–4 deb
bsdutils 1:2.33.1–0.1 deb
ca-certificates 20200601~deb10u2 deb
com.fasterxml.jackson.module.jackson.module.scala java-archive
commons-cli 1.4 java-archive
commons-lang3 3.8.1 java-archive
...

Grype

Grype is a vulnerability scanner for container images and filesystems. It is the next step because it checks the image’s components and checks if there is any known vulnerability.

To install this component again is as easy as type the following command:

curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin

After doing that, we need to type grype to have the help menu with all the options at our disposal:

Improving Development Security with Open Source DevSecOps Tools (Syft & Grype)
Grype help menu with all the options available

Grype works in the following one. The first thing it does is load the vulnerability DB to check the different packages against this database to search for any known vulnerability. After doing that, follow the same pattern as syft and generate the bill of materials and check each of the components into the vulnerability database, and if there is a match. It just provides the ID of the vulnerability, the severity, and, if this is fixed into a higher version, provides the version where this vulnerability has been fixed.

Here you can see the output regarding the same image from bitnami/kafka with all the vulnerabilities detected

grype bitnami/kafka
 ✔ Vulnerability DB [updated]
 ✔ Loaded image
 ✔ Parsed image
 ✔ Cataloged image [204 packages]
 ✔ Scanned image [149 vulnerabilities]
[0018] ERROR matcher failed for pkg=Pkg(type=java-archive, name=, version=): matcher failed to fetch by CPE pkg=’’: product name is required
[0018] ERROR matcher failed for pkg=Pkg(type=java-archive, name=, version=): matcher failed to fetch by CPE pkg=’’: product name is required
[0018] ERROR matcher failed for pkg=Pkg(type=java-archive, name=, version=): matcher failed to fetch by CPE pkg=’’: product name is required
[0018] ERROR matcher failed for pkg=Pkg(type=java-archive, name=, version=): matcher failed to fetch by CPE pkg=’’: product name is required
[0018] ERROR matcher failed for pkg=Pkg(type=java-archive, name=, version=): matcher failed to fetch by CPE pkg=’’: product name is required
[0018] ERROR matcher failed for pkg=Pkg(type=java-archive, name=, version=): matcher failed to fetch by CPE pkg=’’: product name is required
NAME INSTALLED FIXED-IN VULNERABILITY SEVERITY
apt 1.8.2.2 CVE-2011–3374 Negligible
bash 5.0–4 CVE-2019–18276 Negligible
commons-lang3 3.8.1 CVE-2013–1907 Medium
commons-lang3 3.8.1 CVE-2013–1908 Medium
coreutils 8.30–3 CVE-2016–2781 Low
coreutils 8.30–3 CVE-2017–18018 Negligible
curl 7.64.0–4+deb10u1 CVE-2020–8169 Medium
..

Summary

These simple CLI tools help us a lot in the needed journey to keep our software current and free of known vulnerabilities and improve our development security. Also, as these are CLI apps and also can run on containers, it is effortless to include those as part of your CICD pipeline so vulnerabilities can check in an automated way.

They also provided a plugin to be included in the most used CI/CD systems such as Jenkins, Cloudbees, CircleCI, GitHub Actions, Bitbucket, Azure DevOps, and so on.

Optimize Prometheus Disk Usage: Practical TSDB Tuning and Retention Strategies

Optimize Prometheus Disk Usage: Practical TSDB Tuning and Retention Strategies

Check out the properties that will let you an optimized use of your disk storage and savings storing your monitoring data

Prometheus has become a standard component in our cloud architectures and Prometheus storage is becoming a critical aspect. So I am going to guess that if you are reading this you already know what Prometheus is. If this is not the case, please take your time to take a look at other articles that I have created:

We know that usually when we monitor using Prometheus we have so many exporters available at our disposal and also that each of them exposes a lot of very relevant metrics that we need to track everything we need to and that lead to very intensive usage of the storage available if we do not manage accordingly.

There are two factors that affect this. The first one is to optimize the number of metrics that we are storing and we already provide tips to do that in other articles as the ones shown below:

The other one is how long we store the metrics called the “retention period in Prometheus.” And this property has suffered a lot of changes during the different versions. If you would like to see all the history please take a look at this article from Robust Perception:

The main properties that you can configure are the following ones:

  • storage.tsdb.retention.time: Number of days to store the metrics by default to 15d. This property replaces the deprecated one storage.tsdb.retention.
  • storage.tsdb.retention.size: You can specify the limit of size to be used. This is not a hard limit but a minimum so please define some margin here. Units supported: B, KB, MB, GB, TB, PB, EB. Ex: “512MB”. This property is experimental so far as you can see in the official documentation:

https://prometheus.io/docs/prometheus/latest/storage

What about setting this configuration in the operator for Kubernetes? In that case, you also have similar options available in the values.yaml configuration file for the chart as you can see in the image below:

Optimize Prometheus Disk Usage: Practical TSDB Tuning and Retention Strategies
values.yml for the Prometheus Operator Helm Chart

This should help you get an optimized deployment of Prometheus that ensures all the features that Prometheus has but at the same time an optimal use of the resources at your disposal.

Additional to that, you should also check the Managed Service options that some providers have regarding Prometheus, such as the Amazon Managed Services for Prometheus, as you can see in the link below:

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Loki vs ELK Stack: Lightweight Log Aggregation for Kubernetes and Cloud-Native

Loki vs ELK Stack: Lightweight Log Aggregation for Kubernetes and Cloud-Native

Learn about the new horizontally-scalable, highly available, multi-tenant log aggregation system inspired by Prometheus that can be the best fit for your logging architecture

Loki vs ELK is something you are reading and hearing each time more often as from some time it is a raise on the dispute of becoming the de-factor standard for log aggregation architectures.

When we talk about Cloud-Native Architecture, log aggregation is something key that you need to consider. The old practices that we followed in the on-premises virtual machine approach for logging are not valid anymore.

We already cover this topic in my previous post that I recommend you to talk a look in case you haven’t read it yet, but this is not the topic for today.

Elasticsearch as the core and the different derívate de stacks like ELK/EFK had gained popularity in the last years, being pretty much the default open-source option when we talked about log aggregation and one of the options. The main public cloud providers have also adopted this solution as part of their own offering as the Amazon Elasticsearch Service provides.

But Elasticsearch is not perfect. If you have already used it, you probably know about it. Still, because their features are so awesome, especially on the searching and indexing capabilities, it has been the kind of leader today. But other topics like the storage use, the amount of power you need to handle it, and the architecture with different kinds of nodes (master, data, ingester) increase its complexity for cases when we need something smaller.

And to fill this gap is where our main character for today’s post arrives: Loki or Grafana Loki.

Loki vs ELK Stack: Lightweight Log Aggregation for Kubernetes and Cloud-Native
Grafana Loki Logo from https://grafana.com/oss/loki/

Loki is a logging management system created as part of the Grafana project, and it has been created with a different approach in mind than Elasticsearch.

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.

So as we can read in the definition from their own page above, it covers several interesting topics in comparison with Elasticsearch:

  • First of all, it addresses some of the usual pain points for ELK customers: It is very cost-effective and easy to operate.
  • It clearly says that the approach is not the same as ELK, you are not going to have a complete index of the payload for the events, but it is based on different labels that you can define for each log stream.
  • Prometheus inspires that, which is critical because it enabled the idea to use log traces as metrics to empower our monitoring solutions.

Let’s start with the initial questions when we show an interesting new technology, and we would like to start testing it.

How can I install Loki?

Loki is distributed in different flavors to be installed in your environment in the way you need it.

  • SaaS: provided as part of the hosting solution of Grafana Cloud.
  • On-Premises: Provided as a normal binary to be download to run in an on-premises mode.
  • Cloud: Provided a docker image or even a Helm Chart to be deployed into your Kubernetes-based environment.

GrafanaLabs teams also provide Enterprise Support for Loki if you would like to use it on production mode in your company. Still, at the same time, all the code is licensed using Apache License 2.0, so you can take a look at all the code and contribute to it.

How does Loki work?

Loki vs ELK Stack: Lightweight Log Aggregation for Kubernetes and Cloud-Native
High-level Loki Architecture from https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/

Architecture wise is very similar to the ELK/EFK stack and follow the same approach of “collectors” and “indexers” as ELK has:

  • Loki itself is the central node of the architecture responsible for storing the log traces and their labels and provided an API to search among them based on their own language LogQL (a similar approach to the PromQL from Prometheus).
  • promtail is the agent component that runs in the edge getting all those log traces that we need that can be running on a machine on-prem or a DaemonSet fashion in our own Kubernetes cluster. It plays the same role as Logstash/Fluent-bit/Fluentd works in the ELK/EFK stack. Promtail provides the usual plugin mode to filter and transforms our log traces as the other solutions provide. At the same time, it provides an interesting feature to convert those log traces into Prometheus metrics that can be scraped directly by your Prometheus server.
  • Grafana is the UI for the whole stack and plays a similar role as Kibana in the ELK/EFK stack. Grafana, among other plugins, provides direct integration with Loki as a Datasource to explore those traces and include them in the Dashboards.

Summary

Grafana Loki can be a great solution for your logging architecture to cover address two points: Provide a Lightweight log aggregation solution for your environment and at the same time enable your log traces as a source for your metrics, allowing you to create detailed, more business-oriented metrics that use in your dashboards and your monitoring systems.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges

CNCF-sponsored service Mesh Linkerd provides a lot of needed features in nowadays microservices architectures.

If you are reading this, probably, you are already aware of the challenges that come with a microservices architecture. It could be because you are reading about those or even because you are challenging them right now in your own skin.

One of the most common challenges is network and communication. With the eclosion of many components that need communication and the ephemeral approach of the cloud-native developments, many new features are a need when in the past were just a nice-to-have.

Concepts like service registry and service discovery, service authentication, dynamic routing policies, and circuit breaker patterns are no longer things that all the cool companies are doing but something basic to master the new microservice architecture as part of a cloud-native architecture platform, and here is where the Service Mesh project is increasing its popularity as a solution for most of this challenges and providing these features that are needed.

If you remember, a long time ago, I already cover that topic to introduce Istio as one of the options that we have:

But this project created by Google and IBM is not the only option that you have to provide those capabilities. As part of the Cloud Native Computing Foundation (CNCF), the Linkerd project provides similar features.

How to install Linkerd

To start using Linkerd, the first thing that we need to do is to install the software and to do that. We need to do two installations, one on the Kubernetes server and another on the host.

To install on the host, you need to go to the releases page and download the edition for your OS and install it.

I am using a Windows-based system in my sample, so I use chocolatey to install the client. After doing so, I can see the version of the CLI typing the following command:

linkerd version

And you will get an output that will say something similar to this:

PS C:\WINDOWS\system32> linkerd.exe version
Client version: stable-2.8.1
Server version: unavailable

Now we need to do the installation on the Kubernetes server, and to do so, we use the following command:

linkerd install | kubectl apply -f -

And you will get an output similar to this one:

PS C:\WINDOWS\system32> linkerd install | kubectl apply -f -
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-identity created
serviceaccount/linkerd-identity created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-controller created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-controller created
serviceaccount/linkerd-controller created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-destination created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-destination created
serviceaccount/linkerd-destination created
role.rbac.authorization.k8s.io/linkerd-heartbeat created
rolebinding.rbac.authorization.k8s.io/linkerd-heartbeat created
serviceaccount/linkerd-heartbeat created
role.rbac.authorization.k8s.io/linkerd-web created
rolebinding.rbac.authorization.k8s.io/linkerd-web created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-web-admin created
serviceaccount/linkerd-web created
customresourcedefinition.apiextensions.k8s.io/serviceprofiles.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/trafficsplits.split.smi-spec.io created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-prometheus created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-prometheus created
serviceaccount/linkerd-prometheus created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
serviceaccount/linkerd-proxy-injector created
secret/linkerd-proxy-injector-tls created
mutatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-proxy-injector-webhook-config created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-sp-validator created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-sp-validator created
serviceaccount/linkerd-sp-validator created
secret/linkerd-sp-validator-tls created
validatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-sp-validator-webhook-config created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-tap created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-tap-admin created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-tap created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-tap-auth-delegator created
serviceaccount/linkerd-tap created
rolebinding.rbac.authorization.k8s.io/linkerd-linkerd-tap-auth-reader created
secret/linkerd-tap-tls created
apiservice.apiregistration.k8s.io/v1alpha1.tap.linkerd.io created
podsecuritypolicy.policy/linkerd-linkerd-control-plane created
role.rbac.authorization.k8s.io/linkerd-psp created
rolebinding.rbac.authorization.k8s.io/linkerd-psp created
configmap/linkerd-config created
secret/linkerd-identity-issuer created
service/linkerd-identity created
deployment.apps/linkerd-identity created
service/linkerd-controller-api created
deployment.apps/linkerd-controller created
service/linkerd-dst created
deployment.apps/linkerd-destination created
cronjob.batch/linkerd-heartbeat created
service/linkerd-web created
deployment.apps/linkerd-web created
configmap/linkerd-prometheus-config created
service/linkerd-prometheus created
deployment.apps/linkerd-prometheus created
deployment.apps/linkerd-proxy-injector created
service/linkerd-proxy-injector created
service/linkerd-sp-validator created
deployment.apps/linkerd-sp-validator created
service/linkerd-tap created
deployment.apps/linkerd-tap created
configmap/linkerd-config-addons created
serviceaccount/linkerd-grafana created
configmap/linkerd-grafana-config created
service/linkerd-grafana created
deployment.apps/linkerd-grafana created

Now we can check that the installation has been done properly using the command:

linkerd check

And if everything has been done properly, you will get an output like this one:

PS C:\WINDOWS\system32> linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ control plane PodSecurityPolicies exist
linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

Then we can see the dashboard from Linkerd using the following command:

linkerd dashboard
Linkerd Service Mesh Explained: Solving Microservice Communication Challenges
Dashboard initial web page after a clean Linkerd installation

Deployment of the apps

We will use the same apps that we use some time ago to deploy istio, so if you want to remember what they are doing, you need to look again at that article.

I have uploaded the code to my GitHub repository, and you can find it here: https://github.com/alexandrev/bwce-linkerd-scenario

To deploy, you need to have your docker images pushed to a docker registry, and I will use Amazon ECR as the docker repository that I am going to use.

So I need to build and push those images with the following commands:

docker build -t provider:1.0 .
docker tag provider:1.0 938784100097.dkr.ecr.eu-west-2.amazonaws.com/provider-linkerd:1.0
docker push 938784100097.dkr.ecr.eu-west-2.amazonaws.com/provider-linkerd:1.0
docker build -t consumer:1.0 .
docker tag consumer:1.0 938784100097.dkr.ecr.eu-west-2.amazonaws.com/consumer-linkerd:1.0
docker push 938784100097.dkr.ecr.eu-west-2.amazonaws.com/consumer-linkerd:1.0

And after that, we are going to deploy the images on the Kubernetes cluster:

kubectl apply -f .\provider.yaml
kubectl apply -f .\consumer.yaml

And now we can see those apps in the Linkerd Dashboard on the default namespace:

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges
Image showing the provider and consumer app as linked applications

And now, we can reach the consumer endpoint using the following command:

kubectl port-forward pod/consumer-v1-6cd49d6487-jjm4q 6000:6000

And if we reach the endpoint, we got the expected reply from the provider.

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges
Sample response provided by the provider

And in the dashboard, we can see the stats of the provider:

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges
Linkerd dashboard showing the stats of the flow

Also, linked by default provided a Grafana dashboard where you can see more metrics you can get there using the grafana link that the dashboard has.

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges
Grafana link on the Linkerd Dashboard

When you enter that, you could see something like the dashboard shown below:

Linkerd Service Mesh Explained: Solving Microservice Communication Challenges
Grafana dashboard showing the linkerd statistics

Summary

With all this process, we have seen how easily we can deploy a linkerd service mesh in our Kubernetes cluster and how applications can integrate and interact with them. In the next posts, we will dive into the most advanced features that will help us in the new challenges that come with the Microservices architecture.

Event Streaming, APIs, and Data Integration: The 3 Core Pillars of Cloud Integration

Event Streaming, APIs, and Data Integration: The 3 Core Pillars of Cloud Integration

Event Streaming, API, and Data are the three musketeers that cover all the aspects of mastering integration in the cloud.

Enterprise Application Integration has been one of the most challenging IT landscape topics since the beginning of time. As soon as the number of systems and applications in big corporations started and grows, this becomes an issue we should address. This process’s efficiency will also define what companies succeed and which ones will fail as the cooperation between applications becomes critical to respond at the pace that the business was demanding.

I usually like to use the “road analogy” to define this:

It doesn’t matter if you have the fastest cars if you don’t have proper roads you will not get anywhere

This situation generates a lot of investments from the companies. Also, a lot of vendors and products were launched to support that situation. Some solutions are starting to emerge: EAI, ESB, SOA, Middleware, Distributed Integration Platforms, Cloud-Native solution, and iPaaS.

Each of the approaches provides a solution for existing challenges. As long as the rest of the industry was evolving, the solutions changed to adapt to the new reality (containers, microservices, DevOps, API-led, Event-Driven..)

So, what is the situation today? Today is extended the misconception that integration is the same as API and also that API is asynchronous HTTP based (REST, gRPC, GraphQL) API. But it is much more than this.

Event Streaming, APIs, and Data Integration: The 3 Core Pillars of Cloud Integration
Photo by Tolga Ulkan on Unsplash

1.- API

API-led is key to the integration solution for sure, especially focus on the philosophical approach behind it. Each component that we create today is created with a collaboration in mind to work with existing and future components to benefit the business in an easy and agile way. This transcends the protocol discussion completely.

API covers all different kinds of solutions from existing REST API to AsyncAPI to cover the event-based API.

2.- Event Streaming

Asynchronous communication is needed because the patterns and the requirements when you are talking about big enterprises and different applications make this essential. Requirements like pub-sub approach to increase independence among services and apps, control-flow to manage the execution of high-demanding flows that can exceed the throttling for applications, especially when talking about SaaS solutions.

So, you can think that this is a very opinionated view, but at the same time, this is something that most of the providers in this space have realized based on their actions:

  • AWS release SNS/SQS, its first messaging system, as its only solution.
  • Nov 2017 AWS releases Amazon MQ, another queue messaging system to cover the scenarios that SQS cannot cover.
  • May 2019 AWS releases Amazon MSK, a managed service for Kafka solutions to provide streaming data distribution and processing capabilities.

And that situation is because when we move away from smaller applications when we are migrating from a monolith approach to a micro-service application, more patterns and more requirements are needed, and here is. In contrast, integration solutions have shown in the past,t this is critical for integration solutions.

3.- Data Integration

Usually, when we talk about integration, we talk about Enterprise Application Integration because we have this past bias. Even I use this term to cover this topic, EAI, because we usually refer to these solutions. But since the last years, we are more focused on the data distribution amount the company rather than how applications integrated because what is really important is the data they are exchanging and how we can transform this raw data into insights that we can use to know better our customers or optimize our process or discover new opportunities based on that.

Until recently, this part was handled apart from the integration solutions. You probably rely on a focused ETL (Extract-Transform-Load) that helps to move the data from one database to another or a different kind of storage like a Data Warehouse so your Data Scientist can work with them.

But again, agility has made that this needs to change, and all the principles integration has in terms of providing more agility to the business is also applied to how we exchange data. We try to avoid the data’s technical move and try to ease the access and the right organization on this data. Data Virtualization and Data Streaming are the core capabilities that address and handle those challenges providing an optimized solution for how the data is distributed.

Summary

My main expectation with this article is to make you aware that when you are thinking about integrating your application, this is much more than the REST API that you are exposing, maybe using some API Gateway, and the needs can be very different. The strongest your integration platform is, the stronger your business will be.

3 Unusual Developer Tools That Seriously Boost Productivity (Beyond VS Code)

3 Unusual Developer Tools That Seriously Boost Productivity (Beyond VS Code)

A non-VS Code list for software engineers

This is not going to be one of those articles about tools that can help you develop code faster. If you’re interested in that, you can check out my previous articles regarding VS Code extensions, linters, and other tools that make your life as a developer easier.

My job is not only about software development but also about solving issues that my customers have. While their issues can be code-related, they can be an operation error or even a design problem.

I usually tend to define my role as a lone ranger. I go out there without knowing what I will face, and I need to be ready to adapt, solve the problem, and make customers happy. This experience has helped me to develop a toolchain that is important for doing that job.

Let’s dive in!


1. MobaXterm

This is the best tool to manage different connections to different servers (SSH access for a Linux server, RDP for a Windows server, etc.). Here are some of its key features:

  • Graphical SSH port-forwarding for those cases when you need to connect to a server you don’t have direct access to.
  • Easy identity management to save the passwords for the different servers. You can organize them hierarchically for ease of access, especially when you need to access so many servers for different environments and even different customers.
  • SFTP automatic connection when you connect to an SSH server. It lets you download and upload files as easily as dropping files there.
  • Automatical X11 forwarding so you can launch graphical applications from your Linux servers without needing to configure anything or use other XServers like XMing.
MobaXterm in action
MobaXterm in action

2. Beyond Compare

There are so many tools to compare files, and I think I have used all of them — from standalone applications like WinMerge, Meld, Araxis, KDiff, and others to extensions for text editors like VS Code and Notepad++.

However, none of those can compare to the one and only Beyond Compare.

I covered Beyond Compare when I started working on software engineering in 2010, and it is a tool that comes with me on each project I have. I use it every day. So, what makes this tool different from the rest?

It is simply the best tool to make any comparison because it does not just compare text and folders. It does that perfectly, but at the same time, it also compares ZIP files while browsing the content, JAR files, and so on. This is very important when we’d like to check if two JAR files that are uploaded in DEV and PROD are the same version of the tool or to know if a ZIP file has the right content when it is uploaded.

Beyond Compare in action
Beyond Compare 4

3. Vi Editor

This is the most important one — the best text editor of all time — and it is available pretty much on every server.

It is a command-line text editor with a huge number of shortcuts that allows you to be very productive when you are inside a server checking logs and configuration files to see where the problem is.

For a long time, I have had a Vi cheat sheet printed out to make sure I can master the most important shortcuts and thus increase my productivity while fighting inside the enemy lines (the customer’s servers).

Vi text editor
VIM — Vi improved the ultimate text editor.

Amazon Managed Service for Prometheus Explained: High-Availability Monitoring on AWS

Amazon Managed Service for Prometheus Explained: High-Availability Monitoring on AWS

Learn what Amazon Managed Service for Prometheus provides and how you can benefit from it.

Monitoring is one of the hot topics when we talk about cloud-native architectures. Prometheus is a graduated Cloud Native Computing Foundation (CNCF) open-source project and one of the industry-standard solutions when it comes to monitoring your cloud-native deployment, especially when Kubernetes is involved.

Following its own philosophy of providing a managed service for some of the most used open-source projects but fully integrated with the AWS ecosystem, AWS releases a general preview (at the time of writing this article): Amazon Managed Service for Prometheus (AMP).

The first thing is to define what Amazon Managed Service for Prometheus is and what features provide. So, this is the Amazon definition of the service:

A fully managed Prometheus-compatible monitoring service that makes it easy to monitor containerized applications securely and at scale.

And I would like to spend some time on some parts of this sentence.

  • Fully managed service: So, this will be hosted and handle by Amazon, and we are just going to interact with it using API as we do with other Amazon services like EKS, RDS, MSK, SQS/SNS, and so on.
  • Prometheus-compatible: So, that means that even if this is not a pure-Prometheus installation, the API is going to be compatible. So the Prometheus clients who can use Grafana or others to get the information from Prometheus will work without changing their interfaces.
  • Service at-scale: Amazon, as part of the managed service, will take care of the solution’s scalability. You don’t need to define an instance-type or how much RAM or CPU you do need. This is going to be handled by AWS.

So, that sounds perfect. So you can think that you are going to delete your Prometheus server, and it will start using this service. Maybe you are even typing something like helm delete prom… WAIT WAIT!!

Because at this point, this is not going to replace your local Prometheus server, but it will allow the integration with it. So, that means that your Prometheus server is going to act like a scraper for the whole monitoring scalable solution that AMP is providing, something as you can see in the picture below:

Amazon Managed Service for Prometheus Explained: High-Availability Monitoring on AWS
Reference Architecture for Amazon Prometheus Service

So, you are still going to need a Prometheus server, that is right, but all the complexity are going to be avoided and leverage to the managed service: Storage configuration, High availability, API optimization, and so on is going to be just provided to you out of the box.

Ingesting information into Amazon Managed Service for Prometheus

At this moment, there is two way to ingest data into the Amazon Prometheus Service:

  • From an existing Prometheus server using the remote_write capability and configuration, so that means that each series that is scraped by the local Prometheus is going to be sent to the Amazon Prometheus Service.
  • Using AWS Distro for OpenTelemetry to integrate with this service using the Prometheus Receiver and the AWS Prometheus Remote Write Exporter components to get that.

Summary

So this is a way to provide an enterprise-grade installation leveraging on all the knowledge that AWS has hosting and managing this solution at scale and optimized in terms of performance. You can focus on the components you need to get the metrics ingested into the service.

I am sure this will not be the last movement from AWS in observability and metrics management topics. I am sure they will continue to provide more tools to the developer’s and architects’ hands to define optimized solutions as easily as possible.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.