Top 3 Ways to Deploy Grafana Loki on Kubernetes for Scalable Logging


Deployment Models for a Scalable Log Aggregation Architecture using Loki

Deploying a scalable Loki is not a straightforward task. We have already talked about Loki in previous posts on the site; it is becoming more and more popular, and its usage grows more regular every day. That is why I think it makes sense to publish another post about Loki architecture.

Loki has several advantages that make it a default choice for deploying a log aggregation stack. One of them is its scalability: its different deployment models let you choose how many components you deploy and which responsibilities each one takes. So the goal of this post is to show you how to deploy a scalable Loki solution, and that is based on two concepts: the components available and how you group them.

So we will start with the different components:

  • ingester: responsible for writing log data to long-term storage backends (DynamoDB, S3, Cassandra, etc.) on the write path and returning log data for in-memory queries on the read path.
  • distributor: responsible for handling incoming streams by clients. It’s the first step in the write path for log data.
  • query-frontend: an optional service that provides the querier’s API endpoints and can be used to accelerate the read path.
  • querier: handles queries using the LogQL query language, fetching logs both from the ingesters and from long-term storage.
  • ruler: responsible for continually evaluating a set of configurable queries and performing an action based on the result.

Then you can join them into different groups, and depending on the size of these groups, you have a different deployment topology, as shown below:

Loki Monolith Deployment Mode
  • Monolith: As you can imagine, all components run together in a single instance. This is the simplest option and is recommended as a starting point for volumes up to around 100 GB/day. You can even scale this deployment, but it will scale all components simultaneously and requires a shared object store.
Loki Simple Scalable Deployment Mode
  • Simple Scalable Deployment Model: This is the second level, and it can scale up to several TB of logs per day. It consists of splitting the components into two different profiles: read and write.
Loki Microservice Deployment Mode
  • Microservices: Each component is deployed and managed independently, giving you full control to scale each of them on its own.

Defining the deployment model of each instance is very easy, and it is based on a single parameter named target. So, depending on the value of target, the instance will follow one of the previous deployment models (see the sketch after this list):

  • all (default): It will deploy as in monolith mode.
  • write: It will be the write path of the simple scalable deployment model.
  • read: It will be the read path of the simple scalable deployment model.
  • ingester, distributor, query-frontend, query-scheduler, querier, index-gateway, ruler, compactor: Individual values to deploy a single component for the microservice deployment model.
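As a minimal sketch of what this looks like in practice (the names, replica count, and image tag below are illustrative, and the official Helm charts generate something more complete), a Kubernetes Deployment running the write profile of the simple scalable model could pass the argument like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: loki-write
spec:
  replicas: 3
  selector:
    matchLabels:
      app: loki
      profile: write
  template:
    metadata:
      labels:
        app: loki
        profile: write
    spec:
      containers:
        - name: loki
          image: grafana/loki:2.9.0
          args:
            - -config.file=/etc/loki/config.yaml
            # The target argument decides the deployment model of this instance:
            # all | read | write | <individual component>
            - -target=write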

The target argument is handy for an on-premises kind of deployment. Still, if you are using Helm for the installation, Loki already provides different Helm charts for the different deployment models.

But all those Helm charts are based on the same principle commented above: defining the role of each instance using the target argument.



Prometheus ServiceMonitor vs PodMonitor: Key Differences and When to Use Each


Discover the differences between two of the most used CRDs from Prometheus Operator and how to use each of them.

ServiceMonitor and PodMonitor are terms that you will start to see more often when talking about using Prometheus. We have covered a lot about Prometheus in past articles: it is one of the primary references when we talk about monitoring in a cloud-native environment, especially in the Kubernetes ecosystem.

Prometheus has had a new deployment model under the Kubernetes Operator Framework for some time now. That has generated several changes in the resources involved and in how we configure several aspects of the monitoring of our workloads. Some of these concepts are now managed as Custom Resource Definitions (CRDs), included to simplify the system’s configuration and to be more aligned with the capabilities of the Kubernetes platform itself. This is great but, at the same time, it changes how we need to use this excellent monitoring tool for cloud-native workloads.

Today, we will cover two of the most relevant of these new CRDs: ServiceMonitor and PodMonitor. These are the new objects that tell the platform which resources fall under its monitoring scope, and each of them covers a different type of object, as you can imagine: Services and Pods.

Each of them has its own definition with its particular fields and metadata, and to highlight the differences, I will present a sample of each below:

Service Monitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceMonitorSelector: prometheus
  name: prometheus
  namespace: prometheus
spec:
  endpoints:
  - interval: 30s
    targetPort: 9090
    path: /metrics
  namespaceSelector:
    matchNames:
    - prometheus
  selector:
    matchLabels:
      operated-prometheus: "true"

Pod Monitor

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: front-end
  labels:
    name: front-end
spec:
  namespaceSelector:
    matchNames:
      - sock-shop
  selector:
    matchLabels:
      name: front-end
  podMetricsEndpoints:
  - targetPort: 8079

As you can see, the definitions of the components are very similar and very intuitive, focusing on the selector to detect which pods or services we should monitor and some data regarding the specific target of the monitoring, so Prometheus knows how to scrape them.
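For context, the Prometheus Operator decides which ServiceMonitors and PodMonitors a Prometheus instance picks up through label selectors defined on the Prometheus custom resource itself. Here is a minimal sketch (the label values are illustrative and must match the labels you put on your monitors):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: prometheus
spec:
  serviceAccountName: prometheus
  # Pick up every ServiceMonitor carrying this label, such as the sample above
  serviceMonitorSelector:
    matchLabels:
      serviceMonitorSelector: prometheus
  # Pick up PodMonitors by label as well (the label value is illustrative)
  podMonitorSelector:
    matchLabels:
      release: prometheus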

If you want to look in more detail at every option you can configure on these CRDs, I recommend the Prometheus Operator API reference, which includes field-by-field documentation of the most common CRDs.

These components belong to the definition of your workloads, which means that the creation and maintenance of these objects falls to the application’s developers.

That is great for several reasons:

  • It keeps the monitoring aspect with the component itself, so you will never forget to add the configuration for a specific component. That means it can be included in the component’s YAML files, Helm chart, or Kustomize resources as just another required resource.
  • It decentralizes the monitoring configuration, making it more agile, and it evolves together with the software components.
  • It reduces the impact on other monitored components: there is no need to touch any shared file or resource, so the other workloads continue to work as expected.

Both objects are very similar in their purposes as both of them scrape all the endpoints that match the selector that we added. So, in which cases should I use one or the other?

The answer is straightforward. By default, you will go with a ServiceMonitor because it targets a Service and, through it, each of the endpoints behind that Service, so every pod implementing the service will be discovered and scraped as part of this action.

So, in which cases should I use a PodMonitor? When the workload you are trying to monitor doesn’t sit behind a Service: as there is no Service defined, you cannot use a ServiceMonitor. Do you want some examples? Let’s list a few:

  • Workloads that interact using protocols that are not HTTP-based, such as Kafka, SQS/SNS, JMS, or similar ones.
  • Components such as CronJobs or DaemonSets that don’t expose any incoming connection model.

So I hope this article helps you understand the main differences between these objects and go a little deeper into how the new Prometheus Operator resources work. We will continue covering other aspects in upcoming posts.


Promtail Explained: Turning Logs into Metrics for Prometheus and Loki


Promtail is the solution when the metrics you need are only present in the log traces of the software you have to monitor, and you still want to provide a consistent monitoring platform.

It is a common understanding that three pillars in the observability world help us to get a complete view of the status of our own platforms and systems: Logs, Traces, and Metrics.

To provide a summary of the differences between each of them:

  • Metrics are the counters about the state of the different components from both a technical and a business view. So we can see here things like the CPU consumption, the number of requests, memory, or disk usage…
  • Logs are the different messages that each of the pieces of software in our platform provides to understand its current behavior and detect some non-expected situations.
  • Traces are the data regarding the end-to-end request flow across the platform, including the services and systems that took part in that flow and data related to that concrete request.

We have solutions that claim to address all of them, mainly in the enterprise space, with Dynatrace, AppDynamics, and similar. On the other hand, we can go with a specific solution for each pillar that we can easily integrate with the others, and we have discussed those options a lot in previous articles.

But some software doesn’t follow this path, because we live in the most heterogeneous era and we all embrace, at some level, the polyglot approach on new platforms. In some cases, software uses log traces to provide data that really belongs to metrics or other pillars, and this is when we need to rely on tools that help us “fix” that situation. Promtail does specifically that.

Promtail is mainly a log forwarder, similar to others like Fluentd or Fluent Bit from the CNCF or Logstash from the ELK stack. In this case, it is the solution from Grafana Labs and, as you can imagine, it is part of the Grafana stack, with Loki as the “mastermind” that we covered in a previous article I recommend reading if you haven’t yet.

Promtail has two main ways of behaving as part of this architecture. The first one is very similar to others in this space, as we commented before: it ships our log traces from our containers to a central location, which will usually be Loki but can be a different one, and it provides the usual options to play with and transform those traces as we can do in other solutions. As you can imagine, this includes transformation, filtering, parsing, and so on.

But what makes Promtail different is one specific action that you can perform, and that action is metrics. The metrics action provides a way to create, based on the data we are reading from the logs, Prometheus metrics that a Prometheus server can scrape. That means you can use the log traces you are processing, which could look something like this:

[2021-06-06 22:02.12] New request received for customer_id: 123
[2021-06-06 22:02.12] New request received for customer_id: 191
[2021-06-06 22:02.12] New request received for customer_id: 522

With this information, apart from sending those log traces to the central location, Promtail can create a metric named, for example, `total_request_count`, which is generated and exposed by the Promtail agent itself. That lets you use a metrics approach even for systems or components that don’t provide a standard way to do so, such as a formal metrics API.

And the way to do this is very well integrated into the configuration. It is done with an additional stage (this is what we call the actions we can perform in Promtail) named metrics.

The schema of the metrics stage is straightforward, and if you are familiar with Prometheus, you will see how directly a Prometheus metric definition maps to this snippet:

# A map where the key is the name of the metric and the value is a specific
# metric type.
metrics:
  [<string>: [ <metric_counter> | <metric_gauge> | <metric_histogram> ] ...]

So we start by defining the kind of metric that we would like to create, and we have the usual ones: counter, gauge, or histogram. For each of them we have a set of options to declare our metric, as you can see here for a counter metric:

# The metric type. Must be Counter.
type: Counter

# Describes the metric.
[description: <string>]

# Defines a custom prefix name for the metric. If undefined, the default
# name "promtail_custom_" will be prefixed.
[prefix: <string>]

# Key from the extracted data map to use for the metric,
# defaulting to the metric's name if not present.
[source: <string>]

# Label values on metrics are dynamic which can cause exported metrics
# to go stale (for example when a stream stops receiving logs).
# To prevent unbounded growth of the /metrics endpoint any metrics which
# have not been updated within this time will be removed.
# Must be greater than or equal to '1s', if undefined default is '5m'.
[max_idle_duration: <string>]

config:
  # If present and true all log lines will be counted without
  # attempting to match the source to the extract map.
  # It is an error to specify `match_all: true` and also specify a `value`.
  [match_all: <bool>]

  # If present and true all log line bytes will be counted.
  # It is an error to specify `count_entry_bytes: true` without specifying `match_all: true`.
  # It is an error to specify `count_entry_bytes: true` without specifying `action: add`.
  [count_entry_bytes: <bool>]

  # Filters down source data and only changes the metric
  # if the targeted value exactly matches the provided string.
  # If not present, all data will match.
  [value: <string>]

  # Must be either "inc" or "add" (case insensitive). If inc is chosen,
  # the metric value will increase by 1 for each log line received that
  # passed the filter. If add is chosen, the extracted value must be
  # convertible to a positive float and its value will be added to the metric.
  action: <string>
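Putting this together with the log traces shown earlier, here is a minimal sketch of a Promtail scrape job (the job name, log path, and metric name are illustrative assumptions) that turns those “New request received” lines into a total_request_count counter:

scrape_configs:
  - job_name: requests
    static_configs:
      - targets: [localhost]
        labels:
          job: requests
          # Hypothetical path to the application log files
          __path__: /var/log/app/*.log
    pipeline_stages:
      # Extract the customer_id from lines like
      # "New request received for customer_id: 123"
      - regex:
          expression: 'New request received for customer_id: (?P<customer_id>\d+)'
      # Create a counter that increases by 1 for every line where
      # a customer_id was extracted
      - metrics:
          total_request_count:
            type: Counter
            description: "Total number of requests seen in the logs"
            source: customer_id
            config:
              action: inc

With the default prefix described above, the resulting metric is exposed as promtail_custom_total_request_count on Promtail’s /metrics endpoint.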

And with that, you will have your metric created and exposed, just waiting for a Prometheus server to scrape it. If you would like to see all the options available, they are covered in the Grafana Labs documentation.

I hope you find this interesting and a useful way to keep all your observability information managed correctly with the right solution, even for those pieces of software that don’t follow this paradigm.


Kubernetes Health Checks Explained: Simplify Cluster Diagnostics with KubeEye


KubeEye supports you in ensuring that your cluster is performing well and that all your best practices are being followed.

Kubernetes has become the new normal for deploying our applications and other serverless options, so the administration of these clusters is critical for most enterprises, and doing a proper Kubernetes health check is becoming essential.

This is clearly not an easy task. As always, the flexibility and power that the technology provides to its users (in this case, the developers) comes with a trade-off in operational and management complexity, and this is no exception.

We have evolved to include managed options that simplify the underlying setup and the low-level management of the infrastructure behind it. However, many things still need to be done on the cluster administration side to have a happy experience in the journey of a Kubernetes administrator.

There are a lot of concepts to deal with: namespaces, resource limits, quotas, ingresses, services, routes, CRDs… Any help we can get is welcome, and with this purpose in mind, KubeEye was born.

KubeEye is an open-source project that helps identify issues in our Kubernetes clusters. In its creators’ words:

KubeEye aims to find various problems on Kubernetes, such as application misconfiguration(using Polaris), cluster components unhealthy and node problems(using Node-Problem-Detector). Besides predefined rules, it also supports custom defined rules.

So we can think of it as a buddy that checks the environment to make sure that everything is well configured and healthy. It also allows us to define custom rules to make sure that everything the different dev teams do follows the predefined standards and best practices.

So let’s see how we can use KubeEye to do a health check of our environment. The first thing we need to do is install it. At this moment, KubeEye only offers a release for Linux-based systems, so if you are using another system, like me, you need to follow a different approach and build it from source with the following commands:

git clone https://github.com/kubesphere/kubeeye.git
cd kubeeye
make install

After doing that, we end up with a new binary in our PATH named `ke`, and this is the only component needed to work with the app. The second step, needed to get more detail in those diagnostics, is to install the Node Problem Detector component.

This component is installed on each node of the cluster and helps surface issues regarding the behavior of the Kubernetes cluster to the upstream layers. This is an optional step, but it will provide more meaningful data; to install it, we need to run the following command:

ke install npd

And now we’re ready to start checking our environment, and the command is as easy as this one:

ke diag

This will provide an output similar to the one below, composed of two different tables. The first one focuses on Pods and the issues and events raised as part of the platform’s status, and the other focuses on the rest of the elements and kinds of objects in the Kubernetes cluster.

Output from the ke diag command

The table for the issues at the pod level has the following fields:

  • Namespace where the pod belongs to.
  • Severity of the issue.
  • Pod Name that is responsible for the issue
  • EventTime when this event was raised
  • Reason for the issue
  • Message with the detailed description of the issue

The second table for the other objects has the following structure:

  • Namespace where the object with the detected issue is deployed.
  • Severity of the issue.
  • Name of the component
  • Kind of the component
  • Time when this issue was raised
  • Message with the detailed description of the issue

The command’s output can also show other tables if issues are detected at the node level.


Today we covered a fascinating topic, Kubernetes administration, and introduced a new tool that helps with your daily tasks.

I truly hope this tool can be added to your toolbox and ease the path toward happy and healthy Kubernetes cluster administration!


Optimize Prometheus Disk Usage: Practical TSDB Tuning and Retention Strategies


Check out the properties that will let you optimize your disk storage usage and save money storing your monitoring data.

Prometheus has become a standard component in our cloud architectures, and Prometheus storage is becoming a critical aspect. I am going to guess that if you are reading this, you already know what Prometheus is; if that is not the case, please take your time to look at the other articles I have written about it.

We know that when we monitor using Prometheus we have many exporters at our disposal, and each of them exposes a lot of very relevant metrics so we can track everything we need. That leads to very intensive usage of the available storage if we do not manage it accordingly.

There are two factors that affect this. The first one is optimizing the number of metrics that we are storing, and we have already provided tips to do that in other articles.

The other one is how long we store the metrics, known as the retention period in Prometheus. This setting has gone through a lot of changes across versions; if you would like to see the whole history, take a look at the article from Robust Perception on the topic.

The main properties that you can configure are the following ones:

  • storage.tsdb.retention.time: How long to keep the metrics, 15d by default. This property replaces the deprecated storage.tsdb.retention.
  • storage.tsdb.retention.size: You can specify a limit on the size of storage to be used. This is not a hard limit, so please leave some margin here. Units supported: B, KB, MB, GB, TB, PB, EB. Example: "512MB". This property is still experimental, as you can see in the official documentation:

https://prometheus.io/docs/prometheus/latest/storage

What about setting this configuration in the Prometheus Operator for Kubernetes? In that case, you have similar options available in the values.yaml configuration file of the chart, as you can see in the image below:

values.yml for the Prometheus Operator Helm Chart
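For reference, a minimal sketch of what those keys could look like in the chart’s values.yaml (assuming the kube-prometheus-stack chart; exact key names may differ between chart versions):

prometheus:
  prometheusSpec:
    # Maps to --storage.tsdb.retention.time
    retention: 15d
    # Maps to --storage.tsdb.retention.size; keep some margin below the volume size
    retentionSize: "50GB"
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 60Gi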

This should help you get an optimized deployment of Prometheus that keeps all of its features while making optimal use of the resources at your disposal.

In addition to that, you should also check the managed service options that some providers offer for Prometheus, such as Amazon Managed Service for Prometheus.


Loki vs ELK Stack: Lightweight Log Aggregation for Kubernetes and Cloud-Native


Learn about the new horizontally-scalable, highly available, multi-tenant log aggregation system inspired by Prometheus that can be the best fit for your logging architecture

Loki vs ELK is something you read and hear more and more often, as for some time now there has been a dispute over which one will become the de facto standard for log aggregation architectures.

When we talk about Cloud-Native Architecture, log aggregation is something key that you need to consider. The old practices that we followed in the on-premises virtual machine approach for logging are not valid anymore.

We already covered this topic in a previous post that I recommend you take a look at in case you haven’t read it yet, but it is not the topic for today.

Elasticsearch as the core, and the different stacks derived from it like ELK/EFK, has gained popularity over the last years, being pretty much the default open-source option when we talk about log aggregation. The main public cloud providers have also adopted this solution as part of their own offering, as Amazon Elasticsearch Service shows.

But Elasticsearch is not perfect. If you have already used it, you probably know that. Still, because its features are so good, especially its searching and indexing capabilities, it has remained the de facto leader until today. But other aspects, like storage usage, the amount of compute needed to run it, and an architecture with different kinds of nodes (master, data, ingest), increase its complexity for cases when we need something smaller.

And filling this gap is where the main character of today’s post arrives: Loki, or Grafana Loki.

Grafana Loki Logo from https://grafana.com/oss/loki/

Loki is a logging management system created as part of the Grafana project, and it has been created with a different approach in mind than Elasticsearch.

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.

So as we can read in the definition from their own page above, it covers several interesting topics in comparison with Elasticsearch:

  • First of all, it addresses some of the usual pain points for ELK customers: It is very cost-effective and easy to operate.
  • It clearly states that the approach is not the same as ELK’s: you are not going to have a complete index of the event payload; instead, it is based on the labels that you define for each log stream.
  • It is inspired by Prometheus, which is critical because it enables the idea of using log traces as metrics to empower our monitoring solutions.

Let’s start with the usual initial questions when we come across an interesting new technology and would like to start testing it.

How can I install Loki?

Loki is distributed in different flavors to be installed in your environment in the way you need it.

  • SaaS: provided as part of the hosting solution of Grafana Cloud.
  • On-Premises: Provided as a normal binary to be downloaded and run in an on-premises mode.
  • Cloud: Provided as a Docker image or even a Helm chart to be deployed into your Kubernetes-based environment.

Grafana Labs also provides enterprise support for Loki if you would like to use it in production in your company. At the same time, all the code is licensed under the Apache License 2.0, so you can take a look at it and contribute.

How does Loki work?

High-level Loki Architecture from https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/

Architecture-wise, it is very similar to the ELK/EFK stack and follows the same approach of “collectors” and “indexers” that ELK has:

  • Loki itself is the central node of the architecture, responsible for storing the log traces and their labels and providing an API to search among them based on its own language, LogQL (an approach similar to PromQL from Prometheus).
  • promtail is the agent component that runs at the edge collecting the log traces we need; it can run on an on-premises machine or as a DaemonSet in our Kubernetes cluster. It plays the same role as Logstash/Fluent Bit/Fluentd in the ELK/EFK stack. Promtail provides the usual plugin model to filter and transform our log traces, as the other solutions do, and at the same time it provides an interesting feature to convert those log traces into Prometheus metrics that can be scraped directly by your Prometheus server (see the configuration sketch after this list).
  • Grafana is the UI for the whole stack and plays a similar role to Kibana in the ELK/EFK stack. Among other plugins, Grafana provides direct integration with Loki as a data source to explore those traces and include them in dashboards.
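As a reference for the promtail piece, here is a minimal configuration sketch (host names and paths are illustrative) that ships container logs to a Loki instance:

server:
  http_listen_port: 9080

positions:
  # Where promtail keeps track of how far it has read each file
  filename: /tmp/positions.yaml

clients:
  # Loki push endpoint; the host name is illustrative
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: containers
    static_configs:
      - targets: [localhost]
        labels:
          job: containers
          __path__: /var/log/containers/*.log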

Summary

Grafana Loki can be a great solution for your logging architecture, addressing two points: providing a lightweight log aggregation solution for your environment and, at the same time, enabling your log traces as a source for metrics, allowing you to create detailed, more business-oriented metrics to use in your dashboards and your monitoring systems.


Sysstat Metrics and Tools: How to Get an Awesome Graphical Analysis?


Discover SARChart and kSAR as critical utilities to be part of your toolbelt for administration or troubleshooting

There was a time when we didn’t have public cloud providers offering a bunch of services and a whole unified platform and experience covering all aspects of our technical needs in an IT environment, and sysstat metrics were key back then.

There was a time when AWS CloudWatch, Azure Monitor, or Prometheus were not a thing, and we needed to deal with Linux servers without a complete portal providing all the metrics we could need.

There was a time… that is still the present for many customers and organizations all over the world, and they still need to deal with this situation; you may well face it now or in the future. So, let’s see what we can do about it.

Introducing sysstat

For several decades, the standard way to extract usage metrics from a Linux server has been sysstat. In the words of its official web page, this is what sysstat is:

The sysstat utilities are a collection of performance monitoring tools for Linux. These include sar, sadf, mpstat, iostat, tapestat, pidstat, cifsiostat and sa tools

Sysstat is an ancient but reliable piece of software that its owner continues to update even today… while keeping the same webpage since the beginning 🙂

Sysstat is old but powerful, and it has so many options that it has saved my life at a lot of customers, providing plenty of handy information I needed at the time. But today I am going to talk about one specific utility from the whole lot: sar.

sar is the command to query the performance metrics of an existing machine. Just typing sar is enough to start seeing awesome things: it will give you the CPU metrics for the whole day, for each of the CPUs your machine has, split by kind of usage (user, system, idle, all).

Execution of command sar in a local machine

But these are not the only metrics you can get. Other options available:

  • sar -r: Provides memory metrics.
  • sar -q: Provides the load metrics.
  • sar -n: Provides the network metrics.
  • sar -A: Provides ALL the metrics.
  • sar -f /var/log/sysstat/sa[day-of-the-month]: Provides the metrics for that day of the month instead of the current day.

There are many more options that you can use on a daily basis, so if you need something specific, take a look at the man page for the sar command.

But we are all visual people, right? It is true that spotting trends and evolution is harder in text mode, and you only see one day of data at a time. So let’s take a look at the options to handle that challenge:

kSAR

Logo from the kSAR application (https://sourceforge.net/projects/ksar/)

kSAR is a Java-based frontend built with the Swing library to represent the data from sar visually. It is portable, so you only need the JAR file to execute it, and you can invoke it in several ways:

  • Providing the file you got from a machine that you executed the sar command.
  • Connecting using SSH to a remote machine and running the command that you need.
Graphical visualization of the sar metrics using kSAR

SARChart

What about when you are on a machine where you don’t have the rights to install any application, even a portable one like kSAR, or maybe you only have your tablet available? In that case, we have SARChart.

Homepage of the SARChart application (https://sarchart.dotsuresh.com/)

SARChart is a web application that provides a graphical analysis of sar files. You only need to upload the file to get a complete, good-looking graphical analysis of your data covering all its aspects. Additionally, all the work is done on the client side without sending any of your data to any server.

CPU usage analysis provided by SARChart

Summary

I hope you find these tools interesting if you didn’t know about them, and I also hope that they can help you with your daily work or at least become part of your toolset, at your disposal when you need them.

Amazon Managed Service for Prometheus Explained: High-Availability Monitoring on AWS


Learn what Amazon Managed Service for Prometheus provides and how you can benefit from it.

Monitoring is one of the hot topics when we talk about cloud-native architectures. Prometheus is a graduated Cloud Native Computing Foundation (CNCF) open-source project and one of the industry-standard solutions when it comes to monitoring your cloud-native deployment, especially when Kubernetes is involved.

Following its own philosophy of providing managed services for some of the most used open-source projects, fully integrated with the AWS ecosystem, AWS has released in preview (at the time of writing this article) Amazon Managed Service for Prometheus (AMP).

The first thing is to define what Amazon Managed Service for Prometheus is and what features it provides. This is the Amazon definition of the service:

A fully managed Prometheus-compatible monitoring service that makes it easy to monitor containerized applications securely and at scale.

And I would like to spend some time on some parts of this sentence.

  • Fully managed service: This will be hosted and handled by Amazon, and we are just going to interact with it through an API, as we do with other Amazon services like EKS, RDS, MSK, SQS/SNS, and so on.
  • Prometheus-compatible: Even if this is not a pure Prometheus installation, the API is going to be compatible, so Prometheus clients such as Grafana and others that get information from Prometheus will keep working without changing their interfaces.
  • Service at scale: As part of the managed service, Amazon will take care of the solution’s scalability. You don’t need to define an instance type or how much RAM or CPU you need; this is handled by AWS.

So, that sounds perfect. You may think you can delete your Prometheus server and just start using this service. Maybe you are even typing something like helm delete prom… WAIT, WAIT!!

Because, at this point, this is not going to replace your local Prometheus server; it integrates with it. That means your Prometheus server is going to act as a scraper for the scalable monitoring solution that AMP provides, as you can see in the picture below:

Reference Architecture for Amazon Prometheus Service

So, you are still going to need a Prometheus server, that is right, but all the complexity is offloaded to the managed service: storage configuration, high availability, API optimization, and so on are provided to you out of the box.

Ingesting information into Amazon Managed Service for Prometheus

At this moment, there are two ways to ingest data into Amazon Managed Service for Prometheus:

  • From an existing Prometheus server using the remote_write capability, which means that each series scraped by the local Prometheus is sent to the Amazon service (see the configuration sketch after this list).
  • Using AWS Distro for OpenTelemetry, which integrates with this service through the Prometheus Receiver and the AWS Prometheus Remote Write Exporter components.
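For the first option, a minimal sketch of the remote_write block on the existing Prometheus server (the workspace endpoint and region are placeholders; requests to AMP must be signed with AWS SigV4, which newer Prometheus versions support natively via the sigv4 block, while older setups used a signing proxy):

remote_write:
  - url: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write
    # Sign requests with AWS SigV4 using credentials available to the Prometheus server
    sigv4:
      region: <region>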

Summary

So this is a way to get an enterprise-grade installation, leveraging all the knowledge AWS has in hosting and managing this solution at scale and optimizing its performance, while you focus on the components you need to get the metrics ingested into the service.

I am sure this will not be the last move from AWS in observability and metrics management, and that they will continue to put more tools in developers’ and architects’ hands to define optimized solutions as easily as possible.


Optimize Prometheus Disk Usage: Reduce TSDB Size and Control Metrics Cardinality


Learn some tricks to analyze and optimize your usage of the TSDB and save money on your cloud deployment.

In previous posts, we discussed how the Prometheus storage layer works and how effective it is. But in the current times of cloud computing, we know that each technical optimization is also a cost optimization, and that is why we need to be very diligent about any optimization option at our disposal.

We know that when we monitor using Prometheus we have many exporters at our disposal, and each of them exposes a lot of very relevant metrics so we can track everything we need. But we should also be aware that there are metrics we don’t need at this moment or don’t plan to use. So, if we are not planning to use them, why do we want to waste disk space storing them?

So, let’s start by taking a look at one of the exporters we have in our system. In my case, I will use a BusinessWorks Container Edition (BWCE) application that exposes metrics about its utilization. If you check its metrics endpoint, you will see something like this:

# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_221-b27",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.0318492E8
jvm_memory_bytes_used{area="nonheap",} 1.52094712E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 1.35266304E8
jvm_memory_bytes_committed{area="nonheap",} 1.71302912E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 1.073741824E9
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 1.34217728E8
jvm_memory_bytes_init{area="nonheap",} 2555904.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 3.3337536E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 1.04914136E8
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 1.384304E7
jvm_memory_pool_bytes_used{pool="G1 Eden Space",} 3.3554432E7
jvm_memory_pool_bytes_used{pool="G1 Survivor Space",} 1048576.0
jvm_memory_pool_bytes_used{pool="G1 Old Gen",} 6.8581912E7
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 3.3619968E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 1.19697408E8
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.7985536E7
jvm_memory_pool_bytes_committed{pool="G1 Eden Space",} 4.6137344E7
jvm_memory_pool_bytes_committed{pool="G1 Survivor Space",} 1048576.0
jvm_memory_pool_bytes_committed{pool="G1 Old Gen",} 8.8080384E7
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="G1 Eden Space",} -1.0
jvm_memory_pool_bytes_max{pool="G1 Survivor Space",} -1.0
jvm_memory_pool_bytes_max{pool="G1 Old Gen",} 1.073741824E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="G1 Eden Space",} 7340032.0
jvm_memory_pool_bytes_init{pool="G1 Survivor Space",} 0.0
jvm_memory_pool_bytes_init{pool="G1 Old Gen",} 1.26877696E8
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 148590.0
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 148590.0
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 19.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 16993.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 17041.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 48.0
# HELP bwce_activity_stats_list BWCE Activity Statictics list
# TYPE bwce_activity_stats_list gauge
# HELP bwce_activity_counter_list BWCE Activity related Counters list
# TYPE bwce_activity_counter_list gauge
# HELP all_activity_events_count BWCE All Activity Events count by State
# TYPE all_activity_events_count counter
all_activity_events_count{StateName="CANCELLED",} 0.0
all_activity_events_count{StateName="COMPLETED",} 0.0
all_activity_events_count{StateName="STARTED",} 0.0
all_activity_events_count{StateName="FAULTED",} 0.0
# HELP activity_events_count BWCE All Activity Events count by Process, Activity State
# TYPE activity_events_count counter
# HELP activity_total_evaltime_count BWCE Activity EvalTime by Process and Activity
# TYPE activity_total_evaltime_count counter
# HELP activity_total_duration_count BWCE Activity DurationTime by Process and Activity
# TYPE activity_total_duration_count counter
# HELP bwpartner_instance:total_request Total Request for the partner invocation which mapped from the activities
# TYPE bwpartner_instance:total_request counter
# HELP bwpartner_instance:total_duration_ms Total Duration for the partner invocation which mapped from the activities (execution or latency)
# TYPE bwpartner_instance:total_duration_ms counter
# HELP bwce_process_stats BWCE Process Statistics list
# TYPE bwce_process_stats gauge
# HELP bwce_process_counter_list BWCE Process related Counters list
# TYPE bwce_process_counter_list gauge
# HELP all_process_events_count BWCE All Process Events count by State
# TYPE all_process_events_count counter
all_process_events_count{StateName="CANCELLED",} 0.0
all_process_events_count{StateName="COMPLETED",} 0.0
all_process_events_count{StateName="STARTED",} 0.0
all_process_events_count{StateName="FAULTED",} 0.0
# HELP process_events_count BWCE Process Events count by Operation
# TYPE process_events_count counter
# HELP process_duration_seconds_total BWCE Process Events duration by Operation in seconds
# TYPE process_duration_seconds_total counter
# HELP process_duration_milliseconds_total BWCE Process Events duration by Operation in milliseconds
# TYPE process_duration_milliseconds_total counter
# HELP bwdefinitions:partner BWCE Process Events count by Operation
# TYPE bwdefinitions:partner counter
bwdefinitions:partner{ProcessName="t1.module.item.getTransactionData",ActivityName="FTLPublisher",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="TransactionService",PartnerOperation="GetTransactionsOperation",Location="internal",PartnerMiddleware="MW",} 1.0
bwdefinitions:partner{ProcessName=" t1.module.item.auditProcess",ActivityName="KafkaSendMessage",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="AuditService",PartnerOperation="AuditOperation",Location="internal",PartnerMiddleware="MW",} 1.0
bwdefinitions:partner{ProcessName="t1.module.item.getCustomerData",ActivityName="JMSRequestReply",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="CustomerService",PartnerOperation="GetCustomerDetailsOperation",Location="internal",PartnerMiddleware="MW",} 1.0
# HELP bwdefinitions:binding BW Design Time Repository - binding/transport definition
# TYPE bwdefinitions:binding counter
bwdefinitions:binding{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInterface="GetCustomer360:GetDataOperation",Binding="/customer",Transport="HTTP",} 1.0
# HELP bwdefinitions:service BW Design Time Repository - Service definition
# TYPE bwdefinitions:service counter
bwdefinitions:service{ProcessName="t1.module.sub.item.getCustomerData",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
bwdefinitions:service{ProcessName="t1.module.sub.item.auditProcess",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
bwdefinitions:service{ProcessName="t1.module.sub.orchestratorSubFlow",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
bwdefinitions:service{ProcessName="t1.module.Process",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
# HELP bwdefinitions:gateway BW Design Time Repository - Gateway definition
# TYPE bwdefinitions:gateway counter
bwdefinitions:gateway{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",Endpoint="bwce-demo-mon-orchestrator-bwce",InteractionType="ISTIO",} 1.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1956.86
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.604712447107E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 763.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.046207488E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.2151936E8
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 540.0
jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 4.754
jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 2.0
jvm_gc_collection_seconds_sum{gc="G1 Old Generation",} 0.563
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 98.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 43.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 98.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 109.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0

As you can see, there are a lot of metrics, but to be honest I am not using most of them in my dashboards or to generate my alerts. I use the metrics regarding application performance for each BusinessWorks process and its activities, plus the JVM memory figures and the number of threads, but things like how the JVM GC is behaving for each generation (G1 Young Generation, G1 Old Generation) I am not using at all.
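One common way to stop paying storage for series like these (a minimal sketch; the job name and target are hypothetical, and the regex assumes the GC metrics above are the ones you want to drop) is a metric_relabel_configs block on the scrape job, which discards the series before they reach the TSDB:

scrape_configs:
  - job_name: bwce-application        # hypothetical job name
    static_configs:
      - targets: ["bwce-demo:9095"]   # hypothetical target
    metric_relabel_configs:
      # Drop the JVM GC series that are not used in any dashboard or alert
      - source_labels: [__name__]
        regex: "jvm_gc_collection_seconds.*"
        action: drop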

So, if I show the same metrics endpoint highlighting the things that I am not using, it would be something like this:

# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_221-b27",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0

# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.0318492E8
jvm_memory_bytes_used{area="nonheap",} 1.52094712E8
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 1.35266304E8
jvm_memory_bytes_committed{area="nonheap",} 1.71302912E8
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 1.073741824E9
jvm_memory_bytes_max{area="nonheap",} -1.0
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 1.34217728E8
jvm_memory_bytes_init{area="nonheap",} 2555904.0

# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 3.3337536E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 1.04914136E8
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 1.384304E7
jvm_memory_pool_bytes_used{pool="G1 Eden Space",} 3.3554432E7
jvm_memory_pool_bytes_used{pool="G1 Survivor Space",} 1048576.0
jvm_memory_pool_bytes_used{pool="G1 Old Gen",} 6.8581912E7
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 3.3619968E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 1.19697408E8
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.7985536E7
jvm_memory_pool_bytes_committed{pool="G1 Eden Space",} 4.6137344E7
jvm_memory_pool_bytes_committed{pool="G1 Survivor Space",} 1048576.0
jvm_memory_pool_bytes_committed{pool="G1 Old Gen",} 8.8080384E7
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9
jvm_memory_pool_bytes_max{pool="G1 Eden Space",} -1.0
jvm_memory_pool_bytes_max{pool="G1 Survivor Space",} -1.0
jvm_memory_pool_bytes_max{pool="G1 Old Gen",} 1.073741824E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="G1 Eden Space",} 7340032.0
jvm_memory_pool_bytes_init{pool="G1 Survivor Space",} 0.0
jvm_memory_pool_bytes_init{pool="G1 Old Gen",} 1.26877696E8
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 148590.0
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 148590.0
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 19.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 16993.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 17041.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 48.0

# HELP bwce_activity_stats_list BWCE Activity Statictics list
# TYPE bwce_activity_stats_list gauge
# HELP bwce_activity_counter_list BWCE Activity related Counters list
# TYPE bwce_activity_counter_list gauge
# HELP all_activity_events_count BWCE All Activity Events count by State
# TYPE all_activity_events_count counter
all_activity_events_count{StateName="CANCELLED",} 0.0
all_activity_events_count{StateName="COMPLETED",} 0.0
all_activity_events_count{StateName="STARTED",} 0.0
all_activity_events_count{StateName="FAULTED",} 0.0
# HELP activity_events_count BWCE All Activity Events count by Process, Activity State
# TYPE activity_events_count counter
# HELP activity_total_evaltime_count BWCE Activity EvalTime by Process and Activity
# TYPE activity_total_evaltime_count counter
# HELP activity_total_duration_count BWCE Activity DurationTime by Process and Activity
# TYPE activity_total_duration_count counter
# HELP bwpartner_instance:total_request Total Request for the partner invocation which mapped from the activities
# TYPE bwpartner_instance:total_request counter
# HELP bwpartner_instance:total_duration_ms Total Duration for the partner invocation which mapped from the activities (execution or latency)
# TYPE bwpartner_instance:total_duration_ms counter
# HELP bwce_process_stats BWCE Process Statistics list
# TYPE bwce_process_stats gauge
# HELP bwce_process_counter_list BWCE Process related Counters list
# TYPE bwce_process_counter_list gauge
# HELP all_process_events_count BWCE All Process Events count by State
# TYPE all_process_events_count counter
all_process_events_count{StateName="CANCELLED",} 0.0
all_process_events_count{StateName="COMPLETED",} 0.0
all_process_events_count{StateName="STARTED",} 0.0
all_process_events_count{StateName="FAULTED",} 0.0
# HELP process_events_count BWCE Process Events count by Operation
# TYPE process_events_count counter
# HELP process_duration_seconds_total BWCE Process Events duration by Operation in seconds
# TYPE process_duration_seconds_total counter
# HELP process_duration_milliseconds_total BWCE Process Events duration by Operation in milliseconds
# TYPE process_duration_milliseconds_total counter
# HELP bwdefinitions:partner BWCE Process Events count by Operation
# TYPE bwdefinitions:partner counter
bwdefinitions:partner{ProcessName="t1.module.item.getTransactionData",ActivityName="FTLPublisher",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="TransactionService",PartnerOperation="GetTransactionsOperation",Location="internal",PartnerMiddleware="MW",} 1.0
bwdefinitions:partner{ProcessName=" t1.module.item.auditProcess",ActivityName="KafkaSendMessage",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="AuditService",PartnerOperation="AuditOperation",Location="internal",PartnerMiddleware="MW",} 1.0
bwdefinitions:partner{ProcessName="t1.module.item.getCustomerData",ActivityName="JMSRequestReply",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="CustomerService",PartnerOperation="GetCustomerDetailsOperation",Location="internal",PartnerMiddleware="MW",} 1.0
# HELP bwdefinitions:binding BW Design Time Repository - binding/transport definition
# TYPE bwdefinitions:binding counter
bwdefinitions:binding{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInterface="GetCustomer360:GetDataOperation",Binding="/customer",Transport="HTTP",} 1.0
# HELP bwdefinitions:service BW Design Time Repository - Service definition
# TYPE bwdefinitions:service counter
bwdefinitions:service{ProcessName="t1.module.sub.item.getCustomerData",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
bwdefinitions:service{ProcessName="t1.module.sub.item.auditProcess",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
bwdefinitions:service{ProcessName="t1.module.sub.orchestratorSubFlow",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
bwdefinitions:service{ProcessName="t1.module.Process",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0
# HELP bwdefinitions:gateway BW Design Time Repository - Gateway definition
# TYPE bwdefinitions:gateway counter
bwdefinitions:gateway{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",Endpoint="bwce-demo-mon-orchestrator-bwce",InteractionType="ISTIO",} 1.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1956.86
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.604712447107E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 763.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.046207488E9
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.2151936E8
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 540.0
jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 4.754
jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 2.0
jvm_gc_collection_seconds_sum{gc="G1 Old Generation",} 0.563

# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 98.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 43.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 98.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 109.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0

So, maybe 50% of the metric endpoint response is data that I am not using, so why am I paying for disk space to store it? And this is just for a “critical exporter”, one from which I try to use as much information as possible; think about how many exporters you have and how much of the information from each of them you actually use.

Ok, so now the purpose and the motivation of this post are clear, but what can we do about it?

Discovering the REST API

Prometheus has an awesome REST API that exposes all the information you could wish for. If you have ever used the Prometheus graphical interface (shown below), you have already used the REST API, because that is what sits behind it.

Target view of the Prometheus Graphical Interface

All the details of the REST API are covered in the official Prometheus documentation:

https://prometheus.io/docs/prometheus/latest/querying/api/

But what does this API provide us regarding the time-series database (TSDB) that Prometheus uses?

TSDB Admin APIs

We have a specific API to manage the performance of the TSDB, but in order to use it we need to enable the Admin API. That is done by passing the following flag when launching the Prometheus server: --web.enable-admin-api.
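For an on-premises or container deployment, a minimal launch command could look like the following sketch (the paths are illustrative, adjust them to your environment):

prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus/data \
  --web.enable-admin-api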

If we are using the Prometheus Operator Helm chart for the deployment, we need to set the following item in our values.yaml:

## EnableAdminAPI enables Prometheus the administrative HTTP API which includes functionality such as deleting time series.
## This is disabled by default.
## ref: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis
enableAdminAPI: true
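Note that in recent versions of the chart (for example kube-prometheus-stack) this setting lives under the Prometheus spec, so the snippet above typically needs to be nested as follows; check the values.yaml of the chart version you are using, as the exact path may differ:

prometheus:
  prometheusSpec:
    enableAdminAPI: true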

Enabling this administrative API opens up a lot of options, but today we are going to focus on a single REST operation: “stats”. This is the only TSDB-related method that does not require the Admin API to be enabled. This operation, as we can read in the Prometheus documentation, returns the following items:

headStats: This provides the following data about the head block of the TSDB:

  • numSeries: The number of series.
  • chunkCount: The number of chunks.
  • minTime: The current minimum timestamp in milliseconds.
  • maxTime: The current maximum timestamp in milliseconds.

seriesCountByMetricName: This will provide a list of metric names and their series count.

labelValueCountByLabelName: This will provide a list of the label names and their value count.

memoryInBytesByLabelName: This will provide a list of the label names and the memory used in bytes. Memory usage is calculated by adding the length of all values for a given label name.

seriesCountByLabelValuePair: This will provide a list of label-value pairs and their series count.

To access that API, we need to hit the following endpoint:

GET /api/v1/status/tsdb
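A minimal sketch of calling it, assuming the Prometheus server is reachable on localhost:9090 (for a Kubernetes deployment you can first port-forward to the Prometheus service) and using jq only to pretty-print the response:

curl -s http://localhost:9090/api/v1/status/tsdb | jq .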

So, when I run that against my Prometheus deployment, I get something similar to this:

{
"status":"success",
"data":{
"seriesCountByMetricName":[
{
"name":"apiserver_request_duration_seconds_bucket",
"value":34884
},
{
"name":"apiserver_request_latencies_bucket",
"value":7344
},
{
"name":"etcd_request_duration_seconds_bucket",
"value":6000
},
{
"name":"apiserver_response_sizes_bucket",
"value":3888
},
{
"name":"apiserver_request_latencies_summary",
"value":2754
},
{
"name":"etcd_request_latencies_summary",
"value":1500
},
{
"name":"apiserver_request_count",
"value":1216
},
{
"name":"apiserver_request_total",
"value":1216
},
{
"name":"container_tasks_state",
"value":1140
},
{
"name":"apiserver_request_latencies_count",
"value":918
}
],
"labelValueCountByLabelName":[
{
"name":"__name__",
"value":2374
},
{
"name":"id",
"value":210
},
{
"name":"mountpoint",
"value":208
},
{
"name":"le",
"value":195
},
{
"name":"type",
"value":185
},
{
"name":"name",
"value":181
},
{
"name":"resource",
"value":170
},
{
"name":"secret",
"value":168
},
{
"name":"image",
"value":107
},
{
"name":"container_id",
"value":97
}
],
"memoryInBytesByLabelName":[
{
"name":"__name__",
"value":97729
},
{
"name":"id",
"value":21450
},
{
"name":"mountpoint",
"value":18123
},
{
"name":"name",
"value":13831
},
{
"name":"image",
"value":8005
},
{
"name":"container_id",
"value":7081
},
{
"name":"image_id",
"value":6872
},
{
"name":"secret",
"value":5054
},
{
"name":"type",
"value":4613
},
{
"name":"resource",
"value":3459
}
],
"seriesCountByLabelValuePair":[
{
"name":"namespace=default",
"value":72064
},
{
"name":"service=kubernetes",
"value":70921
},
{
"name":"endpoint=https",
"value":70917
},
{
"name":"job=apiserver",
"value":70917
},
{
"name":"component=apiserver",
"value":57992
},
{
"name":"instance=192.168.185.199:443",
"value":40343
},
{
"name":"__name__=apiserver_request_duration_seconds_bucket",
"value":34884
},
{
"name":"version=v1",
"value":31152
},
{
"name":"instance=192.168.112.31:443",
"value":30574
},
{
"name":"scope=cluster",
"value":29713
}
]
}
}

We can also check the same information using the new and experimental React user interface, on the following endpoint:

/new/tsdb-status
Graphical Visualization of top 10 series count by metric name in the new Prometheus UI

So, with that, you will get the top 10 series and labels inside your time-series database, and if some of them are not useful you can simply get rid of them using the usual approaches to drop a series or a label (a minimal sketch of this is shown right below). This is great, but what if all the ones shown here are relevant? What can we do about it?
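For reference, this is roughly what those drop approaches look like as a metric_relabel_configs section in the scrape configuration; the job name, target and regular expressions are hypothetical and should be replaced with the series and labels you identified as unused:

scrape_configs:
  - job_name: 'bwce-app'                # hypothetical job name
    static_configs:
      - targets: ['bwce-app:7777']      # hypothetical target and port
    metric_relabel_configs:
      # Drop entire series we are not using, matched by metric name.
      - source_labels: [__name__]
        regex: 'jvm_buffer_pool_.*'
        action: drop
      # Drop a label we do not need; the series are kept without it
      # (make sure the remaining labels still uniquely identify each series).
      - regex: 'container_id'
        action: labeldrop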

Mmmm, and when everything really is relevant, maybe we can use PromQL to monitor this (a dogfooding approach). So, if we would like to extract the same information using PromQL, we can do it with the following query:

topk(10, count by (__name__)({__name__=~".+"}))
Top 10 metric series generated and stored in the time-series database

And now we have all the power in our hands. For example, let’s look not at the 10 most relevant series but at the 100 most relevant, or apply any other filter we need. For instance, let’s look at the JVM metrics we discussed at the beginning, with the following PromQL query:

topk(100, count by (__name__)({__name__=~"jvm.+"}))
Top 100 metric series related to JVM metrics

So we can see that we have at least 150 series for metrics that I am not using at all. But let’s do even better and look at the same data grouped by job name:

topk(10, count by (job,__name__)({__name__=~".+"}))
Top 10 metric series count, together with the job that generates them


Prometheus Storage Explained: How the TSDB Works and Why It Matters


Learn the foundations that make Prometheus such a great solution for monitoring your workloads, and use them to your own benefit.

Prometheus is one of the key systems in today’s cloud architectures. It was the second project to graduate from the Cloud Native Computing Foundation (CNCF), after Kubernetes itself, and it is the monitoring solution par excellence for most workloads running on Kubernetes.

If you have already used Prometheus for some time, you know that it relies on a time-series database, so Prometheus storage is one of its key elements. In their own words from the official Prometheus page:

Every time series is uniquely identified by its metric name and optional key-value pairs called labels, and a series is similar to a table in a relational model. Inside each of those series we have samples, which are similar to the tuples (rows), and each sample contains a float value and a millisecond-precision timestamp.
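To make that concrete, a purely illustrative series and one of its samples (metric name, labels, value and timestamp are all made up for the example) could be written like this:

api_http_requests_total{method="POST", handler="/messages"}

  sample: value = 1027, timestamp = 1604712447107 (milliseconds since the Unix epoch)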

Default on-disk approach

By default, Prometheus uses a local-storage approach, storing all those samples on disk. The data is distributed across different files and folders that group different chunks of data.

So, we have folders to create those groups; by default each folder is a two-hour block and can contain one or more files depending on the amount of data ingested in that period, since each folder holds all the samples for that specific time range.

Additionally, each folder also contains some metadata files that help locate the metrics inside each of the data files.

A block is only fully persisted to disk once it is complete; until then, the data is kept in memory and protected by a write-ahead log (WAL) that allows it to be recovered if the Prometheus server crashes.

So, at a high level, the directory structure of a Prometheus server’s data directory will look something like this:
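The layout below is an illustrative sketch based on the Prometheus documentation; the block directory names are random identifiers and will differ in every deployment:

./data
├── 01BKGV7JBM69T2G1BGBGM6KB12
│   └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
├── chunks_head
│   └── 000001
└── wal
    ├── 000000002
    └── checkpoint.00000001
        └── 00000000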

Remote Storage Integration

The default on-disk storage is good but has some limitations in terms of scalability and durability, even considering the performance improvements in the latest versions of the TSDB. So, if we would like to explore other options to store this data, Prometheus provides a way to integrate with remote storage locations.

It provides an API that allows the samples being ingested to be written to a remote URL and, at the same time, read back from that remote URL.
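At the configuration level this is just a couple of entries in prometheus.yml; a minimal sketch could look like the following (the URLs point to a hypothetical remote storage adapter, replace them with the endpoint of the backend you choose):

remote_write:
  - url: "http://remote-storage-adapter:9201/write"

remote_read:
  - url: "http://remote-storage-adapter:9201/read"
    read_recent: true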

As always with anything related to Prometheus, the number of adapters created using this pattern is huge; the full list can be found in the remote endpoints and storage section of the Prometheus integrations documentation.

Summary

Knowing how Prometheus storage works is critical to understanding how we can optimize its usage, improve the performance of our monitoring solution, and achieve a cost-efficient deployment.

In the following posts, we are going to cover how to optimize the usage of this storage layer, making sure that only the metrics and samples that really matter are stored, and also how to analyze which metrics take up most of the time-series database so we can make good decisions about which metrics to drop and which to keep.

So, stay tuned for the next post regarding how we can have a better life with Prometheus and not die in the attempt.
