Promtail: The Missing Link Logs and Metrics for your Monitoring Platform.

Promtail is the solution when you need to provide metrics that are only present on the log traces of the software you need to monitor to provide a consistent monitoring platform

Photo by SOULSANA on Unsplash

It is a common understanding that three pillars in the observability world help us to get a complete view of the status of our own platforms and systems: Logs, Traces, and Metrics.

To provide a summary of the differences between each of them:

  • Metrics are the counters about the state of the different components from both a technical and a business view. So we can see here things like the CPU consumption, the number of requests, memory, or disk usage…
  • Logs are the different messages that each of the pieces of software in our platform provides to understand its current behavior and detect some non-expected situations.
  • Trace is the different data regarding the end-to-end request flow across the platform with the services and systems that have been part of that flow and data related to that concrete request.

We have solutions that claim to address all of them, mainly in the enterprise software with Dynatrace, AppDynamics, and similar. And on the other hand, we try to go with a specific solution for each of them that we can easily integrate together and we have discussed a lot about that options in previous articles.

But, some situations in that software don’t work following this path because we live in the most heterogeneous era. We all embrace, at some level, the polyglot approach on the new platforms. In some cases, we can see that software is using log traces to provide data related to metrics or other matters, and here is when we need to rely on pieces of software that help us “fix” that situation, and Promtail does specifically that.

Promtail is mainly a log forwarder similar to others like fluentd or fluent-bit from CNCF or logstash from the ELK stack. In this case, this is the solution from Grafana Labs, and as you can imagine, this is part of the Grafana stack with Loki to be the “master-mind” that we cover in this article that I recommend you to take a look at if you haven’t read it yet:

Promtail has two main ways of behaving as part of this architecture, and the first one is very similar to others in this space, as we commented before. It helps us ship our log traces from our containers to the central location that will mainly be Loki and can be a different one and provide the usual options to play and transform those traces as we can do in other solutions. You can look at all the options in the link below, but as you can imagine, this includes transformation, filtering, parsing, and so on.

But what makes promtail so different is just one of the actions that you can do, and that action is metrics. Metrics provides a specific way to, based on the data that we are reading from the logs, create Prometheus metrics that a Prometheus server can scrape. That means that you can use the log traces that you are processing that can something like this:

[2021–06–06 22:02.12] New request received for customer_id: 123
[2021–06–06 22:02.12] New request received for customer_id: 191
[2021–06–06 22:02.12] New request received for customer_id: 522

With this information apart to send those metrics to the central location to create a metric call, for example: `total_request_count` that will be generated by the promtail agent and also exposed by it and being able also to use a metrics approach even for systems or components that don’t provide a standard way to do that like a formal metrics API.

And the way to do this is very well integrated with the configuration. This is done with an additional stage (this is how we call the actions we can do in Promtail) that is namedmetrics.

The schema of that metric stage is straightforward, and if you are familiar with Prometheus, you will see how direct it is from a definition of Prometheus metrics to this snippet:

# A map where the key is the name of the metric and the value is a specific
# metric type.
  [<string>: [ <metric_counter> | <metric_gauge> | <metric_histogram> ] ...]

So we start defining the kind of metrics that we would like to define, and we have the usual ones: counter, gauge, or histogram, and for each of them, we have a set of options to be able to declare our metrics as you can see here for a Counter Metrics

# The metric type. Must be Counter.
type: Counter

# Describes the metric.

[description: <string>]

# Defines custom prefix name for the metric. If undefined, default name “promtail_custom_” will be prefixed.

[prefix: <string>]

# Key from the extracted data map to use for the metric, # defaulting to the metric’s name if not present.

[source: <string>]

# Label values on metrics are dynamic which can cause exported metrics # to go stale (for example when a stream stops receiving logs). # To prevent unbounded growth of the /metrics endpoint any metrics which # have not been updated within this time will be removed. # Must be greater than or equal to ‘1s’, if undefined default is ‘5m’

[max_idle_duration: <string>]

config: # If present and true all log lines will be counted without # attempting to match the source to the extract map. # It is an error to specify `match_all: true` and also specify a `value`

[match_all: <bool>]

# If present and true all log line bytes will be counted. # It is an error to specify `count_entry_bytes: true` without specifying `match_all: true` # It is an error to specify `count_entry_bytes: true` without specifying `action: add`

[count_entry_bytes: <bool>]

# Filters down source data and only changes the metric # if the targeted value exactly matches the provided string. # If not present, all data will match.

[value: <string>]

# Must be either “inc” or “add” (case insensitive). If # inc is chosen, the metric value will increase by 1 for each # log line received that passed the filter. If add is chosen, # the extracted value most be convertible to a positive float # and its value will be added to the metric. action: <string>

And with that, you will have your metric created and exposed, just waiting for a Prometheus server to scrape it. If you would like to see all the options available, all this documentation is available in the Grafana Labs documentation that you can check in the link:

I hope you will find this interesting and a useful way to keep all your observability information managed correctly using the right solution and provide a solution for these pieces of software that don’t follow your paradigm.

If you find this content interesting please think about making a contribution using the button below to keep this content updated and increased!

Alexandre Vazquez: