Enable Access Logs on OpenShift Default Routes (HAProxy Ingress Debugging)

Gain some insight when a route is not working as expected or your consumers are not able to reach the service

We all know that OpenShift is an outstanding Kubernetes distribution and one of the most widely used, especially for private-cloud deployments. Building on the solid reputation of Red Hat Enterprise Linux, OpenShift has become a solid product that is almost a standard for most enterprises.

It provides many extensions on top of vanilla Kubernetes, including open-source industry standards such as Prometheus, Thanos, and Grafana for metrics monitoring or the ELK stack for log aggregation, as well as its own extensions such as OpenShift Routes.

OpenShift Routes were the initial solution before the Ingress concept became part of the Kubernetes standard. Now OpenShift also implements the Ingress pattern to stay compatible. Routes are backed by HAProxy, one of the best-known reverse proxies in the open-source community.

One of the tricky parts by default is knowing how to debug when one of your routes is not working as expected. The way you create routes is so easy that anyone can make it in a few clicks, and if everything works as expected, that’s awesome.

But if it doesn’t, the problems start because, by default, you don’t get any logging about what’s happening. That’s what we are going to solve here.

First, let’s talk a little more about how this is configured. Currently (OpenShift 4.8), routes are implemented, as I said, using HAProxy by default, so if you are using another ingress technology such as Istio or Nginx, this article is not for you (but don’t forget to leave a comment if a similar article would interest you so I can add it to the backlog 🙂).

From the implementation perspective, this is implemented using the Operator Framework, so the ingress is deployed as an Operator, and it is available in the openshift-ingress-operator namespace.

ingress-operator pods on Openshift ecosystem

Because this is an operator, several Custom Resource Definitions (CRDs) have been installed to work with it. The most interesting one for this article is the IngressController CRD.

Ingress instances on Openshift Ecosystem

By default, you will only see one instance, named default. This is the one that holds the configuration of the deployed ingress, so this is where we need to add the extra configuration to enable the logs.

Ingress controller YAML file

The snippet that we need is the one shown below. It goes under the spec section, which holds the specification of the IngressController itself:

  logging:
    access:
      destination:
        type: Container
      httpLogFormat: >-
        log_source="haproxy-default" log_type="http" c_ip="%ci" c_port="%cp"
        req_date="%tr" fe_name_transport="%ft" be_name="%b" server_name="%s"
        res_time="%TR" tot_wait_q="%Tw" Tc="%Tc" Tr="%Tr" Ta="%Ta"
        status_code="%ST" bytes_read="%B" bytes_uploaded="%U"
        captrd_req_cookie="%CC" captrd_res_cookie="%CS" term_state="%tsc"
        actconn="%ac" feconn="%fc" beconn="%bc" srv_conn="%sc" retries="%rc"
        srv_queue="%sq" backend_queue="%bq" captrd_req_headers="%hr"
        captrd_res_headers="%hs" http_request="%r"
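
For orientation, the fragment above sits inside the IngressController resource roughly as sketched below. This is a minimal sketch rather than a full export of the resource: it assumes the default instance in the openshift-ingress-operator namespace, leaves out every other spec field your cluster may already have, and shortens the log format to a handful of fields for readability.

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default                          # the default instance mentioned above
  namespace: openshift-ingress-operator
spec:
  # ...existing spec fields stay untouched...
  logging:
    access:
      destination:
        type: Container                  # logs are written by a sidecar container in the router pods
      # Shortened format for illustration only; the full format is the one shown above
      httpLogFormat: >-
        c_ip="%ci" c_port="%cp" req_date="%tr" be_name="%b"
        status_code="%ST" http_request="%r"
```

Removing the whole logging block from the spec is what disables the access logs again.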
 

This adds another container, named logs, to the router pods in the openshift-ingress namespace, following the sidecar pattern.

Router pods on Openshift Installation

This container prints a log line for each request reaching the ingress component, so the next time your consumer is not able to call your service, you will be able to see the incoming requests with all their metadata and at least know what is going wrong:

Openshift Route Access Logs

As you can see, simple and easy! If you don’t need it anymore, you can remove the configuration and save the resource again; a new version of the router will be rolled out and everything will go back to normal.

📚 Want to dive deeper into Kubernetes? This article is part of our comprehensive Kubernetes Architecture Patterns guide, where you’ll find all fundamental and advanced concepts explained step by step.

How To Improve Your Kubernetes Workload Development Productivity

Telepresence is the way to reduce the time between your lines of code and a cloud-native workload running.

timelapse photo of highway during golden hour
Photo by Joey Kyber on Unsplash

We all know how cloud-native workloads and Kubernetes have changed how we do things. There are a lot of benefits that come with the effect of containerization and orchestration platforms such as Kubernetes, and we have discussed a lot about it: scalability, self-healing, auto-discovery, resilience, and so on.

But some challenges have also arisen. Most of them are on the operational side, where many projects are focused on tackling them, but we usually forget about what Ambassador Labs has defined as the “inner dev cycle.”

The “inner dev cycle” is the productive workflow that each developer follows when working on a new application, service, or component. This iterative flow is where we code, test what we’ve coded, and fix what is not working or improve what we already have.

This flow has existed since the beginning of time; it doesn’t matter if you were coding in C with the standard library or in COBOL in the early 1980s, or if you are writing Node.js with the latest frameworks and libraries at your disposal.

We have seen movements towards making this inner cycle more effective, especially in front-end development, where we have many options to see our latest code change just by saving the file. But with the move to container-based platforms, for the first time this flow makes devs less productive.

The main reason is that the number of tasks a dev needs to do has increased. Imagine this set of steps that we need to perform:

  • Build the app
  • Build the container image
  • Deploy the container image in Kubernetes

These actions are not as fast as testing your changes locally, making devs less productive than before, which is what the “telepresence” project is trying to solve.

Telepresence is a CNCF-hosted project that has recently attracted a lot of attention because it is included out of the box in the latest releases of Docker Desktop. In its own words, this is the definition of the Telepresence project:

Telepresence is an open-source tool that lets developers code and test microservices locally against a remote Kubernetes cluster. Telepresence facilitates more efficient development workflows while relieving the need to worry about other service dependencies.

OK, so how do we start? Let’s dive in together. The first thing we need to do is install Telepresence in our Kubernetes cluster.

Note: There is also a way to install Telepresence in your cluster using Helm, following these steps:

helm repo add datawire https://app.getambassador.io
helm repo update
kubectl create namespace ambassador
helm install traffic-manager --namespace ambassador datawire/telepresence

Now I will create a simple container that hosts a Golang application exposing a simple REST service. To keep things easy, I will follow an existing tutorial; you can do the same.

Once we have our golang application ready, we are going to generate the container from it, using the following Dockerfile:

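FROM golang:latest

# Refresh the package index and apply pending OS updates in a single layer
RUN apt-get update && apt-get upgrade -y

# Binaries installed by "go get" land here
ENV GOBIN=/go/bin

WORKDIR /app

# Copy the sources and build in GOPATH mode (modules disabled)
COPY *.go ./
RUN go env -w GO111MODULE=off
RUN go get .
RUN go build -o /go-rest

# The REST service listens on port 8080
EXPOSE 8080
CMD [ "/go-rest" ]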

Then, once we have the image, we push it to a registry and run it in the Kubernetes cluster as a deployment, using the commands below:

kubectl create deployment rest-service --image=quay.io/alexandrev/go-test  --port=8080
kubectl expose deploy/rest-service
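
If you prefer to keep these objects in version control, the two commands above are roughly equivalent to the manifests sketched below. The app label and the container name follow what kubectl usually generates for these commands, but treat them as assumptions and check the objects in your own cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-service
  labels:
    app: rest-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rest-service
  template:
    metadata:
      labels:
        app: rest-service
    spec:
      containers:
        - name: go-test                      # assumed name, derived from the image
          image: quay.io/alexandrev/go-test
          ports:
            - containerPort: 8080            # same port as --port=8080
---
apiVersion: v1
kind: Service
metadata:
  name: rest-service
spec:
  type: ClusterIP
  selector:
    app: rest-service                        # routes traffic to the pods of the deployment
  ports:
    - port: 8080
      targetPort: 8080
```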

Once we have that, it is time to start using Telepresence. We start by connecting to the cluster with the command telepresence connect.

Then we list the workloads available to intercept with the command telepresence list, and we will see the rest-service that we exposed before.

Now, we will run the intercept itself, but before that, we will do a small trick so that we can connect it to Visual Studio Code. We will create a launch.json file in Visual Studio Code with the following content:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Launch with env file",
            "type": "go",
            "request": "launch",
            "mode": "debug",
            "program": "${workspaceFolder}",
            "envFile": "${workspaceFolder}/go-debug.env"
        }
    ]
}

The interesting part here is the envFile argument, which points to a file named go-debug.env in the same folder. That file does not exist yet, so we need to make sure it is generated when we do the interception. To do so, we will use the following command:

telepresence intercept rest-service --port 8080:8080 --env-file /Users/avazquez/Data/Projects/GitHub/rest-golang/go-debug.env

And now we can start our debug session in Visual Studio Code, and maybe add a breakpoint and some lines of code.

So now, if we hit the pod in Kubernetes, we will see the breakpoint being reached as if we were in a local debugging session.

That means that we can inspect variables and everything, change the code, or do whatever we need to speed up our development!

Prometheus ServiceMonitor vs PodMonitor: Key Differences and When to Use Each

Discover the differences between two of the most used CRDs from Prometheus Operator and how to use each of them.

ServiceMonitor and PodMonitor are terms that you will start to see more often when talking about using Prometheus. We have covered a lot about Prometheus in past articles. It is one of the primary references when we talk about monitoring in a cloud-native environment and is especially focused on the Kubernetes ecosystem.

In recent times, Prometheus has gained a new deployment model based on the Kubernetes Operator Framework. That has brought several changes in terms of resources and how we configure several aspects of the monitoring of our workloads. Some of these concepts are now managed as Custom Resource Definitions (CRDs), which are included to simplify the system’s configuration and to be more aligned with the capabilities of the Kubernetes platform itself. This is great but, at the same time, it changes how we need to use this excellent monitoring tool for cloud-native workloads.

Today, we will cover two of the most relevant of these new CRDs: ServiceMonitor and PodMonitor. These are the new objects that specify which resources fall under the monitoring scope of the platform, and each of them covers a different type of object, as you can imagine: Services and Pods.

Each of them has its definition file with its particular fields and metadata, and to highlight them, I will present a sample for each of them below:

Service Monitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceMonitorSelector: prometheus
  name: prometheus
  namespace: prometheus
spec:
  endpoints:
  - interval: 30s
    targetPort: 9090
    path: /metrics
  namespaceSelector:
    matchNames:
    - prometheus
  selector:
    matchLabels:
      operated-prometheus: "true"

Pod Monitor

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: front-end
  labels:
    name: front-end
spec:
  namespaceSelector:
    matchNames:
      - sock-shop
  selector:
    matchLabels:
      name: front-end
  podMetricsEndpoints:
  - targetPort: 8079

As you can see, the definitions of the components are very similar and very intuitive, focusing on the selector to detect which pods or services we should monitor and some data regarding the specific target of the monitoring, so Prometheus knows how to scrape them.

If you want to look in more detail at any option you can configure on these CRDs, I recommend the Prometheus Operator API documentation, which includes detailed field-by-field documentation of the most common CRDs.

These components belong to the definition of your workloads, which means that creating and maintaining these objects is the responsibility of the application’s developers.

That is great for several reasons:

  • The monitoring aspect becomes part of the component itself, so you will never forget to add the configuration for a specific component. It can be included in the same YAML files, Helm chart, or Kustomize resources as just another required resource.
  • It decentralizes the monitoring configuration, making it more agile, so it evolves as the software components do.
  • It reduces the impact on other monitored components, as there is no need to touch any shared file or resource, so other workloads will continue to work as expected.

Both objects are very similar in their purposes as both of them scrape all the endpoints that match the selector that we added. So, in which cases should I use one or the other?

The answer will be straightforward. By default, you will go with a ServiceMonitor because it will provide the metrics from the service itself and each of the endpoints that the service has, so each of the pods that are implementing the service will be discovered and scraped as part of this action.
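
To make the link between the two objects concrete, here is a sketch of a Service that the ServiceMonitor sample shown earlier would pick up. The Service name and the pod label are hypothetical; what matters is that the Service lives in a namespace listed in namespaceSelector.matchNames and carries the label used in selector.matchLabels.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-operated        # hypothetical name
  namespace: prometheus            # listed in the ServiceMonitor namespaceSelector
  labels:
    operated-prometheus: "true"    # matched by the ServiceMonitor selector
spec:
  selector:
    app: prometheus                # hypothetical pod label
  ports:
    - name: web
      port: 9090
      targetPort: 9090             # pod port scraped through the ServiceMonitor endpoint
```

Every pod behind this Service becomes a scrape target, which is why a single ServiceMonitor is usually enough for a scaled workload.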

So, in which cases should I use PodMonitor? When the workload you are trying to monitor doesn’t sit behind a service: as there is no service defined, you cannot use ServiceMonitor. Do you want some examples of those? Let’s look at some!

  • Services that interact using protocols that are not HTTP-based, such as Kafka, SQS/SNS, JMS, or similar ones.
  • Components such as CronJobs, DaemonSets, or anything else that does not expose any incoming connection.
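
As an illustration of that first case, the sketch below pairs a hypothetical Kafka consumer, which has no Service in front of it, with a PodMonitor that scrapes it directly. All names, the namespace, the image, and the port number are made up for the example; only the structure of the two objects is the point.

```yaml
# Hypothetical worker that exposes only a metrics port and has no Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-consumer
  namespace: workers
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-consumer
  template:
    metadata:
      labels:
        app: kafka-consumer
    spec:
      containers:
        - name: consumer
          image: example.com/kafka-consumer:1.0   # placeholder image
          ports:
            - name: metrics                       # referenced by name in the PodMonitor
              containerPort: 9102
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-consumer
  namespace: workers
spec:
  selector:
    matchLabels:
      app: kafka-consumer          # selects the pods directly, no Service involved
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 30s
```

Without a namespaceSelector, the PodMonitor only looks for pods in its own namespace, which is enough here.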

So I hope this article will help you understand the main difference between those objects and go a little deeper into how the new Prometheus Operator Framework resources work. We will continue covering other aspects in upcoming posts.

Scale to Zero in Kubernetes: Bringing the Serverless Experience to Your Cluster

Bringing the Serverless Experience To Your Kubernetes Cluster

Serverless has always been considered the next step in the cloud journey. You know what I mean: you start from your VMs on-premises, then you move to containers on a PaaS platform, and then you try to find the next stop on this journey, which is serverless.

Technological evolution defined based on infrastructure abstraction perspective

Serverless is the idea of forgetting about infrastructure and focusing only on your apps. There is no need to worry about where they will run or about managing the underlying infrastructure. Serverless started as a synonym for the Function as a Service (FaaS) paradigm. It was popularized first by AWS Lambda functions and later by all the major cloud providers.

It started as an alternative to the containerized approach that probably requires a lot of technical skills to manage and run at a production scale, but this is not the case anymore.

Despite this starting point, we have seen the serverless approach reach every kind of platform. Following the same principles, there are different platforms whose focus is to abstract all the technical aspects of operations and provide a place where you can run your logic. Pretty much every SaaS platform follows this approach, but I would like to highlight some examples to clarify:

  • Netlify is a platform that allows you to deploy your web application without needing to manage anything other than the code needed to run it.
  • TIBCO Cloud Integration is an iPaaS solution that provides all the technical resources you could need so you can focus on deploying your integration services.

But going beyond that, pretty much every service provided by the major cloud platforms such as Azure, AWS, or GCP follows the same principle. Most of them (messaging, machine learning, storage, and so on) abstract all the underlying infrastructure so you can focus on the real service.

Going back to the Kubernetes ecosystem, we have two different layers of that approach. The first one is the managed Kubernetes services that all big platforms provide, where all the management of Kubernetes itself (master nodes, internal Kubernetes components) is transparent to you and you focus entirely on the workers. The second level is what you can get in the AWS world with the EKS + Fargate kind of architecture, where you do not even manage worker nodes: your pods are deployed on a machine that belongs to your cluster, but you don’t need to worry about it or manage anything related to it.

So, as we have seen, the serverless approach is coming to all areas, but that is not the scope of this article. The idea here is to focus on serverless as a synonym for Function as a Service (FaaS) and on how we can bring the FaaS experience to our production K8S ecosystem. But let’s start with the initial questions:

Why would we like to do that?

This is the most exciting thing to ask: what are the benefits this approach provides? Function as a Service follows the scale-to-zero approach. That means that a function is not loaded if it is not being executed, and this is important, especially when you are responsible for your infrastructure or at least paying for it.

Imagine a normal microservice written in any technology. The amount of resources it uses depends on its load, but even without any load, it needs some resources to keep running; mainly, we are talking about memory that stays in use. The actual amount will depend on the technology and the development itself, but it can range from a few MB to a few hundred MB. If we consider all the microservices a large enterprise can have, that adds up to several GB that you are paying for without getting any value from them.

But beyond infrastructure management, this approach also plays very well with another of the latest architectural approaches, Event-Driven Architecture (EDA), because we can have services that are asleep, just waiting for the right event to wake them up and start processing.

So, in a nutshell, the serverless approach helps you achieve your dream of optimized infrastructure and enables different patterns in an efficient way. But what happens if I already own the infrastructure? The benefit is the same, because you will be able to run more services on the same infrastructure, so you still get optimized use of what you have.

What do we need to enable that?

The first thing that we need to know is that not all technologies or frameworks are suitable to run on this approach. That is because you need to meet some requirements to be able to do that as a successful approach, as shown below:

  • Quick Startup: If your logic is not loaded before a request hits the service, you need to make sure it can load quickly to avoid impacting the consumer of the service. That means you need a technology that can load in a small amount of time, usually in the millisecond range.
  • Stateless: As your logic is not going to be loaded continuously, this approach is not suitable for stateful services.
  • Disposability: Similar to the previous point, it should be ready to shut down gracefully and robustly.

How do we do that?

Several frameworks allow us to get all those benefits that we can incorporate into our Kubernetes ecosystem, such as the following ones:

  • Knative: This is the framework that the CNCF supports, and it is included in many Kubernetes distributions such as Red Hat OpenShift.
  • OpenFaaS: This is a well-used framework created by Alex Ellis that supports the same idea.
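
To give an idea of what scale to zero looks like in practice, here is a minimal sketch of a Knative Service. The service name and the image are placeholders, and the sketch assumes Knative Serving is already installed in the cluster; the annotations use the hyphenated form, while older Knative releases used minScale and maxScale instead.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                                   # hypothetical service name
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"  # allow the service to scale down to zero pods
        autoscaling.knative.dev/max-scale: "5"  # upper bound when load arrives
    spec:
      containers:
        - image: example.com/hello:1.0          # placeholder image
          ports:
            - containerPort: 8080
```

With no incoming requests, no pod is running for this service; the first request triggers a cold start, which is why the quick-startup requirement above matters so much.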

It is true that there are other alternatives such as Apache OpenWhisk, Kubeless, or Fission, but they are less used today, and most teams end up choosing between OpenFaaS and Knative. If you want to read more about the other alternatives, there is an article from the CNCF covering them that you can look at for yourself.

Multi-Container Pods in Kubernetes: When to Use Them (and When Not To)

A multi-container Pod should be the exception, not the default.

Let’s talk about the most dangerous option from a Pod design perspective, so you are ready to use it!

One of the usual conversations is about the composition and definition of components inside a Pod. This is normal for people moving from traditional deployment to a cloud-native environment, and the main question is: How many containers can I have inside a pod? 

I’m sure that most of you have heard or asked that question at some point in your cloud-native journey, or maybe you are even wondering about it right now, and there is no doubt about the answer: one single container.

Wait, wait!! Don’t leave the post yet! We know that is not technically true, but it is easier to understand initially; you can only have a pod doing one thing.

So, if that’s the case, why do multi-container pods exist? And most importantly, if this is the first time you have heard of the concept, what is a multi-container pod?

Let’s start with the definition: a multi-container pod has more than one container in its composition. And when we talk about multi-container pods, we are not talking about having some initContainers to manage dependencies; we are talking about having more than one container running simultaneously and at the same level, as you can see in the picture below:

Multi Container Pod Definition

Does Kubernetes support this model? Yes, for sure. You can define inside your containers section as many containers as you need. So, from a technical view, there is no limit to having as many containers as you need in the same pod. But the main question you should ask yourself is:

Is this what you want to do?

As a reminder, a pod is the smallest deployable unit in Kubernetes. You deploy and undeploy pods, stop and start pods, restart pods, scale pods. So anything that is inside the same pod is highly coupled. It’s like a bundle, and the containers also share resources, which makes the decision even more critical.

Imagine this situation: I’d like to buy a notebook, so I go to the shop and ask for one, but they don’t sell single notebooks. Instead, they have an incredible bundle: a notebook, a pen, and a stapler for just $2 more than the price of a single notebook.

So you think that this is an excellent price because you are getting a pen and a stapler for a small part of what they would cost if you bought them separately. So you think that’s a good idea. But then you remember that you also need other notebooks for other purposes. In the end, you need ten more notebooks, but when you buy them, you also have to accept ten pens and ten staplers that you don’t need. OK, they are cheaper, but in the end, you are still paying for something that you don’t need. So, it is not efficient. And the same applies to the Pod structure definition.

In the end, did you move from traditional monolith deployments to different containers inside a pod just to have the same challenges and issues? What is the point of doing that?

None.

If there is no reason to have two containers tightly together, why is this allowed in the K8S specification? Because this is useful for some specific use-cases and scenarios. Let’s talk about some of them.

  • Helper Containers: This is the most common one. You have different containers inside the pod, but one is the main one, the one that provides a business capability or a feature, and the others just help it in some way.
  • Sidecar Pattern Implementation: Another common reason for this composition is implementing the sidecar pattern, which works by deploying another container alongside the main one to provide a specific capability. You have seen it, for example, in service meshes, log aggregation architectures, or other components that follow that pattern.
  • Monitoring Exporters: Another common thing to do is to use one of these containers as an exporter for the monitoring metrics of the main component. This is usually seen in architectures such as Prometheus, where each piece has its own exporter to be scraped by the Prometheus server.

There are also interesting aspects of placing containers inside the same pod because, as mentioned, they also share resources such as:

  • Volumes: You can, for example, define a shared folder for all the different containers inside a pod, so one container can read information for the other to perform its task quickly and efficiently.
  • Inter-process Communication: You can communicate between containers using IPC to communicate more efficiently.
  • Network: The different containers inside a pod can also access ports from other containers just reaching localhost.
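
The shared resources listed above can be sketched with a small two-container pod. The pod name, container names, and the helper script are illustrative only: an nginx container serves whatever a busybox helper writes into a shared emptyDir volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-helper              # hypothetical name
spec:
  volumes:
    - name: shared-data
      emptyDir: {}                   # lives as long as the pod and is visible to both containers
  containers:
    - name: web                      # main container: serves the shared folder
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: content-helper           # helper container: refreshes the content every 5 seconds
      image: busybox
      command: ["sh", "-c", "while true; do date > /data/index.html; sleep 5; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
```

Because both containers share the pod network, the helper could also reach nginx at localhost:80 without any Service in between. Both containers are also scheduled, restarted, and scaled together, which is exactly the coupling discussed earlier.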

I hope this article has helped you understand why this capability of having many containers inside the same pod exists, which kinds of scenarios use this approach, and how to reason about whether a new use-case should use it or not.

Troubleshoot Network Connections in Kubernetes Workloads (Live Traffic Debugging)

Discover Mizu: Traffic Viewer for Kubernetes to ease this challenge and improve your daily work.

One of the most common things we have to do when testing and debugging our cloud-native workloads on Kubernetes is to check the network communication.

It could be to check the incoming traffic so we can inspect the requests we are receiving and see what we are replying, and similar kinds of use-cases. I am sure this sounds familiar to most of you.

I usually solve that using tcpdump in the container, similar to what I would do in a traditional environment, but this is not always easy. Depending on the environment and configuration, you may not be able to do so, because you need to include a new package in your container image, do a new deployment so that it is available, and so on.

So, to solve that and other similar problems, I discovered a tool named Mizu, which I wish I had found a few months ago because it would have helped me a lot. Mizu does precisely that. In its own words:

Mizu is a simple-yet-powerful API traffic viewer for Kubernetes, enabling you to view all API communication between microservices across multiple protocols to help you debug and troubleshoot regressions.

To install, it is pretty straightforward. You need to grab the binary and provide the correct permission on your computer. You have a different binary for each architecture, and in my case (Mac Intel-based), these are the commands that I executed:

curl -Lo mizu https://github.com/up9inc/mizu/releases/latest/download/mizu_darwin_amd64 && chmod 755 mizu && mv mizu /usr/local/bin

And that’s it, then you have a binary in your laptop that connects to your Kubernetes cluster using Kubernetes API, so you need to have configured the proper context.

In my case, I have deployed a simple nginx server using the command:

 kubectl run simple-app --image=nginx --port 80

Once the component had been deployed, I ran the command to launch Mizu from my laptop:

mizu tap

And after a few seconds, a web page opened in front of me, monitoring all the traffic happening in this pod.

I exposed the nginx port using the kubectl expose command:

 kubectl expose pod/simple-app

And after that, I deployed a temporary pod using the curl image to start sending some requests with the command shown below:

 kubectl run -it --rm --image=curlimages/curl curly -- sh

Then I started to send some requests to my nginx pod using curl:

 curl -vvv http://simple-app:80 

And after a few calls, I could see a lot of information in front of me. First of all, I could see the requests I was sending, with all their details.

But even more important, I could see a service map diagram graphically showing the dependencies and the calls made to the pod, with the response times and the protocols used.

This will certainly not replace a complete observability solution on top of a service mesh. Still, it is a useful tool to add to your toolchain when you need to debug a specific communication between components or similar kinds of scenarios. As mentioned, it is like a high-level tcpdump for pod communication.

Kubernetes Metadata Explained: Access Pod Names, Labels, and Annotations at Runtime

Discover how to extract all the information available to inject it into your pods

Kubernetes metadata is how your application accesses information about its own pod at runtime. When you move from traditional development to a cloud-native one, you usually still need some of the out-of-the-box information that was available in a conventional environment.

This is especially true for applications that used to be deployed on a platform that populated information such as the application name, version, domain, and so on. That can look tricky in a cloud-native approach. Or maybe not, but at least for some time you have probably wondered how the application running inside the pod can access the information you already know about your cloud-native workload.

Because when you define a cloud-native workload, you describe a lot of very relevant information. Let’s think about it for a moment. When you start your pod, you know your pod name because it is your hostname:

But when you define your workload, you have a deployment name; how can you get it from your pod? How do you get which namespace your pod has been deployed to? Or what about all the metadata we define as labels and annotations?

The good thing is that there is a way to get every piece of data we have mentioned, so don’t worry; all this information is available for you to use if you need it.

The standard way to access any information is through environment variables. This is the traditional way to provide initial data to our pod. We have already seen that we can use ConfigMaps to populate environment variables, but this is not the only way to provide data to our pods. There is much more, so let’s take a look at it.

Discovering the fieldRef option

When we discussed using a ConfigMap as environment variables, we had two ways to populate that information. To provide all the ConfigMap content, we used the envFrom option. To pick a single entry, we used valueFrom, providing the ConfigMap name and the key whose value we want.

Following the same approach, there is an even more helpful option called fieldRef. As the name suggests, fieldRef is a reference to a field, and we can use it inside the valueFrom directive. In a nutshell, we can provide a field reference as the value of an environment variable.

So let’s take a look at the data that we can get from this object:

  • metadata.name: Provides the pod name as the value of the environment variable
  • metadata.namespace: Provides the namespace the pod is running in
  • metadata.labels['LABELNAME']: Extracts the value of the given label
  • metadata.annotations['ANNOTATIONNAME']: Extracts the value of the given annotation

Here you can see a snippet that defines different environment variables using this metadata as the value, so you can read them inside the pod just like standard environment variables:

        env:
        - name: APP_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: DOMAIN_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['domain']
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name 

Going Even Beyond

But this is not everything that the fieldRef option can provide. There is much more, and if you would like to take a look, you can do it here:
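
As a rough sketch of what else is available, the Pod below exposes a few additional fields that fieldRef supports for environment variables (spec.nodeName, status.podIP, spec.serviceAccountName), plus the related resourceFieldRef option for container resources. The pod name, container name, image, and variable names are assumptions chosen for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: metadata-demo              # hypothetical pod name
spec:
  containers:
    - name: app                    # hypothetical container name
      image: busybox:1.36
      # Print the injected variables once, then keep the pod alive
      command: ["sh", "-c", "env | sort && sleep 3600"]
      resources:
        limits:
          memory: 128Mi
      env:
        - name: NODE_NAME          # node the pod was scheduled on
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: POD_IP             # IP address assigned to the pod
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: SERVICE_ACCOUNT    # service account the pod runs as
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: MEMORY_LIMIT_BYTES # container memory limit, in bytes
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.memory
```

Note that resourceFieldRef is a sibling of fieldRef rather than part of it: it reads the requests and limits of a container instead of the pod metadata.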

Discovering The Truth Behind Kubernetes Secrets

We have recently been talking about ConfigMap as one of the objects used to store configuration for Kubernetes-based workloads. But what happens with sensitive data?

This is an interesting question, and the initial answer from the Kubernetes platform was to provide the Secret object. The official Kubernetes website defines Secrets like this:

A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. Such information might otherwise be put in a Pod specification or in a container image. Using a Secret means that you don’t need to include confidential data in your application code

So, by default, Secrets are what you should use to store your sensitive data. From a technical perspective, they behave very similarly to ConfigMaps: you can link them to environment variables, mount them inside a pod, or use them for specific purposes such as managing credentials for different kinds of accounts, like Service Accounts. These are the different types of Secrets that you can create:

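  • Opaque: This defines a generic secret that you can use for any purpose (mainly configuration data or configuration files).
  • Service-Account-Token: This defines the credentials for service accounts. It is now a legacy mechanism: since Kubernetes 1.22, pods obtain short-lived tokens through the TokenRequest API instead.
  • Docker-Registry Credentials: This defines the credentials to connect to a container registry and pull images as part of your deployment process.
  • Basic or SSH Auth: This defines specific secrets to handle authentication, either with a username and password or with an SSH key.
  • TLS Secret: This stores a certificate and its associated private key, typically used for TLS.
  • Bootstrap Secrets: This stores the bootstrap tokens used during the node bootstrap process.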
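
To make the Opaque type and the two consumption styles mentioned above more concrete, here is a minimal sketch. The names db-credentials and secret-demo, the keys, the values, and the mount path are all assumptions for illustration. The stringData field lets you write plain values, which the API server stores base64-encoded under data:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials             # hypothetical secret name
type: Opaque
stringData:                        # plain text here, stored base64-encoded
  username: app_user
  password: change-me
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo                # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      env:
        # Option 1: expose a single key as an environment variable
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      volumeMounts:
        # Option 2: expose every key as a file under this folder
        - name: credentials
          mountPath: /etc/credentials
          readOnly: true
  volumes:
    - name: credentials
      secret:
        secretName: db-credentials
```

Keep in mind that base64 is an encoding, not encryption, so anyone who can read the object can recover the values.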

But is it safe to use Kubernetes Secrets to store sensitive data? The usual answer to any question on a tech-related topic is: it depends. There has been some controversy around this, to the point that the topic is also covered on the official Kubernetes page, which highlights the following aspects:

Kubernetes Secrets are, by default, stored unencrypted in the API server’s underlying data store (etcd). Anyone with API access can retrieve or modify a Secret, and so can anyone with access to etcd. Additionally, anyone who is authorized to create a Pod in a namespace can use that access to read any Secret in that namespace; this includes indirect access such as the ability to create a Deployment.

So, the main thing is that, by default, this is a very insecure approach. It looks more like a way to categorize data than a properly secure way to handle it. Kubernetes also provides some tips to make this alternative more secure:

  • Enable Encryption at Rest for Secrets.
  • Enable or configure RBAC rules that restrict reading data in Secrets (including indirect means).
  • Where appropriate, also use mechanisms such as RBAC to limit which principals are allowed to create new Secrets or replace existing ones.
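
As an illustration of the RBAC tips above, the following sketch grants one service account read access to a single, named Secret and nothing else. The namespace demo, the service account app, and the secret name db-credentials are hypothetical:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-credentials        # hypothetical role name
  namespace: demo                  # hypothetical namespace
rules:
  - apiGroups: [""]                # core API group, where Secrets live
    resources: ["secrets"]
    resourceNames: ["db-credentials"]
    verbs: ["get"]                 # no list, watch, create, or update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-db-credentials
  namespace: demo
subjects:
  - kind: ServiceAccount
    name: app                      # hypothetical service account
    namespace: demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-db-credentials
```

The role deliberately omits list and watch, because those verbs would return the content of every Secret in the namespace. As the quote above warns, anyone allowed to create Pods in the namespace can still mount the Secret, so this only covers direct API reads.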

But that may not be enough, and that has created room for third parties and cloud providers to offer their own solutions that cover these needs and, at the same time, offer additional features. Some of these options are shown below:

  • Cloud Key Management Systems: Pretty much all the big cloud providers offer some kind of secret management that goes beyond these features and mitigates those risks. On AWS, there is AWS Secrets Manager; on Azure, we have Azure Key Vault; and on Google Cloud, there is Google Secret Manager.
  • Sealed Secrets: A project that extends Secrets to provide more security, especially for the Configuration as Code approach, by offering a safe way to store those objects in the same kind of repositories where you keep any other Kubernetes resource file. In its own words, “The SealedSecret can be decrypted only by the controller running in the target cluster, and nobody else (not even the original author) can obtain the original Secret from the SealedSecret.”
  • Third-party Secrets Managers: Similar to the ones from the cloud providers, but allowing a more independent approach. There are several players here, such as HashiCorp Vault or CyberArk Secret Manager.
  • Finally, Spring Cloud Config can also provide security for sensitive configuration data such as passwords and, at the same time, cover the same need as ConfigMap from a unified perspective.

I hope this article has helped you understand the purpose of Secrets in Kubernetes and, at the same time, the risks regarding their security and how we can mitigate them or even rely on other solutions that provide a more secure way to handle this critical piece of information.

Kubernetes ConfigMaps Explained: Best Practices to Manage Configuration Properly

ConfigMap is one of the best-known and, at the same time, least-used objects in the Kubernetes ecosystem. It is one of the primary objects that has been there from the beginning, even though we tried so many other ways to implement a configuration management solution (such as Consul, Spring Cloud Config, and others).

In the words of its own documentation:

A ConfigMap is an API object used to store non-confidential data in key-value pairs.

https://kubernetes.io/docs/concepts/configuration/configmap/

Its motivation was to provide a native solution for configuration management in cloud-native deployments: a way to manage and deploy configuration that keeps the code separate from the configuration. I still remember the WAR files with the application.properties file inside them.

ConfigMap is a resource as simple as you can see in the snippet below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"

ConfigMaps are objects that belong to a namespace. They have a strong relationship with Deployments and Pods, and they make it possible to use namespaces as different logical environments where the same application is deployed. Each environment has its own configuration, so each will need its own ConfigMap to support that, even if they are all based on the same resource YAML file.

From a technical perspective, the content of the ConfigMap is stored in the etcd database, as happens with any information related to the Kubernetes environment. You should remember that etcd is not encrypted by default, so all the data can be retrieved by anyone who has access to it.

Purposes of ConfigMaps

Configuration Parameters

The first and foremost purpose of the ConfigMap is to provide configuration parameters to your workload. It is an industrialized way to supply the environment configuration to your application without hardcoding env variables in the workload definition.
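
As a minimal sketch of this usage, the Pod below consumes the game-demo ConfigMap shown earlier. The pod name, container name, image, and the PLAYER_INITIAL_LIVES variable name are assumptions for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: game-demo-pod              # hypothetical pod name
spec:
  containers:
    - name: game
      image: busybox:1.36
      # Print the injected variables once, then keep the pod alive
      command: ["sh", "-c", "env | sort && sleep 3600"]
      # Every key of the ConfigMap becomes an environment variable
      envFrom:
        - configMapRef:
            name: game-demo
      # Or pick a single key and choose the variable name yourself
      env:
        - name: PLAYER_INITIAL_LIVES
          valueFrom:
            configMapKeyRef:
              name: game-demo
              key: player_initial_lives
```

With envFrom the variable names are the ConfigMap keys as they are, while configMapKeyRef lets you rename them to match what the application expects.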

Providing Environment Dependent Files

Another significant usage is providing or replacing files inside your containers, such as a critical configuration file. A typical example is the logging configuration of an app that uses the logback library. In this case, you need to provide a logback.xml file so that it knows how to apply your logging configuration.

Other options are properties files that need to be located at a specific path, or even public-key certificates to handle SSL connections with safelisted servers only.
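
The logback case could look roughly like the sketch below. The ConfigMap name, the image, and the /app/config path are hypothetical, and the logback.xml content is only a small placeholder configuration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: logging-config             # hypothetical ConfigMap name
data:
  logback.xml: |
    <configuration>
      <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
          <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <root level="INFO">
        <appender-ref ref="STDOUT" />
      </root>
    </configuration>
---
apiVersion: v1
kind: Pod
metadata:
  name: java-app                   # hypothetical pod name
spec:
  containers:
    - name: app
      image: example.com/my-java-app:1.0   # hypothetical image
      volumeMounts:
        # subPath mounts just this one file and leaves the rest of
        # /app/config as it is in the image
        - name: logging
          mountPath: /app/config/logback.xml
          subPath: logback.xml
  volumes:
    - name: logging
      configMap:
        name: logging-config
```

One trade-off of subPath is that the mounted file is not refreshed when the ConfigMap changes, so the pod has to be recreated to pick up a new version.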

Read-Only Folders

Another option is to use the ConfigMap as a read-only folder, which provides an immutable way to link information to the container. One use case is the Grafana dashboards that you add to your Grafana pod (if you are not using the Grafana operator).
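
A sketch of the Grafana case, assuming a ConfigMap named grafana-dashboards that holds one JSON file per dashboard. The ConfigMap name and the mount path are assumptions, and Grafana additionally needs a dashboard provider configuration pointing at that folder, which is not shown here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grafana                    # hypothetical pod name
spec:
  containers:
    - name: grafana
      image: grafana/grafana
      volumeMounts:
        # Each key of the ConfigMap appears as a file in this folder
        - name: dashboards
          mountPath: /var/lib/grafana/dashboards   # assumed path
          readOnly: true
  volumes:
    - name: dashboards
      configMap:
        name: grafana-dashboards   # hypothetical ConfigMap name
```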

Different ways to create a ConfigMap

You have several ways to create a ConfigMap using imperative commands that simplify its creation. Here are the ones that I use the most:

Create a ConfigMap to host key-value pairs for configuration purposes:

kubectl create configmap name --from-literal=key=value

Create a ConfigMap from a Java-like properties file, so that each line becomes a key-value pair in the ConfigMap:

kubectl create configmap name --from-env-file=filepath

Create a ConfigMap in which the content of a file becomes a single entry, with the file name as the key:

kubectl create configmap name --from-file=filepath

ConfigMap Refresh Lifecycle

ConfigMaps are updated in the same way as any other Kubernetes object, and you can even use the same commands, such as kubectl apply, to do that. But you need to be careful, because updating the ConfigMap itself is one thing, and updating the resource that uses this ConfigMap is another.

In the use cases described here, what the pod sees depends on the Pod’s lifecycle. Values injected as environment variables, and files mounted with subPath, are read only when the container starts. So to update that data inside the pod, you will need to restart it or bring up a new instance after you have modified the ConfigMap object itself. ConfigMaps mounted as a regular volume are eventually refreshed by the kubelet, but the application still has to re-read the files to notice the change.
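
One common way to bring up new pod instances after a ConfigMap change is to modify something in the pod template of the Deployment, since any change under spec.template triggers a rolling update. The sketch below uses a hand-maintained annotation for that. The annotation key config-version, the Deployment name, and the image are assumptions, and the referenced ConfigMap is the game-demo example from above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: game-demo                  # hypothetical deployment name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: game-demo
  template:
    metadata:
      labels:
        app: game-demo
      annotations:
        # Bump this value whenever the ConfigMap changes; the edit to
        # the pod template makes the Deployment roll out new pods
        config-version: "2"        # hypothetical annotation key
    spec:
      containers:
        - name: game
          image: busybox:1.36
          command: ["sh", "-c", "env | sort && sleep 3600"]
          envFrom:
            - configMapRef:
                name: game-demo
```

The kubectl rollout restart command achieves the same effect without editing the manifest, because it also works by changing an annotation in the pod template.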

CKAD Exam Preparation: Practical Tips to Pass the Kubernetes Developer Certification

Learn From My Own Experience To Clear Your Kubernetes Certification Exam

I would also like to provide some practical advice based on my own experience, in case it can help anyone else going through the same process. I know there are a lot of similar articles, and most of them are worth reading because each one provides a different perspective and approach. So here is mine:

  • Fast but Safe: You will have around 2 hours to complete between 15 and 20 practical questions, which gives you between 6 and 8 minutes each on average. That’s enough time to do it, but you must also go fast. So, try to avoid reading the whole exam first or jumping across questions. It is better to start with the first one right away and, if you are blocked, move to the next one. At the same time, you must validate the output you are getting to ensure that you are not missing anything. Run a command to check that the objects have been created correctly and have the right attributes and configuration before moving to the next question. Time is precious. I had a lot of time at the end of the exam to review the questions, but it is also true that I spent 20 minutes because I wrote ngnix instead of nginx, and I was unable to see it!!
  • Imperative commands are the way to go: You must learn the YAML structure of the main objects: Deployment, Pod, CronJob, Job, etc. You will also need to master the imperative commands to generate the initial output quickly. Imperative commands such as kubectl run, kubectl create, and kubectl expose will not provide 100% of the answer, but maybe 80%, which is the base you can then adjust to reach the solution to your question quickly. I recommend taking a look at this resource:
  • kubectl explain, to avoid digging through the documentation or overthinking: I have a problem remembering the exact name of a field or its location in the YAML file. So I used kubectl explain a lot, especially with the --recursive flag. It provides the YAML structure, so if you don’t remember whether the key name is configMap or configMapRef, or claimName or persistentVolumeClaim, this will be an incredible help. If you also add a grep -A 10 -B 5 command to find your field and its context, you will master it. This doesn’t replace knowing the YAML structure, but it will help you be efficient when you don’t remember the exact name or location.
kubectl explain pod --recursive
  • Don’t forget about docker/podman and helm: With the changes to the certification in September 2021, the image-building process is also essential, so it is worth reserving time in your preparation to play with tools such as docker/podman or helm, so that you can handle any related question you may find.
  • Use the simulator: The Linux Foundation provides you with two sessions on the simulator. They give you an authentic exam experience, with similar kinds of questions and the same interface, so on exam day it does not feel like the first time and you are already familiar with the environment. I recommend using both sessions (both have the same questions), one in the middle of your training and the second one just one or two days before your exam.
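
To illustrate the tip about imperative commands, this is roughly the skeleton that a command such as kubectl run web --image=nginx --dry-run=client -o yaml generates (the exact fields can vary between kubectl versions), followed by the kind of manual edit a question may ask for. The pod name web and the APP_MODE variable are hypothetical:

```yaml
# Approximate output of:
#   kubectl run web --image=nginx --dry-run=client -o yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: web
  name: web
spec:
  containers:
  - image: nginx
    name: web
    resources: {}
    # Added by hand afterwards (hypothetical exam requirement)
    env:
    - name: APP_MODE
      value: "production"
  dnsPolicy: ClusterFirst
  restartPolicy: Always
```

Redirecting that output to a file, editing it, and applying it with kubectl apply -f is usually much faster than writing the manifest from scratch.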

So, these are my tips, and I hope you like them. If they were helpful to you, please let me know on social networks, by mail, or through any other contact channel you prefer! All the best in your preparation, and I’m sure you will reach your goals!
