In the previous post of this series on how to set up a hybrid EKS cluster, making use of both traditional EC2 machines and serverless options using Fargate, we created the EKS cluster with both deployment fashions available. If you haven't taken a look at it yet, do it now!
At this point, we have an empty cluster with everything ready to deploy new workloads, but we still need to configure a few things before doing the deployment. The first thing is to decide which workloads are going to be deployed using the serverless option and which ones will use the traditional EC2 option.
By default, all the workloads deployed on the default and kube-system namespaces use Fargate, as you can see in the picture below from the AWS Console:
So that means that all workloads from the default namespace and the kube-system namespace will be deployed in a serverless fashion. If that's what you'd like, perfect. But sometimes you'd prefer to start with a delimited set of namespaces that use the serverless option and rely on the traditional deployment for the rest.
We can check that same information using eksctl by typing the following command:
eksctl get fargateprofile --cluster cluster-test-hybrid -o yaml
The output of that command should be similar to the information that we can see in the AWS Console:
NOTE: If you don’t remember the name of your cluster, you just need to type the command eksctl get clusters
So, this is what we’re going to do. The first thing we need is a new namespace named “serverless” that is going to hold our serverless deployments, and to create it we use the following kubectl command:
kubectl create namespace serverless
And now, we just need to create a new Fargate profile that is going to replace the one that we have at the moment, again making use of eksctl to handle that job:
NOTE: We can limit the scope of our serverless deployments not only by namespace but also by labels, so we can have, in the same namespace, workloads that are deployed using the traditional deployment and others using the serverless fashion. That gives us all the possibilities to design the cluster as we wish. To do that, we append the --labels argument in a key=value fashion.
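A command along these lines should do the job; the profile name fp-serverless-profile is an assumption here, chosen only so it matches the output shown below:

eksctl create fargateprofile --cluster cluster-test-hybrid --name fp-serverless-profile --namespace serverless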
And we will get an output similar to this:
[ℹ] creating Fargate profile "fp-serverless-profile" on EKS cluster "cluster-test-hybrid"
[ℹ] created Fargate profile "fp-serverless-profile" on EKS cluster "cluster-test-hybrid"
If we now check the profiles available, we should get two profiles handling three namespaces: the default and kube-system namespaces managed by the default profile, and the serverless namespace handled by the one we just created.
We will just use the following command to delete the default profile:
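Assuming the default profile is still named fp-default, as in the output below, the command would be:

eksctl delete fargateprofile --cluster cluster-test-hybrid --name fp-default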
And the output of that command should be similar to this one:
[ℹ] deleted Fargate profile "fp-default" on EKS cluster "cluster-test-hybrid"
And after that, our cluster is ready with a limited scope for serverless deployments. In the next post of the series, we will deploy workloads in both fashions to see the difference between them. So, don’t miss the updates in this series: make sure you follow my posts, and if you liked the article, or you have some doubts or comments, please leave your feedback using the comments below!
EKS Fargate AWS Kubernetes Cluster: Learn how to create a Kubernetes cluster that can also use all the power of serverless computing with AWS Fargate
We know that there are several movements and paradigms pushing us hard to change our architectures, trying to leverage many more managed services and offload the operational work so we can focus on what’s really important for our own business: creating applications and delivering value through them.
AWS has been a critical partner during that journey, especially in the container world. With the release of EKS some time ago, they were able to provide a managed Kubernetes service that everyone can use, and introducing the CaaS solution Fargate also gives us the power to run container workloads in a serverless fashion, without needing to worry about anything else.
But you might be wondering whether those services can work together. The short answer is yes. And even more important than that, they can also work in a mixed mode:
So you can have an EKS cluster where some nodes are Fargate services and some nodes are normal EC2 machines, for workloads that work in a stateful fashion or fit better in a traditional EC2 approach. And everything works by the same rules and is managed by the same EKS cluster.
So, that sounds amazing, but how can we do that? Let’s see.
eksctl
To get to that point there is a tool that we need to introduce first, and that tool is named eksctl. It is a command-line utility that helps us interact with the EKS service, simplifies the work a lot, and also lets us automate most of the tasks without human intervention. So, the first thing we need to do is to get eksctl ready on our platforms. Let’s see how we can get that.
We have here all the details from Amazon itself about how to install eksctl on different platforms, no matter if you’re using Windows, Linux, or macOS:
Installing or updating eksctl – Amazon EKS
Learn how to install or update the eksctl command line tool. This tool is used to create and work with an Amazon EKS cluster.
After doing that, we can check that we have eksctl installed by running the command:
eksctl version
And we should get an output similar to this one:
eksctl version output command
After doing that, we have access to all the power behind the EKS service just by typing these simple commands into our console window.
Creating the EKS Hybrid Cluster
Now, we’re going to create a mixed environment with some EC2 machines and enable the Fargate support for EKS. To do that, we will start with the following command:
eksctl create cluster --version=1.15 --name=cluster-test-hybrid --region=eu-west-1 --max-pods-per-node=1000 --fargate
[ℹ] eksctl version 0.26.0
[ℹ] using region eu-west-1
[ℹ] setting availability zones to [eu-west-1c eu-west-1a eu-west-1b]
[ℹ] subnets for eu-west-1c - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for eu-west-1a - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for eu-west-1b - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] using Kubernetes version 1.15
[ℹ] creating EKS cluster "cluster-test-hybrid" in "eu-west-1" region with Fargate profile
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=eu-west-1 --cluster=cluster-test-hybrid'
[ℹ] CloudWatch logging will not be enabled for cluster "cluster-test-hybrid" in "eu-west-1"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=eu-west-1 --cluster=cluster-test-hybrid'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "cluster-test-hybrid" in "eu-west-1"
[ℹ] 2 sequential tasks: { create cluster control plane "cluster-test-hybrid", create fargate profiles }
[ℹ] building cluster stack "eksctl-cluster-test-hybrid-cluster"
[ℹ] deploying stack "eksctl-cluster-test-hybrid-cluster"
[ℹ] creating Fargate profile "fp-default" on EKS cluster "cluster-test-hybrid"
[ℹ] created Fargate profile "fp-default" on EKS cluster "cluster-test-hybrid"
[ℹ] "coredns" is now schedulable onto Fargate
[ℹ] "coredns" is now scheduled onto Fargate
[ℹ] "coredns" pods are now scheduled onto Fargate
[ℹ] waiting for the control plane availability...
[✔] saved kubeconfig as "C:\\Users\\avazquez/.kube/config"
[ℹ] no tasks
[✔] all EKS cluster resources for "cluster-test-hybrid" have been created
[ℹ] kubectl command should work with "C:\\Users\\avazquez/.kube/config", try 'kubectl get nodes'
[✔] EKS cluster "cluster-test-hybrid" in "eu-west-1" region is ready
This command will set up the EKS cluster with Fargate support enabled.
NOTE: The first thing that we should notice is that Fargate support for EKS is not yet available in all AWS regions. So, depending on the region that you’re using, you could get an error. At this moment it is only enabled in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo), based on the information from the AWS announcement: https://aws.amazon.com/about-aws/whats-new/2020/04/eks-adds-fargate-support-in-frankfurt-oregon-singapore-and-sydney-aws-regions/
So, now, we should add a node group to that cluster. A node group is a set of EC2 instances that are going to be managed as part of the cluster. To do that we will use the following command:
eksctl create nodegroup --cluster cluster-test-hybrid --managed
[ℹ] eksctl version 0.26.0
[ℹ] using region eu-west-1
[ℹ] will use version 1.15 for new nodegroup(s) based on control plane version
[ℹ] nodegroup "ng-1262d9c0" present in the given config, but missing in the cluster
[ℹ] 1 nodegroup (ng-1262d9c0) was included (based on the include/exclude rules)
[ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "cluster-test-hybrid"
[ℹ] 2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "ng-1262d9c0" } } }
[ℹ] checking cluster stack for missing resources
[ℹ] cluster stack has all required resources
[ℹ] building managed nodegroup stack "eksctl-cluster-test-hybrid-nodegroup-ng-1262d9c0"
[ℹ] deploying stack "eksctl-cluster-test-hybrid-nodegroup-ng-1262d9c0"
[ℹ] no tasks
[✔] created 0 nodegroup(s) in cluster "cluster-test-hybrid"
[ℹ] nodegroup "ng-1262d9c0" has 2 node(s)
[ℹ] node "ip-192-168-69-215.eu-west-1.compute.internal" is ready
[ℹ] node "ip-192-168-9-111.eu-west-1.compute.internal" is ready
[ℹ] waiting for at least 2 node(s) to become ready in "ng-1262d9c0"
[ℹ] nodegroup "ng-1262d9c0" has 2 node(s)
[ℹ] node "ip-192-168-69-215.eu-west-1.compute.internal" is ready
[ℹ] node "ip-192-168-9-111.eu-west-1.compute.internal" is ready
[✔] created 1 managed nodegroup(s) in cluster "cluster-test-hybrid"
[ℹ] checking security group configuration for all nodegroups
[ℹ] all nodegroups have up-to-date configuration
So now we should be able to use kubectl to manage this new cluster. If you don’t have kubectl installed or you haven’t heard about it, this is the command-line tool that allows us to manage a Kubernetes cluster, and you can install it based on the documentation shown here:
Installing or updating kubectl – Amazon EKS
Learn how to install or update the kubectl command line tool, used to communicate with your Amazon EKS cluster.
So, now, we should start taking a look at the infrastructure that we have. We type the following command to see the nodes at our disposal:
kubectl get nodes
We see an output similar to this:
NAME STATUS ROLES AGE VERSION
fargate-ip-192-168-102-22.eu-west-1.compute.internal Ready <none> 10m v1.15.10-eks-094994
fargate-ip-192-168-112-125.eu-west-1.compute.internal Ready <none> 10m v1.15.10-eks-094994
ip-192-168-69-215.eu-west-1.compute.internal Ready <none> 85s v1.15.11-eks-bf8eea
ip-192-168-9-111.eu-west-1.compute.internal Ready <none> 87s v1.15.11-eks-bf8eea
As you can see we have 4 “nodes”: two that start with the fargate- prefix, which are Fargate nodes, and two that just start with ip-, which are the traditional EC2 instances. And from that moment on, that’s it, we have our mixed environment ready to use.
We can check the same cluster using the AWS EKS console to see that configuration in more detail. If we open the EKS page for this cluster, we see the following information in the Compute tab:
Under Node Groups we see the data for the EC2 machines that are managed as part of this cluster; as you can see, the Desired Capacity is set to 2 and that’s why we have 2 EC2 instances in our cluster. And regarding the Fargate profile, we see the namespaces set to default and kube-system, which means that all the deployments to those namespaces are going to run as Fargate tasks.
Summary
In the following articles in this series, we will see how to progress on our hybrid cluster: deploying workloads, scaling it based on the demand that we’re getting, enabling integration with other services like AWS CloudWatch, and so on. So, stay tuned, and don’t forget to follow my articles so you don’t miss any new updates as soon as they’re available!
Managed container platforms are disrupting everything. We’re living in a time where development and the IT landscape are changing; new paradigms like microservices and containers have been out there for the last few years, and if we trust the picture that blog posts and articles paint today, we’re all already using them all the time.
Did you see any blog posts about how to develop a J2EE application running on your Tomcat server on-prem? Probably not. The closest article would probably be how to containerize your Tomcat-based application.
But do you know what? Most companies are still working that way. So even if all companies have a new digital approach in some departments, they also have other, more traditional ones.
So it seems that we need to find a different way to translate the main advantages of a container-based platform into a kind of speech those teams can relate to, so they realize the tangible benefits they can get from it and have that “Hey, this can work for me!” kind of spirit.
1. You will get all components isolated and updated more quickly
That’s one of the great things about container-based platforms compared with previous approaches like application server-based platforms. When you have an application server cluster, you still have one cluster with several applications. So you usually do some isolation, keep related applications, provide independent infrastructure for the critical ones, and so on.
But even with that, at some level, the applications remain coupled, so issues in one application could bring down another one, something you would not expect for business reasons.
With a container-based platform, each application runs in its own bubble, so any issue or error will affect that application and nothing more. Platform stability is a priority for all companies and all departments inside them. Just ask yourself: Do you want to end those “domino chains” of failure? How much will your operations improve? How much will your happiness increase?
Additionally, based on the container approach, you will get smaller components. Each of them will do a single task providing a single capability to your business, which means that it will be much easier to update, test, and deploy to production. That, in the end, will lead to more deployments into the production environment and reduce the time to market for your business capabilities.
You will be able to deploy faster and have more stable operations simultaneously.
2. You will optimize the use of your infrastructure
Costs, everything is about costs. There is hardly a single conversation with customers who are not trying to pay less for their infrastructure. So, let’s face it: we should be able to run operations in an optimized way, and if our infrastructure cost goes up, that should mean that our business is growing.
Container-based platforms allow optimizing infrastructure in two different ways, by using two main concepts: Elasticity and Infrastructure Sharing.
Elasticity means that I only have the infrastructure I need to support the load I have at this moment. So, if the load increases, my infrastructure grows to handle it, and after that peak goes away, it shrinks back to what it needs now.
Infrastructure sharing is about using the free part of each server to deploy other applications. Imagine a traditional approach where I have two servers for my set of applications. I probably don’t use 100% of those servers because I need some spare compute to be able to react when the load increases; I probably sit at 60–70%. That means 30% is free. If we have different departments with different systems, and each has its own infrastructure 30% free, how much of our infrastructure are we just throwing away? How many dollars/euros/pounds are you just throwing out the window?
Container-based platforms don’t need specific tools or software installed on the platform to run different kinds of applications, because everything resides inside the container, so I can use any free space to deploy other applications and make more efficient use of the infrastructure.
3. You will not need infrastructure for administration
Every system that is big enough has some resources dedicated to managing it. Most of the recommended architectures even suggest placing those components isolated from your runtime components, so that any administration or maintenance issue cannot affect your runtime workloads. That means specific infrastructure used for something that isn’t directly helping your business. Of course, you can explain to any business user that you need a machine to provide the required capabilities, but it is harder to justify using additional infrastructure (and generating cost) for components that are not helping the business.
Managed container platforms take that problem away, because you only provide the infrastructure you need to run your workloads, and the administration capabilities are given to you for free or for a very low fee. In addition, you don’t even need to worry about the administration features always being available and working fine, because that is delegated to the provider itself.
Wrap up and next steps
As you can see, we described very tangible benefits that are not industry-specific or development-focused. Of course, we could add many more to this list, but these are the critical ones that affect any company in any industry worldwide. So, please, take your time to think about how these capabilities can help to improve your business. But not only that, take your time to quantify how much they will enhance your business. How much can you save? How much can you get from this approach?
And when you have a solid business case based on this approach in front of you, you will get all the support and courage you need to move forward along that route! I wish you a peaceful transition!
Find a way to redefine and reorganize the names of your Prometheus metrics to meet your requirements
Prometheus has become the new standard when we’re talking about monitoring our modern application architectures, and we need to make sure we know all about its options so we can get the best out of it. I had been using it for some time until I came across a feature that I was desperate to know how to use, but I couldn’t find it clearly explained anywhere. So, as I didn’t find it easily, I thought about writing a small article to show you how to do it without needing to spend the same time I did.
We have plenty of information about how to configure Prometheus and use some of the usual configuration options, as we can see on its official webpage [1]. I have already written about configuring it and using it for several purposes, as you can see in other posts [2][3][4].
One of these configuration options is relabeling, and this is a great thing. Each exporter can have its own labels and its own meaning for them, and when you try to manage different technologies or components, it becomes complex to make all of them match, even if all of them follow the Prometheus naming convention [5].
But I had this situation, and I’m sure you have gone or will go through it as well: I have similar metrics for different technologies that, for me, are the same and should keep the same name, but as they come from different technologies, they don’t. So I needed to find a way to rename the metric, and the great thing is that you can do that.
To do that, you just need a metric_relabel_configs entry. This configuration relabels (as the name already indicates) the labels of your Prometheus metrics before they are ingested, but it also allows us to use some special labels to do different things, and one of these special labels is __name__. __name__ is a particular label that lets you rename your Prometheus metrics before they are ingested into the Prometheus time-series database. From that point on, it will be as if the metric had had that name since the beginning.
Using it is relatively easy, it works like any other relabel rule, and I’d like to show you a sample of how to do it.
- source_labels: [__name__]
  regex: 'jvm_threads_current'
  target_label: __name__
  replacement: 'process_thread_count'
This simple sample shows how we can rename the metric jvm_threads_current, which counts the threads inside the JVM, to something more generic that covers the threads of any process: a process_thread_count metric that we can now use as if it had been the original name.
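For context, here is a minimal sketch of where that rule sits inside prometheus.yml; the job name jvm-apps and the target address are assumptions for illustration only:

scrape_configs:
  - job_name: 'jvm-apps'                  # assumed job name
    static_configs:
      - targets: ['localhost:9095']       # assumed target exposing JVM metrics
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'jvm_threads_current'
        target_label: __name__
        replacement: 'process_thread_count'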
We all know that with the rise of cloud-native development and architectures, Kubernetes-based platforms have become the new standard, all focusing on new developments following the new paradigms and best practices: microservices, event-driven architectures, shiny new protocols like GraphQL or gRPC, and so on and so forth.
This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.
In previous posts, I’ve explained how to integrate TIBCO BusinessWorks 6.x / BusinessWorks Container Edition (BWCE) applications with Prometheus, one of the most popular monitoring systems for cloud layers. Prometheus is one of the most widely used solutions to monitor your microservices inside a Kubernetes cluster. In this post, I will explain steps to leverage Prometheus for integrating with applications running on TIBCO Cloud Integration (TCI).
TCI is TIBCO’s iPaaS and primarily hides the application management complexity of an app from users. You need your packaged application (a.k.a. EAR) and manifest.json — both generated by the product — to simply deploy the application.
Isn’t it magical? Yes, it is! As explained in my previous post related to Prometheus integration with BWCE, which allows you to customize your base images, TCI allows integration with Prometheus in a slightly different manner. Let’s walk through the steps.
TCI has its own embedded monitoring tools (shown below) to provide insights into Memory and CPU utilization, plus network throughput, which is very useful.
While the monitoring metrics provided out-of-the-box by TCI are sufficient for most scenarios, there are hybrid connectivity use-cases (application running on-prem and microservices running on your own cluster that could be on a private or public cloud) that might require a unified single-pane view of monitoring.
Import the Prometheus plugin by choosing the Import → Plug-ins and Fragments option and specifying the directory downloaded from the above-mentioned GitHub location (shown below).
Step two involves adding the Prometheus module previously imported to the specific application as shown below:
Step three is just to build the EAR file along with manifest.json.
NOTE: If the EAR doesn’t get generated once you add the Prometheus plugin, please follow the below steps:
Export the project with the Prometheus module to a zip file.
Remove the Prometheus project from the workspace.
Import the project from the zip file generated before.
Before you deploy the BW application on TCI, we need to enable an additional port on TCI to scrape the Prometheus metrics.
Step four is updating the manifest.json file.
By default, a TCI app using the manifest.json file only exposes one port to be consumed from outside (related to functional services) and the other to be used internally for health checks.
For Prometheus integration with TCI, we need an additional port listening on 9095, so Prometheus server can access the metrics endpoints to scrape the required metrics for our TCI application.
We need to slightly modify the generated manifest.json file (of the BW app) to expose an additional port, 9095 (shown below).
Also, to tell TCI that we want to enable the Prometheus endpoint, we need to set a property in the manifest.json file. The property is TCI_BW_CONFIG_OVERRIDES and it should have the following value: BW_PROMETHEUS_ENABLE=true, as shown below:
We also need to add an additional line (propertyPrefix) in the manifest.json file as shown below.
Now, we are ready to deploy the BW app on TCI, and once it is deployed we can see there are two endpoints.
If we expand the Endpoints options on the right (shown above), you can see that one of them is named “prometheus” and that’s our Prometheus metrics endpoint:
Just copy the prometheus URL and append it with /metrics (URL in the below snapshot) — this will display the Prometheus metrics for the specific BW app deployed on TCI.
Note: appending /metrics is not compulsory; the as-is URL for the Prometheus endpoint will also work.
In the list you will find the following kinds of metrics, which you can use to create dashboards and analysis based on that information:
JVM metrics around memory used, GC performance and thread pools counts
CPU usage by the application
Process and Activity execution counts by Status (Started, Completed, Failed, Scheduled..)
Duration by Activity and Process.
With all this information available you can create dashboards similar to the one shown below, in this case using Spotfire as the dashboarding tool:
But you can also integrate those metrics with Grafana or any other tool that could read data from Prometheus time-series database.
Prometheus Monitoring for Microservices using TIBCO
In that post, we described that there are several ways to tell Prometheus about the services that are ready to monitor. And we chose the simplest one at that moment, the static_configs configuration, which means:
Don’t worry Prometheus, I’ll let you know the IP you need to monitor and you don’t need to worry about anything else.
And this is useful for a quick test in a local environment, when you want to quickly test your Prometheus setup or you want to work on the Grafana part to design the best possible dashboard for your needs.
But this is not very useful for a real production environment, even more so when we’re talking about a Kubernetes cluster where services are going up and down continuously over time. To solve this, Prometheus allows us to define different ways to perform this “service discovery”. In the official Prometheus documentation we can read a lot about the different service discovery techniques, but at a high level these are the main ones available:
Configuration | Prometheus
An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
azure_sd_configs: Azure Service Discovery
consul_sd_configs: Consul Service Discovery
dns_sd_configs: DNS Service Discovery
ec2_sd_configs: EC2 Service Discovery
openstack_sd_configs: OpenStack Service Discovery
file_sd_configs: File Service Discovery
gce_sd_configs: GCE Service Discovery
kubernetes_sd_configs: Kubernetes Service Discovery
marathon_sd_configs: Marathon Service Discovery
nerve_sd_configs: AirBnB’s Nerve Service Discovery
serverset_sd_configs: Zookeeper Serverset Service Discovery
triton_sd_configs: Triton Service Discovery
static_configs: Static IP/DNS for the configuration. No Service Discovery.
And even if all these options are not enough for you and you need something more specific, there is an API available to extend the Prometheus capabilities and create your own service discovery technique. You can find more info about it here:
Implementing Custom Service Discovery | Prometheus
An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
But this is not our case; for us, the Kubernetes service discovery is the right choice. So, we’re going to change the static configuration we had in the previous post:
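Below is a minimal sketch of such a kubernetes_sd_configs-based scrape job; the relabel rules shown are assumptions for illustration that match the description in the following paragraphs, and the job name bwce-metrics matches the one referenced at the end of the post:

scrape_configs:
  - job_name: 'bwce-metrics'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      # assumption: keep only endpoints whose port is named 'prometheus'
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        action: keep
        regex: prometheus
      # assumption: use the pod name as the instance label
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: instance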
As you can see, this is quite a bit more complex than the previous configuration, but it is not as complex as you may think at first glance. Let’s review it part by part.
- role: endpoints
  namespaces:
    names:
      - default
This says that we’re going to use the endpoints role for endpoints created under the default namespace, and then we specify the changes we need to make to find the metrics endpoints for Prometheus.
The relabel rules use the replace action on label values, and with it we can do several things, as illustrated in the sketch after this list:
Rename the label name using the target_label to set the name of the final label that we’re going to create based on the source_labels.
Replace the value using the regex parameter to define the regular expression for the original value and the replacement parameter that is going to express the changes that we want to do to this value.
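A hedged example combining both techniques; the rule below is an assumption, not taken from the original post, and rewrites the scrape address so that Prometheus scrapes the metrics port 9095 instead of the discovered port:

# assumption: point the scrape target at port 9095
- source_labels: [__address__]
  action: replace
  regex: '([^:]+)(?::\d+)?'
  replacement: '${1}:9095'
  target_label: __address__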
So, now, after applying this configuration, when we deploy a new application in our Kubernetes cluster, like the project that we can see here:
Automatically, we’re going to see an additional target in our “bwce-metrics” job configuration.
Prometheus is becoming the new standard for Kubernetes monitoring and today we are going to cover how we can do Prometheus TIBCO monitoring in Kubernetes.
This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.
We’re living in a world with constant changes and this is even more true in the Enterprise Application world. I’ll not spend much time talking about things you already know, but just say that the microservices architecture approach and the PaaS solutions have been a game-changer for all enterprise integration technologies.
This time I’d like to talk about monitoring and the integration capabilities we have for using Prometheus to monitor our microservices developed with TIBCO technology. I won’t spend too much time either talking about what Prometheus is, as you probably already know, but in summary, it is an open-source distributed monitoring platform that was the second project hosted by the Cloud Native Computing Foundation (after Kubernetes itself) and has established itself as a de-facto industry standard for monitoring K8S clusters (alongside other options in the market like InfluxDB and so on).
Prometheus has a lot of great features, but one of them is that it has connectors for almost everything, and that’s very important today because it is so complicated (and unusual) to define a platform with a single product for the PaaS layer. So today, I want to show you how to monitor your TIBCO BusinessWorks Container Edition applications using Prometheus.
Most of the info I’m going to share is available in the bw-tooling GitHub repo, so you can go there if you need to validate any specific statement.
bw-tooling/prometheus-integration at master · TIBCOSoftware/bw-tooling
Collection of tools designed to simplify deployment and management of TIBCO BusinessWorks applications – bw-tooling/prometheus-integration at master · TIBCOSoftware/bw-tooling
Ok, are we ready? Let’s start!!
I’m going to assume that we already have a Kubernetes cluster in place and Prometheus installed as well. So, the first step is to enhance the BusinessWorks Container Edition base image to include the Prometheus capabilities integration. To do that we need to go to the GitHub repo page and follow these instructions:
Download & unzip the prometheus-integration.zip folder.
Open TIBCO BusinessWorks Studio and point it to a new workspace.
Right-click in Project Explorer → Import… → select Plug-ins and Fragments → select Import from the directory radio button
Browse it to prometheus-integration folder (unzipped in step 1)
Now click Next → Select Prometheus plugin → click Add button → click Finish. This will import the plugin in the studio.
Now, to create the JAR of this plugin, first we need to make sure to update com.tibco.bw.prometheus.monitor with ‘.’ (dot) in the Bundle-Classpath field of the META-INF/MANIFEST.MF file, as shown below.
Right-click on Plugin → Export → Export…
Select the type as JAR file and click Next
Now click Next → Next → select the radio button to use the existing MANIFEST.MF file and browse to the manifest file
Click Finish. This will generate prometheus-integration.jar
Now, with the JAR created, we need to include it in our own base image. To do that we place the JAR file in the <TIBCO_HOME>/bwce/2.4/docker/resources/addons/jar folder.
And we launch the image build command again from the <TIBCO_HOME>/bwce/2.4/docker folder to update the image, using the following command (use the version you’re using at the moment):
docker build -t bwce_base:2.4.4 .
So, now we have an image with Prometheus support! Great! We’re close to the finish; we just need to create an image for our container application. In my case, this is going to be a very simple echo service that you can see here.
And we only need to keep these two things in mind when we deploy to our Kubernetes cluster (see the sketch after this list):
We should set the environment variable BW_PROMETHEUS_ENABLE to “TRUE”
We should expose port 9095 from the container, to be used by Prometheus for the integration.
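A minimal Deployment sketch covering both points; the application name and image tag are assumptions for illustration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-service                  # assumed application name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo-service
  template:
    metadata:
      labels:
        app: echo-service
    spec:
      containers:
        - name: echo-service
          image: echo-service:1.0     # assumed image, built on the Prometheus-enabled base image
          env:
            - name: BW_PROMETHEUS_ENABLE
              value: "TRUE"
          ports:
            - containerPort: 9095     # Prometheus metrics port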
Now, we only need to provide this endpoint to the Prometheus scraping system. There are several ways to do that, but we’re going to focus on the simplest one.
We need to change the prometheus.yml to add the following job data:
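A minimal sketch of such a job, assuming a static target on the port we exposed above (the job name and address are assumptions):

scrape_configs:
  - job_name: 'bwce-metrics'                        # assumed job name
    static_configs:
      - targets: ['<pod-or-service-address>:9095']  # replace with the address of your BWCE app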
Probes are how we tell Kubernetes that everything inside the pod is working as expected. Kubernetes has no way to know what’s happening inside at a fine-grained level and no way to know, for each container, whether it is healthy or not; that’s why it needs help from the container itself.
At first you may think that you can do it with the ENTRYPOINT of your Dockerfile: you only specify one command to run inside each container, so checking that this process is running means that everything is healthy, right? Ok… fair enough…
But is this always true? Does a running process at the OS/container level mean that everything is working fine? Let’s think about an Oracle database for a minute. Imagine that you have an issue with the shared memory and it stays in an initializing status forever; Kubernetes is going to check the command, find that it is running, and tell the whole cluster: Ok! Don’t worry! The database is working perfectly, go ahead and send your queries to it!!
This could happen with similar components like a web server or even with an application itself, but it is very common when you have servers that can handle deployments on them, like BusinessWorks Container Edition itself. And that’s why this is very important for us as developers and even more important for us as administrators. So, let’s start!
The first thing we’re going to do is build a BusinessWorks Container Edition application. As this is not the main purpose of this article, we’re going to use the same one I created for the BusinessWorks Container Edition — Istio integration, which you can find here.
So, this is a quite simple application that exposes a SOAP web service. All applications in BusinessWorks Container Edition (as well as in BusinessWorks Enterprise Edition) have their own status, so you can ask them whether they’re Running or not; that’s something the BusinessWorks Container internal “engine” knows. (NOTE: We’re going to use the word engine to keep things simple when talking about the internals of BWCE. In detail, the component that knows the status of the application is the internal AppNode the container starts, but let’s keep it simple for now.)
Kubernetes Probes
Kubernetes has the concept of “probes” to perform health checks on your containers. This is done by configuring liveness probes or readiness probes.
Liveness probe: Kubernetes uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress.
Readiness probe: Kubernetes uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
Even though there are two types of probes, for BusinessWorks Container Edition both are handled the same way. The idea is the following: as long as the application is Running, you can start sending traffic to it, and when it is not running we need to restart the container, which makes things simpler for us.
Implementing Probes
Each BusinessWorks Container Edition application that is started has an out-of-the-box way to know if it is healthy or not. This is done through a special endpoint published by the engine itself:
http://localhost:7777/_ping/
So, if we have a normal BusinessWorks Container Edition application deployed on our Kubernetes cluster, as we had for the Istio integration, we get logs similar to these:
Starting traces of a BusinessWorks Container Edition Application
As you can see, the logs say that the application is started. Since we can’t launch a curl request from inside the container (we haven’t exposed port 7777 to the outside yet and curl is not installed in the base image), the first thing we’re going to do is expose it to the rest of the cluster.
To do that we change the Deployment.yml file that we have been using to this one:
Deployment.yml file with the 7777 port exposed
Now, we can go to any container in the cluster that has curl installed, or use any other way to launch a request like the one below, and get an HTTP 200 code and the message “Application is running”.
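A hedged example of that request, assuming the pod’s IP is reachable from inside the cluster (replace <pod-ip> with the real pod address):

curl -v http://<pod-ip>:7777/_ping/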
Successful execution of _ping endpoint
NOTE: If you forget the last / and try to invoke _ping instead of _ping/ you’re going to get an HTTP 302 Found code with the final location as you can see here:
HTTP 302 code execution when pointing to _ping instead of _ping/
Ok, let’s see what happens if we now stop the application. To do that we’re going to go inside the container and use the OSGi console.
To do that, once you’re inside the container, you execute the following command:
ssh -p 1122 equinox@localhost
It is going to ask for credentials; use the default password ‘equinox’. After that it is going to give you the chance to create a new user, and you can use whatever credentials work for you. In my example, I’m going to use admin / adminadmin. (NOTE: The minimum length for a password is eight (8) characters.)
If we execute frwk:la, it is going to show the applications deployed — in our case only one, as it should be in a BusinessWorks Container Edition application:
To stop it, we are going to execute the following command to list all the OSGi bundles we have running in the system at the moment:
frwk:lb
Now, we find the bundles that belong to our application (at least two bundles: one per BW module and another one for the application):
Showing bundles inside the BusinessWorks Container Application
And now we can stop them using felix:stop <ID>, so in my case, I need to execute the following commands:
stop “603”
stop “604”
Commands to stop the bundles that belong to the application
And now the application is stopped
OSGi console showing the application as Stopped
So, if we now try to launch the same curl command as we executed before, we get the following output:
Failed execution of ping endpoint when Application is stopped
As you can see, we get an HTTP 500 error, which means something is not fine. If we now start the application again using the start bundle command (the equivalent of the stop command we used before) for both bundles of the application, you are going to see that the application says it is running again:
And the curl command returns the HTTP 200 output it should, with the message “Application is running”.
So, now that we know how the _ping/ endpoint works, we only need to add it to our Kubernetes deployment.yml file. So we modified our deployment file again to be something like this:
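A minimal sketch of what that file could look like, with the application name and image tag as assumptions, and the probe settings matching the notes below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bwce-istio-app               # assumed application name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bwce-istio-app
  template:
    metadata:
      labels:
        app: bwce-istio-app
    spec:
      containers:
        - name: bwce-istio-app
          image: bwce-istio-app:1.0  # assumed image built from the BWCE base image
          ports:
            - containerPort: 7777    # only needed for the manual tests above
          livenessProbe:
            httpGet:
              path: /_ping/
              port: 7777
            initialDelaySeconds: 30  # give the AppNode time to start
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /_ping/
              port: 7777
            initialDelaySeconds: 30
            periodSeconds: 10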
NOTE: The presence of the initialDelaySeconds parameter is quite important to make sure the application has a chance to start before the probe starts executing. If you don’t set this value, you can end up with a restart loop in your container.
NOTE: The example shows port 7777 as an exposed port, but this is only needed for the steps we’ve done before and it will not be needed in a real production environment.
So now we deploy the YML file again, and once we get the application running we’re going to try the same approach. But now that we have the probes defined, as soon as I stop the application the container is going to be restarted. Let’s see!
As you can see in the picture above, after the application is stopped the container is restarted, and because of that we got kicked out of the container.
So, that’s all. I hope this helps you set up your probes, and in case you need more details, please take a look at the Kubernetes documentation about httpGet probes to see all the configuration options that you can apply to them.