Find the greatest way to manage your Kubernetes development cluster
I need to start this article by admitting that I am an advocate of Graphical User Interfaces and everything that provides a way to speed up the way we do things and be more productive.
So when we talk about how to manage our Kubernetes cluster mainly for development purposes, you can imagine that I am one of those people who tries any available tool to make that journey easier. The ones who’ve started using Portainer to manage their local Docker engine or are a fan of the new dashboard in Docker for Windows/Mac. But that is far from reality.
In terms of Kubernetes management, I got used to typing all the commands to check the pods, the logs, the status of the cluster to do the port-forwards, etc. Any task I did was with a terminal, and I felt that it was the right thing to do. I did not even use a Kubernetes dashboard to have a web page for my Kubernetes environment. All of that changed last week when I met with a colleague who showed me what Lens could do.
Lens is a totally different story. I am not praising it because I am being paid to do so. This is an open source project that you can find on GitHub. But the way that it does the job is just awesome!
Image of Len showing the status of a Kubernetes cluster — Screenshot by the author.
The first thing I would like to mention regarding Lens is that it has multi-context support, so you can have all the different Kubernetes contexts available to switch following a Slack approach when we switch from different workspaces. It just reads your .kube/config file and makes all those contexts available to you to connect to the one you would like.
Kubernetes context selection in Lens
Once we have connected to one of these clusters, we have different options to see the status of it, but the first one is to check the Workloads using the Overview option:
Workloads Overview in Lens
Then, you can drill down to any pod or different object inside Kubernetes to check its status and at the same time do the main actions you usually do when you deal with a pod, such as check the logs, execute a terminal to one of the containers that belong to that pod, or even edit the YAML for that pod.
Pod options inside Lens
But Lens goes beyond the usual Kubernetes tasks because it also has a Helm integration, so you can check the releases that you have there, the version of the status, and so on:
Helm integration option in Lens
The experience of managing everything feels perfect. You are more productive as well. Even those who love the CLI and terminals need to admit that to do regular tasks, the Graphical approach and the mouse are faster than the keyboard — even for the defenders of the mechanical keyboard like myself.
So, I encourage you to download Lens and start using it right now. To do so, go to their main web page and download it:
Learn some tricks to analyze and optimize the usage that you are doing of the TSDB and save money on your cloud deployment.
In previous posts, we discussed how the storage layer worked for Prometheus and how effective it was. But in the current times, we are of cloud computing we know that each technical optimization is also a cost optimization as well and that is why we need to be very diligent about any option that we use regarding optimization.
We know that usually when we monitor using Prometheus we have so many exporters available at our disposal and also that each of them exposes a lot of very relevant metrics that we need to track everything we need to. But also, we should be aware that there are also metrics that we don’t need at this moment or we don’t plan to use it. So, if we are not planning to use, why do we want to waste disk space storing them?
So, let’s start taking a look at one of the exporters we have in our system. In my case, I would like to use a BusinessWorks Container Application that exposes metrics about its utilization. If you check their metrics endpoint you could see something like this:
# HELP jvm_info JVM version info # TYPE jvm_info gauge jvm_info{version="1.8.0_221-b27",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0 # HELP jvm_memory_bytes_used Used bytes of a given JVM memory area. # TYPE jvm_memory_bytes_used gauge jvm_memory_bytes_used{area="heap",} 1.0318492E8 jvm_memory_bytes_used{area="nonheap",} 1.52094712E8 # HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area. # TYPE jvm_memory_bytes_committed gauge jvm_memory_bytes_committed{area="heap",} 1.35266304E8 jvm_memory_bytes_committed{area="nonheap",} 1.71302912E8 # HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area. # TYPE jvm_memory_bytes_max gauge jvm_memory_bytes_max{area="heap",} 1.073741824E9 jvm_memory_bytes_max{area="nonheap",} -1.0 # HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area. # TYPE jvm_memory_bytes_init gauge jvm_memory_bytes_init{area="heap",} 1.34217728E8 jvm_memory_bytes_init{area="nonheap",} 2555904.0 # HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_used gauge jvm_memory_pool_bytes_used{pool="Code Cache",} 3.3337536E7 jvm_memory_pool_bytes_used{pool="Metaspace",} 1.04914136E8 jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 1.384304E7 jvm_memory_pool_bytes_used{pool="G1 Eden Space",} 3.3554432E7 jvm_memory_pool_bytes_used{pool="G1 Survivor Space",} 1048576.0 jvm_memory_pool_bytes_used{pool="G1 Old Gen",} 6.8581912E7 # HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_committed gauge jvm_memory_pool_bytes_committed{pool="Code Cache",} 3.3619968E7 jvm_memory_pool_bytes_committed{pool="Metaspace",} 1.19697408E8 jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.7985536E7 jvm_memory_pool_bytes_committed{pool="G1 Eden Space",} 4.6137344E7 jvm_memory_pool_bytes_committed{pool="G1 Survivor Space",} 1048576.0 jvm_memory_pool_bytes_committed{pool="G1 Old Gen",} 8.8080384E7 # HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_max gauge jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8 jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0 jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9 jvm_memory_pool_bytes_max{pool="G1 Eden Space",} -1.0 jvm_memory_pool_bytes_max{pool="G1 Survivor Space",} -1.0 jvm_memory_pool_bytes_max{pool="G1 Old Gen",} 1.073741824E9 # HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_init gauge jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0 jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0 jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0 jvm_memory_pool_bytes_init{pool="G1 Eden Space",} 7340032.0 jvm_memory_pool_bytes_init{pool="G1 Survivor Space",} 0.0 jvm_memory_pool_bytes_init{pool="G1 Old Gen",} 1.26877696E8 # HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool. # TYPE jvm_buffer_pool_used_bytes gauge jvm_buffer_pool_used_bytes{pool="direct",} 148590.0 jvm_buffer_pool_used_bytes{pool="mapped",} 0.0 # HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool. # TYPE jvm_buffer_pool_capacity_bytes gauge jvm_buffer_pool_capacity_bytes{pool="direct",} 148590.0 jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0 # HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool. # TYPE jvm_buffer_pool_used_buffers gauge jvm_buffer_pool_used_buffers{pool="direct",} 19.0 jvm_buffer_pool_used_buffers{pool="mapped",} 0.0 # HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM # TYPE jvm_classes_loaded gauge jvm_classes_loaded 16993.0 # HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution # TYPE jvm_classes_loaded_total counter jvm_classes_loaded_total 17041.0 # HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution # TYPE jvm_classes_unloaded_total counter jvm_classes_unloaded_total 48.0 # HELP bwce_activity_stats_list BWCE Activity Statictics list # TYPE bwce_activity_stats_list gauge # HELP bwce_activity_counter_list BWCE Activity related Counters list # TYPE bwce_activity_counter_list gauge # HELP all_activity_events_count BWCE All Activity Events count by State # TYPE all_activity_events_count counter all_activity_events_count{StateName="CANCELLED",} 0.0 all_activity_events_count{StateName="COMPLETED",} 0.0 all_activity_events_count{StateName="STARTED",} 0.0 all_activity_events_count{StateName="FAULTED",} 0.0 # HELP activity_events_count BWCE All Activity Events count by Process, Activity State # TYPE activity_events_count counter # HELP activity_total_evaltime_count BWCE Activity EvalTime by Process and Activity # TYPE activity_total_evaltime_count counter # HELP activity_total_duration_count BWCE Activity DurationTime by Process and Activity # TYPE activity_total_duration_count counter # HELP bwpartner_instance:total_request Total Request for the partner invocation which mapped from the activities # TYPE bwpartner_instance:total_request counter # HELP bwpartner_instance:total_duration_ms Total Duration for the partner invocation which mapped from the activities (execution or latency) # TYPE bwpartner_instance:total_duration_ms counter # HELP bwce_process_stats BWCE Process Statistics list # TYPE bwce_process_stats gauge # HELP bwce_process_counter_list BWCE Process related Counters list # TYPE bwce_process_counter_list gauge # HELP all_process_events_count BWCE All Process Events count by State # TYPE all_process_events_count counter all_process_events_count{StateName="CANCELLED",} 0.0 all_process_events_count{StateName="COMPLETED",} 0.0 all_process_events_count{StateName="STARTED",} 0.0 all_process_events_count{StateName="FAULTED",} 0.0 # HELP process_events_count BWCE Process Events count by Operation # TYPE process_events_count counter # HELP process_duration_seconds_total BWCE Process Events duration by Operation in seconds # TYPE process_duration_seconds_total counter # HELP process_duration_milliseconds_total BWCE Process Events duration by Operation in milliseconds # TYPE process_duration_milliseconds_total counter # HELP bwdefinitions:partner BWCE Process Events count by Operation # TYPE bwdefinitions:partner counter bwdefinitions:partner{ProcessName="t1.module.item.getTransactionData",ActivityName="FTLPublisher",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="TransactionService",PartnerOperation="GetTransactionsOperation",Location="internal",PartnerMiddleware="MW",} 1.0 bwdefinitions:partner{ProcessName=" t1.module.item.auditProcess",ActivityName="KafkaSendMessage",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="AuditService",PartnerOperation="AuditOperation",Location="internal",PartnerMiddleware="MW",} 1.0 bwdefinitions:partner{ProcessName="t1.module.item.getCustomerData",ActivityName="JMSRequestReply",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="CustomerService",PartnerOperation="GetCustomerDetailsOperation",Location="internal",PartnerMiddleware="MW",} 1.0 # HELP bwdefinitions:binding BW Design Time Repository - binding/transport definition # TYPE bwdefinitions:binding counter bwdefinitions:binding{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInterface="GetCustomer360:GetDataOperation",Binding="/customer",Transport="HTTP",} 1.0 # HELP bwdefinitions:service BW Design Time Repository - Service definition # TYPE bwdefinitions:service counter bwdefinitions:service{ProcessName="t1.module.sub.item.getCustomerData",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 bwdefinitions:service{ProcessName="t1.module.sub.item.auditProcess",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 bwdefinitions:service{ProcessName="t1.module.sub.orchestratorSubFlow",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 bwdefinitions:service{ProcessName="t1.module.Process",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 # HELP bwdefinitions:gateway BW Design Time Repository - Gateway definition # TYPE bwdefinitions:gateway counter bwdefinitions:gateway{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",Endpoint="bwce-demo-mon-orchestrator-bwce",InteractionType="ISTIO",} 1.0 # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds. # TYPE process_cpu_seconds_total counter process_cpu_seconds_total 1956.86 # HELP process_start_time_seconds Start time of the process since unix epoch in seconds. # TYPE process_start_time_seconds gauge process_start_time_seconds 1.604712447107E9 # HELP process_open_fds Number of open file descriptors. # TYPE process_open_fds gauge process_open_fds 763.0 # HELP process_max_fds Maximum number of open file descriptors. # TYPE process_max_fds gauge process_max_fds 1048576.0 # HELP process_virtual_memory_bytes Virtual memory size in bytes. # TYPE process_virtual_memory_bytes gauge process_virtual_memory_bytes 3.046207488E9 # HELP process_resident_memory_bytes Resident memory size in bytes. # TYPE process_resident_memory_bytes gauge process_resident_memory_bytes 4.2151936E8 # HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds. # TYPE jvm_gc_collection_seconds summary jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 540.0 jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 4.754 jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 2.0 jvm_gc_collection_seconds_sum{gc="G1 Old Generation",} 0.563 # HELP jvm_threads_current Current thread count of a JVM # TYPE jvm_threads_current gauge jvm_threads_current 98.0 # HELP jvm_threads_daemon Daemon thread count of a JVM # TYPE jvm_threads_daemon gauge jvm_threads_daemon 43.0 # HELP jvm_threads_peak Peak thread count of a JVM # TYPE jvm_threads_peak gauge jvm_threads_peak 98.0 # HELP jvm_threads_started_total Started thread count of a JVM # TYPE jvm_threads_started_total counter jvm_threads_started_total 109.0 # HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers # TYPE jvm_threads_deadlocked gauge jvm_threads_deadlocked 0.0 # HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors # TYPE jvm_threads_deadlocked_monitor gauge jvm_threads_deadlocked_monitor 0.0
As you can see a lot of metrics but I have to be honest I am not using most of them in my dashboards and to generate my alerts. I can use the metrics regarding the application performance for each of the BusinessWorks process and its activities, also the JVM memory performance and number of threads but things like how the JVM GC is working for each of the layers of the JVM (G1 Young Generation, G1 Old Generation) I’m not using them at all.
So, If I show the same metric endpoint highlighting the things that I am not using it would be something like this:
# HELP jvm_info JVM version info # TYPE jvm_info gauge jvm_info{version="1.8.0_221-b27",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0 # HELP jvm_memory_bytes_used Used bytes of a given JVM memory area. # TYPE jvm_memory_bytes_used gauge jvm_memory_bytes_used{area="heap",} 1.0318492E8 jvm_memory_bytes_used{area="nonheap",} 1.52094712E8 # HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area. # TYPE jvm_memory_bytes_committed gauge jvm_memory_bytes_committed{area="heap",} 1.35266304E8 jvm_memory_bytes_committed{area="nonheap",} 1.71302912E8 # HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area. # TYPE jvm_memory_bytes_max gauge jvm_memory_bytes_max{area="heap",} 1.073741824E9 jvm_memory_bytes_max{area="nonheap",} -1.0 # HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area. # TYPE jvm_memory_bytes_init gauge jvm_memory_bytes_init{area="heap",} 1.34217728E8 jvm_memory_bytes_init{area="nonheap",} 2555904.0 # HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_used gauge jvm_memory_pool_bytes_used{pool="Code Cache",} 3.3337536E7 jvm_memory_pool_bytes_used{pool="Metaspace",} 1.04914136E8 jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 1.384304E7 jvm_memory_pool_bytes_used{pool="G1 Eden Space",} 3.3554432E7 jvm_memory_pool_bytes_used{pool="G1 Survivor Space",} 1048576.0 jvm_memory_pool_bytes_used{pool="G1 Old Gen",} 6.8581912E7 # HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_committed gauge jvm_memory_pool_bytes_committed{pool="Code Cache",} 3.3619968E7 jvm_memory_pool_bytes_committed{pool="Metaspace",} 1.19697408E8 jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 1.7985536E7 jvm_memory_pool_bytes_committed{pool="G1 Eden Space",} 4.6137344E7 jvm_memory_pool_bytes_committed{pool="G1 Survivor Space",} 1048576.0 jvm_memory_pool_bytes_committed{pool="G1 Old Gen",} 8.8080384E7 # HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_max gauge jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8 jvm_memory_pool_bytes_max{pool="Metaspace",} -1.0 jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 1.073741824E9 jvm_memory_pool_bytes_max{pool="G1 Eden Space",} -1.0 jvm_memory_pool_bytes_max{pool="G1 Survivor Space",} -1.0 jvm_memory_pool_bytes_max{pool="G1 Old Gen",} 1.073741824E9 # HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool. # TYPE jvm_memory_pool_bytes_init gauge jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0 jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0 jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0 jvm_memory_pool_bytes_init{pool="G1 Eden Space",} 7340032.0 jvm_memory_pool_bytes_init{pool="G1 Survivor Space",} 0.0 jvm_memory_pool_bytes_init{pool="G1 Old Gen",} 1.26877696E8 # HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool. # TYPE jvm_buffer_pool_used_bytes gauge jvm_buffer_pool_used_bytes{pool="direct",} 148590.0 jvm_buffer_pool_used_bytes{pool="mapped",} 0.0 # HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool. # TYPE jvm_buffer_pool_capacity_bytes gauge jvm_buffer_pool_capacity_bytes{pool="direct",} 148590.0 jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0 # HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool. # TYPE jvm_buffer_pool_used_buffers gauge jvm_buffer_pool_used_buffers{pool="direct",} 19.0 jvm_buffer_pool_used_buffers{pool="mapped",} 0.0 # HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM # TYPE jvm_classes_loaded gauge jvm_classes_loaded 16993.0 # HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution # TYPE jvm_classes_loaded_total counter jvm_classes_loaded_total 17041.0 # HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution # TYPE jvm_classes_unloaded_total counter jvm_classes_unloaded_total 48.0 # HELP bwce_activity_stats_list BWCE Activity Statictics list # TYPE bwce_activity_stats_list gauge # HELP bwce_activity_counter_list BWCE Activity related Counters list # TYPE bwce_activity_counter_list gauge # HELP all_activity_events_count BWCE All Activity Events count by State # TYPE all_activity_events_count counter all_activity_events_count{StateName="CANCELLED",} 0.0 all_activity_events_count{StateName="COMPLETED",} 0.0 all_activity_events_count{StateName="STARTED",} 0.0 all_activity_events_count{StateName="FAULTED",} 0.0 # HELP activity_events_count BWCE All Activity Events count by Process, Activity State # TYPE activity_events_count counter # HELP activity_total_evaltime_count BWCE Activity EvalTime by Process and Activity # TYPE activity_total_evaltime_count counter # HELP activity_total_duration_count BWCE Activity DurationTime by Process and Activity # TYPE activity_total_duration_count counter # HELP bwpartner_instance:total_request Total Request for the partner invocation which mapped from the activities # TYPE bwpartner_instance:total_request counter # HELP bwpartner_instance:total_duration_ms Total Duration for the partner invocation which mapped from the activities (execution or latency) # TYPE bwpartner_instance:total_duration_ms counter # HELP bwce_process_stats BWCE Process Statistics list # TYPE bwce_process_stats gauge # HELP bwce_process_counter_list BWCE Process related Counters list # TYPE bwce_process_counter_list gauge # HELP all_process_events_count BWCE All Process Events count by State # TYPE all_process_events_count counter all_process_events_count{StateName="CANCELLED",} 0.0 all_process_events_count{StateName="COMPLETED",} 0.0 all_process_events_count{StateName="STARTED",} 0.0 all_process_events_count{StateName="FAULTED",} 0.0 # HELP process_events_count BWCE Process Events count by Operation # TYPE process_events_count counter # HELP process_duration_seconds_total BWCE Process Events duration by Operation in seconds # TYPE process_duration_seconds_total counter # HELP process_duration_milliseconds_total BWCE Process Events duration by Operation in milliseconds # TYPE process_duration_milliseconds_total counter # HELP bwdefinitions:partner BWCE Process Events count by Operation # TYPE bwdefinitions:partner counter bwdefinitions:partner{ProcessName="t1.module.item.getTransactionData",ActivityName="FTLPublisher",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="TransactionService",PartnerOperation="GetTransactionsOperation",Location="internal",PartnerMiddleware="MW",} 1.0 bwdefinitions:partner{ProcessName=" t1.module.item.auditProcess",ActivityName="KafkaSendMessage",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="AuditService",PartnerOperation="AuditOperation",Location="internal",PartnerMiddleware="MW",} 1.0 bwdefinitions:partner{ProcessName="t1.module.item.getCustomerData",ActivityName="JMSRequestReply",ServiceName="GetCustomer360",OperationName="GetDataOperation",PartnerService="CustomerService",PartnerOperation="GetCustomerDetailsOperation",Location="internal",PartnerMiddleware="MW",} 1.0 # HELP bwdefinitions:binding BW Design Time Repository - binding/transport definition # TYPE bwdefinitions:binding counter bwdefinitions:binding{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInterface="GetCustomer360:GetDataOperation",Binding="/customer",Transport="HTTP",} 1.0 # HELP bwdefinitions:service BW Design Time Repository - Service definition # TYPE bwdefinitions:service counter bwdefinitions:service{ProcessName="t1.module.sub.item.getCustomerData",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 bwdefinitions:service{ProcessName="t1.module.sub.item.auditProcess",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 bwdefinitions:service{ProcessName="t1.module.sub.orchestratorSubFlow",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 bwdefinitions:service{ProcessName="t1.module.Process",ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",} 1.0 # HELP bwdefinitions:gateway BW Design Time Repository - Gateway definition # TYPE bwdefinitions:gateway counter bwdefinitions:gateway{ServiceName="GetCustomer360",OperationName="GetDataOperation",ServiceInstance="GetCustomer360:GetDataOperation",Endpoint="bwce-demo-mon-orchestrator-bwce",InteractionType="ISTIO",} 1.0 # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds. # TYPE process_cpu_seconds_total counter process_cpu_seconds_total 1956.86 # HELP process_start_time_seconds Start time of the process since unix epoch in seconds. # TYPE process_start_time_seconds gauge process_start_time_seconds 1.604712447107E9 # HELP process_open_fds Number of open file descriptors. # TYPE process_open_fds gauge process_open_fds 763.0 # HELP process_max_fds Maximum number of open file descriptors. # TYPE process_max_fds gauge process_max_fds 1048576.0 # HELP process_virtual_memory_bytes Virtual memory size in bytes. # TYPE process_virtual_memory_bytes gauge process_virtual_memory_bytes 3.046207488E9 # HELP process_resident_memory_bytes Resident memory size in bytes. # TYPE process_resident_memory_bytes gauge process_resident_memory_bytes 4.2151936E8 # HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds. # TYPE jvm_gc_collection_seconds summary jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 540.0 jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 4.754 jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 2.0 jvm_gc_collection_seconds_sum{gc="G1 Old Generation",} 0.563 # HELP jvm_threads_current Current thread count of a JVM # TYPE jvm_threads_current gauge jvm_threads_current 98.0 # HELP jvm_threads_daemon Daemon thread count of a JVM # TYPE jvm_threads_daemon gauge jvm_threads_daemon 43.0 # HELP jvm_threads_peak Peak thread count of a JVM # TYPE jvm_threads_peak gauge jvm_threads_peak 98.0 # HELP jvm_threads_started_total Started thread count of a JVM # TYPE jvm_threads_started_total counter jvm_threads_started_total 109.0 # HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers # TYPE jvm_threads_deadlocked gauge jvm_threads_deadlocked 0.0 # HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors # TYPE jvm_threads_deadlocked_monitor gauge jvm_threads_deadlocked_monitor 0.0
So, it can be a 50% of the metric endpoint response the part that I’m not using, so, why I am using disk space that I am paying for to storing it? And this is just for a “critical exporter”, one that I try to use as much information as possible, but think about how many exporters do you have and how much information you use for each of them.
Ok, so now the purpose and the motivation of this post are clear, but what we can do about it?
Discovering the REST API
Prometheus has an awesome REST API to expose all the information that you can wish about. If you have ever use the Graphical Interface for Prometheus (shown below) you are using the REST API because this is why is behind it.
Target view of the Prometheus Graphical Interface
We have all the documentation regarding the REST API in the Prometheus official documentation:
But what is this API providing us in terms of the time-series database TSDB that Prometheus is using?
TSDB Admin APIs
We have a specific API to manage the performance of the TSDB database but in order to be able to use it, we need to enable the Admin API. And that is done by providing the following flag where we are launching the Prometheus server --web.enable-admin-api.
If we are using the Prometheus Operator Helm Chart to deploy this we need to use the following item in our values.yaml
## EnableAdminAPI enables Prometheus the administrative HTTP API which includes functionality such as deleting time series. ## This is disabled by default. ## ref: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-admin-apis ## enableAdminAPI: true
We have a lot of options enable when we enable this administrative API but today we are going to focus on a single REST operation that is the “stats”. This is the only method related to TSDB that it doesn’t require to enable the Admin API. This operation, as we can read in the Prometheus documentation, returns the following items:
headStats: This provides the following data about the head block of the TSDB:
numSeries: The number of series.
chunkCount: The number of chunks.
minTime: The current minimum timestamp in milliseconds.
maxTime: The current maximum timestamp in milliseconds.
seriesCountByMetricName: This will provide a list of metrics names and their series count.
labelValueCountByLabelName: This will provide a list of the label names and their value count.
memoryInBytesByLabelName This will provide a list of the label names and memory used in bytes. Memory usage is calculated by adding the length of all values for a given label name.
seriesCountByLabelPair This will provide a list of label value pairs and their series count.
To access to that API we need to hit the following endpoint:
GET /api/v1/status/tsdb
So, when I am doing that in my Prometheus deployment I get something similar to this:
We can also check the same information if we use the new and experimental React User Interface on the following endpoint:
/new/tsdb-status
Graphical Visualization of top 10 series count by metric name in the new Prometheus UI
So, with that, you will get the Top 10 series and labels that are inside your time-series database, so in case, some of them are not useful you can just get rid of them using the normal approaches to drop a series or a label. This is great, but what if all the ones shown here are relevant, what can we do about it?
Mmmm, maybe we can use PromQL to monitor this (dogfodding approach). So if we would like to extract the same information but using PromQL we can do it with the following query:
topk(10, count by (__name__)({__name__=~".+"}))
Top 10 of metric series generated and stored in the time series database
And now we have all the power at my hands. For example, let’s take a look not at the 10 more relevant but the 100 more relevants or any other filter that we need to apply. For example, let’s see the metrics regarding with the JVM that we discussed at the beginning. And we will do that with the following PromQL query:
topk(100, count by (__name__)({__name__=~"jvm.+"}))
Top 100 of metric series regarding to JVM metrics
So we can see that we have at least 150 series regarding to metrics that I am not using at all. But let’s do it even better, let’s take a look at the same but group by job names:
topk(10, count by (job,__name__)({__name__=~".+"}))
Result of checking the top 10 metric series count with the job that is generating them
Prometheus is one of the key systems in nowadays cloud architectures. The second graduate project from the Cloud Native Computing Foundation (CNCF) after Kubernetes itself, and is the monitoring solution for excellence in most of the workloads running on Kubernetes.
If you already have used Prometheus for some time, you know that it relies on a Time series database so Prometheus storage is one of the key elements. Based on their own words from the Prometheus official page:
Storage | Prometheus
An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
Every time series is uniquely identified by its metric name and optional key-value pairs called labels, and that series is similar to the tables in a relational model. And inside each of those series, we have samples that are similar to the tuples. And each of the samples contains a float value and a milliseconds-precision timestamp.
Default on-disk approach
By default, Prometheus uses a local-storage approach storing all those samples on disk. This data is distributed in different files and folders to group different chunks of data.
So, we have folders to create those groups, and by default, they are a two-hour block and can contain one or more files depending on the amount of data ingested in that period of time as each folder contains all the samples for that specific timeline.
Additionally, each folder also has some kind of metadata files that help locate each of the data files’ metrics.
A file is persistent in a complete manner when the block is over, and before that, it keeps in memory and uses a write-ahead log technical to recover the data in case of a crash of the Prometheus server.
So, at a high-level view, the directory structure of a Prometheus server’s data directory will look something like this:
Remote Storage Integration
Default on-disk storage is good and has some limitations in terms of scalability and durability, even considering the performance improvement of the latest version of the TSDB. So, if we’d like to explore other options to store this data, Prometheus provides a way to integrate with remote storage locations.
It provides an API that allows writing samples that are being ingested into a remote URL and, at the same time, be able to read back sample data for that remote URL as shown in the picture below:
As always in anything related to Prometheus, the number of adapters created using this pattern is huge, and it can be seen in the following link in detail:
Integrations | Prometheus
An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
Summary
Knowing how prometheus storage works is critical to understand how we can optimize their usage to improve the performance of our monitoring solution and provide a cost-efficient deployment.
In the following posts, we’re going to cover how we can optimize the usage of this storage layer, making sure that only the metrics and samples that are important to use are being stored, and also how to analyze which metrics are the ones used most of the time-series database to be able to take good decision about which metrics should be dropped and which ones should be kept.
Learn how you can leverage the use of Snyk inside your Docker engine installation
Security is the most relevant topic in modern architecture. It needs to be handled from all different perspectives. Having a single team auditing the platforms and the developments that we built is not enough.
The introduction of DevSecOps as the new normal, including the security teams and policies being part of the development process to avoid security becoming a blocker of innovation and make sure that the artifacts we deploy are secured, have made this clear.
Docker image scanning is one of the most important topics we can cover regarding the container images to know that all the internal components that are part of the image are safe from vulnerabilities. We usually rely on some systems to do so.
I wrote an article regarding the usage of one of the most relevant options (Harbor) from the open source world to do this job.
And this is also being done by different Docker repositories from cloud providers like Amazon ECR as of this year. But why do we need to wait until we push the images to an external Docker registry? Why can’t we do it in our local environment?
Now we can. Version 2.5.0.1 of the Docker engine also includes the Snyk components needed to inspect the Docker images directly from the command line:
So, let’s start. Let’s open a new terminal and type the following command:
docker scan <image-name>
As soon as we type this, the command will tell us that this scanning process will use Snyk to do that and we need to authorize access to those services to do the scanning process.
After that, we get a list of all the vulnerabilities detected, as you can see in the picture below:
Vulnerability scanning using your local Docker client
For each of the vulnerabilities, you can see the following data:
Detailed information provided for each of the vulnerabilities detected
We get the library with the vulnerability, the severity level, and a short description of it. If you need more details, you can also check the provided URL that is linked to a description page for that vulnerability:
Vulnerability detailed page from snyk
Finally, it also provides the sources introducing this library in your image so this can be solved quickly.
It provides a high-level view of the whole image too, as you can see here:
Overview of your Docker images with all the vulnerabilities detected
So, now you don’t have any excuse to not have all your images safe and secure before pushing to your local repository. Let’s do it!
Find how you can improve the size of your Docker images for a better experience and savings inside your organization.
Containerization is the new normal. We are all aware of that. All the new versions of the corporate software and all the open-source projects are including the options to use a docker image to run their software.
Probably you already have been doing your tests or even running in production workloads based on docker images that you have built yourself. If that is the case, you probably know one of the big challenges when you’re doing this kind of task: How to optimize the size of the image you generate?
One of the main reasons the docker image can be so big is because they are built following a layered concept. And that means that each of the images is being created as the addition of layers, each associated with the different commands you have in your Dockerfile.
Graphical explanation of how a Docker image is compounded.
Use dive to analyze the size of your images
dive is an open-source project that provides a detailed view of the composition of your docker images. It works as a command-line interface application that has a great view of the content of the layers, as you can see in the picture below:
Dive execution of a BusinessWorks Container Edition image
The tool follows an n-curses interface (if you are old enough to remember how tools were before Graphical User Interfaces was a thing; it should look familiar) and has these main features:
This tool will provide the list of layers in the top-left of the screen and the size associated with each of them.
Provides general stats about image efficiency (a percentage value), a potential view of the wasted size, and the image’s total size.
For each of the layers selected, you get a view on the file system for this view with the data of each folder’s size.
Also, get a view of the bigger elements and the number of replication of these objects.
Now, you have a tool that will help you first to know how your image is built and get performance data of each of the tweaks that you do to improve that size. So, let’s start with the tricks.
1.- Clean your image!
This first is quite obvious, but that doesn’t mean that it is not important. Usually, when you create a Docker image, you follow the same pattern:
You declare a base image to leverage on.
You add resources to do some work.
You do some work.
Usually, we forget an additional step: To clean the added resources when they are not needed anymore! So, it is important to be sure that we remove each of the files that we don’t need anymore.
This also applies to other components like the apt cache when we are installing a new package that we need or any temporary folder that we need to perform an installation or some work to build the image.
2.- Be careful about how you create your Dockerfile
As we already mentioned, each of the commands that we declare in our Dockerfile generates a new layer. So, it is important to be very careful with the lines that we have in the Dockerfile. Even if this is a tradeoff regarding the readability of the Dockerfile, it is important to try to merge commands in the same RUN primitive to make sure we are not creating additional layers.
Sample for a Dockerfile with merged commands
You can also use Docker linters like Hadolint that will help you with this and other anti-patterns that you should avoid when you are creating a Dockerfile.
3.- Go for docker build — squash
The latest versions of the Docker engine provide a new option when you build your images to create with the minimized size squashing of the intermediate layers that can be created as part of the Dockerfile creation process.
That works, providing a new flag when you are doing the build of your image. So, instead of doing this:
To be able to use this option, you should enable the experimental features on your Docker Engine. To do that, you need to enable that in your daemon.json file and restart the engine. If you are using Docker for Windows or Docker for Mac, you can do it using the user interface as shown below:
Summary
These tweaks will help you make your Docker images thinner and much more pleasant the process of pulling and pushing and, at the same time, even saving some money regarding the storage of the images in the repository of your choice. And not only for you but for many others that can leverage the work that you are doing. So think about yourself but also think about the community.
In the previous post of these series regarding how to set up a Hybrid EKS cluster making use of both traditional EC2 machines but also serverless options using Fargate, we were able to create the EKS cluster with both deployment fashion available. If you didn’t take a look at it yet, do it now!
At that point, we have an empty cluster with everything ready to deploy new workloads, but we still need to configure a few things before doing the deployment. First thing is to decide which workloads are going to be deployed using the serverless option and which ones will use the traditional EC2 option.
By default, all the workloads deployed on the namespaces default and kube-system as you can see in the picture below form the AWS Console:
So that means that all workloads from the default namespace and the kube-system namespace will be deployed in a serverless fashion. If that’s what you’d like perfect. But sometimes you’d like to start with a delimited set of namespaces where you’d like to use the serverless option and rely on the traditional deployment.
We can check that same information using eksctl and to do that we need to type the following command:
eksctl get fargateprofile --cluster cluster-test-hybrid -o yaml
The output of that command should be something similar of the information that we can see in the AWS Console:
NOTE: If you don’t remember the name of your cluster you just need to type the command eksctl get clusters
So, this is what we’re going to do and to do that the first thing we need to do is to create a new namespace named “serverless” that is going to hold our serverless deployment and to do that we use a kubectl command as follows:
kubectl create namespace serverless
And now, we just need to create a new fargate profile that is going to replace the one that we have at the moment and to do that we need to make use again of eksctl to handle that job:
NOTE: We also can use not only namespace to limit the scope of our serverless deployment but also tags, so we can have in the same namespace workloads that are deployed using traditional deployment and others using serverless fashion. That will give us all the posibilities to design your cluster as you wish. To do that we will append the argument labels in a key=value fashion.
And we will get an output similar to this:
[ℹ] creating Fargate profile “fp-serverless-profile” on EKS cluster “cluster-test-hybrid”
[ℹ] created Fargate profile “fp-serverless-profile” on EKS cluster “cluster-test-hybrid”
If now we check the number of profiles that we have available we should get two profiles handling three namespaces (the ones that are managed by the default profile — default and kube-system — and the one — serverless — handled by the one we just created now)
We just will use the following command to delete the default profile:
And the output of that command should be similar to this one:
[ℹ] deleted Fargate profile “fp-default” on EKS cluster “cluster-test-hybrid”
And after that, we have now ready our cluster with limited scope for serverless deployments. In the next post of the series, we will just deploy workloads on both fashions to see the difference between them. So, don’t miss the updates regarding this series making sure that you follow my posts, and if you’d like the article, or you have some doubts or comments, please leave your feedback using the comments below!
EKS Fargate AWS Kubernetes Cluster: Learn how to create a Kubernetes cluster that can use also all the power of serverless computing using AWS Fargate
We know that there are several movements and paradigms that are pushing us hard to change our architectures trying to leverage much more managed services and taking care of the operational level so we can focus on what’s really important for our own business: creating applications and deliver value through them.
AWS from Amazon has been a critical partner during that journey, especially in the container world. With the release of EKS some time ago were able to provide a managed Kubernetes service that everyone can use, but also introducing the CaaS solution Fargate also gives us the power to run a container workload in a serverless fashion, without needing to worry about anything else.
But you could be thinking about if those services can work together. And the short answer is yes. But even more important than that we’re seeing that also they can work in a mixed mode:
So you can have an EKS cluster that has some nodes that are Fargate services and some nodes that are normal EC2 machines for workloads that are working in a state-full fashion or fits better in a traditional EC2 approach. And everything works by the same rules and is managed by the same EKS Cluster.
So, that sounds amazing but, How we can do that? Let’s see.
eksctl
To get to that point there is a tool that we need to introduce first, and that tool is named eksctl and it is a command-line utility that helps us to do any action to interact with the EKS service and simplifies a lot the work to do and also be able to automate most of the tasks in a non-human required mode. So, the first thing we need to is to get eksctl in our platforms ready. Let’s see how we can get that.
We have here all the detailed from Amazon itself about how to install eksctl in different platforms, no matter if you’re using Windows, Linux, or MacOS X:
Installing or updating eksctl – Amazon EKS
Learn how to install or update the eksctl command line tool. This tool is used to create and work with an Amazon EKS cluster.
After doing that we can check that we have installed the eksctl software running the command:
eksctl version
And we should get an output similar to this one:
eksctl version output command
So after doing that we can see that we have access to all the power behind the EKS service just typing these simple commands into our console window.
Creating the EKS Hybrid Cluster
Now, we’re going to create a mixed environment with some EC2 machines and enable the Fargate support for EKS. To do that, we will start with the following command:
eksctl create cluster --version=1.15 --name=cluster-test-hybrid --region=eu-west-1 --max-pods-per-node=1000 --fargate
[ℹ] eksctl version 0.26.0
[ℹ] using region eu-west-1
[ℹ] setting availability zones to [eu-west-1c eu-west-1a eu-west-1b]
[ℹ] subnets for eu-west-1c - public:192.168.0.0/19 private:192.168.96.0/19
[ℹ] subnets for eu-west-1a - public:192.168.32.0/19 private:192.168.128.0/19
[ℹ] subnets for eu-west-1b - public:192.168.64.0/19 private:192.168.160.0/19
[ℹ] using Kubernetes version 1.15
[ℹ] creating EKS cluster "cluster-test-hybrid" in "eu-west-1" region with Fargate profile
[ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=eu-west-1 --cluster=cluster-test-hybrid'
[ℹ] CloudWatch logging will not be enabled for cluster "cluster-test-hybrid" in "eu-west-1"
[ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=eu-west-1 --cluster=cluster-test-hybrid'
[ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "cluster-test-hybrid" in "eu-west-1"
[ℹ] 2 sequential tasks: { create cluster control plane "cluster-test-hybrid", create fargate profiles }
[ℹ] building cluster stack "eksctl-cluster-test-hybrid-cluster"
[ℹ] deploying stack "eksctl-cluster-test-hybrid-cluster"
[ℹ] creating Fargate profile "fp-default" on EKS cluster "cluster-test-hybrid"
[ℹ] created Fargate profile "fp-default" on EKS cluster "cluster-test-hybrid"
[ℹ] "coredns" is now schedulable onto Fargate
[ℹ] "coredns" is now scheduled onto Fargate
[ℹ] "coredns" pods are now scheduled onto Fargate
[ℹ] waiting for the control plane availability...
[✔] saved kubeconfig as "C:\\Users\\avazquez/.kube/config"
[ℹ] no tasks
[✔] all EKS cluster resources for "cluster-test-hybrid" have been created
[ℹ] kubectl command should work with "C:\\Users\\avazquez/.kube/config", try 'kubectl get nodes'
[✔] EKS cluster "cluster-test-hybrid" in "eu-west-1" region is ready
This command will setup the EKS cluster enabling the Fargate support.
NOTE: The first thing that we should notice is that the Fargate support for EKS is not yet available in all the AWS regions. So, depending on the region that you’re using you could get an error. At this moment this is just enabled in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo) based on the information from AWS Announcements: https://aws.amazon.com/about-aws/whats-new/2020/04/eks-adds-fargate-support-in-frankfurt-oregon-singapore-and-sydney-aws-regions/
So, now, we should add to that cluster a Node Group. a Node Group is a set of EC2 instances that are going to be managed as part of it. And to do that we will use the following command:
eksctl create nodegroup --cluster cluster-test-hybrid --managed
[ℹ] eksctl version 0.26.0
[ℹ] using region eu-west-1
[ℹ] will use version 1.15 for new nodegroup(s) based on control plane version
[ℹ] nodegroup "ng-1262d9c0" present in the given config, but missing in the cluster
[ℹ] 1 nodegroup (ng-1262d9c0) was included (based on the include/exclude rules)
[ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "cluster-test-hybrid"
[ℹ] 2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "ng-1262d9c0" } } }
[ℹ] checking cluster stack for missing resources
[ℹ] cluster stack has all required resources
[ℹ] building managed nodegroup stack "eksctl-cluster-test-hybrid-nodegroup-ng-1262d9c0"
[ℹ] deploying stack "eksctl-cluster-test-hybrid-nodegroup-ng-1262d9c0"
[ℹ] no tasks
[✔] created 0 nodegroup(s) in cluster "cluster-test-hybrid"
[ℹ] nodegroup "ng-1262d9c0" has 2 node(s)
[ℹ] node "ip-192-168-69-215.eu-west-1.compute.internal" is ready
[ℹ] node "ip-192-168-9-111.eu-west-1.compute.internal" is ready
[ℹ] waiting for at least 2 node(s) to become ready in "ng-1262d9c0"
[ℹ] nodegroup "ng-1262d9c0" has 2 node(s)
[ℹ] node "ip-192-168-69-215.eu-west-1.compute.internal" is ready
[ℹ] node "ip-192-168-9-111.eu-west-1.compute.internal" is ready
[✔] created 1 managed nodegroup(s) in cluster "cluster-test-hybrid"
[ℹ] checking security group configuration for all nodegroups
[ℹ] all nodegroups have up-to-date configuration
So now we should be able to use kubectl to manage this new cluster. If you don’t have installed kubectl or you haven’t heard about it. This is the command line tool that allow us to manage your Kubernetes Cluster and you can install it based on the documentation shown here:
Installing or updating eksctl – Amazon EKS
Learn how to install or update the eksctl command line tool. This tool is used to create and work with an Amazon EKS cluster.
So, now, we should start taking a look at the infrastructure that we have. So if we type the following command to see the nodes at our disposal:
kubectl get nodes
We see an output similar to this:
NAME STATUS ROLES AGE VERSION
fargate-ip-192-168-102-22.eu-west-1.compute.internal Ready <none> 10m v1.15.10-eks-094994
fargate-ip-192-168-112-125.eu-west-1.compute.internal Ready <none> 10m v1.15.10-eks-094994
ip-192-168-69-215.eu-west-1.compute.internal Ready <none> 85s v1.15.11-eks-bf8eea
ip-192-168-9-111.eu-west-1.compute.internal Ready <none> 87s v1.15.11-eks-bf8eea
As you can see we have 4 “nodes” two that start with the fargate name that are fargate nodes and two that just start with ip-… and those are the traditional EC2 instances. And after that moment that’s it, we have our mixed environment ready to use.
We can check the same cluster using the AWS EKS page to see that configuration more in detail. If we enter in the EKS page for this cluster we see in the Compute tab the following information:
We see under Node Groups the data around the EC2 machines that are managed as part of this cluster and as you can see we saw 2 as the Desired Capacity and that’s why we have 2 EC2 instances in our cluster. And regarding the Fargate profile, we see the namespaces set to default and kube-system and that means that all the deployments to those namespaces are going to be deployed using Fargate Tasks.
Summary
In the following articles in these series, we will see how to progress on our Hybrid cluster, deploy workloads scale it based on the demand that we’re getting, enabling integration with other services like AWS CloudWatch, and so on. So, stay tuned, and don’t forget to follow my articles to not miss any new updates as soon as it’s available to you!
Managed Container Platform is disrupting everything. We’re living in a time where development and the IT landscape are changing, new paradigms like microservices and containers seem to be out there for the last few years, and if we trust the reality that the blog posts and the articles show today, we’re all of the users already using them all the time.
Did you see any blog posts about how to develop a J2EE application running on your Tomcat server on-prem? Probably not. The most similar article should probably be how to containerize your tomcat-based application.
But do you know what? Most companies still are working that way. So even if all companies have a new digital approach in some departments, they also have other ones being more traditional.
So, that seems that we need to find a different way to translate the main advantages of a container-based platform to a kind of speech they can see and realize the tangible benefits they can get from there and have the “Hey, this can work for me!” kind of spirit.
1. You will get all components isolated and updated more quickly
That’s one of the great things about container-based platforms compared with previous approaches like application server-based platforms. When you have an application server cluster, you still have one cluster with several applications. So you usually do some isolation, keep related applications, provide independent infrastructure for the critical ones, and so on.
But even with that, at some level, the application continues to be coupled, so some issues with some applications could bring down another one that was not expected for business reasons.
With a container-based platform, you’re getting each application in its bubble, so any issue or error will affect that application and nothing more. Platform stability is a priority for all companies and all departments inside them. Just ask yourself: Do you want to end with those “domino’s chains” of failure? How much will your operations improve? How much will your happiness increase?
Additionally, based on the container approach, you will get smaller components. Each of them will do a single task providing a single capability to your business, which means that it will be much easier to update, test, and deploy in production. So that, in the end will generate more deployments into the production environment and reduce the time to market for your business capabilities.
You will be able to deploy faster and have more stable operations simultaneously.
2.- You will optimize the use of your infrastructure
Costs, everything is about costs. There are no single conversations with customers who are not trying to pay less for their infrastructure. So, let’s face it. We should be able to run operations in an optimized way. So, if our infrastructure cost is going higher, that needs to mean that our business increases.
Container-based platforms will allow optimizing infrastructure in two different ways. First, if using two main concepts: Elasticity and Infrastructure Sharing.
Elasticity is related because I’m only going to have the infrastructure I need to support the load I have at this moment. So, if the load increases, my infrastructure will increase to handle it, but after that moment goes away, it will go back to what it needs now after that peak happened.
Infrastructure sharing is about using each server’s part that is free to deploy other applications. Imagine a traditional approach where I have two servers for my set of applications. Probably I don’t have 100% usage of those servers because I need to have some spare computer to be able to act when the load increases. I probably have 60–70% percent. That means 30% free. If we have different departments with different systems, and each has its infrastructure 30% free, how much of our infrastructure are we just throwing away? How many dollars/euros/pounds are you just throwing off the window?
Container-based platforms don’t need specific tools or software installed on the platform to run a different kind of application. It is not required because everything resides inside the container, so I can use any free space to deploy other applications doing a more efficient usage of those.
3.- You will not need infrastructure for administration
Each system that is big enough has some resources dedicated to being able to manage it. However, even most of the recommended architectures recommend placing those components isolated from your runtime components to avoid any issue regarding administrator or maintenance that can affect your runtime workloads, which means specific infrastructure that you’re using for something that isn’t helping your business. Of course, you can explain to any business user that you need a machine to run that provides the capabilities required. But it is more complex than using additional infrastructure (and generating cost) to place other components that are not helping the business.
So, managed container platforms take that problem away because you’re going to provide the infrastructure you need to run your workloads, and you’re going to be given for free or such low fee the administration capabilities. And addition to that, you don’t even need to worry that administration features are always available and working fine because this is leverage to the provider itself.
Wrap up and next steps
As you can see, we describe very tangible benefits that are not industry-based or development focus. Of course, we can have so many more to add to this list, but these are the critical ones that affect any company in any industry worldwide. So, please, take your time to think about how these capabilities can help to improve your business. But not only that, take your time to quantify how that will enhance your business. How much can you save? How much can you get from this approach?
And when you have in front of you a solid business case based on this approach, you will get all the support and courage you need to move forward during that route!! So I wish you a peaceful transition!
Learn how you can include Harbor registry in your DevSecOps toolset to increase the security and management on your container-based platform
With the transition to a more agile development process where the number of deployments has been increased in an exponential way. That situation has made it quite complex to keep pace to make sure we’re not just deploying code more often into production that provides the capabilities that are required by the business. But, also, at the same time, we’re able to do it securely and safely.
That need is leading toward the DevSecOps idea to include security as part of the DevOps culture and practices as a way to ensure safety from the beginning on development and across all the standard steps from the developer machine to the production environment.
Additional to that, because of the container paradigm we have a more polyglot approach with different kinds of components running on our platform using a different base image, packages, libraries, and so on. We need to make sure they’re still secure to use and we need tools to be able to govern that in a natural way. To help us on that duty is where components like Harbor help us to do that.
Harbor is a CNCF project at the incubator stage at the moment of writing this article, and it provides several capabilities regarding how to manage container images from a project perspective. It gives a project approach with its docker registry and also a chart museum if we’d like to use Helm Charts as part of our project development. But it includes security features too, and that’s the one that we’re going to cover in this article:
Vulnerabilities Scan: it allows you to scan all the docker images registered in the different repositories to check if they have vulnerabilities. It also provides automation during that process to make sure that every time we push a new image, this is scanned automatically. Also, it will enable defining policies to avoid pulling any image with vulnerabilities and also set the level of vulnerabilities (low, medium, high, or critical) that we’d like to tolerate it. By default, it comes with Clair as the default scanner, but you can introduce others as well.
Signed images: Harbor registry provides options to deploy notary as part of its components to be able to sign images during the push process to make sure that no modifications are done to that image
Tag Inmuttability and Retention Rules: Harbor registry also provides the option to define tag immutability and retention rules to make sure that we don’t have any attempt to replace images with others using the same tag.
Harbor registry is based on docker so you can run it locally using docker and docker-compose using the procedure that is available on its official web page. But it also supports being installed on top of your Kubernetes platform using the helm chart and operator that is available.
Once the tool is installed, we have access to the UI Web Portal, and we’re able to create a project that has repositories as part of it.
Project List inside the Harbor Portal UI
As part of the project configuration, we can define the security policies that we’d like to provide to each project. That means that different projects can have different security profiles.
Security settings inside a Project in Harbor Porta UI
And once we push a new image to the repository that belongs to that project we’re going to see the following details:
In this case, I’ve pushed a TIBCO BusinessWorks Container Edition application that doesn’t contain any vulnerability and just shows that and also where this was checked.
Also, if we see the details, we can check additional information like if the image has been signed or not, or be able to check it again.
Image details inside Harbor Portal UI
Summary
So, this is just a few features that Harbor provides from the security perspective. But Harbor is much more than only that so probably we cover more of its features in further articles I hope based on what you read today you’d like to give it a chance and start introducing it in your DevSecOps toolset.
Find a way to re-define and re-organize the name of your Prometheus metrics to meet your requirements
Prometheus has become the new standard when we’re talking about monitoring our new modern application architecture, and we need to make sure we know all about its options to make sure we can get the best out of it. I’ve been using it for some time until I realized about a feature that I was desperate to know how to do, but I couldn’t find anywhere clearly define. So as I didn’t found it easily, I thought about writing a small article to show you how to do it without needed to spend the same time as I did.
We have plenty of information about how to configure Prometheus and use some of the usual configuration plugins, as we can see on its official webpage [1]. Even I already write about some configuration and using it for several purposes, as you can see also in other posts [2][3][4].
One of these configuration plugins is about relabeling, and this is a great thing. We have that each of the exporters can have its labels and meaning for those, and when you try to manage different technologies or components makes complex that all of them match together even if all of them follow the naming convention that Prometheus has [5].
But I had this situation, and I’m sure you have gone or will go towards that as well, that I have similar metrics for different technologies that for me are the same, and I need to keep them with the same name, but as they belong to other technologies they are not. So I need to find a way to rename the metric, and the great thing is that you can do that.
To do that, you just need to do a metric_relabel configuration. This configuration applies to relabel (as the name already indicates) labels of your prometheus metrics in this case before being ingested but also allow us to use some notable terms to do different things, and one of these notable terms is __name__. __name__ is a particular label that will enable you to rename your prometheus metrics before being ingested in the Prometheus Timeseries Database. And after that point, this will be as it will have that name since the beginning.
How to use that is relatively easy, is as any other relabel process, and I’d like to show you a sample about how to do it.
- source_labels: [__name__]
regex: 'jvm_threads_current'
target_label: __name__
replacement: 'process_thread_count'
Here it is a simple sample to show how we can rename a metric name jvm_threads_current to count the threads inside the JVM machine to do it more generic to be able to include the threads for the process in a process_thread_count prometheus metrics that we can use now as it was the original name.