Kubernetes Security Best Practices: A Production Hardening Guide

Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. This guide covers the security controls that matter most in production, why each one exists, and how to implement them without breaking your cluster.

The Kubernetes Attack Surface

Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:

  • API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
  • etcd — Stores all cluster state in plain text, including Secrets. Direct etcd access is equivalent to root on every node.
  • Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
  • Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
  • Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
  • RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.

The controls below address each of these surfaces. Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of them; an internal development cluster can relax some.

1. RBAC: Least Privilege from Day One

Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because it is misconfigured, but because it is over-permissive by default and nobody reviews it systematically.

Common RBAC Mistakes

  • Binding to cluster-admin for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
  • Using * verbs or resources in roles. Wildcard permissions are almost always broader than intended.
  • Not auditing ServiceAccount token usage. Every pod gets a ServiceAccount. The default ServiceAccount in most namespaces has no permissions, but custom workloads often get over-permissive SAs.
  • Forgetting automountServiceAccountToken: false. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely.
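To illustrate the last bullet, here is a minimal sketch of disabling token automount; the batch-worker name and my-app namespace are illustrative:

```yaml
# Disable API token automount for a workload that never calls the Kubernetes API.
# The field can be set on the ServiceAccount, the pod spec, or both;
# the pod-level field overrides the ServiceAccount's.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: batch-worker
  namespace: my-app
automountServiceAccountToken: false
---
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
  namespace: my-app
spec:
  serviceAccountName: batch-worker
  automountServiceAccountToken: false
  containers:
  - name: worker
    image: registry.company.com/batch-worker:1.0.0
```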

Practical RBAC Patterns

For a workload that only needs to read ConfigMaps in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do.
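Built-in tooling gives a quick first pass before reaching for those plugins. For example, kubectl auth can-i can enumerate what a workload's ServiceAccount is allowed to do (the my-app namespace and ServiceAccount names are illustrative):

```shell
# Enumerate every permission held by the my-app ServiceAccount in its namespace
kubectl auth can-i --list \
  --as=system:serviceaccount:my-app:my-app \
  -n my-app

# Spot-check a specific dangerous permission cluster-wide
kubectl auth can-i create clusterrolebindings \
  --as=system:serviceaccount:my-app:my-app
```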

2. Pod Security Standards

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three security profiles at the namespace level:

  • Privileged — No restrictions. For system components only.
  • Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
  • Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, and using a restricted seccomp profile.

Enable enforcement at the namespace level with labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30

A pod that runs as root or requests hostNetwork in a namespace enforcing restricted will be rejected at admission. The warn and audit modes let you test a profile before enforcing it.
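Before flipping a namespace from warn to enforce, you can ask the API server which existing workloads would be rejected. A server-side dry run of the label change reports violations without applying anything:

```shell
# Server-side dry run: lists pods that would violate the restricted profile
# in every namespace, without actually changing any labels
kubectl label --dry-run=server --overwrite ns --all \
  pod-security.kubernetes.io/enforce=restricted
```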

PSA covers the most critical pod-level escalations, but it is coarse-grained. For fine-grained policy control, use Kyverno alongside PSA.

3. Network Policies: Micro-Segmentation

By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This is a flat network model that gives attackers unrestricted lateral movement once they compromise any workload.

Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI plugin (Calico, Cilium, Weave — not Flannel, which does not support NetworkPolicy).

Default Deny Pattern

Start by denying all ingress and egress in every namespace, then open only what is explicitly needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then allow specific traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 5432

Do not forget DNS egress — most workloads need to resolve names via kube-dns, which requires egress on UDP (and TCP) port 53 to the kube-system namespace.
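A companion policy for the default-deny setup might look like the following sketch. The namespace is illustrative; CoreDNS pods in kube-system typically carry the k8s-app: kube-dns label, but verify this in your cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```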

4. Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted, and are stored in etcd in plain text by default. Anyone with get permission on Secrets can read them. This is not a vulnerability — it is a design decision that puts the responsibility on you to:

  • Enable encryption at rest for etcd. Configure EncryptionConfiguration with an AES-CBC or AES-GCM provider. This encrypts Secrets before they are written to etcd.
  • Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the External Secrets Operator means actual secret values never live in Kubernetes at all.
  • Restrict Secret RBAC aggressively. Never give list on Secrets cluster-wide — it returns all values. Use get on named resources where possible.
  • Avoid environment variables for secrets. Prefer volume mounts. Env vars are visible in pod inspect output and can leak through application logging.

This is the encryption-at-rest configuration referenced in the first bullet; point the kube-apiserver at the file with --encryption-provider-config:

# etcd encryption at rest - in kube-apiserver config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
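For the external-store approach, here is a sketch of what an External Secrets Operator manifest looks like, assuming ESO is installed and a ClusterSecretStore named vault-backend already exists (the store name, namespace, and backend path are all hypothetical):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: my-app
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend        # hypothetical store pointing at your backend
  target:
    name: db-credentials       # the Kubernetes Secret ESO will create
  data:
  - secretKey: password
    remoteRef:
      key: apps/my-app/db      # path in the external backend (illustrative)
      property: password
```

The actual secret value lives only in the external backend; Git and etcd only ever see the reference.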

5. Image Security and Supply Chain

Your runtime security posture is only as good as the images you run. A compromised image from a public registry bypasses every runtime control you have.

Scan images in CI

Use Trivy, Grype, or Snyk to scan images as part of your CI pipeline. Block deployments of images with critical CVEs:

# In your CI pipeline
trivy image --exit-code 1 --severity CRITICAL your-image:tag

Use a private registry with admission control

Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper). This prevents developers from running arbitrary public images in production:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.company.com"
      pattern:
        spec:
          containers:
          - image: "registry.company.com/*"

Use distroless or minimal base images

Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces the attack surface and the number of CVEs. Google’s distroless images are available for Java, Node.js, Python, and Go.

Sign and verify images

Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image substitution attacks where an attacker replaces a legitimate image in your registry.
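A sketch of verification at admission using Kyverno's verifyImages rule follows; the registry path is a placeholder, and the public key block must be replaced with the key generated by cosign:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30
  rules:
  - name: verify-cosign-signature
    match:
      any:
      - resources:
          kinds: ["Pod"]
    verifyImages:
    - imageReferences:
      - "registry.company.com/*"
      attestors:
      - entries:
        - keys:
            publicKeys: |-
              -----BEGIN PUBLIC KEY-----
              <your-cosign-public-key>
              -----END PUBLIC KEY-----
```

Pods whose images lack a valid cosign signature from this key are rejected at admission.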

6. Runtime Security

Runtime security detects and responds to malicious activity after a container is running. The primary tool in this space is Falco — a CNCF project that uses eBPF to monitor system calls and raise alerts when containers behave unexpectedly.

Default Falco rules catch common attack patterns:

  • Shell spawned in a container
  • Network connection to an unexpected IP
  • Write to a sensitive file path (/etc/passwd, /etc/shadow)
  • Privilege escalation via setuid binaries
  • Container drift (new executable files written at runtime)
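Custom rules use the same YAML format as the defaults. A minimal sketch that alerts when a shell starts inside any container (the spawned_process and container macros come from Falco's default ruleset):

```yaml
- rule: Shell Spawned in Container
  desc: Detect an interactive shell process started inside a container
  condition: >
    spawned_process and container and
    proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmdline=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```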

Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault profile applies the container runtime's default seccomp policy, which blocks dozens of obscure system calls that containers virtually never need; the SeccompDefault feature that lets the kubelet apply it to all workloads by default graduated to stable in Kubernetes 1.27.

spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
      capabilities:
        drop: ["ALL"]

These four securityContext settings together (allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ALL) make container escape significantly harder and satisfy the Kubernetes Restricted pod security standard.

7. API Server Hardening

The API server is the most critical component to harden. Key settings:

  • Disable anonymous authentication. --anonymous-auth=false ensures every request is authenticated.
  • Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
  • Restrict admission plugins. Ensure NodeRestriction is enabled — it prevents node kubelets from modifying objects outside their own node.
  • Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.

An audit policy file is passed to the API server with --audit-policy-file:

# Minimal audit policy - log all requests at metadata level,
# and full request body for sensitive resources
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: Metadata
  omitStages: ["RequestReceived"]

8. etcd Security

etcd stores all cluster state. Treat it as sensitive as your production database:

  • Enable TLS for all etcd communication. Both peer communication (etcd-to-etcd) and client communication (apiserver-to-etcd) must use mutual TLS.
  • Restrict network access to etcd. etcd should only be reachable by the API server. Use firewall rules or security groups to enforce this.
  • Enable encryption at rest. As described in the Secrets section above.
  • Backup etcd regularly. An etcd snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.
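For the backup point, etcd ships with snapshot tooling. A sketch using etcdctl follows; the certificate paths match a typical kubeadm layout and may differ in your cluster:

```shell
# Take a snapshot over the etcd client TLS endpoint
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Inspect the snapshot before shipping it to encrypted off-cluster storage
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db -w table
```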

9. CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark is a comprehensive checklist of security controls covering the control plane, nodes, and workloads. Running kube-bench against your cluster gives you a scored assessment against the CIS controls:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.

10. Continuous Security Posture with Kubescape

Kubescape and similar tools (Trivy Operator, formerly Starboard; kube-score) provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, the MITRE ATT&CK framework, and the CIS benchmark in real time.

Deploy Trivy Operator for continuous in-cluster scanning:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set="trivy.ignoreUnfixed=true"

Trivy Operator creates VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources in the same namespace as each workload. These can be scraped by Prometheus and displayed in Grafana for a security dashboard.
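Because the reports are ordinary custom resources, basic visibility is one kubectl command away (the report name in the last command is a placeholder):

```shell
# Vulnerability findings per workload, across all namespaces
kubectl get vulnerabilityreports -A

# Configuration audit findings per workload
kubectl get configauditreports -A

# Inspect a single report in detail
kubectl describe vulnerabilityreport -n my-app <report-name>
```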

Security Hardening Checklist

  • ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
  • ✅ ServiceAccount token automount disabled for workloads that do not need API access
  • ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
  • ✅ Network policies deployed — default deny with explicit allows
  • ✅ Secrets encrypted at rest in etcd
  • ✅ Images scanned in CI — no critical CVEs in production
  • ✅ Private registry enforced via admission control
  • ✅ Container securityContext hardened (non-root, read-only filesystem, no capabilities)
  • ✅ seccomp RuntimeDefault profile enabled
  • ✅ API server audit logging enabled
  • ✅ etcd TLS and network access restricted
  • ✅ kube-bench run and critical/high findings remediated
  • ✅ Runtime security (Falco) deployed and alerts routed to on-call
  • ✅ Continuous scanning (Trivy Operator or Kubescape) deployed

FAQ

Where do I start if my cluster has no security controls today?

Start with the highest-impact, lowest-effort controls first: audit your RBAC (revoke cluster-admin where not needed), enable Pod Security Admission in warn mode on all namespaces, and deploy Trivy Operator. These three steps give you immediate visibility and prevent the most common privilege escalations without breaking anything.

Does enabling Network Policies break DNS resolution?

Yes, if you deploy a default-deny egress policy without explicitly allowing DNS. Add an egress rule allowing UDP (and TCP) port 53 to kube-dns in the kube-system namespace whenever you apply default-deny network policies.

Is Kubernetes certified for PCI-DSS or SOC 2?

Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed Kubernetes offerings (EKS, GKE, AKS) have their own compliance certifications for the underlying infrastructure.

Should I use OPA Gatekeeper or Kyverno?

Both enforce admission policies, but Kyverno is Kubernetes-native (policies are written as YAML) while Gatekeeper uses Rego (a purpose-built policy language). For teams without Rego expertise, Kyverno is significantly faster to adopt and maintain. For teams already using OPA elsewhere in their stack, Gatekeeper offers consistency. Both integrate well with GitOps workflows.

How often should I update Kubernetes for security patches?

Follow a patch release within 30 days of release for CVEs rated High or Critical. Minor version upgrades (e.g., 1.29 → 1.30) should happen within the support window — Kubernetes maintains the last three minor versions. Falling more than one minor version behind means running without security patches for a growing subset of the codebase.

For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.

ArgoCD Guide: GitOps Continuous Delivery for Kubernetes

ArgoCD has become the de facto standard for GitOps-based continuous delivery in Kubernetes. If you are running production workloads on Kubernetes and still deploying with raw kubectl apply or untracked Helm releases, ArgoCD solves a class of problems you may not even know you have yet. This guide covers everything from core concepts to production-grade configuration.

The Problem ArgoCD Solves

Traditional CI/CD pushes deployments into a cluster. A CI system runs tests, builds an image, and then executes kubectl apply or helm upgrade against the cluster. This model has several structural problems:

  • Drift goes undetected. Someone applies a hotfix directly to the cluster. Now your Git repository no longer reflects reality, and nobody knows it.
  • No single source of truth. The cluster state is authoritative, not Git. Your desired state and actual state can diverge silently.
  • Rollback is painful. Rolling back a bad deployment means re-running old CI pipelines or manually reversing changes, neither of which is fast.
  • Multi-cluster management compounds the problem. Each cluster becomes a snowflake with its own history of undocumented changes.

GitOps inverts this model. Git is the source of truth. The cluster pulls its desired state from Git and continuously reconciles toward it. ArgoCD is the most mature GitOps operator for Kubernetes, implementing this pull-based model with a production-ready feature set.

How ArgoCD Works: Core Architecture

ArgoCD runs as a set of controllers inside your Kubernetes cluster. The core components are:

  • Application Controller — Watches both the Git repository and the live cluster state. Computes the diff and drives reconciliation.
  • API Server — Exposes the gRPC/REST API consumed by the CLI, UI, and external systems.
  • Repository Server — Generates Kubernetes manifests from source (Helm, Kustomize, plain YAML, Jsonnet).
  • Redis — Caches cluster state and repository data to reduce API server load.
  • Dex (optional) — Provides OIDC authentication for SSO integration.

The fundamental unit in ArgoCD is an Application — a CRD that maps a source (a path in a Git repo at a specific revision) to a destination (a namespace in a cluster). ArgoCD continuously compares the desired state from Git with the live state in the cluster and reports on the sync status.

Sync Status vs Health Status

Two orthogonal concepts you need to understand from day one:

  • Sync Status — Does the live state match what Git says it should be? Values: Synced, OutOfSync, Unknown.
  • Health Status — Is the application actually working? Values: Healthy, Progressing, Degraded, Suspended, Missing, Unknown.

An application can be Synced but Degraded — the manifests were applied correctly, but a pod is crash-looping. Conversely, it can be OutOfSync but Healthy — someone applied a change directly to the cluster outside of Git.
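Both statuses are visible from the CLI, which is the quickest way to internalize the distinction:

```shell
# Shows sync status, health status, and per-resource state for one application
argocd app get my-app

# Lists all applications with their sync and health columns side by side
argocd app list
```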

Installing ArgoCD

The official installation method uses a single manifest. For production, always pin to a specific version:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.11.0/manifests/install.yaml

This deploys ArgoCD in the argocd namespace with full cluster-admin access. For a production HA setup, use the manifests/ha/install.yaml variant, which deploys multiple replicas of the API server and application controller.

Accessing the UI and CLI

The initial admin password is auto-generated and stored in a secret:

argocd admin initial-password -n argocd

For local access, port-forward the API server:

kubectl port-forward svc/argocd-server -n argocd 8080:443

Then log in via the CLI:

argocd login localhost:8080 --username admin --password <password> --insecure

For production, expose the ArgoCD server via an Ingress or LoadBalancer with a proper TLS certificate. If you’re using NGINX Ingress Controller:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  rules:
  - host: argocd.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argocd-server
            port:
              number: 443

Defining Your First Application

Applications can be created via the UI, the CLI, or declaratively with a YAML manifest. The declarative approach is the recommended one — it means your ArgoCD configuration itself is in Git:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-app
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Key fields to understand:

  • targetRevision — Can be a branch name, tag, or commit SHA. For production, pin to a tag rather than HEAD.
  • path — The directory within the repo containing your Kubernetes manifests.
  • automated.prune — Automatically delete resources that are no longer in Git. Required for true GitOps but use carefully — it will delete things.
  • automated.selfHeal — Automatically revert manual changes made directly to the cluster. This is what enforces Git as the single source of truth.

Helm Integration

ArgoCD has native Helm support. It can deploy Helm charts directly from chart repositories or from your Git repository. You can override values per environment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 58.4.0
    helm:
      releaseName: prometheus-stack
      valuesObject:
        grafana:
          adminPassword: "${GRAFANA_PASSWORD}"
        prometheus:
          prometheusSpec:
            retention: 30d
            storageSpec:
              volumeClaimTemplate:
                spec:
                  storageClassName: fast-ssd
                  resources:
                    requests:
                      storage: 50Gi
  destination:
    server: https://kubernetes.default.svc
    namespace: observability

One important nuance: ArgoCD renders Helm charts with helm template and applies the resulting manifests itself rather than running helm install. Helm hooks (pre-install, post-upgrade, etc.) are mapped onto ArgoCD's own hook mechanism, but the release is not tracked in Helm's release history, so helm list will not show ArgoCD-managed releases.

Projects: Multi-Tenancy and Access Control

ArgoCD Projects provide multi-tenancy within a single ArgoCD instance. They let you restrict which source repositories, destination clusters, and namespaces a team can deploy to. Every Application belongs to a Project.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform-team
  namespace: argocd
spec:
  description: Platform team applications
  sourceRepos:
  - 'https://github.com/your-org/*'
  destinations:
  - namespace: 'platform-*'
    server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  namespaceResourceBlacklist:
  - group: ''
    kind: ResourceQuota

Projects are where you define the boundaries of what each team can do. The default project has no restrictions — never use it for production workloads. Create dedicated projects per team or per environment.

RBAC Configuration

ArgoCD has its own RBAC system layered on top of Kubernetes RBAC. It is configured via the argocd-rbac-cm ConfigMap. Roles are defined per project or globally:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Platform team has full access to platform-team project
    p, role:platform-admin, applications, *, platform-team/*, allow
    p, role:platform-admin, projects, get, platform-team, allow
    p, role:platform-admin, repositories, *, *, allow

    # Dev team can sync but not delete
    p, role:developer, applications, get, */*, allow
    p, role:developer, applications, sync, */*, allow
    p, role:developer, applications, action/*, */*, allow

    # Bind SSO groups to roles
    g, your-org:platform-team, role:platform-admin
    g, your-org:developers, role:developer

The policy.default: role:readonly ensures that any authenticated user who has no explicit role assignment gets read-only access — a safe default for production.

Multi-Cluster Management

ArgoCD can manage multiple Kubernetes clusters from a single control plane. Register external clusters with the CLI:

# First, ensure the target cluster context is in your kubeconfig
argocd cluster add production-eu-west --name production-eu-west

# Verify registration
argocd cluster list

ArgoCD will create a ServiceAccount in the target cluster and store its credentials as a Kubernetes secret in the ArgoCD namespace. Applications can then target this cluster by name in their destination.server field.

For large-scale multi-cluster setups, consider the App of Apps pattern or ApplicationSets. ApplicationSets are a controller that generates Applications dynamically based on generators — cluster lists, Git directory structures, or matrix combinations:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          environment: production
  template:
    metadata:
      name: '{{name}}-addons'
    spec:
      project: platform
      source:
        repoURL: https://github.com/your-org/cluster-addons
        targetRevision: HEAD
        path: 'addons/{{metadata.labels.region}}'
      destination:
        server: '{{server}}'
        namespace: kube-system

This single ApplicationSet deploys the appropriate addons to every cluster labeled environment: production, using each cluster’s region label to select the correct path in the repository.

Sync Strategies and Waves

When deploying complex applications with dependencies between resources, you need to control the order of deployment. ArgoCD provides two mechanisms:

Sync Phases

Resources are deployed in three phases: PreSync, Sync, and PostSync. Use Sync Hooks for resources that must complete before the main sync proceeds (database migrations, certificate issuance, etc.):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: your-app:v1.2.3
        command: ["./migrate.sh"]
      restartPolicy: Never

Sync Waves

Within the Sync phase, waves control ordering. Resources with a lower wave number are applied and must become healthy before resources with higher wave numbers are applied:

# Applied first
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"

# Applied after wave 1 is healthy
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "2"

Notifications and Alerting

ArgoCD Notifications is a standalone controller that sends alerts when Application state changes. It supports Slack, PagerDuty, GitHub commit status, email, and a dozen other providers. Configure it via the argocd-notifications-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token
  template.app-sync-failed: |
    slack:
      attachments: |
        [{
          "title": "{{.app.metadata.name}}",
          "color": "#E96D76",
          "fields": [{
            "title": "Sync Status",
            "value": "{{.app.status.sync.status}}",
            "short": true
          },{
            "title": "Message",
            "value": "{{range .app.status.conditions}}{{.message}}{{end}}",
            "short": false
          }]
        }]
  trigger.on-sync-failed: |
    - when: app.status.sync.status == 'Unknown'
      send: [app-sync-failed]
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]

Secret Management with ArgoCD

ArgoCD intentionally has no secret management built in — storing secrets in Git as plain text is never acceptable. The common patterns are:

  • Sealed Secrets (Bitnami) — Encrypts secrets with a cluster-specific key. The encrypted secret can be committed to Git; only the cluster can decrypt it.
  • External Secrets Operator — Syncs secrets from Vault, AWS Secrets Manager, GCP Secret Manager, etc. into Kubernetes secrets. The ArgoCD Application manages the ExternalSecret CRD, not the actual secret value.
  • argocd-vault-plugin — A plugin that replaces placeholder values in manifests with secrets retrieved from Vault at sync time.

The External Secrets Operator approach is the most flexible for teams already using a centralized secrets backend. The Application in ArgoCD deploys ExternalSecret objects, which the ESO controller resolves at runtime without ever touching Git.

Production Best Practices

  • Run ArgoCD in HA mode. Use manifests/ha/install.yaml with 3 replicas of the API server and multiple application controller shards for large clusters (100+ applications).
  • Pin image versions. Never use latest for the ArgoCD image itself. Pin to a specific version and upgrade deliberately.
  • Use the App of Apps pattern for bootstrapping. A single root Application deploys all other Applications. This makes cluster bootstrapping idempotent and reproducible.
  • Separate ArgoCD config from application config. Store ArgoCD Application manifests in a dedicated gitops repository, separate from application source code.
  • Enable resource tracking via annotations. Use application.resourceTrackingMethod: annotation in argocd-cm instead of the default label-based tracking, which can conflict with Helm’s own labels.
  • Set resource limits on ArgoCD controllers. Application controller CPU and memory scale with the number of resources tracked. Monitor and tune accordingly.
  • Restrict auto-sync in production. Consider requiring manual sync approval for production environments even when using GitOps — or at minimum require a PR approval gate before changes reach the target branch.
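The tracking-method switch from the list above is a one-line change in argocd-cm:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Track resources by annotation instead of the app.kubernetes.io/instance
  # label, avoiding collisions with charts that set that label themselves
  application.resourceTrackingMethod: annotation
```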

ArgoCD vs Flux

Flux v2 is the other major GitOps operator. Both are CNCF projects. The main differences in practice:

Feature             ArgoCD                                        Flux v2
UI                  Built-in web UI                               No official UI (use Weave GitOps)
Multi-cluster       Single control plane manages many clusters    Agent per cluster, pull model
ApplicationSets     Native                                        Kustomization + HelmRelease
Secret management   Plugin-based                                  SOPS native integration
Learning curve      Steeper (more concepts)                       Lower (Kubernetes-native CRDs)
CNCF status         Graduated                                     Graduated

ArgoCD wins when you need the UI, multi-cluster management from a central plane, or have a large operations team that benefits from the visual application topology view. Flux wins when you want a simpler, purely Kubernetes-native approach with better SOPS integration for secret management.

FAQ

Can ArgoCD deploy to the cluster it runs in?

Yes. The https://kubernetes.default.svc destination refers to the local cluster. ArgoCD can manage both its own cluster and external clusters simultaneously.

Does ArgoCD support private Git repositories?

Yes. Configure repository credentials via argocd repo add with SSH keys, HTTPS username/password, or GitHub App credentials. Credentials are stored as Kubernetes secrets in the ArgoCD namespace.

How does ArgoCD handle CRD installation?

CRDs can be managed by ArgoCD, but there is a chicken-and-egg problem: if a CRD is not yet installed, ArgoCD cannot validate resources that use it. The recommended pattern is to put CRDs in wave 1 and dependent resources in wave 2, or to use a separate Application for CRDs.

What is the difference between an Application and an AppProject?

An Application is the unit of deployment — it maps a Git source to a cluster destination. An AppProject is a grouping and access control boundary — it restricts what sources and destinations an Application within the project can use. Every Application belongs to exactly one AppProject.

How do I roll back a deployment with ArgoCD?

The GitOps way: revert the commit in Git and let ArgoCD reconcile. ArgoCD also provides a UI-based rollback to any previous sync revision, but this is considered a temporary measure — the Git history should always be updated to match.
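Both paths, sketched with a placeholder application name and commit:

```shell
# GitOps rollback: revert the bad commit and let ArgoCD reconcile.
git revert <bad-commit-sha>
git push origin main

# Emergency rollback: pick a previous revision ID from the sync history.
argocd app history guestbook
argocd app rollback guestbook 5
```

Note that ArgoCD refuses a CLI/UI rollback while automated sync is enabled for the application, which is another reason the Git revert is the primary path.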

Getting Started

The fastest path from zero to a working ArgoCD setup on a local cluster:

# 1. Create a local cluster (kind or minikube)
kind create cluster --name argocd-demo

# 2. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 3. Wait for pods
kubectl wait --for=condition=Ready pods --all -n argocd --timeout=120s

# 4. Get the initial admin password
argocd admin initial-password -n argocd

# 5. Port-forward and log in
kubectl port-forward svc/argocd-server -n argocd 8080:443 &
argocd login localhost:8080 --username admin --insecure

# 6. Deploy your first application
argocd app create guestbook \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace guestbook \
  --sync-policy automated

From here, the natural next steps are integrating ArgoCD with your existing CI pipeline (CI builds and pushes the image, updates the image tag in Git, ArgoCD detects the change and syncs), configuring SSO via Dex, and setting up the App of Apps pattern for managing multiple applications declaratively.

For teams looking to go deeper on GitOps and ArgoCD in production, the Kubernetes architecture patterns guide covers how ArgoCD fits into a broader platform engineering stack alongside service mesh, policy enforcement, and observability tooling.

TIBCO BusinessWorks HashiCorp Vault Integration: Secure Secrets in 3 Steps

Introduction

This article shows how to configure TIBCO BW to integrate your application with secrets stored in HashiCorp Vault, mainly to externalize and manage passwords and credential resources.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

As you probably know, TIBCO BW application configuration is stored in properties at different levels (Module or Application properties); you can read more about them here. The primary purpose of these properties is to make the application configuration flexible.

These properties can be of different types, such as String, Integer, Long, Double, Boolean, and DateTime, among other technical resources inside TIBCO BW, as shown in the picture below:

TIBCO BW Hashicorp Vault Configuration: BW Property Types

The TIBCO BW HashiCorp Vault integration affects only properties of Password type (at least up to BW versions 2.7.2/6.8.1). The reason is that these properties hold the data that is sensitive and needs to be secured; other configuration can be managed through standard Kubernetes components such as ConfigMaps.

BW Application Definition

We are going to start with a straightforward application, as you can see in the picture below:

TIBCO BW Hashicorp Vault Configuration: Property sample

Just a simple timer that will be executed once and insert the current time into the PostgreSQL database. We will use Hashicorp Vault to store the password of the database user to be able to connect to it. The username and the connection string will reside on a ConfigMap.

We will skip the configuration of deploying the TIBCO BW application containers and linking them to a ConfigMap (there is an article covering that in detail in case you need it) and focus only on the TIBCO BW HashiCorp Vault integration.

So we need to tell TIBCO BW that the password of the JDBC Shared Resource is linked to the HashiCorp Vault configuration. The first step is to tie the password of the Shared Resource to a Module Property, as shown in the picture below:

TIBCO BW Hashicorp Vault Configuration: Password linked to Module Property

Now we need to mark this Module Property as linked to HashiCorp Vault. We do that in the Application Property view by selecting that the property is linked to a Credential Management solution, as shown in the picture below:

TIBCO BW Hashicorp Vault Configuration: Credential Management Configuration for Property

This is where we establish the TIBCO BW HashiCorp Vault relationship. Click the green plus sign, and a modal window asks for the credential management technology we are going to use and the data each one needs, as you can see in the following picture:

TIBCO BW Hashicorp Vault Configuration: Credential Management Configuration for Property

We will select HashiCorp Vault as the provider. Then we need to supply three attributes that we already covered in the previous article when we started creating secrets in HashiCorp Vault:

  • Secret Name: the secret name path after the root path of the element.
  • Secret Key: the key inside the secret itself.
  • Mount Path: the root path of the secret.

To get more details about these three concepts, please look at our article about how to create secrets in Hashicorp Vault.
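To make the three concepts concrete, here is a hedged example of creating such a secret with the Vault CLI (mount path, secret name, and key are chosen for illustration):

```shell
# Mount Path: "secret" (KV v2) / Secret Name: "bw/postgres" / Secret Key: "password"
vault kv put secret/bw/postgres password='my-db-password'

# Verify the key can be read back:
vault kv get -field=password secret/bw/postgres
```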

With all this, we have pretty much everything we need to connect to HashiCorp Vault and grab the secret. From the TIBCO BusinessStudio side, everything is done: we can generate the EAR file and deploy it into Kubernetes, where the last part of our configuration takes place.

Kubernetes Deployment

Until this moment, we have the following information already provided:

  • A BW process that has the logic to connect to the database and insert information.
  • The link between the password property used to connect and the HashiCorp secret definition.

So, pretty much everything is there, but one piece is missing: how will the Kubernetes pod connect to HashiCorp Vault once it is deployed? Up to this point, we have not provided the HashiCorp Vault server location or the authentication method to connect to it. This is the missing part of the TIBCO BW HashiCorp Vault integration, and it belongs in the Kubernetes Deployment YAML file.

We will do that using the following environment variables in this sample:

TIBCO BW Hashicorp Vault Configuration: Hashicorp Environment Variables
  • HASHICORP_VAULT_ADDR: points to where the HashiCorp Vault server is located.
  • HASHICORP_VAULT_AUTH: indicates which authentication option is used. In our case, we use the token method, as in the previous article.
  • HASHICORP_VAULT_KV_VERSION: indicates which version of the KV storage engine is in use; it defaults to 2.
  • HASHICORP_VAULT_TOKEN: the token value used to authenticate against the HashiCorp Vault server.
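As a sketch, the same variables can be set on an existing Deployment with kubectl (the deployment name and Vault address are hypothetical):

```shell
kubectl set env deployment/bw-vault-demo \
  HASHICORP_VAULT_ADDR=http://vault.vault.svc.cluster.local:8200 \
  HASHICORP_VAULT_AUTH=token \
  HASHICORP_VAULT_KV_VERSION=2 \
  HASHICORP_VAULT_TOKEN="$VAULT_TOKEN"
```

In production, the token should come from a Kubernetes Secret reference in the Deployment YAML rather than a literal environment value.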

If you are using other authentication methods or just want to know more about these properties, please take a look at this documentation from TIBCO.

With all that added to the environment variables of our TIBCO BW application, we can run it and get an output similar to the one below, which shows that the TIBCO BW HashiCorp Vault integration is complete and the application started without any issue:

TIBCO BW Hashicorp Vault Configuration: Running sample

TIBCO BusinessWorks Modules Explained: Types, Limitations, and Best Practices

TIBCO BW Modules are one of the most relevant concepts in your TIBCO BW developments. Learn all the details about the different TIBCO BW Modules available and when to use each of them.

TIBCO BW has evolved in several ways and adapted to the latest architectural changes. Because of that, since the conception of the latest major version, it has introduced several concepts that are important to master in order to unleash all the power this remarkable tool provides. Today we are going to talk about Modules.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

Every TIBCO BW application is composed of modules, the components that host all the logic you create. That is the first thing to write down: all your code and everything you do in your application belongs to one TIBCO BW Module.

If we think about the normal hierarchy of TIBCO BW components, it looks something like the picture below:

TIBCO BusinessWorks Modules Explained: Types, Limitations, and Best Practices

At the top level we have the Application; at the second level, the modules; after that, the packages; and finally, the technical components such as Processes, Resources, Classes, Schemas, Interfaces, and so on. Learn more about this here.

TIBCO BW Module Classification

There are several kinds of modules, and each of them serves a specific use case and has some characteristics associated with it.

  • Application Module: The most important kind of module, because without it you cannot have an application. It is the master module, there can be only one per application, and it is where all the main logic of that application resides.
  • Shared Module: The only other BW-native module. Its main purpose, as the name shows, is to host the code and components that can be shared between several applications. If you have experience with previous versions of TIBCO BW, you can think of this module as a replacement for a Design Time Library (a.k.a. DTL); if you come from a programming language, think of it as a library imported into the code. There is no restriction on the number of applications that can use a shared module, and no limit on the number of shared modules a TIBCO BW application can have.
  • OSGi Module: The one module that is not BW-native. It does not include BW objects such as Processes or Resources and is mainly conceived to hold Java classes. Again, it is more of a helper module that can also be shared as needed. Usual scenarios for this kind of module are defining Custom XPath Functions or sharing Java code between several applications.

Both Shared Modules and OSGi Modules can be defined as Maven dependencies, using the standard Maven process to publish them to a Maven repository and retrieve them from it based on the dependency declaration.

That provides a very efficient way to distribute and version these shared components and, at the same time, mirrors the process used by other programming languages such as Java, which lowers the learning curve.
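Assuming the shared module project has been mavenized with the BW Maven plugin, distribution follows the standard Maven lifecycle (shown here as a sketch):

```shell
# From the shared module's Maven project root:
mvn clean package   # build the shared module artifact
mvn deploy          # publish it to the repository configured in distributionManagement

# Consumers then declare the module as a regular Maven dependency in their POM.
```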

TIBCO BW Module Limitations

As already mentioned, each module has some limitations or special characteristics. We should be very aware of them so we can properly distribute our code using the right kind of module.

As noted, one application can have only one TIBCO BW Application Module. Even though it is technically possible to reuse the same Application Module in more than one application, that makes no sense, because both applications would effectively be the same since their main code would be identical.

TIBCO BW Shared Modules, on the other hand, cannot declare Starter components or Activator processes; all of those must reside in the TIBCO BW Application Module.

Both the TIBCO BW Application Module and the TIBCO BW Shared Module can contain Java code, whereas the OSGi module can only contain Java code and no other TIBCO BW resources.

TIBCO BW Shared Modules can be exported in two ways: as regular modules (a ZIP file with the source code) or in binary format, which lets you share them with other developers without allowing them to view or change the implementation details. The binary format is still supported for legacy reasons, but today the recommended way to distribute the software is Maven, as discussed above.

TIBCO BW Module Use-Cases

As mentioned, each module has different use cases, which will help you decide which component works best for each scenario:

  • TIBCO BW Shared Modules cover the standard components needed by all applications. The main use case here is framework components or common patterns that simplify and homogenize development. This helps control standard capabilities such as error handling, auditing, logging, or even internal communication, so developers only need to focus on the business logic of their use case.
  • Another use case for a TIBCO BW Shared Module is encapsulating anything shared between applications, such as the Resources needed to connect to one backend, so every application that needs that backend can import the module and avoid reworking that part.
  • An OSGi Module is for Java code that has a weak relationship with the BW code, such as a component that signs a PDF document or integrates with a system through a native Java API, so we can keep it and evolve it separately from the TIBCO BW code.
  • Another case for an OSGi Module is defining the Custom XPath Functions you will need in your Shared Module or your Application Module.
  • The TIBCO BW Application Module, on the other hand, should only contain code specific to the business problem being solved, with any code usable by more than one application moved to a Shared Module.

Configure TIBCO BusinessWorks EMS Reconnection: 2 Reliable Approaches Explained

In this article we cover how TIBCO BW EMS reconnection works, how you can apply it to your application, and the pros and cons of the different options available.

One of the main issues we have all faced when working on a TIBCO BW and EMS integration is reconnection. Even though we rarely need it, thanks to the TIBCO EMS server's extreme reliability, it can have severe consequences if it is not well configured.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.


TIBCO BusinessWorks and EMS Integration: Install JMS Drivers and Fix Common Errors

TIBCO BW supports many different integration methods and hundreds of connectors that allow you to connect to any possible source. But truth be told, EMS is one of the standard connectors you need to enable. That's why TIBCO BW and EMS usually go together when it comes to a proper integration platform.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

JMS Support for TIBCO BW is out of the box, but like any other JMS implementation, you need to provide the client libraries to establish a real connection.

Since TIBCO BW 6, a simple mechanism has been provided to streamline that process, and that is what we cover in this article.

Problem description

The first thing is to recognize that you need to do something at all, and the most important part of that is learning to identify which errors are related to this problem. You can find two different errors depending on where you are testing: design time or runtime.

If we are talking about a runtime issue, you can see a trace similar to this one:

2022-06-02T13:27:15,867 ERROR [pool-13-thread-2] c.t.b.thor.runtime.model.Constituent - The following error has occurred for "name: test-app version: 1.0.0.qualifier bundle name: test-app " which needs to be resolved.
2022-06-02T13:27:15,878 ERROR [pool-13-thread-2] c.t.b.thor.runtime.model.Constituent - TIBCO-BW-FRWK-600053: Failed to initialize BW Component [ComponentStarter].
<CausedBy> com.tibco.bw.core.runtime.api.BWEngineException: TIBCO-BW-CORE-500232: Failed to initialize BW Component [ComponentStarter], Application [test-app:1.0] due to activity initialization error.
<CausedBy> com.tibco.bw.core.runtime.ActivityInitException: TIBCO-BW-CORE-500408: Failed to initialize the ProcessStarter activity [JMSReceiveMessage] in process [com.test.Starter], module [test-app] due to unexpected activity lifecycle error.
<CausedBy> java.lang.NullPointerException

Each time you see a java.lang.NullPointerException related to a JMS Receive activity, you can be sure the issue is related to the installation of the drivers.

If we are talking about design time, you will see the same error when trying to start a Run or Debug session, but additionally you will see the following error when testing a JMS Connection resource, as you can see in the picture below:

TIBCO BusinessWorks and EMS Integration: Install JMS Drivers and Fix Common Errors

Installation Process

The installation process is quite simple, but you need access to an EMS installation, or at least a disk location where the client libraries are stored. If you already have that, you just need to go to the following location:

 TIBCO_HOME/bw/<version>/bin

Where TIBCO_HOME is the installation folder for the BusinessWorks application, and version is the minor version format (such as 6.7, 2.7, 6.8, and so on).

At this location, you will run the following command:

 ./bwinstall ems-driver

This will start and ask for the location of the client libraries, as you can see in the picture below:

TIBCO BusinessWorks and EMS Integration: Install JMS Drivers and Fix Common Errors

After that, it runs the installation process and ends with a BUILD SUCCESSFUL output. At that point, you need to restart Business Studio or the runtime components (such as AppNodes or bwagent) for the configuration to be applied.

Solution for Transformation failed for XSLT input in BusinessWorks

Solving one of the most common developer issues using BusinessWorks

The Transformation failed for XSLT input is one of the most common error messages you could see when developing using the TIBCO BusinessWorks tools. Understanding what the message is saying is essential to provide a quick solution.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

I have seen developers spend hours and hours troubleshooting this kind of error when, most of the time, all the information you need is right in front of you; you just need to understand what the engine is complaining about.

But let’s provide a little bit of context first. What is this error that we’re talking about? I’m talking about something like what you can see in the log trace below:

...
com.tibco.pvm.dataexch.xml.util.exceptions.PmxException: PVM-XML-106027: Transformation failed for XSLT input '<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tns1="http://www.tibco.com/pe/WriteToLogActivitySchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:tib="http://www.tibco.com/bw/xslt/custom-functions" version="2.0"><xsl:param name="Mapper2"/><xsl:template name="Log-input" match="/"><tns1:ActivityInput><message><xsl:value-of select="tib:add-to-dateTime($Mapper2/primitive,0,0,1,0,0,0)"/></message></tns1:ActivityInput></xsl:template></xsl:stylesheet>'
	at com.tibco.pvm.infra.dataexch.xml.genxdm.expr.IpmxGenxXsltExprImpl.eval(IpmxGenxXsltExprImpl.java:65)
	at com.tibco.bx.core.behaviors.BxExpressionHelper.evalAsSubject(BxExpressionHelper.java:107)
...

Sounds more familiar now? As I said, all TIBCO BusinessWorks developers have faced this, and I have seen some of them struggle or even redo the work repeatedly without finding a proper solution. The idea here is to solve it efficiently and quickly.

I will start with an initial warning: avoid re-coding, redoing something that should work but is not working, because you do not win in any scenario:

  • If you redo it and it still doesn't work, you spend twice the time creating something you already had, and you are still stuck.
  • If you redo it and it works, you don't know what was wrong, so you will face it again shortly.

So, how does this work? Let's use this process:

Collect → Analyze → Understand → Fix.

Scenario

So first we need to collect data. Most of the time, the log trace we have is enough, but at some point you may also need the source code of your process to pinpoint the error and its solution.

So we will start with this source code that you can see in the screenshot below and the following explanation:

Solution for Transformation failed for XSLT input in BusinessWorks

We have a Mapper that defines a schema with an int element that is optional, and we are not providing any value for it:

Solution for Transformation failed for XSLT input in BusinessWorks

Then we use that value to print a log message, adding one more day to the date:

Solution for Transformation failed for XSLT input in BusinessWorks

Solution

First, we need to understand the error message. An error message titled Transformation failed for XSLT input is saying precisely this:

I tried to execute the XSLT that is related to one activity, and I failed because of a technical error

As you probably already know, each BusinessWorks activity executes an XSL transformation internally to do the input mapping; you can see it directly in Business Studio. So this error is telling you that this internal XSLT is failing.

Solution for Transformation failed for XSLT input in BusinessWorks

The reason is a runtime issue that cannot be detected at design time: the values of some of your variables make the transformation fail. That is what you should focus on first. Usually, all the information is in the log trace itself, so let's start analyzing it:

Solution for Transformation failed for XSLT input in BusinessWorks

So, let’s see all the information that we have here:

  • First of all, we can detect which activity is failing. That is easy if you are debugging locally, but it can be trickier if this is happening on the server. Even then, the full XSLT is printed in the log trace, so you can easily identify the activity it belongs to.
  • You also have a Caused by that tells you why this is failing:
    • There can be several Caused by entries, which should be read as a cascade: the lowest one is the root issue generating all the errors above it, so we should locate that one first.
Solution for Transformation failed for XSLT input in BusinessWorks
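Locating the lowest Caused by can even be scripted; a quick sketch with standard shell tools (the log file and its contents are a made-up example):

```shell
# Write a trimmed example trace to a file, then print the last <CausedBy>
# line, which is the root cause of the whole cascade.
cat > /tmp/xslt-error.log <<'EOF'
PVM-XML-106027: Transformation failed for XSLT input '...'
<CausedBy> com.tibco.bw.core.runtime.ActivityInitException: activity error.
<CausedBy> com.tibco.xml.cxf.runtime.exceptions.FunctionException: XPath function add-to-dateTime exception: gregorian cannot be null.
EOF

grep '<CausedBy>' /tmp/xslt-error.log | tail -n 1
# -> prints the FunctionException line (the root cause)
```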

In this case, the message is quite evident, as you can see in the trace below.

 com.tibco.xml.cxf.runtime.exceptions.FunctionException: XPath function {http://www.tibco.com/bw/xslt/custom-functions}add-to-dateTime exception: gregorian cannot be null.

So the add-to-dateTime function fails because one Gregorian argument (so, a date) is null. And that’s precisely what is happening in my case. If I provide a value to the parameter… Voilà, it is working!

Solution for Transformation failed for XSLT input in BusinessWorks

Summary

Similar situations can happen with different root causes, but the most common ones are:

  • Issues with optional and mandatory elements, so a null reaches a point where it should not.
  • Validation errors because the input parameter doesn’t match the field definition.
  • Extended XML Types that are not supported by the function used.

All these issues can be easily and quickly solved following the reasoning we explained in this post!

So, let’s put it into practice next time you see a colleague with that issue and help them have a more efficient programming experience!

Optimize TIBCO BW SOAP API Response Time for Large Payloads

Discover how to fine-tune your SOAP services to be able to be efficient for massive payload requests

We all know that when we implement an API, we need to carefully design the maximum request size we can manage. For example, for an online request the usual upper limit is 1 MB; anything beyond that should be handled differently (options include slicing the requests or using protocols other than HTTP for this kind of load). But then real life comes to tackle us.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

It is not always possible to stick to the plan, and we are not always the ones making that decision. We can argue as long as we like that this is not a good idea, but at the same time, we need to deliver something that works.

By default, when we expose a SOAP service on TIBCO BusinessWorks, it relies on third-party libraries to manage the request, parse it, and help us access the request's content. Some of them come from the Apache Foundation, and the one we are going to talk about is Apache Commons.

When we send a big request to our SOAP service (in my case, an 11 MB SOAP request), I start to see the following behavior:

Optimize TIBCO BW SOAP API Response Time for Large Payloads
  • Service is not responding to my consumer.
  • I can see significant usage of the HTTP Connector thread handling the request before sending it to the actual service.
  • CPU and memory are increasing a lot.

So, how can we improve that? The first thing is to dig into the details of that heavy CPU usage. If we look at what the HTTP Connector threads are running, we see the following stack trace:

Optimize TIBCO BW SOAP API Response Time for Large Payloads

You can get that information from several sources:

  • One option is using JVisualVM and going to the snapshot details in the samples, as I did.
  • You can also get a thread dump and use tools such as https://fastthread.io/index.jsp to visualize it graphically.
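If JVisualVM is not at hand, a plain thread dump works too; a sketch assuming JDK tools are on the PATH and the engine process matches "bwappnode" (adjust the pattern to your process name):

```shell
# Find the BW engine JVM PID and capture a thread dump to a file.
pid="$(pgrep -f bwappnode | head -n 1)"
jstack "$pid" > /tmp/bw-threads.txt
```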

Here we can see that we are stuck in the log method. That is strange: why am I logging these requests if I have not enabled that in the BusinessWorks configuration? The answer is quite simple: the Apache libraries have their own logging system, which is not affected by the Logback configuration BusinessWorks uses.

So, we can disable that using the following JVM property:

-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.NoOpLog
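Where exactly to set it depends on how you run BW. For a containerized application, one hedged option is the BW_JAVA_OPTS environment variable (the deployment name is hypothetical; check your BW version's documentation for the exact variable it honors):

```shell
kubectl set env deployment/soap-service \
  BW_JAVA_OPTS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.NoOpLog"
```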

Response time has improved from 120 seconds for 11 MB to less than 3 seconds, including all the logic the service was doing. Pretty impressive, right?

Optimize TIBCO BW SOAP API Response Time for Large Payloads

Summary

I hope you found this interesting. If you are facing this issue right now, you now have the information you need to get past it. If you would like to submit questions, feel free to use one of the following options:

  • Twitter: You can send me a mention at @alexandrev on Twitter or a DM or even just using the hashtag #TIBFAQS that I will monitor.
  • Email: You can email me to alexandre.vazquez at gmail.com with your question.
  • Instagram: You can send me a DM on Instagram at @alexandrev

#TIBFAQS Enabling Remote Debugging for TIBCO BusinessWorks Application on Kubernetes

Provide more agility to your troubleshooting efforts by debugging exactly where the error is happening using Remote Debugging techniques

#TIBFAQS Enabling Remote Debugging for TIBCO BusinessWorks Application on Kubernetes
Photo by Markus Winkler on Unsplash

The container revolution has provided a lot of benefits, as we have discussed in depth in other articles, and at the same time it has introduced some new challenges we need to tackle.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

All the agility we now put in the hands of our developers also needs to extend to maintenance and bug fixing; we need to be agile there as well. We all know the main issues here: "It works on my environment", "With the data set I have, I couldn't see the issue", or "I couldn't reproduce the error" are sentences we hear over and over, and they delay the resolution of errors or improvements. Even when the solution is simple, we struggle to get a real scenario to test.

This is where remote debugging comes in. Remote debugging is, as its name clearly states, the ability to debug something that is not local but remote. Since its conception it has been a staple of mobile development, because no matter how good the simulator is, you always need to test on a real device to make sure everything works properly.

This is the same concept applied to a container: we have a TIBCO BusinessWorks application running on Kubernetes, and we want to debug it as if it were running locally, as shown in the image above. To do that, we need to follow these steps:

Enabling the Remote Debugging in the pod

The first step is to enable the remote debug option in the application. To do that, we use the internal API that BusinessWorks provides, executing the following from inside the container:

curl -X POST 'http://localhost:8090/bw/bwengine.json/debug/?interface=0.0.0.0&port=5554&engineName=Main'

If we do not have a tool like curl or wget available inside the container, we can always use the port-forward strategy to make the pod's 8090 port accessible and enable the debug port with a command similar to the one below:

kubectl port-forward hello-world-test-78b6f9b4b-25hss 8090:8090

Then we can hit it from our local machine to enable remote debugging.

Make the Debug Port accessible to the Studio

For remote debugging, we need to connect our local TIBCO BusinessStudio to the specific pod executing the load, and for that we need access to the debug port. We have mainly two options, shown in the subsections below: exposing the port at the pod level, or port-forwarding.

Expose the port at the Pod Level

We need the debug port open in our pod. To do that, we define another port to expose, one not in use by the application and different from the default administration port (8090). In my case, I use 5554 as the debug port and define it as an additional exposed port.

#TIBFAQS Enabling Remote Debugging for TIBCO BusinessWorks Application on Kubernetes
Definition of the debug port as a Service
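As a sketch, the same thing can be done imperatively with kubectl (pod and service names are illustrative; in practice you would declare the port in the Service manifest):

```shell
# Expose the debug port (5554) of the BW pod through a dedicated Service.
kubectl expose pod hello-world-test-78b6f9b4b-25hss \
  --port=5554 --target-port=5554 --name=bw-debug
```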

Port-Forwarding Option

If we do not want to expose the debug port all the time (it will not be used unless we are actually remote debugging), the other option is to port-forward the debug port to our local machine:

kubectl port-forward hello-world-test-78b6f9b4b-cctgh 5554:5554

Connection to the TIBCO Business Studio

Now that we have everything ready, we need to connect our local TIBCO Business Studio to the pod. To do that, we follow these steps:

We go to Run → Debug Configurations and select the Remote BusinessWorks option.

Selection of the Remote BusinessWorks application option in the Debug Configuration

And now we need to provide the connection details. In this case, we use localhost and port 5554, and click the Debug button.

Setting the connection properties for the Remote Debugging

From that moment, a connection is established between both environments: the pod running on our Kubernetes cluster and our local TIBCO Business Studio. And as soon as we send a request to the container, we can see the execution in our local environment:

Remote Debugging execution from our TIBCO Business Studio instance

Summary

I hope you found this interesting, and if you are facing this issue right now, you now have the information you need to avoid being blocked by it. If you would like to submit your questions, feel free to use one of the following options:

  • Twitter: You can send me a mention or a DM at @alexandrev on Twitter, or just use the hashtag #TIBFAQS, which I monitor.
  • Email: You can send me an email to alexandre.vazquez at gmail.com with your question.
  • Instagram: You can send me a DM on Instagram at @alexandrev

How to Check TIBCO BusinessWorks Configuration at Runtime (OSGi lcfg Command)


Discover how the OSGi lcfg command can help you be sure which configuration is in effect at runtime.

Knowing the TIBCO BW configuration at runtime has become critical, as you always need to know whether the latest changes have been applied, or you may just want to check the specific value of a Module Property as part of your development.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

When we talk about applications deployed in the cloud, one of the key concerns is Configuration Management. Especially if we add things like Kubernetes, containers, and external configuration management systems into the mix, things get tricky.

The usual approach for configuration management in a Kubernetes environment is to use ConfigMaps or Spring Cloud Config.

When the configuration can be uploaded in a separate step from deploying the application, you can end up in a situation where you are not sure which configuration a running BusinessWorks application actually has.

To check the TIBCO BW configuration, there is an easy way to know exactly the current values:

  • We just need to get inside the container to access the internal OSGi console, which allows us to execute administrative commands.
  • We have spoken about this API before, but in case you would like to take a deeper look, you just need to check this link:
  • One of those commands is lcfg, which shows the configuration being used by the running application:
curl "localhost:8090/bw/framework.json/osgi?command=lcfg"
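If you prefer not to open a shell inside the container, the same call can be launched from your workstation through kubectl exec. A minimal sketch that prints the command to run (the pod name is illustrative; replace it with your own):

```shell
# Illustrative pod name; replace with your own.
POD="hello-world-test-78b6f9b4b-25hss"
# Quote the URL so the shell does not treat '?' as a glob character.
LCFG_URL="localhost:8090/bw/framework.json/osgi?command=lcfg"

# Command to run from the workstation instead of from inside the pod:
echo "kubectl exec ${POD} -- curl -s \"${LCFG_URL}\""
```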

With an output similar to this:

Sample output for the lcfg command of a Running BusinessWorks Container Application

Summary

I hope you found this interesting, and if you are facing this issue right now, you now have the information you need to avoid being blocked by it. If you would like to submit your questions, feel free to use one of the following options:

  • Twitter: You can send me a mention or a DM at @alexandrev on Twitter, or just use the hashtag #TIBFAQS, which I monitor.
  • Email: You can send me an email to alexandre.vazquez at gmail.com with your question.
  • Instagram: You can send me a DM on Instagram at @alexandrev