Kubernetes Security Best Practices: A Production Hardening Guide

Kubernetes security is not a single feature you enable — it is a layered discipline that spans the control plane, workloads, networking, supply chain, and runtime. This guide covers the security controls that matter most in production, why each one exists, and how to implement them without breaking your cluster.

The Kubernetes Attack Surface

Before hardening anything, understand what you are protecting. A Kubernetes cluster has several distinct attack surfaces:

  • API server — The central control plane. Any entity that can reach it with valid credentials can read cluster state, modify workloads, or escalate privileges.
  • etcd — Stores all cluster state in plain text, including Secrets. Direct etcd access is equivalent to root on every node.
  • Nodes — A compromised node can access all Secrets mounted on pods running on it, access the kubelet API, and potentially escape to the hypervisor.
  • Pods — Privileged pods, host-network pods, and pods with excessive capabilities can break container isolation.
  • Supply chain — Malicious images, compromised registries, and unsigned artifacts can introduce attacker-controlled code into your cluster.
  • RBAC — Overly permissive roles allow lateral movement and privilege escalation once an attacker gains any foothold.

The controls below address each of these surfaces. Prioritize based on your threat model — a public-facing multi-tenant cluster needs all of them; an internal development cluster can relax some.

1. RBAC: Least Privilege from Day One

Role-Based Access Control is Kubernetes’ primary authorization mechanism. Most clusters fail at RBAC not because it is misconfigured, but because it is over-permissive by default and nobody reviews it systematically.

Common RBAC Mistakes

  • Binding to cluster-admin for convenience. Almost no workload needs cluster-admin. Use namespaced roles wherever possible.
  • Using * verbs or resources in roles. Wildcard permissions are almost always broader than intended.
  • Not auditing ServiceAccount token usage. Every pod gets a ServiceAccount. The default ServiceAccount in most namespaces has no permissions, but custom workloads often get over-permissive SAs.
  • Forgetting automountServiceAccountToken: false. If a workload does not need to talk to the Kubernetes API, disable token mounting entirely.
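
A minimal example of opting out — the pod name and image here are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  # This workload never calls the Kubernetes API, so mount no token
  automountServiceAccountToken: false
  containers:
  - name: app
    image: registry.company.com/batch-worker:v1.0.0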

Practical RBAC Patterns

For a workload that only needs to read ConfigMaps in its own namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-reader
  namespace: my-app
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: my-app
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io

Audit existing RBAC with kubectl-who-can or rbac-tool to find overly permissive bindings before attackers do.
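
Built-in tooling helps here too: kubectl auth can-i shows what a given identity is allowed to do. A quick sketch, reusing the ServiceAccount from the example above:

# Everything the my-app ServiceAccount can do in its namespace
kubectl auth can-i --list \
  --as=system:serviceaccount:my-app:my-app \
  -n my-app

# Spot-check one permission across all namespaces
kubectl auth can-i get secrets --all-namespaces \
  --as=system:serviceaccount:my-app:my-app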

2. Pod Security Standards

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. Its replacement is Pod Security Admission (PSA), a built-in admission controller that enforces one of three security profiles at the namespace level:

  • Privileged — No restrictions. For system components only.
  • Baseline — Prevents the most critical privilege escalations: privileged containers, hostPID, hostIPC, hostNetwork, dangerous capabilities.
  • Restricted — Enforces current hardening best practices. Requires running as non-root, dropping all capabilities, and using a restricted seccomp profile.

Enable enforcement at the namespace level with labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.30
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.30
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.30

A pod that runs as root or requests host-network in a namespace enforcing restricted will be rejected at admission. The warn and audit modes let you test before enforcing.
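
You can also preview the blast radius before enforcing: a server-side dry run reports every existing pod that would violate the profile (namespace name taken from the example above):

kubectl label --dry-run=server --overwrite ns production \
  pod-security.kubernetes.io/enforce=restricted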

PSA covers the most critical pod-level escalations, but it is coarse-grained. For fine-grained policy control, use Kyverno alongside PSA.

3. Network Policies: Micro-Segmentation

By default, every pod in a Kubernetes cluster can communicate with every other pod across all namespaces. This is a flat network model that gives attackers unrestricted lateral movement once they compromise any workload.

Network Policies define L3/L4 allow-rules for pod-to-pod communication. They are enforced by your CNI plugin (Calico, Cilium, Weave — not Flannel, which does not support NetworkPolicy).

Default Deny Pattern

Start by denying all ingress and egress in every namespace, then open only what is explicitly needed:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then allow specific traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 5432

Do not forget DNS egress — most workloads need to resolve names via kube-dns, which requires port 53 egress (UDP, plus TCP for larger responses) to the kube-system namespace.
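
A policy like the following covers it — a sketch assuming the standard CoreDNS pod label (k8s-app: kube-dns) and the automatic kubernetes.io/metadata.name namespace label:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53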

4. Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted, and are stored in etcd in plain text by default. Anyone with get permission on Secrets can read them. This is not a vulnerability — it is a design decision that puts the responsibility on you to:

  • Enable encryption at rest for etcd. Configure EncryptionConfiguration with an AES-CBC or AES-GCM provider. This encrypts Secrets before they are written to etcd.
  • Use external secret stores. HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault with the External Secrets Operator keeps the source of truth for secret values outside Kubernetes, with rotation handled centrally (see the example after the encryption config below).
  • Restrict Secret RBAC aggressively. Never give list on Secrets cluster-wide — it returns all values. Use get on named resources where possible.
  • Avoid environment variables for secrets. Prefer volume mounts. Env vars are visible in pod inspect output and can leak through application logging.

# etcd encryption at rest - in kube-apiserver config
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {}
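
For the external-store route mentioned above, the External Secrets Operator materializes the in-cluster Secret from a definition like this — a sketch assuming a ClusterSecretStore named vault-backend and a Vault path prod/db:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    # The Kubernetes Secret that ESO creates and keeps in sync
    name: db-credentials
  data:
  - secretKey: password
    remoteRef:
      key: prod/db
      property: password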

5. Image Security and Supply Chain

Your runtime security posture is only as good as the images you run. A compromised image from a public registry bypasses every runtime control you have.

Scan images in CI

Use Trivy, Grype, or Snyk to scan images as part of your CI pipeline. Block deployments of images with critical CVEs:

# In your CI pipeline
trivy image --exit-code 1 --severity CRITICAL your-image:tag

Use a private registry with admission control

Only allow images from your private registry using an admission webhook (Kyverno, OPA Gatekeeper). This prevents developers from running arbitrary public images in production:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: validate-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Images must come from registry.company.com"
      pattern:
        spec:
          containers:
          - image: "registry.company.com/*"

Use distroless or minimal base images

Distroless images contain only the application and its runtime dependencies — no shell, no package manager, no debugging tools. This drastically reduces the attack surface and the number of CVEs. Google’s distroless images are available for Java, Node.js, Python, and Go.

Sign and verify images

Cosign (from the Sigstore project) lets you sign container images and verify signatures at admission time using Kyverno or Connaisseur. This prevents image substitution attacks where an attacker replaces a legitimate image in your registry.
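
A minimal key-based signing flow with cosign looks like this — the registry path is illustrative:

# Generate a signing keypair (creates cosign.key / cosign.pub)
cosign generate-key-pair

# Sign the image in your registry
cosign sign --key cosign.key registry.company.com/app:v1.2.3

# Verify the signature (this is what the admission controller does for you)
cosign verify --key cosign.pub registry.company.com/app:v1.2.3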

6. Runtime Security

Runtime security detects and responds to malicious activity after a container is running. The primary tool in this space is Falco — a CNCF project that uses eBPF to monitor system calls and raise alerts when containers behave unexpectedly.

Default Falco rules catch common attack patterns:

  • Shell spawned in a container
  • Network connection to an unexpected IP
  • Write to a sensitive file path (/etc/passwd, /etc/shadow)
  • Privilege escalation via setuid binaries
  • Container drift (new executable files written at runtime)

Combine Falco with seccomp profiles to restrict the system calls a container can make at the kernel level. The RuntimeDefault seccomp profile (which can be made the cluster-wide default via the SeccompDefault feature, GA since Kubernetes 1.27) blocks a few dozen dangerous system calls that containers virtually never need.

spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 65534
      capabilities:
        drop: ["ALL"]

These four securityContext settings together (allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, runAsNonRoot: true, capabilities.drop: ALL) make container escape significantly harder and satisfy the Kubernetes Restricted pod security standard.

7. API Server Hardening

The API server is the most critical component to harden. Key settings:

  • Disable anonymous authentication. --anonymous-auth=false ensures every request is authenticated.
  • Enable audit logging. Log all API server requests to a file or webhook. Without audit logs, you cannot investigate incidents or detect RBAC abuse.
  • Restrict admission plugins. Ensure NodeRestriction is enabled — it prevents node kubelets from modifying objects outside their own node.
  • Do not expose the API server to the internet. Use a VPN, bastion host, or private endpoint. If you must expose it, restrict access by IP.

# Minimal audit policy - log all requests at metadata level, with full
# request bodies for ConfigMaps. Keep Secrets at Metadata level only,
# so secret values never end up in the audit log itself.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
- level: RequestResponse
  resources:
  - group: ""
    resources: ["configmaps"]
- level: Metadata
  omitStages: ["RequestReceived"]

8. etcd Security

etcd stores all cluster state. Treat it as sensitive as your production database:

  • Enable TLS for all etcd communication. Both peer communication (etcd-to-etcd) and client communication (apiserver-to-etcd) must use mutual TLS.
  • Restrict network access to etcd. etcd should only be reachable by the API server. Use firewall rules or security groups to enforce this.
  • Enable encryption at rest. As described in the Secrets section above.
  • Backup etcd regularly. An etcd snapshot is a complete copy of all cluster state, including all Secrets. Encrypt backups and store them separately from the cluster.
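
A snapshot with etcdctl looks like this — paths and endpoints are illustrative and depend on your control-plane layout:

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key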

9. CIS Kubernetes Benchmark

The CIS Kubernetes Benchmark is a comprehensive checklist of security controls covering the control plane, nodes, and workloads. Running kube-bench against your cluster gives you a scored assessment against the CIS controls:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs $(kubectl get pods -l app=kube-bench -o name)

kube-bench outputs PASS/FAIL/WARN for each control with remediation guidance. Run it after initial cluster setup and after major configuration changes.

10. Continuous Security Posture with Kubescape

Kubescape and similar tools (Trivy Operator — the successor to Starboard — and kube-score) provide continuous security scanning of live cluster state — not just a one-time audit. They check workloads against NSA/CISA hardening guidelines, the MITRE ATT&CK framework, and the CIS benchmark in real time.

Deploy Trivy Operator for continuous in-cluster scanning:

helm repo add aquasecurity https://aquasecurity.github.io/helm-charts/
helm install trivy-operator aquasecurity/trivy-operator \
  --namespace trivy-system \
  --create-namespace \
  --set="trivy.ignoreUnfixed=true"

Trivy Operator creates VulnerabilityReport, ConfigAuditReport, and RbacAssessmentReport custom resources in the same namespace as each workload. These can be scraped by Prometheus and displayed in Grafana for a security dashboard.

Security Hardening Checklist

  • ✅ RBAC reviewed — no wildcard roles, no unnecessary cluster-admin bindings
  • ✅ ServiceAccount token automount disabled for workloads that do not need API access
  • ✅ Pod Security Standards enforced at namespace level (at least Baseline, Restricted where possible)
  • ✅ Network policies deployed — default deny with explicit allows
  • ✅ Secrets encrypted at rest in etcd
  • ✅ Images scanned in CI — no critical CVEs in production
  • ✅ Private registry enforced via admission control
  • ✅ Container securityContext hardened (non-root, read-only filesystem, no capabilities)
  • ✅ seccomp RuntimeDefault profile enabled
  • ✅ API server audit logging enabled
  • ✅ etcd TLS and network access restricted
  • ✅ kube-bench run and critical/high findings remediated
  • ✅ Runtime security (Falco) deployed and alerts routed to on-call
  • ✅ Continuous scanning (Trivy Operator or Kubescape) deployed

FAQ

Where do I start if my cluster has no security controls today?

Start with the highest-impact, lowest-effort controls first: audit your RBAC (revoke cluster-admin where not needed), enable Pod Security Admission in warn mode on all namespaces, and deploy Trivy Operator. These three steps give you immediate visibility and prevent the most common privilege escalations without breaking anything.

Does enabling Network Policies break DNS resolution?

Yes, if you deploy a default-deny egress policy without explicitly allowing DNS. Add an egress rule allowing UDP port 53 to the kube-dns service in kube-system when applying default-deny network policies.

Is Kubernetes certified for PCI-DSS or SOC 2?

Kubernetes itself is not certified — your configuration and the controls you implement determine compliance. The CIS Kubernetes Benchmark maps to many PCI-DSS and SOC 2 requirements. Managed Kubernetes offerings (EKS, GKE, AKS) have their own compliance certifications for the underlying infrastructure.

Should I use OPA Gatekeeper or Kyverno?

Both enforce admission policies, but Kyverno is Kubernetes-native (policies are written as YAML) while Gatekeeper uses Rego (a purpose-built policy language). For teams without Rego expertise, Kyverno is significantly faster to adopt and maintain. For teams already using OPA elsewhere in their stack, Gatekeeper offers consistency. Both integrate well with GitOps workflows.

How often should I update Kubernetes for security patches?

Follow a patch release within 30 days of release for CVEs rated High or Critical. Minor version upgrades (e.g., 1.29 → 1.30) should happen within the support window — Kubernetes maintains the last three minor versions. Falling more than one minor version behind means running without security patches for a growing subset of the codebase.

For a deeper look at how security fits into the broader Kubernetes platform architecture, see the Kubernetes architecture patterns guide and the guide on building a security-first Kubernetes culture.

ArgoCD Guide: GitOps Continuous Delivery for Kubernetes

ArgoCD has become the de facto standard for GitOps-based continuous delivery in Kubernetes. If you are running production workloads on Kubernetes and still deploying with raw kubectl apply or untracked Helm releases, ArgoCD solves a class of problems you may not even know you have yet. This guide covers everything from core concepts to production-grade configuration.

The Problem ArgoCD Solves

Traditional CI/CD pushes deployments into a cluster. A CI system runs tests, builds an image, and then executes kubectl apply or helm upgrade against the cluster. This model has several structural problems:

  • Drift goes undetected. Someone applies a hotfix directly to the cluster. Now your Git repository no longer reflects reality, and nobody knows it.
  • No single source of truth. The cluster state is authoritative, not Git. Your desired state and actual state can diverge silently.
  • Rollback is painful. Rolling back a bad deployment means re-running old CI pipelines or manually reversing changes, neither of which is fast.
  • Multi-cluster management compounds the problem. Each cluster becomes a snowflake with its own history of undocumented changes.

GitOps inverts this model. Git is the source of truth. The cluster pulls its desired state from Git and continuously reconciles toward it. ArgoCD is the most mature GitOps operator for Kubernetes, implementing this pull-based model with a production-ready feature set.

How ArgoCD Works: Core Architecture

ArgoCD runs as a set of controllers inside your Kubernetes cluster. The core components are:

  • Application Controller — Watches both the Git repository and the live cluster state. Computes the diff and drives reconciliation.
  • API Server — Exposes the gRPC/REST API consumed by the CLI, UI, and external systems.
  • Repository Server — Generates Kubernetes manifests from source (Helm, Kustomize, plain YAML, Jsonnet).
  • Redis — Caches cluster state and repository data to reduce API server load.
  • Dex (optional) — Provides OIDC authentication for SSO integration.

The fundamental unit in ArgoCD is an Application — a CRD that maps a source (a path in a Git repo at a specific revision) to a destination (a namespace in a cluster). ArgoCD continuously compares the desired state from Git with the live state in the cluster and reports on the sync status.

Sync Status vs Health Status

Two orthogonal concepts you need to understand from day one:

  • Sync Status — Does the live state match what Git says it should be? Values: Synced, OutOfSync, Unknown.
  • Health Status — Is the application actually working? Values: Healthy, Progressing, Degraded, Suspended, Missing, Unknown.

An application can be Synced but Degraded — the manifests were applied correctly, but a pod is crash-looping. Conversely, it can be OutOfSync but Healthy — someone applied a change directly to the cluster outside of Git.

Installing ArgoCD

The official installation method uses a single manifest. For production, always pin to a specific version:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.11.0/manifests/install.yaml

This deploys ArgoCD in the argocd namespace with full cluster-admin access. For a production HA setup, use the manifests/ha/install.yaml variant, which deploys multiple replicas of the API server and application controller.

Accessing the UI and CLI

The initial admin password is auto-generated and stored in a secret:

argocd admin initial-password -n argocd

For local access, port-forward the API server:

kubectl port-forward svc/argocd-server -n argocd 8080:443

Then log in via the CLI:

argocd login localhost:8080 --username admin --password <password> --insecure

For production, expose the ArgoCD server via an Ingress or LoadBalancer with a proper TLS certificate. If you’re using NGINX Ingress Controller:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd-server-ingress
  namespace: argocd
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  rules:
  - host: argocd.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: argocd-server
            port:
              number: 443

Defining Your First Application

Applications can be created via the UI, the CLI, or declaratively with a YAML manifest. The declarative approach is the recommended one — it means your ArgoCD configuration itself is in Git:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/your-app
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Key fields to understand:

  • targetRevision — Can be a branch name, tag, or commit SHA. For production, pin to a tag rather than HEAD.
  • path — The directory within the repo containing your Kubernetes manifests.
  • automated.prune — Automatically delete resources that are no longer in Git. Required for true GitOps but use carefully — it will delete things.
  • automated.selfHeal — Automatically revert manual changes made directly to the cluster. This is what enforces Git as the single source of truth.

Helm Integration

ArgoCD has native Helm support. It can deploy Helm charts directly from chart repositories or from your Git repository. You can override values per environment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 58.4.0
    helm:
      releaseName: prometheus-stack
      valuesObject:
        grafana:
          adminPassword: "${GRAFANA_PASSWORD}"
        prometheus:
          prometheusSpec:
            retention: 30d
            storageSpec:
              volumeClaimTemplate:
                spec:
                  storageClassName: fast-ssd
                  resources:
                    requests:
                      storage: 50Gi
  destination:
    server: https://kubernetes.default.svc
    namespace: observability

One important nuance: ArgoCD renders Helm charts with helm template and applies the resulting manifests itself — it does not run helm install. Helm hooks (pre-install, post-upgrade, etc.) are translated to ArgoCD hooks, but the release is not tracked in Helm’s release history, so running helm list will not show ArgoCD-managed releases.

Projects: Multi-Tenancy and Access Control

ArgoCD Projects provide multi-tenancy within a single ArgoCD instance. They let you restrict which source repositories, destination clusters, and namespaces a team can deploy to. Every Application belongs to a Project.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform-team
  namespace: argocd
spec:
  description: Platform team applications
  sourceRepos:
  - 'https://github.com/your-org/*'
  destinations:
  - namespace: 'platform-*'
    server: https://kubernetes.default.svc
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  namespaceResourceBlacklist:
  - group: ''
    kind: ResourceQuota

Projects are where you define the boundaries of what each team can do. The default project has no restrictions — never use it for production workloads. Create dedicated projects per team or per environment.

RBAC Configuration

ArgoCD has its own RBAC system layered on top of Kubernetes RBAC. It is configured via the argocd-rbac-cm ConfigMap. Roles are defined per project or globally:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Platform team has full access to platform-team project
    p, role:platform-admin, applications, *, platform-team/*, allow
    p, role:platform-admin, projects, get, platform-team, allow
    p, role:platform-admin, repositories, *, *, allow

    # Dev team can sync but not delete
    p, role:developer, applications, get, */*, allow
    p, role:developer, applications, sync, */*, allow
    p, role:developer, applications, action/*, */*, allow

    # Bind SSO groups to roles
    g, your-org:platform-team, role:platform-admin
    g, your-org:developers, role:developer

The policy.default: role:readonly ensures that any authenticated user who has no explicit role assignment gets read-only access — a safe default for production.

Multi-Cluster Management

ArgoCD can manage multiple Kubernetes clusters from a single control plane. Register external clusters with the CLI:

# First, ensure the target cluster context is in your kubeconfig
argocd cluster add production-eu-west --name production-eu-west

# Verify registration
argocd cluster list

ArgoCD will create a ServiceAccount in the target cluster and store its credentials as a Kubernetes secret in the ArgoCD namespace. Applications can then target this cluster via its API server URL in destination.server, or via its registered name in destination.name.

For large-scale multi-cluster setups, consider the App of Apps pattern or ApplicationSets. ApplicationSets are a controller that generates Applications dynamically based on generators — cluster lists, Git directory structures, or matrix combinations:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          environment: production
  template:
    metadata:
      name: '{{name}}-addons'
    spec:
      project: platform
      source:
        repoURL: https://github.com/your-org/cluster-addons
        targetRevision: HEAD
        path: 'addons/{{metadata.labels.region}}'
      destination:
        server: '{{server}}'
        namespace: kube-system

This single ApplicationSet deploys the appropriate addons to every cluster labeled environment: production, using each cluster’s region label to select the correct path in the repository.

Sync Strategies and Waves

When deploying complex applications with dependencies between resources, you need to control the order of deployment. ArgoCD provides two mechanisms:

Sync Phases

Resources are deployed in three phases: PreSync, Sync, and PostSync. Use Sync Hooks for resources that must complete before the main sync proceeds (database migrations, certificate issuance, etc.):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: your-app:v1.2.3
        command: ["./migrate.sh"]
      restartPolicy: Never

Sync Waves

Within the Sync phase, waves control ordering. Resources with a lower wave number are applied and must become healthy before resources with higher wave numbers are applied:

# Applied first
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "1"

# Applied after wave 1 is healthy
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "2"

Notifications and Alerting

ArgoCD Notifications is a standalone controller that sends alerts when Application state changes. It supports Slack, PagerDuty, GitHub commit status, email, and a dozen other providers. Configure it via the argocd-notifications-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token
  template.app-sync-failed: |
    slack:
      attachments: |
        [{
          "title": "{{.app.metadata.name}}",
          "color": "#E96D76",
          "fields": [{
            "title": "Sync Status",
            "value": "{{.app.status.sync.status}}",
            "short": true
          },{
            "title": "Message",
            "value": "{{range .app.status.conditions}}{{.message}}{{end}}",
            "short": false
          }]
        }]
  trigger.on-sync-failed: |
    - when: app.status.sync.status == 'Unknown'
      send: [app-sync-failed]
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-sync-failed]

Secret Management with ArgoCD

ArgoCD intentionally has no secret management built in — storing secrets in Git as plain text is never acceptable. The common patterns are:

  • Sealed Secrets (Bitnami) — Encrypts secrets with a cluster-specific key. The encrypted secret can be committed to Git; only the cluster can decrypt it.
  • External Secrets Operator — Syncs secrets from Vault, AWS Secrets Manager, GCP Secret Manager, etc. into Kubernetes secrets. The ArgoCD Application manages the ExternalSecret CRD, not the actual secret value.
  • argocd-vault-plugin — A plugin that replaces placeholder values in manifests with secrets retrieved from Vault at sync time.

The External Secrets Operator approach is the most flexible for teams already using a centralized secrets backend. The Application in ArgoCD deploys ExternalSecret objects, which the ESO controller resolves at runtime without ever touching Git.

Production Best Practices

  • Run ArgoCD in HA mode. Use manifests/ha/install.yaml with 3 replicas of the API server and multiple application controller shards for large clusters (100+ applications).
  • Pin image versions. Never use latest for the ArgoCD image itself. Pin to a specific version and upgrade deliberately.
  • Use the App of Apps pattern for bootstrapping. A single root Application deploys all other Applications. This makes cluster bootstrapping idempotent and reproducible.
  • Separate ArgoCD config from application config. Store ArgoCD Application manifests in a dedicated gitops repository, separate from application source code.
  • Enable resource tracking via annotations. Use application.resourceTrackingMethod: annotation in argocd-cm instead of the default label-based tracking, which can conflict with Helm’s own labels (see the snippet after this list).
  • Set resource limits on ArgoCD controllers. Application controller CPU and memory scale with the number of resources tracked. Monitor and tune accordingly.
  • Restrict auto-sync in production. Consider requiring manual sync approval for production environments even when using GitOps — or at minimum require a PR approval gate before changes reach the target branch.
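
The annotation-based tracking mentioned above is a one-line change in argocd-cm:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  application.resourceTrackingMethod: annotation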

ArgoCD vs Flux

Flux v2 is the other major GitOps operator. Both are CNCF projects. The main differences in practice:

Feature            | ArgoCD                                       | Flux v2
UI                 | Built-in web UI                              | No official UI (use Weave GitOps)
Multi-cluster      | Single control plane manages many clusters   | Agent per cluster, pull model
ApplicationSets    | Native                                       | Kustomization + HelmRelease
Secret management  | Plugin-based                                 | SOPS native integration
Learning curve     | Steeper (more concepts)                      | Lower (Kubernetes-native CRDs)
CNCF status        | Graduated                                    | Graduated

ArgoCD wins when you need the UI, multi-cluster management from a central plane, or have a large operations team that benefits from the visual application topology view. Flux wins when you want a simpler, purely Kubernetes-native approach with better SOPS integration for secret management.

FAQ

Can ArgoCD deploy to the cluster it runs in?

Yes. The https://kubernetes.default.svc destination refers to the local cluster. ArgoCD can manage both its own cluster and external clusters simultaneously.

Does ArgoCD support private Git repositories?

Yes. Configure repository credentials via argocd repo add with SSH keys, HTTPS username/password, or GitHub App credentials. Credentials are stored as Kubernetes secrets in the ArgoCD namespace.

How does ArgoCD handle CRD installation?

CRDs can be managed by ArgoCD, but there is a chicken-and-egg problem: if a CRD is not yet installed, ArgoCD cannot validate resources that use it. The recommended pattern is to put CRDs in wave 1 and dependent resources in wave 2, or to use a separate Application for CRDs.

What is the difference between an Application and an AppProject?

An Application is the unit of deployment — it maps a Git source to a cluster destination. An AppProject is a grouping and access control boundary — it restricts what sources and destinations an Application within the project can use. Every Application belongs to exactly one AppProject.

How do I roll back a deployment with ArgoCD?

The GitOps way: revert the commit in Git and let ArgoCD reconcile. ArgoCD also provides a UI-based rollback to any previous sync revision, but this is considered a temporary measure — the Git history should always be updated to match.
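
The CLI equivalent, for reference — the application name and revision ID here are illustrative:

# List previous sync revisions with their IDs
argocd app history my-app

# Roll back to a specific revision ID from the history
argocd app rollback my-app 5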

Getting Started

The fastest path from zero to a working ArgoCD setup on a local cluster:

# 1. Create a local cluster (kind or minikube)
kind create cluster --name argocd-demo

# 2. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 3. Wait for pods
kubectl wait --for=condition=Ready pods --all -n argocd --timeout=120s

# 4. Get the initial admin password
argocd admin initial-password -n argocd

# 5. Port-forward and log in
kubectl port-forward svc/argocd-server -n argocd 8080:443 &
argocd login localhost:8080 --username admin --insecure

# 6. Deploy your first application
argocd app create guestbook \
  --repo https://github.com/argoproj/argocd-example-apps.git \
  --path guestbook \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace guestbook \
  --sync-policy automated

From here, the natural next steps are integrating ArgoCD with your existing CI pipeline (CI builds and pushes the image, updates the image tag in Git, ArgoCD detects the change and syncs), configuring SSO via Dex, and setting up the App of Apps pattern for managing multiple applications declaratively.

For teams looking to go deeper on GitOps and ArgoCD in production, the Kubernetes architecture patterns guide covers how ArgoCD fits into a broader platform engineering stack alongside service mesh, policy enforcement, and observability tooling.

Enable ECS Logging in TIBCO BusinessWorks with Logback

ECS logging support in TIBCO BusinessWorks is an increasingly requested feature, driven by the growing adoption of the Elastic Common Schema in log aggregation solutions built on the Elastic Stack (previously known as the ELK stack).

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

We have already said a lot about the importance of log aggregation solutions and their benefits, especially when discussing container architecture. Because of that, today we will focus on how to adapt our BW applications to support this logging format.

The first thing to know is this: yes, it can be done, and it can be done regardless of your deployment model. The solution provided here works for on-premises installations as well as container deployments using BWCE.

TIBCO BW Logging Background

TIBCO BusinessWorks (container edition or not) relies on the Logback library for its logging capabilities, and this library is configured using a file named logback.xml that holds the configuration you need, as you can see in the picture below:

BW ECS Logging: Sample of Logback.xml default config

Logback is a well-known library for Java-based development, with an architecture based on a core solution plus plug-ins that extend its capabilities. It is this plug-in approach that we will use to support ECS.

The official ECS documentation covers how to enable this logging configuration when using Logback, as you can see in the picture below and in this official link:

BW ECS Logging: ECS Java Dependency Information

In our case, we do not need to declare the dependency in any build file — we just download it, because we will include it in the existing OSGi bundles of the TIBCO BW installation. We need just two files:

  • ecs-logging-core-1.5.0.jar
  • logback-ecs-encoder-1.5.0.jar

At the time of writing, the current version of both is 1.5.0, but check that you are using a recent version of this software to avoid problems with support and vulnerabilities.

Once we have these libraries, we need to add them to the BW system installation, and the procedure differs between a TIBCO on-premises installation and a TIBCO BWCE base installation. To be honest, what we need to do is the same in both cases; only the process differs.

In the end, the task is simple: include these JAR files in the logback OSGi bundle that TIBCO BW loads. Let’s see how, starting with an on-premises installation. We will use TIBCO BWCE 2.8.2 as an example, but similar steps apply to other versions.

The on-premises installation is the easier of the two, simply because it has fewer steps than the TIBCO BWCE base image. In this case, we go to the following location: <TIBCO_HOME>/bwce/2.8/system/shared/com.tibco.tpcl.logback_1.2.1600.002/

  • We place the downloaded JARs in that folder:
BW ECS Logging: JAR location
  • We open META-INF/MANIFEST.MF and make the following modifications:
    • Add those JARs to the Bundle-Classpath section:
BW ECS Logging: Bundle-Classpath changes
  • Include the package co.elastic.logging.logback as part of the exported packages by adding it to the Export-Package section:
BW ECS Logging: Export-Package changes
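
Since the screenshots can be hard to read, the resulting MANIFEST.MF entries look roughly like this — an illustrative sketch; the exact pre-existing values vary by BW version:

Bundle-ClassPath: ., ecs-logging-core-1.5.0.jar, logback-ecs-encoder-1.5.0.jar
Export-Package: <existing packages...>, co.elastic.logging.logback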

Once this is done, our TIBCO BW installation supports the ECS format, and we just need to configure logback.xml to use it, relying on the official documentation on the ECS page. We need to include the following encoder, as shown below:

 <encoder class="co.elastic.logging.logback.EcsEncoder">
    <serviceName>my-application</serviceName>
    <serviceVersion>my-application-version</serviceVersion>
    <serviceEnvironment>my-application-environment</serviceEnvironment>
    <serviceNodeName>my-application-cluster-node</serviceNodeName>
</encoder>

For example, if we modify the default logback.xml configuration file with this information, we will have something like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration scan="true">
  
  <!-- *=============================================================* -->
  <!-- *  APPENDER: Console Appender                                 * -->
  <!-- *=============================================================* -->  
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="co.elastic.logging.logback.EcsEncoder">
      <serviceName>a</serviceName>
      <serviceVersion>b</serviceVersion>
      <serviceEnvironment>c</serviceEnvironment>
      <serviceNodeName>d</serviceNodeName>
  </encoder>
  </appender>



  <!-- *=============================================================* -->
  <!-- * LOGGER: Thor Framework loggers                              * -->
  <!-- *=============================================================* -->
  <logger name="com.tibco.thor.frwk">
    <level value="INFO"/>
  </logger>
  
  
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Framework loggers                     * -->
  <!-- *=============================================================* -->
  <logger name="com.tibco.bw.frwk">
    <level value="WARN"/>
  </logger>  
  
  <logger name="com.tibco.bw.frwk.engine">
    <level value="INFO"/>
  </logger>   
  
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Engine loggers                        * -->
  <!-- *=============================================================* --> 
  <logger name="com.tibco.bw.core">
    <level value="WARN"/>
  </logger>
  
  <logger name="com.tibco.bx">
    <level value="ERROR"/>
  </logger>

  <logger name="com.tibco.pvm">
    <level value="ERROR"/>
  </logger>
  
  <logger name="configuration.management.logger">
    <level value="INFO"/>
  </logger>
  
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Palette and Activity loggers          * -->
  <!-- *=============================================================* -->
  
  <!-- Default Log activity logger -->
  <logger name="com.tibco.bw.palette.generalactivities.Log">
    <level value="DEBUG"/>
  </logger>
  
  <logger name="com.tibco.bw.palette">
    <level value="ERROR"/>
  </logger>

  
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Binding loggers                       * -->
  <!-- *=============================================================* -->
  
  <!-- SOAP Binding logger -->
  <logger name="com.tibco.bw.binding.soap">
    <level value="ERROR"/>
  </logger>
  
  <!-- REST Binding logger -->
  <logger name="com.tibco.bw.binding.rest">
    <level value="ERROR"/>
  </logger>
  
  
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Shared Resource loggers               * -->
  <!-- *=============================================================* --> 
  <logger name="com.tibco.bw.sharedresource">
    <level value="ERROR"/>
  </logger>
  
  
   
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Schema Cache loggers                  * -->
  <!-- *=============================================================* -->
  <logger name="com.tibco.bw.cache.runtime.xsd">
    <level value="ERROR"/>
  </logger> 
  
  <logger name="com.tibco.bw.cache.runtime.wsdl">
    <level value="ERROR"/>
  </logger> 
  
    
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Governance loggers                    * -->
  <!-- *=============================================================* -->  
  <!-- Governance: Policy Director logger1 --> 
  <logger name="com.tibco.governance">
    <level value="ERROR"/>
  </logger>
   
  <logger name="com.tibco.amx.governance">
    <level value="WARN"/>
  </logger>
   
  <!-- Governance: Policy Director logger2 -->
  <logger name="com.tibco.governance.pa.action.runtime.PolicyProperties">
    <level value="ERROR"/>
  </logger>
  
  <!-- Governance: SPM logger1 -->
  <logger name="com.tibco.governance.spm">
    <level value="ERROR"/>
  </logger>
  
  <!-- Governance: SPM logger2 -->
  <logger name="rta.client">
    <level value="ERROR"/>
  </logger>
  
  
    
  <!-- *=============================================================* -->
  <!-- * LOGGER: BusinessWorks Miscellaneous Loggers                 * -->
  <!-- *=============================================================* --> 
  <logger name="com.tibco.bw.platformservices">
    <level value="INFO"/>
  </logger>
  
  <logger name="com.tibco.bw.core.runtime.statistics">
    <level value="ERROR"/>
  </logger>
  

  
  <!-- *=============================================================* -->
  <!-- * LOGGER: Other loggers                                       * -->
  <!-- *=============================================================* -->  
  <logger name="org.apache.axis2">
    <level value="ERROR"/>
  </logger>

  <logger name="org.eclipse">
    <level value="ERROR"/>
  </logger>
  
  <logger name="org.quartz">
    <level value="ERROR"/>
  </logger>
  
  <logger name="org.apache.commons.httpclient.util.IdleConnectionHandler">
    <level value="ERROR"/>
  </logger>
  
  
  
  <!-- *=============================================================* -->
  <!-- * LOGGER: User loggers.  User's custom loggers should be      * -->
  <!-- *         configured in this section.                         * -->
  <!-- *=============================================================* -->

  <!-- *=============================================================* -->
  <!-- * ROOT                                                        * -->
  <!-- *=============================================================* --> 
  <root level="ERROR">
   <appender-ref ref="STDOUT" />
  </root>
  
</configuration>

You can also do more custom configurations based on the information available on the ECS encoder configuration page here.

How to enable TIBCO BW ECS Logging Support?

For BWCE, the steps are similar, but be aware that all the runtime components are packaged inside the base-runtime-version.zip that we download from the TIBCO eDelivery site, so we need a tool to open that ZIP and make the following modifications:

  • We place the downloaded JARs in the folder /tibco.home/bwce/2.8/system/shared/com.tibco.tpcl.logback_1.2.1600.004
BW ECS Logging: JAR location
  • We open META-INF/MANIFEST.MF and make the following modifications:
    • Add those JARs to the Bundle-Classpath section:
BW ECS Logging: Bundle-Classpath changes
  • Include the package co.elastic.logging.logback as part of the exported packages by adding it to the Export-Package section:
BW ECS Logging: Export-package changes
  • Additionally, we modify the bwappnode file in /tibco.home/bwce/2.8/bin to add the JAR files to the classpath that the BWCE base image uses at runtime, ensuring they are loaded:
BW ECS Logging: bwappnode change

Now we can build our BWCE base image as usual and modify logback.xml as explained above. Here you can see the log output of a sample application using this configuration:

{"@timestamp":"2023-08-28T12:49:08.524Z","log.level": "INFO","message":"TIBCO BusinessWorks version 2.8.2, build V17, 2023-05-19","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"main","log.logger":"com.tibco.thor.frwk"}

<>@BWEclipseAppNode> {"@timestamp":"2023-08-28T12:49:25.435Z","log.level": "INFO","message":"Started by BusinessStudio.","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"main","log.logger":"com.tibco.thor.frwk.Deployer"}
{"@timestamp":"2023-08-28T12:49:32.795Z","log.level": "INFO","message":"TIBCO-BW-FRWK-300002: BW Engine [Main] started successfully.","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"main","log.logger":"com.tibco.bw.frwk.engine.BWEngine"}
{"@timestamp":"2023-08-28T12:49:34.338Z","log.level": "INFO","message":"TIBCO-THOR-FRWK-300001: Started OSGi Framework of AppNode [BWEclipseAppNode] in AppSpace [BWEclipseAppSpace] of Domain [BWEclipseDomain]","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"Framework Event Dispatcher: Equinox Container: 1395256a-27a2-4e91-b774-310e85b0b87c","log.logger":"com.tibco.thor.frwk.Deployer"}
{"@timestamp":"2023-08-28T12:49:34.456Z","log.level": "INFO","message":"TIBCO-THOR-FRWK-300018: Deploying BW Application [t3:1.0].","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"Framework Event Dispatcher: Equinox Container: 1395256a-27a2-4e91-b774-310e85b0b87c","log.logger":"com.tibco.thor.frwk.Application"}
{"@timestamp":"2023-08-28T12:49:34.524Z","log.level": "INFO","message":"TIBCO-THOR-FRWK-300021: All Application dependencies are resolved for Application [t3:1.0]","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"Framework Event Dispatcher: Equinox Container: 1395256a-27a2-4e91-b774-310e85b0b87c","log.logger":"com.tibco.thor.frwk.Application"}
{"@timestamp":"2023-08-28T12:49:34.541Z","log.level": "INFO","message":"Started by BusinessStudio, ignoring .enabled settings.","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"Framework Event Dispatcher: Equinox Container: 1395256a-27a2-4e91-b774-310e85b0b87c","log.logger":"com.tibco.thor.frwk.Application"}
{"@timestamp":"2023-08-28T12:49:35.842Z","log.level": "INFO","message":"TIBCO-THOR-FRWK-300006: Started BW Application [t3:1.0]","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"EventAdminThread #1","log.logger":"com.tibco.thor.frwk.Application"}
{"@timestamp":"2023-08-28T12:49:35.954Z","log.level": "INFO","message":"aaaaaaa&#10;","ecs.version": "1.2.0","service.name":"a","service.version":"b","service.environment":"c","service.node.name":"d","event.dataset":"a","process.thread.name":"bwEngThread:In-Memory Process Worker-1","log.logger":"com.tibco.bw.palette.generalactivities.Log.t3.module.Log"}
gosh: stopping shell

Enable SwaggerUI in TIBCO BusinessWorks When Offloading SSL (BWCE Fix)

SwaggerUI in TIBCO BusinessWorks is one of the features available by default for every REST service developed with TIBCO BusinessWorks. As you probably know, SwaggerUI is just an HTML page with a graphical representation of the Swagger definition file (or OpenAPI specification, to be more accurate with the current version of the standard) that helps you understand the operations and capabilities exposed by the service, and it also provides an easy way to test the service, as you can see in the picture below:

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

How To Enable SwaggerUI TIBCO BusinessWorks when Offloading SSL Certificate: SwaggerUI view from TIBCO BWCE app

This interface is provided out of the box for any REST service developed using TIBCO BusinessWorks. It is served on a separate port (7777 by default) for on-premises deployments, or at the /swagger endpoint for TIBCO BusinessWorks Container Edition.

How does SwaggerUI work to load the Swagger Specification?

SwaggerUI works in a particular way. When you reach the SwaggerUI URL, the page contains a second URL — usually shown in a text field — that links to the JSON or YAML document holding the actual specification, as you can see in the picture below:

How To Enable SwaggerUI TIBCO BusinessWorks when Offloading SSL Certificate: SwaggerUI highlighting the 2 URL loaded in the process

So you can think of this as a two-call process:

  • The first call loads the SwaggerUI page as a graphical container.
  • Based on the internal URL provided there, a second call retrieves the specification document.
  • With that information, the page renders the API in the SwaggerUI format.

The issue arises when SwaggerUI is exposed behind a load balancer, because the second URL needs to use the advertised address — the backend server is not reached directly by the client browsing the SwaggerUI. For TIBCO BWCE this is solved out of the box by Kubernetes capabilities; for on-premises deployments, two properties handle it, as follows:

# ------------------------------------------------------------------------------
# Section:  BW REST Swagger Configuration.  The properties in this section
# are applicable to the Swagger framework that is utilized by the BW REST 
# Binding.
#
# Note: There are additional BW REST Swagger configuration properties that
# can be specified in the BW AppNode configuration file "config.ini".  Refer to
# the BW AppNode configuration file's section "BW REST Swagger configuration" 
# for details. 
# ------------------------------------------------------------------------------
# Swagger framework reverse proxy host name.  This property is optional and 
# it specifies the reverse proxy host name on which Swagger framework serves 
# the API's, documentation  endpoint, api-docs, etc.. 
bw.rest.docApi.reverseProxy.hostName=localhost

# Swagger framework port.  This property is optional and it specifies the 
# reverse proxy port on which Swagger framework serves the API's, documentation
# endpoint, api-docs, etc.
bw.rest.docApi.reverseProxy.port=0000

You can browse the official documentation page for more detailed information.

That solves the main issue regarding the hostname and the port the final user needs to reach. Still, one component of the URL can still cause a problem: the protocol — in a nutshell, whether the service is exposed over HTTP or HTTPS.

How to Handle Swagger URL when offloading SSL?

Until TIBCO BWCE 2.8.3, the protocol depended on the HTTP Connector configuration used to expose the Swagger component. If you use an HTTP connector without SSL configuration, the page tries to reach the endpoint over HTTP; with an SSL-enabled connector, it tries HTTPS. That seems fine, but some use cases can create a problem:

SSL certificate offloaded at the load balancer: If we offload the SSL configuration to the load balancer — as is common in traditional on-premises deployments and in some Kubernetes configurations — the consumer establishes an HTTPS connection to the load balancer, but the internal communication with BWCE happens over HTTP. This generates a mismatch: on the second call, the page assumes that, because the BWCE HTTP Connector is not using HTTPS, the URL should be reached over HTTP — but that is not the case, since the communication goes through the load balancer, which handles the security.

Service mesh service exposure: Similar to the previous case, but closer to a Kubernetes deployment. If we use a service mesh such as Istio, security is one of the things the mesh handles, so the situation is the same as above: BWCE does not know about the security configuration, yet that configuration affects the default endpoint it generates.

How To Enable SwaggerUI TIBCO BusinessWorks when Offloading SSL Certificates?

Since BWCE 2.8.3, there is a new JVM property that forces the generated endpoint to be HTTPS even when the HTTP Connector used by the BWCE application has no security configuration, which solves the cases above and similar scenarios. The property can be added like any other JVM property using the BW_JAVA_OPTS environment variable, and the value is: bw.rest.enable.secure.swagger.url=true
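
For example, in a BWCE deployment on Kubernetes, the property can be set through the container environment — a sketch, with the surrounding pod spec omitted:

env:
- name: BW_JAVA_OPTS
  value: "-Dbw.rest.enable.secure.swagger.url=true"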

Increase HTTP Logs in TIBCO BusinessWorks for Debugging and Troubleshooting

Increasing the HTTP logs in TIBCO BusinessWorks when you are debugging or troubleshooting an HTTP-based integration — whether a REST or SOAP service — is one of the most common and helpful things you can do when developing with TIBCO BusinessWorks.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

The primary purpose of increasing the HTTP logs is to get complete visibility into the information you are sending and the communication you are receiving from the other party, to help you understand an error or unexpected behavior.

What are the primary use cases for increasing the HTTP logs?

In the end, all the different use cases are variations of the primary one: “get full knowledge of the HTTP exchange between both parties.” Still, some more specific ones are listed below:

  • Understand why a backend server is rejecting a call that could be related to Authentication or Authorization, and you need to see the detailed response by the backend server.
  • Verify the value of each HTTP Header you are sending that could affect the communication’s compression or accepted content type.
  • See why you are rejecting a call from a consumer.

Splitting the communication based on the source

The most important thing to understand is that the loggers depend on the library in use, and the library used to expose an HTTP-based server is not the same as the one used to consume an HTTP-based service such as REST or SOAP.

Starting with what you expose, this is the easiest part because it is defined by the HTTP Connector resources you're using, as you can see in the picture below:

HTTP Shared Resources Connector in BW

All HTTP Connector resources that you can use to expose REST and SOAP services are based on the Jetty server implementation, which means the loggers whose configuration you need to change are those of the Jetty server itself.

More complex, in theory, is the client side, when our TIBCO BusinessWorks application consumes an HTTP-based service provided by a backend, because each of these communications has its own HTTP Client shared resource. The configuration of each can differ, because one of the settings available here is the Implementation Library, which directly affects how you change the log configuration:

HTTP Client resource in BW showing the different implementation libraries that determine the logger to use when increasing HTTP logs in TIBCO BusinessWorks

You have three options when you define an HTTP Client Resource, as you can see in the picture above:

  • Apache HttpComponents: The default option; it supports HTTP/1 and both SOAP and REST services.
  • Jetty HTTP Client: This client only supports HTTP flows such as HTTP/1 and HTTP/2, and it is the primary option when you're working with HTTP/2 flows.
  • Apache Commons: Similar to the first one, but currently deprecated; honestly, if any of your client components use this configuration, you should migrate them to Apache HttpComponents when you can.

So, if we’re consuming a SOAP and REST service, it is clear that we will be using the implementation library Apache HttpComponents, and that will give us the logger we need to use.

For Apache HttpComponents, we can rely on the logger “org.apache.http”; if we also want to cover the server side, or we're using the Jetty HTTP Client, we can use “org.eclipse.jetty.http”.

Be aware that we cannot scope this to a single HTTP Client resource, because the configuration is based on the Implementation Library: if we set the DEBUG level for the Apache HttpComponents library, it will affect all shared resources using that implementation library, and you'll need to differentiate traffic based on the data inside the log as part of your analysis.

How to set HTTP Logs in TIBCO BusinessWorks?

Now that we have the loggers, we must set them to the DEBUG (or TRACE) level. We need to know how to do that, and we have several options depending on how we want to do it and what access we have. The scope of this article is TIBCO BusinessWorks Container Edition, but you can easily extrapolate most of this knowledge to an on-premises TIBCO BusinessWorks installation.

TIBCO BusinessWorks (container or not) relies on the Logback library for its logging capabilities, and this library is configured through a file named logback.xml that can hold the configuration you need, as you can see in the picture below:

logback.xml configuration with the default structure in TIBCO BW

So, if we want to add a new logging configuration, we need to add a new logger element to the file with the following structure:

  <logger name="%LOGGER_WE_WANT_TO_SEE%">
    <level value="%LEVEL_WE_WANT_TO_SEE%"/>
  </logger>

The logger name comes from the previous section, and the level depends on how much information you want to see. The log levels are the following: ERROR, WARN, INFO, DEBUG, and TRACE. DEBUG and TRACE are the ones that show the most information.

In our case, DEBUG should be enough to get the full HTTP request and HTTP response, but you can apply the same approach to other loggers where you need a different log level.
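For example, based on the logger names from the previous section, the entries to capture the full HTTP exchange would look like this, covering both the Apache HttpComponents client side and the Jetty side:

  <!-- Client side: HTTP Client resources using Apache HttpComponents -->
  <logger name="org.apache.http">
    <level value="DEBUG"/>
  </logger>
  <!-- Server side (Jetty) and the Jetty HTTP Client -->
  <logger name="org.eclipse.jetty.http">
    <level value="DEBUG"/>
  </logger>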

Now you need to add that to the logback.xml file, and to do that, you have several options, as commented:

  • You can find the logback.xml inside the BWCE container (or in the AppNode configuration folder) and modify its content. The default location of this file is: /tmp/tibco.home/bwce/<VERSION>/config/logback.xml. To do this, you need access to run kubectl exec on the BWCE container, and the change will be temporary: it is lost on the next restart. That can be good or bad, depending on your goal.
  • If you want it to be permanent, or you don't have access to the container, you have two options. The first is to include a custom copy of logback.xml in the /resources/custom-logback/ folder of the BWCE base image and set the environment variable CUSTOM_LOGBACK to TRUE; that will override the default logback.xml configuration with the content of this file. As commented, this is “permanent” and applies from the first deployment of the app with this configuration. You can find more info in the official doc here.
  • There is also an additional option, available since BWCE 2.7.0, that lets you change the logback.xml content without a new copy or a change to the base image. It is based on the environment property BW_LOGGER_OVERRIDES, with content in the form logger=level, so in our case it would be something like org.apache.http=DEBUG, and on the next deployment you will get this configuration. Like the previous option, this is permanent, but it doesn't require adding a file to the base image; see the sketch after this list.
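As a minimal sketch, this is how the BW_LOGGER_OVERRIDES approach could look in a pod definition (the container and image names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: bwce-app
spec:
  containers:
  - name: bwce-app
    image: bwce-app:1.0   # illustrative image
    env:
    # Raise the Apache HttpComponents logger to DEBUG without touching logback.xml
    - name: BW_LOGGER_OVERRIDES
      value: "org.apache.http=DEBUG"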

So, as you can see, you have different options depending on your needs and access levels.

Conclusion

In conclusion, enhancing HTTP logs within TIBCO BusinessWorks during debugging and troubleshooting is a vital strategy. Elevating log levels provides a comprehensive grasp of information exchange, aiding in analyzing errors and unexpected behaviors. Whether discerning backend rejection causes, scrutinizing HTTP header effects, or isolating consumer call rejections, amplified logs illuminate complex integration scenarios. Adaptations vary based on library usage, encompassing server exposure and service consumption. Configuration through the logback library involves tailored logger and level adjustments. This practice empowers developers to unravel integration intricacies efficiently, ensuring robust and seamless HTTP-based interactions across systems.

ReadOnlyRootFilesystem for TIBCO BWCE: Securing Containers with Kubernetes Best Practices

This article covers how to enhance the security of your TIBCO BWCE images by creating a read-only-root-filesystem image for TIBCO BWCE. In previous articles, we have commented on the advantages this kind of image provides in terms of security, such as reducing the attack surface by limiting what any user can do, even if they gain access to running containers.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

The same applies to any malware your image may contain: without write access to most of the container, the actions it can perform are limited.

How Does ReadOnlyRootFilesystem Affect a TIBCO BWCE Image?

This has a clear impact, because the TIBCO BWCE image needs to write to several folders as part of the application's expected behavior. That is mandatory and does not depend on the scripts you used to build your image.

As you probably know, TIBCO BWCE ships two sets of scripts to build the Docker base image: the main ones and the ones included in the reducedStartupTime folder, as you can see on the GitHub page and also inside your docker folder in the TIBCO_HOME after the installation, as shown in the picture below.

ReadOnlyRootFilesystem for TIBCO BWCE: Securing Containers with Kubernetes Best Practices

The main difference between them is where the bwce-runtime is unzipped. With the default scripts, the unzip happens during the startup of the container; with reducedStartupTime, it happens during the build of the image itself. So you might already suspect that the default scripts need some write access, since they unzip the file inside the container, and that's true.

But the reducedStartupTime scripts also require write access to run the application: several activities involve unzipping the EAR file, managing the properties file, and additional internal steps. So, no matter which scripts you're using, you must provide a writable folder for these activities.

By default, all these activities are limited to a single folder. If you keep everything by default, this is the /tmp folder, so you must provide a volume for that folder.

How to Deploy a TIBCO BWCE Application with ReadOnlyRootFilesystem

Now that it is clear you need a volume for the /tmp folder, you need to define the kind of volume you want to use. As you know, there are several kinds of volumes you can choose from, depending on your requirements.

In this case, the only requirement is write access; there is no need for storage or persistence, so we can use an emptyDir volume. The content of an emptyDir is erased when the pod is removed, which is similar to the default behavior, but it allows writing to its content.

To show how the YAML would look, we will use the default one available in the documentation here:

apiVersion: v1
kind: Pod
metadata:
  name: bookstore-demo
  labels:
    app: bookstore-demo
spec:
  containers:
  - name: bookstore-demo
    image: bookstore-demo:2.4.4
    imagePullPolicy: Never
    envFrom:
    - configMapRef:
        name: name

So, we will change that to include the volume, as you can see here:

apiVersion: v1
kind: Pod
metadata:
  name: bookstore-demo
  labels:
    app: bookstore-demo
spec:
  containers:
  - name: bookstore-demo
    image: bookstore-demo:2.4.4
    imagePullPolicy: Never
    securityContext:
      readOnlyRootFilesystem: true
    envFrom:
    - configMapRef:
        name: name
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}

The changes are the following:

  • Include the volumes section with a single volume definition named tmp with an emptyDir definition.
  • Include a volumeMounts section for the tmp volume, mounted at the /tmp path, to allow writing to that specific path; this enables the unzip of the bwce-runtime as well as all the additional activities that are required.
  • Include the readOnlyRootFilesystem flag in the securityContext section to trigger this behavior; a quick verification is sketched below.
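To check the behavior, a quick verification could look like this, assuming the pod manifest above is saved as bookstore-demo.yaml:

kubectl apply -f bookstore-demo.yaml
# Writing outside /tmp should fail with "Read-only file system"
kubectl exec bookstore-demo -- touch /test
# Writing inside the mounted emptyDir should succeed
kubectl exec bookstore-demo -- touch /tmp/test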

Conclusion

Incorporating a ReadOnlyFileSystem approach into your TIBCO BWCE images is a proactive strategy to fortify your application’s security posture. By curbing unnecessary write access and minimizing the potential avenues for unauthorized actions, you’re taking a vital step towards safeguarding your containerized environment.

This guide has unveiled the critical aspects of implementing such a security-enhancing measure, walking you through the process with clear instructions and practical examples. With a focus on reducing attack vectors and bolstering isolation, you can confidently deploy your TIBCO BWCE applications, knowing that you’ve fortified their runtime environment against potential threats.

TIBCO BusinessWorks HashiCorp Vault Integration: Secure Secrets in 3 Steps

Introduction

This article shows the TIBCO BW Hashicorp Vault configuration needed to integrate your TIBCO BW application with the secrets stored in Hashicorp Vault, mainly for the externalization and management of passwords and credential resources.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

As you probably know, in a TIBCO BW application the configuration is stored in properties at different levels (module or application properties); you can read more about them here. The primary purpose of those properties is to provide flexibility in the application configuration.

These properties can be of different types, such as String, Integer, Long, Double, Boolean, and DateTime, among other technical resources inside TIBCO BW, as shown in the picture below:

TIBCO BW Hashicorp Vault Configuration: BW Property Types

The TIBCO BW Hashicorp Vault integration affects only properties of the Password type (at least up to BW versions 2.7.2/6.8.1). The reason is that those properties hold the data that is sensitive and needs to be secured; other configuration can be managed through standard Kubernetes components such as ConfigMaps.

BW Application Definition

We are going to start with a straightforward application, as you can see in the picture below:

TIBCO BW Hashicorp Vault Configuration: Property sample

Just a simple timer that will be executed once and insert the current time into the PostgreSQL database. We will use Hashicorp Vault to store the password of the database user to be able to connect to it. The username and the connection string will reside on a ConfigMap.

We will skip the configuration regarding the deployment of TIBCO BW application containers and linking them to a ConfigMap; there is an article covering that in detail in case you need to follow it. We will focus only on the TIBCO BW Hashicorp Vault integration.

So, we need to tell TIBCO BW that the password of the JDBC shared resource is linked to the Hashicorp Vault configuration. To do that, the first step is to tie the password of the shared resource to a module property, as shown in the picture below:

TIBCO BW Hashicorp Vault Configuration: Password linked to Module Property

Now we need to mark this module property as linked to Hashicorp Vault. We do that in the Application Property view, selecting that this property is linked to a credential management solution, as shown in the picture below:

TIBCO BW Hashicorp Vault Configuration: Credential Management Configuration for Property

This is where we establish the TIBCO BW Hashicorp Vault relationship. We click on the green plus sign, and a modal window asks for the credential management technology we are going to use and the data needed for it, as you can see in the following picture:

TIBCO BW Hashicorp Vault Configuration: Credential Management Configuration for Property

We will select Hashicorp Vault as the provider. Then we need to provide three attributes that we already commented on in the previous article, when we started creating secrets in Hashicorp Vault:

  • Secret Name: the secret's name path after the root path of the element.
  • Secret Key: the key inside the secret itself.
  • Mount Path: the root path of the secret.
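To illustrate how these three values map to a secret, here is a hedged example using the Vault CLI (all paths and names are illustrative): with a KV v2 engine mounted at secret, the Mount Path is secret, the Secret Name is bookstore/db, and the Secret Key is password.

# KV v2 engine mounted at "secret" (Mount Path)
vault kv put secret/bookstore/db password='s3cr3t-db-pass'
# The key "password" (Secret Key) lives under "bookstore/db" (Secret Name)
vault kv get secret/bookstore/db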

To get more details about these three concepts, please look at our article about how to create secrets in Hashicorp Vault.

With all this, we have pretty much everything we need to connect to Hashicorp Vault and grab the secret. From the TIBCO BusinessStudio side, everything is done: we can generate the EAR file and deploy it into Kubernetes, because that's where the last part of our configuration lives.

Kubernetes Deployment

Until this moment, we have the following information already provided:

  • A BW process that has the logic to connect to the database and insert information
  • The link between the password property used to connect and the Hashicorp secret definition

So, pretty much everything is there, but one piece is missing: how will the Kubernetes pod connect to Hashicorp Vault once it is deployed? Up to this point, we haven't provided the Hashicorp Vault server location or the authentication method to connect to it. This is the missing part of the TIBCO BW Hashicorp Vault integration, and it will be part of the Kubernetes Deployment YAML file.

We will do that using the following environment properties in this sample:

TIBCO BW Hashicorp Vault Configuration: Hashicorp Environment Variables
  • HASHICORP_VAULT_ADDR: This variable points to where the Hashicorp Vault server is located.
  • HASHICORP_VAULT_AUTH: This variable indicates which authentication option will be used. In our case, we will use the token option, as in the previous article.
  • HASHICORP_VAULT_KV_VERSION: This variable indicates which version of the KV storage engine we are using; it is 2 by default.
  • HASHICORP_VAULT_TOKEN: This is the token value used to authenticate against the Hashicorp Vault server.

If you are using other authentication methods, or just want to know more about those properties, please take a look at this documentation from TIBCO.
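As a minimal sketch, the environment section of the container in the Deployment YAML could look like this, assuming token authentication; all values are illustrative, and in a real setup the token should come from a Kubernetes Secret rather than a literal value:

    env:
    - name: HASHICORP_VAULT_ADDR
      value: "http://vault.vault.svc.cluster.local:8200"   # illustrative address
    - name: HASHICORP_VAULT_AUTH
      value: "TOKEN"   # token-based auth; check the TIBCO docs for the exact accepted value
    - name: HASHICORP_VAULT_KV_VERSION
      value: "2"
    - name: HASHICORP_VAULT_TOKEN
      valueFrom:
        secretKeyRef:
          name: vault-token   # illustrative Kubernetes Secret
          key: token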

With all that added to the environment properties of our TIBCO BW application, we can run it and get an output similar to the one below, which shows that the TIBCO BW Hashicorp Vault integration is done and the application was able to start without any issue:

TIBCO BW Hashicorp Vault Configuration: Running sample

TIBCO BusinessWorks Modules Explained: Types, Limitations, and Best Practices

TIBCO BW modules are one of the most relevant concepts in your TIBCO BW developments. Learn all the details about the different TIBCO BW modules available and when to use each of them.

TIBCO BW has evolved in several ways and adapted to the latest architectural changes. Because of that, since the conception of the latest major version, it has introduced several concepts that are important to master in order to unleash all the power this remarkable tool provides. Today we are going to talk about modules.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

Every TIBCO BW application is composed of modules, the components that host all the logic you create. That's the first thing to write down: all your code and everything you do in your application will belong to one TIBCO BW module.

If we think about the normal hierarchy of TIBCO BW components, it looks like the picture below:

TIBCO BusinessWorks Modules Explained: Types, Limitations, and Best Practices

At the top level we have the application; at the second level, the modules; after that, the packages; and finally the technical components such as processes, resources, classes, schemas, and interfaces. Learn more about this here.

TIBCO BW Module Classification

There are several kinds of modules, and each of them serves a specific use case and has some characteristics associated with it.

  • Application Module: The most important kind of module, because without it you cannot have an application. It is the master module, and there can only be one per application. It is where all the main logic of your application resides.
  • Shared Module: The only other BW-native module. As the name shows, its main purpose is to host all the code and components that can be shared between several applications. If you have experience with previous versions of TIBCO BW, you can think of this module as a replacement for a Design-Time Library (a.k.a. DTL); if you come from a programming language, it is like a library imported into the code. There is no restriction on the number of applications that can use a shared module, and no limit on the number of shared modules a TIBCO BW application can have.
  • OSGi Module: The only module that is not BW-native. It does not include BW objects such as processes or resources; it is mainly conceived to hold Java classes. It is a helper module that can also be shared as needed. Usual scenarios for this kind of module are defining custom XPath functions or sharing Java code between several applications.

Both shared modules and OSGi modules can be defined as Maven dependencies, published to a Maven repository through the Maven process, and retrieved from it based on the declaration.

That provides a very efficient way to distribute and version these shared components and, at the same time, mirrors the process used in other programming languages such as Java, which decreases the learning curve.
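As a hedged sketch, declaring a shared module as a dependency in an application's pom.xml could look like this; the groupId, artifactId, and version are illustrative, and the exact packaging details depend on your BW Maven plugin setup:

<!-- Illustrative dependency on a BW shared module published to a Maven repository -->
<dependency>
  <groupId>com.example.bw</groupId>
  <artifactId>common-framework-module</artifactId>
  <version>1.2.0</version>
</dependency>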

TIBCO BW Module Limitations

As we already commented, each module has some limitations or special characteristics. We should be very aware of them, as they help us distribute our code properly using the right kind of modules.

As commented, one application can have only one TIBCO BW application module. Even though it is technically possible to use the same BW application module in more than one application, that makes no sense, because both applications would effectively be the same, as their main code would be identical.

TIBCO BW shared modules, on the other hand, cannot have starter components or activator processes as part of their declaration; all of those should reside in the TIBCO BW application module.

Both the TIBCO BW application module and the TIBCO BW shared module can contain Java code; the OSGi module, by contrast, can only contain Java code and no other TIBCO BW resources.

TIBCO BW shared modules can be exported in two different ways: as regular modules (a ZIP file with the source code) or in binary format, to be shared with other developers without allowing them to change, or even view, the implementation details. This is still supported for legacy reasons, but today the recommended way to distribute the software is Maven, as discussed above.

TIBCO BW Module Use-Cases

As commented, there are different use cases for each kind of module, and knowing them will help you decide which component works best for each scenario:

  • TIBCO BW shared modules cover the standard components needed by all applications. The main use case here is framework components or common patterns that simplify and homogenize development. This helps control standard capabilities such as error handling, auditing, logging, or even internal communication, so developers only need to focus on the business logic of their use case.
  • Another use case for the TIBCO BW shared module is encapsulating anything shared between applications, such as the resources needed to connect to a backend, so all the applications that need to connect to that backend can import them and avoid reworking that part.
  • The OSGi module is for Java code that has a weak relationship with the BW code, such as a component that signs a PDF document or integrates with a system through a native Java API, so we can keep it and evolve it separately from the TIBCO BW code.
  • Another case for the OSGi module is defining the custom XPath functions you need as part of your shared module or your application module.
  • The TIBCO BW application module, on the other hand, should only contain code specific to the business problem being solved, moving any code that can be used by more than one application to a shared module.

Configure TIBCO BusinessWorks EMS Reconnection: 2 Reliable Approaches Explained

In this article, we cover how TIBCO BW EMS reconnection works, how you can apply it to your application, and the pros and cons of the different options available.

One of the main issues we have all faced when working on a TIBCO BW and EMS integration is the reconnection part. Even though we rarely need it, thanks to the TIBCO EMS server's extreme reliability, it can have severe consequences if it is not well configured.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.


TIBCO BusinessWorks and EMS Integration: Install JMS Drivers and Fix Common Errors

TIBCO BW supports many different integration methods and hundreds of connectors that allow you to connect to almost any source. But truth be told, EMS is one of the standard connectors you need to enable. That's why TIBCO BW and EMS usually go together when it comes to a proper integration platform.

This article is part of my comprehensive TIBCO Integration Platform Guide where you can find more patterns and best practices for TIBCO integration platforms.

JMS Support for TIBCO BW is out of the box, but like any other JMS implementation, you need to provide the client libraries to establish a real connection.

To do that, since TIBCO BW 6, a simple way is provided to simplify that process, and this is what we are going to cover in this article.

Problem description

The first thing is to know that you need to do something, and the most important part is learning to recognize which kind of error is related to this problem. You can find two different errors depending on where you are testing: design time or runtime.

If we are talking about a runtime issue, you can see a trace similar to this one:

2022-06-02T13:27:15,867 ERROR [pool-13-thread-2] c.t.b.thor.runtime.model.Constituent - The following error has occurred for "name: test-app version: 1.0.0.qualifier bundle name: test-app " which needs to be resolved.
2022-06-02T13:27:15,878 ERROR [pool-13-thread-2] c.t.b.thor.runtime.model.Constituent - TIBCO-BW-FRWK-600053: Failed to initialize BW Component [ComponentStarter].
<CausedBy> com.tibco.bw.core.runtime.api.BWEngineException: TIBCO-BW-CORE-500232: Failed to initialize BW Component [ComponentStarter], Application [test-app:1.0] due to activity initialization error.
<CausedBy> com.tibco.bw.core.runtime.ActivityInitException: TIBCO-BW-CORE-500408: Failed to initialize the ProcessStarter activity [JMSReceiveMessage] in process [com.test.Starter], module [test-app] due to unexpected activity lifecycle error.
<CausedBy> java.lang.NullPointerException

Each time you see a java.lang.NullPointerException related to a JMS Receive activity, you can be sure the issue is related to the installation of the drivers.

At design time, you will see the same error when you try to start a Run or Debug session, but additionally you will see the following error when testing a JMS Connection resource, as you can see in the picture below:

TIBCO BusinessWorks and EMS Integration: Install JMS Drivers and Fix Common Errors

Installation Process

The installation process is quite simple, but you need access to an EMS installation or at least a disk location with the clients stored. If you already have that, you just need to go to the following location:

 TIBCO_HOME/bw/<version>/bin

Where TIBCO_HOME is the installation folder for the BusinessWorks application, and version is the minor version format (such as 6.7, 2.7, 6.8, and so on).

At this location, you will run the following command:

 ./bwinstall ems-driver

This will start and ask for the location of the client libraries, as you can see in the picture below:

TIBCO BusinessWorks and EMS Integration: Install JMS Drivers and Fix Common Errors

After that, it will run the installation process and finish with a BUILD SUCCESSFUL output. From that point, you will need to restart Business Studio or the runtime components (such as AppNodes or bwagent) for the configuration to be applied.