The recent announcement regarding the deprecation of the Ingress-NGINX controller sent a ripple through the Kubernetes community. For many organizations, it’s the first major deprecation of a foundational, widely-adopted ecosystem component. While the immediate reaction is often tactical—“What do we replace it with?”—the more valuable long-term question is strategic: “How do we systematically manage this and future migrations?”
This event isn’t an anomaly; it’s a precedent. As Kubernetes matures, core add-ons, APIs, and patterns will evolve or sunset. Platform engineering teams need a repeatable, low-risk framework for navigating these changes. Drawing from the Ingress-NGINX transition and established deployment management principles, we can abstract a robust Kubernetes Migration Framework applicable to any major component, from service meshes to CSI drivers.
Why Ad-Hoc Migrations Fail in Production
Attempting a “big bang” replacement or a series of manual, one-off changes is a recipe for extended downtime, configuration drift, and undetected regression. Production Kubernetes environments are complex systems with deep dependencies:
- Interdependent Workloads: Multiple applications often share the same ingress controller, relying on specific annotations, custom snippets, or behavioral quirks.
- Automation and GitOps Dependencies: Helm charts, Kustomize overlays, and ArgoCD/Flux manifests are tightly coupled to the existing component’s API and schema.
- Observability and Security Integration: Monitoring dashboards, logging parsers, and security policies are tuned for the current implementation.
- Knowledge Silos: Tribal knowledge about workarounds and specific configurations isn’t documented.
A structured framework mitigates these risks by enforcing discipline, creating clear validation gates, and ensuring the capability to roll back at any point.
The Four-Phase Kubernetes Migration Framework
This framework decomposes the migration into four distinct phases: Assessment, Parallel Run, Cutover, and Decommission. Each phase has defined inputs, activities, and exit criteria.
Phase 1: Deep Assessment & Dependency Mapping
Before writing a single line of new configuration, understand the full scope. The goal is to move from “we use Ingress-NGINX” to a precise inventory of how it’s used.
- Inventory All Ingress Resources: Use `kubectl get ingress --all-namespaces` as a starting point, but go deeper.
- Analyze Annotation Usage: Script an analysis to catalog every annotation in use (e.g., `nginx.ingress.kubernetes.io/rewrite-target`, `nginx.ingress.kubernetes.io/configuration-snippet`). This reveals functional dependencies; a small sketch follows this list.
- Map to Backend Services: For each Ingress, identify the backend Services and Namespaces. This highlights critical applications and potential blast radius.
- Review Customizations: Document any custom ConfigMaps for the main NGINX configuration, custom template patches, or modifications to the controller deployment itself.
- Evaluate Alternatives: Based on the inventory, evaluate candidate replacements (e.g., the Gateway API with a compatible implementation, or another Ingress controller like Emissary-ingress or Traefik). The Google Cloud migration framework provides a useful decision tree for ingress-specific migrations.
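For the annotation analysis, a short script is usually enough to get a first picture of which controller-specific behaviors you actually depend on. The following is a minimal sketch using kubectl and jq (both assumed to be installed, with read access to all namespaces); it counts how often each `nginx.ingress.kubernetes.io/` annotation appears across the cluster.

```bash
# Minimal sketch: count ingress-nginx annotations in use across the cluster.
# Assumes kubectl access to all namespaces and jq on the PATH.
kubectl get ingress --all-namespaces -o json \
  | jq -r '.items[]
           | .metadata.annotations // {}
           | keys[]
           | select(startswith("nginx.ingress.kubernetes.io/"))' \
  | sort | uniq -c | sort -rn
```

Annotations such as `configuration-snippet` deserve special attention in the resulting inventory, since free-form snippets rarely have a one-to-one equivalent in a replacement controller.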
The output of this phase is a migration manifesto: a concrete list of what needs to be converted, grouped by complexity and criticality.
Phase 2: Phased Rollout & Parallel Run
This is the core of a low-risk migration. Instead of replacing the old system outright, you run the new and old systems in parallel, shifting traffic gradually. For ingress, this often means installing the new controller alongside the old one.
- Dual Installation: Deploy the new ingress controller in the same cluster, configured with a distinct ingress class (e.g., `ingressClassName: gateway` vs. `nginx`).
- Create Canary Ingress Resources: For a low-risk application, create a parallel Ingress or Gateway resource pointing to the new controller. Use techniques like managed deployments with canary patterns to control exposure.
```yaml
# Example: A new Gateway API HTTPRoute for a canary service
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-canary
spec:
  parentRefs:
    - name: company-gateway
  rules:
    - backendRefs:
        - name: app-service
          port: 8080
          weight: 10 # Start with 10% of traffic (weights are relative to other backendRefs in the rule)
```
- Validate Equivalency: Use traffic mirroring (if supported) or direct synthetic testing against both ingress paths (a sketch follows this list). Compare logs, response headers, latency, and error rates.
- Iterate and Expand: Gradually increase traffic weight or add more applications to the new stack, group by group, based on the assessment from Phase 1.
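To make the equivalency check concrete, here is one possible shape of a synthetic test: the same requests are sent through both entry points and the status codes and timings compared. The hostname, load balancer addresses, and paths below are placeholders, not values from this article.

```bash
#!/usr/bin/env bash
# Sketch: exercise the same URLs through the old and new ingress paths.
# HOST, OLD_LB, NEW_LB, and PATHS are placeholders; substitute your own.
set -euo pipefail

HOST="app.example.com"
OLD_LB="203.0.113.10"   # existing ingress-nginx load balancer (assumed)
NEW_LB="203.0.113.20"   # new controller / gateway load balancer (assumed)
PATHS=("/healthz" "/api/v1/items" "/login")

for path in "${PATHS[@]}"; do
  for lb in "$OLD_LB" "$NEW_LB"; do
    # --resolve pins the request for $HOST to a specific entry point,
    # so the same Host header is exercised through either controller.
    curl -s -o /dev/null --resolve "${HOST}:443:${lb}" \
      -w "${lb} ${path} status=%{http_code} time=%{time_total}s\n" \
      "https://${HOST}${path}"
  done
done
```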
This phase relies heavily on your observability stack. Dashboards comparing error rates, latency (p50, p99), and throughput between the old and new paths are essential.
Phase 3: Validation & Automated Cutover
The cutover is not a manual event. It’s the final step in a validation process.
- Define Validation Tests: Create a suite of tests that must pass before full cutover. This includes:
- Smoke tests for all critical user journeys.
- Load tests to verify performance under expected traffic patterns.
- Security scan validation (e.g., no unintended ports open).
- Compliance checks (e.g., specific headers are present).
- Automate the Switch: For each application, the cutover is ultimately a change in its Ingress or Gateway resource. This should be done via your GitOps pipeline: update the source manifests (e.g., change the `ingressClassName`, as in the sketch after this list), merge, and let automation apply it. This ensures the state is declarative and recorded.
- Maintain Rollback Capability: The old system must remain operational and routable (with reduced capacity) during this phase. The GitOps rollback is simply reverting the manifest change.
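As a rough illustration of how small the per-application change can be, the sketch below shows an Ingress manifest where only `ingressClassName` moves from the old class to the new one. The names, namespace, host, and class values are illustrative, not prescribed.

```yaml
# Sketch: the GitOps cutover for one application is a one-field change.
# Names, namespaces, hosts, and class names are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: shop
spec:
  # ingressClassName: nginx   # previous value, served by the old controller
  ingressClassName: gateway   # new controller picks up routing after merge
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-service
                port:
                  number: 8080
```

Rolling back is then the reverse commit: restore the old class name and let the GitOps controller reconcile.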
Phase 4: Observability & Decommission
Once all traffic is successfully migrated and validated over a sustained period (e.g., 72 hours), you can decommission the old component.
- Monitor Aggressively: Keep a close watch on all key metrics for at least one full business cycle (a week).
- Remove Old Resources: Delete the old controller’s Deployment, Service, ConfigMaps, and CRDs (if no longer needed).
- Clean Up Auxiliary Artifacts: Remove old RBAC bindings, service accounts, and any custom monitoring alerts or dashboards specific to the old component.
- Document Lessons Learned: Update runbooks and architecture diagrams. Note any surprises, gaps in the process, or validation tests that were particularly valuable.
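What removing the old resources looks like depends on how the controller was installed. Assuming a Helm-based ingress-nginx release in its own namespace, the decommission steps might resemble the following sketch; the release name, namespace, and ingress class are assumptions to adjust for your environment.

```bash
# Sketch: decommissioning an ingress-nginx controller installed via Helm.
# Release name, namespace, and ingress class are assumptions; adjust to fit.

# 1. Confirm nothing still references the old ingress class
#    (also check the legacy kubernetes.io/ingress.class annotation).
kubectl get ingress --all-namespaces -o json \
  | jq -r '.items[]
           | select(.spec.ingressClassName == "nginx")
           | "\(.metadata.namespace)/\(.metadata.name)"'

# 2. Once the list above is empty, remove the controller and its namespace.
helm uninstall ingress-nginx --namespace ingress-nginx
kubectl delete namespace ingress-nginx

# 3. Remove the now-unused IngressClass object.
kubectl delete ingressclass nginx
```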
Key Principles for a Resilient Framework
Beyond the phases, these principles should guide your framework’s design:
- Always Maintain Rollback Capability: Every step should be reversible with minimal disruption. This is a core tenet of managing Kubernetes deployments.
- Leverage GitOps for State Management: All desired state changes (Ingress resources, controller deployments) must flow through version-controlled manifests. This provides an audit trail, consistency, and the simplest rollback mechanism (`git revert`; a sketch follows this list).
- Validate with Production Traffic Patterns: Synthetic tests are insufficient. Use canary weights and traffic mirroring to validate with real user traffic in a controlled manner.
- Communicate Transparently: Platform teams should maintain a clear migration status page for internal stakeholders, showing which applications have been migrated, which are in progress, and the overall timeline.
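Because all changes flow through Git, the rollback path stays short. A minimal sketch, assuming an Argo CD-managed repository; the commit hash and application name are placeholders:

```bash
# Sketch: rolling back a cutover commit in a GitOps-managed repository.
# The commit hash and Argo CD application name are placeholders.
git revert --no-edit 3f2a9c1      # undo the ingressClassName change
git push origin main              # the GitOps controller reconciles the revert
argocd app sync platform-routing  # optional: trigger an immediate sync via the Argo CD CLI
```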
Conclusion: Building a Migration-Capable Platform
The deprecation of Ingress-NGINX is a wake-up call. The next major change is a matter of “when,” not “if.” By investing in a structured migration framework now, platform teams transform a potential crisis into a manageable, repeatable operational procedure.
This framework—Assess, Run in Parallel, Validate, and Decommission—abstracts the specific lessons from the ingress migration into a generic pattern. It can be applied to migrating from PodSecurityPolicies to Pod Security Standards, from a deprecated CSI driver to a supported one, or from one service mesh to another. The tools (GitOps, canary deployments, observability) are already in your stack. The value is in stitching them together into a disciplined process that ensures platform evolution doesn’t compromise platform stability.
Start by documenting this framework as a runbook template. Then, apply it to your next significant component update, even a minor one, to refine the process. When the next major deprecation announcement lands in your inbox, you’ll be ready.
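One way to seed that runbook template is a simple checklist keyed to the four phases. The structure below is purely illustrative, not a standard or tool-specific schema; it only restates the phases of this framework as trackable fields.

```yaml
# Sketch: a migration runbook skeleton derived from the four phases.
# Field names are illustrative, not a standard schema.
migration:
  component: "ingress-nginx -> <replacement>"
  phases:
    assessment:
      inventory_complete: false
      annotations_catalogued: false
      alternatives_evaluated: false
    parallel_run:
      new_controller_deployed: false
      canary_traffic_percent: 0
      equivalency_validated: false
    cutover:
      validation_suite_passed: false
      gitops_switch_merged: false
      rollback_tested: false
    decommission:
      soak_period: "72h"
      old_resources_removed: false
      runbooks_and_diagrams_updated: false
```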