Building a Safe AKS Upgrade Testing Environment with ArgoCD & Kustomize

Kubernetes upgrades in production always carry risk. Removed or deprecated APIs, networking changes, altered autoscaler behavior, and ingress regressions can all break critical workloads if they are not validated beforehand.

In our environment, the production AKS cluster runs dozens of business applications across multiple namespaces, managed through ArgoCD and GitOps. We needed a safe way to:

  • validate Kubernetes upgrades

  • test application compatibility

  • avoid production downtime

  • keep everything GitOps-driven

  • reduce infrastructure costs

This article documents the complete setup process: creating a test AKS cluster, deploying a separate ArgoCD instance, and using Kustomize overlays to deploy production workloads with reduced replicas and resource usage.

Why We Needed a Test Cluster

Our production AKS cluster hosts a multi-tenant platform with numerous namespaces and services. Upgrading directly in production was not an option.

The goal was to create a test cluster that:

  • mirrors production configuration

  • runs selected workloads

  • uses fewer resources

  • is fully managed via GitOps

  • allows safe upgrade testing

Production Environment Snapshot

The production cluster contains both platform and business namespaces.

Platform namespaces

  • argocd

  • calico-system

  • cert-manager

  • ingress-nginx

  • monitoring

  • kong

  • tigera-operator

Business namespaces

  • contracts-service

  • identity-service

  • workflow-engine

  • api-gateway

  • reporting-service

  • messaging-service

Instead of replicating everything, we chose representative workloads to validate upgrade safety.

Step 1 — Extract Production Cluster Configuration

First, we exported cluster configuration for reference.

az aks show \
  --resource-group prod-cluster-rg \
  --name prod-aks-cluster \
  --output json > aks-prod.json

Check Kubernetes version:

az aks show \
  -g prod-cluster-rg \
  -n prod-aks-cluster \
  --query kubernetesVersion \
  --output tsv

List node pools:

az aks nodepool list \
  --resource-group prod-cluster-rg \
  --cluster-name prod-aks-cluster \
  -o table

Check networking:

az aks show \
  --resource-group prod-cluster-rg \
  --name prod-aks-cluster \
  --query networkProfile

Key observations:

  • network plugin: kubenet

  • network policy: calico

  • pod CIDR: 10.244.0.0/16

  • service CIDR: 10.0.0.0/16

The test cluster must match these settings.

Step 2 — Create Test Resource Group

az group create \
  --name test-cluster-rg \
  --location westeurope

Step 3 — Create Test AKS Cluster

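We created a scaled-down cluster with the same networking settings as production. Set --kubernetes-version to the version production reported in Step 1, so that the test upgrade starts from the same point as the real one.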

az aks create \
  --resource-group test-cluster-rg \
  --name test-aks-cluster \
  --location westeurope \
  --kubernetes-version 1.31.6 \
  --network-plugin kubenet \
  --network-policy calico \
  --pod-cidr 10.244.0.0/16 \
  --service-cidr 10.0.0.0/16 \
  --dns-service-ip 10.0.0.10 \
  --node-count 2 \
  --node-vm-size Standard_E2as_v6 \
  --enable-managed-identity \
  --generate-ssh-keys \
  --load-balancer-sku standard

Add a secondary node pool:

az aks nodepool add \
  --resource-group test-cluster-rg \
  --cluster-name test-aks-cluster \
  --name systempool \
  --node-count 1 \
  --node-vm-size Standard_E2as_v6 \
  --mode System
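
One way to confirm that the test cluster really mirrors the production network settings from Step 1 is to compare the two network profiles directly. The sketch below assumes the resource group and cluster names used in this article; the function names network_profile and same_network are illustrative, not part of any tool.

```shell
# Print the network settings of one cluster as plain text.
network_profile() {
  az aks show \
    --resource-group "$1" \
    --name "$2" \
    --query 'networkProfile.[networkPlugin, networkPolicy, podCidr, serviceCidr]' \
    --output tsv
}

# Return 0 when production and test report identical network settings.
same_network() {
  prod_net="$(network_profile prod-cluster-rg prod-aks-cluster)" || return 2
  test_net="$(network_profile test-cluster-rg test-aks-cluster)" || return 2
  if [ "$prod_net" = "$test_net" ]; then
    echo "network settings match"
  else
    echo "network settings differ" >&2
    echo "prod: $prod_net" >&2
    echo "test: $test_net" >&2
    return 1
  fi
}

# Usage: same_network
```

A mismatch here is worth fixing before deploying anything, since the network plugin and CIDR ranges cannot be changed casually after cluster creation.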

Step 4 — Connect kubectl to Test Cluster

az aks get-credentials \
  --resource-group test-cluster-rg \
  --name test-aks-cluster

Verify:

kubectl get nodes

Step 5 — Install a Separate ArgoCD in the Test Cluster

Since the existing ArgoCD runs inside production, we deployed a separate instance for isolation.

Switch context:

kubectl config use-context test-aks-cluster

Create namespace:

kubectl create namespace argocd

Install ArgoCD:

kubectl apply -n argocd \
-f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Wait for pods:

kubectl get pods -n argocd
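
Instead of polling the pod list by hand, the wait can be scripted. This is a minimal sketch that assumes the stock install manifest, in which the application controller runs as a StatefulSet and the other components run as Deployments; the 300-second timeout and the function name wait_for_argocd are arbitrary choices.

```shell
# Block until the ArgoCD workloads in the argocd namespace are ready.
wait_for_argocd() {
  # All Deployments (server, repo-server, redis, ...) must report Available
  kubectl wait deployment --all \
    --namespace argocd \
    --for=condition=Available \
    --timeout=300s || return 1

  # The application controller is a StatefulSet in the stock manifest
  kubectl rollout status statefulset/argocd-application-controller \
    --namespace argocd \
    --timeout=300s
}

# Usage: wait_for_argocd
```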

Expose ArgoCD

Quick access:

kubectl port-forward svc/argocd-server -n argocd 8081:443

Open:

https://localhost:8081

Get admin password

kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d

Login with:

  • username: admin

  • password: retrieved value

Step 6 — Understanding the GitOps Repository Structure

Production ArgoCD apps deploy from:

repoURL: https://bitbucket.org/your-org/kubernetes-platform.git
path: azure/<app>

Example:

azure/app-a
azure/app-b
azure/app-c

Instead of duplicating folders, we used Kustomize overlays to create a test environment.

Step 7 — Create Kustomize Base

Inside an application folder:

azure/app-a/

Create:

azure/app-a/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- ingress.yaml

This defines the base configuration.
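
Once a directory contains a kustomization.yaml, ArgoCD renders it with Kustomize, so only the files listed under resources are deployed. A quick check for manifests that were left out of the list can look like the sketch below. It assumes a flat directory of .yaml files and resources entries written one per line as "- file.yaml"; the function name unlisted_manifests is illustrative.

```shell
# Print every *.yaml file in a base directory that its
# kustomization.yaml does not list as a resource.
unlisted_manifests() {
  base="$1"
  for f in "$base"/*.yaml; do
    [ -e "$f" ] || continue
    name="$(basename "$f")"
    [ "$name" = "kustomization.yaml" ] && continue
    # Strip the leading "- " from list entries, then look for an exact match
    sed -n 's/^[[:space:]]*-[[:space:]]*//p' "$base/kustomization.yaml" \
      | grep -Fxq -- "$name" || echo "$name"
  done
}

# Usage: unlisted_manifests azure/app-a
```

Empty output means every manifest in the folder is referenced by the base.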

Step 8 — Create Test Overlay

Create:

azure/app-a/overlays/test/

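overlays/test/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../

# Each patches entry is an object; a bare file name is not accepted here
patches:
- path: replicas.yaml
- path: resources.yaml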

Reduce replicas

overlays/test/replicas.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-a
spec:
  replicas: 1

Reduce resource usage

overlays/test/resources.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-a
spec:
  template:
    spec:
      containers:
      - name: app-a
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "300m"
            memory: "256Mi"

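Optional: disable autoscaling

Save this as a patch file in the overlay (for example overlays/test/delete-hpa.yaml) and list it under patches. It only applies if the base includes the HorizontalPodAutoscaler manifest in its resources.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-a
$patch: delete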

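Optional: change ingress hostname

A strategic merge patch replaces the Ingress rules list as a whole, so the patch must repeat the http paths of the base rule, not only the host.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-a-ingress
spec:
  rules:
  - host: app-a-test.example.com
    # repeat the http.paths block from the base Ingress here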

Step 9 — Commit and Push

git add .
git commit -m "Add test overlay"
git push

Step 10 — Deploy Application via Test ArgoCD

Create a new ArgoCD application.

Source

Repository:

https://bitbucket.org/your-org/kubernetes-platform.git

Path:

azure/app-a/overlays/test

Revision:

HEAD

Destination

Cluster: in-cluster
Namespace: app-a-test

Create and sync the application.
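
The same application can also be created declaratively, which keeps the test setup reproducible. The sketch below prints an Application manifest for a given app folder. The application name (<app>-test), the default project, and the CreateNamespace sync option are assumptions added for illustration; the in-cluster destination corresponds to the server address https://kubernetes.default.svc.

```shell
# Print an ArgoCD Application manifest for the test overlay of one app.
render_test_app() {
  app="$1"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${app}-test
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://bitbucket.org/your-org/kubernetes-platform.git
    path: azure/${app}/overlays/test
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: ${app}-test
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
EOF
}

# Usage: render_test_app app-a | kubectl apply -f -
```

Without CreateNamespace=true, the app-a-test namespace has to exist before the first sync.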

Why This Does Not Affect Production

Production ArgoCD deploys from:

azure/app-a

Test ArgoCD deploys from:

azure/app-a/overlays/test

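Because the two ArgoCD instances sync different paths, the test patches never reach production. The one shared change is the base kustomization.yaml from Step 7: once it exists, production ArgoCD renders azure/app-a with Kustomize, so the base must list every manifest that production deploys.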

Step 11 — Repeat for Other Critical Applications

Repeat the overlay approach for representative services such as:

  • identity service

  • workflow engine

  • core API

  • messaging service

This provides realistic upgrade validation coverage.
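
Repeating the overlay by hand for each service gets tedious, so the boilerplate can be generated. This is a rough sketch that writes only the kustomization.yaml and the replica patch; resource limits and any optional patches still need to be added per application. The function name and the folder names in the usage line are hypothetical, since the article does not list the real paths.

```shell
# Create overlays/test with a replica-reduction patch for one application.
scaffold_test_overlay() {
  app_dir="$1"      # application folder that holds the base kustomization.yaml
  deploy_name="$2"  # metadata.name of the Deployment to scale down
  overlay="$app_dir/overlays/test"
  mkdir -p "$overlay" || return 1

  cat > "$overlay/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../
patches:
- path: replicas.yaml
EOF

  cat > "$overlay/replicas.yaml" <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $deploy_name
spec:
  replicas: 1
EOF
}

# Usage (hypothetical folder and Deployment names):
#   scaffold_test_overlay azure/identity-service identity-service
```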

Step 12 — Preview Overlay Output (Optional)

To preview the final manifests:

kubectl kustomize azure/app-a/overlays/test
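
The preview can double as a guard against overlays that forgot to scale something down. The filter below is a small sketch that reads rendered manifests on standard input and reports any workload whose spec.replicas is not 1. It relies on the two-space indentation and field order that kubectl kustomize emits; the function name is illustrative.

```shell
# Read rendered manifests on stdin and print workloads with replicas != 1.
find_scaled_workloads() {
  awk '
    /^---/                    { kind = ""; name = "" }   # new document
    /^kind:/                  { kind = $2 }
    /^  name:/ && !name       { name = $2 }              # metadata.name
    /^  replicas:/ && $2 != 1 { print kind "/" name " replicas=" $2 }
  '
}

# Usage: kubectl kustomize azure/app-a/overlays/test | find_scaled_workloads
```

No output means every workload in the overlay renders with a single replica.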

Step 13 — Upgrade Testing Workflow

Once workloads are running:

  1. upgrade the test cluster

  2. validate ingress & TLS

  3. verify autoscaling behavior

  4. check storage mounts

  5. test authentication & APIs

  6. run smoke tests

  7. monitor logs & metrics

If everything is stable, proceed with the production upgrade.
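
For the first step of the workflow, the test cluster upgrade itself can be driven from the Azure CLI. The sketch below assumes the test resource group and cluster names from this article. The target version is deliberately left as an argument, to be picked from what az aks get-upgrades reports; the function name is illustrative.

```shell
# List available upgrades, then upgrade the test cluster to the given version.
upgrade_test_cluster() {
  target="${1:-}"

  # Show which Kubernetes versions the test cluster can move to
  az aks get-upgrades \
    --resource-group test-cluster-rg \
    --name test-aks-cluster \
    --output table || return 1

  if [ -z "$target" ]; then
    echo "pass a version from the table above to start the upgrade" >&2
    return 0
  fi

  # Upgrade control plane and node pools without an interactive prompt
  az aks upgrade \
    --resource-group test-cluster-rg \
    --name test-aks-cluster \
    --kubernetes-version "$target" \
    --yes || return 1

  # Confirm that every node reports the new version and is Ready
  kubectl get nodes -o wide
}

# Usage: upgrade_test_cluster            # list versions only
#        upgrade_test_cluster <version>  # run the upgrade
```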

Key Lessons Learned

  • Never test upgrades directly in production.

  • Use environment overlays instead of duplicating manifests.

  • Separate ArgoCD instances improve isolation.

  • Reduce replicas and resources to control costs.

  • Always test ingress, storage, and autoscaling behavior.

Final Thoughts

This approach transforms upgrade testing from a risky operation into a controlled and repeatable process.

By combining AKS, ArgoCD, and Kustomize overlays, we built a GitOps-driven test environment that mirrors production behavior while remaining cost-efficient and safe.

This setup is now part of our upgrade lifecycle and has significantly reduced operational risk.