Building a Safe AKS Upgrade Testing Environment with ArgoCD & Kustomize
Kubernetes upgrades in production environments always carry risk. API deprecations, networking changes, autoscaler behavior, or ingress regressions can break critical workloads if not validated beforehand.
In our environment, the production AKS cluster runs dozens of business applications across multiple namespaces, managed through ArgoCD and GitOps. We needed a safe way to:
validate Kubernetes upgrades
test application compatibility
avoid production downtime
keep everything GitOps-driven
reduce infrastructure costs
This article documents the complete setup process: creating a test AKS cluster, deploying a separate ArgoCD instance, and using Kustomize overlays to deploy production workloads with reduced replicas and resource usage.
Why We Needed a Test Cluster
Our production AKS cluster hosts a multi-tenant platform with numerous namespaces and services. Upgrading directly in production was not an option.
The goal was to create a test cluster that:
mirrors production configuration
runs selected workloads
uses fewer resources
is fully managed via GitOps
allows safe upgrade testing
Production Environment Snapshot
The production cluster contains both platform and business namespaces.
Platform namespaces
argocd
calico-system
cert-manager
ingress-nginx
monitoring
kong
tigera-operator
Business namespaces
contracts-service
identity-service
workflow-engine
api-gateway
reporting-service
messaging-service
Instead of replicating everything, we chose representative workloads to validate upgrade safety.
Step 1 — Extract Production Cluster Configuration
First, we exported cluster configuration for reference.
az aks show \
--resource-group prod-cluster-rg \
--name prod-aks-cluster \
--output json > aks-prod.json
Check Kubernetes version:
az aks show \
-g prod-cluster-rg \
-n prod-aks-cluster \
--query kubernetesVersion
List node pools:
az aks nodepool list \
--resource-group prod-cluster-rg \
--cluster-name prod-aks-cluster \
-o table
Check networking:
az aks show \
--resource-group prod-cluster-rg \
--name prod-aks-cluster \
--query networkProfile
Key observations:
network plugin: kubenet
network policy: calico
pod CIDR: 10.244.0.0/16
service CIDR: 10.0.0.0/16
The test cluster must match these settings.
Step 2 — Create Test Resource Group
az group create \
--name test-cluster-rg \
--location westeurope
Step 3 — Create Test AKS Cluster
We created a scaled-down cluster with identical networking.
az aks create \
--resource-group test-cluster-rg \
--name test-aks-cluster \
--location westeurope \
--kubernetes-version 1.31.6 \
--network-plugin kubenet \
--network-policy calico \
--pod-cidr 10.244.0.0/16 \
--service-cidr 10.0.0.0/16 \
--dns-service-ip 10.0.0.10 \
--node-count 2 \
--node-vm-size Standard_E2as_v6 \
--enable-managed-identity \
--generate-ssh-keys \
--load-balancer-sku standard
Add a secondary node pool:
az aks nodepool add \
--resource-group test-cluster-rg \
--cluster-name test-aks-cluster \
--name systempool \
--node-count 1 \
--node-vm-size Standard_E2as_v6 \
--mode System
Step 4 — Connect kubectl to Test Cluster
az aks get-credentials \
--resource-group test-cluster-rg \
--name test-aks-cluster
Verify:
kubectl get nodes
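Listing nodes only confirms that the cluster is reachable. To check that the networking settings recorded in Step 1 carried over to the test cluster, a small comparison helper can be useful. The sketch below is one possible approach, not part of the original setup. The function names network_profile and compare_profiles are hypothetical, and it assumes the Azure CLI is installed and logged in.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative, not Azure CLI features.

# Print the plugin, policy, pod CIDR and service CIDR of one AKS cluster.
network_profile() {
  local rg="$1" name="$2"
  az aks show \
    --resource-group "$rg" \
    --name "$name" \
    --query "networkProfile.[networkPlugin, networkPolicy, podCidr, serviceCidr]" \
    --output tsv
}

# Compare two captured profiles; print the result and return non-zero on mismatch.
compare_profiles() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Usage, with the resource names from this article:
#   prod_profile=$(network_profile prod-cluster-rg prod-aks-cluster)
#   test_profile=$(network_profile test-cluster-rg test-aks-cluster)
#   compare_profiles "$prod_profile" "$test_profile"
```

Keeping the comparison separate from the Azure call means the check can be reused for other fields by changing only the query.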
Step 5 — Install a Separate ArgoCD in the Test Cluster
Since the existing ArgoCD runs inside production, we deployed a separate instance for isolation.
Switch context:
kubectl config use-context test-aks-cluster
Create namespace:
kubectl create namespace argocd
Install ArgoCD:
kubectl apply -n argocd \
-f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
Wait for pods:
kubectl get pods -n argocd
Expose ArgoCD
Quick access:
kubectl port-forward svc/argocd-server -n argocd 8081:443
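Open https://localhost:8081 in a browser. The ArgoCD server presents a self-signed certificate by default, so expect a browser warning.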
Get admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
Login with:
username: admin
password: retrieved value
Step 6 — Understanding the GitOps Repository Structure
Production ArgoCD apps deploy from:
repoURL: https://bitbucket.org/your-org/kubernetes-platform.git
path: azure/<app>
Example:
azure/app-a
azure/app-b
azure/app-c
Instead of duplicating folders, we used Kustomize overlays to create a test environment.
Step 7 — Create Kustomize Base
Inside an application folder:
azure/app-a/
Create:
azure/app-a/kustomization.yaml
resources:
- deployment.yaml
- service.yaml
- ingress.yaml
This defines the base configuration.
Step 8 — Create Test Overlay
Create:
azure/app-a/overlays/test/
overlays/test/kustomization.yaml
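resources:
- ../../
# Each entry under patches must be an object with a path (or inline patch);
# current Kustomize rejects a bare file name here.
patches:
- path: replicas.yaml
- path: resources.yaml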
Reduce replicas
overlays/test/replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-a
spec:
replicas: 1
Reduce resource usage
overlays/test/resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-a
spec:
template:
spec:
containers:
- name: app-a
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "300m"
memory: "256Mi"
Optional: disable autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-a
$patch: delete
Optional: change ingress hostname
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-a-ingress
spec:
rules:
- host: app-a-test.example.com
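One caveat with a host-only rule like the one above: applied as a strategic merge patch, it may replace the whole spec.rules list and drop the paths and backends defined in the base. A JSON 6902 patch that touches only the host field avoids that. The sketch below appends such a patch to the overlay's kustomization.yaml. The helper name add_ingress_host_patch is hypothetical, and it assumes that the file ends with the patches: list from Step 8 and that the host to change belongs to the first rule.

```shell
#!/usr/bin/env bash
# Sketch only: add_ingress_host_patch is an illustrative helper.

# Append a JSON 6902 patch entry to <overlay_dir>/kustomization.yaml.
# Assumes the file currently ends with its `patches:` list.
add_ingress_host_patch() {
  local overlay_dir="$1" host="$2"
  cat >> "$overlay_dir/kustomization.yaml" <<EOF
- target:
    kind: Ingress
    name: app-a-ingress
  patch: |-
    - op: replace
      path: /spec/rules/0/host
      value: $host
EOF
}

# Usage:
#   add_ingress_host_patch azure/app-a/overlays/test app-a-test.example.com
```

Rendering the overlay afterwards and inspecting the Ingress is a quick way to confirm that the paths survived.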
Step 9 — Commit and Push
git add .
git commit -m "Add test overlay"
git push
Step 10 — Deploy Application via Test ArgoCD
Create a new ArgoCD application.
Source
Repository:
https://bitbucket.org/your-org/kubernetes-platform.git
Path:
azure/app-a/overlays/test
Revision:
HEAD
Destination
Cluster: in-cluster
Namespace: app-a-test
Create and sync the application.
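The same application can also be defined declaratively instead of through the UI, which helps when repeating the setup for several services. The following is a minimal sketch under a few assumptions: render_app is a hypothetical helper, the application is named <app>-test, it uses the default ArgoCD project, and the target namespace does not exist yet (hence CreateNamespace=true).

```shell
#!/usr/bin/env bash
# Sketch only: render_app is an illustrative helper, not an ArgoCD command.

# Print an ArgoCD Application manifest for one app's test overlay.
render_app() {
  local app="$1"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${app}-test
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://bitbucket.org/your-org/kubernetes-platform.git
    path: azure/${app}/overlays/test
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc   # the "in-cluster" destination
    namespace: ${app}-test
  syncPolicy:
    syncOptions:
      - CreateNamespace=true   # assumption: the namespace does not exist yet
EOF
}

# Usage (against the test cluster context):
#   render_app app-a | kubectl apply -f -
```

Without an automated sync policy, the application still needs a manual sync after it is created.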
Why This Does Not Affect Production
Production ArgoCD deploys from:
azure/app-a
Test ArgoCD deploys from:
azure/app-a/overlays/test
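Because the two instances read different paths, syncing the test overlay never changes what production deploys. One caveat: once azure/app-a contains a kustomization.yaml, ArgoCD renders that path with Kustomize, so the base must list every manifest production needs.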
Step 11 — Repeat for Other Critical Applications
Repeat the overlay approach for representative services such as:
identity service
workflow engine
core API
messaging service
This provides realistic upgrade validation coverage.
Step 12 — Preview Overlay Output (Optional)
To preview the final manifests:
kubectl kustomize azure/app-a/overlays/test
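The preview can also feed a quick sanity check before committing. The sketch below assumes the test overlay should never request more than one replica. check_replicas is a hypothetical helper that scans rendered YAML for larger values, and it looks only at replicas: fields, not at autoscaler settings.

```shell
#!/usr/bin/env bash
# Sketch only: check_replicas is an illustrative helper.

# Read rendered YAML on stdin. Print any "replicas:" line with a value of 2
# or more and return 1; otherwise print "replicas OK".
check_replicas() {
  if grep -En '^[[:space:]]*replicas:[[:space:]]*([2-9]|[1-9][0-9]+)[[:space:]]*$'; then
    return 1
  fi
  echo "replicas OK"
}

# Usage:
#   kubectl kustomize azure/app-a/overlays/test | check_replicas
```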
Step 13 — Upgrade Testing Workflow
Once workloads are running:
upgrade the test cluster
validate ingress & TLS
verify autoscaling behavior
check storage mounts
test authentication & APIs
run smoke tests
monitor logs & metrics
If everything is stable, proceed with the production upgrade.
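The first item in the workflow, upgrading the test cluster, might look like the sketch below. The helper names and the DRY_RUN switch are hypothetical. The resource names are the ones used earlier in this article, and the target version is left as an argument because it depends on what Azure offers for the cluster at the time.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the DRY_RUN switch are illustrative.

# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List available upgrades, then upgrade the test cluster to the given version.
upgrade_test_cluster() {
  local version="${1:-}"
  if [ -z "$version" ]; then
    echo "usage: upgrade_test_cluster <kubernetes-version>" >&2
    return 1
  fi
  run az aks get-upgrades \
    --resource-group test-cluster-rg \
    --name test-aks-cluster \
    --output table
  run az aks upgrade \
    --resource-group test-cluster-rg \
    --name test-aks-cluster \
    --kubernetes-version "$version"
}

# Usage:
#   DRY_RUN=1 upgrade_test_cluster <target-version>   # print the commands only
#   upgrade_test_cluster <target-version>             # run them
```

az aks upgrade asks for confirmation before it starts, which leaves a moment to review the table of available versions first.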
Key Lessons Learned
Never test upgrades directly in production.
Use environment overlays instead of duplicating manifests.
Separate ArgoCD instances improve isolation.
Reduce replicas and resources to control costs.
Always test ingress, storage, and autoscaling behavior.
Final Thoughts
This approach transforms upgrade testing from a risky operation into a controlled and repeatable process.
By combining AKS, ArgoCD, and Kustomize overlays, we built a GitOps-driven test environment that mirrors production behavior while remaining cost-efficient and safe.
This setup is now part of our upgrade lifecycle and has significantly reduced operational risk.

