Written by: Vimal P
Date: March 24, 2021
Tanzu Kubernetes Grid 1.2 upgrade – Tips & Tricks
Upgrading an environment has always been challenging. In this blog, we will walk through the upgrade process step by step and highlight some of the issues you might encounter along the way.
Workflow for upgrading TKG 1.1.x to TKG 1.2.x

In my current lab, Tanzu Kubernetes Grid was running version 1.1.0 with Kubernetes version 1.18.2.
Step-1: Verify that the management and workload clusters present in the environment are in a healthy state. Check health and versions with the help of the below commands.
# tkg version
# tkg get cluster --include-management-cluster
# kubectl get cluster -A --show-labels
# tkg get mc
Step-2: Download and install the new TKG 1.2.1 CLI binary and the latest Kubernetes node OS OVA file. There is no need for the HAProxy OVA file (it is no longer available for download) since we’re moving to Kube-VIP.
Modify the ~/.tkg/config.yaml file to point to the new Kubernetes node OS OVA template path in place of the old vSphere template path.
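For reference, in my config.yaml the node OS template is referenced through the VSPHERE_TEMPLATE variable; the inventory path below is only an illustration and will differ in your environment.
VSPHERE_TEMPLATE: /Datacenter/vm/photon-3-kube-v1.19.3+vmware.1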

With the new CLI in place and the new OVA saved as a template in vCenter, we can run our first tkg command with the new binary to update the relevant TKG files.
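Any command with the new binary will do; for example, re-running the same checks from Step-1 is enough to trigger the update of the files under ~/.tkg.
# tkg version
# tkg get mc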
Step-3: We use the tkg upgrade management-cluster and tkg upgrade cluster CLI commands to upgrade clusters.
Before starting the management cluster upgrade, label the management cluster as a management cluster.
# kubectl -n tkg-system label cluster management-cluster-tkg cluster-role.tkg.tanzu.vmware.com/management="" --overwrite=true

# kubectl get cluster -A --show-labels

We are now ready to start upgrading the management cluster:
# tkg upgrade management-cluster management-cluster-tkg

In the vSphere recent tasks pane, we observed a new control plane VM being created from the 1.19.3 Kubernetes node OS image that we saved as a template. After some time, the original control plane node for the management cluster was deleted.
My lab has three control plane VMs, and they were updated one after another. Once the control plane VMs were updated, new worker nodes were created from the 1.19.3 image and the originals were deleted.
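If you prefer to follow the rollout from the CLI rather than the vSphere client, you can watch the Cluster API machine objects being replaced; this extra check is my own addition and uses the tkg-system namespace where the management cluster objects live.
# kubectl -n tkg-system get machines -w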
After some time, the process completes. Once the management cluster is upgraded, verify it:
# tkg get cluster --include-management-cluster

Upgrading the workload cluster follows the same process, using the tkg upgrade cluster CLI command.
# tkg upgrade cluster cluster-tenant1

We can now see that both clusters are running the newer Kubernetes version.
# tkg get cluster --include-management-cluster
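As an extra verification step of my own, you can also fetch the workload cluster credentials and confirm that its nodes report the new Kubernetes version; the context name below follows the usual CLUSTER-admin@CLUSTER pattern and may differ in your setup.
# tkg get credentials cluster-tenant1
# kubectl config use-context cluster-tenant1-admin@cluster-tenant1
# kubectl get nodes -o wide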

Step-4: The TKG Extensions are not present in my environment, hence I skipped this step. For details, refer to Upgrade TKG Extensions from 1.1.x to 1.2.x in the VMware documentation.
Step-5: This step was not present during the TKG 1.0 to 1.1 upgrade.
Once the management and workload clusters are up and running, the next step is to migrate the clusters from an HAProxy load balancer to Kube-VIP. Updating clusters to use Kube-VIP rather than HAProxy allows you to remove the HAProxy VM from each cluster that was deployed with the older version, which reduces resource consumption.
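Before starting the migration, it can be useful to list every HAProxyLoadBalancer object across namespaces to see exactly what will be removed; this quick inventory check is my own addition.
# kubectl get haproxyloadbalancer -A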
First, validate the IP address in use for the management cluster. An object of type HAProxyLoadBalancer should exist, and we just need to get the IP address associated with it using the commands below.
# kubectl -n tkg-system get haproxyloadbalancer management-cluster-tkg-tkg-system

# kubectl -n tkg-system get haproxyloadbalancer management-cluster-tkg-tkg-system -o template='{{.status.address}}'

With the IP address validated, we can now prepare to update the kubeadm configuration to make use of it. Create a file named patch.yaml and paste the YAML below into it, containing the kube-vip settings we need and the IP address that was used by HAProxy (used as the vip_address value).
# cat patch.yaml
spec:
  kubeadmConfigSpec:
    files:
    - content: |
        apiVersion: v1
        kind: Pod
        metadata:
          creationTimestamp: null
          name: kube-vip
          namespace: kube-system
        spec:
          containers:
          - args:
            - start
            env:
            - name: vip_arp
              value: "true"
            - name: vip_leaderelection
              value: "true"
            - name: vip_address
              value: 192.168.1.41
            - name: vip_interface
              value: eth0
            - name: vip_leaseduration
              value: "15"
            - name: vip_renewdeadline
              value: "10"
            - name: vip_retryperiod
              value: "2"
            image: registry.tkg.vmware.run/kube-vip:v0.1.8_vmware.1
            imagePullPolicy: IfNotPresent
            name: kube-vip
            resources: {}
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - SYS_TIME
            volumeMounts:
            - mountPath: /etc/kubernetes/admin.conf
              name: kubeconfig
          hostNetwork: true
          volumes:
          - hostPath:
              path: /etc/kubernetes/admin.conf
              type: FileOrCreate
            name: kubeconfig
        status: {}
      owner: root:root
      path: /etc/kubernetes/manifests/kube-vip.yaml
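If you script the migration, you can also substitute the HAProxy address into patch.yaml instead of editing it by hand; this is a small sketch of my own, assuming the sample value 192.168.1.41 is still the placeholder in the file.
# HAPROXY_IP=$(kubectl -n tkg-system get haproxyloadbalancer management-cluster-tkg-tkg-system -o template='{{.status.address}}')
# sed -i "s/192.168.1.41/${HAPROXY_IP}/" patch.yaml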
We need to validate the control plane name for the management cluster, so we can patch it.
# kubectl -n tkg-system get kcp

Now we are ready to apply the patch.yaml file to the KubeadmControlPlane resource in the cluster.
# kubectl -n tkg-system patch kcp management-cluster-tkg-control-plane --type merge --patch "$(cat patch.yaml)"

When we apply this patch, the kcp-controller-manager pod in the cluster detects a change and starts creating new machines with the updated specifications. We see a new control plane node getting created in the vSphere client and the original one getting removed. If you have multiple control plane nodes, the process will roll through them all, one at a time.
Once the process is done, we can check on the most recently created control plane node and make sure that it’s functional.
# kubectl get ma | grep $(kubectl get ma --sort-by=.metadata.creationTimestamp -o jsonpath="{.items[-1:].metadata.name}")

We can now remove the haproxyloadbalancer object with the help of the below command. This also removes the HAProxy VM from vSphere.
# kubectl -n tkg-system delete haproxyloadbalancer management-cluster-tkg-tkg-system

We need to edit the vspherecluster object to remove the reference to HAProxy.
# kubectl -n tkg-system edit vspherecluster management-cluster-tkg

Remove the loadBalancerRef reference and save the file.
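As an alternative to editing the object interactively, the same change can be applied non-interactively with a JSON patch; this is a sketch of my own and assumes loadBalancerRef sits directly under .spec of the vspherecluster object.
# kubectl -n tkg-system patch vspherecluster management-cluster-tkg --type json -p '[{"op": "remove", "path": "/spec/loadBalancerRef"}]'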

Note: Since the IP address for the HAProxy VM was assigned via DHCP, we need to create a static DHCP reservation for the Kube-VIP address, using a fake MAC address that will never come into use, and modify the DHCP range in use to exclude this IP address. This ensures that the DHCP server never reassigns the Kube-VIP address to another machine. In my setup, DHCP was provided by NSX-T.
Step-6: HAProxy to Kube-VIP for the workload cluster
The process for moving from HAProxy to Kube-VIP for the workload cluster is similar to what was done for the management cluster and is also performed from the context of the management cluster. The main difference is that the objects for the workload cluster live in the default namespace.
# kubectl get haproxyloadbalancer cluster-tenant1-default -o template='{{.status.address}}'

The patch.yaml is the same as the one used earlier; only the vip_address value changes to the IP address that was used by HAProxy in the workload cluster.
:
:
            - name: vip_address
              value: 192.168.2.10
            - name: vip_interface
              value: eth0
:
:
# kubectl get kcp

# kubectl get ma | grep $(kubectl get ma --sort-by=.metadata.creationTimestamp -o jsonpath="{.items[-1:].metadata.name}")

# kubectl delete haproxyloadbalancer cluster-tenant1-default

# kubectl edit vspherecluster cluster-tenant1
Remove the loadBalancerRef reference and save the file.

Step-7: Both of my clusters are now upgraded to TKG 1.2.1, running Kubernetes 1.19.3.
For more detailed information on the Tanzu Kubernetes Grid upgrade, please refer to the VMware documentation on Upgrading Tanzu Kubernetes Grid.