New upgrade ansible for 1.28 k8s
The new procedure should:
1. Check that `/var/lib/kubelet/cpu_manager_state` has `"policyName":"static"`. If not, delete the file when restarting kubelet (the new cpumanager policy will be applied automatically during the upgrade).
2. Install the new GPU driver via apt. Most of this is already in place, but the multiple reboots should be eliminated.
3. Install fabricmanager if the GPU is a Tesla. Compare the list of HW prefixes to the list of devices on the node. The fabricmanager install code already exists; the check just needs to be implemented correctly in Ansible.
4. Actually perform the k8s upgrade.
5. Verify the CRI-O setup for the upgrade.
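
The two checks in steps 1 and 3 could be sketched roughly as below, for example as a helper script or the core of a custom Ansible module. This is only an illustration under assumptions: the function names, the `FABRICMANAGER_PREFIXES` value, and the way device names are collected are hypothetical and not taken from the existing playbooks. Treating an unparseable state file as stale is also an assumption.

```python
import json
from pathlib import Path

# Hypothetical prefix list; the real list of HW prefixes lives in the
# existing Ansible code and may differ.
FABRICMANAGER_PREFIXES = ("Tesla",)


def should_delete_cpu_manager_state(path):
    """Return True if the state file exists but does not record the static policy.

    A missing file needs no cleanup. An unparseable file is treated as
    stale (assumption), so kubelet can recreate it on restart.
    """
    state_file = Path(path)
    if not state_file.exists():
        return False
    try:
        state = json.loads(state_file.read_text())
    except json.JSONDecodeError:
        return True
    if not isinstance(state, dict):
        return True
    return state.get("policyName") != "static"


def needs_fabricmanager(device_names, prefixes=FABRICMANAGER_PREFIXES):
    """Return True if any device name starts with one of the given prefixes."""
    wanted = tuple(prefixes)
    return any(name.strip().startswith(wanted) for name in device_names)
```

A playbook could call `should_delete_cpu_manager_state("/var/lib/kubelet/cpu_manager_state")` before restarting kubelet, and pass the node's GPU device names (however they are gathered) to `needs_fabricmanager` to decide whether to run the existing fabricmanager install tasks.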
Edited by Sam Albin