Auto reboot nodes based on conditions
The GPU watcher in the portal is now disabled.
node-problem-detector/node-problem-detector added the GPUFailed condition to nodes, and k8s is supposed to automatically taint nodes with broken GPUs. We need to verify how this works, and then reboot nodes based on this taint rather than on whether they are cordoned.
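
A rough sketch of the selection logic, assuming the input is the parsed output of `kubectl get nodes -o json`. The taint key `example.com/gpu-failed` is a placeholder, since the actual key still has to be confirmed, and the function names are made up for illustration. The second helper lists nodes that report GPUFailed but carry no such taint, which may help with the verification step.

```python
# Assumption: placeholder taint key; replace once the real key is confirmed.
GPU_TAINT_KEY = "example.com/gpu-failed"
GPU_CONDITION_TYPE = "GPUFailed"


def has_gpu_failed_condition(node):
    """True if the node reports the GPUFailed condition with status "True"."""
    conditions = node.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == GPU_CONDITION_TYPE and c.get("status") == "True"
        for c in conditions
    )


def has_gpu_taint(node, taint_key=GPU_TAINT_KEY):
    """True if the node carries a taint with the given key."""
    taints = node.get("spec", {}).get("taints") or []
    return any(t.get("key") == taint_key for t in taints)


def nodes_to_reboot(node_list, taint_key=GPU_TAINT_KEY):
    """Names of nodes carrying the GPU taint.

    spec.unschedulable (cordon) is deliberately not consulted.
    """
    return [
        n["metadata"]["name"]
        for n in node_list.get("items", [])
        if has_gpu_taint(n, taint_key)
    ]


def condition_without_taint(node_list, taint_key=GPU_TAINT_KEY):
    """Names of nodes that report GPUFailed but have no GPU taint."""
    return [
        n["metadata"]["name"]
        for n in node_list.get("items", [])
        if has_gpu_failed_condition(n) and not has_gpu_taint(n, taint_key)
    ]
```

Feeding it `json.load(sys.stdin)` from `kubectl get nodes -o json` would give the candidate list; the reboot itself is left out on purpose.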