Develop nodeOperator
- watch for CrashLoopBackOff in GPU system pods
- if a node is broken, drain it
- reboot the node (ansible task)
- if the node does not come back after the reboot, issue an IPMI reset if possible
- monitor IPMI status on a separate page (connectivity to each node’s IPMI)
- update IPMI IPs from NetBox
- unify node labels (netbox* -> topology*)
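
The remediation steps above could be sketched as a small decision function. This is only an illustration under stated assumptions, not the operator's actual design: the names Action, NodeState, next_action and unify_labels are hypothetical, the ALERT step is an added fallback that the list does not mention, and the label rename treats "netbox" and "topology" as literal key prefixes. The only Kubernetes detail relied on is that a crash-looping container reports the waiting reason CrashLoopBackOff in status.containerStatuses.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    DRAIN = "drain"
    REBOOT = "reboot"          # e.g. the ansible task from the list
    IPMI_RESET = "ipmi_reset"
    ALERT = "alert"            # assumption: hand over to a human when nothing is left


@dataclass
class NodeState:
    """Hypothetical per-node bookkeeping kept by the operator."""
    broken: bool = False
    drained: bool = False
    reboot_attempted: bool = False
    rebooted: bool = False             # node came back after the reboot
    ipmi_reachable: bool = False
    ipmi_reset_attempted: bool = False


def is_crash_looping(pod: dict) -> bool:
    """True if any container of the pod waits with reason CrashLoopBackOff."""
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for cs in statuses:
        waiting = (cs.get("state") or {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            return True
    return False


def next_action(state: NodeState) -> Action:
    """Pick the next escalation step: drain, reboot, IPMI reset, then alert."""
    if not state.broken:
        return Action.NONE
    if not state.drained:
        return Action.DRAIN
    if not state.reboot_attempted:
        return Action.REBOOT
    if state.rebooted:
        return Action.NONE  # reboot worked; the pod watcher re-evaluates the node
    if state.ipmi_reachable and not state.ipmi_reset_attempted:
        return Action.IPMI_RESET
    return Action.ALERT


def unify_labels(labels: dict, old: str = "netbox", new: str = "topology") -> dict:
    """Rename label keys starting with `old` so they start with `new` instead.

    An already present `new`-prefixed key wins over the renamed one.
    """
    unified = {k: v for k, v in labels.items() if not k.startswith(old)}
    for key, value in labels.items():
        if key.startswith(old):
            unified.setdefault(new + key[len(old):], value)
    return unified
```

A caller would set NodeState.broken from any(is_crash_looping(p) for p in gpu_pods) on each reconcile pass, execute the returned action, and record the outcome in the state before the next pass. Keeping the decision pure (no cluster or IPMI calls inside next_action) makes the escalation order easy to unit-test.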