Grafana Monitoring Issues: Showing wrong number of GPUs
Grafana monitoring page showing GPU utilization of osg-nrao
The number of GPUs shown does not match the number of active pods using GPUs.
User reported
A pod that is listed as Completed shows up as Running in Grafana, and there is no termination entry in the events:
sfiligoi@Igors-MacBook-Pro image % kubectl get pods -n osg-nrao |grep osg-nrao-6697ded5-000162-xlbtv
osg-nrao-6697ded5-000162-xlbtv 0/1 Completed 0 91m
sfiligoi@Igors-MacBook-Pro image % kubectl get events -n osg-nrao |grep osg-nrao-6697ded5-000162-xlbtv |grep -v FailedScheduling
55m Normal Scheduled pod/osg-nrao-6697ded5-000162-xlbtv Successfully assigned osg-nrao/osg-nrao-6697ded5-000162-xlbtv to uicnrp-fiona2.evl.uic.edu
55m Normal Pulling pod/osg-nrao-6697ded5-000162-xlbtv Pulling image "sfiligoi/prp-portal-wn:ospool"
55m Normal Pulled pod/osg-nrao-6697ded5-000162-xlbtv Successfully pulled image "sfiligoi/prp-portal-wn:ospool" in 253.409149ms (253.420949ms including waiting)
55m Normal Created pod/osg-nrao-6697ded5-000162-xlbtv Created container htcondor
55m Normal Started pod/osg-nrao-6697ded5-000162-xlbtv Started container htcondor
sfiligoi@Igors-MacBook-Pro image %
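To cross-check the Grafana numbers against what the API server reports, a minimal sketch along the following lines could sum the GPUs requested by pods in each phase. The helper names `gpus_by_phase` and `fetch_pods` are hypothetical, and the resource name `nvidia.com/gpu` is an assumption about how GPUs are requested in this namespace. Note that a pod shown as Completed by `kubectl get pods` normally has the phase `Succeeded` in the JSON output.

```python
import json
import subprocess
from collections import Counter

# Assumption: GPUs are requested through the NVIDIA device plugin resource name.
GPU_RESOURCE = "nvidia.com/gpu"


def gpus_by_phase(pod_list):
    """Sum GPU limits per pod phase.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    Pods without a GPU limit are ignored.
    """
    totals = Counter()
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        for container in pod.get("spec", {}).get("containers", []):
            limits = container.get("resources", {}).get("limits", {})
            count = int(limits.get(GPU_RESOURCE, 0))
            if count:
                totals[phase] += count
    return dict(totals)


def fetch_pods(namespace):
    """Return the parsed pod list of a namespace (requires kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `gpus_by_phase(fetch_pods("osg-nrao"))` on a machine with cluster access would give one total per phase. Only the `Running` total should be comparable to the Grafana panel; any GPUs attributed to `Succeeded` pods would point to stale entries like the one reported above.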