Melissa's AXOL1TL Training for NRP Nautilus
Creating and running a test pod, using image gitlab-registry.nrp-nautilus.io/mquinnan/axol1tl-hub:axo-env-gpu-conda
#in same directory as axo-pod.yml
kubectl create -f axo-pod.yml -n axol1tl
#check status
kubectl --namespace=axol1tl get pod
#enter pod once ready
kubectl exec -it axo-pod -n axol1tl -- /bin/bash
#run testing script:
bash
conda activate axo-env
git clone https://gitlab.nrp-nautilus.io/mquinnan/nrp-axo.git .
export KERAS_BACKEND=torch
ray start --head
python test_contrastive_vae_tuner.py
#exit and delete pod when done
exit
kubectl delete -f axo-pod.yml -n axol1tl
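The status check above has to be repeated by hand until the pod is ready. As a sketch, kubectl wait can block until the pod reports Ready instead. The helper name wait_for_pod and the 300s default timeout are illustrative assumptions; the pod name axo-pod and the namespace axol1tl are taken from the commands above.

```shell
# Hypothetical helper (not part of the repo): block until a pod is Ready.
# Usage: wait_for_pod [pod] [namespace] [timeout]
wait_for_pod() {
  wfp_pod="${1:-axo-pod}"       # pod name used in the exec command above
  wfp_ns="${2:-axol1tl}"        # namespace used throughout these notes
  wfp_timeout="${3:-300s}"      # assumed default; raise it for slow image pulls
  kubectl wait --for=condition=Ready "pod/${wfp_pod}" \
    -n "${wfp_ns}" --timeout="${wfp_timeout}"
}
```

Calling wait_for_pod between the create and exec steps returns as soon as the pod is Ready, and exits non-zero if the timeout passes first.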
For jobs: the yml script is still in progress (pending). The script to run for jobs is contrastive_vae_tuner.py.
Data files are located in /axovol/ (persistent volume).
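To confirm the data volume is visible before training, the mount can be inspected from outside the pod. This is a sketch that assumes the test pod axo-pod is running and mounts the volume at /axovol/; the helper names list_axovol and list_claims are hypothetical.

```shell
# Hypothetical helper: list the contents of the data mount inside a running pod.
# Usage: list_axovol [pod] [namespace]
list_axovol() {
  lav_pod="${1:-axo-pod}"
  lav_ns="${2:-axol1tl}"
  # Everything after "--" runs inside the pod's container.
  kubectl exec "${lav_pod}" -n "${lav_ns}" -- ls -lh /axovol/
}

# Hypothetical helper: show the persistent volume claims in the namespace.
# Usage: list_claims [namespace]
list_claims() {
  kubectl get pvc -n "${1:-axol1tl}"
}
```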
Running a Job on a RayCluster
#make the raycluster
helm install raycluster kuberay/ray-cluster -f ray-cluster.yaml -n axol1tl
#check status
kubectl get rayclusters -n axol1tl
#start the job
kubectl apply -f tuner-job.yaml -n axol1tl
#view the job
kubectl get jobs -n axol1tl
#view the pod
kubectl get pods -n axol1tl
#debugging: view the pod logs (complete the pod name with the suffix shown by kubectl get pods)
kubectl logs tuner-job- -n axol1tl
#delete the job
kubectl delete job tuner-job -n axol1tl
#delete the ray cluster
kubectl delete raycluster raycluster-kuberay -n axol1tl
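The pod name in the logs command above ends in a generated suffix. As an alternative sketch, pods created by a Job carry a job-name label, so logs can be selected by label without knowing the suffix. This assumes the Job defined in tuner-job.yaml is named tuner-job (inferred from the pod name prefix, not confirmed by the source); the helper names and default values are hypothetical.

```shell
# Hypothetical helpers; the Job name "tuner-job" is an assumption.

# Print recent log lines from every pod belonging to the Job.
# Usage: tuner_job_logs [job] [namespace] [lines]
tuner_job_logs() {
  tjl_job="${1:-tuner-job}"
  tjl_ns="${2:-axol1tl}"
  tjl_lines="${3:-100}"
  kubectl logs -l "job-name=${tjl_job}" -n "${tjl_ns}" --tail="${tjl_lines}"
}

# Block until the Job reports the Complete condition, or time out.
# Usage: wait_for_job [job] [namespace] [timeout]
wait_for_job() {
  wfj_job="${1:-tuner-job}"
  wfj_ns="${2:-axol1tl}"
  wfj_timeout="${3:-3600s}"     # assumed default; tuning runs may need longer
  kubectl wait --for=condition=complete "job/${wfj_job}" \
    -n "${wfj_ns}" --timeout="${wfj_timeout}"
}
```

Note that wait_for_job only returns early on success: a Job that fails never reaches the Complete condition, so the call runs until the timeout, at which point tuner_job_logs is the place to look.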