Melissa's AXOL1TL Training for NRP Nautilus

Creating and running a test pod, using the image gitlab-registry.nrp-nautilus.io/mquinnan/axol1tl-hub:axo-env-gpu-conda

#in same directory as axo-pod.yml
kubectl create -f axo-pod.yml -n axol1tl

#check status
kubectl --namespace=axol1tl get pod

#enter pod once ready (same namespace the pod was created in)
kubectl exec -it axo-pod -n axol1tl -- /bin/bash

#run testing script:

bash
conda activate axo-env
git clone https://gitlab.nrp-nautilus.io/mquinnan/nrp-axo.git .
export KERAS_BACKEND=torch

ray start --head

python test_contrastive_vae_tuner.py

#exit and delete pod when done
exit
kubectl delete -f axo-pod.yml -n axol1tl
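
The "check status" and "enter pod once ready" steps above can be combined so that the shell only opens once the pod reports Ready. This is a minimal sketch, not part of the original workflow: the function name enter_when_ready, the 300s default timeout, and the KUBECTL override (set it to echo for a dry run) are assumptions made for illustration; the pod name and namespace are the ones used above.

```shell
# Sketch: wait for the test pod to become Ready, then open a shell in it.
# Defaults (axo-pod, axol1tl) come from the steps above; the timeout is an assumption.
# The function body runs in a subshell, so its variables do not leak into the caller.
enter_when_ready() (
  pod="${1:-axo-pod}"
  ns="${2:-axol1tl}"
  timeout="${3:-300s}"
  kubectl="${KUBECTL:-kubectl}"   # hypothetical override, e.g. KUBECTL=echo to print only

  # Block until the pod's Ready condition is true (or the timeout expires),
  # and only then start an interactive bash session inside it.
  "$kubectl" wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout="$timeout" &&
    "$kubectl" exec -it "$pod" -n "$ns" -- /bin/bash
)
```

Calling enter_when_ready with no arguments uses the names from this page; KUBECTL=echo enter_when_ready prints the two kubectl invocations without touching the cluster.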

For jobs: the yml script is still being worked on (pending). The script the job will run is contrastive_vae_tuner.py

Data files are located in the /axovol/ persistent volume.
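
To confirm the data files are visible before starting a run, the volume can be listed from outside the pod. This is a rough sketch that assumes the test pod (axo-pod) is running and mounts the persistent volume at /axovol/; the function name list_axovol and the KUBECTL override are hypothetical.

```shell
# Sketch: list the contents of the /axovol/ volume through the running test pod.
# Assumes axo-pod in namespace axol1tl mounts the persistent volume at /axovol/.
list_axovol() (
  pod="${1:-axo-pod}"
  ns="${2:-axol1tl}"
  kubectl="${KUBECTL:-kubectl}"   # hypothetical override, e.g. KUBECTL=echo to print only

  # Run ls inside the pod; no interactive session is needed for a one-off command.
  "$kubectl" exec "$pod" -n "$ns" -- ls -lh /axovol/
)
```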

Running a Job on a RayCluster

#make the raycluster (in the same namespace as the other commands)
helm install raycluster kuberay/ray-cluster -f ray-cluster.yaml -n axol1tl

#check status
kubectl get rayclusters -n axol1tl

#start the job
kubectl apply -f tuner-job.yaml -n axol1tl

#view the job
kubectl get jobs -n axol1tl

#view the pod
kubectl get pods -n axol1tl

#debugging (tuner-job- is only the prefix of the pod name; use the full name shown by kubectl get pods)
kubectl logs tuner-job- -n axol1tl

#delete the job (kubectl delete needs the resource type)
kubectl delete job tuner-job -n axol1tl

#delete the ray cluster (kubectl delete needs the resource type)
kubectl delete raycluster raycluster-kuberay -n axol1tl
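
Because the pod created by the job gets a generated suffix, it can be easier to address it through the job itself. The sketch below relies on two standard Kubernetes behaviours: pods created by a Job carry the label job-name=<job>, and kubectl logs accepts job/<name>. The function name follow_tuner_job and the KUBECTL override are assumptions for illustration; the job name and namespace are taken from the commands above.

```shell
# Sketch: find the job's pod(s) and stream logs without typing the pod suffix.
# Defaults (tuner-job, axol1tl) come from the commands above.
follow_tuner_job() (
  job="${1:-tuner-job}"
  ns="${2:-axol1tl}"
  kubectl="${KUBECTL:-kubectl}"   # hypothetical override, e.g. KUBECTL=echo to print only

  # List the pods that belong to the job, selected by the job-name label.
  "$kubectl" get pods -n "$ns" -l "job-name=$job"

  # Follow the logs of a pod owned by the job.
  "$kubectl" logs -f "job/$job" -n "$ns"
)
```

Run follow_tuner_job after kubectl apply; press Ctrl-C to stop following the logs.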