Revamp the container build procedures
- Upgrade drivers to 535 or 545 (nodes seem to work pretty stably).
- Upgrade the NVIDIA Container Toolkit across all nodes, then update `k8s-device-plugin`.
- Move GitLab Runners to non-NVIDIA nodes.
- Base the foundation container on Ubuntu, not the NVIDIA CUDA base image.
- Install PyTorch with `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121` and TensorFlow with `pip3 install tensorflow[and-cuda]`. Keras doesn't need to be installed separately, and the TensorRT installation is incorrect; this will be fixed through `tensorflow[and-cuda]`.
- Add JAX, which requires CUDA > 11.8.
- Consider best practices for buildx. Consult: https://github.com/docker/build-push-action
- Install GitHub Copilot and Copilot Chat manually by downloading the VSIX files and running `code-server --install-extension <path to vsix>`.
- Find a way to automatically add Jupyter server credentials to VS Code as well (allows running Jupyter notebooks in VS Code).
- Add Chrome and a virtual desktop (Selkies) to containers.
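
One possible post-build check for the PyTorch, TensorFlow, and JAX items above is a small smoke test run inside the finished container. The sketch below is an assumption, not part of the existing build: the function name `gpu_report` and the status strings are made up for illustration, and it only reports whether each framework can see a GPU, skipping any framework that is not installed.

```python
import importlib.util


def _torch_gpu():
    # PyTorch: True if a CUDA device is usable.
    import torch
    return torch.cuda.is_available()


def _tensorflow_gpu():
    # TensorFlow: True if at least one physical GPU is listed.
    import tensorflow as tf
    return len(tf.config.list_physical_devices("GPU")) > 0


def _jax_gpu():
    # JAX: True if any visible device reports a GPU platform.
    # Accepting both "gpu" and "cuda" is a defensive assumption.
    import jax
    return any(d.platform in ("gpu", "cuda") for d in jax.devices())


# Hypothetical mapping of importable module name -> GPU check.
CHECKS = {"torch": _torch_gpu, "tensorflow": _tensorflow_gpu, "jax": _jax_gpu}


def gpu_report(checks=CHECKS):
    """Return {module name: status string} for each check."""
    report = {}
    for name, check in checks.items():
        if importlib.util.find_spec(name) is None:
            report[name] = "not installed"
            continue
        try:
            report[name] = "gpu visible" if check() else "no gpu visible"
        except Exception as exc:
            # A broken install (e.g. mismatched CUDA libraries) lands here.
            report[name] = f"error: {type(exc).__name__}"
    return report


if __name__ == "__main__":
    for name, status in gpu_report().items():
        print(f"{name}: {status}")
```

Running this as the last step of a CI job on a GPU node would surface a broken CUDA install early; on a non-NVIDIA runner it should simply report that no GPU is visible.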
Edited by Seungmin Kim