From Zero to PyTorch with CUDA and Docker, Ubuntu 24.04

In this guide, I will go through the installation of CUDA, Docker, and PyTorch on Ubuntu 24.04. Briefly introducing them:

  • CUDA is a framework developed by NVIDIA that allows you to perform parallel computation using the power of GPUs
  • Docker is a platform that allows you to create virtual environments inside containers in which to run your applications. These self-contained environments are isolated from the host system, meaning that inside them you can run code, install dependencies, libraries, and everything you need to run your application, without affecting the host machine. The amazing thing is that containers are portable! You can run your application on any machine or operating system with Docker installed, without worrying about the underlying system architecture;
  • PyTorch is a framework that provides efficient tools for building and training Neural Networks.

This is everything you need to start playing with Deep Learning using your GPU!

Let’s go through the installation!

The first step is to have Ubuntu installed; my version is 24.04 LTS. Once you have installed it, open the terminal and run the command:

sudo ubuntu-drivers autoinstall

Recommended drivers will be installed by the system, including the ones for your GPU.

Now you can run the command:

nvidia-smi

If the output of this command mentions a CUDA version, like in the image below, it means that your GPU supports it. The version shown by the command is the maximum CUDA version supported by your installed driver.

nvidia-smi command output

Before installing CUDA, you can double-check your GPU compatibility at this link.

Another important thing is to have gcc installed, a C/C++ compiler required for the CUDA installation. You can get it through the command:

sudo apt install build-essential
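
You can quickly confirm that the compiler is available:

gcc --version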

Now, check here for the maximum CUDA version supported by PyTorch, to ensure you will not install a CUDA version higher than the one the deep learning framework supports.

Finally, everything is ready for CUDA installation.

CUDA installation

CUDA Pre-installation actions

First, you need to install the kernel headers, which are needed by the drivers:

sudo apt-get install linux-headers-$(uname -r)

When you install CUDA on Ubuntu, your system needs to trust the software source from which CUDA comes. It does this using a security key, which acts like a digital signature verifying that the software is safe and really comes from NVIDIA. This key has been updated, so we need to remove the outdated one. Run the command:

sudo apt-key del 7fa2af80

To download CUDA, go to this link, select Linux > x86_64 > Ubuntu > 24.04 > deb(local), and run the instructions that appear:

(PAY ATTENTION: before running, write your version of CUDA where necessary in the commands below. You can look at the archive of CUDA versions.)

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin
sudo mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-ubuntu2404-12-6-local_12.6.2-560.35.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2404-12-6-local_12.6.2-560.35.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2404-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6

CUDA Post-installation actions

After the installation of CUDA you need to export its path; also in this case, pay attention to specify the exact version you installed:

export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}
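
This export only lasts for the current shell session. To make it persistent, assuming bash is your login shell, you can append it to ~/.bashrc and then check that the CUDA compiler is found:

# Append the PATH export to your shell startup file
echo 'export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}' >> ~/.bashrc

# Reload the configuration in the current shell
source ~/.bashrc

# nvcc should now report the toolkit version you installed
nvcc --version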

Verify CUDA installation

To verify that the installation succeeded, you can download and run some samples offered by NVIDIA; but first, reboot the system.

To download these samples, install git:

sudo apt install git

and run:

git clone https://github.com/NVIDIA/cuda-samples
cd cuda-samples/

Now open the link of the git repository, go to the releases, and find the tag matching your CUDA version; then go back to your terminal and run:

git checkout vYour-CUDA-version
cd Samples/1_Utilities/deviceQuery
make
./deviceQuery

If you get a PASS as output, your CUDA device is detected correctly.

Now run:

cd ../bandwidthTest
make
./bandwidthTest

If you get another PASS, the installation completed successfully!

Docker installation

Now it is Docker’s turn. As I already mentioned, Docker allows you to create isolated applications that run independently from the rest of the system, organizing them inside containers. This is exactly what we will do to run the PyTorch application!

Run the following commands:

sudo apt update
sudo apt install curl apt-transport-https ca-certificates software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce -y

To verify the Docker installation run:

sudo systemctl status docker

Finally, to run Docker without prefixing every command with sudo, create the docker group and add your user to it through the following commands:

sudo groupadd docker
sudo usermod -aG docker $USER
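
The group change takes effect at your next login; to apply it in the current shell and check that Docker now works without sudo, you can run Docker’s standard test image:

# Activate the new group membership in the current shell
newgrp docker

# Pull and run Docker’s minimal test image: it just prints a welcome message
docker run hello-world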

The last step in configuring Docker is to enable it to detect and interact with GPUs. For this aim, you need to install the NVIDIA Container Toolkit. First, configure the repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update

Then, install the package:

sudo apt-get install -y nvidia-container-toolkit

To complete the installation run:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

The first command configures the container runtime; the second one restarts the Docker daemon.
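
To check that containers can actually see the GPU, you can run nvidia-smi from inside a disposable container (the plain ubuntu image is enough, since the toolkit mounts the driver utilities into the container):

# Run a throwaway container with access to all GPUs and print their status
docker run --rm --gpus all ubuntu nvidia-smi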

PyTorch installation

Everything is now ready to install PyTorch inside a Docker container.

First of all, you need to create a Docker image… What is a Docker image? It is the skeleton of a container: a template, built from a set of instructions, used to create a container based on the rules you want. The image contains all the files, libraries, and dependencies for running your container. It is portable, so when moving it to another host machine with Docker installed, you can always recreate the same container.

You can also create an image based on an already existing one, and customize it by installing other libraries or dependencies. You can find a database of existing images on Docker Hub.

In my case, for example, I use a Python-based image on which I install all the libraries I need for my application, including PyTorch. The Dockerfile I usually use is the following:

FROM python:3.10

WORKDIR /your_workdir_name

# Install the system library needed by packages such as OpenCV
RUN apt-get update && apt-get install -y libgl1

COPY ./requirements.txt /your_workdir_name/requirements.txt

# Install the Python dependencies
RUN pip install --no-cache-dir --upgrade -r /your_workdir_name/requirements.txt

# Set the default command to open an interactive shell
CMD ["bash"]

As I said, for my projects I usually create an image based on the Python 3.10 one, specifying it with the FROM instruction. This image provides Python 3.10 and essential tooling like pip. Using WORKDIR, the container’s working directory is set to /your_workdir_name. This is where the application files and dependencies will be managed inside the container. With the RUN instruction, commands can be executed inside the image while it is being built; here it installs a system library and, later, the Python dependencies via pip. With COPY, files can be copied from a local folder to the working directory inside the container. In my case, I copied requirements.txt from the host machine to the container’s working directory. This .txt file contains all the libraries I want to install inside the container; in this case, it needs to specify the PyTorch library. I’ll provide an example shortly. Finally, CMD sets the default command to open a shell, allowing interactive access to the container when it starts.

The instruction file must be called Dockerfile. At the same directory level you need to create a requirements.txt file. This file contains all the libraries you want to install inside the container; in this specific case, it will contain just the PyTorch packages. Depending on which PyTorch-supported CUDA version you installed, the requirements.txt changes.

For CUDA 12.4:

torch
torchvision
torchaudio

For CUDA 12.1:

--index-url https://download.pytorch.org/whl/cu121
torch
torchvision
torchaudio

For CUDA 11.8:

--index-url https://download.pytorch.org/whl/cu118
torch
torchvision
torchaudio

You are ready to create your container!

First, build your image by running the following command in the directory where the Dockerfile is located:

docker build -t image_name .
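
You can confirm that the image was created by listing your local images:

# Shows the repository, tag, ID, and size of each local image
docker images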

The image will be built and everything will be prepared to create your container. Once the command has finished, you can create the container by running the following instruction:

docker run -it --gpus all -v /absolute/path/to/local/folder/of/the/project:/your_workdir_name --name container_name image_name

Going deeper into the command: with the -it flag you will interact directly with the container’s bash after its creation. Using --gpus all you tell the container to use all the available GPUs. The -v flag is a little bit tricky: it creates a virtual channel between a local folder on your machine, in which there are all the files of the application you want to run, and the working directory inside the container (the one specified previously in the Dockerfile). So, everything inside the local directory can also be viewed and accessed from the container’s working directory. In this way, if you are using an IDE to modify the code inside the local directory, changes will be directly reflected in the container.
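
For example, with hypothetical names (a project folder ~/projects/my_app on the host, an image built as pytorch_image, and a container called pytorch_container):

# All names below are placeholders: adapt the path, container name, and image name
docker run -it --gpus all \
  -v "$HOME/projects/my_app":/your_workdir_name \
  --name pytorch_container pytorch_image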

Once inside the container’s bash, you can act as if you were using your system’s terminal; for example, you can run a Python file:

python ./your_python_file.py

From your IDE you can modify the code and run the file again.
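
As a quick sanity check that PyTorch actually sees the GPU from inside the container, you can paste this minimal snippet into the container’s shell:

python - <<'EOF'
import torch

# Installed PyTorch version
print("PyTorch:", torch.__version__)

# True only if PyTorch can reach the GPU through CUDA
print("CUDA available:", torch.cuda.is_available())

# Name of the first visible GPU, if any
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
EOF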

Now you are fully equipped to write and run PyTorch applications! Enjoy!