Devices#

Managing the devices on which your code runs is a significant part of working with PyTorch. This page focuses on that topic.

Note: To fully grasp the concepts discussed here, you’ll need a GPU that supports CUDA. Consider using Google Colab for this purpose.

import torch
from torch.utils import benchmark

To check the status of your GPU, use the nvidia-smi command as shown in the cell below.

!nvidia-smi
Wed Aug  7 11:41:57 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
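
PyTorch can report similar information programmatically. A minimal sketch, assuming at least one CUDA device is visible to the process:

# Number of visible CUDA devices and the name of the first one
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))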

Selecting device#

By default, all objects are created on the CPU. However, you can use the to method to move them to a different device.


By default, tensors created in PyTorch are allocated to the CPU device. The following code demonstrates this by creating a tensor and printing its device:

cpu_tensor = torch.ones([3, 3])
cpu_tensor.device
device(type='cpu')

The following code moves the tensor to the cuda device and prints its device attribute:

cuda_tensor = cpu_tensor.to("cuda")
cuda_tensor.device
device(type='cuda', index=0)

Another option is to use the cuda method:

cpu_tensor.cuda().device
device(type='cuda', index=0)
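
You can also allocate a tensor on a device directly, without creating it on the CPU first, by passing the device argument to the factory function. A minimal sketch, assuming a CUDA device is available:

# Allocate the tensor on the GPU from the start, skipping the CPU copy
gpu_tensor = torch.ones([3, 3], device="cuda")
gpu_tensor.device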

Device availability#

A common scenario is to debug your program on one machine but run computationally heavy operations on a cloud service like Colab. To avoid errors caused by hard-coding a device that may not exist, check at runtime whether the device is available and use it only if it is.

To determine if a device is available at runtime, use torch.cuda.is_available() for CUDA devices or torch.backends.mps.is_available() for Apple processors.


The following cell shows how you can check if CUDA is available.

torch.cuda.is_available()
False

The full code that defines the device variable based on the runtime environment is shown below:

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print("using device", device)
using device cpu
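
With device defined this way, the rest of the program can stay device-agnostic. A small usage sketch:

# The same code runs unchanged whichever device was selected above
x = torch.randn(4, 4, device=device)
y = x @ x
print(y.device)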

Network#

PyTorch networks have a to method that conveniently moves all of the network’s parameters and buffers to the specified device.


In the following code cell, we create a network and then print the devices of its tensors.

network = torch.nn.Sequential(
    torch.nn.Linear(3, 3),
    torch.nn.ReLU()
)

for param in network.parameters():
    print(param.device)
cpu
cpu

After applying the .to('cuda') method to the network, all of its tensors reside on the GPU. Note that, unlike Tensor.to, Module.to works in place and returns the module itself, so cuda_network and network refer to the same object.

cuda_network = network.to('cuda')

for param in cuda_network.parameters():
    print(param.device)
cuda:0
cuda:0
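
We can verify this in-place behavior directly:

# Module.to returned the same object, so the original reference also
# reports GPU parameters now
print(cuda_network is network)
for param in network.parameters():
    print(param.device)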

Benchmark#

Let’s use torch.utils.benchmark to compare the performance of the same PyTorch network running on the CPU and on the GPU.


This code creates a large, deep neural network and runs it on a large input matrix on the CPU, which results in a slow execution time.

dimensionality = 1000
# A stack of 100 linear layers: deliberately heavy to make the timing visible
network = torch.nn.Sequential(*[
    torch.nn.Linear(dimensionality, dimensionality) for _ in range(100)
])
X = torch.randn([dimensionality, dimensionality])

benchmark.Timer(
    stmt="network(X)",
    globals={"network": network, "X": X}
).timeit(5)
<torch.utils.benchmark.utils.common.Measurement object at 0x7bdd1334ed40>
network(X)
  1.73 s
  1 measurement, 5 runs , 1 thread

Now we’ll benchmark the same network with its computations moved to the GPU to see how much faster it performs. torch.utils.benchmark takes care of CUDA synchronization, so the GPU timing is directly comparable to the CPU one.

network = network.to('cuda')
X = X.to('cuda')

benchmark.Timer(
    stmt="network(X)",
    globals={"network": network, "X": X}
).timeit(5)
<torch.utils.benchmark.utils.common.Measurement object at 0x7bdd1334d5a0>
network(X)
  65.91 ms
  1 measurement, 5 runs , 1 thread
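
For a more stable measurement, benchmark.Timer can also choose the number of runs automatically. A minimal sketch using blocked_autorange:

# blocked_autorange keeps collecting samples until min_run_time seconds
# have elapsed, instead of relying on a fixed number of runs
benchmark.Timer(
    stmt="network(X)",
    globals={"network": network, "X": X}
).blocked_autorange(min_run_time=1)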