In the past, it was often said that running deep learning software on AMD GPUs was a major hassle, and users with deep learning needs were advised to buy Nvidia graphics cards. However, with the recent popularity of Large Language Models (LLMs) and many research institutions releasing LLaMA-based models, I became interested and wanted to test them out. Since most of the high-VRAM graphics cards I have on hand are from AMD, I decided to try running them on these cards.
Testing Environment
The current testing environment uses a VM with GPU passthrough; the setup is as follows:
- CPU: 16-core AMD Epyc 7302P
- RAM: 96GB DDR4 ECC 2933
- GPU: AMD Instinct MI25 (WX9100 BIOS)
- OS: Ubuntu 20.04
- Host OS: PVE 7.4-3
- ROCm: 4.5.2
ROCm is an open-source platform developed by AMD for running GPU compute and deep learning workloads. It includes HIP, a compatibility layer that lets CUDA-style code run on AMD hardware.
System Settings
ROCm Installation Steps
First, we need to add the user to the 'video' group to obtain the necessary permissions.
sudo usermod -a -G video $LOGNAME
Then, download the installer package and install it:
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/focal/amdgpu-install_5.4.50403-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb
Finally, install the GPU drivers and ROCm. Note that I am pinning version 4.5.2 here: while the latest release is currently 5.4.3, 4.5.2 is the last version that supports the AMD Instinct MI25/WX9100.
sudo amdgpu-install --usecase=rocm,dkms --rocmrelease=4.5.2
Specifying rocm and dkms together is a common use case. To see which use cases are available, run:
sudo amdgpu-install --list-usecase
The installation process will take some time, mostly depending on your internet speed. Once the installation is complete, we need to restart the system to load the new drivers.
sudo reboot
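After rebooting, it is worth confirming that ROCm actually came up; rocminfo and rocm-smi are the standard checks. As a quick scripted sketch, you can also look for the install's version file (the exact path is an assumption based on typical versioned ROCm installs):

```python
from pathlib import Path

# Typical versioned ROCm installs record their release here; the exact
# path is an assumption and may differ between releases.
version_file = Path("/opt/rocm-4.5.2/.info/version")

if version_file.exists():
    print("ROCm version:", version_file.read_text().strip())
else:
    print("ROCm version file not found; try rocminfo instead")
```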
PyTorch Installation Steps
Most projects nowadays use PyTorch as their deep learning framework. To use ROCm, you need to install a PyTorch build targeting a matching ROCm version.
Note that most projects pin PyTorch in their requirements.txt, and the default wheels are built against CUDA; if a CUDA build is already installed, uninstall it first (pip3 uninstall torch torchvision torchaudio). Then install the build matching your ROCm version; here that is 4.5.2:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm4.5.2
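To confirm which build actually ended up installed, you can check the version strings PyTorch exposes; ROCm wheels report a HIP version, while CUDA wheels report a CUDA version (each is None on the other build):

```python
import torch

print(torch.__version__)
# On a ROCm wheel, torch.version.hip is set and torch.version.cuda is None;
# on a CUDA wheel, it is the other way around.
print("HIP build:", torch.version.hip)
print("CUDA build:", torch.version.cuda)
```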
Testing
To test whether PyTorch can see the GPU, enter the following commands in a Python shell:
import torch
torch.cuda.is_available()
If it prints True, the installation was successful!
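Beyond is_available(), a quick end-to-end sanity check is to run a small tensor operation on the GPU. Note that ROCm builds of PyTorch reuse the torch.cuda namespace, so no code changes are needed; this sketch falls back to the CPU if no GPU is visible:

```python
import torch

# ROCm builds expose the GPU through the torch.cuda namespace,
# so the usual device-selection idiom works unchanged.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# A small matrix multiply on the selected device.
a = torch.rand(4, 4, device=device)
b = torch.rand(4, 4, device=device)
print((a @ b).shape)  # torch.Size([4, 4])
```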
Conclusion
After trying it in practice, I found that running deep learning applications on an AMD GPU is not difficult. So far I have successfully run stable-diffusion and Vicuna-7B. If you already have an AMD graphics card and want to try some deep learning projects, using it can save you the cost of buying an Nvidia card.
Copyright Notice: All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise specified.
