AMD GPU and Deep Learning: Practical Learning Guide

Historically, AMD GPUs have been considered less suitable for deep learning, so most deep learning users prefer Nvidia GPUs. Recently, however, LLMs (Large Language Models) have gained significant attention, and numerous research teams have released models based on LLaMA, which made me eager to experiment. Since I have several AMD GPUs with ample VRAM, I decided to test these cards for running LLMs.

Test Environment

The testing environment is a VM with GPU passthrough (a quick sanity check of the passthrough is shown after the list). The environment is as follows:

  • CPU: 16-core AMD EPYC 7302P
  • RAM: 96GB DDR4 ECC 2933
  • GPU: AMD Instinct MI25 (WX9100 BIOS)
  • OS: Ubuntu 20.04
  • Host OS: PVE 7.4-3
  • ROCm: 4.5.2
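Before installing anything, it is worth confirming that the passthrough GPU is actually visible inside the VM. One quick check, independent of ROCm, is to look for the card on the PCI bus (the exact device string reported for an MI25/WX9100 may differ on your system):

lspci | grep -iE 'vga|display'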

ROCm is AMD's open-source platform for GPU compute and deep learning applications; it includes HIP, which allows CUDA code to be ported to AMD GPUs.

System Configuration

ROCm Installation Steps

First, we need to add the user to the video group to gain the necessary permissions.

sudo usermod -a -G video $LOGNAME
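The new group membership only takes effect after logging out and back in (or rebooting). To confirm the change afterwards, you can list the groups for your user:

groups $LOGNAME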

Then download and install the installation script:

sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/focal/amdgpu-install_5.4.50403-1_all.deb  
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb

Finally, install the GPU driver and ROCm. Note that I explicitly specify version 4.5.2: the latest release is currently 5.4.3, but the latest release that supports the AMD Instinct MI25/WX9100 is 4.5.2.

sudo amdgpu-install --usecase=rocm,dkms --rocmrelease=4.5.2

Here, rocm,dkms is the typical use case for this kind of setup. To list all available use cases, run the following command:

sudo amdgpu-install --list-usecase

The installation process may take some time, mainly depending on network speed. After installation, we need to reboot the system to load the new driver.

sudo reboot
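After the reboot, you can verify that the driver and ROCm can see the card. rocminfo and rocm-smi ship with ROCm; the paths below assume the default versioned install location for release 4.5.2:

/opt/rocm-4.5.2/bin/rocminfo
/opt/rocm-4.5.2/bin/rocm-smi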

PyTorch Installation Steps

Most projects currently use PyTorch as their deep learning framework, but to use ROCm you must explicitly specify the ROCm build when installing it.

Note that most projects list PyTorch in their requirements.txt, which installs the CUDA build by default; if that build is already installed, you need to uninstall it first (see the command below).
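If a CUDA build of PyTorch has already been pulled in, for example by a project's requirements.txt, it can be removed like this before installing the ROCm build:

pip3 uninstall torch torchvision torchaudio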

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm4.5.2

The ROCm version in the index URL must match the installed ROCm release; the command above installs the build for 4.5.2.
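To confirm which build was actually installed, you can print the PyTorch version string; a ROCm build normally carries a rocm suffix (for example 1.x.x+rocm4.5.2, the exact string depends on the wheel):

python3 -c "import torch; print(torch.__version__)"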

Test

To test whether PyTorch can see the GPU, enter the following commands in a Python shell:

import torch
torch.cuda.is_available()

If it returns True, the installation was successful!
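Beyond the availability check, a small end-to-end test confirms that a computation actually runs on the card. This is a minimal sketch; on ROCm builds PyTorch still exposes the GPU through the torch.cuda API, so the same calls apply:

import torch

device = torch.device("cuda")         # ROCm GPUs are exposed through the CUDA API
print(torch.cuda.get_device_name(0))  # should print the AMD GPU's name

x = torch.randn(1024, 1024, device=device)
y = x @ x                             # matrix multiplication on the GPU
print(y.sum().item())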

Summary

In practice, running deep learning applications on an AMD GPU turned out to be straightforward. I have successfully run stable-diffusion and Vicuna-7B on this setup. If you have an AMD graphics card and would like to experiment with deep learning projects, give it a try; it can save you the cost of buying an Nvidia card.


Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.