Creating a PyTorch/TensorFlow Code Environment on AMD GPUs

Goal: The machine learning ecosystem is growing quickly, and we aim to make porting to AMD GPUs simple with this series of machine learning blog posts.

Audience: Data scientists and machine learning practitioners, as well as software engineers who use PyTorch/TensorFlow on AMD GPUs. You can be new to machine learning, or experienced in using Nvidia GPUs.

Motivation: When starting a new machine learning project, you may notice that many existing code repositories on GitHub are almost always CUDA based. If you have AMD GPUs and follow their instructions for running the code, it often does not work. We provide steps, based on our experience, that can help you set up a working code environment for your experiments and manage CUDA-based code repositories on AMD GPUs.

Differentiator from existing online resources:

  • This is from a machine learning practitioner’s perspective, to guide you away from rabbit holes due to habits and preferences, such as using Jupyter Notebooks and pip install.

  • This is not to teach you how to install PyTorch/TensorFlow on ROCm, because that step alone is often not enough to run machine learning code successfully.

  • This is not to teach you how to HIPify code, but instead, to let you know that sometimes you don’t even need that step.

  • As of this writing, this is the only documentation on the internet with end-to-end instructions on how to create a PyTorch/TensorFlow code environment on AMD GPUs.

The prerequisite is to have ROCm installed; follow the instructions here and here.
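
To confirm that ROCm is installed and your GPUs are visible before going further, a quick check looks like this (a minimal sketch; exact paths may vary with your ROCm version):

Terminal window
# List the detected GPU agents (look for gfx targets such as gfx90a)
/opt/rocm/bin/rocminfo | grep gfx
# Show GPU utilization, temperature, and memory
rocm-smi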

Install PyTorch or TensorFlow on ROCm

Option 1. PyTorch

We recommend following the instructions on the official ROCm PyTorch website.
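
For reference, installing the ROCm build of PyTorch from a wheel typically looks like the following (a sketch assuming a ROCm 5.x wheel index; use the exact command and index URL given on that website for your ROCm version):

Terminal window
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6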

Option 2. TensorFlow

We recommend following the instructions on the official ROCm TensorFlow website.
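
For reference, the ROCm build of TensorFlow is published as a separate wheel (a sketch; check that website for the release matching your ROCm install):

Terminal window
pip3 install tensorflow-rocm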

Note: We also strongly recommend using a Docker image with PyTorch or TensorFlow pre-installed. If you create a virtual environment or conda environment instead, certain ROCm dependencies may not be properly installed, and installing them by hand can be non-trivial.

Note: You don’t need the --gpus all flag to run Docker on AMD GPUs.
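
Instead, AMD GPUs are exposed to the container through their device files. A typical launch of the public rocm/pytorch image looks like this (a minimal sketch; adjust the image name and tag to the one you pulled):

Terminal window
docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video rocm/pytorch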

Git clone the source code you want to run

Terminal window
git clone --recursive https://github.com/project/repo.git

Install library requirements based on the GitHub repository

  • Skip the commands that create virtual environments or conda environments. They are usually in machine_install.sh or setup.sh files.

  • Go directly to the library list and remove torch and tensorflow, since these are CUDA based by default; the Docker containers should already have the ROCm versions of those libraries installed. You can usually find the library list in requirements.txt (see the sketch after this list).

  • Run pip3 install -r requirements.txt, where requirements.txt contains one package name per line (possibly with package versions).
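
As an illustration, a hypothetical requirements.txt could be trimmed before installation as follows (the package names and file contents here are placeholders, not taken from any particular repository):

Terminal window
# Remove CUDA-based framework entries (this pattern also catches torchvision,
# torchaudio, and tensorflow-gpu); a backup is kept in requirements.txt.bak
sed -i.bak -e '/^torch/d' -e '/^tensorflow/d' requirements.txt
# Install the remaining packages against the ROCm-provided frameworks in the container
pip3 install -r requirements.txt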

Run your code

If you can run your code without problems, then you have successfully created a code environment on AMD GPUs!
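
Either way, a quick check that the framework actually sees the AMD GPU can save debugging time (a minimal sketch; the ROCm builds of PyTorch report AMD GPUs through the usual torch.cuda API):

Terminal window
# PyTorch: should print True and the AMD GPU name
python3 -c 'import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))'
# TensorFlow: should list at least one GPU device
python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'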

If your code does not run, the failure may be due to additional packages in requirements.txt that depend on CUDA, which need to be HIPified to run on AMD GPUs.

Obtain HIPified library source code

Option 1. Find existing HIPified library source code

You can simply search online or on GitHub for “library_name” + “ROCm”. The HIPified code will pop up if it exists.

Since this step is not trivial, here is an example:

If you are trying to run code related to large language models, you may need the bitsandbytes library (see link).

Searching online for “bitsandbytes ROCm”, you will find this fork, which adds ROCm support with a HIP compilation target.

Terminal window
git clone https://github.com/agrocylo/bitsandbytes-rocm
cd bitsandbytes-rocm
export ROCM_HOME=/opt/rocm/
make hip -j
python3 setup.py install

Note: the ROCm installation location may include the version number, such as /opt/rocm-5.5.0.
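
After the build finishes, a quick import test confirms that the HIPified library is installed correctly inside the container (a minimal check):

Terminal window
python3 -c 'import bitsandbytes; print("bitsandbytes imported successfully")'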

Option 2. HIPify code if necessary

We recommend following the tutorials below for this option.
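
As a taste of what those tutorials cover, ROCm ships a hipify-perl script that translates CUDA source into HIP (a sketch; my_kernel.cu is a hypothetical file name):

Terminal window
/opt/rocm/bin/hipify-perl my_kernel.cu > my_kernel.hip.cpp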

Commit changes to Docker Image

Once you finish modifying the Docker container you launched in the first step (“Install PyTorch or TensorFlow on ROCm”), exit the container:

Terminal window
exit

List the launched containers and find the Docker container ID:

Terminal window
docker ps -a

Create a new image by committing the changes:

Terminal window
docker commit [CONTAINER_ID] [new_image_name]
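
You can then start future sessions from the committed image so your environment persists (a sketch; reuse the same device flags as before and substitute the image name you chose):

Terminal window
docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video [new_image_name]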

In conclusion, this article introduces the key steps for creating a PyTorch/TensorFlow code environment on AMD GPUs. ROCm is a maturing ecosystem, and more GitHub repositories will eventually contain ROCm/HIPified ports. Future posts to AMD lab notes will discuss the specifics of porting from CUDA to HIP, as well as guides to running popular community models from HuggingFace.

Yao Fehlis

Corresponding Author
Yao Fehlis is a Member of Technical Staff (MTS) at Research and Advanced Development at AMD, and her focus involves AI for science, AI for manufacturing and large language models. In AI for science, she works internally and externally with academia to enhance traditional HPC applications with machine learning to accelerate scientific discoveries. In AI for manufacturing, she works internally with product teams to use machine learning to uncover optimal parameters and configurations in AMD designs. Prior to joining AMD, she worked as a data scientist at KUKA Robotics where she led predictive maintenance projects for industrial KUKA robots and worked on deep learning projects such as teaching robots to pick up objects. She holds a PhD in computational chemistry from Rice University.
Rajat Arora

Reviewer
Rajat Arora is a Senior Member of Technical Staff (SMTS) Software System Design Engineer in the Data Center GPU Software Solutions group at AMD, where he works on porting and optimizing high-performance computing applications for AMD GPUs. He obtained his PhD in Computational Mechanics from Carnegie Mellon University. His PhD research focused on the intersection of high performance scientific computing, numerical analysis, and material science. Recently, his research interests have expanded to include the development of physics-informed machine learning models and tools to accelerate scientific discovery and engineering design.
Justin Chang

Reviewer
Justin Chang is a Senior Member of Technical Staff (SMTS) Software System Design Engineer in the Data Center GPU Software Solutions group and manages the AMD lab notes blog series. He received his PhD degree in Civil Engineering from the University of Houston, where he published several journal papers on structure-preserving high performance computational methods for transport in porous media. As a postdoc, he worked for both Rice University and the National Renewable Energy Laboratory to accelerate finite element simulation time of subsurface flow through dual porosity porous medium and lithium-ion batteries used in electric vehicles. He also worked for the Oil and Gas industry and focused on GPU porting and optimization of key FWI, RTM, and other seismic imaging workloads.
Austin Ellis

Reviewer
Austin Ellis is a Member of Technical Staff (MTS) at AMD and on-site APU Application Architect at Lawrence Livermore National Laboratory, helping to deploy AMDʼs next Exascale system, El Capitan. He specializes in high performance computing, machine learning, GPU computing, scalable algorithms, and data analytics. Previously, he was an HPC research scientist in the Analytics and AI Methods at Scale group at the Oak Ridge Leadership Computing Facility (OLCF). While at the OLCF, he was part of the HPL-MxP team, which currently holds the record for the world's fastest computation at 9.951 ExaOps on the world's first Exascale supercomputer, Frontier, powered by AMD hardware.
