LLM performance on AMD GPU with ROCm 7.x vs 6.x

31 May 2026 | amdgpu llm benchmark

In this post, I compare the performance of large language models (LLMs) on the AMD Radeon AI PRO R9700 using Ollama with different ROCm versions. The R9700 is a powerful GPU designed for professional workloads, including machine learning and AI applications. In my previous post, LLM performance on AMD Radeon AI PRO R9700, I tested LLM performance with ROCm 6.4. Now, let’s see how ROCm 7.1 compares.

chart

Hardware

The PC used for testing is equipped with the following specifications:

Motherboard: B550 AORUS ELITE AX V2
GPU: AMD Radeon AI PRO R9700 (32 GB VRAM, RDNA 4)
CPU: AMD Ryzen 9 5950X (16 cores / 32 threads)
RAM: 64 GB DDR4-2666
Storage: NVMe Samsung SSD 970 EVO Plus 1TB (x2)

Test environment

For LLM performance testing, I used the following environment:

docker run -d --name ollama --device /dev/kfd --device /dev/dri \
  -e "HSA_OVERRIDE_GFX_VERSION=12.0.0" \
  -v ollama:/root/.ollama -p 11434:11434 ollama/ollama:rocm

Instead of the rocm tag, I used 0.12.11-rocm for ROCm 6.4 and 0.30.0-rocm for ROCm 7.1.

Pull and run each LLM model (repeat for every model):

docker exec -it ollama ollama run mistral:7b --verbose

> Tell me a story about a brave knight who saves a village from a dragon.

...

total duration:       9.705420035s
load duration:        12.444423ms
prompt eval count:    21 token(s)
prompt eval duration: 31.977818ms
prompt eval rate:     656.71 tokens/s
eval count:           904 token(s)
eval duration:        9.5508627s
eval rate:            94.65 tokens/s

Performance comparison

ROCm 6.4 (Ollama v0.12.11)

Model	VRAM	Prompt	Response
mistral:7b	6 GB	414 t/s	80 t/s
llama3.1:8b	7 GB	386 t/s	77 t/s
phi4:14b	12 GB	213 t/s	50 t/s
gpt-oss:20b	13 GB	704 t/s	91 t/s
gemma3:27b	19 GB	207 t/s	27 t/s
qwen3-coder:30b	18 GB	250 t/s	75 t/s
qwen3:32b	21 GB	179 t/s	23 t/s
deepseek-r1:32b	22 GB	99 t/s	23 t/s

ROCm 7.1 (Ollama v0.30.0)

Model	VRAM	Prompt	Boost	Response	Boost
mistral:7b	6 GB	656 t/s	+58%	94 t/s	+18%
llama3.1:8b	7 GB	786 t/s	+104%	89 t/s	+16%
phi4:14b	12 GB	592 t/s	+178%	56 t/s	+12%
gpt-oss:20b	13 GB	972 t/s	+38%	100 t/s	+10%
gemma3:27b	19 GB	346 t/s	+67%	28 t/s	+4%
qwen3-coder:30b	18 GB	463 t/s	+85%	83 t/s	+11%
qwen3:32b	21 GB	287 t/s	+60%	24 t/s	+4%
deepseek-r1:32b	22 GB	201 t/s	+103%	26 t/s	+13%

Conclusion

Across the tested models, ROCm 7.1 improved response speed by 11% and prompt processing speed by 87% compared to ROCm 6.4. This shows that AMD’s ROCm 7.x optimizations with Ollama have a significant impact on LLM performance, making this stack a strong choice for AI workloads on AMD GPUs.

Meefik's Blog

Freedom and Open Source