RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP · Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

UNIT · AMD · PC-NPU · 32 GB unified · Reviewed May 2026

AMD Ryzen AI 9 HX 370 (Strix Point)

Figure: AMD Ryzen AI 9 HX 370 spec card diagram — unified memory, 90 GB/s bandwidth, 28 W; 8B INT4 on the XDNA 2 NPU (50 TOPS).
Credit: RunLocalAI · License: CC-BY-4.0 (original illustration)

Strix Point laptop SoC. XDNA 2 NPU at 50 TOPS INT8 + RDNA 3.5 iGPU. ROCm support on Linux unlocks llama.cpp ROCm path; on Windows, ONNX Runtime + DirectML.

Released 2024
▼ CHECK CURRENT PRICE · 1 retailer

Check on Amazon→

Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.

RUNLOCALAI SCORE
See full leaderboard →
127 / 1000 · DD-tier · Estimated

Throughput   26 / 500
VRAM-fit      0 / 200
Ecosystem   130 / 200
Efficiency   26 / 100

Extrapolated from 90 GB/s bandwidth — 9.0 tok/s estimated. No measured benchmarks yet.

WORKLOAD FIT
Try other hardware →

Plain-English: Doesn't fit modern chat models usefully.

7B chat              △ Marginal
14B chat             △ Marginal
32B chat             △ Marginal
70B chat             ✗ Doesn't fit
Coding agent         △ Marginal
Vision (≤8B VLM)     △ Marginal
Long context (32K)   △ Marginal

Legend: ✓ Comfortable — fits with headroom · ~ Tight — works, no slack · △ Marginal — needs aggressive quant · ✗ Doesn't fit usefully

Verdicts are extrapolated from catalog VRAM, bandwidth, and ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.

BLK · VERDICT

Our verdict

OP · Fredoline Eruo | VERIFIED MAY 8, 2026
3.9/10

What it does well

The AMD Ryzen AI 9 HX 370 (Strix Point) is AMD's flagship laptop APU and the most credible "AI laptop without a discrete GPU" pick on the Windows side. Twelve Zen 5 / Zen 5c CPU cores, a 16-core RDNA 3.5 integrated GPU, and a dedicated XDNA 2 NPU rated at 50 TOPS — all in a laptop chassis, with well-built mid-tier systems shipping around $1,599 retail. The unified memory architecture (typically 32 GB LPDDR5X-7500 in laptops) is shared across CPU, iGPU, and NPU, which means smaller LLMs can use the full 32 GB DRAM ceiling without discrete-VRAM constraints.

For 7B–13B class inference, the iGPU + NPU combination delivers genuinely useful throughput (15–35 tok/s on 7B Q4 is realistic) without a discrete GPU's 100+ W power envelope. Battery life under inference load is meaningfully better than on RTX-equipped gaming laptops (5–8 hours of real local AI on battery vs 1–3 hours on a Razer Blade 16). The chip is excellent for the "I want to run small AI models on my work laptop" segment that doesn't want to pay for a gaming-class discrete GPU.

Where it breaks

  • No CUDA — RDNA 3.5 + XDNA 2 NPU are AMD ecosystems. llama.cpp ROCm + DirectML + ONNX Runtime work; vLLM, SGLang, TensorRT-LLM all do not. If your stack is CUDA-locked, this APU is friction.
  • NPU framework support is thin. XDNA 2's 50 TOPS sounds compelling but real-world LLM throughput on the NPU is limited by software — most inference runs on the iGPU instead, where ROCm support is patchy on Windows.
  • iGPU memory bandwidth limits decode speed. Shared LPDDR5X-7500 peaks around 120 GB/s theoretical (~90 GB/s effective) — dramatically below discrete-GPU bandwidth (a desktop RTX 4090 pushes ~1.0 TB/s). For 13B+ workloads, decode is meaningfully slower than on equivalent discrete-GPU laptops.
  • Hard ceiling on model size. 32 GB unified RAM minus OS + apps leaves ~24 GB for LLM workloads. 70B Q4 doesn't fit. 32B FP16 doesn't fit. 14B Q4 fits with limited context.
  • No real story for fine-tuning. Wrong tier — pick a discrete-GPU laptop or workstation.
  • Variable system quality. The HX 370 ships in laptops ranging from $1,500 to $2,500 with very different cooling + RAM configurations. Performance varies dramatically — read laptop reviews carefully before buying.
  • Linux support is improving but laggy. Strix Point Linux drivers (kernel 6.10+, mesa 24.x+) are functional but new-architecture kinks remain. Windows is the more polished path in 2026.
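The memory-ceiling point above is easy to sanity-check with napkin math. A minimal sketch, assuming Q4_K_M averages roughly 4.5 bits per weight and ignoring KV cache and runtime overhead (both figures are assumptions, not measurements):

```python
def gguf_size_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of a quantized model's weights in GB."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

BUDGET_GB = 24  # ~32 GB unified RAM minus OS + apps, per the bullet above

for params in (7, 14, 32, 70):
    size = gguf_size_gb(params)
    fits = "fits" if size <= BUDGET_GB else "does not fit"
    print(f"{params}B Q4 ≈ {size:.1f} GB -> {fits}")
```

Note that 32B Q4 squeezes in on weights alone, which is why the workload table calls it marginal rather than impossible — there is almost nothing left over for context.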

Ideal model range

  • Sweet spot: 7B Q4 / Q5 inference at 15–35 tok/s on the iGPU — genuinely useful for IDE coding assistants, document Q&A.
  • Sweet spot: embedding models and specialized smaller-class fine-tunes (e.g. 7B QLoRA adapters) served for inference.
  • Sweet spot: Multi-model agentic loops fitting 24 GB total — 4B + embedding + small re-ranker.
  • Sweet spot: Battery-life-friendly local AI for the traveling professional who doesn't need 24×7 fast inference.
  • Stretch: 13B Q4 with 8K context (10–18 tok/s — usable but slow for interactive use).
  • Bad fit: 32B-class anything, 70B-class anything, fine-tuning, production serving, anything that requires CUDA.

Bad use cases

  • Anyone targeting 70B / 32B local AI. Hard memory ceiling + bandwidth ceiling. Pick a discrete-GPU laptop (Razer Blade 16, ASUS ROG Strix Scar 18) or MacBook Pro 16 M4 Max.
  • CUDA-locked stacks. No CUDA. Don't pick AMD if the rest of your toolchain is NVIDIA.
  • Production serving / sustained inference. Wrong tier — laptop APU.
  • Maximum tok/s on small models. Even discrete laptop GPUs (RTX 4060/4070 Mobile) win decisively on bandwidth-bound decode.
  • Heavy fine-tuning workflows. Pick a discrete GPU.
  • Gaming + AI dual purpose. RDNA 3.5 iGPU is gaming-capable but a discrete RTX laptop is dramatically better at both AI and gaming.

Verdict

Buy this if you want a laptop that runs sub-13B local AI well (8B Q4 / Q5 at usable speed), you value battery life and silence over raw throughput, your stack is Windows-AMD-friendly (ROCm / DirectML / ONNX Runtime), and you don't need 14B+ models. AMD Ryzen AI 9 HX 370 is the right pick for the segment that wants "good enough" local AI on a normal-form-factor productivity laptop without paying for a gaming GPU.

Skip this if you need 14B+ models (jump to discrete GPU laptop), you're CUDA-locked (pick NVIDIA), you want maximum local AI performance (Razer Blade 16 with RTX 5090 Mobile is dramatically faster), you can use macOS (MacBook Pro M4 Max wins on memory ceiling + ecosystem maturity at higher tier), or you're production-serving (wrong category entirely).

How it compares

  • vs Intel Core Ultra 7 258V (Lunar Lake) → Lunar Lake at $1,199 pairs an Intel Arc Xe2 iGPU + 48 TOPS NPU against Strix Point's RDNA 3.5 iGPU + 50 TOPS NPU. Intel currently has the slightly more polished Windows driver experience; AMD has more raw iGPU compute. Both are sub-13B class machines. Pick by laptop OEM availability + Windows ecosystem preference.
  • vs Razer Blade 16 (RTX 5090 Mobile) → Razer Blade 16 has 24 GB CUDA discrete GPU + dramatically more compute + actual 70B Q4 capability at +180% price. Strix Point wins on battery life, silence, weight, sub-13B-class accessibility. Pick by workload size — sub-13B accept Strix Point, anything serious pick discrete GPU.
  • vs MacBook Pro 16 M4 Max (128 GB unified) → MBP 16 wins on memory ceiling (4× the RAM), battery life, silence, ecosystem (MLX is more polished than ROCm-on-Windows). Strix Point laptops win on price (sub-$1,800 vs $4,000+), Windows ecosystem, AMD-aligned stacks. Pick by ecosystem and budget.
  • vs Lenovo Legion 5 Pro Gen 7 (RTX 3080 Mobile) → Legion has 16 GB discrete CUDA at +$700 price. Discrete GPU wins for AI throughput; Strix Point wins for portability + battery + sub-13B accessibility.
  • vs Framework Laptop 16 (RX 7700S) → Framework Laptop 16 with discrete dGPU (8 GB RX 7700S) is similar AMD AI ecosystem at modest discrete GPU. Pick Framework for repairability + AMD discrete; HX 370 systems for newer NPU + Strix Point arch + better battery life on integrated-only workloads.
BLK · OVERVIEW

Overview

What the Ryzen AI 9 HX 370 actually is, in local-AI terms

The AMD Ryzen AI 9 HX 370 (Strix Point) is AMD's 2024-2025 Copilot+ PC laptop SoC and the most capable on-device-AI x86 mobile chip AMD has shipped. 12 Zen 5 / Zen 5c cores, an RDNA 3.5 integrated GPU, and the XDNA 2 NPU at 50 TOPS INT8 — the headline number that puts the chip past Microsoft's 40 TOPS Copilot+ certification floor.

For the local-AI operator looking at "what's the best AMD-powered AI laptop in 2026," the HX 370 is the answer. The same operator should also be honest about the trade: Strix Point is a meaningfully better laptop CPU + iGPU + NPU combination than the Hawk Point predecessor, but it does not change the fundamental fact that on-device AI on x86 laptops in 2026 is a 7B-class story, not a 32B-class story.

Where it fits in the hardware ladder

In the Copilot+ x86 laptop tier:

Chip                       NPU TOPS   iGPU        Mem BW     Notes
Intel Lunar Lake (258V)    48         Xe2         136 GB/s   Intel Copilot+ flagship
AMD Ryzen AI 9 HX 370      50         RDNA 3.5    ~90 GB/s   AMD Copilot+ flagship
AMD Ryzen AI 9 365         50         RDNA 3.5    ~90 GB/s   sibling chip, slightly fewer cores
Snapdragon X Elite         45         Adreno X1   135 GB/s   ARM Copilot+ alternative

vs the Apple alternative: the Apple M4 Max plays in a different league for "real LLM inference" because of memory bandwidth (~400+ GB/s) and unified-memory capacity. The HX 370 is competitive for NPU-accelerated small-model inference, not for bandwidth-bound transformer decode.

Best use cases

  • Copilot+ native Windows AI workloads. Phi-4 / Llama 3.2 1B / 3B running through ONNX Runtime + DirectML on the NPU. Native Windows on-device AI is what Strix Point is built for.
  • Linux laptop with a usable iGPU LLM path. ROCm support on Strix Point's RDNA 3.5 iGPU is real in 2026 and lets you run llama.cpp GPU-accelerated on a laptop without a dGPU.
  • Battery-aware on-device coding assistant. Small coding models (Qwen 2.5 Coder 1.5B / 3B) routed through the NPU keep the dGPU idle, save battery.
  • Enterprise compliance laptops. Air-gapped on-device AI for fields where cloud inference is prohibited.
  • Developer dev box with light local-AI workloads. Use the laptop for prototyping, do real inference on a workstation.
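The Copilot+ NPU path above boils down to handing ONNX Runtime the right execution-provider order. A minimal sketch, assuming the onnxruntime-directml package; `DmlExecutionProvider` is the real DirectML provider name, while the model path and the `pick_providers` helper are illustrative, not part of any SDK:

```python
def pick_providers(available: list[str]) -> list[str]:
    """Prefer DirectML (NPU/iGPU via DirectX 12) over CPU, preserving fallback order."""
    preferred = ["DmlExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]

# With onnxruntime-directml installed (model path is a placeholder):
# import onnxruntime as ort
# sess = ort.InferenceSession("phi-4-int4.onnx",
#                             providers=pick_providers(ort.get_available_providers()))

print(pick_providers(["DmlExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))  # CPU-only fallback
```

If DirectML is absent the session silently falls back to CPU, which is exactly the failure mode to check for when NPU throughput looks suspiciously low.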

For the laptop pattern see /stacks/private-rag-laptop.

What it can run

The story is memory-bandwidth-bound, not compute-bound. ~90 GB/s system DRAM bandwidth is the actual ceiling on transformer decode tok/s.

Model class   Quant         Path                             Realistic tok/s
1B–3B         INT4 / INT8   NPU + ONNX Runtime + DirectML    usable, snappy
7B–8B         Q4_K_M        iGPU via ROCm + llama.cpp        usable for short prompts
7B–8B         Q4_K_M        NPU via ONNX + DirectML          usable; small wins over iGPU
13B           Q4_K_M        iGPU + 32 GB RAM                 works but slow
32B+          —             —                                unrealistic on a laptop

The headline 50 TOPS INT8 number is meaningful for prefill but transformer decode is bandwidth-bound, and 90 GB/s is the wall.
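Since decode streams essentially the whole quantized weight set from DRAM once per generated token, the ceiling is just bandwidth divided by model bytes. A back-of-envelope sketch (the GGUF sizes are assumed typical values; real throughput lands well below the ceiling):

```python
def decode_ceiling_toks(bandwidth_gbs: float, model_gb: float) -> float:
    """Theoretical max decode tok/s: every token reads all weights from DRAM once."""
    return bandwidth_gbs / model_gb

# ~90 GB/s system DRAM vs a 7B Q4_K_M GGUF (~4.3 GB, assumed)
print(round(decode_ceiling_toks(90, 4.3), 1))  # ~21 tok/s ceiling
# 13B Q4_K_M (~7.9 GB, assumed)
print(round(decode_ceiling_toks(90, 7.9), 1))  # ~11 tok/s ceiling
```

This is why the 50 TOPS figure barely moves decode numbers: compute helps prefill, but no amount of TOPS gets you past the DRAM wall.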

OS support

OS                          Quality       Notes
Windows 11 (24H2+)          excellent     the Copilot+ path; ONNX Runtime + DirectML
Linux (Ubuntu 24.04 LTS)    good          ROCm 6.x supports the iGPU; some laptop quirks
Linux (other)               partial       distro-dependent driver packaging
WSL2                        partial       GPU passthrough exists; rougher than native
macOS                       unsupported   —

If your day job is Linux dev, expect a few weeks of debugging to get full ROCm + NPU + power management working cleanly.
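Before that debugging, a quick Linux sanity pass is worth the five minutes. A sketch, not a recipe — the gfx target string and log wording vary by ROCm release and llama.cpp version:

```shell
# Kernel must be new enough for Strix Point's amdgpu support (6.10+ per this page)
uname -r

# Does ROCm enumerate the iGPU at all? Look for a gfx11xx agent in the output
rocminfo | grep -i "gfx"

# Confirm llama.cpp actually offloaded layers to the GPU rather than silently
# falling back to CPU — watch its load log for an "offloaded N/N layers" line
./llama-cli -m model.gguf -ngl 99 -p "hello" 2>&1 | grep -i "offloaded"
```

Silent CPU fallback is the most common failure here; if the offload line never appears, suspect the ROCm / amdgpu / linux-firmware version mismatch described under "What breaks first."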

Software / runtime support

  • ONNX Runtime + DirectML — the canonical NPU path on Windows
  • AMD Ryzen AI Software (XDNA driver) — Windows-only; the official NPU access SDK
  • ROCm 6.x on Linux — works on the RDNA 3.5 iGPU with the right gfx target
  • llama.cpp HIP — works on Linux; the Linux-native path
  • llama.cpp Vulkan — cross-platform fallback; usable on Windows when ROCm-on-iGPU isn't an option
  • Ollama — works on Linux via HIP, on Windows via Vulkan
  • OpenVINO — partial support; primarily an Intel path
  • CUDA / TensorRT-LLM / ExLlamaV2 — wrong vendor
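The two llama.cpp paths in the list above differ only in a CMake flag. A sketch assuming a recent llama.cpp tree (older releases used -DLLAMA_HIPBLAS=ON; the model filename is a placeholder):

```shell
# Linux-native path: HIP/ROCm build against the RDNA 3.5 iGPU
cmake -B build -DGGML_HIP=ON && cmake --build build -j

# Cross-platform fallback: Vulkan build, the usual choice on Windows without ROCm
cmake -B build -DGGML_VULKAN=ON && cmake --build build -j

# Run an 8B Q4 with all layers offloaded to the iGPU and an 8K context
./build/bin/llama-cli -m llama-3.1-8b-q4_k_m.gguf -ngl 99 -c 8192
```

-ngl 99 simply requests "all layers on GPU"; on a unified-memory APU there is no PCIe copy penalty, so full offload is the sensible default.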

For format support across runtimes see /systems/quantization-formats.

What breaks first

  1. NPU access on non-Windows. XDNA Linux driver work is ongoing in 2026 but Windows is dramatically smoother for NPU paths.
  2. iGPU memory budget. The iGPU shares system DRAM; allocating 16 GB to an LLM leaves less for the OS / apps. Plan around 32 GB minimum, ideally 64 GB for serious work.
  3. Power management vs throughput. Sustained inference pushes the chip to its 28 W ceiling; battery life collapses. Plug in for serious workloads.
  4. Driver lineage drift on Linux. ROCm + amdgpu + linux-firmware versions need to match; mismatches surface as silent CPU fallback. See /errors/rocm-device-not-found.
  5. Bleeding-edge model architectures on the NPU. ONNX-conversion-and-NPU-deployment workflow assumes the model converts cleanly; novel architectures often need manual op-implementations.

Alternatives by intent

If you want…                        Reach for
Intel x86 Copilot+ flagship         Intel Lunar Lake (258V)
ARM Windows alternative             Snapdragon X Elite
Apple-ecosystem on-device           Apple M4 Max
Serious LLM workstation             RTX 4070 Ti Super or RTX 4090 desktop
Older AMD AI laptop (cheaper)       Hawk Point Ryzen AI 8040
Maximum unified memory on Apple     Apple M3 Ultra Mac Studio

Best pairings

  • Windows 11 24H2 + ONNX Runtime + DirectML + Phi-4 / Llama 3.2 — the Copilot+ canonical setup
  • Ubuntu 24.04 LTS + ROCm 6.x + llama.cpp HIP — the Linux dev box default
  • Ollama + Llama 3.1 8B Q4_K_M — the cross-platform homelab-on-laptop fallback
  • 64 GB system RAM — non-negotiable for serious laptop AI; iGPU borrows from main DRAM
  • Plugged-in operation for LLM workloads — battery-only is fine for 1-3B models, painful for 8B+
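The Ollama pairing above is a two-command setup. A sketch — the default quant behind the llama3.1:8b tag can change between Ollama releases, so verify the tag before relying on it:

```shell
# Pull and run an 8B model (the default tag is a Q4-class quant at the time of writing)
ollama pull llama3.1:8b
ollama run llama3.1:8b "Summarize the trade-offs of unified memory for LLM inference."

# Check which models are loaded and whether they landed on GPU or CPU
ollama ps
```

On Linux, Ollama uses the HIP path; on Windows, Vulkan — the ollama ps output is the quickest way to confirm the model didn't fall back to CPU.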

Who should avoid the Ryzen AI 9 HX 370

  • Anyone expecting workstation-class throughput from a laptop. Wrong tier.
  • Operators on a CUDA-only software stack. Wrong vendor.
  • Multi-user-serving production. Wrong form factor.
  • Apple-ecosystem operators. Stay with Apple Silicon.
  • Linux purists who want zero-friction. The Strix Point Linux experience is good but not the smoothest path; M4 Max + macOS is smoother for "just works."
  • Workloads that need 24+ GB of GPU memory. Wrong tier; buy a desktop dGPU.

Related

  • Stacks: /stacks/private-rag-laptop, /stacks/android-on-device-ai
  • System guides: /systems/linux-local-ai, /systems/quantization-formats
  • Tools: ONNX Runtime, OpenVINO, llama.cpp, ROCm
  • Hardware: Intel Lunar Lake (258V), Snapdragon X Elite, Apple M4 Max
  • Errors: /errors/rocm-device-not-found, /errors/wsl2-gpu-not-detected
Retailers we'd check: Amazon

Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.

BLK · SPECS

Specs

VRAM: 0 GB (unified memory)
System RAM (typical): 32 GB
Power draw: 28 W
Released: 2024
MSRP: $1,599
Backends: ROCm
Compare alternatives

Hardware worth comparing

Same VRAM tier and the one step above and below — so you can frame the buying decision against real options.

Same VRAM tier
Cards in the same memory band
  • Intel Core Ultra 7 258V (Lunar Lake)
    intel · 0 GB VRAM
    3.8/10
  • Apple M3 Ultra
    apple · 0 GB VRAM
    10.0/10
  • Apple M2 Ultra
    apple · 0 GB VRAM
    9.9/10
  • Apple M4 Ultra
    apple · 0 GB VRAM
    10.0/10
  • Qualcomm Snapdragon X Plus
    qualcomm · 0 GB VRAM
    5.8/10
  • Apple M4 Pro
    apple · 0 GB VRAM
    10.0/10
Step up
More VRAM — bigger models, more context
  • Lenovo Legion 5 Pro Gen 7 (RTX 3080 16GB)
    nvidia · 16 GB VRAM
    9.3/10
  • NVIDIA GeForce RTX 4090
    nvidia · 24 GB VRAM
    8.8/10
  • NVIDIA GeForce RTX 3090 Ti
    nvidia · 24 GB VRAM
    8.8/10
Step down
Less VRAM — cheaper, more constrained
  • NVIDIA GeForce RTX 5070 Ti
    nvidia · 16 GB VRAM
    8.1/10
  • NVIDIA GeForce RTX 4070 Ti Super
    nvidia · 16 GB VRAM
    8.1/10
  • NVIDIA GeForce RTX 4070 Ti
    nvidia · 12 GB VRAM
    7.3/10
Buyer guides where this card is the right answer

Ryzen AI 9 HX 370 (Strix Point) is the headline integrated-graphics AI part of 2026. The iGPU + eGPU guides below frame this decision space.

  • best iGPU for local AI
  • best eGPU setup for local AI
  • best mini PC for local AI

Frequently asked

Does AMD Ryzen AI 9 HX 370 (Strix Point) support CUDA?

No — the AMD Ryzen AI 9 HX 370 (Strix Point) is an AMD APU, not an NVIDIA GPU. Use ROCm (Linux) or the Vulkan backend in llama.cpp instead. CUDA-only tools won't work.

Where next?

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
Troubleshooting
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.