server · Open source · free · 4.3/5
NVIDIA TensorRT-LLM
NVIDIA's optimized inference path for Hopper, Ada, and Blackwell. Compile your model once, serve at peak hardware speed.
Overview
TensorRT-LLM is NVIDIA's optimized inference path for its recent GPU architectures (Hopper, Ada, and Blackwell). You compile a model into a TensorRT engine once, then serve it at peak hardware speed, including FP8 and FP4 acceleration on Blackwell. The trade-offs: it runs only on NVIDIA GPUs, and the ahead-of-time compilation step is heavy.
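The compile-once, serve-at-speed workflow can be sketched as a short CLI session. This is an illustrative sketch, not a definitive recipe: it assumes a recent TensorRT-LLM release that ships the `trtllm-serve` entry point with an OpenAI-compatible HTTP server, requires an NVIDIA GPU with CUDA, and the model name is only an example.

```shell
# Sketch, assuming a recent TensorRT-LLM release with the
# `trtllm-serve` entry point. Requires an NVIDIA GPU + CUDA.
pip install tensorrt-llm

# Serve a Hugging Face model (name is illustrative); the TensorRT
# engine is compiled before serving starts, which can take a while.
trtllm-serve "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Query the OpenAI-compatible endpoint (default port 8000).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```

Because the engine is compiled ahead of time for a specific GPU and model, startup is slow but steady-state throughput is high; this is the core design choice behind TensorRT-LLM.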
Pros
- Peak NVIDIA hardware utilization
- FP8 / FP4 acceleration on Blackwell
Cons
- NVIDIA only
- Compilation step is heavy
Compatibility
| | |
|---|---|
| Operating systems | Linux, Windows |
| GPU backends | NVIDIA CUDA |
| License | Open source · free |
Get NVIDIA TensorRT-LLM
Frequently asked
Is NVIDIA TensorRT-LLM free?
Yes. NVIDIA TensorRT-LLM is free to download and use, and it is open source under the permissive Apache-2.0 license.
What operating systems does NVIDIA TensorRT-LLM support?
NVIDIA TensorRT-LLM supports Linux and Windows.
Which GPUs work with NVIDIA TensorRT-LLM?
NVIDIA TensorRT-LLM runs on NVIDIA GPUs through CUDA. There is no CPU-only fallback, so an NVIDIA GPU is required.
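A minimal way to sanity-check that prerequisite before installing anything is to look for the `nvidia-smi` tool on the PATH. This is a hedged sketch of a heuristic, not part of TensorRT-LLM itself: the presence of `nvidia-smi` merely suggests an NVIDIA driver is installed.

```python
import shutil

def has_nvidia_gpu() -> bool:
    """Heuristic: `nvidia-smi` on the PATH implies an NVIDIA driver install."""
    return shutil.which("nvidia-smi") is not None

if not has_nvidia_gpu():
    print("No NVIDIA driver detected; TensorRT-LLM will not run here.")
```

If the check fails, consider a backend-agnostic runtime instead, since TensorRT-LLM has no CPU path.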
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.