
OLMo 2 32B

The fully open OLMo 2 at 32B scale: AI2 publishes the full training data, code, and weights, making it the most reproducible 32B model.

License: Apache 2.0 · Released Apr 12, 2026 · Context: 32,768 tokens

Overview

OLMo 2 32B is the Allen Institute for AI's fully open 32B-parameter model. Unlike most open-weight releases, AI2 publishes the complete training data, training code, and weights together, which makes it the most reproducible model at this scale.

Strengths

  • Fully open (data + code + weights)
  • Apache 2.0
  • Reproducible

Weaknesses

  • Behind closed-data peers on some benchmarks

Quantization variants

Each quantization trades model quality for a smaller file size and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
------------ | --------- | -------------
Q4_K_M       | 19.0 GB   | 24 GB
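As a rough rule of thumb, the VRAM needed to fully offload a GGUF quantization is the file size plus headroom for the KV cache and runtime buffers. A minimal sketch; the ~25% overhead factor is an illustrative assumption, not a measured figure:

```python
def estimate_vram_gb(file_size_gb: float, overhead_factor: float = 0.25) -> float:
    """Estimate VRAM needed to fully offload a quantized model.

    Assumes total VRAM ~= model file size plus a fixed fraction of
    overhead for KV cache and runtime buffers (an assumption; real
    usage varies with context length and inference backend).
    """
    return round(file_size_gb * (1 + overhead_factor), 1)

# Q4_K_M of OLMo 2 32B is a 19.0 GB file:
print(estimate_vram_gb(19.0))  # 23.8 -- just fits a 24 GB card
```

In practice the fit is tight: a long context window or a second model on the same GPU can push you past 24 GB, so leave margin.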

Get the model

Hugging Face

Original weights

huggingface.co/allenai/OLMo-2-32B-Instruct

This is the source repository with the original weights; no prebuilt GGUF files are published there, so you must quantize the model yourself before running it locally.
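One way to do that quantization is the llama.cpp toolchain. A sketch only: the local directory names and output filenames are illustrative, and the script and binary names (`convert_hf_to_gguf.py`, `llama-quantize`) reflect current llama.cpp and may change between releases.

```shell
# Download the original weights (assumes huggingface-cli is installed)
huggingface-cli download allenai/OLMo-2-32B-Instruct --local-dir olmo2-32b

# From a built llama.cpp checkout: convert to a full-precision GGUF,
# then quantize down to Q4_K_M
python convert_hf_to_gguf.py olmo2-32b --outfile olmo2-32b-f16.gguf
./llama-quantize olmo2-32b-f16.gguf olmo2-32b-Q4_K_M.gguf Q4_K_M
```

The intermediate f16 GGUF is large (roughly 2 bytes per parameter, so ~64 GB for a 32B model); make sure you have the disk space before converting.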

Hardware that runs this

GPUs with enough VRAM to run at least one quantization of OLMo 2 32B.

Compare alternatives

Models worth comparing

Models in the same parameter band, plus one tier above and below, so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run OLMo 2 32B?

24 GB of VRAM is enough to run OLMo 2 32B at the Q4_K_M quantization (19.0 GB file). Higher-quality quantizations need more.

Can I use OLMo 2 32B commercially?

Yes. OLMo 2 32B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of OLMo 2 32B?

OLMo 2 32B supports a context window of 32,768 tokens (commonly written as 32K).
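That 32K window matters for memory planning, because the KV cache grows linearly with context length. A sketch of the standard KV-cache size formula; the layer and head counts below are illustrative placeholders for a 32B-class model, not OLMo 2 32B's actual published architecture:

```python
def kv_cache_bytes(context_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache: 2 tensors (keys + values) per layer,
    each holding context_len * n_kv_heads * head_dim elements."""
    return 2 * n_layers * context_len * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical 32B-class shape (NOT OLMo 2's published config):
gib = kv_cache_bytes(32_768, n_layers=64, n_kv_heads=8, head_dim=128) / 2**30
print(f"{gib:.1f} GiB")  # 8.0 GiB at fp16 for a full 32K context
```

This is why a 19 GB Q4_K_M file can still exceed a 24 GB card when you actually use the full context: inference backends typically let you cap the context length or quantize the KV cache to trade reach for memory.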

Source: huggingface.co/allenai/OLMo-2-32B-Instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.