command-r
35B parameters
Restricted

Command R 35B

Cohere's mid-tier — RAG and tool use. Non-commercial license.

License: CC BY-NC 4.0 · Released Mar 11, 2024 · Context: 131,072 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.5/10
Positioning

The practical-VRAM Cohere model. Command R 35B fits on a 24 GB card at Q4 with no offload, retains the RAG specialization, and is the right pick for non-commercial RAG workflows on consumer hardware.

Strengths
  • 22 GB at Q4_K_M — fits on 24 GB cards full-GPU.
  • Same RAG training as Command R+ — citation-aware, retrieval-friendly.
  • Strong tool-use format.
Limitations
  • CC-BY-NC license — non-commercial only without separate Cohere agreement.
  • General quality below Qwen 3 32B at similar VRAM.
  • Strong multilingual support, but narrower coverage than Command R+.
Real-world performance on RTX 4090
  • Q4_K_M (22 GB): 55–70 tok/s decode — full GPU, no offload
  • Q5_K_M (26 GB): partial offload, 18–26 tok/s
  • Q8_0 (38 GB): workstation territory
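The sizes above roughly follow from bits-per-weight. As a sketch, multiplying parameter count by approximate llama.cpp bits-per-weight figures (the bpw values below are rule-of-thumb estimates, and real VRAM use adds KV cache and runtime overhead on top of weights):

```shell
# Rough weights-only size estimate for a 35B model at common quantizations.
# bpw figures are approximate llama.cpp values, not exact for this model.
params=35  # billions of parameters
for q in "Q4_K_M 4.85" "Q5_K_M 5.69" "Q8_0 8.50"; do
  set -- $q  # split into quant name and bits-per-weight
  awk -v p="$params" -v b="$2" -v n="$1" \
      'BEGIN { printf "%s: ~%.1f GB of weights\n", n, p * b / 8 }'
done
```

These land close to the file sizes quoted above; the gap to actual VRAM required is context-dependent overhead.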
Should you run this locally?

Yes, for RAG workflows in non-commercial settings on a 24 GB card. No, for commercial use without the Cohere license, or for general chat where Qwen 3 32B is a better generalist.

How it compares
  • vs Command R+ 104B → 104B is meaningfully smarter; 35B fits the VRAM budget.
  • vs Qwen 3 32B → Qwen wins on general use + license; Command R wins on RAG specifically.
  • vs Mistral Small 3 24B → Mistral has cleaner license; Command R has stronger RAG behavior.
Run this yourself
ollama pull command-r:35b-q4_K_M
ollama run command-r:35b-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
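If you drive the model programmatically rather than through the interactive CLI, the same settings can be passed per request over Ollama's local REST API (this assumes the Ollama server is running on its default port 11434 and the tag above has been pulled; the prompt is a placeholder):

```shell
# One-off generation request with the 16384-token context from the
# settings above, sent to a locally running Ollama server.
curl -s http://localhost:11434/api/generate -d '{
  "model": "command-r:35b-q4_K_M",
  "prompt": "Summarize the key points of the pasted document.",
  "stream": false,
  "options": { "num_ctx": 16384 }
}'
```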
Why this rating

7.5/10 — Command R+ at a more practical VRAM cost. 35B fits at Q4 in ~22 GB — full GPU on 24 GB cards. Same RAG specialization, license still CC-BY-NC. Loses points on absolute capability vs the larger sibling.

Overview

Cohere's mid-tier — RAG and tool use. Non-commercial license.

Strengths

  • RAG-tuned
  • Tool use

Weaknesses

  • Non-commercial

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          21.0 GB      26 GB

Get the model

Ollama

One-line install

ollama run command-r:35b

Read our Ollama review →

HuggingFace

Original weights

huggingface.co/CohereForAI/c4ai-command-r-v01

Source repository — you'll need to quantize the weights yourself (e.g. to GGUF) before running locally.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Command R 35B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Command R 35B?

26 GB of VRAM is enough to run Command R 35B at the Q4_K_M quantization (file size 21.0 GB). Higher-quality quantizations need more.

Can I use Command R 35B commercially?

Command R 35B is released under the CC BY-NC 4.0 license, which restricts commercial use. Review the license terms before using it in a product.

What's the context length of Command R 35B?

Command R 35B supports a context window of 131,072 tokens (128K).
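Note that Ollama's default context window is much smaller than the model's maximum. One way to raise it persistently is a Modelfile (the `command-r-32k` name and the 32768 value below are illustrative choices, not defaults):

```shell
# Create a variant of the model with a larger default context window.
cat > Modelfile <<'EOF'
FROM command-r:35b-q4_K_M
PARAMETER num_ctx 32768
EOF
ollama create command-r-32k -f Modelfile
ollama run command-r-32k
```

Larger `num_ctx` values grow the KV cache, so on a 24 GB card the usable window is well below the model's 131,072-token maximum.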

How do I install Command R 35B with Ollama?

Run `ollama pull command-r:35b` to download, then `ollama run command-r:35b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/CohereForAI/c4ai-command-r-v01

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.