Command R+ (Aug 2024)
Cohere's August 2024 Command R+ refresh. RAG-optimized; non-commercial license. Strong tool-calling and citation discipline.
Positioning
Cohere Command R+ (08-2024) is the open-weight refresh of Cohere's flagship retrieval-augmented model and one of the few 100B+ class open-weight models with explicit RAG / tool-use tuning. It is a 104-billion-parameter dense model with a 128K context window, released under a research / non-commercial license (CC-BY-NC-4.0 plus Cohere's Acceptable Use Policy). The 08-2024 update brought meaningful improvements over the original April 2024 release: better retrieval-grounding accuracy, expanded multilingual coverage (10+ well-supported languages, including Arabic, Korean, and Hebrew), and stronger tool-use chain-of-thought.
Strengths
- Retrieval grounding is a genuine differentiator. Command R+ was trained with explicit RAG document-citation as a first-class capability — citation accuracy on multi-document QA is meaningfully better than comparably sized Llama 3 / Qwen 3 models that bolt RAG on after training.
- Multilingual coverage is real. Genuinely useful for Arabic, Korean, Hebrew, Indonesian, Vietnamese — languages where Llama 3 lags.
- Tool-use is well-tuned. Function calling + multi-step tool use shows up clearly in agentic benchmarks.
- 128K context with stable degradation curve. Performance at 64K-128K context is closer to short-context performance than typical 100B-class models.
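The citation discipline above can be checked mechanically. A minimal sketch of parsing cited spans out of a grounded answer — the `<co: N>` marker format is an assumption based on Cohere's grounded-generation template, so verify it against the current model card before relying on it:

```python
import re

# Grounded Command R+ output wraps cited spans in markers like
#   "<co: 0,2>cited text</co: 0,2>"
# where the numbers index the source documents. This marker format is an
# assumption from Cohere's grounded-generation template, not guaranteed.
CITE_RE = re.compile(r"<co:\s*([\d,\s]+?)>(.*?)</co:\s*[\d,\s]+?>", re.DOTALL)

def extract_citations(grounded_answer: str):
    """Return ([doc_ids], cited_text) pairs plus the answer with markers stripped."""
    citations = [
        ([int(d) for d in ids.split(",")], text)
        for ids, text in CITE_RE.findall(grounded_answer)
    ]
    plain = CITE_RE.sub(r"\2", grounded_answer)
    return citations, plain

answer = "The refresh shipped in <co: 0>August 2024</co: 0> with <co: 0,1>128K context</co: 0,1>."
cites, plain = extract_citations(answer)
```

A parser like this is what lets a RAG pipeline render per-sentence source attributions instead of trusting the model's prose.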
Limitations
- License is non-commercial. CC-BY-NC-4.0 + Cohere AUP — production commercial deployments require Cohere licensing. This is the single biggest practical limitation vs Llama 3.3 / Qwen 3 (which ship under permissive open-weight licenses).
- Compute requirements are real. 104B at FP16 needs ~210 GB; at Q4, ~55-60 GB. It won't fit on a single consumer GPU — MI300X, PRO 6000 Blackwell, or Mac Studio M3 Ultra is the floor.
- Reasoning is not class-leading. DeepSeek V3 and Qwen 3 Reasoning beat Command R+ on math/code/logic benchmarks.
- Latency is workstation-tier. Expect ~25-40 tok/s decode on a PRO 6000 Blackwell at Q4. Production multi-tenant serving needs a proper serving stack on an H100/H200 cluster.
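The memory figures above follow from simple arithmetic: weight bytes ≈ parameter count × bits per weight ÷ 8. A quick sanity-check sketch (4-bit quants average ~4.5 bits per weight once scales and zero-points are counted; KV cache and activations come on top):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight-only memory in GB (1e9 bytes): params * bits / 8.
    Ignores KV cache, activations, and runtime overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

fp16_gb = weight_memory_gb(104, 16)   # 208.0 -> matches the ~210 GB figure
q4_gb = weight_memory_gb(104, 4.5)    # 58.5  -> inside the quoted 55-60 GB range
```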
Real-world performance
- Llama 3 70B vs Command R+ 104B: Llama 3 70B is faster (smaller, better optimized) and equally capable on English-only tasks. Command R+ wins clearly on multilingual + RAG citation + tool-use chains.
- Qwen 3 235B-A22B MoE vs Command R+ 104B dense: Qwen 3 235B is faster (MoE active params ~22B) and stronger on reasoning. Command R+ wins on retrieval grounding + multilingual.
- Claude 3.5 Sonnet via API vs Command R+ self-hosted: the API is faster and stronger on most tasks, but it rules out data sovereignty and on-prem RAG deployment. Pick by deployment requirements.
Should you run this locally?
Yes if you specifically need on-prem multilingual RAG at the 100B-class capability tier and your deployment context is research / non-commercial / Cohere-licensed. Production-grade RAG with strong citation accuracy is genuinely a Command R+ strength. Pick a Mac Studio M3 Ultra (192 GB) or MI300X (192 GB) for single-card deployment at 8-bit quantization.
No if you need permissive open-weight licensing (pick Llama 3.3 70B or Qwen 3 235B-A22B), reasoning-heavy workloads (pick DeepSeek V3 / Qwen 3 Reasoning), or production at scale where commercial licensing cost beats self-hosting math.
How it compares
- vs Llama 3.3 70B: Llama is smaller, faster, and permissively licensed. Llama wins for commercial production; Command R+ wins for multilingual RAG.
- vs Qwen 3 235B-A22B: Qwen has more total parameters, stronger reasoning, and a permissive license. Qwen 3 wins on most metrics; Command R+ wins on retrieval-grounding citation accuracy.
- vs DeepSeek V3 (671B MoE): DeepSeek V3 is dramatically larger MoE with stronger reasoning. Command R+ wins on dense-model deployment simplicity.
- vs command-r-35b: Smaller Cohere sibling — same retrieval focus at lower parameter count. Pick R+ for full capability, R for cheaper inference.
Run this yourself
- Single-card workstation: PRO 6000 Blackwell (96 GB) at Q4 — fits comfortably with full context.
- Single-card AMD: MI300X (192 GB) at 8-bit (~104-110 GB for weights), leaving headroom for long-context KV cache.
- Mac Studio: Mac Studio M3 Ultra (192 GB) at Q8_0 via MLX or llama.cpp Metal.
- Datacenter: 2× H100 PCIe NVLinked at FP8 production serving.
- Cloud rental: Runpod / Lambda H100 PCIe ~$2.50-3.50/hr.
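For the cloud-rental option, a back-of-envelope cost sketch at the quoted $2.50-3.50/hr range — the $30,000 hardware price below is a hypothetical placeholder for break-even comparison, not a quote:

```python
def rental_cost(rate_per_hr: float, hours: float) -> float:
    """Total rental spend for a given number of GPU-hours."""
    return rate_per_hr * hours

def break_even_hours(hardware_price: float, rate_per_hr: float) -> float:
    """GPU-hours at which cumulative rental equals buying outright."""
    return hardware_price / rate_per_hr

# 8 h/day across 22 workdays at the $3.00/hr midpoint of the quoted range
monthly = rental_cost(3.00, 8 * 22)     # 528.0 dollars
# $30,000 is a hypothetical hardware price, not a real quote
even = break_even_hours(30_000, 3.00)   # 10000.0 GPU-hours
```

At intermittent usage, renting stays cheaper for a long time; the math flips for always-on serving.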
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| AWQ-INT4 | 60.0 GB | 72 GB |
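The VRAM-required column exceeds the file size mainly because of KV cache at long context. A rough sizing sketch — the layer/head/dimension values below are illustrative assumptions, not confirmed Command R+ architecture numbers; read the real ones from the model's `config.json`:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Illustrative GQA-style values only -- NOT verified Command R+ numbers;
# check num_hidden_layers / num_key_value_heads / head_dim in config.json.
full_ctx_gb = kv_cache_gb(n_layers=64, n_kv_heads=8, head_dim=128, seq_len=131072)
```

Under these assumed values, a full 128K-token FP16 cache adds ~34 GB on top of the weights, which is why the table's VRAM column carries headroom over the file size.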
Get the model
- HuggingFace: original weights (source repository — direct quantization required).
Hardware that runs this
Cards with enough VRAM for at least one quantization of Command R+ (Aug 2024).
Frequently asked
What's the minimum VRAM to run Command R+ (Aug 2024)?
Roughly 55-60 GB for the weights at 4-bit quantization, plus KV-cache headroom — in practice a 72+ GB card (see the AWQ-INT4 row above).
Can I use Command R+ (Aug 2024) commercially?
Not under the default terms: CC-BY-NC-4.0 plus Cohere's Acceptable Use Policy limits the weights to non-commercial use, and production commercial deployment requires a license from Cohere.
What's the context length of Command R+ (Aug 2024)?
128K tokens.
Source: huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Verify Command R+ (Aug 2024) runs on your specific hardware before committing money.