mistral
7B parameters
Commercial OK

Mistral 7B Instruct v0.3

The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.

License: Apache 2.0 · Released May 22, 2024 · Context: 32,768 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
5.5/10
Positioning

The model that defined local LLMs in 2023. Today, it's a benchmark baseline more than a working choice — every newer 7–8B model is meaningfully better while sitting in the same VRAM bracket. The Apache 2.0 license is its remaining real strength.

Strengths
  • True Apache 2.0 license: no usage caps, no naming restrictions, no DUA. The most legally clean 7B in active use.
  • Mature fine-tune ecosystem: thousands of derivatives, well-tested LoRA recipes, strong tooling support.
  • Predictable runtime behavior: every runner has stable, well-debugged Mistral support — no surprises.
Limitations
  • Instruction following lags Llama 3.1 8B: more frequent hallucinations on multi-step prompts, weaker JSON adherence.
  • No system-prompt role in the v0.3 chat template, which complicates integration for assistants and agent loops.
  • Knowledge cutoff in late 2023: noticeably stale on anything from 2024 onward.
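Because the v0.3 template has no system role, a common workaround (sketched below; this is a community pattern, not an official Mistral recipe) is to fold system instructions into the first user turn before sending the chat:

```python
def fold_system_prompt(messages):
    """Merge a leading 'system' message into the first 'user' turn,
    since the Mistral 7B v0.3 chat template has no system role."""
    if not messages or messages[0]["role"] != "system":
        return list(messages)
    system, rest = messages[0], messages[1:]
    if rest and rest[0]["role"] == "user":
        merged = dict(rest[0])
        merged["content"] = system["content"] + "\n\n" + rest[0]["content"]
        return [merged] + rest[1:]
    # No user turn to merge into: demote the system text to a user turn.
    return [{"role": "user", "content": system["content"]}] + rest

msgs = fold_system_prompt([
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is quantization?"},
])
```

Most agent frameworks do something equivalent automatically, but if you build the prompt yourself, this is the shape to aim for.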
Real-world performance on RTX 4090
  • Q4_K_M (4.4 GB): 100–120 tok/s decode, TTFT under 70 ms
  • Q5_K_M (5.1 GB): 90–105 tok/s
  • Q8_0 (7.7 GB): 75–88 tok/s
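Decode speed translates directly into wall-clock latency: total time is roughly TTFT plus tokens divided by decode rate. A quick back-of-the-envelope using the Q4_K_M numbers above:

```python
def estimate_latency_s(n_tokens, tok_per_s, ttft_ms):
    """Rough wall-clock time to generate n_tokens at a given decode rate."""
    return ttft_ms / 1000 + n_tokens / tok_per_s

# A 500-token reply at 110 tok/s with a 70 ms TTFT
t = estimate_latency_s(500, 110, 70)  # roughly 4.6 seconds
```

This ignores prompt-processing time beyond TTFT, so treat it as a lower bound for long prompts.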
Should you run this locally?

Yes, if you need an Apache-licensed model for commercial deployment, a fine-tune base for novel domain adaptation, or a regression baseline. No, for general chat or assistant work: Llama 3.1 8B and Qwen 2.5 7B both beat it.

How it compares
  • vs Llama 3.1 8B → Llama wins on instruction reliability, system-prompt support, and recency. The only reason to prefer Mistral is licensing.
  • vs Qwen 2.5 7B → Qwen wins on knowledge breadth and multilingual; Mistral has the simpler license. Almost always pick Qwen unless license is the gating concern.
  • vs Mistral Nemo 12B → Nemo replaces Mistral 7B v0.3 in the modern Mistral lineup — same Apache license, materially stronger model for ~50% more VRAM.
  • vs Phi-3.5 Mini → comparable capability, Mistral uses ~2× the VRAM. Phi wins on efficiency.
Run this yourself
ollama pull mistral:7b-instruct-v0.3-q4_K_M
ollama run mistral:7b-instruct-v0.3-q4_K_M
Settings: Q4_K_M GGUF, 4096 ctx, llama.cpp/CUDA, RTX 4090
Why this rating

5.5/10 — historically important, currently obsolete. Llama 3.1 8B and Qwen 2.5 7B both surpass it across the board. Keep on disk only if you have a fine-tuned variant you depend on.

Overview

The reference 7B from Mistral. Apache 2.0 with native function calling. Mature ecosystem.

Strengths

  • Apache 2.0
  • Native function calling
  • Battle-tested
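The v0.3 weights added native function-calling support, and runners such as Ollama expose it through an OpenAI-style `tools` field on the chat endpoint. A minimal sketch of a request payload (the `get_weather` tool is a made-up example, not part of the model):

```python
import json

# Hypothetical tool definition; the schema follows the OpenAI-style
# "tools" field accepted by Ollama's /api/chat endpoint.
payload = {
    "model": "mistral:7b-instruct-v0.3-q4_K_M",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}
body = json.dumps(payload)  # POST this to http://localhost:11434/api/chat
```

If the model decides to call the tool, the response carries a `tool_calls` entry instead of plain text; your code executes the function and sends the result back as a follow-up message.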

Weaknesses

  • Outpaced by Qwen 3 8B
  • 32K context only

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          4.4 GB       6 GB
Q5_K_M          5.1 GB       7 GB
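The VRAM figures above follow a simple rule of thumb: weights file size plus KV cache plus runtime buffers. A hedged sketch (the KV-cache and overhead constants are rough assumptions for ~4K context, not measured values):

```python
def estimate_vram_gb(file_size_gb, kv_cache_gb=0.5, overhead_gb=1.0):
    """Rule-of-thumb VRAM: weights + KV cache at ~4K context + runtime buffers.
    The kv_cache_gb and overhead_gb defaults are assumptions, not measurements."""
    return file_size_gb + kv_cache_gb + overhead_gb

q4 = estimate_vram_gb(4.4)  # ~5.9 GB, consistent with the 6 GB figure above
```

Longer contexts grow the KV cache linearly, so budget extra headroom if you raise the context window past 4K.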

Get the model

Ollama

One-line install

ollama run mistral:7b

HuggingFace

Original weights

huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

Source repository: you must quantize the weights yourself (e.g., convert to GGUF) before running them locally.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

1 run on record
Hardware                            Runner    Quant     Ctx    Tokens/sec    VRAM      TTFT     Date
NVIDIA GeForce RTX 4090             Ollama    Q4_K_M    4K     112.3         5.1 GB    64 ms    Apr 22, 2026

Hardware that runs this

Cards with enough VRAM for at least one quantization of Mistral 7B Instruct v0.3.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Mistral 7B Instruct v0.3?

6GB of VRAM is enough to run Mistral 7B Instruct v0.3 at the Q4_K_M quantization (file size 4.4 GB). Higher-quality quantizations need more.

Can I use Mistral 7B Instruct v0.3 commercially?

Yes. Mistral 7B Instruct v0.3 ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Mistral 7B Instruct v0.3?

Mistral 7B Instruct v0.3 supports a context window of 32,768 tokens (32K).

How do I install Mistral 7B Instruct v0.3 with Ollama?

Run `ollama pull mistral:7b` to download, then `ollama run mistral:7b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.