moonshot
200B parameters
Restricted

Kimi K1.5

Moonshot's reasoning model. It emits reasoning tokens in very long thinking blocks, sometimes 5,000+ tokens per query. Strong on math; restricted commercial license.

License: Moonshot License · Released Dec 1, 2025 · Context: 200,000 tokens

Overview

Strengths

  • Deep reasoning at frontier scale
  • Strong math benchmarks

Weaknesses

  • Long reasoning blocks add wall-clock cost
  • Restricted license
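To put the wall-clock cost of long reasoning blocks in numbers, here is a minimal sketch that divides the thinking-block length by decode speed. The 5,000-token figure comes from this page; the decode speeds are hypothetical placeholders, so measure your own hardware before relying on them.

```python
def thinking_latency_seconds(thinking_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent generating the thinking block before the answer starts."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return thinking_tokens / tokens_per_second


# Hypothetical decode speeds (assumptions, not measured results).
for speed in (10.0, 25.0, 50.0):
    seconds = thinking_latency_seconds(5000, speed)
    print(f"{speed:>5.1f} tok/s -> {seconds:.0f} s of thinking")
```

At an assumed 25 tokens per second, a 5,000-token thinking block takes about 200 seconds before the first answer token appears.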

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is generally the most popular starting point, but AWQ-INT4 is the only variant listed for this model.

Quantization    File size    VRAM required
AWQ-INT4        115.0 GB     140 GB
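As a rough sanity check on these figures, the sketch below compares the listed file size with the raw storage for 200B parameters at 4 bits each. It assumes decimal gigabytes (1 GB = 1e9 bytes). Reading the extra bits per weight as quantization metadata, and the gap between file size and VRAM as runtime headroom, is our interpretation, not something the source states.

```python
def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 200e9    # 200B parameters, as listed on this page
FILE_GB = 115.0   # AWQ-INT4 file size from the table
VRAM_GB = 140.0   # VRAM required from the table

raw_int4_gb = weight_size_gb(PARAMS, 4)        # pure 4-bit weights
effective_bits = FILE_GB * 1e9 * 8 / PARAMS    # bits per weight implied by the file
headroom_gb = VRAM_GB - FILE_GB                # left for KV cache, activations, runtime

print(f"Raw INT4 weights:   {raw_int4_gb:.1f} GB")
print(f"Effective bits/wt:  {effective_bits:.1f}")
print(f"Headroom over file: {headroom_gb:.1f} GB")
```

Under these assumptions the file works out to about 4.6 bits per weight, and roughly 25 GB of the 140 GB requirement is left for everything besides the weights.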

Get the model

HuggingFace

Original weights

huggingface.co/moonshotai/Kimi-K1.5

Source repository. No pre-quantized files are linked here, so you need to quantize the weights yourself.
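One way to fetch the original weights is the `huggingface_hub` Python package. This is a minimal sketch, assuming the package is installed, the repository is accessible to your account, and you have enough disk space for the full-precision files. The `download_weights` helper is our own illustrative name.

```python
REPO_ID = "moonshotai/Kimi-K1.5"  # repository named on this page


def download_weights(local_dir: str) -> str:
    """Download every file in the repository and return the local path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_weights("./kimi-k1.5")` (a hypothetical target directory) starts the download. Review the Moonshot License terms first.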

Frequently asked

What's the minimum VRAM to run Kimi K1.5?

140 GB of VRAM is enough to run Kimi K1.5 at the AWQ-INT4 quantization (file size 115.0 GB). Higher-quality quantizations need more.
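Few single cards offer 140 GB, so in practice the model is split across several GPUs. The sketch below counts how many cards of a given size reach the 140 GB figure. It assumes the model shards evenly and ignores any per-GPU overhead from parallelism, so treat the result as a lower bound.

```python
import math

REQUIRED_VRAM_GB = 140  # AWQ-INT4 requirement from this page


def gpus_needed(per_gpu_vram_gb: float, required_gb: float = REQUIRED_VRAM_GB) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers required_gb."""
    if per_gpu_vram_gb <= 0:
        raise ValueError("per_gpu_vram_gb must be positive")
    return math.ceil(required_gb / per_gpu_vram_gb)


# Common per-card VRAM sizes, used here purely as examples.
for vram in (24, 48, 80):
    print(f"{vram} GB cards: at least {gpus_needed(vram)}")
```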

Can I use Kimi K1.5 commercially?

Kimi K1.5 is released under the Moonshot License, which restricts commercial use. Review the license terms before using it in a product.

What's the context length of Kimi K1.5?

Kimi K1.5 supports a context window of 200,000 tokens (200K).
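Long thinking blocks eat into that window. As a rough budgeting sketch, the function below subtracts prompt, thinking, and answer tokens from the 200,000-token limit. It assumes thinking tokens count against the same context window, which this page does not confirm, and the token counts in the example are hypothetical.

```python
CONTEXT_TOKENS = 200_000  # context window listed on this page


def remaining_context(prompt_tokens: int, thinking_tokens: int, answer_tokens: int,
                      context: int = CONTEXT_TOKENS) -> int:
    """Tokens left in the window; a negative result means the request will not fit."""
    return context - (prompt_tokens + thinking_tokens + answer_tokens)


# Hypothetical request: a long document, a 5,000-token thinking block, a short answer.
left = remaining_context(prompt_tokens=150_000, thinking_tokens=5_000, answer_tokens=1_000)
print(f"Tokens left: {left}")
```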

Source: huggingface.co/moonshotai/Kimi-K1.5

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.