RUNLOCALAI · v38
Large language models

DoRA (Weight-Decomposed Low-Rank Adaptation)

Also known as: weight-decomposed LoRA

DoRA (Weight-Decomposed Low-Rank Adaptation) is a fine-tuning method that improves on LoRA by decomposing pre-trained weights into magnitude and direction components, then applying low-rank updates only to the direction. This aligns better with full fine-tuning behavior, often yielding higher accuracy at the same rank. Operators reach for DoRA when they need better task performance without increasing adapter size: DoRA matches or exceeds LoRA at rank 8 while using similar VRAM during training, and adds no inference cost once merged.

Deeper dive

Standard LoRA applies a low-rank update ΔW = BA to the weight matrix W, treating magnitude and direction jointly. DoRA first decomposes W into a magnitude vector m (one scale per weight column) and a unit-norm direction V/||V||. The low-rank update is applied to the direction V, while m is trained separately. This separation lets the model adjust the scale of features independently from their orientation, mimicking the behavior of full fine-tuning more closely.

In practice, DoRA adds a small number of extra parameters (the magnitude vector) but keeps the same rank for the directional adapter. Training memory is nearly identical to LoRA because the magnitude vector is tiny. During inference, DoRA can be merged into the base weights just like LoRA, so there is no runtime overhead.

Operators using Hugging Face PEFT can enable DoRA by setting use_dora=True in the LoRA config. Benchmarks on commonsense reasoning and instruction following show DoRA outperforming LoRA at the same rank, with the largest gains at lower ranks (e.g., rank 4 vs. rank 8).
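The decomposition above can be sketched in a few lines of NumPy. This is an illustration of the recomposition step only (shapes and initialization are assumed, not taken from any specific model): the effective weight is the trainable magnitude times the unit-norm direction, where the direction carries the low-rank update.

```python
import numpy as np

# Hypothetical shapes for illustration only.
d_out, d_in, r = 16, 16, 4
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d_out, d_in))  # frozen pre-trained weight
B = np.zeros((d_out, r))                 # low-rank factors; B starts at zero
A = rng.standard_normal((r, d_in))       # (as in LoRA), so the update begins at 0
m = np.linalg.norm(W0, axis=0)           # trainable magnitude, one per column

def dora_weight(W0, B, A, m):
    """Recompose the effective weight: magnitude * unit-norm direction."""
    V = W0 + B @ A                        # low-rank update applied to the direction
    col_norm = np.linalg.norm(V, axis=0)  # column-wise norm
    return m * (V / col_norm)             # rescale each column by its magnitude

W_eff = dora_weight(W0, B, A, m)
# With B = 0, the recomposed weight equals W0, so training starts
# from the pre-trained model exactly.
print(np.allclose(W_eff, W0))  # True
```

Because m is initialized to the column norms of W0 and B starts at zero, the adapter is a no-op at step zero; training then moves m and BA independently.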

Practical example

Fine-tuning Llama 3.1 8B on a reasoning dataset with rank-8 LoRA uses ~16 GB VRAM for training (with gradient checkpointing). Switching to DoRA at the same rank adds only the magnitude vectors (one scale per weight column in each adapted layer), a negligible fraction of the adapter's parameters, and uses the same VRAM. Inference after merging is identical in speed and memory. The operator sees improved accuracy on the target task without any hardware upgrade.
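To make "negligible fraction" concrete, here is a back-of-envelope count for a single square projection layer. The 4096×4096 shape and rank 8 are assumptions for illustration, not measurements from a specific checkpoint.

```python
# Assumed layer shape and adapter rank, for illustration only.
d_out, d_in, r = 4096, 4096, 8

# LoRA adds two factors: A is (r x d_in), B is (d_out x r).
lora_params = r * (d_in + d_out)

# DoRA additionally trains one magnitude scale per weight column.
dora_extra = d_in

print(lora_params)               # 65536
print(dora_extra)                # 4096
print(dora_extra / lora_params)  # 0.0625 -> ~6% overhead for this layer
```

Across the adapted layers of a full model the ratio stays in the same ballpark, which is why training VRAM is effectively unchanged.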

Workflow example

In Hugging Face Transformers with PEFT, an operator enables DoRA by passing use_dora=True to the existing LoraConfig. The training script is otherwise unchanged: same batch size, same optimizer, same VRAM. After training, model.merge_and_unload() merges the adapter into the base weights. The resulting model file is the same size as a LoRA-merged model and runs identically in llama.cpp or Ollama.
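A minimal sketch of that config change, assuming a recent PEFT release (use_dora landed in peft 0.9) and placeholder model and target-module names; this is setup only, not a full training script:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint id; swap in the model you are actually tuning.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = LoraConfig(
    r=8,                                  # same rank as the LoRA baseline
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed target set
    use_dora=True,                        # the only change vs. plain LoRA
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# ... train as usual, then fold the adapter into the base weights:
merged = model.merge_and_unload()
```

Everything downstream of merge_and_unload() (export, quantization, serving) treats the result like any other merged checkpoint.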

Related terms

  • QLoRA
  • LoRA (Low-Rank Adaptation)
  • Fine-tuning
  • Parameter-Efficient Fine-Tuning (PEFT)

Reviewed by Fredoline Eruo. See our editorial policy.
