DoRA (Weight-Decomposed Low-Rank Adaptation)
Also known as: weight-decomposed LoRA
DoRA (Weight-Decomposed Low-Rank Adaptation) is a fine-tuning method that improves upon LoRA by decomposing pre-trained weights into magnitude and direction components, then applying low-rank updates only to the direction. This aligns more closely with full fine-tuning behavior, often yielding higher accuracy for the same rank. Operators reach for DoRA when they need better task performance without increasing adapter size: DoRA matches or exceeds LoRA at the same rank (e.g., rank 8) while using similar VRAM during training and adding negligible inference cost.
Deeper dive
Standard LoRA applies a low-rank update ΔW = BA directly to the weight matrix W, so magnitude and direction change together. DoRA first decomposes W into a magnitude vector m (one scale per output channel) and a unit-norm direction W/||W||. The low-rank update is applied only to the direction: the adapted weight is reconstructed as W' = m · (W + BA) / ||W + BA||, with the norm taken per output channel, while m is trained as its own parameter alongside the adapter. This separation lets the model adjust the scale of features independently from their direction, mimicking the behavior of full fine-tuning more closely.
In practice, DoRA adds a small number of extra parameters (the magnitude vectors) but keeps the same rank for the directional adapter. Training memory is nearly identical to LoRA because the magnitude vectors are tiny relative to the base model. During inference, DoRA can be merged into the base weights just like LoRA, so there is no runtime overhead. Operators using Hugging Face PEFT can enable DoRA by setting use_dora=True in the LoRA config. Benchmarks on commonsense reasoning and instruction following show DoRA outperforming LoRA at the same rank, especially at lower ranks (e.g., rank 4 vs. rank 8).
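A minimal numerical sketch of this decomposition, written as a plain NumPy reimplementation rather than the actual PEFT internals; the names W, A, B, and m and the layer sizes are illustrative assumptions:

```python
# Illustrative sketch of the DoRA weight update described above (not PEFT's code).
import numpy as np

rng = np.random.default_rng(0)
out_features, in_features, rank = 64, 128, 8

W = rng.standard_normal((out_features, in_features))  # frozen pre-trained weight
A = rng.standard_normal((rank, in_features)) * 0.01   # trainable low-rank factor
B = np.zeros((out_features, rank))                     # trainable, zero-initialized (LoRA convention)
m = np.linalg.norm(W, axis=1, keepdims=True)           # trainable magnitude, one value per output channel

def dora_merged_weight(W, A, B, m):
    """Return m * (W + B @ A) / ||W + B @ A||, with the norm taken per output channel."""
    direction = W + B @ A
    channel_norm = np.linalg.norm(direction, axis=1, keepdims=True)
    return m * direction / channel_norm

W_merged = dora_merged_weight(W, A, B, m)
print(W_merged.shape)  # (64, 128) -- same shape as W, so it can replace W at inference
```

With B initialized to zero, the merged weight equals the original W, which is why training starts from the pre-trained model's behavior; the magnitude and direction then drift apart as m and BA are updated.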
Practical example
Fine-tuning Llama 3.1 8B on a reasoning dataset with rank 8 LoRA uses ~16 GB VRAM for training (with gradient checkpointing). Switching to DoRA at the same rank adds only the magnitude vectors (one value per output channel of each adapted layer, a small fraction of the adapter's parameters) and uses essentially the same VRAM. Inference after merging is identical in speed and memory. The operator sees improved accuracy on the target task without any hardware upgrade.
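A back-of-the-envelope sketch of where those extra parameters come from, assuming hypothetical target modules (q_proj and v_proj) and Llama-3.1-8B-style dimensions; the exact count depends on which modules the adapter targets:

```python
# Rough count of DoRA's extra magnitude parameters vs. the LoRA adapter itself.
# All dimensions below are assumptions for illustration.
hidden_size = 4096                                   # Llama 3.1 8B hidden size
num_layers = 32
target_out_dims = {"q_proj": 4096, "v_proj": 1024}   # assumed output dims (v_proj shrunk by grouped-query attention)
rank = 8

# One magnitude value per output channel of each adapted layer.
magnitude_params = num_layers * sum(target_out_dims.values())

# LoRA itself adds r * (in_dim + out_dim) per adapted layer.
lora_params = num_layers * sum(rank * (hidden_size + out_dim) for out_dim in target_out_dims.values())

print(f"extra magnitude params: {magnitude_params:,}")
print(f"LoRA params at r={rank}: {lora_params:,}")
print(f"magnitude overhead: {magnitude_params / lora_params:.1%}")
```

Under these assumptions the magnitude vectors amount to a few percent of the adapter's parameters and a negligible share of the 8B base model, which is why VRAM usage is effectively unchanged.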
Workflow example
In Hugging Face Transformers with PEFT, an operator enables DoRA by adding use_dora=True to the existing LoraConfig, as in the sketch below. The training script remains unchanged: same batch size, same optimizer, same VRAM. After training, model.merge_and_unload() merges the adapter into the base weights. The resulting model file is the same size as a LoRA-merged model and runs identically in llama.cpp or Ollama.
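A minimal sketch of that workflow with the PEFT and Transformers libraries; the checkpoint name, target modules, and hyperparameters are placeholder choices, while use_dora, get_peft_model, and merge_and_unload are the PEFT calls described above:

```python
# Sketch: enabling DoRA in a PEFT fine-tuning script (hyperparameters are examples only).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # example checkpoint

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # same targets you would use for plain LoRA
    use_dora=True,                        # the only change vs. a LoRA run
)
model = get_peft_model(base, config)
model.print_trainable_parameters()

# ... train exactly as with LoRA (same Trainer, optimizer, and batch size) ...

merged = model.merge_and_unload()         # folds the magnitude and low-rank update into the base weights
merged.save_pretrained("llama-3.1-8b-dora-merged")
```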