Neural network architectures
Decoder-Only Transformer
Decoder-only is the architecture of GPT, Llama, Qwen, Mistral, DeepSeek, and almost every modern open-weight LLM. The model is a stack of transformer decoder blocks (causal self-attention only, no cross-attention to a separate encoder), trained autoregressively on next-token prediction.
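To make the block structure concrete, here is a minimal sketch of one decoder block in PyTorch. This is an illustration under assumptions, not any particular model's exact design: the class name DecoderBlock is hypothetical, and real models vary in normalization placement, activation, and attention implementation.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One pre-norm decoder block: causal self-attention followed by an MLP."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True marks positions a token must NOT attend to,
        # i.e. everything strictly after its own position.
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out               # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around MLP
        return x

# A full decoder-only model is just a token embedding, a stack of these
# blocks, and a final projection from d_model back to vocabulary logits.
```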
The "encoder" doesn't exist as a separate component — input prompt tokens and generated tokens go through the same blocks. The causal mask prevents tokens from attending to future positions.
Decoder-only won the architecture war for generative LLMs (over encoder-decoder designs like T5 and encoder-only designs like BERT) because it scales cleanly with parameters and data, and a single trained model can handle chat, code, reasoning, and summarization through prompting alone.