AI glossary
481 terms across 19 categories. 26 have full definitions today; the rest are cataloged and being written.
We focus depth on terms most relevant to running AI locally. Cloud-only and academic terms are listed for completeness but get less attention.
Core concepts & fields · 18 terms · 0 defined
Large language models · 46 terms · 15 defined
A Large Language Model is a neural network with billions of parameters trained on massive text corpora to predict the next token in a sequence.
Quantization is the process of reducing a model's numeric precision to shrink its memory footprint with minimal quality loss.
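As a toy illustration of the idea, here is symmetric int8 quantization of a small weight vector with a single per-tensor scale. This is a deliberate simplification; real formats like those in llama.cpp use block-wise scales and more elaborate schemes.

```python
def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] using one per-tensor scale (toy scheme)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 1.27, -1.27, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Every restored value is within one quantization step of the original,
# but the storage cost per weight drops from 32 bits to 8.
```

The memory saving comes from storing the small integers plus one scale instead of full-precision floats; the quality cost is the rounding error visible in `restored`.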
Inference is the act of running a trained model to generate predictions, as opposed to training, which produces the model.
RAG is the pattern of retrieving relevant documents from a knowledge base and including them in the LLM's prompt so the model can ground its answer in those documents rather than in memory alone.
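A minimal sketch of the retrieve-then-prompt flow. The retriever here scores documents by naive keyword overlap purely for illustration; a real system would use embedding similarity, and the documents and query are made-up examples.

```python
def retrieve(query, docs, k=2):
    """Return the k docs with the most query words in common (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Stuff the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

docs = [
    "GGUF is the file format used by llama.cpp.",
    "VRAM capacity determines which models fit on a GPU.",
    "Tokyo is the capital of Japan.",
]
prompt = build_prompt("What file format does llama.cpp use?", docs)
# prompt now contains the GGUF document but not the irrelevant Tokyo fact
```

The prompt, not the model's weights, carries the facts, which is why RAG lets a small local model answer questions about private or recent data.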
Hallucination is when an LLM generates plausible-sounding but factually incorrect information — citing papers that don't exist, inventing statistics, and the like.
Prompt engineering is the practice of crafting model inputs to elicit better outputs without changing the model itself.
LoRA is a parameter-efficient fine-tuning technique that adapts a large pre-trained model by training small low-rank matrices instead of updating all of its weights.
Fine-tuning is continued training of a pre-trained model on a smaller, task-specific dataset. Pre-training builds general capability; fine-tuning specializes it for a particular task or style.
An embedding is a fixed-length vector representation of text, image, or other input — typically 384-3072 dimensions — which places semantically similar inputs near each other in vector space.
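"Near each other" is usually measured with cosine similarity. The tiny three-dimensional vectors below are hand-made stand-ins for real model output, chosen only to show the comparison.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat    = [0.90, 0.10, 0.00]   # hypothetical embedding of "cat"
kitten = [0.85, 0.20, 0.05]   # hypothetical embedding of "kitten"
car    = [0.00, 0.10, 0.95]   # hypothetical embedding of "car"

# "cat" points almost the same way as "kitten", nearly orthogonal to "car"
sim_related   = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, car)
```

Vector databases used in RAG pipelines are essentially this comparison run efficiently over millions of stored embeddings.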
Chain-of-thought prompting is asking a model to show its reasoning step-by-step before giving the final answer. It dramatically improves accuracy on multi-step reasoning tasks.
Latency measures how fast you get a response. Two metrics matter for local LLMs: Time to First Token (TTFT), the wall-clock time from sending the prompt until the first token arrives, and generation speed in tokens per second after that.
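Both numbers fall out of three timestamps. The values below are made-up illustrative measurements, in seconds.

```python
# Hypothetical timestamps for one request (seconds since the request was sent)
request_sent     = 0.00
first_token_at   = 0.35
last_token_at    = 4.35
tokens_generated = 120

# Time to First Token: dominated by prompt processing
ttft = first_token_at - request_sent

# Generation speed: tokens 2..N arrive between the first and last token,
# so divide (N - 1) tokens by that interval
gen_tps = (tokens_generated - 1) / (last_token_at - first_token_at)
```

A long prompt mostly hurts TTFT (the whole prompt must be processed before anything streams back), while `gen_tps` is governed by how fast the hardware can run one decoding step.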
GGUF (GGML Unified Format) is the file format used by llama.cpp and its ecosystem (Ollama, KoboldCPP, LM Studio). A single file bundles the model weights, tokenizer, and metadata.
Throughput measures how much work a system completes per unit time — typically tokens per second across all concurrent requests.
QLoRA combines LoRA fine-tuning with 4-bit quantization of the base model. Introduced by Tim Dettmers and collaborators in 2023.
Speculative decoding speeds up LLM inference by using a small fast "draft" model to propose the next several tokens, then verifying them with the large target model in a single forward pass.
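The draft-and-verify loop can be sketched with two hypothetical deterministic toy "models" (lookup tables standing in for greedy decoding): the draft proposes a run of tokens, the target keeps the agreeing prefix and substitutes its own token at the first disagreement.

```python
# Toy deterministic "models": next-token lookup tables (assumed, not real models)
DRAFT  = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}
TARGET = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def speculative_step(last_token, k=4):
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed, tok = [], last_token
    for _ in range(k):
        tok = DRAFT[tok]
        proposed.append(tok)

    # 2. The target model checks each proposal; all checks could run in one
    #    batched forward pass. Keep the agreeing prefix, then emit the
    #    target's own token at the first mismatch.
    accepted, tok = [], last_token
    for p in proposed:
        t = TARGET[tok]
        accepted.append(t)
        if t != p:
            break
        tok = t
    return accepted

tokens = speculative_step("the")  # accepts "cat sat on", corrects "a" -> "the"
```

The speedup comes from step 2: the expensive model validates several tokens per forward pass instead of producing one, and the output is identical to what the target model would have generated alone.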
Transformer & LLM components · 28 terms · 5 defined
The KV cache stores the key and value tensors from previous attention computations so the model doesn't recompute them at every decoding step.
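Its size is easy to estimate: 2 tensors (K and V) × layers × KV heads × head dimension × context length × bytes per element. The configuration below roughly matches a Llama-3-8B-class model with grouped-query attention at fp16; treat the numbers as illustrative assumptions.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV-cache memory: K and V tensors for every layer and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed model shape: 32 layers, 8 KV heads (GQA), head_dim 128, fp16
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
size_gib = size / 2**30   # 1.0 GiB at an 8K context for this shape
```

Because the cost scales linearly with `seq_len`, long contexts can consume as much VRAM as the weights themselves — one reason context length matters so much for local setups.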
The context window is the maximum number of tokens a model can attend to at once — both the prompt and previously generated tokens count against this limit.
A token is the smallest unit of text a language model processes. Most modern models use subword tokenization, where common words map to single tokens and rarer words split into several pieces.
Tokenization is the process of converting text into the numeric tokens a model can process. Modern systems use subword tokenizers such as byte-pair encoding (BPE).
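A toy greedy longest-match tokenizer over a hand-made vocabulary shows the subword idea. Real tokenizers learn their vocabulary (e.g. BPE merges) from data; the five-entry vocabulary here is an arbitrary assumption for illustration.

```python
# Hypothetical tiny subword vocabulary
VOCAB = {"quant", "ization", "token", "ize", "r"}

def tokenize(word, vocab):
    """Greedy longest-match segmentation of a word into subword tokens."""
    tokens, i = [], 0
    while i < len(word):
        # try the longest possible substring starting at position i
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character falls back to itself
            i += 1
    return tokens
```

With this vocabulary, `tokenize("quantization", VOCAB)` splits into two pieces and `tokenize("tokenizer", VOCAB)` into three — rare words cost more tokens, which is why token counts differ from word counts.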
Flash Attention is a memory-efficient implementation of the attention mechanism that reduces memory usage from O(n²) to O(n) in sequence length.
Natural language processing · 28 terms · 0 defined
Notable models & companies · 18 terms · 0 defined
Generative AI · 23 terms · 0 defined
Neural network architectures · 23 terms · 1 defined
Mixture of Experts is a neural network architecture where multiple specialized sub-networks ("experts") exist, but only a few are activated for any given token.
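The routing step can be sketched in a few lines: a softmax over router logits ranks the experts, the top-k are evaluated, and their outputs are combined weighted by renormalized router probabilities. The experts here are hypothetical scalar functions standing in for feed-forward sub-networks.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the top-k experts and mix them by renormalized weight."""
    probs = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    # only the k selected experts run; the rest cost nothing this token
    return sum(probs[i] / weight_sum * experts[i](x) for i in top)

# Four toy "experts" (placeholder functions, not real sub-networks)
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
out = moe_forward(3.0, experts, router_logits=[2.0, 1.0, -1.0, 0.0], k=2)
```

This is why an MoE model can have far more total parameters than it uses per token: all experts must sit in memory, but only the selected few contribute compute.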
Hardware & infrastructure · 35 terms · 2 defined
VRAM is the dedicated memory on a GPU. For local AI, VRAM capacity is the single most important spec — it determines which models you can run and at what quantization.
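A rough "does it fit?" estimate: weight memory is roughly parameters × bits per weight ÷ 8, with extra headroom needed for the KV cache and activations. The 7B model below and the card sizes in the comments are illustrative assumptions.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate weight memory in GiB (weights only, no KV cache)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

fp16_7b = weight_gib(7, 16)   # ~13 GiB: tight even on a 16 GiB card
q4_7b   = weight_gib(7, 4)    # ~3.3 GiB: comfortable on an 8 GiB card
```

The comparison shows why quantization is the central lever for local AI: dropping from 16-bit to 4-bit weights cuts the footprint by 4x and moves a model from "won't fit" to "fits with room for context".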
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel-computing platform and the dominant API for GPU-accelerated machine learning.
Frameworks & tools · 21 terms · 0 defined
Computer vision · 24 terms · 0 defined
Agents & agentic AI · 17 terms · 3 defined
An AI agent is software that uses an LLM to decide what to do, takes actions, observes results, and iterates toward a goal.
Function calling (also called tool use) is a capability where the model emits structured JSON requesting that specific tools be executed on its behalf.
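The host side of the loop is simple: parse the model's JSON, dispatch to the named function, and feed the result back. The model output below is a hypothetical fixed string, and `get_weather` is a made-up tool.

```python
import json

# Hypothetical tool registry: tool name -> Python callable
TOOLS = {
    "get_weather": lambda city: f"18°C and clear in {city}",
}

# A hypothetical model reply requesting a tool call
model_output = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
# result would be appended to the conversation for the model's next turn
```

The model never executes anything itself; it only names a tool and its arguments, which is what makes the pattern auditable and sandboxable.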
MCP is an open protocol introduced by Anthropic in late 2024 for connecting AI agents to tools and data sources in a standardized way.
Learning paradigms · 23 terms · 0 defined
Ethics, safety & society · 23 terms · 0 defined
Training & optimization · 34 terms · 0 defined
Specialized domains · 21 terms · 0 defined
Data & datasets · 34 terms · 0 defined
Classical ML algorithms · 27 terms · 0 defined
Evaluation metrics · 22 terms · 0 defined
MLOps & deployment · 16 terms · 0 defined
Missing a term?
The glossary grows when we find gaps.
If you searched for an AI term and we don't have a definition, email hello@runlocalai.co with the term. We prioritize terms that are practical for running AI locally over purely academic ones, but we'll consider any reasonable suggestion.