AI glossary
481 terms across 19 categories. 26 have full definitions today; the rest are cataloged and being written.
We focus depth on terms most relevant to running AI locally. Cloud-only and academic terms are listed for completeness but get less attention.
Core concepts & fields · 18 terms · 0 defined
Large language models · 46 terms · 15 defined
A Large Language Model is a neural network with billions of parameters trained on massive text corpora to predict the next token in a sequence.
Quantization is the process of reducing a model's numeric precision to shrink its memory footprint with minimal quality loss.
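As a toy illustration of the idea, here is symmetric int8 quantization of a small weight vector with a single per-tensor scale. This is a deliberate simplification; real formats like those in llama.cpp use block-wise scales and more elaborate schemes.

```python
def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] using one per-tensor scale (toy scheme)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 1.27, -1.27, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Every restored value is within one quantization step of the original,
# but the storage cost per weight drops from 32 bits to 8.
```

The memory saving comes from storing the small integers plus one scale instead of full-precision floats; the quality cost is the rounding error visible in `restored`.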
Inference is the act of running a trained model to generate predictions, as opposed to training, which produces the model.
RAG is the pattern of retrieving relevant documents from a knowledge base and including them in the LLM's prompt so the model can ground its answer in those documents rather than in memory alone.
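A minimal sketch of the retrieve-then-prompt flow. The retriever here scores documents by naive keyword overlap purely for illustration; a real system would use embedding similarity, and the documents and query are made-up examples.

```python
def retrieve(query, docs, k=2):
    """Return the k docs with the most query words in common (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Stuff the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

docs = [
    "GGUF is the file format used by llama.cpp.",
    "VRAM capacity determines which models fit on a GPU.",
    "Tokyo is the capital of Japan.",
]
prompt = build_prompt("What file format does llama.cpp use?", docs)
# prompt now contains the GGUF document but not the irrelevant Tokyo fact
```

The prompt, not the model's weights, carries the facts, which is why RAG lets a small local model answer questions about private or recent data.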
Hallucination is when an LLM generates plausible-sounding but factually incorrect information — citing papers that don't exist, inventing statistics, and the like.
Prompt engineering is the practice of crafting model inputs to elicit better outputs without changing the model itself.
LoRA is a parameter-efficient fine-tuning technique that adapts a large pre-trained model by training small low-rank matrices instead of updating all of its weights.
Fine-tuning is continued training of a pre-trained model on a smaller, task-specific dataset. Pre-training builds general capability; fine-tuning specializes it for a particular task or style.
An embedding is a fixed-length vector representation of text, image, or other input — typically 384-3072 dimensions — which places semantically similar inputs near each other in vector space.
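"Near each other" is usually measured with cosine similarity. The tiny three-dimensional vectors below are hand-made stand-ins for real model output, chosen only to show the comparison.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat    = [0.90, 0.10, 0.00]   # hypothetical embedding of "cat"
kitten = [0.85, 0.20, 0.05]   # hypothetical embedding of "kitten"
car    = [0.00, 0.10, 0.95]   # hypothetical embedding of "car"

# "cat" points almost the same way as "kitten", nearly orthogonal to "car"
sim_related   = cosine_similarity(cat, kitten)
sim_unrelated = cosine_similarity(cat, car)
```

Vector databases used in RAG pipelines are essentially this comparison run efficiently over millions of stored embeddings.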
Chain-of-thought prompting is asking a model to show its reasoning step-by-step before giving the final answer. It dramatically improves accuracy on multi-step reasoning tasks.
Latency measures how fast you get a response. Two metrics matter for local LLMs: Time to First Token (TTFT), the wall-clock time from sending the prompt until the first token arrives, and generation speed in tokens per second after that.
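Both numbers fall out of three timestamps. The values below are made-up illustrative measurements, in seconds.

```python
# Hypothetical timestamps for one request (seconds since the request was sent)
request_sent     = 0.00
first_token_at   = 0.35
last_token_at    = 4.35
tokens_generated = 120

# Time to First Token: dominated by prompt processing
ttft = first_token_at - request_sent

# Generation speed: tokens 2..N arrive between the first and last token,
# so divide (N - 1) tokens by that interval
gen_tps = (tokens_generated - 1) / (last_token_at - first_token_at)
```

A long prompt mostly hurts TTFT (the whole prompt must be processed before anything streams back), while `gen_tps` is governed by how fast the hardware can run one decoding step.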
GGUF (GGML Unified Format) is the file format used by llama.cpp and its ecosystem (Ollama, KoboldCPP, LM Studio). A single file bundles the model weights, tokenizer, and metadata.
Throughput measures how much work a system completes per unit time — typically tokens per second across all concurrent requests.
QLoRA combines LoRA fine-tuning with 4-bit quantization of the base model. Introduced by Tim Dettmers and collaborators in 2023.
Speculative decoding speeds up LLM inference by using a small fast "draft" model to propose the next several tokens, then verifying them with the large target model in a single forward pass.
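The draft-and-verify loop can be sketched with two hypothetical deterministic toy "models" (lookup tables standing in for greedy decoding): the draft proposes a run of tokens, the target keeps the agreeing prefix and substitutes its own token at the first disagreement.

```python
# Toy deterministic "models": next-token lookup tables (assumed, not real models)
DRAFT  = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}
TARGET = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def speculative_step(last_token, k=4):
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed, tok = [], last_token
    for _ in range(k):
        tok = DRAFT[tok]
        proposed.append(tok)

    # 2. The target model checks each proposal; all checks could run in one
    #    batched forward pass. Keep the agreeing prefix, then emit the
    #    target's own token at the first mismatch.
    accepted, tok = [], last_token
    for p in proposed:
        t = TARGET[tok]
        accepted.append(t)
        if t != p:
            break
        tok = t
    return accepted

tokens = speculative_step("the")  # accepts "cat sat on", corrects "a" -> "the"
```

The speedup comes from step 2: the expensive model validates several tokens per forward pass instead of producing one, and the output is identical to what the target model would have generated alone.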
Transformer & LLM components · 28 terms · 5 defined
The KV cache stores the key and value tensors from previous attention computations so the model doesn't recompute them at every decoding step.
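Its size is easy to estimate: 2 tensors (K and V) × layers × KV heads × head dimension × context length × bytes per element. The configuration below roughly matches a Llama-3-8B-class model with grouped-query attention at fp16; treat the numbers as illustrative assumptions.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Estimate KV-cache memory: K and V tensors for every layer and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed model shape: 32 layers, 8 KV heads (GQA), head_dim 128, fp16
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
size_gib = size / 2**30   # 1.0 GiB at an 8K context for this shape
```

Because the cost scales linearly with `seq_len`, long contexts can consume as much VRAM as the weights themselves — one reason context length matters so much for local setups.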
The context window is the maximum number of tokens a model can attend to at once — both the prompt and previously generated tokens count against this limit.
A token is the smallest unit of text a language model processes. Most modern models use subword tokenization, where common words map to single tokens and rarer words split into several pieces.
Tokenization is the process of converting text into the numeric tokens a model can process. Modern systems use subword tokenizers such as byte-pair encoding (BPE).
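A toy greedy longest-match tokenizer over a hand-made vocabulary shows the subword idea. Real tokenizers learn their vocabulary (e.g. BPE merges) from data; the five-entry vocabulary here is an arbitrary assumption for illustration.

```python
# Hypothetical tiny subword vocabulary
VOCAB = {"quant", "ization", "token", "ize", "r"}

def tokenize(word, vocab):
    """Greedy longest-match segmentation of a word into subword tokens."""
    tokens, i = [], 0
    while i < len(word):
        # try the longest possible substring starting at position i
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character falls back to itself
            i += 1
    return tokens
```

With this vocabulary, `tokenize("quantization", VOCAB)` splits into two pieces and `tokenize("tokenizer", VOCAB)` into three — rare words cost more tokens, which is why token counts differ from word counts.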
Flash Attention is a memory-efficient implementation of the attention mechanism that reduces memory usage from O(n²) to O(n) in sequence length.
Natural language processing · 28 terms · 0 defined
Notable models & companies · 18 terms · 0 defined
Generative AI · 23 terms · 0 defined
Neural network architectures · 23 terms · 1 defined
Mixture of Experts is a neural network architecture where multiple specialized sub-networks ("experts") exist, but only a few are activated for any given token.
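The routing step can be sketched in a few lines: a softmax over router logits ranks the experts, the top-k are evaluated, and their outputs are combined weighted by renormalized router probabilities. The experts here are hypothetical scalar functions standing in for feed-forward sub-networks.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, router_logits, k=2):
    """Evaluate only the top-k experts and mix them by renormalized weight."""
    probs = softmax(router_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    # only the k selected experts run; the rest cost nothing this token
    return sum(probs[i] / weight_sum * experts[i](x) for i in top)

# Four toy "experts" (placeholder functions, not real sub-networks)
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
out = moe_forward(3.0, experts, router_logits=[2.0, 1.0, -1.0, 0.0], k=2)
```

This is why an MoE model can have far more total parameters than it uses per token: all experts must sit in memory, but only the selected few contribute compute.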
Hardware & infrastructure · 35 terms · 2 defined
VRAM is the dedicated memory on a GPU. For local AI, VRAM capacity is the single most important spec — it determines which models you can run and at what quantization.
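A rough "does it fit?" estimate: weight memory is roughly parameters × bits per weight ÷ 8, with extra headroom needed for the KV cache and activations. The 7B model below and the card sizes in the comments are illustrative assumptions.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate weight memory in GiB (weights only, no KV cache)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

fp16_7b = weight_gib(7, 16)   # ~13 GiB: tight even on a 16 GiB card
q4_7b   = weight_gib(7, 4)    # ~3.3 GiB: comfortable on an 8 GiB card
```

The comparison shows why quantization is the central lever for local AI: dropping from 16-bit to 4-bit weights cuts the footprint by 4x and moves a model from "won't fit" to "fits with room for context".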
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel-computing platform and the dominant API for GPU-accelerated machine learning.
Frameworks & tools · 21 terms · 0 defined
Computer vision · 24 terms · 0 defined
Agents & agentic AI · 17 terms · 3 defined
An AI agent is software that uses an LLM to decide what to do, takes actions, observes results, and iterates toward a goal.
Function calling (also called tool use) is a capability where the model emits structured JSON requesting that specific tools be executed on its behalf.
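The host side of the loop is simple: parse the model's JSON, dispatch to the named function, and feed the result back. The model output below is a hypothetical fixed string, and `get_weather` is a made-up tool.

```python
import json

# Hypothetical tool registry: tool name -> Python callable
TOOLS = {
    "get_weather": lambda city: f"18°C and clear in {city}",
}

# A hypothetical model reply requesting a tool call
model_output = '{"tool": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
# result would be appended to the conversation for the model's next turn
```

The model never executes anything itself; it only names a tool and its arguments, which is what makes the pattern auditable and sandboxable.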
MCP is an open protocol introduced by Anthropic in late 2024 for connecting AI agents to tools and data sources in a standardized way.
Learning paradigms · 23 terms · 0 defined
Ethics, safety & society · 23 terms · 0 defined
Training & optimization · 34 terms · 0 defined
Specialized domains · 21 terms · 0 defined
Data & datasets · 34 terms · 0 defined
Classical ML algorithms · 27 terms · 0 defined
Evaluation metrics · 22 terms · 0 defined
MLOps & deployment · 16 terms · 0 defined
Missing a term?
The glossary grows when we find gaps.
If you searched for an AI term and we don't have a definition, email hello@runlocalai.co with the term. We prioritize terms that are practical for running AI locally over purely academic ones, but we'll consider any reasonable suggestion.