Sparse Retrieval
Sparse retrieval scores documents by lexical overlap with the query, representing each text as a high-dimensional vector in which most entries are zero. BM25 is the canonical algorithm; TF-IDF, Lucene-style scoring, and SPLADE (a learned sparse model) are variants.
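BM25's scoring rule is compact enough to show directly. The sketch below is a minimal, self-contained implementation of Okapi BM25 over pre-tokenized documents (the corpus, tokenization, and parameter values are illustrative, not from any particular library):

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    k1 controls term-frequency saturation; b controls length normalization.
    """
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue  # no lexical overlap for this term
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus: only documents sharing query tokens get a nonzero score.
docs = [["error", "code", "e404", "not", "found"],
        ["dense", "embeddings", "for", "retrieval"],
        ["lookup", "table", "for", "error", "code"]]
print(bm25_scores(["error", "code"], docs))
```

Note how the score is explainable: each document's total decomposes into per-term contributions, which is the interpretability advantage sparse methods have over dense embeddings.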
Sparse retrieval excels at exact-match queries (product codes, function names, rare terminology) where dense embeddings often miss. It's also faster to update incrementally and easier to interpret — every match has an explainable token overlap.
For RAG, sparse-only retrieval underperforms hybrid retrieval on most benchmarks but stays competitive on technical and code corpora, where the vocabulary is distinctive.
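One common way to build the hybrid retrieval mentioned above is reciprocal rank fusion (RRF), which merges the ranked lists from a sparse and a dense retriever without needing their scores to be comparable. A minimal sketch, with hypothetical document IDs and rankings:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from multiple retrievers.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in; k=60 is the value commonly used in practice.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

sparse_hits = ["d3", "d1", "d2"]   # e.g. from BM25
dense_hits  = ["d2", "d3", "d4"]   # e.g. from an embedding index
print(rrf_fuse([sparse_hits, dense_hits]))
```

Because RRF uses only ranks, it sidesteps the score-calibration mismatch between BM25 values and embedding similarities, which is one reason it is a popular fusion baseline.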
Reviewed by Fredoline Eruo.