Tool calling

Tool calling (also called function calling) is a model's ability to respond with a structured, JSON-shaped tool invocation instead of free-form text when a request calls for an action. The model is given a list of available tools, each described by a JSON schema, decides which one to call, and emits something like {"name": "search_web", "args": {"query": "..."}}. The runtime parses that call, executes the tool, and feeds the result back to the model as a new message, typically under a dedicated tool role (some APIs wrap it in a user-role message instead).
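A minimal sketch of the runtime side of that loop, assuming the model's output has already been extracted as a JSON string in the {"name": ..., "args": ...} shape shown above. The search_web stub and the TOOLS registry are hypothetical stand-ins; real runtimes also track a call ID and use their own message format.

```python
import json
from typing import Any, Callable

# Hypothetical tool implementation; a real one would call a search API.
def search_web(query: str) -> str:
    return f"stub results for {query!r}"

# Registry mapping tool names (as advertised to the model) to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"search_web": search_web}

def execute_tool_call(raw: str) -> dict[str, str]:
    """Parse one model-emitted tool call, run it, and wrap the result as a message."""
    call = json.loads(raw)                  # raises json.JSONDecodeError on malformed output
    name, args = call["name"], call.get("args", {})
    if name not in TOOLS:
        result = f"error: unknown tool {name!r}"
    else:
        result = str(TOOLS[name](**args))   # keyword args come straight from the JSON object
    # The result goes back into the conversation; the role name varies by API.
    return {"role": "tool", "name": name, "content": result}

msg = execute_tool_call('{"name": "search_web", "args": {"query": "vLLM tool parser"}}')
print(msg["content"])  # stub results for 'vLLM tool parser'
```

Returning an error string for an unknown tool, rather than raising, lets the model see its mistake on the next turn and retry.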

What tool calling enables: agents (multi-step loops that interleave reasoning and action), structured extraction (forcing the model to emit JSON that conforms to a schema), and MCP clients (the Model Context Protocol lets servers expose tools through a standard interface that clients pass to the model). Modern open-weight models with strong tool calling include Qwen 2.5 Coder, Llama 3.3, DeepSeek V4, and Mistral Small 3; all are trained on tool-use corpora and emit tool calls reliably.
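For structured extraction, the schema itself is the interface: the runtime advertises one tool whose parameters are the target JSON Schema, then checks the arguments the model returns. The sketch below assumes the OpenAI-compatible {"type": "function", "function": {...}} tool format that servers such as vLLM accept; record_invoice and its fields are hypothetical, and check_args only verifies required keys and primitive types, not full JSON Schema (a dedicated validator library would cover that).

```python
import json

# Hypothetical extraction tool: its parameters are the JSON Schema we want filled.
EXTRACT_TOOL = {
    "type": "function",
    "function": {
        "name": "record_invoice",
        "description": "Record the fields of an invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "vendor": {"type": "string"},
                "total": {"type": "number"},
                "paid": {"type": "boolean"},
            },
            "required": ["vendor", "total"],
        },
    },
}

# JSON Schema primitive type names mapped to Python types (primitives only).
_PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}

def check_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = [f"missing {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected {key}")
            continue
        expected = _PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if not isinstance(value, expected) or (spec["type"] == "number" and isinstance(value, bool)):
            problems.append(f"{key} should be {spec['type']}")
    return problems

params = EXTRACT_TOOL["function"]["parameters"]
print(check_args(json.loads('{"vendor": "Acme", "total": 41.5}'), params))  # []
print(check_args(json.loads('{"total": "41.5"}'), params))
# ['missing vendor', 'total should be number']
```

Feeding the problem list back to the model as the tool result is a common way to get a corrected second attempt.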

Operator caveats that matter:
(1) Tokenizer alignment: some quantization formats subtly damage tool-call output structure, for example when the chat template or special tokens do not survive conversion. Verify that your AWQ or GGUF quant produces clean JSON before committing to it.
(2) Temperature: keep it at or below 0.4 for tool-calling agents. Above about 0.6, malformed JSON (parse errors) and invented tool names become common.
(3) Runtime parser: vLLM's tool-call parsing was buggy until the 0.6.x releases; SGLang's parser arrived later but shipped cleanly.
(4) Schema complexity: tool definitions are placed in the prompt, so large JSON schemas consume context and KV cache on every request. Keep tool definitions terse.
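Caveats (1) and (2) are easiest to act on with a quick smoke test before deployment. A minimal sketch, assuming you have already collected raw tool-call strings from the model under test in the {"name", "args"} shape above; KNOWN_TOOLS and the sample strings are hypothetical, and a fuller harness would also validate arguments against each tool's schema.

```python
import json
from collections import Counter

KNOWN_TOOLS = {"search_web", "read_file"}  # hypothetical tool names advertised to the model

def classify(raw: str) -> str:
    """Bucket one raw model output: ok, bad_json, bad_shape, or unknown_tool."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "bad_json"
    if not isinstance(call, dict) or not isinstance(call.get("args"), dict):
        return "bad_shape"
    if call.get("name") not in KNOWN_TOOLS:
        return "unknown_tool"
    return "ok"

def smoke_test(outputs: list[str]) -> Counter:
    """Tally failure classes over outputs sampled from one quant/temperature setting."""
    return Counter(classify(o) for o in outputs)

# Stand-in samples; in practice, collect these from the quantized model under test.
samples = [
    '{"name": "search_web", "args": {"query": "kv cache"}}',
    '{"name": "search_web", "args": {"query": "kv cache"',   # truncated JSON
    '{"name": "browse_internet", "args": {}}',               # hallucinated tool name
]
print(smoke_test(samples))
```

Running the same prompt set against the unquantized model and at a couple of temperatures gives a baseline, so a rise in bad_json or unknown_tool can be attributed to the quant or to the sampling setting.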

See also