Tool use is how an LLM goes from “text in, text out” to “text in, call an API, get data, then text out.” The model is given a list of tools (name, description, parameters) and can request a call; your code runs the call and returns the result to the model. That’s the basis of search assistants, code runners, and custom integrations. Function calling is the same idea with a more formal name: the model outputs a structured call (function name + arguments) instead of free text, which makes parsing and execution reliable.
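A tool definition is usually just a name, a description, and a parameter schema. As a sketch (the exact field layout varies by provider; `get_weather` here is a hypothetical tool), the common shape looks like this:

```python
# Minimal tool definition in the JSON Schema style most APIs accept.
# Field names and nesting vary by provider; this is the common shape.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'London'",
            }
        },
        "required": ["city"],
    },
}
```

The description and parameter docs are what the model actually reads when deciding whether and how to call the tool, so they do double duty as documentation and prompt.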
Most APIs (OpenAI, Anthropic, Google) support a tools or functions parameter: you pass a schema (e.g. JSON Schema for the arguments), and the model responds with something like tool_calls: [{ "name": "get_weather", "arguments": { "city": "London" } }]. You execute that, then send the result back in a follow-up message. The model can then call another tool or give a final answer. The loop is: user message → model (maybe with tool_calls) → you run tools → you send tool results → model again.
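That loop can be sketched end to end. Here `fake_model` is a stand-in for the real API call (so the example runs offline), and `get_weather` is a stubbed hypothetical tool; the loop structure is the part that carries over to any provider:

```python
import json

def fake_model(messages, tools):
    """Stand-in for an LLM API call: requests a tool on the first
    user turn, then answers once a tool result is in the history."""
    if messages[-1]["role"] == "user":
        return {"tool_calls": [{"name": "get_weather",
                                "arguments": json.dumps({"city": "London"})}]}
    return {"content": "It's 14°C and cloudy in London."}

def get_weather(city):
    # Stubbed data; a real tool would hit a weather API here.
    return {"city": city, "temp_c": 14, "conditions": "cloudy"}

TOOLS = {"get_weather": get_weather}

def run_turn(user_text):
    messages = [{"role": "user", "content": user_text}]
    while True:
        reply = fake_model(messages, tools=list(TOOLS))
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]  # no tool call: final answer
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](**args)
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
```

Note the loop keeps going until the model replies without tool_calls, which is what lets multi-step tool use fall out of the same code.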
Good tool design matters: clear names, concise descriptions, and well-scoped parameters. With too many tools or vague descriptions the model gets confused; with too few it can't do the job. You also need to handle errors (tool failed, timeout) and sometimes rate limits or auth. Passing back "Tool failed: …" as the observation lets the model retry or explain the failure to the user.
From simple APIs (one or two tools) to full autonomy (dozens of tools, multi-step plans), the same pattern holds: the model decides when and how to call, your code enforces safety and runs the call. That separation keeps the LLM in the “reasoning” role and keeps dangerous or privileged actions under your control.
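That separation can be made concrete with a policy gate between the model's proposed call and its execution. A hypothetical sketch (the tool names and approval mechanism are placeholders):

```python
# The model proposes calls; your code decides which ones actually run.
SAFE_TOOLS = {"get_weather", "search_docs"}          # run without asking
PRIVILEGED_TOOLS = {"send_email", "delete_record"}   # need explicit approval

def authorize(tool_name, approved=False):
    """Return True only if this call is allowed to execute."""
    if tool_name in SAFE_TOOLS:
        return True
    if tool_name in PRIVILEGED_TOOLS:
        return approved  # e.g. a human clicked "confirm"
    return False  # unknown tool names are rejected outright
```

Rejecting unknown names by default matters: the model can hallucinate tool names, and the allowlist is what keeps that harmless.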
Expect more standardization (e.g. OpenAPI-based tool discovery) and better models that follow tool schemas more reliably.
nJoy 😉
