Your agent can't do anything without tools.
The first agent you built in Module 001 had one tool: Read. It could look at a file. Without it, the agent was a conversation. With it, the agent could actually get work done.
Every useful agent you'll ever build runs on tools. The LLM decides what to do. Tools let it actually do things. A chatbot has zero tools. An agent has at least one.
By the end of this module, you'll have an agent that fetches a web page, reads it, and summarizes it. Same daily-briefing pattern from Module 001, but now it can work off a URL instead of a local file.
What's new:
- You'll see how Claude Code decides which tool to use at each step.
- You'll add the WebFetch tool to your existing agent.
- You'll test two scenarios, one where the tool works and one where it fails, and see how the agent handles both.
- You'll learn the shape of a tool description, which is where most tool problems actually live.
Quick refresher: agents are three parts. System prompt (the contract). Tools (what it can do). Memory (what it remembers). Module 002 drilled on the system prompt. Module 003 is about the tools layer.
One note. This module stays inside Claude Code's built-in tools (Read, WebFetch, Bash, and a few others). Building your own custom tools from scratch lives in Module 014 (Your first MCP server). For now, you're picking the right tools and writing the prompt so the LLM uses them well.
Thinker takes it from here.
A tool is a function your agent can call.
That's the whole definition. Pedantically: a tool has a name, a description, an input schema, an output shape, and code that actually runs when the LLM decides to call it.
But here's the part most people miss on their first few agents: the tool description is part of the prompt.
Let me say that again. When the LLM decides whether to call a tool, it's reading the tool's description in the same context window as your system prompt. The description isn't documentation, it's routing logic. A vague tool description is a tool that never gets called. A sharp description is a tool the LLM reaches for at exactly the right moment.
The anatomy of a tool
Every tool has four parts. Each one is a signal to the LLM.
Name: What the tool is called. Short, active, specific. fetch_url, not fetch.
Description: A sentence or two. What the tool does, what it takes, when to use it. The prompt-for-the-tool.
Input schema: Typed spec of what the tool accepts. Required fields, types, examples.
Implementation: The code that runs when the tool is called. Returns a string, a JSON object, or an error.
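The four parts can be sketched as a small data structure. This is a minimal illustration, not how Claude Code stores its tools: the Tool class, the count_words tool, and the schema layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str                      # short, active, specific
    description: str               # what it does and when to use it
    input_schema: dict[str, Any]   # typed spec of accepted arguments
    run: Callable[..., str]        # the code that executes on a call


def word_count(text: str) -> str:
    """Hypothetical implementation: count whitespace-separated words."""
    return str(len(text.split()))


# Hypothetical tool, used only to show the four parts side by side.
count_words = Tool(
    name="count_words",
    description=(
        "Counts the words in a piece of text. Use this whenever the "
        "user asks how long a document is."
    ),
    input_schema={"text": {"type": "string", "required": True}},
    run=word_count,
)

print(count_words.run(text="tools let agents act"))  # prints 4
```

Only the description and the name are visible to the LLM when it decides. The run function matters after the decision is made.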
For Claude Code's built-in tools (Read, Write, Edit, WebFetch, Bash, Grep, Glob), all four parts are provided. You don't write any of that. You just pick which ones your agent gets and how you want the agent to use them.
Read-only, narrow write, broad write
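Tools come in three categories sorted by risk: read-only (the tool only looks), narrow write (it changes one scoped thing), and broad write (it can change many things). You'll meet this frame again in Chief, but the short version matters here too.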
Your first agents should be entirely read-only. The daily-briefing agent from Module 001 is read-only: it has Read and nothing else. The one you'll extend in this module adds WebFetch, also read-only. When you graduate to write tools, you wrap them in permission boundaries. Chief gets into that.
What the LLM decides, what you decide
You decide: which tools the agent has access to, what each tool's description says, and any rules in the system prompt about when not to use a tool.
The LLM decides: at each step, which tool (if any) to call. What arguments to pass. Whether to retry, give up, or ask the user when a tool fails.
Most "the agent won't use my tool" problems are description problems. The LLM is making the best decision it can with the information you gave it. If that decision is wrong, the information it had was wrong.
Why descriptions matter more than you think
Here's a pattern that breaks a lot of first agents.
You build a tool called search_tickets with description "Searches the ticket database." You plug it into a support agent. You ask the agent "what's the status of ticket 847?" and the agent ignores your tool and makes up an answer.
Why? Because "searches the ticket database" doesn't tell the LLM when to reach for this tool. A better description:
Look up the current status, assigned owner, and last update of any
support ticket by ticket ID. Use this whenever the user asks about
a specific ticket by number.
Same tool. Different routing. The second description pattern-matches against "what's the status of ticket 847" and the LLM calls it. The first description doesn't.
This is tool writing 101, and it's the part most guides skip. Tool descriptions are prompts. They get the same amount of care as your system prompt. Maybe more.
A tool description is not documentation. It's routing logic.
The LLM reads the description in the same context as your system prompt. Write it with the same care. Name precisely. Say when, not just what. Acknowledge limits. Show one example. Every one of those is a signal to the LLM.
Talker takes us into the craft of writing tool descriptions.
Writing a tool description is half the game.
Four moves get a tool picked up reliably.
Move 1. Name precisely.
The name is the first signal. fetch is too vague. fetch_url is specific. fetch_url_as_markdown is even better because the LLM can guess the shape of the return value from the name.
Short names with underscores, active verbs, specific nouns. summarize_doc, not summary. classify_email, not classifier.
Move 2. Describe when, not just what.
This is the biggest lever. Most tool descriptions answer "what does this do?" Few answer "when should I reach for this?"
Weak:
Fetches a URL.
Strong:
Fetches a URL and returns its content as plain text. Use this
whenever the user provides a web URL or asks you to read,
summarize, or quote from an online source.
The strong version tells the LLM the trigger condition. The LLM pattern-matches against the user's message and decides to call the tool.
Move 3. Be honest about limits.
If the tool can't handle PDFs, say so. If it times out after 30 seconds, say so. If it only works on public URLs (not behind a login), say so.
Fetches a publicly-accessible URL and returns its content as
plain text. Does not work on pages that require authentication,
JavaScript rendering, or CAPTCHA. Max response size: 100KB.
This prevents a common failure mode: the LLM tries the tool, it fails, the LLM retries with the same input, it fails again. Name the limit in the description, save the retry loop.
Move 4. Show an example.
One concrete example in the description, formatted as you want the arguments to look, cuts most argument-shape bugs.
Example usage:
fetch_url(url="https://example.com/article")
Returns the article text, stripped of HTML.
Putting it together
A complete tool description ready to hand to an agent:
Name: fetch_url
Description:
Fetches a publicly-accessible URL and returns its content as
plain text, stripped of HTML. Use this whenever the user gives
you a web URL or asks you to read, summarize, or quote from an
online source. Does not work on pages that require login, heavy
JavaScript, or CAPTCHA. Max response size: 100KB.
Example usage:
fetch_url(url="https://example.com/article")
Input schema:
url: string (required). The full URL, including https://
Output:
string. The page content as plain text.
Five elements. Name, what, when, limits, example. Every one is a signal to the LLM.
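Here is the same description written as data, plus a tiny check for missing arguments. The JSON Schema style layout is an assumption, and the field names your framework expects may differ. The missing_arguments helper is hypothetical.

```python
# Assumed layout: name, description, and a JSON Schema style input_schema.
FETCH_URL_TOOL = {
    "name": "fetch_url",
    "description": (
        "Fetches a publicly-accessible URL and returns its content as "
        "plain text, stripped of HTML. Use this whenever the user gives "
        "you a web URL or asks you to read, summarize, or quote from an "
        "online source. Does not work on pages that require login, heavy "
        "JavaScript, or CAPTCHA. Max response size: 100KB.\n"
        'Example usage: fetch_url(url="https://example.com/article")'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "The full URL including https://",
            }
        },
        "required": ["url"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return required argument names absent from a proposed call."""
    required = tool["input_schema"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_arguments(FETCH_URL_TOOL, {"url": "https://example.com/article"}))  # []
print(missing_arguments(FETCH_URL_TOOL, {}))  # ['url']
```

Keeping the description as one string next to the schema makes it easy to review both in the same diff.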
For Claude Code's built-in tools
You don't rewrite the description of built-in tools. Claude Code provides those. What you write is the routing guidance in your system prompt: rules that tell the agent when to reach for each tool.
Rules:
- When the user gives you a URL, always fetch it with WebFetch
before responding.
- When the user gives you a file path, always read it with Read
before responding.
- Never fabricate content. If the tool fails, report the failure.
Three rules about tool usage. Each one redirects a failure mode you'd otherwise have to debug later.
Open a text editor. For the daily-briefing agent (or any agent you want to extend), write three rules about when the agent should use which tool. Use this scaffold:
- When the user gives you X, always use tool Y.
- When tool Y fails, do Z instead.
- Never Q without first using tool Y.
Save the rules somewhere you can paste into an agent file. You'll use them in Doer.
Three tool-routing rules in plain text, ready to paste.
Rememberer covers what state lives where in a tool-using agent.
Tool calls are stateless.
That's the short version. Whenever the LLM calls a tool, the tool executes, returns a result, and the result goes back into the conversation context. The tool itself doesn't remember previous calls. Every call starts fresh.
This has three practical implications.
The agent remembers; the tool doesn't
The agent carries context across tool calls because everything (the user message, the tool call, the tool result, the next tool call) sits in the same conversation history. But the tool function itself has no memory of prior invocations. If you need state that persists across calls, the tool has to read from or write to an external store (file, database, API) on each invocation.
For the daily-briefing agent you're extending: Read, WebFetch, and their results all live in the conversation. The agent knows what it fetched two steps ago because it's still in context. The WebFetch tool itself doesn't know or care.
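A rough sketch of that split, assuming a plain list stands in for the conversation history. The fetch_title tool and the message shapes are hypothetical, and a real agent runtime manages this for you.

```python
def fetch_title(url: str) -> str:
    """Stand-in tool: stateless, so the same input always gives the same output."""
    return f"Title of {url}"


history: list[dict] = []  # the conversation: the only place state lives


def call_tool(url: str) -> None:
    """Run the tool and append both the call and its result to history."""
    history.append({"role": "tool_call", "tool": "fetch_title", "url": url})
    history.append({"role": "tool_result", "content": fetch_title(url)})


call_tool("https://example.com/a")
call_tool("https://example.com/b")

# The agent can still see the first result; the tool function keeps nothing.
print(len(history))           # 4
print(history[1]["content"])  # Title of https://example.com/a
```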
Tool results are context costs
Every tool call's output becomes part of the context for the next LLM decision. If a tool returns 50KB of HTML, that 50KB is now in context. The next tool decision is made against that context. Tokens are money; long tool results burn through budget fast.
Two tactics:
- Tools should return just enough. A web-fetching tool that returns stripped plain text is better than one that returns raw HTML with scripts and ads.
- Post-process before the LLM sees it. If the tool can summarize, extract, or filter, do it in the tool, not in the LLM.
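Both tactics fit in a few lines of standard-library Python. This is a sketch of post-processing inside a custom tool, not a description of how WebFetch works. The TextExtractor class, the html_to_text function, and the 2000-character cap are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str, max_chars: int = 2000) -> str:
    """Return visible text only, truncated to max_chars characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]


page = "<html><script>track()</script><h1>Hello</h1><p>Short article.</p></html>"
print(html_to_text(page))  # Hello Short article.
```

The script content never reaches the context window, and the cap puts a ceiling on what any single call can cost.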
Errors are context too
When a tool fails, the error message becomes context. That's usually good: the agent can retry, ask the user, or give up gracefully. But a bad error message can confuse the agent more than no error at all.
A good error: "URL returned 404 Not Found. The page does not exist."
A bad error: "Error: undefined."
The first one lets the LLM explain what happened. The second is noise.
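If you write your own tools later, the error text is yours to shape. A minimal sketch, assuming the tool knows the HTTP status or a network failure reason. The describe_fetch_error function and its wording beyond the 404 case are hypothetical.

```python
from typing import Optional


def describe_fetch_error(status: Optional[int], reason: str = "") -> str:
    """Turn a failed fetch into an error message the LLM can act on."""
    if status is None:
        # No HTTP response at all: DNS failure, timeout, refused connection.
        return f"Could not connect: {reason or 'unknown network error'}."
    if status == 404:
        return "URL returned 404 Not Found. The page does not exist."
    if status in (401, 403):
        return f"URL returned {status}. The page requires authentication or is forbidden."
    if status >= 500:
        return f"URL returned {status}. The server failed; retrying later may help."
    return f"URL returned unexpected status {status}."


print(describe_fetch_error(404))
# URL returned 404 Not Found. The page does not exist.
print(describe_fetch_error(None, "DNS lookup failed"))
# Could not connect: DNS lookup failed.
```

Each message says what happened and hints at what to do next, which is exactly what the agent needs to choose between retrying, asking, and giving up.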
Where tool state lives when it has to
Sometimes tools do need persistent state. A "remember user preference" tool, for example. The tool writes to a file or database; subsequent calls read from it.
For a first agent, don't build this. Handle any per-user state in your system prompt or by passing it in as part of the user message. Persistent tool state is a feature you add when you have a specific reason, not by default.
The mental split, extended
From Module 001: system prompt = what's always true. User message = what just arrived. Tools = what the agent can do. Memory = what the agent retains from earlier in this conversation.
In this module we add two items:
- Tool results: context the agent reads when deciding its next move.
- Tool errors: signals that shape the agent's next decision.
The four original pieces plus these two are the whole architecture of a small agent. Everything else is tuning.
Doer builds.
Time to extend the daily-briefing agent with WebFetch so it can summarize URLs.
You're going to:
- Open the agent file you built in Module 001.
- Add WebFetch to the tools list.
- Update the system prompt with tool-usage rules.
- Run it against a URL.
- Test the failure path.
This section is 12 minutes of hands-on. You'll need Claude Code and the daily-briefing agent from Module 001. If you skipped that module, go back and build it first. This one extends that exact file.
Step 1. Open your existing agent file (1 min)
In your project root, open .claude/agents/daily-briefing.md. Your frontmatter probably looks like this:
---
name: daily-briefing
description: Reads a single document (markdown, plain text) and returns three bullet points summarizing it.
tools: Read
---
Step 2. Add WebFetch (30 sec)
Change the tools: line to include WebFetch:
tools: Read, WebFetch
Save. The agent now has two tools: Read for local files, WebFetch for URLs.
Step 3. Update the description and rules (3 min)
The description field is how Claude Code decides when to route to this agent. It needs to announce the new capability:
---
name: daily-briefing
description: Reads a single document or URL and returns three bullet points summarizing it. Use this when you want a fast brief on a document or a web page.
tools: Read, WebFetch
---
Then in the body of the agent (the system prompt), add a rules block that handles both input types:
You are a daily-briefing agent. You read a single document or
web page and return three bullet points summarizing it.
Rules:
- If the input starts with http://, https://, or www., fetch it
with WebFetch first.
- Otherwise, treat the input as a file path and read it with Read.
- Never fabricate content. Every summary must be based only on
content the tool returned.
- If WebFetch returns an error, empty content, or a CAPTCHA
page, reply with: "I couldn't access the content. Give me a
different source."
- Always return exactly three bullet points, one sentence each,
under 25 words. No preamble. No commentary.
Save.
Step 4. Run it on a URL (3 min)
In a terminal, run claude. Then ask:
Use the daily-briefing agent to summarize
https://www.anthropic.com/news
(Or any short article URL.) You should see the agent reach for WebFetch, grab the content, and return three bullets.
Expected output, roughly:
- Anthropic announced new features for Claude, including extended thinking mode.
- The release focuses on improving reasoning quality on complex tasks.
- The update is available through the API and Claude apps starting today.
Your bullets will differ by URL. The shape is what you're checking: three, one-line, no preamble.
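If you want to check the shape without eyeballing it, a few lines of Python will do. This is a sketch, assuming bullets start with "- " as in the expected output above. The check_briefing function is hypothetical and checks word count only, not sentence count.

```python
def check_briefing(reply: str) -> list[str]:
    """Return a list of shape problems; an empty list means the reply passes."""
    problems = []
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    bullets = [line for line in lines if line.startswith("- ")]
    if len(lines) != len(bullets):
        problems.append("contains non-bullet lines (preamble or commentary)")
    if len(bullets) != 3:
        problems.append(f"expected 3 bullets, found {len(bullets)}")
    for bullet in bullets:
        words = len(bullet[2:].split())
        if words >= 25:
            problems.append(f"bullet has {words} words, limit is under 25")
    return problems


good = "- First point.\n- Second point.\n- Third point."
bad = "Here is the summary:\n- Only one point."
print(check_briefing(good))      # []
print(len(check_briefing(bad)))  # 2
```

Paste the agent's reply into the function after each rerun and you have a repeatable pass/fail signal.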
Step 5. Trigger the failure path (2 min)
Point the agent at a URL that will fail:
Use the daily-briefing agent to summarize
https://not-a-real-domain-12345.example
You should see the fallback: "I couldn't access the content. Give me a different source." If the agent invents a plausible summary instead, your tool-failure rule isn't strong enough. Move it higher in the rules block or make the trigger more specific.
Step 6. Test the path-switching (2 min)
Confirm the agent still picks Read for local files:
Use the daily-briefing agent to summarize /tmp/test-article.md
It should use Read, not WebFetch. If it tries WebFetch on a local path, the first rule isn't exclusive enough. Rewrite it as "If the input starts with http://, https://, or www.: WebFetch. Otherwise: Read." The "otherwise" forces a binary decision.
Step 7. Tighten the weakest rule (30 sec)
Same pattern from Module 002. Find the rule that failed most and sharpen it. Save. Rerun.
The agent reaches for WebFetch on URLs, Read on file paths, and hits your fallback message on failures. Two tools listed, three behaviors.
- Agent ignores WebFetch on a URL: description doesn't mention URLs. Extend it to include the new capability.
- Agent uses WebFetch on a file path: your rules aren't exclusive. Rewrite as "if X then A, otherwise B."
- Fetch fails but agent invents a summary: add "Every summary must be based only on content the tool returned" at the top of the rules block, and make the failure path its own labeled section.
- Tool-name typo: Webfetch or webfetch both fail silently. It's WebFetch. Case matters.
What you just learned
Adding a tool is three things:
- Listing it in the tools: frontmatter.
- Mentioning the new capability in the description: for routing.
- Writing rules in the prompt body that tell the agent when to reach for it.
Skip any one of these and the tool either doesn't exist, doesn't get invoked, or gets invoked incorrectly. All three together: a reliably-used tool.
Save your agent file. You'll extend it again in later modules.
Rookie has the failures to watch for.
Three ways this falls apart the first time. Know them in advance, save the hour each one costs.
Failure 1. The tool isn't being called even though it's listed
You added WebFetch to the tools: line. You asked the agent to summarize a URL. The agent replied with a summary, but it just made one up.
Ninety percent of the time: the agent's description field didn't mention the new capability. The LLM routed based on the old description, which only talked about local files.
Fix: extend the description to include the new tool's use case, as in "Reads a single document or URL...". Both capabilities belong in the description, not just the body.
Ten percent of the time: the tools: line has a typo. Webfetch vs WebFetch. Case matters. Copy the tool names from the Claude Code docs, don't rely on memory.
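A quick lint for that typo, as a sketch. KNOWN_TOOLS holds only the built-in names mentioned in this module, not a complete list, and check_tools_line is a hypothetical helper rather than anything Claude Code ships.

```python
# Assumption: only the built-in tools named in this module.
KNOWN_TOOLS = {"Read", "Write", "Edit", "WebFetch", "Bash", "Grep", "Glob"}


def check_tools_line(line: str) -> list[str]:
    """Return messages for tool names that are not in KNOWN_TOOLS."""
    _, _, value = line.partition(":")
    names = [name.strip() for name in value.split(",") if name.strip()]
    by_lower = {name.lower(): name for name in KNOWN_TOOLS}
    messages = []
    for name in names:
        if name in KNOWN_TOOLS:
            continue
        suggestion = by_lower.get(name.lower())
        if suggestion:
            messages.append(f"'{name}' should be '{suggestion}'")
        else:
            messages.append(f"'{name}' is not a known tool")
    return messages


print(check_tools_line("tools: Read, Webfetch"))  # ["'Webfetch' should be 'WebFetch'"]
print(check_tools_line("tools: Read, WebFetch"))  # []
```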
Failure 2. The agent uses the wrong tool
You asked it to summarize a local file and it tried WebFetch. Or vice versa.
The LLM looks at the user's message and pattern-matches against your rules. If your rules are ambiguous (a file path could also look like a URL, or your rules aren't mutually exclusive), the LLM picks one and you live with the coin flip.
Fix: rewrite the rules so they're exclusive. Instead of two separate rules, combine them:
Determine the input type:
- If it starts with http://, https://, or www.: use WebFetch.
- Otherwise: use Read with the input as a file path.
The "otherwise" makes the decision binary. No overlap, no ambiguity.
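Written as code, the exclusive rule is a single if/else. The LLM doesn't run anything like this; it's only a sketch of the decision the rule describes, and pick_tool is a hypothetical name.

```python
def pick_tool(user_input: str) -> str:
    """Mirror the exclusive rule: URL prefixes go to WebFetch, everything else to Read."""
    text = user_input.strip().lower()
    if text.startswith(("http://", "https://", "www.")):
        return "WebFetch"
    return "Read"


print(pick_tool("https://www.anthropic.com/news"))  # WebFetch
print(pick_tool("www.example.com"))                 # WebFetch
print(pick_tool("/tmp/test-article.md"))            # Read
```

If you can't write your rules as one branch with a default, they probably overlap.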
Failure 3. The tool fails and the agent invents
You pointed the agent at a broken URL. Instead of the fallback message, you got a plausible-looking summary of nothing.
This is the classic "hallucinate to keep going" failure mode. The LLM knows you want three bullets. The tool didn't give it content. It fills in from what it knows about the topic (or nothing) and returns bullets anyway.
Fix: two rules, not one. The LLM has two temptations.
First rule, right at the top of the rules block (blocks "fill from memory"):
Never return a summary based on prior knowledge. Every summary
must be based only on content returned by WebFetch or Read.
Second rule, the explicit failure path (blocks "pretend the fetch succeeded"):
If WebFetch returns an error, empty content, or a CAPTCHA page,
reply with: "I couldn't access the content. Give me a different
source." Do not attempt to summarize.
The underlying pattern
All three failures come from one root cause: the LLM is making a reasonable decision with incomplete information, and you own the information.
- Tool not being called: LLM doesn't know when to reach for it. Fix: the description.
- Wrong tool: LLM has two plausible options. Fix: exclusive rules.
- Hallucination on failure: LLM has no instruction for the empty-result case. Fix: explicit fallback.
Every time you catch the agent misbehaving around a tool, ask: what information could I have given it that would have led to the right call? Add that information. Save. Rerun. That's the whole debugging loop.
Manager takes the team-scale view.
Tools are shared infrastructure.
Your first tool belongs to your agent. Your team's third tool belongs to the team. By the tenth tool, you have an inventory, a dependency graph, and people who own specific ones.
That transition, from "I wrote a tool" to "we have tools," happens faster than people expect. It's where teams lose control of their agents.
Tools are code. Treat them as code.
Whether it's an MCP server (Module 014 gets there) or a shared set of descriptions used by multiple agents, a tool is a piece of code that determines what your agents can and can't do. That means:
- It lives in version control. Same repo as the agents that use it, or a dedicated tools repo.
- It has an owner. One human, named in a CODEOWNERS file or equivalent.
- It has tests. Not optional. A tool that breaks silently breaks every agent that uses it.
- Changes go through review. Including description changes, because descriptions are part of routing.
The tool PR template
When someone on your team changes a tool (or the description an agent sees), the PR should include:
- The diff of the tool code or description.
- A one-line description of what problem the change solves.
- A test case demonstrating the new behavior.
- A list of agents that use this tool, with a sign-off that the change is safe for all of them.
Item four matters. A tool shared by three agents can break two of them with a single change. Someone has to check all three.
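Building that list by hand gets old fast. A sketch of automating it, assuming agent files use the tools: frontmatter line shown in Doer. The agents_using function and the file-reader.md agent are hypothetical, and the files are passed in as strings to keep the example self-contained.

```python
def agents_using(tool: str, agent_files: dict[str, str]) -> list[str]:
    """Return names of agent files whose frontmatter tools: line lists the tool."""
    users = []
    for filename, text in agent_files.items():
        for line in text.splitlines():
            if line.startswith("tools:"):
                names = [n.strip() for n in line[len("tools:"):].split(",")]
                if tool in names:
                    users.append(filename)
                break  # only the first tools: line (the frontmatter) counts
    return sorted(users)


agent_files = {
    "daily-briefing.md": "---\nname: daily-briefing\ntools: Read, WebFetch\n---\n",
    "file-reader.md": "---\nname: file-reader\ntools: Read\n---\n",
}
print(agents_using("WebFetch", agent_files))  # ['daily-briefing.md']
print(agents_using("Read", agent_files))      # ['daily-briefing.md', 'file-reader.md']
```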
Tool naming conventions
At five tools, naming doesn't matter. At twenty, it's a mess without conventions.
Two patterns work:
- Verb_noun for actions: fetch_url, send_email, create_ticket.
- Domain_action for scoped tools: github_create_issue, slack_post_message, jira_update_status.
Pick one and enforce it. With a consistent convention, a new agent author can guess the name of a tool before looking it up, which means one less search per agent built.
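Enforcement can start as a one-line pattern in CI. This sketch checks shape only (lowercase words joined by underscores, at least two of them). It can't tell a verb from a noun, and the follows_convention helper is hypothetical.

```python
import re

# Lowercase words joined by underscores, two words minimum.
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)+$")


def follows_convention(name: str) -> bool:
    """True when the name has the agreed shape, for example fetch_url."""
    return NAME_PATTERN.fullmatch(name) is not None


print(follows_convention("fetch_url"))            # True
print(follows_convention("github_create_issue"))  # True
print(follows_convention("fetch"))                # False
print(follows_convention("FetchURL"))             # False
```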
Permission tiers
Not every agent needs every tool. The authentication agent doesn't need send_email. The email-triage agent doesn't need delete_database_row.
Give agents the minimum tools they need. Principle of least privilege, applied to agents. The smaller the tool set per agent, the smaller the blast radius when something goes wrong.
In a subagent frontmatter, the tools: line does this job. For API-based agents, you scope at the call site. Either way, don't hand every agent the full toolbox.
When to build vs. reuse
Teams that let every engineer write their own version of the same tool end up with four "send_email" tools, each slightly broken in a different way. Teams that enforce a shared tool for email end up with one battle-tested version.
Good questions before building:
- Does a tool like this exist in our inventory already?
- Is the existing one broken for my case, or just unfamiliar?
- If I extend the existing one, does that break anyone else?
The default answer to "do I need to build a new tool?" should be no. Most of the time the existing one works and you just haven't read the description carefully. When the answer really is yes, write it as an additional tool, not a replacement, unless you're prepared to update every agent that uses the old one.
Observability at team scale
Once multiple agents share tools, you need to know: which agents called which tools, when, with what inputs, and what the tools returned. This is the same logging you'd build for any internal API. Tools are internal APIs for your agent fleet. Treat them the same.
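One way to get those four facts is to wrap a shared tool when you hand it to an agent. A minimal sketch, assuming an in-memory list stands in for your real log sink. The with_logging wrapper and the fetch_url body are hypothetical.

```python
import functools
import time

CALL_LOG: list[dict] = []  # stand-in for a real log sink


def with_logging(tool_func, agent: str):
    """Wrap a shared tool so each call records who called it and what happened."""
    @functools.wraps(tool_func)
    def wrapper(**kwargs):
        entry = {"agent": agent, "tool": tool_func.__name__,
                 "inputs": kwargs, "time": time.time()}
        try:
            entry["output"] = tool_func(**kwargs)
            return entry["output"]
        except Exception as exc:
            entry["error"] = repr(exc)
            raise
        finally:
            CALL_LOG.append(entry)  # runs on success and on failure
    return wrapper


def fetch_url(url: str) -> str:
    """Hypothetical tool body; a real one would perform the fetch."""
    return f"content of {url}"


briefing_fetch = with_logging(fetch_url, agent="daily-briefing")
briefing_fetch(url="https://example.com/article")
print(CALL_LOG[0]["agent"], CALL_LOG[0]["tool"])  # daily-briefing fetch_url
```

The tool stays shared and unchanged; the agent name is attached at the point where the tool is handed out.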
Chief handles the risk and governance layer.
Three risks a system prompt carries that don't show up until the agent is in production. All three are underrated. All three are boardroom-level.
Risk 1. Prompt changes are deploys
A system prompt change is not a "small edit." It changes the behavior of every call your agent makes, across every user, every geography, every compliance regime. It is a deploy in every meaningful sense. Treat it as one.
This means:
- Prompt changes go through the same review process as code changes.
- Prompt changes are logged with the same auditability as code changes.
- Prompt changes are rolled back the same way code changes are, with a version number and a commit hash.
- Prompt changes are communicated to the same stakeholders who care about code changes: security, compliance, customer success.
The most common failure at exec-level: treating the prompt as "copy" that the content team can edit on a whim. Copy can be edited freely. Business logic cannot. A system prompt is not copy. It is the policy the agent enforces.
If you find that your team treats the prompt like copy, that's a governance gap. Close it before it's a governance incident.
Risk 2. The system prompt is a data exposure surface
Every token in the system prompt is sent to the model provider on every call. If your prompt contains customer names, internal product codenames, pricing details, competitor comparisons, or any sensitive business logic, you are routing that data through your LLM provider's infrastructure, every time.
Two things matter:
- What you put in. Do not put sensitive data in the system prompt. It doesn't belong there anyway (per Rememberer), but the compliance angle is the one that gets attention in a board meeting.
- Where it goes. Know your provider's data handling. Is the prompt logged? For how long? Is it used for training? What's the data residency? Have you signed the right data processing agreement? Your legal and security teams probably have opinions. Ask them.
The prompt can become part of your data inventory. For a publicly-traded company with an AI agent in production, the prompt may be a disclosable item. Treat it that way.
Risk 3. Cost scales with drift
This is the risk that surprises finance teams six months into production.
A loose prompt produces longer responses, more retries, and more back-and-forth with users who are confused by the output. Every one of those is an LLM call. If your drift rate is 40% and your retry logic is "just ask again," you are paying at least 1.4× for the same volume of work, and more if the retries drift too. At scale, that can be six figures a year you didn't budget for.
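The multiplier is simple arithmetic. A sketch under two assumptions: each call drifts independently at the same rate, and a drifted call always triggers a retry. Both function names are hypothetical.

```python
def calls_with_one_retry(drift_rate: float) -> float:
    """Expected calls per task if each drifted call is retried exactly once."""
    return 1 + drift_rate


def calls_until_success(drift_rate: float) -> float:
    """Expected calls per task if retries can drift too and repeat until one lands."""
    return 1 / (1 - drift_rate)


print(round(calls_with_one_retry(0.4), 2))  # 1.4
print(round(calls_until_success(0.4), 2))   # 1.67
```

One retry per drifted call gives the 1.4× figure. If the retry can drift as well, the same 40% rate costs closer to 1.67×.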
The flip side: tightening the prompt is the cheapest cost optimization available. A 30-minute prompt hardening session can cut per-call costs by 20–30%, because:
- Tighter prompts produce shorter responses.
- Shorter responses = fewer output tokens = lower cost.
- Fewer retries = fewer total calls.
- Fewer human escalations = less support cost.
Most AI cost conversations jump straight to "can we use a smaller model?" The prompt is almost always the bigger lever. Run that exercise first.
The governance frame
If your organization is going to run AI agents at scale, three things need to exist as policy:
- Prompt change control. Who can change what, with what approval, logged where.
- Prompt data classification. What categories of data are allowed in a system prompt, and who reviews deviations.
- Prompt cost budgets. Per-agent cost caps with alerting, plus a regular review of drift rates across agents.
None of these are technical problems. They are governance problems that happen to be about technology. The technical teams can build the systems, but the policy has to come from leadership, and it has to come before the first agent ships, not after the first incident.
The chief's two questions
Two things a board member should be able to answer about any agent the company deploys:
- What does the system prompt say, and who owns it?
- What happens when the system prompt is wrong?
If the answer to the first question takes more than 30 seconds, you have a governance problem. If the answer to the second question is "we roll back and redeploy," you have a mature operation. If the answer is "we'd have to figure that out," you have an incident waiting for a date on the calendar.
Founder wraps it.
You, alone, with an agent that only kind of works. Every solo operator has this moment. The trick is knowing what to build next.
Start read-only. Stay there longer than you think.
The temptation with a first agent is to go write-capable immediately. An agent that can send an email, post a message, create a task. The pull is real; write capability feels like "real work" in a way read capability doesn't.
Resist it for at least your first three agents. Read-only agents teach you prompting discipline and tool discipline without introducing the "what if it does something irreversible" question. Once you trust your prompts and your tool descriptions, you can graduate to narrow-scope writes.
This is not about being timid. It's about learning one skill at a time. Prompt craft from Module 002 plus tool craft from Module 003 is already two disciplines to get right. Adding "does this irreversible thing do what I want" as a third while you're still building is how the "oh god I sent 400 emails" story gets written.
Your personal tool library
Keep a folder. ~/tools/ or agents/shared-tools/ or wherever. Every tool description you write lives there, alongside notes about what worked and what broke.
tools/
  fetch_url.md       (description + rules)
  summarize_doc.md
  classify_lead.md
  notes.md           (what you learned about each)
The notes file is the unsung part. For each tool, jot down what worked, what broke, what the tool doesn't handle. Next time you build a similar tool, you don't rediscover the same bugs.
This library is an asset. A year in, you have thirty tools you trust. Every new agent you build reaches into this library instead of starting over.
The weekly tool review
Every Friday, twenty minutes:
- Pick one tool.
- Read its description out loud.
- Look at the last fifty times the tool got called.
- Find the one call where the tool did something weird.
- Tighten the description or the routing rule. Save.
Over a year, that's about fifty reviews. With five to ten tools in rotation, each one gets iterated on five to ten times, for roughly two to three hours of total attention per tool. Small improvements compound: the tool you wrote in Q1 is noticeably sharper by Q4.
When to extend vs. build new
You have a fetch_url tool. Now you need a fetch_url_and_parse_html tool that does the same fetch but extracts specific elements.
Question: is this a new task, or the same task with extra post-processing?
- If the extraction is useful for 70 percent of callers: extend the existing tool. Add an optional parameter.
- If the extraction is useful for 10 percent of callers: build a new tool that wraps the existing one. The other 90 percent keep calling the simple one; the 10 percent call the wrapper.
Resist building parallel implementations. Two tools that do almost the same thing are worse than one tool with an optional flag.
The trap: one big tool
The opposite trap is building one tool that does everything. do_thing(action: "fetch" | "parse" | "summarize" | "send" | "delete"). Looks clean. Behaves terribly. The LLM can't route on it; the description has to cover five different operations, so it's vague for all of them.
Small, sharp tools. Each one does one thing. Compose them in the agent's prompt, not in the tool's signature.
Every tool you build is a promise the agent will keep.
Name it, describe it, scope it, and constrain it before you deploy it. If you trust the tool's description to be the only limit, you built the tool wrong. Tools are how agents touch the world. Take the craft seriously; the rest follows.