Module 001 · The agent loop.

Section 01

Hello

Opens the module·Names the problem

Your first agent is 90 minutes away.

Not a chatbot. Not a script. An agent. The thing that actually decides what to do next, reaches for the right tool, and comes back with something useful.

By the end of this module, you'll have one running on your own machine. A daily-briefing agent that reads a file you give it and returns three bullet points summarizing it. Small purpose. Real loop. Your own.

We'll build it in Claude Code. If you don't have Claude Code installed yet, grab it from claude.com/claude-code before you go any further. Five minutes, free, runs locally.

You came here through Module 000, so the vocabulary is already in place. You know what an agent is in the abstract. This module makes it concrete. Three things happen:

Thinker lays out the mental model. The loop. The three-part stack. What the LLM decides and what you decide.
Talker and Rememberer cover two of the three parts (system prompt, memory).
Doer is the build itself. You write the agent file, point it at a task, run it, see output.

After Doer, we zoom out. Rookie catches the three ways new builders break their first agent. Manager handles the human-agent handoff for team leads. Chief covers the risk and cost frame for decision-makers. Founder ties it together for solo operators.

One thing before we start. This module is called The agent loop. That's the thing to remember. Every agent you'll ever build is a loop. The LLM decides what to do at each step. Tools let it actually do things. Memory lets it carry context from step to step. That's the whole picture. Everything else is tuning.

Thinker takes it from here.

Section 02

Thinker

Reasoning·The loop and the stack

An agent is a loop.

Most people think of "an AI" as a thing that takes input and produces output in one shot. Prompt in, answer out. That's inference. It's a building block of agents, but it's not an agent.

An agent is what happens when you wrap inference in a loop and give it tools.

The loop

The loop has four moves. The LLM sits at the center, deciding which move to make next, every step of the way.

Perceive

Read the input

→

Decide

Pick the next move

↑

The loop

↓

Observe

See what happened

←

Act

Fire the tool

Four moves. Perceive: the agent looks at what just arrived. A user message. A file. A tool result. Decide: the LLM reads the perceive input, plus everything in context, and picks what to do next. Call a tool? Ask the user? Give up? Write the final answer? Act: whatever the decide step chose, it happens. Tool fires. Response drafts. Database writes. Observe: whatever just happened becomes input to the next perceive. The loop runs again until the LLM decides the work is done.

The word that matters most in that list is decide. In a traditional program, the developer decides every step. If this, then that. The logic tree is rigid. In an agent, the LLM decides at each step. That's autonomy. That's what makes it an agent.

The three-part stack

Every agent stands on three pieces. You'll meet all three in your first build.

Part 01

System prompt

The identity and rules. Read first, by the agent, every run. Static.

Part 02

Tools

The things the agent can do. Read a file. Call an API. Each tool has a name and a schema.

Part 03

Memory

What the agent retains across steps. For a first agent, just the current conversation.

That's it. Three pieces. Every agent in every module of this publication is some variation of these three.

What the LLM decides and what you decide

A useful frame for a first build: what decisions do you own, and what decisions does the LLM own?

You decide: who the agent is (system prompt), what tools it has access to (tool definitions), what memory it sees (context strategy), when to end the loop (termination conditions, though many agents handle this themselves).

The LLM decides: at each step, what tool to call next (or whether to call any), what to say to the user, when to stop based on the system prompt's guidance.

When your agent misbehaves, the question is usually: did I own the decision that went wrong, or did the LLM? If the system prompt is loose, that's your decision. Tighten it. If the tool is broken, that's your decision. Fix it. If the LLM is making a bad choice in context, that's a prompt problem: give it better guidance.

Why an agent is different from a chatbot

A chatbot is a loop too. Input, response, input, response. What makes an agent different is the tools. The chatbot has one thing to do: produce text. The agent has a whole set of things to do and picks among them.

This matters because the moment your agent has a tool, it can change something in the world. Send an email. Move a file. Write a ticket. That's the difference between AI that chats with you and AI that does work for you.

Every agent in this publication has at least one tool. If there's no tool, it's a chatbot with a fancier system prompt. Useful, but not our subject.

The north star of this module

An agent is a loop. The LLM decides. Tools act. Memory carries context.

Your job as the author is to make as many decisions as you can in advance. Write them into the system prompt and the tool definitions. Every decision you leave to the LLM at runtime is a place drift can enter.

Talker covers the system prompt piece of the stack. Quick version for your first build.

Section 03

Talker

Prompts·The system prompt, fast

The system prompt is how you tell the agent who it is.

Module 002 goes deep on the craft. For your first agent you need the basics: an identity, one or two rules, a format, and a sensible default when the input doesn't fit.

The identity sentence

Start with one sentence that names the agent and its job.

Weak:

You are a helpful assistant.

Strong:

You are a daily-briefing agent. You read a single document and
return three bullet points summarizing it.

The strong version is specific enough that the LLM can do its job without guessing. The weak version is the vagueness that makes first agents drift from the jump.

Two or three rules

Rules are imperatives. Always. Never. Only. Reserve those words for things that really matter.

For a daily-briefing agent, a reasonable rule block:

Rules:
- Return exactly three bullet points.
- Each bullet is one sentence, under 25 words.
- Never include commentary outside the bullets.

Three rules, each specific, each testable.

The output shape

If you want the response in a particular format, show the format. Don't describe it.

Weak:

Return a summary.

Strong:

Format:
- Bullet 1
- Bullet 2
- Bullet 3

For a first agent, plain-text bullets are fine. You don't need JSON yet. Module 003 gets into JSON shapes.

The default behavior

What happens when the input doesn't fit? For a daily-briefing agent, the default might be:

If the input is not a readable document, reply with:
"I couldn't read the document you provided. Try again with a
plain text or markdown file."

A specified default prevents the agent from improvising when it doesn't know what to do.

Putting it together

Here is the complete first system prompt for the daily-briefing agent you'll build in Doer:

You are a daily-briefing agent. You read a single document and
return three bullet points summarizing it.

Rules:
- Return exactly three bullet points.
- Each bullet is one sentence, under 25 words.
- Never include commentary outside the bullets.

Format:
- Bullet 1
- Bullet 2
- Bullet 3

If the input is not a readable document, reply with:
"I couldn't read the document you provided. Try again with a
plain text or markdown file."

Twelve lines. One identity sentence. Three rules. A format. A default. That's the whole system-prompt package for your first agent.

Build block · 3 minutes

Save your first system prompt

Open a text editor. Copy the system prompt above (starting with You are a daily-briefing agent) into a file called system_prompt_draft.txt. Save it somewhere you can find again. You'll paste it into the agent file in Doer.

Expected output

A file on disk with twelve lines of text.

Rememberer covers the memory piece of the stack.

Section 04

Rememberer

Memory·Keep memory simple

Memory is where most first agents get confusing. It doesn't need to be.

For your first agent, memory is just two things:

The conversation history of the current run.
Whatever file or data you pass in as the user message.

That's it. No database. No retrieval system. No vector store. Those come later, if they come at all.

What persists

In a Claude Code subagent, the conversation history persists within a single run. The agent asks you something. You answer. Both the question and the answer are in memory for the rest of that run.

When the run ends, the conversation is gone (unless you log it somewhere yourself). Next run starts fresh. That's actually the default you want for a daily-briefing agent. Yesterday's briefing shouldn't bleed into today's.

What doesn't persist (and why that's fine)

The system prompt persists across every run, because it's part of the agent file. It's static. Tool definitions persist for the same reason.

Nothing else persists automatically. If you want a specific piece of state to carry from run to run, you write it to a file or a database yourself. The agent reads that file at the start of each run.

This is simpler than it sounds. For your first agent, the state is: the document you're briefing on. You pass it in as the user message. The agent reads it. Job done. No persistent memory needed.

The mental split

Keep this split in your head:

System prompt: what's always true about the agent.
User message: what just arrived for this run.
Tools: what the agent can do in the world.
Memory: what the agent retains from earlier in this conversation.

Module 002 drills into the line between system prompt and memory. For now: anything that changes per run goes in the user message, not the system prompt. Keep that line clean and you won't have memory bugs.

When memory gets complicated

There is a whole category of agents where memory does get involved. Agents with ongoing relationships to users, like a personal assistant that remembers your preferences. Agents that build up knowledge over time. Agents that reason over large document collections.

Those live in later modules. For your first agent, keep it simple: the system prompt and the user message carry the whole context. That's plenty.

Doer is next. Time to build.

Section 05

Doer

Actions·Build your first agent

We're going to build a working agent in the next fifteen minutes.

The agent: a daily-briefing agent that reads a document and returns three bullet points. You run it by passing it a file. It reads, summarizes, returns.

The stack: a Claude Code subagent file. The system prompt you drafted in Talker. One tool (the Read tool, which Claude Code provides by default). No external dependencies.

Build block · 12 minutes

Ship your first agent

Step 1. Create the agent file (2 min)

In your project root (or anywhere you want to keep this agent), make a folder called .claude/agents/ if it doesn't exist. Inside it, create a file called daily-briefing.md and paste this in:

---
name: daily-briefing
description: Reads a single document (markdown, plain text) and
  returns three bullet points summarizing it. Use this when you
  want a fast brief on a document.
tools: Read
---

You are a daily-briefing agent. You read a single document and
return three bullet points summarizing it.

Rules:
- Return exactly three bullet points.
- Each bullet is one sentence, under 25 words.
- Never include commentary outside the bullets.

Format:
- Bullet 1
- Bullet 2
- Bullet 3

If the input is not a readable document, reply with: "I couldn't
read the document you provided. Try again with a plain text or
markdown file."

Save. Notice the frontmatter:

name: how Claude Code refers to this agent.
description: how Claude Code decides when to route to it. Specific descriptions are the single biggest reason first agents actually get invoked.
tools: which tools this agent has access to. Read means the Claude Code Read tool. The agent can read files.

Step 2. Give it something to read (1 min)

Make a plain text or markdown file with something to summarize. Copy any article into test-article.md, or use this:

# The three-part stack

Every agent stands on three pieces: a system prompt, a set of
tools, and a memory system.

The system prompt is the static contract. It tells the agent who
it is and what it must always or never do.

Tools are the things the agent can call: read a file, send an
email, search the web, write to a database.

Memory is what the agent retains across steps and runs. For simple
agents it's just the current conversation. For complex ones it's
a whole retrieval system.

Most first agents don't need all three. A daily-briefing agent
needs a system prompt and one tool. Memory is the conversation
it's having with you right now.

Save it to a known path. For example: /tmp/test-article.md.

Step 3. Run it (3 min)

In your terminal, run claude to start a session. Then ask:

Use the daily-briefing agent to summarize /tmp/test-article.md

Claude Code should route to your subagent. You'll see the agent read the file, then respond with three bullets.

Expected output, roughly:

- Every agent is built from three parts: system prompt, tools, and memory.
- The system prompt defines who the agent is; tools define what it can do; memory defines what it remembers.
- Simple agents need only a system prompt and one tool, not all three.

Your bullets will be worded differently. What you're checking: three bullets, no preamble, no commentary.

Step 4. Trigger the default (2 min)

Ask the agent to brief a file that doesn't exist:

Use the daily-briefing agent to summarize /tmp/does-not-exist.md

You should see the fallback message: "I couldn't read the document you provided..." If the agent hallucinates a summary of a document it couldn't read, you have a prompt problem. Make the default clause more prominent, or give it its own labeled section at the top of the prompt.

Step 5. Iterate (4 min)

Something will feel slightly off in your output. It almost always does on v1. Pick the one thing that's most wrong and fix it.

Bullets too long: tighten the rule from "under 25 words" to "under 15 words."
Agent adds "Here are three bullet points:" before the bullets: add a rule, Never include any preamble before the bullets.
Agent picks the wrong three bullets: add a rule, Prioritize the document's stated conclusions over examples or asides.

Save the file. Rerun. Read the output. Did the thing you fixed actually get better?

Expected result

A working agent you can invoke by name in Claude Code. Three bullets on a real document. The fallback message on a missing one. An iteration loop that takes seconds.

If something's wrong

The agent isn't being invoked. Your description is too vague. Rewrite it to name what the agent does, what it takes, and when to use it. Also check that the file sits at .claude/agents/daily-briefing.md in the directory where you're running Claude Code.
The agent hallucinates instead of reading the file. The Read tool isn't wired or the prompt doesn't push toward using it. Confirm tools: Read is in the frontmatter. Add a top rule: Always read the document first using the Read tool before producing any output.
The agent keeps asking clarifying questions. The prompt is missing a rule. Read the question, find the missing rule, add it. The next run will be clean.

What you just built

You have a working agent. Small, but real. Three parts:

System prompt: tells the agent it's a daily-briefing agent.
Tools: the Read tool, which lets the agent actually look at a file.
Memory: the conversation with you.

The agent file is the whole contract. It lives on your machine. You can change it, rerun, and see new behavior in seconds.

This is the pattern. Every agent you'll build in this publication is a variation of what you just did. The daily-briefing agent is trivial; the loop you learned is not.

Save daily-briefing.md somewhere you can find again. You'll reference it as you build bigger agents.

Rookie has the three ways this breaks the first time.

Section 06

Rookie

Pitfalls·Three failure modes

Three failures that catch first-time agent builders. Know them in advance, save the hour each one costs on the way.

Failure 1. The agent isn't being invoked.

You paste the daily-briefing command into Claude Code and it answers directly, not through your subagent. Or it says "I don't know what agent you're referring to."

Ninety percent of the time, this is the description field in the frontmatter. It's too vague, so Claude Code doesn't know when to route to the agent.

Weak description: "Briefs documents."
Strong description: "Reads a single document (markdown, plain text) and returns three bullet points summarizing it. Use this when you want a fast brief on a document."

The strong version has three things: what the agent does, what it takes as input, and when to use it. Claude Code reads the description as an intent match. Be specific.

Also: make sure the file is actually at .claude/agents/ (case matters) in the directory where you're running Claude Code. A subagent file in the wrong folder is a subagent file that doesn't exist.

Failure 2. The agent hallucinates instead of using the tool.

You ask it to brief a real file, and it produces a summary. Looks plausible. Three bullets. All good.

Except the summary doesn't match the file. Or worse: the file is empty, but the agent still returned three bullets.

This is the agent skipping the Read tool and guessing. It happens when the tool isn't prominent in the system prompt, or when the tools frontmatter doesn't include it.

Fix: double-check the frontmatter explicitly lists Read. Then in the system prompt body, reinforce: "Always read the document first using the Read tool before producing any output." One line, top of the rules block.

If the agent still hallucinates, pass it an obviously unparseable path and see what it does. If it doesn't invoke Read at all, the tool wiring is broken. Check your frontmatter and the Claude Code agent docs.

Failure 3. The agent goes in circles.

You ask it to brief a file, and it keeps asking clarifying questions. "Which parts of the document do you want summarized?" "What format should I use?" "How long should the bullets be?"

Everything it's asking should already be answered in the system prompt. If the agent is asking, the prompt is incomplete.

Fix: read the agent's question, find the missing rule in your system prompt, add it. Example: the agent asks "which parts of the document should I summarize?" and you add a rule, Summarize the entire document, with emphasis on the main conclusions.

Don't fight the agent. It's telling you what it needs. Give it the answer once, in the system prompt, and the next run is clean.

The shape of all three failures

All three have the same shape: the agent is missing information it needs. In Failure 1, Claude Code is missing routing info. In Failure 2, the agent is missing tool-usage guidance. In Failure 3, the agent is missing task-level decisions.

Your job as the author is to provide every piece of information the agent needs in advance. Every clarifying question you get after the fact is a piece of information that should have been in the system prompt in the first place.

Manager takes the team-scale view.

Section 07

Manager

Team process·Delegating to agents

If you run a team, your first agent is also your team's first agent. That has implications.

Agents are delegates, not automation

When you automate something, you specify every step. The script runs the same way every time. Predictable, but inflexible.

When you delegate to a person, you specify the outcome and trust them to figure out the path. Flexible, but requires judgment.

An agent sits closer to delegation. You give it a goal, a few rules, and a set of tools. It decides what to do at each step. The path varies. The outcome, if the agent is good, stays consistent.

This means agents don't replace your automation stack. They replace your "I'll do it myself" stack. The work you do that's too ambiguous for a script but too cheap to assign to a human.

The handoff

For a team, the question isn't can the agent do this? It's how does the work get handed off?

Three handoff patterns are common:

Human hands off to agent. The human decides when to invoke the agent. Low cost, good for early experiments.
Agent hands off to human. The agent works until it hits a decision it can't make, then hands to a human. Medium cost, good for quality-critical work.
Fully autonomous agent. The agent runs on its own, no human touch. High cost (of getting it right), appropriate only where the agent's failures are cheap.

Your first team agent should be pattern one. A human kicks it off for a specific task. The agent does the thing. The human reviews the output. This keeps the stakes low while you learn where the agent fails.

Where to start on a team

Pick a task that is:

Bounded. Clear inputs, clear outputs.
Frequent. Happens enough that automation pays back.
Low-stakes. A wrong answer doesn't break anything critical.
Already a bottleneck. Someone is doing this, and they don't want to be.

Examples that check all four: drafting replies to common support emails, summarizing meeting notes into action items, classifying inbound tickets by category, flagging anomalies in a daily report.

Examples that don't: anything involving compliance or legal (too high-stakes), customer-facing communication with no human review (too high-stakes), one-off creative work (not frequent), work that requires judgment no one on the team has documented (not bounded).

Build vs. keep it human

Not every repetitive task should become an agent. The break-even question: how many hours per month does this task cost the team, and how many hours will it take to build, maintain, and monitor an agent for it?

If the agent saves less than 10x its maintenance cost, it's not worth building. Especially in the first year of a team's AI journey, when every agent is also a learning vehicle.

Managing the agents themselves

An agent in production is code. It needs:

An owner (one human, named).
A location (version-controlled, same repo as the code that invokes it).
A changelog (what changed when, and why).
An eval suite (how do we know this is still working?).

This isn't optional. Teams that skip it have agents that silently rot over months. Teams that do it have agents that compound in value.

Chief handles the executive frame.

Section 08

Chief

Governance·Risk, cost, exposure

Three things to know if you're running the company that's deploying agents.

Agents are a capital allocation decision

Every agent you deploy has three cost components:

Build cost. Engineer time to design and write the agent.
Run cost. Inference spend on the LLM calls the agent makes.
Maintenance cost. Monitoring, debugging, iterating.

The build cost is one-time. The run and maintenance costs are ongoing. If you're used to thinking about software as something you build once and then forget about, agents break that model. Agents need tending.

The implication for budgeting: your inference line item will grow. Not alarmingly, but predictably. Plan for it. A team running ten modest-volume agents is burning $500 to $5,000 a month in inference by year one.

Inference spend: the new cost center

This is worth its own line. Inference spend scales with two things: how much work your agents do, and how many tokens each piece of work costs.

You control the first. You partly control the second. Inference cost per task is a function of:

Model choice. Sonnet is cheaper than Opus. Haiku is cheaper than both. Pick the smallest model that does the job well.
Prompt length. Every token in the system prompt is paid every time. Keep prompts tight.
Output length. You're paying for output tokens too. Specify shorter formats when you can.
Retry rate. Agents that fail and retry cost more than agents that succeed on the first try. Tighten your prompts.

The single biggest lever is prompt quality. Module 002 is entirely about this. A well-tuned prompt can cut inference cost by 30 to 50 percent on the same task. Before you switch to a smaller model, tighten the prompt.

The risk surface: what can the agent do if something goes wrong?

Every tool you give an agent is a thing that can happen without a human's approval. This is the point. It's also the risk.

Three categories of tools by risk level:

Read-only tools. The agent can see things but not change them. Low risk.
Write tools with narrow scope. The agent can write to one specific place (a Notion page, a draft email, a single database row). Medium risk.
Write tools with broad scope. The agent can write broadly (any database, any outbound email, financial transactions). High risk.

Your first agents should be entirely in the first category. Read-only. No email sending, no database writes, no external API calls that change state. The reader-of-documents agent you built in this module is exactly this shape.

When you graduate to write tools, add permission boundaries. Don't give an agent direct access to your production database; give it a staging copy, or a proxy that validates every write.

The governance three

Three policies your organization needs before running agents at any scale:

Change control for system prompts. Prompt changes are code deploys (see Module 002). They go through the same review and logging.
Data classification. What can and can't go into a system prompt or agent call. Your customer data is probably on the can't list for most LLM providers without an agreement in place.
Inference budget caps per agent. Every agent has a ceiling. When it exceeds, someone gets paged. Unbounded agents burn money, quietly, for weeks before anyone notices.

None of these are technical. They're policy. But they need to exist before the first agent ships, not after the first incident.

The chief's question

When someone on your team proposes an agent, ask three things:

What does it do, and who's the owner?
What happens if it's wrong? How do we know?
What's the budget, and what happens at the cap?

If the proposer can answer all three in 60 seconds, approve the pilot. If they can't, the agent isn't ready to be proposed yet.

Founder wraps it.

Section 09

Founder

Synthesis·The solo workflow

You, at your desk, looking at the blinking cursor. Ninety minutes of time. What do you do?

Your first agent solves one task you don't want to do

Not the most important task. Not the hardest. The task that nags at you. The one you open your laptop, avoid, get a coffee, come back, still avoid. That one.

For most operators, the candidates are:

Summarizing something (email threads, meeting notes, documents).
Classifying something (inbound leads, support tickets, messages).
Drafting something (replies, outlines, first passes).
Checking something (prices, availability, daily metrics).

Pick one. Narrow it. Build.

Start narrow

Resist the urge to build the "complete AI assistant for my business." That's a six-month project that will stall in month two. Build the 15-minute version of the thing you want. Ship it. Use it for a week. Then decide if it's worth expanding.

The daily-briefing agent you built in this module is at the right size. One input. One output. One rule. Three bullets. That's a working first agent.

Narrow wins for three reasons:

You can actually finish it. A narrow agent is a 90-minute project. A broad one is an ongoing distraction.
You can evaluate it. With one clear task, you know immediately if the agent is doing it well.
You can trust it. Trust comes from using something and watching it work. Narrow agents give you fast trust loops.

What to build next

After your first agent, three natural paths:

Extend the agent. Give it a second tool. Widen its scope.
Build a sibling agent. Pick a second narrow task. Build a second agent just like the first.
Chain them. Pass the output of agent A into agent B. Multi-step workflow.

Path 2 is the safest next move. You learn the pattern by doing it twice. You get a second working agent in another 90 minutes. Your library of agents starts to compound.

Path 1 is the trap. "Just one more feature" is how simple agents become unmaintainable blobs. If you're tempted to extend, ask: is this a new task? If yes, new agent. If no, new rule in the existing one.

Path 3 is where the leverage is, but it's not the first move. Build two or three working agents first. Then start chaining.

The weekly review

Set a 20-minute calendar block every Friday. Open each agent you built that week. Run it three times on realistic inputs. Watch the output.

For each:

Does it still work?
Is it drifting on any inputs?
Can you tighten one rule to make it 10 percent better?

Make the tightening. Commit. Close the laptop.

This is the founder's equivalent of a team's eval suite. Cheap, consistent, compounds fast. A year in, your agents have been iterated on 52 times each. Most people's agents have been touched zero times after they shipped.

The folder on your laptop

One folder: ~/agents/ or .claude/agents/ or wherever you keep it. Every agent file lives there. When you move laptops, you move the folder. When you work on a new project, you copy the folder.

This is your agent library. It accumulates. It's one of the most valuable artifacts you'll build as a solo operator in the next few years.

The one thing to remember

Every agent is a loop of three parts.

System prompt says who it is. Tools say what it can do. Memory says what it knows. The loop decides. Your job as the author is to make every decision you can in advance, by writing it into the system prompt or the tool definitions. Every decision you leave to the LLM at runtime is a place drift can enter. Module 002 is the deep dive on the system prompt.