Module 016 · Agent SDK · Bots of Today

Section 01

Hello

Opens the module·Past the playground

Past the playground, into an app your users can use.

Every agent demo you've seen lives in Claude Desktop or a Jupyter notebook. Those are playgrounds. They're not apps. Your teammates can't use them. Your users will never see them.

The Agent SDK is the bridge. It gives you the primitives to embed Claude's agent loop in your own application, with your own tools, your own memory, your own UI.

This module is 90 minutes of going from playground to shipped app. By the end, you'll have:

A working mental model of the SDK's primitives.
A CLI agent app that runs on your laptop.
A deployment path to put it in front of teammates.

Prereq: Modules 001 (agent loop) and 003 (tools). If the agent loop is still fuzzy, circle back.


Section 02

Thinker

Reasoning·The SDK primitives

The Agent SDK is four primitives:

Messages. The conversation with the model.
Tools. Functions the model can call. Defined in code, not MCP (though MCP interop exists).
System prompt. Agent identity and rules.
The loop. You call the model, handle tool calls, call again, stop when done.

The agent loop in code

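done = False
while not done:
    response = client.messages.create(...)
    # Keep the assistant turn, including its tool_use blocks, in the history.
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason == "tool_use":
        # run_tools returns a list of tool_result blocks, one per tool_use block.
        tool_results = run_tools(response.content)
        messages.append({"role": "user", "content": tool_results})
    else:
        done = True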

That's the whole loop. Everything else (streaming, memory, budgets) is trim around it.

SDK vs. MCP vs. Claude Code

SDK: you build the app.
MCP: you expose tools to existing clients.
Claude Code: the CLI/IDE client itself.

SDK is for when you want the agent inside your product.


Section 03

Talker

Prompts·The agent template

The system prompt for an SDK agent is the same contract from Module 002, plus one thing: it knows about your tools.

The template

You are [agent name], a [role] for [app name].

Rules:
- [imperative 1]
- [imperative 2]
- Never call a tool unless the user's request requires it.
- Always return a final text response after using tools.

Tools available:
- search_docs: searches our internal docs by keyword.
- create_ticket: files a bug or request.
- get_user_profile: looks up a user by email.

Use the minimum number of tools needed.
Escape hatch: if the request doesn't match any tool, say so
plainly and ask one clarifying question.

Tool definitions in code

tools = [
    {
        "name": "search_docs",
        "description": "Searches internal docs. Returns up to 5 snippets matching the query.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }
]
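A definition only tells the model that a tool exists. Your code still has to route each tool_use block to a real function. A minimal sketch of that routing, assuming a placeholder search_docs implementation and a hypothetical TOOL_IMPLS registry (neither comes from the SDK; both are plain application code):

```python
# Hypothetical registry mapping tool names to Python callables.
# Nothing here is provided by the SDK.

def search_docs(query: str) -> str:
    # Placeholder implementation: a real one would query your doc index.
    docs = [
        "Reset your password from the login page.",
        "Billing runs on the first of each month.",
    ]
    hits = [d for d in docs if query.lower() in d.lower()]
    return "\n".join(hits[:5]) or "No matches."

TOOL_IMPLS = {"search_docs": search_docs}

def dispatch(name: str, tool_input: dict) -> str:
    """Run the tool the model asked for, or report that it is unknown."""
    impl = TOOL_IMPLS.get(name)
    if impl is None:
        return f"Unknown tool: {name}"
    return impl(**tool_input)
```

Keeping the registry keyed by the same name used in the definition means adding a tool is two edits: one schema entry, one function.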


Section 04

Rememberer

Memory·State across turns

SDK agents need state. Three shapes:

Conversation state. Messages list in memory during a session.
User state. Preferences, history, profile. Stored per-user in a DB.
System state. Rate limits, cache, auth tokens. In Redis or similar.

File layout

my-agent-app/
  agent/
    __init__.py
    loop.py          (the main loop)
    tools.py         (tool implementations)
    system.txt       (system prompt)
  storage/
    sessions.db
    users.db
  tests/
  pyproject.toml
  .env.example

Sessions live short. Users live forever. Treat them differently.
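One way to treat sessions as short-lived is to store each conversation as a JSON blob with a timestamp and delete idle ones on a schedule. A sketch using the standard-library sqlite3 module; the table layout and function names are assumptions for illustration, and it assumes messages are plain dicts (SDK response objects would need converting first):

```python
import json
import sqlite3
import time

def open_sessions(path: str = ":memory:") -> sqlite3.Connection:
    # ":memory:" keeps the sketch self-contained; point it at
    # storage/sessions.db in a real app.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "id TEXT PRIMARY KEY, messages TEXT NOT NULL, updated REAL NOT NULL)"
    )
    return conn

def save_session(conn, session_id: str, messages: list) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?)",
        (session_id, json.dumps(messages), time.time()),
    )
    conn.commit()

def load_session(conn, session_id: str) -> list:
    row = conn.execute(
        "SELECT messages FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []

def expire_sessions(conn, max_age_seconds: float) -> int:
    # Sessions are short-lived: delete anything idle past the cutoff.
    cur = conn.execute(
        "DELETE FROM sessions WHERE updated < ?",
        (time.time() - max_age_seconds,),
    )
    conn.commit()
    return cur.rowcount
```

User state would get its own table with no expiry job.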

The context budget

Every tool result, every past message, every system prompt token, goes into the next call. Watch the budget. A long session with 20 tool calls at 5000 tokens each carries about 100,000 tokens of history, and that history is re-sent on every call. You will hit the limit faster than you expect.

Trim old tool results. Summarize old messages. Context is finite.
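Trimming can be as simple as replacing the content of older tool results with a short placeholder while leaving the newest ones intact. A sketch, assuming messages are plain dicts in the Messages API shape; trim_tool_results and its parameters are hypothetical names:

```python
def trim_tool_results(messages: list, keep_last: int = 2,
                      placeholder: str = "[older tool result trimmed]") -> list:
    """Return a copy of messages with all but the newest tool results replaced."""

    def is_result(block) -> bool:
        return isinstance(block, dict) and block.get("type") == "tool_result"

    # Positions of user messages that carry tool_result blocks.
    carrying = [
        i for i, m in enumerate(messages)
        if m["role"] == "user"
        and isinstance(m["content"], list)
        and any(is_result(b) for b in m["content"])
    ]
    to_trim = set(carrying[:-keep_last]) if keep_last else set(carrying)

    trimmed = []
    for i, m in enumerate(messages):
        if i in to_trim:
            # Keep tool_use_id so each result still pairs with its tool_use block.
            content = [
                {**b, "content": placeholder} if is_result(b) else b
                for b in m["content"]
            ]
            m = {**m, "content": content}
        trimmed.append(m)
    return trimmed
```

The original list is left untouched, so you can log the full history while sending the trimmed copy to the model.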


Section 05

Doer

Actions·Ship a working agent app

Twelve minutes. Ship a CLI agent app that uses tools and runs a real loop.

Build block · 12 minutes

Your first SDK app

Step 1. Scaffold (2 min)

mkdir my-agent-app && cd my-agent-app
python -m venv .venv && source .venv/bin/activate
pip install anthropic

Step 2. Write the loop in agent.py (5 min)

import anthropic

client = anthropic.Anthropic()

SYSTEM = """You are a file-reading agent. When the user asks
about a file, use read_file. Always return a plain text
summary after reading."""

tools = [{
    "name": "read_file",
    "description": "Reads the contents of a local file.",
    "input_schema": {"type": "object", "properties":
        {"path": {"type": "string"}}, "required": ["path"]}
}]

def read_file(path):
    with open(path) as f:
        return f.read()

def run(user_input):
    messages = [{"role": "user", "content": user_input}]
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=SYSTEM, tools=tools, messages=messages)
        messages.append({"role": "assistant", "content": resp.content})
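        if resp.stop_reason != "tool_use":
            # The first block is not guaranteed to be text; join all text blocks.
            return "".join(b.text for b in resp.content if b.type == "text")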
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = read_file(block.input["path"])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result[:2000]
                })
        messages.append({"role": "user", "content": tool_results})

if __name__ == "__main__":
    import sys
    print(run(sys.argv[1]))

Step 3. Run (2 min)

python agent.py "summarize /tmp/test.md"

Step 4. Test edge cases (2 min)

Try: a nonexistent file (does it fail gracefully?), an empty file (does it say so?), an adversarial input such as a request for a file you never meant to expose (does the agent read it anyway?).
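If the nonexistent-file case crashes the loop, one option is to catch the error inside the tool and hand it back to the model as an error result. A sketch of a drop-in replacement for read_file; safe_read_file is a hypothetical name, and it returns the content and is_error fields of a tool_result block:

```python
def safe_read_file(path: str, limit: int = 2000) -> dict:
    """Read a file and return tool_result fields; never raises on I/O errors."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read(limit)  # read at most `limit` characters
    except OSError as exc:
        # Report the failure to the model instead of crashing the loop.
        return {"content": f"Could not read {path}: {exc.strerror}", "is_error": True}
    if not text:
        return {"content": f"{path} is empty.", "is_error": False}
    return {"content": text, "is_error": False}
```

In the loop, the result block would then be built as {"type": "tool_result", "tool_use_id": block.id, **safe_read_file(block.input["path"])}. This does not restrict which paths can be read; that check is still yours to add.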

Step 5. Commit (1 min)

Expected

A working CLI agent. Takes a user query, reads files when needed, returns a summary. ~50 lines of code.

If something's wrong

Loops forever: add a max_iterations counter. If you hit 10 tool calls, bail.
Tool result too big: truncate at 2000 chars like the example, or summarize first.
Auth errors: check ANTHROPIC_API_KEY.


Section 06

Rookie

Pitfalls·Three SDK traps

Failure 1. The infinite tool loop

Your agent calls a tool, gets a result that prompts another call, then another. 30 tool calls later, you've burned $5 and the user is waiting.

Fix: hard cap on iterations. 10 tool calls max per user turn. Above that, return an error and ask the user for more context.
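A sketch of the cap, written so it can wrap any loop body. Here step is a hypothetical callable that performs one model call plus tool handling and returns the final text, or None if the model asked for another tool:

```python
class ToolLoopLimit(Exception):
    """Raised when one user turn exceeds the allowed number of iterations."""

def run_capped(step, max_iterations: int = 10):
    """Call step() until it returns a final answer or the cap is hit."""
    for _ in range(max_iterations):
        answer = step()
        if answer is not None:
            return answer
    raise ToolLoopLimit(f"No final answer after {max_iterations} iterations")
```

The caller catches ToolLoopLimit and turns it into the "I need more context" message for the user.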

Failure 2. Dumping whole tool results

Your tool returns 50KB of JSON. The agent consumes all of it. Next turn, you're at 80% context.

Fix: summarize tool results before passing them back. "Returned 47 records; top 5 matches: ..." Agents don't need raw data to reason.
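A sketch of that summary step for list-shaped results. It assumes the records arrive already sorted by relevance; summarize_records and its limits are hypothetical:

```python
import json

def summarize_records(records: list, top_n: int = 5, max_chars: int = 1500) -> str:
    """Condense a list of records into a count plus the first top_n entries."""
    head = records[:top_n]
    body = json.dumps(head, ensure_ascii=False)
    if len(body) > max_chars:
        # Guard against a few very large records blowing the budget anyway.
        body = body[:max_chars] + "... [truncated]"
    return f"Returned {len(records)} records; top {len(head)} matches: {body}"
```

The count tells the model more data exists, so it can ask for a narrower query instead of reasoning over everything.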

Failure 3. No streaming

Your user types a question. Your agent thinks for 8 seconds. Blank screen. User closes tab.

Fix: stream responses. Show the partial output as it arrives. Users will wait 30 seconds if they see progress, and 3 if they don't.
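A sketch of streaming to a terminal. render_stream is a hypothetical helper that prints chunks as they arrive; stream_reply shows one way to feed it from the Python SDK's messages.stream helper, with the model name passed in by the caller:

```python
import sys

def render_stream(chunks, out=sys.stdout) -> str:
    """Write text chunks as they arrive and return the full reply."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show partial output immediately
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)

def stream_reply(messages, system, model, max_tokens=1024) -> str:
    # Imported here so render_stream can be tried without the package installed.
    import anthropic

    client = anthropic.Anthropic()
    with client.messages.stream(
        model=model, max_tokens=max_tokens, system=system, messages=messages
    ) as stream:
        return render_stream(stream.text_stream)
```

This covers a text-only reply. Combining streaming with tool calls needs the same loop as before around it.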


Section 07

Manager

Team process·SDK in team codebases

Agent apps in a team repo are real software. They get CI, code review, versioned deploys.

The eval suite

Every agent app has an eval suite. Real inputs, expected outputs, pass/fail. Run before every deploy. Same discipline as Module 006.
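An eval suite can start as a list of cases and a loop. A sketch, where agent is any callable from input text to output text and each check is a predicate on the output; the case names and checks below are hypothetical, and real ones should come from logged user inputs:

```python
def run_evals(agent, cases):
    """Run each case through agent and return (passed_count, failures)."""
    failures = []
    for case in cases:
        output = agent(case["input"])
        if not case["check"](output):
            failures.append({"name": case["name"], "output": output})
    return len(cases) - len(failures), failures

# Hypothetical cases for the file-reading agent.
CASES = [
    {"name": "mentions_file", "input": "summarize notes.md",
     "check": lambda out: "notes.md" in out},
    {"name": "non_empty", "input": "hello",
     "check": lambda out: len(out.strip()) > 0},
]
```

In CI, a nonzero failure count fails the build, which is what makes "run before every deploy" enforceable.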

Ownership

One owner per agent. They merge PRs, approve prompt changes, respond to incidents. Agent apps behave like products, so they need product owners.

Prompt changes are deploys

A change to system.txt is a code change. Same PR, same review, same CI. Don't edit the prompt at runtime from a settings panel. That path ends in an outage.


Section 08

Chief

Governance·SDK governance

Agent apps in production carry three risks worth naming.

Risk 1. Per-call cost

Agent apps run more LLM calls per user action than simple chat apps. Budget accordingly. A chat app at $0.01/request can become an agent app at $0.15/request. At scale, that is a different business model.
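The arithmetic is worth running for your own volume. A sketch using the two per-request figures above and a hypothetical 100,000 requests per month:

```python
def monthly_cost(requests_per_month: int, cost_per_request: float) -> float:
    """Total monthly spend for a given volume and per-request cost."""
    return requests_per_month * cost_per_request

REQUESTS = 100_000  # hypothetical volume; substitute your own

chat = monthly_cost(REQUESTS, 0.01)
agent = monthly_cost(REQUESTS, 0.15)
print(f"chat: ${chat:,.0f}/month, agent: ${agent:,.0f}/month, "
      f"ratio: {agent / chat:.0f}x")
```

The ratio stays fixed as volume grows. The absolute gap is what changes the pricing conversation.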

Risk 2. Tool action risk

Agent tools that write, delete, or charge carry operational risk. One bug, and the agent executes the wrong thing 10,000 times. Require human-in-the-loop confirmation for any destructive tool call, until you trust the specific tool deeply.
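A sketch of a confirmation gate in front of tool execution. The tool names in DESTRUCTIVE, the impls registry, and the confirm callback are all hypothetical; in a web app, confirm would be an approval step in your UI, not a terminal prompt:

```python
DESTRUCTIVE = {"delete_record", "issue_refund"}  # hypothetical tool names

def guarded_call(name, tool_input, impls, confirm):
    """Run a tool, asking confirm(name, tool_input) first if it is destructive."""
    if name in DESTRUCTIVE and not confirm(name, tool_input):
        # Tell the model the action was blocked so it can explain or stop.
        return {"content": f"{name} was not approved by a human.", "is_error": True}
    return {"content": str(impls[name](**tool_input)), "is_error": False}

def cli_confirm(name, tool_input) -> bool:
    # Simplest possible approval step: a yes/no prompt in the terminal.
    answer = input(f"Allow {name} with {tool_input}? [y/N] ")
    return answer.strip().lower() == "y"
```

Read-only tools skip the prompt entirely, so the gate adds friction only where a mistake is expensive.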

Risk 3. Observability

When an agent misbehaves, you need to see the full trace: messages, tool calls, tool results. Set up structured logging from day one. Debugging an agent without traces is detective work with no evidence.
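Structured logging can start as one JSON object per event, tied together by a trace ID. A sketch using only the standard library; TraceLogger and its field names are assumptions, not an SDK feature:

```python
import json
import time
import uuid

class TraceLogger:
    """Append one JSON object per event so a whole run can be replayed."""

    def __init__(self, stream):
        self.stream = stream  # any object with write(), e.g. an open file
        self.trace_id = uuid.uuid4().hex  # ties all events of one run together

    def log(self, event: str, **fields) -> None:
        record = {"trace_id": self.trace_id, "ts": time.time(),
                  "event": event, **fields}
        # default=str keeps logging from failing on non-JSON values.
        self.stream.write(json.dumps(record, default=str) + "\n")
```

Call log once for each model request, tool call, and tool result. Filtering the file by trace_id then reconstructs a single user turn end to end.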


Section 09

Founder

Synthesis·The solo SDK loop

Solo founder shipping an SDK-based agent app: the minimum viable stack is ~200 lines of Python.

The solo SDK loop

Monday: write the agent loop. ~50 lines.
Tuesday: define 3-5 tools. Implement each.
Wednesday: write system.txt. Run it 10 times against real inputs.
Thursday: add structured logging. Check every trace.
Friday: wrap it in a tiny web UI (Gradio, Streamlit). Share the link.

Five-day ship. Not five months.

The three files that matter

loop.py: the agent loop itself. Rarely changes after week 1.
tools.py: one function per tool. Grows over time.
system.txt: the prompt. Updated weekly as you discover failure modes.

The one thing to remember

The SDK is 50 lines of Python around a while loop.

Every agent app you'll ever ship is a variation on that loop. Learn it once. Ship forever. The rest (streaming, memory, cost control) is trim, not substance.