Module 004 · Desk I · Agent Architecture

Memory, state, and context.

What persists, what doesn't, and where state actually lives, in 90 minutes.

90 minutes · 9 sections · ~7,500 words · Prereq: Module 003
Written for
Rookie Manager Founder

Your agent has the memory of a goldfish. Today we fix that.

When you close the tab in Module 001, the agent forgets everything. Every run starts fresh. For a daily-briefing agent, that's fine. For anything that needs to know what happened yesterday, it's a dealbreaker.

Memory is the third leg of the three-part stack. System prompt, tools, memory. We've drilled on the first two. This is the third.

By the end of this module, you'll have:

  • A mental model of the three layers of memory (conversation, session, persistent) and when to reach for each.
  • A daily-briefing agent that remembers what you briefed yesterday, stored as a plain file.
  • A set of rules for what belongs in memory versus what should be re-computed each run.
  • A light governance frame for memory at team and company scale.

Most memory problems in first-year agents come from overthinking. You don't need a vector database. You don't need an embedding pipeline. For a long time, you just need a file. We'll build exactly that.

Prereq: you shipped the daily-briefing agent in Module 001 and extended it with a tool in Module 003. If you skipped either, they take 90 minutes each, and they matter. The memory layer we add today sits on top of both.

Thinker lays out the three layers.

Memory is everything the agent retains outside the current prompt.

That's the working definition. Anything in the system prompt is not memory (it's contract). Anything in the current user message is not memory (it's the payload). Everything else the agent draws on, however it's stored, is memory.

There are three layers. You need to know which one you're building in before you start.

Layer 1. Conversation memory

The messages in the current run. The user said something. The agent called a tool. The tool returned. The agent said something back. All of that sits in the conversation array and gets sent to the model on every next call within the run.

This memory is free. You don't build it; the LLM runtime gives it to you. When the run ends, it's gone.

Ninety percent of first agents only need this layer. The daily-briefing agent from Module 001? Conversation memory is enough.

Layer 2. Session memory

State that survives the end of a single run but is scoped to a known user, conversation thread, or short window. A chatbot that remembers "you told me you prefer bullets three turns ago." A coding agent that remembers "you're working in Python 3.11 today."

You build this. A simple version is a JSON file per user. A more serious version is a database with a session key. It's readable and writable, but it's intentionally short-lived. Session memory that grows without bound becomes persistent memory in disguise, and it gets stale.

Layer 3. Persistent memory

Long-term state. What the agent learned about this user last month. The list of customers this sales agent has called. The evergreen product docs this support agent can cite.

This is where retrieval lives. You index documents. The agent queries them at the moment of need. Retrieval-augmented generation (RAG) is the umbrella term for this layer.

This layer is where most teams bolt on complexity too fast. A vector database is cool. It's also overkill for anything under ~500 documents. Start with a file. Graduate when you have to.

The rule that matters most

The north star of this module

If it changes per request, it doesn't belong in the system prompt. If it changes per user, it doesn't belong in the tool. If it changes rarely, it probably belongs in a file.

Most memory bugs come from putting state in the wrong layer. Per-request data in the system prompt makes prompts huge and stale. Per-user data in a tool call makes tools non-reusable. Start by naming the layer, then pick the storage.

What LLM decides vs. what you decide

You decide: which memory layers exist, what gets written to each, what gets read into context on each call.

The LLM decides: nothing about memory storage. The LLM only sees what you put in context. If memory is wrong, it's because you pulled the wrong thing in, not because the LLM chose badly.

Unlike tools (where the LLM picks when to call) and prompts (where the LLM interprets), memory is entirely your problem. The LLM is a consumer, not a participant. That makes memory bugs easier to debug: if the agent acted on stale information, trace back to what you put in context.

When to add each layer

The discipline: don't add a layer until you need it.

  • Start with Layer 1 only. The first three agents you build should have no session or persistent memory. Just conversation.
  • Add Layer 2 when runs repeat. If the agent gets called on the same task multiple times and should know what happened last time, add session memory. Simplest form: a file per user.
  • Add Layer 3 when you have reference material. Product docs. Support tickets. A knowledge base. Something that exists outside the agent and should inform its answers.

Skipping layers, or adding them all up front, is how agents get architecturally gnarly before they're even useful.

Talker shows how to shape what the agent sees from each layer.

Memory only matters if the agent can read it and use it well.

Three patterns for wiring memory into the prompt.

Pattern 1. The context block

When you pull memory into context, put it in its own labeled block. Don't sprinkle it through the system prompt.

User context:
- Name: Sarah Chen
- Plan: Pro (annual)
- Last ticket: 2026-04-12, resolved

Recent conversation:
- Yesterday, asked about export formats. We recommended CSV.

Task: {user_message}

One labeled block per memory source. The LLM can see what's context and what's task. You can also see it when debugging.

Pattern 2. The memory rule

Tell the agent how to use the context. A rule in the system prompt that explicitly references the memory block.

Rules:
- When the "User context" block is present, use those values
  instead of asking the user. Do not re-ask for names, plans, or
  history that is already provided.
- When "Recent conversation" is present, reference prior
  decisions if they are relevant. Do not repeat them verbatim.
- If a block is empty or missing, treat it as "unknown" and ask
  the user if you need that information.

Three rules: what to do when context is there, how to use it, what to do when it's missing. Skip any one and the LLM guesses.

Pattern 3. Summarize before inject

Raw memory is expensive. A full conversation history from the last month might be 40KB of context. Summarize it first.

Raw (bad):
  [40KB of message history]

Summarized (good):
  User has asked 12 questions in the last 2 weeks, mostly about
  export formats. They prefer CSV over JSON. They previously hit
  the quota limit on 2026-04-10.

The summary is a lossy compression of the full memory. You lose detail, you save tokens, you also focus the LLM on what matters. For most per-user memory, summaries are the right representation. Keep the raw in storage; ship the summary into context.

Putting it together

A prompt template that wires all three patterns:

You are a {agent_role}.

User context:
{user_context_summary}

Recent conversation:
{recent_summary}

Rules:
- Use values from User context without re-asking.
- Reference Recent conversation decisions when relevant.
- If any block is missing or empty, treat it as unknown.

Task:
{current_user_message}

Four slots. Three of them are memory; one is the current task. On each call, you fill the slots from your memory store and ship the full prompt to the LLM.

Build block · 4 minutes
Sketch your memory schema

For your daily-briefing agent, decide what session memory you want. The simplest version: a list of the last five documents you briefed, with one-sentence summaries.

Write the schema in plain text:

recent_briefings:
  - source: /tmp/article.md
    summary: "Anthropic released extended thinking mode..."
    timestamp: 2026-04-17

  - source: https://example.com/...
    summary: ...
    timestamp: 2026-04-18

Save this somewhere you can reference in Doer. You're not building the store yet; you're deciding the shape.

Expected output

A plain-text schema describing what session memory your agent will maintain. Three to five fields, max.

Rememberer goes deep on where this state actually lives.

Storage choices for memory, sorted by complexity.

Option 1. A plain file

A JSON file per user. A text file for shared state. A markdown file for notes the agent should reference.

When: under 100 users, under 1MB of state per user, single machine, no concurrent writes.

Pros: trivial to build, trivial to inspect, no dependencies, you can read and edit with a text editor.

Cons: doesn't scale, no indexing, no concurrent safety.

For your first agents and most solo-operator tools, this is enough. Don't be embarrassed by it; a plain file serves a surprising range of use cases.

Option 2. A lightweight database

SQLite, a small Postgres instance, even a DynamoDB table.

When: multiple users, concurrent reads and writes, need to query by fields, want transactional guarantees.

Pros: real query language, real indexing, safe concurrent access.

Cons: a second system to operate. Schemas that drift. Migration work.

Most production session memory lives here. A single SQLite file is enough for surprisingly large agent deployments.

Option 3. A vector store

Pinecone, Weaviate, Chroma, pgvector. A store that can find similar items given an embedding.

When: you have a corpus of documents (over 500 items) and the agent needs to find relevant ones at the moment of need. This is the layer 3 "persistent memory" use case.

Pros: semantic search. Finds documents by meaning, not just keyword.

Cons: embedding pipeline. Index freshness. Ranking tuning. More moving parts than a database.

Don't reach for this until you've hit the wall with Option 1 and Option 2. Most "I need a vector store" statements come from people who haven't actually tried with a plain file first.

The staleness problem

Every memory source has a freshness clock.

  • Conversation memory is always fresh (it's this run).
  • Session memory is fresh if updated this run. Stale if not.
  • Persistent memory is as fresh as your last index update. Could be hours, days, never.

If the agent acts on stale memory, the user sees it. "Thanks for your interest in our product" to a customer who canceled yesterday is the canonical staleness failure. Decide the freshness bar for each memory layer and enforce it.

Simplest freshness enforcement: timestamp every memory write. Read it. Act on fresh records. Flag or refresh stale ones.

The write conventions

When the agent writes to memory, three discipline points:

  1. Write shape is the same as read shape. If you read JSON with fields X, Y, Z, write JSON with fields X, Y, Z. Don't let the agent emit free-form text into a slot that reads structured data.
  2. Writes are timestamped. Every record gets a created_at and updated_at. No exceptions. You'll want them when you debug.
  3. Writes are bounded. Cap the memory size. For a "last N things" memory, the Nth write rotates out the oldest record. Without a cap, session memory becomes unbounded persistent memory.

Most memory-system bugs are violations of one of these three.

Doer. Build the memory layer.

Time to give the daily-briefing agent a memory.

The plan: every time the agent briefs a document, it appends a record to a file. When you ask for a brief next time, the agent reads the file, sees the last five briefings, and can reference them.

This is Layer 2, session memory, stored as a plain file. Minimum viable memory. You'll extend it later; today you ship the smallest version that works.

Build block · 12 minutes
Add a memory file to the daily-briefing agent

Step 1. Create the memory file (1 min)

In a location you'll remember, create an empty memory file:

mkdir -p ~/.bot-memory
touch ~/.bot-memory/daily-briefing.json
echo '{"recent_briefings": []}' > ~/.bot-memory/daily-briefing.json

The file exists. It holds an empty list. We'll teach the agent to read and append.

Step 2. Teach the agent to read it (3 min)

Open .claude/agents/daily-briefing.md. Add a new rule at the top of the rules block:

Rules:
- At the start of every run, read
  ~/.bot-memory/daily-briefing.json using the Read tool. This
  file contains your recent briefings.
- If the user asks "what did I brief yesterday" or similar, use
  the file to answer. Do not make up entries.
- [... your existing rules ...]

Save.

Step 3. Teach the agent to write to it (4 min)

Add the Write tool to the tools: line:

tools: Read, WebFetch, Write

Add a new rule block for writing:

After every successful briefing, append the new briefing to
~/.bot-memory/daily-briefing.json using the Edit tool.

Append format (JSON, at the end of the recent_briefings array):
  {
    "source": "",
    "summary": "",
    "timestamp": ""
  }

Cap the list at 5 entries. If the list is already at 5, remove
the oldest before appending the new one.

Save.

Step 4. Run a briefing and check memory (2 min)

In Claude Code:

Use the daily-briefing agent to summarize
https://www.anthropic.com/news

After you see the three bullets, open the memory file:

cat ~/.bot-memory/daily-briefing.json

You should see one entry with source, summary, and timestamp. If the file is still empty, your write rule isn't firing. Make it the last rule in the rules block, right before the closing instructions.

Step 5. Ask the agent to reference its own memory (2 min)

Use the daily-briefing agent to tell me what I briefed
this week.

The agent should read the memory file, see the entries, and summarize them. If it makes up briefings, the read rule isn't strong enough. Add: "Never invent briefings. Only reference entries that exist in the memory file."

Expected outcome

Agent that writes to memory after each briefing and can reference prior briefings when asked.

If something's wrong
  • Memory file not updating: check the tools: line includes Write. Check the write rule is present and specific.
  • JSON gets corrupted: the Edit tool is more reliable than Write for appending to structured files. Consider using a JSONL format (one record per line) instead of a growing JSON array.
  • Agent hallucinates entries: add a rule at the top of the prompt: "Every factual claim about prior briefings must come from ~/.bot-memory/daily-briefing.json. Quote the source or timestamp."
  • File grows unbounded: the "cap at 5" rule isn't being enforced. Move it higher in the write rule block and make it more explicit.

What you just built

An agent with Layer 2 (session) memory, stored as a flat JSON file. It writes records on success. It reads them on request. It bounds itself to 5 entries so the file doesn't grow forever.

You also proved a useful pattern: for most session memory, you don't need a database. A file plus Read/Write/Edit tools is enough.

Rookie has the three ways this breaks.

Three ways memory breaks the first time.

Failure 1. Context bloat

You add memory. The agent's context window balloons. Every call costs more. Eventually calls start failing because you exceed the context limit.

Root cause: you're shipping raw memory into context instead of a summary.

Fix: summarize before inject. The memory file can grow as large as it needs to. What goes in the prompt is a compressed version: "User has briefed 5 documents this week, most recent: X" is ~20 tokens. Shipping the raw 5-entry list is ~500 tokens. Across a year of calls, that difference is real money.

Failure 2. Stale memory acted on as fresh

The agent confidently tells the user "last week you asked about X." The user never asked about X. The memory was from two months ago, or from a different user, or got corrupted on a previous write.

Root cause: the agent treats memory as authoritative without checking freshness.

Fix: every memory read should check the timestamp. If a record is more than N days old (pick N based on use case), the agent should either refresh it, flag it, or skip it. Rule in the prompt: "Only reference memory entries less than 14 days old. Skip or flag older entries."

Failure 3. Memory that conflicts with the system prompt

You tell the agent in the system prompt "be concise." The memory file has a saved user preference: "prefers long detailed explanations." The agent gets confused and picks one at random.

Root cause: no priority rule between memory and system prompt.

Fix: in the system prompt, explicitly rank the sources.

Priority order (when instructions conflict):
1. Current user message
2. System prompt rules
3. Recent conversation
4. User preferences from memory
5. Default behavior

If a memory entry contradicts a system prompt rule, the system
prompt rule wins.

The LLM can't read your mind about which source you trust more. Tell it explicitly. This is a three-line rule that prevents a whole category of bugs.

The underlying pattern

All three failures come from the same place: memory is data the LLM consumes, and like any data feed, it needs shape, freshness, and priority rules. A memory system without those is noise in the context window.

Manager takes the team view.

Memory is the layer where your agents start storing user data. That changes the conversation.

Memory is a data classification problem

The moment you persist anything about a user, you've stepped into data classification territory. Every piece of state in your memory layer has a classification:

  • Public. The company name. The product the user is on. Usually safe.
  • Internal. Usage patterns. Preferences. Internal IDs. Needs authentication to access.
  • Sensitive. Email content. Private messages. Financial data. Needs strict access controls and likely encryption at rest.
  • Regulated. Health data, payment info, children's data. Needs specific compliance controls (HIPAA, PCI, COPPA, etc.).

Before you ship an agent with memory, run this exercise: what class of data does this agent's memory hold? If the answer is "some mix, we're not sure" the agent isn't ready to ship.

Access control at the memory layer

The agent reads memory on behalf of a user. That means the authorization check has to happen when the memory is read, not just when the agent is called.

A good pattern: every memory query is parameterized by user ID, and the query itself includes a where clause that enforces that the user ID matches the authenticated caller. In SQL:

SELECT * FROM user_memory
WHERE user_id = :user_id
  AND user_id = :authenticated_user_id

Redundant by design. If one of the checks gets bypassed, the other still protects the data.

Who owns the memory layer

The agent author thinks they own the memory. The infrastructure team thinks they own the memory. Both are partly right. The cleanest ownership model:

  • Agent author owns the schema: what fields exist, what shape.
  • Infra team owns the storage: where it lives, how it's backed up, who can access the raw store.
  • Both sign off on changes to either.

This prevents the "engineer changes the schema on a Friday, infra learns about it from an incident on Monday" failure mode.

The review checklist for memory changes

When someone on your team adds or changes memory, the PR should answer:

  1. What new data class is being stored?
  2. Who can access it?
  3. What's the retention policy?
  4. How does a user delete their data from this store?
  5. What happens to the memory when a user account is deleted?

Teams that run memory changes through this checklist end up with defensible systems. Teams that don't end up with a Slack message from legal asking "can you explain what data we have about user 12345."

Observability

Log every memory write. At minimum: user ID, timestamp, schema version, what changed. You don't need to log the content (that's its own privacy problem); you need to be able to reconstruct "what did the memory look like for user X on date Y" when you have to.

Chief takes the governance frame.

Memory is the most regulated layer of your agent stack. Most companies realize this after they've shipped, not before.

Memory is user data

Anything the agent stores about a user is user data. That triggers, depending on jurisdiction:

  • GDPR (EU): right to access, right to correction, right to deletion, data minimization, purpose limitation.
  • CCPA (California): similar rights, different enforcement.
  • PIPEDA, LGPD, and others depending on where your users live.

You don't need to be an expert on the regulations. You do need to have decided, before shipping, what the retention policy is and how a user deletes their data. If you can't answer both, the agent isn't ready.

The four questions every memory system has to answer

  1. What do we store? A documented schema, not a running inventory.
  2. Why do we store it? A stated purpose. "Because we might need it" is not a purpose.
  3. How long do we keep it? A retention policy with a number. 30 days. 6 months. "Until account deletion."
  4. How does a user delete it? A mechanism. A button in settings. A support request that reliably processes in 30 days. Something real.

The answers don't have to be fancy. They have to exist.

Cost scales with memory

Every memory lookup adds tokens to the context window. Every token in context is paid on every call. An agent with a 2KB memory payload per call, running a million times a month, is spending on that memory alone.

The question to ask: does that 2KB actually change the agent's behavior enough to pay for itself? Often the answer is no. The memory was added because someone thought it might be useful, not because it demonstrably improved the agent.

Hold memory to the same ROI bar as any other feature. If removing a memory field doesn't measurably hurt agent quality, remove it.

The breach surface

Your memory store is a target. An agent's memory is often a goldmine of user-generated content, behavioral data, and private communications. More interesting than your login database in some ways.

Defenses that matter:

  • Encryption at rest. The storage layer encrypts data on disk. Required.
  • Scoped access credentials. The agent's service account can only see the memory for users the agent is currently acting on behalf of. Not the whole store.
  • Audit logs. Every memory read and write is logged. The logs are immutable or near-immutable.
  • Backup hygiene. Backups are as protected as live data. Old backups are deleted per retention policy.

The question to ask about every agent with memory

Three things:

  • What does this agent remember, and about whom?
  • What happens if that memory leaks?
  • How does a user make it forget?

A team that can answer all three in 60 seconds has a production-ready memory system. A team that can't is one incident away from an expensive education.

Founder wraps it.

You, alone, with an agent that needs to remember things. The trap is over-engineering.

Start with a file

For solo operators, 90 percent of memory needs can be handled by a flat file or a flat folder.

A single JSON file for per-user state.
A folder of markdown notes for reference material.
A text log for the agent's action history.

That's it. No database, no vector store, no embedding pipeline. Just files the agent can read with Read and write with Edit.

You can graduate to real storage when you hit real limits: concurrency, cross-device sync, scale beyond what a file can hold. Most solo operators never hit those limits.

The memory audit habit

Once a month, look at your agent's memory file. Literally open it. Read it.

You will find:

  • Old entries that should have rotated out.
  • Garbled entries from a bug you haven't noticed.
  • Useful patterns you didn't know the agent was capturing.
  • Privacy-sensitive things you didn't realize were being stored.

Fifteen minutes a month. You'll catch problems early and you'll learn what your agent actually pays attention to.

Memory that pays for itself

The discipline: every memory field should earn its place. Ask of each one:

  • Does the agent's behavior change based on this field?
  • Would I notice if I removed it?
  • Do I understand when it gets written and when it gets read?

If the answer to any of those is "not really," remove the field. Memory that doesn't change behavior is a tax on every future call.

When to graduate

Three signals it's time to leave files behind:

  1. You have more than ~100 users and they can overlap in time.
  2. You need to query memory by non-trivial fields (not just user ID).
  3. You're running more than one instance of your agent and they need to share state.

Until you hit those, a file is the right answer. Graduate when you have to, not when you think it looks more professional.

The one thing to remember

Memory is data. Treat it like data.

Shape it, timestamp it, summarize it before you ship it to the LLM, and delete it on schedule. The agent is a consumer of memory, not its manager. That's your job.

Keep exploring
More from the library.
Browse the full catalog →