Background agents run while you sleep. Get it right and they pay for themselves forever. Get it wrong and you wake up to a bill.
So far, every agent you've built has been interactive. You invoke it. You see the output. You decide what to do next. Background agents break that pattern: they run on a schedule or a trigger, without supervision, producing output for you to review later.
This is where agents get dangerous, and also where they offer the most leverage.
By the end of 90 minutes, you'll have:
- A clear mental model of what separates a background agent from an interactive one.
- A scheduled nightly agent that summarizes your day from a log file.
- Boundaries and guardrails that prevent runaway behavior.
- An audit trail so you can trust what the agent did while you weren't watching.
We'll build a low-stakes background agent first: a summarizer that runs nightly over your activity log. The patterns port to higher-stakes background tasks (report pipelines on cron, data quality checks, monitoring) as you gain confidence.
Prereq: Module 009. Understanding report pipelines is the scaffolding for understanding background pipelines.
Thinker on what makes an agent background.
Three things make an agent "background" instead of "interactive."
Property 1. Trigger
Interactive agents are triggered by a human asking. Background agents are triggered by something else: time (a cron schedule), an event (a webhook, a new file, a database change), or a condition (a metric threshold crossed).
The trigger matters because it determines when the agent is responsible for running. A cron agent that doesn't run at 2am is broken. An interactive agent that doesn't run until you ask is fine.
Property 2. Supervision
Interactive agents have a human watching the output in near-real-time. Background agents don't. The human reads the output later, if at all.
This means background agents have to produce output that's useful and safe without a human filter. Any mistake the agent makes either ships unchecked (if nobody reviews the output) or sits unnoticed in a queue until someone does.
Property 3. Boundaries
Interactive agents can be given latitude because a human will stop them if they go off-track. Background agents need explicit boundaries: rules about when to stop, what not to do, what constitutes a success or failure condition.
Without boundaries, a background agent can loop, make repeated tool calls, or process inputs it shouldn't. For hours. While you sleep.
The three kinds of background agents
- Scheduled. Runs on a clock. Report pipelines, nightly summaries, weekly cleanups.
- Event-triggered. Runs when something happens. A new email arrives, a file is uploaded, a ticket is created.
- Condition-triggered. Runs when a threshold is crossed. Error rate spikes, queue backs up, usage exceeds budget.
Start with scheduled. They're the most predictable. The trigger fires at known times, making it easy to observe whether the agent ran, how long it took, and what it produced.
The bounded-run principle
Every background agent has explicit limits:
- Time limit. "This run will terminate after 5 minutes no matter what."
- Call limit. "This run will make at most 10 LLM calls."
- Cost limit. "This run will cost no more than $2 in inference."
- Output limit. "This run will produce at most 100KB of output."
These limits are enforced outside the agent, in the harness code. Not by asking the LLM nicely. The LLM can't reliably enforce a budget on itself.
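A minimal sketch of harness-side enforcement for two of these limits (time and output size), assuming the agent is launched as a subprocess. The names `run_bounded` and `RunResult` are illustrative and not part of any tool named in this module. Call and cost limits need hooks into whatever client makes the LLM calls, so they are left out here.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    status: str                # "OK", "ERROR", "TIMEOUT", or "OUTPUT_TOO_LARGE"
    output: str                # captured stdout, truncated if over the limit
    exit_code: Optional[int]   # None when the process was killed on timeout


def run_bounded(cmd: Sequence[str], time_limit_s: float, output_limit_bytes: int) -> RunResult:
    """Run cmd with a wall-clock limit and an output-size limit (hypothetical helper)."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the limit.
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=time_limit_s)
    except subprocess.TimeoutExpired:
        return RunResult("TIMEOUT", "", None)

    raw = proc.stdout.encode("utf-8")
    if len(raw) > output_limit_bytes:
        # Size is checked after the run finishes; keep only the allowed prefix.
        kept = raw[:output_limit_bytes].decode("utf-8", errors="ignore")
        return RunResult("OUTPUT_TOO_LARGE", kept, proc.returncode)

    status = "OK" if proc.returncode == 0 else "ERROR"
    return RunResult(status, proc.stdout, proc.returncode)


if __name__ == "__main__":
    # Stand-in command; a real harness would pass the agent invocation instead.
    result = run_bounded([sys.executable, "-c", "print('hello')"],
                         time_limit_s=300, output_limit_bytes=100_000)
    print(result.status)
```

The point of the design is that neither limit depends on the model cooperating: the process is killed at the deadline and oversized output is cut off whether or not the prompt was followed.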
The read-only first rule
From Module 003: start read-only. For background agents, this is even stricter.
Your first five background agents should not write anywhere except a single output file you review. No database writes. No API calls that change state. No emails sent. Just: read inputs, produce a file, human reads file.
Graduate to write-capable background agents only after you've run read-only ones for a month or more and trust their output.
Background agents need boundaries more than interactive ones. The human isn't watching.
Time limits, call limits, cost limits, output limits. Read-only first. Audit trails. Every runaway story starts with an agent that didn't have any of these.
Talker on prompts for unsupervised runs.
Background agent prompts are extra-tight versions of the patterns from Module 005. Because no human is there to catch drift, the prompt has to be its own safety net.
Pattern 1. The "if uncertain, stop" rule
Interactive agents can ask the user for clarification. Background agents can't. The fallback for uncertainty is to stop, log, and alert a human.
Rules:
- If the input is not what you expected, do not attempt to
proceed. Write "UNEXPECTED_INPUT" to the output file along with
the specific input that surprised you. End the run.
- If any tool call returns an error, do not retry more than once.
Write "TOOL_FAILED" with the error details. End the run.
- If you cannot complete the task in the first attempt, do not
iterate. Write "INCOMPLETE" with what you did complete.
The discipline: stop early and visibly. A run that produces "UNEXPECTED_INPUT" plus a description is much better than a run that produces plausible-looking but wrong output.
Pattern 2. The scope lock
Interactive agents can be given fuzzy tasks. Background agents need a single, narrow task.
Task: Read the log file at /var/log/my-app.log. Produce a
summary of the day's activity as three bullet points.
You will not:
- Modify any files.
- Make network calls.
- Summarize anything other than this specific log file.
- Produce output longer than 500 words.
The "you will not" list is explicit. The interactive version of this prompt might have the same instructions, but phrasing them as hard constraints (rather than "your task is...") reduces scope creep.
Pattern 3. The self-termination signal
Background agents should end their runs with an explicit signal. Not just "finish outputting." A line that says "I'm done."
Format:
[summary content]
STATUS: OK | INCOMPLETE | ERROR
END_OF_RUN
The harness watching the agent looks for "END_OF_RUN" to know the run completed. Without a termination signal, the harness can't distinguish "the agent is still thinking" from "the agent ended."
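One way the harness side of this check could look, as a sketch. It assumes the harness has the run's full output as a string; `parse_run_output` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Matches a whole line such as "STATUS: OK" or "STATUS: TOO_LARGE".
STATUS_LINE = re.compile(r"^STATUS:[ \t]*([A-Z_]+)[ \t]*$", re.MULTILINE)


def parse_run_output(text: str) -> Tuple[Optional[str], bool]:
    """Return (status, finished) for one run's output.

    finished is True only when END_OF_RUN is the last non-empty line.
    status is the value of the last STATUS line, or None if there is none.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    finished = bool(lines) and lines[-1] == "END_OF_RUN"
    statuses = STATUS_LINE.findall(text)
    status = statuses[-1] if statuses else None
    return status, finished
```

A harness built on this might treat any result other than `("OK", True)` as a run to flag for review.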
Pattern 4. The logging-first rule
Every background agent run should log:
- Start time, end time, duration.
- Input summary (what it read).
- Output summary (what it wrote).
- Any tool calls made.
- Final status (OK, INCOMPLETE, ERROR).
This is the audit trail. Rememberer covers where it lives.
The full background-agent prompt
# Identity
You are a nightly log-summarizer agent.
# Capability
You read a specific log file and produce a three-bullet summary
of the day's activity. You run on a schedule, without human
supervision.
# Scope lock
You will only:
- Read /var/log/my-app.log
- Produce output as specified below
You will not:
- Modify any files
- Make network calls
- Read any other files
# Output format
Day summary ({date}):
- [first bullet]
- [second bullet]
- [third bullet]
STATUS: OK
END_OF_RUN
# Stop conditions
- If /var/log/my-app.log is empty or missing, write:
"Day summary ({date}): (no activity logged)"
STATUS: EMPTY
END_OF_RUN
- If the log file is over 10MB, do not attempt to summarize it.
Write "STATUS: TOO_LARGE" and end.
- If any line in the log seems malformed, include it in a
separate "anomalies" section and write STATUS: PARTIAL.
Note the defensive posture throughout. Every clause is about when to stop, what not to do, what to write when something's off. No ambiguity about scope.
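The empty, missing, and oversized cases can also be checked deterministically in the harness before any LLM call is made, as a complement to the prompt's stop conditions rather than a replacement. A minimal sketch, assuming the input is a local text file; `precheck_input` is a hypothetical name, and the statuses reuse the ones from the prompt.

```python
import os
from typing import Optional


def precheck_input(path: str, max_bytes: int) -> Optional[str]:
    """Return a stop status if the input should not go to the agent, else None."""
    if not os.path.isfile(path):
        return "EMPTY"          # missing file: same status the prompt uses
    if os.path.getsize(path) > max_bytes:
        return "TOO_LARGE"      # checked before reading, so big files are never loaded
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        if not handle.read().strip():
            return "EMPTY"      # zero bytes, or whitespace only
    return None
```

When this returns a status, the harness can write the matching stop output itself and skip the model call entirely, which also saves the cost of the run.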
For a background agent you'd like to build, write down:
- Three conditions where the agent should stop and not retry.
- The exact output the agent should produce in each stop condition.
- The hard limits (time, calls, output size).
These become the defensive spine of your prompt.
A list of stop conditions with specific responses. Saved for Doer.
Rememberer on audit trails.
Background agents live or die by their audit trail. Without one, you can't investigate anything.
The minimum viable audit trail
For every background agent run, log at least:
{
  "run_id": "uuid-here",
  "agent": "nightly-summarizer",
  "started_at": "2026-04-18T02:00:00Z",
  "ended_at": "2026-04-18T02:00:42Z",
  "status": "OK",
  "input_file": "/var/log/my-app.log",
  "input_size_bytes": 12480,
  "output_file": "/var/reports/2026-04-18.md",
  "output_size_bytes": 340,
  "tool_calls": ["Read"],
  "llm_calls": 1,
  "cost_usd": 0.012
}
One JSON object per run. Appended to a log file. You can query it later.
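A sketch of how a harness might build and append such a record, using the field names from the example above. The helper names are hypothetical, and the timestamps are assumed to be timezone-aware `datetime` objects.

```python
import json
import os
import uuid
from datetime import datetime, timezone


def build_run_record(agent, started_at, ended_at, status, input_file,
                     output_file, tool_calls, llm_calls, cost_usd):
    """Assemble one audit record; sizes are read from disk (0 if the file is absent)."""
    def size(path):
        return os.path.getsize(path) if os.path.isfile(path) else 0

    def iso(moment):
        # Assumes a timezone-aware datetime; normalizes to UTC with a trailing Z.
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "run_id": str(uuid.uuid4()),
        "agent": agent,
        "started_at": iso(started_at),
        "ended_at": iso(ended_at),
        "status": status,
        "input_file": input_file,
        "input_size_bytes": size(input_file),
        "output_file": output_file,
        "output_size_bytes": size(output_file),
        "tool_calls": list(tool_calls),
        "llm_calls": llm_calls,
        "cost_usd": cost_usd,
    }


def append_run_record(log_path, record):
    """Append the record as a single line, so the log stays valid JSONL."""
    os.makedirs(os.path.dirname(os.path.abspath(log_path)), exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Writing one compact line per run keeps the file appendable and easy to filter with line-oriented tools.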
Where the log lives
A flat file is fine for personal or low-volume use. Something like ~/.bot-logs/runs.jsonl, one record per line.
For team deployments, the log goes to your central logging system (CloudWatch, Datadog, Splunk, whatever you use). Background agents are infrastructure; their logs belong alongside your other infrastructure logs.
What to alert on
Not every log entry is interesting. Three conditions that should trigger an alert:
- Run didn't happen when scheduled. Cron failed, agent crashed, infrastructure issue.
- Run completed with status other than OK. Agent hit a stop condition or errored.
- Run cost exceeded threshold. Either this agent is drifting expensive, or there's a bug.
Everything else is background noise. Don't alert on normal runs; you'll stop paying attention.
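The three alert conditions can be expressed as one small check over the audit records. This is a sketch under assumptions: records follow the JSON format shown earlier, `started_at` is in UTC so a day can be matched by its `YYYY-MM-DD` prefix, and `alerts_for_day` is a hypothetical name.

```python
from typing import Dict, List


def alerts_for_day(records: List[Dict], agent: str, day: str,
                   cost_threshold_usd: float) -> List[str]:
    """Return alert messages for one agent on one day (day is "YYYY-MM-DD")."""
    todays = [r for r in records
              if r.get("agent") == agent
              and str(r.get("started_at", "")).startswith(day)]
    if not todays:
        # Condition 1: the run didn't happen when scheduled.
        return [f"{agent}: no run recorded for {day}"]

    alerts = []
    for record in todays:
        run_id = record.get("run_id")
        status = record.get("status")
        cost = record.get("cost_usd", 0.0)
        if status != "OK":
            # Condition 2: the run finished with a status other than OK.
            alerts.append(f"{agent}: run {run_id} ended with status {status}")
        if cost > cost_threshold_usd:
            # Condition 3: the run cost more than the threshold.
            alerts.append(f"{agent}: run {run_id} cost ${cost:.2f}")
    return alerts
```

A normal run returns an empty list, so nothing is sent and the alert channel stays quiet.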
The output retention policy
Background agents produce output. That output accumulates. Decide retention early:
- Daily outputs: keep 90 days, archive beyond.
- Weekly outputs: keep 1 year.
- Quarterly outputs: keep forever.
Without a retention policy, your disk fills up with 2-year-old nightly summaries nobody will ever read. Set the policy at build time; implement the cleanup as part of the background agent pattern.
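A sketch of the cleanup step for the daily tier, assuming outputs are plain files in one directory and that file modification time is a good enough proxy for age. `find_expired` is a hypothetical helper; it only lists candidates, so archiving or deleting stays an explicit decision by the caller.

```python
import os
import time
from typing import List, Optional


def find_expired(output_dir: str, keep_days: int,
                 now: Optional[float] = None) -> List[str]:
    """List files in output_dir last modified more than keep_days ago."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400   # 86400 seconds per day
    expired = []
    for name in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            expired.append(path)
    return expired
```

For the 90-day policy above, the caller would pass `keep_days=90` and then move each returned path into an archive location.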
The review cadence
Background agents aren't "set and forget." They need review.
- Daily (for the first week): read every output. Look for weirdness. Tune the prompt.
- Weekly (ongoing): sample 3-5 outputs from the week. Spot-check.
- Monthly: review the logs. Any runs that didn't complete normally? Any cost anomalies?
If you can't commit to at least the weekly review, don't ship the background agent yet. Unreviewed background agents rot.
Doer.
Build a scheduled nightly summarizer. It reads a log file, produces a bullet summary, and logs its own audit trail.
Step 1. Create the target log (1 min)
Make a sample log file to summarize. /tmp/daily-log.txt:
09:15 - Deployed v1.42 to production
10:30 - Code review on PR #487 (approved)
11:00 - Standup meeting
13:00 - Lunch
14:00 - Investigated flaky test in auth module, root-caused
15:30 - Wrote RFC for new memory layer
16:00 - Paired with Jamie on the payments bug
17:45 - Wrapped up
Step 2. Create the nightly agent (3 min)
Create .claude/agents/nightly-summarizer.md:
---
name: nightly-summarizer
description: Reads /tmp/daily-log.txt and produces a three-bullet summary. Runs unsupervised on a nightly schedule.
tools: Read
---
You are a nightly log-summarizer agent.
Task: Read /tmp/daily-log.txt. Produce a three-bullet summary
of the day's activity.
Scope lock:
- You will only read /tmp/daily-log.txt.
- You will only produce the output format below.
- You will not modify any files or make network calls.
Output format:
Day summary ({date}):
- [first bullet]
- [second bullet]
- [third bullet]
STATUS: OK
END_OF_RUN
Stop conditions:
- If /tmp/daily-log.txt is empty or missing, output:
Day summary ({date}): (no activity logged)
STATUS: EMPTY
END_OF_RUN
- If the file is over 100KB, output:
STATUS: TOO_LARGE
END_OF_RUN
- If any line is severely malformed, include it in an anomalies
section and mark STATUS: PARTIAL.
Every run must end with a STATUS line followed by END_OF_RUN. No
exceptions.
Step 3. Run it manually first (2 min)
Use the nightly-summarizer agent.
Expected output: three bullets about the day, STATUS: OK, END_OF_RUN. Verify the format is exact.
Step 4. Test empty-file stop condition (1 min)
: > /tmp/daily-log.txt   # truncate to zero bytes (echo "" would leave a newline in the file)
# then run the agent again
Expected: "(no activity logged)", STATUS: EMPTY, END_OF_RUN. The agent should NOT invent a summary.
Step 5. Set up the schedule (2 min)
For a real deployment, you'd use cron. For this build block, write a shell script that you can run manually or schedule with cron:
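mkdir -p ~/bin
cat > ~/bin/run-nightly-summary.sh << 'SCRIPT'
#!/bin/bash
# Harness for the nightly summarizer: run the agent, save its output,
# then append one audit record per run.
DATE=$(date +%Y-%m-%d)
OUTPUT_DIR="$HOME/.bot-outputs/nightly-summary"
LOG_DIR="$HOME/.bot-logs"
OUTPUT_FILE="$OUTPUT_DIR/$DATE.md"
mkdir -p "$OUTPUT_DIR" "$LOG_DIR"
START=$(date +%s)
# stdout and stderr both land in the output file, so failures are visible there.
claude -p "Use the nightly-summarizer agent." \
  > "$OUTPUT_FILE" 2>&1
EXIT_CODE=$?
END=$(date +%s)
# Record output size too: "the agent ran" and "the agent produced output" differ.
OUTPUT_BYTES=$(wc -c < "$OUTPUT_FILE" | tr -d ' ')
echo "{\"date\":\"$DATE\",\"duration_s\":$((END-START)),\"exit_code\":$EXIT_CODE,\"output_bytes\":$OUTPUT_BYTES,\"output\":\"$OUTPUT_FILE\"}" \
  >> "$LOG_DIR/runs.jsonl"
SCRIPT
chmod +x ~/bin/run-nightly-summary.sh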
Step 6. Add to cron (1 min)
crontab -e
# Add this line:
0 23 * * * ~/bin/run-nightly-summary.sh
This runs the summarizer at 11 PM every night.
Step 7. Monitor for a week (passive)
Each morning, open ~/.bot-outputs/nightly-summary/YYYY-MM-DD.md and read the summary. Keep notes on:
- Did it run? (audit log should have an entry)
- Is the output format correct?
- Does the summary make sense?
A background agent that runs nightly without supervision, produces a bounded output, logs its runs, and has explicit stop conditions for anomalies.
- Cron didn't run: check /var/log/system.log (macOS) or /var/log/syslog (Linux) for cron errors. Common: path issues, permissions.
- Output format drifts between runs: the prompt isn't being followed. Tighten the scope-lock and output-format sections.
- Agent processes wrong file: scope-lock rule wasn't specific enough. Hardcode the exact path.
- Runs take forever or hang: add a timeout wrapper around the claude call (timeout 300 claude -p ...).
What you built
A production-pattern background agent. Scheduled trigger. Scoped task. Explicit stop conditions. Audit trail. Bounded output.
The nightly-summarizer is a toy. The pattern is not. Apply this same shape to:
- Weekly metrics reports (cron-scheduled, reads data, produces report file).
- Error log summarization (every hour, reads error log, flags anomalies).
- Sales pipeline summaries (daily, reads CRM export, produces digest).
All use the same skeleton: scope lock, stop conditions, STATUS line, END_OF_RUN signal, audit log entry.
Rookie has the three ways background agents blow up.
Three ways background agents go bad.
Failure 1. The runaway agent
The agent hits an unexpected input. Instead of stopping, it tries variations. Makes 20 tool calls. Loops for an hour. You wake up to an inflated bill.
Root cause: the prompt didn't include explicit stop conditions, or the harness didn't enforce limits outside the prompt.
Fix both. The prompt says "if uncertain, stop and write UNEXPECTED_INPUT." The harness says "kill this process after 5 minutes no matter what." Defense in depth. The LLM is not reliable at self-limiting; the harness has to back it up.
Failure 2. Silent failures
The cron job runs. Exits with status 0. Produces no output. For a week. You don't notice because nothing alerted you.
Root cause: your audit logging only captured "run completed," not "produced expected output."
Fix: the audit log entry includes output size. An alert fires if output size is zero or dramatically smaller than historical norm. "The agent ran" and "the agent produced useful output" are different things; monitor both.
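A minimal sketch of that output-size check. The 25% cutoff is an arbitrary illustrative default, not a recommendation from this module, and `output_size_alert` is a hypothetical name.

```python
from statistics import median
from typing import Sequence


def output_size_alert(size_bytes: int, history: Sequence[int],
                      min_ratio: float = 0.25) -> bool:
    """True if this run's output is empty or far below the historical median."""
    if size_bytes == 0:
        return True             # ran but produced nothing
    if not history:
        return False            # no baseline yet, nothing to compare against
    return size_bytes < min_ratio * median(history)
```

The median is used rather than the mean so that one unusually long summary in the history does not shift the baseline.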
Failure 3. Drift without detection
The agent's output quality degrades over time. The prompt didn't change. The model changed (provider pushed an update). The outputs look fine at a glance but are subtly worse. By the time someone notices, a month's worth of outputs are marginally wrong.
Root cause: background agent outputs aren't evaluated the way interactive agent outputs are.
Fix: run the eval suite (Module 006) against background agent outputs regularly, even if the inputs didn't change. A weekly eval that compares this week's agent behavior to baseline catches model drift before it accumulates.
The production readiness checklist
Before wiring any background agent to a real trigger:
- Explicit stop conditions in the prompt (3+).
- Time, call, and output limits enforced in the harness.
- Audit log captures input size, output size, duration, status.
- Alerting on: missed runs, non-OK status, output size anomalies, cost anomalies.
- Weekly eval against baseline (for drift detection).
- Weekly or daily human review (at least for the first month).
Six items. Skipping any of them is how background agents become incidents.
Manager takes the team view.
Background agents are infrastructure. They need the same governance rigor.
The background agent registry
Every background agent your team runs is listed in a registry. A simple spreadsheet or markdown file:
| Agent              | Owner | Trigger      | Cost/run | Status  |
| ------------------ | ----- | ------------ | -------- | ------- |
| nightly-summarizer | Aycee | cron 23:00   | $0.01    | running |
| weekly-metrics     | Sam   | cron Mon 9am | $0.08    | running |
| ticket-monitor     | Jo    | webhook      | $0.03    | paused  |
The registry answers "what's running in our environment." Without it, background agents accumulate. After 18 months, nobody knows which ones are still useful, which are wasting money, which are broken.
The quarterly background agent review
Once a quarter, the team goes through the registry:
- Is this agent still useful?
- Does its output still match what we need?
- What's the cost trend? (Running cost going up without value going up is a red flag.)
- Any that should be retired?
Twenty minutes. Catches the "agent nobody uses but we're still paying for" problem before it compounds.
Shared infrastructure for background agents
At team scale, each agent shouldn't reinvent the harness. Build shared infrastructure:
- Shared harness: one Python library or shell framework that every background agent uses. Handles logging, timeout enforcement, retry policy, output capture.
- Shared dashboard: a single page showing all background agents' recent runs, costs, statuses.
- Shared alerting: one pattern for "agent didn't run," "agent errored," "agent exceeded cost." Every new agent plugs in, doesn't build its own.
This infrastructure pays for itself after 3-5 agents. By agent 10, you'd be drowning without it.
The ownership transfer
When the person who built a background agent leaves the team, what happens? Ideally: nothing.
The agent is documented in the registry. Its prompt is in a shared repo. Its runbook says how to investigate failures. The new owner picks it up without needing a knowledge-transfer session.
Without this discipline, departures create "abandoned agents" that nobody wants to touch. They stay running because it's scary to turn them off. They waste money until someone finally audits.
The change control for schedules
Changing when an agent runs (schedule, trigger) is a change. It goes through review like any other change.
Not "Bob decided Tuesday he'd move the run to 3am." Rather "Here's the PR: we're changing the cron from 23:00 to 03:00. Reason: 23:00 overlaps with nightly backup jobs. Effect: outputs available 4 hours later, morning review still viable."
Small change. Review still happens. Documented. Traceable. When something goes wrong two weeks later, you can see what changed and why.
Chief on when not to.
Background agents are high-leverage and high-risk. The leverage is why you want them. The risk is why most companies should deploy them more carefully than they do.
When NOT to use a background agent
Three categories where the answer is no, or at least "not yet."
High-stakes individual outputs. An agent that makes a $10K decision per run should not run in the background. Each decision deserves a human review before execution. Keep it interactive, with a human in the loop.
Customer-facing without review. An agent that replies to customers or takes actions on customer accounts without human review is a customer-experience risk. The cost of one wrong action can exceed the value of a thousand right ones.
Novel or rapidly-changing tasks. Tasks where what "correct" looks like is still evolving. Background agents bake in assumptions. If those assumptions are in flux, the agent's output will drift away from what you want, and you won't notice until the drift is large.
The five-question test
Before approving a background agent for production, the team answers:
- What's the worst thing this agent could do in an hour of runaway?
- How much would that cost to recover from?
- How would we detect it?
- How quickly could we stop it?
- Is the upside of this agent running unsupervised worth that risk?
If the answer to the first question includes "sending wrong emails to customers" and the answer to the fourth is "hours, because that's how long it would take to notice," the agent isn't ready for production. Either tighten the scope or keep a human in the loop.
Cost governance for background agents
Background agents can quietly run up bills. Three policies:
- Per-agent budget cap. A hard limit in infrastructure, not a goal in a doc. If the agent exceeds its monthly budget, it stops running. Someone investigates.
- Per-run cost ceiling. A single run can't cost more than X. Enforced in the harness.
- Aggregate monitoring. All background agent cost visible on one dashboard. Trends tracked monthly.
Without these, background agents can, and do, eat budgets in quiet ways that don't show up until the finance team asks why the AI bill doubled.
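The first two policies can be sketched as gates the harness evaluates, assuming each audit record carries `agent`, `started_at` (UTC, ISO format), and `cost_usd` as in the audit-trail example. The function names are hypothetical.

```python
from typing import Dict, Iterable


def month_spend(records: Iterable[Dict], agent: str, month: str) -> float:
    """Sum cost_usd for one agent in one month (month is "YYYY-MM")."""
    return sum(r.get("cost_usd", 0.0) for r in records
               if r.get("agent") == agent
               and str(r.get("started_at", "")).startswith(month))


def may_run(records: Iterable[Dict], agent: str, month: str,
            monthly_cap_usd: float) -> bool:
    """Checked before each run: False once the agent has reached its monthly cap."""
    return month_spend(records, agent, month) < monthly_cap_usd


def within_run_ceiling(run_cost_usd: float, ceiling_usd: float) -> bool:
    """Checked after each run: False if a single run cost more than allowed."""
    return run_cost_usd <= ceiling_usd
```

Because `may_run` is evaluated by the harness before the agent starts, an agent over budget simply doesn't run until someone raises the cap or investigates.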
The audit story
If asked "what is this agent doing," the answer should be available in under a minute. The registry. The logs. The prompt. The runbook.
If the answer is "uh, I'd have to go find the engineer who built it," you have an audit problem. Regulated industries especially need this: an auditor asking about AI-in-production gets a documented answer, not a scavenger hunt.
Kill-switch discipline
Every background agent has a documented way to turn it off quickly. A single command. A single config change. No need to understand the agent's internals.
In an incident, the first move is to stop the agent. The second move is to investigate. If stopping requires investigation first (because no one knows how to turn it off), you've inverted the incident response order. Bad.
The three governance items
For any organization running background agents:
- A registry of all background agents in production. Updated within 24 hours of deployment or retirement.
- A kill-switch runbook for each. One page per agent.
- A monthly cost and status review. On the record, team-visible.
Three items. Not onerous. The difference between agents-as-controlled-leverage and agents-as-shadow-infrastructure.
Founder wraps it.
For a solo operator, background agents are where the work gets real leverage. Also where solo operators most often get burned.
Your first three background agents
Three candidates, roughly in order of safety:
- A nightly summarizer of your activity. Reads your log/notes. Produces a summary. Read-only, low stakes, low cost.
- A weekly metrics report. Reads a data file. Produces a markdown report. Same pattern as Module 009.
- A scheduled inbox digest. Classifies the previous day's emails. Produces a prioritized list. Does not reply.
All three are read-only. All three produce output you review. None of them take actions on your behalf.
Resist the temptation to skip to "agent that replies to my emails automatically" or "agent that pays invoices under $X." Those are graduation exercises; you don't start with them.
Your personal background infrastructure
One folder structure works for everything:
~/.bot-infrastructure/
  agents/     (subagent files)
  outputs/    (agent outputs, subfolder per agent)
  logs/       (audit trail JSONL)
  scripts/    (cron-invoked shell scripts)
  runbooks/   (one MD per agent: what it does, how to stop it)
Set this up once. Every new background agent plugs in. You don't reinvent for each.
The Monday morning ritual
Every Monday, ten minutes:
- Check each background agent's last-run status.
- Read one output from each, randomly chosen from the week.
- Glance at the cost log.
If everything's fine, move on. If something's off, pause the agent, investigate. Ten minutes a week keeps your fleet healthy and builds trust that your agents are actually doing what you think they're doing.
Kill switches you can actually hit
For each background agent, know the one-line command to turn it off:
crontab -l | grep -v "run-nightly-summary.sh" | crontab -
Or, better, keep a script:
~/.bot-infrastructure/scripts/disable-agent.sh nightly-summarizer
If you can't remember the kill command in under 30 seconds, you have a kill-switch problem. Fix it before the problem fixes you.
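One possible shape for the logic inside such a disable script, sketched as a pure function over crontab text. This is an assumption about how a kill switch could work, not the contents of any existing script. It comments lines out instead of deleting them, so re-enabling is a matter of removing the prefix.

```python
from typing import Tuple


def disable_in_crontab(crontab_text: str, script_name: str) -> Tuple[str, int]:
    """Comment out every active crontab line that mentions script_name.

    Returns (new_crontab_text, number_of_lines_disabled).
    """
    new_lines = []
    disabled = 0
    for line in crontab_text.splitlines():
        already_commented = line.lstrip().startswith("#")
        if script_name in line and not already_commented:
            new_lines.append("# DISABLED " + line)
            disabled += 1
        else:
            new_lines.append(line)
    return "\n".join(new_lines) + "\n", disabled
```

In use, the input would be the output of `crontab -l` and the result would be piped back through `crontab -`, the same two commands as the one-liner above. A returned count of zero is worth surfacing: it means the agent was not scheduled under that name.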
The monthly cost check
First of the month, look at last month's total AI cost by agent. You're looking for anomalies: an agent that used to cost $3 and now costs $30. Investigate immediately.
Steady state on an established agent: costs should be flat or trending down as you tune the prompt for efficiency. Growing cost without growing value is a signal to intervene.
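The month-over-month comparison can be sketched as follows, again assuming audit records with `agent`, `started_at`, and `cost_usd` fields. The growth factor of 2 is an illustrative default, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cost_by_agent(records: Iterable[Dict], month: str) -> Dict[str, float]:
    """Total cost_usd per agent for one month (month is "YYYY-MM")."""
    totals = defaultdict(float)
    for record in records:
        if str(record.get("started_at", "")).startswith(month):
            totals[record.get("agent", "unknown")] += record.get("cost_usd", 0.0)
    return dict(totals)


def cost_anomalies(previous: Dict[str, float], current: Dict[str, float],
                   factor: float = 2.0) -> List[str]:
    """Agents whose monthly cost grew by more than `factor` since last month."""
    return sorted(agent for agent, cost in current.items()
                  if previous.get(agent, 0.0) > 0
                  and cost > factor * previous[agent])
```

Agents with no cost in the previous month are skipped here, since there is no baseline to compare against; a new agent's first month is better reviewed by hand.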
Background agents work great until they don't. The moment they don't is when the human isn't watching.
Stop conditions in the prompt. Hard limits in the harness. Audit trails on every run. Kill switches you've tested. Ship with all four or don't ship yet.