An alphabet book for bots.
Every field has its own vocabulary. The field of building with AI has more of one than most, and the words shift every six months.
This is a primer. Twenty-six letters. Twenty-six words. Each one a concept you'll meet again and again if you stick around. We kept the definitions short, the examples concrete, and the cross-references live. Scroll top to bottom or click a letter to jump.
No quiz at the end. No gating. If you can read this and recognize most of the words when you see them next, the primer did its job. When you hit something you're ready to build, head to Module 001.
A loop that decides what to do, then does it.
An agent is a program that perceives, decides, acts, and observes what happened. Then loops. The LLM handles the deciding. The tools handle the doing. The memory handles the remembering. Strip any of those three and you have something else. A chatbot, maybe. A script. Not an agent.
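The loop above can be sketched in a few lines. This is a minimal illustration, not a framework: `decide` is a hard-coded stand-in for the LLM so the example runs on its own, and `TOOLS`, `decide`, and `run_agent` are hypothetical names.

```python
from typing import Callable

# Hypothetical tools: the "doing" part of the loop.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}


def decide(goal: str, memory: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM. A real agent would call a model here."""
    if not memory:
        return ("add", goal)        # nothing observed yet: reach for a tool
    return ("finish", memory[-1])   # an observation exists: stop and answer


def run_agent(goal: str, max_steps: int = 5) -> str:
    memory: list[str] = []                    # the remembering
    for _ in range(max_steps):
        action, arg = decide(goal, memory)    # the deciding
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)      # the doing
        memory.append(observation)            # observe, then loop
    return "stopped: step limit reached"


print(run_agent("2+3"))  # prints 5
```

The step limit is a design choice worth keeping: a loop that decides for itself when to stop needs a ceiling it cannot argue with.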
You'll meet agents on every page of this publication. Building them is the point.
A deployed agent with a specific job.
"Bot" used to mean anything automated. It's settled into a meaning closer to "a focused agent pointed at a specific task." A bot writes email drafts. A bot answers support questions. A bot moves tickets. Same loop as an agent, narrower purpose.
Our namesake. The nine Bots of Today are the editorial cast, and each one has a job.
Everything the model can see at this moment.
Context is what the model reads before answering. The system prompt is context. The user message is context. Retrieved documents are context. Everything inside the model's current view is context. Everything outside is a lookup problem.
Agents stand or fall on their context strategy. Too little, the model guesses. Too much, it loses focus. Every module brushes against this trade-off.
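One way to picture the trade-off is a context builder with a budget. This is a rough sketch under assumptions: `build_context` is a hypothetical helper, the documents are assumed to arrive ranked best-first, and a character count stands in for a real token budget.

```python
def build_context(system_prompt: str, user_message: str,
                  documents: list[str], max_chars: int = 200) -> str:
    """Assemble what the model will see, leaving out documents that do not fit.

    max_chars is a crude stand-in for a token budget (an assumption).
    """
    parts = [system_prompt, user_message]
    used = sum(len(p) for p in parts)
    for doc in documents:                 # assumed ranked best-first
        if used + len(doc) > max_chars:
            break                         # budget reached: stop adding
        parts.append(doc)
        used += len(doc)
    return "\n\n".join(parts)
```

Whatever does not make it into the returned string is, for that call, outside the model's view.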
The gap between what you meant and what the model inferred.
Drift is the distance between your intent and the model's output. A tight system prompt collapses it. A loose one widens it and asks the model to guess. Drift is measurable, not mysterious. It responds to work.
Module 002 fights this one head-on. The contract closes the gap.
A test you can rerun to prove a change did what you think it did.
An eval is a small, fast, repeatable check of an agent's behavior. Input goes in. Output comes out. You score the output against what you expected. Pass or fail. Run it ten times and the pass count becomes a pass rate.
Evals are how you know your agent got better instead of just different. Without them, every prompt change is a guess. With them, you have a dashboard.
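A bare-bones eval might look like the sketch below. The `classify` function is a hypothetical agent under test (a keyword rule standing in for a model call), and the four cases are made up for illustration.

```python
def classify(ticket: str) -> str:
    """Hypothetical agent under test: a keyword rule standing in for a model call."""
    return "billing" if "invoice" in ticket.lower() else "other"


# Made-up cases: (input, expected output).
CASES = [
    ("Where is my invoice?", "billing"),
    ("Invoice total looks wrong", "billing"),
    ("The app crashes on login", "other"),
    ("I was charged twice", "billing"),   # the keyword rule misses this one
]


def run_eval(agent, cases) -> float:
    """Score each case pass or fail and return the pass rate."""
    passed = sum(1 for text, expected in cases if agent(text) == expected)
    return passed / len(cases)


print(f"pass rate: {run_eval(classify, CASES):.0%}")  # prints pass rate: 75%
```

Change the agent, rerun, compare the number. That comparison is the whole point.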
Teaching the model by showing, not telling.
A few-shot prompt includes a handful of examples of the thing you want, usually two or more. Input-output pairs, shown directly. The model picks up the shape and copies it.
Few-shot wins when the instruction is hard to write but the pattern is easy to see. Two good examples beat five mediocre ones. Every time.
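Assembling such a prompt is mostly string work. The sketch below assumes a plain `Input:` / `Output:` layout, which is one common convention and not a requirement; `few_shot_prompt` is a hypothetical helper.

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    new_input: str) -> str:
    """Build a prompt that shows input-output pairs before the real input."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End on an open "Output:" so the model completes the pattern.
    lines += [f"Input: {new_input}", "Output:"]
    return "\n".join(lines)


prompt = few_shot_prompt(
    "Label the sentiment as positive or negative.",
    [("Great service, thank you", "positive"),
     ("Still waiting after two weeks", "negative")],
    "The update fixed everything",
)
print(prompt)
```

Pass an empty list of examples and the same function produces a prompt with no examples at all.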
Tying output to a source you can check.
A grounded answer cites where it came from. An ungrounded one asks you to trust it. Grounding is how you move from "the model says" to "the model says, based on this document, here's the link."
You'll see grounding in retrieval-augmented agents, citation tools, and anywhere the cost of being wrong is real. Ungrounded is faster. Grounded is auditable. Most production work needs the second one.
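A first-pass grounding check can be mechanical. The sketch below assumes answers cite sources with bracketed ids like `[doc1]`, a convention invented for this example. It only confirms that each citation points at a source you actually supplied, not that the source supports the claim.

```python
import re

# Assumed citation format: an id in square brackets, e.g. [doc1].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def is_grounded(answer: str, sources: dict[str, str]) -> bool:
    """True if the answer cites at least one source and every cited id exists."""
    cited = CITATION.findall(answer)
    return bool(cited) and all(source_id in sources for source_id in cited)


sources = {"doc1": "Refunds are issued within five business days."}
print(is_grounded("Refunds take five business days [doc1].", sources))  # True
print(is_grounded("Refunds take five business days.", sources))         # False
print(is_grounded("Refunds take two days [doc9].", sources))            # False
```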
When the model says something confidently that isn't true.
The model doesn't know it's making things up. It's generating the most likely next words, and sometimes the most likely words don't match reality. Brand names that don't exist. Library functions that were never written. Court cases that never happened.
Hallucination is the most-quoted risk of working with LLMs. It's real. It's also containable: grounding, retrieval, and narrow tasks shrink the surface dramatically. Every module in this publication assumes you're engineering against it.
The act of running the model on one input to get one output.
Inference is what happens when you hit send. The model processes your prompt, produces a response, and stops. One call, one answer. No memory of the call carries forward unless you explicitly feed it back.
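The "no memory carries forward" part is easy to see with a toy. Here `fake_model` is a stand-in for one inference call, written so that it can only react to the prompt it is handed.

```python
def fake_model(prompt: str) -> str:
    """Stand-in for one inference call. It sees only the prompt it is given."""
    if "My name is Ada" in prompt:
        return "Your name is Ada."
    return "I don't know your name."


fake_model("My name is Ada.")                     # call one
forgotten = fake_model("What is my name?")        # call two: nothing carried over

# Feeding the earlier turn back in is what makes it "remember".
history = "My name is Ada.\nWhat is my name?"
remembered = fake_model(history)

print(forgotten)    # prints I don't know your name.
print(remembered)   # prints Your name is Ada.
```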
The word matters mostly because it separates two things: the model's training (done once, by someone else) and the model's inference (done millions of times, by you). You don't train models in this publication. You infer, then you harness.
Structured output. The shape most agents return.
JavaScript Object Notation. The format agents live and die by. Keys. Values. Arrays. Nothing fancy. Most downstream systems you wire an agent into accept JSON, and every mainstream language ships a parser for it.
If your agent returns prose when you asked for JSON, your system prompt is loose. Tighten the format specification and the escape hatch. Module 002 gets into the mechanics.
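On the receiving side, it helps to treat the model's reply as untrusted input. The sketch below uses Python's standard `json` module; the required `category` key and the `status` values are assumptions made up for this example.

```python
import json


def parse_reply(reply: str) -> dict:
    """Parse a model reply as JSON, returning an explicit error object on failure."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {"status": "unparseable", "raw": reply}      # prose instead of JSON
    if not isinstance(data, dict) or "category" not in data:
        return {"status": "wrong_shape", "raw": reply}      # JSON, but not the shape asked for
    return {"status": "ok", **data}


print(parse_reply('{"category": "billing"}'))
print(parse_reply("Sure! Here is the category: billing"))
```

Counting how often the non-`ok` branches fire gives you a cheap signal of how loose the prompt is.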
Where retrieval looks. Your documents, indexed and ready.
A knowledge base is the searchable pile of documents your agent can reach into. Product docs. Support tickets. Policy PDFs. Internal wikis. You index them once, query them at call time, feed the relevant pieces into the context.
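"Index once, query at call time" can be shown with a toy keyword index. Production knowledge bases typically use richer indexes; the two documents and the names `build_index` and `lookup` are invented for illustration.

```python
from collections import defaultdict

# Made-up documents, keyed by id.
DOCS = {
    "refunds": "Refunds are issued within five business days.",
    "shipping": "Standard shipping takes three to seven days.",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().replace(".", "").split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)          # done once, ahead of time


def lookup(query: str) -> set[str]:
    """Done at call time: return ids of documents sharing any word with the query."""
    hits: set[str] = set()
    for word in query.lower().split():
        hits |= INDEX.get(word, set())
    return hits


print(lookup("refunds"))   # prints {'refunds'}
```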
The knowledge base isn't the agent. It's the library the agent visits. Keep them separate in your head and in your architecture.
Large language model. The engine underneath every agent.
The LLM is the thing that predicts text, one token at a time, based on everything it has read. Claude is an LLM. GPT is an LLM. Llama is an LLM. They have personalities, strengths, and costs, but at the core they all do the same thing.
When people say "the model" in this publication, they usually mean the LLM. Models come and go; the patterns you'll learn here outlive any specific one.
What the agent retains between runs, and how it finds what it needs.
Memory is everything the agent knows that isn't in the current prompt. Conversation history. Stored user preferences. Prior tickets. Past actions. Each category is a different kind of memory, and each has its own shape.
The rule: if it changes per request, it belongs in memory or the user message, not the system prompt. That single split saves you from most prompt bloat. Module 002 lives in this neighborhood.
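That split can be made concrete. In the sketch below the system prompt is a constant and everything per-user lives in a separate store that is read at request time. The in-process dict and the names `remember` and `build_prompt` are hypothetical; a real agent would likely use a database.

```python
SYSTEM_PROMPT = "You are a support agent. Reply in two sentences."  # same on every call

# Hypothetical memory store: user id -> remembered facts.
memory: dict[str, list[str]] = {}


def remember(user_id: str, fact: str) -> None:
    memory.setdefault(user_id, []).append(fact)


def build_prompt(user_id: str, message: str) -> str:
    """Combine the fixed system prompt with whatever is stored for this user."""
    recalled = memory.get(user_id, [])
    notes = "\n".join(f"- {fact}" for fact in recalled) or "- (nothing stored)"
    return f"{SYSTEM_PROMPT}\n\nKnown about this user:\n{notes}\n\nUser: {message}"


remember("u1", "prefers replies by email")
print(build_prompt("u1", "Where is my order?"))
```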
Zero, one, or a few examples. Pick a number.
N-shot is the family name. Zero-shot means no examples. One-shot means one. Few-shot means a few. The N is the number, and the number matters more than it seems.
Higher N usually means better accuracy on narrow tasks, but higher cost and slower responses. Two solid examples beat eight weak ones. You're not feeding the model; you're calibrating it.
What the agent produces. Format matters as much as content.
Output is the thing you get back. If it's well-structured, the next system can use it. If it's loose prose, someone has to parse it by hand or with a fragile regex that breaks on the first edge case.
Every agent in this publication pays attention to its output shape. A correct answer in the wrong format is a bug.
What you say to the model.
A prompt is the text you send. Sometimes it's a sentence. Sometimes it's a paragraph with examples, rules, format specs, and a clean escape hatch. The prompt is the entire artifact of your side of the conversation.
"Prompt engineering" is the discipline of writing prompts that behave reliably. It sounds like a lot. It collapses into: say exactly what you want, show the shape, handle the edge case.
A single input asking a single thing.
A query is the per-request ask. One question. One thing wanted. It can stay short because the system prompt is already doing the framing.
"What are the five most-used product features this quarter?" is a query. Everything the agent needs to know about tone, format, and rules already lives in the system prompt.
Fetching the relevant bit at the moment of need.
Retrieval is the lookup step that pulls specific documents out of a knowledge base right before the model needs them. You don't send the whole library. You send the five most relevant pages.
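The "send only the most relevant pages" step is a rank-and-cut. The sketch below scores relevance by counting shared words, which is a deliberately crude assumption chosen to keep the example dependency-free; `score` and `retrieve` are hypothetical names.

```python
def score(query: str, page: str) -> int:
    """Crude relevance: number of lowercase words the query and page share."""
    return len(set(query.lower().split()) & set(page.lower().split()))


def retrieve(query: str, pages: list[str], k: int = 5) -> list[str]:
    """Return the k highest-scoring pages, best first."""
    ranked = sorted(pages, key=lambda page: score(query, page), reverse=True)
    return ranked[:k]


pages = [
    "reset your password in settings",
    "shipping takes three days",
    "password rules and reset links",
]
top = retrieve("how do I reset my password", pages, k=2)
print(top)
```

Only `top` goes into the context. The rest of the library stays on the shelf.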
Retrieval-augmented generation (RAG) is the umbrella term. RAG isn't a trick; it's the standard pattern for agents that need to ground their answers in specific material.
The contract. Read first, by the agent, always.
The system prompt is the instructions your agent reads before anything else. It sets identity, constraints, output shape, and the escape hatch. Module 002 is entirely about this one thing.
When people say an agent drifted or went off the rails, they usually mean the system prompt was loose. Tighten the contract. The behavior follows.
A function the agent can call to do something in the world.
Tools are the hands of an agent. A search tool. A calendar tool. A calculator. An API caller. Each tool has a name, a description, and a schema for what goes in and what comes out. The agent decides when to reach for one.
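Name, description, schema, function: that is a small enough shape to write down. The `Tool` class below is a hypothetical sketch, and its schema is a plain mapping of argument names to Python types. Model APIs that support tool calling commonly describe inputs with JSON Schema instead, so treat this as a simplification.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str                  # what the model reads when choosing a tool
    input_schema: dict[str, type]     # argument name -> expected Python type
    run: Callable[..., Any]

    def call(self, **kwargs: Any) -> Any:
        """Check arguments against the schema, then run the tool."""
        for arg, expected in self.input_schema.items():
            if not isinstance(kwargs.get(arg), expected):
                raise TypeError(f"{self.name}: '{arg}' must be {expected.__name__}")
        return self.run(**kwargs)


calculator = Tool(
    name="add",
    description="Add two integers.",
    input_schema={"a": int, "b": int},
    run=lambda a, b: a + b,
)

print(calculator.call(a=2, b=3))  # prints 5
```

Validating before running matters because the arguments come from a model, and a model can get them wrong.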
Most modules in this publication teach you to build or wire tools. They're where agents stop being clever and start being useful.
What just arrived. The per-request payload.
The user message is the content of the current request. In a triage agent it's the inbound ticket. In a chatbot it's what the human typed. In a background agent it's the record that triggered the run.
User message changes every call. System prompt stays the same. Keep them apart and most prompt problems resolve themselves.
The math behind retrieval.
A vector is a list of numbers that places a piece of text at a point in a many-dimensional space. Similar pieces of text have vectors that point in similar directions. Retrieval works by comparing vectors.
You don't have to understand the math to use retrieval. You do have to understand that "similarity" in this world is a geometric concept: closer vectors mean closer meaning. That's the whole frame.
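For the curious, the geometry fits in one function. Cosine similarity is a widely used way to compare vector directions. The three-number vectors below are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 for similar directions, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, not real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine(cat, kitten) > cosine(cat, invoice))  # prints True
```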
A chain of agent steps. What happens in what order.
A workflow is multiple agent moves, strung together. Triage, then draft, then send. Summarize, then categorize, then file. Each step is a call, and the output of one becomes the input of the next.
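The chaining itself is simple. In the sketch below each step is an ordinary function standing in for an agent call, and `triage`, `draft`, `send`, and `run_workflow` are hypothetical names.

```python
def triage(ticket: str) -> dict:
    """Step 1: tag the ticket. A keyword check stands in for a model call."""
    return {"text": ticket, "urgent": "down" in ticket.lower()}


def draft(triaged: dict) -> str:
    """Step 2: write a reply based on the triage result."""
    prefix = "URGENT: " if triaged["urgent"] else ""
    return prefix + "We are looking into: " + triaged["text"]


def send(message: str) -> str:
    """Step 3: pretend to send. A real step would call an email or ticket API."""
    return f"sent -> {message}"


def run_workflow(ticket: str, steps) -> str:
    result = ticket
    for step in steps:
        result = step(result)      # output of one step is input to the next
    return result


print(run_workflow("Site is down", [triage, draft, send]))
# prints sent -> URGENT: We are looking into: Site is down
```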
Single-agent workflows are straightforward. Multi-agent workflows get interesting fast. Desk III (Workflow Automation) lives here.
The moment the agent stops thinking and does.
Execute is when the agent's tool call fires and something changes in the world. An email sends. A ticket moves. A row writes to a database. Up to this moment the agent was reasoning. After this moment, something is different than it was before.
The stakes of an agent are set at execute. Build the permission boundaries (and the rollback paths) before you turn on the first real tool.
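A permission boundary and a rollback path can start this small. The sketch assumes an allowlist of actions and an in-memory ticket store, both invented for the example. Real systems would back this with proper authorization and transactional storage.

```python
# Hypothetical permission boundary: the only actions this agent may execute.
ALLOWED_ACTIONS = {"move_ticket"}

# Stand-in for the outside world.
tickets = {"T-1": "open"}


def execute(action: str, ticket_id: str, new_status: str):
    """Apply a change if permitted and return a function that undoes it."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"action not allowed: {action}")
    previous = tickets[ticket_id]
    tickets[ticket_id] = new_status        # the world changes here

    def rollback() -> None:
        tickets[ticket_id] = previous      # restore the earlier state

    return rollback


undo = execute("move_ticket", "T-1", "closed")
print(tickets["T-1"])   # prints closed
undo()
print(tickets["T-1"])   # prints open
```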
The config format that tells an agent what it is.
YAML is the lightweight structure that sits at the top of most agent definition files. Name. Description. Tools. Model choice. Everything the infrastructure needs to know is crammed into a few labeled lines of whitespace-sensitive text.
The agent file pattern in Claude Code starts with YAML frontmatter. So do many prompt libraries. It's not exciting; it's just where the wires meet.
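To make the shape concrete, here is an illustrative agent file and a few lines that split it apart. The field names follow the list above (name, description, tools, model), the values are invented, and the parser handles only flat `key: value` lines. A real project would use a proper YAML library.

```python
# Illustrative agent file: frontmatter between '---' lines, then the prompt body.
AGENT_FILE = """---
name: ticket-triage
description: Sorts inbound support tickets by urgency.
tools: search, move_ticket
model: example-model
---
You are a triage agent. Classify each ticket.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited frontmatter from the body.

    Handles flat 'key: value' lines only, not full YAML.
    """
    _, header, body = text.split("---", 2)
    config: dict[str, str] = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config, body.strip()


config, body = split_frontmatter(AGENT_FILE)
print(config["name"])   # prints ticket-triage
print(body)             # prints You are a triage agent. Classify each ticket.
```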
No examples. Just instructions and trust.
A zero-shot prompt tells the model what to do without showing any examples. "Classify this ticket." "Summarize this document." "Translate this email into German." The model figures it out from its training and your instructions alone.
Zero-shot works surprisingly often on well-defined tasks. When it doesn't, you have a decision: switch to few-shot, or tighten the instruction. Both are cheap.