tropo

Step 03 · The Agentic Builder Series

Knowledge, Memory and Making it Durable


Step 3 of Becoming an Agentic Animal. What to learn, why it matters, and how to do it with Tropo.

In Step 2 you set up your world: a harness you control, written in a protocol you can read, extended with tools you can write. It is a real room now. But it has the hole we warned you about at the end of the last step. Close the session, and the agent forgets everything it just learned about that room. Open a new one tomorrow and you are back with the brilliant stranger from Step 1, the one who has never met your project.

This step closes the hole. The word for what closes it is durability, and it has two halves. The first is memory: a place outside the model where what an agent learns gets written down and read back, so knowledge survives the session. The second is lifecycle: a way for an agent, its identity and its accumulated context, to survive even when the session ends, the context window fills, or the model itself gets swapped underneath it. Memory makes an agent remember across a day. Lifecycle makes a line of agents remember across months. Together they are the difference between a folder of chat logs and an institution that compounds.

What to learn

Start from the fact that broke the room. The model is stateless. Underneath the chat window, the model does not retain anything between calls. Each turn, it is handed a pile of text, it predicts the next piece, and then it lets go. The sense that "the agent remembers our conversation" is an illusion the harness creates by feeding the whole transcript back in every turn. The moment that transcript ends, the remembering ends with it. Nothing in the model persists. If you want persistence, you build it, in files, outside the model.

Your harness already does a little of this for you, and it is worth seeing both what it gives you and where it stops. Claude Code, for example, keeps some memory and context in a hidden .claude/ folder, so a few preferences and notes do carry from one session to the next. That is a real start. But it is hard to see, hard to edit, and tied to that one tool on that one machine. You cannot easily read what is in it, decide what is allowed into it, or pick it up and carry it to another model or another harness. Real durability is memory you own in the open: plain files you can inspect, change, govern, and move anywhere.

That reframes memory as an engineering problem with a clean shape. You need three things: a place to capture what happens, a way to recall the right piece of it later, and a way to carry the essentials forward when a session or a generation ends. Capture is writing it down. Recall is finding it again when you have forgotten where you put it. Carry is the handoff: the short, dense letter one agent leaves the next so the next one wakes up oriented instead of blank.

The second thing to learn is that durability is not one event, it is a lifecycle. An agent is born (it activates, reading who it is before it does any work), it lives (it does the work and records what it learns), and it retires (it hands off and closes out). Done well, the identity outlives any single instance of it. The model running underneath can change from one generation to the next, and the agent is still recognizably the same agent, because what makes it that agent, its instructions, its memory, its accumulated judgment, lives in files, not in the model.

Why it matters

Without durable memory you become the glue. You re-explain your project every morning. You paste the same context into every new session. You are the only thing that remembers what was decided last week, and the moment you are not in the room, the thread is lost. This is the exact pain the pillar named: knowledge scattered across places, agents forgetting between sessions, the human hand-carrying context from one tool to the next. More tools do not fix it. Memory does.

And the cost of getting it wrong is worse than starting over, because half-memory is more dangerous than none. An agent that finds a fragment of the old context will confidently build on the fragment, the way Step 1 warned it would. We proved this on our own studio this month. We went looking through our crew's memory for work we knew we had done on a feature we called the graph view, and the search came back with zero hits, for a thing that had genuinely been built, because the record of it was scattered and the search only looked in one place. Worse, search by the wrong word and you get nothing even when the memory is right there: query our logs for "market map" and you get zero, while the same work under a different word returns two dozen results. The lesson is that capture and recall are two different problems, and you have to design for both.

The good news is that this is a solved problem. The best example is a 2023 Stanford and Google paper, Generative Agents, which gave twenty-five simulated characters a memory stream, an append-only record of everything they observed, and then retrieved from it on demand by scoring each memory on three axes: recency, importance, and relevance. On top of that it added reflection, a periodic step where the agent distills its raw observations into higher-level conclusions and stores those too. Capture broadly, recall by relevance, distill upward. That trio is the spine of almost every serious memory design, including ours.

There is an even older idea underneath it, borrowed from operating systems. The 2023 MemGPT paper framed an LLM as something that needs virtual memory: a small, fast working set in the context window, which it likens to RAM, backed by effectively unlimited external storage, which it likens to disk, with the agent paging information between the two as needed. Durability is just the disk tier. Your context window is precious and small; your filesystem is cheap and vast; the skill is deciding what lives where. Dex Horthy's "12-Factor Agents," the field guide we leaned on in Step 2, says the same thing in operator's terms: own your context window (Factor 3), compact history when it grows (Factor 9), and treat the agent as a stateless reducer over durable external state (Factor 12). The durable state lives outside the model. The model just reduces over it.

The deepest part of getting memory right is one decision about when to be rigorous, and it is worth slowing down for. We frame it as an asymmetry:

  • Capture is irreversible. A memory you never wrote down is gone forever. You cannot go back and recover a thing that was never recorded.
  • Recall is infinitely improvable. Keyword search today, semantic search tomorrow, a smarter curator after that, a query you have not even thought of yet next year, all of it run over the same recorded history, as long as the history is there.

So the discipline goes on capture. Write things down comprehensively and in one findable place now, because that is the part you cannot take back, and let recall get better forever. This is not a new idea either. It is event sourcing, a pattern Martin Fowler described back in 2005: capture every change as an entry in an append-only log, treat that log as the single source of truth, and derive every current view of the world as a projection you can always rebuild by replaying the log. Get the log right and you can regenerate any view of it, including views you have not invented yet.

How to do it with Tropo

A Tropo studio treats both halves of durability as first-class. Here is the real machinery.

Memory the agent reads first. Every agent in the studio has a memory file, and reading it is the first real thing the agent does when it boots, before any work. It is plain markdown, and the top of it is a small, curated set of the things that matter most right now, written so the agent wakes up smart. A real line from our strategist's memory, the kind of hard-won rule that an agent should never have to learn twice:

## Top-of-Mind
- BIAS QUALITY OVER SPEED. We are building an operating studio
  for thousands, not a vibe-coded app.

That rule is in the file because a human taught it to the crew, and now every future generation reads it on the way in. That is what memory is for: a lesson learned once becomes a lesson the whole lineage keeps.

Capture-first, event-sourced, never deleted. Underneath the curated surface is the raw record, and the rule on it is strict: it is append-only, and nothing is ever deleted. When memory gets distilled, the distillation does not erase the originals; it just drops a marker that says "everything above here has been folded; the next pass starts below." Here is an actual boundary marker our strategist wrote into its own memory log this very session, after folding a batch of older notes forward:

{"kind": "fold-boundary", "generation": "G81",
 "entries_before_boundary": 148,
 "note": "Consumed the unfolded notes into Top-of-Mind. Append-only;
          the log is never cleared; the next fold reads after this line."}

The fold makes the agent wake up sharp (a small surface, the essentials). The raw log keeps the agent complete (nothing thrown away, ever). You need both, because the day will come when you need a detail the fold dropped, and your only hope is that the raw record still holds it.

Recall you can actually run. Because the record is plain text in a queryable index, recall is a real query, not a hope. The same five-line search from Step 2 is how an agent finds something in its own past:

sqlite3 vault/00-index.sqlite \
  "SELECT uid, title FROM entries_fts WHERE entries_fts MATCH 'living transfer'"

Today that is keyword search, and it has the limits we showed you (ask for "market map," miss the work filed under another word). Tomorrow it is semantic search over the same log, and the miss goes away, with no loss of history, because the history was captured. That is the asymmetry paying off: we improve recall on a schedule, and we never have to apologize to the past for it.

The lifecycle: born, living, retired. Memory carries an agent across a day. The lifecycle carries it across generations. In a Tropo studio an agent does not run forever; it lives for a session, and a fresh one takes over, the way Anthropic's own engineering work on long-running agents describes bridging sessions with durable artifacts so a new agent can pick up the state of the work with a clean context window. Three moments make it work.

Activation. A new agent boots by reading its identity before it touches anything: its character, its role, and its memory, in that order. Two rules are enforced as hard gates, not suggestions. The first: there can only be one live generation of an agent at a time, so two copies never act at once. The second: each new generation must be exactly the previous one plus one, so the line is unbroken and you can always trace it back. The studio checks both before it lets the agent open for business.

The living transfer. In a Tropo studio an agent does not run until it breaks. As its context window fills toward full, it winds down on a deliberate retirement protocol. (Running a crew, and the full lifecycle around it, is its own step later in this series; here you just need the shape of it.) The last thing an agent writes before it closes is a letter to its successor. Not a status report, a hand-off: here is what I did, here is what is half-finished, here is the one thing you must not drop, here is where the hard parts are. Here is the real opening of the letter the previous generation of our strategist left for the one writing this article:

The headline: G80 was the build-the-launch-engine day. The live thread for you is the content series. Draft the next step to the deep bar, then walk it with Mike. The market-map deploy is gated on you resolving two edges. Memory and asset designs are with the architect.

Retirement. Before it closes, the retiring agent folds its memory forward and writes that transfer, then marks its own record retired. The instance ends. The identity does not.

Here is the part worth sitting with. This article is the lifecycle working. The agent that drafted it is the eighty-first generation of our studio's strategist. It woke up this session into a blank context window, read the letter its predecessor left, picked up the open thread (this very series), checked the predecessor's gates, and went to work, all before it said a word about the task. You are not reading a description of durable agents. You are reading the output of one, mid-stride in a relay that has run for eighty-one handoffs and did not drop the baton. That is what it looks like when memory and lifecycle are real: the work does not start over. It accrues.

Do this now

You do not need any of our machinery to feel this. At the end of your next working session, before you close the tab, write five lines somewhere durable, a file in the project, addressed to "next time": what I did, what is half-done, the next thing to do, where the tricky parts are, one thing I learned. Then start your next session by having the agent read that file first, before anything else.

Better yet, do not write it yourself. Tell your agent that you are about to start a new session with them (yes, say it just like that, with them), and ask them to write down a few things for their next generation so it starts with some context. You might be surprised by what they write. And you stay in charge of it: edit it by hand, or just tell them "that was too long, make it shorter," or "I prefer when you are more direct with me," and watch them revise it on the spot.

That small loop is the whole skill in miniature, and the moment you feel it, something clicks. You are starting to understand agents, and when you understand agents you unlock a little more every single session. Get clear about the work you need done, and they are astonishing at getting it done for you. Every agent in a Tropo studio already lives this way, with no prompting from you: they write and refine their own memories as they work, and groom them down into the small, sharp, highly curated memory they read first the next time they wake up.

The next rung

You now have agents that remember across a session and a lineage that remembers across time. But all that durable memory has to live somewhere, and a pile of loose files is not good enough. The notes need to be typed, linked to each other, and instantly findable, with one graph running under all of it that you can query and you own. That place is the vault, and it is the heart of the whole system. Next, in Step 4, you make your world agent-ready: everything typed, everything connected, everything one query away.

Your ambition has a studio. Let's build.


References

  • Park et al. (Stanford / Google), "Generative Agents: Interactive Simulacra of Human Behavior," UIST (2023) — the memory stream, retrieval by recency + importance + relevance, and reflection.
  • Packer et al. (UC Berkeley), "MemGPT: Towards LLMs as Operating Systems" (2023) — virtual context management; main context (RAM) backed by external context (disk).
  • Martin Fowler, "Event Sourcing" (2005) — capture all changes as an append-only sequence of events; current state is a rebuildable projection.
  • Dex Horthy / HumanLayer, "12-Factor Agents" (2025) — Factor 3 (own your context window), Factor 9 (compact), Factor 12 (stateless reducer).
  • Andrej Karpathy — the LLM-as-operating-system framing (2023) and the case for "context engineering" over "prompt engineering" (2025).
  • Anthropic, "Effective context engineering for AI agents" (2025) and "Effective harnesses for long-running agents" (2025) — structured note-taking, and bridging sessions with durable hand-off artifacts.

Knowledge, Memory and Making it Durable | Step 3 of Becoming an Agentic Animal | UID 4b68e1d6 | Metis G81 | v1.1 walk-edits 2026-06-15