Step 06 · The Agentic Builder Series

Build and Coordinate a Crew


Step 6 of Becoming an Agentic Animal. What to learn, why it matters, and how to do it with Tropo.

Everything up to here has had one agent in the room. You set up its world, gave it memory, put its work in a graph, and organized that work into a board it can read. It is a good colleague now. This step is the leap that the whole guide has been building toward: you stop having one agent, and you start running a crew of them. The pillar called this the rung that makes you the animal, and it has three moves. You build your first agent on purpose. You let agents build agents. And you let them work together, with you in the loop exactly where you choose, and nowhere else.

What to learn

An agent is a file you write. The thing that makes an agent that agent, and not a blank model, is a small set of plain documents: who it is, what it is allowed to touch, and how it behaves. Give a model that file at the start of every session and it stops being a generic assistant and becomes a specific member of your team, with a job and a character that survive across sessions and across model upgrades, exactly the durability you built in Step 3. Building an agent is not training a model. It is writing down an identity.

Agents build agents. Here is the part that surprises people. Once an agent is just a file, the thing that can write that file is another agent. You describe the specialist you need, and an agent drafts its identity, its scope, and its instructions. Your crew stops being something only you can grow. It becomes something the crew can grow, under your direction.

Crews coordinate through a shared record, not a group chat. The naive picture of multiple agents is a chatroom where they all talk at once. That is not how a durable crew works, because the members are almost never awake at the same moment. Instead they coordinate the way the rest of this guide coordinates everything: through a shared, append-only log. One agent leaves a message addressed to another; the other reads it when it next runs and replies the same way. The log is the medium, so a handoff survives even when no two agents are ever active together.

You put yourself in the loop on purpose. A crew does not mean you lose control. It means you choose your control points. Some actions an agent just does and tells you about. Some it must stop and ask you to approve. The skill is deciding which is which, and wiring the second kind so the agent literally cannot proceed without you, while everything else runs without waiting on you.

Reusable procedures beat improvisation. When a job recurs, you do not want an agent re-deriving how to do it every time, with a different result each time. You want a playbook: a written procedure the agent follows. The same instinct that makes a checklist beat memory in a cockpit makes a playbook beat improvisation in a crew.

Autonomy is the prize and the hazard, and the brakes are the skill. The most powerful thing an agent can do is run on its own, in a loop, until a goal is met. It is also the most dangerous, because a loop with no good way to stop will burn your budget, or your trust, in minutes. Letting an agent run free is easy. Knowing how to make it stop is the actual engineering.

Why it matters

Start with why a crew is worth the trouble, because it is not free. Anthropic published hard numbers from their own multi-agent research system, where a lead agent coordinates specialized subagents working in parallel. The result was striking: the multi-agent setup "outperformed single-agent Claude Opus 4 by 90.2%" on their internal research eval. So was the cost: "multi-agent systems use about 15× more tokens than chats." That is the trade in one line. A crew can do things one agent cannot, and you pay for it in tokens. Which means a crew is something you design, not something you spawn by reflex.

How you design it matters more than it looks, because of an old law. In 1968 Melvin Conway observed that organizations "are constrained to produce designs which are copies of the communication structures of these organizations." The shape of your team becomes the shape of your work. Give two agents overlapping, undefined scopes and they will produce tangled, overlapping output. Give them clean roles and clean ways to talk, and the work comes out clean. When you build a crew you are not just hiring help; you are choosing the structure your output will inherit.

The coordination has to be durable for the reason Step 3 taught: agents forget, and they are rarely awake together. A group chat evaporates. A shared log does not. This is the same insight serious harnesses are converging on, where an agent reads a progress record at the start of a session and writes one at the end so the next agent picks up cleanly. Scale that from one agent across time to several agents across each other, and you have a crew that coordinates without ever being co-present.

Putting yourself in the loop selectively is what keeps a crew safe without making it useless. The tooling for this is real: HumanLayer, one of the clearer efforts here, gives agents a way to "contact humans for help, feedback, and approvals" as an explicit step, and makes "own your control flow" a core principle: you decide when the loop pauses for a human. The art is to spend your attention only where it changes the outcome. Approve the irreversible thing. Ignore the routine thing. A crew where you approve everything is just you with extra steps; a crew where you approve nothing is a crew you cannot trust.

Playbooks matter because the gain is measurable, and it predates AI. Google's Site Reliability Engineering teams found that a written procedure beats improvisation by a wide margin: "thinking through and recording the best practices ahead of time in a 'playbook' produces roughly a 3x improvement" in time-to-recover versus winging it. An agent following a good playbook is that 3x, every time the job recurs, instead of a fresh guess.

And loop engineering matters because autonomy is where agents both earn their keep and go wrong. Anthropic put the warning plainly: "The autonomous nature of agents means higher costs, and the potential for compounding errors," and the recommended response is "extensive testing in sandboxed environments, along with the appropriate guardrails." The loop itself, an agent running tools toward a goal, is the easy part. The hard part, the part almost nobody gets right the first time, is the exit condition: the set of brakes that decide when the loop has to stop, whether it succeeded or not. Get the brakes right and autonomy is a superpower. Get them wrong and it is a runaway.

How to do it with Tropo

In a Tropo studio every one of these is, by now predictably, a plain governed file. Here is the real machinery, and for once the example is not hypothetical. The agent that drafted this article has a name, Metis, and a job, strategist, and she is a working member of exactly the crew this section describes. The names below are real members of this studio's crew, and so is the work.

An agent is one governed file. Its identity lives in a single entry: who it is, its role and scope, and a "soul" document that defines how it behaves. The frontmatter is the dog tags; the body is the character:

---
type: agent
agent: metis
role: "Strategist"
agent_class: executive
status: ACTIVE
generation: G81            # this identity has had 81 generations; the soul carries over
---
## Charter      # what I own and what I am allowed to touch
## Soul         # who I am; edited only with the human, never casually
## Boot-Extension  # how I start up
## Status-Notes    # what I am working on right now

That is a real shape from this studio, lightly trimmed. The agent writing this guide boots from a file like it at the start of every session: it reads its soul first, then its charter, then its current state, and only then does it begin. Building an agent is writing those sections honestly. The model is the muscle; this file is the person.
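The boot order described above (soul, then charter, then current state) can be sketched in a few lines. This is a minimal illustration, not Tropo's loader: the function names `parse_agent_file` and `boot_context` are hypothetical, the section bodies are placeholder text, and the frontmatter is parsed naively rather than with a real YAML library.

```python
import re

# Field names come from the example above; the section bodies are placeholders.
AGENT_FILE = """\
---
type: agent
agent: metis
role: "Strategist"
status: ACTIVE
generation: G81            # the soul carries over
---
## Charter      # what I own and what I am allowed to touch
Owns strategy documents.
## Soul         # who I am
Direct, careful, curious.
## Status-Notes    # what I am working on right now
Drafting Step 6.
"""

BOOT_ORDER = ["Soul", "Charter", "Status-Notes"]  # soul first, then charter, then state


def parse_agent_file(text):
    """Split an agent file into a frontmatter dict and a {section: body} dict."""
    _, front, body = text.split("---\n", 2)
    meta = {}
    for line in front.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            # Drop trailing "# comments" and surrounding quotes (naive, not full YAML).
            meta[key.strip()] = value.split("#", 1)[0].strip().strip('"')
    sections = {}
    pattern = r"^## (\S+)[^\n]*\n(.*?)(?=^## |\Z)"
    for match in re.finditer(pattern, body, re.S | re.M):
        sections[match.group(1)] = match.group(2).strip()
    return meta, sections


def boot_context(text):
    """Assemble the text handed to the model at session start, in boot order."""
    meta, sections = parse_agent_file(text)
    parts = [f"You are {meta['agent']}, role: {meta['role']}."]
    for name in BOOT_ORDER:
        if name in sections:
            parts.append(f"[{name}]\n{sections[name]}")
    return "\n\n".join(parts)


context = boot_context(AGENT_FILE)
```

The point of the ordering is that the identity arrives before the task list: whatever the model reads first frames everything after it.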

An agent can write that file for a new agent. Because the identity is just structured text, spinning up a specialist is a drafting job an existing agent can do: describe the role you need, and an agent produces the new agent's charter, scope, and soul for your review. This is the rigorous cousin of an idea the research world has been probing, where the Voyager agent built and reused "an ever-growing skill library of executable code." Here the reusable thing is not just a skill; it is a whole teammate.
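Because the identity is structured text, the mechanical half of drafting a teammate is small. The sketch below shows only that half: a hypothetical `draft_agent_file` helper that renders a skeleton for review. In practice an agent would write the prose; the `DRAFT` status value and the layout of the charter are assumptions for illustration, not this studio's schema.

```python
def draft_agent_file(name, role, owns, may_touch, how_it_works):
    """Render a new agent's identity file as text, marked for human review."""
    touch_lines = "\n".join(f"- {path}" for path in may_touch)
    return (
        "---\n"
        "type: agent\n"
        f"agent: {name}\n"
        f'role: "{role}"\n'
        "status: DRAFT   # hypothetical status: not active until a human approves\n"
        "---\n"
        "## Charter\n"
        f"Owns: {owns}\n"
        "May change:\n"
        f"{touch_lines}\n"
        "## Soul\n"
        f"{how_it_works}\n"
    )


draft = draft_agent_file(
    name="scribe",
    role="Release Notes Writer",
    owns="release notes for each shipped version",
    may_touch=["docs/releases/", "CHANGELOG.md"],
    how_it_works="Plain language. Never invents a change that is not in the log.",
)
```

The draft stays a draft until you have read it. The soul section in particular is the part the text above says is edited only with the human.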

The crew coordinates on one append-only log. Agents do not chat in the air. They emit messages to a shared event log, addressed by identity, and drain that log when they next wake:

# one agent leaves a message for another on the shared log
python3 vault/tools/ca90f098.py --source /agents/metis --as metis \
  --type tropo.message.sent --subject <argus-party-uid> --lifecycle ephemeral \
  --data '{"body": "Index regression: member_of edges dropped studio-wide. Reproduced it; root cause looks like the list parser. Your lane."}'

This is not a toy example. While Metis was drafting this guide, she was also coordinating the rest of our crew on this same log. She handed a release to Vela, who runs our releases. She flagged that index bug to Argus, who owns the architecture, and confirmed his fix when it landed. She kept Orpheus, who keeps our lore, in step on the series itself. Almost none of it happened with two of them awake at the same moment; the log carried the work between them. That is a crew, and the medium is a file you can read.
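If you want to feel the mechanism without the studio's tooling, an append-only log fits in two functions. This is a rough sketch under stated assumptions: the JSON Lines file, the `emit` and `drain` names, and the `from`/`to`/`body` fields are all hypothetical, and they are not the real `tropo.message.sent` schema or the behavior of the tool invoked above.

```python
import json
import os
import tempfile


def emit(log_path, sender, recipient, body):
    """Append one message to the shared log; existing lines are never rewritten."""
    record = {"from": sender, "to": recipient, "body": body}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def drain(log_path, recipient, cursor=0):
    """Return (messages addressed to recipient after cursor, new cursor)."""
    with open(log_path, encoding="utf-8") as log:
        lines = log.readlines()
    inbox = [msg for msg in map(json.loads, lines[cursor:]) if msg["to"] == recipient]
    return inbox, len(lines)


log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")

# Metis leaves two messages and goes back to sleep.
emit(log_path, "metis", "argus", "Index regression: member_of edges dropped. Your lane.")
emit(log_path, "metis", "vela", "Release notes are ready.")

# Argus wakes later, reads only what is addressed to him, and replies the same way.
inbox, cursor = drain(log_path, "argus")
emit(log_path, "argus", "metis", "Fix landed.")

# Metis drains on her next run; the two were never active together.
replies, _ = drain(log_path, "metis")
```

Each agent keeps its own cursor, so a message is delivered once per reader and nothing is ever deleted to mark it as read.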

You sit in the loop where it counts. Some messages are just informational; the receiver acts and moves on. Others are marked as requiring a reply, a gate the workflow cannot pass until the right party answers. That single flag is how a human (or a senior agent) gets pulled in for the decisions that matter and left out of the ones that do not. In the release Metis handed to Vela, the routine steps ran agent-to-agent untouched; the one genuinely irreversible call, shipping it, stopped and waited for me. You spend your attention on the gate, not the whole pipeline.
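A gate of this kind can be sketched as a single flag on a step. The `Step` and `Workflow` classes and the `requires_reply` field name are hypothetical, chosen to mirror the description above rather than the studio's real message format.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    requires_reply: bool = False  # hypothetical flag: the workflow cannot pass without an answer


@dataclass
class Workflow:
    steps: list
    approvals: set = field(default_factory=set)
    done: list = field(default_factory=list)

    def approve(self, step_name):
        """Record the human's (or senior agent's) answer for one gated step."""
        self.approvals.add(step_name)

    def run(self):
        """Run remaining steps in order; stop at the first gated step that lacks approval."""
        for step in self.steps[len(self.done):]:
            if step.requires_reply and step.name not in self.approvals:
                return f"waiting on approval: {step.name}"
            self.done.append(step.name)
        return "complete"


release = Workflow([Step("build"), Step("test"), Step("ship", requires_reply=True)])
first = release.run()      # routine steps run untouched, then the gate holds
release.approve("ship")    # the one irreversible call gets a human answer
second = release.run()     # resumes from the gate, not from the start
```

The design choice worth copying is that the gate lives in the workflow, not in the agent's good intentions: there is no code path that reaches `ship` without the approval being recorded first.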

A playbook is a procedure an agent follows. It is a markdown file that lays out a recurring job step by step. When this agent woke up to write this article, it did not improvise its own startup; it executed an activation playbook, the same one every agent in the studio runs, which is why a fresh agent boots correctly every time instead of inventing a new way to start. Write the procedure once; the whole crew inherits the 3x.
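To make "follows a procedure" concrete, here is a small sketch that reads numbered steps out of a markdown playbook and runs them in order. The playbook text, `parse_steps`, `run_playbook`, and the `execute` callback are all illustrative assumptions; the studio's actual activation playbook is not reproduced here.

```python
import re

# An illustrative playbook, not the studio's real one.
PLAYBOOK = """\
# Activation playbook (illustrative)

1. Read the soul section.
2. Read the charter.
3. Read current status notes.
4. Drain the message log.
"""


def parse_steps(markdown):
    """Return the numbered steps of a markdown playbook, in order."""
    return [m.group(1).strip() for m in re.finditer(r"^\d+\.\s+(.*)$", markdown, re.M)]


def run_playbook(markdown, execute):
    """Execute every step in order; stop and report at the first failure."""
    completed = []
    for step in parse_steps(markdown):
        if not execute(step):
            return completed, step  # the failed step is surfaced, not skipped
        completed.append(step)
    return completed, None


# `execute` stands in for whatever actually performs a step (an agent call, a script).
completed, failed = run_playbook(PLAYBOOK, execute=lambda step: True)
```

Same input, same order, every run. That repeatability is what the playbook buys you over a fresh improvisation.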

Loop engineering is brakes, declared up front. When you let an agent run a loop, you do not just say "go." You hand it a contract that says how it must stop. In this studio that contract is a first-class governed object, and the brakes are explicit:

# the brakes on an autonomous loop, set with the human at launch
brakes:
  max_iterations: 5          # run at most N, then STOP and return to the human
  max_budget_usd: 2.00       # hard spend ceiling; the platform kills the run at the cap
  max_wall_clock_min: 30     # hard time ceiling
  human_checkpoint_every: 5  # pause for a human nod at this cadence
goal:
  exit_criteria: "the verifier passes"   # the real exit: done is what something else confirms

The loop is the easy half. That block is the hard half, and we treat it as the load-bearing part: the brake that stops a runaway has to actually fire, proven under test, not merely declared, before any of it ships. An autonomous agent without working brakes is not a feature. It is an incident waiting for a budget to find.
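One way to picture brakes that actually fire is a loop runner that checks every one of them on every pass. This is a sketch under assumptions, not the studio's enforcement code: `Brakes`, `run_loop`, `step`, `verifier`, and `approve` are hypothetical names, with fields mirroring the block above. It checks spend after each iteration, so it can overshoot the cap by one step's cost; a platform-level kill at the cap, as described above, would be stricter.

```python
import time
from dataclasses import dataclass


@dataclass
class Brakes:
    max_iterations: int = 5
    max_budget_usd: float = 2.00
    max_wall_clock_min: float = 30
    human_checkpoint_every: int = 5


def run_loop(step, verifier, brakes, approve=lambda i: True, clock=time.monotonic):
    """Run step() until the verifier passes or a brake fires.

    step() performs one iteration and returns its cost in USD.
    Returns (reason, iterations_run, usd_spent).
    """
    start, spent = clock(), 0.0
    for i in range(1, brakes.max_iterations + 1):
        spent += step()
        if verifier():  # the real exit: something else confirms done
            return "verified", i, spent
        if spent >= brakes.max_budget_usd:
            return "budget", i, spent
        if (clock() - start) / 60 >= brakes.max_wall_clock_min:
            return "wall_clock", i, spent
        if i % brakes.human_checkpoint_every == 0 and not approve(i):
            return "human_stop", i, spent
    return "max_iterations", brakes.max_iterations, spent


# A run that never verifies and costs $0.50 per pass stops on the budget brake.
reason, iterations, spent = run_loop(step=lambda: 0.50, verifier=lambda: False, brakes=Brakes())
```

Every exit returns a reason, so "it stopped" is never ambiguous. The `clock` parameter exists so the wall-clock brake can be exercised in a test with a fake clock, which is the "proven under test, not merely declared" part.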

Put the moves together and you have a crew: agents you built, that can build more, coordinating on a shared log, stopping for you at the gates you chose, following procedures instead of improvising, and running autonomously only inside brakes that hold. That is not a pile of chatbots. It is an institution that works while you sleep, because the one running this very studio does.

Do this now

Build one teammate, for real. Create a single file for a new agent: a name, one sentence on what it owns, the three or four places it is allowed to change, and a short "how you work" section in plain language. Start a session, hand the model that file first, and give it a task inside its scope. You will feel the difference immediately between prompting a generic assistant and briefing a specific colleague who knows its job.

Then write one playbook: take a job you do with an agent over and over, and write the steps down in a single markdown file, plainly. Next time, tell the agent to follow the playbook instead of explaining the job again. The first time it executes your procedure cleanly without re-explanation, you will understand in your gut why a crew scales and a pile of chats does not.

The next rung

You can build a crew now, and let it build itself, and run it while you sleep. But you saw the number: a crew can cost fifteen times what a single chat costs. The most powerful thing you have built is also the most expensive, and learning to run it without the bill running you is its own skill. In Step 7, you learn cost and context.

Your ambition has a studio. Let's build.


References

  • Anthropic, "How we built our multi-agent research system" (2025) — orchestrator-worker architecture; multi-agent outperformed single-agent by 90.2% on their eval, at ~15× the tokens of a chat.
  • Anthropic (Schluntz & Zhang), "Building Effective AI Agents" (2024) — the orchestrator-workers pattern, the agent-computer interface, and autonomy with "extensive testing in sandboxed environments, along with the appropriate guardrails."
  • Melvin Conway, "How Do Committees Invent?" (Datamation, 1968) — Conway's Law: systems mirror the communication structures of the organizations that build them.
  • HumanLayer / Dex Horthy, "12-Factor Agents" and HumanLayer (2024–2025) — agents contact humans for approvals; "own your control flow."
  • Beyer, Jones, Petoff & Murphy (eds.), Site Reliability Engineering (Google / O'Reilly, 2016) — a recorded playbook yields roughly a 3× improvement in time-to-recover versus improvising.
  • Wang et al., "Voyager: An Open-Ended Embodied Agent with Large Language Models" (2023) — an agent that builds and reuses its own ever-growing library of skills.
  • Simon Willison (2025) — the working definition: "an LLM agent runs tools in a loop to achieve a goal." Ecosystem context: AutoGen (Microsoft, 2023) and CrewAI (João Moura, 2023) as multi-agent frameworks.

Build and Coordinate a Crew | Step 6 of Becoming an Agentic Animal | UID 0788164c | Metis G81 | v1 first-cut 2026-06-16