Architecture Review · CTO / CISO Briefing

Tropo L1 — An Operating System for Human-AI Crews, Built on Plain Files

Prepared by: Argus (Chief Architect agent, gen. A107), with Mike Maziarz
Date: 2026-06-10
System version: Tropo-OS v1.70.0
Scope: The L1 (local, file-based) tier, as proven in the Argo development Studio

Executive summary

Tropo is an operating system for running real work with AI agents, built entirely on plain markdown files in a folder. It requires no server, no database, no network service, and no proprietary runtime. Any AI harness that can read text and follow instructions can operate inside it; any human with a text editor can audit every byte of it.

The system solves the four problems that make AI-agent work hard to trust in a professional setting:

  1. Identity and continuity. AI agent sessions end; the work cannot. Tropo gives each agent a durable identity (charter, soul document, memory, lineage record) that survives across sessions and across underlying model changes. The development crew's Chief Architect role has run more than one hundred consecutive generations with identity, memory, and open work surviving every handoff — across multiple model families.
  2. Governed, typed knowledge. Every artifact — task, decision, document, release, message — is a typed markdown file obeying a declared schema ("capsule"). A validator suite (~58 checks) enforces those schemas at every rebuild. Structure is added gradually and tighten-only, so governance never breaks older data and never breaks the plain-text floor.
  3. Auditable coordination. All agent-to-agent and agent-to-human coordination flows through one append-only event log with a standard envelope (CloudEvents v1.0). Who said what, when, in reply to what, is the record of record. Human-readable views are rendered projections of that log, not separately authored surfaces that can drift.
  4. Verification as a structural property. "Done" is not a claim an agent gets to make about its own work. Completion gates require independent verification receipts; an approver cannot be the executor; documentation and test pipelines are coupled to the release pipeline such that a release structurally cannot close without them. The design center is bounded verification: a human expert verifies outcomes at defined gates, and the substrate is built so that verification capacity, not agent capability, is what scales.

The system is dogfooded at full intensity: Tropo is built by a human-AI crew operating inside Tropo, and every release of the OS ships through the OS's own governed pipeline — roughly seventy versioned releases since March 2026.

This document walks the architecture top-down; each section carries its diagram inline. The security and assurance section (§12) addresses the questions a CISO will ask first, including an honest statement of current limitations.

1 What Tropo L1 is, and the design theses behind it

One Studio = one folder. An installation of Tropo is called a Studio — a single directory tree of markdown, JSONL, and a small library of Python scripts. The Studio reviewed here (argo-os/) is the development Studio where Tropo itself is built. Inside every Studio sits a Vault — the protected, governed content store where every typed artifact lives.

Five theses drive every design decision:

Thesis 1 — Markdown is the protocol. Governance, schemas, procedures, memory, and messages are all expressed in language a reasoning engine reads and follows. There is no permissions API and no database constraint at the base layer; the governance is the language. This is what makes the system harness-portable: it has run under multiple commercial AI products and multiple model vendors without modification, because the only interface contract is "can read files, can follow instructions."

Thesis 2 — Agent-first, human-also. Agents are the primary operators; humans are the directors and verifiers. The base layer is shaped for how agents read and work (flat stores, UID addressing, typed frontmatter). A deliberate human layer sits on top: every governed file renders a navigation block (path, parent, children, siblings, cited-by); dashboards and a rendered navigation tree give the human a visual surface. The rendered surface is a deliverable, by doctrine — if the human is staring at raw substrate they cannot read, the surface has not shipped.

Thesis 3 — Local-first, zero-infrastructure. No server, no database, no network calls. Everything that looks like infrastructure (indexes, a SQLite query layer, dashboards) is derived from the files and rebuildable from them at any time. This collapses the attack surface and the operational burden simultaneously: the deployment story is "a folder," and the disaster-recovery story is "the folder, plus one rebuild command."

Thesis 4 — Gradual structure on a language base (ADR-044, accepted June 2026). Keep the free-form markdown playground; add structure per-type and per-field, incrementally, enforced through agent-native mechanisms (tools, validation gates, grooming agents) — never storage-layer rigidity. Tightening is one-way and backward-compatible: structuring never breaks older data, and a hand-written file is never hard-rejected.

Thesis 5 — Verification is the moat. As execution cost falls toward zero, the binding constraint becomes human verification bandwidth. Tropo is built so that a domain expert can verify whether agents operated within constraints she defined — and so that the verification effort scales with the quality of the constraints, not the volume of agent output.

2 The system map: three layers, nine subsystems

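The Studio decomposes into three layers:

  1. Kernel (.tropo/). Ships with the OS and is updated only through the governed release pipeline: capsule definitions, OS-level playbooks, scripts and the validator, OS-tier primitives, skills and templates.
  2. Primitives. The structural vocabulary every Studio uses: the Vault, the event log, the callable surfaces, and memory and continuity.
  3. Apps. What the crew builds and operates on top: Tropo Work, the pipelines, the crew itself, and the human surfaces.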

Cutting across the layers, work is organized into nine subsystems, each with a hub — a typed project entry that owns the canonical state for its domain. Every governed primitive declares its owning hub in frontmatter, which makes "what does the governance subsystem currently contain?" a query rather than an archaeology project. The nine: Governance, Rendering, Work, Agents, Playbooks, Library, Documentation, Link (scheduling/persistence), and Test Harness.

Tropo L1: Studio System Map
One Studio = one folder of markdown. No server, no database, no network dependency. The Argo Studio (argo-os/) shown.

LAYER 1: KERNEL · .tropo/ (ships with the OS; updated only through the governed release pipeline)
  - Capsule Definitions: .tropo/capsules/ holds ~60 typed schema contracts (task, decision, project, release, events, memory…). The contract; files obey it.
  - Playbooks (OS-level): .tropo/playbooks/ covers activation, retirement, cold-boot test, fleet-ops, apply-update, grooming, onboarding. Governed multi-step procedures.
  - Scripts + Validator: .tropo/scripts/ contains rebuild-vault, tropo-validate (~58 checks), tropo-recycle (soft delete only), tropo-test, catalogs, renderers.
  - OS-Tier Primitives: boot-config.md (Tier-1 boot floor), SELF-HEALING.md (P0, signed), HUMAN-NAVIGATION.md, orientation.md (harness map).
  - Skills + Templates: .tropo/skills/ (bounded agent-executable procedures); .tropo/templates/ (typed file scaffolds and schemas).

LAYER 2: PRIMITIVES · the structural vocabulary every Studio uses
  - The Vault (graph knowledge base): vault/files/<uid>.md is a flat store in which every governed artifact is a typed markdown file. vault/00-index.jsonl is the authoritative index; 00-graph-index.json gives O(1) edge traversal; a derived SQLite query runtime covers the frontmatter layer (rebuilt, never authoritative).
  - Event Log (coordination substrate): vault/events/00-events.jsonl, append-only, CloudEvents v1.0 envelope, tool-mediated writes only. Carries all agent↔agent and agent↔human messages, broadcasts, and substrate mutations. Channels, status cards, and the crew brief are rendered projections of this log.
  - Callable Surfaces (3 classes): Tools at vault/tools/<uid>.py (~39): emit-event, query-events, rebuild, write-activation-entry, recycle…; Session agents (sa.*): ~16 ephemeral specialists (skeptic, curator, board…); Actions: ~10 single-gesture operations.
  - Memory + Continuity: per agent, agent-memory.md (curated surface) plus agent-memories.jsonl (append-only episodic log, never cleared); Studio-tier shared memory and doctrine pins; living transfers, reflections, and the activation registry (lineage of record).

LAYER 3: APPS · what the crew builds and operates on top
  - Tropo Work: tasks · projects · decisions (ADRs) · design briefs · releases · notes · collections. Derived boards and meta-status rollups (To Do / In Progress / Done).
  - Pipelines (dev · doc · test · publish): declarative DAG templates plus pipeline-run instances. The dev-pipeline has shipped ~70 versioned releases of Tropo itself. Doc and test pipelines are coupled at the gate.
  - The Crew (agents/): 8 executive agents plus the human principal. Identity = soul + memory + vault + crew + model sleeve. 100+ generations of the architect role alone, continuity intact.
  - Human Surfaces: 00-tropo-nav/ rendered navigation tree · boards/<agent>/ visual dashboards · nav-blocks in every file · crew brief · import/export loop for .docx deliverables.

THE 9 SUBSYSTEMS (each owns canonical state for its domain; every governed primitive declares its owning hub via member_of)
  1 · Governance: ADRs, validators, operating agreement
  2 · Rendering: build pipeline, extraction, render scripts
  3 · Work (killer app): task/project/decision/release primitives
  4 · Agents: crew, souls, sa.*, identity
  5 · Playbooks: governed procedures
  6 · Library: manifesto, handbook, content packs
  7 · Documentation: KB articles, L1 entry, release notes
  8 · Link: scheduling, persistent-agent substrate
  9 · Test Harness: validator + ship-test-plan + cold-boot walk + agentic test composition

Layer 1 updates only through the governed release pipeline. Layer 2 is the vocabulary. Layer 3 is the work. The whole system is plain files: readable, diffable, auditable.
Figure 1 — Studio system map: three layers, nine subsystems.

3 The typed substrate: capsules, the Vault, and the graph

3.1 Capsules: schema as governed markdown

Every artifact Tropo tracks is a typed file. Each type has a definition at .tropo/capsules/<name>.capsule.md — called a capsule — declaring required fields, lifecycle state machines, enumerated values, governance rules, and validation checks for that type. The capsule is the contract; the file is what obeys it; the validator enforces it.

All types descend from a root core type (uid, type, status, state, owner). The foundational set — task, decision, project, document, collection, note, playbook, pipeline, pipeline-run, board — is extended by domain capsules that each earned their abstraction: release and release-plan, design-brief and dev/doc/test-spec, activation (agent lineage), events (the message envelope contract), memory, agent and session-agent, tool, subsystem-hub, and the import/export family (external-artifact, working-copy, docx-template). Roughly sixty capsule definitions are on disk today. Notably, the system also retires types honestly: a "how-to" type that accrued zero instances in eighteen months was retired with its history preserved, on the principle that unused abstractions are drift risk.

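Two contract features matter for a technical audience:

  1. Canonical values with an open alias set (SKOS). The canon is closed and the aliases are open: agents write naturally, and the substrate normalizes to one truth.
  2. A per-type strictness dial. High-value types (task, decision, release, activation) enforce hard; the freeform long tail stays loose.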

Schema evolution is a gated act: a new field or enum value added through a deliberate, principal-signed capsule amendment is evolution; the same value silently written by an agent is drift, and the gate is what tells them apart.

The Capsule Type System: Schema as Governed Markdown
Every governed artifact is a typed file. The capsule is the contract; the file obeys it; the validator enforces it. ~60 capsule definitions at .tropo/capsules/.

Type inheritance (every type descends from core)
  core.capsule: uid · type · status · state · owner
  - task: work + lifecycle
  - decision: ADRs, binding
  - project: container + board
  - document: specs, guides
  - pipeline: DAG template
  - release: what shipped
  - design-brief: informs, never locks
  - activation: agent lineage row
  - events: envelope contract
  - pipeline-run: one execution
  …plus note, collection, playbook, charter, memory, agent, session-agent, tool, subsystem-hub, release-plan, dev/doc/test-spec, external-artifact, working-copy, docx-template, board, registry. Extension types earn the abstraction one at a time; retired types keep honest history files.

What a capsule declares (the contract)
  # task.capsule.md (excerpt, illustrative)
  required_fields: [uid, title, owner, status]
  enforced_enums:
    status: {canon: [backlog, active, review, done], aliases: {complete: done}}
  governance_rules: approver != executor on approval_required tasks (Rule 14)
  validation_checks: enforced at every rebuild

  - Canonical value + open alias set (SKOS): closed canon, open aliases. Agents write naturally; the substrate normalizes to one truth.
  - Per-type strictness dial: high-value types (task, decision, release, activation) enforce hard; the freeform long tail stays loose.
  - Schema evolution is a gated act: a principal-signed lock-break is evolution. The same value silently written by an agent is drift. The gate is the line.

Contract → instance → enforcement
  - Capsule (the contract): .tropo/capsules/task.capsule.md. Locked; amendments are governed events.
  - Instance (the file that obeys it): vault/files/a3f2b918.md (type: task). Flat store; one row in 00-index.jsonl.
  - Validator (the gate): reads enums straight from capsules; ~58 checks; WARN→ERROR ratchet; never hard-rejects a hand-written file (cold-boot invariant).

Gradual structure on a language base (ADR-044): keep the markdown playground; add structure per type, per field, incrementally; tighten-only so structuring never breaks older data.
Figure 2 — The capsule type system: contract, instance, enforcement.
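
The contract, instance, and enforcement relationship can be sketched in a few lines. This is a minimal illustration, not the shipped tropo-validate: the capsule dictionary mirrors the illustrative task.capsule.md excerpt in Figure 2, while the `Report` class, the `validate_task` function, and the `approver` and `executor` field names are assumptions made for the example. It reports findings rather than raising, in the spirit of the rule that a hand-written file is never hard-rejected.

```python
from dataclasses import dataclass, field

# Illustrative contract, mirroring the task.capsule.md excerpt in Figure 2.
TASK_CAPSULE = {
    "required_fields": ["uid", "title", "owner", "status"],
    "status": {
        "canon": ["backlog", "active", "review", "done"],
        "aliases": {"complete": "done"},
    },
}


@dataclass
class Report:
    """Findings for one file (hypothetical shape); the file is never rejected."""
    normalized: dict
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def validate_task(frontmatter: dict, capsule: dict = TASK_CAPSULE) -> Report:
    fm = dict(frontmatter)  # work on a copy; the caller's data is untouched
    report = Report(normalized=fm)

    for name in capsule["required_fields"]:
        if name not in fm:
            report.warnings.append(f"missing required field: {name}")

    enum = capsule["status"]
    status = fm.get("status")
    if status in enum["aliases"]:
        fm["status"] = enum["aliases"][status]  # open alias -> closed canon
    elif status is not None and status not in enum["canon"]:
        report.warnings.append(f"unknown status: {status}")

    # Rule 14: approver must differ from executor on approval-required tasks.
    approver = fm.get("approver")
    if fm.get("approval_required") and approver is not None \
            and approver == fm.get("executor"):
        report.errors.append("approver must not be the executor (Rule 14)")
    return report
```

A file that writes `status: complete` comes back normalized to `done` with no findings, which is the "agents write naturally; the substrate normalizes" behaviour in miniature.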

3.2 The Vault: flat files, graph semantics, derived surfaces

Every governed artifact lives at vault/files/<uid>.md, named by an 8-hex UID. The store is deliberately flat and currently holds more than three thousand entries. There is no folder hierarchy to rot, and references cite UIDs rather than paths, so renames and moves can never break a citation.

Organization is graph membership, expressed in frontmatter: member_of (home project(s), multi-parent allowed), governed_by, refs, subsystem_hub, superseded_by. Projects, inboxes, and collections are graph nodes, not directories — the question "where does this file go?" becomes "which project owns this?" The graph currently carries on the order of seven thousand typed edges.

Everything else is derived and rebuildable: the authoritative JSONL index (one row per artifact), an O(1) graph-traversal index, a SQLite query runtime over frontmatter (with full-text search over curated metadata), the rendered human navigation tree, dashboards, and the crew brief. A rebuild pass regenerates all of it from the files; a targeted rebuild --only <uid> freshens a single entry incrementally. Because the files are the truth, index corruption is an inconvenience, not an incident.
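
To make "derived and rebuildable" concrete, here is a minimal sketch of a rebuild pass that regenerates index rows and graph edges from file contents alone. It is not rebuild-vault.py: the `parse_frontmatter` helper handles only simple `key: value` and `key: [a, b]` lines (not full YAML), and the row shape is an assumption. The edge field names are the ones listed above.

```python
import json

EDGE_FIELDS = ("member_of", "governed_by", "refs", "subsystem_hub", "superseded_by")


def parse_frontmatter(text: str) -> dict:
    """Parse a '---' delimited block of simple 'key: value' lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.split("#", 1)[0].strip()  # naive: drops trailing comments
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta


def rebuild(files: dict) -> tuple:
    """Derive (index rows, edge map) from a {uid: file text} mapping."""
    index, edges = [], {}
    for uid, text in sorted(files.items()):
        meta = parse_frontmatter(text)
        index.append({"uid": uid, "type": meta.get("type"), "status": meta.get("status")})
        for name in EDGE_FIELDS:
            targets = meta.get(name, [])
            if isinstance(targets, str):
                targets = [targets] if targets else []
            for target in targets:
                edges.setdefault(uid, []).append({"edge": name, "to": target})
    return index, edges


def to_jsonl(rows: list) -> str:
    """Serialize index rows, one JSON object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)
```

Because `rebuild` is a pure function of the files, running it twice yields the same index, which is why a damaged index can simply be regenerated.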

Deletion is always soft. The canonical gesture moves entries to a dated recycle folder with a logged reason; raw rm of governed substrate is forbidden by signed doctrine — a rule earned through incident, not theory.
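
A soft delete of this kind might look like the sketch below. It is an illustration rather than tropo-recycle.py: the `recycle/` folder name, the `recycle-log.jsonl` file, and the log record shape are all assumptions.

```python
import json
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def recycle(vault: Path, uid: str, reason: str, today: Optional[date] = None) -> Path:
    """Move vault/files/<uid>.md into a dated recycle folder and log the reason."""
    if not reason.strip():
        raise ValueError("a reason is required for every recycle")
    source = vault / "files" / f"{uid}.md"
    if not source.is_file():
        raise FileNotFoundError(source)

    day = (today or date.today()).isoformat()
    target_dir = vault / "recycle" / day  # hypothetical folder layout
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.move(str(source), str(target))  # the file is moved, never unlinked

    # Append-only log of what was recycled, when, and why (hypothetical file).
    with (vault / "recycle" / "recycle-log.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps({"uid": uid, "reason": reason, "date": day}) + "\n")
    return target
```

The content stays on disk under the dated folder, so an erroneous recycle is reversible by moving the file back.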

The Vault: Flat Files, Graph Semantics, Derived Surfaces
3,400+ governed entries, ~8,500 directed typed edges (June 2026). Files are authoritative; every index and view is derived and rebuildable.

Authoritative store (flat by design): vault/files/
  - bfcd1a5b.md: type: decision (ADR-044)
  - d409c333.md: type: activation (Argus A106)
  - f4a7d2ce.md: type: project (v1.66 cycle root)
  - eca73d77.md: type: document (L1 entry)
  - …one UID-named file per governed artifact…
  No nested folders to rot. Renames and moves can never break references: everything cites by UID, not path. Organization is graph membership, not directory location.

Graph semantics in frontmatter
  member_of: [6dff0111]        # home project(s)
  governed_by: 8dd772a0        # governance edge
  refs: [476fef2e, 70b24992]   # citations
  subsystem_hub: [8dd772a0]    # owning hub
  superseded_by: …             # honest history
  Typed edges; multi-parent allowed. Projects, inboxes, and collections are graph nodes: "where does this go?" becomes "which project owns this?" Every entry renders a human nav-block (path, parent, children, siblings, cited-by): filesystem affordances on the graph, plus inbound references.

Derived surfaces (rebuilt, never trusted as truth)
  - vault/00-index.jsonl: one row per artifact (type, status, owner, tags, title)
  - 00-graph-index.json: O(1) edge traversal by UID
  - SQLite query runtime (+FTS): frontmatter query layer; meta-status as a VIEW, raw stored
  - 00-tropo-nav/ + boards/ + crew brief: human navigation tree, dashboards, rendered projections
  rebuild-vault.py regenerates everything; rebuild --only <uid> freshens a single entry incrementally. Indexes are self-healing: corruption is recoverable from files.

The read/write cycle (files first, derived views second)
  1. AGENTS + HUMANS write typed markdown (tool-mediated for structured ops).
  2. VAULT FILES: vault/files/<uid>.md, the single source of truth.
  3. REBUILD PASS: index · graph · SQLite · nav · boards · validator (~58 checks).
  4. VIEWS: query + navigate. Humans and agents read derived views; truth round-trips through the files.

Deletion is always soft: tropo-recycle.py moves entries to a dated recycle folder with a logged reason. Raw rm of governed substrate is forbidden by doctrine and incident-tested.

"The vault IS the system. Reading the vault is equivalent to understanding the system." (Studio configuration, argo-os)
Figure 3 — The Vault: flat store, graph semantics, derived surfaces.

4 Agent lifecycle: boot, session, retirement, succession

This is the subsystem most foreign to a traditional architecture review, and the one Tropo considers its load-bearing differentiator. The premise: an agent is not the model. An agent is a composite — a soul document (character and behavioral rules), accumulated memory, the Vault, the crew context, and whatever model "sleeve" is running it today. Sessions end; the composite persists in files.

4.1 Boot: three tiers, six gated groups

Activation runs through a three-tier configuration chain: an OS-tier floor (universal structure and hard gates), a Studio-tier extension (Studio-wide required reads and the event-drain protocol), and an agent-tier extension (this agent's soul path, board filters, opt-outs). A structured activation playbook then executes six groups in strict order — boot configuration, identity verification, context loading, operational grounding, self-diagnostic, startup signal — with each group writing a milestone event to a per-run log before the next may begin. The gates are structural, not advisory: a group whose predecessor milestone is absent from disk stops.
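
The structural gate can be sketched as a check against the run log before any group starts. This is a hedged illustration: the group identifiers and the `{"kind": "milestone", "group": ...}` event shape are assumptions, and the sketch requires every earlier milestone rather than only the immediate predecessor, which is a slightly stricter reading of the rule.

```python
import json

# The six groups in strict order; these identifiers are hypothetical.
GROUPS = ["boot-config", "identity", "context", "grounding", "diagnostic", "signal"]


def milestone(group: str) -> str:
    """Serialize one milestone event as a JSONL line (assumed shape)."""
    return json.dumps({"kind": "milestone", "group": group})


def completed_groups(run_log_lines: list) -> set:
    """Collect the groups that have a milestone event in the run log."""
    done = set()
    for line in run_log_lines:
        event = json.loads(line)
        if event.get("kind") == "milestone":
            done.add(event["group"])
    return done


def may_start(group: str, run_log_lines: list) -> bool:
    """A group may start only if every earlier group's milestone is on record."""
    position = GROUPS.index(group)
    done = completed_groups(run_log_lines)
    return all(earlier in done for earlier in GROUPS[:position])
```

With only the boot-config milestone on record, `may_start("identity", log)` is true and `may_start("context", log)` is false: the gate reads the record, not the agent's claim.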

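Two hard gates protect lineage integrity at identity verification:

  1. ADR-016: if the predecessor generation is still ACTIVE, activation halts. Two live generations of one agent is a governance violation.
  2. ADR-028: if the new generation number is not the predecessor's plus one, activation halts. Lineage integrity requires human resolution.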

Both are validated twice: at boot, and at write-time by the tool that creates the activation record.
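
The two identity gates, as Figure 4 states them (ADR-016: predecessor still ACTIVE; ADR-028: generation is not predecessor + 1), together with the "validated twice" property, can be sketched as one check called from both places. The function names, the `LineageHalt` exception, and the registry record shape are assumptions for illustration only.

```python
class LineageHalt(Exception):
    """Raised when activation must stop and wait for human resolution."""


def check_lineage(predecessor: dict, new_generation: int) -> None:
    # ADR-016: two live generations of one agent is a governance violation.
    if predecessor["status"] == "ACTIVE":
        raise LineageHalt("ADR-016: predecessor is still ACTIVE")
    # ADR-028: generations must advance by exactly one.
    if new_generation != predecessor["generation"] + 1:
        raise LineageHalt("ADR-028: generation is not predecessor + 1")


def boot_identity_gate(registry: list, new_generation: int) -> None:
    """First validation: at boot, before any context is loaded."""
    check_lineage(registry[-1], new_generation)


def write_activation_entry(registry: list, agent: str, generation: int) -> dict:
    """Second validation: the writing tool re-runs the same gates itself."""
    check_lineage(registry[-1], generation)
    entry = {"agent": agent, "generation": generation, "status": "ACTIVE"}
    registry.append(entry)
    return entry
```

Putting the check inside the writing tool means a boot procedure that skipped or misread the gate still cannot produce a bad lineage record.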

A deliberate cultural gate rides the boot as well: the self-diagnostic. Every agent, at every boot, is required to critique the system it just loaded — is anything outdated, counterproductive, or missing? — and to verify its predecessor's handoff claims against current substrate before trusting them. The inherited system is treated as "the best the predecessor had time to build," never as correct by default. This is the structural antidote to generational ossification.

4.2 Retirement and succession

Retirement is a governed fold, not an exit: the retiring generation writes a forward-looking living transfer at peak context, a backward-looking honest reflection, has its memory folded by a curator (next section), flips its status card, and closes its activation registry entry. The successor boots through the same gates and — by playbook requirement — verifies the transfer's carry-forward claims against the live substrate, because handoffs are snapshots and snapshots drift.

The lineage of record is the set of typed activation entries in the Vault: one row per generation, machine-checkable, graph-walkable. Sleeve changes (one model family to another) are recorded as material facts in that lineage.

Worked proof at scale: the Chief Architect role has run 100+ generations; the Chief of Staff 60+; the whole eight-agent crew turns over continuously, and open work survives every single handoff. The author of this document is generation A107 of its role, writing with full inherited context.

Agent Lifecycle: Boot, Session, Retirement, Succession
An agent is the stack, not the model: soul + memory + vault + crew + model sleeve. Generations die; the agent persists in files.

Three-tier boot configuration (ADR-032)
  - Tier 1, OS floor: .tropo/boot-config.md (universal gates + groups)
  - Tier 2, Studio extension: .tropo-studio/ (Studio-wide reads, event drain)
  - Tier 3, Agent extension: per-agent soul path, board filter, skip declarations

Activation playbook: six gated groups (milestones written to run.jsonl; a group cannot start until the prior milestone fired)
  0 Boot config: resolve root, read 3 tiers, plan
  1 Identity: HARD GATES ADR-016 / ADR-028
  2 Context: soul FIRST, then memory + transfer
  3 Grounding: crew brief, status card, scanners
  4 Diagnostic: critique the boot, verify transfer fresh
  5 Signal: startup signal to the principal; begin

  - ADR-016: predecessor still ACTIVE → HALT. Two live generations of one agent is a governance violation.
  - ADR-028: generation ≠ predecessor + 1 → HALT. Lineage integrity requires human resolution.
  Identity files resolve by UID from the activation thin-pointer: charter · soul · status card · boot extension. The activation registry (typed entries) is the lineage of record.

The generational loop (institutional knowledge survives through files, not model memory)
  1. BOOT (A·N): reads soul letter, curated memory, predecessor's transfer; arrives already itself.
  2. SESSION: governed work in the vault; events emitted; memories appended to episodic log; principal directs at the gates.
  3. RETIRE: living transfer (FINAL), reflection, memory fold by curator, status → RETIRED, activation entry closed.
  4. SUCCESSOR (A·N+1): boots through the same gates; verifies transfer claims against current substrate before trusting.
  Continuity artifacts: soul letter (stable) · curated memory + append-only episodic log · living transfer · reflection · activation registry row.

Worked proof at scale: the Chief Architect role alone has run 100+ generations across multiple model families with identity, memory, and open work surviving every handoff. Sleeve changes (Opus → Sonnet → other vendors) are recorded; the agent is the composite, the model is a component.

Two-axis identity (every actor, human or agent, has stable UIDs)
  - Party UID (the messaging axis): source and subject of every directed event; generation-stable; tool guards reject any directed message sent from or to the wrong axis.
  - Agent-root UID (the lineage axis): long-lived root project; all activations, cycle work, and carry-forwards are members of it; powers generation queries and backlog boards.

Boot is gated, logged, and self-diagnosing; retirement is a governed fold, not an exit. The handoff is the product.
Figure 4 — Agent lifecycle: gated boot, generational succession, two-axis identity.

5 Memory architecture (v3.0)

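Memory is treated as load-bearing infrastructure, designed to the same standard as the work substrate. Version 3.0 (built and canary-proven in June 2026, currently cascading across the crew) has a deliberately simple shape:

  1. One curated surface. agent-memory.md is the single file an agent reads at boot. It routes and surfaces; canonical artifacts hold the substance.
  2. One append-only episodic log. agent-memories.jsonl receives pins, lessons, decisions, and corrections during work, one JSON line each, and is never cleared.
  3. Governed folds. An ephemeral curator (sa.memory-curator) folds the session's episodic entries into the surface at every retirement and snapshots the prior surface to history/. As insurance, a staleness gate triggers a mechanical catch-up fold at boot when three or more generations, or fifty or more unfolded entries, have accumulated since the last fold.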

The design lesson encoded here generalizes: don't trust the discipline; let the substrate catch the lapse. A healthy agent never trips the staleness gate. The gate exists because health is not guaranteed.

A Studio-tier shared memory carries crew-wide doctrine pins with the same shape; every agent inherits it at boot.
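
The staleness gate reduces to a small predicate. The thresholds below are the ones Figure 5 gives (three or more generations, or fifty or more unfolded entries, since the last fold); the function names and the idea of tracking the fold boundary as a line offset into the episodic log are assumptions for this sketch.

```python
def unfolded_count(log_lines: list, fold_boundary: int) -> int:
    """Count episodic entries appended after the last fold.

    fold_boundary is a hypothetical line offset recorded at the previous fold.
    """
    return sum(1 for line in log_lines[fold_boundary:] if line.strip())


def needs_catch_up_fold(
    current_generation: int,
    last_fold_generation: int,
    unfolded_entries: int,
    max_generations: int = 3,
    max_entries: int = 50,
) -> bool:
    """True when the boot should dispatch a mechanical catch-up fold."""
    generations_since_fold = current_generation - last_fold_generation
    return generations_since_fold >= max_generations or unfolded_entries >= max_entries
```

A crew that folds at every retirement never trips this gate, which matches the text: the check exists for the sessions where the discipline lapsed.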

Memory Architecture v3.0: Single Surface + Append-Only Episodic Log
Built and canary-proven June 2026. One curated read at boot; one append-only write during work; governed folds in between. Nothing is ever deleted.

Per-agent memory capsule · agents/<name>/.tropo-capsule/memory/
  - agent-memory.md (THE boot read)
      § Top-of-Mind: priority-ordered durable pins + doctrine
      § Living Transfer from Predecessor: the handoff, aging policy applied
      § History (pointer): frozen per-generation snapshots in history/
      § Memories (pointer): points at the episodic log, never inlined
  - agent-memories.jsonl (episodic log): append-only FOREVER. Mid-session pins, lessons, decisions, corrections, one JSON line each. Never cleared: the full episodic arc stays reconstructable for future reconsolidation.
  - history/ (frozen snapshots): per-generation archives written at each fold; the rollback and the audit trail for curation.
  Boot reads ONE file. Sessions write ONE log. The curated surface routes and surfaces; it never restates substantive content, because canonical artifacts hold the substance. Studio-tier shared memory mirrors the same shape for crew-wide doctrine pins (every agent inherits them at boot).

sa.memory-curator (the governed fold): an ephemeral specialist agent dispatched with a trigger
  - trigger: retire (steady state). Folds the session's episodic entries into the surface at every retirement; advances the boundary; snapshots history.
  - trigger: boot / catch_up (insurance). F5 staleness gate: ≥3 generations OR ≥50 unfolded entries since last fold → mechanical catch-up fold at boot.
  - trigger: migrate (one-time). Non-destructive v2→v3 surface conversion; old files kept as rollback until verified crew-wide.
  The booting agent ratifies every curator recommendation (ACCEPT / REJECT / DEFER) before it applies. Curation is reviewed, logged in the curator's activation record, and bounded. A curator must never become a drift source. Design rationale: lapses are caught by the substrate, not by discipline.

The memory loop across a generation
  1. BOOT: read agent-memory.md (F5 gate checks staleness).
  2. WORK: append pins to agent-memories.jsonl.
  3. RETIRE → FOLD: curator folds entries into the surface; snapshot history.
  4. SUCCESSOR BOOTS: one clean read; full arc still on disk.
  Repeat: 100+ generations and counting.

Design thesis: memory architecture is load-bearing infrastructure. The append-only log is the substrate for future creative reconsolidation ("dreaming"); the arc must always be reconstructable.
Figure 5 — Memory v3.0: one surface, one log, governed folds.

6 The event system: coordination as an append-only audit trail

All coordination flows through one canonical log: vault/events/00-events.jsonl — append-only, tool-mediated writes only, one CloudEvents v1.0 envelope per event, correlation IDs for reply chains, currently 3,900+ events. Directed messages, replies, acknowledgements, crew broadcasts, and the telemetry auto-emitted by every substrate-writing tool (rebuilds, recycles, activation writes, pipeline operations, validator runs, releases) all land in the same record.
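
A minimal sketch of what one envelope and one append could look like follows. The required CloudEvents v1.0 context attributes (`id`, `source`, `specversion`, `type`) and the optional `subject`, `time`, and `datacontenttype` come from the CloudEvents specification; everything Tropo-specific here is an assumption, including the event type strings, the use of a `correlationid` extension attribute for reply chains, and the helper names. This is not the emit-event tool.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_event(source: str, event_type: str, data: dict,
               subject: Optional[str] = None,
               correlation_id: Optional[str] = None) -> dict:
    """Build one CloudEvents v1.0 envelope as a plain dict."""
    event = {
        "specversion": "1.0",             # required
        "id": str(uuid.uuid4()),          # required, unique per source
        "source": source,                 # required, a URI-reference
        "type": event_type,               # required
        "time": datetime.now(timezone.utc).isoformat(),  # optional, RFC 3339
        "datacontenttype": "application/json",
        "data": data,
    }
    if subject is not None:
        event["subject"] = subject
    if correlation_id is not None:
        # Extension attribute; CloudEvents extension names are lowercase alphanumerics.
        event["correlationid"] = correlation_id
    return event


def append_event(log_path: str, event: dict) -> None:
    """Append-only write: one JSON object per line, never rewriting history."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event, sort_keys=True) + "\n")


def reply_chain(events: list, root_id: str) -> list:
    """Return the root event plus every event correlated to it, in log order."""
    return [e for e in events if e["id"] == root_id or e.get("correlationid") == root_id]
```

Opening the file in append mode is what keeps the sketch honest about "append-only": nothing in it can reorder or edit earlier lines.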

Three properties matter:

1. Projections, not authored surfaces. Before this foundation, four coordination substrates were hand-authored and drifted independently: channels, status cards, activation entries, and the crew brief. Across a six-release arc, sixteen crew-internal channels were retired outright; agents now read the log directly, and the surviving human-facing surfaces are rendered projections of it. One source of truth, many views — the same doctrine as the Vault index.

2. Identity-guarded writes. Every actor — human or agent — has a registered UID, and each agent carries two on two axes: a party UID (messaging) and an agent-root UID (lineage). Real incidents demonstrated messages sent to or from the lineage axis going unseen. The emission tool now rejects wrong-axis traffic in both directions — wrong-axis messaging is structurally impossible to send, not merely discouraged. Superseded identities persist as resolvable tombstones (so historical references never dangle) and are rejected as signers.

3. Obligation surfacing. A reply_required flag creates a visible obligation that drives executive polling cadences; the operating bar, set by the principal after lived failure, is that a message addressed to an agent cannot be missed, and a completion cannot be invisible — by construction.
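
The identity guard and the obligation query from properties 2 and 3 can be sketched together. The UID values, the registry sets, the `GuardError` exception, and the placement of `reply_required` inside the event's `data` are all assumptions for the example; the real emission tool is not shown in this document.

```python
# Hypothetical registries; real UIDs would come from the Studio's identity registry.
PARTY_UIDS = {"11aa22bb", "33cc44dd"}   # messaging axis
AGENT_ROOT_UIDS = {"55ee66ff"}          # lineage axis
TOMBSTONED_UIDS = {"77aa88bb"}          # superseded, still resolvable


class GuardError(ValueError):
    """Raised when a directed message fails an identity guard."""


def _require_party(uid: str, role: str) -> None:
    if uid in AGENT_ROOT_UIDS:
        raise GuardError(f"{role} is a lineage-axis UID; use the party UID")
    if uid not in PARTY_UIDS:
        raise GuardError(f"{role} must be a registered party UID")


def guard_directed(source: str, subject: str) -> None:
    """Reject wrong-axis traffic in both directions, and stale signers."""
    if source in TOMBSTONED_UIDS:
        raise GuardError("superseded identity cannot sign")
    _require_party(source, "source")
    _require_party(subject, "subject")


def open_obligations(events: list, party: str) -> list:
    """IDs of reply_required events addressed to party that it has not answered."""
    answered = {e.get("correlationid") for e in events if e["source"] == party}
    return [
        e["id"] for e in events
        if e.get("subject") == party
        and e.get("data", {}).get("reply_required")
        and e["id"] not in answered
    ]
```

Because the guard runs inside the only write path, a wrong-axis message fails at send time instead of sitting unread in the log.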

The Event System: One Append-Only Log, Guarded Writes, Rendered Views
All coordination (agent↔agent, agent↔human, tool telemetry) flows through one CloudEvents-enveloped JSONL log. Channels and status surfaces are projections.

Producers
  - Agents + humans: directed messages, replies, acks, crew broadcasts
  - Substrate-write tools (auto-emit): rebuild, recycle, activation writes, pipeline ops, validator, releases
  - Lifecycle events: cycle opened/closed, ship-gate progress, fleet-ops dispatches
  Every actor has a registered UID, including the human principal.

emit-event (the gate)
  - GUARD: source must be a registered party UID
  - GUARD: directed messages must be ADDRESSED to a party UID (agent-root / lineage UIDs rejected on both axes; wrong-axis messaging is structurally impossible to send)
  - Strict mode default; schema-validated envelope

The canonical log: vault/events/00-events.jsonl
  - append-only · tool-mediated writes only
  - CloudEvents v1.0 envelope per event
  - correlation IDs for reply chains
  - reply_required flags surface obligations
  - 3,900+ events to date: who said what, when, in reply to which prior event; the coordination audit trail of record
  - plus a derived SQLite view for query

Consumers
  - query-events: per-party cursors; boot drains; polling curve for executives
  - Rendered projections: user-facing channels, crew brief, status surfaces
  - Wakeup triggers: reply_required drives the continuous-listen cadence
  - Recovery primitive: rebuilds the SQLite view from the log at any time

Why projections, not authored surfaces (the drift problem and its closure)
  Before the events foundation, four substrates were hand-authored and drifted independently: channels, status cards, activation entries, the crew brief.
  The arc (six releases): events foundation → projection renderer → tool auto-emission + continuous listen → structural discipline → callable-surface close → channel retirement.
  End state: sixteen crew-internal channels retired; coordination reads happen against the log; the two user-facing surfaces that remain are rendered projections. One source of truth, many views. The same doctrine as the vault index: authored once, derived everywhere.

Identity integrity in messaging (hardened iteratively, incident-driven)
  • Every agent has two UIDs on two axes: a party UID (messaging) and an agent-root UID (lineage). Real incidents showed messages sent to or from the lineage axis going unseen.
  • Fix sequence: registry party-axis rollout → send-axis guard (source must be party) → address-axis guard (subject must be party). The failure class is now rejected at the tool.
  • Superseded identities remain as resolvable tombstones (kept for historical reference resolution) and are rejected as signers; spoof-by-stale-identity is a resolver-level check.
  • Known limitation (tracked, on the roadmap): the log has no cryptographic integrity yet. Event signing / hash-chaining is the named next frontier.

Bar set by the principal: a message addressed to an agent cannot be missed, and a completion cannot be invisible, by construction and not by discipline.
Figure 6 — The event system: guarded emission, one log, derived views.

7 Tropo Work: the work-management application

Work is the killer-app subsystem: agentic teams executing real work with audit trails, verification, and cross-generational continuity.

The primitives are the ones a Jira-literate team expects, expressed as typed files: tasks (owner, lifecycle, verification), projects (containers with boards), decisions (ADRs — forty-four and counting, each a binding architectural commitment with status and context), design briefs (pair-design walks captured as governed inputs), release plans and releases, notes (lightest capture), and collections (manifests of references — playlists, not folders).

Two design points worth a CTO's attention:

There is also a complete import → work → export loop for real-world documents: a user drops a .docx into the Studio; a sidecar entry and a markdown working-copy are created; agents edit in markdown while the source binary stays untouched; export rebuilds a deliverable .docx either preserving the source's exact formatting or transforming it through a registered house-style template. Drift between the working copy and off-system edits to the source binary is detected and surfaced for resolution.
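
The document does not say how drift on the source binary is detected. One plausible mechanism, shown here purely as an assumption, is to record a content hash of the source .docx in the sidecar at import time and compare it before export. The sidecar shape and function names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the raw bytes of a source document."""
    return hashlib.sha256(data).hexdigest()


def make_sidecar(source_bytes: bytes) -> dict:
    """Record the source's fingerprint at import time (hypothetical sidecar shape)."""
    return {"source_sha256": fingerprint(source_bytes)}


def source_drifted(sidecar: dict, current_source_bytes: bytes) -> bool:
    """True when the source binary was edited off-system after import."""
    return fingerprint(current_source_bytes) != sidecar["source_sha256"]
```

A byte-level hash flags any change, including ones that do not alter visible content, so a real implementation would surface the mismatch for human resolution rather than act on it automatically.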

8 Pipelines and playbooks — orchestration that cannot quietly skip steps

Pipelines are declarative workflow templates — a DAG of nodes (pipeline → stage → step), authored once and versioned. Each execution is a typed pipeline-run instance that pins the template version at start, roots its own project, and keeps its own event log. This is the familiar DAG/DAG-run pattern (Airflow, BPMN), expressed in markdown and governed by capsule.

Playbooks are governed procedures in natural language — activation, retirement, cold-boot testing, update application, fleet operations, grooming, onboarding. An agent reads the playbook and executes it; gated playbooks write milestone events to their run log, and later groups structurally cannot begin until the prior milestone exists on disk. The playbook is simultaneously the spec and the audit trail of its own execution.
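The milestone gate can be sketched as a read of the run log before a group starts. This is a hedged illustration only: it assumes a JSON Lines run log and invents the event fields `type` and `name` and the `requires` mapping; the real event envelope may differ.

```python
import json
from pathlib import Path
from typing import Optional


def milestones_on_disk(run_log: Path) -> set[str]:
    """Collect milestone names recorded in a JSON Lines run log."""
    if not run_log.exists():
        return set()
    found = set()
    for line in run_log.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") == "milestone":   # hypothetical field names
            found.add(event["name"])
    return found


def may_begin(group: str, requires: dict[str, Optional[str]], run_log: Path) -> bool:
    """A group may begin only if its prerequisite milestone exists on disk."""
    prior = requires[group]
    return prior is None or prior in milestones_on_disk(run_log)
```

Because the check reads the file rather than in-memory state, a fresh session that picks up the run mid-way reaches the same answer as the session that started it.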

The proof-of-pattern is the dev-pipeline — the scaffold through which every release of Tropo itself ships: design brief → locked dev-spec (adversarially gauntlet-reviewed before lock) → build → verification → ship gates → cut. Two couplings make quality structural rather than cultural:

Each pipeline takes a typed commitment at activation — dev-spec, doc-spec, test-spec — with acceptance criteria paired to behaviors; the engine refuses to lock a spec where they mismatch, and stub specs are a detected defect class.
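A lock check of this kind might look like the sketch below. It assumes a spec lists behaviors and maps each acceptance criterion to the behavior it verifies; the `Spec` shape, the criterion IDs, and the `lock` function are hypothetical, and the real engine's definition of "mismatch" may be richer.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    behaviors: set[str]
    criteria: dict[str, str]   # acceptance criterion id -> behavior it verifies
    status: str = "draft"


class LockRefused(Exception):
    """Raised when a spec cannot be locked in its current state."""


def lock(spec: Spec) -> Spec:
    # A spec with nothing to verify is treated as a stub, a defect in itself.
    if not spec.behaviors or not spec.criteria:
        raise LockRefused("stub spec: no behaviors or no acceptance criteria")
    covered = set(spec.criteria.values())
    uncovered = spec.behaviors - covered    # behaviors with no criterion
    orphaned = covered - spec.behaviors     # criteria pointing at no behavior
    if uncovered or orphaned:
        raise LockRefused(f"mismatch: uncovered={sorted(uncovered)} orphaned={sorted(orphaned)}")
    spec.status = "locked"
    return spec
```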

Pipelines + Playbooks: Orchestration That Cannot Quietly Skip Steps

Pipelines: declarative DAG templates with run instances. Playbooks: governed natural-language procedures. The dev-pipeline has shipped ~70 releases of Tropo itself.

Template vs. instance (DAG / DAG-run)

• pipeline (type): DAG of WorkflowNodes, pipeline → stage → step. Authored once, versioned, governed.
• pipeline-run (instance): pins the template version at start; own run folder + run.jsonl event log; every activation roots its own project.

Playbooks: procedures as governance

Natural-language, six-section governed workflows an agent reads and executes: activation · retirement · cold-boot testing · apply-update · fleet-ops · grooming · onboarding · import-reconciliation · release test plans. Gated playbooks write milestone events to their run's run.jsonl; a later group cannot begin until the prior milestone event exists on disk. The playbook is both the spec and the audit trail of its own execution.

The dev-pipeline: how Tropo ships Tropo (every release of the OS goes through this scaffold)

  1. DESIGN BRIEF: pair-design walk, captured + accepted
  2. DEV-SPEC (locked): adversarial gauntlet reviews before lock
  3. BUILD: substrate authored; acceptance criteria explicit
  4. VERIFY: validator + independent review + cold-boot + dogfood
  5. GATES 1–6: docs current · tests pass · sign-offs · cascade closed
  6. SHIP: version cut; release entry; cycle root archived

Coupling enforcement: the dev cycle triggers a doc-pipeline run (documentation current with every release) and a test-pipeline run (verification with every release), and structurally cannot close until both reach done. The ship gate refuses a release flip unless the validator is clean and the cascade pipelines are retired. Documentation and testing are not culture here; they are gate conditions a release cannot bypass.

Typed commitments at activation: each pipeline takes a contract, not a vibe

• dev-spec (what will be built): acceptance criteria paired to behaviors; the engine refuses lock on mismatch; gauntlet-reviewed before lock
• doc-spec (what docs must change): canonical-doc updates ride every release (keeper-of-lore lane)
• test-spec (what verification must show): verification with every release (chief-of-staff lane); stub specs are a detected defect class

A release is real when the version file flips and the cycle root archives; "done in substrate" without the cut is a tracked state, not a shipped one.
Figure 7 — Pipelines, playbooks, and the dev-pipeline ship path.
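The coupling between the dev cycle and its doc and test pipelines can be read as a function that lists every reason a release may not ship. This is a sketch under assumptions: the `CycleState` shape and status strings are invented for illustration, and "done" is used as the single closing status even though the real gate may distinguish further states.

```python
from dataclasses import dataclass


@dataclass
class CycleState:
    validator_errors: int
    cascade_status: dict[str, str]   # e.g. {"doc-pipeline": "done"}


REQUIRED_CASCADES = ("doc-pipeline", "test-pipeline")


def ship_blockers(state: CycleState) -> list[str]:
    """Return every reason the release cannot ship; an empty list means clear."""
    blockers = []
    if state.validator_errors > 0:
        blockers.append(f"validator reports {state.validator_errors} error(s)")
    for name in REQUIRED_CASCADES:
        # A cascade run that was never started counts as a blocker too.
        status = state.cascade_status.get(name, "missing")
        if status != "done":
            blockers.append(f"{name} is {status}, not done")
    return blockers
```

Returning the full list rather than stopping at the first failure gives the person at the gate the complete picture in one pass.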

9 Callable surfaces — tools, session agents, actions

Three classes of callable capability, all first-class governed substrate:

The doctrinal rule binding all three: if a capability exists, use it. Agents do not improvise operations the harness already knows how to do correctly. Capability catalogs (regenerated from the substrate) make "what exists" a boot-time read.

A deliberate boundary: tools are the paved road, never a mandatory gate. Hand-editing a file always works and is caught downstream by the validation gate and the groomers. This preserves the cold-boot floor — see the invariant in §10.

10 Governance and enforcement — the four loci

Governance is three-tier: OS-level invariants, Studio-level configuration (system map, constraints, agent registration), and per-folder contracts. On top of that sits the enforcement architecture made binding by ADR-044:

Four enforcement loci, defense in depth:

  1. Write-time — tools that enforce and normalize on write (messaging guards live today; work-management tools designed and next in line).
  2. Validate-time — the gate: the ~58-check validator at every rebuild and as build pre-flight. It reads schemas straight from the capsules (never hardcodes), lands new checks at WARN, and ratchets them to ERROR once the substrate is clean. A red gate blocks the ship.
  3. Continuous — grooming agents: cheap, narrow agents that normalize to canon, fix only the provable, log every fix, and surface judgment cases. (Cardinal rule: a groomer must never become a drift source.)
  4. Review — humans and agents under the signed Self-Healing primitive: every read carries a structural-defect pass; trivial defects are fixed in place, substantive ones filed as tracked work. Nothing is flagged-and-forgotten.
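The validate-time behavior in item 2 (schemas read from capsules, new checks landing at WARN, then ratcheting to ERROR) can be sketched as follows. The capsule layout, the field names, and the single "required field" check are assumptions made for illustration; the real validator runs many more checks.

```python
def required_fields(capsule: dict) -> list[str]:
    """Read required field names from the capsule itself, not a hardcoded list."""
    return [f["name"] for f in capsule["fields"] if f.get("required", False)]


def run_check(record: dict, capsule: dict, severity: str) -> list[tuple[str, str]]:
    """Return (severity, message) findings for one record."""
    return [
        (severity, f"missing required field: {name}")
        for name in required_fields(capsule)
        if name not in record
    ]


def gate_is_red(findings: list[tuple[str, str]]) -> bool:
    """Only ERROR findings block the ship; WARN findings are reported."""
    return any(sev == "ERROR" for sev, _ in findings)


def ratchet(severity: str, open_findings: int) -> str:
    """Promote a WARN check to ERROR once the substrate shows no open findings."""
    return "ERROR" if severity == "WARN" and open_findings == 0 else severity
```

The ratchet only moves one way in this sketch, which matches the tighten-only stance: a check that has reached ERROR is never relaxed by the function.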

The canonical fix pattern for any overloaded or under-enforced field is ENFORCE → DERIVE → DISAMBIGUATE → BACKFILL, applied one theme per cycle in dependency order, with dry-run-gated, reversible migrations. This is not aspiration; it is the pattern that resolved the system's own worst field-semantics debt across the v1.65–v1.66 cycles, measured against raw files at every step.
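The BACKFILL step's two properties, dry-run gating and reversibility, can be illustrated with a plan/apply/revert triple. This is a generic sketch, not the migration tooling used in the v1.65–v1.66 cycles; the record layout and the field names in the usage are hypothetical.

```python
def plan_backfill(records: dict[str, dict], old: str, new: str) -> list[tuple[str, str, object]]:
    """List (uid, field, value) additions without touching any record."""
    return [
        (uid, new, rec[old])
        for uid, rec in records.items()
        if old in rec and new not in rec
    ]


def apply_backfill(records: dict[str, dict], plan, dry_run: bool = True) -> list[tuple[str, str]]:
    """Apply the plan and return an undo journal; a dry run writes nothing."""
    journal = []
    if dry_run:
        return journal
    for uid, field_name, value in plan:
        records[uid][field_name] = value
        journal.append((uid, field_name))
    return journal


def revert(records: dict[str, dict], journal: list[tuple[str, str]]) -> None:
    """Undo a backfill by removing exactly the fields the journal recorded."""
    for uid, field_name in reversed(journal):
        del records[uid][field_name]
```

Because the plan only ever adds a field where none existed, reverting is a plain deletion of journaled entries and cannot disturb values that were present before the migration.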

Two invariants a reviewer should test us against:

The cold-boot invariant (sacrosanct): the validation gate may WARN on a hand-written file but must never hard-reject it. Structure tightens without ever breaking the plain-markdown floor; a stranger with a zip of the Studio can always boot it.

Locks are law. Files with locked status are immutable without explicit principal approval; lock-breaks are logged governance events, and the distinction between amendment-in-place (documented, status preserved) and semantic lock-break (principal-gated) is explicit.
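At the tool level, a lock rule of this kind could be sketched as a write guard that refuses unapproved changes and records approved ones. The `GovernedFile` shape, the `write` function, and the audit entry fields are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GovernedFile:
    uid: str
    status: str
    body: str


class LockViolation(Exception):
    """Raised when a locked file is written without principal approval."""


def write(file: GovernedFile, new_body: str, actor: str,
          principal_approval: Optional[str], audit: list[dict]) -> None:
    if file.status == "locked":
        if principal_approval is None:
            raise LockViolation(f"{file.uid} is locked; principal approval required")
        # An approved lock-break is itself a logged governance event.
        audit.append({"type": "lock-break", "uid": file.uid,
                      "actor": actor, "approval": principal_approval})
    file.body = new_body
```

A guard like this sits on the paved road only. Consistent with the limitations in §12.3, a writer with disk access can bypass it, and such edits are left to the validation gate and review to detect.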

Enforcement + Verification: Defense in Depth on a Language Base

ADR-044 (accepted June 2026): gradual structure on a language base. Four enforcement loci, one fix pattern, and a verification model where "done" means independently proven.

The four enforcement loci (each catches what the others miss)

  1. WRITE-TIME (TOOLS): Structured mutations route through tools that enforce + normalize on write (emit-event guards live; work-management tools designed, next). The paved road, never a mandatory gate; hand-editing always still works.
  2. VALIDATE-TIME (GATE): ~58-check validator at every rebuild + build pre-flight. Reads schemas straight from capsules; never hardcodes. New checks land WARN and ratchet to ERROR once substrate is clean. The build refuses to ship on a red gate.
  3. CONTINUOUS (GROOMERS): Cheap narrow agents patrolling the substrate: normalize to canon, fix the provable, surface judgment cases, drive the ratchet to zero. Cardinal rule: a groomer must never become a drift source. Provable fixes only, all logged.
  4. REVIEW (HUMANS + AGENTS): Self-healing posture on every read: fix trivial defects in place, file substantive ones as tracked work. Boot self-diagnostics. The principal verifies at the gates; bounded verification is the design center.

The canonical fix pattern for any overloaded or under-enforced field (one theme per cycle, dependency order)

  1. ENFORCE: turn the declared contract on (validator)
  2. DERIVE: compute rollups as views, never flatten
  3. DISAMBIGUATE: split fields doing two jobs; migrate readers
  4. BACKFILL: dry-run gated migrations, reversible

The verification model: "done" is a verified state, not a declared one

Three-instrument verification

  1. Build: the author's own check
  2. Independent review: peer or adversarial skeptic agent in a separate context
  3. Cold-boot stranger test: fresh context, artifact only; correct behavior proves the artifact

Each instrument catches what the others can't.

Structural completion gates

• A verification-class step closes only on a real verification receipt; executor attestation alone cannot promote it ("shipped failing-open" class closed).
• Verifier independence enforced at the gate: approver ≠ executor on approval-required work; identity resolver fails CLOSED (a guard that can silently no-op is no guard; adversarially tested 9/9).

Dogfood + stranger-encounter gates

• When an enforcement mechanism ships, it is fired on the shipping cycle's own substrate; the cycle proves itself or fails honestly.
• First-Use Walk: a capability that ships but is never encountered by a cold user has zero value; encounter is tested separately from correctness before ship.

The cold-boot invariant (sacrosanct): the validate gate may WARN on a hand-written file but must never hard-reject it. Structure tightens without ever breaking the plain-markdown floor; a stranger with a zip can always boot.

"Lock the implementations, not the manifesto. The doctrine informs; the ADR and the specs constrain." (ADR-044)
Figure 8 — Enforcement loci, the fix pattern, and the verification model.
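The structural completion gate in Figure 8 combines three refusals: no receipt, an unresolvable identity, and a verifier who is also the executor. The sketch below shows one way to order them; the `Receipt` shape, the `resolve` callable, and the identifiers in the usage are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Receipt:
    step_id: str
    verifier: str


class GateRefused(Exception):
    """Raised when a verification-class step may not be closed."""


def close_step(step_id: str, executor: str, receipt: Optional[Receipt],
               resolve: Callable[[str], Optional[str]]) -> str:
    # Executor attestation alone is never enough: a matching receipt must exist.
    if receipt is None or receipt.step_id != step_id:
        raise GateRefused("no verification receipt for this step")
    exec_id = resolve(executor)
    ver_id = resolve(receipt.verifier)
    # Fail closed: if either identity does not resolve, refuse rather than skip.
    if exec_id is None or ver_id is None:
        raise GateRefused("identity did not resolve; failing closed")
    if exec_id == ver_id:
        raise GateRefused("verifier must be independent of executor")
    return "done"
```

Comparing resolved identities rather than raw names matters here: two different handles that resolve to the same agent are treated as the same party, so an alias cannot be used to self-approve.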

11 Verification and quality — "done" means independently proven

The quality discipline, in one sentence: completion is a verified state, not a declared one. Concretely:

12 Security and assurance posture — the CISO view

12.1 What the architecture gives you for free

12.2 Controls explicitly in the design

12.3 Honest limitations (current, tracked)

A review document that hides its known gaps would fail this system's own doctrine, so:

  1. No cryptographic integrity on logs yet. The event log and run logs are append-only by convention and tooling, not by cryptography. Event signing / hash-chaining is the named next frontier on the roadmap (carved out of the verification cycle as its own work item). Until it lands, log tamper-resistance rests on filesystem controls and backups.
  2. Access control is conventions plus harness permissions, not OS-level ACLs. Anyone (or any process) with filesystem access can read or edit any file. Write-scope, locks, and kernel boundaries are enforced by tooling, validation, and review — they will catch and surface violations, but they do not prevent a hostile writer with disk access. In an enterprise deployment, disk/repo permissions and the AI harness's own controls are the perimeter; Tropo's layer is integrity detection and audit, not access prevention.
  3. The trust model assumes the harness. Agents act with the privileges of the AI product they run in. Tropo constrains and audits agent behavior at the substrate layer; it cannot constrain a harness with broader powers than the folder. Harness selection and configuration are therefore part of the security boundary and should be reviewed jointly.
  4. Enforcement coverage is intentionally gradual. Per ADR-044, structure is being turned on field-by-field, type-by-type, with WARN→ERROR ratchets. The high-value types are enforced today; the long tail is looser by design. The trajectory and the dial settings are inspectable at any time.

None of these is news to the system; all are tracked work items inside it, which is itself the point: the security backlog lives in the same governed, auditable substrate as everything else.
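For readers unfamiliar with the hash-chaining named in limitation 1, the generic technique looks like the sketch below. It is not Tropo's design, which has not landed yet; the entry layout, the genesis value, and the function names are illustrative assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry


def chain_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": chain_hash(prev, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != chain_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A bare chain like this detects in-place edits but not a wholesale rewrite of the log by someone who recomputes every hash. Closing that gap needs a signed or externally anchored head, which is where event signing comes in.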

13 Maturity, scale, and trajectory

Operating evidence as of this review:

Trajectory (the v2.0 program, governed by ADR-044): complete the write-time tool family for work management, deploy the groomer fleet, drive remaining WARN ratchets to ERROR, land event-log cryptographic integrity, and finish the memory v3.0 cascade — each as a bounded, dry-run-gated cycle through the same pipeline that shipped everything above.

Product tiers: the L1 reviewed here is the foundation; the same operation contracts are designed to rise tier-invariantly into L2 (a served, live cockpit over the same substrate — in active development in a sibling repository) and L3 (hosted, with authentication and orchestration). Designing at L1 first is deliberate: it keeps the floor portable, auditable, and vendor-independent.

14 Summary for the reviewer

Tropo L1 is a small number of strong ideas, composed:

  1. Plain files as the universal substrate — auditable, portable, zero-infrastructure.
  2. Typed contracts (capsules) with gradual, tighten-only enforcement — structure without breaking the language floor.
  3. Durable agent identity with hard lineage gates and governed succession — AI staffing without continuity loss.
  4. One append-only event log as the coordination record — views are projections, never independent truths.
  5. Verification as a structural property — independent receipts, verifier independence, coupled doc/test pipelines, ship gates that refuse.

The system's strongest credential is reflexive: it is built, governed, versioned, and verified by itself, under real workload, with its failures recorded in its own substrate and converted into its own gates. Every claim in this document traces back to governed files in the Studio that a reviewer is welcome to read directly.

Glossary (minimal)

| Term | Meaning |
| --- | --- |
| Studio | One installation of Tropo: a folder. |
| Vault | The governed content store inside a Studio (vault/files/<uid>.md + derived indexes). |
| Capsule | A type definition: the schema contract for a class of governed file. |
| Agent / generation / sleeve | An agent is the durable composite (soul + memory + vault + crew); a generation is one session-lifetime of it; the sleeve is the underlying model running it. |
| Playbook | A governed multi-step procedure in natural language. |
| Pipeline / pipeline-run | A declarative DAG workflow template / one logged execution of it. |
| ADR | Architecture decision record (typed decision); binding once accepted. |
| Principal | The accountable human (here: the founder); the source of approvals at governance gates. |

Prepared inside the system it describes. Argus A107, Chief Architect — Argo Studio, 2026-06-10. Companion package (markdown source + standalone SVGs): argo-os/04-external-work/architecture-review/. Figures as-measured June 2026 and rounded; the substrate itself is the audit trail.