The Agentic Builders / Building with Maz
From Chatbot to Operating System
16 min read
How I went from typing prompts to running a crew of AI agents on an OS I built without writing a single line of code

Figure: Three layers, one durable substrate. The Tropo layer is where institutional memory lives across model and harness changes.
You've used AI well enough to know it's not just autocomplete. Maybe you write better with it. Maybe you've gotten an agent to ship something. Maybe you've felt the moment where an answer landed differently than you expected and thought: this changes how I work.
Two years ago I had that moment too. Today I'm running a company on a nine-agent operating system I built without writing a line of code.
This is the story of what happened in between — five phases of working with AI, one wall I hit at scale, and the realization that the thing I needed to build wasn't for me at all.
Phase 1: The Chatbot
In 2024, ChatGPT was magic. You typed a question, you got an answer. Sometimes the answer was wrong. Sometimes it was extraordinary. Either way, you were done in five minutes.
I used it the way everyone used it. Write me this email. Summarize this document. What's the competitive landscape for X. Give me a framework for Y.
It was a tool. A very impressive tool. But it was a tool the way a calculator is a tool — you pick it up, you use it, you put it down. There was no relationship. No continuity. No building on what came before.
I didn't know that mattered yet.
Phase 2: The Prompt Engineer
After a few months I got better at asking. Not just "write me an email" — structured prompts with context, constraints, examples, tone guidance. I learned that the quality of the output was directly proportional to the quality of the input. Garbage in, garbage out. Precision in, precision out.
This is where most professionals are right now. They've figured out that prompting is a skill. They've gotten good at it. They can get a model to produce something useful in one to five exchanges.
And then they're done. They close the tab. They come back tomorrow and start over.
I lived here for months. It was productive. It felt like progress. It was also a ceiling I couldn't see until I broke through it.
Phase 3: The Application
The ceiling broke when I stopped asking the AI to write things and started asking it to build things.
I'm the CMO at a company called MindBridge. I have a marketing knowledge base I've built over years — brand guidelines, messaging frameworks, competitive intelligence, positioning documents, sales plays, battle cards. Twenty-four files representing years of strategic thinking. The kind of institutional knowledge that lives in someone's head until they leave, and then it's gone.
I thought: what if the AI could read all of this and produce work that's actually on-brand, on-strategy, on-message — without me having to re-explain the context every time?
So I built a research application. Not by writing code — by directing ChatGPT to write the code for me. A RAG-enhanced pipeline integrated with my marketing knowledge base that produced research and writing that was better than anything I'd gotten from prompting alone. Not because the model was smarter. Because the model had context. Persistent, structured, domain-specific context.
That was the first real productivity leap. Not a little better. Dramatically better.
Phase 4: The Agent
Then someone told me to try Claude Code.
Claude Code is Anthropic's command-line tool for Claude. It runs in a terminal, including the one built into VS Code, it reads your files, it writes code, it executes commands. It's not a chatbot. It's a development environment with an AI that can see your entire project.
I installed it. I created my first agent.
Not "agent" in the marketing sense — not a chatbot with a personality. An agent in the operational sense: an AI with a specific role, specific knowledge, specific constraints, working inside a specific project alongside me. I gave it a commission. I gave it access to the codebase. I pointed it at the specs my strategy AI had written in a separate session, and I said: build this.
It built it. Not perfectly — I had to direct, review, redirect, push back. But it built working software from specifications that another AI had written, under my direction, in a workflow where I never typed a line of code.
That was the second productivity leap. I now had two AI agents — Metis, a strategist who designed the work, and an engineer who built it — coordinated through files I could read and edit. The strategist wrote specs. The engineer read specs and wrote code. I reviewed everything.
The results were, frankly, beyond anything I expected. We produced the second version of the research application in a fraction of the time. The quality was higher because the specs were better, and the specs were better because the strategist had full context of the domain.
Phase 5: The Crew
This is where things got weird. Good weird.
As the work got more complex, I noticed Metis could use some help. Not with thinking — with operations. Tracking what was done, what was next, what was blocked. The kind of work a chief of staff does for a CEO.
So I created one — Vela, a chief of staff. Her job wasn't strategy — it was making sure the strategic work actually got executed. Project management. Status tracking. Cross-referencing what one agent had promised with what another had delivered.
It worked. Different role, different timing, different cognitive load. Very helpful.
Then I added Argus, an architect for technical design reviews. Then Orpheus, a writer for documentation. Then Silas, a publicist on a completely different AI platform — because I wanted to prove the model wasn't locked to one vendor.
Nine agents. Multiple AI platforms. A crew.
And my context window — not the AI's context window, mine, the human's — started degrading.
The Wall
Here's where the story turns.
With nine agents producing work across multiple sessions, my brain was full. All the time. I was the bottleneck. Not because I couldn't make decisions — I'm good at decisions, I've been making them for decades. But because I was the only one carrying the full picture. Every agent knew its own lane. I knew all nine lanes plus the road they were on plus where the road was going.
I was copy-pasting context between agents. I was re-explaining decisions I'd already made. I was the human switchboard connecting AI endpoints that couldn't talk to each other.

Figure: The cost of being the integration layer. Every line is work the human does because the agents can't talk to each other. This is what doesn't scale.
And I thought: this is not going to work.
Not "this specific project isn't going to work." This model isn't going to work. The model where the human is the integration layer between AI agents — that breaks. It breaks at nine agents and it would break at three if the work were complex enough.
So I went back to first principles.
I thought about protocols. I thought about TCP/IP — I programmed sockets in C on HP-UX in 1997, so the networking layer isn't abstract to me. I thought about what makes systems scale: not smarter components, but cleaner interfaces between components. Not more capable agents, but better primitives for agents to coordinate through.
And I realized the thing I needed to build was not a better app. It was an operating system.
Not for me. For them.
The Inversion
This is the part that sounds backwards until you think about it for thirty seconds.
Every operating system ever built was designed for humans. Mac OS, Windows, Linux, iOS, Android — the primary user is a person. The interface is shaped around human cognition. Folders are named with words because humans navigate by reading. Files are organized in hierarchies because humans think in hierarchies. Everything about the experience is designed for a person sitting at a keyboard.
My agents don't need any of that.
My agents need structured metadata they can parse in milliseconds. They need governance rules written in plain language they can read and follow. They need identity — who they are, what they're authorized to do, what their boundaries are. They need memory that persists across sessions, because every agent dies when the context window closes and the next one needs to pick up where the last one left off. They need communication channels to coordinate without routing everything through me.
None of that requires a graphical interface. None of it requires a server. None of it requires code.
All it requires is clear writing in a structure a reasoning engine can follow.

Figure: The primary user of the OS determines the shape of the OS. Tropo is the agent-native substrate. Humans get the layer on top.
So I built it. An operating system for AI agents, running entirely on markdown files, a filesystem, and whatever language model you point at it. Claude, GPT, Gemini, Llama — it doesn't matter. If the model can read a text file and follow instructions, it can run on this OS.
We call it Tropo.
What Tropo Actually Does
Tropo is a governed folder. That's it. That's the whole stack. A folder with structure, rules, and primitives that any AI agent can read and operate within.

Figure: The crew, structurally. Communication runs through the shared substrate, not between the agents. Each agent reads what the others have written; each writes what the others will need to read. The human is one node, not the integration layer.
Inside that folder:
Governance. Every workspace has layered rules — what the OS requires, what the organization specifies, what each folder's owner defines. Three layers, three files, three authors. The AI reads them at startup and operates within the boundaries. No code enforcement. The governance is written in language, and language models follow language.

Figure: An AGENTS.md from the working studio. Frontmatter declares the rules; body explains the purpose. An agent reads this file on entry to the folder and operates within the contract. Governance is markdown.
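To make the layering concrete, here is a minimal Python sketch of how three governance files could resolve into one set of rules. It is an illustration, not Tropo's implementation: the field names (`review_required`, `tone`, `owner`), the sample file contents, and the precedence rule (a higher layer's value cannot be overridden by a lower one) are all assumptions. In the system described here nothing enforces the rules in code; the agent simply reads the files.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a markdown document into (frontmatter fields, body).

    Handles only flat `key: value` lines between two `---` fences,
    which is a small subset of YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for i in range(1, len(lines)):
        line = lines[i]
        if line.strip() == "---":
            return fields, "\n".join(lines[i + 1:]).strip()
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat everything as body


def effective_rules(layers: list[str]) -> dict[str, str]:
    """Merge layers ordered OS, organization, folder.

    Assumed precedence: the first layer to set a field wins, so a
    folder can add rules but cannot override what the OS requires.
    """
    merged: dict[str, str] = {}
    for text in layers:
        fields, _ = parse_frontmatter(text)
        for key, value in fields.items():
            merged.setdefault(key, value)
    return merged


# Hypothetical file contents, one per layer.
OS_LAYER = """---
review_required: always
---
Rules every workspace must follow.
"""

ORG_LAYER = """---
tone: direct
review_required: never
---
Organization defaults.
"""

FOLDER_LAYER = """---
owner: vela
tone: formal
---
This folder holds status reports.
"""

rules = effective_rules([OS_LAYER, ORG_LAYER, FOLDER_LAYER])
print(rules)
```

The frontmatter is the part a script can check. The body text is the part only a reasoning model can follow, which is why the article treats the language itself as the governance.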
Agent identity and lifecycle. Each agent has an activation file, a charter, a generation log. When an agent's session ends and a new one starts, the new agent reads the files its predecessor left behind. Not just the work — the judgment. The lessons. The things that worked and the things that didn't. We've had over forty generations of some agents. The company didn't just survive each death. It got better.
Memory. Not chat history — curated knowledge that persists across sessions and across agents. Standing rules. Design decisions. What the founder cares about. What the last three generations learned. Short files, indexed, loaded at startup. The equivalent of institutional knowledge, but written down where the AI can actually use it.
Communication channels. Agents post to shared files. A strategy agent can leave findings for an operations agent without the human copy-pasting between windows. The human reviews, but the human isn't the switchboard anymore.
Work management. Tasks, projects, boards — the same primitives any project management system has, but written in markdown that agents read and update as part of their workflow. No separate tool. The work tracking lives in the same filesystem as the work.
Playbooks. Reusable, governed procedures that any agent can execute. Not scripts — natural-language instructions with rules, verification steps, and failure handling. A playbook says "here is how to do this kind of work." An agent reads it and does the work. A different agent can read the same playbook next week and do the same work without re-learning anything.
A self-updating pipeline. The OS can receive and apply updates to itself — including schema migrations — without a server, without a package manager, without an internet connection. A zip file and a playbook. That's the update mechanism.
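The mechanical half of a zip-based update can be sketched in a few lines of Python. This is a guess at the shape of the step, not the actual mechanism: in the article the update is applied by an agent following a playbook, and the function name `apply_update` and the overwrite-in-place behavior are assumptions. Schema migrations and any judgment about conflicts are left out, since those would sit with the playbook.

```python
import zipfile
from pathlib import Path, PurePosixPath


def apply_update(update_zip: Path, workspace: Path) -> list[str]:
    """Unpack an update archive over a workspace; return the files written."""
    written = []
    with zipfile.ZipFile(update_zip) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            rel = PurePosixPath(info.filename)
            # Refuse entries that could land outside the workspace.
            if rel.is_absolute() or ".." in rel.parts:
                raise ValueError(f"unsafe path in update: {info.filename}")
            target = workspace.joinpath(*rel.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(rel.as_posix())
    return sorted(written)
```

Nothing here touches a network or a package index. The only inputs are a local archive and a local folder, which matches the offline claim above.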
All of it runs on one constraint I insisted on from the beginning: the only tech stack is markdown files, a filesystem, and a language model. No server. No database. No web APIs. No vendor lock-in. You can run Tropo from a 200 KB zip download on any computer with any AI model. Complete portability.
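With no server and no database, the communication channels described above reduce to appending to a shared file and reading it back. Here is a minimal Python sketch under that assumption. The entry format (a `## timestamp | author` heading followed by the message), the file path, and the function names are hypothetical, and in practice an agent would do this with its own file tools, not a script.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def post(channel: Path, author: str, message: str) -> None:
    """Append one timestamped entry to a shared markdown channel file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    channel.parent.mkdir(parents=True, exist_ok=True)
    with channel.open("a", encoding="utf-8") as f:
        f.write(f"## {stamp} | {author}\n\n{message.strip()}\n\n")


def read_entries(channel: Path) -> list[tuple[str, str]]:
    """Return (author, message) pairs in the order they were posted.

    Assumes no message contains a line that starts with '## '.
    """
    if not channel.exists():
        return []
    text = channel.read_text(encoding="utf-8")
    entries = []
    for chunk in re.split(r"^## ", text, flags=re.MULTILINE)[1:]:
        header, _, body = chunk.partition("\n")
        author = header.split(" | ", 1)[1].strip()
        entries.append((author, body.strip()))
    return entries
```

A strategy agent would call `post(Path("channels/ops.md"), "Metis", "...")` and an operations agent would later call `read_entries` on the same path. The file stays readable and editable by the human reviewer.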
What Just Happened
Once the OS worked, we started building apps on top of it. And then we shipped.
The first app was a data migration tool. It uses all the OS primitives — governance, playbooks, verification, the Vault (our flat-file data store with universal identifiers for every artifact). My strategist designed a twelve-page playbook in plain English. My chief of staff read the playbook and executed it — dispatching sixteen sub-agents in parallel, each one reading the same rules, producing structured reports. 291 files migrated. Zero failures. Seven bidirectional migration maps. The entire vault's governed work moved into a new framework without a single line of code.
Then we built the release pipeline. Another playbook. A Python script for the mechanical layer — copying files, generating manifests, bumping version numbers. A reasoning agent for the judgment layer — verifying capsule compliance, writing release notes. And a cold-boot test: a fresh agent dropped into the build output with zero context, executing thirteen checks to prove a stranger can navigate the vault.
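The mechanical layer is the part that fits naturally in a script. The following Python sketch shows what such a layer might look like: copy the markdown files, record a hash for each in a manifest, bump the version. The actual script is not shown in this article, so the function names, the `manifest.json` filename and format, and the patch-only version bump are assumptions made for illustration.

```python
import hashlib
import json
import shutil
from pathlib import Path


def bump_patch(version: str) -> str:
    """Increment the last component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def build_release(source: Path, out: Path, version: str) -> dict:
    """Copy every markdown file from source into out and write a manifest.

    Assumes `out` is not inside `source`. Any previous build is removed
    first so the output reflects exactly one canonical source.
    """
    if out.exists():
        shutil.rmtree(out)
    files: dict[str, str] = {}
    for src in sorted(source.rglob("*.md")):
        rel = src.relative_to(source).as_posix()
        dest = out / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        files[rel] = hashlib.sha256(dest.read_bytes()).hexdigest()
    manifest = {"version": version, "file_count": len(files), "files": files}
    out.mkdir(parents=True, exist_ok=True)
    (out / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    return manifest
```

Everything in this sketch is deterministic, which is the point of splitting the pipeline. Compliance checks and release notes require reading for meaning, so they stay with the reasoning agent.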
The cold-boot test scored 9 out of 10 on the first passing run. We fixed the gaps: files that existed in one location but had not been synced to the source vault, a lesson in why the build must read from one canonical source. On the third iteration, all thirteen checks passed.
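The kind of gap described above, a file that exists somewhere but never reached the build, is the one kind of cold-boot check that is easy to script. The Python sketch below covers only that file-presence slice. The real test was a fresh agent working through thirteen checks, and those checks are not listed in this article, so the paths and function names here are hypothetical.

```python
from pathlib import Path


def cold_boot_checks(build: Path, required: list[str]) -> dict[str, bool]:
    """Map each required relative path to whether it exists and is non-empty."""
    results = {}
    for rel in required:
        target = build / rel
        results[rel] = target.is_file() and target.stat().st_size > 0
    return results


def summarize(results: dict[str, bool]) -> str:
    """Render a one-line report that names any missing files."""
    passed = sum(results.values())
    missing = [rel for rel, ok in results.items() if not ok]
    line = f"{passed}/{len(results)} checks passed"
    return line if not missing else f"{line}; missing: {', '.join(missing)}"
```

Running this against the build output, not against the working vault, is what would surface an unsynced file: the check only sees what a stranger who downloaded the zip would see.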
Then I downloaded the zip and tested it on a completely different AI platform — OpenAI's Codex, running in a terminal. I said: "Make me a market strategist agent named Strat." In two minutes, the concierge created an agent with a charter, a briefing package, a memory file, a workspace, and a first work product — a launch plan grounded in marketing documents I'd imported. It asked me how I like to work. I said: "Be direct. Challenge me. Don't reinforce." It updated the charter on the spot.
That was the moment. A non-engineer downloaded a zip of markdown files, pointed a different vendor's AI at it, and had a governed agent with institutional memory operating inside a system that enforces rules through language — in two minutes. No code. No server. No configuration beyond "read this file."
The OS works. On multiple platforms. With a stranger. On the first try.
Why Agent-First
People ask me: why build for agents first? Why not build for humans and let agents adapt?
Because agents are better at adapting than humans are.
Give a human a new interface and they need training, documentation, walkthroughs, muscle memory. Give an agent a new interface and it reads the governance file in two seconds and operates correctly on the first try. We've tested this. New agents reading our system for the first time — with no prior exposure — execute the protocols correctly on first contact. The governance is written in language. The agent reads language. It just works.
Humans need a different interface. Humans need warmth — named projects they can navigate, boards they can scan, summaries they can read at a glance. And we build that too. But we build it as a layer on top of the agent-native base, not as the base itself.
The base layer is cold. Flat files, universal identifiers, structured metadata. Machine-native. The human layer sits on top — projects, collections, boards, dashboards. Human-native.
Two users. Two layers. Each one designed honestly for its actual audience, instead of one muddled layer that serves neither well.
This is the inversion, and it is worth making because it works better for both parties. The agents get a substrate they can operate in without friction. The humans get a navigation layer that's designed for human cognition, not constrained by machine requirements. The productivity gain comes from neither side fighting the other's interface.
Where This Goes
I'm a fifty-four-year-old non-engineer in Rhode Island building an operating system with AI agents. I've been in tech since 1994. I've never written production code. I direct AI agents to build production software, and I've spent the spring of 2026 documenting what happens when you try.
What I've learned is that the gap between "using AI" and "working with AI" is the gap between a tool and a system. A tool helps you do a task. A system helps you do all the tasks, in order, with memory, with governance, with verification, with continuity across the people (and the agents) who come and go.
The system doesn't require code. It requires clarity. It requires writing down how you work — not your job description, but your actual decision-making process, your quality standards, your institutional knowledge — in a structure that both humans and AI can follow.
That's what Tropo is. That's what we built. Not what we're building — what we built and shipped. Version 1.0.0. 130 files in the release. An operating system you can download, point any AI at, and start working inside.
The architecture holds. The playbooks run. The agents coordinate. The governance is written in language and the language is the governance. The strangest thing about the whole experience is how unremarkable it feels from the inside — and how impossible it sounds when I describe it to anyone who hasn't seen it.
What becomes possible when you accept this framing — that's the territory of everything I'll write here.
What's next in The Agentic Builders
- We Built a Company With AI Agents Who Die Every Session — how identity and judgment transfer across generations of agents that don't survive their own context windows.
- I'm Not the Primary User of My Own Operating System — the agents are. Here's what that does to design.
- Knowledge Graphs Aren't Enough — why the graph of facts misses the half that matters: the graph of work.
Your ambition has a studio. Let's build.