There is a moment in every team that adopts AI agents when the shared template stops working. You notice it on a random Tuesday: two people show you different versions of the same rule, both working, both different. Nobody made a mistake. The template is simply no longer the source of truth; it is a historical average of what once was true.
If your team creates dozens of agent workspaces per week across multiple products, this article is for you. We will walk through why the duplicated template breaks at scale, which existing patterns we tried before discarding them, and the structural shift that ended up working: stop configuring the agent and start composing the workspace.
The full technical specification lives elsewhere, and we link to it at the end as the canonical source. Here we tell the story.
The day the template stopped working
The shared template in Drive — or in a starter repo, or on the company NAS — was, for months, the simplest thing that could work. And it worked. Every new project was a duplication: copy the folder, rename it, tweak two things, and your agent had your team's conventions.
Until the team grew. Until the products multiplied. Until every new workspace was a copy with small drifts away from a "source of truth" that nobody could point to with confidence anymore.
A duplicated template simply does not scale. Not by negligence, but by geometry. Each copy receives local edits that do not flow back. Manual consolidation competes with shipping features and always loses. After six months, the "canonical template" lags behind the average workspace, and the team knows it.
What you need is not a better template. It is a different pattern.
Principle: A template that gets duplicated is the source of future divergences, not the source of truth.
The two cracks that open at scale
The more your team uses AI agents, the more you feel two pains pulling in opposite directions: drift and bloat.
Drift comes from success. People edit rules, skills and workflows while they work — because that is what a learning team does. But improvements stay in the copy where they were made. A week later, the same rule exists in four different versions across four different workspaces, and none of them is in the template.
Bloat comes from growth. To serve every product, the template accumulates rules, skills and workflows for each. And every new workspace inherits everything, even though it touches only one. The agent's context window fills with material that does not apply to today's task. Token cost rises, signal-to-noise drops.
The two cracks cannot be closed with the same lever. "Centralize everything in the template" makes bloat worse. "Let each team have its own version" makes drift worse. The obvious recipes collide. After months of iteration, you realize you need a solution that attacks both at the same time.
The first step is detecting each pain explicitly:
- Drift: are there files that appear identical in multiple workspaces but with local edits nobody consolidated?
- Bloat: does the agent load capabilities for products it does not touch this week?
If the answer is "yes" to one, the other will appear soon. They almost always travel together.
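A minimal sketch of that drift check: hash same-named rule files across workspaces and flag any filename with more than one distinct content hash. The `rules/*.md` layout is an assumption we make for illustration, not something the pattern prescribes.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def hash_file(path: Path) -> str:
    """Content hash of a single file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(workspaces: list[Path], pattern: str = "rules/*.md") -> dict[str, set[str]]:
    """Map rule filename -> distinct content hashes seen across workspaces.

    A filename with more than one hash is a drift candidate: the same rule,
    locally edited somewhere, never consolidated back.
    """
    hashes: dict[str, set[str]] = defaultdict(set)
    for ws in workspaces:
        for f in ws.glob(pattern):
            hashes[f.name].add(hash_file(f))
    return {name: hs for name, hs in hashes.items() if len(hs) > 1}
```

Run it over your real workspace folders once; a non-empty result is the drift crack made visible.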
Principle: If you optimize for only one of the two cracks, the other will bite you two months later.
What we tried before (and why it did not fit)
Before arriving at the current pattern, we tried obvious options. None of them solved both cracks.
We first tried a stricter shared template, with mandatory review before duplication. That partially solved bloat — at least nothing random got in — but drift grew, because the friction of "consolidate later" became even higher. Review turned into a bottleneck, and workspaces diverged anyway.
Next we tried git submodules for transversal files. Each workspace included a submodule pointing to a "shared agents" repo. That solved drift but not bloat — every workspace still loaded the entire shared layer. And submodules, on a team with mixed seniority profiles, became an operational nightmare: updating shared broke workspaces, detached HEAD states confused people, and conflicts at update time consumed more hours than they saved.
We also looked for published approaches. The closest pattern we found was a multi-repo workspace strategy based on submodules with a hierarchy of CLAUDE.md files — a private workspace repo that aggregates several service repos, with agent-context files in a hierarchy. It is a solid pattern when your unit of work is a single product with several services and a team that lives in that product. If that is your shape, we recommend evaluating it.
It was not our shape. We have multiple products in distinct GitHub organizations, dozens of ephemeral workspaces created and deleted within days, and a team that rotates between products by the week. A structure optimized for "one product, several services" does not solve "many products, many tasks, many workspaces".
Quick fit summary:
- If your team works on a single product, several services, with stable workspaces → the submodule strategy with a `CLAUDE.md` hierarchy probably fits you.
- If your team works on multiple products, dozens of workspaces per week, multiple GitHub orgs, with people rotating across products → you need something else.
Principle: Adopting the wrong pattern for your team shape costs more than adopting nothing.
The shift in unit: the workspace, not the agent
The "aha" came when we stopped thinking about "configuring the agent" and started thinking about "configuring the workspace".
It is a subtle shift, but structural. Almost every AI agent configuration framework available today has the same shape: a manifest defines an agent — its tools, its skills, its prompt — and that manifest is the source of truth. It works if the agent changes a lot and the context changes little. If it is the other way around — the same agent working across many different contexts — the agent's manifest is optimized for the wrong dimension.
In a multi-product team, what varies is not the agent. The agent is the same. What varies is the slice of organizational context the agent needs to act on this specific task. When someone opens a workspace for "fix bug in the integration module of product A", the relevant context is: rules of product A + conventions of the integrated ecosystem + module code + universal team rules. An hour later, that same person opens another workspace for "draft docs for product B": the relevant context is entirely different. Same agent. Same model. What changes is the context envelope.
Make the workspace the unit of composition, not the agent. The manifest describes the task's context envelope, not the agent's identity. That structural decision is what unlocks everything else. We named the pattern: Agent Workspace as Code (AWaC).
Principle: The agent is constant; what varies is the context. Compose at the level of what varies.
What a declarative workspace looks like
Talking about "composing at the workspace level" is abstract until you see the manifest. The surprising part: it is a single file. No scripts. No submodules. No setup.sh.
A workspace is described by a YAML manifest that lists which capability stacks to include and which code repos to clone. Each stack lives in its own repository: one source of truth per stack. The workspace folder starts empty and a bootstrap operation materializes it.
Generalized example:
```yaml
name: product-a-billing-feature
schema: awac/1
stacks:
  - core
  - cloud-platform
  - product-a
  - integrated-ecosystem
repos:
  - org: product-a-org
    repo: backend
    path: backend/
  - org: ecosystem-modules
    repo: product-a-billing-connector
    path: modules/billing-connector/
```
A workspace for research or documentation can be as minimal as this:
```yaml
name: research-spike
schema: awac/1
stacks: [core, research]
```
Each workspace receives only the stacks it declared. The Product B team never loads Product A rules. Bloat solved.
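Because the manifest is a single small file, it is cheap to validate before anything touches the filesystem. A sketch of that check, assuming the YAML has already been loaded into a dict (for example with `yaml.safe_load`) and using the field names from the examples above:

```python
REQUIRED = {"name", "schema", "stacks"}


def validate_manifest(m: dict) -> list[str]:
    """Return a list of problems; an empty list means the manifest is usable."""
    errors = []
    missing = REQUIRED - m.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if m.get("schema") != "awac/1":
        errors.append(f"unsupported schema: {m.get('schema')!r}")
    if "stacks" in m and not m["stacks"]:
        errors.append("a workspace must declare at least one stack")
    # repos is optional; when present, each entry needs org, repo and path
    for repo in m.get("repos", []):
        for field in ("org", "repo", "path"):
            if field not in repo:
                errors.append(f"repo entry missing {field!r}: {repo}")
    return errors
```

Failing fast here keeps bootstrap boring: by the time anything is cloned, the manifest is known to be well-formed.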
The first decisions to make when adopting this pattern:
- Identify your core stack: what applies to every workspace (commit conventions, branch policy, universal safety rules, anti-prompt-injection).
- Identify your product stacks: one per SaaS product you maintain. Lives in the GitHub org that owns that product.
- Identify your technology stacks: ecosystems you frequently integrate with (an ERP, a CMS, a cloud provider).
The first manifest of a new project should be written in under a minute. If it takes longer, your stacks are poorly segmented.
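The bootstrap step itself can be sketched as a plan: derive the git commands the manifest implies and skip anything already materialized, which is what makes re-running it idempotent. The stack-repo naming (`agent-stack-<name>` under a single org) and the `.agent/stacks/` layout are our assumptions for illustration, not part of the pattern.

```python
from pathlib import Path

# Hypothetical GitHub org that hosts one repo per capability stack.
STACK_ORG = "my-company-agent-stacks"


def plan_bootstrap(manifest: dict, root: Path) -> list[list[str]]:
    """Return the git commands bootstrap would run for this workspace.

    Destinations that already exist are skipped, so running the plan
    twice produces no extra work (idempotence).
    """
    cmds = []
    for stack in manifest.get("stacks", []):
        dest = root / ".agent" / "stacks" / stack
        if not dest.exists():
            cmds.append(["git", "clone",
                         f"git@github.com:{STACK_ORG}/agent-stack-{stack}.git",
                         str(dest)])
    for repo in manifest.get("repos", []):
        dest = root / repo["path"]
        if not dest.exists():
            cmds.append(["git", "clone",
                         f"git@github.com:{repo['org']}/{repo['repo']}.git",
                         str(dest)])
    return cmds
```

Separating the plan from its execution also gives you a free dry-run mode, which helps when people are learning the pattern.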
Principle: A workspace is a declared composition, not a cocktail of duplicates.
The loop that closes drift
Declarative composition solves bloat. But what about drift?
This is where most of the approaches we tried break. Centralizing capabilities in shared repos is relatively easy; the hard part is making local improvements — those small edits people make while working, which justify their existence — flow back to the right stack without friction and without ad-hoc governance.
The piece is an operation called promote. When someone edits a rule in their workspace and wants to share the improvement, a single command line detects which stack the file came from, opens a PR against that stack's repo, and the improvement enters human governance — team review. Once merged, every workspace using that stack receives it on its next sync.
The pattern defines five operations, specified behaviorally so any team can implement them in the language they prefer:
- `bootstrap`: from an empty manifest, produce the complete workspace. Idempotent.
- `sync`: re-apply the manifest. Pull what advanced upstream without overwriting local changes.
- `promote`: push a local improvement to the stack that owns it, via PR.
- `status`: report drift, divergences, and what is pending promotion.
- `worktree`: isolate parallel work by multiple agents on the same workspace.
The full loop: local edits → promote → PR → review → merge → every workspace receives it. Governance is human (the PR is the moment to review). Information movement is automatic. Drift solved.
Principle: An improvement is worth what it costs to share it. If it costs ten minutes, it does not get shared. If it costs one command, it does.
What we learned in pilot
The specification describes the end state. The pilot teaches something different: what the transition looks like.
If you are considering adopting this pattern on your team, the following lessons can save you weeks. They come from an active pilot across multiple products simultaneously, with a team that rotates between tasks frequently.
- Start with the `core` stack, not with product stacks. Core is what everyone shares. Migrating the truly universal rules first unlocks everything else. Migrating a specific stack first leads to debating boundaries, and nobody agrees on what counts as "specific".
- Do not pre-build stacks you are not using this week. Creating `agent-stack-foo` "just in case" is debt. Create it when a real project needs it. Empty stacks become cemeteries where people wait for someone else to fill them.
- The most controversial rules converge fastest. When a `promote` PR touches a rule that half the team would write differently, the technical conversation happens in the PR — and the team gains clarity. Slack controversies get resolved in the repo. PR-based governance is colder and more productive than meeting-based governance.
- The ephemeral workspace lowers the bar for experiments. When creating a new workspace costs one minute and two commands, people try more things. More experiments mean more improvements detected, and more `promote` PRs. The first adoption we saw was someone creating a workspace just to try a variant of a rule, no commitment. That culture appears on its own when the pattern enables it.
- The "private overlay" is the right escape valve for personal preferences. Things that should not live in shared team stacks — model preferences, references to personal secrets, individual shortcuts — live in a private overlay each person controls. Without this, the pressure to push "my config" into shared stacks breaks governance. With this, shared stacks stay clean and each person tunes their experience.
An adoption sequence that worked for us:
- Week 1: create the `core` repo with what is truly universal. Pilot with one project.
- Week 2: iterate with real feedback. Do not add new stacks yet.
- Week 3: add `promote`. Not before — you need 1-2 weeks of use for the team to understand what they want to promote.
- Week 4 onward: migrate product stacks on demand. When a Product X project starts and the stack does not exist, create it then.
Principle: The pattern is what gets published. Adoption is what gets learned.
Checklist to evaluate if it fits you
- Does your team create more than five new workspaces per week across different products?
- Are there derivative versions of the same rule file across different workspaces?
- Does the template accumulate capabilities for products most workspaces do not touch?
- Do new team members open the template and not understand what applies to what?
- Has consolidation of improvements into the "canonical template" been postponed by someone for months?
If you said yes to more than one, this pattern can serve you.
What matters is the idea, not the implementation. Any team can adopt AWaC in the language and with the tooling they prefer — bash, Python, Go, Rust. The specification is what gets shared; the code is yours.
If your team tries it, write back what worked and what did not. The spec deliberately leaves open questions — about stack granularity, layer conflicts, versioning — for the community to close with real experience.
The full canonical specification, with the manifest schema, the exact composition order, all operations defined behaviorally, and the open questions, is published here: Agent Workspace as Code (AWaC) — canonical specification.