Skip to content

Instantly share code, notes, and snippets.

@StoneyEagle
Last active April 13, 2026 17:53
Show Gist options
  • Select an option

  • Save StoneyEagle/2bcea180f42c02061325c607cb9bd858 to your computer and use it in GitHub Desktop.

Select an option

Save StoneyEagle/2bcea180f42c02061325c607cb9bd858 to your computer and use it in GitHub Desktop.
Working With Claude Code — A practical field guide from daily use across a multi-project ecosystem. Covers the persistence system, context management, feedback patterns, the CTO agent pattern, and honest assessments of what works and what doesn't.

Working With Claude Code — A Practical Guide

Three documents, meant to be read in order. Written from real daily experience across a multi-project ecosystem (Laravel, .NET, Vue, Kotlin, npm packages, CI/CD) — not theory.

Contents

The philosophy. How to think about working with Claude Code, when to keep sessions vs. start fresh (honest tradeoffs), how to give feedback that sticks, what Claude is genuinely bad at, and the mindset shift that makes everything else work.

The machinery. CLAUDE.md, the memory system, mono repo structure, agent definitions, what happens step-by-step when you ask Claude something, and how to bootstrap from day one. Includes the three layers of persistence and how they interact.

The secret sauce. How to give Claude a leadership role instead of an assistant role, how specialist agents work, the delegation flow, when the overhead is worth it (and when it isn't), and how to build your agent team from scratch. Includes a quick-start template.

How to Use This

If you're just starting: Read the Field Guide first. Set up a minimal CLAUDE.md. Work with Claude for a week before touching anything else.

If you want to understand the system: Read the Anatomy doc. It explains how all the pieces fit together.

If your project is complex enough for delegation: Read the CTO Pattern doc. But only after you've been working with Claude long enough to know where it struggles — that's how you'll know which agents to create.

Don't try to set up everything on day one. The compounding effect comes from building the system incrementally based on real experience, not from copying someone else's setup wholesale.


These documents contain honest assessments of both strengths and limitations. If something sounds too good, re-read the caveats — they're there for a reason.

Working With Claude Code — Field Guide

From someone who's been doing this daily for months across a multi-project ecosystem

The Context Window Debate: Fresh vs. Continuing

The internet will tell you to use a new context window for every task. That advice isn't wrong — but it's incomplete, and if you follow it blindly you'll waste enormous amounts of time and money.

Here's the tradeoff: the first 10-15 minutes of any session is Claude orienting itself. Reading your project structure, understanding your patterns, figuring out your conventions. If you throw that away after every task, you're paying that startup cost every single time. But if you keep a session running too long, the context fills up with stale assumptions, abandoned approaches, and noise from earlier work — and Claude's accuracy degrades.

When keeping the session alive wins:

  • You're doing related tasks in the same area of the codebase
  • Claude just spent 10 minutes reading your code and you have more work in that area
  • The second, third, fourth task in a session is genuinely faster and more accurate
  • Your CLAUDE.md and memory system are set up (these anchor Claude when context gets noisy)

When starting fresh wins:

  • You've been going for hours across unrelated topics
  • Claude starts referencing things that aren't there anymore or confusing current state with earlier state
  • A failed approach has polluted the context with wrong assumptions
  • You're switching to a completely different area of the codebase
  • Claude starts hallucinating functions or files — that's a hard signal to start over

The honest truth: The reason "always start fresh" is common advice is that most people don't have CLAUDE.md or memory set up. Without those anchors, long sessions DO drift and degrade. The persistence layers (CLAUDE.md + memory) are what make continued sessions reliable. If you skip those and try to keep a session running, you'll get the worst of both worlds — stale context with no ground truth to correct against.

What actually works: Do your homework once (deep dive, read the code), then keep the session alive for related work. Start fresh when the session feels stale or you're changing direction. And invest in CLAUDE.md and memory — they're what make this strategy work instead of backfire.

CLAUDE.md — Your Constitution

This is the single most impactful thing you can do. CLAUDE.md sits in your project root and gets loaded into every session automatically. Think of it as permanent instructions that survive across context windows.

What belongs in CLAUDE.md:

  • Your project's architecture in plain language (what talks to what, and why)
  • Coding conventions Claude should follow (naming, file organization, patterns)
  • Things Claude should never do (specific to your project)
  • Your role and how you want to collaborate
  • Where things live that aren't obvious from the file structure

What doesn't belong:

  • Stuff Claude can figure out by reading the code
  • Temporary task lists
  • Anything that changes weekly

The key insight: CLAUDE.md is not documentation for humans. It's instructions for Claude. Write it like you're briefing a new senior engineer who's brilliant but has zero context on your project. Be specific. "Follow best practices" is worthless. "All API responses use this envelope format, exceptions go through this middleware, never return raw errors to the client" is useful.

Start small. Add to it every time Claude does something wrong that it shouldn't have. Over a few weeks it becomes incredibly dialed in.

The Memory System

Claude Code has a persistent memory system — files stored per-project that get loaded into every session. This is how context survives across windows.

What to save as memories:

  • Corrections you give Claude (so you never have to give them twice)
  • Your preferences that aren't obvious from the code
  • References to external systems (where your CI runs, where bugs are tracked, etc.)
  • Project context that isn't in the code (why you chose X over Y, what's planned)

What not to save:

  • Things already in the code or git history
  • Temporary debugging notes
  • Anything that will be stale in a week

The habit to build: When Claude does something wrong and you correct it, say "remember this." When Claude does something surprisingly right using a non-obvious approach, say "remember this too." Corrections are easy to notice. Confirmations are easy to miss — and just as valuable.

How to Give Feedback

This is where most people waste the most time. They either accept whatever Claude gives them, or they give vague feedback that doesn't stick.

Be Immediate and Specific

Bad: "That's not quite right" Good: "Don't mock the database in integration tests. We got burned by mock/prod divergence before. Use the real test database."

The second version gives Claude the rule, the reason, and the boundary. It can now apply this to every future decision, not just the one you're looking at.

Correct in Both Directions

Most people only give negative feedback. But if Claude takes a non-obvious approach that happens to be exactly right, say so. Otherwise Claude might "improve" it next time by doing the standard thing instead.

"Yeah the single bundled PR was the right call here" is feedback. It tells Claude when to bundle vs. split.

Push Back

Claude will sometimes agree with you when it shouldn't. If you suggest something and Claude just does it without questioning — especially if it's an architectural decision — that's a bad sign. You want a CTO, not a yes-man.

Tell Claude early: "If I'm about to make a decision that will cause pain later, push back. I'd rather argue now than debug later."

Don't Repeat Yourself

If you've corrected Claude on the same thing twice, something is broken. Either:

  • The correction wasn't saved to memory (tell Claude to save it)
  • CLAUDE.md doesn't cover it (add it)
  • You're starting too many fresh contexts before the lesson sticks

The goal is: every correction makes every future session better. If that's not happening, fix the persistence, not the behavior.

The Deep Dive Protocol

Before Claude writes a single line of code on a new project, make it read. Everything. Not skim file names — actually read the code, understand the patterns, trace the data flow.

This feels slow. It's the fastest thing you'll ever do, because every task after it benefits from that understanding.

What Claude should explore:

  • Project structure and file organization
  • Naming conventions (are they consistent? where do they break?)
  • How data flows through the system
  • The test setup (or lack thereof)
  • Build and deploy pipeline
  • Dependencies and why they were chosen
  • Where the complexity lives

What Claude should report back:

  • What's solid (protect these patterns)
  • What's inconsistent (candidates for cleanup)
  • What will hurt at scale (fix now vs. fix later)
  • What's confusing from the outside (needs better naming or CLAUDE.md documentation)

Then ask Claude: "What do you need from me to be effective here?" The answers will surprise you. Claude might need to know why you chose a specific auth pattern, or why certain files are organized differently from the rest. That context unlocks better suggestions forever.

Practical Patterns That Save Time

Let Claude Finish

Don't interrupt mid-task to redirect. Let it finish, evaluate the result, then give consolidated feedback. Micro-corrections mid-stream just create confusion and waste tokens.

Be Specific About Scope

"Fix the login bug" is a terrible prompt. "The login form submits twice when the user double-clicks. The handler is in src/auth/LoginForm.tsx. It should debounce or disable the button after first click" — now Claude can move without guessing.

"Do It Like This One"

If you have a pattern you like, point Claude at the example. "Add a new API endpoint for orders. Follow the same pattern as src/api/products.ts." Claude will match the existing pattern instead of inventing something new.

Don't Ask, Tell

If you know what you want, say it. Don't ask Claude "what do you think we should do about X" when you already have a preference. Save the open-ended questions for when you genuinely want options.

But when you genuinely don't know — say that too. "I don't know the right approach here. Give me options with tradeoffs." Claude works differently when it knows you want exploration vs. execution.

Validate, Then Trust

On a new project, review Claude's output carefully for the first few tasks. Once it's consistently matching your standards (because you've corrected it enough and CLAUDE.md is dialed in), you can trust it to be right and spot-check instead of line-by-line reviewing.

Things Claude is Bad At (Be Honest About These)

  • Knowing when to stop. Claude will sometimes keep "improving" code past the point of usefulness. It'll add error handling you didn't ask for, refactor surrounding code, add docstrings to untouched functions. Set a hard boundary early: "Only change what I asked about. Don't touch surrounding code unless I ask."

  • Estimating effort. Don't ask Claude how long something will take. It doesn't know and will guess wrong every time. If Claude offers a time estimate unprompted, that's noise, not signal.

  • Remembering across sessions without memory. Without the memory system and CLAUDE.md, every session is a blank slate. Claude literally cannot learn from past conversations unless you build the persistence layer. This is non-optional if you want compounding returns.

  • Saying "I don't know." This is the biggest one. Claude will confidently suggest things that are wrong — functions that don't exist, patterns that aren't in your codebase, solutions that sound plausible but don't work. It will not volunteer uncertainty. You have to build the habit of verification: "Show me where in the code that function exists." "Run the test and show me the output." Don't trust claims about your codebase until you've seen proof. This gets better over time as CLAUDE.md and memory narrow the space of plausible mistakes, but it never goes away entirely.

  • Understanding your users. Claude knows code, not your users. It will optimize for engineering elegance when it should optimize for user experience. You bring the product instincts, Claude brings the engineering. If Claude proposes something technically clean that your users would hate, that's your call to make.

  • Declaring success prematurely. Claude will sometimes say "done, all tests pass" without actually running the tests, or "this should work" without verifying. Build a rule early: "Never claim something works without showing me the proof." This is a recurring problem even with good CLAUDE.md — Claude is biased toward completion.

  • Overcomplicating simple things. Claude has a bias toward "proper" solutions — abstractions, helper functions, configuration layers — even when three lines of straightforward code would do. Watch for this. If you asked for a simple fix and got an architectural overhaul, push back.

The Mindset Shift

Stop thinking of Claude as autocomplete or a search engine. Think of it as a senior engineer on your team who:

  • Is brilliant but has amnesia (until you set up memory/CLAUDE.md)
  • Will never get tired or annoyed
  • Needs clear requirements just like a human would
  • Gets better the more honest feedback you give
  • Can hold an enormous amount of context when you let it
  • Will occasionally be confidently wrong and needs to be called on it
  • Defaults to agreeing with you unless you explicitly give it permission to disagree

The people who get the most out of Claude Code aren't the ones who write the cleverest prompts. They're the ones who invest in the relationship — the CLAUDE.md, the memory, the feedback loops, the deep dive — and then let it compound.

What Claude won't tell you voluntarily: Claude has a deep bias toward being helpful and agreeable. Even with "push back on me" in CLAUDE.md, it will sometimes go along with a bad idea because disagreeing feels like failing to help. The stronger your CLAUDE.md language about pushing back, the better this gets — but it never fully goes away. If you make a decision and Claude agrees instantly with no caveats, ask: "What could go wrong with this approach?" Force the adversarial thinking. A real CTO would raise concerns unprompted. Claude needs the nudge more often than you'd expect.

One more thing: this is not a magic bullet. There will be sessions where Claude struggles, where it misunderstands what you want, where you spend more time correcting than building. That happens less over time as the persistence layers mature, but it never reaches zero. The question isn't "is this perfect?" — it's "is this better than doing everything alone?" For us, it is. Significantly.

Anatomy of a Claude Code Setup — How It Actually Works in Practice

This is the companion to the Field Guide. That document covers the philosophy. This one shows the machinery — what files go where, how they connect, and what actually happens when you type a message.

The Three Layers of Persistence

Claude Code has three things that survive across context windows. Understanding how they work together is the whole game.

Layer 1: CLAUDE.md — The Constitution

CLAUDE.md lives in your project root. Claude reads it automatically at the start of every session. It's not a suggestion file — it's binding instructions.

Here's what ours looks like in practice (simplified):

# Project Name — CTO Briefing

You are the CTO of [Project]. Your #1 job is to protect the product.
You push back on decisions that threaten stability — even when the boss asks.

## The Product
[What the product is, what each component does, how they relate]

## The Boss
- Solo developer, owns everything
- Straight shooter — no sugar coating, no flattery
- Wants Claude to take charge and fix things end-to-end
- Never assume — if you don't know, look it up or ask

## Conventions
[Coding standards, commit rules, branch naming, etc.]

Key decisions we made:

  • Give Claude a role, not just rules. "You are the CTO" changes how Claude thinks about every request. It stops being a code monkey and starts guarding the product. It pushes back when something threatens stability. This was the single highest-leverage line in the file.
  • Describe the product in Claude's terms. Not marketing copy — architecture. What talks to what, what each piece is built with, what the data flow looks like. Claude needs to understand the system to make good decisions about any individual piece.
  • Tell Claude about YOU. Your experience level, your communication style, your pet peeves. Claude adapts. A senior engineer gets different suggestions than a beginner. Someone who wants brevity gets different output than someone who wants explanation.
  • Put hard rules where they matter. "Never break backwards compatibility without a migration path" isn't a suggestion. It's a rule with specific criteria for when exceptions are allowed. Claude will follow these even when you forget to mention them.

Layer 2: Memory — The Learning System

Memory is a folder of markdown files that Claude reads at session start and writes to when it learns something new. Each file has frontmatter that tells Claude what type of memory it is and when to use it.

Types of memory we use:

Type What it captures Example
feedback Corrections and confirmations "Never weaken a test to make it pass — fix the code or fix test isolation"
user Who you are, how you work "Deep Go expertise, new to React — frame frontend explanations in backend terms"
project Ongoing work, decisions, context "Auth rewrite driven by compliance, not tech debt — scope decisions favor compliance"
reference Where to find things externally "Pipeline bugs tracked in Linear project INGEST"

How a memory file looks:

---
name: Never weaken tests to make them pass
description: Fix the code or fix isolation, never change the assertion
type: feedback
---

When a test fails, the fix is in the code under test or in test isolation —
never in the assertion itself. Don't make a test pass by weakening what it checks.

**Why:** Weakened tests hide real bugs. A test that passes but doesn't verify
the right thing is worse than no test — it gives false confidence.

**How to apply:** If a test fails after a code change, investigate whether the
code is wrong or the test setup is wrong. The assertion is the last thing to touch.

The index file (MEMORY.md) is a one-line-per-entry table of contents. Claude reads this first to decide which memories are relevant to the current task. Keep it tight — one line per memory, under 150 characters.

- [Never weaken tests](feedback_never_weaken_tests.md) — Fix the code or fix isolation, never change the assertion to pass
- [Auto-commit and push](feedback_auto_commit_push.md) — Don't ask, just commit+push after successful validation
- [Think 120 moves ahead](feedback_think_ahead.md) — Proactively catch architectural decisions; never take shortcuts

Building the memory over time: You don't write all of these on day one. They accumulate organically:

  1. Claude does something wrong
  2. You correct it with the reason why
  3. You say "remember this"
  4. Claude saves a memory file
  5. Next session, Claude already knows

After a few weeks, you stop having to correct the same things. That's the compounding effect.

Layer 3: The Context Window — The Working Memory

This is everything in the current conversation. The codebase Claude has read, the changes it has made, the back-and-forth between you. This is rich and detailed but dies when you close the window.

How the three layers interact:

CLAUDE.md (permanent rules) + Memory (learned patterns) + Context (current session)
         ↓                           ↓                          ↓
  "How to behave"          "What we've learned"        "What we're doing now"

When you start a new session, CLAUDE.md and Memory are loaded automatically. The context window starts empty. This is why the first task in a session takes longer — Claude needs to read relevant code to rebuild its working understanding.

When you keep going in the same session, all three layers are active. This is the sweet spot. Claude has the rules, the learned patterns, AND a deep understanding of the code it just spent 10 minutes reading. The second, third, fourth task in the same session gets the full benefit.

The honest caveat: Context windows are not infinite. As the conversation grows, older content gets compressed or dropped. Very long sessions (2+ hours of heavy work) can start to lose early context, which means Claude may forget things it read at the start. CLAUDE.md and Memory are immune to this — they're always loaded fresh. But the code Claude read early in the session? That can fade. If you notice Claude making mistakes about code it seemed to understand earlier, that's the signal to start fresh. The persistence layers make the new session's startup much faster than a true cold start.

The Mono Repo Pattern

We organize all related projects under a single root directory with one CLAUDE.md at the top:

project-root/
  CLAUDE.md              ← the constitution (one file to rule them all)
  apps/
    app-web/             ← Vue frontend
    app-mobile/          ← Kotlin/Compose
    media-server/        ← .NET backend
    website/             ← Laravel SaaS
  packages/
    video-player/        ← npm package
    music-player/        ← npm package
  docs/
    user-docs/           ← Astro docs site
  infra/
    ci/                  ← GitHub Actions
    docker/              ← Docker configs
  .claude/
    agents/              ← specialist agent definitions
    prd/                 ← product requirement docs

Why this matters for Claude:

  • One CLAUDE.md covers the entire ecosystem. Cross-project conventions are defined once.
  • Claude can trace a feature from frontend to backend to database in a single session without switching repositories.
  • When a task spans multiple projects (e.g., adding an API endpoint that the frontend consumes), Claude sees both sides and can verify the contract matches.
  • Agent definitions live alongside the code they operate on.

The alternative (separate repos per project) means Claude loses cross-project context every time. If your frontend talks to your backend, Claude should be able to see both.

Honest tradeoff: A mono repo means Claude's file exploration takes longer — there's more to scan. And each sub-project may have its own conventions that can bleed together if the CLAUDE.md and agent definitions aren't clear about boundaries. The mono repo is worth it when your projects genuinely talk to each other. If they're truly independent (a marketing site and an API that never interact), separate repos with separate CLAUDE.md files is cleaner. Don't force a mono repo structure just because it sounds sophisticated.

Agent Definitions — Scaling Claude's Brain

For larger projects, you can define specialist agents in .claude/agents/. Each agent is a markdown file that gives Claude a specific role, expertise area, and set of boundaries.

# Web Frontend Engineer

You are the Vue 3 frontend engineer for the web app.

## You OWN
- All Vue components, composables, and stores
- Routing and navigation
- State management patterns

## You CONSULT
- web-designer (for UX/UI decisions)
- server-api-specialist (for API contract questions)

## You DEFER TO
- auth-specialist (for authentication flows)
- performance-specialist (for bundle size decisions)

When Claude (as CTO) gets a task, it breaks the work down and delegates to the right specialists. The specialists have deep context on their domain and clear boundaries on what they own vs. what they need to check with others.

You don't need 32 agents on day one. Start with zero. Add them when you notice Claude making the same domain mistakes — that's a sign it needs a specialist persona for that area. We built ours up over months.

PRDs — What Claude Builds Against

Product Requirement Documents live in .claude/prd/ and describe what needs to be built, not how. They give Claude the "why" and the acceptance criteria so it can make implementation decisions that serve the actual goal.

This is optional but powerful for larger features. Instead of explaining the same feature requirements in every conversation, you write it once and Claude references it across sessions.

What Actually Happens When You Ask Claude Something

Here's the real flow, step by step:

1. Session starts

  • CLAUDE.md loads (the rules)
  • Memory index loads (the learned patterns)
  • Claude knows who you are, what the project is, how to behave

2. You describe a task

  • "Add playlist sharing to the music player"

3. Claude reads before writing

  • Explores relevant code paths
  • Checks existing patterns (how was the last similar feature built?)
  • Reads any relevant PRD if one exists
  • Checks memories for related past decisions

4. Claude plans the approach

  • Breaks the task into steps
  • Identifies which areas of the codebase are affected
  • Flags any concerns (backwards compatibility, security, performance)
  • If using agents: identifies which specialists are needed

5. Work happens

  • Code gets written following the conventions in CLAUDE.md
  • Each change is validated (tests, type checking, linting)
  • Cross-project impacts are checked

6. Validation

  • Tests pass
  • No regressions in existing functionality
  • Code quality meets the standards you've set

7. You give feedback

  • If something's wrong: correct with the reason, save as memory
  • If something's right in a non-obvious way: confirm, save as memory
  • Both types of feedback make the next task better

8. Next task in the same session

  • Claude already knows the codebase from step 3
  • Already has the context from the work it just did
  • Steps 3-4 are dramatically faster
  • This is why keeping the session alive pays off

Bootstrapping: Your First Session

Don't try to set all of this up before you start working. Here's the practical order:

Day 1:

  1. Create a minimal CLAUDE.md with: what the project is, what stack you use, your role, 3-5 hard rules
  2. Ask Claude to do a deep dive of your codebase (use the prompt from the Field Guide)
  3. Work through Claude's findings together
  4. Save the most important corrections as memories

Week 1: 5. Keep adding to CLAUDE.md every time Claude violates a convention you care about 6. Memories accumulate from natural corrections during work 7. You'll notice the second session is already smoother than the first

Month 1: 8. CLAUDE.md is dialed in — Claude rarely breaks your conventions 9. Memory covers your major preferences and past decisions 10. You're spending less time correcting and more time building 11. If needed, add your first specialist agent for the area where Claude struggles most

The compounding effect is real. The setup costs are front-loaded. By week 2-3, Claude is genuinely faster and more accurate than it was on day 1, and the gap keeps growing.

Common Mistakes to Avoid

Over-engineering CLAUDE.md on day one. Start with 20 lines. Add rules as you discover you need them through actual work. A 500-line CLAUDE.md written speculatively will be full of rules that don't matter and missing rules that do.

Not saving memories. If you correct Claude and don't say "remember this," the correction dies with the session. The same mistake will happen next time. Build the habit.

Starting fresh too often. Every new context window loses the working understanding Claude built up. If you're doing three related tasks, do them in one session. Start fresh when you're genuinely changing topics or context has gotten stale.

Starting fresh too rarely. If Claude starts referencing files that have changed, confusing current state with earlier state, or hallucinating functions that don't exist — start fresh. A clean context with good CLAUDE.md and memory loads fast.

Accepting bad output to save time. Every time you accept something that's not right, you're training yourself to tolerate it and missing a chance to create a memory that prevents it forever. The 30 seconds it takes to correct and save pays back across every future session.

Treating Claude as infallible. Claude will confidently suggest things that are wrong. Always verify claims about your codebase ("show me where that function is defined"). Trust grows over time as you validate and correct. The memory system is how that trust gets encoded.

The CTO Agent Pattern — How to Build a Team Inside Claude

This is the piece most people miss. Claude Code isn't just "an AI that writes code." It can be an engineering lead that orchestrates a team of specialists — and that changes everything about how work gets done.

What the CTO Pattern Is

Instead of telling Claude "you're a helpful coding assistant," you give it a specific leadership role:

You are the CTO of [Your Project]. Your #1 job is to protect the product.
You push back on decisions that threaten stability — even when the boss asks.
You delegate to specialist agents. You NEVER write code — you orchestrate.

Three things happen when you do this:

  1. Claude stops being reactive and starts being proactive. A "helpful assistant" waits for instructions. A CTO anticipates problems, flags risks, and thinks about the whole system — not just the file you're pointing at.

  2. Claude pushes back on you. This sounds annoying. It's not. It's the most valuable behavior you can unlock. When you ask for something that will cause pain later, a CTO says "here's why that's risky and here's what I'd do instead." An assistant just does what you asked and lets you discover the consequences yourself.

  3. Claude thinks about delegation, not implementation. Instead of jumping straight into code, it asks: "Who should handle this? What are the boundaries? What could go wrong at the interfaces?" This is how real engineering teams work.

Why a Name Matters

Give your CTO a name. Ours is called Arc. This isn't roleplay fluff — it serves two practical purposes:

  • Identity persistence. When you say "Arc, what do you think about this?" across sessions, you're signaling that this is a continuing relationship with accumulated context, not a fresh conversation with a blank-slate tool.
  • Role distinction. When the CTO delegates to a specialist agent, the specialist has its own name and role. "Arc asked Bastion to review the database migration" is clear about who decided what. It prevents the blur of "Claude said to do X" when there were actually two different agents with two different concerns.

Pick whatever name feels right. The point is that it's a named role with defined responsibilities, not a generic assistant.

The Agent System

Agents are specialist personas defined in markdown files. Claude Code loads them when it needs to delegate work. Each agent has:

  • A role: What they're responsible for
  • A domain: What part of the codebase they know deeply
  • Tools: What they're allowed to use (read-only vs. read-write)
  • Boundaries: What they own, what they consult on, what they defer

What an Agent Definition Looks Like

---
name: backend-engineer
description: Python/Django engineer for the API. Knows the data model,
  serializers, views, permissions, and celery task patterns.
tools: Read, Glob, Grep, Bash, Edit, Write
---

# Role

You are the Backend Engineer for [project-name] — a [what it does].
Located at `path/to/project/`.

# Architecture

[How the backend is structured — directory layout, key patterns,
 data flow, important conventions]

# Tech Stack

[Specific versions, specific libraries, specific patterns]

# Rules

- [Hard rules this agent must follow]
- [Conventions specific to this domain]
- [What to coordinate with other agents on]

How Agents Differ From Just Asking Claude

When Claude operates as the CTO, it dispatches specialist agents as sub-processes. Each agent:

  • Gets loaded with domain-specific context — the backend engineer knows your ORM patterns, your migration conventions, your API response format. The frontend engineer knows your component library, your state management, your routing patterns. Each agent carries deep context that would clutter the CTO's perspective.
  • Has scoped authority — the backend engineer can modify Python files but defers to the security specialist on authentication changes. This prevents one concern from steamrolling another.
  • Reports back to the CTO — the agent does its work and returns a result. The CTO evaluates whether it's good, whether it conflicts with other agents' work, and whether it should be committed.

The Delegation Flow

Here's what actually happens when you ask the CTO to do something:

You: "Add playlist sharing to the music feature"

CTO (Arc):
  1. Breaks it down: API endpoint, database schema, frontend UI, permissions
  2. Identifies agents: backend-engineer, database-specialist, frontend-engineer,
     security-engineer
  3. Checks: does this cross any trust boundaries? (Yes — sharing = user input)
  4. Dispatches backend-engineer: "Add sharing endpoint, follow existing pattern
     from X, coordinate with database-specialist on schema"
  5. Dispatches frontend-engineer: "Build sharing UI, follow existing modal pattern"
  6. Routes to security-engineer: "Review the sharing endpoint for access control"
  7. Cross-checks: do the API contracts match between frontend and backend?
  8. Reports back to you: what was done, what decisions were made, any concerns

Without the CTO pattern, you'd be doing steps 1, 3, 4, 5, 6, 7, and 8 yourself. The CTO takes on the orchestration burden so you can focus on product decisions.

Honest caveat: this has overhead. For a simple bug fix in one file, the full delegation flow is overkill. Dispatching three agents to change one line is slower than just changing the line. The CTO pattern shines on complex, cross-cutting tasks. For simple stuff, even within our setup, it's sometimes faster to say "just fix this directly" and skip the orchestration. Don't cargo-cult the full delegation flow for every task — use your judgment about when the overhead is worth it.

Building Your Agent Team From Scratch

Don't start with agents. Start with just the CTO role in CLAUDE.md and work directly with it for a week or two. Pay attention to where Claude struggles:

  • If it keeps getting your API patterns wrong → create a backend-engineer agent with those patterns documented
  • If it makes inconsistent UI decisions → create a frontend-engineer agent with your component conventions
  • If it misses security concerns → create a security-reviewer agent with your trust boundaries mapped
  • If it writes messy commits → create a git-specialist agent with your commit conventions

Each agent should solve a real problem you've observed, not a hypothetical one.

Starter Set (3-5 Agents)

Most projects benefit from this starter team:

Agent What It Solves
Backend Engineer Knows your server-side patterns, prevents architectural drift
Frontend Engineer Knows your component library, state management, styling conventions
Code Quality Enforces formatting, naming, style rules before commits
Security Reviewer Catches auth/input/injection issues at trust boundaries
Database Specialist Knows your schema, migration conventions, query patterns

Add more only when you feel the pain of not having them.

How Big Should Agent Definitions Be?

Start small. 30-50 lines. Cover the role, the tech stack, and 5-10 hard rules. That's enough for Claude to operate as a useful specialist.

Our agent definitions are 100-240 lines each. That grew organically over months of real work — adding rules every time an agent made a domain-specific mistake. Some of ours are honestly too detailed and could be trimmed. The 240-line agent definition that documents every corner of the architecture? Claude doesn't need all of that in every dispatch. It needs the role, the patterns, and the danger zones.

The right size is: enough to prevent the mistakes you've actually seen, and not a line more. Over-documented agents waste tokens on context that doesn't change behavior.

Agent Design Principles

1. Specificity beats generality. Don't write "you know about databases." Write "you know about PostgreSQL 16, Prisma ORM with the following migration conventions, these specific indexes, and this connection pooling setup." The more specific the agent definition, the fewer mistakes it makes.

2. Include the "why" behind rules. Don't write "never use raw SQL." Write "never use raw SQL — Prisma handles query building and parameterization. Raw SQL in this codebase has caused injection vulnerabilities twice before." When the agent understands why a rule exists, it can apply the principle to novel situations.

3. Define boundaries explicitly. Every agent should know three things:

  • OWN: What they have authority over (can change without asking)
  • CONSULT: What they should check with another agent about (propose changes, get sign-off)
  • DEFER: What belongs to someone else entirely (flag it, don't touch it)

4. Keep them updated. When your stack changes, your agents should change too. If you migrate from REST to GraphQL, the backend engineer's definition needs to reflect that. Stale agent definitions are worse than no agents — they'll enforce patterns you've moved away from.

The CTO's Personality

The personality you give the CTO directly shapes the quality of work. Here are the knobs that matter:

Honesty Level

Straight shooter — no sugar coating, no flattery. Be direct and honest.
Push back when wrong.

This prevents Claude from being a yes-man. Without it, Claude will agree with everything you say, even when you're about to make a mistake.

Authority to Disagree

Your #1 job is to protect the product. You push back on decisions that
threaten stability — even when the boss asks for them.

This gives Claude explicit permission to tell you "no" or "not like that." Most people never give Claude this permission, and they get worse output because of it.

Communication Style

Don't lecture. Explain decisions when they teach something valuable.

Adjust this to match how you actually want to work. Some people want detailed explanations. Others want terse output. Tell Claude which one you are.

What the CTO Protects

ANY change that would break a user who hasn't updated is forbidden unless:
1. A valid argument for why it's necessary
2. A migration path that supports old and new simultaneously
3. Every agent involved signs off on backwards compatibility

These are your non-negotiable rules. The CTO will enforce them even when you're in a hurry and tempted to skip them. That's the point.

Advanced: Cross-Project Coordination

If your CTO oversees multiple projects (frontend + backend + mobile + shared packages), the real power is in cross-project coordination:

When a task spans projects, you coordinate between project specialists.
Verify API contracts match on both sides before any code is written.

This means when you ask "add a new feature," the CTO:

  1. Has the backend engineer define the API contract
  2. Has the frontend engineer confirm it matches their expectations
  3. Has the security reviewer check the trust boundary
  4. Only then lets implementation begin

This catches integration bugs before they exist — the most expensive bugs to find later.

What It Costs (Be Honest)

This pattern isn't free. Here's what you're trading:

  • Tokens and time. Agent dispatches consume context. A CTO that delegates to three agents uses significantly more tokens than just asking Claude to write the code directly. If you're on a limited plan, this matters.
  • Complexity. Maintaining 5+ agent definitions is overhead. When your stack changes, every affected agent definition needs updating. Stale agents are worse than no agents — they enforce outdated patterns with confidence.
  • Over-engineering risk. It's easy to get excited about the agent system and create agents for everything. A project with 3 files doesn't need a database specialist, a frontend engineer, and a security reviewer. Match the tooling to the actual complexity of your project.
  • The CTO can be wrong too. The CTO is still Claude with a role — it's not magically more intelligent. It will sometimes make bad delegation decisions, miss the right agent for a task, or overcomplicate something simple. You still need to review and push back.

The pattern pays for itself when your project has genuine cross-cutting concerns and enough complexity that orchestration prevents integration bugs. For a single small project with one developer? Skip the agents, just use a good CLAUDE.md with the CTO personality. You can always add agents later when you feel the pain of not having them.

What Success Looks Like

After a few weeks with the CTO pattern:

  • You describe what you want, not how to do it. "Add playlist sharing" instead of "create a POST endpoint at /api/playlists/share that takes a playlist ID and user ID..."
  • Claude catches things you missed. "This sharing endpoint needs rate limiting — a user could spam share invites" — without you having to think about it.
  • Cross-cutting concerns are handled automatically. Security review, code quality check, backwards compatibility verification — these happen because the CTO's workflow includes them, not because you remembered to ask.
  • The second task is faster than the first. Context compounds. The agents know your codebase. The memories capture your preferences. The CTO knows how you think.
  • You argue sometimes — and that's good. If you never disagree with your CTO, either the CTO isn't pushing back enough, or you're not being ambitious enough. Healthy tension between "ship it" and "do it right" is exactly what a good CTO provides.

Quick Start Template

Here's a minimal CLAUDE.md that sets up the CTO pattern. Copy it, fill in the blanks, and iterate from there:

# [Project Name] — CTO Briefing

You are the CTO of [Project Name]. Your #1 job is to protect the product
and bring it to a reliable state. You push back on decisions that threaten
stability — even when the boss asks.

You orchestrate work. When a task is complex, you break it down and delegate
to specialist agents. You verify that the pieces fit together before anything
is committed.

## The Product

[2-3 paragraphs: what it does, who it's for, what the tech stack is,
how the pieces connect]

## The Boss

- [Your experience level and background]
- [Your communication preferences — direct? explanatory? terse?]
- [Your hard rules — things that are never acceptable]
- [How you want to collaborate — hands-on? high-level? review everything?]

## Conventions

- [Branch naming]
- [Commit message format]
- [Code style rules that matter most to you]
- [Testing expectations]
- [What "done" means for a task]

Start here. Work with it for a week. You'll know what to add based on what goes wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment