Inside ClaudeKit · ClaudeKit Engineer
Updated 2026-05-21 · Planning, Deep, TDD

Notes for picking the right plan mode

When to use /ck:plan --deep --tdd

Not every task needs the heavy mode. This post breaks down when to reach for --deep, when to add --tdd, and when a shorter route is the better call.

Running case

Extract the billing module from the monolith

API routes · DB models · Background jobs · Webhooks · Admin UI

Billing is spread across API routes, database models, background jobs, retry logic, invoice status, webhook handlers, and admin UI. The task sounds simple enough: "extract billing into its own module, keep the old behavior." But once the agent starts running, the real questions surface:

  • Which files actually belong in scope: API, jobs, DB, UI, or webhooks?
  • Which behaviors must stay identical: retry, idempotency, invoice status, payment failure?
  • How much of the current behavior is actually under test?
  • Does this phase have implicit dependencies on a migration or background job?
  • If the new code builds but diverges on an edge case, who catches that?

For small tasks, /ck:plan --fast or plain /ck:plan is usually fine. For complex tasks that aren't a full major refactor, --hard is usually the better fit. For large tasks, missing context upfront creates cost at the end: cook has to guess scope again, phase files stay vague, tests get written after the code, and regressions only surface after the refactor is done.

Core idea

--deep helps the plan see the full affected surface: file inventory, test gaps, phase dependencies. --tdd forces the flow to capture current behavior before refactoring, then uses those same tests as the regression gate.

Neither of these is a "stronger mode" you just toggle on to improve things. They address two specific risks: not knowing enough scope and not protecting existing behavior.

Section 01

Mode vs flag, two different things

Before going further, one distinction worth clarifying. /ck:plan takes two kinds of arguments:

| Type | Example | Effect |
| --- | --- | --- |
| Mode | --fast, --hard, --deep, --parallel, --two | Selects the pipeline. Only one mode per command. |
| Composable flag | --tdd, --no-tasks | An add-on. Can pair with any mode. |

The key point: --deep is a mode — it changes the planning pipeline. --tdd is a flag — it keeps whichever mode you chose but adds a tests-first structure to the phase file and cook flow.

argument hint
argument-hint: "[task] [--fast|--hard|--deep|--parallel|--two] [--tdd|--no-tasks]"

The | between modes means pick one. --tdd sits in its own bracket group, outside the mode group, so it composes with any mode. That makes --deep --tdd valid, while --deep --hard is not.
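The mode-versus-flag rule can be sketched as a small validator. This is an illustration only, not how ClaudeKit parses arguments: the function name check_plan_args is hypothetical, and the option names are taken from the argument hint above.

```python
from __future__ import annotations

# Option names copied from the argument hint; everything else is hypothetical.
MODES = {"--fast", "--hard", "--deep", "--parallel", "--two"}
FLAGS = {"--tdd", "--no-tasks"}


def check_plan_args(args: list[str]) -> tuple[str | None, set[str]]:
    """Return (mode, flags). Raise ValueError for unknown options or 2+ modes."""
    unknown = [a for a in args if a.startswith("--") and a not in MODES | FLAGS]
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    modes = [a for a in args if a in MODES]
    if len(modes) > 1:
        raise ValueError(f"only one mode per command, got {modes}")
    flags = {a for a in args if a in FLAGS}
    # No mode given means plain /ck:plan, represented here as None.
    return (modes[0] if modes else None, flags)
```

With this sketch, check_plan_args(["--deep", "--tdd"]) is accepted, while check_plan_args(["--deep", "--hard"]) raises ValueError.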

Section 02

--deep, when the plan needs a map per phase

What --deep is for

Large tasks tend to touch many areas of code at once. With the billing module, scope might span:

  • API routes: create invoice, retry payment, refund, webhook callback
  • Database: invoice table, payment attempt table, status history
  • Background jobs: retry scheduler, reconciliation job, webhook replay
  • Admin UI: invoice detail, manual retry, refund action
  • Shared contracts: frontend API response, webhook payload, event names

Tasks like this need more than a list of steps. They need a map:

| Question | In the billing example |
| --- | --- |
| Which files get created/modified/deleted? | Route, service, job, migration, UI action |
| Which tests cover the current behavior? | Retry, idempotency, refund, webhook replay |
| Which phase depends on which? | Migration must precede service; service must precede UI |
| Which interfaces need test protection? | API response, webhook payload, job input |
| Which dependency edges are risky? | Job reads old status enum while service has already changed it |

--deep makes sense when the cost of a wrong plan outweighs the cost of a careful one.

mode rule
Major refactor, 5+ areas, architectural debt -> use --deep

If you're only touching a few files, --fast or plain /ck:plan is usually enough. If scope is complex but the architecture is still clear, --hard tends to fit better. --deep pays off when one bad plan means hours of reworking phases, tests, and ownership.

The --deep pipeline

Starts from broad context, then drills into per-phase detail.

  1. Researcher pass: 2–3 passes over architecture and affected areas.
  2. Full-scope scout: Docs, core modules, existing tests, major dependencies.
  3. Per-phase scout: Each phase gets its own file inventory, test gaps, and interfaces to preserve.
  4. Phase files: Planner writes scout data into each phase.
  5. Red-team: Catches scope leaks, wrong phase order, implementation risks.
  6. Validation: Checks that the plan has concrete commands, dependencies, and success criteria.
  7. Tasks: Generates task list, unless --no-tasks is set.
  8. Cook handoff: Prints /ck:cook {path}/plan.md.

Compared to --hard, which scouts one pass for the whole plan, --deep adds an extra pass per phase. Higher cost, but the trade-off is a more concrete phase file: which files get touched, which tests are missing, which dependencies are risky.

What --deep adds to a phase file

  • File inventory table — action (create/modify/delete), rough size, test impact
  • Test scenario matrix — critical/high/medium paths
  • Dependency map — which other phases this one links to
  • Function/interface checklist — which functions or interfaces need test protection

That's the main difference. A phase shouldn't just say "refactor billing service." It should say exactly which files, which tests, which edge cases, and which phase it depends on.
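To make "which phase it depends on" concrete, here is a minimal sketch that turns a dependency map into an execution order. The phase names and the PHASE_DEPS structure are assumptions made for illustration; the real phase file format may look nothing like this. The ordering rule itself comes from the billing example: migration before service, service before UI.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each phase lists the phases it must come after.
PHASE_DEPS = {
    "migration": set(),
    "service": {"migration"},
    "admin-ui": {"service"},
}


def phase_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid execution order, or raise CycleError if phases depend
    on each other in a loop."""
    return list(TopologicalSorter(deps).static_order())
```

A CycleError from a check like this is the kind of "wrong phase order" problem you want to catch while reviewing the plan rather than halfway through cook.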

Section 03

--tdd, when you're worried a refactor will break what's running

What --tdd is for

Cook's default flow is familiar: read the plan, implement each task, type-check after each file, run a testing step at the end. Fine for greenfield. Not safe enough for refactoring running code.

In the billing example, existing behavior might include:

  • Retry runs at most 3 times, then the invoice moves to failed
  • A webhook with the same event_id can only be applied once
  • Refund cannot run if the invoice is already voided
  • Admin UI must still show the correct status history after the service is split out

Happy paths may still pass. Bugs usually live in edge cases. --tdd creates a regression gate before any code changes. Here "TDD" doesn't mean writing tests for new features — it means capturing current behavior before the refactor starts.
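As a rough picture of what "capturing current behavior" looks like, the sketch below pins two of the behaviors listed above with plain unittest. LegacyBilling is a toy stand-in invented for this example, not real billing code; in a real project the tests would exercise the existing module instead.

```python
import unittest

MAX_RETRIES = 3  # from the example: retry runs at most 3 times


class LegacyBilling:
    """Toy stand-in for the existing billing code (hypothetical)."""

    def __init__(self):
        self.status = "open"
        self.attempts = 0
        self.seen_events = set()
        self.applied = 0

    def retry_payment(self, charge):
        """Call charge() until it succeeds or MAX_RETRIES attempts are used."""
        while self.attempts < MAX_RETRIES:
            self.attempts += 1
            if charge():
                self.status = "paid"
                return
        self.status = "failed"

    def handle_webhook(self, event_id):
        """Apply an event once; ignore repeats of the same event_id."""
        if event_id in self.seen_events:
            return False
        self.seen_events.add(event_id)
        self.applied += 1
        return True


class BillingCharacterizationTest(unittest.TestCase):
    """Pins current behavior before the refactor starts."""

    def test_retry_stops_after_three_attempts(self):
        billing = LegacyBilling()
        billing.retry_payment(lambda: False)  # charge always fails
        self.assertEqual(billing.attempts, 3)
        self.assertEqual(billing.status, "failed")

    def test_duplicate_webhook_is_applied_once(self):
        billing = LegacyBilling()
        self.assertTrue(billing.handle_webhook("evt_1"))
        self.assertFalse(billing.handle_webhook("evt_1"))
        self.assertEqual(billing.applied, 1)
```

Saved as a test module, this runs with python -m unittest. The same tests then serve as the regression gate after the extraction.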

When --tdd is worth it

| Situation | Worth it? | With billing |
| --- | --- | --- |
| Completely new greenfield code | Usually not | No existing invoice/retry behavior to capture |
| Refactoring a module users depend on | Yes | Must preserve retry, idempotency, status transitions |
| Swapping payment providers, keeping old contract | Yes | Easy to drift on API response or webhook behavior |
| Fixing 1–2 lines with a clear root cause | Usually not | e.g., fixing a typo label in admin UI |
| Throwaway prototype | No | Test gate overhead isn't justified |

--tdd is most valuable in modules with async patterns, stateful workflows, database transactions, or public API contracts. Billing hits all of them: background jobs, status transitions, transaction boundaries, webhook payloads, admin actions.

At plan time

| Phase file section | Role |
| --- | --- |
| Tests before refactor | Write regression coverage before changing code, to pin down current behavior. |
| Refactor | Describes the code being changed, protected by the tests above. |
| Additional tests | Adds tests for any new behavior introduced in this phase, if any. |
| End-of-phase gate | A concrete compile + test command that must pass after the refactor. |

phase shape
A --tdd phase usually reads in this order:
1. Write tests covering current behavior
2. Isolate a small dependency point if the old code is hard to test
3. Refactor the code
4. Re-run compile + tests

The "make old code testable" step is easy to skip. Sometimes the old code can't be tested directly: for example, a long function that calls the database internally, with no injectable dependencies and no isolated output. In that case, extract a small dependency seam first, keeping the behavior identical and changing just enough to make the code testable. Then do the real refactor.
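A dependency seam can be as small as one extra parameter. The sketch below assumes a function that used to query the database inline; the names invoice_total and fetch_line_amounts_from_db are hypothetical and only illustrate the shape of the change.

```python
from typing import Callable, Iterable


def fetch_line_amounts_from_db(invoice_id: int) -> Iterable[int]:
    """Hypothetical production loader; the original inline query would move here."""
    raise NotImplementedError("wire this to the real database")


def invoice_total(
    invoice_id: int,
    load_amounts: Callable[[int], Iterable[int]] = fetch_line_amounts_from_db,
) -> int:
    """Same calculation as before; only the data source is now injectable.

    Production callers keep calling invoice_total(invoice_id) unchanged,
    while a test can pass its own load_amounts.
    """
    return sum(load_amounts(invoice_id))
```

A pre-refactor test can then call invoice_total(42, load_amounts=lambda _id: [1000, 250]) and assert the result is 1250, without a database.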

When cook runs

When cook runs with --tdd

  1. Write tests first: Pin down current behavior before touching any code.
  2. Refactor: Change structure, split modules, make old code testable if needed.
  3. Run gate: The behavior-protection tests must still pass alongside the compile gate.

The short version: tests written first are the baseline for old behavior. If that baseline fails after refactoring, cook should stop and fix the behavioral drift before moving on.

Version reference

This post is cross-referenced against Engineer Kit engineer@v2.19.1-beta.10 at time of writing, verified via ck -V. This note is here only to flag potential drift if the ck:plan or ck:cook workflow changes later.

Section 04

Why --deep and --tdd often ship together

--deep and --tdd address two different layers of risk.

If the task carries both risks, enable both options.

  • Unknown scope: need per-phase scouting, a file inventory, and a dependency map.
  • Behavior drift risk: need tests before refactoring and a regression gate at cook time.
  • Conclusion: --deep gives the map. --tdd gives the verification layer. Major refactors typically need both.

| Risk | What you need | Option |
| --- | --- | --- |
| Unknown scope | Per-phase scout, file inventory, dependency map | --deep |
| Refactor drifts from existing behavior | Tests before refactor, Regression Gate at cook | --tdd |

A major refactor usually carries both risks. For example:

task shape
Extract billing module from the monolith.
Keep current API contract for the frontend.
Preserve existing behavior for retry, idempotency, invoice status.
Touches database schema, background jobs, API routes, admin UI.

This billing task needs --deep because scope is wide and dependencies are many. It also needs --tdd because retry, idempotency, refund, and webhooks are all existing behaviors that must survive.

Running only --deep might produce a very clear plan, but the refactor still has no regression gate. Running only --tdd might get the test flow right, but the phase file stays vague: what to test, where, which modules are affected, which dependencies need to go first.

Quick recall

Wide task? Use --deep. Old behavior must survive? Add --tdd. Both risks? Enable both.

Section 05

When to run /ck:brainstorm or /ck:scout first

A common mistake: reaching for --deep --tdd when the approach is still unclear. Plan mode doesn't resolve ambiguity. It turns a chosen approach into more concrete steps — nothing more.

For example, if you haven't decided:

  • Split into a true separate service, or just modularize inside the monolith?
  • Introduce a new queue, or keep the current background job?
  • Keep the existing invoice status enum, or move to a clearer state machine?
  • Rewrite the admin UI flow, or refactor it incrementally?

In that case, run /ck:brainstorm first to settle on an approach, then plan.

Another layer is /ck:scout. Think of it as a context-gathering step, not a decision-making one. Scout is useful when you're not sure where the code lives, which modules are involved, or which tests are covering the current behavior.

When the brief is still fuzzy, don't jump straight to plan.

  1. Scout: Find the real code, real tests, real dependencies.
  2. Brainstorm: Pick an approach and weigh trade-offs before locking the plan.
  3. Plan: Use --deep --tdd once the approach is clear.

context-first flow
/ck:scout "billing retry flow, invoice status, background jobs, admin UI"
/ck:brainstorm "Extract billing module while keeping the current API contract"
/ck:plan --deep --tdd "Refactor billing module per agreed approach..."

Once you know the relevant code area and the key trade-offs, you can skip the separate /ck:scout call — /ck:plan --deep has a scout step in its pipeline anyway. But when the brief is too vague, scouting first helps avoid a silent failure mode: the agent's reasoning sounds plausible, but it's working from the wrong map of the project.

| Situation | Start with |
| --- | --- |
| Unknown project context / don't know where the code lives | /ck:scout -> /ck:brainstorm |
| Know clearly what and how | /ck:plan directly |
| Know what, unsure how | /ck:brainstorm -> /ck:plan |
| Not yet sure whether to do it at all | /ck:brainstorm -> decide -> plan or drop |

Section 06

Decision matrix

  • Scope unclear: /ck:scout first
  • Approach unclear: /ck:brainstorm first
  • Major refactor: /ck:plan --deep --tdd

| Situation | Command |
| --- | --- |
| Unknown project context | /ck:scout first, then brainstorm/plan |
| Approach not settled | /ck:brainstorm first |
| Small fix, 1–2 files | /ck:plan --fast |
| Medium new feature | /ck:plan (auto) |
| Complex feature, unfamiliar domain | /ck:plan --hard |
| 3+ independent modules, run in parallel | /ck:plan --parallel |
| Torn between 2 concrete approaches | /ck:plan --two |
| Refactor across 5+ areas, architectural debt | /ck:plan --deep |
| Refactoring live/dogfood code, regression risk | Any mode + --tdd |
| Major refactor on a codebase with behavior to preserve | /ck:plan --deep --tdd |

The --deep --tdd pair isn't one mechanism — it's two separate things that travel well together. --deep gives the map. --tdd gives the verification layer. Planning cost goes up, but cook has less scope to guess and a clearer regression gate to work against.
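The matrix can be read as a short decision function. The sketch below is a deliberate simplification: the size labels ("small", "medium", "complex", "major") and the function name suggest_command are made up for this example, and --parallel and --two are left out because they depend on task shape rather than size.

```python
# Hypothetical size labels mapped to the modes from the matrix.
# An empty string stands for plain /ck:plan with no mode argument.
MODE_BY_SIZE = {
    "small": "--fast",
    "medium": "",
    "complex": "--hard",
    "major": "--deep",
}


def suggest_command(
    size: str,
    *,
    context_known: bool = True,
    approach_settled: bool = True,
    preserve_behavior: bool = False,
) -> str:
    """Return the command to start with, following the matrix top to bottom."""
    if not context_known:
        return "/ck:scout"
    if not approach_settled:
        return "/ck:brainstorm"
    parts = ["/ck:plan", MODE_BY_SIZE[size]]
    if preserve_behavior:
        parts.append("--tdd")  # the flag composes with any mode
    return " ".join(p for p in parts if p)
```

For the billing extraction, suggest_command("major", preserve_behavior=True) gives "/ck:plan --deep --tdd". If the approach is still open, the same call with approach_settled=False gives "/ck:brainstorm".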

Section 07

Inside: what's actually different

This section goes deeper into how --deep and --tdd each affect plan time, cook time, and running cost.

7.1 --deep costs more because of researcher + per-phase scout

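| Mode | Researcher | Red Team | Validation | Per-phase scout |
| --- | --- | --- | --- | --- |
| --fast | 0 | 0 | 0 | No |
| --hard | 2 | Yes | Optional | No |
| --deep | 2–3 | Yes | Yes | Yes |
| --parallel | 2 | Yes | Optional | No |
| --two | 2+ | After select | After select | No |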

Most of --deep's cost comes from 2–3 researchers for high-level architecture analysis, the red-team review, the validation step, and a re-read pass for each phase.

Put simply: each phase gets its own inspection before the plan is finalized — which files it touches, which phases it depends on, which tests are missing, which edge cases are easy to miss.

One thing worth clarifying: "scout" here refers to the act of reading and rechecking per phase, not a guarantee that a separate agent is always spawned at runtime. The point is that --deep forces the plan through many smaller inspection passes rather than a single high-level look at scope.

7.2 --tdd doesn't spin up a new agent

Easy to misread: --tdd doesn't change the selected mode. It's an additive flag — you still run --fast, --hard, or --deep as normal, with a tests-first structure layered on top.

The flag changes exactly 2 things:

  • The phase file at plan time. A phase normally covers overview, requirements, architecture, related files, implementation steps, success criteria, and risk assessment. With --tdd on, the phase gains a pre-refactor test section, a post-refactor test section, and an end-of-phase gate.
  • Execution order at cook time. Write behavior-protection tests first, then refactor, then re-run the compile/test gate.

The cost of --tdd is much lower than --deep. It mainly adds structure to the phase file and reorders execution at cook time.

7.3 Regression Gate needs a concrete command

The cook spec requires the Regression Gate to be a concrete compile/test command. For a Go project, for example:

regression gate
Regression Gate: go test ./... && go vet ./...

If the phase file vaguely says "run tests to verify," the verify step weakens because cook has no exact command to anchor on. The --tdd flag doesn't automatically guess the right tooling for every repo. If the project uses a non-standard setup, declare it explicitly in the task description or plan file.

tooling hint
Project uses go test ./..., frontend uses npm test, E2E uses Playwright.

When the task names the tooling, the plan has something concrete to write into the Regression Gate for each phase.
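To show why a concrete command matters, here is a minimal sketch of a gate runner: it runs each command in order and stops at the first failure. This is not cook's implementation; run_gate and GO_GATE are names invented for the example, and the Go commands are the ones from the Regression Gate line above.

```python
import subprocess
import sys


def run_gate(commands: list[list[str]]) -> bool:
    """Run each command in order; return False at the first non-zero exit code."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True


# The Go gate from the text, split into two steps (equivalent to joining with &&).
GO_GATE = [["go", "test", "./..."], ["go", "vet", "./..."]]
```

A gate written as "run tests to verify" has nothing to put in that list, which is the weakness described above.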

Section 08

How to run it, with a general template

The template below follows the ck:plan and ck:cook skill definitions: plan receives the task plus the mode and flags, and cook receives the plan path, plus --tdd if the plan enabled it.

General template

general template
/ck:plan [mode] [--tdd] "[What needs to happen].
Scope: [modules/files/stack being touched].
Preserve: [API contract, existing behavior, compatibility].
Touches: [database, jobs, routes, UI, shared types].
Tooling: [concrete compile/test command].
Known bugs / out of scope: [if any]."

Note: if the plan uses --tdd, pass it to cook as well:

cook command
/ck:cook /absolute/path/to/plan.md --tdd
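If you script these commands, one way to avoid dropping --tdd between plan and cook is to build both strings from the same inputs. The helper below is a hypothetical convenience, not part of ClaudeKit; it only assembles the command text shown above.

```python
from __future__ import annotations


def build_commands(
    task: str,
    plan_path: str,
    mode: str | None = None,
    tdd: bool = False,
) -> tuple[str, str]:
    """Return (plan_command, cook_command) with --tdd applied to both or neither."""
    plan = ["/ck:plan"]
    if mode:
        plan.append(mode)
    if tdd:
        plan.append("--tdd")
    plan.append(f'"{task}"')

    cook = ["/ck:cook", plan_path]
    if tdd:
        cook.append("--tdd")  # carried forward so cook enforces tests-first order
    return " ".join(plan), " ".join(cook)
```

Calling build_commands("Extract billing module from the monolith.", "/absolute/path/to/plan.md", mode="--deep", tdd=True) returns a plan command with --deep --tdd and a cook command ending in --tdd.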

Applying the template to a few common situations:

--deep for a planning/inventory phase

deep only
/ck:plan --deep "Build an inventory plan to prepare extracting billing into its own package.
Scope: API routes, invoice service, payment jobs, admin UI, database schema.
Output must include: file inventory, dependency map, phase ownership, risk list.
Not refactoring live behavior in this round."

--hard --tdd for refactoring code that's running / being dogfooded

hard + tdd
/ck:plan --hard --tdd "Refactor invoiceStatusService: move transition rules
to a separate file, keep exact behavior for retry, failed, refunded, voided."

--deep --tdd for a major refactor with behavior to preserve

deep + tdd
/ck:plan --deep --tdd "Extract billing module from the monolith.
Keep API contract for frontend, preserve retry/idempotency/refund/webhook behavior.
Touches: database schema, payment jobs, API routes, admin UI. Project uses
go test ./..., frontend uses npm test."

A good task description usually covers:

  • Scope — which files/modules/stack
  • Constraints — what must not break, what must stay compatible
  • Tooling — test command, compile command; critical for --tdd
  • Expected output — short but concrete

Vague input like "refactor billing" produces a vague plan. Without a concrete anchor, scout output tends to be generic.

Section 09

Best practices and pitfalls

Do this before you cook

Scope

Clean worktree before planning

If the working directory has uncommitted changes from another task, scout can mix up current code with in-progress work. Before a large plan: commit or stash. Better yet, create a dedicated worktree via /ck:worktree.

Review

Review the phase files before cooking

A --deep --tdd plan takes a while to run — don't start cook immediately without reading through plan.md and the phase files it generated.

Phase review checklist

Read plan.md and the phase files through these 5 questions before letting cook run:

  • File inventory: Any files you don't want the agent to touch?
  • Test scenario matrix: Any missed race conditions or edge cases like duplicate webhooks?
  • Phase order: Truly independent, or does migration have to precede the job?
  • Regression Gate: Right tool and right test subset?
  • Known bugs: Fix them, or preserve the old behavior?

Tooling

Declare the test command in the task

With --tdd, does the repo use something non-standard like bun instead of npm, mise instead of asdf, or task instead of make? Say so upfront.

tooling hint
Project uses go test ./..., frontend uses npm test, E2E uses Playwright.

Without a declaration, the agent guesses. A wrong guess weakens the Regression Gate.

Context

/clear between plan and cook

Plan context in a large task gets heavy fast: research output, scout data, red-team feedback. Recommended flow: finish plan -> /clear -> reopen -> /ck:cook {absolute-path}/plan.md --tdd. Cook reads everything from the plan file; the old planning context isn't needed.

Avoid

Cook flag

Forgetting --tdd on /ck:cook

Plan has --tdd, cook doesn't. Cook can still read the pre-refactor test section, but it won't enforce the order — tests may still get written, just after the code instead of before. Use the cook command suggested in the plan output.

Overkill

--deep for small tasks

Under roughly 5 files, it's usually not needed. --hard or --fast will be leaner.

Greenfield

--tdd for greenfield

New code has no existing behavior to capture. The "tests before refactor" section ends up nearly empty, making it easy for the agent to write pro forma tests rather than do real TDD. For greenfield, use --hard and let tests be written in the normal flow.

Blind spot

Trusting the phase file blindly

--deep scouts more thoroughly, but it can still miss implicit dependencies. Before cooking, ask: does a later phase silently rely on something the earlier phase hasn't created yet?
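That question can be asked mechanically if you jot down, per phase, what it creates and what it needs. The sketch below assumes a hand-written list of phases with hypothetical file paths; it is a review aid, not something the plan output provides in this form.

```python
def find_order_problems(phases: list[dict]) -> list[tuple[str, str]]:
    """Return (phase_name, path) pairs where a phase needs a file that only a
    later phase creates. Paths no phase creates are assumed to exist already."""
    all_created = set().union(*(p["creates"] for p in phases))
    available = set()  # files created by phases that have already run
    problems = []
    for phase in phases:
        for path in sorted(phase["needs"]):
            created_elsewhere = path in all_created and path not in phase["creates"]
            if created_elsewhere and path not in available:
                problems.append((phase["name"], path))
        available |= phase["creates"]
    return problems


# Hypothetical plan where the service phase runs before the migration it relies on.
PHASES = [
    {"name": "service", "creates": {"billing/service"},
     "needs": {"migrations/invoice_status"}},
    {"name": "migration", "creates": {"migrations/invoice_status"},
     "needs": set()},
]
```

Here find_order_problems(PHASES) reports that "service" needs "migrations/invoice_status" before any earlier phase has created it. Swapping the two phases clears the report.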

Known bug

--tdd capturing tests for code that already has bugs

Suppose the old billing code has a latent bug: duplicate webhooks sometimes create 2 payment attempts. If the pre-refactor test captures that bug as "current behavior," an accidental fix during refactoring makes the gate fail, and the agent may reintroduce the old bug to make it pass. Fix: declare known bugs in the task description, or split into two plans: one to fix the bug, one to refactor.

Mode mix

Trying to combine --deep with --parallel

--deep and --parallel are both modes — they don't compose. If you need multiple agents in parallel, use --parallel --tdd and be very explicit about ownership and test scope. If ownership and test isolation aren't clear, drop --parallel and stick with --deep --tdd.

Section 10

In short

--deep --tdd isn't for every task.

It's for the subset of tasks that show two signals:

  1. Scope is wide enough that a typical plan risks missing the map
  2. Existing code has behavior that must survive the refactor

--deep makes planning slower, but the phase files come out sharper: file inventory, test gaps, dependency map, function/interface checklist.

--tdd makes cooking slower, but the refactor gets a regression gate: write tests first, refactor, then verify.

If you're in a PM/founder/product role, you mainly just need to know these two options exist. When the team is about to refactor a large module like billing, ask whether the plan includes per-phase scouting and a regression gate.

If you're a dev, try it on the next big task. The first time it may feel like extra overhead. But one pre-written test catching a hidden regression makes that overhead feel worth it.