When to use /ck:plan --deep --tdd

Running case

Extract the billing module from the monolith

API routes · DB models · Background jobs · Webhooks · Admin UI

Billing is spread across API routes, database models, background jobs, retry logic, invoice status, webhook handlers, and admin UI. The task sounds simple enough: "extract billing into its own module, keep the old behavior." But once the agent starts running, the real questions surface:

Which files actually belong in scope: API, jobs, DB, UI, or webhooks?
Which behaviors must stay identical: retry, idempotency, invoice status, payment failure?
How much of the current behavior is actually under test?
Does this phase have implicit dependencies on a migration or background job?
If the new code builds but diverges on an edge case, who catches that?

For small tasks, /ck:plan --fast or plain /ck:plan is usually fine. For complex tasks that aren't a full major refactor, --hard is usually the better fit. For large tasks, missing context upfront creates cost at the end: cook has to guess scope again, phase files stay vague, tests get written after the code, and regressions only surface after the refactor is done.

Core idea
--deep helps the plan see the full affected surface: file inventory, test gaps, phase dependencies. --tdd forces the flow to capture current behavior before refactoring, then uses those same tests as the regression gate.

Neither of these is a "stronger mode" you just toggle on to improve things. They address two specific risks: not knowing enough scope and not protecting existing behavior.

Section 01

Mode vs flag, two different things

Before going further, one distinction worth clarifying. /ck:plan takes two kinds of arguments:

Type	Example	Effect
Mode	`--fast`, `--hard`, `--deep`, `--parallel`, `--two`	Selects the pipeline. Only one mode per command.
Composable flag	`--tdd`, `--no-tasks`	An add-on. Can pair with any mode.

The key point: --deep is a mode — it changes the planning pipeline. --tdd is a flag — it keeps whichever mode you chose but adds a tests-first structure to the phase file and cook flow.

argument hint

argument-hint: "[task] [--fast|--hard|--deep|--parallel|--two] [--tdd|--no-tasks]"

The | between modes means pick one. --tdd sits in its own block, so it can compose freely. That makes --deep --tdd valid, while --deep --hard is not.
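The "one mode, any number of flags" rule can be sketched as a small validator. This is a hypothetical illustration, not the actual /ck:plan parser: the function name `parse_plan_args` and its error messages are assumptions, and only the option names come from the argument hint above.

```python
from typing import List, Optional, Set, Tuple

# Option names taken from the argument hint; everything else is illustrative.
MODES = {"--fast", "--hard", "--deep", "--parallel", "--two"}
FLAGS = {"--tdd", "--no-tasks"}
KNOWN = MODES | FLAGS


def parse_plan_args(args: List[str]) -> Tuple[Optional[str], Set[str]]:
    """Return (mode, flags). Raise ValueError if more than one mode is given."""
    unknown = [a for a in args if a.startswith("--") and a not in KNOWN]
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    modes = [a for a in args if a in MODES]
    if len(modes) > 1:
        raise ValueError(f"only one mode per command, got: {modes}")
    flags = {a for a in args if a in FLAGS}
    # No mode means plain /ck:plan.
    return (modes[0] if modes else None), flags
```

Under this sketch, `["--deep", "--tdd"]` parses cleanly, while `["--deep", "--hard"]` is rejected because both are modes.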

Section 02

`--deep`, when the plan needs a map per phase

What `--deep` is for

Large tasks tend to touch many areas of code at once. With the billing module, scope might span:

API routes: create invoice, retry payment, refund, webhook callback
Database: invoice table, payment attempt table, status history
Background jobs: retry scheduler, reconciliation job, webhook replay
Admin UI: invoice detail, manual retry, refund action
Shared contracts: frontend API response, webhook payload, event names

Tasks like this need more than a list of steps. They need a map:

Question	In the billing example
Which files get created/modified/deleted?	Route, service, job, migration, UI action
Which tests cover the current behavior?	Retry, idempotency, refund, webhook replay
Which phase depends on which?	Migration must precede service; service must precede UI
Which interfaces need test protection?	API response, webhook payload, job input
Which dependency edges are risky?	Job reads old status enum while service has already changed it

--deep makes sense when the cost of a wrong plan outweighs the cost of a careful one.

mode rule

Major refactor, 5+ areas, architectural debt -> use --deep

If you're only touching a few files, --fast or plain /ck:plan is usually enough. If scope is complex but the architecture is still clear, --hard tends to fit better. --deep pays off when one bad plan means hours of reworking phases, tests, and ownership.

The `--deep` pipeline

Starts from broad context, then drills into per-phase detail.

Researcher pass: 2–3 passes over architecture and affected areas.

Full-scope scout: Docs, core modules, existing tests, major dependencies.

Per-phase scout: Each phase gets its own file inventory, test gaps, and interfaces to preserve.

Phase files: Planner writes scout data into each phase.

Red-team: Catches scope leaks, wrong phase order, implementation risks.

Validation: Checks that the plan has concrete commands, dependencies, and success criteria.

Tasks: Generates task list, unless --no-tasks is set.

Cook handoff: Prints /ck:cook {path}/plan.md.

Compared to --hard, which scouts the whole plan in a single pass, --deep adds an extra pass per phase. The cost is higher, but the trade-off is a more concrete phase file: which files get touched, which tests are missing, which dependencies are risky.

What `--deep` adds to a phase file

File inventory table — action (create/modify/delete), rough size, test impact
Test scenario matrix — critical/high/medium paths
Dependency map — which other phases this one links to
Function/interface checklist — which functions or interfaces need test protection

That's the main difference. A phase shouldn't just say "refactor billing service." It should say exactly which files, which tests, which edge cases, and which phase it depends on.
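One way to picture the dependency map is as a small graph that can be sorted into a safe phase order. The sketch below is an assumption about how such a map could be represented, not the format ck:plan writes. The phase names are hypothetical and mirror the "migration before service, service before UI" example from the table earlier in this section.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical phase names. Each phase maps to the set of phases it depends on.
DEPENDS_ON = {
    "migration": set(),
    "service": {"migration"},
    "admin-ui": {"service"},
}


def phase_order(depends_on):
    """Return phases in an order that respects every dependency edge."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"phases depend on each other in a cycle: {exc.args[1]}") from exc
```

If two phases end up depending on each other, the sort fails loudly, which is the kind of ordering problem worth catching at plan time rather than at cook time.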

Section 03

`--tdd`, when you're worried a refactor will break what's running

What `--tdd` is for

Cook's default flow is familiar: read the plan, implement each task, type-check after each file, run a testing step at the end. Fine for greenfield. Not safe enough for refactoring running code.

In the billing example, existing behavior might include:

Retry runs at most 3 times, then the invoice moves to failed
A webhook with the same event_id can only be applied once
Refund cannot run if the invoice is already voided
Admin UI must still show the correct status history after the service is split out

Happy paths may still pass. Bugs usually live in edge cases. --tdd creates a regression gate before any code changes. Here "TDD" doesn't mean writing tests for new features — it means capturing current behavior before the refactor starts.
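As a rough illustration of "capturing current behavior," the sketch below pins two of the rules listed above against a toy stand-in for the old code. Everything here is hypothetical: `LegacyBilling`, its methods, and the `"paid"` status are invented for the example, and "at most 3 times" is read as three charge attempts in total.

```python
class LegacyBilling:
    """Toy stand-in for the existing billing code (hypothetical)."""

    MAX_ATTEMPTS = 3  # assumption: "at most 3 times" means three charge attempts

    def __init__(self):
        self.seen_event_ids = set()
        self.applied = []

    def apply_webhook(self, event_id, payload):
        """Apply a webhook once per event_id; return False for duplicates."""
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)
        self.applied.append(payload)
        return True

    def retry_payment(self, charge):
        """Call charge() up to MAX_ATTEMPTS times; return the final invoice status."""
        for _ in range(self.MAX_ATTEMPTS):
            if charge():
                return "paid"
        return "failed"


def test_duplicate_webhook_is_applied_once():
    billing = LegacyBilling()
    assert billing.apply_webhook("evt_1", {"amount": 100}) is True
    assert billing.apply_webhook("evt_1", {"amount": 100}) is False
    assert billing.applied == [{"amount": 100}]


def test_retry_gives_up_after_three_attempts():
    attempts = []

    def always_declined():
        attempts.append(1)
        return False

    assert LegacyBilling().retry_payment(always_declined) == "failed"
    assert len(attempts) == 3
```

Tests like these say nothing about how the code is structured, only about what it does. That is what lets them survive the refactor and act as the gate afterwards.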

When `--tdd` is worth it

Situation	Worth it?	With billing
Completely new greenfield code	Usually not	No existing invoice/retry behavior to capture
Refactoring a module users depend on	Yes	Must preserve retry, idempotency, status transitions
Swapping payment providers, keeping old contract	Yes	Easy to drift on API response or webhook behavior
Fixing 1–2 lines with a clear root cause	Usually not	e.g., fixing a typo label in admin UI
Throwaway prototype	No	Test gate overhead isn't justified

--tdd is most valuable in modules with async patterns, stateful workflows, database transactions, or public API contracts. Billing hits all of them: background jobs, status transitions, transaction boundaries, webhook payloads, admin actions.

At plan time

Phase file section	Role
Tests before refactor	Write regression coverage before changing code, to pin down current behavior.
Refactor	Describes the code being changed, protected by the tests above.
Additional tests	Adds tests for any new behavior introduced in this phase, if any.
End-of-phase gate	A concrete compile + test command that must pass after the refactor.

phase shape

A --tdd phase usually reads in this order:
1. Write tests covering current behavior
2. Isolate a small dependency point if the old code is hard to test
3. Refactor the code
4. Re-run compile + tests

The "make old code testable" step is easy to skip. Sometimes it can't be tested directly: a long function that calls the database internally, with no injectable dependencies and no isolated output. In that case, extract a small dependency seam first — keeping the behavior identical — just enough to make the code testable. Then do the real refactor.
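A dependency seam can be as small as one extra parameter. The sketch below is a minimal example under stated assumptions: `can_refund` and `_fetch_invoice_from_db` are hypothetical names, and the rule being protected is the "refund cannot run if the invoice is already voided" behavior from the list above.

```python
def _fetch_invoice_from_db(invoice_id):
    """Stand-in for the hard-wired database call in the old code (hypothetical)."""
    raise RuntimeError("no database available in this sketch")


def can_refund(invoice_id, fetch_invoice=_fetch_invoice_from_db):
    """Same rule as before. The only change is that the lookup is injectable."""
    invoice = fetch_invoice(invoice_id)
    return invoice["status"] != "voided"


def test_refund_is_blocked_for_voided_invoice():
    # A dict lookup replaces the database for the duration of the test.
    fake_invoices = {"inv_1": {"status": "voided"}, "inv_2": {"status": "paid"}}
    assert can_refund("inv_1", fetch_invoice=fake_invoices.__getitem__) is False
    assert can_refund("inv_2", fetch_invoice=fake_invoices.__getitem__) is True
```

Callers that don't pass `fetch_invoice` keep the old behavior, so the seam itself should not change anything observable.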

When cook runs

When cook runs with --tdd

Write tests first: Pin down current behavior before touching any code.

Refactor: Change structure, split modules, make old code testable if needed.

Run gate: The behavior-protection tests must still pass alongside the compile gate.

The short version: tests written first are the baseline for old behavior. If that baseline fails after refactoring, cook should stop and fix the behavioral drift before moving on.

Version reference
This post is cross-referenced against Engineer Kit engineer@v2.19.1-beta.10 at time of writing, verified via ck -V. This note is here only to flag potential drift if the ck:plan or ck:cook workflow changes later.

Section 04

Why `--deep --tdd` often ship together

--deep and --tdd address two different layers of risk.

If the task carries both risks, enable both options.

Unknown scope: Need per-phase scouting, a file inventory, and a dependency map.

Behavior drift risk: Need tests before refactoring and a regression gate at cook time.

Conclusion: --deep gives the map. --tdd gives the verification layer. Major refactors typically need both.

Risk	What you need	Option
Unknown scope	Per-phase scout, file inventory, dependency map	`--deep`
Refactor drifts from existing behavior	Tests before refactor, Regression Gate at cook	`--tdd`

A major refactor usually carries both risks. For example:

task shape

Extract billing module from the monolith.
Keep current API contract for the frontend.
Preserve existing behavior for retry, idempotency, invoice status.
Touches database schema, background jobs, API routes, admin UI.

This billing task needs --deep because scope is wide and dependencies are many. It also needs --tdd because retry, idempotency, refund, and webhooks are all existing behaviors that must survive.

Running only --deep might produce a very clear plan, but the refactor still has no regression gate. Running only --tdd might get the test flow right, but the phase file stays vague: what to test, where, which modules are affected, which dependencies need to go first.

Quick recall
Wide task? Use --deep. Old behavior must survive? Add --tdd. Both risks? Enable both.

Section 05

When to run `/ck:brainstorm` or `/ck:scout` first

A common mistake: reaching for --deep --tdd when the approach is still unclear. Plan mode doesn't resolve ambiguity. It turns a chosen approach into more concrete steps — nothing more.

For example, if you haven't decided:

Split into a true separate service, or just modularize inside the monolith?
Introduce a new queue, or keep the current background job?
Keep the existing invoice status enum, or move to a clearer state machine?
Rewrite the admin UI flow, or refactor it incrementally?

In that case, run /ck:brainstorm first to settle on an approach, then plan.

Another layer is /ck:scout. Think of it as a context-gathering step, not a decision-making one. Scout is useful when you're not sure where the code lives, which modules are involved, or which tests are covering the current behavior.

When the brief is still fuzzy, don't jump straight to plan.

1 · Scout: Find the real code, real tests, real dependencies.

2 · Brainstorm: Pick an approach and weigh trade-offs before locking the plan.

3 · Plan: Use --deep --tdd once the approach is clear.

context-first flow

/ck:scout "billing retry flow, invoice status, background jobs, admin UI"
/ck:brainstorm "Extract billing module while keeping the current API contract"
/ck:plan --deep --tdd "Refactor billing module per agreed approach..."

Once you know the relevant code area and the key trade-offs, you can skip the separate /ck:scout call — /ck:plan --deep has a scout step in its pipeline anyway. But when the brief is too vague, scouting first helps avoid a silent failure mode: the agent's reasoning sounds plausible, but it's working from the wrong map of the project.

Situation	Start with
Unknown project context / don't know where the code lives	`/ck:scout` -> `/ck:brainstorm`
Know clearly what and how	`/ck:plan` directly
Know what, unsure how	`/ck:brainstorm` -> `/ck:plan`
Not yet sure whether to do it at all	`/ck:brainstorm` -> decide -> plan or drop

Section 06

Decision matrix

Scope unclear: /ck:scout first

Approach unclear: /ck:brainstorm first

Major refactor: /ck:plan --deep --tdd

Situation	Command
Unknown project context	`/ck:scout` first, then brainstorm/plan
Approach not settled	`/ck:brainstorm` first
Small fix, 1–2 files	`/ck:plan --fast`
Medium new feature	`/ck:plan` auto
Complex feature, unfamiliar domain	`/ck:plan --hard`
3+ independent modules, run in parallel	`/ck:plan --parallel`
Torn between 2 concrete approaches	`/ck:plan --two`
Refactor across 5+ areas, architectural debt	`/ck:plan --deep`
Refactoring live/dogfood code, regression risk	Any mode + `--tdd`
Major refactor on a codebase with behavior to preserve	`/ck:plan --deep --tdd`

The --deep --tdd pair isn't one mechanism — it's two separate things that travel well together. --deep gives the map. --tdd gives the verification layer. Planning cost goes up, but cook has less scope to guess and a clearer regression gate to work against.
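The matrix can also be read as a short decision function. The sketch below is only a paraphrase of the table, not part of the tool: the `recommend` function and its size labels are hypothetical, and it leaves out `--parallel` and `--two`, which depend on judgment the table describes in words.

```python
# Size labels are illustrative shorthand for rows of the decision matrix.
MODE_BY_SIZE = {
    "small": "--fast",    # small fix, 1-2 files
    "medium": "",         # medium new feature: plain /ck:plan
    "complex": "--hard",  # complex feature, unfamiliar domain
    "major": "--deep",    # refactor across 5+ areas, architectural debt
}


def recommend(size, scope_known=True, approach_settled=True, preserve_behavior=False):
    """Return the command to start with for a given situation."""
    if not scope_known:
        return "/ck:scout"
    if not approach_settled:
        return "/ck:brainstorm"
    parts = ["/ck:plan", MODE_BY_SIZE[size]]
    if preserve_behavior:
        parts.append("--tdd")  # composable flag: pairs with any mode
    return " ".join(part for part in parts if part)
```

Note that the mode and the flag are decided by separate inputs: size of the task picks the mode, and the need to preserve behavior adds `--tdd` on top.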

Section 07

Inside: what's actually different

This section goes deeper into how --deep and --tdd each affect plan time, cook time, and running cost.

7.1 `--deep` costs more because of researcher + per-phase scout

Mode	Researcher	Red Team	Validation	Per-phase scout
`--fast`	0	0	0	No
`--hard`	2	Yes	Optional	No
`--deep`	2-3	Yes	Yes	Yes
`--parallel`	2	Yes	Optional	No
`--two`	2+	After select	After select	No

Most of --deep's cost comes from 2–3 researchers for high-level architecture analysis, the red-team review, the validation step, and a re-read pass for each phase.

Put simply: each phase gets its own inspection before the plan is finalized — which files it touches, which phases it depends on, which tests are missing, which edge cases are easy to miss.

One thing worth clarifying: "scout" here refers to the act of reading and rechecking per phase, not a guarantee that a separate agent is always spawned at runtime. The point is that --deep forces the plan through many smaller inspection passes rather than a single high-level look at scope.

7.2 `--tdd` doesn't spin up a new agent

Easy to misread: --tdd doesn't change the selected mode. It's an additive flag — you still run --fast, --hard, or --deep as normal, with a tests-first structure layered on top.

The flag changes exactly 2 things:

The phase file at plan time. A phase normally covers overview, requirements, architecture, related files, implementation steps, success criteria, and risk assessment. With --tdd on, the phase gains a pre-refactor test section, a post-refactor test section, and an end-of-phase gate.
Execution order at cook time. Write behavior-protection tests first, then refactor, then re-run the compile/test gate.

The cost of --tdd is much lower than --deep. It mainly adds structure to the phase file and reorders execution at cook time.
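To make "additive" concrete, here is a minimal sketch of how the phase outline might change when the flag is on. The section names come from this post. Their exact position in the phase file is an assumption, and `phase_sections` is a hypothetical helper rather than anything ck:plan exposes.

```python
BASE_SECTIONS = [
    "Overview",
    "Requirements",
    "Architecture",
    "Related files",
    "Implementation steps",
    "Success criteria",
    "Risk assessment",
]


def phase_sections(tdd=False):
    """Return the phase outline. With tdd, wrap the implementation in test sections."""
    sections = list(BASE_SECTIONS)
    if tdd:
        i = sections.index("Implementation steps")
        # Assumed placement: tests before the change, extra tests and the gate after it.
        sections[i:i + 1] = [
            "Tests before refactor",
            "Implementation steps",
            "Additional tests",
            "End-of-phase gate",
        ]
    return sections
```

Nothing is removed from the base outline in this sketch, which matches the idea that the flag layers structure on top of whichever mode was chosen.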

7.3 Regression Gate needs a concrete command

The cook spec requires the Regression Gate to be a concrete compile/test command. For a Go project, for example:

regression gate

Regression Gate: go test ./... && go vet ./...

If the phase file vaguely says "run tests to verify," the verify step weakens because cook has no exact command to anchor on. The --tdd flag doesn't automatically guess the right tooling for every repo. If the project uses a non-standard setup, declare it explicitly in the task description or plan file.

tooling hint

Project uses go test ./..., frontend uses npm test, E2E uses Playwright.

When the task names the tooling, the plan has something concrete to write into the Regression Gate for each phase.
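The difference between a vague gate and a concrete one is that the concrete one can be executed and yields a clear pass or fail. The sketch below shows that idea with a hypothetical `run_gate` helper. It is not how cook runs the gate, just an illustration of why an exact command matters.

```python
import subprocess


def run_gate(commands):
    """Run each command in order. Return the first failing command, or None if all pass."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return command
    return None


# The Go gate from this section, written as two argv lists instead of one shell line:
GO_GATE = [
    ["go", "test", "./..."],
    ["go", "vet", "./..."],
]
```

Calling `run_gate(GO_GATE)` inside a Go repository would stop at the first failing step, which mirrors the `&&` in the one-line version. "Run tests to verify" has no equivalent: there is nothing to execute and nothing to fail.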

Section 08

How to run it, with a general template

The template below follows the ck:plan and ck:cook skill sources: plan receives the task plus mode/flag, cook receives the plan path and carries --tdd forward if the plan enabled it.

General template

general template

/ck:plan [mode] [--tdd] "[What needs to happen].
Scope: [modules/files/stack being touched].
Preserve: [API contract, existing behavior, compatibility].
Touches: [database, jobs, routes, UI, shared types].
Tooling: [concrete compile/test command].
Known bugs / out of scope: [if any]."

Note: if the plan uses --tdd, pass it to cook as well:

cook command

/ck:cook /absolute/path/to/plan.md --tdd

Applying the template to a few common situations:

`--deep` for a planning/inventory phase

deep only

/ck:plan --deep "Build an inventory plan to prepare extracting billing into its own package.
Scope: API routes, invoice service, payment jobs, admin UI, database schema.
Output must include: file inventory, dependency map, phase ownership, risk list.
Not refactoring live behavior in this round."

`--hard --tdd` for refactoring code that's running / being dogfooded

hard + tdd

/ck:plan --hard --tdd "Refactor invoiceStatusService: move transition rules
to a separate file, keep exact behavior for retry, failed, refunded, voided."

`--deep --tdd` for a major refactor with behavior to preserve

deep + tdd

/ck:plan --deep --tdd "Extract billing module from the monolith.
Keep API contract for frontend, preserve retry/idempotency/refund/webhook behavior.
Touches: database schema, payment jobs, API routes, admin UI. Project uses
go test ./..., frontend uses npm test."

A good task description usually covers:

Scope — which files/modules/stack
Constraints — what must not break, what must stay compatible
Tooling — test command, compile command; critical for --tdd
Expected output — short but concrete

Vague input like "refactor billing" produces a vague plan. Without a concrete anchor, scout output tends to be generic.
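If the team writes these descriptions often, the checklist can be turned into a tiny helper that refuses to produce a task with missing anchors. This is a sketch under stated assumptions: `build_task` is a hypothetical function, and the field labels simply follow the general template above.

```python
def build_task(goal, scope, preserve, touches, tooling, known_bugs="none"):
    """Render a task description, or raise ValueError if any field is empty."""
    fields = {
        "Goal": goal,
        "Scope": scope,
        "Preserve": preserve,
        "Touches": touches,
        "Tooling": tooling,
        "Known bugs / out of scope": known_bugs,
    }
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"task description is missing: {', '.join(missing)}")
    lines = [goal.strip()]
    # Every field except the goal is rendered as a labeled line.
    lines += [f"{name}: {value.strip()}" for name, value in fields.items() if name != "Goal"]
    return "\n".join(lines)
```

The returned text is meant to be pasted as the quoted argument to /ck:plan. A bare "Refactor billing" with nothing else filled in raises instead of producing a vague task.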

Section 09

Best practices and pitfalls

Do this before you cook

Scope

Clean worktree before planning

If the working directory has uncommitted changes from another task, scout can mix up current code with in-progress work. Before a large plan: commit or stash. Better yet, create a dedicated worktree via /ck:worktree.

Review

Review the phase files before cooking

A --deep --tdd plan takes a while to run — don't start cook immediately without reading through plan.md and the phase files it generated.

Phase review checklist

Read plan.md and the phase files through these 5 questions before letting cook run:

File inventory: Any files you don't want the agent to touch?

Test scenario matrix: Any missed race conditions or edge cases like duplicate webhooks?

Phase order: Truly independent, or does migration have to precede the job?

Regression Gate: Right tool and right test subset?

Known bugs: Fix them, or preserve the old behavior?

Tooling

Declare the test command in the task

With --tdd, if the repo uses something non-standard, such as bun instead of npm, mise instead of asdf, or task instead of make, say so upfront.

tooling hint

Project uses go test ./..., frontend uses npm test, E2E uses Playwright.

Without a declaration, the agent guesses. A wrong guess weakens the Regression Gate.

Context

`/clear` between plan and cook

Plan context in a large task gets heavy fast: research output, scout data, red-team feedback. Recommended flow: finish plan -> /clear -> reopen -> /ck:cook {absolute-path}/plan.md --tdd. Cook reads everything from the plan file; the old planning context isn't needed.

Avoid

Cook flag

Forgetting `--tdd` on `/ck:cook`

Plan has --tdd, cook doesn't. Cook can still read the pre-refactor test section, but it won't enforce the order — tests may still get written, just after the code instead of before. Use the cook command suggested in the plan output.

Overkill

`--deep` for small tasks

Under roughly 5 files, it's usually not needed. --hard or --fast will be leaner.

Greenfield

`--tdd` for greenfield

New code has no existing behavior to capture. The "tests before refactor" section ends up nearly empty, which makes it easy for the agent to write perfunctory tests rather than do real TDD. For greenfield work, use --hard and let tests be written in the normal flow.

Blind spot

Trusting the phase file blindly

--deep scouts more thoroughly, but it can still miss implicit dependencies. Before cooking, ask: does a later phase silently rely on something the earlier phase hasn't created yet?
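That question can be asked mechanically if each phase is summarized by what it needs and what it creates. The sketch below is a hypothetical review aid, not something the plan produces: `find_unmet_needs` and the artifact names are invented for the example.

```python
def find_unmet_needs(phases, existing=frozenset()):
    """Walk phases in order. Report (phase, need) pairs nothing earlier has provided."""
    available = set(existing)
    problems = []
    for phase in phases:
        for need in sorted(phase["needs"] - available):
            problems.append((phase["name"], need))
        available |= phase["creates"]
    return problems


# Hypothetical plan where the service phase runs before the migration it relies on.
PHASES = [
    {"name": "service", "needs": {"status_history table"}, "creates": {"billing service"}},
    {"name": "migration", "needs": set(), "creates": {"status_history table"}},
    {"name": "admin-ui", "needs": {"billing service"}, "creates": set()},
]
```

Here the check flags the `service` phase, because the table it relies on is only created one phase later. Swapping the first two phases clears the report.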

Known bug

`--tdd` capturing tests for code that already has bugs

Old billing code had a latent bug — duplicate webhooks sometimes created 2 payment attempts. If the pre-refactor test captures that bug as "current behavior," an accidental fix during refactoring makes the gate fail, and the agent may reintroduce the old bug to make it pass. Fix: declare known bugs in the task description, or split into two plans: one to fix the bug, one to refactor.

Mode mix

Trying to combine `--deep` with `--parallel`

--deep and --parallel are both modes — they don't compose. If you need multiple agents in parallel, use --parallel --tdd and be very explicit about ownership and test scope. If ownership and test isolation aren't clear, drop --parallel and stick with --deep --tdd.

Section 10

In short

--deep --tdd isn't for every task.

It's for the subset of tasks that show two signals:

Scope is wide enough that a typical plan risks missing the map
Existing code has behavior that must survive the refactor

--deep makes planning slower, but the phase files come out sharper: file inventory, test gaps, dependency map, function/interface checklist.

--tdd makes cooking slower, but the refactor gets a regression gate: write tests first, refactor, then verify.

If you're in a PM/founder/product role, you mainly just need to know these two options exist. When the team is about to refactor a large module like billing, ask whether the plan includes per-phase scouting and a regression gate.

If you're a dev, try it on the next big task. The first time it may feel like extra overhead. But one pre-written test catching a hidden regression makes that overhead feel worth it.
