Extract the billing module from the monolith
Billing is spread across API routes, database models, background jobs, retry logic, invoice status, webhook handlers, and admin UI. The task sounds simple enough: "extract billing into its own module, keep the old behavior." But once the agent starts running, the real questions surface:
- Which files actually belong in scope: API, jobs, DB, UI, or webhooks?
- Which behaviors must stay identical: retry, idempotency, invoice status, payment failure?
- How much of the current behavior is actually under test?
- Does this phase have implicit dependencies on a migration or background job?
- If the new code builds but diverges on an edge case, who catches that?
For small tasks, /ck:plan --fast or plain /ck:plan is usually fine. For complex tasks that aren't a full major refactor, --hard is usually the better fit. For large tasks, missing context upfront creates cost at the end: cook has to guess scope again, phase files stay vague, tests get written after the code, and regressions only surface after the refactor is done.
Core idea
--deep helps the plan see the full affected surface: file inventory, test gaps, phase dependencies.
--tdd forces the flow to capture current behavior before refactoring, then uses those same tests as the regression gate.
Neither of these is a "stronger mode" you just toggle on to improve things. They address two specific risks: not knowing enough scope and not protecting existing behavior.
Mode vs flag, two different things
Before going further, one distinction worth clarifying. /ck:plan takes two kinds of arguments:
| Type | Example | Effect |
|---|---|---|
| Mode | --fast, --hard, --deep, --parallel, --two | Selects the pipeline. Only one mode per command. |
| Composable flag | --tdd, --no-tasks | An add-on. Can pair with any mode. |
The key point: --deep is a mode — it changes the planning pipeline. --tdd is a flag — it keeps whichever mode you chose but adds a tests-first structure to the phase file and cook flow.
argument-hint: "[task] [--fast|--hard|--deep|--parallel|--two] [--tdd|--no-tasks]"
The | between modes means pick one. --tdd sits in its own block, so it can compose freely. That makes --deep --tdd valid, while --deep --hard is not.
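The one-mode rule can be sketched as a small validator. This is an illustration of the rule described above, not the actual /ck:plan argument parser; the function name `parse_plan_args` and its return shape are assumptions.

```python
# Option names are taken from the argument-hint line above.
MODES = {"--fast", "--hard", "--deep", "--parallel", "--two"}
FLAGS = {"--tdd", "--no-tasks"}


def parse_plan_args(args):
    """Split arguments into (mode, flags, task), enforcing one mode at most."""
    modes = [a for a in args if a in MODES]
    if len(modes) > 1:
        raise ValueError(f"only one mode allowed, got: {' '.join(modes)}")
    unknown = [a for a in args if a.startswith("--") and a not in MODES | FLAGS]
    if unknown:
        raise ValueError(f"unknown option: {unknown[0]}")
    flags = sorted(a for a in args if a in FLAGS)
    task = " ".join(a for a in args if not a.startswith("--"))
    # No mode given corresponds to plain /ck:plan.
    return (modes[0] if modes else None), flags, task
```

Under this sketch, `["--deep", "--tdd", "Extract billing"]` parses cleanly, while `["--deep", "--hard", "x"]` raises a `ValueError`.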
--deep, when the plan needs a map per phase
What --deep is for
Large tasks tend to touch many areas of code at once. With the billing module, scope might span:
- API routes: create invoice, retry payment, refund, webhook callback
- Database: invoice table, payment attempt table, status history
- Background jobs: retry scheduler, reconciliation job, webhook replay
- Admin UI: invoice detail, manual retry, refund action
- Shared contracts: frontend API response, webhook payload, event names
Tasks like this need more than a list of steps. They need a map:
| Question | In the billing example |
|---|---|
| Which files get created/modified/deleted? | Route, service, job, migration, UI action |
| Which tests cover the current behavior? | Retry, idempotency, refund, webhook replay |
| Which phase depends on which? | Migration must precede service; service must precede UI |
| Which interfaces need test protection? | API response, webhook payload, job input |
| Which dependency edges are risky? | Job reads old status enum while service has already changed it |
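The "which phase depends on which" row is essentially a dependency graph. A minimal sketch, assuming hypothetical phase names and Python 3.9+ for `graphlib`, shows how such a map yields a safe execution order. The `jobs` edge is an assumption added for illustration; the table only states migration before service and service before UI.

```python
from graphlib import TopologicalSorter

# Hypothetical phases; each one maps to the phases it depends on.
phase_deps = {
    "migration": set(),
    "service": {"migration"},
    "jobs": {"service"},       # assumed edge, not stated in the table
    "admin-ui": {"service"},
}


def phase_order(deps):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = phase_order(phase_deps)
print(order)  # migration comes first, service before jobs and admin-ui
```

A cycle in the map (for example, service and jobs depending on each other) raises an error instead of producing an order, which is the kind of problem worth catching at plan time.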
--deep makes sense when the cost of a wrong plan outweighs the cost of a careful one.
Major refactor, 5+ areas, architectural debt -> use --deep
If you're only touching a few files, --fast or plain /ck:plan is usually enough. If scope is complex but the architecture is still clear, --hard tends to fit better. --deep pays off when one bad plan means hours of reworking phases, tests, and ownership.
The --deep pipeline
Starts from broad context, then drills into per-phase detail.
(Pipeline diagram not reproduced here. Its surviving fragments mention the --no-tasks check and the final handoff command, /ck:cook {path}/plan.md.)
Compared to --hard, which scouts one pass for the whole plan, --deep adds an extra pass per phase. Higher cost, but the trade-off is a more concrete phase file: which files get touched, which tests are missing, which dependencies are risky.
What --deep adds to a phase file
- File inventory table — action (create/modify/delete), rough size, test impact
- Test scenario matrix — critical/high/medium paths
- Dependency map — which other phases this one links to
- Function/interface checklist — which functions or interfaces need test protection
That's the main difference. A phase shouldn't just say "refactor billing service." It should say exactly which files, which tests, which edge cases, and which phase it depends on.
--tdd, when you're worried a refactor will break what's running
What --tdd is for
Cook's default flow is familiar: read the plan, implement each task, type-check after each file, run a testing step at the end. Fine for greenfield. Not safe enough for refactoring running code.
In the billing example, existing behavior might include:
- Retry runs at most 3 times, then the invoice moves to failed
- A webhook with the same event_id can only be applied once
- Refund cannot run if the invoice is already voided
- Admin UI must still show the correct status history after the service is split out
Happy paths may still pass. Bugs usually live in edge cases. --tdd creates a regression gate before any code changes. Here "TDD" doesn't mean writing tests for new features — it means capturing current behavior before the refactor starts.
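As a rough illustration of "capturing current behavior", the sketch below pins two of the rules listed above. The `legacy_*` functions are hypothetical stand-ins for existing billing code, and the sketch assumes "at most 3 times" means three charge attempts in total.

```python
MAX_ATTEMPTS = 3  # assumption: three charge attempts in total


def legacy_retry_payment(invoice, charge):
    """Stand-in for existing code: try the charge, give up after MAX_ATTEMPTS."""
    for _ in range(MAX_ATTEMPTS):
        if charge(invoice):
            invoice["status"] = "paid"
            return invoice
    invoice["status"] = "failed"
    return invoice


def legacy_apply_webhook(seen_event_ids, event):
    """Stand-in for existing code: apply each event_id at most once."""
    if event["event_id"] in seen_event_ids:
        return False
    seen_event_ids.add(event["event_id"])
    return True


def test_retry_stops_after_three_attempts():
    attempts = []

    def always_declined(invoice):
        attempts.append(invoice["id"])
        return False

    invoice = legacy_retry_payment({"id": "inv_1", "status": "open"}, always_declined)
    assert len(attempts) == 3
    assert invoice["status"] == "failed"


def test_duplicate_webhook_is_applied_once():
    seen = set()
    event = {"event_id": "evt_1"}
    assert legacy_apply_webhook(seen, event) is True
    assert legacy_apply_webhook(seen, event) is False
```

Tests like these describe what the code does today, not what it should do. After the refactor, the same tests run unchanged against the new module.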
When --tdd is worth it
| Situation | Worth it? | With billing |
|---|---|---|
| Completely new greenfield code | Usually not | No existing invoice/retry behavior to capture |
| Refactoring a module users depend on | Yes | Must preserve retry, idempotency, status transitions |
| Swapping payment providers, keeping old contract | Yes | Easy to drift on API response or webhook behavior |
| Fixing 1–2 lines with a clear root cause | Usually not | e.g., fixing a typo label in admin UI |
| Throwaway prototype | No | Test gate overhead isn't justified |
--tdd is most valuable in modules with async patterns, stateful workflows, database transactions, or public API contracts. Billing hits all of them: background jobs, status transitions, transaction boundaries, webhook payloads, admin actions.
At plan time
| Phase file section | Role |
|---|---|
| Tests before refactor | Write regression coverage before changing code, to pin down current behavior. |
| Refactor | Describes the code being changed, protected by the tests above. |
| Additional tests | Adds tests for any new behavior introduced in this phase, if any. |
| End-of-phase gate | A concrete compile + test command that must pass after the refactor. |
A --tdd phase usually reads in this order:
1. Write tests covering current behavior
2. Isolate a small dependency point if the old code is hard to test
3. Refactor the code
4. Re-run compile + tests
The "make old code testable" step is easy to skip. Sometimes it can't be tested directly: a long function that calls the database internally, with no injectable dependencies and no isolated output. In that case, extract a small dependency seam first — keeping the behavior identical — just enough to make the code testable. Then do the real refactor.
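A dependency seam of this kind might look like the sketch below. The table layout, function names, and the use of `sqlite3` are all assumptions for illustration; dates are ISO strings so they compare correctly as text.

```python
import sqlite3

QUERY = "SELECT id, due_date, status FROM invoices"


def overdue_invoice_ids_legacy(db_path, today):
    """Before: opens its own connection, so a test needs a real database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(QUERY).fetchall()
    finally:
        conn.close()
    return [r[0] for r in rows if r[2] == "open" and r[1] < today]


def select_overdue(rows, today):
    """Seam: the same filter as above, now a pure function over fetched rows."""
    return [r[0] for r in rows if r[2] == "open" and r[1] < today]


def overdue_invoice_ids(db_path, today, fetch_rows=None):
    """After: same default behavior, but the row source can be injected."""
    if fetch_rows is None:
        def fetch_rows():
            conn = sqlite3.connect(db_path)
            try:
                return conn.execute(QUERY).fetchall()
            finally:
                conn.close()
    return select_overdue(fetch_rows(), today)
```

With the seam in place, a test can pass `fetch_rows=lambda: rows` and pin the filtering rule without touching a database. The filter logic itself is copied unchanged, which is the point of this step.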
When cook runs
The short version: tests written first are the baseline for old behavior. If that baseline fails after refactoring, cook should stop and fix the behavioral drift before moving on.
Version reference: This post is cross-referenced against Engineer Kit engineer@v2.19.1-beta.10 at time of writing, verified via ck -V. This note is here only to flag potential drift if the ck:plan or ck:cook workflow changes later.
Why --deep --tdd often ship together
--deep and --tdd address two different layers of risk.
If the task carries both risks, enable both options.
--deep gives the map. --tdd gives the verification layer. Major refactors typically need both.
| Risk | What you need | Option |
|---|---|---|
| Unknown scope | Per-phase scout, file inventory, dependency map | --deep |
| Refactor drifts existing behavior | Tests before refactor, Regression Gate at cook | --tdd |
A major refactor usually carries both risks. For example:
Extract billing module from the monolith.
Keep current API contract for the frontend.
Preserve existing behavior for retry, idempotency, invoice status.
Touches database schema, background jobs, API routes, admin UI.
This billing task needs --deep because scope is wide and dependencies are many. It also needs --tdd because retry, idempotency, refund, and webhooks are all existing behaviors that must survive.
Running only --deep might produce a very clear plan, but the refactor still has no regression gate. Running only --tdd might get the test flow right, but the phase file stays vague: what to test, where, which modules are affected, which dependencies need to go first.
Quick recall: Wide task? Use --deep. Old behavior must survive? Add --tdd. Both risks? Enable both.
When to run /ck:brainstorm or /ck:scout first
A common mistake: reaching for --deep --tdd when the approach is still unclear. Plan mode doesn't resolve ambiguity. It turns a chosen approach into more concrete steps — nothing more.
For example, if you haven't decided:
- Split into a true separate service, or just modularize inside the monolith?
- Introduce a new queue, or keep the current background job?
- Keep the existing invoice status enum, or move to a clearer state machine?
- Rewrite the admin UI flow, or refactor it incrementally?
In that case, run /ck:brainstorm first to settle on an approach, then plan.
Another layer is /ck:scout. Think of it as a context-gathering step, not a decision-making one. Scout is useful when you're not sure where the code lives, which modules are involved, or which tests are covering the current behavior.
When the brief is still fuzzy, don't jump straight to plan. Reach for --deep --tdd once the approach is clear.
/ck:scout "billing retry flow, invoice status, background jobs, admin UI"
/ck:brainstorm "Extract billing module while keeping the current API contract"
/ck:plan --deep --tdd "Refactor billing module per agreed approach..."
Once you know the relevant code area and the key trade-offs, you can skip the separate /ck:scout call — /ck:plan --deep has a scout step in its pipeline anyway. But when the brief is too vague, scouting first helps avoid a silent failure mode: the agent's reasoning sounds plausible, but it's working from the wrong map of the project.
| Situation | Start with |
|---|---|
| Unknown project context / don't know where the code lives | /ck:scout -> /ck:brainstorm |
| Know clearly what and how | /ck:plan directly |
| Know what, unsure how | /ck:brainstorm -> /ck:plan |
| Not yet sure whether to do it at all | /ck:brainstorm -> decide -> plan or drop |
Decision matrix
| Situation | Command |
|---|---|
| Unknown project context | /ck:scout first, then brainstorm/plan |
| Approach not settled | /ck:brainstorm first |
| Small fix, 1–2 files | /ck:plan --fast |
| Medium new feature | /ck:plan auto |
| Complex feature, unfamiliar domain | /ck:plan --hard |
| 3+ independent modules, run in parallel | /ck:plan --parallel |
| Torn between 2 concrete approaches | /ck:plan --two |
| Refactor across 5+ areas, architectural debt | /ck:plan --deep |
| Refactoring live/dogfood code, regression risk | Any mode + --tdd |
| Major refactor on a codebase with behavior to preserve | /ck:plan --deep --tdd |
The --deep --tdd pair isn't one mechanism — it's two separate things that travel well together. --deep gives the map. --tdd gives the verification layer. Planning cost goes up, but cook has less scope to guess and a clearer regression gate to work against.
Inside: what's actually different
This section goes deeper into how --deep and --tdd each affect plan time, cook time, and running cost.
7.1 --deep costs more because of researcher + per-phase scout
| Mode | Researcher | Red Team | Validation | Per-phase scout |
|---|---|---|---|---|
| --fast | 0 | 0 | 0 | No |
| --hard | 2 | Yes | Optional | No |
| --deep | 2-3 | Yes | Yes | Yes |
| --parallel | 2 | Yes | Optional | No |
| --two | 2+ | After select | After select | No |
Most of --deep's cost comes from 2–3 researchers for high-level architecture analysis, the red-team review, the validation step, and a re-read pass for each phase.
Put simply: each phase gets its own inspection before the plan is finalized — which files it touches, which phases it depends on, which tests are missing, which edge cases are easy to miss.
One thing worth clarifying: "scout" here refers to the act of reading and rechecking per phase, not a guarantee that a separate agent is always spawned at runtime. The point is that --deep forces the plan through many smaller inspection passes rather than a single high-level look at scope.
7.2 --tdd doesn't spin up a new agent
Easy to misread: --tdd doesn't change the selected mode. It's an additive flag — you still run --fast, --hard, or --deep as normal, with a tests-first structure layered on top.
The flag changes exactly 2 things:
- The phase file at plan time. A phase normally covers overview, requirements, architecture, related files, implementation steps, success criteria, and risk assessment. With --tdd on, the phase gains a pre-refactor test section, a post-refactor test section, and an end-of-phase gate.
- Execution order at cook time. Write behavior-protection tests first, then refactor, then re-run the compile/test gate.
The cost of --tdd is much lower than --deep. It mainly adds structure to the phase file and reorders execution at cook time.
7.3 Regression Gate needs a concrete command
The cook spec requires the Regression Gate to be a concrete compile/test command. For a Go project, for example:
Regression Gate: go test ./... && go vet ./...
If the phase file vaguely says "run tests to verify," the verify step weakens because cook has no exact command to anchor on. The --tdd flag doesn't automatically guess the right tooling for every repo. If the project uses a non-standard setup, declare it explicitly in the task description or plan file.
Project uses go test ./..., frontend uses npm test, E2E uses Playwright.
When the task names the tooling, the plan has something concrete to write into the Regression Gate for each phase.
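To make "concrete command" tangible, here is a minimal sketch of what anchoring on an exact gate could look like. It is not how cook runs the gate internally; `run_regression_gate` is a hypothetical helper.

```python
import subprocess


def run_regression_gate(command, cwd=None):
    """Run the gate command through the shell; True only if it exits with 0."""
    if not command.strip():
        raise ValueError("Regression Gate must be a concrete command")
    # shell=True so compound gates such as `a && b` run exactly as written.
    # Acceptable here only because the command comes from a reviewed plan file.
    result = subprocess.run(command, shell=True, cwd=cwd)
    return result.returncode == 0
```

For the Go example above, the call would be `run_regression_gate("go test ./... && go vet ./...")`. A phase file that only says "run tests to verify" gives a helper like this nothing to execute.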
How to run it, with a general template
The template below follows the source skill of ck:plan and ck:cook: plan receives the task + mode/flag, cook receives the plan path and carries --tdd forward if the plan enabled it.
General template
/ck:plan [mode] [--tdd] "[What needs to happen].
Scope: [modules/files/stack being touched].
Preserve: [API contract, existing behavior, compatibility].
Touches: [database, jobs, routes, UI, shared types].
Tooling: [concrete compile/test command].
Known bugs / out of scope: [if any]."
Note: if the plan uses --tdd, pass it to cook as well:
/ck:cook /absolute/path/to/plan.md --tdd
Applying the template to a few common situations:
--deep for a planning/inventory phase
/ck:plan --deep "Build an inventory plan to prepare extracting billing into its own package.
Scope: API routes, invoice service, payment jobs, admin UI, database schema.
Output must include: file inventory, dependency map, phase ownership, risk list.
Not refactoring live behavior in this round."
--hard --tdd for refactoring code that's running / being dogfooded
/ck:plan --hard --tdd "Refactor invoiceStatusService: move transition rules
to a separate file, keep exact behavior for retry, failed, refunded, voided."
--deep --tdd for a major refactor with behavior to preserve
/ck:plan --deep --tdd "Extract billing module from the monolith.
Keep API contract for frontend, preserve retry/idempotency/refund/webhook behavior.
Touches: database schema, payment jobs, API routes, admin UI. Project uses
go test ./..., frontend uses npm test."
A good task description usually covers:
- Scope: which files/modules/stack
- Constraints: what must not break, what must stay compatible
- Tooling: test command, compile command; critical for --tdd
- Expected output: short but concrete
Vague input like "refactor billing" produces a vague plan. Without a concrete anchor, scout output tends to be generic.
Best practices and pitfalls
Do this before you cook
Clean worktree before planning
If the working directory has uncommitted changes from another task, scout can mix up current code with in-progress work. Before a large plan: commit or stash. Better yet, create a dedicated worktree via /ck:worktree.
Review the phase files before cooking
A --deep --tdd plan takes a while to run — don't start cook immediately without reading through plan.md and the phase files it generated.
Phase review checklist
Read plan.md and the phase files through these 5 questions before letting cook run:
Declare the test command in the task
With --tdd, does the repo use something non-standard like bun instead of npm, mise instead of asdf, or task instead of make? Say so upfront.
Project uses go test ./..., frontend uses npm test, E2E uses Playwright.
Without a declaration, the agent guesses. A wrong guess weakens the Regression Gate.
/clear between plan and cook
Plan context in a large task gets heavy fast: research output, scout data, red-team feedback. Recommended flow: finish plan -> /clear -> reopen -> /ck:cook {absolute-path}/plan.md --tdd. Cook reads everything from the plan file; the old planning context isn't needed.
Avoid
Forgetting --tdd on /ck:cook
Plan has --tdd, cook doesn't. Cook can still read the pre-refactor test section, but it won't enforce the order — tests may still get written, just after the code instead of before. Use the cook command suggested in the plan output.
--deep for small tasks
Under roughly 5 files, it's usually not needed. --hard or --fast will be leaner.
--tdd for greenfield
New code has no existing behavior to capture. The "tests before refactor" section ends up nearly empty, making it easy for the agent to write perfunctory tests rather than practice real TDD. For greenfield work, use --hard and let tests be written in the normal flow.
Trusting the phase file blindly
--deep scouts more thoroughly, but it can still miss implicit dependencies. Before cooking, ask: does a later phase silently rely on something the earlier phase hasn't created yet?
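One way to check for this by hand is to walk the file inventories in phase order. The sketch below assumes a simplified inventory of (action, path) pairs per phase; the data shape, phase names, and paths are hypothetical, not the format the plan actually emits.

```python
def find_missing_prerequisites(phases, existing_files):
    """Return (phase, path) pairs where a phase modifies or deletes a file that
    neither exists in the repo nor is created by an earlier phase."""
    available = set(existing_files)
    problems = []
    for phase in phases:  # phases are listed in planned execution order
        for action, path in phase["files"]:
            if action in ("modify", "delete") and path not in available:
                problems.append((phase["name"], path))
        # Apply this phase's effects so later phases see them.
        for action, path in phase["files"]:
            if action == "create":
                available.add(path)
            elif action == "delete":
                available.discard(path)
    return problems


plan = [
    {"name": "phase-01-service", "files": [("modify", "billing/repo.py")]},
    {"name": "phase-02-repo", "files": [("create", "billing/repo.py")]},
]
print(find_missing_prerequisites(plan, {"app/routes.py"}))
```

Here the first phase edits a file that only the second phase creates, so the pair is reported. Swapping the two phases clears the report. This catches only file-level ordering problems; implicit dependencies on data, configuration, or running jobs still need a human read.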
--tdd capturing tests for code that already has bugs
Suppose the old billing code has a latent bug: duplicate webhooks sometimes create 2 payment attempts. If the pre-refactor test captures that bug as "current behavior," an accidental fix during refactoring makes the gate fail, and the agent may reintroduce the old bug to make it pass. Fix: declare known bugs in the task description, or split into two plans: one to fix the bug, one to refactor.
Trying to combine --deep with --parallel
--deep and --parallel are both modes — they don't compose. If you need multiple agents in parallel, use --parallel --tdd and be very explicit about ownership and test scope. If ownership and test isolation aren't clear, drop --parallel and stick with --deep --tdd.
In short
--deep --tdd isn't for every task.
It's for the subset of tasks that show two signals:
- Scope is wide enough that a typical plan risks missing the map
- Existing code has behavior that must survive the refactor
--deep makes planning slower, but the phase files come out sharper: file inventory, test gaps, dependency map, function/interface checklist.
--tdd makes cooking slower, but the refactor gets a regression gate: write tests first, refactor, then verify.
If you're in a PM/founder/product role, you mainly just need to know these two options exist. When the team is about to refactor a large module like billing, ask whether the plan includes per-phase scouting and a regression gate.
If you're a dev, try it on the next big task. The first time it may feel like extra overhead. But one pre-written test catching a hidden regression makes that overhead feel worth it.