Estimating a CI/CD overhaul

Pipeline work ships nothing to the user. The story stays honest when it is sliced by what can be deleted, not by what's new.

Pipeline work has no demo. Slice by "which build can be deleted today" to keep it estimable.

CI/CD overhauls are the worst-shaped stories on the backlog. There's no user-facing outcome, no demo, no screenshot anyone can put in the release notes. The work is structural: replace one pipeline with another, migrate a build system, consolidate three test runners into one, move from a hand-rolled shell script to a proper workflow file. The team knows the work needs to happen. The team also knows it'll take a quarter, and that quarter will be invisible from outside the engineering org.

The estimation failure mode is predictable. The team votes 13 because "it's a refactor, it's big." Three sprints in, the old pipeline is still running alongside the new one, two systems are partially configured, and every engineer has a different mental model of which one to use. The story isn't done because there's no definition of done — pipelines don't ship; they get adopted, and adoption is a separate question from "the new thing exists."

What gets said in the room

SRE: "We need to move off the old runner. Two sprints."

Lead: "What's the definition of done?"

SRE: "...the new one works."

Lead: "The new one already works for half our jobs."

Backend: "But the old one still runs for the other half."

Lead: "Right. So we're done when we can delete the old one."

That's the move. The story isn't "build the new pipeline"; the team probably built that months ago in someone's spare-time spike. The story is "delete the old pipeline." Until the old build system is gone, the overhaul isn't done; it's parallel infrastructure with all the cost and none of the benefit. Sizing the story as removal instead of creation gives the team a definition of done it can estimate against.

How to slice it

Inventory the jobs still running on the old system. For each job, ask: what would it take to delete the old version? That's a story. Some of those stories are trivial (the job was already mirrored to the new pipeline; just remove the old job definition). Some are real work (the test suite has a hardcoded path; the deploy step depends on an old env var). Each removal is a thin slice the team can ship and demo. "We deleted the Jenkinsfile for the auth service" is a demoable outcome.
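
The inventory can be kept as a small data structure. The following is a minimal sketch, not a prescribed tool: the OldJob type, its fields, the removal_stories helper, and the three job names are all hypothetical, chosen to mirror the examples above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OldJob:
    """One job still running on the old system (all fields are illustrative)."""
    name: str
    mirrored: bool  # already running on the new pipeline?
    blockers: List[str] = field(default_factory=list)  # what stops deletion today


def removal_stories(jobs: List[OldJob]) -> List[Tuple[str, str]]:
    """Return one (kind, title) story per job, trivial removals first."""
    stories = []
    for job in jobs:
        trivial = job.mirrored and not job.blockers
        kind = "trivial" if trivial else "real work"
        stories.append((kind, f"Delete the old {job.name} job"))
    # sorted() is stable, so inventory order is kept within each group.
    return sorted(stories, key=lambda s: s[0] != "trivial")


# Made-up inventory for illustration only.
inventory = [
    OldJob("integration-tests", mirrored=True, blockers=["hardcoded path"]),
    OldJob("auth-service", mirrored=True),
    OldJob("deploy", mirrored=False, blockers=["old env var"]),
]

for kind, title in removal_stories(inventory):
    print(f"{kind:>9}: {title}")
```

Each tuple is a candidate story title. The blockers list on a "real work" job is where the spike questions come from.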

Each individual removal is also estimable in the usual way: spike the unknowns first (does this job have any consumers we don't know about?), then estimate the removal against a reference removal you already shipped. Most of the slices end up at 2 or 3 points. The story that was "13, refactor, big" was really fifteen 2s and a 3 in a trench coat.
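
Taking the trench-coat line literally (it is a figure of speech, so treat these numbers as illustrative rather than measured), the arithmetic looks like this:

```python
original_vote = 13          # the single "refactor, big" estimate
slices = [2] * 15 + [3]     # "fifteen 2s and a 3"

total = sum(slices)
ratio = round(total / original_vote, 1)

print(f"{len(slices)} slices, {total} points, {ratio}x the original vote")
```

Under that reading, the work is sixteen stories totalling 33 points, roughly two and a half times the number the team voted on the unsliced story.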

Questions worth asking before voting

  • What gets deleted when this story is done?
  • What jobs still depend on the old system?
  • Is there a consumer of the old pipeline we don't know about (an external team, a scheduled job)?
  • What's the rollback if the new pipeline regresses after we delete the old one?
  • Who has the authority to declare the new pipeline the source of truth?
  • Is anyone tracking which jobs still run on the old vs new system, week to week?

If the answer to "what gets deleted" is "nothing this sprint," the team is still in the parallel-infra phase, and the story should be cut down to the next deletion it can actually ship. If the team votes a number on "migrate the pipeline" without naming what gets deleted, the story isn't ready: it fails the readiness gate.
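
The gate itself is simple enough to write down. This is a sketch under assumptions: stories are represented as plain dicts, and the "deletes" key and the passes_readiness_gate name are hypothetical, not part of any tracker's API.

```python
def passes_readiness_gate(story: dict) -> bool:
    """A story is ready to vote on only if it names something to delete."""
    deletes = story.get("deletes", [])
    return len(deletes) > 0


# Illustrative stories; the "deletes" field is an assumed convention.
vague = {"title": "Migrate the pipeline", "deletes": []}
sliced = {
    "title": "Delete the Jenkinsfile for the auth service",
    "deletes": ["auth-service Jenkinsfile"],
}

print(passes_readiness_gate(vague))   # False: nothing named for deletion
print(passes_readiness_gate(sliced))  # True: one concrete removal
```

The check is deliberately blunt. It does not judge whether the deletion is achievable, only whether the story names one.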

Pipeline work is done when the old thing is gone. Slice by deletion; estimate each removal on its own.

See "estimating a framework upgrade" for the related parallel-infrastructure pattern, and "horizontal vs vertical slicing" for why "build the new system" is the wrong slice axis. Open a session once the team has a list of removals.