Estimating a feature-flag rollout

The story that ships the flag and forgets the cleanup, the metrics, and the kill-switch.

A feature flag isn't a deploy mechanism; it's a small product. It needs a default, an override path, an audit trail, a kill-switch, and a removal plan. Most of those are missing from the original ticket, which usually reads "wrap it in a flag." That ticket is the gate, not the feature.
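To make the "small product" point concrete, here is a minimal sketch of what a flag carries beyond the `if` statement. Everything in it is hypothetical and for illustration only: the `FlagSpec` name, its fields, and the in-memory audit log are assumptions, not the API of any particular flag service.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone


@dataclass
class FlagSpec:
    """Hypothetical record of what a flag needs besides the wrapped call."""
    name: str
    default: bool      # value served when no override applies
    owner: str         # who is responsible for removing the flag
    remove_by: date    # removal plan: a date, not "later"
    killed: bool = False  # kill-switch: forces the default for everyone
    overrides: dict = field(default_factory=dict)  # user_id -> bool
    audit_log: list = field(default_factory=list)  # (when, actor, what)

    def _record(self, actor: str, what: str) -> None:
        # Audit trail: who changed the flag, when, and why.
        self.audit_log.append((datetime.now(timezone.utc), actor, what))

    def set_override(self, user_id: str, value: bool, actor: str, reason: str) -> None:
        self.overrides[user_id] = value
        self._record(actor, f"override {user_id}={value}: {reason}")

    def kill(self, actor: str, reason: str) -> None:
        self.killed = True
        self._record(actor, f"kill: {reason}")

    def value_for(self, user_id: str) -> bool:
        # The kill-switch wins over any override.
        if self.killed:
            return self.default
        return self.overrides.get(user_id, self.default)

    def is_overdue(self, today: date) -> bool:
        # True once the flag has outlived its removal date.
        return today > self.remove_by
```

Each field maps to a line item the "wrap it in a flag" ticket leaves out, which is a reasonable way to check whether the estimate covers them.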

The estimate has to budget for the rollout phases (1% → 10% → 50% → 100%), the metrics that decide whether to ramp or roll back at each phase, and the cleanup ticket that removes the flag once the feature is permanent. Teams that don't schedule the cleanup end up with a codebase full of dead flags within a year, and that is operational debt in its own right.
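The ramp-or-roll-back decision at each phase can be sketched as a small function. This is an illustration under assumptions: the gating metric (error rate compared with a baseline), the `max_ratio` threshold of 1.5, and the `next_step` name are all hypothetical; a real rollout would use whatever metric the team agreed on in the room.

```python
PHASES = [1, 10, 50, 100]  # percent of traffic, matching the ramp above


def next_step(current_pct, error_rate, baseline_error_rate, max_ratio=1.5):
    """Decide what to do after observing one phase.

    Returns ("rollback", 0), ("ramp", next_pct), or ("done", 100).
    The error-rate gate and max_ratio are assumed for illustration.
    """
    if current_pct not in PHASES:
        raise ValueError(f"unknown phase: {current_pct}%")
    # Gate: roll back if errors exceed the baseline by more than max_ratio.
    if error_rate > baseline_error_rate * max_ratio:
        return ("rollback", 0)
    idx = PHASES.index(current_pct)
    if idx == len(PHASES) - 1:
        # Fully ramped: this is the point where the cleanup ticket starts.
        return ("done", 100)
    return ("ramp", PHASES[idx + 1])
```

Even a sketch this small forces the SRE's question to be answered before the vote: the function cannot be written until someone names the metric and the threshold.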

What gets said in the room

Backend: "Wrapping the call in a flag — a few lines."

SRE: "What metric tells us the rollout is going badly?"

PM: "Which cohorts get the flag turned on first?"

Lead: "Who removes the flag once we're at 100%, and when?"

Backend: "Is there a kill-switch separate from the flag?"

Questions worth asking before voting

  • Rollout shape: linear ramp, cohort-based, or boolean cutover?
  • Default value if the flag service is down — old behavior or new?
  • Metrics that gate each phase — what counts as "going well"?
  • Kill-switch path independent of the flag system?
  • Removal ticket: created upfront, or "we'll do it later"?
  • Audit trail: who flipped the flag, when, why?
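Two of the questions above (the default when the flag service is down, and a kill-switch independent of the flag system) can be sketched together. This is a minimal illustration under assumptions: `fetch_percent` stands in for a call to whatever flag service is in use, the `KILL_<FLAG>` environment variable is a made-up convention, and hashing the user ID into 100 buckets is one common way to get a stable percentage ramp, not the only one.

```python
import hashlib
import os


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name, user_id, fetch_percent, default=False, env=os.environ):
    """Evaluate a percentage flag with a fail-safe default.

    fetch_percent is a hypothetical callable that asks the flag service
    for the current rollout percentage and may raise if it is unreachable.
    """
    # Kill-switch: read from the environment, so it still works when the
    # flag service itself is the thing that is broken.
    if env.get(f"KILL_{flag_name.upper()}") == "1":
        return False
    try:
        percent = fetch_percent(flag_name)
    except Exception:
        # Flag service down: fall back to the agreed default (old behavior here).
        return default
    # The same user always lands in the same bucket, so ramping
    # from 1% to 10% only adds users and never flips anyone back.
    return bucket(flag_name, user_id) < percent
```

Choosing `default=False` answers the second question in favor of the old behavior; a team that wants the new behavior to survive an outage would pass `default=True` and should say so in the ticket.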

See rate-limit rollouts for the same rollout-eats-the-work pattern. The fix is the same: split the implementation, the rollout, and the cleanup into separate stories.