Estimating a data migration

Moving data between systems. The story where the migration runs in an hour and the cleanup runs for a quarter.

The migration itself is the easy part. The reconciliation step you forgot is where the points hide.

A data migration — moving data from one system to another, or from one shape to another inside the same system — looks like a transform. Read from the source, transform, write to the destination, switch traffic over. The team votes on the transform. The transform is usually the smallest piece of work in the story. The points are in the things the transform doesn't see: records that look fine but reference data that didn't make it, records that look broken but are actually correct in a way the spec didn't capture, edge cases that survived only because the old system tolerated them and the new one doesn't.

This is a different problem from a schema migration. A schema migration changes the shape of one store with the data already in it. A data migration moves data across systems, and the question isn't "does the ALTER finish in time?" — it's "what do we do about the rows that don't match either side?" Most of the work is reconciliation, cutover strategy, and rollback. The transform code is the part you write fastest and finish last.

What gets said in the room

Backend: "It's a script. Read, map, write. Two days max."

Data eng: "Have we looked at how dirty the source is?"

SRE: "What's the cutover — big bang or dual-write?"

Lead: "Who reconciles the rows that don't migrate cleanly?"

PM: "When are we sunsetting the old system?"

The PM's question is the one that should drive the estimate. If the old system goes away in a month, you need a strategy that hits 100% correctness; if it stays alongside the new one for two quarters, you can afford to leave the long tail for later. The estimate isn't of the transform — it's of the strategy. Whoever votes without knowing which one is in play is voting on a different story than everyone else.

Questions worth asking before voting

  • What's the source data quality — clean, dirty, or unknown?
  • Cutover strategy: big bang, dual-write, shadow read, gradual?
  • How long do both systems coexist? Is there a sunset date?
  • Who's reconciling the rows that don't migrate cleanly, and at what bar?
  • What's the rollback if the new system gets bad data after cutover?
  • Are there downstream consumers (reports, integrations) that need to be migrated in lockstep?

If the team votes a 5 and someone says "wait, what about the audit-log records?", you didn't have an estimation problem; you had two stories pretending to be one. Split it: the transform is one ticket, the reconciliation and cutover plan is another. The transform is sized the way the team thinks; the reconciliation is sized on whatever the data-quality spike turns up.

Vote on the strategy, not the transform. The transform is two days. The strategy is the quarter.

See estimating a database migration for the single-system schema-change variant; what is a spike for the data-quality investigation that should precede the estimate. Open a session once the cutover strategy is sketched.