Estimating a dependency upgrade
The long-tail bump. Patch and minor are noise; a major bump across N libraries is N spikes wearing one ticket. Estimate the worst one, not the average.
Dependency upgrades come in two shapes that look identical in the package-manager output. The first: a stack of patch and minor bumps the team should batch, run the test suite against, and merge as one boring PR. That story is a 1, sometimes a 2, and the only reason to vote on it is to confirm the team agrees it's boring. The second shape: a major bump across one or more libraries, where the changelog has a "breaking changes" section and the team has to read it. That's the story that hides in plain sight. The team votes a 3 because "it's just an upgrade" and ships at a 13 three sprints later.
This isn't a framework upgrade. A framework upgrade is one library you've planned around for a quarter; the team knows it's a project. A dependency upgrade is the quiet maintenance ticket someone files because the security scan flagged six libraries, and nobody on the team has read the changelogs. The trap is the distribution: of the six bumps, four are trivial, one is a renamed export, and one is a complete API redesign. An estimate that averages them is wrong in both directions: too high for the four trivial bumps and far too low for the redesign.
What gets said in the room
Engineer A: "It's six bumps. Two each, call it 13."
Lead: "Are they all minor?"
Engineer A: "Three minor, three major."
Engineer B: "Did anyone read the major changelogs?"
Engineer A: "...let me look."
Lead: "We don't have an estimate yet. We have three spikes."
The lead is right. There isn't one number for this story because there isn't one story. The minor bumps go together as their own small ticket; each major bump is its own spike until someone has read the changelog and knows what's breaking. After the spikes, you have three estimates and a decision about which to ship first. Before the spikes, you have an argument about whether 13 or 21 is the right number for "we don't know."
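The lead's split can be sketched in Python: classify each bump from its version numbers, batch the patch and minor bumps into one ticket, and give each major bump its own spike. This is a minimal sketch under stated assumptions: every library follows Semantic Versioning, versions are plain MAJOR.MINOR.PATCH strings with no pre-release or build tags, and a 0.x minor bump is treated as breaking (the npm caret-range convention). `Bump`, `bump_kind`, and `split_ticket` are hypothetical names, not part of any package manager's API.

```python
from collections import Counter
from typing import NamedTuple


class Bump(NamedTuple):
    name: str
    current: str  # installed version, e.g. "2.4.1"
    target: str  # version the upgrade moves to, e.g. "3.0.0"


def parse(version: str) -> tuple[int, int, int]:
    # Assumes plain MAJOR.MINOR.PATCH; pre-release and build tags are not handled.
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(bump: Bump) -> str:
    """Classify a bump as "major", "minor", or "patch"."""
    old, new = parse(bump.current), parse(bump.target)
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        # SemVer lets a 0.y.z version change anything at any time; this sketch
        # follows the npm caret-range convention of treating a 0.x minor bump
        # as breaking.
        return "major" if old[0] == 0 else "minor"
    return "patch"


def split_ticket(bumps: list[Bump]) -> tuple[list[Bump], list[Bump]]:
    """Return (one batched ticket of patch/minor bumps, one spike per major bump)."""
    batch = [b for b in bumps if bump_kind(b) != "major"]
    spikes = [b for b in bumps if bump_kind(b) == "major"]
    return batch, spikes


if __name__ == "__main__":
    # Hypothetical libraries and versions, for illustration only.
    bumps = [
        Bump("lib-a", "1.2.3", "1.2.9"),
        Bump("lib-b", "4.1.0", "4.3.0"),
        Bump("lib-c", "0.8.2", "0.9.0"),
        Bump("lib-d", "2.7.0", "3.0.0"),
    ]
    print(Counter(bump_kind(b) for b in bumps))
    batch, spikes = split_ticket(bumps)
    print("batch ticket:", [b.name for b in batch])
    print("spikes:", [b.name for b in spikes])
```

The classification only tells you which bumps need a spike. It says nothing about how large a major bump is; that still takes someone reading the changelog.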
Questions worth asking before voting
- How many bumps, and what's the major/minor/patch breakdown?
- Has anyone read the changelogs for the major bumps?
- Are any of these libraries the team's own, or are they all third-party?
- Is there a transitive-dependency conflict that forces them to happen together?
- What's the rollback if one of the upgrades breaks production?
- Does the security team have a deadline on this, or is it tech debt?
If the answer to "has anyone read the changelogs" is no, the estimate isn't real. Vote a spike for each major bump (a half-day each, capped), surface what's actually breaking, then re-estimate the implementation work against what you found. If the security team has a deadline, that constrains which order you tackle them in, not whether they get spiked first.
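One reading of that rule, sketched in Python: every major bump gets the same half-day spike whether or not it has a deadline, and the security deadline only decides the order, earliest first, with tech-debt bumps after it in the order they were listed. `MajorBump`, `spike_plan`, and the dates in the usage example are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SPIKE_DAYS = 0.5  # "a half-day each, capped": the time-box, not an estimate


@dataclass
class MajorBump:
    name: str
    security_deadline: Optional[date] = None  # None: tech debt, no deadline


def spike_plan(majors: list[MajorBump]) -> list[tuple[str, float]]:
    """Return (library, spike days) for every major bump, in the order to tackle them."""
    # Every major bump is spiked; the deadline changes only the order.
    # Earliest deadline first; bumps without a deadline go last. sorted() is
    # stable, so tech-debt bumps keep the order they were listed in.
    ordered = sorted(
        majors,
        key=lambda m: (m.security_deadline is None, m.security_deadline or date.max),
    )
    return [(m.name, SPIKE_DAYS) for m in ordered]


if __name__ == "__main__":
    # Hypothetical libraries and deadlines, for illustration only.
    plan = spike_plan([
        MajorBump("lib-d"),
        MajorBump("lib-e", security_deadline=date(2025, 3, 1)),
        MajorBump("lib-f", security_deadline=date(2025, 2, 1)),
    ])
    for name, days in plan:
        print(f"spike {name}: {days} day(s)")
    print("spike budget:", sum(days for _, days in plan), "days")
```

Keeping the time-box a constant rather than a per-library guess is the point: the spike buys the information the estimate needs, so it shouldn't itself depend on an estimate.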
Don't estimate dependency upgrades by counting bumps. Read the changelogs first, then estimate the breakage they describe.
See "Estimating a framework upgrade" for the single-library major-version variant, and "What is a spike" for the changelog-reading time-box that should precede this estimate. Open a session once the spikes have answered what's actually changing.