Estimating a notifications system

The story that starts as one email and ends as the only thing the team builds for six weeks.

The first notification is straightforward: pick an event, render a template, send it. The second notification is also straightforward. The tenth notification is when the team realizes they've been writing a notifications system for two months without admitting it. Preferences, digests, deduplication, do-not-disturb windows, and per-channel overrides were not in the original ticket, but all of them become non-negotiable the moment a user gets paged at 3am.

Estimate the system, not the email. The team that votes on "send a Slack message when X happens" is underestimating by an order of magnitude, because they're sizing one row in a table that will eventually have thirty.

What gets said in the room

Backend: "Sending the email is a one-day ticket."

PM: "Can users turn it off?"

Backend: "Per-event or globally?"

Designer: "What does the preferences page look like?"

SRE: "What if the email service is down? Retry? Queue? Drop?"

Support: "How do we tell a user why they didn't get the email?"

Questions worth asking before voting

  • One channel today, or do we wire in email + push + Slack from the start?
  • User preferences: global toggle, per-event, or per-channel?
  • Deduplication and digests: needed now, or "later"?
  • Delivery guarantees: at-least-once, at-most-once, exactly-once?
  • Audit trail: can support tell a user why a notification didn't arrive?
  • Templating: who writes the copy, who localizes it, where does it live?
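
To make the hidden scope concrete, here is a minimal sketch of the decision that sits in front of every "just send it" call once preferences, do-not-disturb windows, deduplication, and an audit trail exist. Every name in it (Preferences, Decision, decide, the 22:00 to 07:00 quiet window, the ten-minute dedup window) is a hypothetical chosen for illustration, not a description of any particular system. Retries, queues, digests, and templating are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta
from enum import Enum


class Channel(Enum):
    EMAIL = "email"
    PUSH = "push"
    SLACK = "slack"


@dataclass
class Preferences:
    # All field names and defaults are hypothetical, for illustration only.
    enabled: bool = True                               # global toggle
    muted_events: set = field(default_factory=set)     # per-event opt-outs
    muted_channels: set = field(default_factory=set)   # per-channel opt-outs
    quiet_start: time = time(22, 0)                    # do-not-disturb window
    quiet_end: time = time(7, 0)


@dataclass
class Decision:
    send: bool
    reason: str  # kept so support can explain a missing notification


def in_quiet_hours(prefs, now):
    t = now.time()
    if prefs.quiet_start <= prefs.quiet_end:
        return prefs.quiet_start <= t < prefs.quiet_end
    # The window wraps past midnight (for example 22:00 to 07:00).
    return t >= prefs.quiet_start or t < prefs.quiet_end


def decide(prefs, event, channel, now, last_sent,
           dedup_window=timedelta(minutes=10)):
    """Return whether to send, plus a reason for the audit trail."""
    if not prefs.enabled:
        return Decision(False, "globally disabled")
    if event in prefs.muted_events:
        return Decision(False, "event muted")
    if channel in prefs.muted_channels:
        return Decision(False, "channel muted")
    if in_quiet_hours(prefs, now):
        return Decision(False, "do-not-disturb window")
    previous = last_sent.get((event, channel))
    if previous is not None and now - previous < dedup_window:
        return Decision(False, "duplicate within window")
    # Record the send time so later duplicates can be suppressed.
    last_sent[(event, channel)] = now
    return Decision(True, "sent")


prefs = Preferences(muted_channels={Channel.SLACK})
sent_log = {}
noon = datetime(2024, 1, 1, 12, 0)
first = decide(prefs, "build_failed", Channel.EMAIL, noon, sent_log)
repeat = decide(prefs, "build_failed", Channel.EMAIL,
                noon + timedelta(minutes=5), sent_log)
night = decide(prefs, "build_failed", Channel.PUSH,
               datetime(2024, 1, 2, 3, 0), sent_log)
```

Here first is sent, repeat is suppressed as a duplicate, and night is held back by the do-not-disturb window. Each branch in decide is a row the one-day ticket did not count, and each reason string is something support will eventually need to look up.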

Like payment integrations, the public surface lies about the work. The number on the card should be sized for "the table of thirty," not "the first row." Open a session when the table is sketched.