The SAFe approach to normalised story points makes a classic mistake that everyone seems to make with story points. It is not “relative sizing” to compare stories to a reference story that has been estimated in time (in this case “about a day”).
As soon as you introduce time as a basis for your reference story, and use e.g. story points with Fibonacci sequence, all of the comparisons you make are based on time, i.e. a 2 point story pertains to 2 days, 5 points to 5 days, etc.
Even if you are not doing this consciously you will do it unconsciously. So all you have done is estimated “how long” the stories will take to deliver. This is not relative sizing!
The whole point of using relative sizing instead of time-based estimation is that humans are better at comparing the size of things than we are about making absolute judgement of size, e.g. we’re good at being right that building A is bigger than building B, but we’re not so good at being right that building A is about 200 metres high and building B is 150 metres.
Unfortunately when it comes to tasks that we perform, our natural tendency is to use absolute terms because the “size” of a task essentially equates in our brains to “how long”. The fact that story points are numbers doesn’t help with this. Where story points completely lose their value is when we start deliberately equating a point value with a length of time.
True relative sizing of a backlog is to pick a low value story (one that you are unlikely to implement for some time) and do not estimate it at all. What you now do is compare other stories to that story, i.e. I think story C will take longer than story B, story D will take longer than story C, story D is about the same size as story C, etc. At no point do we actually predict how long something will take. We are simply saying which stories will take longer than others, by our estimation.
When a new story emerges you then do the same thing – decide if it will take longer than the reference story (which, because you have not yet implemented it, you will not be influenced by the actual time it took), less time or about the same.
You can now measure progress against the total backlog as you deliver the stories.
One thing I do agree with in the SAFe approach is that you should not do any re-calibration/estimation. As soon as you start re-estimating stories based on how long things are actually taking you are being influenced by time. This can not only throw off the relative calibration of the backlog but also ignores the inherent variability of software increments; i.e. there will be outliers within size groups that take significantly longer (or shorter) than the modal average.
P.S. If you’ve read my other #NoEstimates stuff on this blog you will know I do not advocate the use of story point estimations at all, especially due to the way they are typically misused and abused. However, there may be some potential value in doing relative size estimates (e.g. T-shirt sizes), if done right, for one or more teams working from the same initial product backlog in order to give some indication of the overall viability of the initiative and to provoke discussion within the team(s) about the value and possible approaches for undertaking individual pieces of work, aka “what shall we do next”.