CobrastanJorji
3 hours ago
Many years ago, I worked at Amazon, and it was at the time quite fond of the "five whys" approach to root cause analysis: say what happened, ask why that happened, ask why that in turn happened, and keep going until you get to some very fundamental problem.
I was asked to write up such a document for an incident where our team had written a new feature which, upon launch, did absolutely nothing. Our team had accidentally mistyped a flag name on the last day before we handed it to a test team, the test team examined the (nonfunctional) tool for a few weeks and blessed it, and then upon turning it on, it failed to do anything. My five whys document was mostly about "what part of our process led to a multiweek test effort that would greenlight a tool that does nothing that it is required to do."
I recall my manager handing the doc back to me and saying that I needed to completely redo it because it was unacceptable for us to blame another team for our team's bug, which is how I learned that you can make a five whys process blame any team you find convenient by choosing the question. I quit not too long after that.
BeetleB
2 hours ago
My litmus test for these types of processes: If root causes like "Inflexible with timelines", or "Incentives are misaligned (e.g. prioritizing career over quality)" are not permitted, the whole process is a waste of time.
Edit: You can see others commenting on precisely this. Examples:
https://news.ycombinator.com/item?id=45573027
https://news.ycombinator.com/item?id=45573101
grogers
2 hours ago
Usually another team's failure is covered by their own independent report. That simplifies creating the report since you don't need to collaborate closely, but also prevents shifting the blame onto anyone else (because really, both teams had failures they should have caught independently). E.g. as the last why:
Why did the testing team not catch that the feature was not functional?
This is covered by LINK
CobrastanJorji
an hour ago
If a root cause analysis is not cross team, how deep can the analysis possibly be? "Whoops, that question leads to this other process that our team doesn't directly control, guess we stop thinking about that!"
taeric
an hour ago
If your root cause is cross team, then you wind up having to make some implicit assumptions about what the other team could have done. It's akin to ending with "because the gods got angry." Not really actionable.
This is a classic "limit the scope of the feature." You want the document to be written and constrained to someone that is in a position to impact everything they talk about. If you think there was something more holistic, push for that, as well.
Note you can discuss what other teams are doing. But do that in a way that is strictly factual. And then ask why that led your team to the failure that your team owns.
lijok
an hour ago
Pretty deep. It forces you to account for failures in other domains
hshdhdhehd
2 hours ago
Interesting one.
My first thought is: why is rolling out a new system to prod that is not used yet an incident? I don't think "being in prod" is sufficient. There should be tiers of service, and a brand new service should not be on a tier where it having teething issues is an incident.
> what part of our process led to a multiweek test effort that would greenlight a tool that does nothing that it is required to do
Would be interested to see the doc, but I imagine you'd branch off the causes. One branch of the tree is: UAT didn't pick up the bug. Why didn't UAT pick up the bug? .... (you'd need that team's help).
I think that team would have something that is a contributing cause. You shouldn't rely on UAT to pick up a bug in a released product. However, just because it is not a root cause doesn't mean it shouldn't be addressed. Today's contributing cause can be tomorrow's root cause!
So yeah, you don't blame another team, but you also don't shield another team from one of their systems needing attention! The wording matters a lot though.
The way you worded the question seems a little loaded. But you may be paraphrasing? 5 whys are usually more like "Why did the papaya team not detect the bug before deployment?"
Whereas
> what part of our process led to a multiweek test effort that would greenlight a tool that does nothing that it is required to do
Is more emotive. It sounds like a cross-examiner's question, which isn't the vibe you'd want to go for. 5 whys should be 5 drys. Nothing spicy!
NikolaNovak
2 hours ago
That's how we do it - there are "branches" to most of our RCAs, and in fact, we have separate sections for root cause analysis (things which directly or indirectly contributed to the incident, which are a branched / fractal 5 whys) and lessons learned (things which did not necessarily contribute to the incident but which upon reflection we can do better - frequently incident management or communication or reporting or escalation etc).
It took a while for all the teams to embrace the RCA process without fear and finger pointing, but now that it's trusted and accepted, the problem management stream / RCA process is probably the healthiest / best viewed of our streams and processes :-)
CobrastanJorji
2 hours ago
It was an incident because it was important to leadership. It was a marketing targeting feature that was advertised to the local executive with some excitement by the management, so they were excited to share the results of it, and when there weren't results on the anticipated launch date, they wanted answers, which meant the manager treated it as an incident.
sanman8119
2 hours ago
A very relatable experience. There's a lot of pressure to stop the Whys at the dev team and not question larger leadership or organizational moves.
nobrains
an hour ago
The way I handle this with my teams: any bugs caught by the QA team go against the developers. Any bugs caught after QA green-lights the go-live go against the QA team. (Of course, discounting any bugs that are deemed acceptable for go-live by the PM.)
tayo42
2 hours ago
5 whys can be very political. You can make it take whatever direction you want to tell whatever story you want. I don't get why it's cargo culted the way it is.
stonemetal12
36 minutes ago
No, people can be very political. It doesn't matter what the process is.
Hell, people even tried to legislate the value of pi that one time.
numpad0
18 minutes ago
While that might be true, the five whys is notorious for slipping into a destructive "you/I suck, and firing you/me solves the problem for good, and I believe it makes everyone absolutely happy" style of false conclusion.
Reportedly Toyota has organizational mitigations for that problem, or reportedly the working culture there isn't so great after all. The bottom line is, it's a double-edged sword, to say the very least.
vivalahn
an hour ago
The next org you went to, did they also use the Five Whys or did they get by with Four True Colors instead?