nik282000
4 hours ago
I work at a plant with a site-wide SCADA/HMI (Siemens WinCC) system, and every alarm is displayed on every HMI regardless of its proximity to the machine or even its operator's ability to address the issue. At any given minute a hundred or more alarms can be generated, the majority being nuisance messages like "air pressure almost low" or my favorite, " " (no message set), but scattered among those is the occasional "no cooling water - explosion risk".
This plant is operated and designed to the spec of an international corp with more than 20 factories; it's not a mom-and-pop operation. No one seems to think the excessive, useless alarms are an issue, and the attitude is that any damage caused by missed warnings is the fault of the operator. When approaching management and engineering about this, the responses range from "it's not in the budget" to "you're maintenance, fix all the problems and the alarms will go away".
The only way for this kind of issue to be resolved is with regulation and safety standards. An operator can't safely operate equipment when alarms are not filtered or sorted in some way. It's like forcing your IT guy to watch web server access logs live to spot vulnerabilities being exploited.
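For illustration, here's a minimal sketch of the kind of per-station filtering I mean, in Python - the priority levels, area field, and function names are all invented, not WinCC's actual alarm model:

    # Hypothetical alarm records; a real WinCC alarm has a different schema.
    from dataclasses import dataclass

    @dataclass
    class Alarm:
        priority: int   # 1 = critical ... 4 = nuisance
        area: str       # plant area that raised it
        message: str

    def visible_alarms(alarms, station_area, max_priority=2):
        """Show only what this station's operator can act on: critical
        alarms from anywhere, plus lower-priority alarms from the
        station's own area. Everything else goes to a log, not the screen."""
        keep = [a for a in alarms
                if a.priority == 1
                or (a.area == station_area and a.priority <= max_priority)]
        return sorted(keep, key=lambda a: a.priority)

Even something this crude would keep most of the nuisance messages off the screens of operators who can't do anything about them.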
terminalshort
3 hours ago
This is a fundamental organizational and societal problem. An engineer would look at the situation and think "what is the best way to get the failure rate below a tolerable limit?" But a lawyer looks at the situation and thinks "how do I minimize liability and bad PR?" and a bureaucrat thinks "how can I be sure the blame doesn't land on me when something goes wrong?" And the answer to both of those questions is to throw an alarm on absolutely everything. So if there is a problem they can always say "our system detected the anomaly in advance and threw an alarm." Overall the system will be less safe and more expensive, but the lawyer's and bureaucrat's problems are solved. Our society is run by lawyers and bureaucrats, so their approach will win out over the engineer's. (And China's society is run by engineers, so it will win out over ours.)
gopher_space
an hour ago
Up to a certain point society is run by actuaries. Finding someone at your insurance company who both understands the problem with excess errors and appreciates how easily enumerable they are would be an interesting "whistleblowing" target.
renewiltord
an hour ago
Is it though? An engineer can optimize on a different manifold, and a company can succeed or fail for different reasons. Getting destroyed in a lawsuit because you didn't place an alarm is small comfort when you did better engineering.
After all, read the post-mortem comments on any HN thread. Many of those people can be hired as expert witnesses if you like. They will say "I would have put an alert on it and had testing." You will lose the case.
"Oh, but we were trying to keep the error rate low." Yes, but now your company is dead while the high-error-rate company is alive.
In revealed preferences, most engineers prefer vendors who CYA. This is obvious from online comments. It's not because they are engineers; it's because most people want to believe that the event was a freak accident.
Building a system around an error budget is not actually easy, even for engineers who say they want one, because when an error happens they immediately say it should not have happened. The counterfactual errors that were blocked, and the business continuing to exist, are not considered. Every engineer is a genius in hindsight. Every person is a genius in hindsight.
Why do these geniuses never build a failure-proof company? They do not. Who would not pay the same price for 100% reliable tech?
terminalshort
an hour ago
> Getting destroyed in a lawsuit because you didn't place an alarm is small comfort when you did better engineering.
Indeed it is. That's why I said it's a larger societal problem in how we manage risk and react to failures.
> Why do these geniuses never build a failure-proof company?
Because this is mostly a matter of unknown unknowns and predicting the future: even a founder who makes zero mistakes is more likely than not to fail.
pstuart
an hour ago
> This is a fundamental organizational and societal problem
Absolutely, and we'd collectively be better served if we had tools to deal with it.
I think of it as "incentive ecology" -- as noted, everybody has their own incentives, which shape their behavior, which causes downstream issues that begin the process anew.
Obviously there's no simple one-shot solution to this, but what if we had ways to simplify and model this "web of responsibility" (some sort of game theory exposed as an easily consumed presentation, with computed outcomes that show the cost/ROI/risk/reward) that could be shared by all stakeholders?
Obscurity and deniability are the weapons wielded in most of these scenarios, so what if we could render them obsolete?
Sure, those in power would not want to yield their advantages, but minimizing risks and maximizing rewards for the enterprise should leave everybody better off.
Yes, I'm looking at this as an engineer and a dreamer, but if such a tool existed and were open source and easily accessible, rogue participants could do the work and put it out there so it's undeniable.
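As a toy version of that "computed outcomes" idea (the actors, probabilities, and costs below are all invented for illustration, not from any real model):

    # Toy "incentive ecology": each actor weighs the same failure differently,
    # so the move that is cheapest for them is not cheapest for the enterprise.
    actors = {
        # name: (cost to them if blamed,
        #        perceived chance of blame with alarm-on-everything,
        #        perceived chance of blame with curated alarms)
        "engineer":   (1_000_000, 0.20, 0.05),
        "lawyer":     (5_000_000, 0.01, 0.10),
        "bureaucrat": (  500_000, 0.02, 0.15),
    }

    for name, (cost, p_spam, p_curated) in actors.items():
        spam, curated = cost * p_spam, cost * p_curated
        choice = "alarm on everything" if spam < curated else "curated alarms"
        print(f"{name:10} spam={spam:>9,.0f}  curated={curated:>9,.0f}  -> {choice}")

Making those numbers visible to every stakeholder is exactly the kind of thing that would strip out the obscurity and deniability.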
mmooss
2 hours ago
The first step in problem solving is to look in the mirror. It's not surprising that in an engineering community, the instinct is to blame outsiders - lawyers, bureaucrats, managers, finance, etc. - because those priorities are more likely to conflict with engineering, because it is harder to understand such different perspectives, and because it is easier to believe caricatures of people we don't know personally.
Those people have valuable input on issues the engineer may not understand and have little experience with. And engineers are just as likely to take the easy way out, like the caricature in the parent comment:
For example, for the manufacturer's engineering team it's much easier, faster and cheaper to slap an alarm on everything than to learn attention management and to think through and create an attention management system that is effective and reliable (and it had better be reliable - imagine if it omits the wrong alarms!). I think anyone with experience can imagine the decision to not delay the project and increase costs for that involved subproject - one that involves every component team, which is a priority for almost none of them, and which many engineers, such as the mechanical engineer working on the robotic arm, won't even understand the need for.
> And China's society is run by engineers, so it will win out over ours.
History has not been kind to engineers who do non-engineering, such as US President Herbert Hoover, who built dams but also bore significant responsibility for the Great Depression. It's not that engineers can't acquire other skills and do well in those fields, but that other skills are needed - they aren't engineering. Those who accept as truth their natural egocentric bias and their professional community's bias toward engineering are unlikely to learn those skills.
terminalshort
an hour ago
Your own answer circles right back to the problem I'm talking about:
> and it had better be reliable - imagine if it omits the wrong alarms!
This is entirely based on the premise that an error due to omitting the wrong alarm is worse than an error due to including too many alarms. That right there is lawyerthink. Also, these priorities don't conflict as you say; they just take different sides of a tradeoff. Managers and finance people are balancing delivery speed, cost, and quality to maximize business value. And the bureaucrats and lawyers are choosing more expensive and less reliable systems because those better manage the emotions of panicky, anxious people looking for a scapegoat in a crisis. That has a cost.
Besides having the bad luck to be president when the stock market crashed, and therefore being scapegoated for it, Herbert Hoover was well regarded in everything he did before and after his term, including many non-engineering things. So I think he is a particularly poor example of this. Public blame for things like that tends to be exactly as rational as thinking a hangover has nothing to do with last night.
mmooss
an hour ago
I don't see how it's 'lawyerthink' at all; engineers also want to prevent bad outcomes, especially from their own work, as does everyone else.
Also, I think this nitpicks one part of a complex system and ignores the rest of my larger point.
miki123211
2 hours ago
Useless warnings are a great CYA tactic.
The more of them you have, the more likely it is that there's a warning if something happens. Whether the warning is ever noticed is secondary; what matters is the fact that there was a warning and the operator didn't react to it appropriately, which makes the situation the fault of the operator.
cucumber3732842
an hour ago
This is partly a problem with our workplace laws.
In the eyes of regulators and courts, individual low-level employees cannot take responsibility. This is the logic by which they fine the company when someone does something on a step ladder, or whatever, that you shouldn't need to be told not to do.
What this means is that low-level employees become liability sinks. Show them all the warnings and make them figure it out. Give them all sorts of conflicting rules and let them sort out which ones to follow. Etc, etc.
anonymousiam
3 hours ago
Alerts should be classified by criticality, and that criticality presented with the alert. Users should have the ability to filter non-critical messages on certain platforms.
Unfortunately, some systems either don't track criticality, or some of the alerts are tagged with the wrong level.
(One example of the latter is the Ruckus WAP, which has a warning message tagged at the highest level of criticality, so about two or three times a month, I see the critical alert: "wmi_unified_mgmt_rx_event_handler-1864 : MGMT frame, ia_action 0x0 ia_catageory 0x3 status 0x0", which should be just an informational level alert, with nothing to be done about it. I've reported this bug to Ruckus a few times over the past five years, but they don't seem to care.)
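As a sketch of what I mean by filtering with a local override for known-mis-tagged messages (the severity names and the override mechanism are mine, not anything Ruckus actually provides):

    # Hypothetical severity filter with a local re-tag list for alerts the
    # vendor has classified at the wrong level.
    SEVERITY = {"critical": 0, "warning": 1, "info": 2}

    # Messages we know are mis-tagged upstream, mapped to what they should be.
    LOCAL_OVERRIDES = {
        "wmi_unified_mgmt_rx_event_handler": "info",
    }

    def effective_severity(message, vendor_severity):
        for prefix, corrected in LOCAL_OVERRIDES.items():
            if message.startswith(prefix):
                return corrected
        return vendor_severity

    def should_page(message, vendor_severity, threshold="warning"):
        # Only bother a human if the corrected severity meets the
        # platform's threshold (lower number = more severe).
        return SEVERITY[effective_severity(message, vendor_severity)] <= SEVERITY[threshold]

That at least keeps a known-bogus "critical" from training you to ignore the real ones.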
varjag
38 minutes ago
In reality users will keep everything on default.
varjag
3 hours ago
I think it's regulated in places, as it has certainly been an HMI concern ever since Three Mile Island. Our customer really grills vendors for generating excessive alarms. Generally, for a system to pass commissioning it has to be all green, and if it starts event bombing afterwards you're going to get chewed out.
nik282000
an hour ago
I have never seen a piece of new equipment get to an all-green state before, during, or after commissioning. I frequently recommend that we not allow the commissioning team to leave until they can get it to that state, but it has yet to happen.
varjag
36 minutes ago
I guess it's a matter of setting expectations, on both the SCADA and the equipment side. Spent this weekend getting rid of that last sporadic alert…
CamperBob2
3 hours ago
> The only way for this kind of issue to be resolved is with regulation and safety standards.
Are you sure that's not what caused the problem in the first place? Unqualified and/or captured regulators who come up with safety standards that are out of touch with how the system needs to work in the real world?
AlotOfReading
3 hours ago
Do regulators come up with SCADA safety standards? I would have assumed it was IEC.
Regulators coming up with engineering standards is pretty rare in general. Usually they incorporate existing professional standards from organizations like SAE, IEEE, IEC, or ISO.
lostdog
3 hours ago
I wonder if you could calculate a "probability of response to major alert" and make it the inverse of the total number of irrelevant alerts. Then you get to ask "our probability of major alert saliency is only 6%. Why have the providers set it at this level, and what can we do to raise it?"
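A back-of-the-envelope version (the formula and the numbers are mine, just to show the shape of the metric):

    # Toy saliency metric: the chance any single alarm gets real attention,
    # approximated as actionable alarms over total alarms. Numbers invented.
    total_alarms = 6000        # ~100/minute for an hour, as in the top comment
    actionable_alarms = 360    # the ones anyone actually needed to act on

    saliency = actionable_alarms / total_alarms
    print(f"Chance a given alarm deserves attention: {saliency:.1%}")   # 6.0%

Once it's a single number, "why is it this low and who owns raising it" becomes a question you can actually put to the vendor.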