> Controller failure: by far the most common catastrophic failure you’ll see in SSDs
In consumer drives, yes. It's often not even a hardware failure but a firmware one, though to most consumers that's splitting hairs: the drive is still "dead," since the common ingress points for fixing firmware are absent or disabled on consumer-class drives (hence the blurb at the end of that section about physically swapping controllers). Also, cell failure is far more prevalent than controller failure on drives that lack a DRAM/SLC cache (aka transition flash) layer. Controllers do still fail, even at the hardware level, for enterprise and consumer drives alike; it's a prevalent issue (pro tip: monitor and rectify your thermals and the prevalence of this problem drops significantly).
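Since that thermals tip is actionable, here's a minimal watchdog sketch. It assumes smartmontools 7+ (for JSON output via `smartctl -j`) and enough privilege to query the drive; the device path, threshold, and JSON field are assumptions to adapt, not gospel:

```python
#!/usr/bin/env python3
"""Rough drive-temperature watchdog: a sketch, not production monitoring."""
import json
import subprocess
import time

DEVICE = "/dev/nvme0"   # hypothetical device path; change for your system
WARN_AT_C = 70          # illustrative threshold; check your drive's rated max

def drive_temp_c(device: str):
    """Return the drive's current temperature in Celsius, or None."""
    out = subprocess.run(
        ["smartctl", "-j", "-A", device],
        capture_output=True, text=True, check=False,
    )
    data = json.loads(out.stdout)
    # Most drives expose a top-level "temperature" object in smartctl's
    # JSON output; drives that don't report one just yield None here.
    return data.get("temperature", {}).get("current")

while True:
    temp = drive_temp_c(DEVICE)
    if temp is not None and temp >= WARN_AT_C:
        print(f"{DEVICE} running hot: {temp}C; check airflow/heatsinks")
    time.sleep(60)
```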
> Failure to retain charge: typically, only seen in SSDs, thumb drives, and similar devices left unpowered for long periods of time.
It also happens to flash that sees lots of writes, power cycles, or frequent large temperature swings. This is more common on portable media (thumb drives) and mobile devices (phones and laptops, especially thin ones).
> Now, let’s take a look at the DC600M Series 2.5” SATA Enterprise SSD datasheet for one of my favorite enterprise-grade drives: Kingston’s DC600M.
Strange choice of drive, but okay. Stranger still, they don't talk about any of the features that actually make it an enterprise drive as opposed to Kingston's consumer alternatives: power-loss protection, transition flash/DRAM cache, controller and diagnostics options, etc.
> Although Kingston’s DC600M is 3D TLC like Samsung’s EVO (and newer “Pro”) models, it offers nearly double the endurance of Samsung’s older MLC drives, let alone the cheaper TLC! What gives?
For starters, the power regulation and delivery circuitry on enterprise-grade drives tends to be more robust (usually, even on a low-end drive like the DC600M), so the writes that wear the cells are much less likely to actually cause wear from out-of-spec voltage/current. Flash topology, channels, bit widths, and redundancy (for wear levelling/error correction) are also typically significantly improved. All of these things are FAR more important than the TLC/SLC/MLC discussion they dive into. None of them are a given just because someone brands something an "enterprise drive," but they are the things enterprises care about. Consumers typically don't have workloads where such considerations make a meaningful difference; they can just use DWPD, or brute-force it by vastly overbuying capacity, to figure out what works for them.
> One might, for example, very confidently expect 20GB per day to be written to a LOG vdev in a pool with synchronous NFS exports, and therefore spec a tiny 128GB consumer SSD rated for 0.3 DWPD... On the surface, this seems more than fine:
Perhaps, but let me stop you right there: the math that follows is irrelevant for the context presented. You should be asking what kind of DRAM/transition flash (typically SLC if not DRAM) is present in the drive and how the controller handles it (also whether it has PLP) before you ever consider DWPD. If your (S)LOG's payloads fit within the controller's cache, and that's its only meaningful workload, then 0.3 DWPD is totally fine, as the actual NAND cells that make up the available capacity will experience much less wear than if the drive had no cache at all.
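To put rough numbers on that, here's a back-of-the-envelope sketch; the write-amplification factors are illustrative guesses, not measurements from any particular drive:

```python
# The article's scenario: 20 GB/day of sync writes to a 128 GB, 0.3 DWPD drive.
capacity_gb = 128
dwpd = 0.3
workload_gb_per_day = 20

rated_gb_per_day = capacity_gb * dwpd  # 38.4 GB/day of rated endurance
print(f"rated {rated_gb_per_day:.1f} GB/day vs workload {workload_gb_per_day} GB/day")

# What DWPD alone hides: the same host writes can land on the NAND very
# differently depending on caching. The WAF values below are made up for effect.
for label, waf in [("fits in DRAM/SLC cache, low WAF", 1.2),
                   ("cache exhausted, small sync writes, high WAF", 6.0)]:
    nand_gb_per_day = workload_gb_per_day * waf
    headroom = rated_gb_per_day / nand_gb_per_day
    print(f"  {label}: ~{nand_gb_per_day:.0f} GB/day to NAND ({headroom:.1f}x headroom)")
```

Same 20 GB/day from the host, wildly different wear at the cells, which is why the cache question comes before the DWPD question.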
Furthermore, regardless of the specific application, if your burstable payloads exceed whatever cache layer your drive can absorb, you're going to see much more immediate performance degradation, entirely independent of wear on any of your components. This is one area that significantly separates consumer flash from enterprise flash, not QLC/TLC/MLC or how many 3D stacks of it there are. That stuff IS relevant, but it's equally relevant for enterprise and consumer alike, and it's first and foremost a function of cost and capacity rather than endurance, performance, or anything else.
This is an example of how DWPD is a generic metric that can be broadly useful, but when you get into the specifics of a use case, it can kinda fall on its face.
Thermals are also very important to both endurance/wear and performance, and they often go overlooked or misunderstood.
DWPD is not as important as it was back when flash was expensive, drive capacities were limited, and there was significantly more overhead in scaling drives up (to vastly oversimplify: far fewer PCIe lanes available), but it's still a valuable metric. And like any individual metric, in isolation it can only tell you so much; different folks and contexts will have different constraints and needs.
One note, though: kudos to them for pointing out that not all DWPD is equal. Some vendors report DWPD endurance over 3 years instead of 5 to artificially inflate the number, which is something to be aware of.
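A quick worked example of that warranty trick (numbers are illustrative, loosely DC600M-class): the same total-bytes-written rating yields a ~1.67x "better" DWPD if you quote it over 3 years instead of 5.

```python
# DWPD = TBW / (capacity * 365 * warranty_years). Same physical endurance,
# different warranty window, different headline number.
capacity_tb = 1.92
tbw_tb = 3504  # illustrative total-bytes-written rating

for years in (5, 3):
    dwpd = tbw_tb / (capacity_tb * 365 * years)
    print(f"{years}-year warranty: {dwpd:.2f} DWPD")
# -> 1.00 DWPD over 5 years, 1.67 DWPD over 3.
# Normalize to TBW (or a single warranty window) when comparing drives.
```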
TL;DR: DWPD, IOPS, capacity, and price are all perfectly valid ways to evaluate flash drives, especially in the consumer space. As your concerns get more specific/demanding/"enterprise," they come with more and more caveats and nuance, but that's true of any metric for any device, tbh.