lorenzohess
10 hours ago
Summary:
> Rather than allowing heat to build up, what if we could spread it out right from the start, inside the chip?... To do that, we’d have to introduce a highly thermally conductive material inside the IC, mere nanometers from the transistors, without messing up any of their very precise and sensitive properties. Enter an unexpected material—diamond.
> ... my research group at Stanford University has managed what seemed impossible. We can now grow a form of diamond suitable for spreading heat, directly atop semiconductor devices at low enough temperatures that even the most delicate interconnects inside advanced chips will survive... Our diamonds are a polycrystalline coating no more than a couple of micrometers thick.
> The potential benefits could be huge. In some of our earliest gallium-nitride radio-frequency transistors, the addition of diamond dropped the device temperature by more than 50 °C.
kulahan
8 hours ago
Fifty Celsius is an insane drop.
It sounds like the most important part of the article (and another cool quote) is this:
> Until recently we knew how to grow it only at circuit-slagging temperatures in excess of 1,000 °C.
So basically, the big breakthrough was low-temp growth of a diamond lattice. Very cool they can do it at such a low temperature. It must be a crazy low temp - probably under 100C?
yorwba
8 hours ago
From the article:
"we were able to find a formula that produced coatings of large-grained polycrystalline diamond all around devices at 400 °C, which is a survivable temperature for CMOS circuits and other devices."
FaradayRotation
6 hours ago
It is genuinely impressive to grow thin-film polycrystalline diamond at 400C, but my understanding is that this temperature is basically at the ceiling of what the circuits will tolerate during manufacturing while still yielding a good-quality device at end of line. Stress tests, anneals, and wafer bakes are usually limited to about 400C, unless the point is to deliberately degrade the chip.
Not to say that it can't be done, only that the process window is not very large and the propensity for deleterious carbon soot is very high. Likely this will generate some very fun, highly integrated problem statements before we see this available for sale.
Getting heat out of the chip is such a painful and important struggle. I hope this works on a real process line. Too many benefits on the table to ignore.
Edit: Grammar, clarity
hnuser123456
2 hours ago
I wonder whether, in situations like the Raptor Lake fiasco or other "overclocked a little too far" scenarios where the circuit degrades to the point that the frequency must be reduced to maintain expected stability, some very small spots on the chip approached that temperature while the temp sensor read 100C or below (so thermal throttling did not kick in when it should have)?
FaradayRotation
2 hours ago
Caveats: My understanding of the Raptor Lake mess is pretty limited, mostly because Intel has been fairly tight-lipped about what specific issue caused it. My personal suspicion is that it was a Pareto chart's worth of issues. Also, while I do know a few things about this particular topic, I am far from the final authority on it.
My understanding is that point/local resistive heating problems out in the wild tend to drive different failure modes vs the global heating techniques used on the manufacturing line, mostly because the CPU is in active operation, which changes the defect physics. Put another way, any particular structure in the CPU would likely not need to reach 400C to fail: even the small voltages used in these chips, coupled with elevated temperature, can drive a lot of difficult-to-catch, slow-to-manifest failure modes. Copper metal migration is the classic example of this type of problem, where copper ions slowly migrate under voltage+temperature, causing and propagating voids until finally an open circuit is made. Surprise! Your chip no longer works after seeming perfectly fine! Manufacturers try to catch such problems with simulated aging through aggressive temperature and voltage experiments. Intel must have discovered a big gap in their visibility, and then discovered their CPU specs were incompatible with the stated product lifetime without a major re-spec of already-sold product. Ouch.
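To put a rough number on why elevated temperature makes for useful "simulated aging", here is a minimal Python sketch of the Arrhenius temperature acceleration factor, which is the temperature term in Black's equation for electromigration. Everything specific in it is an assumption for illustration, not something from the article or from any Intel data: the function name, the 0.9 eV activation energy, and the 85C/125C temperatures. It also ignores the current-density (voltage) part of the picture entirely.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev):
    """How much faster a thermally activated failure mode proceeds
    at the stress temperature than at the use temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


# Assumed, illustrative values only (not measured data).
af = arrhenius_acceleration(t_use_c=85.0, t_stress_c=125.0, activation_energy_ev=0.9)
print(f"Acceleration factor at 125C vs 85C: {af:.1f}x")
```

The exponential is the point: a few tens of degrees of extra heat can speed up this kind of wear-out by an order of magnitude, which is also why a small hot spot can age much faster than the rest of the die.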
The chip manufacturer also has some capability to make repairs and adjustments ahead of end of line, which should encompass managing some of the issues you refer to. Some big customers might have their own repair capabilities.
Edit: Clarity, trying to better address the question
kulahan
8 hours ago
Thanks, not sure how I missed that. Still, a 60% drop in required temp! These gems are truly, truly outrageous.
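For anyone checking the 60% figure, a quick Python sketch of the arithmetic. It assumes the two growth temperatures quoted in the thread (1,000 °C before, 400 °C now); the function name is made up for illustration. The percentage depends on the temperature scale: it is 60% in Celsius, but smaller on an absolute (kelvin) scale.

```python
def percent_drop(old, new):
    """Relative reduction from old to new, as a percentage."""
    return (old - new) / old * 100.0


KELVIN_OFFSET = 273.15  # Celsius to kelvin

old_c, new_c = 1000.0, 400.0  # growth temperatures quoted in the thread
drop_celsius = percent_drop(old_c, new_c)
drop_kelvin = percent_drop(old_c + KELVIN_OFFSET, new_c + KELVIN_OFFSET)

print(f"{drop_celsius:.0f}% in Celsius, {drop_kelvin:.0f}% in kelvin")
```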
beautifulfreak
8 hours ago
The article says 400C