dan_sbl
11 hours ago
> For example, when the GPU is fully idle, nvidia-smi tells me that it’s only pulling 88W of power.
I haven't used a non-laptop GPU in some time, but that is a crazy amount of "idle" power consumption. Is this normal for cards like this?
Aurornis
11 hours ago
Server cards are not optimized for idle power usage. They’re expected to be fully utilized.
For server gear it’s more common to have less dynamic power and voltage switching because it produces more predictable performance and latency.
cmovq
10 hours ago
For GeForce cards you can get similar behavior by setting “Prefer maximum performance”, which disables some of the low power states.
wildzzz
10 hours ago
If my GPU is sitting idle, and I mean idle with nothing loaded into its memory, it's sitting at about 18W. If I load in a model that uses nearly all of the memory but that model is idle, it's at 36W. If that model is actively thinking, it's like 118W. I think this is likely due to the GPU being aware that there is real data loaded into memory and turning up the DRAM refresh rate, whereas when nothing is loaded, the dynamic power is as low as possible.
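For anyone who wants to reproduce this kind of idle / loaded / active comparison, here is a minimal sketch, assuming `nvidia-smi` is on the PATH. The helper names (`parse_gpu_line`, `query_gpus`) are made up for illustration, and the parser assumes the card actually reports a numeric power draw (some report `[N/A]`, which this does not handle).

```python
import subprocess

# Fields requested from nvidia-smi, in the order the parser expects them.
QUERY_FIELDS = ["power.draw", "pstate", "memory.used"]


def parse_gpu_line(line):
    """Parse one CSV line (power in W, P-state, memory used in MiB) into a dict."""
    power, pstate, mem_used = [part.strip() for part in line.split(",")]
    return {
        "power_w": float(power),
        "pstate": pstate,
        "memory_used_mib": int(mem_used),
    }


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `query_gpus()` once with nothing loaded, once with a model resident, and once during inference should give the three readings to compare, along with the P-state the card reported each time.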
legitronics
9 hours ago
Yes, I have some of these cards and AFAICT the HBM2e chips just always run at full speed. I have different variants of the PCIe cards, and while I can get the GPU itself into a lower power state, the memory just runs full tilt. Though I see 40W on my “normal” cards and 60W on the Frankenstein card that thinks it’s an SXM4.
monster_truck
3 hours ago
IIRC this was one of the issues with HBM2/2e: some combination of the various available memory controllers not agreeing on a standard to manage timings and power states. I haven't played around with my Radeon VII in a long while now.
That aside, idle power consumption is a driver-to-driver affair from both AMD and novideo. Sometimes I'm only pulling 15-30W when nothing is happening, and other times it decides it needs 110W for a static 500Hz screen.
umanwizard
9 hours ago
I suspect the act of running nvidia-smi itself prevents the GPU from being put into a low-power state.
zacmps
9 hours ago
From memory, this is true, and NVML (NVIDIA Management Library) is the way to get stats that doesn't cause the GPU to wake.
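A minimal sketch of reading power through NVML directly, assuming the third-party `pynvml` bindings (the `nvidia-ml-py` package) and an NVIDIA driver are installed. The function names `milliwatts_to_watts` and `read_power_watts` are hypothetical. This only shows the query; it does not verify whether going through NVML actually avoids waking the GPU.

```python
def milliwatts_to_watts(milliwatts):
    """NVML reports power draw in milliwatts; convert to watts."""
    return milliwatts / 1000.0


def read_power_watts(gpu_index=0):
    """Return the current power draw of one GPU in watts, read via NVML.

    pynvml is imported lazily so the conversion helper above still works
    on machines without the bindings or an NVIDIA driver.
    """
    import pynvml  # assumption: provided by the nvidia-ml-py package

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        return milliwatts_to_watts(pynvml.nvmlDeviceGetPowerUsage(handle))
    finally:
        # Always release the NVML session, even if the query fails.
        pynvml.nvmlShutdown()
```

Comparing `read_power_watts()` from a long-running process against repeated `nvidia-smi` invocations would be one way to check the claim on a given card.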