btursunbayev
8 hours ago
I work with GPUs and got tired of staring at nvidia-smi trying to figure out why things are slow. 100% GPU utilization can mean efficient compute, memory stalls, thermal throttling, or power limiting, so the number alone doesn't really help.
nvsonar reads the same NVML metrics but classifies what's actually going on: whether the GPU is compute-bound, memory-bandwidth-bound, power-limited, thermal-throttled, or data-starved. It also tracks patterns over time (clock oscillation, temperature trends, utilization dips) and can run CUDA benchmarks to check whether your hardware is performing at spec.
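To make the idea concrete, here is a rough sketch of what classifying NVML readings could look like. This is not nvsonar's actual code or API: the `GpuSample` type, the `classify` and `sample_nvml` helpers, and the 50% / 80% thresholds are all made up for illustration, and the NVML part assumes the `nvidia-ml-py` (`pynvml`) bindings are installed.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One snapshot of the metrics the classifier looks at (hypothetical type)."""
    gpu_util: int           # percent of time a kernel was running
    mem_util: int           # percent of time the memory controller was busy
    thermal_throttle: bool  # clocks reduced because of temperature
    power_throttle: bool    # clocks reduced because of the power cap


def classify(s: GpuSample) -> str:
    """Map a snapshot to a coarse bottleneck label.

    Thresholds are illustrative guesses, not tuned values.
    """
    # Throttling explains slowness regardless of what utilization says.
    if s.thermal_throttle:
        return "thermal-throttled"
    if s.power_throttle:
        return "power-limited"
    # A mostly idle GPU is usually waiting on the input pipeline.
    if s.gpu_util < 50:
        return "data-starved"
    # Busy GPU with a saturated memory controller.
    if s.mem_util >= 80:
        return "memory-bandwidth-bound"
    return "compute-bound"


def sample_nvml(index: int = 0) -> GpuSample:
    """Read one snapshot from NVML (needs an NVIDIA driver and pynvml)."""
    import pynvml  # imported here so classify() works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        reasons = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
        thermal_mask = (
            pynvml.nvmlClocksThrottleReasonSwThermalSlowdown
            | pynvml.nvmlClocksThrottleReasonHwThermalSlowdown
        )
        power_mask = pynvml.nvmlClocksThrottleReasonSwPowerCap
        return GpuSample(
            gpu_util=util.gpu,
            mem_util=util.memory,
            thermal_throttle=bool(reasons & thermal_mask),
            power_throttle=bool(reasons & power_mask),
        )
    finally:
        pynvml.nvmlShutdown()
```

On a machine with an NVIDIA GPU, `print(classify(sample_nvml()))` would give a label for a single instant. A single snapshot is noisy, which is why tracking patterns over time matters.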
pip install nvsonar
Open to feedback: what GPU issues do you run into that a tool like this should catch? Would something like this save you time?