Running a load using the official NVIDIA PyTorch image boosts performance by 50%

2 points, posted 3 months ago
by riomus

3 Comments

t-vi

3 months ago

Note that the NVIDIA container uses CUDA + cuBLAS 13.0.2, whose release notes cite "Improved performance on NVIDIA DGX Spark for FP16/BF16 and FP8 GEMMs", which seems to be your use case. In general, I would suspect it mostly comes down to the versions of the libs.

Interestingly, there is a cuBLAS 13.1 whl on PyPI, not sure what that does.
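One way to see which cuBLAS (or other CUDA) wheels pip has actually installed in an environment, as opposed to libraries baked into the container image, is to scan installed distributions with the stdlib `importlib.metadata`. A minimal sketch; the helper name `installed_nvidia_libs` is hypothetical:

```python
from importlib import metadata

def installed_nvidia_libs():
    """Map pip-installed NVIDIA-related distribution names to their versions.

    Note: this only reflects wheels installed via pip (e.g. from PyPI);
    libraries shipped directly inside the NVIDIA container image will
    not appear here.
    """
    hits = {}
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower()
        if any(key in name for key in ("cublas", "cudnn", "cuda")):
            hits[name] = dist.version
    return hits

if __name__ == "__main__":
    for name, version in sorted(installed_nvidia_libs().items()):
        print(f"{name}=={version}")
```

Comparing this listing inside and outside the container would show whether the PyPI cuBLAS 13.1 wheel is in play.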

riomus

3 months ago

I did a shallow check on PyTorch (it reports version 2.9.0), and it differs from the 2.9.0 on the PyTorch index; the differences are in code from months before 2.9.0 was released, which is why I assume NVIDIA is using their own fork. As for cuBLAS, natively I see it is available (libcublas.so.13.1.0.3) in the same version as in the container.
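To verify which cuBLAS build a process actually loads at runtime (rather than what happens to be on disk), one can inspect `/proc/self/maps` on Linux after importing torch and running a GEMM, since the version is embedded in the filename (e.g. libcublas.so.13.1.0.3). A minimal sketch; the helper name `loaded_cublas_libs` is hypothetical:

```python
import re
from pathlib import Path

def loaded_cublas_libs():
    """Return the cuBLAS shared objects currently mapped into this process.

    Parses /proc/self/maps (Linux only); returns an empty list on other
    platforms or when no cuBLAS library has been loaded yet.
    """
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return []
    libs = set()
    for line in maps.read_text().splitlines():
        match = re.search(r"(\S*libcublas\S*\.so[\w.]*)", line)
        if match:
            libs.add(match.group(1))
    return sorted(libs)

if __name__ == "__main__":
    # In practice: import torch, run a small matmul on the GPU first,
    # then call this to see which libcublas the process actually mapped.
    print(loaded_cublas_libs())
```

Running this inside the container versus against the native install would confirm whether both environments really end up using the same cuBLAS version.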
