threeducks
7 months ago
Testing PCIe bandwidth with a model that fits entirely into VRAM (quantized Phi-3 Mini at 2.39 GB on an RTX 5090 with 32 GB of VRAM) is stupid, because there will be essentially no traffic over PCIe beyond the initial model load. They should have tested a large MoE model like Qwen3-235B-A22B-GGUF, which is far too big for 32 GB of VRAM and has to be partly offloaded to system RAM. That is where the difference will be huge.
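To put rough numbers on the fits-in-VRAM argument, here is a back-of-the-envelope sketch. The 2.39 GB and 32 GB figures are from the benchmark; the 130 GB size for the quantized MoE model and the 64 GB/s link speed (roughly the theoretical one-direction figure for PCIe 5.0 x16) are my own assumptions for illustration, and the helper names are made up. It ignores KV cache, activations, and runtime overhead.

```python
def spill_gb(model_gb: float, vram_gb: float) -> float:
    """Weights that cannot stay resident in VRAM, in GB (0.0 if everything fits)."""
    return max(0.0, model_gb - vram_gb)


def transfer_seconds(gb: float, link_gb_per_s: float) -> float:
    """Time to move `gb` gigabytes once over a link with the given bandwidth."""
    return gb / link_gb_per_s


VRAM_GB = 32.0          # RTX 5090, from the benchmark
PHI3_MINI_GB = 2.39     # quantized Phi-3 Mini, from the benchmark
BIG_MOE_GB = 130.0      # ASSUMPTION: placeholder size for a quantized 235B MoE
PCIE_GB_PER_S = 64.0    # ASSUMPTION: approximate theoretical PCIe 5.0 x16

small_spill = spill_gb(PHI3_MINI_GB, VRAM_GB)  # nothing left outside VRAM
big_spill = spill_gb(BIG_MOE_GB, VRAM_GB)      # most of the weights stay in host RAM

print(f"Phi-3 Mini spill: {small_spill} GB")
print(f"Large MoE spill:  {big_spill} GB")
print(f"One full pass of the spilled weights over PCIe: "
      f"{transfer_seconds(big_spill, PCIE_GB_PER_S):.2f} s")
```

How much of the spilled data actually crosses the bus per token depends on the runtime and the offload strategy, so treat this as an upper-bound style estimate, not a prediction.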
sitkack
7 months ago
Yeah, I would have expected more out of Puget Systems. They know better.