GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU

12 points, posted 10 hours ago
by XMasterrrr

2 Comments

DiabloD3

8 hours ago

This isn't very useful.

The VRAM cost of context is not equal across models.

Also, Hugging Face tells you the exact size of the specific model you have in hand, so why the weird guesswork? Dynamic quants are not going to magically fit some formula.
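
To put rough numbers on the point about context: below is a minimal sketch of the usual KV-cache estimate for a standard transformer, assuming a plain (non-paged, unquantized) cache that stores one key and one value vector per layer, per KV head, per token. The function name `kv_cache_bytes` and both example configurations are hypothetical and chosen only for illustration. They are not taken from the linked article or from any specific model card.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV-cache size in bytes for one sequence.

    Assumes standard attention with a dense cache: each token stores a key
    and a value vector (hence the factor of 2) per layer and per KV head.
    bytes_per_elem=2 corresponds to fp16/bf16 cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


GIB = 1024 ** 3

# Two hypothetical configs that differ only in the number of KV heads
# (full multi-head attention vs. grouped-query attention).
full_mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, context_len=8192)
grouped = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, context_len=8192)

print(f"32 KV heads: {full_mha / GIB:.1f} GiB")
print(f" 8 KV heads: {grouped / GIB:.1f} GiB")
```

Under these assumptions the same 8192-token context costs 4 GiB in one config and 1 GiB in the other, so a single per-token figure cannot hold across models. Architectures with sliding-window attention, quantized caches, or other cache layouts would need a different estimate.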

metadat

9 hours ago

This is super useful. Most of the time when I go to run a model off Hugging Face on my 64GB MBP, I run into issues because I drastically overestimated what it could do. :>