lewtun
3 months ago
Hi, Lewis here (one of the co-authors). Happy to answer any questions people have about the book :)
troelsSteegin
3 months ago
This was a good read. I was struck by the quantity of nuanced and applied knowhow it took to build SmolLM3. I am curious about the rough cost it took to engineer and train SmolLM3 - at ~400 GPUS for a least a month, and, based on the set of book co-authors, 12 engineers for at least three months. Is $3-5M a fair ballpark number? The complement is how much experience, on average, the team members had doing ML and LLM training at scale before SmolLM3. The book is "up" on recent research, so I am surmising a phd-centric team each with multiple systems built. This is not commodity skill. What the book suggests to me is that an LLM applications start up would best focus on understanding the scope and knowhow for starting from post-training.
danielmarkbruce
3 months ago
I'm a little ways through this and it's great so far, nice job.
One of the reasons people build one though is to learn. Most smart folks are quite aware that the reality of pre-training a real LLM is going to involve some head banging against the wall (ie, things don't go smoothly like "building an llm from scratch" book), and they want to go through the process.
empiko
3 months ago
Really impressive writeup. In your opinion, how long will this stay up to date? The field is constantly evolving, do you plan to keep updating this document?
lewtun
3 months ago
Thanks! I expect the book will remain relevant as long as the Transformers architecture does. That’s why we mostly focus on topics we think will stand the test of time, but let’s see how that plays out :)
danielmarkbruce
3 months ago
Finished. Great write up.