Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

123 points, posted 10 hours ago

10 Comments

yu3zhou4

9 hours ago

The README is, in my opinion (author here), the most interesting part. I wrote it to help others build a useful mental model, so that they can recreate the project themselves without even needing to read my code.

janalsncm

4 hours ago

Really practical teaching approach. I clicked in to see how safetensors are loaded and just kept reading. Thanks for sharing.

GoldenJade

3 hours ago

Thanks for sharing this. As someone currently researching LLMs, I'm sure I'll be referencing this quite a bit going forward.

xuanlin314

4 hours ago

The lesson-style README is a great approach. Breaking down LLM inference into digestible steps makes the codebase approachable even for people who haven't touched CUDA before.

dwa3592

8 hours ago

Very nice job on the README.

>>Physically, LLM is a file which contains a lot of float numbers.

aka atoms of the LLM.