Show HN: 5MB Rust binary that runs HuggingFace models (no Python)

2 points, posted 5 months ago
by MKuykendall

2 Comments

somesun

5 months ago

That's great!

I will try it. Does it support GPU acceleration, like CUDA?

One question: can I use it as a library in my Rust project, or can I only call it by spawning a new process with the exe file?

MKuykendall

5 months ago

GPU/CUDA: Yes, but it's disabled by default for faster builds. To enable it, remove LLAMA_CUDA = "OFF" from config.toml and rebuild with the CUDA toolkit installed.
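A minimal sketch of that change, assuming the build config uses a plain key = value layout (only the LLAMA_CUDA line is confirmed above; the comments are illustrative):

```toml
# config.toml (build configuration)
# Delete or comment out this line to enable CUDA support,
# then rebuild with the CUDA toolkit installed:
LLAMA_CUDA = "OFF"
```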

Rust library: Absolutely! Add shimmy = { version = "0.1.0", features = ["llama"] } to your Cargo.toml. Then use the inference engine directly:

  let engine = shimmy::engine::llama::LlamaEngine::new();
  let model = engine.load(&spec).await?;
  let response = model.generate("prompt", opts, None).await?;

No need to spawn processes; just import and use the components directly in your Rust code.
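The Cargo.toml side of that would look something like this (the shimmy line is from the comment above; the tokio line is an assumption, added because the API shown is async and needs a runtime to drive the .await calls):

```toml
[dependencies]
shimmy = { version = "0.1.0", features = ["llama"] }
# Assumption: an async runtime such as tokio is needed for the async API
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```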