Show HN: Running Gemma-4 26B at 124 tokens/sec on a CPU, no GPU

10 points, posted 5 hours ago
by arun-prasath

1 comment

pmb_developer

2 hours ago

The output head byte budget is surprising. Did you try any tradeoff where the head is compressed more aggressively but experts stay mostly untouched?