hackernews client

nullbio

6 hours ago

Taalas has made a hardware LLM that reaches over 14k tokens per second (TPS) on Llama 3.1 8B, and you can try it here: https://chatjimmy.ai/

Most people don't seem to be aware of this, so I think it's important to realize what is coming. It also kind of feels like the industry doesn't want people to know about it, because barely anyone talks about it. If these chips can be made at scale, the cost of tokens should drop dramatically for the providers. The question is whether they pass those savings on.

Imagine the latest Qwen 3.7 running at 14k TPS in agentic loops... Even if the model doesn't get things right, being able to iterate that quickly on "generate code -> unit test" will be absurd.
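To put rough numbers on that loop, here is a back-of-envelope sketch. The 2,000 tokens per attempt, the 1 second test run, and the 100 TPS comparison point are all made-up assumptions for illustration, not measurements; only the 14k TPS figure comes from the post.

```python
def iterations_per_minute(tokens_per_second, tokens_per_attempt=2000, test_seconds=1.0):
    """Rough count of "generate code -> unit test" cycles that fit in 60 seconds.

    tokens_per_attempt and test_seconds are hypothetical placeholders.
    """
    generation_seconds = tokens_per_attempt / tokens_per_second
    return 60.0 / (generation_seconds + test_seconds)


# 100 TPS is an assumed baseline for comparison; 14_000 is the figure quoted above.
for tps in (100, 14_000):
    print(f"{tps:>6} TPS: {iterations_per_minute(tps):.1f} iterations/min")
```

Under these assumptions generation stops being the slow part at 14k TPS, and the time spent running the tests dominates each cycle.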

Hardware LLM Taalas Reaches >14,000 TPS on Llama 3.1 8B
