jhoke
a month ago
Hey, for lower latency on the LLM portion, take a look at Cerebras. It won't run OpenAI models, but if you can substitute an equivalent open-weight model, you may get better speeds. There is a memory constraint, though, so I'm not sure it's the best fit for this project. Curious to see how this works out in terms of consistent, durable performance.
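If you go that route, the swap can be as small as pointing your existing OpenAI client at a different base URL, since Cerebras exposes an OpenAI-compatible endpoint. A minimal sketch, assuming the base URL and model name below are still current (check their docs before relying on either):

```python
# Minimal sketch: reusing the OpenAI Python SDK against Cerebras.
# Assumptions: the base_url below is Cerebras's OpenAI-compatible
# endpoint and "llama-3.3-70b" is an available model -- verify both.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed Cerebras endpoint
    api_key="YOUR_CEREBRAS_API_KEY",          # placeholder, not a real key
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",  # open-weight stand-in for an OpenAI model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

The rest of your pipeline shouldn't need to change; only the endpoint, key, and model name differ.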
CLCKKKKK
a month ago
OK, I will, thanks for the suggestion. The model doesn't have to be from OpenAI.