Show HN: Open source framework OpenAI uses for Advanced Voice

97 points, posted 10 hours ago
by russ

17 Comments

pj_mukh

5 hours ago

Super cool! Didn't realize OpenAI is just using LiveKit.

Does the pricing break down to be the same as having an OpenAI Advanced Voice socket open the whole time? It's like $9/hr!

It would theoretically be cheaper to use this without keeping the Advanced Voice socket open the whole time: just use the GPT-4o streaming service [1] whenever inference is needed (pay per token) and use LiveKit's other components to do the rest (TTS, VAD, etc.).

What's the trade off here?

[1]: https://platform.openai.com/docs/api-reference/streaming

davidz

4 hours ago

Currently it does: all audio is sent to the model.

However, we are working on turn detection within the framework, so you won't have to send silence to the model when the user isn't talking. It's a fairly straightforward path to cutting down the cost by ~50%.
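
To make the idea of not sending silence concrete, here is a minimal sketch of a plain energy-threshold gate. It is not LiveKit's turn detection (which isn't described in this thread); the function names, the threshold, and the hangover length are all hypothetical, and frames are assumed to be sequences of float samples in [-1, 1].

```python
import math
from typing import Iterable, Iterator, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_silence(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,  # assumed level separating speech from silence
    hangover: int = 2,        # assumed number of trailing frames kept after speech
) -> Iterator[Sequence[float]]:
    """Yield only frames that look like speech, plus a short tail after each burst."""
    remaining = 0
    for frame in frames:
        if rms(frame) >= threshold:
            remaining = hangover  # speech: reset the tail counter and forward
            yield frame
        elif remaining > 0:
            remaining -= 1        # quiet frame inside the tail: still forward
            yield frame
        # otherwise: silence, dropped and never sent to the model


silence = [0.0] * 4
speech = [0.5] * 4
stream = [silence, speech, silence, silence, silence, silence]
sent = list(gate_silence(stream))
print(f"sent {len(sent)} of {len(stream)} frames")
```

A real turn detector would need to be more robust than a fixed threshold (background noise, short pauses mid-sentence), so treat this only as an illustration of where the savings come from.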

solarkraft

2 hours ago

That’s some crazy marketing for an “our library happened to support this relatively simple use case” situation. Impressive!

By the way: The cerebras voice demo also uses LiveKit for this: https://cerebras.vercel.app/

russ

36 minutes ago

There’s a ton of complexity under the “relatively simple use case” when you get to a global, 200M+ user scale.

FanaHOVA

6 hours ago

Olivier, Michelle, and Romain gave you guys a shoutout like 3 times in our DevDay recap podcast if you need more testimonial quotes :) https://www.latent.space/p/devday-2024

russ

4 hours ago

I had no idea! <3 Thank you for sharing this, made my weekend.

shayps

3 hours ago

You guys are honestly the best

mycall

6 hours ago

I wonder when Azure OpenAI will get this.

davidz

4 hours ago

I'm working on a PR now :)

gastonmorixe

6 hours ago

Nice they have many partners on this. I see Azure as well.

There is a common consensus that the new Realtime API is not actually using the same Advanced Voice model / engine - or however it works - since at least the TTS part doesn’t seem to be as capable as the one shipped with the official OpenAI app.

Any idea on this?

Source: https://github.com/openai/openai-realtime-api-beta/issues/2

russ

4 hours ago

It's using the same model/engine. I don't have knowledge of the internals, but there is a different subsystem/set of dedicated resources for API traffic versus first-party apps.

One thing to note is that there is no separate TTS phase here; it's happening internally within GPT-4o, in both the Realtime API and Advanced Voice.

willsmith72

5 hours ago

That was cool, but got up to $1 usage real quick

russ

4 hours ago

We had our playground (https://playground.livekit.io) up for a few days using our key. Def racked up a $$$$ bill!

wordpad25

2 hours ago

How much is it per minute of talking?

russ

2 hours ago

50% human speaking at $0.06/minute of tokens

50% AI speaking at $0.24/minute of tokens

we (LiveKit Cloud) charge ~$0.0005/minute for each participant (in this case there would be 2)

So blended is $0.151/minute
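
For anyone who wants to plug in their own speaking ratio, the arithmetic above can be sketched as follows. The rates are the ones quoted in this comment; the function and parameter names are hypothetical.

```python
def blended_cost_per_minute(
    human_share: float = 0.5,         # fraction of the minute the human is speaking
    human_rate: float = 0.06,         # $/minute of tokens while the human speaks
    ai_rate: float = 0.24,            # $/minute of tokens while the AI speaks
    participant_rate: float = 0.0005, # LiveKit Cloud $/minute per participant
    participants: int = 2,
) -> float:
    """Blended $/minute: weighted model cost plus per-participant transport cost."""
    ai_share = 1.0 - human_share
    model_cost = human_share * human_rate + ai_share * ai_rate
    transport_cost = participant_rate * participants
    return model_cost + transport_cost


cost = blended_cost_per_minute()
print(f"${cost:.3f}/minute, ${cost * 60:.2f}/hour")
```

With the defaults this is 0.03 + 0.12 + 0.001 = $0.151/minute, which works out to roughly $9/hour.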

shayps

2 hours ago

It shakes out to around $0.15 per minute for an average conversation. If history is a guide though, this will get a lot cheaper pretty quickly.

cdolan

36 minutes ago

This is cheaper than old cellular calls, inflation adjusted