qwertox
8 hours ago
> The Realtime API improves this by streaming audio inputs and outputs directly, enabling more natural conversational experiences. It can also handle interruptions automatically, much like Advanced Voice Mode in ChatGPT.
> Under the hood, the Realtime API lets you create a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling, which makes it possible for voice assistants to respond to user requests by triggering actions or pulling in new context.
-
This sounds really interesting, and I see great use cases for it. However, I'm wondering if the API provides a text transcription of both the input and output, so that I can store the data directly in a database without needing to transcribe the audio separately.
-
Edit: Apparently it does.
It sends `conversation.item.input_audio_transcription.completed` [0] events when the input transcription is done (I guess a couple of them in real-time)
and `response.done` [1] with the response text.
[0] https://platform.openai.com/docs/api-reference/realtime-serv...
[1] https://platform.openai.com/docs/api-reference/realtime-serv...
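For anyone curious what consuming those two events might look like: here's a minimal sketch of routing incoming Realtime API events into a transcript log. The event names are the ones from the docs above; the payload field names (`transcript`, `response`, `output`, `content`) are my assumptions from reading the reference, so double-check the exact shapes.

```python
import json

def handle_event(raw: str, log: list) -> None:
    """Append user/assistant text from Realtime API events to a transcript log."""
    event = json.loads(raw)
    etype = event.get("type")
    if etype == "conversation.item.input_audio_transcription.completed":
        # assumed: the finished input transcription arrives as "transcript"
        log.append({"role": "user", "text": event.get("transcript", "")})
    elif etype == "response.done":
        # assumed: each output item carries content parts with text or
        # an audio transcript attached
        for item in event.get("response", {}).get("output", []):
            for part in item.get("content", []):
                text = part.get("text") or part.get("transcript")
                if text:
                    log.append({"role": "assistant", "text": text})
```

You'd call this from your WebSocket message handler and flush `log` to the database however you like.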
bcherry
5 hours ago
yes it transcribes inputs automatically, but not in realtime.
outputs are sent as text + audio. you'll get the text very quickly and the audio a bit slower, and of course the audio takes time to play back. the text also doesn't currently have timing cues, so it's up to you if you want to try to play it "in sync". if the user interrupts the audio, you need to send back a truncation event so it can roll its own context back, and if you never presented the text to the user, you'll need to truncate it on your side as well to ensure your storage isn't polluted with fragments the user never heard.
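A sketch of the rollback step described above: when the user interrupts playback, you tell the API how much audio was actually heard so it can trim its own context to match. The `conversation.item.truncate` event name and field names here are my reading of the Realtime API reference; verify them against the docs before relying on this.

```python
import json

def truncate_event(item_id: str, audio_end_ms: int) -> str:
    """Build the client event that rolls back an interrupted assistant turn."""
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,            # the assistant message being cut short
        "content_index": 0,            # assumed: the audio content part
        "audio_end_ms": audio_end_ms,  # how much audio the user actually heard
    })
```

You'd send this over the same WebSocket, and make the matching cut in whatever transcript you're storing locally.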
pants2
3 hours ago
It's incredible that people are talking about the downfall of software engineering - now, at many companies, hundreds of call center roles will be replaced by a few engineering roles. With image fine-tuning, now we can replace radiologists with software engineers, etc. etc.
epolanski
2 hours ago
What's the role of the software engineer besides setting this up?
Your example makes me think it will merely move QA into essentially providing countless cases, then updating them over time to improve the AI's data.
And is it really gonna be cheaper than human support?
And what's gonna happen when we find out (see how impossible it already is to reach a human at many companies) that this brings costs down (maybe, eventually), but revenue down too, because pissed-off customers will move elsewhere?
mrbungie
3 hours ago
Replacing call center roles with this is something I can see happening with the realtime api + voice output.
Radiologists, I'm not so sure; I doubt image-model fine-tuning + LLMs is all we need to get there.
skybrian
3 hours ago
People have been trying to replace radiologists for several years now. Maybe they'll get there, but it doesn't seem to be easy.
dcl
27 minutes ago
Radiologists will not be replaced. They will just have better tools.
tough
7 hours ago
saw velvet's Show HN the other day, could be useful for storing these: https://news.ycombinator.com/item?id=41637550
BoorishBears
5 hours ago
OpenAI just launched the equivalent of Velvet as a full fledged feature today.
But separate from that, you typically want some application-specific storage of the current "conversation" in a very different format than raw request logging.
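To illustrate the distinction: something like the shape below, invented purely for this example (not anything OpenAI or Velvet prescribes), keeps an application-level conversation record alongside, not instead of, raw request logs.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Turn:
    """One conversational turn as the user actually experienced it."""
    role: str   # "user" or "assistant"
    text: str   # transcript actually heard/shown, not the raw API payload
    ts: float = field(default_factory=time.time)

class Conversation:
    def __init__(self) -> None:
        self.turns: list[Turn] = []

    def add(self, role: str, text: str) -> None:
        self.turns.append(Turn(role, text))

    def truncate_last(self, text_heard: str) -> None:
        # mirror an interruption: keep only what the user actually heard
        if self.turns:
            self.turns[-1].text = text_heard
```

Raw request logs keep every byte for debugging; this keeps the user-facing truth, including truncations after interruptions.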