qwertox
a year ago
> The Realtime API improves this by streaming audio inputs and outputs directly, enabling more natural conversational experiences. It can also handle interruptions automatically, much like Advanced Voice Mode in ChatGPT.
> Under the hood, the Realtime API lets you create a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling(opens in a new window), which makes it possible for voice assistants to respond to user requests by triggering actions or pulling in new context.
-
This sounds really interesting, and I see great use cases for it. However, I'm wondering if the API provides a text transcription of both the input and output so that I can store the data directly in a database without needing to transcribe the audio separately.
-
Edit: Apparently it does.
It sends `conversation.item.input_audio_transcription.completed` [0] events when the input transcription is done (presumably several of them, in real time),
and `response.done` [1] with the response text.
[0] https://platform.openai.com/docs/api-reference/realtime-serv...
[1] https://platform.openai.com/docs/api-reference/realtime-serv...
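For storing both sides in a database, a rough sketch of pulling the text out of those two events. The event shapes here are assumptions based on the linked docs (user transcript in a top-level `transcript` field, assistant text inside `response.output[].content[]`), so verify against the current API reference:

```python
def extract_transcripts(events):
    """Collect user/assistant text from decoded Realtime API server events.

    `events` is an iterable of event dicts as received over the WebSocket.
    Field names are assumptions from the API reference linked above.
    """
    rows = []
    for event in events:
        if event.get("type") == "conversation.item.input_audio_transcription.completed":
            # Transcript of what the user said, emitted once the turn's
            # input audio has been transcribed.
            rows.append({"role": "user", "text": event.get("transcript", "")})
        elif event.get("type") == "response.done":
            # The assistant's text rides along inside the output items'
            # audio content parts as a `transcript` field.
            for item in event.get("response", {}).get("output", []):
                for part in item.get("content", []):
                    if part.get("type") == "audio" and "transcript" in part:
                        rows.append({"role": "assistant", "text": part["transcript"]})
    return rows
```

Each dict in `rows` then maps straightforwardly onto a (role, text) table row.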
bcherry
a year ago
yes it transcribes inputs automatically, but not in realtime.
outputs are sent in text + audio, but you'll get the text very quickly and the audio a bit slower, and of course the audio takes time to play back. the text also doesn't currently have timing cues, so it's up to you if you want to try to play it "in sync". if the user interrupts the audio, you need to send back a truncation event so it can roll its own context back, and if you never presented the text to the user you'll need to truncate it there as well to ensure your storage isn't polluted with fragments the user never heard.
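a minimal sketch of that truncation event, assuming the `conversation.item.truncate` client event from the API reference (`item_id`, `content_index`, `audio_end_ms` fields; double-check the names against current docs):

```python
def make_truncate_event(item_id, played_ms):
    """Build a `conversation.item.truncate` client event.

    Sent when the user interrupts playback, so the server can drop the
    unheard tail of the assistant's audio from its own context.
    `played_ms` is how much of the audio the client actually played.
    """
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,      # id of the assistant item being cut short
        "content_index": 0,      # index of the audio content part
        "audio_end_ms": played_ms,
    }
```

you'd send this JSON-encoded over the same WebSocket; since there are no timing cues, mapping `played_ms` back to a character offset in your stored transcript is necessarily approximate.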
pants2
a year ago
It's incredible that people are talking about the downfall of software engineering - now, at many companies, hundreds of call center roles will be replaced by a few engineering roles. With image fine-tuning, now we can replace radiologists with software engineers, etc. etc.
mrbungie
a year ago
Replacing call center roles with this is something I can see happening with the realtime api + voice output.
Radiologists, I'm not so sure; I doubt that image model finetuning + LLMs is all we need to get there.
epolanski
a year ago
What's the role of the software engineer besides setting this up?
Your example makes me think it will merely move QA into essentially providing countless cases and then updating them over time to improve the AI's data.
And is it really gonna be cheaper than human support?
What's gonna happen when we find out (see how impossible it already is to reach a human when interacting with many companies) that this brings costs down (maybe, eventually), but revenue too, because pissed-off customers will move elsewhere?
cafed00d
a year ago
More than a majority of a software engineer’s time is spent on bug triage, reproducing bugs, simulating constituents in a test, and debugging fixes.
Doesn’t matter what the computer becomes — AI, AGI or God-incarnate — there’s always a role between that and the end-user. That role today is called software engineer. Tomorrow, it’ll be called whatever. Perhaps paid the same or less or more. Doesn’t matter.
There’s always an intermediary to deal with the shit.
Hmm, I wonder if that’s the roles priests & the clergy have been playing all this while. Except, maybe humanity is the shit God (as an end user) has to deal with
pants2
a year ago
I'd much rather talk to ChatGPT than a human support rep, provided they have the same level of ability (tools) to help you.
skybrian
a year ago
People have been trying to replace radiologists for several years now. Maybe they'll get there, but it doesn't seem to be easy.
dcl
a year ago
Radiologists will not be replaced. They will just have better tools.
karmajunkie
a year ago
the _role_ of radiologists isn’t going away, but as with software engineers, better tools means there are fewer needed to serve the same patient population. So it’s highly likely that there is going to be displacement within that industry as well.
visarga
a year ago
We really can't; it's a tool, not a radiologist. Medicine is a critical field that can't afford hallucinations and sloppiness.
djhn
a year ago
A radiologist makes critical life-or-death judgements. An algorithm will not, and should not, replace them.
falcor84
a year ago
A modern insulin pump also uses algorithms to make critical life or death decisions, should we replace these with doctors?
djhn
a year ago
I don't believe that is comparable. 1. Modern algorithms started out as a cronjob (which already worked better than the alternative) 2. Advances in applying optimal control theory are well known, (mostly) deterministic and explainable. They are in no way comparable to the black box that is the current state of computer vision. 3. Their failure can be readily observed and compensated for, since the patient will definitely notice. The same cannot be said about imaging.
visarga
a year ago
Does the insulin pump operate in a general space as radiology/diagnosis or is it constrained very precisely?
tough
a year ago
saw Velvet's Show HN the other day, could be useful for storing these: https://news.ycombinator.com/item?id=41637550
BoorishBears
a year ago
OpenAI just launched the equivalent of Velvet as a full fledged feature today.
But separate from that, you typically want some application-specific storage of the current "conversation" in a very different format than raw request logging.
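a sketch of what that split can look like — an application-level store of (role, text) turns per conversation, kept apart from the raw JSON frames you'd dump into request logs (names here are made up for illustration):

```python
import time

def append_turn(store, conversation_id, role, text):
    """Append one turn to an in-memory conversation store.

    `store` maps conversation ids to ordered lists of turns. This is the
    application-level record (who said what, when) — distinct from raw
    request/response logging, which would just archive the wire frames.
    """
    store.setdefault(conversation_id, []).append({
        "role": role,          # "user" or "assistant"
        "text": text,
        "ts": time.time(),     # wall-clock time the turn was recorded
    })
    return store
```

the same shape drops straight into a relational table or document store, and it's what you'd query to rebuild or display a conversation.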