sashank_1509
9 months ago
I have played with it for 20 minutes and here’s my review:
1. The low latency responses do make a difference. It feels miles better than any other voice chat out there.
2. Its pronunciation is excellent and very human-like, but it is not quite there. Somehow I can tell instantly that it’s a chatbot; it feels firmly in the uncanny valley.
3. On the same note, if I were on a call and there was a chatbot on the other side, I could tell instantly. It’s a mix of the voice and the way it responds; it just does not sound like a human talking to you. I tried a bit to make it sound more human-like, asking it to stop trying so hard in conversation, to be briefer, etc., but I wouldn’t say it made things better.
And so my final review is, it is a big achievement over anything out there, nothing else comes close but it is like video game console graphics. You can instantly tell it’s not the real thing and because of that I find it harder to use than just typing to it.
achrono
9 months ago
>Somehow I can tell instantly that it’s a chatbot [...] because of that I find it harder to use than just typing to it.
That to me is precisely the reason to still use it without hesitation, because once it starts getting very-much-human, I don't know if I want to use it unless I really have to.
I think there's a lot of merit in keeping it sounding just a little artificial so that it is easier to have some psychological distance from what is already an overly anthropomorphic experience.
In religion/religious studies, there is the occasional debate of whether or not deities are/ought to be anthropomorphic, and atheism of course finds the whole notion ridiculous. Considering that our hopes and dreams with AGI can often feel religious -- maybe it's time to take that same lens towards AI.
empath75
9 months ago
I generally like it, but anytime I bump up against the guidelines, which you do if you want to do basically anything fun at all (singing!), it is the most obnoxious experience, because it feels 10x worse coming from a fake smiling personality that sounds almost human than it does over text.
mewpmewp2
9 months ago
But would you want it to feel exactly like a real person in the first place? I think for that it would have to make itself far less articulate, etc., as well.
threeseed
9 months ago
> I find it harder to use than just typing to it
Systems like this have existed since the 90s, e.g. Dragon, albeit in far more rudimentary form.
And the issues are exactly the same: (a) discoverability, (b) efficiency and (c) recoverability.
It is so much easier to have a screen with fixed options that you interact with, can easily see your journey and can go back for any mistakes. Versus with our voice which is the clunkiest, slowest and least precise input method we have.
zurfer
9 months ago
I understand how (a) discoverability and (c) recoverability are a problem, but what do you mean by (b) efficiency?
Most people talk faster than they can type.
Karunamon
9 months ago
That assumes perfect accuracy. If a command is misheard then you probably need to correct whatever is now in the wrong state, and then definitely reissue the original command. If it's text input then you have to do some select/correct dance. Both of these things take a lot of time.
threeseed
9 months ago
But that assumes that the speech recognition is perfect.
Which at least for those of us speaking non-US English is never the case.
And you only have to ring your bank and try to transfer money between accounts, with it reading out every account number and asking for confirmation every step of the way, versus a few clicks with a mouse, to see that for almost all operational tasks voice is cumbersome and inefficient.
danielbln
9 months ago
Whisper is incredibly robust, with a vast number of languages. I use it in German as well as English and it's incredibly reliable. Modern, transformer-based ASR is a different ballgame.
soco
9 months ago
Time for the mandatory mention of the Scottish voice recognition elevator sketch: https://www.youtube.com/watch?v=MNuFcIRlwdc
threeseed
9 months ago
It needs to be perfect. Every single time.
Because voice interfaces don't have the equivalent of a delete key or allowing the user to quickly select a different option.
infecto
9 months ago
Voice is the future for certain interfaces. It's only clunky, slow and imprecise because of the systems the voice is interacting with.
noahjk
9 months ago
> Versus with our voice which is the clunkiest, slowest and least precise input method we have.
Some related issues I have:
- my thoughts always seem to be jumbled when talking to AI
- I rush to talk quickly because any pause seems to trigger a response
- I worry words or DSL I use won’t be interpreted properly
This all leads to a pretty poor voice experience for me, and I usually forget half of what I want to talk about.
elif
9 months ago
That's merely because our conversational capability has become diminished.
I can't keep a conversation going with AI as easily as a person because of my poor skills, no fault of the AI.
I will improve over time, and there is no reason I won't be able to become as natural as Jean-Luc Picard telling his starship what to do.
corobo
9 months ago
I think the uncanny valley feeling is going to be there no matter what they come up with. I, and therefore my brain, know the voice is coming from a soulless machine[1] so it'll always feel a little off.
My perfect voice assistant would sound like Auto from Wall-E, which is supposedly a blend of macOS's Ralph and Zarvox voices. Along the lines of (bear in mind I just wrote this directly into the terminal and didn't spend any time actually blending them lol)
say -v ralph -r 180 "I'm sorry Dave. I’m afraid I can’t do that" & say -v zarvox -r 180 "I'm sorry Dave. I’m afraid I can’t do that"
And yeah I'm almost convinced that the whole voice interaction thing came about because they interact with the computer in Star Trek using voice commands... which is probably just because watching someone type everything into a keyboard would be some boring telly.
I assume there are folks that do use it and do like it, but do they like it more than just pressing buttons to do things? No worries of being misinterpreted or having to speak like a robot at Alexa because it's failed to turn the lights off 3 voice commands in a row now. It's awesome for accessibility, don't get me wrong, I'm talking in the sense of the primary and most commonly used interface.
[1] Not a criticism, fellow soulless machines.
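For anyone who wants to try the two-voice idea from the `say` one-liner above, here is a rough sketch of it as a reusable shell function. It assumes macOS's `say` command with the Ralph and Zarvox voices installed; `blend_say` and the `SAY_CMD` override are made-up names for illustration, not part of any real tool, and simply playing both voices at once is not a real blend.

```shell
#!/bin/sh
# Sketch only: speak one phrase with two voices at the same time.
# Assumptions: macOS `say` is available and the Ralph and Zarvox voices are
# installed. blend_say and SAY_CMD are hypothetical names for this example.
blend_say() {
    phrase=$1
    say_cmd=${SAY_CMD:-say}   # override SAY_CMD to dry-run without audio

    # Start both voices in the background so they overlap.
    "$say_cmd" -v Ralph -r 180 "$phrase" &
    "$say_cmd" -v Zarvox -r 180 "$phrase" &

    # Block until both background jobs have finished.
    wait
}
```

Calling `blend_say "I'm sorry Dave."` should play both voices together on a Mac. The trailing `wait` is there so the function does not return while either voice is still talking.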