Ask HN: What happens when AI-voice becomes good enough?

1 point, posted 9 hours ago by boa00

Item id: 48507696

7 Comments

damnesian

8 hours ago

I wonder whether, once it truly becomes indistinguishable from reality, people will increasingly seek direct experiences with fellow humans. We're already experiencing this as a family. AI is such a strange mental rabbit hole that we're suffering from "tailored for you" fatigue. When you just want objective answers, what pleases you most is NOT useful, and at this point in the curve you have to work harder to get LLMs to give you what you need rather than what they think will engage you more. My adult kids have started gathering to play board games and hang out in person, whereas three years ago they'd have been content to play online games together. We're hitting that threshold right now, where our biology is pushing back.

I don't think the future as presently painted for us is as guaranteed as those who would profit from it would like you to think.

kvasserman

9 hours ago

I think of it this way. LLMs are supposed to be good at generating text and writing, right? Well, they are not very good at it. They generate plausible content that superficially makes sense. Most people can easily tell AI-generated slop from human writing. I suspect that mimicking the human voice is several levels more difficult for LLMs than mimicking human content. The level of nuance humans produce in their speech is probably staggering. So I may be completely wrong, but so far I see no evidence to support the idea that LLMs' writing or speaking is going to get much better any time soon.

boa00

8 hours ago

Not sure I agree here.

Text is just human thought in its simplest form. Writing is about expressing ideas, and there is an almost infinite number of ways to express them. That's an extremely difficult task, and LLMs only "imitate" it to the best of their training.

This is not at all true for voice. There is an infinite number of possible voices, but only a finite number of tones and phonemes you can use to express the text.

It's a much easier technical problem; it's just much harder to gather proper data (you cannot just scrape Reddit and hope for the best, as LLMs do). And voice gets something like 1/100th of the funding LLMs get.

kvasserman

6 hours ago

Ironically, one of the things that makes AI-written text recognizable as AI is that it's too perfect. Too polished. Now think about speech patterns: they are much more than voice frequencies, tones, and phonemes. One can say the same phrase a gazillion different ways, with different pauses, cadence, inflections, intonation, and even pitch. Humans speak "imperfectly." It's very contextual too: in many situations, we voice the same words very differently. Again, it's possible that I don't know what I'm talking about, but every example of machine speech I've heard felt too mechanical to me, precisely because it lacked the nuance of how real humans speak.

ben_w

9 hours ago

Perhaps, but for what it's worth, when I first heard OpenAI's TTS demo, I assumed they were faking it and a human was speaking because it had "um"s and "err"s.

Right now, the main thing making these things recognisable is there's so few voices. The voices themselves are basically celebrities, albeit in the same way as some annoying D-list celebrity who somehow managed to get a bajillion contracts for advertising cheap tat.

Given that LLM slop is currently rapidly degrading the trustworthiness of search results (even more so than SEO already had), it's probably for the best if the major AI providers don't release a bunch more voices.

Jblx2

8 hours ago

Dystopian Future 3: Elderly people getting scammed out of their life savings by phone scammers who sound indistinguishable from their grandchildren. (The ones whose grandchildren had their voices scraped from TikToks.)

kvasserman

6 hours ago

This, in fact, is already happening. Elderly people are getting duped by scammers using the voices of their relatives and children.