Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds

175 points, posted 14 hours ago
by abuskorea

110 Comments

deskr

3 hours ago

Isn't it funny how some text changes the voice in your head? Now you're hearing the best voice. It's amazing. I tell you. It's the greatest voice. Everybody’s talking about it. They are saying it's incredible. They say they've never heard as beautiful a voice before.

cies

3 hours ago

I needed until "Everybody’s talking about it" to hear it in his voice :)

Please no spoilers!

vunderba

9 hours ago

I do think that voice cloning for personal usage has actual genuine uses - in fact there was a relatively interesting news article about a person who was irrevocably losing their voice who had their vocal pattern cloned.

https://www.voanews.com/a/illness-took-away-her-voice-ai-cre...

That being said, it does seem a bit bizarre that the repo's home page is proudly trumpeting the ability to co-opt other people's identities without their permission (and yes your unique vocal pattern is definitely part of your identity - I mean it's used in some forms of biometric data). They're doing the project a bit of a disservice.

VPenkov

3 hours ago

It does have actual genuine uses. I'm in the process of recording a series of tutorials for my peers but I'd like them to hear things in my voice so it doesn't sound like I have offloaded the work to someone else.

I don't know if this helps or harms the credibility but I can't really talk more than an hour without seriously straining my voice. So cloning it sounds like a great use-case for someone with a similar problem.

Looking forward to trying this.

vunderba

2 minutes ago

I like this idea. I've been playing with the idea of having all my blog entries have corresponding narration with my own voice but I'd love to see some kind of voice cloner + gradio interface that lets me make some adjustments to things like cadence, delivery, etc. (I mean beyond just making me sound like Alvin and the Chipmunks).

satvikpendem

6 hours ago

It's useful for some things, like satire. Presidents Play is a good series on YouTube that uses US presidents' cloned voices for comedic satire.

bbarnett

5 hours ago

A gun is useful to shoot someone, what has that to do with it being right or wrong?

satvikpendem

4 hours ago

Not sure you picked the most cogent example because lots of people will debate you on that topic...

onetokeoverthe

7 hours ago

> proudly trumpeting the ability to co-opt other people's identities without their permission

EXACTLY. Clone the wrong person's voice and it's game over.

giarc

2 hours ago

My neighbour is a detective and did a course on crypto scams. He told me scammers call someone's cell phone, record their voicemail greeting and use that to clone their voice. They can then have a very real life conversation with their grandparent and take their money.

I'm all for innovation, but I don't really see the use case of cloning random voices to make podcasts? Listening to Zuck interview Elon? ok...?

alias_neo

31 minutes ago

It's really easy for a technical person to do as well.

I use Coqui TTS[0] as part of my home automation, I wrote a small python script that lets me upload a voice clip for it to clone after I got the idea from HeyWillow[1], and a small shim that lets me send the output to a Home Assistant media player instead of using their standard output device. I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and get about 1x-2x (roughly the same time it'd take to say it, to process) using the large model.

Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then executed a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was myself speaking words I'd never spoken.

I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".

Tangentially, it really bugs me that most phone providers in the UK insist you record a "personal greeting" now before they'll let you check your voice mail box, I just record silence, because the last thing I want/need is a voicemail greeting in my voice confirming to some randomer I didn't want calling me, who I am and that my number is active, even more so knowing how I can clone any voice to a reasonably good accuracy with just a few seconds of audio.

[0] https://github.com/coqui-ai/TTS [1] https://heywillow.io/
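For anyone curious what a script like that roughly involves, here is a minimal sketch, assuming Coqui's XTTS v2 model and the `TTS.api` interface from the Coqui README. The function names, file paths, and model choice are illustrative assumptions, not the actual script described above, and the Home Assistant shim is left out.

```python
from pathlib import Path


def load_xtts(device: str = "cpu"):
    """Load a Coqui voice-cloning model (XTTS v2 is an assumed choice).

    The import is done lazily so speak_as() can be used and tested
    without the TTS package installed. Weights download on first use.
    """
    from TTS.api import TTS

    return TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)


def speak_as(tts, text: str, reference_clip: str,
             out_path: str = "out.wav", language: str = "en") -> Path:
    """Synthesize `text` in the voice heard in `reference_clip`.

    `tts` is any object exposing Coqui's `tts_to_file` method.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    tts.tts_to_file(
        text=text,
        speaker_wav=reference_clip,  # a few seconds of your own voice
        language=language,
        file_path=out_path,
    )
    return Path(out_path)
```

Usage would look something like `speak_as(load_xtts("cuda"), "Dinner is ready", "me.wav")`, after which the resulting WAV file can be handed to whatever media player you like.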

eurekin

2 hours ago

Technically, wouldn't a simple "Hold on, I'll call you back" test call stop that?

a2128

an hour ago

Scammers will use pressure and emotion. "Grandpa they put me in jail, I need you to bail me out please, there's not much time!" The last thing on the victim's mind is to hang up on what sounds like their crying distressed grandson to call them back. Sometimes even calling back won't work, the real grandson isn't picking up their phone and the scammer is saying that it's because they're in jail and their phone was taken.

stitched2gethr

2 hours ago

Yes, if the callee has reason to believe the caller isn't who they say they are. But this will never enter the mind of someone who's retirement age.

bagels

an hour ago

Some old people become very gullible.

pmarreck

11 minutes ago

> Linux and Mac OS are not supported

Well, that's a big old fail. Just a reminder: The given (and proper) home of open source is on an open source OS.

shannifin

12 hours ago

I don't have much real use for celebrity voices (other than fun experimentation), but I'd love to be able to clone my own voice and character voices for the purposes of creating audiobooks / audioplays without having to pay monthly fees with monthly usage limits. So I'm excited by this sort of project!

P.S. Are there any tools for synthetic voice creation? Maybe melding two or more voices together, or just exploring latent space? Would be fun for character creation to create completely new voices.

vunderba

9 hours ago

I'd be interested as well. This is where I imagine the space is going - particularly as the potential for litigation increases around cloning.

Game studios will spin up a bunch of unique virtual voices for all the dialogue of extras. It'll probably be longer before we see replacements of main characters though. There's been some research in speech-to-speech transference as well - this means that company employee A records the character B's line with the appropriate emotional nuance (angry, sad, etc.) and the emotional aspect is copied on top of the generated TTS.

thelittleone

10 hours ago

Have you tried eleven labs? I used that. Had to record 3 hours of training audio reading books and news articles. But the result was really good.

shannifin

9 hours ago

They're great! They just cost too much for how much output I want.

stavros

8 hours ago

How much did the training cost?

jerpint

3 hours ago

StyleTTSv2 is pretty good and open source, you can easily traverse its latent space for voice

dyauspitr

12 hours ago

I’ve used tortoise tts before and trained it on my voice and a mix of voices. It’s not perfect but still impressive.

youngNed

7 hours ago

I'm looking down the comments, but not really seeing much about what this actually is, by my very quick look, it's a front end for f5-tts with a yt-dlp and whisper?

Is there anything new in this?

dangoodmanUT

4 hours ago

Yeah they made an easy to use frontend. Don't be the dropbox guy

vulcanidic

2 hours ago

I completely agree with you. This is just a web front-end, and there's nothing new about it. However, it's very easy to use. It's not easy to create something like this.

yawnxyz

12 hours ago

> When Windows Defender mistakenly recognizes a [virus] as a Trojan, this is often called a 'False Positive'. To solve this problem, you can go through the following steps:

kfarr

12 hours ago

Yeah I also noticed the install instructions are to run this batch file that gets administrator access and starts downloading things…

gruez

12 hours ago

It's not any worse than all the projects on github with "easy" install instructions of "curl ... | sudo sh". Heck, even an innocent "sudo make install" command can easily contain a malicious payload.

elif

3 hours ago

Yea not to mention the entire homebrew ecosystem is built around trusting random people's shell scripts.

MacOS devs blindly trust it like it's the app store.

pmarreck

5 minutes ago

A simple `brew cat <packagename>` (possibly piping to bat if you want syntax highlighting) should spit out the ruby install formula for that package, for inspection.

tonyedgecombe

9 hours ago

It's not really the sort of tool that should require admin rights though.

chefandy

12 hours ago

Yeah it’s not great but it’s definitely not unusual. And windows reputation-based execution blocking does have false positives. I work for a company that has some very very popular products and some that only see a few dozen downloads per week, and despite being signed, it still takes a while for new versions to build enough rep to not trigger the block.

muglug

13 hours ago

These tools make it very easy to scam vulnerable people, and have pretty limited use otherwise.

mistercow

2 hours ago

It’s weird to me that people look at a technology and then assume from their reckoning that they know all the uses for that technology immediately. Most technological progress happens because someone notices a creative use for something that already exists which nobody else has noticed.

Larrikin

12 hours ago

I'm absolutely using celebrity voices for my Home Assistant voice. Amazon has spent the last couple years removing the voices for Alexa that people had paid for.

nickthegreek

an hour ago

I’d love some more info on using custom voices in HA. I have an esp32-s3-box that I am setting up holiday to do voice with HA.

chefandy

12 hours ago

To be fair, they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.

whaaaaat

10 hours ago

> they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.

I think you mean "steal the labor of an actor"?

chefandy

10 hours ago

Sure, and people that already agree with you will feel good reading it, but other people who don’t agree see it as an attack. It’s pretty much impossible to slip a new idea into someone’s mind if your approach made them slam the door before even considering it. So what’s the benefit of saying it like that?

gmueckl

10 hours ago

It calls attention to the ethical implications of using a part of someone else's personal identity without their direct involvement.

MrDrMcCoy

9 hours ago

Indirect involvement can still be ok within the confines of a license agreement for using the actor's voice.

ideashower

2 hours ago

> Indirect involvement can still be ok within the confines of a license agreement for using the actor's voice.

This assumes existence of a license agreement or likeness/right of publicity law that prevents unauthorized use. But this is far from the case.

Companies have shown willingness to use actors’ voices to create synthetic voices without permission, compensation, or regard for their livelihoods. [1][2][3]

[1] https://animehunch.com/popular-japanese-voice-actors-band-to...

[2] https://www.theatlantic.com/technology/archive/2024/05/eleve...

[3] https://www.yahoo.com/entertainment/morgan-freeman-calls-una...

MrDrMcCoy

an hour ago

Of course we need laws in place to require such licensing. The fact that people are having their voice stolen now does not mean that there should never be a case where a voice can legally be cloned and used by a third party.

gmueckl

6 hours ago

But this requires a legal framework that mandates such licenses and effective enforcement / prosecution of violations.

As far as I know, most countries are lagging behind when it comes to updating legislation to set binding rules around that.

anonzzzies

9 hours ago

They are pretty good for leaving messages for my blind friend. I generally find calling / voice texts a waste of time (I type and read far faster than I talk or listen, not to mention the ability to reread etc), but my blind friend prefers getting voice messages when on his phone and this works for us. I type and send and when he comes back with something, Whisper makes it into text for me.
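A minimal sketch of that last step, assuming the open-source `openai-whisper` package and its `load_model` / `transcribe` calls; the function name, the "base" model size, and the file name are illustrative assumptions rather than the setup described above.

```python
def transcribe_voice_message(audio_path: str, model=None) -> str:
    """Return the text of a voice message.

    If no model is passed in, load Whisper's "base" model (an assumed
    size; larger models are slower but more accurate).
    """
    if model is None:
        import whisper  # lazy import: only needed when no model is supplied

        model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"].strip()
```

Something like `transcribe_voice_message("reply.ogg")` would then return the spoken reply as plain text.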

casey2

9 hours ago

I like tools like these cause they make zero trust default even more obvious, and their "pretty limited use" is saving people hours of work.

chefandy

12 hours ago

Gen AI space to everyone else: “Your computer scientists were so preoccupied with whether or not they should, they didn’t stop to think if they could just do it anyway”

tsujamin

13 hours ago

Bulldozing grandma is just the cost of technological progress /s

uh_uh

12 hours ago

This tech is going to be ubiquitous, it's just too easy to distribute it. Grandma better start adapting now.

thejazzman

12 hours ago

Because people make it so, not because the natural order of the world gets us there

For some reason because we can validates that we should. Any jackass has the power of a research team of phds. It's kinda weird.

chefandy

12 hours ago

Indeed. Humans ascended to dominance because we can cooperate. This every-man-for-themself idea is an aberration, not the natural order as so many claim. It’s rather astounding to think otherwise considering the logistics of how we’re communicating right now.

uh_uh

12 hours ago

Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.

chefandy

10 hours ago

> Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.

Governments successfully collectively controlling dangerous things so they don’t fall into the hands of rogue bad actors fundamentally opposes the extreme individualist every-man-for-himself perspective in every conceivable way. It’s the absolute opposite of “it’s everybody’s responsibility to protect themselves because everybody else is only going to look out for themselves.”

And when individuals have that much leverage, collective action is the only conceivable way to oppose it. Some of those things might be cultural, like mores, some might be laws, some might be more martial. I don’t see how extreme individualism even theoretically could be more powerful.

uh_uh

10 hours ago

Are you suggesting government action against putting up code like this to GitHub? It’s ok if you are, but I want to put into more concrete terms what we’re talking about.

uh_uh

12 hours ago

Demanding responsible behaviour from everybody is not going to work. Some people don't care about negative externalities that much and it's enough if only a few of them decide not to play ball. So either grandma needs to adapt which will upset some people or distributing the tech should be regulated/prosecuted which will upset another group of people.

rockemsockem

10 hours ago

I think either way grandma needs to adapt though since Russian scammers and trolls are still going to run scams with fake voices.

123yawaworht456

8 hours ago

how very politically correct of you to pretend it's Russians who scam your grandmas

chefandy

12 hours ago

You can’t adapt around brain age making it more difficult to distinguish truth from lies.

casey2

9 hours ago

Yeah, I don't really get the hullabaloo; if granny doesn't have the mental fortitude to keep up with the times she shouldn't be managing her own money. I guess better her son/daughter than a scammer but both are better than letting money rot. Put granny on food stamps, pay $1 for her rent-controlled housing, and be done with it.

zelphirkalt

8 hours ago

Are we forgetting, that there are many elderly people without living descendants?

weq

10 hours ago

This tech is not only great for bulldozing grandma, it's great at stealing content from other creators and rebranding it as your own. Based on the GitHub page, it kind of seems like that's exactly what's being advertised as the use case. Steal content from the BBC, cut it up, and pull the noise out/vocals/revoice the content so the algorithm can't detect the plagiarism easily. The image detection is nowhere near the audio detection for copyright strikes.

There is a massive problem with this on YouTube. Pretty much every category on YouTube now has a host of these bots trolling content and playing the YouTube strike system like a banjo. There are channels dedicated to showing you how to set up these content mills. This tool can make you good money.

sfjailbird

7 hours ago

First generative AI destroyed Google search, and now it has pretty much destroyed YouTube. Social platforms, including this one, are probably goners too. We live in interesting times.

ranger_danger

12 hours ago

How many victims will it take for lawmakers to do something about this?

tiborsaas

11 hours ago

It's already illegal to scam somebody. While it's always positive to protect people more, what can be done here? Any alternative I can imagine is massively oppressive of the current state of the software industry.

You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.

You essentially have to regulate access to computing power if you want to prevent bad actors doing bad things using these sort of tools.

bryanrasmussen

10 hours ago

>You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.

Regulation is putting legal limitations on things, if it is impossible to regulate free and open source tools then it would be impossible to regulate murder and lots of other things, but it turns out it isn't impossible, sure - murder happens - but people get caught for it and punished.

Sorry, but this argument is much like the early internet triumphalism - back when people said it was impossible to regulate. Turns out lots of countries now regulate it.

tiborsaas

10 hours ago

It depends on what you do with the tool. Going with your murder analogy, if there's a stabbing epidemic what do you do? 1) Ban knives 2) invest in public safety 3) investigate the root causes and improve on them?

I'm also not sure what's so regulated about the internet besides net neutrality in certain countries. Of course the government can put limits on the network, like banning services, but it's easy since they are rather easy to target. With content traveling on the network it's much harder to say if it's legit or not.

> lots of countries

What about those countries that don't regulate it and people will keep pumping out better, leaner and faster models from there? Spreading software is trivial, all you achieve is the public won't be aware of what's possible.

The more I think about it, if anything should be regulated, it's a requirement to provide a third-party (probably government-backed) ID verification system so it would be possible for my mom to know it's me calling here. Basically kill caller ID spoofing.

bryanrasmussen

7 hours ago

>I'm also not sure what's so regulated about the internet besides net neutrality in certain countries.

generally things are regulated on the internet that were not going to ever be regulated because it was on the internet - example - sales taxes, perhaps you are old enough to remember when sales tax collection would not ever be enforceable on internet transactions - those idiot lawyers don't know, it's on the internet, the sale didn't happen in that country or in that state, no, sales taxes will never happen on the internet hah hah. It's unenforceable, it is logically undoable, there are so many edge cases - ugh, the law just does not understand technology!

oops, sales taxes now on internet purchases.

GDPR is another example of things that are regulated on the internet that basically most of HN years before it happened was completely convinced would be impossible!!

If this thing becomes too big a problem for societies, regulations will be made, with varying levels of effectiveness I'm sure.

And then in twenty years time we will be saying what, you can't regulate genital eating viral synths because a guy can make those in his garage and spread them via nasal spray, this technology is unstoppable and unregulatable, not like some open source deepfake library!!

bavell

3 hours ago

It's always amusing listening to techies' musings on law... lots of misunderstandings, I suspect due to the helpful but inaccurate "code but for humans" analogy.

Obligatory/relevant xkcd: https://xkcd.com/538/

vunderba

10 hours ago

Lots of countries impose exactly what specific regulations with respect to open source tooling?

The closest thing I can think of is maybe the regulation of DRM ripping tools, but they're still out there in the wild and determined actors can easily get ahold of them. So I'm not at all confident that regulation will have any measurable meaningful effect.

notTooFarGone

9 hours ago

The fable of the "determined actor".

The "determined actor" can get bombs, tanks, fissile material. There no one says "WHELP they can get it anyway so why bother regulating it LMAO" - somehow this is different in anything not physical?

bryanrasmussen

8 hours ago

>Lots of countries impose exactly what specific regulations with respect to open source tooling?

that something is not currently regulated does not mean it can never be regulated, further it does not seem likely that they would regulate open source tooling but rather some uses, and if the open source tooling allowed those uses then what would happen is -

github and other big sources of code would refuse to host it as containing not legally allowed things, so for example if they regulated it in the U.S then Github stops allowing it, and everyone moves to some European git provider.

At the same time bigger companies will stop using the library because liability.

Europe then regulates and it can't be in European git repos.. at some point many devs abandon the particular library because it's not worth it (I get it, this is actually for the love of doing the illegal thing so they won't abandon it, but despite the power of love most things in this world do not actually run on it)

Can determined actors get ahold of them and do the things with them the law forbids them to do, sure! That's called crime. Then law enforcement catches determined actors and puts them in prison, that's called the real world!

Will criminals stop - nope because there is benefit to what they're doing. Maybe some will stop because they will think screw it I can make more money working for the man. And some will be caught sooner or later. And maybe in version two of the regulations there will be AI enhancements - this crime was committed with AI allowing us to take all your belongings and add 10 years to your sentence and deprive you of the right to ever own a computing device again...etc. etc. And some people will stop and others will get more violent and aggressive about their criminal business.

I don't know necessarily what measurable meaningful effect means, for some people it will be measurable and meaningful, for some not, for some of society the regulation would in many ways be worse than what it is fighting against. I'm not saying regulation will solve problems 100%, I'm just saying this whole they can't regulate us thing because "TECH!!!" that developers seem to regularly go through with anything they set their eye on is a pipe dream.

mnau

7 hours ago

> impossible to regulate free and open source tools

BS. Can you imagine a legislation? Yes, thus it can be done.

As an early example, the CRA (Cyber Resilience Act) already contains provisions about open source stewards and security. So far they are legal persons, aka foundations, but could easily relate to any contributor or maintainer.

russell_h

11 hours ago

Serious question: what do you think lawmakers should do?

ideashower

2 hours ago

For people's image being used without their permission: strengthen U.S. right of publicity laws with private right of action, enabling people to sue for unauthorized use of their voice or likeness.

123yawaworht456

8 hours ago

how many victims did it take for lawmakers to do something about Photoshop/GIMP/etc?

rockemsockem

10 hours ago

Quit being a doomer or keep it to yourself. This reminds me of the sound boards that were popular in the early 2000s except way more versatile. Some things are just good for people to have fun and THAT'S OKAY.

whaaaaat

10 hours ago

People are allowed to recognize the realistic negative outcomes of technology, especially on a forum that frequently discusses the tradeoffs of modern, cutting edge technologies.

rockemsockem

9 hours ago

So many AI posts are overrun with this kind of complaining from folks with limited imaginations.

On a forum that frequently discusses technology with enthusiasm you'd think there'd be more enthusiasm and more constructive criticism instead of blanket write-offs.

Mordisquitos

9 hours ago

I would argue that being able to see the drawbacks and potential negative externalities of a new technology is not a sign of a "limited imagination", but quite the contrary. An actual display of a limited imagination is the inability to imagine how a new technology can (and will) be abused in society by bad actors.

Ukv

5 hours ago

Developing some insight on its negative potential could demonstrate imagination, but the claim that it could be used to scam people is pretty much just rote repetition by now - an obligatory point made in every article and under every post about this tech (and not something that I think actually works out in practice the way most imagine it, since cold-call scam operations that dial numbers at a huge scale expecting most not to pick up can't really find a voice clip prior to each automated call).

As for positive applications, some I see:

* Allowing those with speech impairments to communicate using their natural voice again

* Allowing those uncomfortable with their natural voice, such as transgender people, to communicate closer to how they wish to be perceived

* Translation of a user's voice, maintaining emotion and intonation, for natural cross-language communication on calls

* Professional-quality audio from cheap microphone setups (for video tutorials, indie games, etc.)

* Doing character voices for a D&D session, audiobook, etc.

* Customization of voice assistants, such as to use a native accent/dialect

* Movies, podcasts, audiobooks, news broadcasts, etc. made available in a huge range of languages

* If integrated with something like airpods, babelfish-like automatic isolation and translation of any speech around you

* Privacy from being able to communicate online or record videos without revealing your real voice, which I think is why many (myself included) currently resort to text-only

* New forms of interactive media - customised movies, audio dramas where the listener plays a role, videogame NPCs that react with more than just prerecorded lines, etc.

* And of course: memes, satire, and parody

I appreciate HN's general view on technologies like encrypted messaging - not falling into "we need to ban this now because pedophiles could use it" hysteria. But for anything involving machine learning, I'm concerned how often the hacker mentality seems to go out the window and we instead get people advocating for it to be made illegal to host the code, for instance.

wingworks

9 hours ago

Just a heads up, this is a trial, you have to pay to use it after 30 mins.

Easier and (cheaper?) to just use elevenlabs.

jamesy0ung

5 hours ago

I haven’t looked at the code, but can you just patch out the 30 minute limit?

batch12

3 hours ago

Looks to me like the app code is compiled into pyd files. One could try and decompile. Interestingly, it's licensed as MIT.

vulcanidic

9 hours ago

It’s a bit of a hassle, but after closing the Windows command, you can restart the program and use it indefinitely. The results you worked on will still remain in the workspace folder.

ldoughty

6 hours ago

Yeah, felt like it positions itself as open source project here and on GitHub, but buries the cost in other pages... Doesn't even say the subscription cost anywhere I could find (in English). Not a huge fan of this advertising model.

jncfhnb

13 hours ago

Is there speech to speech? I have been hoping for a model I can use to do voice acting with inflection

amrrs

13 hours ago

Do you mean Inflection's Pi?

bryanrasmussen

10 hours ago

I think they mean speech "in the style of" the same as repaint this picture in the style of Van Gogh, so they will do the audio and put the correct inflection on things but then rerender it with the voice of Katharine Hepburn for example.

on edit: example of course showing the difficulty as so much of Hepburn was her inflection.

jncfhnb

2 hours ago

More so I wish to voice act a line and then have the bot mimic it with a different voice but with the same contextual voicing.

“I’m going to kill you” could be delivered (laughing jokingly / seething with rage / ominously and creepily). I’d like a bot that can mimic the delivery in a different voice.

safeimp

12 hours ago

Project looks interesting. Are there short term plans to support MacOS?

If not, any recommendations for alternative projects?

harryf

11 hours ago

Have you considered supporting whisper-at - https://github.com/YuanGongND/whisper-at ? Being able to identify sounds on a timeline can be useful, e.g. a politician's speech and how the audience is reacting to it (e.g. clapping, applauding)

Hard_Space

4 hours ago

This doesn't appear to have any training facility, so its misuse would seem to be limited to the pre-trained voices supplied - for the casual user (and the ease-of-use seems to be the central issue in these comments).

grahamgooch

10 hours ago

Great stuff well done. What is your latency for real time Audio?

OceanBreeze77

5 hours ago

Are banks moving away from voice verification as a means to identity checks? It seems like it's getting easier and easier to clone voices.

joshdavham

11 hours ago

Looks cool! Also, is there a reason you went with a Web-UI instead of making a native desktop app?

morkalork

2 hours ago

Just need to use this with some recordings of Majel Barrett, make a voice interface for Claude's computer use agent and we'll be all set.

newusertoday

11 hours ago

are there any TTS models which are decent but can work on devices without GPU and have relatively low RAM(4GB)

XorNot

9 hours ago

The real utility of something like this is for reducing the creative costs of voice-acting. i.e. something like this is a massive boon for mod-makers where making fully voiced anything is a huge undertaking - i.e. while my friends and family could probably provide their voice if I asked, getting a decent recording and performance out of them is just not going to be possible.

But if I can get the performance I want and shift it to another voice, then fully voicing free works becomes very accessible (even better would be generative AI which could take a sample of what you want and re-render it into something which sounds like a more professional performance - voice in-fill I suppose).

ilrwbwrkhv

12 hours ago

There are a bunch of YC start-ups who are building new models and stuff in the space. I fear they are going to get decimated really soon as the quality of local llamas keeps improving.

whaaaaat

10 hours ago

> Imagine creating a podcast where Mark Zuckerberg interviews Elon Musk – using their actual voices?

I'm imagining it. It sucks to imagine.

I'm imagining it being used to scam people. I'm imagining it being used to leech off of performers who have worked very hard to build a recognizable voice (and it is a lot of work to speak like a performer). I'm imagining how this will be used in revenge porn. I'm imagining how this will be used to circumvent access to voice-controlled things.

This is bad. You should feel bad.

And I know you are thinking, "Wait, but I worked really hard on this!" Sorry, I appreciate that it might be technically impressive, but you've basically come out with "we've invented a device that mixes bleach and ammonia automatically in your bedroom! It's so efficient at mixing those two, we can fill a space with chlorine gas in under 10 seconds! Imagine a world where every bedroom could become a toxic site with only the push of a button."

That this is posted here, proudly, is quite frankly astoundingly embarrassing for you.

Ukv

6 hours ago

I'd claim the way most people imagine it being used for scamming, cold-calls impersonating someone the victim knows, doesn't really end up working out in practice because scam operations dial numbers at a huge scale expecting most not to pick up a "scam likely" call (or be away, or a dead number, etc.). Having to find a voice clip prior to each unanswered call would tank the quantity they're able to make.

For spear-phishing (impersonate CEO, tell assistant to transfer money) it's more feasible, but I hope it forces acceptance that "somebody sounds like X over the phone" is not and has never been a good verification method - people have been falling for scams like those fake ransom calls[0] for decades.

Not that there aren't potential harms, but I think they're outweighed by positive applications. Those uncomfortable with their natural voice, such as transgender people, can communicate closer to how they wish to be perceived - or someone whose voice has been impaired (whether just a temporary cold or a permanent disorder/illness/accident) can use it from previous recordings. Privacy benefits from being able to communicate online or record videos without revealing your real voice, which I think is why many (myself included) currently resort to text-only. There's huge potential in the translation and vocal isolation aspects aiding communication - feels to me as though we're heading towards creating our own babelfish. There's also a bunch of creative applications - doing character voices for a D&D session or audiobook, memes/satire, and likely new forms of interactive media (customised movies, audio dramas where the listener plays a role, videogame NPCs that react with more than just prerecorded lines, etc.)

[0]: https://www.fbi.gov/news/stories/virtual-kidnapping

yyuugg

2 hours ago

I think most people in America are more wary of foreign-sounding voices. If the person on the other end sounds like a good ol' boy, they get more trust.

Scammers don't have to sound like a specific person to be helped by software like this.

Ukv

an hour ago

That aspect feels to me like "I used to racially profile people on the street to judge risk, but winter clothing now obscures skin color at a distance". There are heuristics that give non-zero information but are harmful to use, with the cost borne by some marginalized group, and I don't see it as a negative for use of such heuristics to be made less feasible. Reducing people's use of accent as a factor would be a positive for the ~1.5B Indians that aren't scammers, for instance.

I think there's also an autonomy argument to be made, if the alternative is to the effect of ensuring that people cannot use tools to hide their accent (and particularly if, as above, the intent is so they can be discriminated against based on it). Even though it isn't something we've really been able to do before, I think it's generally a person's own right to modify their voice.

farzd

10 hours ago

You do realise this is not the first AI release to clone voices?

cess11

10 hours ago

Sure, and PoisonIvy wasn't the first RAT. So what? Does it get more ethical to assist fraudsters and so on once more people are doing it?

aboardRat4

10 hours ago

Without Linux support it is going to have a very limited audience.

okwhateverdude

10 hours ago

There is nothing in here that precludes you from running this on any OS that supports python + CUDA. They use miniconda for installation of python and python packages, but this could just as easily be a venv + system CUDA install or even better: a container. This is only one tiny Dockerfile away from running anywhere.
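As a rough sketch of the venv route mentioned above (not the project's actual installer), something like the following uses only the Python standard library. The helper names `create_env` and `cuda_driver_visible` are made up for illustration, and looking for `nvidia-smi` on PATH is only a heuristic for a system CUDA install, not a guarantee that it works.

```python
import shutil
import sys
import venv
from pathlib import Path


def cuda_driver_visible() -> bool:
    """Heuristic only: True if the `nvidia-smi` tool is found on PATH."""
    return shutil.which("nvidia-smi") is not None


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts the interpreter under Scripts\, other platforms under bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    python = create_env(Path(".venv"))
    print("interpreter:", python)
    print("nvidia-smi on PATH:", cuda_driver_visible())
```

From there the project's Python dependencies would be installed into that environment instead of a miniconda one; which packages and versions are needed depends on the repo and is not covered here.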

tgv

5 hours ago

I'm with the nay-sayers. Your product doesn't bring any good to this world, but it does make it easier to harm people. It's a disgrace.