pmarreck
a year ago
With the amount of debt they have, they don't really have a choice.
That said, thanks to the efforts of Meta and others, open-source AI running on your desktop is moving along at quite a pace. I can generate images superior to what is available via DALL-E/Meta/ChatGPT, and more or less completely uncensored, by running FLUX.1[dev] locally (albeit slower) in the https://drawthings.ai/ app. I can also do language model work locally on my M1 MacBook Pro that comes very close to, and in some cases surpasses, GPT-4o (albeit slower, and also uncensored, if that's what you want). And now, thanks to Llama 3.2, I can process images locally as well.
What's still missing: a good substitute for ElevenLabs' still-amazing ability to create realistic voice models of people from a sample, voice input, multimodal interactive voice chat (i.e. Advanced Voice), more easily accessible function-calling running locally (regarding web requests, you might be able to block OpenAI, but you can't block curl running from my house!), and o1-style chain-of-thought reasoning. I think we have enough clues about how that last one works that we should see something to compete with it any day now.
(going on a tangent for a minute...)
I really want a whole-house computer that runs locally and is in charge of everything, responds like an LLM to voice commands in any voice I want (recognizing who is speaking as well), knows a bunch of things about me, has a personality I can customize like OpenAI's "custom instructions", and executes whatever functions I give it access to (searching the web, running code it's written, etc.), plus can stick to schedules. I'd be happy to pay a small licensing fee for the use of someone's voice.
I have a nightly job that coaxes me to bed at 11pm in Raphael's voice (from Baldur's Gate 3) using a dynamically-generated script from Claude. It's absolutely amazing and Andrew Wincott should seriously reach out to me to try to make a product out of it because I seem to have hit the jackpot with his voice model...
Here are 2 examples of its output: https://vocaroo.com/1kotE1UgYCoy https://vocaroo.com/1lUyZbGIHPIH (I do not know how long this will stay up on this free service, but perhaps that's for the best...)
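For anyone wanting to try something similar, here is a minimal sketch of the timing and prompt-building half of such a nightly job. It assumes plain Python with only the standard library; the actual setup behind the clips above is not described, and `seconds_until` and `build_prompt` are hypothetical names made up for illustration. The LLM call and the text-to-speech step are deliberately left out.

```python
from datetime import datetime, timedelta


def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `hour`:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Already at or past that time today, so aim for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def build_prompt(name: str, now: datetime) -> str:
    """Build the text that would be sent to a language model (hypothetical wording)."""
    return (
        f"It is {now:%H:%M}. In character, write a short monologue "
        f"coaxing {name} to go to bed."
    )


if __name__ == "__main__":
    now = datetime.now()
    print(f"Next run in {seconds_until(23, now):.0f} seconds")
    print(build_prompt("the listener", now))
```

In practice a cron entry or systemd timer would probably replace the delay calculation; computing it by hand just keeps the sketch self-contained.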
d13
a year ago
ElevenLabs just uses Tortoise with its own high-quality recorded voice data. You could definitely do the same:
pmarreck
a year ago
I've been playing with Tortoise-TTS-v2 and it's quite slow, although I tried it in WSL, which may or may not have direct access to the GPU and may be defaulting to CPU. I will play with it some more on my Linux laptop/Macs, but thanks for the heads up.
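A quick way to settle the GPU-versus-CPU question, assuming a PyTorch-based setup, is to ask PyTorch what it can see. This is only a sketch: `pick_device` is a hypothetical helper name, and it says nothing about which device Tortoise itself ends up selecting.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU is visible (e.g. WSL with GPU passthrough working)
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu` inside WSL on a machine with an NVIDIA card, the slowness is likely down to the GPU not being visible rather than to the model itself.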
pmarreck
a year ago
Just noticed that there's a Tortoise-TTS-v2 on HuggingFace (although the last update was 2 years ago). Certainly something to start playing with!