As a musician, the things I want most from generative AI are:
1. Being able to have the AI fill in a track in the song, but use the whole song as input to figure out what to generate. Ideally, for drums this would be a combination of individual drum hits, effects, and MIDI so I'm able to tweak it after generation. If it used the Ableton effects and Drum Rack, that would be perfect.
2. Take my singing and make it both sound great and like any combination of great singers (e.g. give me a bit of Taylor Swift combined with Cat Power)
I've had a play with the style transfer between singers (bullet point 2 above) but when I last tried it, it was garbage in / garbage out, and my singing is garbage.
What I don't want: To just generate a whole song. Adobe does this style of assistive AI well in the photo editing space but no one seems to have brought it to audio yet.
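To make point 1 concrete, here's the rough shape of the call I'm imagining. Everything in it is made up (the function name, the DrumTake fields); no tool I know of exposes anything like this, it's just a sketch of the output format I'd want:

```python
from dataclasses import dataclass

# Hypothetical interface only -- these names don't exist in any real tool.
# The point is the output format: editable MIDI plus per-hit samples and an
# effect list, rather than a flat audio render.
@dataclass
class DrumTake:
    midi_path: str            # note data I can tweak afterwards
    sample_paths: list[str]   # the individual drum hits used
    effect_chain: list[str]   # effects to recreate in a Drum Rack

def fill_drum_track(song_mix_path: str, style_hint: str) -> DrumTake:
    """Generate a drum track conditioned on the whole song (not implemented)."""
    raise NotImplementedError

# take = fill_drum_track("my_song.wav", "laid-back breakbeat, swung hats")
```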
> RapMachine
> Fine-tuned on pure rap data to create an AI system specialized in rap generation
> Expected capabilities include AI rap battles and narrative expression through rap
> Rap has exceptional storytelling and expressive capabilities, offering extraordinary application potential
Using a certain other music generator I got it to accidentally say ***. It said it with a Latino American accent too.
In fact, for whatever reason, this tool couldn't use a typical AAVE voice. Just Sage Francis / Atmosphere-style dictionary raps and a few Latino American ones.
A big limitation of AI slop is that it tries not to offend anyone.
Art that can’t even try to offend is barely art.
I want to play something on my keyboard (the only instrument I am slightly OK at) and then be able to tell it to play it with a saxophone and describe exactly how I want it played. I don't need an AI to create a song for me; I need 100 session musicians at my disposal to create the song I want. I am very excited about having that type of AI.
Interesting how there is no mention of how the training data for this was collected. This does sound quite a bit better than Meta's MusicGen, but then again that model was also trained on a small licensed dataset.
Yes, please ruin music. Ruin everything you can. As long as you can build it, you should ruin it. There's really no limit. It's the masses who will actually do the ruining, so those building the technology are totally blameless. And you might even make some money, so it's all worth it.
> aggressive, Heavy Riffs, Blast Beats, Satanic Black Metal
Result: A generic pop-rock song without riffs or blast beats. Not even power metal or corset core, let alone anything even slightly resembling Black Metal.
Yup. Still doing what I expect from AI music.
The diagram is super vague. How are the lyrics encoded? What does the encoder look like inside? What is the input size, input format, output size, output format? Are the three encoder outputs added? Concatenated? When MERT and m-HuBERT combine, are they added? Multiplied? Subtracted? Concatenated?
I really wish people could make better diagrams.
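For what it's worth, the two most common readings of a merge point like that are element-wise addition after projecting to a shared width, or concatenation followed by a projection. Purely as an illustration of the ambiguity (the feature widths and projection layers below are my guesses, not anything stated in the paper):

```python
import torch
import torch.nn as nn

# Two plausible ways the MERT and m-HuBERT feature streams could be fused.
# Neither is confirmed by the diagram; shapes here are invented for the example.
batch, frames, d_mert, d_hubert = 2, 1000, 1024, 768

mert_feats = torch.randn(batch, frames, d_mert)      # music/semantic features
hubert_feats = torch.randn(batch, frames, d_hubert)  # speech/vocal features

# Option A: project to a shared width, then add element-wise.
proj_hubert = nn.Linear(d_hubert, d_mert)
fused_add = mert_feats + proj_hubert(hubert_feats)           # (batch, frames, 1024)

# Option B: concatenate along the feature dimension, then project down.
fused_cat = torch.cat([mert_feats, hubert_feats], dim=-1)    # (batch, frames, 1792)
proj_cat = nn.Linear(d_mert + d_hubert, d_mert)
fused_cat = proj_cat(fused_cat)                              # (batch, frames, 1024)
```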
Man, this whole topic hits way harder than I expected. AI taking shots at music creation makes me feel a bit hyped but also kinda iffy, especially when I hear people say it plays it way too safe. You think keeping things safe in AI art helps anyone actually level up or just holds us all back?
Really interesting — we're seeing more efforts now to bring the "foundation model" approach to creative domains like music, but I wonder how well these models can internalize musical structure over long time scales. Has anyone here compared ACE-Step to something like MusicGen or Riffusion in terms of coherence across entire compositions?
How do the quality and prompt adherence compare to Suno v4?
VPS plan: 4 cores, SSD
Disk space: 200 GB
CPU cores: 4
RAM: 4 GB
OS: Ubuntu Server 22.04

Can it run on a VPS this small?
Is there a demo hosted somewhere?