Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE
The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.
Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.
AI is incredibly versatile: basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.
Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you're interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.
EQG and G5 are not welcome.
>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.
>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.
>Online speech generation:
haysay.ai
>Active tasks:
Research into animation AI
Research into pony image generation
>Latest developments:
http://ponepaste.org/10865
>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
>Clipper's Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx
>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.
Last Thread:
>>42103996
FAQs:
If your question isn't listed here, take a look in the quick start guide and main doc to see if it's already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy
>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit
>I want to know more about the PPP, but I can't be arsed to read the doc.
See the live PPP panel shows presented on /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz
>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There's always more data to collect and more AIs to train.
>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. 5.1 is generally required unless you have a source already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
>What about fan-imitations of official voices?
No.
>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.
>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we'll take a look.
>I have an idea!
Great. Post it in the thread and we'll discuss it.
>Do you have a Code of Conduct?
Of course: 15.ai/code
>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm
PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97
Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>woken up just 5 minutes after thread passed page 10
Stupid fuckers and their "1 post by OP with retarded one bait sentence" threads.
Anyhow, are you guys busy doing entries for antithology or what? (I know I am; I'm sitting on like 5 half-assed ideas that still need doing.)
>page 9 after less than 4 hours
Board activity, but at what cost?
>>42161566
The cost is our sanity.
Is there a FLA of Fluttershy's cabin interior or her bedroom in the leak on web archive called MLP FLAs? I tried Dragonshy, Part 1 of Friendship is Magic and Stare Master but it's not in those...
>>42163358
From what quick google-fu tells me, the list of episodes with fully leaked assets that we should have access to (all from season 8) is as follows:
6 - "Surf and/or Turf", 7 - "Horse Play", 8 - "The Parent Map", 9 - "Non-Compete Clause", 10 - "The Break Up Break Down", 11 - "Molt Down", 13 - "The Mean 6"
I swear we had some bits and bobs from other episodes, but I can't seem to find a proper list of what is (and is not) archived.
There is this scene from Super Speedy Cider Squeezy 3000 (and I think in the later season eps with Nightmare Night and the one where Discord suffers from being "normal" as well)?
>https://codeberg.org/nak/sample-neko
Here is a tool that I spotted on the interwebs that lets you easily list and move 1k+ sound clips from one folder to another.
I feel like it could be really useful to Anons here organising their folders for production of big or small projects.
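For anons who would rather script it, the same "list, then move" workflow can be sketched with nothing but the Python standard library. This is only a minimal sketch and not how the linked tool works: the function names `list_clips` and `move_clips`, the extension list, and the keyword filter are all assumptions made up for illustration.

```python
import shutil
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever your clip folders contain.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg"}


def list_clips(src, keyword=""):
    """Return audio files under src whose file name contains keyword (case-insensitive)."""
    src = Path(src)
    return sorted(
        p
        for p in src.rglob("*")
        if p.is_file()
        and p.suffix.lower() in AUDIO_EXTS
        and keyword.lower() in p.name.lower()
    )


def move_clips(clips, dest):
    """Move the given clips into dest, skipping any name that already exists there."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for clip in clips:
        target = dest / clip.name
        if target.exists():
            continue  # skip rather than overwrite an existing clip
        shutil.move(str(clip), str(target))
        moved.append(target)
    return moved
```

Something like `move_clips(list_clips("sfx", "laugh"), "project/laughs")` would then collect every clip with "laugh" in its name into one project folder (both folder names are placeholders).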
>>42165897
was literally thinking about how I needed sound effects from the show for a project I was doing
more specifically little things like character laughs or snorts n stuff
>>42165947
A lot of those are in Clipper's Master File Part 2:
https://mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ/folder/EMZF3ApB
>https://files.catbox.moe/vx3yr9.mp3
>>42164184
Ugh, is there a way to get the pop-up that appears when you first download a torrent, to select which files to download, again? I've got the magnet for the leak.
Best tools if I want to gen Cozy Glow lines?
>>42169246
I'm guessing you want to run it locally and don't want to use haysay? Get yourself Python and GPT-SoVITS.
>https://github.com/effusiveperiscope/GPT-SoVITS
>https://huggingface.co/therealvul/GPT-SoVITS-v2/tree/454406eb40b63c5571f33c29f4fd8bac197131d6/CozyGlow-SVe24-GPTe48
>>42169373
Which haysay architecture has the best Cozy?
>>42169376
I'm pretty fond of the RVC one, BUT it is heavily dependent on the input audio.
>>42169373
What's the current SOTA for voice-to-voice conversion? Preferably something that can be fine-tuned. The latest GPT-SoVITS v4 is very good, but it doesn't sound like the reference, so I think an additional step is needed.
>>42169924rvc and so-vits are still the king, I think some Anons posted some other "minimal dataset voice cloning" stuff in the past but none of them seem to stick around (with the github codefags making their training process way too complex, or pulling requirements out of their assess).
I heard through the grapevine that 15.ai is coming back, anyone heard about that?
>>42170546>https://desuarchive.org/mlp/thread/41706417/#41711970Pretty sure that site is still ded, and it will stay that way for very long time (aka 4ever). if any new code were to be produce by 15ai it would need to be some kind of collaboration with other codefags to avoid being chased by tiny hat lawyers , and by logic of nobody sharing such news around means it's not happening .
>>42169924
GPT-SoVITS is mainly intended for text-to-speech. The reference audio is only for providing an emotional style. For speech-to-speech, you should stick to RVC.
Is Haysay down for anyone else? I can't seem to reach the site at all.
>>42171965
https://files.catbox.moe/4sz8fc.mp3
The pretty mare voice site seems to be working fine for me. Did you try a different browser, anon?
>>42171853
Why wouldn't I be able to do GPT-SoVITS => RVC?
>>42172693
Yeah, you can. One problem is that RVC sometimes derps out the outputs when you give it lines from the same character; sometimes it depends on what kind of note the clip is hitting, and sometimes the electronic goblins are just messing about. So test out different TTS voices to see which one works best with the RVC character you want to output.
>https://nitter.space/shweta_ai/status/1912536464333893947
I need this for mare content, so I can finally get AJ to speak in a deep south accent without fluffing around with different word spellings, or get Rarity to pronounce words in a way more posh manner.
>>42166202
>>42166241
Crossposting from the /chag/ thread: they are planning on doing some collaboration with the /robowaifu/ guys to start making IRL robot ponies. Very cool, and good luck to you!
First actually good local music model, like suno v2 quality. Fast as fuck as well.
https://www.reddit.com/r/LocalLLaMA/comments/1kg9jkq/new_sota_music_generation_model/
>>42173899
Also has LoRA training already, could 100% train pony singing.
https://ace-step.github.io/
https://github.com/ace-step/ACE-Step
Passes the nigger test.
https://vocaroo.com/11MoCQ68jiLY
And this is fun.
>>>/g/105183843
>>>/g/105184228
I'd love to try with some MLP songs, but I'm a VRAMlet with 6GB and I don't think I can run this yet.
>>42174105
Uhh, the collab file they provided seems to only do "text2music". Could you/somebody explain how that anon re-edited the OG song with new shitpost lyrics?
>>42174936
Oh, just noticed it's in the repair->upload section. However, I tried to do a "replace X lyrics with new lyrics" and it really seems to suck ass at it, so I'm not sure if the anon that made the above song was lucky or had enough autism to spend several hours trying all kinds of combinations to make it work.
>>42175015
Nope, people posted multiple results in that thread where it Just Worked. The only thing I saw is that the quality will get worse the more the lyrics are changed.
>>42175321
Oh. I was trying to go for a full lyric replacement. I guess this GitHub is a step in the right direction; it's just not ready for my exact autistic requirements.
Hopefully by next year we will get improvements on it, because I have some text parody ideas.
>>42175929
I saw someone say that you can separate the stems and get better results. Perhaps you could edit portions of the lyrics one at a time, then mix them back into the instrumental.
Question:
During training, can I use both files tagged as clean and files tagged as noisy?
>>42176140
Sure, but keep in mind that the quality of the audio outputs may suffer from it, especially if the ratio of good clips to noisy clips skews towards the noisy side.
And since there are characters that have pretty much nothing but noisy audio (like Tree Hugger), the end results may vary from "kind of bad" to "surprisingly decent".
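One way to keep an eye on that ratio before training is to compute it from the clip tags and, if needed, cap how many noisy clips go in. This is a minimal sketch under stated assumptions: the `(name, tag)` tuple layout, the `"clean"`/`"noisy"` tag strings, the function names, and the default cap of 0.3 are all made up for illustration and are not a recommended value from the thread.

```python
def noisy_fraction(tags):
    """Fraction of clips tagged "noisy" in a list of tag strings."""
    if not tags:
        return 0.0
    return sum(1 for t in tags if t == "noisy") / len(tags)


def cap_noisy(clips, max_noisy_fraction=0.3):
    """Keep every clean clip and add noisy clips only while the noisy share
    of the kept set stays at or below max_noisy_fraction.

    clips is a list of (name, tag) tuples; the 0.3 default is an arbitrary
    illustrative threshold, not a tested recommendation.
    """
    clean = [c for c in clips if c[1] == "clean"]
    noisy = [c for c in clips if c[1] == "noisy"]
    kept_noisy = []
    for clip in noisy:
        total_after = len(clean) + len(kept_noisy) + 1
        if (len(kept_noisy) + 1) / total_after > max_noisy_fraction:
            break  # adding this clip would push the set past the cap
        kept_noisy.append(clip)
    return clean + kept_noisy
```

For a character with almost no clean lines this cap would throw away most of the data, so in that case it is probably better to train on everything and just accept the variance described above.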
Question to the Anon that was working on OpenUtau DiffSinger models: are you planning on creating models for Rarity and Fluttershy?
>>42176608
Truth be told, I was planning on it eventually, but I don't know if I really want to anymore. Twilight, Applejack, Rainbow Dash, and Pinkie Pie are a bit spotty as is, and between Fluttershy's abysmally low amount of singing data (from what I could find) and just not feeling up to it for her or Rarity, I don't think either of them are gonna be made into models anytime soon. Keep in mind, I don't just train one thing: I have to train the acoustic model, then the variance model, then the pitch model, and then fine-tune the vocoder, which both takes a lot of time and takes a lot out of me. I'm not saying it won't ever happen, because I do feel weird about leaving things with just the four I did, but I can't for the life of me bring myself to do the other two just yet. But they'll come one day, hopefully.
Speaking of model training, there's still a good few voices that're absent on RVC. It'd be nice to see Moondancer and Cadance and whoever else hasn't been trained yet. Cadance has a model for RVC, but it's super noisy.
>>42176713
>Moondancer
Huh, you are correct. I will see if I can train her RVC model.
>>42177027
Hmm, not great news. I've checked the mega, and even when removing only the unusable, very noisy audio lines, there is still only 1m50s of audio, which is less than the ideal 3m, but I can still try.
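A quick way to check how much audio a character folder actually holds is to sum the clip lengths. This is a minimal sketch, assuming the clips are uncompressed PCM `.wav` files in a single folder (Python's standard `wave` module cannot read FLAC or MP3, so those would need converting first); the function names are made up for illustration.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def total_seconds(folder):
    """Summed length of every .wav file directly inside folder."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def format_mmss(seconds):
    """Format a duration like the thread does, e.g. 110 seconds as 1m50s."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs:02d}s"
```

Comparing `total_seconds(folder)` against a target such as 180 seconds (the 3m mentioned above) tells you up front whether a dataset is on the thin side.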
>>42176713
>https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Moondancer
>https://vocaroo.com/1hV4kTcwCp3E
Here she is. The result isn't half bad, but for some reason her voice seems to slip into Rarity's voice range. And of course, male input voice lines will sound a bit rougher in conversion.
>>42178450
more years! TRUST THE PLAN!
>>42177542
Awesome, thanks. I look forward to trying it once I have the time.
>>42176713 >>42178958
>Cadance
>https://voca.ro/188F1imvN2L7
>https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/MLP_Cadance_Clean
RVC model of Cadance, trained on clean audio only.
https://files.catbox.moe/x41lrp.wav
I generated this with this repo: https://github.com/CookiePPP/cookietts
Model from: https://drive.google.com/drive/folders/1nTyn6qr2b76aOE430trasuZj0Kr2H_ya
(Tacotron2: tt2_outdir_p3_2_0.5DFR_0.0Dropout)
(Hifi-gan cp_hifigan_universal44Khz_mlpft)
>Maybe I will create a better vocoder and Notebook
>>42181482
That's interesting, Anon, but I'm not sure how it will compare with all the new tech, since Tacotron is almost five years old.
>>42181575
I feel like there isn't much coming out for pony specifically in recent times though.
Does anyone want any bonus features that I can add?
>>42183033
I know, right?
>>42183358
>To the Inference Script
>>42183358
Well, I would like the offline GPT-SoVITS script to also copy the haysay options for the automatic emotions drop-down menu, as well as the audio clip slow-down/speed-up stretch settings, but that's something Vul would need to add to his webui script.
>nitter.space/jason_kint/status/1921546181357838531
>nitter.space/LuizaJarovsky/status/1921286826402422927
>ai copyright to affect the "commercial use"
Time to split the hairs on what counts as "commercial use" and what doesn't. Also good luck trying to force this on china and their no-fucks-given R&D departments.
>>42184597>muttmericaPhew, I thought it was actually serious.
>>42184597
>america keeps digging its grave in the name of "progress"
The Soviet Union fell behind in technology because the government tried to control things, but yeah, let's not learn anything from that.
>>42184600
I can see Disney and such trying to push for it, just like they did with hundreds of years of copyright laws, but as Anons on /g/ pointed out, all the big league companies need to do is buy portions of semi-big publishing companies and claim that, retroactively, all the existing books in the system were allowed to be used in AI training.
>>42184613
Tell me about it. I remember reading a biography of an electrician who was bribed to "not be in a hurry" when repairing the wheat moisture measuring apparatus, because the assigned inspector could then use a rule of thumb to decide how much moisture was in the transported grain, deduct from the farmers' pay, and pocket the difference.
>>42179060Your local AI still can't sing worth a shit.
Evolve or die, PPP.
Voice acting requires a certain melodic way of talking which your current model does not support, 3P General.
>>42185313There are no more than ten anons itt, all namefags, that know their shit, and they lead very busy lives. This thread was just anons enjoying the fruits of others' labors. There are no more fruits to enjoy, or worth enjoying so the Pony Preservation Project has become the Pony Preservation Project Preservation Project. It's over.
>Mareification not required.
>>42185322
Yeah, back in 2019-20 everybody was hyped, since the show had only just ended and the board was still pretty alive (and with everyone locked up, all they could do was make pony content without any distractions). Now a lot of the AI tools have become available (music, art, even animation), but everything is kind of disjointed and difficult to put together.
I feel Anons just need to find a proper spark, something that would be fun to work on, like randomly spotting a song and wondering how it would sound if it was done by pony.
>https://files.catbox.moe/qg2qn5.mp3
Anyhow, VS singing the new Ye song, OG cover from TowerGangToad. I really wanted to use Zecora's voice, but the voice clips just wouldn't come out right from either of the model types.
>>42186106
Don't forget that a lot of new stuff gets immediately corpo'd these days too. Shit like that stifles innovation.
>>42185313
>melodic way of talking
China is the future
I will save this general.
>>42186245
Try replicating S1 Luna's voice. Chip in some money and put Tabitha to voice it.
>>42188332
>S1 Woona
It's technically doable.
https://huggingface.co/spaces/Plachta/VALL-E-X
https://desuarchive.org/mlp/thread/40503961/#40518915
It will just take about 1~6 months of non-stop audio generation until the artificial dataset has five minutes' worth of audio clips.
>>/wsg/5872172
I want this, but for ponies. Dubbing in my country is cursed: the VAs will either put their energy into emphasizing the wrong aspect of a character (a young rogue-like adventurer will instead sound like a snotty little shit), give no shits about acting at all, or the role will go to somebody who completely does not fit the character.
>>42188537
>https://files.catbox.moe/yck7ps.mp4
fug, crossposting failed
>>42188455
Would be funny if that happened.
>>42181482
I've updated the synthesis script, and now these are the new results
>https://files.catbox.moe/tv8c4i.wav
Does it sound like those 48 kHz MMI models, or does it sound like newer tech?
https://www.minimax.io/audio
https://minimax-ai.github.io/tts_tech_report/
>>42191078
Is that TTS or voice conversion? It still has that funny buzzing that Tacotron 2 / TalkNet models suffered from, so it's kind of hard to tell.
>>42191153
Hmm, the website doesn't seem to be more useful than other TTS sites. BUT the paper is interesting; if the cloning from 5 seconds is not completely cherry-picked bullshit, I would love to be able to use it.
>>42191285
TTS.
>The repo:
>https://github.com/TheDevloper2023/cookiettsfork/tree/master/CookieTTS
>which is a fork of https://github.com/CookiePPP/cookietts/tree/master
Hi, it's been a while, hasn't it?
Here's an alpha website that you can play around with: https://alpha.15.dev/
The backend is currently running on just two GPU instances, and I've set the inference batch size to 1 since this new model requires a lot more computational power than it did two years ago. I can increase the number of GPUs depending on how long each request takes.
More characters and emotions will come soon. Feel free to report any bugs or issues here, too.
>>42195922
I hate your guts, sleazebag
>>42195922
>https://alpha.15.dev/examples
nice examples kek
>>42195922
>https://alpha.15.dev/
Can I send this outside of this thread?
>>42196204
Sure, go ahead. I'll make an official post on Twitter soon, probably within the next few days.
>>42195922
I'm kneeling so hard rn it hurts
>>42195922
I have no choice but to kneel
>>42195922
IT'S HAPPENING!
>>42195922
https://files.catbox.moe/k18mof.mp3
Three stars and now this? We are so fucking back boys!
>>42195922
https://files.catbox.moe/01otal.wav
Woah, hi again.
New model's sounding better than ever before. Good speed, emotion settings all work reliably, sounds clear. At the moment it sounds like the characters fall out of how they're supposed to sound on occasion though. Rarity in particular with the fear emotion gives some very strange outputs.
https://files.catbox.moe/k1kvsc.wav
Also, as a UI note, the change notifications that appear upon switching settings and voices block the generation button on some resolutions when scrolled up. Only for a second, but it can still delay things.
Dear Hydrus Beta, as everyone will get really hyped for the return of 15ai, I just want to say I appreciate your work, and thanks to HaySay I was able to do all the fun mare music conversion. I hope you will keep it alive and updated as new voice AI shows up in the future.
>>42196337
Seconding this, Haysay is a godsend for my workflow on music projects.
>>42195922
https://u.pone.rs/whgPbfzU.mp3
>>42195922
>new site
>>42196305
>new shitpost
>>42196355
>new smutty
brings me back
>>42195922
https://voca.ro/140YNkYngHyz
>>42195922
Godlike web dev skills god fuckin damn
>>42195922
https://u.pone.rs/EcUvtwYk.mp3
I hope he will add the old "|" emotional control from the previous website, since the clip reference one is pretty wishy-washy. Having both would be pretty perfect for fine-tuning the output audio.
>>42195922
I can't believe waiting two weeks (a few times) actually worked!
>>42195922
Yep, it's been a while, cool website.
Let me nitpick the flicker during that transition animation.
>>42196514
literally unplayable
>>42195922
Curious, how much (if any) AI did you use to make the website?
As for the framework... React + Next.js? Looks good.
And welcome back.
>there is site OC
I'm so sorry bro, but the internet rules demand it.
>>42196569
qt oc, whose artstyle is that
>https://u.pone.rs/mLbrNDQB.mp3
Let's test this new site. Gin Blossoms - Hey Jealousy, done with Glimmer RVC to the Sovits5 singing model (sounds OK, but I was hoping it would be better).
>>42195922
https://vocaroo.com/1bITXue82eed
>>42195922
WE ARE SO FUCKING BACK LIKE NEVER BEFORE
>>42195922
we got 15.ai revival before gta 6
>>42161191 (OP)
I know I speak to the dedicated deluded, but the machine is not the path.
>>42195922
awesome work but damn we really need an S1 Dash voice preset or something. nu-Dash voice is fucking nails on a chalkboard.
>>42196683
Get a hobby you poor creature.
>>42195922
Can we get an ETA on when you are open sourcing this?
I think it is an obvious concern that this will all suddenly disappear for years again.
>>42196738
I'd say completely exclude post-S3 audio for the mane six. Of course it's needed for side characters who lack speaking lines, but it's better to avoid when possible.
>>42196754
About 14 days or so
https://files.catbox.moe/o4z53n.mp3
>>42196754
one more fortnight
>>42196757
>>42196754
How do you know that?
>>42196778
Sounds like you're not trusting the plan
>>42195922CHUDDA ETERNALLY BTFO
IT'S HAPPENING
>>42195922
https://files.catbox.moe/9gopqy.mp3
>>42195922
Your shit is obsolete, yes that's what happens when you sit on your ass for years with proprietary software. Thanks for GPTSoVits and other solutions. You should have disappeared with your website, at least that wouldn't have tainted the few good memories left when using your tool. Fuck you and your five hours of fame you needed to still feel relevant.
>>42195922
One kinda big problem: it won't let me use the ' sign in words... which is weird, since a lot of words like don't and isn't NEED that sign.
>>42196801
You do not need that.
>>42196800shut up, nigger
>>42196803
You're right, I don't, but if 15 can fix that, it'd be a big help. Otherwise, the AI second-guesses the pronunciation of the words, and it's just... I dunno, I just think it would be a good QOL fix.
>>42196800Total barbietranny death.
>>42196801
YES HE FIXED IT!! Thank you 15!
>>42196800
It does sound like ass. It's a shame because they're ponies.
>>42195922
>>>/g/105281388
>>42195922
Nightmare Moon is a huge improvement over her previous voice, which just sounded like drunk Cheerilee
https://voca.ro/1j9J3CBPQqWN
>>42196305
https://files.catbox.moe/ryyshr.mp3
https://files.catbox.moe/nu5qft.mp3
https://files.catbox.moe/urd6et.mp3
Gosh, I've missed this so much. Posting like this takes me back.
>>42196867
OKAY DAMN that actually sounds dynamic! I love it!
>>42196896
Derpy, Maud, and Rainbow Dash, right? It's great that I can actually recognize the voices, to be honest.
>>42196899
>Derpy
>It's great that I can actually recognize the voices
>>42196896
https://voca.ro/1jlDvvakwJgi
https://files.catbox.moe/l8ex9a.mp3
>>42196896
>>42196902
Is that not Derpy? I thought because of the "clumsy" mistake and the familiar tone that it was her.
>>42195922
https://u.pone.rs/moQGuPxl.mp3
>>42196923
last one from me tonight.
https://files.catbox.moe/esztvq.mp3
>>42196571
I know who the artist is, but I would rather not tell you directly.
he draws fuck tons of futa.
https://files.catbox.moe/qoia1a.wav
Luna's crash-out in A Royal Problem if she wasn't fucking around.
>>42197294
https://files.catbox.moe/hhwgsc.mp3
mp3 like it should've been from the beginning lol.
She sounds angry & sarcastic which is how I feel, but still unintended on my part.
https://pub-f3186dbecfd64ac085ddc742fc900f59.r2.dev/twilight_sparkle_neutral_1747418267794_variation0.wav
>>42195922
>Feel free to report any bugs or issues here, too
Yeah I see several bugs:
0. You're still not willing to jew out despite clearly needing the money and influence. Jew out or others will outjew you. Stop being a social recluse that's how all scientists die. Learn to sue everyone cause 11.AI clearly stole your technology you moron.
1. You're not open sourcing this to the community (which are of minimal help and lack money to pay for GPUs but they're willing to learn and are very loyal and creative despite me trashtalking them myself back in October)
2. I'm pretty sure ElevenLabs, Udio.AI, SUNO.Ai, etc. stole your technology and perfected it already since 90% of the singing & talking sounds like Tara Strong, Rebecca Shoichet & Ashleigh Ball. The AI can really sing too. To an audiophille it still sounds bad, but to a normie it sounds perfect. Get a fucking marketing team, both you and Tara Strong fucked each other up and should sue every single audio AI possible.
This is what Suno Ai can do right now with the paid model:
https://www.youtube.com/shorts/udOgG0M8pVI
3. Your options & UI is still limited. If I could search a reference line to use any emotion I want without typing in phonetics then that'd be useful for the average normie. You didn't understand what I just told you, did you? LET ME USE THE REFERENCE LINE TO QUICKLY & INSTINCTIVELY USE THE EMOTION I WANT. WE HAVE AN IMPECCABLE MEMORY OF THE SHOW'S DIALOGUE LINES.
Add a voice changer/voice to audio option. It would be so much more intuitive because the AI could hear what emotion I'm going for instantly.
Today's AI still lack a ton of UI options but are getting there at an insanely quick speed such as Suno's ability to grab an existing song and have either the same singer or a new singer sing the same notes with different lyrics.
Today's AI still sounds like an untrained voice actor slurring his lines on purpose and it still sucks compared to audiophille standards, but your current robot sounding AI is dreadful by normal standards. You still haven't learned how to remove the noise?
https://www.youtube.com/watch?v=qu5nnMOQ4VU&ab_channel=A
https://www.youtube.com/watch?v=I1Dy0Zfw6Qs&ab_channel=votums
3.5 You probably didn't notice cause you're not a voice director or you're autistic but ... S1 and S2-S9 's voice directing is completely different. 90% of the dialogue lines used in S2-S9 used only these emotions; depressed, angry, flirty, ANXIOUS, TIRED, reading-off-a-script-at-gunpoint. And that's the acting ... the voices?
In S2+ everyone sounds...
Twilight sounds much lighter in S2+
Applejack & Dash sound much deeper and not in a suave way.
Pinkie sounds way lighter & screechier.
Fluttershy always sounds anxious
Rarity & Spike kinda sound the same.
4. One more thing...
>>42197340Fuck off retard.
>>421973404.
Contact the original voice actors and work together with them. Give me S1 Woona's voice and all is forgiven on my side. ;) Can't say others will forgive you for being a weak leader. These effeminate pussies need a strong leader and I suggest you do too if you can't march down 11Labs HQ and sue the living shit out of them together with Tara Strong. Sounds jewish but that's the truth. You got to outjew the jew in a jewish world. Mrs Strong knows that. I know that. Why can't you fucking comprehend that?
https://youtu.be/wbzRRp2jRHw?t=103
This is what voice acting AI sounds like now:
https://www.youtube.com/watch?v=lPAtoR3YCSc&ab_channel=UndeadHumor
https://www.youtube.com/watch?v=0j1eX7F8OOo&ab_channel=DevilArtemis
BUT I'M GUESSING YOU ALREADY KNOW THAT YET YOU STILL REFUSE TO DO SOMETHING ABOUT IT.
Call your father or something for God's sake, you college pussy kid. Your technology is being stolen under your nose and improved upon tenfold(by jews, not your followers) and you're here moping like a pussy on Twitter and then coming back with a niche version that does 1 thing barely any better and still sucks dick at the other 9 things that goes into audio.
CAN YOUR MODEL AT LEAST SING RIGHT NOW? Cause SUNO's shit can and Udio used to sing good before they had to neuter it because the record companies were after their asses. Why aren't you after their asses as well?
God you need a father in your life, kid. A father to watch over you and learn to sue and break skulls for you cause jesus christ after that twitter whine ... you're still a pussy who refuses to BE A MAN AND SUE THE LIVING SHIT OUT OF ELEVEN LABS FOR STEALING YOUR MODEL. Give Tara Strong a call too. Do you want me to do it for you?
Respectfully yours, the redpiller known as Vogelfag.
>>42197358
I uh... 15 maybe should've been a bit better at leading, but WOW this is kinda rough. But they say the truth hurts... wait, aren't we only operating under the ASSUMPTION that ElevenLabs stole his work though?
oh boy the schizos are out now
>>42197340no ones reading that
>>42197369no one cares vogelfag
>>42196867
https://u.pone.rs/HEiyutXb.mp3
>>42195922
Btw
https://voca.ro/14Y5dHWMbMpx
>>42197340
>>42197358
Your words are wasted on that idiot, 15. He was always a pretentious egomaniac and I'm glad the era where we didn't have any viable alternative is long gone. He's not even competing with the current opensauce options, let alone the paid ones.
>>42197618
what are the opensauce alternatives
>>42197624
https://github.com/effusiveperiscope/GPT-SoVITS
>>42197645
Isn't that what haysay uses? It doesn't sound as good as this, though.
>>42197553
Holy fuck. Please make a full length version of this.
>>42197553
Incredible. Please keep going.
>>42197553
Damn, am I going to have to help finish what I've started?
>>42198418
Please, I'm begging you. Make more
>>42195922
Great to have you back, the new website looks fantastic.
Some notes after a few hours of testing (mainly with Rainbow and Twilight on happy and neutral):
I noticed that speech will often sound unnatural with a "rough" sort of sound, especially at the end of sentences. It's been taking a lot of re-rolls to get outputs that sound natural throughout. As ever I'm finding it very hard to articulate exactly why a lot of outputs sound off or spot trends. Been thinking about what exactly to say here for quite some time but I think it'll be more effective to just use the report feature on any examples I come across from now on. The voices generally sound very accurate to the ponies and there's already plenty of good examples ITT, so the potential is clearly there.
Things like the Twilight #3 on the example page are common issues with the "rough" sound - "aviation AH0 N", "fly AY1", "fat AE1" "ground AW1 N D".
Pretty sure this was an issue in previous versions of 15.ai, particularly the tendency to slip up at the end of sentences.
Short sentences (~three words or less), especially when generated on their own with nothing before or after, are consistently bad.
"Anon" is often pronounced wrong, tends to get split into either "A Non" or "An On" and is spoken with a little break between them like they're two separate words.
I'm tentatively thinking that reliance on reference lines from the show to control delivery, emotion, pacing etc in the output (I assume that's what the model is doing) may not actually be the best idea. It's great if the reference line that gets picked happens to match how you want the output to sound, but more often than not it won't and you'll be totally boned if there's no match at all. Even if there is a reference line that matches, you'll still need to take the time to find it or rely on RNG for it to be used.
I won't speculate any further on this for now since I don't know exactly how the reference lines influence the model. Would be good if you could fill in some blanks here.
Not yet found any bugs with the site, but I do have some feature requests:
1 - An option to automatically play new audio as soon as generation is complete.
2 - A button on the outputs to immediately regenerate with the same settings.
3 - Report function is useful, suggest also adding a thumbs up icon or similar to highlight when the model does well.
4 - Not sure if it's my browser, but the download button always opens the audio in a new tab where I then have to click the three dots icon to download. All those extra mouse clicks quickly add up.
Hope that's helpful, you're doing great work here.
>>42195922Oh wow. Welcome back, 15! I am really happy to see you have a site back up, and the UI is slick.
>>42196337>>42196341Thank you for the kind words. I plan to keep Hay Say running. I am glad you have found it useful.
>>42198611
>"Anon" is often pronounced wrong, tends to get split into either "A Non" or "An On" and is spoken with a little break between them like they're two separate words.
This was because the dictionary had an incorrect transcription for "anon"; this has been fixed. If you run into any similar problems, you can report a transcription by hovering over the colored box and clicking the report button.
>1 - An option to automatically play new audio as soon as generation is complete.
>2 - A button on the outputs to immediately regenerate with the same settings.
>3 - Report function is useful, suggest also adding a thumbs up icon or similar to highlight when the model does well.
>4 - Not sure if it's my browser, but the download button always opens the audio in a new tab where I then have to click the three dots icon to download. All those extra mouse clicks quickly add up.
Done.
>>42197645 15, is the model just GPT-SoVITS, but fine-tuned on MLP?
https://voca.ro/1mlZCjsv6tJ2
Dang, this is pretty good.
>>42198701>haysay is downI am this close to considering selling my kidney for a good gpu
>>42200231What odd timing. Thanks for letting me know. The site should be back up now. The EC2 instance got in a weird state where it became unreachable again.
>>42200380 The Amazon anti-brony lobby is getting stronger by the day. Btw, what would the requirements be for Hay Say if I wanted to run it locally at full capacity?
>>42200397Hay Say can run on most machines, but will be very slow on older hardware. I do not recommend running it on Apple silicon because it is very slow on that hardware (to the point that it's basically unusable). I recorded some benchmarks on several machines, which may give you a clue as to how long it will run on yours:
https://github.com/hydrusbeta/hay_say_ui?tab=readme-ov-file#testing-data--benchmarks
Having a GPU is not required.
>>42200690 Oh, I forgot to mention that you need a LOT of hard drive space (about 100 GB now), and having at least 12 GB of RAM is recommended.
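For anyone who wants to check their machine against the numbers above before downloading anything, here is a minimal sketch. It is not part of Hay Say; the function names are made up for illustration, and the thresholds are just the roughly 100 GB of disk and 12 GB of RAM quoted in the post. The RAM check relies on `os.sysconf`, which is assumed to be unavailable on Windows, so it returns `None` there.

```python
import os
import shutil

# Figures quoted in the posts above; treat them as rough guidance, not hard limits.
MIN_FREE_DISK_GB = 100
RECOMMENDED_RAM_GB = 12


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def total_ram_gb():
    """Total physical RAM in GB, or None where os.sysconf lacks these keys."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


def preflight(path="."):
    """Return a list of warnings; an empty list means both checks passed."""
    warnings = []
    disk = free_disk_gb(path)
    if disk < MIN_FREE_DISK_GB:
        warnings.append(f"only {disk:.0f} GB free, about {MIN_FREE_DISK_GB} GB needed")
    ram = total_ram_gb()
    if ram is None:
        warnings.append("could not determine RAM size on this platform")
    elif ram < RECOMMENDED_RAM_GB:
        warnings.append(f"{ram:.1f} GB RAM, at least {RECOMMENDED_RAM_GB} GB recommended")
    return warnings


if __name__ == "__main__":
    for line in preflight() or ["disk and RAM look fine"]:
        print(line)
```

Pass the drive you plan to install to as `path`; the benchmarks linked above are still the better guide to actual generation speed.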
>back to being dead
come on
https://x.com/fifteenai/status/1924269599542968655
>>42203845>Discord serverKek.
>I just added 4 more GPU servers because of the huge number of requests coming in. This is actually going to bankrupt me.You know, you could just... open source it?
Then you wouldn't have to pay for any of it, you wouldn't be expected to constantly maintain it (this has been a recurring issue, let's be honest), and you would meet your original promises.
FYI, GPT-SoVITS v4 came out.
While v3 downgraded the quality, v4 boosts it back to 48 kHz and arguably sounds much more natural.
There's a good report here: https:// 8 chan.moe/ais/res/6258.html#q11121
>Ref: https://voca.ro/13vsNeBHC2Xu
>Best result I got from v4: https://voca.ro/1j2I5rUzAZxj
>Same example with v2 (the end was cut due to my shitty api): https://voca.ro/11qFHhR7HtG1
This is the only comparison I've heard so far though; it seems the release got a very quiet reception. Needs to be tested more.
>>42200380If you could look into adding v4 to Haysay (assuming it does hold up with pony voices), that'd be much appreciated.
>>42195922You make a cute couple.
>>42204296Trying it out.
Oh boy, new setting under SoVITS Training. Guess I'm leaving that at the default 32 for now.
>>42204454>hecking mare>she/ponyHe's just having a laugh, r-right?
Inb4 they ban saying bad words with the ai
>>42204529discord can ban over stuff like saying nigger iirc if people report it
I doubt any text restriction will be imposed but its understandable you dont want kids spamming nigger word in the discord
>>42195922thank you for your service, king
>>42195922 Cool that you're back. Though it's a bit odd that you say that 15.dev is provided only for non-commercial use, then license the outputs under CC BY-SA 4.0, which explicitly permits commercial use. Shouldn't outputs be licensed under CC BY-NC or BY-NC-SA instead, since that would be in line with your earlier statement that the site is to be used non-commercially?
>>42195922ya taking on new voice dataset or only retraining the old ones?
https://huggingface.co/OuteAI/OuteTTS-1.0-0.6B
>>42205033 Their Twitter examples sound a bit meh; I'm guessing the wow factor comes from the fact that it can work with 14 different languages. Would be really nice if I had a voice dataset from a foreign dub and could use it for English.
If you're still lurking, Vul, thank you for making that sfx_sep_v2 filter for vocal remover, this stuff is so bloody helpful in prepping the audio.
>>42195922Holy shit, only noticed it now. I don't know what changed for the site to make a comeback, but it's nice to see it again.
>>42195922Did a bunch of work with Rarity today, mainly with happy emotion, and notably found that I tended to get better results when I turned the temperature way down, 0.2-0.4. Tried that with the rest of the Mane 6 but Rarity seemed to be the only one to significantly benefit, Twilight and Rainbow in particular still sound "rough" almost all the time no matter what I do.
Even so, Rarity's improvement is significant enough that I'd suggest everyone experiment with adjusting the temperature, there may be an optimal value for each character that I've not found yet.
Short inputs continue to be a problem, even short sentences that are part of a longer input - reported a bunch of instances of words being mispronounced, weirdly elongated and even skipped entirely.
Also had a few times where the page froze when I switched tabs to do other stuff while waiting for generations to complete.
Could you unlock the quality slider at least in the faster direction? I'm finding generation wait times to be the main bottleneck right now and would like to give that a try. Perhaps also allow larger batch size when faster quality options are selected too.
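How 15.ai applies its temperature setting internally is not known from this thread, so the following is only a generic sketch of what temperature usually means in sampling-based generators: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and the example logits are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (1.0, 0.4, 0.2):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lower values push almost all of the probability onto the top candidate, which would fit the observation that 0.2-0.4 gives more stable, less "rough" output for some characters at the cost of variety.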
>>42195922>no more emotional contextualiser (the selections are a decent sidegrade I guess but come on it was much cooler)>still using arpabet despite even resolving the IPA>AI guesses what I want it to say if it's not in the dictionary instead of just phonemising the words because I know what I want it to saywhy
>MoondancerBless you, sounds like shit tho
>>42206122https://files.catbox.moe/tu4s0l.mp3
How's this?
>>42204536Fair, but I will never trust someone with a mental illness flag in the bio.
Sup, got a sudden inspiration to get the voice of the Clone Wars narrator trained. Not a pony model, but I feel like this could get some good use in future anti clips.
>https://huggingface.co/Amo/RVC_v2_GA/tree/main/models/Star_Wars_Clone_Wars_Narrator_v2
https://files.catbox.moe/bjljdm.mp3
Not 100% happy with it as the input needs to have that specific "umpf" energy to it.
>https://huggingface.co/Amo/GPT-SoVITS-v2/tree/main/Clone_Wars_Narrator_v2_so96_gpt24
Gpt-Sovits, wavs included.
https://vocaroo.com/1oycsmzwxgVy
https://vocaroo.com/12qbwj4NK8XP
https://vocaroo.com/1fBcauUi9ZIP
Due to the pronunciation script some words sound pretty weird, but it's nothing a little bit of editing can't fix.
>>42195922now all i need to do is figure out how to make ponies moan
>>42207220One step ahead of you.
https://files.catbox.moe/7wktvb.mp3
All I did was enter "AAAAAAAAAAAAAAAA!" and the moaning just kinda happened.
15 for the love of God find a volunteer to do your PR, you called a random Hasbro employee pathetic that is not something you should do if they are inquiring about your service despise how obnoxious the cocksucking corpo suits are. Being aggressive like that isnโt doing anyone any favors
>>42207459All hasjew employees deserve and should be publicly mocked.
>>42207459>you called a random Hasbro employee patheticare you retarded perhaps
>>42207472Yeah I call them retarded niggers off the mic but when youโre face to face with them you shouldnโt let that go out.
Since 15 is a stemfag gook I wasnโt expecting diplomacy and social skills from him but this is actually crazy, no one cares about your inbox.
>>42207477Even if that was a scammer like who the fuck cares nobody cares about your inbox nigga
>>42207479>>42207483I care though, this is funny and based as fuck
>>42207483Repeat 30 more times about how much you don't care.
>>42207488 Settle down 15 minion, you have a server to moderate
>>42207459He wasn't even calling Hasbro employees pathetic though? It was some random guy trying to snitch by CC'ing all these people.
>>42207493You're the retard sending the e-mail, got it.
>>42207525Finger pointing like that isnโt healthy tranny
https://x.com/UnslothAI/status/1924848135991656603
>>42207518This, wtf is anon talking about
>>42207576 once again, it's all written like the next breakthrough in technology but nobody is posting any examples at all, not even cherry-picked ones.
>>42206425Still bad, just compare to any actual Moondancer speaking. I'm not knowledgeable enough to describe exactly how it's wrong, but it's too deep and not "light" enough?
it's been six fucking years, jesus christ. i still can't believe how big this project got
>>42208577it was dead for a while but only recently started becoming alive again
>>42207713Ah, okay. I thought it was a matter of quality and not the voice itself. But you're right, it's not as light as her in the show...
>>42208651https://www.youtube.com/watch?v=730zGRwbQuE
Indeed, it has been a bumpy few years, yet in the end the infinite power of ponies will prevail over all hardships.
>>42208651It's good to see it getting some steam. This is far too potent to let it fall to pieces.
Someone on the server wanted to get 15 to censor the swear words from the site. Say it with me...FUCK no!
>>42209071>Hey everyone look at what some nobody said on my Discord!No one here cares about social media drama. Keep it in Discord and out of here
>>42209071This is why you don't cozy up to Discord groups. They'll try to corrupt you every time.
>>42209071Gee, what a surprise.
>https://files.catbox.moe/asxfuv.mp3
close enough welcome back uberduck discord
>>42213584>uberfuckNo thanks.
>15 is back
>still dead
It's over
>>4221422415.ai isn't really good enough to revive any interest after the novelty of making ponies say nigger wears off.
>>42214240https://u.pone.rs/OWiJmVGB.mp3
>>42214240>DashconComparing a literal scam to 15 is plain retarded.
Hello fags made a new ai skit with 15.ai its good to be back
https://files.catbox.moe/29w2tt.mp4
>>42214718 Comedy bros, where are you?
https://files.catbox.moe/aov4vh.mp3
>15 service re-emerges
>Typing rapidly ensues
>Old prompt tricks still draw out the mysterious liminal echoes of the mare
These digital equines have the most fascinating voices
Compilation of Liminal Trixie sounds
https://files.catbox.moe/gznbmc.mp3
https://files.catbox.moe/spb6zv.mp4
>>42215735What was that quote? I can't remember where it came from.
>>42216763moonbase trixie
I need more lewd moans. Gasps, sighs, groans, chirps, murmurs, mewlings, etc.
>>42218072 I have an audio pack with random moans, give me a few minutes to upload it
>>42218072NTA but here's a couple more Liminal Trixie noises.
A few grunts, laughs, even some coughs and various others.
https://files.catbox.moe/q8g80w.mp3
>>42218755I'm surprised no one has done something with that.
@hydrusbeta, what happened to the synth app? It's not working. Also, would it be possible to add a direct link to it on the Hay Say website?
>Servers down
>twitter account gone
Permission to panic, sir?
>>42219523False alarm, twitter was just fucking itself up again.
>>42218072
https://u.pone.rs/PRpOFwQp.001
>SpecialPacks_.zip.001
https://u.pone.rs/vyoSUmbo.002
>SpecialPacks_.zip.002
https://u.pone.rs/WFVSxEXw.003
>SpecialPacks_.zip.003
https://u.pone.rs/mGedaJTp.004
>SpecialPacks_.zip.004
https://u.pone.rs/uGmobBkJ.005
>SpecialPacks_.zip.005
Rename each downloaded file to the filename quoted below its link. It's a 2.27 GB mix of various sounds from ASMR, hentai games and some other gooning sources. Do use RVC to make them pony related.
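The renaming step can be scripted. The sketch below takes the download names and target names straight from the post above; everything else (the function names, and the assumption that the parts are plain byte-wise splits that can simply be concatenated back into one zip) is a guess on my part. If the joined file does not open, pointing an archiver such as 7-Zip at the renamed `.001` part is the usual alternative.

```python
import shutil
from pathlib import Path

# Download name -> intended part name, as listed in the post above.
PARTS = {
    "PRpOFwQp.001": "SpecialPacks_.zip.001",
    "vyoSUmbo.002": "SpecialPacks_.zip.002",
    "WFVSxEXw.003": "SpecialPacks_.zip.003",
    "mGedaJTp.004": "SpecialPacks_.zip.004",
    "uGmobBkJ.005": "SpecialPacks_.zip.005",
}


def rename_parts(folder, mapping=PARTS):
    """Rename whichever downloaded parts are present; return the new names."""
    folder = Path(folder)
    renamed = []
    for old, new in mapping.items():
        src = folder / old
        if src.exists():
            src.rename(folder / new)
            renamed.append(new)
    return renamed


def join_parts(folder, archive_name="SpecialPacks_.zip"):
    """Concatenate archive_name.001, .002, ... in order (assumes raw byte splits)."""
    folder = Path(folder)
    parts = sorted(folder.glob(archive_name + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError(f"no parts of {archive_name} found in {folder}")
    out = folder / archive_name
    with out.open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, dst)
    return out
```

Call `rename_parts` on the download folder first, then `join_parts` on the same folder; make sure all five parts finished downloading before joining.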
I'm preparing to train MLP models with GPT-SoVITS v4
Which mare should I start with?
>Yes, I'll add the precomputed values from Haysay, once I make the WebUI.
>>42220345Applejack is a good baseline to test out accent retention and character similarity. Otherwise, testing more unique voices like Queen Chrysalis would better determine how well the model replicates the intended voice without falling back too much on similar but generic voices.
>>42220345I'd be curious to know what effect the LoRA Rank has on the models, and which one is ideal for what datasets.
>>42220536 Which pretrained English model was it finetuned on?
>Prompts various commas and apostrophes to get hidden mare noises.
>Lyra: "Ew, I think it's some sorta booger or something"
Wow, these mares have some fascinating interpretations.
>https://files.catbox.moe/whb1r5.mp4
>https://files.catbox.moe/xqvrlq.wav
>>42222460The interface is really stylish.
>https://huggingface.co/Amo/GPT-SoVITS-v2/blob/main/TreeHugger_so96_gpt24/wavs.zip
>This file is vulnerable to threat(s) PAIT-ARV-100.
Could somebody with a good antivirus scan this zip and the files inside it? It's probably a false positive, but I want to be sure it won't mess with my PC.
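This is not a virus scan and does not replace one, but a first sanity check is to list what is inside the zip without extracting anything. A minimal sketch, assuming the archive is supposed to contain only audio files; the function name and the list of allowed extensions are my own choices.

```python
import zipfile

# Extensions you would expect in a folder of training clips (an assumption).
AUDIO_SUFFIXES = (".wav", ".flac", ".mp3", ".ogg")


def unexpected_members(zip_path, allowed=AUDIO_SUFFIXES):
    """List archive entries that are not audio files, without extracting anything."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            info.filename
            for info in archive.infolist()
            if not info.is_dir() and not info.filename.lower().endswith(allowed)
        ]
```

An empty result only means every entry has an audio extension; it says nothing about what the flag itself refers to, so a proper scan is still worth doing.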
https://unmute.sh/
Found this, apparently they're gonna open source the text and speech models soon, but for now, you can supply a ten second voice clip of anyone you want to speak with them in a variety of topics.
>IMS Toucan - tts 7000 Languages
>https://github.com/DigitalPhonetics/IMS-Toucan
>https://huggingface.co/spaces/Flux9665/MassivelyMultilingualTTS
I think this was posted a few years back. I noticed they updated the Hugging Face page about two weeks ago. After a few minutes of testing it seems to be working; however, while the voice quality is above MS Sam and the noisy TalkNets, the way the TTS talks still feels very artificial.
The voice cloning option seems to be broken, so that sucks. However, the fact that it generates voices at light speed and even has built-in options for CPU usage means it could run on potato-tier equipment without problems.
So, it's not something useful for now, but there is always the possibility somebody else could take it and improve it (imagine Fluttershy teaching you how to speak moonrunes).
>>42223265 Thank you for sharing that, Anon, and also holy fuck, this is working like pure magic. I just gave it a 9s, really low quality audio clip ripped from a game and it was able to replicate the voice without the shitty de-reverb pollution and background buzzing noise AND keep the accent consistent. On top of that I was able to double the amount of voice lines this character had ever spoken, so that's a massive plus for making artificial datasets.
>apparently they're gonna open source the text and speech models soon
With this kind of tech there wouldn't be a need to train full models, since bare-bones TTS can be done with 10s clips and less than 5m of waiting for the voice to be cloned. Man, I remember way back in mid 2020 when people talked about this tech and pretty much everybody agreed that cloning voices with 10s of audio would never sound natural or even good. How times have changed.
>>42223265 I tried to see if it could recreate a voice from 3s of Woona audio but sadly that was a no-go (I even tried duplicating the clip to pad it out to 10s). I'm guessing the high-pitched distress is messing with their process, or they need a minimum of 6s of audio to work out how to duplicate it.
>>42222460This is what I've got instead. I really dig the giggle in the first one.
https://vocaroo.com/154R3gQLRpG1
https://vocaroo.com/1eOcqD52A2pm
>>42224830
>https://files.catbox.moe/etzhiu.mp4
>Chrysalis: "(forceful exhales x3), We should take the magic inside it. You know how powerful Discord was."
Guess with limited-to-no other speech input, it does fall back a lot on the Reference Text as seen in the Advanced Model Details. No wonder so many Trixie attempts had her mumbling about a good night's sleep. Less random than initially suspected.
I wonder how the model would behave if we were able to remove or modify the underlying quote(s) during synthesis, though I'm sure it's likely integral to retaining its accuracy. Come to think of it though, it would be nice to be able to select specifically what underlying reference line it's using prior to generation so that you have more chances of getting a desirable output similar to it. Could mean less resource usage too.
>>42216763what tricks did you use?
>https://github.com/PasiKoodaa/ACE-Step-RADIO
I stumbled upon the GitHub project above. It uses the ACE-Step music model to create a constant stream of AI music, replicating what online radio websites do; it requires 16 GB of VRAM. The outputs are still so-so, but given that text-to-song models are only about a year old there is plenty of room for improvement. Also, I would love to see a setup where these models sing with proper poni voices from the get-go (or with the help of LoRAs).
>Stable Audio Open Small
>Weights: https://huggingface.co/stabilityai/stable-audio-open-small
>Paper: https://arxiv.org/abs/2505.08175
>Arm learning path: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/run-stable-audio-open-small-with-lite-rt
Huh, a model that's only around 2GB? Nice to see them notice that not everybody has an endless bag of cash to spend on the newest and largest GPUs. Sadly it still only outputs instrumentals at lower-tier quality (at least in comparison to what's already out there).
Apparently it can run 30% faster than realtime.
>>42224989Mostly the aforementioned ,',' trick, which in older pre- "dev" versions of 15 used to be able to do a lot more lewd noises and such. Used to have a text doc with a handful of other tricks used with it, but it must be on one of my older OS drives. Still serves to force further areas of silence, which in turn can allow hallucinations and other AI weirdness to creep in on purpose.
>>42225053
>16GBs Vram
Still seems out of the memory budget of most anons, unless it could be optimized to at least half that with minimal loss. Even if it were finetuned on mares, without optimization I can't imagine many being able to utilize it for synthesis.
>>42225301
>Very small model
>Lower quality
To be expected I suppose, but at least it's something usable for local synthesis and playing around with, aside from maybe Bark, which I should honestly revisit. Just a shame they completely abandoned it after becoming monetized in the form of Suno. Still open source like Stable Audio is, however.
https://u.pone.rs/pBgJHLQr.wav
Claims to do sota zero shot cloning with tts with powerful control
https://github.com/resemble-ai/chatterbox
>>42228108From a 20s voice sample: https://litter.catbox.moe/w54fxs.wav
>>42228108 I've tested it with a few voices. It handles some without any problems but totally struggles with others (seems to depend on how much the accent/pronunciation deviates from the standard way of speaking). Sadly I can confirm that this model is also unable to clone Woona's voice.
Music Source Restoration
https://arxiv.org/abs/2505.21827
>We introduce Music Source Restoration (MSR), a novel task addressing the gap between idealized source separation and real-world music production. Current Music Source Separation (MSS) approaches assume mixtures are simple sums of sources, ignoring signal degradations employed during music production like equalization, compression, and reverb. MSR models mixtures as degraded sums of individually degraded sources, with the goal of recovering original, undegraded signals. Due to the lack of data for MSR, we present RawStems, a dataset annotation of 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours. To the best of our knowledge, RawStems is the first dataset that contains unprocessed music stems with hierarchical categories. We consider spectral filtering, dynamic range compression, harmonic distortion, reverb and lossy codec as possible degradations, and establish U-Former as a baseline method, demonstrating the feasibility of MSR on our dataset. We release the RawStems dataset annotations, degradation simulation pipeline, training code and pre-trained models to be publicly available.
https://github.com/yongyizang/music_source_restoration
https://huggingface.co/datasets/yongyizang/RawStems
https://huggingface.co/yongyizang/MSR_UFormers
Github repo isn't live yet. might be cool for audio stuff
>>42229871 This could be pretty useful in combination with the ACE Step song converter. If a song can have both the vocals and the instrumentals separated into their own tracks, I would imagine that would help with modifying it into a different style of music.
At the very least it would be nice to use it to fix the weird effects that vocal removing programs are imprinting on the instrumental files.
>https://u.pone.rs/beZAfsQC.mp3
motivational Trixie
>>42234667Precautionary bump.
Again
Well, the twelve hours after 15 returned was fun I guess. Now back to this bullshit.
>>42239410He's gunna hurl if he keeps that up.
>>42240318The bumping kind.
>>42240322The bumping loyal
>>42240779Let me bump the thread of my people.
https://openaudio.com/blogs/s1
The .5b mini version will be open sourced
>>42243816Hmm, would be nice if there was a demo WITHOUT music so I assume they put it in to hide the lower quality. But with .5B size this thing should technically be able to run in a phone sized environment, so that's neat.
>>42243816>>42244249Neat indeed, but it's a shame they don't have any audio examples of either version (on that page at least). Hard to really get a feel of it when there's nothing to gauge or judge.
Scootaloo Scoot-Scootaloo.
>>42247180Someone said chicken?
>>42243816https://huggingface.co/fishaudio/openaudio-s1-mini
>>42247951OpenAudio S1 supports a variety of emotional, tone, and special markers to enhance speech synthesis:
1. Emotional markers: (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
2. Tone markers: (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
3. Special markers: (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
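A small helper for building inputs with these markers might look like the sketch below. The marker lists are copied from the post above; the helper name is made up, and placing the marker at the very start of the text is an assumption, so check the model card for the exact syntax it expects.

```python
EMOTION_MARKERS = frozenset("""
angry sad disdainful excited surprised satisfied unhappy anxious hysterical
delighted scared worried indifferent upset impatient nervous guilty scornful
frustrated depressed panicked furious empathetic embarrassed reluctant
disgusted keen moved proud relaxed grateful confident interested curious
confused joyful disapproving negative denying astonished serious sarcastic
conciliative comforting sincere sneering hesitating yielding painful awkward
amused
""".split())

TONE_MARKERS = frozenset(
    ["in a hurry tone", "shouting", "screaming", "whispering", "soft tone"]
)

SPECIAL_MARKERS = frozenset([
    "laughing", "chuckling", "sobbing", "crying loudly", "sighing", "panting",
    "groaning", "crowd laughing", "background laughter", "audience laughing",
])

ALL_MARKERS = EMOTION_MARKERS | TONE_MARKERS | SPECIAL_MARKERS


def with_marker(text, marker):
    """Prefix `text` with a parenthesised marker, rejecting ones not on the list."""
    marker = marker.strip().lower()
    if marker not in ALL_MARKERS:
        raise ValueError(f"unknown marker: {marker!r}")
    return f"({marker}) {text}"
```

For example, `with_marker("Is anypony there?", "whispering")` produces the string `(whispering) Is anypony there?`. Rejecting unknown markers up front avoids the case where the model just reads an invalid tag aloud.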
>>42247952>Emotional markersInteresting, hopefully there will be a decent UI and training for it
Bump
>>42248854>bump rumpWould pump.
>>42249353Pretty bump mare. Totally would.
>15 crawls back to bait patreon donos with his half-baked model where most emotion choices result in unintelligable noise
>11 releases a new alpha that wipes the floor with his crusty garbage less than a month later
https://elevenlabs.io/v3
holy fucking kek! maybe there is a god.
>>42250109yeah but unlike fifteen, eleven labs cost money
>>42250109? elevenlabs doesn't have ponies, how is this a comparison
Remember not to give goku the attention he wants
>>42250547you have to train your own models on there you retard mcspazatron
>>42250592yeah is it any good though, last I tried to train ponies it wasn't very good
>>42247951Anybody had a chance testing this thing out? Due to bullshit reasons I'm kind of stuck phone posting but I do want to know if it's any good.
https://github.com/RVC-Boss/GPT-SoVITS/releases/tag/20250606v2pro
https://github.com/RVC-Boss/GPT-SoVITS/wiki/GPT%E2%80%90SoVITS%E2%80%90features-(%E5%90%84%E7%89%88%E6%9C%AC%E7%89%B9%E6%80%A7)
>>42250941
>for 50 nvidia series
So wait, are the new models exclusive to the 50 series or just optimized for use on that hardware?
>>42207363I tried that and all it did was make Rarity do pokemon noises.
https://files.catbox.moe/1ryvaz.wav
https://files.catbox.moe/qivs4r.wav
also sometimes the AI interpretation (wish we could turn that off) says "Triple A" https://files.catbox.moe/72xgzt.wav
>>42250109>elevenfagsMiss me with that shit.
I found some free audio processing plugins, I'll be loading these in (((audacity))) to auto-process my dataset. I haven't tried it yet, but it seems promising, like a publicly released version of izotope:
https://archive.org/details/accusonus-era-bundle-v-6.2.00
They made it public before going out of business. I might reply to the anchor if it gives a good result.
>>42250109I wonder (((who))) could be behind this post.
>>42252595Interesting, could you post some examples here?
>>42250109gptsovits wipes the floor with 15 shitty model already, no need to bring the big guns
>>42254373stop samefagging, your broken english is too noticeable at this point
>>42254415You wish I was samefagging retard
For characters with lots of voice lines like Spike and Twilight, if I'm using my own voice, what's the best option to choose on Haysay to sound good?
>>42256668RVC is the current gold standard as far as Haysay goes for speech-to-speech.
>>42256717It's not quite getting the intended result. Should I set voice envelope high or low? https://voca.ro/1iHl7ZMvk5Qm
>>42218755What settings did you use here? Sounds pretty good.
>>42256762If you're trying to get non-vocals out of the voice-to-voice, it's not gonna work great.
>>42256762Those were generated with 15.ai, probably the best option if you don't need voice to voice functionality and just want lewd pony noises.
>>42256762>>42256785 Mostly default settings. Varying the temperature occasionally. Liminal mares also make all sorts of noises, not just lewd. I can easily imagine them being used as vocal SFX for pony videogames or something, or maybe an episode or animation where a mare drips onto the ground and the grunt is entirely synthetic and not recycled audio from the show.
https://files.catbox.moe/7rx7zi.mp3
>>42256871>https://files.catbox.moe/7rx7zi.mp3These sound like Trixie is doing Link moves.
>>42257202Abstract mare sounds are abstract. Sadly Rvc is still the king of getting quality lewd sounds, but I still wish we had a nice tts alternative.
>>42257202 Huh, yeah, this really makes me want to work on my 3D modelling again... although Godot's 3D capabilities are still not great.
>>42257362Is there a place I can upload multiple audio files for easy playback? I wanted to show off what I managed with the TTS on haysay.
>>42257432Thanks. Too bad it doesn't stream playback....
https://u.pone.rs/reZpBwHV.wav (Twilight)
https://u.pone.rs/cBNqloOa.flac (Spike)
>>42257202Could totally imagine a game with Trixie acting as the hero of Hyrule.
>>42257426Damn, haven't heard Godot in a hot minute. I really need to find time and motivation to actually get into that myself. Keep telling myself that though. Sadly free time and hobbies don't pay bills.
>>42257437>doesn't stream playbackYou mean like, play it in a browser? Because usually mp3 is supported in that way.
>>42257521Yeah, I know what you mean, though I'd say getting those skills can be valuable. Personally, I wish I didn't mentally check out of a tutorial after like 30 minutes because most of them need a good hour or more to really get into the meat of it, and even taking notes, it feels like I'm not retaining it well.
>>42258274 I would recommend the YT channel TheRoyalSkies. All his videos (with some rare exceptions) are between one and five minutes long, always getting to the point instead of flapping about some bullshit and settings. The only downside is they are usually aimed at people who are already a little above total 0 XP beginners, but it's still good stuff.
>>42258298 Oh, they have Cascadeur videos. I was wondering if that was usable with quadrupeds too...
>>42258349never used that addon/function, but I would imagine anything that is not a humanoid with standard two arms and legs will require lots of custom rigging.
>>42250902Thread tourist here, it's breddy gud for being local. I've been running it on a 3060 with no issue, takes about twice as long as real time but the 44.1kHz fidelity is incredible. Also the voice cloning accepts up to 90 seconds of input, with possibly more but I have yet to test that.
My main criticism is that for longer gens upward of a minute or more, the voice gets kinda washed out in a way, but you can easily circumvent that by just splitting your text into chunks.
Here's some examples I genned:
Cum Zone guy quoting Ozymandias (my favorite gen, nearly indistinguishable from real VA) https://vocaroo.com/1ngXhfejJwoB
Gilbert Gottfried navy seals (you can hear the voice getting washed out towards the end) https://vocaroo.com/1n6SZbrHzKZ1
Michael Rosen pulp fiction (it can mispronounce capitalized words, storage is pronounced as sturgeon) https://vocaroo.com/1ov76WqTjIUY
I'd say it's elevenlabs-tier, even if that comparison is now outdated because of their new model.
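The chunking workaround mentioned above is easy to script. This is a minimal sketch that only prepares the text; the function name and the 400-character default are arbitrary choices for illustration, not anything the model prescribes. Each chunk would then be generated separately and the audio joined afterwards.

```python
import re


def split_into_chunks(text, max_chars=400):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count keeps each generation from starting or ending mid-sentence, which should make the joins less audible.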
>>42260216for a zero shot model it's surprisingly decent. In their GitHub, do they provide a UI with emotional control or is it just bare minimum of "audio reference in, tts out"?
https://github.com/fluxions-ai/vui
https://huggingface.co/fluxions/vui
has voice cloning ability
>You can clone with the base model quite well but it's not perfect as hasn't seen that much audio / wasn't trained for long
What's the best tts for mares? I know elevenlabs is the best overall but I'm wondering how good it is for ponies
>>42261160 For local operation, it's still GPT-SoVITS. I don't use paid online services so lmao on that one.
>>42223265 But I guess this one could beat it, once they make it public. Having their TTS model integrated with SillyTavern would honestly kick some serious ass.
>>42260489There's emotion control to a degree, you just put one of the tags in parentheses at the start. There's only a limited amount of valid tags and it can only go so far, and I haven't personally been able to use multiple in a single gen since it just says the word but YMMV
>>42261547>only one emotional tag controloh, this sucks donkey balls, I was hoping we could finally have a model that can do advanced sentence styles, e.g. whispering with a mix of anger and confusion.
>>42261622Yeah, honestly sounds like a convoluted way to say they have multiple individual models compounded, each trained on one particular emotion, with the parentheses determining which underlying model is used for synthesis.
>>42261622>>42261832Well like I said, your mileage may vary. I haven't been experimenting with it nearly as much as I should, and it could very well support that. I saw an example somewhere else of Pearl from SU reading the best thing about meatballs meme and the voice there was pretty varied emotionally and realistic. To be fair, they might have been using the full model which is only available through their website, but I wouldn't knock it before trying it on the smaller model. Using my GPU for other purposes at the moment so someone else will have to test.
>>42261160Is there some kind of library with voice clips I can use to make pony models in ElevenLabs?
>>42262716mega links in OP?
>>42262326Cute bump mare.
>>42261160https://15.dev/
bump due to too much spam on the board
Is openaudio s1 the best thing right now? I copied random text from a mod page. The pronunciation is pretty good, although imo a little too neutral.
>>42265042Audio quality seems the best, pronunciation is really good as long as it's not a weird made up word. Emotions are pretty meh.
https://vocaroo.com/1l7fRlI0qtqn
>>42264548No trolls please
https://x.com/elevenlabsio/status/1933188969279500459
This is starting to get sad...
>>42268941I only have one gpu that's already too outdated for all this kind of technological novelty. I already had to throw away a few ideas for song covers because random song leakage / dual vocals was fucking with the conversion process.
>>42218755>pukes at the end
>>42269737There is only one thing we can do, we cook...I mean we make pony content. I was thinking of doing an "X pony makes a review about fics/books" in a similar theme/feel to Rainbow Dash Presents.
With SparkTTS, voices can be cloned with even just a few seconds of audio. This allows the cloning of background characters like TwinkleShine. What I like to do is feed ai generated voices into elevenlabs in order to get a higher quality model. Love what you guys are doing!
Anyone else here that thinks about the possibilities of AGI pretty consistently?
I don't know exactly how much overlap there is between this corner of the fandom and technological singularity enthusiasts.
>>42270729I'm always dreaming of Bicentennial Man level of AGI. Just another race of sentient beings but they're Robots! but I have no idea if we'd ever reach a singularity event or even if we do, what are the true possibilities?
>>42270729in my unprofessional opinion we don't currently have the tech and materials to make something that would work as proper AGI. At best it will just be more polished versions of LLMs that are so good at pretending to sound like people that it will be next to impossible to distinguish them from people. I do think people in the next century will make some new type of processors/programming/something else that could make computers think and feel for real, but by that time the world and society will have changed so much that there isn't even a point in guessing what it would look like (just like trying to explain the wonders of tech from the ancient Roman Empire to a caveman).
>>42269957This. You must use the pone to save the pone
>>42262734I've tried to use the audio clips but my models sound like shit. Does anyone have some pre-made audio clips I can use for ElevenLabs that's worked well for them?
>>42272674>models sound like shitno idea what script you are using, but everyone and every company that has pony voice conversion/tts is using the exact same clips from PPP.
if you are using some new experimental cloning scripts, these will require 10s clips, so if you give them just a 3s clip the result will sound like shit.
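To catch clips that are too short before feeding them to a cloning script, something like the sketch below can scan a folder. It only uses Python's standard `wave` module, so it handles plain PCM .wav files only. The 10 second threshold is just the figure from the post above, and the function names and folder path are made up for illustration.

```python
import wave
from pathlib import Path

MIN_SECONDS = 10.0  # assumption: figure quoted in the thread, not from any tool's docs

def wav_duration_seconds(path):
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def find_short_clips(folder, min_seconds=MIN_SECONDS):
    """List (filename, duration) for every .wav in folder shorter than min_seconds."""
    short = []
    for path in sorted(Path(folder).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if duration < min_seconds:
            short.append((path.name, duration))
    return short

if __name__ == "__main__":
    # "clips" is a placeholder folder name.
    for name, duration in find_short_clips("clips"):
        print(f"{name}: {duration:.1f}s is under {MIN_SECONDS:.0f}s")
```

Short clips from the same character can then be joined into a longer reference instead of being used one at a time.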
>>42272674>ElevenLabs>Models sound like shitSo nothing new then
>>42272932>https://u.pone.rs/LvFcybeH.mp3surprise horsefuckers, I got some spare time and converted a song from my buddy to Moon Dancer vocals, enjoy.
OG song: https://suno.com/song/eae162d0-cbbb-433a-8008-5fab7bee01ba
>>41070370Is there a chance anybody here has archived this before it was deleted?
>Background Pony - "OUT OF APPLES" - Hall 'n Oates - Out of Touch (MLP Applejack AI cover)this was its title if it helps anybody find it
>>42272674ElevenLabs is shit. Just use 15.ai.
>https://u.pone.rs/EuipipDV.mp3
American (Dad) Ghost theme
I downloaded this in 2021, it's been 4 years now. How much has it improved since then?
https://vocaroo.com/11NtyOrTttKN
>>42276071He's right though. EL is arse.
I want to take the costanza answering machine song and change the words while maintaining his voice. What's the most appropriate model to do this with?
>>42279159>keeping the og voice but slightly editedHmm, that will be a bit tricky. If you can find a version without a laugh track, you can try running the clip through ace-step
>https://huggingface.co/spaces/ACE-Step/ACE-StepThis should let you use the function to partly edit the lyrics without changing the music (or so that's the general idea).
The other alternative is to find some clean clips (or de-noise them with some ai program) of costanza singing in the same tune as in the show, train rvc on that 2~3 minutes of dataset, use some other character's talknet/whatever model to sing the whole song, and apply it to the official soundtrack
>https://www.youtube.com/watch?v=1ghIoM89cfc&list=RD1ghIoM89cfc>>42278429>from previous year>https://u.pone.rs/DFPTbUhe.mp3Dude, the tech jump feels like going from writing books by hand to using a printing press. Depending on what you are trying to use it for, it will most of the time sound about 95% like the character is supposed to sound.
>https://u.pone.rs/FHniGgaQ.mp3
Pinkie Pie - At God's Mercy (GAME SIZE)
>https://u.pone.rs/dyjpaZQU.mp3
Rainbow_Dash_sings_Land_of_Shattered_Dreams_by_DragonForce
>No Nurse Redheart on 15.ai
Boycotting 15
>>42274933Six years of saving songs comes in handy sometimes. https://files.catbox.moe/gwqv9m.mkv
>>42283796>FilenameA philosophy to live by.
>>42283796nta but thank you archive-kun anon
>https://u.pone.rs/MOQrKwwX.mp3
Redoing Cossacks letter with gpt sovits.
>https://huggingface.co/collections/kyutai/speech-to-text-685403682cf8a23ab9466886
kyutai have posted their speech-to-text models on hugging face (they're the people who made the https://unmute.sh/ site). Hopefully they will get around to publishing the TTS model some time soon.
>>42253243I came back with some samples from my button's mom dataset that I used the following on:
De-Breath
De-Esser
Mouth De-Clicker
Plosive Remover
>Original Sampleshttps://files.catbox.moe/68yrm2.wav
>Processed Sampleshttps://files.catbox.moe/0d3djz.wav
Again, I read that the software is completely open source and released to the public domain, so no one owns the rights to it or what it makes. It should be perfect for processing data without spending money on iZotope. You be the judge on how effective it is, I'd say it's good enough to shovel multi-hour datasets through for free in one go and clean up whatever is left afterwards.
>>42287174Cool stuff! With its apparent noise and reverb removal capabilities I may have to test how good it is at salvaging previously unusable data to see if existing pony models might be expanded. Gotta first test if it works well through Wine though. I wonder if I might be able to salvage more workable Redheart data.
>>42287401Hell yeah brother! That's what it's all about! There's got to be so much ponyfeather quality audio data that could have been fine with just a pop filter, and this should fix it for posterity.
Does anyone know what TTS service is best to use with SillyTavern?
>>42289075uhhh, i vaguely remember there was a plugin script (or api script?) that could connect ST with some tts that could even be trained on 10~20 minutes of dataset, but that was a year or more ago, and even then I personally gave up on it as the python dependency hell was impossible to navigate to even install that bloody thing.
>>42283796SUPERCHARGED anon, thank you
>>42283796Nice! I think I have about that in pony memes and art among others from years of saving, which come to think of it I still need to find time to sort and categorise. Thanks for the reminder.
>>42283796Autism yields its own rewards.
Nice.
gn, imma going to think of what stuff to make tomorrow
>>42174105Do we know if there are any other recent local audio and music generators comparable to the likes of Suno and Udio?
Aside from this example, I haven't come across a decent versatile one that can run locally since Bark, which was abandoned ages ago (as far as open source goes) and became Suno. Which is still incredibly good, but it'd be nice to have something similar that doesn't rely on credits and lame stuff like that.
>>42298627Stability AI may or may not be working on one, but who the fuck knows with them since they still haven't published the newer version of the instrumental Stable Audio model.
The other ai song model is YuE, but from the looks of it it's a bit tricky to get working locally.
>>42161191 (OP)Congratulations, 1111 aka 15!
>https://u.pone.rs/kLAzyDaA.mp3
New ai song, "I only eat 3 cheeseburgers!" from suno user ๊น์น๋ค์๋ง์๊ฐ์น, and converted with Twi vocals.
>>42300581we sell hay here not burgers
>>42300352What are you referring to?
>9
Eighth bump mare deployed
>>42300581Could go for some burgers right about now
>>42302294Thank you, kind bump mare.
>https://www.tomshardware.com/news/gddr6-vram-prices-plummet
>16 gb of vram could be as cheap as 400$
>but it wouldn't because nvidia are greedy fucks
i will never forgive the crypto bros for fucking up the market
Board is moving lightning fast this past hour.
>>42305431it's the sliderfag
>>42305442Yep, it's becoming more and more blatant every time.
>>42305442With the lack of reaction from jannies and mods (as they are too busy jerking off to furry fag shit), I'm feeling like it could be a good idea to keep a parallel thread on nhnb and mlpol too, to at least keep some bits in case the board keeps getting nuked.
Anyone know how to get 15 ai to scream? Tried to use so-vits on haysay with audio but it came out like crap. Need Lyra doing it too, and so-vits doesn't have her.
>>42307024Uhhh, tts models have pretty much always struggled with screaming and whispering. The older 15 model could do it to some smaller degree (but it was still a massive game of rerolling the generated clip until you got what you wanted). I guess you could try to find a screaming clip in the OP mega and use that as the reference for gpt sovits TTS?
>>42307024Convincing screams and other less-phonetic sounds have been notoriously difficult since the very beginning of artificial speech. Feels like it comes down to a lack of data, or to that kind of audio being specifically excluded because of the negative impact it has on training.
Closest thing I can suggest is priming. Start the prompt with a sentence (or several) of dialogue that would ordinarily be expected to be said with intensity; be that anger, seriousness, shock, whatever. The AI likes to be consistent with outputs, so some of that emotion will carry over to subsequent sentences. This is where you'd attempt screaming dialogue. Might also be good to try using ARPAbet for some of it so it's pronounced correctly.
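A rough sketch of the priming idea, assuming nothing about any particular TTS: it only builds the text prompt by putting the intense lead-in sentences in front of the line you actually want. The function name and example lines are made up for illustration, and the primer audio still has to be trimmed off the generated clip by hand.

```python
def build_primed_prompt(primer_sentences, target_line):
    """Join intense lead-in sentences in front of the line you want delivered."""
    primer = " ".join(s.strip() for s in primer_sentences if s.strip())
    return f"{primer} {target_line.strip()}".strip()

if __name__ == "__main__":
    prompt = build_primed_prompt(
        ["Get out!", "I said get out of here, right now!"],
        "AAAAAAAAA!",
    )
    print(prompt)
```

Rerolling a few times with different primer sentences is still likely to be needed, since how much of the emotion carries over will vary between models.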
>>42307277>>42307279Thanks for the suggestions. I ended up just regenerating an "AAAAAAAAA" prompt a bunch of times until I got as close as I could to a scream. Sounds like shite, but it was only for a little shitpost anyway. https://files.catbox.moe/z2r0c8.mp3
Which is for this pic in /bale/
>>42305975
>>42308681huh, pretty neat work Anon
>>42207220https://files.catbox.moe/fv2v5u.wav
https://files.catbox.moe/aeqloc.wav
https://files.catbox.moe/xl6ft5.wav
Here's some with Flutters. I just did:
"ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, cumming! ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, fuck me, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh, ahh!"
You can hear the good parts and splice those.
>>42310164ai mares are lewd
>https://u.pone.rs/NlnRoRSa.mp3
Ghost singing Past Due - Xenophobia (aka unofficial theme song of Stellaris)
>>42310923A classic. Let the light of mankind shine brighter than the stars themselves
I might make a small lewd audio of Twiggle as a test for 15.dev.
Dialogue's a pain to get to sound natural, way more than 15ai's last version.
>9
Deploying ninth bump mare (triple pose edition)
>>42313719kek, a race to the bottom. What kind of sketchy indians will we reach when we hit 1.ai?
>>42314271Interestingly, hyphens can't be used at the start or end of a domain name. Would probably have to be negative1.ai or something
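The hyphen rule can be checked with a small sketch like this one, which tests a single domain label (the part before .ai) against the usual letters-digits-hyphens rule: 1 to 63 characters, no hyphen at either end. The function name is made up, and individual registries may add their own restrictions on top of this.

```python
import re

# Letters, digits and hyphens only; 1-63 characters; no hyphen first or last.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_label(label):
    """Return True if label is a syntactically valid hostname label."""
    return LABEL_RE.fullmatch(label) is not None

if __name__ == "__main__":
    for label in ("15", "1", "-1", "minus1", "negative1"):
        print(label, is_valid_label(label))
```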
>>42315665early sleep bump
>>42314359Or simply minus1.ai. It's kind of a word play.
>>42317225Clever. I like it.
A very quick cover of Beatles' With a Little Help from My Friends with slightly modified lyrics
https://u.pone.rs/ODLJbBek.flac
>>42317966Nice work Anon! Funny enough, I listened to some random Beatles song a week ago and wished there were some covers or parodies done in pony voices.
Hi HydrusBeta, I'm getting an error when using the sovits 4.0 Spitfire model with the 'reduce hoarseness' and 'apply nsf_hifigan' settings, and it works if I turn these two settings off.
>https://u.pone.rs/KbiNvzqK.mp3
Solitary Summer Dream by suno user testediserie.
I was looking for a nice summer song for Celestia and found myself really enjoying listening to this, BUT rvc and the other voice converters disagreed with my vocal choice, so we all get to enjoy a Spitfire cover, since her voice hasn't been used that much.
What's the current torrent for the MLP leak files?
>42119384 42196683 42317225
Yet it is proper to enumerate as such among the Trotting ways.
>42161222 42269737 42208841
ppp as tragedy of the commons
Things fall apart, the centre cannot hold - Yeats
pandora's vox on community in cyberspace - humdog
yet... n mare saddlepoint? The altchans apart were less a scattering of the winds and more of the Shattered sundered.
>42204138 42198701 42195922
The Cathedral and the Bazaar - Raymond, acknowledging Tarver's Bizarre Empty Temples.
Cathedral vs. Parlor - Wrye, acknowledging Monitor144hz's Patreon Pigeonhole.
Tamers1-4,5 voices when?
>42270729
It's been a long thread. Bacon-bakin' necessary.
>>42313761Who are these dunces?
>>42319348Anon, are you trying to conceptualize LLM into becoming CelestAI ?
>>42319606If it works, that would be something.
>>42319606Boop her snoot.