Thread 106022276 - /g/

Anonymous
7/25/2025, 7:59:33 PM No.106022276
elevenlabs
Is there any local alternative? This shit is way too expensive for voice cloning.
Replies: >>106026141 >>106026881 >>106031102 >>106032581 >>106033503
Anonymous
7/25/2025, 10:50:51 PM No.106025090
I don't know, but I'm curious as well. Have a bump, although I don't expect much, everyone is currently busy with that app leak thing.
Replies: >>106030478
Anonymous
7/25/2025, 11:44:16 PM No.106026141
>>106022276 (OP)
Voice cloning is easy as shit with RVC (just gotta have at least 2 hours of isolated voice samples and leave your computer running for like a week to train a 1000 epoch model).

The REAL bullshit is that no local model can even attempt to be as expressive as ElevenLabs' TTS (even worse if you need more languages than just English). Someone should just take a lawsuit for the team and leak that shit already, god DAMN.
Anonymous
7/26/2025, 12:33:59 AM No.106026881
DC Hamer Carl's Jr commercial (Don't bother me; I'm eating) voice over_thumb.jpg
>>106022276 (OP)
I use OpenAudio S1 Mini for local text-to-speech and Seed-VC for voice conversion. OpenAudio S1 Mini eats up 5GB of VRAM and a fine-tuned Seed-VC model eats up 2GB.
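Going by those numbers, here's a rough back-of-envelope check for whether both models fit on a given card at once. This is only a sketch: the 5GB and 2GB figures come from the post, while the 1GB of headroom and the example card sizes are assumptions for illustration, and real usage will vary with batch size and audio length.

```python
# Rough VRAM budget check using the figures quoted in the post.
# ASSUMPTIONS: the 1 GB headroom and the example card sizes are made up
# for illustration; actual usage depends on batch size and audio length.
MODEL_VRAM_GB = {
    "OpenAudio S1 Mini": 5.0,      # figure from the post
    "Seed-VC (fine-tuned)": 2.0,   # figure from the post
}


def fits_on_gpu(gpu_vram_gb, models=MODEL_VRAM_GB, headroom_gb=1.0):
    """Return (fits, needed_gb) if every model is loaded at the same time."""
    needed = sum(models.values()) + headroom_gb
    return needed <= gpu_vram_gb, needed


if __name__ == "__main__":
    for card_gb in (6, 8, 16):
        ok, needed = fits_on_gpu(card_gb)
        verdict = "fits" if ok else "too small"
        print(f"{card_gb} GB card: need ~{needed:.0f} GB -> {verdict}")
```

If the card is too small, running the two models one after the other instead of keeping both loaded brings the peak down to the larger of the two.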

Input audio of Sports Commentator sample file from the DMOSpeech 2 demo github page
https://vocaroo.com/1kKJm37RDYFI
OpenAudio S1 Mini output file
https://vocaroo.com/1bQiLbbhKfZO
Input audio of Doc Hammer (Carl's Jr Ad announcer)
https://vocaroo.com/19XnClS0JbMi
OpenAudio S1 Mini output file
https://vocaroo.com/13JjY0eyTLon
Input audio of unknown Carl's Jr Ad announcer
https://vocaroo.com/1ffVrk1VNmz4
OpenAudio S1 Mini output file
https://vocaroo.com/185jN6BLxiuo
Input audio of Grace Randolph
https://vocaroo.com/1isQM48G8nA9
OpenAudio S1 Mini output file
https://vocaroo.com/14fY7AYlTNvH
OpenAudio S1 Mini output file fed to a fine-tuned Seed-VC model. Sample rate is 22050 Hz
https://vocaroo.com/1ayPfkoXu95U
Replies: >>106027170
Anonymous
7/26/2025, 12:54:59 AM No.106027168
What is the general consensus on AI voice models for audiobooks? I generated this locally in a matter of seconds.
https://vocaroo.com/1krh0SBOtMeA
Replies: >>106030243 >>106030299
Anonymous
7/26/2025, 12:55:05 AM No.106027170
>>106026881
>OpenAudio S1
Does it work with other languages, like Spanish and Japanese?
Anonymous
7/26/2025, 5:55:41 AM No.106030243
>>106027168
Using what model? Doesn't sound too bad, though I'd prefer a female voice
Replies: >>106033279
Anonymous
7/26/2025, 6:01:07 AM No.106030299
>>106027168
>Narration slop
Any piece of shit model can do that nowadays. Make him angry, make him laugh, make him yell like he's pissed off. Hell, make a girl sound horny. Only then will I be impressed.
Anonymous
7/26/2025, 6:24:02 AM No.106030478
>>106025090
What leak?
Anonymous
7/26/2025, 8:00:53 AM No.106031102
>>106022276 (OP)
chatterbox is good

GitHub repo is /resemble-ai/chatterbox
Replies: >>106031140
Anonymous
7/26/2025, 8:08:25 AM No.106031140
>>106031102
I keep meaning to try to figure out how to take a politician giving a speech, and voiceover audio from a porn in that pol's voice.

Obv needs to be local.

>Coomer
Replies: >>106031188
Anonymous
7/26/2025, 8:20:07 AM No.106031188
>>106031140
use ffmpeg.

literally does everything - splits video from audio, combines video from audio, etc.

1) get clear audio of politician.
2) process audio with chatterbox or coqui/XTTS-v2
3) get your text
4) generate new audio
5) get video
6) find the scene you want to cut and split video into scenes
7) split audio from video
8) burn audio onto scene
9) reassemble video scenes
10)???
11) profit

chatgpt is your friend
Anonymous
7/26/2025, 12:31:20 PM No.106032581
>>106022276 (OP)
I remember there being a completely free voice cloner a few months back (hosted on Hugging Face, I think). I lost the link, but in my experience it was better than literally every single paid service. It would fuck up 10% of the time, but it only needed 15-60 seconds of a voice clip and generation was pretty fast.
Anonymous
7/26/2025, 2:24:14 PM No.106033279
>>106030243
It's Kokoro 82M. It has plenty of female voices, and some of them are great.
Replies: >>106033389 >>106035126
Anonymous
7/26/2025, 2:37:34 PM No.106033389
>>106033279
My GPU only has 16GB anon :(
And the 30GB doesn't sound nearly as great
Replies: >>106033656
Anonymous
7/26/2025, 2:51:10 PM No.106033471
F5-TTS works pretty well if you just need TTS. Is there a voice changer that works well and is local and free?
Anonymous
7/26/2025, 2:55:48 PM No.106033503
>>106022276 (OP)
YWNBAW troon
Replies: >>106033546
Anonymous
7/26/2025, 3:01:25 PM No.106033546
>>106033503
We're getting closer every day. Don't worry anon, you too could be the little girl one day.
Anonymous
7/26/2025, 3:20:56 PM No.106033656
>>106033389
Anon, I'm running it with a 6GB GPU through Speech Note.
Anonymous
7/26/2025, 6:05:28 PM No.106035126
cailee spaeny in civil war
>>106033279
I can't install it, it just pulls thousands of dependencies from pip and suddenly fails to compile "misaki". It's so frustrating. Why are AI bros so incompetent?
Replies: >>106035941
Anonymous
7/26/2025, 7:18:52 PM No.106035941
>>106035126
try using docker
Replies: >>106036365 >>106038205
Anonymous
7/26/2025, 7:57:43 PM No.106036365
>>106035941
Docker is bloat.
Anonymous
7/26/2025, 10:29:26 PM No.106038205
>>106035941
Why do we need inference to do TTS? Wouldn't it be easier to use a different approach, like the old speech synthesizers from the 80s or 90s?
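For context on what the old approach looks like: many of those synthesizers were rule-based formant synths, which run a buzzy source through a few resonant filters with no model inference at all. Below is a minimal sketch of that idea in plain Python, in the spirit of a cascade formant synthesizer. It is not a reimplementation of any particular product. The formant frequencies and bandwidths for an "ah" vowel are ballpark textbook-style values, and the sample rate, pitch, and file name are assumptions for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz, assumed for this sketch

# ASSUMPTION: ballpark (frequency_hz, bandwidth_hz) pairs for an "ah" vowel.
AH_FORMANTS = [(700, 130), (1220, 70), (2600, 160)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Second-order resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synth_vowel(formants, f0_hz=110.0, seconds=0.5, sample_rate=SAMPLE_RATE):
    """Impulse-train source passed through the formant resonators in series."""
    n = int(seconds * sample_rate)
    period = int(sample_rate / f0_hz)  # samples per pitch period
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalize to the range [-1, 1]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM audio."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav.writeframes(frames)


if __name__ == "__main__":
    write_wav("ah.wav", synth_vowel(AH_FORMANTS))  # hypothetical file name
```

This gives you a robotic, monotone vowel. Getting intelligible words out of it means hand-writing rules for how every formant moves over time, which is roughly why these synths sound the way they do and why voice cloning went the neural route.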