Thread #106022276 - /g/

Anonymous

7/25/2025, 7:59:33 PM No.106022276

elevenlabs

md5: e01fd6b716af52cde2fa9294192f2059

Is there any local alternative? This shit is way too expensive for voice cloning.

Replies: >>106026141 >>106026881 >>106031102 >>106032581 >>106033503

Anonymous

7/25/2025, 10:50:51 PM No.106025090

I don't know, but I'm curious as well. Have a bump, although I don't expect much, everyone is currently busy with that app leak thing.

Replies: >>106030478

Anonymous

7/25/2025, 11:44:16 PM No.106026141

>>106022276 (OP)
Voice cloning is easy as shit with RVC (just gotta have at least 2 hours of isolated voice samples and leave your computer running for like a week to train a 1000 epoch model).

The REAL bullshit is that no local model even comes close to being as expressive as ElevenLabs' TTS (even worse if you need more languages than just English). Someone should just take a lawsuit for the team and leak that shit already, god DAMN.

Anonymous

7/26/2025, 12:33:59 AM No.106026881

DC Hamer Carl's Jr commercial (Don't bother me; I'm eating) voice over_thumb.jpg

md5: a3dc16a336d21bd1d519f2c6c41c3468

>>106022276 (OP)
I use OpenAudio S1 Mini for local text-to-speech and Seed-VC for voice conversion. OpenAudio S1 Mini eats up 5GB of VRAM and a fine-tuned Seed-VC model eats up 2GB.
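
As a rough sanity check on those numbers, here is a minimal sketch that adds up the quoted VRAM figures and compares them against a card's capacity. It assumes both models stay loaded at the same time; `fits_in_vram` and the 1 GB headroom are made up for illustration, not measured.

```python
def fits_in_vram(model_usage_gb, gpu_vram_gb, headroom_gb=1.0):
    """Return (fits, needed_gb) for a set of models loaded together.

    headroom_gb is an assumed safety margin for the driver, desktop, etc.
    """
    needed_gb = sum(model_usage_gb.values()) + headroom_gb
    return needed_gb <= gpu_vram_gb, needed_gb


# Figures quoted in the post above.
usage = {"OpenAudio S1 Mini": 5.0, "Seed-VC (fine-tuned)": 2.0}

fits_8gb, needed = fits_in_vram(usage, gpu_vram_gb=8.0)
fits_6gb, _ = fits_in_vram(usage, gpu_vram_gb=6.0)
print(f"needed: {needed} GB, fits in 8 GB: {fits_8gb}, fits in 6 GB: {fits_6gb}")
```

If the two models are run one after the other instead of together, only the larger figure matters.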

Input audio of Sports Commentator sample file from the DMOSpeech 2 demo GitHub page
https://vocaroo.com/1kKJm37RDYFI
OpenAudio S1 Mini output file
https://vocaroo.com/1bQiLbbhKfZO
Input audio of Doc Hammer (Carl's Jr ad announcer)
https://vocaroo.com/19XnClS0JbMi
OpenAudio S1 Mini output file
https://vocaroo.com/13JjY0eyTLon
Input audio of unknown Carl's Jr ad announcer
https://vocaroo.com/1ffVrk1VNmz4
OpenAudio S1 Mini output file
https://vocaroo.com/185jN6BLxiuo
Input audio of Grace Randolph
https://vocaroo.com/1isQM48G8nA9
OpenAudio S1 Mini output file
https://vocaroo.com/14fY7AYlTNvH
OpenAudio S1 Mini output file fed to a fine-tuned Seed-VC model. Sample rate is 22050 Hz.
https://vocaroo.com/1ayPfkoXu95U

Replies: >>106027170

Anonymous

7/26/2025, 12:54:59 AM No.106027168

What is the general consensus on AI voice models for audiobooks? I generated this locally in a matter of seconds.
https://vocaroo.com/1krh0SBOtMeA

Replies: >>106030243 >>106030299

Anonymous

7/26/2025, 12:55:05 AM No.106027170

>>106026881
>OpenAudio S1
Does it work with other languages, like Spanish and Japanese?

Anonymous

7/26/2025, 5:55:41 AM No.106030243

>>106027168
Using what model? Doesn't sound too bad, though I'd prefer a female voice

Replies: >>106033279

Anonymous

7/26/2025, 6:01:07 AM No.106030299

>>106027168
>Narration slop
Any piece of shit model can do that nowadays. Make him angry, make him laugh, make him yell like he's pissed off. Hell, make a girl sound horny. Only then will I be impressed.

Anonymous

7/26/2025, 6:24:02 AM No.106030478

>>106025090
What leak?

Anonymous

7/26/2025, 8:00:53 AM No.106031102

>>106022276 (OP)
Chatterbox is good.

GitHub repo is /resemble-ai/chatterbox

Replies: >>106031140

Anonymous

7/26/2025, 8:08:25 AM No.106031140

>>106031102
I keep meaning to try to figure out how to take a politician giving a speech, and voiceover audio from a porn in that pol's voice.

Obv needs to be local.

>Coomer

Replies: >>106031188

Anonymous

7/26/2025, 8:20:07 AM No.106031188

>>106031140
use ffmpeg.

literally does everything - splits video from audio, combines video from audio, etc.

1) get clear audio of politician.
2) process audio with chatterbox or coqui/XTTS-v2
3) get your text
4) generate new audio
5) get video
6) find the scene you want to cut and split video into scenes
7) split audio from video
8) burn audio onto scene
9) reassemble video scenes
10)???
11) profit

chatgpt is your friend

Anonymous

7/26/2025, 12:31:20 PM No.106032581

>>106022276 (OP)
I remember there being a completely free voice cloner a few months back (hosted on Hugging Face, I think). I lost the link, but in my experience it was better than literally every single paid service. It would fuck up 10% of the time, but it only needed 15-60 seconds of a voice clip and generation was pretty fast.

Anonymous

7/26/2025, 2:24:14 PM No.106033279

>>106030243
It's Kokoro 82M. It has plenty of female voices, and some of them are great.

Replies: >>106033389 >>106035126

Anonymous

7/26/2025, 2:37:34 PM No.106033389

>>106033279
My GPU only has 16GB, anon :(
And the 30GB one doesn't sound nearly as great.

Replies: >>106033656

Anonymous

7/26/2025, 2:51:10 PM No.106033471

F5-TTS works pretty well if you just need TTS. Is there voice changing which works well and is local and free?

Anonymous

7/26/2025, 2:55:48 PM No.106033503

>>106022276 (OP)
YWNBAW troon

Replies: >>106033546

Anonymous

7/26/2025, 3:01:25 PM No.106033546

>>106033503
We're getting closer every day. Don't worry anon, you too could be the little girl one day.

Anonymous

7/26/2025, 3:20:56 PM No.106033656

>>106033389
Anon, I'm running it with a 6GB GPU through Speech Note.
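
For context, the "82M" in Kokoro 82M is the parameter count (82 million), not a size in gigabytes. A back-of-the-envelope sketch of the weight footprint, assuming 4 bytes per parameter for fp32 and 2 for fp16; `weights_size_mb` is just an illustrative helper, and real VRAM use will be higher once activations and runtime overhead are counted:

```python
def weights_size_mb(param_count, bytes_per_param):
    """Approximate size of the model weights alone, in megabytes (10^6 bytes)."""
    return param_count * bytes_per_param / 1e6


KOKORO_PARAMS = 82_000_000  # "82M" = 82 million parameters

fp32_mb = weights_size_mb(KOKORO_PARAMS, 4)  # 32-bit floats
fp16_mb = weights_size_mb(KOKORO_PARAMS, 2)  # 16-bit floats

print(f"fp32 weights: ~{fp32_mb:.0f} MB")
print(f"fp16 weights: ~{fp16_mb:.0f} MB")
```

Either way the weights are a few hundred megabytes, so a 6GB or 16GB card has plenty of room.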

Anonymous

7/26/2025, 6:05:28 PM No.106035126

cailee spaeny in civil war

md5: 61e3c23a04ac2b7261b973e3ec262b95

>>106033279
I can't install it, it just pulls thousands of dependencies from pip and then suddenly fails to compile "misaki". It's so frustrating. Why are AI bros so incompetent?

Replies: >>106035941

Anonymous

7/26/2025, 7:18:52 PM No.106035941

>>106035126
try using docker

Replies: >>106036365 >>106038205

Anonymous

7/26/2025, 7:57:43 PM No.106036365

>>106035941
Docker is bloat.

Anonymous

7/26/2025, 10:29:26 PM No.106038205

>>106035941
Why do we need inference to do TTS? Wouldn't it be easier to use a different approach, like the old speech synthesizers from the 80s or 90s?
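
For reference, the kind of thing those older synthesizers did can be sketched in a few lines: a buzzy source signal pushed through a few resonators tuned to vowel formants. This is only a toy illustration of formant synthesis, not how any particular 80s product worked. The formant frequencies are textbook-style values for an "ah" vowel; the bandwidths and all function names are assumptions picked for the example.

```python
import math
import struct
import wave


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Second-order digital resonator that emphasizes one formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, f0=120.0, duration=0.5, sample_rate=22050):
    """Return a steady vowel as floats in [-1, 1]."""
    n = int(duration * sample_rate)
    # Crude voice source: a sawtooth wave at the pitch frequency f0.
    signal = [2.0 * ((i * f0 / sample_rate) % 1.0) - 1.0 for i in range(n)]
    # Cascade one resonator per formant (frequency, bandwidth).
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


def write_wav(path, samples, sample_rate=22050):
    """Save mono 16-bit PCM so the result can be played back."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


# Assumed (frequency Hz, bandwidth Hz) pairs for an "ah"-like vowel.
AH_FORMANTS = [(730.0, 90.0), (1090.0, 110.0), (2440.0, 170.0)]
samples = synthesize_vowel(AH_FORMANTS)
```

Calling `write_wav("ah.wav", samples)` saves the result for listening. It is a flat, robotic vowel, which hints at why this route needs a pile of hand-tuned rules before it sounds like speech, let alone like a specific person.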
