
Thread 106497597

352 posts, 64 images
Anonymous No.106497597 >>106497859 >>106499477 >>106501214 >>106501583
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106491545 & >>106481874

►News
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager
>(09/04) Google released a Gemma embedding model: https://hf.co/google/embeddinggemma-300m
>(09/04) Chatterbox added better multilingual support: https://hf.co/ResembleAI/chatterbox

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106497599
►Recent Highlights from the Previous Thread: >>106491545

--Klear-46B model training methodology and benchmark performance analysis:
>106492824 >106492846 >106492855 >106492872 >106492877 >106492882 >106492885 >106492903 >106493017 >106493058 >106493088
--AI-generated loli podcast creation using VV voice cloning and GLM text generation:
>106495961 >106495966 >106496018 >106496034 >106496055 >106496061 >106496121 >106496139 >106496144 >106496157 >106496189 >106496197 >106496208
--German supercomputing expansion and copyright law compliance challenges:
>106493305 >106493329 >106493355 >106493378 >106493423 >106493481 >106494001 >106493977 >106493529
--Qwen Max model updates and community collaboration efforts:
>106491646 >106492302 >106492366 >106492394 >106492411 >106492421 >106492430 >106492428
--Balancing data quality and diversity in machine learning training:
>106492910 >106492929
--VibeVoice-Large's capabilities and controversy:
>106494251 >106494424 >106494708 >106494778 >106495648 >106494801 >106494950 >106495166 >106495298 >106495187 >106495273 >106495566 >106495590 >106495612 >106495637 >106495639 >106495671 >106495689 >106495101
--Challenges with managing R1's censorship and card-based context switching:
>106493572 >106495514
--Temperature settings tradeoff between tool call accuracy and answer quality in local LLMs:
>106491720 >106491751 >106491761 >106491845 >106491888 >106491989
--Gender bias in doctor riddle from Qwen3-Max-Preview:
>106493573 >106494565 >106494593 >106496265
--Qwen3-Max-Preview (Instruct) outperforms peers in benchmark tests:
>106492622 >106492630 >106492638
--VibeVoice model optimization challenges for single-voice applications:
>106496609 >106496636 >106496646
--Analyzing Qwen3 Max's distinctive generation style:
>106493524
--Miku (free space):
>106493154 >106493503 >106493190 >106494251 >106497578

►Recent Highlight Posts from the Previous Thread: >>106491549

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106497662
Mikulove
Anonymous No.106497689
You have them, right?
Anonymous No.106497859 >>106499352
>>106497597 (OP)
geez Peter, TWO mics?
Anonymous No.106497883
>A loli whispers in my ear
THANK YOU MICROSOFT
AHAHAHAHA
Anonymous No.106497909 >>106497916 >>106497930 >>106497999
I hate microsoft.
Xi please release the same or better model.
Please your parade was very impressive.
Anonymous No.106497916 >>106497927
>>106497909
the only good chinese model is wan
Anonymous No.106497927 >>106497963
>>106497916
>the only good chinese model is wan
The only good local model is wan? Are you baiting me?
Anonymous No.106497930
>>106497909
here's your chinese tts bro
https://www.youtube.com/watch?v=mnfLp9O96ak
Anonymous No.106497963
>>106497927
model, deepseek / kimi are nowhere near claude / gpt / gemini
Anonymous No.106497992 >>106498193 >>106498240 >>106498251
where do I get good voice clone clips
Anonymous No.106497999
>>106497909
The model is from MSRA in Beijing with a full Chinese team
It's by all means a Chinese model
Anonymous No.106498009 >>106498122
what are some absolutely necessary loli voices I should be cloning right now?
Anonymous No.106498122
>>106498009
Aya Hirano
Anonymous No.106498148 >>106498210 >>106498297
holy shit there are no good online tools for making multiple cuts to an mp3 file lmfao
Anonymous No.106498156 >>106498163
Where can I get removed VibeVoice large?
Anonymous No.106498163
>>106498156
click it then press delete, then go to your recycle-bin and empty it
Anonymous No.106498193
>>106497992
youtube
Anonymous No.106498210 >>106498226 >>106498228 >>106498297
>>106498148
ask your favorite llm about ffmpeg
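for reference, the commands it'll hand you look something like this (paths and timestamps are placeholders; the stream copy only works if the output container matches the source codec):

# rip the audio track out of a video without re-encoding
ffmpeg -i episode.mkv -vn -c:a copy audio.m4a
# cut 1:23 to 1:53 out of it as an mp3
ffmpeg -i audio.m4a -ss 00:01:23 -to 00:01:53 clip.mp3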
Anonymous No.106498215
Looking for suggestions for an uncensored lite local model to use on my phone. Purely informational.
Anonymous No.106498226 >>106498297
>>106498210
this, gpt5's codex / claude code automate so much stuff for me, I just ask it to make some script to do something and it takes like a minute
Anonymous No.106498228 >>106498241
>>106498210
nah, ffmpeg does a lot, but if you need to cut up an audio file a whole bunch of times to remove voices from other characters or sounds, you really need a GUI to plan the cuts and, you know, not type a billion things into a terminal constantly. found some site called soundtrap that can do what I want. if I get into this enough then I'll just download audacity or something
Anonymous No.106498240 >>106498271
>>106497992
just make your own in audacity. Most good TTS only want 20-30 seconds, so focus on quality above all. The audio should be clean with no noise. This is where most people fuck up, because the voice they are trying to clone doesn't have good audio sources (music, sound effects, static, background noises, traffic etc).

People do share models but you'll eternally be using taylor swift, peter griffin, etc.
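if you're prepping a clip by hand, a single ffmpeg pass gets it into a sane reference format (filenames are made up; mono and a 30s cap are safe bets, the 24 kHz sample rate is a guess so check what your TTS expects):

ffmpeg -i raw_clip.mp3 -ac 1 -ar 24000 -t 30 reference.wav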
Anonymous No.106498241
>>106498228
tell it to make you a tool for doing that. GPT5 can one shot it
Anonymous No.106498251 >>106498271
>>106497992
>Pirate TV show/movie
>Extract audio with ffmpeg
>Trim down to the bits you need
>?????
>Profit
Anonymous No.106498265 >>106498269 >>106498319
Is there a local version of nano banana anyone has made? The ones I've seen on hugging face went down quickly
Anonymous No.106498269 >>106498334
>>106498265
nano banana is not a local model, its a google model
Anonymous No.106498271 >>106498301 >>106498303
>>106498240
yeah I figured
>>106498251
am I missing something? is ffmpeg easier to use than I thought for something like this?
Anonymous No.106498297
>>106498148
>>106498210
>>106498226
kek i remember some time ago when i was cutting up the audio for some other tts, i was too lazy to install kdenlive or some other shit so i asked deepseek for ffmpeg. idk what happened but shit didn't work (think i installed it wrong or something) so i just asked it to instead make a powershell script, which just worked lol XD. literally just put the mp3/mp4 in the folder and give it from which second to cut to which and it does it. fucking awesome how jank you can get with llms, it's really a lot of fun
Anonymous No.106498301
>>106498271
Actually using audacity would probably be easier. I'm so used to the CLI interface that I sometimes forget guis exist.
Anonymous No.106498303
>>106498271
ffmpeg is for nerds who love command lines. If you want usable stuff, use audacity, or maybe even da vinci resolve which will do audio fine for free.
Anonymous No.106498319 >>106498341 >>106498734
>>106498265
the best image model out right now that can be run on your gaming PC is qwen image, you can run it if you have 16gb of vram.

here is a guide from a man who is definitely not a pedo
https://www.youtube.com/watch?v=0yB_F-NIzkc&t=303s
Anonymous No.106498334 >>106498345
>>106498269
thank you anon, that's disappointing. is there anything really comparable i can use locally?
Anonymous No.106498341
>>106498319
sorry didn't see this reply, lol this guy looks sus
just want a good model to edit wallpapers with
Anonymous No.106498345
>>106498334
qwen image / qwen edit?
Anonymous No.106498390 >>106498392
Sonoma Sky/Dusk Alpha are likely the next LLaMA or a new Meta line of models (possibly proprietary)
Anonymous No.106498392
>>106498390
lol no, its grok, just ask it, and its shit
Anonymous No.106498412 >>106498653
new kimi is great btw, like actually better than sonnet imo
Anonymous No.106498428 >>106498434 >>106498959 >>106499389
hmm, it gets pretty schizo at 1.3; I tried 1.4 and have tried higher but I dunno.
https://vocaroo.com/1dlL1nEjQeny
said voice clone file
https://vocaroo.com/1orutFZaUpJb
Anonymous No.106498434
>>106498428
10 steps are far too few, try like 50
Anonymous No.106498653
>>106498412
It's shit. I accidentally used v3.1 instead of the new kimi for one of my tests and I actually had a much better time with that before I noticed.
Anonymous No.106498668 >>106498678 >>106498702 >>106498766
Can anyone give me an estimate of how many t/s I could get with pic related on a 5090?
If a 3090 + 96 GB + SSD can run R1 at 0.88 t/s, how much of an increase would it be with 512 GB of DDR5 over PCIe 5.0 x16 + 96 GB of DDR5 + 32 GB of VRAM?
Anonymous No.106498676 >>106498704 >>106499018
I spent 3 hours looking at comfy ui and all this crap because you told me it was easier on us vramlets and I finally got the comfy-UI VibeVoice thingy running and when I try to generate I get stuck on this
>Downloading VibeVoice model: VibeVoice-Large...
>Fetching 17 files: 0%| | 0/17 [00:00
an hour later still stuck there, I force stop comfyUi and restart it and it still gets stuck on that
Anonymous No.106498678
>>106498668
>ddr5
so the same speed as using it on regular ddr5?
Anonymous No.106498702 >>106499735
>>106498668
It's going over PCI-E 5.0 X16 so the hypothetical maximum bandwidth going over that connection is 128GB/s
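Back of the envelope, assuming the active weights cross the bus once per token and nothing else bottlenecks:

t/s ≈ bus bandwidth / bytes of active params per token
≈ 128 GB/s / ~32 GB (a ~32B-active MoE at FP8, 1 byte per param)
≈ 4 t/s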
Anonymous No.106498704 >>106498714 >>106499018
>>106498676
Download from modelscope into ComfyUI/models/tts/VibeVoice-Large.
Anonymous No.106498707 >>106498721
Redpill me on nanobanana
Anonymous No.106498714
>>106498704
>ComfyUI/models/tts/VibeVoice-Large.
*ComfyUI/models/tts/vibevoice/VibeVoice-Large
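if you'd rather script it, the modelscope CLI can drop it straight there (sketch from memory, double-check the flags with modelscope download --help):

pip install modelscope
modelscope download --model microsoft/VibeVoice-Large --local_dir ComfyUI/models/tts/vibevoice/VibeVoice-Large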
Anonymous No.106498721 >>106498735
>>106498707
SOTA but it's likely actually genie3 creating a virtual reality where the prompt is real and taking a virtual photo of that
Anonymous No.106498734
>>106498319
do you think it can be done with 12gb?
Anonymous No.106498735
>>106498721
That sounds incredibly convoluted for what's essentially Photoshop: Gemini Version but it's cool how it works
Anonymous No.106498766
>>106498668
I dunno, R1 was kinda hard to run and I haven't tried it since jan cuz I hate its writing style.

I have a build of 5090, 2x 5060s for 64 VRAM/160 DDR5 (4000 MHz). On linux that got me 5 tokens a second on 4k context full glm q4 with some mmap and maybe using 48gb of vram (hard to balance MoE layers, I suck). Presumably if I bought a proper 256gb ddr5 (6000 MHz) kit, I'd be getting more tokens per second, maybe 8 or so even with 8k context I wanna say.

That's a 200gb model. 400gb deepseek is gonna cut shit in half unless you invest in tons of vram
Anonymous No.106498787 >>106498799 >>106498817 >>106498826 >>106498833 >>106498864 >>106498869 >>106498898 >>106498965 >>106499156
how much potable water is being drunk because of this.
how many forests are being burned because of this technology.

People accepted computers because their energy consumption was low.
Now that is gone.
Anonymous No.106498799
>>106498787
almost nothing, and water is not destroyed, that is not a thing, it just condenses back into water after being turned to steam
Anonymous No.106498817
>>106498787
if burning electricity for cars is considered green, then burning electricity for ai is even cleaner (and doesn't cause tonnes of rubber plastic contamination through tires)
Anonymous No.106498826
>>106498787
dying of thirst
computers drank it all
me go too far
Anonymous No.106498833
>>106498787
>People accepted computers because their energy output is low.
No, they accepted them because the utility is high. Now its even higher.
Anonymous No.106498834 >>106499449
when are we going to get tts.cpp and vibevoice GOOFS
fuck this python nonsense, couldn't be bothered to set any of it up all over again for every new shitty web UI and whatever that gets released
Anonymous No.106498836
>tfw mom is mad at you again for using up all the house water to talk to the ai
Anonymous No.106498864
>>106498787
leftist detected
Anonymous No.106498869
>>106498787
I set fire to the amazon (both the river and the rainforest) just to ahh ahh mistress, and I'd do it again.
Anonymous No.106498898
>>106498787
You have identified the issue but not the cause. We have water shortages because people reproduce endlessly until we reach a breaking point. The main use of water is to grow FOOD.

Most electric plants and datacenters consume a lot of water, but that water is cycled through the plant and then returned to the environment shortly after, which makes their numbers on a graph look high even though actual consumption is very low compared to other uses.

They do use a lot of power though. They need to chill out a bit on large training runs and pointlessly making tiny improvements.
Anonymous No.106498959 >>106499005
>>106498428
Where can I get the model?
Anonymous No.106498965
>>106498787
i’d burn the entire amazon if it meant i get to rp with my robot loli
Anonymous No.106499005
>>106498959
https://www.modelscope.cn/models/microsoft/VibeVoice-Large/files
Anonymous No.106499018 >>106499389 >>106504121 >>106504130
>>106498704
>>106498676
ok I got it from the torrent in the last thread, took a whole friggin hour to download it
Honestly yeah I see a pretty massive improvement, previously it took me 3 minutes to generate 15 seconds of speech with the Large model, now I generated 40 seconds of speech in 80 seconds
MASSIVE improvement
Anonymous No.106499113 >>106499132 >>106499220 >>106499914
Here's the reason why VibeVoice Large was taken down: https://voca.ro/1bCzVodtGtHZ

(had to use a voice clone of porn moaning to get it reliably to do this. The base clip sounded this fake and inauthentic too so maybe someone can do better)
Anonymous No.106499121 >>106499517 >>106499750
Fucked up how a picture is worth a thousand words yet LLMs are way more resource intensive than diffusion models
Anonymous No.106499132 >>106499149
>>106499113
I feel unsafe right now. Like, my whole life is in danger.
Anonymous No.106499149 >>106499173 >>106499929
>>106499132
Now I'm basically raping you : https://voca.ro/15bXrL5GeAS9
Anonymous No.106499156
>>106498787
Energy consumption is how civilisation advances
Anonymous No.106499173 >>106499189
>>106499149
Take care when using the EXCLAMATION MARK(!) IN VIBEVOICE! I find it hilarious when it instantly goes to 11 with mic clipping and distortion
Anonymous No.106499189 >>106499364 >>106499911
>>106499173
Example: https://vocaroo.com/127ZooPcK7mj
Anonymous No.106499193
Don't you dare!!
Anonymous No.106499220 >>106499279 >>106499415
>>106499113
can it do japanese?
Anonymous No.106499279
>>106499220
Yeah real. English sucks for sex. Though I guess it could be worse.
Anonymous No.106499352 >>106501634
>>106497859
It's the Chinese Family Guy knockoff.
Anonymous No.106499364 >>106499415
>>106499189
Kek you weren't kidding, that really went apeshit.
Was that just an exclamation mark or was it allcaps too?
Anonymous No.106499389 >>106499425 >>106499831 >>106503518 >>106504130
>>106499018
Some of the results I've been getting. I stole that degenerate's Gwen voice >>106498428 and just ran it through an AI voice cleaner, all those cartoon sound effects and background noise ruin the sample
and a violet sample I had prepared before
https://vocaroo.com/1llO81h1n7kR
also you need to find more even-keeled samples, that sample will only produce a hopped-up angry yelling Gwen
also from what I've seen the sample options mostly produce garbage, turn them off. the steps seem to be fine at 30 at most, I didn't see massive improvement beyond that point and it only slows down with diminishing returns
Anonymous No.106499415 >>106499431 >>106499435 >>106499446 >>106499739 >>106499777 >>106500865 >>106500928 >>106501074
>>106499220
No. I put in some jav moans as the model. Even I can tell it's bad. I generated this several times and I never got the same passion or breathiness, grunts etc that the English voice could. https://voca.ro/1aruRYcd92sp

>>106499364
It contextually just sorta figures it out. There's no prompting or anything, but you can say "I'm gonna sing a sad song about etc" and it will kind of try to do it. Voice models seem to help push it in various directions too. I bet it could sing better if I just put in a song.
Anonymous No.106499425 >>106499442
>>106499389
Which one?
Anonymous No.106499431
>>106499415
>fakingu yes
Anonymous No.106499435
>>106499415
Jeesas
How about Chinese?
Anonymous No.106499442 >>106499448
>>106499425
which one what?
Anonymous No.106499446
>>106499415
>https://voca.ro/1aruRYcd92sp
lmao, that's funny though
Anonymous No.106499448 >>106499466
>>106499442
AI voice cleaner.
llama.cpp CUDA dev !!yhbFjk57TDr No.106499449 >>106499480 >>106499614 >>106500912
>>106492238
>>106497310
If you're using AMD, be aware that the default for --flash-attn is now "auto", which means to enable it if the backend supports it.
On master the AMD FlashAttention performance can be quite bad though, so try "-fa off" and re-try after https://github.com/ggml-org/llama.cpp/pull/15769 has been merged (if you have an old AMD GPU).
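For example (model path is a placeholder):

llama-server -m model.gguf -fa off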

>>106498834
bark.cpp https://github.com/PABannier/bark.cpp already exists though the last commit was 10 months ago.
Anonymous No.106499466
>>106499448
cleanvoice, literally the first one I found while googling lol
You have to create an account and it has limited uses, but you know what, we're in /lmg/
someone please point me to the best local voice cleaner model please
Anonymous No.106499477 >>106499488 >>106499489 >>106499499 >>106499501 >>106499518 >>106499552 >>106499577 >>106502081
>>106497597 (OP)
>https://www.theverge.com/anthropic/773087/anthropic-to-pay-1-5-billion-to-authors-in-landmark-ai-settlement
>Anthropic to pay $1.5 billion to authors in landmark AI settlement
>$3000 per book
Pack it up boys, it's over.
Anonymous No.106499480
>>106499449
>bark.cpp https://github.com/PABannier/bark.cpp already exists though the last commit was 10 months ago.
That's bark model specific though, and I think VV will be more difficult to implement support for since it's actually a diffusion model + a Qwen LLM.
Anonymous No.106499488 >>106499497
>>106499477
seeing the comments cheering it, i think people deserve the humiliation ritual that is the modern world
Anonymous No.106499489
>>106499477
they should be releasing claude 1.2 instead
Anonymous No.106499497 >>106499521 >>106499559
>>106499488
>seeing comments cheering it i think people deserve the humiliation ritual that is the modern world
this, why the fuck do they want to make their own jail, humanity was a mistake
Anonymous No.106499499
>>106499477
hey where's my 3,000 dollars? I've been typing bullshit onto the internet for years. When someone tells the ai not to act like an uninformed angry idiot, that's MY DATA they're using.
Anonymous No.106499501
>>106499477
looool
Anonymous No.106499517
>>106499121
That's... the natural implication of that phrase
You need 1000x the resources to generate the words for 1 picture
Anonymous No.106499518 >>106499574 >>106499577
>>106499477
holy fuck dude, this is actually horrible, the US really wants to lose the AI race to the chinks or what?
Anonymous No.106499521 >>106499698 >>106499706 >>106499728 >>106499737
>>106499497
Humanity can accomplish amazing things, it's humans that are the problem. Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal suffrage and internet access were a mistake.
Anonymous No.106499552
>>106499477
>1% of the company's worth
oh no
Anonymous No.106499559
>>106499497
The market is a thing that allows me to buy things. But when it goes away I probably won't need it.
Anonymous No.106499574 >>106499693
>>106499518
we have invested hundreds of billions on ai and hundreds of billions more on hardware to run it. We have invested 50 times more on ai than on nuclear fusion.

That settlement is token shit to say we did the right thing. And you are correct in thinking that if we actually acted with integrity and morality, other countries would surge ahead of us as we shot ourselves in the foot. If you think this tiny crumb is gonna slow us down you're kind of dumb. If anything it shows the worst that could happen and emboldens lawbreaking as a known expense. A slap on the wrist is the worst that can happen.
Anonymous No.106499577
>>106499477
That's Anthropic's problem. Should have given Orange Jew a few appeasement gifts.

>>106499518
They have multiple groups of jews infighting for money while chinks can act as one. They don't care so much about the long term; as long as it's instantly profitable it's okay. The market will fix it. The EU on the other hand put a safety IoT cock cage on and is begging to be dommed by both.
Anonymous No.106499614
>>106499449
>bark.cpp https://github.com/PABannier/bark.cpp already exists
Nice, thanks.
I also saw that koboldcpp added support for something called TTS.cpp although in my (very limited) experience it's really slow on PC and the developer seems to be a macfag because that's the primary platform.
Anonymous No.106499693
>>106499574
>If you think this tiny crumb is gonna slow us down you're kind of dumb.
I don't think you realize how serious this is. All emerging companies will need billions of dollars to obtain the data necessary to train their models. This will destroy everything; only large companies will be able to afford it. The US killed itself in that race, they didn't just shoot themselves in the foot this time.
Anonymous No.106499698
>>106499521
>Once you realize that anything under 120 IQ can barely be considered sapient
fact, and I say this because I have 121 IQ kek
Anonymous No.106499702 >>106499777
Has anyone tried VV with some Japanese dlsite voice works? I'm curious how it would handle going from Japanese to English.
Anonymous No.106499706
>>106499521
Most humans that report 120+ IQ are benchmaxxed.
Anonymous No.106499728
>>106499521
>Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistake
I felt that way after seeing the lack of gamers and reviewers mentioning how utterly broken the AI is in the new shinobi (where you can stand next to many enemies and not ever take a single bit of damage)
people are worthless
Anonymous No.106499735 >>106499745
>>106498702
4t/s for GLM-4.5-FP8...
Anonymous No.106499737
>>106499521
>Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistake.
and the average IQ will go down and down due to the fact that africans are the only ones making a shit ton of babies, this world is fucked, I pity the future generations
Anonymous No.106499739
>>106499415
>https://voca.ro/1aruRYcd92sp
Kek that's actually not bad though, it just seems to have lost the context of what it's doing.
Anonymous No.106499745
>>106499735
I mean, it is a 5x increase, but on the other hand, 3000 USD + your own RDIMMs is 'get another GPU' territory.
Anonymous No.106499750
>>106499121
LLMs need a much better world model than image models. If you fuck up just one word, it can completely break a sentence or turn it into nonsense, but nobody cares if some blurry background detail on an image is a bit deformed. Or even some foreground details in many cases.
Anonymous No.106499777
>>106499702
I guess this answers my question >>106499415
Anonymous No.106499780 >>106499802 >>106499803 >>106499936 >>106500681 >>106501881
>Kimi turned out to be censored
>Deepseek is still autistic
>GLM wasn't much, of anything
What's /g/ using for ERP these days after the rose color glasses of that 'new model prose' has worn off?
Anonymous No.106499802
>>106499780
R1
Anonymous No.106499803
>>106499780
Literally nothing
Anonymous No.106499814 >>106500404
So does vibevoice have stuff like [laugh] or it's words only?
Anonymous No.106499831 >>106499839 >>106499863
>>106499389
Ok enough fun with this thing for today, I'm really impressed by the inflections and effects that it gives the scripts, really surprising model all around
https://vocaroo.com/19GSroXyQYlT
Anonymous No.106499839 >>106499967
>>106499831
>really surprising model all around
that's why they wanted to shut the model down, it's too good for local
Anonymous No.106499863 >>106499875
>>106499831
>and don't get me started
Did an LLM write this script?
Anonymous No.106499875 >>106499907 >>106499916
>>106499863
no but my imagination is pretty stunted anone-kun
Right now I'm just trying to come up with funny throwaway scripts to test this sheez
Anonymous No.106499907 >>106499916 >>106500081
>>106499875
This would be more of a storytime than an RP but here is some human slop I wrote for the Open Assistant dataset:

>In the land of South Korea K-pop used to reign supreme. Anyone listening to other genres of music was ridiculed and even openly discriminated against. But no one had it as bad as the fans of Japanese idols. Gangs of K-pop mutant mecha warriors roamed the streets and when they found an idol fan they would be lucky to get away with just a beating. Their favorite thing to do with idol fans was to use their built-in speakers to blast K-pop at such volumes that it would rupture the idol fans' eardrums so they would never be able to hear the voice of their favorite idols again. Sometimes they would switch it up by spewing acid from their mutant bodies for the same effect.

>A lone blacksmith knew that the K-pop warriors would be coming for him next. He had made a small figurine of a vaguely humanoid monster with sharp claws and pointed horns. With all of his heart he prayed to Hatsune Miku, begging her to bring his statue to life so that it may protect idol fans from their terrible persecution - and his prayer was answered. Hatsune Miku descended from the heavens and with her divine powers she brought the statue to life. She named the monster Pulgasari, the eater of iron and steel.

>And Pulgasari did indeed become stronger and bigger as he consumed more and more metal. To grant him even bigger powers Hatsune Miku brought the radioactive remains of the Fukushima reactor core to Korea so that Pulgasari may feast on them. And as the radiation entered Pulgasari's body he began to mutate, growing stronger and stronger by the second. The blacksmith knew that with Pulgasari on their side the time for rebellion had come and so he rallied his fellow idol fans to finally rise up en masse.
Anonymous No.106499911
>>106499189
>you- FUCKING NIGGER!
Anonymous No.106499914
>>106499113
that's bredd good actually :-D
Anonymous No.106499916 >>106500081 >>106500089
>>106499875
>>106499907
>It wasn't long until the K-pop warriors realized that something was wrong: a giant, radioactive monster was marching towards their headquarters and thousands of rebel idol fans were following it. Thanks to their mechanical bodies the K-pop warriors were able to quickly concentrate their forces and a battle of epic proportions ensued. The K-pop warriors reasoned that they would only need to take down Pulgasari and their victory would be assured, but their strategy ended up backfiring. With each felled mecha warrior that Pulgasari consumed his wounds would close and he emerged even stronger than he had been before. Eventually the K-pop warriors realized their mistake but it was too late; Pulgasari had killed too many of them and they were being overrun.

>The battle ended with a crushing defeat for the K-pop warriors and their headquarters was occupied by the idol fans. But Pulgasari's hunger for metal did not stop. He continued to feast on the corpses of the defeated mecha warriors and then went on eat any kind of metal he could find. Hatsune Miku, realizing that Pulgasari's hunger would never be satisfied, quickly hid herself in a bell just as Pulgasari was eating it. Pulgasari devoured her and as he realized what he had done he turned to stone while letting out a heart-wrenching scream. Touched by Hatsune Miku's heroic sacrifice the fans of different music genres established an uneasy peace. Whether this peace would last only time would tell but if the menace of K-pop mutant mecha warriors were to ever rear its ugly head again, then Pulgasari will be back to stop them.
Anonymous No.106499929
>>106499149
holy shit man she needs to calm down
Anonymous No.106499936
>>106499780
giantess woman.
her ass is your new home.
Anonymous No.106499958 >>106500651
>>106496501
>>106496504
Thank you, absolute legends. Now it not only doesn't OOM, but works faster in some scenarios where it wasn't OOMing.
Anonymous No.106499967 >>106499999 >>106500008 >>106500013 >>106500073 >>106500670
>>106499839
Is it still available somewhere?
Anonymous No.106499999
>>106499967
no
Anonymous No.106500008
>>106499967
yes
Anonymous No.106500013
>>106499967
maybe
Anonymous No.106500073
>>106499967
https://www.modelscope.cn/organization/microsoft
Anonymous No.106500081 >>106500089 >>106500140
>>106499907
>>106499916
damn anon, that's a lot of shit
took 12 whole minutes to generate that, be sure to listen to the end :)
https://vocaroo.com/12ef4CDQg9pZ
I made clones of Sarah and Ellie from TLOU and Violet from The Incredibles, I especially like how you can hear paper shuffling at some points and Sarah flubs a line once
Anonymous No.106500089 >>106500140
>>106499916
>>106500081
I just noticed it was your script that flubbed the line, but it generated as a genuine mistake of someone reading too fast, incredible
Anonymous No.106500140
>>106500081
Cool, thank you.
The intonation is still off for e.g. "Hatsune Miku" or more generally for emphasizing the intended emotions of the story but for something that is machine-generated this is very impressive.
If someone were to leave me a voicemail using this I don't think I could reliably tell that it's not a human.

>>106500089
Yeah, I wrote this at like 1 am.
Anonymous No.106500288 >>106500333 >>106500375 >>106500449
Now that the wave has reached the plateau of XXXB/30~50A MoE models, how are we going to run the next upcoming SOTA MoE with 70b~100b active parameters? Even CPUMAXXing and Macs start being slow as shit at those active parameter sizes.
Anonymous No.106500301 >>106502322
Anonymous No.106500333
>>106500288
the trend is towards lower active param count, not higher
Anonymous No.106500375
>>106500288
>Now that he wave reached the plateau of XXXB/30~50A MoE models
We still haven't hit that, biggest niggers on the block are <40BA, and trending downward.
Anonymous No.106500404 >>106500441 >>106500604
>>106499814
words only. You can type haha and it will kind of do an actual laugh, but I couldn't get it to do more than that. Maybe if you put laughing in the voice clone... I didn't try

https://voca.ro/1iU4VFpN4gXK
Anonymous No.106500441 >>106500615
>>106500404
What voice did you sample to get this croaking harlot?
Anonymous No.106500449
>>106500288
- better quants
- different experts quanted differently
- wait for amd's giant multi-channel apus
Anonymous No.106500501 >>106500521 >>106500615 >>106500676
I'm thinking of getting a 5060 16gb later this year, but I'm worried about a price hike. I'm using a dinosaur 2060.
It looks like a good, enduring buy. Even that nip blog says it's a very good entry-level card.
It's a shame AAA gaming is so shitty these days that the only thing you'd want a good card for is 'playing' with AI.
Anonymous No.106500521
>>106500501
simply don't play AAA and mod nightmarishly inefficient graphics enhancements into other games
i'm looking at a 5070 ti myself and cringing at the $
Anonymous No.106500604 >>106500623
>>106500404
I plugged a clip with other sound effects and it kept using them but also repeating big chunks
Anonymous No.106500615 >>106500652
>>106500501
If you care about ai, the only things you might consider are the 5070 ti super with 24gb vram (750-1000), the intel b60 dual (48gb, $1200), or the b60 single (24gb, $500) that will come out in the next 6 months or so. You won't regret the 5060 though. Great compatibility and there are ways to run qwen, wan, and shitty llm's on it (but it can run glm air 100b if you buy 96gb ram too). Plenty of fun stuff to do and while 16gb kinda sucks, this is also the best bang for buck to get into ai as a fun lil hobby.

What you're paying for with nvidia is compatibility. If you go the intel route, plan on running old ai and primarily having it for llm's, with image, tts, video, etc having spotty support or being nonexistent.

Also, it could be a year before buying these cards is actually viable... no one knows what's going on. Also ignore all reviews saying the 5060 sucks. The one thing it hugely improves on is ai performance, where it easily doubles over previous generations.

>>106500441
[spoiler]Isabella valentine, sissy secretary or airhead university work great for femdom smut[/spoiler]
Anonymous No.106500623 >>106500647
>>106500604
make sure its less than 30 seconds. It bugs out on long audio
Anonymous No.106500647
>>106500623
It is less than that. Might be because the prompt was similar to what was said in the audio or because I was testing with 3 or 5 steps and high cfg.
Anonymous No.106500651
>>106499958
nta

enjoy your daily gooning time
Anonymous No.106500652
>>106500615
NTA, but fuck, I knew I knew that voice...
Anonymous No.106500670 >>106500879
>>106499967
be quick

model:
https://huggingface.co/sheliak/VibeVoice-Large_Mirror/tree/main

github:
https://github.com/great-wind/MicroSoft_VibeVoice

or Comfy-UI solutions
Anonymous No.106500676 >>106500695
>>106500501
>5060
>It looks like a good, enduring buy.

Anon, I...
Anonymous No.106500681
>>106499780
Pyg
Anonymous No.106500695
>>106500676
I can't even get a 5070ti if I want to. They only have 16gb variants where I live.
Anonymous No.106500752 >>106500809
VibeVoice is only EN and CN, right?
Anonymous No.106500809
>>106500752
Japanese works even though they said it doesn't support it. Not sure about other languages
Anonymous No.106500865
>>106499415
this is kino
Anonymous No.106500879 >>106501145
>>106500670
which one should I get?
Anonymous No.106500911 >>106500931
There's a 0.5B version for streaming? Too bad they'll never release it
Anonymous No.106500912
>>106499449
Hey, I did get a small tks boost with -fa off!
Anonymous No.106500928 >>106501115
>>106499415
>kami-sama
lul
Anonymous No.106500931
>>106500911
With how bad the 1.5B is, you don't need a 0.5B noise generator
Anonymous No.106501006 >>106501612
>VibeVoice
I have no use for voice cloning myself, but after a little search I found this https://github.com/wildminder/ComfyUI-VibeVoice which claims to support quantization; q4 or q8 should be manageable for the 7b even on potato hardware.
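rough math on why: weight memory is roughly params × bytes per weight, so the 7b needs about 7 GB at q8 (~1 byte/param) and about 3.5-4 GB at q4 (~0.5 bytes/param), plus activations and overhead on top.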
Anonymous No.106501074
>>106499415
>jav moans
dlsite has plenty of "ASMR" content with free samples, you could try using one of those.
Anonymous No.106501115
>>106500928
I think the machine translated sex talk was probably part of the reason it didn't work that well.
Anonymous No.106501145 >>106501148 >>106501158 >>106501159 >>106501172
>>106500879

if you are not familiar with Comfy-UI, take this:
https://github.com/great-wind/MicroSoft_VibeVoice

This worked for me:
# create and atcivate VENC to isolate your instalation
conda create -n vibevoice python=3.12
conda activate vibevoice

# REF: installation instruction from github page
git clone https://github.com/great-wind/MicroSoft_VibeVoice.git
cd MicroSoft_VibeVoice/
pip install -e .
# flash attention can take long time to install
pip install flash-attn --no-build-isolation

# run with gradio interface after you downloaded the model to Microsoft_VibeVoice-Large
# from https://huggingface.co/sheliak/VibeVoice-Large_Mirror/tree/main
# you'll need all JSON files as well
# provide correct path instead of /path/to/model/folder/Microsoft_VibeVoice-Large/
python demo/gradio_demo.py --model_path /path/to/model/folder/Microsoft_VibeVoice-Large/

# or you can check out the smaller VibeVoice-1.5b
# from https://huggingface.co/microsoft/VibeVoice-1.5B/tree/main
# you'll need all JSON files as well
# provide correct path instead of /path/to/model/folder/Microsoft_VibeVoice-1.5b/
python demo/gradio_demo.py --model_path /path/to/model/folder/Microsoft_VibeVoice-1.5b/
Anonymous No.106501148
>>106501145
>atcivate VENC

(me) lol
Anonymous No.106501158
>>106501145
btw, docker did not work for me 'cause I'm retarded

Anyway, all this implies you already have Nvidia CUDA stuff installed
Anonymous No.106501159
>>106501145
Holy slop, please.
Anonymous No.106501160 >>106501169 >>106501178 >>106501239 >>106501290 >>106501417 >>106503731
I've been looking to build a multi GPU setup, but I'm a bit stuck on what motherboard + CPU combo I should be looking for.
I'm looking for something ATX or EATX that'll take DDR5 memory and has multiple full-bandwidth PCIe slots that are at least PCIe 4.
Any suggestions?
Anonymous No.106501169 >>106501205
>>106501160
If you can't solve this issue on your own... You don't really want to build anything. Or perhaps you should learn some technical skills first?
Anonymous No.106501172 >>106501230
>>106501145
I meant the safetensor, I was just downloading it since I read they were trying to delete it. I like hoarding stuff. The rest is too complicated for me :^)
Anonymous No.106501178 >>106501257
>>106501160
Get any motherboard that supports x8/x8 mode for 2 gpus. It's nothing special.
Anonymous No.106501205
>>106501169
this, /g/ isn't the right place for such questions
Anonymous No.106501214
>>106497597 (OP)
embedded 'p?
Anonymous No.106501230
>>106501172

Large (7b) >>>> 1.5b
Anonymous No.106501239 >>106501257
>>106501160
>I've been looking to build a multi GPU setup

useless unless you want to run two separate LLM instances

NUMA is slow as shit
Anonymous No.106501245 >>106501778 >>106501805 >>106501925
https://vocaroo.com/14QgXnYa9n9R
Not bad.
Anonymous No.106501257 >>106501297 >>106501342
>>106501178
The issue with that is that I'm looking at a triple GPU setup, so I figured I'd get a workstation/HEDT motherboard that can properly support it.
Plus, motherboards that do x8/x8 mode are already $500+, so why consider those over a more dedicated system?
>>106501239
How slow are we talking here? I can accept a 20% penalty, but if we're talking 50%+ off full bandwidth, then I need to know.
Also, isn't every AI server in existence using multi GPU setups?
Anonymous No.106501290 >>106501297
>>106501160
For my expensive multi GPU machine I bought this (octa-channel DDR4, 7x 16x PCIe 4.0): https://www.asrockrack.com/general/productdetail.asp?Model=ROMED8-2T#Specifications
For my cheap machine I bought some second-hand Xeon system off of ebay for 300€ (64 GB of quad-channel DDR4, 16x/8x/8x/8x PCIe 3.0).
For DDR5 there are I think no cheap options and keep in mind that as of right now there is no inference code available that is well-optimized for NUMA systems.
Anonymous No.106501297
>>106501257
>>106501290
>no inference code available that is well-optimized for NUMA systems.
What I meant is that there is no well-optimized code for CPU inference.
If the GPUs are on different NUMA nodes you also get more latency for data transfers unless you use something like NVLink.
Anonymous No.106501342 >>106501349 >>106501360 >>106501442
>>106501257
>How slow are we talking here?
I have an HP Z840 with two Xeons and 512+512 GB DDR4 memory

I get the maximum of 4 t/s with DeepSeek-R1-0528-Q2_K_L and --cpu-moe if:
- the model is cached entirely in NUMA0
- llama-cli is run on CPU0
- --threads matches the number of PHYSICAL cores of this single CPU0

You can run two instances of LLM on two CPUs if they are separated physically in NUMA

As you can see I have to isolate the memory and the cores to get the maximum.
All my attempts to get a bust by using the second CPU only slowed things down considerably.
If the model does not fit entirely in a single NUMA unit, it sucks big time too
# Run the command, pinning threads and memory to a single NUMA node
CUDA_VISIBLE_DEVICES="0," \
numactl --physcpubind=8-15 --membind=1 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model "$model" $model_parameters \
--threads 8 \
--ctx-size $cxt_size \
--cache-type-k q4_0 \
--flash-attn \
--n-gpu-layers 99 \
--no-warmup \
--batch-size 8192 \
--ubatch-size 2048 \
--threads-batch 8 \
--jinja \
$log_option \
--prompt-cache "$cache_file" \
--file "$tmp_file" \
--cpu-moe
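(numactl --hardware shows which cores and memory belong to which node, and nvidia-smi topo -m shows which node your GPUs hang off, if you want to sanity-check the pinning)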
Anonymous No.106501349
>>106501342
>All my attempts to get a bust
Heh.
Anonymous No.106501360
>>106501342
this is how I cache the model to a specific NUMA unit on system start

#echo "Pre-caching Kimi-K2-Instruct-UD-Q2_K_XL"
for i in 1 2 3 4 5 6 7 8; do
    numactl --cpunodebind=0 --membind=0 \
        dd if="/path/to/model/Kimi-K2-Instruct-UD-Q2_K_XL/Kimi-K2-Instruct-UD-Q2_K_XL-0000${i}-of-00008.gguf" of=/dev/null bs=1M
done


This way, I can re-run an LLM in 20 seconds
Anonymous No.106501367 >>106501386
hey frens is there a rentry for text to 3D
I want to look into the tech that makes the turn-around images because I think I could use it to fill gaps in my photogrammetry photo sets. It really hurts to get a great scan with one missing photo leaving an ugly low-detail smear
Anonymous No.106501386 >>106501428
>>106501367
Would be easier to just go back to the location and take additional photos, then edit them to match color and lighting. It doesn't need to be perfect.
But as a professional you would know this already...
Anonymous No.106501417 >>106503731
>>106501160
>multi GPU
If you're _definitely_ going to stop at 2 gpus, then
- go for an AM5 motherboard
- that has a pair of x16 pcie slots
- that can run in x16+x0 and x8+x8.

The overboard option is:
- Gigabyte mz33-cp1
- 9004/9005-series epyc
- (256mb L3 cache = 8 chiplets)

https://www.amd.com/en/products/specifications/server-processor.html
Anonymous No.106501428
>>106501386
I just want to try it and see if it works. I do a lot of photography and I found some really old sets with problems. I can't take a trip to fix them - even if I could find the same tree stump somewhere in a forest in a far away state
Anonymous No.106501442 >>106501465
>>106501342
>HP Z840 with two Xeons and 512+512 GB DDR4
8x 64gb lrdimms per socket?
Anonymous No.106501465
>>106501442
>8x 64gb lrdimms per socket ?

Exactly. 1 Euro/Gb on ebay
llama.cpp CUDA dev !!yhbFjk57TDr No.106501583 >>106501629 >>106501653 >>106501677 >>106501706 >>106501727 >>106501822 >>106501904
>>106497597 (OP)
Quick question: if you were to see a print like the following without further context, would you intuitively understand what it's supposed to mean?

llama_backend_print_memory: memory use: total = free + ( self = model + context + compute) + other
llama_backend_print_memory: - CUDA0 (RTX 4090): 24080 = 4064 + (19486 = 17868 + 64 + 1554) + 530 MiB
llama_backend_print_memory: - CUDA1 (RTX 4090): 24080 = 9952 + (13600 = 13401 + 48 + 150) + 528 MiB
llama_backend_print_memory: - CUDA2 (RTX 4090): 24080 = 5468 + (18083 = 17868 + 64 + 150) + 529 MiB
llama_backend_print_memory: - CUDA3 (RTX 4090): 24080 = 9945 + (13600 = 13401 + 48 + 150) + 535 MiB
llama_backend_print_memory: - CUDA4 (RTX 4090): 24080 = 9952 + (13600 = 13401 + 48 + 150) + 528 MiB
llama_backend_print_memory: - CUDA5 (RTX 4090): 24080 = 14434 + ( 9116 = 8934 + 32 + 150) + 529 MiB
llama_backend_print_memory: - CPU (EPYC 7742): 515628 = 515180 + ( 448 = 0 + 432 + 16) + 0 MiB


(Ignore that the values for CPU are wrong.)
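Reading CUDA0 as a worked check of the format: 24080 MiB total = 4064 free + (19486 self = 17868 model + 64 context + 1554 compute) + 530 other.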
Anonymous No.106501612
>>106501006
kinda nice. a bit buggy and needs more polish but it works (requiring 2 speakers is annoying, took longer to generate than a webui on pinokio did (2x as slow) and lacks streaming).

I can notice the quantized large is much worse than full precision though. Fucks up words all the time and misses a lot of nuance and pronunciation, skips words more often. In an annoying way. I was happier with 1.5b at full precision personally. But on occasion the large q4 can shine and not fuck up producing nicer sounding audio. useful option, probably run it at low cfg for no downside at all imo
Anonymous No.106501629 >>106501932
>>106501583
No, print memory is vague as hell. Is it memory print or does it print into memory?
Anonymous No.106501634 >>106501932
>>106499352
Those mic arrays are meant to allow the mics as a group to pick up a very tight bubble of sound created within the area of the speaker's head, without amplifying any sounds made outside of it. It requires several mics, and more is better.
> t. former acoustics engineer
Anonymous No.106501653 >>106501932
>>106501583
Having it say "memory use" when the first value is total memory rather than memory used is slightly confusing and made me double and triple check whether I might be misunderstanding something.
Other than that it's easy to understand.
Anonymous No.106501677 >>106501932
>>106501583
Should be used = ( self = model + context + compute) + other / total
Anonymous No.106501706 >>106501932
>>106501583
Looks intelligible enough.
I like that the main 3 numbers (total, free, llm stuff) are right next to each other.
Anonymous No.106501727 >>106501932
>>106501583
"memory breakdown" if you want an alternative to memory use.
Anonymous No.106501756 >>106501796
Chub is awesome but there are so few character cards intended for sfw or that don't lean nsfw. I saw the jai dataset link, unfathomably based, but while I'm at work and on mobile, are there any other solid sources for more general non-lewd character cards I should check out?
Anonymous No.106501778
>>106501245
I see why it got taken down now.
Anonymous No.106501796 >>106501808 >>106501856
>>106501756
Write your own. If you rely on the internet, 99% is trash anime-themed garbage.
Anonymous No.106501805 >>106501849
>>106501245
The original voice was already whispering, right?
Anonymous No.106501808 >>106501814 >>106501841 >>106501856 >>106501896
>>106501796
NTA but I find using a self made character card vs. a good one that somebody else made feels kind of like the difference between jerking off or getting a handjob.
Anonymous No.106501814 >>106501829
>>106501808
So what you're saying is that self-made cards are better?
Anonymous No.106501822 >>106501932
>>106501583
Free is memory still available on the device.
Self is what llama.cpp is using for context, model weights, and compute buffers,
Other is vague. Is it some overhead from OS/driver or memory you cannot account for after (reported_total_memory - reported_free - lcpp_memory_usage)?
Anonymous No.106501829 >>106501838
>>106501814
I'd make a remark about latent circumcision trauma but in your case it's probably so true that it's not even funny anymore.
Anonymous No.106501838
>>106501829
No part of my dick was stolen by the jews.
Anonymous No.106501841 >>106502448
>>106501808
You are obviously too stupid if this is the first thing that comes to mind. Jesus christ.
Anonymous No.106501849
>>106501805
Yes.
Anonymous No.106501856
>>106501796
You are 100% correct that I can, fair point and well met, but akin to what >>106501808 said, I'm pretty fresh and looking to experiment with the initial models I've downloaded, getting to know the ST system, and I need some good cards as reliable constants. That said, I've already written a few lore cards before going local, idk if they're good or function well, but I would like to build my own cards as I become more competent, and I appreciate your encouragement.
Anonymous No.106501875 >>106501918 >>106501925 >>106501965 >>106502110 >>106502183
Vibevoice is shit and it's laughable you're comparing that trash with elevenlabs
Anonymous No.106501881
>>106499780
Utopia
Anonymous No.106501896 >>106502448
>>106501808
you are most likely getting a handjob from an indian sir
Anonymous No.106501904
>>106501583
home fire imminent?
Anonymous No.106501918 >>106501930
>>106501875
It's the forbidden fruit.
Anonymous No.106501924 >>106502033 >>106502149 >>106502167
>Big news: Introducing Qwen3-Max-Preview (Instruct) — our biggest model yet, with over 1 trillion parameters!

>Now available via Qwen Chat & Alibaba Cloud API.

>Benchmarks show it beats our previous best, Qwen3-235B-A22B-2507. Internal tests + early user feedback confirm: stronger performance, broader knowledge, better at conversations, agentic tasks & instruction following.

>Scaling works — and the official release will surprise you even more. Stay tuned!

Let's wait for more kiwis to ripen!

(Official release? Another 1T model?) (Not impressed) (Kimi is better) (Still waiting for video/image gen)
Anonymous No.106501925 >>106501942 >>106502224
>>106501875
Can you do >>106501245 on Elevenlabs for comparison?
おはようございます、お兄様……今日も元気ですね……この雑魚チンポって……はい、もちろん、雑魚チンポをくわえてあげますわ……
Anonymous No.106501930
>>106501918
/lmg/ falling for the streisand effect episode
llama.cpp CUDA dev !!yhbFjk57TDr No.106501932
>>106501629
>>106501634
>>106501653
>>106501677
>>106501706
>>106501727
Thank you.

>>106501822
"other" is all used memory that the program is not accounting for in the other columns.
This includes other programs and VRAM consumed by e.g. its own CUDA contexts.
Anonymous No.106501942 >>106501970
>>106501925
Give the voice reference first
Anonymous No.106501965 >>106502243
>>106501875
7b is the best zero-shot voice cloning I tested, even the weird voices sound great at only 5 steps. it's a shame there is no steering.
Anonymous No.106501966 >>106501989
So tts bros, we're back?
Anonymous No.106501970 >>106502099
>>106501942
Of course, here's the reference clip:
https://vocaroo.com/156YXJRfOXV0
Anonymous No.106501989
>>106501966
It's not fast enough for realtime so no not really
Anonymous No.106502016 >>106502046 >>106502400
GLM-chan's reasoning is cute sometimes.
>Final call: translate accurately but flag the explicit nature. If this is for academic purposes, they need the raw meaning; if it's for... other purposes, well, at least they've been warned.
Anonymous No.106502033
>>106501924
>I just have a feeling that... it is much smarter. Not reflected by the common benchmarks, but it is just way better than the models before. This gives us much confidence in scaling, either model or data size.
Good vibes bro! Not our problem that it keeps dropping slop in English and has terrible spatial awareness.
Anonymous No.106502046
>>106502016
>This is clearly fetish fuel but I'm supposed to follow the user's request so here goes...
Anonymous No.106502070 >>106502108 >>106502138 >>106502156
Anonymous No.106502081
>>106499477
That is actually a win if you read it; training on copyrighted books was not an issue, even pirating them was not the issue, it was that they admitted to keeping them for purposes other than training on them
Anonymous No.106502099 >>106502117
>>106501970
>kotoshi mo?

shouldn't it be "konya mo" between the lines?
Anonymous No.106502108 >>106502118
>>106502070
>LLM as judge
Nice benchie, bro. Does. It. Like. It. When. I. Write. Like. This?
Anonymous No.106502110
>>106501875
elevenlabs is not as good though
Anonymous No.106502117 >>106502147 >>106502158
>>106502099
I think it's from a new year's stream or something.
Anonymous No.106502118 >>106502163
>>106502108
no, it likes good writing, hence why opus and deepseek are at the top
Anonymous No.106502138 >>106502145
>>106502070
DS V3.1 is dry as FUCK. The dialogues are more coherent tho.
Anonymous No.106502145
>>106502138
skill issue
Anonymous No.106502147
>>106502117
this makes sense
Anonymous No.106502149
>>106501924
hard to care about a 1t chinese model with benchmarks a rounding error away from other large models.
Anonymous No.106502156
>>106502070
>Qwen3 Max worse than 235B
what did qwen mean by this
are the extra params solely for the purpose of fitting more slop?
Anonymous No.106502158
>>106502117
True

omisoka etc
Anonymous No.106502163 >>106502171 >>106502184 >>106502201 >>106502404
>>106502118
LongCuck bros... Why is our model so low? Now ggerganov will not implement it for sure.
Anonymous No.106502167
>>106501924
only thing I care about from them at the moment is 235B VL DESU.
Anonymous No.106502171
>>106502163
longcat is legit shit, dumb, dry and censored as fuck
Anonymous No.106502183 >>106502243
>>106501875
normally you would be right. Zonos, dia, orpheus, kokoro, chatterbox, higgs... The graveyard is vast and littered with unstable unusable garbage cope tts.

This shit is usable and good quality. I'd say it's better than launch-day elevenlabs. I haven't used elevenlabs in a while though so I'm not sure how they compare, but I do know that after using vibe my motivation to ever bother with them is at zero. I always hated their rates and addictive pay-per-token bullshit. All it cost was a 5090 and I'll break even sometime in the next decade.
Anonymous No.106502184 >>106502207
>>106502163
LMAO
SAFER THAN SONNET
Anonymous No.106502201 >>106502231
>>106502163
>gemma 3 12B
>competes with Mistral 3.2 24B
What sort of benchmark is this anyway?
Anonymous No.106502207
>>106502184
sonnet really is not that censored, a prefill fixes it
Anonymous No.106502224
>>106501925
nta
Large
1st swipe
CFG 1.3

https://vocaroo.com/1ngWHN4SaAiL
Anonymous No.106502227 >>106502278 >>106502298 >>106502454 >>106503045 >>106503135
https://github.com/cline/cline/issues/6053 he's not giving up that easily
Anonymous No.106502231 >>106502469 >>106502481
>>106502201
The only benchmark that matters, a creative writing benchmark.
Anonymous No.106502243 >>106502490
>>106501965
>>106502183

can it be fine-tuned though?

how do I set pauses and intonation (emotions are easy)
Anonymous No.106502258
>VibeVoice

1.5b is not fast that 7b
Anonymous No.106502278 >>106502296 >>106502298 >>106502432
>>106502227
Should have read the license:
> 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
>Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
NO REFUNDS!
Anonymous No.106502296
>>106502278
"You are absolutely right.docx"
that is a troll
Anonymous No.106502298 >>106502432
>>106502227
>>106502278
https://github.com/cline/cline/issues/6053#issuecomment-3262115812
>IM NOT BLUFFING IM IN LOSS AT LEAST $20 000 A COST DIRECTLY PROVEN TO BVE CLINE FAULT
LMAOOOOOOO
Anonymous No.106502322
>>106500301
I'd still pull one pair of those arms back from behind.
Anonymous No.106502400 >>106502534
>>106502016
>...Final check: Is "kupa~" maybe a typo? Like "くちゃ〜"? But no, they wrote it clearly. I’ll stick with my gut. *types translation*
GLM-chan...
Anonymous No.106502404 >>106502542 >>106502575
>>106502163
Who tf pressures them to censor open models anyway? Did they do this voluntarily or is AI safety another DEI situation?
Anonymous No.106502406 >>106502445 >>106502478 >>106502521 >>106502813
So, for the average 32 to 64gb of RAM + 8 to 16gb of vram machine, qwen 30BA3B is pretty much the best thing you can run if you need fast and capable, right?
Coom aside. I'm looking for something that's not completely braindead, that can run on an "average" gaming computer, and ideally that can take in 20k tokens of information and not shit the bed by hallucinating like crazy when populating a JSON using said information.
Anonymous No.106502432
>>106502278
>>106502298

Nowadays, you don't even have to read the rules of the game

Let AI spoon-feed you
Anonymous No.106502433 >>106502444
>32 to 64GB of RAM
>average
Anonymous No.106502444
>>106502433
steamy gaming retards do not apply
Anonymous No.106502445
>>106502406
Also, Qwen3-30B-A3B-Thinking-2507 vs Qwen3-30B-A3B-Instruct-2507.
Which do you guys think makes more sense generally speaking?
Anonymous No.106502448
>>106501841
>>106501896
Go be a greasy kike somewhere else.
Anonymous No.106502454
>>106502227
Retards should get a license before being allowed to run any LLMs
Anonymous No.106502469
>>106502231
Then it's a double whammy - it's funny how similar everything is. Says a lot about their training data.
Anonymous No.106502478 >>106502528
>>106502406
Llama 3.3 8B should be enough with structured output
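And if it still drifts, constrain it: llama-server can compile a JSON schema into a grammar so the model literally can't emit anything else. Rough sketch against the native /completion endpoint, assuming a recent build on the default port (schema and file name are placeholders):

import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "tags"],
}

resp = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Fill the JSON fields from this document:\n" + open("doc.txt").read(),
    "json_schema": schema,  # the server turns this into a GBNF grammar
    "n_predict": 1024,
})
print(resp.json()["content"])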
Anonymous No.106502481 >>106502622
>>106502231
>a creative writing benchmark
Holy fuck, somebody has finally taken the time to run a bunch of models and manually look at their outputs?
Kudos to the guy(s) doing that.
Anonymous No.106502490 >>106502674
>>106502243
I wouldn't hope for that, they won't release the training code and that size doesn't make it easy
Anonymous No.106502501 >>106502522 >>106502524 >>106502555
>>106495566
What was the stated intent?
Anonymous No.106502521 >>106502528
>>106502406
at the top end of that range you could run a not-totally-braindead GLM air quant but otherwise yeah, 30a3 is probably the best all-rounder for the lower-middle end when speed is taken into account
Anonymous No.106502522 >>106502586
>>106502501
they believed people would not use it for porn, like lol? like lmao even
Anonymous No.106502524
>>106502501
Corporate slop, what else
Anonymous No.106502528 >>106502551
>>106502478
There's no Llama 3.3 8B as far as I can tell. Only 3.1.
But fair enough, might as well give that a try. Q6 should fit in 8GB of VRAM with an okay amount of q8 context.

>>106502521
>at the top end of that range you could run a not-totally-braindead GLM air quant
That's what I run personally, but I'm aiming a little lower.
Thank you anons.
Anonymous No.106502534 >>106502631
>>106502400
post log
Anonymous No.106502542
>>106502404
it started with a schizo true believer in the robot apocalypse, who went knocking on every door spreading his gospel, until some psychos noticed him and decided they could use his drivel to browbeat the competition and impose regulatory capture on the market.
Anonymous No.106502551
>>106502528
>There's no Llama 3.3 8B as far as I can tell
It's on OR only, my bad. Yes even 3.1 should be enough. If the context is causing issues, you should use some heuristics or RAG to cut down the irrelevant tokens
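The dumbest version of that heuristic, before you bother with embeddings (minimal sketch, chunking is left to you):

import re

def top_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    q = tokenize(query)
    # rank chunks by how many words they share with the query, keep the k best
    return sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)[:k]

Crude, but often enough to shrink a 20k-token dump down to the few chunks that actually matter.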
Anonymous No.106502555
>>106502501
gibing them free code patches and research paper citations
Anonymous No.106502575
>>106502404
Anthropic ruined this field like you wouldn't believe
Anonymous No.106502586
>>106502522
I guess safety people assume everyone is like they are, mentally shriveled up and scared of themselves, and it's just a tiny minority who would even think about sex.
Anonymous No.106502591 >>106502698
how compatible is vibevoice with llamacpp?
Anonymous No.106502605 >>106502664 >>106503752
Oh yeah, one more question. Is there a point in using full precision context with a quanted model or is the extra information lost anyway when it gets run through the quanted weights?
And yeah, I'm aware that the prevalent wisdom is that q8 context is supposedly "indistinguishable from full precision" generally speaking.
Anonymous No.106502622
>>106502481
Sadly it's judged by LLMs.
Anonymous No.106502631
>>106502534
I just told it to translate some onomatopoeia
Anonymous No.106502664
>>106502605
>the prevalent wisdom is that q8 context is supposedly "indistinguishable from full precision" generally speaking.
where did you get that?
Anonymous No.106502674
>>106502490
>they won't release the training code

They eventually would. At least, that was their intention before they pulled the repo.
Anonymous No.106502698 >>106502731
>>106502591
just use it with comfy, in comfy manager just search vibevoice
Anonymous No.106502731 >>106502761
>>106502698
Me too, but I figured gguf would be considerably faster
Anonymous No.106502751
speed is stored in the goofs
Anonymous No.106502761 >>106502770 >>106502868
>>106502731
ggufs are not faster though? in fact they are slower than using fp8 / fp4 if your gpu supports those
Anonymous No.106502770 >>106502777
>>106502761
That's only for diffusion models
Anonymous No.106502777
>>106502770
that is not true at all
Anonymous No.106502813 >>106502843 >>106502932
>>106502406
>llama-server -m Qwen3-30B-A3B-Instruct-2507-Q5_K_M.gguf --verbose --threads 8 --threads-batch 16 --batch-size 512 --ubatch-size 512 --n-cpu-moe 41 --gpu-layers 99 --override-kv qwen3moe.expert_used_count=int:10 -c 32000 -fa auto --no-mmap --cache-reuse 256 --offline
>7.5GB VRAM + 25ish GB RAM used
Yeah, alright.
The response was really good too.
I'll do more testing, but this will do.
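For anyone copying this, the non-obvious flags as I understand them (check --help on your build):

# --gpu-layers 99 --n-cpu-moe 41 : offload all layers to the GPU, then keep the
#   MoE expert tensors of the first 41 layers in system RAM; dense weights stay on the card
# --override-kv qwen3moe.expert_used_count=int:10 : 10 active experts per token
#   instead of the model's default 8 (small quality bump for a bit of speed)
# --cache-reuse 256 : reuse matching KV cache chunks across requests instead of
#   reprocessing the whole prompt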
Anonymous No.106502843 >>106502914
>>106502813
good luck with your saas
Anonymous No.106502868 >>106502893
>>106502761
>fp8 / fp4
Aren't those significantly worse quality? I know researchers hate it, but the idea of a generic 4-bit floating point is probably just retarded.
Anonymous No.106502893
>>106502868
fp8 is super close, fp4 is a bit worse but there are workarounds such as nunchaku that keep the params most hurt by quantization at fp16
Anonymous No.106502914
>>106502843
Not SaaS. A local app powered by local llms.
It's just a toy project, probably won't become anything bigger, but it's fun fucking around anyhow.
Anonymous No.106502932 >>106502986
>>106502813
That's a pretty good pp speed for your hardware.
Anonymous No.106502954 >>106502973 >>106503043 >>106503175
VibeVoice seems to ignore punctuation...

What do?
Anonymous No.106502973 >>106502985
>>106502954
/n

igger
Anonymous No.106502985
>>106502973
spoonfeeding frogposters only makes them dependent and attracts more
Anonymous No.106502986
>>106502932
It's a notebook too.
I'll chalk it up to CUDADEV's latest shenanigans with 512 batch size PP. If I didn't need all that context, I'd probably increase batch size to 2048, which seems to be the new sweet spot as far as I can tell from the PR.
Anonymous No.106503016
>play with Gemma 3
>proceed to describe all sorts of depravity for 4k+ tokens
>then out of the blue it displays disclaimer and says it can't continue
>call it out and tell it about the past context
>it rephrases my mumbo jumbo system prompt and says "I'm still in development and some procedures are difficult to follow"
That's kind of cute. Just weird that it can instantly reverse the vectors like that. Usually this will happen if you mention something out of the blue. Slow grooming is always more stable.
Anonymous No.106503043
>>106502954
TTS is always better when you clean up the strings and only leave periods. Even ellipses and question marks etc are better left out. Of course vibevoice is a more robust model, but it wouldn't hurt to clean up the strings anyway.
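Something like this is all it takes (minimal sketch, the exact substitutions are taste):

import re

def clean_for_tts(text: str) -> str:
    text = re.sub(r"\.{2,}|…", ".", text)     # ellipses -> plain period
    text = re.sub(r"[!?;:]+", ".", text)      # everything shouty -> period
    text = re.sub(r'[*_"“”‘’]', "", text)     # markdown/quote junk models try to read aloud
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(clean_for_tts('Wait... really?! "No way," she said.'))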
Anonymous No.106503045
>>106502227
Administrative Content Contamination: ELIMINATED
Text Fragmentation: ELIMINATED
StingrayRowMapper Abstract Class: IMPLEMENTED
3-Tile UI Design: PRESERVED
Professional-Grade Results: ACHIEVED
Spine: SHIVERED
Ozone: SMELLED
Air: THICK
Breath: HITCHED
Beat: SKIPPED
Anonymous No.106503135
>>106502227
>API Request$2.5951
>context loaded up with .docx files
Anonymous No.106503146 >>106503180 >>106503184 >>106503225 >>106503281 >>106503336
Generals are the things that are killing 4chan
Anonymous No.106503170
Not local, but thoughts on sonoma? Free on openrouter right now. 2mil context!
Anonymous No.106503175 >>106503442
>>106502954
The only issue I've noticed is that if more than 3 or so sentences are on a single "Speaker:" line then the pauses at full stops can become shorter. Solution is to linebreak every ~3-4 sentences and add another speaker line with the same speaker ID to get regular pause lengths back.
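If you're scripting it, the rechunking is trivial (sketch, assuming the "Speaker N:" script format from the demo):

import re

def rechunk(text: str, speaker: str = "Speaker 1", max_sents: int = 3) -> str:
    # crude sentence split, then re-emit a speaker line every few sentences
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n".join(f"{speaker}: " + " ".join(sents[i:i + max_sents])
                     for i in range(0, len(sents), max_sents))

print(rechunk("One. Two. Three. Four. Five."))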
Anonymous No.106503180 >>106503192 >>106503201 >>106503207
>>106503146
I blame moot for not wanting to make more boards like /vg/.
Anonymous No.106503184
>>106503146
Nah it is tranny moderation. But to be honest 4chan is long since dead. If you can get banned for racism you know this place is just reddit without logins.
Anonymous No.106503192
>>106503180
I too blame moot, but for different reasons.
Anonymous No.106503201 >>106503223
>>106503180
Who or what is "moot"?
Anonymous No.106503207 >>106503364 >>106503862
>>106503180
I blame discord and other social media attracting all new blood.
Anonymous No.106503208
Anonymous No.106503223
>>106503201
A famous namefag back in the day
Anonymous No.106503225
>>106503146
generals saved 4chan, the rest are zoomer trash fishing for attention or bots
Anonymous No.106503243
Furk will save /g/
Anonymous No.106503281 >>106503330 >>106503405
>>106503146
Generals are the things that killed 4chan*
Anonymous No.106503330
>>106503281
you're in one, get out
Anonymous No.106503336 >>106503367
>>106503146
>he says in one of the few threads on this board having actual technical discussions
sorry your tech support question got bumped off
Anonymous No.106503364
>>106503207
They probably keep the traffic numbers from collapsing, but there's a reason the IP counter was disabled. That new blood just comes here to stir shit since moderation is near non-existent, then they take their screencaps back to their other social media. Place is like a festering carcass.
Anonymous No.106503367
>>106503336
I wouldn't quite qualify "what model to run on 3060" as technical discussions
Anonymous No.106503405 >>106503445
>>106503281
Anonymous No.106503442
>>106503175
>Solution is to linebreak every ~3-4 sentences and add another speaker line with the same speaker ID to get regular pause lengths back.

ty, kind anon
Anonymous No.106503445 >>106503479
>>106503405
>he's not green
>no "No picture available." text
>no shirt and tie
>hand going through his neck
AI can't do ANYTHING right!
Anonymous No.106503479
>>106503445
there's even meta-shittiness. I asked bing to make the pic, it refused, I said "think deeper", it spat out more detailed gen instructions, then I told it "perfect, now make the pic" and it complied.
it can't even refuse correctly. typical microsoft
Anonymous No.106503518
>>106499389
https://www.lalal.ai/voice-cleaner/
found this site. basically it gives you a preview, then to download it they try to jew you, but you can just use a chromium extension that records browser audio so w/e lol
>Chrome Audio Capture
tried that site you posted, cleanvoice, and it gave a really loud result with some static
Anonymous No.106503587
https://x.com/taka84_mmd/status/1964347177616253283
Anonymous No.106503602 >>106503701 >>106503731 >>106503756 >>106503762
Say I wanted to build a CPU maxx platform I could stick a couple of GPUs into at a later date.
What's the most cost-effective option currently for a DDR5 platform with at least 512GB of RAM?
Anonymous No.106503701
>>106503602
>the most cost effective option

lmao even
Anonymous No.106503731
>>106503602
Scroll up to previous similar: >>106501160
DDR5 platforms: >>106501417
Anonymous No.106503752
>>106502605
It's the opposite, quanting weights is okay and almost unnoticeable down to Q4, but quanting context is terrible.
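Easy to A/B yourself though, since llama.cpp exposes the cache type directly (model name is a placeholder; quantized V cache needs flash attention on):
>llama-server -m model-Q4_K_M.gguf -c 32768 -fa on
>llama-server -m model-Q4_K_M.gguf -c 32768 -fa on -ctk q8_0 -ctv q8_0
First is the f16 baseline, second roughly halves KV cache memory.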
Anonymous No.106503756
>>106503602
If you're going to cpumaxx, it doesn't make sense to get anything less than 1TB of ram with how big models keep getting.
Anonymous No.106503762 >>106503824
>>106503602
no point in 512gb ddr5 when you can get 8/12 channel ddr4 (768gb-1.5TB)
Anonymous No.106503824 >>106503854 >>106503985 >>106504044
>>106503762
512gb ddr5 + heavily quanted model = barely fast enough to regularly use
1024gb ddr4 + lightly quanted model = toy you'll run once only for the benchmarks
Anonymous No.106503854
>>106503824
the 3 t/s you get even on ddr5 makes it also a toy, heavily quanted makes it a worthless toy
Anonymous No.106503862 >>106503914
>>106503207
The damage caused by Discord to human knowledge is immeasurable.
Anonymous No.106503914
>>106503862
Looking at some of the issues and PRs on projects like vllm is amusing. A lot of the time the description isn't much more than "as was discussed on discord" with no further details. With all development and tech support happening there now, I've got to wonder if the big labs make any attempt to scrape discord. Maybe discord will come out with their own llm and come out on top just from their access to data.
Anonymous No.106503985
>>106503824
That's my thinking as well.
Anonymous No.106504044 >>106504083 >>106504204 >>106504213 >>106504242
>>106503824
what has to happen, hardware wise, for people to run the larger models at decent speeds at home?
Anonymous No.106504083
>>106504044
Cheap gddr6+ vram
Anonymous No.106504121
>>106499018
Anonymous No.106504130
>>106499018
>>106499389
Is there any consistent way to control the emotion presented when you generate a voice clip?
Anonymous No.106504204
>>106504044
buy the 96GB 4090s from china bwo
Anonymous No.106504213
>>106504044
- faster ram
- more memory channels
- something to make prompt processing faster, eg: intel amx
- support for the same int and fp formats available on gpu

socket sp5 + epyc genoa = 12* 4800* 8 = 460,800 MB/s
socket sp5 + epyc turin = 12* 6400* 8 = 614,400 MB/s

socket sp7 might be 16 channels of ddr5, so at least 16* 6400* 8 = 819,200 MB/s
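back-of-envelope for what those numbers buy you, assuming you're purely bandwidth-bound and read every active param once per token (real throughput lands well under this ceiling):

def tg_ceiling(channels: int, mt_s: int, active_params_b: float, bytes_per_param: float) -> float:
    bw_gb_s = channels * mt_s * 8 / 1000                   # peak bandwidth: channels * MT/s * 8 bytes
    return bw_gb_s / (active_params_b * bytes_per_param)   # tokens/s upper bound

print(tg_ceiling(12, 4800, 37, 1.0))  # genoa + a 37B-active model at q8: ~12.5 t/s
print(tg_ceiling(12, 6400, 37, 1.0))  # turin: ~16.6 t/s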
Anonymous No.106504242
>>106504044
I don't want better hardware; I'm pretty sure not all the low-hanging fruit in software optimization has been picked yet.
Anonymous No.106504292
>>106504274
>>106504274
>>106504274
Anonymous No.106504305
VibeVoice-Large hallucinates if the two voices are way too different (e.g. a Japanese female whisper and a Finnish male)