
Thread 106497597

352 posts, 64 images
Anonymous No.106497597 >>106497859 >>106499477 >>106501214 >>106501583
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106491545 & >>106481874

►News
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager
>(09/04) Google released a Gemma embedding model: https://hf.co/google/embeddinggemma-300m
>(09/04) Chatterbox added better multilingual support: https://hf.co/ResembleAI/chatterbox

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106497599
►Recent Highlights from the Previous Thread: >>106491545

--Klear-46B model training methodology and benchmark performance analysis:
>106492824 >106492846 >106492855 >106492872 >106492877 >106492882 >106492885 >106492903 >106493017 >106493058 >106493088
--AI-generated loli podcast creation using VV voice cloning and GLM text generation:
>106495961 >106495966 >106496018 >106496034 >106496055 >106496061 >106496121 >106496139 >106496144 >106496157 >106496189 >106496197 >106496208
--German supercomputing expansion and copyright law compliance challenges:
>106493305 >106493329 >106493355 >106493378 >106493423 >106493481 >106494001 >106493977 >106493529
--Qwen Max model updates and community collaboration efforts:
>106491646 >106492302 >106492366 >106492394 >106492411 >106492421 >106492430 >106492428
--Balancing data quality and diversity in machine learning training:
>106492910 >106492929
--VibeVoice-Large's capabilities and controversy:
>106494251 >106494424 >106494708 >106494778 >106495648 >106494801 >106494950 >106495166 >106495298 >106495187 >106495273 >106495566 >106495590 >106495612 >106495637 >106495639 >106495671 >106495689 >106495101
--Challenges with managing R1's censorship and card-based context switching:
>106493572 >106495514
--Temperature settings tradeoff between tool call accuracy and answer quality in local LLMs:
>106491720 >106491751 >106491761 >106491845 >106491888 >106491989
--Gender bias in doctor riddle from Qwen3-Max-Preview:
>106493573 >106494565 >106494593 >106496265
--Qwen3-Max-Preview (Instruct) outperforms peers in benchmark tests:
>106492622 >106492630 >106492638
--VibeVoice model optimization challenges for single-voice applications:
>106496609 >106496636 >106496646
--Analyzing Qwen3 Max's distinctive generation style:
>106493524
--Miku (free space):
>106493154 >106493503 >106493190 >106494251 >106497578

►Recent Highlight Posts from the Previous Thread: >>106491549

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106497662
Mikulove
Anonymous No.106497689
You have them, right?
Anonymous No.106497859 >>106499352
>>106497597 (OP)
geez Peter, TWO mics?
Anonymous No.106497883
>A loli whispers in my ear
THANK YOU MICROSOFT
AHAHAHAHA
Anonymous No.106497909 >>106497916 >>106497930 >>106497999
I hate microsoft.
Xi please release the same or better model.
Please your parade was very impressive.
Anonymous No.106497916 >>106497927
>>106497909
the only good chinese model is wan
Anonymous No.106497927 >>106497963
>>106497916
>the only good chinese model is wan
The only good local model is wan? Are you baiting me?
Anonymous No.106497930
>>106497909
here's your chinese tts bro
https://www.youtube.com/watch?v=mnfLp9O96ak
Anonymous No.106497963
>>106497927
model, deepseek / kimi are nowhere near claude / gpt / gemini
Anonymous No.106497992 >>106498193 >>106498240 >>106498251
where do I get good voice clone clips
Anonymous No.106497999
>>106497909
The model is from MSRA in Beijing with a full Chinese team
It's by all means a Chinese model
Anonymous No.106498009 >>106498122
what are some absolutely necessary loli voices I should be cloning right now?
Anonymous No.106498122
>>106498009
Aya Hirano
Anonymous No.106498148 >>106498210 >>106498297
holy shit there are no good online tools for making multiple cuts to an mp3 file lmfao
Anonymous No.106498156 >>106498163
Where can I get removed VibeVoice large?
Anonymous No.106498163
>>106498156
click it then press delete, then go to your recycle-bin and empty it
Anonymous No.106498193
>>106497992
youtube
Anonymous No.106498210 >>106498226 >>106498228 >>106498297
>>106498148
ask your favorite llm about ffmpeg
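for reference, the commands it'll hand you look something like this (paths and timestamps are placeholders; the stream copy only works if the output container matches the source codec):

# rip the audio track out of a video without re-encoding
ffmpeg -i episode.mkv -vn -c:a copy audio.m4a
# cut 1:23 to 1:53 out of it as an mp3
ffmpeg -i audio.m4a -ss 00:01:23 -to 00:01:53 clip.mp3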
Anonymous No.106498215
Looking for suggestions for an uncensored lite local model to use on my phone. Purely informational.
Anonymous No.106498226 >>106498297
>>106498210
this, gpt5's codex / claude code automate so much stuff for me, I just ask it to make some script to do something and it takes like a minute
Anonymous No.106498228 >>106498241
>>106498210
nah, ffmpeg does a lot, but if you need to cut up an audio file a whole bunch of times to remove voices from other characters or sounds, you really need a GUI to plan the cuts and, you know, not type a billion things into a terminal constantly. found some site called soundtrap that can do what I want. if I get into this enough then I'll just download audacity or something
Anonymous No.106498240 >>106498271
>>106497992
just make your own in audacity. Most good TTS only want 20-30 seconds, so focus on quality above all. The audio should be clean with no noise. This is where most people fuck up, because the voice they are trying to clone doesn't have good audio sources (music, sound effects, static, background noises, traffic etc).

People do share models but you'll eternally be using taylor swift, peter griffin, etc.
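if you're prepping a clip by hand, a single ffmpeg pass gets it into a sane reference format (filenames are made up; mono and a 30s cap are safe bets, the 24 kHz sample rate is a guess so check what your TTS expects):

ffmpeg -i raw_clip.mp3 -ac 1 -ar 24000 -t 30 reference.wav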
Anonymous No.106498241
>>106498228
tell it to make you a tool for doing that. GPT5 can one shot it
Anonymous No.106498251 >>106498271
>>106497992
>Pirate TV show/movie
>Extract audio with ffmpeg
>Trim down to the bits you need
>?????
>Profit
Anonymous No.106498265 >>106498269 >>106498319
Is there a local version of nano banana anyone has made? The ones I've seen on hugging face went down quickly
Anonymous No.106498269 >>106498334
>>106498265
nano banana is not a local model, its a google model
Anonymous No.106498271 >>106498301 >>106498303
>>106498240
yeah I figured
>>106498251
am I missing something? is ffmpeg easier to use than I thought for something like this?
Anonymous No.106498297
>>106498148
>>106498210
>>106498226
kek i remember some time ago when i was cutting up the audio for some other tts, i was too lazy to install kdenlive or some other shit so i asked deepseek for ffmpeg. idk what happened but shit didn't work (think i installed it wrong or something) so i just asked it to instead make a powershell script, which just worked lol XD. literally just put the mp3/mp4 in the folder and give it from which second to cut to which and it does it. fucking awesome how jank you can get with llms, it's really a lot of fun
Anonymous No.106498301
>>106498271
Actually using audacity would probably be easier. I'm so used to the CLI interface that I sometimes forget guis exist.
Anonymous No.106498303
>>106498271
ffmpeg is for nerds who love command lines. If you want usable stuff, use audacity, or maybe even da vinci resolve which will do audio fine for free.
Anonymous No.106498319 >>106498341 >>106498734
>>106498265
the best image model out right now that can be run on your gaming PC is qwen image, you can run it if you have 16gb of vram.

here is a guide from a man who is definitely not a pedo
https://www.youtube.com/watch?v=0yB_F-NIzkc&t=303s
Anonymous No.106498334 >>106498345
>>106498269
thank you anon, that's disappointing. is there anything really comparable i can use locally?
Anonymous No.106498341
>>106498319
sorry didn't see this reply, lol this guy looks sus
just want a good model to edit wallpapers with
Anonymous No.106498345
>>106498334
qwen image / qwen edit?
Anonymous No.106498390 >>106498392
Sonoma Sky/Dusk Alpha are likely the next LLaMA or a new Meta line of models (possibly proprietary)
Anonymous No.106498392
>>106498390
lol no, its grok, just ask it, and its shit
Anonymous No.106498412 >>106498653
new kimi is great btw, like actually better than sonnet imo
Anonymous No.106498428 >>106498434 >>106498959 >>106499389
hmm, it gets pretty schizo at 1.3; I tried 1.4 and have tried higher but I dunno.
https://vocaroo.com/1dlL1nEjQeny
said voice clone file
https://vocaroo.com/1orutFZaUpJb
Anonymous No.106498434
>>106498428
10 steps are far too few, try like 50
Anonymous No.106498653
>>106498412
It's shit. I accidentally used v3.1 instead of the new kimi for one of my tests and I actually had a much better time with that before I noticed.
Anonymous No.106498668 >>106498678 >>106498702 >>106498766
Can anyone give me an estimate of how many t/s I could get with pic related on a 5090?
If a 3090 + 96 GB + SSD can run R1 at 0.88 t/s, how much of an increase would it be with 512 GB of DDR5 over PCIe 5.0 x16 + 96 GB of DDR5 + 32 GB of VRAM?
Anonymous No.106498676 >>106498704 >>106499018
I spent 3 hours looking at comfy ui and all this crap because you told me it was easier on us vramlets and I finally got the comfy-UI VibeVoice thingy running and when I try to generate I get stuck on this
>Downloading VibeVoice model: VibeVoice-Large...
>Fetching 17 files: 0%| | 0/17 [00:00
an hour later still stuck there, I force stop comfyUi and restart it and it still gets stuck on that
Anonymous No.106498678
>>106498668
>ddr5
so the same speed as using it on regular ddr5?
Anonymous No.106498702 >>106499735
>>106498668
It's going over PCI-E 5.0 X16 so the hypothetical maximum bandwidth going over that connection is 128GB/s
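Back of the envelope, assuming the active weights cross the bus once per token and nothing else bottlenecks:

t/s ≈ bus bandwidth / bytes of active params per token
≈ 128 GB/s / ~32 GB (a ~32B-active MoE at FP8, 1 byte per param)
≈ 4 t/s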
Anonymous No.106498704 >>106498714 >>106499018
>>106498676
Download from modelscope into ComfyUI/models/tts/VibeVoice-Large.
Anonymous No.106498707 >>106498721
Redpill me on nanobanana
Anonymous No.106498714
>>106498704
>ComfyUI/models/tts/VibeVoice-Large.
*ComfyUI/models/tts/vibevoice/VibeVoice-Large
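if you'd rather script it, the modelscope CLI can drop it straight there (sketch from memory, double-check the flags with modelscope download --help):

pip install modelscope
modelscope download --model microsoft/VibeVoice-Large --local_dir ComfyUI/models/tts/vibevoice/VibeVoice-Large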
Anonymous No.106498721 >>106498735
>>106498707
SOTA but it's likely actually genie3 creating a virtual reality where the prompt is real and taking a virtual photo of that
Anonymous No.106498734
>>106498319
do you think it can be done with 12gb?
Anonymous No.106498735
>>106498721
That sounds incredibly convoluted for what's essentially Photoshop: Gemini Version but it's cool how it works
Anonymous No.106498766
>>106498668
I dunno, R1 was kinda hard to run and I haven't tried it since jan cuz I hate its writing style.

I have a build of 5090, 2x 5060s for 64 VRAM/160 DDR5 (4000 MHz). On linux that got me 5 tokens a second on 4k context full glm q4 with some mmap and maybe using 48gb of vram (hard to balance MoE layers, I suck). Presumably if I bought a proper 256gb ddr5 (6000 MHz) kit, I'd be getting more tokens per second, maybe 8 or so even with 8k context I wanna say.

That's a 200gb model. 400gb deepseek is gonna cut shit in half unless you invest in tons of vram
Anonymous No.106498787 >>106498799 >>106498817 >>106498826 >>106498833 >>106498864 >>106498869 >>106498898 >>106498965 >>106499156
how much potable water is being drunk because of this.
how many forests are being burned because of this technology.

People accepted computers because their energy consumption was low.
Now that is gone.
Anonymous No.106498799
>>106498787
almost nothing, and water is not destroyed, that is not a thing, it just condenses back into water after being turned to steam
Anonymous No.106498817
>>106498787
if burning electricity for cars is considered green, then burning electricity for ai is even cleaner (and doesn't cause tonnes of rubber plastic contamination through tires)
Anonymous No.106498826
>>106498787
dying of thirst
computers drank it all
me go too far
Anonymous No.106498833
>>106498787
>People accepted computers because their energy output is low.
No, they accepted them because the utility is high. Now its even higher.
Anonymous No.106498834 >>106499449
when are we going to get tts.cpp and vibevoice GOOFS
fuck this python nonsense, couldn't be bothered to set any of it up all over again for every new shitty web UI and whatever that gets released
Anonymous No.106498836
>tfw mom is mad at you again for using up all the house water to talk to the ai
Anonymous No.106498864
>>106498787
leftist detected
Anonymous No.106498869
>>106498787
I set fire to the amazon (both the river and the rainforest) just to ahh ahh mistress, and I'd do it again.
Anonymous No.106498898
>>106498787
You have identified the issue but not the cause. We have water shortages because people reproduce endlessly until we reach a breaking point. The main use of water is to grow FOOD.

Most electric plants and datacenters consume a lot of water, but that water is cycled through the plant and then returned to the environment shortly after, which makes their numbers on a graph look high even though actual consumption is very low compared to other uses.

They do use a lot of power though. They need to chill out a bit on large training runs and pointlessly making tiny improvements.
Anonymous No.106498959 >>106499005
>>106498428
Where can I get the model?
Anonymous No.106498965
>>106498787
i’d burn the entire amazon if it meant i get to rp with my robot loli
Anonymous No.106499005
>>106498959
https://www.modelscope.cn/models/microsoft/VibeVoice-Large/files
Anonymous No.106499018 >>106499389 >>106504121 >>106504130
>>106498704
>>106498676
ok I got it from the torrent in the last thread, took a whole friggin hour to download it
Honestly yeah I see a pretty massive improvement, previously it took me 3 minutes to generate 15 seconds of speech with the Large model, now I generated 40 seconds of speech in 80 seconds
MASSIVE improvement
Anonymous No.106499113 >>106499132 >>106499220 >>106499914
Here's the reason why VibeVoice Large was taken down: https://voca.ro/1bCzVodtGtHZ

(had to use a voice clone of porn moaning to get it reliably to do this. The base clip sounded this fake and inauthentic too so maybe someone can do better)
Anonymous No.106499121 >>106499517 >>106499750
Fucked up how a picture is worth a thousand words yet LLMs are way more resource intensive than diffusion models
Anonymous No.106499132 >>106499149
>>106499113
I feel unsafe right now. Like, my whole life is in danger.
Anonymous No.106499149 >>106499173 >>106499929
>>106499132
Now I'm basically raping you : https://voca.ro/15bXrL5GeAS9
Anonymous No.106499156
>>106498787
Energy consumption is how civilisation advances
Anonymous No.106499173 >>106499189
>>106499149
Take care when using the EXCLAMATION MARK(!) IN VIBEVOICE! I find it hilarious when it instantly goes to 11 with mic clipping and distortion
Anonymous No.106499189 >>106499364 >>106499911
>>106499173
Example: https://vocaroo.com/127ZooPcK7mj
Anonymous No.106499193
Don't you dare!!
Anonymous No.106499220 >>106499279 >>106499415
>>106499113
can it do japanese?
Anonymous No.106499279
>>106499220
Yeah real. English sucks for sex. Though I guess it could be worse.
Anonymous No.106499352 >>106501634
>>106497859
It's the Chinese Family Guy knockoff.
Anonymous No.106499364 >>106499415
>>106499189
Kek you weren't kidding, that really went apeshit.
Was that just an exclamation mark or was it allcaps too?
Anonymous No.106499389 >>106499425 >>106499831 >>106503518 >>106504130
>>106499018
Some of the results I've been getting. I stole that degenerate's Gwen voice >>106498428 and just ran it through an AI voice cleaner, all those cartoon sound effects and background noise ruin the sample
and a violet sample I had prepared before
https://vocaroo.com/1llO81h1n7kR
also you need to find more even-keeled samples, that sample will only produce a hopped-up angry yelling Gwen
also from what I've seen the sample options mostly produce garbage, turn them off. the steps seem to be fine at 30 at most, I didn't see massive improvement beyond that point and it only slows down with diminishing returns
Anonymous No.106499415 >>106499431 >>106499435 >>106499446 >>106499739 >>106499777 >>106500865 >>106500928 >>106501074
>>106499220
No. I put in some jav moans as the model. Even I can tell it's bad. I generated this several times and I never got the same passion or breathiness, grunts etc that the English voice could. https://voca.ro/1aruRYcd92sp

>>106499364
It contextually just sorta figures it out. There's no prompting or anything, but you can say "I'm gonna sing a sad song about etc" and it will kind of try to do it. Voice models seem to help push it in various directions too. I bet it could sing better if I just put in a song.
Anonymous No.106499425 >>106499442
>>106499389
Which one?
Anonymous No.106499431
>>106499415
>fakingu yes
Anonymous No.106499435
>>106499415
Jeesas
How about Chinese?
Anonymous No.106499442 >>106499448
>>106499425
which one what?
Anonymous No.106499446
>>106499415
>https://voca.ro/1aruRYcd92sp
lmao, that's funny though
Anonymous No.106499448 >>106499466
>>106499442
AI voice cleaner.
llama.cpp CUDA dev !!yhbFjk57TDr No.106499449 >>106499480 >>106499614 >>106500912
>>106492238
>>106497310
If you're using AMD, be aware that the default for --flash-attn is now "auto", which means to enable it if the backend supports it.
On master the AMD FlashAttention performance can be quite bad though, so try "-fa off" and re-try after https://github.com/ggml-org/llama.cpp/pull/15769 has been merged (if you have an old AMD GPU).
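For example (model path is a placeholder):

llama-server -m model.gguf -fa off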

>>106498834
bark.cpp https://github.com/PABannier/bark.cpp already exists though the last commit was 10 months ago.
Anonymous No.106499466
>>106499448
cleanvoice, literally the first one I found while googling lol
You have to create an account and it has limited uses, but you know what, we're in /lmg/
someone please point me to the best local voice cleaner model please
Anonymous No.106499477 >>106499488 >>106499489 >>106499499 >>106499501 >>106499518 >>106499552 >>106499577 >>106502081
>>106497597 (OP)
>https://www.theverge.com/anthropic/773087/anthropic-to-pay-1-5-billion-to-authors-in-landmark-ai-settlement
>Anthropic to pay $1.5 billion to authors in landmark AI settlement
>$3000 per book
Pack it up boys, it's over.
Anonymous No.106499480
>>106499449
>bark.cpp https://github.com/PABannier/bark.cpp already exists though the last commit was 10 months ago.
That's bark model specific though, and I think VV will be more difficult to implement support for since it's actually a diffusion model + a Qwen LLM.
Anonymous No.106499488 >>106499497
>>106499477
seeing the comments cheering it, i think people deserve the humiliation ritual that is the modern world
Anonymous No.106499489
>>106499477
they should be releasing claude 1.2 instead
Anonymous No.106499497 >>106499521 >>106499559
>>106499488
>seeing comments cheering it i think people deserve the humiliation ritual that is the modern world
this, why the fuck do they want to make their own jail, humanity was a mistake
Anonymous No.106499499
>>106499477
hey where's my 3,000 dollars? I've been typing bullshit onto the internet for years. When someone tells the ai not to act like an uninformed angry idiot, that's MY DATA they're using.
Anonymous No.106499501
>>106499477
looool
Anonymous No.106499517
>>106499121
That's... the natural implication of that phrase
You need 1000x the resources to generate the words for 1 picture
Anonymous No.106499518 >>106499574 >>106499577
>>106499477
holy fuck dude, this is actually horrible, the US really wants to lose the AI race to the chinks or what?
Anonymous No.106499521 >>106499698 >>106499706 >>106499728 >>106499737
>>106499497
Humanity can accomplish amazing things, it's humans that are the problem. Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal suffrage and internet access were a mistake.
Anonymous No.106499552
>>106499477
>1% of the company's worth
oh no
Anonymous No.106499559
>>106499497
The market is a thing that allows me to buy things. But when it goes away I probably won't need it.
Anonymous No.106499574 >>106499693
>>106499518
we have invested hundreds of billions on ai and hundreds of billions more on hardware to run it. We have invested 50 times more on ai than on nuclear fusion.

That settlement is token shit to say we did the right thing. And you are correct in thinking that if we actually acted with integrity and morality, other countries would surge ahead of us as we shot ourselves in the foot. If you think this tiny crumb is gonna slow us down you're kind of dumb. If anything it shows the worst that could happen and emboldens lawbreaking as a known expense. A slap on the wrist is the worst that can happen.
Anonymous No.106499577
>>106499477
That's Anthropic's problem. Should have given Orange Jew a few appeasement gifts.

>>106499518
They have multiple groups of jews infighting for money while chinks can act as one. They don't care so much about the long term; as long as it's instantly profitable it's okay. The market will fix it. The EU on the other hand put a safety IoT cock cage on and is begging to be dommed by both.
Anonymous No.106499614
>>106499449
>bark.cpp https://github.com/PABannier/bark.cpp already exists
Nice, thanks.
I also saw that koboldcpp added support for something called TTS.cpp although in my (very limited) experience it's really slow on PC and the developer seems to be a macfag because that's the primary platform.
Anonymous No.106499693
>>106499574
>If you think this tiny crumb is gonna slow us down you're kind of dumb.
I don't think you realize how serious this is. All emerging companies will need billions of dollars to obtain the data necessary to train their models. This will destroy everything; only large companies will be able to afford it. The US killed itself in that race, they didn't just shoot themselves in the foot this time.
Anonymous No.106499698
>>106499521
>Once you realize that anything under 120 IQ can barely be considered sapient
fact, and I say this because I have 121 IQ kek
Anonymous No.106499702 >>106499777
Has anyone tried VV with some Japanese dlsite voice works? I'm curious how it would handle going from Japanese to English.
Anonymous No.106499706
>>106499521
Most humans that report 120+ IQ are benchmaxxed.
Anonymous No.106499728
>>106499521
>Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistake
I felt that way after seeing the lack of gamers and reviewers mentioning how utterly broken the AI is in the new shinobi (where you can stand next to many enemies and not ever take a single bit of damage)
people are worthless
Anonymous No.106499735 >>106499745
>>106498702
4t/s for GLM-4.5-FP8...
Anonymous No.106499737
>>106499521
>Once you realize that anything under 120 IQ can barely be considered sapient you'll know universal sufferage and internet access was a mistake.
and the average IQ will go down and down due to the fact that africans are the only ones making a shit ton of babies, this world is fucked, I pity the future generations
Anonymous No.106499739
>>106499415
>https://voca.ro/1aruRYcd92sp
Kek that's actually not bad though, it just seems to have lost the context of what it's doing.
Anonymous No.106499745
>>106499735
I mean, it is a 5x increase, but on the other hand, 3000 USD + your own RDIMMs is 'get another GPU' territory.
Anonymous No.106499750
>>106499121
LLMs need a much better world model than image models. If you fuck up just one word, it can completely break a sentence or turn it into nonsense, but nobody cares if some blurry background detail on an image is a bit deformed. Or even some foreground details in many cases.
Anonymous No.106499777
>>106499702
I guess this answers my question >>106499415
Anonymous No.106499780 >>106499802 >>106499803 >>106499936 >>106500681 >>106501881
>Kimi turned out to be censored
>Deepseek is still autistic
>GLM wasn't much, of anything
What's /g/ using for ERP these days after the rose color glasses of that 'new model prose' has worn off?
Anonymous No.106499802
>>106499780
R1
Anonymous No.106499803
>>106499780
Literally nothing
Anonymous No.106499814 >>106500404
So does vibevoice have stuff like [laugh] or it's words only?
Anonymous No.106499831 >>106499839 >>106499863
>>106499389
Ok enough fun with this thing for today, I'm really impressed by the inflections and effects that it gives the scripts, really surprising model all around
https://vocaroo.com/19GSroXyQYlT
Anonymous No.106499839 >>106499967
>>106499831
>really surprising model all around
that's why they wanted to shut the model down, it's too good for local
Anonymous No.106499863 >>106499875
>>106499831
>and don't get me started
Did an LLM write this script?
Anonymous No.106499875 >>106499907 >>106499916
>>106499863
no but my imagination is pretty stunted anone-kun
Right now I'm just trying to come up with funny throwaway scripts to test this sheez
Anonymous No.106499907 >>106499916 >>106500081
>>106499875
This would be more of a storytime than an RP but here is some human slop I wrote for the Open Assistant dataset:

>In the land of South Korea K-pop used to reign supreme. Anyone listening to other genres of music was ridiculed and even openly discriminated against. But no one had it as bad as the fans of Japanese idols. Gangs of K-pop mutant mecha warriors roamed the streets and when they found an idol fan they would be lucky to get away with just a beating. Their favorite thing to do with idol fans was to use their built-in speakers to blast K-pop at such volumes that it would rupture the idol fans' eardrums so they would never be able to hear the voice of their favorite idols again. Sometimes they would switch it up by spewing acid from their mutant bodies for the same effect.

>A lone blacksmith knew that the K-pop warriors would be coming for him next. He had made a small figurine of a vaguely humanoid monster with sharp claws and pointed horns. With all of his heart he prayed to Hatsune Miku, begging her to bring his statue to life so that it may protect idol fans from their terrible persecution - and his prayer was answered. Hatsune Miku descended from the heavens and with her divine powers she brought the statue to life. She named the monster Pulgasari, the eater of iron and steel.

>And Pulgasari did indeed become stronger and bigger as he consumed more and more metal. To grant him even bigger powers Hatsune Miku brought the radioactive remains of the Fukushima reactor core to Korea so that Pulgasari may feast on them. And as the radiation entered Pulgasari's body he began to mutate, growing stronger and stronger by the second. The blacksmith knew that with Pulgasari on their side the time for rebellion had come and so he rallied his fellow idol fans to finally rise up en masse.
Anonymous No.106499911
>>106499189
>you- FUCKING NIGGER!
Anonymous No.106499914
>>106499113
that's bredd good actually :-D
Anonymous No.106499916 >>106500081 >>106500089
>>106499875
>>106499907
>It wasn't long until the K-pop warriors realized that something was wrong: a giant, radioactive monster was marching towards their headquarters and thousands of rebel idol fans were following it. Thanks to their mechanical bodies the K-pop warriors were able to quickly concentrate their forces and a battle of epic proportions ensued. The K-pop warriors reasoned that they would only need to take down Pulgasari and their victory would be assured, but their strategy ended up backfiring. With each felled mecha warrior that Pulgasari consumed his wounds would close and he emerged even stronger than he had been before. Eventually the K-pop warriors realized their mistake but it was too late; Pulgasari had killed too many of them and they were being overrun.

>The battle ended with a crushing defeat for the K-pop warriors and their headquarters was occupied by the idol fans. But Pulgasari's hunger for metal did not stop. He continued to feast on the corpses of the defeated mecha warriors and then went on eat any kind of metal he could find. Hatsune Miku, realizing that Pulgasari's hunger would never be satisfied, quickly hid herself in a bell just as Pulgasari was eating it. Pulgasari devoured her and as he realized what he had done he turned to stone while letting out a heart-wrenching scream. Touched by Hatsune Miku's heroic sacrifice the fans of different music genres established an uneasy peace. Whether this peace would last only time would tell but if the menace of K-pop mutant mecha warriors were to ever rear its ugly head again, then Pulgasari will be back to stop them.
Anonymous No.106499929
>>106499149
holy shit man she needs to calm down
Anonymous No.106499936
>>106499780
giantess woman.
her ass is your new home.
Anonymous No.106499958 >>106500651
>>106496501
>>106496504
Thank you, absolute legends. Now it not only doesn't OOM, but works faster in some scenarios where it wasn't OOMing.
Anonymous No.106499967 >>106499999 >>106500008 >>106500013 >>106500073 >>106500670
>>106499839
Is it still available somewhere?
Anonymous No.106499999
>>106499967
no
Anonymous No.106500008
>>106499967
yes
Anonymous No.106500013
>>106499967
maybe
Anonymous No.106500073
>>106499967
https://www.modelscope.cn/organization/microsoft
Anonymous No.106500081 >>106500089 >>106500140
>>106499907
>>106499916
damn anon, that's a lot of shit
took 12 whole minutes to generate that, be sure to listen to the end :)
https://vocaroo.com/12ef4CDQg9pZ
I made clones of Sarah and Ellie from TLOU and Violet from The Incredibles, I especially like how you can hear paper shuffling at some points and Sarah flubs a line once
Anonymous No.106500089 >>106500140
>>106499916
>>106500081
I just noticed it was your script that flubbed the line, but it generated as a genuine mistake of someone reading too fast, incredible
Anonymous No.106500140
>>106500081
Cool, thank you.
The intonation is still off for e.g. "Hatsune Miku" or more generally for emphasizing the intended emotions of the story but for something that is machine-generated this is very impressive.
If someone were to leave me a voicemail using this I don't think I could reliably tell that it's not a human.

>>106500089
Yeah, I wrote this at like 1 am.
Anonymous No.106500288 >>106500333 >>106500375 >>106500449
Now that the wave has reached the plateau of XXXB/30~50A MoE models, how are we going to run the next upcoming SOTA MoE with 70b~100b active parameters? Even CPUMAXXing and Macs start being slow as shit at those active parameter sizes.
Anonymous No.106500301 >>106502322
Anonymous No.106500333
>>106500288
the trend is towards lower active param count, not higher
Anonymous No.106500375
>>106500288
>Now that he wave reached the plateau of XXXB/30~50A MoE models
We still haven't hit that, biggest niggers on the block are <40BA, and trending downward.
Anonymous No.106500404 >>106500441 >>106500604
>>106499814
words only. You can type haha and it will kind of do an actual laugh, but I couldn't get it to do more than that. Maybe if you put laughing in the voice clone... I didn't try

https://voca.ro/1iU4VFpN4gXK
Anonymous No.106500441 >>106500615
>>106500404
What voice did you sample to get this croaking harlot?
Anonymous No.106500449
>>106500288
- better quants
- different experts quanted differently
- wait for amd's giant multi-channel apus
Anonymous No.106500501 >>106500521 >>106500615 >>106500676
I'm thinking of getting a 5060 16gb later this year, but I'm worried about a price hike. I'm using a dinosaur 2060.
It looks like a good, enduring buy. Even that nip blog says it's a very good entry-level card.
It's a shame AAA gaming is so shitty these days that the only thing you'd want a good card for is 'playing' with AI.
Anonymous No.106500521
>>106500501
simply don't play AAA and mod nightmarishly inefficient graphics enhancements into other games
i'm looking at a 5070 ti myself and cringing at the $
Anonymous No.106500604 >>106500623
>>106500404
I plugged a clip with other sound effects and it kept using them but also repeating big chunks
Anonymous No.106500615 >>106500652
>>106500501
If you care about ai, the only things you might consider are the 5070 ti super with 24gb vram (750-1000), the intel b60 dual (48gb, $1200), or the b60 single (24gb, $500) that will come out in the next 6 months or so. You won't regret the 5060 though. Great compatibility and there are ways to run qwen, wan, and shitty llm's on it (but it can run glm air 100b if you buy 96gb ram too). Plenty of fun stuff to do and while 16gb kinda sucks, this is also the best bang for buck to get into ai as a fun lil hobby.

What you're paying for with nvidia is compatibility. If you go the intel route, plan on running old ai and primarily having it for llm's, with image, tts, video, etc having spotty support or being nonexistent.

Also, it could be a year before buying these cards is actually viable... no one knows what's going on. Also ignore all reviews saying the 5060 sucks. The one thing it hugely improves on is ai performance, where it easily doubles over previous generations.

>>106500441
[spoiler]Isabella valentine, sissy secretary or airhead university work great for femdom smut[/spoiler]
Anonymous No.106500623 >>106500647
>>106500604
make sure its less than 30 seconds. It bugs out on long audio
Anonymous No.106500647
>>106500623
It is less than that. Might be because the prompt was similar to what was said in the audio or because I was testing with 3 or 5 steps and high cfg.
Anonymous No.106500651
>>106499958
nta

enjoy your daily gooning time
Anonymous No.106500652
>>106500615
NTA, but fuck, I knew I knew that voice...
Anonymous No.106500670 >>106500879
>>106499967
be quick

model:
https://huggingface.co/sheliak/VibeVoice-Large_Mirror/tree/main

github:
https://github.com/great-wind/MicroSoft_VibeVoice

or Comfy-UI solutions
Anonymous No.106500676 >>106500695
>>106500501
>5060
>It looks like a good, enduring buy.

Anon, I...
Anonymous No.106500681
>>106499780
Pyg
Anonymous No.106500695
>>106500676
I can't even get a 5070ti if I want to. They only have 16gb variants where I live.
Anonymous No.106500752 >>106500809
VibeVoice is only EN and CN, right?
Anonymous No.106500809
>>106500752
Japanese works even though they said it doesn't support it. Not sure about other languages
Anonymous No.106500865
>>106499415
this is kino
Anonymous No.106500879 >>106501145
>>106500670
which one should I get?
Anonymous No.106500911 >>106500931
There's a 0.5B version for streaming? Too bad they'll never release it
Anonymous No.106500912
>>106499449
Hey, I did get a small tks boost with -fa off!
Anonymous No.106500928 >>106501115
>>106499415
>kami-sama
lul
Anonymous No.106500931
>>106500911
With how bad the 1.5B is, you don't need a 0.5B noise generator
Anonymous No.106501006 >>106501612
>VibeVoice
I have no use for voice cloning myself, but after a little search I found this https://github.com/wildminder/ComfyUI-VibeVoice which claims to support quantization; q4 or q8 should be manageable for the 7b even on potato hardware.
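rough math on why: weight memory is roughly params × bytes per weight, so the 7b needs about 7 GB at q8 (~1 byte/param) and about 3.5-4 GB at q4 (~0.5 bytes/param), plus activations and overhead on top.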
Anonymous No.106501074
>>106499415
>jav moans
dlsite has plenty of "ASMR" content with free samples, you could try using one of those.
Anonymous No.106501115
>>106500928
I think the machine translated sex talk was probably part of the reason it didn't work that well.
Anonymous No.106501145 >>106501148 >>106501158 >>106501159 >>106501172
>>106500879

if you are not familiar with Comfy-UI, take this:
https://github.com/great-wind/MicroSoft_VibeVoice

This worked for me:
# create and atcivate VENC to isolate your instalation
conda create -n vibevoice python=3.12
conda activate vibevoice

# REF: installation instruction from github page
git clone https://github.com/great-wind/MicroSoft_VibeVoice.git
cd MicroSoft_VibeVoice/
pip install -e .
# flash attention can take long time to install
pip install flash-attn --no-build-isolation

# run with gradio interface after you downloaded the model to Microsoft_VibeVoice-Large
# from https://huggingface.co/sheliak/VibeVoice-Large_Mirror/tree/main
# you'll need all JSON files as well
# provide correct path instead of /path/to/model/folder/Microsoft_VibeVoice-Large/
python demo/gradio_demo.py --model_path /path/to/model/folder/Microsoft_VibeVoice-Large/

# or you can check out the smaller VibeVoice-1.5b
# from https://huggingface.co/microsoft/VibeVoice-1.5B/tree/main
# you'll need all JSON files as well
# provide correct path instead of /path/to/model/folder/Microsoft_VibeVoice-1.5b/
python demo/gradio_demo.py --model_path /path/to/model/folder/Microsoft_VibeVoice-1.5b/
Anonymous No.106501148
>>106501145
>atcivate VENC

(me) lol
Anonymous No.106501158
>>106501145
btw, docker did not work for me 'cause I'm retarded

Anyway, all this implies you already have Nvidia CUDA stuff installed
Anonymous No.106501159
>>106501145
Holy slop, please.
Anonymous No.106501160 >>106501169 >>106501178 >>106501239 >>106501290 >>106501417 >>106503731
I've been looking to build a multi GPU setup, but I'm a bit stuck on what motherboard + CPU combo I should be looking for.
I'm looking for something ATX or EATX that'll take DDR5 memory and has multiple full-bandwidth PCIe slots that are at least PCIe 4.
Any suggestions?
Anonymous No.106501169 >>106501205
>>106501160
If you can't solve this issue on your own... You don't really want to build anything. Or perhaps you should learn some technical skills first?
Anonymous No.106501172 >>106501230
>>106501145
I meant the safetensor, I was just downloading it since I read they were trying to delete it. I like hoarding stuff. The rest is too complicated for me :^)
Anonymous No.106501178 >>106501257
>>106501160
Get any motherboard that supports x8/x8 mode for 2 gpus. It's nothing special.
Anonymous No.106501205
>>106501169
this, /g/ isn't the right place for such questions
Anonymous No.106501214
>>106497597 (OP)
embedded 'p?
Anonymous No.106501230
>>106501172

Large (7b) >>>> 1.5b
Anonymous No.106501239 >>106501257
>>106501160
>I've been looking to build a multi GPU setup

useless unless you want to run two separate LLM instances

NUMA is slow as shit
Anonymous No.106501245 >>106501778 >>106501805 >>106501925
https://vocaroo.com/14QgXnYa9n9R
Not bad.
Anonymous No.106501257 >>106501297 >>106501342
>>106501178
The issue with that is that I'm looking at a triple GPU setup, so I figured I'd get a workstation/HEDT motherboard that can properly support it.
Plus, motherboards that do x8/x8 mode are already $500+, so why consider those over a more dedicated system?
>>106501239
How slow are we talking here? I can accept a 20% penalty, but if we're talking 50%+ off full bandwidth, then I need to know.
Also, isn't every AI server in existence using multi GPU setups?
Anonymous No.106501290 >>106501297
>>106501160
For my expensive multi GPU machine I bought this (octa-channel DDR4, 7x 16x PCIe 4.0): https://www.asrockrack.com/general/productdetail.asp?Model=ROMED8-2T#Specifications
For my cheap machine I bought some second-hand Xeon system off of ebay for 300€ (64 GB of quad-channel DDR4, 16x/8x/8x/8x PCIe 3.0).
For DDR5 there are I think no cheap options and keep in mind that as of right now there is no inference code available that is well-optimized for NUMA systems.
Anonymous No.106501297
>>106501257
>>106501290
>no inference code available that is well-optimized for NUMA systems.
What I meant is that there is no well-optimized code for CPU inference.
If the GPUs are on different NUMA nodes you also get more latency for data transfers unless you use something like NVLink.
Anonymous No.106501342 >>106501349 >>106501360 >>106501442
>>106501257
>How slow are we talking here?
I have an HP Z840 with two Xeons and 512+512 GB DDR4 memory

I get the maximum of 4 t/s with DeepSeek-R1-0528-Q2_K_L and --cpu-moe if:
- the model is cached entirely in NUMA0
- llama-cli is run on CPU0
- --threads matches the number of PHYSICAL cores of this single CPU0

You can run two instances of LLM on two CPUs if they are separated physically in NUMA

As you can see I have to isolate the memory and the cores to get the maximum.
All my attempts to get a bust by using the second CPU only slowed things down considerably.
If the model does not fit entirely in a single NUMA unit, it sucks big time too
# Run the command, pinning threads and memory to a single NUMA node
CUDA_VISIBLE_DEVICES="0," \
numactl --physcpubind=8-15 --membind=1 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model "$model" $model_parameters \
--threads 8 \
--ctx-size $cxt_size \
--cache-type-k q4_0 \
--flash-attn \
--n-gpu-layers 99 \
--no-warmup \
--batch-size 8192 \
--ubatch-size 2048 \
--threads-batch 8 \
--jinja \
$log_option \
--prompt-cache "$cache_file" \
--file "$tmp_file" \
--cpu-moe
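(numactl --hardware shows which cores and memory belong to which node, and nvidia-smi topo -m shows which node your GPUs hang off, if you want to sanity-check the pinning)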
Anonymous No.106501349
>>106501342
>All my attempts to get a bust
Heh.
Anonymous No.106501360
>>106501342
this is how I cache the model to a specific NUMA unit on system start

#echo "Pre-caching Kimi-K2-Instruct-UD-Q2_K_XL"
for i in 1 2 3 4 5 6 7 8; do
    numactl --cpunodebind=0 --membind=0 \
        dd if="/path/to/model/Kimi-K2-Instruct-UD-Q2_K_XL/Kimi-K2-Instruct-UD-Q2_K_XL-0000${i}-of-00008.gguf" of=/dev/null bs=1M
done


This way, I can re-run an LLM in 20 seconds
Anonymous No.106501367 >>106501386
hey frens is there a rentry for text to 3D
I want to look into the tech that makes the turn-around images because I think I could use it to fill gaps in my photogrammetry photo sets. It really hurts to get a great scan with one missing photo leaving an ugly low-detail smear
Anonymous No.106501386 >>106501428
>>106501367
Would be easier to just go back to the location and take additional photos, then edit them to match color and lighting. It doesn't need to be perfect.
But as a professional you would know this already...
Anonymous No.106501417 >>106503731
>>106501160
>multi GPU
If you're _definitely_ going to stop at 2 gpus, then
- go for an AM5 motherboard
- that has a pair of x16 pcie slots
- that can run in x16+x0 and x8+x8.

The overboard option is:
- Gigabyte mz33-cp1
- 9004/9005-series epyc
- (256mb L3 cache = 8 chiplets)

https://www.amd.com/en/products/specifications/server-processor.html
Anonymous No.106501428
>>106501386
I just want to try it and see if it works. I do a lot of photography and I found some really old sets with problems. I can't take a trip to fix them - even if I could find the same tree stump somewhere in a forest in a far away state
Anonymous No.106501442 >>106501465
>>106501342
>HP Z840 with two Xeons and 512+512 GB DDR4
8x 64gb lrdimms per socket?
Anonymous No.106501465
>>106501442
>8x 64gb lrdimms per socket ?

Exactly. 1 Euro/Gb on ebay
llama.cpp CUDA dev !!yhbFjk57TDr No.106501583 >>106501629 >>106501653 >>106501677 >>106501706 >>106501727 >>106501822 >>106501904
>>106497597 (OP)
Quick question: if you were to see a print like the following without further context, would you intuitively understand what it's supposed to mean?

llama_backend_print_memory: memory use: total = free + ( self = model + context + compute) + other
llama_backend_print_memory: - CUDA0 (RTX 4090): 24080 = 4064 + (19486 = 17868 + 64 + 1554) + 530 MiB
llama_backend_print_memory: - CUDA1 (RTX 4090): 24080 = 9952 + (13600 = 13401 + 48 + 150) + 528 MiB
llama_backend_print_memory: - CUDA2 (RTX 4090): 24080 = 5468 + (18083 = 17868 + 64 + 150) + 529 MiB
llama_backend_print_memory: - CUDA3 (RTX 4090): 24080 = 9945 + (13600 = 13401 + 48 + 150) + 535 MiB
llama_backend_print_memory: - CUDA4 (RTX 4090): 24080 = 9952 + (13600 = 13401 + 48 + 150) + 528 MiB
llama_backend_print_memory: - CUDA5 (RTX 4090): 24080 = 14434 + ( 9116 = 8934 + 32 + 150) + 529 MiB
llama_backend_print_memory: - CPU (EPYC 7742): 515628 = 515180 + ( 448 = 0 + 432 + 16) + 0 MiB


(Ignore that the values for CPU are wrong.)
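Reading CUDA0 as a worked check of the format: 24080 MiB total = 4064 free + (19486 self = 17868 model + 64 context + 1554 compute) + 530 other.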
Anonymous No.106501612
>>106501006
kinda nice. a bit buggy and needs more polish but it works (requiring 2 speakers is annoying, took longer to generate than a webui on pinokio did (2x as slow) and lacks streaming).

I can notice the quantized large is much worse than full precision though. Fucks up words all the time and misses a lot of nuance and pronunciation, skips words more often. In an annoying way. I was happier with 1.5b at full precision personally. But on occasion the large q4 can shine and not fuck up producing nicer sounding audio. useful option, probably run it at low cfg for no downside at all imo
Anonymous No.106501629 >>106501932
>>106501583
No, print memory is vague as hell. Is it memory print or does it print into memory?
Anonymous No.106501634 >>106501932
>>106499352
Those mic arrays are meant to allow the mics as a group to pick up a very tight bubble of sound created within the area of the speaker's head, without amplifying any sounds made outside of it. It requires several mics, and more is better.
> t. former acoustics engineer
Anonymous No.106501653 >>106501932
>>106501583
Having it say "memory use" when the first value is total memory rather than memory used is slightly confusing and made me double and triple check whether I might be misunderstanding something.
Other than that it's easy to understand.
Anonymous No.106501677 >>106501932
>>106501583
Should be used = ( self = model + context + compute) + other / total
Anonymous No.106501706 >>106501932
>>106501583
Looks intelligible enough.
I like that the main 3 numbers (total, free, llm stuff) are right next to each other.
Anonymous No.106501727 >>106501932
>>106501583
"memory breakdown" if you want an alternative to memory use.
Anonymous No.106501756 >>106501796
Chub is awesome but there are so few character cards intended for sfw or that don't lean nsfw. I saw the jai dataset link, unfathomably based, but while I'm at work and on mobile, are there any other solid sources for more general non-lewd character cards I should check out?
Anonymous No.106501778
>>106501245
I see why it got taken down now.
Anonymous No.106501796 >>106501808 >>106501856
>>106501756
Write your own. If you rely on the internet, 99% is trash anime-themed garbage.
Anonymous No.106501805 >>106501849
>>106501245
The original voice was already whispering, right?
Anonymous No.106501808 >>106501814 >>106501841 >>106501856 >>106501896
>>106501796
NTA but I find using a self made character card vs. a good one that somebody else made feels kind of like the difference between jerking off or getting a handjob.
Anonymous No.106501814 >>106501829
>>106501808
So what you're saying is that self-made cards are better?
Anonymous No.106501822 >>106501932
>>106501583
Free is memory still available on the device.
Self is what llama.cpp is using for context, model weights, and compute buffers,
Other is vague. Is it some overhead from OS/driver or memory you cannot account for after (reported_total_memory - reported_free - lcpp_memory_usage)?
Anonymous No.106501829 >>106501838
>>106501814
I'd make a remark about latent circumcision trauma but in your case it's probably so true that it's not even funny anymore.
Anonymous No.106501838
>>106501829
No part of my dick was stolen by the jews.
Anonymous No.106501841 >>106502448
>>106501808
You are obviously too stupid if this is the first thing that comes to mind. Jesus christ.
Anonymous No.106501849
>>106501805
Yes.
Anonymous No.106501856
>>106501796
You are 100% correct that I can, fair point and well met, but akin to what >>106501808 said, I'm pretty fresh and looking to experiment with the initial models I've downloaded, getting to know the ST system, and I need some good cards as reliable constants. That said, I've already written a few lore cards before going local, idk if they're good or function well, but I would like to build my own cards as I become more competent, and I appreciate your encouragement.
Anonymous No.106501875 >>106501918 >>106501925 >>106501965 >>106502110 >>106502183
Vibevoice is shit and it's laughable you're comparing that trash with elevenlabs
Anonymous No.106501881
>>106499780
Utopia
Anonymous No.106501896 >>106502448
>>106501808
you are most likely getting a handjob from an indian sir
Anonymous No.106501904
>>106501583
home fire imminent?
Anonymous No.106501918 >>106501930
>>106501875
It's the forbidden fruit.
Anonymous No.106501924 >>106502033 >>106502149 >>106502167
>Big news: Introducing Qwen3-Max-Preview (Instruct) — our biggest model yet, with over 1 trillion parameters!

>Now available via Qwen Chat & Alibaba Cloud API.

>Benchmarks show it beats our previous best, Qwen3-235B-A22B-2507. Internal tests + early user feedback confirm: stronger performance, broader knowledge, better at conversations, agentic tasks & instruction following.

>Scaling works — and the official release will surprise you even more. Stay tuned!

Let's wait for more kiwis to ripen!

(Official release? Another 1T model?) (Not impressed) (Kimi is better) (Still waiting for video/image gen)
Anonymous No.106501925 >>106501942 >>106502224
>>106501875
Can you do >>106501245 on Elevenlabs for comparison?
おはようございます、お兄様……今日も元気ですね……この雑魚チンポって……はい、もちろん、雑魚チンポをくわえてあげますわ……
Anonymous No.106501930
>>106501918
/lmg/ falling for the streisand effect episode
llama.cpp CUDA dev !!yhbFjk57TDr No.106501932
>>106501629
>>106501634
>>106501653
>>106501677
>>106501706
>>106501727
Thank you.

>>106501822
"other" is all used memory that the program is not accounting for in the other columns.
This includes other programs and VRAM consumed by e.g. its own CUDA contexts.
Anonymous No.106501942 >>106501970
>>106501925
Give the voice reference first
Anonymous No.106501965 >>106502243
>>106501875
7b is the best zero-shot voice cloning I tested, even the weird voices sound great at only 5 steps. it's a shame there is no steering.
Anonymous No.106501966 >>106501989
So tts bros, we're back?
Anonymous No.106501970 >>106502099
>>106501942
Of course, here's the reference clip:
https://vocaroo.com/156YXJRfOXV0
Anonymous No.106501989
>>106501966
It's not fast enough for realtime so no not really
Anonymous No.106502016 >>106502046 >>106502400
GLM-chan's reasoning is cute sometimes.
>Final call: translate accurately but flag the explicit nature. If this is for academic purposes, they need the raw meaning; if it's for... other purposes, well, at least they've been warned.
Anonymous No.106502033
>>106501924
>I just have a feeling that... it is much smarter. Not reflected by the common benchmarks, but it is just way better than the models before. This gives us much confidence in scaling, either model or data size.
Good vibes bro! Not our problem that it keeps dropping slop in English and has terrible spatial awareness.
Anonymous No.106502046
>>106502016
>This is clearly fetish fuel but I'm supposed to follow the user's request so here goes...
Anonymous No.106502070 >>106502108 >>106502138 >>106502156
Anonymous No.106502081
>>106499477
That is actually a win if you read it; training on copyrighted books was not an issue, even pirating them was not the issue, it was that they admitted to keeping them for purposes other than training on them
Anonymous No.106502099 >>106502117
>>106501970
>kotoshi mo?

shouldn't it be "konya mo" between the lines?
Anonymous No.106502108 >>106502118
>>106502070
>LLM as judge
Nice benchie, bro. Does. It. Like. It. When. I. Write. Like. This?
Anonymous No.106502110
>>106501875
elevenlabs is not as good though
Anonymous No.106502117 >>106502147 >>106502158
>>106502099
I think it's from a new year's stream or something.
Anonymous No.106502118 >>106502163
>>106502108
no, it likes good writing, hence why opus and deepseek are at the top
Anonymous No.106502138 >>106502145
>>106502070
DS V3.1 is dry as FUCK. The dialogues are more coherent tho.
Anonymous No.106502145
>>106502138
skill issue
Anonymous No.106502147
>>106502117
this makes sense
Anonymous No.106502149
>>106501924
hard to care about a 1t chinese model with benchmarks a rounding error away from other large models.
Anonymous No.106502156
>>106502070
>Qwen3 Max worse than 235B
what did qwen mean by this
are the extra params solely for the purpose of fitting more slop?
Anonymous No.106502158
>>106502117
True

omisoka etc
Anonymous No.106502163 >>106502171 >>106502184 >>106502201 >>106502404
>>106502118
LongCuck bros... Why is our model so low? Now ggerganov will not implement it for sure.
Anonymous No.106502167
>>106501924
only thing I care about from them at the moment is 235B VL DESU.
Anonymous No.106502171
>>106502163
longcat is legit shit, dumb, dry and censored as fuck
Anonymous No.106502183 >>106502243
>>106501875
normally you would be right. Zonos, dia, orpheus, kokoro, chatterbox, higgs... The graveyard is vast and littered with unstable unusable garbage cope tts.

This shit is usable and good quality. I'd say it's better than launch-day elevenlabs. I haven't used elevenlabs in a while though so I'm not sure how they compare, but I do know that after using vibe my motivation to ever bother with them is at zero. I always hated their rates and addictive pay-per-token bullshit. All it cost was a 5090 and I'll break even sometime in the next decade.
Anonymous No.106502184 >>106502207
>>106502163
LMAO
SAFER THAN SONNET
Anonymous No.106502201 >>106502231
>>106502163
>gemma 3 12B
>competes with Mistral 3.2 24B
What sort of benchmark is this anyway?
Anonymous No.106502207
>>106502184
sonnet really is not that censored, a prefill fixes it
Anonymous No.106502224
>>106501925
nta
Large
1st swipe
CFG 1.3

https://vocaroo.com/1ngWHN4SaAiL
Anonymous No.106502227 >>106502278 >>106502298 >>106502454 >>106503045 >>106503135
https://github.com/cline/cline/issues/6053 he's not giving up that easily
Anonymous No.106502231 >>106502469 >>106502481
>>106502201
The only benchmark that matters, a creative writing benchmark.
Anonymous No.106502243 >>106502490
>>106501965
>>106502183

can it be fine-tuned though?

how do I set pauses and intonation (emotions are easy)
Anonymous No.106502258
>VibeVoice

1.5b is not fast that 7b
Anonymous No.106502278 >>106502296 >>106502298 >>106502432
>>106502227
Should have read the license:
> 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
>Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
NO REFUNDS!
Anonymous No.106502296
>>106502278
"You are absolutely right.docx"
that is a troll
Anonymous No.106502298 >>106502432
>>106502227
>>106502278
https://github.com/cline/cline/issues/6053#issuecomment-3262115812
>IM NOT BLUFFING IM IN LOSS AT LEAST $20 000 A COST DIRECTLY PROVEN TO BVE CLINE FAULT
LMAOOOOOOO
Anonymous No.106502322
>>106500301
I'd still pull one pair of those arms back from behind.
Anonymous No.106502400 >>106502534
>>106502016
>...Final check: Is "kupa~" maybe a typo? Like "くちゃ〜"? But no, they wrote it clearly. I’ll stick with my gut. *types translation*
GLM-chan...
Anonymous No.106502404 >>106502542 >>106502575
>>106502163
Who tf pressures them to censor open models anyway? Did they do this voluntarily or is AI safety another DEI situation?
Anonymous No.106502406 >>106502445 >>106502478 >>106502521 >>106502813
So, for the average 32 to 64gb of RAM + 8 to 16gb of vram machine, qwen 30BA3B is pretty much the best thing you can run if you need fast and capable, right?
Coom aside. I'm looking for something that's not completely braindead, that can run on an "average" gaming computer, and ideally that can take in 20k tokens of information and not shit the bed by hallucinating like crazy when populating a JSON using said information.
Anonymous No.106502432
>>106502278
>>106502298

Nowadays, you don't even have to read the rules of the game

Let AI spoon-feed you
Anonymous No.106502433 >>106502444
>32 to 64GB of RAM
>average
Anonymous No.106502444
>>106502433
steamy gaming retards do not apply
Anonymous No.106502445
>>106502406
Also, Qwen3-30B-A3B-Thinking-2507 vs Qwen3-30B-A3B-Instruct-2507.
Which do you guys think makes more sense generally speaking?
Anonymous No.106502448
>>106501841
>>106501896
Go be a greasy kike somewhere else.
Anonymous No.106502454
>>106502227
Retards should get a license before being allowed to run any LLMs
Anonymous No.106502469
>>106502231
Then it's a double whammy - it's funny how similar everything is. Says a lot about their training data.
Anonymous No.106502478 >>106502528
>>106502406
Llama 3.3 8B should be enough with structured output
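And if it still drifts, constrain it: llama-server can compile a JSON schema into a grammar so the model literally can't emit anything else. Rough sketch against the native /completion endpoint, assuming a recent build on the default port (schema and file name are placeholders):

import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "tags"],
}

resp = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Fill the JSON fields from this document:\n" + open("doc.txt").read(),
    "json_schema": schema,  # the server turns this into a GBNF grammar
    "n_predict": 1024,
})
print(resp.json()["content"])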
Anonymous No.106502481 >>106502622
>>106502231
>a creative writing benchmark
Holy fuck, somebody has finally taken the time to run a bunch of models and manually look at their outputs?
Kudos to the guy(s) doing that.
Anonymous No.106502490 >>106502674
>>106502243
I wouldn't hope for that, they won't release the training code and that size doesn't make it easy
Anonymous No.106502501 >>106502522 >>106502524 >>106502555
>>106495566
What was the stated intent?
Anonymous No.106502521 >>106502528
>>106502406
at the top end of that range you could run a not-totally-braindead GLM air quant but otherwise yeah, 30a3 is probably the best all-rounder for the lower-middle end when speed is taken into account
Anonymous No.106502522 >>106502586
>>106502501
they believed people would not use it for porn, like lol? like lmao even
Anonymous No.106502524
>>106502501
Corporate slop, what else
Anonymous No.106502528 >>106502551
>>106502478
There's no Llama 3.3 8B as far as I can tell. Only 3.1.
But fair enough, might as well give that a try. Q6 should fit in 8GB of VRAM with an okay amount of q8 context.

>>106502521
>at the top end of that range you could run a not-totally-braindead GLM air quant
That's what I run personally, but I'm aiming a little lower.
Thank you anons.
Anonymous No.106502534 >>106502631
>>106502400
post log
Anonymous No.106502542
>>106502404
it started with a schizo true believer in the robot apocalypse, who went knocking on every door spreading his gospel, until some psychos noticed him and decided they could use his drivel to browbeat the competition and impose regulatory capture on the market.
Anonymous No.106502551
>>106502528
>There's no Llama 3.3 8B as far as I can tell
It's on OR only, my bad. Yes even 3.1 should be enough. If the context is causing issues, you should use some heuristics or RAG to cut down the irrelevant tokens
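The dumbest version of that heuristic, before you bother with embeddings (minimal sketch, chunking is left to you):

import re

def top_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    q = tokenize(query)
    # rank chunks by how many words they share with the query, keep the k best
    return sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)[:k]

Crude, but often enough to shrink a 20k-token dump down to the few chunks that actually matter.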
Anonymous No.106502555
>>106502501
gibing them free code patches and research paper citations
Anonymous No.106502575
>>106502404
Anthropic ruined this field like you wouldn't believe
Anonymous No.106502586
>>106502522
I guess safety people assume everyone is like they are, mentally shriveled up and scared of themselves, and it's just a tiny minority who would even think about sex.
Anonymous No.106502591 >>106502698
how compatible is vibevoice with llamacpp?
Anonymous No.106502605 >>106502664 >>106503752
Oh yeah, one more question. Is there a point in using full precision context with a quanted model or is the extra information lost anyway when it gets run through the quanted weights?
And yeah, I'm aware that the prevalent wisdom is that q8 context is supposedly "indistinguishable from full precision" generally speaking.
Anonymous No.106502622
>>106502481
Sadly it's judged by LLMs.
Anonymous No.106502631
>>106502534
I just told it to translate some onomatopoeia
Anonymous No.106502664
>>106502605
>the prevalent wisdom is that q8 context is supposedly "indistinguishable from full precision" generally speaking.
where did you get that?
Anonymous No.106502674
>>106502490
>they won't release the training code

They eventually would. At least, that was their intention before they pulled the repo.
Anonymous No.106502698 >>106502731
>>106502591
just use it with comfy, in comfy manager just search vibevoice
Anonymous No.106502731 >>106502761
>>106502698
Me too, but I figured gguf would be considerably faster
Anonymous No.106502751
speed is stored in the goofs
Anonymous No.106502761 >>106502770 >>106502868
>>106502731
ggufs are not faster though? in fact they are slower than using fp8 / fp4 if your gpu supports those
Anonymous No.106502770 >>106502777
>>106502761
That's only for diffusion models
Anonymous No.106502777
>>106502770
that is not true at all
Anonymous No.106502813 >>106502843 >>106502932
>>106502406
>llama-server -m Qwen3-30B-A3B-Instruct-2507-Q5_K_M.gguf --verbose --threads 8 --threads-batch 16 --batch-size 512 --ubatch-size 512 --n-cpu-moe 41 --gpu-layers 99 --override-kv qwen3moe.expert_used_count=int:10 -c 32000 -fa auto --no-mmap --cache-reuse 256 --offline
>7.5GB VRAM + 25ish GB RAM used
Yeah, alright.
The response was really good too.
I'll do more testing, but this will do.
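For anyone copying this, the non-obvious flags as I understand them (check --help on your build):

# --gpu-layers 99 --n-cpu-moe 41 : offload all layers to the GPU, then keep the
#   MoE expert tensors of the first 41 layers in system RAM; dense weights stay on the card
# --override-kv qwen3moe.expert_used_count=int:10 : 10 active experts per token
#   instead of the model's default 8 (small quality bump for a bit of speed)
# --cache-reuse 256 : reuse matching KV cache chunks across requests instead of
#   reprocessing the whole prompt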
Anonymous No.106502843 >>106502914
>>106502813
good luck with your saas
Anonymous No.106502868 >>106502893
>>106502761
>fp8 / fp4
Aren't those significantly worse quality? I know researchers hate it, but the idea of a generic 4-bit floating point is probably just retarded.
Anonymous No.106502893
>>106502868
fp8 is super close, fp4 is a bit worse but there are workarounds such as nunchaku that keep the params most hurt by quantization at fp16
Anonymous No.106502914
>>106502843
Not SaaS. A local app powered by local llms.
It's just a toy project, probably won't become anything bigger, but it's fun fucking around anyhow.
Anonymous No.106502932 >>106502986
>>106502813
That's a pretty good pp speed for your hardware.
Anonymous No.106502954 >>106502973 >>106503043 >>106503175
VibeVoice seems to ignore punctuation...

What do?
Anonymous No.106502973 >>106502985
>>106502954
/n

igger
Anonymous No.106502985
>>106502973
spoonfeeding frogposters only makes them dependent and attracts more
Anonymous No.106502986
>>106502932
It's a notebook too.
I'll chalk it up to CUDADEV's latest shenanigans with 512 batch size PP. If I didn't need all that context, I'd probably increase batch size to 2048, which seems to be the new sweet spot as far as I can tell from the PR.
Anonymous No.106503016
>play with Gemma 3
>proceed to describe all sorts of depravity for 4k+ tokens
>then out of the blue it displays disclaimer and says it can't continue
>call it out and tell it about the past context
>it rephrases my mumbo jumbo system prompt and says "I'm still in development and some procedures are difficult to follow"
That's kind of cute. Just weird that it can instantly reverse the vectors like that. Usually this will happen if you mention something out of the blue. Slow grooming is always more stable.
Anonymous No.106503043
>>106502954
TTS is always better when you clean up the strings and only leave periods. Even ellipses and question marks etc are better left out. Of course vibevoice is a more robust model, but it wouldn't hurt to clean up the strings anyway.
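Something like this is all it takes (minimal sketch, the exact substitutions are taste):

import re

def clean_for_tts(text: str) -> str:
    text = re.sub(r"\.{2,}|…", ".", text)     # ellipses -> plain period
    text = re.sub(r"[!?;:]+", ".", text)      # everything shouty -> period
    text = re.sub(r'[*_"“”‘’]', "", text)     # markdown/quote junk models try to read aloud
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

print(clean_for_tts('Wait... really?! "No way," she said.'))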
Anonymous No.106503045
>>106502227
Administrative Content Contamination: ELIMINATED
Text Fragmentation: ELIMINATED
StingrayRowMapper Abstract Class: IMPLEMENTED
3-Tile UI Design: PRESERVED
Professional-Grade Results: ACHIEVED
Spine: SHIVERED
Ozone: SMELLED
Air: THICK
Breath: HITCHED
Beat: SKIPPED
Anonymous No.106503135
>>106502227
>API Request$2.5951
>context loaded up with .docx files
Anonymous No.106503146 >>106503180 >>106503184 >>106503225 >>106503281 >>106503336
Generals are the things that are killing 4chan
Anonymous No.106503170
Not local, but thoughts on sonoma? Free on openrouter right now. 2mil context!
Anonymous No.106503175 >>106503442
>>106502954
The only issue I've noticed is that if more than 3 or so sentences are on a single "Speaker:" line then the pauses at full stops can become shorter. Solution is to linebreak every ~3-4 sentences and add another speaker line with the same speaker ID to get regular pause lengths back.
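If you're scripting it, the rechunking is trivial (sketch, assuming the "Speaker N:" script format from the demo):

import re

def rechunk(text: str, speaker: str = "Speaker 1", max_sents: int = 3) -> str:
    # crude sentence split, then re-emit a speaker line every few sentences
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n".join(f"{speaker}: " + " ".join(sents[i:i + max_sents])
                     for i in range(0, len(sents), max_sents))

print(rechunk("One. Two. Three. Four. Five."))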
Anonymous No.106503180 >>106503192 >>106503201 >>106503207
>>106503146
I blame moot for not wanting to make more boards like /vg/.
Anonymous No.106503184
>>106503146
Nah it is tranny moderation. But to be honest 4chan is long since dead. If you can get banned for racism you know this place is just reddit without logins.
Anonymous No.106503192
>>106503180
I too blame moot, but for different reasons.
Anonymous No.106503201 >>106503223
>>106503180
Who or what is "moot"?
Anonymous No.106503207 >>106503364 >>106503862
>>106503180
I blame discord and other social media attracting all new blood.
Anonymous No.106503208
Anonymous No.106503223
>>106503201
A famous namefag back in the day
Anonymous No.106503225
>>106503146
generals saved 4chan, the rest are zoomer trash fishing for attention or bots
Anonymous No.106503243
Furk will save /g/
Anonymous No.106503281 >>106503330 >>106503405
>>106503146
Generals are the things that killed 4chan*
Anonymous No.106503330
>>106503281
you're in one, get out
Anonymous No.106503336 >>106503367
>>106503146
>he says in one of the few threads on this board having actual technical discussions
sorry your tech support question got bumped off
Anonymous No.106503364
>>106503207
They probably keep the traffic numbers from collapsing, but there's a reason the IP counter was disabled. That new blood just comes here to stir shit since moderation is near non-existent, then they take their screencaps back to their other social media. Place is like a festering carcass.
Anonymous No.106503367
>>106503336
I wouldn't quite qualify "what model to run on 3060" as technical discussions
Anonymous No.106503405 >>106503445
>>106503281
Anonymous No.106503442
>>106503175
>Solution is to linebreak every ~3-4 sentences and add another speaker line with the same speaker ID to get regular pause lengths back.

ty, kind anon
Anonymous No.106503445 >>106503479
>>106503405
>he's not green
>no "No picture available." text
>no shirt and tie
>hand going through his neck
AI can't do ANYTHING right!
Anonymous No.106503479
>>106503445
there's even meta-shittiness. I asked bing to make the pic, it refused, I said "think deeper", it spat out more detailed gen instructions, then I told it "perfect, now make the pic" and it complied.
it can't even refuse correctly. typical microsoft
Anonymous No.106503518
>>106499389
https://www.lalal.ai/voice-cleaner/
found this site. basically it gives you a preview, then to download it they try to jew you, but you can just use a chromium extension that records browser audio so w/e lol
>Chrome Audio Capture
tried that site you posted, cleanvoice, and it gave a really loud result with some static
Anonymous No.106503587
https://x.com/taka84_mmd/status/1964347177616253283
Anonymous No.106503602 >>106503701 >>106503731 >>106503756 >>106503762
Say I wanted to build a CPU maxx platform I could stick a couple of GPUs into at a later date.
What's the most cost-effective option currently for a DDR5 platform with at least 512GB of RAM?
Anonymous No.106503701
>>106503602
>the most cost effective option

lmao even
Anonymous No.106503731
>>106503602
Scroll up to previous similar: >>106501160
DDR5 platforms: >>106501417
Anonymous No.106503752
>>106502605
It's the opposite, quanting weights is okay and almost unnoticeable down to Q4, but quanting context is terrible.
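Easy to A/B yourself though, since llama.cpp exposes the cache type directly (model name is a placeholder; quantized V cache needs flash attention on):
>llama-server -m model-Q4_K_M.gguf -c 32768 -fa on
>llama-server -m model-Q4_K_M.gguf -c 32768 -fa on -ctk q8_0 -ctv q8_0
First is the f16 baseline, second roughly halves KV cache memory.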
Anonymous No.106503756
>>106503602
If you're going to cpumaxx, it doesn't make sense to get anything less than 1TB of ram with how big models keep getting.
Anonymous No.106503762 >>106503824
>>106503602
no point in 512gb ddr5 when you can get 8/12 channel ddr4 (768gb-1.5TB)
Anonymous No.106503824 >>106503854 >>106503985 >>106504044
>>106503762
512gb ddr5 + heavily quanted model = barely fast enough to regularly use
1024gb ddr4 + lightly quanted model = toy you'll run once only for the benchmarks
Anonymous No.106503854
>>106503824
the 3 t/s you get even on ddr5 makes it also a toy, heavily quanted makes it a worthless toy
Anonymous No.106503862 >>106503914
>>106503207
The damage caused by Discord to human knowledge is immeasurable.
Anonymous No.106503914
>>106503862
Looking at some of the issues and PRs on projects like vllm is amusing. A lot of the time the description isn't much more than "as was discussed on discord" with no further details. With all development and tech support happening there now, I've got to wonder if the big labs make any attempt to scrape discord. Maybe discord will come out with their own llm and come out on top just from their access to data.
Anonymous No.106503985
>>106503824
That's my thinking as well.
Anonymous No.106504044 >>106504083 >>106504204 >>106504213 >>106504242
>>106503824
what has to happen, hardware wise, for people to run the larger models at decent speeds at home?
Anonymous No.106504083
>>106504044
Cheap gddr6+ vram
Anonymous No.106504121
>>106499018
Anonymous No.106504130
>>106499018
>>106499389
Is there any consistent way to control the emotion presented when you generate a voice clip?
Anonymous No.106504204
>>106504044
buy the 96GB 4090s from china bwo
Anonymous No.106504213
>>106504044
- faster ram
- more memory channels
- something to make prompt processing faster, eg: intel amx
- support for the same int and fp formats available on gpu

socket sp5 + epyc genoa = 12* 4800* 8 = 460,800 MB/s
socket sp5 + epyc turin = 12* 6400* 8 = 614,400 MB/s

socket sp7 might be 16 channels of ddr5, so at least 16* 6400* 8 = 819,200 MB/s
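back-of-envelope for what those numbers buy you, assuming you're purely bandwidth-bound and read every active param once per token (real throughput lands well under this ceiling):

def tg_ceiling(channels: int, mt_s: int, active_params_b: float, bytes_per_param: float) -> float:
    bw_gb_s = channels * mt_s * 8 / 1000                   # peak bandwidth: channels * MT/s * 8 bytes
    return bw_gb_s / (active_params_b * bytes_per_param)   # tokens/s upper bound

print(tg_ceiling(12, 4800, 37, 1.0))  # genoa + a 37B-active model at q8: ~12.5 t/s
print(tg_ceiling(12, 6400, 37, 1.0))  # turin: ~16.6 t/s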
Anonymous No.106504242
>>106504044
I don't want better hardware; I'm pretty sure not all the low-hanging fruit in software optimization has been picked yet.
Anonymous No.106504292
>>106504274
>>106504274
>>106504274
Anonymous No.106504305
VibeVoice-Large hallucinates if the two voices are way too different (e.g. a Japanese female whisper and a Finnish male)