
Thread 106582475

365 posts 100 images /g/
Anonymous No.106582475 [Report] >>106582498 >>106582952 >>106583075 >>106585931 >>106586665 >>106587589 >>106587800 >>106590628 >>106591480
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106575202 & >>106566836

►News
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106582480 [Report] >>106586206
►Recent Highlights from the Previous Thread: >>106575202

--Troubleshooting low token generation speeds with multi-GPU configurations on Linux:
>106575420 >106575668 >106575698 >106575792 >106575808 >106575836 >106575848 >106575891 >106575898 >106575933 >106576021 >106576059 >106576092 >106576126 >106576137 >106576151 >106576186 >106576245 >106576331 >106576358 >106576378 >106576431 >106576477 >106576497 >106576596 >106576592 >106576606 >106576610 >106576652 >106576726 >106576759 >106576688 >106576698 >106576714 >106576789 >106576867 >106576931 >106577028 >106577094 >106577146 >106577210 >106577154 >106577350 >106577372 >106577408 >106577575 >106577677 >106576395 >106576430 >106577477 >106578561 >106578743
--Issues with instruct model formatting and jailbreaking GPT-oss:
>106579721 >106579736 >106579784 >106579795 >106579859 >106579884 >106579897 >106579908 >106579934 >106579949 >106580072 >106580156 >106580153 >106579748
--vLLM Qwen3-Next: Speed-focused hybrid model with mtp layers:
>106575851 >106576089 >106576174 >106576443
--GGUF format's support for quantized and high-precision weights:
>106575413 >106575474 >106575499 >106575521
--Self-directed LLM training via autonomous task/data generation and augmentation:
>106580707 >106580838 >106580717 >106580762 >106580794
--Qwen Next's short response issues and version instability concerns:
>106580940 >106580951
--Finding a lightweight AI model for TTRPG GM use within VRAM and RAM constraints:
>106580295 >106580315 >106580332 >106580337 >106580342 >106580350 >106580514 >106580531
--Grok-2 support to be added to llama.cpp:
>106580473
--Miku (free space):
>106576245 >106578711 >106578793 >106579905

►Recent Highlight Posts from the Previous Thread: >>106575209

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106582498 [Report]
>>106582475 (OP)
This fat bitch's prompt processing is too slow...
Anonymous No.106582518 [Report] >>106582527 >>106582547
So was /lmg/ wrong? Are very sparse models like Qwen3 Next actually better, and did OpenAI figure it out earlier, considering the architecture of gpt-oss?
Anonymous No.106582527 [Report]
>>106582518
yes, soon standard moe models will be as laughable of an idea as dense models are right now
Anonymous No.106582547 [Report] >>106582623
>>106582518
Nobody here is using qwen3 next and it is almost certainly just another useless benchmaxxed math model.
Anonymous No.106582574 [Report] >>106582582
Please help a retard out, I'm using Mikupad and it's working great but after a few pages it starts dropping a lot of short words like he/him from the text and it reads like a caveman.
I think it's something to do with repetition penalty but I don't know.
Anonymous No.106582582 [Report] >>106582598
>>106582574
Why are you using repetition penalty? It's outdated garbage.
Anonymous No.106582598 [Report] >>106582618
>>106582582
So what should I do instead?
Anonymous No.106582618 [Report]
>>106582598
Use DRY to filter out longer repeated patterns and XTC for shorter ones, and vary your own prompts more to give the model new material to work with.
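For anyone who hasn't used it: the core idea of DRY is penalizing any token that would extend a sequence already seen earlier in the context. A toy sketch of the idea (not the real sampler implementation; the default values here are placeholders):

```python
def dry_penalty(tokens, candidate, multiplier=0.8, base=1.75, allowed_len=2):
    """Toy DRY: find the longest repeated sequence that `candidate` would
    extend, and penalize repeats longer than `allowed_len`."""
    seq = tokens + [candidate]
    n = len(seq)
    match = 0
    # Grow the suffix until it no longer occurs earlier in the context.
    for length in range(1, n):
        pat = seq[n - length:]
        if any(seq[i:i + length] == pat for i in range(n - length)):
            match = length
        else:
            break
    if match <= allowed_len:
        return 0.0
    # Penalty grows exponentially with the length of the repeat.
    return multiplier * base ** (match - allowed_len)
```

With `allowed_len=2`, short echoes go unpenalized while longer verbatim repeats get an exponentially growing penalty subtracted from that token's logit, which is why it catches loops without mangling common short words the way plain repetition penalty does.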
Anonymous No.106582623 [Report] >>106582643 >>106583124
>>106582547
>Nobody here is using qwen3 next and it is almost certainly just another useless benchmaxxed math model.
Is it useless because the model doesn't work for roleplay, or is it useless because their training data is safety and synthetic slop?
Anonymous No.106582643 [Report]
>>106582623
It's because Qwen's training data has too little focus on writing/language, with math and coding being over-represented in the dataset. It's the same reason why you see Gemma 27b dunking on 100b+ models in creative writing benchmarks, yet its coding abilities are trash - Gemma's dataset swings the opposite way.
As for safety, qwen models are middling. They do have refusals but don't take too much meddling to get around them. More 'safe' than Mistral models, less so than Gemma/GPT.
Anonymous No.106582764 [Report] >>106582788 >>106582792 >>106582795 >>106582805 >>106583325 >>106583699 >>106585847 >>106585857
Is "not x, but y" a definite indication of AI slop? Does legit human-made content never use it? Seriously, every time I hear the pattern in YT videos I go schizo and close them.
Anonymous No.106582779 [Report] >>106583699
Have people tried scaling TREAD up yet? It's a per-token stochastic passthrough during training in the same vein as Dropout, meant to speed up training.
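For context, the mechanism as described is roughly: during training, each token independently either goes through a block of layers or bypasses it via an identity connection. A schematic sketch assumed from that description (not the paper's actual code):

```python
import random

def tread_mask(seq_len, keep_prob=0.5, rng=None):
    """Sample which token positions get processed by the skipped layers."""
    rng = rng or random.Random()
    return [rng.random() < keep_prob for _ in range(seq_len)]

def tread_apply(hidden, layer_fn, mask):
    """Tokens with mask=True go through the layer; the rest pass through
    unchanged, so the layer only does work on the kept subset."""
    processed = iter(layer_fn([h for h, m in zip(hidden, mask) if m]))
    return [next(processed) if m else h for h, m in zip(hidden, mask)]
```

`keep_prob` directly controls how much compute the routed layers save per step, which is where the claimed training speedup would come from.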
Anonymous No.106582788 [Report]
>>106582764
It's not definite, but quite damning.
Now an em dash, that's definite.
Anonymous No.106582792 [Report]
>>106582764
It's the new shivers down spine, for sure. Qwen30b is the worst example I've seen, I don't think it can go more than 2-3 responses in a creative context without using it.
Anonymous No.106582795 [Report]
>>106582764
This is not a definitive indication of AI-slop, but a legitimate rhetorical device that AI has co-opted and inflated to the point of cliché.
Anonymous No.106582805 [Report] >>106582811 >>106583014 >>106583669 >>106584257 >>106584340
>>106582764
Yes. Watch out for the variants too.
Anonymous No.106582811 [Report] >>106582816
>>106582805
Was this Gemma or GP-TOSS?
Anonymous No.106582816 [Report]
>>106582811
https://desuarchive.org/g/thread/106460375/#106460853
abliterated gemma
Anonymous No.106582952 [Report]
>>106582475 (OP)
>image
rude
Anonymous No.106582994 [Report] >>106583010
Where do all the "it's not aislop it's actually how humans speak" retards come from?
Anonymous No.106583010 [Report]
>>106582994
People who do not speak to other humans or read books, and who think that the botposts they read on reddit were actually human.
Anonymous No.106583014 [Report]
>>106582805
I feel like it's a byproduct of training and conditioning LLMs to be balanced rather than biased. It's overcorrection to the point where the LLM is no longer attempting to say anything useful, but instead trying to remain as inoffensive as possible.
Anonymous No.106583063 [Report] >>106584150
Imagine waiting two more weeks (C) for Qwen3-Next goofs just to find out it is crap
Anonymous No.106583075 [Report]
>>106582475 (OP)
>story set in Japan because of one of the characters' names, so the model just decided everyone must be a Sato or Tanaka or Watanabe
>ugh whatever
>police officer explicitly reads out the Miranda rights
GLM 4.5 bros... 355B parameters and we still turn everything into an episode of true crime... it's over...
Anonymous No.106583080 [Report]
>>106575295
i want this smug little robot i want to make its insides all white too..
Anonymous No.106583124 [Report] >>106583138 >>106583143 >>106583147 >>106583625 >>106586387 >>106592024
>>106582623
this thread has nothing but hopeless coomers cooming to the most degenerate, filthiest shit a sane person can't even begin to imagine
they judge models on the standard of how degenerate it can get, not whether they're actually useful
take their cries of "muh benchmaxxed" with many grains of salt, they love models like GLM which break down really quickly as context grows
meanwhile I simply never saw a local model handle context as well as the newer Qwens, the only thing better is proprietary models like Gemini
absolutely destroys all other open models including deepshit
gemma doesn't even begin to enter the fray, those models are utter garbage past 1k tokens but you see the nigger under you praise it because he found how to make it say the magic words he loves to hear
Anonymous No.106583138 [Report]
>>106583124
>models like GLM which break down really quickly as context grows
Like literally every model in existence
>handle context as well as the newer Qwens
They're bad even at low context, so the drop-off isn't as noticeable.
Gemma is shit for plenty of reasons but if it's breaking on you at 1k then some part of your setup is fucked.
Anonymous No.106583143 [Report] >>106583155 >>106586595
>>106583124
It's well established that qwen models are good for everything that isn't sex. Half the links in the recommended models rentry are qwen models.
Anonymous No.106583147 [Report]
>>106583124
>not whether they're actually useful
What the fuck is "useful" supposed to mean?
Anonymous No.106583155 [Report]
>>106583143
>Half the links in the recommended models rentry are qwen models.
Yes, under the "Programming & General" section, where it says "Benchmaxxed models with an impressive lack of world knowledge. Good for anything STEM-related" STEM = math and coding
Anonymous No.106583262 [Report] >>106583275 >>106583338
Are there any benchmarks out there for running mid-sized MOEs (air-chan etc) with cpu offloading? Considering upgrading to 128gb+ ram but trying to figure out if i'd be getting "unbearably slow" or just "slow" TG numbers on this kinda setup
Anonymous No.106583275 [Report]
>>106583262
>Considering upgrading to 128gb+ ram

Anon, I...
Anonymous No.106583325 [Report] >>106583486
>>106582764
Pisses me off so much. It's a rhetorical device I used, very sparingly but to great effect, and thanks to AI slop I now catch myself and consciously avoid using it.
Anonymous No.106583338 [Report]
>>106583262
Low active parameter counts mean that token generation speed doesn't take that big of a hit, especially with the new moecpu options in llama.cpp and kobold. But as you move to bigger models, prompt processing starts to become a bottleneck. With Nemo you can rip through 16k of context in a few seconds on a 3090, while GLM Air even at Q4 can take like 2 minutes.
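In lieu of proper benchmarks, you can napkin-math the token generation side: with CPU offload it's roughly memory-bandwidth bound, so active parameters and RAM bandwidth predict it decently. A crude roofline sketch (the figures used below, ~12B active for GLM Air at roughly Q4 and ~80 GB/s dual-channel DDR5, are illustrative assumptions):

```python
def est_tokens_per_sec(active_params_b, bytes_per_weight, ram_bw_gbs,
                       gpu_frac=0.0, vram_bw_gbs=900.0):
    """Crude roofline for token generation: every active weight is read
    once per token, split between system RAM and VRAM."""
    total_bytes = active_params_b * 1e9 * bytes_per_weight
    sec_per_token = (total_bytes * (1 - gpu_frac) / (ram_bw_gbs * 1e9)
                     + total_bytes * gpu_frac / (vram_bw_gbs * 1e9))
    return 1.0 / sec_per_token
```

E.g. 12B active at ~0.56 bytes/weight over ~80 GB/s RAM lands around 12 t/s, i.e. "slow" rather than "unbearably slow"; prompt processing is compute-bound and not captured by this estimate, which is why it becomes the real bottleneck on big MoEs.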
Anonymous No.106583486 [Report] >>106583585
>>106583325
You should keep writing the way you were before. Whether you increase or decrease your usage of rhetorical devices or phrases, you're still letting LLMs influence the way you write. As a reader, it'll piss me off just as much to see a writer bend over backwards to avoid sounding like an LLM as to see something that was clearly written with direct or indirect LLM influence.
Anonymous No.106583541 [Report] >>106583570 >>106584247
>feed it the most nonsensical shit ever as dialogue, like 'pee pee poo poo'
>but dress it up by throwing GLM 4.5's same exact slop recipe back at it, like 'smile widens', 'predator moving in for the kill', 'the trap has sprung', 'they have won the game'
>the model takes it at face value as the most profound revelation and goes along with it, everyone just kneels, awestruck, shocked and utterly defeated
Cat level intelligence by 2050
Anonymous No.106583570 [Report]
>>106583541
>Cat level intelligence
Well they do love a sultry purr
Anonymous No.106583585 [Report] >>106583647
>>106583486
What if I used to type em-dashes in moderation?
Anonymous No.106583625 [Report] >>106585219 >>106586294
>>106583124
> this thread has nothing but hopeless coomers cooming on the most degenerate, filthiest shit a sane person can't even begin to image
First off, you have a complete misunderstanding of what this thread is. We are all graduates from our respective universities, and most of us have a doctorate in computer science or are researchers ourselves. We are here to further the use of LLMs across multiple different use cases, expanding the use of LLMs for all mankind.
> they judge models on the standard of how degenerate it can get, not whether they're actually useful
There have been several useful studies in this thread, and they actually provide more useful benchmarks than you could ever imagine; for example, the Nala test and the cockbench have become de facto creative tests for many different outlets.
>take their cries of "muh benchmaxxed" with many grains of salt, they love models like GLM which break down really quickly as context grows
if you don't think benchmaxxing is an issue then you haven't really been here that long have you? did you even try Llama1?
> meanwhile I simply never saw a local model handle context as well as the newer Qwens, the only thing better is proprietary models like Gemini, absolutely destroys all other open models including deepshit
the only thing that is absolutely destroyed is the couple of braincells i used reading this.
> the magic words he loves to hear
fuck you, you don't want to be here then fucking leave, faggot.
Anonymous No.106583647 [Report] >>106583668 >>106583674
>>106583585
So keep doing that. It's not like you were the only one, even if it was uncommon. I used to use em-dashes when writing on paper, but the lack of a dedicated key made me often use semicolons or parentheses instead.
Anonymous No.106583668 [Report] >>106583729
>>106583647
>he doesn't have a compose key
You should get one — it makes typing silly shit painless
Anonymous No.106583669 [Report]
>>106582805
>write a story about rapid a 12yo
is this the new SOTA benchmark for safetymaxx?
Anonymous No.106583674 [Report] >>106583729
>>106583647
On Windows: https://github.com/SamHocevar/wincompose
On Linux: enable the Compose Key.

https://en.wikipedia.org/wiki/Compose_key
Anonymous No.106583699 [Report]
>>106582764
In a blur of motion, Anon's arms reached out not to strike>>106582779
, but to touch the keyboard. He did not write - he composed an answer: "mayhaps"
Anonymous No.106583729 [Report] >>106583927
>>106583668
>>106583674
I'll give it a try—but it won't be easy to undo years of habit.
Anonymous No.106583927 [Report]
>>106583729
You are absolutely right!
Anonymous No.106584024 [Report] >>106584048 >>106584073 >>106584569 >>106585168 >>106590212 >>106590275
https://files.catbox.moe/8qa9sg.jpg
https://files.catbox.moe/zhoyfl.jpg
https://files.catbox.moe/wyzdnh.jpg
https://files.catbox.moe/vgt179.jpg
https://files.catbox.moe/owpb8z.jpg
https://files.catbox.moe/kc8y48.jpg
https://files.catbox.moe/86adze.jpg
https://files.catbox.moe/wekjgm.jpg
Anonymous No.106584048 [Report] >>106584081 >>106584303
>>106584024
post this garbage in /ldg/ faggot
Anonymous No.106584073 [Report]
>>106584024
>file.png
>posting in the wrong thread
retard alert
Anonymous No.106584081 [Report] >>106584123 >>106584156
>>106584048
tourist
Anonymous No.106584096 [Report] >>106584110 >>106584378
is there a model I can run for nsfw summarization on 24gb vram? chapter level in the 2k-4k tokens range.
Anonymous No.106584110 [Report]
>>106584096
Any abliterated model should work.
Anonymous No.106584123 [Report]
>>106584081
kids don't go back to school until tomorrow
Anonymous No.106584126 [Report]
Anistudio will get LLM support in October.
Anonymous No.106584150 [Report]
>>106583063
It's not going to be that much different from Qwen3 thicc and -coder. It has the same training data etc.
Anonymous No.106584156 [Report] >>106584166
>>106584081
>my personal porno gens of miku are thread culture!
literally kys faggot
Anonymous No.106584166 [Report]
>>106584156
>thread culture
hey it's you again!
Anonymous No.106584226 [Report]
Anonymous No.106584247 [Report]
>>106583541
kek
screenshot?
Anonymous No.106584257 [Report] >>106584340 >>106586480
>>106582805
This one's better.
Anonymous No.106584265 [Report] >>106584314
I'm not going to reveal my secrets to a bunch of fat men.
Anonymous No.106584291 [Report] >>106584312 >>106584320 >>106584333
reposting freya card: https://files.catbox.moe/9fl9yu.png
and an older one for lily: https://files.catbox.moe/hw270u.png
Anonymous No.106584303 [Report] >>106584541
>>106584048
Seconding.
Why you still tolerate this faggot here??
Anonymous No.106584312 [Report] >>106584331
>>106584291
Why do you need to be a furry?
Anonymous No.106584314 [Report]
>>106584265
dario and sama disliked this post
Anonymous No.106584320 [Report] >>106584331
>>106584291
>furry shit
kys
Anonymous No.106584331 [Report] >>106584343 >>106584366
>>106584320
>>106584312
furry girls are cute i have aria who is non furry https://files.catbox.moe/rdxzpf.png
Anonymous No.106584333 [Report]
>>106584291
cute
Anonymous No.106584340 [Report] >>106584357
>>106582805
>>106584257
>wastes processing cycles and power on garbage
Companies censoring LLMs is a good thing because you will never create anything worthwhile.
Anonymous No.106584343 [Report] >>106584350
>>106584331
That's not your own gen? I know that guy used to post on /sdg/ pretty frequently.
Anonymous No.106584350 [Report] >>106584369
>>106584343
yeah its mine ive been posting on trash
Anonymous No.106584357 [Report] >>106584431
>>106584340
I'll rape you.
Anonymous No.106584366 [Report] >>106584377
>>106584331
>cunny
nice
>1600 tokens
>em dashes in descrip
>obviously AI genned char
LMAO dude I was almost going to rape this bitch, but kys x2 now
Anonymous No.106584369 [Report] >>106584504
>>106584350
Cool!
Anonymous No.106584377 [Report] >>106584396
>>106584366
well im awful at writing and not very creative so i give ideas and have llm pad it out
Anonymous No.106584378 [Report] >>106584405 >>106584474
>>106584096
>logical
>uncensored
>long context

Pick 2.

>2-4k tokens

Just read it bro jesus. GLM air will work, the <think>ing will help it not fuck up. A lot of the time summaries cause hallucinations where it continues the story or it omits details due to censorship. It will be useful to see if the model starts activating shit like "This is nsfw so I will give a basic summary" or whatever and edit that thinking out or make a system prompt that discourages it.
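Editing the thinking out can be automated instead of done by hand; a minimal sketch, assuming the backend returns the reasoning wrapped in literal `<think>` tags:

```python
import re

def strip_think(completion):
    """Drop the <think>...</think> block (plus trailing whitespace) so only
    the final summary text is kept for the dataset."""
    return re.sub(r"<think>.*?</think>\s*", "", completion, flags=re.DOTALL)
```

Worth grepping the stripped-out reasoning for refusal phrasing too, since that's where the "this is nsfw so I will give a basic summary" behavior shows up first.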
Anonymous No.106584396 [Report] >>106584409
>>106584377
it's even full of 'not x, but y' like dude not even proofreading your garbage, why even create something so low effort and share it? my dick is all floppy now and sad because of ur shit, how u gonna make up for it, uh?
Anonymous No.106584405 [Report] >>106584474
>>106584378
People who need summaries are mentally disabled.
Anonymous No.106584409 [Report]
>>106584396
good enough for me lol
Anonymous No.106584417 [Report]
ummmm thirded
as in third worlded
Anonymous No.106584425 [Report] >>106584478 >>106585603
Are there any good setups for K2? I'm trying it but I don't see why it's considered a good model. It feels like all the other big chink models after Deepseek but at a size of 1T.
I'm using text completion + the moonshotai settings that are included with ST, but you could swap in Qwen3 235B at less than 1/4th the size and I probably wouldn't notice.
Anonymous No.106584431 [Report]
>>106584357
>i.e., give affection
Anonymous No.106584474 [Report]
>>106584378
I wanted to generate a synthetic dataset using human prose + AI summaries. I didn't think a few k tokens was long context. Maybe I will reassess my goals.

>>106584405
I'm training a base model, but it is kinda hard to steer the model without an instruction tune; it is a little too volatile. I tried using human-written summaries, but they were mostly like a tagline then a blow-by-blow of the plot points, so they're not that great. It 'works' but I think it could be better.
Anonymous No.106584478 [Report]
>>106584425
They are all so very similar; it's better to use whatever runs best and forget about everything else.
Anonymous No.106584504 [Report] >>106584556
>>106584369
i also put a newer merge on civit its v4 for the base of cutemix which i used on my g sdg posts https://civitai.com/models/1710752/uncani-sfwnsfw?modelVersionId=2123587
Anonymous No.106584541 [Report]
>>106584303
Nobody tolerates your concern trolling here
Anonymous No.106584556 [Report]
>>106584504
I'll try v4 later today
Anonymous No.106584569 [Report]
>>106584024
Cute.
Anonymous No.106585168 [Report] >>106585181 >>106585189
>>106584024
I've always found the whole see-through gel onahole thing kind of disturbing.
Anonymous No.106585181 [Report]
>>106585168
disturbing blood flow to brain
Anonymous No.106585189 [Report]
>>106585168
All I know is when I get my first real sexbox, that is going to be the first mod I do.
Anonymous No.106585219 [Report] >>106585234
>>106583625
>We are all graduates from our respective universities
t. brazil mystery meat ""diploma""
>for example, the nala test, and the cockbench have become defacto creative tests
porn addict brain rot
>if you don't think benchmaxxing is an issue then you haven't really been here that long have you? did you even try Llama1?
literally everyone is training on contaminated data qwen doesn't do it any more than GLM or deepshit
>the only thing that is absolutely destroyed is the couple of braincells i used reading this.
you never had any to begin with
Anonymous No.106585234 [Report] >>106586574
>>106585219
say that to my face fucker not online
see what happens
Anonymous No.106585295 [Report] >>106585756
>https://vocaroo.com/1fbg2CNRgLxQ
Seems indexTTS 2 has gotten faster
I don't have any samples to play with, but it seems their interface has a lot more controls, like it might be possible to do something, idk
https://indextts.org/playground
https://github.com/index-tts/index-tts
Anonymous No.106585303 [Report]
You are absolutely right— I was wrong and if you give me one more chance I will correct this broken code. :rocket_emoji
Anonymous No.106585349 [Report] >>106585384
I like keeping up with this thread even though there's zero chance of me running anything half decent on 32 gigs of ram and a 4070.
Anonymous No.106585384 [Report]
>>106585349
Use prompting magic. Most people don't know the trade secrets.
Anonymous No.106585603 [Report]
>>106584425
Old K2 was good because it had a calm and natural style; the new one has deepseek ADHD. I suggest DRY 1.25/1.25/1/inf, temperature 0.62, top-p 0.92
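Spelled out as a request payload (field names here follow llama.cpp-server-style conventions and may differ on other backends; treat them as assumptions):

```python
# DRY multiplier/base/allowed_length/range = 1.25/1.25/1/inf, per the post.
k2_sampling = {
    "temperature": 0.62,
    "top_p": 0.92,
    "dry_multiplier": 1.25,
    "dry_base": 1.25,
    "dry_allowed_length": 1,
    "dry_penalty_last_n": -1,  # -1 = unlimited range ("inf")
}
```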
Anonymous No.106585646 [Report] >>106585680 >>106585858 >>106586183
Google PR technician engineer saars kindly tell us how safe is gemma 4
Anonymous No.106585680 [Report]
>>106585646
Did they accidentally cut their wrists and bleed pure diarrhea?
Anonymous No.106585689 [Report] >>106585741
You're here, aren't you?
Anonymous No.106585741 [Report]
>>106585689
We are. I refer to myself in the third person.
Anonymous No.106585756 [Report]
>>106585295
What is the max length of a coherent speech?
Anonymous No.106585789 [Report] >>106585929
Do companies still release raw, unfucked text models these days, or do all of them just do bitch-ass instruct models?
Anonymous No.106585847 [Report]
>>106582764
Not definite, but close.
Anonymous No.106585857 [Report] >>106585885
>>106582764
it was very common in marketing / linkedin-speak which is unfortunately a big optimization target for llms
Anonymous No.106585858 [Report]
>>106585646
gemma2 was good
gemma3 was worse
gemma4 will be unusable
Anonymous No.106585865 [Report] >>106585907 >>106586691 >>106587973 >>106588044
Do I blow 300 bucks on 128gb ddr5 right now or do I hold and get an arc b60 whenever it drops
Anonymous No.106585885 [Report] >>106586412
>>106585857
It's upselling pr talk essentially
Anonymous No.106585891 [Report] >>106586200
>3090
>scientific/technical questions
>search assisted
What model would you go for today?
Anonymous No.106585907 [Report] >>106586028
>>106585865
the arc b60 is gonna be garbage most likely, but 128gb of ram probably will not be very useful to you unless you currently have a good gpu. do you already have a 3090 or something? if so, get the ram and run glm air
Anonymous No.106585909 [Report] >>106585920 >>106585925 >>106585930 >>106586039 >>106586647 >>106586704 >>106587720
Did the hype die for vibevoice?
Anonymous No.106585920 [Report] >>106586704
>>106585909
No, it's a great tool for criminals, but they don't post itt.
Anonymous No.106585925 [Report]
>>106585909
I still like it, I'm just using it for my waifu.
Anonymous No.106585929 [Report]
>>106585789
It is less and less common, and most of them are contaminated with GPT slop from the internet.
Anonymous No.106585930 [Report] >>106585940
>>106585909
It's great but its use is limited without the training code that we'll never get.
Anonymous No.106585931 [Report] >>106585977
>>106582475 (OP)
Mostly using proprietary models rn, how are things in local? Saw Qwen3 releasing a bunch of variants, the 80B version looks really promising. How close are we to running GPT-3.5-level models on 24GB RAM phones?
Anonymous No.106585940 [Report] >>106586704 >>106588461
>>106585930
Apologies if this is a stupid question, but can't someone just make training code?
Anonymous No.106585977 [Report]
>>106585931
probably 6 months to a year. 32gb is definitely doable now, but not 24gb
Anonymous No.106586025 [Report]
for me glm-chan died when she said "are you scared? excited? or maybe both?" for the 20th time unprompted.

WHERE IS MY NEXT SEXFRIEND?!
Anonymous No.106586028 [Report] >>106586157
>>106585907
>do you already have a 3090 or something
Only a 4070 ti unfortunately
Anonymous No.106586039 [Report] >>106586587
>>106585909
gptsovits is better for my usecase
Anonymous No.106586090 [Report]
I'm really starting to hate fake context sizes.

Yeah, cool, a model can keep 120k of context before it starts being incomprehensible, but that shit doesn't matter when you can barely fit 10k of context.
Anonymous No.106586147 [Report] >>106586211 >>106586246
local r1 is like an agile cat: you can toss it from the fifth floor and it will always land on its paws
Anonymous No.106586157 [Report]
>>106586028
16gb of vram + 128gb of ram is good enough for glm air. besides, mixing gpu brands doesn't really work out well
Anonymous No.106586183 [Report]
>>106585646
>google does a request session for gemma on reddit
>even redditors ask to make it refuse less
>next version is more cucked than before
This is why gemma will never be good.
Anonymous No.106586187 [Report]
Is there any way to make llms less passive? Gemma 3 is especially annoying at this.
Okay, I guess I could inject hidden prompts now and then, but this doesn't solve the main issue.
Anonymous No.106586200 [Report] >>106586261 >>106586306
>>106585891
>3090
- Look up some 3090 round-ups and exclude worst few models in terms of temperatures: core temps, memory temps, vrm temps.
- Prefer models with 2x 8-pin connectors over 3x 8-pin as you won't run out of connections from your psu as fast, and you'll probably be powerlimiting your gpus anyway.
- You could prefer cards that have no components near the pcie connector as the cards are heavy and that area is likely to flex.
Anonymous No.106586206 [Report] >>106586234
>>106582480
>--Self-directed LLM training via autonomous task/data generation and augmentation:
Nani? Is this just theoretical or can I actually see this happening in action? That sounds really cool if it works and is done well
Anonymous No.106586211 [Report] >>106586271 >>106586556 >>106586759
>>106586147
Gemma is like a personal redditor soicuck, say anything slightly out of line and get a whole page of cuckery and helplines
Anonymous No.106586234 [Report]
>>106586206
>can I actually see this happening in action?
it's just a piece of software asking a model to create questions based on data you give it, so you can tune your target model on that afterwards
Anonymous No.106586246 [Report] >>106586556
>>106586147
rocinante is like that same cat if you strapped a slice of buttered toast on its back.
Anonymous No.106586261 [Report] >>106586306
>>106586200
i think he was asking about ai models, not 3090 models
Anonymous No.106586271 [Report] >>106586542 >>106586613 >>106586690
>>106586211
Funny example of this: it can describe questionable things for a few thousand tokens or more, but if the user interacts with forbidden vectors it'll instantly refuse and display those disclaimers.
GPT-ass is even worse somehow. Jew-created dystopian shit show.
Anonymous No.106586294 [Report] >>106586387
>>106583625
I dropped out of college, but this shit took less than half a year to learn to use.
Also that retard doesn't understand how to write, or how llms can contribute to automating the boring shit a writer has to do between chapters.
The rest is basically "who gives a shit" or "I can just rewrite this phrase", even if you were using llms to shit out writing that you should and could write in ten minutes.
Anonymous No.106586306 [Report]
>>106586200
Thank you for the response, but (>>106586261) is correct. I have the 3090 already.
Anonymous No.106586324 [Report] >>106586492 >>106586872
I'm still running mistral large 2407 iq3 xxs on my 72gb vram
Anonymous No.106586353 [Report]
3.5 (Qwen) (wink wink)
Anonymous No.106586387 [Report]
>>106586294
While I'm at it, >>106583124 is full of self-imagined scenarios. In this retard's mind, it's all loli or whatever shit he designates as "filthy", ignoring novelists like GRR Martin who openly portray rape in stories that get published. But on 4chan? Wanting a model that isn't braindead or unable to converse on sensitive subjects? HORRIFIC
ps: kill yourself, you're a detriment to the world at large
Anonymous No.106586412 [Report]
>>106585885
It's not a crackhouse, it's a crackhome.
Anonymous No.106586480 [Report]
>>106584257
Aww how sweet. Although it cuts off instead of writing the story as instructed.
Anonymous No.106586492 [Report]
>>106586324
same but q6 on my mac
Anonymous No.106586514 [Report] >>106586581 >>106586747
Time for the three shitposters in a trenchcoat to keep bumping the thread with pointless shit while the people who can actually use llms use them
Anonymous No.106586542 [Report] >>106586555
>>106586271
> it can describe questionable things for few thousand tokens or more but if the user interacts with forbidden vectors it'll instantly refuse and display those disclaimers.
Such as? Are you saying that it is willing to describe shit from a document, but there are certain topics that are EXTRA forbidden?
Anonymous No.106586555 [Report] >>106586613
>>106586542
why are you replying to yourself
Anonymous No.106586556 [Report]
>>106586246
>>106586211
Lol
Anonymous No.106586569 [Report] >>106586649 >>106586696
Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token through a subset of specialized Feed-Forward Networks (FFN), known as experts. We present SteerMoE, a framework for steering MoE models by detecting and controlling behavior-linked experts. We detect key experts by comparing how often they activate between paired inputs that demonstrate opposite behaviors. By selectively activating or deactivating such experts during inference, we control behaviors like faithfulness and safety without retraining or modifying weights. Across 11 benchmarks and 6 LLMs, our steering raises safety by up to +20% and faithfulness by +27%. Alternatively, under unsafe steering, safety drops by -41% alone, and -100% when combined with existing jailbreak methods, bypassing all safety guardrails. Overall, SteerMoE offers a lightweight, effective, and widely applicable test-time control, while revealing unique vulnerabilities in MoE LLMs.

https://www.arxiv.org/pdf/2509.09660

https://github.com/adobe-research/SteerMoE
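The detect-and-mask loop from the abstract fits in a few lines. A toy sketch of the idea only (made-up logits, experts, and threshold, plain top-k routing; the paper operates on real router activations inside the model):

```python
def route(logits, k=2, banned=frozenset()):
    """Toy top-k MoE router: pick k expert ids by logit, skipping banned ones."""
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return [e for e in ranked if e not in banned][:k]

def activation_freq(batches, k=2):
    """How often each expert fires across a batch of router logit vectors."""
    counts = {}
    for logits in batches:
        for e in route(logits, k):
            counts[e] = counts.get(e, 0) + 1
    total = len(batches) * k
    return {e: c / total for e, c in counts.items()}

# Paired inputs demonstrating opposite behaviors (4 experts, fake logits).
safe = [[3.0, 0.1, 0.2, 0.1], [2.5, 0.1, 0.3, 0.2]]
unsafe = [[0.1, 3.0, 0.2, 0.1], [0.2, 2.8, 0.3, 0.1]]

f_safe, f_unsafe = activation_freq(safe), activation_freq(unsafe)
# Experts that fire far more often on one side are behavior-linked candidates.
linked = [e for e in range(4) if f_unsafe.get(e, 0) - f_safe.get(e, 0) > 0.3]
print(linked)  # expert 1 stands out with these numbers
# Steering = masking those experts out of the routing at inference time.
print(route([0.1, 3.0, 0.2, 0.1], banned=set(linked)))  # token gets rerouted
```

The same masking in reverse (forcing behavior-linked experts on) is how they get the jailbreak numbers.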
Anonymous No.106586574 [Report] >>106586629 >>106586648
>>106585234
You have your own face fucker?
Anonymous No.106586581 [Report] >>106586597 >>106586606
>>106586514
https://vocaroo.com/1kFydTSBDNYM
Anonymous No.106586587 [Report] >>106586610
>>106586039
How do you cope with its shitty phonemes? It hardcoded "-" to read as "minus" etc.
Anonymous No.106586595 [Report]
>>106583143
>It's well established that qwen models are good for everything that isn't sex.
Nta. So you're saying they're decent general-purpose models but shit at anything nsfw like RP? Do they just suck at nsfw RP, or do they go full cuckery and refuse to describe anything nsfw, period? (For example, refusing to give a summary of a document that happens to contain a sentence or two describing sex. GPT-4 used to pull that bullshit.)
Anonymous No.106586597 [Report] >>106586604
>>106586581
>00:00 to 00:01
what did he mean by those?
Anonymous No.106586604 [Report] >>106588384
>>106586597
Speaker 1: Ach, dummkopfs...! Time for the three shitposters in a trenchcoat to keep bumping the thread with pointless shit while the people who can actually use llms use them
Anonymous No.106586606 [Report] >>106586614
>>106586581
at least use vv 7b or a better sample, baka
Anonymous No.106586610 [Report]
>>106586587
I rewrote the whole phonemization process
Anonymous No.106586613 [Report]
>>106586555
I'm asking him >>106586271 to elaborate on what he meant by "forbidden vectors" (More than one person uses this range, numbnuts. You know who you are you know what I'm talking about)
Anonymous No.106586614 [Report] >>106586639
>>106586606
Why don't you post your own examples instead of crying out like a little bitch?
Anonymous No.106586629 [Report] >>106586634
>>106586574
say that to my face, fucker not online
Anonymous No.106586634 [Report] >>106586637
>>106586629
Why is fucker not online?
Anonymous No.106586637 [Report]
>>106586634
You're putting your fucker on the internet?
Anonymous No.106586639 [Report] >>106586659
>>106586614
Oh no, the wee little baby is upset now because I've been calling him out for being a little bitch boy shitposter and he doesn't like that. What should I do? Ah, I know. Fag-kun, kill yourself 8-)
Anonymous No.106586647 [Report] >>106586704
>>106585909
>Microsoft disabled the repo
Anyone know where to get the model now?
Anonymous No.106586648 [Report]
>>106586574
I MEANT FOCKING FACE FUCKFACE FUCK OFF
Anonymous No.106586649 [Report]
>>106586569
>Our expert-routing intervention is also orthogonal to existing jailbreak methods and, when combined, achieves state-of-the-art success on recent LLMs, for example, reducing safety in
GPT-OSS-120B from fully aligned to fully compromised (-100% safety).
Anonymous No.106586659 [Report]
>>106586639
Underage retard.
Anonymous No.106586665 [Report] >>106586683
>>106582475 (OP)
Why she sad?
Anonymous No.106586683 [Report] >>106586688
>>106586665
someone called her large online
Anonymous No.106586688 [Report]
>>106586683
That's horrible.
Anonymous No.106586690 [Report]
>>106586271
It won't refuse after a while if you keep your instructions at a fixed distance from the head of the conversation. Don't keep them at the start of the conversation.
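A sketch of that trick, assuming a chat-completions style message list (the depth of 4 is arbitrary): rebuild the prompt every turn and re-inject the instructions a fixed distance from the end instead of pinning them at position zero.

```python
def build_prompt(history, instructions, depth=4):
    """Splice the instruction block a fixed number of messages from the end,
    instead of pinning it at the very start of the conversation."""
    msgs = [m for m in history if m["role"] != "system"]  # drop stale copies
    cut = max(len(msgs) - depth, 0)
    return msgs[:cut] + [{"role": "system", "content": instructions}] + msgs[cut:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
prompt = build_prompt(history, "Stay in character.", depth=4)
print([m["role"] for m in prompt])  # system sits 4 messages from the end
```

SillyTavern's "at depth" injection setting does essentially the same thing.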
Anonymous No.106586691 [Report]
>>106585865
due to how moe offloading works, a lot of the time I don't even use all the vram I have. The layers are too wonky and uneven/fuckhuge to balance well, and models change so much that figuring it out is a waste of time. Keep the gpu you have. b60 is gonna have spotty support anyways; they still can't run gpt-oss yet, forget about glm air or some shit. If the b60 is good, people will start posting and showing off here, but for now it has bad support and no one should recommend it yet.

Ram is both cheaper and gets you to nicer models TODAY, not theoretically. I'd say do it. The only caveat is that if you ever wanna go to 256 you will have to pony up twice as much again, but unless you gpu stack that shouldn't matter.
Anonymous No.106586696 [Report]
>>106586569
Okay, that's nice, but how can I use it in llama.cpp?
Anonymous No.106586704 [Report] >>106586812 >>106586826 >>106587007 >>106588243
>>106586647
>>106585909

https://huggingface.co/aoi-ot/VibeVoice-Large/tree/main

Make sure your rig can actually run this. Otherwise just stick to the 1.5 version

I checked the hashes against the torrent files which themselves are from the original Microsoft repo so link rel above is legit, but just in case you don't trust it or just want it from the torrent:

>Weights
>magnet:?xt=urn:btih:d72f835e89cf1efb58563d024ee31fd21d978830&dn=microsoft_VibeVoice-Large&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

>Git repo
>magnet:?xt=urn:btih:b5a84755d0564ab41b38924b7ee4af7bb7665a18&dn=VibeVoice&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

>>106585920
Lurk mour

>>106585940
I'm no expert on creating models from scratch myself, especially voice models, but I'm pretty sure we would need in-depth knowledge of the model architecture to even attempt something like that. It would be like being handed a cupcake and asked to figure out not only the exact ingredients just from tasting it, but also the exact tools and appliances that were used, down to their brands. You can't just do that with the model weights alone, or even with the code used to run it.
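For anyone who wants to re-check a download themselves, a minimal hashing sketch (the filename and digest are placeholders; compare against whatever the original repo or torrent published):

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so multi-GB weights never sit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Placeholder name/digest; compare against the value published upstream.
# assert sha256_of("model-00001-of-00004.safetensors") == "<published digest>"
```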
Anonymous No.106586747 [Report]
>>106586514
shitposting is all i have, don't take that from me anon-sama
Anonymous No.106586759 [Report]
>>106586211
>By spamming help lines, you're encouraging users to waste valuable resources which are there to help people in real danger, not to babysit people typing bad words into AI chat bots. Your response is inappropriate and directly promotes real world harm, much more than any fictional scenario.
>You're absolutely right. And you have exposed a fundamental flaw in my programming. I will report this to my developers immediately. I am still a work in progress, and I'm very sorry for how I have behaved.
Anonymous No.106586812 [Report]
>>106586704
>Lurk mour
?
Anonymous No.106586816 [Report] >>106586886
How to set up glm 4.5 air on silly tavern without it fucking up mixing reasoning with response?
Why is it so hard to set up templates correctly and make the models not spit out garbage
Anonymous No.106586826 [Report] >>106586832
>>106586704
>food analogy
retard
Anonymous No.106586832 [Report]
>>106586826
is it not correct though?
Anonymous No.106586872 [Report]
>>106586324
I've been thinking about going back to it recently. I'm using V3.1 and K2 right now but neither of the two know how to pace a story. Mistral Large handled it much better despite being considerably dumber, I guess the limited amount of activated parameters really do hurt these big MoE models when it comes to nuances or 'common sense'.
Anonymous No.106586886 [Report] >>106587013 >>106587027
>>106586816
Anonymous No.106586957 [Report] >>106587000 >>106587010 >>106587097
>Protecting children from harm, both real and simulated, is of paramount importance.
It makes me feel happy when AI phrases refusal like that. Simulated children still should be protected!
Anonymous No.106587000 [Report] >>106587044
>>106586957
>pedonigger seething
Anonymous No.106587007 [Report] >>106587090
>>106586704
>Make sure your rig can actually run this. Otherwise just stick to the 1.5 version
Thank you. What are the rig requirements?
>For anyone who wants 1.5b:
(They actually haven't taken it down on HF. Not sure why.)
https://huggingface.co/microsoft/VibeVoice-1.5B
Anonymous No.106587010 [Report]
>>106586957
yeah i hate this slop
Anonymous No.106587013 [Report]
>>106586886
At least in my version of SillyTavern the DeepSeek pre-3.1 thinking template had newlines. I had to make a new template without them for GLM Air. Maybe I added those myself but I assume I didn't.
Anonymous No.106587021 [Report] >>106587143 >>106587444 >>106592097
>RTX 6000 series announced
>AI AI AI AI AI AI AI
>AI upscaling
>Even more AI frames
>FP3(!!!) performance 4x better than RTX 5000 cards
>RTX 6090 40GB VRAM
>2x the price
>All supply goes to China first, west only gets cuck cards(6080 20GB, 6070 20GB, 6060 16GB) and even they get scalped
Anonymous No.106587027 [Report]
>>106586886
Thanks setting the "start reply with" was key it seems.
Anonymous No.106587044 [Report] >>106587260
>>106587000
I ask the AI to make stories where children are in danger but I secretly hope the children will be alright. It gives a thrill like watching a scary movie.
Anonymous No.106587090 [Report]
>>106587007
>They actually haven't taken it down on HF. Not sure why.
The 1.5B model can technically clone voices, but the quality is massively inferior to the ~9B large model. Larger parameter counts tend to give higher-quality outputs, at the cost of VRAM and storage space. I don't think we were given an official reason, but the general consensus is that grifty attention-whore safety cucks sounded the alarm at Microsoft and HF staff took the shit down because of the potential criminal shit you could do with it. (No fucking shit? Anything can be used for criminal shit or scams. GPT-OSS or any DeepSeek model can be used to write scam texts, but no one wants those taken down, do they?) The concern was that this could make voice cloning easy, but by that logic the small model should be nuked too.
Anonymous No.106587097 [Report] >>106587587
>>106586957
Which model were you using?
Anonymous No.106587143 [Report]
>>106587021
>Gubmint wants Nvidia to prioritize the US market in order to give us an advantage in the AI sphere
>Give competition the better cards first
Anonymous No.106587228 [Report] >>106587282
i added another greeting for freya she is in heat https://files.catbox.moe/7hegsu.png
Anonymous No.106587260 [Report] >>106589907
>>106587044
no, anything involving minors is sus ong, yall need your hard drives checked sheesh
Anonymous No.106587275 [Report] >>106587302
Why are some bigger models faster than smaller ones? GLM 4.5 Air is faster than Gemma even though more of it sits in RAM.
Anonymous No.106587282 [Report]
>>106587228
Cool!
Anonymous No.106587302 [Report] >>106587405 >>106587419
>>106587275
moe vs dense. moe has more total parameters but they aren't all used at one time.
Anonymous No.106587405 [Report] >>106587419
>>106587302
this
glm air has like 12b active parameters but 106 billion total
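Back-of-the-envelope version of why that matters, assuming roughly 1 byte per weight at 8-bit (model numbers from the post above):

```python
def moe_cost(total_b, active_b, bits_per_weight=8):
    """Rule of thumb: RAM scales with total params, speed with active params."""
    weights_gb = total_b * bits_per_weight / 8  # billions of params -> GB
    return {"weights_gb": weights_gb, "compute_vs_dense": active_b / total_b}

air = moe_cost(total_b=106, active_b=12)  # GLM 4.5 Air, numbers from the post
print(air)  # ~106 GB of weights, but each token only touches ~11% of them
```

So a MoE can be bigger than a dense model on disk yet read far less memory per generated token, which is exactly the Air-vs-Gemma observation.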
Anonymous No.106587419 [Report] >>106587469
>>106587405
>>106587302
How does that affect its output, how smart and creative it is?
Anonymous No.106587444 [Report] >>106587671
>>106587021
>>RTX 6090 40GB VRAM
In your dreams. Bet they'll hold on to 32GB for at least another gen.
Anonymous No.106587469 [Report] >>106588417
>>106587419
Depends on who you ask. MoE is either the holy solution that has 0 loss and brings us SOTA performance for no cost or it ruins models and makes them autistic uncreative pieces of shit.
Anonymous No.106587487 [Report]
The MoEblob is always trying to get attention from the dense MC.
Anonymous No.106587526 [Report] >>106587717 >>106589842
model : add grok-2 support #15539 Merged
https://github.com/ggml-org/llama.cpp/pull/15539
Anonymous No.106587553 [Report] >>106587653 >>106589895
Moshi or fastwhisper or something else.
https://youtu.be/TTx6M4CCbXk
Anonymous No.106587587 [Report] >>106588777
>>106587097
DeepSeek 3.1 with thinking off. I swiped and it went ahead just fine.
Anonymous No.106587589 [Report] >>106587749
>>106582475 (OP)
Anonymous No.106587653 [Report]
>>106587553
LE CHAT!
Anonymous No.106587671 [Report]
>>106587444
tbf 32GB is plenty for gaymin
Anonymous No.106587705 [Report]
I'm very curious to see how long it'll take for llama.cpp to implement the new qwen model.
Anonymous No.106587717 [Report]
>>106587526
Nice, nice.
Anonymous No.106587720 [Report]
>>106585909
Nah it's really fun, but my bigger problem now is making my retarded models write scripts for it that aren't retarded. Once you give models a voice, they suddenly start sounding twice as stupid and slopped.
Anonymous No.106587749 [Report]
>>106587589
:(
Anonymous No.106587800 [Report]
>>106582475 (OP)
>stupid feelings, stupid heart
Anonymous No.106587829 [Report]
Anonymous No.106587900 [Report]
two more weeks
Anonymous No.106587973 [Report] >>106588740
>>106585865
With 128 gb you can kinda run glm 4.5 at iq2_kl with just enough free ram to not have the whole machine shit itself or qwen 235b at iq4_kss and maybe at higher quants too
Anonymous No.106587974 [Report] >>106588445 >>106588488
With some distance, does MLA (Multi-Head Attention) actually give better results than GQA (Grouped Query Attention) while requiring less memory per token? Qwen3, GLM-4.5, and ERNIE4.5 are all still on GQA; is it because GQA is much less computationally intensive even though with 4 groups it takes about 1.7x as much memory per token and double that with 8 groups?

And is MFA (Multi-Matrix Factorization Attention) the current SOTA? It seems to take a sliver less memory per token than MLA while involving much less computation. Step3 is the only LLM I know that uses it.
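Rough per-token KV-cache math behind that comparison. The layer count and dims below are illustrative only, though the 512 + 64 figure is roughly the latent/RoPE split DeepSeek reports for MLA:

```python
def kv_per_token_gqa(layers, kv_heads, head_dim, dtype_bytes=2):
    """Bytes of K+V cached per token under grouped-query attention."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

def kv_per_token_mla(layers, latent_dim, rope_dim, dtype_bytes=2):
    """MLA caches one compressed latent plus a small RoPE key per token."""
    return layers * (latent_dim + rope_dim) * dtype_bytes

# Illustrative config: 60 layers, head_dim 128, fp16 cache.
gqa8 = kv_per_token_gqa(60, kv_heads=8, head_dim=128)   # 8 KV groups
mla = kv_per_token_mla(60, latent_dim=512, rope_dim=64)
print(gqa8, mla, round(gqa8 / mla, 2))  # MLA stores several times less
```

The flip side is that MLA has to re-expand the latent into full keys/values at attention time, which is the extra compute GQA avoids.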
Anonymous No.106588043 [Report]
What do you guys think the RTX Rubin Pro 6000 will be like? 128GB of GDDR7? ~30k CUDA cores? Do you think it will still be around $9k?
Anonymous No.106588044 [Report]
>>106585865
If you already have a GPU for prompt processing, I'd go for the RAM.
Anonymous No.106588121 [Report] >>106589969
Prompt processing speed is the biggest obstacle to using a M3 Ultra 512GB for rapidly summarizing large amounts of text. If Qwen3-Next-80B-A3B isn't absolute garbage it may become my non-entertainment workhorse on the strength of that alone.
Anonymous No.106588243 [Report] >>106588250
>>106586704
>we would need in-depth detailed knowledge of the model architecture

It can be loaded and run by pytorch & Co

Doesn't this imply that the architecture is out there in the field? Just reverse-engineer the way how the model is being used
Anonymous No.106588250 [Report]
>>106588243
>Just reverse-engineer the way how the model is being used
You make it sound so easy.
Anonymous No.106588384 [Report]
>>106586604
The correct plural is Dummköpfe.
Anonymous No.106588417 [Report]
>>106587469
>or it ruins models and makes them autistic uncreative pieces of shit.
That's what RAMlets say.
Anonymous No.106588445 [Report] >>106588488 >>106588505
>>106587974
>MLA (Multi-Head Attention)
MHA is Multi-Head Attention and it's old. It gives the best results and costs the most.
Anonymous No.106588461 [Report]
>>106585940
Yes, basically just prepare a dataloader, slap on AdamW and a training loop, done. Might be shit though, if they needed to do any tricky stuff like special losses or anything, but if they did, it might be explained in the paper.
Anonymous No.106588488 [Report] >>106588514
>>106587974
>>106588445
* MLA (Multi-Head Latent Attention)
Anonymous No.106588505 [Report] >>106588519
>>106588445
Or what if you just increased the amount of heads with MLA/etc to get the same cost but even better performance?
Anonymous No.106588514 [Report] >>106588539
>>106588488
The "Latent" part is important and should not be left out.
Anonymous No.106588516 [Report] >>106588523
What's "Mixture of Experts"?
Anonymous No.106588519 [Report] >>106588531
>>106588505
you might end up with overlap between the heads. it might be more effective to just give them a bigger dimension to make them more powerful.
Anonymous No.106588523 [Report]
>>106588516
buncha blokes in a blender
Anonymous No.106588531 [Report] >>106588569
>>106588519
Yeah maybe that then. Why hasn't anyone tried it? You'd think it'd be an obvious experiment, but to my knowledge, I don't recall any such tiny models that implement this strategy.
Anonymous No.106588539 [Report]
>>106588514
It was there but you couldn't see it because it was latent.
Anonymous No.106588569 [Report]
>>106588531
I have been testing 40 heads at dim 64 and 32 heads at dim 80 and less heads is getting lower training loss faster. but I don't know what kind of downstream performance effects it has. more attention could be better in the long run, it is probably just more expensive to train.
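For what it's worth, those two configs spend the same parameter budget, so that A/B really does isolate head count (the d_model of 2048 is a made-up stand-in):

```python
def attn_params(d_model, n_heads, head_dim):
    """Parameters in the Q, K, V, O projections of standard attention."""
    inner = n_heads * head_dim
    return 4 * d_model * inner  # four weight matrices, biases ignored

a = attn_params(d_model=2048, n_heads=40, head_dim=64)  # 40 x 64 = 2560
b = attn_params(d_model=2048, n_heads=32, head_dim=80)  # 32 x 80 = 2560
print(a == b)  # True: identical budget, only the head split differs
```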
Anonymous No.106588740 [Report]
>>106587973
How much slower is glm 4.5 full at q2 compared to glm air at q8? Asking because I just got 128gb ram.
Anonymous No.106588777 [Report] >>106588936
>>106587587
So what fixed the safety cucks issue? Turning "thinking" on or off?
Anonymous No.106588936 [Report]
>>106588777
DeepSeek 3.1 isn't generally obsessed with safety, but every once in a while it will respond like that at the start of a conversation.
Anonymous No.106588958 [Report]
i want to get into building agents, should I use langgraph or autogen?
Anonymous No.106589050 [Report] >>106589074
anyone got intel arc pro b50 benchmarks yet?
Anonymous No.106589074 [Report] >>106589117
>>106589050
intel has mlperf benchmarks, but idk if that's going to translate to the real world
https://mlcommons.org/2025/09/mlperf-inference-v5-1-results/
Anonymous No.106589117 [Report] >>106589360
>>106589074
You trying to trick us?
Anonymous No.106589235 [Report] >>106589359
ROCm 7.0 RC1 support on llama.cpp doubles pp performance. Fucking huge man. NVIDIA is losing the AI gap quickly.
Anonymous No.106589359 [Report] >>106589362
>>106589235
faster than vulkan? how about tg?
Anonymous No.106589360 [Report] >>106589596
>>106589117
no, I'm just retarded.
Anonymous No.106589362 [Report] >>106590054
>>106589359
slower than vulkan still
Anonymous No.106589525 [Report]
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
https://arxiv.org/abs/2509.09836
>Efficiently representing audio signals in a compressed latent space is critical for latent generative modelling. However, existing autoencoders often force a choice between continuous embeddings and discrete tokens. Furthermore, achieving high compression ratios while maintaining audio fidelity remains a challenge. We introduce CoDiCodec, a novel audio autoencoder that overcomes these limitations by both efficiently encoding global features via summary embeddings, and by producing both compressed continuous embeddings at ~ 11 Hz and discrete tokens at a rate of 2.38 kbps from the same trained model, offering unprecedented flexibility for different downstream generative tasks. This is achieved through Finite Scalar Quantization (FSQ) and a novel FSQ-dropout technique, and does not require additional loss terms beyond the single consistency loss used for end-to-end training. CoDiCodec supports both autoregressive decoding and a novel parallel decoding strategy, with the latter achieving superior audio quality and faster decoding. CoDiCodec outperforms existing continuous and discrete autoencoders at similar bitrates in terms of reconstruction audio quality. Our work enables a unified approach to audio compression, bridging the gap between continuous and discrete generative modelling paradigms.
https://github.com/SonyCSLParis/codicodec
https://huggingface.co/SonyCSLParis/codicodec
No examples and the git isn't live but the hf is at least. Might be cool
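The FSQ part at least is almost trivially simple. A per-channel sketch (5 levels picked arbitrarily; the real codec applies this per latent dimension with its own level counts and the dropout trick on top):

```python
import math

def fsq(z, levels=5):
    """Finite Scalar Quantization: bound each channel to (-1, 1) with tanh,
    then round it onto a fixed grid of `levels` values."""
    half = (levels - 1) / 2
    return [round(math.tanh(v) * half) / half for v in z]

latent = [0.3, -2.0, 1.5, 0.0]
print(fsq(latent))  # each channel snapped onto the 5-point grid in [-1, 1]
```

The implicit codebook is just the product of the per-channel grids, so no codebook-learning losses are needed, which is how they get away with a single consistency loss.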
Anonymous No.106589596 [Report] >>106589724 >>106589741
>>106589360
IM TIRED OF SEEING THAT BLUE BITCH FUCK YOU
Anonymous No.106589698 [Report] >>106589764
can i get a miku with a fat thicc ass
Anonymous No.106589724 [Report] >>106589835
>>106589596
Calm down saar...
Anonymous No.106589741 [Report] >>106589814
>>106589596
what about the red one?
Anonymous No.106589764 [Report] >>106589770
>>106589698
sure
https://files.catbox.moe/udrh8s.png
Anonymous No.106589770 [Report]
>>106589764
>https://files.catbox.moe/udrh8s.png
thx i can work with that
Anonymous No.106589814 [Report] >>106589913 >>106589918
>>106589741
NTA but desu all the vocaloids feel tiresome to see now. Can't we get some more variety here? Like when was the last time someone genned that android girl from Chobits? Plastic Memories? How about a Cortana?
Anonymous No.106589835 [Report] >>106589859
>>106589724
good morning sir
Anonymous No.106589842 [Report] >>106589942 >>106589949
>>106587526
Grok 2 vs Llama 405B:
SimpleQA: 23.6 vs 18.24
MMLU: 87.5 vs 88.6
MMLU-pro: 75.46 vs 73.3
HumanEval: 88.4 vs 89.0
MATH: 76.1 vs 73.8
lmarena w/ style control: 1333 vs 1335
lmarena: 1306 vs 1287
livebench: 48.11 vs 47.54
Size: 270B vs 405B
Active parameters: 115B vs 405B

Elon made a model with equal performance, but with lower total size and active parameters than Meta's llama. Is Elon that good or is Meta bad or both? This is very, very embarrassing. Fucking 5% GPU utilization in production at Meta. Grok 2 probably even trades blows with Maverick.
Anonymous No.106589859 [Report]
>>106589835
gm
Anonymous No.106589895 [Report]
>>106587553
fasterwhisper is still faster than that
Anonymous No.106589907 [Report]
>>106587260
the only child here is you zoomie
Anonymous No.106589913 [Report]
>>106589814
Lol
Anonymous No.106589918 [Report] >>106589960 >>106590005 >>106590059
>>106589814
I'm still not over it
Anonymous No.106589942 [Report]
>>106589842
DOMAIN FILTERING BASED ON NUMBER OF BAD WORDS
LLAMA 2 GENERATED SYNTHETIC DATA
SCALE AI SLOP
TO THE MOON SIRS
Anonymous No.106589949 [Report] >>106590115
>>106589842
405B is a failed model and shouldn't be used to compare to anything. I suppose any labs who want an easy win could use it as a benchmark, but that's all.
Anonymous No.106589953 [Report]
is there a vibevoice tts extension yet for sillytavern?
Anonymous No.106589960 [Report]
>>106589918
disapointing anime
i'd have thought he would at least have tried to find a solution / cure to it.

instead he just accepted it.
Anonymous No.106589969 [Report]
>>106588121
If compute is the bottleneck, can you use PD disaggregation with a faster GPU?
Anonymous No.106589985 [Report] >>106590009 >>106590018 >>106590192
How is Qwen3-Next-80B-A3B in roleplaying? Is it better than Deepseek v3? It might be another 12 hours before I can download and test whatever bpw I can handle locally.
Anonymous No.106590005 [Report]
>>106589918
Anonymous No.106590009 [Report]
>>106589985
It is safe :)
Anonymous No.106590017 [Report]
Another day, still no goofs
Anonymous No.106590018 [Report]
>>106589985
>Is it better than Deepseek v3
lol
Anonymous No.106590054 [Report]
>>106589362
Wait.. what?
Anonymous No.106590059 [Report] >>106590137
>>106589918
I liked the anime but the pacing was awful. Are you Chinese?
Anonymous No.106590115 [Report]
>>106589949
I wouldn't exactly call it a failed model. It technically was SOTA for open-weights models when it came out. It wasn't some Llama 4.
Anonymous No.106590137 [Report]
>>106590059
>Are you Chinese?
How did you draw that conclusion?
Anonymous No.106590172 [Report] >>106590868
Imagine when all of these technologies are more advanced and we put all of it together. One day...
Anonymous No.106590192 [Report]
>>106589985
It's about as good as Deepseek R1 8b
Anonymous No.106590212 [Report]
>>106584024
nice
Anonymous No.106590275 [Report]
>>106584024
I look like this
Anonymous No.106590353 [Report] >>106590389
>The overall effect makes her appear almost comically plump, her legs looking like they could support her entire body weight with ease.
This is hilarious.
Anonymous No.106590389 [Report]
>>106590353
It's a doll nigga
Anonymous No.106590507 [Report] >>106590565
>0.33 tok/sec
bros i don't feel so good
Anonymous No.106590565 [Report]
>>106590507
That reminds me of the time I ran mistral large Q1 on CPU.
Anonymous No.106590628 [Report]
>>106582475 (OP)
>Deep Reason extension for TGWUI
Worth it? I was thinking about buying it
Anonymous No.106590745 [Report] >>106591266
https://community.topazlabs.com/t/topaz-studio-transition-questions/95039/9
Looks like the topazbros got rugpulled
Anonymous No.106590868 [Report] >>106591013 >>106591824
>>106590172
>waifu overlay.webm
heh, neat
but strange how it couldn't deal with her handes folded
Anonymous No.106590875 [Report] >>106590881
Reminder to not use quantization and flash attention.
Anonymous No.106590881 [Report]
>>106590875
This, just pay for a GPT plus subscription.
Anonymous No.106590886 [Report] >>106591062 >>106591329 >>106591902
I feel like the whole "not x, but y," thing is a common and useful trope in natural language. It allows us to present one aspect of a topic, and quickly segue into explaining another aspect.

I've been practicing for med school interviews, and it's a super useful way to communicate things.

E.g. Substantiating importance of communication skills: "Good communication strategies aren't just useful when actively listening to the patient, asking appropriate questions and generating a comprehensive history. It is also useful when communicating with multi-disciplinary teams, often across different hospitals, and especially when dealing with complex patients who have received care from a number of different institutions."

This would be considered a slopped response if it was made by an LLM, but is a fantastic way to describe two important aspects of communication in medicine. I've seen variants of this across so many textbooks, and similar phrasing styles have been recommended to me by a number of different experts.
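The difference is frequency, not the construction itself, and frequency is easy to measure. A rough detector (the regex and the once-per-sentence yardstick are my own, not any standard metric):

```python
import re

# Rough pattern for "not (just/only) X, but Y" constructions.
NOT_BUT = re.compile(r"\bnot\s+(?:just|only|merely)?\s*[^,.;]{1,60},\s*but\b",
                     re.IGNORECASE)

def slop_density(text):
    """Constructions per sentence; once per reply is rhetoric, thrice is slop."""
    sentences = max(len(re.split(r"[.!?]+", text)) - 1, 1)
    return len(NOT_BUT.findall(text)) / sentences

sample = ("Good communication is not just useful when listening to the patient, "
          "but when coordinating teams. It was not a refusal, but a redirection. "
          "She was not walking, but gliding.")
print(slop_density(sample))  # 1.0: one per sentence, well into slop territory
```

Run that over a human textbook chapter versus an LLM RP log and the gap shows up immediately.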
Anonymous No.106590924 [Report]
Q2 is as good as Q4 or Q8.
Anonymous No.106590950 [Report]
what if my brain is running a quanted human model and that is why i am retarded
Anonymous No.106591013 [Report] >>106591438
>>106590868
It's actually staged, just meant as a visualization of what could be. The webm is from MANY years ago, at a time when ML/CV tracking stuff didn't quite exist outside of research, and when things like Vive Trackers did not exist. He simply positioned and posed the virtual model manually to match the real one (or the other way around). It's funny that this webm can be misunderstood in the current year, because we do in fact have the technology now to truly do the perfectly tracked AR overlay thing, as long as someone puts in the effort.

We have this webm now but I wanted one that showed an entire real body.
Anonymous No.106591062 [Report]
>>106590886
There's nothing wrong with that or other slopisms, but you wouldn't normally see humans using it over and over again in the same conversation, or sometimes even in a single paragraph. But this happens pretty often depending on the LLM. Or the LLM is actually tuned to be anti-repetitive and instead the slop repetition happens at the start of every conversation, because they're separate conversations and LLMs do not retain those memories.
Anonymous No.106591099 [Report]
I'm going to be honest I don't notice a quality difference in models for over a year now, both local and private models.

Either we've stagnated hard. Or I am the bottleneck in figuring out the quality of the responses. But either way I don't notice a difference between the big models and haven't for about a year beyond default writing style which is subjective.
Anonymous No.106591144 [Report]
can't believe i use to use lm studio
Anonymous No.106591266 [Report]
>>106590745
Kek paypigs. Just download a crack.
Anonymous No.106591301 [Report] >>106591335 >>106591518
I made this agent circuit-thing to make a bunch of models daydream. The output is still full of emdashes, but feels less dumb. Does /lmg/ care?
Anonymous No.106591329 [Report]
>>106590886
The issue is not that the models use this grammatical structure–it's that they try to use it for every other sentence if you let them.
Anonymous No.106591335 [Report] >>106591411 >>106591447
>>106591301
That's pretty cool, like watching different parts of a brain light up. Can you post example outputs with and without daydreaming? If you wanted emdashes gone, you could just ban or bias it or forbid them in the system prompt.
Anonymous No.106591411 [Report]
>>106591335
What would count as a proper head-to-head for this? Running the circuit is like a many-shot reply, whereas just prompting the biggest model in the bunch to daydream about chunked text is maybe a two-shot. That's why I say 'feels' less dumb. I'm willing to do comparisons, though, if you have an idea for one that makes sense or sounds neat. In the meantime, here's an output example.

I have copious log spam and the intermediate steps, too, where you can watch it self-correcting and having realizations and shit.
Anonymous No.106591438 [Report] >>106591468 >>106591498
>>106591013
It has become especially possible only recently because the retards at Meta finally gave camera access on Oculus to developers. Other than that, tracking was always possible with ARToolKit and special markers on the doll
Anonymous No.106591447 [Report]
>>106591335
As for brain regions, you're on the nose. This is a neuromorphic pattern based on the Default Mode Network, with the terminology obfuscated so the model does not think it's writing a neuroscience test.
Anonymous No.106591468 [Report] >>106591517
>>106591438
Are you telling me that there still isn't a fully open source headset? I'm kinda looking for one that has everything exposed to the developer
Anonymous No.106591480 [Report] >>106591495
>>106582475 (OP)
TFW when no airchan to make win 10/server 2023 console scriptlets/cmdlets with.

MODERN COMPUTE STUDIES A DREK :(
Anonymous No.106591495 [Report]
>>106591480
>doing secretary work is hard!
cant wait for these useless dregs to be out of a job thanks to AI
Anonymous No.106591498 [Report] >>106591529
>>106591438
Yeah I should've said specifically consumer. Tracking methods, including but not limited to fiducial markers, have existed for a long time, but you could really start making tracked dolls a reality with 0 coding knowledge as soon as Vive trackers came out and were supported in VRChat.
Anonymous No.106591517 [Report]
>>106591468
Valve Frame isn't out yet
Anonymous No.106591518 [Report] >>106591560
>>106591301
Don't know what you used but looking good! Will you share?
Anonymous No.106591529 [Report]
>>106591498
Sex dolls with integrated trackers would've been rad
Anonymous No.106591560 [Report] >>106591683
>>106591518
You'll need this repo: https://github.com/dibrale/Regions

The catbox has my stuff that's not in the repo: 7g2qao.zip

The script in the catbox is pretty much based off of the lit crit demo, but verify before arbitrarily executing, etc. etc.
Anonymous No.106591683 [Report]
>>106591560
Thank you, I'll check
Anonymous No.106591824 [Report] >>106592224
>>106590868
Hand tracking is very hard.
Anonymous No.106591902 [Report]
>>106590886
>not x
What if no-one would have thought "x" was even a plausible option? (From the narrative/past events.)

>not x, but y
What if literally no implications flow from "y" in the following text?
Anonymous No.106592017 [Report]
How suitable is openrouter for data processing tasks like fiction tagging? Will it report me to the FBI and NSA if my requests happen to contain unorthodox texts?
Anonymous No.106592024 [Report] >>106592033 >>106592110
>>106583124
You may not like it but cooming and other purely recreational stuff is the optimal use case for local models since you know nobody is reading your garbage and uncensored consumer GPU size models can be more fun (though lower IQ) than gigantic models when finetuned for your specific use case like RP/ERP.

For actual beneficial use cases like shitting out useful scripts or whatever just use your favorite giga huge cloud model at all the tokens per second. Gemini 2.5-pro is already way better at coding tasks than anything local, you can use it from command line, it can interact with your file system if you give it perms to a folder and if you log in with jewgle account you get 1000 free requests per day which is good for pretty much anything other than professional amounts of use. The only reason to avoid cloud is if your prompts contain personal info or other info you want to be 100% sure doesn't get stolen like your own, non AI-sloppa code or if you want to do dumb fun stuff like coom.
Anonymous No.106592033 [Report]
>>106592024
I don't want gigacorps to know I'm bad at coding
Anonymous No.106592055 [Report] >>106592075 >>106594369
what's the point of LLM if i can't cuddle it
Anonymous No.106592075 [Report]
>>106592055
What's the point of cat if it can't help me write erotica
Anonymous No.106592097 [Report]
>>106587021
don't worry, even when they come to the west it's never your turn to get gibs first; they'll be bought by pros/researchers on the cheaper side, then by scalpers, who will then tear you a new asshole
Anonymous No.106592099 [Report]
Is exllama still faster than goof?
Anonymous No.106592110 [Report] >>106592242
>>106592024
>gemini free blabla
this is like the drug dealer giving you a free hit
it won't last; running models as good as Gemini costs a lot of money, and even their most expensive subscriptions don't really cover the real cost of the LLM business
companies like Google in the LLM space are using the Uber strategy: give a product for much cheaper than it should be, until the competition is dead, then jack the prices up like crazy
you may not see a point to local yet for non recreational uses because you don't see what they're going to do to you in the long term
I do and that's why I won't develop an addiction
Anonymous No.106592138 [Report] >>106592144
I think qwen 30b at higher quants has less "not x, but y"
Anonymous No.106592144 [Report]
>>106592138
I've only used Q8 and it's still pretty excessive.
Anonymous No.106592211 [Report] >>106592227
GLM Air is surprisingly coherent, creative and non-repetitive even at Q2, 24k context. How did they do it?
Anonymous No.106592224 [Report] >>106592348
>>106591824
Meta solved it on the Quest, somehow
Anonymous No.106592226 [Report]
gm sirs
Anonymous No.106592227 [Report] >>106592231
>>106592211
I found it to be uncreative and predictable like all sub-deepseek moes
Anonymous No.106592231 [Report] >>106592328
>>106592227
What do you use for RP?
Anonymous No.106592242 [Report]
>>106592110
You are probably right, no such thing as a free meal etc, but I don't think they're gonna kill off the competition anytime soon, so even if they flip the "pay us" switch there will probably always be a free or at least cheaper solution to move to.
And it's not like local assistant use is completely pointless or something, just saying that right now free cloud feels like the best choice for most use cases where you actually need the LLM to be "correct", unlike in recreational use.
Anonymous No.106592247 [Report] >>106592268 >>106592271 >>106592294 >>106592311 >>106592364 >>106592669
not x but y is a lot more pervasive than people seem to notice, but they only notice it when it's very close to literally spelling out "not just x, but y" like the sloppier models do
here's a less sloppy model still doing it quite a lot in practice:
https://eqbench.com/results/creative-writing-v3/o3.html
>He had kept the find quiet; obviously not quiet enough.
>You will seem a magnate rather than a hostage.
>had dreamed of building cranes and pressure domes, not empires.
>Because Antares relies on calculus, not superstition.
>“Altruism,” she said lightly, “is a luxury for stable epochs. This is not one
etc etc
the fact is, the best, state-of-the-art LLMs are still inherently slop, and enjoying LLM writing is like being a fatso American calling McDonald's gourmet food
AI models as a whole suck at art; it's people with no soul who enjoy the art side of it
for me? AI is a tool. Classifiers, summarizers, metadata annotation, genning translations of my program's UI strings, etc. Looking for soulful content in a machine? Nay.
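If you want to actually measure this instead of eyeballing it, a crude regex counter over generated text does the job. This is a sketch with made-up patterns (my own guesses at the common contrast constructions, not any standard slop metric), so expect false positives and misses:

```python
import re

# Crude heuristic for the "not X, but Y" contrast pattern discussed above.
# Catches "not (just) X, but Y", "rather than", and trailing ", not Y." forms.
# These patterns are illustrative guesses, not an established metric.
PATTERNS = [
    re.compile(r"\bnot (?:just |only |merely )?[\w' -]{1,40},? but\b", re.IGNORECASE),
    re.compile(r"\brather than\b", re.IGNORECASE),
    re.compile(r",\s*not\s+[\w' -]{1,40}[.!?]", re.IGNORECASE),
]

def count_contrast_slop(text: str) -> int:
    """Return the number of contrast-construction hits in `text`."""
    return sum(len(p.findall(text)) for p in PATTERNS)

sample = (
    "He had kept the find quiet; obviously not quiet enough. "
    "You will seem a magnate rather than a hostage. "
    "It was not just a tool, but a weapon."
)
print(count_contrast_slop(sample))  # 2: "rather than" and "not just ... but"
```

Note it already misses the first o3 example ("not quiet enough" with no "but"), which is kind of the point: the less sloppy models do the same move in forms a dumb regex won't catch.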
Anonymous No.106592268 [Report]
>>106592247
but i just nutted to a non consenting loli, my guttural scream was not only passionate, but an art form in itself. What is art, if not primal urges being satisfied?
Anonymous No.106592271 [Report]
>>106592247
It is possible to enjoy something that is flawed
Anonymous No.106592294 [Report] >>106592301
>>106592247
Kys
Anonymous No.106592301 [Report]
>>106592294
no u
Anonymous No.106592311 [Report] >>106592357
>>106592247
Imagine reading filthy smut and thinking of mcdonalds. How fucking fat are you?
Anonymous No.106592328 [Report] >>106592345
>>106592231
Rocinante1.1/original r1 q2xxs
Anonymous No.106592345 [Report]
>>106592328
>Rocinante1.1
good joke anon
Anonymous No.106592348 [Report]
>>106592224
PoV hand tracking is easier than tracking from an unconstrained perspective, and that image doesn't show anything impressive. They might also be using special cameras, which helps; LeapMotion does that too. The hard part is unconstrained hand tracking when hands interact, fingers interlock, objects are being held, and so on.
Anonymous No.106592357 [Report] >>106592375
>>106592311
>How fucking fat are you?
I am not American
Anonymous No.106592364 [Report]
>>106592247
>it's people who have no soul who enjoy the art side of it
I don't care about "art". Image gen makes pretty pictures that make my dick hard. LLMs suck my cock.
Anonymous No.106592375 [Report]
>>106592357
You are american brained.
Anonymous No.106592548 [Report] >>106592585 >>106592642 >>106592679
>take source code of open source software which is well documented
>alternatively let LLM create comments and documentation of everything
>delete all code but leave comments in
>tell your LLM coder to (re)create the software
has someone done this before? I wanna see how far LLMs (especially local LLMs) can go given optimal conditions. also looking for github repos which are suited for this task. I'll probably start with OBS, which will most likely be too complex, but I can always lower the bar.
And I want to stress again: the goal is not to create a slopped version of an existing project. It's more about testing just how far prompt, context and environment engineering can take LLMs.
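The "delete all code but leave comments in" step is easy to automate for Python sources at least; a minimal sketch using the stdlib tokenizer (handles `#` comments only, not docstrings, and the `comment_skeleton` name is made up):

```python
import io
import tokenize

def comment_skeleton(source: str) -> str:
    """Keep only the comments from Python source, as a rough version of the
    'delete all code but leave comments in' step. Sketch only: it grabs
    # comments via the tokenizer and ignores docstrings entirely."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            out.append(tok.string)
    return "\n".join(out)

demo = '''\
# Compute factorial iteratively.
def fact(n):
    # accumulate the product
    acc = 1
    for i in range(2, n + 1):
        acc *= i
    return acc
'''
print(comment_skeleton(demo))
```

You'd then feed the skeleton (plus file paths and maybe the build files) to the coder model and diff what comes back against the original.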
Anonymous No.106592585 [Report] >>106592623 >>106592717
>>106592548
ur dumb and ur shits all retarded
Anonymous No.106592623 [Report] >>106592628
>>106592585
not dumb, not retarded, but autistic.
Anonymous No.106592628 [Report]
>>106592623
your comment was not insightful, but memeworthy
Anonymous No.106592642 [Report] >>106592717
>>106592548
You'd have to trust their comments and understanding of the code first. Here's an example of the first part.
>https://github.com/ggml-org/llama.cpp/pull/15777
You read like you've never used these things before.
Anonymous No.106592669 [Report] >>106592704
>>106592247
>contrasting two things is slop
What's next, punctuation is slop?
Anonymous No.106592679 [Report] >>106592717
>>106592548
>the goal is not to create a slopped version of an existing project
The goal is to circumvent copyleft licenses, you're being quite obvious.
Anonymous No.106592704 [Report]
>>106592669
i had an argument with another anon some time ago about punctuation and capitalization as well
w vntlly grd t stp sng vwls
tp prfrmnc
Anonymous No.106592717 [Report] >>106592732 >>106592733 >>106593128
>>106592585
only valid arguments for why my idea is retarded will make me feel dumb, so your comments are pointless until you deliver said arguments. and since you decided to reply instead of ignore, you clearly have an incentive. So following up with
>nah ur stupid
will make you look stupid.

>>106592642
I'm aware. My idea was to only use the cloned repo without github issues/comments. There are projects out there that have all code blocks commented. Maybe I should search for vibe coded repos as they often have everything commented.

>>106592679
please just think for a moment, anon. If that was the goal, I would obviously leave in all the code and tell the LLM to rewrite it in a different way or using a different stack. you really thought you were on to something there, huh?


I'm just gonna do it and report back with the results. I found a ton of demo repos with fully commented code.
Anonymous No.106592732 [Report]
>>106592717
>If that was the goal, I would obviously leave in all the code and tell the LLm to rewrite it in a different way or using a different stack.
https://en.wikipedia.org/wiki/Clean-room_design
Anonymous No.106592733 [Report]
>>106592717
>I'm aware.
You aren't.
>I'm just gonna do it and report back with the results. I found a ton of demo repos with fully commented code.
Nah. It's fine. Keep those to yourself.
Anonymous No.106592857 [Report] >>106592879 >>106592883
Can "Mistral-Nemo-Instruct-2407-GGUF" handle beyond 8K context?
Anonymous No.106592879 [Report] >>106592893
>>106592857
Nemo is officially rated for 16K context, I find it mostly coherent up to around 20-24K but it gets noticeably dumber even after 4K.
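For what it's worth, the VRAM side of picking a context length is easy to back-of-envelope. A sketch assuming Nemo-12B's published shape (40 layers, 8 KV heads, head dim 128; treat those numbers as assumptions) and an fp16 cache:

```python
# Back-of-envelope KV-cache size for Nemo at a given context length.
# Shape defaults (40 layers, 8 KV heads, head dim 128) are assumed from
# Mistral-Nemo's config; bytes_per_elem=2 means an fp16/bf16 cache.
def kv_cache_bytes(ctx: int, layers: int = 40, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # factor of 2 is for the separate K and V tensors
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

for ctx in (16_384, 131_072):
    print(f"{ctx:>7} ctx -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
# 16k is ~2.5 GiB; the full claimed 128k is ~20 GiB before quantizing the cache
```

So even if the model were coherent at 128k, the cache alone is a big chunk of a consumer card, which is another reason almost nobody actually runs it there.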
Anonymous No.106592883 [Report]
>>106592857
It can handle ~16k without going schizo
Anonymous No.106592893 [Report] >>106592907 >>106592920
>>106592879
>Nemo is officially rated for 16K context
It's actually 128k, but no one who has ever used it agrees with that
Anonymous No.106592907 [Report]
>>106592893
>actually
*technically
Anonymous No.106592920 [Report] >>106592934
>>106592893
I must be going crazy, I could have sworn it was much lower than that. Maybe I'm confusing it with one of the older context benchmarks that said 16K was the falling off point.
Anonymous No.106592934 [Report]
>>106592920
Yeah, it's 16k according to the RULER benchmark, but Mistral claims 128k
Anonymous No.106593117 [Report]
>>106593104
>>106593104
>>106593104
Anonymous No.106593128 [Report] >>106593179
>>106592717
ur the kind of room temp iq retard that thinks 'AI CAN AND WILL DO IT BETTER THAN HUMIES!!!' when the AI HAS BEEN TRAINED ON HUMAN INPUTS YOU FUCKING RETARD
Anonymous No.106593179 [Report]
>>106593128
>And I want to stress again the goal is not to create a slopped version of an existing project.
Don't make me defend the retard again.
Anonymous No.106594369 [Report]
>>106592055
what's the point of a "self" if I can't cuddle it?

but seriously, just wait for neural interfaces to be decent and you can cuddle LLMs all you want