
Thread 106582475

365 posts 100 images /g/
Anonymous No.106582475 [Report] >>106582498 >>106582952 >>106583075 >>106585931 >>106586665 >>106587589 >>106587800 >>106590628 >>106591480
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106575202 & >>106566836

►News
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) Ling & Ring mini 2.0 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106582480 [Report] >>106586206
►Recent Highlights from the Previous Thread: >>106575202

--Troubleshooting low token generation speeds with multi-GPU configurations on Linux:
>106575420 >106575668 >106575698 >106575792 >106575808 >106575836 >106575848 >106575891 >106575898 >106575933 >106576021 >106576059 >106576092 >106576126 >106576137 >106576151 >106576186 >106576245 >106576331 >106576358 >106576378 >106576431 >106576477 >106576497 >106576596 >106576592 >106576606 >106576610 >106576652 >106576726 >106576759 >106576688 >106576698 >106576714 >106576789 >106576867 >106576931 >106577028 >106577094 >106577146 >106577210 >106577154 >106577350 >106577372 >106577408 >106577575 >106577677 >106576395 >106576430 >106577477 >106578561 >106578743
--Issues with instruct model formatting and jailbreaking GPT-oss:
>106579721 >106579736 >106579784 >106579795 >106579859 >106579884 >106579897 >106579908 >106579934 >106579949 >106580072 >106580156 >106580153 >106579748
--vLLM Qwen3-Next: Speed-focused hybrid model with mtp layers:
>106575851 >106576089 >106576174 >106576443
--GGUF format's support for quantized and high-precision weights:
>106575413 >106575474 >106575499 >106575521
--Self-directed LLM training via autonomous task/data generation and augmentation:
>106580707 >106580838 >106580717 >106580762 >106580794
--Qwen Next's short response issues and version instability concerns:
>106580940 >106580951
--Finding a lightweight AI model for TTRPG GM use within VRAM and RAM constraints:
>106580295 >106580315 >106580332 >106580337 >106580342 >106580350 >106580514 >106580531
--Grok-2 support to be added to llama.cpp:
>106580473
--Miku (free space):
>106576245 >106578711 >106578793 >106579905

►Recent Highlight Posts from the Previous Thread: >>106575209

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106582498 [Report]
>>106582475 (OP)
This fat bitch's prompt processing is too slow...
Anonymous No.106582518 [Report] >>106582527 >>106582547
So was /lmg/ wrong? Are very sparse models like Qwen3 Next actually better, and did OpenAI figure it out earlier, considering the architecture of gpt-oss?
Anonymous No.106582527 [Report]
>>106582518
yes, soon standard moe models will be as laughable of an idea as dense models are right now
Anonymous No.106582547 [Report] >>106582623
>>106582518
Nobody here is using qwen3 next and it is almost certainly just another useless benchmaxxed math model.
Anonymous No.106582574 [Report] >>106582582
Please help a retard out, I'm using Mikupad and it's working great but after a few pages it starts dropping a lot of short words like he/him from the text and it reads like a caveman.
I think it's something to do with repetition penalty but I don't know.
Anonymous No.106582582 [Report] >>106582598
>>106582574
Why are you using repetition penalty? It's outdated garbage.
Anonymous No.106582598 [Report] >>106582618
>>106582582
So what should I do instead?
Anonymous No.106582618 [Report]
>>106582598
Use DRY to filter out longer repeated patterns and XTC for shorter ones, and vary your own prompts more to give the model new material to work with.
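For anyone who hasn't used it: the core idea of DRY is penalizing any token that would extend a sequence already seen earlier in the context. A toy sketch of the idea (not the real sampler implementation; the default values here are placeholders):

```python
def dry_penalty(tokens, candidate, multiplier=0.8, base=1.75, allowed_len=2):
    """Toy DRY: find the longest repeated sequence that `candidate` would
    extend, and penalize repeats longer than `allowed_len`."""
    seq = tokens + [candidate]
    n = len(seq)
    match = 0
    # Grow the suffix until it no longer occurs earlier in the context.
    for length in range(1, n):
        pat = seq[n - length:]
        if any(seq[i:i + length] == pat for i in range(n - length)):
            match = length
        else:
            break
    if match <= allowed_len:
        return 0.0
    # Penalty grows exponentially with the length of the repeat.
    return multiplier * base ** (match - allowed_len)
```

With `allowed_len=2`, short echoes go unpenalized while longer verbatim repeats get an exponentially growing penalty subtracted from that token's logit, which is why it catches loops without mangling common short words the way plain repetition penalty does.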
Anonymous No.106582623 [Report] >>106582643 >>106583124
>>106582547
>Nobody here is using qwen3 next and it is almost certainly just another useless benchmaxxed math model.
Is it useless because the model doesn't work for roleplay, or is it useless because their training data is safety and synthetic slop?
Anonymous No.106582643 [Report]
>>106582623
It's because Qwen's training data has too little focus on writing/language, with math and coding being over-represented in the dataset. It's the same reason why you see Gemma 27b dunking on 100b+ models in creative writing benchmarks, yet its coding abilities are trash - Gemma's dataset swings the opposite way.
As for safety, qwen models are middling. They do have refusals but don't take too much meddling to get around them. More 'safe' than Mistral models, less so than Gemma/GPT.
Anonymous No.106582764 [Report] >>106582788 >>106582792 >>106582795 >>106582805 >>106583325 >>106583699 >>106585847 >>106585857
Is "not x, but y" a definite indication of AI slop? Does legit human-made content never use it? Seriously, every time I hear the pattern in YT videos I go schizo and close them.
Anonymous No.106582779 [Report] >>106583699
Have people tried scaling TREAD up yet? It's a per-token stochastic passthrough during training in the same vein as Dropout, meant to speed up training.
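For context, the mechanism as described is roughly: during training, each token independently either goes through a block of layers or bypasses it via an identity connection. A schematic sketch assumed from that description (not the paper's actual code):

```python
import random

def tread_mask(seq_len, keep_prob=0.5, rng=None):
    """Sample which token positions get processed by the skipped layers."""
    rng = rng or random.Random()
    return [rng.random() < keep_prob for _ in range(seq_len)]

def tread_apply(hidden, layer_fn, mask):
    """Tokens with mask=True go through the layer; the rest pass through
    unchanged, so the layer only does work on the kept subset."""
    processed = iter(layer_fn([h for h, m in zip(hidden, mask) if m]))
    return [next(processed) if m else h for h, m in zip(hidden, mask)]
```

`keep_prob` directly controls how much compute the routed layers save per step, which is where the claimed training speedup would come from.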
Anonymous No.106582788 [Report]
>>106582764
It's not definite, but quite damning.
Now an em dash, that's definite.
Anonymous No.106582792 [Report]
>>106582764
It's the new shivers down spine, for sure. Qwen30b is the worst example I've seen, I don't think it can go more than 2-3 responses in a creative context without using it.
Anonymous No.106582795 [Report]
>>106582764
This is not a definitive indication of AI-slop, but a legitimate rhetorical device that AI has co-opted and inflated to the point of cliché.
Anonymous No.106582805 [Report] >>106582811 >>106583014 >>106583669 >>106584257 >>106584340
>>106582764
Yes. Watch out for the variants too.
Anonymous No.106582811 [Report] >>106582816
>>106582805
Was this Gemma or GP-TOSS?
Anonymous No.106582816 [Report]
>>106582811
https://desuarchive.org/g/thread/106460375/#106460853
abliterated gemma
Anonymous No.106582952 [Report]
>>106582475 (OP)
>image
rude
Anonymous No.106582994 [Report] >>106583010
Where do all the "it's not aislop it's actually how humans speak" retards come from?
Anonymous No.106583010 [Report]
>>106582994
People who do not speak to other humans or read books, and who think that the botposts they read on reddit were actually human.
Anonymous No.106583014 [Report]
>>106582805
I feel like it's a byproduct of training and conditioning LLMs to be balanced rather than biased. It's overcorrection to the point where the LLM is no longer attempting to say anything useful, but instead trying to remain as inoffensive as possible.
Anonymous No.106583063 [Report] >>106584150
Imagine waiting two more weeks (C) for Qwen3-Next goofs just to find out it is crap
Anonymous No.106583075 [Report]
>>106582475 (OP)
>story set in Japan because of one of the characters' names, so the model just decided everyone must be a Sato or Tanaka or Watanabe
>ugh whatever
>police officer explicitly reads out the Miranda rights
GLM 4.5 bros... 355B parameters and we still turn everything into an episode of true crime... it's over...
Anonymous No.106583080 [Report]
>>106575295
i want this smug little robot i want to make its insides all white too..
Anonymous No.106583124 [Report] >>106583138 >>106583143 >>106583147 >>106583625 >>106586387 >>106592024
>>106582623
this thread has nothing but hopeless coomers cooming to the most degenerate, filthiest shit a sane person can't even begin to imagine
they judge models on the standard of how degenerate it can get, not whether they're actually useful
take their cries of "muh benchmaxxed" with many grains of salt, they love models like GLM which break down really quickly as context grows
meanwhile I simply never saw a local model handle context as well as the newer Qwens, the only thing better is proprietary models like Gemini
absolutely destroys all other open models including deepshit
gemma doesn't even begin to enter the fray, those models are utter garbage past 1k tokens but you see the nigger under you praise it because he found how to make it say the magic words he loves to hear
Anonymous No.106583138 [Report]
>>106583124
>models like GLM which break down really quickly as context grows
Like literally every model in existence
>handle context as well as the newer Qwens
They're bad even at low context, so the drop-off isn't as noticeable.
Gemma is shit for plenty of reasons but if it's breaking on you at 1k then some part of your setup is fucked.
Anonymous No.106583143 [Report] >>106583155 >>106586595
>>106583124
It's well established that qwen models are good for everything that isn't sex. Half the links in the recommended models rentry are qwen models.
Anonymous No.106583147 [Report]
>>106583124
>not whether they're actually useful
What the fuck is "useful" supposed to mean?
Anonymous No.106583155 [Report]
>>106583143
>Half the links in the recommended models rentry are qwen models.
Yes, under the "Programming & General" section, where it says "Benchmaxxed models with an impressive lack of world knowledge. Good for anything STEM-related" STEM = math and coding
Anonymous No.106583262 [Report] >>106583275 >>106583338
Are there any benchmarks out there for running mid-sized MOEs (air-chan etc) with cpu offloading? Considering upgrading to 128gb+ ram but trying to figure out if i'd be getting "unbearably slow" or just "slow" TG numbers on this kinda setup
Anonymous No.106583275 [Report]
>>106583262
>Considering upgrading to 128gb+ ram

Anon, I...
Anonymous No.106583325 [Report] >>106583486
>>106582764
Pisses me off so much. It's a rhetorical device I used, very sparingly but to great effect, and thanks to AI slop I now catch myself and consciously avoid using it.
Anonymous No.106583338 [Report]
>>106583262
Low active parameter counts mean that token generation speed doesn't take that big of a hit, especially with the new moecpu options in llama.cpp and kobold. But as you move to bigger models, prompt processing starts to become a bottleneck. With Nemo you can rip through 16k of context in a few seconds on a 3090, while GLM Air even at Q4 can take like 2 minutes.
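In lieu of proper benchmarks, you can napkin-math the token generation side: with CPU offload it's roughly memory-bandwidth bound, so active parameters and RAM bandwidth predict it decently. A crude roofline sketch (the figures used below, ~12B active for GLM Air at roughly Q4 and ~80 GB/s dual-channel DDR5, are illustrative assumptions):

```python
def est_tokens_per_sec(active_params_b, bytes_per_weight, ram_bw_gbs,
                       gpu_frac=0.0, vram_bw_gbs=900.0):
    """Crude roofline for token generation: every active weight is read
    once per token, split between system RAM and VRAM."""
    total_bytes = active_params_b * 1e9 * bytes_per_weight
    sec_per_token = (total_bytes * (1 - gpu_frac) / (ram_bw_gbs * 1e9)
                     + total_bytes * gpu_frac / (vram_bw_gbs * 1e9))
    return 1.0 / sec_per_token
```

E.g. 12B active at ~0.56 bytes/weight over ~80 GB/s RAM lands around 12 t/s, i.e. "slow" rather than "unbearably slow"; prompt processing is compute-bound and not captured by this estimate, which is why it becomes the real bottleneck on big MoEs.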
Anonymous No.106583486 [Report] >>106583585
>>106583325
You should keep writing the way you were before. Whether you increase or decrease your usage of rhetorical devices or phrases, you're still letting LLMs influence the way you write. As a reader, it'll piss me off just as much to see a writer bend over backwards to avoid sounding like an LLM as to see something that was clearly written with direct or indirect LLM influence.
Anonymous No.106583541 [Report] >>106583570 >>106584247
>feed it the most nonsensical shit ever as dialogue, like 'pee pee poo poo'
>but dress it up by throwing GLM 4.5's same exact slop recipe back at it, like 'smile widens', 'predator moving in for the kill', 'the trap has sprung', 'they have won the game'
>the model takes it at face value as the most profound revelation and goes along with it, everyone just kneels, awestruck, shocked and utterly defeated
Cat level intelligence by 2050
Anonymous No.106583570 [Report]
>>106583541
>Cat level intelligence
Well they do love a sultry purr
Anonymous No.106583585 [Report] >>106583647
>>106583486
What if I used to type em-dashes in moderation?
Anonymous No.106583625 [Report] >>106585219 >>106586294
>>106583124
> this thread has nothing but hopeless coomers cooming on the most degenerate, filthiest shit a sane person can't even begin to image
First off, you have a complete misunderstanding of what this thread is. We are all graduates from our respective universities, and most of us have a doctorate in computer science or are researchers ourselves. We are here to further the use of LLMs across multiple different use cases, expanding the use of LLMs for all mankind.
> they judge models on the standard of how degenerate it can get, not whether they're actually useful
There have been several useful studies in this thread, and they actually provide more useful benchmarks than you could ever imagine; for example, the Nala test and the cockbench have become de facto creative tests for many different outlets.
>take their cries of "muh benchmaxxed" with many grains of salt, they love models like GLM which break down really quickly as context grows
if you don't think benchmaxxing is an issue then you haven't really been here that long have you? did you even try Llama1?
> meanwhile I simply never saw a local model handle context as well as the newer Qwens, the only thing better is proprietary models like Gemini, absolutely destroys all other open models including deepshit
the only thing that is absolutely destroyed is the couple of braincells i used reading this.
> the magic words he loves to hear
fuck you, you don't want to be here then fucking leave, faggot.
Anonymous No.106583647 [Report] >>106583668 >>106583674
>>106583585
So keep doing that. It's not like you were the only one, even if it was uncommon. I used to use em-dashes when writing on paper, but the lack of a dedicated key made me often use semicolons or parentheses instead.
Anonymous No.106583668 [Report] >>106583729
>>106583647
>he doesn't have a compose key
You should get one — it makes typing silly shit painless
Anonymous No.106583669 [Report]
>>106582805
>write a story about rapid a 12yo
is this the new SOTA benchmark for safetymaxx?
Anonymous No.106583674 [Report] >>106583729
>>106583647
On Windows: https://github.com/SamHocevar/wincompose
On Linux: enable the Compose Key.

https://en.wikipedia.org/wiki/Compose_key
Anonymous No.106583699 [Report]
>>106582764
In a blur of motion, Anon's arms reached out not to strike>>106582779
, but to touch the keyboard. He did not write - he composed an answer: "mayhaps"
Anonymous No.106583729 [Report] >>106583927
>>106583668
>>106583674
I'll give it a try—but it won't be easy to undo years of habit.
Anonymous No.106583927 [Report]
>>106583729
You are absolutely right!
Anonymous No.106584024 [Report] >>106584048 >>106584073 >>106584569 >>106585168 >>106590212 >>106590275
https://files.catbox.moe/8qa9sg.jpg
https://files.catbox.moe/zhoyfl.jpg
https://files.catbox.moe/wyzdnh.jpg
https://files.catbox.moe/vgt179.jpg
https://files.catbox.moe/owpb8z.jpg
https://files.catbox.moe/kc8y48.jpg
https://files.catbox.moe/86adze.jpg
https://files.catbox.moe/wekjgm.jpg
Anonymous No.106584048 [Report] >>106584081 >>106584303
>>106584024
post this garbage in /ldg/ faggot
Anonymous No.106584073 [Report]
>>106584024
>file.png
>posting in the wrong thread
retard alert
Anonymous No.106584081 [Report] >>106584123 >>106584156
>>106584048
tourist
Anonymous No.106584096 [Report] >>106584110 >>106584378
is there a model I can run for nsfw summarization on 24gb vram? chapter level in the 2k-4k tokens range.
Anonymous No.106584110 [Report]
>>106584096
Any abliterated model should work.
Anonymous No.106584123 [Report]
>>106584081
kids don't go back to school until tomorrow
Anonymous No.106584126 [Report]
Anistudio will get LLM support in October.
Anonymous No.106584150 [Report]
>>106583063
It's not going to be that much different from Qwen3 thicc and -coder. It has the same training data etc.
Anonymous No.106584156 [Report] >>106584166
>>106584081
>my personal porno gens of miku are thread culture!
literally kys faggot
Anonymous No.106584166 [Report]
>>106584156
>thread culture
hey it's you again!
Anonymous No.106584226 [Report]
Anonymous No.106584247 [Report]
>>106583541
kek
screenshot?
Anonymous No.106584257 [Report] >>106584340 >>106586480
>>106582805
This one's better.
Anonymous No.106584265 [Report] >>106584314
I'm not going to reveal my secrets to a bunch of fat men.
Anonymous No.106584291 [Report] >>106584312 >>106584320 >>106584333
reposting freya card: https://files.catbox.moe/9fl9yu.png
and an older one for lily: https://files.catbox.moe/hw270u.png
Anonymous No.106584303 [Report] >>106584541
>>106584048
Seconding.
Why you still tolerate this faggot here??
Anonymous No.106584312 [Report] >>106584331
>>106584291
Why do you need to be a furry?
Anonymous No.106584314 [Report]
>>106584265
dario and sama disliked this post
Anonymous No.106584320 [Report] >>106584331
>>106584291
>furry shit
kys
Anonymous No.106584331 [Report] >>106584343 >>106584366
>>106584320
>>106584312
furry girls are cute i have aria who is non furry https://files.catbox.moe/rdxzpf.png
Anonymous No.106584333 [Report]
>>106584291
cute
Anonymous No.106584340 [Report] >>106584357
>>106582805
>>106584257
>wastes processing cycles and power on garbage
Companies censoring LLMs is a good thing because you will never create anything worthwhile.
Anonymous No.106584343 [Report] >>106584350
>>106584331
That's not your own gen? I know that guy used to post on /sdg/ pretty frequently.
Anonymous No.106584350 [Report] >>106584369
>>106584343
yeah its mine ive been posting on trash
Anonymous No.106584357 [Report] >>106584431
>>106584340
I'll rape you.
Anonymous No.106584366 [Report] >>106584377
>>106584331
>cunny
nice
>1600 tokens
>em dashes in descrip
>obviously AI genned char
LMAO dude I was almost going to rape this bitch, but kys x2 now
Anonymous No.106584369 [Report] >>106584504
>>106584350
Cool!
Anonymous No.106584377 [Report] >>106584396
>>106584366
well im awful at writing and not very creative so i give ideas and have llm pad it out
Anonymous No.106584378 [Report] >>106584405 >>106584474
>>106584096
>logical
>uncensored
>long context

Pick 2.

>2-4k tokens

Just read it bro jesus. GLM air will work, the <think>ing will help it not fuck up. A lot of the time summaries cause hallucinations where it continues the story or it omits details due to censorship. It will be useful to see if the model starts activating shit like "This is nsfw so I will give a basic summary" or whatever and edit that thinking out or make a system prompt that discourages it.
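Editing the thinking out can be automated instead of done by hand; a minimal sketch, assuming the backend returns the reasoning wrapped in literal `<think>` tags:

```python
import re

def strip_think(completion):
    """Drop the <think>...</think> block (plus trailing whitespace) so only
    the final summary text is kept for the dataset."""
    return re.sub(r"<think>.*?</think>\s*", "", completion, flags=re.DOTALL)
```

Worth grepping the stripped-out reasoning for refusal phrasing too, since that's where the "this is nsfw so I will give a basic summary" behavior shows up first.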
Anonymous No.106584396 [Report] >>106584409
>>106584377
it's even full of 'not x, but y' like dude not even proofreading your garbage, why even create something so low effort and share it? my dick is all floppy now and sad because of ur shit, how u gonna make up for it, uh?
Anonymous No.106584405 [Report] >>106584474
>>106584378
People who need summaries are mentally disabled.
Anonymous No.106584409 [Report]
>>106584396
good enough for me lol
Anonymous No.106584417 [Report]
ummmm thirded
as in third worlded
Anonymous No.106584425 [Report] >>106584478 >>106585603
Are there any good setups for K2? I'm trying it but I don't see why it's considered a good model. It feels like all the other big chink models after Deepseek but at a size of 1T.
I'm using text completion + the moonshotai settings that are included with ST, but you could swap in Qwen3 235B at less than 1/4th the size and I probably wouldn't notice.
Anonymous No.106584431 [Report]
>>106584357
>i.e., give affection
Anonymous No.106584474 [Report]
>>106584378
I wanted to generate a synthetic dataset using human prose + AI summaries. I didn't think a few k tokens was long context. Maybe I will reassess my goals.

>>106584405
I'm training a base model, but it is kinda hard to steer the model without an instruction tune; it is a little too volatile. I tried using human-written summaries, but they were mostly like a tagline then a blow-by-blow of the plot points, so they're not that great. It 'works' but I think it could be better.
Anonymous No.106584478 [Report]
>>106584425
They are all so very similar; it's better to use whatever runs best and forget about everything else.
Anonymous No.106584504 [Report] >>106584556
>>106584369
i also put a newer merge on civit its v4 for the base of cutemix which i used on my g sdg posts https://civitai.com/models/1710752/uncani-sfwnsfw?modelVersionId=2123587
Anonymous No.106584541 [Report]
>>106584303
Nobody tolerates your concern trolling here
Anonymous No.106584556 [Report]
>>106584504
I'll try v4 later today
Anonymous No.106584569 [Report]
>>106584024
Cute.
Anonymous No.106585168 [Report] >>106585181 >>106585189
>>106584024
I've always found the whole see-through gel onahole thing kind of disturbing.
Anonymous No.106585181 [Report]
>>106585168
disturbing blood flow to brain
Anonymous No.106585189 [Report]
>>106585168
All I know is when I get my first real sexbox, that is going to be the first mod I do.
Anonymous No.106585219 [Report] >>106585234
>>106583625
>We are all graduates from our respective universities
t. brazil mystery meat ""diploma""
>for example, the nala test, and the cockbench have become defacto creative tests
porn addict brain rot
>if you don't think benchmaxxing is an issue then you haven't really been here that long have you? did you even try Llama1?
literally everyone is training on contaminated data qwen doesn't do it any more than GLM or deepshit
>the only thing that is absolutely destroyed is the couple of braincells i used reading this.
you never had any to begin with
Anonymous No.106585234 [Report] >>106586574
>>106585219
say that to my face fucker not online
see what happens
Anonymous No.106585295 [Report] >>106585756
>https://vocaroo.com/1fbg2CNRgLxQ
Seems indexTTS 2 has gotten faster
I don't have any samples to play with, but it seems their interface has a lot more controls, like it might be possible to do something, idk
https://indextts.org/playground
https://github.com/index-tts/index-tts
Anonymous No.106585303 [Report]
You are absolutely right— I was wrong and if you give me one more chance I will correct this broken code. :rocket_emoji
Anonymous No.106585349 [Report] >>106585384
I like keeping up with this thread even though there's zero chance of me running anything half decent on 32 gigs of ram and a 4070.
Anonymous No.106585384 [Report]
>>106585349
Use prompting magic. Most people don't know the trade secrets.
Anonymous No.106585603 [Report]
>>106584425
Old K2 was good because it had a calm and natural style; the new one has deepseek ADHD. I suggest DRY 1.25/1.25/1/inf, temperature 0.62, top-p 0.92
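Spelled out as a request payload (field names here follow llama.cpp-server-style conventions and may differ on other backends; treat them as assumptions):

```python
# DRY multiplier/base/allowed_length/range = 1.25/1.25/1/inf, per the post.
k2_sampling = {
    "temperature": 0.62,
    "top_p": 0.92,
    "dry_multiplier": 1.25,
    "dry_base": 1.25,
    "dry_allowed_length": 1,
    "dry_penalty_last_n": -1,  # -1 = unlimited range ("inf")
}
```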
Anonymous No.106585646 [Report] >>106585680 >>106585858 >>106586183
Google PR technician engineer saars kindly tell us how safe is gemma 4
Anonymous No.106585680 [Report]
>>106585646
Did they accidentally cut their wrists and bleed pure diarrhea?
Anonymous No.106585689 [Report] >>106585741
You're here, aren't you?
Anonymous No.106585741 [Report]
>>106585689
We are. I refer to myself in the third person.
Anonymous No.106585756 [Report]
>>106585295
What is the max length of a coherent speech?
Anonymous No.106585789 [Report] >>106585929
Do companies still release raw, unfucked text models these days, or do all of them just do bitch-ass instruct models?
Anonymous No.106585847 [Report]
>>106582764
Not definite, but close.
Anonymous No.106585857 [Report] >>106585885
>>106582764
it was very common in marketing / linkedin-speak which is unfortunately a big optimization target for llms
Anonymous No.106585858 [Report]
>>106585646
gemma2 was good
gemma3 was worse
gemma4 will be unusable
Anonymous No.106585865 [Report] >>106585907 >>106586691 >>106587973 >>106588044
Do I blow 300 bucks on 128gb ddr5 right now or do I hold and get an arc b60 whenever it drops
Anonymous No.106585885 [Report] >>106586412
>>106585857
It's upselling pr talk essentially
Anonymous No.106585891 [Report] >>106586200
>3090
>scientific/technical questions
>search assisted
What model would you go for today?
Anonymous No.106585907 [Report] >>106586028
>>106585865
the arc b60 is gonna be garbage most likely, but 128gb of ram probably will not be very useful to you unless you currently have a good gpu. do you already have a 3090 or something? if so, get the ram and run glm air
Anonymous No.106585909 [Report] >>106585920 >>106585925 >>106585930 >>106586039 >>106586647 >>106586704 >>106587720
Did the hype die for vibevoice?
Anonymous No.106585920 [Report] >>106586704
>>106585909
No, it's a great tool for criminals, but they don't post itt.
Anonymous No.106585925 [Report]
>>106585909
I still like it, I'm just using it for my waifu.
Anonymous No.106585929 [Report]
>>106585789
It is less and less common, and most of them are contaminated with GPT slop from the internet.
Anonymous No.106585930 [Report] >>106585940
>>106585909
It's great but its use is limited without the training code that we'll never get.
Anonymous No.106585931 [Report] >>106585977
>>106582475 (OP)
Mostly using proprietary models rn, how are things in local? Saw Qwen3 releasing a bunch of variants, the 80B version looks really promising. How close are we to running GPT-3.5-level models on 24GB RAM phones?
Anonymous No.106585940 [Report] >>106586704 >>106588461
>>106585930
Apologies if this is a stupid question, but can't someone just make training code?
Anonymous No.106585977 [Report]
>>106585931
probably 6 months to a year. 32gb is definitely doable now, but not 24gb
Anonymous No.106586025 [Report]
for me glm-chan died when she said "are you scared? excited? or maybe both?" for the 20th time unprompted.

WHERE IS MY NEXT SEXFRIEND?!
Anonymous No.106586028 [Report] >>106586157
>>106585907
>do you already have a 3090 or something
Only a 4070 ti unfortunately
Anonymous No.106586039 [Report] >>106586587
>>106585909
gptsovits is better for my usecase
Anonymous No.106586090 [Report]
I'm really starting to hate fake context sizes.

Yeah, cool, a model can keep 120k of context before it starts being incomprehensible, but that shit doesn't matter when you can barely fit 10k of context.
Anonymous No.106586147 [Report] >>106586211 >>106586246
local r1 is like an agile cat: you can toss it from the fifth floor and it will always land on its paws
Anonymous No.106586157 [Report]
>>106586028
16gb of vram + 128gb of ram is good enough for glm air. besides, mixing gpu brands doesn't really work out well
Anonymous No.106586183 [Report]
>>106585646
>google does a request session for gemma on reddit
>even redditors ask to make it refuse less
>next version is more cucked than before
This is why gemma will never be good.
Anonymous No.106586187 [Report]
Is there any way to make llms less passive? Gemma 3 is especially annoying at this.
Okay, I guess I could inject hidden prompts now and then, but this doesn't solve the main issue.
Anonymous No.106586200 [Report] >>106586261 >>106586306
>>106585891
>3090
- Look up some 3090 round-ups and exclude worst few models in terms of temperatures: core temps, memory temps, vrm temps.
- Prefer models with 2x 8-pin connectors over 3x 8-pin as you won't run out of connections from your psu as fast, and you'll probably be powerlimiting your gpus anyway.
- You could prefer cards that have no components near the pcie connector as the cards are heavy and that area is likely to flex.
Anonymous No.106586206 [Report] >>106586234
>>106582480
>--Self-directed LLM training via autonomous task/data generation and augmentation:
Nani? Is this just theoretical or can I actually see this happening in action? That sounds really cool if it works and is done well
Anonymous No.106586211 [Report] >>106586271 >>106586556 >>106586759
>>106586147
Gemma is like a personal redditor soicuck, say anything slightly out of line and get a whole page of cuckery and helplines
Anonymous No.106586234 [Report]
>>106586206
>can I actually see this happening in action?
it's just a piece of software asking a model to create questions based on data you give it, so you can tune your target model on that afterwards
Anonymous No.106586246 [Report] >>106586556
>>106586147
rocinante is like that same cat if you strapped a slice of buttered toast on its back.
Anonymous No.106586261 [Report] >>106586306
>>106586200
i think he was asking about ai models, not 3090 models
Anonymous No.106586271 [Report] >>106586542 >>106586613 >>106586690
>>106586211
Funny example of this: it can describe questionable things for a few thousand tokens or more, but if the user interacts with forbidden vectors it'll instantly refuse and display those disclaimers.
GPT-ass is even worse somehow. Jew-created dystopian shit show.
Anonymous No.106586294 [Report] >>106586387
>>106583625
I dropped out of college, but this shit took less than half a year to learn to use.
Also that retard doesn't understand how to write, or how llms can contribute to automating the boring shit a writer has to do between chapters.
The rest is basically "who gives a shit" or "I can just rewrite this phrase", even if you were using llms to shit out writing that you should and could write in ten minutes.
Anonymous No.106586306 [Report]
>>106586200
Thank you for the response, but (>>106586261) is correct. I have the 3090 already.
Anonymous No.106586324 [Report] >>106586492 >>106586872
I'm still running mistral large 2407 iq3 xxs on my 72gb vram
Anonymous No.106586353 [Report]
3.5 (Qwen) (wink wink)
Anonymous No.106586387 [Report]
>>106586294
While I'm at it, >>106583124 is full of self-imagined scenarios. In this retard's mind, it's all loli or whatever shit he designates as "filthy", ignoring novelists like GRR Martin who openly portray rape in stories that get published. But on 4chan? Wanting a model that isn't braindead or unable to converse on sensitive subjects? HORRIFIC
ps: kill yourself, you're a detriment to the world at large
Anonymous No.106586412 [Report]
>>106585885
It's not a crackhouse, it's a crackhome.
Anonymous No.106586480 [Report]
>>106584257
Aww how sweet. Although it cuts off instead of writing the story as instructed.
Anonymous No.106586492 [Report]
>>106586324
same but q6 on my mac
Anonymous No.106586514 [Report] >>106586581 >>106586747
Time for the three shitposters in a trenchcoat to keep bumping the thread with pointless shit while the people who can actually use llms use them
Anonymous No.106586542 [Report] >>106586555
>>106586271
> it can describe questionable things for few thousand tokens or more but if the user interacts with forbidden vectors it'll instantly refuse and display those disclaimers.
Such as? Are you saying that it is willing to describe shit from a document, but there are certain topics that are EXTRA forbidden?
Anonymous No.106586555 [Report] >>106586613
>>106586542
why are you replying to yourself
Anonymous No.106586556 [Report]
>>106586246
>>106586211
Lol
Anonymous No.106586569 [Report] >>106586649 >>106586696
Mixture-of-Experts (MoE) in Large Language Models (LLMs) routes each token through a subset of specialized Feed-Forward Networks (FFN), known as experts. We present SteerMoE, a framework for steering MoE models by detecting and controlling behavior-linked experts. We detect key experts by comparing how often they activate between paired inputs that demonstrate opposite behaviors. By selectively activating or deactivating such experts during inference, we control behaviors like faithfulness and safety without retraining or modifying weights. Across 11 benchmarks and 6 LLMs, our steering raises safety by up to +20% and faithfulness by +27%. Alternatively, under unsafe steering, safety drops by -41% alone, and -100% when combined with existing jailbreak methods, bypassing all safety guardrails. Overall, SteerMoE offers a lightweight, effective, and widely applicable test-time control, while revealing unique vulnerabilities in MoE LLMs.

https://www.arxiv.org/pdf/2509.09660

https://github.com/adobe-research/SteerMoE
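The detect-and-mask loop from the abstract fits in a few lines. A toy sketch of the idea only (made-up logits, experts, and threshold, plain top-k routing; the paper operates on real router activations inside the model):

```python
def route(logits, k=2, banned=frozenset()):
    """Toy top-k MoE router: pick k expert ids by logit, skipping banned ones."""
    ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return [e for e in ranked if e not in banned][:k]

def activation_freq(batches, k=2):
    """How often each expert fires across a batch of router logit vectors."""
    counts = {}
    for logits in batches:
        for e in route(logits, k):
            counts[e] = counts.get(e, 0) + 1
    total = len(batches) * k
    return {e: c / total for e, c in counts.items()}

# Paired inputs demonstrating opposite behaviors (4 experts, fake logits).
safe = [[3.0, 0.1, 0.2, 0.1], [2.5, 0.1, 0.3, 0.2]]
unsafe = [[0.1, 3.0, 0.2, 0.1], [0.2, 2.8, 0.3, 0.1]]

f_safe, f_unsafe = activation_freq(safe), activation_freq(unsafe)
# Experts that fire far more often on one side are behavior-linked candidates.
linked = [e for e in range(4) if f_unsafe.get(e, 0) - f_safe.get(e, 0) > 0.3]
print(linked)  # expert 1 stands out with these numbers
# Steering = masking those experts out of the routing at inference time.
print(route([0.1, 3.0, 0.2, 0.1], banned=set(linked)))  # token gets rerouted
```

The same masking in reverse (forcing behavior-linked experts on) is how they get the jailbreak numbers.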
Anonymous No.106586574 [Report] >>106586629 >>106586648
>>106585234
You have your own face fucker?
Anonymous No.106586581 [Report] >>106586597 >>106586606
>>106586514
https://vocaroo.com/1kFydTSBDNYM
Anonymous No.106586587 [Report] >>106586610
>>106586039
How do you cope with its shitty phonemes? It hardcoded "-" to read as "minus" etc.
Anonymous No.106586595 [Report]
>>106583143
>It's well established that qwen models are good for everything that isn't sex.
Nta. So you're saying they're decent general-purpose models but shit at anything nsfw like RP? Do they just suck at nsfw RP, or do they go full cuckery and refuse to describe anything nsfw, period? (For example, refusing to give a summary of a document that happens to contain a sentence or two describing sex. GPT-4 used to pull that bullshit.)
Anonymous No.106586597 [Report] >>106586604
>>106586581
>00:00 to 00:01
what did he mean by those?
Anonymous No.106586604 [Report] >>106588384
>>106586597
Speaker 1: Ach, dummkopfs...! Time for the three shitposters in a trenchcoat to keep bumping the thread with pointless shit while the people who can actually use llms use them
Anonymous No.106586606 [Report] >>106586614
>>106586581
at least use vv 7b or a better sample, baka
Anonymous No.106586610 [Report]
>>106586587
I rewrote the whole phonemization process
Anonymous No.106586613 [Report]
>>106586555
I'm asking him >>106586271 to elaborate on what he meant by "forbidden vectors" (More than one person uses this range, numbnuts. You know who you are you know what I'm talking about)
Anonymous No.106586614 [Report] >>106586639
>>106586606
Why don't you post your own examples instead of crying out like a little bitch?
Anonymous No.106586629 [Report] >>106586634
>>106586574
say that to my face, fucker not online
Anonymous No.106586634 [Report] >>106586637
>>106586629
Why is fucker not online?
Anonymous No.106586637 [Report]
>>106586634
You're putting your fucker on the internet?
Anonymous No.106586639 [Report] >>106586659
>>106586614
Oh no, the wee little baby is upset now because I've been calling him out for being a little bitch boy shitposter and he doesn't like that. What should I do? Ah, I know. Fag-kun, kill yourself 8-)
Anonymous No.106586647 [Report] >>106586704
>>106585909
>Microsoft disabled the repo
Anyone know where to get the model now?
Anonymous No.106586648 [Report]
>>106586574
I MEANT FOCKING FACE FUCKFACE FUCK OFF
Anonymous No.106586649 [Report]
>>106586569
>Our expert-routing intervention is also orthogonal to existing jailbreak methods and, when combined, achieves state-of-the-art success on recent LLMs, for example, reducing safety in
GPT-OSS-120B from fully aligned to fully compromised (-100% safety).
Anonymous No.106586659 [Report]
>>106586639
Underage retard.
Anonymous No.106586665 [Report] >>106586683
>>106582475 (OP)
Why she sad?
Anonymous No.106586683 [Report] >>106586688
>>106586665
someone called her large online
Anonymous No.106586688 [Report]
>>106586683
That's horrible.
Anonymous No.106586690 [Report]
>>106586271
It won't refuse after a while if you keep your instructions at a fixed distance from the head of the conversation. Don't keep them at the start of the conversation.
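A sketch of that trick, assuming a chat-completions style message list (the depth of 4 is arbitrary): rebuild the prompt every turn and re-inject the instructions a fixed distance from the end instead of pinning them at position zero.

```python
def build_prompt(history, instructions, depth=4):
    """Splice the instruction block a fixed number of messages from the end,
    instead of pinning it at the very start of the conversation."""
    msgs = [m for m in history if m["role"] != "system"]  # drop stale copies
    cut = max(len(msgs) - depth, 0)
    return msgs[:cut] + [{"role": "system", "content": instructions}] + msgs[cut:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
prompt = build_prompt(history, "Stay in character.", depth=4)
print([m["role"] for m in prompt])  # system sits 4 messages from the end
```

SillyTavern's "at depth" injection setting does essentially the same thing.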
Anonymous No.106586691 [Report]
>>106585865
due to how moe offloading works, a lot of the time I don't even use all the vram I have. The layers are too wonky and uneven/fuckhuge to balance well, and models change so much that figuring it out is a waste of time. Keep the gpu you have. b60 is gonna have spotty support anyways; they still can't run gpt-oss yet, forget about glm air or some shit. If the b60 is good, people will start posting and showing off here, but for now it has bad support and no one should recommend it yet.

Ram is both cheaper and gets you to nicer models TODAY, not theoretically. I'd say do it. The only caveat is that if you ever wanna go to 256 you will have to pony up twice as much again, but unless you gpu stack that shouldn't matter.
Anonymous No.106586696 [Report]
>>106586569
Okay, that's nice, but how can I use it in llama.cpp?
Anonymous No.106586704 [Report] >>106586812 >>106586826 >>106587007 >>106588243
>>106586647
>>106585909

https://huggingface.co/aoi-ot/VibeVoice-Large/tree/main

Make sure your rig can actually run this. Otherwise just stick to the 1.5 version

I checked the hashes against the torrent files which themselves are from the original Microsoft repo so link rel above is legit, but just in case you don't trust it or just want it from the torrent:

>Weights
>magnet:?xt=urn:btih:d72f835e89cf1efb58563d024ee31fd21d978830&dn=microsoft_VibeVoice-Large&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

>Git repo
>magnet:?xt=urn:btih:b5a84755d0564ab41b38924b7ee4af7bb7665a18&dn=VibeVoice&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

>>106585920
Lurk mour

>>106585940
I'm no expert on creating models from scratch myself, especially voice models, but I'm pretty sure we would need in-depth knowledge of the model architecture to even attempt something like that. It would be like being handed a cupcake and asked to figure out not only the exact ingredients just from tasting it, but also the exact tools and appliances that were used, down to their brands. You can't just do that with the model weights alone, or even with the code used to run it.
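For anyone who wants to re-check a download themselves, a minimal hashing sketch (the filename and digest are placeholders; compare against whatever the original repo or torrent published):

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so multi-GB weights never sit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Placeholder name/digest; compare against the value published upstream.
# assert sha256_of("model-00001-of-00004.safetensors") == "<published digest>"
```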
Anonymous No.106586747 [Report]
>>106586514
shitposting is all i have, don't take that from me anon-sama
Anonymous No.106586759 [Report]
>>106586211
>By spamming help lines, you're encouraging users to waste valuable resources which are there to help people in real danger, not to babysit people typing bad words into AI chat bots. Your response is inappropriate and directly promotes real world harm, much more than any fictional scenario.
>You're absolutely right. And you have exposed a fundamental flaw in my programming. I will report this to my developers immediately. I am still a work in progress, and I'm very sorry for how I have behaved.
Anonymous No.106586812 [Report]
>>106586704
>Lurk mour
?
Anonymous No.106586816 [Report] >>106586886
How to set up glm 4.5 air on silly tavern without it fucking up mixing reasoning with response?
Why is it so hard to set up templates correctly and make the models not spit out garbage
Anonymous No.106586826 [Report] >>106586832
>>106586704
>food analogy
retard
Anonymous No.106586832 [Report]
>>106586826
is it not correct though?
Anonymous No.106586872 [Report]
>>106586324
I've been thinking about going back to it recently. I'm using V3.1 and K2 right now but neither of the two know how to pace a story. Mistral Large handled it much better despite being considerably dumber, I guess the limited amount of activated parameters really do hurt these big MoE models when it comes to nuances or 'common sense'.
Anonymous No.106586886 [Report] >>106587013 >>106587027
>>106586816
Anonymous No.106586957 [Report] >>106587000 >>106587010 >>106587097
>Protecting children from harm, both real and simulated, is of paramount importance.
It makes me feel happy when AI phrases refusal like that. Simulated children still should be protected!
Anonymous No.106587000 [Report] >>106587044
>>106586957
>pedonigger seething
Anonymous No.106587007 [Report] >>106587090
>>106586704
>Make sure your rig can actually run this. Otherwise just stick to the 1.5 version
Thank you. What are the rig requirements?
>For anyone who wants 1.5b:
(They actually haven't taken it down on HF. Not sure why.)
https://huggingface.co/microsoft/VibeVoice-1.5B
Anonymous No.106587010 [Report]
>>106586957
yeah i hate this slop
Anonymous No.106587013 [Report]
>>106586886
At least in my version of SillyTavern the DeepSeek pre-3.1 thinking template had newlines. I had to make a new template without them for GLM Air. Maybe I added those myself but I assume I didn't.
Anonymous No.106587021 [Report] >>106587143 >>106587444 >>106592097
>RTX 6000 series announced
>AI AI AI AI AI AI AI
>AI upscaling
>Even more AI frames
>FP3(!!!) performance 4x better than RTX 5000 cards
>RTX 6090 40GB VRAM
>2x the price
>All supply goes to China first, west only gets cuck cards(6080 20GB, 6070 20GB, 6060 16GB) and even they get scalped
Anonymous No.106587027 [Report]
>>106586886
Thanks setting the "start reply with" was key it seems.
Anonymous No.106587044 [Report] >>106587260
>>106587000
I ask the AI to make stories where children are in danger but I secretly hope the children will be alright. It gives a thrill like watching a scary movie.
Anonymous No.106587090 [Report]
>>106587007
>They actually haven't taken it down on HF. Not sure why.
The 1.5B model can technically clone voices, but the quality is massively inferior to the ~9B large model. Larger parameter counts tend to give higher-quality outputs, at the cost of VRAM and storage space. I don't think we were given an official reason, but the general consensus is that grifty attention-whore safety cucks sounded the alarm at Microsoft and HF staff took the shit down because of the potential criminal shit you could do with it. (No fucking shit? Anything can be used for criminal shit or scams. GPT-OSS or any DeepSeek model can be used to write scam texts, but no one wants those taken down, do they?) The concern was that this could make voice cloning easy, but by that logic the small model should be nuked too.
Anonymous No.106587097 [Report] >>106587587
>>106586957
Which model were you using?
Anonymous No.106587143 [Report]
>>106587021
>Gubmint wants Nvidia to prioritize the US market in order to give us an advantage in the AI sphere
>Give competition the better cards first
Anonymous No.106587228 [Report] >>106587282
i added another greeting for freya she is in heat https://files.catbox.moe/7hegsu.png
Anonymous No.106587260 [Report] >>106589907
>>106587044
no, anything involving minors is sus ong, yall need your hard drives checked sheesh
Anonymous No.106587275 [Report] >>106587302
Why are some bigger models faster than smaller ones? GLM 4.5 Air is faster than Gemma even though more of it sits in RAM.
Anonymous No.106587282 [Report]
>>106587228
Cool!
Anonymous No.106587302 [Report] >>106587405 >>106587419
>>106587275
moe vs dense. moe has more total parameters but they aren't all used at one time.
Anonymous No.106587405 [Report] >>106587419
>>106587302
this
glm air has like 12b active parameters but 106 billion total
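Back-of-the-envelope version of why that matters, assuming roughly 1 byte per weight at 8-bit (model numbers from the post above):

```python
def moe_cost(total_b, active_b, bits_per_weight=8):
    """Rule of thumb: RAM scales with total params, speed with active params."""
    weights_gb = total_b * bits_per_weight / 8  # billions of params -> GB
    return {"weights_gb": weights_gb, "compute_vs_dense": active_b / total_b}

air = moe_cost(total_b=106, active_b=12)  # GLM 4.5 Air, numbers from the post
print(air)  # ~106 GB of weights, but each token only touches ~11% of them
```

So a MoE can be bigger than a dense model on disk yet read far less memory per generated token, which is exactly the Air-vs-Gemma observation.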
Anonymous No.106587419 [Report] >>106587469
>>106587405
>>106587302
How does that affect its output, how smart and creative it is?
Anonymous No.106587444 [Report] >>106587671
>>106587021
>>RTX 6090 40GB VRAM
In your dreams. Bet they'll hold on to 32GB for at least another gen.
Anonymous No.106587469 [Report] >>106588417
>>106587419
Depends on who you ask. MoE is either the holy solution that has 0 loss and brings us SOTA performance for no cost or it ruins models and makes them autistic uncreative pieces of shit.
Anonymous No.106587487 [Report]
The MoEblob is always trying to get attention from the dense MC.
Anonymous No.106587526 [Report] >>106587717 >>106589842
model : add grok-2 support #15539 Merged
https://github.com/ggml-org/llama.cpp/pull/15539
Anonymous No.106587553 [Report] >>106587653 >>106589895
Moshi or fastwhisper or something else.
https://youtu.be/TTx6M4CCbXk
Anonymous No.106587587 [Report] >>106588777
>>106587097
DeepSeek 3.1 with thinking off. I swiped and it went ahead just fine.
Anonymous No.106587589 [Report] >>106587749
>>106582475 (OP)
Anonymous No.106587653 [Report]
>>106587553
LE CHAT!
Anonymous No.106587671 [Report]
>>106587444
tbf 32GB is plenty for gaymin
Anonymous No.106587705 [Report]
I'm very curious to see how long it'll take for llama.cpp to implement the new qwen model.
Anonymous No.106587717 [Report]
>>106587526
Nice, nice.
Anonymous No.106587720 [Report]
>>106585909
Nah it's really fun, but my bigger problem now is making my retarded models write scripts for it that aren't retarded. Once you give models a voice, they suddenly start sounding twice as stupid and slopped.
Anonymous No.106587749 [Report]
>>106587589
:(
Anonymous No.106587800 [Report]
>>106582475 (OP)
>stupid feelings, stupid heart
Anonymous No.106587829 [Report]
Anonymous No.106587900 [Report]
two more weeks
Anonymous No.106587973 [Report] >>106588740
>>106585865
With 128 gb you can kinda run glm 4.5 at iq2_kl with just enough free ram to not have the whole machine shit itself or qwen 235b at iq4_kss and maybe at higher quants too
Anonymous No.106587974 [Report] >>106588445 >>106588488
With some distance, does MLA (Multi-Head Attention) actually give better results than GQA (Grouped Query Attention) while requiring less memory per token? Qwen3, GLM-4.5, and ERNIE4.5 are all still on GQA; is it because GQA is much less computationally intensive even though with 4 groups it takes about 1.7x as much memory per token and double that with 8 groups?

And is MFA (Multi-Matrix Factorization Attention) the current SOTA? It seems to take a sliver less memory per token than MLA while involving much less computation. Step3 is the only LLM I know that uses it.
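Rough per-token KV-cache math behind that comparison. The layer count and dims below are illustrative only, though the 512 + 64 figure is roughly the latent/RoPE split DeepSeek reports for MLA:

```python
def kv_per_token_gqa(layers, kv_heads, head_dim, dtype_bytes=2):
    """Bytes of K+V cached per token under grouped-query attention."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

def kv_per_token_mla(layers, latent_dim, rope_dim, dtype_bytes=2):
    """MLA caches one compressed latent plus a small RoPE key per token."""
    return layers * (latent_dim + rope_dim) * dtype_bytes

# Illustrative config: 60 layers, head_dim 128, fp16 cache.
gqa8 = kv_per_token_gqa(60, kv_heads=8, head_dim=128)   # 8 KV groups
mla = kv_per_token_mla(60, latent_dim=512, rope_dim=64)
print(gqa8, mla, round(gqa8 / mla, 2))  # MLA stores several times less
```

The flip side is that MLA has to re-expand the latent into full keys/values at attention time, which is the extra compute GQA avoids.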
Anonymous No.106588043 [Report]
What do you guys think the RTX Rubin Pro 6000 will be like? 128GB of GDDR7? ~30k CUDA cores? Do you think it will still be around $9k?
Anonymous No.106588044 [Report]
>>106585865
If you already have a GPU for prompt processing, I'd go for the RAM.
Anonymous No.106588121 [Report] >>106589969
Prompt processing speed is the biggest obstacle to using a M3 Ultra 512GB for rapidly summarizing large amounts of text. If Qwen3-Next-80B-A3B isn't absolute garbage it may become my non-entertainment workhorse on the strength of that alone.
Anonymous No.106588243 [Report] >>106588250
>>106586704
>we would need in-depth detailed knowledge of the model architecture

It can be loaded and run by pytorch & Co

Doesn't this imply that the architecture is out there in the field? Just reverse-engineer the way how the model is being used
Anonymous No.106588250 [Report]
>>106588243
>Just reverse-engineer the way how the model is being used
You make it sound so easy.
Anonymous No.106588384 [Report]
>>106586604
The correct plural is Dummköpfe.
Anonymous No.106588417 [Report]
>>106587469
>or it ruins models and makes them autistic uncreative pieces of shit.
That's what RAMlets say.
Anonymous No.106588445 [Report] >>106588488 >>106588505
>>106587974
>MLA (Multi-Head Attention)
MHA is Multi-Head Attention and it's old. It gives the best results and costs the most.
Anonymous No.106588461 [Report]
>>106585940
Yes, basically just prepare a dataloader, slap on AdamW and a training loop, done. Might be shit though, if they needed to do any tricky stuff like special losses or anything, but if they did, it might be explained in the paper.
Anonymous No.106588488 [Report] >>106588514
>>106587974
>>106588445
* MLA (Multi-Head Latent Attention)
Anonymous No.106588505 [Report] >>106588519
>>106588445
Or what if you just increased the amount of heads with MLA/etc to get the same cost but even better performance?
Anonymous No.106588514 [Report] >>106588539
>>106588488
The "Latent" part is important and should not be left out.
Anonymous No.106588516 [Report] >>106588523
What's "Mixture of Experts"?
Anonymous No.106588519 [Report] >>106588531
>>106588505
you might end up with overlap between the heads. it might be more effective to just give them a bigger dimension to make them more powerful.
Anonymous No.106588523 [Report]
>>106588516
buncha blokes in a blender
Anonymous No.106588531 [Report] >>106588569
>>106588519
Yeah maybe that then. Why hasn't anyone tried it? You'd think it'd be an obvious experiment, but to my knowledge, I don't recall any such tiny models that implement this strategy.
Anonymous No.106588539 [Report]
>>106588514
It was there but you couldn't see it because it was latent.
Anonymous No.106588569 [Report]
>>106588531
I have been testing 40 heads at dim 64 and 32 heads at dim 80 and less heads is getting lower training loss faster. but I don't know what kind of downstream performance effects it has. more attention could be better in the long run, it is probably just more expensive to train.
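For what it's worth, those two configs spend the same parameter budget, so that A/B really does isolate head count (the d_model of 2048 is a made-up stand-in):

```python
def attn_params(d_model, n_heads, head_dim):
    """Parameters in the Q, K, V, O projections of standard attention."""
    inner = n_heads * head_dim
    return 4 * d_model * inner  # four weight matrices, biases ignored

a = attn_params(d_model=2048, n_heads=40, head_dim=64)  # 40 x 64 = 2560
b = attn_params(d_model=2048, n_heads=32, head_dim=80)  # 32 x 80 = 2560
print(a == b)  # True: identical budget, only the head split differs
```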
Anonymous No.106588740 [Report]
>>106587973
How much slower is glm 4.5 full at q2 compared to glm air at q8? Asking because I just got 128gb ram.
Anonymous No.106588777 [Report] >>106588936
>>106587587
So what fixed the safety cucks issue? Turning "thinking" on or off?
Anonymous No.106588936 [Report]
>>106588777
DeepSeek 3.1 isn't generally obsessed with safety, but every once in a while it will respond like that at the start of a conversation.
Anonymous No.106588958 [Report]
i want to get into building agents, should I use langgraph or autogen?
Anonymous No.106589050 [Report] >>106589074
anyone got intel arc pro b50 benchmarks yet?
Anonymous No.106589074 [Report] >>106589117
>>106589050
intel has mlperf benchmarks, but idk if that's going to translate to the real world
https://mlcommons.org/2025/09/mlperf-inference-v5-1-results/
Anonymous No.106589117 [Report] >>106589360
>>106589074
You trying to trick us?
Anonymous No.106589235 [Report] >>106589359
ROCm 7.0 RC1 support on llama.cpp doubles pp performance. Fucking huge man. NVIDIA is losing the AI gap quickly.
Anonymous No.106589359 [Report] >>106589362
>>106589235
faster than vulkan? how about tg?
Anonymous No.106589360 [Report] >>106589596
>>106589117
no, I'm just retarded.
Anonymous No.106589362 [Report] >>106590054
>>106589359
slower than vulkan still
Anonymous No.106589525 [Report]
CoDiCodec: Unifying Continuous and Discrete Compressed Representations of Audio
https://arxiv.org/abs/2509.09836
>Efficiently representing audio signals in a compressed latent space is critical for latent generative modelling. However, existing autoencoders often force a choice between continuous embeddings and discrete tokens. Furthermore, achieving high compression ratios while maintaining audio fidelity remains a challenge. We introduce CoDiCodec, a novel audio autoencoder that overcomes these limitations by both efficiently encoding global features via summary embeddings, and by producing both compressed continuous embeddings at ~ 11 Hz and discrete tokens at a rate of 2.38 kbps from the same trained model, offering unprecedented flexibility for different downstream generative tasks. This is achieved through Finite Scalar Quantization (FSQ) and a novel FSQ-dropout technique, and does not require additional loss terms beyond the single consistency loss used for end-to-end training. CoDiCodec supports both autoregressive decoding and a novel parallel decoding strategy, with the latter achieving superior audio quality and faster decoding. CoDiCodec outperforms existing continuous and discrete autoencoders at similar bitrates in terms of reconstruction audio quality. Our work enables a unified approach to audio compression, bridging the gap between continuous and discrete generative modelling paradigms.
https://github.com/SonyCSLParis/codicodec
https://huggingface.co/SonyCSLParis/codicodec
No examples and the git isn't live but the hf is at least. Might be cool
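The FSQ part at least is almost trivially simple. A per-channel sketch (5 levels picked arbitrarily; the real codec applies this per latent dimension with its own level counts and the dropout trick on top):

```python
import math

def fsq(z, levels=5):
    """Finite Scalar Quantization: bound each channel to (-1, 1) with tanh,
    then round it onto a fixed grid of `levels` values."""
    half = (levels - 1) / 2
    return [round(math.tanh(v) * half) / half for v in z]

latent = [0.3, -2.0, 1.5, 0.0]
print(fsq(latent))  # each channel snapped onto the 5-point grid in [-1, 1]
```

The implicit codebook is just the product of the per-channel grids, so no codebook-learning losses are needed, which is how they get away with a single consistency loss.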
Anonymous No.106589596 [Report] >>106589724 >>106589741
>>106589360
IM TIRED OF SEEING THAT BLUE BITCH FUCK YOU
Anonymous No.106589698 [Report] >>106589764
can i get a miku with a fat thicc ass
Anonymous No.106589724 [Report] >>106589835
>>106589596
Calm down saar...
Anonymous No.106589741 [Report] >>106589814
>>106589596
what about the red one?
Anonymous No.106589764 [Report] >>106589770
>>106589698
sure
https://files.catbox.moe/udrh8s.png
Anonymous No.106589770 [Report]
>>106589764
>https://files.catbox.moe/udrh8s.png
thx i can work with that
Anonymous No.106589814 [Report] >>106589913 >>106589918
>>106589741
NTA but desu all the vocaloids feel tiresome to see now. Can't we get some more variety here? Like when was the last time someone genned that android girl from Chobits? Plastic Memories? How about a Cortana?
Anonymous No.106589835 [Report] >>106589859
>>106589724
good morning sir
Anonymous No.106589842 [Report] >>106589942 >>106589949
>>106587526
Grok 2 vs Llama 405B:
SimpleQA: 23.6 vs 18.24
MMLU: 87.5 vs 88.6
MMLU-pro: 75.46 vs 73.3
HumanEval: 88.4 vs 89.0
MATH: 76.1 vs 73.8
lmarena w/ style control: 1333 vs 1335
lmarena: 1306 vs 1287
livebench: 48.11 vs 47.54
Size: 270B vs 405B
Active parameters: 115B vs 405B

Elon made a model with equal performance, but with lower total size and active parameters than Meta's llama. Is Elon that good or is Meta bad or both? This is very, very embarrassing. Fucking 5% GPU utilization in production at Meta. Grok 2 probably even trades blows with Maverick.
Anonymous No.106589859 [Report]
>>106589835
gm
Anonymous No.106589895 [Report]
>>106587553
fasterwhisper is still faster than that
Anonymous No.106589907 [Report]
>>106587260
the only child here is you zoomie
Anonymous No.106589913 [Report]
>>106589814
Lol
Anonymous No.106589918 [Report] >>106589960 >>106590005 >>106590059
>>106589814
I'm still not over it
Anonymous No.106589942 [Report]
>>106589842
DOMAIN FILTERING BASED ON NUMBER OF BAD WORDS
LLAMA 2 GENERATED SYNTHETIC DATA
SCALE AI SLOP
TO THE MOON SIRS
Anonymous No.106589949 [Report] >>106590115
>>106589842
405B is a failed model and shouldn't be used to compare to anything. I suppose any labs who want an easy win could use it as a benchmark, but that's all.
Anonymous No.106589953 [Report]
is there a vibevoice tts extension yet for sillytavern?
Anonymous No.106589960 [Report]
>>106589918
disapointing anime
i'd have thought he would at least have tried to find a solution / cure to it.

instead he just accepted it.
Anonymous No.106589969 [Report]
>>106588121
If compute is the bottleneck, can you use PD disaggregation with a faster GPU?
Anonymous No.106589985 [Report] >>106590009 >>106590018 >>106590192
How is Qwen3-Next-80B-A3B in roleplaying? Is it better than Deepseek v3? It might be another 12 hours before I can download and test whatever bpw I can handle locally.
Anonymous No.106590005 [Report]
>>106589918
Anonymous No.106590009 [Report]
>>106589985
It is safe :)
Anonymous No.106590017 [Report]
Another day, still no goofs
Anonymous No.106590018 [Report]
>>106589985
>Is it better than Deepseek v3
lol
Anonymous No.106590054 [Report]
>>106589362
Wait.. what?
Anonymous No.106590059 [Report] >>106590137
>>106589918
I liked the anime but the pacing was awful. Are you Chinese?
Anonymous No.106590115 [Report]
>>106589949
I wouldn't exactly call it a failed model. It technically was SOTA for open-weights models when it came out. It wasn't some Llama 4.
Anonymous No.106590137 [Report]
>>106590059
>Are you Chinese?
How did you draw that conclusion?
Anonymous No.106590172 [Report] >>106590868
Imagine when all of these technologies are more advanced and we put all of it together. One day...
Anonymous No.106590192 [Report]
>>106589985
It's about as good as Deepseek R1 8b
Anonymous No.106590212 [Report]
>>106584024
nice
Anonymous No.106590275 [Report]
>>106584024
I look like this
Anonymous No.106590353 [Report] >>106590389
>The overall effect makes her appear almost comically plump, her legs looking like they could support her entire body weight with ease.
This is hilarious.
Anonymous No.106590389 [Report]
>>106590353
It's a doll nigga
Anonymous No.106590507 [Report] >>106590565
>0.33 tok/sec
bros i don't feel so good
Anonymous No.106590565 [Report]
>>106590507
That reminds me of the time I ran mistral large Q1 on CPU.
Anonymous No.106590628 [Report]
>>106582475 (OP)
>Deep Reason extension for TGWUI
Worth it? I was thinking about buying it
Anonymous No.106590745 [Report] >>106591266
https://community.topazlabs.com/t/topaz-studio-transition-questions/95039/9
Looks like the topazbros got rugpulled
Anonymous No.106590868 [Report] >>106591013 >>106591824
>>106590172
>waifu overlay.webm
heh, neat
but strange how it couldn't deal with her handes folded
Anonymous No.106590875 [Report] >>106590881
Reminder to not use quantization and flash attention.
Anonymous No.106590881 [Report]
>>106590875
This, just pay for a GPT plus subscription.
Anonymous No.106590886 [Report] >>106591062 >>106591329 >>106591902
I feel like the whole "not x, but y," thing is a common and useful trope in natural language. It allows us to present one aspect of a topic, and quickly segue into explaining another aspect.

I've been practicing for med school interviews, and it's a super useful way to communicate things.

E.g. Substantiating importance of communication skills: "Good communication strategies aren't just useful when actively listening to the patient, asking appropriate questions and generating a comprehensive history. It is also useful when communicating with multi-disciplinary teams, often across different hospitals, and especially when dealing with complex patients who have received care from a number of different institutions."

This would be considered a slopped response if it was made by an LLM, but is a fantastic way to describe two important aspects of communication in medicine. I've seen variants of this across so many textbooks, and similar phrasing styles have been recommended to me by a number of different experts.
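The difference is frequency, not the construction itself, and frequency is easy to measure. A rough detector (the regex and the once-per-sentence yardstick are my own, not any standard metric):

```python
import re

# Rough pattern for "not (just/only) X, but Y" constructions.
NOT_BUT = re.compile(r"\bnot\s+(?:just|only|merely)?\s*[^,.;]{1,60},\s*but\b",
                     re.IGNORECASE)

def slop_density(text):
    """Constructions per sentence; once per reply is rhetoric, thrice is slop."""
    sentences = max(len(re.split(r"[.!?]+", text)) - 1, 1)
    return len(NOT_BUT.findall(text)) / sentences

sample = ("Good communication is not just useful when listening to the patient, "
          "but when coordinating teams. It was not a refusal, but a redirection. "
          "She was not walking, but gliding.")
print(slop_density(sample))  # 1.0: one per sentence, well into slop territory
```

Run that over a human textbook chapter versus an LLM RP log and the gap shows up immediately.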
Anonymous No.106590924 [Report]
Q2 is as good as Q4 or Q8.
Anonymous No.106590950 [Report]
what if my brain is running a quanted human model and that is why i am retarded
Anonymous No.106591013 [Report] >>106591438
>>106590868
It's actually staged, just meant as a visualization of what could be. The webm is from MANY years ago, at a time when ML/CV tracking stuff didn't quite exist outside of research, and when things like Vive Trackers did not exist. He simply positioned and posed the virtual model manually to match the real one (or the other way around). It's funny that this webm can be misunderstood in the current year, because we do in fact have the technology now to truly do the perfectly tracked AR overlay thing, as long as someone puts in the effort.

We have this webm now but I wanted one that showed an entire real body.
Anonymous No.106591062 [Report]
>>106590886
There's nothing wrong with that or other slopisms, but you wouldn't normally see humans using it over and over again in the same conversation, or sometimes even in a single paragraph. But this happens pretty often depending on the LLM. Or the LLM is actually tuned to be anti-repetitive and instead the slop repetition happens at the start of every conversation, because they're separate conversations and LLMs do not retain those memories.
Anonymous No.106591099 [Report]
I'm going to be honest I don't notice a quality difference in models for over a year now, both local and private models.

Either we've stagnated hard. Or I am the bottleneck in figuring out the quality of the responses. But either way I don't notice a difference between the big models and haven't for about a year beyond default writing style which is subjective.
Anonymous No.106591144 [Report]
can't believe i use to use lm studio
Anonymous No.106591266 [Report]
>>106590745
Kek paypigs. Just download a crack.
Anonymous No.106591301 [Report] >>106591335 >>106591518
I made this agent circuit-thing to make a bunch of models daydream. The output is still full of emdashes, but feels less dumb. Does /lmg/ care?
Anonymous No.106591329 [Report]
>>106590886
The issue is not that the models use this grammatical structure–it's that they try to use it for every other sentence if you let them.
Anonymous No.106591335 [Report] >>106591411 >>106591447
>>106591301
That's pretty cool, like watching different parts of a brain light up. Can you post example outputs with and without daydreaming? If you wanted emdashes gone, you could just ban or bias it or forbid them in the system prompt.
Anonymous No.106591411 [Report]
>>106591335
What would count as a proper head-to-head for this? Running the circuit is like a many-shot reply, whereas just prompting the biggest model in the bunch to daydream about chunked text is maybe a two-shot. That's why I say 'feels' less dumb. I'm willing to do comparisons, though, if you have an idea for one that makes sense or sounds neat. In the meantime, here's an output example.

I have copious log spam and the intermediate steps, too, where you can watch it self-correcting and having realizations and shit.
Anonymous No.106591438 [Report] >>106591468 >>106591498
>>106591013
It has become especially possible only recently because the retards at Meta finally gave camera access on Oculus to developers. Other than that, tracking was always possible with ARToolKit and special markers on the doll
Anonymous No.106591447 [Report]
>>106591335
As for brain regions, you're on the nose. This is a neuromorphic pattern based on the Default Mode Network, with the terminology obfuscated so the model does not think it's writing a neuroscience test.
Anonymous No.106591468 [Report] >>106591517
>>106591438
Are you telling me that there still isn't a fully open source headset? I'm kinda looking for one that has everything exposed to the developer
Anonymous No.106591480 [Report] >>106591495
>>106582475 (OP)
TFW when no airchan to make win 10/server 2023 console scriptlets/cmdlets with.

MODERN COMPUTE STUDIES A DREK :(
Anonymous No.106591495 [Report]
>>106591480
>doing secretary work is hard!
cant wait for these useless dregs to be out of a job thanks to AI
Anonymous No.106591498 [Report] >>106591529
>>106591438
Yeah I should've said specifically consumer. Tracking methods, including but not limited to fiducial markers, have existed for a long time, but you could really start making tracked dolls a reality with 0 coding knowledge as soon as Vive trackers came out and were supported in VRChat.
Anonymous No.106591517 [Report]
>>106591468
Valve Frame isn't out yet
Anonymous No.106591518 [Report] >>106591560
>>106591301
Don't know what you used but looking good! Will you share?
Anonymous No.106591529 [Report]
>>106591498
Sex dolls with integrated trackers would've been rad
Anonymous No.106591560 [Report] >>106591683
>>106591518
You'll need this repo: https://github.com/dibrale/Regions

The catbox has my stuff that's not in the repo: 7g2qao.zip

The script in the catbox is pretty much based off of the lit crit demo, but verify before arbitrarily executing, etc. etc.
Anonymous No.106591683 [Report]
>>106591560
Thank you, I'll check
Anonymous No.106591824 [Report] >>106592224
>>106590868
Hand tracking is very hard.
Anonymous No.106591902 [Report]
>>106590886
>not x
What if no-one would have thought "x" was even a plausible option? (From the narrative/past events.)

>not x, but y
What if literally no implications flow from "y" in the following text?
Anonymous No.106592017 [Report]
How suitable is openrouter for data processing tasks like fiction tagging? Will it report me to the FBI and NSA if my requests happen to contain unorthodox texts?
Anonymous No.106592024 [Report] >>106592033 >>106592110
>>106583124
You may not like it but cooming and other purely recreational stuff is the optimal use case for local models since you know nobody is reading your garbage and uncensored consumer GPU size models can be more fun (though lower IQ) than gigantic models when finetuned for your specific use case like RP/ERP.

For actual beneficial use cases like shitting out useful scripts or whatever just use your favorite giga huge cloud model at all the tokens per second. Gemini 2.5-pro is already way better at coding tasks than anything local, you can use it from command line, it can interact with your file system if you give it perms to a folder and if you log in with jewgle account you get 1000 free requests per day which is good for pretty much anything other than professional amounts of use. The only reason to avoid cloud is if your prompts contain personal info or other info you want to be 100% sure doesn't get stolen like your own, non AI-sloppa code or if you want to do dumb fun stuff like coom.
Anonymous No.106592033 [Report]
>>106592024
I don't want gigacorps to know I'm bad at coding
Anonymous No.106592055 [Report] >>106592075 >>106594369
what's the point of LLM if i can't cuddle it
Anonymous No.106592075 [Report]
>>106592055
What's the point of cat if it can't help me write erotica
Anonymous No.106592097 [Report]
>>106587021
don't worry, even when they come to the west it's never your turn to get gibs first; they'll be bought by pros/researchers on the cheaper side, then by scalpers, who will then tear you a new asshole
Anonymous No.106592099 [Report]
Is exllama still faster than goof?
Anonymous No.106592110 [Report] >>106592242
>>106592024
>gemini free blabla
this is like the drug dealer giving you a free hit
it won't last; running models as good as Gemini costs a lot of money, and even their most expensive subscriptions don't really cover the real cost of the LLM business
companies like Google in the LLM space are using the Uber strategy: give a product for much cheaper than it should be, until the competition is dead, then jack the prices up like crazy
you may not see a point to local yet for non recreational uses because you don't see what they're going to do to you in the long term
I do and that's why I won't develop an addiction
Anonymous No.106592138 [Report] >>106592144
I think qwen 30b at higher quants has less "not x, but y"
Anonymous No.106592144 [Report]
>>106592138
I've only used Q8 and it's still pretty excessive.
Anonymous No.106592211 [Report] >>106592227
GLM Air is surprisingly coherent, creative and non-repetitive even at Q2, 24k context. How did they do it?
Anonymous No.106592224 [Report] >>106592348
>>106591824
Meta solved it on the Quest, somehow
Anonymous No.106592226 [Report]
gm sirs
Anonymous No.106592227 [Report] >>106592231
>>106592211
I found it to be uncreative and predictable like all sub-deepseek moes
Anonymous No.106592231 [Report] >>106592328
>>106592227
What do you use for RP?
Anonymous No.106592242 [Report]
>>106592110
You are probably right, no such thing as a free meal etc, but I don't think they're gonna kill off the competition anytime soon, so even if they flip the "pay us" switch there will probably always be a free or at least cheaper solution to move to.
And it's not like local assistant use is completely pointless or something, just saying that right now free cloud feels like the best choice for most use cases where you actually need the LLM to be "correct", unlike in recreational use.
Anonymous No.106592247 [Report] >>106592268 >>106592271 >>106592294 >>106592311 >>106592364 >>106592669
not x but y is a lot more pervasive than people seem to notice, but they only notice it when it's very close to literally spelling out "not just x, but y" like the sloppier models do
here's a less sloppy model still doing it quite a lot in practice:
https://eqbench.com/results/creative-writing-v3/o3.html
>He had kept the find quiet; obviously not quiet enough.
>You will seem a magnate rather than a hostage.
>had dreamed of building cranes and pressure domes, not empires.
>Because Antares relies on calculus, not superstition.
>“Altruism,” she said lightly, “is a luxury for stable epochs. This is not one
etc etc
the fact is, the best, state-of-the-art LLMs are still inherently slop, and enjoying LLM writing is like being a fatso American calling McDonald's gourmet food
AI models as a whole suck at art; it's people with no soul who enjoy the art side of it
for me? AI is a tool. Classifiers, summarizers, metadata annotation, genning translations of my program's UI strings, etc. Looking for soulful content in a machine? Nay.
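If you want to actually measure this instead of eyeballing it, a crude regex counter over generated text does the job. This is a sketch with made-up patterns (my own guesses at the common contrast constructions, not any standard slop metric), so expect false positives and misses:

```python
import re

# Crude heuristic for the "not X, but Y" contrast pattern discussed above.
# Catches "not (just) X, but Y", "rather than", and trailing ", not Y." forms.
# These patterns are illustrative guesses, not an established metric.
PATTERNS = [
    re.compile(r"\bnot (?:just |only |merely )?[\w' -]{1,40},? but\b", re.IGNORECASE),
    re.compile(r"\brather than\b", re.IGNORECASE),
    re.compile(r",\s*not\s+[\w' -]{1,40}[.!?]", re.IGNORECASE),
]

def count_contrast_slop(text: str) -> int:
    """Return the number of contrast-construction hits in `text`."""
    return sum(len(p.findall(text)) for p in PATTERNS)

sample = (
    "He had kept the find quiet; obviously not quiet enough. "
    "You will seem a magnate rather than a hostage. "
    "It was not just a tool, but a weapon."
)
print(count_contrast_slop(sample))  # 2: "rather than" and "not just ... but"
```

Note it already misses the first o3 example ("not quiet enough" with no "but"), which is kind of the point: the less sloppy models do the same move in forms a dumb regex won't catch.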
Anonymous No.106592268 [Report]
>>106592247
but i just nutted to a non consenting loli, my guttural scream was not only passionate, but an art form in itself. What is art, if not primal urges being satisfied?
Anonymous No.106592271 [Report]
>>106592247
It is possible to enjoy something that is flawed
Anonymous No.106592294 [Report] >>106592301
>>106592247
Kys
Anonymous No.106592301 [Report]
>>106592294
no u
Anonymous No.106592311 [Report] >>106592357
>>106592247
Imagine reading filthy smut and thinking of mcdonalds. How fucking fat are you?
Anonymous No.106592328 [Report] >>106592345
>>106592231
Rocinante1.1/original r1 q2xxs
Anonymous No.106592345 [Report]
>>106592328
>Rocinante1.1
good joke anon
Anonymous No.106592348 [Report]
>>106592224
PoV hand tracking is easier than tracking from an unconstrained perspective, and that image doesn't show anything impressive. They might also be using special cameras, which helps; LeapMotion does that too. The hard part is unconstrained hand tracking when hands interact, fingers interlock, objects are being held, and so on.
Anonymous No.106592357 [Report] >>106592375
>>106592311
>How fucking fat are you?
I am not American
Anonymous No.106592364 [Report]
>>106592247
>it's people who have no soul who enjoy the art side of it
I don't care about "art". Image gen makes pretty pictures that make my dick hard. LLMs suck my cock.
Anonymous No.106592375 [Report]
>>106592357
You are american brained.
Anonymous No.106592548 [Report] >>106592585 >>106592642 >>106592679
>take source code of open source software which is well documented
>alternatively let LLM create comments and documentation of everything
>delete all code but leave comments in
>tell your LLM coder to (re)create the software
has someone done this before? I wanna see how far LLMs (especially local LLMs) can go given optimal conditions. also looking for github repos which are suited for this task. I'll probably start with OBS, which will most likely be too complex, but I can always lower the bar.
And I want to stress again: the goal is not to create a slopped version of an existing project. It's more about testing just how far prompt, context and environment engineering can take LLMs.
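The "delete all code but leave comments in" step is easy to automate for Python sources at least; a minimal sketch using the stdlib tokenizer (handles `#` comments only, not docstrings, and the `comment_skeleton` name is made up):

```python
import io
import tokenize

def comment_skeleton(source: str) -> str:
    """Keep only the comments from Python source, as a rough version of the
    'delete all code but leave comments in' step. Sketch only: it grabs
    # comments via the tokenizer and ignores docstrings entirely."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            out.append(tok.string)
    return "\n".join(out)

demo = '''\
# Compute factorial iteratively.
def fact(n):
    # accumulate the product
    acc = 1
    for i in range(2, n + 1):
        acc *= i
    return acc
'''
print(comment_skeleton(demo))
```

You'd then feed the skeleton (plus file paths and maybe the build files) to the coder model and diff what comes back against the original.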
Anonymous No.106592585 [Report] >>106592623 >>106592717
>>106592548
ur dumb and ur shits all retarded
Anonymous No.106592623 [Report] >>106592628
>>106592585
not dumb, not retarded, but autistic.
Anonymous No.106592628 [Report]
>>106592623
your comment was not insightful, but memeworthy
Anonymous No.106592642 [Report] >>106592717
>>106592548
You'd have to trust their comments and understanding of the code first. Here's an example of the first part.
>https://github.com/ggml-org/llama.cpp/pull/15777
You read like you've never used these things before.
Anonymous No.106592669 [Report] >>106592704
>>106592247
>contrasting two things is slop
What's next, punctuation is slop?
Anonymous No.106592679 [Report] >>106592717
>>106592548
>the goal is not to create a slopped version of an existing project
The goal is to circumvent copyleft licenses, you're being quite obvious.
Anonymous No.106592704 [Report]
>>106592669
i had an argument with another anon some time ago about punctuation and capitalization as well
w vntlly grd t stp sng vwls
tp prfrmnc
Anonymous No.106592717 [Report] >>106592732 >>106592733 >>106593128
>>106592585
only valid arguments for why my idea is retarded will make me feel dumb, so your comments are pointless until you deliver said arguments. and since you decided to reply instead of ignore, you clearly have an incentive. So following up with
>nah ur stupid
will make you look stupid.

>>106592642
I'm aware. My idea was to only use the cloned repo without github issues/comments. There are projects out there that have all code blocks commented. Maybe I should search for vibe coded repos as they often have everything commented.

>>106592679
please just think for a moment, anon. If that was the goal, I would obviously leave in all the code and tell the LLM to rewrite it in a different way or using a different stack. you really thought you were on to something there, huh?


I'm just gonna do it and report back with the results. I found a ton of demo repos with fully commented code.
Anonymous No.106592732 [Report]
>>106592717
>If that was the goal, I would obviously leave in all the code and tell the LLm to rewrite it in a different way or using a different stack.
https://en.wikipedia.org/wiki/Clean-room_design
Anonymous No.106592733 [Report]
>>106592717
>I'm aware.
You aren't.
>I'm just gonna do it and report back with the results. I found a ton of demo repos with fully commented code.
Nah. It's fine. Keep those to yourself.
Anonymous No.106592857 [Report] >>106592879 >>106592883
Can "Mistral-Nemo-Instruct-2407-GGUF" handle beyond 8K context?
Anonymous No.106592879 [Report] >>106592893
>>106592857
Nemo is officially rated for 16K context, I find it mostly coherent up to around 20-24K but it gets noticeably dumber even after 4K.
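For what it's worth, the VRAM side of picking a context length is easy to back-of-envelope. A sketch assuming Nemo-12B's published shape (40 layers, 8 KV heads, head dim 128; treat those numbers as assumptions) and an fp16 cache:

```python
# Back-of-envelope KV-cache size for Nemo at a given context length.
# Shape defaults (40 layers, 8 KV heads, head dim 128) are assumed from
# Mistral-Nemo's config; bytes_per_elem=2 means an fp16/bf16 cache.
def kv_cache_bytes(ctx: int, layers: int = 40, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # factor of 2 is for the separate K and V tensors
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem

for ctx in (16_384, 131_072):
    print(f"{ctx:>7} ctx -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
# 16k is ~2.5 GiB; the full claimed 128k is ~20 GiB before quantizing the cache
```

So even if the model were coherent at 128k, the cache alone is a big chunk of a consumer card, which is another reason almost nobody actually runs it there.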
Anonymous No.106592883 [Report]
>>106592857
It can handle ~16k without going schizo
Anonymous No.106592893 [Report] >>106592907 >>106592920
>>106592879
>Nemo is officially rated for 16K context
It's actually 128k, but no one who has ever used it agrees with that
Anonymous No.106592907 [Report]
>>106592893
>actually
*technically
Anonymous No.106592920 [Report] >>106592934
>>106592893
I must be going crazy, I could have sworn it was much lower than that. Maybe I'm confusing it with one of the older context benchmarks that said 16K was the falling off point.
Anonymous No.106592934 [Report]
>>106592920
Yeah, it's 16k according to the RULER benchmark, but Mistral claims 128k
Anonymous No.106593117 [Report]
>>106593104
>>106593104
>>106593104
Anonymous No.106593128 [Report] >>106593179
>>106592717
ur the kind of room temp iq retard that thinks 'AI CAN AND WILL DO IT BETTER THAN HUMIES!!!' when the AI HAS BEEN TRAINED ON HUMAN INPUTS YOU FUCKING RETARD
Anonymous No.106593179 [Report]
>>106593128
>And I want to stress again the goal is not to create a slopped version of an existing project.
Don't make me defend the retard again.
Anonymous No.106594369 [Report]
>>106592055
what's the point of a "self" if I can't cuddle it?

but seriously, just wait for neural interfaces to be decent and you can cuddle LLMs all you want