/lmg/ - Local Models General - /g/ (#106034671)

Anonymous
7/26/2025, 5:20:57 PM No.106034671
real_local_waifu
md5: 092b1dddc2aee94eff77efdabb25a44e🔍
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106022725 & >>106011911

►News
>(07/26) Intern-S1 released: 235B multimodal reasoning model: https://hf.co/internlm/Intern-S1
>(07/25) Qwen3-235B-A22B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-235B-A22B-Thinking-2507
>(07/24) Magistral Small 1.1 update released: https://hf.co/mistralai/Magistral-Small-2507
>(07/24) YUME interactive world generation model released: https://stdstu12.github.io/YUME-Project
>(07/22) Higgs Audio v2 released: https://www.boson.ai/blog/higgs-audio-v2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>106034733 >>106034798 >>106035079
Anonymous
7/26/2025, 5:21:33 PM No.106034680
__hatsune_miku_vocaloid_drawn_by_aosaki_yato__33e82cf706871f75e04fa7924c5b157f
►Recent Highlights from the Previous Thread: >>106022725

--Confidential computing may protect against providers but not against corporations or determined attackers:
>106032051 >106032109 >106032125 >106032847 >106032906 >106032943 >106033227
--Pretraining data augmentation and synthetic recycling as emerging industry standards:
>106032565 >106032668 >106032775 >106033083 >106033222 >106033274
--Optimizing 24B model inference on 16GB GPU via tensor offloading and quantization:
>106030678 >106030699 >106030726 >106030746 >106030888 >106030911 >106030936 >106030952 >106031007 >106031309 >106031127 >106031129
--Meta AI's leaked system prompt reveals poor prompt engineering with excessive negative directives:
>106030482 >106033266 >106033329 >106033349 >106033368 >106033422
--LLMs as unreliable standalone tools but useful when integrated with traditional systems:
>106025198 >106025285 >106025482 >106025526 >106025721 >106025949 >106026069 >106028082
--ST fails to remove thinking blocks starting with parentheses from context:
>106028030 >106028084 >106028130 >106029199 >106029213 >106029266
--Anon builds minimal terminal interface for LLMs:
>106024282 >106024421 >106024944 >106024758
--Intern-S1 is a Qwen3 fine-tune with 235B MoE architecture and multimodal support:
>106031261 >106031267 >106031296 >106031307 >106031358 >106031277 >106031280
--Small 350M model shows strong coherence with custom tokenizer:
>106034024 >106034064 >106034083 >106034143 >106034205
--Challenges in building persistent, stat-aware LLM-driven RPGs locally:
>106033668 >106033772 >106033817
--Anon shares thoughts on NuQwen Thinker for RP:
>106024560 >106024601 >106024668
--Links:
>106033095 >106023407 >106029240 >106023197
--Miku (free space):
>106022834 >106022983 >106023491 >106026316 >106029061 >106029200 >106030177 >106031397

►Recent Highlight Posts from the Previous Thread: >>106022743

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Replies: >>106034773
Anonymous
7/26/2025, 5:26:36 PM No.106034733
>>106034671 (OP)
Never seen this miku before
Anonymous
7/26/2025, 5:29:31 PM No.106034773
>>106034680
>no (You)s
I've been slacking.
Anonymous
7/26/2025, 5:31:20 PM No.106034798
>>106034671 (OP)
Cool local cyberllama
Anonymous
7/26/2025, 5:33:51 PM No.106034824
>>106009695
hi, if you're still here and the rig is too, can you send your discord to this email?
gout_margin330@simplelogin.com
PS: would you be willing to ship it halfway across the world to India/Germany? I'll pay the shipping of course.
Replies: >>106034951
Anonymous
7/26/2025, 5:47:03 PM No.106034951
>>106034824
I hope you're planning to use it for something other than LLMs because 4 channels of DDR4 is beyond useless.
Replies: >>106035276 >>106035653
Anonymous
7/26/2025, 5:47:42 PM No.106034957
>>106034746
>>106034786
Even with no CUDA the $2k Strix Halo boxes and $10k Mac Studios are still selling, and even literal whos like Rockchip can manage to offer their own inference software stack for their SBCs
If you offer (perceived) value, people will buy your shit even with all the pains in the ass. The stupid part of that concept is I doubt you can R&D and sell what would essentially be high-end server hardware at bargain-bin prices and still make any money off of it
Replies: >>106034995 >>106035282
Anonymous
7/26/2025, 5:51:09 PM No.106034995
>>106034957
inference maybe but I think nvidia still has the stranglehold on the training market
Replies: >>106035043
Anonymous
7/26/2025, 5:55:28 PM No.106035040
I would like to gloat and proclaim that I was probably right about my fan theory that the main reason reasoning works is that it improves / fixes attention. It was kinda obvious to me that with reasoning the model basically sums up the context so far, which then puts those summary tokens up top and makes them more significant. And the reason I am thinking it is confirmed is the long context result of the 235B non-reasoning version update on the long context benchmark. It is like the model has learned to rely on reasoning for the summary and that is why if you forcibly strip it out in continued training it is gonna do poorly on long context.
Replies: >>106035100
Anonymous
7/26/2025, 5:55:39 PM No.106035043
>>106034995
That's the point, isn't it? I'm pretty sure the vast majority here would trade training for cheaper access to more fast memory.
Anonymous
7/26/2025, 5:59:14 PM No.106035079
>>106034671 (OP)
Finally a good non-schizo OP
Replies: >>106035190
Anonymous
7/26/2025, 6:02:19 PM No.106035100
>>106035040
I'll be honest I had zero fucking faith reasoning would do anything when o1 was released, since it didn't seem that beneficial
In retrospect it makes sense though. It's good for retention in creative writing settings since it tends to probe the model to identify and highlight key details before responding (basically like a more natural RAG) and it's good for problem solving since having scratch space to organize thoughts is essential for hard problems
I still wish there was a better way to handle it though. It still feels like there should be a more natural and efficient modality for these models to reason than just a hidden block of text
Replies: >>106035172
Anonymous
7/26/2025, 6:10:46 PM No.106035172
>>106035100
I still think reasoning is crap for creative writing. It causes the reverse problem, where the model focuses on irrelevant details and collapses over time. It's very bad at following the implicit thread of narrative and tone. The only time reasoning works well is when making single-turn one shots. Which makes sense, since that's what it was always for, benchmaxxing single-turn puzzles.
Anonymous
7/26/2025, 6:11:31 PM No.106035180
Reasoning is woke
Anonymous
7/26/2025, 6:12:35 PM No.106035190
p3iae89b28ff1
md5: 91e23e80e95de9f633c1b82b7bb4339b🔍
>>106035079
Next should be picrel.
Replies: >>106035209 >>106037082 >>106037324
Anonymous
7/26/2025, 6:14:07 PM No.106035209
>>106035190
>left: changes clothes
>right: changes into a literally different character
dropped
Replies: >>106035285
Anonymous
7/26/2025, 6:20:48 PM No.106035276
>>106034951
You're totally wrong. Six channels of DDR4 is 2-3 t/s running deepseek r1 q4_k_m. It's slow but not useless.
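Back-of-envelope sketch of why (rough assumptions, not a benchmark: DDR4-3200 at 25.6 GB/s per channel, ~37B active params for R1, ~4.85 bits/weight for q4_k_m):

channels = 6
bw_gb_s = channels * 25.6             # ~154 GB/s theoretical bandwidth
gb_per_token = 37e9 * 4.85 / 8 / 1e9  # ~22 GB of active weights read per generated token
print(bw_gb_s / gb_per_token)         # ~6.9 t/s ceiling; 2-3 t/s real-world tracks with that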
Replies: >>106035357
Anonymous
7/26/2025, 6:21:18 PM No.106035282
>>106034957
>the stupid part of that concept is I doubt you can R&D and sell what would essentially be high end server hardware at bargain bin prices and still make any money off of it
Ampere (the company) did that for a while, cheap 8-channel ARM server CPUs in workstation form factor with socketable everything.
See https://system76.com/desktops/thelio-astra-a1.1-n1/configure
But they never bothered updating it and they eventually got bought out, unfortunate because a modern equivalent would shit all over Strix Halo/DGX Spark.
Replies: >>106035360
Anonymous
7/26/2025, 6:21:33 PM No.106035285
>>106035209
What if Miku is actually bald and she rotates through wigs because her "hair" is so unwieldy and long, coming with annoyances like having to wash it, stepping on it (safer if it just falls off) or random samurais passing by cutting her hair off to test their swords?
Replies: >>106035308 >>106035324 >>106035348
Anonymous
7/26/2025, 6:23:57 PM No.106035308
>>106035285
hot
Anonymous
7/26/2025, 6:25:47 PM No.106035324
>>106035285
>bald
blasphemy
Anonymous
7/26/2025, 6:27:41 PM No.106035348
>>106035285
>she
>her
Replies: >>106035409
Anonymous
7/26/2025, 6:28:22 PM No.106035357
>>106035276
2-3 t/s for a reasoning model is useless. 10 is bare minimum.
Replies: >>106035653
Anonymous
7/26/2025, 6:28:30 PM No.106035360
>>106035282
>$4,763.00 for 512GB
meh, I'd rather go with 9004
Anonymous
7/26/2025, 6:30:43 PM No.106035382
WanVideoWrapper_FunControlCamera_00048_thumb.jpg
md5: cf31e032217a6624a8ed413aac5de161🔍
I can't find much motivation in LLMs anymore. Even "finetuned" mistral-small is an "it's important to remember that" faggot model... yeah yeah, you can talk it into what you want to do, but still.
What kind of safety slop is in the typical base model these days? Say gemma3 12b. Is it just an NSFW-pruned training set?
Replies: >>106035404 >>106035663 >>106037082
Anonymous
7/26/2025, 6:32:20 PM No.106035401
>>106034722
No, it is retarded because people can't get it through their thick fucking skulls that even though inferencing is memory-bandwidth bound, prompt processing is compute bound. You can have 6 billion exabytes per second of memory bandwidth, but if you have a narrow compute pipeline like a CPU, and especially some shitty meme ARM CPU, it's still going to take forever to do the prompt processing.
On top of that there's diminishing returns when adding more memory channels. Doubling the memory channels doesn't double your real-world performance. Not even close. It increases it, sure, but at some point you would find a crossing point where the inefficiencies of managing more memory channels outweigh the increase in bandwidth.
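Crude roofline math to illustrate the first point (every number here is a ballpark assumption: R1-class MoE with ~37B active params, a CPU sustaining maybe ~0.5 TFLOPS vs a single consumer GPU doing 50+):

active = 37e9
flops_per_token = 2 * active                 # rough matmul cost to process one prompt token
for name, flops in [("cpu", 0.5e12), ("gpu", 50e12)]:
    print(name, round(flops / flops_per_token), "tok/s prompt processing ceiling")
# cpu ~7 tok/s vs gpu ~676 tok/s: bandwidth barely matters for PP because the weights get
# reused across the whole batch of prompt tokens, so compute is the wall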
Replies: >>106035536
Anonymous
7/26/2025, 6:32:26 PM No.106035404
>>106035382
Oh yeah where the hell is Wan 2.2?
Anonymous
7/26/2025, 6:32:47 PM No.106035409
>>106035348
[insert ungendered robot pronouns]
Anonymous
7/26/2025, 6:45:34 PM No.106035536
>>106035401 (Me)
Also, adding to my rant here: multiple memory channels don't just divide the memory load bit-by-bit.
The memory is basically divided into "pages" of a specific size (usually somewhere between like 256 bytes and 2 kilobytes) which are then interleaved. And so your performance only theoretically doubles in situations where every bit of data you need alternatingly comes from pages that are on opposite memory channels (using 2-channel as an example here).
But at the same time having pages that are too small just bogs shit down with mountains of superfluous memory pages, and in fact when setting up your memory interleave for running LLMs it actually helps to max out page size to prevent this. So even in a perfect world where there is no inefficiency in switching memory pages, you'd have a lot of situations where the next consecutive bit you need is on the same page as the last, negating the bandwidth of the second channel entirely.
If it were just as simple as making 1 trillion channel RAM someone would already be doing it.
Anonymous
7/26/2025, 6:46:56 PM No.106035547
You guys ever used that "memory" function in the web interface of the likes of gemini and chatgpt?
Is there a frontend out there that has something like that for open weight LLMs?
Basically, it's kind of like lorebooks or databank in Silly, only the LLM itself has agency to save and edit these memories and you can request in the conversation for it to do so.
As far as I can tell, it's just an application of tool use.
Replies: >>106035568
Anonymous
7/26/2025, 6:48:43 PM No.106035568
>>106035547
It's literally just vector storage that injects shit into the context as it sees fit. How the prompt is formatted to incorporate it is obviously a trade secret. It obviously works way better than any open shit we've had to that effect, but it's still not perfect.
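Rough sketch of the idea (not what OpenAI actually does, every name here is made up, and embed() is whatever local embedding model you point at it):

import numpy as np

class MemoryStore:
    def __init__(self, embed):
        self.embed = embed                # callable: str -> np.ndarray
        self.texts, self.vecs = [], []

    def save(self, text):                 # expose this to the model as a tool, e.g. save_memory(text)
        self.texts.append(text)
        self.vecs.append(self.embed(text))

    def recall(self, query, k=3):         # call this before each turn and inject the hits into context
        if not self.texts:
            return []
        q = self.embed(query)
        sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-8)) for v in self.vecs]
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
        return [self.texts[i] for i in top]

Prepend "\n".join(store.recall(user_msg)) to the system prompt (store being your MemoryStore instance) and let the model call save() through your tool-calling loop whenever it decides something is worth keeping.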
Replies: >>106035600
Anonymous
7/26/2025, 6:50:48 PM No.106035600
>>106035568
>trade secret
>works better than any open shit
lmao
Replies: >>106035668
Anonymous
7/26/2025, 6:55:09 PM No.106035653
>>106034951
yep that is one of the use cases, other than that I just want to enter the server h/w world and be core and RAM rich. I also plan on opening up my services to my extended techy family so this would be plenty for that.
It has 6 channels though, how many are considered ideal?
>>106035357
>2-3 t/s for a reasoning model is useless. 10 is bare minimum.
I do agree here. 2-3 t/s is only usable for non-reasoning models. It once took 17 minutes for QwQ 32B running at 2 t/s to generate a shitty coding related answer. That was the first and last usage of it.
Anonymous
7/26/2025, 6:56:16 PM No.106035663
>>106035382
Why not just ignore the retards in this thread that say the moment you touch the weights the model becomes retarded, and become a sloptuner yourself?

You can almost certainly RL against those moralizing patterns to make them almost never happen. You just need the VRAM, anon, and a lot of patience to experiment. Unlearning is also a thing. There's so much that can be done to improve the quality of existing models, but people seem to have just given up because tuning SOTA is too VRAM expensive.

Gemma likely was trained with excessive synthetic slop (distillation too), and it hasn't seen much NSFW.
Replies: >>106035738 >>106035768 >>106035849 >>106036656
Anonymous
7/26/2025, 6:56:48 PM No.106035668
>>106035600
It does work well.
the only place the chatgpt 'memories' seem to fail me is with image generation. It adds image generation tokens to the memories for some ungodly reason which then causes weird shit to happen if you ask it to generate an image at the end of a long conversation.
Replies: >>106035805
Anonymous
7/26/2025, 6:56:49 PM No.106035669
file
md5: 0bf10542a78541d774cd0d2d335fc54d🔍
Replies: >>106035901 >>106037082
Anonymous
7/26/2025, 7:03:58 PM No.106035738
>>106035663
>You can almost certainly RL against those moralizing patterns
you cant
>Unlearning is also a thing.
it isnt
>There's so much that can be done to improve the quality of existing models
there isnt
>but people seem to have just given up
its over
Replies: >>106035878
Anonymous
7/26/2025, 7:04:02 PM No.106035742
Did anything interesting come from that mercury LLM (diffusion-based)?
Anonymous
7/26/2025, 7:05:55 PM No.106035764
>I notice he made a typo again ("solutiton"), but I'll let it slide since he's clearly trying to understand.
r1 is bullyng me in its thoughts again ;_;
Replies: >>106035931 >>106035961
Anonymous
7/26/2025, 7:06:12 PM No.106035768
>>106035663
>become a sloptuner yourself.
Quickest way to completely lose interest in *actually using* the models, while wasting money (yours or somebody else's) in the process.
Replies: >>106035878
Anonymous
7/26/2025, 7:08:57 PM No.106035805
>>106035668
Not saying it doesn't work, lmaoing at 'no open solutions matching it' and it being a 'trade secret'
Anonymous
7/26/2025, 7:11:36 PM No.106035849
>>106035663
>You can almost certainly RL against those moralizing patterns to make them almost never happen.
>almost certainly
>almost never happen
I trust this clueless retard
Replies: >>106035878
Anonymous
7/26/2025, 7:14:10 PM No.106035878
>>106035738
It is, I've tried it on toy models, it works.
I don't have the VRAM for larger though
>>106035768
How come? Because it becomes "work"?
>>106035849
>never does research
>always assumes it's impossible
this thread is a dead-end because of this demoralizing type of posting
If you actually think optimizing in a way that makes a pattern unlikely to happen is impossible, I have a few bridges to sell you
Replies: >>106035939 >>106036140
Anonymous
7/26/2025, 7:15:35 PM No.106035901
>>106035669
i like this style, what's the model/lora?
Replies: >>106035913
Anonymous
7/26/2025, 7:16:32 PM No.106035913
>>106035901
https://danbooru.donmai.us/posts?tags=akableak
Replies: >>106035934
Anonymous
7/26/2025, 7:18:14 PM No.106035931
>>106035764
>misspell a character's name
>R1 spends half of its thinking analyzing it as if I did it on purpose
I just refer to characters as {{char}} now.
Replies: >>106035945 >>106035972 >>106035986
Anonymous
7/26/2025, 7:18:20 PM No.106035934
>>106035913
thanks
Anonymous
7/26/2025, 7:18:39 PM No.106035939
>>106035878
>It is, I've tried it on toy models, it works.
Just because it works on toy models doesn't mean it will be so easy to scale up. Smaller models are so stupid they can barely keep their own moralizing patterns straight on their own.
Replies: >>106036008
Anonymous
7/26/2025, 7:19:21 PM No.106035945
>>106035931
>using reasoning for rp
is it worth it?
Replies: >>106036012
Anonymous
7/26/2025, 7:20:20 PM No.106035961
>>106035764
me analysing my gf's timeline
Anonymous
7/26/2025, 7:21:18 PM No.106035972
>>106035931
>Hmm. The user wrote {char}} in their message. Perhaps it is some sort of code or puzzle? Let's see. To "char" is to burn or blacken [...]
Replies: >>106035986 >>106036012
Anonymous
7/26/2025, 7:22:54 PM No.106035986
>>106035972
>>106035931
lol
Anonymous
7/26/2025, 7:24:21 PM No.106036008
>>106035939
Maybe, although you can't know without trying.
I'm a poorfag, so I lack the VRAM, but there's at least a few people with 96GB+ VRAM in this thread, they could tune those 12Bs with that.
For larger, you probably would have to rent though.
Anonymous
7/26/2025, 7:24:28 PM No.106036011
Is there a better model for vision than molmo 7b? It has to be smarter AND not a falls-silent-if-triggered-faggot like gemma when it comes to describing things.
I'm trying to caption lora images and it's just too damn much to do by hand.
Anonymous
7/26/2025, 7:24:37 PM No.106036012
>>106035945
I think so. I haven't tried V3. I was using Mistral Large 2 until R1 and I hated its positivity bias. R1 doesn't have that and I'm pretty sure it's because it thinks which stops it from leaning into whatever I write. Maybe V3 would be good too, I don't know.
>>106035972
Kek, ST replaces {{char}} with the character's name though.
Replies: >>106036027 >>106036038
Anonymous
7/26/2025, 7:25:50 PM No.106036027
>>106036012
Anon, reread his post. He didn't say "{{char}}".
Replies: >>106036080
Anonymous
7/26/2025, 7:26:45 PM No.106036038
>>106036012
r1 works well with alpaca template and doesn't try to think. you can even just do story writing with it without formatting, with mikupad or similar. just saying
Anonymous
7/26/2025, 7:27:10 PM No.106036046
1728785224121273
md5: 53891f54b225f0a0f9e8ed0cf81a58a0🔍
lol this is so deceptive. I've been debugging this for hours.
Accidentally passing float instead of bool means not only are attending to future tokens, you're attending more (+1)
Replies: >>106036085 >>106036116
Anonymous
7/26/2025, 7:30:01 PM No.106036080
>>106036027
Yeah, I'm not good at reading.
Replies: >>106036102
Anonymous
7/26/2025, 7:30:19 PM No.106036085
>>106036046
float mask would have to be -1e8 or something for it to work
Replies: >>106036092
Anonymous
7/26/2025, 7:30:47 PM No.106036092
>>106036085
float('-inf')
Anonymous
7/26/2025, 7:31:40 PM No.106036102
>>106036080
This might be a poor choice of hobby for you then.
Anonymous
7/26/2025, 7:32:52 PM No.106036116
>>106036046
lol the exact same thing happened to me, torch.bool getting upcasted and breaking my attention mask, what a fucked up api

>not only are attending to future tokens, you're attending more (+1)
Not true, True(1) means to include the token, so it attends to past tokens "more" if you have this bug. Of course any amount of future data bleeding in is still a bug and +1 isn't that much. Make sure you don't still have a bug
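For reference, a minimal repro of the footgun (my reading of it, not anyone's actual training code): F.scaled_dot_product_attention treats a bool mask as "True = may attend" but a float mask as an additive bias, so an upcasted 0/1 mask stops masking anything.

import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 1, 4, 8)                            # (batch, heads, seq, head_dim)
causal_bool = torch.tril(torch.ones(4, 4, dtype=torch.bool))   # True below the diagonal = may attend

out_ok = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_bool)

# same values upcast to float: now it's an additive bias, future positions get +0 (NOT masked)
# and past positions get a spurious +1 on their scores
out_buggy = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_bool.float())

# the correct additive form puts -inf on disallowed positions
additive = torch.zeros(4, 4).masked_fill(~causal_bool, float('-inf'))
out_also_ok = F.scaled_dot_product_attention(q, k, v, attn_mask=additive)

print(torch.allclose(out_ok, out_also_ok))  # True
print(torch.allclose(out_ok, out_buggy))    # False: future tokens leak in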
Replies: >>106036136
Anonymous
7/26/2025, 7:34:13 PM No.106036136
>>106036116
True means not allowed to attend though
Replies: >>106038916
Anonymous
7/26/2025, 7:34:33 PM No.106036140
>>106035878
>How come? Because it becomes "work"?
That is one factor. Another is that getting to see how the models work behind the curtain is a big turn-off that makes you not want to seriously interact with them. And eventually (unless you're putting in serious data gathering/cleaning effort and compute, i.e. money) you may slowly come to the realization that you could have got there anyway with a fraction of the effort and time by better prompting the models you already had.

Do it if you're curious about the process, but forget about making "better models", especially if you're doing it alone. Those recommending others to finetune are probably too invested in the art (i.e. are getting paid in some way) and/or delusional.
Replies: >>106036246 >>106036275 >>106036286
Anonymous
7/26/2025, 7:41:02 PM No.106036209
IT'S SUNDAY IN CHINA AND GLM 4.5 IS STILL NOT RELEASED
Anonymous
7/26/2025, 7:44:39 PM No.106036246
>>106036140
> Another is that getting to see how the models work behind the curtain is a big turn off that makes you not want to seriously interact with them
I don't know about that, if you already knew from the start you're just dealing with an autoregressive LLM, just a completion model, it wouldn't be that much of a surprise. I think if this was the case, people wouldn't work on things, but they do, because they see some magic/potential in it regardless.
>"better models"
better as far as your subjective taste goes. I would say that if the earlier poster that was annoyed at the moralizing patterns wanted them gone, he could do it, there's at least 5-6 methods I can think of that can be used to make them less likely to happen.
I think if you're not doing it for "yourself" then there's little point in working on it anyway, you have to be the user. And yes, maybe you can get the same effect with prompting, but it's frustrating to know it produces that output by default and surely you'd want to fix it.
Anonymous
7/26/2025, 7:48:14 PM No.106036275
>>106036140
I have a very good local TTS model, you don't. Training as a private individual also has advantages. Think about what these could be.
Anonymous
7/26/2025, 7:49:32 PM No.106036286
>>106036140
>to see how the models work behind the curtain is a big turn off
Why?
Replies: >>106036381
Anonymous
7/26/2025, 7:49:40 PM No.106036287
I've started using a depth 0 author's note that tells nuqwen instruct to write 2-5 paragraphs and this seems to have cured its one-liner addiction
Anonymous
7/26/2025, 7:50:05 PM No.106036292
mikuquestion2
md5: 5dc450542c36df3307e4681904a46926🔍
Is it possible to have official /lmg/ mascot Hatsune Miku automatically sing the output of local models?
Replies: >>106036430 >>106037082
Anonymous
7/26/2025, 7:59:05 PM No.106036381
>>106036286
You'll see that LLMs are not as smart as you'd think, that their outputs are mostly a function of the bias of the training data you're using, that they'll never learn what you want to the degree you'd like without destroying their previously learned capabilities, that you'll never get too far with just LoRA/QLoRA finetuning (and ultimately, that machine learning in general sucks).

And then, if you've curated the data yourself, perhaps even going through every sample by hand, after a while you won't really want to see any of that anymore.
Replies: >>106036565
Anonymous
7/26/2025, 8:05:25 PM No.106036430
>>106036292
You need a license.
Replies: >>106036468 >>106036476
Anonymous
7/26/2025, 8:08:40 PM No.106036468
>>106036430
Oi m8 u got a loicense for dat mascot?
Anonymous
7/26/2025, 8:09:16 PM No.106036476
343
md5: a1ea1847aa64dae1306c52d656f6d90b🔍
>>106036430
>license
Which Git zoo did you escape from?
Anonymous
7/26/2025, 8:18:02 PM No.106036565
>>106036381
NTA, but
>You'll see that LLMs are not as smart as you'd think
you already can see them fuck up in the dumbest ways, but people still use them? Bigger usually is less dumb though.
> mostly a function of the bias of the training data you're using
does this come as a surprise?
> that they'll never learn what you want to the degree you'd like without destroying their previously learned capabilities
just make sure the capabilities you care about are invoked in the batch to make sure SGD or whatever optimizer you're using doesn't destroy it, that and catastrophic forgetting is less bad with larger models.
>you'll never get too far with just LoRA/QLoRA finetuning
true, it's slower and causes artifacts
you could still do higher rank loras and merge them in, but more steps are needed. ideally, full finetuning is preferable.
>And then, if you've curated the data yourself, perhaps even going through every sample by hand, after a while you won't really want to see any of that anymore.
maybe you need better tools and figuring out how to slavedrive your LLMs better and with more automation. obviously gets into synth slopping territory, but do you really believe most labs hand curate everything?
Anonymous
7/26/2025, 8:25:25 PM No.106036656
>>106035663
I feel like a lot of the sloptune hate comes from vramlets running 12b-30b models. Once you get to 70b, the sloptunes become smart enough for smut and are just better.

Also, as companies clean their datasets better and better, smut may become functionally impossible with more modern models (maybe).
Replies: >>106036746 >>106036829 >>106037052
Anonymous
7/26/2025, 8:27:55 PM No.106036679
What are people using to convert HTML to Markdown? Digging into Claude Code, I found that it uses https://mixmark-io.github.io/turndown/ internally. There's also the Obsidian Web Clipper and Pandoc.
Replies: >>106036897
Anonymous
7/26/2025, 8:33:09 PM No.106036746
>>106036656
good 50b when
70 is too goddamn slow
Replies: >>106036840 >>106036893
Anonymous
7/26/2025, 8:39:12 PM No.106036829
>>106036656
Looking at how things are going with all that age verification b.s. and all things considered we may be in a situation in which Nemo is still the only good uncensored model in 2035.
Replies: >>106036881
Anonymous
7/26/2025, 8:40:23 PM No.106036840
>>106036746
you can try TheDrummer_Valkyrie-49B-v1. It's the only sloptune I know of at the midpoint of 30 and 70b. I think it's okay and definitely feels kinda like 70b.
Replies: >>106037008
Anonymous
7/26/2025, 8:43:38 PM No.106036881
>>106036829
You already have much larger models with unfiltered datasets though? And China doesn't filter as much (recent Qwens not counting).
By 2035 the price of the hardware will have dropped considerably.
Replies: >>106036904 >>106036994
Anonymous
7/26/2025, 8:44:21 PM No.106036893
>>106036746
Llama-3_3-Nemotron-Super-49B-v1_5
Anonymous
7/26/2025, 8:44:49 PM No.106036897
>>106036679
>heading tag not converted to a markdown heading
Anonymous
7/26/2025, 8:45:35 PM No.106036904
>>106036881
You're right, buddy boy.
Anonymous
7/26/2025, 8:49:50 PM No.106036955
llamacpp training state? what can I expect?
Replies: >>106037249
Anonymous
7/26/2025, 8:53:02 PM No.106036994
>>106036881
>And China doesn't filter as much
*Two* model makers haven't filtered much. It's not a lot considering all the models they release (which are all just derivatives of ds and qwen). And as always, until they follow it up with a new model with as few restrictions as the previous one, they're all suspect.
>By203 5 the price of the hardware would have dropped considerably enough.
People will traffic used 3090s in their assholes to get them through customs. The good news is that models stopped getting bigger. That would have been terrible...
Replies: >>106037093
Anonymous
7/26/2025, 8:54:10 PM No.106037008
>>106036840
buy an ad you fucking faggot
Anonymous
7/26/2025, 8:54:41 PM No.106037015
thedr*mmer should be put on the spam list
Replies: >>106037038
Anonymous
7/26/2025, 8:56:42 PM No.106037038
>>106037015
>thedr*mmer
Are you triggered by words?
Anonymous
7/26/2025, 8:58:02 PM No.106037052
>>106036656
I hate sloptuners because in aggregate they're fraudulent, slimy personalities pretending to be ML luminaries, who seem to believe they're making good shit just because they've found clueless retards willing to throw money or free compute at them for their experiments.
Replies: >>106037093 >>106037130
Anonymous
7/26/2025, 8:59:27 PM No.106037069
Is cydonia 24b any good? better than nemo/rocinante?
Replies: >>106037091 >>106037103
Anonymous
7/26/2025, 9:00:22 PM No.106037082
janny tongue my anus
md5: 142f6cda899bd2ad1432ae5e75341a5b🔍
>>106035190
>>106035382
>>106035669
>>106036292
vocaloidtranny posting porn in /ldg/: >>105715769
It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation: https://desuarchive.org/g/thread/104414999/#q104418525 https://desuarchive.org/g/thread/104414999/#q104418574
he makes ryona picture: >>105714003 of some random generic anime girl the different anon posted earlier: >>105704741 (could be the vocaloidtranny playing both sides)
here >>105884523 he tests bait poster bot for better shitflinging in threads
admits spamming /v/ with AI slop: https://desuarchive.org/g/thread/103462620/#103473545

Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.

TLDR: vocaloid troon / janitor protects resident avatarfags and deletes everyone who outs him, making the general his little personal safespace with samefagging. Is prone to screech "Go back to teh POL!" when someone posts something mildly political about language models or experiments around topic.

As said in previous thread(s) >>105716637 I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed spamming. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted

xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
Replies: >>106037088 >>106037093 >>106037117 >>106039690
Anonymous
7/26/2025, 9:00:53 PM No.106037088
>>106037082
based
Anonymous
7/26/2025, 9:01:09 PM No.106037091
>>106037069
I've been testing it with my dungeons and dragons python thing and I think 3.2 base Mistral is better.
However don't take my word alone as I haven't been that extensive about it yet. I think Cydonia has lost some IQ in the process if I'm correct...
Anonymous
7/26/2025, 9:01:20 PM No.106037093
>>106037052
You're jealous they get free compute. I'm jealous too, but I wouldn't hate them for being more fortunate!
>>106036994
>People will traffic used 3090s in their assholes to get them through customs
Eh, fuck doomers, they won't win.
Things aren't looking that bad, Trump endorsed open source AI/open weights, he even let Jensen sell to any country besides China, the Biden diffusion rule is gone, even China is getting those h800s
>>106037082
fuck off
Replies: >>106037149
Anonymous
7/26/2025, 9:02:15 PM No.106037103
>>106037069
To add: Rocinante isn't that good either, not even for a dilly-dally simple chat. It's somewhat strange that it gets spammed here so much. 12B models are trash in general.
Replies: >>106037123
Anonymous
7/26/2025, 9:03:02 PM No.106037117
afc0e3f33e42d9924bf22a2212307b2429fbb9bb2ebe927f590a72152b9d7bb8
>>106037082
Have another.
Replies: >>106037708
Anonymous
7/26/2025, 9:03:29 PM No.106037123
>>106037103
thedr*mmer sloptunes are all shit. the neet faggot is spamming his shit constantly here because he has mental issues.
Replies: >>106037154
Anonymous
7/26/2025, 9:04:11 PM No.106037130
>>106037052
how so? I'm using qwen 235b and no matter how hard you jailbreak it, within a few paragraphs it's biased to start talking about "but then they realized the joys of consent and mutual respect". Only a sloptune can fix that, only a sloptune ever will. Every LLM ever made has a personality, and changing it to fit the purpose is fine. The sterile models good at coding and ethics put out by corporations are just a different kind of slop. I know you don't enjoy using that crap, and if you are, you're a frustrated cuck who just hates local.
Replies: >>106037146 >>106037238
Anonymous
7/26/2025, 9:05:04 PM No.106037144
Rocinante is the official /lmg/ model and only a single schizo samefag posts that it's bad because he hates Drummer for mental illness-related reasons.
Replies: >>106037182 >>106037817
Anonymous
7/26/2025, 9:05:17 PM No.106037146
>>106037130
>Only a sloptune can fix that, only a sloptune ever will.
Sloptunes don't fix that at all.
Replies: >>106037178
Anonymous
7/26/2025, 9:05:32 PM No.106037149
>>106037093
Things will work out well if we get just one other company (not AMD because they don't compete) undercutting vram jewry from Nvidia, but if that doesn't happen I can see it always being perpetually overpriced and gens behind what would be reasonable in a competitive hardware market.
Replies: >>106037182
Anonymous
7/26/2025, 9:05:50 PM No.106037154
>>106037123
Everyone who has tried his shit has complained about how retarded and fried they are, logs were posted by people who gave in and tried them and showed all the shivers and refusals, and yet thread after thread you have people recommending his models and only his models with nothing to back it up.
Replies: >>106037199 >>106037817
Anonymous
7/26/2025, 9:07:21 PM No.106037178
>>106037146
yah they do. Fucking eva-qwen 70b will just start rapes for the hell of it. Dumbass. Why you gotta make such a dumb post on 4chan? Did I hurt your fucking feelings or some shit?
Replies: >>106037207
Anonymous
7/26/2025, 9:07:48 PM No.106037182
>>106037144
Now don't get ahead of yourself. Some are okay, but don't go thinking it's SOTA or anything. It's semi-usable, like many sloptunes.
>>106037149
Intel, Tenstorrent, a bunch of chinese GPU makers, a new one appeared in recent days, saying they're shipping ~september. I think most are just using GDDR, but remember blackwell 6000 is *just* GDDR7 and reaches 96GB.
Replies: >>106037817
Anonymous
7/26/2025, 9:09:06 PM No.106037199
>>106037154
you have one (1) spammer recommending them (hint: it's the faggot himself). he is begging for donations on discord and has bots spamming r*ddit with his retarded sloptunes as well.

it's all so exhausting.
Anonymous
7/26/2025, 9:09:36 PM No.106037207
>>106037178
>eva-qwen 70b will just start rapes for the hell of it.
This is what the average sloptune user thinks is a selling point. I guess if all you need is ahh ahh mistress to get off sloptunes are great.
Replies: >>106037244
Anonymous
7/26/2025, 9:10:59 PM No.106037226
the drummer is like hatsune miku
Anonymous
7/26/2025, 9:11:43 PM No.106037238
>>106037130
the subtext to this post is that sloptunes are for people with severe skill issues
Replies: >>106037261
Anonymous
7/26/2025, 9:12:07 PM No.106037244
>>106037207
have fun with models that have to be explicitly prompted to use the word penis, then they use it once, and revert back to flowery language.
llama.cpp CUDA dev !!yhbFjk57TDr
7/26/2025, 9:12:30 PM No.106037249
>>106036955
>llamacpp training state?
My work on it is on hold until I've achieved other objectives, in particular better code for evaluating the quality of finetuned models and higher batched throughput for the HTTP server.

>what can I expect?
A long wait.
Replies: >>106037283
Anonymous
7/26/2025, 9:13:49 PM No.106037261
>>106037238
no that's bullshit. They prune the data the models are trained on. Many models have literally never seen smut. You're prompting a tiny part of the model that has been stripped of all soul on purpose. You corporate shill.
Replies: >>106037278 >>106037290 >>106037314 >>106037973
Anonymous
7/26/2025, 9:15:04 PM No.106037278
>>106037261
>Many models have literally never seen smut.
Exactly, and you think a quick finetune is going to fix that?
Anonymous
7/26/2025, 9:15:17 PM No.106037283
>>106037249
Oh, i thought it was already merged.
Replies: >>106037302
Anonymous
7/26/2025, 9:15:50 PM No.106037290
>>106037261
hello the*rummer keep begging for 25$ on discord to release your slopshit
llama.cpp CUDA dev !!yhbFjk57TDr
7/26/2025, 9:16:31 PM No.106037302
>>106037283
A PR of mine that added training support has been merged but I don't think the code is in a state where it's actually usable beyond a proof of concept.
Anonymous
7/26/2025, 9:18:10 PM No.106037314
>>106037261
mhm that's nice sweetie go back to the kobold discord now
Anonymous
7/26/2025, 9:18:48 PM No.106037324
1748016876223423
md5: be2dc98ebdbc53c4f14a78e76bb27c76🔍
>>106035190
<A challenger appears>
Replies: >>106038329
Anonymous
7/26/2025, 9:48:39 PM No.106037708
>>106037117
>wifi signal or loud sleeves?
Anonymous
7/26/2025, 9:49:53 PM No.106037721
https://github.com/SillyTavern/SillyTavern/releases/tag/1.13.2
Replies: >>106037759
Anonymous
7/26/2025, 9:50:34 PM No.106037735
>he pulled
Anonymous
7/26/2025, 9:52:33 PM No.106037759
>>106037721
Looks unimportant, what am I missing?
Anonymous
7/26/2025, 9:56:36 PM No.106037798
1737537680115987
md5: b8934275834e4a4d5100fc5f706f4e5f🔍
bros...
Replies: >>106039938
Anonymous
7/26/2025, 9:58:46 PM No.106037817
>>106037144
>>106037154
>>106037182
yet not one, not ONE of you faggots offers a model that is better than Rocinante.
Hypocrites, the lot of you.
Replies: >>106037825 >>106037845 >>106037902
Anonymous
7/26/2025, 9:58:49 PM No.106037818
got the opportunity to buy a modded 2080 ti with 22gb of vram for about 400 usd, worth it or nah? kinda wanna go for it just for the novelty but like, i could also just keep my money
Replies: >>106037935 >>106037950
Anonymous
7/26/2025, 9:59:40 PM No.106037825
>>106037817
Because there isn't one.
Anonymous
7/26/2025, 10:00:16 PM No.106037840
glm4 100b moe is going to save local
Anonymous
7/26/2025, 10:00:52 PM No.106037845
>>106037817
R1
Anonymous
7/26/2025, 10:05:19 PM No.106037902
>>106037817
In the same size class? I don't know I haven't tested every slop tune out there

But things like R1 or DS3 need no finetune and easily beat it.
Anonymous
7/26/2025, 10:07:50 PM No.106037935
>>106037818
If you don't miss the money why not. I'd get it if I had that opportunity.
Anonymous
7/26/2025, 10:09:03 PM No.106037950
>>106037818
You can use that VRAM for CUDA, but you can't use it for games, I think.
Anonymous
7/26/2025, 10:11:04 PM No.106037973
>>106037261
>They prune the data the models are trained on. Many models have literally never seen smut. You're prompting a tiny part of the model that has been stripped of all soul on purpose.
Didn't you just accidentally explain why sloptunes don't work?
Replies: >>106038037
Anonymous
7/26/2025, 10:16:54 PM No.106038037
>>106037973
Not drummer, but:
Both you and the anon you quoted are wrong.
You're implying continued pretraining is necessary. Maybe it is for good quality, but in practice most models have seen some smut, just of the purple prose variety, and some knowledge is there, so a finetune would be able to get them to write more explicitly.
You're also wrong to assume that you'll truly need billions of tokens to make this work.
Billions of tokens might be needed to get much better, more natural quality to it, but something half-way there can be had with far less.
The question is how much can a sloptune achieve in practice. How are you going to benchmark that, besides "try it and find out"
Replies: >>106038053 >>106038083
Anonymous
7/26/2025, 10:18:25 PM No.106038053
>>106038037
You're alright. The thread really be like
>sloptunes bad!
>sloptunes good!
>nuh uh
>nuh uh
Anonymous
7/26/2025, 10:20:24 PM No.106038083
>>106038037
>The question is how much can a sloptune achieve in practice.
I think it is either nothing and the model still works, or something and the model overfits to the point it is retarded.
Replies: >>106038169 >>106038194 >>106038201
Anonymous
7/26/2025, 10:26:32 PM No.106038169
>>106038083
I guess I disagree here, you can get coherent and quite usable output. I think people forget that almost every single instruct tune from corpos is already overfit to hell. The probabilities a base model outputs and what instruct/chat tunes output are very different, the latter often tend to have less variable outputs. There's still plenty of room for finetunes.
Replies: >>106038183
Anonymous
7/26/2025, 10:27:49 PM No.106038183
>>106038169
>you can get coherent and quite usable output
Yes when you do 1 epoch on low learn rate. Then the finetuning did nothing.
Replies: >>106038279
Anonymous
7/26/2025, 10:28:24 PM No.106038194
>>106038083
nta.
If finetuning does nothing, instruct tuning wouldn't be possible. We know it can be done.
If finetuning can only fry models, good instructs wouldn't be possible. We know it can be done.
So if we can do instruct, why not something else?
Tuners being shit at it or not is a different thing. I'd rather they keep on trying.
Replies: >>106038254 >>106038273 >>106038335
Anonymous
7/26/2025, 10:28:57 PM No.106038201
>>106038083
In my experience some tunes improve on the intelligence of the parent model in some contexts but decrease intelligence in others, and ultimately they are not truly better or worse in intelligence when it comes to writing tasks. But they do uncensor the model a bit and change the style noticeably, so it's a net improvement if you're willing to load up a model for RP and use a different model for assistant stuff.
Replies: >>106038230 >>106038279
Anonymous
7/26/2025, 10:31:17 PM No.106038230
>>106038201
Most tunes are absolute garbage though btw, like 99% of the ones you see on HF and the greedy nala test.
Anonymous
7/26/2025, 10:33:53 PM No.106038254
>>106038194
>If finetuning does nothing, instruct tuning wouldn't be possible.
Don't they interchange instruct shit with some generous pretraining iterations?
Replies: >>106038279 >>106038327
Anonymous
7/26/2025, 10:35:22 PM No.106038273
>>106038194
>I rather they keep on trying.
For the most part they're keeping methods and the data to themselves, so any individual or lab that would like to reproduce the results has to start from scratch. Sloptuners can go get fucked. AI companies at least give us entire base models and professionally-made general-purpose instruct tunes.
Replies: >>106038292 >>106038327
Anonymous
7/26/2025, 10:36:03 PM No.106038279
>>106038183
I've seen cases where at high enough LR or bad hyperparams you fry it quite badly, I've seen cases where it barely learns, but you can find a good goldilocks zone where it's not really broken, and it has learned something. Obviously you'll always trade stuff off as >>106038201 said, you can at best try to harm the least and get the most of what you want.
There is no unbiased tune out there, not from corpos, not from sloptuners, but there's plenty of fun ones? Until you get tired of the slop of that particular variant! Even big stuff like R1 and K2 have slop you'll notice given enough tries.

>>106038254
They didn't do this originally. Nowadays this is called "annealing", and they put a bit of instruct mix into the base dataset to make instruct tuning easier.
It's also why Llama3 and a few others like Qwen start regurgitating math slop on an empty context when you use the base model, they clearly altered the distribution of data from the normal ones so that it's better at benchs and easier to tune for their purposes.
Replies: >>106038317
Anonymous
7/26/2025, 10:37:21 PM No.106038292
>>106038273
>AI companies at least give us entire base models
I thought they'd stopped doing this and are just putting out the instruct tuned models now.
Replies: >>106038302
Anonymous
7/26/2025, 10:38:20 PM No.106038302
>>106038292
Some still do base model releases, but many are annealed.
Replies: >>106038317
Anonymous
7/26/2025, 10:39:34 PM No.106038317
>>106038279
>>106038302
>annealing
Wait that's what that term was always referring to?
Sheeit.
Replies: >>106038349
Anonymous
7/26/2025, 10:40:11 PM No.106038327
>>106038254
DS apparently didn't and it's one of the best models. Back in the day, a dedicated instruct tuning round was the norm. Base was Base.

>>106038273
>AI companies at least give us entire base models
SOME AI companies at least give us SOME models. Not that we'd be able to reproduce anything they make with what they provide. Other than allenai probably.
>and professionally-made general-purpose instruct tunes.
Yeah. Those falcon models are looking great...
Anonymous
7/26/2025, 10:40:22 PM No.106038329
Gww0y24bgAAoKy7
md5: 84dc697356b632d4a7de8de3404151e4🔍
>>106037324
still amazed people save these
Anonymous
7/26/2025, 10:40:46 PM No.106038335
>>106038194
Let's make an analogy
A person that was cared for excessively by his parents, never got to experience the real world, grew up with an immature, incomplete world view. This is a model with a "curated" pretraining dataset. Instruction sft would be like teaching him how to be an assistant, it's relatively simple and doesn't require new knowledge, only adhering to a pattern. But you can't teach him how to be a street gangster because he only has a superficial understanding of this concept. He may memorize some behaviors but that's about it.
Replies: >>106038366 >>106038546
Anonymous
7/26/2025, 10:41:56 PM No.106038348
is running local models even worth it? like these models in the range up to 72b are retarded, it's better to just pay for claude or grok than to buy a few gpus. I have been thinking about a maxxed out macbook pro, but buying the base one and paying for a subscription will be many times cheaper. I would love to run grok 4 locally but the tech just isn't there yet, and I am thinking it will never be since this is all just a big scam, or it'll take like 20 years
Replies: >>106038403
Anonymous
7/26/2025, 10:42:01 PM No.106038349
>>106038317
Yes, it's usually stuff like instruct data, synthetic slop (for example math) and similar stuff to what they'd try to benchmark it on. There were a few non-annealed base models recently, maybe the DOTS one or some others, I forgot. Unfortunately I think that one still filtered NSFW somewhat from the dataset.
Anonymous
7/26/2025, 10:43:55 PM No.106038366
>>106038335
Now this goes to sample efficiency, how many fiction books do you have to train on until it can do a character or style well. This is something you could probably get concrete data on.
Replies: >>106038422
Anonymous
7/26/2025, 10:45:13 PM No.106038376
So to sum it up. Dots was bad at sex and didn't save local. Minimax was bad at sex and didn't save local. Exaone was bad at sex and didn't save local. Hunyuan was bad at sex and didn't save local. Ernie was bad at sex and didn't save local. Nemotron was bad at sex and didn't save local.
Replies: >>106038383 >>106038406 >>106038436 >>106038545
Anonymous
7/26/2025, 10:45:34 PM No.106038381
Any good Ani cards that use and improve upon the original's prompt?
Replies: >>106038456
Anonymous
7/26/2025, 10:45:40 PM No.106038383
>>106038376
qwen was good at sex and saved local
Anonymous
7/26/2025, 10:46:54 PM No.106038403
>>106038348
Depends on whether you feel comfortable with elon reading your mesugaki correction rps.
Anonymous
7/26/2025, 10:47:23 PM No.106038406
>>106038376
Qwen was good at sex and didn't save local (it's still dumb and it mostly still doesn't know more pop culture than a 27B).
Anonymous
7/26/2025, 10:48:53 PM No.106038422
>>106038366
Yeah it's possible but sloptunes are called sloptunes for a reason.
Anonymous
7/26/2025, 10:50:13 PM No.106038436
>>106038376
Is Kimi bad too? Haven't bothered to try it.
Replies: >>106038479
Anonymous
7/26/2025, 10:51:00 PM No.106038447
file
md5: 33f98feb135b48734651dd6abe446bbe🔍
I found you "MoE is bad" poster.
Anonymous
7/26/2025, 10:52:15 PM No.106038456
1753153010352392_thumb.jpg
md5: 1268eb8fec4c7719412e0066038b12cd🔍
>>106038381
I don't think ani made a card for his mascot yet.
Anonymous
7/26/2025, 10:53:54 PM No.106038479
>>106038436
It's very good, but the instruct is safetyslopped, needs a prefill to make it not refuse most porn.
Base model probably is quite good, but I'd be surprised if you can beat the official instruct at performance.
Unironically it needs an uncensoring tune, or a merge back to base of the "refusing" experts. Prefilling works without any modifications, of course.
Replies: >>106038498
Anonymous
7/26/2025, 10:55:06 PM No.106038498
>>106038479
>or merge back to base of the"refusing" experts
Has anyone tried this on any MoE? At least in theory it sounds plausible.
Replies: >>106038551
Anonymous
7/26/2025, 10:59:33 PM No.106038545
>>106038376
What does good sex look like?
Replies: >>106038599
Anonymous
7/26/2025, 10:59:41 PM No.106038546
>>106038335
Are we using bad analogies now? Fine. When did you stop learning things?
>teaching him how to be an assistant, it's relatively simple and desn't require new knowledge, only adhering to a pattern
It's a statistical model. Following patterns is what it does.
>But you can't teach him how to be a street gangster because he only has a superficial understanding of this concept
Why not? Have someone live an extra 20 years in the right environment and he could learn anything. Couldn't you?
>He may memorize some behaviors but that's about it.
Have you put words together in new ways since you learned how to speak or are they all copies of things you've heard before? Would it be impossible for you to learn a new language?
This is the danger of analogies.
Anonymous
7/26/2025, 10:59:55 PM No.106038551
>>106038498
DeepSeek has a repo and paper for doing "ESFT" (expert-specialized finetuning), which should need much less VRAM.

The merge back stuff was tried by those TNG guys on deepseek v3/R1, but given it's already uncensored for most purposes aside from a few CCP things, it was far less useful. Kimi 2 on the other hand needs this as it will consistently refuse even 30-40 turns into the chat without a prefill. With a prefill, you can get it to write anything, results are good and the output is fun, although still some mild slop specific to it, but enjoyable enough.
Anonymous
7/26/2025, 11:03:56 PM No.106038599
>>106038545
Like you plug 8k tokens of your organic ERP logs into the model and what it spits out is kinda the same (but also different) as the continuation of your organic ERP logs. I only used IQ1 R1 so maybe Q4 is different but even that fails at that.
Anonymous
7/26/2025, 11:10:39 PM No.106038685
We need 2T models.
Replies: >>106038692 >>106038702
Anonymous
7/26/2025, 11:11:09 PM No.106038692
>>106038685
local?
Replies: >>106038748
Anonymous
7/26/2025, 11:11:35 PM No.106038702
>>106038685
llama4-behemoth 2t/288a soon
Anonymous
7/26/2025, 11:12:11 PM No.106038709
How is the expert selected?
Replies: >>106038722 >>106038727 >>106038734 >>106038737 >>106038792 >>106038948 >>106039217
Anonymous
7/26/2025, 11:13:00 PM No.106038722
>>106038709
Router bribery.
Anonymous
7/26/2025, 11:13:21 PM No.106038727
>>106038709
By using A(n) I(ndian).
Anonymous
7/26/2025, 11:13:56 PM No.106038734
>>106038709
Just like everything related to LLMs, vibes.
Anonymous
7/26/2025, 11:14:12 PM No.106038737
>>106038709
nobody knows, countless papers have been written but still nobody understands how the Neural Network works
Anonymous
7/26/2025, 11:15:18 PM No.106038748
>>106038692
Only for people who can organise lan parties.
Anonymous
7/26/2025, 11:16:51 PM No.106038766
Is rocinante still the meta for <16GB? I've been running that on my 1080ti and gemma3 27b on my 7900xtx for what feels like forever now.
Electricity is 65c/kwh I can't run more gpu.
Replies: >>106038789 >>106038804 >>106038826
Anonymous
7/26/2025, 11:18:00 PM No.106038789
>>106038766
yes
Replies: >>106039478
Anonymous
7/26/2025, 11:18:14 PM No.106038792
anexpert
md5: 0f4ba150ed857d31bd684922152a65cb🔍
>>106038709
By saying "WAAAAAAAAAAAAAAAAAAAAAAAAAAAA" the loudest.
Replies: >>106039236 >>106039236
Anonymous
7/26/2025, 11:19:26 PM No.106038804
>>106038766
Qwen 30B A3B.
Replies: >>106038829 >>106038864 >>106039478
Anonymous
7/26/2025, 11:21:14 PM No.106038826
>>106038766
>Is rocinante still the meta for <16GB?
Yes.
Replies: >>106039478
Anonymous
7/26/2025, 11:21:30 PM No.106038829
>>106038804
it's retarded as fuck don't listen to this anon
Anonymous
7/26/2025, 11:24:43 PM No.106038864
>>106038804
I sure love when my waifus randomly start speaking in Chinese.
Anonymous
7/26/2025, 11:29:46 PM No.106038916
shot
md5: 29367e8fcd0a3c626476f82efe99f275🔍
>>106036136
Oh that's even worse, is that huggingface? I got bit by the pytorch SDPA function which does things the opposite way. The situation is even more retarded lmao.
Replies: >>106038929
Anonymous
7/26/2025, 11:31:08 PM No.106038929
>>106038916
https://docs.pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html
Replies: >>106039204
Anonymous
7/26/2025, 11:33:05 PM No.106038948
>>106038709
You check which one is least likely to say sex as next token and you pick that one.
Anonymous
7/26/2025, 11:46:06 PM No.106039093
red footbocchi3
md5: ae4727db6f833a4e5df8145979ad01bd🔍
What local model is 4plebs using to describe and add description metadata to all their images?
I want to do that for my own 4chan folder which is like 100,000 images and save the descriptions in the exif metadata so I can find images faster.
Replies: >>106039104 >>106039179
Anonymous
7/26/2025, 11:47:07 PM No.106039104
>>106039093
Kita has athlete's foot.
Anonymous
7/26/2025, 11:54:23 PM No.106039179
>>106039093
Gemma is probably the best local LLM capable of describing images.
If you want booru tags there's joytag.
Anonymous
7/26/2025, 11:57:02 PM No.106039204
>>106038929
https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html

They're not even consistent in their own framework
Anonymous
7/26/2025, 11:58:08 PM No.106039217
>>106038709
by a diverse committee that prioritizes equity and inclusivity
Anonymous
7/26/2025, 11:59:29 PM No.106039236
>>106038792
I wish >>106038792 was true, because it'd be cute, but it'd be inefficient if you had to actually run all the weights and you'd be back to dense performance.

Look at https://raw.githubusercontent.com/deepseek-ai/DeepSeek-V3/refs/heads/main/inference/model.py

The short of it is, as you'd expect, some MLP is applied to the output of some block and you get some scores from that which decide what to run next. So that MLP has to learn which expert to use. This is not really differentiable, but you can use various tricks to make it trainable.

See the Gate and MoE classes in the source code there, it's "simple"
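If you just want the gist without reading the whole file, here's a toy version of top-k routing (heavily simplified guess at the pattern; the actual Gate/MoE classes add score bias terms, grouped routing, shared experts and weight normalization):

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)   # the router: one score per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)               # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)      # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):                         # route each token through its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

y = TinyMoE()(torch.randn(5, 64))  # only top_k of the 8 expert MLPs actually run per token

Training just backprops through the selected experts and their gate scores, which is why routers need aux losses or bias tricks on top to keep the experts balanced.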
Anonymous
7/27/2025, 12:00:31 AM No.106039247
git pulled and my ikllama isn't outputting any tokens but it keeps iterating.....
Replies: >>106039931
Anonymous
7/27/2025, 12:01:33 AM No.106039260
>he pulled
HAHHAHAHAHHAHAHAHAHAHA
Anonymous
7/27/2025, 12:07:29 AM No.106039305
smugfolderimage2623
md5: 69179650b434d27a6e18ad41fc34f6a6🔍
>he pulled
Replies: >>106039471
Anonymous
7/27/2025, 12:20:48 AM No.106039444
Nvidia RTX 5070 SUPER 18GB is going to save local models
Replies: >>106039471 >>106040350
Anonymous
7/27/2025, 12:23:16 AM No.106039471
>>106039444
>18GB
see this image: >>106039305
Anonymous
7/27/2025, 12:23:43 AM No.106039478
>>106038789
>>106038826
sad state of affairs
>>106038804
I tried it, seemed on par with gemma3 for general chat/info bot and worse than rocinante for RP.
I also tried some larger models half in cpu RAM, half on gpu, and it's unusably slow, like 5t/s or something. Dual channel RAM is ass, I regret building an AM5 AI machine.
Anonymous
7/27/2025, 12:24:58 AM No.106039491
Is it possible to fine-tune quantized large models like DeepSeek with QLoRA? How much VRAM would be necessary, and could the requirements be reduced further with optimization techniques?
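For reference, the generic QLoRA recipe with transformers + peft + bitsandbytes looks roughly like the sketch below. The model id is a placeholder; at DeepSeek scale even the 4-bit weights alone run to hundreds of GB, so this shows the how, not whether it fits on your hardware.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the base weights
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "some/causal-lm",              # placeholder model id
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters are the only trainable parameters
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # module names vary per architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()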
Anonymous
7/27/2025, 12:28:37 AM No.106039540
import-chat
import-chat
md5: a51011ca360899cd9284a7deb581b1f5🔍
!EXPERIMENT!

If you import this human-sourced ERP chat in SillyTavern, remove any system prompt or anything else (keep just the conversation), then delete or branch the chat at any given point (e.g. message #27), will your LLM continue it in character, maintaining the same style and good situational awareness?

https://files.catbox.moe/6axnn8.jsonl
Replies: >>106039561
Anonymous
7/27/2025, 12:30:10 AM No.106039561
>>106039540
>will your LLM continue it in character, maintaining the same style and good situational awareness?
Yeah, that's how I benchmark models, and even though 235B is noticeably better than anything before it, it still fails tests like this.
Anonymous
7/27/2025, 12:43:12 AM No.106039687
next week is going to be HUGE for local models
Anonymous
7/27/2025, 12:43:25 AM No.106039690
>>106037082
Doing god's work.
Anonymous
7/27/2025, 12:44:30 AM No.106039706
animated sex
Anonymous
7/27/2025, 12:48:47 AM No.106039744
Here's a small snippet which will parse SillyTavern's json world book into a human readable config file.
>https://files.catbox.moe/vlvhp1.py
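Not the linked snippet itself, but the idea is roughly this; the field names ("entries", "key", "comment", "content") are guesses at ST's world info export format and may need adjusting:

import json
import sys

def dump_world_book(path):
    # flatten a SillyTavern world book JSON into readable text
    with open(path, encoding="utf-8") as f:
        book = json.load(f)
    for uid, entry in book.get("entries", {}).items():
        keys = ", ".join(entry.get("key", []))
        comment = entry.get("comment", "")
        content = entry.get("content", "").strip()
        print(f"[{uid}] {comment}")
        print(f"  triggers: {keys}")
        print(f"  {content}\n")

if __name__ == "__main__":
    dump_world_book(sys.argv[1])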
Anonymous
7/27/2025, 12:51:42 AM No.106039772
Sex with the new intern?
Replies: >>106039782
Anonymous
7/27/2025, 12:52:36 AM No.106039782
>>106039772
dude shut up if HR catches us saying stuff like that we'll get fired
Anonymous
7/27/2025, 1:03:32 AM No.106039888
I don't see the number of active parameters for the new InternLM model mentioned anywhere.
Call me paranoid but this being exactly 235b (+6b vision) with the exact same tokenizer as Qwen3-235b makes me think this is another chink scam where they rebadged Qwen.
Replies: >>106039919 >>106039940
Anonymous
7/27/2025, 1:05:47 AM No.106039919
>>106039888
>https://huggingface.co/internlm/Intern-S1
>Built upon a 235B MoE language model (Qwen3) and a 6B Vision encoder (InternViT)
It's not like they are hiding it.
Anonymous
7/27/2025, 1:06:41 AM No.106039931
>>106039247
FALSE ALARM! Ikllama probably works. ST is the guilty nigger.
Anonymous
7/27/2025, 1:07:17 AM No.106039938
>>106037798
The cloud is just…
Anonymous
7/27/2025, 1:07:21 AM No.106039940
>>106039888
every internlm model has been qwen with vision bolted on so this is no surprise
Anonymous
7/27/2025, 1:11:15 AM No.106039965
now that the 235b dust has settled, instruct or thinker?
Replies: >>106039980 >>106040020 >>106040359 >>106040383 >>106040760
Anonymous
7/27/2025, 1:12:45 AM No.106039980
>>106039965
The more you think the more woke you are and LLMs are no different
Anonymous
7/27/2025, 1:17:23 AM No.106040020
>>106039965
I don't care about qwen outside of qwq. It's crazy how they had a great, soulful reasoner that they simply had to scale up, and they arrived at Qwen3 instead.
Replies: >>106040065 >>106040094
Anonymous
7/27/2025, 1:21:06 AM No.106040065
>>106040020
>qwq
>soulful
bait
Anonymous
7/27/2025, 1:23:39 AM No.106040094
>>106040020
they did, it's called 235b
only the small models were cheapo distilled afterthoughts
Replies: >>106040148
Anonymous
7/27/2025, 1:24:52 AM No.106040108
I've been using mistral nemo 12b tunes and it's getting boring. Is there a decent model with different DNA in that range to play with?
Replies: >>106040189
Anonymous
7/27/2025, 1:29:12 AM No.106040148
>>106040094
235b has nothing in common with qwq preview
Replies: >>106040275
Anonymous
7/27/2025, 1:31:12 AM No.106040173
Pulled my benis from Migu ass
Anonymous
7/27/2025, 1:31:44 AM No.106040179
>traced idle patterns
>guttural
Yawn.
Anonymous
7/27/2025, 1:32:52 AM No.106040189
>>106040108
Just go on and upgrade to 24B or 27B. It doesn't matter. 24B is the absolute minimum and it's still shit anyway.
Anonymous
7/27/2025, 1:40:45 AM No.106040275
eqbench writing similarity
eqbench writing similarity
md5: cd9c1d935be15b722f07c20478b5b7a8🔍
>>106040148
>qwq preview
yeah because no one used that garbage
assuming you mean regular qwq it very clearly does
Replies: >>106040318
Anonymous
7/27/2025, 1:45:36 AM No.106040318
>>106040275
qwq was more creative and the only downside was getting stuck in repetition, regular qwq and 235b are just more boring r1 distills
Anonymous
7/27/2025, 1:47:41 AM No.106040336
Imagine still talking about old 235B when the new 2507 checkpoint's here
Anonymous
7/27/2025, 1:49:08 AM No.106040350
>>106039444
keek
Anonymous
7/27/2025, 1:49:49 AM No.106040357
Imagine still talking about old 2507 when the new S1 checkpoint's here
Anonymous
7/27/2025, 1:49:53 AM No.106040359
>>106039965
instruct
Anonymous
7/27/2025, 1:52:26 AM No.106040383
>>106039965
Instruct for RP
Coder for run of the mill coding
Thinking for hard shit
Qwen actually did something right. I'm shocked too
Replies: >>106040449
Anonymous
7/27/2025, 1:56:59 AM No.106040435
Never mind again. Ikllama latest version is fucked on my pc. I downloaded some fork from 4 days ago and it works properly.
Anonymous
7/27/2025, 1:58:38 AM No.106040449
>>106040383
>Instruct for RP
>Thinking for hard shit
But my penis is hard... so....
Anonymous
7/27/2025, 2:26:03 AM No.106040677
god I wish I could safely share what I've seen. it's fine... only have to live like this for a week tops
Replies: >>106040747 >>106040752 >>106040766 >>106040776 >>106040787 >>106040830
Anonymous
7/27/2025, 2:33:26 AM No.106040747
>>106040677
Whisper sweet nothings about the strawberry-flavored OpenGPT in my ear
Anonymous
7/27/2025, 2:33:50 AM No.106040752
>>106040677
Okay mr fbi reviewing the epstein files
Anonymous
7/27/2025, 2:34:35 AM No.106040760
>>106039965
Thinker for RP if your RP is not too unsafe.
Instruct for RP if your RP is very unsafe, and use something like "write 2 paragraphs" to prevent it from doing one liners.
Replies: >>106040808
Anonymous
7/27/2025, 2:35:05 AM No.106040766
>>106040677
CONFIRMED: OpenAI's NEW STRAWBERRY GPT5 AGI will bring about the APOCALYPSE and END ALL HUMAN LIFE in "A WEEK TOPS"
Anonymous
7/27/2025, 2:35:41 AM No.106040776
>>106040677
ok sam
Anonymous
7/27/2025, 2:36:47 AM No.106040787
>>106040677
*delayed by a chinese model taking a dump all over it*
Anonymous
7/27/2025, 2:38:22 AM No.106040808
>>106040760
>unsafe
why would qwen care whether the characters are wearing condoms?
Anonymous
7/27/2025, 2:39:26 AM No.106040819
I am one of the testers for openai's new model. I asked it what a mesugaki is and an hour later Sam emailed me personally and invited me to a party on some island or something.
Should I go?
Replies: >>106040825
Anonymous
7/27/2025, 2:40:04 AM No.106040825
>>106040819
RUN
Anonymous
7/27/2025, 2:41:17 AM No.106040830
>>106040677
OpenAI hype squad again?
I'll register my prediction: I won't be very surprised by it.
There are a number of things we know would give impressive results but nobody has the balls to scale up. OpenAI could do it, but I'd give it a sub-10% chance that they do anything I'd personally consider important. Even if they do, I'd expect it to get replicated once the others notice they were asleep at the wheel again - plenty of people have already said this or that should be done, but nobody did it. Once OAI does it, everyone else copies the obvious thing, instead of preemptively betting money on scaling up the things they should already believe are good. Meh, 5 years of this since summer 2020.
Anything else? If not that, I expect some incremental improvement that's cool and all, but not earth-shattering. And not really relevant to /lmg/, since it'll be corpo APIs and the usual OAI slop. I wouldn't expect their open-source model to be much better than the other stuff either; probably heavily positivity- and safety-slopped, making DeepSeek the winner again here for not doing that.
Replies: >>106040845
Anonymous
7/27/2025, 2:43:10 AM No.106040845
>>106040830
I'll also say that if you truly want to post it, just use the gay proxy and post anonymously through Tor; it won't be traced back to you unless the information you want to share was known only to you and very few other people.
Replies: >>106040867
Anonymous
7/27/2025, 2:46:08 AM No.106040867
>>106040845
Have you considered, even for a moment, that he was just larping and didn't actually see anything?
Replies: >>106040904
Anonymous
7/27/2025, 2:46:12 AM No.106040871
Guys my uncle works for open AI and he said that the open model will be gpt5o mini and that it knows how to RP as a Mesugaki Nala while quoting Castlevania: SOTN and that's why it has been delayed.
Replies: >>106040891 >>106040934 >>106040948
Anonymous
7/27/2025, 2:48:34 AM No.106040891
>>106040871
If I give you my chocolate lunch milk for a week can you ask your uncle to leak the RP capable gpt5o weights?
Anonymous
7/27/2025, 2:50:07 AM No.106040904
>>106040867
I assumed he was larping, but on the off chance he was not, I reminded him that he has an easy way to anonymously leak.
Anonymous
7/27/2025, 2:52:03 AM No.106040920
>chocolate
For me, it's strawberry milk ;)
Anonymous
7/27/2025, 2:53:24 AM No.106040934
>>106040871
>it knows how to RP as a Mesugaki Nala while quoting Castlevania: SOTN
audible lol
Anonymous
7/27/2025, 2:54:20 AM No.106040948
>>106040871
How many watermelons can it hold? This is make or break information.