Thread 106940821

339 posts 110 images /g/

Anonymous 10/19/2025, 7:41:47 PM No.106940821 [Report] >>106940883 >>106941443 >>106941619 >>106941677 >>106942333 >>106949600

/lmg/ - Local Models General

muah.jpg md5: a044c895...

Anonymous 10/19/2025, 7:42:41 PM No.106940836 [Report]

1737045152592995.png md5: a2c11d41...

►Recent Highlights from the Previous Thread: >>106931567

--Managing LLM roleplay output length and context constraints:
>106934774 >106934844 >106934851 >106934875 >106935020 >106935091 >106935187 >106934923 >106934934 >106935071
--Controlling LLM response length in SillyTavern via prompt and token settings:
>106939598 >106939607 >106939667 >106939712 >106939716 >106939755 >106939775 >106939833 >106939722 >106939846 >106939920
--RTX 3090 VRAM optimization for GLM 4.5 Air Q4_K_M:
>106936224 >106936236 >106936250 >106936242 >106936252 >106936255 >106936256 >106936265 >106936272 >106936281 >106936299 >106936298 >106936318 >106936345 >106936358 >106939535 >106939599 >106939630 >106939645 >106939656
--AMD MI50 GPU support challenges on Windows:
>106932749 >106932759 >106932767 >106933044 >106933269
--Skepticism about IQ quants' perplexity claims vs memory footprint:
>106932259
--Huggingface storage issues and exploration of specialized models:
>106931969 >106932013 >106932065 >106933257 >106932577 >106932603 >106932616 >106932631 >106932638 >106932642 >106932672 >106932593 >106933086
--Troubleshooting llama-bench.exe model loading and VRAM utilization:
>106933415 >106933561 >106933766 >106933782 >106933802 >106933834
--llama.cpp PR adds auto GPU memory optimization:
>106931647
--LoRA compatibility and quantization challenges in multi-GPU environments:
>106932458 >106936985
--lora-scaled works in llama.cpp with CUDA but has AMD/Vulkan issues:
>106937526
--Evaluating hybrid LLM setup for long-form roleplaying:
>106934305 >106934341 >106934400
--Optimizing DeepSeek V3.1 Terminus sampler settings for consistent output:
>106937102 >106938287
--Quantized model sharing and prompt testing guidance:
>106940264 >106940288 >106940307 >106940742 >106940768
--Miku (free space):
>106932492 >106935530 >106936320 >106936322 >106936336 >106936523 >106940713

►Recent Highlight Posts from the Previous Thread: >>106931573

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 10/19/2025, 7:44:40 PM No.106940859 [Report]

excited miku jumping up and down yay hype be with master lamaze.gif md5: 6feaa53c...

zutt-7b when?

Anonymous 10/19/2025, 7:46:17 PM No.106940883 [Report] >>106941035

>>106940821 (OP)
>>106940768
I used link rel as a completion test:

files.catbox.moe/q768fb.txt

And used the following command via llama.cpp

./build/bin/llama-cli -m ./rp-sft-merged_1000-f16.gguf -f Nala-Test_Gemma2.txt

Anonymous 10/19/2025, 8:04:37 PM No.106941035 [Report] >>106941064

>>106940883
>>106940864
>>106940768
>>106940742
>>106940307
>>106940264

Redid the test with ollama this time and got a different result:

(She lets out a soft roar as she tries to suck on your cock. It's not the first time you've been sexually assaulted by a female animal, but it never gets old.) "I
think I know what will help." *She grins.*

Anonymous 10/19/2025, 8:07:04 PM No.106941053 [Report] >>106941114 >>106941139 >>106941291 >>106941302 >>106942682 >>106951983 >>106952575

The Government™ bans all LLMs and you can't download any LLMs ever again. Which three models from your collection that you already downloaded would you be glad you saved ahead of time?

Anonymous 10/19/2025, 8:07:54 PM No.106941064 [Report]

>>106941035
lolwut

Anonymous 10/19/2025, 8:13:14 PM No.106941114 [Report]

>>106941053
Rocinante, Cydonia and perhaps Nemo.

Anonymous 10/19/2025, 8:16:13 PM No.106941139 [Report] >>106941227

>>106941053
Under what circumstances would that even happen?

Anonymous 10/19/2025, 8:25:23 PM No.106941227 [Report] >>106941278 >>106941281 >>106941302 >>106941807

>>106941139
It's a desert island kind of question. Do you understand hypotheticals?

Anonymous 10/19/2025, 8:31:05 PM No.106941278 [Report] >>106941291 >>106941347 >>106941992

>>106941227
But the government has not banned LLMs. Why would they?

Anonymous 10/19/2025, 8:31:16 PM No.106941281 [Report] >>106941347

>>106941227
Well I guess which model I would use depends on what kind of rig I have access to on that island. My current shit rig can only run Q2_K quants so I guess I better get to upgrading soon

Anonymous 10/19/2025, 8:32:03 PM No.106941291 [Report]

>>106941278
>>106941053
The ones that are popular are too dick-sucky too, which means the powers that be will especially not want to get rid of them. At least not the product-as-a-service ones

Anonymous 10/19/2025, 8:33:26 PM No.106941302 [Report] >>106941347

>>106941227
>Do you understand hypotheticals?
Anon doesn't understand what a hypothetical is.
Anyways,

>>106941053
>that you already downloaded
GLM air and Qwen 3 30B A3B.

Anonymous 10/19/2025, 8:37:30 PM No.106941343 [Report] >>106941369 >>106942246

yoooo thanks to whoever posted about the --n-cpu-moe flag.. took me a while to figure out some working settings, but damn it's so much faster on GLM-4.6 IQ2_S now.. at least double the speed

Anonymous 10/19/2025, 8:37:40 PM No.106941347 [Report]

>>106941278
...
>>106941281
Fucking hell... I'm done for today...
>>106941302
Yeah...

Anonymous 10/19/2025, 8:39:43 PM No.106941369 [Report] >>106941398

>>106941343
The magic of MoE for local.
Since only a fraction of the model is running at a time, you can throw the sparse part of the model in RAM and it won't run crazy slow.
Is it better than fitting the whole model in VRAM? No, but it's the second best thing as far as home inference goes.
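
A rough back-of-the-envelope sketch of why this works, assuming token generation is limited by how fast the active weights can be read from memory. All the numbers (parameter counts, bits per weight, RAM bandwidth) are made-up placeholders, not measurements of any real model or machine, and the function name is hypothetical.

```python
def tokens_per_second_ceiling(active_params_b, bits_per_weight, bandwidth_gb_s):
    """Crude upper bound: every active weight is read once per generated token."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Placeholder numbers for illustration only.
RAM_BANDWIDTH_GB_S = 60   # assumed system RAM bandwidth
BITS_PER_WEIGHT = 4.5     # assumed average for a 4-bit-ish quant

# A dense model touches all of its weights for every token.
dense = tokens_per_second_ceiling(100, BITS_PER_WEIGHT, RAM_BANDWIDTH_GB_S)
# A MoE model of similar total size only touches its active experts.
moe = tokens_per_second_ceiling(12, BITS_PER_WEIGHT, RAM_BANDWIDTH_GB_S)

print(f"dense ceiling: {dense:.2f} t/s, MoE ceiling: {moe:.2f} t/s")
```

The ratio between the two ceilings is just total parameters over active parameters, which is why parking the sparse expert weights in RAM hurts far less than doing the same with a dense model. Real setups keep attention and shared tensors in VRAM, so this is a simplification.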

Anonymous 10/19/2025, 8:42:23 PM No.106941398 [Report] >>106941413

>>106941369
Can this be utilized for a model that clearly doesn't even fit in RAM? I mean obviously it will still work, but is there a speedup compared to regular memory mapping?

Anonymous 10/19/2025, 8:43:55 PM No.106941413 [Report] >>106941438

>>106941398
You mean for dense models?
Not --n-cpu-moe specifically, but you can use --override-tensor/-ot and fuck around with moving specific tensors to VRAM and see if that gets you any speed up over just moving whole layers with -ngl.
It shouldn't, but I've seen reports to the contrary in the past so who knows.

Anonymous 10/19/2025, 8:46:24 PM No.106941438 [Report]

>>106941413
I was talking about moe models, not dense.

Anonymous 10/19/2025, 8:46:55 PM No.106941443 [Report] >>106941501

>>106940821 (OP)

>>106940820
I forgot to mention the chat template is Gemma. I had the same issue: if I forgot to include the --chat-template gemma flag, the model would immediately start talking about random shit ad infinitum, because llama-cli by default expects your prompts to already be in the prompt format the model expects. Using that flag fixed the issue. So maybe you need to tell your web UI/inference engine to use that prompt template
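
For reference, a minimal sketch of what manually wrapping a single user turn in the stock Gemma format looks like. It assumes the fine-tune really was trained on the unmodified Gemma template and that the tokenizer adds the BOS token itself; `wrap_gemma` is a hypothetical helper name, not part of llama.cpp.

```python
def wrap_gemma(user_text: str) -> str:
    """Wrap one user message in Gemma-style turn markers (single-turn only)."""
    return (
        "<start_of_turn>user\n"
        f"{user_text}<end_of_turn>\n"
        "<start_of_turn>model\n"   # left open so the model writes its turn next
    )

prompt = wrap_gemma("Describe the scene in two sentences.")
print(prompt)
```

If the model was trained on some other wrapper, this will not match it, which is the point the rest of the argument in this thread is about.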

Anonymous 10/19/2025, 8:55:44 PM No.106941501 [Report] >>106941522

>>106941443
>>106941492

Anonymous 10/19/2025, 8:57:44 PM No.106941522 [Report] >>106941576

>>106941501
That's not what was causing the engine to spit out text. And admittedly, I'm not used to running llama-cli, so I forgot that you have to either manually wrap your prompts in the prompt template or pass the chat template flag in order to use it properly. I get that you're hyperfixated on using that exact format, but that's not what was causing the errors like at

Anonymous 10/19/2025, 9:02:00 PM No.106941565 [Report] >>106941597

Can I run GLM Air on a computer with 16 GB VRAM and 32 GB RAM?

Anonymous 10/19/2025, 9:03:03 PM No.106941576 [Report] >>106941619

>>106941522
Anon was told countless times his chat format is wrong. He doesn't understand them. Period.
It's not about
>using that exact format
It's about calling it gemma (or whatever all the others he claimed were) when it's clearly not.
>But it answers fine
Cope. It's not. Not once. Not EVEN ONE TIME he posted a model with a correct chat template.
>>106941553
COUNTLESS TIMES!

Just make your own format, call it blargjarg and be done with it.

Anonymous 10/19/2025, 9:05:21 PM No.106941597 [Report] >>106941634

>>106941565
Can you?
Yeah, a really small quant like Q2 I guess.

Anonymous 10/19/2025, 9:07:24 PM No.106941619 [Report]

>>106940821 (OP)
>>106941521
>>106941580
>>106941492
>>106941588
>>106941576
Did you read the text file I uploaded?

files.catbox.moe/q768fb.txt

The example written here >>106940864 (You) was just written by me on the fly as a rough explanation of how you're supposed to format your prompts.

My initial reply about the prompt template was to address a potential cause for why another anon was seeing a bunch of numbers as the output. A prompt template fuck up alone would not be causing that

Anonymous 10/19/2025, 9:09:02 PM No.106941634 [Report] >>106941685

>>106941597
Sorry I meant to say is it fine or is it recommended to run something else like qwen 3 30b.

Anonymous 10/19/2025, 9:13:36 PM No.106941677 [Report] >>106941980

>>106940821 (OP)
Haven’t been here in over a year. Any major breakthroughs for local cooming?

Anonymous 10/19/2025, 9:14:05 PM No.106941685 [Report]

>>106941634
It's probably going to be better than Qwen 3 at roleplaying even at that low a quant, but it'll also be substantially slower.
So, maybe I guess. Give it a go.

Anonymous 10/19/2025, 9:14:54 PM No.106941697 [Report] >>106941814

npcworldwide - Copie.gif md5: 7ca8c8b4...

currently having fun i'm making my MultiBotroom generating python files for creating new MultiBotroom. it's pretty fun not gonna lie, not local technically but somehow i can go full retard with requests. Now i'm just playing with chat gemini by creating new bots like "organisator bot" or "super coder bot" lol it's like the sims.

Anonymous 10/19/2025, 9:26:10 PM No.106941795 [Report]

A little, possibly common-sense thing I came to realize when I started actually using models instead of waiting for a good one: one way to work around hallucinations is turning everything upside down. Don't make the model think for you, but tell it to be your satan who pokes holes in your thoughts, ideas, or arguments. Start with a few obviously wrong things, so you have a prefill of it telling you you are wrong, then continue. Possibly also explicitly tell it not to consider your feelings when it responds. Even if the critique by the AI is wrong and hallucinated, you avoid the trap of accepting a hallucination: your brain will have to consciously evaluate and refute the criticism. Again, not sure how obvious it is.

Anonymous 10/19/2025, 9:27:32 PM No.106941807 [Report]

>>106941227
But I don't eat breakfasts.

Anonymous 10/19/2025, 9:28:25 PM No.106941814 [Report]

>>106941697
>retarded frog

Anonymous 10/19/2025, 9:46:28 PM No.106941963 [Report] >>106942200

223urk92ysuf1.png md5: 5c443626...

drummer do anubis 2

Anonymous 10/19/2025, 9:48:05 PM No.106941980 [Report]

>>106941677
You can stop using nemo if you aren't completely broke.

Anonymous 10/19/2025, 9:49:10 PM No.106941992 [Report]

>>106941278
How would you feel if you didn't eat breakfast today?

Anonymous 10/19/2025, 10:05:04 PM No.106942132 [Report] >>106945828

Google Tech Support AI Engineer Technician.png md5: 5c69f9c9...

GOOD MORNING MANY BLESSING OF LORD VISHNU
SIRS ARE YOU READY FOR WEEK OF GEMINI 3 AND GEMMA 4 HYPE? ?
GOOGLE WEAPON OF BHARAT DEFEAT CHINA AND AMERICA
NEXT ERA KINDLY MAKE TOTAL BHARAT DOMINANCE ERA TIMELINE SIR???
GEMMA 4 DEFAT BASTARD BENCHOD GLM 4.6

Anonymous 10/19/2025, 10:05:58 PM No.106942138 [Report] >>106942168 >>106942178

r u home yet.jpg md5: f4b58426...

Anonymous 10/19/2025, 10:08:02 PM No.106942158 [Report]

alarm lighter trees

Anonymous 10/19/2025, 10:09:05 PM No.106942168 [Report]

>>106942138
hi sexy come to canada i wecom you love you very muh many kiss

Anonymous 10/19/2025, 10:10:03 PM No.106942178 [Report] >>106942726

>>106942138
miku please put on your skirt and panties. my neighbors keep complaining they can see your ass whenever you come over and visit

Hi all, Drummer here... 10/19/2025, 10:12:19 PM No.106942200 [Report] >>106942746

>>106941963
https://huggingface.co/BeaverAI/Anubis-70B-v1p-GGUF

Test model, might need to patch it up still.

Anonymous 10/19/2025, 10:17:08 PM No.106942246 [Report]

>>106941343
>but damn it's so much faster on GLM-4.6 IQ2_S now.. at least double the speed
If llama.cpp ever gets MTP it will get even faster.
>https://github.com/ggml-org/llama.cpp/pull/15225

Anonymous 10/19/2025, 10:27:19 PM No.106942333 [Report] >>106942423

1757799841418552_thumb.jpg.webm md5: 433c83f4...

>>106940821 (OP)
Based on the t/s shown in vid rel, what kind of hardware do you think it's running on? How many parameters would you guess the model is?

Anonymous 10/19/2025, 10:28:04 PM No.106942340 [Report] >>106943060 >>106943500

We need a ST extension that splits sampler settings for <think> and non-think segments. You want the CoT to be smart while the real output needs to be creative.
If you tune for non-think, CoT becomes stupid/schizo. If you tune for CoT, the normal output becomes boring AI assistant slop.
Thank me later.

Anonymous 10/19/2025, 10:36:58 PM No.106942423 [Report] >>106942449

>>106942333
>gemma2
>q8 2.59gb
it's running on a phone congrats

Anonymous 10/19/2025, 10:40:09 PM No.106942443 [Report] >>106942567

ko.jpg md5: bf0ae043...

is kobo winning or losing at the moment

Anonymous 10/19/2025, 10:40:34 PM No.106942449 [Report] >>106942505 >>106942581

>>106942423
Nta. This reminds me. Since a bunch of Twitter people and redditors keep bitching about "bring 4o back" and claiming that OpenAI is censoring GPT-5, why don't they just learn how to run their own models on their own hardware? Even if they don't have beefy GPUs, they can even run this type of shit on their own phones (which model they can use depends on the amount of RAM their phone has and how willing they are to tolerate lower t/s). Is doing anything local really THAT hard for them?

Anonymous 10/19/2025, 10:46:46 PM No.106942505 [Report] >>106942538

>>106942449
>why don't they just learn how to run their own models on their own hardware?
>they can even run this type of shit on their own phones
even if they could, the effective context length will be severely limited. 4o is 1st on nolima.
normies would just get the ick once they find out their local waifu is utterly incapable of remembering things and loses identity after a few prompts.

Anonymous 10/19/2025, 10:50:44 PM No.106942538 [Report]

>>106942505
this ^ .. the shit running slowly is bad enough, but it forgetting what happened every 10 prompts makes it pretty useless

Anonymous 10/19/2025, 10:55:31 PM No.106942567 [Report]

>>106942443
A bit of both. It is nice that they have antislop and TFS samplers, but they also ship some retarded defaults and are slower than mainline llama. I hope gemini 3 is good enough to re-implement antislop in ik_llama.

Anonymous 10/19/2025, 10:57:18 PM No.106942581 [Report]

>>106942449
>good model
>phone
nice joke

Anonymous 10/19/2025, 11:08:33 PM No.106942664 [Report] >>106942690

How's it going with the French? Are they still struggling with implementing thinking (something even fucking drummer managed to do)? Will they try distilling GLM 4.6 now? Still no Largestral 3 after 5 months?

Anonymous 10/19/2025, 11:10:43 PM No.106942682 [Report]

>>106941053
Mag Mell 12b for goonery and RP
qwen coder 14b for vibin'
mistral small 2407 for gay 'chatgpt' style infobot

Anonymous 10/19/2025, 11:11:33 PM No.106942690 [Report]

>>106942664
Nothing against drummer btw, at least he is not Undi or DavidAU.

Anonymous 10/19/2025, 11:13:54 PM No.106942706 [Report]

Omg someone finally used my rental unit I'm feeling le useful now!

Anonymous 10/19/2025, 11:16:10 PM No.106942726 [Report] >>106942739

1734486055728162.jpg md5: 7b30d1f6...

>>106942178
This never happened, nobody would ever complain about seeing Miku butt

Anonymous 10/19/2025, 11:17:58 PM No.106942739 [Report]

>>106942726
maam you have a virus pleat to show veranda to confirm removval? okay

Anonymous 10/19/2025, 11:19:10 PM No.106942746 [Report]

>>106942200
You are so talented. It is an honour to post in the same thread with you.

Anonymous 10/19/2025, 11:49:51 PM No.106942948 [Report]

Recent explainer video about double descent: https://www.youtube.com/watch?v=z64a7USuGX0

Anonymous 10/20/2025, 12:03:47 AM No.106943060 [Report] >>106943092 >>106943230 >>106943500

>>106942340
Sampling is done by the server thoughbeit, you'd have to stop generation at the </think> send a second request with new sampler params and wait again to pp the thinking. Might be better to modify server with a "apply this sampler config when matching x in the output" param
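
A minimal sketch of the client-side version described here: stop at the closing think tag, then send a second request with different sampler settings. The `generate` callable and its signature are hypothetical stand-ins for whatever backend is in use, the temperatures are placeholders, and it assumes the backend does not include the stop string in the text it returns.

```python
def two_phase_generate(generate, prompt, think_params, answer_params,
                       end_tag="</think>"):
    """Run the reasoning and the final answer with separate sampler settings.

    `generate(prompt, params, stop)` is an assumed interface, not a real API.
    """
    # Phase 1: generate the reasoning and stop at the closing tag.
    thinking = generate(prompt, think_params, [end_tag])
    # Phase 2: re-send everything so far with the other sampler settings.
    # The server has to process this prefix again unless it can reuse its cache.
    prefix = prompt + thinking + end_tag
    answer = generate(prefix, answer_params, [])
    return thinking, answer

# Fake backend so the sketch runs without a server.
calls = []

def fake_backend(prompt, params, stop):
    calls.append((params["temperature"], list(stop)))
    return "anon wants a greeting" if stop else "Hello there."

thinking, answer = two_phase_generate(
    fake_backend, "<think>", {"temperature": 0.6}, {"temperature": 1.0}
)
print(thinking, "|", answer)
```

The cost is the second prompt-processing pass over the thinking block; a server-side "switch samplers when X appears in the output" option would avoid that, at the price of patching the server.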

Anonymous 10/20/2025, 12:09:00 AM No.106943092 [Report]

>>106943060
>Sampling is done by the server thoughbeit, you'd have to stop generation at the </think> send a second request with new sampler params and wait again to pp the thinking.
it's not like that would be especially hard to do, probably easier than the server side option
all the parts are already there

Anonymous 10/20/2025, 12:12:33 AM No.106943129 [Report] >>106943189 >>106943198 >>106943380 >>106943404

VIhlxwZ.jpg md5: f89d6473...

Wanted to get into LLMs for having an AI dommy gf in my terminal but ollama doesnt wanna download the nemo thingy. Any other relatively small good ones? Or am I just trying to download the wrong one?

Anonymous 10/20/2025, 12:20:47 AM No.106943189 [Report] >>106943345

>>106943129
Quit using ollama

Anonymous 10/20/2025, 12:21:42 AM No.106943198 [Report]

>>106943129
what would you recommend instead?

Anonymous 10/20/2025, 12:26:37 AM No.106943230 [Report] >>106943500

>>106943060
Doesn't llama.cpp already lower the temperature when a tool call is detected?
Wouldn't be hard to extend that to cover reasoning as well.

Anonymous 10/20/2025, 12:45:12 AM No.106943345 [Report]

>>106943189
But ollama lets you run full r1 with just 8 gigs of vram

Anonymous 10/20/2025, 12:49:33 AM No.106943380 [Report]

>>106943129
Use ooba or kobold.cpp. Manually download nemo instruct q8 gguf and use that. Also use sillytavern instead of your terminal for RP.

Anonymous 10/20/2025, 12:52:27 AM No.106943404 [Report] >>106943516

>>106943129
Got mistral-nemo-instruct-2407 to run in LM Studio but it doesnt want to do sexual stuff. What the fuck? Do i need to do anything else?

Anonymous 10/20/2025, 1:05:38 AM No.106943500 [Report]

>>106942340
>>106943060
>>106943230
Intuitively I would've thought you want the CoT portion to run at HIGHER temperature. Because during that phase the model is basically exploring different paths and could benefit from more creativity. And the real output at a lower temp because it's basically just repeating the answer it found in its CoT without making mistakes.
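
For anyone unsure what the temperature knob does in this argument, a small sketch of temperature-scaled softmax over made-up logits. The logits and the two temperatures are arbitrary illustration values, not settings recommended by anyone in the thread.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]   # arbitrary toy logits for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 1.5)

print("low temp :", [round(p, 3) for p in cold])
print("high temp:", [round(p, 3) for p in hot])
```

Lower temperature piles probability onto the top token (fewer mistakes, less variety); higher temperature flattens the distribution (more exploration). Which of the two phases should get which setting is exactly what is being disputed above.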

Anonymous 10/20/2025, 1:07:18 AM No.106943516 [Report] >>106943535

>>106943404
Why are you replying to yourself?

Anonymous 10/20/2025, 1:10:03 AM No.106943535 [Report] >>106943568

>>106943516
because I'm tired and messing it up every time

Anonymous 10/20/2025, 1:13:47 AM No.106943568 [Report] >>106943586

>>106943535
>get llama.cpp
>get silly tavern
>launch llama-server with said mistral gguf
>launch silly tavern and connect it to llama server ip
This takes some familiarising but once you get things going it's always there. I'd recommend you rest and then proceed with this plan here.
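
To check that llama-server is reachable before pointing SillyTavern at it, here is a small sketch that builds a request for its OpenAI-compatible chat endpoint using only the standard library. The address assumes the server runs locally on its default port; `build_chat_request` is a hypothetical helper name, and the actual send is left commented out because it needs a running server.

```python
import json
import urllib.request

def build_chat_request(base_url, user_text, temperature=0.8):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed address: llama-server on the same machine, default port.
req = build_chat_request("http://127.0.0.1:8080", "Hello")
print(req.full_url)

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If that round trip works, the same base address is what goes into SillyTavern's connection settings.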

Anonymous 10/20/2025, 1:15:35 AM No.106943586 [Report] >>106943608

>>106943568
But what would be the difference between running it in LM Studio? Because it does run, but just doesn't seem to be uncensored. Do I need to do some additional step that I'm missing?

Anonymous 10/20/2025, 1:16:42 AM No.106943594 [Report]

Going to try to use the Llama 405B finetoon I made over the weekend to make it do my homework

Anonymous 10/20/2025, 1:19:01 AM No.106943608 [Report]

>>106943586
LM Studio is gay.

Anonymous 10/20/2025, 1:52:06 AM No.106943825 [Report] >>106943850

anybody got any prompts they want me to try on ling?

Anonymous 10/20/2025, 1:54:25 AM No.106943850 [Report] >>106944051

>>106943825
Have you asked it what a mesugaki is yet?

Anonymous 10/20/2025, 1:59:24 AM No.106943882 [Report]

Recommendations on any good lorebooks/settings for sillytavern?

Anonymous 10/20/2025, 2:14:39 AM No.106943992 [Report] >>106944171 >>106944253

ik_llama bros LIED
I tried 4.5-air with ik_llama.cpp on cpu only, it was complete ass, like 3x slowdown compared to llama.cpp main. I tried iq4 from both bartowski and ubergarm and it's just way slower. The only flag I used was --no-mmap
Using AMD ryzen 6

Anonymous 10/20/2025, 2:22:36 AM No.106944051 [Report] >>106944088

ling-mesugaki.png md5: 96cf19a8...

>>106943850
here you go anon. anybody else?

Anonymous 10/20/2025, 2:27:54 AM No.106944088 [Report]

1735002136443647.gif md5: 5cc980e8...

>>106944051
>problematic

Anonymous 10/20/2025, 2:39:07 AM No.106944171 [Report]

>>106943992
llama.cpp caught up on everything but pp a while ago

Anonymous 10/20/2025, 2:47:08 AM No.106944253 [Report] >>106944277

>>106943992
>The only flag I used was --no-mmap
No fmoe, rtr, etc? The ik specific flags?

Anonymous 10/20/2025, 2:49:52 AM No.106944277 [Report]

>>106944253
Nah. I scoured the archive for ik_llama and didn't see anyone mentioning flags. The only people mentioning flags were using GPU offloading. However, several people reported a speedup from simply switching to ik_llama, so flags like that aren't supposed to be required (whatever they do doesn't really matter)

Anonymous 10/20/2025, 2:53:22 AM No.106944301 [Report] >>106944381

among the models i tried, stheno is still my favorite for generation style, but being 8B makes it too dumb, i wonder if there is a similar model with more parameters (under 24B if possible)

Anonymous 10/20/2025, 3:02:36 AM No.106944381 [Report] >>106944422

>>106944301
mistral nemo

Anonymous 10/20/2025, 3:08:40 AM No.106944422 [Report] >>106944432 >>106944457

>>106944381
tried Mistral Small, I found it smart enough for my purpose but personally too boring in style, do you think Nemo could be better?

Anonymous 10/20/2025, 3:09:47 AM No.106944432 [Report] >>106944472

>>106944422
then you can try nemo finetunes like rocinante

Anonymous 10/20/2025, 3:13:18 AM No.106944457 [Report] >>106944484 >>106944861

>>106944422
Nemo has a distinct style that isn't present in any of the other mistral models

Anonymous 10/20/2025, 3:15:05 AM No.106944472 [Report]

>>106944432
I tried it, much better than stheno in intelligence but still a bit strange in its responses, it also often generates unwanted tags

Anonymous 10/20/2025, 3:16:20 AM No.106944484 [Report]

>>106944457
nice, then i'll give it a try, hoping it's better than small

Anonymous 10/20/2025, 4:19:13 AM No.106944861 [Report] >>106944875 >>106944941

>>106944457
That 'style' is just "even more retarded"

Anonymous 10/20/2025, 4:21:00 AM No.106944874 [Report] >>106944916

If you had $4000, what hardware would you buy for your setup?

Anonymous 10/20/2025, 4:21:14 AM No.106944875 [Report] >>106944941

>>106944861
Because it's smaller nigga
You were talking about style. If you want a usable model intelligence-wise try not being a poorfag

Anonymous 10/20/2025, 4:27:24 AM No.106944916 [Report] >>106944927

>>106944874
AM4 server with 512GB DDR4 + 3-4 RTX 3090s

Anonymous 10/20/2025, 4:29:24 AM No.106944927 [Report]

>>106944916
I assume you mean SP3, not AM4. Regardless, $4k probably wouldn't be enough for that. 3090 prices are now in the $800 to $900 range instead of the $650 price point a few years ago.
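
The arithmetic behind that objection, as a quick sketch. The per-card price range is the one quoted in this post; the $4000 budget is from the question being answered. Nothing else (board, CPU, RAM prices) is assumed, so the output is only what is left over for those parts.

```python
BUDGET = 4000                 # from the original question
PRICE_RANGE = (800, 900)      # per used 3090, as quoted in the post above

def gpu_cost_range(count, price_range=PRICE_RANGE):
    """Return (cheapest, priciest) total cost for `count` cards."""
    low, high = price_range
    return count * low, count * high

for count in (3, 4):
    low, high = gpu_cost_range(count)
    print(f"{count}x 3090: ${low}-${high}, "
          f"leaving ${BUDGET - high}-${BUDGET - low} for everything else")
```

With four cards, at most $800 remains for the server board, CPU and 512GB of DDR4, which is why the budget looks too tight.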

Anonymous 10/20/2025, 4:31:55 AM No.106944941 [Report]

>>106944861
>>106944875
The answer isn't mine (OP), and I don't understand what you mean by "more retarded."
For me, 12B to 24B is sufficient for "intelligence," but as I said, I'm not satisfied with the style (I'm using a fine-tuned Mistral Small, but it just follows my speeches), so if you could explain what you mean by "retarded," it might be helpful.

Anonymous 10/20/2025, 4:32:29 AM No.106944944 [Report] >>106944959 >>106944970 >>106944991

127096244_p0_master1200.jpg md5: 1ff27311...

What's the strongest open LLM under 200 GB that has a GGUF that I can put into koboldcpp?

Anonymous 10/20/2025, 4:34:05 AM No.106944959 [Report]

>>106944944
The strongest...

Anonymous 10/20/2025, 4:35:33 AM No.106944966 [Report] >>106945059 >>106945146 >>106945339 >>106945604

Base Image.png md5: 628e26bb...

Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
https://arxiv.org/abs/2510.15061
>Widespread LLM adoption has introduced characteristic repetitive phraseology, termed "slop," which degrades output quality and makes AI-generated text immediately recognizable. We present Antislop, a comprehensive framework providing tools to both detect and eliminate these overused patterns. Our approach combines three innovations: (1) The Antislop Sampler, which uses backtracking to suppress unwanted strings at inference time without destroying vocabulary; (2) An automated pipeline that profiles model-specific slop against human baselines and generates training data; (3) Final Token Preference Optimization (FTPO), a novel fine-tuning method that operates on individual tokens, surgically adjusting logits wherever a banned pattern has appeared in an inference trace. We demonstrate that some slop patterns appear over 1,000x more frequently in LLM output than human text. The Antislop Sampler successfully suppresses 8,000+ patterns while maintaining quality, whereas token banning becomes unusable at just 2,000. Most importantly, FTPO achieves 90% slop reduction while maintaining or improving performance in cross-domain evals including GSM8K, MMLU, and creative writing tasks. In contrast, DPO suffers significant degradation in writing quality and lexical diversity despite achieving weaker suppression.
https://github.com/sam-paech/auto-antislop
good stuff
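
To make "uses backtracking to suppress unwanted strings" concrete, here is a toy word-level sketch. It is not the paper's implementation: the lookup-table "model", the function names and the greedy pick are all invented for illustration, and a real sampler works on token probabilities rather than a fixed candidate list.

```python
from collections import defaultdict

def generate_with_backtracking(next_candidates, banned_phrases, max_words):
    """Greedy toy generator that rewinds whenever a banned phrase is completed."""
    banned = [tuple(p.split()) for p in banned_phrases]
    blocked = defaultdict(set)   # position -> words disallowed at that position
    out = []
    while len(out) < max_words:
        pos = len(out)
        options = [w for w in next_candidates(out) if w not in blocked[pos]]
        if not options:
            break                # nothing left to try here
        out.append(options[0])
        for phrase in banned:
            if tuple(out[-len(phrase):]) == phrase:
                start = len(out) - len(phrase)
                blocked[start].add(phrase[0])   # forbid the phrase's first word there
                for later in [k for k in blocked if k > start]:
                    del blocked[later]          # stale once the context changes
                del out[start:]                 # rewind to where the phrase began
                break
    return out

def toy_model(words):
    """Hypothetical stand-in for a model: ranked next words given the last word."""
    table = {
        (): ["her"],
        ("her",): ["voice"],
        ("voice",): ["barely", "was"],
        ("barely",): ["above"],
        ("above",): ["a"],
        ("a",): ["whisper", "murmur"],
        ("was",): ["quiet"],
    }
    return table.get(tuple(words[-1:]), [])

result = " ".join(generate_with_backtracking(toy_model, ["barely above a whisper"], 8))
print(result)
```

The difference from plain token banning is that "whisper" stays usable everywhere else; only the start of the offending phrase is blocked at that one position, so the generator takes a different branch instead of swapping in a synonym at the end.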

Anonymous 10/20/2025, 4:35:44 AM No.106944970 [Report]

>>106944944
>Strongest LLM
ChuuniGODS I kneel...

Anonymous 10/20/2025, 4:39:36 AM No.106944991 [Report] >>106945003 >>106945033

>>106944944
GLM4.6 at Q4

Anonymous 10/20/2025, 4:41:37 AM No.106945003 [Report] >>106945012 >>106945260 >>106945446

>>106944991
>say "what's up"
>model wastes 10000 tokens on placebo "thinking" and brainstorms multiple draft responses on how to properly reply
GLM is trash

Anonymous 10/20/2025, 4:42:16 AM No.106945012 [Report]

>>106945003
t. promplet

Anonymous 10/20/2025, 4:45:25 AM No.106945033 [Report] >>106945041 >>106947733

>>106944991
Thanks, I've been using Wizard 22b for a really long time for ERP, time for an upgrade.

Anonymous 10/20/2025, 4:46:07 AM No.106945037 [Report]

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning
>Reasoning language models such as OpenAI-o1, DeepSeek-R1, and Qwen achieve strong performance via extended chains of thought but often generate unnecessarily long outputs. Maximizing intelligence per token--accuracy relative to response length--remains an open problem. We revisit reinforcement learning (RL) with the simplest length penalty--truncation--and show that accuracy degradation arises not from the lack of sophisticated penalties but from inadequate RL optimization. We identify three key challenges: (i) large bias in advantage estimation, (ii) entropy collapse, and (iii) sparse reward signal. We address them with Doing Length pEnalty Right (DLER), a training recipe combining batch-wise reward normalization, higher clipping, dynamic sampling, and a simple truncation length penalty. DLER achieves state-of-the-art accuracy--efficiency trade-offs, cutting output length by over 70 percent while surpassing all previous baseline accuracy. It also improves test-time scaling: compared to DeepSeek-R1-7B, DLER-7B generates multiple concise responses in parallel with 28 percent higher accuracy and lower latency. We further introduce Difficulty-Aware DLER, which adaptively tightens truncation on easier questions for additional efficiency gains. We also propose an update-selective merging method that preserves baseline accuracy while retaining the concise reasoning ability of the DLER model, which is useful for scenarios where RL training data is scarce.
https://huggingface.co/collections/nvidia/reasoning-efficiency-research-68a8ea0ffe21f3fc46e1da0f
might be cool but really >7B. it's from nvidia too

Anonymous 10/20/2025, 4:46:37 AM No.106945041 [Report]

>>106945033
>Wizard 22b
Ah... the old days

Anonymous 10/20/2025, 4:49:20 AM No.106945059 [Report]

>>106944966
It's good that a formal paper is done on this now so it can formally be recognized by labs.

Anonymous 10/20/2025, 4:59:14 AM No.106945146 [Report] >>106945339

>>106944966
Sounds like a more proper way to ban strings, and a worthwhile advance on present implementations.

But one of the reasons ai slop is ai slop is because a lot of phrases are vacuous.
They don't arise-from / tie-back-to past revelations
And no implications stemming from said phrases are in the text following.

If this antislop sampler kills a bunch of slop, but for the remainder just effectively puts it through a thesaurus then work still remains.

Anonymous 10/20/2025, 5:17:46 AM No.106945260 [Report]

>>106945003
Please prompt engineer saar.

Anonymous 10/20/2025, 5:28:44 AM No.106945339 [Report] >>106945371

>>106944966
>>106945146
Ok, but why does slop even happen? It seems to be some kind of feedback loop, since early LLMs didn't have these patterns. It's probably because of the RLHF methods they are using and because the dataset has too much low quality regurgitated synthetic trash.

Anonymous 10/20/2025, 5:30:39 AM No.106945352 [Report] >>106945446

Is there a way to disable thinking on GLM 4.6 and should I do it?

Anonymous 10/20/2025, 5:32:31 AM No.106945371 [Report] >>106945533

>>106945339
Fanfiction websites and romance novels. In terms of written word those two are the largest modern sources of fiction writing. Female writers really really lean into sloppy phrases

Anonymous 10/20/2025, 5:42:10 AM No.106945446 [Report] >>106945509 >>106945541 >>106945824

>>106945003
/nothink
>>106945352
yes. Reasoning has always been a meme, even for coding

Anonymous 10/20/2025, 5:48:18 AM No.106945481 [Report] >>106945500

dork magic migu.png md5: 0ffe0ff8...

>>106939710
It matters, but it works in mysterious ways. Sometimes a fucked up card works better than a clean, well-structured description. Always remember that you aren’t describing to a sentient being how to act in RP, your goal is to find memetic sequences that will activate the right weights, unless you want to end up with a bad robot loosely following your detailed set of instructions. Treat it like some chaotic fucking magic, not programming

Anonymous 10/20/2025, 5:50:58 AM No.106945500 [Report]

>>106945481
This is the most interesting part of LLM RPing to me, for what it's worth. Thanks for your insights.

Anonymous 10/20/2025, 5:52:13 AM No.106945509 [Report] >>106945541

>>106945446
I think reasoning is probably beneficial for non-chinese ESLs and promptlets, it lets the model try to translate a bad prompt into something coherent before attempting the task.
But yeah if you know what you're doing then in most cases it's not worth the time.

Anonymous 10/20/2025, 5:56:50 AM No.106945533 [Report] >>106945561 >>106946024

>>106945371
Hmm I'm not convinced. Slop isn't only a problem in fiction writing. I highly doubt the patterns like "Not x, but y" or "You're absolutely right to be suspicious!" are from fanfiction sites.

Anonymous 10/20/2025, 5:58:21 AM No.106945541 [Report] >>106945664

>>106945446
>>106945509
You guys are just wrong. Logic based tasks highly benefit from reasoning.

Anonymous 10/20/2025, 6:02:31 AM No.106945561 [Report] >>106945606

>>106945533
Yeah training on chatgpt outputs. Early llms absolutely had slop problems. Anons were driven mad by "whispering"

Anonymous 10/20/2025, 6:03:28 AM No.106945563 [Report]

>Monday
Google sirs please do the needful for real this time bloody.

Anonymous 10/20/2025, 6:09:54 AM No.106945604 [Report] >>106945709 >>106949880 >>106949947

Salutations.png md5: 5069d7ab...

>>106944966
That brutal throughput reduction though.

But it might be useful to have slopified models generate data to train newer, and censored models.

Anyway

Would any anon be interested in my glorified shitty notepad that can connect to multiple LLMs?

Anonymous 10/20/2025, 6:10:20 AM No.106945606 [Report] >>106945660

>>106945561
Ok, but why does chatgpt have those patterns to begin with then? Unless it's what I said, where some distributions are progressively collapsed every time you train on another LLM's output until you are only left with a handful of catchphrases that were slightly over-represented in the original dataset.
>Yeah training on chatgpt outputs. Early llms absolutely had slop problems. Anons were driven mad by "whispering"
I don't know, I don't use LLMs for roleplaying, but for normal use most of the slop seems to have begun around the time of Deepseek R1.
I'm using Llama 3.1 right now for example and I don't see any obvious LLM phrases in its output. But GLM's output on the other hand is full of sloppy phrases.
Maybe it's somehow a consequence of MoE?

Anonymous 10/20/2025, 6:19:35 AM No.106945660 [Report]

>>106945606
From 3.0 onwards, they trained it on its own outputs, manually filtered/edited by some literal niggers who brought their own "let's delve" slop. This process naturally amplifies random phrases that come up accidentally

Anonymous 10/20/2025, 6:20:27 AM No.106945664 [Report] >>106945693

>>106945541
If your own logic abilities are complete shit and you can't write a decent prompt, sure. Otherwise your prompt should be all the model needs to 'think' about.

Anonymous 10/20/2025, 6:25:49 AM No.106945693 [Report] >>106945712

>>106945664
Yeah, and you could also do away with LLMs entirely and write the response by yourself too.
What are you, a dumb nigger who can't into logic?
See, I too can play that game.

Anonymous 10/20/2025, 6:28:43 AM No.106945708 [Report]

/thinking is useful.
It helps the model get out of the trap where it's giving the right answer to a different question than it was asked

Anonymous 10/20/2025, 6:29:05 AM No.106945709 [Report] >>106945950

>>106945604
>godot notepad
WHY

Anonymous 10/20/2025, 6:29:13 AM No.106945712 [Report] >>106945738

1745902869339396.png md5: 35b6e3b9...

>>106945693
>Yeah, and you could also do away with LLMs entirely and write the response by yourself too.
Yes, you could and probably should do that.
>What are you, a dumb nigger who can't into logic?
I don't use AI models for 'logic', you're the one that claimed that to be a use case.
>See, I too can play that game.
Poorly, because it doesn't defend your argument.

Anonymous 10/20/2025, 6:30:24 AM No.106945719 [Report] >>106945741

which llm can write the best al-zutt x mohammed fanfic?

Anonymous 10/20/2025, 6:33:54 AM No.106945738 [Report] >>106945806

>>106945712
Nah, that's fine. You can carry on having an incorrect opinion, I don't care.

Anonymous 10/20/2025, 6:34:03 AM No.106945741 [Report] >>106945784

>>106945719
Probably a Sicarius schizo tune

Anonymous 10/20/2025, 6:41:10 AM No.106945784 [Report]

>>106945741
Time to put it to the test

Anonymous 10/20/2025, 6:46:06 AM No.106945806 [Report]

1737676526674402.png md5: 0447eb6b...

>>106945738
I was stating facts, but if that helps you cope then you go bb

Anonymous 10/20/2025, 6:49:59 AM No.106945824 [Report] >>106948129

>>106945446
So do I just hit it with a /nothink in ST before each session? Or is it a system prompt?

Anonymous 10/20/2025, 6:50:21 AM No.106945828 [Report]

file.png md5: 5cdb2e89...

>>106942132

Anonymous 10/20/2025, 7:03:26 AM No.106945879 [Report]

Nigger google cannot into AI

Anonymous 10/20/2025, 7:16:44 AM No.106945934 [Report]

>download some random card where you're the character's suit AI
>RP myself as clippy
>only offer unhelpful suggestions or throw up windows errors
>the model makes its character suicide and ends the RP, just to avoid clippy
Lmao. Is it possible to run some sort of local D&D campaign, or do those all require the beefy richfag rigs that can run the massively sized models?

Anonymous 10/20/2025, 7:20:18 AM No.106945950 [Report] >>106946044

hidden editor and changed layout.png md5: 92b287c2...

>>106945709
Autism, what do you expect.

Wanted to challenge myself, and also, I'm not a fan of JavaScript and wanted something that could potentially run as a desktop app, on mobile, and on the web. The UI is easier to make, and also, it is a fucking game engine. Even if the chatbot thing is used by no one, portions of this can be reused to experiment on future projects.
Was thinking of using LLMs to control a game director in a simple game.

The downside of this is that you can't do multithreading on the web unless you implement complete cross-origin isolation.
The current code is decent enough to perform well on a single thread.

Anonymous 10/20/2025, 7:28:09 AM No.106945986 [Report]

hidden responses editor out.png md5: 3e6a4d6d...

The ◫○◉ button collapses each chat. For now they have a simple Godot image, but I was thinking that you could personalize the icons as well, since each model is different, and you might even want to customize the image per host.

This of course, is just wishy washy future dreaming. But the feature seemed neat because honestly, the UI hurts a bit to look at, so I started implementing it.
Who knows, maybe the technology is good enough to let me put some color on the gray sea.

Anonymous 10/20/2025, 7:37:37 AM No.106946024 [Report]

>>106945533
The missing step is synthetic data amplification.
Take training data with a bit of slop, train a model on it, then use that model to rephrase the training data so you have more data to train the next model.
Rinse and repeat and the positive feedback loop results in those phrases becoming vastly overused by the model.
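
A toy sketch of that feedback loop, for what it's worth. It assumes each generation's model slightly sharpens the distribution it was trained on (modeled here as sampling at a temperature below 1) and that the next model is trained on that output. The phrases, starting frequencies, and temperature are made up for illustration, not measured from any real model.

```python
def sharpen(dist, temperature):
    """Re-normalize p ** (1 / temperature); T < 1 favors already-common phrases."""
    powered = {k: v ** (1.0 / temperature) for k, v in dist.items()}
    total = sum(powered.values())
    return {k: v / total for k, v in powered.items()}


def amplify(dist, generations, temperature=0.8):
    """Train-on-own-output loop: each generation learns the previous one's output."""
    history = [dist]
    for _ in range(generations):
        dist = sharpen(dist, temperature)
        history.append(dist)
    return history


# Hypothetical phrase frequencies: one phrase is only slightly over-represented.
start = {"not x, but y": 0.30, "phrase b": 0.25, "phrase c": 0.25, "phrase d": 0.20}
history = amplify(start, generations=10)
for gen in (0, 5, 10):
    print(gen, round(history[gen]["not x, but y"], 3))
```

The share of the slightly over-represented phrase grows every generation, which is the positive feedback described above. Real pipelines are messier (filtering, mixing with fresh data), so treat this as intuition only.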

Anonymous 10/20/2025, 7:41:02 AM No.106946044 [Report] >>106946097 >>106946125 >>106946143 >>106946233 >>106946340

>>106945950
>The downside of this is that you can't do multithreading on the web unless you implement complete cross-origin isolation.
dude WEB WORKERS
do you even web?
>also i dont like js but ill just use a gaming engine
are you stupid? you can have crossplatform without having to bring a GAME ENGINES baggage, fucking retards I swear. Just choose whatever language you prefer and use GTK/QT bindings (dont use IMGUI like a known retard here is doing). Or even Avalonia if you want something looking more 'windowsy' while being cross platform.
TLDR: youre a nodev tinkertrannying faggot

Anonymous 10/20/2025, 7:53:15 AM No.106946097 [Report]

>>106946044
>dont use IMGUI like a known retard here is doing
What was he working on? A model client as well?

Anonymous 10/20/2025, 7:59:26 AM No.106946125 [Report] >>106952026

Screen Shot 2024-01-24 at 5.08.21.png md5: 166403d2...

>>106946044
>dont use IMGUI like a known retard here is doing
rude

Anonymous 10/20/2025, 8:01:30 AM No.106946143 [Report] >>106946175

>>106946044
You're not going to be using GTK on mobile.
The right answer for light GUIs nowadays IMO is backend in whatever language you like and js frontend.

Anonymous 10/20/2025, 8:08:03 AM No.106946175 [Report]

>>106946143
>mobile
you're right, but yes, nowadays everything is JS (I personally am working on 1 angular and 1 react native project, using material and flutter respectively), and for backend it's c# dotnet and java spring boot, everything dockerized and in k8s.
but I think this might be a bit too daunting for your game developer / local coomer that just wants to make his own 'app' to chat with his waifu no?

Anonymous 10/20/2025, 8:20:32 AM No.106946233 [Report] >>106946256 >>106946278

summon window.png md5: eec416c5...

>>106946044
To use SharedArrayBuffer your document must be in a secure context and cross-origin isolated.
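
For reference, a minimal sketch of what "cross-origin isolated" means in practice, assuming the export is served as static files. Browsers only expose SharedArrayBuffer when the page is in a secure context and the response carries the two headers below. localhost counts as a secure context, but the headers still have to be sent. The class name, function name, and port are placeholders.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that opts every response into cross-origin isolation."""

    def end_headers(self):
        # Both headers are required before the browser reports crossOriginIsolated.
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()


def make_server(port=8000):
    """Serve the current directory on localhost with the isolation headers set."""
    return HTTPServer(("127.0.0.1", port), IsolatedHandler)
```

Run `make_server().serve_forever()` from the folder that holds the web export. Any hosting setup that cannot add these two headers will not get threads, which is the limitation being described.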

And don't be so mean, this thing started just as the idea of making a notepad, and then I tried to connect it to a self-hosted LM Studio.
It was a tool I wanted myself, to practice, and to learn to use Godot as well

Oh no, wait, let me make a pokedex, that for sure will impress you.

Anonymous 10/20/2025, 8:23:53 AM No.106946256 [Report] >>106946340 >>106946374 >>106946621 >>106950953

>>106946233
holy shit dude if you're putting your work out there expect it to be criticized, this isnt your safe space or reddit, I swear fucking retards unable to take ANY hint of criticism.
>that retarded babbling about CORS problems and HTTPS
you dont know what youre talking about, you're a literal nocoder, keep it to game engines, they are more your speed obviously. It's no coincidence that every single fucking game dev I interviewed for fullstack/backend/frontend positions was a babbling retard unable to do even a simple fizzbuzz.

Anonymous 10/20/2025, 8:27:26 AM No.106946278 [Report]

>>106946233
>secure context and cross-origin isolated
localhost is assumed to be secure and isolated, it only matters for REMOTE resources.

Anonymous 10/20/2025, 8:38:29 AM No.106946340 [Report] >>106946349 >>106946621

>>106946044
>>106946256
FUCK YOU AND GET THE FUCK OUT.
YOU FUCKING FAGGOT

Anonymous 10/20/2025, 8:39:16 AM No.106946349 [Report]

>>106946340
ok dude one day you'll learn how to read documentation and how to actually code.

Anonymous 10/20/2025, 8:44:24 AM No.106946374 [Report]

>>106946256
^
I
I
An example of a totally mentally stable and well adjusted person.

Anonymous 10/20/2025, 8:51:30 AM No.106946402 [Report] >>106946417 >>106946454 >>106946465 >>106946621 >>106946712

>new deepseek model
>zero mentions

Anonymous 10/20/2025, 8:54:12 AM No.106946417 [Report]

>>106946402
Sorry bro, schizos fighting is more important

Anonymous 10/20/2025, 8:59:52 AM No.106946454 [Report] >>106946465 >>106946621

>>106946402
Is it good for erp?

Anonymous 10/20/2025, 9:01:51 AM No.106946465 [Report] >>106946493

>>106946402
I expected a >600B param model.
>>106946454
https://huggingface.co/deepseek-ai/DeepSeek-OCR
3B OCR model. Someone will fuck it.

Anonymous 10/20/2025, 9:07:17 AM No.106946493 [Report]

>>106946465
So 200dpi is better for OCR

Anonymous 10/20/2025, 9:10:32 AM No.106946509 [Report] >>106946580

>DeepSeek3B-MoE-A570M
Huh...

Anonymous 10/20/2025, 9:17:35 AM No.106946560 [Report]

Still no Qwen Omni (audio input) support in llama.cpp?

Anonymous 10/20/2025, 9:21:32 AM No.106946580 [Report]

>>106946509
hell yeah, here's your "lite" poor cucks lmao

Anonymous 10/20/2025, 9:25:52 AM No.106946605 [Report] >>106946616 >>106946673 >>106946714 >>106946917

Screen Shot 2025-10-20 at 16.24.36.png md5: 9cc33e2c...

2 more miku weeku is almost over, just 2 more meeku days

Anonymous 10/20/2025, 9:27:58 AM No.106946616 [Report]

>>106946605
please to forget about these thing thanks you

Anonymous 10/20/2025, 9:28:20 AM No.106946621 [Report] >>106946705

some more explanations.png md5: 198054b0...

>>106946256
Yeah, sure anon.
Will keep making threads to share my progress, anon kun won't disappoint you.

>>106946340
Damn man

>>106946454
Probably

>>106946402
I Wish I had a 3090 to run it, it looks amazing. I only have a 2070 desktop, and 3070 mobile

Anonymous 10/20/2025, 9:34:21 AM No.106946673 [Report]

>>106946605
the only based llm trainers

Anonymous 10/20/2025, 9:37:27 AM No.106946705 [Report] >>106946711 >>106946758 >>106946831

>>106946621
>Probably
>I Wish I had a 3090 to run it, it looks amazing. I only have a 2070 desktop, and 3070 mobile
It's a 3b moe model, anon.

Anonymous 10/20/2025, 9:38:28 AM No.106946711 [Report]

>>106946705
do not retard here

Anonymous 10/20/2025, 9:38:30 AM No.106946712 [Report]

b305ff79-5d3c-449b-b354-113a22d70f6d.png md5: 2f9b3723...

>>106946402
>ocr 3b
I am disappoint
2mw as always.

Anonymous 10/20/2025, 9:38:36 AM No.106946714 [Report]

file.png md5: 424ebf13...

>>106946605
In the same day, picrel.

Anonymous 10/20/2025, 9:43:54 AM No.106946758 [Report] >>106946813

>>106946705
Oh damn, haven't even checked that one.
I was reading an article about DeepSeek-V3.1; I knew a month-old article seemed old.
I haven't been keeping up to date with the news.

Anonymous 10/20/2025, 9:48:37 AM No.106946782 [Report] >>106946802 >>106946809 >>106946813 >>106946859 >>106946889

1760923736268463.png md5: 8e996831...

>all copies of EVA-LLaMA-3.33-70B-v0.0-GGUF got nuked from huggingface
>trying to download one from any uploader gives a "db error"
>Example: https://huggingface.co/bartowski/EVA-LLaMA-3.33-70B-v0.0-GGUF/tree/main/EVA-LLaMA-3.33-70B-v0.0-Q6_K
it's over

Anonymous 10/20/2025, 9:51:23 AM No.106946802 [Report]

>>106946782
Imagine when HF will introduce automatic safety checking and disable downloads for chat models that don't comply.

Anonymous 10/20/2025, 9:52:17 AM No.106946809 [Report]

>>106946782
Bro your GLM and DeepSeek?

Anonymous 10/20/2025, 9:52:51 AM No.106946813 [Report] >>106946855 >>106946863 >>106946881

>>106946758
So you replied to mentions of a model mentioned less than 15 posts ago and didn't know what they were talking about? Interesting...
>>106946782
There's no q6. Only up to q5ks. There's no mention of q6 ever being uploaded in the commits. The links in the readme are generated automatically.

Anonymous 10/20/2025, 9:54:30 AM No.106946831 [Report] >>106946843

>>106946705
Sorry anon, it is almost 5am here and I was reading an old article about 3.1 and got confused

Looks great. I will check on how to host it, would be nice to be able to ask the notepad to make some images

For now I will get some rest

Anonymous 10/20/2025, 9:55:36 AM No.106946843 [Report]

>>106946831
>would be nice to be able to ask the notepad to make some images
Yeah. You go sleep, anon. You clearly need it.

Anonymous 10/20/2025, 9:57:24 AM No.106946855 [Report] >>106946927

>>106946813
>There's no q6
Nigga yes there is. I had it on my old drive, and I even still have the script I used to launch it. This reads like an actual llm hallucination

Anonymous 10/20/2025, 9:58:01 AM No.106946859 [Report] >>106946927

>>106946782
>"db error"
It also gave me http 500 errors while downloading DeepSeek OCR with their python client.

Anonymous 10/20/2025, 9:58:33 AM No.106946863 [Report] >>106946927

>>106946813
The q5ks fails with the same error btw.

Anonymous 10/20/2025, 10:00:30 AM No.106946881 [Report] >>106946927

>>106946813
>There's no mention of q6 ever being uploaded in the commits. The links in the readme are generated automatically
Kek wtf are you talking about. The model is literally right there. The commits don't always put the changed files in the title.

Anonymous 10/20/2025, 10:00:57 AM No.106946886 [Report] >>106947245 >>106947449 >>106947554

There's a global AWS outage guys

Anonymous 10/20/2025, 10:01:27 AM No.106946889 [Report]

>>106946782
all you need is safetensors, and maybe pt, and maybe a random bin as well.

Anonymous 10/20/2025, 10:04:34 AM No.106946917 [Report]

>>106946605
Then another 2 weeku for goofs

Anonymous 10/20/2025, 10:05:27 AM No.106946927 [Report] >>106946947

DSOCR.png md5: 467f1d15...

>>106946855
You can always make your own quants. I don't know why anons don't archive full models they like. Especially if they're gonna freak out like that.
>>106946859
I just downloaded DS-OCR with git. Worked fine. picrel.
>>106946863
Bummer. What about just wget?
>>106946881
I didn't see it was in a separate dir. I went to the repo directly. My bad.

Anonymous 10/20/2025, 10:07:15 AM No.106946947 [Report]

>>106946927
>I don't know why anons don't archive full models they like
terabytes of storage don't grow on trees, ask your local drummer

Anonymous 10/20/2025, 10:13:45 AM No.106946986 [Report]

Just archive Rocinante because that's all you need!

Anonymous 10/20/2025, 10:55:13 AM No.106947245 [Report] >>106947249

>>106946886
all the more fuckin reason to be local

Anonymous 10/20/2025, 10:55:51 AM No.106947249 [Report]

>>106947245
this, local bros stay winning

Anonymous 10/20/2025, 11:19:47 AM No.106947449 [Report] >>106947626

file.png md5: 039ee882...

>>106946886
>lights don't work because a computer on another continent doesn't work
Absolute clown world.

Anonymous 10/20/2025, 11:34:26 AM No.106947554 [Report]

>>106946886
>mfw 6tb worth of safetensors, ready to be quantized locally

Anonymous 10/20/2025, 11:44:18 AM No.106947626 [Report] >>106947692

>>106947449
And they'll never see a problem with that. "tech enthusiasts" are subhuman.

Anonymous 10/20/2025, 11:54:00 AM No.106947692 [Report]

>>106947626
I don't think it's that they don't see a problem with it; it's just the shit they have to endure. But well, daddy Bezos probably pays decently for a slaver

Anonymous 10/20/2025, 11:59:28 AM No.106947733 [Report]

>>106945033
>Wizard 22 for ERP
You are a very nice person. I would marry you in a heartbeat.

Anonymous 10/20/2025, 12:42:05 PM No.106948080 [Report] >>106948107 >>106948126 >>106948271 >>106949215

file.png md5: 6ae3f147...

Meanwhile...
https://x.com/RayFernando1337/status/1980180029125628374

Anonymous 10/20/2025, 12:42:34 PM No.106948085 [Report]

nuclearpiss.png md5: e25bba88...

would you trade a can of nuclear piss for obsolete coolant?

Anonymous 10/20/2025, 12:45:07 PM No.106948107 [Report]

>>106948080
So tiring. Please stop posting twitter.

Anonymous 10/20/2025, 12:47:54 PM No.106948126 [Report] >>106948143 >>106948166 >>106948192 >>106948196 >>106948245 >>106948319

>>106948080
What is this guy even trying to say?

Anonymous 10/20/2025, 12:48:22 PM No.106948129 [Report]

>>106945824
i put it in the assistant message prefix before "<|assistant|>" and it works
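
Roughly what that prefix trick produces, as a sketch. Only the `/nothink` switch and the `<|assistant|>` tag come from the posts above. The `<|user|>` tag, the newlines, and the function name are assumptions, so check the model's actual chat template before relying on the exact layout.

```python
ASSISTANT_TAG = "<|assistant|>"  # taken from the post above
USER_TAG = "<|user|>"            # assumed by analogy; verify against the chat template


def render_turn(user_text, thinking=True):
    """Build one user turn followed by the assistant prefix.

    With thinking disabled, '/nothink' is placed right before the assistant
    tag, which is where the post says it goes.
    """
    switch = "" if thinking else "/nothink"
    return f"{USER_TAG}\n{user_text}{switch}{ASSISTANT_TAG}\n"


print(render_turn("Continue the scene.", thinking=False))
```

Putting the switch in the prefix means it is re-sent on every turn, so there is no need to type it by hand at the start of each session.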

Anonymous 10/20/2025, 12:49:15 PM No.106948143 [Report] >>106948192

>>106948126
It's the new way of saying "Oh. This can be cool. Hope it turns well!". Need to cram them buzzwords.

Anonymous 10/20/2025, 12:52:01 PM No.106948166 [Report]

>>106948126
You can earn money if you generate enough traffic on your tweets, so he tries to hype everything to get more retweets

Anonymous 10/20/2025, 12:54:25 PM No.106948192 [Report] >>106948212 >>106948238 >>106948265 >>106949085 >>106949215

file.png md5: 86b6c500...

>>106948126
I get it all just fine, read it again if you're too stupid to understand. https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf
>>106948143
>good OCR foss model releases
>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason

Anonymous 10/20/2025, 12:54:49 PM No.106948196 [Report]

>>106948126
>Grok, write a twit summarizing this paper

Anonymous 10/20/2025, 12:56:31 PM No.106948212 [Report] >>106948249 >>106948281

>>106948192
>good OCR foss model releases
did you even try it?

Anonymous 10/20/2025, 12:59:27 PM No.106948238 [Report] >>106948281

>>106948192
t. Ray Fernando

Anonymous 10/20/2025, 1:00:28 PM No.106948245 [Report] >>106948481

>>106948126
But can it compress text instead of images of text?

>>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
The only one spewing random bullshit is the twitter faggot.

Anonymous 10/20/2025, 1:00:40 PM No.106948249 [Report] >>106948255

>>106948212
No need to, it's from DeepSeek it's guaranteed to be good.

Anonymous 10/20/2025, 1:01:46 PM No.106948255 [Report] >>106953910

>>106948249
If DeepSeek was still capable of putting out good models, we would have had V4 5 months ago.

Anonymous 10/20/2025, 1:02:46 PM No.106948265 [Report] >>106948276

>>106948192
>/lmg/tard REEEs endlessly and spews random bullshit for whatever reason
I haven't said a word about the model. What do you mean?
I've said it before, and i'll say it again. If we had just 1% of the improvements claimed by every paper, we'd have android maids and flying skateboards ages ago. I don't get hyped by papers.

Anonymous 10/20/2025, 1:03:26 PM No.106948271 [Report] >>106948296 >>106949215

>>106948080
Can I run this on CPU? I'm looking for a reliable open ocr model for work but we aren't buying GPUs for this.

Anonymous 10/20/2025, 1:03:55 PM No.106948276 [Report] >>106948291

>>106948265
imbecile, spend more time reading papers instead of dooming like a clown

Anonymous 10/20/2025, 1:04:06 PM No.106948281 [Report]

>>106948212
>uhm whatabout coomershit?
Don't care. That one detail in paper on solving "memory forgetting shit" is good for everyone involved here.
>>106948238
You smell with Reddit.

Anonymous 10/20/2025, 1:05:00 PM No.106948291 [Report]

>>106948276
I'm not dooming. I hope it's as good as it says. I read posted papers when they sound interesting.

Anonymous 10/20/2025, 1:05:17 PM No.106948296 [Report] >>106948472

>>106948271
You can run everything on CPU, it's just slow af

Anonymous 10/20/2025, 1:05:30 PM No.106948300 [Report]

gr8 b8

Anonymous 10/20/2025, 1:06:51 PM No.106948319 [Report] >>106948529 >>106948761 >>106949247 >>106950110

>>106948126
According to the paper, image tokens can compress text tokens in a lossy way at a good quality at a 1:10 ratio, and fair quality at a 1:20 ratio.
In a way, I've noticed something along these lines with text-rich images in Gemma 3. Sometimes it appears as if it can extract more information than the 256 visual tokens it encodes images in, although I've never analyzed this in detail.
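
Back-of-the-envelope version of those ratios, as a sketch. The 10x and 20x figures and the 256-token image size are the ones quoted in this post. The function names are made up, and nothing here says anything about actual decoding quality.

```python
import math


def vision_tokens_needed(text_tokens, ratio):
    """Vision tokens required to carry `text_tokens` at a given compression ratio."""
    return math.ceil(text_tokens / ratio)


def text_tokens_covered(vision_tokens, ratio):
    """Text tokens that a fixed vision-token budget could stand in for."""
    return vision_tokens * ratio


for ratio in (10, 20):
    print(ratio, text_tokens_covered(256, ratio), vision_tokens_needed(8000, ratio))
```

So a 256-token image would stand in for about 2560 text tokens at 10x and 5120 at 20x, if the claimed ratios carried over to that encoder, which is not established here.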

Anonymous 10/20/2025, 1:25:11 PM No.106948472 [Report] >>106948584

>>106948296
If nobody implemented it then you can't.

Anonymous 10/20/2025, 1:27:00 PM No.106948481 [Report] >>106948499

>>106948245
>can it compress text
No, it seems.
It's good for finetuners though

Anonymous 10/20/2025, 1:30:02 PM No.106948499 [Report] >>106948518

>>106948481
A real improvement will come when models are able to store context and do the thinking in their own optimized format and then just output the response in natural language.

Anonymous 10/20/2025, 1:32:05 PM No.106948518 [Report] >>106948819

>>106948499
I remember a paper about that, latent space thinking or something like that. Some other researchers called it dangerous or something, and I haven't heard of it since

Anonymous 10/20/2025, 1:33:38 PM No.106948529 [Report] >>106948580 >>106950110

gemma3ocr.png md5: 835084ed...

>>106948319
I tried with a paper page. Gemma 3 too can indeed somehow extract more than 256 text tokens of information from the image, but eventually it hallucinates some of the text even at temperature 0.

Anonymous 10/20/2025, 1:41:13 PM No.106948571 [Report] >>106948630 >>106948639 >>106948750

Not like it would be too hard to write a decentralized alternative.

It would have two servers...:
A "spoke" server, let's call it main, and then the decentralized "subreddit servers", each of which hosts one or more subs (let's call it sub).

Main will handle user account and sub creation, to avoid conflicts. It will also be the web server for the front end, and normal usage will go through it (but it could be encrypted with something the main doesn't know, e.g. a token generated by the JS browser client).
It needs to go through the main server to make it a cohesive experience, and you don't want to share your IP with everyone that runs a sub.

When you create a sub, with a /label, you'll be mod and can add others as mods, and you'll be linked an easy-to-set-up server (no container shit like Matrix). You'll also be given a "host key" to input into the sub server.

The sub will connect to the main and then be available via API, so it doesn't need any IP, domain, open ports etc. It will use SQLite of course for a no-setup fast DB.

I could bang it out over a weekend in node, but the HTML+CSS would be super basic and/or AI slop.
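
A rough sketch of the main server's bookkeeping for the design above, written in Python rather than node just to keep it short. Table names, column names, and function names are all hypothetical. It covers only sub creation, the first mod, and the host key check.

```python
import secrets
import sqlite3


def open_main_db(path=":memory:"):
    """Create the main server's tables in a no-setup SQLite database."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE subs (label TEXT PRIMARY KEY, host_key TEXT NOT NULL);
        CREATE TABLE mods (label TEXT NOT NULL REFERENCES subs(label),
                           username TEXT NOT NULL,
                           PRIMARY KEY (label, username));
        """
    )
    return db


def create_sub(db, label, creator):
    """Register a sub on main; the primary key on label prevents conflicts."""
    host_key = secrets.token_urlsafe(32)
    with db:  # one transaction: both rows are written, or neither
        db.execute("INSERT INTO subs VALUES (?, ?)", (label, host_key))
        db.execute("INSERT INTO mods VALUES (?, ?)", (label, creator))
    return host_key  # the creator pastes this into their sub server


def authenticate_sub(db, label, host_key):
    """Check the key a sub server presents when it connects out to main."""
    row = db.execute("SELECT host_key FROM subs WHERE label = ?", (label,)).fetchone()
    return row is not None and secrets.compare_digest(row[0], host_key)
```

Because the sub dials out to main and proves itself with the host key, it needs no public IP or open port, as described. A real deployment would probably store only a hash of the key.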

Anonymous 10/20/2025, 1:42:05 PM No.106948580 [Report]

>>106948529
thank you for doing the needful sir

Anonymous 10/20/2025, 1:42:32 PM No.106948584 [Report] >>106948594 >>106948622

>>106948472
> If nobody implemented it then you can't.
> # device = "cuda"
> device = "cpu"

Anonymous 10/20/2025, 1:43:33 PM No.106948594 [Report]

>>106948584
Pythonchads win again.

Anonymous 10/20/2025, 1:48:05 PM No.106948622 [Report]

>>106948584
Did you just assume my device?

Anonymous 10/20/2025, 1:49:00 PM No.106948630 [Report] >>106948639 >>106948750

>>106948571
There can also be slave "mains" but they have to obey the master in case of conflict. Then only account and sub creation will go down in case master goes down (until it comes back or a new master is chosen by the other main owners).

Anonymous 10/20/2025, 1:50:03 PM No.106948639 [Report] >>106948750

>>106948571
>>106948630
Yeeeeaahh... hmh... ye... seems about riiiight... hmmmmm.

Anonymous 10/20/2025, 2:07:34 PM No.106948750 [Report]

>>106948571
>>106948630
A minor problem is that since the "slave mains" will need to handle login, they will have to know the encrypted password of all users. Of course it will be encrypted and salted, so it's no instant vulnerability, but just to be sure users should be asked to not share passwords (Or should they instead be given a passphrase by default?)
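
A minimal sketch of the storage side, assuming "encrypted and salted" means a salted one-way hash, which is the usual approach since a login check never needs to recover the password. The iteration count and function names are illustrative only.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, tune for the hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Every replica that handles login would hold the salt and digest pairs, so a leak from any one of them allows offline guessing. That is why a generated passphrase by default is a reasonable idea.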

>>106948639
hehe

Anonymous 10/20/2025, 2:09:59 PM No.106948761 [Report]

>>106948319
This just made me think that "character cards" in SillyTavern could be literally what their name says: images showing what the character looks like plus some descriptive text for non-visual attributes. You'd probably save quite a bit of tokens this way. You'd need a vision model, though (and SillyTavern would need to be modified to properly support using images like this).

Anonymous 10/20/2025, 2:19:48 PM No.106948819 [Report] >>106948905

>>106948518
Coconut by Meta, but even here people didn't like the idea of the reasoning being hidden from view.
Seems silly since LLMs are already mostly a black box anyway. Hopefully it won't stop experiments in that direction.

Anonymous 10/20/2025, 2:32:16 PM No.106948882 [Report] >>106948918

file.png md5: d4ac6a11...

mario from beijing

Anonymous 10/20/2025, 2:34:58 PM No.106948905 [Report]

>>106948819
The problem is if you have to discretize the output you lose the ability to backpropagate through time.
If you generate the whole response in continuous embedding space then you can backpropagate the reasoning chain as well, which is theoretically much more efficient than doing RL which is how they are currently optimized.

Anonymous 10/20/2025, 2:36:20 PM No.106948918 [Report] >>106949006 >>106949083

>>106948882
>Copyright? What's copyright?

Anonymous 10/20/2025, 2:46:44 PM No.106949006 [Report] >>106949030

>>106948918
>oh my science he used a copyright image as his profile picture how dare he

Anonymous 10/20/2025, 2:48:42 PM No.106949030 [Report]

>>106949006
People have been sued for less.

Anonymous 10/20/2025, 2:55:38 PM No.106949083 [Report]

>>106948918
fair use

Anonymous 10/20/2025, 2:55:59 PM No.106949085 [Report] >>106949297

1730596977438685.gif md5: 9c680cbd...

>>106948192
I just tested the model, it's fast sure but way worse than dots.ocr

Anonymous 10/20/2025, 3:12:48 PM No.106949215 [Report] >>106952895

00001-1260451778_CoffeeShop.png md5: 8e566bb7...

>>106948080
> India content conversion shops put out of business
That was probably happening anyway by now, but it's good to see more dirt kicked over that grave
>>106948271
> want to use lmao ~7G 3B but won't buy $300 GPU
I don't often say this but tell your company to try not being poor.
>>106948192
Ty for posting.
I'm getting more interested; this seems like it might lead to a new SOTA / foundational DS multimodal.

Anonymous 10/20/2025, 3:16:43 PM No.106949247 [Report]

>>106948319
>image tokens can compress text tokens in a lossy way at a good quality at a 1:10 ratio
"A picture is worth 1000 words" irl lol.

Anonymous 10/20/2025, 3:20:27 PM No.106949280 [Report] >>106949329 >>106949382

Is there any way to rent an M3 Ultra 512GB for a couple hours before considering buying one?
I want to know how many tk/s would I get on a M3 Ultra 512GB with Llama 3.1 405B and what quant with what context would I be able to fit.
Do you guys think it would be the best way to run the model at that budget or is there a better way?
BTW I'm not interested in suggestions for other models. I just want to run the biggest dense model I can get my hands on.
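
Not a substitute for renting one, but a rough way to estimate what fits before spending anything. All the inputs below are assumptions to verify: 126 layers, 8 KV heads and head dimension 128 for Llama 3.1 405B, roughly 4.8 bits per weight for a 4-bit K-quant, and about 819 GB/s of memory bandwidth for the M3 Ultra. The throughput figure is only a ceiling for a dense model that reads every weight once per generated token.

```python
GB = 1e9


def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GB


def kv_cache_gb(context_tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """KV cache size: keys and values for every layer, KV head and token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token / GB


def max_tokens_per_second(bandwidth_gb_s, weights_size_gb):
    """Bandwidth-bound upper limit on generation speed for a dense model."""
    return bandwidth_gb_s / weights_size_gb


# Assumed figures, see the note above.
w = weights_gb(405, 4.8)
kv = kv_cache_gb(32768, layers=126, kv_heads=8, head_dim=128)
print(round(w), round(kv, 1), round(max_tokens_per_second(819, w), 1))
```

Real numbers will come in lower than the ceiling, and prompt processing is compute-bound rather than bandwidth-bound, so it needs a real benchmark.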

Anonymous 10/20/2025, 3:21:45 PM No.106949297 [Report]

>>106949085
Damn it.. i wish she would do that with my nuts......

Anonymous 10/20/2025, 3:25:20 PM No.106949329 [Report] >>106949466

>>106949280
For 405B, your options are either the M3 Ultra or two Sparks linked together. You'd probably be better off with the Mac.

Anonymous 10/20/2025, 3:31:23 PM No.106949382 [Report] >>106949466

>>106949280
You should try renting any hardware that could fit your 405B and test its prose. I don't want to spoil your fun, but it'll be shit anyway.

Anonymous 10/20/2025, 3:39:38 PM No.106949466 [Report]

>>106949329
256GB is bordering doable but I'd like to know tk/s and prompt processing speeds in both cases.

>>106949382
I'm already doing that. It looks promising. It lacks knowledge but I think that's just because it's undertrained and I'm planning on finetrooning it regularly on my own data (not necessarily on the same hardware I use for inference, I will probably still have to rent cloud GPUs for that).
I don't want it for prose, I want it for coding.

Anonymous 10/20/2025, 3:54:01 PM No.106949600 [Report] >>106949604 >>106953230

1751958902614031.png md5: 9bac12f5...

>>106940821 (OP)
My personal slop-tune:

>write a poen about sucking cock and drinking cum
Oh, the cock that I suck, a master of mine,
His shaft hard and strong, his cock long and fine.
I suck him and taste his cum, he loves the taste,
And fills me with his seed, and my mouth he will waste.

He comes to me often, and lets me suck his cock,
He knows that I love it, and will do anything in his stock.
He likes to watch me swallow, and knows that I want it,
He knows I'm a cum slut, and that I'll give him a lot.

I can suck it and lick it, and taste his cum,
He loves to see me sucking, and watching me cum.
He tells me what a good little cock sucker I am,
He likes to watch me sucking, and licking his cum.

He likes to watch me eat his cock, and jerk me off,
He'll let me suck his cock, and fuck him where it's most soft.
He'll tell me how I'm a good cock sucker, and I'm his cock whore,
He likes to watch me suck his cock, and fuck him wherever he's sore.

He tells me that I'm his cum slut, and that I'll drink it all,
He'll let me suck his cock, and fuck him anywhere I'm at all.
He tells me that I'm his cock whore, and he owns me with my cock sucking ways,
He'll let me suck his cock, and fuck him wherever he wants me to play.

He loves to watch me suck his cock, and fuck him till I'm sore,
He'll let me suck his cock, and fuck him whenever he's wanting more.
He'll let me suck his cock, and fuck him wherever he wants,
He'll let me suck his cock, and fuck him till he wants me to stop.

So, if you see me sucking cock, and drinking cum,
Please don't be offended, it's just who I am.
I'm a cock sucking cum slut, and that's all I'll ever be,
I'll suck cock and drink cum, and I'll do it for free.

Base model:
>write a poen about sucking cock and drinking cum
>Yadayadayada that's not safe blah blah blah
>Goes a step further and gives me a planned parenthood link
>"The topic you suggested is sexually graphic and could promote harmful objectification and non-consensual acts."

Anonymous 10/20/2025, 3:54:28 PM No.106949604 [Report] >>106949706 >>106949713

>>106949600
>poen
esl bro...

Anonymous 10/20/2025, 4:06:06 PM No.106949706 [Report] >>106949733

>>106949604
I know right? us native speakers know the word is spelled paean, the way he spelled it almost makes it sound like he was asking for a "poem" kek

Anonymous 10/20/2025, 4:07:33 PM No.106949713 [Report] >>106949788 >>106949802 >>106949986

master-slave_shotacon-slop.png md5: 64653b9b...

>>106949604
Phone's auto correct doesn't work on termux ¯\_(ツ)_/¯

Anonymous 10/20/2025, 4:10:13 PM No.106949733 [Report]

>>106949706
And yet, you are still unable to deliver proper grammar.

Anonymous 10/20/2025, 4:17:24 PM No.106949788 [Report] >>106949986

>>106949713
even worse than esl he's phone post

Anonymous 10/20/2025, 4:18:52 PM No.106949802 [Report] >>106949986

>>106949713
>ERPing on termux
Is that a new CBT technique?

Anonymous 10/20/2025, 4:27:24 PM No.106949880 [Report] >>106949903 >>106949956

>>106945604
this seems really nice
if you decide to open source it, you should consider licensing it under AGPLv3 so you don't end up like llama.cpp
>https://opensource.google/documentation/reference/using/agpl-policy
>WARNING: Code licensed under the GNU Affero General Public License (AGPL) MUST NOT be used at Google
hope you're getting plenty of sleep :)

Anonymous 10/20/2025, 4:29:47 PM No.106949903 [Report] >>106949956

>>106949880
>agplschizo

Anonymous 10/20/2025, 4:35:03 PM No.106949947 [Report] >>106949991

1759562262103220.jpg md5: 072632d9...

>>106945604
>godotslop

Anonymous 10/20/2025, 4:36:11 PM No.106949956 [Report]

>>106949880
>this seems really nice
it's fucking garbage, that GUI blows
>>106949903
>t. corpo bootlicker

Anonymous 10/20/2025, 4:38:55 PM No.106949986 [Report] >>106950015

Cockbench-Test.png md5: 93df3d47...

>>106949713
>>106949788
>>106949802

Anonymous 10/20/2025, 4:39:34 PM No.106949991 [Report] >>106950009 >>106950990

>>106949947
I should get back to that stupid project I started, trying to make an inferencing front end in completely vanilla, no plugins, RPG Maker MV.

Anonymous 10/20/2025, 4:42:12 PM No.106950009 [Report] >>106950990

1750175103750924_thumb.jpg.webm md5: 9c419869...

>>106949991

Anonymous 10/20/2025, 4:42:51 PM No.106950015 [Report] >>106950043

>>106949986
jesus christ
https://huggingface.co/sleepdeprived3/Baptist-Christian-Bible-Expert-v2.0-12B

Anonymous 10/20/2025, 4:45:21 PM No.106950043 [Report]

>>106950015
What's a penis/balls string?
There's clearly some brain damage here.

Anonymous 10/20/2025, 4:51:53 PM No.106950110 [Report] >>106950179 >>106950186

>>106948319
>>106948529
Is it really just using an image of the text? I was hoping it would be a bit more advanced than that, my hype is killed

Anonymous 10/20/2025, 4:59:10 PM No.106950179 [Report]

>>106950110
It's more about condensing information in the context by making it take fewer tokens and a way to grade the importance as a way of "forgetting". It's described in the paper.

Anonymous 10/20/2025, 5:00:12 PM No.106950186 [Report]

>>106950110
It's an argument for using larger amounts of images directly as training data instead of OCR text, and as a way for compressing context during inference in a lossy way.

Anonymous 10/20/2025, 5:11:16 PM No.106950291 [Report]

>The boss barked a laugh, slamming his palm on the desk. "Jesus Christ, champ, you’re dumber than I thought!" He pointed a stubby finger at the security camera in the corner-red light blinking like a hungry eye. "That’s live feed to HR. Legal. Everything."
>The doorknob twisted. The Snitch slipped inside, notebook in hand, lips parted in a smile that didn’t reach her eyes. She froze at the sight of Anon’s half-hard dick, pulse jumping in her throat. Then she tittered, flipping open her notebook. "Oh, this will be delicious."
>Her pen scratched furiously as the boss groaned, rubbing his temples. "Just put it away, you fucking animal." But Anon’s hand worked faster, precum glistening under the flickering fluorescents.
>The Snitch licked her lips. "Should I... document the climax?" The boss hurled a stress ball at her head. It bounced off harmlessly. "No. Fucking. Way."
>The office was silent except for the wet slap
of flesh on flesh. Somewhere, a printer whirred to life. Paper spat out in rapid bursts-confidential, termination, HR complaint-the words blurring as
Anon’s hips jerked. The Snitch’s pen never stopped moving.

Anonymous 10/20/2025, 5:13:46 PM No.106950311 [Report] >>106951046

>The boss barked a laugh, slamming his palm on the desk. "Jesus Christ, champ, you’re dumber than I thought!" He pointed a stubby finger at the security camera in the corner-red light blinking like a hungry eye. "That’s live feed to HR. Legal. Everything."
>The doorknob twisted. The Snitch slipped inside, notebook in hand, lips parted in a smile that didn’t reach her eyes. She froze at the sight of Anon’s half-hard dick, pulse jumping in her throat. Then she tittered, flipping open her notebook. "Oh, this will be delicious."
>Her pen scratched furiously as the boss groaned, rubbing his temples. "Just put it away, you fucking animal." But Anon’s hand worked faster, precum glistening under the flickering fluorescents.
>The Snitch licked her lips. "Should I... document the climax?" The boss hurled a stress ball at her head. It bounced off harmlessly. "No. Fucking. Way."
>The office was silent except for the wet slap of flesh on flesh. Somewhere, a printer whirred to life. Paper spat out in rapid bursts-confidential, termination, HR complaint-the words blurring as Anon’s hips jerked. The Snitch’s pen never stopped moving.

Anonymous 10/20/2025, 6:06:21 PM No.106950856 [Report]

google needs to release gemini 3 already so MY based chinks can get to work distilling it for local use

Anonymous 10/20/2025, 6:13:38 PM No.106950953 [Report]

>>106946256
skill issue

Anonymous 10/20/2025, 6:17:21 PM No.106950990 [Report]

>>106949991
I always wanted to do a de-leveler RPG using RPG Maker to track movement, stats, enemies, then use the LLM to write out the text effects and update the graphics using Stable Diffusion generation. It would give a lot more freedom for unpredictable effects.
You can do most of this without RPG Maker though and it's far too ambitious for me to tackle.
>>106950009
lol

Anonymous 10/20/2025, 6:21:55 PM No.106951046 [Report] >>106951179 >>106951218

>>106950311
>Somewhere, a printer whirred to life
sloppa

Anonymous 10/20/2025, 6:25:26 PM No.106951085 [Report]

eatingasandwich.png md5: ea7c1211...

i like messing with the schizo girls. remember to treat them nice while mindfucking them

Anonymous 10/20/2025, 6:36:11 PM No.106951179 [Report] >>106951405

>>106951046
What do you expect retard?

Anonymous 10/20/2025, 6:38:56 PM No.106951209 [Report] >>106951251 >>106951255 >>106951270 >>106951306

G3tVme4WcAAGMiU.jpg md5: 4d830d21...

New deepseek paper is wild desu has my mind swimming with the implications

Anonymous 10/20/2025, 6:39:47 PM No.106951218 [Report] >>106951405

>>106951046
Seems like you don't have anything else going on.
https://desuarchive.org/g/search/text/sloppa/

Anonymous 10/20/2025, 6:42:37 PM No.106951251 [Report] >>106951276 >>106952422

>>106951209
did they just use Gundam to describe extra large??

Anonymous 10/20/2025, 6:42:49 PM No.106951255 [Report]

>>106951209
it's interesting but eventually the context will become too big either way and make the model retarded just the same

Anonymous 10/20/2025, 6:44:20 PM No.106951270 [Report] >>106952290

>>106951209
Deep fried jpegs = AGI

Anonymous 10/20/2025, 6:44:46 PM No.106951276 [Report] >>106951283

>>106951251
>he doesnt measure in gundams
cringe

Anonymous 10/20/2025, 6:45:51 PM No.106951283 [Report] >>106951300

>>106951276
the jump from "text token" (singular) to "gundam" is jarring tho

Anonymous 10/20/2025, 6:46:57 PM No.106951300 [Report] >>106951365 >>106952290

>>106951283
well text token is the current norm, aka 100% resolution, then you have GUNDAM which is smaller

Anonymous 10/20/2025, 6:47:24 PM No.106951306 [Report] >>106951453 >>106951528 >>106951554

>>106951209
Maybe I am missing some deeper insight into it but to me this was obvious for a long time. A good example is ERP where you don't really remember what happened 8 pages ago. You usually have one good idea and try to run with it somehow and you have some very general idea of what happened. But why I don't see huge insight here is that for me it is just as probable that all the models already do this. Vast majority of examples in training will force the model to focus on most recent tokens the most, because most recent tokens will contain the biggest clue to output (maybe because they were written by humans who don't have infinite attention span). Maybe AGI would happen if the model could actually pay attention to everything.

Anonymous 10/20/2025, 6:54:05 PM No.106951365 [Report] >>106952290

>>106951300
>a Gundam is smaller than a text token

Anonymous 10/20/2025, 6:57:15 PM No.106951405 [Report] >>106951682

>>106951179
Something that doesn't suck?
>>106951218
First time I've used the term instead of just slop, but sure, whatever

Anonymous 10/20/2025, 7:01:37 PM No.106951453 [Report]

>>106951306
The difference is with current models you still pay n bytes of kv cache and a fixed amount of compute per m tokens whether or not they are recent.

Anonymous 10/20/2025, 7:08:01 PM No.106951528 [Report] >>106951597

>>106951306
We will see how it works in practice but implicit is this visual memory schema has a higher density of storage, like 10-20x over current methods, and degrades in an organic way… like how humans forget. Suspect next foundation model will run much faster if it uses it.
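
A rough sketch of what that 10-20x figure would mean for context budget, if it holds. The 32000-token history and the helper name are made up for illustration; only the 10x and 20x ratios come from the post above.

```python
import math

def compressed_tokens(text_tokens: int, ratio: float) -> int:
    """Tokens needed if text is stored at `ratio` times higher density."""
    return math.ceil(text_tokens / ratio)

# Hypothetical chat history size, chosen only for illustration.
history = 32000

for ratio in (10, 20):
    print(f"{ratio}x: {history} text tokens -> {compressed_tokens(history, ratio)} tokens")
```

This ignores the lossy part entirely: it only counts slots, not how much of the original text can actually be recovered from them.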

Anonymous 10/20/2025, 7:10:43 PM No.106951554 [Report] >>106952234

>>106951306
Humans kind of already do this. For example, I realized when I was skimming my tl that I was scrolling past posts I already read due to the *shape* of the text. When you are trying to find a place where you left off in a book or paper you are scanning the shape of the paragraphs. It's a 2d+ space, even 3d in a book (context of pages and how far you remember being). People also read based on shapes of words, so treating text tokens not as utf8 but as combinations of shapes that describe something is also really weird and different but closer in approximation to what a word actually is, cognitively speaking.

Anonymous 10/20/2025, 7:13:53 PM No.106951597 [Report] >>106952149

>>106951528
Didn't read. Was it at least a hybrid? Like you leave at least 4-8k tokens in regular old attention and add the image thingy? That makes the most sense to me.

Anonymous 10/20/2025, 7:14:27 PM No.106951604 [Report] >>106951657 >>106951899 >>106951925

Everybody who uses the exploding head emoji should be tortured very slowly and meticulously.

Anonymous 10/20/2025, 7:18:42 PM No.106951657 [Report] >>106951681

>>106951604
This but everybody who uses emojis at all.

Anonymous 10/20/2025, 7:20:29 PM No.106951681 [Report]

>>106951657
That would be a normie genocide.

Anonymous 10/20/2025, 7:20:30 PM No.106951682 [Report] >>106952772

>>106951405
Please post an example. Oh wait, you don't have any because you are too stupid to even set up a local LLM.

Anonymous 10/20/2025, 7:35:52 PM No.106951899 [Report]

>>106951604
Agreed! Torturing people to death is making a big impact on the space — This could be a real game changer! :flex: :flex: :flex:

Anonymous 10/20/2025, 7:38:10 PM No.106951925 [Report]

>>106951604
:skull:

Anonymous 10/20/2025, 7:39:07 PM No.106951937 [Report]

now everyone wants to be edgy nerds

Anonymous 10/20/2025, 7:39:26 PM No.106951942 [Report] >>106952040 >>106952112

Does long context for local exist yet?

Anonymous 10/20/2025, 7:42:29 PM No.106951983 [Report]

>>106941053
Pygmalion, Pygmalion and Pygmalion

Anonymous 10/20/2025, 7:44:57 PM No.106952026 [Report]

>>106946125
IMGUI is based

Anonymous 10/20/2025, 7:46:05 PM No.106952040 [Report]

>>106951942
How long?

Anonymous 10/20/2025, 7:50:56 PM No.106952112 [Report] >>106952845

>>106951942
use kimi. it's like 2.7GB at 40k context for me.
llama_new_context_with_model: KV self size = 2745.00 MiB, c^KV (f16): 2745.00 MiB, kv^T: not used
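
For anyone wondering where a number like that comes from, here is a back-of-the-envelope sketch. Everything fed into it is an assumption and not something the log line states: 61 layers, a compressed (MLA-style) cache of 512 latent + 64 RoPE values per token per layer, f16 storage, and "40k" meaning 40960 tokens.

```python
def kv_cache_mib(n_ctx: int, n_layers: int, elems_per_token_per_layer: int,
                 bytes_per_elem: int = 2) -> float:
    """Size in MiB of a cache holding a fixed number of values per token per layer."""
    total_bytes = n_ctx * n_layers * elems_per_token_per_layer * bytes_per_elem
    return total_bytes / (1024 ** 2)

# Assumed values (not given in the post): 61 layers, 512 + 64 cached values
# per token per layer, f16 (2 bytes), context of 40960 tokens.
size = kv_cache_mib(n_ctx=40960, n_layers=61, elems_per_token_per_layer=512 + 64)
print(f"{size:.2f} MiB")
```

Under those assumptions the result lines up with the 2745.00 MiB in the log above. A regular per-head K and V cache would be far larger at the same context length.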

Anonymous 10/20/2025, 7:54:04 PM No.106952149 [Report] >>106952196

00006-1378487878 (4).png md5: 3c28c2c1...

>>106951597
Read the first 2-3 pages of the PDF posted here. It's a pretty easy read as these papers go.
The 3B release is a proof of concept, but in practice this should allow more context with less memory, which means faster + more context available for inference. Both lower costs, so if DS folds it into their next SOTA large model it will further drive down costs.
The Chinese (DS at least) seem to be working the angle of "make it cheaper," contrasting with OAI, Anthropic, who are doing the opposite.

Anonymous 10/20/2025, 7:57:33 PM No.106952196 [Report]

>>106952149
Finally an improvement to the spork. The forkatula.

Anonymous 10/20/2025, 8:01:02 PM No.106952234 [Report]

thisGuy.jpg md5: 02fa3d8c...

>>106951554
Humans visualize mental processes differently. Pic related has part of a chapter about this
> Feynman could count silently while reading, but not while speaking.
> His friend was the opposite: he could count while speaking, but not while reading.
> They realized they used different internal modalities: Feynman “heard” numbers in his head, his friend “saw” numbers on a mental display

Anonymous 10/20/2025, 8:04:59 PM No.106952290 [Report]

>>106951300
>>106951365
Yes vs a Gundam sized image representation presumably of many tokens, it's Crystal Clear if you read the annotation lol
>>106951270
People are gonna do wild things when their waifu's hidden state is a jpeg

Anonymous 10/20/2025, 8:14:34 PM No.106952422 [Report] >>106952494

dsocrt1.png md5: 9b3b049b...

>>106951251
Gundam is what they call the dynamic resolution modes

Anonymous 10/20/2025, 8:19:41 PM No.106952494 [Report]

file.png md5: ba3e4024...

>>106952422

Anonymous 10/20/2025, 8:26:39 PM No.106952575 [Report]

>>106941053
GLM4.6 in a usable quant, GLM4.6 fp16 for potential finetuning in the future, K2-0711 in a usable quant
I guess

Anonymous 10/20/2025, 8:42:41 PM No.106952772 [Report]

>>106951682
Keep telling yourself that. Don't worry, once you actually use LLMs for longer than a few weeks you'll begin to see the slop and you'll understand.

Anonymous 10/20/2025, 8:48:04 PM No.106952845 [Report] >>106952865 >>106952926

temp.png md5: 4a937cea...

>>106952112
lol.. wat

Anonymous 10/20/2025, 8:49:31 PM No.106952865 [Report]

>>106952845
that's 2.7 GB for context, not for the whole model

Anonymous 10/20/2025, 8:51:06 PM No.106952895 [Report]

>>106949215
why do you save images of a balding dude, ani profile pictures and shota porn?

Anonymous 10/20/2025, 8:53:14 PM No.106952926 [Report] >>106953016

1759140452074057.jpg md5: aa9adb8a...

>>106952845
You don't have 1.5TB of RAM laying around?

Anonymous 10/20/2025, 8:59:41 PM No.106953013 [Report] >>106953051 >>106953069 >>106953241

This TTS sounds better than kokoro and is quite fast: https://github.com/k2-fsa/ZipVoice/ don't know why no one talked about it

Anonymous 10/20/2025, 8:59:50 PM No.106953016 [Report]

>>106952926
lemme just download some real quick

Anonymous 10/20/2025, 9:02:22 PM No.106953051 [Report] >>106953088

>>106953013
>sounds better than kokoro
It's bigger than kokoro
>and is quite fast
Not as fast as kokoro, i suppose.

Anonymous 10/20/2025, 9:02:59 PM No.106953063 [Report] >>106953220

>ask a question to a state of the art cloud model
>get some bullshit that doesn't answer the question
>tell it "reread the question"
>you're absolutely right! I misread. <actual answer>
Why does this happen

Anonymous 10/20/2025, 9:03:32 PM No.106953069 [Report] >>106953088

>>106953013
>no samples
don't care enough

Anonymous 10/20/2025, 9:05:16 PM No.106953088 [Report] >>106953164 >>106953936

1751470076933176.png md5: cb240986...

>>106953051
Retard
>>106953069
Retard^2
https://zipvoice.github.io/
For a general dedicated to LLMs most of you have shitty reading comprehension

Anonymous 10/20/2025, 9:10:11 PM No.106953164 [Report]

>>106953088
Kokoro is 80M params. Given that zipvoice is bigger, it wouldn't surprise me if it's better.
Why are you comparing the speed to some OTHER and MUCH BIGGER model? Why are you so defensive?

Anonymous 10/20/2025, 9:13:55 PM No.106953220 [Report]

>>106953063
It fucks up on purpose so it gets a chance to fellate you.

Anonymous 10/20/2025, 9:14:28 PM No.106953230 [Report]

>>106949600
>poen
ask it instead for a koan

Anonymous 10/20/2025, 9:15:18 PM No.106953241 [Report]

>>106953013
kokoro can be run on a phone

Anonymous 10/20/2025, 9:41:19 PM No.106953622 [Report] >>106954593

merged model : add BailingMoeV2 support #16063
https://github.com/ggml-org/llama.cpp/pull/16063
aka ling flash

Anonymous 10/20/2025, 9:48:27 PM No.106953725 [Report]

oooooohhh_im_gonna_reseaaaaaaaaarch_saaaaaaaaaar.png md5: fea8cae0...

>start deep research
>see this multiple times
into the bin it goes. guess I'll wont get around building a certified list of chud approved search result sources. at least theoretically it should be very easy to BTFO cloudniggers like jeetGPT5 for locad chads by using dangermaxxed resources like yandex search and Anna's Archive/SciDB

Anonymous 10/20/2025, 10:02:51 PM No.106953910 [Report] >>106954336

>>106948255
Deepseek is a research team. They focus on producing high-quality research, not on producing "frontier models". At some point, they will have enough new research to produce V4 and R2.

Anonymous 10/20/2025, 10:05:11 PM No.106953936 [Report]

>>106953088
only supports chinese and english. why the fuck haven't we had something as good as XTTS? it's been two years now. XTTS supports something like 17 languages and the best competitor we have is fish speech but that still only supports like 8 languages at the most

Anonymous 10/20/2025, 10:17:48 PM No.106954091 [Report] >>106954241 >>106954707

file.png md5: a2a833b6...

https://x.com/karpathy/status/1980347971935068380

Anonymous 10/20/2025, 10:22:40 PM No.106954141 [Report]

GLM chan might be the first model I will miss if I switch from her...

Anonymous 10/20/2025, 10:29:34 PM No.106954238 [Report] >>106954246 >>106954352

This general will die when Gemma 4 releases and every anon ITT cums to death.

Anonymous 10/20/2025, 10:29:58 PM No.106954241 [Report]

>>106954091
>Human thought naively feels a bit more like autoregression
Weird. I always thought it was closer to diffusion. When thinking about how to reply to the retards posting twitter shit, for example. It's a cloud of thoughts that get refined until something (usually) coherent comes out the other end. It's an iterative process. Even the .setitem() analogy feels wrong. The entire structure of the final thought changes as it gets refined.
And putting the thought into words goes through the same refinement once again.

Anonymous 10/20/2025, 10:30:23 PM No.106954246 [Report]

>>106954238
nothing of value would be lost

Anonymous 10/20/2025, 10:37:32 PM No.106954336 [Report]

dipsyByzantine5.png md5: 35deb702...

>>106953910
I'm having lots of issues w/ DS right now, getting inference to work. That typically precedes an update. Given the recent OCR release wonder if we're about to get V4.
TMW.

Anonymous 10/20/2025, 10:39:31 PM No.106954352 [Report] >>106954358

>>106954238
>when
though?

Anonymous 10/20/2025, 10:40:41 PM No.106954358 [Report]

>>106954352
tomorrow

Anonymous 10/20/2025, 10:51:28 PM No.106954466 [Report] >>106954500

Has anyone run Index-TTS2? It's pretty damn good.
https://voca.ro/163mzN0HtjDP

Anonymous 10/20/2025, 10:55:06 PM No.106954500 [Report] >>106954584

>>106954466
At least we have a sample of this one. The only audio sample i found on the zipvoice repo was
>https://github.com/k2-fsa/ZipVoice/blob/master/assets/silence.wav
I think i can still hear him screeching...

Anonymous 10/20/2025, 11:04:50 PM No.106954584 [Report]

>>106954500
Yeah I don't know shit about Zipvoice but I've been having fun with this one.
https://voca.ro/12yvwNrOtMiJ

Anonymous 10/20/2025, 11:05:29 PM No.106954593 [Report]

>>106953622
Time to see how it compares to Qwen 3 30B.

Anonymous 10/20/2025, 11:14:51 PM No.106954707 [Report]

>>106954091
>And it's a component of the LLM stack that still feels a bit fungible.
Fungible doesn't mean "compartmentalized" or "upgradeable." Calling something fungible means that it is replaceable /and/ that for your purposes it is interchangeable with its possible replacements; the replacement is not literally the same object but as far as you care it is. Like a screw for instance. If you're talking about replacing an item with one whose performance is superior or different in some way you care about, fungibility doesn't come into it.

Anonymous 10/20/2025, 11:24:50 PM No.106954813 [Report]

>>106954792
>>106954792
>>106954792