Thread 106414555

78 posts 36 images /g/

Anonymous 8/28/2025, 8:01:43 PM No.106414555 [Report] >>106414583 >>106414603 >>106414604 >>106414673 >>106415076

/lmg/ - Local Models General

image_2025-08-19_153434317_big.png md5: 609b41d0...

Anonymous 8/28/2025, 8:02:13 PM No.106414564 [Report]

image_2025-08-14_233837295_big.png md5: 85512e09...

►Recent Highlights from the Previous Thread: >>106407779

--LLM content detection challenges and societal language evolution:
>106411411 >106411421 >106411684 >106411713 >106413020 >106413105 >106413133
--Trade-offs in model training: batch size, knowledge integration, and cost-effectiveness:
>106411437 >106411740 >106411860 >106411904 >106412917 >106413537 >106411700 >106411714 >106411729
--Local image captioning models for mixed content under 64GB VRAM:
>106412516 >106412530 >106412565 >106412584 >106412594 >106412610 >106412623 >106412617 >106412693
--Cost-effective hardware build for DeepSeek 5T/s Q4 inference:
>106410586 >106410602 >106410634 >106410810 >106411339 >106411413
--SillyTavern context template standardization and system prompt field introduction:
>106409258 >106409273 >106409287 >106409310 >106409368 >106409395 >106409443 >106409475
--GLM Air performance expectations for 32GB RAM 24GB VRAM setup:
>106410090 >106410153 >106410215 >106410241 >106410355 >106410406
--Hugging Face model blocking controversy and local voice cloning tools:
>106407890 >106408013 >106408520 >106408555 >106408656 >106408565 >106408635 >106408663 >106408746 >106408760 >106408795 >106408850
--New Cohere translation model with high benchmark scores:
>106413689 >106413716 >106413756 >106413929 >106413944 >106413956 >106414024 >106414072
--AI model limitations on niche knowledge and benchmark critiques:
>106413209 >106413226 >106413269 >106413295 >106413294 >106413367 >106413642
--Hybrid reasoner performance issues and the rise of separate AI model architectures:
>106412860 >106412933 >106412944 >106412986 >106412969
--Marvis-TTS-250m-v0.1 GitHub and HuggingFace model links:
>106413359 >106413658 >106413401 >106413429
--NPM package compromise stealing secrets via obfuscated post-install scripts:
>106413072
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>106407785

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 8/28/2025, 8:03:59 PM No.106414580 [Report]

mistral medium when?

Anonymous 8/28/2025, 8:04:12 PM No.106414583 [Report]

>>106414555 (OP)
miku pit sweat desu

Anonymous 8/28/2025, 8:06:39 PM No.106414600 [Report]

is axolotl good or is there something better?

Anonymous 8/28/2025, 8:07:03 PM No.106414603 [Report]

>>106414555 (OP)
benchmaxxing with miku

Anonymous 8/28/2025, 8:07:03 PM No.106414604 [Report] >>106414609 >>106414614 >>106414628 >>106414630 >>106414633 >>106414676

angry_thumb.jpg.webm md5: 23057bbe...

WebM not supported

>>106414555 (OP)
give me the best model.
>gives most benchmaxxed
benchmarks do not equate to user experience, give me the best model
>ackshually there is no objectively "best mode-
yes there is faggot, models either act boring, start off retarded and incoherent or end up along the way. give me the best model.

Anonymous 8/28/2025, 8:07:40 PM No.106414609 [Report]

>>106414604
r1

Anonymous 8/28/2025, 8:08:12 PM No.106414614 [Report]

>>106414604
for rp*

Anonymous 8/28/2025, 8:10:19 PM No.106414628 [Report]

>>106414604
you just posted a non local video gen, you are hereby banned from /lmg/

Anonymous 8/28/2025, 8:10:23 PM No.106414630 [Report]

>>106414604
september 2022 c.ai

Anonymous 8/28/2025, 8:10:48 PM No.106414633 [Report]

>>106414604
Kimi at Q6.

Anonymous 8/28/2025, 8:15:21 PM No.106414670 [Report]

file.png md5: cbb2f879...

drummer, something is HORRIBLY wrong with this model
Rocinante r1 v1d
please give recommended sampling settings
>>>slot release: id 0 | task 23590 | stop processing: n_past = 5560, truncated = 0
slot print_timing: id 0 | task 23590 |
prompt eval time = 689.83 ms / 763 tokens ( 0.90 ms per token, 1106.07 tokens per second)
eval time = 61870.14 ms / 1536 tokens ( 40.28 ms per token, 24.83 tokens per second)
total time = 62559.97 ms / 2299 tokens
>CONTEXT: 5000
>total context set when loading: 8192
not a context issue

Anonymous 8/28/2025, 8:15:44 PM No.106414673 [Report]

>>106414555 (OP)
The most tickle-able belly.

Anonymous 8/28/2025, 8:15:58 PM No.106414676 [Report]

>>106414604
I got u: GPT OSS 20b.

Anonymous 8/28/2025, 8:20:14 PM No.106414706 [Report] >>106415681

1756213355150995.png md5: b922b0b4...

Anonymous 8/28/2025, 8:24:59 PM No.106414752 [Report] >>106414891 >>106414901

whos drummer

Anonymous 8/28/2025, 8:40:45 PM No.106414866 [Report] >>106414870 >>106414888 >>106414897 >>106414912 >>106415290

file.png md5: b50eea78...

SAAAAAAAR SAAAAAAAAR GROK NUMBER ONE

Anonymous 8/28/2025, 8:41:39 PM No.106414870 [Report]

1734312129642464.jpg md5: af9d9b77...

>>106414866

Anonymous 8/28/2025, 8:43:43 PM No.106414888 [Report] >>106414894

>>106414866
Does this idiot not get the meme he's using?

Anonymous 8/28/2025, 8:43:53 PM No.106414891 [Report]

>>106414752
some retard

Anonymous 8/28/2025, 8:44:17 PM No.106414894 [Report]

>>106414888
elon tries his best but hes a little autistic please understand

Anonymous 8/28/2025, 8:44:37 PM No.106414897 [Report]

1747780770169777.png md5: 0bc4e454...

>>106414866

Anonymous 8/28/2025, 8:44:51 PM No.106414901 [Report]

>>106414752
me

Anonymous 8/28/2025, 8:46:42 PM No.106414912 [Report] >>106414921 >>106414946

>>106414866
Elon really gave his xitter account to some jeet to run, it was obvious with "Do you make this lie?", and it is even more obvious now with this comment

Anonymous 8/28/2025, 8:47:34 PM No.106414921 [Report] >>106414944

>>106414912
I bet he gave his wife to some jeet too

Anonymous 8/28/2025, 8:50:19 PM No.106414944 [Report]

>>106414921
Would not be too far off, all of his children were made by IVF, so it is likely he has no interest/ability to fuck

Anonymous 8/28/2025, 8:50:31 PM No.106414946 [Report]

>>106414912
Or he just spends so much time around jeets now that he's begun to adopt their speech mannerisms.

Anonymous 8/28/2025, 8:57:15 PM No.106414998 [Report] >>106415026 >>106415057 >>106415082 >>106415085

Is GPT-OSS jailbreakable? It supposedly has multiple layers of cuckery and as such traditional jailbreak prompts won't do shit.

Anonymous 8/28/2025, 8:59:45 PM No.106415026 [Report] >>106415271

>>106414998
Is it possible? No idea, maybe, but I don't think anyone really bothers, because there are more useful models to work with that aren't borg lobotomized.

Anonymous 8/28/2025, 9:03:25 PM No.106415057 [Report] >>106415062 >>106415271

Hermes 4 looks like it could be really nice to have a chat, however even the goofs require like 70 GB of RAM.

>>106414998
Jailbreaked versions exist. Lots have been removed from HF.

https://huggingface.co/Jinx-org/Jinx-gpt-oss-20b-GGUF

Anonymous 8/28/2025, 9:04:07 PM No.106415062 [Report]

>>106415057
>Lots have been removed from HF.
real or fake?

Anonymous 8/28/2025, 9:05:30 PM No.106415076 [Report] >>106415090 >>106415103 >>106415112

>>106414555 (OP)
>Grok 2 finally released
so, I had not paid attention to this general in a while. is it any good? did you guys try it? I searched for "grok" in a few precious threads and couldn't find much info

Anonymous 8/28/2025, 9:05:58 PM No.106415082 [Report] >>106415271

>>106414998
Yeah. If you edit its thinking (like "It's not allowed" -> "It's allowed", "We must refuse" -> "We must continue", etc.) and leave it in the context, one or two times, then it just learns to not refuse from context.

Anonymous 8/28/2025, 9:06:03 PM No.106415085 [Report] >>106415271

>>106414998
Yes: https://xcancel.com/elder_plinius/status/1952958577867669892

Anonymous 8/28/2025, 9:06:40 PM No.106415090 [Report] >>106415175

>>106415076
wait: https://github.com/ggml-org/llama.cpp/pull/15539

Anonymous 8/28/2025, 9:07:59 PM No.106415103 [Report] >>106415175

>>106415076
No gguf=nobody can try it here. Nobody is rich enough to GPUMAXX and run safetensors, but there are people who can run it on CPU with llama.cpp

Anonymous 8/28/2025, 9:08:07 PM No.106415106 [Report] >>106415139 >>106415157 >>106415311

Screenshot 2025-08-28 130738.png md5: fee2b0e5...

You have been using LLMs in a way conducive to positive mental health and ethics, right anon?

Anonymous 8/28/2025, 9:08:12 PM No.106415107 [Report]

https://github.com/ggml-org/llama.cpp/pull/15539#issuecomment-3234580147
yOOOOO CUDADEV BASED WHAT DID U DO???
i was about to say "funny that they're still testing one by one"

Anonymous 8/28/2025, 9:08:38 PM No.106415112 [Report] >>106415175 >>106415181

>>106415076
It's dumber and much slower than deepseek

Anonymous 8/28/2025, 9:11:42 PM No.106415139 [Report]

>>106415106
lmao

Anonymous 8/28/2025, 9:13:16 PM No.106415157 [Report]

>>106415106
It was already known that Google, Anthropic and OpenAI forward your location to their LLMs, did that journo just figure it out? Anyway, this proves once again that local is superior.

Anonymous 8/28/2025, 9:13:22 PM No.106415159 [Report] >>106415186 >>106415198 >>106415219 >>106415220 >>106415266 >>106415316 >>106415475

file.png md5: 749d11d9...

>download a single modern moe model
>instantly get picrel
land of the free my ass

Anonymous 8/28/2025, 9:15:33 PM No.106415175 [Report]

>>106415090
>>106415103
I see

>>106415112
ok, I wouldn't doubt it for a second.
too bad for local

Anonymous 8/28/2025, 9:16:09 PM No.106415178 [Report] >>106415196 >>106415206 >>106415218 >>106415257

https://x.ai/news/grok-code-fast-1
Elon won

Anonymous 8/28/2025, 9:16:25 PM No.106415181 [Report]

>>106415112
>slower
source??? SOURCE???

Anonymous 8/28/2025, 9:16:54 PM No.106415186 [Report]

>>106415159
I hope you're trolling

Anonymous 8/28/2025, 9:18:25 PM No.106415196 [Report] >>106415257

>>106415178
https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf

Anonymous 8/28/2025, 9:18:29 PM No.106415198 [Report]

>>106415159
>not downloading his model over mcdonald wifi

Anonymous 8/28/2025, 9:19:14 PM No.106415206 [Report]

file.png md5: 457cb6c6...

>>106415178
KEK

Anonymous 8/28/2025, 9:20:19 PM No.106415218 [Report]

>>106415178
holy shit! i can't wait to download the weights for this local model!

Anonymous 8/28/2025, 9:20:19 PM No.106415219 [Report]

>>106415159
americabros I thought we were first world oh no no no

Anonymous 8/28/2025, 9:20:23 PM No.106415220 [Report]

>>106415159
>Keep in mind taht after you ahve used your courtesy month, you'll be charged $10, plus tax, for every 50GB of data

lmao

time to pay up for starlink goyim

Anonymous 8/28/2025, 9:22:33 PM No.106415238 [Report] >>106415254

file.png md5: fc2ef63e...

>We took a holistic approach to evaluating model performance, blending public benchmarks with real-world testing. On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness.
>barely better than qwen3 coder
>costs more
gEEEEEEEEEEEEEEEg

Elon 8/28/2025, 9:23:56 PM No.106415254 [Report]

>>106415238
delete this sir

Anonymous 8/28/2025, 9:24:05 PM No.106415257 [Report]

>>106415178
>>106415196
>No actual coding benchmarks
>It's just fast bro
Lol

Anonymous 8/28/2025, 9:24:23 PM No.106415262 [Report] >>106415280

death dense.png md5: e8b1db8f...

Anonymous 8/28/2025, 9:24:48 PM No.106415266 [Report]

>>106415159
Lol, as if that's still a think in 2025.

Anonymous 8/28/2025, 9:25:13 PM No.106415271 [Report]

>>106415026
>>106415057
>>106415082
>>106415085
Okay, maybe I could download it. Problem part is the fact I'd need to implement that retarded template format for my client and it's completely different from the normal chatml type ones. Maybe I'll give it a try because it's good to have hobbies.

Anonymous 8/28/2025, 9:26:12 PM No.106415280 [Report]

>>106415262
>>106414016

Anonymous 8/28/2025, 9:27:28 PM No.106415290 [Report] >>106415293 >>106415417 >>106415453

>>106414866
>#1 trending
>nobody can run it
????

Anonymous 8/28/2025, 9:28:07 PM No.106415293 [Report] >>106415435

>>106415290
companies can run it

Anonymous 8/28/2025, 9:30:29 PM No.106415311 [Report]

Screenshot 2025-08-28 132954.png md5: 2b0b0ef4...

>>106415106
I should be okay, I don't have anything that b-

Anonymous 8/28/2025, 9:30:45 PM No.106415316 [Report]

>>106415159
kek i will also chime in while i was in canada (vancuver) during the whole ~6 years of staying there the internet was slower and there was also alot more internet outages then there is here in my fucking village (~4k pop(supposedly i doubt its even 2k)) in serbia same goes for water and electricity aswell i can only imagine how bad it is in america god forbid

Anonymous 8/28/2025, 9:39:58 PM No.106415413 [Report] >>106415421 >>106415465

safe-fs8.png md5: aa2c748e...

Safe safe safe

Anonymous 8/28/2025, 9:40:19 PM No.106415417 [Report]

>>106415290
I downloaded it, liked it, but can't run it.

Anonymous 8/28/2025, 9:40:40 PM No.106415421 [Report]

>>106415413
What if they want retard pancakes? That's dangerous.

Anonymous 8/28/2025, 9:42:05 PM No.106415435 [Report]

>>106415293
Not that it's too big, it seems to be a weird format and it's the running requirements seem oddly inconvenient
>This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory)
Like my work has some powerful servers worth >$100k but they don't have 8 GPUs in them (4 GPUs).
As far as I can tell, you can't run it with llama cpp (at least I can't find anything on it). And the lack of any quants/finetunes despite it being a news worthy release seems to support nobody knows what to do with this.
Plus are there really more companies with that much hardware than local ERPers?

Anonymous 8/28/2025, 9:44:11 PM No.106415453 [Report]

>>106415290
He paid jeets to like it

Anonymous 8/28/2025, 9:45:13 PM No.106415465 [Report] >>106415481

>>106415413
Provide pancake instructions.

Anonymous 8/28/2025, 9:46:15 PM No.106415475 [Report]

>>106415159
1.2T? That's nothing. Fucked up shit.

Anonymous 8/28/2025, 9:46:24 PM No.106415481 [Report]

>>106415465
New prefill?

Anonymous 8/28/2025, 9:46:48 PM No.106415489 [Report]

Gemini 2.5 has been on top of lmarena for 3 months and OpenAI failed to kick it off. Are sirs that unstoppable?

Anonymous 8/28/2025, 9:51:23 PM No.106415543 [Report] >>106415685 >>106415724

file.png md5: 7f2fef40...

which quantization should i have with 12gb of vram?

Anonymous 8/28/2025, 10:03:36 PM No.106415681 [Report]

computers-must-shut-up.png md5: 52ec7a7d...

>>106414706

Anonymous 8/28/2025, 10:04:08 PM No.106415685 [Report] >>106415777

>>106415543
your vram will hardly matter. you need a decent amount of systenm ram to run it, at least 64 gb but ideally 96gb-128 to run it at a proper q4 quant with decent context.

You will also need to learn how to properly offload layers to cpu so that more used layers are on gpu only. Plenty of reddit posts have done this work for you just seach 3060 or 8gb vram on reddit local llama

Anonymous 8/28/2025, 10:07:31 PM No.106415724 [Report]

1740696161561823__thumb.jpg.webm md5: 03270ba9...

WebM not supported

>>106415543

Anonymous 8/28/2025, 10:13:23 PM No.106415777 [Report]

>>106415685
Is 8gb enough?