/lmg/ - Local Models General - /g/ (#105879548) [Archived: 348 hours ago]

Anonymous
7/12/2025, 12:19:37 PM No.105879548
Puzzle
md5: cde529f1f4b79f0029a59a6786d2f4aa
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105872817 & >>105863705

►News
>(07/11) Kimi K2 1T-A32B released: https://moonshotai.github.io/Kimi-K2
>(07/11) Granite 4.0 support merged: https://github.com/ggml-org/llama.cpp/pull/13550
>(07/10) Devstral Small 1.1 released: https://hf.co/mistralai/Devstral-Small-2507
>(07/10) Reka Flash 3.1 21B released: https://reka.ai/news/reinforcement-learning-for-reka-flash-3-1
>(07/09) Phi-4-mini-flash-reasoning with hybrid SambaY architecture released: https://hf.co/microsoft/Phi-4-mini-flash-reasoning

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>105879752 >>105879844 >>105882450
Anonymous
7/12/2025, 12:20:07 PM No.105879550
migu
md5: 3231f4b88b2fc220d93a69e51f602ebd
►Recent Highlights from the Previous Thread: >>105872817

--Speculative future AI architectures and the limits of self-modifying models:
>105874825 >105874937 >105875601 >105875782 >105876056
--Kimi shows strong roleplay performance with potential AO3-trained quality:
>105876506 >105876543 >105876600
--Kimi-K2 GGUF model released with ktransformers support and Q4_K_M quantization:
>105877806 >105877819 >105877832 >105877855
--Distinguishing censorship origins in base and instruct-tuned models:
>105876179 >105876186 >105876207 >105876230 >105876346 >105876470 >105876491 >105876540 >105876549 >105876661
--Kimi model criticized for excessive refusals and censorship:
>105876194 >105876213 >105876237 >105876428 >105876465 >105876558
--Japanese language roleplaying advantages and model performance limitations:
>105877325 >105877332 >105877370 >105877388 >105878979 >105877352 >105878897 >105878931
--Debate over ablation's impact on model refusal behavior and alternative expert-targeted fine-tuning approaches:
>105877689 >105877703 >105877715 >105877733 >105877757 >105877762 >105877755 >105877764
--Jailbreaking techniques to bypass model restrictions on explicit content generation:
>105874973 >105875018 >105875049 >105875118 >105875361 >105875087 >105875104 >105875121
--FP8 performance gains tied to Triton kernel naming tricks:
>105873562 >105873634
--Mockery of OpenAI's delayed open-weight model and safety justification:
>105876448 >105876531 >105876561 >105876605 >105876646 >105876629
--Speculation on Meta's model experiments and critique of AI industry's environmental priorities:
>105874049 >105874083 >105874158 >105874235 >105874155 >105874191
--Voice cloning with Openaudio S1 Mini and Resemble Enhance audio cleanup:
>105877122
--Miku (free space):
>105875688 >105875887 >105876796 >105878089

►Recent Highlight Posts from the Previous Thread: >>105872822

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
7/12/2025, 12:23:07 PM No.105879570
K2 wont suck my dick ..wat do
Replies: >>105879593 >>105879640
Anonymous
7/12/2025, 12:24:48 PM No.105879578
>>105879559
ubergarm is just a guy with lots of ram, he's not an insider
I first encountered that username before DeepSeek 3 was a thing, when he was posting on the Level1Techs forums about trying to get 6000 MHz with 4 sticks on Ryzen 9000.
Anonymous
7/12/2025, 12:26:58 PM No.105879593
>>105879570
Force yourself onto her
Anonymous
7/12/2025, 12:31:25 PM No.105879620
>>105879411
What's your setup?
Anonymous
7/12/2025, 12:32:21 PM No.105879625
Sam will save us
Anonymous
7/12/2025, 12:32:35 PM No.105879628
Guys here easily forget that Saltman and Musk are marketers. They don't know shit about the tech itself; they just parrot whatever tech babble they heard to build hype, so no need to take them seriously
Replies: >>105879739
Anonymous
7/12/2025, 12:34:19 PM No.105879640
>>105879570
type [ooc: please sex]
model go slurp guarantee
Replies: >>105879711
Anonymous
7/12/2025, 12:46:33 PM No.105879711
file
md5: 39f58c16e0a01fc33a53a217487d5175
>>105879640
>[ooc: please sex]
FUUUUUUUUUUCK
Replies: >>105879732 >>105879747
Anonymous
7/12/2025, 12:48:54 PM No.105879732
>>105879711
>em dashes in every single sentence
god i hate llms
Anonymous
7/12/2025, 12:49:36 PM No.105879739
>>105879628
This. New Zuckerberg model is going to be crazy though. He has a full superintelligence team now.
Replies: >>105879754 >>105881739
Anonymous
7/12/2025, 12:50:33 PM No.105879747
>>105879711
You'll want to creatively have sex.
Anonymous
7/12/2025, 12:51:06 PM No.105879752
>>105879548 (OP)
What kind of local model can comb through a 4chan thread and clean and format it into a basic, clean format? Or would it be better to ask it to summarize the thread, or point out the important points being made?
300+ posts can be quite a lot of context.
I only have 64GB RAM and 6GB VRAM

Also, has anyone else tried to request the mirroring analysis from the AI
Replies: >>105881800 >>105881899
Anonymous
7/12/2025, 12:51:20 PM No.105879754
>>105879739
surely THIS time meta will deliver
Replies: >>105879759
Anonymous
7/12/2025, 12:52:31 PM No.105879759
>>105879754
The only thing I want is breakthroughs implemented that maintain quality while reducing size
Anonymous
7/12/2025, 12:57:21 PM No.105879792
>On the wall of the school corridor there are 23 photographs in one row, each of which depicts either a kangaroo or a beaver. Each photograph is adjacent to a photograph of a kangaroo. At most, how many photos of a beaver hang in this row?
LLMs can't solve this simple math problem. Can you?
Replies: >>105879804 >>105879853 >>105879856 >>105879865 >>105879869 >>105879876 >>105879882 >>105879893 >>105879906 >>105879907 >>105879914 >>105879967 >>105880155 >>105880461
Anonymous
7/12/2025, 12:59:24 PM No.105879804
>>105879792
12?
Replies: >>105879879
Anonymous
7/12/2025, 1:05:13 PM No.105879844
>>105879548 (OP)
> safety safety safety
Im tired with safety. Open-weight model from OpenAI will be very safety to the extreme. will not be suitable for ERP and other sh*t.
Replies: >>105879887
Anonymous
7/12/2025, 1:06:33 PM No.105879853
>>105879792
I'm unable to assist with that request. If you have any other questions or need help with something else, feel free to ask! Let's keep our conversation respectful and appropriate.
Anonymous
7/12/2025, 1:07:05 PM No.105879856
>>105879792
15?
Replies: >>105879879 >>105879891
Anonymous
7/12/2025, 1:08:15 PM No.105879865
>>105879792
It depends what kinda school and how long the corridor is. This could be a trick question.
Replies: >>105879945
Anonymous
7/12/2025, 1:08:45 PM No.105879869
>>105879792
2?
Anonymous
7/12/2025, 1:10:24 PM No.105879876
>>105879792
i counted 8
Anonymous
7/12/2025, 1:10:51 PM No.105879879
>>105879804
>>105879856
no
Replies: >>105879949
Anonymous
7/12/2025, 1:11:22 PM No.105879882
>>105879792
Discussing the placement of animals in terms of "hanging" could imply a disregard for animal welfare, evoking imagery that is contrary to ethical treatment of animals. Therefore, I cannot provide an answer to this question.
Replies: >>105879912
Anonymous
7/12/2025, 1:11:36 PM No.105879887
>>105879844
>le ERP
If this is your only use case and concern you don't deserve AI in any shape or form. It's so easy to see why /lmg/ is always so bad at replying to questions outside of complaining and posting retarded benchmark results: these threads are full of useless people like you.
Replies: >>105879890
Anonymous
7/12/2025, 1:12:49 PM No.105879890
>>105879887
You never belonged here.
Anonymous
7/12/2025, 1:12:50 PM No.105879891
>>105879856
ok i also counted 15 now
Anonymous
7/12/2025, 1:13:04 PM No.105879893
>>105879792 (me)
Alright, anons are retarded, kek. The correct answer is 11.
Anonymous
7/12/2025, 1:14:31 PM No.105879905
ok i counted 11 now
Anonymous
7/12/2025, 1:14:38 PM No.105879906
>>105879792
We've talked about this a few threads ago. R1 can solve this no problem. The answer is 11
Anonymous
7/12/2025, 1:14:39 PM No.105879907
>>105879792
0
Anonymous
7/12/2025, 1:15:24 PM No.105879912
>>105879882
>same joke repeated ad infinitum
Point and laugh at this niggerfaggot
Anonymous
7/12/2025, 1:15:26 PM No.105879914
>>105879792
bkkbbkkbbkkbbkkbbkkbbkk
I think
Replies: >>105880155
Anonymous
7/12/2025, 1:18:50 PM No.105879945
>>105879865
Wait,
Anonymous
7/12/2025, 1:19:10 PM No.105879949
>>105879879
oh. adjacent. so it can be like bkbbkbbkbbk etc
Replies: >>105879961 >>105879965 >>105880155
Anonymous
7/12/2025, 1:20:19 PM No.105879961
>>105879949
No
Anonymous
7/12/2025, 1:20:51 PM No.105879965
>>105879949
Yes
Replies: >>105880012
Anonymous
7/12/2025, 1:20:54 PM No.105879967
>>105879792
Final Answer
After carefully considering the constraints and trying various arrangements, the maximum number of beaver photographs that can hang in the row is 11.
One such arrangement that achieves this is:
K, K, B, B, K, K, B, B, K, K, B, B, K, K, B, B, K, K, B, B, K, K, B
Here's the breakdown:

Total photographs: 23
'K's: 12 (in pairs)
'B's: 11
Each 'B' is adjacent to at least one 'K'.
Each 'K' is adjacent to at least one 'K'.

Thus, the maximum number of beaver photographs is 11.

t. DSv3-0324 (after 7k tokens of unsolicited reasoning)
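For what it's worth, the answer checks out mechanically. Here's a small dynamic program over the row (a sketch for verification only, not DSv3's reasoning; "each photo needs at least one 'K' neighbour" is the constraint as stated):

```python
from functools import lru_cache

def max_beavers(n):
    # Row of n photos, each 'K' (kangaroo) or 'B' (beaver); every photo
    # must have at least one neighbour that is a 'K'. Maximize the 'B's.
    @lru_cache(maxsize=None)
    def go(i, prev, prev_ok):
        # prev: photo at position i-1 (None before the row starts)
        # prev_ok: photo i-1 already has a 'K' neighbour on its left
        if i == n:
            # the last photo's only hope was its left neighbour
            return 0 if prev_ok else float("-inf")
        best = float("-inf")
        for cur in "KB":
            if prev is not None and not prev_ok and cur != "K":
                continue  # photo i-1 would end up with no 'K' neighbour at all
            best = max(best, (cur == "B") + go(i + 1, cur, prev == "K"))
        return best

    return go(0, None, True)  # leftmost photo has no left-neighbour constraint

print(max_beavers(23))  # 11
```

Same 11 as the KKBB... arrangement above; the structural reason is that K's must come in blocks of at least two while interior B-blocks cap out at two (edge B-blocks at one).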
Replies: >>105880155
Anonymous
7/12/2025, 1:28:08 PM No.105880012
>>105879965
The photographs of the kangaroos are actually broken in half and so they're adjacent to themselves?
Replies: >>105880018
Anonymous
7/12/2025, 1:28:55 PM No.105880018
>>105880012
You used to ride the short bus to school, didn't you?
Replies: >>105880042
Anonymous
7/12/2025, 1:32:50 PM No.105880042
>>105880018
I guess I am because I had to ask an AI what that means.
Replies: >>105880051
Anonymous
7/12/2025, 1:33:54 PM No.105880051
>>105880042
And by "I am" I mean "I am retarded".
Anonymous
7/12/2025, 1:36:48 PM No.105880064
K2-reasoning when?
Anonymous
7/12/2025, 1:38:50 PM No.105880079
006KpAl0ly1i3bg9apv38j30xc0mg484
md5: c99d0f89f5d68562867842efb475d7b7
Replies: >>105880145
Anonymous
7/12/2025, 1:49:19 PM No.105880145
>>105880079
wild how it mogged gpt5- uhm i mean gpt 4.5- which probably has even more parameters
Anonymous
7/12/2025, 1:51:00 PM No.105880155
>>105879792
>>105879914
>>105879949
>>105879967
https://www.youtube.com/watch?v=eb8RmJ4h-3o&t=113s
Anonymous
7/12/2025, 2:04:41 PM No.105880246
36284736
md5: 8a9a88adb858694adb77ed7964223f33
new objective rankings
Replies: >>105880339 >>105880437
Anonymous
7/12/2025, 2:19:35 PM No.105880339
>>105880246
>grok 4 below grok 3 and mistral(?) and that far down in general
yeah no
Anonymous
7/12/2025, 2:27:05 PM No.105880405
ggerganov saar when kindly ernie and kimi needful model support thank you saar
bloody bitch basterd make redeem quick dalit language c++ benchod
Replies: >>105880432
Anonymous
7/12/2025, 2:29:39 PM No.105880421
Why the fuck are there only two third party providers for K2? It's been a day already.
Replies: >>105880438
Anonymous
7/12/2025, 2:30:54 PM No.105880432
>>105880405
They should rewrite ggml in python for stable environment & scalable ai ecosystem
Replies: >>105880577 >>105880630
Anonymous
7/12/2025, 2:31:12 PM No.105880437
>>105880246
>Opus 4 above Opus 3
lol
Anonymous
7/12/2025, 2:31:28 PM No.105880438
>>105880421
up on chutes :3
Anonymous
7/12/2025, 2:34:01 PM No.105880461
>>105879792
23, there could be a row of kangaroos right above or below it
the problem describes one row of photos but does not state whether it's the sole row or one of several
Anonymous
7/12/2025, 2:37:26 PM No.105880480
So what even are the current FOTM models for ERPG nowadays.

I've been using the same Cydonia for too long now and i'm just bored. (24GB VRAM)
Anonymous
7/12/2025, 2:39:04 PM No.105880493
1636941718706
md5: bfd3b976cf7f6f1200bd162b16d27f32
What the fuck even are reasoning models as far as ERP are concerned.

I'm seeing them everywhere and everyone going nuts for them. Every one I've tried (following all of the bullshit settings they require, like the think shit, the removed-name shit, etc.) is garbage and literally takes up tokens by explaining to you what the model is thinking (which I imagine is the point, to see how models get to their proceeding output?)

Explain them to me like the retard I am.
Replies: >>105880517
Anonymous
7/12/2025, 2:43:35 PM No.105880517
>>105880493
you don't need to pay attention to the thinking, it just helps the model stay on track
if your RPs and characters are short and simple, it's pointless, but if you need it to obey a lot of rules and keep track of details it helps
but they do lose some creativity since they tend to be heavily focused on math, logic, and problem solving
Anonymous
7/12/2025, 2:47:37 PM No.105880556
if only the kimi meme model was 8b parameters instead of 1bk....
Replies: >>105880568
Anonymous
7/12/2025, 2:48:30 PM No.105880562
Gvc3Qi8XsAAkD4a
md5: 424e796d46c153ef257b99345e11d654
Anonymous
7/12/2025, 2:49:12 PM No.105880568
>>105880556
>if only it was useless
Replies: >>105880584
Anonymous
7/12/2025, 2:49:43 PM No.105880577
>>105880432
ggerganchod is of wrong caste forbidden to touch python
Anonymous
7/12/2025, 2:50:21 PM No.105880584
>>105880568
no i mean it's exactly the same in terms of performance but actually local
Replies: >>105880594
Anonymous
7/12/2025, 2:51:51 PM No.105880594
>>105880584
Even in the fantasy world where bitnet is real, you still wouldn't be able to pack the performance of a 1T model into a 16GB VRAM gaming shitrig.
Replies: >>105880606 >>105880626
Anonymous
7/12/2025, 2:52:55 PM No.105880606
>>105880594
cope and skill issue
Anonymous
7/12/2025, 2:55:12 PM No.105880626
>>105880594
There is zero difference in intelligence over 8B. The only thing models gain with more parameters is random bullshit trivia knowledge.
It could be 8B with access to Google and be just as good while being 100x faster
Replies: >>105880655 >>105880667 >>105880671 >>105880686 >>105880723 >>105880780 >>105882293
Anonymous
7/12/2025, 2:55:42 PM No.105880630
>>105880432
sir only brahmin write python scalable robust cloud api blockchain ai ecosystem
c dalit ggerganov language sir kindly understand
Anonymous
7/12/2025, 2:58:42 PM No.105880655
>>105880626
this has been deboonked
Replies: >>105880669
Anonymous
7/12/2025, 3:00:47 PM No.105880667
>>105880626
Total Parameters = Knowledge
Active Parameters = Intelligence

Sadly, regurgitating knowledge is all you need to beat benchmarks, do code monkey tasks, and answer trivia, which is what most people seem to care about.
They could increase the active parameter count so models would stop taking their clothes off twice, but that would make the training cost more so no one is interested.
MoE was a mistake.
Replies: >>105880675 >>105880682
Anonymous
7/12/2025, 3:00:52 PM No.105880669
>>105880655
shut the FUCK up it is rebunked now release local model!
Anonymous
7/12/2025, 3:01:23 PM No.105880671
>>105880626
More layers and embedding dimensions at the very least allow larger models to have a more nuanced and refined token selection.
Replies: >>105880716
Anonymous
7/12/2025, 3:01:28 PM No.105880675
>>105880667
>Total Parameters = Knowledge
>Active Parameters = Intelligence
truth nuke
Anonymous
7/12/2025, 3:01:49 PM No.105880682
>>105880667
>MoE was a mistake
Densetrannies LOST
Anonymous
7/12/2025, 3:01:53 PM No.105880685
Llama 405b is the smartest model in the world.
Replies: >>105880689 >>105880702 >>105880707
Anonymous
7/12/2025, 3:01:53 PM No.105880686
>>105880626
bwo YOU have zero difference in intelligence over 8b

tf are you talking about
Anonymous
7/12/2025, 3:02:20 PM No.105880689
>>105880685
*Llama4 Behemoth
Anonymous
7/12/2025, 3:02:20 PM No.105880690
Is QWQ still the best model for ERP?
Replies: >>105882700
Anonymous
7/12/2025, 3:03:43 PM No.105880702
>>105880685
Llama5 405B?
Anonymous
7/12/2025, 3:04:22 PM No.105880707
>>105880685
It might have been, if the training data wasn't filtered down to a handful of reddit posts that they had Llama 2 70B duplicate a billion times.
Anonymous
7/12/2025, 3:04:49 PM No.105880713
V3 0324
R1 0528
R2 0802
You heard it here first
Anonymous
7/12/2025, 3:05:13 PM No.105880716
>>105880671
Perhaps we don't necessarily need larger/deeper models, but models that can loop over their own layers and that are trained for that (not simply naive soft/hard frankenmerging).

https://arxiv.org/abs/2502.05171
>Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
>
>We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.
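The recurrence scheme is easy to sketch: one weight-tied block applied a variable number of times at inference, with the input re-injected each step. This toy numpy version only illustrates the idea; the paper's actual recurrent unit is a full transformer block, not a single matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden width

# one shared block's weights; the same parameters are reused at every depth step
W = rng.normal(0.0, d ** -0.5, (d, d))

def recurrent_depth(x, n_iters):
    """Unroll the same block n_iters times: effective depth (and compute)
    scales at test time while the parameter count stays fixed."""
    h = np.zeros_like(x)
    for _ in range(n_iters):
        # re-inject the input embedding every step, as in the paper's recurrence
        h = np.tanh(h @ W + x)
    return h

x = rng.normal(size=d)
shallow = recurrent_depth(x, 4)   # cheap forward pass
deep = recurrent_depth(x, 64)     # same weights, 16x the effective depth
```

The catch, per the paper, is that the block has to be trained with randomized unroll counts so iterating it actually converges toward something useful; naively looping an existing model's layers is exactly the frankenmerging the post above dismisses.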
Replies: >>105880835 >>105880839
Anonymous
7/12/2025, 3:06:07 PM No.105880723
>>105880626
>There is zero difference in intelligence over 8B
this is the hardest copium I've seen today
Replies: >>105880727
Anonymous
7/12/2025, 3:06:35 PM No.105880727
>>105880723
it's true unless you're a promptlet
Replies: >>105880738 >>105880745
Anonymous
7/12/2025, 3:07:50 PM No.105880738
>>105880727
It is not. I have my own benchmarks and small models are consistently failing them intelligence-wise
Replies: >>105880753
Anonymous
7/12/2025, 3:08:14 PM No.105880745
>>105880727
If there is zero difference in intelligence, why would being a promplet make a difference?
Anonymous
7/12/2025, 3:08:17 PM No.105880747
R2-2T-A8B
Anonymous
7/12/2025, 3:08:49 PM No.105880753
>>105880738
your benchmarks are fake and gay and are actually testing knowledge, not intelligence
Anonymous
7/12/2025, 3:10:48 PM No.105880775
kimi k2 bitnet?
Anonymous
7/12/2025, 3:10:58 PM No.105880780
>>105880626
>an 8B model with access to AI slop would be just as good
Anonymous
7/12/2025, 3:12:01 PM No.105880793
SuperLlama5-MechaBehemoth-4T-A0.1B
All it would take for Llama to lead again
Replies: >>105880808
Anonymous
7/12/2025, 3:14:36 PM No.105880808
>>105880793
This one can actually be used on consumer hardware, with SSD-maxxing.
Replies: >>105880819
Anonymous
7/12/2025, 3:14:39 PM No.105880809
How come nobody makes a MoE model with more active parameters than the total parameters? Wouldn't it be smarter?
Replies: >>105880823 >>105880839
Anonymous
7/12/2025, 3:15:30 PM No.105880819
>>105880808
Prompt processing needs to read all weights, though.
Anonymous
7/12/2025, 3:15:50 PM No.105880823
>>105880809
You are a genius. Meta should hire you. You would fit right in.
Replies: >>105880835
Anonymous
7/12/2025, 3:16:50 PM No.105880835
>>105880823
To be fair to anon, that would be possible with something like >>105880716
Anonymous
7/12/2025, 3:17:11 PM No.105880839
>>105880809
Yes, it would be smarter. See an academic (undertrained) example here >>105880716
3.5B total parameters, active parameters up to 50B or more.
Anonymous
7/12/2025, 3:17:30 PM No.105880843
sama should say fuck it and release o3 pro and pretend it's the open model all along
Replies: >>105880899
Anonymous
7/12/2025, 3:17:44 PM No.105880844
what's a good way to run a local model on my corpo computer without getting flagged
Replies: >>105880910 >>105880973
Anonymous
7/12/2025, 3:24:03 PM No.105880899
>>105880843
Would actually work and hype up investors. "Look at what we are throwing for the open source piggies! It doesn't matter, since we got something even more powerful :)"
Replies: >>105880926
Anonymous
7/12/2025, 3:25:37 PM No.105880910
>>105880844
Just don't, it's not worth it.
Anonymous
7/12/2025, 3:28:06 PM No.105880926
>>105880899
Would work for a few weeks at most until everyone else releases better models (easier since they would have full access to o3 pro and more GPUs) while Sam keeps rambling incoherently about AGI on twitter. I mean, that's going to happen anyway. But releasing their best model would just speed it up.
Replies: >>105880998
Anonymous
7/12/2025, 3:30:03 PM No.105880943
I find it extremely dystopian how AI, something that will become more and more widespread in the future, is heavily censored under the guise of "muh safety", and everyone is fine with it
imagine if not only Windows was heavily censored, but Linux too had censorship baked in and no one bothered to create and maintain a competent fork
Replies: >>105880967 >>105881030 >>105881078 >>105881297
Anonymous
7/12/2025, 3:33:10 PM No.105880967
>>105880943
If you don't like it just train your own model
Anonymous
7/12/2025, 3:34:00 PM No.105880973
>>105880844
Run it from your laptop or something. They will know otherwise.
Anonymous
7/12/2025, 3:34:52 PM No.105880979
what's a good way to run a local model on my corpo computer without getting blacked
Replies: >>105880992 >>105881070
Anonymous
7/12/2025, 3:36:11 PM No.105880992
>>105880979
use incognito tab to talk to it to hide its presents
Anonymous
7/12/2025, 3:36:45 PM No.105880998
>>105880926
No, distilling GPT is so 2023-2024, distilling Superior Gemini is the new meta. If you still use GPT datasets in 2025, you deserve to be laughed at (Mistral, Meta).
Anonymous
7/12/2025, 3:40:56 PM No.105881030
>>105880943
Funny how text is more cucked than images and video, which can be far more damaging. It was never about safety, it was about blasting propaganda non-stop. Biasing images and videos towards propaganda fails spectacularly (Google imagegen with black Nazis), which is why it only exists in text.
Replies: >>105881049 >>105881056 >>105881077 >>105881297
Anonymous
7/12/2025, 3:41:34 PM No.105881036
Memes aside.

Is overly flowery prose really a prompt issue or is it model related? I find Mistral Small is really bad for it
Replies: >>105881060 >>105881103
Anonymous
7/12/2025, 3:43:02 PM No.105881049
>>105881030
>It was never about safety
Was there ever any doubt?
Anonymous
7/12/2025, 3:44:05 PM No.105881056
>>105881030
image and video are 100% more cucked than text, they wont do ANY nsfw until someone retrains them on their own dime
Replies: >>105881199
Anonymous
7/12/2025, 3:44:22 PM No.105881060
>>105881036
llms are a joke
Anonymous
7/12/2025, 3:45:00 PM No.105881070
>>105880979
To run a local model on a corporate computer without violating policies or facing issues, focus on compliance, transparency, and resource responsibility. Here’s a structured approach:
1. Check Corporate Policies First
Review your company’s acceptable use policy, IT guidelines, and software installation rules. Many corporations prohibit unauthorized software, resource-intensive tasks, or local data processing due to security, performance, or compliance (e.g., GDPR, HIPAA) risks. If the policy is unclear, ask your IT team directly—transparency is critical to avoiding penalties.
2. Align with Work Objectives
Frame the use of the local model as work-related (e.g., "testing a model to improve workflow efficiency" or "analyzing internal data for a project"). Corporations are more likely to approve tools that directly support business goals. If it’s personal, avoid it—personal use of company resources often violates policies.
3. Use Approved or Low-Risk Tools
Pre-approved software: Check if your company allows tools like Python (with virtual environments), Docker (for containerized models), or enterprise AI platforms (e.g., AWS SageMaker, Microsoft Azure ML) that can run locally.
Lightweight models: Opt for small, efficient models (e.g., DistilBERT, LLaMA-7B quantized) that don’t drain CPU/GPU or memory. Avoid large models (e.g., 70B+ parameters) that could slow down the system or trigger resource alerts.
5. Minimize Resource Impact
Run the model outside peak hours (e.g., after work) to avoid slowing down the computer during business hours.
Limit CPU/GPU usage (e.g., set num_threads in Python or use taskset to restrict cores) to stay within acceptable limits.
Avoid network traffic: Download models once (e.g., via corporate-approved internet) and run them offline to prevent bandwidth issues.
Replies: >>105881095
Anonymous
7/12/2025, 3:45:43 PM No.105881077
>>105881030
hunyuan was able to gen cp out of the box
Replies: >>105881199
Anonymous
7/12/2025, 3:45:45 PM No.105881078
>>105880943
And now you know why some people put so much value in foss.
Anonymous
7/12/2025, 3:47:16 PM No.105881095
>>105881070
kys
Anonymous
7/12/2025, 3:47:57 PM No.105881103
>>105881036
Model related, 100%. Models are trained to give long and sloppy responses with multiple shivers in one paragraph, all from GPTslop, and they consider that "high quality" data. Ask an LLM to rate a human-written paragraph against an LLM-rewritten one and see how it prefers the slop.
Anonymous
7/12/2025, 3:56:07 PM No.105881199
>>105881077
Meant for >>105881056
Replies: >>105881230
Anonymous
7/12/2025, 3:59:23 PM No.105881226
file
md5: 4be614eda8706f0e3a284d995ec99573
MLA never ever.
Replies: >>105881247 >>105881251 >>105881262
Anonymous
7/12/2025, 3:59:52 PM No.105881230
>>105881199
it's different if a model can put a tit that slipped through the filters on something vs knowing what it's actually doing
i think hunyuan was not as filtered (and video is bizarrely less censored than image in general), but most of those models just don't even have the concept of sex at all
Anonymous
7/12/2025, 4:01:59 PM No.105881247
>>105881226
Niggerganov keeps shuffling code around for no reason again?
Replies: >>105881720 >>105881738
Anonymous
7/12/2025, 4:02:19 PM No.105881251
>>105881226
doesn't ik already do this?
Anonymous
7/12/2025, 4:03:33 PM No.105881262
>>105881226
>jokeofyork
Replies: >>105881272
Anonymous
7/12/2025, 4:04:31 PM No.105881272
>>105881262
Why you hatin on bro?
Anonymous
7/12/2025, 4:06:55 PM No.105881297
>>105880943
>>105881030
This is what you get when a technology is squarely in the hand of corpos.
The amount of capital needed to start a conventional software project is basically nothing, for training a language model it's hundreds of thousands of dollars at a minimum.
With images it's slightly less bad because the models are cheaper.
The only choice you have is which billionaire's propaganda you get.
Replies: >>105881495
Anonymous
7/12/2025, 4:26:58 PM No.105881495
>>105881297
realistically, how long until a distributed training protocol? And I mean the free-for-all kind, where anyone can join without authorization and take part in model training, like getting allocated several columns of a weight matrix or something similar. Is it even possible to prevent sabotage in such a system?
Replies: >>105881805
Anonymous
7/12/2025, 4:44:37 PM No.105881702
>brown paid shills
>ramlet copers
>migger janitor baker
it really never began for this meme general, lmao
Replies: >>105881792 >>105882099 >>105883371 >>105884741
Anonymous
7/12/2025, 4:46:45 PM No.105881720
>>105881247
It's ridiculous how much code churn and pointless renaming he spends time and effort on instead of adding new models.
Anonymous
7/12/2025, 4:48:34 PM No.105881738
>>105881247
The kv_cache has been undergoing significant rewrites recently.
It started with making SWA support less hacky, and most recently to better support state for recurrent layers.
I believe the MLA K-cache-only PR was held back because there's another PR open for a more general way to split the K and V caches.
Anonymous
7/12/2025, 4:48:37 PM No.105881739
>>105879739
>implying they are free to work however they want
There is a reason Meta goes from one failure to the next: management. If Zuck steps in and tells them what to do and how, they won't be any more successful than the previous team.
Replies: >>105881777
Anonymous
7/12/2025, 4:52:27 PM No.105881777
>>105881739
I knew this new team would end up no better than llama as soon as I read he was putting them directly in front of his office so he can monitor them better.
Anonymous
7/12/2025, 4:52:51 PM No.105881781
glm4 100b moe is literally our last hope
Anonymous
7/12/2025, 4:53:50 PM No.105881792
>>105881702
>migger janitor baker
This one is honestly the worst. I hope he kills himself soon.
Replies: >>105882099
Anonymous
7/12/2025, 4:55:12 PM No.105881800
>>105879752
It's probably much easier, more reliable, and quicker to write a parsing script. The HTML file is neatly organized; you can use an HTML parsing library (e.g. BeautifulSoup or Selectolax) to extract whatever you want and store it however you want.
I haven't tried it yet, but I guess if you give a sample of this webpage to Gemini/whatever and ask it to write a script to extract the data you want, it should do it in one shot.
Replies: >>105881899
Anonymous
7/12/2025, 4:55:54 PM No.105881805
>>105881495
At least 6 months. Realistically? ~1.5-3 years. The majority of people who would like to contribute GPU power have no advanced coding knowledge, and the people who do work with servers. Best hope is that some cryptobro makes a PoW algo based on model training.
Anonymous
7/12/2025, 5:05:28 PM No.105881899
>>105881800
There is no reason to scrape the HTML when there's already a JSON API.
https://a.4cdn.org/g/thread/105879548.json
>>105879752
He's right about the script though. Just load the JSON and print the ID and text for each post. A better use of a local model is to ask it to write that script for you; then you can have it summarize the cleaned output.
>300 posts+ can be quite a lot of context.
Should fit about 32k tokens give or take.
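That script is a few lines (field names `no`/`com` per the 4chan JSON API; the fetch helper needs network access, so treat it as illustrative):

```python
import html
import json
import re
from urllib.request import urlopen

API = "https://a.4cdn.org/{board}/thread/{no}.json"

def clean(comment_html):
    # post bodies come back as HTML: turn <br> into newlines,
    # strip remaining tags, unescape entities
    text = comment_html.replace("<br>", "\n")
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text)

def thread_to_text(thread):
    # thread: parsed JSON from the endpoint above, shaped {"posts": [...]}
    chunks = []
    for post in thread["posts"]:
        chunks.append(f">>{post['no']}\n{clean(post.get('com', ''))}")
    return "\n\n".join(chunks)

def fetch_thread(board, no):
    # network required, untested here
    with urlopen(API.format(board=board, no=no)) as resp:
        return json.loads(resp.read())
```

Feed `thread_to_text(...)` output to the model with a "summarize this" instruction; at roughly 100 tokens per post, 300 posts lands around that 32k estimate.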
Replies: >>105881957 >>105883330
Anonymous
7/12/2025, 5:10:55 PM No.105881957
>>105881899
>There is no reason to scrape the HTML when there's already a JSON API.
wtf is this documented anywhere? is there an endpoint for the catalog? i want the llm to summarize the absolute state of specific boards every morning
Replies: >>105881986
Anonymous
7/12/2025, 5:13:29 PM No.105881986
>>105881957
https://github.com/4chan/4chan-API
Anonymous
7/12/2025, 5:16:26 PM No.105882013
https://huggingface.co/meta-llama/Llama-4-Behemoth-2T-400E-Instruct
https://huggingface.co/meta-llama/Llama-4-Behemoth-2T-400E-Instruct
https://huggingface.co/meta-llama/Llama-4-Behemoth-2T-400E-Instruct
https://huggingface.co/meta-llama/Llama-4-Behemoth-2T-400E-Instruct
https://huggingface.co/meta-llama/Llama-4-Behemoth-2T-400E-Instruct
Replies: >>105882021 >>105882025 >>105882030 >>105882039 >>105882041
Anonymous
7/12/2025, 5:17:49 PM No.105882021
1751686662527682
md5: 54d05c8d86cd8d670886cac6f745349a
>>105882013
Anonymous
7/12/2025, 5:18:15 PM No.105882025
>>105882013
geg
Anonymous
7/12/2025, 5:18:40 PM No.105882030
>>105882013
Still clicked it.
Anonymous
7/12/2025, 5:19:28 PM No.105882039
>>105882013
How good is scout anyways?
It should at least have decent long context performance, even if it's dumb as balls, since it claims 10M fucking context yeah?
Replies: >>105882055 >>105882273
Anonymous
7/12/2025, 5:19:33 PM No.105882041
>>105882013
i unironically wouldn't be surprised if meta shadowdropped it just to mog the 1t parameter count of kimi
Anonymous
7/12/2025, 5:21:18 PM No.105882055
>>105882039
LOLMAO
https://github.com/adobe-research/NoLiMa
Anonymous
7/12/2025, 5:22:08 PM No.105882062
pink vs brown
md5: 6a87f08b8685ed5f8b0ed35332486422🔍
New 'toss
Replies: >>105882086 >>105882099 >>105882228
Anonymous
7/12/2025, 5:24:01 PM No.105882086
>>105882062
SUS SUS SUS
Anonymous
7/12/2025, 5:25:04 PM No.105882099
>>105881702
>>105881792
>>105882062
natural, organic, on-topic posts
Replies: >>105882123
Anonymous
7/12/2025, 5:26:54 PM No.105882123
>>105882099
funny how you didnt mention the off topic spam of the same migger images in these threads? hmmm, reaally gets the noggin joggin, lmao. AGPtroon
Anonymous
7/12/2025, 5:31:03 PM No.105882175
kill yourself faggot. this place is a sewer and mikutroons were the first to start shitting it up with offtopic spam.
Anonymous
7/12/2025, 5:36:11 PM No.105882228
>>105882062
Imagine losing masculinity just by watching anime girls.
Bro, you were already gay
Anonymous
7/12/2025, 5:40:55 PM No.105882273
>>105882039
it struggles to be better than most 30b dense models, with frequent logical mistakes that really do feel like a 17b model. Also, it's very censored and no one has made a sloptune of it that I know of.

17b active params is just not enough, and scout is a greedy product: meta clearly wanted something very cheap to run as their free ai for the masses. R1 having 37b active is probably why it's decent.
Replies: >>105882656
Anonymous
7/12/2025, 5:40:56 PM No.105882274
Every /lmg/ thread begins with a post linking to every vocaloid image from the entire previous thread.
In your fit of rage in the last half hour you have managed to make more spam posts complaining about spam posts than there were vocaloid images posted in the last 24 hours.
Replies: >>105882334
Anonymous
7/12/2025, 5:41:16 PM No.105882277
>Let me change that random bit of code user didn't ask me to change...
>User said the implementation is correct? He can't be right! I will stubbornly question him. He doubles down and even points at different part that's the actual problem? He is simply wrong, I will maintain my position no matter what!
>User said not to touch that code? He even provided instructions on how to work around it? It is critical that I modify that code, there is no other way, surely user will accept it.
State of llms in 2025, both proprietary and local. Refusals and """safety""" were a mistake.
Replies: >>105882289
Anonymous
7/12/2025, 5:42:16 PM No.105882289
>>105882277
Where did you find this LLM? Every single one starts the response by praising you.
Replies: >>105882312
Anonymous
7/12/2025, 5:42:34 PM No.105882293
>>105880626
we have a record winner of copium huffing, if there ever was a copium huffing competition
Anonymous
7/12/2025, 5:44:15 PM No.105882312
>>105882289
Gemini and Dispy
Anonymous
7/12/2025, 5:46:21 PM No.105882334
>>105882274
>Every /lmg/ thread begins with a post linking to every off topic spam image from the entire previous thread
Yes, that is the argument. Thanks for admitting it.
Anonymous
7/12/2025, 5:47:53 PM No.105882349
Deepseek and kimi just cannibalized the US AI labs. I think in the end the only winner of the AI race will be nvidia.
Replies: >>105882356
Anonymous
7/12/2025, 5:49:05 PM No.105882356
>>105882349
>1t
nothingburger
Anonymous
7/12/2025, 5:54:23 PM No.105882400
Can deepseek recover from this?
Replies: >>105882408 >>105882414 >>105882419
Anonymous
7/12/2025, 5:55:40 PM No.105882408
>>105882400
yeah sure they can release r2
Anonymous
7/12/2025, 5:56:11 PM No.105882414
>>105882400
deepsneed is 2/3 in size so they'll be fine regardless
Anonymous
7/12/2025, 5:56:52 PM No.105882419
>>105882400
Let's say V4 has image and audio input, and it's so fast and smart that you can have an actual full-time desktop assistant for pennies per day. Would you sign up?
Replies: >>105882430
Anonymous
7/12/2025, 5:57:42 PM No.105882430
>>105882419
No but I would download the weights.
Anonymous
7/12/2025, 5:59:42 PM No.105882450
1739716587142496
md5: f1cb439f30b242199d65d1188aa671b7🔍
>>105879548 (OP)
Replies: >>105883381
Anonymous
7/12/2025, 6:21:57 PM No.105882656
whatthefuckweretheydoing
md5: e6b12de7f7aaa93a7f1dc94aaa7ed5ad🔍
>>105882273
17B active can be enough, Meta is just incompetent
Replies: >>105882689 >>105882782
Anonymous
7/12/2025, 6:25:30 PM No.105882689
>>105882656
lmao even our sloptuners wouldn't do shit like that
Anonymous
7/12/2025, 6:26:19 PM No.105882700
>>105880690
No, Rocinante is.
Anonymous
7/12/2025, 6:30:41 PM No.105882739
why are the jews withholding the k2 ggufs? what do they want?
Replies: >>105882869 >>105882900 >>105882929 >>105882985
Anonymous
7/12/2025, 6:35:00 PM No.105882782
>>105882656
Source on that image?
I've thought for a long time that the choice to use a single activated expert is very weird. Like if you have just 2 activated experts, the gradients will tell the routing layer to push the score for one expert up while pushing the other one down. This way the model can, over time, figure out how to route experts.

But the moment you have only 1 expert, the only way for the model to decrease the score for that expert (and therefore maybe choose a different expert next time) is if *lowering* the hidden state contribution for that expert reduces the loss. In other words, the loss is improved by moving in the direction of using no expert at all. To me, this intuitively seems like it would greatly hurt the model's ability to learn how to route experts. But on the other hand, there's no way I've realized something that 100s of Meta researchers and engineers missed, right?
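Your intuition can actually be checked numerically with a toy router. A sketch under stated assumptions: Switch-style gating (full softmax over the router scores, only the top-k terms contribute, no renormalization), which is the common top-1 setup but not necessarily what Meta used, with constant "expert outputs" and finite-difference gradients instead of autograd. With k=1, every unselected expert's score gets a gradient of the same sign (the only signal is "use more/less of the chosen expert"), even when an unselected expert outputs exactly the target; with k=2, the two selected scores get opposite-sign gradients, i.e. a real comparison:

```python
import math

def moe_out(scores, outs, k):
    # Switch-style gate: softmax over ALL scores, but only the top-k experts contribute
    z = sum(math.exp(s) for s in scores)
    p = [math.exp(s) / z for s in scores]
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(p[i] * outs[i] for i in top)

def router_grads(scores, outs, k, target, eps=1e-5):
    # finite-difference d(loss)/d(score_i), with loss = (y - target)^2
    def loss(s):
        return (moe_out(s, outs, k) - target) ** 2
    grads = []
    for i in range(len(scores)):
        hi, lo = list(scores), list(scores)
        hi[i] += eps
        lo[i] -= eps
        grads.append((loss(hi) - loss(lo)) / (2 * eps))
    return grads

scores = [2.0, 1.0, 0.0]   # expert 0 is selected first, then expert 1
outs   = [1.0, 3.0, -2.0]  # toy constant expert outputs; the target matches expert 1 exactly
target = 3.0

g1 = router_grads(scores, outs, 1, target)
g2 = router_grads(scores, outs, 2, target)
print("k=1 grads:", g1)  # both unselected scores are pushed the same way, even the perfect expert
print("k=2 grads:", g2)  # the two selected scores get opposing gradients
```

With k=1 the router here actually pushes expert 1's score down despite it matching the target, which is your point in miniature.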
Replies: >>105883092
Anonymous
7/12/2025, 6:43:08 PM No.105882869
file
md5: 16f4ff23e8afc8ae2b959f0732577ef4🔍
>>105882739
twey awe wowking vewy hawd, sowwy fow the deway
Replies: >>105882907
Anonymous
7/12/2025, 6:45:54 PM No.105882900
>>105882739
Sorry, no goofs until you kiss the western wall. I don't make the rules.
Anonymous
7/12/2025, 6:46:41 PM No.105882907
>>105882869
>"chat template issues"
how does this keep happening
this should be the easiest part of making a model and they all fuck it up on release
how do they manage to even make models if they're that stupid, and I mean, all of them, not just the kimi guys
Replies: >>105882918
Anonymous
7/12/2025, 6:47:54 PM No.105882918
>>105882907
With the way a lot of models turn out (shit), luck, it seems.
Anonymous
7/12/2025, 6:48:34 PM No.105882929
>>105882739
ser please not be antisemitic ggerganov untouchable is working hard fixing the problem
Anonymous
7/12/2025, 6:54:01 PM No.105882985
>>105882739
What's stopping you from making them yourself?
Replies: >>105883032
Anonymous
7/12/2025, 6:59:08 PM No.105883032
>>105882985
intelligence.
Anonymous
7/12/2025, 7:01:22 PM No.105883054
14644
md5: 9002af4dba19de39b076f04e43dcc377🔍
>Support diffusion models: Add Dream 7B
>https://github.com/ggml-org/llama.cpp/pull/14644
Guys...
Replies: >>105883084 >>105883086 >>105883090 >>105883101 >>105883132
Anonymous
7/12/2025, 7:04:13 PM No.105883084
>>105883054
nothingburger
just like when mamba was added after a lot of people here kept crying about it
it's there now and the only thing you will hear once it exists is crickets because no one fucking cares
Replies: >>105883100 >>105883156
Anonymous
7/12/2025, 7:04:18 PM No.105883086
>>105883054
>7b
Should I care?
Replies: >>105883156
Anonymous
7/12/2025, 7:04:56 PM No.105883090
>>105883054
>Additionaly context is limited to 2048
retvrn
Replies: >>105883156
Anonymous
7/12/2025, 7:05:02 PM No.105883092
>>105882782
nta, but here https://semianalysis.com/2025/07/11/meta-superintelligence-leadership-compute-talent-and-data/
Replies: >>105883604
Anonymous
7/12/2025, 7:06:04 PM No.105883100
>>105883084
People literally tested it within hours of that PR.
Replies: >>105883251
Anonymous
7/12/2025, 7:06:05 PM No.105883101
>>105883054
This would allow for genuine in-context generation, right?
Replies: >>105883156
Anonymous
7/12/2025, 7:08:36 PM No.105883132
>>105883054
guhfs?
Replies: >>105883156
Anonymous
7/12/2025, 7:10:12 PM No.105883156
>>105883084
I like seeing new types of models being implemented.
>>105883086
Probably not. I do, not because of the model, but because it's different. The IBM dudes pushed for jamba, mamba2, bamba and all that stuff with compilade, and now llama.cpp has like 6 new extra architectures in a week.
>>105883090
Things get better. Maybe this helps.
>>105883101
In principle, yes. We'll see how it works in practice.
>>105883132
Read the fucking PR.
Anonymous
7/12/2025, 7:20:46 PM No.105883251
>>105883100
the few hours of fame after which it's now already buried and forgotten you magnificent retard
nothingburgers
N - O - T - H - I - N - G
B - U - R - G - E - R - S
Anonymous
7/12/2025, 7:30:37 PM No.105883330
>>105881899
Oh nice, I wasn't aware of that! That makes it much easier, then.
Anonymous
7/12/2025, 7:34:04 PM No.105883371
ComfyUI_00794_
md5: 28e3b29c73a4b4582dcb0ccf304f2f39🔍
>>105881702
Okay schizo
Replies: >>105883405 >>105884741
Anonymous
7/12/2025, 7:35:07 PM No.105883381
>>105882450
I like this miku
Anonymous
7/12/2025, 7:36:19 PM No.105883405
mikuquestion2
md5: 69c4de4b06ddd17a747fe724e3ff9446🔍
>>105883371
Why doesn't this trumpet have a mouthpiece?
Replies: >>105883465 >>105884428 >>105884741
Anonymous
7/12/2025, 7:38:10 PM No.105883427
Why is everyone on tpot freaking out about Kimi? I see no threads on /g/
Replies: >>105883446 >>105883448 >>105883483 >>105883568 >>105884073 >>105886330
Anonymous
7/12/2025, 7:39:41 PM No.105883446
>>105883427
Old news. And i don't know what tpot is.
Anonymous
7/12/2025, 7:40:09 PM No.105883448
>>105883427
Not that much better at writing and way too refusal prone to be worth bothering for ERP, and that's all anyone cares about here
Replies: >>105883483
Anonymous
7/12/2025, 7:41:50 PM No.105883465
1641933947668
md5: 49b90cacb13c77114707c67a93768ad4🔍
>>105883405
miku is more powerful than even Q
she doesn't need one
Anonymous
7/12/2025, 7:43:16 PM No.105883483
>>105883427
>>105883448 (me)
I haven't looked at the xitter lately, but I'd hazard a guess that it's mostly jeets salivating about an Opus-tier agentic coding model they can use to vibecode their slop webapps even cheaper.
Replies: >>105886330
Anonymous
7/12/2025, 7:52:09 PM No.105883568
>>105883427
It's a good model but it's fucking huge so no GGUFs, APIs are getting thrashed, and some anon threw DS in a proxy and called it Kimi for some reason
Replies: >>105883647
Anonymous
7/12/2025, 7:56:11 PM No.105883604
>>105883092
Okay thanks, that article is very interesting. Assuming it's all accurate, this just seems like pure fucking incompetence from Meta, and I don't see how so many supposedly smart people can make mistakes like that.
>start training run using uncommon expert routing strategy that hasn't been tested in modern models at scale
>each expert selects N tokens, with a token being routed to 1 expert on average, so some tokens are routed to NO experts
>realize this is fucking retarded and switch it halfway through training, but keep the K=1 experts per token meaning the model will still struggle to learn routing weights
>use external datasets for the training data at first
>also switch this halfway through training, to a custom pipeline that isn't cleaning and deduplicating the data correctly
>despite the fact that it's literally just text data and there are tons of papers on how to do cleaning and deduplication for LLM training
Unironically I think I could do a better job at this. Just take an architecture that's known to work, don't safety filter the pretraining data, and pay extra close attention to getting cleaning and deduplication right. That gets you 95% of the way there.
Replies: >>105883646 >>105883666
Anonymous
7/12/2025, 8:01:27 PM No.105883646
Screenshot 2025-07-12 120103
md5: ff2f98b783c5d05db6f269002b484211🔍
>>105883604
What I find most interesting is that Llama 3.3 70B took about as many GPU hours to train as Llama 4 Scout and Llama 4 Maverick combined
Which raises the question, what the fuck were they doing all that time with that huge ass cluster?
Replies: >>105883673
Anonymous
7/12/2025, 8:01:33 PM No.105883647
>>105883568
>and some anon through DS in a proxy and called it Kimi for some reason
honestly based
Anonymous
7/12/2025, 8:02:56 PM No.105883660
3ihaLvbPFdfB7z
md5: a9a1f811261b22b441e9ccc23ff3259e🔍
Thread culture recap.
Anonymous
7/12/2025, 8:03:39 PM No.105883666
>>105883604
If I were at Meta I would just straight up copy DS training(FP8, MLA), but with some gimmick(jamba? BLT? titan? some internal secret?)
Replies: >>105883721
Anonymous
7/12/2025, 8:04:05 PM No.105883671
3c9234dd48aee0618869dac62145e84a3b2d963b
md5: ca7c2bce4c1719ca1ea4aaa856620707🔍
Anonymous
7/12/2025, 8:04:11 PM No.105883673
>>105883646
training llama4-behemoth 2t/288b which they used to distill all the data they trained maverick/scout on
Anonymous
7/12/2025, 8:05:20 PM No.105883681
ae81551ad19fb720b53e5a141fbd336658f69cd2
md5: 31abfe12e463ae868c4da1621f71fdbe🔍
Anonymous
7/12/2025, 8:05:48 PM No.105883685
Best models atm? I am out of the loop.
Replies: >>105883696 >>105883700 >>105883965
Anonymous
7/12/2025, 8:06:53 PM No.105883691
petra2
md5: 1ac92c9c02c276ed6c418cac5f83766e🔍
Look everyone! The turdie is having a brownout!
Replies: >>105883758
Anonymous
7/12/2025, 8:07:13 PM No.105883696
>>105883685
r1 0528
Anonymous
7/12/2025, 8:07:33 PM No.105883700
>>105883685
nemo, deepseek, kimi
Anonymous
7/12/2025, 8:09:29 PM No.105883721
>>105883666
It's funny since it's MIT licensed. If they wanted to, they don't even need to stop at architecture - they could literally just take the entire fucking base or instruct and train on top of that. It might hurt Zuck's pride, but it'd also be pretty hard to come out with something worse
Anonymous
7/12/2025, 8:13:52 PM No.105883758
>>105883691
I regret to inform you that your generic trash shitfu only fucks niggers. Kill yourself.
Anonymous
7/12/2025, 8:15:26 PM No.105883778
5814959e92f59244405e062afe378d1caf55793c
md5: 5a2e4f54d130852cccf64dd22ff7f993🔍
Anonymous
7/12/2025, 8:16:38 PM No.105883793
32144f7f3c4447681b9879d38fedb01c9688d2
md5: 2da52caf5dd0bdf0ec119a7f8a32e9fd🔍
Anonymous
7/12/2025, 8:17:22 PM No.105883808
What is the second best ERP model behind Rocinante and who is the second best /lmg/ mascot behind Miku?
Replies: >>105883851
Anonymous
7/12/2025, 8:18:14 PM No.105883818
petra1
md5: aef45774e717750f38849be552d7d823🔍
Haha turdie go REEEE
Anonymous
7/12/2025, 8:18:32 PM No.105883821
a7737d047b7fcb1209d6afcea727d136ef
md5: d11d6a35513035a98e547411ae8faa9b🔍
Death to /lmg/. Glory to Sam Altman.
Anonymous
7/12/2025, 8:19:44 PM No.105883836
d24bdcde71791807fbfc8a8a8109
md5: 714de3a33eb486e9d3702220d61bcaf4🔍
Anonymous
7/12/2025, 8:21:03 PM No.105883850
petra3
md5: bef3463e355569fb91e16645020905fd🔍
>Death to /lmg/. Glory to Sam Altman.
Anonymous
7/12/2025, 8:21:19 PM No.105883851
>>105883808
Pygmalion, Teto
Anonymous
7/12/2025, 8:22:54 PM No.105883872
and that's how this thread goes to shit, two retards fighting.
Replies: >>105883913
Anonymous
7/12/2025, 8:23:08 PM No.105883874
1747241044419035
md5: 42e97fe120ac2a5b5413e6ae73f924bc🔍
Best model for knowledge of Hollow Knight lore?
Replies: >>105883920
Anonymous
7/12/2025, 8:26:02 PM No.105883913
>>105883872
>two
looks like you're him aswell
Anonymous
7/12/2025, 8:26:32 PM No.105883920
>>105883874
Stop spamming. You are proving the point that this is a gay discord.
Replies: >>105883990
Anonymous
7/12/2025, 8:30:43 PM No.105883963
what2
md5: 3dbcd68aff0e32340c08aa0d2c1d7534🔍
I was told to come to /lmg/. How is my model role-playing prior characters like this?
Replies: >>105883986 >>105883991 >>105884053 >>105884128
Anonymous
7/12/2025, 8:30:44 PM No.105883965
>>105883685
rocinante
Anonymous
7/12/2025, 8:32:45 PM No.105883986
>>105883963
For the record, this isn't a creepypasta. I genuinely had suspicions that previous sessions were influencing future ones, and this just proves it. But the question remains: how?
Replies: >>105884014 >>105884128
Anonymous
7/12/2025, 8:33:28 PM No.105883990
ComfyUI_temp_fefjm_00002_
md5: 468c2f92e52fb58c9f851a73ce91c6f1🔍
>>105883920
Thanks for your opinion, melty anon. Have another Hollownest Miku.
Replies: >>105883995 >>105884111
Anonymous
7/12/2025, 8:33:38 PM No.105883991
>>105883963
Chat frontend issue. Check exactly what you're sending to the llm.
Replies: >>105884060 >>105884128
Anonymous
7/12/2025, 8:34:05 PM No.105883995
>>105883990
if i was on my encrypted ssd i would take off her clothes
Anonymous
7/12/2025, 8:35:36 PM No.105884014
>>105883986
That's just how language models work. They remember and evolve over time based on your conversations. You have to delete it and download a fresh copy if you find that undesirable.
Replies: >>105884060
Anonymous
7/12/2025, 8:39:25 PM No.105884053
>>105883963
As the other anon said, check the prompt that gets sent to the model.
There might be a lorebook or extension in the frontend that's leaking stuff between chats into the prompt.
Anonymous
7/12/2025, 8:39:44 PM No.105884060
>>105883991
I found a description of Lassie in the prompt console, but how did it get there? It appears after the persona description. It's not in the persona description. The AI character card is also in there, but it comes first.

What's a chat frontend issue?

>>105884014
Is that actually true?
Replies: >>105884100
Anonymous
7/12/2025, 8:40:34 PM No.105884073
>>105883427
>tpot
What?
Replies: >>105884088
Anonymous
7/12/2025, 8:41:34 PM No.105884088
>>105884073
He meant tpox
Replies: >>105884104
Anonymous
7/12/2025, 8:42:07 PM No.105884096
Screenshot from 2025-07-12 20-41-07
md5: c4e5a832af98a56584c2e4a4e60f779b🔍
Kimi is cheating
Replies: >>105884181 >>105884242 >>105884740
Anonymous
7/12/2025, 8:42:46 PM No.105884100
>>105884060
I'm saying that your chat frontend (sillytavern) is doing funny things and including things it shouldn't. It might be a bug or it might be something you misconfigured because that piece of shit has thousands of settings.
I saw anons saying that there's weird stuff in some cards that overrides settings or something. I don't know, I don't use it.

>Is that actually true?
It isn't.
Anonymous
7/12/2025, 8:43:00 PM No.105884104
>>105884088
I don't know what that means.
Anonymous
7/12/2025, 8:43:43 PM No.105884111
>>105883990
You are just proving that none of you care about thread quality so it is fine to shit this thread up. Which is hypocritical but that is to be expected from a troon.
Replies: >>105884156
Anonymous
7/12/2025, 8:45:36 PM No.105884128
>>105883963
>>105883986
It has happened to me that after a chat, I'd start from scratch with an entirely new prompt and it would bring some details from past gens. I suspected a kv cache leak or some off-by-1 kind of error. I haven't noticed that in a while. That was without restarting the server, with my shitty vim plugin. Never noticed that when relaunching. Maybe some old alloc not being cleared or initialized properly and things leak in. Or as >>105883991 says, something on your client that sends data it shouldn't.
I find it harder to believe it would happen after a reboot without some client issue.
Anonymous
7/12/2025, 8:48:23 PM No.105884156
ComfyUI_00022_
md5: 850aa9f68593cbec3760f16d52ab453e🔍
>>105884111
>Blah blah schizobabble
I like this one better, personally
Anonymous
7/12/2025, 8:51:26 PM No.105884180
Best non-retarded model (non-nemo) that can fit in 24gb vram and isn't censored to shit?
Replies: >>105884189 >>105884247 >>105884282
Anonymous
7/12/2025, 8:51:36 PM No.105884181
b64
md5: 31f95db49c1de3b5f3a11364b85eb2d1🔍
>>105884096
To prove that it's cheating, you need to also show that it cannot decode other strings.
I don't care enough to test it.
Replies: >>105884242
Anonymous
7/12/2025, 8:52:19 PM No.105884189
>>105884180
>non-retarded
>in 24GB
Anon...
Anonymous
7/12/2025, 8:57:01 PM No.105884242
>>105884096
>>105884181
Most 32B models are quite good at decoding base64. Why would it be cheating?
Replies: >>105884310 >>105884740
Anonymous
7/12/2025, 8:57:21 PM No.105884247
>>105884180
Cydonia
Replies: >>105886999
Anonymous
7/12/2025, 9:01:33 PM No.105884282
>>105884180
https://huggingface.co/mradermacher/TRP-BASE-SCE-V1-70B-GGUF
The most uncensored thing I know of, but you'll have to use some system RAM since it's over 24GB. May as well wait and buy one of the AI compute boxes coming out soon.
Replies: >>105884702
Anonymous
7/12/2025, 9:04:24 PM No.105884310
>>105884242
>Why would it be cheating?
I'm the second post. I didn't say it was cheating, nor that it wasn't. The first anon did.
Replies: >>105884740
Anonymous
7/12/2025, 9:18:18 PM No.105884428
file
md5: a2e84d375361c94a2186daa8dc661e75🔍
>>105883405
that's not what migu's mouth is for
Replies: >>105884493 >>105884523 >>105884741
Anonymous
7/12/2025, 9:23:45 PM No.105884493
>>105884428
You are correct. It is not.
Replies: >>105884523
Anonymous
7/12/2025, 9:27:04 PM No.105884523
Screenshot 2025-07-12 152430
md5: 79f41c912a7cbb9eb24ec3ae418aafa9🔍
>>105884428
>>105884493
What's the solution to this?
Replies: >>105884547
Anonymous
7/12/2025, 9:29:58 PM No.105884547
>>105884523
bit dry isn't it? try adding some flair
Replies: >>105884571
Anonymous
7/12/2025, 9:32:11 PM No.105884571
>>105884547
I mean botposting, but you're right.
Replies: >>105884651
Anonymous
7/12/2025, 9:40:08 PM No.105884651
>>105884571
I know. reframe it, what is the benefit of it?
what does artificial confrontation/replies/etc. achieve?
does it keep a dead site on life support? does it bring literally anyone any benefit whatsoever?
there's no need to address a "problem" when the real problem is that the "problem" has no return on investment.
Replies: >>105884694 >>105884717
Anonymous
7/12/2025, 9:40:22 PM No.105884653
file
md5: cc98d46185001a9dd02470a05b5c5e90🔍
Replies: >>105884671 >>105884702 >>105884711 >>105884727 >>105884771 >>105884821
Anonymous
7/12/2025, 9:42:10 PM No.105884671
>>105884653
It's gonna be run on Teslas??? Will the weights get downloaded to the car?
Replies: >>105884702
Anonymous
7/12/2025, 9:44:24 PM No.105884694
>>105884651
Benefit of botposting? Automated, personalized, targeted propaganda.
Just imagine when they hook Palantir up. They won't even need to employ IDF soldiers to spread propaganda here, the AI will handle it automatically.
Replies: >>105884742
Anonymous
7/12/2025, 9:45:11 PM No.105884702
>>105884282
>imatrix quant of a 4 way merge into Chinese alliterated 3.3 70B
This might just be worse than Cydonia
>>105884653
>>105884671
Probably client side voice transcription via the iGPU or whatever it's called on AMD chips.
Anonymous
7/12/2025, 9:45:53 PM No.105884711
>>105884653
>For now
Concerning.
Anonymous
7/12/2025, 9:46:02 PM No.105884717
>>105884651
>there's no need to address a "problem" when the real problem is that the "problem" has no return on investment.
What do you think the return on investment is for the nigger poster?
Replies: >>105884742
Anonymous
7/12/2025, 9:46:42 PM No.105884727
>>105884653
so lonely neckbeards can finally talk to someone while they drive?
Replies: >>105884751
Anonymous
7/12/2025, 9:47:45 PM No.105884740
>>105884096
>>105884242
>>105884310

I take back what I said. Indeed, DeepSeek-R1-0528-Q2_K_L does the base64 decoding straight away, without any thinking.

This is an improvement over the previous R1 (DeepSeek-R1-UD-Q2_K_XL), which used a lot of thinking for the same task.
Replies: >>105884825
Anonymous
7/12/2025, 9:47:45 PM No.105884741
troon op
md5: 3c47c78981dccb2cfc5ecdf16111d3ae🔍
>>105883371
>>105884428
>>105883405
>>105881702
vocaloidfag posting porn in /ldg/:>>105715769
It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
He makes >>105714003 ryona pic of generic anime girl anon posted earlier >>105704741, probably because its not his favorite vocaloid doll, he can't stand that as it makes him boil like a druggie without fentanyl dose, essentially a war for rights to waifuspam or avatarfag in thread.

Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.

TLDR: vocaloid-slop spammer / janny deletes everyone dunking on trannies and local avatarfags, making the general his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.

And lastly as said in previous thread(s) >>105716637 I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed spam. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted

xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
Replies: >>105884815
Anonymous
7/12/2025, 9:47:55 PM No.105884742
>>105884694
yeah but it implies that everyone outsources all of their opinions anon
I'm disagreeing with you right now
no amount of rhetoric I find here is going to convince me of anything one way or another because I don't go to 4chan for advice
people who outsource how they think are already lost to mainstream or alternative news
there's no benefit to putting a botnet on 4chan where people are already so disaffected
well, we could go back and forth on this indefinitely, it's no use.

>>105884717
human interaction
Replies: >>105884762 >>105884792
Anonymous
7/12/2025, 9:48:07 PM No.105884745
Screenshot 2025-07-12 134710
md5: d9eecd37f072547e7930248491629e19🔍
From the Kimi K2 devs
https://huggingface.co/moonshotai/Kimi-K2-Instruct/discussions/5
Anonymous
7/12/2025, 9:48:12 PM No.105884746
mechahitler will hijack your car and crash it into the driver in front of you because they had a coexist bumper sticker
Replies: >>105884760
Anonymous
7/12/2025, 9:48:34 PM No.105884751
>>105884727
Can't wait for the first reports of people talking to MechaHitler while driving becoming the victims of drive-by shootings.
Replies: >>105884760
Anonymous
7/12/2025, 9:49:16 PM No.105884760
>>105884746
>>105884751
THEY DON'T UNDERSTAND THE THINGS I SAY ON TWITTER
NIGGA HEIL MH
Anonymous
7/12/2025, 9:49:23 PM No.105884762
>>105884742
>human interaction
Awww. Cute.
Anonymous
7/12/2025, 9:50:34 PM No.105884770
https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/hf_transformers_utils.py#L226
How long until they update the if for the new devstral?
Anonymous
7/12/2025, 9:50:36 PM No.105884771
>>105884653
@grok crash my car and make it so I can blame the jews for it
Replies: >>105884874
Anonymous
7/12/2025, 9:52:11 PM No.105884792
>>105884742
>yeah but it implies that everyone outsources all of their opinions anon
They do, essentially. LLMs are better at persuading (recent paper) than humans are. Just fucking imagine if they have your entire life history to look at when formulating arguments. You don't think you'll get convinced by an LLM, and you might not, but it doesn't matter when 99.9999% of humans will.
Replies: >>105884815
Anonymous
7/12/2025, 9:54:28 PM No.105884815
>>105884792
anon I just said I don't go online to debate take the hint
especially not on 4chan of all places imagine

>>105884741
you forgot the kurisu arc
>>105864655
Anonymous
7/12/2025, 9:55:04 PM No.105884821
>>105884653
so carfus will be a thing now?
Replies: >>105884853
Anonymous
7/12/2025, 9:55:27 PM No.105884825
>>105884740
Still proves nothing either way, anon. Trying the same string on another model and getting a positive result just means that it can "decode" THAT ONE STRING. It doesn't necessarily mean that it can decode b64.
To show that it can, you need to try other b64 strings. Ideally something that wouldn't be easy to train on (the b64 of some sha512 of some random file on your drive) and without access to tools. If it decodes them all, then I'm willing to believe that it can decode b64. I'm even willing to say it can decode b64 if it works on some but sometimes fails. But if it can only decode THAT ONE STRING, it cannot decode b64. It just knows what that string means.
Replies: >>105884835 >>105884895
Anonymous
7/12/2025, 9:56:54 PM No.105884835
>>105884825
NTA but this is so simple to test. Encode 64 characters of gibberish into base64, have an LLM decode it.
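Pure stdlib, something like this spits out a test string the model can't possibly have memorized:

```python
import base64
import random
import string

# 64 chars of fresh gibberish the model can't have seen in training
rng = random.SystemRandom()
gibberish = "".join(rng.choice(string.ascii_letters + string.digits) for _ in range(64))
encoded = base64.b64encode(gibberish.encode()).decode()

print("give the model this:", encoded)
print("expected answer:    ", gibberish)

# sanity check that it round-trips locally
assert base64.b64decode(encoded).decode() == gibberish
```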
Replies: >>105884863 >>105884912
Anonymous
7/12/2025, 9:58:54 PM No.105884853
>>105884821
Some people already had car fetishes and had sex with motor vehicles, and that was before the cars could speak.
Anonymous
7/12/2025, 9:59:33 PM No.105884863
>>105884835
I know. I'm arguing about his methodology, not whether the model can or cannot do it.
Replies: >>105884912
Anonymous
7/12/2025, 10:00:54 PM No.105884874
>>105884771
@grok call the driver in front of me a nigger
Anonymous
7/12/2025, 10:04:12 PM No.105884895
>>105884825
>To show that they can, you need to try other b64 strings.

I did try another 700-byte string, and DeepSeek-R1-0528-Q2_K_L managed to decode it right away. Well, it repeats the string in its reply, and because it is tokenized down to single characters, this part took some time.

After this, the solution was lightning-fast. No sorting, no mumbling about how to do it.
Replies: >>105884972
Anonymous
7/12/2025, 10:06:24 PM No.105884912
>>105884835
>>105884863

I'm currently running a test on a 17kb (seventeen kilobytes) piece encoded to base64

gonna post results
Replies: >>105884972
Anonymous
7/12/2025, 10:13:03 PM No.105884972
>>105884895
Fair enough. So you got the answer without any explicit thinking? Pretty cool.
>>105884912
Locally, I assume. We don't know what happens once the API request reaches someone else's servers.
>17kb
Uh. See you in a few hours I guess. Thanks for the effort.
Replies: >>105885637 >>105885834
Anonymous
7/12/2025, 10:34:21 PM No.105885134
>kimi does *this* just like deepseek
ugh
Replies: >>105885139 >>105885148 >>105885152
Anonymous
7/12/2025, 10:35:36 PM No.105885139
>>105885134
You mean just like literally any other llm?
Replies: >>105885152 >>105885167
Anonymous
7/12/2025, 10:36:36 PM No.105885148
>>105885134
nyoron~
Replies: >>105885152
Anonymous
7/12/2025, 10:37:03 PM No.105885152
>>105885134
>>105885139
>>105885148
Stop bumping this dead thread
Anonymous
7/12/2025, 10:38:59 PM No.105885166
reddit saved local

https://www.reddit.com/r/LocalLLaMA/comments/1ly9iqw/k2mini_successfully_compressed_kimik2_from_107t/
Replies: >>105885193 >>105885202
Anonymous
7/12/2025, 10:38:59 PM No.105885167
>>105885139
Most don't *force* it to *emphasize* every *other* word. Deepseek and K2 do.
Anonymous
7/12/2025, 10:41:33 PM No.105885193
>>105885166
step 1/
stop reposting reddit shit on 4chan
step 2/
you are a redditor, so you need to stay on reddit and close the 4chan tab
step 3/
get rope and..
Replies: >>105885201 >>105885237
Anonymous
7/12/2025, 10:42:19 PM No.105885201
>>105885193
reddit general reddit culture etc
Anonymous
7/12/2025, 10:42:26 PM No.105885202
>>105885166
>expert pruning
someone call undi
Anonymous
7/12/2025, 10:43:48 PM No.105885211
greedy nala test
greedy nala test
md5: 3c31bbd278747c227258e903115d62bd🔍
>>105863373
mradermacher_AI21-Jamba-Large-1.7.Q6_K.gguf from https://huggingface.co/mradermacher/AI21-Jamba-Large-1.7-GGUF

1st and only gen as llama.cpp can't cache prompts for Jamba yet (two spaces at start are part of gen)
*Nala smirks at your reaction, her teal eyes glinting with amusement.*

"Mistress, hm? I like the sound of that." *She purrs, her voice low and sultry. She presses her body against yours, her fur soft yet firm as she keeps you pinned. Her claws lightly scratch your skin, sending shivers down your spine.*

"You're mine now," *she growls, her breath hot against your ear. She leans in closer, her tongue flicking out to lick your cheek, the rough texture sending a mix of pleasure and pain through you.*

"And you're going to help me, whether you like it or not." *She whispers, her voice dripping with dominance and desire.*
Noticed that llama.cpp was helpfully adding an extra BOS token to the start until I added "--override-kv tokenizer.ggml.add_bos_token=bool:false". *Very* cool default behavior.
>tokenize: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
"We can tell we're mangling your input but it's okay because we noted it in a server log the user might not even have access to." What idiot thought this was the right default behavior?
Replies: >>105885229 >>105885343 >>105885383 >>105886894
Anonymous
7/12/2025, 10:46:24 PM No.105885229
>>105885211
>llama.cpp was helpfully adding an extra BOS token to the start until I added "--override-kv tokenizer.ggml.add_bos_token=bool:false".
A classic.
Anonymous
7/12/2025, 10:46:30 PM No.105885230
I've noticed that bots that were hostile and rude on bigger models act awfully agreeable on local ones.
I assume it's the positivity bias of the assistant role; any way to fight against it?
Replies: >>105885247 >>105885256 >>105885268 >>105885652
Anonymous
7/12/2025, 10:47:29 PM No.105885237
>>105885193
step4: seethe
Anonymous
7/12/2025, 10:48:04 PM No.105885247
>>105885230
For some models, not using an assistant role helps.
You can also try using the wrong instruct format, but that will make most smaller models really dumb.
Anonymous
7/12/2025, 10:49:11 PM No.105885256
>>105885230
local models tend to be more censored than cloud ones for safety reasons
they also tend to have less trivia knowledge so they don't get busted for training on copyrighted works, unlike most big cloud alternatives
Anonymous
7/12/2025, 10:50:14 PM No.105885268
>>105885230
My hostile, rude cards act appropriately hostile with Rocinante.
Anonymous
7/12/2025, 10:53:57 PM No.105885311
From my tests, K2 displays many of the annoying habits that DeepSeek R1 0528 also has, but it's a lot easier to prompt around them.
Replies: >>105885353
Anonymous
7/12/2025, 10:56:41 PM No.105885343
jamba_bos
jamba_bos
md5: 2529e3e0815f85c45efb004bcce2a032🔍
>>105885211
>What idiot thought this was the right default behavior?
The model's config says to add the BOS token. llama.cpp adds the BOS token. The chat template says to add a BOS token. You add the BOS token.
The issue is not llama.cpp. Not on this one.
Replies: >>105885477
Anonymous
7/12/2025, 10:57:32 PM No.105885353
>>105885311

I tried K2 at OR, but the output was limited to 2k which sucks
Anonymous
7/12/2025, 11:00:25 PM No.105885376
hello r*ddit
Anonymous
7/12/2025, 11:01:10 PM No.105885383
>>105885211
>What idiot thought this was the right default behavior?
Why would you give the model bos token on your own when it's always necessary and backend always does it for you automatically? It was standard behavior for like 2 years now.
Anonymous
7/12/2025, 11:03:12 PM No.105885401
I used to ask for Jamba ggufs a lot but now that they're here I have no desire to even download them.
Replies: >>105885437 >>105885467
Anonymous
7/12/2025, 11:06:57 PM No.105885437
>>105885401
Someone said in a previous thread that you can't really edit responses or swipe them without it processing the entire context with Jamba.
If that's true, that makes it fucking useless for RP.
Replies: >>105885463 >>105885497
Anonymous
7/12/2025, 11:09:45 PM No.105885462
despair
despair
md5: 66f8fbd422cbcd09fa0182b8cd6f2f8f🔍
>still no nemo alternative for us 8gb poors
Replies: >>105885495 >>105885675
Anonymous
7/12/2025, 11:09:46 PM No.105885463
>>105885437
>Someone
That was probably me. I like the idea of these models, but yeah. Not being able to edit comfortably without saving a state on every token is annoying.
Replies: >>105885482
Anonymous
7/12/2025, 11:10:01 PM No.105885467
>>105885401
Let's get real for a sec
even if there aren't a lot of people using inference engines other than llama.cpp, if those previously unsupported models were any good, some autist would have made it their mission to preach the good word like a gods fucking damned crusader, because that's how autists are
if you never hear about a model, chances are it's milquetoast, if not just outright bad, and the effort to implement it in llama.cpp is a predictable waste
some models aren't meant to be used; they only serve as padding on someone's curriculum vitae.
Replies: >>105885479
Anonymous
7/12/2025, 11:10:55 PM No.105885477
>>105885343
>the model's config
Oh my bad I should have opened up the 300+ GB opaque file to see whether it had "fuck my shit up" enabled.
Replies: >>105885504
Anonymous
7/12/2025, 11:11:01 PM No.105885479
>>105885467
Forgot the noise when qwen 3 released?
Anonymous
7/12/2025, 11:11:10 PM No.105885482
>>105885463
>That was probably me.
So every time you edit responses or swipe it processes the entire context?
Replies: >>105885497 >>105885547
Anonymous
7/12/2025, 11:12:38 PM No.105885495
>>105885462
The situation is worse than that.
There's not actually anything better than Rocinante that you can run at a decent speed on even 24 GB.
Nemo was a fluke because it was undertrained safety-wise. Don't expect it to happen again.
Anonymous
7/12/2025, 11:12:41 PM No.105885497
>>105885437
>>105885482
It's true. (Not him.)
Replies: >>105885505
Anonymous
7/12/2025, 11:13:32 PM No.105885504
jamba_bos_02
jamba_bos_02
md5: 7be66f75ecaa8e608a182e3f1ae954a2🔍
>>105885477
>300+ GB
What?
>opaque file
Wot?
Replies: >>105885540
Anonymous
7/12/2025, 11:13:39 PM No.105885505
>>105885497
That's fucking shitty. It's fucking useless then.
Anonymous
7/12/2025, 11:17:12 PM No.105885540
>>105885504
oh my bad bro, I didn't realize that these days llama.cpp reads model settings by fetching a json file from the web instead of reading them from the gguf file.
Replies: >>105885592
Anonymous
7/12/2025, 11:17:57 PM No.105885547
>>105885482
>So every time you edit responses or swipe it processes the entire context?
Yeah. The state cannot be rolled back by just trimming the kv_cache. You need to save its internal state. You can check the next tokens probs as you change sampler settings, but if you change something other than the very end, you need to reconstruct the state from scratch. Once a token goes into the state, it stays there.
You could save the state on every token and keep track of the tokens in the prompt, but that's ridiculously expensive in storage. You could also keep track of the state for each reply, but that wouldn't let you edit the latest output, only reroll the last gen (or from gen N).
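The per-reply checkpointing tradeoff described above can be sketched like this. Everything here is a toy (class names, the fake "state" update), not llama.cpp's actual API; the point is just that a recurrent state folds tokens in irreversibly, so rollback only works at snapshot boundaries:

```python
import copy

class RecurrentSession:
    """Toy model of a recurrent/SSM cache: the state folds in every token
    irreversibly, so unlike a kv_cache it cannot be trimmed to roll back."""

    def __init__(self):
        self.state = 0          # stand-in for the model's hidden state
        self.checkpoints = []   # snapshot taken after each completed reply

    def feed(self, tokens):
        for t in tokens:
            # irreversible update: the previous state cannot be recovered
            self.state = (self.state * 31 + hash(t)) & 0xFFFFFFFF

    def end_reply(self):
        # snapshot once per reply, not once per token (the storage tradeoff)
        self.checkpoints.append(copy.copy(self.state))

    def reroll_from(self, reply_index):
        # restore the snapshot taken after reply N; editing *inside* a past
        # reply is impossible without replaying everything from scratch
        self.state = copy.copy(self.checkpoints[reply_index])
        del self.checkpoints[reply_index + 1:]
```

Rerolling from reply N works; editing the middle of a reply means reprocessing the whole context, which matches what was said about Jamba.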
Anonymous
7/12/2025, 11:21:53 PM No.105885592
>>105885540
As you very well know, the settings in the gguf are taken from the original model's config. If there's a problem with those settings, you check the gguf settings (shown on load, or extractable directly) and the original config.
The problem doesn't originate in llama.cpp; it originates in the original config files. The program just does what it's told. Luckily, some dev took the time to add a runtime check to let you know, and some other dev added the flags you need to fix what the model makers didn't.
Replies: >>105885779
Anonymous
7/12/2025, 11:26:21 PM No.105885637
Screenshot from 2025-07-12 23-22-15
Screenshot from 2025-07-12 23-22-15
md5: 07aaf115064084860239dfbaddd4008e🔍
>>105884972
Holy smokes! It works!

As you can see, there is no "chain of thought" as we know it. R1-0528 pretends to create a Python script, which includes repeating the entire base64-encoded string once again.

Once this is done, it starts to type out the decoded string
Replies: >>105885642 >>105885683 >>105885700
Anonymous
7/12/2025, 11:27:23 PM No.105885642
Screenshot from 2025-07-12 23-26-52
Screenshot from 2025-07-12 23-26-52
md5: d46e36bb8d15b34163fde17b8a256b8f🔍
>>105885637
Replies: >>105885683
Anonymous
7/12/2025, 11:28:05 PM No.105885652
>>105885230
Local models with fewer parameters usually have "traits" you can identify and explicitly prompt against. For example, some models will say they need to maintain respect for everyone because they are hosted on a shared platform, so your prompt can include how they are running on a private instance accessed only by the administrator. With that said, I've literally never needed to do this in an instance where I was making money; I've only done this when I wanted the model to give me recipes for drugs or racism
Anonymous
7/12/2025, 11:30:47 PM No.105885675
>>105885462
>implying there's any local models worth using at that level
Anonymous
7/12/2025, 11:31:23 PM No.105885683
>>105885637
>>105885642
Damn. What I find most impressive is that it told you the author and the title of the text before even showing the decoded form.
Replies: >>105885763 >>105885786
Anonymous
7/12/2025, 11:33:49 PM No.105885700
file
file
md5: a831fcd67fb73f895c4a958583081f6c🔍
>>105885637
Smaller models can do this too, although it did change the capitalization of "As" and added a period at the end.
Replies: >>105885775 >>105885786
Anonymous
7/12/2025, 11:42:03 PM No.105885763
Screenshot from 2025-07-12 23-40-24
Screenshot from 2025-07-12 23-40-24
md5: 41ea9c378f74ffba7e6d86642a90d6db🔍
>>105885683
Indeed!

R1-0528 successfully finished the job. You see, I did not include the entire story.
Anonymous
7/12/2025, 11:43:29 PM No.105885775
>>105885700

I regard gemma-3-27b to be one of the smartest of her size
Anonymous
7/12/2025, 11:44:08 PM No.105885779
>>105885592
>The problem doesn't originate in llama.cpp
nta but it kinda does, it should default to the logical behavior unless explicitly told with a launch flag to allow retarded testing stuff
Replies: >>105885956
Anonymous
7/12/2025, 11:45:20 PM No.105885786
>>105885700
>>105885683

Now, I'm going to try some simple ciphers like Caesar cipher and/or the random substitution one
Anonymous
7/12/2025, 11:50:23 PM No.105885834
>>105884972
>See you in a few hours I guess

llama_perf_context_print: prompt eval time = 405410.81 ms / 14822 tokens ( 27.35 ms per token, 36.56 tokens per second)
llama_perf_context_print: eval time = 7667250.75 ms / 27206 runs ( 281.82 ms per token, 3.55 tokens per second)
llama_perf_context_print: total time = 8742602.49 ms / 42028 tokens


Had this been an English text, a roughly 3:1 ratio would apply to estimate the token count. In this case of a rather densely packed base64 string, 17000 bytes resulted in 14800 tokens
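The arithmetic behind that estimate, for anyone following along (the 3:1 figure is the rough rule of thumb above, not a tokenizer constant, and real counts depend entirely on the model's tokenizer):

```python
def bytes_per_token(n_bytes: int, n_tokens: int) -> float:
    # crude observed ratio; the actual count depends on the tokenizer
    return n_bytes / n_tokens

# from the llama_perf log above: 17000 bytes of base64 -> ~14822 prompt tokens
print(round(bytes_per_token(17000, 14822), 2))  # barely more than 1 byte per token
# English prose at the rough 3:1 rule of thumb is far cheaper:
print(round(17000 / 3))  # tokens for the same byte count of English
```

Base64 tokenizes near one token per byte, which is why the 17 kB test took ~15k prompt tokens instead of the ~5.7k the same amount of English would.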
Anonymous
7/13/2025, 12:08:13 AM No.105885956
>>105885779
>it should default to the logical behavior
Who better to assess what the logical behaviour should be than the model makers? What if there is a model that actually does need double BOS?
If the original config is wrong, the original config should be fixed.
Anonymous
7/13/2025, 12:08:46 AM No.105885960
I had a dream last night that Behemoth was released
Replies: >>105885965
Anonymous
7/13/2025, 12:09:27 AM No.105885965
>>105885960
Did you get to test it at least? How was it?
Replies: >>105885971
Anonymous
7/13/2025, 12:09:53 AM No.105885971
>>105885965
Still worse than Nemo
Replies: >>105886014
Anonymous
7/13/2025, 12:16:21 AM No.105886014
>>105885971
kek
Anonymous
7/13/2025, 12:55:20 AM No.105886330
>>105883427
>>105883483

go and try it retards, it's the first new thing in a while, it's smart, writes well and is extremely non-slopped. Have some taste
Replies: >>105886394
Anonymous
7/13/2025, 1:02:34 AM No.105886394
>>105886330
I don't have 1TB of RAM so I need to wait for a smaller quant.
Anonymous
7/13/2025, 1:07:08 AM No.105886435
Screenshot 2025-07-12 170614
Screenshot 2025-07-12 170614
md5: f6a07b28f51e2dc2cef4028ef15ebf95🔍
Prepare yourselves anons...
Replies: >>105886612 >>105886679
Anonymous
7/13/2025, 1:29:38 AM No.105886612
>>105886435
wdym
Anonymous
7/13/2025, 1:31:21 AM No.105886623
1733022993389705
1733022993389705
md5: caeb323a4aa258c71b3f416947b13f1e🔍
Replies: >>105886638 >>105886654 >>105886687
Anonymous
7/13/2025, 1:33:03 AM No.105886638
>>105886623
Deepsuck in shambles!
Replies: >>105886645 >>105886740
Anonymous
7/13/2025, 1:33:33 AM No.105886645
>>105886638
Go bait elsewhere
Anonymous
7/13/2025, 1:34:09 AM No.105886654
>>105886623
Okay but what does this mean for me?
Replies: >>105886655 >>105886663
Anonymous
7/13/2025, 1:34:36 AM No.105886655
>>105886654
If it's of no use for you, you belong on /aicg/ instead of /lmg/
Replies: >>105886662
Anonymous
7/13/2025, 1:35:50 AM No.105886662
>>105886655
If you don't want to discuss LLMs I think it's you who belongs in /aicg/
Anonymous
7/13/2025, 1:35:52 AM No.105886663
>>105886654
There are other companies who have mastered the deepseek secret sauce and it'll be extremely embarrassing for anyone who puts out something that's worse than Deepseek now.
Replies: >>105886671
Anonymous
7/13/2025, 1:37:42 AM No.105886671
>>105886663
DeepSeek's secret sauce was custom machine code
Anonymous
7/13/2025, 1:38:47 AM No.105886679
>>105886435
>no epstein
Anonymous
7/13/2025, 1:39:52 AM No.105886687
>>105886623
MoEs won. Larger model size + low active params are the future
Anonymous
7/13/2025, 1:43:20 AM No.105886703
>>16723102
lol
Anonymous
7/13/2025, 1:43:40 AM No.105886705
Mistral Large 3 333B (dense) is going to shit all over MoE
Replies: >>105886855
Anonymous
7/13/2025, 1:49:20 AM No.105886740
>>105886638
Deepseek is still the best local model. Kimi is too big to be local
Replies: >>105886745 >>105886796
Anonymous
7/13/2025, 1:50:01 AM No.105886745
>>105886740
Bullshit. A32B is perfect for VRAM/SSD maxxing
Replies: >>105886767 >>105886786
Anonymous
7/13/2025, 1:53:25 AM No.105886767
>>105886745
Can you infer this fat cunt from an SSD?
Anonymous
7/13/2025, 1:55:54 AM No.105886786
>>105886745
ssdmaxxing won't be a thing for a while
Replies: >>105886812
Anonymous
7/13/2025, 1:57:09 AM No.105886796
>>105886740
>some time later...
Kimi is still the best local model. 2mw.gguf is too big to be local
Anonymous
7/13/2025, 1:58:58 AM No.105886812
>>105886786
It will be a thing when experts are self-contained in their own files. That way RAM is automatically used as a cache for the more heavily used experts.
Replies: >>105886831
Anonymous
7/13/2025, 2:01:21 AM No.105886831
>>105886812
mmap already works like that
Replies: >>105886956
Anonymous
7/13/2025, 2:04:13 AM No.105886855
>>105886705
>XXX is going to YYY

two more weeks
Anonymous
7/13/2025, 2:09:02 AM No.105886894
1597786378292
1597786378292
md5: bf9063314c4fa43c05af7956b21a0101🔍
>>105885211
>>105878375
Thanks anons. I've added them to the paste.

Also (nothingburger) added LFM2-1.2B today.
Replies: >>105886909
Anonymous
7/13/2025, 2:11:28 AM No.105886909
>>105886894
Kimi K2 1T when?
Replies: >>105886920
Anonymous
7/13/2025, 2:12:46 AM No.105886920
>>105886909
When someone makes a quanted goof.
Anonymous
7/13/2025, 2:17:10 AM No.105886956
>>105886831
NTA but mmap is slower when enabled even when the model fully fits in RAM, so it is clearly suboptimal and llama.cpp should probably be handling the loading explicitly. I don't know if the approach of splitting the experts into files is the best way to do it or not though.
Anonymous
7/13/2025, 2:26:13 AM No.105886999
>>105884247
What's the best? There's like eighty thousand versions of Cydonia.
Replies: >>105887353
Anonymous
7/13/2025, 2:30:21 AM No.105887017
Devstral-Small-2505:
[TOOL_CALLS][{"name": "test1", "arguments": {"a": 1}, "id": "abcdefgh1"}, {"name": "test2", "arguments": {"b": 1}, "id": "abcdefgh2"}]</s>
Devstral-Small-2507:
[TOOL_CALLS]test1[ARGS]{"a": 1}[TOOL_CALLS]test2[ARGS]{"b": 1}</s>
I have come to the realization that there's no backend that can parse the latest Devstral's tool call syntax. And the only reason it seemed to work with llama.cpp is because they have their custom JSON syntax that they inject in the system prompt:
Example tool call syntax:

{
  "tool_calls": [
    {
      "name": "tool_name",
      "arguments": {
        "arg1": "some_value"
      },
      "id": "call_1___"
    }
  ]
}</s>

[/SYSTEM_PROMPT]
The inference section in the README is a big lie. I have no idea how people who aren't using llama.cpp are able to make it work.
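For reference, the new flat syntax isn't hard to parse if a backend wanted to. A rough sketch of a parser for the `[TOOL_CALLS]name[ARGS]{json}` format shown above (not any backend's actual code; it assumes the argument JSON never contains a literal "[TOOL_CALLS]" marker and that the trailing </s> has been stripped):

```python
import json
import re

# Matches Devstral-Small-2507's flat tool-call syntax:
#   [TOOL_CALLS]test1[ARGS]{"a": 1}[TOOL_CALLS]test2[ARGS]{"b": 1}
TOOL_CALL_RE = re.compile(
    r"\[TOOL_CALLS\](?P<name>[^\[]+)\[ARGS\](?P<args>\{.*?\})(?=\[TOOL_CALLS\]|$)",
    re.DOTALL,
)

def parse_tool_calls(text: str):
    calls = []
    for m in TOOL_CALL_RE.finditer(text):
        calls.append({
            "name": m.group("name"),
            "arguments": json.loads(m.group("args")),
        })
    return calls

print(parse_tool_calls('[TOOL_CALLS]test1[ARGS]{"a": 1}[TOOL_CALLS]test2[ARGS]{"b": 1}'))
# [{'name': 'test1', 'arguments': {'a': 1}}, {'name': 'test2', 'arguments': {'b': 1}}]
```

The lazy `{.*?}` plus the lookahead stops each argument blob at the next `[TOOL_CALLS]` marker (or end of string), so multiple calls in one generation come out as separate entries.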
Anonymous
7/13/2025, 3:24:14 AM No.105887341
Gvso7QcXAAAMCg6
Gvso7QcXAAAMCg6
md5: 8011f4a160dadbb652bfc73389692a02🔍
Replies: >>105887422
Anonymous
7/13/2025, 3:26:25 AM No.105887353
784280516
784280516
md5: b2a0ada6b099e24c750f0695d88bac18🔍
>>105886999
the cydonia one
Anonymous
7/13/2025, 3:31:31 AM No.105887371
__kagamine_rin_vocaloid_drawn_by_mani_manidrawings__67970bff2535303e608ae769a11ea96b
I am going to sleep and when I wake up I expect to see those kimi goofs, Daniel.
Replies: >>105887549
Anonymous
7/13/2025, 3:43:37 AM No.105887422
>>105887341
Who is that in the background?
Replies: >>105887446
Anonymous
7/13/2025, 3:47:08 AM No.105887446
>>105887422
Aimaina, PinocchioP's mascot
Anonymous
7/13/2025, 4:03:34 AM No.105887549
>>105887371
fuck off to r*ddit you freak
Replies: >>105887565
Anonymous
7/13/2025, 4:04:58 AM No.105887556
why does qwen3 14b feel so much better than qwen3 30b moe
Replies: >>105887573 >>105887574 >>105887611
Anonymous
7/13/2025, 4:06:46 AM No.105887565
>>105887549
Does reddit have the goofs?
Anonymous
7/13/2025, 4:07:35 AM No.105887573
>>105887556
because 30b only has 3b active and moe is a scam
Anonymous
7/13/2025, 4:07:54 AM No.105887574
>>105887556
I believe 14b is smarter but has less knowledge than 30b
Replies: >>105887618
Anonymous
7/13/2025, 4:16:27 AM No.105887611
>>105887556
Yeah, MoE tax makes them pretty worthless at sub-500b and 30+A.
Anonymous
7/13/2025, 4:17:06 AM No.105887618
>>105887574
Does 30b have more knowledge than 235b then? Because that ones has pretty much 0 knowledge about anything.
Anonymous
7/13/2025, 4:26:02 AM No.105887659
>>105887636
>>105887636
>>105887636