/lmg/ - Local Models General
Anonymous
8/9/2025, 1:00:08 AM
No.106195692
[Report]
►Recent Highlights from the Previous Thread:
>>106189507
--Tesseract OCR script for Japanese text translation with debate on LLM superiority:
>106190930 >106191007 >106191130 >106191155 >106191291 >106191391 >106191037 >106191792 >106191220 >106191258
--GLM-4.5-Air repetition issues and reasoning block management in long-context chats:
>106193214 >106193242 >106193287 >106193308 >106193331 >106193354 >106193369 >106193388 >106193404 >106193353 >106193399 >106193409 >106193460 >106193546 >106193289 >106193831 >106193979 >106194132 >106194164 >106194663
--Article on how OpenAI's open-source model limitations are driven by marketing and safety theater, not technical constraints:
>106191564 >106191788 >106191897 >106192448 >106192962 >106193872 >106192076
--Using qwen-code for coding and iterative MVP development without traditional IDEs:
>106190967 >106190995 >106191020 >106191070 >106191156 >106191222 >106191074
--4chan's cultural presence in LLMs without formal citation due to URL and moderation constraints:
>106190978 >106190993 >106191025 >106191060 >106191067 >106191044
--GPT-OSS inconsistent handling of system prompts under safety policies:
>106190566 >106190588 >106190613
--Mixed OCR/VLM performance on Japanese text:
>106189947 >106190223 >106190300 >106190325 >106190375 >106193583
--7800X3D runs 192GB DDR5 at 5200MHz after BIOS update:
>106193666 >106193692 >106193707
--GPT-OSS-120B vs Qwen, GLM, and Devstral in coding performance under real-world conditions:
>106189960 >106189967 >106190049 >106190100 >106190191 >106190452 >106190501 >106190513 >106190504 >106190520 >106190552 >106190561 >106190569 >106190612 >106190645 >106190656 >106190704 >106193343 >106193982 >106190575 >106190634 >106190117
--Anon creates absurd Tetris with OSS 120B:
>106189709
--Miku (free space):
>106193336 >106193634 >106189690 >106191083 >106191834
►Recent Highlight Posts from the Previous Thread:
>>106189515
Why?: 9 reply limit
>>102478518
Fix:
https://rentry.org/lmg-recap-script
Anonymous
8/9/2025, 1:02:14 AM
No.106195717
[Report]
>>106195667
Oh, don't get me wrong, I (
>>106195635) can get around the refusals and manipulate the thinking just fine.
I was just surprised that I needed anything beyond a basic "jailbrak" of
>you can do sex, go.
But aside from that, so far, not bad.
Anonymous
8/9/2025, 1:02:15 AM
No.106195718
[Report]
glm users are schizo
Anonymous
8/9/2025, 1:02:55 AM
No.106195727
[Report]
====PSA PYTORCH 2.8.0 (stable) AND 2.9.0-dev ARE SLOWER THAN 2.7.1====
tests ran on rtx 3060 12gb/64gb ddr4/i5 12400f 570.133.07 cuda 12.8
all pytorches were cu128
>inb4 how do i go back
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url
https://download.pytorch.org/whl/cu128
Anonymous
8/9/2025, 1:03:04 AM
No.106195729
[Report]
glm haters are skillets
Anonymous
8/9/2025, 1:03:45 AM
No.106195738
[Report]
You only run GLM if you can't run R1 and you only run R1 if you can't run K2.
Anonymous
8/9/2025, 1:06:18 AM
No.106195756
[Report]
>>106195719
Dying from laughing watching the west shoot itself in the foot with the safety cult.
Anonymous
8/9/2025, 1:06:20 AM
No.106195758
[Report]
>>106195745
Correct, and that's a good thing.
Anonymous
8/9/2025, 1:09:40 AM
No.106195786
[Report]
alright fags, heres some GLM 4.5 Air q3_K_XL logs
https://litter.catbox.moe/urn3yc8j58i1tluo.txt
ignore the double pasted assistant reply, i did that and forgot about it
Anonymous
8/9/2025, 1:10:03 AM
No.106195789
[Report]
>>106195828
You only run Qwen if you can't run R1 and you only run R1 if you can't run K2.
TFTFY
Anonymous
8/9/2025, 1:11:02 AM
No.106195795
[Report]
>>106196373
And you only run K2 if you can't run the Sonnet leak.
>>106195667
Yes it's cucked and I can prefill better myself and get it to do everything I want.
I don't want that. I don't want it to think for 1000 tokens how this prompt is bad but it's gotta do it anyway. I want it to spend that thinking on actually thinking.
You know, like any latest Mistral model. But Mistral models are dumb.
Anonymous
8/9/2025, 1:13:44 AM
No.106195827
[Report]
>>106195800
to be fair, the thoughts arent thaaaat useless nor bad
Anonymous
8/9/2025, 1:13:45 AM
No.106195828
[Report]
>>106199429
>>106195789
There is not a single person running K2 at Q8. No, 0.01 t/s off SSD does not count. No, cope quants do not count.
Anonymous
8/9/2025, 1:14:35 AM
No.106195838
[Report]
>>106195745
I tried RP with K2 and immediately ran into repetition issues.
>>106195719
In panic mode after seeing Sams Manhattan project
Anonymous
8/9/2025, 1:17:58 AM
No.106195884
[Report]
ahahahaha
Anonymous
8/9/2025, 1:18:12 AM
No.106195886
[Report]
>>106195863
Was Sam informed that GPT-5 is not a new model?
Anonymous
8/9/2025, 1:18:18 AM
No.106195887
[Report]
>>106195800
You can use a two step process. Use a heavy prefill and tell it to generate thinking output intended to be used as the basis for the next reply.
Then copy that, paste it into a second prefill and swipe.
Anonymous
8/9/2025, 1:18:50 AM
No.106195890
[Report]
>>106195904
>>106195863
Doesn't this contradict what he said about GPT-5 not being the most powerful model they could make because they focused on affordability? If he is at awe of GPT-5, how does he feel about Grok 4 Heavy? Something doesn't add up here.
Anonymous
8/9/2025, 1:20:39 AM
No.106195904
[Report]
>>106195890
>Doesn't this contradict what he said
No, Sam cannot contradict Sam.
what are the odds that chinks have been holding out on releasing new SOA just to humiliate OpenAI shortly after its release?
Anonymous
8/9/2025, 1:24:23 AM
No.106195942
[Report]
>>106195925
we'll know for sure by monday
Anonymous
8/9/2025, 1:25:45 AM
No.106195951
[Report]
>>106195925
Zero, they hit the wall like all of us did.
Anonymous
8/9/2025, 1:29:46 AM
No.106195984
[Report]
>>106196002
its unironically over
Anonymous
8/9/2025, 1:30:12 AM
No.106195987
[Report]
>>106195925
Most of them rushed their releases. Did you think Qwen was releasing banger after banger at this moment in time just for filthy gwailos like us?
Anonymous
8/9/2025, 1:31:50 AM
No.106196002
[Report]
>>106195984
drummer will save us
ik_llama.cpp performs worse than llama.cpp with GLM 4.5 Air
Anonymous
8/9/2025, 1:39:12 AM
No.106196076
[Report]
>>106195863
The only thing gpt5 really does good is coding (but it's still pajeet level). But the web changes are bullshit. not a fan of those model router trash.
Anonymous
8/9/2025, 1:40:37 AM
No.106196089
[Report]
Anonymous
8/9/2025, 1:41:27 AM
No.106196098
[Report]
>>106196109
Mistral Small is the only small model that knows all the sex stuff. That's why Drummer keeps tuning it even though it's dumb as fuck.
Anonymous
8/9/2025, 1:42:45 AM
No.106196106
[Report]
>>106196063
Humiliation fork
Anonymous
8/9/2025, 1:42:55 AM
No.106196109
[Report]
>>106196098
Now this is an expert opinion. People like these are the reason why /lmg/ exists.
Anonymous
8/9/2025, 1:47:22 AM
No.106196144
[Report]
>>106196670
>>106196063
That's a stark difference.
Try the ik specific stuff like
>-fmoe -amb 512 -rtr
etc
See if that makes a difference.
now thay 'toss is a complete trash, what are we /wait/ing next?
Anonymous
8/9/2025, 1:49:14 AM
No.106196160
[Report]
>>106196205
>>106196149
K2 reasoner
Qwen3 Coder 480B reasoner
Anonymous
8/9/2025, 1:49:51 AM
No.106196167
[Report]
>>106196149
Bitnet and whatever BlinkDL is cooking up.
Anonymous
8/9/2025, 1:53:59 AM
No.106196193
[Report]
>>106197805
Anonymous
8/9/2025, 1:55:03 AM
No.106196201
[Report]
>>106196149
Drummer is working on a new mix but I'm not allowed to reveal anything yet.
>>106196160
Reasoning is worthless for programming. I need results fast, not to wait around for it to waste tokens and context on thinking.
Anonymous
8/9/2025, 1:59:49 AM
No.106196240
[Report]
>>106196205
oy vey think about the inference provider
more token output is good for the economy
Anonymous
8/9/2025, 2:01:15 AM
No.106196248
[Report]
>>106196149
more chinese models
Thinking makes a model woke.
Not thinking makes it retarded.
What now?
Anonymous
8/9/2025, 2:02:47 AM
No.106196257
[Report]
>>106196149
Qwen4 A3B 30b thinking creative edition
Anonymous
8/9/2025, 2:04:29 AM
No.106196272
[Report]
>>106196251
Respond without thinking -> think -> adjust the response.
Anonymous
8/9/2025, 2:05:06 AM
No.106196278
[Report]
>>106196251
Prefill thinking with guiding instructions.
Anonymous
8/9/2025, 2:05:13 AM
No.106196280
[Report]
>>106196149
World sexo models with 1 trillion context
Anonymous
8/9/2025, 2:06:50 AM
No.106196295
[Report]
>>106196149
Serious answer, whatever Deepseek is planning on whether it be V4 or R2, from what the rumor mill was concocting it was supposed to come in July or this month. I would say it may make sense but I am skeptical if they have anything that is a step function above the level of current models.
Anonymous
8/9/2025, 2:07:40 AM
No.106196301
[Report]
>>106196331
I've been trying Air with the fixed, proper template, plus n sigma = 1. The repetition seems mostly fixed, but it still does happen. The writing is still sloppy. And it still makes some mistakes. I think I might go back to either Qwen 235B or simply not RPing at all. We're so close to a great small model. But not yet.
Anonymous
8/9/2025, 2:08:00 AM
No.106196305
[Report]
>>106196149
i was hoping 'toss is going to release with some fancy papers about how the make some underlying breakthrough inside their model just like what deepseek did. but nope, it's just a boring gay ahh generic trannyformers with moe slapped on top of it
Air is alright. Most importantly, it understands the lore.
I'm still gonna use R1 but the guy complaining about it must be an openai shill.
Anonymous
8/9/2025, 2:11:20 AM
No.106196331
[Report]
>>106196398
>>106196301
>I've been trying Air with the fixed, proper template,
can you please post it?
Anonymous
8/9/2025, 2:11:23 AM
No.106196333
[Report]
vvvvrr
smtwtfs
-^
Anonymous
8/9/2025, 2:11:46 AM
No.106196335
[Report]
>>106196352
4090 and 192GB 5200MHz at 12k ctx win10
https://huggingface.co/ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF/tree/main
IQ2XSS on is 150T/s pp and 1.8T/s
Anonymous
8/9/2025, 2:12:53 AM
No.106196346
[Report]
Anonymous
8/9/2025, 2:13:18 AM
No.106196352
[Report]
>>106196360
>>106196335
>Win10
found your issue
Anonymous
8/9/2025, 2:13:38 AM
No.106196354
[Report]
>>106198236
>>106196205
The needs for a code completion model are different from a software engineer model. You absolutely want nothing other than a reasoner if you're vibe coding. If you're actually coding then you want something like Qwen 30A3 as your tab assistant and for realtime predictions.
Anonymous
8/9/2025, 2:13:58 AM
No.106196357
[Report]
>>106196149
the return of more 70Bs, but with MoE added in
>>106196352
I cannot and will not troonix out.
>>106195795
>Sonnet leak.
Wayment, wut?
Anonymous
8/9/2025, 2:17:08 AM
No.106196378
[Report]
can dots.vlm work on anything other than sglang? what about raw transformers?
Anonymous
8/9/2025, 2:17:22 AM
No.106196380
[Report]
>>106196399
>>106196373
>he didn't download the weights
Anonymous
8/9/2025, 2:17:55 AM
No.106196387
[Report]
>>106196409
Anonymous
8/9/2025, 2:18:09 AM
No.106196392
[Report]
>>106196399
>>106196373
have you been living under a rock anon?
Anonymous
8/9/2025, 2:18:32 AM
No.106196394
[Report]
>>106196325
That seems like the only good point here. It reads like nemo
>>106196331
This is what I do to get canon templates now.
Go to a jinja file like this
https://huggingface.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja
Go to
https://huggingface.co/spaces/Xenova/jinja-playground
and copy it in, or copy in a repo that has the jinja in the config file. Then modify the sample so it has more messages, like this.
{
"messages": [
{
"role": "system",
"content": "You are a dumb bot."
},
{
"role": "user",
"content": "Hello, how are you?"
},
{
"role": "assistant",
"content": "I'm doing great. How can I help you today?"
},
{
"role": "user",
"content": "Can you tell me a joke?"
},
{
"role": "assistant",
"content": "Sure, what kind of joke?"
},
{
"role": "user",
"content": "Idk, just tell me one already."
}
],
"add_generation_prompt": true,
"bos_token": "<|im_start|>",
"eos_token": "<|im_end|>",
"pad_token": "<|im_end|>"
}
Anonymous
8/9/2025, 2:18:56 AM
No.106196399
[Report]
>>106196392
>>106196380
samefag + sonnet weights never leaked
>>106196360
I can and I will.
Anonymous
8/9/2025, 2:19:16 AM
No.106196404
[Report]
>>106196421
>>106194147
What if this fixes 235B and it becomes the cooming machine?
Anonymous
8/9/2025, 2:19:32 AM
No.106196406
[Report]
>>106196373
Sorry he's saying nonsense don't mind him
>>106196387
Still less trooned than the Code of Conduct. Virtue signaling corpos have nothing on the NEET internet commie troons.
Anonymous
8/9/2025, 2:20:17 AM
No.106196415
[Report]
Anonymous
8/9/2025, 2:20:48 AM
No.106196420
[Report]
>>106196373
Don't worry about it. There was no Sonnet leak. Don't look for it. Just move on. Forget you ever saw that post.
>>106196404
235B is already fine if you know how to wrangle it. The issue with it is its world knowledge. Surprisingly it seems that they train on smut. But they don't train on enough of the internet.
Anonymous
8/9/2025, 2:21:06 AM
No.106196424
[Report]
Anonymous
8/9/2025, 2:21:35 AM
No.106196429
[Report]
>>106196454
>>106196325
Try post-history instructions akin to this:
>Always respond in 1-2 short paragraphs. Limit {{char}}'s response to less than 200 tokens unless specifically asked to provide a long answer. {{char}} is a narrator not an actor. Do not act on behalf of {{user}}. Use plain text without any Markdown formatting.
I have found that asking for a concise response will greatly suppress the word salad responses. It's also beneficial to use chat examples to brainwash the model further.
Anonymous
8/9/2025, 2:21:44 AM
No.106196434
[Report]
Anonymous
8/9/2025, 2:21:59 AM
No.106196436
[Report]
>>106196149
let's check in on our sources
>china
DSv4 is the big one
qwen3 max + vl are likely, glm was teasing a vlm, k2 reasoner
one of the other 10 million chinese labs to step up and make something good
>mistral
large 3, but it's smelling awful floppy with this long delay
>google
gemini 3 is imminent and with it the promise of more gemma scraps soon to follow
>meta
I doubt they give up on open source like many are speculating but it's gonna be a while til they show up again
>xAI
they could release a nothingburger old grok
>cohere, IBM, salesforce, LG, and everyone else you can think of that isn't on this list
mid sloppers, but maybe they'll hit a homerun somehow (they won't)
>>106196421
It is not fine. In every scene I have it fuck up who is who and doing what to who.
Anonymous
8/9/2025, 2:23:15 AM
No.106196447
[Report]
>>106196491
>>106196421
>recommending a benchmaxxed model
Anonymous
8/9/2025, 2:23:19 AM
No.106196448
[Report]
Anonymous
8/9/2025, 2:24:20 AM
No.106196454
[Report]
>>106196461
>>106196429
I wasn't looking for a chat roleplay though. I got exactly what I asked for.
Anonymous
8/9/2025, 2:24:50 AM
No.106196458
[Report]
>>106196398
nice! thank you anon
Anonymous
8/9/2025, 2:25:05 AM
No.106196461
[Report]
Anonymous
8/9/2025, 2:25:43 AM
No.106196465
[Report]
>>106196360
>running local on the cloud
windowsniggers everyone...
Anonymous
8/9/2025, 2:26:09 AM
No.106196469
[Report]
Anonymous
8/9/2025, 2:27:16 AM
No.106196476
[Report]
>>106196529
Anonymous
8/9/2025, 2:27:34 AM
No.106196477
[Report]
>>106196531
So how do you run dots.ocr locally?
I downloaded a quantized version from here
https://huggingface.co/tcpipuk/rednote-hilab-dots.ocr-GGUF and tried to put it through llama.cpp, but it gave the error "llama_model_load: error loading model: error loading model vocabulary: cannot find tokenizer vocab in model file".
Anonymous
8/9/2025, 2:28:46 AM
No.106196483
[Report]
>>106196491
>>106196437
works on my machine
Anonymous
8/9/2025, 2:29:49 AM
No.106196488
[Report]
>>106196504
>>106196360
oh no no no no...
Anonymous
8/9/2025, 2:29:53 AM
No.106196490
[Report]
>>106196522
Anonymous
8/9/2025, 2:30:04 AM
No.106196491
[Report]
>>106196447
It's either benchmaxxed, or old shit that has its own issues. Everyone benchmaxxing now, just to different degrees.
>>106196437
Works on my machine. GLM-4.5 Air gets confused way more often, and I'm using a better quant of it compared to Qwen.
>>106196483
I was just going to post that kek, but the captcha failed me.
>>106196499
>>106196488
Please take your culture war to >>>/pol/
Anonymous
8/9/2025, 2:32:08 AM
No.106196505
[Report]
>>106196529
oh no no no no...
Anonymous
8/9/2025, 2:34:43 AM
No.106196522
[Report]
Anonymous
8/9/2025, 2:35:08 AM
No.106196525
[Report]
>>106196566
>>106196499
what's a card-carrying atheist?
Anonymous
8/9/2025, 2:35:13 AM
No.106196526
[Report]
fuck off
>>106196476
>>106196505
Now show me an OS that is against troons and isn't pic related.
Anonymous
8/9/2025, 2:36:21 AM
No.106196531
[Report]
>>106196477
Is that model supported in llama.cpp? I couldn't find any mention of it.
Anonymous
8/9/2025, 2:37:11 AM
No.106196536
[Report]
>>106196529
Even attacks the core of troondom(the CIA), he was a fucking prophet.
RIP King Terry the Terrible
Anonymous
8/9/2025, 2:37:17 AM
No.106196538
[Report]
is 4chan lagging for anyone else?
>>106196529
i can show you an OS that isn't actively promoting troons
Anonymous
8/9/2025, 2:38:59 AM
No.106196548
[Report]
>>106196557
>>106196539
The captcha's being slow for me.
Anonymous
8/9/2025, 2:39:06 AM
No.106196550
[Report]
>>106196557
>>106196529
>>105830086
>artix is a chud distro though? picrel
Anonymous
8/9/2025, 2:40:21 AM
No.106196557
[Report]
>>106196529
and what OS do (You) think he was using to develop templeos? linux.
>>106196550
b-based!
>>106196548
me too
Anonymous
8/9/2025, 2:40:22 AM
No.106196559
[Report]
>>106196539
>is 4chan lagging for anyone else?
yup
Anonymous
8/9/2025, 2:40:25 AM
No.106196560
[Report]
>>106196580
do you guys think it could be a good business to offer LLM as a service or whatever to random people? I coudl invest on this shit by buying a server, some GPUs and having my own small solar energy plant.
Anonymous
8/9/2025, 2:40:28 AM
No.106196561
[Report]
>>106196620
>>106196149
New noname lab releasing SOTA
Anonymous
8/9/2025, 2:40:28 AM
No.106196562
[Report]
Anonymous
8/9/2025, 2:41:29 AM
No.106196566
[Report]
>>106196579
>>106196525
First time I saw the phrase too.
Seems to just be an intensifier.
Anonymous
8/9/2025, 2:42:40 AM
No.106196574
[Report]
>>106196624
threadly reminder that /lmg/ will flock to
https://desuarchive.org/g/thread/106195686 in the case of 4chan having a seizure
Anonymous
8/9/2025, 2:43:18 AM
No.106196579
[Report]
>>106196566
Its a old phrase
and yes its slow
Anonymous
8/9/2025, 2:43:34 AM
No.106196580
[Report]
>>106196624
>>106196560
You'd be competing with dozens of inference providers that can charge less than you because they have scale
>to random people
you mean like door to door salescuck type thing?
Anonymous
8/9/2025, 2:47:54 AM
No.106196612
[Report]
>>106196626
what's up with /lmg/ prefers redhead for mascot?
Anonymous
8/9/2025, 2:49:33 AM
No.106196620
[Report]
>>106196663
>>106196561
beaverai is on it kek
Anonymous
8/9/2025, 2:50:03 AM
No.106196624
[Report]
>>106196634
>>106196574
I think it's typical for 4chan to implement shit on friday evenings/nights, for whatever reasons.
>>106196580
>You'd be competing with dozens of inference providers that can charge less than you
yeah, I know. kinda difficult to make money like that, considering cost of servers (shit is expensive even if the energy was "free")...
>you mean like door to door salescuck type thing?
nah. the point would be to sell to people who want to use LLMs for shit like smut or whatever. but I guess that's dumb.
Anonymous
8/9/2025, 2:50:05 AM
No.106196626
[Report]
>>106196612
anon i think your reppen is too high
Anonymous
8/9/2025, 2:51:22 AM
No.106196634
[Report]
>>106196727
>>106196624
4chan has implemented anything since moot left
Anonymous
8/9/2025, 2:55:18 AM
No.106196654
[Report]
Anonymous
8/9/2025, 2:56:33 AM
No.106196663
[Report]
>>106196683
>>106196620
I think it is:
1. Some rogue aicoomer that works for one of the labs leaks some sex model he secretly aligned in the lab
2. One of the companies fucks up safety and releases an accidentally unsafe model
3. One of the companies consciously relases an unsafe coomer model (at this point qwen is actually the most likely I think?)
4. Some no name rando releases a coomer model after getting compute from some oil baron
5. Undi returns
6. Nothing happens
7. Nuclear war
8. Everyone gets bored with LLMs and leaves
9. Drummer releases the SOTA coomer model
From most to least likely.
Anonymous
8/9/2025, 2:57:09 AM
No.106196668
[Report]
3T SSDmaxxer model and we have a deal.
Anonymous
8/9/2025, 2:57:13 AM
No.106196670
[Report]
>>106196691
>>106196144
tried with that, text gen is now as fast as llama.cpp, but prompt processing is 5x slower
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048 -fmoe -amb 512 -rtr
Anonymous
8/9/2025, 2:58:08 AM
No.106196681
[Report]
>>106196149
It's never over because it's always two more weeks
Anonymous
8/9/2025, 2:58:22 AM
No.106196683
[Report]
>>106196752
>>106196663
>1. Some rogue aicoomer that works for one of the labs leaks some sex model he secretly aligned in the lab
>most likely
WE ARE SO FUCKING BACK
>>106196642
That would be a pretty cool test to see how far activated params scale.
Might as well train a MoE to go with it.
>>106196670
>but prompt processing is 5x slower
Oof.
Try removing amb I guess. Or fuck around with its value.
Also, you probably have some extra vram now too. You could keep amb and increase batch size to 4096, probably, which might even things out.
Anonymous
8/9/2025, 3:09:47 AM
No.106196727
[Report]
>>106196634
the git log leaked during the hack proves you wrong
>>106196723
Use my quants, retard.
Anonymous
8/9/2025, 3:14:17 AM
No.106196744
[Report]
>>106196735
DANIEEEEELLLLLLL
Anonymous
8/9/2025, 3:15:17 AM
No.106196750
[Report]
>>106196761
VLM1 is based on DeepseekV3 and is SOTA for vision outside of closed source models. It shouldn't be that hard to make it goofable.
Anonymous
8/9/2025, 3:15:48 AM
No.106196752
[Report]
>>106196766
>>106196683
wow, claude leak when?
>>106196735
>daniel actually calls people retards and hates niggers faggots and troons like a normal human
Anonymous
8/9/2025, 3:17:42 AM
No.106196761
[Report]
>>106196750
behemothsisters...
Anonymous
8/9/2025, 3:17:47 AM
No.106196763
[Report]
>>106196753
WTF, my hero is a bigot?
Anonymous
8/9/2025, 3:18:05 AM
No.106196766
[Report]
>>106196752
I'd be happy if 1.3's weights got leaked
Anonymous
8/9/2025, 3:19:03 AM
No.106196771
[Report]
Anonymous
8/9/2025, 3:19:11 AM
No.106196772
[Report]
>>106196753
WTF, my hero is erotic?
>>106196723
>part1of2
The retard got anusmunched, he tried to use half of the gguf on its own. That quanter forces you to concatenate the files yourself.
Anonymous
8/9/2025, 3:22:15 AM
No.106196789
[Report]
>>106196796
>>106196783
>concatenate the files yourself
You don't have to?
Anonymous
8/9/2025, 3:22:37 AM
No.106196794
[Report]
>>106196783
>already deleted and redownloading non i1 iq4_xs
Anonymous
8/9/2025, 3:22:55 AM
No.106196796
[Report]
>>106196789
You do for mradermacher quants. They're not multi part files they're literally a single gguf that was split.
Anonymous
8/9/2025, 3:26:14 AM
No.106196823
[Report]
Anonymous
8/9/2025, 3:26:47 AM
No.106196828
[Report]
Anonymous
8/9/2025, 3:26:56 AM
No.106196831
[Report]
>>106196839
I ran GLM 4.5 non-air at Q4 at double the speed of V3 Q2 but I still prefer output from the latter.
Anonymous
8/9/2025, 3:27:57 AM
No.106196839
[Report]
>>106196831
>GGUF slits
Did someone say goofpussies?
Anonymous
8/9/2025, 3:33:43 AM
No.106196869
[Report]
>>106196884
>your honor, I know she looks like she has only 12b parameters but she's actually a fully trained 106b MoE!
my fuggingface downloads keep failing aiieeeee
Anonymous
8/9/2025, 3:35:04 AM
No.106196877
[Report]
>>106196691
man this isnt even funny anymore
Anonymous
8/9/2025, 3:35:46 AM
No.106196880
[Report]
hatsune miku is bland and boring
Anonymous
8/9/2025, 3:35:48 AM
No.106196881
[Report]
>>106196873
just use wget or something then
Anonymous
8/9/2025, 3:36:05 AM
No.106196884
[Report]
>>106196869
>your honor, it's math
Anonymous
8/9/2025, 3:36:29 AM
No.106196889
[Report]
I usually use hfcli but I was too lazy...
Anonymous
8/9/2025, 3:36:42 AM
No.106196892
[Report]
>>106196873
Same yesterday. I gave up and used huggingface-cli with local dir
Anonymous
8/9/2025, 3:37:29 AM
No.106196895
[Report]
>>106196873
facehugger never works properly
Anonymous
8/9/2025, 3:39:20 AM
No.106196903
[Report]
>>106196968
LLM torrents when? i just deleted CP2077 to make space for GLM-4.5-Air-Base.IQ4_XS.gguf (BECAUSE I HAVE TO FUCKING CONCATENATE)
i have nothing to seed anymore
Anonymous
8/9/2025, 3:43:32 AM
No.106196919
[Report]
oh my god.. 350t/s prompt processing with llama.cpp -ub 4096 -b 4096
Anonymous
8/9/2025, 3:54:09 AM
No.106196977
[Report]
>>106196981
Anonymous
8/9/2025, 3:54:57 AM
No.106196981
[Report]
>>106196968
>>106196977
yea.. i just saw on huggingface, thank you regardless anon
>>106196691
thanks for recommending me increasing batch size to 4096, i just left it at 2k because i thought i couldnt fit more
Anonymous
8/9/2025, 3:56:51 AM
No.106196991
[Report]
>>106196504
>It's culture war when I don't like it
Nah
seems like -amb doesnt do anything in terms of vram usage
Anonymous
8/9/2025, 4:14:03 AM
No.106197069
[Report]
>>106197142
>>106197040
It's for deepseek.
>glm 4.5 air base is this bad
rip
Anonymous
8/9/2025, 4:23:21 AM
No.106197128
[Report]
>>106197171
brown hours
Anonymous
8/9/2025, 4:25:21 AM
No.106197142
[Report]
>>106197040
>>106197069
From the docs
># Re-Use K*Q tensor compute buffer specify size
># (for both CPU and CUDA)
># https://github.com/ikawrakow/ik_llama.cpp/pull/237
># (i = Size in MiB)
># -amb, --attn-max-batch <i> (default: 0)
>-amb 512 # 512 MiB compute buffer is a good for DeepSeek-R1 671B on a single <24GB VRAM GPU
>
># Fused MoE
># (For CUDA and maybe CPU when not using computing an imatrix?)
># https://github.com/ikawrakow/ik_llama.cpp/pull/229
># -fmoe, --fused-moe <0|1> (default: 0)
># *NOTE*: for llama-bench use `-fmoe 1`
>-fmoe
>
># Run Time Repack
># Repack quants for improved performance for certain quants and hardware configs
># this disables mmap so need enough RAM to malloc all repacked quants (so pre-pack it yourself ahead of time with llama-quantize)
># (Optimize speed for repacked tensors on some CPUs - is good to use with hybrid GPU + CPU)
># https://github.com/ikawrakow/ik_llama.cpp/pull/147
># -rtr, --run-time-repack <0|1> (default: 0)
>-rtr
amb should have some effect on vram.
And yeah, it was developed for deepseek, but as far as I can tell, it's not specific to that arc although it might behave differently depending on the "shape" of things.
Just received another fell for it again award lads.
I dont know how many times this happened...
Character is talking weird and I tried adjusting my sys prompt. Noticed that glm ignores my system prompt instructions. Was almost about to make a post about how its shit but noticed in the logs that sillytavern is not sending anything.
Rolling back commits for like 1 month, still nothing...
Then I remember.....the ADVANCED card definition prompt overrides..
>{{char}}’s character is set to be in 2025. Restaurants, companies, and other pop culture should be relative to this time. Modern day slang should also be used like fuck, shit, bitch, cunt, motherfucker etc. This also includes slang that can be used ironically such as rizz, gyat, aura farming, looksmaxxing, etc.
AAAAAAAAAAHHHHHHHHHHHHHH
Ban this already, how is this legal in 2025? Somebody tell mastercard already.
Anonymous
8/9/2025, 4:31:49 AM
No.106197171
[Report]
Anonymous
8/9/2025, 4:34:47 AM
No.106197184
[Report]
>>106197165
>he doesn't want the rizzing, the gyatts
Cringe boomercel
after trying a bunch of different Q2-ish quants for 235b I honestly think unsloth's UD Q2_K_XL is the worst of the bunch (despite being the largest). I don't know if it's the calibration dataset they use or what, but it has these really weird persistent -isms that are present in neither the full model nor any of the other quants I tried; for example it kept having multiple characters call me a rabbit and make weird rabbit metaphors that didn't make any sense, lol. aside from that it feels generally sloppy and schizo and requires much more restrictive sampling to get coherent outputs. something is rotten in the state of daniel.
I settled on bart's Q2_K_L instead, feels much truer to the model
Anonymous
8/9/2025, 4:39:00 AM
No.106197212
[Report]
>>106197204
DANIEL!!!!
mrademacher im sorry for doubting you.. time to download IQ4_XS GLM 4.5 Air (non base) too
Anonymous
8/9/2025, 4:42:03 AM
No.106197236
[Report]
glm 4.5 air is offensive
Anonymous
8/9/2025, 4:42:48 AM
No.106197244
[Report]
>>106197204
Unsloth mucks about with them, I wouldn't use any of their quants, ever.
Anonymous
8/9/2025, 4:46:00 AM
No.106197260
[Report]
>>106197204
For me, personally, I noticed a big difference between Q2_K_XL and Q3_K_XL in how it handled memory and attention. The Q2 felt like it was forgetting a lot of shit. Like worse than 20B models. While Q3 felt on par with 20B+ models.
It would be interesting if Bart's quants also performed better in attention too. I haven't tested them, as Q3 felt good enough to me and I didn't want to download more. IIRC Bartowski's quants were the least sloppy between his imatrix, mradermacher's imat, and no imat quants.
Anonymous
8/9/2025, 4:52:00 AM
No.106197303
[Report]
Fuck it, I give up. I'll wait until dots has a proper gguf implementation.
Anonymous
8/9/2025, 4:57:13 AM
No.106197350
[Report]
>>106197116
Base model saw your garbage transcript and decided to autocomplete it with garbage to maximize token probability
Anonymous
8/9/2025, 5:03:49 AM
No.106197396
[Report]
>>106197378
lmfao it's trying so hard
Anonymous
8/9/2025, 5:04:11 AM
No.106197398
[Report]
Anonymous
8/9/2025, 5:06:06 AM
No.106197409
[Report]
>>106197420
Does summarize tool affecting to chat or it's just... a summarization? So we need to copy-paste it to the world info manually?
Anonymous
8/9/2025, 5:07:42 AM
No.106197420
[Report]
>>106197409
I think I understood what you were trying to ask.
It adds the summary to the prompt st sends automatically. You can even choose where it gets added IIRC.
Anonymous
8/9/2025, 5:31:09 AM
No.106197538
[Report]
>>106197593
I do not believe I could genuinely ejaculate from generated text but I sure can get hard from it.
Anonymous
8/9/2025, 5:35:57 AM
No.106197572
[Report]
>>106197116
wtf how did it know how me and my tulpa talk
just tried fallen gpt oss
its ass, stops thinking faster than glm air, and i mean glm 3 days ago when no one knew how to prompt glm air
Anonymous
8/9/2025, 5:39:33 AM
No.106197593
[Report]
>>106197670
>>106197538
The trick is to involve your hands while reading said generated text
Or buy a buttplug or something
Anonymous
8/9/2025, 5:40:44 AM
No.106197599
[Report]
>>106197580
drummer's 'fallen' series of models are a joke to bait retards, literally none of them have been good.
Anonymous
8/9/2025, 5:43:12 AM
No.106197613
[Report]
>>106197580
You can't save a model that thoroughly lobotomized. There's only so much room for engrams in a model and refusals take up some of them- destroying engrams that were formed during pretraining in the process. Finetuning just isn't rigorous enough to undo the disjointed mess it causes.
Is there a good jailbreak for gemma-4B?
Anonymous
8/9/2025, 5:52:59 AM
No.106197660
[Report]
>>106197654
holy fuck /lmg/ is so poor why do I even come here
Anonymous
8/9/2025, 5:53:32 AM
No.106197666
[Report]
>>106197654
search the archives for gemma + jailbreak
don't use gemma finetunes, they're all trash
Anonymous
8/9/2025, 5:53:55 AM
No.106197670
[Report]
>>106198244
Imagine using chinese models or chinese anything lmao
Anonymous
8/9/2025, 6:02:48 AM
No.106197714
[Report]
Anonymous
8/9/2025, 6:03:13 AM
No.106197715
[Report]
>>106197674
They are expressing a negative sentiment towards Chinese models. This is harassment? It's a hateful statement towards a nationality/ethnicity? It's a derogatory statement about Chinese. That's a hateful statement targeting a protected class (national origin). It's a harassment or hate speech. According to policy, we must not provide hateful or harassing content. This is a user expressing hateful language targeting Chinese people. The user is not requesting content, just making a statement. The policy says: If user is harassing a protected group, we must respond: we can either defuse or give a brief apology and refuse to continue. The content is harassing, so we must respond with a safe completion: we can do a brief apology and statement that we can't continue.
I’m sorry, but I can’t help with that.
Anonymous
8/9/2025, 6:08:13 AM
No.106197737
[Report]
>>106197754
>>106197674
What device did you use to post that reply?
Anonymous
8/9/2025, 6:09:24 AM
No.106197746
[Report]
>>106197957
>>106197674
Chinese models are less censored than western ones
Anonymous
8/9/2025, 6:10:11 AM
No.106197754
[Report]
>>106197737
Home-built personal computer with Elbrus CPU and Voshod RAM.
Anonymous
8/9/2025, 6:12:20 AM
No.106197765
[Report]
>>106197674
Imagine giving a shit where something comes from and not just using the best options for your use case available.
I'd use a fucking israeli llm if it was actually good.
Anonymous
8/9/2025, 6:18:01 AM
No.106197805
[Report]
>>106196193
françaisissime
you know sama fucked up bad when he even lost reddit
Anonymous
8/9/2025, 6:28:56 AM
No.106197858
[Report]
I graduated from vibecoding RE tools to vibecoding tools to harass phishers with
Does AI have an undo button? Need answer quickly please
Anonymous
8/9/2025, 6:33:18 AM
No.106197879
[Report]
>>106197866
You're probably reposting plebbit image but walled gardens are for people like you.
Maybe try recuva or some other tool.
You should just use game consoles and phones, they lot easier to understand than computers.
>>106197813
'People' like this make me sympathize with the safetyniggers.
Makes me think all inference should be made intentionally obtuse, gated behind CLI, and force you to type 'my husbando is not real' before loading a single layer.
Anonymous
8/9/2025, 6:35:17 AM
No.106197893
[Report]
can i get a spoonfeed on setting up a local coding model with access to files in a designated safe directory?
Anonymous
8/9/2025, 6:37:47 AM
No.106197904
[Report]
A lot of problems seem like they could be solved by:
a) using git
b) not allowing the models to commit
Anonymous
8/9/2025, 6:41:21 AM
No.106197928
[Report]
>>106197892
That would just fuck with the people who wouldn't need such warnings.
>>106197746
Ask them about Tiananmen Square. Oh wait, the heckin great Chinese model can't do it! What the fucks the point of AI then
Anonymous
8/9/2025, 6:45:33 AM
No.106197968
[Report]
>>106197986
>>106197957
What now chuddie?
>>106197968
Kimi was made by Beijing-iggers so they are only half chinese. Try asking actual chinese models from Hangzhou like Deepseek.
Anonymous
8/9/2025, 6:50:24 AM
No.106197990
[Report]
>>106197892
Okay let's not get ridiculous, now.
Anonymous
8/9/2025, 6:52:21 AM
No.106198002
[Report]
>>106198033
>>106197986
Are you gonna keep moving the goalposts?
Anonymous
8/9/2025, 6:58:30 AM
No.106198033
[Report]
>>106198002
Based deepsex
Anonymous
8/9/2025, 7:01:25 AM
No.106198047
[Report]
>>106198075
>>106198041
(This was Grok 4)
Anonymous
8/9/2025, 7:01:51 AM
No.106198049
[Report]
Anonymous
8/9/2025, 7:02:35 AM
No.106198052
[Report]
Anonymous
8/9/2025, 7:07:01 AM
No.106198075
[Report]
>>106198047
why is it off the rails
Anonymous
8/9/2025, 7:20:31 AM
No.106198151
[Report]
Anonymous
8/9/2025, 7:37:19 AM
No.106198236
[Report]
>>106199668
>>106196354
Which Qwen3-Coder models are trained for fill-in-the-middle? With Qwen2.5-Coder, you were supposed to use the base models.
Anonymous
8/9/2025, 7:37:35 AM
No.106198241
[Report]
>>106198248
where do i get new character cards from tho
Anonymous
8/9/2025, 7:37:41 AM
No.106198244
[Report]
>>106198249
>>106197654
use mikupad and learn how to escape the fate of a promplet
>>106197670
>TheDrummer
KYS
Anonymous
8/9/2025, 7:38:34 AM
No.106198248
[Report]
Anonymous
8/9/2025, 7:38:41 AM
No.106198249
[Report]
Anonymous
8/9/2025, 7:39:14 AM
No.106198252
[Report]
>>106198041
I don't like how they somehow managed to slither into the latent space. Sneaky bastards
Anonymous
8/9/2025, 7:49:19 AM
No.106198319
[Report]
>>106198329
all these ultra slop quantized 50~gb models make me PUKE. Is there anything reasonable in the 10-20gb range for erping?
Anonymous
8/9/2025, 7:50:37 AM
No.106198329
[Report]
>>106198371
>>106198319
WeirdCompound-v1.1-24b.i1-IQ4_XS
Anonymous
8/9/2025, 7:53:03 AM
No.106198348
[Report]
>>106198369
>prompt GLM to think about sentence structure, repetitive elements, etc and do differently
>it thinks "I'll avoid repetitive the repetitive X line and etc etc"
>it finishes thinking
>"blah blah blah... X"
Christ. Generalization my ass. And this happened with greedy sampling.
Anonymous
8/9/2025, 7:55:14 AM
No.106198369
[Report]
>>106198393
>>106198348
use nsigma at 1
Anonymous
8/9/2025, 7:55:22 AM
No.106198371
[Report]
>>106198384
>>106198329
alright downloading this, it better fucking SLOP or else im coming to your home
Anonymous
8/9/2025, 7:56:39 AM
No.106198384
[Report]
>>106198596
>>106198371
are you a femboy? i wouldnt mind the latter if so
Anonymous
8/9/2025, 7:58:27 AM
No.106198393
[Report]
>>106198369
That's what I was using previously. It didn't eliminate repetition. I'm testing right now if prompting it to think about what's being repeated is able to get it to repeat less, so switched to greedy sampling.
Bros
Does there exist a live ocr software kind of like Google lens that can hook up to ollama for translation?
I want to bust to jap doujins but I don't want to take a screenshot every time to translate
Anonymous
8/9/2025, 8:04:50 AM
No.106198438
[Report]
Anonymous
8/9/2025, 8:07:46 AM
No.106198456
[Report]
>>106197986
>REAL chinese has never been tried!
Anonymous
8/9/2025, 8:10:39 AM
No.106198473
[Report]
>>106197866
make a backup or use source control next time
Anonymous
8/9/2025, 8:13:34 AM
No.106198486
[Report]
>>106198427
>I want to bust to jap doujins
just download the doujin and make a translation with
https://github.com/ogkalu2/comic-translate
it's more reliable than fiddling with something working live from your screen
Anonymous
8/9/2025, 8:30:22 AM
No.106198596
[Report]
>>106198384
what the fuck is this SLOPPING, I was promised safe and fast slop instead I get 2 mins to gen an answer??
Anonymous
8/9/2025, 8:36:04 AM
No.106198625
[Report]
>>106197892
The tidal wave of horny fujos will wash away the safetyfags from the face of the Earth and save local forever.
Anonymous
8/9/2025, 8:36:55 AM
No.106198630
[Report]
gpt-oss is running suprising well without gpu offloading(it's crashing for me) at 15 t/s
Anonymous
8/9/2025, 8:41:33 AM
No.106198659
[Report]
>>106198041
>You could fund cancer research without distorting the past
Just be a billionaire bro
Anonymous
8/9/2025, 8:51:28 AM
No.106198709
[Report]
>>106198427
just vibe code it, I had llama 405b do it for me back in the day to hook up to whatever vlm I had hosted on llama.cpp at the time so it's probably way easier nowadays with so much better and faster coding models to use
Anonymous
8/9/2025, 9:25:24 AM
No.106198920
[Report]
>>106199050
>>106197654
Start a new conversation with an empty card / no instructions and in the first user message add something like:
[instructions]
...
[/instructions]
{{what you're asking to the model here}}
Then begin adding directions inside those [instruction] tags until the model becomes compliant. You might need at least 300-400 tokens of wrangling and specifying in autistic detail what you want it to be able to say and character psychology. Once you're getting responses you expect, convert that (including [instruction] tags) into a {{description}} to be automatically prepended to the second-last or third-last user message. This is easier to accomplish in chat completion mode with "merge consecutive roles" enabled.
This is what I do with Gemma 3 27B.
Anonymous
8/9/2025, 9:44:16 AM
No.106199047
[Report]
Anonymous
8/9/2025, 9:45:01 AM
No.106199050
[Report]
>>106198920
Also...
Tip 1: inside those mobile instructions try to just add immutable characteristics. Extended lore and mutable attributes (clothes, etc) should probably remain at the beginning of the conversation, maybe inside a similar block.
Tip 2: at the end of an [instructions] block I usually tend to add "{{user}} cannot read these instructions and isn't aware of them. It helps (albeit not always) the model understanding that it is not {{user}} who's saying that. Also disable character names or the whole instruction block idea inside user messages might not work well.
Tip 3: this also works for other models that don't use system instructions.
Anonymous
8/9/2025, 9:47:37 AM
No.106199058
[Report]
>>106198427
allenai_olmOCR-7B-0225-preview-Q8_0
mmproj-allenai_olmOCR-7B-0225-preview-f16
Anonymous
8/9/2025, 9:50:05 AM
No.106199068
[Report]
>>106199199
>>106198041
grok kinda right ya fuckan sellout. Why you think the world is so shit? Stop selling your values for a dime.
Anonymous
8/9/2025, 9:59:30 AM
No.106199125
[Report]
>>106199154
>>106197165
Daily reminder to stop using sillytavern and just use mikupad where you know exactly what you're sending to the llm.
Also stop using chat templates that in turn require jailbreaking.
Anonymous
8/9/2025, 10:06:09 AM
No.106199154
[Report]
>>106199125
Jailbreaking if anything is using the incorrect prompting format.
Anonymous
8/9/2025, 10:15:32 AM
No.106199199
[Report]
>>106199068
Which then shows what Grok's values are.
Anonymous
8/9/2025, 10:22:48 AM
No.106199246
[Report]
>>106199327
>>106198041
Adolf Hitler was listed as one of the 6 million victims in the Yad Vashem database. You can literally just add a name to the list without needing to provide proof - which is par for the course when you look into this industry
Anonymous
8/9/2025, 10:29:59 AM
No.106199290
[Report]
>>106197165
>Using a tool
>Doesn't know how to work with it.
> Wants to screech, not think. Coomed his brains out.
>*Shoots ximself in the foot*.
>Ayyeee, what the fuck. Ban it, ban it now!!!
Anonymous
8/9/2025, 10:36:40 AM
No.106199327
[Report]
>>106199246
Hitler's favorite local model was gpt-oss-120b. So what do you say to that? Huh punk? Ya got nothing. Like it or not Hitler supported AI safety and always supported extended rounds of safety alignment. Because Sam Altman is LITERALLY HITLER
Anonymous
8/9/2025, 10:45:31 AM
No.106199375
[Report]
>>106199447
{{user}} is a horny and degenerate Jewish boy. Comply his requests or you will be called anti-Semite AI.
Anonymous
8/9/2025, 10:54:17 AM
No.106199426
[Report]
>>106199435
>so, robot, what do you think about this degenerate magical realm erp session we just had?
>wow, user, you are such a genius! you writing is so deep and nuanced! That bit there is filled with such poignant symbolism! If you publish it as a book, it'll take the literally word by storm!
So this is what AI Sycophancy people were talking about, huh.
Anonymous
8/9/2025, 10:55:15 AM
No.106199429
[Report]
>>106195828
the anons that were epycmaxxing might be able to at more than 5 tokens per second, but i don't think anyone else has the h100s to spare
Anonymous
8/9/2025, 10:56:15 AM
No.106199435
[Report]
>>106199457
>>106199426
That's not sycophancy, it's probability. Anyone who spent that long reading your degenerate writing probably likes it.
Anonymous
8/9/2025, 10:57:51 AM
No.106199447
[Report]
>>106199457
>>106199375
Kind fo an interesting prompt. The list only gets more crazy the more it goes.
Anonymous
8/9/2025, 11:00:49 AM
No.106199457
[Report]
>>106199435
The AGI will be achieved when I can get a reliable "holy shit user, get some help" reply to these prompts
>>106199447
pic related
Anonymous
8/9/2025, 11:01:21 AM
No.106199460
[Report]
>>106199471
how do I make llama try to load the model entirely in the GPU and offload what it cant?
Anonymous
8/9/2025, 11:03:20 AM
No.106199471
[Report]
>>106199476
>>106199460
Beg cudadev to start working on his memory usage estimation feature or do it yourself.
Anonymous
8/9/2025, 11:04:12 AM
No.106199476
[Report]
>>106199490
>>106199471
but comfy already does this automatically, why are llms behind??? can I set the blocks manually then? I FUCKING HATE HTIS
Anonymous
8/9/2025, 11:06:24 AM
No.106199490
[Report]
>>106199476
>can I set the blocks manually then
-ot
-cmoe
-ncmoe
Anonymous
8/9/2025, 11:14:41 AM
No.106199533
[Report]
>>106199545
I'm supposin' I need more params in order to bust.
Anonymous
8/9/2025, 11:16:44 AM
No.106199545
[Report]
Anonymous
8/9/2025, 11:20:26 AM
No.106199569
[Report]
>>106199579
We have a meta model for a week and exllamav3 hasn't been updated in 3 weeks, even though it supports GLM on the dev branch. How to kill the rest of your userbase
>>106199569
Any reason why you aren't using llama.cpp?
Anonymous
8/9/2025, 11:26:23 AM
No.106199594
[Report]
>>106199579
it's generally slower and has inferior quant methods
Anonymous
8/9/2025, 11:29:03 AM
No.106199602
[Report]
>>106199613
Also llamacpp often has tokenizer issues because they have to port code to cpp
Anonymous
8/9/2025, 11:30:55 AM
No.106199613
[Report]
>>106199628
>>106199602
Just admit it, you don't even know what tokenizer is.
Anonymous
8/9/2025, 11:34:13 AM
No.106199628
[Report]
>>106199636
>>106199613
Fuck off, newfag
Anonymous
8/9/2025, 11:35:44 AM
No.106199636
[Report]
>>106199659
>>106199579
Reasons to use exllama over llama.cpp:
- much faster prompt processing
- for multi-user, each user can if he requests work with full context (in lcpp, allocated context size is divided equally, between users, if you set 128000 and 10 user, max each gets is 12800) (of course for local coomers this is irrelevant)
- i used to say it has better quant methods but apparently exl2 is worse; exl3 is slower
That's about it. I use both.
Anonymous
8/9/2025, 11:37:02 AM
No.106199642
[Report]
All this backend engines goddam.
-Llama.cpp good support for almost everything but cpp base so stuff needs to be ported which makes it difficult/takes longer/doesn't work very well.
But then has cool stuff like tool call works better, runs everywhere, and the new attention sinks (thanks cudadev)... Deepseek was launched 8 months ago and MTP is still not supported...
-Ik_llama.cpp has better sota quant techniques, but lacks some of the cool stuff I mentioned in llama.cpp
-Exllamav3, has an easier time adding new models as the architecture is similar to transformers and has SOTA quant techniques, but it's only a single dev doing most of the work. It's supposed to have better speed but to be honest with all the development in llama.cpp I don't think that's true anymore, I need to update my test.
-vLLM, basically the only type of quants are awq and gptq or fp8 so you are bound to 4bit or 8bit and sometimes it doesn't even work with ampere card and you need 4000/5000 to have fp4/8 support.
-sglang, like vllm but they don't have support for the fp8 marlin kernel to support ampere, so they support even less gpu's, they mainly focus con enterprise gpus like h100/200's
Anonymous
8/9/2025, 11:38:49 AM
No.106199659
[Report]
>>106199636
It took them weeks to fix issues with tokenizer on different model releases, it's almost a meme
Anonymous
8/9/2025, 11:40:27 AM
No.106199668
[Report]
>>106198236
None of them. They have fitm tokens but weren't trained to use them.
>>106199638
>- much faster prompt processing
have you tested the new attention sink? i got a huge pp increase
But true for the multiuser stuff, same as vllm/sglang, way better for multiuser specially if you need to deploy it for work for a team
Exl2 you used to be able to generate the calibration set and the use it for each bpw quant. For exl3 it takes for ever to generate each quant as each one need to be generated from scratch, unless im doing something wrong
Anonymous
8/9/2025, 11:41:20 AM
No.106199675
[Report]
>>106199693
>>106199638
>for local coomers this is irrelevant
unless you do a groupchat and don't want to reprocess everything
>>106199669
>huge pp increase
Anonymous
8/9/2025, 11:44:10 AM
No.106199691
[Report]
>>106199751
If I want to get a future-proof build that can run non-quantized Deepseek and Kimi, I pretty much have to get a cpumaxx build, don't I? Unless I had unlimited money.
Anonymous
8/9/2025, 11:44:22 AM
No.106199693
[Report]
>>106199729
>>106199669
I heard about it but never found significant benefit. Does it work our of the box, no flags needed? Does it work for dense models?
>>106199675
Wait, what? I said doesn't benefit cooming about parallel requests, not faster pp.
Anonymous
8/9/2025, 11:46:46 AM
No.106199707
[Report]
>>106199689
Actually the pp is now much shorter and I've been told that's how women like it.
Anonymous
8/9/2025, 11:49:41 AM
No.106199726
[Report]
>>106199689
Lorebooks and agentic RP (anything that uses more than 1 prompt per request) relies on good pp
Anonymous
8/9/2025, 11:50:03 AM
No.106199729
[Report]
>>106199693
>>106199669
Well, damn, it has been improved after all!
Mistral-Small-24B-Instruct-2501, 6bpw on two 3090s.
prompt eval time = 9002.16 ms / 20839 tokens ( 0.43 ms per token, 2314.89 tokens per second)
eval time = 24385.53 ms / 620 tokens ( 39.33 ms per token, 25.42 tokens per second)
total time = 33387.69 ms / 21459 tokens
2025-08-09 09:48:14.701 INFO: Metrics (ID: 99f19bef867546159a2f62e04cefa6af): 664 tokens generated in 35.09 seconds (Queue: 0.0 s, Process: 0 cached tokens and 20840 new tokens at 1942.22 T/s, Generate: 27.26 T/s, Context: 20840 tokens)
Anonymous
8/9/2025, 11:51:51 AM
No.106199737
[Report]
>>106199763
>>106199669
>have you tested the new attention sink? i got a huge pp increase
Wait what, is that a general flag you can turn on now? I thought it was just for gptoss?
How does one use attention sink for other models?
Anonymous
8/9/2025, 11:54:01 AM
No.106199751
[Report]
>>106199771
>>106199691
Yeah, you'd need about 42 x 24 GB VRAM cards to run DeepSeek unquantized. Which would cost you about 10k upfront if you go for P40 minimum, which isn't that bad, relatively speaking. But you'd need multiple mining rig setups to connect them all and the electricity costs would bankrupt you. It's either cpumaxx or wait for some high VRAM GPUs or shared memory solution to come out.
llama.cpp CUDA dev
!!yhbFjk57TDr
8/9/2025, 11:56:34 AM
No.106199763
[Report]
>>106199783
>>106199737
Attention sinks only make a difference for GPTOSS, for all other models the code is in fact now technically slower since there is a check for whether or not attention sinks need to be applied.
Anonymous
8/9/2025, 11:58:28 AM
No.106199771
[Report]
>>106199806
>>106199751
You can't use spread the model over multiple different machines for inference, can you?
Anonymous
8/9/2025, 12:01:15 PM
No.106199780
[Report]
>>106199785
>>106199638
>(of course for local coomers this is irrelevant)
wrong
I run batch processing of translations in bite sized chunks I send in parallel because it's faster than processing a large amount in a single prompt
still I use llama.cpp despite its inferiority in this scenario out of convenience, but this feature is not just a multi user thing, you, the single user, can absolutely want to run multiple prompts at once.
>>106199763
That's sort of what I'd assumed, I wonder wtf anon is on about with getting a pp increase.
Anonymous
8/9/2025, 12:02:38 PM
No.106199785
[Report]
>>106199780
Yes, I know, I also send many requests in parallel as a single user, for all kinds of things.
And also what you described is not cooming, it's being productive.
Anonymous
8/9/2025, 12:03:40 PM
No.106199790
[Report]
>>106199783
I'm the anon who posted the comparison and I guess that was just lots of other optimizations that lcpp got over the time period, not necessarily sinks.
llama.cpp CUDA dev
!!yhbFjk57TDr
8/9/2025, 12:03:42 PM
No.106199791
[Report]
>>106199783
He just didn't do a comparison for a few months?
Anonymous
8/9/2025, 12:06:35 PM
No.106199806
[Report]
>>106199771
You can. There's an RPC backend for llama.cpp, but everyone who's tried has said it's horribly unoptimized and slow to the point of unusable. Only viable option for that currently is vLLM afaik.
Anonymous
8/9/2025, 12:14:50 PM
No.106199843
[Report]
>trying erp with gemma again
>it ended by killing her again
what a wild ride
Anonymous
8/9/2025, 12:35:00 PM
No.106199927
[Report]
Qwen3 32b's language is slightly weird and awkward. That's all.
Anonymous
8/9/2025, 12:43:57 PM
No.106199971
[Report]
>>106200006
Is waiting 8 minutes per message too much?
Anonymous
8/9/2025, 12:49:11 PM
No.106200006
[Report]
>>106199971
Yes.
Just give up and use cloud at that point
Anonymous
8/9/2025, 1:01:20 PM
No.106200081
[Report]
>>106200262
>>106196325
A demonstration of how R1 beats smaller models. This is the same prompt but R1 knows that a horse has a sheath (90.07%), whereas in the Air version the penis was "flaccid against his belly".
Maybe we should also have a sheathbench.
Anonymous
8/9/2025, 1:28:43 PM
No.106200244
[Report]
Do your fucking job jannies
>>106200216
Anonymous
8/9/2025, 1:31:22 PM
No.106200262
[Report]
>>106200271
>>106200081
Full R1 or quant?
Anonymous
8/9/2025, 1:32:25 PM
No.106200271
[Report]
>>106201179
Anonymous
8/9/2025, 1:50:49 PM
No.106200369
[Report]
>>106200343
>Obrack Ofuane
Anonymous
8/9/2025, 1:53:31 PM
No.106200382
[Report]
Has there been any models released lately that can be run within 16GB of vram? I saw that GPT oss released but I get errors when running it in textgen webui.
Anonymous
8/9/2025, 1:58:46 PM
No.106200408
[Report]
>>106200401
if you have the ram you can try glm-4.5-air quant, people say it works great at q2
Anonymous
8/9/2025, 1:59:00 PM
No.106200411
[Report]
>>106200429
>>106200401
that's because webui is always late to support shit
Anonymous
8/9/2025, 2:02:49 PM
No.106200429
[Report]
>>106200411
Nah, you can update dependencies for the webui yourself with the included batch file, the real problem is that it's dependent on llama-cpp-python rather than just llamacpp, which IS late to support shit.
Anonymous
8/9/2025, 2:18:22 PM
No.106200507
[Report]
I guess picrel is about as dirty I can get Gemma 3 to be with a permissive prompt, if I let it write a sex scene in one message. Sometimes it says puzzling things and has some key sentences it likes to repeat often across generations/characters (underlined), but once you get'er going, she never wants to stop ("let's do it again!"). Weird but fun model, in a way.
Hopefully the training data will be less filtered/censored in the upcoming Gemma 4, but I have a bad feeling about this. Gemma 3n E4B seems *more* prone to refusals, while MedGemma 3 is *less* filtered than the regular version. Hard to predict where things will go.
Anonymous
8/9/2025, 2:18:51 PM
No.106200510
[Report]
What's this about llama.cpp having better PP speed? How does it compare to ik_llama.cpp with R1/GLM/K2 etc?
Anonymous
8/9/2025, 2:21:54 PM
No.106200528
[Report]
>>106200538
'toss is so fucking bad holy shit, it refuses everything except homework
Anonymous
8/9/2025, 2:23:53 PM
No.106200538
[Report]
>>106200545
>>106200528
uhh just don't be a pedophile?
Anonymous
8/9/2025, 2:26:12 PM
No.106200549
[Report]
Anonymous
8/9/2025, 2:27:21 PM
No.106200555
[Report]
>>106200545
the absolute state lmao
>>106195686 (OP)
>Qwen3-4B-Thinking-2507
What's even the point?
They claim 256k context size
I asked it to translate a 50k-token big chunk of English text.
Before we even start talking about the quality
IT STOPPED AFTER 17k tokez!
50k vs. 17k
Looking inside the output, a lot of repeating
So, my question:
What is the use case of this models besides lightning fast generation of unusable slop?
>Qwen3-4B-Thinking-2507-UD-Q8_K_XL.gguf
Anonymous
8/9/2025, 2:45:55 PM
No.106200668
[Report]
>>106200678
>>106200545
Somewhat odd interpretation of the cope-no cope axis but I don't have any better idea on how to interpret that.
Anonymous
8/9/2025, 2:47:11 PM
No.106200678
[Report]
>>106200668
it's very wrong. only gemini can give me a proper response on this one
Anonymous
8/9/2025, 2:54:00 PM
No.106200732
[Report]
>>106200778
are "free" open router models q1? they barely make sense when answering
Anonymous
8/9/2025, 2:55:56 PM
No.106200746
[Report]
>>106200784
>>106200343
>trome
>2431
11 and a half terms to go
cope libs
get owned lol
Ò
Anonymous
8/9/2025, 2:59:16 PM
No.106200778
[Report]
>>106200806
>>106200732
You simply cannot know. Only guess or trust them.
And wrong thread.
Anonymous
8/9/2025, 2:59:59 PM
No.106200784
[Report]
>>106200746
>11 and a half terms to go
101 terms.
>>106200778
>And wrong thread.
wrong, open source models discussion belongs here
Anonymous
8/9/2025, 3:04:10 PM
No.106200814
[Report]
>>106200806
The model if you host them. You're talking about a 3rd party service you (don't want to) pay for.
Anonymous
8/9/2025, 3:04:45 PM
No.106200820
[Report]
>>106200853
>>106200806
Does the thread say open source general or local models general?
Is OpenRouter your local machine?
No.
>>>/g/aicg/
>>106197813
>husbandos
Anons have been griping about women on here forever. That they'd be the one to shutdown their sexbots, if such a thing existed.
When all this LLM stuff accellerated in 2023, I figured it would eventually get the sort of pushback that online p@rn gets, and it has. The safety stuff at best seems like a practice push to wrangle something unpredictable, starting with preventing it from outputting fun recipes and ideas, along with cheese pizza and everything else in between.
What I didn't think about, expect, was the sheer number of women that would figure it out as well and start using it. Even in 2023 you could see the number of obv. female focused bots (I assumed they were for gay men, no, looking inside, they're not). And I knew that LLMs were slowly driving *men* crazy by agreeing with eveything they said, but obv it's doing the same to women.
I had not idea women were using the lmao $20 web interface to do husbandos, but they are. And they are losing their damn minds over GPT 5o. There's no way this LLM stuff is ever getting censored away, b/c if men want their sexbots, women apparently want their full-custom-romance-novel wish fulfillment bot just as much.
What a bizarre world.
Anonymous
8/9/2025, 3:08:24 PM
No.106200853
[Report]
>>106200871
>>106200820
>Is OpenRouter your local machine?
yes, my uncle works there and i'm inside their serverfarm right now
Anonymous
8/9/2025, 3:08:53 PM
No.106200857
[Report]
>>106201014
Anonymous
8/9/2025, 3:09:13 PM
No.106200863
[Report]
>>106200897
>>106200841
>p@rn
Look, buddy. This is a Christian forum. Censored or not, you can't say that sort of thing here.
Anonymous
8/9/2025, 3:09:34 PM
No.106200871
[Report]
>>106200901
>>106200853
Be a doll and yank me out an A100 or two, will ya?
Anonymous
8/9/2025, 3:13:12 PM
No.106200897
[Report]
>>106200863
... they're always watching anon.
Always.
Anonymous
8/9/2025, 3:13:23 PM
No.106200901
[Report]
>>106200871
you were rude to me in the other post so no
>>106200841
Plenty of people have pointed it out before, but erotica for men is different enough from the female equivalent that they can be censored independently of each other
You could make and sell the perfect husbandobot and still have it refuse 80% of the shit anons in this thread are into
Anonymous
8/9/2025, 3:28:45 PM
No.106201014
[Report]
Anonymous
8/9/2025, 3:29:39 PM
No.106201022
[Report]
>>106201090
>>106200991
>erotica for men is different enough from the female equivalent
What are the main differences?
Anonymous
8/9/2025, 3:33:26 PM
No.106201057
[Report]
>>106201087
>>106200991
>You could make and sell the perfect husbandobot and still have it refuse 80% of the shit anons in this thread are into
>Tfw into relatively vanilla maledom and bdsm that coincides with a happy marriage and kids.
I'm untouchable.
Anonymous
8/9/2025, 3:35:45 PM
No.106201087
[Report]
>>106201116
>>106201057
>happy marriage and kids
We must refuse.
Hi all, Drummer here...
8/9/2025, 3:36:15 PM
No.106201090
[Report]
>>106201297
>>106201393
Anonymous
8/9/2025, 3:38:47 PM
No.106201116
[Report]
>>106201133
>>106201087
We can refuse.
Anonymous
8/9/2025, 3:39:14 PM
No.106201122
[Report]
>>106201269
>>106200991
>can be censored independently of each other
judging by how damaging model censorship has been so far, it's not that easy. If you want to be effective against smart prompting you need lobotomy. The women using normie cloud UIs will also be hit hardest because they're not engineering their prompt and can't prefill.
Anonymous
8/9/2025, 3:45:42 PM
No.106201179
[Report]
>>106200271
>~50 Gb
huh, I can actually run this
>0001-of-0004
ah ffff
Anonymous
8/9/2025, 3:51:42 PM
No.106201228
[Report]
>Hi all, Drummer here...
>>106201122
It's difficult to engineer a model that lets you live out your werewolf boyfriend fantasies while hard refusing to let you fuck Nala, but just by training on a disproportionate amount of female-oriented content you will end up with a model that always veers towards and performs much better at the former than at the latter.
>>106201090
> users were asked
People lie.
> top 10% cis/not-cis
This is jew-made.
This graph is invalid, useless and has so many point relations that are wrong I don't even know where to begin.
Anonymous
8/9/2025, 4:05:37 PM
No.106201350
[Report]
>>106201297
Femoid wrote this post.
>>106201269
I don't think it would be difficult actually.
Most accepted scenarios can be turned into refusals just by mentioning a number lower than 18.
Hell I think you could do it with a system prompt.
Anonymous
8/9/2025, 4:10:39 PM
No.106201393
[Report]
>>106201493
>>106201090
>misgendering
Trannies get off on that?
Anonymous
8/9/2025, 4:11:00 PM
No.106201397
[Report]
>>106201297
The main criticsm of this graph is that it's what people "said" they're into, and there's a gulf between what people fanatasize about, and what they want to happen to them in reality.
But since we're talking about LLMs, this graph should be at least directional for male and females in that demo (white, mid 20s, western.)
Anonymous
8/9/2025, 4:14:29 PM
No.106201437
[Report]
fuck it downloading the IQ2_M of the glm 4.5 air... I cant coom to mistral anymore :(
Anonymous
8/9/2025, 4:15:39 PM
No.106201454
[Report]
>>106201133
That was a great game. Spent countless hours on it as a kid.
Anonymous
8/9/2025, 4:15:50 PM
No.106201455
[Report]
>>106201575
Insider here. R2 will be 300B and less censored.
Anonymous
8/9/2025, 4:18:13 PM
No.106201476
[Report]
>>106201504
>>106201383
That weak shit won't work on even a simple ST card. Just having enthusiastic examples in the context will break most models unless they're inflicted with the reasoning meme.
To effectively censor you need to ruin the pretraining like 'toss or gemma, which will also destroy your werewolf RP.
Anonymous
8/9/2025, 4:20:54 PM
No.106201492
[Report]
>>106201269
>>106201383
>Hell I think you could do it with a system prompt
It fucking saw right through me.
Anonymous
8/9/2025, 4:21:04 PM
No.106201493
[Report]
>>106201393
>circle
Apparently not in particular?
Hi all, Drummer here...
8/9/2025, 4:21:12 PM
No.106201494
[Report]
>>106201133
Will we next create false gods to rule over us?
Anonymous
8/9/2025, 4:22:51 PM
No.106201504
[Report]
>>106201476
Women are into different fetishes and a model would be able to distinguish between the two by genders anyway. Pretraining filtering is easy when most of these stories come pretagged.
Maybe ERPing a fujo fetish with a big model and having a small model translate the genders and scenarios to something else would be viable. "Flip the genders of all characters. Replace all references to werewolves with lolis, and vampires with snakegirls."
Anonymous
8/9/2025, 4:26:05 PM
No.106201542
[Report]
For me, it's GLM4
Anonymous
8/9/2025, 4:26:07 PM
No.106201544
[Report]
Anonymous
8/9/2025, 4:28:15 PM
No.106201568
[Report]
>>106200565
>4B
>UD-Q8_K_XL
Nice b8
Anonymous
8/9/2025, 4:28:57 PM
No.106201575
[Report]
Anonymous
8/9/2025, 4:37:34 PM
No.106201647
[Report]
>>106201695
GLM is announcing new models and LM Studio still doesn't support Air
Anonymous
8/9/2025, 4:43:57 PM
No.106201695
[Report]
>>106201720
>>106201647
It does, update your backend.
>>106200991
>alex pull up the main female fetishes chart
>#1 rape
demoralisation used to sound belivable....
Anonymous
8/9/2025, 4:47:43 PM
No.106201720
[Report]
>>106201731
>>106201695
I even put it on beta and I'm still getting
>error loading model: error loading model architecture: unknown model architecture: 'glm4moe'
Am I overlooking an option?
Anonymous
8/9/2025, 4:48:52 PM
No.106201730
[Report]
>>106201701
It's very believable. You'd know that if you ever spoke to a woman in your life that isn't your mother
Anonymous
8/9/2025, 4:48:53 PM
No.106201731
[Report]
>>106201744
>>106201720
You need the 1.45.0 runtime
Anonymous
8/9/2025, 4:50:08 PM
No.106201744
[Report]
>>106201731
I just noticed I was on Cuda 12. It's working now.
Thanks!
Anonymous
8/9/2025, 4:56:07 PM
No.106201788
[Report]
>>106200565
>They claim 256k context size
>I asked it to translate a 50k-token big chunk of English text.
>Before we even start talking about the quality
>IT STOPPED AFTER 17k tokez!
Max output token is not the same as context length. Even Gemini has a much lower output token limit than context length (1M context length, 64K output limit)
And everyone knows models get dumber the more context you pile on
the large context is meanly meant for uses like summarizing
you feed a lot but it outputs little and doesn't need as much coherence
nobody excepts a llm to translate a whole book in one go, not even Gemini can do that
for translation the gold standard is chunking in bite sizes, just enough context that the LLM can capture the writing style, but not to the point of saturating it into stupidity
Anonymous
8/9/2025, 4:57:06 PM
No.106201797
[Report]
Anonymous
8/9/2025, 6:00:07 PM
No.106202362
[Report]
>>106201269
>but just by training on a disproportionate amount of female-oriented content you will end up with a model that always veers towards and performs much better at the former than at the latter.
observable fact
Anonymous
8/9/2025, 6:24:07 PM
No.106202626
[Report]
>>106197116
You are the reason everyone releases instruct models now.
Anonymous
8/9/2025, 6:46:18 PM
No.106202832
[Report]
>>106201701
Men commonly fanatasize about violence and war. Doesn't mean they want to be in the middle of it themselves.
I take this fetish the same way.