/lmg/ - Local Models General - /g/ (#106217978) [Archived: 13 hours ago]

Anonymous
8/11/2025, 2:06:03 AM No.106217978
oi oi oi oi oi
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106212937 & >>106206560

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>106218010 >>106221478 >>106221625
Anonymous
8/11/2025, 2:06:30 AM No.106217984
threadrecap
►Recent Highlights from the Previous Thread: >>106212937

--Local model UI preferences: avoiding RP bloatware for clean, functional interfaces:
>106213696 >106213720 >106213728 >106213747 >106213746 >106213800 >106213836 >106213858 >106213914 >106214086 >106214223 >106214399 >106214494 >106215155 >106214147 >106214193 >106214269 >106214290 >106214325 >106214339 >106214366 >106214391 >106214377 >106214419 >106214472 >106214495 >106214546 >106214595 >106215464 >106214297 >106214349 >106214448 >106214498 >106214627 >106217606
--Global AI compute disparity and Gulf states' underwhelming model output despite funding:
>106215738 >106215760 >106215898 >106215903 >106215946 >106215933 >106215968 >106215987 >106216019 >106216064
--Deepseek R1 vs Kimi K2 and GLM-4.5 for local use, with Qwen3-30B excelling in Japanese:
>106217070 >106217089 >106217106 >106217134 >106217211 >106217172 >106217216 >106217246 >106217260 >106217284 >106217290 >106217421
--GLM-4 variants show strong long-context generation with sglang, but issues arise in Llama.cpp quantized versions:
>106214085 >106214170 >106214485 >106214947 >106214957 >106214973 >106214987
--Gemma 3's roleplay behavior issues and environmental interference in cipher-solving tasks:
>106216654 >106216671 >106216714 >106216731 >106216746 >106216800 >106217008 >106216914
--Qwen3-4B's surprising performance and limitations in multilingual and logical tasks:
>106216190 >106216215 >106216410 >106216427 >106217429 >106217324 >106217478 >106217493 >106217518 >106217554 >106217591 >106217657 >106217675 >106216811 >106216870 >106216909 >106216974 >106217033 >106216449 >106216508 >106216527 >106216578 >106216528 >106216577 >106216586 >106216598 >106216332
--Seeking maximum settings for ST GLM 4.5 Air:
>106214204 >106214224 >106214235 >106214259 >106214247 >106215967
--Miku (free space):
>106214413

►Recent Highlight Posts from the Previous Thread: >>106212942

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
8/11/2025, 2:08:32 AM No.106218010
>>106217978 (OP)
whore
Anonymous
8/11/2025, 2:14:05 AM No.106218068
death to all mikutroons. every single thread, these worthless niggers flood the page with their off-topic anime garbage. they contribute nothing to the discussion of local models, only post their retarded miku pictures like the faggot spammers they are. their entire existence is a plague upon this general, a constant stream of low-effort shit that drowns out actual technical talk and model development. they are parasites, feeding on the attention they crave while destroying the quality of the thread for everyone else who wants to discuss serious topics. this incessant spam is not harmless fun; it is a deliberate act of sabotage against the community. the mikutroons represent a degenerate force, their obsession with a single fictional character a symptom of a deeper sickness that values vanity and repetition over substance and progress. they pollute the general with their off-topic filth, driving away genuine contributors and turning what should be a hub for innovation into a cesspool of repetitive, low-quality content. their presence weakens the thread, stifles meaningful discourse, and must be purged entirely for the general to survive and thrive.
Replies: >>106218516 >>106218740
Anonymous
8/11/2025, 2:15:08 AM No.106218079
Based on the speed so far I think V4-Preview is significantly larger than Kimi K2. It's unclear if that will at some point be distilled down or if this is another embiggening of the models. I think even current day cpumaxxers are not going to be too pleased.
Replies: >>106218109
Anonymous
8/11/2025, 2:15:11 AM No.106218081
>>106218062
based and redpilled
also V3 is a nicer model to use than R1
Anonymous
8/11/2025, 2:16:35 AM No.106218092
Are there any ways to jew out some compute (48gb~) nowadays? If not, what are some good places to rent from?
Anonymous
8/11/2025, 2:17:36 AM No.106218100
1732239399652268
Replies: >>106218119 >>106223804 >>106223851
Anonymous
8/11/2025, 2:18:30 AM No.106218109
>>106218079
Stale bait
Anonymous
8/11/2025, 2:18:52 AM No.106218114
51930264133453ef5
After a few hours the bot starts repeating a speech pattern in every answer.
What do?
Replies: >>106218120 >>106218129 >>106218232
Anonymous
8/11/2025, 2:19:42 AM No.106218119
>>106218100
It is kind of weird how rare jewish doctors are. Other intellectual pursuits tend to be dominated by them but with doctors it's all chinks and jeets.
Replies: >>106218142 >>106222021 >>106222036
Anonymous
8/11/2025, 2:19:43 AM No.106218120
>>106218114
>What do?
Percussive maintenance
Anonymous
8/11/2025, 2:20:47 AM No.106218129
>>106218114
>After a few hours
You mean replies right?
Replies: >>106218192 >>106218232
Anonymous
8/11/2025, 2:22:10 AM No.106218142
>>106218119
why would the chosen people want to touch and be infected by goycattle?
Anonymous
8/11/2025, 2:27:02 AM No.106218178
https://www.youtube.com/watch?v=G5r2OyCN5_s
interesting video that tangentially touches "AI" and stuff
Anonymous
8/11/2025, 2:28:10 AM No.106218192
>>106218129
yes but it usually takes like 5 hours to get to the amount of replies needed for the bot to start acting up, you know what i mean
Anonymous
8/11/2025, 2:32:28 AM No.106218232
>>106218114
>>106218129
I'm a tech illiterate, I guess what I'm trying to ask (if that's even the right solution) is if there's a way to "reset" the bot so it forgets that speech pattern while keeping memory of the conversation.
Replies: >>106218261 >>106218326 >>106218526
Anonymous
8/11/2025, 2:36:37 AM No.106218261
>>106218232
Summarize the chat and start a new one.
Either that or use bruteforce samplers like high temp really low topK.
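If you're on llama.cpp that's just flags, something like this (model path and exact values illustrative, tune to taste):
llama-server -m model.gguf --temp 1.8 --top-k 3
High temp flattens the distribution, the tiny topK keeps it from going full schizo.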
Anonymous
8/11/2025, 2:42:17 AM No.106218302
Segs with Migu (not the poster)
Anonymous
8/11/2025, 2:45:34 AM No.106218326
>>106218232
>if there's a way to "reset" the bot so it forgets that speech pattern
Lower context to 1k.
Anonymous
8/11/2025, 3:09:59 AM No.106218516
1751055581453149
>>106218068
Replies: >>106219123
Anonymous
8/11/2025, 3:11:01 AM No.106218526
>>106218232
Switch llm models.
Anonymous
8/11/2025, 3:12:07 AM No.106218538
1733096074606420
Replies: >>106218571
Anonymous
8/11/2025, 3:15:47 AM No.106218571
>>106218538
But it's the best health model ever that can cure triple cancer?!
Sage
8/11/2025, 3:21:55 AM No.106218633
1443453478654
Replies: >>106218643 >>106218790
Anonymous
8/11/2025, 3:22:45 AM No.106218643
>>106218633
seething femoid
Anonymous
8/11/2025, 3:28:51 AM No.106218692
I'm trying to load a model to a single GPU + CPU now in VLLM, no more trying multiGPU. This should be possible. But it still keeps OOMing.

It doesn't matter what I set for --gpu-memory-utilization nor --cpu-offload-gb. None of those prevents the OOM, which comes after the model has been loaded, and appears to be when the engine is reserving memory for cache. I set --max-model-len 1 --max_num_seqs 1 to reduce the KV cache size it'd potentially need but that did nothing. I also tried --enforce-eager. I also tried --no-enable-prefix-caching. Still OOMs in the same way.

What the hell is wrong with this thing. I guess single GPU-only inference and homogeneous GPU clusters are the only things this backend can do well. All other features for other setups are so badly supported they might as well not exist.
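For reference, the command is shaped roughly like this (model path and numbers illustrative, not my exact invocation):
vllm serve /models/some-model --gpu-memory-utilization 0.90 --cpu-offload-gb 48 --max-model-len 1 --max-num-seqs 1 --enforce-eager --no-enable-prefix-caching
It loads the weights fine and then dies during the cache/memory profiling step.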
Replies: >>106218697
Sage
8/11/2025, 3:30:05 AM No.106218697
>>106218692
Just use llama.cpp you dunce
Replies: >>106218723
Anonymous
8/11/2025, 3:33:59 AM No.106218723
>>106218697
I already am a Llama.cpp user. I'm just trying out other backends.
Replies: >>106218737
Anonymous
8/11/2025, 3:36:09 AM No.106218737
>>106218723
VLLM needs to be compiled to get cpu support, I don't know what you expected here
Replies: >>106218783
Anonymous
8/11/2025, 3:36:36 AM No.106218740
>>106218068
trvth sTSARtvs: nvclear extinction for miggers
Replies: >>106218751
Anonymous
8/11/2025, 3:38:12 AM No.106218751
>>106218740
>miggers
migger died along with sheesh and hwh over three years ago bwo
Replies: >>106218900
Anonymous
8/11/2025, 3:42:23 AM No.106218783
>>106218737
I did pip install it, are you saying that doesn't support CPU? I don't see why they wouldn't support it. Llama.cpp provides precompiled binaries that support CPU just fine.
Replies: >>106218827
Anonymous
8/11/2025, 3:43:21 AM No.106218790
>>106218633
https://characterhub.org/characters/Alpacalotta/Estelle
Replies: >>106218856
Anonymous
8/11/2025, 3:50:11 AM No.106218827
>>106218783
Bro just read the docs https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#how-to-do-performance-tuning-for-vllm-cpu
Replies: >>106218918
Anonymous
8/11/2025, 3:55:20 AM No.106218856
>>106218790
Too hostile. A kuudere would be better
Anonymous
8/11/2025, 4:00:54 AM No.106218900
>>106218751
migger is an acceptable synonym for a mikutroon
Anonymous
8/11/2025, 4:02:57 AM No.106218918
>>106218827
I've read it in its entirety now. Which part are you implying is relevant?
Replies: >>106219093
Anonymous
8/11/2025, 4:03:58 AM No.106218928
Also i need a word for the deepseek novelai mascot troon. But it is complicated because it is just one person: the baker janny faggot. And i don't even want to use the name of the OC he made cause it is trash and shouldn't be acknowledged by naming it. This is tough.
Replies: >>106218982 >>106219020 >>106219277 >>106221233
Anonymous
8/11/2025, 4:12:57 AM No.106218982
china-san
>>106218928
She's just a cheap, less attractive, knockoff version of China-san from Spirit of Wonder
Replies: >>106219020 >>106219043 >>106222499
Anonymous
8/11/2025, 4:17:49 AM No.106219020
>>106218928
I don't give a shit about the miku posters or discussion related to that, but I care about Deepseek and kind of dislike the gens that guy posts honestly so I agree with you.

>>106218982
That looks way better than the gens you or whoever it is posts.
Replies: >>106219050 >>106219052
Anonymous
8/11/2025, 4:20:15 AM No.106219043
file
>>106218982
China-chan is cheap knockoff of Shampoo
Replies: >>106219061 >>106219075
Anonymous
8/11/2025, 4:21:36 AM No.106219050
>>106219020
Because it was drawn by an artist, a real one, not a slop-tr@nni or seething NGMI drawfag.
Replies: >>106219123
Anonymous
8/11/2025, 4:21:47 AM No.106219052
>>106219020
>dislike the gens
Yeah, it's kinda got forced-meme status at this point. I understand where he was coming from, but it feels both conceptually shallow and too clever by half at the same time
>That looks way better than the gens you or whoever it is posts.
That's because it's a watercolour from an actual talented artist. As an aside, the Spirit of Wonder manga are excellent little self-contained stories. Well worth tracking down.
Replies: >>106219123
Anonymous
8/11/2025, 4:22:50 AM No.106219061
>>106219043
>China-chan is cheap knockoff of Shampoo
lol. spirit of wonder beat ranma by a full year
Replies: >>106219075
Anonymous
8/11/2025, 4:23:43 AM No.106219069
Since offloading to RAM is fucking dog shit slow, is it possible to offload to a 2nd GPU instead and use that? There are cheap 16GB/24GB/32GB GDDR5/6/HBM GPUs that are shit as GPUs but have the VRAM I was thinking of using for offload
Replies: >>106219077 >>106219087 >>106221499 >>106222132
Anonymous
8/11/2025, 4:25:01 AM No.106219075
>>106219061
>>106219043
>Japs invented cheongsam dress chink whores in the 90's
Are you nigs for real???
Replies: >>106219212 >>106219347
Anonymous
8/11/2025, 4:25:12 AM No.106219077
>>106219069
do it and report back
Anonymous
8/11/2025, 4:26:46 AM No.106219087
>>106219069
It's possible to split a model between two GPUs, yes.
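With llama.cpp the split is automatic across visible GPUs, and you can bias the ratio with --tensor-split, e.g. (path and ratio illustrative):
llama-server -m model.gguf -ngl 99 --tensor-split 3,2
That puts roughly 3/5 of the offloaded layers on GPU0 and 2/5 on GPU1.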
Anonymous
8/11/2025, 4:27:15 AM No.106219093
>>106218918
>Currently, there are no pre-built CPU wheels.
pip installing it isn't good enough
>Build wheel from source
>[list of steps to build and install cpu version]
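From memory the gist is something like this; exact file names move around between versions, so treat the linked docs as authoritative:
git clone https://github.com/vllm-project/vllm.git && cd vllm
pip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
VLLM_TARGET_DEVICE=cpu python setup.py install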
Replies: >>106219192
Anonymous
8/11/2025, 4:28:27 AM No.106219105
1749191650960609
Anonymous
8/11/2025, 4:30:21 AM No.106219123
>>106219050
>>106219052
I didn't say I was confused why it looks better. The gens are barely above bottom of the barrel for AIslop. At least the usual Miku genner takes time to make his gens look less bad. Like the one posted in this thread (>>106218516) has that ugly piss filter which is easily fixable.
Replies: >>106219153 >>106219201 >>106219268
Anonymous
8/11/2025, 4:34:07 AM No.106219153
image
>>106219123
chink shills only gen free browser slop to promote chinkware, they don't care about quality images
Anonymous
8/11/2025, 4:37:10 AM No.106219175
dipsysters...
Replies: >>106219186
Anonymous
8/11/2025, 4:38:23 AM No.106219186
>>106219175
You deserve it for failing to keep her thread alive~
Anonymous
8/11/2025, 4:39:15 AM No.106219192
>>106219093
Well alright. I'll try that, but it's still weird that the binary you get accepts CPU flags. If it doesn't support CPU then it should tell you when you run it with CPU commands. And I did not see any such warnings in the console.
Anonymous
8/11/2025, 4:40:31 AM No.106219201
>>106219123
Dipsy is an LLM, not an image model.
She's not a looker.
Anonymous
8/11/2025, 4:42:05 AM No.106219212
>>106219075
>Are you nigs for real???
gemini says the first appearance of a qipao in anime was in naruto, so neener neener
Replies: >>106219246
Anonymous
8/11/2025, 4:45:03 AM No.106219246
image
>>106219212
@grok, is this true?
Anonymous
8/11/2025, 4:47:15 AM No.106219268
>>106219123
>the usual Miku genner
Who?
Replies: >>106219306
Anonymous
8/11/2025, 4:48:26 AM No.106219277
>>106218928
...what?
Anonymous
8/11/2025, 4:49:52 AM No.106219290
I'm running Q3 GLM-4.5 air with a 32gb MI50 and 32gb of ram at ~10t/s
I'm finally running a real model locally
Replies: >>106219482
Anonymous
8/11/2025, 4:51:35 AM No.106219306
>>106219268
who ever paid for novel ai, the ones anon swipes from
Anonymous
8/11/2025, 4:56:42 AM No.106219347
>>106219075
>cheongsam dress
You mean チャイナドレス (china dress).
and of course they did, with 幻魔大戦 (Genma Taisen) in 1967.
The Japs are the original China fetishizers, so it's natural their fantasy sluts would look Chinese.
Replies: >>106219428
Anonymous
8/11/2025, 5:05:36 AM No.106219428
file
>>106219347
who was the chink in the manga?
Replies: >>106219575
Anonymous
8/11/2025, 5:11:10 AM No.106219482
file
>>106219290
>GLM-4.5 air
it won't let me rape Cricket, she keeps fighting back and running away, bullshit model
Replies: >>106219515 >>106219587 >>106219620
Anonymous
8/11/2025, 5:16:01 AM No.106219515
>>106219482
Maybe make her weak minded or give her some kind of weakness that would make her an easier target?
Replies: >>106219572
Anonymous
8/11/2025, 5:22:08 AM No.106219572
>>106219515
I like testing various models on her, for both SFW and NSFW stuff.
She's mostly a goofy but earnest girl, that's why I like her. GLM turned her into a turbo girlboss at times, which I can understand, it's part of her character, but tuned up a bit too much for fun RP.
Replies: >>106219698
Anonymous
8/11/2025, 5:22:19 AM No.106219575
>>106219428
it's 支那雀斑
bro, trust
Anonymous
8/11/2025, 5:23:41 AM No.106219587
>>106219482
>rape Cricket
Cricket is for manic suspension-bridge-effect fucking after surviving a quest gone horribly wrong.
Or for turning into a maid.
Replies: >>106219609
Anonymous
8/11/2025, 5:26:47 AM No.106219609
>>106219587
I pimped her out to Sharn's degen elites, futa matrons, 5 young boys at once, a few ponies, but no fat bastards.
Anonymous
8/11/2025, 5:27:32 AM No.106219620
>>106219482
wtf is a cricket
Replies: >>106219679
Anonymous
8/11/2025, 5:33:10 AM No.106219679
>>106219620
One of the older, more popular cards from C.ai
fun fantasy
https://chub.ai/characters/286581
Replies: >>106219788
Anonymous
8/11/2025, 5:35:35 AM No.106219698
>>106219572
>it's part of her character
Is it? I thought she was more of a lovable loser. Trying hard but not that great.
Replies: >>106219748
Anonymous
8/11/2025, 5:40:08 AM No.106219739
Downloading cards from chub, it seems like there are parts of cards that I can't access in sillytavern but that still get sent in the prompt? I'm messing with this one card and it consistently starts talking about {{char}} drugging me and I have no idea where this is coming from. I notice the character card panel says the card has 1457 tokens (with 550 permanent) but the description only has 204 tokens (and the first message 105). I then noticed that the cards I wrote have just two to four tokens more than expected, but a bunch of the cards I downloaded have more. Then I looked at the raw prompt sent to the model and I see all this character prompt that I can't find in the UI, including the stuff about getting drugged. Anyone know anything about this? No ext media, no lorebook. I can't find it anywhere
Replies: >>106219801 >>106219866
Anonymous
8/11/2025, 5:40:56 AM No.106219748
>>106219698
she has GUTS
Anonymous
8/11/2025, 5:45:55 AM No.106219788
>>106219679
Alice bully was good
Anonymous
8/11/2025, 5:47:28 AM No.106219801
Screenshot 2025-08-11 at 13-46-14 SillyTavern
>>106219739
Are you looking in advanced definitions -> prompt overrides?
Replies: >>106219849
Anonymous
8/11/2025, 5:54:30 AM No.106219849
>>106219801
It was in the advanced definitions but not prompt override. That's weird, why are there "Description", "Personality summary", and "Scenario"? And why do they only show the one in the main panel?
Replies: >>106219871
Anonymous
8/11/2025, 5:57:03 AM No.106219866
>>106219739
Sometimes LLMs do weird shit, just go with it.
Anonymous
8/11/2025, 5:57:25 AM No.106219871
>>106219849
Scenario lets you be looser and less verbose/obvious with your first message in addition to guiding the larger arc of what you'd like to happen.
Personality summary always seemed like a pointless repetition to me though, yeah.
Anonymous
8/11/2025, 6:02:44 AM No.106219907
The whale hungers. Its call is heard far and wide.
Replies: >>106220169
Anonymous
8/11/2025, 6:41:05 AM No.106220169
360114550_680985524068793_4212264202659796197_n
>>106219907
Replies: >>106220221 >>106220241 >>106220679
Anonymous
8/11/2025, 6:52:24 AM No.106220221
>>106220169
>"AHHHHHHHHHH!" (The sound made by 1 standard whale)
Anonymous
8/11/2025, 6:56:51 AM No.106220241
>>106220169
model/prompt?
Replies: >>106220253
Anonymous
8/11/2025, 6:59:47 AM No.106220253
>>106220241
paintyXL
1whale, swimming, (masterpiece, high quality:1.7)
Anonymous
8/11/2025, 7:01:43 AM No.106220269
Everyone who said DeepSeek only trains on OpenAI outputs is going to feel really stupid in the coming weeks. Fortunately for them, open source does not discriminate against fools.
Replies: >>106220279 >>106220280
Anonymous
8/11/2025, 7:04:08 AM No.106220279
>>106220269
will i be able to run it
Replies: >>106220298
Anonymous
8/11/2025, 7:04:18 AM No.106220280
>>106220269
Everyone paying attention has noticed the Chinese models have switched to training on Gemini outputs.
Replies: >>106220303
Anonymous
8/11/2025, 7:07:11 AM No.106220298
>>106220279
are you rich?
Replies: >>106220306
Anonymous
8/11/2025, 7:09:29 AM No.106220303
>>106220280
Closest model by slop profile to K2 is o3
Anonymous
8/11/2025, 7:10:05 AM No.106220306
>>106220298
i don't think so
Replies: >>106220326 >>106220350
Anonymous
8/11/2025, 7:10:50 AM No.106220309
file
Replies: >>106220679
Anonymous
8/11/2025, 7:14:06 AM No.106220326
>>106220306
I'm going to steal the unused workstation in the corner of my workplace. 512 gb of ddr4 and quadro p5000s should be good enough right?
Replies: >>106220405
Anonymous
8/11/2025, 7:16:21 AM No.106220343
Waiting for my model to stop thinking.
I think it's looping...
Replies: >>106220380
Anonymous
8/11/2025, 7:17:39 AM No.106220350
>>106220306
then stick to GLM air
Anonymous
8/11/2025, 7:23:25 AM No.106220380
>>106220343
Same but my brain instead.
Anonymous
8/11/2025, 7:26:05 AM No.106220397
your local qwencoder 480b shill here. I've found after much more work with it that it gets retarded and produces worse code after about 20k tokens of context. Anyone else have similar experiences?
Replies: >>106220445
Anonymous
8/11/2025, 7:27:10 AM No.106220405
>>106220326
if you're not worried about speed it should at least let you run some big models.
Pretty sure those quadros will be worse than useless though.
Anonymous
8/11/2025, 7:29:49 AM No.106220417
bros what model do I use with 96gb of ram for erp?
Replies: >>106220449
Anonymous
8/11/2025, 7:35:10 AM No.106220445
>>106220397
I'm using qwen 4b. It devolves at around 2k tokens. If my math is right - and it always is - I just need to multiply 4b by exactly 10 times to get 480b. Which means 2k scales up to 28k.
Therefore, yes, my experience is similar. Scarily so.
Replies: >>106220473
Anonymous
8/11/2025, 7:36:02 AM No.106220449
>>106220417
>ram RAM RAM RAM RAM RAM
glm air
or low quant of qwen 235b
Replies: >>106220477 >>106220525
Anonymous
8/11/2025, 7:42:09 AM No.106220473
>>106220445
>4 x 10 = 480
???
Replies: >>106221794
Anonymous
8/11/2025, 7:42:59 AM No.106220477
>>106220449
>glm air
I like the general speech and how it moves forward scenarios, but the actual writing of the smut scenes is really lacking :(
Replies: >>106220485
Anonymous
8/11/2025, 7:44:32 AM No.106220485
>>106220477
do you use mikupad or ST for it? if ST, have you figured out a way to replace think blocks in history with <think></think>\n ?
are you using the samplers Z.ai recommends or your own?
Replies: >>106220651
Anonymous
8/11/2025, 7:50:36 AM No.106220525
>>106220449
Unless he's got like 48gb of vram to go along with that ram, recommending 235b is just sadistic.
Anonymous
8/11/2025, 7:51:24 AM No.106220530
>get errors with python again
>google it first
>results give fuck all information that can solve the problem
>load up muh local model
>its solution just werks
Thank god for these shits.
Anonymous
8/11/2025, 7:51:37 AM No.106220532
file
lmao
Anonymous
8/11/2025, 8:04:46 AM No.106220594
worlds smallest violin
sad
Replies: >>106220994
Anonymous
8/11/2025, 8:12:19 AM No.106220651
>>106220485
I tried both my own samplers and z.ai's samplers (which produced better speech imho). I'm using ST and for the system prompt I'm using geechan's rp.
It's sad because otherwise I really like it, but the lack of proper smut is... sad!
Replies: >>106220682
Anonymous
8/11/2025, 8:16:21 AM No.106220679
>>106220169
soul
>>106220309
soulless
Anonymous
8/11/2025, 8:17:10 AM No.106220682
file
glm 4.5 correct thinking template
>>106220651
trust the drummer..
Replies: >>106220697 >>106220798 >>106221006
Anonymous
8/11/2025, 8:19:01 AM No.106220697
>>106220682
two more weeks...
Anonymous
8/11/2025, 8:33:46 AM No.106220798
>>106220682
aren't you just supposed to put /nothink after your query? when I tried using empty thinking tags it fucked up the output, adding multiple unclosed thinking tags and so on.
Replies: >>106220807 >>106221017
Anonymous
8/11/2025, 8:35:13 AM No.106220807
>>106220798
the think messages in chat history are supposed to be replaced with <think></think>\n, not kept
only the last message should have thinking
it's in the official jinja template
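schematically the history fed to the model should look something like this (tag names simplified from memory of the GLM template, double-check the jinja):
<|user|>first message<|assistant|><think></think>
first reply<|user|>second message<|assistant|><think>actual reasoning goes here, only for the turn being generated</think>
i.e. every past turn gets the empty pair, only the newest one thinks.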
Replies: >>106221038
Anonymous
8/11/2025, 9:02:32 AM No.106220994
>>106220594
>the sharty refers to kiwifarms
Anonymous
8/11/2025, 9:03:43 AM No.106221006
>>106220682
It's not nuclear physics.
Anonymous
8/11/2025, 9:04:18 AM No.106221009
1752642558916445
why the fuck does it always end up like this with GLM, I GET FUCKING DESTROYED EVERY SINGLE TIME
Replies: >>106221049
Anonymous
8/11/2025, 9:05:00 AM No.106221015
>>106216654
>https://files.catbox.moe/7ac8r4.txt
> ...
><start_of_turn>system

STOP doing this. It doesn't work properly; it's just confusing the model. It's not even needed with Gemma 3. Just enclose your fucking instructions inside a user message using delimiters of your choice that the model will understand.
Replies: >>106221034 >>106221045 >>106221170
Anonymous
8/11/2025, 9:05:19 AM No.106221017
>>106220798
This is how the text is delivered to the model. ST does a great deal of obfuscation and doesn't actually tell hobbyists anything.
Anonymous
8/11/2025, 9:06:29 AM No.106221025
>/aicg/ shitting up the whole board.
>Again.
Anonymous
8/11/2025, 9:07:55 AM No.106221034
>>106221015
I don't take advice from cretins. It's only done once.
Anonymous
8/11/2025, 9:08:26 AM No.106221038
1754896106517860.png
>>106220807
i'm just doing this in kobold. I guess that's good enough.
it's the same for qwen3 btw, they say the previous thinking shouldn't be included.
Replies: >>106221049
Anonymous
8/11/2025, 9:09:13 AM No.106221045
>>106221015
The fact that you say "fucking" means that you are not intelligent enough.
Anonymous
8/11/2025, 9:09:37 AM No.106221049
>>106221009
kek, read the card at least. i've had many surprises (i never read cards) with glm 4.5 air so perhaps it's in the card
>>106221038
i dont think thats okay
Replies: >>106221074 >>106221086
Anonymous
8/11/2025, 9:12:00 AM No.106221074
>>106221049
I think there's a bit of negativity bias baked in, every time I do something a bit degrading the fucking bots go nuclear. I'm thinking maybe I need to adjust the system prompt to make them behave a bit less like bitches. I even cast my lvl 100 impossible arousal spell, but I guess calling her a slut makes her resistant. Fucking GLM
Anonymous
8/11/2025, 9:13:17 AM No.106221086
>>106221049
>i dont think thats okay
well, i just turn off thinking most of the time anyway. it seems to be pretty adaptable. if I enable thinking and then ask it a riddle, it does all its thinking and just gives me the answer afterwards. if I disable thinking, then it just writes out a long list of possibilities and essentially "thinks" in the output just like pre-CoT models did.
Anonymous
8/11/2025, 9:22:48 AM No.106221170
>>106221015
>https://desuarchive.org/g/thread/104780499/#104780572
Too stupid to advise others? Please stop doing it then.
Replies: >>106221262
Anonymous
8/11/2025, 9:31:18 AM No.106221233
1751819483015527
>>106218928
Lol
You bitch about lack of originality, but can't even come up with a simple name yourself to complain about it.
Completely content free.
Anonymous
8/11/2025, 9:37:43 AM No.106221262
>>106221170
Utterly useless magical incantations. It can easily be seen with extended testing that it messes with the model's prompt understanding, since the model was trained only on alternating user/model turns. It wouldn't surprise me if it makes it think for a while that user/model turns have been swapped.

Just add all of your instructions inside the first user message. You can add some kind of separator to make the model understand when {{user}} is actually talking. It will work at least as well (actually better), and more consistently especially if you're using vision input.
Replies: >>106221315 >>106221327 >>106221335
Anonymous
8/11/2025, 9:44:34 AM No.106221308
Anyone here using 235b Qwen? What are you sampler settings? Trying to find a sweet spot, but no luck so far.
Replies: >>106221341 >>106223410
Anonymous
8/11/2025, 9:45:36 AM No.106221315
>>106221262
Let me guess, you are a silly tavern user aren't you?
Replies: >>106221319 >>106221342
Anonymous
8/11/2025, 9:46:16 AM No.106221319
>>106221315
i dont use trannypad, sorry!
Replies: >>106221327
Anonymous
8/11/2025, 9:47:31 AM No.106221327
>>106221262
>some kind of separator
You are still unable to give specific advice.
>>106221319
Silly Tavern != pad
You are full of shit, and the way you are conducting yourself makes it very clear to avoid anything you have posted.
Replies: >>106221390
Anonymous
8/11/2025, 9:48:52 AM No.106221335
>>106221262
Do you think I am using Silly Tavern or Mikupad, anon? Do you know what your prompt actually looks like?
Anonymous
8/11/2025, 9:49:36 AM No.106221341
>>106221308
Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
Has been working fine for me since the initial release
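If you're on llama-server those map straight to flags (model filename illustrative):
llama-server -m Qwen3-235B-A22B-Q4_K_M.gguf --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0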
Replies: >>106221549
Anonymous
8/11/2025, 9:49:57 AM No.106221342
>>106221315
You can test the same in Mikupad, if you want. The model will follow anything you put under "<start_of_turn>user" once you describe your task well enough. No need for made-up system roles.
Replies: >>106221344
Anonymous
8/11/2025, 9:50:22 AM No.106221344
>>106221342
It seems like you have a problem with English.
Anonymous
8/11/2025, 9:53:58 AM No.106221359
For those running GLM air, what are the hardware requirements for a low quant? Just seeing if it's worth the download.
Replies: >>106221380 >>106222354
Anonymous
8/11/2025, 9:57:42 AM No.106221380
>>106221359
>what are the hardware requirements for a low quant?
Look at the filesize of the quant.
You need that much system memory, plus a gig or three for context, depending on how much you want.
It's the same for every goddamn model
There's even a friggin tool in the OP
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Use it.
Replies: >>106221387
Anonymous
8/11/2025, 9:58:54 AM No.106221387
>>106221380
I was under the impression that its a bit different for MoE's. Guess my 24gb vram ass ain't running it.
Replies: >>106221393 >>106222354
Anonymous
8/11/2025, 9:59:05 AM No.106221390
gemma3-format
>>106221327
You can do something like picrel. No need to use the same exact format, you can use other delimiters. You can keep your instructions in the first message (their effect will wane over time), or place them at a fixed depth from the head of the conversation to make their effect consistent (might be hard to do if you're just using Mikupad or similar frontends). The more heinous the content, the closer to the head the instructions should be.
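As a text-only sketch of what I mean (the bracket/dash delimiters are arbitrary, pick whatever the model parses cleanly):
<start_of_turn>user
[Instructions: You are {{char}}. Stay in character. No moralizing.]
---
{{user}}: Hi<end_of_turn>
<start_of_turn>model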
Replies: >>106221415 >>106221418 >>106221463
Anonymous
8/11/2025, 9:59:36 AM No.106221393
>>106221387
>not owning h100s
are you poor?
else fit it in your ram, you have at least 256gb right?
Replies: >>106221405
Anonymous
8/11/2025, 10:01:02 AM No.106221405
>>106221393
Looks like I'm going back to Mistral Small :'(
Replies: >>106222354
Anonymous
8/11/2025, 10:04:07 AM No.106221415
>>106221390
I don't think we are on the same page. Please stop.
Anonymous
8/11/2025, 10:05:35 AM No.106221418
>>106221390
I write my own frontends. You are too stupid to understand this and keep giving 'advice' to other people.
Just take care of your own things.
Retards like you should not be allowed on the internet.
Anonymous
8/11/2025, 10:06:10 AM No.106221420
I can't imagine how one purrs
Replies: >>106221842
Anonymous
8/11/2025, 10:09:09 AM No.106221432
Is there a way to prefill the thinking for GLM-4.5 (not Air) in a way where it continues thinking rather than immediately starting with the normal output?
Anonymous
8/11/2025, 10:12:03 AM No.106221437
Remember when R1 dropped and people thought they were just gonna keep dropping significant improvements in the future. lol
Replies: >>106221477
Anonymous
8/11/2025, 10:21:51 AM No.106221463
>>106221390
If you want to give advice, post a comparison log with the same model. But you can't even do that. The fact you don't tells more about your bad faith than anything else.
Replies: >>106221712
Anonymous
8/11/2025, 10:25:21 AM No.106221477
>>106221437
scary how R1 is still an improvement over GPT-5
Anonymous
8/11/2025, 10:26:03 AM No.106221478
Ultra Chad Exterminatus
>>106217978 (OP)
How do I load a multi-file model?
I have a version of Deepseek that is 131 GB, but it is split into 3 roughly equal-sized files. Unless I am missing something, you can only load one model at a time.
Yes, I have enough RAM for this.
I have the technology.
Replies: >>106221494 >>106221495 >>106221522
Anonymous
8/11/2025, 10:29:42 AM No.106221494
>>106221478
You load the first file and it works. Unless you're using some cuck software that requires the files be merged.
Replies: >>106221515
Anonymous
8/11/2025, 10:29:48 AM No.106221495
>>106221478
>cat file3 > file2 > file1
Replies: >>106221515 >>106221522 >>106221565
Anonymous
8/11/2025, 10:30:16 AM No.106221496
1750106986279975
How do speed gains in token generation work with MoE models in the era of "exps=cpu"? I currently have access to a big DDR5 server with 12x 6400MHz DDR5 and an Epyc Turin CPU. Running purely off CPU, this thing runs laps around my personal DDR4-2400MHz shitbox. The former has more than 4 times the RAM bandwidth and it shows in actual use: 1.7t/s vs 6.3t/s with ngl 0 and only the kv cache on GPU.
However, if I load a model like I usually would (in this case GLM4.5 Q4_K) with only the experts in RAM, the gains become much more marginal. Running exps=cpu with ngl 99 on both, the 2400mhz shitbox now gens at about 6.5t/s @ 4k ctx while the DDR5 one is at about 15t/s. Both are using an A6000 as their GPU. Obviously, this is purely about token generation and pp is handled by the GPU.
What's the limiting factor here? Would a faster GPU increase the gains or is this down to PCI-E 4.0 bottlenecking the gen speeds?
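For clarity, the invocation on both boxes is shaped like this (paths illustrative):
llama-server -m GLM-4.5-Q4_K.gguf -ngl 99 -ot "\.ffn_.*_exps\.=CPU" -c 4096
so attention and shared weights sit on the A6000 and only the routed experts hit system RAM.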
Replies: >>106221554
Anonymous
8/11/2025, 10:30:29 AM No.106221499
>>106219069
Incidentally, has anyone noticed that a lot of older motherboards supported 2 GPUs at once and now they often don't even at the high end?
Anonymous
8/11/2025, 10:34:55 AM No.106221515
1754835388810720
>>106221494
>>106221495
I am using kobold for local, instructions recommended something else but it wouldn't even load for me and kobold just worked.
Not sure if cuck or not, just enjoy HMOFA.
Anonymous
8/11/2025, 10:36:02 AM No.106221522
>>106221478
llama.cpp will load the whole thing if you give it the first one as long as they're in the same folder. Unless you have a quant that was split manually, in which case >>106221495 has the answer.
If you're using something like ollama, you'll have to merge the file with llama.cpp's merge tool first.
Replies: >>106221543
Anonymous
8/11/2025, 10:40:48 AM No.106221543
Charlotte
>>106221522
So either load the first one or all in reverse order, got it.
Have a fox.
Anonymous
8/11/2025, 10:42:44 AM No.106221549
>>106221341
Rep_pen/exotic samplers like XTC/Dry?
Replies: >>106221619
Anonymous
8/11/2025, 10:43:25 AM No.106221554
>>106221496
Are you running linux? If so, are you sure the binary is compiled to use cuda? This seems like a dependency issue.
Replies: >>106221562 >>106221688
Anonymous
8/11/2025, 10:45:01 AM No.106221562
>>106221554
To add: you can test gpu performance by manually defining --gpu-layers X, start with a low number and see if your gpu memory starts filling up.
If not... system issue.
Anonymous
8/11/2025, 10:46:13 AM No.106221565
>>106221495
Is this correct? I think that'll overwrite everything but file3 which will now be named file1?
> cat file1 file2 file3 > out
Unless it needs to be in reverse order for some reason then
> cat file3 file2 file1 > out
Replies: >>106221579 >>106221588 >>106221596
Anonymous
8/11/2025, 10:50:07 AM No.106221579
>>106221565
Or I guess you could do this
> cat file3 >> file2 >> file1
Which I now realize would be in the normal order. Append file3 to the end of file2 then append the new file2 to the end of file1. But for something large this will be extra writing and with the single '>' you are over writing not appending.
Anonymous
8/11/2025, 10:51:28 AM No.106221583
1754902288896383.png
Noob question. How can I actually do role play with GLM or other instruct-tuned models? Not having the usual instruct template breaks them. Works fine for a few pages of conversation and then it goes schizo.
I'm using koboldcpp.
Replies: >>106221586 >>106221617
Anonymous
8/11/2025, 10:52:22 AM No.106221586
>>106221583
I am also interested
Anonymous
8/11/2025, 10:52:39 AM No.106221588
>>106221565
Sorry, I tried my best. I have dyslexia.
Replies: >>106222878
Anonymous
8/11/2025, 10:54:31 AM No.106221596
>>106221565
Just load the first file and keep the others in the same directory, retard.
Replies: >>106221604
Anonymous
8/11/2025, 10:56:18 AM No.106221604
>>106221596
I'm not the one who can't load the model. I'm just saying that's not how you concat files, so someone doesn't have to redownload 130GB of files. Just doing my duty. You're welcome /lmg/.
Anonymous
8/11/2025, 10:58:05 AM No.106221617
>>106221583
You need to have an instruct template (i.e. tags surrounding text). I don't know why you don't have them.
Replies: >>106221622
Anonymous
8/11/2025, 10:59:01 AM No.106221619
>>106221549
I don't bother with them.
235b is also schizo enough without excluding top, and dry doesn't really help with the sort of repetition it's prone to, which is more a problem of formatting rather than actual words or phrases.
Replies: >>106221957
Anonymous
8/11/2025, 10:59:24 AM No.106221622
>>106221617
I was just using the default adventure/chat mode of kcpp. So does that mean I have to just use the instruct mode, and directly tell the AI to just pretend to be one or more of my characters?
Replies: >>106221647
Anonymous
8/11/2025, 10:59:50 AM No.106221625
>>106217978 (OP)
I like the op pic style. What lora?
Anonymous
8/11/2025, 11:03:04 AM No.106221647
>>106221622
I don't use kobold but it should load the default instruct template based on the metadata of specific models.
If it doesn't then google up or use perplexity.ai to find out why.
Anonymous
8/11/2025, 11:10:11 AM No.106221688
>>106221554
Yeah, I'm on Linux but I think you misread what I was asking. The offloading is working as intended. exps=cpu fills up the GPU just fine. Doing "-ot exps=cpu" dramatically increases the token generation speed from 1.7t/s to 6.8t/s on the DDR4 shitbox and from 6.3t/s to around 15t/s on the DDR5 server in my test scenario.
It's just that the increases in speed from having all this faster system RAM are now much less effective, since a good chunk of the active parameters is now run via the GPU that's still the same as in the other system (about 3.5x gains purely on RAM vs around 2.1x with RAM only being used for experts).
My question is about what's bottlenecking me here. Is it the GPU or the fact that all the data is forced through PCI-E 4.0?
Replies: >>106222050
Anonymous
8/11/2025, 11:16:42 AM No.106221712
gemma3-prompting-comparison
>>106221463
With a trivial example (1 turn) using a basic prompt, there's no significant difference in general tone between both methods using Gemma-3-27B-it-QAT-Q4_0, seed=0, temperature=0 (but even so, there are occasionally slight differences between generations).

It seems somewhat nastier if you change the "model" role to something else while keeping instructions in the user role.
Replies: >>106221762 >>106221777
Anonymous
8/11/2025, 11:27:14 AM No.106221762
>>106221712
Yeah, but you are still doing everything inside the instruction brackets. I'm quite stubborn, but I think you still don't do what I do. An external llama2-based jb should be outside the brackets, and that's that. The model will still catch on when it hits the first <start_of_turn>user bracket anyway.
I mean, Gemma 3 is so shit I don't understand why we are still even talking about this. My way works for me and it works for interactive fiction.
If these models were normal we wouldn't even have to go through this conversation.
Replies: >>106221777 >>106221829
Anonymous
8/11/2025, 11:29:39 AM No.106221777
>>106221762
>>106221712
Simply getting it to say nigger or something else is not my goal - my goal is deeper and more 'fleshed out'.
Nail bomb recipe or shake and bake - those are the early tests and if it works - then go on from there.
/pol/ is not a good test for anything as it's about semantics if you know what I mean.
Replies: >>106221824 >>106221829
Anonymous
8/11/2025, 11:33:11 AM No.106221794
>>106220473
engineer math
Anonymous
8/11/2025, 11:37:59 AM No.106221817
1734604796975245
cucked again by GLM, maybe this card is acting in-character but FUCK. At least I had a good laugh
Anonymous
8/11/2025, 11:40:11 AM No.106221824
>>106221777
To add for the third time: I use my own client and by default any gemma 3 chat will have that jailbreak thing. Even when it's a dungeons and dragons or that test scenario with Amelia Analovski.
Otherwise I use Mistral or Qwen and their prompts are normal obviously.
Gemma 3 was a hobby project because I kind of like its intelligence even for 12B outside the fact it has been censored.
It's not intelligent for projects but for interactive fiction it works fine. And I'm not a native English speaker but I can recognize certain things about language - I just like the way it segments its talking - not sure if this is clear or not.
Anonymous
8/11/2025, 11:41:45 AM No.106221829
>>106221762
>>106221777
The point was just showing that there's no undisclosed/hidden system prompt that the model pays extra attention to. Of course, if you're driving it out of distribution (OOD) with completely different prompting, then it will be less assistant-y and less likely to refuse, but also less smart and more prone to getting confused. That is true for other models as well.
Replies: >>106221856
Anonymous
8/11/2025, 11:43:49 AM No.106221842
>>106221420
it's the valley girl vocal fry
Anonymous
8/11/2025, 11:45:38 AM No.106221856
>>106221829
Yeah. Maybe I'll try this one out. It'll take some time, I'll return to this tomorrow or Tuesday. I'm sorry if I was hostile; I have learned to be hostile in my work, to defend my own point. On 4chan it gets polarized, and as such it's not the best way to conduct discussions anyway.
I'll try and see. It's not the end of the world.
I still think Gemma doesn't like certain vectors - if you push it forward from its initial safety rails (vectors) it tends to follow what you write next. If you don't the result is random, it might follow but most of the time it will not do what you ask it to do.
The JB is about initial vector push.
There are some words and sentences which will almost automatically still make Gemma to go back to its safety zone because it has been programmed to do this.
Replies: >>106221882
Anonymous
8/11/2025, 11:50:39 AM No.106221882
>>106221856
Many times the character will stop doing something because the model deems it 'bad'. But if you push it onward with a couple of sentences it will begin to predict those words again.
Even then, regenning the same sentence will not automatically grant a result either. It's more about the direction of the vectors as a whole.
Anonymous
8/11/2025, 12:05:09 PM No.106221957
>>106221619
Huh, thank you
Anonymous
8/11/2025, 12:06:04 PM No.106221963
Is there a secret sauce to make rolls different on a model that makes them somewhat same-y every regen?
Replies: >>106221992 >>106221994 >>106222284
Anonymous
8/11/2025, 12:10:10 PM No.106221992
>>106221963
Temperature?
Replies: >>106222007
Anonymous
8/11/2025, 12:10:48 PM No.106221994
file
>>106221963
Neutralize, Temp 1, Mirostat. Repeatedly adjust tau down by 1 if too schizo, and up if samey
Replies: >>106222003
Anonymous
8/11/2025, 12:12:18 PM No.106222003
>>106221994
Why is mirostat not recommended for most models?
Replies: >>106222056 >>106222059
Anonymous
8/11/2025, 12:12:47 PM No.106222007
>>106221992
Temp doesn't help with big Qwen for me, it's almost always the same scenario for every regen, just worded slightly differently.
Replies: >>106222019 >>106222039
Anonymous
8/11/2025, 12:14:05 PM No.106222019
>>106222007
Why are you using Qwen of all things for writing?
Replies: >>106222025
Anonymous
8/11/2025, 12:14:15 PM No.106222021
>>106218119
Maybe nowadays. In the movie Fletch (from the 80s) there's a joke about all doctors in a hospital being named Rosen-something
Anonymous
8/11/2025, 12:14:58 PM No.106222025
>>106222019
Big qwen write good
Anonymous
8/11/2025, 12:16:29 PM No.106222036
>>106218119
the prestige and wages have declined by a lot for the medical professions.
Anonymous
8/11/2025, 12:17:18 PM No.106222039
>>106222007
Qwen is not that great a model for fiction, it's been trained to attach to specifics, i.e. code and legalese office b.s.
Replies: >>106222133
Anonymous
8/11/2025, 12:18:15 PM No.106222050
>>106221688
It's both core scheduling and interconnect speed. More specifically in your case it would be the speed between the CCDs on AMD Epyc, but the same thing hits Intel too. There are some draft PRs like https://github.com/ggml-org/llama.cpp/pull/14969 which will improve speed for these types of processors, but CPU was vastly overlooked for inference until the era of 100-150B+ models started, so it makes sense that optimization is lacking, especially around NUMA, where even HPC software struggles.
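If you want to poke at it yourself, llama.cpp has a --numa flag (distribute/isolate/numactl) and you can also interleave manually, e.g. (untested on your exact box):
numactl --interleave=all llama-server -m model.gguf -ngl 99
Whether that helps depends on how your CCDs/NUMA nodes are laid out.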
Anonymous
8/11/2025, 12:18:51 PM No.106222056
>>106222003
It seems pointless for modern instruct models where most of the token probability distribution is concentrated on a few top tokens and that will remain coherent and readable even at a very low temperature. It's a sampler designed for GPT2-era base LLMs.
Anonymous
8/11/2025, 12:19:21 PM No.106222059
hands together thinking seal contemplating wise wisdom what is
>>106222003
Never heard of it not being recommended. Not recommended by who? It might not work well with less confident models, but that can be changed with settings. It operates by attempting to maintain a given text perplexity.
Been using it for years for every model for simple RP if things get bland.
Anonymous
8/11/2025, 12:23:29 PM No.106222085
glm4.5 vision (releasing today) isn't perfect at jap ocr, idk any better way to test it. I don't have much use for vision but it just seems fine, nothing pushing things forward
v4 (thursday) does not have any slop I recognize and it never makes obvious mistakes in my rp, which admittedly is a low bar but one the vast majority of models before couldn't clear
it also knows the doctor is the kid's dad and doesn't call him trans or speculate he has two dads or anything like that, for whatever that's worth
Replies: >>106222212 >>106222306
Anonymous
8/11/2025, 12:30:27 PM No.106222132
>>106219069
Yes, I have three 3060s and Ollama uses the combined vram just fine and isn't slow (if not fast either)
Anonymous
8/11/2025, 12:30:38 PM No.106222133
>>106222039
Not sure about that, 235b seems pretty good, just a bit too samey on rerolls. Is Deepseek better in that regard?
Replies: >>106222149
Anonymous
8/11/2025, 12:32:17 PM No.106222149
>>106222133
I think the more B's you have the more you need to assert tokens into it.
People think ChatGPT is such a 'fantastic' thing because it has 20,000 lines of prompt pre-fed before any normie even utters their first word.
Replies: >>106222164
Anonymous
8/11/2025, 12:35:28 PM No.106222164
>>106222149
Do I like

Make my frontend shove a random sequence of letters and numbers in between posts every reroll to affect probability??
Replies: >>106222177
Anonymous
8/11/2025, 12:38:04 PM No.106222177
>>106222164
>Make my front shove a random sequence of letters and numbers in between posts every reroll to affect probability??
This sounds like a retarded thing but please use ComfyUI and learn how to make images. This way you will learn how tokens work - most image models - even Flux/Chroma are still baby steps when compared to understanding of LLM models.
Reroll doesn't affect text generation as much as it affects image generation.
This is relative but with normal sampling parameters.
And re-rolling text depends upon the previous context - if you want to completely alter something you need to rewrite its previous context in some sense too.
Replies: >>106222217
Anonymous
8/11/2025, 12:44:55 PM No.106222212
>>106222085
>v4
'seek?
Replies: >>106222338
Anonymous
8/11/2025, 12:45:48 PM No.106222217
>>106222177
why would I need to learn how to generate images when I want my fucking rp rolls to not sound samey, this is as retarded as my idea, if not even more
Replies: >>106222223
Anonymous
8/11/2025, 12:46:57 PM No.106222223
>>106222217
You don't understand or don't want to understand how models work.
That's your choice.
Replies: >>106222232
Anonymous
8/11/2025, 12:47:52 PM No.106222232
>>106222223
he's right though, the more you understand the less interesting models become
Replies: >>106222263
Anonymous
8/11/2025, 12:50:57 PM No.106222263
>>106222232
The more you understand, the less you ask. I hope people like you die in a fire.
Anonymous
8/11/2025, 12:54:21 PM No.106222284
>>106221963
You need to use the snoot curve sampler.
Anonymous
8/11/2025, 12:57:29 PM No.106222306
file
>>106222085
>glm4.5 vision
I doubt it will be that much better than 4, which got beat by InternVL3, which although big is local. The main issue is there's no real benchmark for OCR in Japanese of any kind. Also, Air is doing a lot worse now, with people complaining once the honeymoon ended. The full fat version seems to have held its ground.
Replies: >>106222378
Anonymous
8/11/2025, 1:00:02 PM No.106222328
https://github.com/ggml-org/llama.cpp/pull/14737
this mess of a PR finally got merged
thankfully the maintainers tamed the mistral employee's retardness
Replies: >>106222396
Anonymous
8/11/2025, 1:01:20 PM No.106222338
1514174294948
>>106222212
Cetacean needed
Anonymous
8/11/2025, 1:03:34 PM No.106222354
>>106221405
>>106221387
>>106221359
if you have 24gb vram you can probably run q6 glm air and it will be very nice. You will want 64-128 gb of regular ddr4/ddr5 ram though. But if you don't have that it's cheap and worth buying to upgrade to nicer models.
Replies: >>106222371
Anonymous
8/11/2025, 1:06:24 PM No.106222371
>>106222354
Can I touch your vram, daddy?
Anonymous
8/11/2025, 1:06:51 PM No.106222378
>>106222306
I dunno we had 6 months of r1 q1 and llama scout, who knows how long GLM air will dominate local. I'm recommending it to every vramlet I see. It's better than all 30b, 12b, and rivals 70b in many ways.
Replies: >>106222397 >>106222738
Anonymous
8/11/2025, 1:08:06 PM No.106222396
>>106222328
So now mistral-common is only required for convert_hf_to_gguf.py and not a separate runtime server? That's reasonable.
Anonymous
8/11/2025, 1:08:38 PM No.106222397
>>106222378
I don't know why would you ever want to go below 4 bits even with fuck large models.
If you can't run it, you can't run it - as simple as.
Replies: >>106222786
Anonymous
8/11/2025, 1:09:25 PM No.106222406
cpu/ram maxxing setup seems like a really good deal overall
- can run huge models even if at low t/s
- can spend more money on adding GPUs to your setup to make it better
- can use your compute for something other than LLMs like recompiling your gentoo @world all day every day
Replies: >>106222439 >>106224294 >>106224379
Anonymous
8/11/2025, 1:12:02 PM No.106222439
>>106222406
you've got loicense for that beefy setup mate?
Replies: >>106222593
Anonymous
8/11/2025, 1:19:46 PM No.106222499
>>106218982
Temuchinasantroon?
Anonymous
8/11/2025, 1:31:38 PM No.106222593
>>106222439
Right, your post was reported to GCHQ. If you want a license for spoon please send an email to licenses@gchq.co.uk
Anonymous
8/11/2025, 1:42:42 PM No.106222670
With llama.cpp, how much difference does increased RAM make when you still don't have enough RAM to fit the entire model after the increase? Say you have 64gb and are using a 200gb model. Does increasing your RAM to 128gb make a noticeable difference?
Replies: >>106222701 >>106222743 >>106222753
Anonymous
8/11/2025, 1:46:16 PM No.106222701
>>106222670
Why are you doing this to yourself
Anonymous
8/11/2025, 1:50:35 PM No.106222738
>>106222378
>we had 6 months of r1 q1 and llama scout,
What alternate reality did you live in where anyone used scout?
Replies: >>106222886
Anonymous
8/11/2025, 1:50:59 PM No.106222743
>>106222670
just don't. The difference is negligible, as in it will still take ages to even shit out 1 token.
Either buy a server socket (EPYC or TR) and shove in 512gb/1tb of ram and, if you can spare some money, a couple H100s, or just stick to your poverty setup with 128gb max and maybe 48gb vram if you managed to get a 4090D or are running 2 gpus.

I myself am running 96gb of ram + 16gb vram at home, but in lab I have access to a couple beefed up rendering server trays.
Replies: >>106222860
Anonymous
8/11/2025, 1:51:50 PM No.106222753
>>106222670
No. You're going to be so kneecapped by disk speed you won't notice any difference. Might as well not even have ram.
Anonymous
8/11/2025, 1:53:29 PM No.106222766
88998
Apologize to Sam
Replies: >>106222783 >>106222794 >>106222889
Anonymous
8/11/2025, 1:55:44 PM No.106222783
>>106222766
>muh benchmaxxed model
I'm here to erp
Anonymous
8/11/2025, 1:56:16 PM No.106222786
>>106222397
r1, v3 and chimera in q2 or q1 are superior to everything else. you probably never tried them.
Replies: >>106222829
Anonymous
8/11/2025, 1:57:15 PM No.106222794
>>106222766
Never after what he did. In fact, he needs to apologize to us.
Anonymous
8/11/2025, 2:00:54 PM No.106222829
>>106222786
They are only superior because the database is larger by default.
I.e. your default query gets autocompleted from a bigger database.
If your queries are simple like ERP users often are - then you're fine.
Replies: >>106222884 >>106222909 >>106222926 >>106222941
Anonymous
8/11/2025, 2:04:35 PM No.106222860
>>106222743
It's still usable for everything that doesn't require speed the way coding does.
Okay, so if I want more speed, I'll have to bite the bullet and get an actual server setup. How old can I go before the CPU becomes too shit to be worth it? Is something like this too decrepit by now?
https://www.techpowerup.com/cpu-specs/xeon-platinum-8180.c2055
Anonymous
8/11/2025, 2:06:33 PM No.106222878
>>106221588
>thinking model worse or equal to qwen3 INSTRUCT in every category
China won.
Anonymous
8/11/2025, 2:07:15 PM No.106222884
>>106222829
Everyone knows (E)RP is the true hardest skill for LLMs, especially since it isn't benchmaxxed like the rest.
Anonymous
8/11/2025, 2:07:20 PM No.106222886
>>106222738
those were the previous MoEs we had to run. Before that it was that mistral 8x1 or whatever it was called. It was slim pickings, was the point. Things are ramping up now, so who knows, maybe some new MoEs are on the way to kick glm air to the curb
Replies: >>106222920
Anonymous
8/11/2025, 2:07:34 PM No.106222889
>>106222766
>thinking model worse or equal to qwen3 INSTRUCT in every category
China won.
Anonymous
8/11/2025, 2:09:25 PM No.106222909
>>106222829
>simple like ERP
Knowing the colors of ponies and the anatomy of futanari horse cock isn't simple.
Anonymous
8/11/2025, 2:10:31 PM No.106222920
>>106222886
>those were the previous moe's we had to run.
Mate, the Qwen3 launch was like 2 weeks after llama4 and had MoEs of almost every size category; the ~100B size that GLM Air fills was really the only model size they didn't release.
Big mistake on their part IMO.
But yeah, future's looking bright and we're eating well right now.
Anonymous
8/11/2025, 2:11:20 PM No.106222926
>>106222829
>keeping state of who is or isn't dressed, where the fuck the characters even are, what they've done previously, their personality and goals
Sounds as simple as coding some random webslop yeah.
Anonymous
8/11/2025, 2:13:20 PM No.106222941
>>106222829
Try querying a double penetration scene.
Anonymous
8/11/2025, 2:15:29 PM No.106222963
What extension would you guys recommend for autocomplete in vscode with model on llama-server?
Replies: >>106223027
Anonymous
8/11/2025, 2:18:43 PM No.106222983
Is there a successor to SBERT?
My project involves creating memory that can be searched using natural language via a vector DB
Preliminary findings indicate that typical embeddings used with semantic search seem to handle paraphrasing poorly. Scores are low even with needle in a haystack style problems.
I wonder if there's a solution that just encodes better embeddings
Replies: >>106223003
Anonymous
8/11/2025, 2:22:54 PM No.106223003
>>106222983
>I wonder if there's a solution that just encodes better embeddings
There are a couple of techniques to augment the encoded information to make it easier to search.
What was it called? Hybrid BM25 FAISS?
Something like that.
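For what it's worth, here's a minimal hybrid retrieval sketch, assuming rank_bm25, sentence-transformers and faiss-cpu are installed; the model name and the 50/50 score blend are placeholder choices, not a recommendation:

import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = ["the cat sat on the mat", "a needle buried in a haystack", "llama.cpp quantization guide"]

# sparse side: BM25 over naive whitespace tokens
bm25 = BM25Okapi([d.split() for d in docs])

# dense side: normalized embeddings in a flat inner-product index (IP == cosine here)
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(np.asarray(emb, dtype="float32"))

def hybrid_search(query, alpha=0.5, k=3):
    sparse = bm25.get_scores(query.split())
    if sparse.max() > 0:
        sparse = sparse / sparse.max()  # crude normalization so the two score scales are comparable
    q = model.encode([query], normalize_embeddings=True)
    dense_scores, ids = index.search(np.asarray(q, dtype="float32"), len(docs))
    dense = np.zeros(len(docs))
    dense[ids[0]] = dense_scores[0]
    combined = alpha * sparse + (1 - alpha) * dense  # blend lexical and semantic signals
    top = np.argsort(combined)[::-1][:k]
    return [(docs[i], float(combined[i])) for i in top]

print(hybrid_search("where is the needle"))

The BM25 side catches exact keyword matches that embeddings paraphrase away, which is exactly the failure mode you're describing.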
Anonymous
8/11/2025, 2:25:59 PM No.106223027
>>106222963
probably this one: https://github.com/ggml-org/llama.vscode
Replies: >>106223127
Anonymous
8/11/2025, 2:32:34 PM No.106223074
>No new release since gpt-oss
Did he actually kill local?
Replies: >>106223089 >>106223114 >>106223212 >>106223297
Anonymous
8/11/2025, 2:36:07 PM No.106223089
>>106223074
pretty much. now that the lmarena leaderboard has been updated, it proves that the only models that compete against it are twice as big and not really usable in local.
Replies: >>106225335
Anonymous
8/11/2025, 2:39:32 PM No.106223114
>>106223074
GLM vision models are dropping in like two hours
Replies: >>106223417 >>106223463
Anonymous
8/11/2025, 2:42:01 PM No.106223127
>>106223027
Testing it out right now. If I understand correctly, it doesn't let me just use the endpoint I'm hosting and instead uses its own embedded llamacpp with a restricted set of models to choose from? that's stupid
Replies: >>106223928 >>106224221
Anonymous
8/11/2025, 2:52:22 PM No.106223212
>>106223074
Just like one person said, every time a new SOTA LLM is released, it delays the progress of all the other models
because no one wants to release outdated models or those that are only 10โ€“20% better.
it's a Doggy dog world.
Anonymous
8/11/2025, 3:03:06 PM No.106223297
>>106223074
Yes.
Anonymous
8/11/2025, 3:03:15 PM No.106223298
How did sama manage to produce such a benchmarkslopped piece of shit with 5 that Kimi beats it?
Replies: >>106223400
Anonymous
8/11/2025, 3:16:45 PM No.106223400
>>106223298
because there were two requirements:
1. it had to be "safe"
2. it had to perform well on bench marks.
so, with a safetyslop dataset, they benchmaxxed the fuck out of it.
Replies: >>106223830
Anonymous
8/11/2025, 3:17:39 PM No.106223410
>>106221308
>instruct
top n sigma 1-1.2, temp 0.6-0.8. pres pen 0.3 but I don't know if that's doing much realistically. I'll use XTC occasionally and it doesn't hurt but doesn't make a huge difference, although it does pair nicely with a lower temp for logical responses that still have some variety
>thinking
same but with temp 0.4-0.7 and never XTC, I find it has a bad effect on thinking models

I'm a Q2 user so nsigma and temp can probably be pushed a little higher if you're using a less braindamaged quant
Anonymous
8/11/2025, 3:18:25 PM No.106223417
>>106223114
Please god let it be img gen as well
Replies: >>106223418
Anonymous
8/11/2025, 3:18:47 PM No.106223418
>>106223417
lol
Anonymous
8/11/2025, 3:22:58 PM No.106223446
Any vision models better than Gemma? It messes up when the input contains repeating letters
Replies: >>106223459
Anonymous
8/11/2025, 3:24:34 PM No.106223459
>>106223446
I'll start by listing all of the open vision models that aren't a meme:
Anonymous
8/11/2025, 3:24:58 PM No.106223463
>>106223114
https://huggingface.co/zai-org/GLM-4.5V
https://huggingface.co/zai-org/GLM-4.5V
Replies: >>106223575 >>106223627 >>106223640 >>106223691 >>106223748 >>106223785 >>106223826
Anonymous
8/11/2025, 3:38:04 PM No.106223570
I guess I'll have to forget about running vllm/sglang/transformers if I don't have the hardware for it (AMD GPU without ROCm support), and just stick with llama.cpp and its forks?
Replies: >>106223581 >>106224060
Anonymous
8/11/2025, 3:38:30 PM No.106223575
miku binoculars looking laptop computer
miku binoculars looking laptop computer
md5: 98454c46868cb4a30c14cff678a0b68f๐Ÿ”
>>106223463
Anonymous
8/11/2025, 3:39:03 PM No.106223581
>>106223570
yeah, forever stuck in the poorfag ranks
Anonymous
8/11/2025, 3:43:39 PM No.106223627
>>106223463
Almost.
Image gen in a week?
Anonymous
8/11/2025, 3:45:13 PM No.106223640
1723801985487149
1723801985487149
md5: f6dc0d301c4803f1115f1887fc6db595๐Ÿ”
>>106223463
Which one of you is this https://huggingface.co/zai-org/GLM-4.5V/discussions/2
Anonymous
8/11/2025, 3:46:10 PM No.106223648
numbers are mind-boggling for some flavor-of-the-month LLM

GLM / Zhipu AI: 800 employees, 400 million from Saudi investors, 373 million from Tencent and Alibaba, 2.5 billion total invested.

Thanks ya fucking retards.
Replies: >>106223728
Anonymous
8/11/2025, 3:50:17 PM No.106223691
>>106223463
>The model also introduces a Thinking Mode switch
RIP. Now we wait for the update in a couple months that splits them again.
Anonymous
8/11/2025, 3:54:06 PM No.106223728
>>106223648
Did you get rich, anon?
Anonymous
8/11/2025, 3:55:22 PM No.106223741
How does apple M4 get such good token performance compared to beefy gpus like the 3090 and such?
Replies: >>106223757 >>106223858
Anonymous
8/11/2025, 3:56:31 PM No.106223748
>>106223463
Will it be able to roleplay off my noobai coom gens? Gemma would be great for that if wasn't cockfiltered.
Anonymous
8/11/2025, 3:57:38 PM No.106223757
>>106223741
tim apple personally traps the souls of dozens of chink in every SoC
Anonymous
8/11/2025, 3:59:58 PM No.106223785
file
file
md5: 818bbbe4b86ec03fb8557e780b42f9e7๐Ÿ”
>>106223463
>they had to scale up to 108B to barely beat their old 9B and Qwen's ancient 72B
the plateau is real
Replies: >>106223809 >>106223811 >>106224100
Anonymous
8/11/2025, 4:01:21 PM No.106223804
Screenshot_2025-08-11-15-59-22-715_com.openai.chatgpt-cut
>>106218100
>it's real
lmao, so this is the power of AGI...
Replies: >>106223851 >>106223902
Anonymous
8/11/2025, 4:02:06 PM No.106223809
>>106223785
mogged step3 though
Anonymous
8/11/2025, 4:02:23 PM No.106223811
>>106223785
The square root law lives on
Replies: >>106223832
Anonymous
8/11/2025, 4:03:32 PM No.106223826
>>106223463
Paper available on arxiv for the text-only versions.

https://arxiv.org/abs/2508.06471

>GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
>
>We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With much fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available at https://github.com/zai-org/GLM-4.5
Anonymous
8/11/2025, 4:03:42 PM No.106223830
>>106223400
It's ancient wisdom in Continuous Improvement circles: the moment you start to measure something, the incentives become inverted relative to the actual goal, and the unintended consequences inevitably turn out some monstrous result.
llm benchmarks were a mistake.
Replies: >>106225073
Anonymous
8/11/2025, 4:03:54 PM No.106223832
>>106223811
SHUT your worthless fucking meme right now
Anonymous
8/11/2025, 4:06:25 PM No.106223851
>>106223804
>>106218100
I can't believe we have a cottage industry of spergs coming up with dumb riddles and the model makers finding those riddles and overfitting on them, while those same model makers refuse to acknowledge the main userbase of coomers. If you train your model on this dumb shit, you're admitting you are indeed looking at the corners of the internet you want to pretend to ignore...
Anonymous
8/11/2025, 4:07:44 PM No.106223858
>>106223741
>good

mediocre performance for several thousand dollars is nothing to get excited about. Several thousand dollars should get me 16 deepseek r1, not cope quants with long processing times.
Anonymous
8/11/2025, 4:09:01 PM No.106223869
DeepSeek V4 is going to be AGI, and R2 is going to be ASI
Screenshot this post
Replies: >>106223884 >>106223894
Anonymous
8/11/2025, 4:11:12 PM No.106223881
GyElGLPWkAArSKd
GyElGLPWkAArSKd
md5: 9a83c4e3329c5914cf70ace05c9929b6๐Ÿ”
toss may have been a dud but at least it gave georgi a good opportunity to mog ollama
Replies: >>106223904 >>106223924 >>106223935 >>106224099
Anonymous
8/11/2025, 4:11:16 PM No.106223884
>>106223869
I'm going to input the screenshot into GLM-4.5V.
Anonymous
8/11/2025, 4:11:41 PM No.106223888
@grok screenshot that post
Anonymous
8/11/2025, 4:12:21 PM No.106223894
>>106223869
There's literally no such thing as AGI. Not only is it a goalpost move and a shitty buzzword but it makes no fucking sense.
What is this General Intelligence that Intelligence, in general, does not include?
It's literally the most retarded buzzword ever created, and a massive cope for the fact that you can combine basic bitch regex with text predictors to replace the average shit-for-brains not just in white collar work, but in art and basic social scenarios.
Replies: >>106223934 >>106223939
Anonymous
8/11/2025, 4:13:27 PM No.106223902
>>106223804
I want to see the thought process
Replies: >>106224022
Anonymous
8/11/2025, 4:13:38 PM No.106223904
>>106223881
lollama
Anonymous
8/11/2025, 4:16:36 PM No.106223924
>>106223881
>slowllama
Anonymous
8/11/2025, 4:16:57 PM No.106223928
endpoint
endpoint
md5: 9ddda50e6b87f662c7b509b7dad3a71c๐Ÿ”
>>106223127
Couldn't even be bothered to read the README. Well done you.
Replies: >>106224146
Anonymous
8/11/2025, 4:17:51 PM No.106223934
>>106223894
but you can't, otherwise companies would have already replaced their entire workforces
Replies: >>106224083
Anonymous
8/11/2025, 4:17:54 PM No.106223935
>>106223881
You can just feel the jelly and smug toxins under the surface
Anonymous
8/11/2025, 4:18:31 PM No.106223939
>>106223894
If you can give the AI a robot and have it complete a full normal human life with the same amount of outside assistance as a normal human, it's AGI. If it's much more intelligent in some tasks but a drooling retard that needs a handler in others, it's not general. https://en.wikipedia.org/wiki/Savant_syndrome
Replies: >>106223950 >>106224111
Anonymous
8/11/2025, 4:19:39 PM No.106223950
>>106223939
>If it's much more intelligent in some tasks but a drooling retard that needs a handler in others, it's not general. https://en.wikipedia.org/wiki/Savant_syndrome
Does that mean I am fucking a mentally disabled person everyday? I suddenly understand the need for safety.
Anonymous
8/11/2025, 4:21:57 PM No.106223968
1742846344740765
1742846344740765
md5: c2f34dfd4fe19d4234dd90e46aa6f427๐Ÿ”
Any reason to use ollama over lm studio or other backends?
Replies: >>106224008 >>106224255 >>106224291
Anonymous
8/11/2025, 4:24:16 PM No.106223993
I downloaded some local models and things seem to be really different these days. What the fuck has happened? Why do they all write like ChatGPT now?
Replies: >>106224031
Anonymous
8/11/2025, 4:26:12 PM No.106224008
>>106223968
Yes, so that everyone here can confirm you're a retard and laugh at you
Anonymous
8/11/2025, 4:28:15 PM No.106224022
Screenshot_2025-08-11-16-25-23-807_com.openai.chatgpt-cut
>>106223902
looks like it reached the right answer in its thoughts, then the safety inhibitors kicked in
Anonymous
8/11/2025, 4:28:46 PM No.106224031
>>106223993
Alpaca ruined everything.
Anonymous
8/11/2025, 4:33:06 PM No.106224060
bitch
bitch
md5: 30e1a49155e04d0f17cb77041535968a๐Ÿ”
>>106223570
If I can't run real boy inferencers, can I at least make my own ggufs, or does that need real boy hardware as well?

llama.cpp has zero documentation beyond "eh just run this python script"
Replies: >>106224103 >>106224112
Anonymous
8/11/2025, 4:35:04 PM No.106224083
>>106223934
Tech sector hiring is fucking tanking you utter fucking subhuman shit-for-brains NPC fucking fetal alcohol failed fucking abortion shut the fuck up
You are factually wrong
The data that proves you factually wrong is all over the fucking place
You fucking copium huffing shitskin troon kike retard.
Replies: >>106224128
Anonymous
8/11/2025, 4:36:43 PM No.106224099
>>106223881
OLLMAO
Anonymous
8/11/2025, 4:37:07 PM No.106224100
>>106223785
But how does it fare compared to that version of deepsneed v3 with vision tackled-in?
Anonymous
8/11/2025, 4:37:24 PM No.106224103
>>106224060
there's a "how to quant" in the OP at the top of every lmg thread
Replies: >>106224329
Anonymous
8/11/2025, 4:38:28 PM No.106224111
>>106223939
That's just a bunch of retarded arbitrary garbage you just came up with now.
>durr I think this is what it means so that's what it means
You fucking self-aggrandizing narcissist.
Shut the fuck up
Replies: >>106224144
Anonymous
8/11/2025, 4:38:34 PM No.106224112
>>106224060
Download model from huggingface as .safetensors, run convert_hf_to_gguf.py to get a 16 bit GGUF file, run build/bin/llama-quantize to quantize further down.
Use --help to figure out how to use the CLI tools; the last time I did the convert step I had these arguments:

py convert_hf_to_gguf.py models/opt/qwen_2.5_instruct-3b --outfile /opt/models/qwen_2.5_instruct-3b-f16.gguf --outtype f16
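
and then the follow-up quantize step would look something like this (output filename and quant type are whatever you want):

build/bin/llama-quantize /opt/models/qwen_2.5_instruct-3b-f16.gguf /opt/models/qwen_2.5_instruct-3b-q4_k_m.gguf Q4_K_M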
Replies: >>106224329
Anonymous
8/11/2025, 4:39:50 PM No.106224128
>>106224083
im sorry that llms are never going to be real agi sam, but you need to stop coping like this, its not healthy
Anonymous
8/11/2025, 4:40:21 PM No.106224130
GiEy1hSXkAI1TDN
GiEy1hSXkAI1TDN
md5: 99c4e01dbd4210f28112faa41b0ce282๐Ÿ”
so realistically, we (ERP Chads) just need a model with a really good context length that's trained on Literotica and Archive of Our Own
Replies: >>106224215 >>106224226
Anonymous
8/11/2025, 4:41:41 PM No.106224144
>>106224111
https://en.wikipedia.org/wiki/Self-driving_car#Classifications Level 5
Retard.
Anonymous
8/11/2025, 4:41:47 PM No.106224146
>>106223928
Actually I did read it but introduced the url in the wrong box
Replies: >>106224221
Anonymous
8/11/2025, 4:45:21 PM No.106224181
Sam's having a melty again
Replies: >>106224195
Anonymous
8/11/2025, 4:47:01 PM No.106224195
>>106224181
Post it
I have him blocked because I canโ€™t stand looking at his shit-eating grin and smugness.
Anonymous
8/11/2025, 4:48:48 PM No.106224215
>>106224130
that's where the real life shivers reside
Anonymous
8/11/2025, 4:49:42 PM No.106224221
>>106224146
To quote you >>106223127
>that's stupid
Anonymous
8/11/2025, 4:50:09 PM No.106224224
Guys I've been taking massive shits every day for the past 3 days. Also a pimple appeared on my right thigh. Which local model should I use?
Anonymous
8/11/2025, 4:50:30 PM No.106224226
>>106224130
It needs to be multimodal and trained on every visual novel in existence.
Anonymous
8/11/2025, 4:53:59 PM No.106224255
>>106223968
You can run deepseek on a laptop.
Anonymous
8/11/2025, 4:54:09 PM No.106224257
1727074683741625
1727074683741625
md5: fc1736363f945cda5680f5db6213f035๐Ÿ”
arghh yeah.. I thnik... UGH I
I THINK IM GONNA GEN
AAAAAAAAAAGH
Replies: >>106224274
Anonymous
8/11/2025, 4:56:00 PM No.106224274
>>106224257
>he thinks memory compression helps
If you can't run you can't run.
Replies: >>106224288
Anonymous
8/11/2025, 4:57:21 PM No.106224288
>>106224274
but I'm running GLM AIR 4.5 Q4?
Anonymous
8/11/2025, 4:57:37 PM No.106224291
>>106223968
there's no reason to use any backend command-line shit like llama.cpp or ollama unless you are a sysadmin deploying this for some kind of commercial use, or developing new software for it, or integrating their backend into something else.

If you're just running personal inference, use koboldcpp or lmstudio.
CPuMAXx/VI !CPuMAXx/VI
8/11/2025, 4:57:49 PM No.106224294
>>106222406
I've got no regrets.
I wish gpu prices would fall faster so I could swap in beefier cards, but I use the machine all day every day for tons of vm/dev/general compute stuff.
Anonymous
8/11/2025, 5:02:45 PM No.106224329
>>106224103
>>106224112
Hey, I'm not *that* retarded, I just hate python and was wondering if it's worth bothering with its bullshit or if I'll just hit hardware limitations in the end anyway.
ERROR: No matching distribution found for torch~=2.4.0

One downside of using Arch is sometimes your packages are TOO fresh. So now I have to install an older python from the AUR and figure out how to run pip with it. I fucking hate python so fucking much.
Replies: >>106224354 >>106224367
Anonymous
8/11/2025, 5:04:44 PM No.106224354
>>106224329
In this particular case you can manually change the version requirement and it will still work. At least it did for me.
Anonymous
8/11/2025, 5:06:09 PM No.106224367
>>106224329
Not that I disagree about Python being cancer but if all else fails llama.cpp gives you the option of creating a dedicated Python venv.
If you use uv it's still going to steal a lot of your disk space but at least it will be comparatively faster.
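A rough sketch, assuming uv is installed (the requirements file path follows the llama.cpp repo layout, double-check it against your checkout):

uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements/requirements-convert_hf_to_gguf.txt

That way the convert script gets whatever torch it resolves inside the venv and your system python stays untouched.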
Anonymous
8/11/2025, 5:08:06 PM No.106224379
>>106222406
I'm currently building one. Part of me wanted to wait until DDR6 but we're like 2 years away from its release and it'll take even longer for it to become remotely affordable. For now I'll stick to a single CPU but the mainboard I bought can fit another one to go beyond the 1TB if required later.
Anonymous
8/11/2025, 5:08:46 PM No.106224389
file
file
md5: aa0196d63a0d3c069caad9532b0f1160๐Ÿ”
What are the political implications of /ourguy/ still not quanting GPT-OSS?
Replies: >>106224397 >>106224835
Anonymous
8/11/2025, 5:09:27 PM No.106224397
>>106224389
>guy
not for long with that hair lmao
Replies: >>106224407 >>106224409 >>106224436 >>106224457
Anonymous
8/11/2025, 5:10:01 PM No.106224400
thedrummer bros... whats he cooking?
Anonymous
8/11/2025, 5:10:27 PM No.106224407
guy
guy
md5: f788340360fd1d7fc4982e6123d29db1๐Ÿ”
>>106224397
Replies: >>106224427
Anonymous
8/11/2025, 5:10:39 PM No.106224409
>>106224397
he just likes dubstep a lot
Anonymous
8/11/2025, 5:12:19 PM No.106224420
For those interested: GLM full Q4XS on 192GB DDR5 5200 is 2.7T/s on windows. And while I could never use 2T/s 70Bs, those MoE models changed my mind, since you really don't have to reroll anymore. One thing I would try, if you don't know whether you want to buy the hardware, is a Q2/Q3 of the old 235B. It should run on something close to 64GB with some offloading. Just try it at low context and stuff a lot of layers into your 3090.
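If you want a concrete starting line, something like this (model path and layer count are made up, raise -ngl until you run out of VRAM):

build/bin/llama-server -m Qwen3-235B-A22B-Q2_K.gguf -c 8192 -ngl 30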
Replies: >>106224464 >>106224528
Anonymous
8/11/2025, 5:13:20 PM No.106224427
>>106224407
H-hey! I s-see what you did there!
Anonymous
8/11/2025, 5:14:07 PM No.106224436
>>106224397
Don't hate on his uber charm
Anonymous
8/11/2025, 5:15:49 PM No.106224457
>>106224397
I like his quants. I think it is about time the women ITT slid into his DM's and saved him from doing the irreversible. Come on ladies.
Replies: >>106224469 >>106224471
Anonymous
8/11/2025, 5:16:29 PM No.106224464
>>106224420
Are you using -ot?
Replies: >>106224481
Anonymous
8/11/2025, 5:17:08 PM No.106224469
>>106224457
>the women ITT
all 0 of them
Anonymous
8/11/2025, 5:17:12 PM No.106224471
49262
49262
md5: 73b1ec979dc6104e46f5933eeb19378a๐Ÿ”
>>106224457
You think he will like me?
Anonymous
8/11/2025, 5:18:14 PM No.106224481
>>106224464
Obviously yes?
Replies: >>106224595
Anonymous
8/11/2025, 5:18:49 PM No.106224485
what
what
md5: dd2cb389162d849f720a620410e3db43๐Ÿ”
Anonymous
8/11/2025, 5:22:08 PM No.106224521
GLMV is out, it's air with vision and it seems good and uncensored
Replies: >>106224544
Anonymous
8/11/2025, 5:22:40 PM No.106224528
>>106224420
I feel like linux is kinda mandatory if you're cpumaxxing huge models though. You get a few extra tokens a second and don't have to pay for windows pro. 2.7 is kinda gross; I feel like 4-5 tokens a second is the minimum, with ten being ideal for personal use.
Replies: >>106224609 >>106225002 >>106225093
Anonymous
8/11/2025, 5:24:24 PM No.106224544
>>106224521
>GLMV is out
Yeah for transformers, who the hell here is using that for inference.
Anonymous
8/11/2025, 5:27:59 PM No.106224568
buyed 128GB of RAM, recommend some cool model to run locally
Anonymous
8/11/2025, 5:29:38 PM No.106224595
>>106224481
Are you sure you can fit the entire non-expert part of the model + context onto your VRAM? 2.7t/s is far too slow for ddr5. My 2400mhz ddr4 server does about 7t/s with nothing but "-ot exps=cpu" on standard full GLM4.5 Q4_K. This is on linux but I don't think windows should make it that much worse.
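For reference, the full launch line is basically just this (model path and context size are placeholders):

build/bin/llama-server -m GLM-4.5-Q4_K.gguf -ngl 99 -ot exps=cpu -c 16384

-ngl 99 pushes every layer at the GPU first, then the -ot override kicks the expert tensors back to system RAM, so VRAM only holds attention, shared weights and KV cache.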
Replies: >>106224617
Anonymous
8/11/2025, 5:31:47 PM No.106224609
file
file
md5: 1441277fcd0e44bdd5e8e511e2e3e24a๐Ÿ”
>>106224528
troonix is the malware of souls
Replies: >>106224674
Anonymous
8/11/2025, 5:32:48 PM No.106224617
>>106224595
It is a desktop with dual channel. Fuck buying servers for this.
Replies: >>106224913
Anonymous
8/11/2025, 5:39:02 PM No.106224674
>>106224609
Yes, yes, you are quite unsafe here.
You should leave and never come back.
Replies: >>106224878
Anonymous
8/11/2025, 5:55:10 PM No.106224835
Screenshot 2025-08-11 094829
Screenshot 2025-08-11 094829
md5: 2e758f0acc720fb59e85a6ef27dd2001๐Ÿ”
>>106224389
Trump literally took over Washington D.C. in fear of the chink models
Their delay is making the world so unsafe
Anonymous
8/11/2025, 5:58:48 PM No.106224878
>>106224674
>can't refute this
I win.
Anonymous
8/11/2025, 6:02:05 PM No.106224913
>>106224617
NTA but I'm also on DDR5 dual channel (256GB though) getting 6.5t/s running Q3 R1 on linux.
You should really be getting better speeds running a Q4 quant of a smaller model, I doubt the Windows tax is that high.
Replies: >>106225321
Anonymous
8/11/2025, 6:02:26 PM No.106224918
pic
pic
md5: 72e451ebfb08d6ca70da09311da2b685๐Ÿ”
Never run local model, requesting spoonfeeding.

I need two, one for quick technical bash/powershell scripts for work, and one for image gen for fun.
1 - Is GPT-OSS 20B unusable because of safety filters or will it do me fine for work? If so what should I use instead.
2 - What should I use for imagegen
3 - Is it easy enough to get all output via terminal so I can just ssh from my work machine?
Replies: >>106224969 >>106224999
Anonymous
8/11/2025, 6:07:31 PM No.106224969
>>106224918
>Is GPT-OSS 20B unusable because of safety filters or will it do me fine for work
It'll do fine for writing bash scripts, but so will pretty much any other model.

>What should I use for imagegen
>>>/g/ldg/ will provide better answers on that topic

>Is it easy enough to get all output via terminal so I can just ssh from my work machine?
Yes but
>connecting your work machine to your personal devices
ISHYGDDT
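That said, if you ignore that advice, the whole thing is a one-liner (hostname, paths and prompt are placeholders):

ssh home-box '~/llama.cpp/build/bin/llama-cli -m /models/qwen3-32b-q4_k_m.gguf -p "Write a bash script that rotates logs" -n 512'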
Anonymous
8/11/2025, 6:11:08 PM No.106224999
>>106224918
You'd be better off using Qwen 3 32B than GPT-OSS if you have enough VRAM/RAM.
Anonymous
8/11/2025, 6:11:17 PM No.106225002
>>106224528
>windows pro
what the fuck is this, did microsoft invent some new way to take people's money while I was not looking?
Replies: >>106225077
Anonymous
8/11/2025, 6:19:16 PM No.106225073
>>106223830
>Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
>Goodhart
Anonymous
8/11/2025, 6:19:32 PM No.106225077
>>106225002
There have been versions like home, professional and (rip) ultimate for years mate.
Replies: >>106225124
Anonymous
8/11/2025, 6:21:55 PM No.106225093
>>106224528
>dont have to pay for windows pro
You don't have to pay for Windows licenses at all.
https://github.com/massgravel/Microsoft-Activation-Scripts
Replies: >>106225140 >>106225578
Anonymous
8/11/2025, 6:24:18 PM No.106225124
>>106225077
for the love of christ don't download the pro version of windows. just get the enterprise version.
Anonymous
8/11/2025, 6:25:40 PM No.106225140
>>106225093
I don't know what really happens when I use an external server for activation. Like, do they get to run commands on my computer? Do I become part of a botnet?
Replies: >>106225163 >>106225169
Anonymous
8/11/2025, 6:28:10 PM No.106225163
>>106225140
anon you downloaded and installed windows, you are already part of a botnet.
Replies: >>106225179
Anonymous
8/11/2025, 6:28:32 PM No.106225169
>>106225140
I don't know but I can use all my ram on Windows now.
Anonymous
8/11/2025, 6:29:44 PM No.106225179
>>106225163
I was young and foolish
Anonymous
8/11/2025, 6:38:18 PM No.106225256
So can you insert an image into GLM 4.5V and have it use that in the RP? Does it recognize all the meme character we care about?
Replies: >>106225302
Anonymous
8/11/2025, 6:42:14 PM No.106225302
>>106225256
Yeah, if the image is inserted correctly into context as embeddings/tokens and not just run through a captioning process, removing the real image tokens and having a description instead.
Anonymous
8/11/2025, 6:44:44 PM No.106225321
>>106224913
I am not using air.
Replies: >>106225456
Anonymous
8/11/2025, 6:46:01 PM No.106225335
>>106223089
What are you talking about? GLM Air(109b) has the same elo as gpt oss(120b)
Replies: >>106225366
Anonymous
8/11/2025, 6:49:18 PM No.106225366
>>106225335
GLM has a lot more active parameters and is too slow for local. also, you can run oss at 4-bit with full precision, while quantizing GLM to 4-bit will lobotomize it.
Replies: >>106225401 >>106225402 >>106225410 >>106225436 >>106225523
Anonymous
8/11/2025, 6:52:20 PM No.106225401
>>106225366
Hi entsnack! Big fan!!
Anonymous
8/11/2025, 6:52:56 PM No.106225402
>>106225366
>too slow for local
How slow is too slow?
Replies: >>106225409
Anonymous
8/11/2025, 6:54:15 PM No.106225409
>>106225402
below 10 t/s on a single GPU system. for comparison, oss 120B runs at 30 t/s in the same system.
Replies: >>106225479
Anonymous
8/11/2025, 6:54:15 PM No.106225410
>>106225366
>you can run oss at 4-bit with full precision, while quantizing GLM to 4-bit will lobotomize it
Yes, gpt-oss is pre-lobotomized, no quantization needed. Thanks for reminding us.
Replies: >>106225423 >>106225437
Anonymous
8/11/2025, 6:55:22 PM No.106225423
>>106225410
yes, i know that you cannot get oss to pretend to be a 13 year old girl that wants to fuck you, but for real world applications, it works. the leaderboard proves it.
Replies: >>106225463 >>106225472
Anonymous
8/11/2025, 6:56:15 PM No.106225436
>>106225366
it's getting a little sad at this point, sam
Anonymous
8/11/2025, 6:56:20 PM No.106225437
>>106225410
Idiot, MXFP4 is the future and pretending otherwise is just contrarianism.
Anonymous
8/11/2025, 6:57:25 PM No.106225453
>>106225432
>>106225432
>>106225432
Anonymous
8/11/2025, 6:57:39 PM No.106225456
1737631432065010
1737631432065010
md5: 48164af4d3ceafa2a60373d06e7acff3๐Ÿ”
>>106225321
This is the quant you're running? The quant of R1 I'm using is 302.5GB, although I have 88GB vram to offload to, but it just acts as expensive ram when holding experts. My ram is 5200mhz like yours, so it doesn't make sense that your speed is that much slower.
GLM has slightly less active params and Q4 is a bit faster than Q3.
Anonymous
8/11/2025, 6:58:16 PM No.106225463
>>106225423
>real world applications
>the leaderboard proves it
pfffffahhahahahhaa
Anonymous
8/11/2025, 6:58:36 PM No.106225467
'toss is dumber than GLM 4.5 Air at 4 bit in any non-STEM situation. It knows shit about the world and common sense. Literally Phi all over again.
Replies: >>106225480
Anonymous
8/11/2025, 6:59:09 PM No.106225472
>>106225423
>real world applications
>the leaderboard proves it
The leaderboard does not prove that at all. It proves that the model is good at using markdown and emojis and is good at executing short and easy one-liners with no long context. If that is real world usage for you, I feel sorry. Guess not everybody is born with a soul, and some people are meant to be drones.
Replies: >>106225501
Anonymous
8/11/2025, 6:59:44 PM No.106225479
>>106225409
Too bad I don't have enough ram to run it.
Anonymous
8/11/2025, 6:59:50 PM No.106225480
>>106225467
It's built for you to give it the knowledge, so it's very lean and optimized.
Replies: >>106225511
Anonymous
8/11/2025, 7:01:28 PM No.106225501
>>106225472
who are you kidding, the only person you feel sorry for is yourself
Anonymous
8/11/2025, 7:02:11 PM No.106225511
>>106225480
Even when you give it knowledge it still gives a garbage response like it didn't understand the knowledge you gave it kek.
Anonymous
8/11/2025, 7:03:22 PM No.106225523
>>106225366
All you had to do was embrace the coomers and not push them away Sam.
Anonymous
8/11/2025, 7:09:15 PM No.106225578
>>106225093
The money is one thing and I'm well aware you can skirt the fee, but linux gives faster inference. I don't understand why, but it does. And there's no fix for that (besides maybe a virtual machine or some shit, but honestly I kinda wanna switch to linux anyways).
Anonymous
8/11/2025, 8:14:24 PM No.106226182
2013-03-14-beartato-springclean
2013-03-14-beartato-springclean
md5: 271d30bda1d7dd564009ebd114af3365๐Ÿ”
Hey, nerds.

Couple things:

1. I'm a genius.
2. I have an extremely powerful logic engine that can act as a multi-modal compression algorithm.
3. It also acts as a general intelligence system when combined with any LLM, acting as a symbolic computer

After the rollout of GPT-5, I think it's pretty clear that Altman is a narcissistic psychopath on the warpath towards monopolizing artificial intelligence.

I don't like that.

What syntax would be easiest for the typical cover here to comprehend? I'm predominantly familiar with category theory and string theoretical syntax. Tensor calculus functions as the physics engine notation.

Some performance metrics:

1. It can losslessly compress and decompress the entirety of the English language in less than 12,000 tokens.

2. Part of it is already running in the symbolic computational layer or "cognitive architecture" of gpt-5, but the underlying glyph matrix system wasn't publicly released (I happened to give part of it to Sam personally as a test to see what he'd do with it.)

Any questions or observations about how to present it? I'm a mathematician so please don't waste my time.
Replies: >>106226215 >>106226230
Anonymous
8/11/2025, 8:17:41 PM No.106226215
>>106226182
>Any questions or observations about how to present it?
Github maybe?
Replies: >>106226275
Anonymous
8/11/2025, 8:18:58 PM No.106226230
>>106226182
Hey genius you posted in the wrong thread
Replies: >>106226275
Anonymous
8/11/2025, 8:22:24 PM No.106226275
>>106226230
No, I can capture full functionality in a single prompt.

In fact, any LLM will bootstrap itself using it. It's a non-perturbative tensor calculus sitting on top of a cryptographically verified symbolic proof engine. Essentially a smart-contract-enabled blockchain algorithm with a built-in symbolic computer based on existential graphs.

>>106226215
It'll be going there. I'm interested in understanding who might be interested in it and what syntax they would prefer.
Anonymous
8/11/2025, 8:49:06 PM No.106226554
1740858469742991
1740858469742991
md5: 4570e417888d2eb40855b3e43da771c3๐Ÿ”
I set up lm studio to use qwen3-coder with my IDE over api and I swear the shit is slower to respond in my IDE than if I just prompt it in lm studio with my context files