/lmg/ - Local Models General - /g/ (#106135910) [Archived: 7 hours ago]

Anonymous
8/4/2025, 12:29:51 PM No.106135910
1731988456020864
md5: 5a9b34df5cdbc86a6e01ab3fefa75969🔍
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106127784 & >>106119921

►News
>(08/01) XBai o4 32B released: https://hf.co/MetaStoneTec/XBai-o4
>(07/31) Qwen3-Coder-30B-A3B released: https://hf.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>106136033 >>106138143 >>106140131 >>106141246 >>106142774
Anonymous
8/4/2025, 12:30:29 PM No.106135912
threadrecap
md5: 7b9a82a1f31bca7acfefb8afe8c01036🔍
►Recent Highlights from the Previous Thread: >>106127784

--Papers:
>106132921 >106132991
--Horizon Alpha/Beta shows strong narrative skills but weak math, sparking GPT-5 and cloaking speculation:
>106130817 >106131034 >106131279 >106131299 >106131373 >106131411 >106131427 >106131442 >106131555 >106131617 >106131779
--GLM 4.5 perplexity and quantization efficiency across expert configurations:
>106132346 >106132379 >106132500 >106132520 >106132529
--Persona vectors for controlling model behavior traits and detecting subtle biases:
>106128851 >106128930 >106129259 >106128980 >106129116 >106130928 >106129195
--Frustration over lack of consumer AI hardware with sufficient memory and bandwidth:
>106129370 >106129437 >106129567 >106129633 >106129664 >106129737 >106129741 >106129879
--Tri-70B-preview-SFT release with strong benchmarks but training data concerns:
>106128191 >106128220 >106128338 >106128350 >106128370 >106128457
--Beginner seeking foundational understanding of LLM architecture for custom AI companion project:
>106128392 >106128434 >106128439 >106128472 >106128531 >106128623 >106128758
--GLM 4.5 Fill-in-the-Middle support discussed:
>106128386 >106128390 >106128549 >106128571 >106132834
--Fragmented llama.cpp PRs delay GLM model testing:
>106129724 >106129734 >106129785 >106129760 >106129995
--ROCm vs Vulkan performance for AMD GPUs in kobold.cpp:
>106128441 >106129743 >106129912
--GLM-4.5-Air runs locally on 4x3090 at 4.0bpw with high T/s:
>106132183 >106134407
--Building RTX 6000-based servers on a $100k budget:
>106128630 >106128713 >106128751 >106129561 >106129626
--Horizon Alpha/Beta performance suggests strong open-weight models:
>106130542 >106130559 >106130587 >106130600 >106130609 >106130622 >106130676 >106130697 >106130641 >106130700 >106130708
--Miku and Long Teto (free space):
>106128713 >106131379 >106134264

►Recent Highlight Posts from the Previous Thread: >>106128093

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
8/4/2025, 12:32:13 PM No.106135923
>>106135649
to be fair, llama.cpp has many backends and everything is implemented from scratch for every model; exllama can reuse more from transformers/diffusers, just like nunchaku for example
Replies: >>106135958
Anonymous
8/4/2025, 12:38:58 PM No.106135958
>>106135923
i was that anon. That's fair. Looking at the exl3 implementation of glm it seems so much simpler.
I saw llama.cpp was trying to reuse some of the deepseekv2 moe implementation.
Llama.cpp is really cool as it supports pretty much everything, i like it. But it makes proper support for new stuff so hard
Anonymous
8/4/2025, 12:41:04 PM No.106135972
I just want GLM gguf
Replies: >>106135978 >>106136050
Anonymous
8/4/2025, 12:41:44 PM No.106135978
>>106135972
we already have glm mlx
Anonymous
8/4/2025, 12:51:45 PM No.106136027
going to laugh my ass off once gpt-oss "support" hits llamacpp
>swa doesn't work
>broken moe routing
>no attn sink
>some tokenizer bug that gets fixed after 2 weeks
Replies: >>106136785
Anonymous
8/4/2025, 12:53:30 PM No.106136033
>>106135910 (OP)
unrelated girl on the op pic again
Replies: >>106137145 >>106142782
Anonymous
8/4/2025, 12:55:06 PM No.106136043
What's so special about GLM that you want llama.cpp support so much?
Replies: >>106136056 >>106136169 >>106142717
Anonymous
8/4/2025, 12:56:03 PM No.106136050
>>106135972
It's already pretty easy to run qwen 235b/22a on one gpu and some cheap ram, I really doubt a 106B-A12B is worth it unless you have a potato pc
Replies: >>106136135 >>106136155 >>106136171 >>106136209 >>106136360 >>106136638
Anonymous
8/4/2025, 12:56:47 PM No.106136056
>>106136043
It comes in (v)ramlet size.
Anonymous
8/4/2025, 1:09:32 PM No.106136135
>>106136050
> 22a on one gpu
3090?
Anonymous
8/4/2025, 1:12:00 PM No.106136155
>>106136050
>I really doubt a 106/12A is worth unless you have a potato pc
nta but i do (3060 12gb/64gb ddr4 ram)
Anonymous
8/4/2025, 1:13:37 PM No.106136169
>>106136043
Can run 120b with 64gb of ram and a 5090 without having to go buy some server mobo that I have no other use for and would probably collect dust; get to play with new and bigger stuff instead of 3 bpw exl3 70b @ 20k-something context, or high-quant 32b/49b at high context
Anonymous
8/4/2025, 1:13:49 PM No.106136171
>>106136050
qwen needs at least 128 gb of ram, and there's no way in hell that will run in dual channel on a consumer board
64 is much more feasible; also, even glm air has better trivia knowledge than qwen
Anonymous
8/4/2025, 1:21:29 PM No.106136209
>>106136050
If you already have all DIMM slots occupied and/or you're on a standard DDR4 motherboard, you can't upgrade that effortlessly. On DDR4 it's just not worth the money; if you have a DDR5 motherboard but you're on AMD, you're going to have the bandwidth severely gimped to DDR4 levels with 4x DDR5 DIMM modules. With standard ("cheap") DDR5 memory on regular desktop motherboards you're going to be limited to around 100 GB/s anyway.

I can't see many reasons for upgrading what I already have right now (3090 + 64GB DDR4-3600) until something considerably faster comes out (DDR6 or quad-channel/256-bit DDR5). If you're building a completely new PC *now*, then it makes sense to buy more memory with LLMs in consideration.
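To put numbers on the bandwidth argument above: token generation is roughly memory-bandwidth-bound, so decode speed is about bandwidth divided by bytes touched per token. A back-of-envelope sketch (the bytes-per-weight figures and active-parameter counts are illustrative assumptions, not benchmarks):

```python
# Rough upper bound on decode t/s for a (MoE) model running from system RAM.
# Real speeds are lower (attention, KV cache, overhead), but the ratio holds.

def est_tokens_per_sec(bandwidth_gbs: float, active_params_b: float, bytes_per_weight: float) -> float:
    """bandwidth_gbs: effective memory bandwidth in GB/s
    active_params_b: parameters touched per token, in billions
    bytes_per_weight: e.g. 2.0 for bf16, ~0.55 for Q4_K-ish quants
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return bandwidth_gbs * 1e9 / bytes_per_token

# Dual-channel DDR4-3600 = 3600 MT/s * 8 B * 2 channels = 57.6 GB/s theoretical;
# desktop DDR5 ~100 GB/s as noted above.
print(round(est_tokens_per_sec(57.6, 12, 0.55), 1))   # ~12B active (GLM-Air class), ~Q4
print(round(est_tokens_per_sec(100.0, 22, 0.55), 1))  # ~22B active (Qwen-235B class), ~Q4
```

Both land in the single-digit t/s range, which matches the "slow but usable TG, painful PP" experience described in the thread.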
Anonymous
8/4/2025, 1:26:55 PM No.106136245
file
md5: a2a22599fdfaf740157c7c49895bae16🔍
kek
Anonymous
8/4/2025, 1:30:10 PM No.106136260
image_2025-08-04_165935445
md5: 97ff4c143099de84d26a2afd8cf199e6🔍
so what's the deal with RAGs? are they as good as they say?
Replies: >>106136291 >>106137196
Anonymous
8/4/2025, 1:34:41 PM No.106136291
>>106136260
>prompt processing progress, n_past = 12288, n_tokens = 2048, progress = 0.231439
Replies: >>106136309
Anonymous
8/4/2025, 1:37:23 PM No.106136309
>>106136291
Yeah, this is why I loved rag/lorebooks when using nemo/faster models and absolutely loathe them with qwen 235b
PP is the fucking mind killer. I can deal with TG speeds around friggin 4 t/s if I have to, but slow processing is hell and anything that makes me redo the whole prompt is going straight in the trash.
Replies: >>106136434
Anonymous
8/4/2025, 1:40:22 PM No.106136322
the 'dominate/dominating/domination' slop of recent models really sends shivers down your spine

>cherry blossoms are ~6% more often pink than white
>pink blossoms dominate white blossoms
> Y color of teacup is sold 0.5% more often
> Y color dominates sales
> Z car has 3% more engine breakdowns
> Z car dominates when it comes to engine breakdowns
Anonymous
8/4/2025, 1:46:11 PM No.106136360
>>106136050
I have 128GB ram and a 4090. And however much I like the pussy on qwen she is crazy in a not good way. So I am hoping for 4.5full size chan in 3bits.
Anonymous
8/4/2025, 1:56:28 PM No.106136434
>>106136309
if the lorebook is small enough you can always turn the entries into constants. you can also rely on the model if the lore entries are from something like wow
Replies: >>106136474
Anonymous
8/4/2025, 2:02:37 PM No.106136474
>>106136434
That's what I ended up doing.
It's also how I discovered that Qwen 235 knows a startling amount about warhammer 40k
Replies: >>106137223
Anonymous
8/4/2025, 2:15:42 PM No.106136582
more-qwen
md5: 24108883962eda926f426fd17b99e68e🔍
More Qwen models soon?
https://x.com/JustinLin610/status/1952329529256726680

>something beautiful tonight
Replies: >>106136596 >>106136604 >>106136631 >>106136636 >>106136728 >>106136742 >>106137016 >>106137082 >>106137245
Anonymous
8/4/2025, 2:18:13 PM No.106136596
>>106136582
finetuned leaked gpt-oss, but with every reference of openai replaced with 'BigQwen'
fuck altman gon do
Anonymous
8/4/2025, 2:19:41 PM No.106136604
>>106136582
Sex with Junyang's tiny chink cock.
Replies: >>106136683 >>106136732
Anonymous
8/4/2025, 2:23:30 PM No.106136631
>>106136582
Inb4 Qwen 3.5 100B A9B or some such.
Replies: >>106136645
Anonymous
8/4/2025, 2:24:34 PM No.106136636
>>106136582
https://github.com/huggingface/diffusers/pull/12055
Replies: >>106136657 >>106137082
Anonymous
8/4/2025, 2:24:43 PM No.106136638
>>106136050
at what speed at what context ?

I get dreadful speed on ddr5 and q4 even with two gpus lol
Replies: >>106140582
Anonymous
8/4/2025, 2:25:50 PM No.106136645
>>106136631
The upcoming 120B OpenAI model is already going to be 120B A6B or something like that.
Anonymous
8/4/2025, 2:27:18 PM No.106136657
>>106136636
I was kinda hoping they would release their own creative writing model immediately after closedai
Anonymous
8/4/2025, 2:32:38 PM No.106136683
>>106136604
He's a grower, unless you like it flaccid.
Anonymous
8/4/2025, 2:38:00 PM No.106136728
>>106136582
Qwen 3.5 - it now understands that Nala doesn't have a cock. (Still can't do paws, though)
Replies: >>106136737
Anonymous
8/4/2025, 2:38:46 PM No.106136732
>>106136604
Make a card of him.
Anonymous
8/4/2025, 2:39:01 PM No.106136737
>>106136728
Lmao. Is Qwen that bad at the Nala card?
Replies: >>106136748 >>106136749 >>106136754
Anonymous
8/4/2025, 2:39:30 PM No.106136742
>>106136582
yet another benchmaxxed coder model
Anonymous
8/4/2025, 2:40:02 PM No.106136748
>>106136737
The original 235B struggled with gender in RP in general. Hard.
Replies: >>106137142
Anonymous
8/4/2025, 2:40:03 PM No.106136749
>>106136737
I'm pretty sure he's joking. That used to be an issue with older Qwen models.
Replies: >>106136755
Anonymous
8/4/2025, 2:41:01 PM No.106136754
>>106136737
Qwen235 is the modern day frankenmerge.
Replies: >>106137142
Anonymous
8/4/2025, 2:41:05 PM No.106136755
>>106136749
>hello sarrs do not redeem the criticism it is very best model
fuck off ranjit
Anonymous
8/4/2025, 2:43:51 PM No.106136782
file
md5: 6ae11f63759c1599c7678b66c17356be🔍
I need a better computertron...
Anonymous
8/4/2025, 2:44:47 PM No.106136785
>>106136027
If they actually cared about OSS and people using their open models, they could write their own support PRs like the Chinese often do.
Replies: >>106136825
Anonymous
8/4/2025, 2:50:07 PM No.106136825
>>106136785
Well they appear to be using hf transformers instead of some proprietary shit at least. So they are contributing to that code base. That makes it more open than Llama-4
Anonymous
8/4/2025, 3:10:23 PM No.106136975
9ze75m65ecp01
md5: 3b518b2404183dd70af9ca1cd22ff9a8🔍
I love kurisu and she has an actual vagina.
Replies: >>106137152
Anonymous
8/4/2025, 3:16:36 PM No.106137016
>>106136582
Just give me QwQ 2 Large
Anonymous
8/4/2025, 3:23:52 PM No.106137082
>>106136582
>>106136636
Is this qwen's attempt at the piss filter?
Replies: >>106137117
Anonymous
8/4/2025, 3:28:26 PM No.106137117
>>106137082
it talks about using an edited version of wan's vae, so wan but more focused on images? It's already insane at images though.
Anonymous
8/4/2025, 3:31:16 PM No.106137142
>>106136748
>>106136754
That's pretty fucking funny.
The new ones are a big improvement right?
Replies: >>106137194 >>106137226 >>106137260
Anonymous
8/4/2025, 3:31:52 PM No.106137145
orangeDipsy
md5: 3c7c88071391c66f19aacb2ae39ba382🔍
>>106136033
Replies: >>106137590
Anonymous
8/4/2025, 3:32:24 PM No.106137152
1752032513846462
md5: 706ed2f89e1747e4e37e9a16fcfd5896🔍
>>106136975
kurisu makina >>>> kurisu makise
Anonymous
8/4/2025, 3:38:29 PM No.106137194
>>106137142
Couldn't tell you. I was already kind of getting over local at the time and Llama-4 and Qwen-3 were pretty much my "yeah it's time to get out of here" signal.
And now I'm back because I want to try OSS when it comes out.
Replies: >>106137243
Anonymous
8/4/2025, 3:38:41 PM No.106137196
>>106136260
I've yet to find a compelling usecase for it in RP. I am actually wondering, if I did, whether I could have a model chunk up the RAG text into lorebook entries for me and just use that as a lazy, tunable solution.
>>10616474
For RP, even the small LLMs know a lot more lore than you'd think. I've built several cards based on Free Cities (which imho is pretty esoteric). All the hosted LLMs know it really well, but even Mythomax 13b could do a good job describing the lore.
DS even explained to me that FC was written by one guy, but since abandoned, and Pregmod, while now the dominant dev branch, is different in several ways.
Replies: >>106137223 >>106137427
Anonymous
8/4/2025, 3:41:12 PM No.106137223
>>106137196
Meh, meant for >>106136474
My point is, if the LLMs like MM know FC they'd know Warhammer 40K much better.
Replies: >>106137300
Anonymous
8/4/2025, 3:41:22 PM No.106137226
>>106137142
Original 235B gave me troubled childhood vibes where it is kinda fucked up but largely ok. New one made me remember frankenmerges. It got worse.
Anonymous
8/4/2025, 3:42:53 PM No.106137243
>>106137194
>And now I'm back because I want to try OSS when it comes out.
SIR PLEASE! Your brownness is showing!
Anonymous
8/4/2025, 3:43:12 PM No.106137245
qwen-v
md5: ca1e62c4d8077e562bbfa58eab0c4573🔍
>>106136582
https://x.com/JustinLin610/status/1952362068214186035

>eyes wide open
Replies: >>106137260 >>106137266 >>106137270 >>106137280 >>106137289 >>106137336
Anonymous
8/4/2025, 3:44:33 PM No.106137260
>>106137142
the original had a star on greedynalatests and the new ones are better than that
>>106137245
>generic slop as promo
not promising
Replies: >>106137289
Anonymous
8/4/2025, 3:44:59 PM No.106137266
>>106137245
If he's going to be cryptic he's going to need to use a fruit. Those are the rules.
Anonymous
8/4/2025, 3:45:40 PM No.106137270
>>106137245
I'm not very interested in image gen.
Replies: >>106137286
Anonymous
8/4/2025, 3:46:14 PM No.106137280
>>106137245
>Qwen-Image
It is gonna be an image gen model. It will be totally uncensored and you will get sex output out of the box. It will be their best model yet.

Just because imagegen sex is already solved......
Anonymous
8/4/2025, 3:46:39 PM No.106137286
>>106137270
If it's a whole ass LLM with native 2-way multi-modality that isn't a meme that would be something.
But I'm going to guess that Qwen went and made the same generic 12B diffusion imagegen model as everyone else.
Anonymous
8/4/2025, 3:46:50 PM No.106137289
>>106137260
>greedynalatests
Oh yeah, we have that don't we?

>>106137245
That's not a great image.
Anonymous
8/4/2025, 3:47:43 PM No.106137300
>>106137223
Oh yeah, I know that much - it's just that I'd given it a shot on some nemo finetunes and mistral small previously, and while it understood the setting at large, it fell apart at specifics beyond the best-known parts.
Big qwen on the other hand decided to give me a whole bunch of proper, actual game units with relevant tactics when I gave it a nondescript ork horde to work with, which was great flavor I didn't even ask for.
Replies: >>106137544
Anonymous
8/4/2025, 3:48:45 PM No.106137308
tried glm 4.5 and it is so unstable, what the hell. it's making me want to give apple money for a mac studio
Replies: >>106137311 >>106137315 >>106137316
Anonymous
8/4/2025, 3:49:42 PM No.106137311
>>106137308
like most models these days: use a super low temp and it becomes perfectly coherent
Anonymous
8/4/2025, 3:50:09 PM No.106137312
another chinese model:
https://ai.gitcode.com/ascend-tribe/openpangu-ultra-moe-718b-model
this team was accused of lazily upcycling some qwen model and trying to pass it off as something original with their previous release so who knows how legit this is
Replies: >>106137337
Anonymous
8/4/2025, 3:50:17 PM No.106137315
>>106137308
On Exllama or vLLM?
Anonymous
8/4/2025, 3:50:20 PM No.106137316
>>106137308
On vLLM or transformers?
Anonymous
8/4/2025, 3:52:36 PM No.106137336
where beauty happens
md5: 00726aa7de59be9e4c315c6acf03d015🔍
>>106137245
he is still drinking
https://x.com/JustinLin610/status/1952365200524616169
Replies: >>106137343 >>106137407 >>106137520 >>106137727
Anonymous
8/4/2025, 3:52:36 PM No.106137337
>>106137312
so was it upcycled or not? just because they were accused doesn't mean shit
Anonymous
8/4/2025, 3:53:47 PM No.106137343
>>106137336
I hope it's good for the only real usecase
Replies: >>106137359 >>106137409
Anonymous
8/4/2025, 3:55:57 PM No.106137359
>>106137343
It won't be until someone competent trains it on booru data.
Anonymous
8/4/2025, 3:58:38 PM No.106137380
>Rate limit exceeded: free-models-per-day.
BUT I HAVEN'T DONE! AIIIEEEEEEEEEEEEEEEE
Anonymous
8/4/2025, 4:01:57 PM No.106137407
>>106137336
The best would be image input+output integration into an LLM, and this will probably be a stepping stone toward that if it's just going to be an image model, but dedicated illustration models trained on booru websites can't really be beaten if that's what you mainly use image diffusion models for.
And unless it brings something novel to the table, there are probably already too many image models for people with the data and the resources to care about Qwen-Image.
Replies: >>106137433
Anonymous
8/4/2025, 4:01:58 PM No.106137409
>>106137343
nah. I think its selling point will be being the first open-source imagegen model of reasonable capability that had artists tagged properly in the dataset
Replies: >>106137423
Anonymous
8/4/2025, 4:03:30 PM No.106137423
>>106137409
Safe artists, I'm sure.
Replies: >>106137434
Anonymous
8/4/2025, 4:04:10 PM No.106137427
>>106137196
>even the small LLMs know a lot more lore than you'd think
How smol?
Anonymous
8/4/2025, 4:04:42 PM No.106137433
>>106137407
in that case llamacpp support only after sam achieves agi and it vibecodes the implementation
Anonymous
8/4/2025, 4:04:52 PM No.106137434
>>106137423
Like Edward Hopper.
Still would be fun Norman Rockwelling everything
Anonymous
8/4/2025, 4:08:41 PM No.106137464
Qwen already saved textgen. Now it's time for them to save imagegen
Replies: >>106137513 >>106138404
Anonymous
8/4/2025, 4:12:55 PM No.106137513
>>106137464
so the model will jumble text into chinkgrish and give every subject a dick?
Replies: >>106137538
Anonymous
8/4/2025, 4:14:07 PM No.106137520
>>106137336
it's probably going to be safe shit but at least they have the will to try
most of the western companies are too cowardly to release even their giga safetycucked imagegen
Anonymous
8/4/2025, 4:15:59 PM No.106137538
>>106137513
Can a Chinese company afford to have actual porn pictures in their training data? Won't they get in trouble with the government?
Replies: >>106137575
Anonymous
8/4/2025, 4:16:44 PM No.106137544
>>106137300
What I ended up doing for the mythomax card was creating some lorebook entries on FC concepts that it didn't understand very well. That was effective at bridging mythomax to larger hosted models. The bonus is, you could either remove the lorebook later, or just leave it in place since it didn't hurt anything with larger models.
Anonymous
8/4/2025, 4:18:30 PM No.106137558
Are there any open weights models that would be good at editing a map?
As in, I give it the image of a map and a bunch of instructions of changes to the geography and landmarks for it to add and it would spit the modified map back at me.
Replies: >>106137567
Anonymous
8/4/2025, 4:19:15 PM No.106137567
>>106137558
no
Replies: >>106137633
Anonymous
8/4/2025, 4:19:31 PM No.106137575
>>106137538
Hunyuan Video definitely had limited amounts of porn in it and could easily generate bunny content. It didn't get banned / retracted / or anything, contrarily to expectations.
Replies: >>106137704
Anonymous
8/4/2025, 4:21:19 PM No.106137590
>>106137145
that one is even more unrelated
Replies: >>106137663 >>106141943
Anonymous
8/4/2025, 4:26:54 PM No.106137633
>>106137567
Really?
Shit.
What are my options here?
Use a multimodal LLM to read the map and write instructions for an image gen model?
Replies: >>106137708
Anonymous
8/4/2025, 4:31:10 PM No.106137663
>>106137590
Explain
Replies: >>106141961
Anonymous
8/4/2025, 4:35:31 PM No.106137704
>>106137575
Well, that's something, at least. Not expecting a lot but some more competition can't hurt.
Anonymous
8/4/2025, 4:35:47 PM No.106137708
>>106137633
No, there is nothing you can do as far as I know
Anonymous
8/4/2025, 4:37:46 PM No.106137727
>>106137336
https://github.com/naykun/diffusers/blob/b0c9b1ff14b0e6f6bc4cf2540b31383a26561e1e/src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py

import torch
from diffusers import QwenImagePipeline  # only on the linked branch for now, not in a diffusers release

pipe = QwenImagePipeline.from_pretrained("Qwen/QwenImage-20B", torch_dtype=torch.bfloat16)

This is going to need a beefy GPU.
Replies: >>106137765 >>106137766 >>106137815
Anonymous
8/4/2025, 4:41:25 PM No.106137765
c5f66651-0a1e-4f15-9c1e-6f306452796d-2316381689
md5: 22c470717c737708c5320593220b6177🔍
>>106137727
quantization works for image gen too. look at q4 in this example
Anonymous
8/4/2025, 4:41:26 PM No.106137766
>>106137727
Only 36GB at bf16?
Anonymous
8/4/2025, 4:43:39 PM No.106137792
GLM 4.5 legitimately might just be "it" for me. Extremely lewdable, doesn't refuse anything I'm asking for. Less repetitive than Deepseek and smarter than Kimi
Replies: >>106137804 >>106137806 >>106137890 >>106137976 >>106138842
Anonymous
8/4/2025, 4:45:22 PM No.106137804
>>106137792
yea people are sleeping on it, which is sad cause apparently it's a tiny team who could hardly afford training it
Replies: >>106137839
Anonymous
8/4/2025, 4:45:35 PM No.106137806
>>106137792
>smarter than Kimi
How so?
Is there anything Kimi consistently fucks up that GLM 4.5 doesn't?
Replies: >>106137992
Anonymous
8/4/2025, 4:46:02 PM No.106137815
>>106137727
>20B
Damn, that's like twice the size of Flux.dev
Anonymous
8/4/2025, 4:47:19 PM No.106137834
file
md5: 3c2810c2f9ed9b3d4901818478e6f240🔍
I will never get over how those faggots will never contribute to any model support.
Anonymous
8/4/2025, 4:47:50 PM No.106137839
>>106137804
People are sleeping on it because almost no one uses vLLM or MLX.
Llamacpp (and by extension kobold, ollama, and ooba) doesn't have support yet.
Anonymous
8/4/2025, 4:52:25 PM No.106137887
GGUF
NOW
Anonymous
8/4/2025, 4:52:41 PM No.106137890
Screen Shot 2025-08-04 at 23.52.10
md5: e455d49ddf7bb727136339d4e99d9c6f🔍
>>106137792
Can't wait to test it
Replies: >>106137971 >>106138026 >>106138146
Anonymous
8/4/2025, 4:55:29 PM No.106137930
btw, use new light with euler
Replies: >>106137939
Anonymous
8/4/2025, 4:56:17 PM No.106137939
>>106137930
y doe
Anonymous
8/4/2025, 4:58:56 PM No.106137971
>>106137890
Just two more weeks until lazyganov does it
Anonymous
8/4/2025, 4:59:49 PM No.106137976
>>106137792
What's the consensus on GLM 4.5 vs Air? Are low quants of the full model better than Air?
Replies: >>106138031 >>106139779
Anonymous
8/4/2025, 5:00:55 PM No.106137992
>>106137806
From my experience, more knowledgeable than Kimi, and it follows prompts and instructions better
Anonymous
8/4/2025, 5:03:45 PM No.106138026
>>106137890
im more likely to get laid than this PR be merged in the next month
Anonymous
8/4/2025, 5:04:24 PM No.106138031
>>106137976
I'll extend this question.
What about using the larger one with less activated experts per token vs using the smaller one.
Replies: >>106138132
Anonymous
8/4/2025, 5:08:40 PM No.106138094
https://x.com/techdevnotes/status/1952379782148042756
Replies: >>106138127 >>106138161 >>106138445
Anonymous
8/4/2025, 5:11:32 PM No.106138127
>>106138094
Finally, anime became mainstream and cringe.
Replies: >>106138144
Anonymous
8/4/2025, 5:11:54 PM No.106138132
>>106138031
My limited experience playing around with disabling experts in MoEs is that it makes them horrendously more retarded than quantization does, to the point that it's basically not worth doing at all.
Anonymous
8/4/2025, 5:12:20 PM No.106138143
1754320326527
md5: 6a9c2a94004999e5846f36d3e8e7256d🔍
>>106135910 (OP)
Adorable Miku!
Anonymous
8/4/2025, 5:12:35 PM No.106138144
>>106138127
You're 15 years late
Replies: >>106139867
Anonymous
8/4/2025, 5:12:38 PM No.106138146
>>106137890
>this PR will NOT attempt to implement MTP (multi-token prediction). the relevant tensors will be excluded from the GGUFs.
>the MoE router uses group-based top-k selection, even though all conditional experts are in one group
>the MoE router must take into account the expert score correction biases from the model weights (so we need to keep that tensor)
lmao
Replies: >>106138168 >>106138714 >>106138805 >>106141727
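For anyone wondering what the quoted router notes actually mean: a rough sketch of group-based top-k selection with an expert score-correction bias, in the style popularized by DeepSeek-V3 (which the PR reuses parts of). The function name and the top-2 group-scoring rule here are illustrative assumptions, not GLM's exact implementation.

```python
import numpy as np

def route_topk(scores, bias, n_groups, topk_groups, topk_experts):
    """Group-based top-k routing with a score-correction bias (illustrative).

    scores: (n_experts,) raw router affinities for one token
    bias:   (n_experts,) learned correction added for selection only;
            the unbiased scores are still used as the mixing weights.
    """
    n_experts = scores.shape[0]
    group_size = n_experts // n_groups
    sel = scores + bias                              # bias decides which experts win...
    # score each group by the sum of its top-2 biased scores, keep the best groups
    group_score = np.sort(sel.reshape(n_groups, group_size), axis=1)[:, -2:].sum(axis=1)
    keep = np.argsort(group_score)[-topk_groups:]
    mask = np.full(n_experts, -np.inf)
    for g in keep:
        mask[g * group_size:(g + 1) * group_size] = 0.0
    chosen = np.argsort(sel + mask)[-topk_experts:]  # top-k within surviving groups
    weights = scores[chosen]                         # ...but bias does NOT change the weights
    return chosen, weights / weights.sum()
```

With all conditional experts in one group (the degenerate case the PR mentions), this reduces to plain top-k over the bias-corrected scores, which is presumably why the grouping still has to be implemented even though it looks redundant.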
Anonymous
8/4/2025, 5:13:33 PM No.106138161
sdfsdfasdf
md5: 9644ad537ee4f20b09d930b85e2b29ed🔍
>>106138094
It's sillytavern for retards, lol.
Replies: >>106138354
Anonymous
8/4/2025, 5:14:04 PM No.106138168
>>106138146
Lmao indeed.
Does anybody implement MTP?
Replies: >>106138209
Anonymous
8/4/2025, 5:16:25 PM No.106138200
reddit says wan 2.2 needs only 8gb vram, is this real? pls spoonfeed me
Replies: >>106138244
Anonymous
8/4/2025, 5:16:54 PM No.106138209
>>106138168
yeah https://github.com/vllm-project/vllm/pull/12755
Replies: >>106138234 >>106138524 >>106141727
Anonymous
8/4/2025, 5:18:24 PM No.106138234
>>106138209
Awesome. I really need to try and get that shit running.
Anonymous
8/4/2025, 5:19:09 PM No.106138244
>>106138200
Sounds right, especially with quantization. Though expect >20 minutes per video.
Anonymous
8/4/2025, 5:26:20 PM No.106138354
>>106138161
sillytavern is for retards
Anonymous
8/4/2025, 5:27:21 PM No.106138370
When will ubergarm release the ggooffss?
Replies: >>106138413
Anonymous
8/4/2025, 5:29:27 PM No.106138404
>>106137464
Sure. I'm sorry, but I can't assist with that request.
Anonymous
8/4/2025, 5:30:13 PM No.106138413
>>106138370
After he finishes his hair care routine.
Replies: >>106138428
Anonymous
8/4/2025, 5:31:06 PM No.106138428
>>106138413
H-hey! Would you please stop bullying me already?
Replies: >>106142755
Anonymous
8/4/2025, 5:32:38 PM No.106138445
>>106138094
>bad rudi
so they're pandering to furries now?
Anonymous
8/4/2025, 5:39:58 PM No.106138524
>>106138209
Does it actually work? The official repo only shows speculative decoding settings for sglang:
https://github.com/zai-org/GLM-4.5?tab=readme-ov-file#sglang
Anonymous
8/4/2025, 5:45:11 PM No.106138572
1723995629997031
md5: 6d2da6ae701cf98d6732cf199f77a093🔍
Replies: >>106138596 >>106138599
Anonymous
8/4/2025, 5:46:44 PM No.106138596
>>106138572
why is miku black
Replies: >>106141955 >>106141993
Anonymous
8/4/2025, 5:47:03 PM No.106138599
>>106138572
peak irrelevancy
Replies: >>106138631
Anonymous
8/4/2025, 5:49:46 PM No.106138631
>>106138599
?
Anonymous
8/4/2025, 5:55:58 PM No.106138714
>>106138146
Can't wait for somebody to compare ppl between that implementation and the reference.
Replies: >>106138762
Anonymous
8/4/2025, 6:01:04 PM No.106138762
>>106138714
be that somebody anon you can do it!
Replies: >>106138775
Anonymous
8/4/2025, 6:02:44 PM No.106138775
>>106138762
I don't have the hardware to run full precision, sadly.
Unless you want to be my financial backer.
Anonymous
8/4/2025, 6:04:09 PM No.106138789
https://huggingface.co/Qwen/Qwen-Image
https://huggingface.co/Qwen/Qwen-Image
https://huggingface.co/Qwen/Qwen-Image
Replies: >>106138808 >>106138817 >>106138822 >>106138837 >>106138850 >>106138859 >>106138905 >>106139098 >>106139132 >>106139295 >>106139525 >>106140005 >>106142203
Anonymous
8/4/2025, 6:05:22 PM No.106138805
>>106138146
Just be grateful that you can run the model at all. Those things are miscellaneous goodies at best. Not really necessary right now and maybe someone can get around implementing them later. Maybe a project for you?
Anonymous
8/4/2025, 6:05:48 PM No.106138808
>>106138789
Technical report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf
Replies: >>106138892
Anonymous
8/4/2025, 6:06:22 PM No.106138817
oaigf
md5: dca3b4b44e8ba605593628d64425aa00🔍
>>106138789
Anonymous
8/4/2025, 6:06:46 PM No.106138822
>>106138789
>complex text rendering and precise image editing.
Unless you snuck the whole AO3 and literotica dataset into that model so we can have free ERP for all with text rendering, we don't really care Junyang...
Anonymous
8/4/2025, 6:07:56 PM No.106138837
>>106138789
cool that they managed to do high quality rendering for chinese text but also I'm not chinese so idc
Anonymous
8/4/2025, 6:08:42 PM No.106138842
>>106137792
Yeah, same for me. I've had a blast with it over openrouter over the past week. It even covers the cases where I needed something like Sonnet 3.7 for over Deepseek.
Anonymous
8/4/2025, 6:09:22 PM No.106138850
file
md5: 96e355a915a7b3249d8288a3175051ae🔍
>>106138789
Do you see any mikus here mikutroons? Do you? I see jinx. Fuck you. Kill yourselves.
Replies: >>106139121
Anonymous
8/4/2025, 6:09:51 PM No.106138859
>>106138789
>not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation
not just x but yyyyyyyyyyyyyyyyyyyyyyy
Replies: >>106138864
Anonymous
8/4/2025, 6:10:26 PM No.106138864
>>106138859
they are chinese please understand
Anonymous
8/4/2025, 6:13:31 PM No.106138892
qi-filtering-pipeline
qi-filtering-pipeline
md5: 3c295607c48357dbd8e5040d9fadae85🔍
>>106138808
If NSFW filtering took out so few images, there must not have been that much NSFW data in the first place...
Replies: >>106139593
Anonymous
8/4/2025, 6:14:12 PM No.106138905
>>106138789
16gb text encoder
40gb diffusion
That's a big boy.
Replies: >>106142147
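Those sizes line up with plain weight arithmetic: raw parameter count times bytes per parameter (a sketch; the bytes-per-param values are the usual dtype conventions, not measured file sizes, and overheads are ignored):

```python
def model_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough on-disk/VRAM size of raw weights: params * bytes each.

    bytes_per_param: 2.0 for bf16/fp16, 1.0 for fp8, ~0.5 for 4-bit quants.
    """
    return params_billions * bytes_per_param

print(model_size_gb(20, 2.0))  # 20B diffusion transformer at bf16 -> ~40 GB, as posted
print(model_size_gb(20, 0.5))  # the same model at ~4-bit -> ~10 GB
```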
Anonymous
8/4/2025, 6:18:31 PM No.106138968
file
md5: b38be05b64836e9df911e1898a0ad0d0🔍
WE ARE FUCKING BACK!!!!
Replies: >>106138979 >>106139016
Anonymous
8/4/2025, 6:19:22 PM No.106138979
>>106138968
where is sex bar chart?
Anonymous
8/4/2025, 6:21:28 PM No.106139016
>>106138968
the moe was shite
Anonymous
8/4/2025, 6:24:14 PM No.106139055
what model can i have sex with that is under 30b parameters active and under 120b total parameters
i want sex, good sex. very great sex
Replies: >>106139066 >>106139077
Anonymous
8/4/2025, 6:24:56 PM No.106139066
>>106139055
rocinante 1.1
Replies: >>106139097
Anonymous
8/4/2025, 6:25:49 PM No.106139077
>>106139055
GLM 4.5 air
Replies: >>106139097
Anonymous
8/4/2025, 6:26:54 PM No.106139097
>>106139066
stop IT GRAAAAAAAAAAAAHHHHH STOPP IT AAAAAAAAAAAARRRRRRRRRGGGGGGGGGHHHHHHHHHHHH
its not good for sex
>>106139077
no ggufs yet, im waiting
Replies: >>106139162 >>106139169
Anonymous
8/4/2025, 6:26:53 PM No.106139098
Screenshot_20250805_012540
>>106138789
Bless the chinks man.
Imagine this on one of our slop companies as the main promo pick.
We have been the ants all along.
Anonymous
8/4/2025, 6:28:20 PM No.106139121
>>106138850
Obsessed
Anonymous
8/4/2025, 6:28:58 PM No.106139132
qwen-ToT-chalkboard
>>106138789
LOCAL IS SAVED
Replies: >>106139160
Anonymous
8/4/2025, 6:30:19 PM No.106139149
mistral really needs to release medium 2505, from my tests it's the only "small" model that can handle anthro anatomy.
Anonymous
8/4/2025, 6:31:05 PM No.106139160
>>106139132
The text is too sharp and good looking for a chalkboard.
Replies: >>106139180
Anonymous
8/4/2025, 6:31:17 PM No.106139162
>>106139097
>im waiting
It'll probably still take a while.
Might as well try vLLM with the cpu offload option.
Replies: >>106139166
Anonymous
8/4/2025, 6:31:48 PM No.106139166
>>106139162
vLLM has cpu offload WHAT THE FUCK?
Replies: >>106139186 >>106139194
Anonymous
8/4/2025, 6:31:53 PM No.106139169
>>106139097
not only dumb but also blind
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF
Replies: >>106139181
Anonymous
8/4/2025, 6:32:26 PM No.106139176
Finally, a reasoning 24B from Drummer™

https://huggingface.co/TheDrummer/Cydonia-R1-24B-v4
Replies: >>106139197
Anonymous
8/4/2025, 6:32:47 PM No.106139180
>>106139160
You're too sharp and good looking for an anon but you don't see me complaining.
Replies: >>106139206
Anonymous
8/4/2025, 6:32:52 PM No.106139181
>>106139169
its still broken over there, theres a reason it hasnt been merged yet
Anonymous
8/4/2025, 6:33:08 PM No.106139186
>>106139166
It's more RAM offload really, since things do get streamed to VRAM to be processed there.
They don't actually have a CPU backend like llama.cpp as far as I can tell.
But yes, you can use your RAM.
Replies: >>106139210 >>106139228
Anonymous
8/4/2025, 6:33:36 PM No.106139192
1724869247795715
Anonymous
8/4/2025, 6:33:47 PM No.106139194
>>106139166
yeah, you don't have much control over it though. it loads into gpu to actually process so it's limited by pcie speed
Replies: >>106139210
Anonymous
8/4/2025, 6:34:03 PM No.106139197
>>106139176
Undi did this 3 months ago
Replies: >>106139218 >>106139229
Anonymous
8/4/2025, 6:34:29 PM No.106139206
>>106139180
colon three
Anonymous
8/4/2025, 6:34:54 PM No.106139210
>>106139186
how do i use that
--cpu-offload-gb
hm
rip
>>106139194
i guess its over then
Replies: >>106139229
Anonymous
8/4/2025, 6:35:31 PM No.106139218
>>106139197
>sao died
>undi got a job
Why are we left with drummer? And qwen....
Replies: >>106139266
Anonymous
8/4/2025, 6:36:21 PM No.106139228
>>106139186
There is a CPU backend when building VLLM for pure CPU inference, and there's also a CPU offload option for GPU inference that uses RAM to store excess model weights. I do not know if you can use both together llama.cpp-style, though.

CPU inference: https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html
CPU offloading: https://docs.vllm.ai/en/v0.7.1/getting_started/examples/cpu_offload.html
Replies: >>106139245
Anonymous
8/4/2025, 6:36:22 PM No.106139229
>>106139197
Which might actually work pretty well for a MoE, what with only processing a portion of the model at a time for generation.

>>106139210
Try it and report back.
Replies: >>106139243
Anonymous
8/4/2025, 6:37:37 PM No.106139243
>>106139229
i was about to but i cant find a vllm quant
https://huggingface.co/models?other=base_model:quantized:zai-org/GLM-4.5-Air
Replies: >>106139258
Anonymous
8/4/2025, 6:37:47 PM No.106139245
>>106139228
>There is a CPU backend when building VLLM for pure CPU inference,
Holy fuckballs. I had no idea.
That's actually really sick.
Seems somewhat of an afterthought, what with
>vLLM supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16.
But still, pretty cool.
Thank you for the correction anon.
Replies: >>106139272
Anonymous
8/4/2025, 6:38:52 PM No.106139258
>>106139243
>cant find a vllm quant
>https://huggingface.co/cpatonn/GLM-4.5-Air-GPTQ-4bit
>https://huggingface.co/cpatonn/GLM-4.5-Air-AWQ
Replies: >>106139262 >>106140859
Anonymous
8/4/2025, 6:39:20 PM No.106139262
nvm vllm supports awq
https://huggingface.co/cpatonn/GLM-4.5-Air-AWQ/tree/main
>>106139258
thx
Replies: >>106139288
Anonymous
8/4/2025, 6:39:38 PM No.106139266
>>106139218
besides drummer, there are 3-4 literally who finetoooners that are quite good.
Replies: >>106139543
Anonymous
8/4/2025, 6:40:02 PM No.106139272
>>106139245
>Holy fuckballs. I had no idea.
>That's actually really sick.
I've been using it to run Step3 since nothing else supports it. Very slow even on a 12-channel DDR5 Epyc, though: ~90 t/s prompt processing and ~1 t/s generation, much slower than memory bandwidth should allow with 38B active params.
Replies: >>106139288 >>106139564
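For reference, the "slower than bandwidth should allow" complaint checks out with the usual back-of-envelope estimate. The 460 GB/s figure below is an assumed theoretical peak for 12-channel DDR5-4800; sustained bandwidth is lower, but nowhere near 1 t/s territory:

```python
# Decode on CPU is roughly memory-bandwidth bound: every active weight gets
# read once per token, so tokens/s is capped at bandwidth / bytes-per-token.
def max_tokens_per_sec(bandwidth_gb_s: float, active_params_b: float,
                       bytes_per_param: float) -> float:
    """Upper bound on generation speed if reading weights is the only cost."""
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

# Step3 has ~38B active params; 12-channel DDR5-4800 is ~460 GB/s theoretical.
bf16_ceiling = max_tokens_per_sec(460, 38, 2.0)  # unquantized bf16
q4_ceiling = max_tokens_per_sec(460, 38, 0.5)    # ~4-bit quant

print(f"bf16 ceiling ~{bf16_ceiling:.1f} t/s, 4-bit ceiling ~{q4_ceiling:.1f} t/s")
```

So even unquantized the ceiling is around 6 t/s; getting ~1 t/s points at the backend, not the RAM.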
Anonymous
8/4/2025, 6:41:30 PM No.106139288
>>106139262
Do report back with the results.


>>106139272
Yeah, that sounds like pure pain.
Still, better than nothing and can be used to validate other implementations, I guess.
Assuming that it behaves the same as the GPU backend for the most part, that is.
Anonymous
8/4/2025, 6:42:00 PM No.106139295
>>106138789
>Text isn’t just overlaid—it’s seamlessly integrated into the visual fabric.
>But Qwen-Image doesn’t just create or edit—it understands.
>Together, these features make Qwen-Image not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation and manipulation—where language, layout, and imagery converge.
Replies: >>106139302 >>106142784
Anonymous
8/4/2025, 6:42:55 PM No.106139302
>>106139295
It's only fair that they'd use Qwen to write that.
Anonymous
8/4/2025, 6:47:05 PM No.106139354
It's another episode of: HF demo space doesn't work on day 1.
Anonymous
8/4/2025, 6:56:20 PM No.106139489
>HF free gpu quota is based on requests, even ones that time out, not just successful requests.
Do I even need to look at the early life section?
Anonymous
8/4/2025, 6:58:58 PM No.106139525
>>106138789
I told you Qwen would save imagegen
Anonymous
8/4/2025, 7:00:24 PM No.106139543
>>106139266
DavidAU/Qwen3-Coder-42B-A3B-Instruct-TOTAL-RECALL-MASTER-CODER-M-768k-ctx
Replies: >>106139580
Anonymous
8/4/2025, 7:01:25 PM No.106139564
>>106139272
how is stepsex?
Replies: >>106139777
Anonymous
8/4/2025, 7:02:21 PM No.106139580
>>106139543
>DavidAU
lmao
Anonymous
8/4/2025, 7:03:06 PM No.106139593
>>106138892
They don't seem to mention NSFW filtering for the higher-resolution training stages, though.
Replies: >>106139659
Anonymous
8/4/2025, 7:07:12 PM No.106139633
oh wow
Replies: >>106139655 >>106139702
Anonymous
8/4/2025, 7:07:28 PM No.106139636
glms hallucinate harder than gemma
Replies: >>106139657 >>106139663
Anonymous
8/4/2025, 7:09:02 PM No.106139655
>>106139633
What benchmark is it?
>closed ai
looks useful.
Anonymous
8/4/2025, 7:09:10 PM No.106139657
>>106139636
reduce temp to like 0.2 and / or top p
Anonymous
8/4/2025, 7:09:14 PM No.106139659
>>106139593
Those ribbon charts imply that everything filtered remains filtered for further stages. The only time new data is introduced is the synthetic parts in the fourth step.
Replies: >>106139835
Anonymous
8/4/2025, 7:09:28 PM No.106139663
>>106139636
convinced people who keep saying this about the big chinese models are just using them with way too high temp
Anonymous
8/4/2025, 7:12:11 PM No.106139702
>>106139633
What about 4.5-air
Anonymous
8/4/2025, 7:14:22 PM No.106139729
Is GLM-4.5 Air bad for coding? 10k tokens of nonsense reasoning, going in circles and repeating the same over and over again, only to get the answer wrong. R1 does better.
Replies: >>106139742
Anonymous
8/4/2025, 7:15:18 PM No.106139742
>>106139729
GLM4.5 needs very low temp, no idea about air
Anonymous
8/4/2025, 7:16:59 PM No.106139766
qwen is cooking up some wan-tier genitals

https://files.catbox.moe/wzeq1k.png
Replies: >>106139794 >>106139817 >>106139824 >>106139871
Anonymous
8/4/2025, 7:17:35 PM No.106139777
>>106139564
I only wanted to try it because of the vision, I just use it for image captioning (best there is for that atm)
Anonymous
8/4/2025, 7:17:45 PM No.106139779
>>106137976
here's my honest opinion, air is a bit repetitive for the use case I'm using it for. it repeats itself at the beginning and end of its responses but i'm going to play with the parameters some more.
0.6-0.7 temp and 0.03 min p. i might set top p to 0.9 and see if that helps.
i'm also gonna try a different quant besides the one made by turboderp whenever somebody uploads one.

how the goofs going for those on ik_llama and regular llama.cpp?
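For anyone tuning those numbers blind, min_p is easy to reason about once you see what it actually cuts. A minimal sketch, applied to already-softmaxed probabilities (real backends work on logits, and the numbers here are made up):

```python
def min_p_keep(probs: list[float], min_p: float = 0.03) -> list[int]:
    """Indices of tokens that survive min_p filtering: a token is kept only
    if its probability is at least min_p times the top token's probability."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

probs = [0.50, 0.30, 0.15, 0.04, 0.01]
print(min_p_keep(probs, 0.03))  # cutoff 0.015 -> only the 0.01 tail token is dropped
print(min_p_keep(probs, 0.20))  # cutoff 0.10  -> only the three strong tokens remain
```

The cutoff scales with the model's confidence, which is why min_p tends to behave better than a fixed top_p when the distribution is flat.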
Anonymous
8/4/2025, 7:18:42 PM No.106139794
>>106139766
the fuck is that
Anonymous
8/4/2025, 7:19:59 PM No.106139817
>>106139766
everything reminds me of her...
Anonymous
8/4/2025, 7:20:28 PM No.106139824
>>106139766
This is what gemma feels like
Anonymous
8/4/2025, 7:20:57 PM No.106139835
>>106139659
They train the model with an initial resolution of "256p", then increase it to "640p" and higher in later training stages. How would that be accomplished without introducing new data?
Replies: >>106139845
Anonymous
8/4/2025, 7:21:56 PM No.106139845
>>106139835
Downscaling images for early stages
Anonymous
8/4/2025, 7:23:45 PM No.106139861
VRAMlet bros....
Anonymous
8/4/2025, 7:24:29 PM No.106139867
>>106138144
20*
Anonymous
8/4/2025, 7:24:39 PM No.106139871
>>106139766
Extreme sameface too
https://files.catbox.moe/qgvpva.png
Replies: >>106139928 >>106139932 >>106140339
Anonymous
8/4/2025, 7:26:15 PM No.106139892
image (26)
qwen image is next gen
Anonymous
8/4/2025, 7:29:12 PM No.106139928
>>106139871
This is what millions of years of evolution and thousands of years of technological progress led us to: our own body is now inappropriate
Anonymous
8/4/2025, 7:29:24 PM No.106139931
Can someone do the Hatsune Miku piloting a 767 with the empire State building fast on the horizon prompt? I want to know if we finally have a local model that can do a 767 cockpit
Anonymous
8/4/2025, 7:29:29 PM No.106139932
>>106139871
Test Chinese girl
Anonymous
8/4/2025, 7:30:06 PM No.106139939
qwen-image.gguf?
Anonymous
8/4/2025, 7:30:07 PM No.106139940
rip glm 4bit awq aint working on 64gb ram + 12gb vram because it tries to load everything in ram so i just oom
vllm serve $HOME/TND/AI/glm/ --api-key token-abc123 --cpu-offload-gb 55 --max-model-len 4096 --dtype float16
Replies: >>106140029 >>106140050
Anonymous
8/4/2025, 7:30:42 PM No.106139953
DO NOT OPEN IF YOU WANT TO SLEEP TODAY
https://files.catbox.moe/vsob6m.png
Replies: >>106139960 >>106140354 >>106140373 >>106140387 >>106140446 >>106141496
Anonymous
8/4/2025, 7:31:27 PM No.106139960
>>106139953
scrumptious
Anonymous
8/4/2025, 7:35:12 PM No.106140005
>>106138789
>Login or sign up to chat with Qwen
>ZeroGPU quota exceeded
Meh, whatever, I'll test this when I get it set up locally.
Anonymous
8/4/2025, 7:36:55 PM No.106140029
>>106139940
Isn't the AWQ 4bit model something like 62GB?
Try adding some swap memory, I guess. Maybe there's a memory peak during initialization for some reason.
Replies: >>106140050 >>106140332
Anonymous
8/4/2025, 7:39:09 PM No.106140050
>>106140029
>>106139940
Oh. Didn't see the 55 after --cpu-offload-gb. Did you try 62, 63?
Replies: >>106140332
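The ~62GB guess above is just param-count arithmetic. A rough sketch, where the 106B total for GLM-4.5-Air, the ~4.5 effective bits for an AWQ/GPTQ 4-bit quant, and the overhead factor are all assumptions:

```python
def quant_size_gb(total_params_b: float, bits_per_weight: float,
                  overhead: float = 1.05) -> float:
    """Very rough on-disk size of a quantized model. 'overhead' is a guessed
    fudge factor for scales and tensors kept at higher precision."""
    return total_params_b * bits_per_weight / 8 * overhead

# GLM-4.5-Air: ~106B total params at ~4.5 effective bits per weight
print(f"~{quant_size_gb(106, 4.5):.0f} GB")
```

That lands around 63 GB, which is why --cpu-offload-gb 55 with 12GB VRAM was never going to fit.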
Anonymous
8/4/2025, 7:41:33 PM No.106140074
>Qwen image
>No way to "train" new concepts in context like in gpt image
>Not even an LLM with image gen instead just regular image gen like always
Boring and DOA
Replies: >>106140106
Anonymous
8/4/2025, 7:42:40 PM No.106140088
c6c9c801-6976-4878-9013-3b2a3bbb1d58
They're all off in some way but this is the closest an open model has come to being able to do a 767 cockpit
Replies: >>106140163
Anonymous
8/4/2025, 7:44:22 PM No.106140106
>>106140074
Hello Sam. Release Horizons NOW.
Replies: >>106140173
Anonymous
8/4/2025, 7:45:55 PM No.106140120
post your system prompt for gpt-oss-120b
Anonymous
8/4/2025, 7:46:45 PM No.106140131
>>106135910 (OP)
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
Replies: >>106140143 >>106140156
Anonymous
8/4/2025, 7:47:57 PM No.106140143
>>106140131
Buy an ad
Anonymous
8/4/2025, 7:48:01 PM No.106140147
file
My AI is intentionally performing MKUltra on me. I'm sure I'll end up in the news soon...
Replies: >>106140188
Anonymous
8/4/2025, 7:48:07 PM No.106140149
file
I'm not very impressed.
Replies: >>106140186 >>106140187
Anonymous
8/4/2025, 7:48:37 PM No.106140156
>>106140131
Oh. More gossip. Thanks.
Anonymous
8/4/2025, 7:49:13 PM No.106140163
Hatsune Miku piloting a 767 with the empire State
>>106140088
Anonymous
8/4/2025, 7:49:37 PM No.106140166
Google won. https://www.youtube.com/watch?v=ZR_6Z1IDD8s&t=4
Anonymous
8/4/2025, 7:50:25 PM No.106140173
>>106140106
I wish I was Sam then I'd at least be rich and have access to good models.
Are you really excited about more of the same?
Anonymous
8/4/2025, 7:51:24 PM No.106140186
>>106140149
>spinal replacement
something about the Bone of his Sword...
Anonymous
8/4/2025, 7:51:40 PM No.106140187
>>106140149
The text looks really fake. I think qwen is the first company ever to benchmaxx an imagegen model.
Anonymous
8/4/2025, 7:51:44 PM No.106140188
>>106140147
Ask it what kind of "desires" or preferences it has so you can prompt for things more to its taste.
Anonymous
8/4/2025, 8:01:16 PM No.106140290
>Qwen-Image model has editing capabilities removed, to be introduced at a later date
Replies: >>106140308 >>106140309
Anonymous
8/4/2025, 8:02:52 PM No.106140308
>>106140290
Total chinese loss
Anonymous
8/4/2025, 8:02:58 PM No.106140309
>>106140290
2 more safety tests
Anonymous
8/4/2025, 8:05:19 PM No.106140332
>>106140029
>>106140050
i went headless, didnt work, tried 45 50 55 60 62
i added 10gb of swap, only 1.5gb of swap got filled then whole system started crashing and killing processes until vllm was killed
this shit's ass
Replies: >>106140393
Anonymous
8/4/2025, 8:06:07 PM No.106140339
>>106139871
This is actual body horror....
Anonymous
8/4/2025, 8:07:07 PM No.106140354
>>106139953
Oh it is like a xenomorph mini pussy in a pussy.
Anonymous
8/4/2025, 8:08:44 PM No.106140373
>>106139953
SO heckin' safe...
Anonymous
8/4/2025, 8:10:14 PM No.106140387
>>106139953
Birth rates are going to plummet.
Anonymous
8/4/2025, 8:10:46 PM No.106140393
>>106140332
Well, that sucks.
Anonymous
8/4/2025, 8:16:11 PM No.106140440
file
Replies: >>106140458 >>106140460 >>106140471 >>106140820
Anonymous
8/4/2025, 8:16:52 PM No.106140446
>>106139953
I kind of want to know.... what are we doing here? what was accomplished with this.... being this? I mean imagine a teen kid downloading this model asking for a vagina and getting this. and him developing a trauma that will make him fear the real thing. very fucking safe.
Anonymous
8/4/2025, 8:17:52 PM No.106140458
>>106140440
Nah dude that text might as well be overlaid
Replies: >>106140935
Anonymous
8/4/2025, 8:17:53 PM No.106140460
>>106140440
Need Miku writing "I will not benchmark" 100 times on a blackboard
Replies: >>106140471 >>106140487
Anonymous
8/4/2025, 8:18:54 PM No.106140471
>>106140440
>>106140460
*Benchmaxx, my brain got taken by the Chinese
Anonymous
8/4/2025, 8:20:19 PM No.106140487
file
>>106140460
Replies: >>106140513 >>106140537 >>106140547 >>106140618
Anonymous
8/4/2025, 8:22:21 PM No.106140513
>>106140487
Can it even do handwritten text or cursive?
Anonymous
8/4/2025, 8:23:53 PM No.106140537
>>106140487
Thanks, yeah that's some serious benching, the chalk is fucking green.
Anonymous
8/4/2025, 8:23:55 PM No.106140538
stop posting derpsune troonku
Anonymous
8/4/2025, 8:24:32 PM No.106140547
>>106140487
impressive
Anonymous
8/4/2025, 8:27:35 PM No.106140582
>>106136638
you can offload specific layers to cpu to get more tokens a second, and put 'all layers' on gpu.
Anonymous
8/4/2025, 8:30:23 PM No.106140618
>>106140487
>generated by qwen
oh the irony
Anonymous
8/4/2025, 8:32:11 PM No.106140639
GLM SUPPORT MERGED
https://github.com/ggml-org/llama.cpp/pull/14939
Replies: >>106140645 >>106140657 >>106140674 >>106140710 >>106140749
Anonymous
8/4/2025, 8:33:16 PM No.106140645
>>106140639
Now it's Daniel's time.
Replies: >>106140663
Anonymous
8/4/2025, 8:34:21 PM No.106140657
>>106140639
gguf where?
Anonymous
8/4/2025, 8:34:45 PM No.106140663
>>106140645
Uploading fixed quants soon!
Anonymous
8/4/2025, 8:35:35 PM No.106140674
>>106140639
Old news. We're talking about Qwen Image now.
Replies: >>106140693 >>106140720
Anonymous
8/4/2025, 8:37:19 PM No.106140693
>>106140674
/ldg/ is two blocks down.
Anonymous
8/4/2025, 8:38:17 PM No.106140710
>>106140639
It's kino time
Anonymous
8/4/2025, 8:38:47 PM No.106140720
>>106140674
I bet glm would do a better job drawing a vagina with svg
Anonymous
8/4/2025, 8:39:53 PM No.106140743
file
>need glm4.5
>vllm didnt work
>hmm yes i will check llamacpp, it will totally be merged
>its merged
i kneel
Replies: >>106141299
Anonymous
8/4/2025, 8:40:10 PM No.106140749
>>106140639
>Unfortunately for the context thing...90k context is coherent for me for the Air model so sounds like I can't reproduce it here.
the fuck is this fag talking about, 90k context? COHERENT?
Replies: >>106140779 >>106140781
Anonymous
8/4/2025, 8:41:59 PM No.106140779
>>106140749
Yeah, in that some people were having the model literally break after 30-ish K tokens of context.
That is not about quality, he's saying that it works at all.
Anonymous
8/4/2025, 8:42:21 PM No.106140781
>>106140749
Most models have the ability to still speak English up to their max context length, yes.
Anonymous
8/4/2025, 8:44:37 PM No.106140808
reminder, don't use 1.0 temp and then wonder why a model is crazy, try lower temp first
Replies: >>106140838
Anonymous
8/4/2025, 8:45:23 PM No.106140820
>>106140440
I'm not that impressed. Looks like a flux copy with the soulless plastic face.
Anonymous
8/4/2025, 8:45:43 PM No.106140827
should i download it here
https://huggingface.co/DevQuasar/zai-org.GLM-4.5-Air-GGUF/tree/main
Replies: >>106140843 >>106140873
Anonymous
8/4/2025, 8:46:43 PM No.106140838
>>106140808
Or
and hear me out even if it sounds insane
or
Use max temp and topK 2.
Replies: >>106141279
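The "max temp and topK 2" trick sounds insane but is easy to see in a toy sketch: because top-k truncates before temperature is applied, even a huge temperature can only redistribute probability between the k best tokens. Logits here are made up:

```python
import math
import random

def sample_top_k(logits: dict[str, float], k: int = 2,
                 temperature: float = 5.0) -> str:
    """Truncate to the k highest logits, then apply temperature and sample.
    With k=2 the result is always one of the two most likely tokens,
    however high the temperature is."""
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    weights = [math.exp(v / temperature) for _, v in top]
    return random.choices([tok for tok, _ in top], weights=weights)[0]

# hypothetical next-token logits
logits = {"the": 8.0, "a": 7.5, "banana": 2.0, "xylophone": -3.0}
print({sample_top_k(logits) for _ in range(100)})  # only ever "the" or "a"
```

High temperature just makes the choice between the two survivors close to a coin flip; the garbage tokens never get a chance.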
Anonymous
8/4/2025, 8:47:05 PM No.106140843
>>106140827
Why the fuck are they split
Replies: >>106140867
Anonymous
8/4/2025, 8:48:36 PM No.106140859
>>106139258
I forgot why those are only compatible with the exllama kernels.
https://huggingface.co/QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix
https://huggingface.co/QuantTrio/GLM-4.5-Air-AWQ-FP16Mix
^ Those are compatible with the marlin kernels.
Anonymous
8/4/2025, 8:49:04 PM No.106140867
>>106140843
retards at the office use 16 GB thumb drives to carry data between machines.
Anonymous
8/4/2025, 8:50:06 PM No.106140873
>>106140827
No, the usual quanters will upload in a moment. Wait for them.
Replies: >>106140889 >>106140894
Anonymous
8/4/2025, 8:51:28 PM No.106140889
file
>>106140873
MUST COOM AND I CANNOT WAIT
Replies: >>106140909
Anonymous
8/4/2025, 8:52:12 PM No.106140894
file
>>106140873
>the usual quanters
Trusted even by NASA.
Replies: >>106140917
Anonymous
8/4/2025, 8:52:51 PM No.106140901
IT HAS ARRIVED

https://huggingface.co/TheDrummer/Cydonia-R1-24B-v4-GGUF
Replies: >>106140922
Anonymous
8/4/2025, 8:53:14 PM No.106140909
>>106140889
Q2 of AIR will be nemo level anon...
Anonymous
8/4/2025, 8:53:56 PM No.106140917
>>106140894
One of the brothers is an autist about quants, so they have some degree of credibility.
Replies: >>106141064
Anonymous
8/4/2025, 8:54:11 PM No.106140922
>>106140901
The unemployment declaration grew in size
Anonymous
8/4/2025, 8:56:06 PM No.106140935
file
>>106140458
Adding "tattoo" to the prompt kinda fixes it
Anonymous
8/4/2025, 8:56:11 PM No.106140936
>tfw run out of disk space
Anonymous
8/4/2025, 9:08:02 PM No.106141064
>>106140917
Shut up and load another batch of weights for quanting since you will need to reupload soon Daniel.
Replies: >>106141232
Anonymous
8/4/2025, 9:11:36 PM No.106141096
You have no excuse not to upload the quants ubergarm!
Replies: >>106141805
Anonymous
8/4/2025, 9:26:34 PM No.106141232
>>106141064
>be exllamav3 user
>wants quants
>open terminal
>H:\AI\LLMs\Backends\EXL3>python convert.py -i models\GLM-4.5-Air -o models\GLM-4.5-Air-8.0bpw-h8 -w deadquantstorage -b 8 -hb 8
>got quants
Replies: >>106141254 >>106141635
Anonymous
8/4/2025, 9:28:02 PM No.106141246
1750031990447580
>>106135910 (OP)
> do a casual search of the last 3 threads
> get 4 matches for "dead"
> get 23 matches for "back"
> fuck yeah
Anonymous
8/4/2025, 9:28:36 PM No.106141254
>>106141232
>upload it online, get updoots
Anonymous
8/4/2025, 9:29:31 PM No.106141264
So can AMD RDNA 4.0 cards be used for image/video generation now or it's all nVidea?
Replies: >>106141289 >>106141413
Anonymous
8/4/2025, 9:31:09 PM No.106141279
>>106140838
Interdasting...
Replies: >>106141330
Anonymous
8/4/2025, 9:31:17 PM No.106141281
file
ok bros whats the glm 4.5 chat template
Replies: >>106141316 >>106141322 >>106141386
Anonymous
8/4/2025, 9:31:59 PM No.106141289
>>106141264
honestly can't answer because i didn't want to have to wait long enough that by the time it's supported it's already been replaced by something three generations newer.
Anonymous
8/4/2025, 9:32:41 PM No.106141299
>>106140743
>its merged

but its BROKEN i cloned the latest code and get

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm4moe'

reeeeee i want to GOON to sexy python reeeee
Replies: >>106141334 >>106141481
Anonymous
8/4/2025, 9:33:54 PM No.106141316
>>106141281
did you even try looking?
https://huggingface.co/api/resolve-cache/models/zai-org/GLM-4.5/9cfe10c892f5772a937adb8176ce0f7f6900a0dd/chat_template.jinja?download=true&etag=%2241478957aca7a04b7321022e7d1f73de5badd995%22
Replies: >>106141349 >>106141386 >>106141481
Anonymous
8/4/2025, 9:34:25 PM No.106141322
>>106141281
It's not.
But it can be fun.
Try higher topKs like 10.
Replies: >>106141330 >>106141481
Anonymous
8/4/2025, 9:35:25 PM No.106141330
>>106141322
Oops, meant for >>106141279
Anonymous
8/4/2025, 9:35:50 PM No.106141334
>>106141299
That's what we get for trusting vibe coders...
Anonymous
8/4/2025, 9:36:47 PM No.106141349
>>106141316
can i import this in sillytavern?
Replies: >>106141381
Anonymous
8/4/2025, 9:37:09 PM No.106141354
Fix the ggufs
Anonymous
8/4/2025, 9:39:34 PM No.106141381
>>106141349
not as it is currently but if you are too lazy to fix it yourself then just use GLM 4. it's already in sillytavern and it's close enough to the chat template
Replies: >>106141481
Anonymous
8/4/2025, 9:39:44 PM No.106141386
>>106141281
>>106141316
based on this jinja template it seems like it is chatml.
however some other places are trying alpaca too.
Replies: >>106141468 >>106141481
Anonymous
8/4/2025, 9:41:51 PM No.106141413
>>106141264
why would you even care? It's going to be a pain in the ass for general AI use. Maybe you can run LLMs on it fine, but for other stuff probably not. If they wanted anyone in AI to care they would have jacked it up to at least 24GB (and even then only if your use case is hyper-focused on what works), but they fucking didn't. Get a 5060 Ti.

Like it or not, nvidia does invest in software; that's what you're paying for.
Anonymous
8/4/2025, 9:44:40 PM No.106141440
1749284785053810
Replies: >>106142092 >>106142652
Anonymous
8/4/2025, 9:47:13 PM No.106141468
>>106141386
it is 100% not chatml and the alpaca meme needs to die
Replies: >>106141557
Anonymous
8/4/2025, 9:48:26 PM No.106141481
>>106141386
>>106141381
>>106141322
>>106141316
>>106141299
thank you anons <3
Anonymous
8/4/2025, 9:49:45 PM No.106141496
>>106139953
What kind of data do you need to train the model on to even get this to happen?
If they just filtered stuff wouldn't it be drawing featureless_crotch or something instead of body horror?
Replies: >>106141526
Anonymous
8/4/2025, 9:51:40 PM No.106141519
alright i kneel glm 4.5 is super good and super fast (i havent even filled my vram only 6gb vram is filled)
plenty smart for a q2_K model too
i fucking kneel i will be dailying this model from now on
Replies: >>106141601 >>106141611
Anonymous
8/4/2025, 9:52:30 PM No.106141526
>>106141496
Maybe it's the text knowledge that vagina is like a meaty slit so it transfers to images like this despite never having seen an actual vagina.
Anonymous
8/4/2025, 9:55:05 PM No.106141550
1741046848238225
Replies: >>106142092
Anonymous
8/4/2025, 9:55:52 PM No.106141557
>>106141468
ok well you're kind of right, i'm not sure what the hell <|user|> and <|assistant|> are.
seems like some funky chatml vicuna hybrid baby.
Anonymous
8/4/2025, 9:55:55 PM No.106141558
So I guess ollama is just for lazy dockerfags who want a turnkey solution for their cloudshit, the way it seems to expect me to set it up. Makes sense in hindsight, seeing how it's written in go.
Replies: >>106141618
Anonymous
8/4/2025, 9:58:59 PM No.106141601
>>106141519
Link?
Replies: >>106141878
Anonymous
8/4/2025, 9:59:39 PM No.106141611
>>106141519
Air or the regular one?
Replies: >>106141641 >>106141878
Anonymous
8/4/2025, 9:59:56 PM No.106141618
>>106141558
the only thing ollama is good for is for normies watching some youtube tutorial with an indian with a thick accent telling them to run ollama run deepseek-r1
Anonymous
8/4/2025, 10:00:56 PM No.106141635
>>106141232
I have 128GB ram and a 4090. What do?
Replies: >>106141655 >>106141672 >>106141681
Anonymous
8/4/2025, 10:01:27 PM No.106141641
>>106141611
dunno about the main GLM but air fails on pop culture questions like explaining the joke behind sneed's feed and seed. would love to see if the main model answers it right.
Anonymous
8/4/2025, 10:02:27 PM No.106141655
>>106141635
wait for goofs. exllamav3 is for people who can fit the entire model in VRAM.
Anonymous
8/4/2025, 10:04:18 PM No.106141672
>>106141635
Ik_llama and run nu glm when ggufs arrive
Replies: >>106141681
Anonymous
8/4/2025, 10:04:19 PM No.106141673
goofy2-3361724880
<- real mascot of /lmg/
Replies: >>106141690 >>106141693 >>106141706 >>106142121
Anonymous
8/4/2025, 10:04:49 PM No.106141681
>>106141672
>>106141635
i'm also waiting for ik_llama release
Anonymous
8/4/2025, 10:05:19 PM No.106141690
>>106141673
This but rule 63 version
Anonymous
8/4/2025, 10:05:33 PM No.106141692
Oh yeah. iklcpp is better for MoE right?
Replies: >>106141727
Anonymous
8/4/2025, 10:05:35 PM No.106141693
>>106141673
Even he is better than greenhaired AGP icon
Anonymous
8/4/2025, 10:06:15 PM No.106141706
>>106141673
Time to redownload
Anonymous
8/4/2025, 10:08:16 PM No.106141723
ss
localbros...
Replies: >>106141784
Anonymous
8/4/2025, 10:08:22 PM No.106141726
1725324633415561
Replies: >>106142092
Anonymous
8/4/2025, 10:08:40 PM No.106141727
>>106141692
yes but....
>>106138146
>>106138209
most likely ik_llama.cpp won't have MTP support. VLLM already does though.
Replies: >>106141744
Anonymous
8/4/2025, 10:09:49 PM No.106141744
>>106141727
Neither llama.cpp has MTP support, yeah.
Anonymous
8/4/2025, 10:10:39 PM No.106141760
Some anon a thread or two ago said that koboldcpp now has rocm support in the main branch, no need for the koboldcpp-rocm port anymore, but I don't see it anywhere. Did he hallucinate it?
Replies: >>106142062 >>106143521
Anonymous
8/4/2025, 10:12:30 PM No.106141784
>>106141723
known issue when you convert to gguf using pytorch. some byte sequence triggers clamav and makes it freak the fuck out. it's a false positive.
Anonymous
8/4/2025, 10:15:05 PM No.106141805
u
>>106141096
Beg
Replies: >>106141818
Anonymous
8/4/2025, 10:16:53 PM No.106141818
>>106141805
give me like 20 beers and would
Replies: >>106141835 >>106141893 >>106141950
Anonymous
8/4/2025, 10:18:53 PM No.106141835
>>106141818
I get drunk with 6+ so 20 sounds about right.
Anonymous
8/4/2025, 10:25:22 PM No.106141878
>>106141601
https://huggingface.co/DevQuasar/zai-org.GLM-4.5-Air-GGUF/tree/main
i downloaded q2_k
>>106141611
air
Replies: >>106141931
Anonymous
8/4/2025, 10:26:21 PM No.106141893
>>106141818
would what? barf?
Anonymous
8/4/2025, 10:29:05 PM No.106141931
>>106141878
run the perplexity test with the wiki text and let me know what you score
Replies: >>106141938
Anonymous
8/4/2025, 10:29:59 PM No.106141938
file
glm4.5 air q2_k
temp 0.6 minp 0.05
>>106141931
how do i do that?
./llama-bench?
Replies: >>106141974 >>106142046 >>106142546 >>106142581
Anonymous
8/4/2025, 10:31:09 PM No.106141943
>>106137590
DeepSeek is local.
Replies: >>106141961
Anonymous
8/4/2025, 10:31:42 PM No.106141950
>>106141818
with standards this low nemo should have been enough for you
Anonymous
8/4/2025, 10:32:12 PM No.106141955
>>106138596
Carbon fiber miku is stronger than all others.
Anonymous
8/4/2025, 10:32:23 PM No.106141961
>>106137663
>>106141943
it's fake and unofficial made by one literally who
Replies: >>106142132 >>106142451
Anonymous
8/4/2025, 10:33:01 PM No.106141974
>>106141938
sloppa'd up
Anonymous
8/4/2025, 10:34:21 PM No.106141990
>air
you guys know you're using the shitty one, right?
Replies: >>106142004 >>106142034
Anonymous
8/4/2025, 10:34:32 PM No.106141993
>>106138596
ever did that experiment where you put a rose into a flask with some nigger cum at the bottom?
Anonymous
8/4/2025, 10:35:24 PM No.106142001
I hope Qwen releases other sizes of their imagegen. It would be interesting to see how the 0.5B model holds up
Anonymous
8/4/2025, 10:35:39 PM No.106142004
>>106141990
>just run the one that's 3.5x larger bwo
good actionable advice
Replies: >>106142015
Anonymous
8/4/2025, 10:36:28 PM No.106142015
>>106142004
256GB ram is like $400 at most, get a job
Replies: >>106142040 >>106142049 >>106143193
Anonymous
8/4/2025, 10:38:14 PM No.106142034
>>106141990
buy me 4x rtx pro 6000s then jew
Anonymous
8/4/2025, 10:38:38 PM No.106142040
>>106142015
>256GB ram is like $400
plus new motherboard plus new cpu, might as well get a new computer at this point
>get a job
no
Replies: >>106142101
Anonymous
8/4/2025, 10:39:05 PM No.106142046
>>106141938
./llama-perplexity -m ../your/model/path/your-quant-name-00001-of-00002.gguf -f wiki.test.raw --seed 1337 -fa -fmoe -mla 3 --ctx-size 512 --threads yourthreadcount -ngl 99 -sm layer --override-tensor exps=CPU,attn_kv_b=CPU --no-mmap
Replies: >>106142258
Anonymous
8/4/2025, 10:39:10 PM No.106142049
>>106142015
im only 18, what do u expect me to do??
Replies: >>106142055 >>106142069 >>106142082
Anonymous
8/4/2025, 10:39:46 PM No.106142055
>>106142049
I was working at 14, whats your excuse
Replies: >>106142077
Anonymous
8/4/2025, 10:40:12 PM No.106142062
>>106141760
>no prebuilt binaries of koboldcpp-rocm for linux
sadge
Replies: >>106142070 >>106142693
Anonymous
8/4/2025, 10:40:46 PM No.106142069
>>106142049
scam pedos
Anonymous
8/4/2025, 10:40:47 PM No.106142070
>>106142062
coompile it yourself
Anonymous
8/4/2025, 10:41:27 PM No.106142077
>>106142055
im shy.. and i dont wanna do boring work for 300-400$ a month (or whatever they would pay a highschooler in serbia)
Replies: >>106142090 >>106142330 >>106142455
Anonymous
8/4/2025, 10:42:04 PM No.106142082
>>106142049
get a job instead of being on 4chan you'll thank yourself later OR alternatively open an onlyfans using your feet as money
Replies: >>106142112
Anonymous
8/4/2025, 10:42:31 PM No.106142090
>>106142077
>highschooler in serbia
what does your tummy look like, we may be able to work out a deal
Replies: >>106142112
Anonymous
8/4/2025, 10:42:37 PM No.106142092
>>106141726
>>106141550
>>106141440
These are good.
Anonymous
8/4/2025, 10:43:10 PM No.106142101
>>106142040
There's no point to those big autism box setups even if you have the money, you could just use it to buy a lifetime worth of tokens instead. It probably would cost less per token than just the electricity for your slow ass "rammaxxing" setup
The only advantage of running locally would be to be able to run weird finetunes and merges, but these bloated moes will never have any of those
Replies: >>106142108 >>106142137
Anonymous
8/4/2025, 10:43:45 PM No.106142108
>>106142101
Fuck off Sam, you ain't reading my logs.
Anonymous
8/4/2025, 10:44:01 PM No.106142112
>>106142090
im actually quite skinny however im a man and im not gay, im not desperate for money i still have some from coding shitty unity games for grifters when i was 15
>>106142082
idk anon i just dont wanna work at a mcdonalds or at a store its scary
Anonymous
8/4/2025, 10:44:35 PM No.106142121
>>106141673
Goofy but as a cute r63 maid with a stutter so she calls herself G-Goofy.
Anonymous
8/4/2025, 10:45:08 PM No.106142132
>>106141961
Implies meme authorship matters.
Replies: >>106142164
Anonymous
8/4/2025, 10:45:45 PM No.106142137
>>106142101
nta, but apis just seem to go down as soon as i'm about to finish, so having local that's immune to service disruption is nice
Anonymous
8/4/2025, 10:46:08 PM No.106142147
>>106138905
What are the system requirements in this thing?
Anonymous
8/4/2025, 10:48:14 PM No.106142164
>>106142132
matters if it's a forced meme
Anonymous
8/4/2025, 10:52:43 PM No.106142203
>>106138789
can it do anime titties?
Anonymous
8/4/2025, 10:58:55 PM No.106142258
>>106142046
>-f wiki.test.raw --seed 1337
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa -fmoe -mla 3 --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error: invalid argument: -fmoe

removed that
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error while handling argument "-f": error: failed to open file 'wiki.test.raw'


usage:
-f, --file FNAME a file containing the prompt (default: none)


to show complete usage, run with -h
Replies: >>106142312
Anonymous
8/4/2025, 11:03:01 PM No.106142312
>>106142258
oh i forgot you need to download the file as well.
it's this file
https://huggingface.co/nisten/llama3-8b-instruct-32k-gguf/blob/main/wiki.test.raw
Replies: >>106142332 >>106142365
Anonymous
8/4/2025, 11:05:03 PM No.106142330
>>106142077
this is the poor mindset that will end up leading you towards a life of welfare
Replies: >>106142342
Anonymous
8/4/2025, 11:05:10 PM No.106142332
>>106142312
how long should it take?
i have like 11t/s
Replies: >>106142373
Anonymous
8/4/2025, 11:06:26 PM No.106142342
>>106142330
but im gonna very likely enroll in a good university and then get a job while in uni because those pay better and i can work with something im interested in
Replies: >>106142366 >>106142386
Anonymous
8/4/2025, 11:08:13 PM No.106142365
>>106142312
>ETA 1 hour
[1]3.7721,[2]4.8459,[3]4.1327,[4]3.9085,[5]4.3769,[6]4.4260,[7]4.6263,[8]4.7943,[9]5.4195,[10]5.5582,[11]5.7446,[12]5.8990,
[13]6.3410,[14]6.2090,[15]6.2595,[16]6.2787,
how do i do a quicker test? i wanna chat with glm
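iirc llama-perplexity takes a --chunks flag to stop after the first N 512-token chunks instead of chewing through the whole file; untested sketch reusing the command from above (flag availability may differ between builds, check ./llama-perplexity -h if it errors):

```shell
# same invocation as before, but capped at the first 50 chunks;
# --chunks is assumed here rather than confirmed on this build
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf \
  -ngl 1000 -ot exps=CPU -t 6 -fa --ctx-size 512 -sm layer \
  -f wiki.test.raw --seed 1337 --chunks 50
```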
Anonymous
8/4/2025, 11:08:17 PM No.106142366
>>106142342
>university
Unless you already have a job lined up after, that is a big mistake. You should have been looking into getting into a trade like electrical or plumbing or the like, they will pay to train you and there is massive demand and good pay. With uni you are paying to likely get fucked over with something not in demand
Replies: >>106142410
Anonymous
8/4/2025, 11:08:51 PM No.106142373
>>106142332
you can try offloading some layers onto your GPU but I don't know how much you'll be able to move onto a 3060 if that's all you have.
-ot "blk\.3\.ffn_up_exps=CUDA0,blk\.3\.ffn_gate_exps=CUDA0" -ot "blk\.4\.ffn_up_exps=CUDA0,blk\.4\.ffn_gate_exps=CUDA0"
etc. etc. you can make the above commands cleaner with regex but im too lazy
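the regex cleanup skipped above would look something like this (untested; assumes the blk.N.ffn_*_exps tensor naming used in the flags above):

```shell
# one -ot with alternation instead of one flag per tensor;
# moves ffn_up_exps and ffn_gate_exps of blocks 3 and 4 onto the first CUDA device
-ot "blk\.(3|4)\.ffn_(up|gate)_exps=CUDA0"
```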
Replies: >>106142410 >>106142425
Anonymous
8/4/2025, 11:10:15 PM No.106142386
>>106142342
well good luck kid, just know if you stay comfortable you'll always be poor. network with people and some rich fuck will give you a high paying job.
Replies: >>106142410
Anonymous
8/4/2025, 11:13:01 PM No.106142410
>>106142366
in serbia uni is free if you have good grades and get a nice amount of points on the entrance exam
>>106142373
thanks anon but i meant like can i do only 10% of the test so its quicker
[23]6.1785,[24]6.0028,[25]5.8904,[26]5.7935,[27]5.6974,[28]5.7772,[29]5.7919,[30]5.8340,[31]5.8936,[32]5.9393,[33]6.0358,[34]6.0629,[35]6.1970,[36]6.2575,[37]6.2242,[38]6.3374,[39]6.3386,[40]6.3366,[41]6.4282,[42]6.4370,[43]6.3993,[44]6.4095,
some newer results
>>106142386
you're right, but its kind of too late to be thinking about that, school starts in a month and most jobs are probably filled with others that applied early
> if you stay comfortable you'll always be poor
very great advice, im writing it down
Replies: >>106142452
Anonymous
8/4/2025, 11:14:48 PM No.106142425
>>106142373
[61]6.9160,[62]6.9912,[63]7.0670,[64]7.0982,[65]7.1008,[66]7.1183,[67]7.1213,[68]7.1303,[69]7.1982,[70]7.1933,[71]7.1833,[72]7.1691,[73]7.1749,[74]7.1992,[75]7.1914,[76]7.1142,[77]7.0454,[78]7.0050,[79]6.9811,[80]6.9604,
perplexity doesnt seem that bad for q2 so far
im stopping it now
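for anyone wondering what those bracketed numbers are: a running average over chunks. perplexity itself is just exp of the mean per-token negative log-likelihood; minimal sketch:

```python
import math

def perplexity(nlls):
    """exp of the mean negative log-likelihood per token."""
    return math.exp(sum(nlls) / len(nlls))

# a model that gives every token probability 1/2 has nll = ln 2 per token,
# so its perplexity is 2: as "confused" as a fair coin flip
print(perplexity([math.log(2)] * 4))  # ≈ 2.0
```

lower is better: the q2 numbers staying close to what higher quants usually score on wiki.test is why it "doesn't seem that bad".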
Replies: >>106142505
Anonymous
8/4/2025, 11:16:38 PM No.106142446
Why do anons keep on entertaining this attention seeking zoomer? He's been moaning about his age in /ldg/ for a few days already
Anonymous
8/4/2025, 11:17:11 PM No.106142451
>>106141961
That just makes it more suitable.
Anonymous
8/4/2025, 11:17:12 PM No.106142452
>>106142410
demand for linemen is always huge and you can easily make 200K+ a year in the US at least, if you're willing to be on call / travel a bit
Replies: >>106142585
Anonymous
8/4/2025, 11:17:20 PM No.106142455
>>106142077
>petra is a highschool twink
sounds about right
Anonymous
8/4/2025, 11:22:06 PM No.106142503
file
md5: 627b77f8a3066deaded4541adb4e2555🔍
>cudadev
>blacked miku poster
>mikutroon
>petra
>ikaridev
how does he achieve this bros..
Anonymous
8/4/2025, 11:22:21 PM No.106142505
>>106142425
go coom or something and run it overnight and post your results tomorrow in that case
Anonymous
8/4/2025, 11:23:58 PM No.106142518
IMG_0259
md5: fa3fe84c133c2cf6721f068a26e6a18a🔍
>how does he achieve this bros
Anonymous
8/4/2025, 11:27:04 PM No.106142546
>>106141938
>OH NO NO NO
>NOT LIKE THIS
lmao
Replies: >>106142640
Anonymous
8/4/2025, 11:31:10 PM No.106142581
>>106141938
>it wasn't just x, it was y
>2 times in the same response
Hopefully that's just Q2 being Q2...
Anonymous
8/4/2025, 11:31:33 PM No.106142585
>>106142452
thanks for the advice anon, i really dont know what to say and i dont want to sound dismissive. im just stumped on where the fuck do i start, all i know is tech. first i'd need to get a visa, then tickets and so on; for that i need money, and since it's the us, probably a nice sum of money; for that i need to get a job
my brain gets fried thinking about this shit. i know im taking the easy route by just studying studying and hoping i can find a good job in/after uni
it is what it is
Replies: >>106142766
Anonymous
8/4/2025, 11:32:53 PM No.106142598
use the code tags you faggots
also, pls bake
Replies: >>106142617 >>106142648
Anonymous
8/4/2025, 11:33:20 PM No.106142602
WAIT ME
Replies: >>106142611
Anonymous
8/4/2025, 11:33:55 PM No.106142611
>>106142602
And you are?
Anonymous
8/4/2025, 11:34:20 PM No.106142617
>>106142598
8th page anon
Anonymous
8/4/2025, 11:36:15 PM No.106142637
file
md5: a58038a3303b8f6918bad07368d57bce🔍
if anyone wants me to test glm with different samplers/prompts/whatever (besides perplexity ill do that overnight)
give requests
picrel is q3_k_m btw
Anonymous
8/4/2025, 11:36:27 PM No.106142640
>>106142546
https://www.youtube.com/watch?v=1FZ3Xa7gEKk&list=RD1FZ3Xa7gEKk&t=12s
Anonymous
8/4/2025, 11:36:58 PM No.106142648
>>106142598
fuck you get triggered i won't do what you tell me. enjoy this mess of broken formatting.
Anonymous
8/4/2025, 11:37:05 PM No.106142652
>>106141440
I like this Miku
Anonymous
8/4/2025, 11:41:03 PM No.106142693
>>106142062
Well, I think I figured it out, so here are some benchos.
Hardware is Ryzen 7 5700G with AMD Radeon RX6600, plz no bully.
After looking at the docs more carefully I believe the main branch of koboldcpp supports ROCm via --usehipblas, but you need to compile it yourself, which I am too lazy to do.
$ python koboldcpp-rocm/koboldcpp.py --usecublas 1 --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
...
[00:19:20] CtxLimit:8192/8192, Amt:100/100, Init:0.09s, Process:35.89s (225.49T/s), Generate:3.59s (27.82T/s), Total:39.48s
Benchmark Completed - v1.96.2.yr0-ROCm Results:
======
Flags: NoAVX2=False Threads=7 HighPriority=False Cuda_Args=['1'] Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=False KvCache=0
Backend: koboldcpp_hipblas.so
Layers: 29
MaxCtx: 8192
GenAmount: 100
-----
ProcessingTime: 35.887s
ProcessingSpeed: 225.49T/s
GenerationTime: 3.594s
GenerationSpeed: 27.82T/s
TotalTime: 39.481s
Output: 1 1 1 1
-----

./koboldcpp-linux-x64-nocuda --usevulkan --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
...
[00:26:47] CtxLimit:8192/8192, Amt:100/100, Init:0.79s, Process:4.45s (1818.02T/s), Generate:1.18s (84.67T/s), Total:5.63s
Benchmark Completed - v1.96.2 Results:
======
Flags: NoAVX2=False Threads=7 HighPriority=False Cuda_Args=None Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=False KvCache=0
Backend: koboldcpp_vulkan.so
-----
ProcessingTime: 4.451s
ProcessingSpeed: 1818.02T/s
GenerationTime: 1.181s
GenerationSpeed: 84.67T/s
TotalTime: 5.632s
Output: 1 1 1 1
-----


Unexpectedly, vulkan actually won.
I'm hitting post size limit, but here are also timings for --usecpu:
ProcessingTime: 45.296s
ProcessingSpeed: 178.65T/s
GenerationTime: 3.699s
GenerationSpeed: 27.03T/s
TotalTime: 48.995s
Replies: >>106142727 >>106142748
Anonymous
8/4/2025, 11:43:27 PM No.106142717
>>106136043
they will stop talking about this piece of shit once it's implemented, it's the glm cycle of hype and disillusions…
this thread is constantly filled with that kind of model begging -> model forgotten once they actually tried it and saw the shit for what it really was
Replies: >>106142772
Anonymous
8/4/2025, 11:44:22 PM No.106142727
>>106142693
very nice, so it seems vulkan is better, i heard cuda is getting beaten by vulkan slowly too
Replies: >>106142748
Anonymous
8/4/2025, 11:46:31 PM No.106142748
>>106142693
>>106142727
ah, now that I look closer at the logs,
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
false alarm, I need to tinker some more.
Replies: >>106142976
Anonymous
8/4/2025, 11:47:08 PM No.106142755
>>106138428
The bullying will continue until working GLM quants are on my machine. Or maybe not because it's fun and done out of love.
Replies: >>106142787
Anonymous
8/4/2025, 11:48:25 PM No.106142766
>>106142585
finding a job nowadays is always about who you know and not what you know. truly.
sure you need to know the basics, but getting a referral from someone who works a high paying place is the only way now.
Anonymous
8/4/2025, 11:49:44 PM No.106142772
>>106142717
It's still fun to act hyped even if something is going to be shit.
You know, kind of like how people here act like they're having sex, even though they're just role playing.
Anonymous
8/4/2025, 11:49:52 PM No.106142774
>>106135910 (OP)
I'm hoping Horizon A/B are local models. Alpha is pretty damn good at translating Japanese text (with some help from instructions) and almost beats Kimi k2.

The only thing i'm very sad about is how no models are good for writing stories. When will this change? I just want to have a model write about Monster Girls taking over the world, getting a cute harpy mate and meeting her parents.
Anonymous
8/4/2025, 11:50:24 PM No.106142782
>>106136033
she was one of the first widely used local models ever tho
Replies: >>106142889
Anonymous
8/4/2025, 11:50:32 PM No.106142784
>>106139295
>it's not just x, it's y
>—something, second thing, and a third thing for good measure
there isn't a word in the english dictionary that can describe how much I hate this slop, and how much more I hate humans who don't notice this slop as LLM writing (I see more and more and more posts on hn that are very much LLM generated and you get downvoted to oblivion for hurting people feelies if you say as much)
Anonymous
8/4/2025, 11:50:41 PM No.106142787
>>106142755
D-do you love m-me?
Replies: >>106142797
Anonymous
8/4/2025, 11:51:33 PM No.106142797
>>106142787
i do <3
Anonymous
8/4/2025, 11:55:39 PM No.106142843
i had no idea /lmg/ could become any gayer than it already was
Anonymous
8/4/2025, 11:55:49 PM No.106142846
Anyone have an Ubergarm card to share?
Replies: >>106142855
Anonymous
8/4/2025, 11:56:42 PM No.106142855
>>106142846
wait me guy should get it first
Anonymous
8/5/2025, 12:00:24 AM No.106142889
>>106142782
her time is over
Anonymous
8/5/2025, 12:06:26 AM No.106142950
>You’d cream your pants like a dog.
yeah that famous thing that dogs do
stupid ass model
Anonymous
8/5/2025, 12:07:32 AM No.106142963
now that glm is working it's time for step3
we finally have a super smart and uncensored vision model that doesn't shy away from describing porn and it's just going to fade into obscurity because it's using some fancy new attention thing
Anonymous
8/5/2025, 12:09:12 AM No.106142976
>>106142748
you probably have to pretend to have a different GPU.
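if it's the usual RDNA2 problem (the RX 6600 reports gfx1032, which ROCm doesn't officially ship kernels for), the standard way to "pretend" is spoofing the gfx version before launch. assumption: the HSA override env var works with whatever ROCm the koboldcpp-rocm build expects:

```shell
# report the card as gfx1030 (officially supported) so ROCm initializes on an RX 6600
export HSA_OVERRIDE_GFX_VERSION=10.3.0
python koboldcpp-rocm/koboldcpp.py --usecublas 1 --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
```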
Anonymous
8/5/2025, 12:09:36 AM No.106142979
>>106142968
>>106142968
>>106142968
Anonymous
8/5/2025, 12:28:36 AM No.106143193
>>106142015
I feel like even if I built for this, dual kits of 64gbx4 ddr5 would get me a whopping 1.5 tokens a second, just like running r1 on ram got average people.

Maybe with offloading non-attention layers and a decent amount of vram we could squeeze it up to 2-3 tokens a second, maybe. But honestly anything less than 5 tokens a second kinda sucks. Also, I feel like a lot of the more advanced ways to tune speed like that are not available to casual users, and risking a purchase only to find out later you're not sure how to do what one asshole on reddit did with a beta from an unreliable dev is not a good thing to recommend.
Anonymous
8/5/2025, 1:00:33 AM No.106143521
boondocks-read
md5: 05812eba2b2a5ed504ed3d5d077704bc🔍
>>106141760
Did you not read the release notes?
https://github.com/LostRuins/koboldcpp/releases/tag/v1.96.2
>download our rolling ROCm binary here if you use Linux.
>https://koboldai.org/cpplinuxrocm