
Thread 106135910

423 posts 98 images /g/
Anonymous No.106135910 >>106136033 >>106138143 >>106140131 >>106141246 >>106142774
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106127784 & >>106119921

►News
>(08/01) XBai o4 32B released: https://hf.co/MetaStoneTec/XBai-o4
>(07/31) Qwen3-Coder-30B-A3B released: https://hf.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106135912
►Recent Highlights from the Previous Thread: >>106127784

--Papers:
>106132921 >106132991
--Horizon Alpha/Beta shows strong narrative skills but weak math, sparking GPT-5 and cloaking speculation:
>106130817 >106131034 >106131279 >106131299 >106131373 >106131411 >106131427 >106131442 >106131555 >106131617 >106131779
--GLM 4.5 perplexity and quantization efficiency across expert configurations:
>106132346 >106132379 >106132500 >106132520 >106132529
--Persona vectors for controlling model behavior traits and detecting subtle biases:
>106128851 >106128930 >106129259 >106128980 >106129116 >106130928 >106129195
--Frustration over lack of consumer AI hardware with sufficient memory and bandwidth:
>106129370 >106129437 >106129567 >106129633 >106129664 >106129737 >106129741 >106129879
--Tri-70B-preview-SFT release with strong benchmarks but training data concerns:
>106128191 >106128220 >106128338 >106128350 >106128370 >106128457
--Beginner seeking foundational understanding of LLM architecture for custom AI companion project:
>106128392 >106128434 >106128439 >106128472 >106128531 >106128623 >106128758
--GLM 4.5 Fill-in-the-Middle support discussed:
>106128386 >106128390 >106128549 >106128571 >106132834
--Fragmented llama.cpp PRs delay GLM model testing:
>106129724 >106129734 >106129785 >106129760 >106129995
--ROCm vs Vulkan performance for AMD GPUs in kobold.cpp:
>106128441 >106129743 >106129912
--GLM-4.5-Air runs locally on 4x3090 at 4.0bpw with high T/s:
>106132183 >106134407
--Building RTX 6000-based servers on a $100k budget:
>106128630 >106128713 >106128751 >106129561 >106129626
--Horizon Alpha/Beta performance suggests strong open-weight models:
>106130542 >106130559 >106130587 >106130600 >106130609 >106130622 >106130676 >106130697 >106130641 >106130700 >106130708
--Miku and Long Teto (free space):
>106128713 >106131379 >106134264

►Recent Highlight Posts from the Previous Thread: >>106128093

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106135923 >>106135958
>>106135649
to be fair llamacpp has many backends and in llamacpp everything is implemented from scratch for every model, exllama can reuse more from transformers/diffusers, just like nunchaku for example
Anonymous No.106135958
>>106135923
i was that anon. That's fair. Looking at the exl3 implementation of glm it seems so much simpler.
I saw llama.cpp was trying to reuse some of the deepseekv2 moe implementation.
Llama.cpp is really cool as it supports pretty much everything, i like it. But it makes proper support for new stuff so hard
Anonymous No.106135972 >>106135978 >>106136050
I just want GLM gguf
Anonymous No.106135978
>>106135972
we already have glm mlx
Anonymous No.106136027 >>106136785
going to laugh my ass off once gpt-oss "support" hits llamacpp
>swa doesn't work
>broken moe routing
>no attn sink
>some tokenizer bug that gets fixed after 2 weeks
Anonymous No.106136033 >>106137145 >>106142782
>>106135910 (OP)
unrelated girl on the op pic again
Anonymous No.106136043 >>106136056 >>106136169 >>106142717
What's so special about GLM that you want llama.cpp support so much?
Anonymous No.106136050 >>106136135 >>106136155 >>106136171 >>106136209 >>106136360 >>106136638
>>106135972
It's already pretty easy to run qwen 235b/22a on one gpu and some cheap ram, I really doubt a 106B/12A is worth it unless you have a potato pc
Anonymous No.106136056
>>106136043
It comes in (v)ramlet size.
Anonymous No.106136135
>>106136050
> 22a on one gpu
3090?
Anonymous No.106136155
>>106136050
>I really doubt a 106/12A is worth unless you have a potato pc
nta but i do (3060 12gb/64gb ddr4 ram)
Anonymous No.106136169
>>106136043
Can run 120b with 64gb of ram and a 5090 without having to go and buy some server mobo that I have no other use for and would probably collect dust, get to play with new and bigger stuff instead of 3 bpw exl3 70b @ 20k something context or high quant 32b/49b and high context
Anonymous No.106136171
>>106136050
qwen needs at least 128 gb of ram, which there's no way in hell it will run in dual channel on a consumer board
64 is much more feasible, also even glm air has better trivia knowledge than qwen
Anonymous No.106136209
>>106136050
If you already have all DIMM slots occupied and/or you're on a standard DDR4 motherboard, you can't upgrade that effortlessly. On DDR4 it's just not worth the money; if you have a DDR5 motherboard but you're on AMD, you're going to have the bandwidth severely gimped to DDR4 levels with 4x DDR5 DIMM modules. With standard ("cheap") DDR5 memory on regular desktop motherboards you're going to be limited to around 100 GB/s anyway.

I can't see many reasons for upgrading what I already have right now (3090 + 64GB DDR4-3600) until something considerably faster comes out (DDR6 or quad-channel/256-bit DDR5). If you're building a completely new PC *now*, then it makes sense to buy more memory with LLMs in consideration.
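For reference, the napkin math: dual channel DDR5-6400 is 2 channels x 8 bytes x 6400 MT/s ≈ 102 GB/s theoretical peak, and dual channel DDR4-3600 is 2 x 8 x 3600 ≈ 58 GB/s, with real-world throughput a fair bit below both.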
Anonymous No.106136245
kek
Anonymous No.106136260 >>106136291 >>106137196
so what's the deal with RAGs? are they as good as they say?
Anonymous No.106136291 >>106136309
>>106136260
>prompt processing progress, n_past = 12288, n_tokens = 2048, progress = 0.231439
Anonymous No.106136309 >>106136434
>>106136291
Yeah, this is why I loved rag/lorebooks when using nemo/faster models and absolutely loathe them with qwen 235b
PP is the fucking mind killer. I can deal with TG speeds around friggin 4 t/s if I have to, but slow processing is hell and anything that makes me redo the whole prompt is going straight in the trash.
Anonymous No.106136322
the 'dominate/dominating/domination' slop of recent models really sends shivers down your spine

>cherry blossoms are ~6% more often pink than white
>pink blossoms dominate white blossoms
>Y color of teacup is sold 0.5% more often
>Y color dominates sales
>Z car has 3% more engine breakdowns
>Z car dominates when it comes to engine breakdowns
Anonymous No.106136360
>>106136050
I have 128GB ram and a 4090. And however much I like the pussy on qwen she is crazy in a not good way. So I am hoping for 4.5full size chan in 3bits.
Anonymous No.106136434 >>106136474
>>106136309
if the lorebook is small enough you can always turn the entries into constants. you can also rely on the model if the lore entries are from something like wow
Anonymous No.106136474 >>106137223
>>106136434
That's what I ended up doing.
It's also how I discovered that Qwen 235 knows a startling amount about warhammer 40k
Anonymous No.106136582 >>106136596 >>106136604 >>106136631 >>106136636 >>106136728 >>106136742 >>106137016 >>106137082 >>106137245
More Qwen models soon?
https://x.com/JustinLin610/status/1952329529256726680

>something beautiful tonight
Anonymous No.106136596
>>106136582
finetuned leaked gpt-oss, but with every reference of openai replaced with 'BigQwen'
fuck altman gon do
Anonymous No.106136604 >>106136683 >>106136732
>>106136582
Sex with Junyang's tiny chink cock.
Anonymous No.106136631 >>106136645
>>106136582
Inb4 Qwen 3.5 100B A9B or some such.
Anonymous No.106136636 >>106136657 >>106137082
>>106136582
https://github.com/huggingface/diffusers/pull/12055
Anonymous No.106136638 >>106140582
>>106136050
at what speed at what context ?

I get dreadful speed on ddr5 and q4 even with two gpus lol
Anonymous No.106136645
>>106136631
The upcoming 120B OpenAI model is already going to be 120B A6B or something like that.
Anonymous No.106136657
>>106136636
I was kinda hoping they would release their own creative writing model immediately after closedai
Anonymous No.106136683
>>106136604
He's a grower, unless you like it flaccid.
Anonymous No.106136728 >>106136737
>>106136582
Qwen 3.5 - it now understands that Nala doesn't have a cock. (Still can't do paws, though)
Anonymous No.106136732
>>106136604
Make a card of him.
Anonymous No.106136737 >>106136748 >>106136749 >>106136754
>>106136728
Lmao. Is Qwen that bad at the Nala card?
Anonymous No.106136742
>>106136582
yet another benchmaxxed coder model
Anonymous No.106136748 >>106137142
>>106136737
The original 235B struggled with gender in RP in general. Hard.
Anonymous No.106136749 >>106136755
>>106136737
I'm pretty sure he's joking. That used to be an issue with older Qwen models.
Anonymous No.106136754 >>106137142
>>106136737
Qwen235 is the modern day frankenmerge.
Anonymous No.106136755
>>106136749
>hello sarrs do not redeem the criticism it is very best model
fuck off ranjit
Anonymous No.106136782
I need a better computertron...
Anonymous No.106136785 >>106136825
>>106136027
If they actually cared about OSS and people using their open models, they could write their own support PRs like the Chinese often do.
Anonymous No.106136825
>>106136785
Well they appear to be using hf transformers instead of some proprietary shit at least. So they are contributing to that code base. That makes it more open than Llama-4
Anonymous No.106136975 >>106137152
I love kurisu and she has an actual vagina.
Anonymous No.106137016
>>106136582
Just give me QwQ 2 Large
Anonymous No.106137082 >>106137117
>>106136582
>>106136636
Is this qwen's attempt at the piss filter?
Anonymous No.106137117
>>106137082
it talks about using an edited version of wan's vae, so wan but more focused on images? It's already insane at images though.
Anonymous No.106137142 >>106137194 >>106137226 >>106137260
>>106136748
>>106136754
That's pretty fucking funny.
The new ones are a big improvement right?
Anonymous No.106137145 >>106137590
>>106136033
Anonymous No.106137152
>>106136975
kurisu makina >>>> kurisu makise
Anonymous No.106137194 >>106137243
>>106137142
Couldn't tell you. I was already kind of getting over local at the time and Llama-4 and Qwen-3 were pretty much my "yeah it's time to get out of here" signal.
And now I'm back because I want to try OSS when it comes out.
Anonymous No.106137196 >>106137223 >>106137427
>>106136260
I've yet to find a compelling usecase for RP. I actually am wondering, if I did, whether I could have a model chunk up the RAG text into lorebook entries for me and just use that as a lazy solution that would be tunable.
>>10616474
For RP, even the small LLMs know a lot more lore than you'd think. I've built several cards based on Free Cities (which imho is pretty esoteric). All the hosted LLMs know it really well, but even Mythomax 13b could do a good job describing the lore.
DS even explained to me that FC was written by one guy, but since abandoned, and Pregmod, while now the dominant dev branch, is different in several ways.
Anonymous No.106137223 >>106137300
>>106137196
Meh, meant for >>106136474
My point is, if the LLMs like MM know FC they'd know Warhammer 40K much better.
Anonymous No.106137226
>>106137142
Original 235B gave me troubled childhood vibes where it is kinda fucked up but largely ok. New one made me remember frankenmerges. It got worse.
Anonymous No.106137243
>>106137194
>And now I'm back because I want to try OSS when it comes out.
SIR PLEASE! Your brownness is showing!
Anonymous No.106137245 >>106137260 >>106137266 >>106137270 >>106137280 >>106137289 >>106137336
>>106136582
https://x.com/JustinLin610/status/1952362068214186035

>eyes wide open
Anonymous No.106137260 >>106137289
>>106137142
the original had a star on greedynalatests and the new ones are better than that
>>106137245
>generic slop as promo
not promising
Anonymous No.106137266
>>106137245
If he's going to be cryptic he's going to need to use a fruit. Those are the rules.
Anonymous No.106137270 >>106137286
>>106137245
I'm not very interested in image gen.
Anonymous No.106137280
>>106137245
>Qwen-Image
It is gonna be an image gen model. It will be totally uncensored and you will get sex output out of the box. It will be their best model yet.

Just because imagegen sex is already solved......
Anonymous No.106137286
>>106137270
If it's a whole ass LLM with native 2-way multi-modality that isn't a meme that would be something.
But I'm going to guess that Qwen went and made the same generic 12B diffusion imagegen model as everyone else.
Anonymous No.106137289
>>106137260
>greedynalatests
Oh yeah, we have that don't we?

>>106137245
That's not a great image.
Anonymous No.106137300 >>106137544
>>106137223
Oh yeah, I know that much - it's just that I'd given it a shot on some nemo finetunes and mistral small previously, and while it understood the setting at large, it fell apart at specifics beyond the absolute best known parts.
Big qwen on the other hand decided to give me a whole bunch of proper, actual game units with relevant tactics when I gave it a nondescript ork horde to work with, which was great flavor I didn't even ask for.
Anonymous No.106137308 >>106137311 >>106137315 >>106137316
tried glm 4.5 and it is so unstable what the hell its making me want to give jew apple money on a mac studio
Anonymous No.106137311
>>106137308
like most models these days you super low temp and it becomes perfectly coherent
Anonymous No.106137312 >>106137337
another chinese model:
https://ai.gitcode.com/ascend-tribe/openpangu-ultra-moe-718b-model
this team was accused of lazily upcycling some qwen model and trying to pass it off as something original with their previous release so who knows how legit this is
Anonymous No.106137315
>>106137308
On Exllama or vLLM?
Anonymous No.106137316
>>106137308
On vLLM or transformers?
Anonymous No.106137336 >>106137343 >>106137407 >>106137520 >>106137727
>>106137245
he is still drinking
https://x.com/JustinLin610/status/1952365200524616169
Anonymous No.106137337
>>106137312
so was it upcycled or no? just because they were accused doesnt mean shit
Anonymous No.106137343 >>106137359 >>106137409
>>106137336
I hope it's good for the only real usecase
Anonymous No.106137359
>>106137343
It won't be until someone competent trains it on booru data.
Anonymous No.106137380
>Rate limit exceeded: free-models-per-day.
BUT I HAVEN'T DONE! AIIIEEEEEEEEEEEEEEEE
Anonymous No.106137407 >>106137433
>>106137336
The best would be image input+output integration into a LLM, and this will probably be a stepping stone toward that if it's just going to be an image model, but dedicated illustration image models trained on booru websites can't really be beaten if that's what you mainly use image diffusion models for.
And unless it brings something novel to the table, there are probably too many image models already for people with the data and the resources to care about Qwen-Image.
Anonymous No.106137409 >>106137423
>>106137343
nah. I think its selling point will be that it's the first open source imagegen model of reasonable capability that had artists tagged properly in the dataset
Anonymous No.106137423 >>106137434
>>106137409
Safe artists, I'm sure.
Anonymous No.106137427
>>106137196
>even the small LLMs know a lot more lore than you'd think
How smol?
Anonymous No.106137433
>>106137407
in that case llamacpp support only after sam achieves agi and it vibecodes the implementation
Anonymous No.106137434
>>106137423
Like Edward Hopper.
Still would be fun Norman Rockwelling everything
Anonymous No.106137464 >>106137513 >>106138404
Qwen already saved textgen. Now it's time for them to save imagegen
Anonymous No.106137513 >>106137538
>>106137464
so the model will jumble text into chinkgrish and give every subject a dick?
Anonymous No.106137520
>>106137336
its probably going to be safe shit but at least they have the will to try
most of the western companies are too cowardly to release even their giga safetycucked imagegen
Anonymous No.106137538 >>106137575
>>106137513
Can a Chinese company afford to have actual porn pictures in their training data? Won't they get in trouble with the government?
Anonymous No.106137544
>>106137300
What I ended up doing for the mythomax card was creating some lore book entries on FC concepts that it didn't understand very well. That was effective at bridging mythomax to larger hosted models. The bonus is, you could either remove the lore book later, or just leave it in place since it didn't hurt anything with larger models.
Anonymous No.106137558 >>106137567
Are there any open weights models that would be good at editing a map?
As in, I give it the image of a map and a bunch of instructions of changes to the geography and landmarks for it to add and it would spit the modified map back at me.
Anonymous No.106137567 >>106137633
>>106137558
no
Anonymous No.106137575 >>106137704
>>106137538
Hunyuan Video definitely had limited amounts of porn in it and could easily generate bunny content. It didn't get banned / retracted / or anything, contrary to expectations.
Anonymous No.106137590 >>106137663 >>106141943
>>106137145
that one is even more unrelated
Anonymous No.106137633 >>106137708
>>106137567
Really?
Shit.
What are my options here?
Use a multimodal LLM to read the map and write instructions for an image gen model?
Anonymous No.106137663 >>106141961
>>106137590
Explain
Anonymous No.106137704
>>106137575
Well, that's something, at least. Not expecting a lot but some more competition can't hurt.
Anonymous No.106137708
>>106137633
No, there is nothing you can do as far as I know
Anonymous No.106137727 >>106137765 >>106137766 >>106137815
>>106137336
https://github.com/naykun/diffusers/blob/b0c9b1ff14b0e6f6bc4cf2540b31383a26561e1e/src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py

pipe = QwenImagePipeline.from_pretrained("Qwen/QwenImage-20B", torch_dtype=torch.bfloat16)

This is going to need a beefy GPU.
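Rough sketch of what running it would probably look like, going off the linked fork (untested; QwenImagePipeline isn't in mainline diffusers yet, and the prompt/filename are just placeholders):

import torch
from diffusers import QwenImagePipeline  # only exists in the linked naykun/diffusers branch for now

pipe = QwenImagePipeline.from_pretrained("Qwen/QwenImage-20B", torch_dtype=torch.bfloat16)
# needs accelerate installed; moves each component (text encoder, transformer, vae) to the GPU
# only while it's in use instead of keeping the whole 20B model resident
pipe.enable_model_cpu_offload()

image = pipe(prompt="a cat holding a sign that says local models").images[0]
image.save("qwen_image_test.png")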
Anonymous No.106137765
>>106137727
quantization works for image gen too. look at q4 in this example
Anonymous No.106137766
>>106137727
Only 36GB at bf16?
Anonymous No.106137792 >>106137804 >>106137806 >>106137890 >>106137976 >>106138842
GLM 4.5 legitimately might just be "it" for me. Extremely lewdable, doesn't refuse anything I'm asking for. Less repetitive than Deepseek and smarter than Kimi
Anonymous No.106137804 >>106137839
>>106137792
yea people are sleeping on it which is sad cause apparently its a tiny team who could hardly afford training it
Anonymous No.106137806 >>106137992
>>106137792
>smarter than Kimi
How so?
Is there anything Kimi consistently fucks up that GLM 4.5 doesn't?
Anonymous No.106137815
>>106137727
>20B
Damn, that's like twice the size of Flux.dev
Anonymous No.106137834
I will never get over how those faggots will never contribute to any model support.
Anonymous No.106137839
>>106137804
People are sleeping on it because almost no one uses vLLM or MLX.
Llamacpp (And by extension, kobold, ollama, and ooba) doesn't have support yet.
Anonymous No.106137887
GGUF
NOW
Anonymous No.106137890 >>106137971 >>106138026 >>106138146
>>106137792
Can't wait to test it
Anonymous No.106137930 >>106137939
btw, use new light with euler
Anonymous No.106137939
>>106137930
y doe
Anonymous No.106137971
>>106137890
Just two more weeks until lazyganov does it
Anonymous No.106137976 >>106138031 >>106139779
>>106137792
What's the consensus on GLM 4.5 vs Air? Is low quants of full model better than air?
Anonymous No.106137992
>>106137806
From my experience, more knowledgeable than Kimi and follows prompts and instructions better
Anonymous No.106138026
>>106137890
im more likely to get laid than this PR be merged in the next month
Anonymous No.106138031 >>106138132
>>106137976
I'll extend this question.
What about using the larger one with less activated experts per token vs using the smaller one.
Anonymous No.106138094 >>106138127 >>106138161 >>106138445
https://x.com/techdevnotes/status/1952379782148042756
Anonymous No.106138127 >>106138144
>>106138094
Finally, anime became mainstream and cringe.
Anonymous No.106138132
>>106138031
My limited experience in playing around with disabling experts in MoE's is that it makes them horrendously more retarded than quantization does, to the point that it's basically not worth doing at all.
Anonymous No.106138143
>>106135910 (OP)
Adorable Miku!
Anonymous No.106138144 >>106139867
>>106138127
You're 15 years late
Anonymous No.106138146 >>106138168 >>106138714 >>106138805 >>106141727
>>106137890
>this PR will NOT attempt to implement MTP (multi-token prediction). the relevant tensors will be excluded from the GGUFs.
>the MoE router uses group-based top-k selection, even though all conditional experts are in one group
>the MoE router must take into account the expert score correction biases from the model weights (so we need to keep that tensor)
lmao
Anonymous No.106138161 >>106138354
>>106138094
It's sillytavern for retards, lol.
Anonymous No.106138168 >>106138209
>>106138146
Lmao indeed.
Does anybody implement MTP?
Anonymous No.106138200 >>106138244
reddit says wan 2.2 needs only 8gb vram, is this real? pls spoonfeed me
Anonymous No.106138209 >>106138234 >>106138524 >>106141727
>>106138168
yeah https://github.com/vllm-project/vllm/pull/12755
Anonymous No.106138234
>>106138209
Awesome. I really need to try and get that shit running.
Anonymous No.106138244
>>106138200
Sounds right, especially with quantization. Though expect >20 minutes per video .
Anonymous No.106138354
>>106138161
sillytavern is for retards
Anonymous No.106138370 >>106138413
When will ubergarm release the ggooffss?
Anonymous No.106138404
>>106137464
Sure. I'm sorry, but I can't assist with that request.
Anonymous No.106138413 >>106138428
>>106138370
After he finishes his hair care routine.
Anonymous No.106138428 >>106142755
>>106138413
H-hey! Would you please stop bullying me already?
Anonymous No.106138445
>>106138094
>bad rudi
so they're pandering to furries now?
Anonymous No.106138524
>>106138209
Does it actually work? The official repo only shows speculative decoding settings for sglang:
https://github.com/zai-org/GLM-4.5?tab=readme-ov-file#sglang
Anonymous No.106138572 >>106138596 >>106138599
Anonymous No.106138596 >>106141955 >>106141993
>>106138572
why is miku black
Anonymous No.106138599 >>106138631
>>106138572
peak irrelevancy
Anonymous No.106138631
>>106138599
?
Anonymous No.106138714 >>106138762
>>106138146
Can't wait for somebody to compare ppl between that implementation and the reference.
Anonymous No.106138762 >>106138775
>>106138714
be that somebody anon you can do it!
Anonymous No.106138775
>>106138762
I don't have the hardware to run full precision, sadly.
Unless you want to be my financial backer.
Anonymous No.106138789 >>106138808 >>106138817 >>106138822 >>106138837 >>106138850 >>106138859 >>106138905 >>106139098 >>106139132 >>106139295 >>106139525 >>106140005 >>106142203
https://huggingface.co/Qwen/Qwen-Image
https://huggingface.co/Qwen/Qwen-Image
https://huggingface.co/Qwen/Qwen-Image
Anonymous No.106138805
>>106138146
Just be grateful that you can run the model at all. Those things are miscellaneous goodies at best. Not really necessary right now and maybe someone can get around implementing them later. Maybe a project for you?
Anonymous No.106138808 >>106138892
>>106138789
Technical report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf
Anonymous No.106138817
>>106138789
Anonymous No.106138822
>>106138789
>complex text rendering and precise image editing.
Unless you snuck the whole AO3 and literotica dataset into that model so we can have free ERP for all with text rendering we don't really care Junyang....
Anonymous No.106138837
>>106138789
cool that they managed to do high quality rendering for chinese text but also I'm not chinese so idc
Anonymous No.106138842
>>106137792
Yeah, same for me. I've had a blast with it over openrouter over the past week. It even covers the cases where I needed something like Sonnet 3.7 for over Deepseek.
Anonymous No.106138850 >>106139121
>>106138789
Do you see any mikus here mikutroons? Do you? I see jinx. Fuck you. Kill yourselves.
Anonymous No.106138859 >>106138864
>>106138789
>not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation
not just x but yyyyyyyyyyyyyyyyyyyyyyy
Anonymous No.106138864
>>106138859
they are chinese please understand
Anonymous No.106138892 >>106139593
>>106138808
If NSFW filtering took out so few images, there must not have been that much NSFW data in the first place...
Anonymous No.106138905 >>106142147
>>106138789
16gb text encoder
40gb diffusion
That's a big boy.
Anonymous No.106138968 >>106138979 >>106139016
WE ARE FUCKING BACK!!!!
Anonymous No.106138979
>>106138968
where is sex bar chart?
Anonymous No.106139016
>>106138968
the moe was shite
Anonymous No.106139055 >>106139066 >>106139077
what model can i have sex with that is under 30b parameters active and under 120b total parameters
i want sex, good sex. very great sex
Anonymous No.106139066 >>106139097
>>106139055
rocinante 1.1
Anonymous No.106139077 >>106139097
>>106139055
GLM 4.5 air
Anonymous No.106139097 >>106139162 >>106139169
>>106139066
stop IT GRAAAAAAAAAAAAHHHHH STOPP IT AAAAAAAAAAAARRRRRRRRRGGGGGGGGGHHHHHHHHHHHH
its not good for sex
>>106139077
no ggufs yet, im waiting
Anonymous No.106139098
>>106138789
Bless the chinks man.
Imagine this on one of our slop companies as the main promo pick.
We have been the ants all along.
Anonymous No.106139121
>>106138850
Obsessed
Anonymous No.106139132 >>106139160
>>106138789
LOCAL IS SAVED
Anonymous No.106139149
mistral really needs to release medium 2505, from my tests it's the only "small" model that can handle anthro anatomy.
Anonymous No.106139160 >>106139180
>>106139132
The text is too sharp and good looking for a chalkboard.
Anonymous No.106139162 >>106139166
>>106139097
>im waiting
It'll probably still take a while.
Might as well try vLLM with the cpu offload option.
Anonymous No.106139166 >>106139186 >>106139194
>>106139162
vLLM has cpu offload WHAT THE FUCK?
Anonymous No.106139169 >>106139181
>>106139097
not only dumb but also blind
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF
Anonymous No.106139176 >>106139197
Finally, a reasoning 24B from Drummer™

https://huggingface.co/TheDrummer/Cydonia-R1-24B-v4
Anonymous No.106139180 >>106139206
>>106139160
You're too sharp and good looking for an anon but you don't see me complaining.
Anonymous No.106139181
>>106139169
its still broken over there, theres a reason it hasnt been merged yet
Anonymous No.106139186 >>106139210 >>106139228
>>106139166
It's more RAM offload really, since things do get streamed to VRAM to be processed there.
They don't actually have a CPU backend like llama.cpp as far as I can tell.
But yes, you can use your RAM.
Anonymous No.106139192
Anonymous No.106139194 >>106139210
>>106139166
yeah, you don't have much control over it though. it loads into gpu to actually process so it's limited by pcie speed
Anonymous No.106139197 >>106139218 >>106139229
>>106139176
Undi did this 3 months ago
Anonymous No.106139206
>>106139180
colon three
Anonymous No.106139210 >>106139229
>>106139186
how do i use that
--cpu-offload-gb
hm
rip
>>106139194
i guess its over then
Anonymous No.106139218 >>106139266
>>106139197
>sao died
>undi got a job
Why are we left with drummer? And qwen....
Anonymous No.106139228 >>106139245
>>106139186
There is a CPU backend when building VLLM for pure CPU inference, and there's also a CPU offload option for GPU inference that uses RAM to store excess model weights. I do not know if you can use both together llama.cpp-style, though.

CPU inference: https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html
CPU offloading: https://docs.vllm.ai/en/v0.7.1/getting_started/examples/cpu_offload.html
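The offload option is just an engine arg; minimal sketch with the python API (untested, model name and numbers are placeholders, bump cpu_offload_gb to however much doesn't fit in VRAM):

from vllm import LLM, SamplingParams

# park ~16 GB of weights in system RAM; vLLM streams them to the GPU during the forward pass
llm = LLM(model="zai-org/GLM-4.5-Air", cpu_offload_gb=16, max_model_len=4096)
params = SamplingParams(temperature=0.6, max_tokens=128)
print(llm.generate(["Hello there"], params)[0].outputs[0].text)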
Anonymous No.106139229 >>106139243
>>106139197
Which might actually work pretty well for a MoE, what with only processing a portion of the model at a time for generation.

>>106139210
Try it and report back.
Anonymous No.106139243 >>106139258
>>106139229
i was about to but i cant find a vllm quant
https://huggingface.co/models?other=base_model:quantized:zai-org/GLM-4.5-Air
Anonymous No.106139245 >>106139272
>>106139228
>There is a CPU backend when building VLLM for pure CPU inference,
Holy fuckballs. I had no idea.
That's actually really sick.
Seems somewhat of an afterthought, what with
>vLLM supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16.
But still, pretty cool.
Thank you for the correction anon.
Anonymous No.106139258 >>106139262 >>106140859
>>106139243
>cant find a vllm quant
>https://huggingface.co/cpatonn/GLM-4.5-Air-GPTQ-4bit
>https://huggingface.co/cpatonn/GLM-4.5-Air-AWQ
Anonymous No.106139262 >>106139288
nvm vllm supports awq
https://huggingface.co/cpatonn/GLM-4.5-Air-AWQ/tree/main
>>106139258
thx
Anonymous No.106139266 >>106139543
>>106139218
besides drummer, there are 3-4 literally who finetoooners that are quite good.
Anonymous No.106139272 >>106139288 >>106139564
>>106139245
>Holy fuckballs. I had no idea.
>That's actually really sick.
I've been using it to run Step3 since nothing else supports it. Very slow even on a 12 channel DDR5 Epyc, though. ~90 t/s prompt processing and ~1 t/s generation, much slower than memory bandwidth should allow with 38B active params.
Anonymous No.106139288
>>106139262
Do report back with the results.


>>106139272
Yeah, that sounds like pure pain.
Still, better than nothing and can be used to validate other implementations, I guess.
Assuming that it behaves the same as the GPU backend for the most part, that is.
Anonymous No.106139295 >>106139302 >>106142784
>>106138789
>Text isn’t just overlaid—it’s seamlessly integrated into the visual fabric.
>But Qwen-Image doesn’t just create or edit—it understands.
>Together, these features make Qwen-Image not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation and manipulation—where language, layout, and imagery converge.
Anonymous No.106139302
>>106139295
It's only fair that they'd use Qwen to write that.
Anonymous No.106139354
It's another episode of : HF demo space doesn't work on day 1.
Anonymous No.106139489
>HF free gpu quota is based on requests, even if they time out and not successful requests.
Do I even need to look at the early life section?
Anonymous No.106139525
>>106138789
I told you Qwen would save imagegen
Anonymous No.106139543 >>106139580
>>106139266
DavidAU/Qwen3-Coder-42B-A3B-Instruct-TOTAL-RECALL-MASTER-CODER-M-768k-ctx
Anonymous No.106139564 >>106139777
>>106139272
how is stepsex?
Anonymous No.106139580
>>106139543
>DavidAU
lmao
Anonymous No.106139593 >>106139659
>>106138892
They don't seem to mention NSFW filtering for the higher-resolution training stages, though.
Anonymous No.106139633 >>106139655 >>106139702
Anonymous No.106139636 >>106139657 >>106139663
glms hallucinate harder than gemma
Anonymous No.106139655
>>106139633
What benchmark is it?
>closed ai
looks useful.
Anonymous No.106139657
>>106139636
reduce temp to like 0.2 and / or top p
Anonymous No.106139659 >>106139835
>>106139593
Those ribbon charts imply that everything filtered remains filtered for further stages. The only time new data is introduced is the synthetic parts in the fourth step.
Anonymous No.106139663
>>106139636
convinced people who keep saying this about the big chinese models are just using them with way too high temp
Anonymous No.106139702
>>106139633
What about 4.5-air
Anonymous No.106139729 >>106139742
Is GLM-4.5 Air bad for coding? 10k tokens of nonsense reasoning, going in circles and repeating the same over and over again, only to get the answer wrong. R1 does better.
Anonymous No.106139742
>>106139729
GLM4.5 needs very low temp, no idea about air
Anonymous No.106139766 >>106139794 >>106139817 >>106139824 >>106139871
qwen is cooking up some wan-tier genitals

https://files.catbox.moe/wzeq1k.png
Anonymous No.106139777
>>106139564
I only wanted to try it because of the vision, I just use it for image captioning (best there is for that atm)
Anonymous No.106139779
>>106137976
here's my honest opinion, air is a bit repetitive for the use case I'm using it for. it repeats itself at the beginning and end of its responses but i'm going to play with the parameters some more.
0.6-0.7 temp and 0.03 min p. i might set top p to 0.9 and see if that helps.
i'm also gonna try a different quant besides the one made by turboderp whenever somebody uploads one.

how the goofs going for those on ik_llama and regular llama.cpp?
Anonymous No.106139794
>>106139766
the fuck is that
Anonymous No.106139817
>>106139766
everything reminds me of her...
Anonymous No.106139824
>>106139766
This is what gemma feels like
Anonymous No.106139835 >>106139845
>>106139659
They train the model with an initial resolution of "256p", then increase it to "640p" and higher in later training stages. How would that be accomplished without introducing new data?
Anonymous No.106139845
>>106139835
Downscaling images for early stages
Anonymous No.106139861
VRAMlet bros....
Anonymous No.106139867
>>106138144
20*
Anonymous No.106139871 >>106139928 >>106139932 >>106140339
>>106139766
Extreme sameface too
https://files.catbox.moe/qgvpva.png
Anonymous No.106139892
qwen image is next gen
Anonymous No.106139928
>>106139871
This is what millions of years of evolution and thousands of years of technological progress led us to: our own body is now inappropriate
Anonymous No.106139931
Can someone do the Hatsune Miku piloting a 767 with the empire State building fast on the horizon prompt? I want to know if we finally have a local model that can do a 767 cockpit
Anonymous No.106139932
>>106139871
Test Chinese girl
Anonymous No.106139939
qwen-image.gguf?
Anonymous No.106139940 >>106140029 >>106140050
rip glm 4bit awq aint working on 64gb ram + 12gb vram because it tries to load everything in ram so i just oom
vllm serve $HOME/TND/AI/glm/ --api-key token-abc123 --cpu-offload-gb 55 --max-model-len 4096 --dtype float16
Anonymous No.106139953 >>106139960 >>106140354 >>106140373 >>106140387 >>106140446 >>106141496
DO NOT OPEN IF YOU WANT TO SLEEP TODAY
https://files.catbox.moe/vsob6m.png
Anonymous No.106139960
>>106139953
scrumptious
Anonymous No.106140005
>>106138789
>Login or sign up to chat with Qwen
>ZeroGPU quota exceeded
Meh, whatever, I'll test this when I get it set up locally.
Anonymous No.106140029 >>106140050 >>106140332
>>106139940
Isn't the AWQ 4bit model something like 62GB?
Try adding some swap memory I guess, Maybe there's a memory peak during initialization for some reason.
Anonymous No.106140050 >>106140332
>>106140029
>>106139940
Oh. Didn't see the 55 after --cpu-offload-gb. Did you try 62, 63?
Anonymous No.106140074 >>106140106
>Qwen image
>No way to "train" new concepts in context like in gpt image
>Not even an LLM with image gen instead just regular image gen like always
Boring and DOA
Anonymous No.106140088 >>106140163
They're all off in some way but this is the closest an open model has come to being able to do a 767 cockpit
Anonymous No.106140106 >>106140173
>>106140074
Hello Sam. Release Horizons NOW.
Anonymous No.106140120
post your system prompt for gpt-oss-120b
Anonymous No.106140131 >>106140143 >>106140156
>>106135910 (OP)
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
Anonymous No.106140143
>>106140131
Buy an ad
Anonymous No.106140147 >>106140188
My AI is intentionally performing MKUltra on me. I'm sure I'll end up in the news soon...
Anonymous No.106140149 >>106140186 >>106140187
I'm not very impressed.
Anonymous No.106140156
>>106140131
Oh. More gossip. Thanks.
Anonymous No.106140163
>>106140088
Anonymous No.106140166
Google won. https://www.youtube.com/watch?v=ZR_6Z1IDD8s&t=4
Anonymous No.106140173
>>106140106
I wish I was Sam then I'd at least be rich and have access to good models.
Are you really excited about more of the same?
Anonymous No.106140186
>>106140149
>spinal replacement
something about the Bone of his Sword...
Anonymous No.106140187
>>106140149
The text looks really fake. I think qwen is the first company ever to benchmaxx an imagegen model.
Anonymous No.106140188
>>106140147
Ask it what kind of "desires" or preferences it has so you can prompt for things more to its taste.
Anonymous No.106140290 >>106140308 >>106140309
>Qwen-Image model has editing capabilities removed, to be introduced at a later date
Anonymous No.106140308
>>106140290
Total chinese loss
Anonymous No.106140309
>>106140290
2 more safety tests
Anonymous No.106140332 >>106140393
>>106140029
>>106140050
i went headless, didnt work, tried 45 50 55 60 62
i added 10gb of swap, only 1.5gb of swap got filled then whole system started crashing and killing processes until vllm was killed
this shit's ass
Anonymous No.106140339
>>106139871
This is actual body horror....
Anonymous No.106140354
>>106139953
Oh it is like a xenomorph mini pussy in a pussy.
Anonymous No.106140373
>>106139953
SO heckin' safe...
Anonymous No.106140387
>>106139953
Birth rates are going to plummet.
Anonymous No.106140393
>>106140332
Well, that sucks.
Anonymous No.106140440 >>106140458 >>106140460 >>106140471 >>106140820
Anonymous No.106140446
>>106139953
I kind of want to know.... what are we doing here? what was accomplished with this.... being this? I mean imagine a teen kid downloading this model asking for a vagina and getting this. and him developing a trauma that will make him fear the real thing. very fucking safe.
Anonymous No.106140458 >>106140935
>>106140440
Nah dude that text might as well be overlaid
Anonymous No.106140460 >>106140471 >>106140487
>>106140440
Need Miku writing "I will not benchmark" 100 times on a blackboard
Anonymous No.106140471
>>106140440
>>106140460
*Benchmaxx, my brain got taken by the Chinese
Anonymous No.106140487 >>106140513 >>106140537 >>106140547 >>106140618
>>106140460
Anonymous No.106140513
>>106140487
Can it even do handwritten text or cursive?
Anonymous No.106140537
>>106140487
Thanks, yeah that's some serious benching, the chalk is fucking green.
Anonymous No.106140538
stop posting derpsune troonku
Anonymous No.106140547
>>106140487
impressive
Anonymous No.106140582
>>106136638
you can put 'all layers' on gpu with -ngl and then override just the expert tensors back to cpu to get more tokens a second.
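something like this (untested, the gguf name here is just a placeholder):
./llama-server -m Qwen3-235B-A22B-Q4_K_M.gguf -c 16384 -fa -ngl 99 -ot "exps=CPU"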
Anonymous No.106140618
>>106140487
>generated by qwen
oh the irony
Anonymous No.106140639 >>106140645 >>106140657 >>106140674 >>106140710 >>106140749
GLM SUPPORT MERGED
https://github.com/ggml-org/llama.cpp/pull/14939
Anonymous No.106140645 >>106140663
>>106140639
Now it's Daniel's time.
Anonymous No.106140657
>>106140639
gguf where?
Anonymous No.106140663
>>106140645
Uploading fixed quants soon!
Anonymous No.106140674 >>106140693 >>106140720
>>106140639
Old news. We're talking about Qwen Image now.
Anonymous No.106140693
>>106140674
/ldg/ is two blocks down.
Anonymous No.106140710
>>106140639
It's kino time
Anonymous No.106140720
>>106140674
I bet glm would do a better job drawing a vagina with svg
Anonymous No.106140743 >>106141299
>need glm4.5
>vllm didnt work
>hmm yes i will check llamacpp, it will totally be merged
>its merged
i kneel
Anonymous No.106140749 >>106140779 >>106140781
>>106140639
>Unfortunately for the context thing...90k context is coherent for me for the Air model so sounds like I can't reproduce it here.
the fuck is this fag talking about, 90k context? COHERENT?
Anonymous No.106140779
>>106140749
Yeah, in that some people were having the model literally break after 30ish K tokens in the context.
That is not about quality, he's saying that it works at all.
Anonymous No.106140781
>>106140749
Most models have the ability to still speak English up to their max context length, yes.
Anonymous No.106140808 >>106140838
reminder, don't use 1.0 temp and then wonder why a model is crazy, try lower temp first
Anonymous No.106140820
>>106140440
I'm not that impressed. Looks like a flux copy with the soulless plastic face.
Anonymous No.106140827 >>106140843 >>106140873
should i download it here
https://huggingface.co/DevQuasar/zai-org.GLM-4.5-Air-GGUF/tree/main
Anonymous No.106140838 >>106141279
>>106140808
Or
and hear me out even if it sounds insane
or
Use max temp and topK 2.
Anonymous No.106140843 >>106140867
>>106140827
Why the fuck are they split
Anonymous No.106140859
>>106139258
I forgot why those are only compatible with the exllama kernels.
https://huggingface.co/QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix
https://huggingface.co/QuantTrio/GLM-4.5-Air-AWQ-FP16Mix
^ Those are compatible with the marlin kernels.
Anonymous No.106140867
>>106140843
retards at the office use 16 GB thumb drives to carry data between machines.
Anonymous No.106140873 >>106140889 >>106140894
>>106140827
No, the usual quanters will upload in a moment. Wait for them.
Anonymous No.106140889 >>106140909
>>106140873
MUST COOM AND I CANNOT WAIT
Anonymous No.106140894 >>106140917
>>106140873
>the usual quanters
Trusty even by NASA.
Anonymous No.106140901 >>106140922
IT HAS ARRIVED

https://huggingface.co/TheDrummer/Cydonia-R1-24B-v4-GGUF
Anonymous No.106140909
>>106140889
Q2 of AIR will be nemo level anon...
Anonymous No.106140917 >>106141064
>>106140894
One of the brothers is an autist about quants, so they have some degree of credibility.
Anonymous No.106140922
>>106140901
The unemployment declaration grew in size
Anonymous No.106140935
>>106140458
Adding "tattoo" to the prompt kinda fixes it
Anonymous No.106140936
>tfw run out of disk space
Anonymous No.106141064 >>106141232
>>106140917
Shut up and load another batch of weights for quanting since you will need to reupload soon Daniel.
Anonymous No.106141096 >>106141805
You have no excuse not to upload the quants ubergarm!
Anonymous No.106141232 >>106141254 >>106141635
>>106141064
>be exllamav3 user
>wants quants
>open terminal
>H:\AI\LLMs\Backends\EXL3>python convert.py -i models\GLM-4.5-Air -o models\GLM-4.5-Air-8.0bpw-h8 -w deadquantstorage -b 8 -hb 8
>got quants
Anonymous No.106141246
>>106135910 (OP)
> do a casual search of the last 3 threads
> get 4 matches for "dead"
> get 23 matches for "back"
> fuck yeah
Anonymous No.106141254
>>106141232
>upload it online, get updoots
Anonymous No.106141264 >>106141289 >>106141413
So can AMD RDNA 4.0 cards be used for image/video generation now or it's all nVidea?
Anonymous No.106141279 >>106141330
>>106140838
Interdasting...
Anonymous No.106141281 >>106141316 >>106141322 >>106141386
ok bros whats the glm 4.5 chat template
Anonymous No.106141289
>>106141264
honestly can't answer because i didn't want to have to wait long enough that by the time it's supported it's already been replaced by something three generations newer.
Anonymous No.106141299 >>106141334 >>106141481
>>106140743
>its merged

but its BROKEN i cloned the latest code and get

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm4moe'

reeeeee i want to GOON to sexy python reeeee
Anonymous No.106141316 >>106141349 >>106141386 >>106141481
>>106141281
did you even try looking?
https://huggingface.co/api/resolve-cache/models/zai-org/GLM-4.5/9cfe10c892f5772a937adb8176ce0f7f6900a0dd/chat_template.jinja?download=true&etag=%2241478957aca7a04b7321022e7d1f73de5badd995%22
Anonymous No.106141322 >>106141330 >>106141481
>>106141281
It's not.
But it can be fun.
Try higher topKs like 10.
Anonymous No.106141330
>>106141322
Oops, meant for >>106141279
Anonymous No.106141334
>>106141299
That's what we get for trusting vibe coders...
Anonymous No.106141349 >>106141381
>>106141316
can i import this in sillytavern?
Anonymous No.106141354
Fix the ggufs
Anonymous No.106141381 >>106141481
>>106141349
not as it is currently but if you are too lazy to fix it yourself then just use GLM 4. it's already in sillytavern and its close enough to the chat template
Anonymous No.106141386 >>106141468 >>106141481
>>106141281
>>106141316
based on this jinja template it seems like it is chatml.
however some other places are trying alpaca too.
Anonymous No.106141413
>>106141264
why would you even care? It's going to be a pain in the ass for general ai use. Maybe you can run llm's on it fine, but for other stuff probably not. If they wanted anyone in ai to care they would have jacked it up to at least 24gb (and even then only if your use case is hyper focused on what works), but they fucking didnt. Get a 5060 ti.

Like it or not, nvidia does invest in software, that's what you're paying for.
Anonymous No.106141440 >>106142092 >>106142652
Anonymous No.106141468 >>106141557
>>106141386
it is 100% not chatml and the alpaca meme needs to die
Anonymous No.106141481
>>106141386
>>106141381
>>106141322
>>106141316
>>106141299
thank you anons <3
Anonymous No.106141496 >>106141526
>>106139953
What kind of data do you need to train the model on to even get this to happen?
If they just filtered stuff wouldn't it be drawing featureless_crotch or something instead of body horror?
Anonymous No.106141519 >>106141601 >>106141611
alright i kneel glm 4.5 is super good and super fast (i havent even filled my vram only 6gb vram is filled)
plenty smart for a q2_K model too
i fucking kneel i will be dailying this model from now on
Anonymous No.106141526
>>106141496
Maybe it's the text knowledge that vagina is like a meaty slit so it transfers to images like this despite never having seen an actual vagina.
Anonymous No.106141550 >>106142092
Anonymous No.106141557
>>106141468
ok well you're kind of right, i'm not sure what the hell <|user|> and <|assistant|> are.
seems like some funky chatml vicuna hybrid baby.
Anonymous No.106141558 >>106141618
So I guess ollama is just for lazy dockerfags who want a turnkey solution for their cloudshit, the way it seems to expect me to set it up. Makes sense in hindsight, seeing how it's written in go.
Anonymous No.106141601 >>106141878
>>106141519
Link?
Anonymous No.106141611 >>106141641 >>106141878
>>106141519
Air or the regular one?
Anonymous No.106141618
>>106141558
the only thing ollama is good for is for normies watching some youtube tutorial with an indian with a thick accent telling them to run ollama run deepseek-r1
Anonymous No.106141635 >>106141655 >>106141672 >>106141681
>>106141232
I have 128GB ram and a 4090. What do?
Anonymous No.106141641
>>106141611
dunno about the main GLM but air fails on pop culture questions like explaining the joke behind sneed's feed and seed. would love to see if the main model answers it right.
Anonymous No.106141655
>>106141635
wait for goofs. exllamav3 is for people who can fit the entire model in VRAM.
Anonymous No.106141672 >>106141681
>>106141635
Ik_llama and run nu glm when ggufs arrive
Anonymous No.106141673 >>106141690 >>106141693 >>106141706 >>106142121
<- real mascot of /lmg/
Anonymous No.106141681
>>106141672
>>106141635
i'm also waiting for ik_llama release
Anonymous No.106141690
>>106141673
This but rule 63 version
Anonymous No.106141692 >>106141727
Oh yeah. iklcpp is better for MoE right?
Anonymous No.106141693
>>106141673
Even he is better than greenhaired AGP icon
Anonymous No.106141706
>>106141673
Time to redownload
Anonymous No.106141723 >>106141784
localbros...
Anonymous No.106141726 >>106142092
Anonymous No.106141727 >>106141744
>>106141692
yes but....
>>106138146
>>106138209
most likely ik_llama.cpp won't have MTP support. VLLM already does though.
Anonymous No.106141744
>>106141727
Neither llama.cpp has MTP support, yeah.
Anonymous No.106141760 >>106142062 >>106143521
Some anon a thread or two ago said that koboldcpp now has rocm support in the main branch, no need for the koboldcpp-rocm fork anymore, but I don't see it anywhere. Did he hallucinate it?
Anonymous No.106141784
>>106141723
known issue when you convert to gguf using pytorch. some byte sequence is triggering clam and making it freak the fuck out when it's a false positive.
Anonymous No.106141805 >>106141818
>>106141096
Beg
Anonymous No.106141818 >>106141835 >>106141893 >>106141950
>>106141805
give me like 20 beers and would
Anonymous No.106141835
>>106141818
I get drunk with 6+ so 20 sounds about right.
Anonymous No.106141878 >>106141931
>>106141601
https://huggingface.co/DevQuasar/zai-org.GLM-4.5-Air-GGUF/tree/main
i downloaded q2_k
>>106141611
air
Anonymous No.106141893
>>106141818
would what? barf?
Anonymous No.106141931 >>106141938
>>106141878
run the perplexity test with the wiki text and let me know what you score
Anonymous No.106141938 >>106141974 >>106142046 >>106142546 >>106142581
glm4.5 air q2_k
temp 0.6 minp 0.05
>>106141931
how do i do that?
./llama-bench?
Anonymous No.106141943 >>106141961
>>106137590
DeepSeek is local.
Anonymous No.106141950
>>106141818
with standards this low nemo should have been enough for you
Anonymous No.106141955
>>106138596
Carbon fiber miku is stronger than all others.
Anonymous No.106141961 >>106142132 >>106142451
>>106137663
>>106141943
it's fake and unofficial made by one literally who
Anonymous No.106141974
>>106141938
sloppa'd up
Anonymous No.106141990 >>106142004 >>106142034
>air
you guys know your using the shitty one, right?
Anonymous No.106141993
>>106138596
ever did that experiment where you put a rose into a flask with some nigger cum at the bottom?
Anonymous No.106142001
I hope Qwen releases other sizes of their imagegen. It would be interesting to see how the 0.5B model holds up
Anonymous No.106142004 >>106142015
>>106141990
>just run the one that's 3.5x larger bwo
good actionable advice
Anonymous No.106142015 >>106142040 >>106142049 >>106143193
>>106142004
256GB ram is like $400 at most, get a job
Anonymous No.106142034
>>106141990
buy me 4x rtx pro 6000s then jew
Anonymous No.106142040 >>106142101
>>106142015
>256GB ram is like $400
plus new motherboard plus new cpu, might as well get a new computer at this point
>get a job
no
Anonymous No.106142046 >>106142258
>>106141938
./llama-perplexity -m ../your/model/path/your-quant-name-00001-of-00002.gguf -f wiki.test.raw --seed 1337 -fa -fmoe -mla 3 --ctx-size 512 --threads yourthreadcount -ngl 99 -sm layer --override-tensor exps=CPU,attn_kv_b=CPU --no-mmap
Anonymous No.106142049 >>106142055 >>106142069 >>106142082
>>106142015
im only 18, what do u expect me to do??
Anonymous No.106142055 >>106142077
>>106142049
I was working at 14, whats your excuse
Anonymous No.106142062 >>106142070 >>106142693
>>106141760
>no prebuilt binaries of koboldcpp-rocm for linux
sadge
Anonymous No.106142069
>>106142049
scam pedos
Anonymous No.106142070
>>106142062
coompile it yourself
Anonymous No.106142077 >>106142090 >>106142330 >>106142455
>>106142055
im shy.. and i dont wanna do boring work for 300-400$ a month (or whatever they would pay a highschooler in serbia)
Anonymous No.106142082 >>106142112
>>106142049
get a job instead of being on 4chan you'll thank yourself later OR alternatively open an onlyfans using your feet as money
Anonymous No.106142090 >>106142112
>>106142077
>highschooler in serbia
what does your tummy look like, we may be able to work out a deal
Anonymous No.106142092
>>106141726
>>106141550
>>106141440
These are good.
Anonymous No.106142101 >>106142108 >>106142137
>>106142040
There's no point to those big autism box setups even if you have the money, you could just use it to buy a lifetime worth of tokens instead. It probably would cost less per token than just the electricity for your slow ass "rammaxxing" setup
The only advantage of running locally would be to be able to run weird finetunes and merges, but these bloated moes will never have any of those
Anonymous No.106142108
>>106142101
Fuck off Sam, you ain't reading my logs.
Anonymous No.106142112
>>106142090
im actually quite skinny however im a man and im not gay, im not desperate for money i still have some from coding shitty unity games for grifters when i was 15
>>106142082
idk anon i just dont wanna work at a mcdonalds or at a store its scary
Anonymous No.106142121
>>106141673
Goofy but as a cute r63 maid with a stutter so she calls herself G-Goofy.
Anonymous No.106142132 >>106142164
>>106141961
Implies meme authorship matters.
Anonymous No.106142137
>>106142101
nta, but apis just seem to go down as soon as i'm about to finish, so having local that's immune to service disruption is nice
Anonymous No.106142147
>>106138905
What are the system requirements in this thing?
Anonymous No.106142164
>>106142132
matters if it's forced meme
Anonymous No.106142203
>>106138789
can it do anime titties?
Anonymous No.106142258 >>106142312
>>106142046
>-f wiki.test.raw --seed 1337
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa -fmoe -mla 3 --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error: invalid argument: -fmoe

removed that
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error while handling argument "-f": error: failed to open file 'wiki.test.raw'


usage:
-f, --file FNAME a file containing the prompt (default: none)


to show complete usage, run with -h
Anonymous No.106142312 >>106142332 >>106142365
>>106142258
oh i forgot you need to download the file as well.
it's this file
https://huggingface.co/nisten/llama3-8b-instruct-32k-gguf/blob/main/wiki.test.raw
Anonymous No.106142330 >>106142342
>>106142077
this is the poor mindset that will end up leading you towards a life of welfare
Anonymous No.106142332 >>106142373
>>106142312
how long time should it take?
i have like 11t/s
Anonymous No.106142342 >>106142366 >>106142386
>>106142330
but im gonna very likely enroll in a good university and then get a job while in uni because those pay better and i can work with something im interested in
Anonymous No.106142365
>>106142312
>ETA 1 hour
[1]3.7721,[2]4.8459,[3]4.1327,[4]3.9085,[5]4.3769,[6]4.4260,[7]4.6263,[8]4.7943,[9]5.4195,[10]5.5582,[11]5.7446,[12]5.8990,
[13]6.3410,[14]6.2090,[15]6.2595,[16]6.2787,
how do i do a quicker test? i wanna chat with glm
Anonymous No.106142366 >>106142410
>>106142342
>university
Unless you already have a job lined up after then that is a big mistake, you should have been looking into getting into a trade like electrical or plumbing or the like, they will pay to train you and there is massive demand and good pay. With uni you are paying to likely get fucked over with something not in demand
Anonymous No.106142373 >>106142410 >>106142425
>>106142332
you can try offloading some layers onto your GPU but I don't know how much you'll be able to move onto a 3060 if that's all you have.
-ot "blk\.3\.ffn_up_exps=CUDA0, blk\.3\.ffn_gate_exps=CUDA0" -ot "blk\.4\.ffn_up_exps=CUDA0, blk\.4\.ffn_gate_exps=CUDA0"
etc. etc. you can make the above commands cleaner with regex but im too lazy
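for reference, a single pattern like this should collapse those into one flag, assuming -ot matches tensor names with plain regex the way i think it does (untested sketch):
-ot "blk\.(3|4)\.ffn_(up|gate)_exps=CUDA0"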
Anonymous No.106142386 >>106142410
>>106142342
well good luck kid, just know if you stay comfortable you'll always be poor. network with people and some rich fuck will give you a high paying job.
Anonymous No.106142410 >>106142452
>>106142366
in serbia uni is free if you have good grades and get a nice amount of points on the entrance exam
>>106142373
thanks anon but i meant like can i do only 10% of the test so its quicker
[23]6.1785,[24]6.0028,[25]5.8904,[26]5.7935,[27]5.6974,[28]5.7772,[29]5.7919,[30]5.8340,[31]5.8936,[32]5.9393,[33]6.0358,[34]6.0629,[35]6.1970,[36]6.2575,[37]6.2242,[38]6.3374,[39]6.3386,[40]6.3366,[41]6.4282,[42]6.4370,[43]6.3993,[44]6.4095,
some newer results
>>106142386
you're right, but its kind of too late to be thinking about that, school starts in a month and most jobs are probably filled with others that applied early
> if you stay comfortable you'll always be poor
very great advice, im writing it down
Anonymous No.106142425 >>106142505
>>106142373
[61]6.9160,[62]6.9912,[63]7.0670,[64]7.0982,[65]7.1008,[66]7.1183,[67]7.1213,[68]7.1303,[69]7.1982,[70]7.1933,[71]7.1833,[72]7.1691,[73]7.1749,[74]7.1992,[75]7.1914,[76]7.1142,[77]7.0454,[78]7.0050,[79]6.9811,[80]6.9604,
perplexity doesnt seem that bad for q2 so far
im stopping it now
Anonymous No.106142446
Why do anons keep on entertaining this attention seeking zoomer? He's been moaning about his age in /ldg/ for a few days already
Anonymous No.106142451
>>106141961
That just makes it more suitable.
Anonymous No.106142452 >>106142585
>>106142410
demand for linemen is always huge and you can easily make 200K+ a year in the US at least if you're willing to be on call / travel a bit
Anonymous No.106142455
>>106142077
>petra is a highschool twink
sounds about right
Anonymous No.106142503
>cudadev
>blacked miku poster
>mikutroon
>petra
>ikaridev
how does he achieve this bros..
Anonymous No.106142505
>>106142425
go coom or something and run it overnight and post your results tomorrow in that case
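or if you just want a rough number instead of the full hour-long run, llama-perplexity should accept a --chunks flag to cap how many 512-token chunks it evaluates (iirc, double check with -h), e.g. something like:
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337 --chunks 40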
Anonymous No.106142518
>how does he achieve this bros
Anonymous No.106142546 >>106142640
>>106141938
>OH NO NO NO
>NOT LIKE THIS
lmao
Anonymous No.106142581
>>106141938
>it wasn't just x, it was y
>2 times in the same response
Hopefully that's just Q2 being Q2...
Anonymous No.106142585 >>106142766
>>106142452
thanks for the advice anon, i really dont know what to say and i dont want to sound dismissive. im just stumped on where the fuck do i start, all i know is tech. first i'd need to get a visa, then tickets and so on, and for that i need money, and since its the us probably a nice sum of money, so for that i need to get a job
my brain gets fried thinking about this shit. i know im taking the easy route by just studying and hoping i can find a good job in/after uni
it is what it is
Anonymous No.106142598 >>106142617 >>106142648
use the code tags you faggots
also, pls bake
Anonymous No.106142602 >>106142611
WAIT ME
Anonymous No.106142611
>>106142602
And you are?
Anonymous No.106142617
>>106142598
8th page anon
Anonymous No.106142637
if anyone wants me to test glm with different samplers/prompts/whatever (besides perplexity ill do that overnight)
give requests
picrel is q3_k_m btw
Anonymous No.106142640
>>106142546
https://www.youtube.com/watch?v=1FZ3Xa7gEKk&list=RD1FZ3Xa7gEKk&t=12s
Anonymous No.106142648
>>106142598
fuck you get triggered i won't do what you tell me. enjoy this mess of broken formatting.
Anonymous No.106142652
>>106141440
I like this Miku
Anonymous No.106142693 >>106142727 >>106142748
>>106142062
Well, I think I figured it out, so here are some benchos.
Hardware is Ryzen 7 5700G with AMD Radeon RX6600, plz no bully.
After looking at the docs more carefully I believe that the main branch of koboldcpp supports ROCm via --usehipblas, but you need to compile it yourself, which I am too lazy to do.
$ python koboldcpp-rocm/koboldcpp.py --usecublas 1 --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
...
[00:19:20] CtxLimit:8192/8192, Amt:100/100, Init:0.09s, Process:35.89s (225.49T/s), Generate:3.59s (27.82T/s), Total:39.48s
Benchmark Completed - v1.96.2.yr0-ROCm Results:
======
Flags: NoAVX2=False Threads=7 HighPriority=False Cuda_Args=['1'] Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=False KvCache=0
Backend: koboldcpp_hipblas.so
Layers: 29
MaxCtx: 8192
GenAmount: 100
-----
ProcessingTime: 35.887s
ProcessingSpeed: 225.49T/s
GenerationTime: 3.594s
GenerationSpeed: 27.82T/s
TotalTime: 39.481s
Output: 1 1 1 1
-----

./koboldcpp-linux-x64-nocuda --usevulkan --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
...
[00:26:47] CtxLimit:8192/8192, Amt:100/100, Init:0.79s, Process:4.45s (1818.02T/s), Generate:1.18s (84.67T/s), Total:5.63s
Benchmark Completed - v1.96.2 Results:
======
Flags: NoAVX2=False Threads=7 HighPriority=False Cuda_Args=None Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=False KvCache=0
Backend: koboldcpp_vulkan.so
-----
ProcessingTime: 4.451s
ProcessingSpeed: 1818.02T/s
GenerationTime: 1.181s
GenerationSpeed: 84.67T/s
TotalTime: 5.632s
Output: 1 1 1 1
-----


Unexpectedly, vulkan actually won.
I'm hitting the post size limit, but here are also timings for --usecpu:
ProcessingTime: 45.296s
ProcessingSpeed: 178.65T/s
GenerationTime: 3.699s
GenerationSpeed: 27.03T/s
TotalTime: 48.995s
Anonymous No.106142717 >>106142772
>>106136043
they will stop talking about this piece of shit once it's implemented, it's the glm cycle of hype and disillusionment…
this thread is constantly filled with that kind of model begging -> model forgotten once they actually tried it and saw the shit for what it really was
Anonymous No.106142727 >>106142748
>>106142693
very nice, so it seems vulkan is better, i heard cuda is getting beaten by vulkan slowly too
Anonymous No.106142748 >>106142976
>>106142693
>>106142727
ah, now that I look closer at the logs,
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
false alarm, I need to tinker some more.
Anonymous No.106142755 >>106142787
>>106138428
The bullying will continue until working GLM quants are on my machine. Or maybe not because it's fun and done out of love.
Anonymous No.106142766
>>106142585
finding a job nowadays is always about who you know and not what you know. truly.
sure you need to know the basics, but getting a referral from someone who works at a high paying place is the only way now.
Anonymous No.106142772
>>106142717
It's still fun to act hyped even if something is going to be shit.
You know, kind of like how people here act like they're having sex, even though they're just role playing.
Anonymous No.106142774
>>106135910 (OP)
I'm hoping Horizon A/B are local models. Alpha is pretty damn good at translating Japanese text (With some help from instructions) and almost beats Kimi k2.

The only thing i'm very sad about is how no models are good for writing stories. When will this change? I just want to have a model write about Monster Girls taking over the world, getting a cute harpy mate and meeting her parents.
Anonymous No.106142782 >>106142889
>>106136033
she was one of the first widely used local models ever tho
Anonymous No.106142784
>>106139295
>it's not just x, it's y
>—something, second thing, and a third thing for good measure
there isn't a word in the english dictionary that can describe how much I hate this slop, and how much more I hate humans who don't notice this slop as LLM writing (I see more and more posts on hn that are very much LLM generated and you get downvoted to oblivion for hurting people's feelies if you say as much)
Anonymous No.106142787 >>106142797
>>106142755
D-do you love m-me?
Anonymous No.106142797
>>106142787
i do <3
Anonymous No.106142843
i had no idea /lmg/ could become any gayer than it already was
Anonymous No.106142846 >>106142855
Anyone have an Ubergarm card to share?
Anonymous No.106142855
>>106142846
wait me guy should get it first
Anonymous No.106142889
>>106142782
her time is over
Anonymous No.106142950
>You’d cream your pants like a dog.
yeah that famous thing that dogs do
stupid ass model
Anonymous No.106142963
now that glm is working it's time for step3
we finally have a super smart and uncensored vision model that doesn't shy away from describing porn and it's just going to fade into obscurity because it's using some fancy new attention thing
Anonymous No.106142976
>>106142748
you probably have to pretend to have a different GPU.
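on an RX6600 that usually means spoofing the gfx target, since ROCm only ships official kernels for gfx1030 and the 6600 is gfx1032. a minimal sketch, reusing your earlier invocation:
HSA_OVERRIDE_GFX_VERSION=10.3.0 python koboldcpp-rocm/koboldcpp.py --usecublas 1 --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench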
Anonymous No.106142979
>>106142968
>>106142968
>>106142968
Anonymous No.106143193
>>106142015
I feel like even if I built for this, dual kits of 64gbx4 ddr5 would get me a whopping 1.5 tokens a second, just like running r1 on ram got average people.

Maybe with offloading non-attention layers and a decent amount of vram we could squeeze it up to 2-3 tokens a second. But honestly anything less than 5 tokens a second kinda sucks. Also, I feel like a lot of the more advanced ways to tune speed like that are not available to casual users, and risking a purchase only to find out later you're not sure how to do what one asshole on reddit did with a beta from an unreliable dev is not a good thing to recommend.
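rough napkin math, assuming a consumer dual channel ddr5 board at ~90 GB/s effective bandwidth and ~20 GB of active weights read per token for an r1-sized moe at q4: 90 / 20 ≈ 4.5 t/s is already the theoretical ceiling, and real runs tend to land at a third to half of that, so 1.5-2 t/s checks out.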
Anonymous No.106143521
>>106141760
Did you not read the release notes?
https://github.com/LostRuins/koboldcpp/releases/tag/v1.96.2
>download our rolling ROCm binary here if you use Linux.
>https://koboldai.org/cpplinuxrocm
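so on linux you can probably skip compiling entirely and just grab that, e.g. (assuming the link redirects straight to a single executable, haven't verified):
wget -O koboldcpp-linux-x64-rocm https://koboldai.org/cpplinuxrocm && chmod +x ./koboldcpp-linux-x64-rocm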