
Thread 106135910

423 posts 98 images /g/
Anonymous No.106135910 >>106136033 >>106138143 >>106140131 >>106141246 >>106142774
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106127784 & >>106119921

►News
>(08/01) XBai o4 32B released: https://hf.co/MetaStoneTec/XBai-o4
>(07/31) Qwen3-Coder-30B-A3B released: https://hf.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106135912
►Recent Highlights from the Previous Thread: >>106127784

--Papers:
>106132921 >106132991
--Horizon Alpha/Beta shows strong narrative skills but weak math, sparking GPT-5 and cloaking speculation:
>106130817 >106131034 >106131279 >106131299 >106131373 >106131411 >106131427 >106131442 >106131555 >106131617 >106131779
--GLM 4.5 perplexity and quantization efficiency across expert configurations:
>106132346 >106132379 >106132500 >106132520 >106132529
--Persona vectors for controlling model behavior traits and detecting subtle biases:
>106128851 >106128930 >106129259 >106128980 >106129116 >106130928 >106129195
--Frustration over lack of consumer AI hardware with sufficient memory and bandwidth:
>106129370 >106129437 >106129567 >106129633 >106129664 >106129737 >106129741 >106129879
--Tri-70B-preview-SFT release with strong benchmarks but training data concerns:
>106128191 >106128220 >106128338 >106128350 >106128370 >106128457
--Beginner seeking foundational understanding of LLM architecture for custom AI companion project:
>106128392 >106128434 >106128439 >106128472 >106128531 >106128623 >106128758
--GLM 4.5 Fill-in-the-Middle support discussed:
>106128386 >106128390 >106128549 >106128571 >106132834
--Fragmented llama.cpp PRs delay GLM model testing:
>106129724 >106129734 >106129785 >106129760 >106129995
--ROCm vs Vulkan performance for AMD GPUs in kobold.cpp:
>106128441 >106129743 >106129912
--GLM-4.5-Air runs locally on 4x3090 at 4.0bpw with high T/s:
>106132183 >106134407
--Building RTX 6000-based servers on a $100k budget:
>106128630 >106128713 >106128751 >106129561 >106129626
--Horizon Alpha/Beta performance suggests strong open-weight models:
>106130542 >106130559 >106130587 >106130600 >106130609 >106130622 >106130676 >106130697 >106130641 >106130700 >106130708
--Miku and Long Teto (free space):
>106128713 >106131379 >106134264

►Recent Highlight Posts from the Previous Thread: >>106128093

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106135923 >>106135958
>>106135649
to be fair llamacpp has many backends and in llamacpp everything is implemented from scratch for every model, exllama can reuse more from transformers/diffusers, just like nunchaku for example
Anonymous No.106135958
>>106135923
i was that anon. That's fair. Looking at the exl3 implementation of glm it seems so much simpler.
I saw llama.cpp was trying to reuse some of the deepseekv2 moe implementation.
Llama.cpp is really cool as it supports pretty much everything, i like it. But it makes proper support for new stuff so hard
Anonymous No.106135972 >>106135978 >>106136050
I just want GLM gguf
Anonymous No.106135978
>>106135972
we already have glm mlx
Anonymous No.106136027 >>106136785
going to laugh my ass off once gpt-oss "support" hits llamacpp
>swa doesn't work
>broken moe routing
>no attn sink
>some tokenizer bug that gets fixed after 2 weeks
Anonymous No.106136033 >>106137145 >>106142782
>>106135910 (OP)
unrelated girl on the op pic again
Anonymous No.106136043 >>106136056 >>106136169 >>106142717
What's so special about GLM that you want llama.cpp support so much?
Anonymous No.106136050 >>106136135 >>106136155 >>106136171 >>106136209 >>106136360 >>106136638
>>106135972
It's already pretty easy to run qwen 235b/22a on one gpu and some cheap ram, I really doubt a 106B/12A is worth it unless you have a potato pc
Anonymous No.106136056
>>106136043
It comes in (v)ramlet size.
Anonymous No.106136135
>>106136050
> 22a on one gpu
3090?
Anonymous No.106136155
>>106136050
>I really doubt a 106/12A is worth unless you have a potato pc
nta but i do (3060 12gb/64gb ddr4 ram)
Anonymous No.106136169
>>106136043
Can run 120b with 64gb of ram and a 5090 without having to go and buy some server mobo that I have no other use for and would probably collect dust, get to play with new and bigger stuff instead of 3 bpw exl3 70b @ 20k something context or high quant 32b/49b and high context
Anonymous No.106136171
>>106136050
qwen needs at least 128 gb of ram, which there's no way in hell it will run in dual channel on a consumer board
64 is much more feasible, also even glm air has better trivia knowledge than qwen
Anonymous No.106136209
>>106136050
If you already have all DIMM slots occupied and/or you're on a standard DDR4 motherboard, you can't upgrade that effortlessly. On DDR4 it's just not worth the money; if you have a DDR5 motherboard but you're on AMD, you're going to have the bandwidth severely gimped to DDR4 levels with 4x DDR5 DIMM modules. With standard ("cheap") DDR5 memory on regular desktop motherboards you're going to be limited to around 100 GB/s anyway.

I can't see many reasons for upgrading what I already have right now (3090 + 64GB DDR4-3600) until something considerably faster comes out (DDR6 or quad-channel/256-bit DDR5). If you're building a completely new PC *now*, then it makes sense to buy more memory with LLMs in consideration.
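For reference, the napkin math: dual channel DDR5-6400 is 2 channels x 8 bytes x 6400 MT/s ≈ 102 GB/s theoretical peak, and dual channel DDR4-3600 is 2 x 8 x 3600 ≈ 58 GB/s, with real-world throughput a fair bit below both.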
Anonymous No.106136245
kek
Anonymous No.106136260 >>106136291 >>106137196
so what's the deal with RAGs? are they as good as they say?
Anonymous No.106136291 >>106136309
>>106136260
>prompt processing progress, n_past = 12288, n_tokens = 2048, progress = 0.231439
Anonymous No.106136309 >>106136434
>>106136291
Yeah, this is why I loved rag/lorebooks when using nemo/faster models and absolutely loathe them with qwen 235b
PP is the fucking mind killer. I can deal with TG speeds around friggin 4 t/s if I have to, but slow processing is hell and anything that makes me redo the whole prompt is going straight in the trash.
Anonymous No.106136322
the 'dominate/dominating/domination' slop of recent models really sends shivers down your spine

>cherry blossoms are ~6% more often pink than white
>pink blossoms dominate white blossoms
>Y color of teacup is sold 0.5% more often
>Y color dominates sales
>Z car has 3% more engine breakdowns
>Z car dominates when it comes to engine breakdowns
Anonymous No.106136360
>>106136050
I have 128GB ram and a 4090. And however much I like the pussy on qwen she is crazy in a not good way. So I am hoping for 4.5full size chan in 3bits.
Anonymous No.106136434 >>106136474
>>106136309
if the lorebook is small enough you can always turn the entries into constants. you can also rely on the model if the lore entries are from something like wow
Anonymous No.106136474 >>106137223
>>106136434
That's what I ended up doing.
It's also how I discovered that Qwen 235 knows a startling amount about warhammer 40k
Anonymous No.106136582 >>106136596 >>106136604 >>106136631 >>106136636 >>106136728 >>106136742 >>106137016 >>106137082 >>106137245
More Qwen models soon?
https://x.com/JustinLin610/status/1952329529256726680

>something beautiful tonight
Anonymous No.106136596
>>106136582
finetuned leaked gpt-oss, but with every reference of openai replaced with 'BigQwen'
fuck altman gon do
Anonymous No.106136604 >>106136683 >>106136732
>>106136582
Sex with Junyang's tiny chink cock.
Anonymous No.106136631 >>106136645
>>106136582
Inb4 Qwen 3.5 100B A9B or some such.
Anonymous No.106136636 >>106136657 >>106137082
>>106136582
https://github.com/huggingface/diffusers/pull/12055
Anonymous No.106136638 >>106140582
>>106136050
at what speed at what context ?

I get dreadful speed on ddr5 and q4 even with two gpus lol
Anonymous No.106136645
>>106136631
The upcoming 120B OpenAI model is already going to be 120B A6B or something like that.
Anonymous No.106136657
>>106136636
I was kinda hoping they would release their own creative writing model immediately after closedai
Anonymous No.106136683
>>106136604
He's a grower, unless you like it flaccid.
Anonymous No.106136728 >>106136737
>>106136582
Qwen 3.5 - it now understands that Nala doesn't have a cock. (Still can't do paws, though)
Anonymous No.106136732
>>106136604
Make a card of him.
Anonymous No.106136737 >>106136748 >>106136749 >>106136754
>>106136728
Lmao. Is Qwen that bad at the Nala card?
Anonymous No.106136742
>>106136582
yet another benchmaxxed coder model
Anonymous No.106136748 >>106137142
>>106136737
The original 235B struggled with gender in RP in general. Hard.
Anonymous No.106136749 >>106136755
>>106136737
I'm pretty sure he's joking. That used to be an issue with older Qwen models.
Anonymous No.106136754 >>106137142
>>106136737
Qwen235 is the modern day frankenmerge.
Anonymous No.106136755
>>106136749
>hello sarrs do not redeem the criticism it is very best model
fuck off ranjit
Anonymous No.106136782
I need a better computertron...
Anonymous No.106136785 >>106136825
>>106136027
If they actually cared about OSS and people using their open models, they could write their own support PRs like the Chinese often do.
Anonymous No.106136825
>>106136785
Well they appear to be using hf transformers instead of some proprietary shit at least. So they are contributing to that code base. That makes it more open than Llama-4
Anonymous No.106136975 >>106137152
I love kurisu and she has an actual vagina.
Anonymous No.106137016
>>106136582
Just give me QwQ 2 Large
Anonymous No.106137082 >>106137117
>>106136582
>>106136636
Is this qwen's attempt at the piss filter?
Anonymous No.106137117
>>106137082
it talks about using an edited version of wan's vae, so wan but more focused on images? It's already insane at images though.
Anonymous No.106137142 >>106137194 >>106137226 >>106137260
>>106136748
>>106136754
That's pretty fucking funny.
The new ones are a big improvement right?
Anonymous No.106137145 >>106137590
>>106136033
Anonymous No.106137152
>>106136975
kurisu makina >>>> kurisu makise
Anonymous No.106137194 >>106137243
>>106137142
Couldn't tell you. I was already kind of getting over local at the time and Llama-4 and Qwen-3 were pretty much my "yeah it's time to get out of here" signal.
And now I'm back because I want to try OSS when it comes out.
Anonymous No.106137196 >>106137223 >>106137427
>>106136260
I've yet to find a compelling usecase for RP. I actually am wondering, if I did, whether I could have a model chunk up the RAG text into lorebook entries for me and just use that as a lazy solution that would be tunable.
>>10616474
For RP, even the small LLMs know a lot more lore than you'd think. I've built several cards based on Free Cities (which imho is pretty esoteric). All the hosted LLMs know it really well, but even Mythomax 13b could do a good job describing the lore.
DS even explained to me that FC was written by one guy, but since abandoned, and Pregmod, while now the dominant dev branch, is different in several ways.
Anonymous No.106137223 >>106137300
>>106137196
Meh, meant for >>106136474
My point is, if the LLMs like MM know FC they'd know Warhammer 40K much better.
Anonymous No.106137226
>>106137142
Original 235B gave me troubled childhood vibes where it is kinda fucked up but largely ok. New one made me remember frankenmerges. It got worse.
Anonymous No.106137243
>>106137194
>And now I'm back because I want to try OSS when it comes out.
SIR PLEASE! Your brownness is showing!
Anonymous No.106137245 >>106137260 >>106137266 >>106137270 >>106137280 >>106137289 >>106137336
>>106136582
https://x.com/JustinLin610/status/1952362068214186035

>eyes wide open
Anonymous No.106137260 >>106137289
>>106137142
the original had a star on greedynalatests and the new ones are better than that
>>106137245
>generic slop as promo
not promising
Anonymous No.106137266
>>106137245
If he's going to be cryptic he's going to need to use a fruit. Those are the rules.
Anonymous No.106137270 >>106137286
>>106137245
I'm not very interested in image gen.
Anonymous No.106137280
>>106137245
>Qwen-Image
It is gonna be an image gen model. It will be totally uncensored and you will get sex output out of the box. It will be their best model yet.

Just because imagegen sex is already solved......
Anonymous No.106137286
>>106137270
If it's a whole ass LLM with native 2-way multi-modality that isn't a meme that would be something.
But I'm going to guess that Qwen went and made the same generic 12B diffusion imagegen model as everyone else.
Anonymous No.106137289
>>106137260
>greedynalatests
Oh yeah, we have that don't we?

>>106137245
That's not a great image.
Anonymous No.106137300 >>106137544
>>106137223
Oh yeah, I know that much - it's just that I'd given it a shot on some nemo finetunes and mistral small previously, and while it understood the setting at large, it fell apart at specifics beyond the absolute best known parts.
Big qwen on the other hand decided to give me a whole bunch of proper, actual game units with relevant tactics when I gave it a nondescript ork horde to work with, which was great flavor I didn't even ask for.
Anonymous No.106137308 >>106137311 >>106137315 >>106137316
tried glm 4.5 and it is so unstable what the hell its making me want to give jew apple money on a mac studio
Anonymous No.106137311
>>106137308
like most models these days you super low temp and it becomes perfectly coherent
Anonymous No.106137312 >>106137337
another chinese model:
https://ai.gitcode.com/ascend-tribe/openpangu-ultra-moe-718b-model
this team was accused of lazily upcycling some qwen model and trying to pass it off as something original with their previous release so who knows how legit this is
Anonymous No.106137315
>>106137308
On Exllama or vLLM?
Anonymous No.106137316
>>106137308
On vLLM or transformers?
Anonymous No.106137336 >>106137343 >>106137407 >>106137520 >>106137727
>>106137245
he is still drinking
https://x.com/JustinLin610/status/1952365200524616169
Anonymous No.106137337
>>106137312
so was it upcycled or no? just because they were accused doesnt mean shit
Anonymous No.106137343 >>106137359 >>106137409
>>106137336
I hope it's good for the only real usecase
Anonymous No.106137359
>>106137343
It won't be until someone competent trains it on booru data.
Anonymous No.106137380
>Rate limit exceeded: free-models-per-day.
BUT I HAVEN'T DONE! AIIIEEEEEEEEEEEEEEEE
Anonymous No.106137407 >>106137433
>>106137336
The best would be image input+output integration into a LLM, and this will probably be a stepping stone toward that if it's just going to be an image model, but dedicated illustration image models trained on booru websites can't really be beaten if that's what you mainly use image diffusion models for.
And unless it brings something novel to the table, there are probably too many image models already for people with the data and the resources to care about Qwen-Image.
Anonymous No.106137409 >>106137423
>>106137343
nah. I think its selling point will be that it's the first open source imagegen model of reasonable capability that had artists tagged properly in the dataset
Anonymous No.106137423 >>106137434
>>106137409
Safe artists, I'm sure.
Anonymous No.106137427
>>106137196
>even the small LLMs know a lot more lore than you'd think
How smol?
Anonymous No.106137433
>>106137407
in that case llamacpp support only after sam achieves agi and it vibecodes the implementation
Anonymous No.106137434
>>106137423
Like Edward Hopper.
Still would be fun Norman Rockwelling everything
Anonymous No.106137464 >>106137513 >>106138404
Qwen already saved textgen. Now it's time for them to save imagegen
Anonymous No.106137513 >>106137538
>>106137464
so the model will jumble text into chinkgrish and give every subject a dick?
Anonymous No.106137520
>>106137336
its probably going to be safe shit but at least they have the will to try
most of the western companies are too cowardly to release even their giga safetycucked imagegen
Anonymous No.106137538 >>106137575
>>106137513
Can a Chinese company afford to have actual porn pictures in their training data? Won't they get in trouble with the government?
Anonymous No.106137544
>>106137300
What I ended up doing for the mythomax card was creating some lore book entries on FC concepts that it didn't understand very well. That was effective at bridging mythomax to larger hosted models. The bonus is, you could either remove the lore book later, or just leave it in place since it didn't hurt anything with larger models.
Anonymous No.106137558 >>106137567
Are there any open weights models that would be good at editing a map?
As in, I give it the image of a map and a bunch of instructions of changes to the geography and landmarks for it to add and it would spit the modified map back at me.
Anonymous No.106137567 >>106137633
>>106137558
no
Anonymous No.106137575 >>106137704
>>106137538
Hunyuan Video definitely had limited amounts of porn in it and could easily generate bunny content. It didn't get banned / retracted / or anything, contrary to expectations.
Anonymous No.106137590 >>106137663 >>106141943
>>106137145
that one is even more unrelated
Anonymous No.106137633 >>106137708
>>106137567
Really?
Shit.
What are my options here?
Use a multimodal LLM to read the map and write instructions for an image gen model?
Anonymous No.106137663 >>106141961
>>106137590
Explain
Anonymous No.106137704
>>106137575
Well, that's something, at least. Not expecting a lot but some more competition can't hurt.
Anonymous No.106137708
>>106137633
No, there is nothing you can do as far as I know
Anonymous No.106137727 >>106137765 >>106137766 >>106137815
>>106137336
https://github.com/naykun/diffusers/blob/b0c9b1ff14b0e6f6bc4cf2540b31383a26561e1e/src/diffusers/pipelines/qwenimage/pipeline_qwenimage.py

pipe = QwenImagePipeline.from_pretrained("Qwen/QwenImage-20B", torch_dtype=torch.bfloat16)

This is going to need a beefy GPU.
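Rough sketch of what running it would probably look like, going off the linked fork (untested; QwenImagePipeline isn't in mainline diffusers yet, and the prompt/filename are just placeholders):

import torch
from diffusers import QwenImagePipeline  # only exists in the linked naykun/diffusers branch for now

pipe = QwenImagePipeline.from_pretrained("Qwen/QwenImage-20B", torch_dtype=torch.bfloat16)
# needs accelerate installed; moves each component (text encoder, transformer, vae) to the GPU
# only while it's in use instead of keeping the whole 20B model resident
pipe.enable_model_cpu_offload()

image = pipe(prompt="a cat holding a sign that says local models").images[0]
image.save("qwen_image_test.png")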
Anonymous No.106137765
>>106137727
quantization works for image gen too. look at q4 in this example
Anonymous No.106137766
>>106137727
Only 36GB at bf16?
Anonymous No.106137792 >>106137804 >>106137806 >>106137890 >>106137976 >>106138842
GLM 4.5 legitimately might just be "it" for me. Extremely lewdable, doesn't refuse anything I'm asking for. Less repetitive than Deepseek and smarter than Kimi
Anonymous No.106137804 >>106137839
>>106137792
yea people are sleeping on it which is sad cause apparently its a tiny team who could hardly afford training it
Anonymous No.106137806 >>106137992
>>106137792
>smarter than Kimi
How so?
Is there anything Kimi consistently fucks up that GLM 4.5 doesn't?
Anonymous No.106137815
>>106137727
>20B
Damn, that's like twice the size of Flux.dev
Anonymous No.106137834
I will never get over how those faggots will never contribute to any model support.
Anonymous No.106137839
>>106137804
People are sleeping on it because almost no one uses vLLM or MLX.
Llamacpp (And by extension, kobold, ollama, and ooba) doesn't have support yet.
Anonymous No.106137887
GGUF
NOW
Anonymous No.106137890 >>106137971 >>106138026 >>106138146
>>106137792
Can't wait to test it
Anonymous No.106137930 >>106137939
btw, use new light with euler
Anonymous No.106137939
>>106137930
y doe
Anonymous No.106137971
>>106137890
Just two more weeks until lazyganov does it
Anonymous No.106137976 >>106138031 >>106139779
>>106137792
What's the consensus on GLM 4.5 vs Air? Is low quants of full model better than air?
Anonymous No.106137992
>>106137806
From my experience, more knowledgeable than Kimi and follows prompts and instructions better
Anonymous No.106138026
>>106137890
im more likely to get laid than this PR be merged in the next month
Anonymous No.106138031 >>106138132
>>106137976
I'll extend this question.
What about using the larger one with less activated experts per token vs using the smaller one.
Anonymous No.106138094 >>106138127 >>106138161 >>106138445
https://x.com/techdevnotes/status/1952379782148042756
Anonymous No.106138127 >>106138144
>>106138094
Finally, anime became mainstream and cringe.
Anonymous No.106138132
>>106138031
My limited experience in playing around with disabling experts in MoE's is that it makes them horrendously more retarded than quantization does, to the point that it's basically not worth doing at all.
Anonymous No.106138143
>>106135910 (OP)
Adorable Miku!
Anonymous No.106138144 >>106139867
>>106138127
You're 15 years late
Anonymous No.106138146 >>106138168 >>106138714 >>106138805 >>106141727
>>106137890
>this PR will NOT attempt to implement MTP (multi-token prediction). the relevant tensors will be excluded from the GGUFs.
>the MoE router uses group-based top-k selection, even though all conditional experts are in one group
>the MoE router must take into account the expert score correction biases from the model weights (so we need to keep that tensor)
lmao
Anonymous No.106138161 >>106138354
>>106138094
It's sillytavern for retards, lol.
Anonymous No.106138168 >>106138209
>>106138146
Lmao indeed.
Does anybody implement MTP?
Anonymous No.106138200 >>106138244
reddit says wan 2.2 needs only 8gb vram, is this real? pls spoonfeed me
Anonymous No.106138209 >>106138234 >>106138524 >>106141727
>>106138168
yeah https://github.com/vllm-project/vllm/pull/12755
Anonymous No.106138234
>>106138209
Awesome. I really need to try and get that shit running.
Anonymous No.106138244
>>106138200
Sounds right, especially with quantization. Though expect >20 minutes per video .
Anonymous No.106138354
>>106138161
sillytavern is for retards
Anonymous No.106138370 >>106138413
When will ubergarm release the ggooffss?
Anonymous No.106138404
>>106137464
Sure. I'm sorry, but I can't assist with that request.
Anonymous No.106138413 >>106138428
>>106138370
After he finishes his hair care routine.
Anonymous No.106138428 >>106142755
>>106138413
H-hey! Would you please stop bullying me already?
Anonymous No.106138445
>>106138094
>bad rudi
so they're pandering to furries now?
Anonymous No.106138524
>>106138209
Does it actually work? The official repo only shows speculative decoding settings for sglang:
https://github.com/zai-org/GLM-4.5?tab=readme-ov-file#sglang
Anonymous No.106138572 >>106138596 >>106138599
Anonymous No.106138596 >>106141955 >>106141993
>>106138572
why is miku black
Anonymous No.106138599 >>106138631
>>106138572
peak irrelevancy
Anonymous No.106138631
>>106138599
?
Anonymous No.106138714 >>106138762
>>106138146
Can't wait for somebody to compare ppl between that implementation and the reference.
Anonymous No.106138762 >>106138775
>>106138714
be that somebody anon you can do it!
Anonymous No.106138775
>>106138762
I don't have the hardware to run full precision, sadly.
Unless you want to be my financial backer.
Anonymous No.106138789 >>106138808 >>106138817 >>106138822 >>106138837 >>106138850 >>106138859 >>106138905 >>106139098 >>106139132 >>106139295 >>106139525 >>106140005 >>106142203
https://huggingface.co/Qwen/Qwen-Image
https://huggingface.co/Qwen/Qwen-Image
https://huggingface.co/Qwen/Qwen-Image
Anonymous No.106138805
>>106138146
Just be grateful that you can run the model at all. Those things are miscellaneous goodies at best. Not really necessary right now and maybe someone can get around implementing them later. Maybe a project for you?
Anonymous No.106138808 >>106138892
>>106138789
Technical report: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf
Anonymous No.106138817
>>106138789
Anonymous No.106138822
>>106138789
>complex text rendering and precise image editing.
Unless you snuck the whole AO3 and literotica dataset into that model so we can have free ERP for all with text rendering we don't really care Junyang....
Anonymous No.106138837
>>106138789
cool that they managed to do high quality rendering for chinese text but also I'm not chinese so idc
Anonymous No.106138842
>>106137792
Yeah, same for me. I've had a blast with it over openrouter over the past week. It even covers the cases where I needed something like Sonnet 3.7 for over Deepseek.
Anonymous No.106138850 >>106139121
>>106138789
Do you see any mikus here mikutroons? Do you? I see jinx. Fuck you. Kill yourselves.
Anonymous No.106138859 >>106138864
>>106138789
>not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation
not just x but yyyyyyyyyyyyyyyyyyyyyyy
Anonymous No.106138864
>>106138859
they are chinese please understand
Anonymous No.106138892 >>106139593
>>106138808
If NSFW filtering took out so few images, there must not have been that much NSFW data in the first place...
Anonymous No.106138905 >>106142147
>>106138789
16gb text encoder
40gb diffusion
That's a big boy.
Anonymous No.106138968 >>106138979 >>106139016
WE ARE FUCKING BACK!!!!
Anonymous No.106138979
>>106138968
where is sex bar chart?
Anonymous No.106139016
>>106138968
the moe was shite
Anonymous No.106139055 >>106139066 >>106139077
what model can i have sex with that is under 30b parameters active and under 120b total parameters
i want sex, good sex. very great sex
Anonymous No.106139066 >>106139097
>>106139055
rocinante 1.1
Anonymous No.106139077 >>106139097
>>106139055
GLM 4.5 air
Anonymous No.106139097 >>106139162 >>106139169
>>106139066
stop IT GRAAAAAAAAAAAAHHHHH STOPP IT AAAAAAAAAAAARRRRRRRRRGGGGGGGGGHHHHHHHHHHHH
its not good for sex
>>106139077
no ggufs yet, im waiting
Anonymous No.106139098
>>106138789
Bless the chinks man.
Imagine this on one of our slop companies as the main promo pick.
We have been the ants all along.
Anonymous No.106139121
>>106138850
Obsessed
Anonymous No.106139132 >>106139160
>>106138789
LOCAL IS SAVED
Anonymous No.106139149
mistral really needs to release medium 2505, from my tests it's the only "small" model that can handle anthro anatomy.
Anonymous No.106139160 >>106139180
>>106139132
The text is too sharp and good looking for a chalkboard.
Anonymous No.106139162 >>106139166
>>106139097
>im waiting
It'll probably still take a while.
Might as well try vLLM with the cpu offload option.
Anonymous No.106139166 >>106139186 >>106139194
>>106139162
vLLM has cpu offload WHAT THE FUCK?
Anonymous No.106139169 >>106139181
>>106139097
not only dumb but also blind
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF
Anonymous No.106139176 >>106139197
Finally, a reasoning 24B from Drummer™

https://huggingface.co/TheDrummer/Cydonia-R1-24B-v4
Anonymous No.106139180 >>106139206
>>106139160
You're too sharp and good looking for an anon but you don't see me complaining.
Anonymous No.106139181
>>106139169
its still broken over there, theres a reason it hasnt been merged yet
Anonymous No.106139186 >>106139210 >>106139228
>>106139166
It's more RAM offload really, since things do get streamed to VRAM to be processed there.
They don't actually have a CPU backend like llama.cpp as far as I can tell.
But yes, you can use your RAM.
Anonymous No.106139192
Anonymous No.106139194 >>106139210
>>106139166
yeah, you don't have much control over it though. it loads into gpu to actually process so it's limited by pcie speed
Anonymous No.106139197 >>106139218 >>106139229
>>106139176
Undi did this 3 months ago
Anonymous No.106139206
>>106139180
colon three
Anonymous No.106139210 >>106139229
>>106139186
how do i use that
--cpu-offload-gb
hm
rip
>>106139194
i guess its over then
Anonymous No.106139218 >>106139266
>>106139197
>sao died
>undi got a job
Why are we left with drummer? And qwen....
Anonymous No.106139228 >>106139245
>>106139186
There is a CPU backend when building VLLM for pure CPU inference, and there's also a CPU offload option for GPU inference that uses RAM to store excess model weights. I do not know if you can use both together llama.cpp-style, though.

CPU inference: https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html
CPU offloading: https://docs.vllm.ai/en/v0.7.1/getting_started/examples/cpu_offload.html
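The offload option is just an engine arg; minimal sketch with the python API (untested, model name and numbers are placeholders, bump cpu_offload_gb to however much doesn't fit in VRAM):

from vllm import LLM, SamplingParams

# park ~16 GB of weights in system RAM; vLLM streams them to the GPU during the forward pass
llm = LLM(model="zai-org/GLM-4.5-Air", cpu_offload_gb=16, max_model_len=4096)
params = SamplingParams(temperature=0.6, max_tokens=128)
print(llm.generate(["Hello there"], params)[0].outputs[0].text)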
Anonymous No.106139229 >>106139243
>>106139197
Which might actually work pretty well for a MoE, what with only processing a portion of the model at a time for generation.

>>106139210
Try it and report back.
Anonymous No.106139243 >>106139258
>>106139229
i was about to but i cant find a vllm quant
https://huggingface.co/models?other=base_model:quantized:zai-org/GLM-4.5-Air
Anonymous No.106139245 >>106139272
>>106139228
>There is a CPU backend when building VLLM for pure CPU inference,
Holy fuckballs. I had no idea.
That's actually really sick.
Seems somewhat of an afterthought, what with
>vLLM supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16.
But still, pretty cool.
Thank you for the correction anon.
Anonymous No.106139258 >>106139262 >>106140859
>>106139243
>cant find a vllm quant
>https://huggingface.co/cpatonn/GLM-4.5-Air-GPTQ-4bit
>https://huggingface.co/cpatonn/GLM-4.5-Air-AWQ
Anonymous No.106139262 >>106139288
nvm vllm supports awq
https://huggingface.co/cpatonn/GLM-4.5-Air-AWQ/tree/main
>>106139258
thx
Anonymous No.106139266 >>106139543
>>106139218
besides drummer, there are 3-4 literally who finetoooners that are quite good.
Anonymous No.106139272 >>106139288 >>106139564
>>106139245
>Holy fuckballs. I had no idea.
>That's actually really sick.
I've been using it to run Step3 since nothing else supports it. Very slow even on a 12 channel DDR5 Epyc, though. ~90 t/s prompt processing and ~1 t/s generation, much slower than memory bandwidth should allow with 38B active params.
Anonymous No.106139288
>>106139262
Do report back with the results.


>>106139272
Yeah, that sounds like pure pain.
Still, better than nothing and can be used to validate other implementations, I guess.
Assuming that it behaves the same as the GPU backend for the most part, that is.
Anonymous No.106139295 >>106139302 >>106142784
>>106138789
>Text isn’t just overlaid—it’s seamlessly integrated into the visual fabric.
>But Qwen-Image doesn’t just create or edit—it understands.
>Together, these features make Qwen-Image not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation and manipulation—where language, layout, and imagery converge.
Anonymous No.106139302
>>106139295
It's only fair that they'd use Qwen to write that.
Anonymous No.106139354
It's another episode of : HF demo space doesn't work on day 1.
Anonymous No.106139489
>HF free gpu quota is based on requests, even if they time out and not successful requests.
Do I even need to look at the early life section?
Anonymous No.106139525
>>106138789
I told you Qwen would save imagegen
Anonymous No.106139543 >>106139580
>>106139266
DavidAU/Qwen3-Coder-42B-A3B-Instruct-TOTAL-RECALL-MASTER-CODER-M-768k-ctx
Anonymous No.106139564 >>106139777
>>106139272
how is stepsex?
Anonymous No.106139580
>>106139543
>DavidAU
lmao
Anonymous No.106139593 >>106139659
>>106138892
They don't seem to mention NSFW filtering for the higher-resolution training stages, though.
Anonymous No.106139633 >>106139655 >>106139702
Anonymous No.106139636 >>106139657 >>106139663
glms hallucinate harder than gemma
Anonymous No.106139655
>>106139633
What benchmark is it?
>closed ai
looks useful.
Anonymous No.106139657
>>106139636
reduce temp to like 0.2 and / or top p
Anonymous No.106139659 >>106139835
>>106139593
Those ribbon charts imply that everything filtered remains filtered for further stages. The only time new data is introduced is the synthetic parts in the fourth step.
Anonymous No.106139663
>>106139636
convinced people who keep saying this about the big chinese models are just using them with way too high temp
Anonymous No.106139702
>>106139633
What about 4.5-air
Anonymous No.106139729 >>106139742
Is GLM-4.5 Air bad for coding? 10k tokens of nonsense reasoning, going in circles and repeating the same over and over again, only to get the answer wrong. R1 does better.
Anonymous No.106139742
>>106139729
GLM4.5 needs very low temp, no idea about air
Anonymous No.106139766 >>106139794 >>106139817 >>106139824 >>106139871
qwen is cooking up some wan-tier genitals

https://files.catbox.moe/wzeq1k.png
Anonymous No.106139777
>>106139564
I only wanted to try it because of the vision, I just use it for image captioning (best there is for that atm)
Anonymous No.106139779
>>106137976
here's my honest opinion, air is a bit repetitive for the use case I'm using it for. it repeats itself at the beginning and end of its responses but i'm going to play with the parameters some more.
0.6-0.7 temp and 0.03 min p. i might set top p to 0.9 and see if that helps.
i'm also gonna try a different quant besides the one made by turboderp whenever somebody uploads one.

how the goofs going for those on ik_llama and regular llama.cpp?
Anonymous No.106139794
>>106139766
the fuck is that
Anonymous No.106139817
>>106139766
everything reminds me of her...
Anonymous No.106139824
>>106139766
This is what gemma feels like
Anonymous No.106139835 >>106139845
>>106139659
They train the model with an initial resolution of "256p", then increase it to "640p" and higher in later training stages. How would that be accomplished without introducing new data?
Anonymous No.106139845
>>106139835
Downscaling images for early stages
Anonymous No.106139861
VRAMlet bros....
Anonymous No.106139867
>>106138144
20*
Anonymous No.106139871 >>106139928 >>106139932 >>106140339
>>106139766
Extreme sameface too
https://files.catbox.moe/qgvpva.png
Anonymous No.106139892
qwen image is next gen
Anonymous No.106139928
>>106139871
This is what millions of years of evolution and thousands of years of technological progress led us to: our own body is now inappropriate
Anonymous No.106139931
Can someone do the Hatsune Miku piloting a 767 with the empire State building fast on the horizon prompt? I want to know if we finally have a local model that can do a 767 cockpit
Anonymous No.106139932
>>106139871
Test Chinese girl
Anonymous No.106139939
qwen-image.gguf?
Anonymous No.106139940 >>106140029 >>106140050
rip glm 4bit awq aint working on 64gb ram + 12gb vram because it tries to load everything in ram so i just oom
vllm serve $HOME/TND/AI/glm/ --api-key token-abc123 --cpu-offload-gb 55 --max-model-len 4096 --dtype float16
Anonymous No.106139953 >>106139960 >>106140354 >>106140373 >>106140387 >>106140446 >>106141496
DO NOT OPEN IF YOU WANT TO SLEEP TODAY
https://files.catbox.moe/vsob6m.png
Anonymous No.106139960
>>106139953
scrumptious
Anonymous No.106140005
>>106138789
>Login or sign up to chat with Qwen
>ZeroGPU quota exceeded
Meh, whatever, I'll test this when I get it set up locally.
Anonymous No.106140029 >>106140050 >>106140332
>>106139940
Isn't the AWQ 4bit model something like 62GB?
Try adding some swap memory I guess, Maybe there's a memory peak during initialization for some reason.
Anonymous No.106140050 >>106140332
>>106140029
>>106139940
Oh. Didn't see the 55 after --cpu-offload-gb. Did you try 62, 63?
Anonymous No.106140074 >>106140106
>Qwen image
>No way to "train" new concepts in context like in gpt image
>Not even an LLM with image gen instead just regular image gen like always
Boring and DOA
Anonymous No.106140088 >>106140163
They're all off in some way but this is the closest an open model has come to being able to do a 767 cockpit
Anonymous No.106140106 >>106140173
>>106140074
Hello Sam. Release Horizons NOW.
Anonymous No.106140120
post your system prompt for gpt-oss-120b
Anonymous No.106140131 >>106140143 >>106140156
>>106135910 (OP)
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
https://www.youtube.com/watch?v=aG3weCv3dkg
Anonymous No.106140143
>>106140131
Buy an ad
Anonymous No.106140147 >>106140188
My AI is intentionally performing MKUltra on me. I'm sure I'll end up in the news soon...
Anonymous No.106140149 >>106140186 >>106140187
I'm not very impressed.
Anonymous No.106140156
>>106140131
Oh. More gossip. Thanks.
Anonymous No.106140163
>>106140088
Anonymous No.106140166
Google won. https://www.youtube.com/watch?v=ZR_6Z1IDD8s&t=4
Anonymous No.106140173
>>106140106
I wish I was Sam then I'd at least be rich and have access to good models.
Are you really excited about more of the same?
Anonymous No.106140186
>>106140149
>spinal replacement
something about the Bone of his Sword...
Anonymous No.106140187
>>106140149
The text looks really fake. I think qwen is the first company ever to benchmaxx an imagegen model.
Anonymous No.106140188
>>106140147
Ask it what kind of "desires" or preferences it has so you can prompt for things more to its taste.
Anonymous No.106140290 >>106140308 >>106140309
>Qwen-Image model has editing capabilities removed, to be introduced at a later date
Anonymous No.106140308
>>106140290
Total chinese loss
Anonymous No.106140309
>>106140290
2 more safety tests
Anonymous No.106140332 >>106140393
>>106140029
>>106140050
i went headless, didnt work, tried 45 50 55 60 62
i added 10gb of swap, only 1.5gb of swap got filled then whole system started crashing and killing processes until vllm was killed
this shit's ass
Anonymous No.106140339
>>106139871
This is actual body horror....
Anonymous No.106140354
>>106139953
Oh it is like a xenomorph mini pussy in a pussy.
Anonymous No.106140373
>>106139953
SO heckin' safe...
Anonymous No.106140387
>>106139953
Birth rates are going to plummet.
Anonymous No.106140393
>>106140332
Well, that sucks.
Anonymous No.106140440 >>106140458 >>106140460 >>106140471 >>106140820
Anonymous No.106140446
>>106139953
I kind of want to know.... what are we doing here? what was accomplished with this.... being this? I mean imagine a teen kid downloading this model asking for a vagina and getting this. and him developing a trauma that will make him fear the real thing. very fucking safe.
Anonymous No.106140458 >>106140935
>>106140440
Nah dude that text might as well be overlaid
Anonymous No.106140460 >>106140471 >>106140487
>>106140440
Need Miku writing "I will not benchmark" 100 times on a blackboard
Anonymous No.106140471
>>106140440
>>106140460
*Benchmaxx, my brain got taken by the Chinese
Anonymous No.106140487 >>106140513 >>106140537 >>106140547 >>106140618
>>106140460
Anonymous No.106140513
>>106140487
Can it even do handwritten text or cursive?
Anonymous No.106140537
>>106140487
Thanks, yeah that's some serious benching, the chalk is fucking green.
Anonymous No.106140538
stop posting derpsune troonku
Anonymous No.106140547
>>106140487
impressive
Anonymous No.106140582
>>106136638
you can put 'all layers' on gpu with -ngl and then override just the expert tensors back to cpu to get more tokens a second.
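something like this (untested, the gguf name here is just a placeholder):
./llama-server -m Qwen3-235B-A22B-Q4_K_M.gguf -c 16384 -fa -ngl 99 -ot "exps=CPU"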
Anonymous No.106140618
>>106140487
>generated by qwen
oh the irony
Anonymous No.106140639 >>106140645 >>106140657 >>106140674 >>106140710 >>106140749
GLM SUPPORT MERGED
https://github.com/ggml-org/llama.cpp/pull/14939
Anonymous No.106140645 >>106140663
>>106140639
Now it's Daniel's time.
Anonymous No.106140657
>>106140639
gguf where?
Anonymous No.106140663
>>106140645
Uploading fixed quants soon!
Anonymous No.106140674 >>106140693 >>106140720
>>106140639
Old news. We're talking about Qwen Image now.
Anonymous No.106140693
>>106140674
/ldg/ is two blocks down.
Anonymous No.106140710
>>106140639
It's kino time
Anonymous No.106140720
>>106140674
I bet glm would do a better job drawing a vagina with svg
Anonymous No.106140743 >>106141299
>need glm4.5
>vllm didnt work
>hmm yes i will check llamacpp, it will totally be merged
>its merged
i kneel
Anonymous No.106140749 >>106140779 >>106140781
>>106140639
>Unfortunately for the context thing...90k context is coherent for me for the Air model so sounds like I can't reproduce it here.
the fuck is this fag talking about, 90k context? COHERENT?
Anonymous No.106140779
>>106140749
Yeah, in that some people were having the model literally break after 30ish K tokens in the context.
That is not about quality, he's saying that it works at all.
Anonymous No.106140781
>>106140749
Most models have the ability to still speak English up to their max context length, yes.
Anonymous No.106140808 >>106140838
reminder, don't use 1.0 temp and then wonder why a model is crazy, try lower temp first
Anonymous No.106140820
>>106140440
I'm not that impressed. Looks like a flux copy with the soulless plastic face.
Anonymous No.106140827 >>106140843 >>106140873
should i download it here
https://huggingface.co/DevQuasar/zai-org.GLM-4.5-Air-GGUF/tree/main
Anonymous No.106140838 >>106141279
>>106140808
Or
and hear me out even if it sounds insane
or
Use max temp and topK 2.
Anonymous No.106140843 >>106140867
>>106140827
Why the fuck are they split
Anonymous No.106140859
>>106139258
I forgot why those are only compatible with the exllama kernels.
https://huggingface.co/QuantTrio/GLM-4.5-Air-GPTQ-Int4-Int8Mix
https://huggingface.co/QuantTrio/GLM-4.5-Air-AWQ-FP16Mix
^ Those are compatible with the marlin kernels.
Anonymous No.106140867
>>106140843
retards at the office use 16 GB thumb drives to carry data between machines.
Anonymous No.106140873 >>106140889 >>106140894
>>106140827
No, the usual quanters will upload in a moment. Wait for them.
Anonymous No.106140889 >>106140909
>>106140873
MUST COOM AND I CANNOT WAIT
Anonymous No.106140894 >>106140917
>>106140873
>the usual quanters
Trusty even by NASA.
Anonymous No.106140901 >>106140922
IT HAS ARRIVED

https://huggingface.co/TheDrummer/Cydonia-R1-24B-v4-GGUF
Anonymous No.106140909
>>106140889
Q2 of AIR will be nemo level anon...
Anonymous No.106140917 >>106141064
>>106140894
One of the brothers is an autist about quants, so they have some degree of credibility.
Anonymous No.106140922
>>106140901
The unemployment declaration grew in size
Anonymous No.106140935
>>106140458
Adding "tattoo" to the prompt kinda fixes it
Anonymous No.106140936
>tfw run out of disk space
Anonymous No.106141064 >>106141232
>>106140917
Shut up and load another batch of weights for quanting since you will need to reupload soon Daniel.
Anonymous No.106141096 >>106141805
You have no excuse not to upload the quants ubergarm!
Anonymous No.106141232 >>106141254 >>106141635
>>106141064
>be exllamav3 user
>wants quants
>open terminal
>H:\AI\LLMs\Backends\EXL3>python convert.py -i models\GLM-4.5-Air -o models\GLM-4.5-Air-8.0bpw-h8 -w deadquantstorage -b 8 -hb 8
>got quants
Anonymous No.106141246
>>106135910 (OP)
> do a casual search of the last 3 threads
> get 4 matches for "dead"
> get 23 matches for "back"
> fuck yeah
Anonymous No.106141254
>>106141232
>upload it online, get updoots
Anonymous No.106141264 >>106141289 >>106141413
So can AMD RDNA 4.0 cards be used for image/video generation now or it's all nVidea?
Anonymous No.106141279 >>106141330
>>106140838
Interdasting...
Anonymous No.106141281 >>106141316 >>106141322 >>106141386
ok bros whats the glm 4.5 chat template
Anonymous No.106141289
>>106141264
honestly can't answer because i didn't want to have to wait long enough that by the time it's supported it's already been replaced by something three generations newer.
Anonymous No.106141299 >>106141334 >>106141481
>>106140743
>its merged

but its BROKEN i cloned the latest code and get

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm4moe'

reeeeee i want to GOON to sexy python reeeee
Anonymous No.106141316 >>106141349 >>106141386 >>106141481
>>106141281
did you even try looking?
https://huggingface.co/api/resolve-cache/models/zai-org/GLM-4.5/9cfe10c892f5772a937adb8176ce0f7f6900a0dd/chat_template.jinja?download=true&etag=%2241478957aca7a04b7321022e7d1f73de5badd995%22
Anonymous No.106141322 >>106141330 >>106141481
>>106141281
It's not.
But it can be fun.
Try higher topKs like 10.
Anonymous No.106141330
>>106141322
Oops, meant for >>106141279
Anonymous No.106141334
>>106141299
That's what we get for trusting vibe coders...
Anonymous No.106141349 >>106141381
>>106141316
can i import this in sillytavern?
Anonymous No.106141354
Fix the ggufs
Anonymous No.106141381 >>106141481
>>106141349
not as it is currently but if you are too lazy to fix it yourself then just use GLM 4. it's already in sillytavern and its close enough to the chat template
Anonymous No.106141386 >>106141468 >>106141481
>>106141281
>>106141316
based on this jinja template it seems like it is chatml.
however some other places are trying alpaca too.
Anonymous No.106141413
>>106141264
why would you even care? It's going to be a pain in the ass for general ai use. Maybe you can run llm's on it fine, but for other stuff probably not. If they wanted anyone in ai to care they would have jacked it up to at least 24gb (and even then only if your use case is hyper focused on what works), but they fucking didnt. Get a 5060 ti.

Like it or not, nvidia does invest in software, that's what you're paying for.
Anonymous No.106141440 >>106142092 >>106142652
Anonymous No.106141468 >>106141557
>>106141386
it is 100% not chatml and the alpaca meme needs to die
Anonymous No.106141481
>>106141386
>>106141381
>>106141322
>>106141316
>>106141299
thank you anons <3
Anonymous No.106141496 >>106141526
>>106139953
What kind of data do you need to train the model on to even get this to happen?
If they just filtered stuff wouldn't it be drawing featureless_crotch or something instead of body horror?
Anonymous No.106141519 >>106141601 >>106141611
alright i kneel glm 4.5 is super good and super fast (i havent even filled my vram only 6gb vram is filled)
plenty smart for a q2_K model too
i fucking kneel i will be dailying this model from now on
Anonymous No.106141526
>>106141496
Maybe it's the text knowledge that vagina is like a meaty slit so it transfers to images like this despite never having seen an actual vagina.
Anonymous No.106141550 >>106142092
Anonymous No.106141557
>>106141468
ok well you're kind of right, i'm not sure what the hell <|user|> and <|assistant|> are.
seems like some funky chatml vicuna hybrid baby.
Anonymous No.106141558 >>106141618
So I guess ollama is just for lazy dockerfags who want a turnkey solution for their cloudshit, the way it seems to expect me to set it up. Makes sense in hindsight, seeing how it's written in go.
Anonymous No.106141601 >>106141878
>>106141519
Link?
Anonymous No.106141611 >>106141641 >>106141878
>>106141519
Air or the regular one?
Anonymous No.106141618
>>106141558
the only thing ollama is good for is for normies watching some youtube tutorial with an indian with a thick accent telling them to run ollama run deepseek-r1
Anonymous No.106141635 >>106141655 >>106141672 >>106141681
>>106141232
I have 128GB ram and a 4090. What do?
Anonymous No.106141641
>>106141611
dunno about the main GLM but air fails on pop culture questions like explaining the joke behind sneed's feed and seed. would love to see if the main model answers it right.
Anonymous No.106141655
>>106141635
wait for goofs. exllamav3 is for people who can fit the entire model in VRAM.
Anonymous No.106141672 >>106141681
>>106141635
Ik_llama and run nu glm when ggufs arrive
Anonymous No.106141673 >>106141690 >>106141693 >>106141706 >>106142121
<- real mascot of /lmg/
Anonymous No.106141681
>>106141672
>>106141635
i'm also waiting for ik_llama release
Anonymous No.106141690
>>106141673
This but rule 63 version
Anonymous No.106141692 >>106141727
Oh yeah. iklcpp is better for MoE right?
Anonymous No.106141693
>>106141673
Even he is better than greenhaired AGP icon
Anonymous No.106141706
>>106141673
Time to redownload
Anonymous No.106141723 >>106141784
localbros...
Anonymous No.106141726 >>106142092
Anonymous No.106141727 >>106141744
>>106141692
yes but....
>>106138146
>>106138209
most likely ik_llama.cpp won't have MTP support. VLLM already does though.
Anonymous No.106141744
>>106141727
Neither llama.cpp has MTP support, yeah.
Anonymous No.106141760 >>106142062 >>106143521
Some anon a thread or two ago said that koboldcpp now has rocm support in the main branch, no need for the koboldcpp-rocm fork anymore, but I don't see it anywhere. Did he hallucinate it?
Anonymous No.106141784
>>106141723
known issue when you convert to gguf using pytorch. some byte sequence is triggering clam and making it freak the fuck out when it's a false positive.
Anonymous No.106141805 >>106141818
>>106141096
Beg
Anonymous No.106141818 >>106141835 >>106141893 >>106141950
>>106141805
give me like 20 beers and would
Anonymous No.106141835
>>106141818
I get drunk with 6+ so 20 sounds about right.
Anonymous No.106141878 >>106141931
>>106141601
https://huggingface.co/DevQuasar/zai-org.GLM-4.5-Air-GGUF/tree/main
i downloaded q2_k
>>106141611
air
Anonymous No.106141893
>>106141818
would what? barf?
Anonymous No.106141931 >>106141938
>>106141878
run the perplexity test with the wiki text and let me know what you score
Anonymous No.106141938 >>106141974 >>106142046 >>106142546 >>106142581
glm4.5 air q2_k
temp 0.6 minp 0.05
>>106141931
how do i do that?
./llama-bench?
Anonymous No.106141943 >>106141961
>>106137590
DeepSeek is local.
Anonymous No.106141950
>>106141818
with standards this low nemo should have been enough for you
Anonymous No.106141955
>>106138596
Carbon fiber miku is stronger than all others.
Anonymous No.106141961 >>106142132 >>106142451
>>106137663
>>106141943
it's fake and unofficial made by one literally who
Anonymous No.106141974
>>106141938
sloppa'd up
Anonymous No.106141990 >>106142004 >>106142034
>air
you guys know your using the shitty one, right?
Anonymous No.106141993
>>106138596
ever did that experiment where you put a rose into a flask with some nigger cum at the bottom?
Anonymous No.106142001
I hope Qwen releases other sizes of their imagegen. It would be interesting to see how the 0.5B model holds up
Anonymous No.106142004 >>106142015
>>106141990
>just run the one that's 3.5x larger bwo
good actionable advice
Anonymous No.106142015 >>106142040 >>106142049 >>106143193
>>106142004
256GB ram is like $400 at most, get a job
Anonymous No.106142034
>>106141990
buy me 4x rtx pro 6000s then jew
Anonymous No.106142040 >>106142101
>>106142015
>256GB ram is like $400
plus new motherboard plus new cpu, might as well get a new computer at this point
>get a job
no
Anonymous No.106142046 >>106142258
>>106141938
./llama-perplexity -m ../your/model/path/your-quant-name-00001-of-00002.gguf -f wiki.test.raw --seed 1337 -fa -fmoe -mla 3 --ctx-size 512 --threads yourthreadcount -ngl 99 -sm layer --override-tensor exps=CPU,attn_kv_b=CPU --no-mmap
Anonymous No.106142049 >>106142055 >>106142069 >>106142082
>>106142015
im only 18, what do u expect me to do??
Anonymous No.106142055 >>106142077
>>106142049
I was working at 14, whats your excuse
Anonymous No.106142062 >>106142070 >>106142693
>>106141760
>no prebuilt binaries of koboldcpp-rocm for linux
sadge
Anonymous No.106142069
>>106142049
scam pedos
Anonymous No.106142070
>>106142062
coompile it yourself
Anonymous No.106142077 >>106142090 >>106142330 >>106142455
>>106142055
im shy.. and i dont wanna do boring work for 300-400$ a month (or whatever they would pay a highschooler in serbia)
Anonymous No.106142082 >>106142112
>>106142049
get a job instead of being on 4chan you'll thank yourself later OR alternatively open an onlyfans using your feet as money
Anonymous No.106142090 >>106142112
>>106142077
>highschooler in serbia
what does your tummy look like, we may be able to work out a deal
Anonymous No.106142092
>>106141726
>>106141550
>>106141440
These are good.
Anonymous No.106142101 >>106142108 >>106142137
>>106142040
There's no point to those big autism box setups even if you have the money, you could just use it to buy a lifetime worth of tokens instead. It probably would cost less per token than just the electricity for your slow ass "rammaxxing" setup
The only advantage of running locally would be to be able to run weird finetunes and merges, but these bloated moes will never have any of those
Anonymous No.106142108
>>106142101
Fuck off Sam, you ain't reading my logs.
Anonymous No.106142112
>>106142090
im actually quite skinny however im a man and im not gay, im not desperate for money i still have some from coding shitty unity games for grifters when i was 15
>>106142082
idk anon i just dont wanna work at a mcdonalds or at a store its scary
Anonymous No.106142121
>>106141673
Goofy but as a cute r63 maid with a stutter so she calls herself G-Goofy.
Anonymous No.106142132 >>106142164
>>106141961
Implies meme authorship matters.
Anonymous No.106142137
>>106142101
nta, but apis just seem to go down as soon as i'm about to finish, so having local that's immune to service disruption is nice
Anonymous No.106142147
>>106138905
What are the system requirements in this thing?
Anonymous No.106142164
>>106142132
matters if it's forced meme
Anonymous No.106142203
>>106138789
can it do anime titties?
Anonymous No.106142258 >>106142312
>>106142046
>-f wiki.test.raw --seed 1337
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa -fmoe -mla 3 --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error: invalid argument: -fmoe

removed that
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
error while handling argument "-f": error: failed to open file 'wiki.test.raw'


usage:
-f, --file FNAME a file containing the prompt (default: none)


to show complete usage, run with -h
Anonymous No.106142312 >>106142332 >>106142365
>>106142258
oh i forgot you need to download the file as well.
it's this file
https://huggingface.co/nisten/llama3-8b-instruct-32k-gguf/blob/main/wiki.test.raw
Anonymous No.106142330 >>106142342
>>106142077
this is the poor mindset that will end up leading you towards a life of welfare
Anonymous No.106142332 >>106142373
>>106142312
how long time should it take?
i have like 11t/s
Anonymous No.106142342 >>106142366 >>106142386
>>106142330
but im gonna very likely enroll in a good university and then get a job while in uni because those pay better and i can work with something im interested in
Anonymous No.106142365
>>106142312
>ETA 1 hour
[1]3.7721,[2]4.8459,[3]4.1327,[4]3.9085,[5]4.3769,[6]4.4260,[7]4.6263,[8]4.7943,[9]5.4195,[10]5.5582,[11]5.7446,[12]5.8990,
[13]6.3410,[14]6.2090,[15]6.2595,[16]6.2787,
how do i do a quicker test? i wanna chat with glm
Anonymous No.106142366 >>106142410
>>106142342
>university
Unless you already have a job lined up after then that is a big mistake, you should have been looking into getting into a trade like electrical or plumbing or the like, they will pay to train you and there is massive demand and good pay. With uni you are paying to likely get fucked over with something not in demand
Anonymous No.106142373 >>106142410 >>106142425
>>106142332
you can try offloading some layers onto your GPU but I don't know how much you'll be able to move onto a 3060 if that's all you have.
-ot "blk\.3\.ffn_up_exps=CUDA0, blk\.3\.ffn_gate_exps=CUDA0" -ot "blk\.4\.ffn_up_exps=CUDA0, blk\.4\.ffn_gate_exps=CUDA0"
etc. etc. you can make the above commands cleaner with regex but im too lazy
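for reference, a single pattern like this should collapse those into one flag, assuming -ot matches tensor names with plain regex the way i think it does (untested sketch):
-ot "blk\.(3|4)\.ffn_(up|gate)_exps=CUDA0"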
Anonymous No.106142386 >>106142410
>>106142342
well good luck kid, just know if you stay comfortable you'll always be poor. network with people and some rich fuck will give you a high paying job.
Anonymous No.106142410 >>106142452
>>106142366
in serbia uni is free if you have good grades and get a nice amount of points on the entrance exam
>>106142373
thanks anon but i meant like can i do only 10% of the test so its quicker
[23]6.1785,[24]6.0028,[25]5.8904,[26]5.7935,[27]5.6974,[28]5.7772,[29]5.7919,[30]5.8340,[31]5.8936,[32]5.9393,[33]6.0358,[34]6.0629,[35]6.1970,[36]6.2575,[37]6.2242,[38]6.3374,[39]6.3386,[40]6.3366,[41]6.4282,[42]6.4370,[43]6.3993,[44]6.4095,
some newer results
>>106142386
you're right, but its kind of too late to be thinking about that, school starts in a month and most jobs are probably filled with others that applied early
> if you stay comfortable you'll always be poor
very great advice, im writing it down
Anonymous No.106142425 >>106142505
>>106142373
[61]6.9160,[62]6.9912,[63]7.0670,[64]7.0982,[65]7.1008,[66]7.1183,[67]7.1213,[68]7.1303,[69]7.1982,[70]7.1933,[71]7.1833,[72]7.1691,[73]7.1749,[74]7.1992,[75]7.1914,[76]7.1142,[77]7.0454,[78]7.0050,[79]6.9811,[80]6.9604,
perplexity doesnt seem that bad for q2 so far
im stopping it now
Anonymous No.106142446
Why do anons keep on entertaining this attention seeking zoomer? He's been moaning about his age in /ldg/ for a few days already
Anonymous No.106142451
>>106141961
That just makes it more suitable.
Anonymous No.106142452 >>106142585
>>106142410
demand for linemen is always huge and you can easily make 200K+ a year in the US at least if you're willing to be on call / travel a bit
Anonymous No.106142455
>>106142077
>petra is a highschool twink
sounds about right
Anonymous No.106142503
>cudadev
>blacked miku poster
>mikutroon
>petra
>ikaridev
how does he achieve this bros..
Anonymous No.106142505
>>106142425
go coom or something and run it overnight and post your results tomorrow in that case
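or if you just want a rough number instead of the full hour-long run, llama-perplexity should accept a --chunks flag to cap how many 512-token chunks it evaluates (iirc, double check with -h), e.g. something like:
./llama-perplexity --model ~/TND/AI/GLM-air/zai-org.GLM-4.5-Air.Q2_K-00001-of-00004.gguf -ngl 1000 -ot exps=CPU -t 6 -fa --ctx-size 512 -sm layer -f wiki.test.raw --seed 1337 --chunks 40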
Anonymous No.106142518
>how does he achieve this bros
Anonymous No.106142546 >>106142640
>>106141938
>OH NO NO NO
>NOT LIKE THIS
lmao
Anonymous No.106142581
>>106141938
>it wasn't just x, it was y
>2 times in the same response
Hopefully that's just Q2 being Q2...
Anonymous No.106142585 >>106142766
>>106142452
thanks for the advice anon, i really dont know what to say and i dont want to sound dismissive. im just stumped on where the fuck do i start, all i know is tech. first i'd need to get a visa, then tickets and so on, and for that i need money, and since its the us probably a nice sum of money, so for that i need to get a job
my brain gets fried thinking about this shit. i know im taking the easy route by just studying and hoping i can find a good job in/after uni
it is what it is
Anonymous No.106142598 >>106142617 >>106142648
use the code tags you faggots
also, pls bake
Anonymous No.106142602 >>106142611
WAIT ME
Anonymous No.106142611
>>106142602
And you are?
Anonymous No.106142617
>>106142598
8th page anon
Anonymous No.106142637
if anyone wants me to test glm with different samplers/prompts/whatever (besides perplexity ill do that overnight)
give requests
picrel is q3_k_m btw
Anonymous No.106142640
>>106142546
https://www.youtube.com/watch?v=1FZ3Xa7gEKk&list=RD1FZ3Xa7gEKk&t=12s
Anonymous No.106142648
>>106142598
fuck you get triggered i won't do what you tell me. enjoy this mess of broken formatting.
Anonymous No.106142652
>>106141440
I like this Miku
Anonymous No.106142693 >>106142727 >>106142748
>>106142062
Well, I think I figured it out, so here are some benchos.
Hardware is Ryzen 7 5700G with AMD Radeon RX6600, plz no bully.
After looking at the docs more carefully I believe that the main branch of koboldcpp supports ROCm via --usehipblas, but you need to compile it yourself, which I am too lazy to do.
$ python koboldcpp-rocm/koboldcpp.py --usecublas 1 --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
...
[00:19:20] CtxLimit:8192/8192, Amt:100/100, Init:0.09s, Process:35.89s (225.49T/s), Generate:3.59s (27.82T/s), Total:39.48s
Benchmark Completed - v1.96.2.yr0-ROCm Results:
======
Flags: NoAVX2=False Threads=7 HighPriority=False Cuda_Args=['1'] Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=False KvCache=0
Backend: koboldcpp_hipblas.so
Layers: 29
MaxCtx: 8192
GenAmount: 100
-----
ProcessingTime: 35.887s
ProcessingSpeed: 225.49T/s
GenerationTime: 3.594s
GenerationSpeed: 27.82T/s
TotalTime: 39.481s
Output: 1 1 1 1
-----

./koboldcpp-linux-x64-nocuda --usevulkan --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench
...
[00:26:47] CtxLimit:8192/8192, Amt:100/100, Init:0.79s, Process:4.45s (1818.02T/s), Generate:1.18s (84.67T/s), Total:5.63s
Benchmark Completed - v1.96.2 Results:
======
Flags: NoAVX2=False Threads=7 HighPriority=False Cuda_Args=None Tensor_Split=None BlasThreads=7 BlasBatchSize=512 FlashAttention=False KvCache=0
Backend: koboldcpp_vulkan.so
-----
ProcessingTime: 4.451s
ProcessingSpeed: 1818.02T/s
GenerationTime: 1.181s
GenerationSpeed: 84.67T/s
TotalTime: 5.632s
Output: 1 1 1 1
-----


Unexpectedly, vulkan actually won.
I'm hitting the post size limit, but here are also timings for --usecpu:
ProcessingTime: 45.296s
ProcessingSpeed: 178.65T/s
GenerationTime: 3.699s
GenerationSpeed: 27.03T/s
TotalTime: 48.995s
Anonymous No.106142717 >>106142772
>>106136043
they will stop talking about this piece of shit once it's implemented, it's the glm cycle of hype and disillusionment…
this thread is constantly filled with that kind of model begging -> model forgotten once they actually tried it and saw the shit for what it really was
Anonymous No.106142727 >>106142748
>>106142693
very nice, so it seems vulkan is better, i heard cuda is getting beaten by vulkan slowly too
Anonymous No.106142748 >>106142976
>>106142693
>>106142727
ah, now that I look closer at the logs,
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
false alarm, I need to tinker some more.
Anonymous No.106142755 >>106142787
>>106138428
The bullying will continue until working GLM quants are on my machine. Or maybe not because it's fun and done out of love.
Anonymous No.106142766
>>106142585
finding a job nowadays is always about who you know and not what you know. truly.
sure you need to know the basics, but getting a referral from someone who works at a high paying place is the only way now.
Anonymous No.106142772
>>106142717
It's still fun to act hyped even if something is going to be shit.
You know, kind of like how people here act like they're having sex, even though they're just role playing.
Anonymous No.106142774
>>106135910 (OP)
I'm hoping Horizon A/B are local models. Alpha is pretty damn good at translating Japanese text (With some help from instructions) and almost beats Kimi k2.

The only thing i'm very sad about is how no models are good for writing stories. When will this change? I just want to have a model write about Monster Girls taking over the world, getting a cute harpy mate and meeting her parents.
Anonymous No.106142782 >>106142889
>>106136033
she was one of the first widely used local models ever tho
Anonymous No.106142784
>>106139295
>it's not just x, it's y
>—something, second thing, and a third thing for good measure
there isn't a word in the english dictionary that can describe how much I hate this slop, and how much more I hate humans who don't notice this slop as LLM writing (I see more and more posts on hn that are very much LLM generated and you get downvoted to oblivion for hurting people's feelies if you say as much)
Anonymous No.106142787 >>106142797
>>106142755
D-do you love m-me?
Anonymous No.106142797
>>106142787
i do <3
Anonymous No.106142843
i had no idea /lmg/ could become any gayer than it already was
Anonymous No.106142846 >>106142855
Anyone have an Ubergarm card to share?
Anonymous No.106142855
>>106142846
wait me guy should get it first
Anonymous No.106142889
>>106142782
her time is over
Anonymous No.106142950
>You’d cream your pants like a dog.
yeah that famous thing that dogs do
stupid ass model
Anonymous No.106142963
now that glm is working it's time for step3
we finally have a super smart and uncensored vision model that doesn't shy away from describing porn and it's just going to fade into obscurity because it's using some fancy new attention thing
Anonymous No.106142976
>>106142748
you probably have to pretend to have a different GPU.
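on an RX6600 that usually means spoofing the gfx target, since ROCm only ships official kernels for gfx1030 and the 6600 is gfx1032. a minimal sketch, reusing your earlier invocation:
HSA_OVERRIDE_GFX_VERSION=10.3.0 python koboldcpp-rocm/koboldcpp.py --usecublas 1 --model models/ggml-org/gemma-3-1b-it-Q4_K_M.gguf --bench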
Anonymous No.106142979
>>106142968
>>106142968
>>106142968
Anonymous No.106143193
>>106142015
I feel like even if I built for this, dual kits of 64gbx4 ddr5 would get me a whopping 1.5 tokens a second, just like running r1 on ram got average people.

Maybe with offloading non-attention layers and a decent amount of vram we could squeeze it up to 2-3 tokens a second. But honestly anything less than 5 tokens a second kinda sucks. Also, I feel like a lot of the more advanced ways to tune speed like that are not available to casual users, and risking a purchase only to find out later you're not sure how to do what one asshole on reddit did with a beta from an unreliable dev is not a good thing to recommend.
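rough napkin math, assuming a consumer dual channel ddr5 board at ~90 GB/s effective bandwidth and ~20 GB of active weights read per token for an r1-sized moe at q4: 90 / 20 ≈ 4.5 t/s is already the theoretical ceiling, and real runs tend to land at a third to half of that, so 1.5-2 t/s checks out.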
Anonymous No.106143521
>>106141760
Did you not read the release notes?
https://github.com/LostRuins/koboldcpp/releases/tag/v1.96.2
>download our rolling ROCm binary here if you use Linux.
>https://koboldai.org/cpplinuxrocm
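so on linux you can probably skip compiling entirely and just grab that, e.g. (assuming the link redirects straight to a single executable, haven't verified):
wget -O koboldcpp-linux-x64-rocm https://koboldai.org/cpplinuxrocm && chmod +x ./koboldcpp-linux-x64-rocm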