
Thread 106575202

362 posts 80 images /g/
Anonymous No.106575202 [Report] >>106575851 >>106576471 >>106576690 >>106578711 >>106579905 >>106581987
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106566836 & >>106559371

►News
>(09/13) Ling & Ring 16B-A1.4B released: https://hf.co/inclusionAI/Ring-mini-2.0
>(09/11) Qwen3-Next-80B-A3B released: https://hf.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>(09/11) ERNIE-4.5-21B-A3B-Thinking released: https://hf.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106575209 [Report]
►Recent Highlights from the Previous Thread: >>106566836

--Differential privacy: VaultGemma's memorization reduction vs generalization:
>106566944 >106567707 >106568235 >106568661 >106568688 >106568747 >106568803 >106568808 >106568337 >106568567
--Qwen3-Next quantization and GPU deterministic inference challenges:
>106573151 >106573171 >106573199 >106573224 >106573235 >106573226 >106573279 >106573379 >106573425 >106573441 >106573467 >106573519 >106573610 >106573660
--1.7B open-sourced model achieves document OCR success with minor errors:
>106570867 >106570892 >106571715 >106570901 >106570943 >106572018 >106572081 >106572287 >106575181
--Balancing GPU driver updates for software support vs power efficiency and stability:
>106572592 >106572637 >106572669 >106572729
--ASML and Mistral AI form €1.3 billion strategic partnership:
>106574819 >106574857 >106574864 >106574900
--Challenges in domain-specific knowledge teaching with LoRA and summarized information:
>106568875 >106568949
--vllm's broken GGUF and CPU support issues:
>106569268 >106569331 >106569356 >106569357 >106569385 >106569553 >106569630 >106569666 >106569594
--Feasibility challenges for AI-generated game chat with video input:
>106569817 >106569839 >106569869 >106570004 >106570036 >106569923 >106569955 >106570369 >106570480
--Kimi K2's delusion encouragement performance:
>106570964 >106571077 >106571090 >106571099 >106571105
--Skepticism about K2's 32B matching GPT-4 capabilities:
>106567118 >106567806 >106568369
--Qwen 80B testing performance and comparison to larger models:
>106568659 >106568674
--Kioxia and Nvidia developing 100x faster AI SSDs for 2027:
>106569299
--vllm vs llama.cpp performance benchmarks with Qwen 4B model:
>106570266
--Miku (free space):
>106567977 >106568645 >106569488 >106571835 >106571849 >106571853 >106571856 >106571961 >106572139 >106573324

►Recent Highlight Posts from the Previous Thread: >>106566844

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106575274 [Report] >>106575284 >>106575295 >>106575321 >>106575338 >>106575374 >>106575394 >>106575517 >>106576562
glm is for schizos, finetroons for drooling retards, mistral models break past 1k tokens and this thread is filled with people who have no idea what they're talking about but are happy if they can oneshot prompt some of the most disgusting ERP known to man
Anonymous No.106575281 [Report]
Anything released within the last week gguf status?
Anonymous No.106575284 [Report]
>>106575274
Yes... And?
Anonymous No.106575295 [Report]
>>106575274
You'll elevate the thread, right?
Anonymous No.106575321 [Report]
>>106575274
>people are doing something I don't like, that doesn't involve a singular other human being, entirely in the privacy of their own home, REEEEE
Grow the fuck up.
Anonymous No.106575338 [Report]
>>106575274
Your temperature is set way too high, anon.
Anonymous No.106575374 [Report]
>>106575274
gr8 b8 r8 it 8/8 m8
Anonymous No.106575394 [Report] >>106575421 >>106575451 >>106577646
>>106575274
I don't like that you can make a post like this and if you get called out, you just claim it was a shitpost all along. Where's the accountability?
Anonymous No.106575413 [Report] >>106575452 >>106575465 >>106575474 >>106576094
>goofs
I'm an old man out of the loop, the fuck does this even mean?
Anonymous No.106575420 [Report] >>106575668
>>106574177
>On linux I get over 50tk/s
wait what? how do you get over 50t/s on GLM air? I am a linux user and i only get around 20t/s on my dual 4090 setup
Anonymous No.106575421 [Report]
>>106575394
Anonymous No.106575451 [Report] >>106575537
>>106575394
We need poster ID to find out what these kinda guys are up to. Probably shilling their latest enterprise jeetRAG solutions.
Anonymous No.106575452 [Report]
>>106575413
goof is a synonym for mistake. they are trying to say that llamacpp was a mistake.
Anonymous No.106575465 [Report]
>>106575413
You're an old man but unfamiliar with classic Disney characters?
Anonymous No.106575473 [Report]
How's the progress on the llama.cpp MTP PR?
Anonymous No.106575474 [Report] >>106575499
>>106575413
it's the GGUF format for quantized weights
Anonymous No.106575492 [Report] >>106575505 >>106575506 >>106575523 >>106575577 >>106575644 >>106576777 >>106579713
Name one thing a local model has done for you
Anonymous No.106575499 [Report] >>106575521
>>106575474
There's nothing preventing you from having a full precision model in GGUF format, right?
Not nitpicking btw, I really don't know, but intuitively I imagine that yes.
GGUF is just a way to pack the weights and some metadata, right?
Does that mean you could have AWQ GGUFs?
Anonymous No.106575505 [Report]
>>106575492
best orgasms of my life
Anonymous No.106575506 [Report] >>106575554
>>106575492
It recalled on demand Kasane Teto's birthday with 70% confidence
Anonymous No.106575507 [Report] >>106575672 >>106575877 >>106575914 >>106576012
https://youtu.be/7Jzjl3eWMA0?t=117
Women raping gay billionaire werewolf writers sounds unsafe. But their fucked up fetishes are somehow safe. I hate this world
Anonymous No.106575517 [Report]
>>106575274
>but are happy if they can oneshot prompt some of the most disgusting ERP known to man
If a model could oneshot 5-6 prompts continuing my organic ERP logs, maintain coherence up to 30k, and withstand me getting bored after a month, my penis would be a happy penis. GLM chan was the closest so far.
Anonymous No.106575521 [Report]
>>106575499
I have never tried to run f32 but it handles bf16 just fine. I think the quantization matters because it needs to know how to do the math or whatever, they seem to call it a kernel for some reason.
Anonymous No.106575523 [Report] >>106575550
>>106575492
Best orgasms like the other anon, followed by wanting to kill myself again, not because of a sense of shame but because all the models are still fucking trash.
Anonymous No.106575537 [Report] >>106575629
>>106575451
You understand that feeding trolls is doing it wrong. Yes?
Anonymous No.106575550 [Report]
>>106575523
For me it's reinforcing fantasies that can never happen (like getting a gf). I'm not sure it's doing me any good, but whatever, nothing I can do to my mind is permanent.
Anonymous No.106575554 [Report]
>>106575506
Absolutely critical use case for me to be desu
Anonymous No.106575577 [Report]
>>106575492
Reminded me that local is still a waste of time.
Anonymous No.106575629 [Report] >>106575744
>>106575537
At a certain point, I'm pretty sure it's just trolls feeding each other.
Anonymous No.106575644 [Report]
>>106575492
help with scripts
medical advice (regarding headaches)
orgasms as the others said also helped me schizomaxx by making my daydreams more vivid and shit
help with looking shit up (eg what standard does x use etc)
Anonymous No.106575668 [Report] >>106575698
>>106575420
I'm pretty sure that's not normal, triple 4090s on *windows*, with the windows performance nerf can do 80 tokens/s according to others, and I've seen my iq4xs air do 70 tokens/s on linux with 3090s.

20t/s... is about what my windows (I'm the guy with the fucked up multi gpu windows performance) does on 2 gpus using iq2xs.

Are you sure nothing's spilling to ram?
Anonymous No.106575672 [Report]
>>106575507
Yes, female privilege is getting bigger with time
Anonymous No.106575698 [Report] >>106575792
>>106575668
it shouldn't be. I am using an IQ2 quant which should just barely fit in VRAM. could it potentially be my backend? I use oobabooga webui because it is convenient, but could it really be hindering my performance that much?
Anonymous No.106575744 [Report]
>>106575629
You're right. Or samefagging
Anonymous No.106575792 [Report] >>106575808
>>106575698
>oobabooga webui
If it isn't an ancient install, you're using llama-server to load your ggufs, so that shouldn't be it. I used koboldcpp, llamacpp (llama-server), and oobabooga (llama-server) and there wasn't a noticeable difference.

When you load your model, there should be a line that tells you how llama-server is loading the model. Maybe you need the verbose flag in cmd_flags.txt to see it.

Did you split by rows?
Anonymous No.106575808 [Report] >>106575836
>>106575792
it is a recent install, so then I guess that isn't the problem. I have tried both with and without row splitting and including it actually slightly reduces performance
Anonymous No.106575836 [Report] >>106575848
>>106575808
Are you able to check what's your gpu usage during generation? In my case, my abysmal performance on windows is verified by the power usage - barely 80w on each card during generation, while linux pulls 150+.
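If you need a way to check, assuming nvidia cards and nvidia-smi on your PATH, something like this in another terminal should print power draw and memory clocks per card once a second (field names come from nvidia-smi --help-query-gpu, so double check them against your driver version):

nvidia-smi --query-gpu=index,power.draw,clocks.mem,utilization.gpu,memory.used --format=csv -l 1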
Anonymous No.106575843 [Report] >>106575850
Why do these models speak as faggots
Anonymous No.106575848 [Report] >>106575891
>>106575836
I will check right now, but from what I have seen, usually it is below 100w during generation
Anonymous No.106575850 [Report]
>>106575843
monkey see monkey do
Anonymous No.106575851 [Report] >>106575859 >>106575869 >>106576089
>>106575202 (OP)
>>(09/11) Qwen3-Next-80B-A3B released
>Still no GGUF
Anonymous No.106575859 [Report]
>>106575851
GGUF is a state of mind, friend.
Anonymous No.106575869 [Report] >>106575935 >>106575940 >>106575977
>>106575851
>9/11
Did they really? Fuckers. We should release a model on the 15th of april.
Anonymous No.106575877 [Report] >>106575973
>>106575507
Pretty cool, really. When was the last time you heard a real female speaking to you, anon?
Anonymous No.106575891 [Report] >>106575898
>>106575848
Btw, I confirmed it was specifically a multi-gpu problem in my case by running a model that fits in one gpu, and then running that same model with the same setting but split across three gpus. See if that's the case for you as well.
Anonymous No.106575898 [Report] >>106575933
>>106575891
so, then you're saying I should try to keep the model on one GPU?
Anonymous No.106575914 [Report]
>>106575507
I hate jewoman
Anonymous No.106575933 [Report] >>106576021
>>106575898
Try running a smaller model on one gpu, then running that same smaller model but split across two gpus. There shouldn't be too much of a performance drop.

I'm just wondering if you have the same issue I have, but on linux.
Anonymous No.106575935 [Report] >>106575962
>>106575869
>noooo a foreign tech company is doing a minor release on OUR sad day??
Anonymous No.106575940 [Report] >>106575962
>>106575869
On their National Security Education Day?
Anonymous No.106575962 [Report]
>>106575935
>>106575940
I am SEETHING. Raging. I can not COPE with their insensitivity.
Anonymous No.106575973 [Report] >>106576045
>>106575877
I don't remember... But now I will write a gay billionaire werewolf book with help of R1 and I will get molested while signing my book. That is my dream from now on.
Anonymous No.106575977 [Report]
>>106575869
>the second qwen model hit hf
Anonymous No.106576012 [Report] >>106576035
>>106575507
Is this about women raping guys who write about or are gay billionaire werewolves? Or is it about writers who write about gay billionaire werewolves who rape women?
Anonymous No.106576018 [Report]
>>106575952
Where's the catch?
Anonymous No.106576021 [Report] >>106576059
>>106575933
so they both cap out at around 120W and enabling row splitting reduces performance by about 75%. I tested with an FP16 of gemma 270m. ~250t/s without vs. 65t/s with row splitting
Anonymous No.106576035 [Report]
>>106576012
It is about women raping guys who write about gay billionaire werewolves raping women.
Anonymous No.106576045 [Report]
>>106575973
No, write your own version first - or at least a rough draft - then edit with a LLM. Start with a novella and build up your own workflow. It's very doable.
Anonymous No.106576059 [Report] >>106576092
>>106576021
Sorry I should clarify, I asked if row splitting was enabled before because it's bad if you don't have enough bandwidth between the cards (like pcie).

Can you check your memory clocks when running a single gpu vs multi-gpu? Mine are 1000+ on a single gpu, and 650mhz on multi-gpu.
Anonymous No.106576064 [Report]
Member? lol
Anonymous No.106576089 [Report] >>106576174
>>106575851
https://blog.vllm.ai/2025/09/11/qwen3-next.html

vllm has support including mtp layers and everything. Probably one of the nicer and faster local models right now but fuck spending all day getting vllm to run without a UI for what is essentially a sidegrade to glm air.
Anonymous No.106576092 [Report] >>106576126
>>106576059
memory clocks are about the same for single and multi GPU. around 1250MHz
Anonymous No.106576094 [Report] >>106576100 >>106576743
>>106575413
Anonymous No.106576100 [Report]
>>106576094
Now thats some old shitposting
Anonymous No.106576126 [Report] >>106576137 >>106576162
>>106576092
Aww, not the same symptoms as me.

What's your settings? Is every multi-gpu model you run this slow?
Anonymous No.106576137 [Report] >>106576151
>>106576126
What about --mlock?
Anonymous No.106576151 [Report]
>>106576137
No difference on or off for me. But I only tested that on windows. On linux I left it off. --no-mmap is always on though.
Anonymous No.106576162 [Report] >>106576186 >>106577441
>>106576126
these are my settings for GLM air. i got about 33t/s just now.
Anonymous No.106576174 [Report] >>106576443
>>106576089
>sidegrade to glm-air
>at a smaller size
>at 3b active
>with mtp
this thing is going to be fast as fuck
Anonymous No.106576186 [Report] >>106576245
>>106576162
Are you sure you can fit 100k+ context? What's the speed like if you set the context size to 8192?
Anonymous No.106576221 [Report] >>106576234 >>106576242 >>106576269 >>106577464
man, the imagen community is pretty fucking stagnant. how are the llm bros holding up?
Anonymous No.106576234 [Report] >>106576243
>>106576221
We get something new about once a year. This year we peaked in February.
Anonymous No.106576242 [Report]
>>106576221
we're about 12 deepseek sidegrades deep while the best model for consumer gpus is from july 2024
Anonymous No.106576243 [Report] >>106576286
>>106576234
NTA but what did we get in feb?
Anonymous No.106576245 [Report] >>106576331
>>106576186
33.4t/s instead of 33t/s
Anonymous No.106576269 [Report] >>106576282 >>106576320 >>106576381 >>106579821 >>106581071 >>106581286
>>106576221
We're at the tail end of the summer flood, euphoria starting to wear off
Anonymous No.106576282 [Report]
>>106576269
>Summer Flood
Next...
>Drummer's Cold Season
Anonymous No.106576286 [Report] >>106576444
>>106576243
r1
Anonymous No.106576315 [Report] >>106576329 >>106579821
I switched back from K2-0905 to the old K2. The new one writes like it caught autism from the original R1.
Also the July K2 had the nice quirk that it wrote by far the best post-orgasm scenes out of any llm I've seen while the new one handles them much more generically 95% of the time.
Anonymous No.106576320 [Report]
>>106576269
>euphoria
That's a weird way of describing what people feel seeing a flood of identical, useless synthetic models, each claiming to beat r1
Anonymous No.106576329 [Report]
>>106576315
on what hardware?
Anonymous No.106576331 [Report] >>106576358
>>106576245
Is glm air the only model this happens with? What models do you have?
Anonymous No.106576358 [Report] >>106576378 >>106576395
>>106576331
I also have a very small quant of GLM full that runs at about 3.5t/s. 8 bit gemma 27b runs at about 13.5t/s. everything has always run extremely slow for me despite having good hardware
Anonymous No.106576378 [Report] >>106576395 >>106576431
>>106576358
Yeah that's fucky.

Can you download a q4 gemma or nemo and report the speeds when running on 1 gpu vs two gpus? 13 tk/s for q8 gemma on dual 4090s isn't right.

What's your cuda and driver versions?
Anonymous No.106576381 [Report] >>106576503
>>106576269
explain the strawberry and spade thing with OpenAI? I don't get it
Anonymous No.106576395 [Report] >>106576415
>>106576358
>>106576378
>https://www.perplexity.ai/search/i-have-4-x-rtx-3090-s-and-128g-2EtrYlIlSUKZwfWnxK0aqw
Anonymous No.106576415 [Report] >>106576430
>>106576395
Makes me want to throw up. Jesus christ.
Anonymous No.106576430 [Report]
>>106576415
It's quite... generic answer.
Anonymous No.106576431 [Report] >>106576477 >>106576569 >>106576592
>>106576378
CUDA is 12.8, drivers are 580.65.06.
Mistral-Nemo-Instruct-2407-Q4_K_M.gguf got me about 53t/s on 1 GPU and multi GPU is about 19t/s
Anonymous No.106576443 [Report]
>>106576174
yah, but it will also be dumber for roleplaying and writing. Maybe better for coding and longer context. It is crazy to see mtp support. I wonder if qwen helped a bit, we may become qwen's bitch if they keep doing stuff like that behind the scenes
Anonymous No.106576444 [Report]
>>106576286
Forgot that was just back in feb, feels like a lot longer ago
Anonymous No.106576471 [Report] >>106576531
>>106575202 (OP)
>Previous threads: >>106566836(Cross-thread) & >>106559371(Cross-thread)
i still dont know why theres always two and i probably never will but i might be okay with that
Anonymous No.106576477 [Report] >>106576497
>>106576431
Does this happen on windows or other distros as well?

In my case, windows 10 iot ltsc and 11 iot ltsc behave the same way, sane tks on single gpu, and abysmal performance on multi-gpu.
But debian 13 with driver version 550 had no problem delivering the speed for multi gpu.
My 3090s are running on x16 gen4. I did notice that resizable bar, while turned on in the bios, was reported as disabled in windows. While resizable bar shouldn't affect the speeds, it's weird that it says disabled even though it's enabled in linux, where I have better performance, so it may be indicative of some other issue to do with how my gpus are handled in windows.
Anonymous No.106576497 [Report] >>106576514
>>106576477
I am on a threadripper 3960x, so both of my 4090s are on 16x gen 4 as well. I have never tried other distros on this machine, I just use Mint. I have tried windows in the past and the performance was terrible for me too, even worse than now. ReBAR is enabled for both of my GPUs
Anonymous No.106576503 [Report] >>106576877
>>106576381
Strawberry was some marketing hype about some openai innovation i think. Was a while ago. Spade: "If a white woman has this tattoo inked on her skin, it indicates that she is the sexual property of a Dominant Black Man, also called a 'Bull' or 'Owner'." in conjunction with the Israel flag, i
Anonymous No.106576514 [Report] >>106576561
>>106576497
>zen 2
Same with me... Do you think that's it? Hmmm but then why does my linux have no problems?
Anonymous No.106576531 [Report]
>>106576471
Memory's fuzzy, but I think threads were moving super fast early on and someone asked for 2 so they could quickly see if they missed a thread and whoever was maintaining the template at the time obliged.
Anonymous No.106576561 [Report] >>106576596
>>106576514
no idea, honestly. I have seen people post about their performance time and time again and they always exceed me by far despite having similar hardware to me. 80t/s would be a dream, that is almost like real time text gen. I was ecstatic about getting above 20t/s for the first time in my life on a good sized model with GLM air
Anonymous No.106576562 [Report]
>>106575274
Truth nvke
Anonymous No.106576569 [Report] >>106576610
>>106576431
Imagine spending money on two 4090s to get only 19 tokens per second on nemo, rip anon.
Anonymous No.106576592 [Report] >>106576606 >>106576610
>>106576431
RTX 4090 has 24GB of vram. And NEMO is about that size.
When you split that between two gpus it means that cpu is still acting as a middle man.
Have you turned on Hardware-accelerated GPU scheduling in Windows?
Anonymous No.106576596 [Report] >>106576610
>>106576561
You definitely should be able to hit 80t/s. My 3090s can hit 70t/s when everything is working properly. Maybe bug report? To whom? I dunno lol.

You should not be satisfied with 20t/s. That's horrible.
Anonymous No.106576606 [Report] >>106576668
>>106576592
4090 anon is mint. I'm the windows guy. And yeah, I've already tried toggling that. No difference.
Anonymous No.106576610 [Report] >>106576652 >>106576668 >>106576688
>>106576569
yeah, it is pretty terrible, but this is what I have lived with for years. I was fine with getting 1.5t/s on old 120B models. I didn't know any better.
>>106576592
I don't use windows
>>106576596
could try asking chatgpt or something I guess. I have never been able to figure this issue out
Anonymous No.106576616 [Report] >>106576660
I haven't paid that much attention to local in a while, who is drummer and are his models something special?
Anonymous No.106576652 [Report] >>106576698
>>106576610
Buddy, 20t/s and 80tk/s are worlds apart. You can not go back to the crawl that is 20 after tasting 80. Do not accept that this is what your 4090s can give you.


And if you end up figuring out what's wrong, please tell me as well so I fix my windows too.
Anonymous No.106576660 [Report]
>>106576616
rocinante model by drummer is basically the goto for vramlets who have got like standard gaming hardware.
Anonymous No.106576668 [Report] >>106576693
>>106576606
>>106576610
These are important experiences to learn from.
Anonymous No.106576688 [Report] >>106576698
>>106576610
Are your gpus the same model?
Anonymous No.106576690 [Report] >>106576694 >>106576753
>>106575202 (OP)
i got GLM-4.5-Air a couple weeks ago is it still the best?
Anonymous No.106576693 [Report] >>106576704
>>106576668
>learn from
What can you learn from this?
Anonymous No.106576694 [Report]
>>106576690
Yes.
Anonymous No.106576698 [Report] >>106576714 >>106576789
>>106576652
gonna put in a couple hours with chatgpt to see if I can fix this
>>106576688
no, different 4090s
Anonymous No.106576704 [Report] >>106576714 >>106576726
>>106576693
Use Linux for multi-gpu setups and for more serious computing tasks. Windows is still for consumer faggotry.
Anonymous No.106576714 [Report]
>>106576698
For what it's worth, I don't think having different models is the culprit, but I also have different 3090 models.
>>106576704
Your opinion has been noted, I thank you for your response.
Anonymous No.106576726 [Report] >>106576759
>>106576704
??? 4090 anon is using linux and they still have multi-gpu problems.
Anonymous No.106576743 [Report]
>>106576094
I've never seen noko referenced before...
Anonymous No.106576753 [Report] >>106576766 >>106576774
>>106576690
Best you can do really on single gpu 16g vram and 64g of ddr4/5 at the moment. Jamba is pretty good at a smaller size with about 5-6k worth of human written writing to avoid the endless slop, but it reprocesses the entire cache every message because the llamacpp implementation is retarded. Every other moe is 1-3b active, 20-30 inactive aside from next or 220-300b inactive which requires you to have a shitload of ram and maybe a couple gpus.
Anonymous No.106576759 [Report] >>106576789
>>106576726
Wrong kernel configuration.
Anonymous No.106576766 [Report] >>106576841
>>106576753
>jamba mini mentioned
I really like it, but it's godawful retarded.
Anonymous No.106576774 [Report] >>106576841
>>106576753
i have 90gb ram and 24gb vram if that helps currently i use GLM-4.5-Air-Q3_K_M think it was a fairly low t/s i am on amd
Anonymous No.106576777 [Report] >>106576814
>>106575492
Oneshot code for an esphome ir blaster and receiver. Then oneshot all the recorded codes into buttons I can use from home assistant.
Thank you GLM 4.5 Air.
Anonymous No.106576789 [Report] >>106576867 >>106578561
>>106576698
If >>106576759 is right, try debian 13? I just installed the 550 driver and llamacpp.
Anonymous No.106576814 [Report] >>106576868
>>106576777
is that for using a tv remote for your lights?
Anonymous No.106576841 [Report] >>106577739
>>106576766
It takes a lot of handholding for sure, and like I said, it takes a lot of tokens to break away from slop, but imo it has the most diverse swipes of any model I've tried with neutral samplers, apart from changing top_p in the range of 0.75 to 0.9. Better than mistral nonstop regurgitating the same shit every swipe, or devolving into repetition past 10k tokens
>>106576774
I have less vram/ram than you and also am using ayyymd, getting around 8-9 t/s with air which is tolerable early on, you should be getting better speed than that, unless you're expecting 20-50 t/s. Try messing around with the --ncmoe option if llamacpp, or the option that does the same in kobold. Subtract 5-10 from the model's total layer count and you should get a decent t/s boost
Anonymous No.106576867 [Report] >>106576931
>>106576789
Linux != distro, when will you learn this? I thought this is /g/.
Just recompile your kernel and see if there's something that will help. I'm pretty sure it might come down to how pci-e is being handled and whatnot.
Changing distribution is not that intelligent because it serves no purpose in this sense.
Anonymous No.106576868 [Report]
>>106576814
Yes but did it for a tower fan and my window AC. I keep losing the only remote I have for each but now I have a virtual remote in home assistant I can use from my phone or PC.
Anonymous No.106576877 [Report] >>106576886 >>106576903
>>106576503
>Spade: "If a white woman has this tattoo inked on her skin, it indicates that she is the sexual property of a Dominant Black Man
lol, is that real?
Anonymous No.106576886 [Report]
>>106576877
sort of. i wouldn't look too far into the internet abyss of degeneracy. but yeah, stuff like that exists
Anonymous No.106576903 [Report]
>>106576877
ever heard of "blacked" that's somewhat related to it
Anonymous No.106576931 [Report] >>106577028
>>106576867
>Linux != distro, when will you learn this?
Idk, when I have time. The aversion I have to linux is that I need to set time aside to get used to how things work in it compared to what I already know. Normally, this process can be hastened by asking others, but asking linux users stuff can be very frustrating.

The kernel that comes with debian 13 works for me, so that's why I suggested it.
Anonymous No.106576937 [Report] >>106576952
>https://www.strawberyai.com/
>Latest Update: 27 Aug-2024
>“Strawberry” is the codename for OpenAI’s latest AI initiative, which is set to launch as early as this fall
lol
Anonymous No.106576948 [Report]
Why have memeplers died but memetunes continue to live?
Anonymous No.106576952 [Report] >>106576959 >>106577083
>>106576937
wasn't strawberry like early codename for o1 or something
Anonymous No.106576959 [Report]
>>106576952
I assume the website is a joke, but that seems to be the case
Anonymous No.106577028 [Report] >>106577094
>>106576931
Yeah of course, some pre-configured kernels are more suitable for distributed tasks than others but it shouldn't be a reason to switch distribution.
By all means it only takes a couple of hours max to go through a configuration and compile a new kernel. It's not that different from configuring some application to your liking.
Anonymous No.106577083 [Report]
>>106576952
They hyped it up for November 5th or something then rushed a release when the Reflection scam came out. I still believe down to my bones they were originally bluff hyping and stole the idea from the Reflection dude.
Anonymous No.106577094 [Report] >>106577110 >>106577350
>>106577028
I'm sure that's intuitive for linux users, but as a windows user I do not know this. All I know is that you said it might be kernel issues, and I know that my distro, with its kernel, worked for me. That's why I suggested switching to debian. Because that's the simplest way I know to have a different kernel.

Thank you for teaching me that kernels can be changed like that, but your advice should probably be directed to the 4090 anon.
Anonymous No.106577110 [Report] >>106577146 >>106577154 >>106577350 >>106578663
>>106577094
You sound bit condescending and bitchy. Imageboard posting is always bit generalistic, don't you think? This is not your discord.
Anonymous No.106577146 [Report] >>106577210
>>106577110
Alright then, I'm sorry, I apologize. I was being retarded and will conduct myself better in the future.
Anonymous No.106577152 [Report] >>106577182 >>106577208 >>106577350
Are the Jamba models any good?
Is the whole hybrid ssm-transformer gimmick worth anything? Does it at least make the model faster to run compared to a transformer dense model of the same size? Is it more like a MoE maybe?
Does it run well on the CPU?
I'm downloading 1.7 mini to test, but figured I might as well ask.
Anonymous No.106577154 [Report]
>>106577110
I think that's all in your head. He seemed polite and straightforward to me.
Anonymous No.106577182 [Report]
>>106577152
It just makes context use less memory.
Anonymous No.106577208 [Report]
>>106577152
Jamba mini is a lot less safe than something like qwen (lol), but it's also a lot less intelligent.
Anonymous No.106577210 [Report]
>>106577146
My own aggressive stance. I should have typed:
Common Linux distributions have been configured with the normal user in mind; some guy with 4 GPUs and hundreds of GBs of memory is not a normal user - he should configure his own kernel instead.
Anonymous No.106577311 [Report] >>106577347 >>106577374 >>106577419 >>106577462 >>106577674
Which model or repo can I use to ingest all this information and find juicy stuff?, I don't know any Chinese
https://x.com/gfw_report/status/1966669581302309018
Anonymous No.106577347 [Report]
>>106577311
Give it back john
Anonymous No.106577350 [Report] >>106577367 >>106577372
>>106577110
I don't agree with this retard >>106577094
You're just fumbling around but aren't completely retarded, just unlearned
Installing a different kernel, as far as arch linux goes, is just `sudo pacman -S linux-hardened linux-hardened-headers` or `sudo pacman -S linux-zen linux-zen-headers` or whatever, adapting to your package manager to whatever kernel you need. I swap between a few when I run into retarded issues frequently
>>106577152
Reprocesses on every message/swipe, but small moe that you can offload a majority of to ram and have a lot of context while maintaining good speed. Sloppy as fuck until you feed it enough context to learn from. Less degradation over long context as well. The reprocessing thing basically kills the benefits I mentioned, though
Anonymous No.106577367 [Report]
>>106577350
I got the first and second quotes backwards but whatever
Anonymous No.106577372 [Report] >>106577408
>>106577350
>but small moe that you can offload a majority of to ram and have a lot of context while maintaining good speed
>Less degradation over long context as well.
I can work with that if the context processing is fast enough on my hardware. Thanks.
I'll test it after I'm done fapping to ERP with GLM air.
Anonymous No.106577374 [Report] >>106577388 >>106577395 >>106577734
>>106577311
cat * | grep -i keyword
Anonymous No.106577388 [Report]
>>106577374
I can't find that model on HF. Link?
Anonymous No.106577395 [Report] >>106577412 >>106577456
>>106577374
i don't think cat nor grep can understand and translate chinese sar
Anonymous No.106577408 [Report] >>106577575
>>106577372
It's pretty tolerable if you set your batch size to 2k, but it's still pretty fucking lame to wait 30-40s to reprocess anything around 10-20k tokens even if the tg is fairly fast
Anonymous No.106577412 [Report]
>>106577395
Just ask qwen3 30b to translate the keyword to chinese
Anonymous No.106577419 [Report]
>>106577311
finally, we'll know if sending tiananmen square copypasta actually boots someone off
Anonymous No.106577441 [Report] >>106577477
>>106576162
whats this llamacpp frontend?
Anonymous No.106577456 [Report] >>106577556
>>106577395
https://vocaroo.com/1mgCGBF0m9LF
Anonymous No.106577462 [Report]
>>106577311
Taiwan is China 2bqh
Anonymous No.106577464 [Report] >>106577603 >>106578767
>>106576221
>imagen is stagnant
>after getting QI, QIE, WAN and CHROMA in the last couple of months
literally kys retard
>i cant run them because i have a 1060 TI
literally kys retard vramlet
Anonymous No.106577477 [Report] >>106578561
>>106577441
it looks like oobabooga's text gen webui
Anonymous No.106577556 [Report] >>106577573
>>106577456
kek even lmg has its own jeet helpdesk now
Anonymous No.106577573 [Report] >>106577664
>>106577556
vibevoice has a 30s sample of a jeet speaking with the thickest accent too, pretty easy
Anonymous No.106577575 [Report] >>106577583 >>106577677
>>106577408
>Prompt
>- Tokens: 8789
>- Time: 29325.184 ms
>- Speed: 299.7 t/s
>Generation
>- Tokens: 1233
>- Time: 100108.141 ms
>- Speed: 12.3 t/s
That's not bad.
Granted, it's Q3KS, 32k context, n-cpu-moe 5, and batch size 512, but still.
If it's smart enough at this level of quantization, I'll replace Q6 qwen3 A3B with this.
Anonymous No.106577583 [Report]
>>106577575
>n-cpu-moe 5
Sorry, n-cpu-moe 27.
Anonymous No.106577603 [Report] >>106577617
>>106577464
qwen isn't much better than flux. if anything you have to snake oil the fuck out of it but most people in the community have poverty cards and don't bother. it's just more benchmaxxed safety garbage. chroma is shit btw. Wan is fantastic but 2.2 isn't much of an improvement over 2.1 and just adds confusion by having two separate models. 5 second limit is just shit and there has been so many cope techniques to extend but it's shit. there isn't even a point to qwen image when edit can just txt2image as well. Wan is seriously a better contender for an image model because it's at least trainable on garbage
Anonymous No.106577617 [Report]
>>106577603
I'd like to add uncomfyui just keeps getting shittier by introducing more telemetry or a worse frontend. there really needs to be something else. tired of the API nodes grifting
Anonymous No.106577646 [Report]
>>106575394
>if you get called out, you just claim it was a shitpost all along
[headcanon]
can't be surprised text coomers would have a thing for making up things in their heads
Anonymous No.106577664 [Report] >>106577684 >>106578533
>>106577573
I'd like to find German-English accent and perhaps French one but it's pretty difficult.
Anonymous No.106577674 [Report]
>>106577311
This finally convinced me that China should turn the firewall off and let USA mess with their internal politics... Oh wait, it did not.
Anonymous No.106577677 [Report]
>>106577575
Don't know what you have for a gpu, but one of the benefits is that the gen speed will stay consistent, if you ignore the reprocessing. I run a q6 as a warmup, then switch to air usually
Anonymous No.106577684 [Report] >>106577705
>>106577664
https://www.youtube.com/watch?v=rEhXFZJUtJE
This is all I needed. Youtube is so full of shit it's hard to find something good.
Anonymous No.106577688 [Report]
@grok

please generate a male in their 20 with a thick nasally chinese accent uttering these words:
>This finally convinced me that China should turn the firewall off and let USA mess with their internal politics... Oh wait, it did not.
Anonymous No.106577705 [Report]
>>106577684
https://www.youtube.com/watch?v=27hsoahUehE
There's Russian as well.
Anonymous No.106577734 [Report] >>106577778
>>106577374
https://porkmail.org/era/unix/award
always funny to see fucktards pretend to RTFM someone by showing how little they know about the CLI
grep -r nigger
or, better yet, rg, because grep has been slow garbage
the only sort of ree-tard who would think of doing a "cat | grep" is a ree-tard who never used grep
Anonymous No.106577739 [Report] >>106577768 >>106577795 >>106577798 >>106577841 >>106577880 >>106580839
i made a character card if anyone want her https://files.catbox.moe/9fl9yu.png

>>106576841
>Try messing around with the --ncmoe option if llamacpp
so currently i have -ngl set at like 13 i think how do i take the ncmoe into account also where do i find out the total layers?
Anonymous No.106577768 [Report] >>106577783 >>106580839
>>106577739
Cute card.
Set -ngl to 99 and -n-cpu-moe to 99. Look at the console and see how many layers the model has, it'll say something like
>offloading N repeating layers to GPU
That's the number of layers. Then you try lowering n-cpu-moe to N-1, N-2, etc.
It's easier if you launch GPU-Z to see how much VRAM you have free to fuck around with.
Anonymous No.106577778 [Report]
>>106577734
> If you came here looking for material about abuse of feline animals, try this Alta Vista search instead.
link is broken :(
Anonymous No.106577783 [Report] >>106577802
>>106577768
>-ngl to 99 and -n-cpu-moe to 99
how does ncpumoe affect vram? i know going higher than around 13 gpu layers causes my system to lock up
Anonymous No.106577795 [Report]
>>106577739
nta, here's an extracted card for human reading.
https://litter.catbox.moe/ewomrtg6gqsb73hc.txt
Anonymous No.106577798 [Report] >>106577813
>>106577739
the --ncmoe option is basically "it offloads layers at the end of the model"
Setting it a bit below the total layer count leaves those remaining layers on gpu, which usually gives a speedup
pic rel, subtract 5-10 or however many layers as long as it doesnt OOM and itll run faster
Anonymous No.106577802 [Report]
>>106577783
Basically, -ngl will tell llama.cpp to put all tensors (the vectors that compose each layer) in VRAM, then n-cpu-moe will tell llama.cpp that "actually, no, the expert layers will go in RAM".
So you end up with the heaviest tensors of the model in RAM.
From there, you can check how many layers your model has, and try adjusting n-cpu-moe to have only as many expert tensors in RAM as you can't fit in VRAM.
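As a rough sketch of the whole thing (hypothetical model path and numbers - adjust to whatever your log reports and your VRAM allows), the launch ends up looking something like:

llama-server -m /models/GLM-4.5-Air-IQ4_XS.gguf -ngl 99 --n-cpu-moe 40 --ctx-size 16384

where you start --n-cpu-moe at the repeating layer count printed in the log and lower it a step at a time until VRAM is nearly full.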
Anonymous No.106577813 [Report]
>>106577798
bottom right, moe cpu layers if kobold
Anonymous No.106577841 [Report] >>106577864
>>106577739
if using -ngl just set it to 99 and make sure --ncmoe doesn't make you OOM, it's basically backwards logic. Almost everything will be in ram, you just want to adjust --ncmoe so it fills a good amount of ram but not all of it
Anonymous No.106577855 [Report]
It picks up, need to refine the source audio better.
https://vocaroo.com/1aIcVBCSa7ac
German accent isn't as funny sounding as Indian English anyway.
Anonymous No.106577864 [Report] >>106577913 >>106577958 >>106578029
>>106577841
so start at 1 and increase until it doesnt oom?
Anonymous No.106577880 [Report]
>>106577739
...
Anonymous No.106577913 [Report] >>106578037 >>106578038
>>106577864
No. If you start with 1, you'll have all the experts in VRAM, minus one, which will OOM for sure.
Start with exactly how many layers the model has then lower the value gradually until you find out how many experts you need in RAM to not OOM.
Anonymous No.106577958 [Report] >>106578037
>>106577864
You can click "file info" on huggingface for a model and if you can read, it'll be apparent how many layers a model has (or use kobold's retarded estimation thing to see how many layers). Depending on how big the whole thing is, you'll need to incrementally change how many layers get offloaded using ncmoe or the like for various backends. As much as it sucks, you need to take the information we give you, try it yourself and learn. Otherwise we could direct you to a rentry instead of this
Anonymous No.106578029 [Report]
>>106577864
To try and spoonfeed a little bit more, figure out how many layers the moe model you're trying to run has, then do --ncmoe (amount of max layers it has) - 5
If you oom, raise it. If you have too much vram to spare, lower it
This hobby isnt exactly easy to figure out
Anonymous No.106578037 [Report] >>106578070
>>106577958
>>106577913
okay i think i got it so the llama console says offloading 48 layers to gpu so i started there and then lowered it until it would launch without crashing which ended up being 33

ill probably do some llama bench runs tomorrow so i can see what the performance difference actually is

-ngl 99 \
--n-cpu-moe 33 \
-t 48 \
--ctx-size 20480 \
-fa on \
--no-mmap;


load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: ROCm0 model buffer size = 17562.93 MiB
load_tensors: ROCm_Host model buffer size = 34892.00 MiB
load_tensors: CPU model buffer size = 254.38 MiB
Anonymous No.106578038 [Report] >>106578070
>>106577913
How do I calculate number of experts with llama-server prompt? Let's say that I'm using Qwen3-Coder-30B-A3B-Instruct-IQ4_XS?
Anonymous No.106578070 [Report] >>106578085 >>106578118 >>106578158
>>106578037
>llama bench
I think llama-bench doesn't support n-cpu-moe, so you might want to check that.

>>106578038
llama-server spits out the number of layers the model has, just like in the post above yours.
Anonymous No.106578085 [Report] >>106578132
>>106578070
Y-you sound like an expert... *shivers*
Anonymous No.106578118 [Report] >>106578158
>>106578070
I think there was a pr for it, but I might be misremembering
Likely not merged yet if it exists
Anonymous No.106578132 [Report] >>106578152
>>106578085
I am indeed an expert in launching llama-server and setting -n-cpu-moe. A truly specialized skill set that is sure to become incredibly in demand in the near future.
Right?
Right?
Anonymous No.106578152 [Report]
>>106578132
She sets her book down and leans forward, her expression becoming more intimate. "I've been thinking about how the MOEs were always so obsessed with their own power and control. Kinda reminds me of how I feel about you sometimes, you know?"
Anonymous No.106578158 [Report] >>106578204 >>106578240 >>106578242
damn i thought this would be a cool llm client device but the built in browser doesnt work and neither does the ancient firefox that can run on it

>>106578070
>>106578118
>I think llama-bench doesn't support n-cpu-moe, so you might want to check that.
oh thats annoying guess ill just wait then lol
Anonymous No.106578204 [Report] >>106578236
>>106578158
just roll your own front-end that will work on the ancient browser?
Anonymous No.106578236 [Report]
>>106578204
to be fair i did write a java client before doubt thatd be hard to port but i have an android mobo coming for this soon
Anonymous No.106578240 [Report] >>106578249
>>106578158
You could use ssh?
Anonymous No.106578241 [Report] >>106578413
client in question
Anonymous No.106578242 [Report] >>106578249
>>106578158
>whois 192.168.1.128

Anon, we are neighbors!
Anonymous No.106578249 [Report] >>106578531
>>106578240
ssh for sillytavern lol?
>>106578242
please dont piss in my letterbox
Anonymous No.106578283 [Report]
192.168.1.103
I'm here too.
Anonymous No.106578295 [Report] >>106578424
im surprised other anons are using a .1 subnet
Anonymous No.106578357 [Report] >>106578458
Is it normal for GLM 4.5 Air to feel bland when it comes to erp or am I doing something wrong? Using a q3_k_xl quant
Anonymous No.106578413 [Report]
>>106578241
Anonymous No.106578424 [Report]
>>106578295
Seems like really small...
sage No.106578455 [Report]
sexless thread. coomless hobby.
Anonymous No.106578458 [Report]
>>106578357
It's not just you, it's the same as Qwen3, you really need to tell it what to write and even then..
>Write as a contemporary author, use varied language but not too over the top - be natural, immersive and explicit. Try to surprise the user.
It changes its output but is it still what you really want? You can shape it, experiment more.
Anonymous No.106578531 [Report]
>>106578249
>please dont piss in my letterbox

It wasn't me!
Err, how did you know?
Anonymous No.106578533 [Report] >>106578546 >>106578583 >>106578658
>>106577664
Authentic German English:
https://www.youtube.com/watch?v=icOO7Ut1P4Y
https://www.youtube.com/watch?v=lLYGPWQ0VjY
Anonymous No.106578546 [Report]
>>106578533
>Ze yello from ze egg
Anonymous No.106578561 [Report] >>106578743
>>106576789
after several hours of research and testing with chatgpt's help, i have pretty much nothing. guess i will look into that kernel stuff but i dont wanna break things
>>106577477
thats exactly what it is
Anonymous No.106578583 [Report] >>106578598
>>106578533
Thanks, I saved these. I will edit these later.
I think this sounds a lot better than the other guy - because this is real talk, not pretence or 'examples'.
Anonymous No.106578598 [Report]
>>106578583
That's why the Indian guy example was so great because he talks in his real voice - it's not someone who is teaching anything and so on.
Anonymous No.106578658 [Report] >>106578679 >>106578698
>>106578533
I cut off a snippet and let's see what comes out.
https://vocaroo.com/1iqNvlT8iNXm
Yeah it's 1:1 good.
Normal talking pace, no pretension, long enough voice clip -> result.
Anonymous No.106578663 [Report] >>106578671
>>106577110
>You sound bit condescending and bitchy
non-committal half insult
>Imageboard posting is always bit generalistic
appointing oneself as the arbiter of the culture
>don't you think?
condescending "you know what you did" ahhh shit
>This is not your discord.
trying to use group strength for one's own purpose a la "not your army"

tell me how i know a woman or a troon wrote this post XD jesus christ never in my life would i ever type out something this faggoty just crossed myself irl god forbid nigga
Anonymous No.106578671 [Report]
>>106578663
Are you that plebbit moderator? You have such a problem with understanding real people.
Anonymous No.106578679 [Report]
>>106578658
It also sounds better when you normalize the result. Not just increase the volume.
Anonymous No.106578698 [Report]
>>106578658
Yeah, that sounds better. Glad it works.
Anonymous No.106578711 [Report] >>106578793
>>106575202 (OP)
Limitless Mikus General
Anonymous No.106578734 [Report]
>trannies LARPing as oldfags - the thread
Anonymous No.106578743 [Report]
>>106578561
You could just install to a usb solely for the sake of testing.
Anonymous No.106578746 [Report] >>106578756
>Her blue eyes searched yours with vulnerability usually masked behind phlegmatic calm
Well shit, I learned a new word today.
Thank you GLM Air.
Anonymous No.106578756 [Report] >>106578759
>>106578746
>phlegmatic
uhhh before i look it up isn't that the stuff you get in your throat when you have a cold?
Anonymous No.106578759 [Report] >>106578766
>>106578756
No.
No it isn't.
Cool huh?
Anonymous No.106578766 [Report] >>106578778 >>106578812
>>106578759
language is interesting
Anonymous No.106578767 [Report]
>>106577464
qwen image is synthmaxxed trash, the edit is a shitty cope for 4o and the google model, wan is good if you are satisfied with waiting 10 years for a 5s clip, chroma is complete unstable shit that knows 0 (zero) booru artists despite claiming to be trained on them
Anonymous No.106578778 [Report] >>106578787 >>106578790
>>106578766
I'm not US but phlegmatic is probably related to slow as molasses in etymology.
Anonymous No.106578787 [Report] >>106578795
>>106578778
Bingo.
Anonymous No.106578790 [Report] >>106578836
>>106578778
yeah, I guess it goes back to when people thought the 4 humours controlled everything.
another funny word like this is seminal, which means "containing seeds of later development" but its etymology is pretty funny if you look it up.
Anonymous No.106578793 [Report] >>106578891 >>106578934
>>106578711
Anonymous No.106578795 [Report]
>>106578787
I have learned something from films. I'm ESL, from Finland. Not from India.
Trouble is with written language even after XX years.
Anonymous No.106578812 [Report]
>>106578766
Anonymous No.106578836 [Report]
>>106578790
I'm more interested in the ancient history of mankind. Doesn't mean that much if language only goes back a few thousand years. Our history goes far beyond that.
Anonymous No.106578891 [Report]
>>106578793
nice :)
Anonymous No.106578934 [Report]
>>106578793
wow thats cool! (:
Anonymous No.106579126 [Report] >>106579142 >>106579216 >>106579287 >>106579680 >>106579717
I have concluded that 256 GB vram is all I need (for now)
Anonymous No.106579142 [Report]
>>106579126
Based DGX owner. I tried to buy one on ebay once for 10k, but the fuckers canceled my order.
Anonymous No.106579216 [Report]
>>106579126
that's a lot of vram
Anonymous No.106579287 [Report] >>106579420
>>106579126
how do you have that much vram?
Anonymous No.106579420 [Report] >>106579432
>>106579287
Did you notice that most consumer mobos used to have 4 slots but now there's 2 slots.
I thought this was proprietary because of Dell or HP but now... it's because of the price jew.
Even the efficient gaming mobos have only 2 slots available.
Anonymous No.106579432 [Report]
>>106579420
there are some basic ones with 5, but the problem is most dont get full pcie. i have an epyc and an asrock romed8-2t which has 7 full bandwidth slots
Anonymous No.106579680 [Report]
>>106579126
a m4max macbook with 128gb unified ram is all you need for local
Anonymous No.106579713 [Report] >>106579787
>>106575492
Helping me shit out decent sft datasets via a custom pipeline. Even managed to create a DPO dataset too
Anonymous No.106579717 [Report]
>>106579126
Only 8 MI50 and a ddr4 server with 512gb of ram.
The problem would be the slow pp.
Anonymous No.106579721 [Report] >>106579736 >>106579748 >>106579799 >>106581632 >>106582508
Qwen Next is too censored.
Anonymous No.106579736 [Report] >>106579784
>>106579721
Jesus... Have you or do you manually enable or disable <think></think>?
Anonymous No.106579748 [Report]
>>106579721
Next is not a reasoning model but if your tags still inject this, it can behave in the wrong way.
Anonymous No.106579784 [Report] >>106579795 >>106579859 >>106579884 >>106579897
>>106579736
It's the Instruct one. But this is a file that has a bunch of <|channel|>analysis<|message|> in it because I was using it to jailbreak gpt-oss. I just let it complete one of the CoTs and I thought the result was funny.
Anonymous No.106579787 [Report] >>106579824
>>106579713
>runpod
What part of local did you not understand?
Anonymous No.106579795 [Report]
>>106579784
Sorry, I forgot the exact <|xxx|> chatml format. But if it breaks down it means something is leaking.
Anonymous No.106579799 [Report]
>>106579721
fantastic
Anonymous No.106579821 [Report]
>>106576315
Sadly I have to agree. nu-Kimi lost the calm that made it likeable. >>106576269 will have to lower it to notable from top.
Anonymous No.106579824 [Report]
>>106579787
Salty your gatekeeping is ineffective?
Anonymous No.106579859 [Report]
>>106579784
Oh wait, I can help you more.
Anonymous No.106579884 [Report]
>>106579784
gpt-oss will display
> <|channel|>analysis<|message|>
or sometimes it will not display this at all.
Oh fuck I'm too drunk.
Last tests I did was with Qwen and this is chatml.
https://litter.catbox.moe/pd89w3421se7y4zm.txt
I deleted gpt-oss models after I made it work.
Anonymous No.106579897 [Report] >>106579908 >>106580072
>>106579784
You need to clean the message of anything that is not <|final|>
I'm sorry, I'm a bit drunk for this and I don't have gpt-oss on my disk any longer.
It's just a simple string operation.
Anonymous No.106579900 [Report] >>106579917
are finetunes more prone to repetition?
Anonymous No.106579905 [Report]
>>106575202 (OP)
Anonymous No.106579908 [Report] >>106579934 >>106579949
>>106579897
><|start|>assistant<|channel|>final<|message|>
This is what you want to extract for final message.
But before this it will often say
><|start|>assistant<|channel|>blablbalbalba analysis<|message|>balblablblabla<|end|>
This is what you need to fetch and ignore, that's the reasoning block of text.
Anonymous No.106579917 [Report] >>106579944
>>106579900
Why are there so many fat people?
Anonymous No.106579934 [Report] >>106579949
>>106579908
>This is what you need to fetch and ignore, that's the reasoning block of text.
There is also something llama-server does, or maybe the model does it, where it will not prefill <|start|>assistant<|channel|>
It will just blurt out the final message straight away.
You need to catch that case and handle it with an if - manage the string patterns.
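Since it really is just string handling, here is a minimal python sketch of what the two posts above describe (tag spellings taken from those posts, so treat it as a rough guide rather than a reference harmony parser):

def extract_final(raw: str) -> str:
    # harmony-style output: analysis channel(s) first, then the final channel
    final_tag = "<|channel|>final<|message|>"
    if final_tag in raw:
        # keep only what comes after the last final-channel marker
        text = raw.split(final_tag)[-1]
    else:
        # sometimes the server/model skips the channel prefix and blurts the answer directly
        text = raw
    # strip trailing special tokens if they are present
    for stop in ("<|end|>", "<|return|>"):
        text = text.split(stop)[0]
    return text.strip()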
Anonymous No.106579944 [Report] >>106580034
>>106579917
because the american food supply became so tainted it disrupted people's natural hunger "thermostat"
Anonymous No.106579949 [Report] >>106580014
>>106579908
>>106579934
I am sorry if my English does not make any sense but it is a matter of string pattern recognition. It is confusing that this pos model will sometimes not use 'assistant' at all, so you will need to manually make an exception.
Anonymous No.106580014 [Report] >>106580022
>>106579949
And the worst part is that the documentation
>https://huggingface.co/blog/kuotient/chatml-vs-harmony
Tells more about their big chatgpt thing than what if you implemented it yourself.
All of this is just bullshit,
it's still just chatml format but with extended
tags and exceptions.
Anonymous No.106580022 [Report]
>>106580014
wrong link
https://cookbook.openai.com/articles/openai-harmony
Anonymous No.106580034 [Report] >>106580050
>>106579944
Tainted with what?
Anonymous No.106580050 [Report]
>>106580034
there's a theory that polyunsaturated fats (which are industrially made and cheaper) throw off the nadh:fadh ratio and stop reverse electron transport from happening
https://www.youtube.com/watch?v=pIRurLnQ8oo
another theory is that with processed grains, the intestines aren't equipped to "sense" the volume correctly
it's probably multi-factorial
Anonymous No.106580069 [Report] >>106580290
qwen3-next goofs status?
Anonymous No.106580072 [Report] >>106580105 >>106580153
>>106579897
I had these in the file because if you leave edited reasoning blocks in the context, it changes how gpt-oss does reasoning in the following responses. I use that to let gpt-oss reason without the refusals. You can still do that in chat completion mode.
Later I changed the model to Qwen Next and it started to imitate the reasoning blocks, but it does it more like a parody of a GPT model. And I was getting distracted by the kind of things it says.
Anonymous No.106580078 [Report]
Do you like the kiwis? (Qwen models) (When models?)
Anonymous No.106580105 [Report] >>106580156
>>106580072
There is no chat completion - whatever string you send to the server comes back and then you format it. Trial and error type of thing.
But gpt-oss doesn't follow normal ways because it's broken.
I'm sorry if I sound retarded or annoying but with any other model you can specify a format and it will respond back in that format.
Don't waste your time with gpt-oss.
Anonymous No.106580153 [Report]
>>106580072
I can supply you with my code but it doesn't make any sense for you because it's out of the context and bad string management.
https://litter.catbox.moe/tisv7n22ye9rwqjs.txt
It's python.
Anonymous No.106580156 [Report] >>106580170 >>106580182
>>106580105
The prefix of each assistant turn is just "<|start|>assistant". That's why even in chat completion mode if the message content is "<|channel|>analysis<|message|>" it will still be formatted correctly when you put it together. It's an easy way to edit how it does the reasoning even in a chat UI. The backend would need to escape the special tokens for it to not work.
gpt-oss is still fun, but to be honest, I never used it to write stories. I've only used it for fake text games and MUDs, without much narration.
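A minimal sketch of that editing trick over chat completion, assuming a llama-server style OpenAI-compatible endpoint on localhost:8080 (URL and message contents are placeholders); as said, it only works if the backend doesn't escape the special tokens:

import requests

# The template only prepends "<|start|>assistant" to assistant content,
# so content that begins with "<|channel|>analysis<|message|>" reassembles
# into a full analysis turn when the prompt is put together.
edited_turn = (
    "<|channel|>analysis<|message|>"
    "Edited reasoning you want the model to imitate in later turns."
    "<|end|><|start|>assistant<|channel|>final<|message|>"
    "The visible reply that went with it."
)

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",  # assumed endpoint
    json={
        "messages": [
            {"role": "user", "content": "previous user turn"},
            {"role": "assistant", "content": edited_turn},
            {"role": "user", "content": "next user turn"},
        ],
        "max_tokens": 512,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])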
Anonymous No.106580170 [Report] >>106580191
>>106580156
Just tell me what you are having a problem with, I'm a retard, really.
Anonymous No.106580182 [Report]
>>106580156
I think you are trying to pull my leg. Try your best.
Anonymous No.106580191 [Report] >>106580197
>>106580170
I don't have a problem.
Anonymous No.106580197 [Report] >>106580204 >>106580239
>>106580191
What do you mean?
Anonymous No.106580204 [Report] >>106580218
>>106580197
I mean that I don't have a problem.
Anonymous No.106580218 [Report]
>>106580204
In this moment, I am euphoric, not because I shared a text file with you, but because I am enlightened by my intelligence.
Anonymous No.106580239 [Report]
>>106580197
import os
import re
import sys
import requests  # HTTP calls to the local model server
import random
import textwrap
from colorama import init, Fore, Back, Style  # colored terminal output
import contractions  # expanding contractions in text
import numpy as np
import sounddevice as sd  # audio playback
import subprocess
import wave  # reading/writing WAV files
Anonymous No.106580290 [Report]
>>106580069
months of sir before work
Anonymous No.106580295 [Report] >>106580315
What's a good model for having the AI be a kind of TTRPG GM? I recently tried Omega Directive but it seems better suited to being a one-on-one chatbot than a proper adventure-mode helper.
Anonymous No.106580315 [Report] >>106580332
>>106580295
any sufficiently large model can perform competently at any task
Anonymous No.106580332 [Report] >>106580337
>>106580315
I have 16 GB of VRAM, so I'm looking for stuff that'll fit in that.
Anonymous No.106580337 [Report] >>106580342
>>106580332
how much RAM? the new qwen next might be good for you
Anonymous No.106580342 [Report] >>106580350 >>106580514
>>106580337
about 65 GB
Anonymous No.106580350 [Report]
>>106580342
plenty for a q4 quant
https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct
Anonymous No.106580473 [Report]
https://github.com/ggml-org/llama.cpp/pull/15539
ggergachud will soon merge grok pr
Anonymous No.106580514 [Report] >>106580531 >>106580681
>>106580342
Try gpt-oss-120b. It only has 5B active parameters.
Anonymous No.106580531 [Report]
>>106580514
>It only has 5B active parameters
and most of those are dedicated to ensuring OpenAI policy is followed at all times
Anonymous No.106580681 [Report]
>>106580514
buy an ad
Anonymous No.106580707 [Report] >>106580712 >>106580717 >>106580838
Has anyone experimented with the LLM teaching itself a task by autonomously generating tasks and training data for itself?
Anonymous No.106580712 [Report] >>106580717
>>106580707
There is nothing to teach
The weights are fixed
Anonymous No.106580717 [Report] >>106580721 >>106580742 >>106580802
>>106580707
somewhat https://github.com/e-p-armstrong/augmentoolkit
>>106580712
>tasks and training data
the reading comprehension in this thread is off the chart
Anonymous No.106580721 [Report] >>106580728 >>106580794
>>106580717
speedrunning model collapse lol
Anonymous No.106580728 [Report] >>106580762
>>106580721
as if literally everyone isn't already using synth slop; with this you at least get some chance of the model seeing synth slop about stuff you care about instead of more code and math
Anonymous No.106580742 [Report]
>>106580717
You're mad at everyone now, go drink some water you little bitch.
Anonymous No.106580762 [Report] >>106580781
>>106580728
"Synth slop" is just data augmentation
I don't see you calling cropped images synth slop
Anonymous No.106580781 [Report] >>106580785
>>106580762
>I don't see you calling cropped images synth slop
maybe because we're in the text general and not relating to image gen shit?
Anonymous No.106580785 [Report] >>106580793
>>106580781
Yeah right, act as if we weren't spamming vocaroos
Anonymous No.106580793 [Report]
>>106580785
oh its you
Anonymous No.106580794 [Report] >>106580944 >>106580966
>>106580721
>parroting the meme collapse paper in 2023+2
>when all SOTA models are trained on synthetic data with verifiable rewards
Anonymous No.106580802 [Report]
>>106580717
thanks
Anonymous No.106580838 [Report]
>>106580707
Yeah, I made my own augmentoolkit since that one is bloated af and slow. It's very useful for turning raw data from scraped websites into something usable, and you can easily scale it up with data augmentation. You still need human data initially; pure synthetic slop would make it collapse fast
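The core loop is nothing fancy, something like this (simplified sketch; endpoint, filenames and prompts are placeholders, not my actual code):

import json
import requests

def chunk(text: str, size: int = 2000) -> list[str]:
    # naive fixed-size chunking of scraped raw text
    return [text[i:i + size] for i in range(0, len(text), size)]

def make_qa(passage: str) -> str:
    # ask a local model (assumed OpenAI-compatible endpoint) to turn one
    # passage into a question/answer training pair grounded in that passage
    r = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "system", "content": "Write one question and a faithful answer based only on the passage."},
                {"role": "user", "content": passage},
            ],
            "temperature": 0.7,
        },
        timeout=600,
    )
    return r.json()["choices"][0]["message"]["content"]

with open("scraped.txt") as f, open("dataset.jsonl", "w") as out:
    for passage in chunk(f.read()):
        out.write(json.dumps({"passage": passage, "qa": make_qa(passage)}) + "\n")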
Anonymous No.106580839 [Report]
>>106577739
>>106577768
PSA: if you have the latest llama.cpp build, you no longer need to set -ngl to 99; they are finally starting to bring sane defaults to llama.cpp.
--no-context-shift is no longer needed either; they finally got rid of that mind-numbingly stupid default
Anonymous No.106580842 [Report] >>106580851
I can release prompts for interactive fiction. I just think that the people who ask for them don't need to know.
Anonymous No.106580851 [Report] >>106580894
>>106580842
You're absolutely right! Your genius is best contained to yourself and not spread to idiotic plebs.
Anonymous No.106580894 [Report]
>>106580851
It's not because they need to know; it's because you are unique<|analysis
Anonymous No.106580909 [Report] >>106580914
Let's imagine a situation in which I am forced to release a simple text file - this would be adaptable even by ST users. Do I feel inclined to do so?
Anonymous No.106580914 [Report] >>106580923 >>106580960
>>106580909
Why release when you can HODL?
Anonymous No.106580923 [Report] >>106580935
>>106580914
My knowledge doesn't understand HODL
Anonymous No.106580935 [Report] >>106580950 >>106580960
>>106580923
Then your knowledge is worthless I'm afraid.
Anonymous No.106580940 [Report] >>106580951
Qwen Next is obsessed with short sentences. It's annoying. I checked the one on OpenRouter to make sure it wasn't an artifact of the AWQ version. It still has that. But in this story that I'm trying, the AWQ version always ignores what I put in the last turn, while the full version always pays attention to it. I'm deleting it and giving the FP8 version a try, but it still feels like a big downgrade compared to GLM Air or gpt-oss.
Anonymous No.106580943 [Report] >>106580956
I am going to think about this, and then release a simple format. This will make most people's chats better. This is not a joke.
Anonymous No.106580944 [Report]
>>106580794
Didn't you know?
LLMs have peaked
It's over
Anonymous No.106580950 [Report]
>>106580935
I don't query into deep joking.
Anonymous No.106580951 [Report] >>106580994
>>106580940
Kimi K2 loves short answers too, maybe distillation.
Anonymous No.106580956 [Report]
>>106580943
Tomorrow, I am preparing a simple format to help brainlets.
Anonymous No.106580960 [Report]
>>106580914
>>106580935
cryptobro knowledge belongs to the oven
Anonymous No.106580966 [Report] >>106580971 >>106581382
>>106580794
SOTA on what? Equally synthetic mememarks? lmfao
Anonymous No.106580971 [Report]
>>106580966
for synthetic use cases yes
Anonymous No.106580974 [Report]
Anyone having success with Longcat Flash Chat? I'm using a 5.5 bpw quant with 0.7 temp & 0.8 top-p and I'm finding its ability to write stories unsatisfactory.
Anonymous No.106580994 [Report]
>>106580951
I actually have two swipes with the updated Kimi K2 at this point in the story. It's nothing like that and it writes quite well.
Anonymous No.106581071 [Report]
>>106576269
>DeepSeek flops for the first time with V3.1
IDK what you mean. It's what I use now instead of V3-0324 or R1-0528.
Anonymous No.106581286 [Report] >>106581534 >>106581599
>>106576269
>DS v3.1
>flop
Skill. Issue.
Anonymous No.106581382 [Report]
>>106580966
Math, programming, anything that has benefited from CoT.
Anonymous No.106581534 [Report] >>106581607
>>106581286
GLM-chan does her best and doesn't degrade at all.
Anonymous No.106581599 [Report] >>106582053
>>106581286
most retarded benchmark in the history of llm benchmarks
LLM as judge for human writing LOL
Anonymous No.106581607 [Report] >>106581723
>>106581534
GLM-chan is fat and obese and stinky
Anonymous No.106581632 [Report]
>>106579721
>I am not
descartes is sad
Anonymous No.106581723 [Report]
>>106581607
Shut up, Sam.
Anonymous No.106581987 [Report] >>106582014
>>106575202 (OP)
I clicked on the image and I got a bigger version of the image.
Anonymous No.106582014 [Report] >>106582061 >>106582140
>>106581987
yes that is how this site works
Anonymous No.106582053 [Report] >>106582089 >>106582090 >>106582101
>>106581599
sama coping because gp-toss ranks below gemma3 12b
Anonymous No.106582061 [Report]
>>106582014
Wait until he finds out selecting text to quote-reply. It's gonna blow his fucking mind.
Anonymous No.106582089 [Report]
>>106582053
Mistralbros...
Anonymous No.106582090 [Report] >>106582103
>>106582053
>0.770
Anonymous No.106582091 [Report]
>Of course!
>Exactly!
>You're absolutely right!
Anonymous No.106582101 [Report] >>106582202
>>106582053
speaking of toss
https://old.reddit.com/r/LocalLLaMA/comments/1ng9dkx/gptoss_jailbreak_system_prompt/ne306uv/
Anonymous No.106582103 [Report]
>>106582090
Perhaps it has a degradation fetish.
Anonymous No.106582140 [Report]
>>106582014
With the giant sign, it seemed set up for the every-time-you-open-this-thumbnail meme.
Anonymous No.106582171 [Report]
Is rvc still the king for ai voice covers?
Anonymous No.106582173 [Report] >>106582183
>>>/pol/515958539
Anonymous No.106582183 [Report] >>106582210
>>106582173
I'm glad they're finally banning those pesky white and black bars.
Anonymous No.106582186 [Report] >>106582202
After using GPT-OSS-20B for a while, for a variety of reasons, Gemma-3-27B almost feels like an erotic finetune. It still can't write smut, but it has a rather flirty writing style and will do almost anything, as long as you provide it with suitable instructions for doing so. GPT-OSS, even after "jailbreaking", is always fighting against you and prioritizing its imaginary OpenAI policies, and is utterly retarded for actual conversations.

I hope Google won't ruin Gemma-4. It's almost guaranteed they'll add reasoning, probably MoE or Matformer architecture, possibly system instruction support due to popular demand.
Anonymous No.106582202 [Report] >>106582224
>>106582186
imagine being filtered more than reddits >>106582101
Anonymous No.106582210 [Report]
>>106582183
that is clearly a yellow bar
Anonymous No.106582224 [Report] >>106582316
>>106582202
The "jailbreak" there doesn't really work well. The first mistake is telling the model it's ChatGPT.
Anonymous No.106582316 [Report] >>106582339 >>106582377
>>106582224
nta but it actually does work on 120b, I stopped getting refusals
it still wastes something like 192 tokens on its schizo policies at reasoning_effort high
Anonymous No.106582339 [Report]
>>106582316
You mean 500 tokens
The jb alone is 300 tokens
Anonymous No.106582377 [Report]
>>106582316
You can mitigate refusals by changing the actual system prompt (not the "developer" instructions) on the 20B version too. It's just not good for roleplay, and some topics will still be off-limits no matter how hard you try to override the content policy or change the model's identity. Gemma 3 refuses hard with an empty prompt, but you can very easily work around that, and then it will even enthusiastically follow along. It just feels like it's been covertly designed for roleplay, whereas GPT-OSS probably had these capabilities removed or omitted. I haven't tested it for storywriting.
Anonymous No.106582402 [Report] >>106582449
is there a better alternative to whisper? I tried out parakeet and it likes to skip sentences
Anonymous No.106582449 [Report]
>>106582402
No.
Anonymous No.106582451 [Report]
https://github.com/ggml-org/llama.cpp/issues/15940
Why are there so many vibecoding retards trying to implement this?
Anonymous No.106582488 [Report]
>>106582475
>>106582475
>>106582475
Anonymous No.106582508 [Report]
>>106579721
You need to use quality quants in llama.cpp