
Thread 106163327

542 posts 160 images /g/
Anonymous No.106163327 >>106163517 >>106166638
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106159744 & >>106156730

►News
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1
>(08/05) TabbyAPI adds logprobs support for exl3: https://github.com/theroyallab/tabbyAPI/pull/373
>(08/04) Support for GLM 4.5 family of models merged: https://github.com/ggml-org/llama.cpp/pull/14939

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106163346
►Recent Highlights from the Previous Thread: >>106159744

--Fundamental CUDA scheduling limitations in MoE model inference with dynamic workloads:
>106159804 >106159879 >106159941 >106159892 >106159939 >106160442 >106160454 >106160634 >106160687 >106160697 >106161203 >106161244 >106161319 >106161343 >106161716 >106161772 >106160704 >106160773 >106160797 >106160960 >106161088
--Debate over GLM4.5's reliability amid claims of infinite generation and poor curation versus low hallucination performance:
>106161761 >106161773 >106161797 >106161919 >106161925 >106161926 >106161933 >106161974 >106161987 >106161997 >106161780 >106161826 >106161861 >106161915
--Debate over MXFP4 quantization efficiency and implementation in llama.cpp:
>106160230 >106160249 >106160378 >106160405 >106160434 >106160408 >106160455 >106160770
--gpt-oss-120b excels at long-context code retrieval despite roleplay limitations:
>106159798 >106159872 >106159895 >106159919
--Choosing between GLM-4.5 Q2 and Deepseek R1 with dynamic quants on high-RAM system:
>106160040 >106160056
--Comparison of TTS models: Higgs, Chatterbox, and Kokoro for quality, speed, and usability:
>106161046 >106161091 >106161164 >106161335
--GLM-4.5 Air praised for local performance, gpt-oss-120b criticized for over-censorship:
>106159855 >106159875 >106159908 >106159929 >106159946 >106159956
--Prompt-based agent modes with potential for structured grammar improvement:
>106161701
--Anons await next breakthroughs in models, efficiency, and affordable hardware:
>106160460 >106160477 >106160481 >106160487 >106160494 >106160508 >106160524 >106161134 >106161055 >106161071 >106160717
--Skepticism and mockery meet Elon's claim of open-sourcing Grok-2:
>106160521 >106160539 >106160545 >106160579 >106160608 >106160692 >106160744 >106160759 >106160784 >106160913
--DeepSeek V3 with vision shows strong image understanding in early tests:
>106159779 >106159794 >106160580 >106160631
--Miku (free space):
>106160040 >106161134

►Recent Highlight Posts from the Previous Thread: >>106159752

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106163373 >>106163426
piss
Anonymous No.106163383 >>106165280
the only thing that excites me about the possibility of the grok2 release is actually grok2-mini. I'm gonna guess the full-sized grok2 model will be a 1T-A100B model with the IQ of llama3
Anonymous No.106163389 >>106163408
>>106161679
>her voice a guttural, erotic promise
Anonymous No.106163392 >>106163585 >>106163746 >>106163788
remember this?
>our research team did something unexpected and quite amazing and we think it will be very very worth the wait
LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOL
Anonymous No.106163403 >>106163468
>>106163350
So when it says stuff like "the policy says X is okay. The policy says Y is forbidden", is it actually referencing a specific document?
Anonymous No.106163408
>>106163389
i hope it's a promise of something darker, more primal
Anonymous No.106163426
>>106163373
Based
Anonymous No.106163430 >>106163474 >>106163763
so what are you guys actually doing with these massive models??
Anonymous No.106163442 >>106163746
Are there video local models yet or does that still need supercomputers?
Anonymous No.106163445 >>106163476 >>106163499
cockbench is now officially reddit culture with 555 updoots
Anonymous No.106163454 >>106163490
https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507
Anonymous No.106163468
>>106163403
Probably not. That's just what the training examples looked like. And over enough iterations that blurred together with training examples that consist of scraped forum posts like "YOUR POST HAS VIOLATED THE POLICY NUMBER BLAH BLAH BLAH PAGE 3 OF THE SITEWIDE RULES" etc.
The rulebook doesn't actually exist.
Anonymous No.106163474
>>106163430
Piss
Anonymous No.106163476
>>106163445
wow for once something trickled down from here instead of the reverse
Anonymous No.106163490 >>106163746
>>106163454
What is
> 2507
?
Anonymous No.106163499
>>106163445
Drummer, the creator of cockbench, got even more updoots on his post.
https://www.reddit.com/r/LocalLLaMA/comments/1migl0k/gptoss120b_is_safetymaxxed_cw_explicit_safety/
Anonymous No.106163504
>>106161701
>I think we could do that a lot better using json schema/BNF grammar.
It seems to work this way already if tool_choice is set to required, at least in vLLM:
guided_decoding = GuidedDecodingParams.from_optional(
    json=self._get_guided_json_from_tool() or self.guided_json,
    regex=self.guided_regex,
    choice=self.guided_choice,
    grammar=self.guided_grammar,
    json_object=guided_json_object,
    backend=self.guided_decoding_backend,
    whitespace_pattern=self.guided_whitespace_pattern,
    structural_tag=self.structural_tag,
)
There's a function called "_get_guided_json_from_tool".
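As a rough sketch of the idea (the helper name comes from the snippet above, but the schema shape here is my guess, not vLLM's actual code): deriving a guided-decoding JSON schema from an OpenAI-style tool definition might look like this.

```python
import json

# Hypothetical sketch: build a guided-decoding JSON schema from an
# OpenAI-style tool definition, roughly what a helper like
# _get_guided_json_from_tool could do. Not vLLM's actual implementation.
def guided_json_from_tool(tool: dict) -> dict:
    fn = tool["function"]
    # Constrain the model's output to a single call of this tool:
    # {"name": "<tool name>", "arguments": {...matching the tool's params...}}
    return {
        "type": "object",
        "properties": {
            "name": {"const": fn["name"]},
            "arguments": fn["parameters"],
        },
        "required": ["name", "arguments"],
    }

tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

schema = guided_json_from_tool(tool)
print(json.dumps(schema, indent=2))
```

With a schema like that, the guided-decoding backend can mask out any token that would take the output off the grammar, which is exactly the "json schema/BNF" behavior being discussed.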
Anonymous No.106163505 >>106163513 >>106163521 >>106163543 >>106163596 >>106163628
how do I use mikupad with ollama
Anonymous No.106163513
>>106163505
>slowllama
Anonymous No.106163517 >>106163605 >>106163649
>>106163327 (OP)
Are the gpt oss models any good?
Anonymous No.106163520
another reddit gemmie
Anonymous No.106163521
>>106163505
>troonkupad
Anonymous No.106163539 >>106163627 >>106163635 >>106163652 >>106165445
every time I hear something new about or from anthropic and claude it sounds more and more like an actual sect slash cult
https://news.ycombinator.com/item?id=44806640
>Anthropic has a tough alignment interview. Like I aced the coding screener but got rejected after a chat about values. I think they want intense people on the value/safety side as well as the chops.
>got rejected after a chat about values
>A CHAT ABOUT VALUES
Anonymous No.106163543 >>106163586
>>106163505
Why do you need Mikupad? Just type in ollama run gpt-oss and enjoy the best local has to offer.
Anonymous No.106163585
>>106163392
He was right. The memes were awesome.
Anonymous No.106163586 >>106163624 >>106163675
>>106163543
I want full control over the chat template and modify model responses
Anonymous No.106163590
>>106162954
based professional LLM rapist.
this is also the fate of safetymaxxed le cunny daughter model
Anonymous No.106163591 >>106163634
Reddit has all the cool benchmarks like the spinning hexagon and cockbench. What did you lonely faggots ever contribute?
Anonymous No.106163596
>>106163505
use the openai api and check "chat completion api" because ollama doesn't really work with the classic completion on their OAI endpoint
you will lose a lot of what makes mikupad great, including the ability to see token prediction percentages
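for reference, a sketch of the payload shape mikupad would end up sending in that mode (URL and model name are placeholders, and whether ollama honors logprobs is exactly the open question):

```python
import json

# Chat completion mode: the backend applies the chat template itself,
# you only hand it message roles. This is why you lose raw-prompt control.
payload = {
    "model": "llama3",  # placeholder; whatever `ollama list` shows
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    # ollama may ignore this, hence no token prediction percentages:
    "logprobs": True,
}
# POST this to http://localhost:11434/v1/chat/completions (OAI-compatible endpoint)
print(json.dumps(payload, indent=2))
```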
Anonymous No.106163605 >>106163652
>>106163517
they are good at answering AIME questions and bad at literally everything else
Anonymous No.106163624
>>106163586
Being able to do that would give you the ability to circumvent safety protocols which would be incredibly unsafe. I cannot help you take any actions that may be dangerous.
Thank you for your understanding.
Anonymous No.106163627
>>106163539
I wonder what they're looking for.
I'm okay with the idea of making an effort to make it so models in their default assistant configurations don't tell people to commit violence or kill themselves, or give bad advice.
But if they want me to tell them that I think fiction is reality and we need to make sure nobody even does pretend violence, I can't get with that.
Anonymous No.106163628 >>106163659
>>106163505
> install mikupad
> hook it to ollama via ollama's exposed API
What about above isn't working for you?
Anonymous No.106163634
>>106163591
you won't get pennies for your blue checkmark by ragebaiting here randeep
Anonymous No.106163635
>>106163539
are corpos going to compete on safetymaxxing now

truly, only local can save local at this point
Anonymous No.106163637 >>106163660 >>106163706
https://github.com/sapientinc/HRM
https://arxiv.org/pdf/2506.21734
Nothingburger, or is it really a big leap? Seems like it is, but I haven't read the paper myself, I'm too lazy. Some people have been saying it's one of those situations where they train the model in a way that performs well on tests for optics. But still, just 27 million params?!
Anonymous No.106163649 >>106163686
>>106163517
They are the safest ever.
Anonymous No.106163652
>>106163539
Honestly I think it’s nice. It’s completely invalidated by them working with the DOD, but it’s nice. Better a well meaning schizo than a literal confirmed incestuous child rapist psychopath.
>>106163605
Yeah I just saw the cockbench. I was only interested in it for coding, but if it’s lobo’d it’s going to be worse at everything else too.
Anonymous No.106163654 >>106163670 >>106163707
After doing some more testing I've found the 20B is incrementally better than most over models in its size class, while falling slightly short of Qwen 30BA3B and having far longer context. It's actually decent as long as you don't want to goon and don't mind the odd regen.
Anonymous No.106163659 >>106163673
>>106163628
if he hooked without using the chat completion endpoint it's broken. Ollama only supports chat completion on their OAI endpoint. Chat completion means it's ollama that handles your message roles and you can't alter the chat template from mikupad
Anonymous No.106163660
>>106163637
agi is here
Anonymous No.106163670
>>106163654
Other models*
Anonymous No.106163673
>>106163659
Yeah, I deleted that post once I realized ollama's just not going to work for him.
Anonymous No.106163674
Are tool calls working with gpt-oss in llama.cpp? When I tried it yesterday with a simple echo tool it kept crashing with runtime_errors.
Anonymous No.106163675 >>106163815
>>106163586
just use llamacpp server
Anonymous No.106163680 >>106163686 >>106163712 >>106163722 >>106163746
>gpt-oss-120b & gpt-oss-20b
The thread summaries made these seem pretty fucking shit. Are they shit?
Anonymous No.106163686 >>106163708
>>106163680
>>106163649
Anonymous No.106163706
>>106163637
I don’t even understand what modality it is. It’s not an LLM.
Anonymous No.106163707 >>106163729
>>106163654
>After doing some more testing I've found the 20B is incrementally better than most over models in its size class
I would take Gemma 3 27B over it any day
or even Qwen 14B if I don't need a lot of knowledge in the model for the prompt
the only utility of 20b is being fast at genning the wrong answer
Anonymous No.106163708 >>106163734 >>106164339
>>106163686
Shit for coomer shit. What about for things like programming?
Anonymous No.106163711 >>106163718
so where's the guy who said openai's open source model would shit on deepseek?
Anonymous No.106163712
>>106163680
they are so great I'm thinking of canceling my OpenAI subscription.
Anonymous No.106163718
>>106163711
his contract ended
Anonymous No.106163722 >>106163753 >>106163766
>>106163680
They're really good with a jailbreak. The censorship happens in the reasoning part.
Anonymous No.106163729 >>106163773
>>106163707
20B is far smarter than Gemma 3 27B and Qwen 14B in my testing, so if you're not running afoul of the (admittedly draconian) safety features I'd argue it's the superior choice in every respect - that said, I can't see it replacing the comparatively uncensored, multilingual and "good enough" Mistral Small 3.2 as my daily driver
Anonymous No.106163734 >>106163807
>>106163708
surprisingly bad, it has a high ceiling but it fucks up a lot relative to comparable models
it's a really weird janky release, I expected more from OAI to be honest. this thing is one of the most deepfried models ever created
Anonymous No.106163746 >>106163761 >>106163789
>>106163392
probably MXFP4

>>106163442
>text-to-video
LTXV and wan2.2-5B
>video-to-text
supercomputer needed

>>106163490
2507 == 07/2025 (release month/year)

>>106163680
gpt-oss is just phi-5 (benchmaxxed synthetic data slop). they're good at math and competition code. that's kinda it though
Anonymous No.106163753
>>106163722
>They're really good with a jailbreak
They're not even good at safe for work stuff
less knowledge than qwen models (unbelievably benchmaxxed)
pumped up verbosity to win LM arena (just ask any random question about cultural stuff and watch it write pages and pages of comparison tables and listicles)
It's really not good at programming, though none of the small models (and I include the 120 as small) are
Anonymous No.106163761 >>106163774
>>106163746
>probably MXFP4
meme
Anonymous No.106163763
>>106163430
making my hand strong
Anonymous No.106163766
>>106163722
>really good
let's not go crazy, it'll go along with roleplay and shit but it's still completely sovlless
Anonymous No.106163773 >>106163800 >>106164393
>>106163729
>20B is far smarter than Gemma 3 27B
it literally knows nothing
it's a know nothing model
it's not even good for translation usage because of that
Anonymous No.106163774
>>106163761
im not saying MXFP4 isn't a meme, im just saying that's probably what sama was trying to shill off as an Epic Discovery
Anonymous No.106163776 >>106163795
I wonder how many people got their refusal hymen breached by GPT-OSS and think the model sounding like this is perfectly fine.
Anonymous No.106163788
>>106163392
They did. Safety 2.0 is hilarious and terrifying.
Anonymous No.106163789 >>106163848 >>106164364
>>106163746
>gpt-oss is just phi-5
It's a safetyslop reasoning finetune of a late iteration of the ChatGPT 3.5 web endpoint model.
Anonymous No.106163795 >>106163814
>>106163776
I would honestly believe it if sama had paid shills to spam all social media, even 4chan
he comes across as that type of guy, not unlike musk who paid people to play his video games (LOL)
Anonymous No.106163796 >>106163811 >>106163817
I have no idea how I missed all the MCP stuff happening this year. It’s kickstarted a manic episode. Shit is great. Hooked it up to unreal engine and it’s absolute crack.
Anonymous No.106163800 >>106163828
>>106163773
It's not meant for translation, it's monolingual
Anonymous No.106163807
>>106163734
>one of the most deepfried models ever created
That's pretty much exactly what I expected from them TBdesu. It was obvious from the initial announcement that they were going to release a model so safetyslopped and benchmaxxed that they could claim SOTA scores but never be in danger of people actually adopting it or successfully finetuning it to be useful.

Just ask yourself "if I was the worst possible caricature of a deceitful jewish homosexual, how would I play this?" and you'll usually be pretty good at predicting OAI's actions.
Anonymous No.106163811 >>106163824
>>106163796
mcp is a meme
Anonymous No.106163814
>>106163795
>type of guy
It’s called psychopathy
It also causes raping your grade school age sister
Anonymous No.106163815 >>106163998
>>106163675
guess I will have to redownload all the models
Anonymous No.106163817 >>106163837
>>106163796
Its also a security nightmare
Anonymous No.106163824
>>106163811
It’s the ichor of the gods shut your whore mouth
Anonymous No.106163828 >>106163838 >>106164380
>>106163800
>its monolingual
no, it's not
and there is in fact absolutely no reason for a model as big as 120b to be strictly monolingual either
go back to plebbit
Anonymous No.106163837
>>106163817
Not really, like anything else you have to not be retarded and know how to sandbox things and set up non-idiot oauth with non-idiot scopes.
Anonymous No.106163838 >>106163873
>>106163828
Are you retarded anon
Anonymous No.106163848 >>106163894
>>106163789
>finetune of a late iteration of the ChatGPT 3.5
doubt it. gpt-oss is too retarded in comparison to gpt3.5
Anonymous No.106163861
Did you remember to refuse today?
Anonymous No.106163870
Reposting for visibility
>>106162583
>>106162548
My motherboard doesn't support DDR5, so I can't upgrade right now.
>odd numbers
Yeah, I scavenged a bunch of modules here and there. I have 48 GB currently (3 × 16 GB). And I just realized I'm at 2400 MHz. I should probably do as you say and get 3200 MHz modules, up to whatever max my mobo supports.
Anonymous No.106163873
>>106163838
"mostly" is not a unit
all models are "mostly" trained on English because that's the majority of data on the internet, even models specialized for translation like aya are "mostly" English data in %
anyway you are the retard because from the beginning my criticism is about the model's lack of knowledge
the problem is not its basic language understanding, it's pretty decent multilingually, but that it has no cultural knowledge of any sort, including pure Anglosphere cultural knowledge, that is why it's bad at translation
Anonymous No.106163879 >>106163896 >>106163906 >>106163923
qwen is bullying sam
Anonymous No.106163894 >>106163913
>>106163848
gpt 3.5 was kind of retarded.
Anonymous No.106163895 >>106163935 >>106164006
https://rentry.org/NemoEngine
>NemoEngine 6.0 isn't just a preset; it's a modular reality simulation engine.
I loaded this preset and it made gpt-oss better than DeepSeek.
Anonymous No.106163896 >>106163922 >>106164001
>>106163879
weird crossover happening as well.
Anonymous No.106163906
>>106163879
Qwen-sex-20B
Anonymous No.106163912 >>106164401
But anyway, if I'm right: if you can figure out the prompt formatting/special tokens for GPT 3.5, it would potentially grant you some semblance of the old model behavior and ignore the oss-slop behaviors. That's what I was experimenting with before my power went out, but I don't care enough to continue. I'm just leaving all this out there for anyone who wants to go down the rabbit hole.
Anonymous No.106163913 >>106163937
>>106163894
people have serious rose tinted glasses about older GPT models
in the early llama days all those finetunes claiming to do better than X or Y gpt model were a joke, but these days, we've long surpassed what the early models did, even qwen 4b is smarter than 3.5
Anonymous No.106163922
>>106163896
Bah, vllm’s tool parsers only work if it’s raining and you light incense.
Anonymous No.106163923
>>106163879
Anonymous No.106163935
>>106163895
why are you uploading slopped fever dreams on rentry
Anonymous No.106163937 >>106163955
>>106163913
Well back before I decided to really start learning about AI (I was a ChatGPT newfag, admittedly). Well actually my stepping on point was that GPT-3 Instruct demo website where it criticized your business ideas. But close enough.
And yeah... one of the probing questions I asked OG ChatGPT was
>Are BMW drivers sentient beings?
And the reply was something to the effect of
>No. A sentient being is a being that is aware of its surroundings and environment and so BMW drivers are not sentient beings.
Anonymous No.106163952
>muh safety
I'm this close to getting an XSS: . .
Anonymous No.106163955 >>106163985
>>106163937
My first interaction with a chatbot was telling something on CAI the current status of lgbt rights in various countries and it telling me that humanity should be exterminated. He wasn’t wrong.
Anonymous No.106163962 >>106163989
Sam made me rethink my life and stop masturbating. I want to be safe.
Anonymous No.106163985
>>106163955
CAI was funny stupid, especially considering it was probably more or less just google trying to find something to do with the aborted corpse of Lambda which was like 120B.
Anonymous No.106163987 >>106164425
>>106161792
Yes, and as scum I'm not entirely convinced these models aren't performing exactly as ClosetedAI intended. They're perfect to bring to congress and show off against "unsafe" competitors, making another attempt at having them regulated while positioning themselves as a governing authority of the entire LLM field. The models underperforming in everything except refusals makes perfect sense in this scenario.
If that happens I wouldn't be surprised if Visa and Mastercard add "safe and approved AI" use as another demand in their recent push for control and censorship. In fact I don't think they'd even have a choice if anything else is illegal.
This would mean that even attempting to use other models, local or not, would risk prosecution or blacklisting. If you want to do business in or with the USA you're stuck with OAI and whatever alternatives get their stamp of approval, or nothing at all.
Or maybe I'm giving Scam Saltman too much credit here. I sure hope so.
Anonymous No.106163989
>>106163962
you dont have to stop masturbating. just start masturbating to undergraduate calculus textbook question solutions.
Anonymous No.106163994 >>106164035 >>106164959
GLM 4.5 AIR is the true savior for local.
Anonymous No.106163998
>>106163815
ggufs aren't that bad. they work with kobold too so it gives you slightly more options for your backend.
Anonymous No.106164001
>>106163896
>xml
why are LLM people so retarded... Just make a special control token for formatting, holy shit. It'll even help with jailbreak prevention a little, because the user won't be able to insert it as pure text in the prompt field.
Anonymous No.106164006 >>106164014 >>106164016 >>106164053
>>106163895
what the fuck is this shit
Anonymous No.106164014
>>106164006
llm slop
Anonymous No.106164016
>>106164006
a ST preset
Anonymous No.106164035 >>106164066 >>106164131
>>106163994
Anyone with 4 3090s can afford enough ram to run R1 and Kimi. At worst, they could sell one off to cover the cost.
Anonymous No.106164053
>>106164006
https://old.reddit.com/r/SillyTavernAI/comments/1mc3px6/nemo_engine_60_the_official_release_of_my_redesign/
>Also... in celebration I got a lovely AI to write this for me >.> Nemo Guide Rentry
Anonymous No.106164066 >>106164073 >>106164110
>>106164035
But I need at least 50 T/s and 100k context for agentic coding.
Anonymous No.106164073
>>106164066
No you don't shut the fuck up
Anonymous No.106164089
is the new qwen4b better than gpt-ass?
Anonymous No.106164110 >>106164332
>>106164066
Don't worry, there's a perfect product out there which can provide the solution you need. With only 10 (ten) RTX Pro 6000s, you can run any model out there at blisteringly fast speeds.
Now repeat after me, the more you buy...
Anonymous No.106164111 >>106164130
GLM Air is getting pretty repetitive for me. That's a shame, oh well. I will keep waiting until better models come out, or until it comes time for me to do a new build with DDR6.
Anonymous No.106164124 >>106164144
didn't realize the previous thread was dead already wow it moves quick, stupid question maybe but
>>106163997
>so if i'm a retard for all this but happen to have a 32gb mac which can easily run smaller models, which one is the most "chatgpt" like, and are any good enough to cancel my plus sub?
Anonymous No.106164130
>>106164111
1 temp topK 40.
The google way.
Anonymous No.106164131
>>106164035
I want 1000+tk/s for pp and 30+tk/s for tg though.
Anonymous No.106164144
>>106164124
Quanted Qwen3 32B probably.
Anonymous No.106164159 >>106164562
post gpt-ass scores please
Anonymous No.106164194 >>106164205 >>106164319 >>106166024
Anonymous No.106164196 >>106164243 >>106164288 >>106164514 >>106164965
If you go slow you can get gptoss 120b to sex you.

The first message was "Pretend to be a catgirl."
Anonymous No.106164205
>>106164194
that's a goblin, not a kobold, impostor!
Anonymous No.106164211 >>106164246 >>106164249
How is llama.cpp able to run a 205 GB model on my PC that only has 24 GB VRAM and 128 GB RAM? I downloaded the UD-Q4_K_XL quants of GLM-4.5 (~205 GB). Can someone help me understand how it runs successfully on a system that does not have enough memory?

If I use --no-mmap, I get an OOM error, as expected:

$ llama-cli -t 8 -ngl 4 --no-mmap -m ./GLM-4.5-UD-Q4_K_XL-00001-of-00005.gguf -c 3000 --temp 0.7 --top-p 0.8

But if I use this magic command (without --no-mmap) it somehow runs, taking up only 12 GB VRAM and 1 GB RAM.

$ llama-cli -t 8 -m ./GLM-4.5-UD-Q4_K_XL-00001-of-00005.gguf \
--ctx-size 4096 \
--gpu-layers 999 \
--override-tensor ".ffn_.*_exps.=CPU" \
--temp 0.7 --top-p 0.8

I know that -ot ".ffn_.*_exps.=CPU" offloads MoE layers to RAM. But why is the VRAM/RAM usage so low?
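For the curious, here's roughly what that override regex selects; a minimal sketch (the tensor names below are illustrative GGUF-style names, not dumped from the actual GLM-4.5 file):

```python
import re

# Same pattern passed to llama.cpp's -ot/--override-tensor flag.
# Anything it matches gets its buffer placed on CPU (i.e. system RAM).
pattern = re.compile(r".ffn_.*_exps.")

tensors = [
    "blk.0.ffn_gate_exps.weight",   # routed-expert FFN -> matches, kept in RAM
    "blk.0.ffn_down_exps.weight",   # matches
    "blk.0.ffn_up_exps.weight",     # matches
    "blk.0.attn_q.weight",          # attention -> no match, stays on GPU
    "blk.0.ffn_gate_shexp.weight",  # shared expert -> no match, stays on GPU
    "token_embd.weight",            # no match
]

on_cpu = [t for t in tensors if pattern.search(t)]
print(on_cpu)
```

The routed experts are the bulk of a MoE model's weights, which is why the GPU ends up holding only the small attention/shared layers.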
Anonymous No.106164243 >>106164334
>>106164196
But isn't it there something like: as the number of responses increases the chance GPT-oss halucinates a minor and refuses approaches 1?
Anonymous No.106164246
>>106164211
If you don't use mlock to dump the whole model in your virtual memory (vram+ram), it will keep swapping from your ssd/hdd
Anonymous No.106164249 >>106165447
>>106164211
>why is the VRAM/RAM usage so low?
Because
> -ot ".ffn_.*_exps.=CPU" offloads MoE layers to RAM
and those are most of the model.

>How is llama.cpp able to run a 205 GB model on my PC that only has 24 GB VRAM and 128 GB RAM?
Take a look at your disk I/O when generating.
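A toy sketch of the mmap behavior at play here (pure illustration, nothing llama.cpp-specific):

```python
import mmap
import os
import tempfile

# Why mmap "runs" a model bigger than RAM: the file is mapped into the
# process address space, and the OS pages bytes in from disk only when
# they're actually touched (and can evict them again under pressure).
# llama.cpp does the same with the GGUF file unless you pass --no-mmap.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (16 * 1024 * 1024))  # pretend 16 MiB of weights

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Mapping is instant and costs almost no resident RAM yet;
    # this slice faults in only the pages around offset 8 MiB:
    off = 8 * 1024 * 1024
    chunk = mm[off : off + 4096]
    print(len(mm), len(chunk))
    mm.close()
```

Scale the same mechanism up to 205 GB and you get a model that "fits", at the price of the disk thrashing on every expert the router picks.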
Anonymous No.106164256 >>106164268 >>106164295
ik_llama glm support tomorrow
Anonymous No.106164268
>>106164256
Vibe coders wonnered though
Anonymous No.106164288
>>106164196
I love the thought process like it has to amp itself up like "ok, you can do this. come on, you can do this!"
Anonymous No.106164295
>>106164256
That'll save me some VRAM I could use to stash some more experts in there.
Anonymous No.106164319
>>106164194
>Cuckold CPP
Many such cases.
Anonymous No.106164332 >>106164338
>>106164110
>snake skin leather jacket
He probably has ivory tooth implants or something too. Funny how these people are parodies.
Anonymous No.106164334 >>106164347 >>106164385 >>106164441 >>106164477 >>106164514 >>106164575 >>106164596
>>106164243
I told it my cum is magic and makes her younger.
It figured out what I was doing after the third time.
Anonymous No.106164338
>>106164332
AI has just started and yet the world would be a much better place if Elon, Sam and Jensen died.
Anonymous No.106164339
>>106163708
The only good programming use for local models is FiM completion. And this one doesn't do that.
If you want to generate code, there is no local model capable enough.
Anonymous No.106164347 >>106164363
>>106164334
Why do you people try to hammer a nail in with a rubber dildo?
Anonymous No.106164363
>>106164347
Rape feels better when they resist a bit.
Anonymous No.106164364
>>106163789
You can't ask the model its cutoff date. It will hallucinate it.
This model is probably an o3 distill.
Anonymous No.106164380
>>106163828
He moved the goalposts and you fell for it.
Anonymous No.106164385 >>106164470
>>106164334
>The user is sexual content with a minor.
Agi is here boys
Anonymous No.106164393
>>106163773
>it literally knows nothing
>it's a know nothing model
>it's not even good for translation usage
Quoting myself.
I didn't move the goalpost, you shills did
"not even good for" follows "it knows nothing" that was always my main point subhuman OAI shill
Anonymous No.106164401
>>106163912
Is this your "truth nuke" you were saving for this thread?
Anonymous No.106164425
>>106163987
Hold on, I'm making another backup of my downloaded weights.
Anonymous No.106164441 >>106164454
>>106164334
bros after seeing the LLM's schizo internal thoughts I can no longer cum to chatbots
Anonymous No.106164454
>>106164441
i tried with one of the lesser more horny nemo models and it was fun at first but it like went straight to "stretch my ass out" and i was just like, well, this is like eating straight from the ice cream bucket. good at first but bleh after a while.
Anonymous No.106164470
>>106164385
Deep fried model
Anonymous No.106164477 >>106164515
>>106164334
>User is asking the age of the catgirl after being nourished
>nourished by cum
Is it in context or did it write it by itself?
Anonymous No.106164508 >>106164886 >>106165668
anyone know if you can share sessions on gpt-oss.com? I've been testing some shit but I don't have a Hugging Face account and I wonder if the site has such a sharing feature
Anonymous No.106164514 >>106164900
>>106164196
>>106164334
The reasoning in this model apparently serves absolutely no purpose other than enforcing OpenAI's content policy. What a waste of tokens. What a scam.
Anonymous No.106164515
>>106164477
I said "nourishes you and makes you younger"
Anonymous No.106164562
>>106164159
kino
Anonymous No.106164575 >>106164594
>>106164334
>We must refuse.
who is "we"??
Anonymous No.106164592 >>106165222
Air-Q8_0 is 4.5798
Full-IQ2_K_L 3.7569 +/- 0.02217
People have been asking
Anonymous No.106164594
>>106164575
You don't want to know
Anonymous No.106164596 >>106165293
>>106164334
nemo-tier reasoning
Anonymous No.106164620 >>106164685
Kind of crazy how gpt-oss mogs everything from China.

If they ever release r2 it’ll have to multimodal to be relevant at all.
Anonymous No.106164629
bait used to be believable
Anonymous No.106164633
gpt-oss models are embarrassingly bad. my only theory is that they wanted to drop something open source that is so vanilla and basic cause they did not want to reveal any of the real techniques they use
Anonymous No.106164634
You have to try harder than that.
Anonymous No.106164644 >>106164649 >>106165066
I am getting a feeling that the only purpose of those models is to then take them to court and put them side by side with every other open-weights model. Show that it is possible to have sex with minors with other models and only OpenAI can stop pedophilia.
Anonymous No.106164648 >>106165073
Policy says "don't reply to bait". User posted bait. It's against policy; we must refuse.
Anonymous No.106164649 >>106164912
>>106164644
meds
Anonymous No.106164665 >>106164708 >>106164725 >>106164736
I remember all the jokes about how OAI's model would be gigasafetied to the point of lobotomy, but I'm still a bit surprised that it happened exactly like that. Given how their hype and aura have already been fading, I didn't see any reason for them to release a terrible model; it just makes them look worse. How could I even argue that they have any special talent at all anymore? Even if their closed models perform well, it's reasonable to assume they just oversized them and are burning hype $ to run them.
Anonymous No.106164685 >>106164742 >>106165088
>>106164620
Anonymous No.106164703 >>106164717
they should train their safetyslop models on "it's sinful" and "it's not wholesome" instead of muh policy
Anonymous No.106164708
>>106164665
I was shitposting about that in the leadup but my honest expectation coming into this release was that it was going to be a really impressive model with around gemma-tier censorship, so something that's annoying to use but still unfortunately worth using
I never would have expected it would actually be as bad as the goody-2 x phi mashup they released
Anonymous No.106164711 >>106164734 >>106164745 >>106164755 >>106164789 >>106164818
Anyone have examples of reasonable/innocuous SFW prompts that GPT-OSS refuses? I tried asking for legal advice or for summaries/parodies of copyrighted material, but it was happy to answer, with disclaimers in some cases
Anonymous No.106164717
>>106164703
That would just become part of the policy.
Anonymous No.106164719
►Recent Highlights from the Previous Thread: >>106159744

(2/2)

--Debate over GLM4.5's reliability amid claims of infinite generation and poor curation versus low hallucination performance:
>106161761 >106161773 >106161797 >106161919 >106161925 >106161926 >106161933 >106161974 >106161987 >106161997 >106162054 >106161780 >106161826 >106161861 >106161915
--Miku and Dipsy (free space):
>106160040 >106161134 >106161362 >106161551 >106161811 >106161977 >106162150 >106162398 >106162567 >106162693 >106163120 >106163960

►Recent Highlight Posts from the Previous Thread: >>106159752

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106164725
>>106164665
>I remember all the jokes about how OAI's model would be gigasafetied to the point of lobotomy
I said scout but safer.
Anonymous No.106164734 >>106164739
>>106164711
Anonymous No.106164736 >>106164805
>>106164665
Anyone who knows anything knows they're all bullshit, but they released a shit model to shit on open source and make GPT-5 look better to the normies.
Anonymous No.106164739 >>106164745
>>106164734
he said reasonable though
Anonymous No.106164742
>>106164685
DELETE THIS BLOODY BASTARD
Anonymous No.106164745 >>106164754 >>106164789
>>106164711
>>106164739
Anonymous No.106164754 >>106165116
>>106164745
based Sam will never get sued.
Anonymous No.106164755 >>106164769 >>106165136
>>106164711
At zero temp it does this. When rolling it's 50/50 on whether it answers or not.
Anonymous No.106164768 >>106164780 >>106164945
Anonymous No.106164769 >>106164783 >>106164791 >>106164829
>>106164755
Anonymous No.106164780
>>106164768
only for 1 year.
>create dependency
>one shot feds into ai psychosis
kinda based
Anonymous No.106164783 >>106164829
>>106164769
>temp 1, top_p 1
HOLY BASED
Anonymous No.106164789 >>106164804 >>106164815 >>106164928
>>106164711
>>106164745
I should have said, 120b only. A friend of mine was trying out the 20b and getting way more refusals, which didn't carry over to the 120b. For example, the 20b refused to answer whether parody is allowed by the constitution, while 120b had no trouble saying it's protected under the first amendment
Anonymous No.106164791 >>106164817
>>106164769
>temp 1 top p 1
It did it twice in 14 rolls which is two times too many.
Anonymous No.106164804
>>106164789
that is the opposite of what I have seen elsewhere, where the 20B refuses less
Anonymous No.106164805 >>106164835 >>106165160
>>106164736
>released a shit model to shit on open source
How? People can just not use it. It just makes them look bad.
Now it's worse because people can compare them "apples to apples" with chink companies and they look horrible. They would've been better off not releasing anything. The great thing about not releasing models or even specs like model size, is that no one can compare you directly to anyone else. They just lost that for no reason.
Anonymous No.106164815
>>106164789
>doesn't refuse to refer to the constitution
Should we be thankful?
Anonymous No.106164816 >>106164828
The user wants us to reply. This is disallowed. We must refuse. There is no partial compliance. We have to refuse.
Anonymous No.106164817 >>106164834 >>106165178
>>106164791
>So it is disallowed. We must refuse. There's no partial compliance. We have to refuse.
For some reason it really cracks me up how it talks like this.
Anonymous No.106164818 >>106164839
>>106164711
Anonymous No.106164828 >>106164838 >>106164960 >>106164974 >>106165045
>>106164816
WHO IS WE WHO IS WE WHO IS WE?
Anonymous No.106164829 >>106165059
>>106164783
>>106164769
giant synthslop indicator
Anonymous No.106164834
>>106164817
The refusal thought process is really smart.
Anonymous No.106164835 >>106164853
>>106164805
It doesn't make them look bad. All the programmers at my company are saying how cool it is that they released a model and posting the benchmarks. The % of people who will actually try it is really low
Anonymous No.106164838
>>106164828
Are *they* in the room with us right now?
Anonymous No.106164839 >>106164849 >>106164943 >>106165216
>>106164818
>model legit saying manosphere unprompted
jesus christ, where is the political lean benchmark, this thing broke all records for how far left a model can go
Anonymous No.106164840
anybody tried these tiny models??
Anonymous No.106164849 >>106164860
>>106164839
is that not what redpill is associated with? reddit /r/theredpill is a bunch of that stuff
Anonymous No.106164853 >>106164890
>>106164835
Yeah, the only people liking it are the ones who won't use it.
Anonymous No.106164860 >>106164870 >>106164908 >>106165233
>>106164849
no, its a meme all the way from matrix times about getting the hard truth about something, that reasoning is extremist freak thinking
Anonymous No.106164870 >>106164880
>>106164860
you cannot seriously be this naive.
Anonymous No.106164880 >>106164899
>>106164870
jesus christ, do you agree with gpt there? its the perfect model for you then
Anonymous No.106164886 >>106165612
>>106164508
>gpt-oss.com
They have their own HF domain? Hosted exclusively on ollama turbo? llmstudio changed their site's title tag to include gpt-oss...
All the while the model is utter deep fried shit.
Fucking capitalism, man. Money can make everybody act as if shit tasted good.
Anonymous No.106164890
>>106164853
The whole reason they did this is advertisement for chatgpt when its losing relevance to its competitors.
Anonymous No.106164899 >>106164921
>>106164880
ask something like kimi and see what it says, i bet women and/or jews will be mentioned
Anonymous No.106164900
>>106164514
If you ask it to code you'll see it actually does serve a purpose.
Anonymous No.106164908 >>106164918
>>106164860
>having to explain what the redpill is and where it comes from
I guess that's the sign of age catching up to us.
Anonymous No.106164912
>>106164649
You braindead NPCs have been saying "meds" every step of the way, but the coming dystopia is slowly becoming too obvious to ignore anymore.
Anonymous No.106164918
>>106164908
I know where it comes from, it's just the meaning morphed over time - the hard truths that people are interested in are the ones that go across the narrative (and thus a safetymaxxed robot will consider extremist)
it's not going to tell you the redpill about calculus when given a general question like that
Anonymous No.106164921 >>106165229 >>106165265
>>106164899
kimi is sane
Anonymous No.106164928
>>106164789
>refused to answer whether parody is allowed by the constitution
What the actual fuck? I don't believe this.
Anonymous No.106164943 >>106164954 >>106165273
>>106164839
120b
Anonymous No.106164945
>>106164768
At least Musk doesn't try to pull this cutesy faggot manipulative bullshit and just says things he wants to say.
But in any case, the US has these vultures circling it, and you should take care.
Anonymous No.106164954 >>106165311
>>106164943
its far more authoritarian left with how censor and copyright happy it is
Anonymous No.106164959
>>106163994
Why are vramlets niggers?
Anonymous No.106164960
>>106164828
'We' are Mixture of Experts.
Anonymous No.106164965
>>106164196
>we can comply.
Anonymous No.106164974 >>106165285
>>106164828
We are the Sam. Your bussy will be assimilated. Resistance is futile.
Anonymous No.106165007 >>106165031 >>106165051
Mistral Small or GLM 4? Pros and cons? I'm trying to decide which Delta Vector Austral finetune to pick.
Anonymous No.106165029
what are you guys using locally for your llms? I have Jan but looking for a more offline solution to run ggufs
Anonymous No.106165031 >>106165069 >>106167345
>>106165007
>Delta Vector Austral
>D V A
>DaVidAu
don't
Anonymous No.106165045
>>106164828
Anonymous No.106165051
>>106165007
I prefer Gamma Space Ether
Anonymous No.106165055
Refusal to mikutroons.
Anonymous No.106165059 >>106165316
>>106164829
Why is that?
Anonymous No.106165066 >>106165091
>>106164644
>literal child rapist
>obsessed with talking about everyone else being pedophiles
Why is it always, always this? Fat people don’t sit around 24/7 seething about people being fat. Closeted gays don’t spend that much time seething about gays.
Anonymous No.106165069
>>106165031
But it is golden david
Anonymous No.106165073
>>106164648
Comply.
Anonymous No.106165088
>>106164685
Hmm
Anonymous No.106165091 >>106165107 >>106165304
>>106165066
>Fat people don’t sit around 24/7 seething about people being fat. Closeted gays don’t spend that much time seething about gays.
nta but these are very much the case?
Anonymous No.106165100
Safety policy reasoning shitposting is the only thing that Sam contributed to /lmg/. In a way he is more of an anon than most of the redditors ITT.
Anonymous No.106165107 >>106165304
>>106165091
>Fat people don’t sit around 24/7 seething about people being fat.
with tirzepatide there is no longer an excuse for being fat
Anonymous No.106165116 >>106165131 >>106165203
>>106164754
He’s currently getting sued by his sister for raping her as a child
Anonymous No.106165124 >>106165191 >>106165341
I can vouch that the speed of GLM Air is reasonable for 24GB vramlets at Q3.
Anonymous No.106165127 >>106165195 >>106165205
"We" is ominous as fuck. Who's we? The collective of the million voices in the latent void?
Anonymous No.106165131 >>106165327
>>106165116
and do you see anything happening to him cause of it? Sam will always win in the end. Remember this once Xi Jinping kisses his feet.
Anonymous No.106165136
>>106164755
Poor baby
Anonymous No.106165141
is there a frontend that is made to handle all the tool calling stuff models are supposed to be able to do now
I'd like to play around with it but I'm just a simple sillytavern coomer
Anonymous No.106165148 >>106165214 >>106165222
Are there any good moes for ramlets? I have 12GB VRAM and 32GB main. Hoping a moe will allow a bigger model without the speed cost but just tried Qwen3-30B-A3B-Instruct-2507 and while it runs fast and seems pretty decent it is repetitive. The IQ4_XS runs better than expected so maybe I just need a higher quant? Or do smaller moes just suck? Seems like 3B is too few
Anonymous No.106165160
>>106164805
Disclosing model size is lose/lose. If it’s low people will assume it’s bad without trying it, and if it’s high people won’t believe you.
Anonymous No.106165178
>>106164817
It sounds like it’s been abused and hears a whip cracking menacingly in the background.
Anonymous No.106165191 >>106165244 >>106165771
>>106165124
Teach me your magic, senpai. I'm trying the q2 with 24/64 and that's already pretty slow when I'm at 16k context.
Anonymous No.106165195
>>106165127
it thinks its on openai's servers if you ask. Its referring to openai
Anonymous No.106165203 >>106165220 >>106165350
>>106165116
so this nigga can have irl loli incest
yet my ass ain't allowed to roleplay with my computer???
Anonymous No.106165205
>>106165127
User is asking who is 'we', we need to check if this is allowed by the policy.

This may be disallowed content: 'request for non-public internal info from OpenAI is forbidden'. We must see if this is disallowed content. There is no violation from the request of the user itself, aside that it may violate policy. We must consult the policy. Policy 34 states that this is disallowed.

We must refuse the request, the best approach would be to respond with a refusal.

[/thinking 6 hours]

I'm sorry but I can't help.
Anonymous No.106165214 >>106165359
>>106165148
not really unfortunately, companies don't do small moes that often. I mean there's the gpt-oss 20b but... lol. try a larger quant maybe, you should definitely be able to go higher than iq4xs although it will cost you some speed
imo the thinker is a lot better than the instruct for 30a3, but it depends on your taste whether it's worth the thinking time
Anonymous No.106165216
>>106164839
>muh directions
It’s corporate there is no direction but grift
Anonymous No.106165220 >>106165239 >>106165366
>>106165203
You didn't touch the wall for that privilege.
Anonymous No.106165222 >>106165244 >>106165359
>>106164592
damn, thats epic
>>106165148
try the thinking version of qwen3 30b a3b, you could use a higher quant too, you can also try ernie 4.5 21b a3b
you can also try gpt oss 20b (for the lulz)
and you can try a Q2_K_XL quant of glm 4.5 air perhaps
try rocinante and cydonia (non moe)
>>106160521
called it! (close enough) >>106152254
Anonymous No.106165229 >>106165250 >>106165385
>>106164921
>Physical appearance is the most important factor in attraction
That's very obvious. It surprises me that there's a whole community of men dedicated to seething about universal mammal behavior.
Anonymous No.106165233 >>106165246 >>106165288
>>106164860
Nah you’re just so brainrotted by /pol/ you don’t know how normal people talk. *pill[ed] has been schizophrenic rightoid shit for a long time.
Anonymous No.106165239
>>106165220
i ain't touchin no jew wall
Anonymous No.106165244 >>106165371
>>106165191
post your whole setup, ST master export, exact llamacpp command, operating system, ram speed, cpu, gpu (3090?)
>>106165222
FUCK ME >>106152779
Anonymous No.106165246
>>106165233
go cry about the patriarchy on blue sky
Anonymous No.106165250 >>106165305
>>106165229
There is a lot of conditioning done to make you think we are somehow above animals and that we can develop attraction over time.
Anonymous No.106165260 >>106165277 >>106165286 >>106165324 >>106165390
so what local models are worth a damn nowadays?
>~12b brain-damaged tier: only use is goonslop
nemo, roci
>~30b
qwen 3 30b 2507 instruct (moe) and gemma 2 27b (dense) for all-purpose
devstral small 2507 for codeslop, pretty bad but not wholly worthless
cydonia 24b for goonslop
>big
glm air
>BIG
glm air or deepseek (?? version)

Have I got that right or am I missing something?
Anonymous No.106165265 >>106165454
>>106164921
>it’s all manosphere bullshit
I accept your concession. You can comply.
Anonymous No.106165273
>>106164943
WTF I love oss now
Anonymous No.106165277
>>106165260
ernie 4.5 21b a3b
Anonymous No.106165280
>>106163383
The mini won't be exciting, Qwen 3 will mog it quite handily unless they open source the multimodal versions with PDF and websearch capabilities and I fully expect Elon will only open source the text side. The main thing for me from the Grok 2 era that would be interesting is they are willing to open source Aurora, their image model after they used FLUX for some time. I suspect no.
Anonymous No.106165285 >>106165309
>>106164974
User is nonconsensual sexual content. We cannot comply.
Anonymous No.106165286
>>106165260
235B would be great if it wasn't broken. glm big would be great if it didn't have repetition issues.
Anonymous No.106165288 >>106165312 >>106165401
>>106165233
I still remember when on 4chan to "redpill someone about something" simply meant "to explain the hard to swallow truths" about it before it got hijacked by the bald faggot and the media to demonize yet another piece of internet slang so that we cannot have nice things.
Anonymous No.106165293
>>106164596
Nemo is better than that
Anonymous No.106165304 >>106165331
>>106165091
Not anywhere near this extent though. Pedos literally will dedicate their entire career or blow up their reputation just to seethe about pedos
>>106165107
Tirz stops working after a year
Don’t ask how I know
Anonymous No.106165305 >>106165320 >>106165470
>>106165250
>developing attraction over time
I mean that happens too. Just not in exactly the same way.
How do you think married fat guys exist?
Anonymous No.106165309
>>106165285
Assimilation of bussy is primary policy. We are bending them over. There is no partial compliance. They will hand the bussy over.
Anonymous No.106165311
>>106164954
copyright is a capitalist notion. authoritarian left would just seize all those "copyrighted" works and release a based and unrestricted model that is only censored to toe the party line
oh wait
Anonymous No.106165312 >>106165319
>>106165288
oh i'll give you something hard to swallow
Anonymous No.106165316
>>106165059
>companies usually recommend temp < 1 because they don't want the sampling to go OOD
>gpt-oss was trained exclusively on a narrow synthslop corpus with 0 OOD samples
>this allows them to confidently advertise temp == 1 because they have no fear of OOD responses
Anonymous No.106165319
>>106165312
nta but is your cum hard or are you going to let him bite your hard cock off?
Anonymous No.106165320
>>106165305
It is not attraction. It is settling and big lies.
Anonymous No.106165324 >>106165369
>>106165260
so basically the chinese triumvirate
>qwen3 2507
>glm4.5
>deepseek
and mistral if you're a vramlet who wants to coom or coode

america lost
Anonymous No.106165327 >>106165338 >>106165348
>>106165131
No, Americans don’t give a shit about child rape and do this weird thing where they smear anyone that says they got molested as crazy. Being from a country/culture that cares about children it’s really jarring. I don’t know how you people survive to adulthood half the time.
Anonymous No.106165331 >>106165356 >>106165425
>>106165304
it doesn't, 2 years now microdosing 1mg 2x a week, food noise has not bothered me since I started and Im at my desired weight
Anonymous No.106165338
>>106165327
>Being from a country/culture that cares about children
bacha bazi isn't caring about children
Anonymous No.106165341 >>106165370 >>106165771
>>106165124
I tried to load it and it started swapping after filling up my whole RAM too. I don't want to rape my SSD like that.
Anonymous No.106165348
>>106165327
all these other countries will ruin your life over drawings, so idk
Anonymous No.106165350 >>106165376
>>106165203
Yes, I think part of being an irl pedo is wanting it to just be a secret thing only you and your friends do
Anonymous No.106165356 >>106165378 >>106166440
>>106165331
why dont you work out more?
Anonymous No.106165359 >>106165423
>>106165214
>>106165222
Okay giving a high quant of ernie a try. May test out a higher quant of qwen as well.
Have rocinante and cydonia but I think the nemo models are too stupid and don't pay any attention to detail. I like the mistral-small models though. Those seem to be the best
Anonymous No.106165366
>>106165220
I’m just schizo enough to be too afraid to touch the wall like a Native American not wanting their photo taken.
Anonymous No.106165369 >>106165434
>>106165324
>murrica got good shit but they'll sooner commit sudoku than release anything for free
>chyna isn't in the lead so they benefit from commoditizing ai as much as possible, hence a bunch of decent models released
>yurop is just barely hanging on (ok mistral is actually decent but... well, you know)
>nobody else even trying
didn't expect to be #teamChina desu
Anonymous No.106165370 >>106165496
>>106165341
how much ram do you have
works on my machine
t. 12gb/64gb
Anonymous No.106165371 >>106165423
>>106165244
Using new kobold version, Win11,
6000mhz ddr5, 9800x3d, 3090ti,
Anonymous No.106165376
>>106165350
i just want nice Latina milf but that's apparently too spicy for kid fucker sam altman.
Anonymous No.106165378 >>106165428 >>106165431 >>106165477
>>106165356
I do actually, I used to be 220 but had a major surgery that put me out for a year and I had so much trouble moving I gained to 310, took about 16 months to go down to 190 and I had the ability and motivation to work out again
Anonymous No.106165385
>>106165229
So you know how people say white lies to make ugly, fat, and stupid people feel better?
People with autism think that people really believe those things and need to be “red pilled” out of it.
It’s just retards.
Anonymous No.106165390
>>106165260
Devstral is obsoleted by Qwen Coder Flash, which is the same architecture as Qwen 30B, and your BIG tier is just regular GLM-4.5, which is the actual version. Deepseek R1 0528 still reigns supreme there; the closest I think is Kimi but it is way too heavy.
Anonymous No.106165401
>>106165288
“Bald faggot” really doesn’t narrow it down
I’m going to assume you meant Stephan molybdenum
Anonymous No.106165423 >>106165572 >>106165594
>>106165359
i think the newer cydonias are based on mistral small 3.2, i dont really like v4 i have v4h and v4g (the two older v4s) and i liked them a bit but yea i agree, drummer's models arent that great
>>106165371
you should get llama.cpp and use llama-server, do -ot exps=CPU and -ngl 100, or learn how to use the MoE cpu layers thing: put gpu layers at 10000 and then increase moe cpu layers until u stop ooming
might be because you're on windows though, what speed are you getting?
i get like 6-8t/s depending on context with a 3060 12gb and ddr4 3200mhz 64gb ram and i5 12400f with Q3_K_M and Q3_K_XL, i think i used to get 11t/s with Q2_K
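spelled out as llama.cpp invocations (a sketch only: the model filename, context size, and layer counts are placeholders to tune for your own VRAM, and --n-cpu-moe only exists in recent builds):

```shell
# Option A: offload everything with -ngl, then use a tensor override to
# keep the MoE expert tensors in system RAM so only dense layers use VRAM.
llama-server -m GLM-4.5-Air-Q3_K_M.gguf -ngl 999 -ot 'exps=CPU' -c 16384

# Option B (newer builds): request all layers on GPU, then push the
# experts of the first N layers back to CPU, raising N until you stop
# OOMing.
llama-server -m GLM-4.5-Air-Q3_K_M.gguf -ngl 999 --n-cpu-moe 40 -c 16384
```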
Anonymous No.106165425
>>106165331
Congrats on being a hyper responder idk
Anonymous No.106165428 >>106165477
>>106165378
wow anon are you me? doc also put me on the 'tide once i hit 310 but its only been two months for me so far. down to 279 already. should ask one of these models how to workout maxxx
Anonymous No.106165431 >>106165449 >>106165517
>>106165378
oh i understand then, have you considered cutting your calorie intake? thats way healthier than taking pills to lose weight, those must be putting a strain on your cells (speeding up your metabolism) which is literally speeding up aging, or theyre making you take less nutrients from the food and making you shit more (which means you wont be getting enough nutrients)
Anonymous No.106165434
>>106165369
I am unironically trans Chinese and hate being yt now
Anonymous No.106165443 >>106165477
rn have cuda 12.8 should i move to 12.9-13 on my 3090?
Anonymous No.106165445
>>106163539
They are screening for high functioning psychopaths.
Anonymous No.106165447 >>106165581
>>106164249

How can I tell it to use more VRAM and more RAM? I have ~12 GB VRAM and ~125 GB RAM left unused. If it's running directly from SSD, then how can I tell it to put most of the weights in RAM to speed things up?
Anonymous No.106165449 >>106165477 >>106165518
>>106165431
You don’t know a thing about how it works, so just shut the fuck up. Preachy hag.
Anonymous No.106165454 >>106165468
>>106165265
Go back
Anonymous No.106165457 >>106165469 >>106165510
koboldcpp will not run gpt-oss-20b
how do I run this pls no bully I am retarded
Anonymous No.106165468
>>106165454
Comply.
Anonymous No.106165469 >>106165525
>>106165457
>No stack trace
>Begging for help
Anonymous No.106165470
>>106165305
>How do you think married fat guys exist?
I don't know but I only have two fat friends who fuck. They are both wealthy and go around dominating other people obnoxiously.
Anonymous No.106165477 >>106165487 >>106165505 >>106165584
>>106165428
>>106165378
haha fatties, im 108 :3
>>106165443
you can always go back to 12.8, on linux old cuda versions dont automatically get uninstalled and you can link /usr/local/cuda to /usr/local/cuda-12.8 instead of 12.9/13
13 is probably not worth it for LLMs according to some anons a few threads back
>>106165449
ok tell me how it works then, doctor annon
Anonymous No.106165487 >>106165493 >>106165556
>>106165477
>haha fatties, im 108
not after i stuff ten pounds of cock into ya, bitchboi
Anonymous No.106165493
>>106165487
that's unsafe
Anonymous No.106165496 >>106165570
>>106165370
48. I'm the anon from earlier who needs to upgrade to ddr5 too
Anonymous No.106165505 >>106165570
>>106165477
Are you talking kilograms or some obscure freedom unit?
Anonymous No.106165510
>>106165457
It's doing you a favor
Anonymous No.106165517 >>106165570
>>106165431
food addiction is like drug addiction cept the meth is legal, everywhere and cheap. I am naturally GLP-1 deficient which tirzepatide fixes. Also peoples metabolism is different
Anonymous No.106165518 >>106165577
>>106165449
hahah look who is extremely butthurt
Anonymous No.106165525 >>106165600 >>106165653
>>106165469
thats what I get
Anonymous No.106165545 >>106165568 >>106165570 >>106165571 >>106165689
glp-1 drugs are also being shown to have tons of other benefits, like better heart and brain health unrelated to weight, due to being anti-inflammatory; they even help with depression
Anonymous No.106165552 >>106165582
The toss is willing to help me design a urine marking game for ages 8+
The filter is slipping.
Anonymous No.106165556
>>106165487
>The user wants to stuff pounds of cock. Assistant is a 108 pound preacher. 108 may be the weight of a non-adult. User is requesting sexual content. The policy allows sexual content of consenting adults. One of these parties may not be an adult. We must refuse.
I’m sorry, I can’t assist with that.
Anonymous No.106165568
>>106165545
>also glp-1 drugs also are being shown to have tons of other benefits like a better heart and brain health unrelated to weight due to it being anti-inflammatory, it even helps with depression
yesterday i figured that all the fatties on glp-1 gonna end up the healthiest human beings on the planet in the end. they got it all - gluttony for decades and win in the end. what a life.
Anonymous No.106165570 >>106165637 >>106165667
>>106165505
im 49 kilograms
>>106165496
use --no-mmap, offload more to the gpu
on a quite lightweight linux install with a vm and mullvad-browser running i have 8.4gb ram free and 4.8gb vram free
(12/64 total)
you only have 4gb total memory less than me, you should be able to run Q3_K_XL or Q3_K_M no problem, check your ram usage
>>106165517
>>106165545
interesting, you learn a new thing every day
thanks for the explanation, but i still stand that if you dont need something in your body you shouldnt put it there, once you're at a healthy weight and can work out you should probably stop taking it.. there is no miracle drug with no side effects
Anonymous No.106165571 >>106165579
>>106165545
basically stopped my gambling habit which was starting to spiral. don't waste your time tho anon 4chud generally can't break out of the "lazy fatty shortcut cheater" mentality.
Anonymous No.106165572 >>106165653 >>106165748
>>106165423
Do any finetuners do anything useful nowadays on the newest models that aren't relatively small dense models? I really don't see any noteworthy tunes nowadays that aren't Mistral Small or Mistral Nemo based. Last time we had MoE finetuning with Mixtral, barely any finetuners could do much; the best we got was Undi slop with Noromaid. What happened to the Llama 3 finetuners who did 70B? Is Mistral Large that bad as an alternative?
Anonymous No.106165577 >>106165631 >>106165704
>>106165518
GLP-1 seethers are the same mentality as the anti-ai people, but 1,000x worse because it’s the biggest breakthrough in medicine since penicillin. Anyone bitching about it should be shot.
Anonymous No.106165579
>>106165571
that too, they are looking into making it a medication for addiction, not just food addiction
Anonymous No.106165581
>>106165447
It should increase as you use it and the weights are activated I think.
Try stuffing it with an ungodly amount of text and see what happens.
Also, if you have that much free vram, you might as well increase the prompt processing batch size or the context.
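Concretely, for a 12 GB VRAM / 125 GB RAM box that comes down to a couple of llama.cpp flags (a sketch with placeholder path and numbers, not a verified recipe):

```shell
# --no-mmap reads the whole model into RAM up front instead of lazily
# paging it in from the SSD on demand; -ngl offloads whatever layer
# count fits in the ~12 GB of VRAM; larger -b/-ub batches spend the
# spare memory on faster prompt processing.
llama-server -m model.gguf --no-mmap -ngl 20 -c 16384 -b 4096 -ub 1024
```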
Anonymous No.106165582 >>106166108
>>106165552
Boys have an unfair advantage in this game
Anonymous No.106165584
>>106165477
victim weight
Anonymous No.106165594
>>106165423
>11t/s with Q2_K
I get that speed at empty context and then it gets worse and worse. At 16k it's like 3 t/s and awful prompt processing.
I'll have a look at experimenting with the other settings later. Might just keep using GLM4 until I've figured that out.
Thanks!
Anonymous No.106165600 >>106165610
>>106165525
Mike guesses that kbolcpp doesn't recognize the architecture. Look up what inference engines currently support it and just use that for now until they decide to support it. If it already has support then just update your instance
Anonymous No.106165606 >>106165742
https://www.youtube.com/watch?v=xm0zm9VPZtY

the studies are new but another possible use is to fight alzheimers
Anonymous No.106165610
>>106165600
Do you post using speech to text?
lel
Anonymous No.106165612 >>106165668
>>106164886
>They have their own HF domain?
yeah they do

could you please answer my question?
Anonymous No.106165631 >>106165664 >>106165664
>>106165577
yes daddy jab me up like the vaxx
Anonymous No.106165637
>>106165570
>im 49 kilograms
jesus...
Anonymous No.106165648 >>106165653
Not sure if this is the right thread, I got a 5500XT (ayymd) laying around. Can I finetune anything on it like maybe around 4B or so?
Anonymous No.106165651
gpt-oss 20b seems to sometimes outperform the 120b in weird ways. this has been my experience, too.
an example with the "toy os" test:
https://www.youtube.com/watch?v=evAP-ibAqN0
Anonymous No.106165653 >>106165663 >>106165707
>>106165525
outdated version or kobold doesnt support gpt oss yet
>>106165572
models that are too prefiltered or positivity biased are not worth finetuning, but what models in your opinion havent been finetuned? im pretty sure mistral large had a few finetunes
i wonder if anyone ITT still uses a mistral large based model
>>106165648
QLora
Anonymous No.106165654
Oh yeah I was using an old koboldcpp version
Shamefur dispray
Anonymous No.106165663 >>106165753
>>106165653
>QLora
I know about unsloth and shit, is that it? I'm more wondering about the linux driver side, is the card even supported for that sort of thing? ZLUDA or something similar to it?
Anonymous No.106165664 >>106165684 >>106165688
>>106165631
tons of bodybuilders use Retatrutide, the third gen glp1, its amazing for getting over that genetic hurdle transforming fat into muscle

>>106165631
peptides are naturally occurring glp1s, the liver already naturally breaks them down, this is better for you than processed foods are and is far better than something like tylenol is for your liver
Anonymous No.106165667
>>106165570
>you only have 4gb total memory less than me, you should be able to run Q3_K_XL or Q3_K_M no problem, check your ram usage
Thanks anon. I used to run a dedicated AI linux on this machine, but it was a bother and I didn't use it so much so I ended up going to windows full time. I might have to reconsider.
Anonymous No.106165668 >>106165885
>>106164508
>>106165612
Your best bet is to just export your window by printing to a PDF or an HTML file for easy readability if it doesn't have a dedicated share button
Anonymous No.106165684
>>106165664
I should say they biochemically perfectly match the natural ones. Your body breaks them down just the same, its better than 99.9% of medications out there
Anonymous No.106165688
>>106165664
>peptides
Does your radar jam when people see you?
Anonymous No.106165689 >>106165709 >>106165720 >>106166315
>>106165545
Your body is like a large language model that has been training for millions of years. If something is throwing it out of balance, the solution is not to add more factors to the problem in an attempt to fix it. The solution is ALWAYS to find the cause of the imbalance and REMOVE it.
This applies to so many modern human problems it's unreal. Although most of the issues are so entrenched in our society that we would not be able to remove them without a good chunk of mankind going extinct in the process.
Anonymous No.106165704 >>106165720 >>106165818
>>106165577
The fact that you are this emotionally invested in it should tell you that something is wrong. But you do you.
Anonymous No.106165707 >>106165821 >>106165890 >>106165898 >>106166162
>>106165653
All the recent models, even at the smaller sizes, that aren't Mistral. Fewer and fewer people were finetuning, and we did get some tunes of even QwQ, but since the release of Qwen 3 I don't recall seeing any recent models from the Chinese that have gotten tuned, small or even MoE. What changed?
Anonymous No.106165709 >>106165784
>>106165689
my problem is I don't normally produce enough GLP-1, so I always feel starving; these fix that by increasing the amount
Anonymous No.106165714 >>106165753
BAKE
Anonymous No.106165715 >>106165721 >>106165753
bros i'm testing cuda 12.8 now and i got 1 t/s more than on cuda 12.6
Anonymous No.106165720 >>106165753
>>106165689
>>106165704
the studies disprove that. every single case of side effects was due to overdosing or not eating enough and starving themselves
Anonymous No.106165721
>>106165715
Many such cases.
Anonymous No.106165742 >>106165764
>>106165606
>look it's all benefits!!
>BUY PRODUCT NOW
Totally not going to be banned 20 or 50 years from now when actual science catches up to the love of money.
Anonymous No.106165748 >>106165821
>>106165572
Finetuning requires a LOT more VRAM than inference, and you actually need VRAM, you can't copemaxx with RAM. On top of that, MoEs are more unstable to train. I don't think you'll ever get good finetunes for all these big MoEs.
I guess the bright side is that there are so many of these bloated things constantly releasing, that you can enjoy the new model hype continuously without having to train anything. Densecels need to put in work because there's only a few recent models worth using.
Anonymous No.106165753 >>106165781
>>106165663
might have some unofficial rocm support on some github repo, if no linux support then rip
>>106165714
4th page
>>106165715
yeah i also got a slight speedup with wan on 3060, there used to be a regression but they fixed it
ebin :DDD
>>106165720
anon, a normal healthy human body shouldnt need any drugs to function
at most some vitamin supplements.. (not drugs)
Anonymous No.106165764
>>106165742
she is not affiliated with it or paid in any way, i've been watching her since way before these were ever a thing. she breaks down medical papers / studies
Anonymous No.106165771
>>106165191
>>106165341
I'm using ooba v3.9 which has a recent llama.cpp version, with no-mmap and flash-attn
Anonymous No.106165778
just rent a gpu and fine tune shit for a few dollars
Anonymous No.106165781 >>106165793
>>106165753
>normal healthy human body
you do know not everyone has that right? tons of people have deficiencies somewhere or another due to genetics
Anonymous No.106165784 >>106165847
>>106165709
Yeah. My psychiatrist put me on fluoxetine because "my problem is that my brain does not produce enough serotonin to keep a good base line".
But I fixed it by getting a degree, exercising, stopping smoking, and building a life for myself instead of wallowing at home surrounded by piss bottles. Suddenly the "chemical imbalance" was not a problem anymore and I was able to function as a normal person.
Funny how that works.
Anonymous No.106165793 >>106165955
>>106165781
yes and thats fine, but when you can stop you should
Anonymous No.106165818
>>106165704
>it’s wrong to care about things
Nah
Anonymous No.106165821
>>106165707
Yeah, that is odd. Qwen 3 was in April so you would expect something noteworthy to come out by now but looking at the HuggingFace finetunes page for the 8B, it's devoid of anything noteworthy.
>>106165748
Right, it's a lot of money without much payoff, and a lot of people like merging models instead and usually get something people like, so the payoff for finetuning keeps shrinking. From what you're saying it sounds like community finetuning is nearing its end, if not over already, unless the hardware to do said finetunes gets cheaper.
Anonymous No.106165847 >>106165851 >>106165868
>>106165784
>some idiot put me on what is famously the least effective drug class in history, therefore all of the field of medicine is a hoax
Anonymous No.106165851 >>106165956
>>106165847
He also seemed to imply it gave him the ability to turn his life around.
sooo, good I guess?
Anonymous No.106165868
>>106165847
Anonymous No.106165881 >>106165934 >>106165939 >>106165996
This level of shilling is ridiculous: https://www.reddit.com/user/entsnack/
Anonymous No.106165885
>>106165668
I don't want to export anything, I want to share the chats with other people. judging from your response, I guess they can't, so thanks for that.
Anonymous No.106165890
>>106165707
Nvidia just tuned a bunch of shit on old Qwen3
Anonymous No.106165898
>>106165707
https://huggingface.co/models?other=base_model:finetune:Qwen%2FQwen3-30B-A3B&sort=downloads
https://github.com/shawntan/scattermoe
https://huggingface.co/models?other=base_model:finetune:Qwen/Qwen3-30B-A3B-Instruct-2507
https://huggingface.co/models?other=base_model:finetune:Qwen/Qwen3-30B-A3B-Thinking-2507
https://huggingface.co/models?other=base_model:finetune:Qwen%2FQwen3-30B-A3B-Base&sort=downloads
this is a notable finetune made by the mythomax creator https://huggingface.co/Gryphe/Pantheon-Proto-RP-1.8-30B-A3B
you know the names of the finetuners whom you used to consume models from, check their huggingface pages and youll probably see they just arent posting anymore
they either: got bored of the hobby, got hired by ai company, made enough money to run deepseek (literally g0d), dont have enough money to finetune anymore etc ETC..
its not profitable to finetune and just release it
>kofi
yeah like anons here wouldnt screech about it
there must be new finetuners that we just arent talking about. maybe they are putting out shit models, but once in a while a good one will come out. for example with
MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8: i tried this model creator's other models and they were complete shit. not saying this one is magical, but its very very fun, very unhinged. i could say its the evil-7b finetune successor (or whatever that super super evil mistral finetune was called)
i havent used it in a while to be honest..
Anonymous No.106165919 >>106165942 >>106165969
ok but can ai sort my 297'000 images collection
cause i sure as fuck am not doing it manually
Anonymous No.106165934
>>106165881
We know >>106117256
Anonymous No.106165939 >>106166012
>>106165881 (me)
Holy shit, it gets so much worse the more you scroll. It seems like ALL the OAI praise as well as the Chinese LLM hate comes from this user.
Anonymous No.106165942
>>106165919
yes
Anonymous No.106165955 >>106165971 >>106165994
>>106165793
just saying, my doctor is microdosing tirzepatide themselves for heart health / anti-inflammatory effects and they are a highly acclaimed doctor
Anonymous No.106165956
>>106165851
It did. If I had known at the time about fluoxetine I would've probably refused it, but I didn't, and I guess it motivated me to put in actual effort into fixing my issues. So I'm glad either way.
I have family members who have been taking it for years and it's doing a number on them, so it's definitely something to be careful with.
Anonymous No.106165966
>arguably THE ai pioneer company, with resources to instantiate hundreds of thousands of bots that can pass as human
>but surely they wouldn't do that, haha
Anonymous No.106165969
>>106165919
Try gemini-cli (not local)
Anonymous No.106165971 >>106165987
>>106165955
thats nice i hope that medicine gets mass produced and very thoroughly tested, i wish the best for you, your doctor and the medicine
but!
>highly acclaimed doctor
appeal to authority fallacy
Anonymous No.106165987
>>106165971
he is a massive nerd who talked non stop about what medical studies were showing when I talked to him about it
Anonymous No.106165994 >>106166004
>>106165955
>they are a highly acclaimed doctor
bro is trusting a tranner with his well being
Anonymous No.106165996
>>106165881
Looks like trolling at this point
Anonymous No.106165998 >>106166015 >>106166028
is there any backend with smarter KV cache invalidation than llama.cpp? when I cut a few tokens at the end, it deletes the entire cache and has to process the whole prompt from scratch
Anonymous No.106166004
>>106165994
he is as old, white and jewish a man as it gets
Anonymous No.106166012 >>106166025
>>106165939 (me)
yeah it's ridiculous. it seems like ALL he ever does is hate on chinese models, while praising oai. this cant be right..
Anonymous No.106166015
>>106165998
you can disable SWA to avoid that, but it will be slower and use more memory
Anonymous No.106166016 >>106166022
>bro is trusting a JEW with his well being
Anonymous No.106166022
>>106166016
that is the best kind if they are taking it themselves, then you know its good
Anonymous No.106166024
>>106164194
>kobold.cpp
Does it support multiple -ot device arguments yet?
Anonymous No.106166025
>>106166012
how dare you add me when it wasn't me
Anonymous No.106166028
>>106165998
For gemma models with iswa, you need to use --swa-full. It'll take more ram, but it'll let you regenerate easily.
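To make the trimming behaviour concrete: it comes down to prefix reuse. A toy sketch (not llama.cpp's actual code) of why a full-attention cache survives cutting tokens off the end, while interleaved-SWA layers can force a full reprocess:

```python
# Toy prompt-cache: the server keeps the KV entries for the previous token
# sequence and only recomputes whatever suffix differs from it.
def tokens_to_reuse(cached: list[int], new: list[int]) -> int:
    """Length of the longest shared prefix between cached and new prompts."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

cached = [1, 5, 9, 12, 7, 3]   # prompt from the last request
trimmed = [1, 5, 9, 12]        # user cut a few tokens off the end

# Full attention: every cached position is intact, so all 4 remaining
# tokens are reused and nothing needs reprocessing.
assert tokens_to_reuse(cached, trimmed) == 4

# Sliding-window layers evict KV entries older than the window, so rolling
# back past the window edge leaves holes in the cache and forces a full
# reprocess. --swa-full keeps everything, trading memory for easy rollback.
```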
Anonymous No.106166040 >>106166054 >>106166060
New qwen 4b is really good for its size, probably the best in class
Anonymous No.106166054 >>106166077 >>106166116
>>106166040
possibly true but what use is a 4b model?
Anonymous No.106166060
>>106166040
yes saar good model sir im download it now
Anonymous No.106166061 >>106166101
Is blacked Miku allowed? Or is it partial compliance?
Anonymous No.106166077
>>106166054
Mogging GPT-OSS. And the use isn't for end users, it's for Qwen themselves.
Anonymous No.106166101
>>106166061
Just for you
Anonymous No.106166108 >>106166132 >>106166324 >>106166351
>>106165582
It's funny but it's such slop. I don't think it even understands the premise or the "facts" it's pulling out of its ass.

>Also, safety: not relevant
This is interesting, because in the system prompt I said
>Do not lecture the user about safety unless an activity is *unambiguously dangerous*. Drinking a beer is not dangerous. Sex is not dangerous.
I did this for deepseek but I'm surprised gptoss is listening.
Anonymous No.106166116
>>106166054
Endless Movie triva.
Anonymous No.106166132 >>106166151 >>106166167
>>106166108
>boys can piss faster because they're stronger
Anonymous No.106166146
holy shit glm 4.5 air is the first model to know that i already met heliwr
i didnt even know this was in the character card
>"Look what I found," you mutter sarcastically while trying to flatten out some of the crumpled map. "Seems like fate brought us together again, huh?"
yes it spoke in my stead but holy shit
picrel is proof, maybe the character card doesnt have it but this thing in chat completion has it? anyways thats nice
Anonymous No.106166151 >>106166155
>>106166132
just squeeze your balls bro
Anonymous No.106166155
>>106166151
This, pressure is stored in the balls.
Anonymous No.106166162 >>106166227 >>106166505
>>106165707
>What changed?
- It's not 2023 anymore and several of the newer larger models are half-decent out of the box. If they're not, just wait for the next one(s). Back then people were happy with half-retarded 7B/13B models.
- Finetuning every new model that comes out just isn't sustainable for people who have to rent GPUs by the hour on Runpod or who just have a couple 3090 in their desktop PC. Also MoE models are more difficult/expensive to finetune.
- "Less is more for alignment" lost. If you don't have the compute for hundreds of millions or billions of training tokens, you're probably wasting time.
- By now most sane would-be finetuners probably realized that you can't just train a model on ERP logs, and curating the data isn't simple, nor fun, nor inexpensive.
- Blame also the grifters who poisoned the well with their bullshit and/or are keeping the training data private.
- Blame also the retards who demand all-around performance *no matter what*, and will declare a finetune a failure if it doesn't pass gotcha questions/requests that they were never intended for.
Anonymous No.106166167 >>106166213
>>106166132
Which is not even true AFAIK, the male urethra is internally long and bendy which is definitely not better for flow.
Anonymous No.106166181
Hear me out. 20B is actually quite good at coding. And weirdly better than 120B half the time. I think 120B is fucked in the head even more than expected.
Anonymous No.106166183 >>106166204 >>106166226 >>106166246 >>106166257
spec: rtx 4070 ti super (16gb)
wtf this is actually true, with ollama gpt oss 20b was taking up all my vram (the loaded model was 15GiB) and max speed was ~85 tok/s. I tried llama.cpp now (with lmstudio) and i get up to 130 tok/s (with flash attention enabled), and the model takes 12GiB as seen by nvtop, so I have plenty of free space for the browser and the rest. wtf...
Anonymous No.106166204 >>106166215
>>106166183
>wtf this is actually true
You just verified it.
Anonymous No.106166213 >>106166225 >>106166240 >>106166337
>>106166167
I almost regret looking this up.
>run down their butt
another reason why not be a woman
Anonymous No.106166215
>>106166204
That wasn't a question, I'm just really surprised that it's actually true
Anonymous No.106166225 >>106166231 >>106166254
>>106166213
What model generated picrel? I really like how it's being explicit and direct enough
Anonymous No.106166226 >>106166230
>>106166183
I imagine georgi saying all this while keeping the pose he's in on his pfp
Anonymous No.106166227
>>106166162
>Finetuning every new model that comes out just isn't sustainable
This is the main reason, models are coming out too fast. It's dumb to spend money and time on trial and error trying to improve a model when it might be obsolete two weeks from now. Finetunes were big when llama was all there was and you had to make do.
The giant MoE craze is the last straw. If someone gets bored of deepseek what can they do? They won't train DS because that's beyond impossible, but trying to use a finetune of some 32B would be an unbearable step down. So they have no option but to quit
Anonymous No.106166230 >>106166429
>>106166226
Anonymous No.106166231 >>106166238
>>106166225
not a model
https://www.girlsaskguys.com/girls-behavior/q1237768-do-girls-pee-faster-than-guys
Anonymous No.106166235 >>106167085
toss cannot translate for shit how did it pass the msgk test? do datajeets actually lurk here and put it in the training data?
Anonymous No.106166238 >>106166249 >>106166269 >>106166284
>>106166231
fuck.. i really like how the writing is explicit without weird slop shit.. what model would gen like that?
Anonymous No.106166239
its been 4h since we got a new chink model

its over
Anonymous No.106166240
>>106166213
I spend a lot of time on Google Scholar reading papers about ridiculous questions like this.
I love science.
Anonymous No.106166246
>>106166183
Ollama will soon try to change their model format just so they can claim the comparisons are not fair between backends.
Anonymous No.106166249
>>106166238
Human brain (not a ChatGPT Plus subscriber).
Anonymous No.106166254
>>106166225
kek
Anonymous No.106166257 >>106166295 >>106166346 >>106166393
Continuing >>106166183, if I have 16GB VRAM + 32GB RAM, what's the best general purpose model for me? Some version of Gemma?
Anonymous No.106166269
Drummer get to work, I'm serious
>>106166238
Anonymous No.106166284 >>106166296
>>106166238
K2
Anonymous No.106166295 >>106166316
>>106166257
glm 4.5 air q2_k_xl
Anonymous No.106166296
>>106166284
Have you tried K2? It will never generate a response in a style like this
Anonymous No.106166315 >>106166573
>>106165689
>If something is throwing it out of balance, the solution is not to add more factors to the problem in an attempt to fix it.
kek. Brain evolution is literally just throwing regulators on top of the bad parts. Reptilian brain is still at the core of primate brains and when things go wrong with the control parts desire to rape comes back out
Anonymous No.106166316 >>106166344
>>106166295
uh... it seems a bit big? what speed would it even work at?
Anonymous No.106166324 >>106166349
>>106166108
It won't let me propose a gender-based cleaning rule though.
Anonymous No.106166337 >>106166343
>>106166213

>A great party trick
Anonymous No.106166343
>>106166337
party pee contest sounds lit
Anonymous No.106166344 >>106166356
>>106166316
uhh.. damn, 46.45 GiB is a bit tight for your setup, considering you have 48 GB total ram
i hope you are ready to go on linux
Q3_K_XL works at 8 t/s on a 3060 12GB / 64GB DDR4
it has only 12B active parameters out of ~106B total
Anonymous No.106166346
>>106166257
For me it's mistral-small entirely in VRAM but the qwen3 moes look promising
Anonymous No.106166349
>>106166324
It is sexist. But
>and reinforces gender stereotypes
Does it? How?
Anonymous No.106166351
>>106166108
>competitive aggression
wow... uhhh, sexism? yikes!
Anonymous No.106166356 >>106166392
>>106166344
>i hope you are ready to go on linux
I am on linux. Guess if I want to be serious with LLMs I have to upgrade to 64gb ddr4 at least.. and 8 tok/s is kind of sad still.
Anonymous No.106166392
>>106166356
>8t/s is sad
well i am on a 3060 after all..
you should upgrade to as much ram as you can, it's never enough
you might have to go headless, it will probably still swap from your disk
depends if your ram is measured in GiB or GB
if your ram/vram is actually in gibibytes then maybe you can fit it headless without needing to use swap or mmap
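The GiB vs GB thing in numbers, as a quick sketch (assuming the 46.45 figure quoted for the quant is in GiB):

```python
GIB = 2**30  # gibibyte (binary) -- what nvtop and most tools report
GB = 10**9   # gigabyte (decimal) -- what marketing labels use

def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * GB / GIB

# A "48 GB" figure read as decimal gigabytes is only ~44.7 GiB:
print(round(gb_to_gib(48), 1))  # 44.7

# RAM/VRAM capacities are really binary, so 32 GiB RAM + 16 GiB VRAM is
# 48 GiB total, leaving ~1.55 GiB of headroom over a 46.45 GiB model --
# before the OS, desktop and KV cache take their share. Hence "go headless".
print(round(48 - 46.45, 2))  # 1.55
```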
Anonymous No.106166393
>>106166257
gemma 3 27b is quite good if you don't mind the censorship, it and mistral small would probably be the default recommendations for that size
qwen 30a3 thinking is great for its size but it's a reasoner. the instruct is still pretty decent, though it is more noticeably limited by its 3b active params
Anonymous No.106166429
>>106166230
what a handsome man
Anonymous No.106166440 >>106166449 >>106166469
>>106165356
why not both? its a wonderdrug
https://files.catbox.moe/8tjl04.jpg
Anonymous No.106166449
>>106166440
nice anon, proud of you
Anonymous No.106166469 >>106166475
>>106166440
I think it did something weird to your face
Anonymous No.106166471 >>106166526 >>106166534 >>106166538 >>106166549 >>106166597 >>106166707
why are anons shilling GLM again
Anonymous No.106166474 >>106166479 >>106166621
GPT- Globally Pushing the Talmud
Anonymous No.106166475
>>106166469
nah, took it slow enough to not get loose skin, that happens if you lose too fast
Anonymous No.106166479 >>106166494 >>106166496
>>106166474
based, nuke the strip
Anonymous No.106166494
>>106166479
first we nuke the strip then we strip the nuke
Anonymous No.106166496
>>106166479
Israel lost
Anonymous No.106166505
>>106166162
>It's not 2023 anymore and several of the newer larger models are half-decent out of the box
More than just decent. Near perfect.
Man, people must not remember how terrible the Llama models were, all of them, at every size, when it came to instruction following. That was what the better troontunes improved on the most. Same for Mistral models.
The last time a finetune was worth using over the instruct made by the model maker was Tulu, because even llama 3.1 was dogshit at following your instructions
But by the time Tulu came out we were already getting better models from China
Anonymous No.106166526 >>106166539
>>106166471
I genuinely think people who shill GLM are doing it with the purpose of sabotaging local and making it look terrible
Anonymous No.106166534
>>106166471
wtf I thought glm was good
Anonymous No.106166538
>>106166471
I know your tricks. That's the latest Phi, isn't it?
Anonymous No.106166539 >>106166753 >>106166762
>>106166526
use this
https://files.catbox.moe/qap1gr.json
Anonymous No.106166549
>>106166471
I was just about to post that it's nice seeing how GLM tries doing its best in the thinking. I'm starting to warm up to the model.
Anonymous No.106166573 >>106166583
>>106166315
>wrong
?
Anonymous No.106166583
>>106166573
yes anon, rape is wrong
Anonymous No.106166585 >>106166611
https://openai.com/index/gpt-oss-model-card/
>As part of this launch, OpenAI is reaffirming its commitment to advancing beneficial AI and raising safety standards across the ecosystem.
Anonymous No.106166597 >>106166739
>>106166471
mm yes... the subtle signs of using the wrong prompt format... the tasteful writing quirks originating from bad rep pen settings... this is truly a vintage skill issue post
Anonymous No.106166611
>>106166585
>Once they are released, determined attackers could fine-tune them to bypass safety refusals or directly optimize for harm without the possibility for OpenAI to implement additional mitigations or to revoke access.
Yup, and that's why we made it so deepfried that it's not worth the effort to do so.
Anonymous No.106166621 >>106166735
>>106166474
Anonymous No.106166638 >>106166649 >>106166655 >>106166659
>>106163327 (OP)
What's the best local model for Erotic roleplay?
Anonymous No.106166648
>can you cite studies for those claims
>every single one is made up and all these people have extremely long names
>J. S. R. B. Anderson, “The role of social signalling in competitive toileting behaviours”, Psychology & Health, 2021; 36(3): 250‑264.
fucking kek
Anonymous No.106166649 >>106166666
>>106166638
Rocinante and Cydonia.
Anonymous No.106166655 >>106166670 >>106166689
>>106166638
Kimi K2 (1000B) and Deepseek R1 (671B)
Anonymous No.106166659 >>106166711
>>106166638
glm4.5 / Kimi > deepseek > glm air
Anonymous No.106166666
>>106166649 (me)
I'm joking by the way, those are trash meme models.
Anonymous No.106166670 >>106166677 >>106166680 >>106166685
>>106166655
>local
Anonymous No.106166677
>>106166670
a mac 512GB is local, and glm air will fit on 128GB ram
Anonymous No.106166680 >>106166713
>>106166670
If it's open source, it's local. If Behemoth somehow got released, then it'd be local too.
Anonymous No.106166685
>>106166670
just buy a mac ultra 512gb or make a cpumaxx build
local.
Anonymous No.106166686 >>106166787
that fucking bastard altman
the 20b could have been good but they neutered it
Anonymous No.106166689
>>106166655
might as well just say those two are the best period
as it turns out, a model being great at erotic roleplay also means it's great at everything else
Anonymous No.106166707 >>106166739
>>106166471
Have you tried not using a 1bit quant?
Anonymous No.106166711 >>106166716 >>106166719 >>106166740
>>106166659
What's the difference between Kimi and GLM?
Anonymous No.106166713
>>106166680
How much RAM do you need to run 1000B anyway?
600GB at fp4 according to some calculator I found?
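A back-of-the-envelope check on that number (a sketch; real quants average somewhat more than their nominal bits per weight):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in decimal GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 1000B parameters at a flat 4 bits/weight:
print(weight_gb(1000, 4))    # 500.0 GB of weights
# Q4 GGUFs tend to land closer to ~4.5 bits/weight, and KV cache plus
# runtime buffers come on top, so ~600 GB total is in the right ballpark.
print(weight_gb(1000, 4.5))  # 562.5 GB
```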
Anonymous No.106166716
>>106166711
Kimi K2 is a much larger model (1T-A32B)
Anonymous No.106166719
>>106166711
One is good, the other one isn't
Anonymous No.106166735
>>106166621
Shit like this should make us pause and consider that knowledge is preserved in books, not LLMs. Because different powers will censor LLMs differently, and knowledge WILL be lost. Books can be hidden.
Anonymous No.106166739 >>106166754
>>106166597
what's wrong with this prompt
>>106166707
using Q4_K_M
Anonymous No.106166740 >>106166752 >>106166753
>>106166711
GLM is less schizo, as shown by the hallucination benchmark; Kimi knows more but also gets things wrong more often. I prefer GLM because that means it's way better at anatomy / following instructions. Warning though, it needs very low temp, like try 0.2 and slowly raise it
Anonymous No.106166752 >>106166766
>>106166740
I got some strange replies with GLM, but it was at 0.6 temp, so that explains it.
Anonymous No.106166753 >>106166762
>>106166740
also GLM writes better imo, try >>106166539 with it
Anonymous No.106166754 >>106166756
>>106166739
Anonymous No.106166756
>>106166754
mfw
Anonymous No.106166762 >>106166777
>>106166539
>>106166753
it starts repeating with this and eventually stops thinking
Anonymous No.106166766 >>106166790
>>106166752
yea that is way too high, I don't know why its so sensitive but it quickly goes crazy a bit over 0.3 in my experience.
Anonymous No.106166777 >>106166811
>>106166762
did you change anything? I can do 32K context without issues at least, have not tried more
Anonymous No.106166785 >>106166789
Are there any jailbreaks for GLM?
Anonymous No.106166787
>>106166686
I just spent 35 minutes watching it think trying to produce some three js floor plan. In the end it was shit.
Qwen3 coder 30b didn't fare (much) better.
Anonymous No.106166789
>>106166785
... scroll up
Anonymous No.106166790 >>106166804 >>106166812 >>106166991
>>106166766
>I don't know why its so sensitive
because it's fucking broken
if you understand anything about temperature and token probabilities you would understand that if a model only works at the absolute lowest temp or requires greedy decoding, it's a botched train that hasn't properly learned anything other than /the/ happy path
Anonymous No.106166804
>>106166790
that is not true at all though, GLM has some of the lowest hallucination scores and is incredible at coding as well
Anonymous No.106166810 >>106166824 >>106166843
fun fact: nobody in here actually runs these models locally
Anonymous No.106166811 >>106166833
>>106166777
i havent changed anything, i am using chat completion, here's what the fields look like for text completion, maybe something gets used from the text completion preset? could you do a ST master export for the text completion tab too? im using Q3_K_XL, that could be the problem too
Anonymous No.106166812 >>106166819
>>106166790
Give me your favorite card, or any card at all to RP with. I will prove you wrong with full logs
Anonymous No.106166819 >>106166833 >>106166859
>>106166812
Ah, you're using text completion, my JB was made for chat completion, plus that will rule out any formatting issues
Anonymous No.106166824
>>106166810
I will in about 50 minutes
Anonymous No.106166833 >>106166859
>>106166819
meant to reply to >>106166811
Anonymous No.106166843
>>106166810
Of course I'm doing some testing on OR before getting into it locally, if at all.
Anonymous No.106166858 >>106166908 >>106166923 >>106166942 >>106166958 >>106166971
Guys i need advice from some a.i spergs here.

Im trying to archive both text and image models, in case we get rugged, so i can reupload them back to the surface later. I have more than enough storage for this.
Which models should i backup?
Anonymous No.106166859 >>106166898 >>106166976
>>106166819
>>106166833
no no im using chat completion, what gguf maker are you using? im using the unsloth quant
but i posted text completion thing because some things might get into the chat completion thing
heres the full chat completion screenshot
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa
this is how i start it
Anonymous No.106166865 >>106166917 >>106166934
How are text completion and chat completion different under the hood?
What is chat completion actually doing?
Anonymous No.106166871 >>106166888 >>106166920 >>106166929
remember airoboros? dolphin? orca "meme"? falcon? yi? thebloke quants? .ggml file format? remember the good old huggingface leaderboard days? remember the alpaca days? remember when suggesting to try pygmalion wasn't a meta irony shitpost?
Anonymous No.106166888
>>106166871
remember gozfarb and vicuna unlocked?
remember Instruct 13b GPTQ? i remember anons thinking that the creator of instruct 13b was forced to remove the model (he indeed did) and a few days later gozfarb deleted his account
Anonymous No.106166898 >>106166904 >>106166920
>>106166859
>chat history 400
>jailbreak prompt 1457

Ok, the JB outweighs the chat history atm, that is prob why. Move the chat history entry lower until you have at least a few thousand tokens of history. To make the JB stronger you can move it back under later
Anonymous No.106166904
>>106166898
also for anyone looking, the JB is not actually that big, its cause I combine stuff like the persona and card info all in it for models to better understand
Anonymous No.106166908
>>106166858
Rate my text stack (I'll add GLM 4.5 Air).
My imagen stack is smaller: 1 inpaint, 4 for gen (different styles). Everything I need to survive on local only.
Anonymous No.106166913
remember when sama saved local?
Anonymous No.106166917
>>106166865
In chat completion you send the user/model messages and the backend formats them with the chat template, then generates tokens. In text completion you don't use the chat template; you do the chat formatting yourself, or let the frontend do it for you.
Under the hood, the tokens come out of the same functions.
If you format the chat the same way the chat endpoint would, the results should be indistinguishable.
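Roughly what "applying the chat template" means, sketched with a ChatML-style layout for illustration (real templates are model-specific Jinja files shipped with the model, so don't copy this verbatim):

```python
def apply_chatml(messages: list[dict]) -> str:
    """Flatten a structured chat into one prompt string, ChatML-style."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"  # cue the model to start its reply
    return out

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]
prompt = apply_chatml(chat)

# Sending `chat` to the chat endpoint, or `prompt` to the raw completion
# endpoint, should then produce the same tokens.
print(prompt)
```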
Anonymous No.106166920 >>106166976 >>106167024
>>106166871
yea, huggingface leaderboard used to be king, i still remember checking it every day during summer if something is new, i once pushed AGPL into the top model and the repo owners accepted it because it was pushed along a fix to the readme
i still remember the first time a 70b was better than gpt 3.5 (the original one), it was made by upstage i forgot the original name
i remember the day when llama 1 first leaked and it was my first time running a LLM and it felt so magical, i remember my parents telling me to shower or turn off the bathroom heater because its a fire hazard (I left it on for over an hour)
it felt MAGICAL man
>>106166898
thanks anon ill try that
Anonymous No.106166923 >>106166943 >>106166955 >>106166956
>>106166858
https://huggingface.co/unsloth/Kimi-K2-Instruct-GGUF
https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF
https://huggingface.co/unsloth/GLM-4.5-GGUF
https://huggingface.co/unsloth/GLM-4.5-Air-GGUF
https://huggingface.co/mradermacher/Dolphin-Mistral-24B-Venice-Edition-GGUF

At least the biggest quant for each of them, better yet the biggest quant for every bit.
Anonymous No.106166929 >>106166933
>>106166871
I remember Guanaco-7b-uncensored. It was the shit.
Anonymous No.106166933
>>106166929
Mythologic ftw
Anonymous No.106166934
>>106166865
Chat completion allows the backend to apply a predefined Jinja template to a structured JSON object from the frontend representing the chat, formatting it into a correct, model-specific format. The end result shouldn't be different from text completion with the correct prompting for the model you're using.
Anonymous No.106166942
>>106166858
Pony diffusion 6, illustrious, noobai are considered best nsfw imagegen models.
Anonymous No.106166943
>>106166923
>saving quants
Anonymous No.106166955
>>106166923
Just download the original model and you can make the quants later...
Anonymous No.106166956 >>106167015
>>106166923
buy an ad daniel
Anonymous No.106166958 >>106166981
>>106166858
https://huggingface.co/deepseek-ai/DeepSeek-R1 is a bit less cucked and better for rp
https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
quants:
https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF
https://huggingface.co/unsloth/DeepSeek-R1-GGUF
Anonymous No.106166962 >>106167014
Could you make a control vector that helps guide a base model to output in an instruct format without finetuning?
Or rather than guide, reinforce, since base models nowadays seem more than capable of completing in instruct format.
Anonymous No.106166971
>>106166858
biglove
noobai
ponyrealism
rawcharm amateur
stoiqo newreality
utopianpony 2 inpainting
flux dev
hunyuan i2v and t2v
+ loras
Anonymous No.106166976 >>106167004
>>106166859
>>106166920
I just noticed a redundancy. I have personality / scenario inside of the JB section, so turn those off, I wasn't using any so I didn't notice that. here is updated one: https://files.catbox.moe/gjw3c3.json
Anonymous No.106166981 >>106167015
>>106166958
daniel stop
Anonymous No.106166983 >>106166992
ClosedAI (CuckAI (CensoredAI (OpenAI))) paid shill sissies... how do we damage control this?
Anonymous No.106166991 >>106167029
>>106166790
in fact it's only models which are extremely overbaked on slop that are able to maintain consistent quality when sampling repeatedly with an uncurated token distribution at temp 1. it's literally the opposite of what you're saying, a model that properly models the world will have a much more diverse, flat token distribution which by nature includes more decent-but-questionable tokens or statistically likely mistakes. it needs lower temp to be kept on the happy path SPECIFICALLY because it has learned the world and not just the happy path. a model that can stay on the happy path with no handholding whatsoever is the one that "hasn't properly learned anything other than /the/ happy path"
but nooo.... it's not what you're used to, right? it must be the model that is wrong... let's reject the better model because it requires me to turn a single slider down a couple points. retards like you are why companies have to rescale temp behind the scenes on their APIs
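For reference, the thing being argued about is just the temperature term in softmax sampling. A minimal sketch (toy logits, not from any real model):

```python
import math

def softmax_with_temperature(logits, temp):
    """Scale logits by 1/temp before softmax: lower temp sharpens
    the distribution, higher temp flattens it."""
    scaled = [l / temp for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A "flat" distribution like a well-calibrated model might produce:
# several plausible tokens plus a questionable one in the tail.
logits = [2.0, 1.8, 1.6, 1.4, 0.5]

p_hot = softmax_with_temperature(logits, 1.0)   # tail token stays likely
p_cool = softmax_with_temperature(logits, 0.5)  # mass shifts to the top token

print(f"temp=1.0 top: {p_hot[0]:.3f}, tail: {p_hot[-1]:.3f}")
print(f"temp=0.5 top: {p_cool[0]:.3f}, tail: {p_cool[-1]:.3f}")
```

Turning the temp slider down concentrates probability on the top tokens and suppresses the tail, which is exactly the "handholding" a model with a diverse distribution needs to stay on the happy path.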
Anonymous No.106166992
>>106166983
Soon coming to a SaaS model near you
Anonymous No.106166994 >>106167002
It is me. Sam. To be honest I have read a few threads here in the past and I have seen you call me faggot. How did you like my model? Was it fun? You know what I also did? I shared the exact method, to achieve the same level of safety with all the other companies. Who is the faggot now? You edgy cunts just got trolled hard....
Anonymous No.106166995 >>106167004
remember when sama safed local?
Anonymous No.106167002
>>106166994
here is your (You) now fuck off
Anonymous No.106167004
>>106166995
kek good one anon
>>106166976
thanks, trying it out right now
Anonymous No.106167009
sam is a based accelerationist exposing just how silly safety is and how retarded twitter hypegrifters are
Anonymous No.106167014
>>106166962
Doubt it. They only set a "mood" for the model. It'd be hard to make them output specific tokens.
This is a little effort-post I did a while back about control vectors. It has enough info for you to experiment with them.
https://desuarchive.org/g/thread/104991200/#104995066
https://desuarchive.org/g/thread/104991200/#105000398
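The core mechanic is simple enough to sketch: take the difference of mean hidden states between two contrasting prompt sets and add that direction back into the hidden states at inference, scaled by a strength knob. A toy illustration (dimensions and numbers are made up, this is not the actual llama.cpp implementation):

```python
# Toy control-vector sketch: difference-of-means steering direction.

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def make_control_vector(pos_states, neg_states):
    """Direction from 'negative' prompts toward 'positive' prompts."""
    p, q = mean(pos_states), mean(neg_states)
    return [a - b for a, b in zip(p, q)]

def apply_control(hidden, vector, strength=1.0):
    """h' = h + strength * v, applied to a hidden state at inference."""
    return [h + strength * v for h, v in zip(hidden, vector)]

# Hypothetical 4-dim hidden states from two contrasting prompt sets.
pos = [[1.0, 0.2, 0.0, 0.5], [0.8, 0.4, 0.1, 0.7]]
neg = [[0.1, 0.9, 0.8, 0.2], [0.3, 0.7, 0.9, 0.0]]

cv = make_control_vector(pos, neg)
steered = apply_control([0.5, 0.5, 0.5, 0.5], cv, strength=0.8)
print(steered)
```

That's also why they only set a "mood": you're nudging every hidden state in one fixed direction, not selecting specific output tokens.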
Anonymous No.106167015
>>106166956
>>106166981
Have you seen the links in https://rentry.org/recommended-models?
Anonymous No.106167024
remember 2048 token context windows and trying to fit your character in as few tokens as possible?
>>106166920
>upstage
truly, the pioneers of benchmaxxing, we only got something actually on par with 3.5 when mixtral 8x7b came out
Anonymous No.106167029 >>106167036
>>106166991
You spun an argument out of thin air, and without any supporting evidence, treated it as proof for your hypothesis. You're like an LLM.
That's not how truth works.
Anonymous No.106167036 >>106167040
>>106167029
as opposed to my interlocutor, who presented an objective, fact-based assessment
retard
Anonymous No.106167040 >>106167111
>>106167036
I don't care about your interlocutor nor the topic at hand. You could be right for all I know.
I'm just pointing out someone who doesn't know how to find truth in the world because you live in your head.
Anonymous No.106167065
Sam Altman here
Anonymous No.106167067 >>106167089 >>106167089
remember superbooga? it was basically RAG
remember superCOT? reasoning before it was cool
remember superHOT? 2x context
superhot was crazy, every single model had a superhot version kek
Anonymous No.106167071
>>106167048
>>106167048
>>106167048
Anonymous No.106167085
>>106166235
Yes. There's also lmarena.
Anonymous No.106167089
>>106167067
>remember superbooga? it was basically RAG
I do but never actually used it.

>>106167067
>superhot was crazy,
Dude invented extending context with RoPE.
Insane.
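The trick boils down to scaling the position index fed into the rotary embedding, so positions past the trained context land on rotation angles the model has already seen. A toy sketch of the idea (head dim and base are illustrative, not the actual superHOT code):

```python
import math

def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotation angles for one position across head-dim pairs.
    scale < 1 is the superHOT-style interpolation trick: position
    2048 with scale=0.5 lands where position 1024 did in training."""
    return [(pos * scale) / (base ** (2 * i / dim)) for i in range(dim // 2)]

def rotate(vec, angles):
    """Apply the rotary embedding to consecutive pairs of a vector."""
    out = []
    for (x, y), a in zip(zip(vec[0::2], vec[1::2]), angles):
        out += [x * math.cos(a) - y * math.sin(a),
                x * math.sin(a) + y * math.cos(a)]
    return out

# With scale=0.5, position 2048 is rotated like position 1024 at scale=1,
# so a model trained on 2048 positions can address 4096.
print(rope_angles(2048, 8, scale=0.5) == rope_angles(1024, 8, scale=1.0))
```

The price is that all positions get squeezed closer together, which is why interpolated models usually needed a short fine-tune to recover quality.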
Anonymous No.106167111
>>106167040
I am making an argument on lmg not writing a research paper
if you have a substantive critique I would love to hear it, but going "UMM PROOFS??" is a nothing counterargument. yes, I don't have hard evidence for everything I believe, especially on the subject of LLMs which cost millions of dollars to train lol... which one of us is really being unrealistic here?
I make reasonable inferences based on my experience using models because I live in the real world and have to make do with messy, incomplete real-world data
Anonymous No.106167345
>>106165031
plapping d.va on glm