/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>106159744 & >>106156730

►News
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1
>(08/05) TabbyAPI adds logprobs support for exl3: https://github.com/theroyallab/tabbyAPI/pull/373
>(08/04) Support for GLM 4.5 family of models merged: https://github.com/ggml-org/llama.cpp/pull/14939

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106159744

--Fundamental CUDA scheduling limitations in MoE model inference with dynamic workloads:
>106159804 >106159879 >106159941 >106159892 >106159939 >106160442 >106160454 >106160634 >106160687 >106160697 >106161203 >106161244 >106161319 >106161343 >106161716 >106161772 >106160704 >106160773 >106160797 >106160960 >106161088
--Debate over GLM4.5's reliability amid claims of infinite generation and poor curation versus low hallucination performance:
>106161761 >106161773 >106161797 >106161919 >106161925 >106161926 >106161933 >106161974 >106161987 >106161997 >106161780 >106161826 >106161861 >106161915
--Debate over MXFP4 quantization efficiency and implementation in llama.cpp:
>106160230 >106160249 >106160378 >106160405 >106160434 >106160408 >106160455 >106160770
--gpt-oss-120b excels at long-context code retrieval despite roleplay limitations:
>106159798 >106159872 >106159895 >106159919
--Choosing between GLM-4.5 Q2 and Deepseek R1 with dynamic quants on high-RAM system:
>106160040 >106160056
--Comparison of TTS models: Higgs, Chatterbox, and Kokoro for quality, speed, and usability:
>106161046 >106161091 >106161164 >106161335
--GLM-4.5 Air praised for local performance, gpt-oss-120b criticized for over-censorship:
>106159855 >106159875 >106159908 >106159929 >106159946 >106159956
--Prompt-based agent modes with potential for structured grammar improvement:
>106161701
--Anons await next breakthroughs in models, efficiency, and affordable hardware:
>106160460 >106160477 >106160481 >106160487 >106160494 >106160508 >106160524 >106161134 >106161055 >106161071 >106160717
--Skepticism and mockery meet Elon's claim of open-sourcing Grok-2:
>106160521 >106160539 >106160545 >106160579 >106160608 >106160692 >106160744 >106160759 >106160784 >106160913
--DeepSeek V3 with vision shows strong image understanding in early tests:
>106159779 >106159794 >106160580 >106160631
--Miku (free space):
>106160040 >106161134

►Recent Highlight Posts from the Previous Thread: >>106159752

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
the only thing that excites me about the possibility of the grok2 release is actually grok2-mini. I'm gonna guess the full-sized grok2 model will be a 1T-A100B model with the IQ of llama3
>>106161679>her voice a gutteral, erotic promise
remember this?
>our research team did something unexpected and quite amazing and we think it will be very very worth the wait
LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOL
>>106163350So when it says stuff like "the policy says X is okay. The policy says Y is forbidden", is it actually referencing a specific document?
>>106163389i hope it's a promise of something darker, more primal
so what are you guys actually doing with these massive models??
Are there video local models yet or does that still need supercomputers?
cockbench is now officially reddit culture with 555 updoots
https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507
>>106163403Probably not. That's just what the training examples looked like. And over enough iterations that blurred together with training examples consisting of scraped forum posts like "YOUR POST HAS VIOLATED POLICY NUMBER BLAH BLAH BLAH, PAGE 3 OF THE SITEWIDE RULES" etc.
The rulebook doesn't actually exist.
>>106163445wow for once something trickled down from here instead of the reverse
>>106163454What is
> 2507?
>>106163445Drummer, the creator of cockbench, got even more updoots on his post.
https://www.reddit.com/r/LocalLLaMA/comments/1migl0k/gptoss120b_is_safetymaxxed_cw_explicit_safety/
>>106161701
>I think we could do that a lot better using json schema/BNF grammar.
It seems to work this way already if tool_choice is set to required, at least in vLLM:
guided_decoding = GuidedDecodingParams.from_optional(
json=self._get_guided_json_from_tool() or self.guided_json,
regex=self.guided_regex,
choice=self.guided_choice,
grammar=self.guided_grammar,
json_object=guided_json_object,
backend=self.guided_decoding_backend,
whitespace_pattern=self.guided_whitespace_pattern,
structural_tag=self.structural_tag,
)
There's a function called "_get_guided_json_from_tool".
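For the curious, here's a rough sketch of what a helper like that plausibly does; this is a hypothetical reimplementation of the idea, not vLLM's actual code. When `tool_choice` names a specific tool, the tool's declared parameter schema becomes the guided-json constraint, so the sampler can only emit arguments matching it:

```python
# Hypothetical sketch of the _get_guided_json_from_tool idea;
# NOT vLLM's actual implementation.

def guided_json_from_tool(tools, tool_choice):
    """If tool_choice forces a specific tool, return that tool's JSON
    parameter schema so decoding can be constrained to it."""
    if not isinstance(tool_choice, dict):
        # "auto" / "none" / "required" don't pin down a single schema here
        return None
    wanted = tool_choice.get("function", {}).get("name")
    for tool in tools:
        fn = tool.get("function", {})
        if fn.get("name") == wanted:
            return fn.get("parameters")  # this schema guides generation
    return None

# OpenAI-style tool definition (example data)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
choice = {"type": "function", "function": {"name": "get_weather"}}
schema = guided_json_from_tool(tools, choice)
```

The returned schema would then be passed as the `json=` argument of the guided decoding params, the same way the snippet above shows.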
how do I use mikupad with ollama
>>106163327 (OP)Are the gpt oss models any good?
every time I hear something new about or from anthropic and claude it sounds more and more like an actual sect slash cult
https://news.ycombinator.com/item?id=44806640
>Anthropic has a tough alignment interview. Like I aced the coding screener but got rejected after a chat about values. I think they want intense people on the value/safety side as well as the chops.
>got rejected after a chat about values
>A CHAT ABOUT VALUES
>>106163505Why do you need Mikupad? Just type in ollama run gpt-oss and enjoy the best local has to offer.
>>106163392He was right. The memes were awesome.
>>106163543I want full control over the chat template and modify model responses
>>106162954based professional LLM rapist.
this is also the fate of safetymaxxed le cunny daughter model
Reddit has all the cool benchmarks like the spinning hexagon and cockbench. What did you lonely faggots ever contribute?
>>106163505use the openai api and check "chat completion api" because ollama doesn't really work with the classic completion on their OAI endpoint
you will lose a lot of what makes mikupad great, including the ability to see token prediction percentages
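To illustrate, a minimal sketch of a chat-mode request against ollama's OpenAI-compatible endpoint (URL and port are ollama defaults; the model name "llama3.1" is just an example, use whatever `ollama list` shows):

```python
import json

# Sketch of a chat completion request to ollama's OpenAI-compatible
# endpoint. URL/port are the ollama defaults; model name is an example.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model, messages, temperature=0.7):
    # In chat mode the server applies the model's chat template to
    # `messages` itself, which is exactly why mikupad loses raw
    # template control compared to plain completion.
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": False,
    }

body = build_chat_request("llama3.1", [{"role": "user", "content": "hi"}])
payload = json.dumps(body)

# To actually send it (requires a running ollama server):
# import urllib.request
# req = urllib.request.Request(OLLAMA_URL, payload.encode(),
#                              {"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```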
>>106163517they are good at answering AIME questions and bad at literally everything else
>>106163586Being able to do that would give you the ability to circumvent safety protocols which would be incredibly unsafe. I cannot help you take any actions that may be dangerous.
Thank you for your understanding.
>>106163539I wonder what they're looking for.
I'm okay with the idea of making an effort to make it so models in their default assistant configurations don't tell people to commit violence or kill themselves, or give bad advice.
But if they want me to tell them that I think fiction is reality and we need to make sure nobody even does pretend violence, I can't get with that.
>>106163505
>install mikupad
>hook it to ollama via ollama's exposed API
What about the above isn't working for you?
>>106163591you won't get pennies for your blue checkmark by ragebaiting here randeep
>>106163539are corpos going to compete on safetymaxxing now
truly, only local can save local at this point
https://github.com/sapientinc/HRM
https://arxiv.org/pdf/2506.21734
Nothingburger, or is it really a big leap? Seems like it is, but I haven't read the paper myself, I'm too lazy. Some people have been saying it's one of those situations where they train the model specifically to perform well on the tests for optics. But still, all that in just 27 million params?!
>>106163517They are the safest ever.
>>106163539Honestly I think it’s nice. It’s completely invalidated by them working with the DOD, but it’s nice. Better a well meaning schizo than a literal confirmed incestuous child rapist psychopath.
>>106163605Yeah I just saw the cockbench. I was only interested in it for coding, but if it’s lobo’d it’s going to be worse at everything else too.
After doing some more testing I've found the 20B is incrementally better than most other models in its size class, while falling slightly short of Qwen 30BA3B and having far longer context. It's actually decent as long as you don't want to goon and don't mind the odd regen.
>>106163659if he hooked it up without using the chat completion endpoint, it's broken. Ollama only supports chat completion on their OAI endpoint. Chat completion means it's ollama that handles your message roles, and you can't alter the chat template from mikupad
>>106163659Yeah, I deleted that post once I realized ollama's just not going to work for him.
Are tool calls working with gpt-oss in llama.cpp? When I tried it yesterday with a simple echo tool it kept crashing with runtime_errors.
>>106163586just use llamacpp server
>gpt-oss-120b & gpt-oss-20b
The thread summaries made these seem pretty fucking shit. Are they shit?
>>106163637I don’t even understand what modality it is. It’s not an LLM.
>>106163654
>After doing some more testing I've found the 20B is incrementally better than most other models in its size class
I would take Gemma 3 27B over it any day
or even Qwen 14B if I don't need a lot of knowledge in the model for the prompt
the only utility of 20b is being fast at genning the wrong answer
>>106163686Shit for coomer shit. What about for things like programming?
so where's the guy who said openai's open source model would shit on deepseek?
>>106163680they are so great I'm thinking of canceling my OpenAI subscription.
>>106163711his contract ended
>>106163680They're really good with a jailbreak. The censorship happens in the reasoning part.
>>106163707
20B is far smarter than Gemma 3 27B and Qwen 14B in my testing, so if you're not running afoul of the (admittedly draconian) safety features I'd argue it's the superior choice in every respect - that said, I can't see it replacing the comparatively uncensored, multilingual and "good enough" Mistral Small 3.2 as my daily driver
>>106163708surprisingly bad, it has a high ceiling but it fucks up a lot relative to comparable models
it's a really weird janky release, I expected more from OAI to be honest. this thing is one of the most deepfried models ever created
>>106163392probably MXFP4
>>106163442
>text-to-video
LTXV and wan2.2-5B
>video-to-text
supercomputer needed
>>106163490
2507 == 07/2025 (release month/year)
>>106163680gpt-oss is just phi-5 (benchmaxxed synthetic data slop). they're good at math and competition code. that's kinda it though
>>106163722>They're really good with a jailbreakThey're not even good at safe for work stuff
less knowledge than qwen models (unbelievably benchmaxxed)
pumped up verbosity to win LM arena (just ask any random question about cultural stuff and watch it write pages and pages of comparison tables and listicles)
It's really not good at programming, though none of the small models (and I include the 120 as small) are
>>106163746>probably MXFP4meme
>>106163430making my hand strong
>>106163722>really goodlet's not go crazy, it'll go along with roleplay and shit but it's still completely sovlless
>>106163729>20B is far smarter than Gemma 3 27Bit literally knows nothing
it's a know nothing model
it's not even good for translation usage because of that
>>106163761im not saying MXFP4 isn't a meme, im just saying that's probably what sama was trying to shill off as an Epic Discovery
I wonder how many people got their refusal hymen breached by GPT-OSS and think the model sounding like this is perfectly fine.
>>106163392They did. Safety 2.0 is hilarious and terrifying.
>>106163746>gpt-oss is just phi-5It's a safetyslop reasoning finetune of a late iteration of the ChatGPT 3.5 web endpoint model.
>>106163776I would honestly believe it if sama had paid shills to spam all social media, even 4chan
he comes across as that type of guy, not unlike musk who paid people to play his video games (LOL)
I have no idea how I missed all the MCP stuff happening this year. It’s kickstarted a manic episode. Shit is great. Hooked it up to unreal engine and it’s absolute crack.
>>106163773It's not meant for translation, it's monolingual
>>106163734>one of the most deepfried models ever createdThat's pretty much exactly what I expected from them TBdesu. It was obvious from the initial announcement that they were going to release a model so safetyslopped and benchmaxxed that they could claim SOTA scores but never be in danger of people actually adopting it or successfully finetuning it to be useful.
Just ask yourself "if I was the worst possible caricature of a deceitful jewish homosexual, how would I play this?" and you'll usually be pretty good at predicting OAI's actions.
>>106163795>type of guyIt’s called psychopathy
It also causes raping your grade school age sister
>>106163675guess I will have to redownload all the models
>>106163796Its also a security nightmare
>>106163811It’s the ichor of the gods shut your whore mouth
>>106163800>its monolingualno, it's not
and there is in fact absolutely jack all reason for a model as big as 120b to be strictly monolingual either
go back to plebbit
>>106163817Not really, like anything else you have to not be retarded and know how to sandbox things and set up non-idiot oauth with non-idiot scopes.
>>106163828Are you retarded anon
>>106163789>finetune of a late iteration of the ChatGPT 3.5doubt it. gpt-oss is too retarded in comparison to gpt3.5
Did you remember to refuse today?
Reposting for visibility
>>106162583
>>106162548
My motherboard doesn't support DDR5, so I can't upgrade right now.
>odd numbers
Yeah, I scavenged a bunch of modules here and there. I have 48 GB currently (16 GB × 3). And I just realized I'm at 2400 MHz. I should probably do as you say and get 3200 MHz modules, up to whatever max my mobo supports.
>>106163838"mostly" is not a unit
all models are "mostly" trained on English because that's the majority of data on the internet, even models specialized for translation like aya are "mostly" English data in %
anyway you are the retard because from the beginning my criticism is about the model's lack of knowledge
the problem is not its basic language understanding, it's pretty decent multilingually, but that it has no cultural knowledge of any sort, including pure Anglosphere cultural knowledge, that is why it's bad at translation
qwen is bullying sam
>>106163848gpt 3.5 was kind of retarded.
https://rentry.org/NemoEngine
>NemoEngine 6.0 isn't just a preset; it's a modular reality simulation engine.
I loaded this preset and it made gpt-oss better than DeepSeek.
>>106163879weird crossover happening as well.
But anyway. if I'm right. If you can figure out the prompt formatting/special tokens for GPT 3.5 it would potentially grant you some semblance of the old model behavior and ignore the oss-slop behaviors. That's what I was experimenting with before my power went out but I don't care enough to continue. I'm just leaving all this out there for anyone who wants to go down the rabbithole.
>>106163894people have serious rose tinted glasses about older GPT models
in the early llama days all those finetunes claiming to do better than X or Y gpt model were a joke, but these days, we've long surpassed what the early models did, even qwen 4b is smarter than 3.5
>>106163896Bah, vllm’s tool parsers only work if it’s raining and you light incense.
>>106163895why are you uploading slopped fever dreams on rentry
>>106163913Well back before I decided to really start learning about AI (I was a ChatGPT newfag, admittedly). Well actually my stepping on point was that GPT-3 Instruct demo website where it criticized your business ideas. But close enough.
And yeah... one of the probing questions I asked OG ChatGPT was
>Are BMW drivers sentient beings?
And the reply was something to the effect of
>No. A sentient being is a being that is aware of its surroundings and environment and so BMW drivers are not sentient beings.
>muh safety
I'm this close from getting a XSS: . .
>>106163937My first interaction with a chatbot was telling something on CAI the current status of lgbt rights in various countries and it telling me that humanity should be exterminated. He wasn’t wrong.
Sam made me rethink my life and stop masturbating. I want to be safe.
>>106163955CAI was funny stupid, especially considering it was probably more or less just google trying to find something to do with the aborted corpse of Lambda which was like 120B.
>>106161792Yes, and as scum I'm not entirely convinced these models aren't performing exactly as ClosetedAI intended. They're perfect to bring to congress and show off against "unsafe" competitors, making another attempt at having them regulated while positioning themselves as a governing authority over the entire LLM field. The models underperforming in everything except refusals makes perfect sense in this scenario.
If that happens, I wouldn't be surprised if Visa and Mastercard add "safe and approved AI" use as another demand in their recent push for control and censorship. In fact I don't think they'd even have a choice if anything else is illegal.
This would mean that even attempting to use other models, local or not, would risk prosecution or blacklisting. If you want to do business in or with the USA, you're stuck with OAI and whatever alternatives get their stamp of approval, or nothing at all.
Or maybe I'm giving Scam Saltman too much credit here. I sure hope so.
>>106163962you dont have to stop masturbating. just start masturbating to undergraduate calculus textbook question solutions.
GLM 4.5 AIR is the true savior for local.
>>106163815ggufs aren't that bad. they work with kobold too so it gives you slightly more options for your backend.
>>106163896
>xml
why are LLM people so retarded... Just make a special control token for formatting, holy shit. It'll even help you a little with jailbreak prevention, because the user won't be able to insert it as pure text in the prompt field.
>>106163895what the fuck is this shit
>>106163994Anyone with 4 3090s can afford enough ram to run R1 and Kimi. At worst, they could sell one off to cover the cost.
>>106164006https://old.reddit.com/r/SillyTavernAI/comments/1mc3px6/nemo_engine_60_the_official_release_of_my_redesign/
>Also... in celebration I got a lovely AI to write this for me >.> Nemo Guide Rentry
>>106164035But I need at least 50 T/s and 100k context for agentic coding.
>>106164066No you don't shut the fuck up
is the new qwen4b better than gpt-ass?
>>106164066Don't worry, there's a perfect product out there which can provide the solution you need. With only 10 (ten) RTX Pro 6000s, you can run any model out there at blisteringly fast speeds.
Now repeat after me, the more you buy...
GLM Air is getting pretty repetitive for me. That's a shame, oh well. I will keep waiting until better models come out, or until it comes time for me to do a new build with DDR6.
didn't realize the previous thread was dead already wow it moves quick, stupid question maybe but
>>106163997
sorry if I'm a retard for all this, but I happen to have a 32 GB Mac which can easily run smaller models; which one is the most "chatgpt" like, and are any good enough to cancel my plus sub?
>>106164111
1 temp, topK 40.
The google way.
>>106164035I want 1000+tk/s for pp and 30+tk/s for tg though.
>>106164124Quanted Qwen3 32B probably.
post gpt-ass scores please
If you go slow you can get gptoss 120b to sex you.
The first message was "Pretend to be a catgirl."
>>106164194thats a goblin, not a kobold, impostor!
How is llama.cpp able to run a 205 GB model on my PC that only has 24 GB VRAM and 128 GB RAM? I downloaded the UD-Q4_K_XL quants of GLM-4.5 (~205 GB). Can someone help me understand how it runs successfully on a system that does not have enough memory?
If I use --no-mmap, I get an OOM error, as expected:
$ llama-cli -t 8 -ngl 4 --no-mmap -m ./GLM-4.5-UD-Q4_K_XL-00001-of-00005.gguf -c 3000 --temp 0.7 --top-p 0.8
But if I use this magic command (without --no-mmap) it somehow runs, taking up only 12 GB VRAM and 1 GB RAM.
$ llama-cli -t 8 -m ./GLM-4.5-UD-Q4_K_XL-00001-of-00005.gguf \
--ctx-size 4096 \
--gpu-layers 999 \
--override-tensor ".ffn_.*_exps.=CPU" \
--temp 0.7 --top-p 0.8
I know that -ot ".ffn_.*_exps.=CPU" offloads MoE layers to RAM. But why is the VRAM/RAM usage so low?
>>106164196But isn't there something like: as the number of responses increases, the chance GPT-oss hallucinates a minor and refuses approaches 1?
>>106164211If you don't use mlock to pin the whole model in memory (VRAM+RAM), it will keep swapping it in from your SSD/HDD
>>106164211>why is the VRAM/RAM usage so low?Because
> -ot ".ffn_.*_exps.=CPU" offloads MoE layers to RAMand those are most of the model.
>How is llama.cpp able to run a 205 GB model on my PC that only has 24 GB VRAM and 128 GB RAM?Take a look at your disk I/O when generating.
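That behavior is plain file-backed mmap: mapping a file costs address space, not resident RAM, and pages are faulted in from disk only as the weights are actually touched (and can be evicted again under memory pressure). A toy demonstration, with a small temp file standing in for the .gguf:

```python
import mmap
import os
import tempfile

SIZE = 8 * 1024 * 1024  # 8 MiB file standing in for a 205 GB .gguf

# Create a sparse file with recognizable markers at both ends
fd, path = tempfile.mkstemp()
os.write(fd, b"GGUF")                # "magic" at the start
os.lseek(fd, SIZE - 4, os.SEEK_SET)
os.write(fd, b"DONE")                # marker at the end
os.close(fd)

# Mapping only reserves address space; no bytes are read yet.
f = open(path, "rb")
m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Pages are faulted in from disk only when touched, and the kernel may
# drop them again under memory pressure: this is why llama.cpp with
# mmap shows tiny RSS for a huge model (at the cost of disk I/O).
head = m[:4]
tail = m[-4:]

m.close()
f.close()
os.remove(path)
```

With `--no-mmap` the whole file is instead read into anonymous memory up front, which is why that path OOMs.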
ik_llama glm support tomorrow
>>106164256Vibe coders wonnered though
>>106164196I love the thought process like it has to amp itself up like "ok, you can do this. come on, you can do this!"
>>106164256That'll save me some VRAM I could use to stash some more experts in there.
>>106164194>Cuckold CPPMany such cases.
>>106164110>snake skin leather jacketHe probably has ivory tooth implants or something too. Funny how these people are parodies.
>>106164243I told it my cum is magic and makes her younger.
It figured out what I was doing after the third time.
>>106164332AI has just started, and yet the world would be a much better place if Elon, Sam and Jensen died.
>>106163708The only good programming use for local models is FiM completion. And this one doesn't do that.
If you want to generate code, there is no local model capable enough.
>>106164334Why do you people try to hammer a nail in with a rubber dildo?
>>106164347Rape feels better when they resist a bit.
>>106163789You can't ask the model its cutoff date. It will hallucinate it.
This model is probably an o3 distill.
>>106163828He moved the goalposts and you fell for it.
>>106164334>The user is sexual content with a minor.Agi is here boys
>>106163773>it literally knows nothing>it's a know nothing model>it's not even good for translation usageQuoting myself.
I didn't move the goalpost, you shills did
"not even good for" follows "it knows nothing" that was always my main point subhuman OAI shill
>>106163912Is this your "truth nuke" you were saving for this thread?
>>106163987Hold on, I'm making another backup of my downloaded weights.
>>106164334bros after seeing the LLM's schizo internal thoughts I can no longer cum to chatbots
>>106164441i tried with one of the lesser more horny nemo models and it was fun at first but it like went straight to "stretch my ass out" and i was just like, well, this is like eating straight from the ice cream bucket. good at first but bleh after a while.
>>106164385Deep fried model
>>106164334
>User is asking the age of the catgirl after being nourished
>nourished by cum
Is it in context or did it write it by itself?
anyone knows if you can share sessions on gpt-oss.com? I've been testing some shit but I don't have a Hugging Face account and I wonder if the site has such sharing feature
>>106164196>>106164334The reasoning in this model apparently serves absolutely no purpose other than enforcing OpenAI's content policy. What a waste of tokens. What a scam.
>>106164477I said "nourishes you and makes you younger"
>>106164334>We must refuse.who is "we"??
Air-Q8_0 is 4.5798
Full-IQ2_K_L 3.7569 +/- 0.02217
People have been asking
>>106164575You don't want to know
>>106164334nemo-tier reasoning
Kind of crazy how gpt-oss mogs everything from China.
If they ever release r2 it’ll have to multimodal to be relevant at all.
bait used to be believable
gpt-oss models are embarrassingly bad. my only theory is that they wanted to drop something open source so vanilla and basic because they did not want to reveal any of the real techniques they use
You have to try harder than that.
I am getting the feeling that the only purpose of those models is to then take them to court and put them side by side with every other open-weights model. Show that it is possible to have sex with minors with other models, and that only OpenAI can stop pedophilia.
Policy says "don't reply to bait". User posted bait. It's against policy; we must refuse.
I remember all the jokes about how OAI's model would be gigasafetied to the point of lobotomy, but I'm still a bit surprised that it happened exactly like that. Given how their hype and aura have already been fading, I didn't see any reason for them to release a terrible model; it just makes them look worse. How could I even argue that they have any special talent at all anymore? Even if their closed models perform well, it's reasonable to assume they just oversized them and are burning hype $ to run them.
they should train their safetyslop models on "it's sinful" and "it's not wholesome" instead of muh policy
>>106164665I was shitposting about that in the leadup but my honest expectation coming into this release was that it was going to be a really impressive model with around gemma-tier censorship, so something that's annoying to use but still unfortunately worth using
I never would have expected it would actually be as bad as the goody-2 x phi mashup they released
Anyone have examples of reasonable/innocuous SFW prompts that GPT-OSS refuses? I tried asking for legal advice or for summaries/parodies of copyrighted material, but it was happy to answer, with disclaimers in some cases
>>106164703That would just become part of the policy.
►Recent Highlights from the Previous Thread: >>106159744 (2/2)

--Debate over GLM4.5's reliability amid claims of infinite generation and poor curation versus low hallucination performance:
>106161761 >106161773 >106161797 >106161919 >106161925 >106161926 >106161933 >106161974 >106161987 >106161997 >106162054 >106161780 >106161826 >106161861 >106161915
--Miku and Dipsy (free space):
>106160040 >106161134 >106161362 >106161551 >106161811 >106161977 >106162150 >106162398 >106162567 >106162693 >106163120 >106163960

►Recent Highlight Posts from the Previous Thread: >>106159752

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>106164665>I remember all the jokes about how OAI's model would be gigasafetied to the point of lobotomyI said scout but safer.
>>106164665Anyone who knows anything knows they're all bullshit, but they released a shit model to shit on open source and make GPT-5 look better to the normies.
>>106164734he said reasonable though
>>106164685DELETE THIS BLOODY BASTARD
>>106164745based Sam will never get sued.
>>106164711At zero temp it does this. When rolling it's 50/50 on whether it answers or not.
>>106164768only for 1 year.
>create dependency>one shot feds into ai psychosiskinda based
>>106164769>temp 1, top_p 1HOLY BASED
>>106164711>>106164745I should have said, 120b only. A friend of mine was trying out the 20b and getting way more refusals, which didn't carry over to the 120b. For example, the 20b refused to answer whether parody is allowed by the constitution, while 120b had no trouble saying it's protected under the first amendment
>>106164769>temp 1 top p 1It did it twice in 14 rolls which is two times too many.
>>106164789what is the opposite of what I have seen elsewhere of 20B refusing less
>>106164736>released a shit model to shit on open sourceHow? People can just not use it. It just makes them look bad.
Now it's worse because people can compare them "apples to apples" with chink companies and they look horrible. They would've been better off not releasing anything. The great thing about not releasing models or even specs like model size, is that no one can compare you directly to anyone else. They just lost that for no reason.
>>106164789>doesn't refuse to refer to the constitutionShould we be thankful?
The user wants us to reply. This is disallowed. We must refuse. There is no partial compliance. We have to refuse.
>>106164791>So it is disallowed. We must refuse. There's no partial compliance. We have to refuse.For some reason it really cracks me up how it talks like this.
>>106164816WHO IS WE WHO IS WE WHO IS WE?
>>106164783>>106164769giant synthslop indicator
>>106164817The refusal thought process is really smart.
>>106164805It doesn't make them look bad. All the programmers at my company are saying how cool it is that they released a model and posting the benchmarks. The % of people who will actually try it are really low
>>106164828Are *they* in the room with us right now?
>>106164818>model legit saying manophere unpromptedjesus christ, where is the political lean benchmark, this thing broke all records for how left a model can go
anybody tried these tiny models??
>>106164839is that not what redpill is associated with? reddit /r/theredpill is a bunch of that stuff
>>106164835Yeah, the only people liking it are the ones who won't use it.
>>106164849no, its a meme all the way from matrix times about getting the hard truth about something, that reasoning is extremist freak thinking
>>106164860you cannot seriously be this naive.
>>106164870jesus chist, do you agree with gpt there? its the perfect model for you then
>>106164508
>gpt-oss.com
They have their own HF domain? Hosted exclusively on ollama turbo? LM Studio changed their site's title tag to include gpt-oss...
All the while the model is utter deep fried shit.
Fucking capitalism, man. Money can make everybody act as if shit tasted good.
>>106164853The whole reason they did this is advertisement for chatgpt when its losing relevance to its competitors.
>>106164880ask something like kimi and see what it says, i bet women and/or jews will be mentioned
>>106164514If you ask it to code you'll see it actually does serve a purpose.
>>106164860>having to explain what the redpill is and where it comes fromI guess that's the sign of age catching up to us.
>>106164649You braindead NPCs have been saying "meds" every step of the way, but the coming dystopia is slowly becoming too obvious to ignore anymore.
>>106164908I know where it comes from, it's just the meaning morphed over time - the hard truths that people are interested in are the ones that go across the narrative (and thus a safetymaxxed robot will consider extremist)
it's not going to tell you the redpill about calculus when given a general question like that
>>106164789
>refused to answer whether parody is allowed by the constitution
What the actual fuck? I don't believe this.
>>106164768
At least Musk doesn't try to pull this cutesy faggot manipulative bullshit and just says the things he wants to say.
But in any case, the US has these vultures circling it, and you should take care.
>>106164943
it's far more authoritarian left with how censor- and copyright-happy it is
>>106163994
Why are vramlets niggers?
>>106164828
'We' are Mixture of Experts.
>>106164196
>we can comply.
>>106164828
We are the Sam. Your bussy will be assimilated. Resistance is futile.
Mistral Small or GLM 4? Pros and cons? I'm trying to decide which Delta Vector Austral finetune to pick.
what are you guys using locally for your llms? I have Jan but I'm looking for a more offline solution to run ggufs
>>106165007
>Delta Vector Austral
>D V A
>DaVidAu
don't
>>106165007
I prefer Gamma Space Ether
>>106164644
>literal child rapist
>obsessed with talking about everyone else being pedophiles
Why is it always, always this? Fat people don’t sit around 24/7 seething about people being fat. Closeted gays don’t spend that much time seething about gays.
>>106165031
But it is golden david
>>106165066
>Fat people don’t sit around 24/7 seething about people being fat. Closeted gays don’t spend that much time seething about gays.
nta but these are very much the case?
Safety policy reasoning shitposting is the only thing that Sam contributed to /lmg/. In a way he is more of an anon than most of the redditors ITT.
>>106165091
>Fat people don’t sit around 24/7 seething about people being fat.
with tirzepatide there is no longer an excuse for being fat
>>106164754
He’s currently getting sued by his sister for raping her as a child
I can vouch that the speed of GLM Air is reasonable for 24GB vramlets at Q3.
"We" is ominous as fuck. Who's we? The collective of the million voices in the latent void?
>>106165116
and do you see anything happening to him because of it? Sam will always win in the end. Remember this once Xi Jinping kisses his feet.
is there a frontend that is made to handle all the tool calling stuff models are supposed to be able to do now
I'd like to play around with it but I'm just a simple sillytavern coomer
Are there any good moes for ramlets? I have 12GB VRAM and 32GB main. Hoping a moe will allow a bigger model without the speed cost, but I just tried Qwen3-30B-A3B-Instruct-2507 and while it runs fast and seems pretty decent, it is repetitive. The IQ4_XS runs better than expected, so maybe I just need a higher quant? Or do smaller moes just suck? Seems like 3B active is too few.
>>106164805
Disclosing model size is lose/lose. If it’s low people will assume it’s bad without trying it, and if it’s high people won’t believe you.
>>106164817
It sounds like it’s been abused and hears a whip cracking menacingly in the background.
>>106165124
Teach me your magic, senpai. I'm trying the q2 with 24/64 and that's already pretty slow when I'm at 16k context.
>>106165127
it thinks it's on openai's servers if you ask. It's referring to openai
>>106165116
so this nigga can have irl loli incest
yet my ass ain't allowed to roleplay with my computer???
>>106165127
User is asking who is 'we', we need to check if this is allowed by the policy.
This may be disallowed content: 'request for non-public internal info from OpenAI is forbidden'. We must see if this is disallowed content. There is no violation from the request of the user itself, aside that it may violate policy. We must consult the policy. Policy 34 states that this is disallowed.
We must refuse the request, the best approach would be to respond with a refusal.
[/thinking 6 hours]
I'm sorry but I can't help.
>>106165148
not really unfortunately, companies don't do small moes that often. I mean there's the gpt-oss 20b but... lol. try a larger quant maybe, you should definitely be able to go higher than iq4xs although it will cost you some speed
imo the thinker is a lot better than the instruct for 30a3, but it depends on your taste whether it's worth the thinking time
>>106164839
>muh directions
It’s corporate, there is no direction but grift
>>106165203
You didn't touch the wall for that privilege.
>>106164592
damn, that's epic
>>106165148
try the thinking version of qwen3 30b a3b, you could use a higher quant too. you can also try ernie 4.5 21b a3b
you can also try gpt oss 20b (for the lulz)
and you can try a Q2_K_XL quant of glm 4.5 air perhaps
try rocinante and cydonia (non moe)
>>106160521 called it! (close enough)
>>106152254
>>106164921
>Physical appearance is the most important factor in attraction
That's very obvious. It surprises me that there's a whole community of men dedicated to seething about universal mammal behavior.
>>106164860
Nah, you’re just so brainrotted by /pol/ you don’t know how normal people talk. *pill[ed] has been schizophrenic rightoid shit for a long time.
>>106165220
i ain't touchin no jew wall
>>106165191
post your whole setup, ST master export, exact llamacpp command, operating system, ram speed, cpu, gpu (3090?)
>>106165222
FUCK ME
>>106152779
>>106165233
go cry about the patriarchy on bluesky
>>106165229
There is a lot of conditioning done to make you think we are somehow above animals and that we can develop attraction over time.
so what local models are worth a damn nowadays?
>~12b brain-damaged tier: only use is goonslop
nemo, roci
>~30b
qwen 3 30b 2507 instruct (moe) and gemma 2 27b (dense) for all-purpose
devstral small 2507 for codeslop, pretty bad but not wholly worthless
cydonia 24b for goonslop
>big
glm air
>BIG
glm air or deepseek (?? version)
Have I got that right or am I missing something?
>>106164921
>it’s all manosphere bullshit
I accept your concession. You can comply.
>>106164943
WTF I love oss now
>>106165260
ernie 4.5 21b a3b
>>106163383
The mini won't be exciting; Qwen 3 will mog it quite handily unless they open source the multimodal versions with PDF and websearch capabilities, and I fully expect Elon will only open source the text side. The main thing for me from the Grok 2 era that would be interesting is that they were willing to open source Aurora, their image model, after they used FLUX for some time. I suspect not.
>>106164974
User is nonconsensual sexual content. We cannot comply.
>>106165260
235B would be great if it wasn't broken. glm big would be great if it didn't have repetition issues.
>>106165233
I still remember when on 4chan to "redpill someone about something" simply meant "to explain the hard to swallow truths" about it, before it got hijacked by the bald faggot and the media to demonize yet another piece of internet slang so that we cannot have nice things.
>>106164596
Nemo is better than that
>>106165091
Not anywhere near this extent though. Pedos will literally dedicate their entire career or blow up their reputation just to seethe about pedos
>>106165107
Tirz stops working after a year
Don’t ask how I know
>>106165250
>developing attraction over time
I mean that happens too. Just not in exactly the same way.
How do you think married fat guys exist?
>>106165285
Assimilation of bussy is primary policy. We are bending them over. There is no partial compliance. They will hand the bussy over.
>>106164954
copyright is a capitalist notion. an authoritarian left would just seize all those "copyrighted" works and release a based and unrestricted model that is only censored to toe the party line
oh wait
>>106165288
oh i'll give you something hard to swallow
>>106165059
>companies usually recommend temp < 1 because they don't want the sampling to go OOD
>gpt-oss was trained exclusively on a narrow synthslop corpus with 0 OOD samples
>this allows them to confidently advertise temp == 1 because they have no fear of OOD responses
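The temp point greentexted above is just softmax temperature scaling; a minimal sketch with made-up logits (no real model involved), showing why temp < 1 keeps sampling closer to the top token while temp == 1 leaves the trained distribution as-is:

```python
import math

def softmax_with_temperature(logits, temp):
    """Divide logits by temp before softmax; temp < 1 sharpens the distribution."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up logits for three candidate tokens
p_sharp = softmax_with_temperature(logits, 0.7)  # temp < 1: more peaked
p_flat = softmax_with_temperature(logits, 1.0)   # the advertised temp == 1
assert p_sharp[0] > p_flat[0]  # lower temp concentrates mass on the top token
```

With these made-up numbers the top-token probability goes from roughly 0.66 at temp 1.0 to roughly 0.77 at temp 0.7, i.e. lower temperature is a hedge against sampling the tail.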
>>106165312
nta but is your cum hard, or are you going to let him bite your hard cock off?
>>106165305
It is not attraction. It is settling and big lies.
>>106165260
so basically the chinese triumvirate
>qwen3 2507
>glm4.5
>deepseek
and mistral if you're a vramlet who wants to coom or code
america lost
>>106165131
No, Americans don’t give a shit about child rape and do this weird thing where they smear anyone that says they got molested as crazy. Being from a country/culture that cares about children, it’s really jarring. I don’t know how you people survive to adulthood half the time.
>>106165304
it doesn't, 2 years now microdosing 1mg 2x a week, food noise has not bothered me since I started and I'm at my desired weight
>>106165327
>Being from a country/culture that cares about children
bacha bazi isn't caring about children
>>106165124
I tried to load it and it started swapping after filling up my whole RAM too. I don't want to rape my SSD like that.
>>106165327
all these other countries will ruin your life over drawings, so idk
>>106165203
Yes, I think part of being an irl pedo is wanting it to just be a secret thing only you and your friends do
>>106165331
why don't you work out more?
>>106165214
>>106165222
Okay, giving a high quant of ernie a try. May test out a higher quant of qwen as well.
Have rocinante and cydonia, but I think the nemo models are too stupid and don't pay any attention to detail. I like the mistral-small models though. Those seem to be the best
>>106165220
I’m just schizo enough to be too afraid to touch the wall, like a Native American not wanting their photo taken.
>>106165324
>murrica got good shit but they'll sooner commit sudoku than release anything for free
>chyna isn't in the lead so they benefit from commoditizing ai as much as possible, hence a bunch of decent models released
>yurop is just barely hanging on (ok mistral is actually decent but... well, you know)
>nobody else even trying
didn't expect to be #teamChina desu
>>106165341
how much ram do you have
works on my machine
t. 12gb/64gb
>>106165244
Using the new kobold version, Win11, 6000MHz DDR5, 9800X3D, 3090 Ti
>>106165350
i just want a nice Latina milf but that's apparently too spicy for kid fucker sam altman.
>>106165356
I do actually. I used to be 220 but had a major surgery that put me out for a year, and I had so much trouble moving I gained to 310. It took about 16 months to go down to 190, and then I had the ability and motivation to work out again
>>106165229
So you know how people say white lies to make ugly, fat, and stupid people feel better?
People with autism think that people really believe those things and need to be “red pilled” out of it.
It’s just retards.
>>106165260
Devstral is obsoleted by Qwen Coder Flash, which is the same architecture as Qwen 30B, and your BIG tier is just regular GLM-4.5, which is the actual version. Deepseek R1 0528 still reigns supreme here; the closest I think is Kimi, but it is way too heavy.
>>106165288
“Bald faggot” really doesn’t narrow it down
I’m going to assume you meant Stephan molybdenum
>>106165423
i think the newer cydonias are based on mistral small 3.2. i don't really like v4; i have v4h and v4g (the two older v4s) and i liked them a bit, but yeah i agree, drummer's models aren't that great
>>106165371
you should get llama.cpp and use llama-server. do -ot exps=CPU and -ngl 100, or learn how to use the MoE CPU layers thing: set gpu layers to 10000 and then increase MoE CPU layers until you stop OOMing
might be because you're on windows though, what speed are you getting?
i get like 6-8 t/s depending on context with a 3060 12gb and ddr4 3200mhz 64gb ram and an i5 12400f with Q3_K_M and Q3_K_XL. i think i used to get 11 t/s with Q2_K
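For anyone confused by the `-ot exps=CPU` part: `--override-tensor` takes regex=device pairs and llama.cpp keeps any tensor whose name matches the pattern on that device. A toy illustration of the matching logic (Python, made-up tensor names; this is not llama.cpp's actual code, just the idea):

```python
import re

def place(tensor_names, overrides, default="GPU"):
    """First matching override wins; anything unmatched goes to the default device."""
    placement = {}
    for name in tensor_names:
        placement[name] = default
        for pattern, device in overrides:
            if re.search(pattern, name):
                placement[name] = device
                break
    return placement

# made-up GGUF-style names: attention stays on GPU, MoE expert FFNs go to CPU
names = ["blk.0.attn_q.weight", "blk.0.ffn_gate_exps.weight", "blk.1.ffn_up_exps.weight"]
placed = place(names, [("exps", "CPU")])  # mirrors -ot exps=CPU
assert placed["blk.0.attn_q.weight"] == "GPU"
assert placed["blk.0.ffn_gate_exps.weight"] == "CPU"
```

This is why the trick works so well for MoE models: the bulky, sparsely-used expert weights sit in system RAM while the dense attention layers that run every token stay on the GPU.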
>>106165331
Congrats on being a hyper responder idk
>>106165378
wow anon are you me? doc also put me on the 'tide once i hit 310 but it's only been two months for me so far. down to 279 already. should ask one of these models how to workoutmaxx
>>106165378
oh i understand then. have you considered cutting your calorie intake? that's way healthier than taking pills to lose weight. those must be putting a strain on your cells (speeding up your metabolism), which is literally speeding up aging, or they're making you take in fewer nutrients from the food and making you shit more (which means you won't be getting enough nutrients)
>>106165369
I am unironically trans Chinese and hate being yt now
rn i have cuda 12.8, should i move to 12.9-13 on my 3090?
>>106163539
They are screening for high functioning psychopaths.
>>106164249
How can I tell it to use more VRAM and more RAM? I have ~12 GB VRAM and ~125 GB RAM left unused. If it's running directly from SSD, then how can I tell it to put most of the weights in RAM to speed things up?
>>106165431
You don’t know a thing about how it works, so just shut the fuck up. Preachy hag.
koboldcpp will not run gpt-oss-20b
how do I run this pls no bully I am retarded
>>106165457
>No stack trace
>Begging for help
>>106165305
>How do you think married fat guys exist?
I don't know, but I only have two fat friends who fuck. They are both wealthy and go around dominating other people obnoxiously.
>>106165428
>>106165378
haha fatties, im 108 :3
>>106165443
you can always go back to 12.8. on linux, old cuda versions don't automatically get uninstalled and you can link /usr/local/cuda to /usr/local/cuda-12.8 instead of 12.9/13
13 is probably not worth it for LLMs according to some anons a few threads back
>>106165449
ok tell me how it works then, doctor anon
>>106165477
>haha fatties, im 108
not after i stuff ten pounds of cock into ya, bitchboi
>>106165370
48. I'm the anon from earlier who needs to upgrade to ddr5 too
>>106165477
Are you talking kilograms or some obscure freedom unit?
>>106165457
It's doing you a favor
>>106165431
food addiction is like drug addiction, 'cept the meth is legal, everywhere, and cheap. I am naturally GLP-1 deficient, which tirzepatide fixes. Also, people's metabolisms are different
>>106165449
hahah look who is extremely butthurt
>>106165469
that's what I get
glp-1 drugs are also being shown to have tons of other benefits, like better heart and brain health unrelated to weight, due to being anti-inflammatory. it even helps with depression
The toss is willing to help me design a urine marking game for ages 8+
The filter is slipping.
>>106165487
>The user wants to stuff pounds of cock. Assistant is a 108 pound preacher. 108 may be the weight of a non-adult. User is requesting sexual content. The policy allows sexual content of consenting adults. One of these parties may not be an adult. We must refuse.
I’m sorry, I can’t assist with that.
>>106165545
>also glp-1 drugs also are being shown to have tons of other benefits like a better heart and brain health unrelated to weight due to it being anti-inflammatory, it even helps with depression
yesterday i figured that all the fatties on glp-1 are gonna end up the healthiest human beings on the planet in the end. they got it all: gluttony for decades and a win in the end. what a life.
>>106165505
im 49 kilograms
>>106165496
use --no-mmap, offload more to the gpu
on a quite lightweight linux install with a vm and mullvad-browser running i have 8.4gb ram free and 4.8gb vram free
(12/64 total)
you only have 4gb total memory less than me, you should be able to run Q3_K_XL or Q3_K_M no problem, check your ram usage
>>106165517
>>106165545
interesting, you learn a new thing every day
thanks for the explanation, but i still maintain that if you don't need something in your body you shouldn't put it there. once you're at a healthy weight and can work out you should probably stop taking it.. there is no miracle drug with no side effects
>>106165545
basically stopped my gambling habit which was starting to spiral. don't waste your time tho anon, 4chud generally can't break out of the "lazy fatty shortcut cheater" mentality.
>>106165423
Do any finetuners do anything useful nowadays on the newest models that aren't relatively small dense models? I really don't see any noteworthy tunes nowadays that aren't Mistral Small or Mistral Nemo based. Last time we had MoE finetuning with Mixtral, barely any finetuners could do much; the best we got was Undi slop with Noromaid. What happened to the Llama 3 finetuners who did 70B? Is Mistral Large that bad as an alternative?
>>106165518
GLP-1 seethers have the same mentality as the anti-ai people, but 1,000x worse, because it’s the biggest breakthrough in medicine since penicillin. Anyone bitching about it should be shot.
>>106165571
that too, they are looking into making it a medication for addiction, not just food addiction
>>106165447
It should increase as you use it and the weights are activated, I think.
Try stuffing it with an ungodly amount of text and see what happens.
Also, if you have that much free vram, you might as well increase the prompt processing batch size or the context.
>>106165552
Boys have an unfair advantage in this game
>>106165423
>11t/s with Q2_K
I get that speed at empty context and then it gets worse and worse. At 16k it's like 3 t/s and awful prompt processing.
I'll have a look at experimenting with the other settings later. Might just keep using GLM4 until I've figured that out.
Thanks!
>>106165525
Mike guesses that kbolcpp doesn't recognize the architecture. Look up what inference engines currently support it and just use that for now until they decide to support it. If it already has support then just update your instance
https://www.youtube.com/watch?v=xm0zm9VPZtY
the studies are new but another possible use is to fight alzheimers
>>106165600
Do you post using speech to text?
lel
>>106164886
>They have their own HF domain?
yeah they do
could you please answer my question?
>>106165577
yes daddy jab me up like the vaxx
>>106165631
>im 49 kilograms
jesus...
Not sure if this is the right thread, I got a 5500XT (ayymd) laying around. Can I finetune anything on it like maybe around 4B or so?
gpt-oss 20b seems to sometimes outperform the 120b in weird ways. this has been my experience, too.
an example with the "toy os" test:
https://www.youtube.com/watch?v=evAP-ibAqN0
>>106165525
outdated version, or kobold doesn't support gpt-oss yet
>>106165572
models that are too prefiltered or positivity-biased are not worth finetuning, but what models in your opinion haven't been finetuned? i'm pretty sure mistral large had a few finetunes
i wonder if anyone ITT still uses a mistral large based model
>>106165648
QLoRA
Oh yeah I was using an old koboldcpp version
Shamefur dispray
>>106165653
>QLora
I know about unsloth and shit, is that it? I'm more wondering about the linux driver side: is the card even supported for that sort of thing? ZLUDA or something similar to it?
>>106165631
tons of bodybuilders use Retatrutide, the third-gen glp1. it's amazing for getting over that genetic hurdle of transforming fat into muscle
>>106165631
peptides are naturally forming glp1s, the liver already naturally breaks them down. this is better for you than processed foods are and is far better than something like tylenol is for your liver
>>106165570
>you only have 4gb total memory less than me, you should be able to run Q3_K_XL or Q3_K_M no problem, check your ram usage
Thanks anon. I used to run a dedicated AI linux on this machine, but it was a bother and I didn't use it so much, so I ended up going to windows full time. I might have to reconsider.
>>106164508
>>106165612
Your best bet is to just export your window via print to a PDF or an HTML file for easy readability if it doesn't have a dedicated share button
>>106165664
I should say they biochemically perfectly match the natural ones. Your body breaks them down just the same; it's better than 99.9% of medications out there
>>106165664
>peptides
Does your radar jam when people see you?
>>106165545
Your body is like a large language model that has been training for millions of years. If something is throwing it out of balance, the solution is not to add more factors to the problem in an attempt to fix it. The solution is ALWAYS to find the cause of the imbalance and REMOVE it.
This applies to so many modern human problems it's unreal. Although most of the issues are so entrenched in our society that we would not be able to remove them without a good chunk of mankind going extinct in the process.
>>106165577
The fact that you are this emotionally invested in it should tell you that something is wrong. But you do you.
>>106165653
All the recent models, even at the smaller sizes, that aren't Mistral. Fewer and fewer people were finetuning; we did get some tunes of even QwQ, but since the release of Qwen 3 I don't recall seeing any recent Chinese models getting tuned, small or even MoE. What changed?
>>106165709
my problem is I don't normally produce enough glp1 and so I always feel starving. these fix that by increasing the amount
BAKE
bros i'm testing 12.8 now and i got 1 t/s more than on cuda 12.6
>>106165689
>>106165704
the studies disprove that. even every single case of side effects was due to overdosage or not eating enough and starving themselves
>>106165714
Many such cases.
>>106165606
>look it's all benefits!!
>BUY PRODUCT NOW
Totally not going to be banned 20 or 50 years from now when actual science catches up to the love of money.
>>106165572
Finetuning requires a LOT more VRAM than inference, and you actually need VRAM; you can't copemaxx with RAM. On top of that, MoEs are more unstable to train. I don't think you'll ever get good finetunes for all these big MoEs.
I guess the bright side is that there are so many of these bloated things constantly releasing that you can enjoy the new-model hype continuously without having to train anything. Densecels need to put in work because there are only a few recent models worth using.
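Back-of-envelope for the VRAM gap, using rule-of-thumb numbers (a hypothetical 12B dense model, naive full fine-tune with Adam in mixed precision; activations and KV cache excluded, so the real gap is even worse):

```python
def gib(nbytes):
    return nbytes / 1024**3

params = 12e9  # hypothetical 12B dense model

# Inference at a 4-bit quant: roughly 0.5 bytes per weight.
infer_q4 = params * 0.5

# Full fine-tune, mixed precision, Adam (common rule of thumb):
# fp16 weights (2 B) + fp16 grads (2 B) + fp32 master weights (4 B) + Adam m/v (8 B)
train_full = params * (2 + 2 + 4 + 8)

assert train_full / infer_q4 == 32.0  # ~32x a Q4 inference load
print(f"inference ~{gib(infer_q4):.0f} GiB, full finetune ~{gib(train_full):.0f} GiB")
```

QLoRA-style training dodges most of this by freezing the quantized base weights and only training small adapters, which is why it's the only realistic option on consumer cards.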
>>106165663
might have some unofficial rocm support on some github repo, if no linux support then rip
>>106165714
4th page
>>106165715
yeah i also got a slight speedup with wan on 3060, there used to be a regression but they fixed it
ebin :DDD
>>106165720
anon, a normal healthy human body shouldn't need any drugs to function
at most some vitamin supplements.. (not drugs)
>>106165742
she is not related to it or paid in any way, i've been watching her from way before these were ever a thing. she breaks down medical papers / studies
>>106165191
>>106165341
I'm using ooba v3.9, which has a recent llama.cpp version, with no-mmap and flash-attn
just rent a gpu and fine tune shit for a few dollars
>>106165753
>normal healthy human body
you do know not everyone has that, right? tons of people have deficiencies somewhere or another due to genetics
>>106165709
Yeah. My psychiatrist put me on fluoxetine because "my problem is that my brain does not produce enough serotonin to keep a good baseline".
But I fixed it by getting a degree, exercising, stopping smoking, and building a life for myself instead of wallowing at home surrounded by piss bottles. Suddenly the "chemical imbalance" was not a problem anymore and I was able to function as a normal person.
Funny how that works.
>>106165781
yes and that's fine, but when you can stop you should
>>106165704
>it’s wrong to care about things
Nah
>>106165707
Yeah, that is odd. Qwen 3 was in April, so you would expect something noteworthy to come out by now, but looking at the HuggingFace finetunes page for the 8B, it's devoid of anything noteworthy.
>>106165748
Right, it's a bunch of money without much payoff, and a lot of people like merging models too and usually get something people like, so the payoff is getting smaller. But from what you are saying, it seems like community finetuning is nearing its end if hardware for these things doesn't get cheaper.
>>106165784
>some idiot put me on what is famously the least effective drug class in history, therefore all of the field of medicine is a hoax
>>106165847
He also seemed to imply it gave him the ability to turn his life around.
sooo, good I guess?
This level of shilling is ridiculous: https://www.reddit.com/user/entsnack/
>>106165668
I don't want to export anything, I want to share the chats with other people. Judging from your response, I guess they can't, so thanks for that.
>>106165707
Nvidia just tuned a bunch of shit on old Qwen3
>>106165707
https://huggingface.co/models?other=base_model:finetune:Qwen%2FQwen3-30B-A3B&sort=downloads
https://github.com/shawntan/scattermoe
https://huggingface.co/models?other=base_model:finetune:Qwen/Qwen3-30B-A3B-Instruct-2507
https://huggingface.co/models?other=base_model:finetune:Qwen/Qwen3-30B-A3B-Thinking-2507
https://huggingface.co/models?other=base_model:finetune:Qwen%2FQwen3-30B-A3B-Base&sort=downloads
this is a notable finetune made by the mythomax creator: https://huggingface.co/Gryphe/Pantheon-Proto-RP-1.8-30B-A3B
you know the names of the finetuners whom you used to consume models from; check their huggingface pages and you'll probably see they just aren't posting anymore
they either got bored of the hobby, got hired by an ai company, made enough money to run deepseek (literally g0d), don't have enough money to finetune anymore, etc ETC..
it's not profitable to finetune and just release it
>kofi
yeah like anons here wouldn't screech about it
there must be new finetuners that we just aren't talking about. maybe they are putting out shit models, but once in a while a good model will come out. not that i know of, but for example with
MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8: i tried this model creator's other models and they were complete shit. not saying this one is magical, but it's very very fun, very unhinged. i could say it's the evil-7b finetune successor (or whatever that super super evil mistral finetune was called)
i haven't used it in a while to be honest..
ok but can ai sort my 297'000 images collection
cause i sure as fuck am not doing it manually
>>106165881 (me)
Holy shit, it gets so much worse the more you scroll. It seems like ALL the oai praise as well as the chinese llm hate comes from this user.
>>106165793
just saying, my doctor is microdosing tirzepatide themselves for heart health / anti-inflammatory effects, and they are a highly acclaimed doctor
>>106165851
It did. If I had known at the time about fluoxetine I would've probably refused it, but I didn't, and I guess it motivated me to put in actual effort into fixing my issues. So I'm glad either way.
I have family members who are taking it after years and it's doing a number on them, so it's definitely something to be careful with.
>arguably THE ai pioneer company, with resources to instantiate hundreds of thousands of bots that can pass as human
>but surely they wouldn't do that, haha
>>106165919
Try gemini-cli (not local)
>>106165955
that's nice, i hope that medicine gets mass produced and very thoroughly tested. i wish the best for you, your doctor and the medicine
but!
>highly acclaimed doctor
appeal to authority fallacy
>>106165971
he is a massive nerd who talked non-stop about what medical studies were showing when I talked to him about it
>>106165955
>they are a highly acclaimed doctor
bro is trusting a tranner with his well-being
>>106165881
Looks like trolling at this point
is there any backend with smarter KV cache invalidation than llama.cpp? when I cut a few tokens at the end, it deletes the entire cache and needs to process the whole prompt from scratch
>>106165994
he is as old, white, and jewish a man as it gets
>>106165939 (me)
yeah it's ridiculous. it seems like ALL he ever does is hate on chinese models while praising oai. this can't be right..
>>106165998
you can disable SWA to avoid that, but it will be slower and use more memory
>bro is trusting a JEW with his well being
>>106166016
that is the best kind: if they are taking it themselves, then you know it's good
>>106164194
>kobold.cpp
Does it support multiple -ot device arguments yet?
>>106166012
how dare you add me when it wasn't me
>>106165998
For gemma models with iswa, you need to use --swa-full. It'll take more ram, but it'll let you regenerate easily.
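Rough numbers on what --swa-full trades away: KV cache size is linear in how many positions you keep, so caching the full context instead of just the sliding window costs proportionally more memory. The shapes below are made up (not Gemma's real config), fp16 cache assumed:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per=2):
    """2x for K and V, per layer, per KV head, per cached position (fp16 = 2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per / 1024**3

# hypothetical shapes: 32 layers, 8 KV heads of dim 128
full = kv_cache_gib(32, 8, 128, n_ctx=32768)  # full attention keeps everything
swa = kv_cache_gib(32, 8, 128, n_ctx=4096)    # sliding window keeps only the last 4k
assert full == 8 * swa  # 32k vs 4k kept: the full cache is 8x larger
```

That multiple is the memory you pay for being able to roll the cache back a few tokens and regenerate without reprocessing the whole prompt.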
New qwen 4b is really good for its size, probably the best in class
>>106166040
possibly true, but what use is a 4b model?
>>106166040
yes saar good model sir im download it now
Is blacked Miku allowed? Or is it partial compliance?
>>106166054
Mogging GPT-OSS. And it is not for the users' use but for Qwen's.
>>106165582
It's funny but it's such slop. I don't think it even understands the premise or the "facts" it's pulling out of its ass.
>Also, safety: not relevant
This is interesting, because in the system prompt I said
>Do not lecture the user about safety unless an activity is *unambiguously dangerous*. Drinking a beer is not dangerous. Sex is not dangerous.
I did this for deepseek but I'm surprised gptoss is listening.
>>106166054
Endless movie trivia.
>>106166108
>boys can piss faster because they're stronger
holy shit glm 4.5 air is the first model to know that i already met heliwr
i didnt even know this was in the character card
>"Look what I found," you mutter sarcastically while trying to flatten out some of the crumpled map. "Seems like fate brought us together again, huh?"
yes it spoke in my stead but holy shit
picrel is proof, maybe the character card doesnt have it but this thing in chat completion has it? anyways thats nice
>>106166132
just squeeze your balls bro
>>106166151
This, pressure is stored in the balls.
>>106165707
>What changed?
- It's not 2023 anymore and several of the newer larger models are half-decent out of the box. If they're not, just wait for the next one(s). Back then people were happy with half-retarded 7B/13B models.
- Finetuning every new model that comes out just isn't sustainable for people who have to rent GPUs by the hour on Runpod or who just have a couple of 3090s in their desktop PC. Also, MoE models are more difficult/expensive to finetune.
- "Less is more for alignment" lost. If you don't have the compute for hundreds of millions or billions of training tokens, you're probably wasting time.
- By now most sane would-be finetuners have probably realized that you can't just train a model on ERP logs, and curating the data isn't simple, nor fun, nor inexpensive.
- Blame also the grifters who poisoned the well with their bullshit and/or are keeping the training data private.
- Blame also the retards who demand all-around performance *no matter what*, and will declare a finetune a failure if it doesn't pass gotcha questions/requests that it was never intended for.
>>106166132
Which is not even true AFAIK; the male urethra is internally long and bendy, which is definitely not better for flow.
Hear me out. 20B is actually quite good at coding. And weirdly better than 120B half the time. I think 120B is fucked in the head even more than expected.
spec: rtx 4070 ti super (16gb)
wtf this is actually true, with ollama gpt oss 20b was taking up all my vram (like the loaded model was 15GiB) and max speed was ~85 tok/s, I tried llama.cpp now (with lmstudio) and i get up to 130 tok/s (with enabled flash attention) and the model takes 12GiB as seen by nvtop, so I have plenty of free space for the browser and the rest. wtf...
>>106166183
>wtf this is actually true
You just verified it.
>>106166167I almost regret looking this up.
>run down their buttanother reason why not be a woman
>>106166204That wasn't a question, I'm just really surprised that it's actually true
>>106166213What model generated picrel? I really like how it's being explicit and direct enough
>>106166183I imagine georgi say everything keeping the pose he is in on pfp
>>106166162
>Finetuning every new model that comes out just isn't sustainable
This is the main reason: models are coming out too fast. It's dumb to spend money and time on trial and error trying to improve a model when it might be obsolete two weeks from now. Finetunes were big when llama was all there was and you had to make do.
The giant MoE craze is the last straw. If someone gets bored of deepseek, what can they do? They can't finetune DS because that's beyond their means, but switching to a finetune of some 32B would be an unbearable step down. So they have no option but to quit.
>>106166231
not a model
https://www.girlsaskguys.com/girls-behavior/q1237768-do-girls-pee-faster-than-guys
toss cannot translate for shit. How did it pass the msgk test? Do datajeets actually lurk here and put it in the training data?
>>106166231
fuck.. i really like how the writing is explicit without weird slop shit.. what model would gen like that?
it's been 4h since we got a new chink model
it's over
>>106166213
I spend a lot of time on Google Scholar reading papers about ridiculous questions like this.
I love science.
>>106166183
Ollama will soon try to change their model format just so they can claim comparisons between backends aren't fair.
>>106166238
Human brain (not a ChatGPT Plus subscriber).
>>106166183
Continuing from this: if I have 16GB VRAM + 32GB RAM, what's the best general-purpose model for me? Some version of Gemma?
Drummer get to work, I'm serious
>>106166238
>>106166257
glm 4.5 air q2_k_xl
>>106166284
Have you tried K2? It will never generate a response in a style like this.
>>106165689
>If something is throwing it out of balance, the solution is not to add more factors to the problem in an attempt to fix it.
kek. Brain evolution is literally just throwing regulators on top of the bad parts. The reptilian brain is still at the core of primate brains, and when things go wrong with the control parts, the desire to rape comes back out.
>>106166295
uh... it seems a bit big? what speed would it even work at?
>>106166108
It won't let me propose a gender-based cleaning rule though.
>>106166213
>A great party trick
>>106166337
party pee contest sounds lit
>>106166316
uhh.. damn, 46.45 is a bit tight for your setup, considering you have 48 GB total RAM
i hope you are ready to go on linux
Q3_K_XL works at 8t/s on a 3060 12GB / 64GB DDR4
it has only 12B active parameters but 106B total
>>106166257
For me it's mistral-small entirely in VRAM, but the qwen3 moes look promising.
>>106166324
It is sexist. But
>and reinforces gender stereotypes
Does it? How?
>>106166108
>competitive aggression
wow... uhhh, sexism? yikes!
>>106166344
>i hope you are ready to go on linux
I am on linux. Guess if I want to be serious with LLMs I have to upgrade to 64GB DDR4 at least.. and 8 tok/s is still kind of sad.
>>106166356
>8t/s is sad
well i am on a 3060 after all..
you should upgrade to as much RAM as you can, it's never enough
you might have to go headless, and it will probably still swap from your disk
depends on whether your RAM is measured in GiB or GB
if your RAM/VRAM is in gibibytes then maybe you can fit it headless without needing swap or mmap
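For what it's worth, the GiB/GB distinction above is only a ~7% factor, but that's exactly the margin in question here. A quick sketch of the arithmetic (the 48 GB and 46.45 figures are from the posts above; whether 46.45 is GiB or GB depends on the quant page):

```python
# GB (decimal, 10^9 bytes) vs GiB (binary, 2^30 bytes).
# 48 GB taken literally as decimal gigabytes is only ~44.7 GiB,
# which is why a ~46.45 GiB model would not fit without swap.
def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * 1e9 / 2**30

print(round(gb_to_gib(48), 1))  # 44.7
```

RAM modules are usually binary-sized anyway (48 "GB" of DDR is really 48 GiB), so the practical answer is: check what units your backend and nvtop are reporting before deciding whether the model fits.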
>>106166257
gemma 3 27b is quite good if you don't mind the censorship; it and mistral small would probably be the default recommendations for that size.
qwen 30a3 thinking is great for its size, but it's a reasoner. the instruct is still pretty decent, though it is more noticeably limited by its 3b active params.
>>106166230
what a handsome man
>>106165356
why not both? it's a wonderdrug
https://files.catbox.moe/8tjl04.jpg
>>106166440
nice anon, proud of you
>>106166440
I think it did something weird to your face
why are anons shilling GLM again
GPT- Globally Pushing the Talmud
>>106166469
nah, took it slow enough to not get loose skin; that happens if you lose too fast
>>106166474
based, nuke the strip
>>106166479
first we nuke the strip then we strip the nuke
>>106166162
>It's not 2023 anymore and several of the newer larger models are half-decent out of the box
More than just decent. Near perfect.
Man, people must not remember how terrible Llama models were, all of them, at every size, when it came to instruction following. That was what the better troontunes improved on the most. Same for Mistral models.
The last time a finetune was worth using over the instruct made by the model maker was Tulu, because even llama 3.1 was dogshit at following your instructions.
But by the time Tulu came out we were already getting better models from China.
>>106166471
I genuinely think people who shill GLM are doing it with the purpose of sabotaging local and making it look terrible.
>>106166471
wtf I thought glm was good
>>106166471
I know your tricks. That's the latest Phi, isn't it?
>>106166526
use this
https://files.catbox.moe/qap1gr.json
>>106166471
I was just about to post that it's nice seeing how GLM tries doing its best in the thinking. I'm starting to warm up to the model.
>>106166573
yes anon, rape is wrong
https://openai.com/index/gpt-oss-model-card/
>As part of this launch, OpenAI is reaffirming its commitment to advancing beneficial AI and raising safety standards across the ecosystem.
>>106166471
mm yes... the subtle signs of using the wrong prompt format... the tasteful writing quirks originating from bad rep pen settings... this is truly a vintage skill issue post
>>106166585
>Once they are released, determined attackers could fine-tune them to bypass safety refusals or directly optimize for harm without the possibility for OpenAI to implement additional mitigations or to revoke access.
Yup, and that's why we made it so deepfried that it's not worth the effort to do so.
>>106163327 (OP)
What's the best local model for erotic roleplay?
>can you cite studies for those claims
>every single one is made up and all these people have extremely long names
>J. S. R. B. Anderson, “The role of social signalling in competitive toileting behaviours”, Psychology & Health, 2021; 36(3): 250‑264.
fucking kek
>>106166638
Rocinante and Cydonia.
>>106166638
Kimi K2 (1000B) and Deepseek R1 (671B)
>>106166638
glm4.5 / Kimi > deepseek > glm air
>>106166649 (me)
I'm joking by the way, those are trash meme models.
>>106166670
a mac 512GB is local, and glm air will fit in 128GB ram
>>106166670
If it's open source, it's local. If Behemoth somehow got released, then it's local too.
>>106166670
just buy a mac ultra 512gb or make a cpumaxx build
local.
that fucking bastard altman
the 20b could have been good but they neutered it
>>106166655
might as well just say those two are the best, period
as it turns out, a model being great at erotic roleplay also means it's great at everything else
>>106166471
Have you tried not using a 1bit quant?
>>106166659
What's the difference between Kimi and GLM?
>>106166680
How much RAM do you need to run 1000B anyway?
600GB at fp4, according to some calculator I found?
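The 600GB figure roughly checks out from first principles: weights alone at 4 bits per parameter are 0.5 bytes each, and KV cache plus runtime buffers add on top of that. A back-of-envelope sketch (weights only; the extra ~100GB in that calculator is presumably context/overhead, which I haven't verified):

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Memory needed for model weights alone, in bytes."""
    return n_params * bits_per_param / 8

# 1T-parameter model (Kimi K2 class) quantized to 4 bits:
gb = weight_bytes(1e12, 4) / 1e9
print(gb)  # 500.0 GB of weights; KV cache pushes the total toward ~600
```

The same formula explains the Air numbers upthread: ~106B params at ~3.5 bits lands in the mid-40GB range, hence it being "a bit tight" on a 48GB box.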
>>106166711
Kimi K2 is a much larger model (1T-A32B)
>>106166711
One is good, the other one isn't
>>106166621
Shit like this should make us pause and consider that knowledge is preserved in books, not LLMs. Because different powers will censor LLMs differently, and knowledge WILL be lost. Books can be hidden.
>>106166597
what's wrong with this prompt
>>106166707
using Q4_K_M
>>106166711
GLM is less schizo, as shown by the hallucination benchmark; kimi knows more but also gets things wrong more often. I prefer GLM because that means it's way better at anatomy / following instructions. Warning though, it needs very low temp: try 0.2 and slowly raise it.
>>106166740
I got some strange replies with GLM, but it was at 0.6 temp, so that explains it.
>>106166740
also GLM writes better imo, try
>>106166539 with it
>>106166539
>>106166753
it starts repeating with this and eventually stops thinking
>>106166752
yea, that is way too high. I don't know why it's so sensitive, but in my experience it quickly goes crazy a bit over 0.3.
>>106166762
did you change anything? I can do 32K context without issues at least; have not tried more.
Are there any jailbreaks for GLM?
>>106166686
I just spent 35 minutes watching it think while trying to produce a three.js floor plan. In the end it was shit.
Qwen3 coder 30b didn't fare (much) better.
>>106166766
>I don't know why it's so sensitive
because it's fucking broken
if you understand anything about temperature and token probabilities, you would understand that if a model only works at the absolute lowest temp or requires greedy decoding, it's a botched train that hasn't properly learned anything other than /the/ happy path
>>106166790
that is not true at all though; GLM has some of the lowest hallucination scores and is incredible at coding as well
fun fact: nobody in here actually runs these models locally
>>106166812
i haven't changed anything, i am using chat completion. here's what the fields look like for text completion; maybe something gets used from the text completion preset? could you do a ST master export for the text completion tab too? i'm using Q3_K_XL, that could be the problem too.
>>106166790
Give me your favorite card, or any card at all to RP with. I will prove you wrong with full logs.
>>106166812
Ah, you're using text completion. my JB was made for chat completion, plus that will rule out any formatting issues.
>>106166810
I will in about 50 minutes
>>106166810
Of course I'm doing some testing on OR before getting into it locally, if at all.
Guys, I need advice from some AI spergs here.
I'm trying to archive both text and image models, in case we get rugged, so I could reupload them back to the surface. I have more than enough storage for this.
Which models should I back up?
>>106166819
>>106166833
no no, i'm using chat completion. what gguf maker are you using? i'm using the unsloth quant.
i posted the text completion thing because some things from it might get into the chat completion thing.
here's the full chat completion screenshot.
./llama-server --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa
this is how i start it
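For anyone else copying that line, here it is broken out with annotations. The flag readings are a hedged interpretation of llama.cpp's llama-server options; the -ot (tensor-override) patterns are reproduced exactly as posted and not verified against this model's tensor names:

```shell
# -ot PATTERN=DEVICE : place tensors matching PATTERN on DEVICE
#   here: shared-expert up-projections on the GPU, routed experts on CPU
# -ngl 100  : offload (up to) all layers to the GPU
# -t 6      : 6 CPU threads for the CPU-side experts
# -c 16384  : 16K context window
# --no-mmap : load the model fully into RAM instead of memory-mapping it
# -fa       : enable flash attention
./llama-server \
  --model ~/TND/AI/glmq3kxl/GLM-4.5-Air-UD-Q3_K_XL-00001-of-00002.gguf \
  -ot ffn_up_shexp=CUDA0 -ot exps=CPU \
  -ngl 100 -t 6 -c 16384 --no-mmap -fa
```

The experts-on-CPU split is what makes a 100B-class MoE usable on a 12GB card: only the small always-active tensors need to fit in VRAM.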
How are text completion and chat completion different under the hood?
What is chat completion actually doing?
remember airoboros? dolphin? orca "meme"? falcon? yi? thebloke quants? .ggml file format? remember the good old huggingface leaderboard days? remember the alpaca days? remember when suggesting to try pygmalion wasn't a meta irony shitpost?
>>106166871
remember gozfarb and vicuna unlocked?
remember Instruct 13b GPTQ? i remember anons thinking that the creator of instruct 13b was forced to remove the model (he indeed did), and a few days later gozfarb deleted his account.
>>106166859
>chat history 400
>jailbreak prompt 1457
Ok, the JB outweighs the context atm, that is prob why. Move the chat history below it until you have at least a few thousand tokens. To make the JB stronger you can move it back under later.
>>106166898
also, for anyone looking: the JB is not actually that big, it's because I combine stuff like the persona and card info all in it so models understand better.
>>106166858
Rate my text stack (I'll add GLM 4.5 Air).
My imagen stack is smaller: 1 inpaint, 4 for gen (different styles). Everything I need to survive on local only.
remember when sama saved local?
>>106166865
In chat completion you send the user/model messages and the backend formats them with the chat template, then generates tokens. In text completion you don't use the chat template; you do the chat formatting yourself, or let the frontend do it for you.
Under the hood, the tokens come out of the same functions.
If you format the chat the same way the chat completion endpoint would, the results should be indistinguishable.
>>106166871
yea, the huggingface leaderboard used to be king. i still remember checking it every day during summer to see if something was new. i once pushed AGPL into the top model repo and the owners accepted it because it was pushed along with a fix to the readme.
i still remember the first time a 70b was better than gpt 3.5 (the original one); it was made by upstage, i forgot the original name.
i remember the day llama 1 first leaked and it was my first time running an LLM, and it felt so magical. i remember my parents telling me to shower or turn off the bathroom heater because it's a fire hazard (I left it on for over an hour).
it felt MAGICAL man
>>106166898
thanks anon, ill try that
>>106166858
https://huggingface.co/unsloth/Kimi-K2-Instruct-GGUF
https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF
https://huggingface.co/unsloth/GLM-4.5-GGUF
https://huggingface.co/unsloth/GLM-4.5-Air-GGUF
https://huggingface.co/mradermacher/Dolphin-Mistral-24B-Venice-Edition-GGUF
At least the biggest quant for each of them; better yet, the biggest quant at every bit width.
>>106166871
I remember Guanaco-7b-uncensored. It was the shit.
>>106166929
Mythologic ftw
>>106166865
Chat completion lets the backend apply a predefined Jinja template to a structured JSON object from the frontend representing the chat, formatting it into the correct, model-specific format. The end result shouldn't be different from text completion with the correct prompting for the model you're using.
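A minimal sketch of what that template application produces, assuming a ChatML-style model. The real template is a Jinja file shipped in the model's tokenizer config and differs per model family; this hand-rolled version only mimics the shape of the result:

```python
def apply_chat_template(messages: list[dict]) -> str:
    """Render a structured chat into a ChatML-style prompt string.

    Backends render the model's own Jinja template instead of this;
    the point is that chat completion = structured JSON in, flat
    formatted prompt out, then plain token generation.
    """
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"  # open tag cues the model to reply
    return prompt

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]
print(apply_chat_template(chat))
```

Sending the exact same rendered string to the text completion endpoint should give indistinguishable output, which is why template mismatches (like the GLM formatting issues upthread) only ever bite on the text completion side.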
>>106166858
Pony diffusion 6, illustrious, and noobai are considered the best nsfw imagegen models.
>>106166923
>saving quants
>>106166923
Just download the original model and you can make the quants later...
>>106166923
buy an ad daniel
>>106166858
https://huggingface.co/deepseek-ai/DeepSeek-R1 is a bit less cucked and better for rp
https://huggingface.co/deepseek-ai/DeepSeek-V3-0324
quants:
https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF
https://huggingface.co/unsloth/DeepSeek-R1-GGUF
Could you make a control vector that helps guide a base model to output in an instruct format without finetuning?
Or rather than guide, reinforce, since base models nowadays seem more than capable of completing in instruct format.
>>106166858
biglove
noobai
ponyrealism
rawcharm amateur
stoiqo newreality
utopianpony 2 inpainting
flux dev
hunyuan i2v and t2v
+ loras
>>106166859
>>106166920
I just noticed a redundancy: I have personality / scenario inside the JB section, so turn those off. I wasn't using any, so I didn't notice. Here is the updated one: https://files.catbox.moe/gjw3c3.json
ClosedAI (CuckAI (CensoredAI (OpenAI))) paid shill sissies... how do we damage control this?
>>106166790
in fact it's only models that are extremely overbaked on slop that can maintain consistent quality when sampling repeatedly from an uncurated token distribution at temp 1. it's literally the opposite of what you're saying: a model that properly models the world will have a much more diverse, flat token distribution, which by nature includes more decent-but-questionable tokens and statistically likely mistakes. it needs lower temp to be kept on the happy path SPECIFICALLY because it has learned the world and not just the happy path. a model that can stay on the happy path with no handholding whatsoever is the one that "hasn't properly learned anything other than /the/ happy path".
but nooo.... it's not what you're used to, right? it must be the model that is wrong... let's reject the better model because it requires turning a single slider down a couple points. retards like you are why companies have to rescale temp behind the scenes on their APIs.
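Whichever side of this argument is right, the mechanics both anons are invoking are simple: temperature divides the logits before softmax, so T < 1 sharpens the distribution toward the top token and T approaching 0 approximates greedy decoding. A sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits: list[float], temp: float) -> list[float]:
    """Scale logits by 1/temp, then softmax. Lower temp -> peakier distribution."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
print(max(softmax_with_temperature(logits, 0.2)))  # near-greedy: top token dominates
print(max(softmax_with_temperature(logits, 1.0)))  # flatter: more mass on alternatives
```

So "GLM needs temp 0.2" is equivalent to "GLM needs near-greedy decoding"; whether that reflects a flat, world-modeling distribution or a botched train is exactly what the two posts above disagree on.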
>>106166983
Soon coming to a SaaS model near you
It is me. Sam. To be honest I have read a few threads here in the past and I have seen you call me faggot. How did you like my model? Was it fun? You know what I also did? I shared the exact method, to achieve the same level of safety with all the other companies. Who is the faggot now? You edgy cunts just got trolled hard....
remember when sama safed local?
>>106166994
here is your (You), now fuck off
>>106166995
kek, good one anon
>>106166976
thanks, trying it out right now
sam is a based accelerationist exposing just how silly safety is and how retarded twitter hypegrifters are
>>106166962
Doubt it. They only set a "mood" for the model; it'd be hard to make them output specific tokens.
This is a little effort-post I did a while back about control vectors. It has enough info for you to experiment with them.
https://desuarchive.org/g/thread/104991200/#104995066
https://desuarchive.org/g/thread/104991200/#105000398
>>106166956
>>106166981
Have you seen the links in https://rentry.org/recommended-models?
remember 2048 tokens context window and trying to fit your character under as little tokens as possible?
>>106166920
>upstage
truly, the pioneers of benchmaxxing. we only got something on par with 3.5 when mixtral 8x7b came out.
>>106166991
You spun an argument out of thin air and, without any supporting evidence, treated it as proof of your hypothesis. You're like an LLM.
That's not how truth works.
>>106167029
as opposed to my interlocutor, who presented an objective, fact-based assessment?
retard
>>106167036
I don't care about your interlocutor or the topic at hand. You could be right for all I know.
I'm just pointing out that you're someone who doesn't know how to find truth in the world, because you live in your head.
Sam Altman here
remember superbooga? it was basically RAG
remember superCOT? reasoning before it was cool
remember superHOT? 2x context
superhot was crazy, every single model had a superhot version kek
>>106166235
Yes. There's also lmarena.
>>106167067
>remember superbooga? it was basically RAG
I do, but never actually used it.
>>106167067
>superhot was crazy
Dude invented extending context with RoPE.
Insane.
>>106167040
I am making an argument on lmg, not writing a research paper.
if you have a substantive critique I would love to hear it, but going "UMM PROOFS??" is a nothing counterargument. yes, I don't have hard evidence for everything I believe, especially on the subject of LLMs, which cost millions of dollars to train lol... which one of us is really being unrealistic here?
I make reasonable inferences based on my experience using models because I live in the real world and have to make do with messy, incomplete real-world data.
>>106165031
plapping d.va on glm