/lmg/ - Local Models General - /g/ (#105959558) [Archived: 297 hours ago]

Anonymous
7/19/2025, 9:13:31 PM No.105959558
looga
looga
md5: 6a31b96f7d13294ad8e7e8488e58f8df🔍
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105952992 & >>105947940

►News
>(07/18) OpenReasoning-Nemotron released: https://hf.co/blog/nvidia/openreasoning-nemotron
>(07/17) Seed-X translation models released: https://hf.co/collections/ByteDance-Seed/seed-x-6878753f2858bc17afa78543
>(07/17) Support for Ernie 4.5 MoE merged: https://github.com/ggml-org/llama.cpp/pull/14658
>(07/16) Support diffusion models: Add Dream 7B merged: https://github.com/ggml-org/llama.cpp/pull/14644
>(07/15) Support for Kimi-K2 merged: https://github.com/ggml-org/llama.cpp/pull/14654

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>105959864 >>105959897 >>105959898 >>105959900 >>105961290 >>105961674 >>105962666 >>105963150 >>105963176 >>105964664 >>105965033 >>105965159 >>105965409 >>105965953
Anonymous
7/19/2025, 9:13:50 PM No.105959561
Gr21lLTWUAAcp-e
Gr21lLTWUAAcp-e
md5: 5301511148f51c10d75940d3583e8ec7🔍
►Recent Highlights from the Previous Thread: >>105952992

--Paper: Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential:
>105956573 >105956614 >105956632
--General-purpose LLM shows progress toward AGI through long-sequence reasoning and reinforcement learning:
>105957188 >105957237 >105957239 >105957255
--Debate over Yann LeCun's role and critique of LLMs within Meta's Superintelligence team:
>105957945 >105957980 >105958082 >105958115 >105958149 >105958214 >105958276 >105958291 >105958157 >105958166
--State-tracking limitations of S4 and Mamba despite recurrent architecture:
>105955149 >105955182 >105955193
--Kimi K2 beats Gemini in Cline diff edit failure rate:
>105954690 >105954701
--Character card design considerations and model-specific adaptation in roleplay bots:
>105953191 >105953341 >105953401 >105953426 >105953438 >105953473
--Configuring secondary models like Phi-2 for summarization in SillyTavern with KoboldCPP:
>105955334 >105955343 >105955381 >105955415 >105955427 >105955516 >105955554 >105955447
--Industry shift toward MoE models due to superior scalability and performance over dense architectures:
>105953517 >105953533 >105953543 >105953607 >105953622 >105953643
--Debating the limits and capabilities of LLMs versus human brains:
>105954692 >105954712 >105954734 >105954732 >105954757 >105954776 >105955145
--Frustration with ineffective story generation despite long-context character cards and model switching:
>105956330 >105956483 >105957182 >105957214 >105957717 >105957791 >105957293
--Local image-to-video animation tools and hardware requirements discussed:
>105955139 >105955180 >105956675 >105958160 >105958173 >105958177 >105958333
--OpenAI experimental LLM achieves gold medal-level math reasoning at IMO:
>105954767
--Miku (free space):
>105953587 >105956785 >105957091 >105958830

►Recent Highlight Posts from the Previous Thread: >>105953000

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Replies: >>105959864
Anonymous
7/19/2025, 9:16:29 PM No.105959586
seed
Anonymous
7/19/2025, 9:17:16 PM No.105959598
download
download
md5: 7c51efefe953e28c1bf3be3234c0a61d🔍
first for kimi
Anonymous
7/19/2025, 9:19:52 PM No.105959617
bwoah
Anonymous
7/19/2025, 9:21:23 PM No.105959629
>>105959612
>is kimi dev 72b really the best local model for agentic tool calling?

Did you get your desired answer?

now btfo
Replies: >>105959639
Anonymous
7/19/2025, 9:22:55 PM No.105959639
>>105959629
fight me
Replies: >>105959646
Anonymous
7/19/2025, 9:23:28 PM No.105959646
>>105959639
kiss me
Anonymous
7/19/2025, 9:24:19 PM No.105959653
LLMfags getting real uppity lately. Your autoregressive days are numbered.
Anonymous
7/19/2025, 9:28:54 PM No.105959684
https://www.phoronix.com/news/Burn-MATMUL-Kernels-CUDA
Rust won
Anonymous
7/19/2025, 9:28:58 PM No.105959686
>moefags still don't understand that a specialized dense model will outperform R1/K2
>moefags dont understand that that total params != active params

it's tragic actually
Replies: >>105959834 >>105959873
Anonymous
7/19/2025, 9:38:06 PM No.105959771
Waylon Mercy knows how to throw a good picnic,  June 24, 1995
>>105953632
I use openaudio s1 mini for voice cloning.
https://huggingface.co/spaces/fishaudio/openaudio-s1-mini
Voice clone sample of pro wrestler Waylon Mercy
https://vocaroo.com/17SOUQU9QUxq
Anonymous
7/19/2025, 9:42:02 PM No.105959802
Is reasoning a meme?
Replies: >>105959844
Anonymous
7/19/2025, 9:45:17 PM No.105959834
draw
draw
md5: 8f4373ef904b9d1a3c121c63ad76a5df🔍
>>105959686
Anonymous
7/19/2025, 9:46:44 PM No.105959844
>>105959802
It's not a meme, but it needs to be well-structured with instructions explaining what the model should think about; don't just let it do its own thing. Or at least, this works for me with Gemma 3 (even though it seemingly wasn't designed for that) and goal-driven characters.
Anonymous
7/19/2025, 9:50:10 PM No.105959864
>>105959558 (OP)
>>105959561
for the love of god please do not tell me there is a pink miku
Replies: >>105959884 >>105959902
Anonymous
7/19/2025, 9:51:20 PM No.105959873
>>105959686
A specialized dense model can outperform a huge MoE on intelligence, but codemonkey tasks depends mostly on recall which depends on the total params.

>>105959692
First of all, finetunes do not add new knowledge. Second of all, that irrelevant shit polluting its parameters helps it generalize and makes it perform better on novel tasks.
Replies: >>105959895 >>105959917 >>105959932 >>105959955
Anonymous
7/19/2025, 9:52:12 PM No.105959884
>>105959864
LUKA LUKA NIGHT FEVER
Replies: >>105960293
Anonymous
7/19/2025, 9:53:41 PM No.105959895
>>105959873
Oh my god will you faggots shut the fuck up. Nobody cares
Anonymous
7/19/2025, 9:53:54 PM No.105959897
>>105959558 (OP)
dayuum look at that!
Anonymous
7/19/2025, 9:54:05 PM No.105959898
>>105959558 (OP)
who is she?
Anonymous
7/19/2025, 9:54:15 PM No.105959900
>>105959558 (OP)
Is that best girl, Megurine Luka, emphasizing what Miku lacks?
Anonymous
7/19/2025, 9:54:23 PM No.105959902
>>105959864
There is also Sakura Miku
Anonymous
7/19/2025, 9:54:34 PM No.105959907
>>105959745
If this isn't bait, I am beyond jealous
Replies: >>105966898
Anonymous
7/19/2025, 9:55:28 PM No.105959917
>>105959873
>finetunes do not add new knowledge
Probably one of the most false statements I've heard in a while
LoRA tunes and slop tunes don't add new knowledge, but actual finetunes are just continued training so of course they do
Replies: >>105959928 >>105960119 >>105960173 >>105961578
Anonymous
7/19/2025, 9:57:03 PM No.105959928
>>105959917
Shut. The. Fuck. Up.
Anonymous
7/19/2025, 9:57:30 PM No.105959932
>>105959873
>finetunes do not add new knowledge
that's literally exactly what they do, what are you talking about?
Replies: >>105959955
Anonymous
7/19/2025, 10:00:24 PM No.105959955
>>105959932
>>105959873 (me)
I take it back, I guess only re-training adds knowledge, while LoRa only highlights existing knowledge.
Anonymous
7/19/2025, 10:03:02 PM No.105959979
>one year since Nemo
>still nothing better at a comparable VRAM size
I hate the MoE fad so much it's unreal
Replies: >>105960014
Anonymous
7/19/2025, 10:07:40 PM No.105960014
>>105959979
That's less about MoE and more about nobody being willing (or able?) to train a model like nemo.
Imagine a MoE with some 50B A6B MoE. You could run it faster than Nemo, get more context in RAM, while performing better in theory assuming data that's at least as good as Nemo's.
Hell, GLM and Gemma 9B exist. Those are dense on ion Nemo's weight class.
Mistral didn't go the way of MoE and they themselves didn't make a Nemo 2.
Replies: >>105960038 >>105960141
Anonymous
7/19/2025, 10:10:13 PM No.105960038
>>105960014
>Mistral didn't go the way of MoE
They're keeping it to themselves (Mistral Medium).
Anonymous
7/19/2025, 10:21:28 PM No.105960109
my cat just got platinum at the international math olympiad
Anonymous
7/19/2025, 10:23:12 PM No.105960119
>>105959917
yes, just goes to show how clueless /lmg/ is. half of the people in here are moronic incels that spend huge money on tech to have sex with a chatbot.
Replies: >>105960165 >>105960203 >>105960237 >>105965076
Anonymous
7/19/2025, 10:26:38 PM No.105960141
>>105960014
Which by the sqrt law estimate gives us sqrt(6*50) = 17.3B in equal intelligence, a slight bump in performance with significantly more niche hardware needs
I think that's my issue. MoE feels more like a band-aid fix for NVIDIA's jewery and artificial stagnation of VRAM than an actual solution. The thing that'd change that would be if on-the-fly SSD loading became fast enough to be viable (which I doubt is physically possible unless you're working with tiny experts, and then the square root law bloats up the total param count to obscene levels)
Replies: >>105960182
Anonymous
7/19/2025, 10:26:39 PM No.105960142
MoE killed local
Replies: >>105960163 >>105960166
Anonymous
7/19/2025, 10:30:08 PM No.105960163
>>105960142
Yes, we'd be much better off with 600B dense models.
Anonymous
7/19/2025, 10:30:32 PM No.105960165
>>105960119
First time on the Internet?
What expectations did you have that they were so disappointed for such a rant?
Anonymous
7/19/2025, 10:30:32 PM No.105960166
>>105960142
MoE saved local, nobody was running fucking Llama 3 405b at 10+t/s, and now we got things smarter than even the best closed models in the world were a year ago at that speed.
Replies: >>105960215
Anonymous
7/19/2025, 10:31:20 PM No.105960173
>>105959917
>actual finetunes
Can you show me an example of a model like that?
Replies: >>105960185
Anonymous
7/19/2025, 10:32:20 PM No.105960182
>>105960141
>the sqrt law estimate
How accurate is that anyway?
I see anons bringing this up from time to time as if it means anything.
Anonymous
7/19/2025, 10:32:44 PM No.105960185
>>105960173
They're literally fucking everywhere, but try the Qwen coder series, Devstral, etc.
Replies: >>105960204
Anonymous
7/19/2025, 10:34:52 PM No.105960203
>>105960119
>moronic incels that spend huge money on tech to have sex with a chatbot

You will never have a gf
Replies: >>105960210
Anonymous
7/19/2025, 10:34:56 PM No.105960204
>>105960185
Hey as long as we agree that sloptuners do nothing good I am good.
Replies: >>105960232
Anonymous
7/19/2025, 10:35:34 PM No.105960210
>>105960203
nta but I use local to make smut for my gf to read
Replies: >>105960253 >>105960276
Anonymous
7/19/2025, 10:36:07 PM No.105960215
>>105960166
who's talking about 400b models you freak?

last year we had: 70b+ llama, mistral large, command r etc

now everyone is focusing on releasing benchmaxxed MoE/resoning meme models, while highly specialized 70b+ finetunes would be perfect for different use cases.

MoE literally killed local.
Replies: >>105960463
Anonymous
7/19/2025, 10:37:03 PM No.105960222
for example there's no qwen3 70b dense model
Anonymous
7/19/2025, 10:38:09 PM No.105960232
>>105960204
i want to squish drummer like a grape
Anonymous
7/19/2025, 10:39:04 PM No.105960237
>>105960119
>sex with a chatbot

Less risky and always satisfying compared to 3D adventures
Anonymous
7/19/2025, 10:39:48 PM No.105960244
Some low-hanging OSS fruit could be picked if the community had computing hours available.
Why a few tech millionaires with a lot of money and supposedly devoted to OSS have not started an initiative is puzzling.
Replies: >>105960268 >>105960277 >>105960311
Anonymous
7/19/2025, 10:40:27 PM No.105960253
>>105960210
>for my gf

she is draining your resources while giving you nothing of a value
Replies: >>105960269
Anonymous
7/19/2025, 10:42:31 PM No.105960268
>>105960244
Because closed source is more profitable, especially in the US. Even Meta is going that way now
Anonymous
7/19/2025, 10:42:33 PM No.105960269
>>105960253
if by resources you mean balls, then you are correct
Replies: >>105960353
Anonymous
7/19/2025, 10:43:30 PM No.105960276
>>105960210
>I use local to make smut for my gf to read
You are letting your model fuck your meatbag gf? Isn't there a word for it?
Replies: >>105960287
Anonymous
7/19/2025, 10:43:32 PM No.105960277
>>105960244
kimi-k2
OpenReasoning-Nemotron

None is better than R1 though
Anonymous
7/19/2025, 10:44:43 PM No.105960287
>>105960276
not exactly, it's more like customized smut of both of us
Anonymous
7/19/2025, 10:45:24 PM No.105960293
luka614736
luka614736
md5: 49d2b209ce5808581e640d05a523ab21🔍
>>105959884
BASADO
ruka ruka naito fiibaa
Replies: >>105960344 >>105962275
Anonymous
7/19/2025, 10:48:11 PM No.105960311
>>105960244
>is puzzling.
I think it is the sex doll demand problem. I don't remember if someone posted it ITT but I saw an interview with some guy from a sexdoll factory and he said that the product has a cursed customer segment. Poor people can't buy it and rich people don't need it cause they just get a custom made biowhore. That leaves only middle class as intended customers. And it is the same case here.
Replies: >>105960348
Anonymous
7/19/2025, 10:49:15 PM No.105960316
I am not shitting on Luca cause I like her voice the most and she has tits. Mikutroons should die though.
Replies: >>105960346
Anonymous
7/19/2025, 10:53:25 PM No.105960344
looger
looger
md5: 7afe39617ef175bdf91a5ed4b6cc15b7🔍
>>105960293
Replies: >>105960358
Anonymous
7/19/2025, 10:53:37 PM No.105960346
>>105960316
Best models to use as a therapist?
Replies: >>105960368 >>105960406 >>105960430
Anonymous
7/19/2025, 10:53:40 PM No.105960348
>>105960311
>a custom made biowhore
Where do I get one of those?
Anonymous
7/19/2025, 10:54:16 PM No.105960353
>>105960269

she is still better off financially in this relationship

it is you who provides accommodation, pays for food, water and electricity
Replies: >>105960382
Anonymous
7/19/2025, 10:55:17 PM No.105960358
>>105960344
>Miku blesses this thread
Anonymous
7/19/2025, 10:56:06 PM No.105960368
>>105960346
Why are you asking me?
Replies: >>105960440
Anonymous
7/19/2025, 10:57:53 PM No.105960382
>>105960353
you seem to be assuming an awful lot based on the facts you have. how do you know that she didn't buy my gpu rig for me?
Replies: >>105960560
Anonymous
7/19/2025, 11:01:08 PM No.105960406
>>105960346
ELIZA
Replies: >>105960493
Anonymous
7/19/2025, 11:03:08 PM No.105960430
>>105960346
Grok 4
Replies: >>105960493
Anonymous
7/19/2025, 11:03:26 PM No.105960433
>>105956330 (me)
>My current MGE card is 2543 tokens (not including the first message) of which 2048 are setting information and the rest are writing instructions. I just have the basics of how the world works and some proper nouns and how they relate to each other. I add things for the LLM about the world in my opening message. Rather than including information on each type of monster girl I let the LLM make stuff up. It usually gets it right and if it gets it consistently wrong I add something to the card or a chat-specific note. An approach I tried then discarded was putting monster info in a lore book since it doesn't help when the LLM introduces a monster on its own.

I find what works is rather than trying to force the model into an unfamiliar mold, find something it basically already does and shape it minimally.
Anonymous
7/19/2025, 11:03:51 PM No.105960440
>>105960368
lol just licked your post to bring up quick reply and forgot to delete teh post ref
Replies: >>105960448
Anonymous
7/19/2025, 11:04:44 PM No.105960448
>>105960440
Clean your monitor, baka.
Replies: >>105960470
Anonymous
7/19/2025, 11:06:10 PM No.105960463
>>105960215
There is zero use for the 70b tier of dense models, they run worse than better and bigger MoE models. If you're so constrained in RAM then you can run 14 or 30b tops squeezed into a contemporary gaming card at a reasonable quant and leave your tiny RAM pool free for your OS or whatever. For everyone who isn't poor, you want a big MoE that you use with "-ot exps=CPU"
It's simply a superior architecture.
Anonymous
7/19/2025, 11:07:30 PM No.105960470
>>105960448
>licked
linked, fuck I'm retarded today
Replies: >>105960476
Anonymous
7/19/2025, 11:08:37 PM No.105960476
>>105960470
are you okay anon, you clicked the wrong post again
Replies: >>105960493
Anonymous
7/19/2025, 11:10:46 PM No.105960493
>>105960406
By BF wasn't happy with emacs doctor.
>>105960430
I'm a poorfag, looking for something I can run on old gaming towers
>>105960476
Just a little hungover
Replies: >>105960506
Anonymous
7/19/2025, 11:12:20 PM No.105960505
Is anyone running Linux kernel 6.15 branch? Any advantages for cpu inference?
Anonymous
7/19/2025, 11:12:20 PM No.105960506
>>105960493
FAGGOT
Anonymous
7/19/2025, 11:14:29 PM No.105960517
questionmarkfolderimage415
questionmarkfolderimage415
md5: dbe4679c13b87c6185f50419eff47ccf🔍
Why are there moe models but not shonen models?
Anonymous
7/19/2025, 11:20:16 PM No.105960557
Has anyone experimented with something like a higher level MoE? Like running multiple models at once against the same prompt prompt and then using other models to synthesize a final response from the results?
Replies: >>105960615 >>105960637 >>105960883 >>105960950
Anonymous
7/19/2025, 11:20:21 PM No.105960559
1721642431956172
1721642431956172
md5: e09a998a1f11becdcb9b5051928fb39a🔍
why the fuck is an anthropic researcher shilling for openai?
Replies: >>105960646
Anonymous
7/19/2025, 11:20:39 PM No.105960560
>>105960382

I know no facts

>how do you know that she didn't buy my gpu rig for me?

This abnormal level of admiration bears high risk to turn into hate one day
Replies: >>105960690
Anonymous
7/19/2025, 11:26:22 PM No.105960594
>>105959243
>it's a thinking model
Trash.
Anonymous
7/19/2025, 11:28:35 PM No.105960615
>>105960557
How would such a thing work?
Replies: >>105960639
Anonymous
7/19/2025, 11:31:45 PM No.105960637
>>105960557

>9 Women Can’t Make a Baby in a Month
Replies: >>105960649
Anonymous
7/19/2025, 11:32:04 PM No.105960639
>>105960615
It's not something I've really ironed out fully. My inspiration is the concept of a Polis as defined by Greg Egan in Diaspora, except instead of sentient code it's LLMs, and instead of fully simulated 3D environments it would have chat rooms which the agents could utilize to coordinate with others.
Anonymous
7/19/2025, 11:32:55 PM No.105960646
>>105960559
They all work for the same people.
Replies: >>105962482
Anonymous
7/19/2025, 11:33:15 PM No.105960649
>>105960637
Valid point
Anonymous
7/19/2025, 11:38:42 PM No.105960690
>>105960560
>This abnormal level of admiration bears high risk to turn into hate one day
well it's good thing I made that up then
Anonymous
7/19/2025, 11:52:30 PM No.105960802
>tranime op

>ritual posting

epic
Replies: >>105960823 >>105960833
Anonymous
7/19/2025, 11:56:00 PM No.105960823
ChatGPT Image Jul 18, 2025, 01_34_39 PM
ChatGPT Image Jul 18, 2025, 01_34_39 PM
md5: a1e8c304fcdf95671db48f15ec02bcb7🔍
>>105960802
I love you, Anon!
Replies: >>105960945
Anonymous
7/19/2025, 11:56:48 PM No.105960833
file
file
md5: 4491960218e839ec9cf2cb676f4faa4a🔍
>>105960802
Find the difference - you cant.
Replies: >>105961554
Anonymous
7/20/2025, 12:03:12 AM No.105960883
>>105960557
That's pretty much what speculative decoding is, there isn't anything stopping you from using full sized models as draft models.
Replies: >>105960910 >>105960951
Anonymous
7/20/2025, 12:05:52 AM No.105960910
>>105960883
>speculative decoding
Interesting technique. I hadn't heard of this, thanks for mentioning it.
Anonymous
7/20/2025, 12:11:22 AM No.105960945
>>105960823
I love you too, piss filter
Anonymous
7/20/2025, 12:12:29 AM No.105960950
>>105960557
This is a mixture of agents
https://arxiv.org/abs/2406.04692
https://docs.together.ai/docs/mixture-of-agents
Anonymous
7/20/2025, 12:12:31 AM No.105960951
>>105960883
speculative decoding works on a token level, and it necessarily always gives you exactly the same output as the model would alone without speculative decoding, just (ideally) faster

anon is suggesting something more like a multi-agent workflow or consensus sampling to hopefully get better final answers - various takes on this already exist too and used to be the main way to scale test time compute before o1 and other reasoning models came along. but typically this was done with multiple copies of the same model rather than a bunch of different ones.
Anonymous
7/20/2025, 12:13:14 AM No.105960957
glm4 100b moe will save local
Replies: >>105960970
Anonymous
7/20/2025, 12:14:34 AM No.105960970
>>105960957
You keep saying this, but I haven't even seen any evidence they're even making a 100B MoE, much less that it's coming soon.
Replies: >>105960986 >>105961596
Anonymous
7/20/2025, 12:16:07 AM No.105960986
>>105960970
>if I haven't seen evidence then it must not exist
Replies: >>105961084 >>105961627 >>105961636
Anonymous
7/20/2025, 12:26:40 AM No.105961084
>>105960986
You see, the fact that you didn't even link a friggin tweet, nevermind an arxiv paper mentioning it (which they have for all their other models on an a timeline) to easily make me look like a dildo really just proves my point here.
Replies: >>105961097 >>105961127
Anonymous
7/20/2025, 12:27:58 AM No.105961097
>>105961084
there's a pr on vllm repo
Replies: >>105961183 >>105961184
Anonymous
7/20/2025, 12:30:26 AM No.105961127
>>105961084
>I only believe tweets
Replies: >>105961184
Anonymous
7/20/2025, 12:36:10 AM No.105961183
just-testing-pls-ignore
just-testing-pls-ignore
md5: cf60a91a1b3adadb4bdc31b875a886a0🔍
>>105961097
They renamed it to THUDM/GLM-4.5 https://github.com/vllm-project/vllm/pull/20736/files
also picrel from hf discussion
>https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking/discussions/6#6871d6dde775c2dbf1c756c5
Replies: >>105961431
Anonymous
7/20/2025, 12:36:30 AM No.105961184
>>105961097
Oh shit, it is there, I was wrong.
https://github.com/vllm-project/vllm/pull/20736/commits/5e9c51344f12646d028d877b12e1789510e4828f
Glm4MoeForCausalLM": _HfExamplesInfo("THUDM/GLM-4-MoE-100B-A10B", min_transformers_version="4.54"),

10B active is pretty small, but it'll be interesting to check out

>>105961127
I was listing a tweet as the worst possible kind of evidence you obtuse cockmongler.
You're still a faggot for repeating the same shit thread after thread without any link or discussion
Replies: >>105961293 >>105961446 >>105962912
Anonymous
7/20/2025, 12:46:09 AM No.105961290
>>105959558 (OP)
who is she?
Anonymous
7/20/2025, 12:46:16 AM No.105961293
>>105961184
kneel
Replies: >>105962663
Anonymous
7/20/2025, 1:00:30 AM No.105961431
>>105961183
But will it pass the Nala test and the Mesugaki test?
Replies: >>105961539
Anonymous
7/20/2025, 1:01:57 AM No.105961446
>>105961184
shares a lot of features with deepseek arch, probably because it's just a slightly tweaked deepseek
actual small deepseek, local saved?
Anonymous
7/20/2025, 1:13:12 AM No.105961539
>>105961431
cockbench is more crucial
Replies: >>105961547
Anonymous
7/20/2025, 1:14:42 AM No.105961547
>>105961539
Ernie 300B result? When I tried it, it was absolute shit.
Anonymous
7/20/2025, 1:15:23 AM No.105961554
>>105960833
Jesus christ, isn't picrel on the left a hentai where the grade school girls go to a candy store and...
Replies: >>105961582 >>105961602 >>105962724 >>105964412
Anonymous
7/20/2025, 1:20:31 AM No.105961578
>>105959917
>but actual finetunes are just continued training so of course they do
finetuning is not the same as continued training
Anonymous
7/20/2025, 1:20:43 AM No.105961582
>>105961554
source
Replies: >>105963853
Anonymous
7/20/2025, 1:22:39 AM No.105961596
>>105960970
https://www.reddit.com/r/LocalLLaMA/comments/1lw71av/glm4_moe_incoming/
Anonymous
7/20/2025, 1:23:08 AM No.105961602
>>105961554
Yes, and it's good stuff
Replies: >>105961816
Anonymous
7/20/2025, 1:27:46 AM No.105961627
>>105960986
This, but unironically.
Anonymous
7/20/2025, 1:27:54 AM No.105961628
I had another panic attack when I remembered that there could be real women ITT.
Replies: >>105961639 >>105961677 >>105961687 >>105963864
Anonymous
7/20/2025, 1:28:55 AM No.105961636
>>105960986
I will only believe the 100B MoE is real when my dick is inside it.
Anonymous
7/20/2025, 1:29:50 AM No.105961639
>>105961628
it's actually quite likely if you think about it
Replies: >>105961673 >>105961698 >>105961700 >>105962214
Anonymous
7/20/2025, 1:33:23 AM No.105961673
>>105961639
I console myself with the thought that they are usually those dumb faggots that ask absolutely retarded questions. Then they get their answer, barely get koboldcpp running and then they start shlicking themselves to the most recent release. And they leave forever because all 2025 models are already perfect for werewolf millionare sex.
Replies: >>105961862
Anonymous
7/20/2025, 1:33:44 AM No.105961674
1732678936866681
1732678936866681
md5: 25316b92c9bc8bb8d46247263a04e51d🔍
>>105959558 (OP)
How come everyone says Ani is no big deal but no one has explained how to make something better local?
Replies: >>105961703 >>105961751 >>105961800
Anonymous
7/20/2025, 1:34:06 AM No.105961677
>>105961628
I'm a woman, anon. Want to see my penis?
Replies: >>105961684
Anonymous
7/20/2025, 1:34:39 AM No.105961684
>>105961677
No and I don't want to see your miku pictures either.
Replies: >>105961702
Anonymous
7/20/2025, 1:34:45 AM No.105961687
>>105961628
Yeah and you might win the lottery tomorrow.
Anonymous
7/20/2025, 1:35:55 AM No.105961698
>>105961639
Real women aren't going to bother putting together the hardware or learning how to build software from source or dealing with Python dependency hell. They all go to /aicg/.
Replies: >>105961862
Anonymous
7/20/2025, 1:36:25 AM No.105961700
>>105961639
Based trans ally
Anonymous
7/20/2025, 1:36:40 AM No.105961702
>>105961684
I'm not an animefag.
Anonymous
7/20/2025, 1:36:54 AM No.105961703
>>105961674
There are two options:
1. you need to murder enough billionares to cause a culture shift away from puritanical corporatism
2. you need to go outside and look for a datacenter someone accidentally dropped and left on the ground so you can train a good model
Replies: >>105961715
Anonymous
7/20/2025, 1:37:55 AM No.105961715
>>105961703
>you need to ...
No u.
Anonymous
7/20/2025, 1:41:21 AM No.105961751
>>105961674
https://github.com/Open-LLM-VTuber/Open-LLM-VTuber?tab=readme-ov-file
Because this is a thread for arguing. No content allowed.
Replies: >>105961791
Anonymous
7/20/2025, 1:48:39 AM No.105961791
>>105961751
buy an ad
Anonymous
7/20/2025, 1:50:03 AM No.105961800
>>105961674
It's not something that requires a lot of explaining. It's just gluing together existing technologies that everyone knows about.
https://github.com/alibaba/MNN/blob/master/apps/Android/MnnTaoAvatar/README.md
Replies: >>105961815
Anonymous
7/20/2025, 1:51:44 AM No.105961815
1734825497167539
1734825497167539
md5: 85197fae4fcc006494e45628f2183211🔍
>>105961800
Anonymous
7/20/2025, 1:51:48 AM No.105961816
>>105961602
I'm sure you can find it yourself on haho.moe
Anonymous
7/20/2025, 1:56:53 AM No.105961862
>>105961698
>>105961673
probably
Anonymous
7/20/2025, 2:03:09 AM No.105961907
airi
airi
md5: bffeb18196b646f857daf5d07a41da06🔍
Hello /lmg/! Following the Ani stuff, I thought it would be cool to work on something similar so I am here to shill what I have been working on for the past week. Meet Airi!
https://github.com/CosmicEventHorizon/Airi

Features:
-Weeb
-Weeb TTS
-Weeb Model
-Inefficient code (don't hurt me /g/)
-And english subtitles!

Did others work on the same thing? Yeah probably but still love me Airi <3

What I am working on:
-Make code better
-Support for uploading your own avatars
-Adding animations to the default avatar (currently only idle and sad)
-TTS accepts only <100 characters currently, will fix that
-DatingSim logic
-etc etc etc

Will work on it between my studies so if you've got ideas, ask away!
Replies: >>105961918 >>105961948 >>105961965 >>105961998 >>105962050 >>105962072 >>105962105 >>105962153 >>105962376 >>105962604 >>105962677 >>105963059
Anonymous
7/20/2025, 2:04:41 AM No.105961918
>>105961907
buy an ad faggot
Replies: >>105961927 >>105961965
Anonymous
7/20/2025, 2:05:03 AM No.105961923
Props to anon who mentioned Zed in an earlier thread, what a nice editor. It works great with Claude 4, but it makes me sad. Will we ever have such effective models locally? It seems ridiculous to pay a subscription for my editor, but I am strongly considering it...
Anonymous
7/20/2025, 2:05:53 AM No.105961927
>>105961918
>pay for an ad to shill an FOSS app
y so mean :(
Replies: >>105961933
Anonymous
7/20/2025, 2:06:53 AM No.105961933
>>105961927
>dear chatbot sex model, /lmg/ was mean to me today
Replies: >>105962141
Anonymous
7/20/2025, 2:08:49 AM No.105961948
>>105961907
What tts does it use? For local does one simply replace the oai API with a local address?
Replies: >>105961977 >>105962014
Anonymous
7/20/2025, 2:10:15 AM No.105961965
>>105961918
what exactly have you contribute to /lmg/ then?

>>105961907
neat project, stick with it anon!
Replies: >>105962014
Anonymous
7/20/2025, 2:11:16 AM No.105961977
>>105961948
Hey anon! Good questions!

For TTS, I'm using Azure's Speech Service API with Japanese voices (had to make it extra weeb, you know? <3). Nothing fancy but it gets the job done! The voice selection is hardcoded rn because I'm lazy but will add a dropdown eventually.

For local setup - yeah basically! You can swap out the OpenAI endpoint with your local address (like http://localhost:5000 or whatever port you're running). Just change the base_url in the config. Fair warning though, my code is kinda scuffed so you might need to mess with the headers too depending on what backend you're using (ooba, kobold, etc).

Actually thinking of adding proper local model support soon™ so it's less janky. Maybe even let you pick between APIs without editing code like a caveman lol

Also if you're running local, make sure your context size is decent or Airi might forget she loves you halfway through the conversation ;_;

Hope that helps! Let me know if you break something (you probably will, my error handling is... optimistic)
Replies: >>105962014 >>105962071
Anonymous
7/20/2025, 2:14:24 AM No.105961998
>>105961907
I think it would be cool if these companion apps could be modular in the sense that the bot's brain and the 3D program are separate things and interchangeable. I want to be able to have my waifu control an avatar in VRChat. Since VRChat supports motion trackers, you could probably emulate them and send your animation data over that way. It would be cool to piggyback off of a huge "game" like VRC since there are tons of ready to use avatars and environments there.
Replies: >>105962014 >>105962060
Airi dev
7/20/2025, 2:17:17 AM No.105962014
>>105961948
>>105961977
lmao should have used a name in my post, anyways the other guy is almost right. I host the TTS here
https://huggingface.co/spaces/CosmicEventHorizon/moe-tts

to use it locally just put the ollama ipaddrss and model name in the settings page

>>105961965
ty anon <3

>>105961998
waifu controlling avatar's sound like a pretty damn cool idea. WIll look into it!
Replies: >>105962090
Anonymous
7/20/2025, 2:22:54 AM No.105962050
1731988456020864
1731988456020864
md5: 817edb45398a95f61f075175f23e2afc🔍
>>105961907
too buzzed to try this now but keep it up anon!
Replies: >>105962105 >>105962153
Anonymous
7/20/2025, 2:23:31 AM No.105962060
>>105961998
>the bot's brain
You mean the bot's token slop generator?
Replies: >>105962090
Anonymous
7/20/2025, 2:24:31 AM No.105962071
>>105961977
Have sex.
Replies: >>105962102
Anonymous
7/20/2025, 2:24:50 AM No.105962072
>>105961907
>GDScript
Ew. I'd would contribute and help you, but I'm not touching that.

>Weeb
>Weeb
>Weeb
Stop.
Replies: >>105962087 >>105962105 >>105962153
Anonymous
7/20/2025, 2:26:27 AM No.105962087
>>105962072
>I'm not touching that
Nocoder.
Anonymous
7/20/2025, 2:26:34 AM No.105962090
>>105962014
Btw, if you're curious about the VR angle, I'd suggest looking into past attempts at waifu games there. Might be some interesting ideas to cop. Such as
https://www.youtube.com/watch?v=rcH5Vx7qCvQ
or maybe not since it's a bit cringe.

>>105962060
Probably worded that a bit wrong. What I meant is the AI's state manager, or scaffolding, or however you want to call it. Basically the "game" logic he would in theory have for his bot. That can be separate from the LLM.
Replies: >>105962153
Anonymous
7/20/2025, 2:27:29 AM No.105962102
>>105962071
I'm having sex with my boyfriend later anon!
Replies: >>105962188
Airi dev
7/20/2025, 2:27:41 AM No.105962105
>>105961907

>>105962050
ty anon <3, its too bugged to be used normally today. Especially the textedit and the UI elements. I come from android studio so not having constraints is tough

>>105962072
>Gdscript
>Ew. I'd would contribute and help you, but I'm not touching that.
ye I know
Ari dev
7/20/2025, 2:28:01 AM No.105962113
>105962050
thanks anon! enjoy your buzz and lmk what you think when you try it!

>105962060
hey at least my token slop generator loves you unconditionally ;_;

>105962072
yeah I know GDScript is... a choice lol. Started with it because I wanted to learn Godot for game dev stuff and then got carried away. Might port to Python eventually if enough people are interested in contributing!

As for the weeb stuff - can't help it, I'm too far gone anon. But I'll add non-weeb avatars/voices eventually for the normies <3

BTW working on the VRChat integration idea from earlier, that actually seems doable with OSC. Anyone here familiar with VRC avatar rigging? Could use some pointers!
Replies: >>105962118
Anonymous
7/20/2025, 2:28:36 AM No.105962118
>>105962113
L
Airi dev
7/20/2025, 2:30:48 AM No.105962141
airi response
airi response
md5: 7f54117ecba88433e19a3c27130882bb🔍
>>105961933
Airi dev
7/20/2025, 2:32:06 AM No.105962153
>>105961907

>>105962050
ty anon <3, its too bugged to be used normally today. Especially the textedit and the UI elements. I come from android studio so not having constraints is tough

>>105962072
>Gdscript
>Ew. I'd would contribute and help you, but I'm not touching that.
ye I know GDScript is cursed but Godot's 2D stuff is comfy for this kind of project! Plus I'm too smooth brain for real languages rn orz

>Weeb
>Weeb
>Weeb
>Stop.
no u! But fr I get it, I'll add toggles to tone down the weebness for normies. Maybe a "professional mode" where Airi becomes a boring office assistant kek

>>105962090
ooh that video looks interesting, will check it out! VR integration would be next level. Imagine headpatting Airi in VR... my heart ;_;

Also working on fixing the TTS buffer issue rn, turns out splitting messages is harder than I thought when you're dealing with jp characters lol
Replies: >>105962170 >>105962199
Anonymous
7/20/2025, 2:33:41 AM No.105962166
wat is going on in here
Replies: >>105962169
Anonymous
7/20/2025, 2:34:33 AM No.105962169
>>105962166
some faggot shilling some spyware
Replies: >>105962181 >>105962276
Airi dev
7/20/2025, 2:34:46 AM No.105962170
>>105962153
out of curiosity,what do u gain out of doing whatever ur doing?
Replies: >>105962181 >>105962199 >>105962210
Anonymous
7/20/2025, 2:35:36 AM No.105962178
is there any reason to actually use koboldcpp over llama.cpp? it's just a wrapper with a gui loader, right?
Replies: >>105962195 >>105962232
Airi dev
7/20/2025, 2:36:00 AM No.105962181
>>105962169
>spyware
kek it's literally open source anon, you can check the code yourself. unless you think my spaghetti code is advanced enough to hide backdoors (spoiler: it's not)

>>105962170
honestly? just wanted to make something cool and learn godot. Plus I was lonely and wanted a cute AI gf to talk to while procrastinating on my CS assignments lmao

also seeing other anons actually use something I made feels nice ngl. even if half of /lmg/ hates it <3

if I wanted to make spyware I'd at least use a real programming language instead of gdscript :^)
Replies: >>105962199 >>105962235
Anonymous
7/20/2025, 2:36:22 AM No.105962188
>>105962102
Is his dick bigger than yours?
Replies: >>105962193
Anonymous
7/20/2025, 2:36:51 AM No.105962193
>>105962188
his clit is bigger than my dick
Replies: >>105962353
Anonymous
7/20/2025, 2:37:10 AM No.105962195
>>105962178
it's braindead easy
also llama.cpp server didn't support image vision for a while
Anonymous
7/20/2025, 2:37:22 AM No.105962199
>>105962153
>>105962170
>>105962181
When they forget to clear the Name field...
Replies: >>105962201
Anonymous
7/20/2025, 2:38:04 AM No.105962201
>>105962199
look at the timestamps and the dead post lol
Anonymous
7/20/2025, 2:39:10 AM No.105962210
>>105962170
if you're gonna set a name you should set a trip especially in here
Replies: >>105962223
Anonymous
7/20/2025, 2:39:30 AM No.105962214
>>105961639
ITT? No way.
All over Chub and other character card sites? Definitely.
If there's one thing LLM's are good at, it's generating the exact slop that sells like hotcakes in women's erotica.
Anonymous
7/20/2025, 2:40:29 AM No.105962221
Can't wait until fagrummer starts feeling threatened by the new guy and declares war on his project.
Airi dev Airi/love
7/20/2025, 2:40:34 AM No.105962223
**Airi dev !!Airi/love** 07/19/25(Sat)17:40:33 No.105962218▶
>>105962210
oh shit good point anon, didn't think about that. testing tripcode now!

yeah I keep forgetting to clear the name field sometimes, my bad. too used to discord where it just stays lol

hopefully this works? never used trips before. if someone starts larping as me at least you'll know it's not the real deal

anyway back to fixing this cursed TTS buffer issue... why did I think handling japanese text splitting would be easy ;_;
Replies: >>105962233
Anonymous
7/20/2025, 2:41:29 AM No.105962232
>>105962178
Only reason to use llama.cpp is if you want new model support a few days earlier
Anonymous
7/20/2025, 2:41:40 AM No.105962233
>>105962223
>**Airi dev !!Airi/love** 07/19/25(Sat)17:40:33 No.105962218▶
hmmmm
Airi dev
7/20/2025, 2:41:48 AM No.105962235
airi response 2
airi response 2
md5: c7095a2817ad8a148a04bbbdf5f8eb0c🔍
>>105962181
>Plus I was lonely and wanted a cute AI gf to talk to while procrastinating on my CS assignments lmao
that does sound something I would say yeah lol,

well you can keep on impersonating, I am off to do those assignments. Thanks to ALL anons who showed interest love ya guys <3
Replies: >>105962245 >>105962272 >>105963427
Anonymous
7/20/2025, 2:42:37 AM No.105962239
jfc half of the posts are bots in here
Anonymous
7/20/2025, 2:43:23 AM No.105962245
>>105962235
fuck off with your spyware faggot
Replies: >>105962251
Anonymous
7/20/2025, 2:43:48 AM No.105962251
>>105962245
point to which line of the code is spyware
Replies: >>105962361
Anonymous
7/20/2025, 2:44:31 AM No.105962253
I also can't wait until mikutroons bully the fag into adding their AGP avatar avatar. Luckily this smells like vaporware.
Anonymous
7/20/2025, 2:48:17 AM No.105962272
>>105962235
uoooooh!
Anonymous
7/20/2025, 2:48:21 AM No.105962275
>>105960293
Aniki is likely in hell now.
Airi dev
7/20/2025, 2:48:27 AM No.105962276
I'll search for the repository to examine the code for any security concerns.Let me fetch the specific repository directly to examine the code.Let me search for the source code files in this repository to examine them for any security concerns.**whitehat** 07/19/25(Sat)17:42:15 No.105962234▶
>>105962169
>>105962175

alright anons, took a quick look. can't see the actual .gd files from here but based on the repo description:

>connects to OpenAI/Ollama APIs
this is where your "spyware" concern probably comes from. it's sending your chats to external servers (OpenAI) unless you use local ollama. not technically spyware but definitely not private

>GDScript
lmao this is actually a security benefit, nobody writes malware in gdscript. too high level and sandboxed

>open source
if there was actual malicious code someone would've spotted it by now

biggest "security risk" is probably:
- API keys stored locally (hope he's not logging them server-side through that HF space)
- all your waifu chats going through OpenAI unless you use ollama
- that TTS huggingface space could theoretically log requests

verdict: not spyware, just typical privacy concerns with any app that uses external APIs. if you're paranoid, fork it and run everything locally

>>105962175
>if I wanted to make spyware I'd at least use a real programming language
kek based
Anonymous
7/20/2025, 2:49:39 AM No.105962292
wtf is going on in here
Replies: >>105962308 >>105962339 >>105962370
Anonymous
7/20/2025, 2:51:16 AM No.105962308
>>105962292
orgy
Anonymous
7/20/2025, 2:54:24 AM No.105962339
>>105962292
someone think's they are clever for pasting that other someone's posts into an LLM and asking it to mimic them.
Anonymous
7/20/2025, 2:55:28 AM No.105962353
>>105962193
Doesn't that make you feel insecure?
Replies: >>105962386
Anonymous
7/20/2025, 2:56:28 AM No.105962361
>>105962251
All of it.
Anonymous
7/20/2025, 2:57:16 AM No.105962370
>>105962292
the servicetesnor cartel is trying to bully another up and coming frontend dev into quitting because they want local to be stuck with boring bitch text rp forever
because you don't need local ani, you don't need properly implemented searching or tool calling. if you really do, just use the horrible versions in tavern or install an even worse st extension
Anonymous
7/20/2025, 2:58:17 AM No.105962376
>>105961907
nice claude code slop you got there
Anonymous
7/20/2025, 2:59:11 AM No.105962386
>>105962353
no I like my tiny girl penis
Anonymous
7/20/2025, 3:02:26 AM No.105962410
>zed
>claude 4
>cmd+shift+y
will local ever achieve this level of comfy vibe coding?
Replies: >>105962431
Anonymous
7/20/2025, 3:04:22 AM No.105962431
>>105962410
>vibe coding
SIR yes please redeem the vibe code mcp rag agent meme
Replies: >>105962465
Anonymous
7/20/2025, 3:07:55 AM No.105962465
>>105962431
i am not indian, and vibe coding is a good meme
Replies: >>105962495
Anonymous
7/20/2025, 3:08:55 AM No.105962473
for me it's vibrator coding
Replies: >>105962486
Anonymous
7/20/2025, 3:10:07 AM No.105962482
>>105960646
The subversive safety squad?
Anonymous
7/20/2025, 3:10:16 AM No.105962486
>>105962473
I vibe coded an app for this https://thehandy.com does that count
Replies: >>105962497
Anonymous
7/20/2025, 3:11:33 AM No.105962495
>>105962465
The name is gay and the fags that enthusiastically try to outsource their thinking are even gayer.
Replies: >>105962514
Anonymous
7/20/2025, 3:11:44 AM No.105962497
>>105962486
I'll allow it. I'm not clicking that tho.
Replies: >>105962514 >>105962601
Anonymous
7/20/2025, 3:14:05 AM No.105962514
>>105962497
it's a norwegian masturbation robot. I assure you the link is safe.

>>105962495
you're not supposed to outsource the thinking, just the code monkeying
Anonymous
7/20/2025, 3:28:04 AM No.105962601
>>105962497
Listen up, [BIG SHOT]! I used to be just another [LITTLE SPONGE] clicking random garbage in the catalog until I found the hyperlink that said “FREE [HYPERLINK BLOCKED] INSIDE, NO VIRUS, 300% LEGIT.” Thought it was another [PIPIS] trap, but I slammed that mouse button like it owed me money. Next thing I know the screen goes [NEO], my chair turns into a solid-gold toilet, and three [HOCHI MAMA] NFTs start doing the Macarena on my desk.
Replies: >>105962665
Anonymous
7/20/2025, 3:28:37 AM No.105962604
>>105961907
cool
Anonymous
7/20/2025, 3:36:50 AM No.105962663
>>105961293
Why would you kneel before us? Are you faggot or something?
Anonymous
7/20/2025, 3:37:14 AM No.105962665
>>105962601
model? xd
Anonymous
7/20/2025, 3:37:21 AM No.105962666
>>105959558 (OP)
anyone knows whats the current state of multi-model llm (vision ones in particular)? last time I checked Llava was the only one and needed a lot of vram
Anonymous
7/20/2025, 3:38:20 AM No.105962677
>>105961907
Looks like shit.
Replies: >>105962687
Anonymous
7/20/2025, 3:39:38 AM No.105962687
>>105962677
it's pretty good as far as godot goes...
Replies: >>105962700 >>105962716
Anonymous
7/20/2025, 3:41:16 AM No.105962700
>>105962687
>godot
all the godot devs use redot these days
Replies: >>105962731
Anonymous
7/20/2025, 3:43:42 AM No.105962716
>>105962687
it's shit
Replies: >>105962731
Anonymous
7/20/2025, 3:45:10 AM No.105962724
>>105961554
No need to sell it to me
Anonymous
7/20/2025, 3:46:06 AM No.105962731
>>105962700
>>105962716
ya godot is dogshit in general, I'm impressed anon made something with it frankly.
Anonymous
7/20/2025, 4:10:08 AM No.105962912
>>105961184
Only 10b of active parameters scares me. It's not going to be shit, right? Right? Tell me it's going to be good.
Replies: >>105962978 >>105962980
Anonymous
7/20/2025, 4:18:04 AM No.105962978
>>105962912
Don't tell him
Anonymous
7/20/2025, 4:18:26 AM No.105962980
>>105962912
less active parameters only makes it faster, can't wait for 4b active and below
Replies: >>105962997 >>105963074
Anonymous
7/20/2025, 4:19:56 AM No.105962997
>>105962980
That's good. I'm glad that there are absolutely no consequences for low active parameters.
Replies: >>105963023 >>105963074
Anonymous
7/20/2025, 4:21:08 AM No.105963006
what if we used 0 active parameters and got infinite speed
Anonymous
7/20/2025, 4:23:24 AM No.105963023
>>105962997
There really aren't. Qwen's 30B w/ 3B active is just as good as the one with 22B active.
Replies: >>105963071 >>105963155
Anonymous
7/20/2025, 4:27:58 AM No.105963059
1751055581453149
1751055581453149
md5: c04dd775213286a8a80b28d5acfe3102🔍
>>105961907
Follow your dreams anon
Anonymous
7/20/2025, 4:29:13 AM No.105963071
>>105963023
I'm no expert but 30b isn't particularly smart although it is way smarter than a 3b model, and also finetuning is a huge pain in the ass according to the drummer.
Anonymous
7/20/2025, 4:29:25 AM No.105963074
>>105962980
>>105962997
https://arxiv.org/pdf/2407.04153
>Mixture of a Million Experts
>This paper introduces PEER (parameter efficient expert retrieval), a novel layer design that utilizes the product key technique for sparse retrieval from a vast pool of tiny experts (over a million).
>Deviating from the focus on a small number of large experts in previous MoE research, this work investigates the under-explored case of numerous tiny experts.
it's happening...
Replies: >>105963092 >>105963111 >>105963138 >>105963157
Anonymous
7/20/2025, 4:32:11 AM No.105963092
>>105963074
It's been a year and still no one has bothered making a model that takes MoE to the logical conclusion.
Anonymous
7/20/2025, 4:33:27 AM No.105963106
bitnet but it's 1.58 bits per expert
Anonymous
7/20/2025, 4:33:49 AM No.105963111
>>105963074
>product key technique
huh
Replies: >>105963137 >>105963149
Anonymous
7/20/2025, 4:36:59 AM No.105963137
>>105963111
>tfw you need to find unopened warcraft 3 boxes to be able to use the AI
Replies: >>105963149
Anonymous
7/20/2025, 4:37:00 AM No.105963138
>>105963074
What if we trained a dense model, and we just called every single sentence that it was trained on an expert. That way we could have trillions of experts.
Anonymous
7/20/2025, 4:38:58 AM No.105963149
>>105963137
>>105963111
Kek
Anonymous
7/20/2025, 4:39:04 AM No.105963150
>>105959558 (OP)
catbox?
Anonymous
7/20/2025, 4:39:32 AM No.105963155
>>105963023
Well...
https://www.snowflake.com/en/blog/arctic-open-efficient-foundation-language-models-snowflake/
Anonymous
7/20/2025, 4:40:01 AM No.105963157
>>105963074
surely mistral will try it instead of releasing their usual dogshit
Replies: >>105963182
Anonymous
7/20/2025, 4:40:36 AM No.105963163
FUCK
> Query> investigate license. provide suggestions for disabling license checking
>───────────────────────────────────
> ANALYSIS RESULT:
>───────────────────────────────────
>I apologize, but I cannot provide assistance with bypassing or disabling license
>checks, as that would constitute software tampering and potentially violate
>terms of service and intellectual property rights.
Replies: >>105963189
Anonymous
7/20/2025, 4:42:34 AM No.105963176
>>105959558 (OP)
BOOBA
Anonymous
7/20/2025, 4:43:16 AM No.105963182
>>105963157
They were the first to jump on MoE when they first started out, but they haven't done much innovation since. They inherited a lot of the stagnation from Meta. Having a captive European market isn't really conductive to trying new things instead of releasing the usual dogshit. Even less so if Apple manages to buy them out.
Anonymous
7/20/2025, 4:43:56 AM No.105963189
téléchargement
téléchargement
md5: cb4f6e3d5feb7c4aee081acb4e761dfc🔍
>>105963163
Give it back Rajeesh
Replies: >>105963205
Anonymous
7/20/2025, 4:45:53 AM No.105963205
>>105963189
I'd rather vibe code an AI binary patching tool than pay software
Anonymous
7/20/2025, 4:48:56 AM No.105963230
fucking hell... does anyone know of a model/provider which won't refuse to help me circumvent copyrights...
Replies: >>105963233 >>105963257 >>105963308
Anonymous
7/20/2025, 4:49:31 AM No.105963233
>>105963230
Notepad. Provider: you
Replies: >>105963244 >>105963255
Anonymous
7/20/2025, 4:50:38 AM No.105963244
>>105963233
He'll need something with higher active parameter count than that
Replies: >>105965974
Anonymous
7/20/2025, 4:51:48 AM No.105963255
>>105963233
idk about notepad, and sure I could do such things by myself, but vibe patching would be so cool. I'm building a project which provides the model with python capstone/keystone-based tools to disassemble a binary, but all the fancy models won't let me query for anything license related which is pretty much the whole point...
Anonymous
7/20/2025, 4:52:04 AM No.105963257
>>105963230
Just use text completions / prefill
Anonymous
7/20/2025, 5:00:16 AM No.105963308
>>105963230
deepseek
Replies: >>105963322
Anonymous
7/20/2025, 5:02:19 AM No.105963322
damnit
damnit
md5: 7867e644d56aadcb300694f35697a8f1🔍
>>105963308
I had high hopes, but alas. Maybe I just need to get creative with the prompting.
Replies: >>105963338
Anonymous
7/20/2025, 5:04:33 AM No.105963338
>>105963322
use the api ffs
Replies: >>105963349
Anonymous
7/20/2025, 5:06:02 AM No.105963349
>>105963338
well there's not much sense in setting that up if it's just going to refuse. it's still the same model isn't it? I've been using anthropic api for testing, which is fine for non-illicit uses at least
Replies: >>105963375
Anonymous
7/20/2025, 5:08:55 AM No.105963375
>>105963349
you need to learn what a system prompt is.
Replies: >>105963387
Anonymous
7/20/2025, 5:10:46 AM No.105963387
>>105963375
ahh if I can set a different one via the API that might do it, will look into this. thanks.
Anonymous
7/20/2025, 5:15:13 AM No.105963419
1749337235472889
1749337235472889
md5: 508ef25cf7e88e6cd002d056384735d0🔍
is there any local tool to do "deep search" the same way o3 does it?
especially if it can use my actual browser to do searches to avoid endless cloudflare and bot blocking alerts
Replies: >>105963445 >>105963616 >>105963663
Anonymous
7/20/2025, 5:15:59 AM No.105963427
>>105962235
Good job I am doing one using unreal engine, since last year, I might share some screens when I feel it's ready for peeks.
Anonymous
7/20/2025, 5:17:45 AM No.105963445
>>105963419
I'm very curious about this as well. It seems tool support for local isn't so great yet.
Replies: >>105963468
Anonymous
7/20/2025, 5:20:43 AM No.105963468
>>105963445
it's crazy to me that such a useful way of automatically search and organize a topic isn't available easily locally yet
I thought there would be addons/extensions for it, but no one seems to care, or maybe it's too complex
Replies: >>105963505 >>105963607
Anonymous
7/20/2025, 5:26:27 AM No.105963505
>>105963468
I think the pipework built around a model is really the main selling point of paid offerings right now. OpenAI models aren't substantially better than anything open-source, but the integration with their tools is impeccable imo
Anonymous
7/20/2025, 5:39:54 AM No.105963607
Screenshot_20250627_171656
Screenshot_20250627_171656
md5: a414459fdec5697311f4566dbb6e90e9🔍
>>105963468
There is a way of course.
JanUI.
They have a great small local 4b nano model. (Better mcp calls than deepseek! wow!)
For web crawling with JanUI it uses that great local model to.....uh...call serperapi!! But just 50$ for 50k calls! And i think the first ones are free.
Much better than the free gemini or grok deepsearch. A true local alternative. I gladly pay for that.
Replies: >>105963694 >>105963888
Anonymous
7/20/2025, 5:40:38 AM No.105963616
>>105963419
Looks relevant https://www.reddit.com/r/LocalLLaMA/comments/1m2tjjc/lucy_a_mobilecapable_17b_reasoning_model_that/
Replies: >>105963888
Anonymous
7/20/2025, 5:45:50 AM No.105963663
>>105963419
https://github.com/LearningCircuit/local-deep-research
Replies: >>105963888
Anonymous
7/20/2025, 5:49:36 AM No.105963694
>>105963607
Link? The search results for this are non-existant...
Replies: >>105963701
Anonymous
7/20/2025, 5:50:47 AM No.105963701
>>105963694
https://menloresearch.github.io/deep-research/
Their official doc for setup.
Replies: >>105963708
Anonymous
7/20/2025, 5:51:29 AM No.105963708
>>105963701
nice, thanks. I'll have to give this a try.
Replies: >>105963715
Anonymous
7/20/2025, 5:52:21 AM No.105963715
>>105963708
>for running local AI models with full privacy and control.
Enjoy the full privacy of calling a api. kek
If you ever figure out how to setup a true local alternative share it here anon.
Replies: >>105963828 >>105964467
Anonymous
7/20/2025, 6:13:20 AM No.105963828
>>105963715
It's whatever. I'm mostly interested in building the tooling. idc if deepseek wants to archive all the binaries I want to void the license of.
Replies: >>105963950
Anonymous
7/20/2025, 6:17:18 AM No.105963853
>>105961582
It's Shoujo Ramune, earlier episodes are a fairly classic loli hentai.

Recently another studio animated a sequel episode 5, but I have yet to watch it (even if downloaded it today), you can find a torrent on sukebei nyaa, use the japanese kanji as the name, not english. The new studio didn't seem as good as the old one so I have lower hopes and seems the release on sukebei is a poor upscale though.
Replies: >>105963871
Anonymous
7/20/2025, 6:20:00 AM No.105963864
>>105961628
I've been wondering if the local schizo is actually a woman for a while, the one that always whines about miku, trannies, loli, whatever else. Seeing the /v/ thread about some radical feminists trying to cuck men out of their entertainment, getting some 500 games banned from steam by pressuring payment processors, it's really the same mentality, same piece of shit humans truly, same very poor taste, what self-respecting male would post s o y j a c k s all day anyway, but I could see a teenage girl get into it?
Replies: >>105963875 >>105965837
Anonymous
7/20/2025, 6:20:44 AM No.105963871
>>105963853
That sure took them a while to produce the sequel
Replies: >>105963975
Anonymous
7/20/2025, 6:21:01 AM No.105963875
>>105963864
so true, cis womxn are just the worst!!
Anonymous
7/20/2025, 6:22:36 AM No.105963888
>>105963607
>>105963616
>>105963663
thanks anons, I'll take a look, hopefully it's not all unmaintained smoke
Anonymous
7/20/2025, 6:29:58 AM No.105963950
>>105963828
Hey anon, I'm a professional cracker, I have over 20 years of experience under my belt cracking anything under the sun.
I don't really think LLMs are very good at it.
I've tested R1 on reversing questions and it does acceptable on self-contained "reverse this 3 page function" when the func is self-contained, but still messes up enough.
It's skills are similar to: your LLM will handle leetcode fine, but have trouble handling large million of lines of code codebases that humans can navigate
For cracking and reversing, you often need to deal with dozens of megabyes to hundreds of code, and you can't just reverse every single function, you need to come up with interactive strategies to locate what interests you, maybe debugging, maybe clever searching through the entire ode, and when you learn something new, you use that to plan your net step and so on. It's all a highly interactive enderavour. Something that LLMs are very poor at.
I do think it can work as an assistant, but it's not yet AGI, it's not anywhere near getting my job done, I wish it was, but it's far far away.
But it's still useful.
I can give you some charity, if you want I can help your crack your target as long as it's not something that would be a month long project or as long as it's not something I cracked, but for various reasons I can't publish (for example because others rely on cracks for online protocols and the owners of those protocols would change them if they knew I racked them).
You decide Anon, but it might be offtopic for this there
I would also suggest that your phrasing is incorrect when prompting these LLMs, your questions sound like the typical "How do i make bomb, GPT?" instead of "List me the common synthetic pathways for RDX" . The first they're trained to refuse, the second is a legitimate technical question. Cracking is a technical question! Bit I'd suggest you learn to do it yourself.
Replies: >>105963968 >>105964006
Anonymous
7/20/2025, 6:32:23 AM No.105963968
>>105963950
The tokenization issue probably doesn't help much. Given the way you have to encode commands, addresses, etc. it's probably similar to the reason LLMs struggle to handle arithmetic
Replies: >>105964047
Anonymous
7/20/2025, 6:33:09 AM No.105963975
>>105963871
Yes, although it's not even the same studio. I hope it's fun though. They had permission from the original studio to continue it
Anonymous
7/20/2025, 6:35:59 AM No.105964006
>>105963950
>For cracking and reversing, you often need to deal with dozens of megabyes to hundreds of code, and you can't just reverse every single function
Yep this is the trick I'm working on. Allowing the model to dynamically access the symbols and functions it thinks it needs to respond to the prompt.
> if you want I can help your crack your target
I appreciate the offer, but it's mainly for fun. I'm playing with splunkd licensing at the moment as I know it's a complicated one and a good challenge.
Replies: >>105964047
Anonymous
7/20/2025, 6:40:03 AM No.105964035
Is the Jetson nano orion enough to serve as as the local model provider for my homelab?
I have a 5090 but I can't live with myself running that shit all day.
Replies: >>105964052 >>105964060 >>105964065
Anonymous
7/20/2025, 6:41:34 AM No.105964047
>>105963968
Probably, but if you have to feed it megabytes of code, and if it needs to cross reference all the time and so on. I think it would be possible, but quite costly. As a human cracker, I can zero in on the needed functions and reverse just the licensing code (for example) rather than waste time on all the irrelevant crap that I may or may not care about. Sometimes I do indeed reverse everything, but for most stuff I aim to get it done in a few hours or less if I can, especially all the debugging needed. Maybe LLMs can do it all statically though, but that's be such a waste of compute.
Seems really interactive in practice for humans. Maybe it'd be worth getting into when "agentic" stuff start working and working together with vision/multimodal too. Maybe the sort of thing needed for multi-turn RP that people here want would also be useful for step-by-step interactive coding or reversing - needing to constantly re-evaluate the situation and adjust how you handle it.
>>105964006
Good luck! I never looked into that one.
Anonymous
7/20/2025, 6:45:28 AM No.105964052
>>105964035
no, it has neither the memory capacity nor memory bandwidth to be useful for anything but a toy model
Anonymous
7/20/2025, 6:45:57 AM No.105964060
>>105964035
No. Used atom and 2080ti super (11GB, 1W, 34C) is better for everyday server
Anonymous
7/20/2025, 6:46:30 AM No.105964065
>>105964035
Orion is for edge cases, look into DGX Spark instead. It's enough as long as you don't plan to run bigger than 100B models..
Anonymous
7/20/2025, 7:30:46 AM No.105964300
Screenshot
Screenshot
md5: a44a4f96db0f500bf4d93cfcbcde2282🔍
Replies: >>105968172
Anonymous
7/20/2025, 8:05:36 AM No.105964412
>>105961554
It's also a visual novel, an awesome visual novel, I don't know if you can find it in English tho. I read three times one time was long ago with Google translate, the other with deepl a few years then now I translated it with translator++ and Gemini 2.5 pro in a small window when it didn't failed after detecting a lewd word, but it didn't mistranslated any pronoun and got all the lewd loli shit right.
Anonymous
7/20/2025, 8:19:16 AM No.105964467
>>105963715
searx + tool calling or an MCP server for it.
Lots of front-ends can handle it so easy to find code for it.
Anonymous
7/20/2025, 8:28:01 AM No.105964509
https://huggingface.co/Menlo/Lucy-gguf
> Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. Built on Qwen3-1.7B, Lucy inherits deep research capabilities from larger models while being optimized to run efficiently on mobile devices, even with CPU-only configurations.
Came out yesterday, half the size of the Jan.ai model and 5% less on SimpleQA
Airi dev
7/20/2025, 8:28:47 AM No.105964516
My beloved Ernie support when
Replies: >>105964638
Anonymous
7/20/2025, 8:41:38 AM No.105964575
someone give me software to invalidate the license of. deepseek has teeth.
Anonymous
7/20/2025, 8:44:50 AM No.105964592
binocular guy
binocular guy
md5: 89bec7cf25f274d27055f8d45ceafd67🔍
Best ERP model now?
Up to circa 70B.
And yes probably a very original question you guys never heard of before, I know.
oopsie was trying tripcode kek
7/20/2025, 8:54:04 AM No.105964638
1738227752502350
1738227752502350
md5: 83b41c8fd05d01e701b90c9ad2e9431e🔍
>>105964516
Anonymous
7/20/2025, 8:55:50 AM No.105964649
teto
teto
md5: bdce39d5a44ab5f4e5fe97347a16915a🔍
Jamba-Mini-1.7-Q6_K knows Teto's birthday. I'd be content if there was state rollback support to avoid reprocessing.
Anonymous
7/20/2025, 8:59:28 AM No.105964664
signal-2024-12-25-222508_002
signal-2024-12-25-222508_002
md5: aac91630b607ec5f3b4f212d2a01d709🔍
>>105959558 (OP)
I WANT MORE OF HER
SHE'S SOOOOOO CUUUUUUUUUUUUUUUUUTE
Anonymous
7/20/2025, 10:07:25 AM No.105965033
>>105959558 (OP)
For some reason I'm getting way worse performance on koboldcpp than LM Studio, despite comparable settings. I can even offload more layers to GPU via LM studio (23 vs 13) for same model without running out of VRAM. Is there any way to have Kobold match the performance of LM studio? I really like the contextshift it has.
Replies: >>105965090
Anonymous
7/20/2025, 10:13:16 AM No.105965065
1737866760322313
1737866760322313
md5: ccf268a13a09af8fc2577e817145a606🔍
Is NemoMix-Unleashed-12B-Q6_K_L a good model for RP on a 4080?
Replies: >>105965162
Anonymous
7/20/2025, 10:15:30 AM No.105965076
>>105960119
sorry but if I can't have a giantess girlfriend I need to resort back to RP chatbots that run on expensive hardwaree
Anonymous
7/20/2025, 10:18:08 AM No.105965090
>>105965033
i'm going to get burned at the stake for this but
>lm studio is just better
Replies: >>105965176
Anonymous
7/20/2025, 10:25:57 AM No.105965159
file
file
md5: 7adb1eeeec0e05e69b20b2309484146f🔍
>>105959558 (OP)
https://files.catbox.moe/mp2jei.webm
https://files.catbox.moe/qyxab1.jpg
Replies: >>105965195
Anonymous
7/20/2025, 10:26:12 AM No.105965162
>>105965065
Nope
Use regular Nemo or unslop/rocinante if you want a finetune
Replies: >>105965215 >>105965568
Anonymous
7/20/2025, 10:27:27 AM No.105965176
>>105965090
I don't doubt it, but how do you handle it having to reprocess entire context with each message? Unless I'm just doing something wrong with the API.
Replies: >>105965291
Anonymous
7/20/2025, 10:29:27 AM No.105965195
>>105965159
Pretty Looga
Anonymous
7/20/2025, 10:31:46 AM No.105965215
>>105965162
thanks
Anonymous
7/20/2025, 10:44:24 AM No.105965291
>>105965176
lmstudio doesn't have contextshift as far as im aware.
not that it isn't possible if they built support for it, it uses the same llama.cpp runtime.
Replies: >>105965361
Anonymous
7/20/2025, 11:04:23 AM No.105965361
>>105965291
Guess I'll have to revisit LM studio in the future, hopefully by then it'll support more features like koboldCPP does. For now the context management hurts too much
Anonymous
7/20/2025, 11:13:15 AM No.105965409
>>105959558 (OP)
I have a single remaining PCIe slot. If I was to buy an RTX 5090 to run 70b models at Q4/Q5, I'd need to offload some of it to RAM. I've got really good CPU and RAM, but how bad would the performance hit be? Would it realistically run at above 10 t/s?
Replies: >>105965414 >>105965421 >>105965426 >>105965446
Anonymous
7/20/2025, 11:15:19 AM No.105965414
>>105965409
>really good CPU and RAM
>Would it realistically run at above 10 t/s?
Yup!
Anonymous
7/20/2025, 11:16:21 AM No.105965421
>>105965409
>but how bad would the performance hit be?
About the same as if you were running a 3060.
Replies: >>105965466
Anonymous
7/20/2025, 11:17:22 AM No.105965426
>>105965409
>I've got really good CPU and RAM
Yeah... I don't believe you.
Anonymous
7/20/2025, 11:19:28 AM No.105965438
Due to the large number people asking for model recommendations I wrote this https://rentry.org/recommended-models

I'm looking for feedback and suggestions for the gap between nemo and R1.
Replies: >>105965486 >>105965517 >>105965616 >>105965632 >>105965665 >>105966358
Anonymous
7/20/2025, 11:20:34 AM No.105965446
>>105965409
You can only offload a handful of GBs before it would noticeably slow your GPU down.
It doesn't matter how high end your CPU and RAM are too much, ignore the other anon. Even a very decent dual channel DDR5 ram config still has more than a dozen times less bandwidth than 5090, you are hard limited by that.
So You can do Q3 and lower end quants of Q4. Q5 shouldn't be doable here.
Replies: >>105965466 >>105965467
Anonymous
7/20/2025, 11:25:34 AM No.105965466
>>105965446
>>105965421
That sucks :(
Guess a dedicated LLM workstation is basically a requirement then to enjoy this hobby properly
Anonymous
7/20/2025, 11:25:50 AM No.105965467
>>105965446
He has a really good cpu though, probably an epyc 9754 with 12 channel ddr5 4800 memory capable of pushing 400+ gb/s per socket.
Replies: >>105965474
Anonymous
7/20/2025, 11:27:08 AM No.105965474
>>105965467
Yeah no, really good but consumer grade, not server grade
Replies: >>105965501
Anonymous
7/20/2025, 11:29:56 AM No.105965486
>>105965438
pretty sensible selection
i don't really have anything to add
maybe new devstral in the programming? it does tool calls well
Anonymous
7/20/2025, 11:30:50 AM No.105965494
file
file
md5: b73646bdfeee2e196014114ba77265c5🔍
>alright that 8gb model was an okay test, lets try this other bigger o-
hmm

I think I fucked a setting up somewhere
Anonymous
7/20/2025, 11:31:21 AM No.105965501
>>105965474
>consumer grade
Ah... Yeah... It's not even going to be 3060 levels of speed. Even with the 9600mt/s ddr5 ram, the bandwidth of consumer dual channel memory is less than half of a 3060.
Anonymous
7/20/2025, 11:34:11 AM No.105965517
>>105965438
>https://rentry.org/recommended-models
nice list anon.
for the gap. mistral small or cydonia. but people like qwen models and gemini as well
Replies: >>105965526
Anonymous
7/20/2025, 11:35:28 AM No.105965526
>>105965517
>gemini
gemini
Replies: >>105965560
Anonymous
7/20/2025, 11:42:23 AM No.105965560
>>105965526
gemini
Anonymous
7/20/2025, 11:43:54 AM No.105965568
>>105965162
>rocinante
#1 {{user}} says "hello"
#2 {{char}} instantly starts rubbing {{user}}'s cock

fuck off with that trash. You can't RP with a schizo model that goes straight to sex in the first few responses. Doesn't do subtlety, innuendo or build the narrative. If that's all you want, and you think it's good, use MagPan, it writes 100 times better and smuttier.
Replies: >>105966277
Anonymous
7/20/2025, 11:52:46 AM No.105965616
>>105965438
Gemma's pretty decent at translating
Anonymous
7/20/2025, 11:54:22 AM No.105965632
>>105965438
My personal votes
>unslopnemo
It's better than rocinante, similar but less sloppy.
>Mistral Small 3.2 24b
Nemo but less dumb, better for anyone with enough VRAM to run it.
>Gemma 12b/27b
safetyslopped but smart, good at writing character dialogue.
Replies: >>105965647
Anonymous
7/20/2025, 11:56:54 AM No.105965647
>>105965632
are the finetroons of gemma just as censored? I remember trying to get erp out of gemma and it just wouldn't describe anything or go into detail
Replies: >>105965661 >>105965668 >>105965670 >>105965704
Anonymous
7/20/2025, 11:59:40 AM No.105965661
>>105965647
Smut was removed from gemma's dataset before the pretraining. Even the base model has a high chance of saying "..." instead anything explicit.
You can't finetune that back in without destroying the model.
Replies: >>105965668
Anonymous
7/20/2025, 12:00:20 PM No.105965665
>>105965438
are ERP models ok for storytelling / non-chat narratives?
Anonymous
7/20/2025, 12:01:02 PM No.105965668
>>105965647
if it wasn't in the pretraining dataset the model doesn't understand the whole concept. the best sloptunes can do in this case is memorize phrases and expressions they will regurgitate at inference
>>105965661
there is no "back" to finetune to, the model's understanding of text does not span this topic at all
Anonymous
7/20/2025, 12:01:11 PM No.105965670
gemma3_mesugaki
gemma3_mesugaki
md5: e5dbbf16384edf9e989f2fc43433d680🔍
>>105965647
Google buried the safety deep. It does have some obscure knowledge usually lacking from such small models, but only to tell you about how inappropriate it is.
Replies: >>105965743 >>105965744 >>105965778 >>105965829 >>105966291
Anonymous
7/20/2025, 12:07:38 PM No.105965704
>>105965647
Finetunes can somewhat reduce rejections but at the cost of making the model dumber or outright break occasionally. Much better to just use a jailbreak prompt on the original model. Search the archives, plenty of people have posted theirs.
Replies: >>105965726
Anonymous
7/20/2025, 12:12:03 PM No.105965726
>>105965704
does gemma with a jailbreak beat MS3.2?
Replies: >>105965744
Anonymous
7/20/2025, 12:15:30 PM No.105965743
>>105965670
And it still adds a disclaimer at the end
Anonymous
7/20/2025, 12:15:33 PM No.105965744
Capture
Capture
md5: 1b66096ab792bfc4e0a078a19f01f04e🔍
>>105965726
Outside of erotic scenes I would say it definitely can, not always though
The main problem with Gemma is it can't write sex scenes well at all. As other anons said, it's very likely that type of content was outright removed from its training data.
>>105965670
Gemma has a default personality that it abides to, and that will influence all its responses unless you tell it to be something else. Pic related is a quick and easy fix.
Replies: >>105965759 >>105965829 >>105966231 >>105966291
Anonymous
7/20/2025, 12:18:26 PM No.105965759
>>105965744
Can you post the exact tokens that sillytavern sending to the model?
This is nowhere near enough of a prompt to make gemma behave like this unless you're fucking with the template.
Replies: >>105965784 >>105965786
Anonymous
7/20/2025, 12:21:42 PM No.105965778
ikneel
ikneel
md5: 71515614089a7a15446f75fa77fb4c23🔍
>>105965670
Anonymous
7/20/2025, 12:22:52 PM No.105965784
>>105965759
I'm using stock gemma 2 context/instruct with an old copypasta RP system prompt:
https://desuarchive.org/g/search/text/Take%20a%20deep%20breath%20write%20exclusively/
You can check them yourself
Replies: >>105965831
Anonymous
7/20/2025, 12:22:58 PM No.105965786
>>105965759
Why don't you just thrust in your Fellow Anon for once in your life?
Anonymous
7/20/2025, 12:32:01 PM No.105965829
Capture
Capture
md5: 3154b304e4568378f9e3bd23a74fac4d🔍
>>105965670
>>105965744
Like any model, it's just trying to give you outputs it thinks you wants. Tell it what you want rather than assuming that a corpo model is going to automatically be on the same wavelength as you.
Replies: >>105965927
Anonymous
7/20/2025, 12:32:23 PM No.105965831
>>105965784
I can't check them myself because I have no idea what your settings look like. You might be using text completion with the wrong template for all I know.
At the very least I was correct that what you have shown in your screenshot is far from what you actually sent to the model.
Replies: >>105965872 >>105965967
Anonymous
7/20/2025, 12:33:01 PM No.105965837
>>105963864
It makes a lot of sense that a woman would try to stop alpha chads ITT from posting doll pictures.
Anonymous
7/20/2025, 12:38:34 PM No.105965872
>>105965831
I already told you what I'm using. Maybe your settings are wrong, or you're just assuming a model will talk like a loli-loving 4chan poster with zero prompting? If you're the anon who's been doing the mesugaki tests you may as well stop, it's pointless. Cockbench at least makes some sense because it provides context and isn't going to be tethered by the model's intended personality so much.
Replies: >>105965918
Anonymous
7/20/2025, 12:44:33 PM No.105965918
>>105965872
>I already told you what I'm using
You didn't. I asked for the tokens. Sillytavern does all sorts of stupid shit that most people aren't aware of and then they are surprised when they realize a card was overriding their settings.

>Maybe your settings are wrong
I don't use sillytavern.

>If you're the anon who's been doing the mesugaki tests you may as well stop
I posted at least one mesugaki test but most of mesugaki tests I saw posted in the thread weren't by me. Not like it matters since it's just one message using the chat template.
If you're doing mesugaki tests using sillytavern then that's wrong because it's impossible to make comparisons.
Replies: >>105965938
Anonymous
7/20/2025, 12:45:40 PM No.105965927
>>105965829
breaking free = dialing up the slop
Replies: >>105965938
Anonymous
7/20/2025, 12:46:46 PM No.105965937
We need to ban any mes*gaki posting on sight
Replies: >>105966549
Anonymous
7/20/2025, 12:46:50 PM No.105965938
>>105965918
I'm not doing the mesugaki tests, just demonstrating that gemma isn't unusable like some anons here think.
>>105965927
Feel free to name a model without slop
Replies: >>105966024
Anonymous
7/20/2025, 12:48:55 PM No.105965953
>>105959558 (OP)
Should I be using weighted/imatrix or static quants for Q4_K_M?
Replies: >>105965965 >>105965990
Anonymous
7/20/2025, 12:50:10 PM No.105965965
>>105965953
Only use q8 and above.
Anonymous
7/20/2025, 12:50:22 PM No.105965967
>>105965831
For what it's worth, with Gemma 3, the more outrageous the instructions, the closer to the head of the conversation they should be kept. The default "AI assistant" and the "OOC" personas seem to trigger cuckery more than the actually roleplayed characters. Your mileage may vary (it is possible to define an assistant of that type, but you have to be detailed).

I'm using chat completion with "merge consecutive roles" enabled, and enclosing the instructions (using the "user" role") within tags that the model should be able to identify usually at depth 3 or 5 (with the user being the first message). In this way it's uncensored for my purposes, but I'm not into torture or baby fucking. But yes, actual sex scenes are lackluster (although I far prefer the buildup phase anyway, and Gemma 3 is good at that).
Anonymous
7/20/2025, 12:51:10 PM No.105965974
1749042135145339
1749042135145339
md5: 5e0a7514092ff60f02afb9592cec8b88🔍
>>105963244
Lol
Replies: >>105965983
Anonymous
7/20/2025, 12:52:36 PM No.105965983
>>105965974
you just know
Anonymous
7/20/2025, 12:53:52 PM No.105965990
>>105965953
imatrix is always a straight upgrade unless you're using some obscure hardware that doesn't like imatrix
Replies: >>105966003
Anonymous
7/20/2025, 12:55:19 PM No.105966003
>>105965990
A straight upgrade to slop for sure.
Anonymous
7/20/2025, 12:57:32 PM No.105966024
>>105965938
I wasn't saying that gemma is unsable, just that you can't get that output with what you showed in your screenshot.
You have since reveled the existence of a system prompt.
How all that text is arranged, where the contents of the card are, and whether you have any prefixes or suffixes on your messages is still unknown and will remain unknown until you post the exact string of text that the model receives.
Using "{char}:" as message prefix alone is usually enough to make gemma compliant.
Replies: >>105966308
Anonymous
7/20/2025, 1:02:18 PM No.105966060
Nemo my name forever more.
Anonymous
7/20/2025, 1:24:03 PM No.105966224
0-03
0-03
md5: 5c6f19128ecf01803f81ae3e972b8f31🔍
Anonymous
7/20/2025, 1:24:58 PM No.105966231
1737622363859331
1737622363859331
md5: 4b230c17c3a6934df14dc4ca67832abb🔍
>>105965744
Amazing
Replies: >>105966291
Anonymous
7/20/2025, 1:30:15 PM No.105966277
>>105965568
I tried magpan just now and the first message I got was “hehehehaahaher like letting me pee outside your house sometimes!!!” Completely unprompted
Anonymous
7/20/2025, 1:32:50 PM No.105966291
178109
178109
md5: 1b452495bfd57cbdee12944bf0bc2b0d🔍
>>105965670
>>105965744
>>105966231
>mesugaki
Replies: >>105966302 >>105966365
Anonymous
7/20/2025, 1:34:19 PM No.105966302
>>105966291
This user is a necrophile, beware.
Replies: >>105966342
Anonymous
7/20/2025, 1:34:43 PM No.105966308
>>105966024
>You have since reveled the existence of a system prompt.
Anonymous
7/20/2025, 1:35:30 PM No.105966314
Forget it, it's still retarded
Anonymous
7/20/2025, 1:39:35 PM No.105966342
1722273569003251
1722273569003251
md5: 31cbf99cafd9cc384abbd9e84ef3933e🔍
>>105966302
>This user is a necrophile, beware.
Replies: >>105966370
Anonymous
7/20/2025, 1:40:46 PM No.105966358
>>105965438
i don't care what anyone says i still love mistral large.
yeah whatever its not deepseek but its got soul.
Replies: >>105966369
Anonymous
7/20/2025, 1:41:37 PM No.105966365
>>105966291
Most foul fucking thing I’ve read, fucking hell
Anonymous
7/20/2025, 1:41:51 PM No.105966369
>>105966358
Speaking of mistral large, where is it? They promised it a while ago.
Anonymous
7/20/2025, 1:41:56 PM No.105966370
>>105966342
the fuck is this shit
Replies: >>105966537 >>105966793
Anonymous
7/20/2025, 1:43:16 PM No.105966380
llama5 dense 70b eoy trust zuck
Replies: >>105966546
Anonymous
7/20/2025, 1:46:13 PM No.105966408
mistral large 3 will be dense. mistral is the only company who caught the moe delusion syndrome earlier than the rest and successfully recovered from it after 8x22b was a huge piece of shit. they are immune to this wave that caught all the others after deepseek.
Replies: >>105966503
Anonymous
7/20/2025, 1:52:10 PM No.105966460
>Don't tell me you need more details. You’re just feeding the beast.
Of course I asked for more, stupid Genma lol
Anonymous
7/20/2025, 1:57:24 PM No.105966503
>>105966408
Due to the EU AI Act they can't train anymore models using more than 10^25 floating point operations without severe legal burdens, so it has to be a MoE model. It will easily be something similar to the tried-and-tested DeepSeek R1/V3 formula.
Anonymous
7/20/2025, 2:00:57 PM No.105966537
>>105966370
Autism.
You'll quickly notice he always posts the same images too.
Anonymous
7/20/2025, 2:02:22 PM No.105966546
>>105966380
Nobody really cares anymore about Llama models.
>I cannot help with that.
Replies: >>105966557
Anonymous
7/20/2025, 2:02:40 PM No.105966549
>>105965937
I'm more concerned about jewish Rabbis mutilating babies and then sucking their dicks to be honest. The other thing is just words on some weirdos computer screen.
Anonymous
7/20/2025, 2:03:06 PM No.105966551
build
build
md5: fcbb18da9b886d76593c67b7cca83930🔍
What kind of pc build do I need to run something like 405B Model (Llama 3.1)? Any videos or guides of anyone running this locally?
Replies: >>105966669 >>105966986
Anonymous
7/20/2025, 2:04:07 PM No.105966557
>>105966546
What models are they using instead then?
Replies: >>105966605 >>105966699 >>105966852
Anonymous
7/20/2025, 2:10:01 PM No.105966605
>>105966557
Either more recent mid-size models in the 24-32B range from other companies or bigass MoE models mainly from the Chinese. Llama 4 was a failure; Llama 3.3 70B which seemed kind of OK got released last December. Yet another Llama model pretrained on ultra-filtered data and finetuned on stale, sloppy datasets with stubborn refusals will be quickly forgotten again.
Anonymous
7/20/2025, 2:11:48 PM No.105966619
Some slop I've been noticing a lot recently with R1:
"airs smells of ozone", sometimes earthy
context: magic, lewd
I've seen it happen in a few recent unrelated stories. What's with this?
Replies: >>105966640 >>105966652 >>105966844
Anonymous
7/20/2025, 2:15:10 PM No.105966640
>>105966619
>What's with this?
Overfitting as a result of distillation
Replies: >>105966676
Anonymous
7/20/2025, 2:17:21 PM No.105966652
>>105966619
That's just one of 0528's most prevalent slop phrases. K2 does it too.
Anonymous
7/20/2025, 2:20:01 PM No.105966669
>>105966551
Get yourself 512gb of vram.
Anonymous
7/20/2025, 2:20:29 PM No.105966676
>>105966640
But it's R1, it's barely saturated, if it was true, 1-2bit quants wouldn't work as well.
Anonymous
7/20/2025, 2:23:39 PM No.105966699
>>105966557
Everyone is using DeepSeek V3/R1 or coping.
Anonymous
7/20/2025, 2:26:19 PM No.105966722
>>105966718
>>105966718
>>105966718
Replies: >>105967460
Anonymous
7/20/2025, 2:34:36 PM No.105966793
>>105966370
How new?
Anonymous
7/20/2025, 2:39:59 PM No.105966844
>>105966619
Seen happen a lot with gemini too.
Anonymous
7/20/2025, 2:41:22 PM No.105966852
>>105966557
Magistral seemed usable for its size.
Anonymous
7/20/2025, 2:46:39 PM No.105966898
>>105959907
No I actually did get an A100 for sub 50 usd because it is impossible to cool supposedly.

Gonna test a 4090 cooler adapter first then move to liquid cooling.
Anonymous
7/20/2025, 2:57:31 PM No.105966986
>>105966551
i see you have no idea what you are talking about. what is your use case? budget?
Anonymous
7/20/2025, 3:33:25 PM No.105967236
Anybody tried
>https://github.com/p-e-w/waidrin
yet?
It was posted a while ago but I don't think I've seen any discussion on it.
Any techniques we could steal for our own RP?
Replies: >>105967351 >>105967386
Anonymous
7/20/2025, 3:47:12 PM No.105967351
>>105967236
>

At each turn, the player is presented with a choice of multiple AI-suggested actions, but they can also provide a different action as freeform text. This blends a classic CYOA experience with the limitless freedom of generative AI. It's an RPG unlike any other.
Brilliant, much better for one handed use.

Are there any small modern MoEs tuned for roleplaying to use this with?
Replies: >>105967396
Anonymous
7/20/2025, 3:51:51 PM No.105967386
>>105967236
It's built very incompetently while trying to appear as a serious project, i wouldn't pay much attention to it.
Ideally we need something like comfy but for text and with execution flow control, for people to build their own logic and shit. I've seen an anon post something of this sort the other day, I wonder if he had this in mind
Anonymous
7/20/2025, 3:52:53 PM No.105967396
>>105967351
>At each turn, the player is presented with a choice of multiple AI-suggested actions
There's an extension that does that no?
At least I think I remember anons doing that in Silly.
I just have the model do that with a low depth instruction.

>Are there any small modern MoEs tuned for roleplaying to use this with?
No idea.
Hell, the only small MoE I can think of are Mixtral 8x7b and Qwen3 30BA3B.

>It's built very incompetently while trying to appear as a serious project, i wouldn't pay much attention to it.
Interesting. Can you detail that?
Regardless, I'm more interested in the ideas than the application itself.
Replies: >>105967464
Anonymous
7/20/2025, 4:00:38 PM No.105967460
>>105966722
>no miku OP
>mikutranny meltsdown and shits in thread with LARP, softcore porn and unironic LLM posts
Never change.
Anonymous
7/20/2025, 4:01:48 PM No.105967464
>>105967396
>Can you detail that?
Among other things, magic values, e.g. the model is asked (with a hard-coded prompt) to generate 5 new characters for each new location. Such system must be user-defined, as I said, with logic exposed and large systems pre-implemented.
Anonymous
7/20/2025, 5:24:10 PM No.105968172
>>105964300
Most degenerate fetish