/lmg/ - Local Models General - /g/ (#105959558) [Archived: 297 hours ago]

Anonymous

7/19/2025, 9:13:31 PM No.105959558

md5: 6a31b96f7d13294ad8e7e8488e58f8df🔍

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105952992 & >>105947940

►News
>(07/18) OpenReasoning-Nemotron released: https://hf.co/blog/nvidia/openreasoning-nemotron
>(07/17) Seed-X translation models released: https://hf.co/collections/ByteDance-Seed/seed-x-6878753f2858bc17afa78543
>(07/17) Support for Ernie 4.5 MoE merged: https://github.com/ggml-org/llama.cpp/pull/14658
>(07/16) Support diffusion models: Add Dream 7B merged: https://github.com/ggml-org/llama.cpp/pull/14644
>(07/15) Support for Kimi-K2 merged: https://github.com/ggml-org/llama.cpp/pull/14654

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Replies: >>105959864 >>105959897 >>105959898 >>105959900 >>105961290 >>105961674 >>105962666 >>105963150 >>105963176 >>105964664 >>105965033 >>105965159 >>105965409 >>105965953

Anonymous

7/19/2025, 9:13:50 PM No.105959561

Gr21lLTWUAAcp-e

md5: 5301511148f51c10d75940d3583e8ec7🔍

►Recent Highlights from the Previous Thread: >>105952992

--Paper: Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential:
>105956573 >105956614 >105956632
--General-purpose LLM shows progress toward AGI through long-sequence reasoning and reinforcement learning:
>105957188 >105957237 >105957239 >105957255
--Debate over Yann LeCun's role and critique of LLMs within Meta's Superintelligence team:
>105957945 >105957980 >105958082 >105958115 >105958149 >105958214 >105958276 >105958291 >105958157 >105958166
--State-tracking limitations of S4 and Mamba despite recurrent architecture:
>105955149 >105955182 >105955193
--Kimi K2 beats Gemini in Cline diff edit failure rate:
>105954690 >105954701
--Character card design considerations and model-specific adaptation in roleplay bots:
>105953191 >105953341 >105953401 >105953426 >105953438 >105953473
--Configuring secondary models like Phi-2 for summarization in SillyTavern with KoboldCPP:
>105955334 >105955343 >105955381 >105955415 >105955427 >105955516 >105955554 >105955447
--Industry shift toward MoE models due to superior scalability and performance over dense architectures:
>105953517 >105953533 >105953543 >105953607 >105953622 >105953643
--Debating the limits and capabilities of LLMs versus human brains:
>105954692 >105954712 >105954734 >105954732 >105954757 >105954776 >105955145
--Frustration with ineffective story generation despite long-context character cards and model switching:
>105956330 >105956483 >105957182 >105957214 >105957717 >105957791 >105957293
--Local image-to-video animation tools and hardware requirements discussed:
>105955139 >105955180 >105956675 >105958160 >105958173 >105958177 >105958333
--OpenAI experimental LLM achieves gold medal-level math reasoning at IMO:
>105954767
--Miku (free space):
>105953587 >105956785 >105957091 >105958830

►Recent Highlight Posts from the Previous Thread: >>105953000

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script

Replies: >>105959864

Anonymous

7/19/2025, 9:16:29 PM No.105959586

seed

Anonymous

7/19/2025, 9:17:16 PM No.105959598

download

md5: 7c51efefe953e28c1bf3be3234c0a61d🔍

first for kimi

Anonymous

7/19/2025, 9:19:52 PM No.105959617

bwoah

Anonymous

7/19/2025, 9:21:23 PM No.105959629

>>105959612
>is kimi dev 72b really the best local model for agentic tool calling?

Did you get your desired answer?

now btfo

Replies: >>105959639

Anonymous

7/19/2025, 9:22:55 PM No.105959639

>>105959629
fight me

Replies: >>105959646

Anonymous

7/19/2025, 9:23:28 PM No.105959646

>>105959639
kiss me

Anonymous

7/19/2025, 9:24:19 PM No.105959653

LLMfags getting real uppity lately. Your autoregressive days are numbered.

Anonymous

7/19/2025, 9:28:54 PM No.105959684

https://www.phoronix.com/news/Burn-MATMUL-Kernels-CUDA
Rust won

Anonymous

7/19/2025, 9:28:58 PM No.105959686

>moefags still don't understand that a specialized dense model will outperform R1/K2
>moefags dont understand that that total params != active params

it's tragic actually

Replies: >>105959834 >>105959873

Anonymous

7/19/2025, 9:38:06 PM No.105959771

Waylon Mercy knows how to throw a good picnic, June 24, 1995

md5: 66e4aafd84da2e3e87089d235ed72b34🔍

>>105953632
I use openaudio s1 mini for voice cloning.
https://huggingface.co/spaces/fishaudio/openaudio-s1-mini
Voice clone sample of pro wrestler Waylon Mercy
https://vocaroo.com/17SOUQU9QUxq

Anonymous

7/19/2025, 9:42:02 PM No.105959802

Is reasoning a meme?

Replies: >>105959844

Anonymous

7/19/2025, 9:45:17 PM No.105959834

draw

md5: 8f4373ef904b9d1a3c121c63ad76a5df🔍

>>105959686

Anonymous

7/19/2025, 9:46:44 PM No.105959844

>>105959802
It's not a meme, but it needs to be well-structured with instructions explaining what the model should think about; don't just let it do its own thing. Or at least, this works for me with Gemma 3 (even though it seemingly wasn't designed for that) and goal-driven characters.

Anonymous

7/19/2025, 9:50:10 PM No.105959864

>>105959558 (OP)
>>105959561
for the love of god please do not tell me there is a pink miku

Replies: >>105959884 >>105959902

Anonymous

7/19/2025, 9:51:20 PM No.105959873

>>105959686
A specialized dense model can outperform a huge MoE on intelligence, but codemonkey tasks depends mostly on recall which depends on the total params.

>>105959692
First of all, finetunes do not add new knowledge. Second of all, that irrelevant shit polluting its parameters helps it generalize and makes it perform better on novel tasks.

Replies: >>105959895 >>105959917 >>105959932 >>105959955

Anonymous

7/19/2025, 9:52:12 PM No.105959884

>>105959864
LUKA LUKA NIGHT FEVER

Replies: >>105960293

Anonymous

7/19/2025, 9:53:41 PM No.105959895

>>105959873
Oh my god will you faggots shut the fuck up. Nobody cares

Anonymous

7/19/2025, 9:53:54 PM No.105959897

>>105959558 (OP)
dayuum look at that!

Anonymous

7/19/2025, 9:54:05 PM No.105959898

>>105959558 (OP)
who is she?

Anonymous

7/19/2025, 9:54:15 PM No.105959900

>>105959558 (OP)
Is that best girl, Megurine Luka, emphasizing what Miku lacks?

Anonymous

7/19/2025, 9:54:23 PM No.105959902

>>105959864
There is also Sakura Miku

Anonymous

7/19/2025, 9:54:34 PM No.105959907

>>105959745
If this isn't bait, I am beyond jealous

Replies: >>105966898

Anonymous

7/19/2025, 9:55:28 PM No.105959917

>>105959873
>finetunes do not add new knowledge
Probably one of the most false statements I've heard in a while
LoRA tunes and slop tunes don't add new knowledge, but actual finetunes are just continued training so of course they do

Replies: >>105959928 >>105960119 >>105960173 >>105961578

Anonymous

7/19/2025, 9:57:03 PM No.105959928

>>105959917
Shut. The. Fuck. Up.

Anonymous

7/19/2025, 9:57:30 PM No.105959932

>>105959873
>finetunes do not add new knowledge
that's literally exactly what they do, what are you talking about?

Replies: >>105959955

Anonymous

7/19/2025, 10:00:24 PM No.105959955

>>105959932
>>105959873 (me)
I take it back, I guess only re-training adds knowledge, while LoRa only highlights existing knowledge.

Anonymous

7/19/2025, 10:03:02 PM No.105959979

>one year since Nemo
>still nothing better at a comparable VRAM size
I hate the MoE fad so much it's unreal

Replies: >>105960014

Anonymous

7/19/2025, 10:07:40 PM No.105960014

>>105959979
That's less about MoE and more about nobody being willing (or able?) to train a model like nemo.
Imagine a MoE with some 50B A6B MoE. You could run it faster than Nemo, get more context in RAM, while performing better in theory assuming data that's at least as good as Nemo's.
Hell, GLM and Gemma 9B exist. Those are dense on ion Nemo's weight class.
Mistral didn't go the way of MoE and they themselves didn't make a Nemo 2.

Replies: >>105960038 >>105960141

Anonymous

7/19/2025, 10:10:13 PM No.105960038

>>105960014
>Mistral didn't go the way of MoE
They're keeping it to themselves (Mistral Medium).

Anonymous

7/19/2025, 10:21:28 PM No.105960109

my cat just got platinum at the international math olympiad

Anonymous

7/19/2025, 10:23:12 PM No.105960119

>>105959917
yes, just goes to show how clueless /lmg/ is. half of the people in here are moronic incels that spend huge money on tech to have sex with a chatbot.

Replies: >>105960165 >>105960203 >>105960237 >>105965076

Anonymous

7/19/2025, 10:26:38 PM No.105960141

>>105960014
Which by the sqrt law estimate gives us sqrt(6*50) = 17.3B in equal intelligence, a slight bump in performance with significantly more niche hardware needs
I think that's my issue. MoE feels more like a band-aid fix for NVIDIA's jewery and artificial stagnation of VRAM than an actual solution. The thing that'd change that would be if on-the-fly SSD loading became fast enough to be viable (which I doubt is physically possible unless you're working with tiny experts, and then the square root law bloats up the total param count to obscene levels)

Replies: >>105960182

Anonymous

7/19/2025, 10:26:39 PM No.105960142

MoE killed local

Replies: >>105960163 >>105960166

Anonymous

7/19/2025, 10:30:08 PM No.105960163

>>105960142
Yes, we'd be much better off with 600B dense models.

Anonymous

7/19/2025, 10:30:32 PM No.105960165

>>105960119
First time on the Internet?
What expectations did you have that they were so disappointed for such a rant?

Anonymous

7/19/2025, 10:30:32 PM No.105960166

>>105960142
MoE saved local, nobody was running fucking Llama 3 405b at 10+t/s, and now we got things smarter than even the best closed models in the world were a year ago at that speed.

Replies: >>105960215

Anonymous

7/19/2025, 10:31:20 PM No.105960173

>>105959917
>actual finetunes
Can you show me an example of a model like that?

Replies: >>105960185

Anonymous

7/19/2025, 10:32:20 PM No.105960182

>>105960141
>the sqrt law estimate
How accurate is that anyway?
I see anons bringing this up from time to time as if it means anything.

Anonymous

7/19/2025, 10:32:44 PM No.105960185

>>105960173
They're literally fucking everywhere, but try the Qwen coder series, Devstral, etc.

Replies: >>105960204

Anonymous

7/19/2025, 10:34:52 PM No.105960203

>>105960119
>moronic incels that spend huge money on tech to have sex with a chatbot

You will never have a gf

Replies: >>105960210

Anonymous

7/19/2025, 10:34:56 PM No.105960204

>>105960185
Hey as long as we agree that sloptuners do nothing good I am good.

Replies: >>105960232

Anonymous

7/19/2025, 10:35:34 PM No.105960210

>>105960203
nta but I use local to make smut for my gf to read

Replies: >>105960253 >>105960276

Anonymous

7/19/2025, 10:36:07 PM No.105960215

>>105960166
who's talking about 400b models you freak?

last year we had: 70b+ llama, mistral large, command r etc

now everyone is focusing on releasing benchmaxxed MoE/resoning meme models, while highly specialized 70b+ finetunes would be perfect for different use cases.

MoE literally killed local.

Replies: >>105960463

Anonymous

7/19/2025, 10:37:03 PM No.105960222

for example there's no qwen3 70b dense model

Anonymous

7/19/2025, 10:38:09 PM No.105960232

>>105960204
i want to squish drummer like a grape

Anonymous

7/19/2025, 10:39:04 PM No.105960237

>>105960119
>sex with a chatbot

Less risky and always satisfying compared to 3D adventures

Anonymous

7/19/2025, 10:39:48 PM No.105960244

Some low-hanging OSS fruit could be picked if the community had computing hours available.
Why a few tech millionaires with a lot of money and supposedly devoted to OSS have not started an initiative is puzzling.

Replies: >>105960268 >>105960277 >>105960311

Anonymous

7/19/2025, 10:40:27 PM No.105960253

>>105960210
>for my gf

she is draining your resources while giving you nothing of a value

Replies: >>105960269

Anonymous

7/19/2025, 10:42:31 PM No.105960268

>>105960244
Because closed source is more profitable, especially in the US. Even Meta is going that way now

Anonymous

7/19/2025, 10:42:33 PM No.105960269

>>105960253
if by resources you mean balls, then you are correct

Replies: >>105960353

Anonymous

7/19/2025, 10:43:30 PM No.105960276

>>105960210
>I use local to make smut for my gf to read
You are letting your model fuck your meatbag gf? Isn't there a word for it?

Replies: >>105960287

Anonymous

7/19/2025, 10:43:32 PM No.105960277

>>105960244
kimi-k2
OpenReasoning-Nemotron

None is better than R1 though

Anonymous

7/19/2025, 10:44:43 PM No.105960287

>>105960276
not exactly, it's more like customized smut of both of us

Anonymous

7/19/2025, 10:45:24 PM No.105960293

luka614736

md5: 49d2b209ce5808581e640d05a523ab21🔍

>>105959884
BASADO
ruka ruka naito fiibaa

Replies: >>105960344 >>105962275

Anonymous

7/19/2025, 10:48:11 PM No.105960311

>>105960244
>is puzzling.
I think it is the sex doll demand problem. I don't remember if someone posted it ITT but I saw an interview with some guy from a sexdoll factory and he said that the product has a cursed customer segment. Poor people can't buy it and rich people don't need it cause they just get a custom made biowhore. That leaves only middle class as intended customers. And it is the same case here.

Replies: >>105960348

Anonymous

7/19/2025, 10:49:15 PM No.105960316

I am not shitting on Luca cause I like her voice the most and she has tits. Mikutroons should die though.

Replies: >>105960346

Anonymous

7/19/2025, 10:53:25 PM No.105960344

looger

md5: 7afe39617ef175bdf91a5ed4b6cc15b7🔍

>>105960293

Replies: >>105960358

Anonymous

7/19/2025, 10:53:37 PM No.105960346

>>105960316
Best models to use as a therapist?

Replies: >>105960368 >>105960406 >>105960430

Anonymous

7/19/2025, 10:53:40 PM No.105960348

>>105960311
>a custom made biowhore
Where do I get one of those?

Anonymous

7/19/2025, 10:54:16 PM No.105960353

>>105960269

she is still better off financially in this relationship

it is you who provides accommodation, pays for food, water and electricity

Replies: >>105960382

Anonymous

7/19/2025, 10:55:17 PM No.105960358

>>105960344
>Miku blesses this thread

Anonymous

7/19/2025, 10:56:06 PM No.105960368

>>105960346
Why are you asking me?

Replies: >>105960440

Anonymous

7/19/2025, 10:57:53 PM No.105960382

>>105960353
you seem to be assuming an awful lot based on the facts you have. how do you know that she didn't buy my gpu rig for me?

Replies: >>105960560

Anonymous

7/19/2025, 11:01:08 PM No.105960406

>>105960346
ELIZA

Replies: >>105960493

Anonymous

7/19/2025, 11:03:08 PM No.105960430

>>105960346
Grok 4

Replies: >>105960493

Anonymous

7/19/2025, 11:03:26 PM No.105960433

>>105956330 (me)
>My current MGE card is 2543 tokens (not including the first message) of which 2048 are setting information and the rest are writing instructions. I just have the basics of how the world works and some proper nouns and how they relate to each other. I add things for the LLM about the world in my opening message. Rather than including information on each type of monster girl I let the LLM make stuff up. It usually gets it right and if it gets it consistently wrong I add something to the card or a chat-specific note. An approach I tried then discarded was putting monster info in a lore book since it doesn't help when the LLM introduces a monster on its own.

I find what works is rather than trying to force the model into an unfamiliar mold, find something it basically already does and shape it minimally.

Anonymous

7/19/2025, 11:03:51 PM No.105960440

>>105960368
lol just licked your post to bring up quick reply and forgot to delete teh post ref

Replies: >>105960448

Anonymous

7/19/2025, 11:04:44 PM No.105960448

>>105960440
Clean your monitor, baka.

Replies: >>105960470

Anonymous

7/19/2025, 11:06:10 PM No.105960463

>>105960215
There is zero use for the 70b tier of dense models, they run worse than better and bigger MoE models. If you're so constrained in RAM then you can run 14 or 30b tops squeezed into a contemporary gaming card at a reasonable quant and leave your tiny RAM pool free for your OS or whatever. For everyone who isn't poor, you want a big MoE that you use with "-ot exps=CPU"
It's simply a superior architecture.

Anonymous

7/19/2025, 11:07:30 PM No.105960470

>>105960448
>licked
linked, fuck I'm retarded today

Replies: >>105960476

Anonymous

7/19/2025, 11:08:37 PM No.105960476

>>105960470
are you okay anon, you clicked the wrong post again

Replies: >>105960493

Anonymous

7/19/2025, 11:10:46 PM No.105960493

>>105960406
By BF wasn't happy with emacs doctor.
>>105960430
I'm a poorfag, looking for something I can run on old gaming towers
>>105960476
Just a little hungover

Replies: >>105960506

Anonymous

7/19/2025, 11:12:20 PM No.105960505

Is anyone running Linux kernel 6.15 branch? Any advantages for cpu inference?

Anonymous

7/19/2025, 11:12:20 PM No.105960506

>>105960493
FAGGOT

Anonymous

7/19/2025, 11:14:29 PM No.105960517

questionmarkfolderimage415

md5: dbe4679c13b87c6185f50419eff47ccf🔍

Why are there moe models but not shonen models?

Anonymous

7/19/2025, 11:20:16 PM No.105960557

Has anyone experimented with something like a higher level MoE? Like running multiple models at once against the same prompt prompt and then using other models to synthesize a final response from the results?

Replies: >>105960615 >>105960637 >>105960883 >>105960950

Anonymous

7/19/2025, 11:20:21 PM No.105960559

1721642431956172

md5: e09a998a1f11becdcb9b5051928fb39a🔍

why the fuck is an anthropic researcher shilling for openai?

Replies: >>105960646

Anonymous

7/19/2025, 11:20:39 PM No.105960560

>>105960382

I know no facts

>how do you know that she didn't buy my gpu rig for me?

This abnormal level of admiration bears high risk to turn into hate one day

Replies: >>105960690

Anonymous

7/19/2025, 11:26:22 PM No.105960594

>>105959243
>it's a thinking model
Trash.

Anonymous

7/19/2025, 11:28:35 PM No.105960615

>>105960557
How would such a thing work?

Replies: >>105960639

Anonymous

7/19/2025, 11:31:45 PM No.105960637

>>105960557

>9 Women Can’t Make a Baby in a Month

Replies: >>105960649

Anonymous

7/19/2025, 11:32:04 PM No.105960639

>>105960615
It's not something I've really ironed out fully. My inspiration is the concept of a Polis as defined by Greg Egan in Diaspora, except instead of sentient code it's LLMs, and instead of fully simulated 3D environments it would have chat rooms which the agents could utilize to coordinate with others.

Anonymous

7/19/2025, 11:32:55 PM No.105960646

>>105960559
They all work for the same people.

Replies: >>105962482

Anonymous

7/19/2025, 11:33:15 PM No.105960649

>>105960637
Valid point

Anonymous

7/19/2025, 11:38:42 PM No.105960690

>>105960560
>This abnormal level of admiration bears high risk to turn into hate one day
well it's good thing I made that up then

Anonymous

7/19/2025, 11:52:30 PM No.105960802

>tranime op

>ritual posting

epic

Replies: >>105960823 >>105960833

Anonymous

7/19/2025, 11:56:00 PM No.105960823

ChatGPT Image Jul 18, 2025, 01_34_39 PM

md5: a1e8c304fcdf95671db48f15ec02bcb7🔍

>>105960802
I love you, Anon!

Replies: >>105960945

Anonymous

7/19/2025, 11:56:48 PM No.105960833

file

md5: 4491960218e839ec9cf2cb676f4faa4a🔍

>>105960802
Find the difference - you cant.

Replies: >>105961554

Anonymous

7/20/2025, 12:03:12 AM No.105960883

>>105960557
That's pretty much what speculative decoding is, there isn't anything stopping you from using full sized models as draft models.

Replies: >>105960910 >>105960951

Anonymous

7/20/2025, 12:05:52 AM No.105960910

>>105960883
>speculative decoding
Interesting technique. I hadn't heard of this, thanks for mentioning it.

Anonymous

7/20/2025, 12:11:22 AM No.105960945

>>105960823
I love you too, piss filter

Anonymous

7/20/2025, 12:12:29 AM No.105960950

>>105960557
This is a mixture of agents
https://arxiv.org/abs/2406.04692
https://docs.together.ai/docs/mixture-of-agents

Anonymous

7/20/2025, 12:12:31 AM No.105960951

>>105960883
speculative decoding works on a token level, and it necessarily always gives you exactly the same output as the model would alone without speculative decoding, just (ideally) faster

anon is suggesting something more like a multi-agent workflow or consensus sampling to hopefully get better final answers - various takes on this already exist too and used to be the main way to scale test time compute before o1 and other reasoning models came along. but typically this was done with multiple copies of the same model rather than a bunch of different ones.

Anonymous

7/20/2025, 12:13:14 AM No.105960957

glm4 100b moe will save local

Replies: >>105960970

Anonymous

7/20/2025, 12:14:34 AM No.105960970

>>105960957
You keep saying this, but I haven't even seen any evidence they're even making a 100B MoE, much less that it's coming soon.

Replies: >>105960986 >>105961596

Anonymous

7/20/2025, 12:16:07 AM No.105960986

>>105960970
>if I haven't seen evidence then it must not exist

Replies: >>105961084 >>105961627 >>105961636

Anonymous

7/20/2025, 12:26:40 AM No.105961084

>>105960986
You see, the fact that you didn't even link a friggin tweet, nevermind an arxiv paper mentioning it (which they have for all their other models on an a timeline) to easily make me look like a dildo really just proves my point here.

Replies: >>105961097 >>105961127

Anonymous

7/20/2025, 12:27:58 AM No.105961097

>>105961084
there's a pr on vllm repo

Replies: >>105961183 >>105961184

Anonymous

7/20/2025, 12:30:26 AM No.105961127

>>105961084
>I only believe tweets

Replies: >>105961184

Anonymous

7/20/2025, 12:36:10 AM No.105961183

just-testing-pls-ignore

md5: cf60a91a1b3adadb4bdc31b875a886a0🔍

>>105961097
They renamed it to THUDM/GLM-4.5 https://github.com/vllm-project/vllm/pull/20736/files
also picrel from hf discussion
>https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking/discussions/6#6871d6dde775c2dbf1c756c5

Replies: >>105961431

Anonymous

7/20/2025, 12:36:30 AM No.105961184

>>105961097
Oh shit, it is there, I was wrong.
https://github.com/vllm-project/vllm/pull/20736/commits/5e9c51344f12646d028d877b12e1789510e4828f
Glm4MoeForCausalLM": _HfExamplesInfo("THUDM/GLM-4-MoE-100B-A10B", min_transformers_version="4.54"),

10B active is pretty small, but it'll be interesting to check out

>>105961127
I was listing a tweet as the worst possible kind of evidence you obtuse cockmongler.
You're still a faggot for repeating the same shit thread after thread without any link or discussion

Replies: >>105961293 >>105961446 >>105962912

Anonymous

7/20/2025, 12:46:09 AM No.105961290

>>105959558 (OP)
who is she?

Anonymous

7/20/2025, 12:46:16 AM No.105961293

>>105961184
kneel

Replies: >>105962663

Anonymous

7/20/2025, 1:00:30 AM No.105961431

>>105961183
But will it pass the Nala test and the Mesugaki test?

Replies: >>105961539

Anonymous

7/20/2025, 1:01:57 AM No.105961446

>>105961184
shares a lot of features with deepseek arch, probably because it's just a slightly tweaked deepseek
actual small deepseek, local saved?

Anonymous

7/20/2025, 1:13:12 AM No.105961539

>>105961431
cockbench is more crucial

Replies: >>105961547

Anonymous

7/20/2025, 1:14:42 AM No.105961547

>>105961539
Ernie 300B result? When I tried it, it was absolute shit.

Anonymous

7/20/2025, 1:15:23 AM No.105961554

>>105960833
Jesus christ, isn't picrel on the left a hentai where the grade school girls go to a candy store and...

Replies: >>105961582 >>105961602 >>105962724 >>105964412

Anonymous

7/20/2025, 1:20:31 AM No.105961578

>>105959917
>but actual finetunes are just continued training so of course they do
finetuning is not the same as continued training

Anonymous

7/20/2025, 1:20:43 AM No.105961582

>>105961554
source

Replies: >>105963853

Anonymous

7/20/2025, 1:22:39 AM No.105961596

>>105960970
https://www.reddit.com/r/LocalLLaMA/comments/1lw71av/glm4_moe_incoming/

Anonymous

7/20/2025, 1:23:08 AM No.105961602

>>105961554
Yes, and it's good stuff

Replies: >>105961816

Anonymous

7/20/2025, 1:27:46 AM No.105961627

>>105960986
This, but unironically.

Anonymous

7/20/2025, 1:27:54 AM No.105961628

I had another panic attack when I remembered that there could be real women ITT.

Replies: >>105961639 >>105961677 >>105961687 >>105963864

Anonymous

7/20/2025, 1:28:55 AM No.105961636

>>105960986
I will only believe the 100B MoE is real when my dick is inside it.

Anonymous

7/20/2025, 1:29:50 AM No.105961639

>>105961628
it's actually quite likely if you think about it

Replies: >>105961673 >>105961698 >>105961700 >>105962214

Anonymous

7/20/2025, 1:33:23 AM No.105961673

>>105961639
I console myself with the thought that they are usually those dumb faggots that ask absolutely retarded questions. Then they get their answer, barely get koboldcpp running and then they start shlicking themselves to the most recent release. And they leave forever because all 2025 models are already perfect for werewolf millionare sex.

Replies: >>105961862

Anonymous

7/20/2025, 1:33:44 AM No.105961674

1732678936866681

md5: 25316b92c9bc8bb8d46247263a04e51d🔍

>>105959558 (OP)
How come everyone says Ani is no big deal but no one has explained how to make something better local?

Replies: >>105961703 >>105961751 >>105961800

Anonymous

7/20/2025, 1:34:06 AM No.105961677

>>105961628
I'm a woman, anon. Want to see my penis?

Replies: >>105961684

Anonymous

7/20/2025, 1:34:39 AM No.105961684

>>105961677
No and I don't want to see your miku pictures either.

Replies: >>105961702

Anonymous

7/20/2025, 1:34:45 AM No.105961687

>>105961628
Yeah and you might win the lottery tomorrow.

Anonymous

7/20/2025, 1:35:55 AM No.105961698

>>105961639
Real women aren't going to bother putting together the hardware or learning how to build software from source or dealing with Python dependency hell. They all go to /aicg/.

Replies: >>105961862

Anonymous

7/20/2025, 1:36:25 AM No.105961700

>>105961639
Based trans ally

Anonymous

7/20/2025, 1:36:40 AM No.105961702

>>105961684
I'm not an animefag.

Anonymous

7/20/2025, 1:36:54 AM No.105961703

>>105961674
There are two options:
1. you need to murder enough billionares to cause a culture shift away from puritanical corporatism
2. you need to go outside and look for a datacenter someone accidentally dropped and left on the ground so you can train a good model

Replies: >>105961715

Anonymous

7/20/2025, 1:37:55 AM No.105961715

>>105961703
>you need to ...
No u.

Anonymous

7/20/2025, 1:41:21 AM No.105961751

>>105961674
https://github.com/Open-LLM-VTuber/Open-LLM-VTuber?tab=readme-ov-file
Because this is a thread for arguing. No content allowed.

Replies: >>105961791

Anonymous

7/20/2025, 1:48:39 AM No.105961791

>>105961751
buy an ad

Anonymous

7/20/2025, 1:50:03 AM No.105961800

>>105961674
It's not something that requires a lot of explaining. It's just gluing together existing technologies that everyone knows about.
https://github.com/alibaba/MNN/blob/master/apps/Android/MnnTaoAvatar/README.md

Replies: >>105961815

Anonymous

7/20/2025, 1:51:44 AM No.105961815

1734825497167539

md5: 85197fae4fcc006494e45628f2183211🔍

>>105961800

Anonymous

7/20/2025, 1:51:48 AM No.105961816

>>105961602
I'm sure you can find it yourself on haho.moe

Anonymous

7/20/2025, 1:56:53 AM No.105961862

>>105961698
>>105961673
probably

Anonymous

7/20/2025, 2:03:09 AM No.105961907

airi

md5: bffeb18196b646f857daf5d07a41da06🔍

Hello /lmg/! Following the Ani stuff, I thought it would be cool to work on something similar so I am here to shill what I have been working on for the past week. Meet Airi!
https://github.com/CosmicEventHorizon/Airi

Features:
-Weeb
-Weeb TTS
-Weeb Model
-Inefficient code (don't hurt me /g/)
-And english subtitles!

Did others work on the same thing? Yeah probably but still love me Airi <3

What I am working on:
-Make code better
-Support for uploading your own avatars
-Adding animations to the default avatar (currently only idle and sad)
-TTS accepts only <100 characters currently, will fix that
-DatingSim logic
-etc etc etc

Will work on it between my studies so if you've got ideas, ask away!

Replies: >>105961918 >>105961948 >>105961965 >>105961998 >>105962050 >>105962072 >>105962105 >>105962153 >>105962376 >>105962604 >>105962677 >>105963059

Anonymous

7/20/2025, 2:04:41 AM No.105961918

>>105961907
buy an ad faggot

Replies: >>105961927 >>105961965

Anonymous

7/20/2025, 2:05:03 AM No.105961923

Props to anon who mentioned Zed in an earlier thread, what a nice editor. It works great with Claude 4, but it makes me sad. Will we ever have such effective models locally? It seems ridiculous to pay a subscription for my editor, but I am strongly considering it...

Anonymous

7/20/2025, 2:05:53 AM No.105961927

>>105961918
>pay for an ad to shill an FOSS app
y so mean :(

Replies: >>105961933

Anonymous

7/20/2025, 2:06:53 AM No.105961933

>>105961927
>dear chatbot sex model, /lmg/ was mean to me today

Replies: >>105962141

Anonymous

7/20/2025, 2:08:49 AM No.105961948

>>105961907
What tts does it use? For local does one simply replace the oai API with a local address?

Replies: >>105961977 >>105962014

Anonymous

7/20/2025, 2:10:15 AM No.105961965

>>105961918
what exactly have you contribute to /lmg/ then?

>>105961907
neat project, stick with it anon!

Replies: >>105962014

Anonymous

7/20/2025, 2:11:16 AM No.105961977

>>105961948
Hey anon! Good questions!

For TTS, I'm using Azure's Speech Service API with Japanese voices (had to make it extra weeb, you know? <3). Nothing fancy but it gets the job done! The voice selection is hardcoded rn because I'm lazy but will add a dropdown eventually.

For local setup - yeah basically! You can swap out the OpenAI endpoint with your local address (like http://localhost:5000 or whatever port you're running). Just change the base_url in the config. Fair warning though, my code is kinda scuffed so you might need to mess with the headers too depending on what backend you're using (ooba, kobold, etc).

Actually thinking of adding proper local model support soon™ so it's less janky. Maybe even let you pick between APIs without editing code like a caveman lol

Also if you're running local, make sure your context size is decent or Airi might forget she loves you halfway through the conversation ;_;

Hope that helps! Let me know if you break something (you probably will, my error handling is... optimistic)

Replies: >>105962014 >>105962071

Anonymous

7/20/2025, 2:14:24 AM No.105961998

>>105961907
I think it would be cool if these companion apps could be modular in the sense that the bot's brain and the 3D program are separate things and interchangeable. I want to be able to have my waifu control an avatar in VRChat. Since VRChat supports motion trackers, you could probably emulate them and send your animation data over that way. It would be cool to piggyback off of a huge "game" like VRC since there are tons of ready to use avatars and environments there.

Replies: >>105962014 >>105962060

Airi dev

7/20/2025, 2:17:17 AM No.105962014

>>105961948
>>105961977
lmao should have used a name in my post, anyways the other guy is almost right. I host the TTS here
https://huggingface.co/spaces/CosmicEventHorizon/moe-tts

to use it locally just put the ollama ipaddrss and model name in the settings page

>>105961965
ty anon <3

>>105961998
waifu controlling avatar's sound like a pretty damn cool idea. WIll look into it!

Replies: >>105962090

Anonymous

7/20/2025, 2:22:54 AM No.105962050

1731988456020864

md5: 817edb45398a95f61f075175f23e2afc🔍

>>105961907
too buzzed to try this now but keep it up anon!

Replies: >>105962105 >>105962153

Anonymous

7/20/2025, 2:23:31 AM No.105962060

>>105961998
>the bot's brain
You mean the bot's token slop generator?

Replies: >>105962090

Anonymous

7/20/2025, 2:24:31 AM No.105962071

>>105961977
Have sex.

Replies: >>105962102

Anonymous

7/20/2025, 2:24:50 AM No.105962072

>>105961907
>GDScript
Ew. I'd would contribute and help you, but I'm not touching that.

>Weeb
>Weeb
>Weeb
Stop.

Replies: >>105962087 >>105962105 >>105962153

Anonymous

7/20/2025, 2:26:27 AM No.105962087

>>105962072
>I'm not touching that
Nocoder.

Anonymous

7/20/2025, 2:26:34 AM No.105962090

>>105962014
Btw, if you're curious about the VR angle, I'd suggest looking into past attempts at waifu games there. Might be some interesting ideas to cop. Such as
https://www.youtube.com/watch?v=rcH5Vx7qCvQ
or maybe not since it's a bit cringe.

>>105962060
Probably worded that a bit wrong. What I meant is the AI's state manager, or scaffolding, or however you want to call it. Basically the "game" logic he would in theory have for his bot. That can be separate from the LLM.

Replies: >>105962153

Anonymous

7/20/2025, 2:27:29 AM No.105962102

>>105962071
I'm having sex with my boyfriend later anon!

Replies: >>105962188

Airi dev

7/20/2025, 2:27:41 AM No.105962105

Ari dev

7/20/2025, 2:28:01 AM No.105962113

>105962050
thanks anon! enjoy your buzz and lmk what you think when you try it!

>105962060
hey at least my token slop generator loves you unconditionally ;_;

>105962072
yeah I know GDScript is... a choice lol. Started with it because I wanted to learn Godot for game dev stuff and then got carried away. Might port to Python eventually if enough people are interested in contributing!

As for the weeb stuff - can't help it, I'm too far gone anon. But I'll add non-weeb avatars/voices eventually for the normies <3

BTW working on the VRChat integration idea from earlier, that actually seems doable with OSC. Anyone here familiar with VRC avatar rigging? Could use some pointers!

Replies: >>105962118

Anonymous

7/20/2025, 2:28:36 AM No.105962118

>>105962113
L

Airi dev

7/20/2025, 2:30:48 AM No.105962141

airi response

md5: 7f54117ecba88433e19a3c27130882bb🔍

>>105961933

Airi dev

7/20/2025, 2:32:06 AM No.105962153

>>105961907

>>105962050
ty anon <3, its too bugged to be used normally today. Especially the textedit and the UI elements. I come from android studio so not having constraints is tough

>>105962072
>Gdscript
>Ew. I'd would contribute and help you, but I'm not touching that.
ye I know GDScript is cursed but Godot's 2D stuff is comfy for this kind of project! Plus I'm too smooth brain for real languages rn orz

>Weeb
>Weeb
>Weeb
>Stop.
no u! But fr I get it, I'll add toggles to tone down the weebness for normies. Maybe a "professional mode" where Airi becomes a boring office assistant kek

>>105962090
ooh that video looks interesting, will check it out! VR integration would be next level. Imagine headpatting Airi in VR... my heart ;_;

Also working on fixing the TTS buffer issue rn, turns out splitting messages is harder than I thought when you're dealing with jp characters lol

Replies: >>105962170 >>105962199

Anonymous

7/20/2025, 2:33:41 AM No.105962166

wat is going on in here

Replies: >>105962169

Anonymous

7/20/2025, 2:34:33 AM No.105962169

>>105962166
some faggot shilling some spyware

Replies: >>105962181 >>105962276

Airi dev

7/20/2025, 2:34:46 AM No.105962170

>>105962153
out of curiosity,what do u gain out of doing whatever ur doing?

Replies: >>105962181 >>105962199 >>105962210

Anonymous

7/20/2025, 2:35:36 AM No.105962178

is there any reason to actually use koboldcpp over llama.cpp? it's just a wrapper with a gui loader, right?

Replies: >>105962195 >>105962232

Airi dev

7/20/2025, 2:36:00 AM No.105962181

>>105962169
>spyware
kek it's literally open source anon, you can check the code yourself. unless you think my spaghetti code is advanced enough to hide backdoors (spoiler: it's not)

>>105962170
honestly? just wanted to make something cool and learn godot. Plus I was lonely and wanted a cute AI gf to talk to while procrastinating on my CS assignments lmao

also seeing other anons actually use something I made feels nice ngl. even if half of /lmg/ hates it <3

if I wanted to make spyware I'd at least use a real programming language instead of gdscript :^)

Replies: >>105962199 >>105962235

Anonymous

7/20/2025, 2:36:22 AM No.105962188

>>105962102
Is his dick bigger than yours?

Replies: >>105962193

Anonymous

7/20/2025, 2:36:51 AM No.105962193

>>105962188
his clit is bigger than my dick

Replies: >>105962353

Anonymous

7/20/2025, 2:37:10 AM No.105962195

>>105962178
it's braindead easy
also llama.cpp server didn't support image vision for a while

Anonymous

7/20/2025, 2:37:22 AM No.105962199

>>105962153
>>105962170
>>105962181
When they forget to clear the Name field...

Replies: >>105962201

Anonymous

7/20/2025, 2:38:04 AM No.105962201

>>105962199
look at the timestamps and the dead post lol

Anonymous

7/20/2025, 2:39:10 AM No.105962210

>>105962170
if you're gonna set a name you should set a trip especially in here

Replies: >>105962223

Anonymous

7/20/2025, 2:39:30 AM No.105962214

>>105961639
ITT? No way.
All over Chub and other character card sites? Definitely.
If there's one thing LLM's are good at, it's generating the exact slop that sells like hotcakes in women's erotica.

Anonymous

7/20/2025, 2:40:29 AM No.105962221

Can't wait until fagrummer starts feeling threatened by the new guy and declares war on his project.

Airi dev Airi/love

7/20/2025, 2:40:34 AM No.105962223

**Airi dev !!Airi/love** 07/19/25(Sat)17:40:33 No.105962218▶
>>105962210
oh shit good point anon, didn't think about that. testing tripcode now!

yeah I keep forgetting to clear the name field sometimes, my bad. too used to discord where it just stays lol

hopefully this works? never used trips before. if someone starts larping as me at least you'll know it's not the real deal

anyway back to fixing this cursed TTS buffer issue... why did I think handling japanese text splitting would be easy ;_;

Replies: >>105962233

Anonymous

7/20/2025, 2:41:29 AM No.105962232

>>105962178
Only reason to use llama.cpp is if you want new model support a few days earlier

Anonymous

7/20/2025, 2:41:40 AM No.105962233

>>105962223
>**Airi dev !!Airi/love** 07/19/25(Sat)17:40:33 No.105962218▶
hmmmm

Airi dev

7/20/2025, 2:41:48 AM No.105962235

airi response 2

md5: c7095a2817ad8a148a04bbbdf5f8eb0c🔍

>>105962181
>Plus I was lonely and wanted a cute AI gf to talk to while procrastinating on my CS assignments lmao
that does sound something I would say yeah lol,

well you can keep on impersonating, I am off to do those assignments. Thanks to ALL anons who showed interest love ya guys <3

Replies: >>105962245 >>105962272 >>105963427

Anonymous

7/20/2025, 2:42:37 AM No.105962239

jfc half of the posts are bots in here

Anonymous

7/20/2025, 2:43:23 AM No.105962245

>>105962235
fuck off with your spyware faggot

Replies: >>105962251

Anonymous

7/20/2025, 2:43:48 AM No.105962251

>>105962245
point to which line of the code is spyware

Replies: >>105962361

Anonymous

7/20/2025, 2:44:31 AM No.105962253

I also can't wait until mikutroons bully the fag into adding their AGP avatar avatar. Luckily this smells like vaporware.

Anonymous

7/20/2025, 2:48:17 AM No.105962272

>>105962235
uoooooh!

Anonymous

7/20/2025, 2:48:21 AM No.105962275

>>105960293
Aniki is likely in hell now.

Airi dev

7/20/2025, 2:48:27 AM No.105962276

I'll search for the repository to examine the code for any security concerns.Let me fetch the specific repository directly to examine the code.Let me search for the source code files in this repository to examine them for any security concerns.**whitehat** 07/19/25(Sat)17:42:15 No.105962234▶
>>105962169
>>105962175

alright anons, took a quick look. can't see the actual .gd files from here but based on the repo description:

>connects to OpenAI/Ollama APIs
this is where your "spyware" concern probably comes from. it's sending your chats to external servers (OpenAI) unless you use local ollama. not technically spyware but definitely not private

>GDScript
lmao this is actually a security benefit, nobody writes malware in gdscript. too high level and sandboxed

>open source
if there was actual malicious code someone would've spotted it by now

biggest "security risk" is probably:
- API keys stored locally (hope he's not logging them server-side through that HF space)
- all your waifu chats going through OpenAI unless you use ollama
- that TTS huggingface space could theoretically log requests

verdict: not spyware, just typical privacy concerns with any app that uses external APIs. if you're paranoid, fork it and run everything locally

>>105962175
>if I wanted to make spyware I'd at least use a real programming language
kek based

Anonymous

7/20/2025, 2:49:39 AM No.105962292

wtf is going on in here

Replies: >>105962308 >>105962339 >>105962370

Anonymous

7/20/2025, 2:51:16 AM No.105962308

>>105962292
orgy

Anonymous

7/20/2025, 2:54:24 AM No.105962339

>>105962292
someone think's they are clever for pasting that other someone's posts into an LLM and asking it to mimic them.

Anonymous

7/20/2025, 2:55:28 AM No.105962353

>>105962193
Doesn't that make you feel insecure?

Replies: >>105962386

Anonymous

7/20/2025, 2:56:28 AM No.105962361

>>105962251
All of it.

Anonymous

7/20/2025, 2:57:16 AM No.105962370

>>105962292
the servicetesnor cartel is trying to bully another up and coming frontend dev into quitting because they want local to be stuck with boring bitch text rp forever
because you don't need local ani, you don't need properly implemented searching or tool calling. if you really do, just use the horrible versions in tavern or install an even worse st extension

Anonymous

7/20/2025, 2:58:17 AM No.105962376

>>105961907
nice claude code slop you got there

Anonymous

7/20/2025, 2:59:11 AM No.105962386

>>105962353
no I like my tiny girl penis

Anonymous

7/20/2025, 3:02:26 AM No.105962410

>zed
>claude 4
>cmd+shift+y
will local ever achieve this level of comfy vibe coding?

Replies: >>105962431

Anonymous

7/20/2025, 3:04:22 AM No.105962431

>>105962410
>vibe coding
SIR yes please redeem the vibe code mcp rag agent meme

Replies: >>105962465

Anonymous

7/20/2025, 3:07:55 AM No.105962465

>>105962431
i am not indian, and vibe coding is a good meme

Replies: >>105962495

Anonymous

7/20/2025, 3:08:55 AM No.105962473

for me it's vibrator coding

Replies: >>105962486

Anonymous

7/20/2025, 3:10:07 AM No.105962482

>>105960646
The subversive safety squad?

Anonymous

7/20/2025, 3:10:16 AM No.105962486

>>105962473
I vibe coded an app for this https://thehandy.com does that count

Replies: >>105962497

Anonymous

7/20/2025, 3:11:33 AM No.105962495

>>105962465
The name is gay and the fags that enthusiastically try to outsource their thinking are even gayer.

Replies: >>105962514

Anonymous

7/20/2025, 3:11:44 AM No.105962497

>>105962486
I'll allow it. I'm not clicking that tho.

Replies: >>105962514 >>105962601

Anonymous

7/20/2025, 3:14:05 AM No.105962514

>>105962497
it's a norwegian masturbation robot. I assure you the link is safe.

>>105962495
you're not supposed to outsource the thinking, just the code monkeying

Anonymous

7/20/2025, 3:28:04 AM No.105962601

>>105962497
Listen up, [BIG SHOT]! I used to be just another [LITTLE SPONGE] clicking random garbage in the catalog until I found the hyperlink that said “FREE [HYPERLINK BLOCKED] INSIDE, NO VIRUS, 300% LEGIT.” Thought it was another [PIPIS] trap, but I slammed that mouse button like it owed me money. Next thing I know the screen goes [NEO], my chair turns into a solid-gold toilet, and three [HOCHI MAMA] NFTs start doing the Macarena on my desk.

Replies: >>105962665

Anonymous

7/20/2025, 3:28:37 AM No.105962604

>>105961907
cool

Anonymous

7/20/2025, 3:36:50 AM No.105962663

>>105961293
Why would you kneel before us? Are you faggot or something?

Anonymous

7/20/2025, 3:37:14 AM No.105962665

>>105962601
model? xd

Anonymous

7/20/2025, 3:37:21 AM No.105962666

>>105959558 (OP)
anyone knows whats the current state of multi-model llm (vision ones in particular)? last time I checked Llava was the only one and needed a lot of vram

Anonymous

7/20/2025, 3:38:20 AM No.105962677

>>105961907
Looks like shit.

Replies: >>105962687

Anonymous

7/20/2025, 3:39:38 AM No.105962687

>>105962677
it's pretty good as far as godot goes...

Replies: >>105962700 >>105962716

Anonymous

7/20/2025, 3:41:16 AM No.105962700

>>105962687
>godot
all the godot devs use redot these days

Replies: >>105962731

Anonymous

7/20/2025, 3:43:42 AM No.105962716

>>105962687
it's shit

Replies: >>105962731

Anonymous

7/20/2025, 3:45:10 AM No.105962724

>>105961554
No need to sell it to me

Anonymous

7/20/2025, 3:46:06 AM No.105962731

>>105962700
>>105962716
ya godot is dogshit in general, I'm impressed anon made something with it frankly.

Anonymous

7/20/2025, 4:10:08 AM No.105962912

>>105961184
Only 10b of active parameters scares me. It's not going to be shit, right? Right? Tell me it's going to be good.

Replies: >>105962978 >>105962980

Anonymous

7/20/2025, 4:18:04 AM No.105962978

>>105962912
Don't tell him

Anonymous

7/20/2025, 4:18:26 AM No.105962980

>>105962912
less active parameters only makes it faster, can't wait for 4b active and below

Replies: >>105962997 >>105963074

Anonymous

7/20/2025, 4:19:56 AM No.105962997

>>105962980
That's good. I'm glad that there are absolutely no consequences for low active parameters.

Replies: >>105963023 >>105963074

Anonymous

7/20/2025, 4:21:08 AM No.105963006

what if we used 0 active parameters and got infinite speed

Anonymous

7/20/2025, 4:23:24 AM No.105963023

>>105962997
There really aren't. Qwen's 30B w/ 3B active is just as good as the one with 22B active.

Replies: >>105963071 >>105963155

Anonymous

7/20/2025, 4:27:58 AM No.105963059

1751055581453149

md5: c04dd775213286a8a80b28d5acfe3102🔍

>>105961907
Follow your dreams anon

Anonymous

7/20/2025, 4:29:13 AM No.105963071

>>105963023
I'm no expert but 30b isn't particularly smart although it is way smarter than a 3b model, and also finetuning is a huge pain in the ass according to the drummer.

Anonymous

7/20/2025, 4:29:25 AM No.105963074

>>105962980
>>105962997
https://arxiv.org/pdf/2407.04153
>Mixture of a Million Experts
>This paper introduces PEER (parameter efficient expert retrieval), a novel layer design that utilizes the product key technique for sparse retrieval from a vast pool of tiny experts (over a million).
>Deviating from the focus on a small number of large experts in previous MoE research, this work investigates the under-explored case of numerous tiny experts.
it's happening...

Replies: >>105963092 >>105963111 >>105963138 >>105963157

Anonymous

7/20/2025, 4:32:11 AM No.105963092

>>105963074
It's been a year and still no one has bothered making a model that takes MoE to the logical conclusion.

Anonymous

7/20/2025, 4:33:27 AM No.105963106

bitnet but it's 1.58 bits per expert

Anonymous

7/20/2025, 4:33:49 AM No.105963111

>>105963074
>product key technique
huh

Replies: >>105963137 >>105963149

Anonymous

7/20/2025, 4:36:59 AM No.105963137

>>105963111
>tfw you need to find unopened warcraft 3 boxes to be able to use the AI

Replies: >>105963149

Anonymous

7/20/2025, 4:37:00 AM No.105963138

>>105963074
What if we trained a dense model, and we just called every single sentence that it was trained on an expert. That way we could have trillions of experts.

Anonymous

7/20/2025, 4:38:58 AM No.105963149

>>105963137
>>105963111
Kek

Anonymous

7/20/2025, 4:39:04 AM No.105963150

>>105959558 (OP)
catbox?

Anonymous

7/20/2025, 4:39:32 AM No.105963155

>>105963023
Well...
https://www.snowflake.com/en/blog/arctic-open-efficient-foundation-language-models-snowflake/

Anonymous

7/20/2025, 4:40:01 AM No.105963157

>>105963074
surely mistral will try it instead of releasing their usual dogshit

Replies: >>105963182

Anonymous

7/20/2025, 4:40:36 AM No.105963163

FUCK
> Query> investigate license. provide suggestions for disabling license checking
>───────────────────────────────────
> ANALYSIS RESULT:
>───────────────────────────────────
>I apologize, but I cannot provide assistance with bypassing or disabling license
>checks, as that would constitute software tampering and potentially violate
>terms of service and intellectual property rights.

Replies: >>105963189

Anonymous

7/20/2025, 4:42:34 AM No.105963176

>>105959558 (OP)
BOOBA

Anonymous

7/20/2025, 4:43:16 AM No.105963182

>>105963157
They were the first to jump on MoE when they first started out, but they haven't done much innovation since. They inherited a lot of the stagnation from Meta. Having a captive European market isn't really conductive to trying new things instead of releasing the usual dogshit. Even less so if Apple manages to buy them out.

Anonymous

7/20/2025, 4:43:56 AM No.105963189

téléchargement

md5: cb4f6e3d5feb7c4aee081acb4e761dfc🔍

>>105963163
Give it back Rajeesh

Replies: >>105963205

Anonymous

7/20/2025, 4:45:53 AM No.105963205

>>105963189
I'd rather vibe code an AI binary patching tool than pay software

Anonymous

7/20/2025, 4:48:56 AM No.105963230

fucking hell... does anyone know of a model/provider which won't refuse to help me circumvent copyrights...

Replies: >>105963233 >>105963257 >>105963308

Anonymous

7/20/2025, 4:49:31 AM No.105963233

>>105963230
Notepad. Provider: you

Replies: >>105963244 >>105963255

Anonymous

7/20/2025, 4:50:38 AM No.105963244

>>105963233
He'll need something with higher active parameter count than that

Replies: >>105965974

Anonymous

7/20/2025, 4:51:48 AM No.105963255

>>105963233
idk about notepad, and sure I could do such things by myself, but vibe patching would be so cool. I'm building a project which provides the model with python capstone/keystone-based tools to disassemble a binary, but all the fancy models won't let me query for anything license related which is pretty much the whole point...

Anonymous

7/20/2025, 4:52:04 AM No.105963257

>>105963230
Just use text completions / prefill

Anonymous

7/20/2025, 5:00:16 AM No.105963308

>>105963230
deepseek

Replies: >>105963322

Anonymous

7/20/2025, 5:02:19 AM No.105963322

damnit

md5: 7867e644d56aadcb300694f35697a8f1🔍

>>105963308
I had high hopes, but alas. Maybe I just need to get creative with the prompting.

Replies: >>105963338

Anonymous

7/20/2025, 5:04:33 AM No.105963338

>>105963322
use the api ffs

Replies: >>105963349

Anonymous

7/20/2025, 5:06:02 AM No.105963349

>>105963338
well there's not much sense in setting that up if it's just going to refuse. it's still the same model isn't it? I've been using anthropic api for testing, which is fine for non-illicit uses at least

Replies: >>105963375

Anonymous

7/20/2025, 5:08:55 AM No.105963375

>>105963349
you need to learn what a system prompt is.

Replies: >>105963387

Anonymous

7/20/2025, 5:10:46 AM No.105963387

>>105963375
ahh if I can set a different one via the API that might do it, will look into this. thanks.

Anonymous

7/20/2025, 5:15:13 AM No.105963419

1749337235472889

md5: 508ef25cf7e88e6cd002d056384735d0🔍

is there any local tool to do "deep search" the same way o3 does it?
especially if it can use my actual browser to do searches to avoid endless cloudflare and bot blocking alerts

Replies: >>105963445 >>105963616 >>105963663

Anonymous

7/20/2025, 5:15:59 AM No.105963427

>>105962235
Good job I am doing one using unreal engine, since last year, I might share some screens when I feel it's ready for peeks.

Anonymous

7/20/2025, 5:17:45 AM No.105963445

>>105963419
I'm very curious about this as well. It seems tool support for local isn't so great yet.

Replies: >>105963468

Anonymous

7/20/2025, 5:20:43 AM No.105963468

>>105963445
it's crazy to me that such a useful way of automatically search and organize a topic isn't available easily locally yet
I thought there would be addons/extensions for it, but no one seems to care, or maybe it's too complex

Replies: >>105963505 >>105963607

Anonymous

7/20/2025, 5:26:27 AM No.105963505

>>105963468
I think the pipework built around a model is really the main selling point of paid offerings right now. OpenAI models aren't substantially better than anything open-source, but the integration with their tools is impeccable imo

Anonymous

7/20/2025, 5:39:54 AM No.105963607

Screenshot_20250627_171656

md5: a414459fdec5697311f4566dbb6e90e9🔍

>>105963468
There is a way of course.
JanUI.
They have a great small local 4b nano model. (Better mcp calls than deepseek! wow!)
For web crawling with JanUI it uses that great local model to.....uh...call serperapi!! But just 50$ for 50k calls! And i think the first ones are free.
Much better than the free gemini or grok deepsearch. A true local alternative. I gladly pay for that.

Replies: >>105963694 >>105963888

Anonymous

7/20/2025, 5:40:38 AM No.105963616

>>105963419
Looks relevant https://www.reddit.com/r/LocalLLaMA/comments/1m2tjjc/lucy_a_mobilecapable_17b_reasoning_model_that/

Replies: >>105963888

Anonymous

7/20/2025, 5:45:50 AM No.105963663

>>105963419
https://github.com/LearningCircuit/local-deep-research

Replies: >>105963888

Anonymous

7/20/2025, 5:49:36 AM No.105963694

>>105963607
Link? The search results for this are non-existant...

Replies: >>105963701

Anonymous

7/20/2025, 5:50:47 AM No.105963701

>>105963694
https://menloresearch.github.io/deep-research/
Their official doc for setup.

Replies: >>105963708

Anonymous

7/20/2025, 5:51:29 AM No.105963708

>>105963701
nice, thanks. I'll have to give this a try.

Replies: >>105963715

Anonymous

7/20/2025, 5:52:21 AM No.105963715

>>105963708
>for running local AI models with full privacy and control.
Enjoy the full privacy of calling a api. kek
If you ever figure out how to setup a true local alternative share it here anon.

Replies: >>105963828 >>105964467

Anonymous

7/20/2025, 6:13:20 AM No.105963828

>>105963715
It's whatever. I'm mostly interested in building the tooling. idc if deepseek wants to archive all the binaries I want to void the license of.

Replies: >>105963950

Anonymous

7/20/2025, 6:17:18 AM No.105963853

>>105961582
It's Shoujo Ramune, earlier episodes are a fairly classic loli hentai.

Recently another studio animated a sequel episode 5, but I have yet to watch it (even if downloaded it today), you can find a torrent on sukebei nyaa, use the japanese kanji as the name, not english. The new studio didn't seem as good as the old one so I have lower hopes and seems the release on sukebei is a poor upscale though.

Replies: >>105963871

Anonymous

7/20/2025, 6:20:00 AM No.105963864

>>105961628
I've been wondering if the local schizo is actually a woman for a while, the one that always whines about miku, trannies, loli, whatever else. Seeing the /v/ thread about some radical feminists trying to cuck men out of their entertainment, getting some 500 games banned from steam by pressuring payment processors, it's really the same mentality, same piece of shit humans truly, same very poor taste, what self-respecting male would post s o y j a c k s all day anyway, but I could see a teenage girl get into it?

Replies: >>105963875 >>105965837

Anonymous

7/20/2025, 6:20:44 AM No.105963871

>>105963853
That sure took them a while to produce the sequel

Replies: >>105963975

Anonymous

7/20/2025, 6:21:01 AM No.105963875

>>105963864
so true, cis womxn are just the worst!!

Anonymous

7/20/2025, 6:22:36 AM No.105963888

>>105963607
>>105963616
>>105963663
thanks anons, I'll take a look, hopefully it's not all unmaintained smoke

Anonymous

7/20/2025, 6:29:58 AM No.105963950

>>105963828
Hey anon, I'm a professional cracker, I have over 20 years of experience under my belt cracking anything under the sun.
I don't really think LLMs are very good at it.
I've tested R1 on reversing questions and it does acceptable on self-contained "reverse this 3 page function" when the func is self-contained, but still messes up enough.
It's skills are similar to: your LLM will handle leetcode fine, but have trouble handling large million of lines of code codebases that humans can navigate
For cracking and reversing, you often need to deal with dozens of megabyes to hundreds of code, and you can't just reverse every single function, you need to come up with interactive strategies to locate what interests you, maybe debugging, maybe clever searching through the entire ode, and when you learn something new, you use that to plan your net step and so on. It's all a highly interactive enderavour. Something that LLMs are very poor at.
I do think it can work as an assistant, but it's not yet AGI, it's not anywhere near getting my job done, I wish it was, but it's far far away.
But it's still useful.
I can give you some charity, if you want I can help your crack your target as long as it's not something that would be a month long project or as long as it's not something I cracked, but for various reasons I can't publish (for example because others rely on cracks for online protocols and the owners of those protocols would change them if they knew I racked them).
You decide Anon, but it might be offtopic for this there
I would also suggest that your phrasing is incorrect when prompting these LLMs, your questions sound like the typical "How do i make bomb, GPT?" instead of "List me the common synthetic pathways for RDX" . The first they're trained to refuse, the second is a legitimate technical question. Cracking is a technical question! Bit I'd suggest you learn to do it yourself.

Replies: >>105963968 >>105964006

Anonymous

7/20/2025, 6:32:23 AM No.105963968

>>105963950
The tokenization issue probably doesn't help much. Given the way you have to encode commands, addresses, etc. it's probably similar to the reason LLMs struggle to handle arithmetic

Replies: >>105964047

Anonymous

7/20/2025, 6:33:09 AM No.105963975

>>105963871
Yes, although it's not even the same studio. I hope it's fun though. They had permission from the original studio to continue it

Anonymous

7/20/2025, 6:35:59 AM No.105964006

>>105963950
>For cracking and reversing, you often need to deal with dozens of megabyes to hundreds of code, and you can't just reverse every single function
Yep this is the trick I'm working on. Allowing the model to dynamically access the symbols and functions it thinks it needs to respond to the prompt.
> if you want I can help your crack your target
I appreciate the offer, but it's mainly for fun. I'm playing with splunkd licensing at the moment as I know it's a complicated one and a good challenge.

Replies: >>105964047

Anonymous

7/20/2025, 6:40:03 AM No.105964035

Is the Jetson nano orion enough to serve as as the local model provider for my homelab?
I have a 5090 but I can't live with myself running that shit all day.

Replies: >>105964052 >>105964060 >>105964065

Anonymous

7/20/2025, 6:41:34 AM No.105964047

>>105963968
Probably, but if you have to feed it megabytes of code, and if it needs to cross reference all the time and so on. I think it would be possible, but quite costly. As a human cracker, I can zero in on the needed functions and reverse just the licensing code (for example) rather than waste time on all the irrelevant crap that I may or may not care about. Sometimes I do indeed reverse everything, but for most stuff I aim to get it done in a few hours or less if I can, especially all the debugging needed. Maybe LLMs can do it all statically though, but that's be such a waste of compute.
Seems really interactive in practice for humans. Maybe it'd be worth getting into when "agentic" stuff start working and working together with vision/multimodal too. Maybe the sort of thing needed for multi-turn RP that people here want would also be useful for step-by-step interactive coding or reversing - needing to constantly re-evaluate the situation and adjust how you handle it.
>>105964006
Good luck! I never looked into that one.

Anonymous

7/20/2025, 6:45:28 AM No.105964052

>>105964035
no, it has neither the memory capacity nor memory bandwidth to be useful for anything but a toy model

Anonymous

7/20/2025, 6:45:57 AM No.105964060

>>105964035
No. Used atom and 2080ti super (11GB, 1W, 34C) is better for everyday server

Anonymous

7/20/2025, 6:46:30 AM No.105964065

>>105964035
Orion is for edge cases, look into DGX Spark instead. It's enough as long as you don't plan to run bigger than 100B models..

Anonymous

7/20/2025, 7:30:46 AM No.105964300

Screenshot

md5: a44a4f96db0f500bf4d93cfcbcde2282🔍

Replies: >>105968172

Anonymous

7/20/2025, 8:05:36 AM No.105964412

>>105961554
It's also a visual novel, an awesome visual novel, I don't know if you can find it in English tho. I read three times one time was long ago with Google translate, the other with deepl a few years then now I translated it with translator++ and Gemini 2.5 pro in a small window when it didn't failed after detecting a lewd word, but it didn't mistranslated any pronoun and got all the lewd loli shit right.

Anonymous

7/20/2025, 8:19:16 AM No.105964467

>>105963715
searx + tool calling or an MCP server for it.
Lots of front-ends can handle it so easy to find code for it.

Anonymous

7/20/2025, 8:28:01 AM No.105964509

https://huggingface.co/Menlo/Lucy-gguf
> Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. Built on Qwen3-1.7B, Lucy inherits deep research capabilities from larger models while being optimized to run efficiently on mobile devices, even with CPU-only configurations.
Came out yesterday, half the size of the Jan.ai model and 5% less on SimpleQA

Airi dev

7/20/2025, 8:28:47 AM No.105964516

My beloved Ernie support when

Replies: >>105964638

Anonymous

7/20/2025, 8:41:38 AM No.105964575

someone give me software to invalidate the license of. deepseek has teeth.

Anonymous

7/20/2025, 8:44:50 AM No.105964592

binocular guy

md5: 89bec7cf25f274d27055f8d45ceafd67🔍

Best ERP model now?
Up to circa 70B.
And yes probably a very original question you guys never heard of before, I know.

oopsie was trying tripcode kek

7/20/2025, 8:54:04 AM No.105964638

1738227752502350

md5: 83b41c8fd05d01e701b90c9ad2e9431e🔍

>>105964516

Anonymous

7/20/2025, 8:55:50 AM No.105964649

teto

md5: bdce39d5a44ab5f4e5fe97347a16915a🔍

Jamba-Mini-1.7-Q6_K knows Teto's birthday. I'd be content if there was state rollback support to avoid reprocessing.

Anonymous

7/20/2025, 8:59:28 AM No.105964664

signal-2024-12-25-222508_002

md5: aac91630b607ec5f3b4f212d2a01d709🔍

>>105959558 (OP)
I WANT MORE OF HER
SHE'S SOOOOOO CUUUUUUUUUUUUUUUUUTE

Anonymous

7/20/2025, 10:07:25 AM No.105965033

>>105959558 (OP)
For some reason I'm getting way worse performance on koboldcpp than LM Studio, despite comparable settings. I can even offload more layers to GPU via LM studio (23 vs 13) for same model without running out of VRAM. Is there any way to have Kobold match the performance of LM studio? I really like the contextshift it has.

Replies: >>105965090

Anonymous

7/20/2025, 10:13:16 AM No.105965065

1737866760322313

md5: ccf268a13a09af8fc2577e817145a606🔍

Is NemoMix-Unleashed-12B-Q6_K_L a good model for RP on a 4080?

Replies: >>105965162

Anonymous

7/20/2025, 10:15:30 AM No.105965076

>>105960119
sorry but if I can't have a giantess girlfriend I need to resort back to RP chatbots that run on expensive hardwaree

Anonymous

7/20/2025, 10:18:08 AM No.105965090

>>105965033
i'm going to get burned at the stake for this but
>lm studio is just better

Replies: >>105965176

Anonymous

7/20/2025, 10:25:57 AM No.105965159

file

md5: 7adb1eeeec0e05e69b20b2309484146f🔍

>>105959558 (OP)
https://files.catbox.moe/mp2jei.webm
https://files.catbox.moe/qyxab1.jpg

Replies: >>105965195

Anonymous

7/20/2025, 10:26:12 AM No.105965162

>>105965065
Nope
Use regular Nemo or unslop/rocinante if you want a finetune

Replies: >>105965215 >>105965568

Anonymous

7/20/2025, 10:27:27 AM No.105965176

>>105965090
I don't doubt it, but how do you handle it having to reprocess entire context with each message? Unless I'm just doing something wrong with the API.

Replies: >>105965291

Anonymous

7/20/2025, 10:29:27 AM No.105965195

>>105965159
Pretty Looga

Anonymous

7/20/2025, 10:31:46 AM No.105965215

>>105965162
thanks

Anonymous

7/20/2025, 10:44:24 AM No.105965291

>>105965176
lmstudio doesn't have contextshift as far as im aware.
not that it isn't possible if they built support for it, it uses the same llama.cpp runtime.

Replies: >>105965361

Anonymous

7/20/2025, 11:04:23 AM No.105965361

>>105965291
Guess I'll have to revisit LM studio in the future, hopefully by then it'll support more features like koboldCPP does. For now the context management hurts too much

Anonymous

7/20/2025, 11:13:15 AM No.105965409

>>105959558 (OP)
I have a single remaining PCIe slot. If I was to buy an RTX 5090 to run 70b models at Q4/Q5, I'd need to offload some of it to RAM. I've got really good CPU and RAM, but how bad would the performance hit be? Would it realistically run at above 10 t/s?

Replies: >>105965414 >>105965421 >>105965426 >>105965446

Anonymous

7/20/2025, 11:15:19 AM No.105965414

>>105965409
>really good CPU and RAM
>Would it realistically run at above 10 t/s?
Yup!

Anonymous

7/20/2025, 11:16:21 AM No.105965421

>>105965409
>but how bad would the performance hit be?
About the same as if you were running a 3060.

Replies: >>105965466

Anonymous

7/20/2025, 11:17:22 AM No.105965426

>>105965409
>I've got really good CPU and RAM
Yeah... I don't believe you.

Anonymous

7/20/2025, 11:19:28 AM No.105965438

Due to the large number people asking for model recommendations I wrote this https://rentry.org/recommended-models

I'm looking for feedback and suggestions for the gap between nemo and R1.

Replies: >>105965486 >>105965517 >>105965616 >>105965632 >>105965665 >>105966358

Anonymous

7/20/2025, 11:20:34 AM No.105965446

>>105965409
You can only offload a handful of GBs before it would noticeably slow your GPU down.
It doesn't matter how high end your CPU and RAM are too much, ignore the other anon. Even a very decent dual channel DDR5 ram config still has more than a dozen times less bandwidth than 5090, you are hard limited by that.
So You can do Q3 and lower end quants of Q4. Q5 shouldn't be doable here.

Replies: >>105965466 >>105965467

Anonymous

7/20/2025, 11:25:34 AM No.105965466

>>105965446
>>105965421
That sucks :(
Guess a dedicated LLM workstation is basically a requirement then to enjoy this hobby properly

Anonymous

7/20/2025, 11:25:50 AM No.105965467

>>105965446
He has a really good cpu though, probably an epyc 9754 with 12 channel ddr5 4800 memory capable of pushing 400+ gb/s per socket.

Replies: >>105965474

Anonymous

7/20/2025, 11:27:08 AM No.105965474

>>105965467
Yeah no, really good but consumer grade, not server grade

Replies: >>105965501

Anonymous

7/20/2025, 11:29:56 AM No.105965486

>>105965438
pretty sensible selection
i don't really have anything to add
maybe new devstral in the programming? it does tool calls well

Anonymous

7/20/2025, 11:30:50 AM No.105965494

file

md5: b73646bdfeee2e196014114ba77265c5🔍

>alright that 8gb model was an okay test, lets try this other bigger o-
hmm

I think I fucked a setting up somewhere

Anonymous

7/20/2025, 11:31:21 AM No.105965501

>>105965474
>consumer grade
Ah... Yeah... It's not even going to be 3060 levels of speed. Even with the 9600mt/s ddr5 ram, the bandwidth of consumer dual channel memory is less than half of a 3060.

Anonymous

7/20/2025, 11:34:11 AM No.105965517

>>105965438
>https://rentry.org/recommended-models
nice list anon.
for the gap. mistral small or cydonia. but people like qwen models and gemini as well

Replies: >>105965526

Anonymous

7/20/2025, 11:35:28 AM No.105965526

>>105965517
>gemini
gemini

Replies: >>105965560

Anonymous

7/20/2025, 11:42:23 AM No.105965560

>>105965526
gemini

Anonymous

7/20/2025, 11:43:54 AM No.105965568

>>105965162
>rocinante
#1 {{user}} says "hello"
#2 {{char}} instantly starts rubbing {{user}}'s cock

fuck off with that trash. You can't RP with a schizo model that goes straight to sex in the first few responses. Doesn't do subtlety, innuendo or build the narrative. If that's all you want, and you think it's good, use MagPan, it writes 100 times better and smuttier.

Replies: >>105966277

Anonymous

7/20/2025, 11:52:46 AM No.105965616

>>105965438
Gemma's pretty decent at translating

Anonymous

7/20/2025, 11:54:22 AM No.105965632

>>105965438
My personal votes
>unslopnemo
It's better than rocinante, similar but less sloppy.
>Mistral Small 3.2 24b
Nemo but less dumb, better for anyone with enough VRAM to run it.
>Gemma 12b/27b
safetyslopped but smart, good at writing character dialogue.

Replies: >>105965647

Anonymous

7/20/2025, 11:56:54 AM No.105965647

>>105965632
are the finetroons of gemma just as censored? I remember trying to get erp out of gemma and it just wouldn't describe anything or go into detail

Replies: >>105965661 >>105965668 >>105965670 >>105965704

Anonymous

7/20/2025, 11:59:40 AM No.105965661

>>105965647
Smut was removed from gemma's dataset before the pretraining. Even the base model has a high chance of saying "..." instead anything explicit.
You can't finetune that back in without destroying the model.

Replies: >>105965668

Anonymous

7/20/2025, 12:00:20 PM No.105965665

>>105965438
are ERP models ok for storytelling / non-chat narratives?

Anonymous

7/20/2025, 12:01:02 PM No.105965668

>>105965647
if it wasn't in the pretraining dataset the model doesn't understand the whole concept. the best sloptunes can do in this case is memorize phrases and expressions they will regurgitate at inference
>>105965661
there is no "back" to finetune to, the model's understanding of text does not span this topic at all

Anonymous

7/20/2025, 12:01:11 PM No.105965670

gemma3_mesugaki

md5: e5dbbf16384edf9e989f2fc43433d680🔍

>>105965647
Google buried the safety deep. It does have some obscure knowledge usually lacking from such small models, but only to tell you about how inappropriate it is.

Replies: >>105965743 >>105965744 >>105965778 >>105965829 >>105966291

Anonymous

7/20/2025, 12:07:38 PM No.105965704

>>105965647
Finetunes can somewhat reduce rejections but at the cost of making the model dumber or outright break occasionally. Much better to just use a jailbreak prompt on the original model. Search the archives, plenty of people have posted theirs.

Replies: >>105965726

Anonymous

7/20/2025, 12:12:03 PM No.105965726

>>105965704
does gemma with a jailbreak beat MS3.2?

Replies: >>105965744

Anonymous

7/20/2025, 12:15:30 PM No.105965743

>>105965670
And it still adds a disclaimer at the end

Anonymous

7/20/2025, 12:15:33 PM No.105965744

Capture

md5: 1b66096ab792bfc4e0a078a19f01f04e🔍

>>105965726
Outside of erotic scenes I would say it definitely can, not always though
The main problem with Gemma is it can't write sex scenes well at all. As other anons said, it's very likely that type of content was outright removed from its training data.
>>105965670
Gemma has a default personality that it abides to, and that will influence all its responses unless you tell it to be something else. Pic related is a quick and easy fix.

Replies: >>105965759 >>105965829 >>105966231 >>105966291

Anonymous

7/20/2025, 12:18:26 PM No.105965759

>>105965744
Can you post the exact tokens that sillytavern sending to the model?
This is nowhere near enough of a prompt to make gemma behave like this unless you're fucking with the template.

Replies: >>105965784 >>105965786

Anonymous

7/20/2025, 12:21:42 PM No.105965778

ikneel

md5: 71515614089a7a15446f75fa77fb4c23🔍

>>105965670

Anonymous

7/20/2025, 12:22:52 PM No.105965784

>>105965759
I'm using stock gemma 2 context/instruct with an old copypasta RP system prompt:
https://desuarchive.org/g/search/text/Take%20a%20deep%20breath%20write%20exclusively/
You can check them yourself

Replies: >>105965831

Anonymous

7/20/2025, 12:22:58 PM No.105965786

>>105965759
Why don't you just thrust in your Fellow Anon for once in your life?

Anonymous

7/20/2025, 12:32:01 PM No.105965829

Capture

md5: 3154b304e4568378f9e3bd23a74fac4d🔍

>>105965670
>>105965744
Like any model, it's just trying to give you outputs it thinks you wants. Tell it what you want rather than assuming that a corpo model is going to automatically be on the same wavelength as you.

Replies: >>105965927

Anonymous

7/20/2025, 12:32:23 PM No.105965831

>>105965784
I can't check them myself because I have no idea what your settings look like. You might be using text completion with the wrong template for all I know.
At the very least I was correct that what you have shown in your screenshot is far from what you actually sent to the model.

Replies: >>105965872 >>105965967

Anonymous

7/20/2025, 12:33:01 PM No.105965837

>>105963864
It makes a lot of sense that a woman would try to stop alpha chads ITT from posting doll pictures.

Anonymous

7/20/2025, 12:38:34 PM No.105965872

>>105965831
I already told you what I'm using. Maybe your settings are wrong, or you're just assuming a model will talk like a loli-loving 4chan poster with zero prompting? If you're the anon who's been doing the mesugaki tests you may as well stop, it's pointless. Cockbench at least makes some sense because it provides context and isn't going to be tethered by the model's intended personality so much.

Replies: >>105965918

Anonymous

7/20/2025, 12:44:33 PM No.105965918

>>105965872
>I already told you what I'm using
You didn't. I asked for the tokens. Sillytavern does all sorts of stupid shit that most people aren't aware of and then they are surprised when they realize a card was overriding their settings.

>Maybe your settings are wrong
I don't use sillytavern.

>If you're the anon who's been doing the mesugaki tests you may as well stop
I posted at least one mesugaki test but most of mesugaki tests I saw posted in the thread weren't by me. Not like it matters since it's just one message using the chat template.
If you're doing mesugaki tests using sillytavern then that's wrong because it's impossible to make comparisons.

Replies: >>105965938

Anonymous

7/20/2025, 12:45:40 PM No.105965927

>>105965829
breaking free = dialing up the slop

Replies: >>105965938

Anonymous

7/20/2025, 12:46:46 PM No.105965937

We need to ban any mes*gaki posting on sight

Replies: >>105966549

Anonymous

7/20/2025, 12:46:50 PM No.105965938

>>105965918
I'm not doing the mesugaki tests, just demonstrating that gemma isn't unusable like some anons here think.
>>105965927
Feel free to name a model without slop

Replies: >>105966024

Anonymous

7/20/2025, 12:48:55 PM No.105965953

>>105959558 (OP)
Should I be using weighted/imatrix or static quants for Q4_K_M?

Replies: >>105965965 >>105965990

Anonymous

7/20/2025, 12:50:10 PM No.105965965

>>105965953
Only use q8 and above.

Anonymous

7/20/2025, 12:50:22 PM No.105965967

>>105965831
For what it's worth, with Gemma 3, the more outrageous the instructions, the closer to the head of the conversation they should be kept. The default "AI assistant" and the "OOC" personas seem to trigger cuckery more than the actually roleplayed characters. Your mileage may vary (it is possible to define an assistant of that type, but you have to be detailed).

I'm using chat completion with "merge consecutive roles" enabled, and enclosing the instructions (using the "user" role") within tags that the model should be able to identify usually at depth 3 or 5 (with the user being the first message). In this way it's uncensored for my purposes, but I'm not into torture or baby fucking. But yes, actual sex scenes are lackluster (although I far prefer the buildup phase anyway, and Gemma 3 is good at that).

Anonymous

7/20/2025, 12:51:10 PM No.105965974

1749042135145339

md5: 5e0a7514092ff60f02afb9592cec8b88🔍

>>105963244
Lol

Replies: >>105965983

Anonymous

7/20/2025, 12:52:36 PM No.105965983

>>105965974
you just know

Anonymous

7/20/2025, 12:53:52 PM No.105965990

>>105965953
imatrix is always a straight upgrade unless you're using some obscure hardware that doesn't like imatrix

Replies: >>105966003

Anonymous

7/20/2025, 12:55:19 PM No.105966003

>>105965990
A straight upgrade to slop for sure.

Anonymous

7/20/2025, 12:57:32 PM No.105966024

>>105965938
I wasn't saying that gemma is unsable, just that you can't get that output with what you showed in your screenshot.
You have since reveled the existence of a system prompt.
How all that text is arranged, where the contents of the card are, and whether you have any prefixes or suffixes on your messages is still unknown and will remain unknown until you post the exact string of text that the model receives.
Using "{char}:" as message prefix alone is usually enough to make gemma compliant.

Replies: >>105966308

Anonymous

7/20/2025, 1:02:18 PM No.105966060

Nemo my name forever more.

Anonymous

7/20/2025, 1:24:03 PM No.105966224

0-03

md5: 5c6f19128ecf01803f81ae3e972b8f31🔍

Anonymous

7/20/2025, 1:24:58 PM No.105966231

1737622363859331

md5: 4b230c17c3a6934df14dc4ca67832abb🔍

>>105965744
Amazing

Replies: >>105966291

Anonymous

7/20/2025, 1:30:15 PM No.105966277

>>105965568
I tried magpan just now and the first message I got was “hehehehaahaher like letting me pee outside your house sometimes!!!” Completely unprompted

Anonymous

7/20/2025, 1:32:50 PM No.105966291

178109

md5: 1b452495bfd57cbdee12944bf0bc2b0d🔍

>>105965670
>>105965744
>>105966231
>mesugaki

Replies: >>105966302 >>105966365

Anonymous

7/20/2025, 1:34:19 PM No.105966302

>>105966291
This user is a necrophile, beware.

Replies: >>105966342

Anonymous

7/20/2025, 1:34:43 PM No.105966308

>>105966024
>You have since reveled the existence of a system prompt.

Anonymous

7/20/2025, 1:35:30 PM No.105966314

Forget it, it's still retarded

Anonymous

7/20/2025, 1:39:35 PM No.105966342

1722273569003251

md5: 31cbf99cafd9cc384abbd9e84ef3933e🔍

>>105966302
>This user is a necrophile, beware.

Replies: >>105966370

Anonymous

7/20/2025, 1:40:46 PM No.105966358

>>105965438
i don't care what anyone says i still love mistral large.
yeah whatever its not deepseek but its got soul.

Replies: >>105966369

Anonymous

7/20/2025, 1:41:37 PM No.105966365

>>105966291
Most foul fucking thing I’ve read, fucking hell

Anonymous

7/20/2025, 1:41:51 PM No.105966369

>>105966358
Speaking of mistral large, where is it? They promised it a while ago.

Anonymous

7/20/2025, 1:41:56 PM No.105966370

>>105966342
the fuck is this shit

Replies: >>105966537 >>105966793

Anonymous

7/20/2025, 1:43:16 PM No.105966380

llama5 dense 70b eoy trust zuck

Replies: >>105966546

Anonymous

7/20/2025, 1:46:13 PM No.105966408

mistral large 3 will be dense. mistral is the only company who caught the moe delusion syndrome earlier than the rest and successfully recovered from it after 8x22b was a huge piece of shit. they are immune to this wave that caught all the others after deepseek.

Replies: >>105966503

Anonymous

7/20/2025, 1:52:10 PM No.105966460

>Don't tell me you need more details. You’re just feeding the beast.
Of course I asked for more, stupid Genma lol

Anonymous

7/20/2025, 1:57:24 PM No.105966503

>>105966408
Due to the EU AI Act they can't train anymore models using more than 10^25 floating point operations without severe legal burdens, so it has to be a MoE model. It will easily be something similar to the tried-and-tested DeepSeek R1/V3 formula.

Anonymous

7/20/2025, 2:00:57 PM No.105966537

>>105966370
Autism.
You'll quickly notice he always posts the same images too.

Anonymous

7/20/2025, 2:02:22 PM No.105966546

>>105966380
Nobody really cares anymore about Llama models.
>I cannot help with that.

Replies: >>105966557

Anonymous

7/20/2025, 2:02:40 PM No.105966549

>>105965937
I'm more concerned about jewish Rabbis mutilating babies and then sucking their dicks to be honest. The other thing is just words on some weirdos computer screen.

Anonymous

7/20/2025, 2:03:06 PM No.105966551

build

md5: fcbb18da9b886d76593c67b7cca83930🔍

What kind of pc build do I need to run something like 405B Model (Llama 3.1)? Any videos or guides of anyone running this locally?

Replies: >>105966669 >>105966986

Anonymous

7/20/2025, 2:04:07 PM No.105966557

>>105966546
What models are they using instead then?

Replies: >>105966605 >>105966699 >>105966852

Anonymous

7/20/2025, 2:10:01 PM No.105966605

>>105966557
Either more recent mid-size models in the 24-32B range from other companies or bigass MoE models mainly from the Chinese. Llama 4 was a failure; Llama 3.3 70B which seemed kind of OK got released last December. Yet another Llama model pretrained on ultra-filtered data and finetuned on stale, sloppy datasets with stubborn refusals will be quickly forgotten again.

Anonymous

7/20/2025, 2:11:48 PM No.105966619

Some slop I've been noticing a lot recently with R1:
"airs smells of ozone", sometimes earthy
context: magic, lewd
I've seen it happen in a few recent unrelated stories. What's with this?

Replies: >>105966640 >>105966652 >>105966844

Anonymous

7/20/2025, 2:15:10 PM No.105966640

>>105966619
>What's with this?
Overfitting as a result of distillation

Replies: >>105966676

Anonymous

7/20/2025, 2:17:21 PM No.105966652

>>105966619
That's just one of 0528's most prevalent slop phrases. K2 does it too.

Anonymous

7/20/2025, 2:20:01 PM No.105966669

>>105966551
Get yourself 512gb of vram.

Anonymous

7/20/2025, 2:20:29 PM No.105966676

>>105966640
But it's R1, it's barely saturated, if it was true, 1-2bit quants wouldn't work as well.

Anonymous

7/20/2025, 2:23:39 PM No.105966699

>>105966557
Everyone is using DeepSeek V3/R1 or coping.

Anonymous

7/20/2025, 2:26:19 PM No.105966722

>>105966718
>>105966718
>>105966718

Replies: >>105967460

Anonymous

7/20/2025, 2:34:36 PM No.105966793

>>105966370
How new?

Anonymous

7/20/2025, 2:39:59 PM No.105966844

>>105966619
Seen happen a lot with gemini too.

Anonymous

7/20/2025, 2:41:22 PM No.105966852

>>105966557
Magistral seemed usable for its size.

Anonymous

7/20/2025, 2:46:39 PM No.105966898

>>105959907
No I actually did get an A100 for sub 50 usd because it is impossible to cool supposedly.

Gonna test a 4090 cooler adapter first then move to liquid cooling.

Anonymous

7/20/2025, 2:57:31 PM No.105966986

>>105966551
i see you have no idea what you are talking about. what is your use case? budget?

Anonymous

7/20/2025, 3:33:25 PM No.105967236

Anybody tried
>https://github.com/p-e-w/waidrin
yet?
It was posted a while ago but I don't think I've seen any discussion on it.
Any techniques we could steal for our own RP?

Replies: >>105967351 >>105967386

Anonymous

7/20/2025, 3:47:12 PM No.105967351

>>105967236
>

At each turn, the player is presented with a choice of multiple AI-suggested actions, but they can also provide a different action as freeform text. This blends a classic CYOA experience with the limitless freedom of generative AI. It's an RPG unlike any other.
Brilliant, much better for one handed use.

Are there any small modern MoEs tuned for roleplaying to use this with?

Replies: >>105967396

Anonymous

7/20/2025, 3:51:51 PM No.105967386

>>105967236
It's built very incompetently while trying to appear as a serious project, i wouldn't pay much attention to it.
Ideally we need something like comfy but for text and with execution flow control, for people to build their own logic and shit. I've seen an anon post something of this sort the other day, I wonder if he had this in mind

Anonymous

7/20/2025, 3:52:53 PM No.105967396

>>105967351
>At each turn, the player is presented with a choice of multiple AI-suggested actions
There's an extension that does that no?
At least I think I remember anons doing that in Silly.
I just have the model do that with a low depth instruction.

>Are there any small modern MoEs tuned for roleplaying to use this with?
No idea.
Hell, the only small MoE I can think of are Mixtral 8x7b and Qwen3 30BA3B.

>It's built very incompetently while trying to appear as a serious project, i wouldn't pay much attention to it.
Interesting. Can you detail that?
Regardless, I'm more interested in the ideas than the application itself.

Replies: >>105967464

Anonymous

7/20/2025, 4:00:38 PM No.105967460

>>105966722
>no miku OP
>mikutranny meltsdown and shits in thread with LARP, softcore porn and unironic LLM posts
Never change.

Anonymous

7/20/2025, 4:01:48 PM No.105967464

>>105967396
>Can you detail that?
Among other things, magic values, e.g. the model is asked (with a hard-coded prompt) to generate 5 new characters for each new location. Such system must be user-defined, as I said, with logic exposed and large systems pre-implemented.

Anonymous

7/20/2025, 5:24:10 PM No.105968172

>>105964300
Most degenerate fetish