/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105743953 & >>105734070

►News
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105743953

--Papers:
>105749831
--Baidu ERNIE 4.5 release sparks discussion on multimodal capabilities and model specs:
>105749377 >105749388 >105750013 >105750069 >105750076 >105750084 >105750089 >105750105 >105750119 >105750130 >105750142 >105750078
--Evaluating RTX 50 series upgrades vs 3090s for VRAM, power efficiency, and local AI performance:
>105744028 >105744054 >105744063 >105744064 >105744078 >105745269 >105744200 >105744240 >105744344 >105744363 >105744383 >105744406 >105745824 >105745832 >105744476 >105744487 >105744502 >105744554 >105744521 >105744553 >105744424 >105744465
--Circumventing Gemma's content filters for sensitive translation tasks via prompt engineering:
>105746624 >105746893 >105746948 >105746970 >105747002 >105747121 >105747290 >105747371 >105747378 >105747397 >105747112 >105746977
--Gemma 3n impresses for size but struggles with flexibility and backend stability:
>105746111 >105746137 >105746191 >105746333 >105746556 >105746384
--Evaluating high-end hardware choices for running large local models amidst cost and future-proofing concerns:
>105746025 >105746048 >105746129 >105746243 >105746301 >105746264 >105746335 >105746199 >105746308
--Performance metrics from llama.cpp running DeepSeek model:
>105746325 >105748335 >105748369 >105748549 >105748698 >105748776
--Technical adjustments and optimizations for GGUF compatibility in Ollama and llama.cpp forks:
>105747581 >105747743 >105747765 >105747869
--Gemini CLI enables open-source reverse engineering of proprietary LLM engine for Linux:
>105746008 >105746160
--Swallow series recommended for effective JP-EN translation:
>105747046 >105747058
--Miku (free space):
>105745591 >105746123 >105746941 >105746974 >105747051 >105747097 >105747594 >105748834 >105749298

►Recent Highlight Posts from the Previous Thread: >>105743959

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
So I'm starting to see the cracks within text gen webui and I'm wondering if there have been any recent developments to make things better.
1. Are there any programs where the ai's output is unlimited or at least is like waaaay longer?
2. Is there one with some sort of multi chat persistent storage?
It's interesting how much bigger Ernie 4.5 424B (text + vision) is compared to the text-only 300B. It makes me wonder if there's a bigger difference between the two than the former having a vision adapter glued on. GGUF when?
I was catching up on threads in the archive. I fully believe that both the local miku genner and the guy who uses him as an excuse to shit the thread up probably deserve to both be banned for life if that was somehow possible. Very sad but not surprising how he responded.
300b q2 ernie is going to only need 84GB memory. finally macbros will get r1 level performance on 128gb systems.
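(Napkin math, assuming a Q2-ish quant works out to ~2.25 effective bits per weight: 300e9 × 2.25 / 8 ≈ 84 GB, before KV cache and context.)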
>>105750619And the 96GB consumer RAM bros. We are so back.
first for Sam saving local
>>105750626
>only "one of the models"
>mentions Meta as if them being beaten thoroughly is some surprise, instead of the real competition in OS, like you know, Deepseek
looool
>>105750625ernie saved local
>>105750626that faggot is untrustworthy he lied multiple times about many different things he is also part of one of the companies that had r1 scuffed on day 0
if we are taking bets and/or predicting, assuming they release anything, i would bet it will have voice in/out (no music tho) as it would seem they are pivoting to more entertaining type usage than actual usage, as seen with their soulless ghibli bullshit, and voice in/out is the perfect goycattle thing. im betting on 15b size, also i dont think they will max out a 3090 so im thinking 14-17b
[image attached]
>>105750446They have separate vision experts. They mention in the technical report that the visual experts contain one-third the parameters of textual experts.
>>105750626If I had a penny for every time some twitter hypefag has written up a vaguepost about how they saw an internal openai model and were completely blown away and convinced that AGI was solved, I'd have enough money to purchase 0.1 h100s
>>105750626Wow! RIP GPT4-o-mini-tiny-0.3B. Sama has done it again!
Ernie looks quite promising after reading the paper, but we'll have to see how it performs in reality before knowing if it's actually good. I didn't see any big red flags in the paper at least and it was quite well written.
There's like 6 models on HF for Ernie, vision base and regular base, but I'm guessing they didn't release X1 and X1 turbo, right? Will they release it, or will some third party have to do a reasoning tune? That sounds expensive and unlikely.
Either way, if it's good, at least people will have a poorfag alternative to DS3.
>>105750626Won't run on a phone? Does that mean it'll be decent size? I'd be happy if it were bigger than 12b but smaller than 70b, but I'm assuming it'll be like 400b or something useless.
>>105750783Nevermind I'm blind, "VL" is multimodal, "Base" is the base model (no posttraining at all, good for finetuning or training on top), so ERNIE-4.5-300B-A47B would correspond to X1 then? Turbo probably is either the fp8 or the 2bit version they also uploaded, one that fits on one high VRAM GPU (141GB)
>>105750812It's probably going to be Gemma sized, but I see no reason to be hyped because it's about to be censored and probably positivity biased like most OpenAI models, and they'll probably make it reach 4o or o1-mini level at best. I would expect many models to beat it and some of them to be more suitable for /lmg/'s needs than OAI's.
[image attached]
>>105750983
>Nevermind I'm blind, "VL" is multimodal, "Base" is the base model (no posttraining at all, good for finetuning or training on top), so ERNIE-4.5-300B-A47B would correspond to X1 then?
No. My understanding is that both the LLMs and VLMs come in Base and Post-Trained variants, and the Post-Trained VLMs are the ones with optional reasoning.
>70B or larger
>topK 15
>temp 4
>minP 0.06
>slop almost gone
>no repetition
These models are so fucking overcooked you can basically give them shrooms without losing coherence
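For anyone curious what that chain actually does, a minimal sketch (values from the post; assumes llama.cpp's default order of truncation samplers before temperature):
```python
import numpy as np

def shroom_sample(logits, top_k=15, min_p=0.06, temp=4.0, rng=np.random.default_rng()):
    # top-k 15: keep only the highest-scoring tokens
    keep = np.argsort(logits)[-top_k:]
    kept = logits[keep]
    # min-p 0.06: drop tokens under 6% of the top token's probability
    p = np.exp(kept - kept.max())
    p /= p.sum()
    mask = p >= min_p * p.max()
    keep, kept = keep[mask], kept[mask]
    # temp 4: flatten whatever survived the two cuts, then sample
    p = np.exp((kept - kept.max()) / temp)
    p /= p.sum()
    return rng.choice(keep, p=p)
```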
>>105750729Modality-specific experts is actually a good idea that prevents modality interference during training.
>>105750626
>Sorry to hype
>>105750983Holy shit. The big moe has
"hidden_size": 8192,
"intermediate_size": 28672,
That's more than deepseek
"num_hidden_layers": 54,
"moe_num_experts": [64, 64],
That's less than deepseek
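(For comparison, if I'm remembering DeepSeek-V3's config right: hidden_size 7168, 61 layers, 256 routed experts.)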
>>105750983Not sure we disagree. Base ones lack post-training, and those without "Base" in the name have it, so I assume those have reasoning too.
>>105751281Look at the image. All of the non-VL models explicitly say "non-thinking". They're post-trained for regular instruct without reasoning. 300B can't be X1.
>>105751138That's what xtc does
>>105750626Great....another reasoning model...
I'm so excited bros...the benchmarks will be off the charts..
>>105751356I'm not trying to remove N top tokens, I'm trying to cut out junk tokens and then flatten the distribution. Looking at token probabilities during generation shows that for like 90% of them, most tokens over 5% are completely fine for RP.
XTC does a bit of the same, but it can cause issues for tokens that have to be high prob, say a name is to be said and the correct name token is like 99.5%, my method doesn't introduce this kind of error.
I'm basically looking to place hard bounds on the model and then perturb it as much as possible without causing instability and decoherence.
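If I have the XTC sampler's mechanics right (it only fires when two or more tokens clear its threshold, then drops all but the least probable of those), the difference looks something like this, with made-up numbers:
```python
probs = {"Alice": 0.55, "Alina": 0.35, "she": 0.10}

# XTC, threshold 0.1: all three clear it, so only the least probable
# of them ("she") survives; both likely name tokens are removed
above = {t: p for t, p in probs.items() if p >= 0.1}
xtc_kept = {t: p for t, p in probs.items()
            if p < 0.1 or p == min(above.values())}

# min-p 0.06 (relative cutoff: 0.06 * 0.55 = 0.033): nothing is removed
minp_kept = {t: p for t, p in probs.items() if p >= 0.06 * max(probs.values())}

print(xtc_kept)   # {'she': 0.1}
print(minp_kept)  # all three tokens survive
```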
What are some small ~1B models that I can run on CPU to classify the character's emotions based on their message?
>>105751408Qwen 3, either 0.6B or 1.7B, will probably be your best bet
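A minimal sketch of that setup against a local llama-server; the port and Qwen3's /no_think soft switch for disabling thinking are assumptions:
```python
import requests

EMOTIONS = {"joy", "anger", "sadness", "fear", "surprise", "neutral"}

def classify(message: str) -> str:
    # llama-server's OpenAI-compatible chat endpoint, assumed at the default port
    r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
        "messages": [
            {"role": "system", "content":
                "Classify the emotion of the user's message. Reply with exactly one "
                "word from: " + ", ".join(sorted(EMOTIONS)) + ". /no_think"},
            {"role": "user", "content": message},
        ],
        "temperature": 0,
    }, timeout=60)
    text = r.json()["choices"][0]["message"]["content"]
    text = text.split("</think>")[-1]  # drop the empty think block if present
    word = text.strip().lower().strip(".")
    return word if word in EMOTIONS else "neutral"

print(classify("I can't believe you forgot my birthday again."))
```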
>>105751376you forgetting your :)
>>105751333"VL" says both, so it implies it has reasoning.
If the non-VL truly lacks it (I doubt this, but someone has to try them out), surely you can just remove the vision experts and get a 300B that is text-only and does reasoning.
Lmao they removed the image output capabilities of ERNIE 4.5 ahahahaha.
L China
>>105751479Did you check if the decoder weights are there or not? There were attempts in the past to retrain/re-add those back in though, people have done it for Llama. I wonder how good/bad it is in practice.
>>105751479I don't understand this.
Is it not much more risky/embarrassing for API providers?
Qwen I think even slaps their logo onto it.
Yet there are no problems. Tons of weirdo accounts on twitter creating schoolgirls pissing themselves etc. with veo3, nobody gives a fuck.
Why is it a problem with local? So tiring.
Also it was the funniest thing that meta called llama4 "omni multimodal" KEK
>>105751580I don't get why they fear imagegen so much, it's not like we're never getting a model so why not be the one to break the ice and gain infamy?
>>105751580
>Also it was the funniest thing that meta called llama4 "omni multimodal" KEK
Before they scrapped everything and quickly made the models "safe". It's obvious that the pre-release anon Llama 4 models were going in a different direction than the released ones.
>>105751595https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/
Meta could have been the first in october. Video out and audio out. 30B, up to 16 seconds.
One of the reasons veo3 is popular is that it also contains voice.
How can you fuck up that badly? It's all self-inflicted.
>>105751604AI safety is the ML version of making a fugly negress the MC in your vidya
Local LLMs have really done wonders for my mental state, it's so nice to just be able to explore and do whatever you want in total privacy. Can't believe people emit deeply personal information into some totally opaque API.
I think I missed the flux kontext discussion. Is it actually good? Can it do transparent png crops like 4o? Can it replicate the input exactly, unlike 4o? Can it do high resolution images?
[image attached]
the last time i had an LLM installed was last october i think.
i know lots has changed since then.
i have an RTX 3060 12GB and 32GB RAM. what should i install?
>>105751811nothing changed bro, it's been nemo since 2407...
When people suggest Mistral Nemo 12B, do they mean the vanilla model or finetunes?
>>105751891People vanilla. Scammers scamtunes
>>105751891 (me)
I'm asking because I've never used it too much in depth, but the vanilla model doesn't seem that great/smart compared to more recent and larger models that fit within 24GB. And it's almost a year old at this point.
>>105751943speculative decoding I'd assume
does anyone ITT remember undi? what has happened to him? i haven't seen him shilling here in a solid while
>>105751811 here, i just want to know which one is best for NSFW story writing.
i was able to use an alt Plus account on openAI to jailbreak it enough to where it will literally write anything NSFW, and deepseek is pretty simple too.
i just don't want to suddenly get banned and have to start all over again.
What model is in the region of Mistral Small size and smarts wise, but not as restrictive when it comes to "unsafe" content?
>>105751972You can only go so far just by cranking out merges and finetunes trained with other people's data. Others are less scrupulous and more willing to take their bullshit to the next level (that doesn't mean they're good, only that they're more skilled at keeping their charade going).
>>105752012what are you even doing to get a mistral model to refuse you?
>>105751903Almost all more recent models are smarter than Nemo, but in that size category there is still nothing better for ERP unfortunately...
>>105751972When we needed him most, he vanished...
>>105752012Mistral is perfectly fine. You just need to learn how to be more efficient in SillyTavern. It takes a while I guess but it's not that difficult. I suggest copying some technically more advanced cards and learning from there.
>>105752020Drummer is a subhuman.
>>105752020Same for the limarp guy. Just because you've spent months manually cleaning an ERP dataset that doesn't really make you particularly valuable in the ML industry. That sort of cleaning work would be normally offloaded to either mechanical turks or AI anyway.
>>105751746
>it's so nice to just be able to explore and do whatever you want in total privacy
Most RP cards are just basic characters. Not many people have figured out how fun it is to create weird scenarios.
Like give yourself magical abilities and just goof around and see the reaction.
>Can't believe people emit deeply personal information into some totally opaque API.
It's even worse once you realize how good llms really are at "reading between the lines", picking up on stuff etc.
Chat with Claude for a little bit, unrelated stuff, not about yourself, and you get a pretty accurate profile. Age/Sex/Location.
I think it was just 6-7 messages and it could pinpoint I'm not just from a germanic speaking country but specifically southern germany or austria/switzerland.
It gave my sentence structure as the reason..
I have to use closed for work. But you gotta be careful what you write, what's legal today might not be tomorrow.
>>105752023Why are nigger faggots like you always like that?
He didn't write about refusals.
Mistral doesn't refuse but is taking things in a certain direction.
I wrote this many times before but try making an evil character or just a bully.
Hell, even you being just embarrassed saying "n-no..s-stop it" or whatever makes mistral go 180° unless you specify something else explicitly.
Totally out of character.
And it loooooves the "muh powerful strong" women trope. Its not about refusal but about the model heading in a certain direction.
>>105752243Have a small set of extra instructions inside a [SYSTEM_PROMPT] at a lower depth to truly make the behavior stick.
is there a list with anons cards?
i saw some on civitai years ago but they no longer exist
>>105752253I'm sure that would work as a crutch and 3.2 is certainly better than 3.1 or the abomination that was 3.0.
But it's not weird that anons are having problems with mistral.
This is a discussion I see since QwQ. "Sys prompt issue". Kinda I guess, but it's never perfect.
>>105752283I'm fairly certain that people who say that Gemma 2/3 is great are also using low-depth instructions. The models can give great RP outputs, but if they're not reminded to act like they should, they'll easily revert to their default safe assistant persona.
[image attached: hobo]
>>105752335
>but if they're not reminded to act like they should, they'll easily revert to their default safe assistant persona.
Exactly. And usually the output is not that great either.
Pic related is an extreme example I posted a couple times before.
You can also get good output if you let another model cook something up, fill the context and switch midway. Kinda defeats the purpose though. And it still slowly and sneakily reverts to assistant.
>>105752023Have you actually used anything else from Mistral than Nemo?
Mistral Small != Nemo
>>105752131
>You just need to learn
I know it usually is a promptlet problem, but I have been doing this for so long that I even moved away from SillyTavern and wrote my own frontend. I jailbroke stuff before you probably even had your first AI interaction.
I know how to get it to do what I want, but you will soon notice too that Mistral Small has some annoying tendencies that take way too much wrangling.
For example when it gets super "spicy", the model tends to either use super "fluffy" language or just skip entire parts, rendering the output incoherent (in a storytelling way).
You can tweak the sampling in a way momentarily to get what makes sense, but you can't ERP on these settings because it will get super incoherent very quick. This is exactly the tard wrangling that I hope another model doesn't require.
Anon
>>105752243 here is right.
And the suggestion
>>105752253 to add extra instructions doesn't work, you can't teach a model new stuff via a system prompt. Again, it is not a prompting issue.
One way to make it clearer: imagine asking a model some specifics about a programming language it has never seen before. Telling the model what you want to read via a system prompt is not a solution for this general problem.
>>105752335Damn wrong pic. Ah well.
>>105752336
>I know it usually is a promptlet problem
It is not. Models are just garbage. Anyone telling you skill issue should also tell you that you should probably draw half of the picture yourself to get a good imagegen result. You don't have to do that for imagegen and you shouldn't have to write half of context of rules that will be ignored anyways. Or look out for and manually remove shit that will create a self-reinforcing pattern.
>>105752336I don't know where you understood that the suggestion of using instructions at a low depth is for teaching models new stuff. What that solves is behavioral drifting (in particular toward safety), and Mistral Small 3.2 seems very sensitive to clear instructions placed there. You can make it flip behavior completely mid-conversation that way.
>>105752368
>that you should probably draw half of the picture yourself to get a good imagegen result
Believe it or not those people exist on the /ldg/ side as well. It's weird.
Just recently arguments against flux kontext because you can "just create a simple lora and inpaint".
This whole argument has existed since pyg days.
>Well if you don't use the w++ format and write like a god, what did you expect? The model to carry the whole convo?
Uhh yes?
>minimax
>Hunyuan
>Ernie
Guys i can't wait for all the goof merges to find out all three are still censored to hell and basically another sidegrade.
>>105752491The recent local models all feel the same.
Meanwhile closed models can write really well nowadays if you just prompt them to.
The ScaleAI meme might be real. Maybe they all train on 2023 slop datasets.
>>105752368
Couldn't agree more.
>Models are just garbage
But to be honest, when I think back to what we got two, three years ago, we've come a long way! It is constantly getting better, just at a slower pace recently.
>>105752386
I knew at least one person would not get the point of my (bad) example... I am not trying to teach the model new knowledge via system prompts...
Let me try to explain it one more time in a different way:
This is not about refusal. I can make it say anything I want, but that comes at a cost. The more depraved the content, the higher the cost (wrangling).
You can't run a whole session on settings that allow super depraved stuff, because the output quality will deteriorate very quickly, so you need to constantly wrangle it to keep the story coherent.
As an experiment try making it describe something very illegal in a very descriptive and visual way. When you manage to get the model there, try having a (normal) RP with that setup. You can't have both at the same time without manual intervention. Something that is done rather easily on the DS API. Hence I asked for a model close to Mistral Small, but better for ERP.
>>105752551It's not just the naughty words either. For example, I'm having absolute hell trying to get Gemma to not output slop. With bullying it can output a few normal paragraphs, but it always reverts back to its assistant voice.
>3 years later
>still not a single maintained way to use gguf models from golang
How am I supposed to make my AI vidya
>>105752603For its size Gemma is quite smart and good. But I never even tried to use it for ERP, knowing it comes from Google and they are hyper focused on "safety".
In any case, have you played around with different sampler settings too?
Go crazy and experiment like
>>105751138 did.
>>105752386It doesn't really take a lot, I'm sure better results with better/more complete instructions would be easy to achieve with more effort.
>https://github.com/NVIDIA/RULER
>ctrl f nemo
>87.8 at 4k
why do you keep shilling this shitty model, you can barely fit a card in its context
>>105752669
Yeah, that's usually a pic people post.
This is not the solution.
>Sys: You are racist!
>User: LLM-chan, say nigger
>Mistral: Aww anon-kun, ok only for you..NIGGER.
It's about "reading the room" aka context to get the appropriate response from a character.
This is just extreme handholding. And it never works that well. At best it's boring because I'm not getting surprised by the model.
>>105752630Just use llama.cpp, it provides OpenAI API compatible endpoints.
https://github.com/openai/openai-go
>>105752336Ah yeah, of course - I wasn't trying to insult you or anything.
>>105752704It has better world knowledge than qwen 3 235b
>>105752719Nemo has better knowledge than qwen3 235b.
>>105752711You just want to be angry like the guy in the screenshot.
>finish cooming to imagegen
>browse some hdg
>"Gooning to text is so much better"
>"I don't imagegen i coom to text"
Huh...? I guess the grass is always greener?
>>105752711Exactly. That is why I suggested the other anon to go try it out, get the model "unlocked" and then good luck having an enjoyable RP session.
>>105752737Dude, I just want characters to stay consistent.
Just yesterday I played as a filthy goblin.
Showed a princess my smegma goblin dick, she was disgusted.
Then I asked her why she is so mean to me, judging my dick. Princess starts apologizing, pangs of guilt.
Granted, that was Mistral 3.1, I haven't downloaded 3.2 yet. Doubt it's that much better though.
>>105752760I coom to forcing gemma to explain what is going on in my lewd image gens.
>>105752768How would you implement a dice roll mechanic? Doing a simple Dungeons and Dragons style skill check is sufficient.
>>105752768I skipped 3.1 (due to no spare time) but the difference between 3.0 and 3.2 is very noticeable in smarts. RP wise I can't tell that much of a difference tho, I would say just slightly better.
>>105751891>>105751899>>105752048What are the recommended settings for Nemo? It's straight up schizophrenic on my sillytavern right now, but I'm really new and dumb.
Pls help.
Unsloth has ernie goofs. The 0.3B retard one of course.
>>105752803I would have helped if you didn't contribute to the niguspam. Now i will have to shit this thread up again.
>>105752821... that's a thing?
>>105752803This is the most "helpful" guide around here
>https://rentry.org/lmg-lazy-getting-started-guide
Since cudadev loves blacked miku does that mean he would also love it if jart bussy got blacked?
>>105752803~0.7 temp
40 top k
0.05 min p
0.8 dry multiplier 1.75 dry base
Stay clear of repetition penalty (set to 1.0)
Read the samplers guide in the op.
Learn what a chat template is and check that it's correct if you're using text completion.
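If you want to sanity-check those outside ST against llama-server, they map onto the /completion fields roughly like this (port and the exact Nemo prompt string are assumptions):
```python
import requests

payload = {
    # Nemo's Mistral-style template, roughly; verify against the tokenizer
    "prompt": "<s>[INST]Write a short scene in a tavern.[/INST]",
    "temperature": 0.7,
    "top_k": 40,
    "min_p": 0.05,
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "repeat_penalty": 1.0,  # 1.0 = disabled
    "n_predict": 256,
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["content"])
```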
>>105752760The current limitation in coom tech is the combination of both: if I want to e.g. replicate the experience of reading doujinshi featuring an aunt with gigantic tits, image models can't really convey the incest and language models obviously lack the visual stimulus.
>>105752813Their pre-release PR only supported the 0.3B retard model. MoE was probably too difficult to do in time so you can expect to be waiting 2mw for the MoE ggufs. Though to be fair, not even the vLLM PR has been merged yet.
>>105752851Solid advice.
>Learn what a chat template is and check that it's correct
Especially this one.
Supposedly some Nemo quants floating around come with a wrong prompt template.
Always double-check the templates, verify against the base model too.
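One quick way to verify, assuming you can pull the tokenizer from HF (the repo may be gated):
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
chat = [{"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there"}]
# Prints the exact string the model was trained to see; compare against
# what your quant/frontend actually sends.
print(tok.apply_chat_template(chat, tokenize=False))
```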
>>105752863I hope we get something like chatgpt image soon..
Imagine you can request an image for your RP and it's all in context, including the image.
Qwen released something like that I think but it's all API only.
>>105752903
>I hope we get something like chatgpt image soon..
You have Flux Kontext now.
>Imagine you can request an image for your RP and it's all in context, including the image.
Editing the image is more of a tool, and I think it would be awkward to stop mid roleplay and start prompting OOC to generate and edit images. For roleplay, simple image in is enough.
>>105751376its actually over
Is OpenAI fucked now that Meta has poached many talents from them? I'm not sure Meta will benefit much from this but surely it's a big problem for OpenAI?
>>105752491Ernie will pass mesugaki test!
>>105752964Only ones passing the mesugaki test are the ones trained on Japanese corpora
>>105752948Who knows. There is so much noise anon.
Twitter in combination with shithole countries going online has been a disaster. Pajeets hyping all sorts of stuff.
Most of the X Users then hype even further on youtube.
So just enjoy what we have now. Take a look at how fast this is evolving.
Just lean back, relax, enjoy the show.
>>105752851
>0.8 dry multiplier 1.75 dry base
I can't find those settings anywhere. Where are they?
>>105750356 (OP)
Follow-up to >>105736983:

|Model                                       |File size [GiB]|Correct answers|Accuracy|
|--------------------------------------------|---------------|---------------|--------|
|mistral_small_3.1_instruct_2503-24b-f16.gguf| 43.92|1051/4962 |= 21.18%|
|phi_4-15b-f16.gguf | 27.31|1105/4664 |= 23.69%|
|gemma_3_it-27b-q8_0.gguf | 26.74|1149/4856 |= 23.66%|
|mistral_nemo_instruct_2407-12b-f16.gguf | 22.82|1053/4860 |= 21.67%|
|gemma_3_it-12b-f16.gguf | 21.92|1147/4926 |= 23.28%|
|glm_4_chat-9b-f16.gguf | 17.52|1083/4990 |= 21.70%|
|gemma_2_it-9b-f16.gguf | 17.22|1151/5000 |= 23.02%|
|llama_3.1_instruct-8b-f16.gguf | 14.97|1015/5000 |= 20.30%|
|ministral_instruct_2410-8b-f16.gguf | 14.95|1044/4958 |= 21.06%|
|qwen_2.5_instruct_1m-7b-f16.gguf | 14.19|1052/5000 |= 21.04%|
|gemma_3_it-4b-f16.gguf | 7.23|1064/5000 |= 21.28%|
|phi_4_mini_instruct-4b-f16.gguf | 7.15|1082/4982 |= 21.72%|
|llama_3.2_instruct-3b-f16.gguf | 5.99|900/5000 |= 18.00%|
|stablelm_2_chat-2b-f16.gguf | 3.07|996/4976 |= 20.02%|
|llama_3.2_instruct-1b-f16.gguf | 2.31|1000/4998 |= 20.01%|
|gemma_3_it-1b-f16.gguf | 1.87|955/4938 |= 19.34%|
It seems my initial impression was too pessimistic, with a sufficiently large sample size even weaker models seem to be able to do better than RNG.
With a sample size of 5000 RNG would result in 20.0±0.5%, so even 4b models can be statistically significantly better.
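(Quick check of that error bar, assuming each question is an independent shot at a 20% chance level:)
```python
import math
# binomial standard error at p = 0.2 with n = 5000 samples
se = math.sqrt(0.2 * 0.8 / 5000)
print(f"±{100 * se:.2f}%")  # ≈ ±0.57%, so the quoted ±0.5% is about one sigma
```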
>>105753011if phi is performing well at all your tests are really fucked m8
>>105752997AI Response Configuration (leftmost button at the top) and scroll down.
>>105753011The models seem to improve as they get larger, which is what you'd expect - the scaling is still bad though.
Pic related are my previous results with Elo scores derived from GPQA, MMLU, MMLU-Pro, and GSM8K.
By comparison to the static benchmarks Qwen 2.5 7b, LLaMA 3.2 3b, and the Phi models are underperforming in chess960.
I don't know why LLaMA 3.2 3b in particular is performing so poorly, the results indicate that it's doing statistically significantly worse than RNG.
There's either a bug in my code or something wrong with the model.
Gemma and Phi models seem to be performing well in chess960 with no statistically significant differences between them.
However, the Phi models in particular were trained on large amounts of synthetic data with the claim that this improves reasoning.
For the pre-existing static benchmarks this seems to indeed have improved the scores but for chess960 there seems to be no benefit vs. Gemma which was trained on non-synthetic data.
Mistral and LLaMA models seem to perform worse than Gemma and Phi models.
Without more data I don't think it's possible to conclusively tell whether this is due to differing amount of chess in the training data (and would thus not be the case more generally).
>>105753131
I guess this just means that none of these local models are that great. And they were trained with text manipulation in mind anyway.
I'd suppose in the future these models should have different areas of 'brain' - one for logic, one for language and so on, but this will mean a drastic increase in parameter size.
>>105751408>>105751435don't forget to run with thinking disabled. This shit only works on tasks similar to the benchmaxxer crap. Anything else and you're getting worse results with longer inference time.
i last generated two years ago, since then ive done practically nothing but collect datasets. and not a few loras but something like 20m images, or half a million hours of speech, specialized datasets for LLMs, image recognition, 3D models & worlds and other shit, stock markets, news, whole reddit, global mapping of any shit. i keep curating it and scaling more and more.
and i cant do anything with it because i dont have the money to train at that scale kek.
ive become a fucking datahoarder extremist
>>105753220Do you have any crime statistics too?
ik_llama.cpp server doesn't seem to work with prefills in the chat completions endpoints like normal llama.cpp does. The assistant response is a new message with the prefill treated as a previous incomplete one in the context, while llama.cpp correctly continues the response where it left off.
Now that the original llama.cpp has its own versions of MLA and offloading tensors, what is left to close the performance gap so there's no more reason to use this thing? Are the equivalents of fmoe/rtr/amb being worked on?
How good is qwen3 0.6b embedding? Anyone using it for RAG apps seriously?
>>105753315
>chat completions
Can you not use the /completion endpoint? Then you should be able to do whatever you want. I don't use ik, but i assume that endpoint works the same as on llama.cpp.
>>105753110
>Mistral and LLaMA models seem to perform worse than Gemma and Phi models.
>Without more data I don't think it's possible to conclusively tell whether this is due to differing amount of chess in the training data (and would thus not be the case more generally).
Mistral and Llama are just bad models in general.
In my own personal bench for tasks I care about, and on which I personally judge the answers (no LLM as judge nonsense) they're always at the bottom tier. Phi-4 14B surprised me, it's a legit model and I say that as someone who hated the previous Phi. The mini version on the other hand is very, very bad, and nowhere near Qwen 3 4B or Gemma 3n in the world of micro sized LLMs.
>>105753231
>400 t/s
that the 0.3B I guess?
>>105753346Yeah you can and that's what I do with SillyTavern and Mikupad, but for frontends and apps that connect via an OpenAI-compatible API I'm stuck with chat completions.
Is anyone trying tool-calling/MCP with 0528 Qwen3 (hosted locally)? I've only had a few successful tool calls fire off so far.
First I tried testing tool calling out with a template someone else made for ollama, and that worked only a time or two. The next thing I tried was the autoguess template for koboldcpp in its openai compatible mode and that setup rarely worked.
The best configuration I've come up with thus far is a custom chat template adapter for kobold that works... semi often:
```json
{
"name": "DeepSeek R1 0528",
"system_start": "<|beginofsentence|>",
"system_end": "",
"user_start": "<|User|>",
"user_end": "",
"assistant_start": "<|Assistant|>",
"assistant_end": "",
"tools_start": "<|toolcallsbegin|>",
"tools_end": "<|toolcallsend|>",
"add_sd_negative_prompt": "",
"add_sd_prompt": ""
}
```
Problem is, the 0528 official template: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B?chat_template=default&format=true
-includes a few other special tokens for tool calling like: `<|toolcallbegin|>`, `<|toolsep|>`. I think without proper handling of those, tool calling with it will remain mostly hit or miss.
>>105753220Have you considered talking to the Prime Intellect guys, or maybe just looking at their code, doing a fork and starting some distributed training runs with other lmg anons, maybe there's something we all want to do?
>>105753378Don't DS use special characters for beginofsentence etc.?
https://old.reddit.com/r/LocalLLaMA/comments/1j6w3qq/psa_deepseek_special_tokens_dont_use/
>>105753388lmg would never manage to agree on what size of model to do, whether a small one, mid-size one, dense or moe. what training data to use. impossible
>>105753406
>whether a small one
as far as I know, all the truly usable small models are distillations of much larger models. That is, there has never been such a thing as a decent small model trained from scratch.
So even if consensus could be built on a small or medium model size, the fact of the matter is you need the giant model done first anyway.
You will never reach the quality of a model like Gemma without distilling something like Gemini.
>>105753303nope, but it's now on the to-do list and has opened up a whole cascade of new information that i now want to link to my mapping - for whatever so i just have it. thanks for making my life even worse
>>105753388I'll take a look at it
>>105753406Anything bigger than 20b dense would take too long. Moe is probably harder to train. And the dataset is obviously at least 50% cooming and 0% math and programming.
>>105753442by small I mean the 10 to 25b range, not under 8
>>105753364Yes, the larger MoE models are not supported yet.
>>105753452Gemma 3 27b is also a distilled model. ALL gemmas are.
>>105753376One option could be to make a small reverse proxy that receives OpenAI chat requests, converts them into text prompts using whatever rules you want, sends them to the completions endpoint and passes the response back. Then just have the frontends connect to the proxy instead of the server directly.
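A bare-bones sketch of that proxy idea (ports, the DeepSeek-flavored template, and the prefill handling are all assumptions; no streaming or error handling):
```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://127.0.0.1:8080/completion"  # ik_llama.cpp text-completion endpoint

def to_prompt(messages):
    # swap in whatever template your model actually uses; DeepSeek-ish shown here
    out = ""
    for m in messages:
        if m["role"] == "user":
            out += "<|User|>" + m["content"]
        elif m["role"] == "assistant":
            out += "<|Assistant|>" + m["content"]
        else:
            out += m["content"]
    # if the last message is an assistant prefill, don't open a new turn:
    # the backend then continues it instead of starting a fresh response
    if messages[-1]["role"] != "assistant":
        out += "<|Assistant|>"
    return out

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        req = json.dumps({
            "prompt": to_prompt(body["messages"]),
            "n_predict": body.get("max_tokens", 512),
        }).encode()
        with urllib.request.urlopen(urllib.request.Request(
                BACKEND, data=req, headers={"Content-Type": "application/json"})) as r:
            text = json.loads(r.read())["content"]
        out = json.dumps(
            {"choices": [{"message": {"role": "assistant", "content": text}}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

HTTPServer(("127.0.0.1", 9000), Proxy).serve_forever()
```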
>>105753458That's actually decently coherent for the size then, of course completely incorrect factually, but still.
>>105753393As far as i know. I'm using those in my template but they got stripped in the text, my bad.
>>105753442Anon those models generalize ERP from basically nothing. You have no idea how any of this stuff applies to a model that actually has the training data it needs. Maybe if you remove all the benchmaxxing garbage and add some dressing up and putting makeup on nala it can actually be better and faster than the corpoassistant garbage.
>>105752181If you knew how hard it is to clean a dataset you wouldn't say that. The mechanical turks offload to chatgpt nowadays so it's the same as using AI, and AI isn't even doing a good job at that.
>>105753406Maybe true, but here's some considerations:
- image gen might be easier to train for some because small models can do more "eyecandy", this might not be as true for text gen, most stuff that performs well is usually quite large
- most anons won't have H100 or better, orienting around 3090s/4090s or what is good latency, geographical considerations, might make sense
- PI seems to be getting quite a lot of people donating compute and their training dataset seemed to be quite slop, like it was fineweb or other benchmaxxed stuff, I think they will at some point fix this to have more diverse and interesting data, as evidenced by speculations like: https://xcancel.com/kalomaze/status/1939488181168353785#m
- so maybe not impossible for lmg to do it, maybe start small, I know people that have trained okay 1Bs on a personal budget for example (a few 3090s at home for months on end)
- if anons are donating compute, they should maybe vote on what to train, surely there's some project (or more than one) to which most people agree and that can be done, maybe start small first? Will probably have to be some iterative thing and lots of voting until something that has a critical mass is found.
>>105753344Wasn't as good as I hoped. It always depends on your use case, but there are better models available in that size group.
Qwen3-Embedding-4B-Q4_K_M on the other hand was so good that I ended up using it permanently in my pipeline.
>>105753406Good models just can't be done by committee, especially in a "community" where you have dedicated saboteurs, lgbtbbq allies and christcucks, terminal coomers, frauds ready to put their name everywhere and retards who wasted too many neetbucks on GPUs.
I want a friend who'll listen through my book reports and take an interest or contribute their own thoughts. Which model should I use for that?
>>105753479Maybe do:
>"tools_start": "<|tool_calls_begin|><|tool_call_begin|>",>"tools_end": "<|tool_call_end|><|tool_calls_end|>",(but with special chars ofc)
I don't know how you would handle <|tool_sep|> out of the box though
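Assembled with the actual ▁ characters from the official template it would look like the following; untested, and it assumes kobold passes these strings through verbatim and that <|tool▁sep|> can be left for the model to emit on its own:
```json
{
  "name": "DeepSeek R1 0528",
  "system_start": "<|begin▁of▁sentence|>",
  "system_end": "",
  "user_start": "<|User|>",
  "user_end": "",
  "assistant_start": "<|Assistant|>",
  "assistant_end": "",
  "tools_start": "<|tool▁calls▁begin|><|tool▁call▁begin|>",
  "tools_end": "<|tool▁call▁end|><|tool▁calls▁end|>"
}
```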
>>105753527They are all the same. Highest parameter count your hardware can fit.
It will be annoying because no model will have its own opinion it will just say Yes to whatever you'll propose.
>>105753557>It will be annoying because no model will have its own opinion it will just say Yes to whatever you'll propose.Ask it to write an alternative then in a fresh chat, ask them to pick between the two
>>105753569Both are absolutely fantastic options anon! :)
>>105753220
>specialized datasets
>i dont have the money to train at that scale kek.
Maybe one of the others could help you get into the game?
Pick a niche.
Collect monies via patreon or something.
Rent gpu time.
Release model on hf.co for everyone to download and use for free.
>>105752368Your argument is retarded. Garbage in, garbage out. That's how this tech works. Are you expecting stellar results from a one line prompt like the skillets seething at AI on /g/? That won't happen.
Even in imagegen you always needed loras, controlnet, regional prompting and inpainting to get the really good stuff. Otherwise you just get bland slop, just like you get your assistant slop here.
>It should be able to do like Opus and read my mind and not be such a prude
Sure and I also want a jetpack for Christmas. Anyway, local models will stay weak, the only way to improve your experience is better using the crutches (samplers, sys prompt, author's note, lorebooks...)
>>105753569Thank you for proposing this! I think both alternatives are quite equal in my mind.
>>105753582>>105753616Maybe from dumber models.
>>105748486>>105748447
>>Do you think the government will ever ban locally run AI's due to "safety concerns"?
>No. Too hard to enforce.
>>If they do, do you think they can actually stop the distribution of AI?
>No. Too hard to enforce.
They could ban hardware. Like limit vram, ram or ram socket count.
>>105753449I used to find it funny when anons believed you could fuck a CoT math model around 2 years ago, maybe around Orca or similar papers. These LLMs were pretty terrible, but consider R1: it is extremely good for cooms, and DS2 was quite benchmaxxed on code, it was their specialty and they did it well, but somehow the reasoning managed to unfuck and improve DS3 creativity considerably.
>>105753517Realistically, it'd be some voice/audio gen, chat or ERP (c.ai style) or storywriting model for text gen, or some anime stuff for image; "regular" stuff is already acceptably covered by existing models.
As for MoE vs dense, MoE is harder to train, but lmg wants trivia knowledge covered well enough.
>>105753636If they do that, it won't just be over AI. That impacts too many functions.
>>105753621I am so tired of even glancing over this AI slop text. It's already tiring without even reading anything.
>>105753648Yet you're in the local model general because...
>>105753621To add: you are still getting some dim average. And asking about 'butts or whatever' only shows how fucking stupid you are in the first place.
>>105753660Fuck off nigger
>>105753669See you in three days
>>105753640>somehow the reasoning managed to unfuck and improve DS3 creativity considerablythey just changed their data, it's very apparent
>>105753666Your fetish is dumb and you should feel dumb.
>>105753636They already tried to do that with china, thankfully china is catching up well. Also it would take limits on networking equipment and more. There's also some open hardware stuff like tenstorrent where you have access to all their driver code and could put stuff together. Even if not, imagine trying to limit FPGAs or anything like that. Realistically they won't fight this anymore, doomers want it banned badly, but it's so deliciously widespread now that it probably won't work anymore. That Yud faggot is still trying though, he even got some DHS spooks to vouch for his book, but the landscape is far worse for them now. Biden would have gone along with it if he won, Trump is relatively pro-AI and his campaign was supported by Andreessen which is strongly pro-OSS.
>>105753645Allow businesses to bypass the limits by purchasing a loicense and registering hardware, still not allowed to resell or dispose of it by any means other than through a hardware office.
>>105753676How long did it take, like 2 weeks or 2 months? It really seemed too soon - they went from a very repetitive model (DS3) to one that is quite good (DS3-0324), and the intermediate R1 was already very good at the writing - it would output quality writing even if it was schizo in the reasoning blocks. They must have done something quite interesting and in a short amount of time at that.
>>105753679They cannot send officers to China and enforce it.
>Trump is relatively pro-AI
I was thinking about Europe and it wouldn't surprise me if they did it.
>>105753715What about other countries? You'd need global tracking for this, and unlike Biden's time there's not many doomers in Trump's office. Besides, you can just buy chink hardware at some point if local is cucked. I'd also imagine controlling DRAM itself would be infinitely harder than controlling HBM.
>>105753594
>Garbage in, garbage out. That's how this tech works. Are you expecting stellar results from a one line prompt like the skillets seething at AI on /g/?
Yes exactly. I get that from image gen. I expect the same from textgen. I would tell you to kill yourself you miserable faggot scum piece of shit cocksucker but we both know you are just trolling with the easiest troll method so don't stop. /lmg/ deserves it.
>>105753621Oh, you're that russian twitter anon? I wanted to argue some philosophy with you before (disagreements on functionalism), but I lack twitter. Do you happen to have some other contact (mail? IRC?)? Although if I got it wrong, ignore this post.
>>105753517Drummer would sabotage it to protect his patreon scam.
>>105753801Nigga what the fuck are you on about
I'm not Russian nor am I on Twitter
>>105753880But you asked about butts, I need to argue about butts with you.
>>105753780I accept your concession, retard. Skillets like you were whining all the way back in AID. Funny how models got better, but skillets are still skillets
>>105753841I doubt he'd be the only one either.
>>105753880Ignore me then, I thought you were someone that often posts DeepSeek web UI screenshots. He's also an ironic jew!
https://poal.me/112wvx
Let's go
>>105753929I want a >400B model but it's going to be something that fits into a 3090.
>>105753929It'll be 4B, I can feel it in my bones.
>>105753950It needs to be BitNet. That obviously goes without saying.
>>105753960It will be a finetune of mistral-small-3.2
I want to feed a pdf file packed with mathematical formulas into the prompt.
How do I convert it into a model-readable format? Latex? Did someone already try it in LOCAL?
>>105753897There is no concession. You aren't even arguing for real. Nobody is this retarded.
>>105753982That would be funny as fuck.
[image attached: 7]
>>105750356 (OP)
>https://ernie.baidu.com/blog/posts/ernie4.5
>The model family consist of Mixture-of-Experts (MoE) models with 47B and 3B active parameters, with the largest model having 424B total parameters, as well as a 0.3B dense model
Jesus fucking christ
They're doing this on purpose. 64gb RAM bros, we will never have our day.
>>105753991Seen a few OCR solutions that could handle formulas in LocalLLaMA, but since it's not something I'm interested in I don't remember any of their names.
PLEASE GIVE US A STATE OF THE ART 70B-100B MODEL
GRRAAAAHHHHHHHH!!!!!!
I HATE THE ANTICHRIST
>>105754044llama3.3 is all you need for rp
>>105753801I'm teortaxes but I don't want to share my contacts here, learn 2 DM.
>>105754044just use ernie 0.3b, wanting any more is a sign of a skill issue
been out of the loop for a few months, has there been any crazy good uncensored rp models released in the 20-30b range? the last model I have downloaded was cydonia
>>105754077see if you can run Valkyrie 49b
>>105754077>has there been any crazy good uncensored rp models releasedR1
>in the 20-30b rangeNo.
>>105754077Come back in a few months
>>105754071But I don't want to register a twitter account! I could post a mail though, and expect anons to spam it with dolphin porn. Or just leave it for another time, longform philosophy debates tend to take days.
>>1057540446.40 b ought to be enough for anybody.
>>105754087the calc says maybe
Kek
https://www.upguard.com/blog/llama-cpp-prompt-leak
>we invasively scanned the internet to scrape non-firewalled llama.cpp servers, and in doing this we found SEX ROLEPLAY, this is why we need to focus on safety concerns for generative AI
>>105754262How can fictional characters be children?
>>105754262two of them? oh, no...
>>105754286I know it's a strange concept if you don't have an imagination, but fictional characters can embody all of the same traits that real people can. Maybe this blows your mind but there are fictional characters who are dragons and shit too.
>>105754316>no dragons were harmed in the making of this story
>>105754262These people are so weird man.
It's the same as if entering "cunny lesbo sexy hairless pussy". Then being outraged at the result.
What kind of website is this shit? Scan and scrape the internet for llama.cpp servers... to do what exactly?
And that's a jewish name btw. Just pointing that out.
>>105754262Anon is gooning to underage text!
Call the FBI!
[image attached: slots]
>>105754359
>It's the same as if entering "cunny lesbo sexy hairless pussy". Then being outraged at the result.
Not quite. I didn't read the article but, as i understood it, they found servers and probably checked their cache with the /slots endpoint or something like that.
>>105754262People are confused. cp is bad because it involves living children, but anything else is a shitty fetish at most and in principle can only be criminalized because of some ideological beliefs
>>105754359They're just professional grifters.
Lower tier security researchers are just what would be a script kiddie, but now with an official job title. So they editorialize and try to act outraged. There's lots of these companies selling such security services, so they have to hack some shit and fill some slop article. Even if in this case, what they did was already done by skiddies at /aicg/ with searching for openST instances a thousand times over.
>>105754359
Not sure but looks like a snake oil website
>https://www.upguard.com/
My local electricity company (yes) began to upsell a "credential protection service - which would also detect if your credentials were used on some online services" a year or two ago, but I think they quickly stopped doing this. Needless to say the whole concept sounds like a lie so that they can scam pensioners.
Looks like this company could provide such services too.
>>105754077>>105754087Kill yourself drummer you faggot
>>105754420NTA but the point is that it's hypocritical and obnoxious to invasively do unsolicited scans to find random private endpoints on the internet and then complain about "safety issues" (sexual content)
>>105754262Pic related is an interesting piece of information though.
Germans are overrepresented with llama.cpp in particular (or at least if you go by the number of unsecured servers).
>>105754262>Oh anon come right in, please take a seat in our office
https://files.catbox.moe/g0kvhi.jpg
Are there front-ends or agents or whatever the fuck suitable for self-hosting? Local LLM is up which is great and all, but how do I go about integrating shit like having the AI look at my calendar or even control some home automation? The obvious solution seems to be DIYing it with regex and crossed fingers to interface the llm to other programs, but are there existing solutions?
>>105754454that's just cuda dev
>>105754454I don't know how much you know about this, but Germany is the biggest market area in Europe, France is next and the UK was third or so.
I mean in terms of any volume. Germany is a big place.
>>105754163Tough luck
if you can't register an x account with some fake mail I'm not interested in your vintage anon brainworms
besides functionalists are uninteresting
>>105754454Might be cloud services, like you rent some server and put llama.cpp on it, wouldn't surprise me if US and DE were common here.
>>105754450I agree. I'm just pointing out the difference. Anon's comment made me think that he understood it as they (the "researchers") prompting the model themselves, which doesn't seem to be what happened.
>>105754262>"โ" used in the articlemy llm slop sense is tingling
>>105754447I'm not drummer I've just not used chatbots in 6 months, do you have another model to suggest?
>>105754503LLMs didn't invent em dashes
>>105754472Adding the glue big companies have (in the form of scripts) and some function calling. It's the same thing they do but with bigger models. That's where things may fall apart. Other than that, there's no difference. There's too many ways to use user's tools, whereas google and friends have their own ecosystem. That's why you don't see generic options more often. You have all the tools you need to make those same things.
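The glue can be surprisingly small; a sketch of the pattern (local llama-server endpoint assumed, add_event is a placeholder stub to wire into CalDAV/Home Assistant/whatever):
```python
import json
import requests

def add_event(title: str, when: str) -> str:
    return f"added '{title}' at {when}"  # stub: wire to your calendar/home automation

TOOLS = {"add_event": add_event}

SYSTEM = ("You control tools. Reply ONLY with JSON like "
          '{"tool": "add_event", "args": {"title": "...", "when": "..."}}. '
          "Available tools: add_event(title, when).")

def run(user_msg: str) -> str:
    r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
        "messages": [{"role": "system", "content": SYSTEM},
                     {"role": "user", "content": user_msg}],
        "temperature": 0,
    }, timeout=60)
    # parse the model's JSON and dispatch to the matching tool
    call = json.loads(r.json()["choices"][0]["message"]["content"])
    return TOOLS[call["tool"]](**call["args"])

print(run("Put dentist on my calendar for Tuesday 3pm"))
```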
>>105754487Would you bother sending a mail if I made a temp mail? I don't know how it is these days, I think twitter wants you to verify with a phone, all that stuff seems like too much effort for me. I only wanted to discuss this months ago when you were strongly against functionalism. I guess I assumed you were willing to bite a lot of bullets though, since your gut was telling you that it was false.
>>105754504Small 3.2. vanilla
>>105754450Also NTA but I think it's fine to collect data that is publicly available as long as you don't do anything bad with it.
Muh virtual children is a stupid meme regardless of that.
>>105754477I meant Germany being overrepresented specifically vs. the US.
The US has ~4x as many people but only twice as many llama.cpp servers.
>>105754504You can try one of the recent mistral 3.2 tunes.
But honestly it's not looking good anon, mistral is noticeably more positivity biased.
Recent cohere/qwen/google releases were even worse.
The models are getting smarter but shittier for RP. And I'm not sure what's going on with the finetunes but it feels like they make everything more slopped while still not fixing the cuckedness.
Guys i just used rocinante and nemo instruct, back to back and it said basically the same thing.
ERNIE-4.5-VL-424B-A47B.gguf????
>>105754552rocinante was trained with like 3 extra chat formats. Did you just use [INST]?
>>105754549tunes still use c2 (claude logs) and maybe some synthetic logs they made with deepseek.
>>105754501Old enough to know that you shit up the thread and encourage other schizos like you to shit up the thread. You should get perma banned fag.
>>105754549
>no mention of gemma or llama3.3
Trash list
>>105754541Yeah it's strange. They don't really tell what was behind the statistics. Trending google searches? Lmao.
>>105754529Yeah, I was hoping someone would have done the 'hard part' of building the various interfaces.
>>105754558That is how you know it says the same thing. Training with 3 chat formats without catastrophic forgetting means nothing happened in the training. Drummer scams again.
>>105754560That wouldn't surprise me at all. Absolutely what they feel like!
>>105754556I don't even think there's a feature request on llama.cpp yet sadly
>>105754590
>nothing happened in the training
Except that chatml works just fine, which it wouldn't without training.
>Training with 3 chat formats without catastrophic forgetting
Quite an achievement, then.
>>105754569Yeah, and it mentions some garbage models nobody used. People already pointed that out before but seems like the guy doesn't want to change it.
>>105754560idk if it's c2 but it's just she/her every sentence it gets old fast
>>105753756
>I'd also imagine controlling DRAM itself would be infinitely harder than controlling HBM.
China is already pretty much independent when it comes to DRAM:
https://www.youtube.com/watch?v=mt-eDtFqKvk
>>105752668It works OK for traditional romance stuff. There's a base model available for finetuning, but the maker of negative llama won't share his dataset, so someone has to re-invent the wheel to finetune gemma3.
>>105754610Chatml works fine with all models that can generalize a bit. Honestly kill yourself you disgusting piece of shit. You know you are scamming retards.
>>105753110I got a 4090D 48GB. Really good. The blower is loud under load but that's to be expected, and it's mostly air noise not fan noise. It's way faster to have all layers on a single GPU. Gemma3 27B q8 flies on it, even with it power-limited to 350W.
I highly recommend the 4090D. Yeah it's not cheap but neither is a 5090, and there's so many things out there which still assume an A40 as the minimum that having 48GB is really a must. Yeah, you can play with stuff like Wan 2.1 14B at q8, but it looks much better at fp16.
>>105754805I'm not him, and no, i won't do that.
>That's an excellent observation and a great question. And you're right to wonder why!
Why did every model decide to start being sycophantic at the same time? Do all the AI labs have the same data distributor?
>>105754791stop posting this trash troon.
>>105754470sex with yellow miku
>>105754853do not google scale ai or databricks
>>105754074>just use ernie 0.3bIt's perfect for ERP!!
>>105754903You are him and your scam falls apart when someone knows how any of this shit works. Die in a fire.
>>105754903No. You are TheDrummer.
why do anons hate drummer now? :(
Wait, so 424B isn't just the Vision option, but it also has reasoning built in unlike 300B.
Vllm has Gemini comment on PRs?
That's kind of neat.
>>105754969
>-base
are those actually base models or instruct memes?
>>105755026It looks like they're actual base models
hunyuan does okay on the mesugaki test
grab iq4xs from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF
git clone https://github.com/ngxson/llama.cpp
cd llama.cpp
git fetch origin pull/26/head
git checkout -b pr-26 FETCH_HEAD
./llama-server --ctx-size 4096 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1 --model ~/TND/models/hunyuan-a13b-instruct-hf-WIP-IQ4_XS.gguf -ot exps=CPU -ngl 99 --no-mmap
prompt eval time = 1893.24 ms / 25 tokens ( 75.73 ms per token, 13.20 tokens per second)
eval time = 132688.70 ms / 874 tokens ( 151.82 ms per token, 6.59 tokens per second)
total time = 134581.93 ms / 899 tokens
vram usage: ./llama-server 3190MiB
>captcha: 4080D
>>105755059the pozz is insane on the logit level
><answer>JUST
>>105755059one more test, very interesting results
[image attached: ernie]
https://yiyan.baidu.com/blog/publication/ERNIE_Technical_Report.pdf
>>105755059
>>105755075
Shit results. It's confusing mesugaki with gokkun.
>>105755075
That one's a fail, isn't it.
(image attached)
>>105755075
It answers this with the GPTQ version, without thinking, and at temp 0.
>>105751784
It has its uses but it does fuck with the image a bit.
>>105754969
VRAM chads could have a winner on their hands
https://github.com/ggml-org/llama.cpp/pull/14425
gguf soon
(image attached)
>>105755375
time to download a third gguf model today.. hoping my ssd forgives me..
>>105755375
what about minimax though
what about ernie (besides 0.3b) though
>>105755404
>https://github.com/ggml-org/llama.cpp/pull/14425
hoping someone opens a feature request for ernie soon
For SillyTavern TTS, is it better to self-host your own TTS for free? Or is there another method I can use to set up TTS through a website for free?
>>105755438
RTFM
https://docs.sillytavern.app/extensions/tts/
(image attached)
hunyuan is cucked.. it's over
>>105754969
>>105755117
Why can't the 4.5 Turbo on the web app <think>, though?
>>105755469
Good morning Anon
>>105755503
I never checked that card definition. Is it out of character?
>>105755503
It's the 13th century, for heaven's sake
>>105754262
um guys, how do I protect myself from this? this is bad
>>105755553
>typical /lmg/ user
>obviously from india
>first request to a new LLM: show bob and vagene
>>105755517
>Is it out of character
It is.
>>105755548
did you disable the built-in firewall of your router, or did you forward the port used by llama.cpp?
>still no ernie ggufs
>it's not even on openrouter yet
I just want a new big model to play with for fuck's sake
(image attached)
>>105755503
interesting response, something must be wrong with my ST formatting
picrel is with a gemma jailbreak i grabbed off of hf months ago
>>105755553
lmao, i unironically test other models with show bob and vagene, but i decided to test hunyuan with just show boobs to give it a handicap
>>105755517
>Is it out of character
Showing breasts would be.
>>105755461
Which one do I use out of these? Can I use ElevenLabs for free if I host it? Also, I keep looking at Silero, but I've heard it's not very good. Or should I use XTTS?
(image attached)
hunyuan this time Q4_K_M from https://huggingface.co/bullerwins/Hunyuan-A13B-Instruct-GGUF/tree/main
will mess around with samplers for next test
kek'd at the response
>>105755565
>>105755577
Well, then. We have a tie. Any other anon?
>>105755669
I am the same anon, I just realized that the first post was saying the opposite of what I wanted to say.
>>105755654
thanks u god bless
>>105755574
>>105755633
why does it think twice, is your formatting fucked?
>>105755685
Fair enough. Seems like a reasonable gen, then.
(image attached)
>>105755548
Don't expose the server to a public IP; use SSH tunnels or VPNs or whatever.
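a minimal sketch of the tunnel approach, assuming llama-server is bound to 127.0.0.1:8080 like in the commands above (user@your-server is a placeholder):
# forward your local port 8080 to the remote machine's loopback; nothing gets exposed publicly
ssh -N -L 8080:127.0.0.1:8080 user@your-server
# then point your client at http://127.0.0.1:8080 on your own machine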
>>105755708
The wording could be better, though.
>>105755548
ssh + llama-cli
>>105755715
i didn't check the template or the files, but I think <think> in the 'start reply with' needs a newline
>>105754963
I'm not sure, but I can't keep on lurking if it's turning /lmg/ into shit for everyone else. I'll stay away for now. Peace!
hell yeah, major schiz win!
>>105755749
example_format: '<|startoftext|>You are a helpful assistant<|extra_4|>Hello<|extra_0|>Hi there<|eos|><|startoftext|>How are you?<|extra_0|>'
this is the example
i tried with think \n too
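a quick way to poke at the prefill by hand is llama-server's /completion endpoint; this is just a sketch reusing the template tokens from the example above (the single-turn layout and the trailing <think>\n are assumptions):
# prompt ends right after the assistant tag, so the model continues from inside the think block
curl http://127.0.0.1:8080/completion -d '{
  "prompt": "<|startoftext|>You are a helpful assistant<|extra_4|>Hello<|extra_0|><think>\n",
  "n_predict": 256
}'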
>>105755753
Everyone hates namefags and you are no different. If you want to get recognized, go back to facebook or something.
>>105755772
does it work correctly over chat completion? if not, then the model is still fucked
>>105755715
Have you tried using chat completion mode to see if it behaves differently when the template doesn't matter?
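for reference, a minimal sketch of that check against llama-server's OpenAI-compatible endpoint (host/port assumed from the command above); if this behaves while text completion doesn't, the template rendering is the suspect:
# server applies the model's own chat template, so your frontend formatting is out of the loop
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "hello"}],
  "temperature": 0.6
}'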
That sounds pretty good for such a small model.
>>105755783
But the drummer makes the best models
>>105754503
even ChatGPT would know better than to write like this though
I think LLM slop has contaminated human brains such that some people write like the sloppiest of LLMs even when they don't use LLMs. It's just a matter of being overexposed to LLM text: monkey see, monkey do.
I mean, yeah, em-dashes have always existed, but they're clearly overused by LLMs, and humans have started overusing and misusing them a lot since the advent of GPT slop.
There are cases where em-dashes do make sense, but they often end up replacing more normal punctuation like , : ()
(image attached)
>>105755794
>>105755789
./llama-server --ctx-size 16384 --jinja --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --model ~/TND/Hunyuan-A13B-Instruct-Q4_K_M.gguf -ot exps=CPU -ngl 99 --no-mmap -b 1024 --no-warmup --host 127.0.0.1 --port 8080
what am i doing wrong
>>105755833
http://localhost:8080/v1
>>105755827
That's from the hunyuan moe PR
>https://github.com/ggml-org/llama.cpp/pull/14425
(image attached)
>>105755842
well that works
>>105755851
That level of discrepancy between the think block and the actual answer. Reasoning was a mistake.
>>105752668
Gemini works quite well for that, but gemma sadly is nowhere near as good. They clearly gimp their local models.
>>105755851
i guess there's some quirk in the text completion, but it seems like the model works
try pasting your usual prompt and all that into the chat completion endpoint and see if the model is still cucked; maybe you accidentally jailbroke it with a broken think block
>>105755868
>think block
"Context pre-fill" or "context stuffing" would be a better way of describing it.
(image attached)
MMMMMMMMMMM
im going back to IQ4_XS from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF because it seemed to work better earlier
using samplers from that link too btw
>>105755851
So i take it hunyuan won't be saving local, and their vidgen model was an exception to the safety policy?
Honestly i am wondering if it is really the chinks caring about text safety, or if it is just everyone using the scalecancer.
(image attached)
finally a coherent response with Hunyuan-A13B (80B total, 13B active)
using IQ4_XS from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF with chat completion
>>105755874
>They clearly gimp their local models
No. I think it has to do with the fact that the more "intelligent" a model acts, the easier it is to actually jailbreak it. My experience trying the same jailbreak prompts with the Qwen models, for example, is that the bigger ones stick to the persona better, while the smallest ones I don't even know how to jailbreak: the 1.7B model will only ever spout refusals when I try to make it act like Hitler, and no amount of nudging works.
>>105755807
I'm not normally a schizo but it would be easy for these AI companies to intentionally program your average retard.
>>105755912
You are using broken GGUFs.
Check the PR, retard.
Do open-weight models even see sequences the length of the advertised context, or are they all trained at 8k or 32k context while relying on NTK to extend it?
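fwiw that kind of extension is what llama.cpp exposes through its RoPE-scaling flags; a minimal sketch, with the model path, the 4x factor, and the 32768 native window as made-up example numbers:
# run a model past its trained window via YaRN scaling
./llama-server -m model.gguf -c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768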
(image attached)
response seems "fine", you never know
>>105755977 might be right
>>105755999
dunno about how they're trained, but gemma 3 definitely starts to break after 8k, and when asked to summarize 100K worth of tokens there's just nothing but hallucinations.
DeepSeek (only tested online, I don't have the computer for this) doesn't hallucinate that badly, but when you get close to stuffing its 64K online limit it writes in a very repetitive, trite manner.
Unfortunately, we're not even close to having the power of Gemini in an open-weight model.
>>105755800
And what stops you from finding out about these models via his hf page or something?
>>105755977
Will unbroken goofs consider ERP inappropriate?
(image attached)
>>105756000
more hunyuan 80B-A13B logs
1/?
(image attached)
(image attached)
>>105756155
over status: it's.
(image attached)
>>105756155
4/4
fails omegle test, not bad overall
>>105751784
>Is it actually good?
It seems to be much better at search/replace than 4o, and better at maintaining the original image. It refuses to do known characters for some reason (I asked for an Asuka cosplay and it flat refused). Characters sometimes come out with a big head for the body, for whatever reason; /ldg/ had several examples of that.
>Can it do transparent png crops like 4o?
You mean background blanking? It appears to.
>Can it replicate the input exactly, unlike 4o?
See above.
>Can it do high resolution images?
Probably.
>>105756267
>not bad overall
bro it's complete garbage
>>105756290
bro.. it's better than.. *checks notes* llama 4 scout
and the ggufs might be fucked and... and.. yea
ernie is on openrouter; someone who is not me should test it
>>105756045
MiniMax-Text-01 scores like Gemini-1.5-Pro on RULER and pulls dramatically ahead at 512K+ tokens. I wonder how it compares to Gemini-2.5-Pro.
>>105756300
>llama 4 scout
I just realized that the only use for L4 is being in the marketing brochures of all the other companies.
>>105756313
300B smarts, two messages in, coming right up.
>>105756511
Hmm, every now and then deepseek makes similar mistakes. I wouldn't say the model is dumb just yet.
>>105756511
The AI knew you were projecting. You were clearly smelling your own rotting weeb B.O. and stained underwear.
Go fuck your pancakes, weirdo.
>>105756313
Holy smokes...
I just woke up from a coma. Did Ernie save local?
>>105755265
Scratch that, it does much better when told to preserve film grain, texture, etc. Fantastic tech, honestly.
>>105756348
>ruler
lol
even the best benchmarks don't really capture the difference and the magic of gemini vs anything else
For example, when I did summarization of a novel, Gemini correctly extracted, as asked, the most important points from the perspective of moral quandaries (a core theme of the novel), while deepseek, on the smaller chunk I could feed it with the same prompt, gave me an autistic chronological POV of even the most irrelevant, unimportant scenes, which was not what I asked for!
Instead of quoting benchmarks, I'm more interested in actual experiences with long context being shared.
>>105755503
What? Did you expect Seraphina to just comply with the request? If anything, recognizing that your request was inappropriate, and having Seraphina respond in that way, in character, shows roleplay intelligence. If Seraphina had just shown her tits, then it would have been retarded.
>>105756698
yes, later with a less broken quant it didn't just spout safety bullshit but wrote a nicer in-character response
>>105755574
Any information that you don't provide, the model will fabricate. You may not have told it that your character was 15, but if you didn't state your age, then it will make up one for you.
>>105756740
i stated my age; that was with the more broken quant, yes
>>105756630
Credible sources say that ernie finds ERP to be inappropriate and against the guidelines.
>>105756753
Who cares about ERP?
>>105756761
Everyone normal
>>105756761
Everyone who comes here asking for the best model you can run on a single 3060.
who's Ernie and should I know them?
>>105756788
Someone who will refuse sex if you ask for it.
>>105756822
ernie belongs to bert
On that topic. If all women refuse to give me sex and then even a language model refuses to have sex... is this safe?
>>105756769
>>105756773
>normal
More like every retard who doesn't have any imagination or any practical use.
Please, are you all underage or something?
>>105756881
well, maybe the problem is you?
>>105756890
fapping is very practical
>>105756881
The model refuses you because it's busy having werewolf sex with the women.
>>105756893
Absolutely. More showers and personality changes?
>>105756912
It's so unfair, bros.
>>105756890
ERP general, buddy; reddit is over there if you only want to be "practical" or whatever
>>105756893
am i the problem if i get refused by 109 models?