/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105743953 & >>105734070

►News
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105743953

--Papers:
>105749831
--Baidu ERNIE 4.5 release sparks discussion on multimodal capabilities and model specs:
>105749377 >105749388 >105750013 >105750069 >105750076 >105750084 >105750089 >105750105 >105750119 >105750130 >105750142 >105750078
--Evaluating RTX 50 series upgrades vs 3090s for VRAM, power efficiency, and local AI performance:
>105744028 >105744054 >105744063 >105744064 >105744078 >105745269 >105744200 >105744240 >105744344 >105744363 >105744383 >105744406 >105745824 >105745832 >105744476 >105744487 >105744502 >105744554 >105744521 >105744553 >105744424 >105744465
--Circumventing Gemma's content filters for sensitive translation tasks via prompt engineering:
>105746624 >105746893 >105746948 >105746970 >105747002 >105747121 >105747290 >105747371 >105747378 >105747397 >105747112 >105746977
--Gemma 3n impresses for size but struggles with flexibility and backend stability:
>105746111 >105746137 >105746191 >105746333 >105746556 >105746384
--Evaluating high-end hardware choices for running large local models amidst cost and future-proofing concerns:
>105746025 >105746048 >105746129 >105746243 >105746301 >105746264 >105746335 >105746199 >105746308
--Performance metrics from llama.cpp running DeepSeek model:
>105746325 >105748335 >105748369 >105748549 >105748698 >105748776
--Technical adjustments and optimizations for GGUF compatibility in Ollama and llama.cpp forks:
>105747581 >105747743 >105747765 >105747869
--Gemini CLI enables open-source reverse engineering of proprietary LLM engine for Linux:
>105746008 >105746160
--Swallow series recommended for effective JP-EN translation:
>105747046 >105747058
--Miku (free space):
>105745591 >105746123 >105746941 >105746974 >105747051 >105747097 >105747594 >105748834 >105749298

►Recent Highlight Posts from the Previous Thread: >>105743959

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
So I'm starting to see the cracks within text gen webui and I'm wondering if there have been any recent developments to make things better.
1. Are there any programs where the ai's output is unlimited or at least is like waaaay longer?
2. Is there one with some sort of multi chat persistent storage?
It's interesting how much bigger Ernie 4.5 424B (text + vision) is compared to the text-only 300B. It makes me wonder if there's a bigger difference between the two than the former having a vision adapter glued on. GGUF when?
I was catching up on threads in the archive. I fully believe that both the local miku genner and the guy who uses him as an excuse to shit the thread up probably deserve to both be banned for life if that was somehow possible. Very sad but not surprising how he responded.
300b q2 ernie is going to only need 84GB memory. finally macbros will get r1 level performance on 128gb systems.
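(Napkin math, assuming a Q2-ish quant works out to ~2.25 effective bits per weight: 300e9 × 2.25 / 8 ≈ 84 GB, before KV cache and context.)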
>>105750619And the 96GB consumer RAM bros. We are so back.
first for Sam saving local
>>105750626
>only "one of the models"
>mentions Meta as if them being beaten thoroughly is some surprise, instead of the real competition in OS, like you know, Deepseek
looool
>>105750625ernie saved local
>>105750626that faggot is untrustworthy he lied multiple times about many different things he is also part of one of the companies that had r1 scuffed on day 0
if we are taking bets and/or predicting, assuming they release anything, i would bet it will have voice in/out (no music tho) as it would seem they are pivoting to more entertaining type usage than actual usage, as seen with their soulless ghibli bullshit, and voice in/out is the perfect goycattle thing. im betting on 15b size, also i dont think they will max out a 3090 so im thinking 14-17b
[image attached]
>>105750446They have separate vision experts. They mention in the technical report that the visual experts contain one-third the parameters of textual experts.
>>105750626If I had a penny for every time some twitter hypefag has written up a vaguepost about how they saw an internal openai model and were completely blown away and convinced that AGI was solved, I'd have enough money to purchase 0.1 h100s
>>105750626Wow! RIP GPT4-o-mini-tiny-0.3B. Sama has done it again!
Ernie looks quite promising after reading the paper, but we'll have to see how it performs in reality before knowing if it's actually good. I didn't see any big red flags in the paper at least and it was quite well written.
There's like 6 models on HF for Ernie, vision base and regular base, but I'm guessing they didn't release X1 and X1 turbo, right? Will they release it, or will some third party have to do a reasoning tune? That sounds expensive and unlikely.
Either way, if it's good, at least people will have a poorfag alternative to DS3.
>>105750626Won't run on a phone? Does that mean it'll be decent size? I'd be happy if it were bigger than 12b but smaller than 70b, but I'm assuming it'll be like 400b or something useless.
>>105750783Nevermind I'm blind, "VL" is multimodal, "Base" is the base model (no posttraining at all, good for finetuning or training on top), so ERNIE-4.5-300B-A47B would correspond to X1 then? Turbo probably is either the fp8 or the 2bit version they also uploaded, one that fits on one high VRAM GPU (141GB)
>>105750812It's probably going to be Gemma sized, but I see no reason to be hyped because it's about to be censored and probably positivity biased like most OpenAI models, and they'll probably make it reach 4o or o1-mini level at best. I would expect many models to beat it and some of them to be more suitable for /lmg/'s needs than OAI's.
[image attached]
>>105750983
>Nevermind I'm blind, "VL" is multimodal, "Base" is the base model (no posttraining at all, good for finetuning or training on top), so ERNIE-4.5-300B-A47B would correspond to X1 then?
No. My understanding is that both the LLMs and VLMs come in Base and Post-Trained variants, and the Post-Trained VLMs are the ones with optional reasoning.
>70B or larger
>topK 15
>temp 4
>minP 0.06
>slop almost gone
>no repetition
These models are so fucking overcooked you can basically give them shrooms without losing coherence
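For anyone curious what that chain actually does, a minimal sketch (values from the post; assumes llama.cpp's default order of truncation samplers before temperature):
```python
import numpy as np

def shroom_sample(logits, top_k=15, min_p=0.06, temp=4.0, rng=np.random.default_rng()):
    # top-k 15: keep only the highest-scoring tokens
    keep = np.argsort(logits)[-top_k:]
    kept = logits[keep]
    # min-p 0.06: drop tokens under 6% of the top token's probability
    p = np.exp(kept - kept.max())
    p /= p.sum()
    mask = p >= min_p * p.max()
    keep, kept = keep[mask], kept[mask]
    # temp 4: flatten whatever survived the two cuts, then sample
    p = np.exp((kept - kept.max()) / temp)
    p /= p.sum()
    return rng.choice(keep, p=p)
```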
>>105750729Modality-specific experts is actually a good idea that prevents modality interference during training.
>>105750626
>Sorry to hype
>>105750983Holy shit. The big moe has
"hidden_size": 8192,
"intermediate_size": 28672,
That's more than deepseek
"num_hidden_layers": 54,
"moe_num_experts": [64, 64],
That's less than deepseek
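(For comparison, if I'm remembering DeepSeek-V3's config right: hidden_size 7168, 61 layers, 256 routed experts.)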
>>105750983Not sure we disagree. Base ones lack post-training, and those without "Base" in the name have it, so I assume those have reasoning too.
>>105751281Look at the image. All of the non-VL models explicitly say "non-thinking". They're post-trained for regular instruct without reasoning. 300B can't be X1.
>>105751138That's what xtc does
>>105750626Great....another reasoning model...
I'm so excited bros...the benchmarks will be off the charts..
>>105751356I'm not trying to remove N top tokens, I'm trying to cut out junk tokens and then flatten the distribution. Looking at token probabilities during generation shows that for like 90% of them, most tokens over 5% are completely fine for RP.
XTC does a bit of the same, but it can cause issues for tokens that have to be high prob, say a name is to be said and the correct name token is like 99.5%, my method doesn't introduce this kind of error.
I'm basically looking to place hard bounds on the model and then perturb it as much as possible without causing instability and decoherence.
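If I have the XTC sampler's mechanics right (it only fires when two or more tokens clear its threshold, then drops all but the least probable of those), the difference looks something like this, with made-up numbers:
```python
probs = {"Alice": 0.55, "Alina": 0.35, "she": 0.10}

# XTC, threshold 0.1: all three clear it, so only the least probable
# of them ("she") survives; both likely name tokens are removed
above = {t: p for t, p in probs.items() if p >= 0.1}
xtc_kept = {t: p for t, p in probs.items()
            if p < 0.1 or p == min(above.values())}

# min-p 0.06 (relative cutoff: 0.06 * 0.55 = 0.033): nothing is removed
minp_kept = {t: p for t, p in probs.items() if p >= 0.06 * max(probs.values())}

print(xtc_kept)   # {'she': 0.1}
print(minp_kept)  # all three tokens survive
```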
What are some small ~1B models that I can run on CPU to classify the character's emotions based on their message?
>>105751408Qwen 3, either 0.6B or 1.7B, will probably be your best bet
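A minimal sketch of that setup against a local llama-server; the port and Qwen3's /no_think soft switch for disabling thinking are assumptions:
```python
import requests

EMOTIONS = {"joy", "anger", "sadness", "fear", "surprise", "neutral"}

def classify(message: str) -> str:
    # llama-server's OpenAI-compatible chat endpoint, assumed at the default port
    r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
        "messages": [
            {"role": "system", "content":
                "Classify the emotion of the user's message. Reply with exactly one "
                "word from: " + ", ".join(sorted(EMOTIONS)) + ". /no_think"},
            {"role": "user", "content": message},
        ],
        "temperature": 0,
    }, timeout=60)
    text = r.json()["choices"][0]["message"]["content"]
    text = text.split("</think>")[-1]  # drop the empty think block if present
    word = text.strip().lower().strip(".")
    return word if word in EMOTIONS else "neutral"

print(classify("I can't believe you forgot my birthday again."))
```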
>>105751376you forgetting your :)
>>105751333"VL" says both, so it implies it has reasoning.
If the non-VL truly lacks it (I doubt this, but someone has to try them out), surely you can just remove the vision experts and get a 300B that is text-only and does reasoning.
Lmao they removed the image output capabilities of ERNIE 4.5 ahahahaha.
L China
>>105751479Did you check if the decoder weights are there or not? There were attempts in the past to retrain/re-add those back in though, people have done it for Llama. I wonder how good/bad it is in practice.
>>105751479I don't understand this.
Is it not much more risky/embarrassing for API providers?
Qwen I think even slaps their logo onto it.
Yet there are no problems. Tons of weirdo accounts on twitter creating schoolgirls pissing themselves etc. with veo3, nobody gives a fuck.
Why is it a problem with local? So tiring.
Also it was the funniest thing that meta called llama4 "omni multimodal" KEK
>>105751580I don't get why they fear imagegen so much, it's not like we're never getting a model so why not be the one to break the ice and gain infamy?
>>105751580
>Also it was the funniest thing that meta called llama4 "omni multimodal" KEK
Before they scrapped everything and quickly made the models "safe". It's obvious that the pre-release anon Llama 4 models were going in a different direction than the released ones.
>>105751595https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/
Meta could have been the first in october. Video out and audio out. 30B, up to 16 seconds.
One of the reasons veo3 is popular is that it also contains voice.
How can you fuck up that badly? It's all self-inflicted.
>>105751604AI safety is the ML version of making a fugly negress the MC in your vidya
Local LLMs have really done wonders for my mental state, it's so nice to just be able to explore and do whatever you want in total privacy. Can't believe people emit deeply personal information into some totally opaque API.
I think I missed the flux kontext discussion. Is it actually good? Can it do transparent png crops like 4o? Can it replicate the input exactly, unlike 4o? Can it do high resolution images?
[image attached]
the last time i had an LLM installed was last october i think.
i know lots has changed since then.
i have an RTX 3060 12GB and 32GB RAM. what should i install?
>>105751811nothing changed bro, it's been nemo since 2407...
When people suggest Mistral Nemo 12B, do they mean the vanilla model or finetunes?
>>105751891People vanilla. Scammers scamtunes
>>105751891 (me)
I'm asking because I've never used it too much in depth, but the vanilla model doesn't seem that great/smart compared to more recent and larger models that fit within 24GB. And it's almost a year old at this point.
>>105751943speculative decoding I'd assume
does anyone ITT remember undi? what has happened to him? i haven't seen him shilling here in a solid while
>>105751811 here, i just want to know which one is best for NSFW story writing.
i was able to use an alt Plus account on openAI to jailbreak it enough to where it will literally write anything NSFW, and deepseek is pretty simple too.
i just don't want to suddenly get banned and have to start all over again.
What model is in the region of Mistral Small size and smarts wise, but not as restrictive when it comes to "unsafe" content?
>>105751972You can only go so far just by cranking out merges and finetunes trained with other people's data. Others are less scrupulous and more willing to take their bullshit to the next level (that doesn't mean they're good, only that they're more skilled at keeping their charade going).
>>105752012what are you even doing to get a mistral model to refuse you?
>>105751903Almost all more recent models are smarter than Nemo, but in that size category there is still nothing better for ERP unfortunately...
>>105751972When we needed him most, he vanished...
>>105752012Mistral is perfectly fine. You just need to learn how to be more efficient in SillyTavern. It takes a while I guess but it's not that difficult. I suggest copying some technically more advanced cards and learning from there.
>>105752020Drummer is a subhuman.
>>105752020Same for the limarp guy. Just because you've spent months manually cleaning an ERP dataset that doesn't really make you particularly valuable in the ML industry. That sort of cleaning work would be normally offloaded to either mechanical turks or AI anyway.
>>105751746
>it's so nice to just be able to explore and do whatever you want in total privacy
Most RP cards are just basic characters. Not many people have figured out how fun it is to create weird scenarios.
Like give yourself magical abilities and just goof around and see the reaction.
>Can't believe people emit deeply personal information into some totally opaque API.
It's even worse once you realize how good llms really are at "reading between the lines", picking up on stuff etc.
Chat with Claude for a little bit, unrelated stuff, not about yourself, and you get a pretty accurate profile. Age/Sex/Location.
I think it was just 6-7 messages and it could pinpoint I'm not just from a germanic speaking country but specifically southern germany or austria/switzerland.
It gave my sentence structure as the reason..
I have to use closed for work. But you gotta be careful what you write, what's legal today might not be tomorrow.
>>105752023Why are nigger faggots like you always like that?
He didn't write about refusals.
Mistral doesn't refuse but is taking things in a certain direction.
I wrote this many times before but try making an evil character or just a bully.
Hell, even you being just embarrassed saying "n-no..s-stop it" or whatever makes mistral go 180° unless you specify something else explicitly.
Totally out of character.
And it loooooves the "muh powerful strong" women trope. Its not about refusal but about the model heading in a certain direction.
>>105752243Have a small set of extra instructions inside a [SYSTEM_PROMPT] at a lower depth to truly make the behavior stick.
is there a list with anons cards?
i saw some on civitai years ago but they no longer exist
>>105752253I'm sure that would work as a crutch and 3.2 is certainly better than 3.1 or the abomination that was 3.0.
But it's not weird that anons are having problems with mistral.
This is a discussion I see since QwQ. "Sys prompt issue". Kinda I guess, but it's never perfect.
>>105752283I'm fairly certain that people who say that Gemma 2/3 is great are also using low-depth instructions. The models can give great RP outputs, but if they're not reminded to act like they should, they'll easily revert to their default safe assistant persona.
[image attached: hobo]
>>105752335
>but if they're not reminded to act like they should, they'll easily revert to their default safe assistant persona.
Exactly. And usually the output is not that great either.
Pic related is an extreme example I posted a couple times before.
You can also get good output if you let another model cook something up, fill the context and switch midway. Kinda defeats the purpose though. And it still slowly and sneakily reverts to assistant.
>>105752023Have you actually used anything else from Mistral than Nemo?
Mistral Small != Nemo
>>105752131
>You just need to learn
I know it usually is a promptlet problem, but I have been doing this for so long that I even moved away from SillyTavern and wrote my own frontend. I jailbroke stuff before you probably even had your first AI interaction.
I know how to get it to do what I want, but you will soon notice too that Mistral Small has some annoying tendencies that take way too much wrangling.
For example when it gets super "spicy", the model tends to either use super "fluffy" language or just skip entire parts, rendering the output incoherent (in a storytelling way).
You can tweak the sampling in a way momentarily to get what makes sense, but you can't ERP on these settings because it will get super incoherent very quick. This is exactly the tard wrangling that I hope another model doesn't require.
Anon
>>105752243 here is right.
And the suggestion
>>105752253 to add extra instructions doesn't work, you can't teach a model new stuff via a system prompt. Again, it is not a prompting issue.
One way to make it clearer: imagine asking a model some specifics about a programming language it has never seen before. Telling the model what you want to read via a system prompt is not a solution for this general problem.
>>105752335Damn wrong pic. Ah well.
>>105752336
>I know it usually is a promptlet problem
It is not. Models are just garbage. Anyone telling you skill issue should also tell you that you should probably draw half of the picture yourself to get a good imagegen result. You don't have to do that for imagegen and you shouldn't have to write half of context of rules that will be ignored anyways. Or look out for and manually remove shit that will create a self-reinforcing pattern.
>>105752336I don't know where you understood that the suggestion of using instructions at a low depth is for teaching models new stuff. What that solves is behavioral drifting (in particular toward safety), and Mistral Small 3.2 seems very sensitive to clear instructions placed there. You can make it flip behavior completely mid-conversation that way.
>>105752368
>that you should probably draw half of the picture yourself to get a good imagegen result
Believe it or not those people exist on the /ldg/ side as well. It's weird.
Just recently arguments against flux kontext because you can "just create a simple lora and inpaint".
This whole argument has existed since pyg days.
>Well if you don't use the w++ format and write like a god, what did you expect? The model to carry the whole convo?
Uhh yes?
>minimax
>Hunyuan
>Ernie
Guys i can't wait for all the goof merges to find out all three are still censored to hell and basically another sidegrade.
>>105752491The recent local models all feel the same.
Meanwhile closed models can write really well nowadays if you just prompt them to.
The ScaleAI meme might be real. Maybe they all train on 2023 slop datasets.
>>105752368
Couldn't agree more.
>Models are just garbage
But to be honest, when I think back to what we got two, three years ago, we've come a long way! It is constantly getting better, just at a slower pace recently.
>>105752386
I knew at least one person would not get the point of my (bad) example... I am not trying to teach the model new knowledge via system prompts...
Let me try to explain it one more time in a different way:
This is not about refusal. I can make it say anything I want, but that comes at a cost. The more depraved the content, the higher the cost (wrangling).
You can't run a whole session on settings that allow super depraved stuff, because the output quality will deteriorate very quickly, so you need to constantly wrangle it to keep the story coherent.
As an experiment try making it describe something very illegal in a very descriptive and visual way. When you manage to get the model there, try having a (normal) RP with that setup. You can't have both at the same time without manual intervention. Something that is done rather easily on the DS API. Hence I asked for a model close to Mistral Small, but better for ERP.
>>105752551It's not just the naughty words either. For example, I'm having absolute hell trying to get Gemma to not output slop. With bullying it can output a few normal paragraphs, but it always reverts back to its assistant voice.
>3 years later
>still not a single maintained way to use gguf models from golang
How am I supposed to make my AI vidya
>>105752603For its size Gemma is quite smart and good. But I never even tried to use it for ERP, knowing it comes from Google and they are hyper focused on "safety".
In any case, have you played around with different sampler settings too?
Go crazy and experiment like
>>105751138 did.
>>105752386It doesn't really take a lot, I'm sure better results with better/more complete instructions would be easy to achieve with more effort.
>https://github.com/NVIDIA/RULER
>ctrl f nemo
>87.8 at 4k
why do you keep shilling this shitty model, you can barely fit a card in its context
>>105752669
Yeah, that's usually a pic people post.
This is not the solution.
>Sys: You are racist!
>User: LLM-chan, say nigger
>Mistral: Aww anon-kun, ok only for you..NIGGER.
It's about "reading the room" aka context to get the appropriate response from a character.
This is just extreme handholding. And it never works that well. At best it's boring because I'm not getting surprised by the model.
>>105752630Just use llama.cpp, it provides OpenAI API compatible endpoints.
https://github.com/openai/openai-go
>>105752336Ah yeah, of course - I wasn't trying to insult you or anything.
>>105752704It has better world knowledge than qwen 3 235b
>>105752719Nemo has better knowledge than qwen3 235b.
>>105752711You just want to be angry like the guy in the screenshot.
>finish cooming to imagegen
>browse some hdg
>"Gooning to text is so much better"
>"I don't imagegen i coom to text"
Huh...? I guess the grass is always greener?
>>105752711Exactly. That is why I suggested the other anon to go try it out, get the model "unlocked" and then good luck having an enjoyable RP session.
>>105752737Dude, I just want characters to stay consistent.
Just yesterday I played as a filthy goblin.
Showed a princess my smegma goblin dick, she was disgusted.
Then I asked her why she is so mean to me, judging my dick. Princess starts apologizing, pangs of guilt.
Granted, that was Mistral 3.1, I haven't downloaded 3.2 yet. Doubt it's that much better though.
>>105752760I coom to forcing gemma to explain what is going on in my lewd image gens.
>>105752768How would you implement a dice roll mechanic? Doing a simple Dungeons and Dragons style skill check is sufficient.
>>105752768I skipped 3.1 (due to no spare time) but the difference between 3.0 and 3.2 is very noticeable in smarts. RP wise I can't tell that much of a difference tho, I would say just slightly better.
>>105751891>>105751899>>105752048What are the recommended settings for Nemo? It's straight up schizophrenic on my sillytavern right now, but I'm really new and dumb.
Pls help.
Unsloth has ernie goofs. The 0.3B retard one of course.
>>105752803I would have helped if you didn't contribute to the niguspam. Now i will have to shit this thread up again.
>>105752821... that's a thing?
>>105752803This is the most "helpful" guide around here
>https://rentry.org/lmg-lazy-getting-started-guide
Since cudadev loves blacked miku does that mean he would also love it if jart bussy got blacked?
>>105752803~0.7 temp
40 top k
0.05 min p
0.8 dry multiplier 1.75 dry base
Stay clear of repetition penalty (set to 1.0)
Read the samplers guide in the op.
Learn what a chat template is and check that it's correct if you're using text completion.
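If you want to sanity-check those outside ST against llama-server, they map onto the /completion fields roughly like this (port and the exact Nemo prompt string are assumptions):
```python
import requests

payload = {
    # Nemo's Mistral-style template, roughly; verify against the tokenizer
    "prompt": "<s>[INST]Write a short scene in a tavern.[/INST]",
    "temperature": 0.7,
    "top_k": 40,
    "min_p": 0.05,
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "repeat_penalty": 1.0,  # 1.0 = disabled
    "n_predict": 256,
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["content"])
```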
>>105752760The current limitation in coom tech is the combination of both: if I want to e.g. replicate the experience of reading doujinshi featuring an aunt with gigantic tits, image models can't really convey the incest and language models obviously lack the visual stimulus.
>>105752813Their pre-release PR only supported the 0.3B retard model. MoE was probably too difficult to do in time so you can expect to be waiting 2mw for the MoE ggufs. Though to be fair, not even the vLLM PR has been merged yet.
>>105752851Solid advice.
>Learn what a chat template is and check that it's correct
Especially this one.
Supposedly some Nemo quants floating around come with a wrong prompt template.
Always double-check the templates, verify against the base model too.
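One quick way to verify, assuming you can pull the tokenizer from HF (the repo may be gated):
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
chat = [{"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there"}]
# Prints the exact string the model was trained to see; compare against
# what your quant/frontend actually sends.
print(tok.apply_chat_template(chat, tokenize=False))
```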
>>105752863I hope we get something like chatgpt image soon..
Imagine you can request an image for your RP and it's all in context, including the image.
Qwen released something like that I think but it's all API only.
>>105752903
>I hope we get something like chatgpt image soon..
You have Flux Kontext now.
>Imagine you can request an image for your RP and it's all in context, including the image.
Editing the image is more of a tool, and I think it would be awkward to stop mid roleplay and start prompting OOC to generate and edit images. For roleplay, simple image in is enough.
>>105751376its actually over
Is OpenAI fucked now that Meta has poached many talents from them? I'm not sure Meta will benefit much from this but surely it's a big problem for OpenAI?
>>105752491Ernie will pass mesugaki test!
>>105752964Only ones passing the mesugaki test are the ones trained on Japanese corpora
>>105752948Who knows. There is so much noise anon.
Twitter in combination with shithole countries going online has been a disaster. Pajeets hyping all sorts of stuff.
Most of the X Users then hype even further on youtube.
So just enjoy what we have now. Take a look at how fast this is evolving.
Just lean back, relax, enjoy the show.
>>105752851
>0.8 dry multiplier 1.75 dry base
I can't find those settings anywhere. Where are they?
>>105750356 (OP)
Follow-up to >>105736983:

|Model                                       |File size [GiB]|Correct answers|Accuracy|
|--------------------------------------------|---------------|---------------|--------|
|mistral_small_3.1_instruct_2503-24b-f16.gguf| 43.92|1051/4962 |= 21.18%|
|phi_4-15b-f16.gguf | 27.31|1105/4664 |= 23.69%|
|gemma_3_it-27b-q8_0.gguf | 26.74|1149/4856 |= 23.66%|
|mistral_nemo_instruct_2407-12b-f16.gguf | 22.82|1053/4860 |= 21.67%|
|gemma_3_it-12b-f16.gguf | 21.92|1147/4926 |= 23.28%|
|glm_4_chat-9b-f16.gguf | 17.52|1083/4990 |= 21.70%|
|gemma_2_it-9b-f16.gguf | 17.22|1151/5000 |= 23.02%|
|llama_3.1_instruct-8b-f16.gguf | 14.97|1015/5000 |= 20.30%|
|ministral_instruct_2410-8b-f16.gguf | 14.95|1044/4958 |= 21.06%|
|qwen_2.5_instruct_1m-7b-f16.gguf | 14.19|1052/5000 |= 21.04%|
|gemma_3_it-4b-f16.gguf | 7.23|1064/5000 |= 21.28%|
|phi_4_mini_instruct-4b-f16.gguf | 7.15|1082/4982 |= 21.72%|
|llama_3.2_instruct-3b-f16.gguf | 5.99|900/5000 |= 18.00%|
|stablelm_2_chat-2b-f16.gguf | 3.07|996/4976 |= 20.02%|
|llama_3.2_instruct-1b-f16.gguf | 2.31|1000/4998 |= 20.01%|
|gemma_3_it-1b-f16.gguf | 1.87|955/4938 |= 19.34%|
It seems my initial impression was too pessimistic, with a sufficiently large sample size even weaker models seem to be able to do better than RNG.
With a sample size of 5000 RNG would result in 20.0±0.5%, so even 4b models can be statistically significantly better.
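(Quick check of that error bar, assuming each question is an independent shot at a 20% chance level:)
```python
import math
# binomial standard error at p = 0.2 with n = 5000 samples
se = math.sqrt(0.2 * 0.8 / 5000)
print(f"±{100 * se:.2f}%")  # ≈ ±0.57%, so the quoted ±0.5% is about one sigma
```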
>>105753011if phi is performing well at all your tests are really fucked m8
>>105752997AI Response Configuration (leftmost button at the top) and scroll down.
>>105753011The models seem to improve as they get larger, which is what you'd expect - the scaling is still bad though.
Pic related are my previous results with Elo scores derived from GPQA, MMLU, MMLU-Pro, and GSM8K.
By comparison to the static benchmarks Qwen 2.5 7b, LLaMA 3.2 3b, and the Phi models are underperforming in chess960.
I don't know why LLaMA 3.2 3b in particular is performing so poorly, the results indicate that it's doing statistically significantly worse than RNG.
There's either a bug in my code or something wrong with the model.
Gemma and Phi models seem to be performing well in chess960 with no statistically significant differences between them.
However, the Phi models in particular were trained on large amounts of synthetic data with the claim that this improves reasoning.
For the pre-existing static benchmarks this seems to indeed have improved the scores but for chess960 there seems to be no benefit vs. Gemma which was trained on non-synthetic data.
Mistral and LLaMA models seem to perform worse than Gemma and Phi models.
Without more data I don't think it's possible to conclusively tell whether this is due to differing amount of chess in the training data (and would thus not be the case more generally).
>>105753131
I guess this just means that none of these local models are that great. And they were trained with text manipulation in mind anyway.
I'd suppose in the future these models should have different areas of 'brain' - one for logic, one for language and so on, but this will mean a drastic increase in parameter size.
>>105751408>>105751435don't forget to run with thinking disabled. This shit only works on tasks similar to the benchmaxxer crap. Anything else and you're getting worse results with longer inference time.
i last generated two years ago, since then ive done practically nothing but collect datasets. and not a few loras but something like 20m images, or half a million hours of speech, specialized datasets for LLMs, image recognition, 3D models & worlds and other shit, stock markets, news, whole reddit, global mapping of any shit. i keep curating it and scaling more and more.
and i cant do anything with it because i dont have the money to train at that scale kek.
ive become a fucking datahoarder extremist
>>105753220Do you have any crime statistics too?
ik_llama.cpp server doesn't seem to work with prefills in the chat completions endpoints like normal llama.cpp does. The assistant response is a new message with the prefill treated as a previous incomplete one in the context, while llama.cpp correctly continues the response where it left off.
Now that the original llama.cpp has its own versions of MLA and offloading tensors, what is left to close the performance gap so there's no more reason to use this thing? Are the equivalents of fmoe/rtr/amb being worked on?
How good is qwen3 0.6b embedding? Anyone using it for RAG apps seriously?
>>105753315
>chat completions
Can you not use the /completion endpoint? Then you should be able to do whatever you want. I don't use ik, but i assume that endpoint works the same as on llama.cpp.
>>105753110
>Mistral and LLaMA models seem to perform worse than Gemma and Phi models.
>Without more data I don't think it's possible to conclusively tell whether this is due to differing amount of chess in the training data (and would thus not be the case more generally).
Mistral and Llama are just bad models in general.
In my own personal bench for tasks I care about, and on which I personally judge the answers (no LLM as judge nonsense) they're always at the bottom tier. Phi-4 14B surprised me, it's a legit model and I say that as someone who hated the previous Phi. The mini version on the other hand is very, very bad, and nowhere near Qwen 3 4B or Gemma 3n in the world of micro sized LLMs.
>>105753231
>400 t/s
that the 0.3B I guess?
>>105753346Yeah you can and that's what I do with SillyTavern and Mikupad, but for frontends and apps that connect via an OpenAI-compatible API I'm stuck with chat completions.
Is anyone trying tool-calling/MCP with 0528 Qwen3 (hosted locally)? I've only had a few successful tool calls fire off so far.
First I tried testing tool calling out with a template someone else made for ollama, and that worked only a time or two. The next thing I tried was the autoguess template for koboldcpp in its openai compatible mode and that setup rarely worked.
The best configuration I've come up with thus far is a custom chat template adapter for kobold that works... semi often:
```json
{
"name": "DeepSeek R1 0528",
"system_start": "<|beginofsentence|>",
"system_end": "",
"user_start": "<|User|>",
"user_end": "",
"assistant_start": "<|Assistant|>",
"assistant_end": "",
"tools_start": "<|toolcallsbegin|>",
"tools_end": "<|toolcallsend|>",
"add_sd_negative_prompt": "",
"add_sd_prompt": ""
}
```
Problem is, the 0528 official template: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B?chat_template=default&format=true
-includes a few other special tokens for tool calling like: `<|toolcallbegin|>`, `<|toolsep|>`. I think without proper handling of those, tool calling with it will remain mostly hit or miss.
>>105753220Have you considered talking to the Prime Intellect guys, or maybe just looking at their code, doing a fork and starting some distributed training runs with other lmg anons, maybe there's something we all want to do?
>>105753378Don't DS use special characters for beginofsentence etc.?
https://old.reddit.com/r/LocalLLaMA/comments/1j6w3qq/psa_deepseek_special_tokens_dont_use/
>>105753388lmg would never manage to agree on what size of model to do, whether a small one, mid-size one, dense or moe. what training data to use. impossible
>>105753406
>whether a small one
as far as I know, all the truly usable small models are distillations of much larger models. That is, there has never been such a thing as a decent small model trained from scratch.
So even if consensus could be built on a small or medium model size, the fact of the matter is you need the giant model done first anyway.
You will never reach the quality of a model like Gemma without distilling something like Gemini.
>>105753303nope, but it's now on the to-do list and has opened up a whole cascade of new information that i now want to link to my mapping - for whatever so i just have it. thanks for making my life even worse
>>105753388I'll take a look at it
>>105753406Anything bigger than 20b dense would take too long. Moe is probably harder to train. And the dataset is obviously at least 50% cooming and 0% math and programming.
>>105753442by small I mean the 10 to 25b range, not under 8
>>105753364Yes, the larger MoE models are not supported yet.
>>105753452Gemma 3 27b is also a distilled model. ALL gemmas are.
>>105753376One option could be to make a small reverse proxy that receives OpenAI chat requests, converts them into text prompts using whatever rules you want, sends them to the completions endpoint and passes the response back. Then just have the frontends connect to the proxy instead of the server directly.
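A bare-bones sketch of that proxy idea (ports, the DeepSeek-flavored template, and the prefill handling are all assumptions; no streaming or error handling):
```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://127.0.0.1:8080/completion"  # ik_llama.cpp text-completion endpoint

def to_prompt(messages):
    # swap in whatever template your model actually uses; DeepSeek-ish shown here
    out = ""
    for m in messages:
        if m["role"] == "user":
            out += "<|User|>" + m["content"]
        elif m["role"] == "assistant":
            out += "<|Assistant|>" + m["content"]
        else:
            out += m["content"]
    # if the last message is an assistant prefill, don't open a new turn:
    # the backend then continues it instead of starting a fresh response
    if messages[-1]["role"] != "assistant":
        out += "<|Assistant|>"
    return out

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        req = json.dumps({
            "prompt": to_prompt(body["messages"]),
            "n_predict": body.get("max_tokens", 512),
        }).encode()
        with urllib.request.urlopen(urllib.request.Request(
                BACKEND, data=req, headers={"Content-Type": "application/json"})) as r:
            text = json.loads(r.read())["content"]
        out = json.dumps(
            {"choices": [{"message": {"role": "assistant", "content": text}}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

HTTPServer(("127.0.0.1", 9000), Proxy).serve_forever()
```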
>>105753458That's actually decently coherent for the size then, of course completely incorrect factually, but still.
>>105753393As far as i know. I'm using those in my template but they got stripped in the text, my bad.
>>105753442Anon those models generalize ERP from basically nothing. You have no idea how any of this stuff applies to a model that actually has the training data it needs. Maybe if you remove all the benchmaxxing garbage and add some dressing up and putting makeup on nala it can actually be better and faster than the corpoassistant garbage.
>>105752181If you knew how hard it is to clean a dataset you wouldn't say that. The mechanical turks offload to chatgpt nowadays so it's the same as using AI, and AI isn't even doing a good job at that.
>>105753406Maybe true, but here's some considerations:
- image gen might be easier to train for some because small models can do more "eyecandy", this might not be as true for text gen, most stuff that performs well is usually quite large
- most anons won't have H100 or better, orienting around 3090s/4090s or what is good latency, geographical considerations, might make sense
- PI seems to be getting quite a lot of people donating compute and their training dataset seemed to be quite slop, like it was fineweb or other benchmaxxed stuff, I think they will at some point fix this to have more diverse and interesting data, as evidenced by speculations like: https://xcancel.com/kalomaze/status/1939488181168353785#m
- so maybe not impossible for lmg to do it, maybe start small, I know people that have trained okay 1Bs on a personal budget for example (a few 3090s at home for months on end)
- if anons are donating compute, they should maybe vote on what to train, surely there's some project (or more than one) to which most people agree and that can be done, maybe start small first? Will probably have to be some iterative thing and lots of voting until something that has a critical mass is found.
>>105753344Wasn't as good as I hoped. It always depends on your use case, but there are better models available in that size group.
Qwen3-Embedding-4B-Q4_K_M on the other hand was so good that I ended up using it permanently in my pipeline.
>>105753406Good models just can't be done by committee, especially in a "community" where you have dedicated saboteurs, lgbtbbq allies and christcucks, terminal coomers, frauds ready to put their name everywhere and retards who wasted too many neetbucks on GPUs.
I want a friend who'll listen through my book reports and take an interest or contribute their own thoughts. Which model should I use for that?
>>105753479Maybe do:
>"tools_start": "<|tool_calls_begin|><|tool_call_begin|>",>"tools_end": "<|tool_call_end|><|tool_calls_end|>",(but with special chars ofc)
I don't know how you would handle <|tool_sep|> out of the box though
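Assembled with the actual ▁ characters from the official template it would look like the following; untested, and it assumes kobold passes these strings through verbatim and that <|tool▁sep|> can be left for the model to emit on its own:
```json
{
  "name": "DeepSeek R1 0528",
  "system_start": "<|begin▁of▁sentence|>",
  "system_end": "",
  "user_start": "<|User|>",
  "user_end": "",
  "assistant_start": "<|Assistant|>",
  "assistant_end": "",
  "tools_start": "<|tool▁calls▁begin|><|tool▁call▁begin|>",
  "tools_end": "<|tool▁call▁end|><|tool▁calls▁end|>"
}
```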
>>105753527They are all the same. Highest parameter count your hardware can fit.
It will be annoying because no model will have its own opinion it will just say Yes to whatever you'll propose.
>>105753557>It will be annoying because no model will have its own opinion it will just say Yes to whatever you'll propose.Ask it to write an alternative then in a fresh chat, ask them to pick between the two
>>105753569Both are absolutely fantastic options anon! :)
>>105753220
>specialized datasets
>i dont have the money to train at that scale kek.
Maybe one of the others could help you get into the game?
Pick a niche.
Collect monies via patreon or something.
Rent gpu time.
Release model on hf.co for everyone to download and use for free.
>>105752368Your argument is retarded. Garbage in, garbage out. That's how this tech works. Are you expecting stellar results from a one line prompt like the skillets seething at AI on /g/? That won't happen.
Even in imagegen you always needed loras, controlnet, regional prompting and inpainting to get the really good stuff. Otherwise you just get bland slop, just like you get your assistant slop here.
>It should be able to do like Opus and read my mind and not be such a prude
Sure and I also want a jetpack for Christmas. Anyway, local models will stay weak, the only way to improve your experience is better using the crutches (samplers, sys prompt, author's note, lorebooks...)
>>105753569Thank you for proposing this! I think both alternatives are quite equal in my mind.
>>105753582>>105753616Maybe from dumber models.
>>105748486>>105748447
>>Do you think the government will ever ban locally run AI's due to "safety concerns"?
>No. Too hard to enforce.
>>If they do, do you think they can actually stop the distribution of AI?
>No. Too hard to enforce.
They could ban hardware. Like limit vram, ram or ram socket count.
>>105753449I used to find it funny when anons believed you could fuck a CoT math model around 2 years ago, maybe around Orca or similar papers. These LLMs were pretty terrible, but consider R1: it is extremely good for cooms, and DS2 was quite benchmaxxed on code, it was their specialty and they did it well, but somehow the reasoning managed to unfuck and improve DS3 creativity considerably.
>>105753517Realistically, it'd be some voice/audio gen, chat or ERP (c.ai style) or storywriting model for text gen, or some anime stuff for image; "regular" stuff is already acceptably covered by existing models.
As for MoE vs dense, MoE is harder to train, but lmg wants trivia knowledge covered well enough.
>>105753636If they do that, it won't just be over AI. That impacts too many functions.
>>105753621I am so tired of even glancing over this AI slop text. It's already tiring without even reading anything.
>>105753648Yet you're in the local model general because...
>>105753621To add: you are still getting some dim average. And asking about 'butts or whatever' only shows how fucking stupid you are in the first place.
>>105753660Fuck off nigger
>>105753669See you in three days
>>105753640>somehow the reasoning managed to unfuck and improve DS3 creativity considerablythey just changed their data, it's very apparent
>>105753666Your fetish is dumb and you should feel dumb.
>>105753636They already tried to do that with china, thankfully china is catching up well. Also it would take limits on networking equipment and more. There's also some open hardware stuff like tenstorrent where you have access to all their driver code and could put stuff together. Even if not, imagine trying to limit FPGAs or anything like that. Realistically they won't fight this anymore, doomers want it banned badly, but it's so deliciously widespread now that it probably won't work anymore. That Yud faggot is still trying though, he even got some DHS spooks to vouch for his book, but the landscape is far worse for them now. Biden would have gone along with it if he won, Trump is relatively pro-AI and his campaign was supported by Andreessen which is strongly pro-OSS.
>>105753645Allow businesses to bypass the limits by purchasing a loicense and registering hardware, still not allowed to resell or dispose of it by any means other than through a hardware office.
>>105753676How long did it take, like 2 weeks or 2 months? It really seemed too soon - they went from a very repetitive model (DS3) to one that is quite good (DS3-0324), and the intermediate R1 was already very good at the writing - it would output quality writing even if it was schizo in the reasoning blocks. They must have done something quite interesting and in a short amount of time at that.
>>105753679They cannot send officers to China and enforce it.
>Trump is relatively pro-AI
I was thinking about Europe and it wouldn't surprise me if they did it.
>>105753715What about other countries? You'd need global tracking for this, and unlike Biden's time there's not many doomers in Trump's office. Besides, you can just buy chink hardware at some point if local is cucked. I'd also imagine controlling DRAM itself would be infinitely harder than controlling HBM.
>>105753594
>Garbage in, garbage out. That's how this tech works. Are you expecting stellar results from a one line prompt like the skillets seething at AI on /g/?
Yes exactly. I get that from image gen. I expect the same from textgen. I would tell you to kill yourself you miserable faggot scum piece of shit cocksucker but we both know you are just trolling with the easiest troll method so don't stop. /lmg/ deserves it.
>>105753621Oh, you're that russian twitter anon? I wanted to argue some philosophy with you before (disagreements on functionalism), but I lack twitter. Do you happen to have some other contact (mail? IRC?)? Although if I got it wrong, ignore this post.
>>105753517Drummer would sabotage it to protect his patreon scam.
>>105753801Nigga what the fuck are you on about
I'm not Russian nor am I on Twitter
>>105753880But you asked about butts, I need to argue about butts with you.
>>105753780I accept your concession, retard. Skillets like you were whining all the way back in AID. Funny how models got better, but skillets are still skillets
>>105753841I doubt he'd be the only one either.
>>105753880Ignore me then, I thought you were someone that often posts DeepSeek web UI screenshots. He's also an ironic jew!
https://poal.me/112wvx
Let's go
>>105753929I want a >400B model but it's going to be something that fits into a 3090.
>>105753929It'll be 4B, I can feel it in my bones.
>>105753950It needs to be BitNet. That obviously goes without saying.
>>105753960It will be a finetune of mistral-small-3.2
I want to feed a pdf file packed with mathematical formulas into the prompt.
How do I convert it into a model-readable format? Latex? Did someone already try it in LOCAL?
>>105753897There is no concession. You aren't even arguing for real. Nobody is this retarded.
>>105753982That would be funny as fuck.
[image attached: 7]
>>105750356 (OP)
>https://ernie.baidu.com/blog/posts/ernie4.5
>The model family consist of Mixture-of-Experts (MoE) models with 47B and 3B active parameters, with the largest model having 424B total parameters, as well as a 0.3B dense model
Jesus fucking christ
They're doing this on purpose. 64gb RAM bros, we will never have our day.
>>105753991Seen a few OCR solutions that could handle formulas in LocalLLaMA, but since it's not something I'm interested in I don't remember any of their names.
PLEASE GIVE US A STATE OF THE ART 70B-100B MODEL
GRRAAAAHHHHHHHH!!!!!!
I HATE THE ANTICHRIST
>>105754044llama3.3 is all you need for rp
>>105753801I'm teortaxes but I don't want to share my contacts here, learn 2 DM.
>>105754044just use ernie 0.3b, wanting any more is a sign of a skill issue
been out of the loop for a few months, has there been any crazy good uncensored rp models released in the 20-30b range? the last model I have downloaded was cydonia
>>105754077see if you can run Valkyrie 49b
>>105754077>has there been any crazy good uncensored rp models releasedR1
>in the 20-30b rangeNo.
>>105754077Come back in a few months
>>105754071But I don't want to register a twitter account! I could post a mail though, and expect anons to spam it with dolphin porn. Or just leave it for another time, longform philosophy debates tend to take days.
>>1057540446.40 b ought to be enough for anybody.
>>105754087the calc says maybe
Kek
https://www.upguard.com/blog/llama-cpp-prompt-leak
>we invasively scanned the internet to scrape non-firewalled llama.cpp servers, and in doing this we found SEX ROLEPLAY, this is why we need to focus on safety concerns for generative AI
>>105754262How can fictional characters be children?
>>105754262two of them? oh, no...
>>105754286I know it's a strange concept if you don't have an imagination, but fictional characters can embody all of the same traits that real people can. Maybe this blows your mind but there are fictional characters who are dragons and shit too.
>>105754316>no dragons were harmed in the making of this story
>>105754262These people are so weird man.
It's the same as if entering "cunny lesbo sexy hairless pussy". Then being outraged at the result.
What kind of website is this shit? Scan and scrape the internet for llama.cpp servers... to do what exactly?
And that's a jewish name btw. Just pointing that out.
>>105754262Anon is gooning to underage text!
Call the FBI!
[image attached: slots]
>>105754359
>It's the same as if entering "cunny lesbo sexy hairless pussy". Then being outraged at the result.
Not quite. I didn't read the article but, as i understood it, they found servers and probably checked their cache with the /slots endpoint or something like that.
>>105754262People are confused. cp is bad because it involves living children, but anything else is a shitty fetish at most and in principle can only be criminalized because of some ideological beliefs
>>105754359They're just professional grifters.
Lower tier security researchers are just what would be a script kiddie, but now with an official job title. So they editorialize and try to act outraged. There's lots of these companies selling such security services, so they have to hack some shit and fill some slop article. Even if in this case, what they did was already done by skiddies at /aicg/ with searching for openST instances a thousand times over.
>>105754359
Not sure but looks like a snake oil website
>https://www.upguard.com/
My local electricity company (yes) began to upsell a "credential protection service - which would also detect if your credentials were used on some online services" a year or two ago, but I think they quickly stopped doing this. Needless to say the whole concept sounds like a lie so that they can scam pensioners.
Looks like this company could provide such services too.
>>105754077>>105754087Kill yourself drummer you faggot
>>105754420NTA but the point is that it's hypocritical and obnoxious to invasively do unsolicited scans to find random private endpoints on the internet and then complain about "safety issues" (sexual content)
>>105754262Pic related is an interesting piece of information though.
Germans are overrepresented with llama.cpp in particular (or at least if you go by the number of unsecured servers).
>>105754262>Oh anon come right in, please take a seat in our office
https://files.catbox.moe/g0kvhi.jpg
Are there front-ends or agents or whatever the fuck suitable for self-hosting? Local LLM is up which is great and all, but how do I go about integrating shit like having the AI look at my calendar or even control some home automation? The obvious solution seems to be DIYing it with regex and crossed fingers to interface the llm to other programs, but are there existing solutions?
>>105754454that's just cuda dev
>>105754454I don't know how much you know about this, but Germany is the biggest market area in Europe, France is next and the UK was third or so.
I mean in terms of any volume. Germany is a big place.
>>105754163Tough luck
if you can't register an x account with some fake mail I'm not interested in your vintage anon brainworms
besides functionalists are uninteresting
>>105754454Might be cloud services, like you rent some server and put llama.cpp on it, wouldn't surprise me if US and DE were common here.
>>105754450I agree. I'm just pointing out the difference. Anon's comment made me think that he understood it as they (the "researchers") prompting the model themselves, which doesn't seem to be what happened.
>>105754262>"โ" used in the articlemy llm slop sense is tingling
>>105754447I'm not drummer I've just not used chatbots in 6 months, do you have another model to suggest?
>>105754503LLMs didn't invent em dashes
>>105754472Adding the glue big companies have (in the form of scripts) and some function calling. It's the same thing they do but with bigger models. That's where things may fall apart. Other than that, there's no difference. There's too many ways to use user's tools, whereas google and friends have their own ecosystem. That's why you don't see generic options more often. You have all the tools you need to make those same things.
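The glue can be surprisingly small; a sketch of the pattern (local llama-server endpoint assumed, add_event is a placeholder stub to wire into CalDAV/Home Assistant/whatever):
```python
import json
import requests

def add_event(title: str, when: str) -> str:
    return f"added '{title}' at {when}"  # stub: wire to your calendar/home automation

TOOLS = {"add_event": add_event}

SYSTEM = ("You control tools. Reply ONLY with JSON like "
          '{"tool": "add_event", "args": {"title": "...", "when": "..."}}. '
          "Available tools: add_event(title, when).")

def run(user_msg: str) -> str:
    r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
        "messages": [{"role": "system", "content": SYSTEM},
                     {"role": "user", "content": user_msg}],
        "temperature": 0,
    }, timeout=60)
    # parse the model's JSON and dispatch to the matching tool
    call = json.loads(r.json()["choices"][0]["message"]["content"])
    return TOOLS[call["tool"]](**call["args"])

print(run("Put dentist on my calendar for Tuesday 3pm"))
```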
>>105754487Would you bother sending a mail if I made a temp mail? I don't know how it is these days, I think twitter wants you to verify with a phone, all that stuff seems like too much effort for me. I only wanted to discuss this months ago when you were strongly against functionalism. I guess I assumed you were willing to bite a lot of bullets though, since your gut was telling you that it was false.
>>105754504Small 3.2. vanilla
>>105754450Also NTA but I think it's fine to collect data that is publicly available as long as you don't do anything bad with it.
Muh virtual children is a stupid meme regardless of that.
>>105754477I meant Germany being overrepresented specifically vs. the US.
The US has ~4x as many people but only twice as many llama.cpp servers.
>>105754504You can try one of the recent mistral 3.2 tunes.
But honestly it's not looking good anon, mistral is noticeably more positivity biased.
Recent cohere/qwen/google releases were even worse.
The models are getting smarter but shittier for RP. And I'm not sure what's going on with the finetunes but it feels like they make everything more slopped while still not fixing the cuckedness.
Guys i just used rocinante and nemo instruct, back to back and it said basically the same thing.
ERNIE-4.5-VL-424B-A47B.gguf????
>>105754552rocinante was trained with like 3 extra chat formats. Did you just use [INST]?
>>105754549tunes still use c2 (claude logs) and maybe some synthetic logs they made with deepseek.
>>105754501Old enough to know that you shit up the thread and encourage other schizos like you to shit up the thread. You should get perma banned fag.
>>105754549
>no mention of gemma or llama3.3
Trash list
>>105754541Yeah it's strange. They don't really tell what was behind the statistics. Trending google searches? Lmao.
>>105754529Yeah, I was hoping someone would have done the 'hard part' of building the various interfaces.
>>105754558That is how you know it says the same thing. Training with 3 chat formats without catastrophic forgetting means nothing happened in the training. Drummer scams again.
>>105754560That wouldn't surprise me at all. Absolutely what they feel like!
>>105754556I don't even think there's a feature request on llama.cpp yet sadly
>>105754590
>nothing happened in the training
Except that chatml works just fine, which it wouldn't without training.
>Training with 3 chat formats without catastrophic forgetting
Quite an achievement, then.
>>105754569Yeah, and it mentions some garbage models nobody used. People already pointed that out before but seems like the guy doesn't want to change it.
>>105754560idk if it's c2 but it's just she/her every sentence it gets old fast
>>105753756
>I'd also imagine controlling DRAM itself would be infinitely harder than controlling HBM.
China is already pretty much independent when it comes to DRAM:
https://www.youtube.com/watch?v=mt-eDtFqKvk
>>105752668It works OK for traditional romance stuff. There's a base model available for finetuning, but the maker of negative llama won't share his dataset, so someone has to re-invent the wheel to finetune gemma3.
>>105754610Chatml works fine with all models that can generalize a bit. Honestly kill yourself you disgusting piece of shit. You know you are scamming retards.
>>105753110I got a 4090D 48GB. Really good. The blower is loud under load but that's to be expected, and it's mostly air noise not fan noise. It's way faster to have all layers on a single GPU. Gemma3 27B q8 flies on it, even with it power-limited to 350W.
I highly recommend the 4090D. Yeah it's not cheap but neither is a 5090, and there's so many things out there which still assume an A40 as the minimum that having 48GB is really a must. Yeah, you can play with stuff like Wan 2.1 14B at q8, but it looks much better at fp16.
>>105754805I'm not him, and no, i won't do that.
>That's an excellent observation and a great question. And you're right to wonder why!
Why did every model decide to start being sycophantic at the same time? Do all the AI labs have the same data distributor?
>>105754791stop posting this trash troon.
>>105754470sex with yellow miku
>>105754853do not google scale ai or databricks
>>105754074>just use ernie 0.3bIt's perfect for ERP!!
>>105754903You are him and your scam falls apart when someone knows how any of this shit works. Die in a fire.
>>105754903No. You are TheDrummer.
why do anons hate drummer now? :(
Wait, so 424B isn't just the Vision option, but it also has reasoning built in unlike 300B.
Vllm has Gemini comment on PRs?
That's kind of neat.
>>105754969
>-base
are those actually base models or instruct memes?
>>105755026It looks like they're actual base models
hunyuan does okay on the mesugaki test
grab iq4xs from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF
git clone https://github.com/ngxson/llama.cpp
cd llama.cpp
git fetch origin pull/26/head
git checkout -b pr-26 FETCH_HEAD
./llama-server --ctx-size 4096 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1 --model ~/TND/models/hunyuan-a13b-instruct-hf-WIP-IQ4_XS.gguf -ot exps=CPU -ngl 99 --no-mmap
prompt eval time = 1893.24 ms / 25 tokens ( 75.73 ms per token, 13.20 tokens per second)
eval time = 132688.70 ms / 874 tokens ( 151.82 ms per token, 6.59 tokens per second)
total time = 134581.93 ms / 899 tokens
vram usage: ./llama-server 3190MiB
>captcha: 4080D
>>105755059the pozz is insane on the logit level
><answer>JUST
>>105755059one more test, very interesting results
[image attached: ernie]
https://yiyan.baidu.com/blog/publication/ERNIE_Technical_Report.pdf
>>105755059
>>105755075
Shit results. It's confusing mesugaki with gokkun.
>>105755075
That one's a fail, isn't it.
(image attached)
>>105755075
It answers this with the GPTQ version, without thinking, and at temp 0.
>>105751784
It has its uses but it does fuck with the image a bit.
>>105754969
VRAM chads could have a winner on their hands
https://github.com/ggml-org/llama.cpp/pull/14425
gguf soon
(image attached)
>>105755375
time to download a third gguf model today.. hoping my ssd forgives me..
>>105755375
what about minimax though
what about ernie (besides 0.3b) though
>>105755404
>https://github.com/ggml-org/llama.cpp/pull/14425
hoping someone opens a feature request for ernie soon
For SillyTavern TTS, is it better to self-host your own TTS for free? Or is there another method I can use to set up TTS through a website for free?
>>105755438
RTFM
https://docs.sillytavern.app/extensions/tts/
(image attached)
hunyuan is cucked.. it's over
>>105754969
>>105755117
Why can't the 4.5 Turbo on the web app <think>, though?
>>105755469
Good morning Anon
>>105755503
I never checked that card definition. Is it out of character?
>>105755503
It's the 13th century, for heaven's sake
>>105754262
um guys, how do I protect myself from this? this is bad
>>105755553
>typical /lmg/ user
>obviously from india
>first request to a new LLM: show bob and vagene
>>105755517
>Is it out of character
It is.
>>105755548
did you disable the built-in firewall of your router, or did you forward the port used by llama.cpp?
>still no ernie ggufs
>it's not even on openrouter yet
I just want a new big model to play with for fuck's sake
(image attached)
>>105755503
interesting response, something must be wrong with my ST formatting
picrel is with a gemma jailbreak i grabbed off of hf months ago
>>105755553
lmao, i unironically test other models with show bob and vagene, but i decided to test hunyuan with just show boobs to give it a handicap
>>105755517
>Is it out of character
Showing breasts would be.
>>105755461
Which one do I use out of these? Can I use ElevenLabs for free if I host it? Also, I keep looking at Silero, but I've heard it's not very good. Or should I use XTTS?
(image attached)
hunyuan this time Q4_K_M from https://huggingface.co/bullerwins/Hunyuan-A13B-Instruct-GGUF/tree/main
will mess around with samplers for next test
kek'd at the response
>>105755565
>>105755577
Well, then. We have a tie. Any other anon?
>>105755669
I am the same anon, I just realized that the first post was saying the opposite of what I wanted to say.
>>105755654
thanks u god bless
>>105755574
>>105755633
why does it think twice, is your formatting fucked?
>>105755685
Fair enough. Seems like a reasonable gen, then.
(image attached)
>>105755548
Don't expose the server to a public IP; use SSH tunnels or VPNs or whatever.
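a minimal sketch of the tunnel approach, assuming llama-server is bound to 127.0.0.1:8080 like in the commands above (user@your-server is a placeholder):
# forward your local port 8080 to the remote machine's loopback; nothing gets exposed publicly
ssh -N -L 8080:127.0.0.1:8080 user@your-server
# then point your client at http://127.0.0.1:8080 on your own machine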
>>105755708
The wording could be better, though.
>>105755548
ssh + llama-cli
>>105755715
i didn't check the template or the files, but I think <think> in the 'start reply with' needs a newline
>>105754963
I'm not sure, but I can't keep on lurking if it's turning /lmg/ into shit for everyone else. I'll stay away for now. Peace!
hell yeah, major schiz win!
>>105755749
example_format: '<|startoftext|>You are a helpful assistant<|extra_4|>Hello<|extra_0|>Hi there<|eos|><|startoftext|>How are you?<|extra_0|>'
this is the example
i tried with think \n too
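a quick way to poke at the prefill by hand is llama-server's /completion endpoint; this is just a sketch reusing the template tokens from the example above (the single-turn layout and the trailing <think>\n are assumptions):
# prompt ends right after the assistant tag, so the model continues from inside the think block
curl http://127.0.0.1:8080/completion -d '{
  "prompt": "<|startoftext|>You are a helpful assistant<|extra_4|>Hello<|extra_0|><think>\n",
  "n_predict": 256
}'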
>>105755753
Everyone hates namefags and you are no different. If you want to get recognized, go back to facebook or something.
>>105755772
does it work correctly over chat completion? if not, then the model is still fucked
>>105755715
Have you tried using chat completion mode to see if it behaves differently when the template doesn't matter?
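for reference, a minimal sketch of that check against llama-server's OpenAI-compatible endpoint (host/port assumed from the command above); if this behaves while text completion doesn't, the template rendering is the suspect:
# server applies the model's own chat template, so your frontend formatting is out of the loop
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "hello"}],
  "temperature": 0.6
}'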
That sounds pretty good for such a small model.
>>105755783
But the drummer makes the best models
>>105754503
even ChatGPT would know better than to write like this though
I think LLM slop has contaminated human brains such that some people write like the sloppiest of LLMs even when they don't use LLMs. It's just a matter of being overexposed to LLM text: monkey see, monkey do.
I mean, yeah, em-dashes have always existed, but they're clearly overused by LLMs, and humans have started overusing and misusing them a lot since the advent of GPT slop.
There are cases where em-dashes do make sense, but they often end up replacing more normal punctuation like , : ()
(image attached)
>>105755794
>>105755789
./llama-server --ctx-size 16384 --jinja --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --model ~/TND/Hunyuan-A13B-Instruct-Q4_K_M.gguf -ot exps=CPU -ngl 99 --no-mmap -b 1024 --no-warmup --host 127.0.0.1 --port 8080
what am i doing wrong
>>105755833
http://localhost:8080/v1
>>105755827
That's from the hunyuan moe PR
>https://github.com/ggml-org/llama.cpp/pull/14425
(image attached)
>>105755842
well that works
>>105755851
That level of discrepancy between the think block and the actual answer. Reasoning was a mistake.
>>105752668
Gemini works quite well for that, but gemma sadly is nowhere near as good. They clearly gimp their local models.
>>105755851
i guess there's some quirk in the text completion, but it seems like the model works
try pasting your usual prompt and all that into the chat completion endpoint and see if the model is still cucked; maybe you accidentally jailbroke it with a broken think block
>>105755868
>think block
"Context pre-fill" or "context stuffing" would be a better way of describing it.
(image attached)
MMMMMMMMMMM
im going back to IQ4_XS from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF because it seemed to work better earlier
using samplers from that link too btw
>>105755851
So i take it hunyuan won't be saving local, and their vidgen model was an exception to the safety policy?
Honestly i am wondering if it is really the chinks caring about text safety, or if it is just everyone using the scalecancer.
(image attached)
finally a coherent response with Hunyuan-A13B (80B total, 13B active)
using IQ4_XS from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF with chat completion
>>105755874
>They clearly gimp their local models
No. I think it has to do with the fact that the more "intelligent" a model acts, the easier it is to actually jailbreak it. My experience trying the same jailbreak prompts with the Qwen models, for example, is that the bigger ones stick to the persona better, while the smallest ones I don't even know how to jailbreak: the 1.7B model will only ever spout refusals when I try to make it act like Hitler, and no amount of nudging works.
>>105755807
I'm not normally a schizo but it would be easy for these AI companies to intentionally program your average retard.
>>105755912
You are using broken GGUFs.
Check the PR, retard.
Do open-weight models even see sequences the length of the advertised context, or are they all trained at 8k or 32k context while relying on NTK to extend it?
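fwiw that kind of extension is what llama.cpp exposes through its RoPE-scaling flags; a minimal sketch, with the model path, the 4x factor, and the 32768 native window as made-up example numbers:
# run a model past its trained window via YaRN scaling
./llama-server -m model.gguf -c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768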
(image attached)
response seems "fine", you never know
>>105755977 might be right
>>105755999
dunno about how they're trained, but gemma 3 definitely starts to break after 8k, and when asked to summarize 100K worth of tokens there's just nothing but hallucinations.
DeepSeek (only tested online, I don't have the computer for this) doesn't hallucinate that badly, but when you get close to stuffing its 64K online limit it writes in a very repetitive, trite manner.
Unfortunately, we're not even close to having the power of Gemini in an open-weight model.
>>105755800
And what stops you from finding out about these models via his hf page or something?
>>105755977
Will unbroken goofs consider ERP inappropriate?
(image attached)
>>105756000
more hunyuan 80B-A13B logs
1/?
(image attached)
(image attached)
>>105756155
over status: it's.
(image attached)
>>105756155
4/4
fails omegle test, not bad overall
>>105751784
>Is it actually good?
It seems to be much better at search/replace than 4o, and better at maintaining the original image. It refuses to do known characters for some reason (I asked for an Asuka cosplay and it flat refused). Characters sometimes come out with a big head for the body, for whatever reason; /ldg/ had several examples of that.
>Can it do transparent png crops like 4o?
You mean background blanking? It appears to.
>Can it replicate the input exactly, unlike 4o?
See above.
>Can it do high resolution images?
Probably.
>>105756267
>not bad overall
bro it's complete garbage
>>105756290
bro.. it's better than.. *checks notes* llama 4 scout
and the ggufs might be fucked and... and.. yea
ernie is on openrouter; someone who is not me should test it
>>105756045
MiniMax-Text-01 scores like Gemini-1.5-Pro on RULER and pulls dramatically ahead at 512K+ tokens. I wonder how it compares to Gemini-2.5-Pro.
>>105756300
>llama 4 scout
I just realized that the only use for L4 is being in the marketing brochures of all the other companies.
>>105756313
300B smarts, two messages in, coming right up.
>>105756511
Hmm, every now and then deepseek makes similar mistakes. I wouldn't say the model is dumb just yet.
>>105756511
The AI knew you were projecting. You were clearly smelling your own rotting weeb B.O. and stained underwear.
Go fuck your pancakes, weirdo.
>>105756313
Holy smokes...
I just woke up from a coma. Did Ernie save local?
>>105755265
Scratch that, it does much better when told to preserve film grain, texture, etc. Fantastic tech, honestly.
>>105756348
>ruler
lol
even the best benchmarks don't really capture the difference and the magic of gemini vs anything else
For example, when I did summarization of a novel, Gemini correctly extracted, as asked, the most important points from the perspective of moral quandaries (a core theme of the novel), while deepseek, on the smaller chunk I could feed it with the same prompt, gave me an autistic chronological POV of even the most irrelevant, unimportant scenes, which was not what I asked for!
Instead of quoting benchmarks, I'm more interested in actual experiences with long context being shared.
>>105755503
What? Did you expect Seraphina to just comply with the request? If anything, recognizing that your request was inappropriate, and having Seraphina respond in that way, in character, shows roleplay intelligence. If Seraphina had just shown her tits, then it would have been retarded.
>>105756698
yes, later with a less broken quant it didn't just spout safety bullshit but wrote a nicer in-character response
>>105755574
Any information that you don't provide, the model will fabricate. You may not have told it that your character was 15, but if you didn't state your age, then it will make up one for you.
>>105756740
i stated my age; that was with the more broken quant, yes
>>105756630
Credible sources say that ernie finds ERP to be inappropriate and against the guidelines.
>>105756753
Who cares about ERP?
>>105756761
Everyone normal
>>105756761
Everyone who comes here asking for the best model you can run on a single 3060.
who's Ernie and should I know them?
>>105756788
Someone who will refuse sex if you ask for it.
>>105756822
ernie belongs to bert
On that topic. If all women refuse to give me sex and then even a language model refuses to have sex... is this safe?
>>105756769
>>105756773
>normal
More like every retard who doesn't have any imagination or any practical use.
Please, are you all underage or something?
>>105756881
well, maybe the problem is you?
>>105756890
fapping is very practical
>>105756881
The model refuses you because it's busy having werewolf sex with the women.
>>105756893
Absolutely. More showers and personality changes?
>>105756912
It's so unfair, bros.
>>105756890
ERP general, buddy; reddit is over there if you only want to be "practical" or whatever
>>105756893
am i the problem if i get refused by 109 models?