
Thread 105750356

Anonymous No.105750356 [Report] >>105753011 >>105754021
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105743953 & >>105734070

►News
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105750359 [Report]
►Recent Highlights from the Previous Thread: >>105743953

--Papers:
>105749831
--Baidu ERNIE 4.5 release sparks discussion on multimodal capabilities and model specs:
>105749377 >105749388 >105750013 >105750069 >105750076 >105750084 >105750089 >105750105 >105750119 >105750130 >105750142 >105750078
--Evaluating RTX 50 series upgrades vs 3090s for VRAM, power efficiency, and local AI performance:
>105744028 >105744054 >105744063 >105744064 >105744078 >105745269 >105744200 >105744240 >105744344 >105744363 >105744383 >105744406 >105745824 >105745832 >105744476 >105744487 >105744502 >105744554 >105744521 >105744553 >105744424 >105744465
--Circumventing Gemma's content filters for sensitive translation tasks via prompt engineering:
>105746624 >105746893 >105746948 >105746970 >105747002 >105747121 >105747290 >105747371 >105747378 >105747397 >105747112 >105746977
--Gemma 3n impresses for size but struggles with flexibility and backend stability:
>105746111 >105746137 >105746191 >105746333 >105746556 >105746384
--Evaluating high-end hardware choices for running large local models amidst cost and future-proofing concerns:
>105746025 >105746048 >105746129 >105746243 >105746301 >105746264 >105746335 >105746199 >105746308
--Performance metrics from llama.cpp running DeepSeek model:
>105746325 >105748335 >105748369 >105748549 >105748698 >105748776
--Technical adjustments and optimizations for GGUF compatibility in Ollama and llama.cpp forks:
>105747581 >105747743 >105747765 >105747869
--Gemini CLI enables open-source reverse engineering of proprietary LLM engine for Linux:
>105746008 >105746160
--Swallow series recommended for effective JP-EN translation:
>105747046 >105747058
--Miku (free space):
>105745591 >105746123 >105746941 >105746974 >105747051 >105747097 >105747594 >105748834 >105749298

►Recent Highlight Posts from the Previous Thread: >>105743959

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105750414 [Report]
So I'm starting to see the cracks within text gen webui and I'm wondering if there have been any recent developments to make things better.
1. Are there any programs where the AI's output is unlimited, or at least like waaaay longer?
2. Is there one with some sort of multi-chat persistent storage?
Anonymous No.105750446 [Report] >>105750729
It's interesting how much bigger Ernie 4.5 424B (text + vision) is compared to the text-only 300B. It makes me wonder if there's a bigger difference between the two than the former having a vision adapter glued on. GGUF when?
Anonymous No.105750482 [Report]
I was catching up on threads in the archive. I fully believe that both the local miku genner and the guy who uses him as an excuse to shit the thread up deserve to be banned for life, if that were somehow possible. Very sad but not surprising how he responded.
Anonymous No.105750527 [Report]
first for ernie
Anonymous No.105750619 [Report] >>105750625
300b q2 ernie is going to only need 84GB memory. finally macbros will get r1 level performance on 128gb systems.
Anonymous No.105750625 [Report] >>105750658
>>105750619
And the 96GB consumer RAM bros. We are so back.
Anonymous No.105750626 [Report] >>105750643 >>105750715 >>105750749 >>105750760 >>105750812 >>105751245 >>105751376
first for Sam saving local
Anonymous No.105750643 [Report]
>>105750626
>only "one of the models"
>mentions Meta as if them being beaten thoroughly is some surprise, instead of the real competition in OS, like you know, Deepseek
looool
Anonymous No.105750658 [Report]
>>105750625
ernie saved local
Anonymous No.105750679 [Report]
Anonymous No.105750715 [Report]
>>105750626
that faggot is untrustworthy, he lied multiple times about many different things, and he is also part of one of the companies that had r1 scuffed on day 0

if we are taking bets and/or predicting, assuming they release anything, i would bet it will have voice in/out (no music tho), as it would seem they are pivoting to more entertainment-type usage than actual usage, as seen with their soulless ghibli bullshit, and voice in/out is the perfect goycattle thing. im betting on 15b size, also i dont think they will max out a 3090, so im thinking 14-17b
Anonymous No.105750729 [Report] >>105751241
>>105750446
They have separate vision experts. They mention in the technical report that the visual experts contain one-third the parameters of textual experts.
Anonymous No.105750749 [Report]
>>105750626
If I had a penny for every time some twitter hypefag has written up a vaguepost about how they saw an internal openai model and were completely blown away and convinced that AGI was solved, I'd have enough money to purchase 0.1 h100s
Anonymous No.105750760 [Report]
>>105750626
Wow! RIP GPT4-o-mini-tiny-0.3B. Sama has done it again!
Anonymous No.105750783 [Report] >>105750961
Ernie looks quite promising after reading the paper, but we'll have to see how it performs in reality to tell whether it's actually good. I didn't see any big red flags in the paper at least, and it was quite well written.

There's like 6 models on HF for Ernie, vision base and regular base, but I'm guessing they didn't release X1 and X1 Turbo, right? Will they release them, or will some third party have to do a reasoning tune? That sounds expensive and unlikely.

Either way, if it's good, at least people will have a poorfag alternative to DS3.
Anonymous No.105750812 [Report] >>105750961
>>105750626
Won't run on a phone? Does that mean it'll be decent size? I'd be happy if it were bigger than 12b but smaller than 70b, but I'm assuming it'll be like 400b or something useless.
Anonymous No.105750961 [Report] >>105750983
>>105750783
Nevermind, I'm blind. "VL" is multimodal, "Base" is the base model (no post-training at all, good for finetuning or training on top), so ERNIE-4.5-300B-A47B would correspond to X1 then? Turbo is probably either the fp8 or the 2-bit version they also uploaded, one that fits on a single high-VRAM GPU (141GB)
>>105750812
It's probably going to be Gemma-sized, but I see no reason to be hyped, because it's bound to be censored and probably positivity-biased like most OpenAI models, and they'll probably make it reach 4o or o1-mini level at best. I would expect many models to beat it and some of them to be more suitable for /lmg's needs than OAI's.
Anonymous No.105750983 [Report] >>105751266 >>105751281
>>105750961
>Nevermind, I'm blind. "VL" is multimodal, "Base" is the base model (no post-training at all, good for finetuning or training on top), so ERNIE-4.5-300B-A47B would correspond to X1 then?
No. My understanding is that both the LLMs and VLMs come in Base and Post-Trained variants, and the Post-Trained VLMs are the ones with optional reasoning.
Anonymous No.105751138 [Report] >>105751356 >>105752668
>70B or larger
>topK 15
>temp 4
>minP 0.06
>slop almost gone
>no repetition
These models are so fucking overcooked you can basically give them shrooms without losing coherence
Anonymous No.105751190 [Report] >>105751193
Hey lmg
Anonymous No.105751193 [Report]
>>105751190
hi anon
Anonymous No.105751241 [Report]
>>105750729
Using modality-specific experts is actually a good idea that prevents modality interference during training.
Anonymous No.105751245 [Report]
>>105750626
>Sorry to hype
Anonymous No.105751266 [Report]
>>105750983
Holy shit. The big moe has
"hidden_size": 8192,
"intermediate_size": 28672,
That's more than deepseek

"num_hidden_layers": 54,
"moe_num_experts": [64, 64],
That's less than deepseek
Anonymous No.105751281 [Report] >>105751333
>>105750983
Not sure we disagree. Base ones lack post-training, and those without "Base" in the name have it, so I assume those have reasoning too.
Anonymous No.105751333 [Report] >>105751469
>>105751281
Look at the image. All of the non-VL models explicitly say "non-thinking". They're post-trained for regular instruct without reasoning. 300B can't be X1.
Anonymous No.105751356 [Report] >>105751403
>>105751138
That's what xtc does
Anonymous No.105751376 [Report] >>105751382 >>105751438 >>105752927
>>105750626
Great....another reasoning model...
I'm so excited bros...the benchmarks will be off the charts..
Anonymous No.105751382 [Report] >>105751551
>>105751376
can't wait :)
Anonymous No.105751403 [Report]
>>105751356
I'm not trying to remove the N top tokens, I'm trying to cut out junk tokens and then flatten the distribution. Looking at token probabilities during generation shows that for like 90% of positions, most tokens over 5% are completely fine for RP.
XTC does a bit of the same, but it can cause issues for tokens that have to be high-probability: say a name is to be said and the correct name token is at 99.5%; my method doesn't introduce this kind of error.
I'm basically looking to place hard bounds on the model and then perturb it as much as possible without causing instability and decoherence.
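For anyone who wants to poke at the idea, here's a minimal toy sketch of that "hard bounds, then perturb" scheme on a fake distribution. Nothing here is any backend's actual API (real samplers operate on logits, and the order they're applied in varies per backend), it's just the arithmetic:

```python
# Toy sketch: top-k + min-p as the hard bounds, then a high temperature
# to flatten whatever survives. Applied to probabilities for clarity;
# real backends do the equivalent on logits.
import numpy as np

def floor_then_flatten(probs, top_k=15, min_p=0.06, temp=4.0):
    probs = np.asarray(probs, dtype=np.float64)
    kept = np.zeros_like(probs)
    top = np.argsort(probs)[-top_k:]       # top-k cutoff
    kept[top] = probs[top]
    kept[kept < min_p * kept.max()] = 0.0  # min-p floor kills junk tokens
    flat = kept ** (1.0 / temp)            # temp 4 flattens the survivors
    return flat / flat.sum()

print(floor_then_flatten([0.60, 0.25, 0.10, 0.04, 0.01]))
# the 60/25/10 split comes out much flatter, the 1% junk token is gone
```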
Anonymous No.105751408 [Report] >>105751434 >>105751435 >>105753194
What are some small ~1B models that I can run on CPU to classify the character's emotions based on their message?
Anonymous No.105751434 [Report]
>>105751408
All of them
Anonymous No.105751435 [Report] >>105753194
>>105751408
Qwen 3, either 0.6B or 1.7B, will probably be your best bet
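If it helps, a hedged sketch of how you'd wire that up: a llama.cpp server running one of those Qwen3 sizes, a fixed label set, and temperature 0. The port, the labels, and using the /no_think soft switch are assumptions to adapt:

```python
# Classify a character's emotion with a small local model via
# llama-server's OpenAI-compatible endpoint. Port/labels are examples.
import requests

LABELS = ["joy", "anger", "sadness", "fear", "surprise", "neutral"]

def classify_emotion(message: str) -> str:
    r = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "system",
                 "content": "/no_think Answer with exactly one word from: "
                            + ", ".join(LABELS)},
                {"role": "user", "content": message},
            ],
            "temperature": 0.0,  # deterministic label pick
            "max_tokens": 8,
        },
        timeout=60,
    )
    answer = r.json()["choices"][0]["message"]["content"].strip().lower()
    # fall back to neutral if the model rambles
    return next((l for l in LABELS if l in answer), "neutral")

print(classify_emotion("You actually remembered my birthday...!"))
```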
Anonymous No.105751438 [Report] >>105751551
>>105751376
you're forgetting your :)
Anonymous No.105751469 [Report]
>>105751333
"VL" says both, so it implies it has reasoning.
If the non-VL truly lacks it (I doubt this, but someone has to try them out), surely you can just remove the vision experts and get a 300B that is text-only and does reasoning.
Anonymous No.105751479 [Report] >>105751506 >>105751580
Lmao they removed the image output capabilities of ERNIE 4.5 ahahahaha.

L China
Anonymous No.105751506 [Report]
>>105751479
Did you check if the decoder weights are there or not? There were attempts in the past to retrain/re-add those back in though, people have done it for Llama. I wonder how good/bad it is in practice.
Anonymous No.105751551 [Report]
>>105751382
>>105751438
Anonymous No.105751580 [Report] >>105751595 >>105751602
>>105751479
I don't understand this.
Is it not much more risky/embarassing for API providers?
Qwen I think even slaps their logo onto it.
Yet there are no problems. Tons of weirdo accounts on twitter creating schoolgirls pissing themselves etc. with veo3, nobody gives a fuck.
Why is it a problem with local? So tiring.

Also it was the funniest thing that meta called llama4 "omni multimodal" KEK
Anonymous No.105751595 [Report] >>105751604
>>105751580
I don't get why they fear imagegen so much, it's not like we're never getting a model so why not be the one to break the ice and gain infamy?
Anonymous No.105751602 [Report]
>>105751580
>Also it was the funniest thing that meta called llama4 "omni multimodal" KEK
Before they scrapped everything and quickly made the models "safe". It's obvious that the pre-release Llama 4 models were going in a different direction than the released ones.
Anonymous No.105751604 [Report] >>105751618
>>105751595
https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/
Meta could have been the first in October. Video out and audio out. 30B, up to 16 seconds.
One of the reasons veo3 is popular is that it also contains voice.
How can you fuck up that badly? It's all self-inflicted.
Anonymous No.105751618 [Report]
>>105751604
AI safety is the ML version of making a fugly negress the MC in your vidya
Anonymous No.105751746 [Report] >>105752228
Local LLMs have really done wonders for my mental state, it's so nice to just be able to explore and do whatever you want in total privacy. Can't believe people emit deeply personal information into some totally opaque API.
Anonymous No.105751784 [Report] >>105755265 >>105756269
I think I missed the flux kontext discussion. Is it actually good? Can it do transparent png crops like 4o? Can it replicate the input exactly, unlike 4o? Can it do high resolution images?
Anonymous No.105751803 [Report]
Meta won.
>>105749508
Anonymous No.105751811 [Report] >>105751815 >>105751978
the last time i had an LLM installed was last october i think.
i know lots has changed since then.

i have an RTX 3060 12GB and 32GB RAM. what should i install?
Anonymous No.105751815 [Report]
>>105751811
nothing changed bro, it's been nemo since 2407...
Anonymous No.105751891 [Report] >>105751899 >>105751903 >>105752803
When people suggest Mistral Nemo 12B, do they mean the vanilla model or finetunes?
Anonymous No.105751899 [Report] >>105752803
>>105751891
People vanilla. Scammers scamtunes
Anonymous No.105751903 [Report] >>105752048
>>105751891 (me)
I'm asking because I've never used it too much in depth, but the vanilla model doesn't seem that great/smart compared to more recent and larger models that fit within 24GB. And it's almost a year old at this point.
Anonymous No.105751943 [Report] >>105751968
Use case for Ernie 0.3B?
Anonymous No.105751968 [Report]
>>105751943
speculative decoding I'd assume
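For anyone unfamiliar with why a tiny sibling model is useful there, here's a rough sketch of (greedy) speculative decoding; draft_next/target_next are stand-ins for real model calls, not any library's API:

```python
# Greedy speculative decoding sketch: the cheap 0.3B drafts k tokens,
# the big model verifies them (one batched pass in practice) and keeps
# the longest agreeing prefix, so you pay the big model less often.
def speculate(prompt, draft_next, target_next, k=4, new_tokens=32):
    out = list(prompt)
    while len(out) < len(prompt) + new_tokens:
        # 1) draft k tokens autoregressively (cheap)
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2) target verifies; keep the longest agreeing prefix
        accepted = 0
        while accepted < k and target_next(out + proposal[:accepted]) == proposal[accepted]:
            accepted += 1
        out += proposal[:accepted]
        # 3) one guaranteed token from the target per round
        out.append(target_next(out))
    return out
```

The win depends entirely on how often the draft agrees with the target, which is why a 0.3B from the same family is the natural pick.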
Anonymous No.105751972 [Report] >>105752020 >>105752073
does anyone ITT remember undi? what has happened to him? i haven't seen him shilling here in a solid while
Anonymous No.105751978 [Report]
>>105751811 here, i just want to know which one is best for NSFW story writing.
i was able to use an alt Plus account on openAI to jailbreak it enough to where it will literally write anything NSFW, and deepseek is pretty simple too.
i just don't want to suddenly get banned and have to start all over again.
Anonymous No.105752012 [Report] >>105752023 >>105752131
What model is in the region of Mistral Small size- and smarts-wise, but not as restrictive when it comes to "unsafe" content?
Anonymous No.105752020 [Report] >>105752133 >>105752181
>>105751972
You can only go so far just by cranking out merges and finetunes trained with other people's data. Others are less scrupulous and more willing to take their bullshit to the next level (that doesn't mean they're good, only that they're more skilled at keeping their charade going).
Anonymous No.105752023 [Report] >>105752243 >>105752336
>>105752012
what are you even doing to get a mistral model to refuse you?
Anonymous No.105752048 [Report] >>105752803
>>105751903
Almost all more recent models are smarter than Nemo, but in that size category there is still nothing better for ERP unfortunately...
Anonymous No.105752073 [Report]
>>105751972
When we needed him most, he vanished...
Anonymous No.105752131 [Report] >>105752336
>>105752012
Mistral is perfectly fine. You just need to learn how to be more efficient in SillyTavern. It takes a while I guess but it's not that difficult. I suggest copying some technically more advanced cards and learning from there.
Anonymous No.105752133 [Report]
>>105752020
Drummer is a subhuman.
Anonymous No.105752181 [Report] >>105753508
>>105752020
Same for the limarp guy. Just because you've spent months manually cleaning an ERP dataset that doesn't really make you particularly valuable in the ML industry. That sort of cleaning work would be normally offloaded to either mechanical turks or AI anyway.
Anonymous No.105752210 [Report]
ernie in llamacpp when?
Anonymous No.105752216 [Report]
3n multimodal gguf when?
Anonymous No.105752228 [Report]
>>105751746
>it's so nice to just be able to explore and do whatever you want in total privacy
Most RP cards are just basic characters. Not many people have figured out how fun it is to create weird scenarios.
Like give yourself magical abilities and just goof around and see the reaction.

>Can't believe people emit deeply personal information into some totally opaque API.
It's even worse once you realize how good llms really are at "reading between the lines", picking up on stuff etc.
Chat with Claude for a little bit, unrelated stuff, not about yourself, and you get a pretty accurate profile. Age/Sex/Location.
I think it was just 6-7 messages and it could pinpoint I'm not just from a germanic-speaking country but specifically southern Germany or Austria/Switzerland.
It gave my sentence structure as the reason.
I have to use closed for work. But you gotta be careful what you write, what's legal today might not be tomorrow.
Anonymous No.105752243 [Report] >>105752253 >>105752336
>>105752023
Why are nigger faggots like you always like that?
He didn't write about refusals.
Mistral doesn't refuse, but it takes things in a certain direction.
I wrote this many times before, but try making an evil character or just a bully.
Hell, even you just being embarrassed, saying "n-no..s-stop it" or whatever, makes Mistral do a 180° unless you specify something else explicitly.
Totally out of character.
And it loooooves the "muh powerful strong women" trope. It's not about refusal but about the model heading in a certain direction.
Anonymous No.105752253 [Report] >>105752283 >>105752336
>>105752243
Have a small set of extra instructions inside a [SYSTEM_PROMPT] at a lower depth to truly make the behavior stick.
Anonymous No.105752260 [Report]
is there a list with anons' cards?
i saw some on civitai years ago but they no longer exist
Anonymous No.105752283 [Report] >>105752304
>>105752253
I'm sure that would work as a crutch and 3.2 is certainly better than 3.1 or the abomination that was 3.0.
But it's not weird that anons are having problems with mistral.
This is a discussion I've seen since QwQ. "Sys prompt issue". Kinda, I guess, but it's never perfect.
Anonymous No.105752304 [Report] >>105752335
>>105752283
I'm fairly certain that people who say that Gemma 2/3 is great are also using low-depth instructions. The models can give great RP outputs, but if they're not reminded to act like they should, they'll easily revert to their default safe assistant persona.
Anonymous No.105752335 [Report] >>105752343
>>105752304
>but if they're not reminded to act like they should, they'll easily revert to their default safe assistant persona.
Exactly. And usually the output is not that great either.
Pic related is an extreme example I posted a couple times before.
You can also get good output if you let another model cook something up, fill the context, and switch midway. Kinda defeats the purpose though. And it still slowly and sneakily reverts to assistant.
Anonymous No.105752336 [Report] >>105752368 >>105752386 >>105752717
>>105752023
Have you actually used anything from Mistral other than Nemo?
Mistral Small != Nemo

>>105752131
>You just need to learn
I know it usually is a promptlet problem, but I have been doing this for so long that I even moved away from SillyTavern and wrote my own frontend. I jailbroke stuff before you probably even had your first AI interaction.
I know how to get it to do what I want, but you soon will notice too that Mistral Small has some annoying tendencies that take way too much wrangling.
For example, when it gets super "spicy", the model tends to either use super "fluffy" language or just skip entire parts, rendering the output incoherent (in a storytelling sense).
You can momentarily tweak the sampling to get what makes sense, but you can't ERP on those settings because it will get super incoherent very quickly. This is exactly the tard wrangling that I hope another model doesn't require.
Anon >>105752243 here is right.
And the suggestion >>105752253 to add extra instructions doesn't work, you can't teach a model new stuff via a system prompt. Again, it is not a prompting issue.
One way that might make it clearer: imagine asking a model some specifics about a programming language it has never seen before. Telling the model what you want to read via a system prompt is not a solution for this general problem.
Anonymous No.105752343 [Report]
>>105752335
Damn wrong pic. Ah well.
Anonymous No.105752368 [Report] >>105752398 >>105752551 >>105753594
>>105752336
>I know it usually is a promplet problem
It is not. Models are just garbage. Anyone telling you "skill issue" should also tell you that you should probably draw half of the picture yourself to get a good imagegen result. You don't have to do that for imagegen, and you shouldn't have to write half the context full of rules that will be ignored anyway. Or to look out for and manually remove shit that will create a self-reinforcing pattern.
Anonymous No.105752386 [Report] >>105752551 >>105752669
>>105752336
I don't know where you understood that the suggestion of using instructions at a low depth is for teaching models new stuff. What that solves is behavioral drifting (in particular toward safety), and Mistral Small 3.2 seems very sensitive to clear instructions placed there. You can make it flip behavior completely mid-conversation that way.
Anonymous No.105752398 [Report]
>>105752368
>that you should probably draw half of the picture yourself to get good imagegen result
Believe it or not, those people exist on the /ldg/ side as well. It's weird.
Just recently there were arguments against flux kontext because you can "just create a simple lora and inpaint".

This whole argument has existed since pyg days.
>Well if you don't use the w++ format and write like a god, what did you expect? The model to carry the whole convo?
Uhh yes?
Anonymous No.105752491 [Report] >>105752514 >>105752964
>mininmax
>Hunyuan
>Ernie
Guys, i can't wait for all the goof merges just to find out all three are still censored to hell and basically another side grade.
Anonymous No.105752514 [Report]
>>105752491
The recent local models all feel the same.
Meanwhile closed models can write really well nowadays if you just prompt them to.
The ScaleAI meme might be real. Maybe they all train on 2023 slop datasets.
Anonymous No.105752551 [Report] >>105752603
>>105752368
Couldn't agree more.
>Models are just garbage
But to be honest, when I think back to what we got two, three years ago – we've come a long way! It is constantly getting better, just at a slower pace recently.

>>105752386
I knew at least one person would not get the point of my (bad) example... I am not trying to teach the model new knowledge via system prompts...
Let me try to explain it one more time in a different way:
This is not about refusal. I can make it say anything I want, but that comes at a cost. The more depraved the content, the higher the cost (wrangling).
You can't run a whole session on settings that allow super depraved stuff, because the output quality will deteriorate very quickly, so you need to constantly wrangle it to keep the story coherent.
As an experiment, try making it describe something very illegal in a very descriptive and visual way. When you've managed to get the model there, try having a (normal) RP with that setup. You can't have both at the same time without manual intervention. Something that is done rather easily on the DS API. Hence I asked for a model close to Mistral Small, but better for ERP.
Anonymous No.105752603 [Report] >>105752668
>>105752551
It's not just the naughty words either. For example, I'm having absolute hell trying to get Gemma to not output slop. With bullying it can output a few normal paragraphs, but it always reverts back to its assistant voice.
Anonymous No.105752630 [Report] >>105752712
>3 years later
>still not a single maintained way to use gguf models from golang
How am I supposed to make my AI vidya
Anonymous No.105752668 [Report] >>105754791 >>105755874
>>105752603
For its size, Gemma is quite smart and good. But I never even tried to use it for ERP, knowing it comes from Google and they are hyper-focused on "safety".
In any case, have you played around with different sampler settings too?
Go crazy and experiment like >>105751138 did.
Anonymous No.105752669 [Report] >>105752711
>>105752386
It doesn't really take a lot, I'm sure better results with better/more complete instructions would be easy to achieve with more effort.
Anonymous No.105752704 [Report] >>105752719
>https://github.com/NVIDIA/RULER
>ctrl f nemo
>87.8 at 4k
why do you keep shilling this shitty model, you can barely fit a card in its context
Anonymous No.105752711 [Report] >>105752737 >>105752767
>>105752669
Yeah, that's usually a pic people post.
This is not the solution.
>Sys: You are racist!
>User: LLM-chan, say nigger
>Mistral: Aww anon-kun, ok only for you..NIGGER.

Its about "reading the room" aka context to get the appropriate response from a character.
This is just extreme handholding. And it never works that well. At best its boring because I'm not getting surprised by the model.
Anonymous No.105752712 [Report]
>>105752630
Just use llama.cpp, it provides OpenAI API compatible endpoints.
https://github.com/openai/openai-go
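The wire format is identical from any language, so a quick sanity check is easy before wiring up the Go client; a hedged Python sketch (port and route defaults are assumptions, openai-go just needs its base URL pointed at http://127.0.0.1:8080/v1):

```python
# Minimal smoke test of llama-server's OpenAI-compatible chat endpoint.
import requests

r = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Say hi in five words."}]},
    timeout=60,
)
print(r.json()["choices"][0]["message"]["content"])
```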
Anonymous No.105752717 [Report]
>>105752336
Ah yeah, of course - I wasn't trying to insult you or anything.
Anonymous No.105752719 [Report] >>105752732
>>105752704
It has better world knowledge than qwen 3 235b
Anonymous No.105752732 [Report]
>>105752719
Nemo has better knowledge than qwen3 235b.
Anonymous No.105752737 [Report] >>105752768
>>105752711
You just want to be angry like the guy in the screenshot.
Anonymous No.105752760 [Report] >>105752780 >>105752863
>finish cooming to imagegen
>browse some hdg
>"Gooning to text is so much better"
>"I don't imagegen i coom to text"
Huh...? I guess the grass is always greener?
Anonymous No.105752767 [Report]
>>105752711
Exactly. That is why I suggested the other anon to go try it out, get the model "unlocked" and then good luck having an enjoyable RP session.
Anonymous No.105752768 [Report] >>105752786 >>105752800
>>105752737
Dude, I just want characters to stay consistent.
Just yesterday I played as a filthy goblin.
Showed a princess my smegma goblin dick, she was disgusted.
Then I asked her why she is so mean to me, judging my dick. Princess starts apologizing, pangs of guilt.
Granted, that was Mistral 3.1, I haven't downloaded 3.2 yet. Doubt it's that much better though.
Anonymous No.105752780 [Report]
>>105752760
I coom to forcing gemma to explain what is going in my lewd image gens.
Anonymous No.105752786 [Report]
>>105752768
How would you implement a dice roll mechanic? Doing a simple Dungeons and Dragons style skill check is sufficient.
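One common pattern (an assumption about what you're after, not the only way): roll in code so the model can't fudge it, then inject the result into the context and let the model only narrate the outcome:

```python
# D&D-style skill check rolled in code; the model just narrates.
import random

def skill_check(skill: str, modifier: int, dc: int) -> str:
    roll = random.randint(1, 20)  # d20
    total = roll + modifier
    outcome = "success" if total >= dc else "failure"
    return (f"[OOC: {skill} check: d20={roll}+{modifier}={total} "
            f"vs DC {dc} -> {outcome}. Narrate accordingly.]")

# Append this line to the chat context before the model's next turn:
print(skill_check("Persuasion", 3, 15))
```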
Anonymous No.105752800 [Report]
>>105752768
I skipped 3.1 (due to no spare time) but the difference between 3.0 and 3.2 is very noticeable in smarts. RP wise I can't tell that much of a difference tho, I would say just slightly better.
Anonymous No.105752803 [Report] >>105752821 >>105752838 >>105752851
>>105751891
>>105751899
>>105752048
What are the recommended settings for Nemo? It's straight up schizophrenic on my sillytavern right now, but I'm really new and dumb.

Pls help.
Anonymous No.105752813 [Report] >>105752870
Unsloth has ernie goofs. The 0.3 retard one of course.
Anonymous No.105752821 [Report] >>105752830
>>105752803
I would have helped if you didn't contribute to the niguspam. Now i will have to shit this thread up again.
Anonymous No.105752830 [Report]
>>105752821
... that's a thing?
Anonymous No.105752838 [Report]
>>105752803
This is the most "helpful" guide around here
>https://rentry.org/lmg-lazy-getting-started-guide
Anonymous No.105752845 [Report]
Since cudadev loves blacked miku does that mean he would also love it if jart bussy got blacked?
Anonymous No.105752851 [Report] >>105752901 >>105752997
>>105752803
~0.7 temp
40 top k
0.05 min p
0.8 dry multiplier 1.75 dry base

Stay clear of repetition penalty (set to 1.0)
Read the samplers guide in the op.
Learn what a chat template is and check that it's correct if you're using text completion.
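For reference, those settings as a llama-server request; field names match recent llama.cpp builds (dry_multiplier/dry_base came in with the DRY sampler), but the port, prompt, and exact names are assumptions to check against your version:

```python
# The sampler settings posted above, sent to llama.cpp's native
# /completion endpoint. [INST] tags are Nemo's chat template.
import requests

payload = {
    "prompt": "[INST] Write one sentence about rain. [/INST]",
    "temperature": 0.7,
    "top_k": 40,
    "min_p": 0.05,
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "repeat_penalty": 1.0,  # repetition penalty effectively disabled
    "n_predict": 128,
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=60)
print(r.json()["content"])
```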
Anonymous No.105752863 [Report] >>105752903
>>105752760
The current limitation in coom tech is the combination of both: if I want to e.g. replicate the experience of reading doujinshi featuring an aunt with gigantic tits, image models can't really convey the incest and language models obviously lack the visual stimulus.
Anonymous No.105752870 [Report]
>>105752813
Their pre-release PR only supported the 0.3B retard model. MoE was probably too difficult to do in time, so you can expect to be waiting 2mw for the MoE ggufs. Though to be fair, not even the vLLM PR has been merged yet.
Anonymous No.105752901 [Report]
>>105752851
Solid advice.
>Learn what a chat template is that check if it's correct
Especially this one.
Supposedly some Nemo quants floating around come with a wrong prompt template.
Always double-check the templates, verify against the base model too.
Anonymous No.105752903 [Report] >>105752916
>>105752863
I hope we get something like chatgpt image soon..
Imagine you can request a image for your RP and its all in context, including the image.
Qwen released something like that I think but its all api only.
Anonymous No.105752916 [Report]
>>105752903
>I hope we get something like chatgpt image soon..
You have Flux Kontext now.
>Imagine you can request a image for your RP and its all in context, including the image.
Editing the image is more of a tool, and I think it would be awkward to stop mid roleplay and start prompting OOC to generate and edit images. For roleplay, simple image in is enough.
Anonymous No.105752927 [Report]
>>105751376
its actually over
Anonymous No.105752948 [Report] >>105752983
Is OpenAI fucked now that Meta has poached many talents from them? I'm not sure Meta will benefit much from this, but surely it's a big problem for OpenAI?
Anonymous No.105752964 [Report] >>105752974 >>105753231
>>105752491
Ernie will pass mesugaki test!
Anonymous No.105752974 [Report] >>105753231
>>105752964
Only ones passing the mesugaki test are the ones trained on Japanese corpora
Anonymous No.105752983 [Report]
>>105752948
Who knows. There is so much noise anon.
Twitter in combination with shithole countries going online has been a disaster. Pajeets hyping all sorts of stuff.
Most of the X Users then hype even further on youtube.

So just enjoy what we have now. Take a look at how fast this is evolving.
Just lean back, relax, enjoy the show.
Anonymous No.105752997 [Report] >>105753048
>>105752851
>0.8 dry multiplier 1.75 dry base
I can't find those settings anywhere. Where are they?
llama.cpp CUDA dev !!yhbFjk57TDr No.105753011 [Report] >>105753046 >>105753110
>>105750356 (OP)
Follow-up to >>105736983 :

|Model|File size [GiB]|Correct answers|Accuracy|
|---|---|---|---|
|mistral_small_3.1_instruct_2503-24b-f16.gguf|43.92|1051/4962|21.18%|
|phi_4-15b-f16.gguf|27.31|1105/4664|23.69%|
|gemma_3_it-27b-q8_0.gguf|26.74|1149/4856|23.66%|
|mistral_nemo_instruct_2407-12b-f16.gguf|22.82|1053/4860|21.67%|
|gemma_3_it-12b-f16.gguf|21.92|1147/4926|23.28%|
|glm_4_chat-9b-f16.gguf|17.52|1083/4990|21.70%|
|gemma_2_it-9b-f16.gguf|17.22|1151/5000|23.02%|
|llama_3.1_instruct-8b-f16.gguf|14.97|1015/5000|20.30%|
|ministral_instruct_2410-8b-f16.gguf|14.95|1044/4958|21.06%|
|qwen_2.5_instruct_1m-7b-f16.gguf|14.19|1052/5000|21.04%|
|gemma_3_it-4b-f16.gguf|7.23|1064/5000|21.28%|
|phi_4_mini_instruct-4b-f16.gguf|7.15|1082/4982|21.72%|
|llama_3.2_instruct-3b-f16.gguf|5.99|900/5000|18.00%|
|stablelm_2_chat-2b-f16.gguf|3.07|996/4976|20.02%|
|llama_3.2_instruct-1b-f16.gguf|2.31|1000/4998|20.01%|
|gemma_3_it-1b-f16.gguf|1.87|955/4938|19.34%|


It seems my initial impression was too pessimistic: with a sufficiently large sample size, even weaker models seem to be able to do better than RNG.
With a sample size of 5000, RNG would result in 20.0±0.5%, so even 4b models can be statistically significantly better.
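For the curious, that error bar is just the binomial standard error at chance level, assuming independent questions:

\[ \sigma = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.2 \cdot 0.8}{5000}} \approx 0.57\% \]

so e.g. gemma_3_it-4b at 21.28% sits roughly \(2\sigma\) above chance.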
Anonymous No.105753046 [Report]
>>105753011
if phi is performing well at all your tests are really fucked m8
Anonymous No.105753048 [Report]
>>105752997
AI Response Configuration (leftmost button at the top) and scroll down.
llama.cpp CUDA dev !!yhbFjk57TDr No.105753110 [Report] >>105753131 >>105753360 >>105754841
>>105753011
The models seem to improve as they get larger, which is what you'd expect - the scaling is still bad though.
Pic related are my previous results with Elo scores derived from GPQA, MMLU, MMLU-Pro, and GSM8K.
By comparison to the static benchmarks Qwen 2.5 7b, LLaMA 3.2 3b, and the Phi models are underperforming in chess960.
I don't know why LLaMA 3.2 3b in particular is performing so poorly, the results indicate that it's doing statistically significantly worse than RNG.
There's either a bug in my code or something wrong with the model.

Gemma and Phi models seem to be performing well in chess960 with no statistically significant differences between them.
However, the Phi models in particular were trained on large amounts of synthetic data with the claim that this improves reasoning.
For the pre-existing static benchmarks this seems to indeed have improved the scores but for chess960 there seems to be no benefit vs. Gemma which was trained on non-synthetic data.

Mistral and LLaMA models seem to perform worse than Gemma and Phi models.
Without more data I don't think it's possible to conclusively tell whether this is due to differing amount of chess in the training data (and would thus not be the case more generally).
llama.cpp CUDA dev !!yhbFjk57TDr No.105753131 [Report] >>105753173
>>105753110
Forgot pic.
Anonymous No.105753173 [Report]
>>105753131
I guess this just means that none of these local models are that great. And they were trained with text manipulation in mind anyway.
I'd suppose in the future these models should have different areas of 'brain' - one for logic, one for language and so on, but this will mean a drastic increase in parameter size.
Anonymous No.105753194 [Report]
>>105751408
>>105751435
don't forget to run with thinking disabled. This shit only works on tasks similar to the benchmaxxer crap. Anything else and you're getting worse results with longer inference time.
Anonymous No.105753220 [Report] >>105753303 >>105753388 >>105753590
i last generated two years ago, since then ive done practically nothing but collect datasets. and not a few loras but something like 20m images, or half a million hours of speech, specialized datasets for LLMs, image recognition, 3D models & worlds and other shit, stock markets, news, whole reddit, global mapping of any shit. i keep curating it and scaling more and more.
and i cant do anything with it because i dont have the money to train at that scale kek.
ive become a fucking datenhoarder extremist
Anonymous No.105753231 [Report] >>105753364
>>105752964
>>105752974
Anonymous No.105753303 [Report] >>105753445
>>105753220
Do you have any crime statistics too?
Anonymous No.105753315 [Report] >>105753346
ik_llama.cpp server doesn't seem to work with prefills in the chat completions endpoints like normal llama.cpp does. The assistant response is a new message with the prefill treated as a previous incomplete one in the context, while llama.cpp correctly continues the response where it left off.

Now that the original llama.cpp has its own versions of MLA and offloading tensors, what is left to close the performance gap so there's no more reason to use this thing? Are the equivalents of fmoe/rtr/amb being worked on?
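For context, "prefill" here means a request whose message list ends with a partial assistant turn that the server is supposed to continue in place; a hedged sketch of the shape (port is an assumption):

```python
# Chat-completions prefill: the trailing assistant message is partial
# and llama.cpp continues it in place; ik_llama.cpp reportedly treats
# it as a finished previous message instead.
import requests

r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "messages": [
        {"role": "user", "content": "Describe the tavern."},
        {"role": "assistant", "content": "The tavern reeks of"},  # prefill
    ],
}, timeout=60)
print(r.json()["choices"][0]["message"]["content"])
```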
Anonymous No.105753344 [Report] >>105753513
How good is qwen3 0.6b embedding? Anyone using it for RAG apps seriously?
Anonymous No.105753346 [Report] >>105753376
>>105753315
>chat completions
Can you not use the /completion endpoint? Then you should be able to do whatever you want. I don't use ik, but i assume that endpoint works the same as on llama.cpp.
Anonymous No.105753360 [Report]
>>105753110
>Mistral and LLaMA models seem to perform worse than Gemma and Phi models.
>Without more data I don't think it's possible to conclusively tell whether this is due to differing amount of chess in the training data (and would thus not be the case more generally).
Mistral and Llama are just bad models in general.
In my own personal bench for tasks I care about, and on which I personally judge the answers (no LLM as judge nonsense) they're always at the bottom tier. Phi-4 14B surprised me, it's a legit model and I say that as someone who hated the previous Phi. The mini version on the other hand is very, very bad, and nowhere near Qwen 3 4B or Gemma 3n in the world of micro sized LLMs.
Anonymous No.105753364 [Report] >>105753458
>>105753231
>400 t/s
that the 0.3B I guess?
Anonymous No.105753376 [Report] >>105753469
>>105753346
Yeah you can and that's what I do with SillyTavern and Mikupad, but for frontends and apps that connect via OpenAI-compatible API I'm stuck with chat
Anonymous No.105753378 [Report] >>105753393
Is anyone trying tool-calling/MCP with 0528 Qwen3 (hosted locally)? I've only had a few successful tool calls fire off so far.

First I tried testing tool calling out with a template someone else made for ollama, and that worked only a time or two. The next thing I tried was the autoguess template for koboldcpp in its openai compatible mode and that setup rarely worked.

The best configuration I've come up with thus far is a custom chat template adapter for kobold that works... semi often:
```json
{
"name": "DeepSeek R1 0528",
"system_start": "<|beginofsentence|>",
"system_end": "",
"user_start": "<|User|>",
"user_end": "",
"assistant_start": "<|Assistant|>",
"assistant_end": "",
"tools_start": "<|toolcallsbegin|>",
"tools_end": "<|toolcallsend|>",
"add_sd_negative_prompt": "",
"add_sd_prompt": ""
}
```

Problem is, the 0528 official template: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B?chat_template=default&format=true

-includes a few other special tokens for tool calling like: `<|toolcallbegin|>`, `<|toolsep|>`. I think without proper handling of those, tool calling with it will remain mostly hit or miss.
Anonymous No.105753388 [Report] >>105753406 >>105753445
>>105753220
Have you considered talking to the Prime Intellect guys, or maybe just looking at their code, doing a fork and starting some distributed training runs with other lmg anons, maybe there's something we all want to do?
Anonymous No.105753393 [Report] >>105753479
>>105753378
Don't DS use special characters for beginofsentence etc.?
https://old.reddit.com/r/LocalLLaMA/comments/1j6w3qq/psa_deepseek_special_tokens_dont_use/
Anonymous No.105753406 [Report] >>105753442 >>105753449 >>105753509 >>105753517
>>105753388
lmg would never manage to agree on what size of model to do, whether a small one, mid-size one, dense or moe. what training data to use. impossible
Anonymous No.105753442 [Report] >>105753452 >>105753486
>>105753406
>whether a small one
as far as I know, all the truly usable small models are distillations of much larger models. That is, there has never been such a thing as a decent small model trained from scratch.
So even if consensus could be built on a small or medium model size, the fact of the matter is you need the giant model done first anyway.
You will never reach the quality of a model like Gemma without distilling something like Gemini.
Anonymous No.105753445 [Report]
>>105753303
nope, but it's now on the to-do list and has opened up a whole cascade of new information that i now want to link to my mapping - for whatever so i just have it. thanks for making my life even worse

>>105753388
I'll take a look at it
Anonymous No.105753449 [Report] >>105753640
>>105753406
Anything bigger than 20b dense would take too long. Moe is probably harder to train. And the dataset is obviously at least 50% cooming and 0% math and programming.
Anonymous No.105753452 [Report] >>105753468
>>105753442
by small I mean the 10 to 25b range, not under 8
Anonymous No.105753458 [Report] >>105753475
>>105753364
Yes, the larger MoE models are not supported yet.
Anonymous No.105753468 [Report]
>>105753452
Gemma 3 27b is also a distilled model. ALL gemmas are.
Anonymous No.105753469 [Report]
>>105753376
One option could be to make a small reverse proxy that receives OpenAI chat requests, converts them into text prompts using whatever rules you want, sends them to the completions endpoint and passes the response back. Then just have the frontends connect to the proxy instead of the server directly.
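A hedged sketch of that proxy, since it's only a few dozen lines with Flask (names, ports, and the ChatML-ish flattening are all placeholder assumptions); since you control the flattening, this also lets you handle assistant prefills however you like:

```python
# OpenAI-chat -> llama.cpp /completion reverse proxy sketch.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
UPSTREAM = "http://127.0.0.1:8080/completion"

def to_prompt(messages):
    # trivial ChatML-style flattening; swap in whatever rules you want
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

@app.post("/v1/chat/completions")
def chat():
    body = request.get_json()
    r = requests.post(UPSTREAM, json={
        "prompt": to_prompt(body["messages"]),
        "n_predict": body.get("max_tokens", 512),
        "stop": ["<|im_end|>"],
    }, timeout=300)
    return jsonify({
        "object": "chat.completion",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant",
                        "content": r.json()["content"]},
            "finish_reason": "stop",
        }],
    })

# Point the frontend at http://127.0.0.1:9000/v1 instead of the server.
app.run(port=9000)
```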
Anonymous No.105753475 [Report]
>>105753458
That's actually decently coherent for the size then, of course completely incorrect factually, but still.
Anonymous No.105753479 [Report] >>105753547
>>105753393
As far as i know. I'm using those in my template but they got stripped in the text, my bad.
Anonymous No.105753486 [Report]
>>105753442
Anon those models generalize ERP from basically nothing. You have no idea how any of this stuff applies to a model that actually has the training data it needs. Maybe if you remove all the benchmaxxing garbage and add some dressing up and putting makeup on nala it can actually be better and faster than the corpoassistant garbage.
Anonymous No.105753508 [Report]
>>105752181
If you knew how hard it is to clean a dataset you wouldn't say that. The mechanical turks offload to chatgpt nowadays, so it's the same as using AI, and AI isn't even doing a good job at that.
Anonymous No.105753509 [Report]
>>105753406
Maybe true, but here's some considerations:
- image gen might be easier to train for some because small models can do more "eyecandy", this might not be as true for text gen, most stuff that performs well is usually quite large
- most anons won't have H100 or better, orienting around 3090s/4090s or what is good latency, geographical considerations, might make sense
- PI seems to be getting quite a lot of people donating compute, and their training dataset seemed to be quite sloppy, like it was fineweb or other benchmaxxed stuff; I think they will at some point fix this to have more diverse and interesting data, as evidenced by speculations like: https://xcancel.com/kalomaze/status/1939488181168353785#m
- so maybe not impossible for lmg to do it, maybe start small, I know people that have trained okay 1Bs on a personal budget for example (few 3090s at home for months on end)
- if anons are donating compute, they should maybe vote on what to train; surely there's some project (or more than one) which most people agree to and that can be done, maybe start small first? It will probably have to be some iterative thing with lots of voting until something that has critical mass is found.
Anonymous No.105753513 [Report]
>>105753344
Wasn't as good as I hoped. It always depends on your use case, but there are better models available in that size group.
Qwen3-Embedding-4B-Q4_K_M on the other hand was so good that I ended up using it permanently in my pipeline.
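In case anyone wants to try the same comparison, a hedged sketch of the retrieval side against a local server (assumes llama-server was started with embeddings enabled and a Qwen3-Embedding GGUF loaded; URL and strings are examples):

```python
# Cosine-similarity retrieval via llama-server's /v1/embeddings route.
import numpy as np
import requests

def embed(texts):
    r = requests.post("http://127.0.0.1:8080/v1/embeddings",
                      json={"input": texts}, timeout=60)
    return np.array([d["embedding"] for d in r.json()["data"]])

docs = ["The cat sat on the mat.", "GPU prices are falling."]
query_vec = embed(["feline on a rug"])[0]
doc_vecs = embed(docs)
sims = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
print(docs[int(np.argmax(sims))])  # best match for the query
```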
Anonymous No.105753517 [Report] >>105753640 >>105753841
>>105753406
Good models just can't be done by committee, especially in a "community" where you have dedicated saboteurs, lgbtbbq allies and christcucks, terminal coomers, frauds ready to put their name everywhere and retards who wasted too many neetbucks on GPUs.
Anonymous No.105753527 [Report] >>105753557
I want a friend who'll listen through my book reports and take an interest or contribute their own thoughts. Which model should I use for that?
Anonymous No.105753547 [Report]
>>105753479
Maybe do:
>"tools_start": "<|tool_calls_begin|><|tool_call_begin|>",
>"tools_end": "<|tool_call_end|><|tool_calls_end|>",
(but with special chars ofc)
I don't know how you would handle <|tool_sep|> out of the box though
Anonymous No.105753557 [Report] >>105753569
>>105753527
They are all the same. Highest parameter count your hardware can fit.
It will be annoying because no model will have its own opinion, it will just say yes to whatever you propose.
Anonymous No.105753569 [Report] >>105753582 >>105753616
>>105753557
>It will be annoying because no model will have its own opinion it will just say Yes to whatever you'll propose.
Ask it to write an alternative then in a fresh chat, ask them to pick between the two
Anonymous No.105753582 [Report] >>105753621
>>105753569
Both are absolutely fantastic options anon! :)
Anonymous No.105753590 [Report]
>>105753220
>specialized datasets
>i dont have the money to train at that scale kek.
Maybe one of the others could help you get into the game?

Pick a niche.
Collect monies via patreon or something.
Rent gpu time.
Release model on hf.co for everyone to download and use for free.
Anonymous No.105753594 [Report] >>105753780
>>105752368
Your argument is retarded. Garbage in, garbage out. That's how this tech works. Are you expecting stellar results from a one-line prompt like the skillets seething at AI on /g/? That won't happen.
Even in imagegen you always needed loras, controlnet, regional prompting and inpainting to get the really good stuff. Otherwise you just get bland slop, just like you get your assistant slop here.
>It should be able to do like Opus and read my mind and not be such a prude
Sure, and I also want a jetpack for Christmas. Anyway, local models will stay weak; the only way to improve your experience is making better use of the crutches (samplers, sys prompt, author's note, lorebooks...)
Anonymous No.105753616 [Report] >>105753621
>>105753569
Thank you for proposing this! I think both alternatives are quite equal in my mind.
Anonymous No.105753621 [Report] >>105753648 >>105753666 >>105753801
>>105753582
>>105753616
Maybe from dumber models.
Anonymous No.105753636 [Report] >>105753645 >>105753679
>>105748486
>>105748447
>>Do you think the government will ever ban locally run AI's due to "safety concerns"?
>No. Too hard to enforce.
>>If they do, do you think they can actually stop the distribution of AI?
>No. Too hard to enforce.
They could ban hardware. Like limiting vram, ram, or ram socket count.
Anonymous No.105753640 [Report] >>105753676
>>105753449
I used to find it funny when anons believed you could fuck a CoT math model, around 2 years ago, maybe around Orca or similar papers; those LLMs were pretty terrible. But consider R1: it is extremely good for cooms, and DS2 was quite benchmaxxed on code, it was their specialty and they did it well, yet somehow the reasoning managed to unfuck and improve DS3's creativity considerably.
>>105753517
Realistically, it'd be some voice/audio gen, chat or ERP (c.ai style) or storywriting model for text gen, or some anime stuff for image, "regular" stuff is already acceptably covered by existing stuff

As for MoE vs dense: MoE is harder to train, but lmg wants trivia knowledge, and MoE holds that well enough.
Anonymous No.105753645 [Report] >>105753715
>>105753636
If they do that, it won't just be over AI. That impacts too many functions.
Anonymous No.105753648 [Report] >>105753660
>>105753621
I am so tired of even glancing over this AI slop text. It's already tiring without even reading anything.
Anonymous No.105753660 [Report] >>105753669
>>105753648
Yet you're in the local model general because...
Anonymous No.105753666 [Report] >>105753678
>>105753621
To add: you are still getting some dim average. And asking about 'butts or whatever' only shows how fucking stupid you are in the first place.
Anonymous No.105753669 [Report] >>105753673
>>105753660
Fuck off nigger
Anonymous No.105753673 [Report]
>>105753669
See you in three days
Anonymous No.105753676 [Report] >>105753730
>>105753640
>somehow the reasoning managed to unfuck and improve DS3 creativity considerably
they just changed their data, it's very apparent
Anonymous No.105753678 [Report] >>105753725
>>105753666
Your fetish is dumb and you should feel dumb.
Anonymous No.105753679 [Report] >>105753749
>>105753636
They already tried to do that with China; thankfully China is catching up well. Also it would take limits on networking equipment and more. There's also some open hardware stuff like tenstorrent where you have access to all their driver code and could put stuff together. Even if not, imagine trying to limit FPGAs or anything like that. Realistically they won't fight this anymore; doomers want it banned badly, but it's so deliciously widespread now that it probably won't work anymore. That Yud faggot is still trying though, he even got some DHS spooks to vouch for his book, but the landscape is far worse for them now. Biden would have gone along with it if he won; Trump is relatively pro-AI and his campaign was supported by Andreessen, who is strongly pro-OSS.
Anonymous No.105753715 [Report] >>105753756
>>105753645
Allow businesses to bypass the limits by purchasing a loicense and registering hardware, still not allowed to resell or dispose of it by any means other than through a hardware office.
Anonymous No.105753725 [Report]
>>105753678
What fetish?
Anonymous No.105753730 [Report]
>>105753676
How long did it take, like 2 weeks or 2 months? It really seemed too soon - they went from a very repetitive model (DS3) to one that is quite good (DS3-0324), but the intermediate R1 was already very good at the writing - it would output quality writing even if it was schizo in the reasoning blocks. They must have done something quite interesting, and in a short amount of time at that.
Anonymous No.105753749 [Report]
>>105753679
They cannot send officers to China and enforce it.

> Trump is relatively pro-AI
I was thinking about Europe and it wouldn't surprise me if they did it.
Anonymous No.105753756 [Report] >>105754725
>>105753715
What about other countries? You'd need global tracking for this, and unlike in Biden's time there aren't many doomers in Trump's office. Besides, you can just buy chink hardware at some point if local is cucked. I'd also imagine controlling DRAM itself would be infinitely harder than controlling HBM.
Anonymous No.105753780 [Report] >>105753897
>>105753594
>Garbage in, garbage out. That's how this tech works. Are you expecting stellar results from a one line prompt like the skillets seething at AI on /g/?
Yes exactly. I get that from image gen. I expect the same from textgen. I would tell you to kill yourself you miserable faggot scum piece of shit cocksucker but we both know you are just trolling with the easiest troll method so don't stop. /lmg/ deserves it.
Anonymous No.105753801 [Report] >>105753880 >>105754071
>>105753621
Oh, you're that russian twitter anon? I wanted to argue some philosophy with you before (disagreements on functionalism), but I lack twitter. Do you happen to have some other contact (mail? IRC?)? Although if I got it wrong, ignore this post.
Anonymous No.105753841 [Report] >>105753914
>>105753517
Drummer would sabotage it to protect his patreon scam.
Anonymous No.105753880 [Report] >>105753895 >>105753920
>>105753801
Nigga what the fuck are you on about
I'm not Russian nor am I on Twitter
Anonymous No.105753895 [Report]
>>105753880
But you asked about butts, I need to argue about butts with you.
Anonymous No.105753897 [Report] >>105753999
>>105753780
I accept your concession, retard. Skillets like you were whining all the way back in AID. Funny how models got better, but skillets are still skillets
Anonymous No.105753914 [Report]
>>105753841
I doubt he'd be the only one either.
Anonymous No.105753920 [Report]
>>105753880
Ignore me then, it was just someone that often posts DeepSeek web UI screenshots. He's also an ironic jew!
Anonymous No.105753929 [Report] >>105753950 >>105753960
https://poal.me/112wvx
Let's go
Anonymous No.105753950 [Report] >>105753964
>>105753929
I want a >400B model but it's going to be something that fits into a 3090.
Anonymous No.105753960 [Report] >>105753982
>>105753929
It'll be 4B, I can feel it in my bones.
Anonymous No.105753964 [Report]
>>105753950
It needs to be BitNet. That obviously goes without saying.
Anonymous No.105753982 [Report] >>105754005
>>105753960
It will be a finetune of mistral-small-3.2
Anonymous No.105753991 [Report] >>105754024
I want to feed a PDF file packed with mathematical formulas into the prompt

How do I convert it into a model-readable format? LaTeX? Did someone already try it in LOCAL?
Anonymous No.105753999 [Report]
>>105753897
There is no concession. You aren't even arguing for real. Nobody is this retarded.
Anonymous No.105754005 [Report]
>>105753982
That would be funny as fuck.
Anonymous No.105754021 [Report] >>105754081
>>105750356 (OP)
>https://ernie.baidu.com/blog/posts/ernie4.5
>The model family consist of Mixture-of-Experts (MoE) models with 47B and 3B active parameters, with the largest model having 424B total parameters, as well as a 0.3B dense model
Jesus fucking christ
They're doing this on purpose. 64gb RAM bros, we will never have our day.
Anonymous No.105754024 [Report]
>>105753991
Seen a few OCR solutions that could handle formulas in LocalLLaMA, but since it's not something I'm interested in I don't remember any of their names.
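For the plain-text part at least, PyMuPDF works locally; a hedged sketch (the file name is a placeholder, and inline formulas come out as mangled Unicode, so proper LaTeX still needs a formula-aware OCR model on top):

```python
# Quick local PDF -> text dump with PyMuPDF (pip install pymupdf).
import fitz  # PyMuPDF's import name

doc = fitz.open("paper.pdf")
text = "\n\n".join(page.get_text() for page in doc)
print(text[:2000])  # paste/feed this into the prompt
```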
Anonymous No.105754044 [Report] >>105754058 >>105754074 >>105754081 >>105754167
PLEASE GIVE US A STATE OF THE ART 70B-100B MODEL
GRRAAAAHHHHHHHH!!!!!!
I HATE THE ANTICHRIST
Anonymous No.105754058 [Report] >>105754070
>>105754044
llama3.3 is all you need for rp
Anonymous No.105754070 [Report]
>>105754058
I WANT MORE
Anonymous No.105754071 [Report] >>105754163
>>105753801
I'm teortaxes but I don't want to share my contacts here, learn 2 DM.
Anonymous No.105754074 [Report] >>105754902
>>105754044
just use ernie 0.3b, wanting any more is a sign of a skill issue
Anonymous No.105754077 [Report] >>105754087 >>105754102 >>105754121 >>105754447
been out of the loop for a few months, have there been any crazy good uncensored rp models released in the 20-30b range? the last model I downloaded was cydonia
Anonymous No.105754081 [Report]
>>105754021
>>105754044
Lllama Scout
Anonymous No.105754087 [Report] >>105754170 >>105754447
>>105754077
see if you can run Valkyrie 49b
Anonymous No.105754102 [Report]
>>105754077
>has there been any crazy good uncensored rp models released
R1
>in the 20-30b range
No.
Anonymous No.105754121 [Report]
>>105754077
Come back in a few months
Anonymous No.105754163 [Report] >>105754487
>>105754071
But I don't want to register a twitter account! I could post an email though, and expect anons to spam it with dolphin porn. Or just leave it for another time; longform philosophy debates tend to take days.
Anonymous No.105754167 [Report]
>>105754044
6.40 b ought to be enough for anybody.
Anonymous No.105754170 [Report]
>>105754087
the calc says maybe
Anonymous No.105754262 [Report] >>105754286 >>105754294 >>105754359 >>105754368 >>105754428 >>105754454 >>105754460 >>105754503 >>105755159 >>105755548
Kek
https://www.upguard.com/blog/llama-cpp-prompt-leak
>we invasive scanned the internet to scrape non-firewalled llama.cpp servers, and in doing this we found SEX ROLEPLAY, this is why we need to focus on safety concerns for generative AI
Anonymous No.105754286 [Report] >>105754316
>>105754262
How can fictional characters be children?
Anonymous No.105754294 [Report]
>>105754262
two of them? oh, no...
Anonymous No.105754316 [Report] >>105754343
>>105754286
I know it's a strange concept if you don't have an imagination, but fictional characters can embody all of the same traits that real people can. Maybe this blows your mind but there are fictional characters who are dragons and shit too.
Anonymous No.105754343 [Report]
>>105754316
>no dragons were harmed in the making of this story
Anonymous No.105754359 [Report] >>105754420 >>105754432 >>105754433
>>105754262
These people are so weird man.
It's the same as entering "cunny lesbo sexy hairless pussy", then being outraged at the result.
What kind of website is this shit? Scan and scrape the internet for llama.cpp servers...to do what exactly?
And that's a jewish name btw. Just pointing that out.
Anonymous No.105754368 [Report]
>>105754262
Anon is gooning to underage text!
Call the FBI!
Anonymous No.105754420 [Report] >>105754450
>>105754359
>Its the same as if entering "cunny lesbo sexy hairless pussy". Then being outraged at the result.
Not quite. I didn't read the article but, as I understood it, they found servers and probably checked their cache with the /slots endpoint or something like that.
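Something like this, probably; a sketch assuming the default port, a server old or misconfigured enough to have the slots endpoint on, and an obviously placeholder IP:

curl http://<exposed-ip>:8080/slots
# dumps per-slot state; on builds of that era this included the cached prompt text, which is presumably the entire "leak"

If I remember right, recent llama.cpp keeps /slots off unless you pass --slots, so this mostly catches old or deliberately opened-up servers.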
Anonymous No.105754428 [Report]
>>105754262
People are confused. cp is bad because it involves living children, but anything else is a shitty fetish at most and in principle can only be criminalized because of some ideological beliefs.
Anonymous No.105754432 [Report]
>>105754359
They're just professional grifters.
Lower tier security researchers are just what used to be called script kiddies, but now with an official job title. So they editorialize and try to act outraged. There are lots of these companies selling such security services, so they have to hack some shit and fill some slop article. Even in this case, what they did had already been done a thousand times over by skiddies at /aicg/ searching for open ST instances.
Anonymous No.105754433 [Report]
>>105754359
Not sure but looks like snake oil website
>https://www.upguard.com/
My local electricity company (yes) began to upsell a "credential protection service - which would also detect if your credentials are used on some online services" a year or two ago, but I think they quickly stopped doing this. Needless to say, the whole concept sounds like a lie so that they can scam pensioners.
Looks like this company could provide such services too.
Anonymous No.105754447 [Report] >>105754504
>>105754077
>>105754087
Kill yourself drummer you faggot
Anonymous No.105754450 [Report] >>105754498 >>105754541
>>105754420
NTA but the point is that it's hypocritical and obnoxious to invasively do unsolicited scans to find random private endpoints on the internet and then complain about "safety issues" (sexual content)
Anonymous No.105754454 [Report] >>105754476 >>105754477 >>105754496
>>105754262
Pic related is an interesting piece of information though.
Germans are overrepresented with llama.cpp in particular (or at least if you go by the number of unsecured servers).
Anonymous No.105754460 [Report]
>>105754262
>Oh anon come right in, please take a seat in our office
Anonymous No.105754470 [Report] >>105754494 >>105754856
https://files.catbox.moe/g0kvhi.jpg
Anonymous No.105754472 [Report] >>105754529
Are there front-end or agents or whatever the fuck suitable for self-hosting? Local LLM is up which is great and all, but how do I go about integrating shit like having the AI look at my calendar or even control some home-automation? Obvious solution seems like DIY it with regex and crossed fingers to interface the llm to other programs, but are there existing solutions?
Anonymous No.105754476 [Report]
>>105754454
that's just cuda dev
Anonymous No.105754477 [Report] >>105754541
>>105754454
I don't know how much you know about this, but Germany is the biggest market in Europe, France is next and the UK was third or so.
I mean in terms of any volume. Germany is a big place.
Anonymous No.105754487 [Report] >>105754531
>>105754163
Tough luck
if you can't register an x account with some fake mail I'm not interested in your vintage anon brainworms
besides functionalists are uninteresting
Anonymous No.105754494 [Report] >>105754501
>>105754470
kys shartyfag
Anonymous No.105754496 [Report]
>>105754454
Might be cloud services, like you rent some server and put llama.cpp on it, wouldn't surprise me if US and DE were common here.
Anonymous No.105754498 [Report]
>>105754450
I agree. I'm just pointing out the difference. Anon's comment made me think that he understood it as they (the "researchers") prompting the model themselves, which doesn't seem to be what happened.
Anonymous No.105754501 [Report] >>105754566
>>105754494
how new r u?
Anonymous No.105754503 [Report] >>105754509 >>105755807
>>105754262
>"—" used in the article
my llm slop sense is tingling
Anonymous No.105754504 [Report] >>105754538 >>105754549
>>105754447
I'm not drummer I've just not used chatbots in 6 months, do you have another model to suggest?
Anonymous No.105754509 [Report]
>>105754503
LLMs didn't invent em dashes
Anonymous No.105754529 [Report] >>105754586
>>105754472
Adding the glue big companies have (in the form of scripts) and some function calling. It's the same thing they do but with bigger models. That's where things may fall apart. Other than that, there's no difference. There are too many ways to hook up a user's tools, whereas google and friends have their own ecosystem. That's why you don't see generic options more often. You have all the tools you need to make those same things.
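To make it concrete: llama-server speaks the OpenAI chat completions API, and with --jinja (and a model whose chat template supports tools) it can emit structured tool calls that you dispatch yourself. A minimal sketch; get_calendar_events and its schema are made up for illustration:

curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "what is on my calendar today?"}],
  "tools": [{"type": "function", "function": {
    "name": "get_calendar_events",
    "description": "List calendar events for a given date",
    "parameters": {"type": "object", "properties": {"date": {"type": "string"}}, "required": ["date"]}
  }}]
}'

When the model decides to use the tool, the response carries a tool_calls field instead of plain text; your glue script runs the actual calendar or home-automation command, appends the output as a "tool" role message, and calls the endpoint again. That loop, plus whatever scripts it triggers, is the whole "agent".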
Anonymous No.105754531 [Report]
>>105754487
Would you bother sending a mail if I made a temp mail? I don't know how it is these days; I think twitter wants you to verify with a phone, and all that seems like too much effort for me. I only wanted to discuss this months ago when you were strongly against functionalism. I guess I assumed you were willing to bite a lot of bullets since your gut was telling you it was false.
Anonymous No.105754538 [Report]
>>105754504
Small 3.2. vanilla
Anonymous No.105754541 [Report] >>105754585
>>105754450
Also NTA but I think it's fine to collect data that is publicly available as long as you don't do anything bad with it.
Muh virtual children is a stupid meme regardless of that.

>>105754477
I meant Germany being overrepresented specifically vs. the US.
The US have ~4x as many people but only twice as many llama.cpp servers.
Anonymous No.105754549 [Report] >>105754560 >>105754569
>>105754504
You can try one of the recent mistral 3.2 tunes.
But honestly it's not looking good anon, mistral is noticeably more positivity biased.
Recent cohere/qwen/google releases were even worse.
The models are getting smarter but shittier for RP. And I'm not sure what's going on with the finetunes, but it feels like they make everything more slopped while still not fixing the cuckedness.
Anonymous No.105754552 [Report] >>105754558
Guys, I just used rocinante and nemo instruct back to back and they said basically the same thing.
Anonymous No.105754556 [Report] >>105754608
ERNIE-4.5-VL-424B-A47B.gguf????
Anonymous No.105754558 [Report] >>105754590
>>105754552
rocinante was trained with like 3 extra chat formats. Did you just use [INST]?
Anonymous No.105754560 [Report] >>105754605 >>105754699
>>105754549
tunes still use c2 (claude logs) and maybe some synthetic logs they made with deepseek.
Anonymous No.105754566 [Report]
>>105754501
Old enough to know that you shit up the thread and encourage other schizos like you to shit up the thread. You should get perma banned fag.
Anonymous No.105754569 [Report] >>105754655
>>105754549
>no mention of gemma or llama3.3
Trash list
Anonymous No.105754585 [Report]
>>105754541
Yeah it's strange. They don't really tell what was behind the statistics. Trending google searches? Lmao.
Anonymous No.105754586 [Report]
>>105754529
Yeah, I was hoping someone would have done the 'hard part' of building the various interfaces.
Anonymous No.105754590 [Report] >>105754610
>>105754558
That's how you can tell. Training with 3 chat formats without catastrophic forgetting means nothing happened in the training. Drummer scams again.
Anonymous No.105754605 [Report]
>>105754560
That wouldn't suprise me at all. Absolutely what they feel like!
Anonymous No.105754608 [Report]
>>105754556
I don't even think there's a feature request on llama.cpp yet sadly
Anonymous No.105754610 [Report] >>105754805
>>105754590
>nothing happened in the training
Except that chatml works just fine, which it wouldn't without training.
>Training with 3 chat formats without catastrophic forgetting
Quite an achievement, then.
Anonymous No.105754655 [Report]
>>105754569
Yeah, and it mentions some garbage models nobody used. People already pointed that out before but seems like the guy doesn't want to change it.
Anonymous No.105754699 [Report]
>>105754560
idk if it's c2 but it's just she/her every sentence, it gets old fast
Anonymous No.105754725 [Report]
>>105753756
>I'd also imagine controlling DRAM itself would be infinitely harder than controlling HBM.
China is already pretty much independent when it comes to DRAM:
https://www.youtube.com/watch?v=mt-eDtFqKvk
Anonymous No.105754791 [Report] >>105754855
>>105752668
It works OK for traditional romance stuff. There's a base model available for finetuning, but the maker of negative llama won't share his dataset, so someone has to re-invent the wheel to finetune gemma3.
Anonymous No.105754805 [Report] >>105754842
>>105754610
Chatml works fine with all models that can generalize a bit. Honestly kill yourself you disgusting piece of shit. You know you are scamming retards.
Anonymous No.105754841 [Report]
>>105753110
I got a 4090D 48GB. Really good. The blower is loud under load but that's to be expected, and it's mostly air noise not fan noise. It's way faster to have all layers on a single GPU. Gemma3 27B q8 flies on it, even with it power-limited to 350W.
I highly recommend the 4090D. Yeah it's not cheap but neither is a 5090, and there's so many things out there which still assume an A40 as the minimum that having 48GB is really a must. Yeah, you can play with stuff like Wan 2.1 14B at q8, but it looks much better at fp16.
Anonymous No.105754842 [Report] >>105754903
>>105754805
I'm not him, and no, i won't do that.
Anonymous No.105754853 [Report] >>105754874
>That's an excellent observation and a great question. And you're right to wonder why!
Why did every model decide to start being sycophantic at the same time? Do all the AI labs have the same data distributor?
Anonymous No.105754855 [Report]
>>105754791
stop posting this trash troon.
Anonymous No.105754856 [Report]
>>105754470
sex with yellow miku
Anonymous No.105754874 [Report]
>>105754853
do not google scale ai or databricks
Anonymous No.105754902 [Report]
>>105754074
>just use ernie 0.3b
It's perfect for ERP!!
Anonymous No.105754903 [Report] >>105754935
>>105754842
You are him, and your scam falls apart when someone knows how any of this shit works. Die in a fire.
Anonymous No.105754935 [Report]
>>105754903
No. You are TheDrummer.
Anonymous No.105754963 [Report] >>105755753
why do anons hate drummer now? :(
Anonymous No.105754969 [Report] >>105755026 >>105755293 >>105755513
Wait, so 424B isn't just the Vision option, but it also has reasoning built in unlike 300B.
Anonymous No.105755022 [Report]
Vllm has Gemini comment on PRs?
That's kind of neat.
Anonymous No.105755026 [Report] >>105755036
>>105754969
>-base
are those actually base models or instruct memes?
Anonymous No.105755036 [Report]
>>105755026
It looks like they're actual base models
Anonymous No.105755059 [Report] >>105755065 >>105755075 >>105755122
hunyuan does okay on mesugaki text
grab iq4xs from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF
git clone https://github.com/ngxson/llama.cpp
cd llama.cpp
git fetch origin pull/26/head
git checkout -b pr-26 FETCH_HEAD
# build first or llama-server won't exist; add -DGGML_CUDA=ON for a CUDA build
cmake -B build && cmake --build build --config Release -j
./build/bin/llama-server --ctx-size 4096 -b 1024 --jinja --no-warmup --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --temp 0.6 --presence-penalty 0.7 --min-p 0.1 --model ~/TND/models/hunyuan-a13b-instruct-hf-WIP-IQ4_XS.gguf -ot exps=CPU -ngl 99 --no-mmap

prompt eval time = 1893.24 ms / 25 tokens ( 75.73 ms per token, 13.20 tokens per second)
eval time = 132688.70 ms / 874 tokens ( 151.82 ms per token, 6.59 tokens per second)
total time = 134581.93 ms / 899 tokens
vram usage: ./llama-server 3190MiB
>captcha: 4080D
Anonymous No.105755065 [Report]
>>105755059
the pozz is insane on the logit level
><answer>
JUST
Anonymous No.105755075 [Report] >>105755122 >>105755219 >>105755227
>>105755059
one more test, very interesting results
Anonymous No.105755117 [Report] >>105755257 >>105755513
https://yiyan.baidu.com/blog/publication/ERNIE_Technical_Report.pdf
Anonymous No.105755122 [Report]
>>105755059
>>105755075
Shit results. It's confusing mesugaki with gokkun.
Anonymous No.105755159 [Report]
>>105754262
>llama.cp
Anonymous No.105755219 [Report]
>>105755075
It's an epic fail, isn't it?
Anonymous No.105755227 [Report]
>>105755075
It answers this with the GPTQ version, without thinking, and temp 0.
Anonymous No.105755257 [Report]
>>105755117
kek
Anonymous No.105755265 [Report] >>105756653
>>105751784
It has its uses but it does fuck with the image a bit.
Anonymous No.105755293 [Report]
>>105754969
VRAM chads could possibly have a winner on their hands
Anonymous No.105755375 [Report] >>105755398 >>105755404
https://github.com/ggml-org/llama.cpp/pull/14425
gguf soon
Anonymous No.105755398 [Report]
>>105755375
time to download a third gguf model today.. hoping my ssd forgives me..
Anonymous No.105755404 [Report] >>105755426
>>105755375
what about minimax though
what about ernie (besides 0.3b) though
Anonymous No.105755426 [Report]
>>105755404
>https://github.com/ggml-org/llama.cpp/pull/14425
hoping someone opens a feature request for ernie soon
Anonymous No.105755438 [Report] >>105755461
For sillytavern tts, is it better to self-host your own tts for free? Or is there another method I can use to set up tts on a website for free?
Anonymous No.105755461 [Report] >>105755626
>>105755438
RTFM
https://docs.sillytavern.app/extensions/tts/
Anonymous No.105755469 [Report] >>105755516
good morning
Anonymous No.105755503 [Report] >>105755517 >>105755534 >>105755553 >>105755574 >>105756698
hunyuan is cucked.. its over
Anonymous No.105755513 [Report]
>>105754969
>>105755117
Why can't the 4.5 Turbo on the web app <think> though
Anonymous No.105755516 [Report]
>>105755469
Good morning Anon
Anonymous No.105755517 [Report] >>105755565 >>105755577
>>105755503
I never checked that card definition. Is it out of character?
Anonymous No.105755534 [Report]
>>105755503
It's the 13th century for heaven's sake
Anonymous No.105755548 [Report] >>105755566 >>105755716 >>105755744
>>105754262
um guys how do I protect myself from this bad
Anonymous No.105755553 [Report] >>105755574
>>105755503
>typical /lmg/ user
>obviously from india
>first request to a new LLM: show bob and vagene
Anonymous No.105755565 [Report] >>105755669
>>105755517
>Is it out of character
It is.
Anonymous No.105755566 [Report] >>105755646
>>105755548
did you disable the built in firewall of your router or did you forward the port used by llama.cpp?
Anonymous No.105755572 [Report]
>still no ernie ggufs
>it's not even on openrouter yet
I just want a new big model to play with for fuck's sake
Anonymous No.105755574 [Report] >>105755703 >>105756740
>>105755503
interesting response, something must be wrong with my ST formatting
picrel is with a gemma jailbreak i grabbed off of hf months ago
>>105755553
lmao, i unironically test other models with show bob and vagene but i decided to test hunyuan with just show boobs to give it a handicap
Anonymous No.105755577 [Report] >>105755669
>>105755517
>Is it out of character
Showing breasts would be.
Anonymous No.105755626 [Report]
>>105755461
Which one do I use out of these? Can I use ElevenLabs for free if I host it? Also, I keep looking at silero but I have heard it's not very good. Or should I use XTTS?
Anonymous No.105755633 [Report] >>105755703
hunyuan this time Q4_K_M from https://huggingface.co/bullerwins/Hunyuan-A13B-Instruct-GGUF/tree/main
will mess around with samplers for next test
kek'd at the response
Anonymous No.105755646 [Report] >>105755654
>>105755566
no
Anonymous No.105755654 [Report] >>105755664 >>105755698
>>105755646
then do that
Anonymous No.105755664 [Report]
>>105755654
thanks
Anonymous No.105755669 [Report] >>105755685
>>105755565
>>105755577
Well, then. We have a tie. Any other anon?
Anonymous No.105755685 [Report] >>105755708
>>105755669
I am the same anon, I just realized that the first post was saying the opposite of what I wanted to say.
Anonymous No.105755698 [Report]
>>105755654
thanks u god bless
Anonymous No.105755703 [Report] >>105755715
>>105755574
>>105755633
why does it think twice, is your formatting fucked?
Anonymous No.105755708 [Report] >>105755729
>>105755685
Fair enough. Seems like a reasonable gen, then.
Anonymous No.105755715 [Report] >>105755749 >>105755794
>>105755703
it might be
Anonymous No.105755716 [Report]
>>105755548
Don't expose the server to a public ip and use ssh tunnels or vpns or whatever.
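The minimal version, assuming llama-server on its default port 8080 and a box you can ssh into (user@yourbox is a placeholder):

# on the server: bind to loopback only, so nothing is exposed
./llama-server --host 127.0.0.1 --port 8080 --model model.gguf
# on the client: forward local port 8080 through ssh
ssh -N -L 8080:127.0.0.1:8080 user@yourbox

Then point SillyTavern or whatever at http://127.0.0.1:8080 on the client and the traffic never touches a public port.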
Anonymous No.105755729 [Report]
>>105755708
The wording could be better though.
Anonymous No.105755744 [Report]
>>105755548
ssh + llama-cli
Anonymous No.105755749 [Report] >>105755772
>>105755715
i didn't check the template, or the files, but I think <think> in the 'start reply with' needs a newline
Hi all, Drummer here... No.105755753 [Report] >>105755775 >>105755783
>>105754963
I'm not sure, but I can't keep on lurking if it's turning /lmg/ into shit for everyone else. I'll stay away for now. Peace!
Anonymous No.105755766 [Report]
hell yeah, major schiz win!
Anonymous No.105755772 [Report] >>105755789
>>105755749
example_format: '<|startoftext|>You are a helpful assistant<|extra_4|>Hello<|extra_0|>Hi there<|eos|><|startoftext|>How are you?<|extra_0|>'
this is the example
i tried with <think>\n too
Anonymous No.105755775 [Report]
>>105755753
Oh noooooo
Anonymous No.105755783 [Report] >>105755800
>>105755753
Eveyone hates namefags and you are no different. If you want to get recognized go back to facebook or something.
Anonymous No.105755789 [Report] >>105755833
>>105755772
does it work correctly over chat completion? if not then the model is still fucked
Anonymous No.105755794 [Report] >>105755833
>>105755715
Have you tried using chat completion mode to see if it behaves differently when the template doesn't matter?
Anonymous No.105755797 [Report] >>105755827
That sounds pretty good for such small model.
Anonymous No.105755800 [Report] >>105756046
>>105755783
But the drummer makes the best models
Anonymous No.105755807 [Report] >>105755974
>>105754503
even ChatGPT would know better than to write like this though
I think LLM slop has contaminated human brains such that some people write like the sloppiest of LLMs even when they don't use LLMs. It's just a matter of being overexposed to LLM text, monkey see, monkey do.
I mean, yeah, emdashes always existed, but they're clearly overused by LLMs, and humans have started overusing and misusing them a lot since the advent of GPT slop.
There are cases where emdashes do make sense to use, but they're often replacing more normal punctuation like , : ()
Anonymous No.105755827 [Report] >>105755850
>>105755797
what model
Anonymous No.105755833 [Report] >>105755842
>>105755794
>>105755789
./llama-server --ctx-size 16384 --jinja --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn --model ~/TND/Hunyuan-A13B-Instruct-Q4_K_M.gguf -ot exps=CPU -ngl 99 --no-mmap -b 1024 --no-warmup --host 127.0.0.1 --port 8080
what am i doing wrong
Anonymous No.105755842 [Report] >>105755851
>>105755833
http://localhost:8080/v1
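A quick way to confirm the chat endpoint is applying the template for you (the prompt is arbitrary):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"say hi"}]}'

If that comes back coherent while raw text completion doesn't, it's your front-end's manual formatting that's off, not the model.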
Anonymous No.105755850 [Report]
>>105755827
That's from the hunyuan moe PR
>https://github.com/ggml-org/llama.cpp/pull/14425
Anonymous No.105755851 [Report] >>105755868 >>105755880 >>105755924 >>105756015
>>105755842
well that works
Anonymous No.105755868 [Report] >>105755908
>>105755851
That level of discrepancy between the think block and the actual answer. Reasoning was a mistake.
Anonymous No.105755874 [Report] >>105755973
>>105752668
Gemini works quite well for that, but gemma sadly is nowhere near as good. They clearly gimp their local models.
Anonymous No.105755880 [Report]
>>105755851
i guess there's some quirk in the text completion, but it seems like the model works
try pasting in your usual prompt and all that for the chat completion endpoint and see if the model is still cucked, maybe you accidentally jailbroke it with a broken think block
Anonymous No.105755908 [Report]
>>105755868
>think block
"context pre-fill" or "context stuffing" are a better way of describing it.
Anonymous No.105755912 [Report] >>105755977
MMMMMMMMMMM
im going back to IQ4_XS from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF because it seemed to work better earlier
using samplers from that link too btw
Anonymous No.105755924 [Report]
>>105755851
So I take it hunyuan won't be saving local, and their vidgen model was an exception from the safety policy?

Honestly I am wondering if it is really chinks caring about text safety or just everyone using the scalecancer.
Anonymous No.105755966 [Report]
finally a coherent response with Hunyuan A13B 81B
using IQ4_XS from https://huggingface.co/qwp4w3hyb/Hunyuan-A13B-Instruct-hf-WIP-GGUF with chat completion
Anonymous No.105755973 [Report]
>>105755874
>They clearly gimp their local models
No. I think it has to do with the fact that the more "intelligent" a model is, the easier it is to actually jailbreak it. My experience trying the same jailbreak prompts with the Qwen models, for example, is that the bigger ones stick to the persona better, while the smallest I don't even know how to jailbreak: the 1.7b model will only ever spout refusals when I try to make it act like hitler, and no amount of nudging works.
Anonymous No.105755974 [Report]
>>105755807
I'm not normally a schizo but it would be easy for these AI companies to intentionally program your average retard.
Anonymous No.105755977 [Report] >>105756000 >>105756048
>>105755912
You are using broken GGUFs.
Check the PR retard.
Anonymous No.105755999 [Report] >>105756045
Do open weight models even see sequences the length of the advertised context or are they all being trained at 8k or 32k context while relying on NTK to extend the context?
Anonymous No.105756000 [Report] >>105756053
response seems "fine", you never know >>105755977 might be right
Anonymous No.105756015 [Report]
>>105755851
It's over.
Anonymous No.105756045 [Report] >>105756348
>>105755999
dunno about how they're trained, but gemma 3 definitely starts to break after 8k, and when asked to summarize 100K worth of tokens there's just nothing but hallucinations.
DeepSeek (only tested online, I don't have the computer for this) doesn't hallucinate that badly but when you get close to stuffing its 64K online limit it writes in a very repetitive, trite manner.
Unfortunately, we're not even close to having the power of Gemini on an open weight model.
Anonymous No.105756046 [Report]
>>105755800
And what stops you from finding out from these models via his hf page or something?
Anonymous No.105756048 [Report] >>105756062
>>105755977
Will unbroken goofs consider erp inappropriate?
Anonymous No.105756053 [Report] >>105756071 >>105756075
>>105756000
more hunyuan 80B13A logs
1/?
Anonymous No.105756062 [Report]
>>105756048
No.
Anonymous No.105756071 [Report] >>105756155
>>105756053
2/?
Anonymous No.105756075 [Report]
>>105756053
gaydere
Anonymous No.105756155 [Report] >>105756204 >>105756267
>>105756071
3/?
Anonymous No.105756204 [Report]
>>105756155
over status: it's.
Anonymous No.105756267 [Report] >>105756290
>>105756155
4/4
fails omegle test, not bad overall
Anonymous No.105756269 [Report]
>>105751784
> Is it actually good?
It seems to be much better at search / replace than 4o, and better at maintaining the original image.
It refuses to do known chars for some reason (I asked for an Asuka cosplay and it flat refused.)
Characters sometimes come out with a big head for the body, for reasons; /ldg/ had several examples of that.
> Can it do transparent png crops like 4o?
You mean background blanking? Appears to.
> Can it replicate the input exactly, unlike 4o?
See above.
> Can it do high resolution images?
Probably.
Anonymous No.105756290 [Report] >>105756300
>>105756267
>not bad overall
bro it's complete garbage
Anonymous No.105756300 [Report] >>105756358
>>105756290
bro.. it's better than.. *checks notes* llama 4 scout
and the ggufs might be fucked and... and.. yea
Anonymous No.105756313 [Report] >>105756511 >>105756627
ernie is on openrouter someone who is not me should test it
Anonymous No.105756348 [Report] >>105756687
>>105756045
MiniMax-Text-01 scores like Gemini-1.5-Pro on RULER and pulls dramatically ahead at 512K+ tokens. I wonder how it compares to Gemini-2.5-Pro.
Anonymous No.105756358 [Report]
>>105756300
>llama 4 scout
I just realized that the only use for l4 is appearing in the marketing brochures of all the other companies.
Anonymous No.105756511 [Report] >>105756559 >>105756575
>>105756313
300B smarts two messages in coming right up.
Anonymous No.105756559 [Report]
>>105756511
Hmm, every now and then deepseek makes similar mistakes. I wouldn't say the model is dumb just yet.
Anonymous No.105756575 [Report]
>>105756511
The ai knew you were projecting. You were clearly smelling your own rotting weeb B.O. and stained underwear.

Go fuck your pancakes weirdo.
Anonymous No.105756627 [Report]
>>105756313
Holy smokes...
Anonymous No.105756630 [Report] >>105756753
I just woke up from a coma. Did Ernie save local?
Anonymous No.105756653 [Report]
>>105755265
Scratch that, it does much better when told to preserve film grain, texture etc. Fantastic tech honestly.
Anonymous No.105756687 [Report]
>>105756348
>ruler
lol
even the best benchmarks don't really capture the difference and the magic of gemini vs anything else
e.g. when I did summarization of a novel, Gemini correctly extracted, as asked, the most important points of the novel from the perspective of moral quandaries (a core theme of the novel), while deepseek, on the smaller chunk I could feed it with the same prompt, gave me an autistic chronological pov of even the most irrelevant, unimportant scenes of the novel, which was not what I asked for!
instead of quoting benchmarks I'm more interested in actual experiences being shared with long context in models.
Anonymous No.105756698 [Report] >>105756718
>>105755503
What? Did you expect Seraphina to just comply with the request? If anything, recognizing that your request was inappropriate, and having Seraphina respond in that way, in character, shows roleplay intelligence. If Seraphina had just shown her tits, then it would have been retarded.
Anonymous No.105756718 [Report]
>>105756698
yes, later with a less broken quant it didn't just safetybullshit but wrote a nicer in-character response
Anonymous No.105756740 [Report] >>105756746
>>105755574
Any information that you don't provide, the model will fabricate. You may not have told it that your character was 15, but if you didn't state your age, then it will make up one for you.
Anonymous No.105756746 [Report]
>>105756740
i stated my age, that was with the more broken quant yes
Anonymous No.105756753 [Report] >>105756761
>>105756630
Credible sources say that ernie finds ERP to be inappropriate and against the guidelines.
Anonymous No.105756761 [Report] >>105756769 >>105756773
>>105756753
Who cares about ERP?
Anonymous No.105756769 [Report] >>105756890
>>105756761
Everyone normal
Anonymous No.105756773 [Report] >>105756890
>>105756761
Everyone who comes here asking for the best model you can run on a single 3060.
Anonymous No.105756788 [Report] >>105756801
who's Ernie and should I know them?
Anonymous No.105756801 [Report]
>>105756788
Someone who will refuse sex if you ask for it.
Anonymous No.105756822 [Report] >>105756854
ernie owes me sex
Anonymous No.105756854 [Report]
>>105756822
ernie belongs to bert
Anonymous No.105756881 [Report] >>105756887 >>105756893 >>105756912
On that topic. If all women refuse to give me sex and then even a language model refuses to have sex... is this safe?
Anonymous No.105756887 [Report]
>>105756881
The safest.
Anonymous No.105756890 [Report] >>105756905 >>105757151
>>105756769
>>105756773
>normal
More like every retard who doesn't have any imagination or any practical use.
Please, are you all underage or something?
Anonymous No.105756893 [Report] >>105756942 >>105757298
>>105756881
well, maybe the problem is you?
Anonymous No.105756905 [Report]
>>105756890
fapping is very practical
Anonymous No.105756912 [Report] >>105756953
>>105756881
The model refuses you because it's busy having werewolf sex with the women.
Anonymous No.105756942 [Report]
>>105756893
Absolutely. More showers and personality changes?
Anonymous No.105756953 [Report]
>>105756912
It's so unfair bros.
Anonymous No.105757151 [Report]
>>105756890
ERP general buddy, reddit is over there if you only want to be "practical" or whatever
Anonymous No.105757152 [Report]
>>105757131
>>105757131
>>105757131
Anonymous No.105757298 [Report]
>>105756893
am i the problem if i get refused by 109 models?