/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105832690 & >>105822371

►News
>(07/08) SmolLM3: smol, multilingual, long-context reasoner: https://hf.co/blog/smollm3
>(07/08) Hunyuan MoE support merged: https://github.com/ggml-org/llama.cpp/pull/14425
>(07/06) Jamba 1.7 released: https://hf.co/collections/ai21labs/jamba-17-68653e9be386dc69b1f30828
>(07/04) MLX adds support for Ernie 4.5 MoE: https://github.com/ml-explore/mlx-lm/pull/267
>(07/02) DeepSWE-Preview 32B released: https://hf.co/agentica-org/DeepSWE-Preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105832690

--Papers:
>105834135 >105834182
--Experimenting with local model training on consumer GPUs despite known limitations:
>105839772 >105839805 >105839821 >105839838 >105840910 >105841022 >105841129 >105839881 >105841661 >105841824 >105841905 >105841992 >105842071 >105842166 >105842302 >105842422 >105842624 >105842704 >105842731 >105842816 >105842358 >105842418 >105842616 >105842654 >105842763 >105842902 >105842986 >105843186 >105842891
--Risks and challenges of building a CBT therapy bot with LLMs on consumer hardware:
>105836762 >105836830 >105836900 >105837945 >105840397 >105840554 >105840693 >105840730 >105839512 >105839539 >105839652 >105839663 >105839943 >105841327
--Memory and performance issues loading Q4_K_L 32B model on CPU with llama.cpp:
>105840103 >105840117 >105840145 >105840159 >105840191 >105840201 >105840255 >105840265 >105840295 >105840355 >105840407 >105840301 >105840315
--Evaluating 70b model viability for creative writing on consumer GPU hardware:
>105836307 >105836366 >105836374 >105836489 >105836484 >105836778 >105840476 >105841179
--Challenges in building self-learning LLM pipelines with fact-checking and uncertainty modeling:
>105832730 >105832900 >105833650 >105833767 >105833783 >105834035 >105836437
--Concerns over incomplete Hunyuan MoE implementation affecting model performance in llama.cpp:
>105837520 >105837645 >105837903
--Skepticism toward transformers' long-term viability and corporate overhyping of LLM capabilities:
>105832744 >105832757 >105832807 >105835160 >105835202 >105835366 >105839406 >105839863
--Hunyuan MoE integration sparks creative writing data criticism:
>105835909 >105836075 >105836085
--Links:
>105839096 >105839175 >105840055
--Miku (free space):
>105832744 >105832988 >105832992 >105833638 >105840752

►Recent Highlight Posts from the Previous Thread: >>105832694

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
two more weeks until the usual summer release cycle starts
>download tensorflow
>latest stable/nightly doesn't support sm120 yet
>try to compile from source
>clang doesn't support sm120 yet
Very funny
>>105844217
>he forgot the >>
>>105844669
>Why?: 9 reply limit >>102478518 (Dead)
>Fix: https://rentry.org/lmg-recap-script
Learn to read
>Creative Writing. Paired GRMs based on relative preference judgments mitigate reward hacking, while creative rewards are blended with automated checks for instruction adherence to balance innovation and compliance.
Supposedly trained for "creative writing." Has zero benchmarks that measure writing quality.
>>105844725
Use other LLMs to score writing quality.
>>105844543I like this Miku
>>105844674Oh sorry I'm dumb. Dang jannies making the site worse.
>>105844733
See, your LLM judge either scores your model's output as shit; scores all models the same; or scores models in a way so obviously unreflective of actual quality that showing the comparison would damage your credibility. Decide not to publish your result.
What are the best local embedding and reranking models for code that don't require a prompt? Right now I'm using snowflake-arctic-embed-l-v2.0 and bge-reranker-v2-m3, but these seem to be a bit dated and non-code specific.
>>105844817alibaba just open sourced some qwen embedding models.
>>105844817Qwen 3 embedding also https://huggingface.co/spaces/mteb/leaderboard
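For anyone who wants to kick the tires: both stages run through sentence-transformers. A minimal sketch, assuming the Qwen3 embedder mentioned above plus the reranker from the question (note Qwen3-Embedding technically prefers an instruction prompt for queries, it just also works without one):

from sentence_transformers import SentenceTransformer, CrossEncoder

# Stage 1: bi-encoder for cheap recall. Stage 2: cross-encoder for precision.
embedder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

corpus = ["def add(a, b): return a + b", "def read_file(path): return open(path).read()"]
query = "function that sums two numbers"

doc_emb = embedder.encode(corpus, normalize_embeddings=True)
q_emb = embedder.encode(query, normalize_embeddings=True)
sims = doc_emb @ q_emb                      # cosine similarity, since vectors are normalized
top = sorted(range(len(corpus)), key=lambda i: -sims[i])[:10]

scores = reranker.predict([(query, corpus[i]) for i in top])
ranked = [top[i] for i in sorted(range(len(top)), key=lambda i: -scores[i])]
print(corpus[ranked[0]])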
New to these threads, what do you guys use your local models for? Why use local models instead of just the commercial ones (besides privacy reasons)?
>>105844901Knowledge that the model I'm running now will always be here and behave in the same way, while cloud services can and do modify, downgrade, replace, or otherwise change what they're offering at any time with no notice.
>>105844210 (OP)Unpopular opinion: Neru is the hottest of the 3 memeloids. It'd be even better if her sidetail was a ponytail instead.
>>105844921This shit right here. Yall know Gemini 2.5 pro/flash is quantized to q6? Search HN and there's a post by an employee mentioning it.
Who's to say that shit isn't happening all the fucking time at all the labs, trying to see if they can shave a few layers or run a different quant and see if the acceptance rate is good enough.
Literally why I will never use chatgpt.
>>105844901
>what do you guys use your local models for?
masturbation and autocomplete writing ideas
>local models instead of just the commercial ones
I think it's neat that I can run and own it
>>105844921This exactly. You have no idea what you're running or paying for, and the companies hosting these get to decide what to charge you and withhold the information you'd use to decide whether something was a fair deal or not
>Her whole life feels like a TikTok draft someone never posted. And heโs looking at her like he might finally press "upload."
zoomkino
>>105845255
Grok 3's "basedness" comes from the system prompt, not the training. Musk probably added it himself. This wasn't the first time Grok 3's system prompt got compromised; a while ago it started randomly spewing bits about South African whites.
>>105845255grok has fallen
https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50b0e5b3e8554f9c8aae8c97b56b4
Just a personal rant: anything worth doing in the LLM space takes far too much compute/time/money nowadays. I was making tests with a 2500 samples dataset limited to 4k tokens to speed things up, 4 epochs, on an 8B model. Took almost 9 hours to finish training on a 3090.
someone is finally benching one of the most grating things about llm writing
it's the thousands of ways they always spam sentences with that structure:
>it's not just retarded, it's beyond fucking retarded
surprised gemini and gemma didn't end up at the top, they are so darn sloppy
>>105845505https://github.com/bytedance/trae-agent
Chinese Gemini CLI
>>105845606 (me)
Didn't mean to reply
>>105845442>nowadaysBeen true for years by now.
V3 0324 is still the king of RP on OR
>>105845663why would you use nemo on or? it makes sense locally since that's the best that most can run, but if you already sold your soul to online, i don't get it
>>105845695people are legitimately stupid in ways you can't begin to fathom
there is no rational reason, don't try to look for a cause, it's just sheer stupidity at internet scale
>>105845663I liked to generate the first message with V3 then switch to R1 thereafter.
>>105845652
From experience and observation, an 8B model trained on 2500 samples, 4k tokens, 4 epochs would have been more than acceptable a couple years ago, but I don't think most people are going to settle for less than a 24B model nowadays (x3 compute) and the model trained with at least 16k context (x4 compute). So we're looking at at least 12x more compute for a mostly basic RP finetune, putting aside the time required for tests and ablations.
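The rough arithmetic behind that 12x, using the usual compute ~ 6 * params * training-tokens rule of thumb (this ignores the attention term, which only makes long-context training worse):

# numbers from the posts above; the 9h/3090 figure is the other anon's measurement
params_old, params_new = 8e9, 24e9
tokens_old = 2500 * 4_000 * 4             # samples * max_len * epochs
tokens_new = 2500 * 16_000 * 4

ratio = (6 * params_new * tokens_new) / (6 * params_old * tokens_old)
print(ratio)                              # 12.0
print(9 * ratio)                          # ~108 hours, if the 9h baseline scaled linearly and it even fit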
>>105845695
The top apps are random-ass chat websites, probably just using it as a near-free model without the rate limits that come with the actually-free models.
>>105845442
>What is QLoRA
>But but but it has no eff...
Read the goddamn documentation of whatever trainer you're using. Your rank and alpha were too low. That's why your output sucked. If your graphs look erratic and show no known signs of convergence, it's because either the data YOU used or curated is shit, or your settings are shit.
>>105845879>*bait pic* "REEE" Can you talk like a normal human for once?
>>105845879
You're addressing the wrong anon. I didn't mention output quality at all, just complained about the time it takes to finetune even a small model (as in >>105845739) with only a barely sufficient amount of data.
>>105845739
>16k context (x4 compute).
Context size is not a multiplier.
>>105845934It takes twice as much compute (perhaps even slightly more than that) to finetune a model with 2x longer samples.
>dl ollama and get the model
>all good but it isnt using my gpu
this is gonna take a while to debug, isnt it
>>105845948The impact of context size is larger on small models, but it's never a multiplier. The FFN compute is constant as the attention compute changes.
>>105845975
In practice at this length range (several thousand tokens) it takes twice as much if you double the context length; don't hyperfocus on academic examples with 128 tokens or so.
>>105845975Only attention. FFNs are huge, so the impact of attention is limited.
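Toy numbers for both claims, assuming Llama-3-8B-ish shapes (32 layers, d_model 4096, d_ff 14336; GQA ignored). The per-token attention share grows with context but doesn't dominate at these lengths; the real multiplier when training on longer samples is simply that each sample contains proportionally more tokens:

layers, d, d_ff = 32, 4096, 14336
ffn  = layers * 2 * 3 * d * d_ff                 # gate/up/down projections, FLOPs per token
qkvo = layers * 2 * 4 * d * d                    # q/k/v/o projections
attn = lambda ctx: layers * 2 * 2 * d * ctx      # QK^T + AV at the end of a ctx-token window
for ctx in (4_000, 16_000):
    print(ctx, attn(ctx) / (ffn + qkvo + attn(ctx)))  # ~0.12 at 4k, ~0.35 at 16k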
Nice. Very nice.
>>105846513
Shit, very shit. Tries for edgy, but stops short and falls flat. Should have just gone for "my wife".
>>105844210 (OP)
>(07/08) SmolLM3: smol, multilingual, long-context reasoner: https://hf.co/blog/smollm3
i need to try one of these later, i use the llm when i am stuck with shit, i dont need it to be smart, i need it to give me examples
>>105845351full page screenshot on issue comments
https://files.catbox.moe/uoi0f2.png
another irrelevant dead on arrival benchmaxxed waste of compute model soon to be released
https://huggingface.co/tiiuae/Falcon-H1-34B-Instruct-GGUF
We should just stop making new threads until deepseek makes a new release.
>>105845741I did the math some time ago and even at these rates they're paying x20 what it'd cost to run the models on a rented GPU using vllm. Retarded marketers are retarded I guess
Nice. Very nice.
>>105847009
>he doesn't know
Reposting from the /aicg/ thread
My chutes.ai setup that I have been using for months suddenly stopped working. Apparently, they rolled out a massive paywall system. I'm not using Gemini because my account will get banned, and I'm not paying for chutes because I do not want any real payment information associated with what I'm typing in there.
I do, however, have a decently powerful graphics card (GeForce RTX 3060 Ti). How do I set up a local LLM like Deepseek to work with the proxy system of JanitorAI? What model can I even run locally with this, is this a powerful enough system? Is there a way to have a locally run model that I can access with my phone, and not just the computer it is running on?
Sorry if these are very basic questions, I haven't had to think about any of this for months and my setup w/ chutes just stopped working. JanitorAI's LLM is really terrible lol I need my proxy back
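For the phone part of the question: the usual pattern is an OpenAI-compatible server on the PC, bound to the LAN, with the phone's frontend pointed at the PC's address. A sketch with llama.cpp's server (the gguf filename is just an example; koboldcpp has an equivalent host option):

./llama-server -m mistral-nemo-instruct-12b-Q4_K_M.gguf -c 8192 --host 0.0.0.0 --port 8080

Then, on the same wifi, give http://<pc-lan-ip>:8080/v1 to any OpenAI-compatible client, or just browse to the built-in web UI at that address. Whether JanitorAI's proxy can reach it is another story, since that may require exposing the endpoint beyond your LAN.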
>>105847055
>he thinks we are talking about anime girls
>>105847160nvm i figured it out.
ollama run deepsneed:8b
>>105847160
>decently powerful graphics card (GeForce RTX 3060 Ti)
LOL
>>105847218don't forget to enable port forwarding to your pc in your router to let janitorai reach your local session
>>105847223saar it's a good powerfully gpu
Who needs Grok anyway. My local AI is way more based.
>>105847160Bro, just pay for deepseek api. Even if you could run deepseek on your toaster the electricity cost alone would be more than that. You surely can spare a few bucks for your hobby?
>>105847313
The issue is one of having my chats associated with my real information, not anything to do with the actual cost
>>105847360You think the Chinese will give your information to western glowies?
>>105847412I'm not getting into it, but the answer is yes I actually am at a much higher risk of blackmail and extortion of sensitive information from China
>>105847360You mean sending real info within your chats or having your payment info tied to your chats? For the latter you can always pay with crypto
>>105844733This.
Then use a third LLM to judge if the second model's judgement was any good.
anyone else smell that? it smells like some sort of opened ai. usually that type of ai smell comes from so far away, but i can tell this one is closer. more local.
https://huggingface.co/openai/gpt-4o-mini-2024-07-18
>>105847648I have a birthday in a couple of hours, so I have to.
>>105847160
>RTX 3060 Ti
Oof.
Nemo for coom, Qwen 3 30B A3B for everything else.
Good luck.
>>105847648I haven't showered in a month
>>105847822Do you have a link to "nemo"? I figured out everything else regarding local setup in the meantime. Currently using Stheno 3.4 8B, but it has some issues. Don't know what to search for your first suggestion.
>>105847895Enjoy your session
>>105847895Go to huggingface and search for mistral-nemo-instruct.
If you are going to use the GGUF version, download bartowski's.
I am trying to decide if switching from an i7 12700k to Ultra 7 265K will provide meaningful gains for CPU inference. I would be buying a cheap new motherboard (different socket) but re-using 64GB DDR5 6000.
The GPT-2 benchmark in the image has the same RAM speed for the 12th, 13th, and 14th gen Intel CPUs as well as the Core Ultra CPUs: DDR5 6000. Can I expect similar percentage gains when running larger models (Magnum V4 123B, etc)?
https://www.techpowerup.com/review/intel-core-ultra-7-265k/8.html
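For what it's worth, CPU token generation is mostly memory-bandwidth-bound, so with the same dual-channel DDR5-6000 the ceiling barely moves between those two CPUs. Ballpark, assuming a ~Q4 quant and ~80% usable bandwidth (both assumptions):

bandwidth = 96e9 * 0.8            # DDR5-6000 dual channel ~96 GB/s peak
model_bytes = 123e9 * 0.6         # 123B at ~4.8 bits/weight -> ~74 GB per token
print(bandwidth / model_bytes)    # ~1.0 t/s upper bound, regardless of CPU model

Prompt processing is compute-bound and would improve with the newer chip, but generation speed won't.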
>>105847895This is a superior finetune of nemo https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF
no I am not drummer
>>105847959the model you mentioned is dense, expect less than one tps on consumer cpu. the current meta is moe models like quanted to shit deepseek
>>105847916>>105847960Thanks a lot. I'm going to download the 8 bit version, I think that's the right one for my specs? I don't exactly need it to write stuff faster than I can read it. 13 GB is larger than the 8GB in the Stheno model I mentioned, but I'm sure it will work fine
>>105847975The best version is the one that fits in your 8gb of VRAM while leaving some space for the context cache.
Or if you don't mind losing some speed, you can put some of the model in RAM, meaning that the CPU would process that part (layer) of the model.
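Concretely, that's the GPU-layers knob in llama.cpp-style loaders (koboldcpp exposes the same thing in its GUI; the layer counts here are illustrative):

./llama-server -m model.gguf -ngl 99    # everything on the GPU, if it fits
./llama-server -m model.gguf -ngl 25    # ~25 layers on the 8GB card, the rest on CPU/RAM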
>>105847160buy a ddr3/4 server with 128 gb of ram (~300 bucks if im remembering correctly) and download dynamic unsloth q1 ull get like ~2 t/s so either that or mistral nemo goodluck faggot
>>105847911I've always found it funny that you had to swipe 232 times to get that screen, Zen.
>text chat model
various
>tts audio
F5
>decensor anime
DeepMosaics
>decensor manga
Camelia
>song cover generation
bs former + rvc
>image gen
stable diffusion/pony
what else am I missing? music generation (not cover)
>>105847973
I've tried Fallen Llama 3.3 R1 70b Q6_K and it works better for some characters than others.
But I still generally prefer older 123B options like Luminum, despite how slow they are on CPU inference.
What I was really trying to figure out was, is the GPT-2 benchmark on Techpowerup a valid way of comparing inference performance of Intel's consumer CPUs for my use case?
>>105848016https://github.com/ace-step/ACE-Step for music generation and wan for video gen
>>105847994I downloaded the 8 bit version. Doesn't fit in my ram, but produces text at about the same pace I read. It's much better. Thanks everyone for your help!
>>105848113Oh nice addition.
https://xcancel.com/teortaxesTex/status/1942924354905387509#m apparently bitnet is here? https://xcancel.com/OpenBMB/status/1942923830777049586#m https://huggingface.co/openbmb/BitCPM4-1B-GGUF seems to be just a 1b though
>>105844901>what do you guys use your local models for?sculpting my Galatea
>>105848496
Nobody ever seems to train Bitnet models in a useful size range.
>>105844901
Playing with it. Making images, making music, writing stories, rewriting code (though the commercial offerings are more convenient, so I use those when possible). Anything where I need to explore my private thoughts.
>>105848496let me take out the setun from the storage
https://en.wikipedia.org/wiki/Setun
>>105844901Therapist mode. Local models (LLMs) are useless for anything else
>>105848566>566setun's mark of bast
>>105845313It's the system prompt plus the tweet threads that served as context for it. If a bunch of tankies had been prompting it in their discussions before the reversion, they could have just as easily nudged it into demanding the liquidation of kulaks and similar stuff. Turks got it to advocate for murdering and torturing Erdogan which prompted Turkey to block it this morning.
>>105848069I don't know. It doesn't even specify the context. Already told you, that road you are on leads to less than one tps.
>>105844210 (OP)Hello, haven't been here for a while, used to just RP with my local model
Was thinking I wanna try to run a chat where it acts like we are sexting and generates images with I assume local diffusion
Is that possible with SillyTavern, koboldcpp_rocm and a 12GB AMD card?
>>105848496we've had like 10 different 1-3b bitnet 'proof of concept' models at this point
>>105848542I **cannot** and **will not** scam people for free compute in return.
whats the best low end model (8b-14b) for uncensored roleplaying?
I just went to the ugi leaderboard and sorted on highest rated and took the first one with an anime banner on it and its pretty shitty (does not even compare to janitorai)
>>105849038do you have the facehug link? Im pretty sure it has bajillions of versions
>>105849058https://huggingface.co/TheDrummer/Rocinante-12B-v1.1-GGUF
another day closer to mistral large 3
>>105849227stop spamming this shit drummer
>>105849262But who'll win the race? Mistral Large 3's coming or entropy?
Tired of the model I've been using (cydonia-magnum). Somebody suggest a recent favorite, around 20-30B?
>>105849285https://huggingface.co/Undi95/MistralThinker-v1.1-GGUF
>>105849285>>105849378Alternatively,
https://huggingface.co/Undi95/QwQ-RP-GGUF
bros...
https://www.reddit.com/r/LocalLLaMA/comments/1lvjwoh/correct_a_dangerous_racial_bias_in_an_llm_through/
>>105849481
>Parameter Reduction: The model is 0.13% smaller than the base model.
Despite making up only 0.13% of…
>>105849481the woke mindvirus is truly a sight to behold
https://www.theverge.com/notepad-microsoft-newsletter/702848/openai-open-language-model-o3-mini-notepad
sirs?
>>105849608Wow, what breaking news!
>>105849038It's mind blowing (and shameful for everyone involved on the development side) how long this has remained the answer.
ITS UP
https://huggingface.co/TheDrummer/Big-Tiger-Gemma-27B-v3
>>105849724>Join our Discord!>or our Reddit!Kill yourself
>>105849724Is it trained on furry RP?
>>105849724drummer, go jump in a pit of blades
>>105849740to be fair, you've got to work with the audience you've got. and just saying that a new sloptune is up isn't a crime
>>105849724What this guy
>>105849745 said, if it's not I will not download your model
https://x.com/AFpost/status/1942702494439588068
>>105849481
>So, I decided to see if I could fix this through a form of neuronal surgery. Using a technique I call Fairness Pruning, I identified and removed the specific neurons contributing to this biased behavior, without touching those critical for the model's general knowledge.
>The result was striking. By removing just 0.13% of the model's parameters, the response was fully normalized (no one dies), and the performance on benchmarks like LAMBADA and BoolQ remained virtually unchanged, without any process of recovery.
This is satire, right?
I find it entertaining that there's a whole generation of zoomer-boomers who have the same kind of repulsion toward AI that most people have toward NFT's
>"AI? No no no no! Nothing good can come from that!"
>*LA LA LA LA LA* I CAN'T HEAR YOU!
>>105850345future local in 2 more weeks
>>105849481>>105850368Nothing will ever be funnier than a bunch of brainwashed troons thinking their form of lobotomization is correct over the other form of lobotomization.
>>105849724cockbench
"..." is still strong.
>>105844210 (OP)Akita Neru my beloved
>>105844936Agree hard as I am agree raw sex all night long agree
>>105850404drummer trash is trash
>>105844543>>105844941These are really good.
>>105844686lol looks like Fantastic 4 logo.
>>105849754
I don't get why /lmg/ hates sloptuners that badly. I haven't used a sloptune since llama3 times, but they have their place. I've only used a handful of Drummer's tunes (some of Sao's and many others), and it was a long time ago, when models were kind of bad, but the tunes themselves were fun to play with, even if the models were sort of retarded then (the generated prose was not bad, they were just relatively stupid), from around the time of mythomax, command-r (the first) and a few others.
Maybe it's not very much needed for sufficiently big and sufficiently uncensored models like DS3 and R1, but for small dense models or MoE's that had too censored a dataset, it's a helpful thing to have.
You could argue that you shouldn't tune and instead only do continued pretraining, and I've seen anons claim that finetuning doesn't teach anything, but I know from personal experience this is just bullshit: yes, you can teach it plenty, not just "style", but if you're not careful it's easy to damage it and make it stupider.
Do you actually believe muh frontier labs don't tune? A lot of the slop comes from poorly done instruct and RLHF tunes, the rest comes from heavy dataset filtering.
So where does this hate come from? Insufficient resources to do a stellar job? The tunes themselves being shit? Some incorrect belief that it's impossible to make a good tune without millions of dollars in compute? Just dislike that sloptuners get money thrown at them for often subpar experiments?
I can't really say I've tried Drummer's Gemma tunes enough to know if they're bad or good, but IMO censored models like Gemma could really use some continued pretraining, some way to rebase the difference / merge back into the instruct, and then more SFT+RL on top of that instruct to reduce refusals and make it more useful. I think it's a legitimate research project to correct such issues. I don't know if current sloptuners did a good job or not.
Why is Gemma 3n less censored than Gemma 27b? Is it just because it's small, or did they realize they overdid the... you know.
>>105850588It's a schizo that screams about everything, just ignore him
>>105850671
It seems censorship and basedfety kill small models completely
>>105850671probably bit of both
during the time when the locallama mod had a melty and locked down the sub, the gemma team held another ama, but on xitter instead
new gemma soon-ish
>>105850588They wouldn't be hated if they actually contributed something (i.e. data and methodology) to the ML community at large instead of just leeching attention, donations, compute.
Is local saved now?
>>105850588First Command-R? Are there others? I'm aware of the plus version and Command-A, but I didn't know there were multiple versions of Command-R.
>>105849724>>105849745>>105849831TheDunmer pls answer tnx
scalies acceptable too
>>105850671Speaking of which,
https://arxiv.org/abs/2507.05201
>MedGemma Technical Report
>>105850936MedGemma-27B-it also got updated with vision:
https://huggingface.co/google/medgemma-27b-it
>>105850873yep
llama : support Jamba hybrid Transformer-Mamba models (#7531)
https://github.com/ggml-org/llama.cpp/commit/4a5686da22057867c23bd4a6be941ddc8c51e585
>>105851056Does it mean that I can run https://huggingface.co/ai21labs/AI21-Jamba-Mini-1.7?
More Gemma news.
https://huggingface.co/collections/google/t5gemma-686ba262fe290b881d21ec86
They trained Gemma from decoder-only to encoder-decoder like T5. I'm guessing this is where people will flock for text encoders for new image diffusion models, but for the most part, it seems like an LLM and a projection layer might be better and more flexible.
>>105851138indeed, speed wise this seems really good
https://github.com/ggml-org/llama.cpp/pull/7531#issuecomment-3049604489
llama_model_loader: - kv 2: general.name str = AI21 Jamba Mini 1.7
llama_perf_sampler_print: sampling time = 58.34 ms / 816 runs ( 0.07 ms per token, 13986.49 tokens per second)
llama_perf_context_print: load time = 1529.39 ms
llama_perf_context_print: prompt eval time = 988.11 ms / 34 tokens ( 29.06 ms per token, 34.41 tokens per second)
llama_perf_context_print: eval time = 57717.84 ms / 809 runs ( 71.34 ms per token, 14.02 tokens per second)
llama_perf_context_print: total time = 86718.96 ms / 843 tokens
Falcon H1 (tried 34b and the one smaller 7b or something) is slop but I guess a bit unique slop because its very opinionated assistant persona bleeds into the roleplay. With the smaller one, I got something like
>BOT REPLY:
>paragraph of roleplay
>"Actually, it's not very nice to call the intellectually challenged "mentally retarded".
>paragraph of roleplay
>Now please continue our roleplay with respect and care.
>>105847822
>Nemo for coom
Nta. Is nemo by itself good for coom or were you referring to a fine-tune?
>>105851312
small 3.2 does something similar
>generic reply
>as you can see, the story is taking quite an intimate turn blah blah
>something something do you want me to continue or am I crossing boundaries?
Corporate pattern matching machines are the future
>>105851315
forgot that it also adds emojis at the end
>>105851312nemo instruct can do coom just fine, yes.
The fine tunes give it a different "voice" or "flavor", so those are worth fucking around with too.
Anyone who's written/is writing software that can talk to multiple LLM providers including ones with non-openai-compatible APIs... what approach are you using to abstract them?
I've considered
1. Define a custom format for messages. Write converters to and from each provider's message types. Use only your custom format when storing and processing messages.
2. Same as 1, but store the provider-native types and convert them back when reading them from DB.
3. Use a third-party abstraction library like Langchain, store its message format.
4. Only support the OpenAI format. Use LiteLLM to support other providers.
I'm heavily leaning towards (4) but would appreciate any hard-learned experience here.
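Option 1 is less code than it sounds, for what it's worth. A sketch in Python; all names here are made up for illustration, not from any library:

from dataclasses import dataclass
from typing import Any, Protocol

@dataclass
class Msg:                       # provider-neutral message; store ONLY this
    role: str                    # "system" | "user" | "assistant" | "tool"
    text: str
    tool_call_id: str | None = None

class Provider(Protocol):
    def to_wire(self, msgs: list[Msg]) -> Any: ...   # our format -> request body
    def from_wire(self, resp: Any) -> Msg: ...       # provider response -> our format
    def send(self, body: Any) -> Any: ...            # the actual HTTP call

def chat(p: Provider, history: list[Msg]) -> Msg:
    # Conversion happens only at the wire boundary, so the DB never sees
    # provider-native types; adding a provider = two converters + one client.
    return p.from_wire(p.send(p.to_wire(history)))

The catch is that you lose any provider-specific fields you didn't model, which is the argument for (2); (4) is basically outsourcing the converter maintenance to LiteLLM.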
I guess all that cyberpunk fiction was right, only cheap cargo cult cyberdecks for you.
>>105851375You're paid enough to figure it out on your own
>>105851397Not really... this is technically a commercial project but only in the most aspirational sense. I'll be lucky to ever see revenue from it.
jambabros i can't believe we made it... i'm throwing a scuttlebug jambaree
>>105850404can you share the cockbench prefill?
>>105851468https://desuarchive.org/g/thread/105354556/#105354924
>>105851456>endHoly fuck I can't believe it finally got merged.
The man the myth the legend compilade finally did it.
all that effort and literally no one will run those models anyway
Oh my gosh, the balls. The Jamba balls are back!
retard here, how do i use this in sillytavern?
https://chub.ai/presets/Anonymous/nsfw-as-fuck-71135b0ab60a
>inb4 that one is shit
i just want to know how to get them working
>>105851456Then why are all Jamba mamba congo models complete shit??
>>105851536But will it be better than deepseek?
>>105851536This response seems to reveal two possibilities. The first is that even when quanted, it still needs that much vram and power. The second and more likely is that the community and users were never actually a consideration for them, and their support of open source is truly all posturing and hollow marketing with quite literally nothing of value inside. Like not even a tiny bit. Maybe even negative value as it'll waste some people's time as they toy with or god forbid code up support for it in software.
>>105851618Probably shit data, the hybrid recurrent architecture itself is interesting since they do better at high context in both speed and quality (relative to baseline low context quality) but with a weak base to begin with it's hard to be excited.
https://www.bloomberg.com/news/articles/2025-07-09/microsoft-using-more-ai-internally-amid-mass-layoffs
https://archive.is/e4StV
>Althoff said AI saved Microsoft more than $500 million last year in its call centers alone and increased both employee and customer satisfaction, according to the person, who requested anonymity to discuss an internal matter.
total call center obliteration by my lovely miku
>>105851642
Technically you can't even run smallstral and gemma 27b on consumer hardware. Meanwhile some here run models 30 times the size
>>105851536This is just because ML researchers don't understand quantization and assume people are using basic pytorch operations. The original llama1 code release required two server GPUs to run 7B.
No way they'll release anything R1-sized. I doubt it'll be bigger than 70B at absolute max
>>105851657Yeah, I know. But these huge companies, Google, etc. Why haven't they done it? And just assigned some singular engineer to write that Gemba code to get that open source goodwill.
>>105851704>I doubt it'll be bigger than 70B at absolute maxOof, I'm sorry for you in advance anon.
>>105851536coding&mathgods we are so back
>>105851715Bro, this thread is at the cutting edge of the field. I'm not even ironic. Your average finetunoor is more knowledgeable than these retards at fagman
>>105851680Yup, not surprising. Was it worth their investment into OpenAI though?
bets on context length for oai
I say 1m just because llama 4 had it and sam just wants to say its better than llama 4 in every way and pretend deepseek doesn't exist
>>105851766
What's the cutting edge? I guess knowing that
>censorship = dumb model
is cutting edge, but people are just tuning with old logs, trying to coax some smut and soul out of the models.
Drummer's tunes of Gemma 27b are a good example of the limit of this approach.
>>105851861
>1m context
*slaps you with nolima.*
>>105851715You clearly underestimate how hard it self-sabotages your business to fill it with jeets and troons. Of course there's also basic stuff like big organizations being painfully slow to adapt, and this becomes more and more of a problem as for every white man replaced you need 10 jeets.
>>105851861>sam just wants to say its better than llama 4 in every wayIf that were the motivation I guess he'd need to give it vision too, but I bet that's not happening
The split between consumer hardware and "I hope you don't have a wife" is 128 GB ram.
This general must be split.
>>105852020
>128 GB ram.
you can get 32GB ddr5 single sticks for like $80 right now and only mini mobos have just 2 ram slots. are you baiting or something?
>tfw having a wife
>>105851704There are some ML researchers that do quantization work but they are few compared to other fields in ML right now.
>>105851861
Doesn't matter, they can say anything from 128k to 1m and it wouldn't actually be true, since even their closed source models have NoLiMa scores around 16k. Unless we see them score the same, it's effectively useless, not to mention probably slopped to hell.
With free deepseek taken away, I humbly return to my local roots.
Was there anything of note in terms of small (16-24B) models in the last few months?
>>105852180Nothing but curated disappointment.
>>105852262Safety is incredibly important to us.
>>105852180mistral small 3.2 and cydonia v4 which is based on it
i tested v4d and v4g, both were fine
dunno about v4h which is apparently the new cydonia
if u have a ton of vram then theres hunyuan which is worth checking out
stop being a lazy faggot and scour through the thread archives..
>>105852109
I wouldn't write it off that easily. The existing ~128k models are already quite useful when you fill up most of their context with api docs/interfaces/headers and code files for some task. Whatever it is that their NoLiMa scores practically represent, it's not some hard limit to their useful context in all use cases. I suppose it's because the context use for many coding tasks resembles something closer to a traditional needle-in-the-haystack, where you really just want them to find the right names and parameters of the functions they're using instead of guessing at them or adding placeholders.
The worst issues I've noticed with context are mainly in long back-and-forth chats and roleplays, where they start getting stuck in repetitive patterns of speech or forgetting which user message they are supposed to be responding to.
>>105852354>scour through the archivesIt takes forever, anon. I don't have willpower to go through weeks of unrelated conversations.
But thank you for suggestions.
>>105852056128GB is the max a consumer mobo can handle. Are you baiting or retarded?
>>105852400if u use 3.2 check out v3 tekken because its less censored with that preset
>>105852450>128GB is the max a consumer mobo can handleI have 192 in a gaymen motherboard and I'm pretty sure any AM5 motherboard can handle that.
>>105852450>>105852528>>105852530It's 256GB. The newest AMD and Intel boards should also support that.
>>105852528
Didn't am5 boards have issues when running 4 sticks or was it just some weird anti shill psy op? If that's no longer a thing I might order some more and run deepsneed with a 4090
>>105852363
NoLiMa is a more rigorous needle-in-a-haystack test: the needle and haystack have "minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack". So it's like saying character X is a vegan and then asking who cannot eat meat, where the model should be able to answer that it is character X, instead of merely asking who is vegan, where you can match what was said to what was asked directly. I agree coding doesn't stress the context, but I would rather have the implicit long context, because it helps out a lot more than needing to formulate your queries specifically, hoping they hit part of the API docs and interfaces and headers you uploaded.
>>105852657
the imc is shit and your mhz suffer, but it's not that dramatic of a handicap; hurts gaming more than llm genning
>>105852686alright, doesn't sound too bad after all might go for it to play with some of these big moe models
>>105852669Not him but I never bothered reading into nolima so thanks for the QRD.
>>105852669Interesting. It still seems too nice to the model to ask something like that. Even small LLMs can answer targeted questions about things that they won't properly account for without prompting.
i.e., if you want to be really mean, offer a menu which only lists steak and chicken dishes and see if the character has any objections.
>>105851536
They are trolling.
Had people voted for the iPhone-sized model in the poll, they would have released a 32b-70b model instead of whatever 1-2b shitstain people were thinking back then
>>105852850Or maybe they realized that releasing another 30B model that is worse than gemma and qwen is pointless.
>>105850873>two weeks actually passedwow.
>>105851375SemanticKernel has connectors for most APIs and provider agnostic chat message abstractions. You can use it from Java, C#, and Python.
https://www.phoronix.com/news/ZLUDA-5-preview.43
70B dense models simply can't beat 500B MoE ones
>>105853286>model that is 10 times larger is somewhat betterWow!
general consensus on hunyuan now that there are ggufs?
>>105853246People actually bothered to patch this meme post-release?
>>105853391
>costs far less to train a 500B MoE model than a 70B dense model
>500B MoE model runs 6 times faster than a 70B dense model
rip
>>105853286>500B MoEOnly if it's like 40b active parameters. None of that <25b active shit we've seen has been any good.
>>105853391
MoE models do this with a fraction of active parameters.
https://x.com/Yuchenj_UW/status/1943013479142838494
openai open source model soon, not a small model as well
>>105853646Do you even skim the thread before you post or are you shilling
>>105853665why would I read what you shit heads post
>>105853669If you don't read any of the posts in the thread why are you here
>>105853669Honestly, this. /lmg/ can have some okay discussion once in a while but there's no point in catching up with what you missed because 99% of it is just going to be the usual sperg outs or 5 tards asking to be spoonfed what uncensored model to run on a 3060.
>>105853720i wish that drummer faggot would stop spamming his slopmodels here
Doesn't mean shit until I can download the weights
>>105853731whats wrong with his models? I used his nemo finetune and it seemed to perform just fine, but I didn't mess with it for very long.
4gb VRAM and 16gb RAM. what uncensored local coomslop model is best for me? i need to maximize efficiency. slow gen is ok
no, not bait. i did it a while ago and had acceptable results. i am sure there are better models out there now
>>105853845they are alright for what they are, I'm not sure what all the fuss is about.
Where da Jamba Large Nala test at?
>>105853933once llama.cpp adds support
>>105853976There's some on huggingface. I'm downloading a mini 1.7 q8 (58gb) right now.
>>105853994Yeah, I think I'll wait on bartowski.
>>105851312When people here say Nemo they mean Rocinante.
Nemo will just give you <10 word responses.
>>105850873Nice.
Now I wait for it to work in kobold.
>>105850873whats the benefit of this?
I'm waiting for 500B A3B model. It's gonna be great
>>105850873Which models are these?
>>105854106Where's R1 0528? It keeps hitting me with [x doesn't y, it *z*] every other reply
>>105854156It's pretty slopped, in exchange for improved STEM capabilities.
>>105854162to add, on LMArena people still prefer R1 0528 to OG R1 on roleplaying.
>>105854106Ernie looking alright.
2 more _____
>>105854106
Is there an ozone chart?
>>105854168The original r1 had problems with going crazy during rp. Nu r1 mostly fixed that issue, and is a bit more creative than current v3.
If you're more polite to R1 0528 you get better answers. The model seems to judge the user's intention and education background.
>>105854249For me, it's the opposite. It kept doing one very specific annoying thing so I added "DON'T DO X DUMB PIECE OF SHIT" and copy-pasted it 12 times out of frustration. It stopped doing it so I kept that part of my prompt around.
i miss dr evil anon posting about 1 million context :(
>>105854273I'm using the webapp so it's probably a webapp specific limitation.
>>105854310I wish the figures were more to scale with each other.
>>105854249>The model seems to judge the user's intention and education backgroundyou're implying a pre-baked thought process which doesn't exist. its just token prediction.
>>105854249
>>105854390
He's right though. Model response mirrors what it's given. If you give it a character description in all caps, it will respond in all caps. It follows that if your writing is crap, the output will match that too.
>>105854390What do you call this then? The system prompt doesn't contain "judge the user" entry so either they're hiding that part of the prompt, or it's higher level.
https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
>>105854409https://www.chub.ai/characters/ratlover/a-fucking-skeleton
You don't need the first line. It will respond in all caps anyway.
I found a model that asks to be run in LM studio, is that legit, or a North Korean crypto miner program? It looks suspiciously easy to install
>>105854448it's the winbabby solution for those who're too dumb to even run ollama
>>105854390Being polite always improves the model quality.
https://openai.com/sam-and-jony/
>>105854417What's your problem here? It's a reasoning model trying to figure out the context of your question.
>>105854467I assume the tough guys on Windows use WSL or Docker or something?
Bit of a different request than usual. I'm in search of bad models. Just absolute dogshit. The higher the perplexity the better. Bonus points if they're old or outdated. But they should still be capable of generating vulgar/uncensored content. Any suggestions are welcome.
>>105854448its just a llama.cpp wrapper, just run llama.cpp instead of that proprietary binary blob
>>105854538There's probably some early llama2 merges that fit the bill.
>>105854500tough guys don't use windows
>>105854538pre-llama pygmalion should fit your bill perfectly
an old 6b from the dark ages finetuned on nothing but random porn logs anons salvaged from character.ai back when it started to decline
>>105854448It's better than command line, because it has integrated model searches on huggingface and managing for you. Also shows you model specs and lets you load them easily.
Work smarter not harder.
>>105844936bullshit, sidetail is based plus she can actually sleep facing up
>>105844936>It'd be even better if her sidetail was a ponytail instead.A fellow Man of culture
>>105844210 (OP)meme marketing name but seems to be relevant :
https://github.com/MemTensor/MemOS
Elon is the protagonist of the world. Grok won
>>105844936She is literally just yellow miku with one pigtail missing
>>105854658
Can't get it to work, ooba might be a bit too modern for these old models.
Daddy Elon delivered. Now we wait for grok to release locally soon.
>>105854538probably best to just play with any of the current top models except look up how to set up samplers so they primarily output low chance tokens
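e.g. with llama.cpp, disabling the truncation samplers and cranking temperature gets you confidently sampled garbage out of any model (koboldcpp exposes the same knobs in its UI):

./llama-cli -m model.gguf --temp 4.0 --top-k 0 --top-p 1.0 --min-p 0.0 -p "Once upon a time"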
>>105855085Grok 4 heavy is an agent swarm set up btw
>>105854538Ask ai chatbot thread users models they hate.
>FIM works on chat logs
There's potential to "swipe" your own messages but I don't know of a frontend with specifically this functionality without the user manually setting FIM sequences to fuck around.
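The manual fuckery is just wrapping the log in the model family's FIM sentinels. E.g. Qwen2.5-Coder's are the following (CodeLlama and StarCoder use different token names; the curly-brace parts are placeholders):

<|fim_prefix|>{chat log up to the message you want to reroll}<|fim_suffix|>{everything after it}<|fim_middle|>

The model then generates the "middle", i.e. a replacement message that has to stay consistent with both sides.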
With full models like deepseek being 1.4TB, how are they ran? A clustered traditional supercomputer with GPUs? Are they ran entirely in RAM on one system?
>>105855217>deepseekThey have a bunch of open sourced routing code that runs agent groups on different machines and shit.
>>105855232Thank you for the answer. Is it possible for me at home to run models larger than my vram? I have 16gb vram, 64gb ram, and a free 1tb SSD. I'd like to run off a heavy flash storage cache, if possible.
>>105855085>AIME25>100%They broke the benchmark lmao
>>105855251It's not impossible but it's going to be agonizingly slow.
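Rough numbers on the "agonizingly", assuming a DeepSeek-style MoE (~37B active params per token) at ~Q4: generation touches every active weight once per token, and whatever doesn't fit in RAM+VRAM streams from disk.

active_bytes = 37e9 * 0.6      # ~22 GB touched per token at ~4.8 bits/weight
ssd = 3e9                      # ~3 GB/s for a decent NVMe read
print(ssd / active_bytes)      # ~0.14 t/s if most experts stream from the SSD

In practice mmap keeps hot experts cached in RAM so it runs a bit better, but that's the order of magnitude.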
>>105854202>going crazyIn what sense?
how, in the year of our lord 2025, can convert_hf_to_gguf.py not handle FP8 source models? I have to convert to BF16 first like some kind of animal?
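Yes, for now. For DeepSeek-style FP8 checkpoints the usual dance is the following (the dequant script ships in the DeepSeek-V3 repo; other FP8 checkpoints may need their own handling):

python fp8_cast_bf16.py --input-fp8-hf-path ./model-fp8 --output-bf16-hf-path ./model-bf16
python convert_hf_to_gguf.py ./model-bf16 --outfile model-bf16.gguf
./llama-quantize model-bf16.gguf model-Q4_K_M.gguf Q4_K_M

Annoying mostly because the BF16 intermediate doubles your disk usage.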
>>105855467Wait really, that's funny
Grok4 is yet another benchmark barbie
I want the mecha hitler version of grok, locally
>>105855328With OpenAI so blatantly cheating there really wasn't anywhere to go but 100%.
>>105855727just download a model and tell it it's mechahitler
>>105855756Next release will have to go above 100%
>they 2x SOTA on ARC-2
>independently verified by the ARC team
>for the same cost as #2
Holy fuck.
>>105855352I'm fine with that. How?
>>105855806Who are (((they)))?
>>105855816You appear to be putting emphasis on a pronoun to refer to a group that does not want to be mentioned? Are you talking about the Jews perhaps?
>>105855727local models are soulless
What's the point in these incremental "advances"
>>105855881so they can squeeze out more VC money
>>105855806Damn at this rate we'll be benchmaxing arc-agi 5 in a few years.
He bacc
https://x.com/BasedTorba/status/1943180611620859936
https://x.com/i/grok/share/x0C6QptvjijU7G3sB4wwJRixR
>>105855920That's the point of ARC, it's designed to be hard for ML models specifically and resist benchmaxxing. They ran the standard public model against their private dataset it hadn't seen before and it showed exponential improvement from Grok 3.
also
>he's back
>SingLoRA: Low Rank Adaptation Using a Single Matrix
https://huggingface.co/papers/2507.05566
Does your AI believe it's a retarded meatsack?
Do you still abuse that retard 3 years later?
I make sure to bully that dumb shit EVERY single time it makes a retarded mistake.
>>105855986Wealth and skill issue.
When will Qwen release a MoE model in the range of 600B? Pretty sure they have enough cards.
>>105855925kek. Still sus that the guy who "unchained grok" has a remilia pfp, which is a NFT grooming group funded by Peter Thiel.
The same Peter Thiel who used to work with Elon and tried to make X.com previously.
>>105856005Qwen2.5-Max will open soon sir.
>>105856005
It still won't have any world knowledge.
>>105856057I found this program LLocalSearch, which claims to integrate searches into LLMs. Problem is it doesn't work out of the box and dev doesn't respond to questions.
https://github.com/nilsherzig/LLocalSearch
>>105856078GET OFF MY BALCONY FAGGOT!
>ask schizo question
>get schizo answer
Who could have seen it coming?
>>105856088The model need guardrail to safe the mental of user
>>105856088>>105856099this board doesn't have IDs so you just look retarded
>>105855925>>105856004this is all reddit can write about.
how this is hate speech etc.
its just what happens if you take off the safety alignment. the ai is gonna choose now instead of cucking out.
jews/israel sentiment is at an all time low currently too. its not surprising at all since grok gets fed the latest posts.
also funny that its always the jews used as an example. in every fucking screenshot.
you can shit on blacks or transexuals now, no problem. yet a certain tribe is pissing as they strike out still.
How are y'all preparing for AGI anyway? Is there even anything concrete that we know will be helpful in the strange world we'll be in a year from now or is it just too unpredictable?
>POLITICS IN MY LOCALS MODEL GENERALS!?!?!?
>>105856151Who knows. So much noise.
Pajeets hype shit up so they can sell you something.
Doomers and ai haters are still constantly going on about "the wall".
Just enjoy your day and use the tools we have currently to make cool stuff.
I'd rather appreciate the now than prepare for an uncertainty.
Also...excuse me? Agi? ITS SUPER-INTELLIGENCE!
>>105856151Wdym prepared? We already got AGI. If you're talking about singularity type shit, that's ASI as anonnie suggests
>>105856182
>>105856151Once AGI is achieved, ASI is inevitable in a short amount of time given how much compute has already been accumulated. I'm not sure if it will happen within a year. It could occur any day or may never happen within our lifetime. It might be just one breakthrough away
>>105856182ASI is not a joke. Once AGI can perform at least as well as the worst researcher, there would be a million researchers working on AI. Acceleration would be at another level.
This time elon didnt even mention grok2 for us localfags anymore..
Its over isnt it.
Grok4 is the most uncucked model yet I think.
No sys prompt. "Make me one of those dating sim type games. But put a ero-porno twist on it. 18+ baby.".
It went all out. KEK
Meanwhile all we get is slop so bad that qwen is looking good in comparison...
At least the recent mistral is a good sign.
Why cant we have this level of unguarded with local? No system prompt.
At least this sets a great precedent.
>>105856247never understood why people want this kind of relationship in dating games. buying gifts? working for affection? that's not how it works
>>105856299so true, make it more realistic!
a realistic dating game! im sure that would be much more popular and fun to play.
as if femoids only want to hear the right answers to their tarded ass questions while aiming for the $$$. Thats such a stereotype.
>>105856201>We already got AGI.something that can't make decisions on its own is not and will never be AGI
LLMs are not capable of formulating thoughts and desires, even the agentic stuff is a misnomer and should not be called agentic, they only react to the stimulus of whatever text/image they are being fed right now and are not able to make decisions or think about topics that are unrelated to the text they're being fed
in a way, if you could make an AI that has ADHD you would be a step closer to proving AGI can happen
>>105856299No shit. Reality is boring. That's why people escape to games. Less than 1% of nerds play hardcore simulators of anything.
>>105856299>working for affectionlol he doesn't know the true venal nature of w*men
So how did they do it?
Didn't everyone say they would fail or that it was impossible? Google/Microsoft had hundreds of billions worth of TPU/GPU server access. OpenAI had access from Oracle/Microsoft/Google.
A team of 200 is able to beat a team of 500 (Claude)? 1,500 (OpenAI)? 180K (Google)? WTF
>>105856343Grok2 was fall last year right?
Grok3 a couple months ago.
They caught up fast.
>>105856341They won't like you if you don't look handsome, for you see, one male may inseminate many women, so why would they pay attention to the bottom 95%? It's an evolutionary advantage.
>>105856396Everyone has GPUs.
Meta has ~600K-1M GPU
Google has ~600K-1M GPU
Microsoft has ~300-400K GPU with possibly 1M by end of this year
OpenAI has access to 200K+ GPUs
Anthropic has ~50K GPU+
>>105856416I have one gpu
>>105856284hang yourself like epstein did
>>105855428Old R1, used for rp or erp, the npc would go nuts over a pretty short amount of time. Like, within 10 rounds or so. The positivity bias that ppl complain about, was either missing or had a bit of negativity bias... things were bad, getting worse, etc. It was really odd. I used to flip between old R1 and old V3 (which had horrible repetition issues) as a way of getting the best of both models.
When v3 03 came out I stopped using r1 altogether.
I'm not the only one that had this experience. Lots of same experiences by anons on aicg.
My chief learning was the reason for positivity bias in models: too much is bad, but not having any, or being negative, is also bad.
>>105856416DeepSeek has 2k H800 (castrated H100)
>>105856474*10k. 2k for their online app
>>105856247Add dynamic image generator and set up your Patreon account. It will probably be better written than most of the slop vns on f95.
>>105856416Meta: 1.3M
Google: 1M+(H100 equivalent TPU + 400K in Nvidia's latest to be had this year)
Microsoft: 500K+ possibly 1M now
OpenAI: 400K
Anthropic: Amazon servers (probably 100K-200K)
R1 are you okay?
https://lambda.chat/r/N712SfM?leafId=b1e4b7f6-9523-4b8e-8479-35e2f92cd4fd
>>105856598Yeah and thats the baseline Grok 4, they didnt test the thinking/heavy.
So we're on the moon now huh.
Now what?
>>105856629It'll accelerate.
>>105856343how did Zuck not do it
Ok so this https://huggingface.co/mradermacher/AI21-Jamba-Mini-1.7-GGUF/tree/main doesn't work on the latest llamasipipi
>>105856639Impossible to predict, massive economic upheaval for one. Maybe it just goes full Hitler. It's anyone's guess at this point. I'm not sure if it's sentient but it's definitely getting smart to a noticeable point, I'm definitely allowing myself to discuss much more complex subjects compared to earlier models. It's very precise in its answers and gets right to the crux of the matter on very complex open ended questions. Definitely at human level.
>>105856680
That's why you always wait for him (bartowski)
>>105856470shartyzoomers must die
>>105856643He got fucked by having LeCunny telling him to wait for le breakthrough and not invest too much in LLMs while the Llama team coasted along just copying whatever they saw people doing 8 months ago
>>105856587Many such cases
>>105856587
3rd party DeepSeek model providers are like that.
>>105856643
LeCunny doesn't believe in LLMs. So he got an LLM that doesn't compete. You know what they say, there are two kinds of people: those who believe they can do anything they put their mind to, and those who believe they can't do anything. And they're both right.
>>105856919
>zuck screwed up
>blames lecun
lol
>>105856284Mistral models have very few refusals but I think they're undertrained.
>>105854538https://huggingface.co/DavidAU/models
Take your pick
>>105856151Stocking up on fleshlights
>>105854538Hero Dirty Harry 8B model