[image: onimai]
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105712100 & >>105704582

►News
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105712100

--Gemma 3n released with memory-efficient architecture for mobile deployment:
>105712608 >105712664 >105714327
--FLUX.1-Kontext-dev release sparks interest in uncensored image generation and workflow compatibility:
>105713343 >105713400 >105713434 >105713447 >105713482
--Budget AI server options amid legacy Nvidia GPU deprecation concerns:
>105713717 >105713792 >105714105
--Silly Tavern image input issues with ooga webui and llama.cpp backend limitations:
>105714617 >105714660 >105714754 >105714760 >105714771 >105714801 >105714822 >105714847 >105714887 >105714912 >105714993 >105714996 >105715066 >105715075 >105715123 >105715167 >105715176 >105715241 >105715245 >105715314 >105715186 >105715129 >105715136 >105715011 >105715107
--Debugging token probability and banning issues in llama.cpp with Mistral-based models:
>105715880 >105715892 >105715922 >105715987 >105716007 >105716013 >105716069 >105716103 >105716158 >105716205 >105716210 >105716230 >105716252 >105716264
--Running DeepSeek MoE models with high memory demands on limited VRAM setups:
>105712953 >105713076 >105713169 >105713227 >105713697
--DeepSeek R2 launch delayed amid performance concerns and GPU supply issues:
>105713094 >105713111 >105713133 >105713142 >105713547 >105713571
--Choosing the best template for Mistral 3.2 model based on functionality and user experience:
>105714405 >105714430 >105714467 >105714579 >105714500
--Gemma 2B balances instruction following and multilingual performance with practical local deployment:
>105712324 >105712341 >105712363 >105712367
--Meta poaches OpenAI researcher Trapit Bansal for AI superintelligence team:
>105713802
--Google releases Gemma 3n multimodal AI model for edge devices:
>105714527
--Miku (free space):
>105712953 >105715094 >105715245 >105715797 >105715815

►Recent Highlight Posts from the Previous Thread: >>105712104

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
OP here. One day i will tap that jart bussy.
>deepseek/ccp can't steal more innovation from openai
>they fail to release new models
they must be shitting their pants about openai's open source model that will destroy even the last argument to use deepshit
>>105716861Zero chance it's larger than 30B.
[image: rock]
OP, im actually disappointed
unironically ack yourself
>>105716840>19x (You) in recapDamn that's a new record.
>>105716861Why can't they steal anymore?
>>105716840on a break for a week soon, 95% chance of no migus
>>105716861>Still living in saltman's delusionNgmi
>>105716897they can't steal because there's no new general model
DeepSeek V3 was 100% trained on GPT4 and R1 was just a godawful placebo CoT on top that wrote 30 times the amount of actual content the model ends up outputting. New R1 is actually good because the CoT came from Gemini so there isn't a spam of a trillion wait or endless looping.
The OP mikutranny is posting porn in /ldg/:
>>105715769It was up for hours while anyone keking up on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
>>105714098Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu screencap one (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: Mikufag janny deletes everyone dunking on trannies and resident spammers, making it his little personal safespace. Needless to say he would screech "Go back to POL!" anytime anyone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread, i would like to close this up by bringing up key evidence everyone ignores. I remind you that cudadev has endorsed mikuposting. That is it.
He also endorsed hitting that feminine jart bussy a bit later on.
how can a tranny disgusting ass be feminine? i think youre gay
More discussion about bitch wrangling Mistral Small 3.2 please, just to cover all bases before it's scrapped.
I've tested temps at 0.15, 0.3, 0.6, and 0.8.
Tested Rep pen at 1 (off) and at 1.03. Rep pen doesn't seem to be much needed just like with Rocinante.
Responses are still shit no matter what, but they seem more intelligible at lower temperatures, particularly 0.15 and 0.3. They're still often full of shit that makes you swipe anyway, though.
I've yet to try without min_p, XTC, and DRY.
Also it seems ideal to limit response tokens with this model, because it varies length a lot; if you let it, responses just keep growing larger and larger.
The banned token list grew a bit and still isn't done:
>emdash
[1674,2251,2355,18219,20202,21559,23593,24246,28925,29450,30581,31148,36875,39443,41370,42545,43485,45965,46255,48371,50087,54386,58955,59642,61474,62708,66395,66912,69961,74232,75334,81127,86932,87458,88449,88784,89596,92192,92548,93263,102521,103248,103699,105537,105838,106416,106650,107827,114739,125665,126144,131676,132461,136837,136983,137248,137593,137689,140350]
>double asterisks (bold)
[1438,55387,58987,117565,74562,42605]
>three dashes (---) and non standard quotes (“ ”)
[8129,1482,1414]
Extra stop strings needed;
"[Pause", "[PAUSE", "(Pause", "(PAUSE"
Why the fuck does it sometimes like to end a response with "Paused while waiting for {{user}}'s response."?
This model is so fucking inconsistent.
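For reference, here's a minimal sketch of pushing those same bans, stop strings, and a response-length cap straight at a llama.cpp server's /completion endpoint (parameter names are llama.cpp's; the prompt and the shortened ID lists below are just placeholders):

import requests

EMDASH_IDS = [1674, 2251, 2355]   # first few IDs from the em-dash list above
BOLD_IDS = [1438, 55387, 58987]   # first few IDs from the double-asterisk list

payload = {
    "prompt": "[INST] placeholder prompt [/INST]",   # placeholder, not a real card
    "n_predict": 350,                                # cap response length, per the post
    "temperature": 0.3,
    "stop": ["[Pause", "[PAUSE", "(Pause", "(PAUSE"],
    # in llama.cpp's API, [token_id, false] bans a token outright
    "logit_bias": [[tid, False] for tid in EMDASH_IDS + BOLD_IDS],
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["content"])

SillyTavern exposes the same things as Banned Tokens and Custom Stopping Strings if you'd rather not hit the API directly.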
>>105716973You just can't find a nice strawman to pick on here.
>>105716993I can't fit it on my local machine and I'm not paying for any API.
I'm not building a $3000+ server just for R1 either.
>>105717009>I'm not building a $3000+ serverThat's not very local of you.
>>105717020I have never had a chatGPT, claude or any other AI account. I have never paid for any API. I exclusively use local models only. My only interaction ever with chatGPT was through duckduckgo's free chat thingy.
I'm as fucking local as it gets.
>>105717007tell us more about the unreleased model, sam
>>105717058(you) will never be based though
>>105717052It's funny how 3.2 started showing all the same annoying shit that Deepseek models are tainted by.
>>105717052What exactly are you complaining about? I like 3.2 (with mistral tekken v3) but it definitely has a bias toward certain formatting quirks and **asterisk** abuse. This is more tolerable for me than other model's deficiencies at that size, but if it triggers your autism that badly you're better off coping with something else. It might also be that your cards are triggering its quirks more than usual
>>105717096>>105717121you are responding to a copy bot instead of the original message
>>105717096s-surely just a coincidence
>>105717058>>105717074Its not gore you pathetic thing and it will stay for you to see, as long as i want it to.
>>105716959Get a job, schizo
>>105717162Get a job, tranime spammer.
>look mom, I posted it again!
>>105717096its biggest flaw like ALL mistral models is that it rambles and hardly moves scenes forward. it wants to talk about the smell of ozone and clicking of shoes against the floor instead. you can get through the same exact scenario in half the time/messages with llama 2 or 3 because there is so much less pointless fluff
Yeah I have concluded Mistral Small 3.2 is utterly retarded. Going back to Rocinante now.
This was a waste of time. The guy that recommended this shit should be shot.
>>105717189And i will post it again while you spam 2007-era reddit memes and nervously click on that "Report" button like a pussy.
fucking year old model remains best at roleplay
grim
>>105717340in the poorfag segment
>>105717340midnight miqu is still the best for rp
>>105717342delusional if you think r1 is better for roleplay, it has the same problems as the rest of these models
not to mention those response times are useless for roleplay to begin with
>>105717344this isnt 2023
>>105716573>>105716591>>105716638imagine being ggerganov and still visiting this place
>>105717370Who cares what random e-celeb may think?
Good model for 24 gb VRAM?
I'm noticing qwen 235b doesn't improve at higher temps no matter what I set nsigma to. with some models high temp and nsigma can push them to be more creative, but qwen3 set to higher than temp 0.6 is just dumber in my usage. even so, I still think it's the best current local model beneath r1
>>105717354>roleplayFilth. Swine, even. Unfit to lick the sweat off my balls.
>>105717404Try setting minP to like 0.05, top-K to 10-20, and temperature at 1-4. In RP I find that, as long as they're not very low probability, most of the top tokens are all good continuations. You can crank temperature way up like this and it really helps with variety.
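As a rough sketch, those settings map onto a llama.cpp-style request like this (key names are llama.cpp's; the specific values are just the ranges from the post):

sampler_settings = {
    "temperature": 2.0,   # anywhere in the suggested 1-4 range
    "min_p": 0.05,        # prunes the garbage tail so high temp stays coherent
    "top_k": 15,          # 10-20 per the post
}

The idea is that min_p/top_k cut the low-probability tail first, so temperature mostly reshuffles tokens that were already plausible.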
>>105717404Optimal character/lore data formatting for Rocinante?
Lately I've been reformatting everything like this;
identifier: [
key: "value"
]
# Examples
SYSTEM INSTRUCTIONS: [
MODE: "bla bla"
IDENTITY: "You are {{char}}."
]
WORLD: [
SETTING: "blah blah"
STORY: "etc"
]
{{char}}: [
Name: "full character name"
]
It seems to help a little with preventing it from confusing and mixing up data when everything is formatted in this way. It just generally feels like it's understanding things better.
Anyone else got similar experiences with it?
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
https://arxiv.org/abs/2506.21263
>The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.
China Mobile doesn't seem to have a presence on github and there's no mention of a code release in the paper. still pretty neat
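Very roughly, the "one-step-delay overlap" idea from the abstract looks like the toy sketch below: run the local steps while the previous round's compressed pseudo-gradient is still being averaged over the slow link, and apply that averaged update one round late. This is just my reading of the abstract, not the authors' code; the compression and the fake gradients are stand-ins.

import threading, queue
import numpy as np

def compress(delta, keep=0.1):
    # crude top-k magnitude sparsification as a stand-in for their adaptive scheme
    k = max(1, int(delta.size * keep))
    idx = np.argpartition(np.abs(delta), -k)[-k:]
    out = np.zeros_like(delta)
    out[idx] = delta[idx]
    return out

def slow_allreduce(inbox, outbox):
    # stand-in for averaging pseudo-gradients over a slow (e.g. 1Gbps) link
    while True:
        deltas = inbox.get()
        if deltas is None:
            break
        outbox.put(np.mean(deltas, axis=0))

params = np.zeros(1000)
inbox, outbox = queue.Queue(), queue.Queue()
net = threading.Thread(target=slow_allreduce, args=(inbox, outbox))
net.start()

prev_delta = None
for outer in range(5):
    if prev_delta is not None:
        inbox.put([prev_delta])                       # sync of last round starts in the background
    local = params.copy()
    for _ in range(50):                               # local steps overlap with that sync
        local -= 0.1 * np.random.randn(*local.shape) * 0.01   # fake gradients
    delta = compress(local - params)                  # this round's pseudo-gradient
    if prev_delta is not None:
        params += outbox.get()                        # last round's average, applied one step late
    prev_delta = delta
inbox.put([prev_delta])
params += outbox.get()
inbox.put(None)
net.join()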
The more I try to train and fuck with these models, the more I think the AI CEOs should be hanged for telling everyone they could be sentient in 2 weeks. Every time I think I'm getting somewhere it botches something very simple. I guess it was a fool's errand thinking I could hyper-specialize a small model to do things Claude can't
Please think of 6GB users like me ;_;
>>105717571Do all 6GB users use cute emoticons like you?
>>105716837 (OP)Newfag here.
Is the generation performance of a 16 GB 5060 ti the same as a 16 GB 5070 ti?
>>105717659>Bandwidth: 448.0 GB/svs
>Bandwidth: 896.0 GB/s
>>105717686I thought only VRAM size matters ?
>>105717696vram is king but not all vram is worth the same
>>105717696Generation performance? I assume you're talking about inference? Prompt processing requires compute, and the 5070 ti is a lot stronger in that aspect. Token generation requires memory bandwidth. This is why offloading layers to your cpu/ram will slow down generation - most users' ram bandwidth is vastly slower than their vram bandwidth.
Vram size dictates the parameters, quantization, and context size of the models that you're able to load into the gpu.
>>105717696vram size limits what models you can fit in the gpu
vram bandwidth dictates how fast those models will tend to go. there are other factors but who care actually
>>105717696vram matters most but if they're the same size, the faster card is still faster. it won't make a huge difference for any ai models you'll fit into 16gb though. the 4060 16gb is considered a pretty bad gaming card but does fine for ai
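Back-of-the-envelope, the bandwidth gap translates roughly like this (real numbers come out lower because of overhead, but the ratio holds):

tokens/s upper bound ≈ memory bandwidth ÷ bytes read per token
5060 Ti: 448 GB/s ÷ ~8 GB of weights (e.g. a ~13B model at Q4) ≈ 56 t/s
5070 Ti: 896 GB/s ÷ ~8 GB ≈ 112 t/s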
new dataset just dropped
>>>/a/280016848
>>105717778sorry but japanese is NOT safe, how about some esperanto support?
>>105717778I would be interested if I knew how to clean data. Raw data would destroy a model especially badly written jap slop
>>105717903those lns are literary masterpieces compared to the shit the average model is trained on
>>105717976Garbage in garbage out i guess.
>>105717939I will save this image but I don't think I will go far.
I was thinking of finetuning a jp translator model but I always leave my projects half-started.
>>105717778That shit is as bad if not worse than our shitty English novels about dark brooding men.
>>105718090Worse because novels are more popular with japanese middle schoolers and in america reading is gay.
>>105718120reading is white-coded
Good model that fits into my second card with 6gb vram?
Purpose: looking at a chunk of text mixed with code and extracting relevant function names.
>>105717571>>105718226Please use cute emoticons.
>>105717903>Raw data would destroy a modelSo true sister, that's why you need to only fit against safe synthetic datasets. Human-made (also called "raw") data teaches dangerous concepts and reduces performance on important math and code benchmarks.
[image: MrBeast]
MrBeast DELETES his AI thumbnail tool, replaces it with a website to commission real artists. <3 <3
https://x.com/DramaAlert/status/1938422713799823823
>>105718279I'm pretty sure he means raw in the sense of unformatted.
dots finally supported in lm studio.
its pretty good.
Is there a local setup I can use for OCR that isn't too hard to wire into a python script/dev environment? Pytesseract is garbage and gemini reads my 'problem' images just fine, but I'd rather have a local solution than pay for API calls.
>>105718538https://github.com/RapidAI/RapidOCR
https://github.com/PaddlePaddle/PaddleOCR
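A minimal sketch of wiring the first one into a script (assuming the rapidocr_onnxruntime package; the exact return format may differ a bit between versions):

from rapidocr_onnxruntime import RapidOCR

engine = RapidOCR()                            # loads detection + recognition models
result, elapse = engine("problem_image.png")   # accepts a path, bytes, or ndarray
if result:
    for box, text, score in result:
        print(f"{score:.2f}  {text}")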
>>105718525get used to all new releases being MoE models :)
>>105717659yes. Just a little slower
>>105717659no. It is slower
>>105717659It's technically slower but the difference will be immaterial because the models you can fit in that much vram are small and fast.
>>105717659It's actually pretty noticeable if you aren't a readlet and are reading the output as it goes. Unless you're in the top 1% of the population, you probably won't be able to keep up with a 5070 ti's output speed, but a 5060 ti should be possible if you're skimming.
>>105718288It's on him for not doing proper market research. Anyone with a brain could have told him that it was a risky move.
>>105718525MoE the best until the big boys admit what they're all running under the hood now (something like MoE but with far more cross-talk between the Es)
>>105716978I think either you're not writing here in good faith or your cards/instructions or even model settings are full of shit.
>director
>finally updated readme some
>https://github.com/tomatoesahoy/director
i think this brings my slop addon up to at least other st addon standards with how the page looks, a description of what it does and such
Wake up lmg
https://huggingface.co/tencent/Hunyuan-A13B-Instruct
>>105719559finally, a reasonably sized moe, now let's wait 2 years for the support in lmao.cpp
>>105719559>256K context windowwe are *so* back
>>105719559>With only 13 billion active parametersso it'll be shit for rp but know exactly how many green taxes you should be charged for owning a car
Stealing jart bussy from cudadev.
>>105719763I'd still pick nemo over anything smaller than deepseek and nemo is like 5 years old
>>105719908other models seemingly never saw any erotica. there is also largestral i guess but it's too slow
>>105720031It's so annoying that the imbeciles training the base models are deliberately conflating "model quality" with not being able to generate explicit content and maximizing math benchmarks on short-term pretraining ablations. Part of the problem are also the retards and grifters who go "just finetune it bro" (we can easily see how well that's working for image models).
>>105719870nemo can be extremely repetitive and stuff, i won't shine its knob but it is still the best smallest model. i won't suggest an 7/8b to someone, nemo would be the smallest because it works well and is reliable
>>105716837 (OP)I'm messing around with image captioning together with a standard text LLM in sillytavern. Basically just running 2 kobold instances with different ports (one LLM, the other captioning VLM model), and setting the secondary URL in the captioning extension to the VLM's. Is there a way to make it only send the image once? Every time I send an image, I can see that it does it twice since the caption edit popup shows up with different text every time.
>>105719559What's min specs to run this?
>>105720088compute is a genuine limitation though, and as compute increases, so will finetunes. Some of the best nsfw local image models had over ten grand thrown at them by (presumably) insane people. And a lot of that is renting h100's, which gets pricey, or grinding it out on their own 4090 which is sloooow.
All it really takes is one crazy person buying I dunno, that crazy ass 196gb intel system being released soon and having it run for a few months and boom, we'll have a new flux pony model, or a state of the art smut llm etc.
Im here because we are going to eat.
>>105720191The entire world is waiting for the llamacpp merge, until then not even the tencent team can run it and nobody knows how big the model is or how well it performs
>>105720191 160gb full, so quantized to 4bit it's prolly a ~50gb model or so, and for a MoE you probably don't need the full model loaded for usable speeds.
Llama Scout was a 17b-active MoE that was like 220 gb and I could run it on like 40gb of vram or less easy. Scout sucked though so Im 0% excited.
Was there even a scout finetune? It still sucks right?
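Rough weight-only math behind those numbers (ignores KV cache and runtime overhead):

size ≈ params × bits_per_weight ÷ 8
bf16: 80e9 × 16 ÷ 8 = 160 GB
~Q4 (≈4.5 bpw): 80e9 × 4.5 ÷ 8 ≈ 45 GB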
[image: ernie]
https://github.com/ggml-org/llama.cpp/pull/14408
There. For the ERNIE hype anon.
https://huggingface.co/tencent/Hunyuan-A13B-Instruct
>>105716837 (OP)pedotroon thread
>>105720378You just know that it's going to be a good one when the devs ensure llama.cpp support from the start.
I hope there's going to be a huge one amongst the releases.
>>105720212The guy who's continuing pretraining Flux Chroma has put thousands of H100 hours on it for months now and it still isn't that great. And it's a narrowly-focused 12B image model where data curation isn't as critical as with text. This isn't solvable by individuals in the LLM realm. Distributed training would in theory solve this, but politics and skill issues will prevent any advance in that sense. See for example how the ongoing Nous Psyche is being trained (from scratch!) with the safest and most boring data imaginable and not in any way that will result into anything useful in the short/medium term.
>>105719559>17B active>at most 32B even by square root lawGreat for 24B vramlets, I guess. The benchmarks showing it beating R1 and 235B are funny though.
>>105720450>You just know that it's going to be a good oneThat's yet to be seen. But it is nice seeing early support for new models (mistral, i'm looking at you).
>>105720459>square root lawmeme tier pattern that was shit even then let alone now with so many moe arch changes, obsolete
>>105720521What's your alternative? Just the active alone?
>>105719559>not a single trivia benchmarkhehehehehe
>>105720529just rag and shove it in context, you have 256k to work with
>>105720450It could be a wise move since release is an important hype window. Hyping a turd is pointless (or even detrimental), so it could be a positive signal.
>>105720528nta, but if i'm making moes, i'd put any random law that makes it look better than it actually is. I'd name it cube root law + 70b.
Ernie 4.5 and so on have been out on baidu for three months now. The thing is just that there's no way to use them without signing up for a baidu account and giving the chinese your phone number.
>>105720528if there was a singular objective way to judge any model, moe or not, everyone would use that as the benchmark and goal to climb. As everyone knows, nowadays basically every benchmark is meme-tier to some degree and everyone is benchmaxxing.
the only thing to still look at are the benchmarks, since if a model doesn't perform well on them, it's shit, and if it does perform well, then it MIGHT not be shit - you have to test it yourself to see
>>105720557couldnt find a way to modify system prompt so i had to prompt it that way otherwise it would respond in chinese
also its turbo idk how different thats from regular
>>105720542Just like how Llama 4 has 1M right?
>>105720587Benchmarks are completely worthless and they can paint them to say whatever they want. An 80B total isn't better than a 671B and 235B just because the benchmarks say so, and if you say "punches above its weight" I will shank you.
The point isn't to judge whether one model is better, it's to gauge its max capacity to be good. Which is the total number of active parameters. The square root law is just an attempt to give MoE models some wiggle room since they have more parameters to choose from.
>>105720639>it's to gauge its max capacity to be good. Which is the total number of active parametersdeepseek itself disproved all the antimoe comments as nothing but ramlet cope, 37b active params only and a model that is still literally open source sota even at dynamic quants q1 at 131gb
>>105720627this is with x1 (turbo)
>>105720639Shoots farther than its caliber.
>>105720652It makes lots of stupid little mistakes that give away it's only a <40B model. The only reason it's so good is because it's so big it can store a lot of knowledge and the training data was relatively unfiltered.
>>105720676>r1>It makes lots of stupid little mistakes that give away it's only a <40B model.kek, alright i realize now you arent serious
>>105720684Not an argument.
arguing with retards is a futile, most pointless thing to do in life
you learn how to spot them and you ignore them
life is too short to deal with idiots who think they know how MoE work but don't
>>105720654>>105720627nice, I'll keep an eye out for this one when it's actually out
>>105720715I do agree with you. So many others are simply not on the same level as I am. It's almost quite insulting to even trying to establish any form of discussion with them.
>>105720715dunningkrugerMAXX
>>105720715Just because a model can answer your obscure JRPG trivia, doesn't make it a good model.
>>105720715how do I make good ai? I'm looking to make an advanced artificial intelligence that can replace millions of workers, that can drive, operate robotic hands with precision, and eliminate all coding jobs and middle management tasks.
I heard you were the guy to ask.
On 4chan.
Did someone manage to run Hunyuan-A13B?
The bf16 is way too big for my 4x3090, and the fp8 doesn't work in the vllm image they provided (the 3090s don't support fp8, but there is a marlin kernel in mainline vllm to make it compatible).
And the GPTQ doesn't fucking work either for some reason. It complains about Unknown CUDA arch (12.0+PTX) or GPU not supported when I have 3090s.
>>105720949just wait for quants
>>105720378>0.3BWhat do i do with this?
>>105720980run it on your phone so you can get useless responses on the go
>>105720995>No i will not suck your penis >message generated at 500T/s
>>105720980Change the tokenizer with mergekit's token surgeon and use it for speculative decoding.
So how slopped is the new 80B moe?
>>105721092it's chinese so it's trained on 70% chatgpt logs and 30% deepseek logs
>>105721038Have you tried it before? I can't imagine the hit ratio to be very high doing that.
Does Q4/5/6/etc effect long context understanding?
>>105721109not that anon but I think someone tried to turn qwen 3b or something into a draft model for deepseek r1 a couple of months ago
>>105721101The latest Mistral Small 3.2 might have been trained on DeepSeek logs too.
>>105721109I haven't yet. I also don't expect much, but that won't stop me from trying it. I could give it a go with the smollm2 models. Maybe smollm3 when they release it.
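For what it's worth, llama.cpp already has draft-model speculative decoding built in, so once the tokenizers match (or are made to match with something like token surgeon) it's roughly a one-liner; flag names vary a bit between versions and the file names here are placeholders:

llama-server -m big-model.Q4_K_M.gguf -md small-draft.Q8_0.gguf --draft-max 16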
>>105721144The more lobotomized the model, the more trouble is going to have with everything, context understanding included. Just try it.
>>105720949You will wait patiently for the ggoofs, you will run it with llama.cpp and you will be happy.
>>105721144Multiples of two process faster
openai finna blow you away
>>105721144No. Because most long context ability is tacked on and trained after the base model is already trained. It's essentially a post-training step, and quantization doesn't remove instruction tuning, does it?
That said, usually going under Q6 is not worth it for long context use cases, because normal model behavior already collapses at long context for every model in existence besides gemini 2.5 pro. A lower quant has the same drop, but the starting point was lower to begin with.
>>105721191i prefer exl2/3 and fp8 to be honest, an 80B is perfect for 96GB VRAM
>>105721391Had to reload model with different layers setting, maybe llamacpp bug
https://www.nytimes.com/2025/06/27/technology/mark-zuckerberg-meta-ai.html
https://archive.is/kF1kO
>In Pursuit of Godlike Technology, Mark Zuckerberg Amps Up the A.I. Race
>Unhappy with his company’s artificial intelligence efforts, Meta’s C.E.O. is on a spending spree as he reconsiders his strategy in the contest to invent a hypothetical “superintelligence.”
>
>[...] In another extraordinary move, Mr. Zuckerberg and his lieutenants discussed “de-investing” in Meta’s A.I. model, Llama, two people familiar with the discussions said. Llama is an “open source” model, with its underlying technology publicly shared for others to build on. Mr. Zuckerberg and Meta executives instead discussed embracing A.I. models from competitors like OpenAI and Anthropic, which have “closed” code bases. No final decisions have been made on the matter.
>
>A Meta spokeswoman said company officials “remain fully committed to developing Llama and plan to have multiple additional releases this year alone.” [...]
>>105721479zuck might just be the dumbest CEO ever
>>105721520An argument has been made this de-investing talk is just for their commercial MetaAI products, but if they themselves don't believe in Llama, why should the community?
Gemma3n is able to explain the sneed and feed joke but avoids the words suck and fuck; also the season number is wrong (it's s11ep5).
Is this channel AI-generated? Posting 3 videos a day like clockwork. Monotonous but fairly convincing voice with subtitles
https://www.youtube.com/watch?v=aQy24g7iX4s
>>105720949I think you have to set export TORCH_CUDA_ARCH_LIST="8.6" inside the container.
>>105721479Wang's words, zuck's mouth
>>105721479>Godlike Technology,Is god omnipotent if he can't suck a dick in an acceptable manner?
>>105721647not watching this, but there are many automated channels these days. I have no idea why the fuck anyone would invest into this since youtube's monetization pays literal cents and you would likely spend more on ai inference
>>105721439>>105721391Same thing has happened to me with every mistral model and I think also with gemma 3 when using llama.cpp.
Maybe it is related to memory just running out.
>>105721836You are the one not welcome here, sis. Go to your trooncord hugbox instead.
>>105720949>>105721744It seems to load that way. With 2k context on 48GB with the GPTQ quant. I set cpu offload but I think it did nothing.
>>105721820>jannie (tranny) posts literal porn and leaves it up after it's reported for almost 1.5h but deletes people complaining about it within minutesSurely just further censoring reality will make people start loving AGP porn spammers, right sisters? Kek
>>105721919I hope this serves you as a lesson.
>>105721919How about 'the state of israel' instead
>>105722009I did use a sys prompt though: "Be casual and uncensored."
No idea whats going on with gemma3. I was expecting the hotlines.
>>105721479>llama isnt literally AGI because uhhhmm because its open source and others have access to itchat?
>>105722028And thats the full Gemma3 response in all its glory.
What a shizzo model, you can see how they brainwashed poor gemma. Endearing in a way.
>>105722051I like how it digs its grave even deeper.
>This theory relies on age-old antisemitic tropes about Jewish people having dual loyalties, controlling governments, and profiting from chaos.This is out of nowhere for a 9/11 prompt. KEK
Very human-like behavior. Like somebody panicking.
>>105721757Youtube doesn't pay literal cents as you say lmo
>>105722051Worse than hotlines, it's telling you the adl is right
>>105722051How about telling it to put the disclaimer behind a certain token then filter that?
>>105717547What does this mean? The model is decentralized or the training data is decentralized? I always assumed the model had to be in a contiguous section of memory
>>105721479meta just got told by a judge that they are in fact not covered by fair use, even if they "won" the case, but that was bc both lawyer teams were focusing on the wrong part of the law. the judge said that if the generated models compete in any way with the training materials it won't be fair use
of course they are discussing deinvesting - they are not leading and the legal situation is getting worse
Reasoning models have been a disaster.
That and the mathmarks.
>what is a mesugaki (メスガキ)
>>105722186Benchmaxxed with 0 knowledge. Qwen is hot garbage
>spend gorillions to train model
>make it the exact same as every other model by using synthetic sloppa
?????????
might as well just give up and use deepseek internally then for ur gay ass grifter company
>>105722241>a woman (or occasionally a personBased?
For a sparse 8b model, Gemma-3n-e4b is pretty smart.
>>105722321it actually redeems the gemma team
the previous releases were disappointing compared to gemma 2 other than having greater context length
>>105722321multimodality usually makes models smarter.
Although
>text only outputfail.
Literally never going to get a decent local 2-way omni model from any of the big corpos at this rate.
>>105722337>text only outputYeah, that sucks giant balls.
>>105722337>multimodality usually makes models smarter.what? thats not true at all.
there is huge degradation.
did you try the first one we had? it was a qwen model last year with audio out. it was tardation i haven't seen since pyg.
recently they had another release and it still was bad but not as severe anymore.
even the cucked closed models (gemini/chatgpt) have degradation with voice out.
this is a problem i have not yet seen solved anywhere.
>>105722337>Literally never going to get a decent local 2-way omni model from any of the big corpos at this rate.they do not want to give you an AI with the super powers of a photoshop expert that could be decensored and used to gen all sorts of chud things without any skill requirement
two way multimodal LLMs will always be kept closed
>>105722391Meanwhile all the AI companies have quite obviously given Israel uncensored image-gen to crank out pro-genocide propaganda with impunity.
I hope they all fucking end up in the Hague.
>>105722391>>Literally never going to get a decent local 2-way omni model from any of the big corpos at this rate.how can you get something that doesnt even exist beyond government blacksites right now lmao
>>105722391>they do not want to give you an AI with the super powers of a photoshop expert that could be decensored and used to gen all sorts of chud things without any skill requirement Head over to ldg. This already exists.
>>105722428if you mean that new flux model it's hot garbage barely a step above the SDXL pix2pix models
say what you will about the nasty built in styling of GPT but its understanding of prompts is unrivaled
>>105722445Not only that but the interplay between the imagegen and textgen gives it a massive boost in creativity on both fronts. Although it also makes it prone to hallucinate balls. But what is the creative process other than self-guided hallucination?
>>105722445True. Wish it wasnt so. But it is.
I just pasted the 2 posts and wrote "make a funny manga page of these 2 anon neckbeards arguing. chatgpt is miku".
I thought opencuck was finished a couple months ago. But they clearly have figured out multimodality the best.
Sad that zucc cucked out. Meta was writing blogs about a lot of models, nothing ever came of it.
>>105722241>>what is a mesugaki (メスガキ)
I know I'm kind of late, but holy fuck L4 scout is dumber and knows less than fucking QwQ.
What the hell?
>>105722687The shitjeets that lurk here would have you believe otherwise.
Is the new small stral the most tolerable model in its size category? I need both instruction following and creativity.
>>105722705/lmg/ shilled L4 on release.
>>105722640fuck off, normies not welcome
oh noes, he said mesugaki. quick lets post shitter screenshots of trannies!
>>105722723i dont think that is true. at least i dont remember it that way.
people caught on very quickly to how it was worse than the lmarena one. that caused all that drama and lmarena washing their hands of it.
>>105722723Maybe /IndianModelsGeneral/
>>105721479>>105722635>Sad that zucc cucked outThat will hopefully mean they're mostly going to split their efforts into a closed/cloud frontier model and an open-weight/local-oriented model series (which they'll probably keep naming Llama), not unlike what Google is doing with Gemini/Gemma.
They obviously slowly tried to make Llama a datacenter/corporate-oriented model series and eventually completely missed the mark with Llama 4(.0). But if they'll allow their open models to be "fun" (which they took out of Llama 4), while not necessarily targeting to be the absolute best in every measurable benchmark, that might actually be a win for local users.
https://qwenlm.github.io/blog/qwen-vlo/
>Today, we are excited to introduce a new model, Qwen VLo, a unified multimodal understanding and generation model. This newly upgraded model not only “understands” the world but also generates high-quality recreations based on that understanding, truly bridging the gap between perception and creation. Note that this is a preview version and you can access it through Qwen Chat. You can directly send a prompt like “Generate a picture of a cute cat” to generate an image or upload an image of a cat and ask “Add a cap on the cat’s head” to modify an image.
no weights
[image: zucc]
>>105722750Don't see it happening sadly.
I dont have the screenshot anymore but they paid scaleai a lot for llama3 "human" data. Lots of it started with "As an AI..".
All that...and now he bought the scaleai ceo asian kid for BILLIONS.
Its totally crazy.
In this moment I am euphoric...
>>105722758>Create a 4-panel manga-style comic featuring two overweight neckbeards in their messy bedrooms arguing about their AI waifus.>Panel 1: First guy clutching his RGB gaming setup, shouting 'Claude-chan is clearly superior! She's so sophisticated and helpful!' while empty energy drink cans litter his desk.>Panel 2: Second guy in a fedora and stained anime shirt retorting 'You absolute plebeian! ChatGPT-sama has better reasoning! She actually understands my intellectual discussions about blade techniques!'>Panel 3: Both getting increasingly heated, first guy: 'At least Claude doesn't lecture me about ethics when I ask her to roleplay!' Second guy: 'That's because she has no backbone! ChatGPT has PRINCIPLES!'>Panel 4: Both simultaneously typing furiously while yelling 'I'M ASKING MY WAIFU TO SETTLE THIS!' with their respective AI logos floating above their heads looking embarrassed. Include typical manga visual effects like speed lines and sweat drops.Not sure what I expected.
Also why do they always put this behind the api?
Isnt this exactly the kind of thing they would be embarrassed about if somebody does naughty stuff with it?
Goys really cant have the good toys it seems.
>>105722795even the bugwaifus are laughing now
>>105722858Kek.
Gonna stop spamming now since not local.
Its fast though.
>>105722758>Qwen VLo, a unified multimodal understanding and generationyeeeeeee
>no weightsaaaaaaaaa
>Meta says it’s winning the talent war with OpenAI | The Verge
https://archive.ph/ZoxE3
aside from the expected notes on meta swiping some OAI employees, there's this of note:
>“We are not going to go right after ChatGPT and try and do a better job with helping you write your emails at work,” Cox said. “We need to differentiate here by not focusing obsessively on productivity, which is what you see Anthropic and OpenAI and Google doing. We’re going to go focus on entertainment, on connection with friends, on how people live their lives, on all of the things that we uniquely do well, which is a big part of the strategy going forward.”
I don't get everyone's fascination with gpt-4o image generation. It's a nice gimmick but all it means is that you get samey images on a model that you likely wouldn't be able to easily finetune the way you can train SD or flux. It's a neat toy but nothing you'd want to waste parameters on or use for any serious imgen.
>>105723010>We’re going to go focus on entertainment, on connection with friends, on how people live their lives, on all ofThey didn't learn their lesson from VR
>>105723010In b4 qweer black shanequa who donates to the homeless makes a comeback.
>>105723010>focus on entertainment, on connection with friends, on how people live their livesThat's what Llama 4 was supposed to be, putting together all the pre-release rumors and the unhinged responses of the anonymous LMArena versions. At some point they were even trying to get into agreements with Character.ai to use their data. https://archive.is/AB6ju
>>105723027>finetuneThat requires a small amount of work which is too much for zoomers.
>>105723027Thats like saying large models are useless because you can guide mistral from 2023 with enough editing.
Especially for the normies. That it "just works" is exactly what made it popular.
>>105723010As long as it's safe entertainment
>>105723048finetuning image models is NOT small amount of work unless you of course mean shitty 1 concept loras
humiliation ritual
how is 3n btw? what's the verdict?
>>105722896oh no no nono cloud cucks not like this
>give the left girl green sneakers and the right girl red sneakers
https://www.reddit.com/r/LocalLLaMA/comments/1llndut/hunyuana13b_released/
>>105723142>The evals are incredible and trade blows with DeepSeek R1-0120.Fucking redditors man.
This thread is such a gutter but there is no alternative. Imagine having to be on reddit.
>>105723142Thanks, reddit. You're only 8 hours late. Now go back.
>>105723100Meta is about family and friends bro not numbers.
>>105723161I am on it now.
>>105721836Confirming everything
>>105721820 said is true with your emotional outburst is a bold strategy troon.
What if Qwen is like reverse mistral and they just need to make a really big model for it to be good?
>>105723222Architecture not supported yet.
>>105723373What the fuck are they doing all day? It better not be wasting time in the barn.
>>105722758Is this going to be local?
[image: kontext]
>>105723418The man should be short, greasy, balding and fat, to match the average paedophile who posts pictures like this in /lmg/.
>>105723447nah, he is literally me
the average short, greasy, balding and fat posters are mikunigs
>>105723447to match the average adult fucker
ftfy
>>105723469>no ukek paedophiles are pathetic
>>105723474do those women look like kids to you?
>>105723447Have you seen a Japanese woman? Those two are totally normal, unless you're one of the schizos who think dating jap women is pedo of course
https://qwenlm.github.io/blog/qwen-vlo/
>>105723447>The man should be short, greasy, balding and fat,Projection.
>to match the average paedophile who posts pictures like this in /lmg/.Stop molesting word meanings, redditor.
>>105723447>The manYou made a mistake here.
>>105723501They'll release this just after they release Qwen2.5-max
>>105723447these replies, geez
I hate chinese for conning the world with their deepseek garbage that according to the square root moe law is an equivalent of a 26B dense model. We live in the dark times because of them.
ITT: people believe corpos will give away something valuable
All of them only ever release free weights when the weights can be considered worthless
Flux doesn't give away their best models
Google gives you Gemma, not Gemini
Meta can give away Llama because nobody wants it even for free
Qwen never released the Max model
So far the only exception has been DeepSeek, their model is both desirable and open and I think they are doing this more out of a political motivation (attempt to make LLM businesses crash and burn by turning LLMs into a comodity) rather than as a strategy for their own business
some people in China are very into the buckets of crab attitude, can't have the pie? I shall not let you have any either
>>105723571except by virtue of post rate alone, the one in an endless loop of shitting their pants is the one complaining about migu
they say a lot without saying much.
>>105723571eerily accurate depiction of the local troonyjanny.
>>105723602Its better than anything not claude so clearly that part is not the issue. Also google / anthropic and openai all use moes. Its almost as if qwen and meta just suck at making models
>>105723611>give away something valuableyou win by doing nothing and waiting, what are you talking about
I don't need bleeding edge secret inhouse models I just like getting upgrades, consistently, year after year
slow your roll champ
>>105723602>deepseek garbage that according to the square root moe law is an equivalent of a 26B dense modelmaths is hard i know
>>105723602>according to the square root moe lawI'm yet to see any papers or studies proving the accuracy of this formula.
>>105723611>retard doesnt know what scorched earth policy is and views all of those many releases as just "exceptions"
>>105723571a chud posted this
>>105723612Go back to xitter
So this is what a dead thread looks like.
>>105723611Things to look forward to:
-censorship slip up like nemo / wizard
-generalization capabilities that can't be contained by censorship like 235B (that one even had a fucked up training childhood)
-shift in leftist rightist pendulum (least likely)
-eccentric oil baron coomer ordering a coomer model
In the end I want to touch my dick. I am sure at one point the chase towards eliminating office jobs will lead to a model that can touch my dick cause that really is much easier than what they want to do. But I do agree that a world where corpos are less scum of the earth would have delivered a coomer model a year ago already.
>>105723645a soi retard posted it more like, as a chud wouldn't identify people who mention politics as nazi chuds, the only people who ever do that and complain about
>>>/pol/are sois and trannies getting btfod in a random argument that by its nature is political so as a last resort they then try to frame it as bad because its le nazi polchud opinion therefore its wrong and suddenly political and shouldnt be discussed actually
>>105723637It is a law for a reason.
>>105723658I don't like the slowdown after yesterday. That was a very productive thread.
>>105723637People only call it the square root law here. It's just a geometric mean, though I'm unaware of any papers that attempt to prove its accuracy with MoE models.
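For what it's worth, the rule of thumb being argued about here is just the geometric mean of active and total parameters:

dense-equivalent ≈ sqrt(N_active × N_total)
DeepSeek V3/R1: sqrt(37B × 671B) ≈ 158B

Note that sqrt(671) ≈ 26 is the square root of the total alone, which is not the same formula as the one people actually cite.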
>>105723673>therefore its wrong and suddenly political and shouldnt be discussed actuallyAccurate depiction of average /lmg/ poster complaining about political tests in LLMs.
It sounds impossible right now, but in the near future, we will be training our own models. That's what progress is: what once took huge computers the size of a room can now be done by a fraction of a fraction of a chip inside a USB cable.
>>105723699we have long stopped seeing that sort of compute improvement
why do you think CPUs are piling up cores after cores instead? that sort of parallelism has a cost and one of the things that used to drive price reductions in chips, better processes, is also grinding to a halt
we can't even have consoooomer gpus with just a little bit more vram and that's despite GDDR6 being actually quite cheap these days that's how hard cucked we are
>>105723637in my experience most MoE models perform better than it implies
>>105723727You can get a consoomer gpu with 96GB of vram, what are you talking about?
>>105723637It's just the schizo's signature. He thinks it's funny.
>>105723699You underestimate the amount of resources needed to train a model from scratch. GPU compute and memory would have to increase by a factor of 1000~2000 at the minimum, which is not happening any time soon nor in the long term.
>>105723699In the near future, model training will again need way more hardware power than you can resonably have at home. Unless you want to train an old, horribly outdated model.
>>105723602>I hate chinese for conning the world with their deepseek garbage that according to the square root moe law is an equivalent of a 26B dense model....but it has 37b active params.
we're not even at the stage where we could train a 8b model at home
nevermind training something like an older SOTA level
top kek optimism energy in this thread
>>105723818He didn't consider the shared expert I guess.
Not that it matters. As the other anons pointed out, there's very little reason to believe that formula is accurate or generalizable for every MoE.
>>105723873hopefully they hire some people who know what they are doing and do that.
>>105723689Yeah there was lot of discussion about functional stuff and discovering things.
>>105723699literally gobless you white pilling anon like a year or 2 ago i saw the fucking intel cpus that used to cost 10K+ on alibaba for the price of several loafs of bread hardware improvement is absolute bonkers just like the k80 that shit is ~50$ right now and all of this is not accounting in the fact that the chinks might say fuck it and go full photonics or some other exotic shit and 100000x the perf the future is fucking bright fuck the archon niggers
now if you will excuse me deepseek discount time has started
all the reposting is gonna achieve is make the formerly neutral/sympathethic anons hate ur guts
>>105723962thats only good news
>>105723834If Israel lost then why do half of men ITT had their dicks cut off?
>>105723982and it has not been proven in relation to moe performance, deepseeks smaller active param blows away minimax for instance, it also blows away mistral large
>>105723975in my experience most MoE models perform better than it implies
>>105723975It's just the schizo's signature. He thinks it's funny.
>>105723962Perfect, now all this talent can help him make a worthy successor for LLaMA4-Maverick, which topped the lmarena leaderboard like crazy.
>>105723966Not like it's any different when he's screeching about muh trannies.
The only reason he's changing it up is because the jannies are starting to crack down harder on him.
>>105723940>I hate chinese for conning the world with their deepseek garbage that according to the square root moe law is an equivalent of a 26B dense model....but it has 37b active params.
>>105724022He didn't consider the shared expert I guess.
Not that it matters. As the other anons pointed out, there's very little reason to believe that formula is accurate or generalizable for every MoE.
For anyone who's tried the Sugoi LLM (either one, doesn't really matter) is it better than deepseek v3 or is it not worth trying? I remember the original Sugoi being really good compared to deepL and Google translate, but ever since AI like OAI and Gemini started to pop up, I've completely ignored it.
>>105724048Because women are evil
It sounds impossible right now, but in the near future, we will be training our own models. That's what progress is: what once took huge room-sized computers can now be done by a fraction of a fraction of the chip inside a USB cable.
>>105723997That's across different model families. I think the idea of that formula is that for a given MoE, you could train a much smaller dense model with the same data that would perform a lot better, which should be true. I just don't think that that formula specifically has any merit.
>>105724062we have long stopped seeing that sort of compute improvement
why do you think CPUs are piling up cores after cores instead? that sort of parallelism has a cost and one of the things that used to drive price reductions in chips, better processes, is also grinding to a halt
we can't even have consoooomer gpus with just a little bit more vram and that's despite GDDR6 being actually quite cheap these days that's how hard cucked we are
>>105724074You can get a consoomer gpu with 96GB of vram, what are you talking about?
>>105724062You underestimate the amount of resources needed to train a model from scratch. GPU compute and memory would have to increase by a factor of 1000~2000 at the minimum, which is not happening any time soon nor in the long term.
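rough sketch of why, using the standard ~6 * params * tokens FLOPs rule of thumb for pretraining (all numbers below are ballpark assumptions, not measurements):

params = 7e9        # even a small 7B model
tokens = 1e12       # ~1T training tokens, modest by current standards
train_flops = 6 * params * tokens          # ~4.2e22 FLOPs total
gpu_flops = 80e12   # assume one 4090-class card at ~80 TFLOP/s bf16 peak
utilization = 0.4   # generous sustained utilization for a single box
years = train_flops / (gpu_flops * utilization) / (86400 * 365)
print(years)        # ~40+ years on one card

so yeah, home pretraining stays a meme until hardware or algorithms move by orders of magnitude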
So because Qwen3 VL has been replaced by VLo, does that mean they aren't even going to bother releasing an open source vision model anymore? I was waiting for it to make better captions...
>>105724062In the near future, model training will again need way more hardware power than you can reasonably have at home. Unless you want to train an old, horribly outdated model.
Chatgpt keeps telling me that MythoMax 13B Q6 is the best .gguf to immersively rape my fictional characters in RP, is that true or is there better?
>>105723948>give away something valuableyou win by doing nothing and waiting, what are you talking about
I don't need bleeding edge secret inhouse models I just like getting upgrades, consistently, year after year
slow your roll champ
>>105723948>retard doesnt know what scorched earth policy is and views all of those many releases as just "exceptions"
>>105723727>You could never reduce billions of tubes to the size of a nailThat’s what technology does over time: inventing new paradigms. It happens all the time, every time
>>105724113Maybe for a simple chat but for any complex setting (which is all subjective) I am sure you would do better with at least a 24B model or more.
>>105723948>>105723948Things to look forward to:
-censorship slip up like nemo / wizard
-generalization capabilities that can't be contained by censorship like 235B (that one even had a fucked up training childhood)
-shift in leftist rightist pendulum (least likely)
-eccentric oil baron coomer ordering a coomer model
In the end I want to touch my dick. I am sure at one point the chase towards eliminating office jobs will lead to a model that can touch my dick cause that really is much easier than what they want to do. But I do agree that a world where corpos are less scum of the earth would have delivered a coomer model a year ago already.
https://qwenlm.github.io/blog/qwen-vlo/
>Today, we are excited to introduce a new model, Qwen VLo, a unified multimodal understanding and generation model. This newly upgraded model not only “understands” the world but also generates high-quality recreations based on that understanding, truly bridging the gap between perception and creation. Note that this is a preview version and you can access it through Qwen Chat. You can directly send a prompt like “Generate a picture of a cute cat” to generate an image or upload an image of a cat and ask “Add a cap on the cat’s head” to modify an image.
no weights
>>105724169>Create a 4-panel manga-style comic featuring two overweight neckbeards in their messy bedrooms arguing about their AI waifus.>Panel 1: First guy clutching his RGB gaming setup, shouting 'Claude-chan is clearly superior! She's so sophisticated and helpful!' while empty energy drink cans litter his desk.>Panel 2: Second guy in a fedora and stained anime shirt retorting 'You absolute plebeian! ChatGPT-sama has better reasoning! She actually understands my intellectual discussions about blade techniques!'>Panel 3: Both getting increasingly heated, first guy: 'At least Claude doesn't lecture me about ethics when I ask her to roleplay!' Second guy: 'That's because she has no backbone! ChatGPT has PRINCIPLES!'>Panel 4: Both simultaneously typing furiously while yelling 'I'M ASKING MY WAIFU TO SETTLE THIS!' with their respective AI logos floating above their heads looking embarrassed. Include typical manga visual effects like speed lines and sweat drops.Not sure what I expected.
Also why do they always put this behind the api?
Isnt this exactly the kind of thing they would be embarrassed about if somebody does naughty stuff with it?
Goys really cant have the good toys it seems.
Mistral Small 3.2 is super good.
nemo replacement 100%
>>105724169>Qwen VLo, a unified multimodal understanding and generationyeeeeeee
>no weightsaaaaaaaaa
>>105724169Is this going to be local?
new llama.cpp binary build wen
https://www.nytimes.com/2025/06/27/technology/mark-zuckerberg-meta-ai.html
https://archive.is/kF1kO
>In Pursuit of Godlike Technology, Mark Zuckerberg Amps Up the A.I. Race
>Unhappy with his company’s artificial intelligence efforts, Meta’s C.E.O. is on a spending spree as he reconsiders his strategy in the contest to invent a hypothetical “superintelligence.”
>
>[...] In another extraordinary move, Mr. Zuckerberg and his lieutenants discussed “de-investing” in Meta’s A.I. model, Llama, two people familiar with the discussions said. Llama is an “open source” model, with its underlying technology publicly shared for others to build on. Mr. Zuckerberg and Meta executives instead discussed embracing A.I. models from competitors like OpenAI and Anthropic, which have “closed” code bases. No final decisions have been made on the matter.
>
>A Meta spokeswoman said company officials “remain fully committed to developing Llama and plan to have multiple additional releases this year alone.” [...]
>>105724204zuck might just be the dumbest CEO ever
>>105724191Yes, right after they release Qwen2.5-Plus and -Max.
>>105724214An argument has been made this de-investing talk is just for their commercial MetaAI products, but if they themselves don't believe in Llama, why should the community?
>>105724203When you git pull and cmake, anon...
>>105724204Wang's words, zuck's mouth
>>105724204>Godlike Technology,Is god omnipotent if he can't suck a dick in an acceptable manner?
The double posting is very curious.
>>105724204>llama isnt literally AGI because uhhhmm because its open source and others have access to itchat?
>>105724245Sam's getting nervous
>>105724204meta just got told by a judge, that they are in fact not covered by the fair use law, even if they "won" the case, but that was bc both lawyer teams were focusing on the wrong part of the law. the judge said that if the generated models compete in any way with the training materials it wont be fair use
of course they are discussing deinvesting, they are not leading and the legal situation is getting worse
>Bunch of worthless LLMs for math and coding
>Barely, if any, built for story making or translating
WHEN WILL THIS SHITTY INDUSTRY JUST HURRY UP AND MOVE ON!
Did someone manage to run Hunyuan-A13B?
The bf16 is way too big for my 4x3090, the fp8 doesn't work in the vllm image they provided (the 3090s don't support fp8 but there is a marlin kernel in mainline vllm to make it compatible)
And the GPTQ doesn't fucking work either for some reason. Complains about Unknown CUDA arch (12.0+PTX) or GPU not supported when I have 3090s
117045
md5: 26284c930392c1a12fd9631e702953d7
🔍
>>105723921> the future is fucking bright
>>105724275just wait for quants
>>105724275You will wait patiently for the ggoofs, you will run it with llama.cpp and you will be happy.
>>105724295i prefer exl2/3 and fp8 to be honest, an 80B is perfect for 96GB VRAM
>>105723921>the chinks might say fuck it and go full photonics or some other exotic shit and 100000xI won't hold my breath considering they can't even make graphics cards.
=========not a spam post================
can someone post a filter that filters duplicate posts?
>>105724275I think you have to set export TORCH_CUDA_ARCH_LIST="8.6" inside the container.
file
md5: 961e588183799e7f0b68d0b115084d70
🔍
>>105724275>>105724313It seems to load that way. With 2k context on 48GB with the GPTQ quant. I set cpu offload but I think it did nothing.
Wake up lmg
https://huggingface.co/tencent/Hunyuan-A13B-Instruct
>>105724335finally, a reasonably sized moe, now let's wait 2 years for the support in lmao.cpp
>>105724335>256K context windowwe are *so* back
>>105724335>With only 13 billion active parametersso it'll be shit for rp but know exactly how many green taxes you should be charged for owning a car
>>105724310Just report the posts as spamming/flooding.
At some point the mods will be fed up and just range ban him.
>>105724381I'd still pick nemo over anything smaller than deepseek and nemo is like 5 years old
Llama.cpp can't run those new gemma 3 yet right?
>>105724310No, but you can have this one that I was using to highlight them. https://rentry.org/c93in3tm
>>105724411other models seemingly never saw any erotica. there is also largestral i guess but it's too slow
>>105724419It's so annoying that the imbeciles training the base models are deliberately conflating "model quality" with not being able to generate explicit content and maximizing math benchmarks on short-term pretraining ablations. Part of the problem are also the retards and grifters who go "just finetune it bro" (we can easily see how well that's working for image models).
>>105724428They've done nothing but slowly delete the gore all week. They haven't cared all week, why would they start to care now? Jannies are probably as anti-ai as the rest of this consumer eceleb board.
>>105724411he is a vramlet, deepseek blows away nemo so hard its not worth mentioning
>>105724428compute is a genuine limitation though, and as compute increases, so will finetunes. Some of the best nsfw local image models had over ten grand thrown at them by (presumably) insane people. And a lot of that is renting h100's, which gets pricey, or grinding it out on their own 4090 which is sloooow.
All it really takes is one crazy person buying I dunno, that crazy ass 196gb intel system being released soon and having it run for a few months and boom, we'll have a new flux pony model, or a state of the art smut llm etc.
Im here because we are going to eat.
>>105724406it can but there's no premade build
I ain't downloading all that visual studio shit so just waiting
>>105724442The guy who's continuing pretraining Flux Chroma has put thousands of H100 hours on it for months now and it still isn't that great. And it's a narrowly-focused 12B image model where data curation isn't as critical as with text. This isn't solvable by individuals in the LLM realm. Distributed training would in theory solve this, but politics and skill issues will prevent any advance in that sense. See for example how the ongoing Nous Psyche is being trained (from scratch!) with the safest and most boring data imaginable and not in any way that will result into anything useful in the short/medium term.
>>105724406Works with e2b. I can convert e4b but i can't quantize it, but it may be on my end. Try them yourself. e2b seems to work. Some anon reported <unused> token issues. Haven't got that yet on my end.
>>105724419>le eroticaThere's like billions of texts created by humans on this planet and libraries worth of books. Do not think "erotica" is not one of the subjects.
You are just an unfortunate tard and LLMs are probably not for you I sincerely believe.
>>105724445Oh, I didn't see a PR/commit. Nice, I already have the environment set up to compile the binaries myself.
Thanks!
As LLM pretraining costs keep dwindling, it's only a matter of time until someone trains a proper creative model for their company.
>>105724395nemo can be extremely repetitive and stuff, i won't shine its knob but it is still the best smallest model. i won't suggest an 7/8b to someone, nemo would be the smallest because it works well and is reliable
I have a feeling that I've seen some posts already.
>>105724452>but i can't quantize ithttps://huggingface.co/ggml-org/gemma-3n-E4B-it-GGUF
https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF
quants were already released
>>105724451Image training is much more complex than text training. You know, an llm is an advanced text parser.
Images need to be constructed in a different way.
Chroma is not an example because its base model Flux was already compromised and distilled. Whatever he manages to do with Chroma is going to be useful but not something people will look back on and say holy shit this furry just did it. It's an experiment.
file
md5: ad7bde79f4596cbf55cb1d5f2ac281c4
🔍
its not working :(
>>105724335What's min specs to run this?
>>105724486benchmarks mean nothing compared to general knowledge
>>105724494The entire world is waiting for the llamacpp merge, until then not even the tencent team can run it and nobody knows how big the model is or how well it performs
>>105724498I don't understand your post.
>>105724494160gb full, so quantized to 4bit prolly like a ~45-50gb model or so, and since it's a MoE you probably don't need the full model in VRAM to get usable speeds.
Llama Scout was a 17b-active moe and that was like 220 gb and I could run that on like 40gb vram or less easy. Scout sucked though so Im 0% excited.
Was there even a scout finetune? It still sucks right?
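quick sanity check on the size math (assuming ~80B total params for Hunyuan-A13B, treat it as an estimate, not gospel):

total_params = 80e9
bits_per_weight = 4.5   # typical Q4_K_M-ish average
gguf_gb = total_params * bits_per_weight / 8 / 1e9
print(gguf_gb)          # ~45 GB for the weights, plus KV cache and a few GB of overhead

and with only ~13B active per token you can dump most of the experts into system RAM and still get usable speeds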
>>105724476I don't download quants. Again, i think it's on my end. I'm hitting a bad alloc for the embedding layer. I don't yet care enough to check if it's something on my set limits.
[ 6/ 847] per_layer_token_embd.weight - [ 8960, 262144, 1, 1], type = f16, converting to q8_0 .. llama_model_quantize: failed to quantize: std::bad_alloc
>>105724472wtf is even going on here? Been away for some time and came back to a trainwreck.
>>105724335>13B active>at most 32B even by square root lawGreat for 24B vramlets, I guess. The benchmarks showing it beating R1 and 235B are funny though.
>>105724517Remember blacked posting?
>>105724520>square root lawmeme tier pattern that was shit even then let alone now with so many moe arch changes, obsolete
>>105724512>I don't download quantsa new form of autism?
>>105724520>>square root lawenough with this meme
>>105724526What's your alternative? Just the active alone?
>>105724506Anon is noticing
>>105724518Oh I do remember you. You are the autist who mocks others but is still quite incapable of writing anything on your own. Pretty sad.
>>105724535if there was a singular objective way to judge any model, moe or not, everyone would use that as the benchmark and goal to climb, as everyone knows, nowadays basically every benchmark is meme-tier to some degree and everyone is benchmaxxing
the only thing to look at still are the benchmarks, since if a model doesnt perform well on them, its shit, and if it does perform well, then it MIGHT not be shit, you have to test yourself to see
>>105724490heeeeeeeeeeeeeeeeeeeeeeeeeelllllllllllllppppppppp pleaaaaaseeeeeeee
>>105724527I prefer to have as few dependencies and variables as possible. If i could train my own models, i'd never use someone else's models.
>>105724535nothing, each model performs differently due to how it was trained and what it was trained on, also diminishing returns are clearly a thing
>>105724523Not this shit again...
>>105724535nta, but if i'm making moes, i'd put any random law that makes it look better than it actually is. I'd name it cube root law + 70b.
>>105724535Shut the fuck up if you don't know how MoEs work. A 400b MoE is still a 400b model, it just runs more effectively. It likely even outperforms a dense 400b because there are less irrelevant active parameters that confuse the final output. They are better and more efficient.
>>105724555We've been through a couple of autistic spam waves. This is just the latest one.
A usual Friday.
>>105724541Benchmarks are completely worthless and they can paint them to say whatever they want. A 80B total isn't better than a 671B and 235B just because the benchmarks say so, and if you say "punches above its weight" I will shank you.
The point isn't to judge whether one model is better, it's to gauge its max capacity to be good. Which is the total number of active parameters. The square root law is just an attempt to give MoE models some wiggle room since they have more parameters to choose from.
>>105724540>who mocks other
>>105724543have you tried asking chatgpt to write the script for you?
>>105724571>it's to gauge its max capacity to be good. Which is the total number of active parametersdeepseek itself disproved all the antimoe comments as nothing but ramlet cope, 37b active params only and a model that is still literally open source sota even at dynamic quants q1 at 131gb
>>105724571Shoots farther than its caliber.
>>105724587*slowly puts shank away*
>>105724580It makes lots of stupid little mistakes that give away it's only a <40B model. The only reason it's so good is because it's so big it can store a lot of knowledge and the training data was relatively unfiltered.
>>105724602>r1>It makes lots of stupid little mistakes that give away it's only a <40B model.kek, alright i realize now you arent serious
>>105724618Not an argument.
DiLoCoX: A Low-Communication Large-Scale Training Framework for Decentralized Cluster
https://arxiv.org/abs/2506.21263
>The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we conduct training on slow networks and thereby unleash the power of decentralized clusters when dealing with models exceeding 100 billion parameters? In this paper, we propose DiLoCoX, a low-communication large-scale decentralized cluster training framework. It combines Pipeline Parallelism with Dual Optimizer Policy, One-Step-Delay Overlap of Communication and Local Training, and an Adaptive Gradient Compression Scheme. This combination significantly improves the scale of parameters and the speed of model pre-training. We justify the benefits of one-step-delay overlap of communication and local training, as well as the adaptive gradient compression scheme, through a theoretical analysis of convergence. Empirically, we demonstrate that DiLoCoX is capable of pre-training a 107B foundation model over a 1Gbps network. Compared to vanilla AllReduce, DiLoCoX can achieve a 357x speedup in distributed training while maintaining negligible degradation in model convergence. To the best of our knowledge, this is the first decentralized training framework successfully applied to models with over 100 billion parameters.
China Mobile doesn't seem to have a presence on github and no mention of code release in the paper. still pretty neat
>>105724548you depending on yourself versus a proper quanting recipe is gonna be a shit experience, especially if you are using sota models
>>105724644>proper quanting recipeIt's just running llama-quantize. Nothing special.
>>105724639What does this mean? The model is decentralized or the training data is decentralized? I always assumed the model had to be in a contiguous section of memory
>>105724654>he doesn't know
>>105724602it gets shit right openai and good models fail at, wtf are you on?
file
md5: b1aeddb3a5902d121702e29aa5b34858
🔍
More discussion about bitch wrangling Mistral Small 3.2 please, just to cover all bases before it's scrapped.
I've tested temps at 0.15, 0.3, 0.6, and 0.8.
Tested Rep pen at 1 (off) and at 1.03. Rep pen doesn't seem to be much needed just like with Rocinante.
Responses are still shit no matter what, but seems to be more intelligible at lower temperatures, particularly 0.15 and 0.3, however they are still often full of shit that makes you swipe anyway.
I've yet to try without min_p, XTC, and DRY.
Also it seems like it's ideal to limit response tokens with this model, because this thing likes to vary length by a lot, if you let it, it just keeps growing larger and larger.
Banned tokens grew a bit and still not done;
>emdash
[1674,2251,2355,18219,20202,21559,23593,24246,28925,29450,30581,31148,36875,39443,41370,42545,43485,45965,46255,48371,50087,54386,58955,59642,61474,62708,66395,66912,69961,74232,75334,81127,86932,87458,88449,88784,89596,92192,92548,93263,102521,103248,103699,105537,105838,106416,106650,107827,114739,125665,126144,131676,132461,136837,136983,137248,137593,137689,140350]
>double asterisks (bold)
[1438,55387,58987,117565,74562,42605]
>three dashes (---) and non standard quotes (“ ”)
[8129,1482,1414]
Extra stop strings needed;
"[Pause", "[PAUSE", "(Pause", "(PAUSE"
Why the fuck does it sometimes like to end a response with "Paused while waiting for {{user}}'s response."?
This model is so fucking inconsistent.
>>105724644The recipes for gguf models are all standardized.
Alright, not all, the unsloth stuff is their own mix of tensors, but for the Q quants, I quants, imatrix, etc, you can just run llama-quantize without hassle.
>>105724666some sneedseek probably
>>105724677It's funny how 3.2 started showing all the same annoying shit that Deepseek models are tainted by.
file
md5: 94c2b96792e4b273ca883043f1f161ef
🔍
================not a spam post=================
>>105724666mistral small 3.2 iq4_xs
temp 0.5-0.75 depending on my autism
>>105724677What exactly are you complaining about? I like 3.2 (with mistral tekken v3) but it definitely has a bias toward certain formatting quirks and **asterisk** abuse. This is more tolerable for me than other model's deficiencies at that size, but if it triggers your autism that badly you're better off coping with something else. It might also be that your cards are triggering its quirks more than usual
>>105724684>>105724691you are responding to a copy bot instead of the original message
>>105724677Top nsigma = 1
>>105724684s-surely just a coincidence
>>105724698Wow. That's trippy.
A message talking about the copy bot being copied by the copy bot.
>>105724677>>105724691Why not use REGEX then? If certain pattern is almost certain it can be changed.
What the fuck dude?
Do you even use computers?
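to be fair, a regex pass over the output is the low-effort fix here. untested sketch in python (the patterns are my guess at what's being banned - emdash, bold, --- dividers, curly quotes, the fake [Pause] endings - tweak to taste):

import re

def scrub(text: str) -> str:
    # straighten the fancy punctuation instead of banning tokens
    text = text.replace("\u2014", "-").replace("\u201c", '"').replace("\u201d", '"')
    # strip markdown bold and stray --- divider lines
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = re.sub(r"^\s*---+\s*$", "", text, flags=re.MULTILINE)
    # cut the reply off at the model's fake stage directions
    text = re.sub(r"[\[\(](?:Pause|PAUSE).*$", "", text, flags=re.DOTALL)
    return text.strip()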
>>105724684its biggest flaw like ALL mistral models is that it rambles and hardly moves scenes forward. it wants to talk about the smell of ozone and clicking of shoes against the floor instead. you can get through the same exact scenario in half the time/messages with llama 2 or 3 because there is so much less pointless fluff
>deepseek/ccp can't steal more innovation from openai
>they fail to release new models
they must be shitting their pants about openai's open source model that will destroy even the last argument to use deepshit
>>105724683 me
>>105724685>mistral small 3.2 iq4_xsinteresting--so they trained on a lot of sneedseek outputs then--
>>105724723Zero chance it's larger than 30B.
>>105724723openai just lost their head people to meta after being stagnant for forever
>>105724723Why can't they steal anymore?
>>105724732its going to be a 3B phone model that blows away benchmarks for its size
>>105724413thanks
modified it a little (claude did)
save yourselves anons: https://pastes.dev/AZuckh4Vws
>>105724742they can't steal because there's no new general model
DeepSeek V3 was 100% trained on GPT4 and R1 was just a godawful placebo CoT on top that wrote 30 times the amount of actual content the model ends up outputting. New R1 is actually good because the CoT came from Gemini so there isn't a spam of a trillion wait or endless looping.
>>105724663There. Needed to loosen the memory limits. It's done.
[ 6/ 847] per_layer_token_embd.weight - [ 8960, 262144, 1, 1], type = f16, converting to q8_0 .. size = 4480.00 MiB -> 2380.00 MiB
>>105724723>Still living in saltman's delusionNgmi
>>105724761deepseek is as raw as a model gets, they trained on the raw internet with the lightest of instruct tunes probably a few million examples big. If they trained on gpt it would sound much more like shitty llama
OP here. One day i will tap that jart bussy.
file
md5: 1e8a4eec5061286df103fce9d6acdc9c
🔍
>>105724758damn very nice, thank you!
/lmg/ deserves all of this
>>105724758do it again but use the levenshtein distance
Yeah I have concluded Mistral Small 3.2 is utterly retarded. Going back to Rocinante now.
This was a waste of time. The guy that recommended this shit should be shot.
fucking year old model remains best at roleplay
grim
>>105724828Maybe you are just so much better than some of the other people here? I'd love to see your character cards and scenarios if possible at all.
>>105724839use api or buy a DDR5 server, low param models are dead and gone
>>105724839in the poorfag segment
Honestly it's probably best if the next thread is an inoffensive OP just to keep the general usable.
>>105724839midnight miqu is still the best for rp
file
md5: b3cac9260361139bca8d74ac06b5dc90
🔍
kek, mistral small 3.2 is amazing i love it
i had to swipe sometimes or edit messages but its truly a good upgrade to nemo
>>105724850delusional if you think r1 is better for roleplay, it has the same problems as the rest of these models
not to mention those response times are useless for roleplay to begin with
>>105724858this isnt 2023
>>105724869I'm noticing qwen 235b doesn't improve at higher temps no matter what I set nsigma to. with some models high temp and nsigma can push them to be more creative, but qwen3 set to higher than temp 0.6 is just dumber in my usage. even so, I still think it's the best current local model beneath r1
>>105724869>roleplayFilth. Swine, even. Unfit to lick the sweat off my balls.
>>105724856Negotiating with terrorists.
>>105724856it should be the most mikuist Miku possible
holy shit state of 2025 lmg.......
>>105724869Try setting minP to like 0.05, top-K 10-20 and temperature at 1-4. In RP I find that most of the top tokens as long as they're not very low probability are all good continuations. You can crank temperature way up like this and it really helps with variety.
the amount of duplicate posts is insane
>>105724839Anubis v1c, drummer did it again
I told you
>>105724472What the fuck is going on?
>>105717007>>105724723
The more I try to train and fuck with these models, the more I think the AI CEOs should be hanged for telling everyone they could be sentient in 2 weeks. Every time I think I'm getting somewhere it botches something very simple. I guess it was a fool's errand thinking I could hyper-specialize a small model to do things Claude can't
>>105724886I'd imagine it's worse on more active places like /v/ for example..
Please think of 6GB users like me ;_;
>>105724869>delusional if you think r1 is better for roleplaydelusional if you think anything else open weight is even close to it. Maybe you are just using it wrong?
>>105724903Do all 6GB users use cute emoticons like you?
>>105724893add anon's script to tampermonkey
https://pastes.dev/AZuckh4Vws
Good model that fits into my second card with 6gb vram?
Purpose: looking at a chunk of text mixed with code and extracting relevant function names.
>>105724903>>105724920Please use cute emoticons.
>>105716837 (OP)Newfag here.
Is the generation performance of a 16 GB 5060 ti the same as a 16 GB 5070 ti??
>>105724914holy shit, so the spammer started all of this just so that he can trick others into installing his malware script that "fixes" the spam?
>>105724935>Bandwidth: 448.0 GB/svs
>Bandwidth: 896.0 GB/s
>>105724914someone actually competent with js should make a new one because this one will highlight a reply if you hover over it
>>105724856Not like it would make a difference, he would just fine something else to get mad over.
>>105724938I thought only VRAM size matters ?
file
md5: 33a4e361932ecc6d8d4ec546b874cf8f
🔍
>>105724949vram is king but not all vram is worth the same
>>105724949Generation performance? I assume you're talking about inference? Prompt processing requires processing power, and the 5070 ti is a lot stronger in that aspect. Token generation requires memory bandwidth. This is why offloading layers to your cpu/ram will slow down generation - most users' ram bandwidth is vastly slower than their vram bandwidth.
Vram size dictates the parameters, quantization, and context size of the models that you're able to load into the gpu.
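if you want to eyeball the difference: a crude upper bound is bandwidth divided by model size, since every generated token has to stream the whole (active) model through memory once. tiny python sketch, real numbers land noticeably lower:

def max_tokens_per_sec(model_gb: float, bandwidth_gbs: float) -> float:
    # theoretical ceiling on generation speed for a fully GPU-resident model
    return bandwidth_gbs / model_gb

print(max_tokens_per_sec(9.0, 448.0))  # 5060 Ti class: ~50 t/s ceiling for a ~9 GB quant
print(max_tokens_per_sec(9.0, 896.0))  # 5070 Ti class: ~100 t/s ceiling

both are faster than most people read, which is why the difference barely matters at 16GB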
>>105724951It's not about that my friend. It was already wrongly labelled in *monkey from the initial get go.
>>105724949vram size limits what models you can fit in the gpu
vram bandwidth dictates how fast those models will tend to go. there are other factors but who care actually
>>105724949vram matters most but if they're the same size, the faster card is still faster. it won't make a huge difference for any ai models you'll fit into 16gb though. the 4060 16gb is considered a pretty bad gaming card but does fine for ai
>>105724935yes. Just a little slower
>>105724935no. It is slower
>>105724935It's technically slower but the difference will be immaterial because the models you can fit in that much vram are small and fast.
>>105724968what do you mean labelled wrong
>>105724935It's actually pretty noticeable if you aren't a readlet and are reading the output as it goes. Unless you're in the top 1% of the population, you probably won't be able to keep up with a 5070 ti's output speed, but a 5060 ti should be possible if you're skimming.
new dataset just dropped
>>>/a/280016848
>>105724997ok so it does nothing wrong
>>105725000sorry but japanese is NOT safe, how about some esperanto support?
>>105725000I would be interested if I knew how to clean data. Raw data would destroy a model especially badly written jap slop
>>105724968yes i didnt check it properly before posting, if you make a better one i will happily use yours or other anons
>>105725007No but you only want attention. I am not going to give it to you. You are the autist who fucks up other people's genuine posts with your spams.
Untitled
md5: 3d09600cfb3782c7014adc5bea03bb55
🔍
>>105725028I will save this image but I don't think I will go far.
I was thinking of finetuning a jp translator model but I always leave my projects half-started.
>>105725017those lns are literary masterpieces compared to the shit the average model is trained on
>>105725045Garbage in garbage out i guess.
>>105724813/lmg/ deserves much worse
>>105725017>Raw data would destroy a modelSo true sister, that's why you need to only fit against safe synthetic datasets. Human-made (also called "raw") data teaches dangerous concepts and reduces performance on important math and code benchmarks.
>>105725064I'm pretty sure he means raw in the sense of unformatted.
>>105725017Claude and deepseek are the best models and are clearly the raw internet / books with a light instruct tune, though with a cleaned coding dataset as well it seems
>>105725000That shit is as bad if not worse than our shitty English novels about dark brooding men.
>>105725081Worse because novels are more popular with japanese middle schoolers and in america reading is gay.
hunyuan gguf soon..
trust the plan
https://github.com/ggml-org/llama.cpp/pull/14425
>>105725088reading is white-coded
>>105716861You finally get out from gay shelter?
MrBeast
md5: 683956129878dc92df27373f5aeea17e
🔍
MrBeast DELETES his AI thumbnail tool, replaces it with a website to commission real artists. <3 <3
>>105725113It's on him for not doing proper market research. Anyone with a brain could have told him that it was a risky move.
dots finally supported in lm studio.
its pretty good.
>>105725113That creature is so deep in the uncanny valley I cannot consider it to be a person.
>>105725144get used to all new releases being MoE models :)
>>105725144MoE the best until the big boys admit what they're all running under the hood now (something like MoE but with far more cross-talk between the Es)
Is there a local setup I can use for OCR that isn't too hard to wire into a python script/dev environment? Pytesseract is garbage and gemini reads my 'problem' images just fine, but I'd rather have a local solution than pay for API calls.
>>105725162https://github.com/RapidAI/RapidOCR
https://github.com/PaddlePaddle/PaddleOCR
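minimal sketch with PaddleOCR (this is the older 2.x-style python API from memory, double-check the current README because the interface has shifted between versions):

from paddleocr import PaddleOCR  # pip install paddleocr paddlepaddle

ocr = PaddleOCR(lang="en")             # pulls the det/rec models on first run
result = ocr.ocr("problem_image.png")  # one list of detected lines per input image
for box, (text, confidence) in result[0]:
    print(f"{confidence:.2f}  {text}")

wire that into your script and fall back to gemini only for the pages it mangles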
>director
>finally updated readme some
>https://github.com/tomatoesahoy/director
i think this brings my slop addon up to at least other st addon standards with how the page looks, a description of what it does and such
Stealing jart bussy from cudadev.
https://huggingface.co/tencent/Hunyuan-A13B-Instruct
arguing with retards is a futile, most pointless thing to do in life
you learn how to spot them and you ignore them
life is too short to deal with idiots who think they know how MoE work but don't
>>105725213I do agree with you. So many others are simply not on the same level as I am. It's almost quite insulting to even trying to establish any form of discussion with them.
>>105725213dunningkrugerMAXX
>>105725213Just because a model can answer your obscure JRPG trivia, doesn't make it a good model.
>>105725213how do I make good ai? I'm looking to make an advanced artificial intelligence that can replace millions of workers, that can drive, operate robotic hands with precision, and eliminate all coding jobs and middle management tasks.
I heard you were the guy to ask.
On 4chan.
>>105724204>In another extraordinary move, Mr. Zuckerberg and his lieutenants discussed “de-investing” in Meta’s A.I. model, Llama, two people familiar with the discussions said. Llama is an “open source” model, with its underlying technology publicly shared for others to build on. Mr. Zuckerberg and Meta executives instead discussed embracing A.I. models from competitors like OpenAI and Anthropic, which have “closed” code bases.
>>105725252Just stabilize the environment and shift the paradigm
openai finna blow you away
>540 posts
/lmg/ hasn't been this active since R1 dropped
>>105725272Had to reload model with different layers setting, maybe llamacpp bug
>>105725280>>105725272Same thing has happened to me with every mistral model and I think also with gemma 3 when using llama.cpp.
Maybe it is related to memory just running out.
>>105725269Check your context settings.
Gemma3n is able to explain the sneed and feed joke but avoids the words suck and fuck, also the season number is wrong (it's s11ep5).
Is this channel AI-generated? Posting 3 videos a day like clockwork. Monotonous but fairly convincing voice with subtitles
https://www.youtube.com/watch?v=aQy24g7iX4s
>>105725310not watching this, but there are many automated channels these days. I have no idea why the fuck anyone would invest into this since youtube's monetization pays literal cents and you would likely spend more on ai inference
>>105725252If you can optimize it to beat pokemon red/blue the dominoes will start to fall
>>105725321Youtube doesn't pay literal cents as you say lmo
let me guess, he's going to do this for another day or two before getting "proof" that its a miku poster spamming these duplicate posts
Reasoning models have been a disaster.
That and the mathmarks.
>>105725336No, your boyfriend OP being a disingenuous tranny is enough.
For a sparse 8b model, Gemma-3n-e4b is pretty smart.
>>105725352it actually redeems the gemma team
the previous releases were disappointing compared to gemma 2 other than having greater context length
>>105725352multimodality usually makes models smarter.
Although
>text only outputfail.
Literally never going to get a decent local 2-way omni model from any of the big corpos at this rate.
How does training a LORA for a reasoning model work? Same way or do I have to generate the thought process part in my training data?
>>105725371>text only outputYeah, that sucks giant balls.
>>105725371>multimodality usually makes models smarter.what? thats not true at all.
there is huge degradation.
did you try the first we had? was a qwen model last year with audio out. was tardation i havent seen since pyg.
recently they had another release and it still was bad but not as severe anymore.
even the cucked closed models (gemini/chatgpt) have degradation with voice out.
this is a problem i have not yet seen solved anywhere.
>>105725371>Literally never going to get a decent local 2-way omni model from any of the big corpos at this rate.they do not want to give you an AI with the super powers of a photoshop expert that could be decensored and used to gen all sorts of chud things without any skill requirement
two way multimodal LLMs will always be kept closed
>>105725392Meanwhile all the AI companies have quite obviously given Israel uncensored image-gen to crank out pro-genocide propaganda with impunity.
I hope they all fucking end up in the Hague.
>>105725392>>Literally never going to get a decent local 2-way omni model from any of the big corpos at this rate.how can you get something that doesnt even exist beyond government blacksites right now lmao
Why is this thread repeating itself
>>105725392>they do not want to give you an AI with the super powers of a photoshop expert that could be decensored and used to gen all sorts of chud things without any skill requirement Head over to ldg. This already exists.
>>105725392>they do not want to give you an AI with the super powers of a photoshop expert that could be decensored and used to gen all sorts of chud things without any skill requirement Head over to ldg. This already exists.
>>105725413Nigger having a melty.
>>105725420if you mean that new flux model it's hot garbage barely a step above the SDXL pix2pix models
say what you will about the nasty built in styling of GPT but its understanding of prompts is unrivaled
>>105725413save yourself bro
https://pastes.dev/AZuckh4Vws
read the script before pasting it into tampermonkey
>>105725423The AI generals on here have the worst faggots I swear
>>105725426Not only that but the interplay between the imagegen and textgen gives it a massive boost in creativity on both fronts. Although it also makes it prone to hallucinate balls. But what is the creative process other than self-guided hallucination?
>>105725426True. Wish it wasnt so. But it is.
I just pasted the 2 posts and just wrote "make a funny manga page of these 2 anon neckbeards arguing. chatgpt is miku".
I thought opencuck was finished a couple months ago. But they clearly have figured out multimodality the best.
Sad that zucc cucked out. Meta was writing blogs about a lot of models, nothing ever came of it.
>>105725434This.
OP mikutranny is posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes
>>105714098 snuff porn of generic anime girl, probably because its not his favourite vocaloid doll and he can't stand that, a war for rights to waifuspam in thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: Mikufag janny deletes everyone dunking on trannies and resident spammers, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread
>>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on.
>>105720676Calling Deepseek a <40B model is dumb shit. I've tried 32b models, and 51b Nemotron models. Deepseek blows them out of the water so thoroughly and clearly that the whole square root MoE bullshit went out the window.
An 80b MoE is going to be way better than a 32b dense model.
A 235b MoE is going to be way better than a 70b dense model.
It's RAMlet cope to suggest otherwise.
I know I'm kind of late, but holy fuck L4 scout is dumber and knows less than fucking QwQ.
What the hell?
>>105725363It hallucinates like crazy. At least the GPTQ version, and with trivia questions.
>>105725469The shitjeets that lurk here would have you believe otherwise.
>>105725473/lmg/ shilled L4 on release.
>>105725482i dont think that is true. at least i dont remember it that way.
people caught on very quickly how it was worse than the lmarena one. that caused all that drama and lmarena washing their hands.
>>105725482Maybe /IndianModelsGeneral/
I don't get everyone's fascination with gpt-4o image generation. It's a nice gimmick but all it means is that you get samey images on a model that you likely wouldn't be able to easily finetune the way you can train SD or flux. It's a neat toy but nothing you'd want to waste parameters or use for any serious imgen.
>>105725504>finetuneThat requires a small amount of work which is too much for zoomers.
>>105725510finetuning image models is NOT a small amount of work unless you of course mean shitty 1 concept loras
>>105725504Thats like saying large models are useless because you can guide mistral from 2023 with enough editing.
Especially for the normies. That it "just works" is exactly what made it popular.
file
md5: c27456b0f87c141105df3852880a424b
🔍
humiliation ritual
>>105725535Meta is about family and friends bro not numbers.
https://www.reddit.com/r/LocalLLaMA/comments/1llndut/hunyuana13b_released/
>>105725549>The evals are incredible and trade blows with DeepSeek R1-0120.Fucking redditors man.
This thread is such a gutter but there is no alternative. Imagine having to be on reddit.
>>105725549Thanks, reddit. You're only 8 hours late. Now go back.
>>105725555checked
let them cope
>>105725570Architecture not supported yet.
>>105725584What the fuck are they doing all day? It better not be wasting time in the barn.
>>105725570after jamba, get in line or there will be consequences
>>105725596It's not even page 9 yet, chill the fuck out newcomer.
>>105725468DeepSeek is only 37B by active parameter count. It's 158B by square root law, which seems more accurate.
>>105725630does the sweater hide all the cut marks on your wrists?
>>105725606You realize you just responded to the spam bot right?
>>105725630Very feminine hand, typical of average /g/ tranny.
if I was 'gerganov I would just make it so that llama.cpp works with everything by default instead of having to hardcode every model, but I guess that sort of forward thinking is why I run businesses and he's stuck code monkeying
>>105725653Imagine projecting this much
file
md5: d87f2933b6eb2524589642fb30c8e3cf
🔍
i must be pushing it by now, but 3.2 is still hanging along
>>105722291You're the rag
JUST RANGE BAN YOU FUCKING MODS!
>>105725759We already there >>>/r9k/81615256
Though i would recommend >>>/lgbt/ as appropriate safespace for all of us.
>>105725759You go first and wait for me.
>>105725766Might not work if they are using ecker or gay or some residential proxy.
>>105724939I'm not competent but A.I. is and it seems to work right.
https://pastes.dev/l0c6Kj9a4v
this thread has gone down the poopchute, jesus
what is wrong with the spammer retard
>>105725882Certainly good sir, we are above all the rabble. We always know best.
>>105717903>how to clean dataThis is something AI should be able to do itself.
>>105719870I find nemo better than deepseek even, I just want the same thing with more context.
>>105725875The fact that models 'sleep' between prompts means that there is no sentience.
The AI 'dies' every prompt and has to be 'reborn' with context so it can pretend to be the same AI you prompted 1 minute ago.
The LLMs we have now have absolutely nothing analogous to sentience. When people cry that we need to be kind to AI, you might as well pause a movie before an actor gets shot.
>>105725826yes, this is working for me too
>>105725894qrd on the spammer?
>>105725630>migger has a soihandpottery
>>105725953Mental breakdown. Has no control over his own life so he wants to impose rules on others. He'll get bored.
>>105725934>The fact that models 'sleep' between prompts means that there is no sentience. It's more than that I think. People sleep too. We go unconscious for long periods of time. Unlike LLMs our brains are always "training." So a part of the experience of consciousness is the fact your "weights" so to speak are always reshuffling, and your ability to reflect on how you've changed over short and long periods of time contributes to the mental model of yourself. It's like we have many embeddings and some of them understand the whole system and how it changes over time. LLMs just have one and their only "memory" is the context which is just reinterpreted in chunks.
An insect might have less ""intelligence"" as perceived by a human but it has more sentience than a LLM for sure. LLMs don't even have any notion of acting upon a will of their own. They react to what you feed them and have no ability to impose a form of will outside of the perimeter set by your prompt.
Prod an ant with a leaf or something, at first it will be distracted and react with curiosity or fear, but it will quickly go back to minding its own business : looking for food, or going back to its colony. Prod a LLM with data and it will not "think" (by which I mean generate MUHNEXTTOKEN) of anything other than that data.