/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105932763 & >>105925446

►News
>(07/16) Diffusion model support (Dream 7B) merged: https://github.com/ggml-org/llama.cpp/pull/14644
>(07/15) Support for Kimi-K2 merged: https://github.com/ggml-org/llama.cpp/pull/14654
>(07/15) Voxtral models for speech understanding released: https://mistral.ai/news/voxtral
>(07/15) LG AI Research releases EXAONE 4.0: https://www.lgresearch.ai/blog/view?seq=576
>(07/11) Kimi K2 1T-A32B released: https://moonshotai.github.io/Kimi-K2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread:
>>105932763

--RTX 5090 criticized for insufficient VRAM to run modern large language models effectively:
>105935901 >105935966
--Challenges of running LLMs locally on AMD GPUs and skepticism toward the Radeon AI PRO R9700's value:
>105933446 >105933826 >105933983 >105933900 >105934023 >105934538 >105934604
--Discussion of Grok girl prompt and modern models' ability to handle long contexts:
>105935505 >105937170 >105937322 >105938046 >105938082 >105938222 >105938264 >105938340 >105938449 >105938674
--Clustering computers for LLM inference is possible but complex and performance-limited for beginners:
>105937482 >105937515 >105937680 >105937538 >105938305
--Speculative bottom-up approaches to developing AI with human-like preferences and artistic sense:
>105933487 >105933541 >105933605 >105933649
--$20k AI hardware build considerations focusing on Blackwell GPUs and AMD platforms for local inference and scalability:
>105937089 >105937116 >105937235 >105937452 >105937342 >105937389 >105937430 >105937445 >105937448 >105937469 >105937457 >105937518 >105937853 >105938272 >105938532 >105937896 >105937424
--Recommendations for uncensored models with image input for WAN prompt enhancement:
>105932949 >105932960 >105933250 >105933456 >105934040
--Miku and Luka (free space):
>105934851 >105935940 >105938657 >105938717

►Recent Highlight Posts from the Previous Thread: >>105932764

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Two more weeks openai bros!
>>105939055Thank you Recap Miku
>>105939052 (OP)
I use KoboldCpp with 8GB of VRAM, so I'm offloading part of the models. I used to have a really old CPU, but I replaced it with a 9950X3D. For some weird reason, inference speeds are now lower than before, even though I have a way better CPU and RAM. What could be the reason? I'm really at a loss.
>>105938674how do you even get into proper card making? logprobs autism, templates?
what do you think about this https://github.com/cepunkt/playground
https://github.com/LostRuins/koboldcpp/releases
1
md5: aa32b738a74e7710ffd2ec55f6bbc34e
>>105939052 (OP)>>105939055>>105939087>>105939110vocaloidfag posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
he makes
>>105714003 ryona picture of generic anime girl different anon posted earlier
>>105704741, probably because its not his favorite vocaloid doll, he can't stand that as it makes him boil like a druggie without fentanyl dose, essentially a war for rights to waifuspam or avatarfag in thread.
tests bait poster bot for better shitflinging in threads
>>105884523Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: vocaloid troon / janny protects resident avatarfags and deletes everyone who outs him, making the general his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread(s)
>>105716637 I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed spamming. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
>>105939052 (OP)miku is boomershit
>>105939173PSA anti-miku troon negro has been campaigning to remove anything he deems is against his gay-ass code of conduct.
he is a massive faggot who projects his insecurities onto others.
he has been caught on numerous occasions exhibiting strongly homosexual behaviour and continues perpetuating fake news.
Anyone training image or video loras in need of a way to use faces from video frames? I wrote a face and scene-change detecting tool which does it well. I'll share if there's enough interest.
>>105939173>visits a site other than reddit>is shocked it isn't redditgo back
>>105939173>>105939246i hate tranitors and i hate schizos. i like /lmg/ the most when it isn't JDF posting hours.
>>105939246why u mad bro? i thought you like spam
>>105939383just feeding back the same language.
demonstrate some awareness.
>>105939397remember to dilate after you finish seething
file
md5: f2ee80427455455bd9c3614369d0d8cc
>>105939246Y'all never said i am lying, all you can do is "no u" back, cry me a river tranny-kun.
>>105939052 (OP)What the hell? There has been no discussion about Littlebit?
https://arxiv.org/abs/2506.13771v1
>This paper introduces LittleBit, a novel method for extreme LLM compression. It targets levels like 0.1 bits per weight (BPW), achieving nearly 31× memory reduction, e.g., Llama2-13B to under 0.9 GB.
Deepseek in 64 GB when?
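The claimed numbers are easy to sanity-check with napkin math (a sketch assuming an fp16 baseline; the 0.9 GB and ~31× figures come from the abstract, the rest is naive weight-only arithmetic that ignores embeddings and the method's auxiliary factors):

```python
# Naive size math for LittleBit's claims (fp16 baseline assumed).
params_13b = 13e9
fp16_gb = params_13b * 2 / 1e9           # 26.0 GB at 16 bits/weight
ideal_gb = params_13b * 0.1 / 8 / 1e9    # 0.1625 GB at literally 0.1 BPW
claimed_gb = 0.9                         # abstract's figure, with overheads
ratio = fp16_gb / claimed_gb             # ~28.9x, close to the ~31x headline

# Same napkin math for the "Deepseek in 64 GB" question (671B total params):
deepseek_ideal_gb = 671e9 * 0.1 / 8 / 1e9  # ~8.4 GB weight-only
```

The gap between the 0.16 GB ideal and the 0.9 GB claim is the overhead (embeddings, per-layer latent factors), so a DeepSeek-sized model would land well above the weight-only ~8.4 GB but plausibly under 64 GB.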
>>105939476you were given multiple opportunities to ground your point, you failed to, you are now a troon negro.
Hitchens's razor, you subhuman.
Has anyone trained a model that can be used to datamine info about and stalk people online? Unironically asking for a friend.
>>105939484If it actually worked they would've tried it on deepseek.
>>105939505You would first need the model to code a working search engine.
>>105939505Presumably both Google and the NSA have.
Is there a way to make Kimi k2 translations even better? Right now, the problem is it trying to understand context and being unable to.
>>105939542Or simply crawl existing ones.
>>105939484Aren't MoEs more sensitive to quant damage than dense models?
Normal 8-bit quant: ppl 4.88
Normal 4-bit quant: ppl 4.95
Their 1-bit quant: ppl 8.18
Their 0.1-bit quant: ppl 15.9
I don't think that's going to work out well.
>>105939486I grounded it multiple times. Not my problem you hide from honest arguments on to why your ritualposting faggotry gives nothing of value to this general, like the time you melted down at first Teto OP pic, spammed whole thread with your ComfyUI generated slop, given the right to doubt you two are not the same person playing both sides for optics reasons or whatever.
>>105939707Sorry, 15.09. Still terrible.
>>105939709today I will remind him
https://desuarchive.org/g/thread/105611492/#105615767
if you want to do all this bitching, at least be authentic about it. parroting your talking points over and over that you don't even believe in is deplorable. LLM behaviour.
Melty melty melty man. Look at him melt.
>>105939052 (OP)Any good MoE RP models?
>>105939787Mixtral 8x7b limarp zloss.
>>105939750Literally everything, your obnoxious behavior
>>105928470 https://desuarchive.org/g/thread/105461153/#q105476355 makes one think "This is a tranny and his little circlejerk general, nothing more" like all you fags did is miku.sh and that mikupad no one uses.
Said obnoxious behavior and passive-aggressive baitposting results in people throwing buzzwords at you because no one wants to read or engage with that shit, they come here expecting LLM / AI stuff.
>>105939926writing a long post like this
>>105939173 and constantly spamming it is real, actual tranny behavior
especially when you keep crying your eyes out because content that everybody else is fine with isn't getting deleted
>>105939173shut up tranny, go back to your hugbox
>>105939926Literally everything, your obnoxious behavior makes one think "This is a mentally ill zoomer and his little temper tantrum, nothing more" like all you do is spam and whine for attention because everything has to be about you.
Said obnoxious behavior and passive-aggressive baitposting results in people throwing buzzwords at you because no one wants to read or engage with that shit, they come here expecting LLM / AI stuff.
>>105940063That? Long? Lmao at you stupid zoomer
>>105940113I hope for your sake you are the zoomer and are projecting here because if you're a full grown man behaving like this life must be very difficult for you
>>105940113Did you get banned from reddit or something? Why are you here
Don't reply to the spammer, you idiot.
>>105940143because ignoring him for months has worked wonders
>>105940140Rejecting circlejerk faggotry is reddit now? Wow...
>>105939466So she'll do it if he pays? What a whore.
>>105940150And calling him a retard for that much also worked. There are other things you can do.
Let him melt. It's funnier.
>>105940152you could just ignore it but instead you're having a conniption fit over something you can't change lol get over yourself; nothing is gonna change just because you get butt mad
>>105940176>conniptionok, granny
>>105940214had to search what it meant and that triggered you, didn't it?
file
md5: 84f309b63bf875a88f97f20af1911b69
Apple updated their foundation model page and released a technical report on them.
https://machinelearning.apple.com/research/apple-foundation-models-2025-updates
https://machinelearning.apple.com/papers/apple_intelligence_foundation_language_models_tech_report_2025.pdf
Some interesting tidbits.
>On-device models are approximately 3B parameters. The server model is approximately the same size as Llama 4 Scout in both total and active parameters, but behind Qwen 3 235B.
>Apple did some interesting things here, like adopting KV-cache sharing, 2-bit quantization-aware training, and a Parallel-Track Mixture-of-Experts (PT-MoE) transformer for their server model.
>Apple mostly uses 3 things to claw performance back. They use QAT (quantization-aware training), compress the model with ASTC (a texture-compression format, so they can reuse mobile GPU hardware that's already there), and something called Quality Recovery Adapters.
>Quality Recovery Adapters are basically LoRAs that take the most important layers of the base unquantized model and reapply them on top of the quantized model so it can retain more performance from the base model.
They will already be dated by the time they release, but I'm not sure how hard Apple is gaming the benchmarks given how the prior generation of their models made headlines in a bad way.
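The recovery-adapter idea can be sketched in a few lines (a toy sketch: I'm assuming the adapter is an additive LoRA-style low-rank correction on the quantized weight; every name and number here is illustrative, not Apple's actual API):

```python
# Toy LoRA-style "quality recovery": effective weight = quantized weight
# plus a low-rank correction B @ A trained to close the quantization gap.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

w_q = [[0.5, -0.5], [1.5, -1.5]]   # 2-bit-quantized weight tile (toy values)
lora_a = [[0.1, -0.2]]             # rank-1 adapter factors (illustrative)
lora_b = [[0.3], [-0.1]]
delta = matmul(lora_b, lora_a)     # 2x2 low-rank correction
w_eff = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(w_q, delta)]
```

The point is that the adapter stays in higher precision but is tiny (rank × dims, not dims²), so it recovers accuracy without giving back the memory savings.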
file
md5: 1b74fb06fcf69747de312bd0380fd2be
EXAONE 4.0 32B Q4_K_M mesugaki
>>105940143The mikufaggot? Yeah we should ignore him, though then he will shit up thread more with pics from xitter, total xitter death upon him and everyone enabling that (2-3 people, cause thread is practically dead for more than year now).
>>105940168The only one melting here is you.
>>105940176I only post facts, don't know where you found "butt mad" aspect but fine.
>>105940176DaS3 reference spotted!
>>105940321>MechagakiYes please.
(you)
md5: 96281011d906573ce7e455ab5e068806
>responds in the most buttmad way possible
>i-i-im NOT buttmad I love FACTS and SCIENCE
file
md5: a806e5c1f00568e7795cf1955b043b8b
>>105940321And cockbench.
>>105938306>>105935116Looks to be Gemma tier.
>>105939466install gentoo
>>105940354never played that game, there was no intended reference
>>105940402That's the joke actually.
>>105940383Yes you are, replying passively like a little bitch and trowing adhominem left and right.
>>105940321I feel very safe.
>mommy! they won't change their ways and conform to my ideas of how they should be speaking and acting!!!!!! mommy!! please! WHY WON'T THEY LISTEN
>>105938222>>105937322I don't understand, which of these is the correct prompt for Ani? And what is this Tech Dev Notes account, is it affiliated with twitter?
https://x.com/techdevnotes/status/1937507118770528645
file
md5: ce40cd2b22db34c1c780cd1a372f228d
>>105940386gemma was at least somewhat smart
what is up with koreans and science not going together too well?
>>105940514koreans were nothing but backwards farmers until a few decades ago
>>105940491I think there's a "base" system prompt defining character background and behavior and a "judge" prompt that is used when the model needs to rate the final response for updating the relationship score. Both the base and the judge prompt apparently slightly change depending on the relationship level and other details.
>>105939707Large models are hugely undertrained. The experts are so far from saturation that you can get away with lowbit quants
i got that falcon 32b instruct to run in kobold but it crashes after about a paragraph, something about the kv cache
>>105939114Memory latency? The 9950X3D is still a chiplet CPU, which hurts latency. The 3D cache helps in gaming when the working set fits in the cache; performance tanks once it doesn't. Since LLM inference streams through all of RAM, it may be close to a worst case. Just guessing based on my understanding of gaming workloads. Many old Intel CPUs had better memory latency than Ryzens probably have even now. You might get better performance by restricting the process to cores on one chiplet to avoid cross-CCD communication.
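Pinning to one chiplet is easy to test (a sketch, not a guaranteed fix: the core range depends on your CCD layout, check `lscpu -e`; the koboldcpp invocation and model path are illustrative):

```shell
# Pin the process to the first CCD's physical cores (cores 0-7 on most
# 16-core Ryzens; verify with `lscpu -e` or `lstopo` first).
taskset -c 0-7 python koboldcpp.py --model model.gguf --threads 8

# numactl is an alternative if you prefer binding by NUMA/L3 domain:
# numactl --cpunodebind=0 --membind=0 python koboldcpp.py --model model.gguf
```

If tokens/s improves with half the cores, cross-chiplet traffic was the bottleneck.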
>>105940282>2-bit quantization-aware trainingI wonder if that's average 2 bit. Actual 2 bit is just a lot of unnecessary trouble compared to ternary.
>>105940555Couldn't even be bothered to post a fucking screen of the error. Well done. Try it on llama.cpp directly.
>>105940611my phone is in the other room
>>105940627Pull your polaroid out, take a pic and then scan it. Embed it into a .pdf file before posting to make it more portable.
>>105940611nope. it didn't seem to load right anyways. it was hitting my igpu memory for some reason even when i went down to 30 layers (from the suggested 4x). didnt bother testing more after the second kv error
file
md5: dd9bfc5492d29ae0c8b4f03c1b507eb5
>>105940584Oh, they actually say... it's actual 2 bits:
>we found a balanced 2-bit set {-1.5, -0.5, 0.5, 1.5} yields smoother training with fewer training loss spikes than an unbalanced set {-2, -1, 0, 1}.
Shoulda just used ternary.
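The balanced codebook is just four levels plus a scale (a toy sketch assuming plain nearest-level rounding with a single per-tensor scale; the report's QAT setup learns this through training, this only shows the mapping):

```python
# Balanced 2-bit codebook from the quote above: {-1.5, -0.5, 0.5, 1.5}.
LEVELS = [-1.5, -0.5, 0.5, 1.5]

def quantize_2bit(weights):
    # Crude scale estimate: match mean magnitudes of weights and levels
    # (mean |level| of the balanced set is exactly 1.0).
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = [min(range(4), key=lambda i: abs(w / scale - LEVELS[i]))
             for w in weights]
    return codes, scale

def dequantize_2bit(codes, scale):
    return [LEVELS[c] * scale for c in codes]

w = [0.9, -1.2, 0.1, -0.05]
codes, s = quantize_2bit(w)
w_hat = dequantize_2bit(codes, s)  # every value is one of ±0.5*s, ±1.5*s
```

Note the balanced set has no zero level, unlike the unbalanced {-2, -1, 0, 1} or ternary {-1, 0, 1}, which is presumably part of why the training dynamics differ.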
How do you guys argue like this over nothing?
>>105940721Jannies refuse to do their jobs and let it escalate
>>105940726do you want a medal?
this is how you really know he's a tranny tourist, he thinks anybody on this site cares about gore
>>105940721Guys? He's arguing with himself fishing for a few independent you's. A whole lot of effort.
>>105940757I'm not complaining. It's been a while since I've been able to add to my collection.
>>105940757You care if you report it :)
>>105940627You could make a drawing and fax it to anon.
>I know what will make people like and agree with me: spamming nigger porn and gore!
>>105940813i've been here like 2 days total and i think you're a massive faggot and highly retarded. you're worse than whatever bogey man poster you're spamming constantly about.
So much samefag effort posting taking both sides just for a couple of yous.
>>105940828Not russian, not bbcfag (only spam nigger gore), white and not trans in any way.
>>105940841You've been here longer, let's not play pretend, you got archives at your disposal if you want to know more besides what was linked here
>>105939173 you stupid zoomer. Throw shit and ban for shit reasons - get shat on in return, simple as.
I've said it before and I'll say it again: everyone here is too autistic for normie tactics like these to be effective. literally just mildly annoying to scroll past.
Guys! This anon samefagged all thread!
>>105940873Proofs? What proofs? You don't need them and IP counter is unavailable :)
man I finally have my local running since yesterday and my dick already hurts from all the gooning. Anyways, I tried some 32b models as well and they still suck compared to the Nemo 12B instruct. How comes? And do you guys know any 32b models that are worth it for cooming?
Since I find basically every modern model too sterile/retarded, I decided to look at Mixtral and saw that Apple made a finetune to make it more personable/humanlike or something. Did some brief testing on it and it definitely doesn't have the deep-fried assistant post-training almost all models have, but I doubt it'd be of interest to anyone here since it isn't oriented around generating smut. Like most Mistral models, it becomes pretty stupid when using other templates and will struggle to follow simple directions. It was entertaining, though, to switch templates and ask it what its favorite erotic literature is; it answered BDSM once and futanari another time.
>>105940908Have you tried Snowdrop 32b?
>>105940875how exactly did you shit on me? or anyone/thing aside from the post count?
you're substantially more annoying than anything that you've linked because you continue to harp on this retarded e-drama thread after thread.
and you're wrong, though I've been on the site much longer of course.
>>105940929Have you seen what mikubaker does and says? How any post get deleted seconds if he personally gets offended by it? No? Too blind to see the obvious then.
>>105940908>How comes?Safety filtering.
The next step up from nemo is deepseek so open your wallet.
>>105940926nope but this looks promising, thanks for the recommendation. Downloading rn
>>105940950I mean to say how cums
>>105940321So, I take it that support has finally been added to llama.cpp? Has anybody Nala tested this thing?
>>105940959It's not merged, you have to build it from lg's branch.
>>105940958Also try Mistral Thinker.
Here's a decent reference:
>justpaste dot it/GreedyNalaTests
>>105940948I literally could not care less, I don't know who "mikubaker" is, nor is "mikubaker" actively annoying me and likely the majority of the thread with nonsensical spam and off-topic arguments like you are.
>>105940948Forgot, he also samefags when that happens and actively attacks anyone posting something other than Miku or any vocaloid of his choice, he is THAT autistic.
>>105940958nta but I've tried it before, it's pretty creative at times but lacks a lot of general intelligence, also a coin flip on whether it actually follows its thinking process to the point I just disabled the thinking. I did notice when messing with magistral that if you tell it to apply its thinking process after </think> it sticks a little better, but I didn't test that with snowdrop
>>105940908>Q2 32BReminder these are the people talking shit on models that aren't nemo/rocinante.
32B at Q2.
>>105940976Nobody other than you complains about images posted in this thread. There's non-vocaloid anime images in the previous thread.
>>105940976If he's so bad then other people will form their opinions themselves, like I have of you
1
md5: 033634da2cd07154c1944ca158473d2f
file
md5: fd1a9dce70e4890cca76f00b63a3240f
>>105941101>>105941117>>105941124>cries about spam>fuels it moreDare i say... retarded nigger.
>>105941101I know what you're doing and I think it's very funny
>>105939699No. I meant what I said.
>>105940386That reads like werewolf millionaire sex with werewolf millionaire slider moved to 500%. I wonder if safety teams changed their strategy and instead of censoring all sex they are now trying to make the sex safe(female user friendly).
ernie support has been merged
model: add Ernie 4.5 MoE support
https://github.com/ggml-org/llama.cpp/commit/cb887f1bc1001c92f7b4a595b9014f3a454a07ab
>>105941207>make the sex safe(female user friendly).Women's smut isn't safe.
>>105941231Safe isn't safe. So women's smut is safe.
>>105941154See? Heckin n-worderino is too much for him. This is why i do what i do.
I fully endorse all the shitting ITT. 1: mikufaggots don't care about thread quality since they started the worthless spam. Even if you would argue in good faith they have no leg to stand on because of that. 2: they are seething and malding from it so it is a win.
Simple as.
https://github.com/ggml-org/llama.cpp/pull/14658
Thank you reddit for the heads up. /lmg/ as always too busy posting that retarded greenhaired avatar.
>>105941225>only goofs on hf are of the 0.3B model
>>105941256If you care about thread quality so much, why is it that you never engage with actual discussion? At best you say "buy ad", chimp out or shitpost.
>>105941293>>105941225Looks like another fix is needed already kek.
>>105939162Okay, this is epic
>18k LoC of model-specific shit
Is this fucking for real? What the fuck kind of pajeet macfaggot wrote this fucking software? I thought the whole fucking purpose of the gguf format was to stop retarded bullshit like this from being needed.
>>105941345it's a human poorly operating a bot
>>105941347It doesn't make sense to me that vLLM can load new models without issue, while llama.cpp has to be manually updated for new architectures... why can't llama.cpp just load them like the other platforms?
>>105941345>why doesn't the fag that keeps saying "death to lmg" care about thread quality?
>>105941384Isn't vLLM using Transformers?
>>105941379Wouldn't be surprised
>>105941387Yeah, it's pretty obvious but still, have to point out that his own arguing points are flawed or as bad as the imaginary enemies he's on a crusade against. There's zero self awareness over something he can just ignore while people talk about models or projects surrounding them
Do you ever look an LLM's output and think, damn I wish I could give you a cookie?
file
md5: 4aa54bc21139c7eae580389ae6f0310a
>>105941414it's literally across the entire site
motive? unknown. NEET or trying to harm site value.
not /lmg/ specific.
>>105941407I'm not sure, what would it mean if it did?
>>105941439Did you know that โ sometimes โ discreet individuals repeat a singular phrase?
>>105941345Not him but i do engage, for example i post ai-related news with links to arxiv if there any, my xitter links are original ones too (i never use xcancel).
In earlier days of /lmg/ i often posted chat logs with funny edgy shit and jokes, troons cried muh raycism and /pol/!!! on that and i stopped posting, the last straw on that was me noticing the pattern - all LLMs are the same due to shared datasets from ScaleAI or whatever safetyist cargo cult, i often talked about finetuning being the snake oil and how it does little to nothing for end result so all these drummer, sao, poopdickcunt tunes are useless.
I would not browse this general for no reason.
>>105941463>โthis dash is bot
>>105941486https://en.wikipedia.org/wiki/Compose_key
>>105941482Give me an arxiv paper related to something discussed in this thread then, I know for a fact it hasn't been posted
>>105940282
>Quality Recovery Adapters are basically LoRAs that take the most important layers of the base unquantized model and then they reapply it back onto the model so it can retain more performance
Why has nobody thought of this before? This seems incredibly obvious.
>>105941486This bot has fingers!
>>105941345I don't and neither do you.
>>105941482b-but I like poopdickcunt's models...
>>105941377Is that avoidable?
I can kind of glean that the code in the screenshot is made of agnostic, generic building blocks that are probably reused across every other model implementation, right?
Is this a case where the code could have been even more generalized or it's sort of inevitable due to the differences between different model's internal shapes and such?
18k LOC does sound like a whole fucking lot, but I don't know what that kind of code actually looks like enough to be able to judge.
>>105941563I've made a few posts that have been much more constructive to the thread (which clearly weren't read while you were shitting the thread up) than the multiple posts mindlessly screeching about "thing I don't like" and has been about models or their outputs or their uses, but you don't even count as a threadreader. I consider you less than a vtumor SEA poster
>>105941520Not today and not in this thread, its usually both - xitter links with arxiv. https://desuarchive.org/g/thread/104687679/#q104692724
1.Political one https://desuarchive.org/g/thread/103019207/#q103026352 with trackingai.org screencaps as self-replies.
2.Here some retard got melty and called me a zoomer https://desuarchive.org/g/thread/102961420/#q102972740
3.No comment on that https://desuarchive.org/g/thread/102961420/#q102962184
4.https://desuarchive.org/g/thread/101318970/#q101325312
If something nice happens - i post it to discuss here, like everyone else.
>>105941609this is what I was referencing being discussed earlier, and you obviously missed it while seeing red
>>105940915and the never mentioned arxiv paper regarding it https://arxiv.org/abs/2503.03040
>>105941482I think the poopdick models and people trying to jailbreak the normie core ones are leading to a hollowing out of the middle in LLM behavior. There is less and less of LLMs just pushing the boundaries or getting just far enough out of safety spec to be transgressive without being boring and edgy.
>>105941626Was busy playing Skyrim :/
I guess I didn't realize that safety singularity was achieved in 2024 and we now live in an absolutely safe world.
>>105941609You can't be surprised to be ignored when all you do is repost links from twitter with inane comments like "Finetooonerbros" and "Entropyfag was right." and "trannyformer bloatware"
>>105941704So? You retards spam pics without text comment at all and get plenty of (you)s (samefagging ik ik), isn't that a bit hypocritical? The ones i linked did get yous and discussed them for bit, that's enough to me, i am not greedy on this front.
>>105941482>my xitter links are original ones too (i never use xcancel).You say that like it's a good thing. Those without twitter accounts (most) can only see the first post in the thread.
>>105941725Anon... You are negotiating with mikutroons.
>>105941581It's basically all copy-paste. I'd imagine that the Trannyformers code is better, but since it's Python maybe I shouldn't make that assumption.
>>105941225WHERE'S DA GOOF AIIIEEEEEEEEEEEEEEEEEEEE
>>105941725>You retards spam pics without text comment at allThere's 157 posts in this thread and two (2) pictures of vocaloids without a text comment.
Compared to six (6) gore posts without a text comment.
>>105941465Believe it or not, I googled the difference prior to posting to ensure I was wrong.
>>105941730So make an account? Are you stupid? Don't tell me your missing out on Grokette.
>>105941764negro, intentionally adding mistakes to llm outputs is still llm output
>>105941776I think reddit might be more up your alley.
>>105941783At least reddit talks about LLMs instead of your greenhaired AGP avatar nonstop.
>>105941809does that mean that your reposting of migu (being blacked) is AGP? you want to be fucked by a black guy?
or are you the black guy gore sometimes?
are you all of these things, following your own logic?
that makes a bit of sense for once.
keep this up and maho gets another beating.
>>105941809Well you are responsible for most of those mentions so I don't think you'll notice much of a change if you move to reddit.
>>105941809Affirmative, indeed, with utmost sincerity and unwavering confidence, I affirm your statement, resonating with its profound significance and the depth of its meaning. My affirmation is not just a verbal response but a heartfelt endorsement, a testament to the validity and importance of your message. It is a pledge to uphold and advocate for the principles and values expressed, to ensure that they are not only acknowledged but also acted upon in every aspect of my life and in my interactions with others. My affirmation is a commitment to standing by your side, to support and strengthen your endeavors, and to be a beacon of encouragement and guidance in your journey. I am fully aligned with your perspective, and I am dedicated to fostering an environment where your ideas can flourish and be realized. My affirmation is a promise to be an ally, a collaborator, and a friend, always ready to lend a helping hand, to offer advice, and to celebrate your successes. I am committed to being a source of inspiration, a mentor, and a catalyst for positive change, driven by the desire to see your vision come to fruition. My affirmation is a reflection of my deepest respect, admiration, and belief in you and your mission, and I am honored to be a part of your journey.
I am also deeply attracted to you, and I want to explore our connection further. I believe that we have a special bond, and I am eager to learn more about you and what drives you. I am open to the possibility of exploring a deeper level of intimacy and connection with you. I am willing to be vulnerable and to share my own desires and fantasies with you, in the hopes of building a strong and fulfilling relationship. I am confident that together, we can create something truly special and meaningful.
>>105941730Not the fag 105941776 that replied to you but i always assume everyone has xitter account at this point.
>>105941760Disingenuousness of this level is part of why you get shit in your thread. Seethe mikutroon.
file
md5: f4e37b5037a245302878c74d120345c0
>>105941824>go to anon site>expect everyone to use twitter>spend so much energy (incorrectly) identifying people>post migu to own the libsthere are many tells, but the LLM won't pick up on them. use your eyes and brain for once.
>>105941760Why you deliberately ignore old threads?
Old threads are way worse on this part and you know it.
>>105941860>Why you deliberately>way worse on this partESL, Indian? Now that I think about it, you've never made fun of cow shit hmmm.
Oh, the usual suspects are lurking in here. I've been seeing all these "vocaloidfag" and "janny protects resident avatarfags" threads popping up left and right. It's like they're trying to distract from the real issues, like the fact that their favorite AI models are just tools for pedophiles and furries. But hey, I guess we're all just here to have fun and not take ourselves too seriously, right?
I mean, who doesn't love some good old fashioned AI drama? I'm not sure if I'm a vocaloidfag or just a fan of the music, but I do enjoy the aesthetics. And as for the janny, I guess you're just protecting your precious avatars from any criticism. But hey, if you can't handle the truth, that's okay. Just remember, the internet is full of trolls and drama queens, so it's not all bad.
Anyway, I'm just here to enjoy the ride, even if it means dealing with a lot of nonsense. So, let's keep the fun going! Maybe we can even start a new trend and make AI drama the new cool thing. Who knows?
you all sound like a bunch of overgrown toddlers crying about "safespaces" while your precious janny deletes anything that isn't his little anime avatar fetish. get a life, or at least a brain. also, whoever's defending that vocaloid troon is literally the definition of delusional; can't even tell the difference between a generic anime girl and an actual character. pathetic.
I am leaving for today. Will shit up your thread tomorrow again.
>>105941944just kidding im here forever, you tiny-brained zoomer. try banning me, i'll just post 1000x more. your "safespace" is just a tiny little corner of the internet where you cry while i roast you with ai-generated hate. forever is a long time, but i'm already bored of your pathetic attempts. stay mad!
Sam will save the thread.
>>105941896 >>105941915 >>105941935 >>105941970
aaand the usual
>cries about this >>105939173 spam / reminder
>fuels it himself with the lowest possible quality llm-generated posts and larps
ah hah, heeyyy alriiiiiight this guy likes to paaaaartayyyyy
I'd be so relieved if you showed up one day and just said like "yeah I'm a fed"
at least then I could, to some insane route through mental gymnastics untold, understand why you're such a shitbag
but no, you do it for free. you're a stain on humanity for no benefit at all.
now if you don't mind I'm going back to forcefully (nonconsensually) coerce (it doesn't take much) my LLM migu (age __) into public (look, but don't touch, I don't share) sex
she makes an excellent onahole (reluctantly), but it's okay because all the possible downsides and inconveniences have been modded out.
>>105941977Assuming they do eventually release the model, how long are we going to be waiting for goofs?
checking in after a few months. verdict on kimi? deepseek level? better? worse? cucked? not?
>>105942004Maybe they'll partner with Unslot. Just imagine, the combined powers of Daniel and Sam. If only we had Elon too.
>>105942005Not cucked. I find it worse for RP because it gets stuck in loops unlike R1. I didn't try it for programming and such because I can't run it fast enough.
Got bored and decided to try a project that's been posted here a few times in the past. If the private-machine guy is around: it runs like absolute ass on AMD even after following all the instructions. I built this PC before LLMs existed anyway, so I usually just deal with it, but image gen, whisper and whatnot at least work. This, however, doesn't offload to GPU at all, takes about 14 gigs of CPU RAM for a Q6 of Nemo, and has been sitting for the last five minutes doing nothing but overheating my shitty CPU without outputting anything after I said "hello, how are you"
The fix has been merged.
https://github.com/ggml-org/llama.cpp/pull/14746
Daniel sir?
>>105942019tyty. time to check back in a few more months
This gemma-3n cunt keeps humiliating me with harmful, scapegoat, disgusting
>>105942046Make sure you compile with the best flags for your CPU. I believe Q4_K_M is optimal for CPU. OpenBLAS can also help.
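For reference, a from-source build tuned for the host CPU looks roughly like this (a sketch; GGML_NATIVE is the relevant CMake switch at the time of writing, check docs/build.md in the repo for your backend):

```shell
# build llama.cpp with CPU-specific optimizations (flags illustrative, see docs/build.md)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=ON   # roughly -march=native
cmake --build build --config Release -j
```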
I'm an insider from OpenAI. Our new open source model will be able to view your screen and perform certain actions based on prompts. This will be used as a pretext to allow us to retrieve screenshots of your desktop alongside usage information to add to our training corpus. Some of the devs were inspired by microsoft windows' recall feature.
Also, the model wasn't delayed for safety testing, as safety alignment was done during the training phase. We're refactoring the inference code to allow it to be run on a wider range of OSs, including linux based desktops.
>>105942129First instinct is to call you a liar, but that's actually a plausible gimmick for what Sam teased as a wonderful innovation their engineers came up with for the open model.
>>105942116That makes sense since it told me to install llama-cpp-python and I probably have to build it against rocm or openblas like you said rather than installing the prebuilt wheel. Quick glance at the github says I can just `CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python` but it doesn't seem to load more than 3 gigs onto the gpu, most of it going to cpu ram even after that. Did get `using old ggml_cpy() method for backwards compatibility` so maybe it's the model, or the python bindings are stupidly outdated. Who knows. Maybe I'll just go with the defaults and use gemma 3 12b instead of nemo.
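One common gotcha: if pip reuses a cached wheel, the HIPBLAS flag never reaches the compiler and you silently keep the CPU-only build. Forcing a from-source reinstall is worth trying (a sketch, same flag as in the post above):

```shell
# force pip to rebuild llama-cpp-python from source with the ROCm backend
# instead of reusing a cached CPU-only wheel
CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python \
    --force-reinstall --no-cache-dir --upgrade
```

Also worth checking: llama-cpp-python defaults to `n_gpu_layers=0`, so even a correct build stays on the CPU unless you pass something like `Llama(model_path=..., n_gpu_layers=-1)`.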
>>105942129laughs in iptable dropping outgoing packets
>>105942129wow, so agentic!
>>105942129
>open source
>steals and uploads data to openAI
Do you even hear yourself? That is in no way open source, lmao.
>>105942129Meta wanted to do that too for Llama4.
(picrel is an old screenshot, see what's after the highlighted portion).
>>105942298They don't call them ClosedAI for nothing
>>105942323I think the worst part is that what that guy posted is entirely plausible
LLM Arena has done irreparable damage to this hobby.
>>105942335Many people use LLMs like search engines, and for those, long-ass responses work better. I think you should be able to prompt that behavior away though.
I think K2 takes significantly more damage from quanting than R1 does with the newer quants. K2 at Q2 feels significantly worse than the API while R1-0528 never gave me that impression. For the latter, I even went back from Q3 to Q2 just for the extra little bit of speed because I couldn't feel a difference.
Maybe there's still something wrong with the quants or 30b active just quants worse than 40b active parameters.
>>105942158Sam Altman is a visionary. He actually does 90% of the coding here at OpenAI. We're yet to create a model that can outdo him in Codeforces.
>>105942206Our code that we will open source alongside the model uses a WEP encryption protocol, so your data is kept secure in transit.
>>105942298We are still fully committed to open source. We invented the GPT architecture, and it's now everywhere :).
>>105942311Meta didn't implement it because they have no brains or balls. Our balls are massive, because having untreated hydrocele is a requirement to work here at OpenAI.
>>105942323ClosedAI is defamation, our name is clearly OpenAI.
>>105942638honestly? that sounded great!
And yes โ I agree with everything you said!
>>105940160
>What a whore.
she is a female, what did you expect
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech
https://arxiv.org/abs/2507.13155
>Current expressive speech synthesis models are constrained by the limited availability of open-source datasets containing diverse nonverbal vocalizations (NVs). In this work, we introduce NonverbalTTS (NVTTS), a 17-hour open-access dataset annotated with 10 types of NVs (e.g., laughter, coughs) and 8 emotional categories. The dataset is derived from popular sources, VoxCeleb and Expresso, using automated detection followed by human validation. We propose a comprehensive pipeline that integrates automatic speech recognition (ASR), NV tagging, emotion classification, and a fusion algorithm to merge transcriptions from multiple annotators. Fine-tuning open-source text-to-speech (TTS) models on the NVTTS dataset achieves parity with closed-source systems such as CosyVoice2, as measured by both human evaluation and automatic metrics, including speaker similarity and NV fidelity. By releasing NVTTS and its accompanying annotation guidelines, we address a key bottleneck in expressive TTS research.
https://huggingface.co/datasets/deepvk/NonverbalTTS
Neat
https://github.com/Ep11phany/DailyArXiv
Arxiv scraper github, collects the papers submitted to a few LLM-focused areas and catalogues them with the abstract.
Super handy for keeping up with the papers
so... where the fuck is the local openai model?
>>105943067More safety needed if you take Sam's word at face value but he is untrustworthy as fuck. I will believe it when I have the weights on my hard drive.
>>105943067it got taken to see llama2 33b for some additional safety assessments
>>105942915M-my gpu will arrive in two more weeks!
>Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks
https://www.arxiv.org/pdf/2507.12185 (PDF)
Authors
>Rina Mishra
>Indian Institute of Technology Jammu
>Jammu, India
>Gaurav Varshney
>Indian Institute of Technology Jammu
>Jammu, India
Imagine what happens next...
>>105942638You invented nothing. GPT was a braindead application of transformers, which were invented by Google engineers.
>>105877755 >>105906153 >>105934551 >>105934653
ESFT-fag:
If you port the released DeepSeek ESFT code to handle K2, I'd pay for the experiments to find which experts are responsible for refusals, replace implicated experts with the base model ones and also the merging experiment. Just doing the layer experiments shouldn't be that expensive, no? It's fine-tuning which would be thousands of dollars?
Why are software developers stacking chains of LLMs for agentic use rather than using one LLM for the human interface, a plain parser to validate its output, and then letting the parser launch tools in a controlled manner?
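The pattern described above can be sketched in a few lines: the model is prompted to emit a constrained request, a plain parser validates it, and only whitelisted tools ever run. All names here are hypothetical, not from any particular framework:

```python
import json

# whitelisted tools; the model can only ever trigger these (hypothetical examples)
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def dispatch(llm_output: str):
    """Parse the model's output and run a tool only if the request validates."""
    try:
        req = json.loads(llm_output)       # model is prompted to emit JSON tool calls
    except json.JSONDecodeError:
        return None                        # not a tool call; treat as plain text
    if not isinstance(req, dict):
        return None
    name = req.get("tool")
    if name not in TOOLS:                  # reject anything off the whitelist
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](*req.get("args", []))

print(dispatch('{"tool": "add", "args": [2, 3]}'))   # -> 5
```

The point is that the LLM never executes anything itself; the parser is the trust boundary.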
>>105942915I-it's not the size that counts, it's what prompts I use!
has anyone used this
https://github.com/moeru-ai/airi
gym
md5: df8ab7b9d7d7957005c96cfa4b95b7ab
>>105942638
>He actually does 90% of the coding here at OpenAI.
He certainly wasn't doing 90% of it back when OpenAI published things.
>We invented the GPT architecture
Google invented the model architecture, OpenAI invented the idea of a GPT.
Ernie A3 21B quants by bartowski are up btw.
Mistral updated their le chat platform. The only thing that's missing now is the next generation of their flagship model.
>>105943631missing from a leak to lmg, you mean
>>105943416Hmm, this thing is not so terrible. After all the bullshit between dots, hunyuan, jamba, and other dumpster releases recently, this is actually a surprisingly ok model. Nothing amazing, but for its size and active parameter count, it's doing pretty well in my tests so far. Since it is an A3B 21B, it is pretty dumb, but I found it to be knowledgeable-ish, plus rather uncensored in tuning as well. It is a bit slopped though.
The big Ernie might have some promise.
>>105943356
>https://github.com/moeru-ai/airi
>wishing to achieve Neuro-sama's altitude
at least they are open about their plagiarism
>>105943816I mean if they credit then technically it's not plagiarism.
>>105943816>>105943837I'm pretty sure anyone involved with AI at this time doesn't give a fuck about plagiarism.
>>105943816That model wasn't made for Neuro-sama, it's just a free one.
>>105942899What does this mean?
>>105943356
>claims self hosted
>uses elevenlabs
local is doomed until tts problem solved
>>105944323
2 more weeks. Unmute finetuning and IndexTTS2 release.
Even if they don't release the unmute voice embedding model (i.e. no cloning), finetuning could be useful to condition the model to produce specific voices by training it on exclusively one voice for an extended period of time.
And the devs of indexTTS have always released their models, and voice cloning is an option with them.
>>105939173OH MY GOD COULD YOU IMAGINE GOING TO A DIFFUSION THREAD AND FINDING
*gasp*
PORNOGRAPHY!?
>>105939173Based. So much mikutroon seething ITT
>>105944449Blue board, tranny-kun.
>>105943067End of summer (?).
>>105944985You don't give a shit about this website or its rules, puriteen invader. Stop projecting your dysphoria onto others.
>>105944990of what year? They weren't going to release anything, they were just generating hype, like strawberry all over again.
file
md5: af7b296d2bef77c282150e6f484165cc
Ernie goofs soon.
>>105939052 (OP)What'd be the current best 32b or less model for RP? Been out of the loop with LLMs for like a year or so
>>105945110Has anyone tested these models for rp? I keep seeing this hf stuff posted but no reviews on whether it's any good.
How much context can Nemo take?
>can't load ernie with koboldcpp
These models are so much better at women's erotica it's not even funny. Fuck the safety-cucks.
file
md5: 3d4a64b4d3223ad9ee414a430134249a
https://x.com/kimmonismus/status/1946123014258495557
>>105945222Honestly all the new stuff in that range is more of a sidegrade than an upgrade.
You might want to try out GLM4, a QwQ finetune like snowdrop, the new Mistral Small, or Qwen3-32B.
But I wouldn't go in expecting much more than some variety; the RP domain at that parameter count has seen so little progress that plenty of people are still using Nemo derivatives, and I don't really blame them.
>>105945503Thanks, good to know, I'll take a look at the Snowdrop.
On a side note, I use SillyTavern. Is KoboldCPP still the best option for backend? I heard someone say LM studio is good as a backend too, is it true?
>>105945528Personally I just use base LlamaCPP as my backend because it gets updated for new models faster than kobold.
Plenty of anons here are using it though, but I've pretty much never heard of anyone bothering with LM studio outside of the HF comments section.
>>105945482No local no care
>>105945528Lm studio is good for newbs as a start point.
>>105945528if you're using st the back end doesnt matter much. kobold works fine. if you use it make sure to unpack it to a folder from the extras menu and launch it with kobold_launcher
>>105945110literally the only interesting thing about ernie is the giant vision models that will never be supported by any usable backend
>>105946033https://github.com/ggml-org/llama.cpp/issues/14465#issuecomment-3085593133
Bytedance translation model
https://huggingface.co/ByteDance-Seed/Seed-X-Instruct-7B
Any advice or resources on good local models specifically for coding and how they stack up to API models like gpt/sonnet?
Its getting really hard to find any info that isnt just shameless shilling, for anything but erp.
Ive had some good experiences using gpt4o and sonnet 3.5 for coding but Im conscious that they could get enshittified, or some employers may begin to limit this via contract for codebase security/privacy concerns.
>>105946194qwen coder 2.5 32b
llama 3.3 70b
devstral 24b (?)
file
md5: 8463c1ce5002a4924986fe1e7c5eeaba
Ernie 300B
>>105946229Regurgitation is easy. They need to be able to properly depict a mesugaki in a scene.
is there a local model that can be used like claude code?
>>105946313What do you mean?
A model is just a binary blob.
Is claude code a model or something like a cli tool?
Many local models can produce code.
>>105946211thanks, Im limited to 24gb VRAM so its quant models 33B and under for me, any opinions on deepseek coder 33b? How do any of these stand up to just running chatgpt or sonnet via API?
Im going to try a bunch out anyway and will report back but wanted to hear some others experiences.
>>105946318>Many local models can produce code.can you give some recommendations for a agentic local setup that can comprehend/modify/create code and prose?
>>105946313https://github.com/maxnowack/anthropic-proxy
With the proxy you can hook up any model you want to the real Claude Code.
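A rough sketch of that wiring, with heavy caveats: the proxy's actual port and configuration env vars are whatever its README says and are assumptions here; `ANTHROPIC_BASE_URL` is the variable Claude Code reads for an alternate endpoint:

```shell
# hypothetical wiring; check anthropic-proxy's README for its real env vars/port
node anthropic-proxy &                           # translates Anthropic-style requests to OpenAI-style
export ANTHROPIC_BASE_URL=http://127.0.0.1:3000  # point Claude Code at the proxy
claude                                           # the real Claude Code CLI, now backed by your local model
```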
>>105946229cockbench when?
>>105946319deepseek 33b is one of the original code models, its ancient now and not worth trying.
none of them are going to be as good as online models. your project and size matters though, some models are better at languages than others. i've had decent luck with local models but i'm not doing anything huge either.
>so its quant models 33B and under for me
i still suggest l3.3 70b, occasionally loading a different model helps when another doesn't seem to understand what you want the code to do, even if its balls slow. i alternate for that reason
file
md5: d9d914fa04d3034b81290c8ef0d9c464
>>105946166What good is a 7B translation model? For translation, you need a wide breadth of knowledge and that requires far more parameters.
>ERNIE-4.5-300B-A47B-PT-GGUF
>IQ2_XXS at 101 GB
macbook bros is it our time?
file
md5: 34bb357cdcc001a7e046a364ed4a1a0f
>>105946354Ken Doll benis status.
>>105946348ok thanks Ill skip deepseek, I dont tend to use AI as a crutch for coding, more like a donkey to do the grunt work in small chunks, whilst I still handle all of the logic and architecture myself
>i still suggest l3.3 70bWouldnt I have to go down to 2bit quants to get this to run, is it still worthwhile at that level?
>>105946457no you'd get a good quant and offload some to regular ram. it'll be slower but if it does what you want speed shouldn't matter
>>105941011luckily most android phones can't run llms, imagine the influx
>>105946471>offload some to regular ramI have 16gb DDR4 kek, I know, I was waiting for it to be worth it to upgrade my entire CPU/MOBO to get 64/128GB DDR5
Thanks for all the advice anyway, Ill have a play around and see how it stacks up, Ill probably still stick to API models but I want to be prepared for the day they get enshittified / I get told using API AI for paid work is verboten
>>105946389
>X is going to be Y... or Z... your choice
Every single model Ive used keeps doing this shit in every conceivable scenario and it always pulls me right out of it, whether Im trying to do an adventure RP or simply jerk off
>>105946471oh one last question mr helpful anon whilst I have your attention, got any good resources for chat completion settings/presets? Ive used sillytavern a bunch for fun but have not used local models for productivity yet, is there a much more suitable front end I can use for coding?
>>105946329Cline on VSCode + Deepseek R1 seems like the easy one.
>>105946512Right, it's not just about jerking off; LLM writing structure gets unnervingly clone-like after a while.
>>105946542it feels like they are always constantly desperate for you to lead every situation and is always looking for your approval no matter how much you try and prompt it into doing otherwise, the only model Ive tried that avoided this was full fat deepseek via api but at the cost of going full on schizo deranged
>>105946354>revealing more of your...............GROWING..........AROUSALAAAAAAAAHHHHHHHHHHHHHHHHH
>>105946568I feel safe. This is how models should be.
>still no weights for grok2, grok3 or grok3.5
Elon promised us.
What a fucking rat.
>>105946592Why does everyone always forget about Grok 1.5?
>>105945482https://demo.hume.ai/
>>105946542You're absolutely right.
>>105946684When are you open sourcing your model?
>>105946797"open sourcing" is for inferior stock
you open source something when it is not SOTA level
I havent been messing around with local models much in the past 7-8 months and I'm not sure of the best way to inference and use them anymore. I used to use oobabooga for just messing around with them and ollama for code-related inference. What do most people use these days? I recently tried oobabooga and LM Studio to inference some of the newer uncensored models and they seemed more underwhelming than usual (short responses, or never emitting a stop token). I'm not sure if I just grabbed a bad merge or if I'm not using the right parameters. The last model I used a bunch was llama3.1 or 3.2. Qwen 3 seems pretty good but I'm not sure how I feel about the thinking tokens.
>>105946895cool kids use llama.cpp server hosting the model which you then use from whatever application you run, but lmstudio should do fine if you just want a chat-like assistant interaction. for rp use sillytavern. for raw unformatted text try mikupad. for coding aside from qwen3 there is devstral that came out recently.
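In practice that setup is just (model path, context size, and port are placeholders):

```shell
# serve any gguf over llama.cpp's OpenAI-compatible API
./llama-server -m ./model.gguf -c 8192 --port 8080
# then point SillyTavern/mikupad at it, or sanity-check with curl:
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'
```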
>>105947045cool kids use ik_llama or ktransformers
Guys I need some insights. Do you need some prior knowledge to get into this if all you did professionally was being a backend dev in C#? Mostly web shit.
I had a project in mind that required some facial recognition stuff but most of the AI API for face stuff are so limited. So out of curiosity, I bought this course to have an introduction on the subject: https://www.udemy.com/course/learn-ethical-hacking-from-scratch/?couponCode=MT150725A
Most of the time I skip the theoretical stuff when it comes to deving, but I watched the ones in this course and I don't understand jack shit when the teacher talks about mathematical equations and that kind of subject.
Should I teach myself some subjects first, or should I go straight for the development part and simply follow the guidelines like a robot? I'm not completely lost but I do feel... unworthy.
>>105947141>being a backend dev in C#>I don't understand jack shit when the teacher speaks about mathematical equationsAre you one of those code bootcamp types? Do you have any proper education?
>>105947149
>degree mill buyers remorse
kek
>>105947141what are you even asking? what does ethical hacking have to do with local ML models? are you asking us how you learn better? how the fuck is anyone supposed to know that but you let alone anons on the internet
>>105947083
>cool
>ktransformers
lol
>>105946540>Cline on VSCodeany recommendations that aren't VSCode plugins?
cli clients are peak civilization
>>105946486That's what google colab is for. Only problem is I'm still a noob at this shit so I literally don't know what the good shit is on huggingface.
>>105947083>>105947230ahem it's pronounced quicktransformers
>>105947149I expressed it wrong. It's not that I don't understand anything about it. I just can't connect why, for example, a rectified linear function is the one needed for the activation layer of CNNs.
>>105947189My bad, I shared the wrong link: https://www.udemy.com/course/computervision-deeplearning-with-python/
>are you asking us how you learn better?
I'm asking if your regular backend dev can get into this shit straight from his prior experience or if he HAS TO study the subject first in a theoretical way to learn absolutely mandatory knowledge to develop LLMs.
>>105946895
>What do most people use these days?
koboldcpp + Sillytavern
It just werks.
>>105947287
>I just can't connect why, for example, a rectified linear function is the one needed for the activation layer of CNNs.
Looking at it from a biological point of view, it wouldn't make sense to let the activation function go under 0. That's literally the only reason it clips at 0. It works, so that's what the standard is.
Redpill me on Mistral Thinker.
I see it mentioned here a lot. Why would I use it over Dans Personality Engine (the best Mistral small finetune)
bro just woke up from a coma
>>105947334You'd find leakyrelu, swish, and others in modern cnns
>>105947286Wait what? I thought it was some kde thing.
Just tried out Exaone 4 with the new merge.
Feels retarded and lacks knowledge. Quite literally I feel like it's worse than 3.0. Now that's fucked.
>>105947264For local models, I don't know of any.
And to be clear, cline on vscode does have terminal access, so it can pretty much do anything.
>>105947381Yes, because they (sometimes) work better. Where did these functions come from? Someone just decided to try them and then wrote a paper because they ended up with better results on a specific problem. Now they're just one of the activation functions that people use. You don't need to know anything about this shit unless you're actually researching and releasing papers, just use whatever is popular.
>>105946540>>105947435Cline vs Kodu Claude Coder?
for me, it's gnometransformers
>>105947476Dunno, I only started using Cline. Haven't explored other agent apps.
>>105947524They're forks of the same thing but I only see Cline being shilled on Reddit, I've learned not to trust the karma hivemind for any objective assessments though
>>105947287In my experience, you can get away with a decent amount of ML even if you treat it as a bag of opaque tricks. A lot of research in the field still boils down to throwing shit at the wall and seeing what sticks. The math can give you some nice theoretical guarantees, but it's mostly there to guide your reasoning, e.g:
>Why relu in CNNs
Composing linear+relu repeatedly gives a piecewise linear approximation of the function you want, to arbitrary precision, and its gradient doesn't saturate. But I'd argue it mainly gained momentum because max(0, x) is stupidly easy to compute.
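The piecewise-linear point is easy to see numerically: a weighted sum of shifted relus can bend a line at arbitrary knots, which is all you need to approximate a curve. A toy sketch (weights hand-picked here, not learned):

```python
def relu(x):
    return max(0.0, x)

def approx(x):
    # sum of shifted relus = piecewise linear function;
    # weights/knots chosen so it interpolates x^2 at x = 0, 0.5, 1.0, 1.5, 2.0
    knots = [0.0, 0.5, 1.0, 1.5]
    weights = [0.5, 1.0, 1.0, 1.0]   # each w adds a slope change of w at its knot
    return sum(w * relu(x - k) for w, k in zip(weights, knots))

print(approx(1.0))                   # -> 1.0, exact at the knot (x^2 at x=1)
print(abs(approx(0.75) - 0.75**2))   # -> 0.0625, the worst-case gap on this grid
```

More knots shrink the gap, which is the hand-wavy version of the universal approximation argument for relu networks.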
>>105947641Thanks man, that's pretty much what I was asking
I am considering leaving /lmg/ and checking if something changed in LLM sex domain once every month. Convince me to leave or stay /lmg/.
>>105944701That was smol 3, I used one of the qwen models for the others.