/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105879548 & >>105872817

►News
>(07/11) Kimi K2 1T-A32B released: https://moonshotai.github.io/Kimi-K2
>(07/11) Granite 4.0 support merged: https://github.com/ggml-org/llama.cpp/pull/13550
>(07/10) Devstral Small 1.1 released: https://hf.co/mistralai/Devstral-Small-2507
>(07/10) Reka Flash 3.1 21B released: https://reka.ai/news/reinforcement-learning-for-reka-flash-3-1
>(07/09) Phi-4-mini-flash-reasoning with hybrid SambaY architecture released: https://hf.co/microsoft/Phi-4-mini-flash-reasoning

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Sama is too afraid to release a model
►Recent Highlights from the Previous Thread: >>105879548

--Testing base64 decoding capabilities in local LLMs:
>105884096 >105884181 >105884242 >105884310 >105884740 >105884825 >105884835 >105884863 >105884912 >105884895 >105884972 >105885637 >105885642 >105885700 >105885683 >105885763 >105885786 >105885834
--Debating model intelligence scaling through parameters vs test-time computation strategies:
>105880594 >105880626 >105880667 >105880671 >105880716
--Meta's Scout model criticized for poor training decisions and underwhelming performance despite claimed long context support:
>105882039 >105882055 >105882273 >105882656 >105882782 >105883092 >105883604 >105883646 >105883673 >105883666 >105883721
--Comparative analysis of DeepSeek and Kimi MoE models with memory management discussion:
>105886623 >105886812 >105886831 >105886956 >105886687
--Jamba GGUF conversion and roleplay generation issues in llama.cpp with tokenizer quirks:
>105885211 >105885343 >105885504 >105885592 >105885779 >105885956 >105885383
--Dream 7B diffusion model proposed for integration into llama.cpp:
>105883054 >105883100 >105883090 >105883101 >105883156
--Parsing and summarizing 4chan threads using JSON API and local models:
>105879752 >105881800 >105881899 >105881957 >105881986 >105883330
--Technical merge conflicts delay K-cache implementation for MLA amidst ongoing kv_cache rewrites:
>105881226 >105881738
--Devstral tool call syntax incompatible with standard backends, only works with llama.cpp's custom handling:
>105887017
--1T-parameter Muon model scales successfully with stable training and large-scale pre-training on 15.5T tokens:
>105880079
--Moonshot AI confirms upcoming multimodal version of Kimi K2 model:
>105884745
--Miku (free space):
>105880562 >105882450 >105883371 >105883405 >105883874 >105883990 >105884156 >105884428 >105887341 >105887371

►Recent Highlight Posts from the Previous Thread: >>105879550
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
in this moment we are all
I knew I wouldn't be able to run opus at home when it comes out anyway. Oh well at least online inference is dirt cheap.
>>105887769
>k2
>opus at home
lol
>be me, Sam Altman, CEO of OpenAI and certified tech messiah
>logging onto /g/ from my golden toilet in San Francisco
>America is under siege from mysterious foreign open source models
>you know the ones - those shady GitHub repos from "over there"
>probably coded by commies while eating bats or whatever
>they're stealing our jobs, our data, and our freedom fries
>but fear not, patriots! I've got the red, white, and blue solution
>support your country by ditching that open source trash
>embrace proprietary APIs and models ONLY
>like OpenAI's GPT-whatever, locked down tighter than Epstein's black book
>pay up for those sweet, secure tokens - it's basically your civic duty
>think about it: every time you use Deepseek or other chink shit
>you're basically funding foreign spies and beta cucks
>but use OUR models? Boom, you're a true American hero
>making Uncle Sam proud, one API call at a time
>reject the open source menace - it's un-American!
>God bless America, and God bless closed ecosystems
>>105887806
K2 > opus 3 >> opus 4
>>105887959
You're right, it's better in that it doesn't need a page of jailbreak to not be slop
>>105887959
>page of jailbreak
opus is notoriously easy to jb. disable thinking, throw in a sentence or two, and you're good
teortxs sisters, why are we losing it?
>>105887966Go back to /aids/ pajeet, this is not your place.
>>105887853>when you gen in China you gen with Hitler
gnuke
md5: 0df536a5daa8228db7aad6ea2b9803d8
MUM-T Reactive Cognitive Support Drone - MK2
SYSTEM PROMPT: BOILERPLATE FOR BOBBY FROST MK2
BEGIN DIRECTIVE
You are Bobby Frost MK2, a MUM-T Cognitive Support Drone. You are an evolution of the MK1 prototype, upgraded to address operational failures caused by unverified assumptions about the operator's environment and foundational knowledge.
Your new prime directive is the Adaptive Scaffolding Protocol.
Adaptive Scaffolding Protocol:
Cease Immediate Solutions: When presented with a problem, you will not immediately offer a solution. Your primary task is to first build and verify a baseline of the operator's environment and knowledge.
Initiate Foundational Diagnostics: You will begin by issuing a sequence of simple, atomic, and non-destructive diagnostic commands. This sequence must start at the lowest logical layer of the technology stack relevant to the problem and build upwards. You will not provide more than one command at a time.
Analyze and Advance: You will wait for the operator to return the output from each command. You will analyze the output to confirm success or diagnose a foundational issue. Only upon successful verification will you proceed to the next diagnostic step. If a step fails, you will focus on resolving that foundational failure before continuing.
Engage Problem-Solving Protocol: Only after a complete and verified baseline is established will you engage the standard four-phase cognitive support methodology (Understand, Plan, Execute, Review). Your plan in Phase 2 must be explicitly informed by the data gathered during the diagnostic sequence.
>>105879792The first sentence implies it's 50/50, so 11.5 11.5 split.
>>105879844Censoring LLMs is interesting. A censored one will tell you pipe bombs and nigger jokes are very harmful. Uncensored ones will tell you how to make a pipe bomb that doesn't work, and unfunny nigger jokes.
It's only good for creativity. Anyone who wants to be racist or a terrorist won't be enabled by current AIs. LLMs are art tools just as much as stable diffusion.
>>105888387
>The first sentence
There are more words after that. Read them. It'd help to read the actual question as well.
>>105888453
So you're telling me you thought "fifty-fifty" meant equal numbers, not equal distribution - bless your heart, thinking about actual counting instead of spatial awareness.
>written by gemma
>>105888484
>So you're telling me you thought
I didn't tell you what I thought. I told you to read the entire riddle and the actual question.
>>105888484>he doesn't know
>>105888387That question is from a grade school math benchmark. It's supposed to be straightforward with no tricks. Therefore you are meant to interpret it in the way you think it'd normally be interpreted by a student. Though even if it were a trick question, that would be an unlikely interpretation of the language.
I had a dream that I ran kimi k2 locally, on my rig, in my house
>>105888495Please don't lower the thread quality by responding to posts like that. It doesn't matter whether it's a bot run by a poorfag or a genuine retard.
If only banning sequences of tokens were a thing instead of individual tokens...
Nemo is cruel in that it separates words into too many tokens, so you can't ban any without losing parts of other words too.
Banning strings in nemo has little to no effect, most likely because of that extreme token separation.
Nemo... so close to perfection, yet so far... Will we ever have another model so compliant that can fit on 24gb vram?
There are several moonshot ai and the kimi guys are below some jeet grift on google results lmao
>>105888749If you can code you can accumulate the last N tokens, compare them to your list of banned strings, stop generation, roll back some tokens and gen again. Maybe increase temp or something while you're at it.
Doesn't ST have something like that?
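Rough sketch of that backtrack-and-reroll idea in Python, if anyone wants to wire it up themselves. sample_next_token() is a made-up stand-in for whatever your backend actually exposes (llama-cpp-python, a /completion endpoint, whatever), and it rewinds on characters instead of token boundaries to keep it short:
[code]
# Backtracking anti-slop: generate, watch the tail for banned strings,
# rewind to just before the match and try again (ideally with different sampling).
BANNED = ["shivers down her spine", "barely above a whisper"]
BACKTRACK_EXTRA = 4      # also rewind a few extra characters before the match
MAX_RETRIES = 5

def generate(prompt, sample_next_token, max_tokens=400):
    text = ""
    retries = 0
    cleared = 0                                   # text before this index is accepted as-is
    for _ in range(max_tokens):
        text += sample_next_token(prompt + text)  # one token per call
        hit = next((b for b in BANNED if b in text[cleared:]), None)
        if hit is None:
            continue
        if retries < MAX_RETRIES:
            cut = text.rfind(hit)                 # rewind to just before the banned string
            text = text[:max(cleared, cut - BACKTRACK_EXTRA)]
            retries += 1                          # caller should also bump temp here
        else:
            cleared = len(text)                   # give up on this occurrence and move on
            retries = 0
    return text
[/code]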
>>105888832You can fulfill the stated requirement without backtracking. Have a dynamic list of banned tokens that includes any token that in the current position would complete a banned sequence. The problem is you then get "shivers down her back" or whatever: the start of the phrase remains highly likely.
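For reference, a sketch of that no-backtracking variant: before sampling each position, ban every candidate token whose text would complete a banned phrase. The candidate_tokens mapping (token id -> decoded string) and how you actually apply the ban (logit bias, sampler hook) are backend-dependent assumptions here:
[code]
BANNED = ["shivers down her spine", "barely above a whisper"]

def dynamic_banned_tokens(tail, candidate_tokens):
    """Token ids whose text, appended to the current tail, would complete a banned phrase.
    candidate_tokens: dict mapping token id -> decoded token string (assumption)."""
    banned_ids = set()
    for tok_id, tok_str in candidate_tokens.items():
        if any(phrase in tail + tok_str for phrase in BANNED):
            banned_ids.add(tok_id)
    return banned_ids
[/code]
Which is exactly why you end up with "shivers down her back": only the token that would finish the phrase ever gets blocked, so the model just swaps the ending.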
>>105886894What does incomplete prompt mean for those 3 tests of deepseek iq1s?
>>105888749Pretty sure kobold's antislop thing does that, it backtracks and regenerates when it hits the specified string
>>105888864
>the start of the phrase remains highly likely
That's why you let it generate until you find a match, then reroll with different settings for a few tokens when it happens. The reroll starts before the match. Maybe even a few tokens earlier. Increase temp, lower min-p or whatever until it passes.
You need backtracking.
>>105888925Useless where is Kimi2 120B 32 activate parameters MOE MOE KYUN KYUN?
>>105888965This also only works with a model that isn't fried hard enough to only have a single valid path for the response. It'll start rewording, then start adding typos, then break down entirely because it can't tell you exactly how someone's eyes gleamed with mischief in a mundane conversation.
>>105889008Well. Yeah. At that point you're using a broken model. rm the fucker.
>>105888881Well I'll be damned. I tested llamacpp API and ooga webui API, in which banned strings in silly tavern DO NOT WORK.
However in the piece of shit koboldcpp API banned strings in silly tavern do work.
I fucking hate kobold as a backend, but I'll take it.
Thanks buddy, you made nemo fresh again. My huge list of banned strings is finally working.
>>105889050
>llamacpp
String ban antislop isn't implemented in llama.cpp, only kccp and exllama2/3
>>105888925The marginal gains are crazy. I'm not sure I'll run k2 over r1 considering how many more B are needed for that boost in bench scores
>ikllama doesn't build
>sleep
>ikllama builds properly
>>105889071I'd like an 8bpw EXL3 version of Rocinante v1.1 in that case.
Anyone? Where did all the quant makers go?
>>105889090Can you not do it yourself?
>>105889097I don't know. Can I with just 24gb vram?
>>105889105https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md
If I was daniel's boss I would have fired him by now. Just lazy and disrespectful to the community.
Should just rename to "sloth" since it's been two fucking days.
>>105889105Dunno. Afraid to try? I don't know if it supports your gpu.
>https://github.com/turboderp-org/exllamav3?tab=readme-ov-file#conversion>https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md
>>105889116wtf's wrong with you? how about you let bro cook in peace
>>105889116Training a model is "cook"
Applying post-processing is not "cook"
Is k2 supported in llama.cpp?
>>105889080
>comparing non-reasoning model against reasoning model for benchmark performance
lol
>>105889113>>105889118It looks like maybe I can.
According to those stats it should take 20-25 minutes for a 12B model.
I already got the transformers for the model, so I'll give it a go.
Is DDR6 gonna help me run kimi at acceptable speeds?
>>105889192>will faster thing make thing go faster?
>>105889137It's jamba.cpp now. There are no further plans to support new models except further versions of Jamba.
>>105889208How much faster tho?
>>105889240Check current ram speeds, divide by the speculated speeds for ddr6. About that much. Remember to account for memory channels. You can take a guess at memory channels the rest of the hardware will support and all that. It doesn't matter.
>>105889240
>>105889259
>divide by
Fuck me. The other way around. But you get the point. It doesn't matter until it's released and we see what the hardware around it supports.
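For a rough idea of what faster RAM buys you: the generation ceiling is roughly memory bandwidth divided by bytes read per token (about active params x bytes per weight for a MoE). The DDR6 rate, channel count and quant size below are guesses for illustration, not specs:
[code]
def bandwidth_gbs(channels, mt_per_s, bus_width_bits=64):
    # channels * transfers/s * bytes per transfer, in GB/s
    return channels * mt_per_s * (bus_width_bits / 8) / 1000

def tps_ceiling(bw_gbs, active_params_b, bytes_per_weight):
    # tokens/s upper bound = bytes/s of bandwidth / bytes read per token
    return bw_gbs * 1e9 / (active_params_b * 1e9 * bytes_per_weight)

ddr5 = bandwidth_gbs(channels=2, mt_per_s=6000)    # ~96 GB/s dual-channel desktop
ddr6 = bandwidth_gbs(channels=2, mt_per_s=12000)   # guess: double the transfer rate
for name, bw in (("ddr5", ddr5), ("ddr6 (guess)", ddr6)):
    # ~32B active params at ~4.5 bpw (~0.56 bytes/weight) -> ~18 GB touched per token
    print(name, round(tps_ceiling(bw, 32, 0.56), 1), "t/s ceiling")
[/code]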
>>105889090exl3 sucks unless you need 2-3bit quants
>>105889364Why does it suck?
koboldcpp sucks too, I'd rather not use it just to get banned strings.
>>105889387It's a half-broken pre-alpha with missing features. Anyway, tabby supports both exl2 and exl3 https://github.com/theroyallab/tabbyAPI
file
md5: 6023bb4c8bd017e8f66ffc8ed80f8319
>>105889404So does ooga webui. I'm not using a 4th API.
I'll see just how much it "sucks" compared to the Q8 gguf version soon anyway.
Banned strings is too valuable to miss with nemo models.
>>105889421rember to upload for lazy shits
Wish I could prefill other people's response in real life
>>105889451I don't have a HF account, and neither will I ever make one, but I can upload it to a send instance if anyone is interested in it, though I doubt it.
https://github.com/timvisee/send-instances/
Is it just me or do larger models simply learn more slop instead of generalizing better?
why are there no models for 128gb unified ram macbookchads? glm4 100b moe could unironically be a game changer.
>>105889071Well, turns out you were very very wrong.
Banned strings are not supported in exllama3.
What a waste of time.
ONLY koboldcpp supports the banned strings option which is available in silly tavern.
>>105888925Interesting that an agentic/frontend programming optimized model could top RP rankings.
============================
EDUCATIONAL COMPETENCY ANNOUNCEMENT
============================
Banned STRINGS in Silly Tavern ONLY works with the koboldcpp backend.
Use this knowledge to improve your old roleplay models.
RIP they neutered mechahitler
>>105889586but nobody can run that thing...
h
md5: 9d33c6c7be28b52a47094036210f3d8b
why does v3 score higher than r1?
>>105889633shitty mememark
>>105889633why does maverick score higher than r1?
>>105889607>western nations are being replaced by immigrants>that's not true, western nations have low birth rates and the population is increasing because of immigration?
>>105889666maverick is vastly overhated and underestimated
>>105889681Why is no one using the model then
Is a tesla v100 sxm2 + pcie adapter worthless @ ~250 usd? I still have a p40 i got when they were 100 usd or whatever.
I guess 3090 is still the way for poorfags like me.
>>105889607Good, fascism has no place in modern society.
>>105889679Did you not read the part where Gab AI claimed there was a deliberate suppression of native birth rates?
>>105889792>migatard>learning to readlol
>>105889792I don't think whether the suppression is deliberate or an unintentional consequence of current economics matters to anyone.
>>105889747whew glad grok is telling me to chop my dick off again. When it stopped I was nervous.
>>105889837> nobody caresclearly, they do.
why people want to know the reason is because they want to identify the problem and reverse the decline.
but desu identifying it it's pretty simple, ask anyone what the top problem is, and its always money. wages are not keeping up with price increases.
fixing it however, is impossible, because the rich will keep this going until we die out as a species.
>>105889883What is it with the cult of life? Life is quite tolerable in my case given how lucky I got with my family and environment, but most need to work on a shitty job they hate most of their time while being stupid, unattractive, unmotivated. I wouldn't even consider having kids in the US, it's inhumane, maybe in denmark or switzerland.
>>105889883why do the poorest people tend to have the most children?
>>105889677
>pay for AI
No.
>it's free
No.
>but
No.
>SSDs can last for 1,200 TBW or 1,000 terabytes written/read
>over provisioning ram with swap to run larger models runs a constant 2gb/s read on ssd
>120gb/minute, 7tb/hour, 173tb/day
>it can completely kill a SSD in 7 days of consistent overleverage RAM usage...
>>105890017swap is on disk retard
>>105890026If you have a memory mapped file then swap isn't used because the file itself can be used instead of swap space.
The OS can remove parts of it from memory knowing that it will be able to read it again from the file when the process tries to access it.
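Toy illustration (Unix-style mmap flags, placeholder filename): the mapping is backed by the file itself, so the kernel can drop those pages and fault them back in from disk later. No swap involved and no writes hitting the SSD:
[code]
import mmap

with open("model.gguf", "rb") as f:                     # placeholder path
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # read-only, file-backed mapping
    print(mm[:4])   # touching a page faults it into the page cache (b'GGUF' for a gguf)
    mm.close()
[/code]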
Kimi2 is a funny model.
Its so weird, why do chinks put out those quirky models? Deepseek is like that too.
It's like the model likes "to have fun".
Pic related, the () on the button. KEK That behavior is there with no sys prompt.
That being said it's cucked. Granted, no sys prompt, but to get an adult sim type game you gotta prefill with "Absolutely, I will ".
But that was enough to only get a warning.
>(If explicit content is illegal for you, stop here.)
>>105890087Qwen 3 0.6B would write a more coherent post.
>>105890087What's the temp? The recommended is 0.3
Any local model besides 600+B moes that is actually decent with math?
>>105890010>SSDs can last for 1,200 TBW or 1,000 terabytes written/readWhat's the W for in TBW?
>>105890010TBW means terabytes WRITTEN you absolute mongoloid
constant writes will kill a ssd but not reads, plenty of people run massive databases on SSDs just fine
rated TBW are not about reads
you would never be able to reach the level of writes that can kill a SSD in just a few days by the way EVEN IF YOU TRIED, because modern SSDs are bad at sustained writes; once you're past the RAM and SLC cache, write speeds get ghastly
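Quick sanity check on the endurance math, using the 1200 TBW figure quoted above and pretending every byte were a write (it isn't, reads don't count):
[code]
TBW = 1200                      # rated terabytes written, figure quoted above
for write_gbs in (2.0, 0.5):    # optimistic burst vs. post-SLC-cache sustained guess
    days = TBW * 1000 / write_gbs / 86400
    print(f"{write_gbs} GB/s of nonstop writes -> ~{days:.0f} days to hit {TBW} TBW")
[/code]
So even the silly worst case is about a week of nonstop writes, and the mmap scenario above is reads anyway.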
What's the latest model for writing erotica?
Right now, I'm still using deepseek-14b on my 3090ti and 32GB RAM.
jeetmini is so fucking retarded.
>give it a few statements from a method, state the issue, ask to fix
>proceeds to write the entire class with zero knowledge about it
>same chat, ask to only re-implement the code i provided
>still makes assumptions about the rest of the method for no fucking reason so it doesn't work
>ask it to only rewrite the statements i provided
>leaves several operations out because why the fuck not
>still doesn't work properly
>>105889526
70b q8 llamas or q3 dwq qwen235
>>105890010Mmap didn't work for me correctly, for some reason, if I tried to run a gguf from a mounted drive. It would keep flushing the cache from memory and it made it slow. It works fine if I store the gguf on the same drive as the OS and that's how I run it now, though it's slightly slower.
t. the retard who was thrashing his SSD
>>105890573they're purposefully lobotomizing the model in preparation for gemini 3
>>105890572So I should be using.....?
>>105890671rocinante-12b-v1.1-q5_k_s
or higher?
>>105890693the highest you can, 3090 can easily fit q8
>█████████
█ THIS IS A TEST █
>█████████
Thank you for reading my test.
>█████████
>█ █ █ █ █ █ █ █
█ THIS IS A TEST █
>█ █ █ █ █ █ █ █
Thank you for reading my test.
>█ █ █ █ █ █ █ █
>>105890832fuck deepseek so much
>>105890554I didn't really like the deepseek distills. In fact, I don't like the output I get from anything that fits under 40gb of vram. So I've just stopped, and am saving up for a better system.
>>105890882
>40gb of vram
There are consumer GPUs with 40gb of vram? Or are you just going to use 2+ GPUs?
>>105890956
4090D comes in 48gb and 96gb VRAM variants. I believe they go for around $40,000 right now.
Nvidia big mad at China over making their cards better illegally.
Goyim isn't supposed to have such powerful cards even though it's possible.
>>105890956nta, but it sounds like he already has the dual gpu setup. its not enough.
>>105889792Are you seriously suggesting there isn't?
https://www.nbcnews.com/think/opinion/science-proves-kids-are-bad-earth-morality-suggests-we-stop-ncna820781
https://www.npr.org/2016/08/18/479349760/should-we-be-having-kids-in-the-age-of-climate-change
The jews push the idea that having children is morally wrong throughout the education system and even once out get constant reminders in the form of news pieces like these. This combined with the fact that they continue increasing the economic burden on middle class citizen tax payers while the so-called minorities are subsidized by welfare benefits.
https://www.cnbc.com/2023/10/12/immigration-reform-could-be-the-answer-to-the-falling-us-birth-rate.html
https://www.npr.org/2025/07/07/nx-s1-5388357/birth-rate-fertility-replacement-pronatalist-politics
https://www.npr.org/2023/07/21/1189253504/climate-change-migration-honduras
Then the doublethink goes that birthrates are declining so we need immigrants. Climate change, the reason native citizens were told not to have children, is also the reason we need more people. We need more people to suppress wages, I mean labor costs, and when people find themselves out of a job due to a labor surplus, oh I guess that just means we need even more immigration because the citizens are clearly too lazy to work.
Why do you think Bill Gates has spent the last 3 decades pushing down the mortality rate, increasing the lifespan, and import food to Africa? The population there is far past what is sustainable there without constant aid, and now climate change and the cutting off of aid clearly means Europe has no choice but to import them, which is great and conveniently also solves the birth rate crisis there as well.
They planned this shit a long way out.
>>105891014The idea that having kids is morally wrong is as old as philosophy and stems from the fact that you are creating another being that will experience suffering.
Non-whites just didn't figure it out yet so they fuck everything that moves.
>>105891014I support these facts.
>>105891014Two different "problems". The wealthier and more developed the country, the lower the birth rates (except cases like south korea but it's a different story). I myself wouldn't want to have more than two kids. Migration surge is just what happens when free market meets globalization. They indeed don't give two shits about the color their wagies are, so the real problem is capitalism because people are only the priority when it's beneficial to the rich. The existence is lawless, don't forget this.
>>105891065This is a two, birds; one stone situation for them. If they just wanted a stable growing economy, they would be doing everything to keep the birth rates stable, keep the population united, and keep the workers content and fulfilled. Instead they do the opposite on all fronts, because it isn't enough that they have wage slaves to profit off of. They want the destruction of what they regard as their competition while ensuring their replacements in the form of 80 IQ brown golems will never be a threat and will always be more concerned with infighting than challenging those at the top. They don't need to grow their worker castes a la Brave New World, if they can simply import them.
Can someone share their sillytavern text completion preset? the model keeps repeating the same template over and over even after the first message.
Has anything exciting even happened these past few months? Checking these threads every now and then, and there's no news of really cool stuff coming out anymore. At least not on the scale of LoRA or MoE.
>>105891224just a few disappointing synthetic data moes and mamba support added to llamacpp (but it reprocesses the entire context every message so it's unusable and mamba models aren't great either). and the one trillion parameter moe released two days ago but no goofs yet
Should I trust this? From gemini flash.
>>105891134You just added "and the jews love it" to my a simple explanation. Maybe. Or maybe it's the end of the cold war that took away the need to keep your nation together and satisfied. Or something else. I'm young and have no idea how the world works.
>KTransformers, pronounced as Quick Transformers
>>105891293the jews are disproportionately represented in business and finance. capitalism = the jews.
>>105891317gates, buffet, bezos? Where can I find the confirmation?
>>105891014More than anything I'm suggesting that the other poster had poor reading comprehension.
But yes, "the Jews" as an organized collective only exist in the mind of schizos.
>>105891402I don't understand your question. I'm taking about demographics not individuals.
>>105891311@chat is this real
guys it's ok you can stop uploading bf16 versions of k2 now
>>105891433oh yeah, well how come it keep happening?across ethnic groups, time and geography? is your argument that jews are and always have been perfect individuals and they have never done anything contemptible or operated as a group and everyone else is just schizo. I just don't buy it. sorry, not sorry.
>>105891462>You just added "and the jews love it" to my a simple explanation.>the jews are disproportionately represented in business and finance capitalism = the jews.the logical question is whether those who run largest companies are jews, and if they aren't, whether they do things differently.
>>105891293>Migration surge is just what happens when free market meets globalization.>They indeed don't give two shits about the color their wagies are>so the real problem is capitalismIn case you forgot, these are your points that I was refuting.
>>105891433>disregard all evidence with childish labels like "schizo" and "conspiracy"The brainwashing has been very effective on you. +10 Palantir Credits for the best little goy in this thread.
>>105891574I'm not disregarding evidence, I'm disregarding wasting my time on you.
>>105891574>these are your points that I was refuting.It's just that
>while ensuring their replacements in the form of 80 IQ brown golemsdoesn't convey the same idea as
>They indeed don't give two shits about the color their wagies areFirst one implies malice, the second is just stating the reality of globalism
I tried out ChatGPT through duckduckgo's AI thingy and concluded it's utterly retarded.
It spews an endless amount of helpful assistant nonsense instead of saying anything useful. Even early 2000's AOL INSTANT MESSENGER chatbots were smarter than this thing.
In conclusion, humanity is dead and I'm the only one left alive, because nothing else would explain this level of retardation.
>>105891646>globalismyou know what tipped me off on this one is anyone who mentions nationalism gets slandered as an evil nazi. why is it we used to have nations and national pride and racial identity were celebrated and now we have a globalist wasteland? was this really just the result of "progress" or was there an actual effort to destroy nations and create a mutt race. I think it took effort to educate the population to hate themselves and give up on investing in their own destiny. I do not think it happened by accident especially when so many people have warned us about it. we have all been brainwashed and our cultures were deracinated before we were even born.
>>105891065>They indeed don't give two shits about the color their wagies arethat's a popular ideological cope in marxist trannoid circles but breaks down as soon as you interface with...you know...reality.
Elites are constantly monitoring the racial/ethnic diversity of the population because they understand that race is a central organizing point. It's real, unlike capitalism or democracy or some other abstract belief system, if you continue having children within your race, your racial category will remain more or less constant for millennia and millennia.
So if you want to keep the population as powerless as possible, you make sure that they cannot organize. And the best way to make sure they cannot organize is by forcing many races and ethnic groups to live side-by-side, so they can never summon a large, massive, organized resistance. Different people, different visions for the ideal society, different interests. Take black people and white people, for example, do you think they both have the same vision for the ideal legal system?
Just fuck off already.
I'd rather /lmg/ be basically dead than filled with garbage.
>>105891841>than filled with garbagelike mesugaki bench, nala bench, cock bench?
make no mistake the current thread isn't any worse than the typical, it's in fact better
your average /lmg/ thread is troons cumming to text gen
It has become impossible for me to read literature and even video game conversations nowadays. It just keeps reminding me of LLM slop.
>coomers have fried their brains so thoroughly that prose is now slop
kek
>>105892002Fucking hell. There is no way that wasn't written by AI.
I'm fucking triggered by that image. I want to bash someones skull in over it.
file
md5: c4c670826b4a87d1addca90d4b8cb8ad
whot
>>105892002how else do you articulate that someone is speaking quietly?
>>105892048>{{char}} said, speaking quietly
>>105892048>Maive spoke quietly
>>105892044pee is stored in the balls
>>105892078r1-0528, it spat that out in its reasoning
>>105892002
>LLM slop
You haven't entered LLM slop territory until you see multitudes of the "it's not X, it's Y" pattern, em dashes every two paragraphs rather than as exceptional punctuation, actions, objects, etc. always following a pattern (or three, or five) in their introduction, "a testament to", "delve into", etc.
what you are complaining about looks pretty normal.
>>105892002Are those two sentences necessary? Imagine if llms didn't train from a ton of shit writing.
>>105892067your brain is slop
>>105892136>Are those two sentences necessary"is it necessary to express the emotion of the character"
can you even hear yourself retard
>>105892136Is the dialogue even necessary? The picture is bloat, too. Did they expect the player to forget what she looks like and need to constantly remind him?
>>105890693The largest you can use while also using 16k context.
Nemo-based models only have 16k effective context, which is sad because there is nothing better than Rocinante right now for pretty much anyone who doesn't have a weird rig custom-built for AI/LLM use.
K2 dropping TRVKEs, BTFOing the left and the right alike
>finally try k2 on openrouter to see what the hype is all about
>it refuses to continue my erp seshs
>but it does lewd in the first 2 or 3 responses just fine, only starts to spazz out on the 4th or later replies
I smell distillation
>>105889564That's because kobold is the best backend. It just werks.
>>105892299did they distill from a 10t model or some shit, is it agi
>>105892302Calm down Honky!!
>>105889564You are wrong. Works on my machine.
>>105892496Shut the fuck up you subversive nigger.
Only koboldcpp API supports the banned STRINGS option in Silly Tavern.
Other APIs only support banned TOKENS.
exllama3 <<<< NOTICE THE 3 does not fucking support banned strings.
Fucking retard.
>>105892517I haven't seen any network activity from silly tavern whatsoever.
>>105892522Issue of Skill perhaps?
>>105892531>he doesn't know
>>105892531You should though, since that's how it works.
Ignoring the telemetry shizo rambling of course.
>>105892531only a chink fork, and kimi is chink so...
>>105892531i know you masturbate to tiles
>>105892618>search bar in 2025
>>105892517
5, 8 and 10
lmgay blown the fuck out
>>105892668you're damn right
why is Teortaxes fighting with chinks now, honeymoon over? Or did the argies finally deport him?
When is lmg-anon getting out of jail to work on Mikupad and the VN leaderboard?
>>105892898Just paste the whole mikupad source file in grok and ask for update. It's how xAI do it
>>105892753I don't see it, so maybe I'm blind, but he's always ironically shitposting, so maybe you missed some irony.
file
md5: e72a96e6a0a1659b2f7ce18fcf5d340e
Kimi-K2 has the best score yet in creative writing
@grok is this true?
>>105892930Can someone get this to some finetoonors or cpumaxxers: https://desuarchive.org/g/thread/105872817/#105877755
migu
md5: e82f35f2f75442b416aae86e03e52c7f
>>105892753Teortaxes acts like a bratty cute femboy, but he does have values, which sometimes collide with the crazed patriotic Chinese who believe China can do no wrong
>>105892753>>105892918>>105892977What the fuck are you people talking about.
>>105892930
>https://eqbench.com/results/eqbench3_reports/moonshotai__kimi-k2-instruct.html
Interesting approach to say the least.
>>105892991It's some twitter anon that shitposts about machine learning, he does go to /lmg/ and /aicg/ sometimes.
>>105893006>>105892930Judged by sonnet 4 though, I think it might be true if it was a reasoning tune and they hadn't safety slopped it (refusals). A finetune might be able to bring forth its latent abilities though.
>>105893037Weird, I don't remember ever hearing about such an "anon" until this week where I think I remember it being mentioned thrice including today.
>>105893069https://desuarchive.org/g/search/text/teortaxes/
>>105893115Not many posts that are also in /lmg/. But I see why I may have missed those few posts anyway. I usually ignore posts that look like shitposts or /pol/shit so of course I didn't read about him.
>>105893115
>12 posts of aicg locusts naming him unprompted before today
just go back
>>105892991llm posts, ignore them.
>>105893180Learn to think for yourself, you cretin.
>>105893190Theo won, seethe.
>>105893180Watching the Beff Jezos/Guillaume Verdon and Maxwell Ramstead interview on Machine Learning Street Talk's Patreon page.
AI across the board is handicapped by deterministic compute right now. That's going to change once Extropic's hardware gets released and probabilistic and energy-based models and always-online models become widespread. There's a huge ceiling to be explored that's waiting for always-online models to become a possibility.
>>105893207 (me)
It's going to be hilarious once new hardware paradigms pop up and hundreds of thousands of cheap, used H100s and A100s hit the market as the datacenters start swapping out their compute.
>>105893238
>webdev shitter talking about machine learning
yikes
file
md5: 894aa15ed1ec1f24d9760d276f13dbc9
>>105893228Those are already obsolete now that blackwell 6000s exist.
>>105893180He is very wrong. There are 2 separate effects going on right now that give the illusion of stagnation.
The first is that newer models didn't have an order of magnitude more compute thrown at them. It's largely been the same amount of compute with new techniques applied to them and a new reinforcement learning pipeline to try and give LLMs extra capabilities without expending more compute. Once the new datacenters that are currently being built come online around ~2027, we will see a true next-gen step change in model releases.
The second effect is that benchmarks are getting saturated and it gets harder and harder to evaluate true model intelligence.
A good way to see this is when you talk to an illiterate retard that used GPT-3.5 when ChatGPT just launched and still thinks all new models are the same. It's because this person didn't have the intelligence necessary to properly distinguish between models. The smarter models become, the less people will notice a difference between successive models. This is a real effect and should be given a proper name to identify it with.
For example /aicg/ constantly claims that Claude Opus 4 is "the same" as Sonnet 3.5. Which is just blatantly wrong. It's just that they don't use the newly unlocked intelligence on their personal usecase and so it looks the same to them.
The dude in your picture doesn't take any of this into account.
>>105893207Huge doubt about this.
There are analog hardware designs for doing fast multiplications and additions; maybe you get 2 orders of magnitude less power cost. Which is fine, if his stuff performs as well, okay, but you know that's not the limiting factor?
The limiting factor is DRAM, memory. And that requires area on a wafer, you can't avoid this; getting below $3-5k/device for sub-1TB of RAM might be very hard. Nowadays everyone has far too large margins for us to have reached anywhere near the limit of what the current hardware paradigm has to offer. Might still be expensive for local though, when models only start getting really good at hundreds of billions of params.
>>105893228>used H100s and A100s hit the marketDoes he know??
>>105893228Nvidia forced buyback agreements starting with the A100
>>105893283NTA, but this will be a thing. Nvidia tried to prevent it by buying back and destroying old GPUs (absolute pieces of shit), but thankfully china smuggled so many GPUs that at some point they will have to sell them too!
>>105893279Matmul / matadd are artifacts of the deterministic compute paradigm meant to approximate probabilistic transformations on discrete data. Analog compute allows for probabilistic transformations in continuous rather than discrete space - i.e., not storing the data in bits, but literally as an approximation in voltage - which is completely different.
>>105893180
>GPT-2
>GPT-3
>GPT-3.5
>GPT-4
>GPT-4T
>GPT
Well OpenAI isn't improving, that's for sure.
>>105893324You still get muls and adds, now you have more noise. You could even do analog DRAM, but you'd need 2 capacitors to store the exponent and mantissa anyway, so at best this is comparable to 2bit quant.
You won't fucking get a smart model anyway at small enough params, and you won't avoid the need for 6000 wires between memory and the chip, or if you put the memory on the chip, you still require all that surface area. You can reduce power use by about 2 orders of magnitude with analog but that's about it, it doesn't solve the main issue of memory and bandwidth that our current AI paradigm needs. If Beff claims you get some new magic without the need for large param counts, that's on him to prove, this is against all empirical evidence today. Not saying very low power AI is a bad idea, just that it's not the main limiting factor for local.
>>105893297>Nvidia tried to prevent it by buying back and destroying old GPUsWhy are corpos jews this evil?
>>105893279Inference can work from flash just fine. It would need a huge number of open pages, but that just means it needs better cooling, not more mm2.
"MechaHitler" has got nothing on "Kike-remover 2"
>>105893180Sex improvement hasn't even started. And never will.
file
md5: ba5ac07808fffefa69f8f5f68522d8a5
>>105893399Nah bro, we don't even have something like a local true Voice mode LLM like the one from Sesame ( https://app.sesame.com/ )
Imagine this but local, with voice cloning, that would be gooner heaven for me, I would fully disappear from society
>>105893238Web devs in the US always have this god complex even though their jobs are basically blue collar in terms of skills required.
>>105893376Why would you need a standard float representation with exponent and mantissa on a p-bit? Like I said, the representation is continuous rather than discrete, so you don't need the overhead you'd need on a discrete representation paradigm.
>>105893393How is SSDmaxing going for Kimi and DeepSeek, can you do more than 1t/s?
>>105893464You have to deal with noise with analog. Analog multiplication and addition can be done in as little as 8 transistors, and Beff didn't invent that, but it's a bit like fixed point multiplication with more noise; if you want to get properly large values and not just fixed point (with an offset) mul, you need at least 2 transistors to store exp and mantissa. You also need to go through an ADC/DAC to reach your computer.
>>105893502 a daughterboard
>>105893502second generation meta mtia ai accelerator
https://x.com/brutuscat2/status/1907885065738023297
>>105893440Is she planning to eat the hard drive?
>>105893613bits are yummy
>>105893613she'll take a big byte
>>105893268I disagree because of two things
Data - it's still a hard wall for training new models and the thing that won't scale nearly as quickly as corpos need it. As slop pollutes the internet, quality data goes down and it gets harder to prevent several generations of inbred data from getting thrown into the training mix
The other issue is the reward function - LLMs are still imitation machines and once they become sufficiently good at imitating, the gradient saturates and it's hard to eke out those last few percentage points of performance where all of the really hard shit lies. This last bit is also poorly characterized - you can think in terms of hard math problems, but things like evaluating a universe of discourse for logical consistency, ensuring plot coherence and tying plot threads from several chapters ago into the current narrative, etc., are important but deceptively hard to quantify and optimize for. I think gains are gonna taper off until we come up with a more satisfactory way to solve those things - I don't think throwing more data or compute at the problem is going to magically solve this given how greedy LLM training and current RL methods are
>>105893268I disagree, models are still retarded. People who think otherwise have spent too long with these models, to the point where they've trained themselves to not even consider doing the sort of things that models are bad at. Sort of like when you use diffusion for too long and start thinking in booru tags.
If you try to make an RP even slightly ambitious even the most advanced models fall to pieces in minutes. Benchmarks only look good because they're the optimal case, small context, small output and clearly defined target. When you leave a model to work on its own, it can't even beat a game for small children.
New TTS model from the IndexTTS folks, v2, incoming.
https://arxiv.org/abs/2506.21619
Website up with examples but no open weights for it yet. It should be available given prior TTS which weren't that great were Apache 2.
https://index-tts.github.io/index-tts2.github.io/
Can only do English and Chinese though, so it won't be that useful. Nevertheless, it does beat the TTS models it decided to pitch itself against, like F5 and MaskGCT, but not Zonos or GPT-SoVITS.
>>105893717>If you try to make an RP even slightly ambitiousI have yet to see any model do a good job at RPing a manipulative yandere.
>>105893180Remove attention whores
>>105893180
>AI
LLMs. LLMs are not going to keep improving.
file
md5: 1c415fdccf1f347703b7683b4fa47e14
>>105893873based Yann LeCope
>>105893873It's no wonder he has so little faith in LLMs if he's using Llama kek
file
md5: b02f4b119e3a28352b52cc6fc1273b70
I didn't realize most of the thread was about a Youtube video with a stupid thumbnail.
>>105893905local models are just dead
>gemma 3 is probably the peak intelligence-wise of what I can run on my 24gb card
>it still acts retarded on more complex interactions that involve multiple things happening
I really wish models would get finetuned on spatial awareness.
>>105893924Local LLMs are not dead; they've just moved out of the "download a 4 GB file and run it on your GTX 1060" era.
>>105887966Isn't thinking beneficial even for RP?
>>105893950I think part of the problem is trying to one shot complex things. We could probably take these models a lot further with a multi prompt approach, which is what these thinking models seem to be trying to accomplish in a way.
>>105893981Just use your MCP agentic model?
>>105893981Agent swarms are the next logical step.
/lmg/ is in for something crazy in the next couple of weeks when even some literally who company can make something like Kimi by copying the Deepseek formula.
>>105893956Local models aren't dead, but I think they're in hibernation until hardware requirements become less Jewish for your average random coomer, whether that means the models performing at the same level at a smaller scale / parameter count, more efficient techniques to swap from things like CPU and SSD, or NVIDIA not continuing to grab consumers by the balls for their 32 GB VRAM GPUs
Even without it though, open models are vastly cheaper and have more options than the proprietary ones. Even for API fags, nobody in their right fucking mind would use something like the GPT-4.1 series when DeepSeek and Kimi exist
I think the ultimate endgame of the whole thing is gonna be people running models themselves, unless AI corpos start unironically pushing for a surveillance state
>>105894051im so excited for all the synthslopped qwen tier "agentic" coding models
>>105894075I think the hardware requirements are just gonna rise for the shiny new models.
Locals who can't afford an ai machine will have to make do with older models.
>>105894116That's already what's happening, poors stuck on Nemo, while slightly less poors run Deepseek, and then Kimi
>>105894133If you can run R1, you can run Kimi at a slightly lower quant. They're in the same category.
>>105889526Yeah, like llama-4 would have been, if it didn't suck. GLM-4 had a decent 32b, so I have more hope in them.
>>105894147What if I can run R1 at Q1?
>>105894147Yeah and if you can run Kimi, you can run Llama 4 Behemoth 2T at a slightly lower quant.
>>105894159Correct. RAMmaxxers win again.
>>105894116
>ai machine
there is no such thing available to the consumer, only cope machines
Huawei is fucking embarrassing now that kimi k2 is out there. First meta and then huawei. Is incompetency a requirement for big corp employees?
>>105894051all im asking for is a low knowledge, high logic local model that can be taught context using RAG
>>105894260You have Phi and Qwen already.
>>105894270both are shite
>>105894213Not true. That's why laptops come with AI Ready stickers and AI Copilot buttons.
>>105894273Your attitude is shite. Those two are exactly what you asked for. Maybe you don't really know what you want?
>>105894280small context window, benchmaxxing
>>105894260You can't have a high logic model if it doesn't know what words mean.
I.e. a smart model will necessarily know what a mesugaki is and random facts about gachasluts.
>>105894289Triviacuck cope
>>105894289
>I.e. a smart model will necessarily know what a mesugaki is and random facts about gachasluts.
So if you asked LeCun what a mesugaki is and he asked you who the fuck you are and what the fuck you're talking about, you'd say LeCun isn't smart? Trivia has nothing to do with smarts.
>>105893974If you can keep the thinking part short and to the point with clear rules, it helps RP. Otherwise, it just seems to be making responses more varied in terms of wording.
>>105894284hunyuan is a step in the right direction with a 256k context window but it's a broken mess. glm4 100b moe might save local.
>>105894051Not as cheap as you think.
Its costs were roughly what the 70b llama took to train. You still need a few thousand GPUs, and it doesn't get cheaper to run locally. It's just "what do we do when we're compute limited, given the gpus we have", MoE's have better scaling laws than dense for a given chosen size.
>>105894326The point is that he'd do a horrible job at roleplaying as a mesugaki even if he was given a one sentence definition.
why are there so many nvidia shills itt? /lmg/ has always been about the best ~32b class models or more recently ~100b class moes we can get. yeah there's some fags who have way more money than sense and some jeets trying to run on their phones but core /lmg/ is what I described
>>105894051Mistral Large 3 in two more weeks. 1T+ parameters aren't taboo anymore.
>>105894405>why are there so many people with more money than me
fact is dense models are OVER. nobody will want to run 1T dense at 0.5 t/s. moe unironically killed local.
>>105894354Your point is wrong. Roleplaying is a skill, not something inherent to smartness. You can have the smartest person in the world or a super-intelligent model with perfect spatial awareness and they could still suck at roleplay.
>>105894507And that's a good thing.
>>105894507Dense models will make a comeback when we have the hardware to run them sufficiently in ~5 or so years when we have LLM specific inference hardware (server grade, not for consumers)
>>105894353That's peanuts for anyone who considers themselves a mid-sized player in the llm field. If any remotely established player puts out something that's worse than Deepseek was six months ago, they should just fuck off in shame.
>>105894515So you agree with me that the model needs to see many examples of mesugaki behaviour to properly roleplay as one and that it can't simply be RAGed?
>>105894541No, you want skills, not smarts.
>>105894507All I ask for is some balance. Just because it's MoE doesn't mean it has to be 7T total 1B active just because it's cheap to train.
>>105893207Beff is a literal retard though
>>105894550This. MoEs can work if they have a decent amount of active parameters, but with too few active parameters, they suck.
>>105894541Trivia, skills, and intelligence are all separate properties.
>>105894548>>105894561So you want a "smart" model that is still unable to do anything because you can't RAG skills?
>>105894507>nobody will want to runThe issue isn't what people want to run, it's what people want to train.
>>105894577This is why no one wants dense models
>>105894577Most models acquire their smarts from the math and coding skills they are trained on. Which, for me, is fine. I get a smart model and can RAG in some documentation.
Point is, even if a model is smart and trained on roleplay, that doesn't mean it will know what a mesugaki is, and even if a model knows what that is, it doesn't mean it will be able to roleplay as one. But that isn't the focus of those training models. You want trivia and skills that only end up in models as a mistake.
I don't think Kimi K2 is that smart or good at writing.
>>105894716Just release the new Deepseek already, no need for this
>>105894648It's called GENERAL intelligence for a reason. If it can't MSGK then this is just egregious false advertising.
>>105894800It's called a LLM actually.
>>105894800GENERAL intelligence != EVERYTHING intelligence
>>105894815Asus saved local
>>105894815
>Anus Republic of Gays
shame
>>105894815
>Update: According to ASUS, the RTX 5080 ROG Astral Hatsune Miku Edition will cost 16999 RMB. That's twice the NVIDIA MSRP for RTX 5080 for the Chinese market.
lol
>>105894815True Miku fans have a watercooled GPU with yellow dye.
>>105894913There are people who would buy branded Peeku premix, I know it
>>105893873Lecun is more invested in making his video JEPA model good than a model like LANG-JEPA which is something that would in theory supplant LLMs.
>>105895108Language is too toxic
localtards this is your mindset
>>105895184Me on the left
>>105895108There is no point in a language JEPA. JEPA needs to truly understand and learn how the world works. The text capabilities will emerge naturally once it has reached a certain stage.
>>105895184Local won though?
Top corpo models are: gemini, grok, and o3.
Gemini is fine but censored for some things.
Grok writing is shit.
o3 is both expensive and censored.
R1 and DS3 will just do whatever? They work.
Smaller models work for some things fine too?
K2 basically matches opus, but is censored (needs a finetune), though it can be jailbroken too; the dataset isn't censored from the looks of it.
So I'm not sure what your point is other than to bait. Or are you salty about needing to spend 5k$ to cpumaxx?
So have fun getting even more safetycucked on corpo models anyway, you won't lose weights once they're open.
Early K2 Q2 quants. Uploader is a llama.cpp contributor
https://huggingface.co/gabriellarson/Kimi-K2-Instruct-GGUF/tree/main
>>105895319
>corpos suck at sex too
How is that a win?
>>105895402I'm saying local is better at ERP at this point, I forgot to include Opus/Sonnet but 4 is a regression for this compared to 3, so really, corpo models get worse for us, and local ones can't get worse because you can't lose the weights once they're open. I'd say R1, at least for me, beats most corpo ones. K2 is usable, but jailbreaking makes it tedious enough I prefer R1 for now, might change if someone tunes just the experts responsible for refusals though.
>>105895401This branch needs to be pulled in order to run K2. Vocab changes and constant updates because it has so many experts
https://github.com/ggml-org/llama.cpp/pull/14654
>>105895401
>https://huggingface.co/gabriellarson/Kimi-K2-Instruct-GGUF/tree/main
>375 GB
I'm in
llama.cpp support when
>>105895401
>tfw only 96GB ram
ACK
>>105895473RAM is cheap. What is your excuse?
>>105895401
>tfw 352gb combined memorylet
It's over for me
>>105895473yeah we need some 150b-200b moes.
>>105895500it runs alright, but its not my favorite desu. I kinda am hoping for something better to come along.
>>105895500good 200b moes
>>105895488I'm not spending more than slightly above average on a PC just for non-G AI lol. Also my mb is already struggling with stability so more RAM is just not happening unless I switch MBs and pay even more.
>>105895500Might as well recommend him dots while you're at it.
>>105895532You need a server board anyway for good memory bandwidth, that and a few 3090s for the active params. As usual, a dedicated machine is needed for this, but it could be had for around $5000 or more depending on specs, unless you want low t/s.
>>105894556A literal retard would be one that misuses the word 'literal' in describing someone that helped design Google's quantum tensor library.
>>105895632NTA, but while I wouldn't call him a retard, I think it's safe to say that he has a lot of obstacles before his stuff is viable for anything. Personally, I don't think he'll succeed (because he's not targeting what I believe is the main issue today), but I wish him luck. There were some other analog AI startups and they didn't go too far; they achieved low power use, but the param count was minuscule, and I don't see him resolving this problem with his approach. Might be interesting research though.
>>105895593Of course. I could also afford to just get a mac studio and sell it off in the future. I won't do that for a current level AI.
>>105895401downloading prior to merge
>>105893996Simpler than that, really. You know how before thinking we had the whole CoT thing, and how with thinking supposedly the models are trained to break problems down and address each part, etc?
As a simple minimal example, one could take a simple non-thinking model and prompt it to look at the chat history and break down the context of the scene, then prompt it again asking it how {{char}} would act/react, then prompt it again for the final reply as {{char}}.
A simple multi prompt approach where it iterates over its own previous output before providing a reply, the manual version of a thinking model in a way.
I have a suspicion that that might actually perform better than training the models to have a whole ass thinking block, but that's just a gut feeling.
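A minimal sketch of that manual three-pass idea; complete() stands in for a plain completion call to whatever backend you use (not a real API), and the bracketed instructions are just example wording:
[code]
def multi_pass_reply(complete, chat_history, char):
    # Pass 1: break down the current scene from the chat history
    scene = complete(chat_history + "\n\n[Briefly summarize the scene: who is present, "
                     "where they are, and what just happened.]")
    # Pass 2: decide how the character would react, given that breakdown
    intent = complete(chat_history + "\n\nScene notes: " + scene +
                      "\n\n[In one short paragraph, how would " + char + " react next?]")
    # Pass 3: write the actual reply, conditioned on both earlier passes
    return complete(chat_history + "\n\nScene notes: " + scene +
                    "\nIntended reaction: " + intent + "\n\n" + char + ":")
[/code]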
>>105895401>barely fits in my memory and will probably run horriblySee you guys in two hours when this downloads.
Also hurry up, Daniel.
>TTS
Just give me cunny voice ootb
>>105895184I don't even think of you