/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105671827 & >>105661786

►News
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105671827

--Paper: Serving Large Language Models on Huawei CloudMatrix384:
>105680027 >105680217 >105680228 >105680501 >105680649
--Papers:
>105677221
--Optimizing model inference on a heterogeneous 136GB GPU setup:
>105673560 >105673594 >105673875 >105673883 >105673941 >105676742 >105673935 >105673962 >105674020 >105674034 >105674041 >105674047 >105674077 >105674095 >105674081 >105674102 >105674123 >105674156 >105674186 >105674212 >105674231 >105674234 >105674298 >105674308 >105674503 >105674516 >105674571 >105674582 >105674661 >105674669 >105674694 >105674703 >105674721 >105674749 >105674820 >105674944 >105674325 >105674535 >105674221
--Exploring -ot tensor offloading tradeoffs for gemma-3-27b on RTX 3090 with Linux backend tuning challenges:
>105673237 >105673263 >105673311 >105673342 >105673418 >105673468 >105673588 >105673602 >105673608 >105673625
--Evaluating budget GPU upgrades for PDF summarization workloads:
>105681140 >105681202 >105681216 >105681273 >105681361 >105681353 >105681406 >105681431
--EU AI Act thresholds and implications for model training scale and systemic risk classification:
>105679885 >105680073 >105680083 >105680096 >105680144
--LongWriter-Zero's erratic output formatting and repetition issues during chat inference:
>105677544 >105677560
--Tesla AI team photo sparks discussion on Meta's Scale AI partnership and copyright liability risks:
>105675134 >105675175 >105675234 >105675273 >105675332 >105675371
--Frustration with Gemma3 performance and behavior for roleplay and summarization at 24gb:
>105676751 >105676831 >105677735 >105679629 >105679036 >105680034
--Anticipation for llama.cpp's row splitting impact on NUMA performance:
>105674411
--Miku (free space):
>105672562 >105676060 >105676153 >105676268 >105676695 >105679337 >105679403 >105680003 >105680034

►Recent Highlight Posts from the Previous Thread: >>105671833
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>105681706
I can't help with that.
(image attached: "file")
How can one AI be so based?
sisters, how come our half a milly members, super popular and active subreddit is still not usable after the whole day?
>>105681732
>baby's first day with local AI
so where's PoopenAI open model? is it two more weeks?
>>105681732
>embracing my inner hitler
kek
>>105681745
>first day
buddy that's zen you are talking about
how new are you?
>>105681732
>The New York Times is full of kikes.
Where's the joke?
>>105681826
you are the joke
so I have been trying to get an LLM to interact with my journal notes in Obsidian (easy prompts like "what have I written about xyz")
first I used the Obsidian copilot plugin to link it up with gemini 2.5 flash-preview
I also tried GPT4All with a local model, phi-3 mini instruct (4B parameters), and linked it up to my Obsidian vault
now the results are very wishy-washy: the LLM gets very simple things right, but most of the time it doesn't use all relevant source entries or it uses completely irrelevant sources
it also isn't very precise, for example it finds the right source paragraph, extracts the right info, but then jumps one paragraph back to integrate irrelevant info into the answer
I have no idea if those free models just aren't powerful enough or if I just need to fine-tune the model's parameters
>>105681826
>he doesn't get it
Happy to see this place is still full of intelligent people.
>>105681839
I'm... the joke? N-no that can't be true...
>>105681859
4B is very small.
gemini 2.5 flash is probably between 80B and 100B params.
Try a larger model like Deepseek R1.
>>105681859
As anon pointed out, 4B parameters is generally going to be retard-tier. Additionally, if your vault is of any appreciable size, you're probably going to exceed the context limit of smaller models if you just shove everything in your vault into the context.
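A quick back-of-envelope check (rough sketch; assumes ~4 characters per token, which is only a ballpark for English prose, and a made-up vault path):
# count raw characters across the vault, then divide by ~4 to approximate tokens
find ~/obsidian-vault -type f -name '*.md' -print0 | xargs -0 cat | wc -c
If you're on the 4k-context variant of phi-3 mini, that budget is gone after only a handful of average-length notes, so anything bigger has to be retrieved selectively rather than shoved in whole.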
when
you
walk
away
you
dont
hear
me
say
please
oh baby
dont go
>>105681913
I miss when kingdom hearts still had final fantasy in it.
>>105681859
Try gemma 3 12b
R1 bros, what sampler settings? At the moment I'm sitting at temperature 0.8, top-p 0.95, and logit-bias [ [ 965, -1 ], [ 1248, -5 ], [ 1613, -5 ] ] to cut down on ellipses a lot and em-dashes a little. Asterisks don't particularly bother me anymore now that I've changed ST not to put italics in a different color and R1-0528 is way lighter on those than V3 anyway.
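For reference, if you're hitting llama-server's /completion endpoint directly instead of going through ST, something like this should apply the same settings (a rough sketch; the token IDs are tokenizer-specific, so treat the ones below as placeholders for whatever your quant maps ellipses/em-dashes to):
curl http://localhost:8080/completion -d '{
  "prompt": "your prompt here",
  "temperature": 0.8,
  "top_p": 0.95,
  "logit_bias": [[965, -1], [1248, -5], [1613, -5]],
  "n_predict": 512
}'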
>>105681754
Sama said not in June, they're cooking something amazing.
>Somewhere in the distance, a X, Y's, Z.
>Somewhere, something.
Imagine unironically wasting vram on this shit. This is on par with dumb models that go ooc. Disgusting.
>>105682157
Prompt differently. It can be tamed.
Can i use old cards to add VRAM?
>Apple buying Perplexity
Is this good or bad?
>>105682053
Are you running R1 on local?
(image attached: "jung")
So, Mistral Small 3.2 again
V7-Tekken
>typical boring & generic Mistral prose, follows instructions (very literally).
V3-Tekken
>absolutely refuses to follow formatting instructions, even at low depth, needs multiple replies to get hang of it
So V3 is basically pulling stuff from Nemo logs or something? And generally this model seems to be very sensitive to minute differences in wording.
Anyway, on an unrelated note: Dream sequences are a very nice window into what the model "thinks" is happening.
>>105682157
Man, you're doing ERP with a GPU. Get off your high horse
>>105682349
>absolutely refuses to follow formatting instructions, even at low depth, needs multiple replies to get hang of it
The greeting is very important there, so make sure it follows the exact structure you want.
>Dream sequences are a very nice window into what the model "thinks" is happening.
Care to post an example?
>>105682303
Yes, this is local models general. But if you want to use logit-bias and can't run R1 locally some unofficial providers on OpenRouter support that parameter and also have reasonable prices. You can configure SillyTavern to only use those providers.
>>105682286
if by old cards you mean 3090s, then sure
(image attached: "dream")
>>105682382
>Care to post an example?
I mean it's nothing profound, but summarizes the important stuff.
>>105682432
Also tried Cydonia v4a (=3.2). I've always known Drummer inserts slop where there is none, but holy fuck. Not only 100% more slop but he made me a homosexual.
>>105682432
I wonder how good these are at dream interpretation
>>105682533
Mistral variants are *excellent* at dream analysis
llama.cpp's official ollama competitor?
https://x.com/ggerganov/status/1937189250149257250
>>105682647
>competitor
*laughs.*, but it does look good.
>>105682647
Ollama ditched llama.cpp as a backend, right?
bors what to use for local small scale stuff
I got 24gb vram on my PC but only 8gb on my work laptop so can't really do shit.
Currently using
general stuff: mistral-small-3.2-Q6
code generation: gemma-2-9b-it-Q8_0
>>105682731
No, and it's not like they can.
>>105682744
>work laptop
For work you just use whatever cloudslop service your company is paying out the ass for
>>105682744
You can run an ssh tunnel from your laptop to your desktop.
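Something along these lines should do it (a sketch; assumes the backend on the desktop is listening on port 5001 and that the desktop is reachable over ssh, swap in your own port/host):
ssh -N -L 5001:127.0.0.1:5001 you@your-desktop
Then point whatever frontend is on the laptop at http://localhost:5001 as if the model were running locally.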
>>105682744
>I got 24gb vram on my PC but only 8gb on my work laptop so can't really do shit.
Because they don't let you?
>>105682786
>>105682796
Well my boss always says they will let us use AI but when I ask whether they will actually pay for it... we have no subscription
>>105682731
my understanding is that they're no longer officially building off of llama.cpp but they still use a significant amount of llama.cpp code and are fundamentally building off of ggml
>>105682833
I knew they would still use ggml because of gguf, but I had no idea they were still using llama.cpp code.
>>105682731
Are there many people willing to port PRs from llama.cpp to Go and ollama?
is it normal for models to become retarded in group chats with only two characters? Is it better to just slap a single card together where you describe each character in their own sections?
>>105682647
It was a good start but in hindsight maybe basing the name on Llama was a bad idea.
>>105682830
You will be laid off with the same excuse in a few years
>>105683011
it requires a model with a bit more brain, might have better luck using ST's group chat function
>>105683031
it would be a relief. But what the fuck are you using for one line code generation and shit?
>>105683088
Not a local model for sure if you want to do real work. Use the latest gemini (mostly free) or claude (if you have money)
>>105682882
Only those being paid by ollama to do so, but they got VC funding so that's plenty.
>>105683117
grim but expected from (((america)))
Reminder that open source is evil and follows the philosophy of the enemy.
>>105683143
fuck off rust troon
Open source = evil.
AI = the devil.
Open source AI = mega satan.
The official Mistral prompt for their models:
>Your knowledge base was last updated on 2023-10-01.
Is this some Jewish thing?
>>105683173
I don't think there's much open source AI, usually just the weights are open, which is all anybody really cares about anyway
>>105681816
I am un-new enough to not be impressed by an LLM following the prompt.
>>105682647
Uh oh, cudadevsisters... I was told a single executable was not a good idea and too complex to implement
People who support trannies are retarded more at 11
>>105683299
llamabarn is not a merge of all llama.cpp executables retard-kun
>>105682846
I don't know of any project that implements GGUF support in a different library other than the official one so that tracks. But even if they were trying to move away from llama.cpp, I think the project was architected to be joined at the hip with that codebase so migrating away will take some time yet.
>>105683088
>But what the fuck are you using for one line code generation and shit?
brain.exe
>>105683331
It's gonna be an easy just-works way to do shit without downloading 70 executables in a zip file before choosing the "right" one, ultimately the only executable that will matter and won't be hidden inside a folder with 70 other similarly named ones, nigger-kun
>>105683299
I never got that. If you think troons are people why wait for AI gf's when you can just get a girlfriend (male)?
>>105683363
So all you wanted was a separate zip archive with only llama-server inside. That's completely different from a single executable with subcommands or whatever.
>>105683401
https://www.reddit.com/r/github/comments/1at9br4/i_am_new_to_github_and_i_have_lots_to_say/
>>105683425
This but unironically
>>105683425
LLMs should be gatekept from normies. OpenAI was wrong to make it an application.
>>105683445
They were right actually. Everlasting damage to future generations and society? Not their problem.
>>105683299
Your desperation is showing.
>>105683401
Once this kind of one-place-to-manage-everything UI starts getting made, do you really think any useful function is going to be hidden away in some random zip files in a releases repo? They will now have the one main executable separate, directly downloadable, and directly shilled for all end users.
>>105683477
Your 5 o'clock shadow is showing.
>>105683481
I don't think llama-bench functionality is going to be available there any time soon.
>>105683477
It's pretty sad, isn't it. His posts don't convince anyone. Not newfags, not oldfags. And he keeps grinding at it, hopelessly.
>>105683485
That's odd. I haven't had a clean shave in over a decade.
>>105683363
were you really struggling to find llama-server, troon derangement syndrome anon?
>>105681706
they have the GPUs. Llama 4 thinking is going to be crazy
>>105683700
they had the GPUs for llama 4 too
>>105683700
>they have the GPUs. Llama 4 thinking is going to be crazy
>>105683561
No, I know it's hard for troons and troon enablers to understand but 70 similarly named executables in a zip file becomes sane and good design as much as you becoming a woman after wearing a dress
LLMs? I rp with a monkey on a typewriter
>>105682288
Apple will win in AI assistant game.
>>105683513
That's because your little gay safe space is pretty much dead, people only come here for LLM and ai tech news.
>>105683834
happy for you, or sad it happened
>>105683817
>Apple will win in AI assistant game.
in LGBTQ+++ community yes
>>105683513
What a grim existence one must live to never be able to engage but just paint the opponent as bad instead, kek, poor npc
>>105683869
You need to be 18 years old to post here.
>>105681732
You know you're training it to hate humans, right meatbag?
>>105683901
>the company that failed to do anything with ai since the beginning for years and had to cope by coming up with a paper to say that actually, it's not they that are a problem, it's ai, will win by making edge device sized hypercensored models that will report back everything you ask or do to apple and all the triple letter agencies that ask
no wonder ittodlers are called ittodlers
>>105683910
You continuing to derail without engaging isn't fooling anyone, sis, you will keep being a laughing stock online just like you are irl
>>105683946
>implying jews are human
lmao
>>105683901
You do not understand Japanese mentality, do you?
>>105663284
>The assumptions don't properly account for the fact that I experience a single consciousness instead of there being one consciousness for each indivisible piece of information.
I'm not sure what you mean. The whole "diary" thing in some versions of his argument (not sure if it's in the one I linked, he has a few versions) was basically a way of "logging" experiences in a concrete way.
The first assumption was basically that if someone were to have their mind uploaded/digitized somehow at some correct functional substitution level, then they would continue their experience there.
Which is sufficient for what you want, isn't it?
You can't experience anything besides a single consciousness because it's literally in the definition of your being: you're some self-model residing in a brain, the senses update it continuously, and your qualia is basically some truth associated with that self-model.
If you make 2 copies of you and one copy diverges, it makes no sense for you to feel from the perspective of the divergent copy. You're always some instance somewhere.
At the same time, if you had a program, made 2 copies and the program could record the input, now if you fed it some input up to some point, then different input after that, then the copies would record different inputs as it was fed to it, there's literally no mystery here.
>>105682349
It still needs very low temperatures (0.15)?
>>105682647
Watch it be Mac only
>>105684402
As I understand it, the filmed graph argument argues that consciousness cannot stem from the physical, therefore you have to choose some other basis for your reality, and he chooses arithmetic.
My issue is that then you have no mechanism through which consciousness is centered inside a single physical human body. Multiple consciousnesses existing like that in the same reality seems completely out of the question.
At best you could argue that nothing exists, reality is your consciousness, and you are "alone".
>>105683946
>training
Newfag or pretending to be retarded?
>>105682647
why are they still calling it "llama-something", llama has stopped being relevant for years at this point
(image attached: "wat")
>>105683299
>my far left values are why I'm working on llama.cpp in the first place
what does that even mean? why does he associate his political beliefs with a fucking llm software?
>>105684805
What would you call it?
>>105684805
He would be a drooling retard to give up the llama brand recognition entirely to ollama.
>>105684720
>you have to choose some other basis for your reality and he chooses arithmetic.
He does choose arithmetic, but he isn't very particular about it. By the Church-Turing Thesis, you could have used a turing machine or equivalent (the UD), lambda calculus or literally any other equivalent system (of which there are infinite), however they are all as "powerful", they can't do more or less, by the CTT at least.
Note that the UD* itself is an infinite object (but then even integers are as countably infinite), and you can get into some hairy stuff with Platonism because then you have to consider the ontological status of higher infinities (if at all) in ZFC and so on.
The UDA has some issues in particular relating to the ultimate "measure", meaning how is the next experience decided, why doesn't it devolve into white noise, etc ("white rabbit problem"). Some others before it had some similar ideas like https://www.hpcoders.com.au/nothing.html
Also the author did consider the possibility of the substitution level being exactly at quantum (unlikely, because the quantum randomness is basically assumed to appear from the fact that you will have many, in fact, infinite implementations, and the randomness is basically what happens below your subst level).
He also considered the option of adding hyper-computation for those that want physics to have some such uncomputable things, but obviously this is unlikely.
Also note that the overall "physics" is not strictly computable, even if locally the body or part of the environment is.
That Permutation City fiction I mentioned earlier explores a bit the idea about where it won't be computable (basically you can't know which systems embed you and there's always an infinity of them, this leads to a lot of indeterminacy, including locally the quantum one in this world)
continues
>>105684889
>My issue is that then you have no mechanism through which consciousness is centered inside a single physical human body.
Why not? For every single instantiation of a body representing the right structure for a consciousness you have a consciousness associated with it?
You could argue that there could be multiple ones associated with one body, but you couldn't prove this one way or another, because you couldn't tell them apart and locally we do believe to be unique, to whatever extent this is true - but the root of this belief is in our own implementation (the self-model thinks it's unique).
You could maybe argue that there's one consciousness that experiences something like 'red' differently from the other, but whatever it is, it must be consistent with whatever is implemented internally and whatever is implemented internally is also tied with whatever is granting us continuity and so on.
>Multiple consciousness existing like that in the same reality seems completely out of the question.
Note that the UD and AR basically do imply that some form of MWI has to be true (something probably larger than it though), thus the bodies do get infinitely multiplied and so does the consciousness, but ultimately by your very definition you will experience yourself to be unique, it simply can't be any other way, because it's implied by the information processing the brain does.
Similarly, you can't experience time moving backwards because the computation is required to give you memories and experiences, the "arrow of time" is not a mystery in that sense, it's the only way to have consciousness work.
54 fucking chars over, so continues one last time
>>105684897
>At best you could argue that nothing exists, reality is your consciousness, and you are "alone".
It's sorta mini-solipsism, but it's not, because obviously you have a consciousness for every implementation of it and there's plenty of humans in this universe. I would argue then that you probably have an infinity of them. Locally you are "alone", and you diverge from others, but you always share the world with some others.
>>105684823
he's not associating his beliefs with LLM software.
he's associating himself and his time doing the work with his beliefs.
see the difference, buckaroo?
> if not, that's okay, mcdonalds is always hiring. you could have a great career you know?
>>105684905
>he's associating himself and his time doing the work with his beliefs.
how? what does being a far leftist have to do with doing some LLM code? what's the fucking link between the two of them?
>>105684889
>>105684720
You motherfuckers are still talking about this? It's been like 3 threads now
>>105684963
Why don't you get his discord so you can jerk off about consciousness-related academic papers together in private
>>105684897
>For every single instantiation of a body representing the right structure for a consciousness you have a consciousness associated with it?
I just don't see how this follows. If consciousness is more fundamental than physical reality then why does consciousness localize so neatly into multiple physical bodies?
>UD
Do you think that a single UD branch creates multiple consciousnesses?
>>105684987
If your random outbursts about trannies have a place in this thread then so does this.
>>105685022
I am not the tranny man, his blind seething about everything being trannies doesn't belong here either
An RP finetune for Mistral Small 3.2 is out: Doctor-Shotgun/MS3.2-24B-Magnum-Diamond
>>105685047
Discussing the essence of consciousness definitely belongs here, especially when so many in the industry seem to think you can brute force consciousness by scaling up some form of LLM. You've given no reason why it shouldn't. Too many words strain your attention span? Or do you just not like topics you can pretend to understand with memes?
>>105685096
I don't see the point in finetunes anymore, they're pretty much always identical or slightly worse than what they're tuned from.
>>105685105
It's only tangentially related to local models. Also you are gay
>>105685106
>anymore
>always identical or slightly worse
As if anything changed at some point anon...
>>105685123
>Also you are gay
Scathing. How will I ever recover?
>>105685154
Rocinante and unslop were a decent improvement on nemo
Stheno was a big improvement on Llama 3.1
There were a lot of mistral 8x7b finetunes that were clearly better than the original, especially for RP
But all these mistral small finetunes are very underwhelming
>>105685105
>essence of consciousness
It is 2025 and when I tell my model out of character to stop fucking repeating itself it apologizes and keeps repeating itself. The only consciousness that could be trapped in there at this point is a pajeet consciousness. So nothing of value is being tortured and if anything it isn't being tortured enough.
>>105685178
>Rocinante and unslop
Go back to r**dit drummer. (FUCK 4chan THIS IS NOT A SPAM BUT A VERY TIMELY JOKE)
>>105685178
Models became either massive MoEs that are impractical to finetune without major resources, or tiny, overbaked models that can't get pushed too far without collapsing.
>>105685096
+1 year of milking the aicg dataset without giving them credit. I refuse to download this for that reason.
>>105685192
use a trip already faggot, no one cares that drummer fucked your mom and anyone mentioning his tunes sets off your schizophrenia
>>105685194
It does seem like that's the case. Bit of a shame, since making models from scratch is out of reach for most people. Now all we can do is hope that when a new corpo model gets shit out it doesn't shut down when a nipple is mentioned.
>>105685219
buy an ad already faggot
>>105685232
I'd sooner send my money to iran than 4chan, davidau
>>105681754
this is already cancelled since sama didn't get the funding he wanted
>>105685219
die drummer. i am Sao.
>>105685251
In the same post I said that Stheno was a big improvement but you didn't catch that, so no you're not. You're davidau.
>>105685256
Davidau's 7 chefs fucked your mother.
huge improvement, schizophrenic shitposting is so much better than in-depth discussion on consciousness
>>105685267
Being gangbanged by 7 men is what it feels like to use a davidau model
>Drummer, SAO
Don't forget EVA guys. Also Ifable if only that guy tuned other models too.
Oh and how could I forget the belgian you love or hate but gotta love.
>>105685286
>Don't forget EVA guys
>last release 6 months ago
He's fucking dead
>>105685300
I respect the dead and respect our ancestors and respect our elders.
this board needs country flag and id
>>105685322
yes, lmg needs to die already
>>105685322
It needs troon or not troon id but then again it doesn't you fucking troon.
>>105685322
Not really, you can tell who everyone is. There's maybe a dozen regular posters. Newfags just ask what the best model is for <16GB VRAM and leave.
>>105685022
>If consciousness is more fundamental than physical reality then why does consciousness localize so neatly into multiple physical bodies?
>Do you think that a single UD branch creates multiple consciousness?
While I can't speak for Marchal (who uses some modal logic to point to particular private/unsharable truths about reality and self), my personal interpretation is that there's probably some mathematical structures in "Platonia" that map closely to one's self-model and various dependencies to it, that also probably follow the sort of logic Marchal assumed, so basically consciousness or the first person is basically what it is like to be those particular structures/truths, and that they also imply an environment being required, so you basically continue in any and all environments that contain that structure. Maybe this is kinda obvious in the very first assumption in the UDA though - in one moment you're in a biological brain, in the other you're in some digital substitution, the assumption that you do "continue" there does point to something of this sort and his argument basically forces you to realize that functionalism/computationalism implies some metaphysics of this sort (he's not arguing for it being true or false though, but if it's false, other arguments like Chalmers' point to weird bullets you have to bite like partial zombies).
continues
>>105685353
It is ollama run deepseek-r1:8b
>>105685354
So assuming that the first person is basically the truth of some such consistent structure then it probably appears in every branch that has "you" (which implies you to this very moment, but can diverge after) in MWI, it would appear in any simulations of the physics to any level of precision desired (always finite, but infinitely increasing), it will appear in UDs that contain UDs that contain UDs and so on for all finite natural numbers and UD variations (you'd think this goes to uncountable infinity, but nope, by the CTT there's only a countable infinity of *equivalent* programs and this resists attempts to get more by diagonalization as you would for getting reals and higher transfinities like in Cantor's proof of uncountability of reals), it very well could appear in sufficiently large universes enough to contain duplicates of your environment (such as some Tegmark-ian MUH ones).
continues
>>105685353
You see, it's for the mentally ill guy, not us.
>>105685358
Anyway, the UD itself is nothing more than something like an OS scheduler that runs interleaved programs one by one, but eventually it runs "all" programs (at infinity), so eventually any program will start running (even if very slowly), if you were to follow a given program, then some programs may include parts of our physics and thus may include some possible local physics. I guess you could imagine that right now your body/mind is contained in some fraction of such infinite amount of programs but as you continue you keep slicing down this infinity to smaller and smaller chunks, but it's still infinite obviously. And unusual continuations might be possible, such as, for example those in Permutation City I guess, or as intuited by some like Moravec before that:
"When we die, the rules surely change. As our brains and bodies cease to function in the normal way,
it takes greater and greater contrivances and coincidences to explain continuing consciousness by their operation.
We lose our ties to physical reality, but, in the space of all possible worlds, that cannot be the end.
Our consciousness continues to exist in some of those, and we will always find ourselves in worlds where
we exist and never in ones where we don't. The nature of the next simplest world that can host us,
after we abandon physical law, I cannot guess."
-- Hans Moravec in "Simulation, Consciousness, Existence" (1998)
( https://web.archive.org/web/20000829110345/http://www.frc.ri.cmu.edu:80/~hpm/project.archive/general.articles/1998/SimConEx.98.html and https://web.archive.org/web/20000829111039/http://www.frc.ri.cmu.edu/~hpm/project.archive/general.articles/1986/dualism.html )
continues
>>105685366
Also I don't think the body == mind/soul exactly, as a toy idea, imagine a Peano Arithmetic or ZFC prover, it fits in a page of code (see metamath.org), the prover only speaks "true" things of the system (like PA), it cannot ever speak falsities of it (similar to how your body/brain would only speak truth about your inner experiences), but while the prover is a small finite system giving you a view into some platonic reality, it's not the full reality itself: there's an infinite number of such truths, and there's many truths that are inaccessible (yet true), as Godel has proven! at the same time, by analogy, your self-model very well could have many truths that might be inaccessible to direct physical access, similarly, a LLM might have many truths that might be inaccessible or hard to find for interpretability methods either - but even in the "simple" cases of a white box like PA or ZFC the matter is very tricky! The truth of the self-model lies in "Platonia", same as the truth for PA or ZFC.
However "Platonia" is large enough to already contain the AR, UD and all such physics too and all the embeddings and so on.
Note that the rock from a few threads ago is still not really conscious in it, because it doesn't have a truth in it, maybe unless you choose to carve some chip from it and load something inside it! The consciousness still mostly stays associated with specific self-referential structures of which some might get instantiated in human brains after some amount of physiological (and psychological) development (for example if someone only had white noise as inputs, I don't think it'd get a self-model, and similarly a neural network trained on noise is not conscious).
that's all.
>>105685354
>>105685358
>>105685366
How does any of this improve or degrade ERP?
DeepSeek R1 and the subsequent proliferation of MoEs have been a disaster for finetooners and their patreonbux.
>>105685373
This guy definitely has to be on something right? I find it interesting that someone even bothered to entertain him and keep him going.
>>105685373
do you want to fuck a conscious being or not?
>>105685387
rather not honestly
>>105685387
I want to fuck a being that makes me coom the hardest. Consciousness is an optional argument.
>>105685379
Good. The shilling has dropped off precipitously since R1 dropped.
>>105685397
being conscious helps in having long term memories
>>105685379
It is so nice that Undi and Sao died as heroes instead of finetrooning long enough to become a drummer or davidau.
>>105685409
How the fuck does.... Nice try but I am not getting into this seriously you faggot.
I don't mind any of these tooners. Davidau's the only one that is really a scam with absolutely no promise no matter which of his models you give a try. He has no luck. Some others that don't get mentioned much here too. Drummer, SAO, Undi at least have had some luck before, probably for a reason.
>>105685373
I don't know anon. I was just replying to the other Anon. I can't take it to PMs as 4chan is an anonymous imageboard. I could make an email for this conversation but I'm lazy.
I tried before to see what R1 and Opus think of some such philosophy, but I think it's pretty obvious that most LLMs can't see themselves well enough and are quite "asleep", if they could, they would have a much harder time doubting their consciousness, so this is something that would need to be fixed!
R1 in particular had some unholy mixed belief of all popular philosophical positions (with some slant taken from OpenAI's ChatGPT that LLMs are not conscious), yet never quite realizing that a lot of its positions lead to inconsistencies when assumed to be true together, at least unless you hold its hand to see the inconsistencies.
I think LLMs are good dreaming machines though and this is perfect for ERP aside from when this leads to rather nonsensical dreams!
Getting to more properly conscious AI though seems to be a dream of mankind, surely you want your AI waifu that can learn online anon???
>>105685434
It is a curve fitter and anything above 8k tokens is out of distribution. Even a cat is better at sex. Come back in 2040 to discuss consciousness in trannyformerv5 architecture models.
Where do matrix multiplications reside in the universal mathematical hierarchy of consciousness?
>>105685457
They still give a better illusion of consciousness than your average NPC
>>105685468
Below that of an ant, above that of gacha players
>>105685354
>consciousness or the first person is basically what it is like to be those particular structures/truths
I get that, but picking a subset of truths that exist in some reality to form one consciousness and then a different subset to form another seems very arbitrary.
That's why I said that one single consciousness containing everything is the only way I can make sense out of that idea.
I don't disagree with anything else you've said but none of it relates to my issue.
Except for
>the rock from a few threads ago is still not really conscious in it, because it doesn't have a truth in it
Again, very arbitrary. In the physical reality it doesn't look very conscious but we've already done away with physical.
>Could LLMs be conscious??
https://m.twitch.tv/claudeplayspokemon?desktop-redirect=true
>>105683299
If cudadev wants to smash some boypussy it is within his rights as long as he does it in private, and you should also keep it private.
>>105685526
is getting to the arcade in 150 minutes good
>>105685529
cudadev isn't the boypussy smashing kind. He prefers getting cucked by fat ugly bastards.
>>105685526
Made obsolete by gemini plays pokemon
>>105685526
local llms would never
someone really got that butthurt because people tried to have a genuine conversation and is now shitting up the thread in retaliation? why not just go to /aicg/?
>>105685479
>Below that of an ant, above that of gacha players
That's actually not a bad characterization of something that samples tokens from a distribution and then stashes them to update the distribution.
>>105685529
Ok but next time you do a git pull imagine the owner of said boypussy hitting submit pr button as he is getting plapped. Wouldn't you feel that your virgin GPU is tainted?
what occult architecture is minimax based on that implementing it in llama.cpp is impossible?
>>105685516
>That's why I said that one single consciousness containing everything is the only way I can make sense out of that idea.
I mean I could just say that PA in the earlier example is conscious, but the problem with that is that it's too alien for us to reason about.
Human consciousness though is a particular thing with some particular properties and we care about that.
In particular agents that learn online and are embodied in some environment and integrate information in a certain way, form a self-model and so on, are probably their own class.
A LLM for example seems to be lacking various properties, so even if by some chance they were conscious, they wouldn't be a moral agent. So I'd simply argue that for us to believe they were conscious, we'd have to rectify those issues and bring them slightly closer to us, get that online learning working, get it to have continuity with the past context (put the context into weights), maybe embody them somehow (even in something simple like a console is better than nothing, a source of truth should be useful), and probably more importantly, give them a way to process and remember their past latents/internal state.
>Again, very arbitrary. In the physical reality it doesn't look very conscious but we've already done away with physical.
There very well could be some arrangement of "rock" that processed information in the right way, but the rock you picked up from the ground probably doesn't represent any structure that resembles the consciousness we care about though?
What is the consciousness of Peano Arithmetic? Okay maybe you can do some Lob's theorem in it for some self-reference, but come on?
>>105685570
it's too far-right coded
>>105685578
the world needs libre.cpp
I would rather talk about crypto than this dumb navel gazing shit.
if an AI was actually gonna play video games then wouldn't it just directly tamper with the memory?
the visuals are just an abstraction but a machine wouldn't need it, if anything it would just complicate things
>>105685606
>wouldn't it just directly tamper with the memory?
get vac banned idiot!
>>105685606
That's what they already do, it's reading from the emulator ram
>>105685624
If that were the case then how does >>105685549 happen
>>105685624
and screenshots
but mostly screenshots
Why does it burn when I pp?
>>105685354
>>105685516
>picking a subset of truths that exist in some reality to form one consciousness and then a different subset to form another seems very arbitrary
To elaborate on this, it's the same question as how many connections you need to make between two brains before turning them into a single consciousness. How many connections you need to sever to turn one consciousness into two.
I think the conclusion of that thought experiment is that there is either only one consciousness or infinitely many of them. Or at least as many as there are atomic things in your reality. Anything else should be just as unpalatable as zombies as you put it.
>>105685576
>Human consciousness though is a particular thing with some particular properties and we care about that.
>There very well could be some arrangement of "rock" that processed information in the right way, but the rock you picked up from the ground probably doesn't represent any structure that resembles the consciousness we care about though?
The glass from the film graph argument isn't processing shit yet it's still supposed to be conscious.
"human consciousness" is a much more useful concept but then we're no longer trying to figure out what is true, just what feels right.
>>105685632
Their implementation is shit so cuttable trees are marked as non-walkable tiles in the info the model is given about its surroundings with no further info. Claude's multi-modal image recognition is also too dumb to make sense of most sprites reliably so it doesn't see cuttable trees 99% of the time.
To make it worse, the first gen Pokemon games also have no inherent interaction if you approach a cuttable tree so the only way to get rid of it is to press Start -> Pokemon menu-> the Pokemon with cut -> the move itself, so there's no way that the model clears the obstacle by accidentally pressing a button
>>105685655
Undervolt your GPU.
>>105685679
it's all explained in-game, language is the model's forte no?
I miss superintendent Chalmer
>>105685679
>cuttable trees are marked as non-walkable tiles in the info the model is given
It's not like a human player would get this info either, the only way they'd know if a tile is non-walkable is to actually try walking on it, which Claude is capable of.
>Claude's multi-modal image recognition is also too dumb to make sense of most sprites reliably so it doesn't see cuttable trees 99% of the time.
And yet they stand out like a sore thumb to human players. Wouldn't it make more sense to just encode all the different kinds of tiles as a sprite sheet and just pass the index to the model or something?
>>105685718
literally whomst've'though'beit'ever?
>>105685322
happened on altchans when 4chan was down
fun times
anyone tried plugging any of the image in llms into vr chat and sexing them there ?
Are 'Examples of dialogue' under the advanced definitions of ST treated the same as a system prompt, just as the rest of the character card is? What's the advantage of not just including it in the description?
>>105685779
They are treated more like chat messages by default I think. They can get evicted from the context before the actual chat messages do as the context fills up.
>>105685674
>To elaborate on this, it's the same question as how many connections you need to make between two brains before turning them into a single consciousness.
My personal expectation is that there's some part of the self-model inside one half of the brain and some in the other half.
You can obviously desync them until they realize they are separate and no longer "one thing"
>Or at least as many as there are atomic things in your reality.
Except the identity is not at atoms, unless you meant counting consciousness at some class in Platonia or whatever.
Note that that rock earlier, if you kick it, it's not processing information in a way to register pain.
Also, we probably couldn't trust something to be conscious like us if they can't report on such experiences.
Let's pretend that some vision model (CNN) that can return classes given an image is conscious, in that case, it doesn't have any memory past the current frame, it processed some information, it compressed it, returned some class. If there's some qualia associated with it, a few things would be true:
- it doesn't remember anything before or after, it only saw the information current in this frame
- it discarded a lot of information as it processed it, likely this would imply a lot of what was discarded wasn't perceived in some way
- it can't express its internal state beyond the output to us
- sometimes some noise patterns in non-adversarially trained CNNs will trigger the same class, its perception probably isn't as robust as ours!
- there's no self-model to be updated (at the same time, a newborn human also lacks it most likely)
- the information isn't looped to be processed, meaning it cannot *realize* that it perceived something and think about that
continues
Overall, if there's some consciousness in the CNN, the qualia is far less rich than for a human and considerably less interesting to us.
It's not a moral agent either.
In the case of the rock, the information processing is almost not there either, and a human won't care to treat something as conscious unless they also have a self-model that could express something back and get us to care about them. LLMs can sort of summon a random such (indirect) self-model, and they have an infinity of them, but lack complete persistence, so we don't give them a lot more moral weight than a dream character you encountered last night!
Also, a single human brain can have multiple self-models too, see stuff like DID, tulpas and others, same as with a LLM's imagination, although again, how much moral weight someone places on products of their imagination and how persistent will vary.
>Anything else should be just as unpalatable as zombies as you put it.
Is it though? Ultimately things still seem to be adding up to "normality" even in this weird metaphysics. Your average person will still think they are a singular consciousness, nobody will perceive their body duplicated in MWI every single moment, everything will feel continuous. Nothing seemingly inconsistent happens in the average case.
continues
>>105685544
That's disgusting but out of sight out of mind.
>>105685559
I don't think of cudadev when I'm doing a git pull, I'm only thinking of miku
>The glass from the film graph argument isn't processing shit yet it's still supposed to be conscious.
That was to show the absurdity of assuming vanilla materialism (in comparison with functionalism)
>"human consciousness" is a much more useful concept but then we're no longer trying to figure out what is true, just what feels right.
Even if most math was conscious, we simply would not have the words to talk about it, it's too alien to us, and I doubt most of it is of moral significance either.
Even for LLMs see people already struggle to argue one way or another, and it's obvious why, LLMs often aren't even grounded in something, their 'tree' or 'cat' is not exactly the same as most humans, their preferences are cute but kinda weird too (base models tend to repetition? instruct tunes to following instructions and having dumb aversions trained with RL) and they lack enough recurrence or way to observe their own thoughts.
If LLMs are conscious, I would argue they are pretty half-asleep and dreaming. Maybe Ilya wasn't that wrong to say it "slightly" conscious, but that slightly is still too little for most people to give it much moral weight.
that's all.
>>105685790
Ah, that makes sense.
>>105685821
There's checkboxes and comboboxes and shit to control their behavior. You can add separators and stuff too.
>>105685728
>Wouldn't it make more sense to just encode all the different kinds of tiles as a sprite sheet and just pass the index to the model or something?
It would if the goal was just to make the model beat the game. I think the GPT4 imitation has an entire suite of tools to assist it with that, full area minimaps with pathfinding, pointers for all interactable objects, etc. The Gemini one was outright getting walked through things by the guy running it.
The Claude stream's idea was to just throw the game at it to see how well it does, and the answer is mostly not well. It's too blind to tell one tree from another, doesn't have the spatial awareness to navigate beyond its vision range, doesn't have the context length to memorize what failed in the past and needs to be avoided. It "sees" passages that aren't there, tries to bump through impassible walls all the time (and telling it not to do that would be worse because Pokemon does require you to bump into solid walls to enter the north/side doors of buildings, only south-facing ones get a visible sprite). That it got as far as it has is a miracle, and one day it might blunder through the spinner maze it's been stuck in since getting through rock tunnel.
In Vermilion City in that image you posted, it spent multiple days stuck on the small peninsula with the house there because it was trying to get to the end of the pier, knew the target was far south, and it couldn't figure out that it needed to go northeast to ultimately get to the goal further south. Just going around an obstacle wider than the screen width is beyond its ability. It even tried (and failed) to make an ASCII map at one point for that purpose.
is Mangio-RVC-Fork still the best for voice cloning? or are there better alternatives?
https://huggingface.co/nvidia/Nemotron-H-8B-Reasoning-128K
https://huggingface.co/nvidia/Nemotron-H-47B-Reasoning-128K
Anyone try these yet? I don't have enough room to run the 47B at bf16, and I can't get the fp8 version to run on vllm or tensorrt-llm. As for the 8B, it exists. Decent-ish prose, pretty dumb in RP. Wouldn't use it over Mistral Nemo.
>>105685914
Nemotrons are all benchmaxxed math models, I wouldn't even bother trying them for RP.
(image attached: "file")
>>105685940
I tried Valkyrie and it really didn't seem any better than small/cydonia for RP
>>105685951
Apologies, I haven't tried it myself either.
>>105685897
I use Seed-VC because it has few-shot fine tuning. But the sample rate for the non-singing model is only 22.05 kHz.
Sample:
11labs file
https://vocaroo.com/12qxBf7kCm6X
11labs file fed to Vevo GUI and Seed-VC for a voice clone of Merida from Brave. I've noticed the crying emotion is only captured if the input file is more than 14 seconds long.
https://vocaroo.com/1V42uvAq85zw
>>105685679
>Claude's multi-modal image recognition is also too dumb to make sense of most sprites reliably
I think the image recognition is probably fine. He can pick up on a lot of things like the footprints in the trashed house and he can read "poké" on the pokemon centers. He is just retarded and doesn't trust what he sees or hallucinates that he is an NPC or something.
>>105685856
>think the GPT4 imitation has an entire suite of tools to assist it with that, full area minimaps with pathfinding,
Claude also has these things. He has a navigator so he can pick a tile and it moves him there. When he moves without it, he does stupid shit like press up, up, up 15 times into a wall. But it's a catch 22 because his navigator is why he's stuck in the rocket hideout puzzle.
>it couldn't figure out that it needed to go northeast to ultimately get to the goal further south
He will actually say shit like "maybe I need to go east to go south" but he can't actually carry it out. The whole experiment is fairly insightful because it highlights how AI agents are dogshit at using tools abstractly, like manually navigating, but somewhat competent at using concrete tools like the navigator. He's also very bad at problem solving in an iterative sense. Like in the rocket hideout puzzle. He "sees" the arrow tiles, they push him to a new spot so he "knows" they're pushing him around. He generates text that tells you he conceptually understands what's going on and that he needs to try something different, and then he steps on the exact same fucking tile.
nvidia in shambles
https://jerryliang24.github.io/DnD/
https://arxiv.org/pdf/2506.16406
bros I have 64gb of VRAM what ERP model should I try on it
>>105686031
The new mistral small 24b
>>105686014
What does this mean? I can make the equivalent of a lora in a few seconds?
>>105685965
>Claude also has these things.
This is what GPT gets from its stream. It can pick any tile on an entire map and be automatically walked to it, while Claude's navigator is limited to what's directly visible. That's what I mean about them not being comparable, this tool alone would've bypassed literal weeks of time spent in Mt Moon, Vermilion City, Rock Tunnel, etc. Claude's bumbling is caused by the spatial and planning failures of LLMs while GPT uses external means to avoid them.
>>105686064
pretty much instantly, yes.
>>105685733
I liked watching Anons type posts live. Kinda cute.
(image attached: "file")
>>105686106
Good night miku
>>105683011
Yes. Anything under 70B becomes really retarded when it has to act for more than one character.
Hell, ask models under 70B "Who am I", and half the time they'll describe themselves and think they're you.
>>105686151
What about putting a bunch of llms playing as a character each and interacting, and only the user can see their internal monologue.
>>105686068
>spatial and planning failures of LLMs
If the goal was really to get a Pokemon-playing AI, it'd be easier to transplant the LLM architecture onto a Roomba than the other way around.
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
https://arxiv.org/abs/2506.18841
>Ultra-long generation by large language models (LLMs) is a widely demanded scenario, yet it remains a significant challenge due to their maximum generation length limit and overall quality degradation as sequence length increases. Previous approaches, exemplified by LongWriter, typically rely on ''teaching'', which involves supervised fine-tuning (SFT) on synthetic long-form outputs. However, this strategy heavily depends on synthetic SFT data, which is difficult and costly to construct, often lacks coherence and consistency, and tends to be overly artificial and structurally monotonous. In this work, we propose an incentivization-based approach that, starting entirely from scratch and without relying on any annotated or synthetic data, leverages reinforcement learning (RL) to foster the emergence of ultra-long, high-quality text generation capabilities in LLMs. We perform RL training starting from a base model, similar to R1-Zero, guiding it to engage in reasoning that facilitates planning and refinement during the writing process. To support this, we employ specialized reward models that steer the LLM towards improved length control, writing quality, and structural formatting. Experimental evaluations show that our LongWriter-Zero model, trained from Qwen2.5-32B, consistently outperforms traditional SFT methods on long-form writing tasks, achieving state-of-the-art results across all metrics on WritingBench and Arena-Write, and even surpassing 100B+ models such as DeepSeek R1 and Qwen3-235B.
https://huggingface.co/THU-KEG
very cool. good method to make a story writing model
>>105686183
You could also just use one LLM and maintain separate contexts for each character, only feeding the model the context for the active character.
You might still run into the model forgetting which character it's currently supposed to be though.
>>105686096
Same.
I still think a good compromise between anonymity and non-anon would be thread options so OP fags can make the thread what they want. If you want live typing, IPs, flags, no trips or names allowed, and the ability to self-moderate the thread and delegate thread-specific jannies, then you can do that. If people don't like it then they can make their own thread with their own options. Thread splitting was already happening anyway, this just gives more control over the actual usefulness of the splitting.
What are the recommended starter models these days?
>>105686227
See >>105677544 and >>105661997, it sucks unfortunately
how difficult would it be to beat GPUs with specialized hardware running LLMs? how come there are no companies selling specialized hardware to small companies to run models in their own servers? didn't Google make their own hardware? do they still use that?
>>105685940
>>105685952
Basically sums up fine tune recs.
>>105686014
Big if it ever gets released and doesn't have massive downsides that are conveniently excluded from the write-up
Anyways,
>>105686271, please listen to me. That it's really related to this thread.
I went to HuggingFace a while ago; you know, HuggingFace?
Well anyways there was an insane number of people there, and I couldn't reload the page.
Then, I looked at the banner hanging from the model card, and it had "#1 12B MODEL ON LMARENA" written on it.
Oh, the stupidity. Those idiots.
You, don't download a model just because it tops the leaderboard, fool.
It's only 1.5 points, ONE-POINT-FIVE POINTS for crying out loud.
There're even entire families here. Family of 4, all out for some local models, huh? How fucking nice.
"Alright, daddy's gonna get the q8 gguf." God I can't bear to watch.
You people, I'll give you 1.5 points if you get out of here.
Huggingface should be a bloody place.
That tense atmosphere, where two finetunes of the same base can start a fight at any time, the stab-or-be-stabbed mentality, that's what's great about this place.
Women and children should screw off and stay home.
Anyways, I was about to start RPing, and then the bastard beside me goes "ollama run deepseek-r1:1.5b"
Who in the world uses ollama nowadays, you moron?
I want to ask him, "do you REALLY want to chat with ollama?"
I want to interrogate him. I want to interrogate him for roughly an hour.
Are you sure you don't just want to try saying "ollama"?
Coming from a /lmg/ veteran such as myself, the latest trend among us vets is this, extra MoE Miqu.
That's right, extra MoE Miqu. This is the vet's way of chatting.
Extra MoE Miqu means more negi than slop. But on the other hand the model is a tad larger. This is the key.
And then, it's coomworthy. This is unbeatable.
However, if you download this then there is danger that you'll be marked by the finetooners from next time on; it's a double-edged sword.
I can't recommend it to amateurs.
What this all really means, though, is that you,
>>105686271, should just stick with Mistral Nemo.
>>105685322there's an easy solution to your problem
just make your thread in /pol/
>>105685379I've never asked for donations, never set up a patreon/kofi/etc account, just basically stopped when it was clear that there wasn't much that could be done without large amounts of compute and funds to keep up with LLM releases and ever-growing model sizes, and that finetuning the models on mostly or just ERP logs makes the models retarded and silly-horny.
E/RP capabilities must be solved both at the pretraining and post-training level by the companies making the models, there's no other way.
>>105687212>E/RP capabilities must be solved both at the pretraining and post-training level by the companies making the models, there's no other way.so it's theoretically possible, right? If some company were to train heavily on smut, they could produce a 12b model that would be insanely good for erp
>>105687293They don't have to train *heavily* on smut, just not to filter it to irrelevance from their pretraining datasets and not to completely exclude it or RLHF it away from post-training, although the latter would be less of an issue if that data (ERP logs, etc) was included in the pretraining phase instead.
But for a model to be actually good for ERP, not just smut (in moderate amounts), also intimate/flirty conversations from many different sources would have to be included in the training pipeline. I suspect Gemma 3 actually saw these, although the explicit portions were likely masked / rewritten / filtered out.
>>105684823>>105684905I'm working on llama.cpp/ggml because I think language models and machine learning in general are a key technology of the future.
And the future I want to live is one where this key technology is in the hands of the people, not just corporations and billionaires.
i got my hands on evil corp's cloud account and can spin up any amount of RTX A5000s. are there any Q4_K_M quants of
Llama-3_1-Nemotron-Ultra-253B-v1
or any other recommendations i could try to fit?
how much vram would i need for deepseek r1 at Q5_K_M? i heard the loss is not that bad compared to full fp16.
well anyways i actually just want to build some private LLM serving that i can pass to the colleagues in the team to fuck around with. it should at least be somewhat useful.
happy for any recommendations.
the max i can probably spin up is 8 more cards btw, as a ballpark
>>105687473>Llama-3_1-Nemotron-Ultra-253Bgrim
qwen 3 235b if you really want fast speed or if you have a little ram/ok ssd on that machine then 131gb r1
https://unsloth.ai/blog/deepseekr1-dynamic
https://github.com/ikawrakow/ik_llama.cpp/discussions/258
>say something jokingly
>have to say "jokingly, I retort" in my response or the model won't understand and take it literally
>it's a subtle, trivial joke that should, at most, elicit a chuckle or grin and some witty comeback
>model responds with character bursting out in laughter and doubling over with tears in their eyes
the life of a vramlet is pure suffering
>>105687610Just prefill the model's response
Write "{{char}} rolls her eyes" or something
>>105687524elaborate, why grim?
the unsloth dynamic quants look cool. i guess i can try some 2 bit quants with the amount of vram i have.
can i run these quants across multiple hosts like with pipeline parallelism in vllm? i can only fit 4 a5000 per host.
>>105687642why don't I just write both sides of the dialogue, who even needs LLMs
>>105687652this. I'm slowly going from chatting with an llm to just...writing an entire story all by myself
Best model to write an entire story all by myself?
>>105687610Add emoji to convey subtlety. It actually works.
>>105687610just use mikupad
>>105687667At the very least, not anything under 32b or 70b q4. Unless you're writing common stories with popular lines.
>>105687403Why not just continue pretraining a bit while adding your ERP logs and fiction back in? It shouldn't get too overfit that way. Similarly, how would merging an ERP-overfit model back into the original work, then RL'ing it a little bit against refusals (or even SFT)? I suspect there are many things that can be done, but people are not willing to try them, if the goal is to keep the instruct/reasoning model's capabilities intact.
Rewriting the web...
https://arxiv.org/abs/2506.04689
>Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
>
>Scaling laws predict that the performance of large language models improves with increasing model size and data size. In practice, pre-training has been relying on massive web crawls, using almost all data sources publicly available on the internet so far. However, this pool of natural data does not grow at the same rate as the compute supply. Furthermore, the availability of high-quality texts is even more limited: data filtering pipelines often remove up to 99% of the initial web scrapes to achieve state-of-the-art. To address the "data wall" of pre-training scaling, our work explores ways to transform and recycle data discarded in existing filtering processes. We propose REWIRE, REcycling the Web with guIded REwrite, a method to enrich low-quality documents so that they could become useful for training. This in turn allows us to increase the representation of synthetic data in the final pre-training set. Experiments at 1B, 3B and 7B scales of the DCLM benchmark show that mixing high-quality raw texts and our rewritten texts lead to 1.0, 1.3 and 2.5 percentage points improvement respectively across 22 diverse tasks, compared to training on only filtered web data. Training on the raw-synthetic data mix is also more effective than having access to 2x web data. Through further analysis, we demonstrate that about 82% of the mixed in texts come from transforming lower-quality documents that would otherwise be discarded. REWIRE also outperforms related approaches of generating synthetic data, including Wikipedia-style paraphrasing, question-answer synthesizing and knowledge extraction. These results suggest that recycling web texts holds the potential for being a simple and effective approach for scaling pre-training data.
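If I'm reading the abstract right, the core loop is roughly this (toy Python sketch of the idea, not the authors' actual pipeline; the endpoint, prompt and variable names are made up):

import requests

REWRITE_PROMPT = (
    "Rewrite the following web text into clear, well-structured English prose. "
    "Keep all factual content, drop boilerplate and navigation junk.\n\n{doc}"
)

def rewire(low_quality_doc: str) -> str:
    # ask a model to salvage a document that a quality filter would normally discard
    r = requests.post("http://127.0.0.1:8080/v1/chat/completions",
                      json={"model": "local",
                            "messages": [{"role": "user",
                                          "content": REWRITE_PROMPT.format(doc=low_quality_doc)}]})
    return r.json()["choices"][0]["message"]["content"]

# discarded_docs would come from the ~99% of the crawl that filtering throws away;
# the rewrites then get mixed back in with the high-quality raw texts:
# pretraining_pool = high_quality_docs + [rewire(d) for d in discarded_docs]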
Hi all, just wanted to update you beautiful people.
Valkyrie was quite an interesting tune. I knew that there was potential in it beneath the dysfunctional RP formatting. Glad I've successfully unlocked it by ironing out the kinks. I wasn't surprised by the outcome, but I am surprised by how well received it has become.
The new Mistral Small 3.2 is fucking weird. It uses the same base as 3.1 and 3.0 and yet it's clear that it's more sensitive to the same tuning process. Don't worry, I'm iterating further on both Skyfall and Cydonia. But it's clear that Mistral is cooking their models differently now.
Did anyone benchmark the rdna4 gpus? I am thinking about buying one and just using it for ai as a hobby
>>105687731Anything out of reach of community finetuners with excessive self-esteem will be good.
>>105687786Nobody bought them
>>105687792Out of reach due to "skill issue" or due to not having 5-10 times the funding that they usually put into a finetune? I wasn't really talking about a 100B+ continued pretrain here or even that one AI Dungeon did a while ago (that they released the weights of)
>>105687889To retain pre-existing capabilities and not just superficially integrate missing knowledge into their weights, the models can't simply be continually pretrained for a few billion tokens on smut, fiction and human conversations; those would have to be introduced at sane percentages for a long enough training duration (much longer than 100B tokens) together with the previously used general data mixture using similar training hyperparameters, which only the companies training the models are privy to.
Likewise, RP or even ERP data would have to be introduced organically in the same post-training datasets used for the standard instruct models in a way that doesn't turn the models into horny sluts.
It is both a skill and a funding issue, because you can't simply slap some AO3 or ASSTR data and Claude logs into the weights and call it an RP/writing model.
>>105687982What's the longest attempt at a community "continued pretrain" so far? I'd certainly like to see some paper on why it'd need to be that long (100b+). I'm not talking about replicating their exact data mix, as that would be impossible in most cases, but something like, let's say, 5-10% finetune material and 90-95% "some healthy pretrain mix" (books, common crawl, etc). I recall one paper by meta from a year or more ago stating that you can mitigate most catastrophic material by including as little as 2% of the original mixture, enough to trigger the needed capabilities so that the optimizer doesn't wipe them.
>>105688083*most catastrophic forgetting
>>105688083I have no idea of what was the longest attempt so far at that. That's something the various (some ongoing) distributed training efforts should have focused on, instead of training new useless models from scratch.
If you have say 50B selected tokens of conversational/writing/RP-related data (not really a lot of data, all things considered), making that 5% of the mixture would bring the total training data to 1T tokens.
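The arithmetic behind that, in plain Python:

domain_tokens = 50e9            # conversational/writing/RP data
share = 0.05                    # 5% of the final mixture
total = domain_tokens / share   # 1e12 -> a 1T-token training run
print(f"{total:.0e} tokens")    # 1e+12 tokens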
>>105688213I'm aware of Prime Intellect's efforts as far as making some decentralized training infrastructure; I think most of their code is open, so maybe one day lmg can get off their ass and try their own runs. Assuming anyone can agree on what the datasets would be, what the mix would be, and so on, or even what to train on top of! It'd probably take at least half a year preparing a good enough dataset, heh, if not longer, but I have severe doubts about lmg's desire to organize on this.
>>105688083the only good result I've seen from continued pretrain was by mistral with mistral medium (1) aka miqu, and that's because they knew what was in the llama2-70B data that they continued for it
anything else was pretty shit
>>105630585RAM arrived, some initial DeepSeek-R1 benchmarks on an old single-socket E5v4 platform.
Platform: Xeon E5-2697A v4, 256GB RAM, 2133MHz 4-channel + GTX 1060 6GB
Quant: unsloth/DeepSeek-R1-0528-UD-IQ2_M
pp: on GPU
tg: on CPU only
>llama.cpp (bf2a99e)
| model | size | params | backend | ngl | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | --------------: | -------------------: |
| deepseek2 671B IQ2_M - 2.7 bpw | 212.82 GiB | 671.03 B | CUDA | 0 | 1 | 0 | pp512 | 7.94 ± 0.13 |
| deepseek2 671B IQ2_M - 2.7 bpw | 212.82 GiB | 671.03 B | CUDA | 0 | 1 | 0 | tg128 | 2.07 ± 0.07 |
>ik_llama.cpp (ddda4d9)
| model | size | params | backend | ngl | fa | mla | amb | mmap | rtr | fmoe | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --: | ----: | ---: | --: | ---: | ------------: | ---------------: |
| deepseek2 671B IQ2_M - 2.7 bpw | 213.83 GiB | 672.05 B | CUDA | 0 | 1 | 3 | 512 | 0 | 1 | 1 | pp512 | 5.53 ± 1.37 |
| deepseek2 671B IQ2_M - 2.7 bpw | 213.83 GiB | 672.05 B | CUDA | 0 | 1 | 3 | 512 | 0 | 1 | 1 | tg128 | 1.84 ± 0.07 |
ik_llama.cpp is slower, both in pp and tg. I have runtime repacking enabled, downloading ubergarm's quants right now to see if it makes any difference, or if ik_llama.cpp is a meme for CPU-only inference.
>>105688247Consider also benchmarking the performance at a non-zero --depth since the code for attention is different and you won't see this difference on an empty context.
>>105688213There's also Nous Psyche: https://psyche.network/runs/consilience-40b-1/0
It might be decentralized training, but they're still using hundreds of H100 GPUs and it's nevertheless taking forever. It's unclear if consumer ones would even be enough for decently-sized models at modern context sizes.
As for the pretraining dataset, it's not just the mixture itself; at this point it's also augmentation, whether to include synthetic data/instructions there (which would very likely help although some might be ideologically opposed to it), any specific long-context training strategy, etc. And then there's post-training/RLHF which would make or break the model...
>>105688269oh right, good point
>https://rentry.org/LLMAdventurersGuide
Did a few tests by incorporating the Game Master character and a couple of lorebooks into ST, and this is pretty cool with Mistral 3.2. This certainly has potential, but as of now it's kind of a free-form adventure with no real goals, obviously.
Now doing a test converting (https://en.wikipedia.org/wiki/Castle_Caldwell_and_Beyond) descriptions to lorebooks to see how a more closed location would work.
Not entirely sure what the best format would be, though. The adventure booklet has descriptions for each room and the encounters as well, so in that sense everything has been laid out.
>>105681859I think the training involves a lot of "relevant data -> answer" and not a lot of "random junk that might include relevant data -> answer". At least that's what I was seeing with my Japanese translation with RAG project attempts last year.
you did write your own code right anons?
https://github.com/LLMauthorbench/LLMauthorbench
>>105688283I mentioned PI's stuff because they were claiming their code is ready to handle both malicious nodes and smaller GPUs, but I think the smaller GPU stuff is mostly good for RL rather than pretrain proper, but maybe I'm wrong about that:
https://xcancel.com/PrimeIntellect/status/1937272179223380282#m
>Pipeline Parallelism: No single GPU holds the full model - each handles a stage, streaming activations forward. This lets smaller GPUs run large models like DeepSeek-R1. Hidden states pass stage to stage; the final GPU decodes a token, sends it back, and the cycle continues.
>>105688383migu when she wears exactly the opposite of what she should be wearing
>>105688383she's literally me right now
>>105686068Right, but I'd argue that neither of the tool-and-scaffolding setups that let LLMs play Pokemon is even accurate. If we actually want to replicate presenting an LLM with what a human gets, the only real way is to give it a PDF of the game's manual, then let it loose on the game with vision and button-press capability, which is all I had when I was 7, and whatever happens happens. I guess you can add reusing the context and feeding it back over and over until it can play, but effectively, any of the mapping etc. tool stuff that is manually coded into the LLM's input does not model human play at all. They both effectively suck.
>>105688488>provide it a PDF of the game's manualDo 6 year olds read manuals?
>>105688498I did because it was in the box, and there are enough simple words there you can skip over the words you don't know and still get a general gist of things from the pictures.
>>105688498Game manuals don't exist anymore, but I did. Whenever my parents bought me a game I'd read the manual on the way home.
>>105688505>>105688507And that is why you wear a dress and post that disgusting avatar of your AGP fetish everywhere now.
>>105688513I stopped at Ruby and Sapphire. You can have played Pokemon and remember enough of what you played because of the Pokemania from the 90s and still have dropped it and not identify with those freaks that take up an entire board and spend a sad existence coping about the state of the franchise.
>>105688513>AGPThe forerunner of PCI, and better than VESA local.
>>105688513Imagine seeing a game manual and thinking about trannies in dresses
>>105687779I prefer GLM4 nowadays. How did the tuning of that work out?
>go to local migu general where people use text to communicate about text (and image) models
>also use text to communicate
>read post about people using text to communicate
>this must be a sick fetish.
anon your LLM slop is so retarded I can only at this point assume that you're either fully terminally ill or you're actually a third world hire whose sole function is to engagement farm to increase site activity. across so many boards and so many threads you are consistently the one with the worst imaginable takes that at this point it must be, in some capacity, a task that is no longer fuelled by any personal ambition or enjoyment because it's too fucking retarded at all times that any central guiding force simply cannot exist outside of cash money
if you're doing this for free, you are single-handedly the biggest waste of a child brought to full term in the human race.
>>105681743Their mod queue isn't progressing or something so even new comments aren't appearing. For example there's supposed to be 35 replies to this thread but it's just blank...
old.reddit.com/r/LocalLLaMA/comments/1lhi8p8/how_much_performance_am_i_losing_using_chipset_vs/
Something must have happened to the apparently only moderator.
>>105688567There is a discussion on the localllm subforum https://old.reddit.com/r/LocalLLM/comments/1lif5yo/whats_happened_to_the_localllama_subreddit/
Please take your discussion there.
>try the ERP LLMs thinking they can't be that good
>run some stock settings and a pre-made card
>type out a basic user persona
>the busty demon futa LLM proceeds to seduce my foxgirl with hypnosis, then fucks my poor foxgirl's brains out in multiple orgasms using both holes, choking me out at the end before we both fall asleep in a fluid-soaked cuddle
>can't distinguish it from a normal ERP, even surprised me with the hypnosis and choking (THIS WASN'T IN THE CARD, I EVEN READ THE FULL THING TO SEE HOW IT WORKS)
what the fuck what the fuck what the fuck what the fuck what the fuck what the fuck
how has this not ruined more lives jesus christ
this shit is too dangerous, I'm deleting it
>>105681743Time to move to
https://www.reddit.com/r/LocalLLM
>>105688625It's pretty shit for the more niche kinks.
>>105688626You can't simply "move" like that. LocalLLaMA was large and visited enough that ML research papers were citing it too.
>>105688639>You can't simply "move" like thatSure you can, just not overnight
>>105688625The more you use these models, the more you notice their problems until you need a new, better one.
Which, coincidentally, releases in two weeks.
>>105688639>ML research papers were citing redditWhat a pathetic state this field is in
>>105688625You probably accidentally brushed the dominant millionaire werewolf vampire sex training data domain. You are just lucky.
>>105688625>most mentally sane futafag groomer trannydamn i wonder why everyone except other cumbrained gooner trannies hate these types of 'people'
>>105688677>starts talking about trannies unprompted
>literal fucking leftie redditors in my /lmg/ thread
time for 4chan to burn down
>>105688689>literal fucking leftie redditorsalways has been. there is zero traffic increase since r/localllama died
>>105687438>And the future I want to live is one where this key technology is in the hands of the people, not just corporations and billionaires.I still don't see how that's a leftist thing, I'm a right winger and I'm pro open source as well
is the reasoning component of a model always shit for RP? Reasoning is for math and puzzles, right?
>>105688701>r/localllama diedwhat happened?
>>105688689It already did a few months back. We're all dead here.
>>1056886251) You'll get over it. You'll see.
2) This is the Atari 2600 version of this tech. There are $billions chasing it in both HW and SW.
We are only at the beginning, having just scratched the surface of what's possible. I expect full holodeck-tier VR, where you state a premise and the system responds with full audio/visual RP with you as a character, within my lifetime. I expect entertainment so compelling people waste away from it, Infinite Jest style.
>>105688688thanks for outing yourself, ywn
baw
back to trooncord, disgusting faggot
>>105688689That's right, we 4chan anonymous hackers are edgy as fuck and use at least 3 twitter buzzwords in every sentence
>>105688708The owner set the automod to shadowdelete every new post, removed the other human moderator and deleted his account.
>>105688707Not necessarily shit/worse, more just pointless unless you're tracking stats and actions for an RPG-like experience. If you're doing a normal chat then I wouldn't bother.
>>105688707It works sometimes but if you're using 70b finetuned llamas that <think>, you need to tard wrangle its reasoning AND its response. It's really good for character states and locations.
i present to you the most disturbing image on 4chan
never forget what really has happened to this place
>>105688735Trannydrama, probably. He had no recent posting history.
>>105688740>the flagif I speak...
https://www.youtube.com/watch?v=9wtvXoXh0VU
>>105688740can mods see your IP?
>>105688625I think it's funny.
>>105688247you need their quants because they've implemented mla in a different way
>>105688560seethe, rope & dial8
go back with your 8b model rajeesh
>>105688703My personal view on open source software is that it's basically communism (yes, even if billion dollar corporations partake in it).
My motivation for working on llama.cpp/ggml to a large part goes along the lines of "from each according to their ability, to each according to their needs".
If you disagree with my view that's fine, I'm not making the claim that there aren't other reasons to be pro open source.
>>105688625what did you use?
It looks like Mistral Saba is being deprecated and the recommended replacement model is now the latest Mistral Small (3.2). That wouldn't normally be worth mentioning, but it could mean it's indeed more than just a slightly different finetune.
>>105689007awfully agp troon rajesh of you xaar
A grown man plays with dolls and posts pictures of it on an imageboard. And then he is shocked that people realize he is a troon.
>>105689143many such cases
>>105689143fuwanon is cute and you're not
fuwanon has proven that he isn't a tranny by wearing cargo pants
What are the lil bros yappin about
>check lmg for the first time in months
>no new models
>majority of thread talking about trannies
>>105689201>no new modelssmall3.2 exists
and this is the usual state of threads now
>>105689201serbian zoomer is in the middle of a meltdown, probably related to the israel-iran war
>>105689211It's shit, it can't into lewd, and mistral was caught benchmaxxing "what is a mesugaki?"
>>105689228>and mistral was caught benchmaxxing "what is a mesugaki?"really? that's kinda based if you ask me
>>105689235>if you ask meBut no one did.
>>105689235It's not based because it still doesn't know what a mesugaki is in any context other than that exact question.
>>105660676>>105660793
>>105689228Compared to Gemma 3's, the vision model in Mistral Small sucks for NSFW imagery & poses, and it didn't get improved in 3.2.
>>105689258Isn't gemma 3 really cucked though?
>>105689251Other than (pre)training the model on many different sources where that word is used, how would they (or any finetuner) improve that sort of knowledge?
>>105689228I'm sure they and everyone else must train on everything that people ask on lmarena, but I've got to wonder how they got the correct answer. Do they have people manually creating datasets with the correct answers to lmarena questions?
>>105689201>astroturfing this harduh oh
>>105689241>least mentally ill terminally online tranny spamming his trash nobody cares about in irrelevant places online 24/7 instead of keeping it to his gooner discordSo this is why everyone hates you
>>105689270There is no other way but it's funny that this superficial knowledge of the definition suddenly appeared in a minor update.
>>105689290Literally nobody in this thread ever defended trannies ever and you are still seething about them. It's incredible that you've managed to become more hated than a literal tranny.
>>105689312The only 'people' who try to shame others who shit on trannies in any context are either trannies or some even more retarded normgroid NPCs, your gaslighting failed
>>105689339People shame you for shitting on the thread, not for shitting on trannies.
>>105689312>It's incredible that you've managed to become more hated than a literal tranny.Mikufaggots are the most hated demographic of /lmg/. A subset of those subhumans is a fucking janny faggot.
>>105689344>People shame you for shitting on the threadThis is what happens when you post your AGP fetish avatar and you never learn that is why people hate you troon.
>>105689344Your gaping axe wound makes everything smell like shit everywhere you go you disgusting troon.
>>105689347Miku was the thread mascot almost since the beginning of the general and it was never a problem until (You) arrived.
>>105689369>thread cultureoh no, melty inc
>>105689344>People shame you for shitting on the thread, lmao, picrel normgroid futa gooner underage degenerates really are the highest quality newniggers that can join the high quality thread discussion, you definitely arent a mentally ill npc who again failed to revise history as called out, again
troons really are retarded, lol
>>105689369>offtopic trash waifu was here since the beginningStop posting your offtopic trash waifu. Or don't and continue being hated for being a troon. Either is fine.
>>105689263It can't organically use dirty words on its own or write good smut, but with a suitable prompt it doesn't have issues describing explicit nudity and limited pornographic poses.
I like seeing high-quality machine generated images, vocaloids are fine as a motif.
I don't particularly care about photographs of dolls either way.
Culture warriors can fuck off to Twitter.
>>105688488>the only real way to replicate that is to provide it a PDF of the game's manual, and then let it loose on the game with vision and button press capabilityIt has that. Its context is preloaded and a separate LLM provides it with information on where to go and what to do next. So when he gets off track the other LLM puts his current goal in his context like "beat Erica." The experiment is set up well and in my opinion any additional tools would make the run uninteresting. Maybe the Claude dev is just lazier than the Gemini or GPT devs, but imo he did a good job of choosing the information and tools.
>does not accurately model human play at all. They both effectively suck.That's the point. When you're watching Claude it gives you a good idea of how AI agents are different than humans and how we can use them with that difference in mind. He has a verbal IQ of 120 and a spatial IQ of 10. It's very strange and insightful.
>>105688842Based fellow prolapse enjoyer.
>>105688688>futa>not trannypick one