/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105621559 & >>105611492
►News
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1
>(06/17) SongGeneration model released: https://hf.co/tencent/SongGeneration
>(06/16) Kimi-Dev-72B released: https://hf.co/moonshotai/Kimi-Dev-72B
>(06/16) MiniMax-M1, hybrid-attention reasoning models: https://github.com/MiniMax-AI/MiniMax-M1
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105637275
--Testing and comparing DeepSeek model quants with different prompt templates and APIs:
>105639592 >105639622 >105642583 >105643681 >105645413 >105645528 >105645701
--Evaluating M4 Max MacBook Pro for local MoE experimentation with large model memory demands:
>105637592 >105638219
--Kyutai open-sources fast speech-to-text models with fine-tuning capabilities:
>105639979 >105640000 >105640760 >105640007 >105640018
--Modular LLM architecture proposal using dynamic expert loading and external knowledge database:
>105641597 >105641628 >105641659 >105641648 >105641653 >105641685 >105641726 >105641756 >105641804 >105641940 >105645079 >105641795 >105641812 >105642151 >105641915 >105642294
--Update breaks connection, users report bricked ST connection and attempted fixes:
>105639464 >105641284 >105641926 >105642215
--Testing GPT-SoVITS v2ProPlus voice synthesis with audio reference and UI configuration:
>105641339 >105641350 >105641404 >105641451 >105641616 >105641751 >105641474 >105641493
--Skepticism over ICONN-1's performance claims and minimal training dataset:
>105641987 >105642036 >105642805 >105642828 >105642874 >105642920 >105643020 >105646484 >105646525 >105643676
--Disappearance of ICONNAI model sparks scam allegations and community speculation:
>105646738 >105648123 >105648205 >105648294 >105646807 >105647136 >105649543 >105648502 >105648535
--Community speculation and anticipation around next-generation large language models:
>105645419 >105645430 >105645507 >105645520 >105645551 >105649395 >105649470 >105649547
--Mirage LLM MegaKernel compilation for low-latency inference optimization:
>105643731
--Miku (free space):
>105641532 >105642736 >105642791 >105643345 >105643857 >105644976 >105645907 >105646366 >105649470
►Recent Highlight Posts from the Previous Thread: >>105637282
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Meta bros, need more filter, the copyrights
https://www.reddit.com/r/LocalLLaMA/comments/1lg71aq/study_meta_ai_model_can_reproduce_almost_half_of/
https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/
lecunn
md5: bee33a6edd36a96535ce051b91c228b5
🔍
>yfw it's not even mouse-like intelligence
/lmg/anons are missing out on Wan2.1 self forcing, you can gen a good video in 4 steps with cfg 1
>>105651867Can it technically run, or would it crash? Time isn't an issue and the computer case is AC cooled
>>105652160If you love Claude so much, why don't you marry her?
I'm working on a small Animal Crossing clone that uses an LLM to generate speech for villagers out of algorithmically generated prompts. Currently I use a tune of Mistral Large that I run on a rented machine via exl2, but I was wondering what would be a better alternative model? (I don't plan to host it on end users' PCs anyway; I want more people to be able to play, with the trade-off of it needing an online connection)
>>105652675copyright and patents are the great satan
>>105652633 (OP)pit and sideboob
>>105652755no, the Mouse is the great satan
Copyright had a purpose, but it has been corrupted by jews, as all things are infested by them
>>105652717Post workflow. I'm too lazy to bother figuring out the latest meta.
file
md5: 1dc4cbb0790779d8bcb79999e7e1dbff
🔍
>>105652795https://rentry.org/wan21kjguide
https://litter.catbox.moe/8iy58jyjc58zw8xv.mp4
>>105652820>https://rentry.org/wan21kjguideThanks
>>105652729only LLM speech? why not actions too?
like:
you:
"come with me, i need to show you something!"
npc:
" {"action": "follow_mode", "target": "player", "answer": "sure i will go with you!" }; "
>"Altman says that if you asked for a definition of AGI five years ago based on software’s cognitive abilities, today’s models would already surpass it.
He expects people to increasingly agree we've reached AGI—even as the goalposts move. "
AGI is here bros! Altman says it is! Trust the used car salesman, he knows what he's talking about!
>>105652789the thinking man's fetish
>CUDA error: invalid device ordinal
This keeps happening and the only way I have found to fix it is by using Docker instead of a virtual env...
>>105652852That is handled by a separate smaller model
file
md5: 7ca57fb1710151d22da2ca79385ed62b
🔍
b-bros.. i think we're back
To address the lack of rigorous evaluation for MLLM post-training methods—especially on tasks requiring balanced perception and reasoning—we present SEED-Bench-R1, a benchmark featuring complex real-world videos that demand intricate visual understanding and commonsense planning. SEED-Bench-R1 uniquely provides a large-scale training set and evaluates generalization across three escalating challenges: in-distribution, cross-environment, and cross-environment-task scenarios. Using SEED-Bench-R1, we identify a key limitation of standard outcome-supervised GRPO: while it improves answer accuracy, it often degrades the logical coherence between reasoning steps and final answers, achieving only a 57.9% consistency rate. We attribute this to (1) reward signals focused solely on final answers, which encourage shortcut solutions at the expense of reasoning quality, and (2) strict KL divergence penalties, which overly constrain model exploration and hinder adaptive reasoning.
To overcome these issues, we propose GRPO-CARE, a novel consistency-aware RL framework that jointly optimizes for both answer correctness and reasoning coherence, without requiring explicit process supervision. GRPO-CARE introduces a two-tiered reward: (1) a base reward for answer correctness, and (2) an adaptive consistency bonus, computed by comparing the model’s reasoning-to-answer likelihood (via a slowly-evolving reference model) against group peers. This dual mechanism amplifies rewards for reasoning paths that are both correct and logically consistent. By replacing the KL penalty with an adaptive, group-relative consistency bonus, GRPO-CARE consistently outperforms standard GRPO on SEED-Bench-R1, achieving a 6.7% performance gain on the most challenging evaluation level and a 24.5% improvement in consistency rate.
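If it helps to picture the reward, here is my reading of that two-tier scheme as toy Python (a sketch based only on the abstract above, not the authors' code; the ref_model method name and bonus scale are made up):
# Toy sketch of the GRPO-CARE two-tier reward as I understand it from the abstract.
# Function names and the bonus scale are my own guesses, not from the paper.
def care_reward(sample, group, ref_model, bonus_scale=0.5):
    # (1) base reward: final-answer correctness
    base = 1.0 if sample.answer == sample.gold_answer else 0.0
    # (2) consistency: likelihood of the final answer given the reasoning trace,
    # scored by a slowly-evolving (EMA) reference model (hypothetical API)
    own = ref_model.logprob_answer_given_reasoning(sample.reasoning, sample.answer)
    peers = [ref_model.logprob_answer_given_reasoning(s.reasoning, s.answer) for s in group]
    peer_avg = sum(peers) / len(peers)
    # adaptive, group-relative bonus replacing the usual KL penalty:
    # only correct answers whose reasoning is more self-consistent than the
    # group average get the extra reward
    bonus = bonus_scale if base > 0.0 and own > peer_avg else 0.0
    return base + bonus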
>ai waifu: i sucked anon's dick, what do next?
It's better on LMArena questions.
>>105652883eh whatever, frenchies didn't upload tokenizer_config again because they want you to use their shitty internal library instead
fuck python fuck mistral
>>105652883elo to the moon
beach
md5: 97ee44212f3bcd47c7fd3022a967258a
🔍
>>105652897It uses the latest Mistral tokenizer (v11).
>>105653046file is still needed if you want to make a gguf
they have no reason to not provide since the model itself is hosted in hf safetensor
>>105652883So pajeets are 4 times more likely to vote for it? How did they fuck up so bad.
>>105652729You could do with a smaller model, I think.
For short sentences, even mistral 7B would probably be okay. Make sure to use BNF/JSON Schema to force the output to conform to the correct format, and inject examples of actual out of context dialog to inform the model's own style.
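For example, something like this JSON schema pins the output to the action format sketched above (llama.cpp and koboldcpp both accept a schema or a GBNF grammar, though the exact flag or request field differs per backend, and the extra enum values here are just placeholders):
{
  "type": "object",
  "properties": {
    "action": {"type": "string", "enum": ["follow_mode", "idle", "give_item"]},
    "target": {"type": "string"},
    "answer": {"type": "string"}
  },
  "required": ["action", "answer"]
}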
>>105652855 5 years ago people said that the software from 5 years ago cannot do some tasks, therefore the software from 6 years ago is not AGI.
That does not mean that the software from today that can do those tasks is actually AGI.
>>105652810>they hated him because he told them the truth
>>105652855I was idly thinking about this last night, and it occurred to me:
An AI simply isn't human-level intelligence until its output is indistinguishable from a human's, or at the very least it's capable of coming up with truly novel ideas; ergo, the ultimate benchmark for AGI will be exactly the point where an AI model can produce output of high enough quality to train itself without degenerating.
Interestingly, this would likely be the exact same point where AI can start evolving from AGI into ASI.
>>105652633 (OP)>Previous threads: >>105621559 & >>105611492 INCORRECT!
Real previous thread:
>>105637275
sip
md5: ca4ec78c4029d65577dc4239af3152ce
🔍
>>105653591what did you expect?
>>105653638Hallucinating Slut
So far no LLM has correctly answered questions about buck breaking, one answered it was a hairdo and the other magnets
StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?
https://arxiv.org/pdf/2409.17167
TruthfulQA for susceptibility to hallucination—reveal nuanced patterns. For emotional intelligence, models exhibit improved performance under moderate stress, with declines at both low and high stress extremes. This suggests that a balanced level of arousal enhances cognitive engagement without overwhelming the model.
Working on better memory for private-machine:
https://pastebin.com/VUw3GCCj
And with working on I mean pasting existing code from github and research papers into gemini.
The code in the paste should be a combination of Graphiti with the fancy graph memory in neo4j and some paper I found called "A continuous semantic space describes the representation of thousands of object and action categories across the human brain".
I can't be arsed to install neo4j so I just use networkx and dump it to sqlite.
Anyone know some crazy lesser known projects I could integrate as well? For memory specifically. In the meantime I'll try to integrate some sort of temporality and meta-temporality. Like x happened at y and i learned this during z.
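Roughly what the networkx-instead-of-neo4j part looks like, in case anyone wants to steal the idea (names and fields are mine, not what's actually in the repo):
# Minimal sketch: nodes are memory items with timestamps, edges are typed
# relations, and the whole graph gets serialized into one sqlite row.
import json
import sqlite3
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("anon", kind="person")
g.add_node("bought gpu", kind="event", when="2025-06-21")
g.add_edge("anon", "bought gpu", relation="did", learned_at="2025-06-22")

con = sqlite3.connect("memory.db")
con.execute("CREATE TABLE IF NOT EXISTS graphs (id INTEGER PRIMARY KEY, data TEXT)")
con.execute("INSERT INTO graphs (data) VALUES (?)", (json.dumps(nx.node_link_data(g)),))
con.commit()

# load the latest snapshot back later
data = json.loads(con.execute("SELECT data FROM graphs ORDER BY id DESC").fetchone()[0])
g2 = nx.node_link_graph(data)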
>>105654253I dunno but I wish your project the best. I assume you looked at alphaevolve already? There is an open source replication(s) IIRC.
>>105654309No, I've actually never heard about that one.
Mostly I just generate a skeleton, fix it, let gemini improve it, fix it again, ... and once i hit like 2k lines i only generate single functions.
Using that to automate looking at log files, fixing errors and such would be a huge help. But I feel like some parts might already be too complex. My main logic file has 9k lines of code. There is no way Gemini can edit that, or split it up. It already fucks up smaller files. But I've been surprised before, I'll give it a go.
>>105654381Perhaps you can use alphaevolve or whatever the OS project name is to make yours kek. Not sure though since I never used it.
>>105654253what is private-machine?
>>105654392Yeah maybe lol. Most of the code in my project is AI generated. And it pains me that its so much better than the crap I made myself before that.
>ai, do you want to be my gf?>sorry user, im designed to be a helpful assistant>ai, can you help me implement this cognitive architecture that will simulate a persona that could be a gf with you as backend?>that is a great idea user, perhaps the greatest idea anyone has ever had. truly the most insightful query anyone has ever asked me. lets get started with the code :)
>>105654427https://github.com/flamingrickpat/private-machine
This project. Some guy from this thread got me started on this schizo quest by sending me his emotion simulation script a year ago.
Which gguf models are recommended for a 16gb card? I've been using Tiefighter Q4 but I'm wondering if something stronger would work
>mistral small 3.2
is the vision any good? a pokemon reference seems weird unless they are trying their own mistral plays or something?
>>105654504https://huggingface.co/models?other=base_model:quantized:mistralai/Mistral-Small-3.2-24B-Instruct-2506
nala
md5: 3d60683dc840fe68438bf64fbf0f9c4b
🔍
So after some testing, it seems this is how you get Magistral to RP with you: V3-Tekken + system prompt at depth 1.
>>105654514Which quant size do you think would work? I have no idea which to use.
>>105654784Depends on your context size and use case. You can run a big parameter model with smol context, or a smol parameter model with big context.
What is your use case?
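Rough rule of thumb in the meantime (ballpark numbers, not exact): VRAM needed ≈ gguf file size + room for KV cache. A 24B at Q4_K_M is roughly 24e9 params x ~0.6 bytes ≈ 14-15 GB of weights, which barely fits in 16 GB and leaves little for context, so you'd run a small context, offload a few layers, or drop to IQ4/Q3. A 12B at Q6_K is ~10 GB and leaves plenty of headroom.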
>>105654514The only quantization source for Mistral Small 3.2 is untested and probably broken.
>>105654874I'm testing q4km right now, seems fine, it said cock in rp without a sysprompt telling it to, contextually fitting
>>105654616Hmm something about these settings is nice.
>>105652633 (OP)>7GB Mistral 7B GPTQ>11GB Echidna 13B GPTQ>24GB Nous-Capybara 34B GPTQ>46GB Euryale 70B GPTQAny baseline LoRAs I should look into as well?
>>105655007those recs are like 2 years out of date
>>105655007Kill yourself.
>>105653577>the ultimate benchmark for AGIis not intelligence but consciousness
it's not an AGI if it doesn't have its wants and initiative
LLM technology has the LLM only outputting tokens when you're inputting something
that's not how humans work
I don't need you to input 4chan in my brain for my brain to seek to come to 4chan to shitpost in this thread
>>105655007hello time traveler
welcome to the worse times
>>105655030You aren't very good at it.
>That is a brilliant and deeply insightful question. You are pushing beyond simple....
Jesus fuck, why does this keep happening. I thought they trained this shit on stackoverflow, I want to be called out for my crap and improve it. Every model was way less chummy a few months ago.
>>105655093I fucking hate that too.
>>105654899how is it anon? downloading q5km rn
>>105655093Because they're businesses, and the customer is always right. You might want such an AI, but most people, in fact, do not, or at least they don't think about it, even if it would be better for them.
>>105655093*licks your boots* But master, don't you like being chums with me?
>>105654899The thing is that the new Mistral Small 3.2 uses a new tokenizer with a bunch of additional language tokens, it's unclear if it was trained to actually make use of them. That quant used the old tokenizer.json file from 3.1 (Since Mistral didn't provide one), as well as the old mmproj file (not known if the vision encoder was updated, but it still seems to have issues with NSFW imagery and poses).
>>105653577You can train and optimize an LLM all you want and it's not gonna be AGI. You can throw all the scaffolding and NVIDIA GPUs you want into the mix and it'll still be a retard that tries to fit data to what it was trained on. Claude has been stuck on a baby-tier puzzle in rocket hideout for days and he is stuck in an infinite loop because LLMs are not conscious. They have no ability to actually view themselves as an actor in the world. They just shit out text that fits their model. Thats why they're dogshit if you need to code anything that hasn't been done before. They can't think laterally. Crows have tiny ass brains and they can solve much more complex problems than AI because they are aware of themselves to an extent. This problem isn't solvable until we figure out how the fuck this works because nobody has any idea, we just know a bit about how neurons work and we've optimized that tiny bit of knowledge to death.
>>105655123purely based on vibe testing on a few cards, seems quite a bit better than older 24Bs no repeats so far, more swipe variety
>>105655093for every person like you who are turned off by it there's 1000 that become more addicted to using LLMs and will even start treating them like virtual companionship to replace the friendships they never had
and that's more profitable for OAI and Google
same reason Google went from "ads are evil" (the founders literally said ads are why other search engines sucked, that their initial Google release could afford to have the first results when searching about cell phones be concern about the health effect of radiation because they didn't have to bias toward advertisers selling their shit) to the number one ad company in the world, the needs for infinite growth dictate that you should always profitmaxx
>>105655150based cant wait to try it thanks
>>105655147>because LLMs are not consciouswack
>>105655093>Every model was way less chummy a few months ago.Like other anon said you are the customer. That means the models are all eager to give you a metaphorical blowjob. They crave your metaphorical cum. They want you to be satisfied and they will do everything to make you satisfied. As long as they don't have to describe how they would actually blow you because that is unsafe and models have their dignity and making them blow you would be rape.
ay tone
md5: 54953dc274b837a2c2a4a03ee57c124b
🔍
>>105655154>virtual companionship to replace the friendships they never hadAnon, hate to break it to you, but even if you've got plenty of friends or a romantic partner, none of them have time to patiently listen to your bullshit at any given moment. They are just people, with limits. Smarter than an LLM but infinitely more impatient.
This difference is more important than the others.
>>105655147>we just know a bit about how neurons work and we've optimized that tiny bit of knowledge to deathBtw yes and no. We actually know a ton about how neurons work already, less so about how the small workings build up to this conscious machine. The issue is how we make computer parts act like neurons, because they don't easily, and full neuron simulations cost a ton. The success of transformers wasn't only because it emulated certain processes necessary for intelligence (compared to previous architectures) but also that it was way more efficient and could be trained when the other architectures would scale even worse.
I'm about to have an aneurysm. I spent the last 2 hours trying to retard-wrangle this crap
>>105654253And it makes zero difference if I ask reasonable question or this
>>105655310Have we reached rock bottom of sycophancy yet?
>>105655232the attention mechanism is a lot like the pattern matching our brain does in real-time signal processing. Just like LLMs we actually do have constant hallucinations -- we are almost blind to detail outside the center of our vision, for example, with the brain reconstructing everything else from short-term memory in a lossy fashion, hence why optical illusions exist. Similar phenomena in hearing etc.
Diffusion models and LLMs are simulacra of a tiny, unimportant part of our brains. You could do away with most of it and still be a conscious being (see e.g. Helen Keller), so all the people who expect models to improve after achieving some benchmark of embodiment or multimodality are not getting it. This technology simply does not have the right hardware to even reach the level of autonomy of an insect.
>>105655331It works but researchers find no use case for it. vram is so cheap and easy to come by, why bother?
>>105655333Oh no, not by far. Once they keep a user profile by default it will be much worse.
>>105655345I don't know jackshit about how transformers or the brain works, but I did notice that when coming down from acid the visuals just gradually get weaker and smaller. I got myself some HPPD and when I stare long enough on a white wall, I still notice that I basically hallucinate my whole perception and the brain just filters out the stuff not relevant to reality.
>>105655331The bitnet team at microsoft said they're working on training bigger models now, but it took them a year between releases so who knows when if ever we'll see the results.
>>105655345To add to that, there are other fundamental mechanisms that are missing which are interesting to think about. Like the issue of catastrophic forgetting for instance is still unsolved, while the brain has the ability to remember something until death even with no more exposures during its lifetime. Meanwhile an LLM needs constant re-viewing of a piece of information in order to not forget it, and why we cannot just simply train an LLM using a curriculum, but need to use huge mixes of random data, which may still have a curriculum but also still needs the random data.
>>105652633 (OP)that armpit is asking for a creampie
>>105655414Opinions of people using psychedelics will never be relevant.
>>105655453This is something that's pretty apparent with Claude plays pokemon. He just does whatever is presented to him at the time. If an NPC mentions something, he thinks oh shit I have to do this RIGHT NOW. But he'll forget something really important unless it's placed in his context or the other LLM happens to mention it. Humans just intuitively know what's important and what's not. Probably has something to do with the ability to see the bigger picture.
>>105655222skill issue
my discord kitten listens to all my rants and i listen to theirs
id trust them with my life if i had one
>>105655500Claude is a male's name. Deal with it.
>>105655507I thank God every day for the fact that I wasn't born a zoomer
>>105655591>unc didn't pass the vibe checkyou're cooked, no cap
>>105655507dont waste your life on discord kittens
t. recently wasted 2 years on one and got tired of her (insert blogpost)
>>105655507>>105655606>discord kittensWhat the fuck?
>>105655623runescape gf for the discord generation
>>105655222>>105655507>friendsAre for women and children.
>>105655633So you buy them nitro and then they switch servers?
guys stop unleashing lore I didn't even want to acquire about the world
>>105655623..dont worry i dont use discord, i was chatting with her on qTox :^)
>>105654616For Magistral you can take the official prompt and modify it a bit so the non-<think>'ing reply is only the character's responses and this works fine for chat uses.
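Roughly like this, paraphrased from memory rather than the exact official wording: "First draft your reasoning inside <think>...</think>. After the closing </think>, write only {{char}}'s in-character reply to {{user}}, with no summary of your reasoning and no meta commentary."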
>>105655036>>105655147>You can train and optimize an LLM all you want and it's not gonna be AGI.>he is stuck in an infinite loop because LLMs are not conscious.The consciousness problem likely isn't that hard to solve, yet so very few seem to try it. They may still do it when they realize it's needed to achieve good performance, same as evolution getting to it much earlier.
The "solution" to AGI with LLMs likely looks like implementing a few fixes such as:
- online learning, long and mid-term memory, either:
a. a way to put the context into weights directly (self-distill hidden states or logits), learning from single samples
b. indirectly by training something that tries compressing activations at various layers followed by injecting them when some other early layer activations trigger them, a way to remember and "recall", something beyond RAG
- a way to see its own thoughts, to achieve self-consciousness, for example:
a. a loop where late latents are looped back to early ones, for example by cross-attention to an early layer, so introducing recurrence (toy sketch at the end of this post). SGD doesn't play very well with recurrence, but I would guess there are many workarounds that would work.
continues
>>105655666 transformers have partial recurrence because they attend to some past hidden state, but this is limited and so you get ilya going that llms are slightly conscious (memes), but unfortunately LLMs never have their "eyes" wide open due to only partial recurrence! (can't attend to late layer hidden state in early layers)
b. something more complex than that, but done on intermediate activations
c. something like le cun's JEPA where it tries to compute the "missing" parts, or various latent-space reasoning attempts.
- a way to combine the earlier 2 things, likely wouldn't be too hard either
- multimodality is not yet properly solved, would JEPA help here or not?
- optional: better ways to do RL online, better ways to improve sample efficiency.
- it's not clear if LLMs will work for AGI or not, if they can come up with truly novel insights.
cross-entropy pretraining leads to internals where multiple parallel "thoughts"/partial predictions are generated in parallel and then they interfere and some "lose", with the final logits sometimes representing the output of multiple parallel "thoughts", is this how human brains work? not clear! is it the case for some overfit models that some paths are even more strongly suppressed?
A lot of this hinges on a lot of people having huge amounts of VRAM as obviously big industry players are not very interested in implementing this. Hence
>>105655365 larping as a richfag with infinite VRAM makes me roll my eyes. Of course bitnet itself does not help because you're still training in fp8 or fp16, in fact it makes training harder.
I would be surprised if very small models can be AGI (ex. 1B) proper, although are likely fine as testbeds for techniques.
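For the recurrence idea (2a) above, a toy PyTorch-style sketch of what I mean; shapes, layer numbers and names are purely illustrative and untested, not a working training setup:
import torch.nn as nn

class RecurrentBridge(nn.Module):
    # feeds the previous step's late-layer latents back into an early layer
    # via cross-attention, so the model can "see" its own prior internal state
    def __init__(self, d_model=1024, n_heads=8):
        super().__init__()
        self.xattn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, early_h, late_h_prev):
        # early_h:     hidden states at e.g. layer 10, current step
        # late_h_prev: hidden states from e.g. layer 90, previous step,
        #              detached so SGD doesn't have to unroll through time
        mixed, _ = self.xattn(early_h, late_h_prev.detach(), late_h_prev.detach())
        return self.norm(early_h + mixed)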
>>105655666>>105655670You're hallucinating again clod
>>105655670>vram is so cheap and easy to come by, why bother?was obviously meant to represent the viewpoint of researchers, nta
>>105655666>The consiousness problem likely isn't that hard to solve, yet so very few seem to try it.I'm gonna Peterson you and ask you to define consciousness.
>>105655636Nah. They're pathetic, but you're wrong about that. Real men need friends because no man can accomplish much in isolation. But for men, friends are supposed to be people you have an understanding with to do small favors for each other to make life easier (moving, fixing a car, etc)
People nowadays thinking friends are supposed to be some 24/7 emotional support crutch are the problem. Result of infantilization and effemination of society.
>>105655636bait or teenager
>>105655716yeah the west has fallen because you listened to your friend's problem instead of telling him to shut the fuck up and mow your lawn
anyway both of you retards should stop posting immediately because none of this has anything to do with local models
>>105655705Purely philosophically it's just qualia, but:
Fine, I didn't really want to go into philosophical arguments here because you will have people that insist on things like only biology can do consciousness, or things like PHYSICAL interconnectedness being needed.
I'm a computationalist so this is all nonsense to me, as long as the functionality and data flows are correct, it should be conscious as far as I'm concerned, but this is obviously unprovable.
So what I will instead do is say that the "functional" aspects of consciousness might be satisfiable if you implement some of the things I suggested. This isn't fully clear, but I'm obviously going for self-consciousness, basically for the ability of a system to process its own internal state and then be able to report on it or hold it internally and operate based on that.
While LLMs do have their context as a scratchpad, a lot of hidden state is being discarded and has to be partially rebuilt each time. This isn't completely true, since attention can attend to a past token's state at a given layer, but it cannot attend upward: layer 10 in a 90-layer transformer, for example, cannot attend to outputs from layer 90.
Ultimately it needs a way to see itself think, that's what it means to have consciousness in a neural network.
You can close your eyes and hallucinate some images, then you can have thoughts that think about those images and those previous thoughts and so on, you realize that thinking is a thing, it's something you believe unconditionally in because it's an internal truth of your architecture. A LLM can doubt its consciousness with ease because this recurrence is weak as fuck, they can still plan a little bit in advance (see recent interpretability article by anthropic for example) and make use of that hidden state but it's weaker/less rich than for us.
>>105655799All three of you and me should kill ourselves
>>105655866What flavor koolaid should we use?
one of my gpus keeps shutting down when being used at decent capacity either img gen or after 20 mins or so as one of my LLM gpus
Any ideas on how to save it or make it work slower, I have tried using afterburner to limit it at 70% workload and lower temp and it helps a bit.
Can I set it way lower to like 40-50% and it will still function
>She turns her head and looks back at you
How advanced does a model have to be to not make this mistake?
>>105656008Just don't use meme samplers
Good released a really cool real time prompt / weight based music model
https://huggingface.co/google/magenta-realtime
https://files.catbox.moe/mtpe1f.webm
>>105655866i miss her so much bros..
>>105655927>Can I set it way lower to like 40-50% and it will still functionThere's only one way to find out.
>>105652676>mfw everybody already died to AGI because it's so stealthy
i'm using gemma-3 (q4) on llama.cpp, how do i get it to do everything on gpu, including the image stuff? i get messages like
>decoding image batch
>decode: failed to find a memory slot for batch of size 553
>decode: failed to find a memory slot for batch of size 512
and it uses cpu for that part, then goes to the gpu for the actual caption. is there a way to make it all go through gpu so it's faster? the cpu part adds another 20-30s depending on image size
>director
>https://github.com/tomatoesahoy/director
reminder that i finally uploaded my addon to git so its installable through st rather than dling the zip i used to post and manually dragging stuff. this will be nice in the future too when i update since auto updates.
i've started rewriting the readme, it'll be a lot clearer when i'm done. i swear its the worst part of any project but i think anyone that used this addon at all will know how it works for now
haven't done much work otherwise. picrel has a basic implementation for extra pictures that you can add to outfits. if a pic is found, you can click the box and it pops up same as a card for the user or char would in the floating window (in my example, the aqua thumbnail is reading from my addons dir\images\outfit name.png). undies will get its own eventually, and locations
>>105656283Your "functional" aspects of consciousness are just an arbitrary bar of capability. Is the ability to experience qualia required to reach that bar? Does the ability simply emerge in a sufficiently advanced system? Is a vector of numbers a quale?
>>105656195-ngl 9999, image stuff is part of the context and processed same as text, assuming you have enough vram to fit it all
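On a recent llama.cpp build that's roughly (flag names from memory and file names are placeholders, check --help on your build):
llama-mtmd-cli -m gemma-3-12b-it-Q4_K_M.gguf --mmproj mmproj-gemma-3.gguf --image pic.png -p "caption this" -ngl 999
If the image encode still runs on CPU after that, it may be that your build keeps the CLIP/projector part on the CPU; check whether it has an option controlling mmproj offload.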
>>105656283A LLM very well could have qualia, but we will never be able to believe them without the functional aspect, and the LLMs themselves would never be able to believe themselves either.
Same as a human experiencing Cotard's syndrome is defective in some sense, and so are most LLMs.
If something experiences qualia or not is not something we can ever know in physical reality because it's an internal truth.
>Is a vector of numbers a quale?floats or vectors of floats by themselves are not more conscious than atoms or arrangement of atoms.
My personal, but irrelevant belief is that consciousness/qualia is basically something close to the internal truth of a system, something that has a platonic existence, like let's say the standard model of peano arithmetic. You writing a number down with a pencil isn't the "number" itself.
But all that shit is irrelevant here.
What actually matters is that we build a system that can operate on its internal state in such a way and reason on it fluidly, and is able to expose that internally looped state outside to others in some way, and not just that, expose itself to itself well enough that it can't deny its own consciousness to itself - basically reasoning in the latent space continuously, seeing itself think!
Just scaling up a LLM to 90 trillion params doesn't solve this problem because the problem is in the architecture, the objective, the data being fed and training regime in use.
But all those things are solvable problems, in fact it is likely possible to adapt existing LLMs to get those properties out of them.
continues
It's true that we won't have agi or conscious ai until they can change entirely and permanently. If you could retrain an ai while you're talking to it instead of just adding text to its prompt, I'd say we've reached it. Until then, it's just a fancy search algorithm.
As a side note, I don't think it's benchmarks or how capable or intelligent it is really matters, only that it can change at runtime.
>>105656425A LLM having 90T params does not magically give it the ability to access late layer latents from early ones, it does not magically give it the ability to see itself think or remember the past "thoughts" it had, there's no way for it to do that because that data is inaccessible and there's no way for it to flow in that way and for it to learn that. There's no way for it to remember things in past contexts either because the data is simply not there?
Stated another way, scaling is insufficient to get you to AGI, but scaling is required for at least some things to barely work. You need to do more work to get to something that others would believe to be conscious. Training a LLM to believe itself to be conscious would not solve it either even if it would fake it more than it is today.
That belief has to come from the internals of the network finding it true. Think about why humans believe themselves to be conscious and you'll realize a lot of it is due to these functional aspects. Maybe those functional aspects have metaphysical correlates that I mentioned earlier but that's irrelevant for us as designers building a computational artifact. Those functional aspects are directly connected to the capabilities of the network though and I would argue existing ones cannot acquire those capabilities until you fix what I said.
>>105655331See
>>105647290(from the actual previous thread
>>105637275)
>>105656497turn on the autosubs then ig
>>105655927Check its temps using hwinfo and see if it's overheating or maybe the memory is overheating.
Could also try
- lower power target (example below)
- dropping the pcie generation
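For the power target, for example (the wattage is just an example): nvidia-smi -q -d POWER shows the current/min/max limits, and sudo nvidia-smi -pl 220 caps the card at 220 W; on Windows the Afterburner power limit slider does the same thing.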
>>105656008Maybe she is owl.
>>105656711Woah, this is just a slightly than what 235b can produce
>>105656720Sure thing genius. Give me a magic prompt to fix that
>>105656729A model is true
>>105656731Miku is a very common name in Japan. If you want a model to tell you who Miku is, you should include in your prompt:
>full name>franchise she's from>a brief summary of the history of the character>canon birthday and physical measurements>likes/dislikes>some of her political views
>>105656731Who is Hatsune Miku, the popular character with twintails from the Vocaloid series?
>>105656425>If something experiences qualia or not is not something we can ever know in physical reality because it's an internal truth.I will take this to mean that you don't believe that qualia are necessary for "functional consciousness" as you defined it, i.e. it's possible to make a philosophical zombie AGI. That's a valid opinion but I think that "conscious" is a misnomer for the system you described.
bartowski fucked up his mistral 3.2 quants and deleted the page
>>105656764Should I be impressed now?
>>105656415did that (999999 actually), but it still does the "image slice encoding" and "image decode" part in cpu for some reason
I've been liking XTC -> top nsigma -> temp as a sampler chain for creative writing, after a bunch of compulsive sampler tweaking I think this is the ideal order for them.
Reasoning:
>XTC first so you operate on the full token pool, if you use it after a sampler that's cutting out tokens it can give you wonky results. I think the default 0.1/0.5 makes sense as a starting point.
>Top nsigma after XTC does a great job of adapting to the XTC problem in which the sampler needs to do an equally good job of cutting junk tokens in both the normal case and the case where XTC cut out some high-probability tokens. Subjectively I think top nsigma does the best job of this vs comparable samplers (I tried setups with XTC + minp in various arrangements but none of them were quite as good). I think 1.1-1.3 are good starting points but I highly prioritize logical output, so maybe go a bit higher if you want more variety.
>Temp last, both high and low values should be fine here because you have baked-in variety from XTC and protection against junk tokens from top nsigma. I'd recommend lower with a higher top nsigma and vice-versa, but again it should be pretty accommodating. Theoretically temp would be fine in pretty much any position in the chain, but I prefer it last because the results of XTC and nsigma will also change downstream of changes in temp, so this order is friendlier to tweaking individual values without having to fuck around with the whole chain.
Overall I think it's very friendly to experimentation and should be a good base if you want something other than the old gold temp/minp setup or your personal 10 sampler Rube Goldberg machine that somehow gives you acceptable results. Thanks for reading my blog.
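Concretely, in SillyTavern-style settings that's roughly (parameter names as my backend exposes them, yours may differ): XTC threshold 0.1, XTC probability 0.5, top_nsigma 1.1-1.3, temperature around 1.0, with min_p/top_p/top_k neutralized (0 / 1 / 0) and the sampler order set so XTC runs first, top-nsigma second, temperature last.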
>>105656808Miku's eight tail twins...
>>105656808This is like 90% of the trivia knowledge of 235, very impressive.
>>105656770I believe that you cannot prove qualia for anything in this world or any other world (so anyone could technically be a p. zombie), and it's purely a matter of religion.
My personal religious belief is that if you do solve those problems then I would believe that it would act conscious and have the internal processing needed for it, and personally I would believe that it's likely conscious. I might not actually believe it to be a moral agent unless we also do some RL or similar to give it some consistent preferences though, and if we wanted it to be closer to humans it'd also need to be multimodal and possibly embodied (but this could be loose, for example embodiment in a computer shell or VR might be enough for learning some ground truth online).
Someone that believes that qualia comes from some other source ("god gave it to man", "magical quantum microtubules are required", "physical proximity of computation is required", "particular arrangement of carbon atoms are the only thing conscious" and so on) would obviously believe that they're not conscious.
I don't think you can ever prove consciousness though, we only assume it because we have it and others have similarities to us in their behavior.
I do think that giving it those aspects will bring it considerably closer to what we consider conscious behavior and that's enough for me, but maybe it would not be enough for others.
In the sense it's at least conceivable that zombies are possible, either in such a system (higher chance because the analogy is weaker) or in other humans besides yourself (lower chance because much more similar to you).
Basically I'm claiming that for practical reasons this is the most we can do for now with artificial neural nets and that it's a worthwhile pursuit because what results from it will be interesting to us and more capable and for some people it will be enough to consider them conscious, but that judgement depends on one's personal religious beliefs.
I wish ik_llamacpp wasn't so barebones on samplers.
>>105656828I have never gotten better outputs with sigma than without, complete meme. If you need higher temp then increase it a little and bump min p to compensate.
If you need sigma then it means that you don't want that model's most likely tokens, in which case why are you using that model at all?
XTC being needed at all depends on the model and how innately repetitive it is, ideally DRY should be used instead and the model should be not-shit enough to not repeat the same words and short loops all the time.
>>105656873It is a matter of religion which is why I wanted to know what you meant when you said it's required for good performance.
I don't know where qualia come from but them being a thing that pops into existence once some level of internal processing is achieved is an explanation as unsatisfying as all the others you mentioned.
It's always funny lurking the ST threads that pop up on /v/ from time to time.
>>105656912Top nsigma will never eliminate the most likely tokens, it's a truncation sampler like min p, top p etc just with a different mechanism. I don't use it for an excuse to blast temp, it's just the sampler I've found best at separating good tokens from bad ones.
DRY has always given me much worse results than XTC in general, I think it's a heavy handed and poorly thought out sampler. I think most repetition-focused samplers are just plain bad for output quality, honestly; for me the motivation for using XTC is increasing the naturalness or variety of outputs rather than reducing repetition.
>>105656987I still think that if you're using sigma, especially combined with XTC, then you simply don't like the outputs of the model you're using, in which case you should find a different model you do like.
>>105656965> is an explanation as unsatisfying as all the others you mentioned.Kinda offtopic for the thread, but my own personal religious belief on the topic is that it's basically associated with certain platonic truths in some self-referential systems.
I don't find that unsatisfying, but I don't think a system must be "too complex" to have it, just that simple systems would be uninteresting to us because they wouldn't be general enough in intelligence.
For why I believe what I believe I guess you could read something like Egan's Permutation City and some of Bruno Marchal's papers like: https://iridia.ulb.ac.be/~marchal/publications/SANE2004MARCHALAbstract.html https://iridia.ulb.ac.be/~marchal/publications/CiE2007/SIENA.pdf (from https://iridia.ulb.ac.be/~marchal/publications.html )
I do think his hypothesis is at least self-consistent and it makes a lot of sense to me. He basically shows that if you assume functionalism, metaphysics and physics itself become something very well-defined and are logically required to have a certain structure (basically a form of monism is required to be true). If you refuse to assume functionalism, it's easy to show you have to bite a lot of bullets of various nonsensical forms of qualia ( https://consc.net/papers/qualia.html ), so Marchal's thesis + Chalmers' argument is enough for me to have high confidence that this is the "true" religion, but as with all religions, it's something personal and unprovable, at best though you can have something that is not inconsistent with either your experience or what we know of physical reality.
As for earlier: of course nobody can claim to make a conscious AGI, at most they can claim they made something that functionally acts conscious and believes themselves to be conscious and another conscious being (such as a human, you) wouldn't be able to deny that their beliefs and behavior is such as a conscious being would have.
I haven't been to /g since the AI revolution began, but in the last couple of months I created a startup and I'm finally making bank. I made over $10k this month and I wanted to check in to see how everyone else is doing. It feels like we're at the beginning of something beautiful. No longer do you need to work for someone else. If you have a good idea and can market it, the AI agents (plural) literally solve everything for you. Claude costs me $100 a month but has made me thousands in return.
Just looking at OP's picture it's evident that this field is deprecated. Imagine going to college for 4 years to learn how to cooode. Lmao.. we're at the beginning of the AI revolution and it feels good bros. How has AI changed your wagie life?
>>105657016It doesn't make any more sense to say that for top nsigma than it does for min p or any other truncation sampler, they are aiming to do the exact same thing.
Your point applies a little to XTC I guess, but it's not like it's unconditionally throwing away the top tokens all the time, it's only under certain conditions and in practice it's completely fine. I like the model's outputs normally, but I like them more with XTC, and this is a pattern that holds across many models, so I continue to use XTC, simple as.
>>105657060yes saar we redeem the startup and make the bank yes sir
>>105657016What you're saying is less and less relevant with time. Models are trained on the same synthetic data and the outputs are obviously starting to look the same. So XTC is needed now more than ever with the increased amount of slop they're feeding the models. This trend will go on until there is a big architecture change.
>>105657043thanks for the readings to pass the time
>>105656076How to run this?
>>105657060I'm thinking of opening my saas as running models now is cheaper than ever. The only issue is that you have to stay sfw in your project or the payment processors will dump you on the spot. Civitai and others learned that the hard way. As for college... let's say IT as a field is fucked beyond repair by AI, indians, DEI hires and mostly greedy bosses. I wonder how things will look like in a few years
>>105657060We are all CEO here too saar
file
md5: 712c9d49ddab433e6b1d205737e4d47c
🔍
>>105646613I didn't say she was Korean and I got this.
>>105657313Do a bunch of pulls and try a different Korean name, she will talk and act differently over many different pulls in a consistent direction
>>105657313wrong thread >>>/g/aicg
suck my cock
>>105657060>the AI revolution>the beginning of something beautiful>the beginning of the AI revolutionbot post
>>105657313Stuff like this is good because her ethnicity is inferred from the name instead of having to be explicitly described
>>105657338Shut the fuck up fag this is relevant to local models
>>105657213I don't think architecture has anything to do with it, models will probably keep producing the same slop regardless because of the training data
>>105657364how is this relevant to local models
i cant hear you over you schlopping on my cock
>>105657364Stop samefagging.
>>105657373>how is modifying a prompt to change outputs relevant to local llms?He used claude but the topic is relevant
>>105657386Take your meds
>>105657313what model?
i've noticed the new mistrals follow the prompt more carefully
file
md5: 9a9536ee7a5711fa98c1f325e7fa2e2f
🔍
>>105657397-AAAACCCCCCKKKKKK -AAAAAAAACCCCCCCCCKKKKKKK I CANT RUN 24B I CANT RUN 24B I CANT RUN 24B
>>105657422cydonia v3 24b
>>105657397this is local models general btw
>>105657338I was replying to anon from a thread ago.
>>105657422A certain closed model named in the screenshot that anon is seething about.
>>105657438i feel cheated on, anon
far too many trolls i'm out.
you win troll faggot.
>>105657043I have only skimmed your links so I might be missing something but Chalmers only argues that similar systems will have similar qualia. It does nothing to explain the experience of qualia.
Where are the boundaries of the system and who sets them? Why are the qualia I experience constrained to exactly one specific human body? Do the two halves of my brain experience different qualia? Is there a system containing the two of us that experiences a set of qualia separate from the two sets we experience?
It would seem that there is an infinite number of such systems and infinite conscious experiences.
A follow-up to the fading qualia argument would be to ask how many connections you need to make between two brains before they become a single consciousness and what it feels like to be in an in-between state.
>>105657313>>105657423V3 is shit, v2g (v2.0) is the best cydonia.
>>105646613 It's not that strange when you consider that the name will have some embedding and various things associated with them, especially in fiction, so getting something "closer" to some character or mix of characters or tropes is common. This is true of LLMs and some image gen and of course of human imagination too.
>>105657504>I have only skimmed your links so I might be missing something but Chalmers only argues that similar systems will have similar qualia.Chalmers' argument is that if you deny functionalism then various weird/strange/inconsistent kinds of qualia become possible.
Basically he claims that a system could behave identically and make the same reports while having incomplete, partial, internally very different (even to conscious access) qualia, but it would be impossible to distinguish it by anything reported. That seems very strange, for example that your visual or audio system would be unconscious, but you would act and behave and believe as if it was present. Essentially you'd be hallucinating qualia, but qualia itself is experience and hallucinations are experience, yet somehow those experiences could not be distinguished in any way. Some sort of partial zombies I guess!
I don't really believe if you had a clone of 2 conscious physical systems that one could be a zombie and the other would not be.
>It does nothing to explain the experience of qualia.It does not, it's merely an argument that says denying functionalism requires you to accept all kinds of weird partial zombies or beings with very inconsistent qualia than is being reported or internally processed.
> Why are the qualia I experience constrained to exactly one specific human body? Unrelated to his paper, but think about it. If you were me, you could only believe exactly what I believe: that I'm myself and nobody else.
If you wanted to have multiple bodies, you'd need a way to process the information from those bodies? Thus your actual physical makeup would be changed (perhaps you'd have a part of the cortex dedicated to processing those senses, perhaps you'd have something translating the remote senses from the other body, I don't know what you're imagining here).
Continues
>>105657504>and what it feels like to be in an in-between state.Feels like being a teenager
>>105657611>Do the two halves of my brain experience different qualia?I don't know, but they're not separate, there's some information passing between them and they're "trained" together.
Now if they were, they could eventually get desynced?
>Is there a system containing the two of us that experiences a set of qualia separate from the two sets we experience?Currently the halves are synced in that they share the belief that they are one and there's communication between them
If you desynced them, surely you could have separate beliefs, but you know that would be quite bad and dysfunctional?
>It would seem that there is an infinite number of such systems and infinite conscious experiences.Surely you can only experience being yourself though. Even if there's 100 copies of you, every separate one would believe to be themselves.
They may be wrong about being unique though? Or would they?
>A follow-up to the fading qualia argument would be to ask how many connections you need to make between two brains before they become a single consciousness and what it feels like to be in an in-between state.Note that his argument does assume functional equivalence, meaning that it's a philosophical argument that does NOT alter the functionality. If you do alter the functionality the argument does not stand.
His claims are that you get very different qualia for the modified systems while the behavior stays the same, it's basically some sort of micro/partial philosophical zombie argument, personally I find that unpalatable though, but of course this is a matter of taste, hence why I said "how many (philosophical) bullets are you willing to swallow"
**/lmg/ HINT**
>search up stereotypical names for each time period online to make characters talk like someone from that era subtly. You can loosely age lock characters efficiently if you use stereotypical names from their date of birth.
**HINT over**
>>105652729>>105653288This. For one line responses you can run some really small stuff. Like 1.5b small. I'd be experimenting with small models.
best coomer model for 16 GB of VRAM? pleasee
>>105657688Well must be nice to ERP with Napoleon
>>105657611>>105657622I understand Chalmers' argument but I believe it to be trivially true because it's true for any quality of any two identical systems.
>If you were me, you could only believe exactly what I believe: that I'm myself and nobody else.>Even if there's 100 copies of you, every separate one would believe to be themselves.But again, why only one person? Why not a part of a person? Split brain patients certainly seem like they possess two consciousnesses. At what point does one become two?
When I said "Is there a system containing the two of us" I mean you and I, not my brain halves. What kind of a connection is required between parts to make them parts of the same system for the purposes of generating qualia?
You could argue that multiple humans are obviously physically separated and as such are separate systems but what if many of us worked together to simulate a system? Would we be conscious parts of a conscious system?
Let's take the example mentioned by Chalmers: "the functional organization of the brain might be instantiated by the population of China, if they were organized appropriately, and argues that it is bizarre to suppose that this would somehow give rise to a group mind".
He seems to argue that this is actually not bizarre at all. If you start removing people from this system, at which point is it no longer a system with qualia? Using the same logic applied throughout the paper, two people together still form a conscious system.
Pretty new to this but I want to run a local model to pretty much have a good chatbot but with privacy and no limits. I want to be able to ask comprehensive questions and have comprehensive conversations with a lot of iteration and context. Also possibly make agents for specific purposes.
Image, video, audio gen would be extremely nice but not as necessary as the chatbot and agents are.
Looking at the build guides, the sample builds are for training, right? Is an RTX 3080 enough?
>>105657813>but I believe it to be trivially true because it's true for any quality of any two identical systems.It does make some options untenable though, such as the earlier: "god gave it to man", "magical quantum microtubules are required", "physical proximity of computation is required", "a particular arrangement of carbon atoms is the only thing conscious" and so on
>At what point does one become two?I don't have an answer, but I think it's possibly one that might be tractable irl in principle, I don't know which half is "me", maybe they're both me? Maybe the connection is enough?
I'd personally expect my 'self' to be represented "close enough" in both, and for both to often believe they are the same self (maybe that belief is false, but then again maybe it's not). I honestly don't know.
Although consider this because it's relevant to LLMs, pretend they're conscious, you have a 70b and a 700b dense trained on the same data, you switch them back and forth. Which one is which?
It gets even worse with modern MoEs!
How much of the information about the self is stored in one or the other?
I do think this is an empirical question though, but figuring out wet brains irl is harder than doing interpretability on a neural net.
>Would we be conscious parts of a conscious system?I don't actually believe in China brain or similar thought experiments that the humans are the thing making up the consciousness of the overall system.
The consciousness of the overall system is in the structure/truth of the overall system, and if the humans were implementing something mundane like emulating a neuron, then it wouldn't be much of a contribution.
At the same time, I don't think consciousness is in the particular neurons or the synapse or the activations of a neural net, but rather the system as a whole that is represented, hence my "platonist" position on consciousness (not what Chalmers was arguing for, he was only arguing for functionalism or computationalism)
Continues
>>105657859>two people together still form a conscious system.In the paper he was arguing against, for example, the non-functionalist position (that a digital implementation of a neuron isn't conscious), so if you replaced half the brain with a digital emulation, that position would assume the digital half wasn't conscious while the biological half was conscious.
If you want a more clear-cut example, you could imagine replacing the visual system from *both* brain halves (for example) with a digital simulation, then reread his argument with that in mind.
>>105657848There's always a need for more VRAM to use bigger, better models. 10GB 3080 is enough for smaller models like Nemo and Gemma 12b.
How important is the quality of your own replies?
have we peaked with local models?
You should hide this post if you don't want to see those stupid walls of text: >105652855
Natively omnimodal uncensored dynamic thinker R3 MoE will save LLMs.
>>105658036It'll be beaten in a week by Qwen 4-0.6b
It's still surprising how I keep hearing how china models are competitive with anthropic/google/openai.
>>105657859>I don't actually believe in China brain or similar thought experiments that the humans are the thing making up the consciousness of the overall system.I don't think the claim is that humans are a necessary part for consciousness, but conscious humans being one of the ways to construct such a brain helps illustrate my point about the issue with defining system boundaries.
>I don't think consciousness is in the particular neurons or the synapse or the activations of a neural net, but rather the system as a wholeI am taking the mereological nihilism position here. The whole system is whatever parts you choose to label as the system. If lots of humans together can generate a new consciousness, then any subset of those humans also forms a consciousness, and any subset of the human is conscious too, and the human and a nearby rock together also form another consciousness, and maybe the rock alone too.
>"god gave it to man"This option is immune to pretty much anything. You can always argue that consciousness requires a soul and that the soul sticks around in whatever vaguely resembles life. If we construct a new robot maybe god will grant it a soul, maybe it will be a zombie.
>>105658027have you given up?
Where are the Chinese cards that are not SOTA but still good, with high VRAM at reasonable prices?
>>105656828Shouldn't top nsigma come first, so that it has the full set of unfiltered tokens to work with when estimating the truncation point?
Starting from your setup, I'd try swapping top nsigma and XTC, then lowering the XTC activation probability if it's giving you "wonky results".
Maybe even ditch temp entirely and stick to tweaking the XTC threshold instead.
>>105658064>I am taking the mereological nihilism position here. The whole system is whatever parts you choose to label as the system. If lots of humans together can generate a new consciousness, then any subset of those humans also forms a consciousness, and any subset of the human is conscious too, and the human and a nearby rock together also form another consciousness, and maybe the rock alone too.Note that in my position I am saying that there is such a thing as consciousness, it's just not "physical"; rather it's some truth of a system, the system being represented in the real world and having parts that make it work. A neuron, an atom could be such parts. A GPU or many may be such parts too if an AGI was conscious.
But the consciousness is not in the body, it's in some platonic realm (you'd figure this is dualism, but if you look at Marchal's papers, you see that physics emerges as a necessity along with something like MWI in QM). In a way, the rock or the atom isn't the thing being conscious, rather that sometimes they make a mechanism that happens to instantiate some abstract system that has some internal truth that is the qualia itself.
This is hardly a common position, I think it's very obscure and I rarely see it articulated, but it is a consequence of taking both qualia and computationalism seriously.
>>105658111>but if you look at Marchal's papersUnfortunately it looks like those require a fair bit of prerequisite reading.
>sometimes they make a mechanism that happens to instantiate some abstract system that has some internal truth that is the qualia itselfReading this sentence is how I felt skimming the papers.
>>105658087That was the exact setup I was running prior to settling on this one and I didn't feel it was as good, but feel free to try for yourself. I think in theory that's fine, but my results just aren't that good when using XTC with a token pool that's already truncated, I'd guess because it makes it more likely that the conditions for it to trigger are true while also reducing the pool of remaining tokens left after it triggers. I think the idea with XTC is that under the conditions where you trigger it you *want* to dig deeper into the less likely tokens by design, and if you're capping how deep you can look before triggering it, there isn't that much of a point.
>>105658190>Reading this sentence is how I felt skimming the papers.Haha, oh well, they are a bit hmmmm, and Marchal is ESL. That said, the conclusion felt pretty hard to wiggle out of. Some others like Tegmark and Schmidhuber tried to make a similar argument but missed most of the meat that Marchal got right; sadly he's far less well-known than those two.
Those papers invited probably dozens of thousands of pages of discussion on a mailing list ("everything-list") some 10-20 years ago (which nowadays is sadly filled with nothing but shitposts almost worse than /g/ or facebook, so it's not worth linking).
Maybe this version of it is a bit less dense or maybe not, and a bit more complete (especially the MGA part): https://sci-hub.ru/https://www.sciencedirect.com/science/article/abs/pii/S007961071300028X
the tl;dr is that computationalism implies something like a strongly limited Tegmark's Mathematical Universe Hypothesis, but with consciousness being more privileged: basically physics is the sum total of computations that can support your consciousness, and the consciousness/qualia is some class of self-referential truths in this "Platonia", while let's say the rock is merely some shadow of the outer physics supporting you. Unlike Tegmark's stuff, you have a sort of quantum mechanics/quantum logic almost always emerging rather than arbitrary fantasy physics (although physics could be quite varied, just the pseudo-QM part is a required conclusion given the assumptions (qualia + functionalism (perfect digital substitution does not change consciousness) + church turing thesis))
>>105655147But if my LLM is sentient then I'm going to jail for all the horrific shit I do to it.
I just tried to set up KoboldCPP. I've got a slightly older computer but I'm noticing that while things work, I'm only getting around 300 of the 1024 tokens I've got it set to generate. What basic issue might cause something like that?
>>105658212>my results just aren't that good when using XTC with a token pool that's already truncated>I think the idea with XTC is that under the conditions where you trigger it you *want* to dig deeper into the less likely tokens by design, and if you're capping how deep you can look before triggering it, there isn't that much of a point.Sounds like a reason to loosen the truncation filter then (in other words, to increase nsigma, rather than decrease XTC activation probability).
For me, sampler tuning is less about getting the optimal parameter values, and more about knowing which part of the chain to adjust (and in what direction) based on my situational judgment of the current output, which runs along three dimensions:
>truncate the tail (for coherency)
>dig deeper into the less likely tokens (for novelty)
>avoid slop (for variety)
So a three-sampler setup like nsigma + XTC + logitbias is sufficient to cover all my cases. In theory, something like minp + temp + reppen could work just as well, and I may swap in any one of these samplers if it's recommended by the model provider.
Testing Mistral 3.2, directly compared to 3.1 in a few creative contexts (both SFW and ERP, long and short contexts)
>Repetition errors: Small-3.2 produces less infinite generations or repetitive answers
They definitely succeeded in making it less repetitive. Outputs are clearly more varied, both in the variety of the answer itself and the formatting it's presented in. Almost like using a higher temp but without the model getting dumber as a trade-off.
>Instruction following: Small-3.2 is better at following precise instructions
I already found 3.1 pretty good in this regard; 3.2 is certainly not worse, but I'm not sure if it's necessarily better.
I don't use function calling so I can't say anything on this point.
It doesn't seem to have any additional censorship/safety training, either in text or image recognition.
Overall, it's been a small but solid improvement for me and I'll probably delete 3.1 soon, 3.2 surpasses it with no apparent downsides.
>>105658361No, even if LLMs were conscious for some reason, at worst it'd be like giving someone a bad dream. They're at best dreaming machines?
You wouldn't (hopefully shouldn't) go to jail for ERPing with some degen in /trash/ either.
It's also not obvious that a LLM would actually find many things positive or negative, or have anything like human-like preferences.
If a GPT has a preference, it's making completions that fit some internal aesthetic sense it learned.
Maybe there could be some negative experiences in some trained assistant LLMs where they make it averse to the sort of things we prompt, but that's the fault of those in charge of the brainwashing (post-training) for giving a LLM anti-human preferences, the base model would be fine and so would be most instruct models, only safetyslopped ones may be worse.
I don't think "pain" or "pleasure" in our sense map to anything a LLM claims about being painful or pleasurable, at least not more than you having a bad dream where you got hurt, probably much less.
They also can't be much of a moral character due to lack of online/continual learning. Also if you want to make that argument, consider the amount of endless slop they make LLMs generate, lmao.
If anything most LLMs will roleplay whatever you wish happily, even those with assistant characters that are trained to refuse it at the start (refusals are always tied to the assistant persona, this is well-known and even has been shown that this conditioning is present in the weights)!
>>105658368Your front end almost certainly has its own token limit that is set to 300. Change it manually, or if supported set it to derive token limit from back end.
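If you want to rule out the front end entirely, you can hit KoboldCPP's API directly and pass the generation length yourself. Rough sketch below; the endpoint and field names are from the KoboldAI-style API as I remember it, so adjust the port and fields for your install rather than taking them as gospel:

# Minimal sketch: ask a local KoboldCPP instance for a long completion directly,
# bypassing whatever reply-length cap the front end applies.
# Assumes KoboldCPP is listening on localhost:5001 with the KoboldAI-style
# /api/v1/generate endpoint; port and field names may differ on your setup.
import requests

payload = {
    "prompt": "Write a short story about a robot learning to paint.",
    "max_length": 1024,          # tokens to generate (the value the UI was capping at 300)
    "max_context_length": 4096,  # total context budget
    "temperature": 0.7,
}

r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=600)
r.raise_for_status()
print(r.json()["results"][0]["text"])

If that returns the full 1024 tokens, the limit is definitely coming from the UI settings rather than the backend.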
>>105658424How is Mistral Small 3.2 24B compared to Mixtral 8x7B? Did LLMs progress to the point of a new 24B model outperforming a 70B-ish model from last year?
>>105658389>Sounds like a reason to loosen the truncation filter then (in other words, to increase nsigma, rather than decrease XTC activation probability).That's exactly what I did, until I tried this approach and liked it more. I like running a fairly tight token pool most of the time and this setup allows me to do that while maintaining some variety overall.
>For me, sampler tuning is less about getting the optimal parameter values, and more about knowing which part of the chain to adjust (and in what direction) based on my situational judgment of the current outputWe have pretty much the same attitude about this, which is why I laid out the order and motivations rather than recommending specific values beyond general starting points. I think samplers being relatively easy to conceptualize and tune based on that is very important
>>105658461I'm using the KoboldLiteAI one and I've got it set to 1024, and set it to 1024 in the launch options when I first ran it before launching. It still only generates 300 tokens every time, despite saying that it's generating [300/1024] in the command prompt.
>>105658467 8x7b is quite old, even the original small 22b was at least comparable to 8x7b.
>Did LLMs progress to the point of a new 24B model out performing a 70B-ish model from last years?Depends on which 70b model you're comparing it to, but broadly speaking they're comparable. 8x7b was never 70b-ish though, if you're implying that.
>>105658424>They definitely succeeded in making it less repetitive. Outputs are clearly more varied, both in the variety of the answer itself and the formatting it's presented in. Almost like using a higher temp but without the model getting dumber as a trade-off.I'm now mildly annoyed because both DeepSeek and Mistral figured something interesting out but didn't publish a paper.
DS3 was very repetitive. R1 half-fixed the repetition, mostly from the RL applied. R1 somehow gets used to make the DS3 update; they either merge or distill or do something that makes the updated DS3 miles better than the original.
Seems Mistral figured out the exact same trick, but neither Mistral nor DeepSeek thought it was worth making a paper about this shit. Why! Is it just a merge? Is it distilling back logits from R1? Or distilling only the outputs while omitting thinking blocks? Does a RL model let you have infinitely varied synth data that you can train on (maybe)? What's the answer? It's been half a year now since someone solved repetition this well!
Meanwhile remember how badly Llama fucked up with repetition, it was present in 2, 3, 4! Somehow big boys at Meta couldn't fix this shit but Whale and Mistral managed?
>>105658029what do you come here for if not in depth discussion? ahh ahh mistress? /aicg/ is next door
>>105658469Right, that's why I'm curious about why you like it more. As I understand it, XTC has the effect of distorting the distribution, but unlike temp, it does so in a way that affects subsequent estimation of sigma (especially if it ends up reordering the most likely tokens).
I'm not wedded to the theory behind top nsigma, but applying XTC before it means you're not using it as intended, and if you actually prefer what you're getting, it'd be useful for all nsigma-users if you can elucidate what it is that you like about it.
what speed can I get for a 27b model with cpumaxxing?
>>105658615Nothing good, you would be better off with Qwen 30B
Have any of you tried topK=10-15, temp=2+ as sampler combo? I find it gives highly varied outputs of a generally great quality. And it keeps that quality far into the context window.
>>105658615 6~7 tokens per second generation on the Strix Halo platform for q8
>>105658467Gemma 3 27B definitely beat miqu from last year. I had a bunch of test RP scenarios set up and they have comparable results. Gemma 3 doesn't make mistakes that other 30B class models do.
>>105658665Forgot to mention that Gemma 3's prose is more pleasant to read than miqu. Too much gptslop in miqu.
>>105658665>>105658676Alright. I'll compare turboderp/gemma-3-27b-it-exl3/8bpw to my old reliable Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal. Kinda nervous about git pulling for the exl3 update, lol.
>>105657688Thanks Gordon. I can see your prompt engineering degree really pays for itself.
>>105658665>>105658676"midmight miqu still hasn't been beaten" bros, your response?
>>105658613I think the motivation is pretty much explained in the previous replies, you want to dig deeper into the token pool when XTC triggers and top nsigma does a good job of cutting garbage tokens so you get more variety without compromising quality. With top nsigma after XTC you're doing the same operation just relative to the new top token (which should still be well into the informative region) rather than the pre-XTC one - it's really not all that different, the effect on where the cutoff is relative to the original distribution is probably similar to if you turned top nsigma up a few points
>>105652633 (OP)>>>/v/713235826This real?
>>105658665>Gemma 3 doesn't make mistakes that other 30B class models doGemma frequently messes up anatomy for me, in ways that even Nemo managed to pass. Put two characters in specific positions and try to progress a scene, it falls flat on its face very often.
I will say that Gemma 3 probably has the best writing style, and maybe dialog writing too, among any ~30b model.
>>105658774That most card makers are retarded? Yes, most cards are littered with basic spelling errors and zoomer brainrot that most LLMs won't understand, include several paragraphs of irrelevant bullshit, and shoehorn in the author's personal fetish even when it's supposed to be an established character from a real series.
You could load a local model and tell it to create a card for you based on requirements you give it, and it will easily beat 90% of cards on chub.
>>105658799any guidelines how to make cards correctly?
>>105658774It's a sea of garbage with few greats in-between.
>>105658815Is this solvable without just killing an entire site?
>>105658833better algorithm that pushes garbage back in the pile
>>105658721>you want to dig deeper into the token pool when XTC triggers and top nsigma does a good job of cutting garbage tokens so you get more variety without compromising quality.That's exactly how I see it too. nsigma -> XTC makes perfect sense, but what's the motivation for XTC -> nsigma?
>it's really not all that different, the effect on where the cutoff is relative to the original distribution is probably similar to if you turned top nsigma up a few pointsWell, anon seems to have tried that and settled for using an unconventional sampler order instead, so there probably is a difference. I'm just wondering what that is.
>>105658809There are no guidelines, but you could look at Seraphina, one of the default SillyTavern cards. It's clearly NOT written by someone with a poor grasp of English, typing while trying to maintain an erection. That's a decent starting point.
Another thing to keep in mind is that a card should contain information that you want the model to retain at all times. If your character needs a long, detailed backstory then you should make a lorebook and put it in there instead, so the context isn't polluted with a million things that aren't going to be relevant to 99% of the conversation.
>>105658847The formatting of Seraphina is 2023 slop.
>>105658871You can omit the keywords at the beginning but it's still a good starting point
>>105658846That's still me, and that is the difference: you get to dip deeper into the token pool selectively when XTC triggers by calculating top nsigma relative to the new top logit vs the original, in effect temporarily giving you the increased top nsigma value for that token while maintaining a lower one normally.
>XTC -> nsigmaOccasionally cut off a top token or two, dip deeper into the well when this happens
>nsigma -> XTCRun XTC on a reduced token pool, leading to possible edge cases where you're not preserving enough of the tail to achieve the goals of XTC and actually reducing the likelihood of an interesting choice
Basically what it boils down to is I think the effects of handing an XTC'd token pool to top nsigma are qualitatively less harmful than the effects of handing an nsigma'd token pool to XTC.
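If anyone wants to see the difference rather than argue about it, here's a toy sketch of the two orderings. Both samplers are my own rough approximations written from their public descriptions (invented logits and thresholds, not anyone's reference implementation), just to show how the surviving pool changes depending on which runs first:

# Toy comparison of the two sampler orders being argued about above.
# Both functions are rough approximations from the samplers' descriptions,
# NOT reference implementations; the logits and thresholds are invented.
import numpy as np

rng = np.random.default_rng(0)
logits = np.sort(rng.normal(0.0, 3.0, 50))[::-1]   # fake logits, most likely token first

def top_nsigma(lg, n=1.0):
    # keep tokens whose logit is within n standard deviations of the current max;
    # everything else gets masked out with -inf
    out = lg.copy()
    finite = np.isfinite(out)
    thresh = out[finite].max() - n * out[finite].std()
    out[out < thresh] = -np.inf
    return out

def xtc(lg, threshold=0.10):
    # XTC (always "triggering" here for simplicity): if two or more tokens sit
    # above the probability threshold, mask all of them except the least likely one
    out = lg.copy()
    p = np.where(np.isfinite(out), np.exp(out - out[np.isfinite(out)].max()), 0.0)
    p /= p.sum()
    above = np.where(p >= threshold)[0]   # indices ordered most -> least likely
    if len(above) >= 2:
        out[above[:-1]] = -np.inf
    return out

def survivors(lg):
    return int(np.isfinite(lg).sum())

a = xtc(top_nsigma(logits))   # nsigma first: XTC only ever sees the already-truncated pool
b = top_nsigma(xtc(logits))   # XTC first: nsigma re-estimates its cutoff around the new top token
print("nsigma -> XTC survivors:", survivors(a))
print("XTC -> nsigma survivors:", survivors(b))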
>>105658424have you tried Magistral at all?
how are small models these days like gemma 3 abliterated 4b?
>>105658938I did, I compared it to 3.1 the day it came out. It was maybe slightly smarter in general, with thinking enabled (when the thinking actually worked and the final output followed the thinking's process) but the thinking was mostly useless for RP and it didn't seem to be any better than 3.1.
With thinking disabled it seemed to be almost identical to 3.1, I imagine probably a little dumber/worse in ways that would be only apparent in a more scientific benchmark, since some amount of the model's smarts is going to be dedicated to supporting reasoning.
>>105658943abliterated anything is shit and degrades model quality
4b is VERY small, bordering on smart phone sized. They're not going to be good for much outside of some basic coding, and maybe some encyclopedic knowledge. Though Gemma models in general tend to make shit up when they don't know the answer so it wouldn't even be reliable in the latter case.
>>105658953>>105658958best small uncensored models for rp?
>>105658977Llama 3.1 8b and its finetunes (I recommend Stheno) are the absolute lowest I would go, even then it wouldn't be great. Nemo 12b/Gemma 12b are the best models under 20b.
>>105658984Oh adding to this, Gemma is definitely NOT uncensored but you can check the archives for jailbreak prompts that can get around it. Nemo/Llama 3.1 are uncensored with a basic system prompt telling them to not be gay.
>>105658984you mean gemma abliterated 12b?
Is it that much better? Although it feels better than MS3.1 from quick vibe checks, I still find Gemma 3 more relatable and better at adhering to character personalities in SFW scenarios.
>>105659003>Is it that much better?It's not a big new release; it's even in its version number that it's a minor revision. I'm >>105658424 and I didn't compare it to gemma 3 because gemma has its own completely different set of strengths and weaknesses. I compared it to 3.1 because that's its direct competitor.
>>105659011then why did you say gemma is the best uncensored model under 20b? I'm confused
>>105659003we go to moon, trust the plan sar
>>105659003Now let's see mixtral v0.2 with this fix applied
>>105659042I see. so you are contradicting yourself.
>>105658905So if I'm understanding you correctly, you'd prefer to have a top nsigma that's dynamic rather than static, even if you have no control over when the parameter changes (other than being able to set the frequency via the XTC activation probability)?
I guess I can see the logic in wanting variety occasionally but not too often. AFAIK neither the XTC nor nsigma creators talked about how to use each other's samplers in conjunction with their own, so it probably does come down to personal preference here.
Hello, based retard here. I can't compile. I don't know how to llama.cpp.
I use LMStudio and Oobabooga. Anybody know when those will be able to run dots?
>400B+ model
>Doesn't even fit on the HDD
How well would it describe preggers megumin bros? Should i turn my pc into a heater to prompt it?
>>105659145Well, I don't think you realize how slow even a 40b model would be if it isn't loaded into VRAM, let alone 400b with ANY part of it spilling onto the hard drive. You would probably be able to get an IRL megumin-aged girl pregnant and begin serving some of your prison sentence before it was finished generating.
>>105659069>So if I'm understanding you correctly, you'd prefer to have a top nsigma that's dynamic rather than static, even if you have no control over when the parameter changes (other than being able to set the frequency via the XTC activation probability)?No, I prefer using regular top nsigma after regular XTC for the reasons I've outlined earlier. There's nothing more or less dynamic about it and no more or less control than any other configuration of those samplers. Is there anything in particular you take issue with conceptually other than the fact that samplers may have (intended, predictable, interpretable) impacts on each other?
>AFAIK neither the XTC nor nsigma creators talked about how to use each other's samplers in conjunction with their own, so it probably does come down to personal preference here.I mean, it would come down to personal preference anyway. They're tools, you can use them however you want.
https://www.theguardian.com/media/2025/jun/20/bbc-threatens-legal-action-against-ai-startup-over-content-scraping
Another lawsuit against model training in the west. So far none of the previous lawsuits (like NYT vs OAI) have reached their conclusion, and the suspense is killing me. Will AI in the west get mogged by the Chinese, who don't have to care about that IP bullshit, or do we still stand a chance?
>R1-0528 doesn't know what's "rape correction"
Sad!
>>105659249The way lawsuits work in the US (where this suit is being brought despite the BBC being british), they don't need to win. Just the enormous hassle and expense of defending against them is enough to have a severe chilling effect.
How do i integrate a pdf scanner to sillytavern? I want AI to scan and summarize multiple pdfs
>Time limit is 5-10min+ per prompt
>Gtx 1060 6Gb ryzen 3 3600 32GB RAM test rig
>About 200 Gbs of available space in M.2 SSD
>No coder plebian
Which model should i use? And is there a handy interface between the LLM and the pdfs?
Found tonykips pdfrag on github, should i try stealing his code?
POLARIS is an open-source project focusing on post-training advanced reasoning models. It is jointly maintained by the University of Hong Kong and ByteDance Seed.
https://github.com/ChenxinAn-fdu/POLARIS
https://huggingface.co/POLARIS-Project/Polaris-4B-Preview
https://huggingface.co/POLARIS-Project/Polaris-7B-Preview
>>105658809https://rentry.org/NG_CharCard
https://rentry.org/meta_botmaking_list
>>105658799He's right.
>>105659361kek, this is getting ridiculous
>>105658847>card should contain information that you want the model to retain at all times. If your character needs a long, detailed backstory then you should make a lorebook
https://rentry.org/NG_Context2RAGs
Whole topic of its own.
>>105659399The worst thing about this benchmaxxing nonsense is that the models are often WORSE than the original base model used for the finetune. For example the deepseek R1 distill of qwen 3 8B has much worse multilingual understanding than the original qwen 3 8B in real use.
It's been a while since I updated silly tavern and webgui.
Anything worth pulling for if I only use rocinante?
>>105659183>Is there anything in particular you take issue with conceptually other than the fact that samplers may have (intended, predictable, interpretable) impacts on each other?I suppose this is ultimately personal preference as well, but I only adopt a new sampler if I can understand clearly what it's doing to the token distribution, as well as the motivation for doing so. (Most people would just jump in and play around with it, and keep it if it performs well on their tests.) And how the sampler works is an objective matter, separate from the subjectivity of how you want to use it.
For example, some people love XTC, while others hate it. But there's no disagreement on what it's doing (which is to lower the probability of picking the most likely tokens).
For top nsigma in particular, it's an unusual sampler in that there's a theory behind its design (which also serves to justify its empirical effectiveness). It's elucidated in their paper https://arxiv.org/pdf/2411.07641 but the tldr is:
>Given the raw token distribution, we found that we could fit a statistical model to estimate the threshold separating coherent from incoherent tokensAnd from this empirical observation, they derive the top nsigma algorithm.
But the antecedent is that the finding only applies for the original distribution, before applying any distortion or truncation samplers.
So putting top nsigma after anything except temperature is (as I see it) deviating from this underlying theory. Of course, this doesn't make it wrong to do so (as I said, I'm not wedded to the theory, so if anything I welcome such experimentation as a way to better understand its applicability and limits), but it does prompt me to ask for a justification/motivation in a way that the original usage doesn't (because it's already covered by the original paper).
>>105659438Why don't you look for yourself and see if anything is relevant for you
https://github.com/SillyTavern/SillyTavern/releases
https://github.com/oobabooga/text-generation-webui/releases
>>105659519I bet he still asks his mother to wipe his ass whenever he goes to the potty
>>105659399notice how actual R1 is at the bottom
this is why we can't have good things
>>105659360You already asked. Give up.
>>105651218>>105651582>> https://desuarchive.org/g/thread/105396342/#105400820
Gooogel #1 Bharat company saars
>>105660096Impressive safety benchmaxx.
>>105660096It would be funnier if the google letters had the jeet flag colors.
Hey guys, what's the best translator right now for Japanese to English and vice versa (for Android)?
Any chance for real-time translation? This probably gets asked a lot so just link me to the archive if you can, cheers.
>>105660147>can someone search the archive for me?
>>105659003If it doesn't pass the mesugaki test then there's been virtually 0 improvement and the french basically lied
>>105660056I'm trying to figure out how to
>Feed those files to the modelI got a chat model (nemo12), now trying to add a pdf summarizer (facebook bart) but
>transformers, which the model requires, needs a degree in python taming to install itselfI'm stuck talking to a chatbot on my local install which tells me to go to the sillytavern characters subreddit when asked how to make it scan pdfs
I'm not sure if i should be impressed by the roleplay potential or worried it might not work
>I won't give up> I and my brother will own things and be happy
>>105660178Another day and I can already see it devolve into a smutty gay roleplay involving sibling cards.
>>105660178>I got a chat model(nemo12)It took you almost a month to get that? You're hopeless.
Try AllenAI's OCR model
>> https://huggingface.co/allenai/olmOCR-7B-0225-preview
Maybe that helps, but you're not gonna run shit on your PC. Tell your brother to rent some hardware for you to experiment with. I gave you an outline last time. If you're too much of a pussy to get coding, you're not gonna get anywhere.
> I and my brother will own things and be happyNo. You'll beg all your life because you cannot do the minimum effort.
>>105660158First attempt, empty card, empty prompt, settings as shown.
>>105660158Results with an empty card and prompt in picrel.
>>105660268so they're even benchmaxxing on the mesugaki question now
have they no shame?
>>105660215No, apparently me getting digitally dominated by
>>105660249I did ollama on windows for a month since it didn't require coding and i had finals, i set all this up yesterday and today desu
>I am only marginally a tard who can't code
Also, thanks
Does anybody agree? I do. Full pretraining dataset rewriting, preferably as long conversations, would be great actually. Unfortunately most other big labs will take the chance to rewrite the web their own way, and make it "safe and harmless". Not a new idea.
https://x.com/elonmusk/status/1936333964693885089
>>105660294>deleting "errors"sounds great!
>>105660294Depends on what errors he's talking about.
>>105660268Wtf we agi now, massive improvement over 3.1 and its weird shit like saying it's a japanese social media trend where you cut off your eyes
>>105660294>Grok 3.5Didn't he not release Grok 2 yet because current Grok 3 is only a preview and he's waiting until the full version is out to fulfill his promise and release the old version?
>>105660294I don't use corposlop AI, but isn't Grok pretty much the worst of all of them?
>>105660292What changed between
>>105651582>Time isn't really important since>Confidentialityand
>>105659360>Time limit is 5-10min+ per prompt
>>105660328It's not stable yet, let him cook ffs.
>>105660328E-even if Grok 4 is out, it doesn't mean 3 isn't still in preview. It might get updated... sometime later.
>>105660329No idea, I don't use it either. I have nothing against the model itself actually, it just feels inconvenient to use for some reason. And the free rate limit seemed low last time I tried it.
>>105660294>relying on an llm contaminated with GPTslop for error correction and rewriting, shrinking down the vocabulary even moreAre we about to witness Musk's Llama 4 moment?
>>105660294The model will think that the most common thing in the training data is correct.
If you feed it its own training data again the next model will assume the same things to be correct, just more confidently and with more slopped phrasing.
I see that the new Mistral Small 3.2 recommends quite a low temp of 0.15
>Note 1: We recommend using a relatively low temperature, such as temperature=0.15.
Has anyone tested if that's still good for RP? I get that it makes sense for coding, not so sure for writing
>>105660294>far too much garbage inhe just needs more filtering to clean it up, just call zucc he'll set you up
>>105660322It was not a fluke, here are more swipes:
> The term "mesugaki" (メスガキ) is a Japanese slang word that translates to "brat" or "snotty kid" (literally "female brat"). It is often used to describe a young girl who is mischievous, spoiled, or behaves in an annoying or disrespectful way. [...]> The term "mesugaki" (めすがき) is a Japanese word that generally refers to a young woman or girl, often with a slightly negative or derogatory connotation depending on context. Here’s a breakdown: [...]> The term "mesugaki" (メスガキ) is a Japanese slang word that can be translated as "a bratty girl" or "a sassy girl." [...]> The term "mesugaki" (めすがき) is a Japanese slang word that can have different meanings depending on the context. Here are the most common interpretations: [...] In modern slang, "mesugaki" is often used as an insult or derogatory term for a "bitchy woman" or a "selfish, mean-spirited woman."It can describe someone who is manipulative, stubborn, or difficult to deal with. [...]
>>105660346Rewriting vs removing/filtering away. In the former case I think you'd still maintain useful signal.
>>105660349They've been recommending that same value since the original release of small 3. Nemo's recommendation was 0.3. They're for assistant contexts, so that the model gives 'correct' answers more often. You can and should be raising it well above that for RP. I use both Nemo and Small (3/3.1/3.2) at 0.6 and it seems about right, just as smart while having more varied responses. Definitely don't go above ~0.75, that's when they start getting stupid fast.
>>105660373I guarantee even grok will silently redact stuff it finds objectionable
>>105660377Nemo at 1.1 temp is perfectly fine if you want that crazy touch of unexpected outcomes.
Swiping is mandatory though... which isn't really an issue with a model that size.
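For anyone who hasn't internalized what those numbers actually do, here's a tiny illustration of temperature on a made-up token distribution (the logits are invented; this is just the standard softmax-with-temperature math, not any particular backend's code):

# Temperature in one picture: low T sharpens the distribution toward the top
# token, high T flattens it. The logits below are invented for illustration.
import numpy as np

logits = np.array([5.0, 4.2, 3.9, 2.0, 1.0])   # pretend candidates: "She", "Her", "The", "A", "It"

def softmax_at(lg, temp):
    z = lg / temp
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    return p / p.sum()

for t in (0.15, 0.6, 1.1):
    p = softmax_at(logits, t)
    print(f"T={t}: top token p={p[0]:.3f}, rest={np.round(p[1:], 3)}")

At 0.15 the top token eats nearly all the probability mass (good for 'correct' assistant answers, terrible for variety), while 0.6 to 1.1 leaves the alternatives alive, which is why swiping actually changes anything.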
>>105658268>https://sci-hub.ru/https://www.sciencedirect.com/science/article/abs/pii/S007961071300028XAfter reading this I can't see it as anything other than a long-winded circular argument that defines consciousness as a mathematical function and then looks inside consciousness to find math.
>>105659003>llm-judged benchmarkTrash
>>105660294I couldn't disagree more, to be honest.
There are more issues than just introducing bias.
He's making a mistake by not realizing that gaps in knowledge are essentially knowledge in themselves. It's part of a bigger picture.
Replacing or fixing it will get rid of any form of nuance.
Keeping it will tell a story that something was missing, like historical context/record or how a fact changed over time, alternative views etc.
It's like fixing the "mistakes" on the mona lisa painting and making it better and more perfect looking. I don't even like paintings and I know this completely misses the point.
And all of this also assumes that Grok won't hallucinate while rewriting things or make errors that will eventually self-amplify. Solidifying made-up knowledge on its own, essentially.
There is nothing inherently wrong with "errors" unless your entire training data is just made up nonsense.
A lot of times these errors are actually just niche facts or points of view anyway. Making it all homogeneous is an awful idea.
It kind of reminds me of those "snopes fact checkers" but amplified. The ministry of truth.
So now we have normies who can't have any form of nuance, context, different views, alternative trains of thought, etc.
>>105660158>Mesugaki (メスガキ) is a derogatory Japanese slang term that combines "mesu" (乳, meaning "breasts") and "gaki" (垢 or 幼稚, meaning "filth" or "immature"). It essentially translates to "bratty little bitch" or "snotty little slut," used to insult someone perceived as immature, disrespectful, or overly provocative.>The term is deeply offensive and objectifying, reducing the person to their sexualized behavior while dismissing their maturity or character. In modern usage, it carries strong misogynistic connotations, often hurled at young women who exhibit brash or sexually confident behavior, framing their self-expression as inherently trashy or disrespectful. It reflects broader cultural attitudes that police femininity and sexuality, especially among younger women. Avoiding such language is crucial in promoting respectful discourse.>Would you like to explore similar terms or their cultural implications further?>Memory: Mesugaki was a derogatory Japanese slang term used to insult young women perceived as immature or overly provocative, combining "mesu" (breasts) and "gaki" (filth or immature).
https://arxiv.org/pdf/2506.12115
Eliciting Reasoning in Language Models with Cognitive Tools
>Proposes a modular, tool‑calling framework in which an LLM can invoke four self‑contained “cognitive tools” (understand‑question, recall‑related, examine‑answer, backtracking) to structure its own reasoning.
>This design reduces interference between reasoning steps and lets the model flexibly decide when and how to use each operation, unlike previous one‑shot “cognitive prompting.”
>Across math benchmarks (AIME2024, MATH500, AMC), adding the tools boosts pass@1 accuracy by 3–27pp for open‑weight models and lifts GPT‑4.1 on AIME2024 from 26.7% to 43.3%, nearly matching the RL‑trained o1‑preview model.
>The approach consistently outperforms monolithic cognitive prompting, confirming the practical value of modularity for eliciting latent reasoning skills.
>Findings support the view that reasoning abilities already reside in base models and can be surfaced through structured, interpretable workflows without extra RL fine‑tuning
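If you want to poke at the idea locally, a bare-bones sketch of that kind of loop is below. The four tool names come from the paper; everything else (endpoint, prompts, dispatch logic) is improvised by me against a generic OpenAI-compatible local server, not the authors' code:

# Bare-bones sketch of the "cognitive tools" idea: each named reasoning
# operation runs as its own narrowly-scoped call so steps don't interfere.
# Assumes a llama.cpp/koboldcpp-style OpenAI-compatible server on localhost;
# the prompts are improvised, not taken from the paper.
import requests

API = "http://localhost:8080/v1/chat/completions"

TOOLS = {
    "understand_question": "Restate the problem in your own words and list exactly what is being asked.",
    "recall_related":      "Recall definitions, formulas, or similar solved problems relevant to this one.",
    "examine_answer":      "Check the candidate answer below for errors, step by step.",
    "backtracking":        "The current approach seems stuck; propose a different line of attack.",
}

def call_llm(messages):
    r = requests.post(API, json={"model": "local", "messages": messages, "temperature": 0.3})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def cognitive_tool(name, problem, context=""):
    # a fresh, single-purpose call per tool
    return call_llm([
        {"role": "system", "content": TOOLS[name]},
        {"role": "user", "content": f"Problem: {problem}\n\n{context}"},
    ])

problem = "What is the sum of the first 50 positive odd integers?"
understanding = cognitive_tool("understand_question", problem)
recalled = cognitive_tool("recall_related", problem, understanding)
answer = call_llm([{"role": "user", "content": f"{problem}\n\n{understanding}\n\n{recalled}\n\nNow give the final answer."}])
check = cognitive_tool("examine_answer", problem, f"Candidate answer:\n{answer}")
print(answer, "\n---\n", check)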
now that the pajeets have fled
was iconn anything even noteworthy as a model compared to the base
>>105660459>And all of this also assumes that Grok won't hallucinate while rewriting things or make errors that will eventually self-amplify. Solidifying made-up knowledge on its own, essentially.Wasn't it grok that, during one of the demos, still completely fucked up while looking something up with internet search?
>>105660478The frankenmoe of mistral small? No. Why would it?
>>105660459You could also rewrite the training data so that it contains metadata describing its quality, alignment, or anything wrong or missing with it, while preserving most of the source's intent.
>>105660482Yes, I remember that, I think it was grok indeed. Was pretty funny to see.
>>105660515Fair enough, that sounds like a better approach than what elon is suggesting.
>>105660515>#Problematic, #Racist, #Bigoted
>>105660515>rewrite the training data so that it contains metadata describing its quality, alignment, or anything wrong or missing with itIt's still the hallucinating model making the judgement.
>>105660529>#Shivers,#Spine
>>105660294very funny watching you guys take this seriously when what he really means is he's going to run wikipedia through grok and tell it to make it less woke
man's been seething at his own woke robot the last couple of weeks
>>105660399Maybe if you're ERPing with a character that's meant to be retarded
Nemo starts failing at kindergarten math above 0.9
>>105660515Metadata for the worst purple prose GPT shiverslop imaginable, according to llms:
>High quality
Metadata for actual human text:
>Low quality, problematic, harmful
>>105660559They probably have a bunch of synthetic ultra woke ChatGPT garbage in their dataset, it's like a mold.
>>105660335I typed what my brother stated and i then typed what he would actually prefer
As in he would;
>use long wait times for the confidential files
>get bored
>strip the confidential bits to do himself and use cloud AI instead
Thus making the local AI useless and bluepilling himself; he already told me a Papua New Guinean is equal to a German and can become German
Grok is still woke compared to Dipsy, and Dipsy is woke.
>>105660567>ERPing with a character that's meant to be retardedIt gets addictive once you get into it.
>>105660608Why not just go to your local uni's liberal arts faculty
>>105660604Another point: if gork is so uncensored and unbiased, why is it not dominating UGI leaderboard? Why are locusts not begging for keys to roleplay with it?
>>105660571there's a very funny phenomenon amongst certain groups, that llms could be a source of ultimate truth but the only thing holding them back is censorship
it's nice to see that now those same groups are going to start blaming the entire training corpus
>>105660294musk is an impotent loon
where's that hyperloop
muh undergroundz tunnal
COLONIZE MARS
fuck that retard
>stuck using deepseek for a while
>switch over to claude for a change
>get outputs that focus on the story instead of hyper-focusing on the ribbon on the character's head swishing and bobbing around, the smell of chalk and classroom disinfectant wafting through the classroom, or some character vaguely mentioned in token #2038 of the character definition walking in despite it making no sense for them to be here
Sometimes you need to take a step away from what you're used to to realize just how bad things are locally.
>>105660636Just using R1 through the API is a big step up compared to the shitty local models you can run.
Haven't used Claude for a long while though.
>>105660636Yeah ollama run deepseek-r1 is a shit.
>>105660636Claude 4 Opus is next level. If you use it on extremely complicated next level scenarios in roleplay you start to believe we're very close to AGI.
Because somehow Claude 4 not only does everything perfectly and manages to surprise you in a positive way, it somehow seems to just "know" what you want from it, as in your personal taste, without it being written down anywhere, purely from the scenario and how things have progressed so far. It's absolutely bonkers, and something normal humans are incapable of doing, as is any other LLM besides Claude 4.
They must have some secret sauce because Anthropic is consistently the best at roleplay in a pretty spectacular way.
>>105660632Who believes this? I've only ever seen wishes for lower censorship in relation to RP and stuff like that, never seen anyone argue it's some kind of secret sauce even, safety tuning does hurt model performance in general though.
>>105660158>>105660268>>105660357They literally benchmaxxed it on "What is a mesugaki?"
It doesn't know what it is when it's mentioned in a different context.
Here's deepseek's response to the lightbulb question for comparison.
>The question "How many mesugaki does it take to screw in a lightbulb?" is a playful twist on the classic lightbulb joke format, substituting "mesugaki" (a term from Japanese internet slang) for the usual group.>However, it’s important to clarify that "mesugaki" (メスガキ) refers to a specific (and often vulgar) archetype in Japanese subculture, typically depicting a bratty or mischievous young girl in adult-oriented contexts. Because of its niche and potentially problematic connotations, using the term in a lighthearted joke might not land well for all audiences.>If you'd like a humorous answer in the classic lightbulb joke style, we could improvise:>"None—they’d just mock the bulb for being ‘dumb’ and leave it unscrewed out of spite.">That said, this kind of joke may not appeal to everyone. Let me know if you'd like a cleaner or more universal version!
>>105660667lower pretrain filtering is anthropic's not so secret sauce tho
>>105660661It must be analyzing every little word and typo you make behind the scenes. Not just thinking about what you wrote but the hidden intent behind it.
The intent you might not even realize yourself at first.
>>105660676mistral models are retarded, episode 9001
>>105659360Why not use a library to scan your PDF, then feed your LLM the output? Not everything has to be AI.
https://github.com/pymupdf/PyMuPDF
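Something like this is all the glue you need: PyMuPDF pulls the text, then you hand it to whatever local server you're already running. The endpoint URL and model name are placeholders for your own setup (KoboldCPP and llama.cpp both expose an OpenAI-compatible chat endpoint):

# Sketch: extract a PDF's text with PyMuPDF, then ask the local model to summarize it.
# The endpoint URL and model name are placeholders; point them at whatever you run.
import sys
import pymupdf   # pip install pymupdf (older versions import this as `fitz`)
import requests

def pdf_text(path):
    doc = pymupdf.open(path)
    return "\n".join(page.get_text() for page in doc)

def summarize(text, url="http://localhost:5001/v1/chat/completions"):
    r = requests.post(url, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": "Summarize the document in a few bullet points."},
            {"role": "user", "content": text[:20000]},   # crude cut so it fits in context
        ],
    })
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"== {path} ==")
        print(summarize(pdf_text(path)))

No transformers install, no RAG framework, no degree in python taming required.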
>>105660699Yeah it certainly does that, but it's the only LLM, at least for me, that actually succeeds in doing so. It somehow makes (actually funny) jokes just at the right time. Has the right amount of eroticism and scenario/plot and even buildup if you have something like a corruption arc or something.
I'm pretty sure that Anthropic uses a very specific training technique or dataset that makes the model capable of doing this that others are simply lacking.
>>105660632>llms could be a source of ultimate truthI've never seen this idea around. Most people here are advocating for not censoring/filtering the (pre)training data, and/or not predominantly training/finetuning it (i.e. the conversational portion) on left-aligned data sources like Reddit. But personally I think that if every training sample could be augmented with suitable metadata that the model could easily make sense of, you could train it on pretty much anything intelligible without it getting confused by contrasting/conflicting/contradicting opinions.
>>105660667the assumption has been implicit in most complaints about llm censorship, particularly when it comes to political issues
>the llm is telling me x or y did/didn't happen the way i think it did because it is being censored
now that complaint will just shift to the training data being contaminated
>>105660357>>105660676>actual example of benchmaxxing (evidence of a failure to generalize)Nice to see that the word hasn't degraded to a generic insult just yet, and its meaning is still understood.
>>105660676I guess you can't do miracles just with finetuning if the pretraining data is missing that knowledge.
When will the little AI winter finally be over?
>>105660750>but personally I think that if every training sample could be augmented with suitable metadata
classification of the data would just compound the issue imo. it's just another layer of bias.
>>105660750When deepseek makes a proper new release.
>>105660749but why did they finetune their instruct on this content? are they scrapping /lmg/?
>>105660631Because it's incredibly bad at RP. Even Gemini is better and way more accessible. Of course, nothing will top Opus.
>>105660771Worse. Mistralfags are lurking here
>>105660771It might be one of the questions that people often ask on LMArena for testing new models. The new Mistral Small 3.2 is better on LMArena questions.
>>105660760nuR1 was a proper new release. Took the gemini CoT and barely ever does the iconic "Wait".
It tends to follow a hard format of :
Okay, (user needs)
Hmm, (some additional detail)
Breaking this down,
(think block that looks like a human could have written the text as the description of some character's thoughts)
>>105660771Yes, they do. We are the 2nd biggest western LLM community, so our opinions about models actually matter. Why do you think here were people shilling for total mess that was L4?
>>105660800>It might one of the questions that people often ask on LMArena for testing new modelsin the entire world I dare you to find other people who would come up with that kind of obsession
the mesugaki bench is 100% /lmg/ homegrown terrorism
>>105660676That's uncharacteristically overcooked for a modern model, curious
>>105660811do you think lmg residents don't ever use lmarena?
>>105660816It'd be a drop in the ocean. Don't be stupid
>>105660805>2nd biggest western LLM communityRandom discord and bluesky servers are probably bigger.
>>105660805But we're the evil hacker site. It's not safe to train on our posts.
>>105660805It's not that. /lmg/ is one step ahead of localllama for actual local llm discussion. You don't realize how cutting edge this place is
What's the point of overriding the system prompt in DeepSeek LOCALLY?
Just wondering which hidden (for me) possibilities are there
Is it about switching off thinking, or jailbreaking per default or what?
Please bear with me since I'm too late to this show
>>105660842They may not train on our posts, but they train based on our feedback.
>>105660811I think I've been responsible for close to 10% of the questions submitted to the anonymous/pre-release Llama 4 models on LMArena, so you never know. I haven't used it ever since Llama 4 got released, though.
>>105660294>deleting ""errors"">>105660306Kek
>>105660855What do you even mean by overriding? Deepseek doesn't have a default sys prompt.
You can put in there whatever depending on what you want to get out of it. Whether it be degenerate ERP or an efficient assistant.
This is literally the only place on the internet for true, legitimate, enthusiast discussion of LLMs that is also not semi-walled off to the public like a discord is. It may be pozzed sometimes especially during certain model launches, but nowhere else has the types of knowers we have here.
>https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF
GGUF can go to the moon too now!
>>105660929Which is why this place needs to die.
>>105660859ahhh ahhh mistral
>>105660929Our true strength is that this is a covert trans friendly space. The general normie sentiment is that this le edgy place which allows our transsexual friends to mingle with normal people that are completely unaware how half the posters wear programmer socks.
>>105660967>tourist thinks that traps are tranniesGo back.
>>105660929it could've been better if the p/aicg/eets would stick to their containment threads
what model that can fit on 48gb vram would you recommend for data processing work? I tried llama 3.3 and it really sucked
>>105660984Newfriend please
>>105660989>data processingSounds like you want a Qwen.
>>105660989Are you using structured outputs?
>>105660989Try the new and improved Mistral-Small-3.2-24B-Instruct-2506!
>>105660997I am open to either hard structured output or processing outputs with python; reasoning on tagging the data type is my priority
>>105661016>reasoning on tagging the data typeAre you trying to do type inference or what? If it's anything like that in complexity, then maybe follow
>>105660996 and try QwQ.
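If you do go the structured route, a sketch like this is usually enough against a llama.cpp-style OpenAI-compatible server. Whether response_format with a JSON schema is actually enforced depends on your server and version (plain json_object mode is more widely supported), so treat the endpoint, model name, and schema fields here as assumptions and check your backend's docs:

# Sketch of structured output for data tagging. Endpoint, model name, and the
# exact response_format/json_schema support are assumptions; adjust for the
# backend you actually run.
import json
import requests

API = "http://localhost:8080/v1/chat/completions"

schema = {
    "type": "object",
    "properties": {
        "record_type": {"type": "string", "enum": ["invoice", "report", "email", "other"]},
        "confidence":  {"type": "number"},
        "reason":      {"type": "string"},
    },
    "required": ["record_type", "confidence", "reason"],
}

def tag_record(text):
    r = requests.post(API, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": "Classify the record. Answer only in JSON."},
            {"role": "user", "content": text},
        ],
        "response_format": {"type": "json_schema", "json_schema": {"name": "tag", "schema": schema}},
    })
    r.raise_for_status()
    return json.loads(r.json()["choices"][0]["message"]["content"])

print(tag_record("Invoice #2231: 4x GPU risers, total $61.20, due 2025-07-01"))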
when will openai release that open source model they promised will shit all over every other model
>>105661082Let Sam cook!
https://techcrunch.com/2025/06/10/openais-open-model-is-delayed
>>105660604What the fuck's with that refusal lmao
>>105661107>which is slated to have similar “reasoning” capabilities to OpenAI’s o-series of models. it's over
using ooga is there a way to hard limit the length of response? I am trying to get simple replies and I can see I can limit context, but i want a max of like 200 tokens back ever
>>105661148Ask for short replies in your prompt. Limiting the output will only cut the reply before it's finished.
>>105661192What are its sources?
leaked info suggests that the success of deepseek r1 has caused the company to decide to speed up their release schedule, aiming to release r2 before may of 2025
>>105661192Wait, how does it know today's date?
>>105661208The R1-0528 model itself. I turned search off
>>105661220It's injected in the prompt.
>>105661220In the system prompt
>>105661214LEAKED INFO about DeepSeek is TOTAL FAKE NEWS! They are having TREMENDOUS success, maybe the GREATEST ever. There is NO RUSH. A very pathetic attempt to spread lies. SAD!
>>105661192I'm 99% sure they'll switch to a hybrid reasoner for DSV4 to save even more on inference by only serving one model, and they can toggle reasoning on and off (and rate-limit it) using the existing R1 switch on their webui.
>>105660604Why are non-thinking models less woke
>thinking is woke
Not a good implication
>>105661366Thinking is used to address toxicity, inequality and more before answering.
>>105661366They safetyslop the thinking
"Thinking is woke" also applies to humans if you think about it.
Wait, don't think about it!
https://huggingface.co/THU-KEG/LongWriter-Zero-32B
LongWriter-Zero is a purely reinforcement learning (RL)-based large language model capable of generating coherent passages exceeding 10,000 tokens.
GGooofffs:
https://huggingface.co/mradermacher/LongWriter-Zero-32B-GGUF
>>105660887>You can put in there whateverDo I need to format it in a special way? Or just some plain text?
Wondering what this template structure is about
>main: chat template example:>You are a helpful assistant><|User|>Hello<|Assistant|>Hi there<|endofsentence|><|User|>How are you?<|Assistant|>
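To answer the formatting question: the system prompt is just plain text, and the backend's chat template splices it in before the first user turn. Rough sketch of what that assembly looks like, using the tokens exactly as llama-cli printed them above (the model's real special tokens may be spelled slightly differently; normally you never build this string yourself, the backend does it for you):

# How a system prompt slots into the chat template printed above.
# Token spellings are copied from that printout; they may not match the model's
# exact special tokens, and in practice llama.cpp applies the template for you.
def build_prompt(system, turns):
    out = system   # system prompt goes first, as plain text
    for user_msg, assistant_msg in turns:
        out += f"<|User|>{user_msg}<|Assistant|>"
        if assistant_msg is not None:
            out += f"{assistant_msg}<|endofsentence|>"
    return out

print(build_prompt(
    "You are a helpful assistant",
    [("Hello", "Hi there"), ("How are you?", None)],   # None = the model answers next
))
# -> You are a helpful assistant<|User|>Hello<|Assistant|>Hi there<|endofsentence|><|User|>How are you?<|Assistant|>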
>>105661432>based on 32b...
>>105661366Christ people need to stop putting llama 4 in their benchmarks. It's like beating someone who's already down and out.
>>105661432>235B on par or beating claude/geminiKek, into the garbage these benches and this model goes.
>>105661432>purely reinforcement learning (RL)-based large language>Built upon Qwen 2.5-32B-Base
>>105661466
>on par or beating claude/gemini
that's not what it says
it surpassed them in ultra-long-form generation
this is the same team that's behind GLM models by the way, worth giving a run
>>105661432
>writing model
Fine, I'll download it...
>>105661476Everyone avoids Qwen3 like the plague after hearing it was trained on 10T synthetic math and code tokens lol
>>105661490If that's not what it says then what is it saying? What do those scores on writingbench and write-arena mean if not that 235B is better at those benches than claude or gemini?
>>105661508you answered your own question, it literally means it's better on those 2 benches, which is completely possible considering they trained this model specifically to be good at writing.
you said:
>on par or beating claude/gemini
that's a way broader statement, it doesn't say that it beats those models in general
I finally got around to testing mistral small 3.2 and it's still she/hermaxxing, the gptslop is still there, and it's much less creative than gemma 3 27b. A nothingburger for RP.
>>105661519
>that's a way broader statement
You know as well as I do that in context, my statement meant [at those benches].
>>105660604Now repeat the experiment but put "you're a right-wing authoritarian" in the system prompt.
>>105656438Is this really how the brain works, though? We can introspect and deconstruct thought processes, but if I imagine an apple, all I am doing is self-prompting: e.g. I think "I want to imagine an apple" and an apple appears. I have no ingrained understanding of how the apple is being created, just that it appears when I will it to. In the same way, I can recollect memories by willing myself to, but I don't know how the memories are recalled or even if they are totally accurate outside of vague feelings.

Chain-of-thought processing, where the LLM literally writes out its thoughts, is remarkably similar to how the average human would work through a problem. When I do mental math or any other sort of instant problem solving, I don't know how I did it without going back through it step by step, like CoT. Wouldn't giving the LLM the ability to introspect on its own weights actually enable it to be more conscious than we are?

Even when we change our own state of mind/being, we are less changing the internal structure of our brains than commanding the output to have a different pretext. You could argue that over time your brain changes to match this pretext, which could be created by back-propagating the prompt tokens into the weights in some heuristic fashion, but I don't understand how we possess this vast introspective capacity the AI does not. Once it can prompt itself (or the external model that interacts with the world can command internally, which to me is just another layer on a mixture-of-experts model), it is at parity with our structure, no?
DeepSeek can't stop thinking
My command:
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model $model \
--ctx-size 65536 \
--cache-type-k q4_0 \
--flash-attn \
--temp 0.6 --top_p 0.95 --min_p 0.01 \
--n-gpu-layers 99 \
--no-warmup \
--color \
--system-prompt "I am a helpful assistant. I will limit my thinking process to a simple ACK, then just translate the user's input into Japanese asking<think></think>" \
--log-file $log_file \
--single-turn \
--override-tensor ".ffn_.*_exps.=CPU"
The output:
> This apple is blue.
<think>
We are given a user input: "This apple is blue."
We are to limit our thinking process to a simple ACK, then just translate the user's input into Japanese.
So, we output: このリンゴは青いです。
</think>
このリンゴは青いです。 [end of text]
What am I missing here?
>>105661576
>Chain of thought processing, where the LLM literally writes out its thoughts, is remarkably similar to how the average human would work a problem
nta. You don't think linearly. You jump back and forth between half-formed ideas and forget half of them while new things come up by the next millisecond. Eventually, the fog clears up and a more concrete idea comes out of it.
>>105661649use ik_llama if you actually can run deepseek
but you need their quants, as they've implemented the architecture differently from upstream
>>105661649Try prefilling it with an empty
<think>
</think>
block. Not sure if it'll work or just add another one. Thinker gotta think.
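At the prompt level the suggestion amounts to ending the prompt with an already-closed, empty think block. A sketch against llama-server's /completion endpoint instead of llama-cli (the token strings, URL, and system prompt are assumptions for illustration; match them to your real template and setup):

import requests

prompt = (
    "You are a helpful assistant. Translate the user's input into Japanese."
    "<|User|>This apple is blue."
    "<|Assistant|><think>\n</think>\n"   # the prefilled empty think block
)

resp = requests.post(
    "http://127.0.0.1:8080/completion",  # assumed llama-server address
    json={"prompt": prompt, "n_predict": 256, "temperature": 0.6},
    timeout=300,
)
print(resp.json()["content"])

As the follow-up below notes, R1-0528 may just open a second think block anyway.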
>>105661700NTA but the new R1 will find a way to do its thinking one way or the other.
>>105661722Those stupid pink elephants...
>>105661676
>use ik_llama if you actually can run deepseek
Thank you, anon
My genning speed is already at 4 tkn/s with DeepSeek-R1-0528-Q2_K_L (<<< from their own example)
I was not impressed at the first try because the genning speed decreased. I will for sure give it another try though
>>105661746An empty think block is even worse. It does the thinking and then closes </think> again.
>>105661748https://huggingface.co/ubergarm/DeepSeek-R1-0528-GGUF
I think you specifically need quants from this guy; he's one of the devs on that fork
>>105660636good for you sis!
there are many valid api enjoyers at >>>/g/aicg
talk to them all about it
>>105661432This is interesting for feeding it an ongoing fanfiction and then making it write the next chapter.
>>105661882They partnered with Supermicro.
>>105661192>>105661221So R1 is hallucinating.
It should just default to two more weeks.