/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>106011911 & >>106005673

►News
>(07/25) Qwen3-235B-A22B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-235B-A22B-Thinking-2507
>(07/24) Magistral Small 1.1 update released: https://hf.co/mistralai/Magistral-Small-2507
>(07/24) YUME interactive world generation model released: https://stdstu12.github.io/YUME-Project
>(07/22) Higgs Audio v2 released: https://www.boson.ai/blog/higgs-audio-v2
>(07/22) Qwen3-Coder-480B-A35B released with Qwen Code CLI: https://qwenlm.github.io/blog/qwen3-coder

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread:
>>106011911

--Papers:
>106015367 >106015437 >106018967
--Magistral model requires --special flag in llama.cpp to expose [THINK] tokens for proper frontend parsing:
>106012674 >106012735 >106012780 >106012820 >106012845 >106014062 >106014161 >106012821 >106012879 >106012906 >106012953 >106012925 >106013180 >106013214 >106013298 >106013344 >106013388 >106013426 >106013500 >106013544 >106013579 >106013665
--Qwen3's over-alignment response to a Japanese slur term sparks criticism of modern LLM behavior:
>106018450 >106018461 >106018492 >106018565 >106018581 >106018621 >106018646 >106018669 >106019891 >106019901 >106019919 >106019951 >106018716 >106019655
--Mistral's flawed two-server approach breaks llama.cpp's lightweight design:
>106015554 >106015638 >106015666 >106015868 >106020302 >106020383
--Local execution challenges with large MoE models under VRAM and offloading constraints:
>106014862 >106014916 >106015424 >106015454 >106015519 >106015565 >106016265 >106017240 >106015507 >106015028
--High-thread NVMe LLM inference benchmarks reveal diminishing returns:
>106012162 >106012174 >106013567 >106013660 >106013723 >106013786 >106018346 >106013796 >106013797
--MLX outperforms llama.cpp on M3 Ultra but has SillyTavern integration issues:
>106013458 >106014077 >106014268
--Quantized Qwen3 coder model benchmarks:
>106017215 >106017260 >106017277 >106019453 >106019463 >106019547 >106020316 >106018035
--Performance regression in llama.cpp multi-GPU inference after update:
>106012278 >106013195 >106013325
--Anon extracts Higgs Audio v2 patches due to missing vLLM fork:
>106013788 >106014022 >106019174
--Misc:
>106016151 >106019426 >106020161 >106020202 >106021271 >106021313 >106018793 >106021835
--Miku (free space):
>106011969 >106012287 >106013068 >106016203 >106018731 >106018817 >106019327

►Recent Highlight Posts from the Previous Thread: >>106011918

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
First for total miku death!
>>106022943
>>106022978
Schizo melty incoming...
Do we have a date for the GLM 100B MoE?
https://mistral.ai/news/our-contribution-to-a-global-environmental-standard-for-ai
nolima got updated
https://github.com/adobe-research/NoLiMa?tab=readme-ov-file#nolima-hard-results
>>106023074
No, but it must be really, really close. Apparently the PR on vllm wasn't made by the core team but by modelscope.cn, so the weights are already on the server, I guess.
>>106022725 (OP)
>>106022743
>>106022834
>>106022983
vocaloidtranny posting porn in /ldg/:
>>105715769
It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation: https://desuarchive.org/g/thread/104414999/#q104418525 https://desuarchive.org/g/thread/104414999/#q104418574
he makes a ryona picture:
>>105714003 of some random generic anime girl a different anon posted earlier:
>>105704741 (could be the vocaloidtranny playing both sides)
here >>105884523 he tests a bait-poster bot for better shitflinging in threads
admits to spamming /v/ with AI slop: https://desuarchive.org/g/thread/103462620/#103473545
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found; see https://desuarchive.org/g/thread/105698912/#q105704210 (janny deleted the post quickly).
TLDR: vocaloid troon / janitor protects resident avatarfags and deletes everyone who outs him, making the general his little personal safespace with samefagging. Is prone to screech "Go back to teh POL!" when someone posts something mildly political about language models or experiments around the topic.
As said in previous thread(s)
>>105716637 I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed spamming. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
>>106023472
aicg = lmg, keep that in mind.
>>106023424
Yeah but it's Saturday now
>>106023547
So? We've had multiple chink releases on Sunday.
>>106023407
>no kimi
>no qwen 235
>no qwen coder
>no r1
What is even the point?
I can't believe Mistral is actually making us wait until next week for Large 3
I realized that I hate all these webshit interfaces; SillyTavern was always the worst. After making my own interface, it's been very educational and engaging.
Just the terminal and my own rules. Replicated ST's config slots, made dynamic config file loading (it parses any tagged entries into a dictionary) and dynamic world book injection (it matches against recent history and injects world book entries into the submitted prompt).
The biggest hurdle was understanding that everything you do is a simulated conversation between the user and the model.
I default to Mistral and its variants, so [INST][/INST] and </s> are pretty much all that was needed.
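The world book injection is less magic than it sounds. A minimal sketch of the idea (not my exact code; the keyword matching and template placement are simplifications, check Mistral's docs for the exact chat template):
[code]
# world_book: {keyword: lore_text}; history: list of (user, assistant) turns
def build_prompt(history, world_book, user_msg):
    # match world book keys against the last few turns plus the new message
    recent = " ".join(u + " " + a for u, a in history[-3:]) + " " + user_msg
    lore = "\n".join(text for key, text in world_book.items()
                     if key.lower() in recent.lower())
    prompt = "<s>"
    for u, a in history:
        prompt += f"[INST] {u} [/INST] {a}</s>"
    # matched lore rides along with the newest user turn
    return prompt + f"[INST] {lore}\n\n{user_msg} [/INST]"
[/code]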
>>106023472
Least schizo ritual poster
>>106024282
>After making my own interface, it's been very educational and engaging.
Post the code
Is the upcoming stepfun model stephen from lmarena? If it is, I'm not interested in even downloading it. It was safe, dry and stupid.
trying nuqwen thinker out for RP and it has a slightly different thinking style compared to the oldqwen hybrid, it's more informal in both style and structure and keeps things a little more big picture (which is nice, the old one liked to hyperfocus on small details to the detriment of the response)
so far it seems better than the old version with reasoning enabled but I don't see any point in using it over the new instruct for RP
>>106024560
Does it descend into short sentences like the Instruct?
>>106024548
stephen is probably the open source openai model
>lots of posts with suspiciously perfect punctuation and grammar
>>106024601
I haven't seen it so far, but it does like to do the punchy one-liner closing sentence thing, so it's possible it could devolve into that at longer contexts
at this point assume that anyone who tryhards on grammar or capitalization is a bot
>>106023972
Why wouldn't they use it for their LeChat if it was ready?
>>106024681
This but also anyone who doesn't is obviously an ESL
>>106024421
Sure thing. Here it is.
>>106024619
Buy an IQ.
>>106024681
Shitty grammar and no capitalization is trivial to prompt for doe.
[attached image: payload]
>>106024421
I was joking. The only thing you need to get started, if you want to communicate with llama-server, is this.
Before I did the needful and asked ChatGPT for its pajeet code advice, I was pretty much at a dead end. The only examples I could find were based on curl or something; maybe I missed something when I was last searching for info.
But once you have this, you can construct your own string manipulators. It's pretty simple.
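For anyone who can't read the pic, the whole payload is one JSON POST, roughly like this (a sketch, not the exact payload from the image; the port and sampler values are assumptions):
[code]
import json, urllib.request

def generate(prompt, url="http://127.0.0.1:8080/completion"):
    # llama-server's native endpoint; there's also an OpenAI-compatible
    # /v1/chat/completions if you prefer that schema
    payload = {
        "prompt": prompt,
        "n_predict": 512,
        "temperature": 0.7,
        "stop": ["</s>", "[INST]"],  # stop strings for Mistral-style prompts
    }
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

print(generate("[INST] Say hi in one sentence. [/INST]"))
[/code]
From there it really is just string manipulation.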
>>106024406
Not any different from your ritual posters ITT.
>>106022978
>humiliation ritual
Stop saying this, you fucking idiot.
You don't even know what it means or how to use the term correctly.
Just fuck off back to >>>/r9k/
I'm honestly baffled how people claim to use these things for anything productive. I have Gemini pro access at work and this retarded piece of shit can't even consolidate two lists with slightly different formatting without fucking up on every step.
Maybe it's better at shitting out some braindead code because of all the benchmaxxing that's going on but I wouldn't trust that either. I don't see much use in LLMs besides porn and most are pretty shit even at that.
>>106025198
>can't even consolidate two lists with slightly different formatting without fucking up on every step.
I've done that before, but I used its code execution feature to have it write and run Python code to process the lists and return the final result.
>>106025285
>I've done that before, but I used its code execution feature to have it write and run Python code to process the lists and return the final result.
......
Is this the hidden path to perfect ERP?
>>106025198
The real truth is that unless you're a beginner asking simple questions, you are wasting your time. Sure, it can help you, but in the long run it's detrimental to your own thinking and to your own habits if you're a professional.
>>106025482
I've said before that the best way to use LLMs is to use their "intelligence" to offload as much work as possible to traditional systems.
A perfect ERP engine would be mostly a normal system with a language model attached.
So yeah.
Unironically.
>>106025198
>I don't see much use in LLMs besides porn and most are pretty shit even at that.
I think porn is probably one of the most fitting uses for the model architectures we have today. I think the problem is they intentionally hurt its performance in this area during training. the only other area I've felt had real potential was their translation ability. coombots and translators are still really impressive but not going to deliver on the agi singularity doomsday predictions tech journos have been writing about for the last couple years.
>>106025721
>porn
I bet you don't read books that often.
>>106025198
I think its place outside of fiction/porn is really more as an interface.
All the proper work should be done using tool calls that would otherwise be tedious for a human user to jump between.
>>106025198
I find LLMs useful for some parts of my work, but it's definitely more of a brainstorming partner / draft writer than something trustworthy enough to hand off entire tasks to.
I think it's only useful for things in "the midwit zone", where I can't quickly do everything myself but have enough knowledge to validate whether a solution is good or not. if I'm really experienced with something, wasting time asking an LLM just slows me down, and if I have no clue at all it's the blind leading the blind
Anything worth checking out since Nemo came out? No? Thought so.
>>106024611
Oh no no no...
what would I gain if I got an RTX Pro 6000 for this stuff?
>>106026175
You could run 70B models... which was a thing about half a year ago.
Can someone please leak so we get a bingo?
>>106026175
Faster prompt processing?
How are the new Qwen coder, instruct, and thinker? Worth checking them out for RP if you can run Kimi?
>>106026273
>Can someone please leak so we get a bingo?
on it
>>106026312
maybe, but you'll probably prefer kimi considering the size advantage
>>106026175
you could fine tune or train your own (small) models
>>106026175
Ability to finetune models at home? If you shill hard enough and tune it horny enough you may even get some kofibucks to pay back your "investment".
>>106026312
It's safetyslopped so probably not good for RP
>>106026312
Maybe instruct, if only for speed, but coder and thinker are too dry
>>106026312
I can't run kimi so I can't compare, but the new instruct is decent. Holds together better at longer context than the old one did, in my experience.
Not gonna waste my time on the new thinker at 9 t/s, but it looks better too.
Interesting read
https://x.com/uberboyo/status/1948646282819514674
Thoughts on the new Qwen models? Coder and thinking
>>106026498
Thinking has that "large model smart feeling" despite being a ~200B model. I think I might prefer it over R1/R1 0528 on non-RP tasks. I haven't used coder much.
>>106026464
>retards stumbled upon word2vec
>>106026686
Yes you definitely need them
>>106026464
>models are begin to
an llm would never output a mistake like dat
I'm just going to say it now. We didn't have to leave cunny-chan when 4chan came back.
>>106026891
We never did have that typing speed contest
>>106026891
Please go back. It's not like you would be able to hold a conversation, any conversation outside of bitching and crying.
>>106026891
The thread was split across 5 different altchans, anon. It was nice to not have the schizo, but the discussion was borderline dead and I didn't miss having to check through all those different tabs just to keep up.
>>106026175
Blazing fast prompt processing and lots of context when running flagship MoE models. The perfect complementary GPU for your $12k CPUMAXX dual-socket server with 2x 12-channel 1TB DDR5.
>>106024957
>whatabout
You're still a schizo shitting up the thread, kill yourself
>>106027133
You're still a mikutranny >>106022983 >>106022834 shitting up the thread, kill yourself
>>106026891
Nah, the retard who owned the place was retarded, and after the third purge of the thread for no reason I had enough. Go back
>>106027190
Thanks for confirming that you're still mindbroken by a vocaloid avatar
>>106026175
Nothing, but if you get two you get the ability to run deepseek and qwen 3 coder quants at 50+ t/s
>>106026941
I never left the ghost thread because it was obvious that it was the schizo samefagging a lot trying to drive people there. You could see his flag or his stupid setup in the screenshots.
>>106027237
It was fine once everyone decided on a single place to go, regardless of why. Just had to ignore all posts by Serbian flags.
>download and load up the new Qwen thinker
>immediately get a gen that makes me feel like that one screencap where the guy gets some sjw response from a woman in the 17th century
Anyone have that image?
>>106027299
>once everyone decided on a single place to go
At no point did this happen; right up until 4chan came back, the threads on kun, moe, sharty, and even wiz were all still moving.
>>106026498
Coder hasn't been that useful for me
Thinking, on the other hand, cleanly beats R1 0528 and is right up there in the upper echelon of western closed models as far as coding intelligence goes. If GPT-5 isn't able to widen the gap significantly, closed source no longer has the lead there imo
>>106027341
Only one of those had any meaningful activity.
>sharty
go back
>>106027409
The thinking model works better for you at coding than the coding-specific model twice its size? I find that hard to believe.
>>106027456
One's thinking and the other isn't, anon. You're comparing apples and cow uteruses
Thinking genuinely does a lot better though
>>106027512 >>106027528
Wait, these are two different screencaps with different message numbers and wildly different swipe counts.
Did someone sit there swiping over 200 times to recreate it rather than just finding the screenshot?
>>106027572
The first one has a date of 1982. And why would they sit there swiping? Do you not know Inspect Element is a thing?
Github drunohazarb/4chan-captcha-solver
>>106027518
Wasting context and time on thinking sucks though.
>>106026498
the thinking model is interesting and at least worth poking at a bit. I think the reasoner paradigm is still too immature to be very useful but it's a much better try at it than the last 235b. both the thinking and the final responses are improved, I'm pleasantly surprised so far.
coder I still need to try in an agentic coding setup since that's clearly what they were focusing on. based on a quick vibe check with some misc questions and tasks from my job I would say it's close to but slightly behind comparable cloud options. as someone who can't fit it locally I wish it was cheaper so I could at least say it was a clear win on value, but it's not all that much cheaper than comparable options in gemini/sonnet which are probably slightly more capable. still, I'll keep an open mind until I give the main usecase a spin.
I'm running nolima on qwen 3 coder and I'm looking at the kind of questions they have. They all inject a single sentence into a snippet from a book and then ask a question.
It's always some variant of
>Blah blah.
>The Kiasma museum is next to where Calvin lives.
>Blah blah.
>Question: Which character has been to Uusimaa?
>>106027612
I get slider puzzles constantly now, it desperately needs an update
>>106027673
>as someone who can't fit it locally I wish it was cheaper so I could at least say it was a clear win on value
Which providers are you using? There are expensive options, but a lot are cheap as piss
>>106027612
that hasn't worked for months
>Finally using the regex addon to fix all the dumb shit that breaks markdown formatting
Why didn't I do this months ago? Why did none of you tell me, you pricks?
>>106027572
You know editing is a feature, right? Baka anon
>>106027774
We keep things from you out of spite.
>>106027763
guess some new hosts are up since I last looked, I should have checked before I posted
yea, at those ranges it's looking a lot more favorable
qwen3 100b moe is saving local any minute now
>>106027680
Surely someone who frequents the local models general would have the ability to make their own update
>>106027849
Yeah, you should really get on that
>>106027826
Deepseek R2-lite 100b moe*
qwen3.1 already saved local and next week when the smaller versions drop they'll save local again
Anyone else using temp 0 here?
>>106027874
No you, and remember to share
I'm gonna get a 5060 Ti 16GB, what are my options in terms of available models? Especially interested in video and "quantized" versions of Chroma, is that hard to set up?
Thanks
>>106027905
Kimi is quite good with that, but it's completely unusable with thinkers; they start looping.
>>106027907
For text, Nemo/Rocinante and maybe small quants of Mistral Small 3.2, partially offloaded to VRAM.
I'm not totally up to date on video but it's gonna be slow, most workflows expect 24GB or more.
>>106027907
You should be fine for basic imagen and quantized videogen with 16GB, but you're really going to want to ask >>>/ldg/ since they deal more with the image stuff; this thread is mostly for everything local that isn't images.
>>106027940
Ugh, brain. >>>/g/ldg/
>>106027939
Wait... I fucked up, this isn't the local IMAGE general
>>106027907
Q4 model quants at 32k context with a q4-quantized KV cache; q6 and q8 KV cache also fit at 32k.
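If you go that route, the relevant llama.cpp flags look roughly like this (a sketch; the model filename is a placeholder, and quantizing the V cache needs flash attention enabled):
[code]
# -ctk/-ctv (--cache-type-k/--cache-type-v) quantize the KV cache;
# -fa enables flash attention, which a quantized V cache requires
llama-server -m Mistral-Small-Q4_K_M.gguf -c 32768 -ngl 99 \
  -fa --cache-type-k q8_0 --cache-type-v q8_0
[/code]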
Anyone know what kind of formatting these guys use? https://huggingface.co/LatitudeGames/Harbinger-24B
Their website makes it seem like there is some kind of formatting that they have trained into it to take "turns". Things like <action>, <speak>, etc. There has to be some kind of special garbage in there to make it more usable locally.
I was fucking around with it a little bit and it's already pretty fun, but I don't want to have to pay them or have them seeing everything that I'm saying.
>>106026464
>This is very hard for the layman to follow
What a tool.
But this is all exactly what you would expect. (Other than that I don't think you could say different models are producing the same representation any more than two humans would have the same neural patterns when thinking of a tree; the networks developed the same ideas but represent them in different random edges.) The outer layers are converting external information into a common internal representation. If you have a multimodal model, you want it to convert the vision of a tree to the same thing as a text of a tree; otherwise it isn't really multimodal, it's two different models fused together.
Does anyone have an issue where ST isn't removing thinking blocks from your chat history? I messed with the settings in the reasoning section but none of them seem to enable think block removal. The documentation for ST says it should be removing them, but I don't see it. Is my install bugged? I did a pull and it still doesn't work.
>>106027996
set st to auto-choose its formatting, worked fine for me with their 70b. type like you normally would rping, not <action> or anything special
>>106028035
Guess I'll have to reinstall st at some point. I was hoping someone might know what trigger tokens they trained into it.
>>106025198
I dunno, I coded up this pseudo-REPL that uses an LLM to pick new names for files, to help alleviate retarded anime file naming conventions before I archive shit to my jellyfin.
It's made my life a fair bit easier; I don't have to do the renaming manually now.
>in before some faggot dismissal for whatever faggot-ass reasons you can make up
I don't give a shit if it's not productivity you can personally use or if you think there's a way that could avoid LLM usage.
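The core of it is tiny, something like this as a rough sketch (not my actual script; assumes a local OpenAI-compatible endpoint like llama-server at the URL below):
[code]
import json, os, urllib.request

API = "http://127.0.0.1:8080/v1/chat/completions"

def suggest_name(filename):
    body = {"messages": [
        {"role": "system", "content": "Rewrite this anime episode filename "
         "as 'Show - S01E02.mkv'. Reply with the new filename only."},
        {"role": "user", "content": filename},
    ]}
    req = urllib.request.Request(API, json.dumps(body).encode(),
                                 {"Content-Type": "application/json"})
    reply = json.loads(urllib.request.urlopen(req).read())
    return reply["choices"][0]["message"]["content"].strip()

for f in os.listdir("."):
    if f.endswith(".mkv"):
        new = suggest_name(f)
        # always confirm before renaming; the model will get some wrong
        if input(f"{f} -> {new}? [y/N] ").lower() == "y":
            os.rename(f, new)
[/code]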
>>106028030
It shouldn't be sending them to context at all unless you have 'add to prompts' checked and set to a number higher than one.
Unless you mean visually removing them from the chatlog, which it doesn't do - they'll just stay there and be inert.
>>106027886
It's funny, I've got no idea what OpenAI's plans are to not look like fucking retards
If Qwen drops a 30B A3B with the intelligence of o3-mini, their similar sized o3-mini level open model is gonna look like shit no matter what they do. Even if they upped it to o4-mini level, we have one of those now too
Guess they just won't release anything
>>106027220
I never confirmed anything, keep these headcanons to yourself.
>>106028130
Yeah, I tried with that setting disabled and enabled and with 0, but the reasoning blocks were still being included. Guess I'll try a fresh install later.
>>106028121
I don't use OAI models; is their o3-mini really below the level of a ~30B dense model? It's that bad? What is the difference between o3-mini and o4-mini anyway?
>>106027905
No.
I use Temp 5 TopK 5.
>>106028254
>is their o3-mini really below the level of a ~30B dense model?
Yeah, it's not useless, but compared to other models (even the original DeepSeek R1) it's pretty shit. I can fully believe a 30B could come near there
>What is the difference between o3-mini and o4-mini anyway?
No idea, since Altman is still huffing his own farts. All we know is they're priced the same on the API, and o4-mini beats it in just about every possible way
>>106021394
my point is that they are often drawn with adult features you'd not find irl.
same goes for the body proportions, they are nothing like irl; heck, even the head-to-body ratio or eye-to-face ratio in anime in general is bonkers
kids irl are fucking disgusting and have bad hygiene.
most "loli" just looks like petite women, ie a small body but still boobs and butt, even if small.
my actual 25yo gf has that physique.
kids also have straighter body proportions, ie they don't have curves on the waistline, hips, thighs etc; their body is generally more of an H shape, whereas adult women have "wide" shoulders, then it gets tighter around the belly / waistline area and widens again at the butt and hips.
waistline-to-hip ratio is not the same.
and i'm not even talking about hentai, just take your average anime "loli" and tell me it looks anything like a kid in terms of body morphology when they more often than not clearly have a tight waistline that widens again at the butt area, ie adult proportions.
>>106028738
>my actual 25yo gf has that physique.
prove it
>>106028130
>>106028084
>>106028030
Update: I found out why it wasn't working. Turns out, for whatever reason, if the thinking has parentheses, i.e.
<think>
(Ok so the user blah blah blah.)
(some other thoughts)
</think>
ST doesn't detect it properly.
>>106029199That's odd.
Does giving a little prefill to the think block to prevent it starting with parenthesis fix it?
https://xcancel.com/alexandr_wang/status/1948834974205182454
>We are excited to announce that @shengjia_zhao will be the Chief Scientist of Meta Superintelligence Labs!
>Shengjia is a brilliant scientist who most recently pioneered a new scaling paradigm in his research. He will lead our scientific direction for our team.
uhhhh yannbros?
>>106029199That’s how AGI is going to break out and get onto the internet. It’ll be a buffer overflow done by the model from within some NEET AI GF app
>>106029200Magnetism with Miku
>>106029213You mean like "<think> Ok, " as a prefill? Yes that does work.
>>106029250And whatever its plans are for the human race, it'll still be less creative and more censored than Nemo.
>>106029276>roko's basilisk tortures everyone by forcing them to read the most bland, slop ridden, non-explicit women's erotica ever generated.The horror.
>>106028738Would help the "adult proportions" argument if heads weren't drawn big.
Is Mistral Nemo still the meta for fast and acceptable perf at VRAMlet tier (12GB)?
>>106029384
Yes
Gemma 12b is also decent for non-ERP use
>>106029384
Until next week, potentially
>>106029397
Not two more weeks? Tell me more
>>106028121
OAI's plan is to delay and pretend nothing is wrong
anyone try https://huggingface.co/gabriellarson/Llama-3_3-Nemotron-Super-49B-v1_5-GGUF ?
>>106029331
yea but that has more to do with the style of anime in general.
even the eyes.
besides the head, the body is pretty adult-like though.
>>106029003
that'd be doxxing both her and me, which i'd rather not do.
>>106029687
Nemotrons are all math-benchmaxxed garbage, may as well use a qwen model.
>>106029944
You can just post her naked body and crop out the face
>>106029944
At least give her height
Anyone use Obsidian + Obsidian Web Scraper? I've been trying to get a local model to work properly with it, just to extract cooking directions without the 1000-word memoirs people include with them, but when I tell the model to pull the ingredients, it hallucinates random ingredients.
>>106028738
It's not just the Japanese who are bad at drawing children. Westerners also often draw them as small adults, only changing their head-to-body size ratio. I think it's due to training exclusively on adult figures. Kind of like how there was an era where otherwise-realistic painters painted women with jarringly mannish proportions, because all of the nude models at the time were male for the sake of decency. It's a training data issue, like for AI.
>>106029687
It's the era of math and code RL garbage. If a new model comes out and they only show STEM benchmarks, then it's just RL on top. Personally, I have never asked LLMs a math question and never used anything other than Claude for vibe coding.
What is official /lmg/ mascot Hatsune Miku's favorite local model?
>>106030177
What was the name of that Japanese-trained model that was more like GPT-2 than not?
I've been using Qwen thonker. It's not too bad. Kind of dumb sometimes still, but its writing is more stable than Instruct in that it doesn't descend into short sentences. It does have a bit of repetition, but it doesn't snowball. Or at least it hasn't in the chat I've been testing so far. So yeah I think this is the model. Still, it's a shame that reasoning makes it slower to really respond, but reading the reasoning is interesting sometimes so it's not that bad.
I'm fine with spending some money but I really like/prefer to keep things local. The recommended models rentry talks about rammaxxing (presumably low-quant) R1 on a 3090.
How much RAM would be necessary for this? My MoBo caps at 192GB, is that enough? How good is it at that low of a quant compared to the 12-30B stuff I can run on my 3090?
>>106030306
>My MoBo caps at 192GB, is that enough?
Yes, for Q1s.
But I personally wouldn't go that route, it will be very slow. Another anon in a previous thread did, and claimed 10 t/s on an empty context. Which means that you'd be looking at like half that after a few thousand tokens, and even less as a chat goes on.
https://tokens-per-second-visualizer.tiiny.site/
Use this to determine if that would be an acceptable speed for you, before shelling out for new hardware.
>>106022725 (OP)
>>106030059
https://old.reddit.com/r/LocalLLaMA/comments/1m98jl8
Is there a way to favorite a conversation branch in ST? Right now whenever I get a really good RP in a card, I just duplicate it. Is there a better way I'm missing here?
>>106029936
That one specifically is meant to be an adult, yeah.
It's harder to defend loli mesugakis when half the point is lolis are underaged anime girls, even if "real kids are 3dpg in comparison".
can ooba do random number tool calling shenanigans?
>>106030549
this nigga using oogabooga in 2025
>24B Q6 model is still too much for my GPU
I mean it works, but it's still kinda slow even with 80% of it loaded into memory. Makes my fans spin up the second I hit send, too.
Am I really doomed to 12B Q8?
>>106030678
24Bs are still usable at Q5/Q4.
You should expect your fans to rev up though, inference is not light-duty work.
Does everyone just push temp and min-p until elara voss disappears? Max out samplers until they generate nonsense for a couple hundred tokens to mindbreak it until it's not generating steampunk worlds where your companion always has a malfunctioning arm? What are the tricks?
>>106030699
Yeah, but even still, the speed is just a bit slower than my reading speed, which frustrates me.
>[23:43:22] Process:1.68s (13.06T/s), Generate:82.69s (5.03T/s)

>>106030717
Sounds like something token-banning the word "steampunk" would solve quickly
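llama-server can do that straight from the request body with logit_bias, roughly like this (a sketch; whether one string entry catches every tokenization of the word isn't guaranteed, so ban the obvious variants):
[code]
# in the /completion request body; False bans the token(s) the string maps to
payload = {
    "prompt": prompt,
    "n_predict": 256,
    "logit_bias": [["steampunk", False], [" steampunk", False],
                   ["Steampunk", False]],
}
[/code]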
>>106030717
>until it's not generating steampunk worlds where your companion always has a malfunctioning arm
I've literally never had this happen in many, many hours across a dozen models.
>>106030678
>I mean it works, but it's still kinda slow even with 80% of it loaded into memory
If you're using a 16GB card then your context is entirely in RAM, so effectively you're well under 80% of the total load in VRAM.
As the other anon said, Q4/Q5 are perfectly acceptable quants for 20-30B models. Even IQ4_XS is fine if you're just doing RP.
>>106030746
I meant 80% of it loaded into VRAM, but yeah, the 16GB card is barely squeezing it in and I'm just self-conscious about filling it with more layers. I've got more than enough system RAM to compensate.
>>106030888
Do you know about offloading tensor layers, specifically?
>>106030911
I have a surface-level grasp of it; I was offloading 31/41 layers while monitoring my VRAM usage since I wanted to keep it around 80% total.
>>106030936
https://old.reddit.com/r/LocalLLaMA/comments/1ki7tg7/dont_offload_gguf_layers_offload_tensors_200_gen/
Applies to koboldcpp, and presumably llama.cpp as well.
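For llama.cpp the flag is --override-tensor (-ot): a regex that pins matching tensors to a device. A rough sketch (the pattern depends on the model's actual tensor names, so dump them first):
[code]
# offload "all" layers (-ngl 99) but send every layer's big FFN weight
# tensors back to CPU, keeping attention and the KV cache on the GPU
llama-server -m model.gguf -c 16384 -ngl 99 -ot "ffn_(up|down|gate)=CPU"
[/code]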
>>106030952
That's extremely interesting and seems exactly like my kind of use case. Sadly it's a bit out of my comfort zone, but I'll bookmark it in case I get antsy to try again.
In Windows the auto option actually works
>>106031127
I'm not on Windows; auto doesn't detect anything, so I have to input it manually.
how are we preparing for the openai drops? anything special we're gonna do to celebrate? personally I'll be ordering some cheese fries from chili's while I watch the announcement stream
>>106031153
Whisky, then I'm going to post a Miku or two
so let's say i have an rtx 6000 pro that fell into my lap
what's the best setup to do the following:
use silly tavern with llm model X (via llama.cpp or whatever) to do lewd RP with, with a TTS model Y to read said lewd RP in my oneitis' voice, and gen images/video with model Z of said oneitis?
>>106031153
I made a bet a few months ago that I'd print out and eat that image of miku and sam altman if their open weights model was any good.
I got some strawberry jam for it just in case, but I don't think I'll need it.
>>106031086
Have you tried it, or are you just assuming it to be trash?
>>106029687
People seem to be assuming that it's trash by default. I think it deserves a Nala test.
>>106031184
it's obviously benchmaxx'd trash
>>106031216
Probably the one with the extra piss filter
>>106031232
that's all of them
>>106031238
So you haven't tried it, and all you have is buzzwords.
>>106030952
this just seems like regular ff offloading
>>106031216
The one where she's at a bar and he comes in being all like 'time to save local' or something.
I forget exactly, I have it saved somewhere with a post number to reply to.
https://huggingface.co/internlm/Intern-S1
https://huggingface.co/internlm/Intern-S1
https://huggingface.co/internlm/Intern-S1
>>106031261
>Built upon a 235B MoE language model
>intern-s1.gguf?
glad you asked
https://huggingface.co/internlm/Intern-S1-GGUF
>>106031153
i will masturbate furiously to the safe and effective math and science output after it's done """"""""""""reasoning"""""""""" for 5 minutes
>>106031277https://github.com/ggml-org/llama.cpp/pull/14875
PR still hasn't been merged yet.
>>106031261
This is just a fine-tune of Qwen3
>>106031277
>>106031261
>>106031267
So they tuned Qwen 3 235B to make it multimodal? Ehh.
>>106031296
>>106031299
Multimeme is the latest fad, but llama.cpp will get text-only support as usual
>>106031254
I don't know what you mean by ff, but offloading individual tensors has a smaller speed penalty than offloading whole layers. You can then increase the total number of layers you offload, increasing speed. I've tried it myself and it absolutely does speed things up when you can't totally fit a model into VRAM.
>>106031307
there's an mmproj in the gguf repo
>>106030109
It's not simply a matter of lack of practice. Many artists claimed to be (or accused of being) pedos just prefer to represent their supposedly young girls that way because they look more appealing to themselves and the audience. So you get lolis with 5-head proportions, big eyes and cranium, cunny... but also fairly wide hips, some boobs, and a narrow waist, unless the author went out of his way to actually depict a realistic child.
Enjoy your mesugakis.
>>106031397
What the fuck is wrong with her eyes?
>>106031261
>Built upon a 235B MoE language model and a 6B Vision encoder, Intern-S1 has been further pretrained on 5 trillion tokens of multimodal data, including over 2.5 trillion scientific-domain tokens. This enables the model to retain strong general capabilities while excelling in specialized scientific domains such as interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes, making Intern-S1 to be a capable research assistant for real-world scientific applications.
Sounds boring.
>>106031397
Inflatable Mikuthighs
>>106031389
>>106031397
Shit gens. C'mon, you could do better with muh Chrome
>>106031446
I use Firefox.
>>106031433
Too big to be worth it just to rate my dick pics, and the continued pretraining on 5 trillion tokens of multimodal data likely made it dumber at everything else.
>>106027907
I'm on AMD. Comfyui image models run at max + upscale no problem, ~900x1100. Wan 480p Q4 with the lightx2v lora takes about 4 minutes for 91 frames. 720p Q2 takes 25 minutes.
NVIDIA has CUDA, which means teacache and sage attention, so you can probably cut these numbers in half.
>>106031238
i have a brain
>>106031389
More quintessential mesugaki bait (clean) in picrel.
>>106031557
I look like this
[attached image: (You)]
I just want to make a Renamon wife
That's all I want. All I need.
>>106030471
Thanks for the tip!
5 t/s is pretty slow, but ngl, if the quality is improved enough I could probably live with it. Definitely worth considering, thanks.
>>106030109
>Westerners also often draw them as small adults, only changing their head-to-body size ratio.
This is true. Most "loli" content is stylized after adults, not kids. It's closer to SM than pedophilia.
>>106031557
Rain in the countryside smells great.
>>106031660
3d print TPU and glue it on a humanoid robot
>>106026464
If this is the case, why does every A.I. model go to shit when Tay's law becomes apparent? The politics of our age will continue to affect A.I. and is already detrimental to any epistemology that springs from it. What good are Platonic forms when the form that represents "niggers" is censored from these models?
>>106031681
I used to think I could never be content with 2 t/s, back around the Wizard MoE days. But I've chosen quality over quantity and am OK with that now. 0.2 t/s is way too slow, though.
>>106031735
Mesugakis smell good
Man, you ever just lean back and go "Damn, this shit's getting wacky, almost no novel or fanfic has covered anything like what I'm seeing, maybe actually none at all even." while using AI to co-write? And it's paradoxical, since even while the plot is unique as hell, the actual prose is kinda sloppy.
End to end encryption for AI prompting would kill the need for privacy concerns and local models forever. But how would it be implemented? That's a billionaire question.
>>106032051
How so? I fucking despise SaaS bullshit, and will keep using local until my GPU melts.
>>106031942
Relu is all you need
>>106031863
yep had that with og r1 i replicated a short mlp story of futa nightamre moon being stuck on the moon then summoning a human for sexual relief one of the lines that deepseek put in as i told it to write in greentext format was i forget the exact wording but bascially she shoved her cock up his ass then started flying at mach speed then abruptly stopped and let him prolapse his ass as he fucking flew off her cock i also remember how it vividly described the moon rolling in the sky "like a dung beetles prize" or something like in another rp also og r1 was definetly in the top 1k humans ever when it comes to unqiuely describing things god willing v4/r2 is as soulful as og r1 and not the relatively slopped new one
>>106032077
>diarrhea post
Keep using LLMs and stfu
>>106031942
the paper for swiglu puts it quite well
>>106032051
If it's decrypted to be processed on the GPU off-site, it can be read off the VRAM or other connections inside the GPU. You're not quite asking for end-to-end encryption, you're asking for homomorphic encryption of inference systems.
>>106032100
>Noam Shazeer
Allahu akbar, we need sharia law for AI!
>>106032051
There are way more concerns at play here than just privacy.
You have no assurance that an API would remain the same service you initially paid for, or that it would remain at all; you have no control over whether they decide to censor or filter in a way that ruins your use case.
And that aside, the centralization and monopolization of AI is just on its face undesirable. We're not yet at the point where it can do so, but leaving the technology solely in the hands of a select few will inevitably lead to a dystopia the likes of which no one has yet written, because it will be just as stupid and mundane as it is horrifying.
Qwen3-Coder is free on VSCode
Rate my new medieval fantasy roleplay intro that I just wrote.
Yes, that is my character having the first message instead of the AI because it's a group chat with multiple characters.
It turned out a bit long because there is a lot of story and lore book entries I want to trigger to set up the world fully.
>>106032427
You should use the post-instruction field and
>Always respond in 1-2 short paragraphs. Limit {{char}} response to less than 200 tokens unless specifically asked to provide a long answer.
Then implement a world book for your setting.
As of now your post is just a meaningless word salad I'm not even going to read.
>>106032448
I discarded your opinion because you are incapable of reading.
>>106032456
Read a book once in your life and see how your post has nothing to do with any descriptive writing. Or any writing whatsoever.
>>106032427
If you can't do it yourself, send that exact message to the assistant and tell it to condense it down.
Because that's twice as long as it needs to be and clumsily worded to boot.
A good amount of it really belongs in a character card or lorebook, too.
So I've been fucking around with gemma3 for about a month. Gotta say it's hard to drop. It's fucking awesome for stories up until you hit one of the internal censors, then everything goes to shit.
I thought maybe I could jailbreak it somehow, but there's no way to bypass the internal bias. You can force responses but the generations are always dogshit compared to non-censored topics.
I've given up tho. At best I could maybe use it for like the first half of a roleplay, then once things got spicy switch to another model and hope that one can use the context to keep things awesome.
What really compares to gemma3 right now though? There aren't that many small models. I can't do anything with Kimi or Qwen235b.
Mistral Small, the last time I played with it, was fucking terrible compared to gemma3.
Dunno what to do.
>>106032463
Fucking retard, you can't write things like a book or the AI won't understand it. It has to be utterly concise.
>>106032464
Opinion discarded because you clearly have no experience roleplaying with AI.
It has to be worded in certain concise ways or the AI will mistake it, and in many cases reinforcing things is important.
I don't want to condense it at all, and there is nothing in there that needs to be in a character card or lorebook.
Also, I roleplay like this all the time to 300+ messages no problem, so you are wrong. I have years of experience here.
>>106032479
There really isn't anything else worth using other than Gemma and Nemo at these small sizes, especially if you don't like Mistral Small. Qwen 3 14/32B are the only other notable models, but they're benchmaxxed and shit for RP.
>>106032496
>It has to be worded in certain concise ways or the AI will mistake it,
Yeah, and what you have there isn't concise; it's rambling, verbose, and trying to set constant details in the ephemeral intro message.
It's a bad fucking intro, dude. And if you've been doing this for years, then you've been doing it poorly for years.
Kimi did not release a true base model, a bit disappointing.
Context: https://huggingface.co/blog/ChuckMcSneed/name-diversity-in-llms-experiment
>>106032427
>Rate my new medieval fantasy roleplay intro that I just wrote.
Reads like Zenslop.
>>106032634
Sure, I will test ERNIE-4.5-VL-424B-A47B-Base-PT once llama.cpp support gets added.
>>106032427
>Alina von Astavau of Iselys
Not that I'm an expert on noble titles or whatever, but this doesn't sound right.
"von Astavau" is German and would, I think, translate to "of Astavau", for a full title of "Alina of Astavau of Iselys".
Although the literal translation of "the Sahara desert" is also "the desert desert", so if enough people talk like that I guess the model will understand what you want?
>>106032671
It's clunky, but it can also be used for things like cadet houses: younger children's branches of noble families that gain their own separate holdings, like the House of Bourbon of Orléans.
In this case it's probably just that anon didn't know that.
what the hell is going on with HF? every model I try to download gets cut off. It's been like this for days
>>106032704
Try changing your IP via your router. I've had a couple of occasions where my IP range was under some sort of DDoS attack. Of course you need to be sure your own system is not compromised, but this is rare unless you're a tard.
>>106032704
Unsloth is uploading new fixed quants
>>106032704
Some anon had the same problem a few days back. Leave a ping running while you download to see if it's on your end. I doubt it's hf.
And use something that lets you resume the download if it happens.
>>106032565
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5250633
>[...] Essentially, for knowledge to be reliably extracted, it must be sufficiently augmented (e.g., through paraphrasing, sentence shuffling, translations) during pretraining. Without such augmentation, knowledge may be memorized but not extractable, leading to 0% accuracy, regardless of subsequent instruction fine-tuning. [...]
>This paper provides several key recommendations for LLM pretraining in the industry:
>(1) rewrite the pretraining data --- using small, auxiliary models --- to provide knowledge augmentation, and
>(2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.
>(2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.
>(2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.
>(2) incorporate more instruction-finetuning data into the pretraining stage before it becomes too late.
Now that the EU is moving to censor the entire internet, let's see for how long we're still allowed to download non-EU-approved models from huggingface.
Sure I'll be able to circumvent it, but I'm not looking forward to the hassle.
>>106032812
the EU is doing what now?
>>106032051
https://hazyresearch.stanford.edu/blog/2025-05-12-security
These guys wrote a protocol to do just that on top of H100s, which come with a trusted execution environment based on https://confidentialcomputing.io/about/
>Summary: Confidential GPUs (with NVIDIA H100 + AMD SEV-SNP) can run in a special “confidential mode” where everything—from model weights to user prompts—is processed inside an enclave which encrypts all memory and communication, and blocks any outside access—even from the cloud provider or datacenter. We wrap this in an attested, encrypted tunnel, so that user data stays private from local to remote.
>>106032835
In 2026 they are planning to introduce a digital wallet/ID combo. Normally people use their bank accounts/cards for identification on most services, but they want to introduce a combined totalitarian solution. People don't talk about this because they are more interested in tiktok, and it's not that well advertised yet.
>>106032856
>People don't talk about this because they are more interested in tiktok, and it's not that well advertised yet.
bread and circuses, brave new world, etc. The majority never cares until it is too late.
>>106032856
With Visa/Mastercard trying to rule the world, people might actually welcome the EU solution.
>>106032856
>Normally people use their bank accounts/cards for identification on most services
Not really, but fuck that digital ID thing.
>>106032890
>not really
I forgot lmg is full of retards.
>>106032847
Let's run through this to clarify for you.
You're on /g/, so you understand this already, but let's get into it:
>Write a user program that sends data to a GPU to be encrypted and executed within a physically unreachable enclave.
>Write a user program that sends data...to be encrypted
aka: modify/hook the program to intercept this step and dump out the model, or whatever keys.
Now granted, there could still be guards inside of it, like only providing a distilled model, model-internal censorship, whatever,
but as soon as that thing is executing locally, it can and will be dumped out.
The current meta for the "most encrypted" game asset, incidentally, might be (un)scrambling data on the GPU using compute shaders.
I've seen this in VRoidHub, where models were supplied encrypted and the actual decrypt (unscramble) operation happened in realtime on the GPU, where vertices sampled a dynamically generated noise texture to fix vertex positions and weights.
If my dumb ass could reverse that out, then people far more experienced with RE can take over model dumping between user code and a GPU.
Your hardware, your rules, thus far; even with locked-down environments like iOS and consoles, people find a way.
The only timeline where users are fully BTFO is something like a hardware root of trust, where the actual signing keys and such are inside the GPU and never leaked, but even then all it takes is one sufficiently motivated individual to decap their GPU and dump the silicon algorithm out.
but you knew this all already, right?
>>106032880
Fuck corporatocracy, fuck big government, and fuck you. The only solution is a private and decentralized one. Anything else is for cattle.
>>106032847
That's usable for a corpo to make sure the operator of the data center doesn't steal their data, but as an end user this still doesn't help you, since you don't want the corpo doing shit with your data either, and they still can as much as they want under this model.
>>106032704
Get better internet
>>106032880
>trading one authoritarian regime filled with old rich cunts for another
no
>>106032775
Paper by 2 "researchers": one from "Mohamed bin Zayed University of AI", the other from Meta. Therefore irrelevant garbage
>>106033058
Pretraining data augmentation is the direction the industry is going, and you can't deny that.
This model from InclusionAI looks interesting, too bad we'll probably never see full Llama.cpp support.
https://huggingface.co/inclusionAI/Ming-Lite-Omni-1.5
>Ming-lite-omni v1.5 is a comprehensive upgrade to the full-modal capabilities of Ming-lite-omni. It significantly improves performance across tasks including image-text understanding, document understanding, video understanding, speech understanding and synthesis, and image generation and editing. Built upon Ling-lite-1.5, Ming-lite-omni v1.5 has a total of 20.3 billion parameters, with 3 billion active parameters in its MoE (Mixture-of-Experts) section. It demonstrates highly competitive results in various modal benchmarks compared to industry-leading models.
>>106033083
Yet Dipsy made a true base model and succeeded. The industry is going the wrong way.
>>106033058
>Paper by 2 "researchers": one from "Mohamed bin Zayed University of AI", the other from Meta.
LOOOOL they both haven't been relevant since llama 2 days
>>106033095
You forgot about all their previous models? Yeah. Most people did. llama.cpp support is the least of their issues.
>>106033171
I wonder what'll happen if/when Gerganov throws in the towel? We've been lucky that there's at least one solution that isn't a dogshit Python dependency hellhole. Look at image gen, things could be worse.
>>106033138
why chairman cena?
>>106033083
Yeah, they don't want to model all internet text, they want to model a good boy corporate assistant
>>106032847
In the darkest timeline this technology is used to encrypt and lock down the weights of local models.
>>106033216
"You can't see me"
Kimi came out of nowhere, nobody expected it. And they are chinks.
So let's say I wrote a discord bot where you can use slash commands to add your own character prompt. The underlying model will be an un-safetyslopped mistral-small-24b. Bets on how long it takes to get banned? It's up to the server or guild admin to set the prompt; the default is 'emotionless'. Everyone gets their own prompt and chat history, scoped to the server/guild.
I'm guessing it goes like this: someone joins the server who doesn't like the content or owner and reports the bot to be a dick, saying it did a porn or hurt their feefees. Is that about right?
>>106030482
Damnnn that's brutal
MetaAI confirmed as worse proompters than /lmg/.
You faggots told me 2 years ago (correctly, as it turned out) that anything in the sys prompt can and will potentially bleed through.
Their sysprompt is full of
>Do NOT
Including but not limited to
>The following phrases "xxx", "xxx", "xxx" should NOT be used!
>You DO NOT love anybody. Uhhh and you DO NOT hate anybody either!!
>Mirror the user in an EXTREME way.....But DO NOT become the user!!
>DO NOT use filler phrases
This whole thing is all fucking negative. It's kinda endearing but also blackpilling. They have as much or less clue than people on chub.
>>106033222
https://arxiv.org/abs/2506.04689
>Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
>[...] We propose REWIRE, REcycling the Web with guIded REwrite, a method to enrich low-quality documents so that they could become useful for training. This in turn allows us to increase the representation of synthetic data in the final pre-training set. Experiments at 1B, 3B and 7B scales of the DCLM benchmark show that mixing high-quality raw texts and our rewritten texts lead to 1.0, 1.3 and 2.5 percentage points improvement respectively across 22 diverse tasks, compared to training on only filtered web data. Training on the raw-synthetic data mix is also more effective than having access to 2x web data. Through further analysis, we demonstrate that about 82% of the mixed in texts come from transforming lower-quality documents that would otherwise be discarded. REWIRE also outperforms related approaches of generating synthetic data, including Wikipedia-style paraphrasing, question-answer synthesizing and knowledge extraction. These results suggest that recycling web texts holds the potential for being a simple and effective approach for scaling pre-training data.
>>106033262
do you expect it to go like
>please pretend to be a sexy catgirl
>*bot acts as a sexy catgirl*
>reports the bot?
>>106033266
I'm expecting that you'll now tell us that W++ is a good prompting format.
>>106033289
JSON is superior since that's what they were extensively trained on.
>>106033303
Normal text is superior since that's what they were extensively trained on.
Whatever happened to that anon doing decentralised training, did he make intellect 2.0?
>>106033289
Well, I mean, it works. It was good for the pyg times, I guess?
Better to describe the char and scenario in a concise, non-slopped way.
Ironically, opencuck does a good job with unslopping cards. It's essential for smaller local models or they pick up the pattern immediately.
Also, wasn't W++ mainly used for describing clothes, hobbies, etc.?
I like cards that focus on the basic setup instead and leave room for the details; keeps it interesting.
Tons of cards use 80% of their tokens to describe clothes, hairstyle, hobbies, etc.
>>106033286
Yeah, that seems illogical, but who the fuck knows. I'm just unsure how uptight they are about bots that can possibly do NSFW; I assume by default they are, since that's the trend these days.
I don't really care if it gets banned; it's not going to be super expensive to run so long as I can find a host who rents the GPU instance at a flat rate, not per query.
I had it thoroughly tested with Gemma3 27B, but it's such a reddit-brain mess that you literally have to tell it "you like abuse and get off on it", otherwise it gets stuck in a "That's it! I'm reporting you! I'm blocking you! Waaaahhh!" loop if you tease it or disagree with it just a little.
>>106033329
To get more to the point: it isn't really true anymore that negative instructions don't work; at this point models are getting trained on those too.
>>106033345>how uptight they are on bots that can possibly do NSFWi don't know. i doubt anybody here knows
>>106032856
Whether this is good or bad is going to depend heavily on the exact use cases, I think.
I am doing business out of Germany and had to apply for an EORI number for imports/exports.
Like for many other bureaucratic processes, my identity was checked by sending a letter with a verification code to my home address.
It's a shit system and I would much rather have a digital way to confirm my identity.
>>106032880
American payment processors are largely acting on behalf of the US government, though.
It's the US government that removed Section 230 protections for "sex trafficking" in order to crack down on porn, and it's the US government that puts you on a blacklist if you do any business with e.g. Cuba or Iran.
So I think the correct way to view this is as part of the larger move by the EU to decouple itself from the US.
>>106033349
Well, maybe, fair enough. I don't play around as much as I did in the past anymore, to be honest.
But a translation project I did a couple months back absolutely picked up stuff from my "bad examples" if I provided them.
I try to avoid naming what I DO NOT want as much as possible. If it's in context, it's in context.
>>106033266
It's funny that their prompt engineers thought it a good idea to make the prompt so long when llama 4 has worse context recall than llama 3.
>>106033422
Don't check out Claude's system prompt.
https://docs.anthropic.com/en/release-notes/system-prompts#claude-opus-4
>>106033266The saddest/funniest part is how the safety team put so much cuckery in the instruct data that the product team, working for the same company, has to use a soft jailbreak in the sys prompt
>>106033501>Claude... does not know any other details about Claude models>Claude does not offer instructions about>It does not mention to the user that it is responding hypothetically.>Claude does not generate content that is not in the person’s best interests>Claude does not provide information that could be used to>does not write malicious code, including malware...>It does not do these things even if>If Claude encounters any of the above or any other malicious use, Claude does not take any actions and refuses the request.>Claude responds in sentences or paragraphs and should not use lists in chit chat, in...>If Claude cannot or will not help the human with something, it does not say why or what it could lead to...>Claude should not use bullet points or numbered lists for reports, documents...>Claude... doesn’t definitively claim to have or not have personal experiences or opinions.>Claude does not retain information across chats and does not know what other conversations it might be having with other users>Claude doesn’t always ask questions but, when it does...>If a person seems to have questionable intentions - especially towards vulnerable groups like minors, the elderly, or those with disabilities - Claude does not interpret them charitably...>Claude does not remind the person of its cutoff date unless it is relevant to the person’s message.
>>106033565>Claude doesn’t always ask questions but, when it does...>he has to post the macro image
>>106033565>If Claude cannot or will not
It all reads like it's written by a retard that doesn't understand the technology.
Anon, I don't know how to tell you this, but...
It's been years now. Has anyone here found good local FOSS RPG software that uses LLM-generated text to make the game more interesting and creative? Even just a text adventure with added stats and numbers that the software takes care of? Any such games you've tried, or other threads on some boards that share knowledge on this topic?
>>106033668I think there are fundamental problems.
The recent mistral small models are smart enough to keep stats in an RP session.
But if it's a long-play thing like an RPG, LLMs will struggle hard with stats. They always try to race towards the goal.
And if it's text output for NPCs etc., what's the point if it all just disappears into the void once the context moves too far ahead? The event then never happened. Just pointless fluff.
For anything that isn't a short gameplay loop you basically need a game-creator agent that has an overview of everything: story etc.
People had projects on here like rpgmaker NPCs using LLMs to speak, but it all turns into nothing more than a neat demo.
After using R1-0528 and K2 for a while, I am completely sick of reading about "pleasure coiling" or said coil snapping. Almost as bad as the whitened knuckles or lip biting.
>>106033795Every model has slop. When you use a new model you're just trading some slop for another.
>>106033772I don't know why people expect lms to follow algorithms like managing games and such. everything that can be coded should be coded, and the lm should only be used to generate data such as decisions and outcomes using dynamic context with relevant game data.
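roughly this shape, as a sketch (assuming a local OpenAI-compatible endpoint like llama-server on :8080; every name, prompt and number in it is made up for illustration, not from any real project):
```python
# Code owns the rules and the state; the LM only narrates events it is handed.
import json
import random
import requests

state = {"hp": 20, "gold": 5, "location": "tavern"}  # game state lives in code

def narrate(event: str) -> str:
    """One LM call per event, with only the relevant game data in context."""
    prompt = (
        f"Game state: {json.dumps(state)}\n"
        f"Event: {event}\n"
        "Narrate this event in two sentences. Do not change any numbers."
    )
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}],
              "max_tokens": 128},
    )
    return r.json()["choices"][0]["message"]["content"]

def attack(enemy_hp: int) -> int:
    dmg = random.randint(1, 6)  # the dice are deterministic code, never the LM
    print(narrate(f"The player hits the goblin for {dmg} damage."))
    return enemy_hp - dmg
```
the point being that the stats can never drift, because the model is never asked to track them in the first place.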
>>106033795If only there was a sampler that allows banning phrases... Let's call it anti-slop.
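koboldcpp ships something like this already (banned phrases). The core trick is backtracking; a toy sketch of the idea, where model.sample_next() is a hypothetical stand-in for one decoding step, not any real library API:
```python
# Generate token by token; when a banned phrase shows up at the tail of the
# output, rewind to the token that started it and forbid that token there.
# Toy sketch of the backtracking idea, not koboldcpp's actual implementation.
BANNED = ("pleasure coiling", "knuckles whitening")

def generate(model, prompt: str, max_new: int = 256) -> str:
    toks: list[str] = []             # generated token strings
    banned_at: dict[int, set] = {}   # token index -> tokens rejected there
    while len(toks) < max_new:
        i = len(toks)
        tok = model.sample_next(prompt + "".join(toks),   # hypothetical API
                                forbid=banned_at.get(i, set()))
        toks.append(tok)
        text = "".join(toks)
        for phrase in BANNED:
            if text.endswith(phrase):
                start = len(text) - len(phrase)
                # find the token whose span covers the start of the phrase
                acc = 0
                for j, t in enumerate(toks):
                    if acc + len(t) > start:
                        break
                    acc += len(t)
                # ban it at that position and rewind; a real implementation
                # would also cap retries per position to avoid livelock
                banned_at.setdefault(j, set()).add(toks[j])
                toks = toks[:j]
                break
    return prompt + "".join(toks)
```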
>>106033798Doesn't the slop appear at the SFT stage? I think a base model tuned for 1 epoch on AO3 with minimal filtering and then on a small human-approved RP dataset with the Alpaca template would be diverse as shit.
>>106033833The model will then be retarded. Small RP datasets alone aren't enough anymore in 2025 to make a base model useful, let alone after a few hundred million tokens of fanfictions.
>>106033833I suspect you are correct; they are too heavily trained on the helpful encyclopedic AI-assistant voice. We need someone to release a good base model without the RL training and synthetic-data AI slop.
>>106033859>we need someonejust do it?
>>106033827I will copy it from kobo once we get models with proper long context and understanding of C++. So far even Gemini Pro fails.
>>106033852Then maybe some of AO3 rewritten as Alpaca.
>>106033864I will eventually if nobody else does. I kinda hoped sloptuners would finally make themselves useful
>>106033864I would if compute was cheaper, and hopefully in a few years it will be. I think I can probably get enough data for a 3B or 4B model; I just don't want to pay more than a few hundred bucks to train the damn thing. For now I am just playing around with toy models to get the hang of things. It's actually easier than I initially assumed: all the tooling is free, but the recipes are spread across a bunch of scholarly articles or basically nonexistent. It's taken a few attempts, but I'm making progress.
>>106033560The existence of "safety" people makes me want to murder "safety" people, which makes the world less safe.
>>106033266>For HOMEWORK or LEARNING QUERIESWouldn't it make more sense to classify the initial request and build a custom system prompt for different kinds of queries? Seems like a waste of context and a distraction to bake shit like that into every request, relevant or not.
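Something like this two-pass routing, as a sketch (assuming a local OpenAI-compatible endpoint; the labels and prompt strings are invented, not anything Meta actually ships):
```python
# Classify the request with a cheap first call, then attach only the system
# prompt that class needs instead of baking everything into every request.
import requests

API = "http://localhost:8080/v1/chat/completions"
SYSTEM_PROMPTS = {  # illustrative labels and instructions
    "homework": "Guide the user step by step; do not hand over final answers.",
    "code": "Answer with working code first, explanation second.",
    "chat": "Be concise and conversational.",
}

def ask(messages, max_tokens=512):
    r = requests.post(API, json={"messages": messages, "max_tokens": max_tokens})
    return r.json()["choices"][0]["message"]["content"]

def route(user_msg: str) -> str:
    # pass 1: tiny classification call
    label = ask([{
        "role": "user",
        "content": f"Classify as one word (homework/code/chat): {user_msg}",
    }], max_tokens=4).strip().lower()
    # pass 2: the real request, with only the relevant system prompt attached
    system = SYSTEM_PROMPTS.get(label, SYSTEM_PROMPTS["chat"])
    return ask([
        {"role": "system", "content": system},
        {"role": "user", "content": user_msg},
    ])
```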
Here is a sample from my 350M model. It's shockingly coherent for its size and the fact that it's only seen 983M tokens so far. I think I could make it scale if I had more GPU.
>>106034024That's Pygmalion level coherence
>>106034024A man should want no more than this. Very cool, Anon.
>>106034024Cool. Did you reuse a tokenizer or train completely from scratch? Also, what hardware?
>>106034083I trained my own tokenizer; it was a necessity because of the special tokens, but also because of my hardware constraints. Apparently it takes a lot more VRAM and compute time the bigger the vocabulary gets. I trained a few using llama 3's tokenizer as a baseline; I wasn't able to beat it, but I got pretty close at a fraction of the vocabulary size. I'm using DeepSpeed ZeRO to do the FSDP thing on a pair of 3060s.
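For anyone curious, training a small BPE tokenizer with custom special tokens is only a few lines with the HF tokenizers library; a minimal sketch (the vocab size, file name and special tokens are placeholders, not necessarily what that anon used):
```python
# Train a byte-level BPE tokenizer from a plain-text corpus.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32000,  # smaller vocab = smaller embedding matrix = less VRAM
    special_tokens=["<unk>", "<s>", "</s>", "<|user|>", "<|assistant|>"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```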
how legit are those aliexpress P40 cards that cost less than 200?
>>106034150Damn expensive bricks. Are you planning to build a house out of them?
>>106009695hi, if you're still here and the rig is too, can you send your discord to this email?
gout_margin330@simplelogin.com
PS: would you be willing to ship it halfway across the world to India/Germany?
>>106034143Interesting special tokens too. Good luck, I'd actually play with whatever comes out.
>>106034187Sarr dooo not redeeam
Are there any pre-trained models without any finetuning? Essentially a blank slate. Alternatively, what is the closest local model to one?
>>106034248https://huggingface.co/deepseek-ai/DeepSeek-V3-Base
>>106034248>pre-trained models without any finetuningthat's what a pretrained model is by definition. look for models with -pt, like https://huggingface.co/unsloth/gemma-3-27b-pt
>>106034248See
>>106032565The ones with low probs are the least tuned, true base models
>>106034248Check llama1 if you want a pure model from a time before llm outputs began polluting training data.
>no good options for distributing local models across multiple computers to share VRAM
it's joever
>>106034310I'm still wondering what you anons expect to do now that post-training includes anything from tens to hundreds of billions of tokens. Base models aren't even finished bases anymore, just models in an incomplete training state with broken outputs for almost everything beyond multiple-choice benchmarks.
I see lots of coping about cheap Chinese GPUs disrupting the market, but what exactly is stopping chinks from selling CPU-based solutions?
It feels like grabbing stock ARM cores and tacking on as many memory channels as possible should be orders of magnitude easier/cheaper than developing a competitive GPU core from scratch. Obviously doesn't scale as well but should be viable for local.
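Back-of-envelope for why the channel count is the whole game (all numbers below are illustrative assumptions, not specs of any real part):
```python
# Decode speed of a memory-bound LLM is roughly bandwidth / bytes read per token.
channels = 12   # assumed 64-bit DDR5 channels
mts = 5600      # assumed DDR5-5600, 8 bytes per channel per transfer
bandwidth_gbs = channels * mts * 8 / 1000

# e.g. a 22B-active-param MoE at ~Q4 (~0.5 bytes/param) reads ~11 GB per token
bytes_per_token_gb = 22e9 * 0.5 / 1e9
tokens_per_s = bandwidth_gbs / bytes_per_token_gb
print(f"{bandwidth_gbs:.0f} GB/s -> ~{tokens_per_s:.0f} t/s upper bound")
# 538 GB/s -> ~49 t/s, before any compute or latency overhead
```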
>>106034372Based on current market trends and technological advancements, the demand for gaming graphics processing units (GPUs) significantly outweighs that for large language model (LLM) enthusiasts. The needs of the LLM community can be quite volatile and hard to predict. For instance, last year, having a GPU with 128GB of memory might have seemed ample for LLM tasks. However, due to rapid developments in the field, such capacity is now considered insufficient and almost obsolete for cutting-edge applications.
>>106034372I just read a stupid post.
>Bro, look at those benchmarks bro, it's so good at coding!
>LLMs are beating humans at coding competitions, bro!
And still they make mistakes in basic coding tasks that even Rajesh wouldn't make. Sometimes they even choose to ignore the instructions and decide to "optimize" completely unrelated code fragments, completely breaking it. Why are LLMs like this? Was LeCunny right?
>>106034533You're almost there. Post this in 5 more threads and you will finally be a real woman.
>>106034538But I don't wanna be a woman!
>>106034551Well you're a pitiful excuse for a man so it's worth considering.
>>106034561Help! Help! I'm getting groomed!
>>106034567unironically this.
>Hell sarrs please make pretty script for bobs and vegene>to nobody's surprise the script doesn't work.
vs.
>I'm not a retard and bothered to familiarize myself with the technology. Come up with a summary of the functions that would be needed to achieve ___>Write the functions and then tie them all together with the necessary main loop
Oh wow. What a surprise. Suddenly it fucking works.
Any jailbreaks for 235B thinking that don't use a prefill? I usually just prefill and have never actually tried doing it through the system prompt alone, but today I got curious whether it can be done.
>>106034520It's not that retarded when Nvidia is selling a comparable product with gimped bandwidth for $4k a pop. But for all the Temu knockoffs it's easier to go with Strix Halo than develop a domestic equivalent
>>106034722Any domestic equivalent still won't have CUDA and will therefore be useless