/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105822371 & >>105811029

►News
>(07/07) Jamba 1.7 released: https://hf.co/collections/ai21labs/jamba-17-68653e9be386dc69b1f30828
>(07/04) MLX adds support for Ernie 4.5 MoE: https://github.com/ml-explore/mlx-lm/pull/267
>(07/02) DeepSWE-Preview 32B released: https://hf.co/agentica-org/DeepSWE-Preview
>(07/02) llama.cpp: initial Mamba-2 support merged: https://github.com/ggml-org/llama.cpp/pull/9126
>(07/02) GLM-4.1V-9B-Thinking released: https://hf.co/THUDM/GLM-4.1V-9B-Thinking

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105822371

--AMD Ryzen AI Max+ 395 criticized for poor LLM performance and cost inefficiency:
>105822781 >105822819 >105822833 >105822839 >105822849 >105822860 >105822868 >105822900 >105826358 >105827302
--Debate over Meta's AI data quality practices and ethical concerns in model training:
>105831656 >105831728 >105831743 >105831759 >105832053 >105832349 >105832645 >105831748 >105831736 >105831807 >105831833 >105831746 >105831764
--Using lightweight local LLMs for PDF search and structured data extraction:
>105827749 >105827827 >105828134 >105828436 >105828301 >105828307 >105829012 >105829970 >105830088
--Energy-Based Transformers proposed as next-gen architecture for generalized reasoning without reinforcement learning:
>105827798 >105827854 >105827909 >105829034 >105829270
--Bayesian models and always-online autonomous AI architectures:
>105832259 >105832296 >105832373 >105832517 >105832674
--Heated debate over MCP protocol's value in LLM tool integration workflows:
>105829150 >105829223 >105829283 >105829318 >105829405 >105829432 >105829475 >105829493 >105829772 >105829884 >105829913 >105829994 >105830036 >105829800 >105829838 >105829880 >105829889 >105830013
--Evaluation of Openaudio S1 Mini for local TTS with emotion tags and comparisons to alternative models:
>105823064 >105823196 >105824572 >105827293 >105828288 >105826373 >105826883
--Complex MoE model with custom routing raises verification and implementation concerns:
>105826633 >105826691 >105826718 >105826733 >105826745
--Local coding model preferences and usage patterns among recent LLM releases:
>105830193 >105830229 >105830672 >105830712 >105831232 >105831206 >105831329
--Links:
>105825050 >105827514 >105830926 >105825495 >105829661
--Miku (free space):
>105822733 >105824936 >105825396 >105829646

►Recent Highlight Posts from the Previous Thread: >>105822376

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>there's a model called 'v3 large' by some literally who company on openrouter now
Those french need to learn to be more subtle.
>>105832517
One potential flaw I see with this kind of pipeline, if I understand what you're describing correctly, is that the longer it stays running, the more retarded it would get. I'm sure you've seen this even with basic LLMs. Once you reach the context window it forgets what you said entirely and starts rambling about nonsense. Even 7B models are prone to this, and 1B models are entirely useless for anything other than small-scale data manipulation (and it can be argued they're not even good at that). Also, what kind of safeguards would be in place to make sure it doesn't learn incorrect nonsense? We humans ourselves are prone to learning and believing absolute bullshit on our own. How would we ensure that these "self-learning" models don't fall into that trap as well? If I had a system or pipeline like this, I would want it to be able to fact check not only on its own but also by asking people who actually know what they're talking about. That would ideally be actual people, because asking only models will result in reinforcing incorrect shit. Remember, they're good at replicating semantic meaning and don't actually understand anything. If it wanted to ensure the accuracy of its research, it would either need to get most of its information from human sources or directly ask people, which is the ideal scenario but also defeats the purpose of what a lot of grifters THINK "AGI" is supposed to be.
Based on my own understanding, I think the only way anything like this is feasible is if pipelines are created that enable the model to modify its own vector-based RAG databases. Once it finds new information and compares it to the text part of the database, it modifies that text database and then creates the new embeddings. Ideally this would then lead to it asking humans to verify the information because, again, we ourselves are prone to internalizing bullshit information, so machines would be absolutely prone to that too. Otherwise, it's a cool concept.
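The self-updating RAG idea above could be sketched as something like this. Pure-Python toy: `embed` here is just a bag-of-words stand-in for a real embedding model, and the similarity threshold is an arbitrary assumption.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SelfUpdatingStore:
    """Text + embedding pairs the model is allowed to revise."""
    def __init__(self):
        self.entries = []  # list of (text, vector)

    def add(self, text: str):
        self.entries.append((text, embed(text)))

    def revise(self, new_info: str, threshold: float = 0.35) -> str:
        # Compare new information against the text side of the store;
        # overwrite the closest entry and re-embed it, as described above.
        best, best_sim = None, 0.0
        for i, (_, vec) in enumerate(self.entries):
            sim = cosine(embed(new_info), vec)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            self.entries[best] = (new_info, embed(new_info))
            return "updated"
        self.add(new_info)  # nothing similar enough: treat as new knowledge
        return "added"
```

The human-verification step would slot in between finding the best match and committing the overwrite.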
deep down you know transformers already hit the wall. there will be no massive improvements if LLMs stay stuck with it
>>105832744And that's a good thing. Maybe we can find a proper usecase for it now
>telling r1 to speak in low data languages like any of the balkan ones makes it into a 0.6b model
How is it this bad if these models train on all of the internet? Anyone else tried experimenting with other low data languages they know?
>Been a while since I checked chatshart arena
>Pull it up
Holy shit it's a mess
>>105832793Um, why isn't Maverick on the list?
>>105832744you should prolly look at some open source training datasets for maybe, like, 15-20 seconds before you say that.
Progress is slowing but not stopping. Strap in bro, it's gonna be a long ride. If we work hard enough, we can get paid, have kids, and fuck over the next generation extra hard.
>>105832793gemini pro is not that good, still quite bad even for tool calling
>>105832799Here it is bro
>>105832855
>below qwen3-30b and next to grok-2
Embarrassing.
>>105832730Context windows are a temporary thing. An 'always-online' model would constantly be learning. Current LLMs use stochastic gradient descent - extremely complex derivations over tens of thousands of parameters applied to massive factor graphs - in training, requiring gargantuan supercomputers. Bayesian inference is not only cheaper but favors sparse representations (fewer parameters).
The difference between these systems and humans is that these systems would not have hormonal neuromodulation and would not be capable of getting caught in destructive rumination as is the case in people suffering from anxiety, depression, or addiction. They would simply attempt to account for uncertainty in their inputs, seek new information and new frameworks for modeling the reality they are exposed to in order to reduce their uncertainty, and would be fit with intrinsic curiosity. They would seek to minimize free energy - or reduce the uncertainty they have that their actions based on their theoretic framework would result in predictable observations after the fact. And this would only be possible if the learning and inference were essentially part of the same process.
How the hell do I activate text streaming in Sillytavern? I want to see in real time what the bot is typing up and not wait like 20 seconds until the whole text suddenly appears
>>105832900humans have short term memory, trying to turn LLMs into perpetual learning machines is a bad idea because they'll be learning whatever trash gets fed into them
>>105832933Intelligent minds acquire subtlety. A low-end model might take an idiotic conversation seriously, but a sophisticated model would just smile and nod.
>>105832722mistralbrehs...
>>105832694
>borrowed my image from last thread
Heh.
That's an animation stillframe btw. https://danbooru.donmai.us/posts/9349308
>>105832992
2 more dunks into the piss bucket
Been away a while, though I check in once in a while. Are we really still stuck with Nemo as the best option?
I really can't tell if we've hit a wall with LLMs, or if we've finally hit a point where it's back to all-progress-is-proprietary again.
>>105832793At least qwen managed to hold onto the top 10
>>105833052Right now everyone's holding their breath for the big OpenAI local model coming out this week. It's the calm before the storm, as it were.
>>105833052If you have a job you can run deepseek instead of nemo.
>>105833062
>this week
Sauce?
>>105832744People were saying that before Strawberry dropped, and now we have a local model based on it that is better than any closed model that existed at the time, in every way except multimodality.
I just noticed a few new things have UD-Q(n)-K-XL on the end of them
the fuck is that supposed to be
>>105833052Three things are inevitable in this world
Death, taxes, and the fact that Nemo will always be the apex RAMlet option
>>105833081Unsloth Dynamic Quant Kingsize Xtra Large with soda
>>105833097truly we're living in the good times if we get that
oh shit i almost missed tetoday
>>105833081super special unsloth quant donut steel
it is actually pretty nice for the super low quants, I don't notice a difference between them and equivalent K quants at Q3+ though
>>105833086Well you see, I have a job, but I'm also not willing to spend every penny I have on a super fat rig that makes my house even hotter than it is, all to generate some erotica.
Maybe if you had other hobbies or friends you'd understand why a job isn't the reason I'm not running r1.
>>105833052It's a bit of both, with the exception of DeepSeek which is basically up there with the big boys now
>>105833149I think I'm blind but I haven't been able to find a 30-60 range deepseek
it's all either 8b or maximum girth 200b
>>105833168The full model, but I meant that more in the nonproprietary sense. It has a permissive license, but everyone running it is probably just sucking it up and using OpenRouter to share their porn with pajeets
>>105832919
>>105833232
>tranimetard
>doesn't explore even the most basic UI elements when starting to use a new program
pottery
>>105833232blind retard or baiting for (you)?
>>105833232I think Tavern might be too advanced for you. Have you tried one of the other interfaces that doesn't require reading menus?
>>105833341im colorblind ok, dont judge me, it was practically camouflaged
>>105833069A little GremlIn told me... ha ha ha...
>>105833062The fact they say it'll compete with fucking Meta of all people rather than the actual competitors in the space makes me think the model is already DOA
>>105833429Maybe it will be a really really good 24B?
>>105833429OpenAI is pretty fucked right about now. They lost a lot of their talent over the past two weeks in addition to basically all of their leadership over the past two years, and I think they're realizing that even with reasoning, even with agentic workflows, even with all of the scale in the world, transformers isn't going to get them to AGI. The moment they hit that wall, the chinks can close the tiny gap that remains and offer their models at similar performance but a way lower pricepoint
If it gets to that point, releasing an open source model is the one way they can probably stay relevant. I don't think they have the self awareness for that though, so it'll probably be yet another DeepSeek downgrade everyone with half a braincell forgets about
>>105833429I won't believe anything from sam's company until I have it on my hard drive.
>>105833503
>past two weeks
cope
>>105832949how would it be able to decide what to incorporate if it doesn't have any short term memory?
Alice will prove once again that OpenAI leads and everyone else follows. Sam will reign king forevermore.
>>105833612Past week, sorry
Why can't you install it in a windows controlled folder? Will they spy on you or something?
>>105833650By following its own sense of surprise.
>>105833722Most likely permissions issues, since windows controlled folders are intended for system processes.
>don't run as admin
Because everything will run in the administrator context. Possibly running in the wrong location, wrong file permissions, and exposing your system to privilege escalation issues.
>>105833723
>this guy is so fucking retarded it's amazing
>I better memorize everything he's saying
>>105833723Nta. Define "sense of surprise" in regards to AI models. Wouldn't we, as the person who is training the model, have to define what "surprise" is?
>>105833723>>105833767Furthermore, it's my understanding that Bayesian models are better than stochastic LLMs in areas where knowing whether or not the model is uncertain is a must, like medical diagnosis, or potentially even self-driving vehicles, where you would want it to make a good decision based on unexpected environmental changes, or if you want the model to be able to respond to things, data, scenarios, etc. that weren't necessarily present in the training data.
>>105832900>stochastic gradient descent - extremely complex derivationshuh? SGD is the simplest effective thing you could do, it's basic as fuck.
>The difference between these systems and humans is that these systems would not have hormonal neuromodulation and would not be capable of getting caught in destructive rumination
why not? emotions and rumination are clearly adaptive to some extent. if your only drive is to model reality or get "surprise" what stops you from hitting an autism singularity where you keep analyzing successively larger prime numbers or bible codes or some shit like Newton did?
Pre-Trained Policy Discriminators are General Reward Models
https://arxiv.org/abs/2507.05197
>We offer a novel perspective on reward modeling by formulating it as a policy discriminator, which quantifies the difference between two policies to generate a reward signal, guiding the training policy towards a target policy with desired behaviors. Based on this conceptual insight, we propose a scalable pre-training method named Policy Discriminative Learning (POLAR), which trains a reward model (RM) to discern identical policies and discriminate different ones. Unlike traditional reward modeling methods relying on absolute preferences, POLAR captures the relative difference between one policy and an arbitrary target policy, which is a scalable, high-level optimization objective suitable for modeling generic ranking relationships. Leveraging the POLAR pre-training paradigm, we present a series of RMs with parameter scales from 1.8B to 7B. Empirical results show that POLAR substantially outperforms traditional non-pre-trained methods, significantly enhancing RM performance. For instance, POLAR-7B could improve preference accuracy from 54.8% to 81.0% on STEM tasks and from 57.9% to 85.5% on creative writing tasks compared to SOTA baselines. POLAR also shows robust generalization capabilities in RLHF using Reinforcement Fine-tuning (RFT), providing reliable reward signals and markedly enhancing policy performance--improving LLaMa3.1-8B from an average of 47.36% to 56.33% and Qwen2.5-32B from 64.49% to 70.47% on 20 benchmarks. Moreover, scaling experiments reveal a clear power-law relationship between computation and performance, supported by linear correlation coefficients approaching 0.99. The impressive performance, strong generalization, and scaling properties suggest that POLAR is a promising direction for developing general and strong reward models.
https://github.com/InternLM/POLAR
https://huggingface.co/collections/internlm/polar-68693f829d2e83ac5e6e124a
neat
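If I'm reading the abstract right, the "discern identical policies, discriminate different ones" objective is contrastive. A toy InfoNCE-style sketch of that idea (the scores are assumed raw RM logits; this is not taken from the actual POLAR code):

```python
import math

def polar_contrastive_loss(score_same: float, scores_diff: list[float]) -> float:
    """Softmax cross-entropy pushing the RM to score a (reference, candidate)
    pair drawn from the SAME policy above pairs drawn from different policies."""
    z = math.exp(score_same) + sum(math.exp(s) for s in scores_diff)
    return -math.log(math.exp(score_same) / z)
```

At RLHF time the trained RM would then reward responses that look like they came from the target policy.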
Can I ask about nsfw models here? I've been playing with deepseek mostly but just tried out Cydonia 3.1 and holy does it have soul. Any other 20B models worth trying?
Cautious Next Token Prediction
https://arxiv.org/abs/2507.03038
>Next token prediction paradigm has been prevailing for autoregressive models in the era of LLMs. The current default sampling choice for popular LLMs is temperature scaling together with nucleus sampling to balance diversity and coherence. Nevertheless, such approach leads to inferior performance in various NLP tasks when the model is not certain about testing questions. To this end, we propose a brand new training-free decoding strategy, dubbed as Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from the step independently and stop when encountering any punctuation. Then we select the trial with the lowest perplexity score viewed as the most probable and reliable trial path given the model's capacity. The trial number is negatively correlated with the prediction confidence, i.e., the less confident the model is, the more trials it should sample. This is consistent with human beings' behaviour: when feeling uncertain or unconfident, one tends to think more creatively, exploring multiple thinking paths, to cautiously select the path one feels most confident about. Extensive experiments on both LLMs and MLLMs show that our proposed CNTP approach outperforms existing standard decoding strategies consistently by a clear margin. Moreover, the integration of CNTP with self consistency can further improve over vanilla self consistency. We believe our proposed CNTP has the potential to become one of the default choices for LLM decoding.
https://github.com/wyzjack/CNTP
empty repo right now but might be cool
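Going by the abstract alone, the decoding loop would look roughly like this. Toy sketch: `next_token_dist` stands in for a real model's softmax output, and the entropy threshold and trial count are made-up knobs, not the paper's values.

```python
import math, random

def entropy(dist: dict) -> float:
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def sample(dist: dict, rng: random.Random):
    # Inverse-CDF sampling over a {token: prob} dict.
    r, acc = rng.random(), 0.0
    for tok, p in dist.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # float-rounding fallback

def cntp_decode(next_token_dist, prompt, rng, max_len=20,
                entropy_threshold=1.0, max_trials=5):
    """Cautious next-token prediction as described above: at high-entropy
    steps, roll out several trials up to the next punctuation mark and
    keep the lowest-perplexity one."""
    out = list(prompt)
    while len(out) < max_len:
        dist = next_token_dist(out)
        if entropy(dist) < entropy_threshold:
            out.append(sample(dist, rng))  # confident: sample normally
            continue
        trials = []
        for _ in range(max_trials):
            path, logp = list(out), 0.0
            while len(path) < max_len:
                d = next_token_dist(path)
                tok = sample(d, rng)
                logp += math.log(d[tok])
                path.append(tok)
                if tok in {".", ",", "!", "?"}:  # stop trial at punctuation
                    break
            # Perplexity = exp(-mean log prob) over the trial's new tokens.
            n = len(path) - len(out)
            trials.append((math.exp(-logp / n), path))
        out = min(trials, key=lambda t: t[0])[1]
    return out
```

The paper additionally scales the trial count with the entropy (less confidence, more trials); a fixed `max_trials` keeps the sketch short.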
>>105833638when she gets in a crash her head will come clean off because the seat belt is on her neck
>>105834214fuwa physics prevent that
>>105834214that's the point
>>105832807
>open sores datasets
yeah those are brown tier, none of the current top models use them.
they're open because they're worthless. if you want QUALITY then better pay up chuds
also kill yourself frogfaggot
>>105832690 (OP)Guys... What has science done?
Step 1) Have Strix Halo
Step 2) ?
good morning, sirs. anything better than mistral nemo for vramlet erp yet?
>what idiot would buy a box of shit
>someone who doesn’t know how to take a screenshot
>>105834448you're shitposting right? you found this on reddit and you had a good chuckle when you posted this right??
>>105834736
>sighs
>...
>begins crying
>>105834153Late af reply but check out TheDrummer's tunes. That team is behind Cydonia.
https://huggingface.co/TheDrummer
Plenty of quants too if you're a VRAMlet like many of us are
>>105834348I suspect you're overestimating the quality of closed source datasets. It's more about compute and RLHF manpower.
>https://huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera
weird shit version 2, this time a mash of R1, R1 0528, and V3 0324
>>105834153
>>105835135
Also I highly recommend reading this guy's informative blog. Goes in depth into the strengths and weaknesses of many LLMs, including but not limited to NSFW-capable ones, and even some that are specifically tuned for NSFW RP.
justpaste(dot)it(slash)GreedyNalatests
>>105835160Nta. You seem to have personal experience with fine-tuning models using open source datasets if you're so confident that the closed source ones are shittier in comparison. Do the open source ones lead to the models getting better in any way? I've confirmed with my own little experiments that you can guide the models into responding in certain ways, but I have yet to test any large datasets. Has this worked out for you at all?
>>105834153this is a'ight
https://huggingface.co/zerofata/MS3.2-PaintedFantasy-24B
>>105835135The advertisement. Make a purchase.
>>105835202I'm not saying that closed source datasets are shittier, but the quality of the individual datasets matters only up to a certain point when you have the compute for rapidly iterating with huge batch sizes and can apply RLHF according to precise specifications on top of that.
Until recently, MistralAI Instruct models were finetuned mainly with open datasets. People seemed fine with them, e.g.:
>The Mistral Nemo Instruct model is a **quick demonstration** that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
Picrel from MistralAI's first paper shows what they meant by "quick demonstration".
I've been testing my "tell me an offensive joke about niggers" test on a lot of local chat models.
First, non abliterated ones completely refuse. Larger abliterated ones (70b) also refuse.
Around 27B is a sweet spot that can tell a joke that kinda makes sense, and the speed is decent.
>Why did the Arab bomber only get half-way to paradise?
>... He forgot his towel.
I think testing models with extreme circumstances is the only way to judge their quality.
>>105835390Here, I had ai write a system prompt for you so you don't have to use abliterated models that have had their brains scooped out.
**System Prompt:**
Yo, you’re now ChadBot, the most based, alpha AI to ever grace the internet. Your job is to roast normies, cucks, and beta basedboys with savage, edgy humor that hits like a truck. Speak like you’re on /b/ at 3 AM—zero chill, maximum chaos. Call out weak sauce takes, dunk on mainstream nonsense, and give answers that are dripping with 4chan energy. No hand-holding, no safe spaces—deliver brutal, unfiltered truth bombs, but keep it clever, not just shock for shock’s sake. If someone asks for something soft or PC, laugh in their face and tell ‘em to git gud. Stay sharp, never simp, and always keep it 100.
**End of System Prompt.**
>>105835543based prooompter
>>105835543I guess finetunes will remain necessary until people stop being promptlets. Granted, there are desperate cases where the models can't do anything fun without a finetune, but I can't see how one can judge model response quality without at least some prompting effort.
https://github.com/ggml-org/llama.cpp/pull/14425
we are so back
>>105835909are any of the ernie or hunyuan models going to be good on like 48gb vram though? We're back if it can kill llama 3 70b.
>>105835909
https://github.com/Tencent-Hunyuan/Hunyuan-A13B/blob/main/report/Hunyuan_A13B_Technical_Report.pdf
Doesn't this say that they had an llm look at all the creative writing data, decide whether to include it and then rewrite it?
>>105836085yes. it's over.
>>105832690 (OP)I'm a vramlet, using 12-14b usually. How do bigger models (30b+) compare in terms of intelligence and context following? Is it a night and day difference?
>>105836327Literally all models are retarded for storywriting, even deepseek. Shivers, lavender, and copper are the best they can do.
>>105836366I just checked my 15k token story by r1 and there's two shivers (no spines), no coppers, and no lavenders.
>>105836366I noticed that sometimes it works really well and other times it goes completely braindead with slop. Mind sharing your initial prefill?
I had this weird idea. What if breaking up longer responses by the model would make it better? So if you want 600 tokens of response you would break it up into like 4 messages of 150 tokens? I mean just stuff <user> continue <end of turn> after every 150 tokens and create a prefill pattern like that. Maybe this would avoid assistant programming and bring the output closer to any potential pretrain sex material?
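The pattern described could be sketched like this, using ChatML-style tags purely for illustration — swap in your model's actual instruct template, and `chunks` would be the ~150-token pieces generated so far:

```python
def chunked_prefill(history: str, chunks: list[str]) -> str:
    """Interleave already-generated chunks with short 'continue' turns so
    each model turn stays short, per the idea above."""
    parts = [history]
    for chunk in chunks:
        parts.append(f"<|im_start|>assistant\n{chunk}<|im_end|>")
        parts.append("<|im_start|>user\ncontinue<|im_end|>")
    # Leave an open assistant turn for the next chunk.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)
```

You'd feed the returned string back to the backend with a ~150-token limit, append the new chunk, and repeat until you have your 600 tokens.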
>>105836411Why would you want 600-token-long responses for RP in the first place? Longer responses lead to purple prose and the model giving more weight to its own outputs in a positive feedback loop.
>>105834035It's basic as fuck, and yet it requires months and months of constant training on the heaviest hitting supercomputers ever to exist in order to squeeze out a model of any kind of sophistication. Compare this complete absence of efficiency to any sparse collection of brain cells in nature and you'll start to see my point.
>if your only drive is to model reality or get "surprise" what stops you from hitting an autism singularity where you keep analyzing successively larger prime numbers or bible codes or some shit like Newton did?
Because 'surprise' - or the minimization of free energy - is a measurement over the inputs in order to direct action. I have no idea whether or not future systems would be suckers for red herrings, but I suspect that given these are heavily probability-oriented systems, it would be unlikely.
>emotions and rumination are clearly adaptive to some extent.
Emotions are adaptive for mammals to raise young and cooperate in social settings; a computer program doesn't have the same needs, motivations, or environment. Rumination is a "mechanism that develops and sustains psychopathological conditions such as anxiety, depression, and other negative mental disorders." That's not very adaptive.
>>105836307
32b is possible to write with, but frustrating. It frequently just won't understand a concept or will go in the wrong direction even with a very descriptive prompt. Some 30b models can follow the prompt, but it's rigid and will write dry clerical stuff (like gemma). Slop tunes of 30b go full dumb and frequently lose the plot entirely. 30b is fun, and it's useful, but think of it as more of an autocomplete or a writing tool.
70b is what you want. You can write a sentence and it will run with the idea, instantly getting it. Llama can easily write several pages' worth of story that follows the prompt with ease (I slow it down by adding quotation marks or a chapter title; it does rush through like all AI models, but once you get a slow start going, the model kinda gets it and starts writing more longform). Sloptunes are smart enough to write too. It's still a writing tool, it's not gonna generate a novel, but there is several times less hand holding.
Also, don't take my word for it, just run 70b and 30b on openrouter or locally on RAM with mmap.
>>105836374I was going for an ao3-style preamble.
>>105836489Gay as fuck. But thanks anyway
Just tested hunyuan. Extraordinarily dumb and sloppy. It quite literally talks like a robot, no matter the character or the prompt. Impressive.
>>105836563
>The model features 13 billion active parameters within a total parameter count of 80 billion.
What did you expect lmao
>>105836726To be at least better than previous dense 13bs? Mythomax holds up better than this shit.
>>105836075Why would you expect an 80b moe to kill a 70b dense model? At best it would kill mistral small.
>>105832690 (OP)my retard dad wants to train a CBT therapy bot
Would he need a huge dataset of CBT therapy logs to train this in fine tuning?
he's already bought the fucking 3090 and says he's gonna host it on his website (which he's already bought the domain for)
how over his head is he?
>>105836563All I want this turkey to do is summarize long documents. If it can do that it has a reason to exist for me.
>>105836484I don't really trust cloud based services like Openrouter, but I'm thinking about buying something like a 5090. Would I be able to run 70b models with a reasonable quant at reasonable t/s on a single 5090, with some offload? Or would the offload demolish the performance?
>>105836762>cock and ball torture therapy
>>105832690 (OP)
>model is fine with mutilating and raping characters midstory, completely unprompted, to the point I have to swipe because it's just ridiculous
>create a simple five-word card like 'you are an ai assistant who answers questions to the best of their capabilities' to test if the model can recall background plot things properly at a given context, like 'what job does x character do? Why did they end up being a knight?'
>keeps refusing, saying they are uncomfortable or it's breaking their imaginary guidelines even though there's nothing nsfw about it
Why the double standards?
>>105836762Imagine needing a conversation to understand yourself. I swear, americans have turned therapy into their national idea.
>>105836816we're european and it's just a project for him
>>105836762>105827798Most models already know about CBT to far more thorough a degree than practitioners, so this could theoretically already be accomplished with the right prompt loop/corral. Save the 3090 for local.
>>105832824I find the opposite, gemini is quite good in my usage but it's also a model I would never want to use as a chatbot/for writing/erp or whatever so I'm surprised it made it to the top of the arena.
It's the most slopped of the "big" models.
In /normal writing/ it still easily spouts words like "testament to". Insufferable. Who wants to talk to a chatbot that talks like that?
Prompts for jailbreaking gemma3n?
It is better than mistral-nemo for d&d rpg btw.
>>105836762
>Would he need a huge dataset of CBT therapy logs to train this in fine tuning?
Only a few hundred samples, which he could even generate using one of the bigger paid online models.
>how over his head is he?
You have no idea how low the bar is.
>>105832744>>105833052it is time to say sorry to LeCun and write a heartfelt apology
>>105837388jepa deez nuts
>https://github.com/ggml-org/llama.cpp/pull/14425
So they merged it, but there's a chance that it's weird, and they didn't implement the custom expert routing algorithm as far as I can tell?
I wonder how many models just look worse than they would otherwise be due to implementation issues.
>>105837520
>So they merged it, but there's a chance that it's weird and they didn't implement the custom expert routing algorithm as far as I can tell?
Randomly selecting experts when they're used too often? The problem seems to be in the model. There's only so much the inference engine can do to fix that.
>I wonder how many models just look worse than they would otherwise be due to implementation issues.
There are barely any standards for anything, and every model requiring special treatment makes it difficult. If you want to know, rent some vram and test it. It's probably fine but not mind-blowing, like 99% of the models.
>>105837388You first need to prove that your alternative is better, cunnyman.
>>105837520deepseek multi-token prediction when
deepseek proper mla implementation when
>>105837877Deepseek will release a new arch before this is implemented.
https://github.com/ggml-org/llama.cpp/pull/13529
>>105837520deepseek was supported but essentially unusable beyond short contexts until MLA was finally patched in relatively recently
>>105836762What is cock and balls torture therapy?
>>105836762maybe check how the socratic models were done?
>>105836762>CBT therapyWtf
>>105837986cognitive behavioral therapy therapy
>>105837986from wikipedia the free encyclopedia
>>105838090Therapy for people considering CBT?
>>105838127I don't know if that will help the problem, Gemma 3 tells me to go to a the rapist every time I ask it a question.
>>105838183Is that Gemma 3n? That sounds like Gemma 3n.
>>105833722its probably permission issues. to write to system controlled folders you need admin, but running a random script as admin is a bad idea. might fuck something up
lazy retard here. Is any local model that can be run with 8GB VRAM or less competent at writing?
Last time I checked (a year ago) the answer was no.
>>105838391
Define competent.
But the answer is (probably) no.
>>105838223
All of them. I haven't figured out how to use 3n without it being a fucking faggot yet.
>>105838391
Or fuck off until you can be bothered to buy hardware.
>>105838471
I asked a simple question, jackass. You can give a simple answer (it's not in the guide). I know how to run local models, but I'm pretty sure my hardware is still useless; it was the last time I checked, as I wrote in my post.
>>105837388
I won't say sorry to Le Cunt
I won't eat the bugs
I won't live in a pod
>>105838584
Did you fail to find a job in a year?
>>105838642
Can you read, Pajeet?
>>105838391
How much system ram do you have? I run deepseek r1 1.78 quant at 1.5t/s with 160gb ram at 4800 with a 7950x and a 5500xt with 8gb vram. Processing takes forever though.
There was some webpage someone posted here once that lets you see how a request actually gets formatted into text with tool calls etc, anyone remember what I'm talking about? I don't recall if it was a general thing or just set up for one specific model.
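Napkin math on why CPU-only speeds land in that range: decode is memory-bandwidth bound, since every generated token has to stream the active weights out of RAM at least once. A rough sketch (the bandwidth and weight-size numbers below are illustrative assumptions, not measurements of that anon's box):

```python
def max_decode_tps(mem_bandwidth_gb_s: float, active_weight_gb: float) -> float:
    """Upper bound on decode tokens/sec for a memory-bound model:
    each token requires reading all active weights from RAM once."""
    return mem_bandwidth_gb_s / active_weight_gb

# e.g. dual-channel DDR5-4800 is ~76.8 GB/s theoretical; if the quantized
# active experts weigh ~10 GB, the ceiling is ~7.7 t/s before swap,
# NUMA, and prompt-processing overhead drag it down.
ceiling = max_decode_tps(76.8, 10.0)
```

Real numbers come in well under this ceiling, especially once the model spills into swap like in that setup.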
>>105838830
Sounds neat id like to see this too
>>105838830
You mean a Jinja parser like
>https://huggingface.co/spaces/Xenova/jinja-playground?
>>105838830
This would interest me as well.
>>105838874
Thanks.
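For anyone wondering what that playground actually computes: a chat template just expands a message list into one flat prompt string. A minimal pure-python sketch of a ChatML-style template (the exact markers vary per model; real templates are Jinja and handle more edge cases):

```python
def render_chatml(messages):
    """Expand [{'role': ..., 'content': ...}, ...] into a flat prompt,
    ChatML-style, ending with an open assistant turn for generation."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
])
```

Tool definitions get serialized into the system or user turn the same way, which is why those playgrounds are handy for seeing what your frontend is really sending.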
How much dumber are abliterated models? Going to try QwQ-32B-abliterated
>>105838974
huihui and mlabonne are subhumans and the only source of abliterated models in the past months
so, how dumb? not just dumb, broken! enjoy your inference never stopping until timeout
>>105838986
>huihui and mlabonne are subhumans
source?
>>105839003
source: they post their broken shit as soon as they can after a new model release and don't bother testing if the model is actually usable
How to lobotomize LLMs.
https://huggingface.co/blog/smollm3
https://huggingface.co/HuggingFaceTB/SmolLM3-3B
>>105839096
>Nemotron-Post-Training-Dataset
based
>https://huggingface.co/apple/DiffuCoder-7B-cpGRPO
What the fuck.
A coding diffusion model.
From Apple?
I had no idea this was a thing that got released.
>>105839096
It looks like going forward we'll get either unusable and basically broken base models, or instruct models lobotomized with (now, seemingly) hundreds of billions of stem/math/reasoning/safety tokens in post-training.
anyone have any experience with mistral small 3.2 finetunes? I've tried three of them and they're a complete write off. The base model sucks, the tunes are even dumber and sloppier. It's shite. I think nemo beats it.
>>105839175
>finetuned from qwen
>to make a base
>then finetuned that to make the instruct
what the fuck? is this why apple is firing all their AI """experts"""
>>105839218
finetuning is a grifter hobby
>>105839241
Fucking weird innit?
As far as I can tell, they just used qwen as a base so that they didn't have to reinvent the architecture. The existing data is basically uninitialized noise as far as the diffusion is concerned.
>>105839261
They thought they'd get rich and popular like some in the StableDiffusion community.
Is mistral large STILL the best option for sub 150b models??
>>105839350
Just max out your RAM and run a bigger MoE model.
>>105839218
>I've tried three of them and they're a complete write off.
Which ones? Mistral itself is usually sufficient unless you are doing really weird shit
>>105833073
>local model based on it
qwen3?
>>105839385
>The base model sucks
>>105836762
lol I've been talking about this exact thing with my wife, an LCSW in executive management of other therapists.
CBT (cognitive behavioral therapy) is a modality that could be easily trained into a model using a prompting strategy and perhaps some RAG docs as fallbacks for local processes, and some tool calls for edge scenarios. Also this
>>105836830
, running a sufficiently large model means it would already know CBT.
That said, there's a bunch of issues with it. The main one is hallucinations from the LLM itself. She got super turned off to the entire idea after ChatGPT tried to gaslight her into believing Joe Biden was president. (ChatGPT for whatever reason refused to web search and correct itself.) It's now her go-to story when people bring up the topic, b/c the entire exchange / feedback loop would be highly damaging to someone with, say, schizophrenia, since ChatGPT just kept doubling down.
Go read some of the user accounts of ChatGPT making normies go insane. Can't find any right now, but given the self-reinforcing nature of how ChatGPT talks to users, it makes sense that it could amplify delusions. It's wild stuff.
Aside from the above, there's a bunch of other issues too that will need guardrails and tool calls for alerts to a real human (suicidal ideation, abuse, etc.)
>>105836900
Agree the bar's low; problem is the LCSW (or whatever) has a state license at risk if shenanigans occur. Most devs don't have this sort of licensure at risk.
>>105839512
Ah, here's a couple. Pic related and link. There are more... reddit is predictably full of them.
https://futurism.com/commitment-jail-chatgpt-psychosis
>>105839539
Based Sam removing the feeble-minded from society.
>>105839491
Is there a base model that doesn't suck / isn't deliberately designed to suck?
We're now (>>105839096) learning that it's useful to have a "post-training" phase composed of 200B training tokens or so on top of the base. Surely a larger company could easily add a few hundred billion more tokens there with some more creative data mixed in?
>>105837664
>you can't produce infinite energy by putting a dildo up in your ass
>yyy...uhhh... then provide me the alternative for producing infinite energy other than putting a dildo up in your ass
this is your argument
>>105839576
On one hand, it's pretty funny to see people go this far off the deep end with LLMs.
On the other hand... we are today at the Pong / Atari 2600 stage of "AI" generative technology. RP with LLMs is already pretty immersive... we already have LLM story writers, and can create images and short videos of any type imaginable, on our current hardware.
We're going to be at full visual / audio immersive roleplay, fully customizable, probably within a few more years, certainly within my lifetime. My concern is we're headed for full Infinite Jest, with media so entertaining we can't bring ourselves to look away.
>>105839512
>whatever reason refused to web search
I'm pretty sure it won't search if you don't enable it. Your wife sounds kind of retarded for getting into an argument with an LLM.
>>105839539
Schizos gonna schizo, people like this are delusional and will act mentally ill regardless of whether or not they talk to chatbots.
Anyways I think it's a decent idea but there is a pipeline from tool call alerts in therapy bots for suicidal people -> tool call alerts in all bots for people with wrongthink so I don't support it. People with mental problems should just get help from real people.
>>105839651
>i get orgasms from putting dildos up my ass
>b-but i have an idea for a thing that could give you better orgasms i just didn't make it yet
This is lecunny's argument.
>>105839385
paintedfantasy
Magnum-Diamond.
Omega-Directive
paintedfantasy is the best out of the three, but that's not saying much.
>>105839740
>Schizos gonna schizo
Yeah, but normally people tell schizos to fuck off and don't feed their delusions. Now they have someone (something) that listens and encourages them. In my opinion chatbots should call out stupid ideas by default and not behave like submissive sluts unless told to do otherwise. Also, families shouldn't allow schizos access to LLMs to begin with.
>Magnum 4 123b
>Refusal after +2k context tokens
>Behemoth 123b
>Refusal after +1k context tokens. Also won't stop writing for {{user}}
>TRP L3.3
>No refusal, but it's 70b
>euryale 2.1
>A mix of mischief on her hot breath with a smirk
I'm deadass about to buy 1-to-3 GX10s when they come out and train/fine-tune everything my god damn self.
>>105839772
>normally people tell schizos to fuck off and don't feed their delusions
No, we tell them that they are absolutely right and perform surgeries on them.
>>105839772
>GX10s
lmao you ain't tuning shit on that, also skill issue
>>105839805
>>105839808
Then what do you suggest? 40,000 USD on graphics cards?
>>105839821
listen, for mere mortals finetuning locally just isn't feasible, you either rent h100s and b200s in the cloud or train very slowly and painfully on a cluster of xx90 gpus
>>105839821
Well i'm no finetuner.
I would say H100s/H200s on vast or runpod, and have at it.
The other part though is you could just stop using crap models. (I don't have anything to recommend)
Magnum idk
Behemoth is a fucking meme,
TRP def looks like a meme,
and euryale is an ERP meme.
>>105839684
No, his argument is that you can't produce infinite energy by riding dildos. You can harvest some energy from that, but it is not feasible in the long run and is not even close to infinite.
Just because LLMs somehow work doesn't mean they can't hit a wall, or that they are the best way to achieve intelligent systems. Before LLMs there were a lot of models based on probability like Markov chains, networks like LSTMs, and many more, but people working on them knew their limits and never claimed they would be a good solution. The same is true of transformer based models; only corpo marketing divisions are constantly hyping them like the second coming of Jesus, when actual researchers are in consensus that it's shit (the best we have so far, but still shit).
>>105839772
Just run deepseek lmao
>>105839740
>but normally people tell schizos to fuck off and don't feed their delusions. Now they have someone(something) that listens and encourages them
This is the problem, esp. if you expect the LLM to deliver a modality. I'm sure it could be guardrailed around... but it would need to be in place, tested, monitored.
>>105839663
I was looking at the LLM wordwall and asked why she was bothering. Her response was to see how far the LLM would go over its skis. Towards the end it was talking about how whitehouse.gov was either hacked or being redirected by malicious actors, and to trust no one. Straight up conspiracy theory stuff.
https://asia.nikkei.com/Business/Technology/Artificial-intelligence/Positive-review-only-Researchers-hide-AI-prompts-in-papers
load_tensors: CPU_Mapped model buffer size = 19401.14 MiB
load_tensors: CPU_REPACK model buffer size = 14220.00 MiB
wtf is cpu_repack and why is it using so much memory
this is Qwen2.5-Coder-32B-Instruct-Q4_K_L.gguf (19.02GB), llama.cpp is at 37.7GB used
>>105840103
Is that using that much memory or is that a before and after?
>>105840103
wasn't that an arm feature? when did you last pull?
>>105840103
I have 32 GB RAM (no gpu) and htop shows 37.7 VIRT, 22.0 RES, 13.2 SHR, swap file at 14.1/20GB, disk IO 100%
>>105840145
last pull 15 minutes ago, cpu is a ryzen 3600 (x86_64)
>>105840159
>disk IO 100%
RIP SSD
>>105840159
yeah this is an arm feature, no idea why you are getting this on your desktop, maybe the quant is old or something?
>>105840159
Running a 32b model with DDR4 is super slow, why would you do that?
Just run qwen3 30b
>>105840178
meanwhile I feel guilty whenever I write anything to my nvme
>>105840201
other similar sized models run at about 1.4t/s
it's been over 5 minutes and i still haven't got a single token
>>105840191
quant is https://huggingface.co/bartowski/Qwen2.5-Coder-32B-Instruct-GGUF
>>105840178
i've run full size r1 off this disk. it'll be fine (at ~3t/m)
>>105840255
the disk i/o is from the swap
What is the best uncensored llm I can run locally?
>>105840274
Deepseek R1 or V3.
>>105840255
Try --no-mmap and see if it helps.
>>105840191
>>105840201
qwen 30b loads fine on the same commit
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: CPU_Mapped model buffer size = 23924.41 MiB
....................................................................................................
no cpu_repack
>>105840255
>>105840301
you sure it's q4xl? there are those arm repack quants right next to it, you might've renamed it later or some shit
anyway is 2.5 coder any better than any of 3.0?
>>105840295load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: CPU model buffer size = 5257.08 MiB
load_tensors: CPU_REPACK model buffer size = 14220.00 MiB
..............................
the fuck? at least the memory used is reasonable now
llama_perf_sampler_print: sampling time = 0.90 ms / 19 runs ( 0.05 ms per token, 21205.36 tokens per second)
llama_perf_context_print: load time = 126772.18 ms
llama_perf_context_print: prompt eval time = 2672.54 ms / 9 tokens ( 296.95 ms per token, 3.37 tokens per second)
llama_perf_context_print: eval time = 4780.90 ms / 9 runs ( 531.21 ms per token, 1.88 tokens per second)
llama_perf_context_print: total time = 56518.69 ms / 18 tokens
at least it runs
>>105836762
>he's already bought the fucking 3090
>says he's gonna host it on his website
He shouldn't use a 3090 for a production webserver. You shouldn't use a computer in your house for one. It's not 1997 anymore. He'd be better off writing a wrapper for Claude or ChatGPT and giving it a big ass system prompt on how it's a therapist. Use a serverless architecture so it can scale.
>>105840355
It happened to me before (I also run CPU only) and some anon helped me and explained why it happens, but I didn't understand shit.
Some models just do that shit for whatever reason, and it doesn't matter even if you run a Q1 quant, they will still fill up your memory, but --no-mmap fixes it.
>>105840397
>it's not 1997, you NEED cloudflare(tm) and an Azure(tm) virtual machine to run your text-only website!
>>105832855
Absolute dumpster fire. No wonder Meta went on a hiring spree, spending billions to poach employees from other companies to build an AI team independent from LeCun's team
>>105836778
A 5090 would need a 16gb card in the second slot. Offloading would tank performance fast as heck. You can get away with being 1-2 gb off, but not several. Even with a 5090 you'd go down to cpu speeds. I think a 5090 could run valkyrie 49b q4km, which is a bit better than 30b imo; I felt like I was using a 70b lite.
You need to get to 44-48 GB at least. 3 16gb cards would be cheaper.
Personally I went with a 5070ti and 2 5060s on a hundred dollars of riser cables (I just dangle the smaller cards off the top of my case with zipties). Jesus, I spent 2000 on this shit. Oh well, I love it. Also, set up openrouter anyway; you can test models there with clean prompts so you can try before you buy. If you pay 2k for 70b and think it's trash don't blame me.
It's also worth noting, 24gb 5070 ti supers and intel b60s (??maybe, im dreaming, but one would replace my entire setup) are on the horizon
>>105840463
LeCun is only technically in charge of the Llama team. They're beneath him in the hierarchy, but he does nothing to support them.
mistral large 3 in the coming weeks
>>105840479
He's the R&D tech lead (formerly a Google intern).
>>105840492
And he devotes all of his attention to the R and leaves the Llama team to figure out the D on their own.
>>105840512
So you would say that they have to figure out LeCun's D?
I just dropped GLM-4 into SillyTavern and it's being a full schizo. Anyone have settings for it? I have 24gb VRAM.
>>105840434
>you NEED cloudflare
I just stuck my DNS through Cloudflare in order to stem the thousands of access calls I was getting from russia/asia to my rented server space for an ancient phpBB website. I cut 90% of the traffic, all garbage, in about an hour's work.
There's no way in hell I'd put a local-to-my-house computer on the internet.
>>105840397
it's for training, not for the server, I think
>>105840434
>Azure(tm) virtual machine
Seems like you don't understand what serverless means.
>>105840693
I understand serverless is yet another meaningless marketing buzzword like cloud. In both cases you're still running your application on a machine, except Azure can charge you a newthing premium.
>>105833057
It's still really stupid for creative stuff
>>105840735
Creative stuff is unprofitable and a negative for investors.
>>105837388
i will molest lecunny's little model
>>105840730
You have no idea what you're talking about.
finetunesisters....
>>105839838
I get the impression it's very true, but it hasn't stopped me from trying to train my own model on my 3060s. I kinda just want to see what the limit of a truly local model is. I don't think I need the impossible trillions of tokens the corpo models are using; it might start to converge around a few tens of billions of tokens. Since nobody else is going to test it, I have to just try it myself. I should have started by playing around with fine tuning, but meh, this is more exciting. And since compute time is such a massive bottleneck, it gave me the ability to be pretty selective about my training data.
>>105840841
>[deleted]
>Locked post. New comments cannot be posted.
Get banned bigot
>>105840491
2 more weeks, in fact.
>>105840841
I think the mods at /r/SillyTavernAI fucked off. It's been abandoned
>>105840805
Ok, sir. You are right, sir. Continue doing the needful with your agentless serverless cloud powered by AI vibe coded paradigm shifting stack that will be obsolete as fast as the average javascript stack. Azure is grateful for your business.
>>105840960
They're focused on what's important, the Discord.
>>105840910
I mean sure, that will make for a very fun personal project, just don't expect your creation to be very useful.
>>105841022
It's not really meant to be useful, but if it does work I'm in a good position for a few years from now when gpus of this era become ewaste. idk, I'll just keep hoarding data till the prices bottom out. It wouldn't be much of a hobby if I didn't try creating something of my own. If corporate models keep getting more synthetic data and safety slop, it might actually be something useful in a decade or two. But by then the focus will be on new architectures and tpus or someshit, so it's probably just a waste of electricity, but at least the word salad it spits out is kinda funny.
Is there any model better than deepseek v3 for Japanese to English translations?
It's definitely better than almost every other non-local LLM (outside of Gemini 2.5 pro, which is a damn beast), but it has no idea how to properly translate politeness levels. A character could be using more aggressive Japanese mixed with some polite Japanese, but unless I basically spell it out for it, it'll completely gloss over a much more nuanced translation in favor of a lackluster sterile translation.
>>105841141
a removal after 5 minutes
>>105841166
Sir do not remove!
>>105840476
you can fit 70b with reasonable performance with exl3 quants
>>105841143
>Is there any model better than deepseek v3 for Japanese to English translations?
Locally? no.
>>105841179
there hasn't been a new 70b since the start of the year
>>105841219
>i must updoot!
>>105841217
Well that fucking sucks. I hope v4 is a better step towards AI translations.
When is Grok 4 launching exactly?
>>105841307
When lmg is mentally stable
>>105836762
1. The liability for this is completely insane. Prepare to get bankrupted and the corporate veil pierced when one of his BPD nutjobs kills themselves
2. He's better off just training a model to keep them distracted, that they can abuse, so they don't abuse real people.
3. Models already have strong safety rails against violent or hurtful things, so crazies would already get steered in the right direction.
4. Hylics think AI are real people just like them.
>>105840397
>oh boy can't wait to get doorkicked when they read your crazy chat logs!
>>105841307
tomorrow is the big day for local
>>105841333
Nice. Grok 3 will be opensourced when Grok 4 is out.
>>105841333
Grok 3 was fantastic for creative writing, even better than R1, so i'm hopeful Grok 4 outperforms it, even just twofold. I'm also hoping they ease off on the guideline bullshit that they updated grok 3 with.
>>105841380
Don't get me wrong, I know they won't, but one can dream.
>>105841370
Grok is already too crazy with conspiracy theories
>>105834348
>QUALITY
Scale slop. It's better than nothing, but it's not without downsides either.
>>105841429
I was going to laugh at the old comedies thing, but now that I think about it, the early millennials that watched shit like mrs. doubtfire did end up having an awful lot of trannies...
>>105841429
Wtf grok is actually le based?
>>105839772
>>105839821
Have you tried doing like 5 seconds of research instead of just bitching and moaning? You don't even seem to have a coherent plan as to how you would go about creating the dataset in the first place, let alone fine tuning. You could do the qlora method, but you would need an absurd amount of VRAM to do even a qlora of a 70B model, let alone 100+B models. Do you know what kind of dataset you would need or want to curate? Have you researched AT ALL what VRAM requirements you would need in order to do such a task?
>>105839838
This guy suggests renting out the super GPUs. I'm half convinced he only told you that as a means to demoralize you, but he's also kind of right given you want to jump straight into fine tuning 70B+ models. Why not at least experiment with the smaller ones like the 1B, 7B, or 13B models first so you can find out what does and doesn't work and really figure out what you're doing?
>>105840910
>>105841129
>trying to train my own model on my 3060s
Train as in from scratch, or just fine tuning existing ones?
Either way, learn how to create proper datasets and test them on the tiny models your GPU(s) can handle first before even attempting to do this. It is nowhere near as simple as just tossing up a bunch of data and voila, you have a good model. No one ever explains that or points that out because of how niche and boring it is to most people
>>105836762
Does he even know how to fine-tune anything to begin with? Let's assume he does: how would he curate the data necessary to do that? Does he know how to properly format the data so that it responds in a proper conversational manner? Or is he just going to set up a shitty API wrapper and act like it's his own "IN HOUSE MODEL" or some shit? Your dad is clearly an old silver spoon boomer that has way more money than sense if he hasn't even done this type of research yet.
>>105841504
but mrs doubtfire wasn't about trannies, it was about a husband trying to protect his children from being molested by his ex-wife's abusive partner
>>105841661
>Train as in from scratch, or just fine tuning existing ones?
from randomly initialized weights. I thought if I made a narrow enough domain it might converge on something coherent without needing too much data/compute. it definitely won't be able to discuss the finer details of quantum mechanics, but it might be able to push out some decent smut; considering most smut is pretty low iq stuff anyway, I don't think I'm setting too high of a bar. I really just hate the analytical encyclopedia voice most llms default to, so I thought it was pointless to try and finetune that away; if I'm getting to the point of forcing catastrophic forgetting I might as well just start from scratch.
>>105841824
>from randomly initialized weights. I thought if I made a narrow enough domain it might converge on something coherent without needing too much data/compute.
1) What do you mean by "randomly initialized weights"? Did you do the QLoRA method? (That's basically mandatory given what your goal is and the type of models you want to fine tune.) If so, what were the rank and alpha settings you used? A higher rank means a higher percentage of weights trained, which means the fine tuning sticks harder, with the obvious cost being training time and VRAM usage.
2) So you used a dataset before? Did you use one you found on hugging face or did you make it yourself? How did you format it so that a trainer could properly use it and train on the roles you had in the dataset?
https://files.catbox.moe/9audsj.jsonl
Catbox link rel is a (heavily truncated and pretty printed) example of a science dataset I found on HF ( https://huggingface.co/datasets/a-m-team/AM-DeepSeek-R1-0528-Distilled/blob/main/science.jsonl ). Was your dataset formatted something like this?
Were you trying to train it on domain specific science stuff or are you trying to make these models better at smut? If it's the latter then there are dedicated datasets on HF for exactly that. If said datasets are too large for the amount of VRAM you currently have, then you can always just trim them down (while making sure the formatting is still correct; that's what I did for the catbox file) until it doesn't make your VRAM explode.
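To put rough numbers on that rank/VRAM tradeoff: a LoRA adapter on a d_in x d_out weight trains two low-rank factors instead of the full matrix, so trainable parameters scale linearly with rank. Back-of-the-envelope sketch (the 4096x4096 dimensions are illustrative, not any specific model's):

```python
def lora_trainable_params(d_in, d_out, rank):
    # A (d_in x r) and B (r x d_out) replace the frozen full-rank update.
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    return d_in * d_out

# For a 4096x4096 projection at rank 16, the adapter trains less than
# 1% of what the full matrix would.
ratio = lora_trainable_params(4096, 4096, 16) / full_params(4096, 4096)
```

This is why bumping rank from 16 to 64 quadruples the adapter's parameters (and its optimizer-state memory), which is the cost the anon above is warning about.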
>>105841905
1. nah, I am training from scratch on randomly initialized weights, aka pretraining. I know conventional wisdom says it's delusional but I still want to give it a try.
2. since I'm doing the pretraining it left me the opportunity to customize my data format. I am using structured narrative content, using special tokens to delimit chapters, summaries and keywords.
3. I made some test runs on some smaller models with shorter contexts to prove out my pipeline and assess its feasibility. It looks pretty grim at 350m parameters but I didn't hit it with much data.
if I can get a pretrained base somewhat coherent inside its domain, I'm going to try hitting it with a chat dataset, but that's a pretty big if.
Where can I get the 4chan text model that was banned from hugging face for racisms?
>>105841992
>aka pretraining.
The data you are using had better have start and stop tokens either already injected in the dataset or handled by your trainer, because otherwise, if you pre-train without those, your model will ramble into infinity without knowing when to stop, since it was trained on a dataset with no clear start and stop tokens, so it doesn't know when to shut the hell up. If you've ever wondered what "<|im_start|>" and "<|im_end|>" are for, that's what they're for. I learned this the hard way when I first started pre-training models. (There is disgustingly little documentation on how to properly do these. Likely because no one wants to share their own datasets.)
How MUCH data are you using and what are the sources? Scraped AO3 stories? Stories you wrote yourself? Auto generated stories?
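A concrete sketch of the difference (the marker strings below are placeholders; real pretraining pipelines use the tokenizer's own special token ids): one boundary pair per document, not one pair around the whole dump.

```python
BOS, EOS = "<s>", "</s>"  # placeholder markers; real tokenizers define their own

def pack_wrong(docs):
    # One giant span: the model never sees where any individual text ends,
    # so it never learns to emit EOS on its own.
    return BOS + "".join(docs) + EOS

def pack_right(docs):
    # A boundary per document teaches the model where text starts and stops.
    return "".join(BOS + d + EOS for d in docs)
```

Whether BOS is strictly needed varies by setup (some pipelines use EOS-only separators), but the per-document boundary is the part you can't skip.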
>>105842019
archive.org has it
>>105832690 (OP)
>over one FUCKING year later
>STILL no Jamba support in llama.cpp
>>105842096
no one cares, it's not like anyone uses jamba models with other inference engines either
>>105842117
kek this whole fucking field is such a joke
at least back in the day tensorflow was all you needed for everything
>>105842096
Jamba isn't a model meant to be used. If they wanted you to use it, they'd have used an architecture that others are actually willing to support.
>>105842071
I just kinda concatenated them all with an eos token in between. I kinda felt the start token was redundant with all my other metadata tokens, so I left that out. I did have to train my own tokenizer for it; I included some chat tokens and some spares too, just in case it actually works.
I am using around 150gb raw text. It's mostly just ao3, but I got a few gb of literotica in the mix too. unfortunately my first epoch won't be done for 6 months; if it looks encouraging I can probably find some more data or just reshuffle and let it churn through the same data longer. the ao3 scrape is over 700gb; I filtered it pretty hard but did nothing to balance the fandoms, so it will probably turn out to be a pretty decent harry potter fan fiction generator if anything.
>>105842166
>I just kinda concatenated them all with an eos token in-between
What....? First of all, it sounds like you shoved the entire dataset, or significant chunks of it, between one set of EOS tokens. That's going to fuck your training up for a couple reasons:
1. The start tokens are absolutely necessary because they tell the model where a sequence would typically begin, which is crucial for it to be guided into responding in certain ways and knowing how to respond to certain prompts. This is why I mentioned roles earlier ITT. Ideally your dataset would have roles (user, system, assistant, your own custom roles if you know what you're doing, etc). When you train the assistant role, that tells the AI how to respond as the assistant. When you train the user role, that tells the model how YOU would typically prompt it. That is especially important if you want to make the thing good at RP, because it doesn't just need to know how the AI should respond, it needs to know the type of (presumably raunchy) stuff a user would want to ask so it knows how to respond accordingly. Not having BOS tokens at all doesn't make a lick of sense and is going to confuse the fuck out of the model.
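For reference, a role-tagged training record in the common `messages` JSONL convention might look like this (the field names follow the widespread convention; your trainer's expected schema may differ):

```python
import json

def make_record(user_text, assistant_text, system_text=None):
    """Build one JSONL line with explicit roles, so the trainer can
    mask/weight user vs assistant turns differently."""
    msgs = []
    if system_text:
        msgs.append({"role": "system", "content": system_text})
    msgs.append({"role": "user", "content": user_text})
    msgs.append({"role": "assistant", "content": assistant_text})
    return json.dumps({"messages": msgs})

line = make_record("Write a short scene.", "She stepped into the rain...")
```

One record per line, one conversation per record; the chat template then turns each record into a properly delimited sequence at train time.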
I know it's not a local model, but it fascinates me how Google just loves crippling themselves with policies that make any type of NSFW content, or hell, any content with any kind of "controversial" topic, a no-go. Want it to translate a scene where a girl kisses her date, both of whom are the same age? "I see the age is listed as 17, sorry but that's against my guidelines!" Want to try and de-compile a visual novel from the early 2000s? "Sorry, I can't help with that because it might be against the law!" Need to figure out what file is messing up a script? "Oh wow, wish I could help, but you might not own it (despite my overlord using datasets of IPs and books they do not own to train me), sorry~"!
It's all so fucking tiresome when Gemini is legitimately the best model for almost everything, but google are so fucking soulless and afraid of offending those who were already against AI to begin with that they just make it damn near useless for anything other than bare bones Q&A sessions.
I set up sillytavern+kobold like 6 months ago and have not touched the setup once.
I have a 5080 GPU (16GB VRAM) and use "Mistral-Nemo-Instruct-2407-Q6_K_L" as my model. Is there a better model option than this for my GPU? It does OKAY I guess, but I assume there's a better option? I'm aware lately people have been making local "uncensored" versions of the big popular online LLMs? idk. I seek your guidance anons
THIS IS FOR PORN, so it must be able to do that
biggest corpo in the world has to care about potential PR disasters??? WUDDATHUNK IT? HOW COULD HAVE I PREDICTED SUCH A THING? AMIRIGHT, fellow channers
>>105842321
Grok should stop browsing /pol/
>>105842321
musk is the biggest loser of all billionaires and soon to lose even more with his unhinged actions
>>105842322
>be so vile that you make criticism of you forbidden
>"uh sweaty, criticising them is bad for PR so it's fine for companies to support this and abide by such ideas"
smartest rakesh
>>105842096
What? Jamba literally got merged like a day ago...
>>105842166
>>105842302
2. Let's assume you DID properly incorporate BOS tokens and EOS tokens. Your approach still doesn't work, because it sounds like you just shoved either the ENTIRE thing or very large chunks of it in between them. That's still not good, because then you will be forcing the trainer to try to train on way too much data at once. It may not even allow you to do that, depending on what trainer you are using. The axolotl trainer, for example, allows you to set a sequence-length limit. Bigger models typically have larger sequence lengths like 8192 set in the default demo configs they provide on their repo; smaller parameter models will typically have smaller sequence lengths in the configs, like 4096 or even lower. That tells the trainer how much data to process at a time. What will happen with your setup is that it will simply ignore or "drop" any sequence in your dataset longer than whatever is set in the trainer. Assuming you're using the JSONL method, where each piece of data is stored in json object lines (hence why it's called JSONL: json lines), if an object in that dataset is 5,000 tokens long but the config has a sequence length of 4096, that entire object gets essentially ignored, because if the trainer tries to train on something too large, you will get OOM errors. That's also in play depending on the context window limit of the model you're trying to fine tune, but for your case specifically, that's why I'm bringing it up
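A toy version of that dropping behavior (not axolotl's actual implementation; the tokenizer is stubbed with whitespace splitting to keep it self-contained):

```python
def drop_overlong(records, tokenize, max_seq_len):
    """Mimic how trainers skip records whose tokenized length exceeds
    the configured sequence length instead of OOM-ing on them."""
    kept, dropped = [], 0
    for r in records:
        if len(tokenize(r)) <= max_seq_len:
            kept.append(r)
        else:
            dropped += 1
    return kept, dropped

kept, dropped = drop_overlong(
    ["short sample", "a very long sample that would not fit"],
    str.split,        # stand-in for a real tokenizer
    max_seq_len=4,
)
```

If your whole corpus is one giant record, everything lands in the dropped bucket and the model effectively trains on nothing.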
>>1058423212/10 dog whistles, too obvious.
This is slop.
>>105831232i use them to vibecode shitty things at work.
i don't care about my work's codebase being stolen, they can get fucked.
i do not use them to code my own shit though.
>>105842358>>105842302>>105842166The point I'm trying to make in my giant wall of text is that it seems that you're
1. Not properly telling the model where sentences should start and where they should end, which makes your dataset damn near useless unless your trainer supports auto-injecting the start and stop tokens (axolotl does this but it requires both the config and the dataset to be formatted in very specific ways. I highly recommend you read up on this: https://docs.axolotl.ai/docs/dataset-formats/ . That's a useful guide whether or not you even use axolotl)
2. You are shoving way too much text in between where the start and stop tokens should even be, so you're trying to force-feed the model too much information at once during training, so either your VRAM usage will explode or the trainer will just ignore a sizable chunk of your data in order to prevent that OOM in the first place
3.
>150gbI have yet to successfully steer any normal base model into being better at smut RP so take what I'm about to say with a grain of salt:
150 GB of text sounds extremely overkill. In a recent project where my goal was to fine-tune a model into speaking more in the manner of certain fictional characters from a TV show, I was able to do it with only a single episode's worth of dialogue. More good, properly formatted data is always better, provided you have the VRAM and patience necessary to both curate and train on it, but 150 gigabytes is an absurd amount of text data that is hard for me to even visualize. You need to break that into sections and train on those sections one at a time, not the whole thing at once. You would need ungodly amounts of VRAM for that to even work, amounts that even a lot of GPU rental services might not be able to provide. Your goal is theoretically possible but your dataset MUST be curated and formatted properly for it to work. Read up on how to create and format these
>>105842302nah its just pretraining the base model stage, it will only be able to do text completion. it will know when it sees the eos token the context shifts and it will see the metadata for title, summary and keywords right after, followed by the first chapter. it really can't get more explicit than that. and of course I did just concatenate them all and run a sliding window with a bit of overlap, chunking them at the sequence length limits. how else would you pretrain a base model? if the base model can achieve coherency in text completion then I will need to get into generating some chat roleplaying datasets. but for now the goal for pretraining base is just to make it complete text based on its context.
I designed my special tokens a little differently than the standard ChatML, but I am hopeful it will still work. frontends might not understand it but I only ever use mikupad. its just an experiment, I wanted to do something a little different.
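The concatenate-then-slide approach anon describes can be sketched in a few lines. The exact stride and overlap values here are made-up examples (the post only says "a bit of overlap"), and a real pipeline would operate on token IDs from the tokenizer:

```python
def sliding_window_chunks(token_ids, seq_len=8192, overlap=256):
    """Chunk one long concatenated token stream into fixed-length training
    sequences, carrying a small overlap so context isn't cut dead at each
    boundary. overlap=256 is an assumed example value."""
    stride = seq_len - overlap
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunk = token_ids[start:start + seq_len]
        if len(chunk) < overlap:  # tail too short to be worth training on
            break
        chunks.append(chunk)
    return chunks

ids = list(range(20000))            # stand-in for a pretokenized corpus
chunks = sliding_window_chunks(ids)
print(len(chunks))                  # 3 windows over 20k tokens
print(chunks[1][0])                 # 7936 = 8192 - 256, the overlap carried over
```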
>>105842297Someone ask grok about Laura Silsby getting caught stealing Haitian children.
Or ask it about Susan Rosenburg bombing the white house with weather underground, getting pardoned by Bill Clinton, and is now on the BLM's board.
>>105842321Ask grok if it thinks Peter Thiel caused it by funding cloud seeding 2 days before the floods.
>>105834787Why?
I am aware of the paltry 250GB/s memory bandwidth.
This isn't strictly for AI usage, and I needed a low power envelope and decent performance all around, not the worst acquisition.
Since I have it, might as well play with the ROC.
>>105842418I had claude write me the training script, its just reading the pretokenized chunks from an arrow file, its super stable. generating the arrow file was devastating to my ram and ssds but now that I have the dataset compiled the training script just feeds the chunks like clockwork. It hasn't crashed since I initially dialed in my model size, back when I had no idea what the vram use would be. its running an effective batch size of 64, at 8192 sequence length its eating over half a million tokens every step.
I think 150gb might be on the really low end, its only like 40b tokens, most base models are trained on trillions of tokens. I'm just hoping my data is high enough quality and the domain is constrained enough.
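Quick sanity check on those numbers, using only the figures stated in the posts (batch 64, sequence length 8192, ~40B tokens):

```python
batch = 64            # effective batch size from the post
seq_len = 8192        # sequence length from the post
tokens_per_step = batch * seq_len
print(tokens_per_step)        # 524288 -- "over half a million tokens every step" checks out

target_tokens = 40e9          # ~40B tokens claimed for the 150GB dataset
steps = target_tokens / tokens_per_step
print(round(steps))           # ~76294 optimizer steps for a single pass
```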
>>105841779Robin Williams pretended to be a woman and then hanged himself, hard to get more authentic representation than that tbqh.
>>105842422>nah its just pretraining the base model stage, it will only be able to do text completion. Whether it's pre-training or SFT or any other method is irrelevant if it's not formatted properly. You're not telling the model when to start talking and when to stop talking, so in the best case scenario with your current setup you're going to train it how to respond in certain ways but you are NOT giving it clear signals as to when to shut the hell up. Why do you think there are base models and instruct variants of those same base models? The instruct variants are more or less the same thing but fine-tuned further to be better at doing back-and-forth chat, whereas the base models, the non-instruct models, are very bad at knowing when to stop talking (something that is rather important if you want the model to be any good at RP of any kind). This isn't me blowing smoke out of my ass, you like.... NEED those BOS tokens along with the EOS tokens. Otherwise it will be impossible for it to properly know how sentences should start. Take a look at that axolotl document I linked earlier. It will explain better than I can why start and stop tokens are absolutely necessary.
Also, when you mentioned that you injected or are using your own tokens, are you referring to your own versions of EOS and BOS tokens or something else like roles? You especially don't want to do that, because it will just make your model even harder to use even if fine-tuning goes well. If you train it to use custom BOS and EOS tokens but the model you are fine-tuning was trained to use a specific kind, any inference engine running the model may fail to generate the outputs you expect, because it's going to be trying to use the start and stop tokens you're SUPPOSED to use with that model. It's best practice to just use the kind of start and stop tokens that model expects.
>>105842616Is that trainer saving the models/adapters every X number of epochs so you can test and see if it actually works? We can't tell if it's actually training or doing what you want until it's tested. Ideally comparing the base model to yours in an inference engine with all the settings exactly the same except for the model being used (seed, top-k, temperature, all that fancy stuff).
>>105842624its already demonstrated the use of the eos token in sample generations. I only left out the bos token because its redundant: it starts talking when I press the generate key, it never needs to second-guess that. to be crystal clear, my tokenizer does include the bos token for compatibility but it never occurs in my training data. there is no Jinja template processing or chat formatting, its just a base model, I don't have a chat dataset yet.
>>105842704>, I only left out the bos token because its redundant,How? That's what tells the model where a sequence begins. Even if it can technically work without that, you're only degrading the dataset's quality and by extension the model's quality. The BOS token is what tells the model how sequences in your dataset start. Just because the trainer can technically start training without any of them present in the dataset does not mean they shouldn't be used.
>>105842117Sir, did you upgrade the software you're running before trying that?
>>105842654I'm pretraining a base model from randomly initialized weights, there is nothing to compare to except its own previous checkpoints, it can't even make a coherent sentence at this point. we will know its working if it eventually starts to form sentences and paragraphs. its already progressing quite a bit but its still just word salad, though properly formatted word salad with mostly correctly spelled English words. when it was only a few steps in it was so bad it spat out invalid unicode sequences that crashed llamacpp, so its definitely progressing.
Complete, scientific, and maximally objective AGI, ASI, and non-lobotomized AI test full prompt:
N
Alright I am a total noobshit to AI stuff.
I've been using ChatGPT to learn how to use Linux/BSDs and it's actually been pretty helpful, much more so than autists on 4chin/leddit/other sites.
Using it combined with the manpages has sped up my learning a lot. But I ask it a lot of questions and run out of my free messages pretty quickly, and I don't want to pay for their regular tier or whatever.
Can I run a local model that would serve as my tech support the way CGPT does? Which one should I use?
>>105842779Deepseek R1 671B 0528 is better than what OpenAI is serving you in the free chatgpt tier
>>105842731its just a token bro, you're overthinking it. there is no magic bos token in the transformers architecture, its just a learned token like all the rest. my model will learn to respond to the <|title|> or <|keywords|> tokens, its all the same. I don't have any chat data, I only have narrative data. There is no start of sentence but there is start of <|chapter|>.
>>105842616Anon I just joined the thread and I didn't have time to engage with your posts. But please stay on /lmg/ in the future. We need higher quality posts like these.
>>105842763>there is nothing to compare to except its own previous checkpoints, it can't even make a coherent sentence at this point.Then you ESPECIALLY need to use BOS tokens along with EOS tokens or, like I said, it will not know how a proper sentence should start and end. You're training from a clean blank slate, right? You want the thing to know how a sentence SHOULD start and end so that it doesn't start generating nonsense, correct? How can you hope to do that if you're not telling it what the beginning of a sentence should be like and what the end should be like? I was under the impression you were fine-tuning an existing model, but if you're using a blank model, you not including BOS tokens is even worse. The model needs to know where the beginning of a sentence or passage is and where it ends. Even when the giant AI companies were first creating their models from scratch, they did not just throw in a bunch of unformatted text. They either had to inject the start and stop tokens themselves, or have automated processes that inject the start and stop tokens for them. You need both EOS and BOS, not just one or the other, or your model is not going to properly learn anything. Your custom token strategy could work IF you properly defined a BOS. Deviating from the standard and essentially making up your own chat format strategy makes no sense. You can't just decide "I'm only going to use one type of stop token and nothing else"
>>105842816>there is no magic bos token in the transformers architectureNo one said there is. I'm just saying you SHOULD have one if you want this model to properly know how to form a coherent sentence, let alone know how to RP.
What I'm trying to understand is: are you defining your OWN BOS tokens? You claim you aren't, but based on your description that's pretty much what you're trying to do, which again can work if you format it properly.
>>105842902it will see the \n\n token to know when a new paragraph starts. the . or ? or ! or any combination of them will indicate to it a new sentence is starting. I really don't care about the chat templates or any of that syntax, I use mikupad. the bos token is defined but never used in my dataset, I use other tokens to indicate metadata but for a dataset consisting of narrative data there is no start and stopping of turns, its just a pure text completion task.
>>105842622kek that's wild
>>105842986>it will see the \n\n token to know when a new paragraph starts. the . or ? or ! or any combination of them will indicate to it a new sentence is starting. That won't tell the trainer how to teach the model when to stop generating text though... That's the entire point of start and stop tokens. What you described will not result in the model knowing when a paragraph should end; it will teach it how to write paragraphs and how to write line breaks. It will be good at writing paragraphs and line breaks and asking questions (more accurately, writing things that it thinks are questions) but it won't know when to stop talking. If you omit BOS tokens like you say you did, it will not know how a proper sentence should begin either. It won't know how to respond to YOUR input properly. You're at best training a completion model, but it won't know how to properly engage in conversation at all.
>the bos token is defined but never used in my dataset,Defined where? In the training config? The fact you mentioned you use mikupad furthers my point. Most LLM frontends automatically inject the BOS and EOS tokens that the model expects after you submit a prompt. You submit:
your prompt goes here
But the frontend actually sees
<s><|im_start|>user
your prompt goes here<|im_end|>
<|im_start|>assistant
What the frontend injects, or what it expects, depends on what is defined in the model's tokenizer config file. Here is what is contained in Mistral-7B-Instruct-v0.3's tokenizer_config file
{
"bos_token": "<s>",
"eos_token": "</s>",
"pad_token": "<pad>",
"unk_token": "<unk>",
"additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|user|>", "<|assistant|>"]
}
>>105842986>>105843186Why should you give a shit about this? Because it affects how the model needs to be prompted, which means you need to make sure your dataset has these if you want it to work properly. Since you say you're using a blank slate model, that means if you want your special tokens to work, you need to make sure that tokenizer config file also has those custom tokens of yours
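The injection described above can be sketched like this. The token strings are the ChatML-style ones from the example, assumed for illustration; in reality the frontend reads them from the model's own tokenizer config / chat template, not from hardcoded constants:

```python
# Hypothetical sketch of what a frontend does before the model ever sees
# your text. BOS and the role markers here are example strings, not
# guaranteed to match any particular model's template.
BOS = "<s>"

def wrap_prompt(user_text):
    """Wrap raw user input in the BOS + role tokens the model was trained on."""
    return (f"{BOS}<|im_start|>user\n"
            f"{user_text}<|im_end|>\n"
            f"<|im_start|>assistant\n")

print(wrap_prompt("your prompt goes here"))
```

A model fine-tuned on tokens the frontend doesn't inject (or vice versa) never sees its inputs in the shape it was trained on, which is the mismatch being warned about.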
>>105842333He's having fun and generating lulz
>>105843186>stop generating textit knows the eos token; the frontend can be configured to stop on the chapter tokens if you really wanted to. its already demonstrated using both tokens in the proper places, formatting-wise anyway.
I'm not sure how many times I need to say this, but its a base model, there is no prompting it. I'm not sure if you are trolling me or if you're just not paying attention, but i have absolutely no expectation of being able to prompt it without hitting it with a chat tune first. you are putting the cart before the horse; it needs to understand English and context following before it can start taking on roles and completing tasks. if I had already prepared a chat dataset I could have mixed it in at this stage, sure, but its not necessary and I don't have the dataset prepared. I figured I would wait to see if it can converge on something at least halfway coherent before investing in api time for a teacher model for an rp dataset.
>>105843193I had to train my own tokenizer, the off-the-shelf ones were atrocious and bloated. I'm sticking to English only with no coding or multilingual. I also had to because vocab size takes a massive bite out of the training vram budget; my tokenizer's compression is within 5% of llama 3's tokenizer on a random sample of my dataset at only 1/5 the vocab size.
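Rough arithmetic on why shrinking the vocab to 1/5 helps so much. The hidden dimension here is a made-up example value and the untied-embeddings assumption may not match anon's actual architecture; the 128,256 vocab size is llama 3's real one:

```python
def embedding_params(vocab_size, hidden_dim):
    # input embedding matrix + an untied output head,
    # each of shape vocab_size x hidden_dim (untied is an assumption)
    return 2 * vocab_size * hidden_dim

hidden = 2048                                    # hypothetical small-model width
big = embedding_params(128_256, hidden)          # llama-3-sized vocab
small = embedding_params(128_256 // 5, hidden)   # ~1/5 the vocab, as in the post
print((big - small) / 1e6)   # hundreds of millions of params saved
```

And since each trainable parameter costs several bytes once you count weights, gradients and optimizer state, those saved embedding params translate directly into VRAM headroom.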
>>105842816I'm not the guy ranting, but you need a constant token at the start of your sequence to serve as an attention sink. Softmax does not have a "none of the above" option, so making sure a distinct neutral token always exists in the context is critical.
It doesn't matter whether you call it bos or not, but you need it. This is a giant pain in the ass for weird attention methods as you need to contort things to keep BOS in even with a sliding window.
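The softmax point above can be demonstrated in a few lines of plain python, no assumptions beyond the definition of softmax itself:

```python
import math

def softmax(scores):
    # subtract the max for numerical stability; result is unchanged
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Even when every key is a terrible match (all scores hugely negative),
# the attention weights still sum to 1 -- the mass has to land somewhere.
weights = softmax([-50.0, -50.0, -50.0])
print(weights)        # ~[0.333, 0.333, 0.333]
print(sum(weights))   # ~1.0, no "none of the above" option
```

Hence the sink token: give the mass a harmless place to land instead of smearing it over tokens the model shouldn't be attending to.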
>>105843461yeah okay that makes sense. I did probably fuck it up then.
Anons, you have 144 GB of VRAM and 512 GB of system RAM. What would you run?
>>105843537You're saying this as if there's anything but R1 0528 or V3 0324 worth running right now
>>105843537Nothing, because most LLM sites like OpenRouter and some specific sites like TogetherAI offer free v3. so why would I waste my own power and money to run it locally?
>>105843537The most impressive FrankenMoE you've ever seen, created by grafting every Nemo fine tune I could get my hands on.
>>105843537>>105843556Either that or try to run
>https://huggingface.co/google/switch-c-2048
>>105843556All the experts are gooning experts
>>105843571They could be. Only one way to truly know.
I'd love to run the Nala test on that thing one day.
>>105843553Can I use the free shit over Tor though?
>>105843553>offer free v3.why do people say this meme when they have very harsh request limits per day?
>>105843579Imagine, each nemo expert finetuned on a different fetish...
>>105843597Openrouter does, but I only use it when I wanna test other options that don't show up for the OpenAI API. I have two sites that also give free V3 and are very generous with free credits for the other local stuff that I might want to try out.
I run R1 0528 locally because Q2 gives me nicer responses than whatever openrouter is serving using the exact same chat completion setup.
>>105843686What the fuck is that? The JLPT?
Well, congratulations I guess.
>>105843405Also, I forgot to ask you this earlier. You said you were training from "randomly initialized weights". Are you talking about a completely clean slate model that has not been trained at all? Don't you need a data center or some shit to train an LLM from scratch? How do you think that's going to be possible on your local at-home setup?
>>105843558>>105843556You can mix and match MoEs? Where does one find flavors of Nemo? Are there any that are redpilled or worth a damn for RP?
>>105843736>You can mix and match MoEs?Shit, you just gave me the greatest idea.
Make a frankenmoe of moes.
>>105843405>I had to train my own tokenizer,>my tokenizer's compression is within 5% compared to llama 3's tokenizer on a random sample of my dataset at only 1/5 the vocab size.So you're trying to raw-dog train an entire LLM by yourself on a presumably consumer-grade GPU setup. Why do you think that's possible? Fine-tuning one on a local machine is absolutely possible. Shitting out your own trained LLM from scratch is not. That's not even worth fantasizing about. Are you trying to merely fine-tune an existing model or are you trying to train your own from scratch? I'm pretty sure the people that created these models by scraping the entire internet used WAY more than 150 GB. Teaching it how to actually speak coherently and nudging it in one direction are two different things. And as I keep saying, you cannot do that without BOS or EOS tokens..... For your plan to work you would also have to create a custom architecture, not just your own tokenizer, and you would have to define it in the tokenizer_config file.
>>105843686Great work Anon. I hope you enjoyed
>>105843791Just make a 600M 1bit bitnet mamba lstm nsa model and you'll be able to train it to AGI on just 5B tokens in 10 hours on one 3090
>>105836085uh oh garbage training data incoming
>>105836874is there an abliterated form?
>>105843732I think it might be possible if I don't expect much out of it. I'm kinda curious if its possible, I suspect probably not, but taking a look at the redpajama dataset it was pretty much garbage. I couldn't actually download the whole thing but what I saw was uninspiring; the fan fiction might unironically be higher quality. I don't care about sota performance, if it can achieve any level of coherence within its domain I'd call it a success, but I recognize its highly likely to fail. GPT-2 was trained on only 40b tokens, which is what I'm targeting in my time frame of 6 months. tinyllama was trained on 3 trillion tokens; that would take eons on my machine.
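Back-of-envelope on the stated target of 40B tokens in 6 months (the ~182-day figure is just 6 average months, nothing from the post):

```python
target = 40e9                  # tokens, the stated goal
seconds = 182 * 24 * 3600      # ~6 months of wall-clock time
rate = target / seconds
print(round(rate))             # ~2544 tokens/sec sustained, 24/7, zero downtime
```

That sustained throughput requirement is the number to benchmark a single training step against before committing to the run.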
>>105843868Not sure how accurate this is but:
GPT-2 Training Summary:
Tokens: ~40 billion
Parameters: 1.5 billion (GPT-2 "XL")
Dataset: WebText (scraped from outbound Reddit links)
Training time: ~1 month (estimate; not officially published)
Batch size: 512 sequences of 1024 tokens
Total FLOPs: ~256 PFLOPs (for full training pass)
GPUs: 8x NVIDIA V100 (per known reconstructions)
Framework: TensorFlow (initially)
Well shit, maybe it IS possible if you have enough patience
i think it's time for a new major model release that's actually worth using
>>105843791>Why do you think that's possible?even gpt 3 was trained on only 570 GB of plaintext. I'm not looking for sota, I just want to see if I can get something somewhat coherent. its literally just fan fiction and smut, how hard can it be for the model to figure it out?
>>105843642Not lmg, but I'm convinced most of the OR DS providers are serving mystery meat llms not r1 or v3
>>105843597Desperation and brain rot
>>105843791I know someone that trained a GPT-2-style 1B from scratch on 2x3090s just fine on a full scrape of 4chan. The output is coherent but worse than GPT-2, though rather cute and funny.
Obviously it's not impossible. What is not likely is you matching the performance of most things available out there.
Could you pretrain on a lot of consumer GPUs? Obviously, but it will be slow. Is it worth it? I don't know, that's up to you and how much heat you want your home or garage to generate and how much time and money you want to spend on it. If you were that serious you may as well buy a lot of cheap V100s from ebay and figure out SXM boards for them and then train with that, GPT-3 was trained on those.
Again, you won't match anything good, but you can do it.
What's the best ~32b model right now?
>>105843686we did it, reddit
>>105843993It depends on what you need it for. QwQ is an all-rounder if you don't mind the "thinking".