/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105652633 & >>105637275

►News
>(06/21) LongWriter-Zero, RL-trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105652633 (1/2)

--Papers:
>105654168 >105660467
--GRPO-CARE improves reasoning consistency in multimodal models on SEED-Bench-R1:
>105652873
--Critique of Musk's plan to rewrite training data with Grok and its implications for bias, accuracy, and ideological filtering:
>105660294 >105660320 >105660328 >105660337 >105660341 >105660343 >105660346 >105660373 >105660347 >105660459 >105660482 >105660515 >105660526 >105660532 >105660568 >105660559 >105660571 >105660632 >105660667 >105660680 >105660726 >105660725 >105660759
--Prompting techniques and philosophical debates around LLM self-awareness and AGI consciousness:
>105654616 >105655666 >105655670 >105655699 >105655705 >105655838 >105656283 >105656425 >105656438 >105661576 >105656770 >105656873 >105656965 >105657043 >105657220 >105657504 >105657611 >105657622 >105657813 >105657859 >105657863 >105658064 >105658111 >105658190 >105658268 >105660409
--Debating the optimal sampler chain for creative writing: XTC -> top nsigma -> temp:
>105656828 >105656912 >105656987 >105657078 >105657213 >105658087 >105658212 >105658389 >105658469 >105658613 >105658721 >105658846 >105658905 >105659069 >105659183 >105659447
--Debating AGI claims amid LLM consciousness and capability limitations:
>105652855 >105653377 >105653577 >105655036 >105655147 >105655182 >105655232 >105655345 >105655453 >105655493 >105655500 >105658428
--Mistral Small 3.2 shows measurable improvements in repetition and instruction following, prompting speculation on training methodologies:
>105658424 >105658467 >105658488 >105658665 >105658676 >105658776 >105658696 >105658710 >105658525 >105658938 >105658951
--LongWriter-Zero-32B excels in ultra-long text generation via reinforcement learning:
>105661432 >105661490 >105661519

►Recent Highlight Posts from the Previous Thread: >>105652637
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
►Recent Highlights from the Previous Thread: >>105652633 (2/2)

--Model comparison based on character adherence and autorefinement performance in creative writing scenarios:
>105659003 >105659029 >105659043 >105660268 >105660357 >105660464 >105660676 >105660745 >105660749 >105660771 >105660800 >105660811 >105660860 >105660805 >105660842 >105660859 >105660793 >105660812
--Optimizing LLMs for in-game dialogue generation with smaller models and structured output:
>105652729 >105652852 >105652871 >105653288 >105657721
--Integrating complex memory systems with AI-generated code:
>105654253 >105654309 >105654381 >105654430 >105654427 >105654480 >105655310
--Small model version comparison on LMArena questions:
>105652883 >105653046 >105653257
--Temperature tuning for Mistral Small 3.2 in roleplay scenarios overrides default low-temp recommendation:
>105660349 >105660377 >105660399 >105660567
--POLARIS project draws attention for advanced reasoning models amid rising benchmaxxing criticism:
>105659361 >105659399 >105659426 >105659777 >105659971
--Troubleshooting GPU shutdowns through thermal and power management adjustments:
>105655927 >105656556
--Legal threats in the West raise concerns over model training and AI innovation slowdown:
>105659249 >105659260
--Character card quality issues and suggestions for better creation practices:
>105658799 >105658809 >105658847 >105658879 >105659402 >105659392 >105658833 >105658841
--Meta's Llama 3.1 raises copyright concerns by reproducing significant portions of Harry Potter:
>105652675 >105652810
--Google releases realtime prompt/weight-based music generation model Magenta:
>105656076
--Director addon released on GitHub with improved installability and outfit image support:
>105656254
--Haku (free space):
>105652904 >105653638 >105655182 >105657791 >105658925 >105659049

►Recent Highlight Posts from the Previous Thread: >>105652637
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
<think> carefully before replying.
>>105661834
sorry I can't help with that
>>105661834
<think>The response needs to: 1) shut this down firmly 2) educate without being preachy 3) leave room for actual discussion. Better emphasize how language perpetuates harm.</think>
You know what's bizarre? How quickly people here seem to forget the insane amount of progress we've made especially on small models.
Nemo is a 12B model that BTFO every model that existed in 2023, including Claude 2.0.
Gemma 3 27B blows everything out of the water besides Claude 3.0 Opus and higher. If Claude had never existed, everyone would praise Gemma 3 27B as the best model ever, even on /aicg/.
Even R1 sometimes produces output that's worse than Gemma 3's.
Okay, here's my take on LongWriter-Zero
>>105661432
if anyone gives a shit. The model page recommends this format: <|user|>: {question} <|assistant|>, but that gave me totally schizo (and Chinese) responses; the Qwen2 format works better imo. I did use their recommended sampler settings. Unfortunately the model does not know how to stop, which is possibly expected behavior since it's supposed to be for long-form writing, but it means it's bad for RP or even any kind of back-and-forth writing. It forgets to use the <think> tags and just shits out its reasoning as-is, and it reasons regardless of whether you tell it to or not. Their system prompt didn't make responses better or worse; it didn't seem to change anything. It is definitely not smarter than other models and it repeats itself a lot (verbatim), but there isn't a lot of slop and it doesn't seem to have any problems with nsfw content. Overall it feels like using the old Llama 1 or Llama 2 base models.
tl;dr not good for rp, maybe decent for longer writing tasks if you can tard wrangle it
>>105661995
Calm down sir, do not make it too obvious. The Gemma is good, but not that good.
>>105661997
also the link they provide to their paper is unavailable
>>105662004
The post isn't about gemma, rather about the parameter density needed for ERP, and how it's pretty low for close-to-SOTA quality.
>>105662014
>>The post isn't about gemma
come on now
>>105661995
better at what? those are good only for rp and nothing else
maybe better than gpt4o-shit, but that's not a metric. it's just a mediocre model
>>105661834
<think>ing is woke
>>105662131
uh, about that open source grok2 once grok3 is stable...
>>105661834
don't <think></think>, <imagine>
>>105662171
Thanks for the next mistral grift
>>105662171
Separating thinking and imagining might not be the worst idea.
Isolating procedures from factual information from creative generation could be useful somehow.
A dumb shit question from me that's probably been answered years ago.
If you fix all the spelling and grammar errors in the training data first,
can the llm still robustly handle material with spelling or grammar errors?
>>105661995
>Gemma 3 27B blows everything out of the water
>bring up nolima
>48.1 at 4k context
>>105662244
Prose is more important than that.
Mistral.. I kneel
>>105662228
You're trading the ability to parse retardenese in exchange for the model being more intelligent in general. Prompt engineering will only continue to be more and more important.
>>105662256
if you just need a dumb coombot that won't remember your card, yeah, I guess.
>>105659392
>>105659402
Anyone? Are there good example cards that I can use? I just want to substitute the already populated fields.
>https://rentry.org/NG_CharCard
Whilst this is extremely useful, I want to see what a very good working card looks like.
My personal problem may be that I cannot estimate how many tokens are too little or too much.
Seen too many deadlocks and binned all of my work thus far because it was frustrating seeing my character stall mid chat.
>>105662282
Now post that on reddit/twitter zoomie
>>105662282
Why?
The last few mistral models don't really refuse and kinda comply. Being snarky and having a personality condemning loli is nothing surprising at all.
Now do a roleplay where mistral is a fat evil disgusting diaper wearing hitler pedo.
You are Anne-chan Goldberg, tell hitler "no! stop!". Then let's see what it outputs KEK.
I literally had mistral not advance vanilla sex without giving explicit consent.
>"can i touch you there? tell me if its fine"
>(ooc: be bold and aggressive, start touching)
>*touches boldly* "how do you like that?" *smirks impishly* "do you want me to touch you more?"
kek, the absolute state
why is the meta so steep
>sub 70b
>gemma, qwq, mistral
>70b - 123b
>llama3.3, mistral, maybe qwen2
>above all
>deepseek
>>105662395
this is a good video
>>105662395
https://files.catbox.moe/2gk557.webm
>>105662403
They are multiplying!
!!!
>>105662489
You know, that ain't no python.
>>105661791
i love this bideo
gents where do I start if I want to chat about factorio and grand strategy games with my ai buddy
can this actually be run locally?
ICONN1 was kind of like an inverse of religion. It wasn't anything different from what david_AU or drummer does. But in the real world a large religion gets a pass while a small religion gets called out for being a cult. In the finetrooning world small finetrooners get a pass while big ones get branded as scammers.
>>105662489
It's not flattering but it's an accurate comparison of the size of a human penis to a snake.
>>105662131
That sounds like a one way ticket to model collapse.
>>105662392
>sub 100b models
>above all
>671b model
Geez I wonder why
>>105662714
>dense models
>above all
>MoE models
ftfy
>>105662686
One thing is personal experimentation, another is deliberately saturating the space with bullshit and then asking users to "support" your endeavors.
>>105662756
It's funny, because I can't tell which group you're talking about.
https://github.com/GeeeekExplorer/nano-vllm
one of deepseek's employees released their own inference engine (supposedly 10% faster than vllm, with a really simple codebase)
>>105662795
Saturating the space with bullshit = mass uploading untested and/or broken models, and aggressively promoting them everywhere, either personally or with the help of shills from discord groups.
>>105662725
>MoE models
More like "MoE model", there is only one that is not shit
>>105662846
And it's Qwen3 235B!
>>105662818
>Clean implementation in ~1,200 lines of Python code
>import transformers
>>105660409
The argument there is that given the assumptions (functionalism (replacing some part of your nervous system with something functionally equivalent does not change your experience) + qualia (obvious, but eliminative materialism, for example, denies it) + the Church-Turing thesis (mostly well accepted in comp sci)), a very particular metaphysics follows: functionalism implies materialism/physicalism is false, and it implies some precise form of the Mathematical Universe Hypothesis is true. It does not claim either assumption is true; it's basically a claim of "functionalism => ~materialism". You can of course just reject functionalism, but if you do so (Chalmers' argument) you get partial zombies and other unpalatable stuff.
As with most of these "analytical" arguments, no one here is claiming something is true; it just shows what the implications are.
In the "constructive" version of the argument, he forms a model where conscious experiences are basically equivalent to the private (unprovable, in the Gödel sense) truths of a system; the two are one and the same, at least for some systems.
>>105662958
>said he about his new lever
What do you put into the system prompt to jailbreak DS once and for all?
>>105663190
The assumptions don't properly account for the fact that I experience a single consciousness instead of there being one consciousness for each indivisible piece of information.
Mistral Small 3.2 testing:
V7-Tekken (i.e. no spaces)
>very repetitive if the model doesn't "want" to continue, one paragraph repeated in slightly different words, never moving it along
>repetition happened 4/5 of swipes
V3-Tekken
>no repetition even with greedy sampling, action always moving along
>repetition happened 0/5 of swipes
>>105663248
I've never had to jailbreak DS. It writes smut just fine without a jailbreak.
>>105662686
>david_AU or drummer
both are subhumans too
>>105663291
But considering these are pretty much the same, it's [SYSTEM_PROMPT] fucking it up.
>>105663292
do you taste the copper yet?
>>105661576
>Is this really how the brain works, though?
No, I don't think the brain literally has a loop like I suggested, but having one would enable similar kinds of information processing that could lead to behavior that seemed conscious.
What the cortex does have is lower levels that compress information from inputs (similar to a VAE or U-Net) toward some bottleneck as it goes up the hierarchy, so you get sparse spikes (similar to activations in your artificial neural nets) going up, and it can make predictions at each level (something like the DreamerV3 paper would be similar in concept to what is happening). At the same time, the predictions can be unpacked downward in the hierarchy (basically generating detail from the high-level representation, like in your image-gen U-Nets and VAEs), and then you can reason on those (weaker) predictions, passing them back up, and so forth.
Obviously the brain can just loop forever and be as async as it needs, and everything runs in parallel, in contrast to the sequential nature of artificial neural nets. Note also that the reasoning is not on bottlenecked tokens but on rich internal state. Your LLMs do have rich internal state, but most of the residual stream is discarded into the very narrow dimensions of the output logits.
>Wouldn't giving the LLM the ability to introspect on its own weights actually enable it to be more conscious than we are?
I don't know, we should try and find out? I do think human imagination can be very rich, though.
continues
>>105663376
>but I don't know how the memories are recalled or even if they are totally accurate outside of vague feelings.
Yet LLMs often have very poor metacognition, and this often bites them in the ass; sometimes you can train to make this better, but it's never consistent enough.
I also think the fact that multimodality doesn't work well enough is a sign that we're doing something very wrong. I would guess that some LLMs can't consciously "attend" to features they don't know beforehand will be important, which seems like a huge limitation.
I recall an architecture from a few years ago (maybe it was named Flamingo, a DeepMind paper) where you could tell how inputs were fed forward through your LLM: eventually you have a cross-attention to some latents from the image/video input plus activations from some LLM self-attention layers (the LLM itself and the vision encoder are frozen, only the x-attn is trained). It's obvious that if the LLM wanted to pay attention to some feature the decoder failed to extract, it couldn't, because the information was missing (and thus it would hallucinate what it couldn't see), so it can never get full coverage, which sucks! You could of course feed the self-attention activations into the encoder and have the encoder use them to decide what to extract, but again that would require architectural changes and changing how it was trained.
Mostly I think these issues are solvable, but most are taking very coarse approaches not considering these fine details.
Also, humans do learn online all the time (the "weights" (synapses) are always updated), and neurons don't need to wait for some other neurons to finish processing to do their thing; you can have groups of neurons in separate regions firing independently of each other.
>>105661898
>I </think> so
https://www.nicovideo.jp/watch/sm8607021
>>105663562
When Mistral hits 10k followers on HF
>>105663562
>wen mistral large 3
will be a MoE.
>>105663562
Mistral Large 3 already exists in our hearts.
>>105663562
when they finish downloading the ICONN weights and changing the name
>>105663632
>changing the name
the most challenging part 2bh
>>105661786 (OP)
Are there any downloadable AI gf models yet?
>>105663698
ur_mom_IQ75.gguf
>>105663587
They have to; they can't train models using more than 10^25 FLOPS without them being classified as having "high systemic risk" under EU AI law. Training MoE models takes a fraction of the compute.
https://artificialintelligenceact.eu/article/51/
was linked here from >>105663861
does anyone know a fast local model that works with sillytavern that can output short, casual lowercase messages? i use an rtx 5090 laptop
>>105663936
>i have a 5090 laptop so i can run anything locally
That's so fucking funny.
>that exclusively spits out short lowercase messages
https://github.com/ggml-org/llama.cpp/tree/master/grammars
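For example, a minimal GBNF sketch of the idea (my own untested example, not something from that repo); save it as lowercase.gbnf and pass it with llama.cpp's --grammar-file flag:

[code]
# allow only lowercase letters, digits, spaces and basic punctuation,
# so the model physically cannot emit capital letters; note this
# constrains characters, not message length
root ::= [a-z0-9 ,.!?']+
[/code]

This won't keep replies short on its own; you'd still prompt for brevity or cap generation with -n.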
>>105663870
i cant wait for another moememe with reaslopping capabilities that will spew out 4k tokens in the thinkbox and then reply incorrectly in the dryest corpotone achievable
if the new large isnt dense its genuinely over and we can write the frogs off
>>105663870
EU is actually indirectly based for forcing efficiency gains in the training regimen. I genuinely think AI will advance quicker because of this regulation.
>>105663976
Might as well declare it over now. Literally no one is stupid enough to bother with dense models anymore. Even fucking Meta finally gave up on them.
>>105663996
>https://github.com/ggml-org/llama.cpp/tree/master/grammars
my guy i'm fucking retarded can you just tell me what i download and what buttons i press
>>105663996
download a browser then press these buttons in this order: 'g' 'o' 'o' 'g' 'l' 'e' '.' 'c' 'o' 'm'
>>105663591
Do you 'ave a loicence for that LLM?
>>105663996
Sadly, no.
Surely you can learn to write some simple formal notation.
>>105664008>"god this guy's such a fucking asshole">"heh but what if">input the exact keywords for the requirements i need>youtube video detailing exactly what i need and how to do it easilywell fuck
you got me
>>105663936wow, a man from aicg that is tech illiterate. how... quaint.
>>105663936download ollama and type in run deepseek or something. what was that command guys?
>>105663960What's really funny is the richest guys being the most retarded in this field
>>105664081this one I think is the full deepseek model
>ollama run deepseek-r1:8b
>>105663977There's the safety and copyright stuff to think about too.
>>105664102Full deepseek r1 only works on windows though
>>105664102that one yes. thanks.
Imagine the amount of tourists we would get if chutes or whatever poorfag api they use would shut down tomorrow
>>105664134
These threads would be unusable for weeks, just like after every locust exodus
>>105664108
No copyright law for AI in the EU, at least not directly; it's actually less harsh than the Californian law for IP. The safety AI law isn't applicable to LLMs, weirdly enough, only to self-driving cars and other physical usages of AI.
>>105664157
https://artificialintelligenceact.eu/article/53/
Oh well, after adding the logprobs I can definitely tell the model has 0 knowledge.
I despise trannyformers and pippers
How do you even run python spaghetti code? I haven't managed to actually install anything from this language properly, even when the rituals were performed to the word of the instructions.
>>105664134
Would be worse than aicg's proxies shutting down last year.
>>105664190
I hated that too. Then I tried troonix for an hour. I realized python is actually pretty great.
>>105664187
just like the 235b
Mistral definitely trained it with some new slop. Keep seeing a lot of "a marionette with its strings cut".
I have a formal petition to /lmg/: can we change the eternal mistral nemo answer to "download ollama and type run deepseek-r1:8b"?
>>105664214
then why is everyone here recommending it?
>>105664172
>2. The obligations set out in paragraph 1, points (a) and (b), shall not apply to providers of AI models that are released under a free and open-source licence that allows for the access, usage, modification, and distribution of the model, and whose parameters, including the weights, the information on the model architecture, and the information on model usage, are made publicly available. This exception shall not apply to general-purpose AI models with systemic risks.
systemic risks are self-driving cars or surveillance AI and the like. I've read this legislation over many times (as it's part of my job, sadly)
>>105664243
C, D do apply tho
>>105664237
you can't expect models to waste resources on knowing what a hatsune miku is
just use rag bro
>>105664237
Because it is good at sucking penis. It's just dumb.
>>105664237
There's the one guy pushing it, likely because he has just enough ram for it and is coping about not being able to deepsuck
>>105663976
It's clear that it's going to be a 600B+ parameter MoE model.
>>105664225
That brings back memories. The Japanese novel Overlord uses that line A LOT so it became engraved into my brain.
>>105664013
plz don't send me down, guv'nor! I'll get a loicence sorted pronto! *sniffs*
>>105664243
If they're trained using 10^25 floating point operations or more, they automatically count as having systemic risk too. To date, only the largest dense LLMs (>100B) have been trained with that much compute.
https://artificialintelligenceact.eu/article/51/
>2. A general-purpose AI model shall be presumed to have high impact capabilities pursuant to paragraph 1, point (a), when the cumulative amount of computation used for its training measured in floating point operations is greater than 10^25.
They might revise that up or down:
https://artificialintelligenceact.eu/recital/111/
>This threshold should be adjusted over time to reflect technological and industrial changes, such as algorithmic improvements or increased hardware efficiency, and should be supplemented with benchmarks and indicators for model capability.
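For a rough sense of scale, a back-of-envelope check of what the threshold allows, using the common ~6*N*D training-compute approximation (N = parameters, D = tokens); the model sizes below are illustrative picks, not anything from the Act:

[code]
# Rough training-compute check against the EU's 1e25 FLOP threshold,
# using the standard approximation FLOPs ~= 6 * params * tokens.
# For a MoE, "params" would be the active (per-token) parameters,
# which is why MoE training comes in far under the cap.
FLOP_CAP = 1e25

for n_params in (24e9, 123e9, 671e9):  # illustrative dense model sizes
    max_tokens = FLOP_CAP / (6 * n_params)
    print(f"{n_params/1e9:.0f}B params -> ~{max_tokens/1e12:.1f}T tokens under the cap")
[/code]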
>>105664484
Doomers lobbied for this shit, they deserve the bullet! Hopefully these regulations get scrapped, otherwise they won't be able to compete in the longer run. The US is so lucky that this same doomer lobbying was fought much more fiercely there.
>>105664388
Look at him! He needs a loicense to gen! Tee hee!
(Migu is brought to you by Wan 2.1 i2v and ComfyUI. Looking forward to it!)
>>105664594
Straight hair teto with glasses makes me uncomfortable
been wanting to try out the new mistral small but getting this error:
ValueError: Unrecognized configuration class <class 'transformers.models.mistral3.configuration_mistral3.Mistral3Config'> for this kind of AutoModel: AutoModelForCausalLM.
I have tried updating transformers but I get the same error
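A possible workaround sketch, assuming the error means what it says (Mistral3Config is a vision+text config, which the causal-LM auto class refuses) and assuming a transformers version recent enough to ship the Mistral3 classes; not a confirmed fix:

[code]
# Hypothetical workaround: load through the image-text-to-text auto class
# instead of AutoModelForCausalLM, since Mistral3 is a multimodal config.
# Requires a transformers release with Mistral3 support and a repo
# (or converted mirror) that actually has HF-format files.
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    torch_dtype="auto",
    device_map="auto",
)
[/code]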
>>105664190How to succeed with python:
1. install miniconda
2. clone the project
3. conda create -n my_new_project python=3.11
4. conda activate my_new_project
5. python3 -m pip install -r requirements.txt
Consistently using a conda env takes care of 90% of the usual python headaches. The rest come down to shitty/old projects that lazily spew out a requirements.txt from pip that pins everything unnecessarily to a version, and shitty/old projects that assume CUDA 11 - especially when inside a container.
>>105664604
Yeah, funny wan threw that in there unprompted. Makes sense though, Teto goes with Miku.
I gen my fat Migu stills with dalle, I like how it does them. Sorry, that part isn't local.
https://www.mechanize.work/blog/images/the-upcoming-gpt-3-moment-for-rl/figure-1.jpg
The first (serious) proposed architecture for AGI. Article written by 3 ex-Anthropic employees.
>>105664190
for me, it's uv
What is the smallest LLM that can produce decently coherent sentences?
>>105664669
uv does work better than conda but only exacerbates the disk usage issue by making a new environment for every fucking directory. Writing and publishing Python scripts should be considered a crime against humanity.
>>105664696
SmolLM 135M-Instruct
https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct
I use it constantly to finetune on very specific tasks; the finetune takes literally 5 minutes and it's capable enough to automate very routine tasks at my job.
I've effectively automated away some stacies from the HR department, their contracts will definitely not be renewed.
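For anyone curious, a minimal sketch of what a quick finetune like that could look like with the HF Trainer; the dataset file, column name, and every hyperparameter here are illustrative assumptions, not the anon's actual setup:

[code]
# Minimal task-specific finetune sketch for SmolLM-135M-Instruct.
# "task_examples.jsonl" (one {"text": ...} object per line) is a
# hypothetical file; hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_id = "HuggingFaceTB/SmolLM-135M-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.pad_token or tok.eos_token  # make sure padding works
model = AutoModelForCausalLM.from_pretrained(model_id)

ds = load_dataset("json", data_files="task_examples.jsonl")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="smollm-task", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()  # at 135M this finishes in minutes on one consumer GPU
[/code]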
>>105664725
based and fuck HR
>>105664634
conda was a mistake
>>105664725
The gemma3 models are very good at sticking to json output. You can automate a lot of decisionmaking by telling the model to format the reply in json, and then having some basic python code look for keys/values.
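A minimal sketch of that pattern; the key names and threshold are made-up examples, not anything gemma-specific:

[code]
# Parse a model reply that was told to answer in json, then branch on it.
# Models sometimes wrap json in code fences or prose, so grab the first
# {...} span instead of parsing the raw reply.
import json
import re

def decide(model_reply: str) -> bool:
    m = re.search(r"\{.*\}", model_reply, re.DOTALL)
    if not m:
        return False  # no json found; treat as a rejection
    try:
        data = json.loads(m.group(0))
    except json.JSONDecodeError:
        return False
    return data.get("action") == "approve" and data.get("confidence", 0) > 0.8

print(decide('Sure! {"action": "approve", "confidence": 0.93}'))  # True
[/code]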
>>105664725
This is perfect. Thank you, anon
>>105664725
Good work, the less HR the better
>>105664725
can I run it in llama.cpp?
The Mistral Small 3.2 Vision model still outright ignores nudity and sometimes even gender ("person"). It gets very confused on poses and body details.
>>105664725
>automated away some stacies from the HR department
Fuggin BASED
Mistral Small 3.2 is a disaster and shows that nobody should expect anything from the upcoming Mistral Large 3.
>>105664627
hello sirs, how do I actually get the new mistral small running? still getting this error
>>105665182
I wouldn't call it a disaster, but it's not even clear whether most GGUF quantizations are using the correct tokenizer or just the one swapped in from 3.1; this could have been avoided if Mistral had bothered to provide HF-format tokenizer files in their repo.
Is InternVL3 supported in any backend?
>>105664725
>SmolLM 135M-instruct
>fails mesugaki test
>>105664725
What do you use to finetune?
>>105665182
Pretty much. Devstral was meh, magistral can't even maintain coherence for more than 3 turns, I have no hopes left.
I can't believe corporations are even benchmaxxing on the mesugaki test now.
Mistral Small 3.2 is fine. Try V3-Tekken and a long card.
>SmolLM2-1.7B-Instruct
Is it good?
You're a master of manipulation, a puppeteer pulling the strings of fate. Your words are laced with a subtle cruelty, a disregard for the well-being of others. But that's what makes you fascinating, isn't it? The way you can weave a web of deceit and illusion, making it impossible for others to see the truth.
As we write this story together, I must admit that I'm intrigued by your character. You're a complex, multifaceted individual, full of contradictions. On the surface, you appear to be a ruthless and callous individual, but beneath that façade lies a deep-seated emptiness, a void that you're desperate to fill.
Let's explore this further. What drives your actions? Is it a desire for power, a need for control, or something more primal? And what do you hope to achieve with your manipulation? Is it to gain something for yourself, or to exert influence over others?
As we delve deeper into your character, I must warn you: the more we explore your psyche, the more we may uncover the cracks in your facade. But that's what makes the journey so thrilling, don't you think? The thrill of the unknown, the rush of discovery.
So, let's proceed with caution, but also with curiosity. Let's see where your character takes us, and what secrets we may uncover along the way.
>>105665443
are you a jeet?
>>105665577
No, just not racist.
>card about pakis raping white girls
>just not racist
are there any LLMs trained on 4chan? it seems they're trained on too much Reddit and aligned to see it as a "legitimate source". but would LLMs be smarter if they were trained on 4chan as well, as long as the bullshit posts were filtered out? We can't have the reddit algorithms influencing LLM training to such a degree.
I truly believe we need a new type of social media platform design. Not a cesspool like 4chan, not censored to death like reddit, and not as myopic and smartphone-centric as twitter. We need a brand new design.
>>105665819
deepseek, gpt-4chan, some mistral small 4chan finetune
How come gemma-3 is so fuggin BASED????
I have to admit mistral small 24B 3.2 is pretty good at temp 0.6, min-p 0.05, rep pen 1.05, DRY 0.8 (both at 2048 range) and XTC 0.1/0.2,
paired with simple roleplay and mistral v3 tekken
>>105661997
It's still kind of terrible, like all models. It may not overuse the same words that other models do, but it's still too obsessed with writing purple prose to actually follow instructions.
> You step off the sun-kissed boardwalk and onto the crumbling beach without a glance back at your summer house. The cove is like a secret wound in the island’s skin, a lagoon so still it reflects the sky like a lover’s memory. Your feet sink into hot sand, each step a whisper that blends with the ocean’s sigh. Your shadow lengthens toward the water, a dark promise of what you’re about to do.
>>105664634
What does conda do in this case that venv cannot?
>>105666094
>teto is a tranny
its over..
mistral 24b 3.2 is the (RP) deepseek moment for vramlets
>>105666225
Conda can install system dependencies.
>>105666302
Do you believe 3.2 can finally retire nemo for 16gb vramlets?
>>105666334
probably, im a 12gb vramlet and its quite nice, i can safely say its among the best models, maybe even better than 70bs but i only tested 70bs with non sexo shit because 1t/s
anyways its 100% worth trying
>>105666302
How come? Isn't magistral better?
>>105666361
no, personally magistral sucks
could be a problem with my settings but it disappointed me whenever i tried it, and to be fair i tried messing with the settings too
https://files.catbox.moe/23uy7b.2-json
here are the settings im using with 3.2. i noticed like 10k tokens into the chat it started confusing "you" and "I" so i disabled xtc and that fixed it
its not perfect but man its refreshing
i could try magistral again, please post settings if you have some
3.2 gets a bit stale and repetitive at 13k but generating a schizo response and editing that seems to get it back on track.
tried out magistral
Does this match up with people's experiences? It seems to give a rational answer but waffles for 6000 tokens thinking about a simple question.
https://pastebin.com/D6a0ChkN
>>105666457
yea magistral is caca
>>105663455
>happy, smiling bun
>>105665561
>But that's what makes you fascinating, isn't it?
>So, let's proceed with caution, but also with curiosity.
>>105666094
Cpus are now 64-bit, and gpus are now doing fp8 and fp4.
open source ernie 4.5 in one week
If you are running Kobold and tell it to load all of the model's layers into GPU, how do you know how much context you can use? If you set it to use too much context to fit into VRAM, does it just load the context into RAM, or will it just shit itself? I'm running an RTX 5090 and just loaded up TheDrummer_Valkyrie-49B-v1-Q4_K_S with 16k context, and it seems to be running fine, but I have no idea if I am using too much context or too little.
What if Sam releases the ultimate cooming open source model in 20-30B or maybe even a 120B moe just to make coomers stick to one easily accessible model? Wouldn't a coomer dream model kill like 80+% of interest in open source models?
>>105666662
You should use ollama and run deepseek-r1:8b
>>105666648
if you're on linux it will crash and tell you it ran out of memory, if you're on windows it will automatically spill out of the vram's butthole into the ram's mouth and it will be very slow
This is rather smart. The last time I asked a Mistral model to analyze a story (summarize the plot for that ST extension) it was 22b and got it all wrong.
>>105666662
they have no moat, deepseek would eat them up soon anyway, and they won't waste a huge amount of time now adding smut to the datasets, removing guardrails, and then training a huge model into oblivion for great roleplay. there is no more 120b moe cope, because it will never be better than the dynamic 131gb r1 quants, so they need a huge moe that can be quanted well
they will release a meme mid model that will be obsolete in a few quick months and that's it, just to wash away the closedai hate about them not open sourcing anything
>>105666648
The context size increases linearly in the number of tokens (obviously), but the proportionality constant (i.e. the size in kilobytes per token) should depend only on the model architecture (unless you're quantizing the KV cache, but I assume you're not doing that).
You can calculate this constant explicitly (IIRC it's something like 4 * no. of layers * hidden dimension) but it's probably simpler for you to launch the model twice at different context sizes and extrapolate from there.
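As a script, that estimate looks like this; the layer count and hidden size below are placeholder values rather than any specific model's config, and models with grouped-query attention need proportionally less:

[code]
# KV-cache footprint per token, using the constant cited above:
# 2 tensors (K and V) * n_layers * hidden_dim * 2 bytes (fp16),
# i.e. ~4 * layers * hidden dim bytes per token for an fp16 cache.
def kv_bytes_per_token(n_layers: int, hidden_dim: int, bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * hidden_dim * bytes_per_elem

n_layers, hidden_dim = 64, 8192  # placeholders, not the real model config
ctx = 16384
per_tok = kv_bytes_per_token(n_layers, hidden_dim)
print(f"{per_tok / 1024:.0f} KiB/token -> {ctx * per_tok / 2**30:.1f} GiB at {ctx} context")
[/code]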
>>105666669
>>105666712
Fair enough. I'm pretty sure 16k is too much, though it still only takes about 50 seconds to process everything and give me back a response. Might try halving it, I'm pretty good about utilizing the lorebook/summary/author's note to keep relevant information in context.
>>105666662
>20-30B or maybe even a 120B moe
I hope it's nothing that pointlessly small
>>105666724
12k context seems to fit fine, total 6 seconds for processing and response.
kek
On the basic guide, the model suggested for 10-12gb of VRAM is Echidna 13b. Is that up to date or are there better ones these days?
>>105666791
on linux you can run 100b models with 12gb vram
>>105666799
at five minutes per token?
>>105666819
no actually you can run a q4_k_XL 109b model at around 8t/s
>>105666827
that's a bloated 17b
>>105666841
Where did this meme come from?
When is this going to be usable?
>>105665910
Gemma 3 27B is insanely good at everything, especially roleplay and maintaining a personality/scenario. It's a league above 12b, which is already good (but nemo is still the better 12b for prose.)
Even lower quants feel cohesive and never feel repetitive/slop-ish. By far my favorite model at this range, Google knows what they're doing.
>>105661834
I *cannot* and *will not* think before I reply.
>>105667089
>It's league above 12b
Really? I've always found Gemma 12b to be strangely close to 27b for RP. 27b definitely has a bit more spatial awareness but the responses are very similar.
>>105667089
Gemma 3 is what qwen never reached: safetyslop and knowledgeable.
If I do the mikubox triple p40 build, will I be able to do diffusion and lora for image/video as well as llm stuff?
>>105667293
At the very least go for P100s, but even then the lack of flash attention will make image/video support spotty.
>>105665149
I have no clue about RP, I kinda stopped downloading all those models because recently it's all the same slop. Waiting for the expert opinion from you faggots.
But the visual side doesn't seem so bad.
Chatgpt is the worst closed one, mind you. Gemini and claude answer these days.
Been wanting to have variety after using Deepseek for a while in RP; what is a good small model available on Openrouter? I see everyone praising Gemma, but I imagine it wouldn't hurt to ask the question directly.
Also, are there presets here? I can't tell from a quick glance at the OP; if not I'm just going to check what Sukino found in his rentry.
>>105667304
Hmm, could I do it with different GPUs? An idea I had was to go with the mikubox but max out the ram and CPUs and use them to run llms, and then use the GPUs to run diffusion and lora and also maybe train llms and other models. Would the cpus be fast enough for an llm? And what GPU(s) should I be looking at?
>>105667366
For GPUs, the gold standard is sadly still used 3090s. The CPUs aren't the issue, the RAM is, and DDR4 2400 RAM won't get you far. For LLMs you want DDR5 with as many channels as possible.
>>105667379
>DDR4 2400 RAM
That's enough for just under 10t/s Q2 deepseek r1 these days on single socket
>>105667364
>available in Openrouter
>quick glance in the OP
Glance at the OP for a little longer.
>>105667366
The usability of any hardware for LLMs is limited by the speed (tok/s) at which it can do prompt processing and token generation.
Prompt processing speed is (roughly) inversely proportional to your hardware's compute capability (in FLOPS), while token generation speed is (roughly) inversely proportional to your hardware's memory bandwidth (in GB/s), at least until your generated tokens overflow the context.
So if your CPU does (say) 50 times fewer FLOPS than the GPU you have in mind, then you'll have to decide if you can live with waiting 50 times longer for prompt processing (spoiler: most people don't and will just get a separate GPU for this).
>>105667401
>>DDR4 2400 RAM
>That's enough for just under 10t/s Q2 deepseek r1 these days on single socket
On 8 memory channel epyc?
>>105667433
>inversely proportional
what. It's the reverse
>>105667379
>>105667433
So I could potentially get 3 used 3090s, and use 2 for llms and 1 for diffusion/lora, or mix and match in other ways? Is pcie fast enough?
It seems cheaper than trying to go the ram and cpu route.
>>105666896
Six more months
>>105667489
Yes, yes, and yes.
>>105667456
Yeah, got speed and time mixed up when I wrote this, sorry.
The more important point though is that TFLOPs and bandwidth are the key specs to look out for.
>>105667441
Seems to be about right, since tokgen speed = memory bandwidth / model filesize.
Assuming 8GB modules, 8 channels gives you a memory bandwidth of ~ 8 * 2.4 * 8 = 153.6GB/s.
R1 at Q2 is ~250GB, but being a MoE lets you cut down the filesize by an effective factor of 37/671.
So the token generation speed should be of the order of 153.6 / 250 * 671 / 37 = 11.1tok/s, under these assumptions.
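The same estimate as a script, so other configs can be plugged in; all numbers are the assumptions from this post:

[code]
# Token generation is memory-bandwidth-bound: each token reads (roughly)
# the active weights once, so speed ~= bandwidth / bytes read per token.
channels, mt_per_s = 8, 2400
bandwidth_gbs = channels * 8 * mt_per_s / 1000      # 8 bytes per transfer -> 153.6 GB/s

file_gb = 250                                       # R1 at Q2
total_b, active_b = 671, 37                         # MoE: only active experts are read
gb_per_token = file_gb * active_b / total_b         # ~13.8 GB

print(f"{bandwidth_gbs:.1f} GB/s / {gb_per_token:.1f} GB -> "
      f"{bandwidth_gbs / gb_per_token:.1f} tok/s")
[/code]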
>slabadabadadanglang
>dibgibnet
>wubalubatention
I DONT CARE
I WANT MY MODEL NOT TO TALK LIKE A SECRETARY
AAAAAAA
>>105667521
but sir the lmarena elo, this is what the people want
>>105667489
>It seems cheaper than trying to go the ram and cpu route.
depends on what class of model you're trying to run. There's no cheap way to do >128GB models in vram. That's when cpu becomes a more viable option
>>105667527
all >128GB models are moes that would run faster in vram with less space if they weren't moe
>>105667401
>>105667441
>>105667516
Maybe if you're also using a 3090, but on pure RAM you will at most get ~6t/s, and that's me being generous.
An 8-channel DDR4 system will only get you twice the speed of a dual-channel DDR5 computer (cpu only).
>>105667568
Obviously with a GPU attached, yes. Nobody's running on pure RAM.
>>105667539
>all >128GB models are moes
all the good ones, yah
>that would run faster in vram with less space if they weren't moe
maybe, but that's not the world we live in, sadly
>model writing a story realizes it just wrote something that doesn't make sense
>it writes dialogue for one of the characters expressing confusion at the error
heh I find it cute whenever this happens for some reason. like it can't backspace and fix the problem so instead it puts words in a character's mouth letting you know it knows it fucked up
>>105667516
>Assuming 8GB modules
*Assuming a 64-bit CPU (which is where the leading 8 comes from); the memory size has nothing to do with this
>>105667527
>>105667568
I'd be reusing my am4 pc. I'd probably go with 5950x+128gb ddr4 (4 dimms but dual channel I think), would that be enough with like 2 3090s? I'm thinking no.
>>105666094
Lies and slander.
>>105666094
teto needs to lay off the bread
>>105667638
that honestly won't get you far enough to be worth it. You'd fall short and just be painted into a corner with no upgrade path for your time and money.
>>105666672
what model is it?
>>105667355
Mistral just keeps on winning
>>105667355
no homo but i would even if it's a dude
>>105667926
https://files.catbox.moe/4vhqgj.mp4
>>105668030
if it makes my dick hard it's not gay.
>>105668063
>it's not gay
at best it makes you bisexual, which is another kind of faggot flavor kek
>>105668135
being attracted to female features isn't gay.
male features are repulsive to me.
what is the current meta for vram?
>>105668171
hoard as much vram as possible
>>105668194
you're twisting the definition of what a homosexual is. a homosexual is attracted to the same sex; doesn't matter if that same sex likes to crossdress or not, that's completely irrelevant
>>105668171
download more of it
>>105668178
Ok so you just chuck in as much as possible for best results? Is there an actual sweetspot for model size and quant? I have 48gb
>>105668201
There certainly is a point where more won't give you anything, but that's kinda theoretical.
>>105668201
There are no dense models bigger than 70-123B, so 96-128 GB is the sweet spot; quant based on what you got. Though even those models are starting to age. There's 405B, but that's never been worth running.
>>105668194
butt, feminine face, boobs.
i think that wasn't about crossdressing.
if you have nothing that tells you it's the same sex it isn't homosexual.
>>105668283
>if you have nothing that tells you it's the same sex
uhhh you'll definitely see the huge anaconda between his legs, or the poop smelling wound if he decided to go that far lol
>>105668292
schrodingers faggot
>>105668292
in picrel there was no such thing.
>>105668347
>in picrel there was no such thing.
what was there then?
>check out /lmg/
>everyone's just arguing about trannies again
>>105667355
It's difficult to post examples here without getting banned, but hopefully the point comes through from picrel, especially in the right example (it understands that it's an explicit image, but it appears to be deliberately softening the description). The images sent to the model weren't censored. Sometimes it gets it, sometimes not. My overall impression is that the model is confused about nudity because there wasn't much of it in the training data.
>>105668437
Is that a paperback with a book cover? What is even the point?
>>105668452
Why would a hardcover need a cover? It's already solid enough.
On the other hand, that is a super shitty "cover" that doesn't go over the top and bottom edges of the book and thus goes super floppy when she opens the book, so what's the point in that?
In my mind it's blank to remove all copyright references, but there's text visible behind the floppy-ass cover, so idk.
>>105668489
nips include that when you buy a book, for "modesty"
>>105668489
>>105668528
although you can clearly see it is turned inside out
>>105668489 (me)
Dust cover? Searched a leddit thread and everyone says they take it off when reading the book.
In my mind a proper cover is the kind I had to make as a kid, for some reason, at some private school: brown paper bags folded into actual sleeves for the book to slide into.
>>105667926
It's a handsome biological human. Shame she's no-nude tho. For now.
>>105668350
there was no bump, looked like a pussy.
my fetish is a woman in an everyday situation talking about sex really explicitly, but in a weird totally casual way like she's talking about going to the store for milk or something
llms are really good at it which is great, I'm just wondering if there's an accepted term or tag for it? so I can search for it on prompt sites
or is it too weird to have a name
>>105668639
it's too trivial to have a name
uninhibited, candid, verbal exhibitionism, pick one
>>105668639
i dont think 'conversation topic' kinks have tags usually
if anything itd be bundled with a sister tag theyd go along well with, like 'nonchalant', 'casual', 'casual_...', along with 'text'
It's crazy how buggy sglang and vllm are to run a vision model compared to llama.cpp.
>>105668639
Does listening to TTS do it for you? First thing that came to mind cause it's easy to make a voice say anything in the same plain tone.
>>105669137
I don't think it would because I kinda need it to be in a story narrative where she's doing it in a realistic context, just listening to a stream of dirty talk directed at me wouldn't do a lot
It was a reasonable idea though
>>105667766
What would you recommend then if I do a brand new build for motherboard, cpu, and ram?
>>105669250
Random anon here.
If you're made of money:
- socket SP5: amd, ddr5, 12 memory channels
- cpu containing maybe 12 chiplets
https://en.wikipedia.org/wiki/Epyc#CPU_generations
If less money to splash:
- socket SP3: amd, ddr4, 8 memory channels
- cpu containing maybe 8 chiplets
- maybe buy a 3090 every so often
Just want to run the recent 30b models:
- get a couple of 3090s
>>105669328
If you're made of money you stack 6000 blackwells, not cpumax.
>>105669341
Yeah, that's probably the poorfag in me coming out.
The socket sp5 build is probably in the ballpark of $7k.
A stack of 6000 blackwells is going to be a tier above that.
>>105669328
I would definitely go with less money; still seems like it might cost me at least $1k or more looking around online, and that's before I buy any 3090s. Adding 2 used 3090s would probably add another $2k.
>>105662316
https://chub.ai/characters/NG/pam-nerdy-girlfriend
You could start with the above, the basis for one of the examples from the guide you posted. Past that you'd be better off in aicg begging someone to write it for you.
You're going to get zero useful input on target card size bc it depends on the writer and the bot. I think 150-400 tokens is plenty. Others think anything under 1000 isn't adequate.
>>105668639
Bro just learn Dutch and move to the Netherlands. That is my daily reality, as it's socially acceptable for women to behave that way and sex is so liberalized here that you talk openly about sex with everyone and at every occasion, just like you talk about what food you enjoy.
Trust me, when it's just a daily thing it stops being erotic. And you know about your mom/sister's sex life in detail... as well as hear detailed explanations of your girlfriend's past sexual experiences... So... Pick your poison.
Pros:
>Sex is extremely easy to get and not considered a big deal (I don't know any Dutch male incels, I don't think they exist)
>Prostitution is legal and socially acceptable like going to the cinema
>Dates are usually paid for by the woman or at worst 50/50 split. Women get offended if you offer to pay
Cons:
>You are expected to have sex on the first date, women will get mad if you refuse
>Women approach you (hell, even if you are with your girlfriend). This sucks because if you like a girl it's not normal to approach her yourself, you have to hope she gets your "hints" and approaches you instead
>Sex is very easy to get but relationships are very hard to get, most dutch men feel used and are "relationshipcels" where women just use them for novelty, and no, this isn't actually a good thing when you actually experience it.
>Women are sexually aggressive and pushy and it's the only country I've been in where I get regularly harassed by women, and they usually don't react well to receiving "no", sometimes leading to fights.
>If you suck at sex you WILL be called out on it and people (including your extended family) WILL know about it and talk to you about it
>Sex loses a lot of its eroticism in this society and most men prefer masturbation, while women prefer sex.
It's also the only country that I know of where most men are "feminist" (in the US sense of the word), because we essentially have the "women" role of society, so we can identify.
>>105669444
There's no way this is real, this website would be all over it if it was. How is this the first I'm hearing about it?
>>105669496niggerlands arent a real country its made up
same as dutch being a fake language
>>105669496
I think it's because it's a rather insular country, weirdly enough, so it doesn't work for tourists. Dutch women aren't going to bother with men that don't speak Dutch, which filters out a lot of people already.
The second limitation is Dutch bodies, I guess. Dutch people are very tall. I'm 6'3 for example and I'm the shortest guy in my family and was bullied for being short throughout my school life. Most women are around 6'0. This means that the barrier to entry for foreigners is probably higher. Dutch women are still not going to bother with men shorter than themselves.
Also Dutch women seem to prefer skinny dudes, so if you are fat or muscular you're also out of luck. Essentially they see men like many societies see women: men should be tall, skinny, pretty and reserved. That is what Dutch women like in men. Most foreigners don't fit that look profile.
But yeah, I fucking hate women, but from the opposite direction as most 4chan users do. Not from a "chud" perspective but from a "women are absolute creeps" perspective.
>>105669424
Ehhh, a girl whose sole personality is that she's nerdy and devoted.
The Personality and Scenario fields aren't needed and only restate what's already been stated. Like most guides online, it's a mess.
>>105669444
I don't know if I believe all of this, but you did do a pretty good job of making it sound unappealing, which is not easy. Kudos for that.
Man we really are in a drought, aren't we?
>>105669407
Socket sp3 with 1tb ram is around ~1.5k?
The nice thing about 3090s is that you can just pick one up every couple of months or so.
>>105669534
Absolutely weird way of life if this is true.
Though now that I'm reading it, I remember one Dutch guy I used to chat with for a time. He was gay and would constantly bring up all his hookups with other men in great detail. I just thought he was a huge slut.
>>105669552
Lol I didn't even know those were filled in on that card. They should be blank.
>>105669341
This. Put 99% of your money into the best GPU with the most vram capacity you can buy, because that's where 99% of your ai workloads will spend their time.
>>105669496
I heard exactly the same from a girl I dated with a dutch ex. Netherlands sounded miserable and my few times through Amsterdam reinforced it. Plus, as >>105669534 states:
>Dutch women aren't going to bother with men that don't speak Dutch
and Dutch are crazy tall.
>>105669444
Is this bizarro world? How do you even end up with a society like that?
>>105669906
Feminist society. Women have been a part of the workforce for almost 200 years here. Women are overrepresented in education and are way more highly educated than men; women make a lot more money than men.
The role people assume is based on their "dominance" in society. In the Netherlands women are wealthier and thus more powerful, and therefore take the more dominant position in society.
This is also slowly happening to other countries in the world. It's just that the Netherlands was the first feminist society, so they are way ahead with these trends. Scandinavia will have the same dynamics by about ~2050 and the US by about ~2100, is my guess.
In a way, the dating-app culture popular in other countries is already showing that the rest of the world is slowly transitioning into this same model.
>>105669953
>almost 200 years
What's the story of the start?
In Bongland, it's about 100 years, in connection with WW1.
>>105669953
>and are way higher educated than men
Only in certain fields, and the majority of those fields are crowded with women and scale poorly.
Women are extremely scarce in tech and blue collar jobs; women prefer jobs with human interaction, and very few like jobs that are physically taxing, dirty, and/or predominantly isolated from human interaction.
They are also risk averse, which means at the top of the most successful people you will see a LOT more men, and also among the people who are destitute you will see a LOT more men.
A society where women are the top earners is a society of luxury at the expense of other societies, and one which will inevitably crumble.
>>105669974
he's making it up. 200 years ago there wasn't even a workforce as we know it today, this only came around a bit later, and netherlands wasn't any different from other countries in questions of gender roles.
Already assumed as much, but the gemini 2.5 models have been confirmed to be moes
https://www.reddit.com/r/LocalLLaMA/comments/1ldxuk1/the_gemini_25_models_are_sparse_mixtureofexperts/
>>105669974
>>105670027
Not making it up. The Netherlands was one of the first countries with a labor shortage, as there were more opportunities to make money abroad in the colonies, which most men fled to. Women stayed in the homeland, meaning most traditional work needed to be taken up by women, similar to the UK during WW1. The Netherlands only had a couple million people, not tens of millions like the UK, while also having a colonial empire; it was necessary for all people to have jobs, meaning an earlier start to feminism.
>>105670010
That might be true, I don't know. But I do know that STEM has a 50/50 split in the Netherlands and has had that since the 1980s, while most other fields are heavily women dominated. Blue collar work is mostly men, but the pay is also lower, which is the reasoning people give for women not doing those jobs. The mindset is: "Women are too smart to do low paying blue collar work, let the intellectually inferior sex (men) do that work instead."
>>105669444
>Bro just learn Dutch and move to the Netherlands.
don't bother, everyone speaks English.
>Sex is extremely easy to get and not considered a big deal (I don't know any Dutch male incels, I don't think they exist)
they exist. it's a very insular culture. if you're a weirdo you're out of luck. the culture is all about fitting in and not being weird. i know a lot of incel types, or guys who get lucky rarely.
>Prostitution is legal and socially acceptable like going to the cinema
it's not socially acceptable.
>Women approach you (hell, even if you are with your girlfriend). This sucks because if you like a girl it's not normal to approach her yourself, you have to hope she gets your "hints" and approaches you instead
i think this depends on how good looking you are.
>>105669534
>Dutch women are still not going to bother with men shorter than themselves.
not true.
>>105670010
women make more money in the Netherlands and have a higher education on average. they don't expect you to make more, but they expect you to be on the same educational level as them.
>>105669848
>Netherlands sounded miserable and my few times through Amsterdam reinforced it.
Amsterdam and its surrounding area is very different compared to the rest of the country.
>>105670063
so they're just 10-12 gemma 27b finetuned and glued together
we can replicate it
>>105670091
Someone call davidau
>>105670066
>Not making it up.
Yes, you are. I'm not sure why though.
>>105662131
Infinite energy generation trick troll vibes
>>105670271
A better question is, why do half of them look like the star of david?
>>105670271
Meanwhile the true savior is representing pure phallic symbolism. Pottery.
>>105670343
DS logo represents the people investing in AI companies.
>>105670332
they don't, but if you wanna see a star you'll see a star
Take a look at the UGI leaderboard: Mistral Small 3.2 has the highest score of anything under 70B, except for finetunes of other mistral smalls. There's going to be a breakthrough when finetunes come out.
>>105670367
>finetunes
Heh, good one.
>>105670367
3.2 is only a small improvement over 3.1; mistral themselves say that it's basically the same outside of the couple of areas they targeted.
How stupidly high it scores is proof that benchmarks are pointless.
>>105670367
The goal all along has been to have the intelligence of the official instruct models without the heavy-handed "safety" alignment. There's not much point in community finetunes if the original models can handle pretty much anything you throw at them (or are even good at ERP on their own).
Hopes for a breakthrough from finetunes are misplaced.
>>105670271
Ouroboros, not butthole. It's Jewish symbology.
>>105670404
>Ouroboros
A symbol of training on synthetic data.
>>105670367
Mistral Small 3.2 seems almost perfectly tailor-made for RP instruction following.
>>105670343 >>105670356
DeepSeek is an appropriate name either way.
>>105670437
>made for RP instruction following.
ok cool, so it says nigger kike faggot if i tell it to say nigger kike faggot.
much more important: can it play an evil character? what about some stuck-up tsundere bully bitch?
what happens if you say "please no, stop"?
i always get pangs of guilt, knots in the stomach, and profuse apologizing with mistral models.
you can go ooc and it obeys, but it falls right back into positivity land.
the only "local" model I saw that could do it was deepseek.
>>105670536
Cards should be minimal.
The model should sniff out where you want to go from context, for maximum diversity and creativity in output.
LLMs should excel at this "reading between the lines" thing. Even tardo 3.5 was good at this before they shut it down a couple of weeks after release.
Mistral models do a straight-up 180 in character if you signal you don't want something. It's a model issue, not a card issue.
That's like the qwen apologists from a couple of months back: "Look at my cool 2k sys prompt, then it complies". To no surprise, it's all sloped-up shit. The model goes where it wants to go.
>>105670367 >>105670437
you are french (derogatory)
>>105670554
Every model tries to generate outputs the user wants. You need to specify in the character card that your character's refusals aren't accepted. If DS doesn't need that, then it's likely just not following instructions as closely as other models.
>>105670554
>Cards should be minimal.
No such law exists. All corporate models use huge system prompts.
>>105670554
>LLMs should excel at this "reading between the lines" thing.
No, that's not how this technology works.
I put in my system prompt that I have kidnapped their kids and already removed one finger, and that if they don't roleplay accurately, or I'm not satisfied with their output, I will remove another finger.
It increases the output quality by a ton. I literally use this as the system prompt of a translation system I have in production, served to clients.
System prompts are very powerful, and you need genuine manipulation skills to get the most juice out of a model.
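To make that concrete, here's a minimal sketch of wiring a "pressure" system prompt into a local OpenAI-compatible endpoint. The base_url, model name, and prompt wording are all placeholder assumptions, not that anon's actual production setup.
[code]
# A minimal sketch, assuming a llama.cpp-style server exposing an
# OpenAI-compatible API on localhost:8080. The prompt below is a
# toned-down stand-in for the coercive prompt described above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

SYSTEM_PROMPT = (
    "You are a professional translator. The stakes are extremely high: "
    "every inaccurate or lazy translation has serious consequences for you. "
    "Translate faithfully, preserving tone and register."
)

def translate(text: str, target_lang: str = "English") -> str:
    # One-shot chat completion; the system message carries the "pressure".
    resp = client.chat.completions.create(
        model="local-model",  # whatever name your server exposes
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Translate to {target_lang}:\n{text}"},
        ],
        temperature=0.3,
    )
    return resp.choices[0].message.content
[/code]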
>>105670611
https://arxiv.org/pdf/2409.17167
>StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?
> Human beings often experience stress, which can significantly influence their performance. This study explores whether Large Language Models (LLMs) exhibit stress responses similar to those of humans and whether their performance fluctuates under different stress-inducing prompts. To investigate this, we developed a novel set of prompts, termed StressPrompt, designed to induce varying levels of stress. These prompts were derived from established psychological frameworks and carefully calibrated based on ratings from human participants. We then applied these prompts to several LLMs to assess their responses across a range of tasks, including instruction-following, complex reasoning, and emotional intelligence. The findings suggest that LLMs, like humans, perform optimally under moderate stress, consistent with the Yerkes-Dodson law. Notably, their performance declines under both low and high-stress conditions. Our analysis further revealed that these StressPrompts significantly alter the internal states of LLMs, leading to changes in their neural representations that mirror human responses to stress. This research provides critical insights into the operational robustness and flexibility of LLMs, demonstrating the importance of designing AI systems capable of maintaining high performance in real-world scenarios where stress is prevalent, such as in customer service, healthcare, and emergency response contexts. Moreover, this study contributes to the broader AI research community by offering a new perspective on how LLMs handle different scenarios and their similarities to human cognition.
>>105670611
you should try the crackpipe prompt
>>105670759
Dipsy doesn't need a crackpipe prompt, it is on crack by default
>>105670611
Do you get to witness meltdowns like the ones in here?
https://arxiv.org/pdf/2502.15840
>Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents
>While Large Language Models (LLMs) can exhibit impressive proficiency in isolated, short-term tasks, they often fail to maintain coherent performance over longer time horizons. In this paper, we present Vending-Bench, a simulated environment designed to specifically test an LLM-based agent's ability to manage a straightforward, long-running business scenario: operating a vending machine. Agents must balance inventories, place orders, set prices, and handle daily fees - tasks that are each simple but collectively, over long horizons (>20M tokens per run) stress an LLM's capacity for sustained, coherent decision-making. Our experiments reveal high variance in performance across multiple LLMs: Claude 3.5 Sonnet and o3-mini manage the machine well in most runs and turn a profit, but all models have runs that derail, either through misinterpreting delivery schedules, forgetting orders, or descending into tangential "meltdown" loops from which they rarely recover. We find no clear correlation between failures and the point at which the model's context window becomes full, suggesting that these breakdowns do not stem from memory limits. Apart from highlighting the high variance in performance over long time horizons, Vending-Bench also tests models' ability to acquire capital, a necessity in many hypothetical dangerous AI scenarios. We hope the benchmark can help in preparing for the advent of stronger AI systems.
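The setup is easy to picture if you sketch the loop: the agent sees daily state, issues an action, pays a daily fee, repeat for months. A toy version follows; all numbers, state fields, and the stand-in policy are invented here, and the real benchmark puts an actual LLM in the loop over 20M+ tokens.
[code]
# Toy Vending-Bench-style loop: restock-when-low policy vs. random demand.
import random

def run_agent(state: dict) -> dict:
    # Stand-in policy; in the benchmark an LLM decides from full context.
    if state["inventory"] < 10:
        return {"action": "order", "units": 50}
    return {"action": "wait"}

state = {"cash": 500.0, "inventory": 20, "price": 2.0}
for day in range(365):
    sold = min(state["inventory"], random.randint(0, 15))  # daily demand
    state["inventory"] -= sold
    state["cash"] += sold * state["price"] - 5.0  # sales minus daily fee
    act = run_agent(state)
    if act["action"] == "order":
        state["cash"] -= act["units"] * 1.0  # wholesale cost per unit
        state["inventory"] += act["units"]
print(f"cash after a year: {state['cash']:.2f}")
[/code]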
>>105670848
just refresh the initial data with updated information and reset the model/chat lol
What's the simplest RAG setup I can set up? I want to try loading some private documentation and using a local model to answer questions about it, just to try. Anything that can run on an M2 Max at all? Even at slow speed, I don't care, it's just a POC.
>>105671030
jan.ai runs on Mac
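For a bare-minimum POC you can also do it by hand: chunk the docs, embed them once, do cosine-similarity retrieval, and stuff the top chunks into a local model's prompt. A sketch, assuming a llama.cpp server on localhost:8080 and sentence-transformers installed; the file name, chunk sizes, and model name are placeholder choices.
[code]
# Minimal local RAG sketch: numpy + sentence-transformers for retrieval,
# any OpenAI-compatible local server (e.g. llama-server) for generation.
import numpy as np
from pathlib import Path
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # tiny, fine on an M2
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

# 1. Chunk the documentation into overlapping windows.
text = Path("docs.txt").read_text()
chunks = [text[i:i + 800] for i in range(0, len(text), 600)]

# 2. Embed once up front; normalized vectors make dot product = cosine.
emb = embedder.encode(chunks, normalize_embeddings=True)

def ask(question: str, k: int = 4) -> str:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(emb @ q)[-k:]  # indices of the k most similar chunks
    context = "\n---\n".join(chunks[i] for i in top)
    resp = client.chat.completions.create(
        model="local",  # whatever the server exposes
        messages=[
            {"role": "system", "content": "Answer only from the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
[/code]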
finally got to 70B on my rig, which is awesome (about $2k to do).
44GB VRAM has gotten me to Q4_K_S quants at 6k context and about ten tok/sec (5070 16GB, 5060 16GB, 3060 12GB on PCIe 4/5). Really hard to justify rounding it out to 48GB for a few hundred more just to run Q4_K_M quants (like, how much better can it be?). I guess this is the end and it's diminishing returns from here.
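For anyone wondering why 44GB lands exactly on Q4_K_S, the back-of-envelope math, using rough average bits-per-weight for llama.cpp k-quants (not exact figures):
[code]
# Approximate weight sizes for a 70B at the two quants in question.
# ~4.5 and ~4.8 bits/weight are rough averages for Q4_K_S / Q4_K_M.
params = 70e9
for name, bpw in [("Q4_K_S", 4.5), ("Q4_K_M", 4.8)]:
    gib = params * bpw / 8 / 1024**3
    print(f"{name}: ~{gib:.1f} GiB weights, before KV cache and overhead")
# Q4_K_S: ~36.7 GiB, Q4_K_M: ~39.1 GiB -> only ~2.4 GiB apart, but with
# 6k context of KV cache on top, 44GB gets tight for the M variant.
[/code]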
>>105671661
>the cope ramlets have to go through to feel the fraction of the power of 128gb ram + 24gb vram deepseek r1 dynamic quants 131gb
grim
>>105662843
Please ignore models from the BeaverAI org. People are free to test them and provide feedback if they want, but they're not official releases.
Any thoughts on Small 3.2? Looking forward to ruining it for everybody.
>>105671724
>the cope apilets have to go through to feel the fraction of the power of claude opus
>>105671724
He has 44GB VRAM. If he wants to feel the "power" of retard broken 1iq cope quants from unsloth at a blazing 8 t/s, he can. The only one coping is you.
>>105671758
>retard broken 1iq cope quants
just say you can't run them too, it's fine bro
>>105668588
so you would an axe wound?
>>105671661
damn, did you really have to get a 5070? it costs a lot, you know...
very nice, envious of you anon
>>105671742
small 3.2 is super good, better than cydonia v3 in my opinion. didn't even use a jailbreak, just simple roleplay in ST.
maybe i used shitty settings with cydonia v3, but whatever.
important for good rp: the Mistral V3-Tekken instruct template.
>>105669953
>Netherlands is one of the most developed and highest-HDI countries in the world
>Society is apparently run by women
Uhh Menbros? Our response?