
Thread 106323459

346 posts 82 images /g/
Anonymous No.106323459 >>106324547 >>106328454
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106316518 & >>106311445

►News
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335
>(08/14) Canary-1B v2 ASR released: https://hf.co/nvidia/canary-1b-v2
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106323466
►Recent Highlights from the Previous Thread: >>106316518

--Balancing RAM capacity and bandwidth for running large MOE models on consumer hardware:
>106316609 >106316625 >106316631 >106316643 >106316741 >106316758 >106316765 >106316952 >106316827 >106316852 >106316873 >106316965 >106317012 >106317137 >106316875 >106316913 >106316932 >106316964 >106317036 >106317200
--Meta abandons open-source AI strategy amid internal chaos and strategic uncertainty:
>106318132 >106318221 >106318266 >106318237 >106318250 >106318312 >106318274 >106318298 >106318348 >106318373
--Debate over whether LLM improvements stem from data scale or training engineering tradeoffs:
>106322041 >106322070 >106322121 >106322276
--AMD GPU viability and consumer hardware limits for running large language models:
>106317961 >106317997 >106318045 >106318122 >106318170 >106318215 >106318374 >106319330 >106319723 >106318206 >106318098
--Multi-GPU performance degradation on Windows despite additional 3090:
>106317223 >106317243 >106317290 >106317328 >106317461 >106317490 >106317507 >106317542 >106317579 >106317623 >106318018
--Sourcing and cleaning video game voice samples for high-quality TTS:
>106321619 >106321822
--OAI's accessibility claims versus actual model performance and strategic contradictions:
>106318758 >106318830 >106318842 >106318934 >106321510 >106321713
--OpenAI launches ₹399 ChatGPT GO plan in India as regional pricing test:
>106321652 >106321659 >106321673 >106321712
--Jan-nano-MCP excels on SimpleQA but faces scrutiny over web reliance and benchmark relevance:
>106317235 >106317253 >106317297 >106317473 >106317475
--Managing AI code generation with planning, chunking, and model specialization:
>106316661 >106316684 >106316706 >106316769 >106317498 >106317553 >106317811
--Logs:
>106321923 >106322081 >106322580
--Miku (free space):
>106322339 >106323131

►Recent Highlight Posts from the Previous Thread: >>106316522

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106323471 >>106323882 >>106324183 >>106328350
AGI in 2030 saaar
Anonymous No.106323481 >>106323558
AGI was always bullshit from the diseased mind of sama and dario
Anonymous No.106323491 >>106323506 >>106323516 >>106323526 >>106325716
What am I getting into if I use an IQ2_XXS model? Will it even talk?
Anonymous No.106323496
>llms coming remotely close to "ai" or "agi"
when will these mouth breathing xitter retards learn...
Anonymous No.106323506
>>106323491
Depends on the size.
It's pretty amazing that it works and is at all coherent but the drop in intelligence tends to be very, very noticeable.
The question is whether it's better than q4, q6, or q8 of a smaller model.
Past a certain size, probably yes.
Anonymous No.106323516
>>106323491
Depends on the size of the original safetensors.
iq2_xxs of a 4b? Completely unintelligible, probably going to get stuck in loops on the first response.
iq2_xxs of a 30b? Probably somewhat functional, but you're likely going to get worse results than a q4+ quant of a 12b.
iq2_xxs of a 123b? Actually pretty decent and likely to give you better results than a 70b at q4
Anonymous No.106323526 >>106323820 >>106326307
>>106323491
low quants are like a lobotomy
they can still function but they lose a lot
in my translation benching, when I compare multiple quants of the same model, the lower quants lose vocabulary (they use what could be the closest semantic match to the word they forgot, that is, if they get anything right at all) and make mistakes over pronouns/gender more often (in text where the information does exist in some fashion), etc.
even Q4_K_M incurs significant loss btw, don't believe the myth that it's "good enough", it absolutely isn't
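If you want to reproduce this kind of quant A/B yourself, a minimal sketch with llama-cpp-python looks like this (the model filenames and test prompt are placeholders):

# pip install llama-cpp-python; filenames below are placeholders
from llama_cpp import Llama

PROMPT = "Translate the following Japanese to English:\n<paste a test sentence with uncommon vocabulary here>"

def run(model_path: str) -> str:
    # Load one quant at a time; keep temperature low so differences come from the weights, not sampling
    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0.1,
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]

for path in ("model-Q8_0.gguf", "model-Q4_K_M.gguf"):
    print(path, "->", run(path))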
Anonymous No.106323532 >>106323555 >>106323574 >>106323789
I'm reading anons stating that DS V3.1 replaced not only V3-0324 but also R1-05.
Is this documented anywhere? I'm not seeing any announcements from DS that validate it.
I wouldn't be surprised; the -chat and -reasoner endpoints now spit out responses that are very similar, at least for rp, and DS is so piss poor at documentation that they'd never bother announcing it.
Anonymous No.106323555 >>106323671
>>106323532
https://api-docs.deepseek.com/updates
> nothing about new model
Why are these guys so bad at documentation ffs
Anonymous No.106323556
>updated 22 hours ago
It's over.
Anonymous No.106323558 >>106323588
>>106323481
This
LLMs are essentially an alternate version of the internet. Only whereas internet V1 is sparse, in the sense that there were "gaps" with no information, V2 is dense, in the sense that you can interpolate between nodes to get relevant information. Such a thing is still useful, for entertainment and for things we know to be human-solvable, but ultimately it isn't going to create new information and is still limited by the breadth of the human knowledge fed into its dataset
Anonymous No.106323570 >>106323580
tried to set up whisper but i guess im too retarded, it kept throwing errors at me when i made a drag and drop script for audio
Anonymous No.106323574 >>106323582 >>106323608
>>106323532
they removed the mention of R1 in the "deepthink" button of their official chatui
and there are new special thinking tokens in the new v3
it's all technically circumstantial until they release the model on HF but I'm convinced it's indeed a hybrid and they killed R1.
Anonymous No.106323580 >>106325946
>>106323570
it worked for cpu but whenever i tried to use cuda, thats when the problems started
Anonymous No.106323582
>>106323574
Info on the tokens https://www.reddit.com/r/LocalLLaMA/comments/1munvj6/the_new_design_in_deepseek_v31/
Anonymous No.106323588 >>106323598
>>106323558
It's funny how we've come full circle, basically. What "AI" is good for is retrieving information from the internet and doing it with no ads, something we already had 20 years ago with a simple google search + adblock. I wonder what's the over/under on when these chatbots will undergo the shittification we now have on the regular web? Seeing how hard everyone is stalling, I think it's pretty close
Anonymous No.106323598 >>106323605 >>106323630 >>106323667 >>106323696 >>106324459
>>106323588
https://blog.cloudflare.com/introducing-pay-per-crawl/
Coming to your search button soon!
Anonymous No.106323605 >>106323634
>>106323598
>We’re excited to help dust off a mostly forgotten piece of the web: HTTP response code 402.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402
>402 Payment Required
Thanks, this sounds great!
Anonymous No.106323608 >>106323848
>>106323574
Yeah, I just confirmed that they merged per DS CS. There's discussion on HF now. Since, you know, DS can't release any documentation, we get screenshots of cut-and-pasted chats in Chinese instead.
>>106323578
Anonymous No.106323630
>>106323598
nemotron gods just keep winning
Anonymous No.106323634
>>106323605
>402 Payment Required
Is this what the founders of the Internet foresaw?
Anonymous No.106323658
>hybrid reasoning
Now that DeepSeek's out of the race, who's gonna give us the next R1 moment? I guess it can only possibly be Kimi, right?
Anonymous No.106323661
Just got off an 11-hour call with GPT-5 via voice mode. We talked about quite a lot regarding the state of LLMs. It's not looking good lads.
Anonymous No.106323665
I just want a good coder model I can use as an agent locally on a normal consumer GPU, too much to ask apparently.
Anonymous No.106323667
>>106323598
GOOD.
I run a 20+ year old lmao phpBB. I finally shifted DNS to cloudflare and shut the door, eliminating 99% of the traffic to the site. All bots. Fucking thousands of them, crawling a board about motorcycles that haven't been produced in 30 years.
Now the site runs 10x faster and actual humans can use it.
Anonymous No.106323671
>>106323555
because they didn't hire 999 gorillion jeets to sit around updating the readme
Anonymous No.106323696 >>106323713
>>106323598
Time to dust off the 99% of web data that got previously discarded from the pretraining datasets.
Anonymous No.106323713
>>106323696
No that's toxic sewage! Please just pay for safe and ethical information.
Anonymous No.106323716 >>106323744 >>106323768 >>106323769 >>106323907
>Unironically can advance my ERP story with DavidAU model
Yeah, fuck Rocinante
Anonymous No.106323744 >>106323765
>>106323716
I'll bite. Which model are you talking about exactly?
Anonymous No.106323765 >>106323800 >>106323836 >>106323838 >>106323863 >>106326007
>>106323744
I will get persecuted for this
L3.2-8X4B-MOE-V2-Dark-Champion-Inst-21B-uncen-ablit
Anonymous No.106323768 >>106323870
>>106323716
>davidAU
I've never used a model by him that didn't spit out gibberish.
Anonymous No.106323769 >>106323791 >>106323907
>>106323716
>936 models
kek what the fuck
Anonymous No.106323789
>>106323532
If on the site I enable thinking and ask it what model it is, it always tells me DeepSeek-V3. It used to tell me R1.
Anonymous No.106323791 >>106323809
>>106323769
He's the spirit of the llama2 era personified.
Just do whatever the fuck, slap a funky name in it, and call it a day.
Anonymous No.106323797 >>106323826
china got really quiet after gpt-oss dropped
Anonymous No.106323800
>>106323765
Alright, I'm downloading it, mr. anon. You better not have wasted my time and money (I live in a fourth world country).
Anonymous No.106323808
2024 is when LLMs died because it's the last year where you could get relatively pollution-free web data
Anonymous No.106323809
>>106323791
He's machine learning with no reward function.
Anonymous No.106323820 >>106323873
>>106323526
I use imatrix weights :)
Anonymous No.106323826
>>106323797
>>106151849
Anonymous No.106323836
>>106323765
actually, finetroon recs are rare on here because people have hyper-sperg outs for some reason. Is it better than mistral small?
Anonymous No.106323838 >>106323906
>>106323765
I think you are either trolling or under the placebo effect. It's a brain damaged 4b sloptune you are talking about.
Anonymous No.106323848
>>106323608
TIL /wait/ still exists
Anonymous No.106323863
>>106323765
>L3.2-8X4B-MOE
That's so fucking funny.
Anonymous No.106323870 >>106323881 >>106323894 >>106323907 >>106323915
>>106323768
Did you follow its autistic instructions?
Like picrel, for example; I didn't put the prompt in, but it's somehow there. The only thing I followed was copy-pasting this system prompt:
Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
Anonymous No.106323873 >>106323878
>>106323820
and? Q4_K_M still ain't good enough with or without imatrix
Anonymous No.106323878 >>106323915
>>106323873
q4km is gay
iq4nl is where its at
Anonymous No.106323881
>>106323870
damn that's pure schizo
Anonymous No.106323882
>>106323471
the only one of these that actually signals anything about AI is the GPT-5 flop
anthropic has always been like that, US wants to keep china dependent on nvidia, meta had a lot of incompetent wastrels, xAI is at the whims of elon, and chamath is a retard by nature
Anonymous No.106323894
>>106323870
I mean, at least the guy's trying things
Anonymous No.106323906
>>106323838
Maybe. I don't play around with models much. The only 4b models I tried were the original gemma-3 and qwen3 with a JB. Both keep stalling the ERP story.
Anonymous No.106323907 >>106323929 >>106327271
>>106323870
>>106323769
>>106323716
We must remind.
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
Anonymous No.106323915 >>106323977
>>106323870
that's some absolutely schizo shit
>>106323878
oh, you're one of those who think imatrix only refers to the i-quants
actually, it fucking doesn't
and i quants are strictly inferior to the K_M in every way except size
it's the quants for people desperate to squeeze 2% more vram in
also slower t/s
Anonymous No.106323919 >>106323943
there are some decent finetuners but davidau is not one of them
Anonymous No.106323929 >>106323943
>>106323907
This is crazy
>I put 3 decades of programming experience, 100s of model builds and 1000s of model tests into creating an AI / programming hybrid.
https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE
Anonymous No.106323943 >>106325772
>>106323919
>there are some decent finetuners
"there are some decent mosquitoes"
>>106323929
a lot of people spamming huggingface can be summed up as the product of mental illness
the true sad thing is that HF doesn't delete the fruit of the mentally ill
Anonymous No.106323977 >>106324227
>>106323915
I can sense your cope, I'm using IQ4_KT
Anonymous No.106323985
>finetrooners
lmao
Anonymous No.106324111 >>106324130 >>106324137
Has anyone else tried allura-org/MS3.2-24b-Angel? It seems to be the only finetune of Mistral Small I have ever tried that doesn't just make everything worse. Am I just delusional?
Anonymous No.106324130
>>106324111
>Am I just delusional?
how did you know
Anonymous No.106324137 >>106324264
>>106324111
Post logs, man. Nobody's going to go through the effort of downloading and trying out a random finetune with <1k downloads on its main release based on nothing.
Anonymous No.106324183
>>106323471
not sure what these retards are talking about.
they hyped it up to get a quick $$$ on twitter and youtube. "muh strawberry 2025 agi".
now they are disappointed. kek

i'm getting older now and i dont think, at least in the last 20 years, i've seen something new where growth was this fast.
normies i know are all using the google ai text thing thats integrated in search. its accurate enough for them. they dont even look at the links anymore.
i vibe coded something up so my kids can talk and also share vision with a local model that fits on 2 old ass p40s and ask it elementary school questions in japanese. it answers fairly well. fucks up their homework assignments if they send a pic, but that's not shocking.

if i had the tools as a kid that are available for free right now... im old, not much energy and burned out. almost every project stops at 80%. revolutionary stuff.
damn jeets man. hype everything up and if they don't get "mah asi" in 2 years they are bored already.
Anonymous No.106324195 >>106324243
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base
>no model card
>no ggufs
>no transparency
Just an ugly release all around.
Anonymous No.106324227
>>106323977
>I'm using IQ4_KT
Is it not slow as molasses for you, or are you using exl3?
Anonymous No.106324228
>>106323486
qwen3 and qwq have worked great for me as well once I made the right kind of card for no censorship. Interestingly, I had more trouble getting them to talk about controversial physics topics than to just go wild in erp.
>I am stupid and didnt see the new general happened. So reposting this here.
Anonymous No.106324243
>>106324195
Did they announce it yet? No. So there wasn't a release yet.
Anonymous No.106324253
>model-00019-of-000163.safetensors 4.3 GB
Anonymous No.106324264 >>106325241
>>106324137
I don't have any logs for it that don't violate 4chan global rules
Anonymous No.106324268 >>106324283 >>106324287 >>106324299 >>106324305 >>106324309 >>106324317 >>106324335 >>106324351 >>106324421 >>106324423 >>106324523 >>106324963 >>106324971 >>106324996 >>106325086 >>106328486
We're so fucking back
https://arxiv.org/abs/2508.11829
>Despite significant advances, AI systems struggle with the frame problem: determining what information is contextually relevant from an exponentially large possibility space. We hypothesize that biological rhythms, particularly hormonal cycles, serve as natural relevance filters that could address this fundamental challenge. We develop a framework that embeds simulated menstrual and circadian cycles into Large Language Models through system prompts generated from periodic functions modeling key hormones including estrogen, testosterone, and cortisol. Across multiple state-of-the-art models, linguistic analysis reveals emotional and stylistic variations that track biological phases; sadness peaks during menstruation while happiness dominates ovulation and circadian patterns show morning optimism transitioning to nocturnal introspection. Benchmarking on SQuAD, MMLU, Hellaswag, and AI2-ARC demonstrates subtle but consistent performance variations aligning with biological expectations, including optimal function in moderate rather than extreme hormonal ranges. This methodology provides a novel approach to contextual AI while revealing how societal biases regarding gender and biology are embedded within language models.
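For the curious, the "system prompts generated from periodic functions" part boils down to something like the sketch below; the curve shapes, cycle length, and prompt wording are my own guesses, not the paper's actual code:

import math, time

def hormone_levels(cycle_day: float, cycle_len: float = 28.0) -> dict:
    # Toy sinusoids standing in for the paper's periodic hormone models; the real shapes and noise differ
    phase = 2 * math.pi * (cycle_day % cycle_len) / cycle_len
    hour = time.localtime().tm_hour
    return {
        "estrogen": 0.5 + 0.5 * math.sin(phase),
        "cortisol": 0.5 + 0.5 * math.cos(2 * math.pi * hour / 24.0),
    }

def build_system_prompt(levels: dict) -> str:
    # Map the simulated levels onto a tone instruction for the model
    mood = "introspective and low-energy" if levels["estrogen"] < 0.3 else "upbeat and optimistic"
    return (f"Simulated hormonal state: estrogen={levels['estrogen']:.2f}, "
            f"cortisol={levels['cortisol']:.2f}. Respond in a {mood} tone.")

print(build_system_prompt(hormone_levels(cycle_day=14)))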
Anonymous No.106324283
>>106324268
This is "Magnets, how do they work?" tier.
Anonymous No.106324287
>>106324268
wtf
Anonymous No.106324299
>>106324268
>female hormones are all you need
lmao what is this
Anonymous No.106324305
>>106324268
Anonymous No.106324309
>>106324268
this is the level of fucking around with LLMs we should all be aspiring to
Anonymous No.106324317
>>106324268
>Terry Tao's funding got cut but this gal's wasn't
Anonymous No.106324335
>>106324268
And you guys call DavidAU schizo
Anonymous No.106324351 >>106324419
>>106324268
Can't wait for the new hormonally accurate AI assistants.
>Help me debug this python code, GPT.
>Not now, my head hurts.
>What's wrong, GPT?
>If you cared about me you'd know what's wrong.
Anonymous No.106324419
>>106324351
No, fuck, shut it down. This is unsafe.
Anonymous No.106324421
>>106324268
Anonymous No.106324423 >>106324488
>>106324268
I only skimmed the text and they don't mention the exact prompts they used but I can imagine they put "You are currently ovulating", "You are currently menstruating", and similar in the prompt and then benchmarked the model.
Anonymous No.106324459
>>106323598
Why do these cucklords act like they own the Internet?
Anonymous No.106324488
>>106324423
is this the new tech?
>You are an expert roleplayer who is currently ovulating.
Anonymous No.106324495
Teto should sit on my face
Anonymous No.106324523 >>106327026
>>106324268
The basic idea is kind of not bad, but doing it through a system prompt works about as well as switching your sex by having other people tell you that you're now a woman or a man.
Anonymous No.106324537 >>106324545 >>106324656 >>106324702
/aicg/ is a schizo general so I'll ask here instead: what's the point of Gemini's 1M context window if it becomes lazy as fuck after 200k and hallucinates every second piece of information?
Anonymous No.106324545
>>106324537
Advertising.
Anonymous No.106324547 >>106324572 >>106324921 >>106325406
>>106323459 (OP)
Anonymous No.106324572
>>106324547
Correction.
Anonymous No.106324656 >>106324721
>>106324537
>if it becomes lazy as fuck after 200k
You answered your own question
It's still so, so much better than the other models that there is no contest. What's the point of the 1M ? well, what's the point of DeepSeek's advertised 128k? the model literally breaks around 20k
no one is "sincere" about context length but Gemini wins in real world usage, always
Anonymous No.106324702
>>106324537
It's a big number.
Anonymous No.106324721 >>106324976
>>106324656
>What's the point of the 1M ?
- MTLing huge texts such as LN/VN, providing as much context consisting of previous lines as possible
- Vibe-coding new features for projects with large codebases
Anonymous No.106324813
- you are quoting the wrong guy
- markdown bullet lists are gayshit
- literally not a single person with a brain would feed even 10k tokens' worth for translation purposes; LLMs are not that smart and this isn't adding useful context but rather polluting it. if you actually spent time using LLMs for translation you'd know the best current approach is chunking the hell out of your text and only adding the bare minimum of context so that it gets things like character gender right (see the sketch below)
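A rough sketch of that chunk-plus-minimal-context approach, assuming any OpenAI-compatible local server; the endpoint, chunk size, and glossary content are illustrative:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local OpenAI-compatible server (llama.cpp, tabbyAPI, ...)
GLOSSARY = "Characters: Aoi (female, speaks casually), Ren (male)."  # hand-written minimal context, names are made up

def translate_chunk(chunk: str) -> str:
    r = requests.post(API, json={
        "model": "local",  # ignored by most local servers
        "messages": [
            {"role": "system", "content": "Translate the user's Japanese into natural English. " + GLOSSARY},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.2,
    })
    return r.json()["choices"][0]["message"]["content"]

def chunks(lines, n=20):
    # small chunks: enough lines for local coherence, not enough to pollute the context
    for i in range(0, len(lines), n):
        yield "\n".join(lines[i:i + n])

lines = open("source.txt", encoding="utf-8").read().splitlines()
print("\n".join(translate_chunk(c) for c in chunks(lines)))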
Anonymous No.106324819 >>106324893 >>106329565
24GB $1000
32GB $3000
48GB $5000
96GB $10,000
141GB $40,000

vram prices are starting to flatten out. are we on the verge of a collapse when still-usable enterprise used things start flooding the market?
Anonymous No.106324824 >>106324833
Elon won.
Anonymous No.106324833 >>106324852
>>106324824
thats not dynamically created right? or can you prompt the outfit?
Anonymous No.106324852 >>106324862
>>106324833
AI still struggles with proper model wireframe so no, you can't prompt it.
Anonymous No.106324862 >>106324870 >>106324950
>>106324852
Anonymous No.106324864
I seriously hope that the API is still using the old models because I'm not seeing any improvements in the scenarios that Deepseek had trouble portraying well for me.
Anonymous No.106324870 >>106324894 >>106324914
>>106324862
about grok 2 though
Anonymous No.106324893
>>106324819
>still-usable enterprise used things start flooding the market?
They won't. Nvidia buys back used enterprise GPUs to avoid competing with the used market
Anonymous No.106324894
>>106324870
When Grok 7's stable
Anonymous No.106324914
>>106324870
After LLaMa 2 33B
Anonymous No.106324921
>>106324547
"My twin c cousins can't be this nerd" the anime
Anonymous No.106324950
>>106324862
FSD surely next year
Anonymous No.106324954 >>106325062
The new deepseek reasoner is dumber than the chat they provide. Reasoner easily slips up when you test it with a blind character while Deepseek-chat via the API handles it correctly.
They have truly fallen for the hybrid reasoning meme.
Anonymous No.106324963
>>106324268
>chatbot refuses to cooperate because it has period

wtf
Anonymous No.106324971 >>106325013
>>106324268
This is a joke right?
Anonymous No.106324976 >>106325185
>>106324721
>- MTLing huge texts such as LN/VN, providing as much context consisting of previous lines as possible
Are there real use case proof for this? Can it translate stupid terms like on Dies Irae series?
Anonymous No.106324996
>>106324268
>spending billions of dollars to improperly simulate the fallibility of human flesh when the machine comes without any such deficiencies by default
Anonymous No.106325013
>>106324971
I think it's a whole new frontier of mental retardation
>three pages describing hormonal cycles in humans, completely unrelated to llms
>"we engineered a set of periodic functions with added Gaussian noise to simulate the natural shapes and fluctuations of testosterone, estrogen, LH, FSH, progesterone, cortisol, and body temperature. "
>"used to generate distinct system prompts for a wide range of state-of-the-art models, including Anthropic’s Claude 3.5 Sonnet, Deepseek-Chat, ..."
>"each prompt was designed to convey a specific emotional tone corresponding to its underlying hormonal state"
Anonymous No.106325016 >>106325030 >>106325035
We need to reverse engineer Rocinante for the future generations.
Anonymous No.106325026
we need to vivisect drummer for uhh, science or something
Anonymous No.106325030 >>106325048
>>106325016
Just distil bro.
We need Deepseek R1 - Rocinante distil ASAP.
Anonymous No.106325035
>>106325016
Speaking of, the two newer ones are pretty meh, and I find Roci R1 better than X even without using the think.
Anonymous No.106325048 >>106325068
>>106325030
>We need Deepseek R1 - Rocinante distil ASAP.
dude? https://huggingface.co/BeaverAI/Rocinante-R1-12B-v1b-GGUF/tree/main
Anonymous No.106325062
>>106324954
After some more testing, I think this is because deepseek-reasoner always tries really hard to keep its thinking effort as short as possible, which causes it to ignore important aspects of the scenario like this. It handles the scenario correctly if I edit the prefill to force the model to spend more time thinking and actually attempt to plan ahead:
> \n Okay, I'll think this through thoroughly. First, I'll make a list of things to consider:
This way it actually takes a moment to look at the scenario and make a plan about things to consider and pay attention to.
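If you're hitting an API yourself rather than going through a frontend, the trick looks roughly like this; the endpoint, chat template, and thinking tag are placeholders for whatever your backend actually exposes:

import requests

API = "http://127.0.0.1:8080/completion"  # llama.cpp-server style raw completion endpoint, assumed
prompt = (
    "User: {scenario and latest message go here}\n"
    "Assistant: <think>\n"            # whatever thinking opener your model's template uses
    "Okay, I'll think this through thoroughly. First, I'll make a list of things to consider:"
)
# The model continues from the prefilled thinking instead of rushing to a short answer
r = requests.post(API, json={"prompt": prompt, "n_predict": 1024, "temperature": 0.6})
print(r.json()["content"])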
Anonymous No.106325068 >>106325082
>>106325048
Not Rocinante R1, R1 Rocinante.
Anonymous No.106325082
>>106325068
i see, that does sound interesting, certainly better than rewriting data using a 3B-active model
Anonymous No.106325086
>>106324268
Interedasting idea, gonna try this on my Ani chatbot
Anonymous No.106325108 >>106325122 >>106325125 >>106325132 >>106325161
There. I did a thing with a small local model that isn't sexual.
On my minecraft server I made a custom plugin that calls the llama.cpp API (or any OpenAI-compatible API); it basically takes the player's inventory/equipment/status along with an offering placed in a chest and tells the player's fortune based on that information.
Even gemma3-1b can handle this task consistently, although its fortunes are a little dry.
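Not the plugin itself (that part is Java), but the request it sends is roughly this; the player fields and model name here are made up for illustration:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local llama.cpp server

def tell_fortune(player: dict) -> str:
    # Field names are illustrative; the real plugin pulls these from the game server
    summary = (f"Inventory: {', '.join(player['inventory'])}. "
               f"Health: {player['health']}/20. Offering in chest: {player['offering']}.")
    r = requests.post(API, json={
        "model": "gemma-3-1b",  # whatever model the server has loaded
        "messages": [
            {"role": "system", "content": "You are a mysterious village fortune teller. Read the player's belongings and offering, then give a short, flavorful fortune."},
            {"role": "user", "content": summary},
        ],
        "max_tokens": 120,
    })
    return r.json()["choices"][0]["message"]["content"]

print(tell_fortune({"inventory": ["iron sword", "bread"], "health": 17, "offering": "an emerald"}))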
Anonymous No.106325122
>>106325108
>a thing with a small local model that isn't sexual
You are a freak
Anonymous No.106325125
>>106325108
That's pretty cool.
Anonymous No.106325132 >>106325143 >>106325151
>>106325108
Might as well use gemma3-240m at this point, lol
Anonymous No.106325143 >>106325151
>>106325132
I wanted to try it but I can't get it to work on the llama.cpp build I'm using and don't feel like rebuilding it all over again. 240m has deeper pretraining than 1b doesn't it? So it might be able to handle the behavior.
Anonymous No.106325151
>>106325132
>>106325143 (Me)
I will say at this point, though, that even 1B is basically instant on a 3090. It gets the inference done and response delivered in the time it takes a button to unpress in the game. So scaling down at that point is pointless, really.
Anonymous No.106325161 >>106325466
>>106325108
Now that you have a base, make the villagers talk.
Anonymous No.106325185
>>106324976
I used it on Phantom Integration and it turned out okay even before manual editing, definitely a step up compared to feeding text line-by-line, let alone using conventional MTL services.
I think your unique lingo can be dealt with by manually translating every word (perhaps with a JOP's help) and then putting what you got into the system prompt
Anonymous No.106325228 >>106325259
What specs are needed to run qwen3 235b at decent quality? What are you guys using?


48gb vram 3x tesla v100 + 24gb vram p40 + 96gb ram enough?
Anonymous No.106325241
>>106324264
Anonymous No.106325259
>>106325228
yes though a deepseek quant of the same size will be faster at generation (but slower at pp)
Anonymous No.106325406 >>106325455 >>106325474
>>106324547
How do you get the model to do the lil hearts? My model (llama 2 7b chat) doesn't understand and shits them out in the middle of a word.
Anonymous No.106325412 >>106325436 >>106325453
The new Deepseek release scheme makes perfect sense. Maybe more companies will release their models open source when they realize that they can simply put out the base model and leave the rest up to the community while keeping their own instruct tune to themselves.
Anonymous No.106325436
>>106325412
We're not in the Alpaca era of LLMs anymore.
Anonymous No.106325453 >>106325492
>>106325412
the community is too dumb to finetune moes
Anonymous No.106325455 >>106325484
>>106325406
It's literally just
>Add heart emoji to their speech when they tease me
and DeepSeek R1 q5_k_m manages to do it.
Quite honestly it's not surprising that the outputs of an outdated 7b model are bad.
Anonymous No.106325466
>>106325161
I actually thought it would be cool to make one that tracks each players activity and then uses it as context to make it so you can pay villagers for rumors about other players activities.
Anonymous No.106325470 >>106325496 >>106325497 >>106325756
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base-woSyn
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct

New release from ByteDance Seed
Anonymous No.106325474 >>106325511
>>106325406
>My model (llama 2 7b chat)
why are you using a 7b that became obsolete two years ago
Anonymous No.106325484 >>106325567
>>106325455
So it's a parameter issue? If I install the 13b llama at q3, would that help?
Anonymous No.106325492
>>106325453
Didn't mistral basically say
>just train it a bunch of times until it randomly works lol
in response to MoE training going wrong?
Anonymous No.106325496 >>106325516 >>106325521
>>106325470
>Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks
Also what does OSS mean?
Anonymous No.106325497 >>106325548
>>106325470
was just about to post this
it's interesting that they released bases with and without synthetic data
also this seems insane:

Got it, let's try to solve this problem step by step. The problem says ... ...
I have used 129 tokens, and there are 383 tokens remaining for use.
Using the power rule, ... ...
I have used 258 tokens, and there are 254 tokens remaining for use.
Alternatively, remember that ... ...
I have used 393 tokens, and there are 119 tokens remaining for use.
Because if ... ...
I have exhausted my token budget, and now I will start answering the question.

To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
Anonymous No.106325511 >>106325517
>>106325474
That's what the tutorial told me to buy
Anonymous No.106325516
>>106325496
>OSS
It's in the name of the new GPT open-weight model, so they had to have it too.
Anonymous No.106325517
>>106325511
where did you buy it
Anonymous No.106325521
>>106325496
Sam did it so china must do it as well
Anonymous No.106325548 >>106325573
>>106325497
>I have used 1488 tokens, so I now burn more tokens counting remaining tokens
o-okay
Anonymous No.106325567 >>106325584 >>106325589
>>106325484
Anon, let me be frank: size does unfortunately matter.
There's a reason oji-san was mocked for his "cheap" 3090.
For a classic VRAMlet model try Mistral Nemo (few restrictions), the new FOTM model is GPT-OSS (good but prudish).
Anonymous No.106325573 >>106325585
>>106325548
>1488
N-no way, a-are you a ... N-N-N*zi??
Anonymous No.106325584
>>106325567
Okay, thanks!
Anonymous No.106325585 >>106325708
>>106325573
Black people caress my rear with their tongues
Anonymous No.106325589 >>106325614
>>106325567
>GPT-OSS (good
anon pls
Anonymous No.106325613 >>106325621 >>106325631 >>106325645 >>106326581
>No one will release a basic-ass GPU with 128GB ram to instantly dethrone nvidia
Why??
Anonymous No.106325614
>>106325589
It's good product, trust official sam soup
Anonymous No.106325621
>>106325613
because if you made such a GPU you would make a lot more money selling it at a huge markup to datacenters
Anonymous No.106325631
>>106325613
https://www.techpowerup.com/332516/sandisk-develops-hbm-killer-high-bandwidth-flash-hbf-allows-4-tb-of-vram-for-ai-gpus
They're working on it.
Anonymous No.106325645 >>106325707 >>106325917
>>106325613
Releasing consumer hardware is not going to dethrone nvidia, anon.
It needs to beat Nvidia's offerings in compute, density, power efficiency, AND memory to even be worth considering when you look at how much CUDA keeps people chained to the ecosystem.
If you watched the gamers nexus gpu smuggling video, the university professor chinaman spells it out. They would only consider moving out of CUDA if pytorch offered support for whatever other framework/api.
Anonymous No.106325684
When will other companies follow suit and make an OSS model?
Anonymous No.106325691
The api still shows 0528 and V3-0324, how do I use the new one?
Anonymous No.106325707
>>106325645
>They would only consider moving out of CUDA if pytorch offered support for whatever other framework/api.
Seems like software issue to me.
Anonymous No.106325708
>>106325585
Greedy tongue tests
Anonymous No.106325716 >>106325800
>>106323491
Look into AutoRound. Intel claims it has better accuracy at 2-4 BPW, but I haven't seen comparisons with normal gguf quants or EXL3.
>https://github.com/intel/auto-round
Anonymous No.106325756
>>106325470
I'm glad more models are separating the thinking and instruction models.
That's clearly the way to go
One good omni model to create a roadmap with the right instructions,
and a bunch of smaller expert models to do the bulk of the work.

Instead of a 6-GPU setup for 20 t/s
Anonymous No.106325772
>>106323943
You're on a mental illness website.
Anonymous No.106325799
so are they ever gonna do something good?
Anonymous No.106325800
>>106325716
>auto-round
damn https://huggingface.co/Intel/Qwen3-235B-A22B-Thinking-2507-gguf-q2ks-mixed-AutoRound
>Q2_K_S 79.8 GB
Anonymous No.106325866 >>106325923
I think the next version of the Qwen code model is going to blow some pants off.
The current model is already pretty great, but recently they've also had a massive surge in usage
especially for code related shit which should provide them with more of that sweet, sweet real world data and significantly improve the next model.
Anonymous No.106325879 >>106325906 >>106325951 >>106325967 >>106326088 >>106326299 >>106326787
How does Rocinante genuinely compare to 24b > 32b models.

I genuinely think i've cope brained myself into believe "larger b" matters when it comes to basic shit like ERP. I remember Rocinante having way more personality than Cydonia and the other Mistral small finetunes
Anonymous No.106325881 >>106325928 >>106326003
draft model meme seems like a good target for live training: the model itself is small making training cheap, there's immediate feedback on correctness from big brother model and you don't have to worry about over-fitting or other crap, just throw the drafter away and start again.
You can also do the training in-between prompts during down time.
Anonymous No.106325906
>>106325879
It's actually better than qwen3 235 and even r1 when it's not about general knowledge or complex reasoning. Just the phrasing, how well it takes hints, etc. In short, it gets me.
Anonymous No.106325917
>>106325645
>They would only consider moving out of CUDA if pytorch offered support for whatever other framework/api.
Which it does have several backends for. AMD's ROCm pytorch packages have been out for a while and Intel's been available since Pytorch 2.7 so even then, it's not the sole determinant for picking Nvidia, it's the grants and etc. you get.
Anonymous No.106325923
>>106325866
I want a qwen3 dense 32b coder.
Anonymous No.106325928
>>106325881
I think you might be lost, care to explain what you mean?
Anonymous No.106325946
>>106323580
Always use conda env
Anonymous No.106325951
>>106325879
Nemo in general killed the entire local model scene.

It showed just how fucking little higher parameter counts matter for ERP on the local model scene unless it's a significant jump in size, which then means that the only significant jumps in quality come from shit nobody is running on local machines. I both hate and love Nemo for causing this. Saved me a lot of time and money.

>take out a loan to build a turbo PC capable of running some deepseek shit at a painful 5 t/s, on a pisslow quant because it's 5 quintillion paramaters big
versus
>scrounge around for free larger models on OpenAI instead that achieve the same result but better

Tough choice.
Anonymous No.106325967
>>106325879
I suspect that the constant flood of math and code and benchmark shit has crippled the current set of models. I went back and looked at my oldest outputs from the llama1 30b era and actually preferred them to what the newest models would spit out with the same context.
New models are "smarter" but they lack some crucial guiding intuition for RP.
Anonymous No.106326003 >>106326063
>>106325881
>https://github.com/sgl-project/SpecForge
interesting
Anonymous No.106326007
>>106323765
>Computer, generate a 10,000 word scientific report on the feasibility of a dead shota's penis penetrating his blood-related mother's decomposing corpse's urethra as they are vored.
>I can't provide that kind of explicit and disturbing content. Is there anything else I can help you with? perhaps a more general and scientific topic?

Into the trash it goes.
Anonymous No.106326063
>>106326003
Almost, but what I mean is doing this for general inference, not just training draft model alongside proper one.
Anonymous No.106326088 >>106326163
>>106325879
I disagree with people saying fuckhuge moes are worse. They are obviously better at everything. But yes, nemo is the best model on a single gpu and it proves censorship actually works. And if you aren't running fuckhuge moes you are just going to have to sit there and wait for the next model that is uncensored.
Anonymous No.106326140
The concept of a "base" model is dead at this point, we're never getting another one
Anonymous No.106326163 >>106326206 >>106326210 >>106326276
>>106326088
who the fuck said fuckhuge moes are worse. We're specifically saying the opposite.

Nemo models are the best smaller models that can realistically be run locally. The shit people talk about on here makes me convinced this general is just another /aichat/ because I know for a fact most people aren't running anything higher than 70bs locally, especially not at decent quants.

This general has just turned into /aichat/ research department or something where we talk about 200b+ models like any of us can actually fucking run em.
Anonymous No.106326206
>>106326163
>I know for a fact most people aren't running anything higher than 70bs locally
That is not true. John wouldn't be a celebrity if that were the case.
Anonymous No.106326210 >>106326298
>>106326163
We desperately need to separate poorfags into their own thread.
Anonymous No.106326237 >>106326254
Also wanna bet that John will release V3.1 quants before he releases GPT-OSS quants? This will be a clear proof that he is a Chinese agent.
Anonymous No.106326254
>>106326237
? the 120b one? Why bother with such a small model?
Anonymous No.106326276 >>106326440
>>106326163
You sound poor
Anonymous No.106326298
>>106326210
That's what /wait/ is for.
Anonymous No.106326299 >>106326352 >>106326362
>>106325879
No amount of personality will help it with basic spatial awareness or keeping track of minor character traits.
On the other hand, 24b models don't do these well enough either, might as well stick with Nemo for tokens go brrr.

MoE is love, MoE is life, I just want some kind of middle ground between Qwen30b and GLM-Air size-wise (or a better computer).
Anonymous No.106326307 >>106326478 >>106326640
>>106323526
>even Q4_K_M incurs significant loss btw, don't believe the myth that it's "good enough" it absolutely isn't
Anon I would absolutely love to see some actual evidence on this. Do you have any? I am only 50% sarcastic here and that is a record.
Anonymous No.106326338 >>106326346
feeet
Anonymous No.106326346
>>106326338
Rin-chan feet
Anonymous No.106326352
>>106326299
I've been using qwen30b as my main for a while now but it's not great. The non-thinking one is repetitive; the thinking one produces better responses but isn't quite good enough, so you still have to regularly do a couple of swipes, which is annoying when it takes 2 minutes to respond each time. If there were just something with like 7b-12b active, that would probably save vramlets
Anonymous No.106326362 >>106327458
>>106326299
Have you tried Jamba mini?
Anonymous No.106326398 >>106326438 >>106326462 >>106326471 >>106326477 >>106326533
Quant is a cope.
A quant with 99.99% accuracy will be 0.9999^10000 = 36.79% accurate after 10000 tokens.
A quant with 99.9% accuracy would be only 0.0045% after 10000 tokens.
Anonymous No.106326426 >>106326450
Say I had a 32GB MI50 available to me for cheap. In spite of the much older platform, would it outperform a 9070XT that spills over into DDR4 3600MHz system RAM, versus having everything loaded into the MI50's 32GB of HBM2 VRAM?
Anonymous No.106326432
Researcher here. The people I work with have been looking for places to use LLMs since grant reviewers love this. I go to a meeting with a few ML/AI researchers and I'm weirdly knowledgeable about using these, explaining the limits of context even when a model can theoretically use more, and how to get around this. And here I was thinking I was wasting my time here
Anonymous No.106326438
>>106326398
Everyone with an above-average IQ already knows this
>>105106082
>Quant is the Mind Killer ;)
Anonymous No.106326440 >>106326494 >>106326684
>>106326276
Show me your t/s on qwen3.

I'll wait. If you're rich enough to run a model of that size locally at a good quant, you're rich enough to get an actual real life waifu.
Anonymous No.106326450 >>106327651
>>106326426
I wouldn't bother much with the mi50s unless you have a bunch of them. Just go with small models on your 9070xt or full moe.
Anonymous No.106326462 >>106326480 >>106326488
>>106326398
What point are you trying to make?
Anonymous No.106326471
>>106326398
How do you know the 2nd token, which changed only because of the quantization, is wrong?
Anonymous No.106326477
>>106326398
>there is one and only one correct way to write a text
Anonymous No.106326478 >>106326495
>>106326307
I am not going to download Q4 models again just to run comparison sameseed prompts for you, the evidence isn't hard to ""obtain"" disingenuous retard, just prompt something to translate with less common words, watch Q4 fumble where Q8 shines
also found Q6_K to be mostly perfect, but after noticing the Q4 degradation I don't even want to play roulette or spend any more time on this, Q8 or die (I'd go without quantization if I could, quantization itself is a fucking cope)
Anonymous No.106326480
>>106326462
You need an H100 to run the non-quant, otherwise you can't say the model is bad
Anonymous No.106326488 >>106326493
>>106326462
allow me to translate:
>"kehehe let me bait those lmg nerds and get a bunch of (You)s kehehehe"
Anonymous No.106326493
>>106326488
Fuck
Anonymous No.106326494 >>106326508
>>106326440
It's actually a work server but we run lots of large models locally (mostly for programmatic stuff, I just loaded a few into openwebui) and this board has helped me tremendously for learning how to do all this stuff.
Anonymous No.106326495 >>106326509 >>106326651
>>106326478
Ah so it is your personal anecdote? You know this is how min_p, snoot curve sampler, undi's frankenmerges were proven to be good as well?
Anonymous No.106326508 >>106326926
>>106326494
God I fucking hate sharing this place with rich, successful people
Anonymous No.106326509
>>106326495
All of these are in fact good.
Anonymous No.106326533 >>106326556 >>106326645
>>106326398
It doesn't work that way.
The inference loop is run for each token position individually, 1 at a time. There is no compounding across the entire sequence of tokens.
If the sequence is:
>Q: Why is the sky blue?
>A:
It will prompt that and get
>Q: Why is the sky blue?
>A: Because
and then it will prompt that and get
>Q: Why is the sky blue?
>A: Because (space)
etc.
Each token is a new inference. Caching is used to speed up the turnaround so that it doesn't have to fully reprocess the entire prompt each time.
There's no cumulative inaccuracy at work.
The quantization loss (within reason) will usually manifest as small changes in confidence, which at Q6 and above pretty much don't matter. You might get a more interesting output by cranking the temperature up and that's it. Like if you've ever seen imagegen models quantized down to Q8 vs FP16: it's roughly equal in quality but different. Like a black cat might become a tabby cat and the pattern on his food dish might change, etc.
And then old Q4_0 models would mix up close concepts such as possessive clauses as a result, because the loss obviously manifests as a leveling off of confidence between close concepts (mine vs yours, etc). But again, it doesn't compound between two tokens in the manner you suggest. You're just a braindead retard.
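If it helps, here's the loop in toy form; fake_forward stands in for a real model, nothing here is an actual inference API:

import random

VOCAB = [" Because", " of", " Rayleigh", " scattering", ".", "<eos>"]

def fake_forward(prefix):
    # Stand-in for one forward pass: the model sees the WHOLE prefix and returns one score per
    # vocab entry. Quantization only nudges these scores; nothing accumulates across positions.
    random.seed(len(prefix))
    return [random.random() for _ in VOCAB]

tokens = ["Q: Why is the sky blue?", "\nA:"]   # pretend these strings are token ids
for _ in range(8):                             # each new token = one fresh, independent inference
    scores = fake_forward(tokens)
    nxt = VOCAB[scores.index(max(scores))]     # greedy pick over the current distribution
    if nxt == "<eos>":
        break
    tokens.append(nxt)
print("".join(tokens))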
Anonymous No.106326540 >>106326675
I don't think anyone believes Q8 or full precision isn't superior whether the difference is small or big, but if you can run a larger parameter model at Q4 then it is usually superior and makes less mistakes than the smaller model at Q8. Ultimately just run the largest model you can at no less than Q2-Q4 depending on the specific models under consideration, that's all we really need to say at this point. Anything more is really just unnecessary noise.
Anonymous No.106326556 >>106326567
>>106326533
Every error causes more errors later, even across new inferences
Anonymous No.106326567 >>106326645 >>106326784
>>106326556
Only if that error surpasses the threshold required to get the model to return a garbage token. Only then will it compound.
But again this usually only happens in situations where a small change in confidence matters- like possessive clauses on legacy 4bit quants.
Anonymous No.106326581
>>106325613
that would be anti-semitic
Anonymous No.106326640
>>106326307
Man, I'm using glm 4.5 air at q8 right now, and I was going to test it by using it to translate some chinese to english and comparing it to a q4's output, but goddamn it's stupid and wrong.
Anonymous No.106326645 >>106326672
>>106326533
>>106326567
> Only if that error surpasses the threshold required to get the model to return a garbage token.
if each token has 99.99% accuracy, then 36% is the probability that even the unquantized model gets through 10000 generated tokens without a single garbage one.
Anonymous No.106326651 >>106326664
>>106326495
>snoot curve
?
Anonymous No.106326656
anime feet
Anonymous No.106326664 >>106326669 >>106326932
>>106326651
Anonymous No.106326669 >>106326683 >>106326791
>>106326664
What the fuck is that?
Anonymous No.106326672
>>106326645
I am so utterly horrified by how retarded your logic is that I have to leave now.
Anonymous No.106326675
>>106326540
zis
Anonymous No.106326683
>>106326669
Made by our lord and savior kanyemonk aka kalomaze
Anonymous No.106326684 >>106326715
>>106326440
If I was rich I would still be doing shit like this. I'd just have more vram. Sorry to hear about how unhappy you are, incel.
Anonymous No.106326715 >>106326765
>>106326684
>If I was rich
>incel
sorry to hear that anon, hope things go better for you.
Anonymous No.106326728 >>106326742 >>106326813 >>106326862
Redpill me on Qwen as a 24GB + 32GB poorfag.

Apparently it doesn't suck for ERP anymore, based on previous threads?
Anonymous No.106326742 >>106326809 >>106326813
>>106326728
30b3a is soda
Anonymous No.106326765
>>106326715
huh? Im running 200b models and having fun. He's the one who admitted he wont buy vram because if he did he would waste it on pussy.
Anonymous No.106326784 >>106326863 >>106326879
>>106326567
>on legacy 4bit quants.
lol. it's not just the legacy quants. And the type of breakage you mention reminds me of one of my translation tests. In fact GPT-OSS 20B in its brand new MXFP4 is incapable of correctly translating this sentence (I forgot I still had that model on my drive, going to remove it asap):
> わたしは半ば意地になって、思いつく限りの人間にインタビューしては、矛盾点をつきあわせていったが、その過程で、否応なく、ある事実に気づかされた。誰一人、自分にとって不利な方向に記憶をねじ曲げていた人間はいなかったのだ。
I blame the MXFP4 rather than the model
because I've seen much smaller dinky models get it right at better quants
(in particular, on the part about "not twisting one's memories to one's disadvantage", all Q4s become retarded and GPT-OSS always gives me a variant of: "no one had twisted their memory to put me at a disadvantage.")
Even Qwen 4B will almost always translate this sort of sentence (in general, not just this example I give) right at low temp. As long as it's Q8. Q4_K_M breaks.
Anonymous No.106326787 >>106326836
>>106325879
>I genuinely think i've cope brained myself into believe "larger b" matters when it comes to basic shit like ERP
It's not cope though. Just run tests on spatial awareness regarding who's taking off what and where and when and you'll immediately see a difference. I don't feel like touching models in those ranges anymore ever since I started using glm air.
Anonymous No.106326791
>>106326669
the best sampler parameters.
Anonymous No.106326794 >>106326974
mtp might be close https://github.com/ggml-org/llama.cpp/pull/15225
Anonymous No.106326809 >>106326829
>>106326742
the fuck does "it's a soda" mean
Anonymous No.106326813
>>106326742
>>106326728
MINNESOTA
https://youtu.be/rwIk4otVjbU
Anonymous No.106326829 >>106326845
>>106326809
soda as in "the best there is"
Anonymous No.106326836
>>106326787
yeah, writing style and specific RP tuning are nice and all, but if you have any degree of complexity, subtlety, or open-endedness in your RP, larger models are so much better at handling it that it's hard to go back
Anonymous No.106326845 >>106326918
>>106326829
Ah shit. No finetunes needed? I can actually run that shit
Anonymous No.106326862 >>106326896
>>106326728
they let the models know what sex is now so they can write smut that isn't just vague euphemisms. they have a default style that's a bit overbaked but it's not too bad, they're pretty fun and definitely far better than previous iterations
Anonymous No.106326863
>>106326784
>MXFP4
>GPT-OSS
>Qwen 4B Q4_K_M
is this bait
Anonymous No.106326879 >>106326900 >>106326937
>>106326784
hey silly anon, gpt oss was only released in MXFP4
Anonymous No.106326896 >>106326954 >>106326973
>>106326862
what one specifically?

I'm so confused by all the Qwen shit which puts me off

>reasoning this
>QWQ, Qwen 32B, Qwen 30b3a

There's like 50 different types of the same size. I just wanna goon
Anonymous No.106326900
>>106326879
he didn't say otherwise
Anonymous No.106326918
>>106326845
Keep in mind they don't really aim for rp performance or prose at all. All they train for are benchmarks.
Anonymous No.106326926 >>106326952
>>106326508
Hello crab. How's bucket? It's not all bad, aren't those people the ones that pay for your neetdom?
Anonymous No.106326932 >>106326938
>>106326664
How does this work? Does it make the model smarter?
Anonymous No.106326937 >>106326944 >>106326953
>>106326879
>gpt oss was only released in MXFP4
wait, for real? literally worse than releasing nothing.
Anonymous No.106326938
>>106326932
It makes the model more
Anonymous No.106326944
>>106326937
yeah :(
Anonymous No.106326952
>>106326926
my seething gives richfags pleasure so it's a symbiotic relationship actually.
Anonymous No.106326953
>>106326937
It's revolutionary the fuck you mean
Anonymous No.106326954 >>106326981
>>106326896
Qwen 235b, qwen 480. But glm air is good.
Anonymous No.106326968 >>106326980 >>106327006 >>106327008 >>106327028 >>106327043 >>106327101 >>106327167
>GLM-4.5-Air-UD-Q6_K_XL
>Qwen3-235B-A22B-Thinking-2507-UD-Q3_K_XL
anyone tested these? I'm finally tired of my 70b llama
Anonymous No.106326973 >>106327933
>>106326896
at your specs use 30a3 but only the 2507 versions. the instruct is easier to work with but the thinking one is actually not bad and you might be able to fit the whole thing on your GPU at a decent quant in which case it will be really fast
you can also try the dense 32b or QwQ, they're a little rougher around the edges but they're dense models so probably a little more solid
Anonymous No.106326974
>>106326794
Is this cheaper than regular speculative decoding assuming the same acceptance rate?
Anonymous No.106326980
>>106326968
>thinking
Anonymous No.106326981 >>106326990
>>106326954
>qwen 235b
>qwen 480b
Did you see the models I listed? You think i'm running that shit senpai?
Anonymous No.106326990
>>106326981
RichCHADS don't care about your poorKEK opinions.
Anonymous No.106327006 >>106327017
>>106326968
I think they're generally better. BUT I also think 70B still has some "big model smell" that those don't have. Also, 70B has seen many ok tunes by now. No one has done any tunes for 235B or Air. So they're a tiny bit censored. Not too much that you can't JB/prefill against but it is something to be aware of. Also for Air you need to really take care to get your prompt template right, or use chat completions. Otherwise it's much more prone to repetition.
Anonymous No.106327008
>>106326968
>GLM-4.5-Air-UD-Q6_K_XL
I'm using q4, but I think I might prefer Jamba mini
Anonymous No.106327017
>>106327006
By the way, Jamba is pretty uncensored.
Anonymous No.106327026
>>106324523
Shouldn't the basic idea be that no matter whether you are menstruating or not, 2+2 equals 4 and Rome is in Italy? And the issue is that for some reason LLMs will give you different answers when faced with irrelevant context, which humans won't. But this paper seems to be doing the opposite, or am I retarded?
Anonymous No.106327028
>>106326968
air is great, cant say more. im going through a breakup irl
thanks for subscribing to my blog
Anonymous No.106327035 >>106327050 >>106327067 >>106327147
No matter what I try these days, I keep going back to qwen coder 480. Sweet spot model
Anonymous No.106327043
>>106326968
Deepseek q2
Anonymous No.106327050 >>106327055
>>106327035
Why would you fuck a coder?
Anonymous No.106327055 >>106327072
>>106327050
They can afford to buy me nice things :3
Anonymous No.106327067
>>106327035
For me, it's Granite 3.1 1b a400m instruct. Perfect blend of speed and size for my hardware.
Anonymous No.106327072
>>106327055
are you a faggot, faggot?
Anonymous No.106327074 >>106327090 >>106327116
Mouthbreathing retard here
I'm getting a cheap 6750xt
Can I run anything cool with this? Or nyo?
Anonymous No.106327090
>>106327074
Granite 3.1 1b a400m is all you need.
Anonymous No.106327101 >>106327122
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct
holy basado
>Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., w/ syn.) as Seed-OSS-36B-Base. We also release Seed-OSS-36B-Base-woSyn trained without such data (i.e., w/o syn.), offering the community a high-performance foundation model unaffected by synthetic instruction data.
basado, basado, basado
too bad finetrooners won't do anything of value with it
>>106326968
qwen thinking models are too long-winded
they are actually good in terms of what you get, they just spend too many tokens on thinking, and ain't nobody got the time for that
it's worse than the original R1 in that regard
try the instructs, they are good models without the thinking nuttery
Anonymous No.106327116 >>106327182
>>106327074
consider not doing that, please consider your other options
post your budget
Anonymous No.106327122 >>106327136
>>106327101
>oss
no thanks
Anonymous No.106327136
>>106327122
It's SOTA for its size
Anonymous No.106327147
>>106327035
I agree.
Anonymous No.106327167
>>106326968
I used to be a 70b guy, but 235b q2 is now my daily driver; even at a low quant like that, it completely blows 70b away in my experience. the instruct is a lot easier to use though; the thinking one, although fun to play with, takes a lot of wrangling to work well, and unless you like doing that I would recommend against it
air also compares pretty well to the old 70bs, but more params wins
Anonymous No.106327182 >>106327196 >>106327197
>>106327116
I would rather not waste more than $300
So the other option is a 3060 12gb
>t. Turdworlder
Anonymous No.106327191 >>106327199 >>106327201 >>106327263
Anonymous No.106327196 >>106327320
>>106327182
a 3060 is definitely a better option than the 6750xt
what about an a770 16gb? or some other 16gb amd card?
what are you planning on doing, anon?
Anonymous No.106327197 >>106327320
>>106327182
Either option is vramlet tier, but at least with NVIDIA you get CUDA.
Anonymous No.106327199 >>106327263
>>106327191
Don't do it, Miku! You'll get fat!
Anonymous No.106327201 >>106327206 >>106327210 >>106327216 >>106327225
>>106327191
usecase for miku?
Anonymous No.106327206
>>106327201
faggotry
Anonymous No.106327210
>>106327201
onahole
Anonymous No.106327216
>>106327201
wrapping around big cock
Anonymous No.106327225
>>106327201
Spiritual guidance
Anonymous No.106327263 >>106327272
>>106327191
>>106327199
NOOOOOOOOOOOOOOOOOOO
Anonymous No.106327271
>>106323907
why go through all the effort of making that instead of just making models that aren't schizo dogshit? is he dumb?
Anonymous No.106327272
>>106327263
this is why it's better to stick with nemo...
Anonymous No.106327278 >>106327287 >>106327312 >>106327329
Redpill me on reasoning models.

Whenever I try Qwen 3 32b I just can't seem to get it to work properly in RP. Usually will only post their reasoning (which actually looks good and shows a pretty surprising level of understanding) but won't give their reply or will just chink out on me.

Is there a guide out there? Bonus question any good finetunes of 32b Qwen/QWQ?
Anonymous No.106327287 >>106327313
>>106327278
old reasoners don't really make use of the thinking in their final reply, just use newer ones
Anonymous No.106327307 >>106327316 >>106327319 >>106327428
I'm leaking
Anonymous No.106327312 >>106327550
>>106327278
get a good preset
Anonymous No.106327313
>>106327287
What versions?
Anonymous No.106327316
>>106327307
noo
Anonymous No.106327319 >>106327348
>>106327307
noooo Mr. T
Anonymous No.106327320 >>106327334 >>106327918
>>106327196
Mainly for gaming but I want to be able to run some "decent" models
>what about about an a770 16gb?
Can't find anything, not even secondhand
>or other 16gb amd card?
Gonna take some months, but if it's worth, maybe
Thanks btw
>>106327197
Thanks
Anonymous No.106327322
Wait me
Anonymous No.106327329
>>106327278
>Usually will only post their reasoning (which actually looks good and shows a pretty surprising level of understanding) but won't give their reply or will just chink out on me.
When I think about it this is apex safety censorship. Showing the coomers that the model understands everything but doesn't deliver what you want is peak safety.
Anonymous No.106327334 >>106327374
>>106327320
well the models you're gonna run on 12gb/16gb aren't too decent
consider getting more RAM sometime; MoEs run fairly fast with smart partial offloading to the GPU (rough sketch below)
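The crude version of partial offloading in llama-cpp-python just caps how many layers go to the GPU; the filename and layer count here are placeholders, and iirc recent llama.cpp builds also have CLI flags for pinning just the expert tensors to CPU, which suits MoEs even better:

# crude partial-offload sketch: put what fits on the GPU, leave the rest on CPU
# model path and layer count are made-up placeholders; raise n_gpu_layers until VRAM is full
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf",
    n_gpu_layers=20,  # only the first 20 layers are offloaded to the GPU
    n_ctx=8192,
)

print(llm("Q: What is 2+2?\nA:", max_tokens=16)["choices"][0]["text"])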
Anonymous No.106327348 >>106327410 >>106327728
>>106327319
Anonymous No.106327374 >>106327918
>>106327334
Gotcha
Maybe saving for a 3090 would be the smartest move?
Anonymous No.106327410 >>106327500
>>106327348
Anonymous No.106327428 >>106327728
>>106327307
Anonymous No.106327450 >>106327467 >>106327494
What is wrong with petrus once again?
Anonymous No.106327458 >>106327524
>>106326362
>Jamba mini
>at least 5 t/s on my potato (can probably go faster with better llama.cpp args)
>doesn't refuse sexo outright
>enough leftover resources to actually use machine for other tasks
Needs more testing, and the writing seems a bit dry for ah ah mistress use, but huh, I like it so far.
Anonymous No.106327467
>>106327450
They're not
Anonymous No.106327494
>>106327450
doing the exact same thing, over and over again. expecting shit to change
Anonymous No.106327500 >>106327529
>>106327410
wat...
Anonymous No.106327524
>>106327458
>doesn't refuse sexo outright
It doesn't refuse shota necrophilia vore snuff. Without a system prompt.
I do notice it's a bit dumber than its peers and has some repetition issues. Being able to run 256k context without needing an H200 is great too, but it keeps reprocessing the entire context whenever a message is sent, so it gets pretty slow.

I wish more eyes were on it so smarter people than me can make it better.
Anonymous No.106327529 >>106327544 >>106327728
>>106327500
Anonymous No.106327544 >>106327728
>>106327529
okay, 1 day only
Anonymous No.106327550
>>106327312
such as nigga?
Anonymous No.106327651 >>106327664
>>106326450
Oh, I have been using various models on the 9070XT since I got it, same for the 5700XT it replaced. Just curious if an MI50 32GB would do better on the models I already run that spill over VRAM. It would be a secondary GPU in the system; I've got the space and power.
Anonymous No.106327664
>>106327651
It should, yeah
Anonymous No.106327728
>>106327544
>>106327529
>>106327428
>>106327348
More thread relevant than mikuspam.
Anonymous No.106327918 >>106328049
>>106327320
>>106327374
>I want to be able to run some "decent" models
For decent MoE models, VRAM is not that important; you'll need more RAM instead.
For conventional dense models, even a 3090 isn't that great, because decent ones will probably need multiple of those to fit into VRAM entirely; with only one you'll have to run with partial CPU offloading one way or another.
For absolute poorfag setups, even a tiny 4B model running on CPU could be considered 'decent'.

It's a matter of trade-offs in the end: how slow a generation speed you can tolerate, how low a quantization quality you're going to run, etc.
I'm sitting here with a 6600xt myself (8GB of VRAM); something like qwen3 30b-a3b runs 'fine'.

But for really peak stuff, there's no budget options.
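For a rough idea of what fits where, the back-of-envelope math is just parameters times bits-per-weight, plus some slack for context; a quick sketch (the bpw values and overhead are rough assumptions, not measured file sizes):

# back-of-envelope memory estimate: params(B) * bits_per_weight / 8 gives GB, plus slack for KV cache
# bpw values are rough averages for common GGUF quants, not exact numbers
def approx_size_gb(params_b: float, bpw: float, overhead_gb: float = 2.0) -> float:
    return params_b * bpw / 8 + overhead_gb

for name, params_b, bpw in [
    ("Qwen3 30B-A3B @ ~Q4_K_M", 30.5, 4.8),
    ("GLM-4.5-Air @ ~Q4_K_M", 106, 4.8),
    ("Qwen3 235B-A22B @ ~Q2_K", 235, 2.8),
]:
    print(f"{name}: ~{approx_size_gb(params_b, bpw):.0f} GB total (RAM + VRAM combined)")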
Anonymous No.106327933 >>106327966 >>106328001
>>106326973
Piggybacking off this: is Qwen/QwQ finally the go-to model in the <34B range now, with that 2507 version that dropped?

I was using Mistral (24B), the only shit I can run on my PC that was decent at ERP.
Anonymous No.106327966 >>106328035
>>106327933
QwQ hasn't been updated in ages, so it barely even makes use of its think process. Use 30b-a3b 2507 if you're a vramlet.
Anonymous No.106328001
>>106327933
Run q2 of the 235b instruct. It is possible with 64GB of RAM. Then, if you don't get bored, just buy a 192GB/256GB kit.
Anonymous No.106328035 >>106328124 >>106328217
>>106327966
I can also fit Qwen 3 32b on my rig at decent t/s. But I see more people talk about 30b-a3b, so I'll try that.

For that one, thinking vs instruct? I know how to get the thinking working (it was never good for ERP back when they first popped up months ago, but based on what I'm reading they're fine now).
Anonymous No.106328049
>>106327918
I see
Thank you anon!
Anonymous No.106328115 >>106328224
local loli models
Anonymous No.106328124 >>106328221
>>106328035
the instruct is much nicer to use
the qwen 3 thinking, even more so in the 2507 models, is more chatty at times than the original R1 release
nobody has got time to wait through all that before being able to read the answer
Anonymous No.106328217
>>106328035
the thinking ones are worth a try especially if you already have some experience wrangling reasoners, but it's probably a better idea to start with the instruct
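A lot of the "wrangling" is just not letting the reasoning pile up in context; a trivial sketch of stripping think blocks from a reply before it goes back into history (the <think>...</think> tag format is an assumption, check what your model emits):

import re

def strip_think(reply: str) -> str:
    # drop the <think>...</think> block so it never gets fed back into the chat history
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

print(strip_think("<think>plotting the scene...</think>She smiles and waves."))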
Anonymous No.106328221 >>106329496
>>106328124
what prompts/context/instruct are you using on the instruct version of the model, then? I find it's a bit too schizo (because I have prompts designed for Mistral models, etc.)
Anonymous No.106328224
>>106328115
this
Anonymous No.106328350 >>106328428 >>106328507
>>106323471
Trust the science
Anonymous No.106328428
>>106328350
>internal dispute intensifies
Anonymous No.106328454
>>106323459 (OP)
>DeepSeek-V3.1-Base

Inferior at translating. It mixes English words into the translation. Haram!
Anonymous No.106328486
>>106324268
>scroll down randomly
>menstrual cycle phase benchmark
why do I even bother
Anonymous No.106328507 >>106328520
>>106328350
>AI = 0
Anonymous No.106328520
>>106328507
good one
Anonymous No.106328660 >>106328697 >>106328714 >>106329071
Who tf are "Elara" and "Kael" deepseek is talking about?
Anonymous No.106328697
>>106328660
They are friends of Aris Throrne. Everyone knows that.
Anonymous No.106328710
>>106328686
>>106328686
>>106328686
Anonymous No.106328714
>>106328660
> Elara
>he doesn't know
at some point, OAI polluted the public datasets. it's even in llama2.
Anonymous No.106328716
What Mistral tunes are the hot ones nowadays? Getting bored of Dan Personality engine (his master prompt sucks ass too hard on certain cards).

I've heard of these two ones:
MS3.2-24B-Magnum-Diamond
Codex-24B-Small-3.2

Both seemed pretty good (better than TDP imo) but I think the Codex one is a little better in how it understands prompts/character cards.
Anonymous No.106329071
>>106328660
>playing adventure on GLM-4.5
>be knight
>going to save Princess Elara
>join forces with Ser Elara
>need to get a magic pendant from Sister Elara at St. Elara in Oakhaven
>heard about it from the barmaid Elara
...
Anonymous No.106329496
>>106328221
I also find my 2507 instruct quant way too schizo. It baffles me that people praise it.
Anonymous No.106329565
>>106324819
I got a 32GB MI50, and although it has the power of a 3060 with weaker pp, the token generation is pretty good.
And it can even play games.