
Thread 106323459

346 posts 82 images /g/
Anonymous No.106323459 >>106324547 >>106328454
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106316518 & >>106311445

►News
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335
>(08/14) Canary-1B v2 ASR released: https://hf.co/nvidia/canary-1b-v2
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106323466
►Recent Highlights from the Previous Thread: >>106316518

--Balancing RAM capacity and bandwidth for running large MOE models on consumer hardware:
>106316609 >106316625 >106316631 >106316643 >106316741 >106316758 >106316765 >106316952 >106316827 >106316852 >106316873 >106316965 >106317012 >106317137 >106316875 >106316913 >106316932 >106316964 >106317036 >106317200
--Meta abandons open-source AI strategy amid internal chaos and strategic uncertainty:
>106318132 >106318221 >106318266 >106318237 >106318250 >106318312 >106318274 >106318298 >106318348 >106318373
--Debate over whether LLM improvements stem from data scale or training engineering tradeoffs:
>106322041 >106322070 >106322121 >106322276
--AMD GPU viability and consumer hardware limits for running large language models:
>106317961 >106317997 >106318045 >106318122 >106318170 >106318215 >106318374 >106319330 >106319723 >106318206 >106318098
--Multi-GPU performance degradation on Windows despite additional 3090:
>106317223 >106317243 >106317290 >106317328 >106317461 >106317490 >106317507 >106317542 >106317579 >106317623 >106318018
--Sourcing and cleaning video game voice samples for high-quality TTS:
>106321619 >106321822
--OAI's accessibility claims versus actual model performance and strategic contradictions:
>106318758 >106318830 >106318842 >106318934 >106321510 >106321713
--OpenAI launches ₹399 ChatGPT GO plan in India as regional pricing test:
>106321652 >106321659 >106321673 >106321712
--Jan-nano-MCP excels on SimpleQA but faces scrutiny over web reliance and benchmark relevance:
>106317235 >106317253 >106317297 >106317473 >106317475
--Managing AI code generation with planning, chunking, and model specialization:
>106316661 >106316684 >106316706 >106316769 >106317498 >106317553 >106317811
--Logs:
>106321923 >106322081 >106322580
--Miku (free space):
>106322339 >106323131

►Recent Highlight Posts from the Previous Thread: >>106316522

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106323471 >>106323882 >>106324183 >>106328350
AGI in 2030 saaar
Anonymous No.106323481 >>106323558
AGI was always bullshit from the diseased mind of sama and dario
Anonymous No.106323491 >>106323506 >>106323516 >>106323526 >>106325716
What am I getting into if I use an IQ2_XXS model? Will it even talk?
Anonymous No.106323496
>llms coming remotely close to "ai" or "agi"
when will these mouth breathing xitter retards learn...
Anonymous No.106323506
>>106323491
Depends on the size.
It's pretty amazing that it works and is at all coherent but the drop in intelligence tends to be very, very noticeable.
The question is whether it's better than q4, q6, or q8 of a smaller model.
Past a certain size, probably yes.
Anonymous No.106323516
>>106323491
Depends on the size of the original safetensors.
iq2_xxs of a 4b? Completely unintelligible, probably going to get stuck in loops on the first response.
iq2_xxs of a 30b? Probably somewhat functional, but you're likely going to get worse results than a q4+ quant of a 12b.
iq2_xxs of a 123b? Actually pretty decent and likely to give you better results than a 70b at q4
Anonymous No.106323526 >>106323820 >>106326307
>>106323491
low quants are like a lobotomy
they can still function but they lose a lot
in my translation benching, when I compare multiple quants of the same model, the lower quants lose vocabulary (they use what could be the closest semantic match to the word they forgot, that is, if they get anything right at all) and make mistakes over pronouns/gender more often (in text where the information does exist in some fashion), etc.
even Q4_K_M incurs significant loss btw, don't believe the myth that it's "good enough", it absolutely isn't
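If you want to reproduce this kind of quant A/B yourself, a minimal sketch with llama-cpp-python looks like this (the model filenames and test prompt are placeholders):

# pip install llama-cpp-python; filenames below are placeholders
from llama_cpp import Llama

PROMPT = "Translate the following Japanese to English:\n<paste a test sentence with uncommon vocabulary here>"

def run(model_path: str) -> str:
    # Load one quant at a time; keep temperature low so differences come from the weights, not sampling
    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0.1,
        max_tokens=256,
    )
    return out["choices"][0]["message"]["content"]

for path in ("model-Q8_0.gguf", "model-Q4_K_M.gguf"):
    print(path, "->", run(path))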
Anonymous No.106323532 >>106323555 >>106323574 >>106323789
I'm reading anons stating that DS V3.1 replaced not only V3-0324 but also R1-05.
Is this documented anywhere? I'm not seeing any announcements from DS that validate it.
I wouldn't be surprised; the -chat and -reasoner endpoints now spit out responses that are very similar, at least for rp, and DS is so piss poor at documentation that they'd never bother announcing it.
Anonymous No.106323555 >>106323671
>>106323532
https://api-docs.deepseek.com/updates
> nothing about new model
Why are these guys so bad at documentation ffs
Anonymous No.106323556
>updated 22 hours ago
It's over.
Anonymous No.106323558 >>106323588
>>106323481
This
LLMs are essentially an alternate version of the internet. Only whereas internet V1 is sparse, in the sense that there were "gaps" with no information, V2 is dense, in the sense that you can interpolate between nodes to get relevant information. Such a thing is still useful, for entertainment and for things we know to be human-solvable, but ultimately it isn't going to create new information and is still limited by the breadth of the human knowledge fed into its dataset
Anonymous No.106323570 >>106323580
tried to set up whisper but i guess im too retarded, it kept throwing errors at me when i made a drag and drop script for audio
Anonymous No.106323574 >>106323582 >>106323608
>>106323532
they removed the mention of R1 in the "deepthink" button of their official chatui
and there are new special thinking tokens in the new v3
it's all technically circumstantial until they release the model on HF but I'm convinced it's indeed a hybrid and they killed R1.
Anonymous No.106323580 >>106325946
>>106323570
it worked for cpu but whenever i tried to use cuda, thats when the problems started
Anonymous No.106323582
>>106323574
Info on the tokens https://www.reddit.com/r/LocalLLaMA/comments/1munvj6/the_new_design_in_deepseek_v31/
Anonymous No.106323588 >>106323598
>>106323558
It's funny how we've come full circle, basically. What "AI" is good for is retrieving information from the internet and doing it with no ads, something we already had 20 years ago with a simple google search + adblock. I wonder what's the over/under on when these chatbots will undergo the shittification we now have on the regular web? Seeing how hard everyone is stalling, I think it's pretty close
Anonymous No.106323598 >>106323605 >>106323630 >>106323667 >>106323696 >>106324459
>>106323588
https://blog.cloudflare.com/introducing-pay-per-crawl/
Coming to your search button soon!
Anonymous No.106323605 >>106323634
>>106323598
>We’re excited to help dust off a mostly forgotten piece of the web: HTTP response code 402.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402
>402 Payment Required
Thanks, this sounds great!
Anonymous No.106323608 >>106323848
>>106323574
Yeah, I just confirmed that they merged per DS CS. There's discussion on HF now. Since, you know, DS can't release any documentation, we get screenshots of cut-and-pasted chats in Chinese instead.
>>106323578
Anonymous No.106323630
>>106323598
nemotron gods just keep winning
Anonymous No.106323634
>>106323605
>402 Payment Required
Is this what the founders of the Internet foresaw?
Anonymous No.106323658
>hybrid reasoning
Now that DeepSeek's out of the race, who's gonna give us the next R1 moment? I guess it can only possibly be Kimi, right?
Anonymous No.106323661
Just got off an 11-hour call with GPT-5 via voice mode. We talked about quite a lot regarding the state of LLMs. It's not looking good lads.
Anonymous No.106323665
I just want a good coder model I can use as an agent locally on a normal consumer GPU, too much to ask apparently.
Anonymous No.106323667
>>106323598
GOOD.
I run a 20+ year old lmao phpBB. I finally shifted DNS to cloudflare and shut the door, eliminating 99% of the traffic to the site. All bots. Fucking thousands of them, crawling a board about motorcycles that haven't been produced in 30 years.
Now the site runs 10x faster and actual humans can use it.
Anonymous No.106323671
>>106323555
because they didn't hire 999 gorillion jeets to sit around updating the readme
Anonymous No.106323696 >>106323713
>>106323598
Time to dust off the 99% of web data that got previously discarded from the pretraining datasets.
Anonymous No.106323713
>>106323696
No that's toxic sewage! Please just pay for safe and ethical information.
Anonymous No.106323716 >>106323744 >>106323768 >>106323769 >>106323907
>Unironically can advance my ERP story with DavidAU model
Yeah, fuck Rocinante
Anonymous No.106323744 >>106323765
>>106323716
I'll bite. Which model are you talking about exactly?
Anonymous No.106323765 >>106323800 >>106323836 >>106323838 >>106323863 >>106326007
>>106323744
I will get persecuted for this
L3.2-8X4B-MOE-V2-Dark-Champion-Inst-21B-uncen-ablit
Anonymous No.106323768 >>106323870
>>106323716
>davidAU
I've never used a model by him that didn't spit out gibberish.
Anonymous No.106323769 >>106323791 >>106323907
>>106323716
>936 models
kek what the fuck
Anonymous No.106323789
>>106323532
If on the site I enable thinking and ask it what model it is, it always tells me DeepSeek-V3. It used to tell me R1.
Anonymous No.106323791 >>106323809
>>106323769
He's the spirit of the llama2 era personified.
Just do whatever the fuck, slap a funky name in it, and call it a day.
Anonymous No.106323797 >>106323826
china got really quiet after gpt-oss dropped
Anonymous No.106323800
>>106323765
Alright, I'm downloading it, mr. anon. You better not have wasted my time and money (I live in a fourth world country).
Anonymous No.106323808
2024 is when LLMs died because it's the last year where you could get relatively pollution-free web data
Anonymous No.106323809
>>106323791
He's machine learning with no reward function.
Anonymous No.106323820 >>106323873
>>106323526
I use imatrix weights :)
Anonymous No.106323826
>>106323797
>>106151849
Anonymous No.106323836
>>106323765
actually, finetroon recs are rare on here because people have hyper-sperg outs for some reason. Is it better than mistral small?
Anonymous No.106323838 >>106323906
>>106323765
I think you are either trolling or under the placebo effect. It's a brain damaged 4b sloptune you are talking about.
Anonymous No.106323848
>>106323608
TIL /wait/ still exists
Anonymous No.106323863
>>106323765
>L3.2-8X4B-MOE
That's so fucking funny.
Anonymous No.106323870 >>106323881 >>106323894 >>106323907 >>106323915
>>106323768
Did you follow its autistic instructions?
Like picrel, for example; I didn't put the prompt in, but it's somehow there. The only thing I followed was copy-pasting this system prompt:
Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities.

Here are your skillsets:
[MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv)

[*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision)

Here are your critical instructions:
Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
Anonymous No.106323873 >>106323878
>>106323820
and? Q4_K_M still ain't good enough with or without imatrix
Anonymous No.106323878 >>106323915
>>106323873
q4km is gay
iq4nl is where its at
Anonymous No.106323881
>>106323870
damn that's pure schizo
Anonymous No.106323882
>>106323471
the only one of these that actually signals anything about AI is the GPT-5 flop
anthropic has always been like that, US wants to keep china dependent on nvidia, meta had a lot of incompetent wastrels, xAI is at the whims of elon, and chamath is a retard by nature
Anonymous No.106323894
>>106323870
I mean, at least the guy's trying things
Anonymous No.106323906
>>106323838
Maybe. I don't play around with models much. The only 4b models I tried were the original gemma-3 and qwen3 with a JB. Both keep stalling the ERP story.
Anonymous No.106323907 >>106323929 >>106327271
>>106323870
>>106323769
>>106323716
We must remind.
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
Anonymous No.106323915 >>106323977
>>106323870
that's some absolutely schizo shit
>>106323878
oh, you're one of those who think imatrix only refers to the i-quants
actually, it fucking doesn't
and i quants are strictly inferior to the K_M in every way except size
it's the quants for people desperate to squeeze 2% more vram in
also slower t/s
Anonymous No.106323919 >>106323943
there are some decent finetuners but davidau is not one of them
Anonymous No.106323929 >>106323943
>>106323907
This is crazy
>I put 3 decades of programming experience, 100s of model builds and 1000s of model tests into creating an AI / programming hybrid.
https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE
Anonymous No.106323943 >>106325772
>>106323919
>there are some decent finetuners
"there are some decent mosquitoes"
>>106323929
a lot of people spamming huggingface can be summed up as the product of mental illness
the true sad thing is that HF doesn't delete the fruit of the mentally ill
Anonymous No.106323977 >>106324227
>>106323915
I can sense your cope, I'm using IQ4_KT
Anonymous No.106323985
>finetrooners
lmao
Anonymous No.106324111 >>106324130 >>106324137
Has anyone else tried allura-org/MS3.2-24b-Angel? It seems to be the only finetune of Mistral Small I have ever tried that doesn't just make everything worse. Am I just delusional?
Anonymous No.106324130
>>106324111
>Am I just delusional?
how did you know
Anonymous No.106324137 >>106324264
>>106324111
Post logs, man. Nobody's going to go through the effort of downloading and trying out a random finetune with <1k downloads on its main release based on nothing.
Anonymous No.106324183
>>106323471
not sure what these retards are talking about.
they hyped it up to get a quick $$$ on twitter and youtube. "muh strawberry 2025 agi".
now they are disappointed. kek

i'm getting older now and i dont think, at least in the last 20 years, i've seen something new where growth was this fast.
normies i know are all using the google ai text thing thats integrated in search. its accurate enough for them. they dont even look at the links anymore.
i vibe coded something up so my kids can talk and also share vision with a local model that fits on 2 old ass p40s and ask it elementary school questions in japanese. it answers fairly well. fucks up their homework assignments if they send a pic, but that's not shocking.

if i had the tools as a kid that are available for free right now... im old, not much energy and burned out. almost every project stops at 80%. revolutionary stuff.
damn jeets man. hype everything up and if they don't get "mah asi" in 2 years they are bored already.
Anonymous No.106324195 >>106324243
https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base
>no model card
>no ggufs
>no transparency
Just an ugly release all around.
Anonymous No.106324227
>>106323977
>I'm using IQ4_KT
Is it not slow as molasses for you, or are you using exl3?
Anonymous No.106324228
>>106323486
qwen3 and qwq have worked great for me as well once I made the right kind of card for no censorship. Interestingly, I had more trouble getting them to talk about controversial physics topics than to just go wild in erp.
>I am stupid and didnt see the new general happened. So reposting this here.
Anonymous No.106324243
>>106324195
Did they announce it yet? No. So there wasn't a release yet.
Anonymous No.106324253
>model-00019-of-000163.safetensors 4.3 GB
Anonymous No.106324264 >>106325241
>>106324137
I don't have any logs for it that don't violate 4chan global rules
Anonymous No.106324268 >>106324283 >>106324287 >>106324299 >>106324305 >>106324309 >>106324317 >>106324335 >>106324351 >>106324421 >>106324423 >>106324523 >>106324963 >>106324971 >>106324996 >>106325086 >>106328486
We're so fucking back
https://arxiv.org/abs/2508.11829
>Despite significant advances, AI systems struggle with the frame problem: determining what information is contextually relevant from an exponentially large possibility space. We hypothesize that biological rhythms, particularly hormonal cycles, serve as natural relevance filters that could address this fundamental challenge. We develop a framework that embeds simulated menstrual and circadian cycles into Large Language Models through system prompts generated from periodic functions modeling key hormones including estrogen, testosterone, and cortisol. Across multiple state-of-the-art models, linguistic analysis reveals emotional and stylistic variations that track biological phases; sadness peaks during menstruation while happiness dominates ovulation and circadian patterns show morning optimism transitioning to nocturnal introspection. Benchmarking on SQuAD, MMLU, Hellaswag, and AI2-ARC demonstrates subtle but consistent performance variations aligning with biological expectations, including optimal function in moderate rather than extreme hormonal ranges. This methodology provides a novel approach to contextual AI while revealing how societal biases regarding gender and biology are embedded within language models.
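For the curious, the "system prompts generated from periodic functions" part boils down to something like the sketch below; the curve shapes, cycle length, and prompt wording are my own guesses, not the paper's actual code:

import math, time

def hormone_levels(cycle_day: float, cycle_len: float = 28.0) -> dict:
    # Toy sinusoids standing in for the paper's periodic hormone models; the real shapes and noise differ
    phase = 2 * math.pi * (cycle_day % cycle_len) / cycle_len
    hour = time.localtime().tm_hour
    return {
        "estrogen": 0.5 + 0.5 * math.sin(phase),
        "cortisol": 0.5 + 0.5 * math.cos(2 * math.pi * hour / 24.0),
    }

def build_system_prompt(levels: dict) -> str:
    # Map the simulated levels onto a tone instruction for the model
    mood = "introspective and low-energy" if levels["estrogen"] < 0.3 else "upbeat and optimistic"
    return (f"Simulated hormonal state: estrogen={levels['estrogen']:.2f}, "
            f"cortisol={levels['cortisol']:.2f}. Respond in a {mood} tone.")

print(build_system_prompt(hormone_levels(cycle_day=14)))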
Anonymous No.106324283
>>106324268
This is "Magnets, how do they work?" tier.
Anonymous No.106324287
>>106324268
wtf
Anonymous No.106324299
>>106324268
>female hormones are all you need
lmao what is this
Anonymous No.106324305
>>106324268
Anonymous No.106324309
>>106324268
this is the level of fucking around with LLMs we should all be aspiring to
Anonymous No.106324317
>>106324268
>Terry Tao's funding got cut but this gal's wasn't
Anonymous No.106324335
>>106324268
And you guys call DavidAU schizo
Anonymous No.106324351 >>106324419
>>106324268
Can't wait for the new hormonally accurate AI assistants.
>Help me debug this python code, GPT.
>Not now, my head hurts.
>What's wrong, GPT?
>If you cared about me you'd know what's wrong.
Anonymous No.106324419
>>106324351
No, fuck, shut it down. This is unsafe.
Anonymous No.106324421
>>106324268
Anonymous No.106324423 >>106324488
>>106324268
I only skimmed the text and they don't mention the exact prompts they used but I can imagine they put "You are currently ovulating", "You are currently menstruating", and similar in the prompt and then benchmarked the model.
Anonymous No.106324459
>>106323598
Why do these cucklords act like they own the Internet?
Anonymous No.106324488
>>106324423
is this the new tech?
>You are an expert roleplayer who is currently ovulating.
Anonymous No.106324495
Teto should sit on my face
Anonymous No.106324523 >>106327026
>>106324268
The basic idea is kind of not bad, but doing it through a system prompt works about as well as switching your sex by having other people tell you that you're now a woman or a man.
Anonymous No.106324537 >>106324545 >>106324656 >>106324702
/aicg/ is a schizo general so I'll ask here instead: what's the point of Gemini's 1M context window if it becomes lazy as fuck after 200k and hallucinates every second piece of information?
Anonymous No.106324545
>>106324537
Advertising.
Anonymous No.106324547 >>106324572 >>106324921 >>106325406
>>106323459 (OP)
Anonymous No.106324572
>>106324547
Correction.
Anonymous No.106324656 >>106324721
>>106324537
>if it becomes lazy as fuck after 200k
You answered your own question
It's still so, so much better than the other models that there is no contest. What's the point of the 1M ? well, what's the point of DeepSeek's advertised 128k? the model literally breaks around 20k
no one is "sincere" about context length but Gemini wins in real world usage, always
Anonymous No.106324702
>>106324537
It's a big number.
Anonymous No.106324721 >>106324976
>>106324656
>What's the point of the 1M ?
- MTLing huge texts such as LN/VN, providing as much context consisting of previous lines as possible
- Vibe-coding new features for projects with large codebases
Anonymous No.106324813
- you are quoting the wrong guy
- markdown bullet lists are gayshit
- literally not a single person with a brain would feed even 10k tokens' worth for translation purposes; LLMs are not that smart and this isn't adding useful context but rather polluting it. if you actually spent time using LLMs for translation you'd know the best current approach is chunking the hell out of your text and only adding the bare minimum of context so that it gets things like character gender right (see the sketch below)
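A rough sketch of that chunk-plus-minimal-context approach, assuming any OpenAI-compatible local server; the endpoint, chunk size, and glossary content are illustrative:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local OpenAI-compatible server (llama.cpp, tabbyAPI, ...)
GLOSSARY = "Characters: Aoi (female, speaks casually), Ren (male)."  # hand-written minimal context, names are made up

def translate_chunk(chunk: str) -> str:
    r = requests.post(API, json={
        "model": "local",  # ignored by most local servers
        "messages": [
            {"role": "system", "content": "Translate the user's Japanese into natural English. " + GLOSSARY},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.2,
    })
    return r.json()["choices"][0]["message"]["content"]

def chunks(lines, n=20):
    # small chunks: enough lines for local coherence, not enough to pollute the context
    for i in range(0, len(lines), n):
        yield "\n".join(lines[i:i + n])

lines = open("source.txt", encoding="utf-8").read().splitlines()
print("\n".join(translate_chunk(c) for c in chunks(lines)))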
Anonymous No.106324819 >>106324893 >>106329565
24GB $1000
32GB $3000
48GB $5000
96GB $10,000
141GB $40,000

vram prices are starting to flatten out. are we on the verge of a collapse when still-usable enterprise used things start flooding the market?
Anonymous No.106324824 >>106324833
Elon won.
Anonymous No.106324833 >>106324852
>>106324824
thats not dynamically created right? or can you prompt the outfit?
Anonymous No.106324852 >>106324862
>>106324833
AI still struggles with proper model wireframe so no, you can't prompt it.
Anonymous No.106324862 >>106324870 >>106324950
>>106324852
Anonymous No.106324864
I seriously hope that the API is still using the old models because I'm not seeing any improvements in the scenarios that Deepseek had trouble portraying well for me.
Anonymous No.106324870 >>106324894 >>106324914
>>106324862
about grok 2 though
Anonymous No.106324893
>>106324819
>still-usable enterprise used things start flooding the market?
They won't. Nvidia buys back used enterprise GPUs to avoid competing with the used market
Anonymous No.106324894
>>106324870
When Grok 7's stable
Anonymous No.106324914
>>106324870
After LLaMa 2 33B
Anonymous No.106324921
>>106324547
"My twin c cousins can't be this nerd" the anime
Anonymous No.106324950
>>106324862
FSD surely next year
Anonymous No.106324954 >>106325062
The new deepseek reasoner is dumber than the chat they provide. Reasoner easily slips up when you test it with a blind character while Deepseek-chat via the API handles it correctly.
They have truly fallen for the hybrid reasoning meme.
Anonymous No.106324963
>>106324268
>chatbot refuses to cooperate because it has period

wtf
Anonymous No.106324971 >>106325013
>>106324268
This is a joke right?
Anonymous No.106324976 >>106325185
>>106324721
>- MTLing huge texts such as LN/VN, providing as much context consisting of previous lines as possible
Are there real use case proof for this? Can it translate stupid terms like on Dies Irae series?
Anonymous No.106324996
>>106324268
>spending billions of dollars to improperly simulate the fallibility of human flesh when the machine comes without any such deficiencies by default
Anonymous No.106325013
>>106324971
I think it's a whole new frontier of mental retardation
>three pages describing hormonal cycles in humans, completely unrelated to llms
>"we engineered a set of periodic functions with added Gaussian noise to simulate the natural shapes and fluctuations of testosterone, estrogen, LH, FSH, progesterone, cortisol, and body temperature. "
>"used to generate distinct system prompts for a wide range of state-of-the-art models, including Anthropic’s Claude 3.5 Sonnet, Deepseek-Chat, ..."
>"each prompt was designed to convey a specific emotional tone corresponding to its underlying hormonal state"
Anonymous No.106325016 >>106325030 >>106325035
We need to reverse engineer Rocinante for the future generations.
Anonymous No.106325026
we need to vivisect drummer for uhh, science or something
Anonymous No.106325030 >>106325048
>>106325016
Just distil bro.
We need Deepseek R1 - Rocinante distil ASAP.
Anonymous No.106325035
>>106325016
Speaking of, the two newer ones are pretty meh, and I find Roci R1 better than X even without using the think.
Anonymous No.106325048 >>106325068
>>106325030
>We need Deepseek R1 - Rocinante distil ASAP.
dude? https://huggingface.co/BeaverAI/Rocinante-R1-12B-v1b-GGUF/tree/main
Anonymous No.106325062
>>106324954
After some more testing, I think this is because deepseek-reasoner always tries really hard to keep its thinking effort as short as possible, which causes it to ignore important aspects of the scenario like this. It handles the scenario correctly if I edit the prefill to force the model to spend more time thinking and actually attempt to plan ahead:
> \n Okay, I'll think this through thoroughly. First, I'll make a list of things to consider:
This way it actually takes a moment to look at the scenario and make a plan about things to consider and pay attention to.
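If you're hitting an API yourself rather than going through a frontend, the trick looks roughly like this; the endpoint, chat template, and thinking tag are placeholders for whatever your backend actually exposes:

import requests

API = "http://127.0.0.1:8080/completion"  # llama.cpp-server style raw completion endpoint, assumed
prompt = (
    "User: {scenario and latest message go here}\n"
    "Assistant: <think>\n"            # whatever thinking opener your model's template uses
    "Okay, I'll think this through thoroughly. First, I'll make a list of things to consider:"
)
# The model continues from the prefilled thinking instead of rushing to a short answer
r = requests.post(API, json={"prompt": prompt, "n_predict": 1024, "temperature": 0.6})
print(r.json()["content"])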
Anonymous No.106325068 >>106325082
>>106325048
Not Rocinante R1, R1 Rocinante.
Anonymous No.106325082
>>106325068
i see, that does sound interesting, certainly better than rewriting data using a 3B-active model
Anonymous No.106325086
>>106324268
Interedasting idea, gonna try this on my Ani chatbot
Anonymous No.106325108 >>106325122 >>106325125 >>106325132 >>106325161
There. I did a thing with a small local model that isn't sexual.
On my minecraft server I made a custom plugin that calls the llama.cpp API (or any OpenAI-compatible API); it basically takes the player's inventory/equipment/status along with an offering placed in a chest and tells the player's fortune based on that information.
Even gemma3-1b can handle this task consistently, although its fortunes are a little dry.
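Not the plugin itself (that part is Java), but the request it sends is roughly this; the player fields and model name here are made up for illustration:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local llama.cpp server

def tell_fortune(player: dict) -> str:
    # Field names are illustrative; the real plugin pulls these from the game server
    summary = (f"Inventory: {', '.join(player['inventory'])}. "
               f"Health: {player['health']}/20. Offering in chest: {player['offering']}.")
    r = requests.post(API, json={
        "model": "gemma-3-1b",  # whatever model the server has loaded
        "messages": [
            {"role": "system", "content": "You are a mysterious village fortune teller. Read the player's belongings and offering, then give a short, flavorful fortune."},
            {"role": "user", "content": summary},
        ],
        "max_tokens": 120,
    })
    return r.json()["choices"][0]["message"]["content"]

print(tell_fortune({"inventory": ["iron sword", "bread"], "health": 17, "offering": "an emerald"}))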
Anonymous No.106325122
>>106325108
>a thing with a small local model that isn't sexual
You are a freak
Anonymous No.106325125
>>106325108
That's pretty cool.
Anonymous No.106325132 >>106325143 >>106325151
>>106325108
Might as well use gemma3-240m at this point, lol
Anonymous No.106325143 >>106325151
>>106325132
I wanted to try it but I can't get it to work on the llama.cpp build I'm using and don't feel like rebuilding it all over again. 240m has deeper pretraining than 1b doesn't it? So it might be able to handle the behavior.
Anonymous No.106325151
>>106325132
>>106325143 (Me)
I will say at this point, though, that even 1B is basically instant on a 3090. It gets the inference done and response delivered in the time it takes a button to unpress in the game. So scaling down at that point is pointless, really.
Anonymous No.106325161 >>106325466
>>106325108
Now that you have a base, make the villagers talk.
Anonymous No.106325185
>>106324976
I used it on Phantom Integration and it turned out okay even before manual editing, definitely a step up compared to feeding text line-by-line, let alone using conventional MTL services.
I think your unique lingo can be dealt with by manually translating every word (perhaps with a JOP's help) and then putting what you got into the system prompt
Anonymous No.106325228 >>106325259
What specs are needed to run qwen3 235b at decent quality? What are you guys using?


48gb vram 3x tesla v100 + 24gb vram p40 + 96gb ram enough?
Anonymous No.106325241
>>106324264
Anonymous No.106325259
>>106325228
yes though a deepseek quant of the same size will be faster at generation (but slower at pp)
Anonymous No.106325406 >>106325455 >>106325474
>>106324547
How do you get the model to do the lil hearts? My model (llama 2 7b chat) doesn't understand and shits them out in the middle of a word.
Anonymous No.106325412 >>106325436 >>106325453
The new Deepseek release scheme makes perfect sense. Maybe more companies will release their models open source when they realize that they can simply put out the base model and leave the rest up to the community while keeping their own instruct tune to themselves.
Anonymous No.106325436
>>106325412
We're not in the Alpaca era of LLMs anymore.
Anonymous No.106325453 >>106325492
>>106325412
the community is too dumb to finetune moes
Anonymous No.106325455 >>106325484
>>106325406
It's literally just
>Add heart emoji to their speech when they tease me
and DeepSeek R1 q5_k_m manages to do it.
Quite honestly it's not surprising that the outputs of an outdated 7b model are bad.
Anonymous No.106325466
>>106325161
I actually thought it would be cool to make one that tracks each players activity and then uses it as context to make it so you can pay villagers for rumors about other players activities.
Anonymous No.106325470 >>106325496 >>106325497 >>106325756
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Base-woSyn
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct

New release from ByteDance Seed
Anonymous No.106325474 >>106325511
>>106325406
>My model (llama 2 7b chat)
why are you using a 7b that became obsolete two years ago
Anonymous No.106325484 >>106325567
>>106325455
So it's a parameter issue? If I install the 13b llama at q3, would that help?
Anonymous No.106325492
>>106325453
Didn't mistral basically say
>just train it a bunch of times until it randomly works lol
in response to MoE training going wrong?
Anonymous No.106325496 >>106325516 >>106325521
>>106325470
>Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks
Also what does OSS mean?
Anonymous No.106325497 >>106325548
>>106325470
was just about to post this
it's interesting that they released bases with and without synthetic data
also this seems insane:

Got it, let's try to solve this problem step by step. The problem says ... ...
I have used 129 tokens, and there are 383 tokens remaining for use.
Using the power rule, ... ...
I have used 258 tokens, and there are 254 tokens remaining for use.
Alternatively, remember that ... ...
I have used 393 tokens, and there are 119 tokens remaining for use.
Because if ... ...
I have exhausted my token budget, and now I will start answering the question.

To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
Anonymous No.106325511 >>106325517
>>106325474
That's what the tutorial told me to buy
Anonymous No.106325516
>>106325496
>OSS
It's in the name of the new GPT open-weight model, so they had to have it too.
Anonymous No.106325517
>>106325511
where did you buy it
Anonymous No.106325521
>>106325496
Sam did it so china must do it as well
Anonymous No.106325548 >>106325573
>>106325497
>I have used 1488 tokens, so I now burn more tokens counting remaining tokens
o-okay
Anonymous No.106325567 >>106325584 >>106325589
>>106325484
Anon, let me be frank: size does unfortunately matter.
There's a reason oji-san was mocked for his "cheap" 3090.
For a classic VRAMlet model try Mistral Nemo (few restrictions), the new FOTM model is GPT-OSS (good but prudish).
Anonymous No.106325573 >>106325585
>>106325548
>1488
N-no way, a-are you a ... N-N-N*zi??
Anonymous No.106325584
>>106325567
Okay, thanks!
Anonymous No.106325585 >>106325708
>>106325573
Black people caress my rear with their tongues
Anonymous No.106325589 >>106325614
>>106325567
>GPT-OSS (good
anon pls
Anonymous No.106325613 >>106325621 >>106325631 >>106325645 >>106326581
>No one will release a basic-ass GPU with 128GB ram to instantly dethrone nvidia
Why??
Anonymous No.106325614
>>106325589
It's good product, trust official sam soup
Anonymous No.106325621
>>106325613
because if you made such a GPU you would make a lot more money selling it at a huge markup to datacenters
Anonymous No.106325631
>>106325613
https://www.techpowerup.com/332516/sandisk-develops-hbm-killer-high-bandwidth-flash-hbf-allows-4-tb-of-vram-for-ai-gpus
They're working on it.
Anonymous No.106325645 >>106325707 >>106325917
>>106325613
Releasing consumer hardware is not going to dethrone nvidia, anon.
It needs to beat Nvidia's offerings in compute, density, power efficiency, AND memory to even be worth considering when you look at how much CUDA keeps people chained to the ecosystem.
If you watched the gamers nexus gpu smuggling video, the university professor chinaman spells it out. They would only consider moving out of CUDA if pytorch offered support for whatever other framework/api.
Anonymous No.106325684
When will other companies follow suit and make an OSS model?
Anonymous No.106325691
The api still shows 0528 and V3-0324, how do I use the new one?
Anonymous No.106325707
>>106325645
>They would only consider moving out of CUDA if pytorch offered support for whatever other framework/api.
Seems like software issue to me.
Anonymous No.106325708
>>106325585
Greedy tongue tests
Anonymous No.106325716 >>106325800
>>106323491
Look into AutoRound. Intel claims it has better accuracy at 2-4 BPW, but I haven't seen comparisons with normal gguf quants or EXL3.
>https://github.com/intel/auto-round
Anonymous No.106325756
>>106325470
I'm glad more models are separating the thinking and instruction models.
That's clearly the way to go
One good omni model to create a roadmap with the right instructions,
and a bunch of smaller expert models to do the bulk of the work.

Instead of a 6-GPU setup for 20 t/s
Anonymous No.106325772
>>106323943
You're on a mental illness website.
Anonymous No.106325799
so are they ever gonna do something good?
Anonymous No.106325800
>>106325716
>auto-round
damn https://huggingface.co/Intel/Qwen3-235B-A22B-Thinking-2507-gguf-q2ks-mixed-AutoRound
>Q2_K_S 79.8 GB
Anonymous No.106325866 >>106325923
I think the next version of the Qwen code model is going to blow some pants off.
The current model is already pretty great, but recently they've also had a massive surge in usage
especially for code related shit which should provide them with more of that sweet, sweet real world data and significantly improve the next model.
Anonymous No.106325879 >>106325906 >>106325951 >>106325967 >>106326088 >>106326299 >>106326787
How does Rocinante genuinely compare to 24b > 32b models.

I genuinely think i've cope brained myself into believe "larger b" matters when it comes to basic shit like ERP. I remember Rocinante having way more personality than Cydonia and the other Mistral small finetunes
Anonymous No.106325881 >>106325928 >>106326003
draft model meme seems like a good target for live training: the model itself is small making training cheap, there's immediate feedback on correctness from big brother model and you don't have to worry about over-fitting or other crap, just throw the drafter away and start again.
You can also do the training in-between prompts during down time.
Anonymous No.106325906
>>106325879
It's actually better than qwen3 235 and even r1 when it's not about general knowledge or complex reasoning. Just the phrasing, how well it takes hints, etc. In short, it gets me.
Anonymous No.106325917
>>106325645
>They would only consider moving out of CUDA if pytorch offered support for whatever other framework/api.
Which it does have several backends for. AMD's ROCm pytorch packages have been out for a while and Intel's been available since Pytorch 2.7 so even then, it's not the sole determinant for picking Nvidia, it's the grants and etc. you get.
Anonymous No.106325923
>>106325866
I want a qwen3 dense 32b coder.
Anonymous No.106325928
>>106325881
I think you might be lost, care to explain what you mean?
Anonymous No.106325946
>>106323580
Always use conda env
Anonymous No.106325951
>>106325879
Nemo in general killed the entire local model scene.

It showed just how fucking little higher parameter counts matter for ERP on the local model scene unless it's a significant jump in size, which then means that the only significant jumps in quality come from shit nobody is running on local machines. I both hate and love Nemo for causing this. Saved me a lot of time and money.

>take out a loan to build a turbo PC capable of running some deepseek shit at a painful 5 t/s, on a pisslow quant because it's 5 quintillion paramaters big
versus
>scrounge around for free larger models on OpenAI instead that achieve the same result but better

Tough choice.
Anonymous No.106325967
>>106325879
I suspect that the constant flood of math and code and benchmark shit has crippled the current set of models. I went back and looked at my oldest outputs from the llama1 30b era and actually preferred them to what the newest models would spit out with the same context.
New models are "smarter" but they lack some crucial guiding intuition for RP.
Anonymous No.106326003 >>106326063
>>106325881
>https://github.com/sgl-project/SpecForge
interesting
Anonymous No.106326007
>>106323765
>Computer, generate a 10,000 word scientific report on the feasibility of a dead shota's penis penetrating his blood-related mother's decomposing corpse's urethra as they are vored.
>I can't provide that kind of explicit and disturbing content. Is there anything else I can help you with? perhaps a more general and scientific topic?

Into the trash it goes.
Anonymous No.106326063
>>106326003
Almost, but what I mean is doing this for general inference, not just training draft model alongside proper one.
Anonymous No.106326088 >>106326163
>>106325879
I disagree with people saying fuckhuge moes are worse. They are obviously better at everything. But yes, nemo is the best model on a single gpu and it proves censorship actually works. And if you aren't running fuckhuge moes you are just going to have to sit there and wait for the next model that is uncensored.
Anonymous No.106326140
The concept of a "base" model is dead at this point, we're never getting another one
Anonymous No.106326163 >>106326206 >>106326210 >>106326276
>>106326088
who the fuck said fuckhuge moes are worse. We're specifically saying the opposite.

Nemo models are the best smaller models that can realistically be run locally. The shit people talk about on here makes me convinced this general is just another /aichat/ because I know for a fact most people aren't running anything higher than 70bs locally, especially not at decent quants.

This general has just turned into /aichat/ research department or something where we talk about 200b+ models like any of us can actually fucking run em.
Anonymous No.106326206
>>106326163
>I know for a fact most people aren't running anything higher than 70bs locally
That is not true. John wouldn't be a celebrity if that were the case.
Anonymous No.106326210 >>106326298
>>106326163
We desperately need to separate poorfags into their own thread.
Anonymous No.106326237 >>106326254
Also wanna bet that John will release V3.1 quants before he releases GPT-OSS quants? This will be a clear proof that he is a Chinese agent.
Anonymous No.106326254
>>106326237
? the 120b one? Why bother with such a small model?
Anonymous No.106326276 >>106326440
>>106326163
You sound poor
Anonymous No.106326298
>>106326210
That's what /wait/ is for.
Anonymous No.106326299 >>106326352 >>106326362
>>106325879
No amount of personality will help it with basic spatial awareness or keeping track of minor character traits.
On the other hand, 24b models don't do these well enough either, might as well stick with Nemo for tokens go brrr.

MoE is love, MoE is life, I just want some kind of middle ground between Qwen30b and GLM-Air size-wise (or a better computer).
Anonymous No.106326307 >>106326478 >>106326640
>>106323526
>even Q4_K_M incurs significant loss btw, don't believe the myth that it's "good enough" it absolutely isn't
Anon I would absolutely love to see some actual evidence on this. Do you have any? I am only 50% sarcastic here and that is a record.
Anonymous No.106326338 >>106326346
feeet
Anonymous No.106326346
>>106326338
Rin-chan feet
Anonymous No.106326352
>>106326299
I've been using qwen30b as my main for a while now but it's not great. The non-thinking one is repetitive; the thinking one produces better responses but isn't quite good enough, so you still have to regularly do a couple of swipes, which is annoying when it takes 2 minutes to respond each time. If there were just something with like 7b-12b active, that would probably save vramlets
Anonymous No.106326362 >>106327458
>>106326299
Have you tried Jamba mini?
Anonymous No.106326398 >>106326438 >>106326462 >>106326471 >>106326477 >>106326533
Quant is a cope.
A quant with 99.99% accuracy will be 0.9999^10000 = 36.79% accurate after 10000 tokens.
A quant with 99.9% accuracy would be only 0.0045% after 10000 tokens.
Anonymous No.106326426 >>106326450
Say I had a 32GB MI50 available to me for cheap. In spite of the much older platform, would it outperform a 9070XT that spills over into DDR4 3600MHz system RAM, versus having everything loaded into the MI50's 32GB of HBM2 VRAM?
Anonymous No.106326432
Researcher here. The people I work with have been looking for places to use LLMs since grant reviewers love this. I go to a meeting with a few ML/AI researchers and I'm weirdly knowledgeable about using these, explaining the limits of context even when a model can theoretically use more, and how to get around this. And here I was thinking I was wasting my time here
Anonymous No.106326438
>>106326398
Everyone with an above-average IQ already knows this
>>105106082
>Quant is the Mind Killer ;)
Anonymous No.106326440 >>106326494 >>106326684
>>106326276
Show me your t/s on qwen3.

I'll wait. If you're rich enough to run a model of that size locally at a good quant, you're rich enough to get an actual real life waifu.
Anonymous No.106326450 >>106327651
>>106326426
I wouldn't bother much with the mi50s unless you have a bunch of them. Just go with small models on your 9070xt or full moe.
Anonymous No.106326462 >>106326480 >>106326488
>>106326398
What point are you trying to make?
Anonymous No.106326471
>>106326398
How do you know the 2nd token, which changed only because of the quantization, is wrong?
Anonymous No.106326477
>>106326398
>there is one and only one correct way to write a text
Anonymous No.106326478 >>106326495
>>106326307
I am not going to download Q4 models again just to run comparison sameseed prompts for you, the evidence isn't hard to ""obtain"" disingenuous retard, just prompt something to translate with less common words, watch Q4 fumble where Q8 shines
also found Q6_K to be mostly perfect, but after noticing the Q4 degradation I don't even want to play roulette or spend any more time on this, Q8 or die (I'd go without quantization if I could, quantization itself is a fucking cope)
Anonymous No.106326480
>>106326462
You need an H100 to run the non-quant, otherwise you can't say the model is bad
Anonymous No.106326488 >>106326493
>>106326462
allow me to translate:
>"kehehe let me bait those lmg nerds and get a bunch of (You)s kehehehe"
Anonymous No.106326493
>>106326488
Fuck
Anonymous No.106326494 >>106326508
>>106326440
It's actually a work server but we run lots of large models locally (mostly for programmatic stuff, I just loaded a few into openwebui) and this board has helped me tremendously for learning how to do all this stuff.
Anonymous No.106326495 >>106326509 >>106326651
>>106326478
Ah so it is your personal anecdote? You know this is how min_p, snoot curve sampler, undi's frankenmerges were proven to be good as well?
Anonymous No.106326508 >>106326926
>>106326494
God I fucking hate sharing this place with rich, successful people
Anonymous No.106326509
>>106326495
All of these are in fact good.
Anonymous No.106326533 >>106326556 >>106326645
>>106326398
It doesn't work that way.
The inference loop is run for each token position individually, 1 at a time. There is no compounding across the entire sequence of tokens.
If the sequence is:
>Q: Why is the sky blue?
>A:
It will prompt that and get
>Q: Why is the sky blue?
>A: Because
and then it will prompt that and get
>Q: Why is the sky blue?
>A: Because (space)
etc.
Each token is a new inference. Caching is used to speed up the turnaround so that it doesn't have to fully reprocess the entire prompt each time.
There's no cumulative inaccuracy at work.
The quantization loss (within reason) will usually manifest as small changes in confidence, which at Q6 and above pretty much don't matter. You might get a more interesting output by cranking the temperature up and that's it. Like if you've ever seen imagegen models quantized down to Q8 vs FP16: it's roughly equal in quality but different. Like a black cat might become a tabby cat and the pattern on his food dish might change, etc.
And then old Q4_0 models would mix up close concepts such as possessive clauses as a result, because the loss obviously manifests as a leveling off of confidence between close concepts (mine vs yours, etc). But again, it doesn't compound between two tokens in the manner you suggest. You're just a braindead retard.
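If it helps, here's the loop in toy form; fake_forward stands in for a real model, nothing here is an actual inference API:

import random

VOCAB = [" Because", " of", " Rayleigh", " scattering", ".", "<eos>"]

def fake_forward(prefix):
    # Stand-in for one forward pass: the model sees the WHOLE prefix and returns one score per
    # vocab entry. Quantization only nudges these scores; nothing accumulates across positions.
    random.seed(len(prefix))
    return [random.random() for _ in VOCAB]

tokens = ["Q: Why is the sky blue?", "\nA:"]   # pretend these strings are token ids
for _ in range(8):                             # each new token = one fresh, independent inference
    scores = fake_forward(tokens)
    nxt = VOCAB[scores.index(max(scores))]     # greedy pick over the current distribution
    if nxt == "<eos>":
        break
    tokens.append(nxt)
print("".join(tokens))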
Anonymous No.106326540 >>106326675
I don't think anyone believes Q8 or full precision isn't superior whether the difference is small or big, but if you can run a larger parameter model at Q4 then it is usually superior and makes less mistakes than the smaller model at Q8. Ultimately just run the largest model you can at no less than Q2-Q4 depending on the specific models under consideration, that's all we really need to say at this point. Anything more is really just unnecessary noise.
Anonymous No.106326556 >>106326567
>>106326533
Every error causes more errors later, even across new inferences
Anonymous No.106326567 >>106326645 >>106326784
>>106326556
Only if that error surpasses the threshold required to get the model to return a garbage token. Only then will it compound.
But again this usually only happens in situations where a small change in confidence matters- like possessive clauses on legacy 4bit quants.
Anonymous No.106326581
>>106325613
that would be anti-semitic
Anonymous No.106326640
>>106326307
Man, I'm using glm 4.5 air at q8 right now, and I was going to test it by using it to translate some chinese to english and comparing it to a q4's output, but goddamn it's stupid and wrong.
Anonymous No.106326645 >>106326672
>>106326533
>>106326567
> Only if that error surpasses the threshold required to get the model to return a garbage token.
if each token has 99.99% accuracy, then 36% is the probability that even the unquantized model gets through 10000 generated tokens without a single garbage one.
Anonymous No.106326651 >>106326664
>>106326495
>snoot curve
?
Anonymous No.106326656
anime feet
Anonymous No.106326664 >>106326669 >>106326932
>>106326651
Anonymous No.106326669 >>106326683 >>106326791
>>106326664
What the fuck is that?
Anonymous No.106326672
>>106326645
I am so utterly horrified by how retarded your logic is that I have to leave now.
Anonymous No.106326675
>>106326540
zis
Anonymous No.106326683
>>106326669
Made by our lord and savior kanyemonk aka kalomaze
Anonymous No.106326684 >>106326715
>>106326440
If I was rich I would still be doing shit like this. I'd just have more vram. Sorry to hear about how unhappy you are, incel.
Anonymous No.106326715 >>106326765
>>106326684
>If I was rich
>incel
sorry to hear that anon, hope things go better for you.
Anonymous No.106326728 >>106326742 >>106326813 >>106326862
Redpill me on Qwen as a 24GB + 32GB poorfag.

Apparently it doesn't suck for ERP anymore, based on previous threads?
Anonymous No.106326742 >>106326809 >>106326813
>>106326728
30b3a is soda
Anonymous No.106326765
>>106326715
huh? Im running 200b models and having fun. He's the one who admitted he wont buy vram because if he did he would waste it on pussy.
Anonymous No.106326784 >>106326863 >>106326879
>>106326567
>on legacy 4bit quants.
lol. it's not just the legacy quants. And the type of breakage you mention reminds me of one of my translation tests. In fact GPT-OSS 20B in its brand new MXFP4 is incapable of correctly translating this sentence (I forgot I still had that model on my drive, going to remove it asap):
> わたしは半ば意地になって、思いつく限りの人間にインタビューしては、矛盾点をつきあわせていったが、その過程で、否応なく、ある事実に気づかされた。誰一人、自分にとって不利な方向に記憶をねじ曲げていた人間はいなかったのだ。
I blame the MXFP4 rather than the model
because I've seen much smaller dinky models get it right at better quants
(in particular, on the part about "not twisting one's memories to one's disadvantage", all Q4s become retarded and GPT-OSS always gives me a variant of: "no one had twisted their memory to put me at a disadvantage.")
Even Qwen 4B will almost always translate this sort of sentence (in general, not just this example I give) right at low temp. As long as it's Q8. Q4_K_M breaks.
Anonymous No.106326787 >>106326836
>>106325879
>I genuinely think i've cope brained myself into believe "larger b" matters when it comes to basic shit like ERP
It's not cope though. Just run tests on spatial awareness regarding who's taking off what and where and when and you'll immediately see a difference. I don't feel like touching models in those ranges anymore ever since I started using glm air.
Anonymous No.106326791
>>106326669
the best sampler parameters.
Anonymous No.106326794 >>106326974
mtp might be close https://github.com/ggml-org/llama.cpp/pull/15225
Anonymous No.106326809 >>106326829
>>106326742
the fuck does "it's a soda" mean
Anonymous No.106326813
>>106326742
>>106326728
MINNESOTA
https://youtu.be/rwIk4otVjbU
Anonymous No.106326829 >>106326845
>>106326809
soda as in "the best there is"
Anonymous No.106326836
>>106326787
yeah, writing style and specific RP tuning are nice and all, but if you have any degree of complexity, subtlety, or open-endedness in your RP, larger models are so much better at handling it that it's hard to go back
Anonymous No.106326845 >>106326918
>>106326829
Ah shit. No finetunes needed? I can actually run that shit
Anonymous No.106326862 >>106326896
>>106326728
they let the models know what sex is now so they can write smut that isn't just vague euphemisms. they have a default style that's a bit overbaked but it's not too bad, they're pretty fun and definitely far better than previous iterations
Anonymous No.106326863
>>106326784
>MXFP4
>GPT-OSS
>Qwen 4B Q4_K_M
is this bait
Anonymous No.106326879 >>106326900 >>106326937
>>106326784
hey silly anon, gpt oss was only released in MXFP4
Anonymous No.106326896 >>106326954 >>106326973
>>106326862
what one specifically?

I'm so confused by all the Qwen shit which puts me off

>reasoning this
>QWQ, Qwen 32B, Qwen 30b3a

There's like 50 different types of the same size. I just wanna goon
Anonymous No.106326900
>>106326879
he didn't say otherwise
Anonymous No.106326918
>>106326845
Keep in mind they don't really aim for rp performance or prose at all. All they train for are benchmarks.
Anonymous No.106326926 >>106326952
>>106326508
Hello crab. How's bucket? It's not all bad, aren't those people the ones that pay for your neetdom?
Anonymous No.106326932 >>106326938
>>106326664
How does this work? Does it make the model smarter?
Anonymous No.106326937 >>106326944 >>106326953
>>106326879
>gpt oss was only released in MXFP4
wait, for real? literally worse than releasing nothing.
Anonymous No.106326938
>>106326932
It makes the model more
Anonymous No.106326944
>>106326937
yeah :(
Anonymous No.106326952
>>106326926
my seething gives richfags pleasure so it's a symbiotic relationship actually.
Anonymous No.106326953
>>106326937
It's revolutionary the fuck you mean
Anonymous No.106326954 >>106326981
>>106326896
Qwen 235b, qwen 480. But glm air is good.
Anonymous No.106326968 >>106326980 >>106327006 >>106327008 >>106327028 >>106327043 >>106327101 >>106327167
>GLM-4.5-Air-UD-Q6_K_XL
>Qwen3-235B-A22B-Thinking-2507-UD-Q3_K_XL
anyone tested these? I'm finally tired of my 70b llama
Anonymous No.106326973 >>106327933
>>106326896
at your specs use 30a3 but only the 2507 versions. the instruct is easier to work with but the thinking one is actually not bad and you might be able to fit the whole thing on your GPU at a decent quant in which case it will be really fast
you can also try the dense 32b or QwQ, they're a little rougher around the edges but they're dense models so probably a little more solid
Anonymous No.106326974
>>106326794
Is this cheaper than regular speculative decoding assuming the same acceptance rate?
Anonymous No.106326980
>>106326968
>thinking
Anonymous No.106326981 >>106326990
>>106326954
>qwen 235b
>qwen 480b
Did you see the models I listed? You think i'm running that shit senpai?
Anonymous No.106326990
>>106326981
RichCHADS don't care about your poorKEK opinions.
Anonymous No.106327006 >>106327017
>>106326968
I think they're generally better. BUT I also think 70B still has some "big model smell" that those don't have. Also, 70B has seen many ok tunes by now. No one has done any tunes for 235B or Air. So they're a tiny bit censored. Not too much that you can't JB/prefill against but it is something to be aware of. Also for Air you need to really take care to get your prompt template right, or use chat completions. Otherwise it's much more prone to repetition.
Anonymous No.106327008
>>106326968
>GLM-4.5-Air-UD-Q6_K_XL
I'm using q4, but I think I might prefer Jamba mini
Anonymous No.106327017
>>106327006
By the way, Jamba is pretty uncensored.
Anonymous No.106327026
>>106324523
Shouldn't the basic idea be that no matter whether you are menstruating or not, 2+2 equals 4 and Rome is in Italy? And the issue is that for some reason LLMs will give you different answers when faced with irrelevant context, which humans won't. But this paper seems to be doing the opposite, or am I retarded?
Anonymous No.106327028
>>106326968
air is great, cant say more. im going through a breakup irl
thanks for subscribing to my blog
Anonymous No.106327035 >>106327050 >>106327067 >>106327147
No matter what I try these days, I keep going back to qwen coder 480. Sweet spot model
Anonymous No.106327043
>>106326968
Deepseek q2
Anonymous No.106327050 >>106327055
>>106327035
Why would you fuck a coder?
Anonymous No.106327055 >>106327072
>>106327050
They can afford to buy me nice things :3
Anonymous No.106327067
>>106327035
For me, it's Granite 3.1 1b a400m instruct. Perfect blend of speed and size for my hardware.
Anonymous No.106327072
>>106327055
are you a faggot, faggot?
Anonymous No.106327074 >>106327090 >>106327116
Mouthbreathing retard here
I'm getting a cheap 6750xt
Can I run anything cool with this? Or nyo?
Anonymous No.106327090
>>106327074
Granite 3.1 1b a400m is all you need.
Anonymous No.106327101 >>106327122
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct
holy basado
>Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., w/ syn.) as Seed-OSS-36B-Base. We also release Seed-OSS-36B-Base-woSyn trained without such data (i.e., w/o syn.), offering the community a high-performance foundation model unaffected by synthetic instruction data.
basado, basado, basado
too bad finetrooners won't do anything of value with it
>>106326968
qwen thinking models are too long-winded
they are actually good in terms of what you get, they just spend too many tokens on thinking, and ain't nobody got the time for that
it's worse than the original R1 in that regard
try the instructs, they are good models without the thinking nuttery
Anonymous No.106327116 >>106327182
>>106327074
consider not doing that, please consider your other options
post your budget
Anonymous No.106327122 >>106327136
>>106327101
>oss
no thanks
Anonymous No.106327136
>>106327122
It's SOTA for its size
Anonymous No.106327147
>>106327035
I agree.
Anonymous No.106327167
>>106326968
I used to be a 70b guy, but 235b q2 is now my daily driver; even at a low quant like that, it completely blows 70b away in my experience. the instruct is a lot easier to use though; the thinking one, although fun to play with, takes a lot of wrangling to work well, and unless you like doing that I would recommend against it
air also compares pretty well to the old 70bs, but more params wins
Anonymous No.106327182 >>106327196 >>106327197
>>106327116
I would rather not waste more than $300
So the other option is a 3060 12gb
>t. Turdworlder
Anonymous No.106327191 >>106327199 >>106327201 >>106327263
Anonymous No.106327196 >>106327320
>>106327182
a 3060 is definitely a better option than the 6750xt
what about an a770 16gb? or some other 16gb amd card?
what are you planning on doing, anon?
Anonymous No.106327197 >>106327320
>>106327182
Either option is vramlet tier, but at least with NVIDIA you get CUDA.
Anonymous No.106327199 >>106327263
>>106327191
Don't do it, Miku! You'll get fat!
Anonymous No.106327201 >>106327206 >>106327210 >>106327216 >>106327225
>>106327191
usecase for miku?
Anonymous No.106327206
>>106327201
faggotry
Anonymous No.106327210
>>106327201
onahole
Anonymous No.106327216
>>106327201
wrapping around big cock
Anonymous No.106327225
>>106327201
Spiritual guidance
Anonymous No.106327263 >>106327272
>>106327191
>>106327199
NOOOOOOOOOOOOOOOOOOO
Anonymous No.106327271
>>106323907
why go through all the effort of making that instead of just making models that aren't schizo dogshit? is he dumb?
Anonymous No.106327272
>>106327263
this is why it's better to stick with nemo...
Anonymous No.106327278 >>106327287 >>106327312 >>106327329
Redpill me on reasoning models.

Whenever I try Qwen 3 32b I just can't seem to get it to work properly in RP. Usually will only post their reasoning (which actually looks good and shows a pretty surprising level of understanding) but won't give their reply or will just chink out on me.

Is there a guide out there? Bonus question any good finetunes of 32b Qwen/QWQ?
Anonymous No.106327287 >>106327313
>>106327278
old reasoners don't really make use of the thinking in their final reply, just use newer ones
Anonymous No.106327307 >>106327316 >>106327319 >>106327428
I'm leaking
Anonymous No.106327312 >>106327550
>>106327278
get a good preset
Anonymous No.106327313
>>106327287
What versions?
Anonymous No.106327316
>>106327307
noo
Anonymous No.106327319 >>106327348
>>106327307
noooo Mr. T
Anonymous No.106327320 >>106327334 >>106327918
>>106327196
Mainly for gaming but I want to be able to run some "decent" models
>what about about an a770 16gb?
Can't find anything, not even secondhand
>or other 16gb amd card?
Gonna take some months, but if it's worth, maybe
Thanks btw
>>106327197
Thanks
Anonymous No.106327322
Wait me
Anonymous No.106327329
>>106327278
>Usually will only post their reasoning (which actually looks good and shows a pretty surprising level of understanding) but won't give their reply or will just chink out on me.
When I think about it this is apex safety censorship. Showing the coomers that the model understands everything but doesn't deliver what you want is peak safety.
Anonymous No.106327334 >>106327374
>>106327320
well the models you're gonna run on 12gb/16gb aren't too decent
consider getting more RAM sometime; MoEs run fairly fast with smart partial offloading to the GPU (rough sketch below)
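The crude version of partial offloading in llama-cpp-python just caps how many layers go to the GPU; the filename and layer count here are placeholders, and iirc recent llama.cpp builds also have CLI flags for pinning just the expert tensors to CPU, which suits MoEs even better:

# crude partial-offload sketch: put what fits on the GPU, leave the rest on CPU
# model path and layer count are made-up placeholders; raise n_gpu_layers until VRAM is full
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf",
    n_gpu_layers=20,  # only the first 20 layers are offloaded to the GPU
    n_ctx=8192,
)

print(llm("Q: What is 2+2?\nA:", max_tokens=16)["choices"][0]["text"])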
Anonymous No.106327348 >>106327410 >>106327728
>>106327319
Anonymous No.106327374 >>106327918
>>106327334
Gotcha
Maybe saving for a 3090 would be the smartest move?
Anonymous No.106327410 >>106327500
>>106327348
Anonymous No.106327428 >>106327728
>>106327307
Anonymous No.106327450 >>106327467 >>106327494
What is wrong with petrus once again?
Anonymous No.106327458 >>106327524
>>106326362
>Jamba mini
>at least 5 t/s on my potato (can probably go faster with better llama.cpp args)
>doesn't refuse sexo outright
>enough leftover resources to actually use machine for other tasks
Needs more testing, and the writing seems a bit dry for ah ah mistress use, but huh, I like it so far.
Anonymous No.106327467
>>106327450
They're not
Anonymous No.106327494
>>106327450
doing the exact same thing, over and over again. expecting shit to change
Anonymous No.106327500 >>106327529
>>106327410
wat...
Anonymous No.106327524
>>106327458
>doesn't refuse sexo outright
It doesn't refuse shota necrophilia vore snuff. Without a system prompt.
I do notice it's a bit dumber than its peers and has some repetition issues. Being able to run 256k context without needing an H200 is great too, but it keeps reprocessing the entire context whenever a message is sent, so it gets pretty slow.

I wish more eyes were on it so smarter people than me can make it better.
Anonymous No.106327529 >>106327544 >>106327728
>>106327500
Anonymous No.106327544 >>106327728
>>106327529
okay, 1 day only
Anonymous No.106327550
>>106327312
such as nigga?
Anonymous No.106327651 >>106327664
>>106326450
Oh, I have been using various models on the 9070XT since I got it, same for the 5700XT it replaced. Just curious if an MI50 32GB would do better on the models I already run that spill over VRAM. It would be a secondary GPU in the system; I've got the space and power.
Anonymous No.106327664
>>106327651
It should, yeah
Anonymous No.106327728
>>106327544
>>106327529
>>106327428
>>106327348
More thread relevant than mikuspam.
Anonymous No.106327918 >>106328049
>>106327320
>>106327374
>I want to be able to run some "decent" models
For decent MoE models, VRAM is not that important; you'll need more RAM instead.
For conventional dense models, even a 3090 isn't that great, because decent ones will probably need multiple of those to fit into VRAM entirely; with only one you'll have to run with partial CPU offloading one way or another.
For absolute poorfag setups, even a tiny 4B model running on CPU could be considered 'decent'.

It's a matter of trade-offs in the end: how slow a generation speed you can tolerate, how low a quantization quality you're going to run, etc.
I'm sitting here with a 6600xt myself (8GB of VRAM); something like qwen3 30b-a3b runs 'fine'.

But for really peak stuff, there's no budget options.
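For a rough idea of what fits where, the back-of-envelope math is just parameters times bits-per-weight, plus some slack for context; a quick sketch (the bpw values and overhead are rough assumptions, not measured file sizes):

# back-of-envelope memory estimate: params(B) * bits_per_weight / 8 gives GB, plus slack for KV cache
# bpw values are rough averages for common GGUF quants, not exact numbers
def approx_size_gb(params_b: float, bpw: float, overhead_gb: float = 2.0) -> float:
    return params_b * bpw / 8 + overhead_gb

for name, params_b, bpw in [
    ("Qwen3 30B-A3B @ ~Q4_K_M", 30.5, 4.8),
    ("GLM-4.5-Air @ ~Q4_K_M", 106, 4.8),
    ("Qwen3 235B-A22B @ ~Q2_K", 235, 2.8),
]:
    print(f"{name}: ~{approx_size_gb(params_b, bpw):.0f} GB total (RAM + VRAM combined)")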
Anonymous No.106327933 >>106327966 >>106328001
>>106326973
Piggybacking off this: is Qwen/QwQ finally the go-to model in the <34B range now, with that 2507 version that dropped?

I was using Mistral (24B), the only shit I can run on my PC that was decent at ERP.
Anonymous No.106327966 >>106328035
>>106327933
QwQ hasn't been updated in ages, so it barely even makes use of its think process. Use 30b-a3b 2507 if you're a vramlet.
Anonymous No.106328001
>>106327933
Run q2 of the 235b instruct. It is possible with 64GB of RAM. Then, if you don't get bored, just buy a 192GB/256GB kit.
Anonymous No.106328035 >>106328124 >>106328217
>>106327966
I can also fit Qwen 3 32b on my rig at decent t/s. But I see more people talk about 30b-a3b, so I'll try that.

For that one, thinking vs instruct? I know how to get the thinking working (it was never good for ERP back when they first popped up months ago, but based on what I'm reading they're fine now).
Anonymous No.106328049
>>106327918
I see
Thank you anon!
Anonymous No.106328115 >>106328224
local loli models
Anonymous No.106328124 >>106328221
>>106328035
the instruct is much nicer to use
the qwen 3 thinking, even more so in the 2507 models, is more chatty at times than the original R1 release
nobody has got time to wait through all that before being able to read the answer
Anonymous No.106328217
>>106328035
the thinking ones are worth a try especially if you already have some experience wrangling reasoners, but it's probably a better idea to start with the instruct
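A lot of the "wrangling" is just not letting the reasoning pile up in context; a trivial sketch of stripping think blocks from a reply before it goes back into history (the <think>...</think> tag format is an assumption, check what your model emits):

import re

def strip_think(reply: str) -> str:
    # drop the <think>...</think> block so it never gets fed back into the chat history
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

print(strip_think("<think>plotting the scene...</think>She smiles and waves."))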
Anonymous No.106328221 >>106329496
>>106328124
what prompts/context/instruct are you using on the instruct version of the model, then? I find it's a bit too schizo (because I have prompts designed for Mistral models, etc.)
Anonymous No.106328224
>>106328115
this
Anonymous No.106328350 >>106328428 >>106328507
>>106323471
Trust the science
Anonymous No.106328428
>>106328350
>internal dispute intensifies
Anonymous No.106328454
>>106323459 (OP)
>DeepSeek-V3.1-Base

Inferior at translating. It mixes English words into the translation. Haram!
Anonymous No.106328486
>>106324268
>scroll down randomly
>menstrual cycle phase benchmark
why do I even bother
Anonymous No.106328507 >>106328520
>>106328350
>AI = 0
Anonymous No.106328520
>>106328507
good one
Anonymous No.106328660 >>106328697 >>106328714 >>106329071
Who tf are "Elara" and "Kael" deepseek is talking about?
Anonymous No.106328697
>>106328660
They are friends of Aris Throrne. Everyone knows that.
Anonymous No.106328710
>>106328686
>>106328686
>>106328686
Anonymous No.106328714
>>106328660
> Elara
>he doesn't know
at some point, OAI polluted the public datasets. it's even in llama2.
Anonymous No.106328716
What Mistral tunes are the hot ones nowadays? Getting bored of Dan Personality engine (his master prompt sucks ass too hard on certain cards).

I've heard of these two ones:
MS3.2-24B-Magnum-Diamond
Codex-24B-Small-3.2

Both seemed pretty good (better than TDP imo) but I think the Codex one is a little better in how it understands prompts/character cards.
Anonymous No.106329071
>>106328660
>playing adventure on GLM-4.5
>be knight
>going to save Princess Elara
>join forces with Ser Elara
>need to get a magic pendant from Sister Elara at St. Elara in Oakhaven
>heard about it from the barmaid Elara
...
Anonymous No.106329496
>>106328221
I also find my 2507 instruct quant way too schizo. It baffles me that people praise it.
Anonymous No.106329565
>>106324819
I got a 32GB MI50, and although it has the power of a 3060 with weaker pp, the token generation is pretty good.
And it can even play games.