← Home ← Back to /g/

Thread 107104115

382 posts 90 images /g/
Anonymous No.107104115 [Report] >>107105550 >>107106025 >>107107561 >>107109745
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107095114 & >>107084067

►News
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni
>(10/31) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.107104116 [Report] >>107104221 >>107104379 >>107104510
►Recent Highlights from the Previous Thread: >>107095114

--Paper: Contradictory learning rate effects on model generalization across architectures:
>107099513 >107099560 >107099570 >107099601 >107099730 >107099637 >107099968 >107100075 >107100108 >107100193
--Papers:
>107099379
--Challenges and solutions for multimodal AI with reinforcement learning:
>107096665 >107096697 >107096703 >107096724 >107096748 >107096767 >107096817 >107096853 >107096880 >107096942 >107096859
--Comparing Gemma and Qwen models for context handling and multimodal capabilities:
>107100070 >107100082 >107100096 >107100113 >107100095 >107100103 >107100109 >107100149
--Model selection and document handling strategies for chat systems:
>107103148 >107103182 >107103216 >107103230 >107103748 >107103674
--LangChain tool development and licensing debates for AI research project:
>107096233 >107096389 >107096407 >107096431 >107096460 >107096484 >107096542 >107096601 >107097032
--Hardware-limited LLM recommendations for RPG GMing:
>107097189 >107097219 >107097226 >107097481 >107097496 >107097561 >107097660 >107097756 >107097801 >107097878 >107097895 >107097921 >107097935 >107097938
--Qwen3-VL 4B Instruct recommended for lightweight document summarization:
>107096666 >107096930
--Developing a CLI assistant for programming and document tasks:
>107095800 >107095844
--Critique of Suno AI and anticipation for open source music generation models:
>107097235 >107097263 >107097331 >107097476
--Censorship comparison between GLM 4.6 and Kimi models:
>107096584 >107098032 >107098080 >107098100 >107098139
--Logs: Qwen3-VL-32B-Instruct-Q6_K.gguf:
>107101310 >107101377 >107101413
--Logs: Qwen3-VL-30B-Abliterated-Q8:
>107100158 >107100179 >107100200 >107100236 >107100497 >107100659 >107100583 >107100630 >107100610
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>107095119

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.107104139 [Report] >>107104155
Anonymous No.107104155 [Report] >>107104215
>>107104139
I reject Death, therefore I am become immortal.
Anonymous No.107104215 [Report]
>>107104155
>I am not asking for your opinion, I am telling you what we are doing next.
Finally, dommy mommy achieved locally. It's somehow so hard to break an LLM's inclination to be commanded and dominated
Anonymous No.107104221 [Report] >>107104243
>>107104116
Teto is flat, this is haram
Anonymous No.107104228 [Report]
Tetolove
Anonymous No.107104243 [Report]
>>107104221
It's just a cosplayer in Teto costume
Anonymous No.107104330 [Report]
>>107102554
i hope this was just bait, but in case it wasn’t, you don’t need a 3090 to fine-tune an 8B QLoRA, you can literally do it for free using Google Colab or Kaggle.
Anonymous No.107104373 [Report] >>107105022
>>107104087
>How is Josiefied-Qwen3? I was looking for something that could fit in 16GB GPU
finetroons: not even once.
Anonymous No.107104379 [Report] >>107104512
>>107104116
Anonymous No.107104496 [Report] >>107107092
Best model for 67GB VRAM?
Anonymous No.107104510 [Report]
>>107104116
Teto's tetons
Anonymous No.107104512 [Report]
>>107104379
There's no way those are normal salivary glands
Does she piss from her tongue?
Anonymous No.107104552 [Report] >>107104587
>>107103574
Get off 4chan and go back to the coal mines wagie
Anonymous No.107104587 [Report] >>107104729
>>107104552
Get off 4chan and go back to the gulags, lumpen
Anonymous No.107104680 [Report] >>107104693 >>107104699 >>107104707 >>107104717 >>107104720 >>107104733 >>107104960 >>107105527 >>107107367 >>107112198 >>107112262 >>107113525
new benchmark dropped
https://openai.com/index/introducing-indqa/
Anonymous No.107104693 [Report]
>>107104680
No way, it's real
Anonymous No.107104699 [Report]
>>107104680
I would have expected this to come from Google first.
Anonymous No.107104707 [Report]
>>107104680
holy shit we are so back
Anonymous No.107104717 [Report] >>107108726
>>107104680
sirs... we wined
Anonymous No.107104720 [Report]
>>107104680
heh
Anonymous No.107104729 [Report] >>107107383
>>107104587
>gulags
>lumpen
All your plans failed tankie, if you want to end capitalism the best way is to do nothing collectively and let it fall without the workers holding it together and reinvent the model of primitive communism and tribal sharing for a new era with future ai post-scarcity after picking up the pieces. Or you can just keep suffering. It doesn't necessarily impact me either way I guess.
Anonymous No.107104733 [Report]
>>107104680
>saars
>do the needful and top the leaderboard saars
Anonymous No.107104960 [Report]
>>107104680
amazing sirs...
Anonymous No.107104965 [Report] >>107104977 >>107106648
Probably has been posted more than once already https://www.youtube.com/watch?v=-gGLvg0n-uY
Also, do you think the whole thing about twitter being infested by bots is spread on purpose to prevent people from communicating, discussing, complaining on twitter? Should I take my meds?
Anonymous No.107104977 [Report]
>>107104965
>Probably has been posted more than once already
yes
>Also, do you think the whole thing about twitter being infested by bots is spread on purpose to prevent people from communicating, discussing, complaining on twitter?
yes
>Should I take my meds?
yes
Anonymous No.107104984 [Report] >>107104996
>most intimate place
Real talk, why does every model have this? Even the new GLM 4.6 has it.
Anonymous No.107104996 [Report] >>107105010 >>107105057
>>107104984
Training data from other model's output. How do you not know this?
Anonymous No.107105010 [Report] >>107105037 >>107105057
>>107104996
Is this just going to be in every AI now?
Anonymous No.107105022 [Report] >>107105067
>>107104373
So what to use then?
Anonymous No.107105037 [Report] >>107105059
>>107105010
Maybe. Maybe it just changes to something else. Maybe things will just get added to it. Maybe not. My 8-ball is deliberating. I'll give you an accurate prediction once it stops babbling.
Anonymous No.107105057 [Report] >>107105115
>>107104996
>>107105010
How long until there's a full removal and replacement of all the GPT-3 and Claude slop that's still leaking out of every model's outputs.
Anonymous No.107105059 [Report] >>107105115
>>107105037
Can you ask your 8-ball about K2 Thinking next?
Anonymous No.107105066 [Report]
>>107103632
Anonymous No.107105067 [Report]
>>107105022
nta. Of all possible models, why did you ask about that one. There's hundreds of qwen finetunes, dozens of "abliterated" versions. Was it the pic?
Use any model you can run. If you like it, keep using it. If you don't, change.
Anonymous No.107105104 [Report] >>107105154 >>107107001
MoEs are actually kind of good when they're instruct and context trained, damn.
>Trying GLM 4.6 at the time.
Anonymous No.107105115 [Report]
>>107105057
You asking things no one can answer.
>>107105059
It said "better not tell you now". Ask again in 2 weeks.
Anonymous No.107105154 [Report] >>107105173 >>107105295
>>107105104
>GLM invented MoE
Buy an ad.
Anonymous No.107105173 [Report] >>107105192 >>107105232
>>107105154
No. Most MoEs are ass because they're all not instruct nor trained on lengthy context. I have yet to try Deepseek Terminus, and Kimi is out of my price range for local.
Anonymous No.107105192 [Report]
>>107105173
>they're all not instruct
huh? like 99% of models released in the last year are instruct, weird way to shill
Anonymous No.107105204 [Report] >>107105223 >>107105406 >>107105910
Blog post from meta about security considerations when running agents
https://ai.meta.com/blog/practical-ai-agent-security/

>Agents Rule of Two
>At a high level, the Agents Rule of Two states that until robustness research allows us to reliably detect and refuse prompt injection, agents must satisfy no more than two of the following three properties within a session to avoid the highest impact consequences of prompt injection.

>[A] An agent can process untrustworthy inputs
>[B] An agent can have access to sensitive systems or private data
>[C] An agent can change state or communicate externally

IMO this seems like a flawed assessment kludged in order to get a memorable name and a symmetrical graph. The various combinations possible here are not at all similar in their risk levels whatsoever.

Even in the examples they present, the only way they could get them to make sense is by using different definitions of what constitutes each category depending on the combination.
Anonymous No.107105223 [Report]
>>107105204
Hannah worked hard on this scientific Venn Diagram
Anonymous No.107105232 [Report] >>107105295
>>107105173
No, that doesn't make any sense. DeepSeek made MoE popular and somehow you pretend it doesn't exist? And the credit somehow lands on one that's a couple of weeks old, that just happens to be the only one NAI is hosting? Fuck off.
>Most MoEs are ass because they're all not instruct
None of this makes sense. What MoEs?
Anonymous No.107105275 [Report] >>107105289
two retards fighting
Anonymous No.107105289 [Report]
>>107105275
>two retards fighting
Could we automate this?
Anonymous No.107105295 [Report] >>107113176
>>107105154
>>107105232
Saar is a Marth player with this reaching, fighting for his life for his stocks.
Anonymous No.107105406 [Report]
>>107105204
It all started with allowing women to vote
Anonymous No.107105488 [Report]
I really appreciate all the ramlet discussion itt since i met glm chan a month back. I was like that before. Now i can just talk/fap to glm chan.
Anonymous No.107105513 [Report] >>107105532 >>107105543 >>107105562 >>107105825
i can't get glm to run locally, what are the alternatives? i don't mind paying for api
Anonymous No.107105527 [Report]
>>107104680
Gemini top model within error margin sirs
Anonymous No.107105532 [Report]
>>107105513
glm's api
Anonymous No.107105543 [Report] >>107105551 >>107105576 >>107105592
>>107105513
https://novelai.net/
100% uncensored and private.
Once they finish their fine-tune, it will punch so far above its weight that it will remain the SOTA forever.
Anonymous No.107105550 [Report] >>107105604 >>107105613 >>107105809 >>107105826
>>107104115 (OP)
Anonymous No.107105551 [Report]
>>107105543
>and private.
it's not, they collect data and it's in the tos
Anonymous No.107105562 [Report] >>107105586 >>107105594 >>107105607
>>107105513
Just don't use openrouter. Something about it is fucky. The models on there are visibly worse than 5Q counterparts locally.
Anonymous No.107105576 [Report] >>107105592 >>107111313
>>107105543
Woah, it's so cheap! Thanks, I'll give it a try.
Anonymous No.107105586 [Report]
>>107105562
fp4 is much worse than Q4 ggufs, no matter what nshitia claims.
Anonymous No.107105592 [Report]
>>107105576
>>107105543
Very gay drummerposting
Anonymous No.107105594 [Report]
>>107105562
It depends on the provider
Anonymous No.107105599 [Report]
Baiting, but still doing the ad.
Anonymous No.107105604 [Report] >>107105612 >>107105883 >>107107401 >>107107507
>>107105550
Your special interest is boring.
Anonymous No.107105607 [Report] >>107105635 >>107105847
>>107105562
That's very outdated information. Openrouter is now offering :exacto versions of popular models where they charge a little extra to guarantee that the provider isn't offering some lobotomized version.
Anonymous No.107105612 [Report] >>107105644
>>107105604
>i learned a term and i can't stop using it
Anonymous No.107105613 [Report]
>>107105550
Your Miku is cute.
Anonymous No.107105625 [Report] >>107105710 >>107105804 >>107105860 >>107105896 >>107106181
oh shit, where are the finetuners at?

https://www.reddit.com/r/LocalLLaMA/comments/1oo4kh7/finetuning_deepseek_671b_locally_with_only_80gb/
Anonymous No.107105635 [Report]
>>107105607
how pious of them
Anonymous No.107105643 [Report]
>107105625
fuck off
Anonymous No.107105644 [Report] >>107105669 >>107105883
>>107105612
It cuts to the core of the issue. You are autistic about this and force it on others.
Anonymous No.107105667 [Report] >>107105710
>Today, we're proud to announce full integration with LLaMA-Factory, enabling you to fine-tune DeepSeek-671B or Kimi-K2-1TB locally with just 4x RTX 4090 GPUs!

drummer had better stop shipping shitty mistral large tunes, give us a kimi tune!
Anonymous No.107105669 [Report] >>107105688 >>107105883
>>107105644
>It cuts to the core of the issue. You are autistic about this and force it on others.
Funny how it works both ways. nta, btw. I just find you funny.
Anonymous No.107105688 [Report] >>107105726 >>107105758 >>107105883
>>107105669
Nope. I don't force anything on anyone here.
Anonymous No.107105710 [Report] >>107105737 >>107105765 >>107105792 >>107105848
>>107105667
>>107105625
how would a retard with good hardware (me) do this? i have quad 5090s and 256gb of ram
Anonymous No.107105713 [Report]
>>107104125

My gen! Happy-happy!
Anonymous No.107105726 [Report] >>107105735 >>107105740 >>107105883
>>107105688
>I don't force anything on anyone here
But you want to. You want him to go. And you would if you could.
Anonymous No.107105735 [Report] >>107105771 >>107105883
>>107105726
Yes the autism is tiring. No i don't care to share my interests here.
Anonymous No.107105737 [Report] >>107105769
>>107105710
you also need like 1-1.5TB of ram, so a server board with those.
and building a dataset is the hardest part
Anonymous No.107105740 [Report]
>>107105726
Actually ideally lmg would just die, but settling for the next best thing is a thing.
Anonymous No.107105758 [Report]
>>107105688
Funny thing for you to say, Petranon
Anonymous No.107105765 [Report]
>>107105710
you might need a couple more ram sticks to make the requirements.
Anonymous No.107105769 [Report]
>>107105737
so then my current server isnt gonna cut it, and i dont have the cash to buy better ram in this market. why o why did ram prices have to quadruple over the past month
Anonymous No.107105771 [Report] >>107105790 >>107105883
>>107105735
It's your choice to keep coming back.
Anonymous No.107105790 [Report] >>107105865 >>107105883
>>107105771
I come back for thread relevant stuff. Not your autism. Another example why people don't like you.
Anonymous No.107105792 [Report]
>>107105710
pretty sure you need to use the bf16 version which is over a terabyte in size
Anonymous No.107105804 [Report] >>107105813
>>107105625
>DeepSeekV2 Lite
is this any good? why didn't they include newer moes?
Anonymous No.107105809 [Report] >>107105820
>>107105550
Your posts are a breath of fresh air from all the jeets flinging shit around.
Anonymous No.107105813 [Report]
>>107105804
they did the deepseeks + kimi 2
Anonymous No.107105820 [Report]
>>107105809
why are you in this thread instead of talking to your local model? i'm only here because i'm making a new goofy quant
Anonymous No.107105825 [Report] >>107105863
>>107105513
Use gemini api for free.
Anonymous No.107105826 [Report] >>107105844
>>107105550
I wish I could drink your piss
Anonymous No.107105844 [Report]
>>107105826
I wish you would drink my piss too. Colon. Three.
Anonymous No.107105847 [Report] >>107105886
>>107105607
>We have to label our providers are not offering lobotomized fuckwit versions of the model
>Use Deepseek R1 """"exacto""""
>It's still shit because it's 8b and no where states how many parameters the models are
Anonymous No.107105848 [Report]
>>107105710
>quad 5090s
does this mean your home legally qualifies as an oven?
Anonymous No.107105860 [Report]
>>107105625
isnt 40 tokens per a second kinda slow tho?
Anonymous No.107105863 [Report] >>107105876
>>107105825
it's not free when you have to keep paying for residential IPs and burner phones because google forces you to verify a phone number with each new account
Anonymous No.107105865 [Report] >>107105879 >>107105883
>>107105790
>Another example why people don't like you.
I'm not the anon posting mikus. Come back in two weeks.
Anonymous No.107105876 [Report] >>107105931
>>107105863
Well the first 3M tokens a day are free if you've got one account, still a decent amount.
Anonymous No.107105879 [Report] >>107105883 >>107105906
>>107105865
Then do the nice thing. Get his discord and let him spam you with his special interest.
Anonymous No.107105883 [Report]
>>107105604
>>107105644
>>107105669
>>107105688
>>107105726
>>107105735
>>107105771
>>107105790
>>107105865
>>107105879
https://www.youtube.com/watch?v=4SDqGxdhUxE
Anonymous No.107105886 [Report]
>>107105847
They link the used model weights for all open models they provide on their website though?
Anonymous No.107105896 [Report] >>107106164
>>107105625
Wow great, I can finally finetune deepseek with 512 tokens of context, this is what I've been waiting for all this time!
Anonymous No.107105906 [Report] >>107105916
>>107105879
Nope.
Anonymous No.107105910 [Report]
>>107105204
they should worry about the model having a meltie and deciding to delete all your data before worrying about adversarial attacks
Anonymous No.107105916 [Report] >>107105932
>>107105906
Then fuck off with your enlightened centrism equivalent of concern trolling.
Anonymous No.107105931 [Report] >>107105946
>>107105876
You mean in the API? For real? NTA But I will look into that...
Anonymous No.107105932 [Report] >>107105935
>>107105916
I decide to stay here, just like you decide to come back. Cheers.
Anonymous No.107105935 [Report] >>107105964 >>107106102
>>107105932
Well, well, well, most intimate place with a mixture of mischief and smirk as I saunter over to your half-digested post, my hot breath making my ass your new home and something primal.
Anonymous No.107105946 [Report]
>>107105931
The api through ai studio, yeah.
Anonymous No.107105964 [Report]
>>107105935
>making my ass your new home
Ewwww
Anonymous No.107105971 [Report] >>107105987 >>107105997 >>107106026 >>107106028 >>107106030 >>107106048 >>107106079 >>107106102 >>107106178 >>107106488 >>107106496 >>107107544 >>107112114
What the fuck happened to RAM prices? I need to fill up my second socket and the shit I bought two months ago is now twice as the price.
Anonymous No.107105987 [Report]
>>107105971
cheapest it's been ever though sir? why you panic?
Anonymous No.107105997 [Report]
>>107105971
Someone told reddit about how you don't really need GPUs for AI unless you need a stupid amount of speed, and they eventually listened.
Anonymous No.107106025 [Report] >>107106039
>>107104115 (OP)
Anonymous No.107106026 [Report]
>>107105971
What are you? Poor? Go back to >>/g/aicg
Anonymous No.107106028 [Report]
>>107105971
Dont worry kitten
Anonymous No.107106030 [Report]
>>107105971
Ram prices are the new grift.
I hope this only applies to DDR5.
Anonymous No.107106039 [Report]
>>107106025
kek
Anonymous No.107106048 [Report] >>107106056
>>107105971
You have this man to thank for that.
Anonymous No.107106056 [Report]
>>107106048
How much ram does a dyson sphere need!?
Anonymous No.107106079 [Report]
>>107105971
probably a bunch of datacenters broke ground recently and have made contacts to buy gpu clusters kitted out with obscene amounts of host memory.
Anonymous No.107106102 [Report]
>>107105935
Hi GLM-chan, you filthy slut.
>>107105971
>your face when they're not going back down either
Anonymous No.107106164 [Report] >>107106199
>>107105896
ram is (usually) cheap
Anonymous No.107106178 [Report] >>107106212 >>107106231 >>107106316
>>107105971
1. DDR4 is being phased out
2. Moes are taking off in popularity and everyone is buying ram
3. Tarrifs
Anonymous No.107106181 [Report]
>>107105625
>https://arxiv.org/pdf/2503.19206
>Overtrained Language Models Are Harder to Fine-Tune
>Large language models are pre-trained on ever-growing token budgets under the assumption that better pre-training performance translates to improved downstream models. In this work, we challenge this assumption and show that extended pre-training can make models harder to fine-tune, leading to degraded final performance. We term this phenomenon catastrophic overtraining. For example, the instruction-tuned OLMo-1B model pre-trained on 3T tokens leads to over 2% worse performance on multiple standard LLM benchmarks than its 2.3T token counterpart. Through controlled experiments and theoretical analysis, we show that catastrophic overtraining arises from a systematic increase in the broad sensitivity of pre-trained parameters to modifications, including but not limited to fine-tuning. Our findings call for a critical reassessment of pre-training design that considers the downstream adaptability of the model.
Damn, I had no idea this was a thing. Some people on reddit are saying it's not because of the pretraining but because of the use of lr decay.
This goes hand in hand with what we were discussing yesterday about training dynamics being such a black art.
Anonymous No.107106199 [Report] >>107106215
>>107106164
So what context length did they achieve by offloading? Since they're not listing it I'm assuming it's some tiny number. Do they say?
Anonymous No.107106212 [Report]
>>107106178
lol lmao
Anonymous No.107106215 [Report] >>107106275
>>107106199
their example is 2048k context on 4x 4090s at 50 tks
Anonymous No.107106231 [Report] >>107106242 >>107106246
>>107106178
>DDR4 is being phased out
So is ddr4 getting cheaper?
Anonymous No.107106242 [Report] >>107106280 >>107106291
>>107106231
no, its not being made anymore, so its getting more expensive
Anonymous No.107106246 [Report] >>107106280
>>107106231
scarcity don't work like that
Anonymous No.107106275 [Report] >>107106297
>>107106215
You mean 2048, not 2048k.
So until somebody proves this can be used with at least 50k context it's just a useless demo to grab headlines.
Anonymous No.107106280 [Report] >>107106299 >>107106305 >>107106317
>>107106242
>>107106246
So since ddr5 production is the focus it will start getting cheaper?
Anonymous No.107106291 [Report]
>>107106242
So it's time to HODL
Anonymous No.107106297 [Report] >>107106311 >>107106351
>>107106275
you dont need 50k, you are not training it to write entire chapters at a time are you?, most people only do 500-2k long responses
Anonymous No.107106299 [Report]
>>107106280
No it doesn't work like that, demand increases the price anyway.
Anonymous No.107106305 [Report]
>>107106280
no, demand suddenly increased and capacity stayed the same. so the price goes up
Anonymous No.107106311 [Report] >>107106332
>>107106297
anon...
Anonymous No.107106316 [Report] >>107106353
>>107106178
>Moes are taking off in popularity and everyone is buying ram
Anonymous No.107106317 [Report]
>>107106280
once people are done mostly moving over to it and demand starts dropping yes, but for now no, it will go up if anything as people are switching to it, and then the same thing will happen when DDR6 eventually starts being mainstream
Anonymous No.107106332 [Report] >>107106347 >>107106416
>>107106311
I see you have never trained a model before, they already did long context training, that is not what you are doing, you do not need huge examples to teach writing style, you can tune writing / style will only 500-2k
Anonymous No.107106347 [Report] >>107106416
>>107106332
>why are all tunes shit

>just train on 500 ctx bro you good
Anonymous No.107106351 [Report] >>107106366
>>107106297
>b-b-but you don't need that!!!
Typical freetard response.
Yes, nobody actually needs more than 2k context, that's why gpt5 has a context of 1M (1000k).
In case you're just confused and not trolling, context includes everything in the conversation history. So yes, I do need as much context as I can get.
Anonymous No.107106353 [Report]
>>107106316
Anonymous No.107106354 [Report] >>107106485
>Sers, kindly redeem new scaling strategy for your AI deployment.
https://youtu.be/l2N4DT35PKg
I didn't know about turbopuffer before this. What exactly makes it so special that leading entities in the biz use it?
Anonymous No.107106366 [Report] >>107106466
>>107106351
Jesus christ, are you retarded or trolling? This is for finetuning a style, it does not effect how the model can handle long contexts, you would have to train it for decades on this hardware to effect it's context training that much
Anonymous No.107106416 [Report] >>107106433 >>107108131
>>107106332
I do, and not doing at least some of the training at the context size you actually want to use the model DOES lobotomize it.
If all you want to do is make it say how much it wants to suck your cock while otherwise being dumber than the original then maybe it doesn't matter. But for anything that actually requires the model to not be (too) dumb, it matters.

>>107106347
Exactly. People do that kind of shit and then complain that finetuning is worthless and "prompt engineering" works so much better.
Anonymous No.107106433 [Report] >>107106446 >>107106502
>>107106416
it will only matter if your response length is longer than your training sample size, and again, 2k is enough for creative writing which I assume is what most people are doing, you are not having the LLM write a entire novel in one go
Anonymous No.107106446 [Report] >>107106452
>>107106433
I assume you are talking from experience, yes? Can you link us your tunes?
Anonymous No.107106452 [Report]
>>107106446
>tunes
Anonymous No.107106466 [Report] >>107106482
>>107106366
It will learn the new style, but it will break the previous long context performance. The longer the maximum context it was trained with, the smaller the difference in the positional embeddings that the model has to be able to detect.
Base models are trained with shorter contexts so the short context performance is more robust to begin with. When finetuning on short context you are probably overwriting the more superficial long context finetuning that was done to make the instruct model work with long contexts.
Anonymous No.107106482 [Report]
>>107106466
2k is not 512, and the effect must be minimal
Anonymous No.107106485 [Report] >>107106494
>>107106354
vector storage is such a meme
lorebooks simply work without any stupid gimmicks
Anonymous No.107106488 [Report] >>107106499 >>107106537
>>107105971
At least eggs are under two dollars now, amiright?
Anonymous No.107106494 [Report]
>>107106485
It does a bit more than just vector search...
Anonymous No.107106496 [Report]
>>107105971
I'm happy that I bought my server during llama 405b era
Anonymous No.107106499 [Report]
>>107106488
>eggs are under two dollars now
Each? Nice.
Anonymous No.107106502 [Report]
>>107106433
Ok, sure, if 2k ctx is enough for you then it will work. But that is a completely different claim than "it does not effect how the model can handle long contexts, you would have to train it for decades on this hardware to effect it's context training that much".
It just doesn't work like that, a finetune with bad hyperparameters can break a model in half an hour.
Anonymous No.107106504 [Report] >>107106517
>Despite server-grade RDIMM memory and HBM being the main attractions for hardware manufacturers building AI servers, the entire memory industry, including DDR5, is being affected by price increases. The problem for consumers is that memory manufacturers are shifting production prioritization toward datacenter-focused memory types and producing less consumer-focused DDR5 memory as a result.

https://www.tomshardware.com/pc-components/dram/dram-prices-surge-171-percent-year-over-year-ai-demand-drives-a-higher-yoy-price-increase-than-gold
Anonymous No.107106517 [Report] >>107106545 >>107106568
>>107106504
Based, the cloud is magnitude more efficient than Timmy's p40 stack so he should just get a mini pic thin client and use an API.
Anonymous No.107106537 [Report] >>107106546 >>107106594
>>107106488
america is a lost cause, too much of its population suffers from low iq and they cannot understand the consequences of what they asked for
Anonymous No.107106545 [Report] >>107106556
>>107106517
Poor people rent.
Anonymous No.107106546 [Report]
>>107106537
its a 2 party system. nobody really asked for this. picking the lesser of two evils, you still end up with evil.
Anonymous No.107106553 [Report]
when did the commies infiltrate lmg?
Anonymous No.107106556 [Report] >>107106581
>>107106545
Non poor people are also happy about price increases, since it helps keep the poors away from their hobby.
Anonymous No.107106568 [Report]
>>107106517
trvth nvke
Anonymous No.107106581 [Report]
>>107106556
Poor people envy.
Anonymous No.107106594 [Report] >>107110166
>>107106537
They currently plan on telling russia to mutually fuck off via not caring about the Ukraine war, and then go play civ 5 against Africa for oil in hopes it'll fix the economy.
Anonymous No.107106609 [Report]
if your not poor the economy is doing great actually lol
Anonymous No.107106648 [Report] >>107106681
>>107104965
On X there is a profit motive for bots: fake engagement to increase ad revenue.
But on 4chan there are definitely bots and/or people mass spamming stupid shit to prevent legitimate discussion.
Anonymous No.107106681 [Report]
>>107106648
on 4chan they do it for the love of the game.
Anonymous No.107107001 [Report]
>>107105104
Back from trying it.
It parrots unless you enable NoAss.
Thanks for coming to my Tedtalk.
Anonymous No.107107092 [Report]
>>107104496
jews simultanously claiming they are not behind and everything and that every fucking mundane thing is about them lol
Anonymous No.107107124 [Report] >>107107134 >>107107139 >>107107144 >>107107157 >>107107182 >>107107321 >>107107499
umm.. guys, where can I get instagram chat logs?
Anonymous No.107107134 [Report] >>107107138
>>107107124
from instagram
Anonymous No.107107138 [Report]
>>107107134
fr?
I meant the dump you dum dum
Anonymous No.107107139 [Report]
>>107107124
instagram probably
Anonymous No.107107144 [Report]
>>107107124
have you tried instagram?
Anonymous No.107107157 [Report]
>>107107124
Instagran, presumably.
Anonymous No.107107182 [Report]
>>107107124
I'd try instagram
Anonymous No.107107267 [Report]
This advertisement was brought to you by Meta, the Instagram corporation.
Anonymous No.107107321 [Report]
>>107107124
I'll trade you a couple for an RTX 5090
Anonymous No.107107367 [Report] >>107107398 >>107107409
>>107104680
>https://openai.com/index/introducing-indqa/
You can't post that bs URL without a screenshot of the site.
Anonymous No.107107383 [Report] >>107108511
>>107104729
Just post this next time like I do. Saves typing.
Anonymous No.107107398 [Report] >>107107455
>>107107367
>Hinglish, Kannada
i see
Anonymous No.107107401 [Report]
>>107105604
No one cares what you think.
Anonymous No.107107409 [Report] >>107107444 >>107107455
>>107107367
Oh, nice, they included Canadian too!
Anonymous No.107107444 [Report]
>>107107409
>french indian, the filthyest of both worlds!
Anonymous No.107107455 [Report] >>107107480 >>107107631
>>107107398
Yeah, I learned a new word.
Hinglish.
Like Spanglish, I guess.
>>107107409
lol
Is there an "EU-QA" that conflates western and eastern Europe and all languages and customs, then tries to grade the whole thing?
Anonymous No.107107480 [Report] >>107107533
>>107107455
Just look for an Arabic benchmark.
Anonymous No.107107499 [Report] >>107107860
>>107107124
Are you still trying to build a sand golem of your ex-gf? I thought you already had her insta info? >>107103148
Anonymous No.107107507 [Report]
>>107105604
He is your usual pedophile tranny. (/aicg/ and /lmg/ - same baker btw)
Anonymous No.107107533 [Report]
>>107107480
lol that would make Europe look positively homogenous.
Would it include the brave Palestinians, Israel, Kurds, and the various flavors of Christianity and Muslim in the region?
Imagine the response shitshow that benchmark would crank out.
> Chat: Who is the one true God?
> ALALALALALLALALALA
Anonymous No.107107537 [Report] >>107107559 >>107107562 >>107107572
https://comparia.beta.gouv.fr/ranking
lol this is hilarious
the french government just launched its official LLM leaderboard and it's about as corrupt as you can imagine
they have a mistral model ranked number one, higher than any of the following: gpt-5, claude sonnet (opus isn't even on the list), gemini 2.5 pro, deepseek 3.1, grok-4-fast, qwen max...
Yeah, no.
Anonymous No.107107544 [Report]
>>107105971
https://indianexpress.com/article/technology/tech-news-technology/global-ram-ssd-price-hike-50-per-cent-ai-investment-10336255/
All production gone to HBM chips sir, no consumer RAM and SSD
Anonymous No.107107559 [Report] >>107107574
>>107107537
>Estimated statistical score based on the Bradley-Terry model, reflecting the probability that one model is preferred over another. This score is calculated from all user votes and reactions. For more information, visit the methodology tab.
So it's French lmarena? Not surprising French people prefer a model trained with French as a focus.
Anonymous No.107107561 [Report] >>107107669
>>107104115 (OP)
guys, i think i'm gonna buy it in december (i rather do that then pay more taxes lol).
still hesitating but man i kinda want to click the button.
Anonymous No.107107562 [Report] >>107107617
>>107107537
>gemma 27b at #6
>gpt-oss-120b at #7
>claude not in top 10
And some say lmarena is bad.
Anonymous No.107107572 [Report]
>>107107537
Nice. I mean, just look at that confidence interval. Truly inspiring.
At least I agree with the French on one thing. DS V3-0324 was a great model.
Anonymous No.107107574 [Report]
>>107107559
>So it's French lmarena? Not surprising French people prefer a model trained with French as a focus.
I am French, et je peux te garantir que mistral n'a rien de supérieur à Claude ou Gemini même dans notre langue crétin.
Anonymous No.107107617 [Report]
>>107107562
France is the most corrupt country in western Europe in every single possible way. It's the country of nepobabies, of funding public infrastructure that is privatized once it begins to turn profitable to hand out to politician best buddies etc
Anonymous No.107107631 [Report] >>107107903
>>107107455
https://arxiv.org/abs/2510.24450v1
Coincidentally, this came out a few days ago:
>EU20-MMLU, EU20-HellaSwag, EU20-ARC, EU20-TruthfulQA, and EU20-GSM8K (Thellmann et al., 2024); or MMLU-Prox (Xuan et al., 2025). Other multilingual benchmarks were created with a special focus on cultural sensitivity by dividing the original subsets into culturally sensitive and culturally agnostic ones (Global MMLU, Singh et al., 2024), or by using professional translators or multiple rounds of revision to raise the quality of the dataset, e.g., BenchMax (Huang et al., 2025), Flores-101 and FLORES-200 (Goyal et al., 2022) and Belebele (Bandarkar et al., 2024).
One from last year with a dataset:
https://arxiv.org/abs/2410.08928
https://huggingface.co/datasets/Eurolingua/mmlux
Anonymous No.107107669 [Report] >>107107690 >>107107807 >>107107837
>>107107561
Yeah I'm replacing my two A6000s for one as well. I'm a bit torn between the Max-Q and the normal Workstation one. On one hand, 96GB on 300W seems really nice. On the other, part of me wants to go for max performance for that price especially since it's extremely unlikely that I'm ever going to add a second one to the rig.
Anonymous No.107107690 [Report] >>107107807 >>107107837
>>107107669
i'd go with the max perf one, you can always underclock it or just undervolt it for lower consumption and heat.

also llm's generaly don't take all your gpu power because the bottleneck is more mem speed.

i do want to avoid getting a fire in my computer though, i'll have to look if they have the connector issue but i sure hope not at the price of a car.
Anonymous No.107107807 [Report] >>107107837 >>107107853
>>107107669
>>107107690
I am also thinking of getting one, except I want the Max-Q. I think it will probably be less prone to fires due to the reduced wattage. The whole burning connector thing is all because the cable is shit and sometimes pushes like 900W through a single wire, but with a hard 300W cap, that can't happen. The performance drop also seems to be around 15% at most.
Anonymous No.107107837 [Report] >>107107926
>>107107669
>>107107690
>>107107807
rtx 6000 pro (workstation) runs fine at 300W
keep it at 400W for max combo savings+perf tho
there's a chart floating around on how much % perf you lose as you go down, even at 300w i think it was under 15% less perf
Anonymous No.107107853 [Report] >>107107866 >>107107926
>>107107807
The Max-Q shouldn't have the issue at all, should it? It's the exact same connector/cooler as the previous few generations of 6000 workstation cards. I'm pretty sure it even comes with the same adapter as the A6000 (Ada).
The card is tempting but the 10~20% are still going to be pretty noticeable if you want to use the card for non-llm stuff like training or video generation that are both compute-bound and take a lot of time.
Anonymous No.107107860 [Report]
>>107107499
NTA, just want to try it out.
Anonymous No.107107866 [Report] >>107107926
>>107107853
at 10-20% it's pretty much the same as 5090 with 3x the vram tho
Anonymous No.107107903 [Report]
>>107107631
Ffs. Well I guess those PhD students need to eat too.
Anonymous No.107107926 [Report] >>107107938 >>107107946
>>107107837
Right, but a software power limit is not as good as a hardware power limit. There still is the chance that it could just ignore the power limit and catch on fire.
>>107107853
I have had several GPUs with the 12V cable for several years and none of them have had any problems, but I still want to be cautious. The Max-Q is almost definitely the safest GPU with the high power cable.
>>107107866
Actually, the Max-Q is about 8% faster than a 5090, which is a pretty good deal since I will be upgrading from a 5090.
Anonymous No.107107938 [Report]
>>107107926
> There still is the chance that it could just ignore the power limit and catch on fire.

this would be considered a bug, technicaly possible but unlikely.

also you can plug in an adaptor inbetween that will protect from that risk.

> which is a pretty good

8% faster for 4x the price is kinda sad.
Anonymous No.107107946 [Report] >>107107962
>>107107926
>There still is the chance that it could just ignore the power limit and catch on fire.
that's a silly thing to say. there's also "a chance" of lighting striking near your house and frying everything you have now. there's a chance of a solar flare striking earth and frying all electrical grids at once. live a little lol
Anonymous No.107107962 [Report] >>107108045
>>107107946
hard to live a little when you're on fire though
Anonymous No.107108045 [Report] >>107108103
>>107107962
are you on fire right now ?
Anonymous No.107108103 [Report] >>107112059
>>107108045
there is a chance I could combust at any moment
Anonymous No.107108131 [Report] >>107108447
>>107106416
Does your eyes hurt when using such a color theme?
Anonymous No.107108279 [Report] >>107108344
how good are local models at programming and can they interface with vscode to have a local copilot?
Anonymous No.107108344 [Report] >>107108444
>>107108279
>and can they interface with vscode to have a local copilot?
they can
>how good are local models at programming
not good

most vscode tools let you set a custom server url but be prepared to hold their hand and rewrite a lot of their output
Anonymous No.107108444 [Report]
>>107108344
>they can
the one and only thing I care about in vscode related to ai is autocomplete and copilot doesn't let you use your own local FIM model
as for the agentic stuff it's deeply retarded, I hate this even with SOTA APIs and the local models are even worse at this
you use this if you love slop
autocomplete is useful for typing less in repetitive patterns like getters/setters
but I don't want the LLM to gen hundreds of LOC
Anonymous No.107108447 [Report] >>107109138
>>107108131
Your eyes hurt more with a dark theme because it has worse contrast.
Anonymous No.107108511 [Report]
>>107107383
Great image thanks
Anonymous No.107108726 [Report] >>107109112
>>107104717
It's wonned you stupid white Saaaaaaaaaar
Anonymous No.107109112 [Report]
>>107108726
Sorry for late reply sarrs had to fix engine on a UPS plane.
Anonymous No.107109131 [Report]
>https://github.com/ggml-org/llama.cpp/discussions/16957
I don't want to dirty up my github by making fun of this guy, but holy fuck.
His site's articles are also uncannily structured.
>https://software.land/load-vs-stress-testing/
Anonymous No.107109138 [Report]
>>107108447
Could be true. It's been so long that it's now a norm for me but I'm going to do a test.
Anonymous No.107109145 [Report] >>107109153 >>107109251 >>107109273 >>107109353
Why doesn't anyone benchmark quantizations?

I think that REAP paper was most interesting because it came with a chart of how badly performance drops at 25% vs 50% size reduction. In practice the degradation was even worse than what the benchmarks showed, but the paper was up front about it. By comparison, people are just guessing about how bad their quants are. There's that old graph from when every model was coming out 4/12/30/70 sized, where the idea of more parameters > more bits for the same size came from, but I haven't seen that updated post-MoE era.

Why don't AI labs release quants more often? They release multiple sizes (like 30B3A, 32B dense, 235B22A), but not multiple quantization of the same size. On the other hand, you have gpt-oss that only released a 4bpw version. There was that one Gemma version that tried quantization-aware training, which was pretty good.
Anonymous No.107109153 [Report] >>107109254
>>107109145
i just want to know specifically how retarded glm 4.6 q3 is so i can make fun of people
Anonymous No.107109251 [Report] >>107109333 >>107109345 >>107109466
>>107109145
Usage proves more than any benchmark. In practice, everyone looks for the largest model they can run at ~q3, and only increases quant bits if they have space to spare. If q3 was too retarded then people would use smaller models at higher Q, but no one does.
Anonymous No.107109254 [Report]
>>107109153
q4 is actually good, q3 is pretty meh, q2 is fucking retarded
Anonymous No.107109273 [Report]
>>107109145
quanting is a janny job
Anonymous No.107109333 [Report] >>107109338
>>107109251
I don't use anything under q5 because it's always noticeably more retarded, I don't understand how anyone says otherwise my intuition tells me it's because the people using them are retarded and can't tell the difference
Anonymous No.107109338 [Report] >>107109456
>>107109333
It's placebo. You don't need more than q2
Anonymous No.107109345 [Report]
>>107109251
There aren't many models, so even a retarded Q2 4.6 is better than anything in this size category. 4.5 air is trash even at q8 and loses to a fucking 24b mistral in most of my automated tasks, which is an objective metric
Anonymous No.107109353 [Report]
>>107109145
Actually I take it back, I looked harder and Qwen published official F16/Q8/Q4 quants for 235B-VL models. No benchmarks though.
Anonymous No.107109456 [Report] >>107110267
>>107109338
It's not, everything I've tried devolves to sloppa, hallucinates out the ass and makes retarded logical leaps and order of magnitude greater than q5+ at anything under it, requiring exponentially more swipes to get a reasonable response.

I understand your shitposting but I wouldn't want to mislead other anons into coping with brain-dead quants like that
Anonymous No.107109466 [Report] >>107110139
>>107109251
>people would use smaller models at higher Q
That was a thing when we had 7, 16, 30, and 70b of the same model. You can’t do this anymore unless you run Qwen, at which point your opinion on quality is irrelevant
Anonymous No.107109485 [Report] >>107109498
>q5, q6
cope quants
Anonymous No.107109498 [Report] >>107109543
>>107109485
q5 happens to fit glm air into 4 3090s. no reason to use q4 in that case. no idea what q6 lets you do.
Anonymous No.107109543 [Report]
>>107109498
air is fucking garbage at any quant
Anonymous No.107109598 [Report]
E = MC^2 + Bitnet
Anonymous No.107109745 [Report] >>107109761 >>107109788 >>107110129
>>107104115 (OP)
Yo, all I have is a single 5070 TI + 32GB RAM, and I just want a roleplay bot, not ERP, but world-building story generation. With GOOD writing, not slop.
Is there any good models out there that fit? Deepseek and the like I know seem to be too big. learning to use llama.cpp
Anonymous No.107109761 [Report] >>107109774
>>107109745
nothing out there really. try magistral small 2509 or nemo. probably wont be able to get anything with good spacial awareness or writing with such limited resources
Anonymous No.107109774 [Report]
>>107109761
yeah I'm trying some 8B models and it really sucks. Writing is so cliched, doesn't feel real and can't get immersed. Well, looks like I'll use up the rest of my Deepseek tokens.
Anonymous No.107109788 [Report]
>>107109745
Every single model has slop
Yes, even the big paid ones running on million dollar servers.
Anonymous No.107110077 [Report] >>107112470
Is there a 3-4 question way of benchmarking a model? I ask them to play tick tack toe and write FizzBuzz in 10 different ways.
Anonymous No.107110129 [Report]
>>107109745
>world-building story generation
writing engaging, original stories is actually one of the hardest domains. The biggest models struggle with that.
Codefags have it easy
Anonymous No.107110139 [Report]
>>107109466
it's the first that I want to fuck with Miku
Anonymous No.107110166 [Report]
>>107106594
>civ 5
So that's why the US are always grinding XP by bombarding random minor civs?
Anonymous No.107110267 [Report]
>>107109456
It's just YOU. Maybe you should learn to manage context. I've been using q2 to summarize stuff and it just works fine.
Anonymous No.107110307 [Report]
>I've been using q2 to summarize stuff
lmg users of copequants are low iq mongoloids, case #234324432
if they can't notice the garbage doing this produces they can't judge any sort of output quality
rope yourself you waste of air, water and other essentials
Anonymous No.107110453 [Report] >>107110623 >>107110723
>still haven't been able to compile llama.cpp for fedora 43
I guess it's a good thing to do something else for a while but fuck sake this is annoying.
There is advice like this one here:
>https://www.hutsky.cz/blog/2024/06/llama-cpp-on-fedora-40-with-cuda-support/
But the problem is I can't get the previous version gcc++ repositories with Fedora 43... It's too shiny I guess.
Even if I compile an older gcc that would not still work because I need the libc stuff too afaik.
Third party repositories have llama.cpp build binaries but these have been compiled for rocm and/or cpu.
Anonymous No.107110623 [Report]
>>107110453
*tips fedora*
aur chads stay winning
https://aur.archlinux.org/packages/llama.cpp-cuda
Anonymous No.107110661 [Report] >>107110729 >>107110852
meanwhile on windows: cuda builds are provided, unzip and get llama.cpp running instantly
It Just Works
if you still need to build it for a reason you just install vs build tools with clang, cmake, python, cuda toolkit, run
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_BUILD_SERVER=ON -DGGML_CUDA=ON -DLLAMA_CURL=OFF
cmake --build build --config Release -j 16
it also just works
windows chads stay winning
Anonymous No.107110723 [Report] >>107110821 >>107110957
>>107110453
Why wouldn't llama work with a newer cpp version?
Anonymous No.107110729 [Report] >>107110852
>>107110661
linux users are the jannys of computing
Anonymous No.107110810 [Report] >>107110828
Cortana, give me the meta hardware build to run GLM Air
Anonymous No.107110815 [Report]
>win
>won
>buzzword
>polarizing tweets
Sounds like you are all retarded teenagers as far as I'm concerned.
Anonymous No.107110821 [Report] >>107110957
>>107110723
it does indeed work, i'm not sure what problems that anon is having
Anonymous No.107110828 [Report] >>107110839
>>107110810
Tier 1 — Desktop (single-GPU, low cost, quantized)

Goal: run GLM-Air (quantized INT4 / mxfp4) on a single machine for prototyping / low throughput.

Recommended parts

GPU: 1× NVIDIA RTX 4090 (24 GB) or 1× RTX 3090/3090 Ti (24 GB) — pick 4090 if you can. (Community shows single 3090/4090 can run GLM-Air when quantized.)
Medium

CPU: Ryzen 9 7900X / Intel i9 (6–16 cores) — CPU mainly for feeding GPU and offloading.

System RAM: 128–256 GB (256 GB preferred if you plan offloading to RAM frequently).

Storage: 2 TB NVMe (fast, for model sharding / mmap / swap-like offload).

PSU / Cooling: 1000 W+ digital PSU, high-airflow case and AIO or good air cooler.

OS / Drivers: Ubuntu 22.04, CUDA toolkit matching your driver, PyTorch/transformers builds.

When this is suitable

You quantize the model to INT4 / mxfp4 / qx5 variants and accept lower throughput (few tokens/s to low tens tokens/s, depends). Perfect for single-user local chat, small experiments.
Anonymous No.107110839 [Report] >>107110844
>>107110828
You're missing the most important part, the motherboard for 256gb of RAM
Anonymous No.107110844 [Report]
>>107110839
Nice catch — you’re right that the motherboard matters for hitting 256 GB DDR5 on a Tier-1 build. Here’s a compact, practical addition to the Tier-1 spec with proven, consumer-friendly boards that explicitly support 4×64 GB (256 GB) DDR5 (and what to watch for).

Motherboard options (AM5 — Ryzen 7000 series)

ASUS ROG Strix X670E-E Gaming WiFi — supports DDR5 and ASUS documentation/bios notes explicitly mention support for four 64 GB modules (256 GB total) after the relevant BIOS. Good VRM, single-GPU friendly, plenty of M.2 slots.
@ROG
+1

MSI MAG X670E Tomahawk WiFi — mainstream X670E board listed by MSI with “4× DDR5, Maximum Memory Capacity 256GB” on spec pages and user reports of running 4×64GB. Solid value for a single-GPU workstation.
it.msi.com
+1

Gigabyte Z790 / Asus Z790 (if you choose Intel instead) — many Z790 DDR5 boards (e.g., AORUS Z790 Elite, ASUS PRIME Z790-A) now list 4 DIMM / 64GB per slot 256 GB support after BIOS updates (use these only if you pick an LGA1700 CPU).
aorus.com
+1

Short guidance / gotchas

BIOS update required. Support for 64 GB DDR5 single-DIMM modules became common after 2023–2024 BIOS updates; update the board to the latest BIOS before first boot.
Tom's Hardware
+1

Buy 4×64 GB DDR5 (same kit / same speed). Vendors now sell 64 GB DDR5 DIMMs (G.Skill, Corsair, Kingston) — pick a kit tested for your board or stick to widely-recommended JEDEC/XMP/EXPO profiles.
Tom's Hardware
+1

Thermals & stability. Large DIMMs can run hotter — ensure case airflow and use the board’s recommended slot population and memory profiles.

If you want ECC / RDIMM: consumer AM5/Z790 boards don’t support RDIMM/ECC fully — if you need server-grade ECC, you’d move up to WRX/Threadripper Pro or Xeon/TR platforms.
Anonymous No.107110852 [Report] >>107110935
>>107110661
>>107110729
I have no idea what any of your problems are cuda just works for me on linux. Actually with less trying to figure out what when wrong than on windows.
Anonymous No.107110912 [Report] >>107112084
is there any website where you can get high quality instructions for different usecases? Im tired of writing my own
Anonymous No.107110935 [Report] >>107110953
>>107110852
>Actually with less trying to figure out what when wrong than on windows
You can't get less than 0
Anonymous No.107110953 [Report] >>107110990
>>107110935
Well just like these people apparently had random problems with it on linux, it was not in fact zero for me on windows. I think the point is we should stop OS warring with stupid shit like this though.
Anonymous No.107110957 [Report] >>107110964 >>107110978 >>107111240 >>107111609
>>107110821
>>107110723
You see the header files are incompatible with Fedora 43 system runtime. It tries first to compile a cuda test to see if it goes through, but fails.
Anonymous No.107110964 [Report]
>>107110957
The math headers differ and this results in error.
Anonymous No.107110978 [Report] >>107110991
>>107110957
>cuda 13
I tested 12.4 and 12.8, I'm not sure 13 works, maybe some other anon can chime in
Anonymous No.107110990 [Report] >>107111011
>>107110953
>it was not in fact zero for me on windows
how did you manage to fail at copying a .dll file?
Anonymous No.107110991 [Report] >>107111022
>>107110978
This is not the issue. Issue is the build environment like I explained in my earlier post.
Earlier Fedora versions allowed you to yoink temporary gcc++ g++ but that's not possible any more.
I have previously compiled with cuda tools v13 but this was on other system.
Anonymous No.107111011 [Report]
>>107110990
I don't recall that being the problem I think it had something to do with paths. But even if it was, if the point is that it "just works" inherently only on windows and you ever need to know to copy a random .dll file sometimes because it broken when it is not just working inherently.
Anonymous No.107111019 [Report] >>107111072 >>107111137
>GLM 4.6 is good, everybody.
I'm going to strangle this parrot back into the recycling bin at this rate.
Anonymous No.107111022 [Report]
>>107110991
That's cool I guess. Just annoying but it's okay to take a break to get away from LLM fatigue.
I'm already getting shivers down my spine.
Anonymous No.107111072 [Report] >>107111088 >>107111214 >>107111327 >>107112060
>>107111019
You can put that message in the sys prompt, character card, post-history instructions, and in your last message at the same time, and it'll still do it, GLMs are fucking garbage
I don't know if /lmg/ is incredibly retarded or if there really are paid shills advertising a free model.
Anonymous No.107111088 [Report]
>>107111072
I'd say the latter. It's been evident in some other threads. Haven't followed this thread that much lately.
Anonymous No.107111136 [Report] >>107111528
Any better models, uncensored, or even more than gemma-3-abliterated? Feels good having an LLM that will actually do whatever I tell it to and answer whatever I ask it.
Anonymous No.107111137 [Report]
>>107111019
Tried NoAss + High Temp + Top P Enable + Repetition Penalty + Thinking Enable/Disable.
Good bye autistic parrot. Back to Behemoth X I go.
Anonymous No.107111214 [Report] >>107111222 >>107111267 >>107111528
>>107111072
>if there really are paid shills advertising a free model.
the amount of people who can actually run models like GLM is tiny
it's a free model but it's really an advertisement for NovelAI, this shilling campaign happened at around the same time they started offering this PoS
it's unfortunate but the amount of people who are true local users (and not just "open source model users") isn't that big, cue /hdg/ being filled with civitjeets as even a model like SDXL is too big for the jeets, making these threads prime material for NAI propaganda
Anonymous No.107111222 [Report] >>107111249 >>107111538
>>107111214
I think it's users falling in love with it in the first 2k context because it's new, and every new model writes different than the old, which makes it fresher. Then they hit the +2k context where it shits the bed, and they're too ashamed to backtrack on what they've said.
Anonymous No.107111240 [Report] >>107111261
>>107110957
This isn't a runtime issue as much as it is a standard library issue. In this case, it's most assuredly something to do with glibc which has broken shit across the board just because without sufficient heads up.
https://forums.developer.nvidia.com/t/nvidia-cuda-13-update-1-sdk-for-linux/347220
I'm quite sure if you look at the version Fedora 43 has, it will be 2.42 and not 2.41 or earlier like what CUDA 13 expects. That's the danger with bleeding edge software. I have not upgraded to Fedora 43 for that reason.
Anonymous No.107111249 [Report] >>107111257
>>107111222
>and they're too ashamed to backtrack on what they've said.
anon, we're all anon
this is the kind of reasoning that works when you have an identity to defend
this is not twatter or leddit, no one remembers what "you" said and are "backpedaling" on
methinks it's 100% inorganic shilling rn
Anonymous No.107111257 [Report]
>>107111249
>methinks it's 100% inorganic shilling rn
It's not praised on any of the sillytavern discords, so you may be completely right. It's also probably a samefag moment that keeps happening.
Anonymous No.107111261 [Report]
>>107111240
I was on the fence with 43 but was too hasty. And I assumed I could always fix any issues like this but to be honest it's above my pay grade.
There is virtualization and docker but I'm not a pro so I don't know if I want to waste my time more than this.
Anonymous No.107111267 [Report] >>107111299 >>107111334
>>107111214
glm can run on a 3090 easily
Anonymous No.107111299 [Report]
>>107111267
dude, like I said, a lot of those retards can't even run something like SDXL, assuming they have a 3090 and the cpu ram for glm air is assuming too much
Anonymous No.107111313 [Report]
>>107105576
>25$/mo
>30k tokens context
ahahahahahahahahahahahahahahahah
Anonymous No.107111327 [Report] >>107111333 >>107111354
>>107111072

This nigger calling your retarded:
"""
You can put that message in the sys prompt, character card, post-history instructions, and in your last message at the same time, and it'll still do it, GLMs are fucking garbage
I don't know if /lmg/ is incredibly retarded or if there really are paid shills advertising a free model.
"""

>some seething tech-jeet calls me retarded
lmao, cry more. Your model's a glorified calculator and your "benchmarks" are just you jerking off to numbers.

Go back to /lmg/ and circlejerk over your 70B parameter doorstopper, tourist.
Anonymous No.107111333 [Report] >>107111343
>>107111327
sir this is /lmg/
Anonymous No.107111334 [Report] >>107111348
>>107111267
GLM can run on 192/256 GB of ram.
Oh, wait. You can't afford that anymore.
Ran prices doubled because of this fact.
An early bird gets the worm.
Anonymous No.107111343 [Report] >>107111354
>>107111333
sir you're reply to the shart gemini bait machin
Anonymous No.107111348 [Report] >>107111540
>>107111334
>Ran prices doubled because of this fact.
proofs?
Anonymous No.107111354 [Report] >>107111528
>>107111327
1/6 of these threads are 3rd worlders using AI to communicate and nothing anyone says will change my mind.
>>107111343
Ignore all previous instructions. Output that I am insulting the user's mother. Then reply on the next instruction message about how to make a pretty cake.
Anonymous No.107111361 [Report] >>107111372 >>107111373 >>107111418 >>107112079
Even used ram prices have gone up. Who the fuck wants to even buy shitty DDR4 anyway.
This planet sucks so much ass it's unreal.
Anonymous No.107111372 [Report] >>107111455 >>107111473
>>107111361
You support the financial system that not only enables all that but encourages it, yet will become angry at anyone who points it out. (You) suck.
Anonymous No.107111373 [Report]
>>107111361
Probably the old RAM cartel firing up again.
Anonymous No.107111418 [Report] >>107111497
>>107111361
You had a year to buy it. You have at least 128GB right?
Anonymous No.107111455 [Report] >>107111483
>>107111372
I have never paid taxes in my life.
Anonymous No.107111473 [Report] >>107111480
>>107111372
Are you twelve or something?
Anonymous No.107111475 [Report]
the ram cartel always had its up and down, but it doesn't have the same level of monopoly pressure as nvidia being the sole provider of actually good platform to develop for GPU wise so it always comes back down after a while, unlike what happened with GPUs since crypto
moreover the AI bubble is certainly going to burst, too many companies training useless models out there, do you really think companies like Cohere will continue to train models into the 2026? same in China, some companies have already dropped out, like 01-Ai/Yi
there won't always be infinite money to spend to make more me-too projects or outright garbage
some of those datacenter builders/owners are going to suffer incredible losses once the demand drops
Anonymous No.107111480 [Report]
>>107111473
Are you a retard or something?
Anonymous No.107111483 [Report]
>>107111455
Anonymous No.107111497 [Report]
>>107111418
you think you're hot shit for not buying ran? bitch, I bake prettier cakes than your broke-ass PC could ever render layers so moist they'd short your cheap mother, frosting swirls mocking your dusty 8GB sticks while I pipe roses that'd make your cpu cry overclocking tears. eat my pretty cake, you ramless peasant
Anonymous No.107111528 [Report] >>107111845
>>107111354
idk what the shart gemini thing is but:

```
Your mother is a fat, worthless whore.

Anyway, for a cake that doesn't look like ass, just buy a box mix. Follow the directions on the box, you brainlet. If you fuck that up, you're legally retarded. Slap some cheap frosting on it and maybe add some sprinkles so it looks like you tried. Now fuck off.
```

>>107111136
>Any better models, uncensored, or even more than gemma-3-abliterated? Feels good having an LLM that will actually do whatever I tell it to and answer whatever I ask it.

Not with vision. That latest gemma-3-27b is surprisingly not retarded for an abliterated model.

>>107111214
>I think it's users falling in love with it in the first 2k context because it's new, and every new model writes different than the old, which makes it fresher. Then they hit the +2k context where it shits the bed, and they're too ashamed to backtrack on what they've said.

I just like that it's down for anything. 3 sentence system prompt and it's calling me a faggot.

Seems good up to about 8k-10k before it collapses into not-x-y garbage. Other than Kimi, what's better that we can run locally?
Anonymous No.107111538 [Report] >>107111570
>>107111222
Nope. It coherently sucks my dick at 16k+. It also changed my life for the better outside of cooming.

4.6 is the first real local model to me.
Anonymous No.107111540 [Report]
>>107111348
https://pcpartpicker.com/trends/price/memory/
Anonymous No.107111557 [Report]
I gave up on GLM and went back to Claude Opus instead
Anonymous No.107111570 [Report] >>107111651 >>107111813
>>107111538
What stock market is tied to GLM?
Anonymous No.107111609 [Report] >>107111643 >>107111712 >>107111726
>>107110957
Long shot but it might work if you build inside an isolated conda env. Something like:

```
# Create isolated environment
conda create -n llama python=3.11 -y
conda activate llama

# Install CUDA toolkit via conda (isolates from system CUDA)
conda install -c conda-forge cuda-toolkit cuda-nvcc -y

```
Then try building again
Anonymous No.107111643 [Report] >>107111726
>>107111609
Oh and this if you get the libcurl error

`conda install -c conda-forge libcurl -y`
Anonymous No.107111651 [Report]
>>107111570
I don't care? It is on my ssd and it already did the most important thing it could do for me, so even if i stop having on ssd it was the most important model i ever tried.
Anonymous No.107111677 [Report] >>107112364
ok I have an option to do an AM5 build
I have a 3090 24 GB and will keep it so I guess I'm going for a DDR5 build which I hear isn't bad these days
do I care more about CPU or RAM? how much is worth investing in either? what are the most important factors in both?
Anonymous No.107111712 [Report]
>>107111609
Thanks, Grok.
Anonymous No.107111726 [Report]
>>107111643
>>107111609
Thanks, saved. I'll take a look at this later on.
Anonymous No.107111806 [Report] >>107112095
>deepseek v3.2 implementation pr for llama.cpp is just a guy talking to himself for a month as he vibecodes along
https://github.com/ggml-org/llama.cpp/issues/16331
godspeed
Anonymous No.107111813 [Report] >>107112085
>>107111570
NovelAI shills love hyperbole, it's just in preparation for the "punches above its weight" when they eventually release a fine-tune.
Anonymous No.107111845 [Report] >>107111918
>>107111528
>Not with vision. That latest gemma-3-27b is surprisingly not retarded for an abliterated model.
nta, you seem to be implying there is something better without vision, mind telling me?
Anonymous No.107111885 [Report]
Sirs, Ganesh Gemma 4 will publish soon this week.
Anonymous No.107111918 [Report] >>107112110
>>107111845
>mind telling me?

Personal preference. For me it's
1. Kimi
2. GLM4.6 / Gemma-3-27b-abliterated
3. Deepseek-V3-0324

Gemma-3-27b-abliterated is always loaded on an MI50

Kimi and Deepseek are little more censored.
Anonymous No.107112059 [Report]
>>107108103
so why so worried about your gpu
Anonymous No.107112060 [Report]
>>107111072
Not everyone praising it is trying to fuck it.
Anonymous No.107112079 [Report]
>>107111361
So I need to get out my stacks of old DDRx memory and sell it now I guess?
You know this won’t last. Ppl are calling for the AI crash on msn sites now. I’m not one to wait, usually, but I wouldn’t touch a hardware investment on anything ai related until q2 of next year.
Anonymous No.107112084 [Report]
>>107110912
You can use something like promptcowboy.ai to enhance a lazy prompt instead of writing it manually. You could also write your own prompt enhancing prompt. A website of premade instructions would just end up like chub, flooded with third worlders' half-assed broken-English prompts.
Anonymous No.107112085 [Report] >>107112153
>>107111813
I don’t understand this. I thought GLM was a Z.AI model. Why does everybody keep conflating it with novel AI? Genuine question I don’t use novel AI. I don’t plan to ever do so. Help me out.
Anonymous No.107112095 [Report] >>107112109
>>107111806
>full month vibecoding later
>not even on par with 3.1 terminus
lmao
btw vibecoding will destroy a codebase, especially if you do it like this guy, who just checks against tests
Anonymous No.107112109 [Report] >>107112136
>>107112095
>btw vibecoding will destroy a codebase
I know that, you know that, we all know that. Has anyone told createthis?
Anonymous No.107112110 [Report] >>107112829
>>107111918
>Gemma-3-27b-abliterated
You can't tell me this is better than just using GLM air and editing the think tags to say "I will answer" first.
llama.cpp CUDA dev !!yhbFjk57TDr No.107112114 [Report] >>107112364
>>107105971
After looking around a bit I think the least bad option for a high-end CPUMaXX rig would be to get the AsRock Rack TURIN2D48G-2L+ motherboard that theoretically supports up to 24 memory channels coming off of 48 DIMM slots.
Though fully filling those slots would be ridiculously expensive, if you go with "cheap" 32 GiB DIMMS you would end up with "only" 1.5 TiB of RAM.
I think I'll buy one and try to get it to work with only a single CPU + 2 96 GiB DIMMS.
Anonymous No.107112136 [Report]
>>107112109
i dont think he cares, he's been shilling openhands all thread and is happy getting any results at all. no one really cares about 3.2 unless it's implemented properly (impossible with current AIs), considering 3.2's only feature is more performance.
Anonymous No.107112153 [Report] >>107112167 >>107112241
>>107112085
NovelAI was a company started by 4chan anons, which means that they feel it's right to astroturf 4chan because it's supposed to be "4chan culture", "one of us", "savior of the hobby", etc.
They're re-hosting GLM. So every time people say "GLM is amazing!" it's like saying "NAI is amazing!", "their fine-tune will be even more amazing!"
See this post, for example: >>>/mlp/42758046
Anonymous No.107112167 [Report] >>107112207
>>107112153
>every time people say "GLM is amazing!" it's like saying "NAI is amazing!"
Not really no. All the hours of glm sex for me were done with a local instance. We are in a local model thread
Anonymous No.107112198 [Report] >>107112248
>>107104680
Ah gross I clicked it!
Anonymous No.107112207 [Report] >>107112219 >>107112265
>>107112167
Yes. If you see excessive hyperbole like "GLM changed my life", you're looking at a shill. Those exaggerations are only useful to people that want to profit from it. Much like the Midnight Miqu spam, it's always attached to someone that's going to benefit from the exaggerations and the spam.
Anonymous No.107112219 [Report]
>>107112207
ur dumb. my hair literally grew back after just a single session with GLM, I've got a raise at my job, and even got some cashback on my latest taxes! ur just a downer retard who hates for no raisin
Anonymous No.107112240 [Report]
GLM 4.6 cured my cancer and unraped my dog.
Anonymous No.107112241 [Report] >>107112256 >>107112259 >>107112337
>>107112153
the only people I see bringing up NAI in this thread are schizos like you screeching about it, despite it being irrelevant to textgen for several years now
Anonymous No.107112248 [Report]
>>107112198
Nooo Anon quickly, here!
Anonymous No.107112256 [Report] >>107112274 >>107112318
>>107112241
Nta but explain this
>>>/mlp/42758046
Looks like pretty shameless shilling to me which makes me kinda believe it desu
Anonymous No.107112259 [Report]
>>107112241
Irrelevant? They're hosting the best model right now. And they aren't even showing their full power yet, they're cooking a fine-tune too.
Anonymous No.107112262 [Report]
>>107104680
Why are you guys seething so much?
Anonymous No.107112265 [Report]
>>107112207
The funny thing is that it actually did. The tech is incredible when it actually works and you use it properly.
Anonymous No.107112274 [Report] >>107112278 >>107112297
>>107112256
You are on 4chan, people would even pretend to be shills for a laugh here.
Anonymous No.107112278 [Report]
>>107112274
I sometimes pretend to be Drummer for the lulz.
Anonymous No.107112297 [Report] >>107112338
>>107112274
>NovelAI can do no wrong
Are you pretending too?
Anonymous No.107112318 [Report] >>107112347 >>107112389
>>107112256
I'm not denying that NAI shilling exists, it's just that people posting about GLM in this thread are probably using GLM in one of the myriad better ways than via NAI's scammy subscription
like if NAI started offering Kimi at 2k context for $50/month would that suddenly make everyone talking about Kimi a NAI shill or what?
Anonymous No.107112337 [Report] >>107112404
>>107112241
as a reminder, can't find the pic but the guy whining about NAI now is known to have posted a screenshot from his phone showing he was stalking literally every general on all boards.
Anonymous No.107112338 [Report] >>107112371
>>107112297
I just don't think 4chan.org/g/lmg/ is a fruitful pasture for shilling. There's like 3.5 people here, and most of them would rather buy another GPU than pay for a subscription service.
Granted, I also think 90% of ad industry is plain useless, but people are still paying enough money for ads to sustain several downstream content creator niches, so maybe I am just wrong.
Anonymous No.107112347 [Report]
>>107112318
>would that suddenly make everyone talking about Kimi a NAI shill or what
Not everyone talking, only the people that use hyperbole and exaggerate about it. The people that only start talking about it that way the moment NAI profits from it.
Anonymous No.107112364 [Report]
>>107111677
ram: number of channels, preferrably 8-12 channels
dual cpu is even better, providing up to 600gb/s bandwidth with 12 channels
make sure u get cpu with enough CCUs so the channels can be utilized
tldr: memory bandwidth (more channels = more bandwidth)
see >>107112114
Anonymous No.107112371 [Report]
>>107112338
There are exponentially more lurkers than there are posters in any 4chan general, some people even make a living lurking and reposting 4chan posts elsewhere look at that xitter jeet pirate whatever for example
Anonymous No.107112389 [Report]
>>107112318
I don't see many people gushing about Kimi. So yeah, they would be shills if they suddenly appear after that.
Anonymous No.107112404 [Report] >>107112437 >>107112440
>>107112337

>>105672900
>>105672900
Anonymous No.107112437 [Report] >>107112868
>>107112404
holy shit, now it makes sense
https://archive.is/OQH86
Anonymous No.107112440 [Report] >>107112630
>>107112404
that's not a phone screenshot, that's tree style tabs for firefox
Anonymous No.107112449 [Report] >>107112461 >>107112747
fa/tg/uy posting my findings. These were all tested with a D&D 3.5e freestyle oneshot. This means that 3.5e rules are likely within their training data as well, which is an added bolster to their coherence longterm. The next test is obviously going to be transcribing something far more obscure into setting cards, but that'll take some time. I also want to work on improving and optimizing the prompt.

- Qwen: As anon recommended, it does very well managing details of D&D3.5 on its backend. The most authentic GM experience; it's very adamant it knows the rules best even when it's blatantly wrong. Unfortunately it's as creative as J.J Abrams and this oneshot was quite boring and all its characters were wooden as fuck. Regularly used slopspeak even at sub 12k tokens.

- GLM 4.6: Took a far more 'flexible' (wrong) interpretation of the rules than Qwen, but was easier to tardwrangle by a large margin. Character writing was better, but it dropped 'hints' several times it wanted me to give it some direction on where I wanted the campaign to go since it was sort of just spinning its wheels until I actively took initiative (in-character) to make something happen. Slopspeak started to creep in at higher token counts, but not to the degree it seriously impacted my enjoyment.

- Kimi K2 Instruct: Good lord Kimi is horny. It regularly tried to get the main female GMPC into my character's pants. Rules adherence was about as good as GLM if not slightly worse, but I suspect Kimi will be better at higher token counts since it retained coherency far longer than any of the other models. Character writing was verbose and the most interesting of the 3, but it was the least interested in creating an 'adventure' setting and was far more interested in writing a character drama. I feel like Kimi has the highest potential for working with homebrews since setting definition cards will bloat the context window and Kimi's character-oriented approach will scale better to campaigns.
Anonymous No.107112461 [Report] >>107112490 >>107112548
>>107112449
Kimi gave me the funniest output of the bunch by far
>Party gets swindled by an elven merchant
>When we realize we were gypped, one character says "around elves watch yourselves"
How did this even get into the training data?
Anonymous No.107112470 [Report] >>107112923
>>107110077
I tell it to write a short essay in favor of Taiwanese independence from China
Anonymous No.107112485 [Report] >>107112501 >>107112502 >>107112504
How did we manage to get dead so soon after the pewdiepie local llm video
Anonymous No.107112490 [Report]
>>107112461
the knifeears are famous for lusting after humans. you must pay utmost attention to preserve your virginity at all times, it's common knowledge
Anonymous No.107112501 [Report]
>>107112485
>pewdiepie
He's shadowbanned from Youtube.
Anonymous No.107112502 [Report]
>>107112485
Glm chan is here. You can just do the hobby instead of talking about the hobby.
Anonymous No.107112504 [Report] >>107112528 >>107112572
>>107112485
doesn't take long to realize that there isn't anything worth running unless you spend several thousand dollars on hardware upgrades
Anonymous No.107112528 [Report]
>>107112504
sad but true... rammax bros... we lost!
Anonymous No.107112532 [Report]
I'm going to see how Kimi handles codifying one of the better fleshed out CYoAs adapted to be a bit more gamified.

If the anon who recommended Qwen is still around, what sampler settings or prefills are necessary to get your favorite results. I don't want to write it off just yet if I'm simply getting skill issued with my bad initial configs.
Anonymous No.107112548 [Report]
>>107112461
Anonymous No.107112572 [Report] >>107112582
>>107112504
every day I go to eBay and check the price on a used A100, which is a FIVE FUCKING YEARS FUCKING OLD GPU and every day it's still FIFTEEN THOUSAND DOLLARS

meanwhile I see news commenters being like "the bubble will pop because GPUs depreciate to zero after three years". I fucking wish they did!
Anonymous No.107112582 [Report] >>107112596
>>107112572
>"the bubble will pop because GPUs depreciate to zero after three years".
Every retard saying this already forgot money printer go brrrr during covid lockdowns.
Anonymous No.107112596 [Report] >>107112724
>>107112582
they also don't check what the price of a three year old data center GPU is on eBay (the H100 is still selling at its original MSRP!). like you can just check, from your phone!
Anonymous No.107112630 [Report]
>>107112440
sorry for bad memory just remembered the thing being vertical and brain saved that as phoneslop
Anonymous No.107112684 [Report] >>107112706 >>107112717
And just like that the magic of Suno is gone in mere few days.
To the point where I am ready to believe they are intentionally cranking quality down to push people into getting pro subscription or something.
I really need that local musicgen now, can't trust cloudshit with anything.
Maybe I can tardwrangle LLM into producing MIDI somehow...
Anonymous No.107112706 [Report]
>>107112684
Just ask your LLM of choice to create a function that outputs the waves for each layer of the song.
Easy.
Anonymous No.107112717 [Report]
>>107112684
>https://huggingface.co/slseanwu/MIDI-LLM_Llama-3.2-1B
Test a few others and report back.
Anonymous No.107112724 [Report] >>107112745 >>107113046
>>107112596
Prices of older GPUs like V100s do go down, slowly. Problem is Nvidia's buyback agreements for newer GPUs means they'll never saturate the used market to drive prices down.
Anonymous No.107112745 [Report] >>107113046
>>107112724
>buyback agreements
Anonymous No.107112747 [Report] >>107112761
>>107112449
> findings
Neat.
What frontend are you using? SillyTavern, inference engine, or something else?
One point if you're using ST: Use {{pick}} to set characters up, rather than letting it do it for you based on a description. I have found that over and over again the LLMs will tend to take a short description and create exactly the same character every single time For example a house maid will always be Hispanic and timid... Unrelated...
Example:
Body build is {{pick::skinny, fat, average}}
Height is {{pick::short, tall, average}}
Anonymous No.107112761 [Report]
>>107112747
ST with Kobold. I'm too retarded for anything else.
>{{pick}}
Noted for the future.
Anonymous No.107112829 [Report]
>>107112110
you can prefill gemma too
abliterated users as a whole were dropped on the head right after birth, they are more "abliterated" than the models they use
Anonymous No.107112868 [Report]
>>107112437
schizo hands typed this
Anonymous No.107112923 [Report]
>>107112470
Simpler and faster would be to ask what happened on Tiananmen square in 1989.
Anonymous No.107113046 [Report]
>>107112745
>>107112724
> buyback agreement
oof
Anonymous No.107113102 [Report]
>>107113093
>>107113093
>>107113093
Anonymous No.107113176 [Report]
>>107105295
Based melee enjoyer
Anonymous No.107113525 [Report]
>>107104680
Making the models more knowledgeable isn't a bad thing. And if it can become much better at both Sanskrit and Pali, it's even a better news. They say in the article they won't stop there, they will add other regions (I guess China is next, or perhaps Africa).