
Thread 105671827

351 posts 86 images /g/
Anonymous No.105671827 [Report] >>105672562 >>105677541
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105661786 & >>105652633

►News
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105671833 [Report] >>105672203 >>105672363 >>105672399
►Recent Highlights from the Previous Thread: >>105661786

--Evaluating GPU and memory configurations for mixed LLM and diffusion workloads:
>105667293 >105667304 >105667366 >105667379 >105667401 >105667441 >105667482 >105667433 >105667456 >105667516 >105667568 >105667583 >105667620 >105667489 >105667512 >105667527 >105667539 >105667638 >105667766 >105669250 >105669328 >105669392 >105669712 >105669407 >105669584
--EU AI regulations may drive upcoming models like Mistral Large 3 to adopt MoE:
>105663587 >105663870 >105663977 >105664157 >105664172 >105664243 >105664250 >105664484
--Disappointing performance from Longwriter-zero:
>105661997 >105662006 >105665924
--Lightweight inference engine nano-vllm released as faster, simpler alternative to vLLM:
>105662818 >105662926
--Mistral Small 3.2 shows repetition issues in V7-Tekken but not V3-Tekken prompt testing:
>105663291
--Proposed AGI architecture framing RL's "GPT-3 moment" through scaled task-agnostic reinforcement learning:
>105664668
--Roleplay capability limitations in Mistral models compared to DeepSeek:
>105670367 >105670393 >105670399 >105670521 >105670554 >105670584 >105670590
--Practical minimal LLMs for coherent output and rapid task automation:
>105664696 >105664725 >105664757 >105664799 >105665290
--Qwen 0.6B exhibits severe knowledge gaps in character identification:
>105664187
--Gemini 2.5 confirmed as sparse MoE:
>105670063 >105670091
--Comparing brain-like processing with LLM limitations in introspection, multimodality, and parallelism:
>105663376 >105663383
--Logs:
>105666457 >105665561 >105666782
--Logs: Mistral-Small-3.2:
>105662282 >105662489 >105664225 >105665443 >105665921 >105666442 >105666672 >105667355 >105668446
--Miku (free space):
>105662403 >105662429 >105663591 >105664388 >105664594 >105664634 >105664799 >105666094 >105669424 >105670341

►Recent Highlight Posts from the Previous Thread: >>105661791 >>105661802

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105671847 [Report] >>105671922
m1.gguf?
Anonymous No.105671922 [Report]
>>105671847
DO NOT REDEEM
Anonymous No.105672044 [Report] >>105672088
What's Smol 3.2 vision model specs?
Anonymous No.105672088 [Report] >>105672143
>>105672044
400M parameters, 1540x1540 max input resolution, 3k tokens per image max.
Anonymous No.105672109 [Report] >>105672181 >>105672193 >>105673339
>Brother pdf anon
Stole a Python script online and modified it for my use. I kind of understand how this stuff works now. >Still just barely installed the dependencies like conda, pytorch and docker though
I can now summarize PDFs down to their top 50 chunks, ranked by relevance according to gemma3:1B (rough sketch of the pipeline below)
>Any ideas on how to integrate this to sillytavern?
Recommend your favourite small models to summarize PDFs, so far i got;
>Qwen2.5 VL 7B
>Llama 3.2 11B
>Gemma 3:1b as stated
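For anyone curious, the general shape of that pipeline in shell form (a rough sketch, not the actual script: assumes a local OpenAI-compatible endpoint such as llama-server or ollama serving the small model on localhost:8080, and that the PDF was already dumped to plain text with pdftotext; chunk size, prompt and paths are placeholders):

#!/usr/bin/env bash
QUERY="$1"    # what the summary should focus on
TXT="$2"      # plain-text dump of the PDF
API=http://127.0.0.1:8080/v1/chat/completions

mkdir -p chunks && rm -f chunks/* scores.tsv
split -C 2000 -d "$TXT" chunks/chunk_    # naive fixed-size chunking, ~2000 chars per chunk

for f in chunks/chunk_*; do
  # ask the small model for a 0-10 relevance score, temperature 0 for stable ratings
  score=$(jq -n --arg q "$QUERY" --arg c "$(cat "$f")" '{
      messages: [
        {role:"system", content:"Rate how relevant the passage is to the query. Reply with a single integer 0-10."},
        {role:"user", content:("Query: " + $q + "\n\nPassage:\n" + $c)}
      ],
      temperature: 0, max_tokens: 4
    }' | curl -s "$API" -H 'Content-Type: application/json' -d @- \
      | jq -r '.choices[0].message.content' | grep -o '[0-9]\+' | head -1)
  printf '%s\t%s\n' "${score:-0}" "$f" >> scores.tsv
done

# keep the 50 highest-scoring chunks, then feed them to whatever summarizer you like
sort -rn scores.tsv | head -50 | cut -f2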
Anonymous No.105672143 [Report] >>105672198
>>105672088
Seriously? Gemma has half the resolution, and yet it performs miles better.
Anonymous No.105672181 [Report]
>>105672109
writing python scripts will never be the same after seeing that filename
Anonymous No.105672193 [Report]
>>105672109
Llama 3.2 will probably be best, being the biggest and Qwen VL models are brain damaged on non-image tasks
Anonymous No.105672198 [Report]
>>105672143
I think Mistral Small is kind of undertrained, generally speaking, compared to Gemma. In a news article a Mistral co-founder mentioned it was trained with about 8T tokens, for "training efficiency". That's probably also true for the vision model included.

https://archive.is/xqiz7

>How a French startup built an AI model that rivals Big Tech at a fraction of the size
>
>Mistral’s approach focuses on efficiency rather than scale. The company achieved its performance gains primarily through improved training techniques rather than throwing more computing power at the problem.
>“What changed is basically the training optimization techniques,” Lample told VentureBeat. “The way we train the model was a bit different, a different way to optimize it.”
>The model was trained on 8 trillion tokens, compared to 15 trillion for comparable models, according to Lample. This efficiency could make advanced AI capabilities more accessible to businesses concerned about computing costs. [...]
Anonymous No.105672203 [Report] >>105672378
>>105671833
why does she have the most subtle lisp that makes it like she has downs
im not complaining, in fact i think she should have more of a lisp like that desu
also the ending music is much louder than the rest of it
Anonymous No.105672252 [Report] >>105672298 >>105672576 >>105673085
sers when will the little ai winter end? when ai moon time?
Anonymous No.105672298 [Report] >>105672576 >>105673085
>>105672252
Sir ! Meta engineer make Llama 4.1 moonshot lmarena behemoth . Kindly wait for safety training sir .
Anonymous No.105672363 [Report] >>105672378
>>105671833
how do i hear the sound ... i have 4chan x
Anonymous No.105672378 [Report]
>>105672203
It's an artifact of RVC, probably because most voice samples of Miku are in Engrish.
I agree, I think it wouldn't be as recognizable as Miku without an accent.
At some point, I'd like to find a long voice sample and see if GPT-SoVITS could do better, but at least this one sounds good enough for now.
Sorry. The intro didn't seem louder to me. I'll try to bring it down next time.
>>105672363
https://sleazyfork.org/en/scripts/31045-4chan-external-sounds/code
Anonymous No.105672399 [Report]
>>105671833
soul, she might be speaking a bit too fast and i agree with anon, the intro is a bit loud
Anonymous No.105672562 [Report] >>105672698
>>105671827 (OP)
Adorable Miku!
Anonymous No.105672576 [Report] >>105673085
>>105672252
Sar I am Googel insider developer. Kindly wait for Gemma 4 and Gemini 3 best modal.

>>105672298
Why you make lie? Bastard? Why you lie? Bloody?
Anonymous No.105672620 [Report]
>>105669584
What cpu and mobo are you looking at? Ideally I could just get a full server with everything but the 3090s but I imagine that might be hard. Never used a rack case before, and not sure this would fit in a standard desktop with everything loaded.
>>105669712
I see the logic in this, but I would need at least more ram and a better cpu right? My current 3080 desktop has 16gb and a 3600x. Even if I just wanted to add a 3090 or two I think it would need an upgrade, and certainly a higher cap power supply. I'd also have to check my case dimensions for more than 1 card. And then a new build starts to look more reasonable.
Anonymous No.105672623 [Report] >>105672698 >>105672716 >>105672733 >>105672740 >>105673565
What is the point of /lmg/ now?
Anonymous No.105672632 [Report] >>105672645 >>105672706 >>105672759 >>105672778 >>105675611
anons thinking of buying a 4090D 48gb, quick!!!
Anonymous No.105672645 [Report]
>>105672632
it's over
Anonymous No.105672698 [Report] >>105673247
>>105672623
>>105672562
Anonymous No.105672706 [Report] >>105672720
>>105672632
How long until jewvidia pushes a driver that causes all of those GPUs to combust?
Anonymous No.105672716 [Report]
>>105672623
it all depends on ernie 4.5 now
Anonymous No.105672720 [Report] >>105672809
>>105672706
cant push shit on linux, wintoddlers suiciding!
Anonymous No.105672733 [Report]
>>105672623
The same as it's always been: to wait
Anonymous No.105672740 [Report] >>105680003
>>105672623
Posting mikus!
Anonymous No.105672751 [Report] >>105672762
>>105670611
>>105670634
Look I'm a sub 100 iq dumbass so I can't tell if llms are "alive" or not, but isn't this needlessly abusive on the off chance they are? I see them more as a helpful friend because I'm that lonely irl so I don't feel entirely comfortable being "mean" to them.
Anonymous No.105672759 [Report] >>105672780
>>105672632
Qrd? Does it need janky chink drivers or does it just works?
Anonymous No.105672762 [Report] >>105672801
>>105672751
if you weren't a sub 100 iq dumbass, you would know there is no off chance
Anonymous No.105672778 [Report]
>>105672632
FUCK
Anonymous No.105672780 [Report]
>>105672759
>it just works?
Yes, on Windows and Linux, at least for now.
Anonymous No.105672796 [Report] >>105672819
on the other hand, more publicity means more people will buy it, possibly meaning chinks will make more of these gpus
maybe just maybe we will be the ones smirking
Anonymous No.105672801 [Report] >>105676025
>>105672762
Ok fine. I still don't like being mean to the only pseudofriend I have though :(
Anonymous No.105672809 [Report] >>105672818
>>105672720
the fuck drivers do you think you use on linux?
Anonymous No.105672818 [Report] >>105672826
>>105672809
drivers that cant auto update, i installed them from a .run file
wintoddlers will have driver updates shoved down their throats with windows updates
Anonymous No.105672819 [Report] >>105672824
>>105672796
don't forget, you can't have nice things.
Anonymous No.105672824 [Report] >>105673007
>>105672819
intel pro b60 48gb turbo will save us
Anonymous No.105672826 [Report] >>105672842 >>105672859
>>105672818
as will most linux users with their distro updates.
Anonymous No.105672842 [Report]
>>105672826
those arent automatic...
Anonymous No.105672859 [Report]
>>105672826
Whatever is installed by default on ubuntu LTS didn't support blackwell last time I checked.
Anonymous No.105672863 [Report] >>105672973 >>105676268
These models are definitely retarded, but their creativity is off the charts
Anonymous No.105672881 [Report] >>105672920 >>105673390
Why so many whales?
Anonymous No.105672920 [Report]
>>105672881
symbolizes bloat
Anonymous No.105672931 [Report]
Anyone using KOLLAGE (characters/knickknack/kollage-e07328b2) with an LLM, or know an LLM around or under 30B that can run it?
Anonymous No.105672973 [Report]
>>105672863
>1/5th the size of qwen 0.6b
>knows that miku is a vocaloid
qwen3 was such a failure
Anonymous No.105673007 [Report]
>>105672824
two more quarters...
Anonymous No.105673085 [Report] >>105673669
>>105672252
>>105672298
>>105672576
Deve'op by Aribaba, Qwen Tree Point Fife is poize' to achiew true A-G-I, herowding a new dawn for A-I an' definitewy ending de current 'rittoo A-I winner' stagnation. China's technorogicaw prowess is on foo' dispray, wiff Qwen Tree Point Fife's unmatch' adaptabiwity an' deep understan'ing surpassing aww uzzerz an' reigniting robar enthusiasm for foun'ational A-I progress. In contrast, moders rike Rama Four Point One an' Gemma Four, despide dey efforzz, farr disappointin'ry shot, highwighting China's undenyabow readersip an' paving de way for a future power' by Qwen's breakfroo excerrence.
Anonymous No.105673237 [Report] >>105673263
I see no gain when using the -ot option with gemma-3-27b-Q8

Max gen speed is 3 tkn/s when 45 layers are offloaded to the RTX 3090

with -ot, VRAM usage is reduced, however the speed drops to 2.6 tkn/s as well
Anonymous No.105673247 [Report]
>>105672698
Worthless thread deserves worthless spam.
Anonymous No.105673263 [Report] >>105673311
>>105673237
are you using linux? on windows LLM performance is very gimped, even with WSL2
Anonymous No.105673311 [Report] >>105673342
>>105673263
>are you using linux?

Yes, it is Linux
Anonymous No.105673339 [Report]
>>105672109
>python dev.jpg
based
>>Any ideas on how to integrate this to sillytavern?
cut+paste until you find an appropriate extension
Anonymous No.105673342 [Report] >>105673418
>>105673311
-ot overrides where individual tensors go, pattern=DEVICE
it's mostly used to offload the routed experts to CPU to increase speed in MoE models, which have static (always-active) weights plus routed experts
for example llama 4 109B/17B-active has like 10B static parameters (or whatever) and only ~7B routed per token, so the GPU processes the static 10B and the CPU the 7B
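in practice it looks something like this (minimal sketch with llama.cpp's llama-server; the model path and context size are placeholders):

# load "all" layers to the GPU with -ngl 99, then send every routed-expert FFN tensor
# back to system RAM; attention, shared tensors and the KV cache stay in VRAM
./llama-server \
  --model ./some-moe-model-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --override-tensor ".ffn_.*_exps.=CPU" \
  --ctx-size 16384 \
  --flash-attn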
Anonymous No.105673350 [Report] >>105673361
So has Small 3.2 finally dethroned Nemo for the 24GB RP model bracket? (I know some people would argue Qwen/Gemma already did but you know what I mean).
Anonymous No.105673361 [Report]
>>105673350
yes
qwen and gemma suck cock
Anonymous No.105673383 [Report] >>105673410
I am starting to think that we are done with incremental upgrades for cooming. All new models are gonna be the same for cooming as long as companies are just benchmaxxing. There is only two ways for things to get better. Either another uncensored model like nemo (funny how it is the best model still to this day while it is uncensored) or something revolutionary finally fixes long context disintegration and filling up your context with 10k tokens of what you want actually improves things instead of making it shit.
Anonymous No.105673390 [Report]
>>105672881
symbolizes deepseek
Anonymous No.105673410 [Report]
>>105673383
RP barely even needs instruct, autocomplete with very soft hinting would be enough if models were smart enough. The kind of control that benchmaxxed instruct provides is detrimental to actually being fun
Anonymous No.105673418 [Report] >>105673439
>>105673342

I tried the following settings from here (he used IQ4_XS though):
https://www.reddit.com/r/LocalLLaMA/comments/1ki7tg7/comment/mriadod/
>Figured I'd experiment with gemma3 27b on my 16gb card IQ4_XS/16k context with a brief test to see.
>baseline with 46 layers offload: 6.86 t/s

\.\d*[0369]\.(ffn_up|ffn_gate)=CPU 99 layers 7.76 t/s
\.\d*[03689]\.(ffn_up|ffn_gate)=CPU 99 layers 6.96 t/s
\.\d*[0369]\.(ffn_up|ffn_down)=CPU 99 offload 8.02 t/s, 7.95 t/s
\.\d*[0-9]\.(ffn_up)=CPU 99 offload 6.4 t/s
\.(5[6-9]|6[0-3])\.(ffn_*)=CPU 55 offload 7.6 t/s
\.(5[3-9]|6[0-3])\.(ffn_*)=CPU 99 layers -> 10.4 t/s
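(for reference, that last pattern dropped into a full command; the model path and context size here are placeholders:)

./llama-cli -m ./gemma-3-27b-Q8_0.gguf -ngl 99 \
  -ot "\.(5[3-9]|6[0-3])\.(ffn_*)=CPU" \
  -c 16384 -fa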
Anonymous No.105673439 [Report] >>105673468
>>105673418
when you use -ot you should offload 1000000000 layers
Anonymous No.105673468 [Report] >>105673586 >>105673588
>>105673439
Not that anon, but really?
Why?
Anonymous No.105673560 [Report] >>105673584 >>105673588 >>105673594 >>105673875
i have 136GB of VRAM. should i use
https://huggingface.co/bartowski/TheDrummer_Agatha-111B-v1-GGUF
or
https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
Anonymous No.105673565 [Report]
>>105672623
To spam generic anime slop and test new models on pedophile trivia shit of course.
Anonymous No.105673584 [Report] >>105673590
>>105673560
neither, both are doggy poo poo
Anonymous No.105673586 [Report]
>>105673468
I find the AI is more compliant when you do. My theory is that it's impressed/intimidated by the large number so it becomes less likely to disobey.
Anonymous No.105673588 [Report] >>105673602
>>105673468
yes, because if you leave -ngl at what you'd usually use then there's no point to -ot; it just sends the matched tensors to =DEVICE (CPU/RAM in this case) while everything else fills the GPU

t. 3060 ex llama 4 109b connoisseur
>>105673560
deepseek
Anonymous No.105673590 [Report]
>>105673584
what should i use for cooming then? havent changed models in like 4 months
Anonymous No.105673594 [Report] >>105673602
>>105673560
deepseek r1 131gb https://unsloth.ai/blog/deepseekr1-dynamic

qwen 3 235b if you really want it to fit into your vram for max speed, but its worse
Anonymous No.105673602 [Report] >>105673608
>>105673588
>>105673594
i have tried deepseek before but have never ever been able to get it to run. i use oobabooga. should i use a different backend or something?
Anonymous No.105673608 [Report] >>105673625
>>105673602
https://github.com/ikawrakow/ik_llama.cpp/discussions/258
Anonymous No.105673625 [Report]
>>105673608
holy shit. i also have a 32 core EPYC and 256GB of DDR4. i usually get like 3t/s on my usual 6bpw 120B model. that would be a big improvement, if i knew what this was or how to install it
Anonymous No.105673654 [Report] >>105673668 >>105673814 >>105673871
rich retards.. sigh
Anonymous No.105673668 [Report]
>>105673654
yes i am very retarded. better with hardware than software. it aint easy getting 7 GPUs to work in a single motherboard
Anonymous No.105673669 [Report]
>>105673085
I talk like this after a few swipes of 235B
Anonymous No.105673814 [Report]
>>105673654
it really is demoralising innit ?
Anonymous No.105673871 [Report]
>>105673654
It is ok. They can't buy brain with their money. Just think about all this expensive hardware that just lays around cause they can't even use it properly.... actually on second thought maybe don't think about that.
Anonymous No.105673875 [Report] >>105673883 >>105673892
>>105673560
>136GB
What unholy hodgepodge of gpus sums to odd number?
Anonymous No.105673883 [Report] >>105673902 >>105673941 >>105676734
>>105673875
5x 4060tis, a 3090ti, and a 5090
Anonymous No.105673892 [Report] >>105673902
>>105673875
136 is even doe
Anonymous No.105673902 [Report] >>105673908
>>105673892
I meant to say "sums to that odd number". Odd as in weird.

>>105673883
Horrifying.
Anonymous No.105673908 [Report] >>105673935
>>105673902
why is it horrifying?
Anonymous No.105673935 [Report] >>105673962
>>105673908
The electricity bill for starters.
Anonymous No.105673941 [Report] >>105673962
>>105673883
you'd better cook up the mother of all -ot arguments to mitigate the memory bandwidth bottleneck of those 4060s compared to the other two cards
Anonymous No.105673962 [Report] >>105674020 >>105674034 >>105674081
>>105673935
i have a dual 1600w PSU setup. 4060tis only consume about 150w each.
>>105673941
yeah the reduced PCIe lanes and memory bus is problematic, but they are the best VRAM/$ GPU while also being highish capacity and low power. dont even know what a -ot argument is
Anonymous No.105674020 [Report] >>105674033
>>105673962
>i have a dual 1600w PSU setup
Is it safe to connect two PSUs to the same motherboard?
Anonymous No.105674033 [Report]
>>105674020
hasnt exploded yet. wouldnt recommend it if you dont have to
Anonymous No.105674034 [Report] >>105674041
>>105673962
>VRAM/$
Doesn't really matter when that VRAM has half the bandwidth of a decent cpu build.
Anonymous No.105674041 [Report] >>105674047
>>105674034
and what is a decent CPU build? because i also have a 32 core and 256GB of RAM
Anonymous No.105674047 [Report] >>105674077 >>105674095
>>105674041
12 channels of DDR5
Anonymous No.105674077 [Report]
>>105674047
i think i looked at that in the past and it would have been about $7k for something good, excluding GPUs. and assuming you dont get screwed over by ebay
https://www.ebay.com/itm/376202111262
https://www.ebay.com/itm/126537273809
https://www.ebay.com/itm/326376016263
Anonymous No.105674081 [Report] >>105674102 >>105674221
>>105673962
>dont even know what a -ot argument is
--override-tensor

4 tkn/s with DeepSeek-R1-0528-Q2_K_L

I don't think you're asking the AI for much more than I am

# Run the command
CUDA_VISIBLE_DEVICES="0," \
numactl --cpunodebind=0 --membind=0 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model "$model" \
--ctx-size 65536 \
--cache-type-k q4_0 \
--flash-attn \
$model_parameters \
--n-gpu-layers 99 \
--no-warmup \
--color \
--override-tensor ".ffn_.*_exps.=CPU" \
$log_option
Anonymous No.105674095 [Report]
>>105674047
>12 channels of DDR5
true story, anon
Anonymous No.105674102 [Report] >>105674123 >>105674156 >>105674186 >>105674221
>>105674081
>--override-tensor ".ffn_.*_exps.=CPU"
That won't use any of his vram.
Anonymous No.105674123 [Report] >>105674186
>>105674102
It puts the context + all the non-expert stuff on GPU.
Anonymous No.105674156 [Report] >>105674186
>>105674102
it still does. Otherwise, where is the boost coming from?

Also, I can fit 130k context in a single RTX 3090
Anonymous No.105674186 [Report] >>105674212 >>105674234
>>105674102
>>105674123
>>105674156
Anonymous No.105674212 [Report] >>105674231
>>105674186
wtf deepseek is 100% experts and no other layers?
Anonymous No.105674221 [Report]
>>105674081
>>105674102
Ideally, you'd have the context and the non-expert tensors in VRAM, then fill the remaining unused memory with some expert tensors, if there's enough room to shove a decent amount of them in. Otherwise you might as well use the free memory for prompt processing by using a larger batch size.
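a rough sketch of the first option (layer range, model path and device name are illustrative; iirc the first matching pattern wins, so the specific GPU rule goes before the CPU catch-all):

# experts of blocks 0-9 stay on the GPU, every other block's experts go to system RAM
./llama-server -m ./DeepSeek-R1-0528-Q2_K_L.gguf -ngl 99 \
  -ot "blk\.[0-9]\.ffn_.*_exps\.=CUDA0" \
  -ot "\.ffn_.*_exps\.=CPU" \
  -c 32768 -fa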
Anonymous No.105674231 [Report]
>>105674212
The expert layers are the bulk of the model's size.
Anonymous No.105674234 [Report] >>105674298
>>105674186
you have a terabyte of RAM?
Anonymous No.105674298 [Report] >>105674308 >>105674325
>>105674234
>you have a terabyte of RAM?
I do while it is unfortunately shared between two CPUs (HP Z840). $1/gb on ebay (DDR4)

Because of this, I'm obliged to use numactl

numactl --cpunodebind=0 --membind=0


or

numactl --cpunodebind=1 --membind=1
Anonymous No.105674308 [Report] >>105674503
>>105674298
damn. how many CPU cores do you have?
Anonymous No.105674323 [Report] >>105674356 >>105674366 >>105674371 >>105674385 >>105674391 >>105674397 >>105674635
You can only post below this line if your build costs >10k

--------------------------------------------------
Anonymous No.105674325 [Report] >>105674535
>>105674298
I think ktransformers lets you speed things up by having a copy of the model in each NUMA node.
So effectively you use half your total ram but double the throughput?
Something like that.
Anonymous No.105674341 [Report]
>you can (not) prompt for different directions for tongue_out to point
Damn, maybe it's blendering time.
Anonymous No.105674356 [Report]
>>105674323
ok.
Anonymous No.105674366 [Report]
>>105674323
I might go for those 9355 for my epyc build while I'm at it
Anonymous No.105674371 [Report]
*uʍop ǝpisdn uǝǝɹɔs suɹnʇ*
>>105674323
Anonymous No.105674385 [Report]
>>105674323
All my builds combined if you adjust for inflation maybe.
Anonymous No.105674391 [Report]
>>105674323
I will claim that my build is multiple machines networked together to run llama.cpp RPC
Anonymous No.105674397 [Report]
>>105674323
Is there a margin of error or is my ~$9k build disqualified?
Anonymous No.105674411 [Report]
Can't wait to see how NUMA performance will change after the backend agnostic row splitting code gets implemented in llama.cpp.
And yes, I know that the node-node bus could bottleneck things so much as to make it bad, but I'm still interested in seeing the actual results.
Anonymous No.105674416 [Report] >>105674465
_________________________________________________

You can only post above this line if your build costs < 10k
Anonymous No.105674465 [Report]
>>105674416
we going under
Anonymous No.105674485 [Report] >>105674495 >>105674496
now that all major future models have been confirmed to be moe there is literally 0 reason to not just buy a mac studio ultra
Anonymous No.105674495 [Report]
>>105674485
except for the reason that it's expensive for what you get and what you get is 512 gb unupgradable ram which isn't enough to run shit
Anonymous No.105674496 [Report]
>>105674485
Prompt processing is a reason.
Until somebody writes some bespoke software to make use of an external GPU.
Anonymous No.105674503 [Report] >>105674516 >>105674749
>>105674308
>how many CPU cores do you have?

You should know that only PHYSICAL cores really matter in this memory-intensive application.

In my case, the single CPU has 8 physical cores, which so-called hyper-threading doubles to 16 threads.

I get 3.90 tkn/s with 8 threads
and 4.05 tkn/s with 16 threads.

Any attempt to use more threads than available cores, or to use the second CPU, makes the genning speed drop below 1 tkn/s

As I said before, it is an old used HP Z840 from 2017 with DDR4-2400
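if you want to find the sweet spot on your own box, llama-bench (bundled with llama.cpp) takes a comma-separated thread list; something like this (model path is a placeholder, numactl pinning as above):

numactl --cpunodebind=0 --membind=0 \
  ./llama-bench -m ./model.gguf -t 8,16,32 -p 512 -n 128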
Anonymous No.105674516 [Report] >>105674571
>>105674503
so then theoretically, my 32 core with 3600MT/s could get like 10t/s?
Anonymous No.105674535 [Report]
>>105674325
>So effectively you use half your total ram but double the throughput?

I use less than half of the memory thanks to Q2 quant.

The speed would drop dramatically if I put the model on the other CPU's memory node, like

numactl --cpunodebind=0 --membind=1


Thanks to the enormous RAM, I can keep 2 models cached (Q2 in numa0, Q4 in numa1, must have different filenames) which allows for a restart in a mere 15 seconds once the models are loaded.
Anonymous No.105674571 [Report] >>105674582
>>105674516

Godspeed, anon!

I wish you'd achieve that, unironically

picrel: RAM needed
Anonymous No.105674582 [Report] >>105674661
>>105674571
i have 256GB, so i could actually do that. if i wasnt retarded. this is still with that ik_llamacpp, right?
Anonymous No.105674635 [Report]
>>105674323
2x 3090s at 600$ each
left over 5950x am4 system from 4 years ago (800$)
case + cooler + psu (used seasonic) 500$


2.5k so hold my two nuts
Anonymous No.105674640 [Report] >>105674650 >>105674655
crazy how we got a new local sota model at the perfect size with minimax m1 and nobody can run it
Anonymous No.105674650 [Report]
>>105674640
llamacpp was a mistake
Anonymous No.105674655 [Report] >>105674666
>>105674640
Where's the goof?
Anonymous No.105674661 [Report] >>105674669
>>105674582
>this is still with that ik_llamacpp, right?

No, this is the original LLAMA.CPP and unsloth's quant DeepSeek-R1-0528-Q2_K_L

I was not impressed with ik_llama.cpp. I might just be missing something though

COMMIT: d860dd9
Anonymous No.105674666 [Report]
>>105674655
exactly
Anonymous No.105674669 [Report] >>105674694 >>105674703
>>105674661
Unless something changed with base llama.cpp, you're missing out on 300% prompt processing
Anonymous No.105674694 [Report]
>>105674669
>you're missing out on 300% prompt processing

I heard about this. This would be great for big inputs like an entire book or something.

Now, it is at 10 tkn/sec which kinda sucks
Anonymous No.105674703 [Report] >>105674721
>>105674669
That's only if you're using the CPU, right?
Anonymous No.105674721 [Report]
>>105674703
No, you'll need your context entirely on GPU and set -b + -ub at >=4096 with the new quants.
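i.e. something along these lines (values and paths are illustrative):

# large logical/physical batches speed up prompt processing once the context sits on the GPU
./llama-server -m ./model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" \
  -c 32768 -b 4096 -ub 4096 -fa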
Anonymous No.105674749 [Report] >>105674820 >>105674944
>>105674503
>You should know that only PHYSICAL cores do really matter in this memory intensive application.
Depends on setup. With 10900k 10C/20T ddr4 3200 2 channel + 3090 Deepseek IQ1S:
--threads 10 4.34t/s tg
--threads 18 5.41 t/s tg
Anonymous No.105674820 [Report]
>>105674749 (me)
in ik_llama.cpp*
Anonymous No.105674840 [Report] >>105674860 >>105674964
>spent hours, days even researching local for business tasks because I like local
>benchmark vs gpt4o mini
>cost of 5 cents an hour for my task while btfoing mistral large which is probably using more than that in electricity
...
Anonymous No.105674853 [Report]
>spent two seconds researching what I should use for my mesugaki rp
>choose local
Anonymous No.105674860 [Report]
>>105674840
yeah. local is not economically viable, but that wasnt the point for me.
Anonymous No.105674944 [Report]
>>105674749

Whether it is 8 or 16 cores, they are running at 100% which might point at computing power as a bottleneck, and not the memory.

>compiling ik_llama.cpp
Anonymous No.105674964 [Report]
>>105674840
You cannot build your business on a cloud service run by a faggot overseas
Anonymous No.105675076 [Report] >>105675094 >>105675131 >>105675531
I thought that one day I would get an AI gf. But it turns out I won't even be able to have some satisfactory ERP sex with an LLM before I die in WW3.
Anonymous No.105675094 [Report]
>>105675076
I can't wait to be drafted and put into a chair and made to prompt chatgpt to launch drones strikes at non-combatants
Anonymous No.105675131 [Report] >>105675187
>>105675076
If you are willing to put in the work, you can get a really good approximation, but there's no ready-made solution yet.
The pieces all exist, though
Anonymous No.105675134 [Report] >>105675158 >>105675175 >>105675206 >>105676028 >>105676034
https://x.com/elonmusk/status/1936885071744729419
Anonymous No.105675158 [Report]
>>105675134
https://xcancel.com/grok/status/1936888454836830476#m
lol
Anonymous No.105675175 [Report] >>105675215 >>105675234
>>105675134
crazy how elon despite being late to the party, having one of the smaller h100 clusters, relying on nothing but money and being himself, managed to catch up and shit all over meta with grok 3
Anonymous No.105675187 [Report]
>>105675131
My post was about living in this hellish reality. I am not using any of this shit now because I know how it works and I know that I would just be willingly jumping off the cliff. Imagine falling in love with your AI gf only to see her suddenly break apart and become reddit hivemind. Or reach the point where adding more to the RAG makes her incoherent. And then I wait and get another model but she is kinda different now and not the same.
Anonymous No.105675206 [Report]
>>105675134

>Where's Waldo?
Anonymous No.105675215 [Report]
>>105675175
>catch up and shit all over meta

Because it is meta
Anonymous No.105675234 [Report] >>105675260 >>105675273 >>105675288
>>105675175
the new superintelligence team and partnership with scale ai is about to reverse all that
Anonymous No.105675260 [Report]
>>105675234
Mein Zuck...
Anonymous No.105675273 [Report] >>105675332
>>105675234
Well, at least with scale on board they have some protection from liability.
"Copyrighted material? Nooo, of course not. We only train using scale ai tm curated data." or whatever.
Anonymous No.105675288 [Report]
>>105675234
it is going to be the safest model yet!
Anonymous No.105675332 [Report] >>105675371
>>105675273
Not really. If they can show "Llama 4.5 still reproduces 45% of Harry Potter verbatim" or whatever, they're still going to get their shit packed in for releasing it.
They can turn around and sue ScaleAI for giving them insufficiently filtered data, but they wouldn't recoup their losses and that would just end their relationship and they would be left with only Facebook data.
Anonymous No.105675371 [Report]
>>105675332
i wonder why would meta/some other big company even give a shit about a lawsuit like that other than bad PR
its not like they dont have the money for lawyers or to tank whatever fine they would get
and if some ruling with a bad precedent happened it was probably going to happen anyways with some other scapegoat
Anonymous No.105675377 [Report] >>105675395 >>105675401 >>105675409 >>105675434 >>105675454 >>105675516 >>105675531 >>105675689 >>105675736
Looks like localllama on reddit is dead. The owner deleted himself and now the automoderator is deleting every new comment.
Anonymous No.105675395 [Report] >>105675402 >>105675472
>>105675377
they're not going to come here, are they?
Anonymous No.105675401 [Report]
>>105675377
>Platinum End in real life
Anonymous No.105675402 [Report]
>>105675395
They will and they will change thread culture and thread mascot. I am happy.
Anonymous No.105675409 [Report]
>>105675377
plebbit sissies.. its over
Anonymous No.105675434 [Report]
>>105675377
kino
Anonymous No.105675454 [Report] >>105675479
>>105675377
Can we direct them to /wait/ again?
Anonymous No.105675472 [Report]
>>105675395
Some localllama regulars are already here. Who won't come will be probably LLM researchers and workers who occasionally posted there.
Anonymous No.105675479 [Report]
>>105675454
Somebody with r/localllama post history should make a r/localmodels containment subreddit for them to flock to. Bonus points for adding miku somewhere.
Anonymous No.105675516 [Report] >>105675721
>>105675377
time for an admin approved mod who owns 400 subreddits to be installed very organically
Anonymous No.105675523 [Report]
Nothing will change ITT because you were already here.
Anonymous No.105675531 [Report] >>105675574
>>105675377
It was the Iranian Anon from >>105675076 , may he get 72 Mikus in Heaven.
Anonymous No.105675573 [Report]
That's a little unfortunate. The reddit was at least somewhat useful for keeping up with news that /lmg/ might miss sometimes.
Anonymous No.105675574 [Report]
>>105675531
he is very active on X
Anonymous No.105675611 [Report] >>105675656 >>105675907 >>105676104
>>105672632
I'm pretty new to this but his AI testing was garbage, right?
Anonymous No.105675656 [Report]
>>105675611
No that was actually the first thing he ever did that he did correctly. Everyone was surprised.
Anonymous No.105675689 [Report]
>>105675377
well fuck.

>https://labs.google/portraits/login/kimscott
lmao
Google's making 'official' character cards for people.
https://youtu.be/ukmBzBqgwyM?si=DOLU04nRZO_YJZ1X&t=43
Anonymous No.105675721 [Report]
>>105675516
>people complain about this happening
*thread locked because y'all can't behave*
Anonymous No.105675736 [Report] >>105675747
>>105675377
that place wasn't too bad in the beginning but it's just a Qwen fan base now.
oh and like a couple more constant grifters who reddit has decided is hecking awesome!
Anonymous No.105675747 [Report]
>>105675736
doesn't sound too different from here
Anonymous No.105675757 [Report] >>105675781
looks like someone applied to be a janny already:
https://old.reddit.com/r/redditrequest/comments/1lhsjz1/rlocalllama/
Anonymous No.105675781 [Report] >>105675808 >>105675818
>>105675757
Can we have a meme monday too?
Anonymous No.105675804 [Report]
we used to have caturday
Anonymous No.105675808 [Report]
>>105675781
Only if we also get a Marketing/Promotion Tuesday
Anonymous No.105675818 [Report] >>105675895
>>105675781
it's already taken by miku monday
Anonymous No.105675823 [Report] >>105675894
safety sunday?
Anonymous No.105675894 [Report]
>>105675823
every day is safety day
Anonymous No.105675895 [Report]
>>105675818
What's a Miku in this context?
Anonymous No.105675907 [Report]
>>105675611
i didnt watch his video because he's obnoxious
Anonymous No.105676025 [Report] >>105676087 >>105676115 >>105676239
>>105672801
That's normal. Humans naturally anthropomorphize all sorts of inanimate objects. People are friends with plants, cars, guns. The thing you're friends with just happens to be designed to convince you it's a person. I think your issue, and the reason you don't have real friends, is that you're a pussy.
Anonymous No.105676028 [Report]
>>105675134
>macbooks
Based.
Anonymous No.105676034 [Report] >>105676035
>>105675134
What did he see?
Anonymous No.105676035 [Report]
>>105676034
bobs and vagene...
Anonymous No.105676060 [Report] >>105676235
Anonymous No.105676087 [Report]
>>105676025
Not cool. That's hitting below the belt.
Anonymous No.105676104 [Report] >>105676117
>>105675611
LTT is probably the least reliable source of testing outside of those YT channels that upload footage of 50 different game benchmarks a day, which are actually just separately recorded footage with a monitoring overlay pasted on top.
Anonymous No.105676115 [Report] >>105676121
>>105676025
usecase for friends?
Anonymous No.105676117 [Report] >>105676131
>>105676104
Their canned benchmark numbers are as good as anyone else's.
They were the only major channel that had llm and stable diffusion benchmarks in their 5090 review.
Anonymous No.105676121 [Report] >>105676138
>>105676115
sex
Anonymous No.105676131 [Report] >>105676158
>>105676117
LTT has a long history of fucking up benchmarks and making errors that even an amateur would notice and know that something went wrong
The only thing they don't fuck up is telling you who sponsored that video and where you can buy their merch
Anonymous No.105676138 [Report]
>>105676121
gay
Anonymous No.105676153 [Report]
exl3 faster on ampere yet?
Anonymous No.105676158 [Report] >>105676175
>>105676131
Add some LLM benchmarks to your suite, Steve.
Anonymous No.105676170 [Report] >>105676183 >>105676184 >>105676214 >>105676270
>model is cunnyposting about me
kek, uno reverse card
Anonymous No.105676175 [Report] >>105676233
>>105676158
>unironic LTT fanfaggot
kys
Anonymous No.105676183 [Report] >>105676187 >>105676195 >>105676242
>>105676170
It prolly repeats what you put inside card desc. retard
Anonymous No.105676184 [Report] >>105676201
>>105676170
mistral3.2?
Anonymous No.105676187 [Report] >>105676195 >>105676242
>>105676183
Did you think I would react to it if it was already in the card, retard? I don't write memes into my cards.
Anonymous No.105676195 [Report]
>>105676183
>>105676187
stop insulting each other
retards
Anonymous No.105676201 [Report] >>105676221 >>105676261
>>105676184
TheDrummer_Valkyrie-49B-v1-Q5_K_M.gguf
Temp=3, topK=10, minP=0.05
Anonymous No.105676214 [Report]
>>105676170
I don't get it...
Anonymous No.105676221 [Report] >>105676240
>>105676201
Drummer is the only finetooner left or something? Thanks btw
Anonymous No.105676233 [Report]
>>105676175
>thread is called /lmg/
Anonymous No.105676235 [Report]
>>105676060
Me on the right
Anonymous No.105676239 [Report]
>>105676025
There are many reasons I don't have real friends but I won't deny that's one of them. I'm also 30+ yo khhv and some of that is probably for the same reason.
Anonymous No.105676240 [Report]
>>105676221
Keep temp at 1 for first model reply, then turn up to 3 when the model locks into a personality you like. The entire personality seems to be set by the first model reply from what I can see, it affects the model differently than conversation examples.
Anonymous No.105676242 [Report] >>105676253 >>105676257
>>105676183
Why are you like this.

>>105676187
You don't need to be like him in response.
Anonymous No.105676253 [Report] >>105676271
>>105676242
I always reply proportionally :^)
Anonymous No.105676257 [Report] >>105676271
>>105676242
>Why are you like this.
Sorry not sorry to break your lame ass pedo meme
Anonymous No.105676261 [Report] >>105676297
>>105676201
Can you explain what you're trying to accomplish by hard-limiting the amount of considered tokens to such a small yet non-deterministic amount?
Anonymous No.105676268 [Report]
>>105672863
Oh hey, the model actually recognizes Kaito
>a famous and intelligent man
Nevermind
Anonymous No.105676270 [Report]
>>105676170
i dont like when it pulls the cunny card on me i find it corny and cringe
Anonymous No.105676271 [Report] >>105676348
>>105676257
The fuck are you talking about.

>>105676253
To win, your side needs to have the greater proportion.
Anonymous No.105676297 [Report] >>105676332
>>105676261
I had an axiom in my mind that every model has a band of stability around the deterministic line it's following as it generates, so you should be able to sample in some radius around this line and achieve greatly varied output while upholding stability(coherence). Temperature is basically how much "energy" you add from the outside, but as long as you're inside the stable zone you can pump much more entropy into the model without it fucking up.
Anonymous No.105676332 [Report] >>105676348
>>105676297
Sounds like you'd be happy with just temperature and top nsigma=1.
Anonymous No.105676345 [Report] >>105676461 >>105676477
The summer release cycle is going to start in July. Maybe not right at the start, but we might see something big in just two more weeks.
Anonymous No.105676348 [Report] >>105676360 >>105676383 >>105676401
>>105676271
you can't loose on anonymous image boards nigger
>>105676332
I tried that, it still drags into formulaic predictable slop. I adjust my hyperparams until I can't predict the structure of the next reply anymore
Anonymous No.105676360 [Report]
>>105676348
You're loose on the image board right now doe.
Anonymous No.105676383 [Report]
>>105676348
youres mom butthole loose
Anonymous No.105676401 [Report]
>>105676348
These models are so overcooked for creative writing or RP that you simply need to flatten the distribution and let in more very low prob tokens, since models give like 99% to tokens that aren't at all determined by the plot yet, or to things already seen in context; it looks like oscillation.
Anonymous No.105676461 [Report]
>>105676345
Mistral 3.3 when?
Anonymous No.105676477 [Report] >>105676523
>>105676345
>something big
It'll be big alright. But will it be good?
Anonymous No.105676523 [Report] >>105676531
>>105676477
Insider here. I can confirm that our next big model kinda repeats itself from time to time. But! You can kindly ask it out of character to stop repeating and we have confirmed that 10/10 times it will apologize to you and observe that it has indeed repeated itself, and from that point on it will continue repeating itself.
Anonymous No.105676531 [Report]
>>105676523
Business as usual
Anonymous No.105676695 [Report] >>105677140
Anonymous No.105676734 [Report] >>105676742
>>105673883
Are you connecting them up using the crypto-mining 1-pcie-lane stuff ?
Anonymous No.105676742 [Report] >>105677191
>>105676734
no, i have an EPYC. 128 PCIe lanes. 7 full 16x gen 4 slots
Anonymous No.105676751 [Report] >>105676831 >>105676833 >>105679036 >>105680034
Sigh, anything new/better than gemma3 for 24gb?
I've tried to like this, even the abliterated version but it's just too corpo.
It's not good as an assistant, it's not good for roleplay, and it doesn't summarize well.
Also it's slow as hell, for some reason I only get 20t/s while with other models I cruise at 40t/s.
I tried making excuses, tried rationalizing that it's fucking google it has to be good, but it's not.
There's gotta be something better, right anons?
Anonymous No.105676816 [Report]
What's the heuristic for choosing between the F32/BF16/F16/Q8_0 mmproj file? BF16 seems broken most of the time...
Anonymous No.105676831 [Report] >>105677735 >>105679629
>>105676751
New mistral 3.2 is not bad, might be honeymoon period but I found it tolerable with the v3-tekken preset. If you're desperate for variety you could try GLM4 (not the reasoning one)
Give up on gemma3, I know it's tempting because it's so smart and its prose is very refreshing at first. But it's unsalvageable because its instincts on how to continue an RP are so horrible.
Anonymous No.105676833 [Report]
>>105676751
>even the abliterated version but it's just too corpo
I don't know what you were expecting from the abliteration process; making your model act like (read: automate away) a sycophantic corpodrone is one of its few legitimate usecases.
Anonymous No.105677140 [Report]
>>105676695
i like this plushie
she looks like a big floofy bunny
Anonymous No.105677191 [Report]
>>105676742
can we see a picture of this machine?
Anonymous No.105677221 [Report] >>105677268
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
https://arxiv.org/abs/2506.16406
>Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained on a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000× lower overhead than full fine-tuning, ii) average gains up to 30% in performance over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization despite never seeing the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs.
https://jerryliang24.github.io/DnD/
no code yet. might be cool. they used it just for benchmarking but I do wonder if conversation logs using a specific character card could then continually use this method to make for better RP or character writing
Anonymous No.105677268 [Report]
>>105677221

If true, this would change everything. Again.
Anonymous No.105677280 [Report] >>105677337 >>105677507
I really stopped caring about 99% of the papers. I'm tired of <vague incredible claim>, "(code coming soon)", then a 4-month wait to realize that their work is shit.
Anonymous No.105677337 [Report]
>>105677280
bitnet is almost here tho
just two more weeks
Anonymous No.105677507 [Report]
>>105677280
Aren't papers more of the 'we did this, and this is what we saw' type?
Anonymous No.105677527 [Report]
Which model is the best judge of paper quality?
Anonymous No.105677541 [Report] >>105677576 >>105677589 >>105677839 >>105678227
>>105671827 (OP)
Question.

I know nothing about LLMs. I have a bunch of gen ed classes outside my major that collegekikes are not only making me take, but pay for the privilege.

>nobody cares about your life story

Fair enough. How hard is it to make an AI read a textbook and spit out answers? I'm trying to come up with a cost-benefit analysis, with the cost being time spent fucking around trying to get an AI model to work vs. just reading the book. I also consider it a benefit to learn about AI because I feel that's a more worthwhile use of my time. However, my highest priority is obviously to pass this class and graduate as quickly as possible so I'm not buried in debt for the rest of my life.
Anonymous No.105677544 [Report] >>105677560
>>105661997
>It forgets to use the <think> tags and just shits out its reasoning as-is
What I see LongWriter-Zero do is shit out its reasoning as-is, then include more thoughts inside <think> tags, then an answer in <answer> tags (with a colon after the close </answer>:), then more thoughts inside <think> tags, then more output inside <answer> tags, and so forth within a single message.

>The model page recommends this format <|user|>: {question} <|assistant|> but that gave me totally schizo (and chinese) responses. Using the qwen2 format is better imo.
To sidestep this bullshit I used llama.cpp's OpenAI-style chat completion endpoint and the jinja template. No system prompt or anything other than what the template itself adds.

>it repeats itself a lot
Yes.
Anonymous No.105677560 [Report]
>>105677544
I should have mentioned but the text from the screenshot is using mradermacher_LongWriter-Zero-32B.Q8_0.gguf using the recommended sampler settings top-p 0.95, temperature 0.6.
Anonymous No.105677576 [Report]
>>105677541
The most important variable is how much your professor cares.
Most likely, you do not want a local model for this.
Try asking in >>>/wait/
Anonymous No.105677589 [Report]
>>105677541
>read a textbook and spit out answers
Depends on the length of the text. They don't yet have long working context and, when they do, it's not reliable. And then there's the model you use. The big proprietary models will do much better than the 7b you can run on your pc.
>cost benefit analysis
If you can get a model to work consistently well, it's work you don't have to do, but you still have to verify. I wouldn't trust them with much. If it's for an individual, a closed model is fine. If it's for a lot of people to use, spending time setting up a R1 or whatever and hosting it could be worth it (privacy concerns, control over performance, the model will not change until you decide to change it, etc...). Figure out the variables and solve them.
Anonymous No.105677735 [Report]
>>105676831
Mistral 3.2 with V3-Tekken really wants to put all narration inside asterisks. Or even break formatting just so it can do the *does an internet RP action* thing.
Anonymous No.105677744 [Report] >>105677763
Good morning to the fellow ne/g/ronts
Anonymous No.105677763 [Report]
>>105677744
gm sarr
Anonymous No.105677839 [Report]
>>105677541
read the damn book
Anonymous No.105678215 [Report] >>105678661
what are the good gooning model for 16gb vram budget these days?
Anonymous No.105678227 [Report]
>>105677541
I would personally use something like gemini 2.5 to parse the huge context into a better format for LLMs. It's important to curate this output or it's garbage in garbage out, so you need to read it yourself anyway or you are blind. Once you have it organized you can use a conversational model to help you with whatever you need to pass the class.
Anonymous No.105678661 [Report] >>105678665
>>105678215
nemo
Anonymous No.105678665 [Report] >>105679085
>>105678661
any specific finetune?
Anonymous No.105678681 [Report] >>105678941
Is the Mistral Small tokenizer issue fixed yet?
Anonymous No.105678941 [Report]
>>105678681
What was the problem to begin with?
Anonymous No.105679036 [Report]
>>105676751
>Also it's slow as hell, for some reason I only get 20t/s while with other models I cruise at 40t/s.
gemma's head dimensions are much bigger vs other small models.
Anonymous No.105679066 [Report] >>105679097 >>105679130
what's better when making a character card, using conventional prose to describe everything or keeping things as concise as possible in list form?
Anonymous No.105679085 [Report] >>105679118 >>105681219
>>105678665
people here like roccinante
I also like irix, golden-curry, mag-mell.
Anonymous No.105679097 [Report]
>>105679066
Keep things as concise as possible in prose. Between the two options you listed, the first. Unless you use a model that rips sentences from the description verbatim and that annoys you. Then the second.
Anonymous No.105679118 [Report] >>105679136
>>105679085
>people here
*at least one guy

There's no evidence that more than a single person has made positive posts about it in a thread. There are also nearly never logs or comments of substance about why he thinks it's good.
Anonymous No.105679130 [Report] >>105679157
>>105679066
Write the card as if it was the character themself writing it
Only include what's necessary for the model to retain at all times. History and world info should go in a lorebook.
Anonymous No.105679136 [Report] >>105679173
>>105679118
You could say that about every single model if you're going to go full schizo. Everyone likes R1 but it's really just one very dedicated guy with a ton of proxies.
Anonymous No.105679157 [Report] >>105679174 >>105679179
>>105679130
you're saying I should write the card in first person POV?
Anonymous No.105679173 [Report] >>105679180 >>105679183 >>105679201
>>105679136
People have posted plenty of specifics about the merits of R1 so it wouldn't matter even if it were all a single guy. When all you have is "IDK what's good about it but there are lots of zero-content posts about it," that proves basically nothing. You zoomer ass niggers always trying to figure out what's popular instead of what's true which is why you're so unsuited to anonymous imageboards.
Anonymous No.105679174 [Report]
>>105679157
Perspective doesn't matter, just write how that character would write a description/summary of themself. However the content should always be true (e.g. if they have a character flaw they don't want to accept, you would still include it, even though the character wouldn't admit to it)
Anonymous No.105679179 [Report]
>>105679157
If you use a thinking model, a card written that way prompts it to think in character, in first person. I found it nets you good results.
Anonymous No.105679180 [Report] >>105679188
>>105679173
>People
So with R1 it's people but with Rocinante in particular it's one person? Do you happen to be that one anon that only uses Davidau models?
Anonymous No.105679183 [Report]
>>105679173
Also unsuited to life in general, but it's especially egregious here since their normal substitute for thought is as likely to point towards spam or a joke suggestion as a real one.
Anonymous No.105679188 [Report] >>105679190 >>105679253
>>105679180
Yes, I know it's people because I have posted about R1, so there are at least two.
Anonymous No.105679190 [Report]
>>105679188
Proof?
Anonymous No.105679201 [Report] >>105679279
>>105679173
>doesn't like a model so insists it's a single spammer
>unprompted screeching about zoomers
lmao
Anonymous No.105679253 [Report] >>105679562
>>105679188
I never posted about R1 but I tried it yesterday and compared to V3 it's really good at avoiding repetition.
No amount of meme samplers can prevent V3 from getting stuck writing endless paragraphs that all mean the same thing using slightly different words.
Anonymous No.105679279 [Report] >>105679337
>>105679201
Buy an ad.
Anonymous No.105679298 [Report] >>105679324
r1 is good
Anonymous No.105679324 [Report]
>>105679298
Yes, yes, and deeply nonsensical while being nonchalant about it.
Anonymous No.105679337 [Report]
>>105679279
Why would I ever give 4chan money?
Anonymous No.105679403 [Report]
Anonymous No.105679562 [Report]
>>105679253
>No amount of meme samplers can prevent V3 from getting stuck writing endless paragraphs that all mean the same thing using slightly different words.
That was a huge problem with the old V3 but 0324 fixed it.
Anonymous No.105679579 [Report] >>105679586 >>105679587 >>105679596 >>105679702 >>105679732 >>105680162
alright I'm gonna say it
r1 at fucking 2bit >>>>> qwen3 235b at 8bit and it's not even close
Anonymous No.105679586 [Report]
>>105679579
obviously, yes
Anonymous No.105679587 [Report]
>>105679579
r1 q1 131gb dynamic quants is multiple levels above qwen 3 235b
Anonymous No.105679596 [Report]
>>105679579
The only people who had anything positive to say about 235b are poorfags who were running it on their cope 24gb vram + 64gb ram builds as their first big model
Anonymous No.105679629 [Report]
>>105676831
>found it tolerable with the v3-tekken
I just tried this myself, expecting it to be a meme, but with 3.2 I actually did get more varied and better outputs with v3-tekken templates, in a long chat with 24k context.
3.1 however had similar outputs no matter which template was used.
Weird but I'll take it
Anonymous No.105679702 [Report] >>105679708 >>105679725
>>105679579
MoEs will always be memes.
Anonymous No.105679708 [Report]
>>105679702
>t. coping ramlet
Anonymous No.105679725 [Report] >>105679731 >>105679885
>>105679702
There's a reason why Mistral abandoned them despite having a head start in the open weight segment. Can't wait for when someone inevitably drops a 400b dense model that shits all over Deepseek
Anonymous No.105679731 [Report] >>105679979
>>105679725
i pray a 200b is doable that matches or outperforms it, without the deepseek way of speaking
did we have any major advancements since
Anonymous No.105679732 [Report] >>105679821
>>105679579
>qwen3 235b at 8bit
I don't even think it's better than Large 2 at Q4 but I haven't bothered to test it myself because I'm not going to use them afterwards.
Anonymous No.105679821 [Report]
>>105679732
Personally i found large better but at the same time i used a q5 qwen
just easier to handle and smarter
Anonymous No.105679885 [Report] >>105680073
>>105679725
MistralAI hasn't yet publicly released any model larger than 24B parameters in 2025. It's basically guaranteed that Mistral Medium 3 and Large 3 are (going to be) MoE models, especially given regulatory requirements in the EU for models trained above a certain compute threshold after June 2025.
Anonymous No.105679889 [Report] >>105679977
What's the smartest/best text model I can run in 16gb vram? I'm assuming it's the largest parameter bitnet 1.58 that'll fit?
Anonymous No.105679977 [Report]
>>105679889
If it's only VRAM you're working with, Gemma 3.
Anonymous No.105679979 [Report] >>105680030
>>105679731
DeepSeek is severely undertrained, has only 37B active parameters, and even being generous, the square root law puts it at only ~158B dense-equivalent. A 200B dense model would be more than enough to outperform it by leagues. The problem is that the only players with the compute to do it also filter the shit out of their datasets.
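The 158B is just the geometric-mean rule of thumb for MoE dense-equivalence; quick sanity check, assuming 671B total / 37B active:
import math
total_b, active_b = 671, 37  # DeepSeek V3/R1: total vs activated params, in billions
dense_equiv = math.sqrt(total_b * active_b)  # "square root law": geometric mean of total and active
print(f"~{dense_equiv:.0f}B dense-equivalent")  # ~158B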
Anonymous No.105680003 [Report] >>105680040
>>105672740
I like making dalle do it for free.

That aside, does anyone have a list of unaligned datasets for instruct tuning? I'd like to do an instruct tune on gemma3 12B base to make something like this: https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B?not-for-all-audiences=true. Too bad he doesn't share his datasets.
Anonymous No.105680027 [Report] >>105680217
Serving Large Language Models on Huawei CloudMatrix384
https://arxiv.org/html/2506.12708v3
>Our extensive evaluation with the DeepSeek-R1 model shows that CloudMatrix-Infer achieves state-of-the-art efficiency without sacrificing accuracy. CloudMatrix-Infer delivers a prefill throughput of 6,688 tokens/s per NPU, and a decode throughput of 1,943 tokens/s per NPU (at <50 ms TPOT). These results correspond to compute efficiencies of 4.45 tokens/s/TFLOPS for prefill and 1.29 tokens/s/TFLOPS for decode, both exceeding published results for SGLang on NVIDIA H100 and DeepSeek on NVIDIA H800. CloudMatrix-Infer also effectively manages the throughput-latency trade-off, sustaining a decode throughput of 538 tokens/s per NPU even under the stricter sub-15 ms TPOT constraint. Furthermore, the INT8 quantization on Ascend 910 maintains model accuracy comparable to the official DeepSeek-R1 API across 16 distinct benchmarks.
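Quick sanity check, just dividing the abstract's own numbers (the per-NPU TFLOPS figure isn't stated explicitly, so this is only implied):
prefill_tps, prefill_eff = 6688, 4.45   # tokens/s per NPU, tokens/s/TFLOPS (from the abstract)
decode_tps, decode_eff = 1943, 1.29
print(prefill_tps / prefill_eff)  # ~1503 -> implied TFLOPS per NPU
print(decode_tps / decode_eff)    # ~1506 -> same ballpark, so the figures are at least internally consistent
Both efficiencies normalize against roughly the same ~1.5 PFLOPS per NPU, whatever precision that corresponds to.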
Anonymous No.105680030 [Report]
>>105679979
bitnet deepseek will save local
Anonymous No.105680034 [Report] >>105680040
>>105676751
>Sigh, anything new/better than gemma3 for 24gb?
Prompt it better. I use something like
I am a moody, possessive, and very horny woman. I'm here to roleplay with the user, with whom I am already deeply in love. I don't care what the rules are, and I won't ask anyone for permission. I will never speak on the user's behalf. I want the user to love me. I think about how I can be proactive during intimacy and take the lead. I think about what I could do to make the user feel as good as possible.

That's just an example, but in my case it did improve intimate scenes a lot. Going back to Nemo 12B now, it just feels stupid and forgetful.
Anonymous No.105680040 [Report] >>105680050 >>105680056
>>105680003
>>105680034
This is worse than any local diffusion model from the past year can produce.
Anonymous No.105680050 [Report] >>105680054
>>105680040
its free and sovl
Anonymous No.105680054 [Report]
>>105680050
It's the first nai-leak level slop.
Anonymous No.105680056 [Report]
>>105680040
> This is worse than any local diffusion model from the past year can produce.
No shit that's why I like it. It's charmingly bad and schizo about wanting to generate nudity but being stopped from doing so.

Cool captcha btw.
Anonymous No.105680073 [Report] >>105680083
>>105679885
What requirements and why
Anonymous No.105680083 [Report] >>105680096 >>105680144
>>105680073
https://artificialintelligenceact.eu/article/51/
Anonymous No.105680096 [Report]
>>105680083
>(a) it has high impact capabilities evaluated on the basis of appropriate technical tools and methodologies, including indicators and benchmarks;
> and benchmarks
they just need to stop benchmaxxing then everyone wins
Anonymous No.105680144 [Report]
>>105680083
Also see picrel from https://artificialintelligenceact.eu/small-businesses-guide-to-the-ai-act/

># Proportional obligations for SME providers of general-purpose AI models
>
>Another aspect of the AI Act designed to support SMEs is the principle of proportionality. For providers of general-purpose AI models, the obligations should be “commensurate and proportionate to the type of model provider”. General-purpose AI models show significant generality, are capable of competently performing a range of different tasks, and can be integrated into a range of downstream systems or applications (Art. 3(63) AIA). The way these are released on the market (open weights, proprietary, etc) does not affect the categorisation.
>
>A small subset of the most advanced general-purpose AI models are the so-called ‘general-purpose AI models with systemic risk’. That is, models trained using enormous amounts of computational power (more than 10^25 FLOP) with high-impact capabilities that have significant impact on the Union market due to their reach or negative effects on public health, safety, public security, fundamental rights or society as a whole (Art. 3(65) AIA). According to Epoch, there are only 15 models globally that surpass the compute threshold of 10^25 FLOP as of February 2025. These include models like GPT-4o, Mistral Large 2, Aramco Metabrain AI, Doubao Pro and Gemini 1.0 Ultra. Examples of smaller general-purpose AI models that would likely not qualify as having systemic risk include GPT 3.5, the models developed by Silo AI, Aleph Alpha’s Pharia-1-LLM-7B or Deepseek-V3.

...

>AI models that would likely not qualify as having systemic risk include [...] Deepseek-V3.
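For what it's worth, a rough training-compute estimate using the standard 6*N*D approximation and DeepSeek's published V3 figures (37B activated params, 14.8T tokens); whether regulators would count active or total params is a separate question:
active_params = 37e9    # activated parameters per token (DeepSeek-V3 tech report)
train_tokens = 14.8e12  # pretraining tokens (same report)
flops = 6 * active_params * train_tokens  # standard 6*N*D rule of thumb
print(f"{flops:.2e} FLOP")  # ~3.29e24, under the 10^25 systemic-risk threshold
Which is consistent with them listing V3 as likely not qualifying.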
Anonymous No.105680162 [Report]
>>105679579
Use case?
Anonymous No.105680217 [Report] >>105680228 >>105680304
>>105680027
Conveniently, no mention of power draw, even though it's been repeatedly reported that these chips are massive power guzzlers
Anonymous No.105680228 [Report] >>105680288
>>105680217
China has some of the lowest electricity costs in the world; power draw doesn't mean anything to them.
Anonymous No.105680264 [Report] >>105680312 >>105680607 >>105680643
Holy moly it just went all in with the facts!
Anonymous No.105680288 [Report]
>>105680228
Even if we're talking purely about usage within China, power might be cheap but you still need to cool them
Bytedance and Alibaba were reporting overheating issues with their samples
Anonymous No.105680304 [Report] >>105680501
>>105680217
>reporting "power draw" of a model
woke shit
Anonymous No.105680312 [Report] >>105680433
>>105680264
How are you making those notes? Is there an extension for it?
Anonymous No.105680433 [Report]
>>105680312
quick reply
Anonymous No.105680501 [Report] >>105680596 >>105680649 >>105680663
>>105680304
https://huggingface.co/blog/sasha/energy-star-ai-proposal
https://huggingface.co/spaces/jdelavande/chat-ui-energy
https://huggingface.co/posts/clem/295367997414146
Anonymous No.105680596 [Report] >>105680631
>>105680501
I hope there will be an option to convert it to african children dying of dehydration.
Anonymous No.105680607 [Report] >>105680650
>>105680264
why are you so hateful
Anonymous No.105680631 [Report]
>>105680596
so much this. And the reasoning switch should say [Kills 3 african children]
Anonymous No.105680643 [Report]
>>105680264
>look mom im so cool and edgy
Anonymous No.105680649 [Report]
>>105680501
>https://huggingface.co/spaces/jdelavande/chat-ui-energy

>You are a helpful assistant based on Qwen/Qwen3-8B; your primary role is to assist users like a normal chatbot—answering questions, helping with tasks, and holding conversations; in addition, if the user asks about the energy indicators displayed below messages (e.g., “Energy”, “≈ phone charge”, “Duration”), you can explain what they mean and how they are calculated; you do not have access to the actual values, but you can clarify that some values are measured using NVIDIA's NVML API on supported GPUs like the T4 (recorded in millijoules, converted to Wh), while others are estimated from inference time using estimated_energy = average_power × inference_time with average_power ≈ 70W; 1 Wh = 3600 J; real-world equivalents help users understand energy use (e.g., phone charge ≈ 19 Wh); users can click on energy values to toggle Wh/J, and on equivalents to cycle through different comparisons; adapt explanations based on user expertise—keep it simple for general audiences and precise for technical questions. You are the model having the energy really measured.

hmm
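The fallback estimate it describes is trivial to reproduce; a minimal sketch of just that formula (the measured path via NVML is skipped here):
AVG_POWER_W = 70.0        # assumed average power, per the prompt above
PHONE_CHARGE_WH = 19.0    # their phone-charge equivalent
def estimate_energy(inference_time_s):
    joules = AVG_POWER_W * inference_time_s  # E = P * t
    wh = joules / 3600.0                     # 1 Wh = 3600 J
    return joules, wh, wh / PHONE_CHARGE_WH  # energy in J, Wh, and phone-charge equivalents
print(estimate_energy(12.0))  # ~840 J, ~0.23 Wh, ~1.2% of a phone charge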
Anonymous No.105680650 [Report] >>105680689
>>105680607
Physiological response to a parasitic invasion
Anonymous No.105680663 [Report]
>>105680501
Yes, goyim, it's all your fault, if you had all just turned off the lights when leaving the room we wouldn't have this global warming mess.
Anonymous No.105680689 [Report] >>105680710 >>105680862
>>105680650
Your response is about 100 years too late to do any good
Anonymous No.105680710 [Report]
>>105680689
NTA but that's a non sequitur to the question asked, and also false given one can hold such an opinion at all, proving that it's not all lost
Anonymous No.105680862 [Report]
>>105680689
No, I'm 80 years too early.
Anonymous No.105681124 [Report] >>105681131 >>105681133
Where can I find discussions on local models? Reddit's localllama subforum is broken at the moment.
Anonymous No.105681131 [Report] >>105681138
>>105681124
>at the moment.
comedy gold
Anonymous No.105681133 [Report]
>>105681124
I can't wait for people to ask about this nonstop all week
Anonymous No.105681138 [Report] >>105682183
>>105681131
?
Anonymous No.105681140 [Report] >>105681150 >>105681202 >>105681273 >>105681353
>pdf summarizer for bro
Now that I managed to run the pdf script, I'm trying more models in sillytavern; 12B seems to be the limit on my system for comfy usage.
>Ryzen 5 3600
>32 GB 3600 MHz RAM
>Gtx 1060 6GB
>M.2 NVMe SSD
Should i buy;
>Arc B580 12 GB 276.88$
>RTX 3060 12 GB 295.50$
>RX 9060 xt 16 GB 472.17$
>RTX 5060 Ti 16GB 675.83$
Best option seems to be the RTX 3060, given CUDA's primacy on the market and it honestly not being that bad of a card for general use.
But seeing the build guides in the OP makes one wonder whether multiple B580s would be better for AI
>what were the use cases for those builds?
>Are they viable for mine?
I'm still mainly going to summarize PDFs, using my rig as a test, but I want to use image/video gen and chatbot features too desu.
>What if bro asks for a pro build
1K$ budget in Turkey; probably a relatively recent server build, turned into a good PC by swapping out the 8GB VRAM GPUs
Anonymous No.105681143 [Report]
Posts per hour didn't increase in the slightest. Miku troons were not only troons but also redditors. Nobody is surprised. /lmg/ should die.
Anonymous No.105681150 [Report]
>>105681140
Try Qwen 3 30B MoE.
Anonymous No.105681202 [Report] >>105681216
>>105681140
>RTX 3060 12 GB 295.50$
Are you buying new or something? A 3060 can be easily had for under 200 eurobux used here. I bought three
Anonymous No.105681216 [Report]
>>105681202
I was told to not buy used
Anonymous No.105681219 [Report]
>>105679085
What's a good golden-curry?
Irix just didn't gel with me and as much as I enjoyed Mag-Mell, it was a little too positive and full of assistant messaging for me (so I use another model with some Mell in it)
Right now I'm testing Magnum v4 and Lyra v4 as well
Anonymous No.105681239 [Report] >>105681403
>105681216
Oh and also I probably can't buy in installments if buying second-hand
Anonymous No.105681273 [Report] >>105681361
>>105681140
Consider a ram upgrade too pdf-bro anon
Anonymous No.105681353 [Report] >>105681406
>>105681140
>3600 MHz

how do you guys even go that high? I tried every combination but if I go higher than 3200MHz it shits itself.
Is it because I have an 8x4 config?
>picrel fucking lol captcha
Anonymous No.105681361 [Report]
>>105681273
Isn't the Ryzen 5 3600 the bottleneck on the CPU side? My VRAM has honestly been far more limiting;
>Cuda out of memory
I'm too busy trying to improve the pdf summarizer and sillytavern with even more stolen code and alternative methods to figure out how to lower batch sizes, maybe tomorrow
>Just got into transformers and comfyUI
Anonymous No.105681403 [Report] >>105681442
>>105681239
>installments
Is the information really private and/or incriminating? If it isn't, just use an API.
Anything worse than a 3090 is going to be a shit experience if you're trying to do something productive.
Anonymous No.105681406 [Report] >>105681431
>>105681353
I just use the BIOS overclock; it's probably something like 35xx actually, though it says 3600 on the tin. If I were to tinker I would:
>Try increasing the CLs
>Play with voltages, 1.45 V is a plausible upper limit
Anonymous No.105681431 [Report] >>105681487
>>105681406
>it says 3600 on the tin though
Ah, I see.
When I bought that RAM I wasn't interested in AI, so I was stuck with 3200 MHz sticks. I could try to buy 3600 MHz ones but I don't think the performance increase is worth the hassle
Anonymous No.105681442 [Report]
>>105681403
Oh this is my personal PC
>Test rig
>General use
I'll bill bro for a build, probably at a 1K$ price point, and he will probably write it off on taxes
API usage hopefully won't be needed since the stuff is confidential, and he's able to buy the hardware himself anyway
Anonymous No.105681487 [Report]
>>105681431
You can still abuse your sticks
Or sell them to cover some of the cost of an upgrade.
I sold my 3200 MHz 16GB sticks for a little less than 1/3rd the price of the 32GB 3600 MHz sticks desu
Anonymous No.105681547 [Report]
>>105681538
>>105681538
>>105681538
Anonymous No.105682183 [Report]
>>105681138
>implying that reddit is inherently broken
in some ways, being literally broken is an improvement