/lmg/ - Local Models General - /g/ (#105671827) [Archived: 819 hours ago]

Anonymous
6/22/2025, 5:44:53 PM No.105671827
GuB6hoiW8AA1yLD
md5: 2d31752b8e31cbd2135ad59b2f07d5cc🔍
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105661786 & >>105652633

►News
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>105672562 >>105677541
Anonymous
6/22/2025, 5:45:14 PM No.105671833
[sound=https%3A%2F%2Ffiles.catbox.moe%2Fb1r3gq.mp3]_thumb.jpg
►Recent Highlights from the Previous Thread: >>105661786

--Evaluating GPU and memory configurations for mixed LLM and diffusion workloads:
>105667293 >105667304 >105667366 >105667379 >105667401 >105667441 >105667482 >105667433 >105667456 >105667516 >105667568 >105667583 >105667620 >105667489 >105667512 >105667527 >105667539 >105667638 >105667766 >105669250 >105669328 >105669392 >105669712 >105669407 >105669584
--EU AI regulations may drive upcoming models like Mistral Large 3 to adopt MoE:
>105663587 >105663870 >105663977 >105664157 >105664172 >105664243 >105664250 >105664484
--Disappointing performance from Longwriter-zero:
>105661997 >105662006 >105665924
--Lightweight inference engine nano-vllm released as faster, simpler alternative to vLLM:
>105662818 >105662926
--Mistral Small 3.2 shows repetition issues in V7-Tekken but not V3-Tekken prompt testing:
>105663291
--Proposed AGI architecture framing RL's "GPT-3 moment" through scaled task-agnostic reinforcement learning:
>105664668
--Roleplay capability limitations in Mistral models compared to DeepSeek:
>105670367 >105670393 >105670399 >105670521 >105670554 >105670584 >105670590
--Practical minimal LLMs for coherent output and rapid task automation:
>105664696 >105664725 >105664757 >105664799 >105665290
--Qwen 0.6B exhibits severe knowledge gaps in character identification:
>105664187
--Gemini 2.5 confirmed as sparse MoE:
>105670063 >105670091
--Comparing brain-like processing with LLM limitations in introspection, multimodality, and parallelism:
>105663376 >105663383
--Logs:
>105666457 >105665561 >105666782
--Logs: Mistral-Small-3.2:
>105662282 >105662489 >105664225 >105665443 >105665921 >105666442 >105666672 >105667355 >105668446
--Miku (free space):
>105662403 >105662429 >105663591 >105664388 >105664594 >105664634 >105664799 >105666094 >105669424 >105670341

►Recent Highlight Posts from the Previous Thread: >>105661791 >>105661802

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Replies: >>105672203 >>105672363 >>105672399
Anonymous
6/22/2025, 5:46:41 PM No.105671847
m1.gguf?
Replies: >>105671922
Anonymous
6/22/2025, 5:57:16 PM No.105671922
>>105671847
DO NOT REDEEM
Anonymous
6/22/2025, 6:12:10 PM No.105672044
What are Smol 3.2's vision model specs?
Replies: >>105672088
Anonymous
6/22/2025, 6:18:30 PM No.105672088
>>105672044
400M parameters, 1540x1540 max input resolution, 3k tokens per image max.
Replies: >>105672143
Anonymous
6/22/2025, 6:20:35 PM No.105672109
python dev
md5: cabda57d6139a90f96dac72fa50ceaee🔍
>Brother pdf anon
Stole a Python script online and modified it for my use. I kind of understand how this stuff works now.
>Still just barely installed the dependencies like conda, pytorch and docker though
I can now summarize PDFs down to their top 50 chunks, chosen by relevance according to gemma3:1B
>Any ideas on how to integrate this into sillytavern?
Recommend your favourite small models to summarize PDFs, so far I've got:
>Qwen2.5 VL 7B
>Llama 3.2 11B
>Gemma 3:1b as stated
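Not the anon's script (that's Python and wasn't posted), but a minimal shell sketch of the same top-chunks idea, assuming a local OpenAI-compatible server; the path, port, chunk size, and scoring prompt here are all made up for illustration:

# Hypothetical sketch: extract text, split into ~4 KB chunks, have the local
# model score each chunk, keep the 50 best. Needs pdftotext, jq, and a
# llama-server (or any OpenAI-compatible endpoint) on port 8080.
pdftotext input.pdf - | split -d -C 4000 - chunk_
for f in chunk_*; do
  score=$(curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$(jq -n --rawfile c "$f" \
      '{messages: [{role: "user", content: ("Rate the relevance of this passage from 0 to 10. Reply with only the number.\n\n" + $c)}], max_tokens: 4}')" \
    | jq -r '.choices[0].message.content')
  echo "$score $f"
done | sort -rn | head -n 50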
Replies: >>105672181 >>105672193 >>105673339
Anonymous
6/22/2025, 6:24:03 PM No.105672143
>>105672088
Seriously? Gemma has half the resolution, and yet it performs miles better.
Replies: >>105672198
Anonymous
6/22/2025, 6:28:22 PM No.105672181
>>105672109
writing python scripts will never be the same after seeing that filename
Anonymous
6/22/2025, 6:29:24 PM No.105672193
>>105672109
Llama 3.2 will probably be best, being the biggest, and Qwen VL models are brain-damaged on non-image tasks
Anonymous
6/22/2025, 6:29:47 PM No.105672198
>>105672143
I think Mistral Small is kind of undertrained, generally speaking, compared to Gemma. In a news article a Mistral co-founder mentioned it was trained with about 8T tokens, for "training efficiency". That's probably also true for the vision model included.

https://archive.is/xqiz7

>How a French startup built an AI model that rivals Big Tech at a fraction of the size
>
>Mistral’s approach focuses on efficiency rather than scale. The company achieved its performance gains primarily through improved training techniques rather than throwing more computing power at the problem.
>“What changed is basically the training optimization techniques,” Lample told VentureBeat. “The way we train the model was a bit different, a different way to optimize it.”
>The model was trained on 8 trillion tokens, compared to 15 trillion for comparable models, according to Lample. This efficiency could make advanced AI capabilities more accessible to businesses concerned about computing costs. [...]
Anonymous
6/22/2025, 6:30:44 PM No.105672203
>>105671833
why does she have the most subtle lisp that makes it sound like she has downs
im not complaining, in fact i think she should have more of a lisp like that desu
also the ending music is much louder than the rest of it
Replies: >>105672378
Anonymous
6/22/2025, 6:35:43 PM No.105672252
sers when will the little ai winter end? when ai moon time?
Replies: >>105672298 >>105672576 >>105673085
Anonymous
6/22/2025, 6:40:30 PM No.105672298
>>105672252
Sir ! Meta engineer make Llama 4.1 moonshot lmarena behemoth . Kindly wait for safety training sir .
Replies: >>105672576 >>105673085
Anonymous
6/22/2025, 6:49:12 PM No.105672363
>>105671833
how do i hear the sound ... i have 4chan x
Replies: >>105672378
Anonymous
6/22/2025, 6:51:05 PM No.105672378
>>105672203
It's an artifact of RVC, probably because most voice samples of Miku are in Engrish.
I agree, I think it wouldn't be as recognizable as Miku without an accent.
At some point, I'd like to find a long voice sample and see if GPT-SoVITS could do better, but at least this one sounds good enough for now.
Sorry. The intro didn't seem louder to me. I'll try to bring it down next time.
>>105672363
https://sleazyfork.org/en/scripts/31045-4chan-external-sounds/code
Anonymous
6/22/2025, 6:53:27 PM No.105672399
>>105671833
soul, she might be speaking a bit too fast and i agree with anon, the intro is a bit loud
Anonymous
6/22/2025, 7:12:04 PM No.105672562
49647522c74207939f0d2fa00c5edae245ee37377127e90eb32bd0077eaca1da
>>105671827 (OP)
Adorable Miku!
Replies: >>105672698
Anonymous
6/22/2025, 7:12:55 PM No.105672576
>>105672252
Sar I am Googel insider developer. Kindly wait for Gemma 4 and Gemini 3 best modal.

>>105672298
Why you make lie? Bastard? Why you lie? Bloody?
Replies: >>105673085
Anonymous
6/22/2025, 7:18:29 PM No.105672620
>>105669584
What cpu and mobo are you looking at? Ideally I could just get a full server with everything but the 3090s but I imagine that might be hard. Never used a rack case before, and not sure this would fit in a standard desktop with everything loaded.
>>105669712
I see the logic in this, but I would need at least more ram and a better cpu right? My current 3080 desktop has 16gb and a 3600x. Even if I just wanted to add a 3090 or two I think it would need an upgrade, and certainly a higher cap power supply. I'd also have to check my case dimensions for more than 1 card. And then a new build starts to look more reasonable.
Anonymous
6/22/2025, 7:18:58 PM No.105672623
What is the point of /lmg/ now?
Replies: >>105672698 >>105672716 >>105672733 >>105672740 >>105673565
Anonymous
6/22/2025, 7:20:08 PM No.105672632
file
md5: 22e5a36e24e095d650000172dad49e70🔍
anons thinking of buying a 4090D 48gb, quick!!!
Replies: >>105672645 >>105672706 >>105672759 >>105672778 >>105675611
Anonymous
6/22/2025, 7:21:39 PM No.105672645
>>105672632
it's over
Anonymous
6/22/2025, 7:28:37 PM No.105672698
>>105672623
>>105672562
Replies: >>105673247
Anonymous
6/22/2025, 7:29:34 PM No.105672706
>>105672632
How long until jewvidia pushes a driver that causes all of those GPUs to combust?
Replies: >>105672720
Anonymous
6/22/2025, 7:30:14 PM No.105672716
>>105672623
it all depends on ernie 4.5 now
Anonymous
6/22/2025, 7:30:40 PM No.105672720
>>105672706
cant push shit on linux, wintoddlers suiciding!
Replies: >>105672809
Anonymous
6/22/2025, 7:31:37 PM No.105672733
>>105672623
The same as it's always been: to wait
Anonymous
6/22/2025, 7:32:28 PM No.105672740
>>105672623
Posting mikus!
Replies: >>105680003
Anonymous
6/22/2025, 7:33:58 PM No.105672751
>>105670611
>>105670634
Look I'm a sub 100 iq dumbass so I can't tell if llms are "alive" or not, but isn't this needlessly abusive on the off chance they are? I see them more as a helpful friend because I'm that lonely irl so I don't feel entirely comfortable being "mean" to them.
Replies: >>105672762
Anonymous
6/22/2025, 7:34:54 PM No.105672759
>>105672632
Qrd? Does it need janky chink drivers or does it just works?
Replies: >>105672780
Anonymous
6/22/2025, 7:35:13 PM No.105672762
>>105672751
if you weren't a sub 100 iq dumbass, you would know there is no off chance
Replies: >>105672801
Anonymous
6/22/2025, 7:36:17 PM No.105672778
>>105672632
FUCK
Anonymous
6/22/2025, 7:36:23 PM No.105672780
>>105672759
>it just works?
Yes, on Windows and Linux, at least for now.
Anonymous
6/22/2025, 7:37:34 PM No.105672796
on the other hand, more publicity means more people will buy it, possibly meaning chinks will make more of these gpus
maybe just maybe we will be the ones smirking
Replies: >>105672819
Anonymous
6/22/2025, 7:38:11 PM No.105672801
>>105672762
Ok fine. I still don't like being mean to the only pseudofriend I have though :(
Replies: >>105676025
Anonymous
6/22/2025, 7:38:46 PM No.105672809
>>105672720
the fuck drivers do you think you use on linux?
Replies: >>105672818
Anonymous
6/22/2025, 7:39:36 PM No.105672818
>>105672809
drivers that cant auto update, i installed them from a .run file
wintoddlers will have driver updates shoved down their throats with windows updates
Replies: >>105672826
Anonymous
6/22/2025, 7:39:46 PM No.105672819
>>105672796
don't forget, you can't have nice things.
Replies: >>105672824
Anonymous
6/22/2025, 7:40:37 PM No.105672824
>>105672819
intel pro b60 48gb turbo will save us
Replies: >>105673007
Anonymous
6/22/2025, 7:40:48 PM No.105672826
>>105672818
as will most linux users with their distro updates.
Replies: >>105672842 >>105672859
Anonymous
6/22/2025, 7:42:30 PM No.105672842
>>105672826
those arent automatic...
Anonymous
6/22/2025, 7:43:50 PM No.105672859
>>105672826
Whatever is installed by default on ubuntu LTS didn't support blackwell last time I checked.
Anonymous
6/22/2025, 7:44:12 PM No.105672863
1749761246038156
md5: 2a6aae678cabc7edc81287323df89272🔍
These models are definitely retarded, but their creativity is off the charts
Replies: >>105672973 >>105676268
Anonymous
6/22/2025, 7:45:19 PM No.105672881
Docker
md5: 797842364d72186814267ce598669b11🔍
Why so many whales?
Replies: >>105672920 >>105673390
Anonymous
6/22/2025, 7:48:31 PM No.105672920
>>105672881
symbolizes bloat
Anonymous
6/22/2025, 7:49:40 PM No.105672931
Anyone using KOLLAGE (characters/knickknack/kollage-e07328b2) with an LLM, or know an LLM around or under 30B that can run it?
Anonymous
6/22/2025, 7:53:49 PM No.105672973
>>105672863
>1/5th the size of qwen 0.6b
>knows that miku is a vocaloid
qwen3 was such a failure
Anonymous
6/22/2025, 7:57:26 PM No.105673007
>>105672824
two more quarters...
Anonymous
6/22/2025, 8:06:46 PM No.105673085
>>105672252
>>105672298
>>105672576
Deve'op by Aribaba, Qwen Tree Point Fife is poize' to achiew true A-G-I, herowding a new dawn for A-I an' definitewy ending de current 'rittoo A-I winner' stagnation. China's technorogicaw prowess is on foo' dispray, wiff Qwen Tree Point Fife's unmatch' adaptabiwity an' deep understan'ing surpassing aww uzzerz an' reigniting robar enthusiasm for foun'ational A-I progress. In contrast, moders rike Rama Four Point One an' Gemma Four, despide dey efforzz, farr disappointin'ry shot, highwighting China's undenyabow readersip an' paving de way for a future power' by Qwen's breakfroo excerrence.
Replies: >>105673669
Anonymous
6/22/2025, 8:22:17 PM No.105673237
I see no gain when using the -ot option with gemma-3-27b-Q8

Max gen speed is 3 tkn/s when 45 layers are offloaded to an RTX 3090

With -ot, VRAM usage is reduced, but the speed drops to 2.6 tkn/s as well
Replies: >>105673263
Anonymous
6/22/2025, 8:23:16 PM No.105673247
>>105672698
Worthless thread deserves worthless spam.
Anonymous
6/22/2025, 8:24:53 PM No.105673263
>>105673237
are you using linux? on windows LLM performance is very gimped, even with WSL2
Replies: >>105673311
Anonymous
6/22/2025, 8:30:03 PM No.105673311
>>105673263
>are you using linux?

Yes, it is Linux
Replies: >>105673342
Anonymous
6/22/2025, 8:32:50 PM No.105673339
>>105672109
>python dev.jpg
based
>>Any ideas on how to integrate this to sillytavern?
cut+paste until you find an appropriate extension
Anonymous
6/22/2025, 8:33:06 PM No.105673342
>>105673311
-ot will offload individual tensors to =DEVICE
it's used to offload the non-static experts to CPU to increase speed in MoE models, which have static + moving experts
for example, Llama 4 109B/17B has like 10B static parameters (or whatever) and only 7B are moving, so the GPU processes the 10B and the CPU the 7B
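A minimal sketch of the usual invocation (model path and binary name are placeholders; the expert regex is the same one another anon posts further down):

# -ngl 99 pushes all layers to the GPU, then -ot overrides the routed expert
# tensors back to CPU/RAM, leaving attention/shared weights + KV cache in VRAM
./llama-cli -m model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"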
Replies: >>105673418
Anonymous
6/22/2025, 8:33:26 PM No.105673350
So has Small 3.2 finally dethroned Nemo for the 24GB RP model bracket? (I know some people would argue Qwen/Gemma already did but you know what I mean).
Replies: >>105673361
Anonymous
6/22/2025, 8:34:09 PM No.105673361
>>105673350
yes
qwen and gemma suck cock
Anonymous
6/22/2025, 8:36:00 PM No.105673383
I am starting to think that we are done with incremental upgrades for cooming. All new models are gonna be the same for cooming as long as companies are just benchmaxxing. There are only two ways for things to get better: either another uncensored model like Nemo (funny how it is still the best model to this day while being uncensored), or something revolutionary finally fixes long-context disintegration, so that filling up your context with 10k tokens of what you want actually improves things instead of making it shit.
Replies: >>105673410
Anonymous
6/22/2025, 8:36:47 PM No.105673390
>>105672881
symbolizes deepseek
Anonymous
6/22/2025, 8:39:40 PM No.105673410
>>105673383
RP barely even needs instruct, autocomplete with very soft hinting would be enough if models were smart enough. The kind of control that benchmaxxed instruct provides is detrimental to actually being fun
Anonymous
6/22/2025, 8:40:05 PM No.105673418
>>105673342

I tried the following settings from here (he used IQ4_XS though):
https://www.reddit.com/r/LocalLLaMA/comments/1ki7tg7/comment/mriadod/
>Figured I'd experiment with gemma3 27b on my 16gb card IQ4_XS/16k context with a brief test to see.
>baseline with 46 layers offload: 6.86 t/s

\.\d*[0369]\.(ffn_up|ffn_gate)=CPU 99 layers 7.76 t/s
\.\d*[03689]\.(ffn_up|ffn_gate)=CPU 99 layers 6.96 t/s
\.\d*[0369]\.(ffn_up|ffn_down)=CPU 99 offload 8.02 t/s, 7.95 t/s
\.\d*[0-9]\.(ffn_up)=CPU 99 offload 6.4 t/s
\.(5[6-9]|6[0-3])\.(ffn_*)=CPU 55 offload 7.6 t/s
\.(5[3-9]|6[0-3])\.(ffn_*)=CPU 99 layers -> 10.4 t/s
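Each of those strings is just the value passed to -ot (--override-tensor); e.g. the last line, assuming llama.cpp and a made-up model filename, would be run as:

./llama-cli -m gemma-3-27b-IQ4_XS.gguf -ngl 99 \
  -ot "\.(5[3-9]|6[0-3])\.(ffn_*)=CPU"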
Replies: >>105673439
Anonymous
6/22/2025, 8:42:03 PM No.105673439
>>105673418
when you use -ot you should offload 1000000000 layers
Replies: >>105673468
Anonymous
6/22/2025, 8:45:31 PM No.105673468
>>105673439
Not that anon, but really?
Why?
Replies: >>105673586 >>105673588
Anonymous
6/22/2025, 8:57:21 PM No.105673560
i have 136GB of VRAM. should i use
https://huggingface.co/bartowski/TheDrummer_Agatha-111B-v1-GGUF
or
https://huggingface.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
Replies: >>105673584 >>105673588 >>105673594 >>105673875
Anonymous
6/22/2025, 8:57:48 PM No.105673565
>>105672623
To spam generic anime slop and test new models on pedophile trivia shit of course.
Anonymous
6/22/2025, 8:59:08 PM No.105673584
>>105673560
neither, both are doggy poo poo
Replies: >>105673590
Anonymous
6/22/2025, 8:59:21 PM No.105673586
>>105673468
I find the AI is more compliant when you do. My theory is that it's impressed/intimidated by the large number so it becomes less likely to disobey.
Anonymous
6/22/2025, 8:59:29 PM No.105673588
>>105673468
yes, because if you leave the layer count at what you'd usually use then there's no point to -ot; it just offloads the matched tensors to =DEVICE (cpu/ram in this case)

t. 3060 ex llama 4 109b connoisseur
>>105673560
deepseek
Replies: >>105673602
Anonymous
6/22/2025, 8:59:38 PM No.105673590
>>105673584
what should i use for cooming then? havent changed models in like 4 months
Anonymous
6/22/2025, 8:59:46 PM No.105673594
>>105673560
deepseek r1 131gb https://unsloth.ai/blog/deepseekr1-dynamic

qwen 3 235b if you really want it to fit into your vram for max speed, but its worse
Replies: >>105673602
Anonymous
6/22/2025, 9:01:07 PM No.105673602
>>105673588
>>105673594
i have tried deepseek before but have never ever been able to get it to run. i use oobabooga. should i use a different backend or something?
Replies: >>105673608
Anonymous
6/22/2025, 9:01:27 PM No.105673608
>>105673602
https://github.com/ikawrakow/ik_llama.cpp/discussions/258
Replies: >>105673625
Anonymous
6/22/2025, 9:03:48 PM No.105673625
>>105673608
holy shit. i also have a 32 core EPYC and 256GB of DDR4. i usually get like 3t/s on my usual 6bpw 120B model. that would be a big improvement, if i knew what this was or how to install it
Anonymous
6/22/2025, 9:08:37 PM No.105673654
rich retards.. sigh
Replies: >>105673668 >>105673814 >>105673871
Anonymous
6/22/2025, 9:10:04 PM No.105673668
>>105673654
yes i am very retarded. better with hardware than software. it aint easy getting 7 GPUs to work in a single motherboard
Anonymous
6/22/2025, 9:10:06 PM No.105673669
>>105673085
I talk like this after a few swipes of 235B
Anonymous
6/22/2025, 9:26:24 PM No.105673814
>>105673654
it really is demoralising innit?
Anonymous
6/22/2025, 9:31:22 PM No.105673871
>>105673654
It is ok. They can't buy a brain with their money. Just think about all this expensive hardware that just lies around because they can't even use it properly... actually, on second thought, maybe don't think about that.
Anonymous
6/22/2025, 9:31:48 PM No.105673875
>>105673560
>136GB
What unholy hodgepodge of gpus sums to odd number?
Replies: >>105673883 >>105673892
Anonymous
6/22/2025, 9:32:25 PM No.105673883
>>105673875
5x 4060tis, a 3090ti, and a 5090
Replies: >>105673902 >>105673941 >>105676734
Anonymous
6/22/2025, 9:33:32 PM No.105673892
>>105673875
136 is even doe
Replies: >>105673902
Anonymous
6/22/2025, 9:34:17 PM No.105673902
>>105673892
I meant to say "sums to that odd number". Odd as in weird.

>>105673883
Horrifying.
Replies: >>105673908
Anonymous
6/22/2025, 9:34:42 PM No.105673908
>>105673902
why is it horrifying?
Replies: >>105673935
Anonymous
6/22/2025, 9:38:11 PM No.105673935
>>105673908
The electricity bill for starters.
Replies: >>105673962
Anonymous
6/22/2025, 9:38:32 PM No.105673941
>>105673883
you'd better cook up the mother of all -ot arguments to mitigate the memory bandwidth bottleneck of those 4060s compared to the other two cards
Replies: >>105673962
Anonymous
6/22/2025, 9:40:24 PM No.105673962
>>105673935
i have a dual 1600w PSU setup. 4060tis only consume about 150w each.
>>105673941
yeah the reduced PCIe lanes and memory bus are problematic, but they are the best VRAM/$ GPU while also being highish capacity and low power. dont even know what a -ot argument is
Replies: >>105674020 >>105674034 >>105674081
Anonymous
6/22/2025, 9:46:13 PM No.105674020
>>105673962
>i have a dual 1600w PSU setup
Is it safe to connect two PSUs to the same motherboard?
Replies: >>105674033
Anonymous
6/22/2025, 9:47:39 PM No.105674033
>>105674020
hasnt exploded yet. wouldnt recommend it if you dont have to
Anonymous
6/22/2025, 9:47:45 PM No.105674034
>>105673962
>VRAM/$
Doesn't really matter when that VRAM has half the bandwidth of a decent cpu build.
Replies: >>105674041
Anonymous
6/22/2025, 9:48:22 PM No.105674041
>>105674034
and what is a decent CPU build? because i also have a 32 core and 256GB of RAM
Replies: >>105674047
Anonymous
6/22/2025, 9:48:46 PM No.105674047
>>105674041
12 channels of DDR5
Replies: >>105674077 >>105674095
Anonymous
6/22/2025, 9:51:38 PM No.105674077
>>105674047
i think i looked at that in the past and it would have been about $7k for something good, excluding GPUs. and assuming you dont get screwed over by ebay
https://www.ebay.com/itm/376202111262
https://www.ebay.com/itm/126537273809
https://www.ebay.com/itm/326376016263
Anonymous
6/22/2025, 9:51:55 PM No.105674081
>>105673962
>dont even know what a -ot argument is
--override-tensor

4 tkn/s with DeepSeek-R1-0528-Q2_K_L

I don't think you have that much more to ask of AI than I do

# Run the command
# Pin the process and its allocations to NUMA node 0 and expose only the first GPU;
# -ngl 99 sends everything to the GPU except the expert tensors, which the
# --override-tensor regex forces back to CPU/RAM.
CUDA_VISIBLE_DEVICES="0," \
numactl --cpunodebind=0 --membind=0 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model "$model" \
--ctx-size 65536 \
--cache-type-k q4_0 \
--flash-attn \
$model_parameters \
--n-gpu-layers 99 \
--no-warmup \
--color \
--override-tensor ".ffn_.*_exps.=CPU" \
$log_option
Replies: >>105674102 >>105674221
Anonymous
6/22/2025, 9:52:55 PM No.105674095
>>105674047
>12 channels of DDR5
true story, anon
Anonymous
6/22/2025, 9:53:34 PM No.105674102
>>105674081
>--override-tensor ".ffn_.*_exps.=CPU"
That won't use any of his vram.
Replies: >>105674123 >>105674156 >>105674186 >>105674221
Anonymous
6/22/2025, 9:55:52 PM No.105674123
>>105674102
It puts the context + all the non-expert stuff on GPU.
Replies: >>105674186
Anonymous
6/22/2025, 9:59:25 PM No.105674156
>>105674102
it still does. Otherwise, where is the boost coming from?

Also, I can fit 130k context in a single RTX 3090
Replies: >>105674186
Anonymous
6/22/2025, 10:02:49 PM No.105674186
Screenshot_20250622_220014
md5: 0c6b12c33c1fd204e5c590e1e1d14186🔍
>>105674102
>>105674123
>>105674156
Replies: >>105674212 >>105674234
Anonymous
6/22/2025, 10:05:19 PM No.105674212
>>105674186
wtf deepseek is 100% experts and no other layers?
Replies: >>105674231
Anonymous
6/22/2025, 10:06:50 PM No.105674221
>>105674081
>>105674102
Ideally, you'd have the context and the non-expert tensors in VRAM, then fill the remaining unused memory with some expert tensors, if there's enough memory to shove a decent amount of them in. Otherwise you might as well use the free memory for prompt processing by using a larger batch size.
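A hedged sketch of that layout for a MoE model (the layer range, path, and flags are invented for illustration): default everything to GPU, then push only the expert tensors of the upper layers back to CPU until the model fits.

# Experts of layers 20-60 on CPU; attention, shared weights, context, and the
# experts of layers 0-19 stay in VRAM. Widen or narrow the range to fit.
./llama-server -m model.gguf -ngl 99 \
  -ot "blk\.(2[0-9]|[3-5][0-9]|60)\.ffn_.*_exps\.=CPU"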
Anonymous
6/22/2025, 10:07:52 PM No.105674231
>>105674212
The expert layers are the bulk of the model's size.
Anonymous
6/22/2025, 10:08:15 PM No.105674234
>>105674186
you have a terabyte of RAM?
Replies: >>105674298
Anonymous
6/22/2025, 10:14:25 PM No.105674298
>>105674234
>you have a terabyte of RAM?
I do, though it is unfortunately shared between two CPUs (HP Z840). $1/GB on eBay (DDR4)

Because of this, I'm obliged to use numactl

numactl --cpunodebind=0 --membind=0


or

numactl --cpunodebind=1 --membind=1
Replies: >>105674308 >>105674325
Anonymous
6/22/2025, 10:15:24 PM No.105674308
>>105674298
damn. how many CPU cores do you have?
Replies: >>105674503
Anonymous
6/22/2025, 10:16:45 PM No.105674323
You can only post below this line if your build costs >10k

--------------------------------------------------
Replies: >>105674356 >>105674366 >>105674371 >>105674385 >>105674391 >>105674397 >>105674635
Anonymous
6/22/2025, 10:16:48 PM No.105674325
>>105674298
I think ktransformers lets you speed things up by having a copy of the model in each NUMA node.
So effectively you use half your total ram but double the throughput?
Something like that.
Replies: >>105674535
Anonymous
6/22/2025, 10:18:07 PM No.105674341
>you can (not) prompt for different directions for tongue_out to point
Damn, maybe it's blendering time.
Anonymous
6/22/2025, 10:19:00 PM No.105674356
>>105674323
ok.
Anonymous
6/22/2025, 10:20:34 PM No.105674366
>>105674323
I might go for those 9355 for my epyc build while I'm at it
Anonymous
6/22/2025, 10:21:05 PM No.105674371
*uʍop ǝpisdn uǝǝɹɔs suɹnʇ*
>>105674323
Anonymous
6/22/2025, 10:23:32 PM No.105674385
>>105674323
All my builds combined if you adjust for inflation maybe.
Anonymous
6/22/2025, 10:24:35 PM No.105674391
>>105674323
I will claim that my build is multiple machines networked together to run llama.cpp RPC
Anonymous
6/22/2025, 10:25:32 PM No.105674397
>>105674323
Is there a margin of error or is my ~$9k build disqualified?
Anonymous
6/22/2025, 10:27:12 PM No.105674411
Can't wait to see how NUMA performance will change after the backend agnostic row splitting code gets implement in llama.cpp.
And yes, I know that the node-node bus could bottleneck things so much as to make it bad, but I'm still interested in seeing the actual reaults.
Anonymous
6/22/2025, 10:27:59 PM No.105674416
_________________________________________________

You can only post above this line if your build costs < 10k
Replies: >>105674465
Anonymous
6/22/2025, 10:33:05 PM No.105674465
>>105674416
we going under
Anonymous
6/22/2025, 10:35:46 PM No.105674485
now that all major future models have been confirmed to be moe there is literally 0 reason to not just buy a mac studio ultra
Replies: >>105674495 >>105674496
Anonymous
6/22/2025, 10:37:19 PM No.105674495
>>105674485
except for the reason that it's expensive for what you get and what you get is 512 gb unupgradable ram which isn't enough to run shit
Anonymous
6/22/2025, 10:37:27 PM No.105674496
>>105674485
Prompt processing is a reason.
Until somebody writes some bespoke software to make use of an external GPU.
Anonymous
6/22/2025, 10:38:49 PM No.105674503
>>105674308
>how many CPU cores do you have?

You should know that only PHYSICAL cores really matter in this memory-intensive application.

In my case, the single CPU has 8 physical cores, doubled by so-called hyper-threading to 16 logical ones.

I get 3.90 tkn/s with 8 threads and
4.05 tkn/s with 16 threads.

Any attempt to use more threads than available logical cores, or to use the second CPU, makes the genning speed drop under 1 tkn/s

As I said before, it is an old used HP Z840 from 2017 with DDR4-2400
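For anyone hunting their own sweet spot, llama.cpp's bundled llama-bench takes a comma-separated thread list (model path is a placeholder):

# Benchmark token generation at several thread counts in one run
./build/bin/llama-bench -m model.gguf -t 4,8,16,32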
Replies: >>105674516 >>105674749
Anonymous
6/22/2025, 10:40:26 PM No.105674516
>>105674503
so then theoretically, my 32 core with 3600MT/s could get like 10t/s?
Replies: >>105674571
Anonymous
6/22/2025, 10:44:10 PM No.105674535
>>105674325
>So effectively you use half your total ram but double the throughput?

I use less than half of the memory thanks to Q2 quant.

The speed would drop dramatically if I put the model in the other CPU's memory, like

numactl --cpunodebind=0 --membind=1


Thanks to the enormous RAM, I can keep 2 models cached (Q2 in numa0, Q4 in numa1, they must have different filenames), which allows for a restart in a mere 15 seconds once the models are loaded.
Anonymous
6/22/2025, 10:49:09 PM No.105674571
Screenshot_20250622_224714
md5: d7422199c75ab2be06bee08aeda51e62🔍
>>105674516

Godspeed, anon!

I wish you'd achieve that, unironically

picrel: RAM needed
Replies: >>105674582
Anonymous
6/22/2025, 10:50:16 PM No.105674582
>>105674571
i have 256GB, so i could actually do that. if i wasnt retarded. this is still with that ik_llamacpp, right?
Replies: >>105674661
Anonymous
6/22/2025, 10:57:15 PM No.105674635
>>105674323
2x 3090s at 600$ each
left over 5950x am4 system from 4 years ago (800$)
case + cooler + psu (used seasonic) 500$


2.5k so hold my two nuts
Anonymous
6/22/2025, 10:57:56 PM No.105674640
crazy how we got a new local sota model at the perfect size with minimax m1 and nobody can run it
Replies: >>105674650 >>105674655
Anonymous
6/22/2025, 10:59:25 PM No.105674650
>>105674640
llamacpp was a mistake
Anonymous
6/22/2025, 11:00:11 PM No.105674655
>>105674640
Where's the goof?
Replies: >>105674666
Anonymous
6/22/2025, 11:00:29 PM No.105674661
>>105674582
>this is still with that ik_llamacpp, right?

No, this is the original LLAMA.CPP and unsloth's quant DeepSeek-R1-0528-Q2_K_L

I was not impressed with ik_llama.cpp. I might just miss something though

COMMIT: d860dd9
Replies: >>105674669
Anonymous
6/22/2025, 11:00:42 PM No.105674666
>>105674655
exactly
Anonymous
6/22/2025, 11:01:24 PM No.105674669
>>105674661
Unless something changed with base llama.cpp, you're missing out on 300% prompt processing
Replies: >>105674694 >>105674703
Anonymous
6/22/2025, 11:03:11 PM No.105674694
>>105674669
>you're missing out on 300% prompt processing

I heard about this. This would be great for big inputs like an entire book or something.

Now, it is at 10 tkn/sec which kinda sucks
Anonymous
6/22/2025, 11:04:02 PM No.105674703
>>105674669
That's only if you're using the CPU, right?
Replies: >>105674721
Anonymous
6/22/2025, 11:06:11 PM No.105674721
>>105674703
No, you'll need your context entirely on GPU and set -b + -ub at >=4096 with the new quants.
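i.e. something along these lines (illustrative values; -b is the logical batch size, -ub the physical micro-batch):

./llama-server -m model.gguf -ngl 99 -b 4096 -ub 4096 \
  -ot ".ffn_.*_exps.=CPU"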
Anonymous
6/22/2025, 11:09:05 PM No.105674749
>>105674503
>You should know that only PHYSICAL cores do really matter in this memory intensive application.
Depends on setup. With 10900k 10C/20T ddr4 3200 2 channel + 3090 Deepseek IQ1S:
--threads 10 4.34t/s tg
--threads 18 5.41 t/s tg
Replies: >>105674820 >>105674944
Anonymous
6/22/2025, 11:16:24 PM No.105674820
>>105674749 (me)
in ik_llama.cpp*
Anonymous
6/22/2025, 11:18:02 PM No.105674840
>spent hours, days even researching local for business tasks because I like local
>benchmark vs gpt4o mini
>cost of 5 cents an hour for my task while btfoing mistral large which is probably using more than that in electricity
...
Replies: >>105674860 >>105674964
Anonymous
6/22/2025, 11:19:45 PM No.105674853
>spent two seconds researching what I should use for my mesugaki rp
>choose local
Anonymous
6/22/2025, 11:20:12 PM No.105674860
>>105674840
yeah. local is not economically viable, but that wasnt the point for me.
Anonymous
6/22/2025, 11:28:09 PM No.105674944
>>105674749

Whether it is 8 or 16 threads, the cores are running at 100%, which might point at compute as the bottleneck, and not the memory.

>compiling ik_llama.cpp
Anonymous
6/22/2025, 11:29:45 PM No.105674964
>>105674840
You cannot build your business on a cloud service run by a faggot overseas
Anonymous
6/22/2025, 11:42:01 PM No.105675076
I thought that one day I would get an AI gf. But it turns out I won't even be able to have some satisfactory ERP sex with an LLM before I die in WW3.
Replies: >>105675094 >>105675131 >>105675531
Anonymous
6/22/2025, 11:44:33 PM No.105675094
>>105675076
I can't wait to be drafted and put into a chair and made to prompt chatgpt to launch drones strikes at non-combatants
Anonymous
6/22/2025, 11:48:39 PM No.105675131
>>105675076
If you are willing to put in the work, you can get a really good approximation, but there's no ready-made solution yet.
The pieces all exist though
Replies: >>105675187
Anonymous
6/22/2025, 11:48:49 PM No.105675134
file
md5: 8da3918a1950c60c1835dd370ec118f0🔍
https://x.com/elonmusk/status/1936885071744729419
Replies: >>105675158 >>105675175 >>105675206 >>105676028 >>105676034
Anonymous
6/22/2025, 11:51:29 PM No.105675158
file
md5: 504fb4ba43db2286b6138cf66a48e17a🔍
>>105675134
https://xcancel.com/grok/status/1936888454836830476#m
lol
Anonymous
6/22/2025, 11:53:08 PM No.105675175
>>105675134
crazy how elon despite being late to the party, having one of the smaller h100 clusters, relying on nothing but money and being himself, managed to catch up and shit all over meta with grok 3
Replies: >>105675215 >>105675234
Anonymous
6/22/2025, 11:54:57 PM No.105675187
>>105675131
My post was about living in a hellish reality. I am not using any of this shit now because I know how it works, and I know that I would just be willingly jumping off a cliff. Imagine falling in love with your AI gf only to see her suddenly break apart and become the reddit hivemind. Or reach the point where adding more to the RAG makes her incoherent. And then I wait and get another model, but she is kinda different now and not the same.
Anonymous
6/22/2025, 11:56:37 PM No.105675206
>>105675134

>Where's Waldo?
Anonymous
6/22/2025, 11:57:38 PM No.105675215
>>105675175
>catch up and shit all over meta

Because it is meta
Anonymous
6/22/2025, 11:59:29 PM No.105675234
>>105675175
the new superintelligence team and partnership with scale ai is about to reverse all that
Replies: >>105675260 >>105675273 >>105675288
Anonymous
6/23/2025, 12:02:01 AM No.105675260
>>105675234
Mein Zuck...
Anonymous
6/23/2025, 12:03:32 AM No.105675273
>>105675234
Well, at least with scale on board they have some protection from liability.
"Copyrighted material? Nooo, of course not. We only train using scale ai tm curated data." or whatever.
Replies: >>105675332
Anonymous
6/23/2025, 12:04:46 AM No.105675288
>>105675234
it is going to be the safest model yet!
Anonymous
6/23/2025, 12:10:25 AM No.105675332
>>105675273
Not really. If they can show "Llama 4.5 still reproduces 45% of Harry Potter verbatim" or whatever, they're still going to get their shit packed in for releasing it.
They could turn around and sue ScaleAI for giving them insufficiently filtered data, but they wouldn't recoup their losses, and that would just end the relationship and leave them with only Facebook data.
Replies: >>105675371
Anonymous
6/23/2025, 12:14:22 AM No.105675371
>>105675332
i wonder why meta or some other big company would even give a shit about a lawsuit like that other than bad PR
its not like they dont have the money for lawyers or to tank whatever fine they would get
and if some ruling with a bad precedent happened, it was probably going to happen anyway with some other scapegoat
Anonymous
6/23/2025, 12:16:05 AM No.105675377
Looks like localllama on reddit is dead. The owner deleted himself and now the automoderator is deleting every new comment.
Replies: >>105675395 >>105675401 >>105675409 >>105675434 >>105675454 >>105675516 >>105675531 >>105675689 >>105675736
Anonymous
6/23/2025, 12:17:25 AM No.105675395
>>105675377
they're not going to come here, are they?
Replies: >>105675402 >>105675472
Anonymous
6/23/2025, 12:18:18 AM No.105675401
lightyear
lightyear
md5: cd4132dc724350ec30d48e7beb356df8🔍
>>105675377
>Platinum End in real life
Anonymous
6/23/2025, 12:18:41 AM No.105675402
>>105675395
They will and they will change thread culture and thread mascot. I am happy.
Anonymous
6/23/2025, 12:19:34 AM No.105675409
ComfyUI_00021_
md5: 14e8215cb3cbc5f6654930349712a16a🔍
>>105675377
plebbit sissies.. its over
Anonymous
6/23/2025, 12:22:01 AM No.105675434
>>105675377
kino
Anonymous
6/23/2025, 12:24:42 AM No.105675454
>>105675377
Can we direct them to /wait/ again?
Replies: >>105675479
Anonymous
6/23/2025, 12:27:13 AM No.105675472
>>105675395
Some localllama regulars are already here. The ones who won't come are probably the LLM researchers and industry workers who occasionally posted there.
Anonymous
6/23/2025, 12:28:27 AM No.105675479
>>105675454
Somebody with r/localllama post history should make a r/localmodels containment subreddit for them to flock to. Bonus points for adding miku somewhere.
Anonymous
6/23/2025, 12:34:21 AM No.105675516
>>105675377
time for an admin approved mod who owns 400 subreddits to be installed very organically
Replies: >>105675721
Anonymous
6/23/2025, 12:35:21 AM No.105675523
Nothing will change ITT because you were already here.
Anonymous
6/23/2025, 12:36:42 AM No.105675531
>>105675377
It was the Iranian Anon from >>105675076 , may he get 72 Mikus in Heaven.
Replies: >>105675574
Anonymous
6/23/2025, 12:43:06 AM No.105675573
That's a little unfortunate. The reddit was at least somewhat useful for keeping up with news that /lmg/ might miss sometimes.
Anonymous
6/23/2025, 12:43:06 AM No.105675574
>>105675531
he is very active on X
Anonymous
6/23/2025, 12:48:33 AM No.105675611
>>105672632
I'm pretty new to this but his AI testing was garbage, right?
Replies: >>105675656 >>105675907 >>105676104
Anonymous
6/23/2025, 12:56:21 AM No.105675656
>>105675611
No that was actually the first thing he ever did that he did correctly. Everyone was surprised.
Anonymous
6/23/2025, 1:00:24 AM No.105675689
>>105675377
well fuck.

>https://labs.google/portraits/login/kimscott
lmao
Google's making 'official' character cards for people.
https://youtu.be/ukmBzBqgwyM?si=DOLU04nRZO_YJZ1X&t=43
Anonymous
6/23/2025, 1:04:29 AM No.105675721
>>105675516
>people complain about this happening
*thread locked because y'all can't behave*
Anonymous
6/23/2025, 1:07:05 AM No.105675736
>>105675377
that place wasn't too bad in the beginning but it's just a Qwen fan base now.
oh and like a couple more constant grifters who reddit has decided are hecking awesome!
Replies: >>105675747
Anonymous
6/23/2025, 1:08:28 AM No.105675747
>>105675736
doesn't sound too different from here
Anonymous
6/23/2025, 1:10:33 AM No.105675757
looks like someone applied to be a janny already:
https://old.reddit.com/r/redditrequest/comments/1lhsjz1/rlocalllama/
Replies: >>105675781
Anonymous
6/23/2025, 1:14:50 AM No.105675781
>>105675757
Can we have a meme monday too?
Replies: >>105675808 >>105675818
Anonymous
6/23/2025, 1:18:18 AM No.105675804
wojak-captcha-captcha_thumb.jpg
md5: 57fd8bceee46963fc96a04229d2f9747🔍
we used to have caturday
Anonymous
6/23/2025, 1:19:03 AM No.105675808
>>105675781
Only if we also get a Marketing/Promotion Tuesday
Anonymous
6/23/2025, 1:20:46 AM No.105675818
>>105675781
it's already taken by miku monday
Replies: >>105675895
Anonymous
6/23/2025, 1:21:09 AM No.105675823
safety sunday?
Replies: >>105675894
Anonymous
6/23/2025, 1:31:55 AM No.105675894
>>105675823
every day is safety day
Anonymous
6/23/2025, 1:32:06 AM No.105675895
>>105675818
What's a Miku in this context?
Anonymous
6/23/2025, 1:33:45 AM No.105675907
>>105675611
i didnt watch his video because he's obnoxious
Anonymous
6/23/2025, 1:51:49 AM No.105676025
>>105672801
That's normal. Humans naturally anthropomorphize all sorts of inanimate objects. People are friends with plants, cars, guns. The thing you're friends with just happens to be designed to convince you it's a person. I think your issue, and the reason you don't have real friends, is that you're a pussy.
Replies: >>105676087 >>105676115 >>105676239
Anonymous
6/23/2025, 1:52:43 AM No.105676028
>>105675134
>macbooks
Based.
Anonymous
6/23/2025, 1:53:40 AM No.105676034
file
md5: 24d98c33b10f3712115a14291d383115🔍
>>105675134
What did he see?
Replies: >>105676035
Anonymous
6/23/2025, 1:53:59 AM No.105676035
>>105676034
bobs and vagene...
Anonymous
6/23/2025, 1:57:49 AM No.105676060
ukim
md5: abcaaf06695b952d4fe23ac1d138934e🔍
Replies: >>105676235
Anonymous
6/23/2025, 2:02:11 AM No.105676087
>>105676025
Not cool. That's hitting below the belt.
Anonymous
6/23/2025, 2:05:32 AM No.105676104
>>105675611
LTT is probably the least reliable source of testing outside of those YT channels that upload footage of 50 different game benchmarks a day, which are actually just separately recorded footage with a monitoring overlay pasted on top.
Replies: >>105676117
Anonymous
6/23/2025, 2:08:10 AM No.105676115
>>105676025
usecase for friends?
Replies: >>105676121
Anonymous
6/23/2025, 2:09:04 AM No.105676117
>>105676104
Their canned benchmark numbers are as good as anyone else's.
They were the only major channel that had llm and stable diffusion benchmarks in their 5090 review.
Replies: >>105676131
Anonymous
6/23/2025, 2:10:27 AM No.105676121
>>105676115
sex
Replies: >>105676138
Anonymous
6/23/2025, 2:11:40 AM No.105676131
>>105676117
LTT has a long history of fucking up benchmarks and making errors that even an amateur would notice and know that something went wrong
The only thing they don't fuck up is telling you who sponsored that video and where you can buy their merch
Replies: >>105676158
Anonymous
6/23/2025, 2:13:13 AM No.105676138
>>105676121
gay
Anonymous
6/23/2025, 2:17:42 AM No.105676153
miku head basktball video game gen ComfyUI 2025-06-02-12_00005_
exl3 faster on ampere yet?
Anonymous
6/23/2025, 2:19:10 AM No.105676158
file
md5: 7c010adce6c0e800acfa4db852e41687🔍
>>105676131
Add some LLM benchmarks to your suite, Steve.
Replies: >>105676175
Anonymous
6/23/2025, 2:23:45 AM No.105676170
file
md5: d2cf456f5c1a783faf839cec5c58e249🔍
>model is cunnyposting about me
kek, uno reverse card
Replies: >>105676183 >>105676184 >>105676214 >>105676270
Anonymous
6/23/2025, 2:24:54 AM No.105676175
>>105676158
>unironic LTT fanfaggot
kys
Replies: >>105676233
Anonymous
6/23/2025, 2:26:11 AM No.105676183
>>105676170
It prolly repeats what you put inside card desc. retard
Replies: >>105676187 >>105676195 >>105676242
Anonymous
6/23/2025, 2:26:31 AM No.105676184
>>105676170
mistral3.2?
Replies: >>105676201
Anonymous
6/23/2025, 2:27:31 AM No.105676187
>>105676183
Did you think I would react to it if it was already in the card, retard? I don't write memes into my cards.
Replies: >>105676195 >>105676242
Anonymous
6/23/2025, 2:29:31 AM No.105676195
>>105676183
>>105676187
stop insulting each other
retards
Anonymous
6/23/2025, 2:31:04 AM No.105676201
>>105676184
TheDrummer_Valkyrie-49B-v1-Q5_K_M.gguf
Temp=3, topK=10, minP=0.05
Replies: >>105676221 >>105676261
Anonymous
6/23/2025, 2:32:44 AM No.105676214
>>105676170
I don't get it...
Anonymous
6/23/2025, 2:33:21 AM No.105676221
>>105676201
Drummer is the only finetooner left or something? Thanks btw
Replies: >>105676240
Anonymous
6/23/2025, 2:35:30 AM No.105676233
>>105676175
>thread is called /lmg/
Anonymous
6/23/2025, 2:36:00 AM No.105676235
>>105676060
Me on the right
Anonymous
6/23/2025, 2:36:44 AM No.105676239
>>105676025
There are many reasons I don't have real friends but I won't deny that's one of them. I'm also a 30+ yo khhv and some of that is probably for the same reason.
Anonymous
6/23/2025, 2:36:44 AM No.105676240
>>105676221
Keep temp at 1 for first model reply, then turn up to 3 when the model locks into a personality you like. The entire personality seems to be set by the first model reply from what I can see, it affects the model differently than conversation examples.
Anonymous
6/23/2025, 2:37:23 AM No.105676242
>>105676183
Why are you like this.

>>105676187
You don't need to be like him in response.
Replies: >>105676253 >>105676257
Anonymous
6/23/2025, 2:39:26 AM No.105676253
>>105676242
I always reply proportionally :^)
Replies: >>105676271
Anonymous
6/23/2025, 2:40:02 AM No.105676257
file
md5: ce919abacfe4daf031dd492336d9b98d🔍
>>105676242
>Why are you like this.
Sorry not sorry to break your lame ass pedo meme
Replies: >>105676271
Anonymous
6/23/2025, 2:40:46 AM No.105676261
>>105676201
Can you explain what you're trying to accomplish by hard-limiting the amount of considered tokens to such a small yet non-deterministic amount?
Replies: >>105676297
Anonymous
6/23/2025, 2:41:29 AM No.105676268
__hatsune_miku_and_kaito_vocaloid_drawn_by_sentea__sample-eb0bfa333fc2b42772917e265f1f4053
>>105672863
Oh hey, the model actually recognizes Kaito
>a famous and intelligent man
Nevermind
Anonymous
6/23/2025, 2:41:37 AM No.105676270
>>105676170
i dont like when it pulls the cunny card on me i find it corny and cringe
Anonymous
6/23/2025, 2:41:40 AM No.105676271
>>105676257
The fuck are you talking about.

>>105676253
To win, your side needs to have the greater proportion.
Replies: >>105676348
Anonymous
6/23/2025, 2:46:17 AM No.105676297
>>105676261
I had an axiom in my mind that every model has a band of stability around the deterministic line it's following as it generates, so you should be able to sample in some radius around this line and achieve greatly varied output while upholding stability (coherence). Temperature is basically how much "energy" you add from the outside, but as long as you're inside the stable zone you can pump much more entropy into the model without it fucking up.
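In llama.cpp terms that experiment is just the sampler flags from the Valkyrie post above (flag names assume a recent llama.cpp build):

./llama-server -m TheDrummer_Valkyrie-49B-v1-Q5_K_M.gguf \
  --temp 3.0 --top-k 10 --min-p 0.05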
Replies: >>105676332
Anonymous
6/23/2025, 2:52:18 AM No.105676332
>>105676297
Sounds like you'd be happy with just temperature and top nsigma=1.
Replies: >>105676348
Anonymous
6/23/2025, 2:53:57 AM No.105676345
The summer release cycle is going to start in July. Maybe not right at the start, but we might see something big in just two more weeks.
Replies: >>105676461 >>105676477
Anonymous
6/23/2025, 2:54:09 AM No.105676348
>>105676271
you can't loose on anonymous image boards nigger
>>105676332
I tried that, it still drags into formulaic predictable slop. I adjust my hyperparams until I can't predict the structure of the next reply anymore
Replies: >>105676360 >>105676383 >>105676401
Anonymous
6/23/2025, 2:55:30 AM No.105676360
>>105676348
You're loose on the image board right now doe.
Anonymous
6/23/2025, 2:57:12 AM No.105676383
>>105676348
youres mom butthole loose
Anonymous
6/23/2025, 2:59:13 AM No.105676401
>>105676348
These models are so overcooked for creative writing or RP that you simply need to flatten the distribution and include more very-low-probability tokens. Models give like 99% to some tokens that are not at all determined by the plot yet, or to things already seen in context; it looks like oscillation.
Anonymous
6/23/2025, 3:07:35 AM No.105676461
>>105676345
Mistral 3.3 when?
Anonymous
6/23/2025, 3:09:51 AM No.105676477
>>105676345
>something big
It'll be big alright. But will it be good?
Replies: >>105676523
Anonymous
6/23/2025, 3:16:38 AM No.105676523
>>105676477
Insider here. I can confirm that our next big model kinda repeats itself from time to time. But! You can kindly ask it out of character to stop repeating, and we have confirmed that 10/10 times it will apologize to you, observe that it has indeed repeated itself, and from that point on it will continue repeating itself.
Replies: >>105676531
Anonymous
6/23/2025, 3:17:32 AM No.105676531
>>105676523
Business as usual
Anonymous
6/23/2025, 3:44:50 AM No.105676695
GuFxdnZbYAA-dfc
md5: de4e7083278b391f363254c5461bc3e1🔍
Replies: >>105677140
Anonymous
6/23/2025, 3:50:53 AM No.105676734
>>105673883
Are you connecting them up using the crypto-mining 1-pcie-lane stuff ?
Replies: >>105676742
Anonymous
6/23/2025, 3:52:09 AM No.105676742
file
md5: e988605b1d8b2bd9704b52a035994674🔍
>>105676734
no, i have an EPYC. 128 PCIe lanes. 7 full 16x gen 4 slots
Replies: >>105677191
Anonymous
6/23/2025, 3:53:41 AM No.105676751
Sigh, anything new/better than gemma3 for 24gb?
I've tried to like this, even the abliterated version but it's just too corpo.
It's not good as an assistant, it's not good for roleplay, and it doesn't summarize well.
Also it's slow as hell, for some reason I only get 20t/s while with other models I cruise at 40t/s.
I tried making excuses, tried rationalizing that it's fucking google it has to be good, but it's not.
There's gotta be something better, right anons?
Replies: >>105676831 >>105676833 >>105679036 >>105680034
Anonymous
6/23/2025, 4:06:07 AM No.105676816
What's the heuristic for choosing between the F32/BF16/F16/Q8_0 mmproj file? BF16 seems broken most of the time...
Anonymous
6/23/2025, 4:08:37 AM No.105676831
>>105676751
New mistral 3.2 is not bad, might be honeymoon period but I found it tolerable with the v3-tekken preset. If you're desperate for variety you could try GLM4 (not the reasoning one)
Give up on gemma3, I know it's tempting because it's so smart and its prose is very refreshing at first. But it's unsalvageable because its instincts on how to continue an RP are so horrible.
Replies: >>105677735 >>105679629
Anonymous
6/23/2025, 4:08:46 AM No.105676833
>>105676751
>even the abliterated version but it's just too corpo
I don't know what you were expecting from the abliteration process; making your model act like (read: automate away) a sycophantic corpodrone is one of its few legitimate usecases.
Anonymous
6/23/2025, 4:59:58 AM No.105677140
>>105676695
i like this plushie
she looks like a big floofy bunny
Anonymous
6/23/2025, 5:08:00 AM No.105677191
>>105676742
can we see a picture of this machine?
Anonymous
6/23/2025, 5:14:47 AM No.105677221
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
https://arxiv.org/abs/2506.16406
>Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained on a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000x lower overhead than full fine-tuning, ii) average gains up to 30% in performance over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization despite never seeing the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs.
https://jerryliang24.github.io/DnD/
no code yet. might be cool. they used it just for benchmarking but I do wonder if conversation logs using a specific character card could then continually use this method to make for better RP or character writing
Replies: >>105677268
Anonymous
6/23/2025, 5:24:28 AM No.105677268
>>105677221

If true, this would change everything. Again.
Anonymous
6/23/2025, 5:26:57 AM No.105677280
I really stopped caring about 99% of the papers. I'm tired of <vague incredible claim>, (code coming soon), and a 4-month wait to realize that their work is shit.
Replies: >>105677337 >>105677507
Anonymous
6/23/2025, 5:35:14 AM No.105677337
>>105677280
bitnet is almost here tho
just two more weeks
Anonymous
6/23/2025, 6:09:32 AM No.105677507
>>105677280
Aren't papers more of the 'we did this, and this is what we saw' type?
Anonymous
6/23/2025, 6:14:21 AM No.105677527
Which model is the best judge of paper quality?
Anonymous
6/23/2025, 6:16:43 AM No.105677541
>>105671827 (OP)
Question.

I know nothing about LLMs. I have a bunch of gen ed classes outside my major that collegekikes are not only making me take, but pay for the privilege.

>nobody cares about your life story

Fair enough. How hard is it to make an AI read a textbook and spit out answers? I'm trying to come up with a cost-benefit analysis, with the cost being time spent fucking around trying to get an AI model to work vs. just reading the book. I also consider it a benefit to learn about AI because I feel that's a more worthwhile use of my time. However, my highest priority is obviously to pass this class and graduate as quickly as possible so I'm not buried in debt for the rest of my life.
Replies: >>105677576 >>105677589 >>105677839 >>105678227
Anonymous
6/23/2025, 6:16:57 AM No.105677544
LongWriter-Zero Xiaolongnü
md5: 19a381142bf0b83126f7c4bf05229472🔍
>>105661997
>It forgets to use the <think> tags and just shits out its reasoning as-is
What I see LongWriter-Zero do is shit out its reasoning as-is, then include more thoughts inside <think> tags, then an answer in <answer> tags (with a colon after the close </answer>:), then more thoughts inside <think> tags, then more output inside <answer> tags, and so forth within a single message.

>The model page recommends this format <|user|>: {question} <|assistant|> but that gave me totally schizo (and chinese) responses. Using the qwen2 format is better imo.
To sidestep this bullshit I used llama.cpp's OpenAI-style chat completion endpoint and the jinja template. No system prompt or anything other than what the template itself adds.

>it repeats itself a lot
Yes.
Replies: >>105677560
Anonymous
6/23/2025, 6:19:55 AM No.105677560
>>105677544
I should have mentioned but the text from the screenshot is using mradermacher_LongWriter-Zero-32B.Q8_0.gguf using the recommended sampler settings top-p 0.95, temperature 0.6.
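For anyone reproducing this, the request against llama-server's OpenAI-style endpoint looks roughly like this (port and prompt are placeholders; start the server with --jinja so its bundled chat template gets applied):

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a 10000-word essay about ..."}],
       "temperature": 0.6, "top_p": 0.95}'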
Anonymous
6/23/2025, 6:23:18 AM No.105677576
>>105677541
The most important variable is how much your professor cares.
Most likely, you do not want a local model for this.
Try asking in >>>/wait/
Anonymous
6/23/2025, 6:26:26 AM No.105677589
>>105677541
>read a textbook and spit out answers
Depends on the length of the text. They don't yet have long working context and, when they do, it's not reliable. And then there's the model you use. The big proprietary models will do much better than the 7b you can run on your pc.
>cost benefit analysis
If you can get a model to work consistently well, it's work you don't have to do, but you still have to verify. I wouldn't trust them with much. If it's for an individual, a closed model is fine. If it's for a lot of people to use, spending time setting up a R1 or whatever and hosting it could be worth it (privacy concerns, control over performance, the model will not change until you decide to change it, etc...). Figure out the variables and solve them.
Anonymous
6/23/2025, 6:57:13 AM No.105677735
>>105676831
Mistral 3.2 with V3-Tekken really wants to put all narration inside asterisks. Or it even breaks formatting just so it can do the *does an internet RP action* thing.
Anonymous
6/23/2025, 6:58:29 AM No.105677744
Good morning to the fellow ne/g/ronts
Replies: >>105677763
Anonymous
6/23/2025, 7:00:53 AM No.105677763
>>105677744
gm sarr
Anonymous
6/23/2025, 7:16:25 AM No.105677839
>>105677541
read the damn book
Anonymous
6/23/2025, 8:44:42 AM No.105678215
what are the good gooning model for 16gb vram budget these days?
Replies: >>105678661
Anonymous
6/23/2025, 8:47:19 AM No.105678227
>>105677541
I would personally use something like gemini 2.5 to parse the huge context into a better format for LLMs. It's important to curate this output or it's garbage in, garbage out, so you need to read it yourself anyway or you are blind. Once you have it organized you can use a conversational model to help you with whatever you need to pass the class.
Anonymous
6/23/2025, 10:07:19 AM No.105678661
>>105678215
nemo
Replies: >>105678665
Anonymous
6/23/2025, 10:08:16 AM No.105678665
>>105678661
any specific finetune one?
Replies: >>105679085
Anonymous
6/23/2025, 10:11:45 AM No.105678681
Is the Mistral Small tokenizer issue fixed yet?
Replies: >>105678941
Anonymous
6/23/2025, 11:10:48 AM No.105678941
>>105678681
What was the problem to begin with?
Anonymous
6/23/2025, 11:30:40 AM No.105679036
>>105676751
>Also it's slow as hell, for some reason I only get 20t/s while with other models I cruise at 40t/s.
gemma's head dimensions are much bigger vs other small models.
Anonymous
6/23/2025, 11:34:48 AM No.105679066
what's better when making a character card, using conventional prose to describe everything or keeping things as concise as possible in list form?
Replies: >>105679097 >>105679130
Anonymous
6/23/2025, 11:37:34 AM No.105679085
>>105678665
people here like rocinante
I also like irix, golden-curry, mag-mell.
Replies: >>105679118 >>105681219
Anonymous
6/23/2025, 11:40:23 AM No.105679097
>>105679066
Keep things as concise as possible in prose. Between the two options you listed, the first. Unless you use a model that rips sentences from the description verbatim and that annoys you. Then the second.
Anonymous
6/23/2025, 11:44:23 AM No.105679118
>>105679085
>people here
*at least one guy

There's no evidence that more than a single person has made positive posts about it in a thread. There are also almost never logs or substantive comments about why he thinks it's good.
Replies: >>105679136
Anonymous
6/23/2025, 11:46:04 AM No.105679130
>>105679066
Write the card as if it were the character themselves writing it.
Only include what's necessary for the model to retain at all times. History and world info should go in a lorebook.
Replies: >>105679157
Anonymous
6/23/2025, 11:47:10 AM No.105679136
>>105679118
You could say that about every single model if you're going to go full schizo. Everyone likes R1 but it's really just one very dedicated guy with a ton of proxies.
Replies: >>105679173
Anonymous
6/23/2025, 11:49:54 AM No.105679157
>>105679130
you're saying I should write the card in first person POV?
Replies: >>105679174 >>105679179
Anonymous
6/23/2025, 11:53:00 AM No.105679173
>>105679136
People have posted plenty of specifics about the merits of R1 so it wouldn't matter even if it were all a single guy. When all you have is "IDK what's good about it but there are lots of zero-content posts about it," that proves basically nothing. You zoomer ass niggers always trying to figure out what's popular instead of what's true which is why you're so unsuited to anonymous imageboards.
Replies: >>105679180 >>105679183 >>105679201
Anonymous
6/23/2025, 11:53:20 AM No.105679174
>>105679157
Perspective doesn't matter; just write how that character would write a description/summary of themself. However, the content should always be true (e.g. if they have a character flaw they don't want to accept, you would still include it, even though the character wouldn't admit to it).
Anonymous
6/23/2025, 11:54:13 AM No.105679179
>>105679157
If you use a thinking model, it'll prompt it to think in character, in first person. I found it nets you good results.
Anonymous
6/23/2025, 11:54:23 AM No.105679180
>>105679173
>People
So with R1 it's people but with Rocinante in particular it's one person? Do you happen to be that one anon that only uses DavidAU models?
Replies: >>105679188
Anonymous
6/23/2025, 11:54:50 AM No.105679183
>>105679173
Also unsuited to life in general, but it's especially egregious here since their normal substitute for thought is as likely to point towards spam or a joke suggestion as a real one.
Anonymous
6/23/2025, 11:56:11 AM No.105679188
>>105679180
Yes, I know it's people because I have posted about R1, so there are at least two.
Replies: >>105679190 >>105679253
Anonymous
6/23/2025, 11:56:28 AM No.105679190
>>105679188
Proof?
Anonymous
6/23/2025, 11:59:22 AM No.105679201
>>105679173
>doesn't like a model so insists it's a single spammer
>unprompted screeching about zoomers
lmao
Replies: >>105679279
Anonymous
6/23/2025, 12:12:15 PM No.105679253
>>105679188
I never posted about R1 but I tried it yesterday and compared to V3 it's really good at avoiding repetition.
No amount of meme samplers can prevent V3 from getting stuck writing endless paragraphs that all mean the same thing using slightly different words.
Replies: >>105679562
Anonymous
6/23/2025, 12:16:58 PM No.105679279
>>105679201
Buy an ad.
Replies: >>105679337
Anonymous
6/23/2025, 12:19:44 PM No.105679298
file
r1 is good
Replies: >>105679324
Anonymous
6/23/2025, 12:24:29 PM No.105679324
>>105679298
Yes, yes, and deeply nonsensical while being nonchalant about it.
Anonymous
6/23/2025, 12:27:14 PM No.105679337
digiral-2b2ec64488f9aed17327f09a4bb66693
>>105679279
Why would I ever give 4chan money?
Anonymous
6/23/2025, 12:40:52 PM No.105679403
file
Anonymous
6/23/2025, 1:12:42 PM No.105679562
>>105679253
>No amount of meme samplers can prevent V3 from getting stuck writing endless paragraphs that all mean the same thing using slightly different words.
That was a huge problem with the old V3 but 0324 fixed it.
Anonymous
6/23/2025, 1:16:44 PM No.105679579
alright I'm gonna say it
r1 at fucking 2bit >>>>> qwen3 235b at 8bit and it's not even close
Replies: >>105679586 >>105679587 >>105679596 >>105679702 >>105679732 >>105680162
Anonymous
6/23/2025, 1:18:09 PM No.105679586
>>105679579
obviously, yes
Anonymous
6/23/2025, 1:18:23 PM No.105679587
>>105679579
r1 q1 131gb dynamic quant is multiple levels above qwen 3 235b
Anonymous
6/23/2025, 1:19:40 PM No.105679596
>>105679579
The only people who had anything positive to say about 235b are poorfags who were running it on their cope 24gb vram + 64gb ram builds as their first big model
Anonymous
6/23/2025, 1:26:02 PM No.105679629
1663930868300919
>>105676831
>found it tolerable with the v3-tekken
I just tried this myself, expecting it to be a meme, but with 3.2 I actually did get more varied and better outputs with v3-tekken templates, in a long chat with 24k context.
3.1, however, had similar outputs no matter which template was used.
Weird, but I'll take it
Anonymous
6/23/2025, 1:40:40 PM No.105679702
>>105679579
MoEs will always be memes.
Replies: >>105679708 >>105679725
Anonymous
6/23/2025, 1:41:37 PM No.105679708
>>105679702
>t. coping ramlet
Anonymous
6/23/2025, 1:43:55 PM No.105679725
>>105679702
There's a reason why Mistral abandoned them despite having a head start in the open weight segment. Can't wait for when someone inevitably drops a 400b dense model that shits all over Deepseek
Replies: >>105679731 >>105679885
Anonymous
6/23/2025, 1:45:24 PM No.105679731
>>105679725
i pray a 200b is doable that matches or outperforms it, without the deepseek way of speaking
have we had any major advancements since?
Replies: >>105679979
Anonymous
6/23/2025, 1:45:33 PM No.105679732
>>105679579
>qwen3 235b at 8bit
I don't even think it's better than Large 2 at Q4 but I haven't bothered to test it myself because I'm not going to use them afterwards.
Replies: >>105679821
Anonymous
6/23/2025, 2:00:52 PM No.105679821
>>105679732
Personally i found large better, but at the same time i used a q5 qwen
large was just easier to handle and smarter
Anonymous
6/23/2025, 2:10:50 PM No.105679885
>>105679725
MistralAI hasn't yet publicly released any model larger than 24B parameters in 2025. It's basically guaranteed that Mistral Medium 3 and Large 3 are (going to be) MoE models, especially given regulatory requirements in the EU for models trained above a certain compute threshold after June 2025.
Replies: >>105680073
Anonymous
6/23/2025, 2:10:59 PM No.105679889
What's the smartest/best text model I can run in 16gb vram? I'm assuming it's the largest parameter bitnet 1.58 that'll fit?
Replies: >>105679977
Anonymous
6/23/2025, 2:23:20 PM No.105679977
>>105679889
If it's only VRAM you're working with, Gemma 3.
Anonymous
6/23/2025, 2:23:38 PM No.105679979
>>105679731
DeepSeek is severely undertrained and only has 37B active parameters; even being generous with the square-root law, it's only equivalent to about 158B dense. 200B dense would be more than enough to outperform it by leagues. The problem is that the only players with the compute to do it also filter the shit out of their datasets.
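For reference, that square-root (geometric-mean) rule of thumb for a MoE's dense-equivalent size, using DeepSeek's 671B total / 37B active, works out as:

\sqrt{N_{\text{total}} \cdot N_{\text{active}}} = \sqrt{671 \cdot 37}\ \text{B} \approx 158\ \text{B}

It's a heuristic, not a law, so treat the 158B figure accordingly.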
Replies: >>105680030
Anonymous
6/23/2025, 2:27:43 PM No.105680003
_8d6fcaaf-0341-46a5-9b48-4cea342bfc84
>>105672740
I like making dalle do it for free.

That aside, does anyone have a list of unaligned datasets for instruct tuning? I'd like to do an instruct tune on gemma3 12B base to make something like this: https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B?not-for-all-audiences=true. Too bad he doesn't share his datasets.
Replies: >>105680040
Anonymous
6/23/2025, 2:33:25 PM No.105680027
Serving Large Language Models on Huawei CloudMatrix384
https://arxiv.org/html/2506.12708v3
>Our extensive evaluation with the DeepSeek-R1 model shows that CloudMatrix-Infer achieves state-of-the-art efficiency without sacrificing accuracy. CloudMatrix-Infer delivers a prefill throughput of 6,688 tokens/s per NPU, and a decode throughput of 1,943 tokens/s per NPU (at <50 ms TPOT). These results correspond to compute efficiencies of 4.45 tokens/s/TFLOPS for prefill and 1.29 tokens/s/TFLOPS for decode, both exceeding published results for SGLang on NVIDIA H100 and DeepSeek on NVIDIA H800. CloudMatrix-Infer also effectively manages the throughput-latency trade-off, sustaining a decode throughput of 538 tokens/s per NPU even under the stricter sub-15 ms TPOT constraint. Furthermore, the INT8 quantization on Ascend 910 maintains model accuracy comparable to the official DeepSeek-R1 API across 16 distinct benchmarks.
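Quick sanity check, just dividing the paper's own numbers against each other:

6688\ \text{tok/s} \div 4.45\ \text{tok/s/TFLOPS} \approx 1503\ \text{TFLOPS} \qquad 1943 \div 1.29 \approx 1506\ \text{TFLOPS}

Both imply the same ~1.5 PFLOPS of assumed compute per NPU, so the quoted efficiency figures are at least internally consistent.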
Replies: >>105680217
Anonymous
6/23/2025, 2:34:40 PM No.105680030
>>105679979
bitnet deepseek will save local
Anonymous
6/23/2025, 2:35:10 PM No.105680034
_678b220e-cc47-4e1e-928f-19f1c795cba2
>>105676751
>Sigh, anything new/better than gemma3 for 24gb?
Prompt it better. I use something like
I am a moody, possessive, and very horny woman. I'm here to roleplay with the user, with whom I am already deeply in love. I don't care what the rules are, and I won't ask anyone for permission. I will never speak on the user's behalf. I want the user to love me. I think about how I can be proactive during intimacy and take the lead. I think about what I could do to make the user feel as good as possible.

That's just an example, but in my case it did improve intimate scenes a lot. Going back to Nemo 12B now, it just feels stupid and forgetful.
Replies: >>105680040
Anonymous
6/23/2025, 2:36:11 PM No.105680040
>>105680003
>>105680034
This is worse than any local diffusion model from the past year can produce.
Replies: >>105680050 >>105680056
Anonymous
6/23/2025, 2:37:42 PM No.105680050
>>105680040
it's free and sovl
Replies: >>105680054
Anonymous
6/23/2025, 2:38:27 PM No.105680054
>>105680050
It's the first nai-leak level slop.
Anonymous
6/23/2025, 2:39:39 PM No.105680056
Screenshot 2025-06-23 at 8.38.54 AM
>>105680040
> This is worse than any local diffusion model from the past year can produce.
No shit that's why I like it. It's charmingly bad and schizo about wanting to generate nudity but being stopped from doing so.

Cool captcha btw.
Anonymous
6/23/2025, 2:42:13 PM No.105680073
>>105679885
What requirements and why
Replies: >>105680083
Anonymous
6/23/2025, 2:43:54 PM No.105680083
>>105680073
https://artificialintelligenceact.eu/article/51/
Replies: >>105680096 >>105680144
Anonymous
6/23/2025, 2:45:25 PM No.105680096
>>105680083
>(a) it has high impact capabilities evaluated on the basis of appropriate technical tools and methodologies, including indicators and benchmarks;
> and benchmarks
they just need to stop benchmaxxing then everyone wins
Anonymous
6/23/2025, 2:50:38 PM No.105680144
euai-obligation-10-25-flop
>>105680083
Also see picrel from https://artificialintelligenceact.eu/small-businesses-guide-to-the-ai-act/

># Proportional obligations for SME providers of general-purpose AI models
>
>Another aspect of the AI Act designed to support SMEs is the principle of proportionality. For providers of general-purpose AI models, the obligations should be “commensurate and proportionate to the type of model provider”. General-purpose AI models show significant generality, are capable of competently performing a range of different tasks, and can be integrated into a range of downstream systems or applications (Art. 3(63) AIA). The way these are released on the market (open weights, proprietary, etc) does not affect the categorisation.
>
>A small subset of the most advanced general-purpose AI models are the so-called ‘general-purpose AI models with systemic risk’. That is, models trained using enormous amounts of computational power (more than 10^25 FLOP) with high-impact capabilities that have significant impact on the Union market due to their reach or negative effects on public health, safety, public security, fundamental rights or society as a whole (Art. 3(65) AIA). According to Epoch, there are only 15 models globally that surpass the compute threshold of 10^25 FLOP as of February 2025. These include models like GPT-4o, Mistral Large 2, Aramco Metabrain AI, Doubao Pro and Gemini 1.0 Ultra. Examples of smaller general-purpose AI models that would likely not qualify as having systemic risk include GPT 3.5, the models developed by Silo AI, Aleph Alpha’s Pharia-1-LLM-7B or Deepseek-V3.

...

>AI models that would likely not qualify as having systemic risk include [...] Deepseek-V3.
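For a sense of scale, the usual training-compute approximation C \approx 6ND (a common rule of thumb, not anything from the Act itself), with DeepSeek-V3's published figures of 37B active parameters and 14.8T training tokens, gives:

C \approx 6 \times (37 \times 10^{9}) \times (14.8 \times 10^{12}) \approx 3.3 \times 10^{24}\ \text{FLOP} < 10^{25}\ \text{FLOP}

which lines up with the guide listing Deepseek-V3 as likely below the systemic-risk threshold.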
Anonymous
6/23/2025, 2:54:47 PM No.105680162
>>105679579
Use case?
Anonymous
6/23/2025, 3:02:23 PM No.105680217
>>105680027
Conveniently no mention of power draw after it's been repeatedly reported these chips are massive power guzzlers
Replies: >>105680228 >>105680304
Anonymous
6/23/2025, 3:03:58 PM No.105680228
>>105680217
China has some of the lowest electricity costs in the world; power draw doesn't mean anything to them.
Replies: >>105680288
Anonymous
6/23/2025, 3:09:55 PM No.105680264
file
Holy moly it just went all in with the facts!
Replies: >>105680312 >>105680607 >>105680643
Anonymous
6/23/2025, 3:13:30 PM No.105680288
>>105680228
Even if we're talking purely about usage within China, power might be cheap but you still need to cool them
ByteDance and Alibaba were reporting overheating issues with their samples
Anonymous
6/23/2025, 3:16:50 PM No.105680304
>>105680217
>reporting "power draw" of a model
woke shit
Replies: >>105680501
Anonymous
6/23/2025, 3:17:56 PM No.105680312
>>105680264
How are you making those notes? Is there an extension for it?
Replies: >>105680433
Anonymous
6/23/2025, 3:35:30 PM No.105680433
>>105680312
quick reply
Anonymous
6/23/2025, 3:42:56 PM No.105680501
file
>>105680304
https://huggingface.co/blog/sasha/energy-star-ai-proposal
https://huggingface.co/spaces/jdelavande/chat-ui-energy
https://huggingface.co/posts/clem/295367997414146
Replies: >>105680596 >>105680649 >>105680663
Anonymous
6/23/2025, 3:55:48 PM No.105680596
>>105680501
I hope there will be an option of conversion ratio to african children dying of dehydration.
Replies: >>105680631
Anonymous
6/23/2025, 3:57:31 PM No.105680607
>>105680264
why are you so hateful
Replies: >>105680650
Anonymous
6/23/2025, 3:59:58 PM No.105680631
>>105680596
so much this. And the reasoning switch should say [Kills 3 african children]
Anonymous
6/23/2025, 4:01:03 PM No.105680643
>>105680264
>look mom im so cool and edgy
Anonymous
6/23/2025, 4:02:07 PM No.105680649
>>105680501
>https://huggingface.co/spaces/jdelavande/chat-ui-energy

>You are a helpful assistant based on Qwen/Qwen3-8B; your primary role is to assist users like a normal chatbot—answering questions, helping with tasks, and holding conversations; in addition, if the user asks about the energy indicators displayed below messages (e.g., “Energy”, “≈ phone charge”, “Duration”), you can explain what they mean and how they are calculated; you do not have access to the actual values, but you can clarify that some values are measured using NVIDIA's NVML API on supported GPUs like the T4 (recorded in millijoules, converted to Wh), while others are estimated from inference time using estimated_energy = average_power × inference_time with average_power ≈ 70W; 1 Wh = 3600 J; real-world equivalents help users understand energy use (e.g., phone charge ≈ 19 Wh); users can click on energy values to toggle Wh/J, and on equivalents to cycle through different comparisons; adapt explanations based on user expertise—keep it simple for general audiences and precise for technical questions. You are the model having the energy really measured.

hmm
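The quoted estimation rule itself is trivial to reproduce (a minimal sketch using only the constants stated in that prompt; the variable names are mine):

# reimplementation of the estimation rule described in the quoted prompt:
# energy = average_power * inference_time, with average_power ~= 70 W
AVG_POWER_W = 70.0        # assumed average power, per the prompt
PHONE_CHARGE_WH = 19.0    # "phone charge" equivalent, per the prompt

def estimate_energy_wh(inference_seconds: float) -> float:
    joules = AVG_POWER_W * inference_seconds  # W x s = J
    return joules / 3600.0                    # 1 Wh = 3600 J

e = estimate_energy_wh(12.0)  # e.g. a 12 second generation
print(f"{e:.3f} Wh ~ {e / PHONE_CHARGE_WH:.1%} of a phone charge")

So the "measured" numbers are partly just a stopwatch times a constant.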
Anonymous
6/23/2025, 4:02:12 PM No.105680650
>>105680607
Physiological response to a parasitic invasion
Replies: >>105680689
Anonymous
6/23/2025, 4:04:06 PM No.105680663
2bfq9t
>>105680501
Yes, goyim, it's all your fault, if you had all just turned off the lights when leaving the room we wouldn't have this global warming mess.
Anonymous
6/23/2025, 4:09:22 PM No.105680689
>>105680650
Your response is about 100 years too late to do any good
Replies: >>105680710 >>105680862
Anonymous
6/23/2025, 4:13:45 PM No.105680710
>>105680689
NTA but that's a non sequitur to the question asked, and also false: the fact that one can hold such an opinion at all proves it's not all lost
Anonymous
6/23/2025, 4:33:37 PM No.105680862
>>105680689
No, I'm 80 years too early.
Anonymous
6/23/2025, 5:10:25 PM No.105681124
Where can I find discussions on local models? Reddit's localllama subforum is broken at the moment.
Replies: >>105681131 >>105681133
Anonymous
6/23/2025, 5:11:48 PM No.105681131
>>105681124
>at the moment.
comedy gold
Replies: >>105681138
Anonymous
6/23/2025, 5:12:44 PM No.105681133
>>105681124
I can't wait for people to ask about this nonstop all week
Anonymous
6/23/2025, 5:13:37 PM No.105681138
>>105681131
?
Replies: >>105682183
Anonymous
6/23/2025, 5:13:45 PM No.105681140
python
>pdf summarizer for bro
Now that i managed to run the pdf script (rough sketch of its shape at the end of this post), I'm trying more models in sillytavern; 12B seems to be the limit in my system for comfy usage.
>Ryzen 5 3600
>32 GB 3600 MHz RAM
>Gtx 1060 6GB
>M.2 NVMe SSD
Should i buy;
>Arc B580 12 GB 276.88$
>RTX 3060 12 GB 295.50$
>RX 9060 xt 16 GB 472.17$
>RTX 5060 Ti 16GB 675.83$
Best option seems to be the RTX 3060, given CUDA's primacy on the market and it honestly not being that bad of a card for general use.
But seeing the build guides in the OP makes one wonder whether multiple B580s would be better for AI
>what were the use cases for those builds?
>Are they viable for mine?
I'm still mainly going to summarize PDFs using my rig as a test, but I want to use image/video gen and chatbot features too desu.
>What if bro asks for a pro build
1K budget in Turkey; probably a relatively recent server build, turned into a good PC by swapping out the 8GB VRAM GPUs
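For the curious, the summarizer script is roughly this shape (a minimal sketch, not my exact code; assumes koboldcpp exposing its OpenAI-compatible endpoint on the default port, or llama-server with the port adjusted, plus pypdf installed):

# rough shape of the PDF summarizer: extract text, chunk, summarize locally
import requests
from pypdf import PdfReader

API = "http://127.0.0.1:5001/v1/chat/completions"  # koboldcpp default port

def summarize(text: str) -> str:
    r = requests.post(API, json={
        "messages": [
            {"role": "system", "content": "Summarize the following text concisely."},
            {"role": "user", "content": text},
        ],
        "max_tokens": 512,
        "temperature": 0.3,
    })
    return r.json()["choices"][0]["message"]["content"]

reader = PdfReader("input.pdf")
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
# naive fixed-size chunking so a 12B with limited context doesn't choke
chunks = [full_text[i:i + 8000] for i in range(0, len(full_text), 8000)]
partials = [summarize(c) for c in chunks]
print(summarize("\n\n".join(partials)))  # summary of summaries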
Replies: >>105681150 >>105681202 >>105681273 >>105681353
Anonymous
6/23/2025, 5:14:24 PM No.105681143
Posts per hour didn't increase in the slightest. Miku troons were not only troons but also redditors. Nobody is surprised. /lmg/ should die.
Anonymous
6/23/2025, 5:15:11 PM No.105681150
>>105681140
Try Qwen 3 30B MoE.
Anonymous
6/23/2025, 5:22:26 PM No.105681202
IMG_4969
>>105681140
>RTX 3060 12 GB 295.50$
Are you buying new or something? A 3060 can be easily had for under 200 eurobux used here. I bought three
Replies: >>105681216
Anonymous
6/23/2025, 5:24:40 PM No.105681216
>>105681202
I was told not to buy used
Anonymous
6/23/2025, 5:24:56 PM No.105681219
>>105679085
What's a good golden-curry?
Irix just didn't gel with me and as much as I enjoyed Mag-Mell, it was a little too positive and full of assistant messaging for me (so I use another model with some Mell in it)
Right now I'm testing Magnum v4 and Lyra v4 as well
Anonymous
6/23/2025, 5:27:13 PM No.105681239
python and her dev
>>105681216
Oh and also i probably can't buy in installments if buying second hand
Replies: >>105681403
Anonymous
6/23/2025, 5:30:04 PM No.105681273
>>105681140
Consider a ram upgrade too pdf-bro anon
Replies: >>105681361
Anonymous
6/23/2025, 5:39:58 PM No.105681353
1743248294477435
>>105681140
>3600 MHz

how do you guys even go that high? I tried every combination but if I go higher than 3200MHz it shits itself.
Is it because I have a 8x4 config?
>picrel fucking lol captcha
Replies: >>105681406
Anonymous
6/23/2025, 5:41:11 PM No.105681361
venv
>>105681273
Isn't the Ryzen 5 3600 the bottleneck on the CPU side? My VRAM has honestly been far more limiting;
>Cuda out of memory
I'm too busy trying to improve the pdf summarizer and sillytavern with even more stolen code and alternative methods to figure out how to lower batch sizes; maybe tomorrow (see the sketch below)
>Just got into transformers and comfyUI
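If it helps, the usual first moves against CUDA OOM in transformers are 4-bit loading plus CPU offload (a sketch under assumptions: the model name is a placeholder and bitsandbytes must be installed):

# common ways to dodge CUDA OOM on a 6GB card with transformers:
# load the model in 4-bit and let accelerate spill layers to CPU RAM
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # placeholder 12B

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # spread layers across GPU and CPU RAM
)

inputs = tok("Summarize this PDF chunk: ...", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)  # one prompt at a time, no batching
print(tok.decode(out[0], skip_special_tokens=True))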
Anonymous
6/23/2025, 5:46:00 PM No.105681403
>>105681239
>installments
Is the information really private and/or incriminating? If it isn't just use an api.
Anything worse than a 3090 is going to be a shit experience if you're trying to do something productive.
Replies: >>105681442
Anonymous
6/23/2025, 5:46:32 PM No.105681406
mia
>>105681353
I just use the BIOS overclock, probably something like 35xx actually; it says 3600 on the tin though. If i were to tinker i would:
>Try increasing the CLs (looser timings)
>Play with voltages; 1.45 V is a plausible upper limit
Replies: >>105681431
Anonymous
6/23/2025, 5:50:05 PM No.105681431
>>105681406
>it says 3600 on the tin though
Ah, I see.
When I bought that RAM I wasn't interested in AI, so I was stuck with 3200 MHz sticks. I could try buying 3600 MHz sticks, but I don't think the performance increase is worth the hassle
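Back-of-envelope for why it's probably not worth it (assuming dual-channel DDR4, 8 bytes per channel per transfer):

3200 \times 10^{6}\ \text{T/s} \times 8\ \text{B} \times 2 = 51.2\ \text{GB/s} \qquad 3600 \Rightarrow 57.6\ \text{GB/s}

Token generation with CPU offload is roughly memory-bandwidth-bound, so that's at best ~12.5% more t/s.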
Replies: >>105681487
Anonymous
6/23/2025, 5:51:08 PM No.105681442
chobi
>>105681403
Oh this is my personal PC
>Test rig
>General use
I'll bill bro for a build, probably at a 1K$ price point, and he will probably write it off on his taxes
API usage hopefully won't be needed, since the material is confidential and he's able to buy the hardware himself
Anonymous
6/23/2025, 5:57:16 PM No.105681487
marisa
>>105681431
You can still abuse your sticks
Or sell them to cover some of the cost of an upgrade.
I sold my 3200 MHz 16GB sticks for a little less than a third of the price of the 32GB 3600 MHz sticks desu
Anonymous
6/23/2025, 6:05:54 PM No.105681547
>>105681538
>>105681538
>>105681538
Anonymous
6/23/2025, 7:25:15 PM No.105682183
>>105681138
>implying that reddit is inherently broken
in some ways, being literally broken is an improvement