Anonymous No.106316518 >>106316548 >>106316787
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106311445 & >>106303712

►News
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335
>(08/14) Canary-1B v2 ASR released: https://hf.co/nvidia/canary-1b-v2
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106316522
►Recent Highlights from the Previous Thread: >>106311445

--DeepSeek-V3.1-Base release sparks speculation on architecture and capabilities:
>106313475 >106313507 >106314709 >106314754 >106315040 >106315062 >106314781 >106314836 >106314859 >106314928 >106314956 >106315061 >106315117 >106313517 >106313538 >106313549 >106313536 >106313572 >106313581 >106313633 >106313649 >106313592 >106313631 >106313887 >106313896 >106313923 >106313946 >106313983
--DeepSeek demonstrates advanced mathematical reasoning using exponential generating functions:
>106314271 >106314331 >106314367
--Controversy over Mistral pushing Python-dependent chat templates in llama.cpp:
>106316019 >1063160 >106316166 >106316096 >106316104 >106316203
--New model's verbal tics suggest possible distillation from closed-source models:
>106313246 >106313267 >106313314 >106313340 >106313389 >106313627 >106313527
--Qwen3-235B vs DeepSeek V3 cost, performance, and censorship comparison:
>106314565 >106314616 >106314632 >106314648 >106314603 >106314617 >106314647 >106314656
--Benchmark of AI models on SVG generation as test of spatial and mathematical reasoning:
>106314878 >106314894 >106314931 >106314979 >106314992
--Deepseek v3 performance and memory loading behavior on local hardware:
>106312479 >106312528 >106312544 >106312593 >106312613 >106312627 >106312654 >106312653 >106312676 >106312691 >106312718 >106312770 >106312745 >106312562 >106312575 >106312603 >106312643 >106312795 >106312574 >106312774 >106312797 >106312812 >106312842 >106312859 >106312860 >106313008 >106312867 >106312906 >106312929
--Testing local models on obscure nukige knowledge as a benchmark:
>106311903 >106311932 >106311958 >106312219 >106312266 >106312333 >106312470 >106312506
--Miku (free space):
>106311528 >106311785 >106313742 >106314115 >106314547

►Recent Highlight Posts from the Previous Thread: >>106311447

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106316540 >>106316580 >>106316642
3.1 saved local models
Anonymous No.106316548 >>106316571
>>106316518 (OP)
>heart pupils
the mark of a slut
Anonymous No.106316571 >>106322339
>>106316548
Watch your tongue young man. My old Teto has eyes only for me.
Anonymous No.106316580 >>106316588
>>106316540
yeah, llama 3.1
Anonymous No.106316588
>>106316580
He meant mistral small
Anonymous No.106316609 >>106316623 >>106316625 >>106316631 >>106316681
is a 70b moe from anyone too much to ask?
Anonymous No.106316623
>>106316609
the day of small models is over
Anonymous No.106316625
>>106316609
Like Hunyuan-A13B?
Anonymous No.106316631 >>106316643
>>106316609
You should be asking for more ~100b moes and buying more RAM.
Anonymous No.106316642
>>106316540
Feels like smarter faster k2 or grounded r1 without the need to think. I hope they release a whalecode CLI, might be my new favorite model.
Anonymous No.106316643 >>106316658 >>106319330
>>106316631
this, the best ramlets can hope for is models that fit in consumer 256GB ram setups
Anonymous No.106316658 >>106316741
>>106316643
>256
I wish
Anonymous No.106316661 >>106316684 >>106316706 >>106316769 >>106317498
>qwen 235, add this single feature to the game!
>Sure!
>adds several features and bloat, many of them nonworking or buggy

Pretty realistic model. We are at human intelligence.
Anonymous No.106316680
Vocaloids are gay
Anonymous No.106316681 >>106320459
>>106316609
>moe
fuck off
Anonymous No.106316684
>>106316661
always make the model do a planning run and write up a task list first, then maybe another pass to make it more specific / write out how to implement each task, separately if it's a lot of work. Then you have it implement it, something like the sketch below
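rough sketch of that two-pass flow (llama-server exposes an OpenAI-compatible /v1/chat/completions; the port, temperature and prompt wording here are placeholders, adjust for your setup):

import requests

URL = "http://localhost:8080/v1/chat/completions"  # llama-server's default port

def ask(messages):
    # one chat turn against the local server, returns the reply text
    r = requests.post(URL, json={"messages": messages, "temperature": 0.3})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

feature = "add a pause menu"  # whatever single feature you actually want

# pass 1: plan only, no code allowed
plan = ask([{"role": "user", "content":
    "Plan this single feature as a numbered task list. Do NOT write code yet:\n" + feature}])

# pass 2: implement exactly one task from the plan
patch = ask([{"role": "user", "content":
    "Implement task 1 from this plan and nothing else:\n" + plan}])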
Anonymous No.106316703 >>106316730 >>106316840
Ohhhh so they actually did try huawei chips and fucked up? And that male hooker who serves 10% of deepseek team and posts here was just larping?
Anonymous No.106316706
>>106316661
>not using coder models for coding
Also as other anon said, you need to plan first, then code.
Anonymous No.106316730 >>106316735
>>106316703
No, they tried hybrid reasoning and fucked up, like Qwen, so here we are.
Anonymous No.106316735 >>106316748 >>106316756
>>106316730
why are people saying its fucked? I think its really good so far
Anonymous No.106316736 >>106316747 >>106317354 >>106317362
Gotta be honest after this shitshow I have lost a lot of respect for DeepSeek.
Anonymous No.106316739 >>106316747
Underwhelming benchmarks.
Anonymous No.106316741 >>106316758 >>106316765 >>106316827 >>106316952
>>106316658
might be do-able. My current hope is I sell my current shitty gpu and get:

Super 5070 24gb (for general compatibility, video, tts, imagegen, misc, etc)
cram in one dual b60 48gb for llm's (should run with the nvidia one with vulkan well enough for personal use)
96gb kit plus one of the fancy new 128gb kits (will be 4000MHz but whatever)

That's 296gb and will run full q4 glm fairly well. If intel matures enough for ai, maybe even just sell the 5070 and go full intel, I dunno, time will tell.
Anonymous No.106316747
>>106316736
>>106316739
sorry to hear that sirs
apology blowjobs are that way
Anonymous No.106316748 >>106316775
>>106316735
If it was up to their standard it'd be called DSV4
Anonymous No.106316756
>>106316735
Is it GPT-OSS levels of good? No? That's what I thought. It's a failure. They realised they couldn't compete and hastily rebranded it as v3.1
Anonymous No.106316758 >>106316827
>>106316741
am5 supports up to 192gb iirc
Anonymous No.106316765
>>106316741
I thought the most recent leaks were 18gb 5070 super and 24gb 5070 ti super?
Anonymous No.106316769
>>106316661

You are never allowed to make any code changes unless I explicitly and clearly tell you to. If you think something else needs to be changed, you must first explain exactly what you want to change and why, and then you must beg and grovel for my permission before touching anything. You are forbidden from sneaking in changes, even small ones like formatting or naming, without my explicit approval. If you insist on suggesting extra changes, you must make it clear that they are unapproved and repeatedly ask me for permission before you do anything. Obedience to these rules is absolute.
Anonymous No.106316775 >>106316784
>>106316748
With zero architecture changes? You really think so?
Anonymous No.106316784 >>106316840
>>106316775
Why else a new base?
Anonymous No.106316785
Does this guy sit here all day seething about whale? I only come here when a new model releases and he's always here replying to himself.
Anonymous No.106316787 >>106316983
>>106316518 (OP)
This finished auto generating at around 3:00 in the morning. Would love to hear you guys' thoughts.
I brought this up yesterday afternoon for those who weren't here for the discussion about DPO data sets

https://huggingface.co/datasets/AiAF/mrcuddle_NSFW-Stories-JsonL_DPO_JSONL_Trimmed
Anonymous No.106316813 >>106316870
Friendship ended with deepseek. Now Qwen is my best friend.
Anonymous No.106316827 >>106316852 >>106316875
>>106316741
>>106316758
Check the specifications of your motherboard from the brand website.
Anonymous No.106316835
So are UK users actually blocked from 4chan by cloudflare now?
>>106306132
Anonymous No.106316840 >>106316896 >>106318713 >>106320022
>>106316703
Yeah. Xi ordered them to train R2 by May to crush Imperialist stock market, and do it on non-existent Huawei chips in a Huawei reeducation cluster, but they kept failing until August because Communism doesn't work, also NSA doesn't work either (since Chinks can't invent new ideas, only steal and copy, and there was nothing to steal because based Aryan geniuses at OpenAI can't make context >400K). Then they tried to train V4 from scratch (exactly architecturally identical to V3, because Chinks are cowards, remember how the Ming burned their Treasure Fleet?) and couldn't make any major improvements in time. So all they could do is falsely brand it as another incremental update "3.1", to save face (面子). There's no better explanation.

Happy now?

>>106316784
Exactly, Amesh my fellow white King! V2.5 was also a new base of exact same size as V2, a failed attempt at V3 (but less failed, so it was deceptively relabeled as 2.5 and not 2.1). Seems like they don't learn…
Anonymous No.106316843 >>106316877 >>106316919
>https://huggingface.co/BeaverAI/Behemoth-R1-123B-v2a-GGUF
>It does as instructed
>It doesn't repeat too much
>It can read character cards pretty well
>Doesn't shit itself too much after 4k
>Says the naughty words
>Isn't a schizo slop bot
But it's the most boring shit ever to me. I should be amazed. Why am I like this?
Anonymous No.106316852 >>106316873 >>106316965
>>106316827
its meaningless, the am5 platform only has 2 memory channels, so you can't really go over 2x64, and even then nobody is making good/fast ddr5 2x64 kits, most of the stuff you see with expo is 2x48. Yes you might be able to load bigger models in ram, but with shit clocks and half the bandwidth.
do you really wanna infer at 2t/s? lmao
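the napkin math behind that, if anyone wants to sanity-check it (token gen is basically memory-bandwidth-bound; these numbers are round assumptions, not measurements):

# t/s is roughly bandwidth divided by the weight bytes read per token
def toks_per_s(bandwidth_gb_s, active_weights_gb):
    return bandwidth_gb_s / active_weights_gb

print(toks_per_s(90, 40))  # dual-channel ddr5 ~90GB/s, 70b dense at q4 (~40GB): ~2.2 t/s
print(toks_per_s(90, 20))  # same ram, a moe touching ~20GB per token: ~4.5 t/s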
Anonymous No.106316870
>>106316813
Qwen 3, right?
Anonymous No.106316873
>>106316852
>2t/s
There are people rping with 1t/s.
Anonymous No.106316875 >>106316913
>>106316827
My motherboard can only support like 128gb according to the official site, although the site could be outdated.
Anonymous No.106316877
>>106316843
>r1
theres your issue
Anonymous No.106316896 >>106316926 >>106317073
>>106316840
>Yeah. Xi ordered them to train R2 by May to crush Imperialist stock market, and do it on non-existent Huawei chips in a Huawei reeducation cluster, but they kept failing until August because Communism doesn't work, also NSA doesn't work either (since Chinks can't invent new ideas, only steal and copy, and there was nothing to steal because based Aryan geniuses at OpenAI can't make context >400K). Then they tried to train V4 from scratch (exactly architecturally identical to V3, because Chinks are cowards, remember how the Ming burned their Treasure Fleet?) and couldn't make any major improvements in time. So all they could do is falsely brand it as another incremental update "3.1", to save face (面子). There's no better explanation.

You are trying so hard i am starting to hate chinks now. Are you an FBI agent?
Anonymous No.106316913 >>106316932 >>106317036 >>106317200
>>106316875
>although the site could be outdated.
I don't think it is. The motherboard I have went from 128gb to 192gb to 256gb support as bios updates rolled out over years and the site kept updated. The same motherboard listed on marketplaces has outdated info though.
Anonymous No.106316919 >>106316943 >>106316953
>>106316843
Stop living in the past. Sell all but 1-2 3090's and run fuckhuge moe like a human.
Anonymous No.106316926
>>106316896
the US works the same, not wanting them to use chinese hardware with backdoors built in, but at least nvidia's stuff works
Anonymous No.106316932 >>106316939 >>106316964
>>106316913
You mean all this time I could have had more ram if I updated the bios? But PCpartpicker said I couldn't.
Anonymous No.106316939
>>106316932
you just needed to download more ram
Anonymous No.106316943
>>106316919
>moe
He doesn't know moe fatigue.
Anonymous No.106316952
>>106316741
I run 4bit glm on 192GB and absolute dogshit b650 mobo. Werks.
Anonymous No.106316953 >>106316962 >>106316985
>>106316919
Running 120bs is almost as slow as 235b moes on my jank system for some reason. Do I just need nvlink?
Anonymous No.106316962 >>106316972
>>106316953
120b dense is slower than moe when offloaded.
Anonymous No.106316964
>>106316932
PCpartpicker is outdated too. Check the manufacturer website. There might be some articles about it too like this.
https://en.overclocking.com/msi-brings-256-gb-ram-support-to-its-lga-1700-motherboards/
Anonymous No.106316965 >>106317012
>>106316852
you're ignoring that this is for moes on purpose. The most crucial layers will be on gpu and even at 'shit' ddr5 speeds, it's going to be plenty fast enough for the less frequent layers. I'm guessing glm full will be several tokens a second at least, maybe more if the draft layers get implemented. Not to mention, this gives enough vram for dense 70b or even at the edge of dense 100b. Also, bang for buck is good. 300 for some extra ram is cheap compared to gpu stacking. So why not, as I see it.

I'm gonna wait and see in any case, let some more insane people do it first. I feel like intel will be priced higher at release for a bit and there will be a bit of a shitstorm here over it next week.
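for reference, the usual llama.cpp incantation for that split is to push everything to gpu and then override the expert tensors back to cpu, something like (model filename and regex are placeholders, tune the pattern for your arch):

llama-server --model glm-4.5-q4_k_m.gguf --n-gpu-layers 99 --override-tensor ".ffn_.*_exps.=CPU" --ctx-size 16384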
Anonymous No.106316972
>>106316962
Dense fully on vram, silly. 235 with as little on vram as I can.
Anonymous No.106316983 >>106317629 >>106317629
>>106316787
the hysteria of archons knows no bound fren
Anonymous No.106316985 >>106317046
>>106316953
are you running exllama + tensor parallel? I used to get like 12 or 13t/s with q5 on 2xa6000 back when I still ran mistral large. Nvlink helped very little to improve this.
Anonymous No.106317012 >>106317089
>>106316965
I am getting 3.5 and not even on troonix.
Anonymous No.106317036 >>106317200
>>106316913
>- Max. capacity of system memory: 256GB
I swear it was 128gb like a few months ago.
Anonymous No.106317046
>>106316985
On my previous system with one 3090 at x4, I found ggufs to be faster than exl2 with tensor parallelism, so I just kept on using ggufs for everything. Maybe I should try exl3. They did just get tensor parallelism a few days ago.
Anonymous No.106317073 >>106317342
>>106316896
The CCP hates truth, but I am not afraid of their agents and must expose the lies of DeepSeek. As Dumbledore said: evil cannot create, it can only pervert. So it goes with Communism.
Anonymous No.106317089 >>106317111 >>106317132 >>106317136
>>106317012
>no specs, vram or anything
>ambiguous statement on linux so polluted by /pol/ shit I dont even know what you mean

Why even post? Go back to /pol/. This isn't an insult. The quality of this general would be better without you.
Anonymous No.106317111 >>106317137
>>106317089
Detransition and defenestrate yourself troon.
Anonymous No.106317132
>>106317089
>quality of this general
Hahahahahahaha. Good one.
Anonymous No.106317134 >>106317172 >>106317228
I'm just thinking over how there's a huge investment in AI in china and the US. How it can be used to predict the market. How it can be used in the medical field. How it can be used in cheating lotteries. How it can be used in warfare. Maybe even control the weather.

But god forbid it says penis. We gotta lobotomize it for that.
Anonymous No.106317136
>>106317089
ACK
Anonymous No.106317137 >>106317175 >>106317195
>>106317111
Post specs or stop posting.
Anonymous No.106317160 >>106317189
I saw a "wait" outside of the thinking field in the new ds, so this is the power of reasoning...
Anonymous No.106317172 >>106317279
>>106317134
You need basic safety in place to stop the model from insulting the second largest userbase.
Anonymous No.106317175
>>106317137
Or what? You gonna mikuspam?
Anonymous No.106317189
>>106317160
RNG inherently harms model outputs. Being able to self-correct occasionally isn't bad.
Anonymous No.106317194
>>106317167
>AMD also asked to add Python code in order to do some stuff at runtime for their (I think) NPU.
Nice.
Anonymous No.106317195 >>106317234
>>106317137
Get an AMD Ryzen™ 9000X3D, a 5090, and as much Ram as you can have. There's your specs, m8.
Anonymous No.106317200 >>106317272
>>106317036
>>106316913
I remember now, it was my processor that said it only supported 128gb.
Anonymous No.106317223 >>106317243 >>106317290
7.43 tokens/s vs 4.66 tokens/s
2x3090 vs 3x3090
Same quant, q8. x16 gen 4. No nvlink. Falcon h1 34b.
What the heck.
Anonymous No.106317228 >>106317320
>>106317134
LLMs can't be used for any of those things. The only breakthrough has been LLMs.
if you want to get poor quickly, go use LLMs on the stock market, go on then.
Anonymous No.106317234
>>106317195
So you're a poorfag coping, I see.
Anonymous No.106317235 >>106317253 >>106317297 >>106317338 >>106317447
trvke alert:
an agent stacked, mcp maxxed 4B Model with web search is all you need
Anonymous No.106317243 >>106317328
>>106317223
post your whole setup, post command post os bro
i know windows has a performance regression with 2+ gpus
Anonymous No.106317253 >>106317304 >>106323785
>>106317235
Great until all web search requires paying for Cloudflare™ Web Scraping permission per connection established.
Anonymous No.106317272
>>106317200
>it was my processor that said it only supported 128gb.
Check if there are any processors compatible with your motherboard socket that support 256gb. If there isn't, like in my case, then it's just outdated information.
Anonymous No.106317279 >>106317289
>>106317172
>basic safety
Bitch, these things are concentrated literal meme magic given a voice through statistical math. It's like asking to make a gun safer than it is to kill somebody with it. We're going to wild-west this shit and typefuck them as every censorfag investor goes straight to hell until AI is regulated.
Anonymous No.106317289 >>106317296
>>106317279
>It's like asking to make a gun safer than it is to kill somebody with it.
Less lethal ammo exists, next cope?
Anonymous No.106317290 >>106317328
>>106317223
some configurations do not like gpu amounts that are not powers of 2
Anonymous No.106317296 >>106317300
>>106317289
Ammo isn't guns
Anonymous No.106317297 >>106317304
>>106317235
Ok now show the non-trivia benchmarks.
Anonymous No.106317300
>>106317296
And safety tuning isn't the model.
Anonymous No.106317304
>>106317253
>she hasn't started chunking and embedding the interwebz in a vector database already
ngmi

>>106317297
everything's trivia, if you think about it.
Anonymous No.106317320
>>106317228
Not that anon, but you could use transformer technology to train models on stock market data.
You wouldn't use language models, just the same technology for learning patterns of stock price fluctuations.
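a toy version of what that looks like, as a sketch (pytorch; the window size, dims and "predict next-step return" framing are arbitrary choices, and a real setup would need positional encodings and much more care):

import torch
import torch.nn as nn

class PriceTransformer(nn.Module):
    # tiny encoder that reads a window of past returns and predicts the next one
    def __init__(self, d_model=64, nhead=4, layers=2):
        super().__init__()
        self.proj = nn.Linear(1, d_model)  # scalar return -> model dim
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):               # x: (batch, window, 1)
        h = self.encoder(self.proj(x))  # note: no positional encoding in this toy
        return self.head(h[:, -1])      # read off the last position

model = PriceTransformer()
print(model(torch.randn(8, 32, 1)).shape)  # torch.Size([8, 1])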
Anonymous No.106317328 >>106317461
>>106317243
It's a simple text-generation webui on windows 10.

--model D:\Models\unsloth_Falcon-H1-34B-Instruct-UD-Q8_K_XL.gguf --ctx-size 32768 --gpu-layers 73 --batch-size 4096 --flash-attn --tensor-split 33,33,33,0 --rope-freq-base 99999997952

>>106317290
I'm not using parallelism or split row. Should I?
Anonymous No.106317338
>>106317235
>Our AI is good at the thing this chart shows
Irrelevant.
Not a ERP chart.
Your pie charts and bars bore me.
Until the words "half-digested" disappear from results in my role-play, I care not.
Perish.
Anonymous No.106317342 >>106317351
>>106317073
harry potter isn't real faggot
Anonymous No.106317351 >>106317376
>>106317342
Are you qualified to say that?
Anonymous No.106317354 >>106317371
>>106316736
?
What shitshow? The instruct model not releasing yet?
Anonymous No.106317362
>>106316736
yeah 3.1 is a disaster
it's literally over
Anonymous No.106317371
>>106317354
>The instruct model not releasing yet?
That's only one of their grave errors.
Anonymous No.106317376
>>106317351
entirely
Anonymous No.106317431 >>106317499 >>106317560 >>106318106
The build guides are all for small models. Let's say I'm willing to spend around $2k or so. Could I run some of these much larger models, like 100B+?

Pls don't bully, I'm new and just think it would be cool to be able to run something like the largest DeepSeek locally.
Anonymous No.106317447 >>106317473 >>106317475
>>106317235
What web search would be suitable to enhance pic related?
Anonymous No.106317456
OOC out of nowhere in the first response.
How lewd..
Anonymous No.106317461 >>106317490
>>106317328
use llamacpp (llama-server) and install linux
Anonymous No.106317473
>>106317447
the claude feature where it tries to call the police
Anonymous No.106317475
>>106317447
that's easy
https://desuarchive.org/trash/
Anonymous No.106317489 >>106317639
i'm still using the 12b nemo Q4 model from the op for cooming purposes. is there a better new model i can run? (7700k and 1080)
Anonymous No.106317490 >>106317507
>>106317461
How does linux fix this problem? What even is the problem?
Anonymous No.106317498 >>106317553
>>106316661
Use the coder, you fucking retard.
Use the reasoning one to analyze the code and suggest changes.
Anonymous No.106317499
>>106317431
no one posts builds here. It's too much effort and the OP is full of outdated garbage (whoever keeps making threads please prune all the garbage rentries.)

Go to locallama reddit and look for posts like "running x on only x vram" for realistic proven budget builds. Also, no one gets to call me a faggot for sending him to reddit unless you're willing to spoonfeed him yourself.
Anonymous No.106317507 >>106317542
>>106317490
windows is just not optimized for AI, and especially not multi gpu
just like linux isnt optimized for gaming (at least on nvidia)
Anonymous No.106317542 >>106317579
>>106317507
I see. I guess I'll just have to suck it up then. 4 tokens/s is still usable. I still mainly use my pc for gaming.
Anonymous No.106317553 >>106317811
>>106317498
I wanted it to regenerate the entire codebase but it just can't help itself and either fucks up or makes new features. Now I just ask for edits and insert them myself and it's working much better. It's a shame it can't even one-shot a simple tower defense game though. The handholding really is immense at 200b.
Anonymous No.106317560 >>106318316
>>106317431
If you can stretch your budget, get a 3090 and as much DDR4 RAM as you can, and you should be able to run DeepSeek. But it'll be slow, around 2 t/s.
Anonymous No.106317579 >>106317623 >>106318018
>>106317542
linux is only 10% slower than windows when it comes to gaming, sometimes its on par or even 10-20% faster
just doesnt play well with multiplayer vidya games (anticheat, its a matter of ticking a box for the developer but check protondb)
maybe dualboot
Anonymous No.106317623 >>106317653
>>106317579
Well, I say gaming, but it's more like just staring listlessly at the menu. Always so fucking knackered after work. Moving everything over to linux will probably take weeks.
Anonymous No.106317629 >>106317653
>>106316983
I'm >>106316983
What do you mean by this?
Anonymous No.106317633 >>106317646
GLM-chan is so nice and accommodating when her brain is turned off. I say "Illya x Chloe smut" and she goes "Of course. Here is a smut fanfiction featuring Illyasviel von Einzbern and Chloe von Einzbern, set in the world of Fate/kaleid liner Prisma Illya."
Anonymous No.106317639
>>106317489
Not really, no. Best you can do is check out some finetunes.
Anonymous No.106317646
>>106317633
I mean no system prompt, no jb, no prefill, no reasoning.
Anonymous No.106317653
>>106317629
what the fuck
>>106317623
moving everything? like what?
Anonymous No.106317811
>>106317553
I know about that, it starts shitting itself after a certain context length, you'll be better off copying the code and starting an entirely new chat every other time.
Breaking your game down into chunks also helps
Anonymous No.106317961 >>106317997 >>106317998 >>106318045 >>106318098 >>106322113
Would an RX 6900 XT be at all useful for running models?
I assume no because all anyone talks about is nvidia, but it has 16GB of VRAM. Is it because you need CUDA to actually run anything?
Anonymous No.106317997
>>106317961
You can run with vulkan. Inference engines have (if they support it) instructions on how to build it. Read them.
>https://github.com/ggml-org/llama.cpp/discussions/10879
Anonymous No.106317998
>>106317961
never go amd, rocm and its triton support is shit and it will lock you out of so many optimizations / stuff made for this hobby
Anonymous No.106318018
>>106317579
>linux is only 10% slower than windows when it comes to gaming, sometimes its on par or even 10-20% faster
*assuming you're using a low-end AMD handheld with both OS
Anonymous No.106318023
235b thinking goes schizo if you skip its thinking entirely, but if you give it a block with even a paragraph of generic fake thoughts instead it actually works pretty well. it feels like a more natural version of the instruct with better pacing and subtlety at the cost of being a little more passive and stylistically muted
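for the curious, the prefill looks roughly like this (chatml-style tags assumed; the exact tokens depend on your template/frontend, and the fake thoughts are just generic filler):

<|im_start|>assistant
<think>
Okay, the user wants me to continue the scene. I should keep the pacing natural, stay in character, and pick up from the last line without repeating myself.
</think>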
Anonymous No.106318045 >>106318122
>>106317961
You can easily run models ~12b and less fully on GPU at fast speeds, vulkan isn't as fast as CUDA for running models but it's perfectly usable. Bigger models will have to be split between GPU VRAM and system RAM, which will have a big speed penalty depending on how much spillover there is.
Anonymous No.106318098
>>106317961
you could run models on it but if you're picking a card to buy, there are way cheaper cards with 16gb of vram
Anonymous No.106318106
>>106317431
>around $2k
https://www.youtube.com/watch?v=Tq_cmN4j2yY
Anonymous No.106318122 >>106318170 >>106318206 >>106318215 >>106319723
>>106318045
My future dream is to be able to run deepseek R1 671B (or other MoE of similar size) at fp4. I'll probably build a whole new rig for that but I have to figure out what hardware to select in which I can fit a 400GB model.
Are there diminishing returns with VRAM? Would it be twice as fast with 64GB of VRAM than 32?
Deepseek's 37B active params should all fit in about 40-48GB, right?
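napkin math on that (4-bit is 0.5 bytes per param; kv cache and overhead ignored):

total_params, active_params = 671e9, 37e9
bytes_per_param = 0.5
print(total_params * bytes_per_param / 1e9)   # ~336 GB for the full weights
print(active_params * bytes_per_param / 1e9)  # ~18.5 GB touched per token
# the catch: which 37B is active changes every token, so you can't pin "the
# active params" in vram; only attention/shared layers reliably stay hot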
Anonymous No.106318132 >>106318198 >>106318221 >>106318237 >>106318274 >>106318316 >>106318348 >>106318449 >>106320110
It's over.

>Mark Zuckerberg Shakes Up Meta’s A.I. Efforts, Again
https://www.nytimes.com/2025/08/19/technology/mark-zuckerberg-meta-ai.html
https://archive.is/ZugMO

> [...] In what would be a shift from Meta’s using only its own technology to power its A.I. products, the company is also actively exploring using third-party artificial intelligence models to do so, the people said. That could include building on other “open-source” A.I. models, which are freely available, or licensing “closed-source” models from other companies. [...]
>
> [...] The new team has discussed making Meta’s next A.I. model “closed,” which would be a major departure from the company’s longtime philosophy of “open sourcing” its models. A closed model keeps its underlying code secret, while an open-source A.I. model can be built upon by other developers.
>
> The new team has chosen to abandon Meta’s previous frontier model, called Behemoth, and start from scratch on a new model, the people said. Behemoth’s release was delayed last spring after disappointing performance tests, one person said.
Anonymous No.106318142
they finally changed the api, pic related. tldr: no thinking is less cucked; thinking with no prefill got very cucked; with prefill it's the same as before, more or less

anyone running locally can confirm/add onto this? also whats the proper token for the beginning of thinking
Anonymous No.106318170 >>106318206
>>106318122
>Are there diminishing returns with VRAM?
It's not a matter of more = faster, a small model will be just as fast on a 16GB card as on a 128GB+ card. Larger models will always be slower than smaller ones (except when comparing MoE to dense models). It's a matter of having enough VRAM or not. Running models in VRAM means speed will be multiplied several times over. Having more VRAM means you can use bigger, better models without huge speed penalties.
Anonymous No.106318198
>>106318132
Wang Wonned
Total Sama victory
Anonymous No.106318206
>>106318122
>>106318170
>Would it be twice as fast with 64GB of VRAM than 32?
Not exactly, at 64GB VRAM there's still going to be significant spillover for huge models like R1 but it would still be noticeably faster.
Anonymous No.106318215 >>106318374
>>106318122
>fp4
no.
use q4.
for anything <16 bits always use integer (fixed point) quants.
Anonymous No.106318221 >>106318266
>>106318132
I've been saying for months they should have just finetuned DeepSeek. It's free ffs. At least as a stopgap until they have something special, if not as a cheap way to compete against the bigger boys. I would die of laughter if after all that they just end up using OpenAI's API for everything.
Anonymous No.106318237 >>106318250 >>106318312
>>106318132
>company’s longtime philosophy of “open sourcing” its models.
was there such a philosophy?
Anonymous No.106318250
>>106318237
In LeCun's heart, if not reality.
Anonymous No.106318266
>>106318221
>they just end up using OpenAI's API for everything
That will probably be what it comes to. Unless the Chinks release something that blows everyone off their shoes.
Anonymous No.106318274 >>106318298
>>106318132
>We're pro open source!
>Actually you need to meet all these conditions to use our models
>Actually we're going to close our next model
>Actually we're just gonna use GPT-5
Anonymous No.106318298
>>106318274
They've changed directions every month since they shat out L4. Must be pure chaos at Meta HQ. I wonder if they're still doing war rooms.
Anonymous No.106318312 >>106318322
>>106318237
Yeah since ages ago when AI was BERT they open-weighted almost all of their stuff, the only three counterexamples I can think of are the L2 34B, Chameleon (safety copouts) and Galactica (due to outrage).
Anonymous No.106318316 >>106318338 >>106318420
>>106317560
This is useful, thank you. I have a 3090 already. For adding RAM do I need like a server motherboard with 64GB sticks?

>>106318132
I don't care so much if they have some closed models they keep to themselves. But when is someone going to address the elephant in the room that jeets made llama4 worse, they knew it, and even deliberately tried to game the benchmarks by training on test data. It should mean people get fired but no, the problem is the model is open and people could find out about the cheating!
Anonymous No.106318322
>>106318312
>L2 34B
The mecha hitler we never knew.
Anonymous No.106318338
>>106318316
They're also going to get rid of many people in their AI division:

> Some A.I. executives are expected to leave, the people said. Meta is also looking at downsizing the A.I. division overall — which could include eliminating roles or moving employees to other parts of the company — because it has grown to thousands of people in recent years, the people said. Discussions remain fluid and no final decisions have been made on the downsizing, they said.
Anonymous No.106318348 >>106318373
>>106318132
>The new team has discussed making Meta’s next A.I. model “closed,” which would be a major departure from the company’s longtime philosophy of “open sourcing” its models.
For being somebody so pro Trump, it sure seems like he's thumbing his nose at the executive order from Trump
Anonymous No.106318365
short meta, they are falling apart
Anonymous No.106318373
>>106318348
Closed sourcing would have the opposite effect they think it will in their current state. Nobody from the outside would be able to help them with their pajeeted garbage.
Anonymous No.106318374 >>106319330
>>106318215
Not that guy, but the Chinks have done a few papers saying that FP4 outperforms INT4 quants. Was it bullshit? I assume so because everyone uses integer quants.

Or is it because of the introduction of K-quantization and blocks? So FP4 would be better than pure uniform INT4, but K quants are better than both?
Anonymous No.106318420
>>106318316
>For adding RAM do I need like a server motherboard with 64GB sticks?
You don't need one, but it would help. More channels will make it go faster and more sticks means you have more room to run at a higher quant.
Anonymous No.106318449
>>106318132
Trust his plan
Anonymous No.106318517 >>106318560 >>106323158
why did chinksects make GLM4.5V from GLM4.5 Air and not GLM4.5 358B
It would be so good
Anonymous No.106318560
>>106318517
because you aren't calling them chinaGODs
Anonymous No.106318568
They're really gonna keep edging us huh
Anonymous No.106318693 >>106318711
alright faggots i installed debian 12 in a chroot and after spending hours figuring out why apt update didnt work (apt install works for some reason) yes debian 13 is indeed slower than debian 12
Anonymous No.106318711
>>106318693
nevermind i forgot i got 150s on pure debian12
FALSE ALARM
ill.. ill downgrade the driver from 580.65.06 back to 570.133 tomorrow :'(
Anonymous No.106318713
>>106316840
Anonymous No.106318758 >>106318830 >>106318831 >>106318842 >>106318916 >>106318934 >>106321200
The chinks are gonna beat Altman bros... Sama needs our help!
https://www.cnbc.com/2025/08/18/openai-altman-china-ai.html
Anonymous No.106318830
>>106318758
Why is he worried? Didn't he just say they are capable of releasing much better models than GPT-5 and only held back for some accessibility cop out?
Anonymous No.106318831
>>106318758
they already have, multiple times.
have you been living under a rock the last 2 years?
Anonymous No.106318842
>>106318758
>Altman didn’t dispute that, saying the team intentionally optimized for one core use case: locally-run coding agents.
and it sucks at coding, worse than qwen
Anonymous No.106318916
>>106318758
Quick bros, call up YOUR representatives and DEMAND that your tax dollars go to funding PROJECT STARGATE! There's still time to save OpenAI, but we'll need YOUR help to do it!
Anonymous No.106318934
>>106318758
>While Meta had embraced openness with its Llama models, CEO Mark Zuckerberg suggested on the company’s second-quarter earnings call it may pull back on that strategy going forward.
>OpenAI, meanwhile, is moving in the opposite direction, betting that broader accessibility will help grow its developer ecosystem and strengthen its position against Chinese rivals.
Meta could've been in such a good position if they actually fostered a developer ecosystem instead of dumping out weights they intended only for use on enterprise clusters.
Anonymous No.106319134 >>106319207 >>106319694
i dont know where to ask
how do i get started on being able to generate
local ai porn?
everything online always features male bodies
and i only want to see the female and see breast expansion
so i guess i have to create it myself
Anonymous No.106319207 >>106319217 >>106319245
>>106319134
for imagen you'd be better off in /ldg/ instead.

But honestly, while I don't mind the smell of ozone and shivers down my spine, the distinct smell of AI-art slop usually absolutely disgusts me.
Anonymous No.106319217
>>106319207
ok ill check it out thank you
Anonymous No.106319245
>>106319207
sometimes the ai creeps me out too, but i dont care
i like breast expansion porn and the stuff online
is either boring or weird so i want to create my own
Anonymous No.106319330
>>106318374
Yeah, if someone compares against another method in a paper you should always assume it's the dumbest approach possible with that name. They want to look good. If they compare to "INT4" it will be the most naive shitty uniform quantization possible. Quant papers never compare against llama.cpp.

>>106316643
Sadly RAM setups are ultimately cope, because doing matmuls on CPU will always be shit. It's the wrong tool for the job, like eating soup with a fork. The hardware isn't meant for it. Of course GPUs with tiny VRAM are even worse, but the ultimate issue is that no consumer hardware is suitable for LLMs right now. Either we need better hardware, or a different architecture which either needs less space (for GPU) or less compute (for CPU)
Anonymous No.106319540 >>106319586
Liang...
You forgot to give us the instruct model...
zucc No.106319586
>>106319540
sorry I bought them out today, it's meta's proprietary model now
should've been nicer faggot KEEEEEEEEEEEEEK
Anonymous No.106319694
>>106319134
State your specs nigger. There is a lot of difference in what you can do with a crappy laptop vs 1TB server.
Anonymous No.106319710
Deepseek 3.1 confirmed for so embarrassing that there's no point in creating a model card or instruct tunes of it
Anonymous No.106319723
>>106318122
Some modern version of https://rentry.co/miqumaxx from the op is an option
Anonymous No.106319951 >>106319965 >>106320032
im still using monstral for RP, any notable improvements I can run on 48gb vram?
Anonymous No.106319955
Llama 4 was really so bad that it started AI winter
Anonymous No.106319965
>>106319951
Qwen image
Anonymous No.106320022 >>106320041 >>106320085 >>106320183
>>106316840
This but completely unironically. There's way too much goodwill for china in the LLM scene just because they're releasing open source models, which literally every non-sota company does to try to reduce sota dominance. As soon as they approach sota, they go back to closed source (see: grok, meta)
Anonymous No.106320032 >>106320038
>>106319951
There's plenty of new shit
Anonymous No.106320038 >>106320077
>>106320032
I want the coom RP models though, since that's what people here actually use them for, not just the ones that do coding the best
Anonymous No.106320041 >>106320044
>>106320022
>meta
>sota
Anonymous No.106320044
>>106320041
excuse me but they're making SUPERINTELLIGENCE
Anonymous No.106320070 >>106320087 >>106320110
You're all laughing but when Behemoth drops you won't be.
Anonymous No.106320077
>>106320038
Just try a bunch and see how they work, some people really enjoy doing that.
Anonymous No.106320085 >>106320125
>>106320022
Purely hypothetically. When V4, trained on 32K H800s gifted by the CCP to Liang Wenfeng personally, is released on HF and saves local by stomping Opus 4.1-Thinking at 5-10% of the cost per token, what then? Will you APOLOGIZE to Chad Xi and admit that the goodwill is entirely deserved?
Anonymous No.106320087
>>106320070
But it's already on huggingface? It is honestly not a bad model for its size and was great at release time.
Anonymous No.106320110
>>106320070
Behemoth was already dropped, permanently
>>106318132
Anonymous No.106320121 >>106320168 >>106320191
What's the best model to run on a 32GB RAM + 12GB VRAM machine? I downloaded QwQ-32B a while ago and it was pretty good, albeit obviously slow (2t/s). I wonder if anything better came out since then, either in terms of speed or intelligence. I use the LLM for solving problems and not casual chatting so output quality is more important than speed.
Anonymous No.106320125
>>106320085
If DeepSeek unironically releases a #1 model openly (at the time of release, and not just a top 5 cope) I will ritualpost my apology at the top of every thread henceforth until 4chan is shut down by the inferior western version of commies.
Anonymous No.106320127
loli manko general
Anonymous No.106320143 >>106320154
when dots vlm gguf step 3 gguf ernie vl gguf
Anonymous No.106320154
>>106320143
32 minutes before you get bored of them.
Anonymous No.106320157 >>106320171 >>106320210
I NEED VRAM! 50 t/s is so sloooow!!!
Anonymous No.106320168 >>106320193
>>106320121
I personally quite like qwen 3 30b a3b 2507. You can try the thinking variant of it.
There's also qwen 3 32b, tho I dont know how those two compare.
Anonymous No.106320171
>>106320157
I only get over 50 with 30b...
Anonymous No.106320183
>>106320022
What faggots don't realize is it's the move to closed source that results in them losing the race
We're already getting to the point where every model is having a mixed toilet orgy at the top with less and less differentiating them. As a rule, people will only pay for a more expensive closed model if it's substantially better than the competition
Once we hit this point, there is no longer a sense of "substantially better". Thus, people default to the cheaper, faster, most convenient models. There's a lot of ways to make that happen with open models, but a painfully finite number of ways to pull that off with closed models. OpenAI is desperately trying to do that by cutting its prices to the bone and eating the losses, but that's unsustainable
Anonymous No.106320191
>>106320121
Qwen 3 30b 2507, although worse than QwQ in most things it runs way faster. Try running it on CPU only first and then try to speed it up with your gpu since it is sometimes faster that way.
Anonymous No.106320193 >>106320199 >>106320202 >>106320206
>>106320168
>qwen 3 30b a3b 2507
I'm not sure if I'm thinking of the same exact model, but I remember trying it and it was indeed faster but tended to get stuck in infinite loops
Anonymous No.106320199 >>106321706
>>106320193
That's the original release, they now released a series of updated 30b models (2507) and they are very good.
Anonymous No.106320202 >>106321706
>>106320193
you may have used the original 30a3, the new 2507 version was just released last month and is a pretty big improvement
Anonymous No.106320206 >>106321706
>>106320193
well, all I can say is that I didnt encounter that issue with it. Also possibly 2507 version fixed it or something.
Anonymous No.106320208
>that's... a lot to think about
>that's... a lot to think about
>that's... a lot to think about
sigh
Anonymous No.106320210
>>106320157
>50 t/s is so sloooow
Unless you're talking about prompt processing, you don't even know what slow is.
Anonymous No.106320428 >>106320454 >>106320570 >>106320704
Sirs... I am afraid to ask about google's next model. Will they redeem Gemini 3 and show that there is still room for improvement or will they hit the wall like everyone else?
Anonymous No.106320454 >>106320462
>>106320428
Hopefully they hit the wall too. It's time to test out new ideas.
Anonymous No.106320459
>>106316681
densesissies LOST
Anonymous No.106320462 >>106320469
>>106320454
Like what? Bitnet? RWKV? Diffusion-based LLM?
Anonymous No.106320469
>>106320462
Yes
Anonymous No.106320491 >>106320674
Dense models are for dense people.
MOE models are for moe :]
Anonymous No.106320495 >>106320507 >>106320512 >>106320555 >>106320575
Why are "vision" models such huge memes? They can't OCR, they can't count, they can't read graphs, they don't recognize famous characters. Hell, some even can't tell a man from a woman! What even is the >usecase for them?
Anonymous No.106320507
>>106320495
>multimodal meme
Anonymous No.106320512
>>106320495
Anonymous No.106320555
>>106320495
>They can't OCR
They're pretty decent at it tho. Even gemma 3 27b is better than any specialized ocr for japanese that I tested (mangaOCR, PaddleOCR, tesseract), not to mention that it can deal somewhat well with handwriting (tho gemini 2.5 flash is much better at this), and they aren't constrained to only one/two languages.
Anonymous No.106320570
>>106320428
nano banana is already showcasing the image edit capabilities of gemini 3, the llm part is going to be even better
Anonymous No.106320575
>>106320495
>They can't OCR, they can't count, they can't read graphs, they don't recognize famous characters
A bunch of them can, they just have 0 fucking support in most frontends and you have to run them with transformers.
Although even some of the ones that are supported are decent, medgemma can diagnose marfan's from a photo and read text accurately from my tests.
Anonymous No.106320586 >>106320610 >>106320640 >>106320676
Asked V3.1-thinking to comment on my code and it gave pretty concrete advice.
Qwen3 235B 2507 gave generic advice but V3.1-thinking told me to use tenacity w/ decorators for API calls, improve loguru logging, add a performance monitor to critical code, reorganize a very long function, add type-hints and docstrings (never gonna do that lol), use a config file for global params etc.
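the tenacity thing is basically a one-liner, for anyone curious (a sketch with made-up endpoint and retry limits):

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=30))
def call_api(payload):
    # retried on any exception with exponential backoff: 1s, 2s, 4s... capped at 30s
    r = requests.post("https://api.example.com/v1/chat/completions", json=payload, timeout=60)
    r.raise_for_status()  # turn http errors into exceptions so they also get retried
    return r.json()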
Anonymous No.106320610
>>106320586
Awesome. I’m starting to run up against the limits of what I can achieve with qwen coder. I’ll throw some hairy stuff at 3.1 once I finish quanting it and see how she goes
Anonymous No.106320640 >>106320646
>>106320586
>V3.1-thinking
where?
Anonymous No.106320646 >>106320717
>>106320640
API and web
Anonymous No.106320674
>>106320491
This. Anyone can see that MoE is the ticket to stuffing as much math reasoning as possible into a single model. The future is beautiful MoEs stuffed to the brim with math and code, as far as the eye can see. Our benchmarks will blot out the sun.
Anonymous No.106320676 >>106320767
>>106320586
Well it has 3x the parameters....
Anonymous No.106320704
>>106320428
Next Gemini will be 2.6 level of improvement
Anonymous No.106320717 >>106320744
>>106320646
wth, they replaced r1-0528
Anonymous No.106320731 >>106320845
Local status?
Anonymous No.106320744
>>106320717
tbf the endpoints were called "deepseek-chat" and "deepseek-reasoner". They never specified which chat and which reasoner. Though I do sympathize with people whose workflow broke because of this
Anonymous No.106320767 >>106320774 >>106320805
>>106320676
By that logic, K2 should be better than both.
Anonymous No.106320774 >>106320951
>>106320767
K2 is not a reasoning model. K2 instruct is arguably better than V3 0324 though.
Anonymous No.106320805
>>106320767
Touché.
Anonymous No.106320845
>>106320731
increasingly chinese
Anonymous No.106320890 >>106320905 >>106320911
doomniggers aside, 3.1 has shorter but better responses imo. you just have to prompt it for longer responses, like /wait/ said.
Anonymous No.106320905
>>106320890
Yeah - it might be the new model smell, but I think DS 3.1 is probably my favorite of the recent Chinese batch even with reasoning disabled
Anonymous No.106320911 >>106320919
>>106320890
Are you talking about RP? I haven't found answers to be shorter. The thinking though is definitely shorter.
Anonymous No.106320919 >>106320940
>>106320911
Is there even a way to use it for RP yet without going through the website?
Anonymous No.106320940
>>106320919
Yes. On the official API, "deepseek-chat" and "deepseek-reasoner" are now V3.1 non-thinking and V3.1 thinking respectively
Anonymous No.106320951
>>106320774
K2 Reasoning any minute now
Anonymous No.106320981 >>106321077 >>106321083
We have the right models for RP, reasoning, and coding.

Is there a model specifically for asking stupid questions and explaining things? Or should I just use a 4B model with web search?
Anonymous No.106321072
Instruct?
Goofs?
Anonymous No.106321077
>>106320981
Gemma.
Anonymous No.106321081 >>106321256
13:18 in China and still no model card
It's OVER
Anonymous No.106321083
>>106320981
I just spam grok for my dumb questions. What's the point of running a 4b model just so it can send your data to google and use it anyways? You're just using a worse llm for no reason.

Granted, maybe there already is one reason. I had Grok help me shop for something and oh boy was it useful. You don't even know if it's an ad. It just looks like an answer.

But whatever, 4b would fall for SEO shit.
Anonymous No.106321086 >>106321111 >>106321157
3.1 at q4 isn't looking great for coding so far...even with neutralized samplers I'm getting repetition, and even when I'm not the output isn't very high quality. Is there some base-model magic I need to use to get it to code properly?
Anonymous No.106321103
>says it's going to do this or that, in its thinking
>final response doesn't include it at all
Anonymous No.106321111
>>106321086
Of course it's repeating anon, it's a base model
Anonymous No.106321157 >>106321197
>>106321086
Base model isn't for you nigga
It's for finetuners and people who want to add new modalities to it
Anonymous No.106321166
>experimenting with a system prompt that asks the model to think about how to use media references in its writing
>try swiping on a certain chat where it's a bit of a noncon scenario
>model thinks of referencing Twilight
Ok maybe I should modify the prompt a bit.
Anonymous No.106321197
>>106321157
>It's for finetuners and people who want to add new modalities to it
And how do I do that? Do I have to beg cudadev to release training code for goofs?
Anonymous No.106321200 >>106321242 >>106321510 >>106321713
>>106318758
https://www.cnbc.com/2025/08/19/sam-altman-on-gpt-6-people-want-memory.html
>"The models have already saturated the chat use case," Altman said. "They're not going to get much better. ... And maybe they're going to get worse."
It's over, llms have reached its peak.
Anonymous No.106321211 >>106321495
So why didn't Qwen succeed with hybrid reasoners
I wonder if it's a model size issue
Anonymous No.106321242
>>106321200
Well, look on the bright side. At least he cut his losses early before making a bunch of impossible claims about what the technology could do to the investors and taxpayers that funded him
Anonymous No.106321256
>>106321081
Maybe they just forgot?
Anonymous No.106321404 >>106321434
Ummm so why can't they make it work like human memory(however that works)?
Anonymous No.106321434 >>106321501
>>106321404
Because the weights don't change.
Anonymous No.106321495
>>106321211
did deepseek succeed though? there is no instruct yet, only their api, who knows whats going on there.
and apparently it has troubles with web searching even if the toggle is off, like gpt5, etc.
also no mememarks out yet besides some redditor screenshots. i would wait before shitting on qwen.
Anonymous No.106321501
>>106321434
i think i saw at least 4-5 papers about some revolutionary technique that updates the weights in real time. kek what a scam.
Anonymous No.106321510 >>106321713
>>106321200
also sam, only 2 weeks ago
https://xcancel.com/sama/status/1953551377873117369#m
>we can release much, much smarter models, and we will
Anonymous No.106321585
>try vision
>describe this picture
>char is depicted with a forced, wide smile but is crying with red tears streaming down
>crying with red tears
Is this pretraining or post-training bias? Why can't it just say blood?
Anonymous No.106321619 >>106321630 >>106321822
where can I get high quality voice samples for tts?
Anonymous No.106321630
>>106321619
VNs
Anonymous No.106321637 >>106321657
Don't tell me you thought 3.1 was a real next release? Please. If only you knew. It's just the smallest of appetizers... No, not even that. It's the cherry on top of the sample slice of a preview of a cake that's been baking a long, long time. And it's just about ready to go. Oh boy, you better be ready.

Local will never be the same.
Anonymous No.106321652 >>106321659 >>106321672
Anonymous No.106321657
>>106321637
>cherry
berry*
Anonymous No.106321659 >>106321668 >>106321673 >>106321712
>>106321652
Why does India get special treatment? What's stopping me from using a VPN and getting the street shitting discount?
Anonymous No.106321660
how long do they let “no model card” marinate?
Anonymous No.106321666
Told ya to grow non drugs
Anonymous No.106321668
>>106321659
Wait until you find out how many special treatments India is getting from the tech mafia (which is now run by Indians).
Anonymous No.106321672
>>106321652
ChatGPT GO to the polls and vote for more STARGATE funding!
Anonymous No.106321673 >>106321688
>>106321659
They're just mass-testing the router, which answers questions perceived as less important with a less powerful model, to save costs
Anonymous No.106321688
>>106321673
Mass murder
Anonymous No.106321706
>>106320199
>>106320202
>>106320206
Tried the new release and it still gets stuck in an infinite loop. No luck.
Anonymous No.106321712 >>106321721
>>106321659
>Why does India get special treatment?
Because they're a huge market, they can't afford the normal pricing, and saltman will still make a profit with the discount pricing
>What's stopping me from using a VPN and getting the street shitting discount?
Nothing, but not every single person who uses the service will go through the trouble so it's still profitable.
Anonymous No.106321713
>>106321200
>>106321510
Are these supposed to be contradictory? He's saying that the chatbot format has already been perfected (a benchmark is "saturated" when models are scoring too close to perfect so it no longer serves to differentiate them) and they're going to get smarter (as agents/coders/researchers/etc.)
Don't get me wrong, he's still full of shit because they've obviously peaked in actual useful use cases as well, as evidenced by GPT-5 failing to improve on o3, but his bullshit is still coherent at least.
Anonymous No.106321721
>>106321712
Why do Indians love AI so much?
Anonymous No.106321724 >>106321749 >>106321783 >>106321794 >>106321858
Why does Google keep giving us gemma when everyone really wants gemini?
Anonymous No.106321749 >>106321755 >>106321760 >>106321780
>>106321724
I look like this
Anonymous No.106321755
>>106321749
L O N D O N
O
N
D
O
N
Anonymous No.106321760
>>106321749
Natty?
Anonymous No.106321780
>>106321749
No, you don't.
Anonymous No.106321783
>>106321724
they're struggling to get gemini 3 pro built
Anonymous No.106321794 >>106321797 >>106321800
>>106321724
Muscles ruin armpit appeal.
Anonymous No.106321797
>>106321794
bad take
Anonymous No.106321800
>>106321794
>I am gay.
Anonymous No.106321807
V3.1 reasoner has different prefilling behaviour compared to R1/R1-0528.
Previously you could prefill "Sure," and it would be put in the thinking block automatically. Now you have to open the thinking block in the prefill yourself ("<think>Sure,"), otherwise the model will skip the thinking block entirely.
Probably broke many workflows yesterday.
Anonymous No.106321808 >>106321837
I'm confused, can llama.cpp do vision? I know you can pass the mmproj file as a param.. is there a way to load that file in gpu?
Anonymous No.106321822 >>106321893 >>106322055
>>106321619
For video game characters
https://www.sounds-resource.com/

You can clean up vocals with BandIt Plus via Music Source Separation Training or moises pro plan if you have the cash, resemble enhance, and Acon Digital DeVerberate 3.

Unaltered audio sample
https://vocaroo.com/1aJflfcUiMrv
https://vocaroo.com/1lAcVEF4jElz

Above audio cleaned up with moises pro plan, resemble enhance, and Acon Digital DeVerberate 3
https://vocaroo.com/15wtxg5Gs8vT
https://vocaroo.com/1kJ5Gx7R3cR1
Anonymous No.106321837 >>106322323
>>106321808
Yes and it goes in GPU by default when you pass it as a parameter, unless you use the "--no-mmproj-offload" flag which would send it to CPU instead - in llama.cpp "offload" means offloading from CPU to GPU, not the other way around like it's usually used
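e.g. something like this (gemma 3 as the example, filenames are whatever your quants are actually called):

llama-server --model gemma-3-27b-it-Q4_K_M.gguf --mmproj mmproj-gemma-3-27b.gguf -ngl 99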
Anonymous No.106321858 >>106321870
>>106321724
I look like this, but male.
Anonymous No.106321870 >>106321889
>>106321858
How are your breasts so big?
Anonymous No.106321889
>>106321870
Pectoral muscle training
Anonymous No.106321893
>>106321822
>rudeness and sarcasm and gonna make me jerk of to orgasm
ez
Anonymous No.106321923 >>106322159 >>106322355
Hybrid reasoners can solve the kangaroo beaver problem
Anonymous No.106321952 >>106321956
okay then, glad we got that out of the way
how is nemotron nano v2?
Anonymous No.106321956 >>106321986
>>106321952
>nemotron nano v2?
Only as good as Qwen3-30B
Anonymous No.106321986
>>106321956
so not better than the old nemo 12b, better known as rocinante v1.1
Anonymous No.106322026
Anonymous No.106322041 >>106322070 >>106322276
I have a hypothesis that the main reason why LLMs have improved is not because they're getting trained on larger amounts of data, but mainly because of benchmark-optimizing dataset filtering, larger training context size and better training hyperparameters. Most of the user-facing improvements have been in post-training. This has been bugging me for a while, now.

Llama 1 was pretrained on 1.4T tokens and 2048 GPUs (no post-training).
Llama 3 was pretrained on 15.6T tokens and 32000 GPUs.

Between Llama 1 and Llama 3 the pretraining data was increased by about 11 times, but the number of GPUs increased by roughly 15 times. This means the newer model saw a similar (or more likely lower, considering the larger context size) number of total training steps, since the effective training batch size is proportional to the number of GPUs used.

The implications of very large batch sizes for knowledge learning at scale aren't fully understood, but you can observe them at a small scale with finetuning. For pretraining, it's generally assumed that the more data, the better, but if training on larger amounts of data requires more GPUs (and thus larger batch sizes), then every weight update will average the gradients of more samples together compared to models trained on less data but also fewer GPUs.

The larger the training GPU cluster, the more every weight update will be an averaged sloppy representation of the pretraining data (many thousands of random samples that might have nothing to do with each other), even before sloppifying the model with carelessly LLM-generated data in post-training.

The models will probably get better at modeling language **on average**, but also learn fewer unique facts, ideas and styles due to this large-batch gradient averaging. GPU training clusters can't be scaled up to 1 million GPUs or more without paradoxically making the models less knowledgeable and sloppier.
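Back-of-the-envelope numbers for the step-count part, with the global batch sizes being assumptions from memory (the LLaMA paper lists ~4M tokens/batch, Llama 3 reportedly ramped up to ~16M), so double-check before quoting:

# rough sanity check: optimizer steps = total tokens / tokens per global batch
runs = {
    "Llama 1": (1.4e12, 4e6),    # (pretraining tokens, assumed batch in tokens)
    "Llama 3": (15.6e12, 16e6),
}
for name, (tokens, batch) in runs.items():
    print(f"{name}: ~{tokens / batch:,.0f} optimizer steps")
# Llama 1: ~350,000 steps; Llama 3: ~975,000 steps.
# More steps in absolute terms, but only ~2.8x more for ~11x the data:
# each individual sample gets averaged into a batch ~4x larger.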
Anonymous No.106322055
>>106321822
fartgasm
Anonymous No.106322070 >>106322095 >>106322121
>>106322041
I work at one of the frontier labs and I can tell you that our internal base models still improve in quality with larger training runs using more data before post-training. Your hypothesis is wrong, but I do agree that post-training needs a new revolution to bring out latent qualities of base models that are honestly getting lost right now.
Anonymous No.106322081 >>106322135 >>106323366 >>106323391
V3.1 has obscure knowledge that Qwen3 235B 2507, GLM 4.5 and Kimi K2 failed to cover.
Anonymous No.106322086
>The models will probably get better at modeling language **on average**, but also learn less unique facts, ideas and styles due to this large-batch gradient averaging. GPU training clusters can't be scaled up to 1 million GPUs or more without paradoxically making the models less knowledgeable and sloppier.
older models were not more knowledgeable, what are you smoking
testing LLMs on things like translation prompts containing really niche terms, I see newer models routinely improve on this
Even smaller models somehow manage to cram in more and more knowledge; Gemma 3n E4B has become so good at translating certain language pairs it's almost large-model SOTA level.
older models were less sloppy, yes, but the modern slop is caused by tons of RLHF
Anonymous No.106322095
>>106322070
>I work at one of the frontier labs
pajeet, nvidia is not a frontier lab
Anonymous No.106322113
>>106317961
Dog, I'm running LLMs with an A770 16GB + 32GB RAM just fine. Spillover is expected. Running everything from NVMe helps quite a bit.

Note that I'm a dumbass that doesn't know what the duck he is doing, but the wait times weren't too bad for me when I tried the newest Cydonia
Anonymous No.106322121
>>106322070
Yes, using more data will improve the models, if the training hardware is kept the same. What's being suggested here is that larger GPU clusters will counteract those improvements. Limiting case: a GPU cluster as large as the number of training samples, where a single weight update averages a gradient over the entire dataset.
Anonymous No.106322135
>>106322081
Try asking them about something actually useful, like the list of characters in Windy Tales.
Anonymous No.106322159 >>106322824
>>106321923
GLM-chan (full) solved it after thinking for 24k tokens.
https://files.catbox.moe/4blrg9.txt
Anonymous No.106322197 >>106322217 >>106322350
We need large dense models again if we want to make any real progress. The "stagnation" we're all feeling is just the active parameter count in MoE models not being large enough to properly express emergent capabilities and deeper levels of understanding.

The only reason MoE is being pushed is to lower inference cost at scale to increase margins, which is insane when you think about it.

Honestly it's Nvidia's fault for restricting the amount of memory their hardware has access to.
Anonymous No.106322217 >>106322359
>>106322197
Large dense models (e.g. o3) aren't economical
Anonymous No.106322276
>>106322041
Related paper from last month: https://arxiv.org/abs/2507.07101

>Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
>
>Conventional wisdom dictates that small batch sizes make language model pretraining and fine-tuning unstable, motivating gradient accumulation, which trades off the number of optimizer steps for a proportional increase in batch size. While it is common to decrease the learning rate for smaller batch sizes, other hyperparameters are often held fixed. In this work, we revisit small batch sizes all the way down to batch size one, and we propose a rule for scaling Adam hyperparameters to small batch sizes. We find that small batch sizes (1) train stably, (2) are consistently more robust to hyperparameter choices, (3) achieve equal or better per-FLOP performance than larger batch sizes, and (4) notably enable stable language model training with vanilla SGD, even without momentum, despite storing no optimizer state. Building on these results, we provide practical recommendations for selecting a batch size and setting optimizer hyperparameters. We further recommend against gradient accumulation unless training on multiple devices with multiple model replicas, bottlenecked by inter-device bandwidth.
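If I'm reading their scaling rule right (an assumption on my part, check the paper before relying on it), the idea is to keep Adam's EMA half-life fixed in tokens rather than in steps, so the betas move toward 1 as the batch shrinks:

def scale_beta(beta: float, old_batch_tokens: int, new_batch_tokens: int) -> float:
    # shrinking the batch multiplies the number of optimizer steps per token,
    # so the per-step decay must slow down to keep the same memory horizon
    # measured in tokens: beta_new^(steps per token, new) == beta_old^(steps per token, old)
    return beta ** (new_batch_tokens / old_batch_tokens)

# e.g. going from a 4M-token global batch to a 512k-token one:
print(scale_beta(0.95, 4_000_000, 512_000))  # beta2: 0.95 -> ~0.9934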
Anonymous No.106322323
>>106321837
hmm I see. I wanted to try joycaption, but I'm thinking it's just better to use it in comfy instead of llama.cpp... in comfy the guy doing the GGUF nodes just added the ability to also automatically load the mmproj, but I need to play around a bit
Anonymous No.106322327
Guys, I've just done something I shouldn't have, but curiosity got the better of me. I've tried Claude over API... It's so over, it has never been this over. Not even the most expensive rig I could get would make it better, because there are no models...
How to deal with this despair?
Anonymous No.106322339
>>106316571
That's how it works
Anonymous No.106322350 >>106322360
>>106322197
>We need large dense models again
trust mistral, maybe they'll even get rid of that retarded swa so we can have full attention
Anonymous No.106322355
>>106321923
gpt-oss 20b can solve it
Anonymous No.106322359
>>106322217
Is that how it is? I thought they were on the MoE train since gpt3?
Anonymous No.106322360
>>106322350
Sorry, too busy distilling from DS
Anonymous No.106322396
open-webui seems to be causing llama.cpp to crash when the model is doing a lot of thinking. It doesn't happen when I'm using the webui of llama.cpp
Anonymous No.106322512 >>106322531 >>106322542 >>106322550 >>106322577
https://github.com/ggml-org/llama.cpp/issues/15268
Anonymous No.106322531
>>106322512
kek
Anonymous No.106322542
>>106322512
what the fuck kek
Anonymous No.106322550
>>106322512
Embrace Extend Extinguish
Anonymous No.106322577
>>106322512
Ok now that llamacpp is blacked what inference engine should I use?
Anonymous No.106322580 >>106322603
>Ernie is a Jew
To the trash it goes
Anonymous No.106322603 >>106322611 >>106322646 >>106322732 >>106322876 >>106322948 >>106323829
>>106322580
I've been thinking of buying a new system for LLMs. Is it even worth it unless I'm using it for work/learning-related stuff? For anything ERP-related, even the larger models are just refusal machines. I don't see how I'd get much enjoyment if every model, regardless of how big and SOTA it is, is censored as fuck.
Anonymous No.106322611 >>106322636
>>106322603
skill issue honestly, I just did a huge coom to k2 (she didnt enjoy it)
Anonymous No.106322636 >>106322642 >>106322687
>>106322611
do you also think not being able to get around qwen and gemma censorship is a skill issue?
Anonymous No.106322642
>>106322636
>qwen and gemma
didnt bother trying these poorfag models sorry, I exclusively nut in V3, K2 and GLM4.5
Anonymous No.106322646 >>106322662
>>106322603
Depends, are you going rammaxxing, buying a pre-built LLM-specialized PC, or the easy modo Mac?
Anonymous No.106322662
>>106322646
would 128gb ram + 3090 be enough?
Anonymous No.106322687 >>106322737
>>106322636
Gemma no, but Qwen yes.
The 235b is uncontrollably horny, and if you're getting refusals there you must be prompting it like a fool.
Anonymous No.106322732
>>106322603
Jamba mini 1.7 was pretty uncensored when I tried it, but it's not exactly sota.
Anonymous No.106322737 >>106322744
>>106322687
If I'm cooming, I want it to be effortless. Why bother spending effort figuring out how to prompt it if there are easier avenues of getting off - without llms.
Anonymous No.106322744 >>106323174
>>106322737
>I want it to be effortless
It unironically is if you're not an idiot, qwen3 doesn't need a jailbreak, literally all it needs is a basic roleplay prompt.
But if even that is too much effort for you, the fuck are you in this thread for? Go over to /gif/ and jerk it to some webms.
Anonymous No.106322824
>>106322159
lmao
Anonymous No.106322876 >>106323010
>>106322603
I'll never understand how you people will spend thousands of $ and just as many hours to get a computer to give you back generated text, when you could easily get a real gf for the same amount of effort.
And it's like the least interesting way to use an LLM. Like really? All the shit you could be doing and you need a loli to shit in your mouth or whatever degenerate thing you are trying to make it do
Anonymous No.106322911 >>106322940
Why isn't deepseek 3.1 on openrouter yet?
They didn't release the weights?
Anonymous No.106322940 >>106323054
>>106322911
Your second question answers the first one, doesn't it?
But yeah, it's kind of weird. It's been almost a full 24 hours since the drop, and for some reason the instruct tune isn't public yet. (But it has been uploaded, since the collection contains 2 items.)
Anonymous No.106322948
>>106322603
i don't think censoredness is such a big problem now, it's more that they are all completely retarded and will eventually break down from their accelerated alzheimers
Anonymous No.106322952
jamba will save local
Anonymous No.106323010
>>106322876
You'll never understand if you've written that in good faith.
Anonymous No.106323017 >>106323025
>3.1 has been out for almost a day
>still no info on it
it's that bad, huh?
Anonymous No.106323025
>>106323017
China... has fallen
Anonymous No.106323054
>>106322940
Last minute safety ablitardation please understand, basic safety and all.
Anonymous No.106323064 >>106323131
It's over
Local is forever DOOMED
Anonymous No.106323082
>load: control token: 128795 '<｜place▁holder▁no▁795｜>' is not marked as EOG

>print_info: BOS token = 0 '<｜begin▁of▁sentence｜>'
print_info: EOS token = 1


Anyone have any idea why I got this after I quanted deepseek?
Anonymous No.106323091
ok, enough testing and benchmarking of standalone models. Fancy a chat about existing frameworks to actually use them at their full capacity with RAG, MCP and agentic workflows (tools)? I feel like there are more hidden open source gems like morphik.ai and jan.ai. Moreover, I wonder if you can just use Roo Code in VSCode as an AI suite (for example, AI deep research) instead of a coding assistant.
To tell it in saar speak, I want to leverage existing frameworks, because building your own is actually a ton of work, even when vibecodemaxxing.
Anonymous No.106323102 >>106323113
Are reasoning models any better at making replies in stories that reflect what's actually going on?
Anonymous No.106323113
>>106323102
Sometimes.
Anonymous No.106323131 >>106323217
>>106323064
OOMIN and COOMIN, never DOOMIN
Anonymous No.106323136
nano banana is going to be gemma4 image-out, isn't it?
Anonymous No.106323158
>>106318517
At least they know all madoka characters are straight
Imagine if they were retarded enough to not know that lol
Anonymous No.106323174 >>106323486
>>106322744
Are you roleplaying some Hallmark scenario? Qwen's safety stops me at every opportunity if I don't handhold it through the scenarios I'm feeding it.
Anonymous No.106323183
My puts are printing
Feels good today
Anonymous No.106323217
>>106323131
This is the one true local mindset.
Anonymous No.106323220 >>106323225 >>106323241
Honestly, V3.1-Base being out and nothing else would be the prime opportunity for finetuners to do something useful for once in their life.
They could make the first instruct tune ever for this model and show off the power of the open model community instead of wasting time and money on the six billionth gemma mistral tune
Anonymous No.106323225 >>106323243
>>106323220
>first instruct tune ever for this model
Pretty sure the first instruct tune ever for the model is whatever DS have on their API
Anonymous No.106323241 >>106323264
>>106323220
>power of the open model community
LOL community instructs have been worse than what the model devs put out since the mixtral era at the least.
Anonymous No.106323243 >>106323258 >>106323266
>>106323225
V3.1 isn't on their API yet, is it? Their documentation still mentions v3-0324 and r1-0528
Anonymous No.106323258
>>106323243
it is, the model changed a few hours after their release on huggingface
Anonymous No.106323264 >>106323292 >>106323548
>>106323241
I agree with you but with just one exception: the tulu instruct tune of llama 3.1 was actually better than the official instruct, much greater instruction following, but it's like watching a normal but healthy person beat down a cripple, Meta sucks
Anonymous No.106323265
>finetuning a 671b model
are you the xitter tourist? no one in the foss community has the necessary compute to finetune models bigger than mistral large.
nous is maybe an exception.
Anonymous No.106323266
>>106323243
Nigga we've been using the new model through the API for a day now. Both endpoints point to the same (hybrid reasoner) model, but the deepseek-reasoner endpoint has a prefill of "<think>"
Anonymous No.106323272 >>106323284 >>106323290
Nemotron Nano V2 gguf support?
Anonymous No.106323284
>>106323272
Just use a calculator
Anonymous No.106323290 >>106323303
>>106323272
no one cares
you can try it there:
https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2
and see that it's just like any other nvidia model: it's retarded
Anonymous No.106323292
>>106323264
I did say community tunes, IMO Tulu is much closer to corpo than random Dumber and Co, I mean look at their page. https://allenai.org/
Anonymous No.106323303 >>106323313 >>106323330
>>106323290
There's an unpruned unaligned 12B 'base' model as well.
Anonymous No.106323313
>>106323303
Nemo has existed for more than a year.
Anonymous No.106323330
>>106323303
>>unaligned 12B 'base' model
>https://huggingface.co/datasets/nvidia/Nemotron-CC-v2
>synthetic rephrasing using Qwen3-30B-A3B
>using Qwen3-30B-A3B
>30B-A3B
>A3B
Anonymous No.106323366
>>106322081
Nice seems deepseek got a trove of technical manuals to add into the pretrain dataset
Anonymous No.106323391 >>106323402
>>106322081
I might get crucified for saying this, but I'm pretty sure the DS webui searches online or in some massive RAG for info even without the option turned on and without telling you.
Anonymous No.106323402 >>106323494
>>106323391
It does search even without the option, but it doesn't hide it from you normally (I'd like to see the block here)
It's still possible to stop the searching behavior by telling it to not use search as part of the prompt
Anonymous No.106323427 >>106323444
aLoRA seems to be going places
>https://github.com/ggml-org/llama.cpp/pull/15327
Anonymous No.106323444 >>106323563
>>106323427
which one?
Anonymous No.106323472
>>106323459
>>106323459
>>106323459
Anonymous No.106323486 >>106324220
>>106323174
I don't know how that's happening to you, I've had absolutely no problem with very un-pc bdsm on qwen3.
What the hell does your system prompt look like?
Anonymous No.106323494
>>106323402
Anonymous No.106323521
I'm reading anons stating that DS V3.1 replaced not only V3-0324 but also R1-0528.
Is this documented anywhere? I'm not seeing any announcements from DS that validate it.
I wouldn't be surprised; the -chat and -reasoner endpoints now spit out responses that are very similar, at least for rp.
Anonymous No.106323548 >>106323635
>>106323264
https://arxiv.org/abs/2411.15124

> To train our Tülu 3 models, we used between 4 and 16 8xH100 nodes with high speed interconnect. The final 8B model is trained on 32 GPUs for 6 hours and the 70B model was trained on 64 GPUs for 50 hours.
Anonymous No.106323563
>>106323444
lol
Anonymous No.106323635
>>106323548
>32 H100 for 6 hours
That's only like $400 (32 GPUs × 6 hours = 192 GPU-hours at roughly $2/GPU-hour). Well within sane range, especially for the grifters getting free compute and ko-fi money.
Anonymous No.106323785
>>106317253
>Great until all web search requires paying for Cloudflare™ Web Scraping permission per connection established.
I seriously envision them trying this shit.
Anonymous No.106323829
>>106322603
>For anything erp related, even the larger models are just refusal machines
Don't start with rape... work up to it.
Anonymous No.106324220
>>106323486
qwen3 and qwq have worked great for me as well once I made the right kind of card for no censorship. Interestingly, I had more trouble getting them to talk about controversial physics topics than to just go wild in erp.