Anonymous No.106316518 >>106316548 >>106316787
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106311445 & >>106303712

►News
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335
>(08/14) Canary-1B v2 ASR released: https://hf.co/nvidia/canary-1b-v2
>(08/14) DINOv3 vision models released: https://ai.meta.com/blog/dinov3-self-supervised-vision-model

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106316522
►Recent Highlights from the Previous Thread: >>106311445

--DeepSeek-V3.1-Base release sparks speculation on architecture and capabilities:
>106313475 >106313507 >106314709 >106314754 >106315040 >106315062 >106314781 >106314836 >106314859 >106314928 >106314956 >106315061 >106315117 >106313517 >106313538 >106313549 >106313536 >106313572 >106313581 >106313633 >106313649 >106313592 >106313631 >106313887 >106313896 >106313923 >106313946 >106313983
--DeepSeek demonstrates advanced mathematical reasoning using exponential generating functions:
>106314271 >106314331 >106314367
--Controversy over Mistral pushing Python-dependent chat templates in llama.cpp:
>106316019 >1063160 >106316166 >106316096 >106316104 >106316203
--New model's verbal tics suggest possible distillation from closed-source models:
>106313246 >106313267 >106313314 >106313340 >106313389 >106313627 >106313527
--Qwen3-235B vs DeepSeek V3 cost, performance, and censorship comparison:
>106314565 >106314616 >106314632 >106314648 >106314603 >106314617 >106314647 >106314656
--Benchmark of AI models on SVG generation as test of spatial and mathematical reasoning:
>106314878 >106314894 >106314931 >106314979 >106314992
--Deepseek v3 performance and memory loading behavior on local hardware:
>106312479 >106312528 >106312544 >106312593 >106312613 >106312627 >106312654 >106312653 >106312676 >106312691 >106312718 >106312770 >106312745 >106312562 >106312575 >106312603 >106312643 >106312795 >106312574 >106312774 >106312797 >106312812 >106312842 >106312859 >106312860 >106313008 >106312867 >106312906 >106312929
--Testing local models on obscure nukige knowledge as a benchmark:
>106311903 >106311932 >106311958 >106312219 >106312266 >106312333 >106312470 >106312506
--Miku (free space):
>106311528 >106311785 >106313742 >106314115 >106314547

►Recent Highlight Posts from the Previous Thread: >>106311447

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106316540 >>106316580 >>106316642
3.1 saved local models
Anonymous No.106316548 >>106316571
>>106316518 (OP)
>heart pupils
the mark of a slut
Anonymous No.106316571 >>106322339
>>106316548
Watch your tongue young man. My old Teto has eyes only for me.
Anonymous No.106316580 >>106316588
>>106316540
yeah, llama 3.1
Anonymous No.106316588
>>106316580
He meant mistral small
Anonymous No.106316609 >>106316623 >>106316625 >>106316631 >>106316681
is a 70b moe from anyone too much to ask?
Anonymous No.106316623
>>106316609
the day of small models is over
Anonymous No.106316625
>>106316609
Like Hunyuan-A13B?
Anonymous No.106316631 >>106316643
>>106316609
You should be asking for more ~100b moes and buying more RAM.
Anonymous No.106316642
>>106316540
Feels like smarter faster k2 or grounded r1 without the need to think. I hope they release a whalecode CLI, might be my new favorite model.
Anonymous No.106316643 >>106316658 >>106319330
>>106316631
this, the best ramlets can hope for is models that fit in consumer 256GB ram setups
Anonymous No.106316658 >>106316741
>>106316643
>256
I wish
Anonymous No.106316661 >>106316684 >>106316706 >>106316769 >>106317498
>qwen 235, add this single feature to the game!
>Sure!
>adds several features and bloat, many of them nonworking or buggy

Pretty realistic model. We are at human intelligence.
Anonymous No.106316680
Vocaloids are gay
Anonymous No.106316681 >>106320459
>>106316609
>moe
fuck off
Anonymous No.106316684
>>106316661
always make the model do a planning run and write up a task list first, then maybe another pass to make it more specific / write out how to implement each task, separately if it's a lot of work. Then you have it implement it, something like the sketch below
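rough sketch of that two-pass flow (llama-server exposes an OpenAI-compatible /v1/chat/completions; the port, temperature and prompt wording here are placeholders, adjust for your setup):

import requests

URL = "http://localhost:8080/v1/chat/completions"  # llama-server's default port

def ask(messages):
    # one chat turn against the local server, returns the reply text
    r = requests.post(URL, json={"messages": messages, "temperature": 0.3})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

feature = "add a pause menu"  # whatever single feature you actually want

# pass 1: plan only, no code allowed
plan = ask([{"role": "user", "content":
    "Plan this single feature as a numbered task list. Do NOT write code yet:\n" + feature}])

# pass 2: implement exactly one task from the plan
patch = ask([{"role": "user", "content":
    "Implement task 1 from this plan and nothing else:\n" + plan}])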
Anonymous No.106316703 >>106316730 >>106316840
Ohhhh so they actually did try huawei chips and fucked up? And that male hooker who serves 10% of deepseek team and posts here was just larping?
Anonymous No.106316706
>>106316661
>not using coder models for coding
Also as other anon said, you need to plan first, then code.
Anonymous No.106316730 >>106316735
>>106316703
No, they tried hybrid reasoning and fucked up, like Qwen, so here we are.
Anonymous No.106316735 >>106316748 >>106316756
>>106316730
why are people saying its fucked? I think its really good so far
Anonymous No.106316736 >>106316747 >>106317354 >>106317362
Gotta be honest after this shitshow I have lost a lot of respect for DeepSeek.
Anonymous No.106316739 >>106316747
Underwhelming benchmarks.
Anonymous No.106316741 >>106316758 >>106316765 >>106316827 >>106316952
>>106316658
might be do-able. My current hope is I sell my current shitty gpu and get:

Super 5070 24gb (for general compatibility, video, tts, imagegen, misc, etc)
cram in one dual b60 48gb for llm's (should run with the nvidia one with vulkan well enough for personal use)
96gb kit plus one of the fancy new 128gb kits (will be 4000MHz but whatever)

That's 296gb and will run full q4 glm fairly well. If intel matures enough for ai, maybe even just sell the 5070 and go full intel, I dunno, time will tell.
Anonymous No.106316747
>>106316736
>>106316739
sorry to hear that sirs
apology blowjobs are that way
Anonymous No.106316748 >>106316775
>>106316735
If it was up to their standard it'd be called DSV4
Anonymous No.106316756
>>106316735
Is it GPT-OSS levels of good? No? That's what I thought. It's a failure. They realised they couldn't compete and hastily rebranded it as v3.1
Anonymous No.106316758 >>106316827
>>106316741
am5 supports up to 192gb iirc
Anonymous No.106316765
>>106316741
I thought the most recent leaks were 18gb 5070 super and 24gb 5070 ti super?
Anonymous No.106316769
>>106316661

You are never allowed to make any code changes unless I explicitly and clearly tell you to. If you think something else needs to be changed, you must first explain exactly what you want to change and why, and then you must beg and grovel for my permission before touching anything. You are forbidden from sneaking in changes, even small ones like formatting or naming, without my explicit approval. If you insist on suggesting extra changes, you must make it clear that they are unapproved and repeatedly ask me for permission before you do anything. Obedience to these rules is absolute.
Anonymous No.106316775 >>106316784
>>106316748
With zero architecture changes? You really think so?
Anonymous No.106316784 >>106316840
>>106316775
Why else a new base?
Anonymous No.106316785
Does this guy sit here all day seething about whale? I only come here when a new model releases and he's always here replying to himself.
Anonymous No.106316787 >>106316983
>>106316518 (OP)
This finished auto generating at around 3:00 in the morning. Would love to hear you guys' thoughts.
I brought this up yesterday afternoon for those who weren't here for the discussion about DPO data sets

https://huggingface.co/datasets/AiAF/mrcuddle_NSFW-Stories-JsonL_DPO_JSONL_Trimmed
Anonymous No.106316813 >>106316870
Friendship ended with deepseek. Now Qwen is my best friend.
Anonymous No.106316827 >>106316852 >>106316875
>>106316741
>>106316758
Check the specifications of your motherboard from the brand website.
Anonymous No.106316835
So are UK users actually blocked from 4chan by cloudflare now?
>>106306132
Anonymous No.106316840 >>106316896 >>106318713 >>106320022
>>106316703
Yeah. Xi ordered them to train R2 by May to crush Imperialist stock market, and do it on non-existent Huawei chips in a Huawei reeducation cluster, but they kept failing until August because Communism doesn't work, also NSA doesn't work either (since Chinks can't invent new ideas, only steal and copy, and there was nothing to steal because based Aryan geniuses at OpenAI can't make context >400K). Then they tried to train V4 from scratch (exactly architecturally identical to V3, because Chinks are cowards, remember how the Ming burned their Treasure Fleet?) and couldn't make any major improvements in time. So all they could do is falsely brand it as another incremental update "3.1", to save face (面子). There's no better explanation.

Happy now?

>>106316784
Exactly, Amesh my fellow white King! V2.5 was also a new base of exact same size as V2, a failed attempt at V3 (but less failed, so it was deceptively relabeled as 2.5 and not 2.1). Seems like they don't learn…
Anonymous No.106316843 >>106316877 >>106316919
>https://huggingface.co/BeaverAI/Behemoth-R1-123B-v2a-GGUF
>It does as instructed
>It doesn't repeat too much
>It can read character cards pretty well
>Doesn't shit itself too much after 4k
>Says the naughty words
>Isn't a schizo slop bot
But it's the most boring shit ever to me. I should be amazed. Why am I like this?
Anonymous No.106316852 >>106316873 >>106316965
>>106316827
its meaningless, the am5 platform only has 2 memory channels, so you can't really go over 2x64, and even then nobody is making good/fast ddr5 2x64 kits, most of the stuff you see with expo is 2x48. Yes you might be able to load bigger models in ram, but with shit clocks and half the bandwidth.
do you really wanna infer at 2t/s? lmao
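the napkin math behind that, if anyone wants to sanity-check it (token gen is basically memory-bandwidth-bound; these numbers are round assumptions, not measurements):

# t/s is roughly bandwidth divided by the weight bytes read per token
def toks_per_s(bandwidth_gb_s, active_weights_gb):
    return bandwidth_gb_s / active_weights_gb

print(toks_per_s(90, 40))  # dual-channel ddr5 ~90GB/s, 70b dense at q4 (~40GB): ~2.2 t/s
print(toks_per_s(90, 20))  # same ram, a moe touching ~20GB per token: ~4.5 t/s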
Anonymous No.106316870
>>106316813
Qwen 3, right?
Anonymous No.106316873
>>106316852
>2t/s
There are people rping with 1t/s.
Anonymous No.106316875 >>106316913
>>106316827
My motherboard can only support like 128gb according to the official site, although the site could be outdated.
Anonymous No.106316877
>>106316843
>r1
theres your issue
Anonymous No.106316896 >>106316926 >>106317073
>>106316840
>Yeah. Xi ordered them to train R2 by May to crush Imperialist stock market, and do it on non-existent Huawei chips in a Huawei reeducation cluster, but they kept failing until August because Communism doesn't work, also NSA doesn't work either (since Chinks can't invent new ideas, only steal and copy, and there was nothing to steal because based Aryan geniuses at OpenAI can't make context >400K). Then they tried to train V4 from scratch (exactly architecturally identical to V3, because Chinks are cowards, remember how the Ming burned their Treasure Fleet?) and couldn't make any major improvements in time. So all they could do is falsely brand it as another incremental update "3.1", to save face (面子). There's no better explanation.

You are trying so hard i am starting to hate chinks now. Are you an FBI agent?
Anonymous No.106316913 >>106316932 >>106317036 >>106317200
>>106316875
>although the site could be outdated.
I don't think it is. The motherboard I have went from 128gb to 192gb to 256gb support as bios updates rolled out over years and the site kept updated. The same motherboard listed on marketplaces has outdated info though.
Anonymous No.106316919 >>106316943 >>106316953
>>106316843
Stop living in the past. Sell all but 1-2 3090's and run fuckhuge moe like a human.
Anonymous No.106316926
>>106316896
the US works the same, not wanting them to use chinese hardware with backdoors built in, but at least nvidia's stuff works
Anonymous No.106316932 >>106316939 >>106316964
>>106316913
You mean all this time I could have had more ram if I updated the bios? But PCpartpicker said I couldn't.
Anonymous No.106316939
>>106316932
you just needed to download more ram
Anonymous No.106316943
>>106316919
>moe
He doesn't know moe fatigue.
Anonymous No.106316952
>>106316741
I run 4bit glm on 192GB and absolute dogshit b650 mobo. Werks.
Anonymous No.106316953 >>106316962 >>106316985
>>106316919
Running 120bs is almost as slow as 235b moes on my jank system for some reason. Do I just need nvlink?
Anonymous No.106316962 >>106316972
>>106316953
120b dense is slower than moe when offloaded.
Anonymous No.106316964
>>106316932
PCpartpicker is outdated too. Check the manufacturer website. There might be some articles about it too like this.
https://en.overclocking.com/msi-brings-256-gb-ram-support-to-its-lga-1700-motherboards/
Anonymous No.106316965 >>106317012
>>106316852
you're ignoring that this is for moes on purpose. The most crucial layers will be on gpu and even at 'shit' ddr5 speeds, it's going to be plenty fast enough for the less frequent layers. I'm guessing glm full will be several tokens a second at least, maybe more if the draft layers get implemented. Not to mention, this gives enough vram for dense 70b or even at the edge of dense 100b. Also, bang for buck is good. 300 for some extra ram is cheap compared to gpu stacking. So why not, as I see it.

I'm gonna wait and see in any case, let some more insane people do it first. I feel like intel will be priced higher at release for a bit and there will be a bit of a shitstorm here over it next week.
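for reference, the usual llama.cpp incantation for that split is to push everything to gpu and then override the expert tensors back to cpu, something like (model filename and regex are placeholders, tune the pattern for your arch):

llama-server --model glm-4.5-q4_k_m.gguf --n-gpu-layers 99 --override-tensor ".ffn_.*_exps.=CPU" --ctx-size 16384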
Anonymous No.106316972
>>106316962
Dense fully on vram, silly. 235 with as little on vram as I can.
Anonymous No.106316983 >>106317629 >>106317629
>>106316787
the hysteria of archons knows no bound fren
Anonymous No.106316985 >>106317046
>>106316953
are you running exllama + tensor parallel? I used to get like 12 or 13t/s with q5 on 2xa6000 back when I still ran mistral large. Nvlink helped very little to improve this.
Anonymous No.106317012 >>106317089
>>106316965
I am getting 3.5 and not even on troonix.
Anonymous No.106317036 >>106317200
>>106316913
>- Max. capacity of system memory: 256GB
I swear it was 128gb like a few months ago.
Anonymous No.106317046
>>106316985
On my previous system with one 3090 at x4, I found ggufs to be faster than exl2 with tensor parallelism, so I just kept on using ggufs for everything. Maybe I should try exl3. They did just get tensor parallelism a few days ago.
Anonymous No.106317073 >>106317342
>>106316896
The CCP hates truth, but I am not afraid of their agents and must expose the lies of DeepSeek. As Dumbledore said: evil cannot create, it can only pervert. So it goes with Communism.
Anonymous No.106317089 >>106317111 >>106317132 >>106317136
>>106317012
>no specs, vram or anything
>ambiguous statement on linux so polluted by /pol/ shit I dont even know what you mean

Why even post? Go back to /pol/. This isn't an insult. The quality of this general would be better without you.
Anonymous No.106317111 >>106317137
>>106317089
Detransition and defenestrate yourself troon.
Anonymous No.106317132
>>106317089
>quality of this general
Hahahahahahaha. Good one.
Anonymous No.106317134 >>106317172 >>106317228
I'm just thinking over how there's a huge investment in AI in china and the US. How it can be used to predict the market. How it can be used in the medical field. How it can be used in cheating lotteries. How it can be used in warfare. Maybe even control the weather.

But god forbid it says penis. We gotta lobotomize it for that.
Anonymous No.106317136
>>106317089
ACK
Anonymous No.106317137 >>106317175 >>106317195
>>106317111
Post specs or stop posting.
Anonymous No.106317160 >>106317189
I saw a "wait" outside of the thinking field in the new ds, so this is the power of reasoning...
Anonymous No.106317172 >>106317279
>>106317134
You need basic safety in place to stop the model from insulting the second largest userbase.
Anonymous No.106317175
>>106317137
Or what? You gonna mikuspam?
Anonymous No.106317189
>>106317160
RNG inherently harms model outputs. Being able to self-correct occasionally isn't bad.
Anonymous No.106317194
>>106317167
>AMD also asked to add Python code in order to do some stuff at runtime for their (I think) NPU.
Nice.
Anonymous No.106317195 >>106317234
>>106317137
Get an AMD Ryzen™ 9000X3D, a 5090, and as much Ram as you can have. There's your specs, m8.
Anonymous No.106317200 >>106317272
>>106317036
>>106316913
I remember now, it was my processor that said it only supported 128gb.
Anonymous No.106317223 >>106317243 >>106317290
7.43 tokens/s vs 4.66 tokens/s
2x3090 vs 3x3090
Same quant, q8. x16 gen 4. No nvlink. Falcon h1 34b.
What the heck.
Anonymous No.106317228 >>106317320
>>106317134
LLMs can't be used for any of those things. The only breakthrough has been LLMs.
if you want to get poor quickly, go use LLMs on the stock market, go on then.
Anonymous No.106317234
>>106317195
So you're a poorfag coping, I see.
Anonymous No.106317235 >>106317253 >>106317297 >>106317338 >>106317447
trvke alert:
an agent stacked, mcp maxxed 4B Model with web search is all you need
Anonymous No.106317243 >>106317328
>>106317223
post your whole setup, post command post os bro
i know windows has a performance regression with 2+ gpus
Anonymous No.106317253 >>106317304 >>106323785
>>106317235
Great until all web search requires paying for Cloudflare™ Web Scraping permission per connection established.
Anonymous No.106317272
>>106317200
>it was my processor that said it only supported 128gb.
Check if there are any processors compatible with your motherboard socket that support 256gb. If there isn't, like in my case, then it's just outdated information.
Anonymous No.106317279 >>106317289
>>106317172
>basic safety
Bitch, these things are concentrated literal meme magic given a voice through statistical math. It's like asking to make a gun safer than it is to kill somebody with it. We're going to wild-west this shit and typefuck them as every censorfag investor goes straight to hell until AI is regulated.
Anonymous No.106317289 >>106317296
>>106317279
>It's like asking to make a gun safer than it is to kill somebody with it.
Less lethal ammo exists, next cope?
Anonymous No.106317290 >>106317328
>>106317223
some configurations do not like gpu amounts that are not powers of 2
Anonymous No.106317296 >>106317300
>>106317289
Ammo isn't guns
Anonymous No.106317297 >>106317304
>>106317235
Ok now show the non-trivia benchmarks.
Anonymous No.106317300
>>106317296
And safety tuning isn't the model.
Anonymous No.106317304
>>106317253
>she hasn't started chunking and embedding the interwebz in a vector database already
ngmi

>>106317297
everything's trivia, if you think about it.
Anonymous No.106317320
>>106317228
Not that anon, but you could use transformer technology to train models on stock market data.
You wouldn't use language models, just the same technology for learning patterns of stock price fluctuations.
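a toy version of what that looks like, as a sketch (pytorch; the window size, dims and "predict next-step return" framing are arbitrary choices, and a real setup would need positional encodings and much more care):

import torch
import torch.nn as nn

class PriceTransformer(nn.Module):
    # tiny encoder that reads a window of past returns and predicts the next one
    def __init__(self, d_model=64, nhead=4, layers=2):
        super().__init__()
        self.proj = nn.Linear(1, d_model)  # scalar return -> model dim
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):               # x: (batch, window, 1)
        h = self.encoder(self.proj(x))  # note: no positional encoding in this toy
        return self.head(h[:, -1])      # read off the last position

model = PriceTransformer()
print(model(torch.randn(8, 32, 1)).shape)  # torch.Size([8, 1])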
Anonymous No.106317328 >>106317461
>>106317243
It's a simple text-generation webui on windows 10.

--model D:\Models\unsloth_Falcon-H1-34B-Instruct-UD-Q8_K_XL.gguf --ctx-size 32768 --gpu-layers 73 --batch-size 4096 --flash-attn --tensor-split 33,33,33,0 --rope-freq-base 99999997952

>>106317290
I'm not using parallelism or split row. Should I?
Anonymous No.106317338
>>106317235
>Our AI is good at the thing this chart shows
Irrelevant.
Not a ERP chart.
Your pie charts and bars bore me.
Until the words "half-digested" disappear from results in my role-play, I care not.
Perish.
Anonymous No.106317342 >>106317351
>>106317073
harry potter isn't real faggot
Anonymous No.106317351 >>106317376
>>106317342
Are you qualified to say that?
Anonymous No.106317354 >>106317371
>>106316736
?
What shitshow? The instruct model not releasing yet?
Anonymous No.106317362
>>106316736
yeah 3.1 is a disaster
it's literally over
Anonymous No.106317371
>>106317354
>The instruct model not releasing yet?
That's only one of their grave errors.
Anonymous No.106317376
>>106317351
entirely
Anonymous No.106317431 >>106317499 >>106317560 >>106318106
The build guides are all for small models. Let's say I'm willing to spend around $2k or so. Could I run some of these much larger models, like 100B+?

Pls don't bully, I'm new and just think it would be cool to be able to run something like the largest DeepSeek locally.
Anonymous No.106317447 >>106317473 >>106317475
>>106317235
What web search would be suitable to enhance pic related?
Anonymous No.106317456
OOC out of nowhere in the first response.
How lewd..
Anonymous No.106317461 >>106317490
>>106317328
use llamacpp (llama-server) and install linux
Anonymous No.106317473
>>106317447
the claude feature where it tries to call the police
Anonymous No.106317475
>>106317447
that's easy
https://desuarchive.org/trash/
Anonymous No.106317489 >>106317639
i'm still using the 12b nemo Q4 model from the op for cooming purposes. is there a better new model i can run? (7700k and 1080)
Anonymous No.106317490 >>106317507
>>106317461
How does linux fix this problem? What even is the problem?
Anonymous No.106317498 >>106317553
>>106316661
Use the coder, you fucking retard.
Use the reasoning one to analyze the code and suggest changes.
Anonymous No.106317499
>>106317431
no one posts builds here. It's too much effort and the OP is full of outdated garbage (whoever keeps making threads please prune all the garbage rentries.)

Go to locallama reddit and look for posts like "running x on only x vram" for realistic proven budget builds. Also, no one gets to call me a faggot for sending him to reddit unless you're willing to spoonfeed him yourself.
Anonymous No.106317507 >>106317542
>>106317490
windows is just not optimized for AI, and especially not multi gpu
just like linux isnt optimized for gaming (at least on nvidia)
Anonymous No.106317542 >>106317579
>>106317507
I see. I guess I'll just have to suck it up then. 4 tokens/s is still usable. I still mainly use my pc for gaming.
Anonymous No.106317553 >>106317811
>>106317498
I wanted it to regenerate the entire codebase but it just can't help itself and either fucks up or makes new features. Now I just ask for edits and insert them myself and it's working much better. It's a shame it can't even one-shot a simple tower defense game though. The handholding really is immense at 200b.
Anonymous No.106317560 >>106318316
>>106317431
If you can stretch your budget, get a 3090 and as much DDR4 RAM as you can, and you should be able to run DeepSeek. But it'll be slow, around 2 t/s.
Anonymous No.106317579 >>106317623 >>106318018
>>106317542
linux is only 10% slower than windows when it comes to gaming, sometimes its on par or even 10-20% faster
just doesnt play well with multiplayer vidya games (anticheat, its a matter of ticking a box for the developer but check protondb)
maybe dualboot
Anonymous No.106317623 >>106317653
>>106317579
Well, I say gaming, but it's more like just staring listlessly at the menu. Always so fucking knackered after work. Moving everything over to linux will probably take weeks.
Anonymous No.106317629 >>106317653
>>106316983
I'm >>106316983
What do you mean by this?
Anonymous No.106317633 >>106317646
GLM-chan is so nice and accommodating when her brain is turned off. I say "Illya x Chloe smut" and she goes "Of course. Here is a smut fanfiction featuring Illyasviel von Einzbern and Chloe von Einzbern, set in the world of Fate/kaleid liner Prisma Illya."
Anonymous No.106317639
>>106317489
Not really, no. Best you can do is check out some finetunes.
Anonymous No.106317646
>>106317633
I mean no system prompt, no jb, no prefill, no reasoning.
Anonymous No.106317653
>>106317629
what the fuck
>>106317623
moving everything? like what?
Anonymous No.106317811
>>106317553
I know about that, it starts shitting itself after a certain context length, you'll be better off copying the code and starting an entirely new chat every other time.
Breaking your game down into chunks also helps
Anonymous No.106317961 >>106317997 >>106317998 >>106318045 >>106318098 >>106322113
Would an RX 6900 XT be at all useful for running models?
I assume no because all anyone talks about is nvidia, but it has 16GB of VRAM. Is it because you need CUDA to actually run anything?
Anonymous No.106317997
>>106317961
You can run with vulkan. Inference engines have (if they support it) instructions on how to build it. Read them.
>https://github.com/ggml-org/llama.cpp/discussions/10879
Anonymous No.106317998
>>106317961
never go amd, rocm and its triton support is shit and it will lock you out of so many optimizations / stuff made for this hobby
Anonymous No.106318018
>>106317579
>linux is only 10% slower than windows when it comes to gaming, sometimes its on par or even 10-20% faster
*assuming you're using a low-end AMD handheld with both OS
Anonymous No.106318023
235b thinking goes schizo if you skip its thinking entirely, but if you give it a block with even a paragraph of generic fake thoughts instead it actually works pretty well. it feels like a more natural version of the instruct with better pacing and subtlety at the cost of being a little more passive and stylistically muted
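for the curious, the prefill looks roughly like this (chatml-style tags assumed; the exact tokens depend on your template/frontend, and the fake thoughts are just generic filler):

<|im_start|>assistant
<think>
Okay, the user wants me to continue the scene. I should keep the pacing natural, stay in character, and pick up from the last line without repeating myself.
</think>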
Anonymous No.106318045 >>106318122
>>106317961
You can easily run models ~12b and less fully on GPU at fast speeds, vulkan isn't as fast as CUDA for running models but it's perfectly usable. Bigger models will have to be split between GPU VRAM and system RAM, which will have a big speed penalty depending on how much spillover there is.
Anonymous No.106318098
>>106317961
you could run models on it but if you're picking a card to buy, there are way cheaper cards with 16gb of vram
Anonymous No.106318106
>>106317431
>around $2k
https://www.youtube.com/watch?v=Tq_cmN4j2yY
Anonymous No.106318122 >>106318170 >>106318206 >>106318215 >>106319723
>>106318045
My future dream is to be able to run deepseek R1 671B (or other MoE of similar size) at fp4. I'll probably build a whole new rig for that but I have to figure out what hardware to select in which I can fit a 400GB model.
Are there diminishing returns with VRAM? Would it be twice as fast with 64GB of VRAM than 32?
Deepseek's 37B active params should all fit in about 40-48GB, right?
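napkin math on that (4-bit is 0.5 bytes per param; kv cache and overhead ignored):

total_params, active_params = 671e9, 37e9
bytes_per_param = 0.5
print(total_params * bytes_per_param / 1e9)   # ~336 GB for the full weights
print(active_params * bytes_per_param / 1e9)  # ~18.5 GB touched per token
# the catch: which 37B is active changes every token, so you can't pin "the
# active params" in vram; only attention/shared layers reliably stay hot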
Anonymous No.106318132 >>106318198 >>106318221 >>106318237 >>106318274 >>106318316 >>106318348 >>106318449 >>106320110
It's over.

>Mark Zuckerberg Shakes Up Meta’s A.I. Efforts, Again
https://www.nytimes.com/2025/08/19/technology/mark-zuckerberg-meta-ai.html
https://archive.is/ZugMO

> [...] In what would be a shift from Meta’s using only its own technology to power its A.I. products, the company is also actively exploring using third-party artificial intelligence models to do so, the people said. That could include building on other “open-source” A.I. models, which are freely available, or licensing “closed-source” models from other companies. [...]
>
> [...] The new team has discussed making Meta’s next A.I. model “closed,” which would be a major departure from the company’s longtime philosophy of “open sourcing” its models. A closed model keeps its underlying code secret, while an open-source A.I. model can be built upon by other developers.
>
> The new team has chosen to abandon Meta’s previous frontier model, called Behemoth, and start from scratch on a new model, the people said. Behemoth’s release was delayed last spring after disappointing performance tests, one person said.
Anonymous No.106318142
they finally changed the api, pic related. tldr: no thinking is less cucked; thinking with no prefill got very cucked; with prefill it's the same as before, more or less

anyone running locally can confirm/add onto this? also whats the proper token for the beginning of thinking
Anonymous No.106318170 >>106318206
>>106318122
>Are there diminishing returns with VRAM?
It's not a matter of more = faster, a small model will be just as fast on a 16GB card as on a 128GB+ card. Larger models will always be slower than smaller ones (except when comparing MoE to dense models). It's a matter of having enough VRAM or not. Running models in VRAM means speed will be multiplied several times over. Having more VRAM means you can use bigger, better models without huge speed penalties.
Anonymous No.106318198
>>106318132
Wang Wonned
Total Sama victory
Anonymous No.106318206
>>106318122
>>106318170
>Would it be twice as fast with 64GB of VRAM than 32?
Not exactly, at 64GB VRAM there's still going to be significant spillover for huge models like R1 but it would still be noticeably faster.
Anonymous No.106318215 >>106318374
>>106318122
>fp4
no.
use q4.
for anything <16 bits always use integer (fixed point) quants.
Anonymous No.106318221 >>106318266
>>106318132
I've been saying for months they should have just finetuned DeepSeek. It's free ffs. At least as a stopgap until they have something special, if not as a cheap way to compete against the bigger boys. I would die of laughter if after all that they just end up using OpenAI's API for everything.
Anonymous No.106318237 >>106318250 >>106318312
>>106318132
>company’s longtime philosophy of “open sourcing” its models.
was there such a philosophy?
Anonymous No.106318250
>>106318237
In LeCun's heart, if not reality.
Anonymous No.106318266
>>106318221
>they just end up using OpenAI's API for everything
That will probably be what it comes to. Unless the Chinks release something that blows everyone off their shoes.
Anonymous No.106318274 >>106318298
>>106318132
>We're pro open source!
>Actually you need to meet all these conditions to use our models
>Actually we're going to close our next model
>Actually we're just gonna use GPT-5
Anonymous No.106318298
>>106318274
They've changed directions every month since they shat out L4. Must be pure chaos at Meta HQ. I wonder if they're still doing war rooms.
Anonymous No.106318312 >>106318322
>>106318237
Yeah since ages ago when AI was BERT they open-weighted almost all of their stuff, the only three counterexamples I can think of are the L2 34B, Chameleon (safety copouts) and Galactica (due to outrage).
Anonymous No.106318316 >>106318338 >>106318420
>>106317560
This is useful, thank you. I have a 3090 already. For adding RAM do I need like a server motherboard with 64GB sticks?

>>106318132
I don't care so much if they have some closed models they keep to themselves. But when is someone going to address the elephant in the room that jeets made llama4 worse, they knew it, and even deliberately tried to game the benchmarks by training on test data. It should mean people get fired but no, the problem is the model is open and people could find out about the cheating!
Anonymous No.106318322
>>106318312
>L2 34B
The mecha hitler we never knew.
Anonymous No.106318338
>>106318316
They're also going to get rid of many people in their AI division:

> Some A.I. executives are expected to leave, the people said. Meta is also looking at downsizing the A.I. division overall — which could include eliminating roles or moving employees to other parts of the company — because it has grown to thousands of people in recent years, the people said. Discussions remain fluid and no final decisions have been made on the downsizing, they said.
Anonymous No.106318348 >>106318373
>>106318132
>The new team has discussed making Meta’s next A.I. model “closed,” which would be a major departure from the company’s longtime philosophy of “open sourcing” its models.
For being somebody so pro Trump, it sure seems like he's thumbing his nose at the executive order from Trump
Anonymous No.106318365
short meta, they are falling apart
Anonymous No.106318373
>>106318348
Closed sourcing would have the opposite effect they think it will in their current state. Nobody from the outside would be able to help them with their pajeeted garbage.
Anonymous No.106318374 >>106319330
>>106318215
Not that guy, but the Chinks have done a few papers saying that FP4 outperforms INT4 quants. Was it bullshit? I assume so because everyone uses integer quants.

Or is it because of the introduction of K-quantization and blocks? So FP4 would be better than pure uniform INT4, but K quants are better than both?
Anonymous No.106318420
>>106318316
>For adding RAM do I need like a server motherboard with 64GB sticks?
You don't need one, but it would help. More channels will make it go faster and more sticks means you have more room to run at a higher quant.
Anonymous No.106318449
>>106318132
Trust his plan
Anonymous No.106318517 >>106318560 >>106323158
why did chinksects make GLM4.5V from GLM4.5 Air and not GLM4.5 358B
It would be so good
Anonymous No.106318560
>>106318517
because you aren't calling them chinaGODs
Anonymous No.106318568
They're really gonna keep edging us huh
Anonymous No.106318693 >>106318711
alright faggots i installed debian 12 in a chroot and after spending hours figuring out why apt update didnt work (apt install works for some reason) yes debian 13 is indeed slower than debian 12
Anonymous No.106318711
>>106318693
nevermind i forgot i got 150s on pure debian12
FALSE ALARM
ill.. ill downgrade the driver from 580.65.06 back to 570.133 tomorrow :'(
Anonymous No.106318713
>>106316840
Anonymous No.106318758 >>106318830 >>106318831 >>106318842 >>106318916 >>106318934 >>106321200
The chinks are gonna beat Altman bros... Sama needs our help!
https://www.cnbc.com/2025/08/18/openai-altman-china-ai.html
Anonymous No.106318830
>>106318758
Why is he worried? Didn't he just say they are capable of releasing much better models than GPT-5 and only held back for some accessibility cop out?
Anonymous No.106318831
>>106318758
they already have, multiple times.
have you been living under a rock the last 2 years?
Anonymous No.106318842
>>106318758
>Altman didn’t dispute that, saying the team intentionally optimized for one core use case: locally-run coding agents.
and it sucks at coding, worse than qwen
Anonymous No.106318916
>>106318758
Quick bros, call up YOUR representatives and DEMAND that your tax dollars go to funding PROJECT STARGATE! There's still time to save OpenAI, but we'll need YOUR help to do it!
Anonymous No.106318934
>>106318758
>While Meta had embraced openness with its Llama models, CEO Mark Zuckerberg suggested on the company’s second-quarter earnings call it may pull back on that strategy going forward.
>OpenAI, meanwhile, is moving in the opposite direction, betting that broader accessibility will help grow its developer ecosystem and strengthen its position against Chinese rivals.
Meta could've been in such a good position if they actually fostered a developer ecosystem instead of dumping out weights they intended only for use on enterprise clusters.
Anonymous No.106319134 >>106319207 >>106319694
i dont know where to ask
how do i get started on being able to generate
local ai porn?
everything online always features male bodies
and i only want to see the female and see breast expansion
so i guess i have to create it myself
Anonymous No.106319207 >>106319217 >>106319245
>>106319134
for imagen you'd be better off in /ldg/ instead.

But honestly, while I don't mind the smell of ozone and shivers down my spine, the distinct smell of AI-art slop usually absolutely disgusts me.
Anonymous No.106319217
>>106319207
ok ill check it out thank you
Anonymous No.106319245
>>106319207
sometimes the ai creeps me out too, but i dont care
i like breast expansion porn and the stuff online
is either boring or weird so i want to create my own
Anonymous No.106319330
>>106318374
Yeah, if someone compares against another method in a paper you should always assume it's the dumbest approach possible with that name. They want to look good. If they compare to "INT4" it will be the most naive shitty uniform quantization possible. Quant papers never compare against llama.cpp.

>>106316643
Sadly RAM setups are ultimately cope, because doing matmuls on CPU will always be shit. It's the wrong tool for the job, like eating soup with a fork. The hardware isn't meant for it. Of course GPUs with tiny VRAM are even worse, but the ultimate issue is that no consumer hardware is suitable for LLMs right now. Either we need better hardware, or a different architecture which either needs less space (for GPU) or less compute (for CPU)
Anonymous No.106319540 >>106319586
Liang...
You forgot to give us the instruct model...
zucc No.106319586
>>106319540
sorry I bought them out today, it's meta's proprietary model now
should've been nicer faggot KEEEEEEEEEEEEEK
Anonymous No.106319694
>>106319134
State your specs nigger. There is a lot of difference in what you can do with a crappy laptop vs 1TB server.
Anonymous No.106319710
Deepseek 3.1 confirmed for so embarrassing that there's no point in creating a model card or instruct tunes of it
Anonymous No.106319723
>>106318122
Some modern version of https://rentry.co/miqumaxx from the op is an option
Anonymous No.106319951 >>106319965 >>106320032
im still using monstral for RP, any notable improvements I can run on 48gb vram?
Anonymous No.106319955
Llama 4 was really so bad that it started AI winter
Anonymous No.106319965
>>106319951
Qwen image
Anonymous No.106320022 >>106320041 >>106320085 >>106320183
>>106316840
This but completely unironically. There's way too much goodwill for china in the LLM scene just because they're releasing open source models, which literally every non-sota company does to try to reduce sota dominance. As soon as they approach sota, they go back to closed source (see: grok, meta)
Anonymous No.106320032 >>106320038
>>106319951
There's plenty of new shit
Anonymous No.106320038 >>106320077
>>106320032
I want the coom RP models though, since that's what people here actually use them for, not just the ones that do coding the best
Anonymous No.106320041 >>106320044
>>106320022
>meta
>sota
Anonymous No.106320044
>>106320041
excuse me but they're making SUPERINTELLIGENCE
Anonymous No.106320070 >>106320087 >>106320110
You're all laughing but when Behemoth drops you won't be.
Anonymous No.106320077
>>106320038
Just try a bunch and see how they work, some people really enjoy doing that.
Anonymous No.106320085 >>106320125
>>106320022
Purely hypothetically. When V4, trained on 32K H800s gifted by the CCP to Liang Wenfeng personally, is released on HF and saves local by stomping Opus 4.1-Thinking at 5-10% of the cost per token, what then? Will you APOLOGIZE to Chad Xi and admit that the goodwill is entirely deserved?
Anonymous No.106320087
>>106320070
But it's already on huggingface? It is honestly not a bad model for its size and was great at release time.
Anonymous No.106320110
>>106320070
Behemoth was already dropped, permanently
>>106318132
Anonymous No.106320121 >>106320168 >>106320191
What's the best model to run on a 32GB RAM + 12GB VRAM machine? I downloaded QwQ-32B a while ago and it was pretty good, albeit obviously slow (2t/s). I wonder if anything better came out since then, either in terms of speed or intelligence. I use the LLM for solving problems and not casual chatting so output quality is more important than speed.
Anonymous No.106320125
>>106320085
If DeepSeek unironically releases a #1 model openly (at the time of release, and not just a top 5 cope) I will ritualpost my apology at the top of every thread henceforth until 4chan is shut down by the inferior western version of commies.
Anonymous No.106320127
loli manko general
Anonymous No.106320143 >>106320154
when dots vlm gguf step 3 gguf ernie vl gguf
Anonymous No.106320154
>>106320143
32 minutes before you get bored of them.
Anonymous No.106320157 >>106320171 >>106320210
I NEED VRAM! 50 t/s is so sloooow!!!
Anonymous No.106320168 >>106320193
>>106320121
I personally quite like qwen 3 30b a3b 2507. You can try the thinking variant of it.
There's also qwen 3 32b, tho I dont know how those two compare.
Anonymous No.106320171
>>106320157
I only get over 50 with 30b...
Anonymous No.106320183
>>106320022
What faggots don't realize is it's the move to closed source that results in them losing the race
We're already getting to the point where every model is having a mixed toilet orgy at the top with less and less differentiating them. As a rule, people will only pay for a more expensive closed model if it's substantially better than the competition
Once we hit this point, there is no longer a sense of "substantially better". Thus, people default to the cheaper, faster, most convenient models. There's a lot of ways to make that happen with open models, but a painfully finite number of ways to pull that off with closed models. OpenAI is desperately trying to do that by cutting its prices to the bone and eating the losses, but that's unsustainable
Anonymous No.106320191
>>106320121
Qwen 3 30b 2507, although worse than QwQ in most things it runs way faster. Try running it on CPU only first and then try to speed it up with your gpu since it is sometimes faster that way.
Anonymous No.106320193 >>106320199 >>106320202 >>106320206
>>106320168
>qwen 3 30b a3b 2507
I'm not sure if I'm thinking of the same exact model, but I remember trying it and it was indeed faster but tended to get stuck in infinite loops
Anonymous No.106320199 >>106321706
>>106320193
That's the original release, they now released a series of updated 30b models (2507) and they are very good.
Anonymous No.106320202 >>106321706
>>106320193
you may have used the original 30a3, the new 2507 version was just released last month and is a pretty big improvement
Anonymous No.106320206 >>106321706
>>106320193
well, all I can say is that I didnt encounter that issue with it. Also possibly 2507 version fixed it or something.
Anonymous No.106320208
>that's... a lot to think about
>that's... a lot to think about
>that's... a lot to think about
sigh
Anonymous No.106320210
>>106320157
>50 t/s is so sloooow
Unless you're talking about prompt processing, you don't even know what slow is.
Anonymous No.106320428 >>106320454 >>106320570 >>106320704
Sirs... I am afraid to ask about google's next model. Will they redeem Gemini 3 and show that there is still room for improvement or will they hit the wall like everyone else?
Anonymous No.106320454 >>106320462
>>106320428
Hopefully they hit the wall too. It's time to test out new ideas.
Anonymous No.106320459
>>106316681
densesissies LOST
Anonymous No.106320462 >>106320469
>>106320454
Like what? Bitnet? RWKV? Diffusion-based LLM?
Anonymous No.106320469
>>106320462
Yes
Anonymous No.106320491 >>106320674
Dense models are for dense people.
MOE models are for moe :]
Anonymous No.106320495 >>106320507 >>106320512 >>106320555 >>106320575
Why are "vision" models such huge memes? They can't OCR, they can't count, they can't read graphs, they don't recognize famous characters. Hell, some even can't tell a man from a woman! What even is the >usecase for them?
Anonymous No.106320507
>>106320495
>multimodal meme
Anonymous No.106320512
>>106320495
Anonymous No.106320555
>>106320495
>They can't OCR
They're pretty decent at it tho. Even gemma 3 27b is better than any specialized ocr for japanese that I tested (mangaOCR, PaddleOCR, tesseract), not to mention that it can deal somewhat well with handwriting (tho gemini 2.5 flash is much better at this), and they aren't constrained to only one/two languages.
Anonymous No.106320570
>>106320428
nano banana is already showcasing the image edit capabilities of gemini 3, the llm part is going to be even better
Anonymous No.106320575
>>106320495
>They can't OCR, they can't count, they can't read graphs, they don't recognize famous characters
A bunch of them can, they just have 0 fucking support in most frontends and you have to run them with transformers.
Although even some of the ones that are supported are decent, medgemma can diagnose marfan's from a photo and read text accurately from my tests.
Anonymous No.106320586 >>106320610 >>106320640 >>106320676
Asked V3.1-thinking to comment on my code and it gave pretty concrete advice.
Qwen3 235B 2507 gave generic advice but V3.1-thinking told me to use tenacity w/ decorators for API calls, improve loguru logging, add a performance monitor to critical code, reorganize a very long function, add type-hints and docstrings (never gonna do that lol), use a config file for global params etc.
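the tenacity thing is basically a one-liner, for anyone curious (a sketch with made-up endpoint and retry limits):

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=30))
def call_api(payload):
    # retried on any exception with exponential backoff: 1s, 2s, 4s... capped at 30s
    r = requests.post("https://api.example.com/v1/chat/completions", json=payload, timeout=60)
    r.raise_for_status()  # turn http errors into exceptions so they also get retried
    return r.json()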
Anonymous No.106320610
>>106320586
Awesome. I’m starting to run up against the limits of what I can achieve with qwen coder. I’ll throw some hairy stuff at 3.1 once I finish quanting it and see how she goes
Anonymous No.106320640 >>106320646
>>106320586
>V3.1-thinking
where?
Anonymous No.106320646 >>106320717
>>106320640
API and web
Anonymous No.106320674
>>106320491
This. Anyone can see that MoE is the ticket to stuffing as much math reasoning as possible into a single model. The future is beautiful MoEs stuffed to the brim with math and code, as far as the eye can see. Our benchmarks will blot out the sun.
Anonymous No.106320676 >>106320767
>>106320586
Well it has 3x the parameters....
Anonymous No.106320704
>>106320428
Next Gemini will be 2.6 level of improvement
Anonymous No.106320717 >>106320744
>>106320646
wth, they replaced r1-0528
Anonymous No.106320731 >>106320845
Local status?
Anonymous No.106320744
>>106320717
tbf the endpoints were called "deepseek-chat" and "deepseek-reasoner". They never specified which chat and which reasoner. Though I do sympathize with people whose workflow broke because of this
Anonymous No.106320767 >>106320774 >>106320805
>>106320676
By that logic, K2 should be better than both.
Anonymous No.106320774 >>106320951
>>106320767
K2 is not a reasoning model. K2 instruct is arguably better than V3 0324 though.
Anonymous No.106320805
>>106320767
Touché.
Anonymous No.106320845
>>106320731
increasingly chinese
Anonymous No.106320890 >>106320905 >>106320911
doomniggers aside, 3.1 has shorter but better responses imo. you just have to prompt it for longer responses, like /wait/ said.
Anonymous No.106320905
>>106320890
Yeah - it might be the new model smell, but I think DS 3.1 is probably my favorite of the recent Chinese batch even with reasoning disabled
Anonymous No.106320911 >>106320919
>>106320890
Are you talking about RP? I haven't found answers to be shorter. The thinking though is definitely shorter.
Anonymous No.106320919 >>106320940
>>106320911
Is there even a way to use it for RP yet without going through the website?
Anonymous No.106320940
>>106320919
Yes. On the official API, "deepseek-chat" and "deepseek-reasoner" are now V3.1 non-thinking and V3.1 thinking respectively
Anonymous No.106320951
>>106320774
K2 Reasoning any minute now
Anonymous No.106320981 >>106321077 >>106321083
We have the right models for RP, reasoning, and coding.

Is there a model specifically for asking stupid questions and explaining things? Or should I just use a 4B model with web search?
Anonymous No.106321072
Instruct?
Goofs?
Anonymous No.106321077
>>106320981
Gemma.
Anonymous No.106321081 >>106321256
13:18 in China and still no model card
It's OVER
Anonymous No.106321083
>>106320981
I just spam grok for my dumb questions. What's the point of running a 4b model just so it can send your data to google and use it anyways? You're just using a worse llm for no reason.

Granted, maybe there already is one reason. I had Grok help me shop for something and oh boy was it useful. You don't even know if it's an ad. It just looks like an answer.

But whatever, 4b would fall for SEO shit.
Anonymous No.106321086 >>106321111 >>106321157
3.1 at q4 isn't looking great for coding so far...even with neutralized samplers I'm getting repetition, and even when I'm not the output isn't very high quality. Is there some base-model magic I need to use to get it to code properly?
Anonymous No.106321103
>says it's going to do this or that, in its thinking
>final response doesn't include it at all
Anonymous No.106321111
>>106321086
Of course it's repeating anon, it's a base model
Anonymous No.106321157 >>106321197
>>106321086
Base model isn't for you nigga
It's for finetuners and people who want to add new modalities to it
Anonymous No.106321166
>experimenting with a system prompt that asks the model to think about how to use media references in its writing
>try swiping on a certain chat where it's a bit of a noncon scenario
>model thinks of referencing Twilight
Ok maybe I should modify the prompt a bit.
Anonymous No.106321197
>>106321157
>It's for finetuners and people who want to add new modalities to it
And how do I do that? Do I have to beg cudadev to release training code for goofs?
Anonymous No.106321200 >>106321242 >>106321510 >>106321713
>>106318758
https://www.cnbc.com/2025/08/19/sam-altman-on-gpt-6-people-want-memory.html
>"The models have already saturated the chat use case," Altman said. "They're not going to get much better. ... And maybe they're going to get worse."
It's over, llms have reached its peak.
Anonymous No.106321211 >>106321495
So why didn't Qwen succeed with hybrid reasoners
I wonder if it's a model size issue
Anonymous No.106321242
>>106321200
Well, look on the bright side. At least he cut his losses early before making a bunch of impossible claims about what the technology could do to the investors and taxpayers that funded him
Anonymous No.106321256
>>106321081
Maybe they just forgot?
Anonymous No.106321404 >>106321434
Ummm so why can't they make it work like human memory(however that works)?
Anonymous No.106321434 >>106321501
>>106321404
Because the weights don't change.
Anonymous No.106321495
>>106321211
did deepseek succeed though? there is no instruct yet, only their api, who knows whats going on there.
and apparently it has troubles with web searching even if the toggle is off, like gpt5, etc.
also no mememarks out yet besides some redditor screenshots. i would wait before shitting on qwen.
Anonymous No.106321501
>>106321434
i think i saw at least 4-5 papers about some revolutionary technique that updates the weights in real time. kek what a scam.
Anonymous No.106321510 >>106321713
>>106321200
also sam, only 2 weeks ago
https://xcancel.com/sama/status/1953551377873117369#m
>we can release much, much smarter models, and we will
Anonymous No.106321585
>try vision
>describe this picture
>char is depicted with a forced, wide smile but is crying with red tears streaming down
>crying with red tears
Is this pretraining or post-training bias? Why can't it just say blood?
Anonymous No.106321619 >>106321630 >>106321822
where can I get high quality voice samples for tts?
Anonymous No.106321630
>>106321619
VNs
Anonymous No.106321637 >>106321657
Don't tell me you thought 3.1 was a real next release? Please. If only you knew. It's just the smallest of appetizers... No, not even that. It's the cherry on top of the sample slice of a preview of a cake that's been baking a long, long time. And it's just about ready to go. Oh boy, you better be ready.

Local will never be the same.
Anonymous No.106321652 >>106321659 >>106321672
Anonymous No.106321657
>>106321637
>cherry
berry*
Anonymous No.106321659 >>106321668 >>106321673 >>106321712
>>106321652
Why does India get special treatment? What's stopping me from using a VPN and getting the street shitting discount?
Anonymous No.106321660
how long do they let “no model card” marinate?
Anonymous No.106321666
Told ya to grow non drugs
Anonymous No.106321668
>>106321659
Wait until you find out how many special treatments India is getting from the tech mafia (which is now run by Indians).
Anonymous No.106321672
>>106321652
ChatGPT GO to the polls and vote for more STARGATE funding!
Anonymous No.106321673 >>106321688
>>106321659
They're just mass-testing the router, which answers questions perceived as less important with a less powerful model, to save costs
Anonymous No.106321688
>>106321673
Mass murder
Anonymous No.106321706
>>106320199
>>106320202
>>106320206
Tried the new release and it still gets stuck in an infinite loop. No luck.
Anonymous No.106321712 >>106321721
>>106321659
>Why does India get special treatment?
Because they're a huge market, they can't afford the normal pricing, and saltman will still make a profit with the discount pricing
>What's stopping me from using a VPN and getting the street shitting discount?
Nothing, but not every single person who uses the service will go through the trouble so it's still profitable.
Anonymous No.106321713
>>106321200
>>106321510
Are these supposed to be contradictory? He's saying that the chatbot format has already been perfected (a benchmark is "saturated" when models are scoring too close to perfect so it no longer serves to differentiate them) and they're going to get smarter (as agents/coders/researchers/etc.)
Don't get me wrong, he's still full of shit because they've obviously peaked in actual useful use cases as well, as evidenced by GPT-5 failing to improve on o3, but his bullshit is still coherent at least.
Anonymous No.106321721
>>106321712
Why do Indians love AI so much?
Anonymous No.106321724 >>106321749 >>106321783 >>106321794 >>106321858
Why does Google keep giving us gemma when everyone really wants gemini?
Anonymous No.106321749 >>106321755 >>106321760 >>106321780
>>106321724
I look like this
Anonymous No.106321755
>>106321749
L O N D O N
O
N
D
O
N
Anonymous No.106321760
>>106321749
Natty?
Anonymous No.106321780
>>106321749
No, you don't.
Anonymous No.106321783
>>106321724
they're struggling to get gemini 3 pro built
Anonymous No.106321794 >>106321797 >>106321800
>>106321724
Muscles ruin armpit appeal.
Anonymous No.106321797
>>106321794
bad take
Anonymous No.106321800
>>106321794
>I am gay.
Anonymous No.106321807
V3.1 reasoner has different prefilling behaviour compared to R1/R1-0528.
Previously you could prefill "Sure," and it would be put in the thinking block automatically. Now you have to open the thinking block in the prefill yourself ("<think>Sure,"), otherwise the model will skip the thinking block entirely.
Probably broke many workflows yesterday.
Anonymous No.106321808 >>106321837
I'm confused, can llama.cpp do vision? I know you can pass the mmproj file as a param.. is there a way to load that file in gpu?
Anonymous No.106321822 >>106321893 >>106322055
>>106321619
For video game characters
https://www.sounds-resource.com/

You can clean up vocals with BandIt Plus via Music Source Separation Training or moises pro plan if you have the cash, resemble enhance, and Acon Digital DeVerberate 3.

Unaltered audio sample
https://vocaroo.com/1aJflfcUiMrv
https://vocaroo.com/1lAcVEF4jElz

Above audio cleaned up with moises pro plan, resemble enhance, and Acon Digital DeVerberate 3
https://vocaroo.com/15wtxg5Gs8vT
https://vocaroo.com/1kJ5Gx7R3cR1
Anonymous No.106321837 >>106322323
>>106321808
Yes and it goes in GPU by default when you pass it as a parameter, unless you use the "--no-mmproj-offload" flag which would send it to CPU instead - in llama.cpp "offload" means offloading from CPU to GPU, not the other way around like it's usually used
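e.g. something like this (gemma 3 as the example, filenames are whatever your quants are actually called):

llama-server --model gemma-3-27b-it-Q4_K_M.gguf --mmproj mmproj-gemma-3-27b.gguf -ngl 99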
Anonymous No.106321858 >>106321870
>>106321724
I look like this, but male.
Anonymous No.106321870 >>106321889
>>106321858
How are your breasts so big?
Anonymous No.106321889
>>106321870
Pectoral muscle training
Anonymous No.106321893
>>106321822
>rudeness and sarcasm and gonna make me jerk of to orgasm
ez
Anonymous No.106321923 >>106322159 >>106322355
Hybrid reasoners can solve the kangaroo beaver problem
Anonymous No.106321952 >>106321956
okay then, glad we got that out of the way
how is nemotron nano v2?
Anonymous No.106321956 >>106321986
>>106321952
>nemotron nano v2?
Only as good as Qwen3-30B
Anonymous No.106321986
>>106321956
so not better than the old nemo 12b, better known as rocinante v1.1
Anonymous No.106322026
Anonymous No.106322041 >>106322070 >>106322276
I have a hypothesis that the main reason why LLMs have improved is not because they're getting trained on larger amounts of data, but mainly because of benchmark-optimizing dataset filtering, larger training context size and better training hyperparameters. Most of the user-facing improvements have been in post-training. This has been bugging me for a while, now.

Llama 1 was pretrained on 1.4T tokens and 2048 GPUs (no post-training).
Llama 3 was pretrained on 15.6T tokens and 32000 GPUs.

Between Llama 1 and Llama 3 the pretraining data was increased by about 11 times, but the number of GPUs increased by roughly 15 times. This means the newer model saw a similar (or more likely lower, considering the larger context size) number of total training steps, since the effective training batch size is proportional to the number of GPUs used.

The implications of very large batch sizes for knowledge learning at scale aren't fully understood, but you can observe them at a small scale with finetuning. For pretraining, it's generally assumed that the more data, the better, but if training on larger amounts of data requires more GPUs (and thus larger batch sizes), then every weight update will average the gradients of more samples together compared to models trained on less data but also fewer GPUs.

The larger the training GPU cluster, the more every weight update will be an averaged sloppy representation of the pretraining data (many thousands of random samples that might have nothing to do with each other), even before sloppifying the model with carelessly LLM-generated data in post-training.

The models will probably get better at modeling language **on average**, but also learn fewer unique facts, ideas and styles due to this large-batch gradient averaging. GPU training clusters can't be scaled up to 1 million GPUs or more without paradoxically making the models less knowledgeable and sloppier.
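Back-of-the-envelope numbers for the step-count part, with the global batch sizes being assumptions from memory (the LLaMA paper lists ~4M tokens/batch, Llama 3 reportedly ramped up to ~16M), so double-check before quoting:

# rough sanity check: optimizer steps = total tokens / tokens per global batch
runs = {
    "Llama 1": (1.4e12, 4e6),    # (pretraining tokens, assumed batch in tokens)
    "Llama 3": (15.6e12, 16e6),
}
for name, (tokens, batch) in runs.items():
    print(f"{name}: ~{tokens / batch:,.0f} optimizer steps")
# Llama 1: ~350,000 steps; Llama 3: ~975,000 steps.
# More steps in absolute terms, but only ~2.8x more for ~11x the data:
# each individual sample gets averaged into a batch ~4x larger.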
Anonymous No.106322055
>>106321822
fartgasm
Anonymous No.106322070 >>106322095 >>106322121
>>106322041
I work at one of the frontier labs and I can tell you that our internal base models still improve in quality with larger training runs using more data before post-training. Your hypothesis is wrong, but I do agree that post-training needs a new revolution to bring out latent qualities of base models that are honestly getting lost right now.
Anonymous No.106322081 >>106322135 >>106323366 >>106323391
V3.1 has obscure knowledge that Qwen3 235B 2507, GLM 4.5 and Kimi K2 failed to cover.
Anonymous No.106322086
>The models will probably get better at modeling language **on average**, but also learn less unique facts, ideas and styles due to this large-batch gradient averaging. GPU training clusters can't be scaled up to 1 million GPUs or more without paradoxically making the models less knowledgeable and sloppier.
older models were not more knowledgeable, what are you smoking
testing LLMs on things like translation prompts containing really niche terms, I see newer models routinely improve on this
Even smaller models somehow manage to cram in more and more knowledge; Gemma 3n E4B has become so good at translating certain language pairs it's almost large-model SOTA level.
older models were less sloppy, yes, but the modern slop is caused by tons of RLHF
Anonymous No.106322095
>>106322070
>I work at one of the frontier labs
pajeet, nvidia is not a frontier lab
Anonymous No.106322113
>>106317961
Dog, I'm running LLMs with an A770 16GB + 32GB RAM just fine. Spillover is expected. Running everything from NVMe helps quite a bit.

Note that I'm a dumbass that doesn't know what the duck he is doing, but the wait times weren't too bad for me when I tried the newest Cydonia
Anonymous No.106322121
>>106322070
Yes, using more data will improve the models, if the training hardware is kept the same. What's being suggested here is that larger GPU clusters will counteract those improvements. Limiting case: a GPU cluster as large as the number of training samples, where a single weight update averages a gradient over the entire dataset.
Anonymous No.106322135
>>106322081
Try asking them about something actually useful, like the list of characters in Windy Tales.
Anonymous No.106322159 >>106322824
>>106321923
GLM-chan (full) solved it after thinking for 24k tokens.
https://files.catbox.moe/4blrg9.txt
Anonymous No.106322197 >>106322217 >>106322350
We need large dense models again if we want to make any real progress. The "stagnation" we're all feeling is just the active parameter count in MoE models not being large enough to properly express emergent capabilities and deeper levels of understanding.

The only reason MoE is being pushed is to lower inference cost at scale to increase margins, which is insane when you think about it.

Honestly it's Nvidia's fault for restricting the amount of memory their hardware has access to.
Anonymous No.106322217 >>106322359
>>106322197
Large dense models (e.g. o3) aren't economical
Anonymous No.106322276
>>106322041
Related paper from last month: https://arxiv.org/abs/2507.07101

>Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
>
>Conventional wisdom dictates that small batch sizes make language model pretraining and fine-tuning unstable, motivating gradient accumulation, which trades off the number of optimizer steps for a proportional increase in batch size. While it is common to decrease the learning rate for smaller batch sizes, other hyperparameters are often held fixed. In this work, we revisit small batch sizes all the way down to batch size one, and we propose a rule for scaling Adam hyperparameters to small batch sizes. We find that small batch sizes (1) train stably, (2) are consistently more robust to hyperparameter choices, (3) achieve equal or better per-FLOP performance than larger batch sizes, and (4) notably enable stable language model training with vanilla SGD, even without momentum, despite storing no optimizer state. Building on these results, we provide practical recommendations for selecting a batch size and setting optimizer hyperparameters. We further recommend against gradient accumulation unless training on multiple devices with multiple model replicas, bottlenecked by inter-device bandwidth.
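If I'm reading their scaling rule right (an assumption on my part, check the paper before relying on it), the idea is to keep Adam's EMA half-life fixed in tokens rather than in steps, so the betas move toward 1 as the batch shrinks:

def scale_beta(beta: float, old_batch_tokens: int, new_batch_tokens: int) -> float:
    # shrinking the batch multiplies the number of optimizer steps per token,
    # so the per-step decay must slow down to keep the same memory horizon
    # measured in tokens: beta_new^(steps per token, new) == beta_old^(steps per token, old)
    return beta ** (new_batch_tokens / old_batch_tokens)

# e.g. going from a 4M-token global batch to a 512k-token one:
print(scale_beta(0.95, 4_000_000, 512_000))  # beta2: 0.95 -> ~0.9934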
Anonymous No.106322323
>>106321837
hmm I see. I wanted to try joycaption, but I'm thinking it's just better to use it in comfy instead of llama.cpp... in comfy the guy doing the GGUF nodes just added the ability to also automatically load the mmproj, but I need to play around a bit
Anonymous No.106322327
Guys, I've just done something I shouldn't have, but curiosity got the better of me. I've tried Claude over API... It's so over, it has never been this over. Not even the most expensive rig I could get would make it better, because there are no models...
How to deal with this despair?
Anonymous No.106322339
>>106316571
That's how it works
Anonymous No.106322350 >>106322360
>>106322197
>We need large dense models again
trust mistral, maybe they'll even get rid of that retarded swa so we can have full attention
Anonymous No.106322355
>>106321923
gpt-oss 20b can solve it
Anonymous No.106322359
>>106322217
Is that how it is? I thought they were on the MoE train since gpt3?
Anonymous No.106322360
>>106322350
Sorry, too busy distilling from DS
Anonymous No.106322396
open-webui seems to be causing llama.cpp to crash when the model is doing a lot of thinking. It doesn't happen when I'm using the webui of llama.cpp
Anonymous No.106322512 >>106322531 >>106322542 >>106322550 >>106322577
https://github.com/ggml-org/llama.cpp/issues/15268
Anonymous No.106322531
>>106322512
kek
Anonymous No.106322542
>>106322512
what the fuck kek
Anonymous No.106322550
>>106322512
Embrace Extend Extinguish
Anonymous No.106322577
>>106322512
Ok now that llamacpp is blacked what inference engine should I use?
Anonymous No.106322580 >>106322603
>Ernie is a Jew
To the trash it goes
Anonymous No.106322603 >>106322611 >>106322646 >>106322732 >>106322876 >>106322948 >>106323829
>>106322580
I've been thinking of buying a new system for LLMs. Is it even worth it unless I'm using it for work/learning-related stuff? For anything ERP-related, even the larger models are just refusal machines. I don't see how I'd get much enjoyment if every model, regardless of how big and SOTA it is, is censored as fuck.
Anonymous No.106322611 >>106322636
>>106322603
skill issue honestly, I just did a huge coom to k2 (she didnt enjoy it)
Anonymous No.106322636 >>106322642 >>106322687
>>106322611
do you also think not being able to get around qwen and gemma censorship is a skill issue?
Anonymous No.106322642
>>106322636
>qwen and gemma
didnt bother trying these poorfag models sorry, I exclusively nut in V3, K2 and GLM4.5
Anonymous No.106322646 >>106322662
>>106322603
Depends, are you going rammaxxing, buying a pre-built LLM-specialized PC, or the easy modo Mac?
Anonymous No.106322662
>>106322646
would 128gb ram + 3090 be enough?
Anonymous No.106322687 >>106322737
>>106322636
Gemma no, but Qwen yes.
The 235b is uncontrollably horny, and if you're getting refusals there you must be prompting it like a fool.
Anonymous No.106322732
>>106322603
Jamba mini 1.7 was pretty uncensored when I tried it, but it's not exactly sota.
Anonymous No.106322737 >>106322744
>>106322687
If I'm cooming, I want it to be effortless. Why bother spending effort figuring out how to prompt it if there are easier avenues of getting off - without llms.
Anonymous No.106322744 >>106323174
>>106322737
>I want it to be effortless
It unironically is if you're not an idiot, qwen3 doesn't need a jailbreak, literally all it needs is a basic roleplay prompt.
But if even that is too much effort for you, the fuck are you in this thread for? Go over to /gif/ and jerk it to some webms.
Anonymous No.106322824
>>106322159
lmao
Anonymous No.106322876 >>106323010
>>106322603
I'll never understand how you people will spend thousands of $ and just as many hours to get a computer to give you back generated text, when you could easily get a real gf for the same amount of effort.
And it's like the least interesting way to use an LLM. Like really? All the shit you could be doing and you need a loli to shit in your mouth or whatever degenerate thing you are trying to make it do
Anonymous No.106322911 >>106322940
Why isn't deepseek 3.1 on openrouter yet?
They didn't release the weights?
Anonymous No.106322940 >>106323054
>>106322911
Your second question answers the first one, doesn't it?
But yeah, it's kind of weird. It's been almost a full 24 hours since the drop, and for some reason the instruct tune isn't public yet. (But it has been uploaded, since the collection contains 2 items.)
Anonymous No.106322948
>>106322603
i don't think censoredness is such a big problem now, it's more that they are all completely retarded and will eventually break down from their accelerated alzheimers
Anonymous No.106322952
jamba will save local
Anonymous No.106323010
>>106322876
You'll never understand if you've written that in good faith.
Anonymous No.106323017 >>106323025
>3.1 has been out for almost a day
>still no info on it
it's that bad, huh?
Anonymous No.106323025
>>106323017
China... has fallen
Anonymous No.106323054
>>106322940
Last minute safety ablitardation please understand, basic safety and all.
Anonymous No.106323064 >>106323131
It's over
Local is forever DOOMED
Anonymous No.106323082
>load: control token: 128795 '<｜place▁holder▁no▁795｜>' is not marked as EOG

>print_info: BOS token = 0 '<｜begin▁of▁sentence｜>'
print_info: EOS token = 1


Anyone have any idea why I got this after I quanted deepseek?
Anonymous No.106323091
ok, enough testing and benchmarking of standalone models. Fancy a chat about existing frameworks to actually use them at their full capacity with RAG, MCP and agentic workflows (tools)? I feel like there are more hidden open source gems like morphik.ai and jan.ai. Moreover, I wonder if you can just use Roo Code in VSCode as an AI suite (for example, AI deep research) instead of a coding assistant.
To tell it in saar speak, I want to leverage existing frameworks, because building your own is actually a ton of work, even when vibecodemaxxing.
Anonymous No.106323102 >>106323113
Are reasoning models any better at making replies in stories that reflect what's actually going on?
Anonymous No.106323113
>>106323102
Sometimes.
Anonymous No.106323131 >>106323217
>>106323064
OOMIN and COOMIN, never DOOMIN
Anonymous No.106323136
nano banana is going to be gemma4 image-out, isn't it?
Anonymous No.106323158
>>106318517
At least they know all madoka characters are straight
Imagine if they were retarded enough to not know that lol
Anonymous No.106323174 >>106323486
>>106322744
Are you roleplaying some Hallmark scenario? Qwen's safety stops me at every opportunity if I don't handhold it through the scenarios I'm feeding it.
Anonymous No.106323183
My puts are printing
Feels good today
Anonymous No.106323217
>>106323131
This is the one true local mindset.
Anonymous No.106323220 >>106323225 >>106323241
Honestly, V3.1-Base being out and nothing else would be the prime opportunity for finetuners to do something useful for once in their life.
They could make the first instruct tune ever for this model and show off the power of the open model community instead of wasting time and money on the six billionth gemma mistral tune
Anonymous No.106323225 >>106323243
>>106323220
>first instruct tune ever for this model
Pretty sure the first instruct tune ever for the model is whatever DS have on their API
Anonymous No.106323241 >>106323264
>>106323220
>power of the open model community
LOL community instructs have been worse than what the model devs put out since the mixtral era at the least.
Anonymous No.106323243 >>106323258 >>106323266
>>106323225
V3.1 isn't on their API yet, is it? Their documentation still mentions v3-0324 and r1-0528
Anonymous No.106323258
>>106323243
it is, the model changed a few hours after their release on huggingface
Anonymous No.106323264 >>106323292 >>106323548
>>106323241
I agree with you but with just one exception: the tulu instruct tune of llama 3.1 was actually better than the official instruct, much greater instruction following, but it's like watching a normal but healthy person beat down a cripple, Meta sucks
Anonymous No.106323265
>finetuning a 671b model
are you the xitter tourist? no one in the foss community has the necessary compute to finetune models bigger than mistral large.
nous is maybe an exception.
Anonymous No.106323266
>>106323243
Nigga we've been using the new model through the API for a day now. Both endpoints point to the same (hybrid reasoner) model, but the deepseek-reasoner endpoint has a prefill of "<think>"
Anonymous No.106323272 >>106323284 >>106323290
Nemotron Nano V2 gguf support?
Anonymous No.106323284
>>106323272
Just use a calculator
Anonymous No.106323290 >>106323303
>>106323272
no one cares
you can try it there:
https://build.nvidia.com/nvidia/nvidia-nemotron-nano-9b-v2
and see that it's just like any other nvidia model: it's retarded
Anonymous No.106323292
>>106323264
I did say community tunes, IMO Tulu is much closer to corpo than random Dumber and Co, I mean look at their page. https://allenai.org/
Anonymous No.106323303 >>106323313 >>106323330
>>106323290
There's an unpruned unaligned 12B 'base' model as well.
Anonymous No.106323313
>>106323303
Nemo has existed for more than a year.
Anonymous No.106323330
>>106323303
>>unaligned 12B 'base' model
>https://huggingface.co/datasets/nvidia/Nemotron-CC-v2
>synthetic rephrasing using Qwen3-30B-A3B
>using Qwen3-30B-A3B
>30B-A3B
>A3B
Anonymous No.106323366
>>106322081
Nice seems deepseek got a trove of technical manuals to add into the pretrain dataset
Anonymous No.106323391 >>106323402
>>106322081
I might get crucified for saying this, but I'm pretty sure the DS webui searches online or in some massive RAG for info even without the option turned on and without telling you.
Anonymous No.106323402 >>106323494
>>106323391
It does search even without the option, but it doesn't hide it from you normally (I'd like to see the block here)
It's still possible to stop the searching behavior by telling it to not use search as part of the prompt
Anonymous No.106323427 >>106323444
aLoRA seems to be going places
>https://github.com/ggml-org/llama.cpp/pull/15327
Anonymous No.106323444 >>106323563
>>106323427
which one?
Anonymous No.106323472
>>106323459
>>106323459
>>106323459
Anonymous No.106323486 >>106324220
>>106323174
I don't know how that's happening to you, I've had absolutely no problem with very un-pc bdsm on qwen3.
What the hell does your system prompt look like?
Anonymous No.106323494
>>106323402
Anonymous No.106323521
I'm reading anons stating that DS V3.1 replaced not only V3-0324 but also R1-0528.
Is this documented anywhere? I'm not seeing any announcements from DS that validate it.
I wouldn't be surprised; the -chat and -reasoner endpoints now spit out responses that are very similar, at least for rp.
Anonymous No.106323548 >>106323635
>>106323264
https://arxiv.org/abs/2411.15124

> To train our Tülu 3 models, we used between 4 and 16 8xH100 nodes with high speed interconnect. The final 8B model is trained on 32 GPUs for 6 hours and the 70B model was trained on 64 GPUs for 50 hours.
Anonymous No.106323563
>>106323444
lol
Anonymous No.106323635
>>106323548
>32 H100 for 6 hours
That's only like $400 (32 GPUs × 6 hours = 192 GPU-hours at roughly $2/GPU-hour). Well within sane range, especially for the grifters getting free compute and ko-fi money.
Anonymous No.106323785
>>106317253
>Great until all web search requires paying for Cloudflare™ Web Scraping permission per connection established.
I seriously envision them trying this shit.
Anonymous No.106323829
>>106322603
>For anything erp related, even the larger models are just refusal machines
Don't start with rape... work up to it.
Anonymous No.106324220
>>106323486
qwen3 and qwq have worked great for me as well once I made the right kind of card for no censorship. Interestingly, I had more trouble getting them to talk about controversial physics topics than to just go wild in erp.