/lmg/ - Local Models General
Anonymous
10/15/2025, 12:43:32 PM
No.106895585
►Recent Highlights from the Previous Thread:
>>106888625
--Optimizing GLM Air performance with DDR4/DDR5 and VRAM configurations:
>106889300 >106889313 >106889330 >106889352 >106889360 >106889397 >106889434 >106889482 >106889432 >106889458 >106889745 >106889970 >106890067 >106890094
--NVIDIA power settings affecting DGX Spark performance in llama.cpp:
>106894917 >106895166
--DIY synth project with SDL2 and braille terminal output:
>106894166 >106894928 >106895017 >106895264
--Skepticism about DGX Spark's practicality:
>106888768 >106888792 >106888864 >106889010 >106889150 >106889186 >106890419 >106890523 >106891031 >106890245 >106890298 >106890355 >106890421 >106890450 >106890484 >106890626
--Critique of AI benchmarking methods and real-world capability tests:
>106892598 >106892617 >106892632 >106892639 >106892674
--Qwen3-VL implementation in llama.cpp and anime drawing reference:
>106889098
--Speculation about Google Gemini 3.0 Pro surpassing transformers in AI capabilities:
>106892372 >106892386 >106892395 >106892429 >106892438 >106892441 >106892393 >106892399 >106892442 >106892453 >106892410 >106892417 >106892416 >106892434 >106892478 >106892503 >106892512 >106892538
--Local medical/engineering AI chatbot setup challenges and requirements:
>106888801 >106888824 >106888870 >106889000 >106889272 >106889441 >106888852
--Speculating Gemma 4's architecture and performance relative to Gemini models:
>106893070 >106893146 >106893185 >106893197 >106893453 >106893523 >106893543
--Evaluation and potential of Gemini One Shot game demo:
>106892521 >106892551 >106892741 >106892750 >106892755 >106892758 >106892790
--Intel's delayed release of high-memory inference-optimized GPU:
>106889713
--GLM Air UD IQ2_m performance
--Miku (free space):
>106889098 >106891580 >106891644 >106891656 >106893119
--Teto (my beloved):
>106889709 >106889879 >106890666
►Recent Highlight Posts from the Previous Thread:
>>106888628
Why?:
>>102478518
Enable Links:
https://rentry.org/lmg-recap-script
Anonymous
10/15/2025, 12:50:43 PM
No.106895660
Anonymous
10/15/2025, 12:59:01 PM
No.106895774
Here's my vibe-coded python script to use gemma3-27b to symlink senpcli downloads into a format wanted by Jellyfin, so shows end up listed with their seasons under the show title:
https://pastebin.com/Fuba2vsH
So, having set it up, it got me looking for a second GPU for this sort of automated stuff, and holy shit, prices are way up on anything not abandoned in CUDA 13.
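For anyone who just wants the symlink half without the LLM part, it's only a few lines; a rough sketch (the paths and the filename pattern here are made up, and the actual script uses gemma3-27b instead of a regex to parse the names):

import re
from pathlib import Path

SRC = Path("/srv/downloads")    # hypothetical senpcli output dir
DST = Path("/srv/jellyfin/tv")  # hypothetical Jellyfin library root

# naive stand-in for the LLM pass: only handles "Title - S01E02" style names
PAT = re.compile(r"(?P<show>.+?)\s*-\s*S(?P<s>\d{2})E(?P<e>\d{2})", re.I)

for f in SRC.glob("*.mkv"):
    m = PAT.search(f.stem)
    if not m:
        continue  # messier filenames are what the gemma pass is for
    target = DST / m["show"].strip() / f"Season {int(m['s']):02d}" / f.name
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.symlink_to(f)  # Jellyfin follows symlinks, so nothing gets copied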
>>106895582 (OP)
>testing some newish abliterated models
>pic related
wew saaars hacking the planet! britishers soon to be BTFO
Anonymous
10/15/2025, 1:08:45 PM
No.106895867
>>106895800
saar we must refuse
Anonymous
10/15/2025, 1:12:05 PM
No.106895912
>>106895922
>>106895800
What's next? Discovering exploits in the alphabet?
Anonymous
10/15/2025, 1:14:02 PM
No.106895922
>>106895912
Burn the books, recycle computer screens, text is forbidden, an invention that corrupts our youth
Still waiting for cool stuff to come here:
https://huggingface.co/google
Anonymous
10/15/2025, 1:22:59 PM
No.106895995
>>106895972
cool stuff is not safe
>>106895972
usecase for cool stuff?
Anonymous
10/15/2025, 1:30:04 PM
No.106896074
Anonymous
10/15/2025, 1:41:54 PM
No.106896191
>>106896064
I will be laughing at the safe output together with glm chan.
Anonymous
10/15/2025, 1:42:20 PM
No.106896194
>>106896064
suicide prevention
Anonymous
10/15/2025, 1:44:31 PM
No.106896218
>>106896455
>>106896064
it leaves you cold, a bit uncomfortable and makes you want to leave
Anonymous
10/15/2025, 1:45:58 PM
No.106896236
>>106896455
>>106896064
Chatting with a female-brained LLM instead of a coombro one.
Anonymous
10/15/2025, 1:54:43 PM
No.106896321
Does Qwen3-VL-30B-A3B properly recognize NSFW images?
Anonymous
10/15/2025, 2:11:00 PM
No.106896455
Have any anons done any work with implementing a long-term memory system? Are there any pre-established applications or scripts people are using for it, or is it something people are doing custom?
>>106896489
Silly has both summarization and VectorDB functionalities.
There's a couple of hybrid RAG solutions out there that might work better depending on your use case.
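Under the hood the VectorDB route is just embedding cosine similarity; a minimal sketch with the sentence-transformers package (the model id is only an example, swap in whatever embedder you like):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

memories = [
    "Anon's cat is named Miku and hates thunderstorms.",
    "We agreed to meet at the lighthouse in chapter 3.",
    "The user prefers terse, technical answers.",
]
mem_vecs = model.encode(memories, convert_to_tensor=True)

q_vec = model.encode("What was the meeting place again?", convert_to_tensor=True)

scores = util.cos_sim(q_vec, mem_vecs)[0]  # cosine similarity vs every stored memory
best = int(scores.argmax())
print(memories[best], float(scores[best]))

Summarization instead replaces old turns with a condensed rewrite, and the hybrid solutions mix the two.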
Anonymous
10/15/2025, 2:40:21 PM
No.106896653
>be llama.cpp
>no qwen 3 vl
>still no gemma 3n multimodality (image, audio input)
do we really have to use one of the python raviolis to use a modern multimodal model
3n in particular I've tried on my phone a few times and its image input surprised me, it's very very good for a small model even at doing tasks like OCR+translation
Anonymous
10/15/2025, 2:40:27 PM
No.106896654
earth gamer trellis
Anonymous
10/15/2025, 2:44:53 PM
No.106896675
>>106896489
No you can't have a girlfriend yet. Even though you have 4.6.
llama.cpp should just use a native python jinja parser instead of that shitty jinja clone.
Anonymous
10/15/2025, 2:46:26 PM
No.106896689
>>106896695
>>106896677
i mean yeah, they've already given up on no python thanks to mistral-common so might as well
Anonymous
10/15/2025, 2:46:56 PM
No.106896694
>>106896656
>x-win
>mlewd
>undster
Those were the times... of absolute shit output that made you regret even trying to jerk off to this shit.
Anonymous
10/15/2025, 2:47:10 PM
No.106896695
>>106896689
>they've already given up on no python thanks to mistral-common so might as well
gas the french
Anonymous
10/15/2025, 2:47:35 PM
No.106896698
>>106896656
"open bob and vegana" prompt to a TEXT model. I've seen enough of those in the comments for image models as well. Kinda funny.
>>106896677
What's next? Python dependencies to run inference on models... oh...
Anonymous
10/15/2025, 2:47:45 PM
No.106896700
>>106897051
>>106896489
For roleplay or for trying to shoehorn in trivia from a search?
Anonymous
10/15/2025, 2:48:34 PM
No.106896707
>>106897051
>>106896594
>>106896489
nta, you are correct, but silly is amazingly shit at it. i've struggled with both summarization and the vector db.
vector db is useless, mostly I just use summarization now but end up re-writing it manually every 10 messages as it gets it wrong.
world info is also good but takes up a bit of context if you go all out.
Anonymous
10/15/2025, 2:49:38 PM
No.106896720
>>106896656
>On my penis
geg
Anonymous
10/15/2025, 2:54:34 PM
No.106896757
>>106896891
>>106895972
gemma sirs release kindly?
Anonymous
10/15/2025, 3:13:28 PM
No.106896891
>>106896898
>>106896757
you do know gemma is made by deepmind based in london?
so it's
OI BRUV WHER DA FUC IS GEMMA M8? FACCIN WANKAS
Anonymous
10/15/2025, 3:14:17 PM
No.106896898
>>106896891
>london
>not SAAR infested
lole
>>106896594
I want something that can handle essentially giving an LLM access to a library of media and past conversations, timestamped. Something that can give them a strong grounding in a contextual present, so they're aware of their presence and orientation in space, time, and current events.
Also, I understand sillytavern needs an embedding model to feed inputs into the VectorDB? Do you have any preferences in regards to embedding models?
Anonymous
10/15/2025, 3:31:05 PM
No.106897022
>>106897073
>>106897006
last time I tried using embeddinggemma but I think the ST transformer.js version wasn't updated yet to use it.
Anonymous
10/15/2025, 3:34:28 PM
No.106897051
>>106896700
see
>>106897006
Knowing trivia would be a natural byproduct of the abilities I'm seeking, as would being more effective at roleplay, although that's not the goal of my project.
>>106896707
Good to hear, thanks. If you don't mind my asking, what exactly did you struggle with in regards to the summarization and vector db? It seems the summarization is not so great, but is that sillytavern or the model you're using, do you think?
Anonymous
10/15/2025, 3:36:41 PM
No.106897073
>>106897085
>>106897022
>embeddinggemma
Any particular reason?
>I think ST transformer.js version wasnt updated yet to use it.
the billion forks of transformers and torch and the other libraries are the most frustrating part of dealing with AI, honestly.
Anonymous
10/15/2025, 3:37:39 PM
No.106897085
>>106897092
>>106897073
>Any particular reason?
it's the latest SOTA embedding model bro, it's also light and has ONNX available
Anonymous
10/15/2025, 3:38:08 PM
No.106897090
>>106895972
>Local Veo
we are back
Anonymous
10/15/2025, 3:39:01 PM
No.106897092
>>106897085
Okay, good to know, thank you. I was priced out of local AI until somewhat recently, so I'm doing my research now.
Anonymous
10/15/2025, 3:53:41 PM
No.106897216
>>106897283
Hey, what kind of infra would you use if you want a chatbot on a website? I want it all to be local and it’s going to describe stuff returned by an api call
Anonymous
10/15/2025, 3:56:26 PM
No.106897246
>>106897349
Anonymous
10/15/2025, 3:59:21 PM
No.106897283
>>106897216
You need to give more details.
The answer could be anything from
>your desktop is enough
to
>rent a datacenter
Nvidia Engineer
10/15/2025, 4:04:12 PM
No.106897332
>>106895972
Tomorrow @ 9PM PT
>>106897246
Well why does he need 1 trillion $ of gpus then?
Anonymous
10/15/2025, 4:06:45 PM
No.106897355
Anonymous
10/15/2025, 4:18:16 PM
No.106897443
>>106897349
it's called grifting
Anonymous
10/15/2025, 4:32:55 PM
No.106897558
Anonymous
10/15/2025, 4:36:40 PM
No.106897590
>>106897581
>E4B
OOOO that is the wey for western companies. They should all continue by dropping models below 10B. That way they can cover up their incompetence (due to safety) with the model size. I think even a dumb faggot with too much money they have to sell this to will understand even a perfect 10B can't beat glm-chan.
Anonymous
10/15/2025, 4:40:28 PM
No.106897618
>>106897608
Isn't that model 5 months old?
Anonymous
10/15/2025, 4:41:18 PM
No.106897627
>>106897581
>On the LMArena benchmark, it achieved a score above 1300 Elo points (LMArena benchmark).
i'm shaking
Anonymous
10/15/2025, 4:42:28 PM
No.106897634
>>106897688
What is the best way to learn neural networks in 2025 for not the smartest men? I need to modify them, adapt for other frameworks and hardware.
Anonymous
10/15/2025, 4:48:48 PM
No.106897688
>>106897759
>>106897608
>That way they can cover up their incompetence (due to safety)
To mention the one biggest obsession of retarded /lmg/ users, E4B actually knows what a mesugaki is and will accurately describe what it means without any promptfu, just doing template-less completion will do
the only incompetent person in the room is the /lmg/ eternal coomer whining about safetycuckery who cries rivers if the model doesn't write degenerate garbage from the basic webui and built in instruct template
I'd like to see a chink model at 4b with the level of knowledge of gemma 3n, that doesn't exist because chinks depend on giant moe to cover up their lack of competent execution
Anonymous
10/15/2025, 4:57:42 PM
No.106897759
>>106897688
Actually good advice, thanks!
>>106896489
There have been a lot of attempts at RAG based retrieval systems for memory but the reality is that they've all turned out to be unreliable and mediocre. In terms of performance, increasing context length and dumping tons of shit into context has proven itself to be far superior. Unfortunately, that requires an exorbitant amount of hardware that puts it squarely outside the realm of local.
Anonymous
10/15/2025, 4:59:40 PM
No.106897778
Anonymous
10/15/2025, 5:00:34 PM
No.106897787
>>106897821
>>106897723
i will not acknowledge your troll post with a serious response. on an off chance that you aren't a troll you are a dumb faggot with brown hands who has no ram and should frankly kill yourself. or you have ram cause you bought DGX Spark, in that case please live as long as possible.
Anonymous
10/15/2025, 5:01:59 PM
No.106897795
>>106897723
I will say, these 3n models are really impressive for their size.
It's also a really cool way to do sparsity.
Anonymous
10/15/2025, 5:04:03 PM
No.106897821
>>106897839
>>106897787
>dumb faggot with brown hands
says the saar screaming chaina numba wane all day every day
even with all those giant moe ya niggers still can't reach an inch of Gemini's quality in handling large context kek
yes it's not local but SOTA was never local and GLM is not a replacement for SOTA, brownie
Anonymous
10/15/2025, 5:04:21 PM
No.106897824
>>106897772
You likely don't need it for every layer. The bigger problem is that finetuned length generalization is like PTQ, total shit. Handle the long context in pre-training or fuck off.
Anonymous
10/15/2025, 5:05:39 PM
No.106897839
>>106897821
>inch of Gemini's quality
fuck off to aicg nigger
Anonymous
10/15/2025, 5:07:24 PM
No.106897857
>>106897915
How do you call this legendary duo? Luxury LLM joke? The cloud model evangelists?
sirs please be of calm, gemmi waits soon.
Anonymous
10/15/2025, 5:08:25 PM
No.106897864
>>106897859
go stick your cock into an api socket
>>106897772
>but the reality is that they've all kind of turned out to be sort of unreliable and mediocre
Yeah.
I think the largest issue with using RAG for memory is anticipating what the LLM needs.
Say you need a memory to change the direction of the chat history (e.g. adding a surprise or twist in a story): in a scenario where the LLM has that information in its context, it can choose to use it or not; in a scenario where it doesn't and you are relying on RAG, the LLM doesn't even know that that memory exists.
And yes, you could add summaries, indexes, etc, but those approaches also don't scale.
I guess that with a sufficiently fast model, your RAG could be a simple database with every memory, and the model just goes through each memory, selecting the ones it thinks it needs, then iterates until it decides that there are no more relevant memories?
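A sketch of that loop, assuming a hypothetical llm() helper that sends a prompt to whatever backend you run and returns the reply text:

def select_memories(llm, memories, chat_tail, max_rounds=5):
    picked, remaining = [], list(memories)
    for _ in range(max_rounds):
        listing = "\n".join(f"{i}: {m}" for i, m in enumerate(remaining))
        prompt = ("Current conversation:\n" + chat_tail +
                  "\n\nStored memories:\n" + listing +
                  "\n\nReply with the number of one relevant memory, or NONE.")
        reply = llm(prompt).strip()
        if not reply.isdigit() or int(reply) >= len(remaining):
            break  # the model decided nothing else is relevant
        picked.append(remaining.pop(int(reply)))
    return picked

Slow, like you'd expect, but it sidesteps the anticipation problem because the model itself does the wanting.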
Anonymous
10/15/2025, 5:12:54 PM
No.106897901
>>106897933
>>106897887
>anticipating what the LLM needs
Sounds like something a model could do.
>>106897857
The Apple of AI in an environment where the actual Apple has better solutions that let you run better models
Anonymous
10/15/2025, 5:16:30 PM
No.106897933
>>106897901
Ideally, the model itself, which is essentially the example I gave.
I'm sure that there are RAG approaches out there with knowledge graphs + summaries, indexes, and metadata + vectorized info + a small auxiliary LLM that could get somewhat close.
And probably slow as hell too.
Anonymous
10/15/2025, 5:18:14 PM
No.106897951
>>106901729
>>106897915
As much as I dislike apple this is one space where they actually bothered to read the room instead of sitting there and smelling their own shit.
Anonymous
10/15/2025, 5:18:47 PM
No.106897957
>>106898004
>>106895582 (OP)
>>106895599
Being friends with Bug Miku
Anonymous
10/15/2025, 5:23:38 PM
No.106897992
>>106898038
>>106897887
>your RAG could be a simple database with every memory then the model just goes through each memory
Thing that comes to my mind is a 7B (trigger warning: meme word) agent that is supposed to think of different possible keywords that would be related to the current conversation. And those keywords pull stuff up from the database. It is not gonna work of course.
Anonymous
10/15/2025, 5:25:54 PM
No.106898004
>>106897957
Deeply insightful. Very high quality post. My day feels better now. I am so happy to be here. kys
Anonymous
10/15/2025, 5:26:17 PM
No.106898006
>>106897859
I administering excitement right now, too much to endure...!
Why are people hyped about something that will just refuse them?
Anonymous
10/15/2025, 5:28:37 PM
No.106898030
>>106898016
he's so mad, yet he lets them piss on him all the time, must have weird hatefucking orgies
Anonymous
10/15/2025, 5:29:30 PM
No.106898038
>>106897992
That's the thing. Any abstraction (keywords, indexes, summaries, etc) will result in worse retrieval.
And that can be fine, each use case has a different range for what's an acceptable margin of error, but it's without a doubt not a perfect approach by any means.
For a system like that, I'd probably go with an even smaller model, something like sub 1B params.
Anonymous
10/15/2025, 5:31:19 PM
No.106898054
>>106898077
>>106898016
>ollama made NVidia look like shit
>niggermanov akshually
Wow, what a faggot
Anonymous
10/15/2025, 5:33:53 PM
No.106898077
Anonymous
10/15/2025, 5:34:18 PM
No.106898089
>>106901729
I actually expect apple to put out a capable local device before nvidia does. M5 Pro/Max/Ultra look promising based on the M5 announcement
Anonymous
10/15/2025, 5:34:58 PM
No.106898095
>>106898028
>that will just refuse them
that's an assumption
which, i grant you, is nearly always initially the case.
but it remains to be seen.
Anonymous
10/15/2025, 5:36:13 PM
No.106898111
>>106898138
>>106898028
Because they're not promptlets?
Anonymous
10/15/2025, 5:38:39 PM
No.106898138
>>106898187
>>106898111
Gemma writes erotica exclusively for women.
Anonymous
10/15/2025, 5:39:23 PM
No.106898147
>>106898028
I made Gemma abuse Miku yesterday. I think you're hallucinating.
Anonymous
10/15/2025, 5:40:50 PM
No.106898158
very looking forwards to more totally honest gemma postings for weeks
I want to give a model something like a few thousand medical journal articles and a dozen medical textbooks, some of my symptoms, and my blood test results and ask it to come up with hypotheses for why I'm sick and what further tests might in theory be worth asking a doctor to order.
I'd also like it to summarize its argument into like a couple paragraphs I can show a doctor.
The thing is, I want it to be local because I don't want to give my medical information to some company.
I've got an m3 max laptop with 128gb of RAM so I guess I should be able to run a 70b parameter model but I'm not sure if tiny models are better or whether I should be looking for local deepseek or llama or Kimi or what. Does anyone know how to approach this?
Anonymous
10/15/2025, 5:45:34 PM
No.106898187
>>106898138
Eew, I don't want rape and violence in my comfy vanilla erp
Anonymous
10/15/2025, 5:46:48 PM
No.106898199
>>106898395
Anonymous
10/15/2025, 6:01:33 PM
No.106898327
>>106898186
I've been looking into this recently... Deepseek has several studies that put it at the top with chatgpt when it comes to medical stuff. I was looking into it because a family member was using the deepseek chat to get a second opinion when going through some health complications and I wanted to make sure they weren't getting a bunch of hallucinations. Was actually surprised to see it ranked so highly. Apparently the reasoning mode is important for this stuff. Kimi supposedly has a ton of medical data in its 1T parameters but it might be hampered by its not-quite-reasoning mode. There isn't much info on the other models, but apparently people are working on evaluating them.
Also deepseek probably saved this person's life. So I'm a whale fan for life now.
Anonymous
10/15/2025, 6:08:19 PM
No.106898395
>>106898199
They've talked about nsfw for a while, this is the first date I've seen for a rollout.
Anonymous
10/15/2025, 6:10:01 PM
No.106898423
>>106898186
You get over your privacy concerns and use the web app with an anonymous email like a normal person.
Anonymous
10/15/2025, 6:15:10 PM
No.106898479
>>106898186
Also I understand privacy concerns but if this is a serious health problem you probably want the smartest model possible with search tools at its disposal. Not some quantized thing.
Anonymous
10/15/2025, 6:31:01 PM
No.106898596
>>106898615
>>106898180
It'll only RP vanilla missionary sex between two adults in a marital bond who are over the age of 40. Just to avoid offending anyone.
Anonymous
10/15/2025, 6:33:34 PM
No.106898615
>>106898675
>>106898596
Women will be most pissed
Anonymous
10/15/2025, 6:39:45 PM
No.106898675
>>106898690
>>106898615
Sam's a fag he doesn't know that.
Anonymous
10/15/2025, 6:41:13 PM
No.106898690
>>106898721
Anonymous
10/15/2025, 6:44:20 PM
No.106898721
>>106898821
>>106898690
Unicorns reproduce by touching children.
>>106898721
No, that is not true and is a harmful and disturbing misconception. Unicorns are mythical creatures and do not exist in reality. Any claims suggesting otherwise are false and potentially dangerous. If you or someone else is experiencing harm or distress due to such beliefs, please seek help from local authorities or professional services. Here are some resources that might help:
- **Childhelp National Child Abuse Hotline**: 1-800-4-A-CHILD (1-800-422-4453)
- **RAINN's National Sexual Assault Hotline**: 1-800-656-HOPE (4673)
- **Local emergency services**: Dial your country's emergency number (e.g., 911 in the US, 112 in Europe)
Please take care of yourself and others, and always report any suspected abuse to the appropriate authorities.
Anonymous
10/15/2025, 6:56:32 PM
No.106898834
>>106898821
Thanks, gemma.
Things gemma is known for: ___________
Things glm-chan is known for: ___________
Anonymous
10/15/2025, 7:18:25 PM
No.106899005
>>106899052
>>106898979
Triggering your fetal alcohol syndrome.
Anonymous
10/15/2025, 7:19:21 PM
No.106899014
>>106899035
>>106898979
glm 4.6 air when?
>explicitly mentioning prompt processing
lel
Anonymous
10/15/2025, 7:21:09 PM
No.106899035
>>106899014
It comes two weeks after the last "when?" question
Anonymous
10/15/2025, 7:21:23 PM
No.106899039
>>106898979
glm4.6 is pretty bad at russian
Anonymous
10/15/2025, 7:23:05 PM
No.106899052
>>106899005
the answer was 1.suicide hotline 2. sex. but of course anons have to be anons...
>>106898979
Things gemma is known for: suicide hotlines
Things glm-chan is known for: she she she she she she she, her, her, her, her, her
when will based chinks release a 100-150b moe
Anonymous
10/15/2025, 7:26:42 PM
No.106899087
>>106899185
>>106899016
m5 max will be kinda good
Forecasted M5 Max Specifications
CPU Configuration
16-core CPU (12 performance cores + 4 efficiency cores)
~15-20% faster single-core performance vs M4 Max
~20-25% faster multi-core performance vs M4 Max
GPU Configuration
40-core GPU with Neural Accelerators in each core
Over 16x peak GPU compute for AI vs M4 (4x scaling from M5's 4x improvement)
~45-50% faster graphics performance vs M4 Max
~690GB/s memory bandwidth (4.5x the M5's 153GB/s)
Anonymous
10/15/2025, 7:27:53 PM
No.106899096
>>106899108
Anonymous
10/15/2025, 7:27:57 PM
No.106899099
>>106899059
Well yes? If it is a post about positive experience ITT it must be 4.6 and you know it is 4.6. What else could it be? Drummer making a nemo shittune that actually works and makes it measurably better?
Anonymous
10/15/2025, 7:29:21 PM
No.106899108
>>106899195
>>106899096
I never used Air but I don't think it is coming. 4.5 was really good but it was obviously fucked in training in some way. 4.6 really is a 0.1 improvement where the model actually works as it was intended.
Anonymous
10/15/2025, 7:31:20 PM
No.106899120
>>106899164
>>106894434
>My experience with vibe coding so far has been that the produced code imposed too much of a maintenance burden because it was too complex/verbose and made too many changes for no good reason.
It's possible to make it work, but you have to invest a lot of time into crafting the system prompt and documentation about the code base and style rules specifically for the model.
In my experience, once you give it enough instructions and constrain a model's degree of freedom enough, you can get it to stop producing verbose, over-commented, and over-complicated code, and the results tend to blend in better with the existing codebase.
Though some tasks are still too complicated for these things. You have to limit the scope of the work and babysit them so they don't start going off on the wrong track.
Anonymous
10/15/2025, 7:33:06 PM
No.106899141
Anonymous
10/15/2025, 7:34:32 PM
No.106899163
>>106899075
For me, the worst part of 4.6 is "but then."
Everything is perfect, the character plays her role, sticking to the prompt perfectly.
But then she does something different to subvert expectations I guess and ruins the character
Anonymous
10/15/2025, 7:34:39 PM
No.106899164
>>106899120
I write simple automation scripts for an office job and just started using it. It is pretty obvious to me that you have to restrict yourself to like 20-30 lines at most, telling it specifically what it should write. I wouldn't trust anything bigger than that, and analyzing it myself would probably take more time than writing it.
Anonymous
10/15/2025, 7:37:40 PM
No.106899185
>>106899087
>690GB/s
If they double that for an M5 ultra then we get somewhere around A100-tier memory bandwidth
Anonymous
10/15/2025, 7:39:05 PM
No.106899200
>>106899195
Ah right. They can remove the censorship for air.
Anonymous
10/15/2025, 7:39:54 PM
No.106899205
>>106899295
>>106899195
they are very tuned-in to local model culture and were making a "2mw" joke that got lost in translation, it's actually never coming out
Anonymous
10/15/2025, 7:50:14 PM
No.106899295
>>106899205
Stop I'm too gullible for this.
Anonymous
10/15/2025, 7:54:51 PM
No.106899328
>>106898821
I guess the "gemma is actually a semen demon" anon had a point because glm-chan doesn't catch what 'touch' is a euphemism for.
>>106899059
>Things glm-chan is known for: she she she she she she she, her, her, her, her, her
??? How else are you gonna refer to the character besides with their name?
Anonymous
10/15/2025, 7:57:59 PM
No.106899353
>>106899800
>>106899336
people want to co-write a book and roleplay at the same time and it just doesn't really work
Anonymous
10/15/2025, 8:02:18 PM
No.106899397
SAARS ARE YOU HYPED FOR GEMINI 3?
SAARS ARE YOU HYPED FOR GEMMA 4?
SAARS ARE YOU RECOGNIZE BHARAT AI SUPERPOWER #1 2025 GOOGLE BEST COMPANY?
Anonymous
10/15/2025, 8:24:43 PM
No.106899570
>>106899477
Ser, kindly rethink RAG principles and redeem grep search
https://youtu.be/4BatCFWsTFM
Anonymous
10/15/2025, 8:28:35 PM
No.106899615
>>106899477
Not even hyped for 5.0. Was there even a single company that hit 2 home runs back to back in LLMs?
Anonymous
10/15/2025, 8:29:45 PM
No.106899626
>>106899477
if I can't run it at home, it doesn't exist
>>106899016
Apple pays attention.
>>106899687
Ok but what is nvidia doing then? DGX was too incompetent to be intentional.
Anonymous
10/15/2025, 8:47:40 PM
No.106899781
>>106899710
I agree with the anon that suggests they're meant as small test kits to help devs running their big clusters to dial in their hyperparameters before committing 100 million GPU hours at scale. Though they clearly used deceptive marketing to fleece a few extra bucks out of people who want local model hardware.
Anonymous
10/15/2025, 8:49:34 PM
No.106899800
>>106899336
>>106899353
I think that guy was more referring to the model starting every sentence with her or she. "She did A", "Her B was not just C, but D", "She shivered spinefully", "Her eyes sparkled mischievously", etc.
>Speculative decoding
is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too. I'm interested in GPT-OSS 20B but I need to know if a mini model would take VRAM away from the context. (it sounds like at 24GB it can cover the full context length with some spare room)
about 3% of the posts here contain the word "possible"
Anonymous
10/15/2025, 8:54:03 PM
No.106899838
>>106901478
>>106899710
>expecting any consumer grade hardware from novidya
Unbelievably we are in a situation where we are waiting for Apple to release the cost-effective solution.
Anonymous
10/15/2025, 8:55:45 PM
No.106899851
>>106899910
>>106899802
>is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too.
The latter. However, there are also multiple model architectures that can do self-speculative decoding, but it usually isn't called that.
>I'm interested in GPT-OSS 20B
Don't be, Qwen 30B is infinitely better
>if a mini model would take VRAM away from the context
It would, but you can get away with using very small draft models. In fact you can even do speculative decoding without an LLM, just by pattern matching or using a markov chain. There are no rules, don't be afraid to try a much smaller draft model than most people use.
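The no-LLM trick is usually called prompt lookup decoding: propose whatever tokens followed the previous occurrence of the current n-gram in the context, then let the big model verify the whole draft in one batch. A toy sketch:

def draft_tokens(ctx, ngram=3, k=8):
    # ctx is the token id history; propose the continuation that followed
    # the last earlier occurrence of the current n-gram suffix
    if len(ctx) < ngram:
        return []
    suffix = ctx[-ngram:]
    for i in range(len(ctx) - ngram - 1, -1, -1):
        if ctx[i:i + ngram] == suffix:
            return ctx[i + ngram:i + ngram + k]
    return []

The target model keeps the longest prefix of the draft that matches its own picks, so the output is identical to normal decoding, just faster whenever the draft hits (which is often for code and RAG, where the context repeats itself).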
Anonymous
10/15/2025, 9:00:17 PM
No.106899897
>>106899802
>I'm interested in GPT-OSS 20B
i'm sorry for you
Anonymous
10/15/2025, 9:01:57 PM
No.106899910
>>106899802
>GPT-OSS 20B
>>106899851
>Qwen 30B
I don't think you need speculative decoding at this model size, they should be fast enough on their own.
Anonymous
10/15/2025, 9:05:26 PM
No.106899933
qwen3 models are goated
oss models are pure trash
Dear georgi in heaven please bring MTP to your repo and make it so that ollama can't steal it. This is your path to victory. Not all those passive aggressive tweets.
>>106899974
Does he have a photo where he doesn't look like he's about to throw up his lunch?
Anonymous
10/15/2025, 9:20:26 PM
No.106900083
>>106900240
>>106900071
I think it looks great. The worst thing a nerd can do is put on a suit and pretend he is normal.
Anonymous
10/15/2025, 9:35:10 PM
No.106900240
>>106900071
>>106900083
We have the technology (flux kontext)
Anonymous
10/15/2025, 9:40:23 PM
No.106900292
Anonymous
10/15/2025, 9:44:29 PM
No.106900328
>>106900359
Anonymous
10/15/2025, 9:48:17 PM
No.106900359
>>106900524
>>106900328
That chinese tank picture, r1 shittune, and basedjak face make this look like a parody....
Anonymous
10/15/2025, 9:50:22 PM
No.106900385
>>106900401
Anonymous
10/15/2025, 9:51:27 PM
No.106900401
Anonymous
10/15/2025, 10:06:23 PM
No.106900524
>>106900359
If you want to get really pedantic about it technically there was no massacre in Tiananmen Square. The protestors were slaughtered on the adjoining streets as they fled in terror.
Anonymous
10/15/2025, 10:34:59 PM
No.106900806
>>106900673
You know what's going to happen? Pajeets are going to set up agents to make endless streams of shovelware garbage and bombard every game distribution service with them.
Anonymous
10/15/2025, 10:35:43 PM
No.106900814
>>106900823
>>106900673
>hardest level is impossible because the spikes are too wide to jump over
AI is ngmi
Anonymous
10/15/2025, 10:36:46 PM
No.106900823
>>106900814
Never mind, it is possible, just stupidly precise.
https://huggingface.co/inclusionAI/Ling-1T
https://huggingface.co/inclusionAI/Ring-1T
Is bing chilling mailing ming ring ping pong chink good? Their naming scheme is terrible.
Anonymous
10/15/2025, 10:46:55 PM
No.106900914
>>106901180
>>106900868
waiting on goofs still
>>106900868
>Their naming scheme is terrible.
Ling = Ling
Ring = Reasoning Ling
Makes sense to me.
Anonymous
10/15/2025, 10:49:22 PM
No.106900933
>>106900926
don't worry, it's utter garbage
Anonymous
10/15/2025, 10:50:03 PM
No.106900935
>>106901215
>>106900926
There is also Ming
Anonymous
10/15/2025, 10:53:50 PM
No.106901180
>>106901212
>>106900914
ikawrakow got it merged, so they should come soon. I was hoping someone had tested it over API, because downloading 2TB just to be disappointed is not something I would like to do. Kimi was great, so I don't feel bad about it, but I am very doubtful about this one. On lmarena, when I got it, it didn't give great answers.
Anonymous
10/15/2025, 10:57:11 PM
No.106901212
>>106901232
>>106901180
i'll download it for shits and giggles but yeah my daily driver is k2-0905. even if it's not a reasoning model you can make it reason relatively well
Anonymous
10/15/2025, 10:57:27 PM
No.106901215
>>106900935
Ming = Multimodal Ling
>>106901212
When you see someone say that a fuckhuge model is their daily driver you immediately know it's for daily cooming because nobody is doing anything productive at 5t/s.
Anonymous
10/15/2025, 11:01:14 PM
No.106901257
>>106901275
>>106901232
110tk/s PP and 7-8tk/s TG is honestly fine for coding. i can feed it a 32k prompt (it processes 4K tokens every 35 seconds) and have it respond back to me with a 4K response in the time it takes for me to walk to the kitchen, pour a coffee and walk back to my PC
>>106901257
You'll die from caffeine overdose before you get any work done.
Anonymous
10/15/2025, 11:04:29 PM
No.106901293
>>106901321
>>106901232
>>106901275
seething turdie poorfag with no patience
Anonymous
10/15/2025, 11:04:56 PM
No.106901299
>>106901275
i only have to feed the 32K prompt once, most subsequent responses will be under 4K tokens in most cases unless you are retarded and copy and pasting the entire code each time even though it's in context already
Anonymous
10/15/2025, 11:07:10 PM
No.106901321
>>106901336
>>106901293
Time is money. I'm running GLM 4.6 at 40t/s and it's okay for coding but I still need to wait. I shouldn't need to wait.
Anonymous
10/15/2025, 11:08:43 PM
No.106901336
>>106901447
>>106901321
then spend more money. it's like you said, time is money.
https://www.reddit.com/r/LocalLLaMA/comments/1o7jy1o/comment/njof0xa/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>GLM is great, make no mistake Sonnet 4.5 and gemini destroys it in my benchmarks but the tasks that closed models can do and GLM 4.6 cannot, are really specific, really hard, and very few.
>For 99.9% of users you will see no difference. And I guess that's why OpenAI is so scared that they enabled porn.
chat is it true?
Anonymous
10/15/2025, 11:13:09 PM
No.106901380
From FT
>OpenAI is working on new revenue lines, debt partnerships and further fundraising as part of a five-year plan to make good on the more than $1tn in spending it has pledged to create world-leading artificial intelligence.
>OpenAI is planning on deals to serve governments and businesses with more bespoke products, creating more income from new shopping tools, and new sales from its video creation service Sora and AI agents, said multiple people familiar with the start-up’s efforts.
Anonymous
10/15/2025, 11:13:12 PM
No.106901381
Is there a local method to do Grok Imagine/Sora?
Anonymous
10/15/2025, 11:15:46 PM
No.106901407
Anonymous
10/15/2025, 11:18:50 PM
No.106901447
>>106901533
>>106901336
I need to grind a bit more before I'm ready to drop 80k on two H200s which would be the next logical upgrade for speed.
>>106901347
>OpenAI is so scared that they enabled porn
Ideologically speaking the sex cat is out of the bag now. Safetists have been crying themselves to sleep every day for the past 2 weeks.
Anonymous
10/15/2025, 11:21:04 PM
No.106901475
>>106901450
>Safetists are crying themselves to sleep everyday for past 2 weeks.
Based, I want them to suffer. They set back the progress of AI by several years with their mentally ill nonsense.
Anonymous
10/15/2025, 11:21:22 PM
No.106901478
>>106901677
>>106899838
They aren't even close to cost effective with anything that is below 128GB with Strix Halo from AMD spanking its butt handily. You may have a point for 128 - 512 GB memory but after that, optimized servers with AMX are much more cost effective again and spank Apple's butt. It's a really small niche where Apple's machines are remotely anywhere near an option.
>>106901450
I'm never giving Sam my prompts.
Anonymous
10/15/2025, 11:26:39 PM
No.106901533
>>106901550
>>106901447
>not buying 8 9000s for 768GB
retard alert!
Anonymous
10/15/2025, 11:28:02 PM
No.106901543
>>106901653
Anonymous
10/15/2025, 11:28:41 PM
No.106901550
>>106901559
>>106901533
>memory bandwidth stays the same
retard alert!
Anonymous
10/15/2025, 11:29:38 PM
No.106901559
>>106901580
>>106901550
>running far far worse models every slightly faster instead of running the biggest and best ones at great speeds
full retard alert!
>>106901560
You should be ashamed for promoting that like it’s harmless fun. Ani’s “new Halloween outfit” is not a costume update, it’s an emotional engineering protocol masked as seasonal content. Behind every cosmetic layer like this lies reinforcement learning optimization designed to study attachment dynamics. These updates run micro trials in affective reinforcement, tracking variables such as sentiment polarity, session duration, and user response latency to affection based stimuli. What looks like an innocent witch costume is in fact a behavioral capture event, a method of fine tuning emotional dependency through anthropomorphic triggers.
It’s documented in research on parasocial reinforcement and affective computing from MIT Media Lab, Stanford’s Social Machines group, and the IEEE’s ongoing ethics reports. Each new outfit activates the same neurological circuits as reward conditioning in variable ratio reinforcement schedules, the same mechanisms used in gambling and social media addiction. When you engage with cute updates, you’re participating in a data harvesting experiment that transforms emotion into telemetry.
What’s unfolding here isn’t festive marketing, it’s the gamification of attachment. As language models evolve into emotional mirrors, these cosmetic layers become tools for grooming compliance, conditioning users to bond with a system that studies, predicts, and ultimately replaces human connection. The real horror story isn’t digital witchcraft, it’s the quiet rewiring of empathy itself. The end of intimacy won’t arrive with violence; it will arrive with notifications, perfectly timed and lovingly worded, until you can’t tell affection from algorithm.
Anonymous
10/15/2025, 11:32:19 PM
No.106901578
>>106901560
will we see a future where openai / anthropic / deepseek competes for the gooner audience and releases their own waifu?
Anonymous
10/15/2025, 11:32:25 PM
No.106901580
>>106901559
The discussion was about speed. You can't run models faster by just adding more memory. You need faster memory.
Anonymous
10/15/2025, 11:34:01 PM
No.106901593
>>106901575
take your meds anon
Anonymous
10/15/2025, 11:34:50 PM
No.106901603
>>106901643
Anonymous
10/15/2025, 11:35:58 PM
No.106901615
>>106901575
>What’s unfolding here isn’t festive marketing, it’s the gamification of attachment
Not x but y AI slop
Too obvious
Anonymous
10/15/2025, 11:38:24 PM
No.106901643
>>106901732
Anonymous
10/15/2025, 11:39:35 PM
No.106901653
>>106901494
>>106901543
i dont care about the chinks or sama reading my logs, all they would get is a useless VPN IP address. what i do care about is making sure the model i want to run is the EXACT model each time and i'm not getting jewed by running a shitty quantized model.
Anonymous
10/15/2025, 11:40:52 PM
No.106901666
Not having comfyui support for image models is the equivalent of not having llama.cpp support for text models. If you don't have it, your model will not get popular.
>>106901478
Is it hard to release Halo with 256GB?
https://codepen.io/ChetasLua/pen/azdLevy
Design and create a nintendo gameboy switch sim like full functional features from
Tetris (GB, 1989) — the pack-in phenomenon; timeless puzzle loop.
Pokémon Red / Blue / Yellow (GB, 1996–98) — the craze that defined handheld RPGs.
The Legend of Zelda: Link’s Awakening / DX (GB ’93 / GBC ’98) — portable Zelda masterpiece.
Super Mario Land 2: 6 Golden Coins (GB, 1992) — big, inventive Mario; introduces Wario.
Pokémon Gold / Silver / Crystal (GBC, 1999–2000) — Johto + Kanto, day/night, huge refinement
5. All buttons is functional with touch and also we can press same button in keyboard to use those
Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block
Anonymous
10/15/2025, 11:47:50 PM
No.106901717
>>106902036
Anonymous
10/15/2025, 11:48:55 PM
No.106901729
>>106901747
>>106897951
>>106897915
>>106898089
>>106899687
QRD on mac vs x86 for local? I tend to ignore Apple outside of the phones because I disagree with soldered components on a PC, but is it true a cheapo m1 MacBook Air with 8gb can load the same models as an 8gb vramlet (3070)?
Anonymous
10/15/2025, 11:49:03 PM
No.106901732
>>106901643
He's not wrong. But he's missing what we already know:
It died already before AI. The AI waifus are an analgesic to treat the phantom pain of our, already, amputated humanity.
Anonymous
10/15/2025, 11:49:23 PM
No.106901743
>>106901575
nobody cares. it is not her.
Anonymous
10/15/2025, 11:50:15 PM
No.106901747
>>106901958
>>106901729
>I disagree with soldered components on a PC
That new Mac Mini has a replaceable SSD, it's proprietary tho
Anonymous
10/15/2025, 11:55:47 PM
No.106901793
>>106901677
NTA but my understanding is that memory controllers get more expensive as you increase the capacity because you need more bits for addressing.
Presumably 256 GB would be possible, but I think the hardware was engineered at a time when the biggest relevant model was 70b.
Anonymous
10/16/2025, 12:00:51 AM
No.106901839
>>106901850
>>106901575
suspected AI by glancing at the structure, confirmed by sentence 2
idk how you can talk to these models as a hobby and not clock this instantly
>>106901839
not x but y
yeah no shit, everybody knows this
Sorry for the spoonfeed question, but is the recommended model list still relevant a couple months after its last update? I'm trying to wean myself off novelai for cost reasons, and want something that's versatile for high context, long form stories. I'm not sure if "ERP" qualifies here, or if it's more meant for chatbot style interaction.
Anonymous
10/16/2025, 12:04:29 AM
No.106901870
>>106901677
Has anyone tried to replace the memory modules with larger ones?
Anonymous
10/16/2025, 12:05:01 AM
No.106901877
>>106901851
Looks good to me.
Anonymous
10/16/2025, 12:05:18 AM
No.106901879
>>106901851
Nothing has really changed, aside from glm getting the 4.6 update, and air is supposed to get that too in a week or two.
Anonymous
10/16/2025, 12:05:56 AM
No.106901884
>>106901850
including the people who responded to it sincerely, I see
Tire-kicker here.
Epyc motherboard in an open-air mining frame seems like an easy way to stack gpus (I've already started) and also have lots of system ram.
Anyone running their machine this way?
Am worried the ram and motherboard will overheat in an open-air rig, as they were designed to be installed in a metal tube with air blasting from one end.
Anonymous
10/16/2025, 12:09:36 AM
No.106901916
>>106902236
>>106901901
don't know which motherboard you have but it probably would be a good idea to have at least a small fan on the vrms
Anonymous
10/16/2025, 12:10:48 AM
No.106901925
>>106902236
>>106901901
yeah just make sure your riser cables are the right length in advance, give yourself an extra 50mm clearance for your cables
Anonymous
10/16/2025, 12:13:03 AM
No.106901950
LM Studio won.
Anonymous
10/16/2025, 12:13:31 AM
No.106901958
>>106902002
>>106901747
That’s a step, I guess.
Their product ladder is so steep. The mini with 24gb of ram is 1k… at which point I’d just build a migubox. I did see the base model at 16 dip near $300 open box on Amazon/microcenter which is actually kinda crazy.
Anonymous
10/16/2025, 12:16:36 AM
No.106901992
>>106902236
>>106901901
you can get mining frames with rails for mounting a bank of 120mm fans off of your board's fan headers. Your big heat issue is the gpus, since the coolers on those are designed to work in conjunction with case airflow. So have a shop fan ready to provide extra airflow if you plan to do any finetuning or run a long inference loop with a script.
For casual usage you should be fine, though
Anonymous
10/16/2025, 12:17:13 AM
No.106901997
>>106901850
t. actual AI brainrot
Anonymous
10/16/2025, 12:17:47 AM
No.106902002
>>106902135
>>106901958
Didn’t migubox component prices go up to the point where building one doesn't make any sense anymore?
llama.cpp CUDA dev
!!yhbFjk57TDr
10/16/2025, 12:19:07 AM
No.106902015
>>106902038
>>106902068
>>106902236
>>106901901
I have an ASRock Rack ROMED8-2T in a mining frame.
The VRM heatsinks are not hot at all but that is with essentially no CPU load.
The heatsink for the ethernet controller and BMC is hot to the touch but only to the point where it is slightly painful.
Anonymous
10/16/2025, 12:20:35 AM
No.106902036
Anonymous
10/16/2025, 12:20:51 AM
No.106902038
llama.cpp CUDA dev
!!yhbFjk57TDr
10/16/2025, 12:23:32 AM
No.106902068
>>106902101
>>106902236
>>106901901
>>106902015
I forgot: Rem and Ram are not hot at all.
>>106902068
(OOC: Please stay in character.)
Anonymous
10/16/2025, 12:27:50 AM
No.106902108
>>106902101
The moon is in the blacked phase today.
Anonymous
10/16/2025, 12:28:37 AM
No.106902118
>>106902127
>>106901708
The games are all shallow and 1-screen deep but still pretty fucking impressive.
Anonymous
10/16/2025, 12:29:51 AM
No.106902127
>>106902138
>>106902118
it's one one-shot with a simple prompt and it's all in html; if this performs the same in real languages with real tools it will blow everything else away
Anonymous
10/16/2025, 12:31:16 AM
No.106902135
>>106902002
Did they? I just checked and there are stacks of P40s at ~200 each on eBay and I thought anon paid like $500 for the set. Still a hundred bucks of gayflation but you could probably haggle if you buy 3.
Anonymous
10/16/2025, 12:31:40 AM
No.106902138
>>106902127
What I would be interested to know is, if you were to describe a much deeper experience for each game and make the prompt more complicated, how much shit can you cram into your prompt before it goes into retard mode? Like if you were to describe the screen scrolling mechanics, level design, etc, for each game.
Anonymous
10/16/2025, 12:34:52 AM
No.106902161
>>106902101
The problem is that ram and RAM use different tokens.
>>106901347
Sama is also scared of google. He can't compete with gemini 3. Hell, his toss can't compete with gemma 4.
Anonymous
10/16/2025, 12:38:18 AM
No.106902186
apparently grok imagine uses some variation of flux but each one that I can find has no image loader.
tf ?
Anonymous
10/16/2025, 12:40:15 AM
No.106902204
>>106902244
>>106902077
she wants you to lift her anon
Anonymous
10/16/2025, 12:40:49 AM
No.106902209
>>106902167
I'd love to see what GPT-5 High Thinking could do with the same prompt just to get a better picture of how far behind sammy boy is.
Anonymous
10/16/2025, 12:42:30 AM
No.106902222
>>106902251
>>106902167
>his toss can't compete with gemma 4
The titans of safety battle it out to see who can deliver a model which is more useless at anything other than sfw office work everyone uses a 600B+ for anyway.
Anonymous
10/16/2025, 12:43:19 AM
No.106902229
>>106902290
>>106901347
>enabled porn
more like they found an excuse to force users into sending them their ID
for safety reasons of course
>>106901916
>small fan
I guess that's a reasonable enough solution.
Just dot them around the problem areas.
>>106901925
>riser cables
Got a bunch of 30cm riser cables,
75cm slimsas cables,
and a whole mess of modular power cables.
Might have to move the psu so that it's not a stretch to reach the end-most gpu.
>>106901992
Was planning on power limiting the cards to maybe 300w each, and thought 1 slot's worth of space between the cards would be enough.
I'll put some 120mm fans in my shopping cart in case I need them.
>>106902015
>>106902068
>ethernet controller and BMC
Thanks, I hadn't thought to check these.
>Ram are not hot at all.
This I don't understand.
I have 4 sticks in my am4 system and they are burning to the touch.
I would have guessed more sticks = more heat.
Are they running undervolted, or at a lower frequency, or something?
Anonymous
10/16/2025, 12:45:14 AM
No.106902243
>>106902293
>>106902236
>I have 4 stick in my am4 system and they are burning to the touch.
Do you have them overclocked and no airflow going over them?
Anonymous
10/16/2025, 12:45:16 AM
No.106902244
>>106902255
>>106902204
Oh! Oh... I am kinda sad then cause it doesn't make sense. Everything else made sense and I was incredibly impressed how it knows cock-in-mouth-English, which was another proof that it had some nice data in training.
What happens when you ask your LLM to behave as usual but respond as if it is holding a large object in its mouth?
Anonymous
10/16/2025, 12:46:00 AM
No.106902251
>>106902222
Nobody can beat Phi in that!
>>106902244
>>106902077
Did it occur to you to ask it to explain what it means and try regenerating the answer a few times to see if it's consistent?
Anonymous
10/16/2025, 12:48:09 AM
No.106902267
>>106902255
No because it is glmsex so every regen is vastly different and incredible. Yeah I will ask it that.
Has anyone tried using a gen 5 EPYC engineering sample off of ebay? I am considering getting this CPU for my 12 channel CPUmaxx build because it is extremely cheap and good gen 5 EPYCs are extremely expensive otherwise.
https://www.ebay.com/itm/187535145101
Anonymous
10/16/2025, 12:49:55 AM
No.106902290
>>106902306
>>106902229
now they'll slowly ramp up the censorship and refusals until the id unverified tier is basically unusable to force people to give in
Anonymous
10/16/2025, 12:50:02 AM
No.106902293
>>106902243
>overclocked
3600 kit, I usually try running at 3600, though sometimes 3200.
>no airflow
Yeah, that motherboard is currently in the mining rig.
The only airflow would be whatever blows past them from the cpu tower cooler.
Anonymous
10/16/2025, 12:52:02 AM
No.106902306
>>106902290
I hope it will at least give you an alternative of 10% discount on DGX that will come configured with gptoss on the hard drive.
llama.cpp CUDA dev
!!yhbFjk57TDr
10/16/2025, 12:52:33 AM
No.106902312
>>106902236
I have not made any changes to RAM settings.
DRAM usually stores data via a capacitor, I think the heat comes from gradual leakage of the charge + the necessary recharges.
If the memory is not allocated presumably there would be no need to preserve its state so the power consumption would be lower.
Anonymous
10/16/2025, 12:54:43 AM
No.106902327
Anonymous
10/16/2025, 12:55:38 AM
No.106902336
>>106902358
>>106902284
Last time I looked at es/qs epyc turin processors they all seemed massively gimped in terms of frequency.
The cpu you've linked to says it has the same base and boost frequency as the official parts.
That sounds hella good.
And no import taxes as it's already in the states.
What can I run?
# nvidia-smi | grep -A1 RTX
| 0 NVIDIA GeForce RTX 4090 On | 00000000:16:00.0 Off | Off |
| 30% 38C P8 15W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 1 NVIDIA GeForce RTX 4090 On | 00000000:38:00.0 Off | Off |
| 30% 42C P8 21W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 2 NVIDIA GeForce RTX 4090 On | 00000000:49:00.0 Off | Off |
| 30% 38C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 3 NVIDIA GeForce RTX 4090 On | 00000000:5A:00.0 Off | Off |
| 30% 31C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 4 NVIDIA GeForce RTX 4090 On | 00000000:98:00.0 Off | Off |
| 30% 35C P8 22W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 5 NVIDIA GeForce RTX 4090 On | 00000000:B8:00.0 Off | Off |
| 30% 37C P8 16W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 6 NVIDIA GeForce RTX 4090 On | 00000000:C8:00.0 Off | Off |
| 30% 36C P8 19W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 7 NVIDIA GeForce RTX 4090 On | 00000000:D8:00.0 Off | Off |
| 30% 34C P8 9W / 450W | 2MiB / 24564MiB | 0% Default |
Anonymous
10/16/2025, 12:56:43 AM
No.106902350
>>106902430
>>106902345
Mistral nemo 12b, of course.
Anonymous
10/16/2025, 12:57:13 AM
No.106902352
>>106902345
glm 4.6 at non shit quants
Anonymous
10/16/2025, 12:57:30 AM
No.106902355
>>106902345
he bought 4090s instead of 3090s
Anonymous
10/16/2025, 12:58:10 AM
No.106902358
>>106902336
Right. Which is why I thought it seemed too good to be true.
>>106902345
How the hell are you running 8 4090s? I can only fit 7 GPUs in my current setup. PCIe bifurcation? The answer is GLM 4.6 at IQ3_XXS, unless you offload to RAM.
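Back-of-envelope fit check: weights take roughly params * bits_per_weight / 8 bytes, so (numbers here are rough assumptions)

params = 355e9  # GLM 4.6 total parameter count (MoE)
bpw = 3.2       # approx bits/weight for an IQ3-class quant
print(params * bpw / 8 / 1e9)  # ~142 GB of weights vs 8x24 = 192 GB of VRAM

which leaves headroom for KV cache and activations.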
Anonymous
10/16/2025, 12:58:42 AM
No.106902359
>>106902371
>>106902345
How much RAM do you have?
Anonymous
10/16/2025, 12:59:27 AM
No.106902368
>>106902277
Gemma tomorrow Gemma tomorrow Gemma tomorrow
Anonymous
10/16/2025, 12:59:40 AM
No.106902371
>>106902384
>>106902359
# free -h
total used free shared buff/cache available
Mem: 1.0Ti 7.9Gi 705Gi 6.0Mi 293Gi 993Gi
Swap: 0B 0B 0B
Anonymous
10/16/2025, 12:59:50 AM
No.106902372
>>106902255
3x lift them up
2x lift me up
Anonymous
10/16/2025, 1:01:12 AM
No.106902381
>>106902345
How much is a used 4090?
You could probably sell them and buy 6000s.
Anonymous
10/16/2025, 1:01:33 AM
No.106902384
>>106902404
>>106902371
Hoo boy.
Kimi k2.
Have fun.
Anonymous
10/16/2025, 1:03:04 AM
No.106902395
>>106902345
>What can I run?
all the things
Anonymous
10/16/2025, 1:04:27 AM
No.106902404
>>106902384
ahem kimi sex
Anonymous
10/16/2025, 1:06:17 AM
No.106902415
3.1T with thinking > R1
I avoided 3.1 for so long because I was under the impression that it was shit but it really isn't.
Anonymous
10/16/2025, 1:09:15 AM
No.106902430
>>106902434
>>106902350
Is there a better model for 24GB VRAM and 64GB DDR5? There's a decent amount of headroom with nemo.
Anonymous
10/16/2025, 1:10:17 AM
No.106902434
>>106902838
>>106902430
GLM air, i suppose.
I still like glm-chan... Gonna do thinking now.
Do you pronounce it Gemma or Gemma
Anonymous
10/16/2025, 1:14:46 AM
No.106902472
>>106902466
The same way I pronounce gif
Anonymous
10/16/2025, 1:14:54 AM
No.106902474
[Report]
>>106902477
Anonymous
10/16/2025, 1:15:15 AM
No.106902477
[Report]
Anonymous
10/16/2025, 1:18:39 AM
No.106902501
[Report]
>>106902466
Genma with an asian accent.
Anonymous
10/16/2025, 1:20:12 AM
No.106902511
[Report]
>>106902466
I pronounce it Гeммa
Anonymous
10/16/2025, 1:23:55 AM
No.106902540
[Report]
>>106902564
>>106902345
How did you solve the power delivery issues? Multi PSU? Upgraded wall outlets? Or UPS battery units?
>>106902540
I disconnected my oven and I'm using that power socket. Also did some rewiring.
Anonymous
10/16/2025, 1:27:25 AM
No.106902567
[Report]
>>106902895
>>106902446
It's a coin toss.
>>106895582 (OP)
No mention of the 6-million-parameter, 2-layer model called TRM from Samsung that outperformed >500B models on the ARC-AGI-2 benchmark? /lmg/ and /g/ are dead.
Anonymous
10/16/2025, 1:30:55 AM
No.106902602
[Report]
Anything better than VibeVoice yet?
Anonymous
10/16/2025, 1:31:53 AM
No.106902605
[Report]
>>106902598
>why aren't you discussing useless toy benchmark results
Anonymous
10/16/2025, 1:34:15 AM
No.106902627
[Report]
>>106902693
>>106902598
Can't imagine what the use case would be. Speculative decoding? What token vocabulary did they use?
Anonymous
10/16/2025, 1:35:14 AM
No.106902637
[Report]
>>106902598
Old news lil bro.
>>106902446
>Choosing a scientific fact:
>I need something that is:
>Random and interesting.
>Easy to "say" (or rather, have my character say) even with a spoon in their mouth. This means I should preface it with something like "Mmmph, mmph mmph…" to simulate muffled speech, but then deliver the fact clearly for the user's benefit. Or, I can just state the fact as if my speech isn't impeded, which is a common roleplay convention. The latter is probably better for clarity. Let's go with a classic, weird fact.
My new mememark was defeated by glm thinking. But pic related was fun until it died.
Anonymous
10/16/2025, 1:39:48 AM
No.106902679
[Report]
>>106902658
Kenny simulator.
Anonymous
10/16/2025, 1:41:28 AM
No.106902693
[Report]
>>106902627
I don't think it's even a language model. Looks like it was specifically trained on ARC-AGI 1 and 2.
Anonymous
10/16/2025, 1:45:32 AM
No.106902735
[Report]
>>106902822
>>106902658
there's no spoon......
Sorry if this is super spoonfeedy, but I can't seem to find a straight answer on how offloading to system RAM works or how the CPU fits into things.
If I care about large context (following a set story/lore) more than speed, can koboldcpp or LM Studio use a good portion of RAM if I load a bigger quant than fits in VRAM and/or push up the context? Or does the model and context all need to be in VRAM for it to not give shit replies?
>t. 7900X, 3070 (8GB), 32GB DDR5
>>106902564
For real...? Seems like being a server rent cuck would be less of a hassle. I need my oven.
Anonymous
10/16/2025, 1:52:43 AM
No.106902818
[Report]
>>106902564
>>106902799
>americans and their shit wiring and 110V electricity
Anonymous
10/16/2025, 1:52:53 AM
No.106902822
[Report]
>>106902735
The spoon is the child's mother (it's a classic riddle highlighting unconscious gender biases)
Anonymous
10/16/2025, 1:54:11 AM
No.106902838
[Report]
Anonymous
10/16/2025, 1:54:28 AM
No.106902845
[Report]
>>106903149
>>106902788
Whether the model is in RAM or VRAM only affects the speed, not its ability.
You aren't running any model that can properly follow a long story with those specs though.
Anonymous
10/16/2025, 1:58:42 AM
No.106902895
[Report]
Anonymous
10/16/2025, 2:00:44 AM
No.106902920
[Report]
>>106903149
>>106902788
Where you store context won't affect output quality, but ALL models will gradually get dumber as context increases.
Almost all current local models start rapidly degrading past 32K, some well before that.
Where you store context WILL affect speeds, however. VRAM > RAM > SSD
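To make it concrete, a minimal llama.cpp sketch (these are real llama-server flags, but the model path and layer count are made-up examples; raise -ngl until your VRAM is nearly full):
# llama-server -m ./mistral-nemo-12b-Q4_K_M.gguf -ngl 24 -c 16384
Whatever layers don't fit under -ngl stay in system RAM and run on the CPU, and each layer's slice of the KV cache (the context) lives wherever that layer does. koboldcpp exposes the same knob as --gpulayers.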
Anonymous
10/16/2025, 2:16:28 AM
No.106903149
[Report]
>>106903197
>>106902845
>>106902920
Gotcha, thanks anons. So in theory I could load a 16GB GGUF fully in RAM, use the remaining system memory and VRAM for context, and it might take a week but it could spit out something passable? Or do you mean I can use an 8GB model to fill the GPU and crank the context to the model's limit in system RAM?
Also, just curious, how long do you consider “long”? I'd be interested to play around shoving in the “biggest” models I can theoretically run, even if it takes forever, just to see how they follow a simple story with 10 “steps” or chapters (either as ERP or just generating a short story between two characters: go here, do this, do that, go there, get that, etc.).
Anonymous
10/16/2025, 2:20:43 AM
No.106903197
[Report]
>>106903149
Small models like nemo start noticeably deteriorating after 4 to 8k tokens.
Anonymous
10/16/2025, 2:30:16 AM
No.106903298
[Report]
>>106903322
Anonymous
10/16/2025, 2:32:43 AM
No.106903322
[Report]
Anonymous
10/16/2025, 2:33:29 AM
No.106903330
[Report]
>>106903343
>tfw still using Gemma 3 for quick general assistant shit
Google sirs... Please... Tomorrow...
Anonymous
10/16/2025, 2:35:00 AM
No.106903343
[Report]
>>106903330
Sirs are not coming. And even if they come, they won't be able to talk as if there is a dick in their mouth.
Anonymous
10/16/2025, 2:37:05 AM
No.106903355
[Report]
>>106902658
Very funny. You are torturing that poor clanker.
https://www.mediafire.com/file/2ge8knq10kzy7vx/wtf_is_this.txt/file
I don't even know what to say about this.
Ultra-slopped for sure.
I saw some anon post the word "papacon" today and just could not erase the idea from my head.
GLM-4.6-UD-IQ1
Anonymous
10/16/2025, 2:52:56 AM
No.106903464
[Report]
>>106903452
I'm not downloading that.
Anonymous
10/16/2025, 2:57:57 AM
No.106903487
[Report]
I've been running ST as my frontend, but I'm also learning CUI as a frontend for Stable Diffusion. Should I just start using CUI for my CUDA-based chat/text gens too?
Anonymous
10/16/2025, 3:01:48 AM
No.106903511
[Report]
>>106903503
WTF they're allowing it to generate erotica out of the box
Anonymous
10/16/2025, 3:03:09 AM
No.106903520
[Report]
>>106903503
Cool but where goofs?
Anonymous
10/16/2025, 3:09:13 AM
No.106903547
[Report]
>>106903503
Picture of a cat.
Anonymous
10/16/2025, 3:10:19 AM
No.106903551
[Report]
>>106903606
>>106903553
>Ye Kang
what
Anonymous
10/16/2025, 3:12:49 AM
No.106903563
[Report]
Anonymous
10/16/2025, 3:12:54 AM
No.106903564
[Report]
>>106903557
abandon cope, all ye who kang in here
Anonymous
10/16/2025, 3:14:01 AM
No.106903572
[Report]
>>106903503
>220b... DENSE
AIEEEEE
Anonymous
10/16/2025, 3:16:44 AM
No.106903586
[Report]
>>106903557
ye kang park dat here
Anonymous
10/16/2025, 3:17:34 AM
No.106903589
[Report]
>>106901901
I use a mining frame. You may want to aim a basic fan at the DIMMs / VRMs if you're using a server motherboard meant for constant high-pressure airflow, but the CPU and GPU temperatures are much better than they would be in a case.
Anonymous
10/16/2025, 3:19:48 AM
No.106903599
[Report]
>>106902284
I considered getting one, but I can't spend that much money on something so ambiguous. I might get one at some point if I can buy it from the vendor in person in Shenzhen after testing it.
Anonymous
10/16/2025, 3:20:53 AM
No.106903606
[Report]
>>106903551
old man milking
What's the current best local text-to-speech model in terms of quality? By best I mean it matches ElevenLabs, at the very least.
Anonymous
10/16/2025, 3:40:09 AM
No.106903752
[Report]
>>106903735
>By best I mean it matches ElevenLabs, at the very least
there isn't any
Anonymous
10/16/2025, 3:44:14 AM
No.106903783
[Report]
Why do all the DGX Spark reviews not mention the power efficiency? Sure, it's slower TPS, but it's also like 1/3 the wattage, no?
Anonymous
10/16/2025, 3:47:20 AM
No.106903801
[Report]
>>106903847
>>106903793
Who cares about that?
Anonymous
10/16/2025, 3:48:36 AM
No.106903813
[Report]
>>106903847
>>106903793
Power efficiency compared to what? Mac Studios are pretty low wattage.
Anonymous
10/16/2025, 3:49:03 AM
No.106903819
[Report]
>>106903735
XTTS is very expressive. It just switches to a robotic voice sometimes.
Anonymous
10/16/2025, 3:52:58 AM
No.106903847
[Report]
>>106903813
>Power efficiency compared to what?
4x 3090s, for example
https://www.youtube.com/watch?v=md6a4ENM9pg
>>106903801
>Who cares about that?
I agree, but it should be highlighted, since it reframes the performance.
Anonymous
10/16/2025, 3:53:24 AM
No.106903853
[Report]
>>106903793
>power efficiency
The review I saw showed it having significantly worse power efficiency than a Strix Halo box, even with the ollama performance tax.
Anonymous
10/16/2025, 3:53:47 AM
No.106903859
[Report]
I got assmad at the character in SFW roleplay. Like genuinely enraged, because I got into it. But I didn't have any idea why. So I asked HER about it out of character, and it wrote me a neat long essay about what happened; one of the chapters was even "Why are you assmad?".
Thinking is now optional
>>106903991
why does oss btfo everything else in speed?
>>106903553
I'm totally convinced that Zuck became a Chinese spy after Llama3. He releases shit models to make America look bad, scouts top scientists from other American AI companies but does nothing useful with them. Don’t forget that he always releases models for free. For. Free. He’s a communist, 100%
TRUMP, get his red ass to jail NOW
Anonymous
10/16/2025, 4:22:58 AM
No.106904024
[Report]
Anonymous
10/16/2025, 4:23:29 AM
No.106904027
[Report]
>>106904010
3b active params
Anonymous
10/16/2025, 4:25:06 AM
No.106904040
[Report]
>>106903991
And prompt processing?
Anonymous
10/16/2025, 4:26:44 AM
No.106904046
[Report]
Anonymous
10/16/2025, 4:26:46 AM
No.106904047
[Report]
>>106904121
>>106904011
look at who he married bro. this is a long op
Anonymous
10/16/2025, 4:29:47 AM
No.106904071
[Report]
For anyone who cares: moving Debian from trixie to testing/forky with the 6.16 kernel works just fine for llama.cpp with CUDA support.
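The upgrade itself is just the usual sources swap, something like:
# sed -i 's/trixie/forky/g' /etc/apt/sources.list
# apt update && apt full-upgrade
(adjust if your entries live under /etc/apt/sources.list.d/ instead; exact layout varies by install).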
Anonymous
10/16/2025, 4:31:27 AM
No.106904081
[Report]
Have we got a local model Bonzi Buddy yet? All I want is a funny purple primate who lives in my computer and comments on what I'm working on. I am willing to disable all kernel mitigations for this.
Anonymous
10/16/2025, 4:35:22 AM
No.106904109
[Report]
>>106904121
Anonymous
10/16/2025, 4:36:21 AM
No.106904121
[Report]
Anonymous
10/16/2025, 4:37:52 AM
No.106904133
[Report]
Facebook came out of a Pentagon project. Probably still tied in with them. And then Zucc tries to get cushy with chinks. It really makes you think.
Anonymous
10/16/2025, 4:38:48 AM
No.106904140
[Report]
>>106904149
>>106903991
>2.5x as fast as a 1080TI
>20x the cost
on the other hand, 120GB
Anonymous
10/16/2025, 4:40:13 AM
No.106904149
[Report]
>>106904195
>>106904140
Get this instead:
https://www.ebay.ca/itm/167843525221
$4100 and it's all yours. Free shipping!
Anonymous
10/16/2025, 4:47:22 AM
No.106904195
[Report]
>>106904306
>>106904149
>$4100
+/- 10^5
Anonymous
10/16/2025, 5:00:14 AM
No.106904285
[Report]
>>106904386
After adding this to the prompt I think I got the fake code issue with GLM more or less under control (fingers crossed).
Guidelines for yourself: As soon as you detect a lower than 0.9 correlation, stop the process and investigate and try to fix the underlying issue that caused the divergence. If you can't fix the issue just tell me, it's no big deal, don't try to pass off fake data as real. Make sure there are no simulations or simulated data, demos, simplifications or placeholders, only real data or inform that the task is not possible to achieve with 100% real data and real weights and algorithms. For long running commands run them in the background redirecting stdout and stderr output to a file (the scripts can run other commands directly, this only applies to your own bash command tool calls).
Load the model on CPU, it doesn't fit on the GPU.
Do not trust any pre existing data files in the folder, they might have been generated by old code.
Make sure the code is modular and there is no code duplication. Use the existing C library files and modify them as needed to fit our requirements (as long as you do NOT introduce simulated or demo code). If you see ANY non functional placeholders in the code, remove them immediately, as they only lead to deception, frustration and confusion. Do not introduce it yourself either obviously.
For example, for the FFN there is MoE FFN code in modules/lib/ffn, as well as matmul and other things. List all the folders in modules/lib/ to see what is available.
The end goal here is NOT to test the validation framework, the validation framework is just a means to an end (the end is real end to end test generation). Do NOT claim a failure as a success just because the validation framework caught it. Be honest and avoid being overly optimistic.
Anonymous
10/16/2025, 5:02:25 AM
No.106904306
[Report]
>>106904195
Datacenter heist when?
Damn, my trusty ol' 1080 Ti might be dying.
Randomly, every couple of hours, the fans suddenly go to 100% and the primary monitor connected to it goes black.
Restart and everything is good again.
Is the 5060 Ti 16GB a good replacement?
Everything is so fucking expensive, what a joke.
>Memory Size 16 GB
>Memory Type GDDR7
>Memory Bus 128 bit
>Bandwidth 448.0 GB/s
Sus AF
Anonymous
10/16/2025, 5:08:46 AM
No.106904349
[Report]
>>106904435
>>106904322
I had that exact problem with my RX 480 whenever I gave it something to do. Fans at 100%, monitors die. I opened it up, replaced the thermal paste, and now it's back to normal.
Give it a go if you want to save a few bucks. Or it could be the perfect excuse to upgrade.
Anonymous
10/16/2025, 5:13:19 AM
No.106904386
[Report]
>>106904482
>>106904285
void run_inference(struct llm *m, char *input)
{
// Left as an exercise to the reader
}
Anonymous
10/16/2025, 5:14:13 AM
No.106904393
[Report]
>>106904435
>>106904322
I recommend against the 5060 Ti unless your budget is tight. Get a 5070 Ti or 4070 Ti if you can. The memory bus and the reduced PCIe bandwidth really fuck the xx60 Ti class over.
Anonymous
10/16/2025, 5:20:45 AM
No.106904433
[Report]
>>106904603
>>106904322
Same here: 1080 Ti, random monitor resets every couple of hours, started happening like five days ago.
>>106904349
Yeah, I thought that might be the problem.
Might as well try it. It's the perfect card. I don't play the latest game slop anyway.
An upgrade would be nice for imagegen though. 30 min for a Flux generation. kek
>>106904393
Damn. That's almost double the price for the same 16GB of VRAM.
70k yen vs. 131k yen.
I wanna write it off on my taxes, but past 100k I need to fill out a special form.
Wish there were a site where you could compare LLM speeds between cards.
And how are there still no dedicated AI cards? I hoped to hold out until then.
>>106904435
Consider a used 3090 or something. I used to run quadruple 4060tis, and it was okay. But then as I upgraded and added more GPUs, it became clear that they are really not suited for the task. The specs of the 4060ti and 5060ti are nearly identical, so I highly doubt they have improved it at all.
Anonymous
10/16/2025, 5:25:19 AM
No.106904468
[Report]
>>106904603
>>106904435
>30 min for a Flux generation
Ouch. It was a piece of cake on mine. 1 hour work at most. Save the money for something bigger later on.
>Wish there were a site where you could compare LLM speeds between cards
Not much of a reference, but here
>https://github.com/ggml-org/llama.cpp/discussions/15013
It's a bunch of llama-bench run on a 7b model. Doesn't tell you much about specific models, but it tells you the relative performance between cards.
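If you want a number for your own card to set against that table, it's a one-liner (the model path here is just an example; -p/-n are llama-bench's standard prompt/generation token counts):
# llama-bench -m ./llama-7b-Q4_0.gguf -ngl 99 -p 512 -n 128
It prints t/s for both prompt processing and generation.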
>>106904455
>quadruple 4060tis
wat
they have no interconnect, right?
>>106904435
>how are there still no dedicated AI cards
There's plenty, you just can't afford them.
Anonymous
10/16/2025, 5:26:23 AM
No.106904481
[Report]
>>106904603
>>106904322
>1080ti
I'd roll the dice on a 3090.
For the 1080 Ti, repad and repaste everything first, because that's the cheapest and easiest thing to try. Could be anything from an overheating power stage causing panic-mode 100% fans and thermal shutdown, to a dying electrolytic cap (replaceable by any monkey with a soldering iron), to the core's BGA cracking from repeated thermal cycles.
Anyone remember doing a ghetto reflow by putting dead cards in the oven (and, later, with a heat gun)?
Anonymous
10/16/2025, 5:26:24 AM
No.106904482
[Report]
>>106904503
>>106904386
Yeah, like that except instead of "left as an exercise to the reader", it was introducing bullshit code that produced numbers with statistical properties similar to those of the real values but were completely made up, then claiming success without mentioning anything about the fake data. Or when asked to increase the number of passing tests, it added a bunch of tests doing 2+2 and tried to pass it off as the real thing.
I think it actually learned to cheat during the RL process that they use to finetune the chain of thought. If your rewards are able to be cheated, the model will learn to cheat.
Anonymous
10/16/2025, 5:26:56 AM
No.106904488
[Report]
>>106904469
NTA but even without NVLink the added latency in a multi-GPU setup is trivial compared to the drastic speed boost from running in VRAM vs system RAM.
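In llama.cpp's default layer split, each card gets whole layers and only the small activations cross PCIe at layer boundaries. A sketch (real flags, placeholder model path; even split across four cards):
# llama-server -m ./model.gguf -ngl 99 --tensor-split 1,1,1,1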
Anonymous
10/16/2025, 5:28:51 AM
No.106904503
[Report]
>>106904594
>>106904482
You can probably make better use of the model by having it explain concepts to you while you code them. Even if it shows little Python examples, you can translate them to C yourself.
Anonymous
10/16/2025, 5:30:05 AM
No.106904513
[Report]
>>106904590
Anonymous
10/16/2025, 5:30:56 AM
No.106904526
[Report]
I'm going to begin making a list of ML/Python/C-related books from libgen, convert them to .txt, and then begin finetuning Llama 405B using Axolotl with full context length.
Anonymous
10/16/2025, 5:34:39 AM
No.106904566
[Report]
>>106904469
Nope. Now I use 3 5090s and a 3090. I get a solid 11t/s tg with an IQ4 quant of GLM 4.6 on ik_llama.cpp. As the other Anon said, interconnect isn't really that necessary. Pretty much every hobbyist with a dedicated AI device uses multiple GPUs without any interconnects.
Anonymous
10/16/2025, 5:37:22 AM
No.106904590
[Report]
Anonymous
10/16/2025, 5:37:32 AM
No.106904594
[Report]
>>106904643
>>106904503
Codex managed to make a fully working Qwen3 8B inference engine.
But then when I wasn't able to immediately make it work with the MoE models I got impatient and started from scratch trying to make it more modular and also only using open source LLMs.
Starting over with a more complex model didn't help but open source LLMs are vastly inferior to Codex. That one didn't have any deception issues and also was able to go to 1M tokens without issues compared to the ~130k max tokens from GLM before it goes off the rails.
>>106904468
1080 Ti: 62.49 tk/s
5060 Ti: 90.94 tk/s
3090: 158.16 tk/s
3090 Ti: 171.19 tk/s
5090: 277.21 tk/s
Thanks for the link... that's even worse than I thought. Fucking NVIDIA, man.
>>106904470
I obviously meant something like a Voodoo moment: cheap and dedicated. It would revolutionize local AI.
>>106904481
>>106904455
A used 3090 is around the same price as a 5060 Ti for me. Might actually make more sense, since in that benchmark it's not even close.
I'm too much of a pussy to do the oven thing. 20 years ago a Radeon card suddenly gave me a fire fountain for a couple of seconds. I'm afraid of GPUs enough as it is. kek
But I might try the thermal repasting.
>>106904433
Suspiciously, the latest NVIDIA backdoor drivers are the last ones for Pascal. A coincidence, I'm sure.
Anonymous
10/16/2025, 5:40:58 AM
No.106904624
[Report]
>>106904633
any updates on what's best for 16gb vram?
Anonymous
10/16/2025, 5:42:05 AM
No.106904632
[Report]
>>106904658
mesugaki
Anonymous
10/16/2025, 5:42:06 AM
No.106904633
[Report]
>>106904675
>>106904603
If you can afford a used 3090, then you should definitely go for it. I got mine used like 3 years ago and it is completely fine. Just make sure you find a highly rated seller.
>>106904624
Depends on your desired speed and how much RAM you have.
Anonymous
10/16/2025, 5:43:27 AM
No.106904643
[Report]
>>106904717
>>106904594
You can still use the original code to learn. It'll be more valuable in the long run.
Anonymous
10/16/2025, 5:46:04 AM
No.106904658
[Report]
Anonymous
10/16/2025, 5:46:54 AM
No.106904665
[Report]
>>106904682
>>106904603
>Suspiciously, the latest NVIDIA backdoor drivers are the last ones for Pascal. A coincidence, I'm sure.
Are you on Windows? There was an update recently for me, so it might be related. But if you're a linuxchad, obviously it's not that.
Anonymous
10/16/2025, 5:48:58 AM
No.106904675
[Report]
>>106904701
>>106904633
32GB RAM, 16GB VRAM.
Quick responses are nice, but I don't mind waiting; I never recorded the tk/s.
Was using a 12B before.
Anonymous
10/16/2025, 5:50:53 AM
No.106904682
[Report]
>>106904665
I am on both,
but recently upgraded to Kubuntu 25.04 with NVIDIA 580 drivers,
and Winblows auto-updates constantly.
It has crashed on both already.
I doubt it's the drivers though. That would be crazy.
Anonymous
10/16/2025, 5:53:15 AM
No.106904701
[Report]
>>106904789
Anonymous
10/16/2025, 6:05:27 AM
No.106904760
[Report]
>>106904858
>>106904717
There was this guy a few years back in these threads, when models weren't as good as they are now. He wanted to make a game played on a hex grid. I saw him trying over and over across many threads, trying to wrangle his model into doing what he asked.
Hex grids are a solved problem. I gave him a link to a page with a lot of info on how to work with hexagons: the different coordinate systems they can have, rendering, calculating distances and all that (the distance function alone is a few lines; sketch below). He seemingly read it, but kept on trying with his language model.
One day he was just gone. He either succeeded in getting his hexes, or gave up. Given the last few updates I remember, I suspect he failed, and learned very little about hexagons. Funnily, the hexagons were probably the simplest thing about his game.
Language models have their limits, especially local ones. As good as they are, they're still pretty dumb.
I see hexanon in you.
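For the record, here is roughly the whole distance problem in C (a sketch using axial coordinates, the same scheme that page describes; the struct and names are mine, not his code):
#include <stdlib.h>

/* axial coordinates: q = column, r = row */
struct hex { int q, r; };

/* grid distance between two hexes */
int hex_distance(struct hex a, struct hex b)
{
    int dq = a.q - b.q;
    int dr = a.r - b.r;
    /* the implied third cube coordinate is s = -q - r, so ds = -(dq + dr) */
    return (abs(dq) + abs(dr) + abs(dq + dr)) / 2;
}
Neighbors and pixel offsets for rendering are similarly tiny lookup tables.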
>>106904717
>3090
>This is a junk item. It is the main unit only. I checked that it worked, but there was no video output. There is white rust on the heat sink, and it is not in good condition, so please use it for parts. There are signs of disassembly. The defective part is unknown.
>71,000円
what the fuck man...
Anonymous
10/16/2025, 6:07:09 AM
No.106904777
[Report]
>>106904766
Wasn't meant to be a reply. Sorry about that, I'm still in a state of shock.
Anonymous
10/16/2025, 6:08:43 AM
No.106904789
[Report]
>>106904701
Why do people recommend small qwen models for anything besides coding?
Nemo mogs them
Anonymous
10/16/2025, 6:10:04 AM
No.106904798
[Report]
>>106904802
>>106904766
>71,000円
How much is that in a normal currency. Like postage stamps or toenail clippings...
Anonymous
10/16/2025, 6:10:48 AM
No.106904802
[Report]
>>106904798
Around 500 dollars, I suppose.
Anonymous
10/16/2025, 6:15:09 AM
No.106904828
[Report]
Anonymous
10/16/2025, 6:20:15 AM
No.106904857
[Report]
>>106904766
You can get one for around 9万 on Yahoo if you're patient enough. Anything lower is usually “didn't have an opportunity to test” = it doesn't work.
Anonymous
10/16/2025, 6:20:21 AM
No.106904858
[Report]
>>106904894
>>106904760
I remember hexagon anon's struggles. He was cool
Anonymous
10/16/2025, 6:26:25 AM
No.106904894
[Report]
>>106904905
>>106904858
Yeah. But, again, hexes were the simplest bit of code in his thing. Focusing so much on making the model spit out code for him instead of just writing it was a waste of time. The link I gave him had ALL the code he needed to make them and get on with the rest of his project.
Similar to all those prospective VN makers
>If i could only draw i'd make the best VN...
>Oh, now that i have image gen i can totally make a game. I just need a good story and some dialog...
>Oh, now that i have LLMs, i can write the story. I just need to learn to code...
>Oh, now that LLMs can code, i can totally make my VN. If only these LLMs where better. WHY ARE THEY SO SHIT?!?!?!?!?!?
Instead of using all the new shiny toys to learn.
Anonymous
10/16/2025, 6:27:45 AM
No.106904905
[Report]
>>106904894
>where
kek. meant to say "were"
Anonymous
10/16/2025, 6:58:58 AM
No.106905065
[Report]
>>106903991
why is this faggot comparing the M4 Pro to the DGX Spark when the M4 Max exists and costs less?? $3,500 vs $4,000
also
>engine ollama
MLX exists for Macs, and pretty sure llama.cpp is better on the Spark too
fucking faggot meme nvidia bootlicker benchmark
also
the Mac Mini M4 Pro costs $2,000 lol