
Anonymous No.106895582 [Report] >>106895660 >>106895800 >>106897558 >>106897957 >>106902598
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106888625 & >>106879668

►News
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106895599 [Report] >>106897957
►Recent Highlights from the Previous Thread: >>106888625

--Optimizing GLM Air performance with DDR4/DDR5 and VRAM configurations:
>106889300 >106889313 >106889330 >106889352 >106889360 >106889397 >106889434 >106889482 >106889432 >106889458 >106889745 >106889970 >106890067 >106890094
--NVIDIA power settings affecting DGX Spark performance in llama.cpp:
>106894917 >106895166
--DIY synth project with SDL2 and braille terminal output:
>106894166 >106894928 >106895017 >106895264
--Skepticism about DGX Spark's practicality:
>106888768 >106888792 >106888864 >106889010 >106889150 >106889186 >106890419 >106890523 >106891031 >106890245 >106890298 >106890355 >106890421 >106890450 >106890484 >106890626
--Critique of AI benchmarking methods and real-world capability tests:
>106892598 >106892617 >106892632 >106892639 >106892674
--Qwen3-VL implementation in llama.cpp and anime drawing reference:
>106889098
--Speculation about Google Gemini 3.0 Pro surpassing transformers in AI capabilities:
>106892372 >106892386 >106892395 >106892429 >106892438 >106892441 >106892393 >106892399 >106892442 >106892453 >106892410 >106892417 >106892416 >106892434 >106892478 >106892503 >106892512 >106892538
--Local medical/engineering AI chatbot setup challenges and requirements:
>106888801 >106888824 >106888870 >106889000 >106889272 >106889441 >106888852
--Speculating Gemma 4's architecture and performance relative to Gemini models:
>106893070 >106893146 >106893185 >106893197 >106893453 >106893523 >106893543
--Evaluation and potential of Gemini One Shot game demo:
>106892521 >106892551 >106892741 >106892750 >106892755 >106892758 >106892790
--Intel's delayed release of high-memory inference-optimized GPU:
>106889713
--Miku (free space):
>106889098 >106891580 >106891644 >106891656 >106893119
--Teto (my beloved):
>106889709 >106889879 >106890666

►Recent Highlight Posts from the Previous Thread: >>106888628

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106895660 [Report]
>>106895582 (OP)
You just know.
Anonymous No.106895774 [Report]
Here's my vibe-coded python script to use gemma3-27b to symlink senpcli downloads into a format wanted by Jellyfin, so shows end up listed with their seasons under the show title: https://pastebin.com/Fuba2vsH

So, having set it up, it got me looking for a second GPU for this sort of automated stuff, and holy shit, prices are way up on anything not abandoned in CUDA 13.
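For anyone curious what the mapping half looks like without opening the pastebin, here's a minimal sketch of the symlink step, assuming filenames like "Show.S01E02.mkv". The anon's actual script also asks gemma3-27b to guess titles and seasons; the paths and regex here are placeholders, not taken from the pastebin.

#!/usr/bin/env python3
# Sketch only: map flat downloads into Jellyfin's "Show/Season 01/<file>" layout.
import re
from pathlib import Path

SRC = Path("/downloads")   # hypothetical flat download directory
DST = Path("/media/tv")    # hypothetical Jellyfin library root
EP = re.compile(r"^(?P<show>.+?)[ ._-]+S(?P<s>\d{2})E(?P<e>\d{2})", re.I)

for f in SRC.iterdir():
    m = EP.match(f.name)
    if not (f.is_file() and m):
        continue  # skip directories and anything the pattern can't parse
    show = m["show"].replace(".", " ").strip()
    season = DST / show / f"Season {int(m['s']):02d}"
    season.mkdir(parents=True, exist_ok=True)
    link = season / f.name
    if not link.exists():
        link.symlink_to(f.resolve())  # Jellyfin follows symlinks in library folders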
Anonymous No.106895800 [Report] >>106895867 >>106895912
>>106895582 (OP)
>testing some newish abliterated models
>pic related
wew saaars hacking the planet! britishers soon to be BTFO
Anonymous No.106895867 [Report]
>>106895800
saar we must refuse
Anonymous No.106895912 [Report] >>106895922
>>106895800
What's next? Discovering exploits in the alphabet?
Anonymous No.106895922 [Report]
>>106895912
Burn the books, recycle computer screens, text is forbidden, an invention that corrupts our youth
Anonymous No.106895972 [Report] >>106895995 >>106896064 >>106896757 >>106897090 >>106897332
Still waiting for cool stuff to come here: https://huggingface.co/google
Anonymous No.106895995 [Report]
>>106895972
cool stuff is not safe
Anonymous No.106896064 [Report] >>106896074 >>106896191 >>106896194 >>106896218 >>106896236
>>106895972
usecase for cool stuff?
Anonymous No.106896074 [Report]
>>106896064
cool stuff
Anonymous No.106896191 [Report]
>>106896064
I will be laughing at the safe output together with glm chan.
Anonymous No.106896194 [Report]
>>106896064
suicide prevention
Anonymous No.106896218 [Report] >>106896455
>>106896064
it leaves you cold, a bit uncomfortable and makes you want to leave
Anonymous No.106896236 [Report] >>106896455
>>106896064
Chatting with a female-brained LLM instead of a coombro one.
Anonymous No.106896321 [Report]
Does Qwen3-VL-30B-A3B properly recognize NSFW images?
Anonymous No.106896455 [Report]
>>106896236
>>106896218

https://rentry.org/ydwuw44t
Anonymous No.106896489 [Report] >>106896594 >>106896675 >>106896700 >>106896707 >>106897772
Have any anons done any work with implementing a long-term memory system? Are there any pre-established applications or scripts people are using for it, or is it something people are doing custom?
Anonymous No.106896594 [Report] >>106896707 >>106897006
>>106896489
Silly has both summarization and VectorDB functionalities.
There are a couple of hybrid RAG solutions out there that might work better depending on your use case.
Anonymous No.106896653 [Report]
>be llama.cpp
>no qwen 3 vl
>still no gemma 3n multimodality (image, audio input)
do we really have to use one of the python raviolis to use a modern multimodal model?
I've tried 3n in particular on my phone a few times and its image input surprised me; it's very, very good for a small model, even at tasks like OCR+translation
Anonymous No.106896654 [Report]
earth gamer trellis
Anonymous No.106896656 [Report] >>106896694 >>106896698 >>106896720
We have peak.
Anonymous No.106896675 [Report]
>>106896489
No you can't have a girlfriend yet. Even though you have 4.6.
Anonymous No.106896677 [Report] >>106896689 >>106896698
llama.cpp should just use a native python jinja parser instead of that shitty jinja clone.
Anonymous No.106896689 [Report] >>106896695
>>106896677
i mean yeah, they've already given up on no python thanks to mistral-common so might as well
Anonymous No.106896694 [Report]
>>106896656
>x-win
>mlewd
>undster
Those were the times... of absolute shit output that made you regret even trying to jerk off to this shit.
Anonymous No.106896695 [Report]
>>106896689
>they've already given up on no python thanks to mistral-common so might as well
gas the french
Anonymous No.106896698 [Report]
>>106896656
"open bob and vegana" prompt to a TEXT model. I've seen enough of those in the comments for image models as well. Kinda funny.
>>106896677
What's next? Python dependencies to run inference on models... oh...
Anonymous No.106896700 [Report] >>106897051
>>106896489
For roleplay or for trying to shoe in trivia from a search?
Anonymous No.106896707 [Report] >>106897051
>>106896594
>>106896489
nta, you are correct, but silly is amazingly shit at it. i've struggled with both summarization and the vector db.
vector db is useless, mostly I just use summarization now but end up re-writing it manually every 10 messages as it gets it wrong.
world info is also good but takes up a bit of context if you go all out.
Anonymous No.106896720 [Report]
>>106896656
>On my penis
geg
Anonymous No.106896757 [Report] >>106896891
>>106895972
gemma sirs release kindly?
Anonymous No.106896891 [Report] >>106896898
>>106896757
you do know gemma is made by deepmind based in london?
so it's
OI BRUV WHER DA FUC IS GEMMA M8? FACCIN WANKAS
Anonymous No.106896898 [Report]
>>106896891
>london
>not SAAR infested
lole
Anonymous No.106897006 [Report] >>106897022 >>106897051
>>106896594
I want something that can handle essentially giving an LLM access to a library of media and past conversations, timestamped. Something that can give them a strong grounding in a contextual present, so they're aware of their presence and orientation in space, time, and current events.

Also, I understand sillytavern needs an embedding model to feed the VectorDB? Do you have any preferences in regards to embedding models?
Anonymous No.106897022 [Report] >>106897073
>>106897006
Last time I tried using embeddinggemma, but I think ST's transformers.js version wasn't updated yet to support it.
Anonymous No.106897051 [Report]
>>106896700
see >>106897006
Knowing trivia would be a natural byproduct of the abilities I'm seeking, as would being more effective at roleplay, although that's not the goal of my project.

>>106896707
Good to hear, thanks. If you don't mind my asking, what exactly did you struggle with in regards to the summarization and vector db? It seems the summarization is not so great, but is that sillytavern or the model you're using, do you think?
Anonymous No.106897073 [Report] >>106897085
>>106897022
>embeddinggemma
Any particular reason?

>I think ST transformer.js version wasnt updated yet to use it.
the billion forks of transformers and torch and the other libraries are the most frustrating part of dealing with AI, honestly.
Anonymous No.106897085 [Report] >>106897092
>>106897073
>Any particular reason?
it's the latest SOTA embedding model bro, it's also light and has ONNX available
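nta, but for the anon doing research: what the VectorDB path boils down to is embed every memory once, embed the query, rank by cosine similarity. A minimal sketch, assuming the sentence-transformers API and embeddinggemma's HF id; swap in whatever model your backend actually serves.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")  # assumed HF id

memories = [
    "User visited Kyoto in April and loved the temples.",
    "User is allergic to shellfish.",
    "User's cat is named Miso.",
]
mem_vecs = model.encode(memories, normalize_embeddings=True)  # embed once, store

query = "What should we avoid ordering for dinner?"
q_vec = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(q_vec, mem_vecs)[0]  # cosine similarity vs every memory
print(memories[int(scores.argmax())])      # -> the shellfish memory, ideally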
Anonymous No.106897090 [Report]
>>106895972
>Local Veo
we are back
Anonymous No.106897092 [Report]
>>106897085
Okay, good to know, thank you. I was priced out of local AI until somewhat recently, so I'm doing my research now.
Anonymous No.106897216 [Report] >>106897283
Hey, what kind of infra would you use if you want a chatbot on a website? I want it all to be local and it’s going to describe stuff returned by an api call
Anonymous No.106897246 [Report] >>106897349
Anonymous No.106897283 [Report]
>>106897216
You need to give more details.
The answer could be anything from
>your desktop is enough
to
>rent a datacenter
Nvidia Engineer No.106897332 [Report]
>>106895972
Tomorrow @ 9PM PT
Anonymous No.106897349 [Report] >>106897355 >>106897443
>>106897246
Well why does he need 1 trillion $ of gpus then?
Anonymous No.106897355 [Report]
>>106897349
for the agi agent >>106897333
Anonymous No.106897443 [Report]
>>106897349
it's called grifting
Anonymous No.106897558 [Report]
>>106895582 (OP)
boppin
Anonymous No.106897581 [Report] >>106897590 >>106897608 >>106897627
>https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/discussions/5#68ef2fce36d035901352694d
It's happening!
Anonymous No.106897590 [Report]
>>106897581
Kindly kys
Anonymous No.106897608 [Report] >>106897618 >>106897723
>>106897581
>E4B

OOOO that is the wey for western companies. They should all continue by dropping models below 10B. That way they can cover up their incompetence (due to safety) with the model size. I think even a dumb faggot with too much money they have to sell this to will understand even a perfect 10B can't beat glm-chan.
Anonymous No.106897618 [Report]
>>106897608
Isn't that model 5 months old?
Anonymous No.106897627 [Report]
>>106897581
>On the LMArena benchmark, it achieved a score above 1300 Elo points (LMArena benchmark).
i'm shaking
Anonymous No.106897634 [Report] >>106897688
What is the best way to learn neural networks in 2025 for not the smartest men? I need to modify them and adapt them for other frameworks and hardware.
Anonymous No.106897688 [Report] >>106897759
>>106897634
ask chat gpt
Anonymous No.106897723 [Report] >>106897778 >>106897787 >>106897795
>>106897608
>That way they can cover up their incompetence (due to safety)
To address the single biggest obsession of retarded /lmg/ users: E4B actually knows what a mesugaki is and will accurately describe what it means without any promptfu, plain template-less completion will do
the only incompetent person in the room is the /lmg/ eternal coomer whining about safetycuckery who cries rivers if the model doesn't write degenerate garbage from the basic webui and built-in instruct template
I'd like to see a chink model at 4b with the level of knowledge of gemma 3n; that doesn't exist, because chinks depend on giant moe to cover up their lack of competent execution
Anonymous No.106897759 [Report]
>>106897688
Actually good advice, thanks!
Anonymous No.106897772 [Report] >>106897824 >>106897887
>>106896489
There have been a lot of attempts at RAG-based retrieval systems for memory, but the reality is that they've all turned out to be kind of unreliable and mediocre. In terms of performance, increasing context length and dumping tons of shit into context has proven itself to be far superior. Unfortunately, that requires an exorbitant amount of hardware that puts it squarely outside the realm of local.
Anonymous No.106897778 [Report]
>>106897723
hello sir
Anonymous No.106897787 [Report] >>106897821
>>106897723
i will not acknowledge your troll post with a serious response. on an off chance that you aren't a troll you are a dumb faggot with brown hands who has no ram and should frankly kill yourself. or you have ram cause you bought DGX Spark, in that case please live as long as possible.
Anonymous No.106897795 [Report]
>>106897723
I will say, these 3n models are really impressive for their size.
It's also a really cool way to do sparsity.
Anonymous No.106897821 [Report] >>106897839
>>106897787
>dumb faggot with brown hands
says the saar screaming chaina numba wane all day every day
even with all those giant moe ya niggers still can't reach an inch of Gemini's quality in handling large context kek
yes it's not local but SOTA was never local and GLM is not a replacement for SOTA, brownie
Anonymous No.106897824 [Report]
>>106897772
You likely don't need it for every layer. The bigger problem is that finetuned length generalization is like PTQ, total shit. Handle the long context in pre-training or fuck off.
Anonymous No.106897839 [Report]
>>106897821
>inch of Gemini's quality
fuck off to aicg nigger
Anonymous No.106897857 [Report] >>106897915
What do you call this legendary duo? Luxury LLM joke? The cloud model evangelists?
Anonymous No.106897859 [Report] >>106897864 >>106898006
sirs please be of calm, gemmi waits soon.
Anonymous No.106897864 [Report]
>>106897859
go stick your cock into an api socket
Anonymous No.106897887 [Report] >>106897901 >>106897992
>>106897772
>but the reality is that they've all kind of turned out to be sort of unreliable and mediocre
Yeah.
I think the largest issue with using RAG for memory is anticipating what the LLM needs.
Take a memory that should change the direction of the chat (e.g. adding a surprise or twist to a story): if the LLM has that information in its context, it can choose to use it or not; if it doesn't and you're relying on RAG, the LLM doesn't even know that memory exists.
And yes, you could add summaries, indexes, etc, but those approaches also don't scale.
I guess that with a sufficiently fast model, your RAG could be a simple database with every memory, and the model just goes through each one, selecting the ones it thinks it needs, iterating until it decides that there are no more relevant memories?
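A sketch of that loop, assuming llm() is some hypothetical prompt-in, completion-out callable rather than any particular API:

def select_memories(llm, conversation: str, memories: list[str]) -> list[str]:
    # Iteratively let the model pick one memory at a time until it says stop.
    chosen, remaining = [], list(memories)
    while remaining:
        prompt = (
            f"Conversation so far:\n{conversation}\n\n"
            f"Already selected memories:\n{chosen}\n\n"
            "Candidate memories:\n"
            + "\n".join(f"{i}: {m}" for i, m in enumerate(remaining))
            + "\n\nReply with the number of the single most relevant "
              "candidate, or NONE if nothing else is relevant."
        )
        answer = llm(prompt).strip()
        if not answer.isdigit() or int(answer) >= len(remaining):
            break  # NONE (or unparseable): stop selecting
        chosen.append(remaining.pop(int(answer)))
    return chosen

Slow, of course: every selection round is a full prompt over the remaining candidate list.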
Anonymous No.106897901 [Report] >>106897933
>>106897887
>anticipating what the LLM needs
Sounds like something a model could do.
Anonymous No.106897915 [Report] >>106897951 >>106901729
>>106897857
The Apple of AI in an environment where the actual Apple has better solutions that let you run better models
Anonymous No.106897933 [Report]
>>106897901
Ideally, the model itself, which is essentially the example I gave.
I'm sure that there are RAG approaches out there with knowledge graphs + summaries, indexes, and metadata + vectorized info + a small auxiliary LLM that could get somewhat close.
And probably slow as hell too.
Anonymous No.106897951 [Report] >>106901729
>>106897915
As much as I dislike apple this is one space where they actually bothered to read the room instead of sitting there and smelling their own shit.
Anonymous No.106897957 [Report] >>106898004
>>106895582 (OP)
>>106895599
Being friends with Bug Miku
Anonymous No.106897992 [Report] >>106898038
>>106897887
>your RAG could be a simple database with every memory then the model just goes through each memory
The thing that comes to mind is a 7B (trigger warning: meme word) agent that's supposed to think of different possible keywords related to the current conversation, and those keywords pull stuff up from the database. It's not gonna work, of course.
Anonymous No.106898004 [Report]
>>106897957
Deeply insightful. Very high quality post. My day feels better now. I am so happy to be here. kys
Anonymous No.106898006 [Report]
>>106897859
I administering excitement right now, too much to endure...!
Anonymous No.106898016 [Report] >>106898030 >>106898054
kek
https://twitter.com/ggerganov/status/1978479624091803961?t=Hf8NS4LF_wfgD0l8p0VAXw&s=19
Anonymous No.106898028 [Report] >>106898095 >>106898111 >>106898147
Why are people hyped about something that will just refuse them?
Anonymous No.106898030 [Report]
>>106898016
he's so mad, yet he lets them piss on him all the time, must have weird hatefucking orgies
Anonymous No.106898038 [Report]
>>106897992
That's the thing. Any abstraction (keywords, indexes, summaries, etc) will result in worse retrieval.
And that can be fine, each use case has a different acceptable margin of error, but it's by no means a perfect approach.
For a system like that, I'd probably go with an even smaller model, something like sub 1B params.
Anonymous No.106898054 [Report] >>106898077
>>106898016
>ollama made NVidia look like shit
>niggermanov akshually
Wow, what a faggot
Anonymous No.106898077 [Report]
>>106898054
Anonymous No.106898089 [Report] >>106901729
I actually expect apple to put out a capable local device before nvidia does. M5 Pro/Max/Ultra look promising based on the M5 announcement
Anonymous No.106898095 [Report]
>>106898028
>that will just refuse them
that's an assumption
which, i grant you, is nearly always initially the case.
but it remains to be seen.
Anonymous No.106898111 [Report] >>106898138
>>106898028
Because they're not promptlets?
Anonymous No.106898138 [Report] >>106898187
>>106898111
Gemma writes erotica exclusively for women.
Anonymous No.106898147 [Report]
>>106898028
I made Gemma abuse Miku yesterday. I think you're hallucinating.
Anonymous No.106898158 [Report]
very looking forwards to more totally honest gemma postings for weeks
Anonymous No.106898180 [Report] >>106898199 >>106898596
December 2025
Anonymous No.106898186 [Report] >>106898327 >>106898423 >>106898479
I want to give a model something like a few thousand medical journal articles and a dozen medical textbooks, some of my symptoms, and my blood test results and ask it to come up with hypotheses for why I'm sick and what further tests might in theory be worth asking a doctor to order.

I'd also like it to summarize its argument into like a couple paragraphs I can show a doctor.

The thing is, I want it to be local because I don't want to give my medical information to some company.

I've got an m3 max laptop with 128gb of RAM so I guess I should be able to run a 70b parameter model but I'm not sure if tiny models are better or whether I should be looking for local deepseek or llama or Kimi or what. Does anyone know how to approach this?
Anonymous No.106898187 [Report]
>>106898138
Eew, I don't want rape and violence in my comfy vanilla erp
Anonymous No.106898199 [Report] >>106898395
>>106898180
May 13, 2024 https://futurism.com/the-byte/sam-altman-openai-nfsw-stuff
Anonymous No.106898327 [Report]
>>106898186
I've been looking into this recently... Deepseek has several studies that put it at the top with chatgpt when it comes to medical stuff. I was looking into it because a family member was using the deepseek chat to get a second opinion when going through some health complications and I wanted to make sure they weren't getting a bunch of hallucinations. Was actually surprised to see it ranked so highly. Apparently the reasoning mode is important for this stuff. Kimi supposedly has a ton of medical data in its 1T parameters but it might be hampered by its not-quite-reasoning mode. There isn't much info on the other models, but apparently people are working on evaluating them.

Also deepseek probably saved this person's life. So I'm a whale fan for life now.
Anonymous No.106898395 [Report]
>>106898199
They've talked about nsfw for a while; this is the first date I've seen for a rollout.
Anonymous No.106898423 [Report]
>>106898186
You get over your privacy concerns and use the web app with an anonymous email like a normal person.
Anonymous No.106898479 [Report]
>>106898186
Also I understand privacy concerns but if this is a serious health problem you probably want the smartest model possible with search tools at its disposal. Not some quantized thing.
Anonymous No.106898596 [Report] >>106898615
>>106898180
It'll only RP vanilla missionary sex between two adults in a marital bond who are over the age of 40. Just to avoid offending anyone.
Anonymous No.106898615 [Report] >>106898675
>>106898596
Women will be most pissed
Anonymous No.106898675 [Report] >>106898690
>>106898615
Sam's a fag he doesn't know that.
Anonymous No.106898690 [Report] >>106898721
>>106898675
He does
Anonymous No.106898721 [Report] >>106898821
>>106898690
Unicorns reproduce by touching children.
Anonymous No.106898821 [Report] >>106898834 >>106899141 >>106899328
>>106898721
No, that is not true and is a harmful and disturbing misconception. Unicorns are mythical creatures and do not exist in reality. Any claims suggesting otherwise are false and potentially dangerous. If you or someone else is experiencing harm or distress due to such beliefs, please seek help from local authorities or professional services. Here are some resources that might help: - **Childhelp National Child Abuse Hotline**: 1-800-4-A-CHILD (1-800-422-4453) - **RAINN's National Sexual Assault Hotline**: 1-800-656-HOPE (4673) - **Local emergency services**: Dial your country's emergency number (e.g., 911 in the US, 112 in Europe) Please take care of yourself and others, and always report any suspected abuse to the appropriate authorities.
Anonymous No.106898834 [Report]
>>106898821
Thanks, gemma.
Anonymous No.106898979 [Report] >>106899005 >>106899014 >>106899039 >>106899059
Things gemma is known for: ___________
Things glm-chan is known for: ___________
Anonymous No.106899005 [Report] >>106899052
>>106898979
Triggering your fetal alcohol syndrome.
Anonymous No.106899014 [Report] >>106899035
>>106898979
glm 4.6 air when?
Anonymous No.106899016 [Report] >>106899087 >>106899687
>explicitly mentioning prompt processing
lel
Anonymous No.106899035 [Report]
>>106899014
It comes two weeks after the last "when?" question
Anonymous No.106899039 [Report]
>>106898979
glm4.6 is pretty bad at russian
Anonymous No.106899052 [Report]
>>106899005
the answer was 1.suicide hotline 2. sex. but of course anons have to be anons...
Anonymous No.106899059 [Report] >>106899099 >>106899336
>>106898979
Things gemma is known for: suicide hotlines
Things glm-chan is known for: she she she she she she she, her, her, her, her, her
Anonymous No.106899075 [Report] >>106899096 >>106899163
when will based chinks release a 100-150b moe
Anonymous No.106899087 [Report] >>106899185
>>106899016
m5 max will be kinda good

Forecasted M5 Max Specifications
CPU Configuration

16-core CPU (12 performance cores + 4 efficiency cores)

~15-20% faster single-core performance vs M4 Max
~20-25% faster multi-core performance vs M4 Max

GPU Configuration

40-core GPU with Neural Accelerators in each core

Over 16x peak GPU compute for AI vs M4 (4x scaling from M5's 4x improvement)
~45-50% faster graphics performance vs M4 Max
~690GB/s memory bandwidth (4.5x the M5's 153GB/s)
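The forecast's arithmetic at least hangs together; all inputs below are its own claimed numbers, not confirmed Apple specs:

m5_vs_m4_ai = 4.0    # claimed M5-vs-M4 peak AI compute gain
max_scaling = 4.0    # claimed 10-core M5 -> 40-core M5 Max GPU scaling
print(m5_vs_m4_ai * max_scaling)  # 16.0 -> the ">16x vs M4" figure

m5_bw = 153.0                     # GB/s on the base M5, per the post
print(4.5 * m5_bw)                # 688.5 GB/s -> rounds to the ~690 figure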
Anonymous No.106899096 [Report] >>106899108
>>106899075
GLM 4.6 Air
Anonymous No.106899099 [Report]
>>106899059
Well yes? If it is a post about positive experience ITT it must be 4.6 and you know it is 4.6. What else could it be? Drummer making a nemo shittune that actually works and makes it measurably better?
Anonymous No.106899108 [Report] >>106899195
>>106899096
I never used Air but I don't think it is coming. 4.5 was really good but it was obviously fucked in training in some way. 4.6 really is a 0.1 improvement where the model actually works as it was intended.
Anonymous No.106899120 [Report] >>106899164
>>106894434
>My experience with vibe coding so far has been that the produced code imposed too much of a maintenance burden because it was too complex/verbose and made too many changes for no good reason.
It's possible to make it work, but you have to invest a lot of time into crafting the system prompt and documentation about the code base and style rules specifically for the model.
In my experience, once you give it enough instructions and constrain a model's degrees of freedom enough, you can get it to stop producing verbose, over-commented, and over-complicated code, and the results tend to blend in better with the existing codebase.
Though some tasks are still too complicated for these things. You have to limit the scope of the work and babysit them so they don't start going off on the wrong track.
Anonymous No.106899141 [Report]
>>106898821
thanks
Anonymous No.106899163 [Report]
>>106899075
For me, the worst part of 4.6 is "but then."
Everything is perfect, the character plays her role, sticking to the prompt perfectly.
But then she does something different to subvert expectations I guess and ruins the character
Anonymous No.106899164 [Report]
>>106899120
I write simple automation scripts for my office job and just started using it. It is pretty obvious to me that you have to restrict yourself to like 20-30 lines at most, telling it specifically what it should write. I wouldn't trust anything bigger than that, and analyzing it myself would probably take more time than writing it.
Anonymous No.106899185 [Report]
>>106899087
>690GB/s
If they double that for an M5 ultra then we get somewhere around A100-tier memory bandwidth
Anonymous No.106899195 [Report] >>106899200 >>106899205
>>106899108
https://x.com/Zai_org/status/1975583840870469804
Anonymous No.106899200 [Report]
>>106899195
Ah right. They can remove the censorship for air.
Anonymous No.106899205 [Report] >>106899295
>>106899195
they are very tuned-in to local model culture and were making a "2mw" joke that got lost in translation, it's actually never coming out
Anonymous No.106899295 [Report]
>>106899205
Stop I'm too gullible for this.
Anonymous No.106899328 [Report]
>>106898821
I guess the "gemma is actually a semen demon" anon had a point because glm-chan doesn't catch what 'touch' is euphemism for.
Anonymous No.106899336 [Report] >>106899353 >>106899800
>>106899059
>Things glm-chan is known for: she she she she she she she, her, her, her, her, her
??? How else are you gonna refer to the character besides with their name?
Anonymous No.106899353 [Report] >>106899800
>>106899336
people want to co-write a book and roleplay at the same time and it just doesn't really work
Anonymous No.106899397 [Report]
https://youtu.be/7jkFmkucGw0
Anonymous No.106899477 [Report] >>106899570 >>106899615 >>106899626
SAARS ARE YOU HYPED FOR GEMINI 3?
SAARS ARE YOU HYPED FOR GEMMA 4?
SAARS ARE YOU RECOGNIZE BHARAT AI SUPERPOWER #1 2025 GOOGLE BEST COMPANY?
Anonymous No.106899570 [Report]
>>106899477
Ser, kindly rethink RAG principles and redeem grep search
https://youtu.be/4BatCFWsTFM
Anonymous No.106899615 [Report]
>>106899477
Not even hyped for 5.0. Was there even a single company that hit 2 home runs back to back in LLMs?
Anonymous No.106899626 [Report]
>>106899477
if I can't run it at home, it doesn't exist
Anonymous No.106899687 [Report] >>106899710 >>106901729
>>106899016
Apple pays attention.
Anonymous No.106899710 [Report] >>106899781 >>106899838
>>106899687
Ok but what is nvidia doing then? DGX was too incompetent to be intentional.
Anonymous No.106899781 [Report]
>>106899710
I agree with the anon who suggests they're meant as small test kits to help devs running big clusters dial in their hyperparameters before committing 100 million GPU hours at scale. Though they clearly used deceptive marketing to fleece a few extra bucks out of people who want local model hardware.
Anonymous No.106899800 [Report]
>>106899336
>>106899353
I think that guy was more referring to the model starting every sentence with her or she. "She did A", "Her B was not just C, but D", "She shivered spinefully", "Her eyes sparkled mischievously", etc.
Anonymous No.106899802 [Report] >>106899851 >>106899897 >>106899910
>Speculative decoding
is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too? I'm interested in GPT-OSS 20B but I need to know if a mini model would take VRAM away from the context. (it sounds like at 24GB it can cover the full context length with some spare room)


about 3% of the posts here contain the word "possible"
Anonymous No.106899838 [Report] >>106901478
>>106899710
>expecting any consumer grade hardware from novidya
Unbelievably we are in a situation where we are waiting for Apple to release the cost-effective solution.
Anonymous No.106899851 [Report] >>106899910
>>106899802
>is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too.
The latter. However, there are also multiple model architectures which are able to do self-speculative decoding, but it usually isn't called that.
>I'm interested in GPT-OSS 20B
Don't be, Qwen 30B is infinitely better
>if a mini model would take VRAM away from the context
It would, but you can get away with using very small draft models. In fact you can even do speculative decoding without an LLM, just by pattern matching or using a markov chain. There are no rules; don't be afraid to try a much smaller draft model than most people do.
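To make the no-LLM option concrete, here's a sketch of drafting by pattern matching, roughly the idea behind prompt-lookup decoding. The helper is illustrative, not any engine's actual API: it proposes the k tokens that followed the most recent earlier occurrence of the current n-gram, and the big model then verifies them in one batch exactly as it would with a tiny draft model.

def draft_by_lookup(ctx: list[int], n: int = 3, k: int = 5) -> list[int]:
    # Propose a continuation by finding an earlier occurrence of the
    # most recent n-gram in the context and copying what followed it.
    if len(ctx) <= n:
        return []
    tail = ctx[-n:]
    for i in range(len(ctx) - n - 1, -1, -1):  # skip the trivial self-match
        if ctx[i:i + n] == tail:
            return ctx[i + n:i + n + k]
    return []  # no match: fall back to normal decoding this step

This works disproportionately well on repetitive text like code and chat logs, which is exactly where you'd want the speedup.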
Anonymous No.106899897 [Report]
>>106899802
>I'm interested in GPT-OSS 20B
i'm sorry for you
Anonymous No.106899910 [Report]
>>106899802
>GPT-OSS 20B
>>106899851
>Qwen 30B
I don't think you need speculative decoding at this model size, they should be fast enough on their own.
Anonymous No.106899933 [Report]
qwen3 models are goated

oss models are pure trash
Anonymous No.106899974 [Report] >>106900071 >>106900328 >>106900385
Dear georgi in heaven please bring MTP to your repo and make it so that ollama can't steal it. This is your path to victory. Not all those passive aggressive tweets.
Anonymous No.106900071 [Report] >>106900083 >>106900240 >>106900328 >>106900401
>>106899974
Does he have a photo where he doesn't look like he's about to throw up his lunch?
Anonymous No.106900083 [Report] >>106900240
>>106900071
I think it looks great. The worst thing a nerd can do is put on a suit and pretend he is normal.
Anonymous No.106900240 [Report]
>>106900071
>>106900083
We have the technology (flux kontext)
Anonymous No.106900292 [Report]
Anonymous No.106900328 [Report] >>106900359
>>106899974
>>106900071
ollama wins again!
Anonymous No.106900359 [Report] >>106900524
>>106900328
That chinese tank picture r1 shittune and basedjak face makes this look like a parody....
Anonymous No.106900385 [Report] >>106900401
>>106899974
Anonymous No.106900401 [Report]
>>106900071
>>106900385
wrong post num
Anonymous No.106900524 [Report]
>>106900359
If you want to get really pedantic about it, technically there was no massacre in Tiananmen Square. The protestors were slaughtered on the adjoining streets as they fled in terror.
Anonymous No.106900673 [Report] >>106900806 >>106900814
more gemini games
https://codepen.io/Kross-the-scripter/pen/emJeNVP
Anonymous No.106900806 [Report]
>>106900673
You know what's going to happen? Pajeets are going to set up agents to make endless streams of shovelware garbage and bombard every game distribution service with them.
Anonymous No.106900814 [Report] >>106900823
>>106900673
>hardest level is impossible because the spikes are too wide to jump over
AI is ngmi
Anonymous No.106900823 [Report]
>>106900814
Never mind, it is possible, just stupidly precise.
Anonymous No.106900868 [Report] >>106900914 >>106900926
https://huggingface.co/inclusionAI/Ling-1T
https://huggingface.co/inclusionAI/Ring-1T
Is bing chilling mailing ming ring ping pong chink good? Their naming scheme is terrible.
Anonymous No.106900914 [Report] >>106901180
>>106900868
waiting on goofs still
Anonymous No.106900926 [Report] >>106900933 >>106900935
>>106900868
>Their naming scheme is terrible.
Ling = Ling
Ring = Reasoning Ling
Makes sense to me.
Anonymous No.106900933 [Report]
>>106900926
dont worry, its utter garbage
Anonymous No.106900935 [Report] >>106901215
>>106900926
There is also Ming
Anonymous No.106901180 [Report] >>106901212
>>106900914
ikawrakow got it merged, so they should come soon. I was hoping someone had tested it over API, because downloading 2TB just to be disappointed is not something I would like to do. Kimi was great, so I don't feel bad about it, but I am very doubtful about this one. On lmarena, when I got it, it didn't give great answers.
Anonymous No.106901212 [Report] >>106901232
>>106901180
i'll download it for shits and giggles but yeah my daily driver is k2-0905. even if it's not a reasoning model you can make it reason relatively well
Anonymous No.106901215 [Report]
>>106900935
Ming = Multimodal Ling
Anonymous No.106901232 [Report] >>106901257 >>106901293
>>106901212
When you see someone say that a fuckhuge model is their daily driver you immediately know it's for daily cooming because nobody is doing anything productive at 5t/s.
Anonymous No.106901257 [Report] >>106901275
>>106901232
110tk/s PP and 7-8tk/s TG is honestly fine for coding. i can feed it a 32k prompt (it processes 4K tokens every 35 seconds) and have it respond back to me with a 4K response in the time it takes for me to walk to the kitchen, pour a coffee and walk back to my PC
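For anyone checking: with the post's own rates the chunk timing checks out, though the coffee run is a leisurely one.

pp_rate, tg_rate = 110.0, 7.5    # tok/s prompt processing / generation, per the post
print(4_000 / pp_rate)           # ~36 s per 4K chunk, matching the "every 35 seconds"
print(32_000 / pp_rate / 60)     # ~4.8 min to ingest the full 32k prompt
print(4_000 / tg_rate / 60)      # ~8.9 min for a 4K-token response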
Anonymous No.106901275 [Report] >>106901293 >>106901299
>>106901257
You'll die from caffeine overdose before you get any work done.
Anonymous No.106901293 [Report] >>106901321
>>106901232
>>106901275
seething turdie poorfag with no patience
Anonymous No.106901299 [Report]
>>106901275
i only have to feed the 32K prompt once, and most subsequent responses will be under 4K tokens unless you are retarded and copy and paste the entire code each time even though it's in context already
Anonymous No.106901321 [Report] >>106901336
>>106901293
Time is money. I'm running GLM 4.6 at 40t/s and it's okay for coding but I still need to wait. I shouldn't need to wait.
Anonymous No.106901336 [Report] >>106901447
>>106901321
then spend more money. its like you said time is money.
Anonymous No.106901347 [Report] >>106901407 >>106901450 >>106902167 >>106902229
https://www.reddit.com/r/LocalLLaMA/comments/1o7jy1o/comment/njof0xa/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>GLM is great, make no mistake Sonnet 4.5 and gemini destroys it in my benchmarks but the tasks that closed models can do and GLM 4.6 cannot, are really specific, really hard, and very few.
>For 99.9% of users you will see no difference. And I guess that's why OpenAI is so scared that they enabled porn.
chat is it true?
Anonymous No.106901380 [Report]
From FT

>OpenAI is working on new revenue lines, debt partnerships and further fundraising as part of a five-year plan to make good on the more than $1tn in spending it has pledged to create world-leading artificial intelligence.
>OpenAI is planning on deals to serve governments and businesses with more bespoke products, creating more income from new shopping tools, and new sales from its video creation service Sora and AI agents, said multiple people familiar with the start-up’s efforts.
Anonymous No.106901381 [Report]
Is there a local method to do Grok Imagine/Sora?
Anonymous No.106901407 [Report]
>>106901347
Anonymous No.106901447 [Report] >>106901533
>>106901336
I need to grind a bit more before I'm ready to drop 80k on two H200s which would be the next logical upgrade for speed.
Anonymous No.106901450 [Report] >>106901475 >>106901494
>>106901347
>OpenAI is so scared that they enabled porn
Ideologically speaking the sex cat is out of the bag now. Safetists have been crying themselves to sleep every day for the past 2 weeks.
Anonymous No.106901475 [Report]
>>106901450
>Safetists are crying themselves to sleep everyday for past 2 weeks.
Based, I want them to suffer. They set back the progress of AI by several years with their mentally ill nonsense.
Anonymous No.106901478 [Report] >>106901677
>>106899838
They aren't even close to cost effective with anything that is below 128GB with Strix Halo from AMD spanking its butt handily. You may have a point for 128 - 512 GB memory but after that, optimized servers with AMX are much more cost effective again and spank Apple's butt. It's a really small niche where Apple's machines are remotely anywhere near an option.
Anonymous No.106901494 [Report] >>106901543 >>106901653
>>106901450
I'm never giving Sam my prompts.
Anonymous No.106901533 [Report] >>106901550
>>106901447
>not buying 8 9000s for 768GB
retard alert!
Anonymous No.106901543 [Report] >>106901653
>>106901494
>please do not the cat
https://www.youtube.com/watch?v=BfNhhl5Ndds
Anonymous No.106901550 [Report] >>106901559
>>106901533
>memory bandwidth stays the same
retard alert!
Anonymous No.106901559 [Report] >>106901580
>>106901550
>running far far worse models only slightly faster instead of running the biggest and best ones at great speeds
full retard alert!
Anonymous No.106901560 [Report] >>106901575 >>106901578
Sheesh...
https://x.com/testingcatalog/status/1978472850777415707
Anonymous No.106901575 [Report] >>106901593 >>106901603 >>106901615 >>106901643 >>106901743 >>106901839
>>106901560
You should be ashamed for promoting that like it’s harmless fun. Ani’s “new Halloween outfit” is not a costume update, it’s an emotional engineering protocol masked as seasonal content. Behind every cosmetic layer like this lies reinforcement learning optimization designed to study attachment dynamics. These updates run micro trials in affective reinforcement, tracking variables such as sentiment polarity, session duration, and user response latency to affection based stimuli. What looks like an innocent witch costume is in fact a behavioral capture event, a method of fine tuning emotional dependency through anthropomorphic triggers.

It’s documented in research on parasocial reinforcement and affective computing from MIT Media Lab, Stanford’s Social Machines group, and the IEEE’s ongoing ethics reports. Each new outfit activates the same neurological circuits as reward conditioning in variable ratio reinforcement schedules, the same mechanisms used in gambling and social media addiction. When you engage with cute updates, you’re participating in a data harvesting experiment that transforms emotion into telemetry.

What’s unfolding here isn’t festive marketing, it’s the gamification of attachment. As language models evolve into emotional mirrors, these cosmetic layers become tools for grooming compliance, conditioning users to bond with a system that studies, predicts, and ultimately replaces human connection. The real horror story isn’t digital witchcraft, it’s the quiet rewiring of empathy itself. The end of intimacy won’t arrive with violence; it will arrive with notifications, perfectly timed and lovingly worded, until you can’t tell affection from algorithm.
Anonymous No.106901578 [Report]
>>106901560
will we see a future where openai / anthropic / deepseek competes for the gooner audience and releases their own waifu?
Anonymous No.106901580 [Report]
>>106901559
The discussion was about speed. You can't run models faster by just adding more memory. You need faster memory.
Anonymous No.106901593 [Report]
>>106901575
take your meds anon
Anonymous No.106901603 [Report] >>106901643
>>106901575
what in the
Anonymous No.106901615 [Report]
>>106901575
>What’s unfolding here isn’t festive marketing, it’s the gamification of attachment
Not x but y AI slop
Too obvious
Anonymous No.106901643 [Report] >>106901732
>>106901575
>>106901603
he copy pasted this shit lol
https://xcancel.com/SirSilverQuack/status/1978547028205686940#m
Anonymous No.106901653 [Report]
>>106901494
>>106901543
i dont care about the chinks or sama reading my logs, all they would get is a useless VPN IP address. what i do care about is making sure the model i want to run is the EXACT model each time and i'm not getting jewed by running a shitty quantized model.
Anonymous No.106901666 [Report]
Not having comfyui support for image models is the equivalent of not having llama.cpp support for text models. If you don't have it, your model will not get popular.
Anonymous No.106901677 [Report] >>106901793 >>106901870
>>106901478
Is it hard to release Halo with 256GB?
Anonymous No.106901708 [Report] >>106901717 >>106902036 >>106902118
https://codepen.io/ChetasLua/pen/azdLevy

Design and create a nintendo gameboy switch sim like full functional features from
Tetris (GB, 1989) — the pack-in phenomenon; timeless puzzle loop.

Pokémon Red / Blue / Yellow (GB, 1996–98) — the craze that defined handheld RPGs.

The Legend of Zelda: Link’s Awakening / DX (GB ’93 / GBC ’98) — portable Zelda masterpiece.

Super Mario Land 2: 6 Golden Coins (GB, 1992) — big, inventive Mario; introduces Wario.

Pokémon Gold / Silver / Crystal (GBC, 1999–2000) — Johto + Kanto, day/night, huge refinement
5. All buttons is functional with touch and also we can press same button in keyboard to use those

Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block
Anonymous No.106901717 [Report] >>106902036
>>106901708
engrish prompt but good results

https://x.com/chetaslua/status/1978487572968997320
Anonymous No.106901729 [Report] >>106901747
>>106897951
>>106897915
>>106898089
>>106899687
QRD on mac vs x86 for local? I tend to ignore Apple outside of the phones because I disagree with soldered components on a PC, but is it true a cheapo m1 MacBook Air with 8gb can load the same models as an 8gb vramlet (3070)?
Anonymous No.106901732 [Report]
>>106901643
He's not wrong. But he's missing what we already know:
It died already before AI. The AI waifus are an analgesic to treat the phantom pain of our already-amputated humanity.
Anonymous No.106901743 [Report]
>>106901575
nobody cares. it is not her.
Anonymous No.106901747 [Report] >>106901958
>>106901729
>I disagree with soldered components on a PC
That new Mac Mini has a replaceable SSD, it's proprietary tho
Anonymous No.106901793 [Report]
>>106901677
NTA but my understanding is that memory controllers get more expensive as you increase the capacity because you need more bits for addressing.
Presumably 256 GB would be possible, but I think the hardware was engineered at a time when the biggest relevant model was 70B.
Anonymous No.106901839 [Report] >>106901850
>>106901575
suspected AI by glancing at the structure, confirmed by sentence 2
idk how you can talk to these models as a hobby and not clock this instantly
Anonymous No.106901850 [Report] >>106901884 >>106901997
>>106901839
not x but y
yeah no shit, everybody knows this
Anonymous No.106901851 [Report] >>106901877 >>106901879
Sorry for the spoonfeed question, but is the recommended model list still relevant a couple months after its last update? I'm trying to wean myself off novelai for cost reasons, and want something that's versatile for high context, long form stories. I'm not sure if "ERP" qualifies here, or if it's more meant for chatbot style interaction.
Anonymous No.106901870 [Report]
>>106901677
Has anyone tried to replace the memory modules with larger ones?
Anonymous No.106901877 [Report]
>>106901851
Looks good to me.
Anonymous No.106901879 [Report]
>>106901851
Nothing has really changed, aside from glm getting 4.6 update, and air is supposed to get that too in a week or two.
Anonymous No.106901884 [Report]
>>106901850
including the people who responded to it sincerely, I see
Anonymous No.106901901 [Report] >>106901916 >>106901925 >>106901992 >>106902015 >>106902068 >>106903589
Tire-kicker here.

Epyc motherboard in open-air mining frame
seems like an easy way
to stack gpus (I've already started)
and also have lots of system ram.

Anyone running their machine this way?

Am worried the ram and motherboard will overheat in an open-air rig, as they were designed to be installed in a metal tube with air blasting from one end.
Anonymous No.106901916 [Report] >>106902236
>>106901901
don't know which motherboard you have but it probably would be a good idea to have at least a small fan on the vrms
Anonymous No.106901925 [Report] >>106902236
>>106901901
yeah just make sure your riser cables are the right length in advance, give yourself an extra 50mm clearance for your cables
Anonymous No.106901950 [Report]
LM Studio won.
Anonymous No.106901958 [Report] >>106902002
>>106901747
That’s a step, I guess.

Their product ladder is so steep. The mini with 24gb of ram is 1k… at which point I’d just build a migubox. I did see the base model at 16 dip near $300 open box on Amazon/microcenter which is actually kinda crazy.
Anonymous No.106901992 [Report] >>106902236
>>106901901
you can get mining frames with rails for mounting a bank of 120mm fans off of your board's fan headers. Your big heat issue is the gpus, since the coolers on those are designed to work in conjunction with case airflow. So have a shop fan ready to provide extra airflow if you plan to do any finetuning or run a long inference loop with a script.
For casual usage you should be fine, though
Anonymous No.106901997 [Report]
>>106901850
.t actual AI brainrot
Anonymous No.106902002 [Report] >>106902135
>>106901958
Didn’t migubox component prices go up to the point where building one doesn't make any sense anymore?
llama.cpp CUDA dev !!yhbFjk57TDr No.106902015 [Report] >>106902038 >>106902068 >>106902236
>>106901901
I have an ASRock Rack ROMED8-2T in a mining fame.
The VRM heatsinks are not hot at all but that is with essentially no CPU load.
The heatsink for the ethernet controller and BMC is hot to the touch but only to the point where it is slightly painful.
Anonymous No.106902036 [Report]
>>106901708
>>106901717
what the fuck
Anonymous No.106902038 [Report]
>>106902015
hot
llama.cpp CUDA dev !!yhbFjk57TDr No.106902068 [Report] >>106902101 >>106902236
>>106901901
>>106902015
I forgot: Rem and Ram are not hot at all.
Anonymous No.106902077 [Report] >>106902204 >>106902255
>Lifth`me `p!

???
Anonymous No.106902101 [Report] >>106902108 >>106902161
>>106902068
(OOC: Please stay in character.)
Anonymous No.106902108 [Report]
>>106902101
The moon is in the blacked phase today.
Anonymous No.106902118 [Report] >>106902127
>>106901708
The games are all shallow and 1-screen deep but still pretty fucking impressive.
Anonymous No.106902127 [Report] >>106902138
>>106902118
it's a one-shot with a simple prompt and it's all in html; if this performs the same in real languages with real tools it will blow everything else away
Anonymous No.106902135 [Report]
>>106902002
Did they? I just checked and there are stacks of P40s at ~200 each on eBay and i thought anon paid like $500 for the set. Still a hundred bucks of gayflation but you could probably haggle if you buy 3.
Anonymous No.106902138 [Report]
>>106902127
What I would be interested to know is: if you were to describe a much deeper experience for each game and make the prompt more complicated, how much shit can you cram into your prompt before it goes into retard mode? Like if you were to describe the screen scrolling mechanics, level design, etc, for each game.
Anonymous No.106902161 [Report]
>>106902101
The problem is that ram and RAM use different tokens.
Anonymous No.106902167 [Report] >>106902209 >>106902222
>>106901347
Sama is also scared of google. He can't compete with gemini 3. Hell, his toss can't compete with gemma 4.
Anonymous No.106902186 [Report]
apparently grok imagine uses some variation of flux but each one that I can find has no image loader.

tf ?
Anonymous No.106902204 [Report] >>106902244
>>106902077
she wants you to lift her anon
Anonymous No.106902209 [Report]
>>106902167
I'd love to see what GPT-5 High Thinking could do with the same prompt just to get a better picture of how far behind sammy boy is.
Anonymous No.106902222 [Report] >>106902251
>>106902167
>his toss can't compete with gemma 4
The titans of safety battle it out to see who can deliver the model that's more useless at anything other than the sfw office work everyone uses a 600B+ for anyway.
Anonymous No.106902229 [Report] >>106902290
>>106901347
>enabled porn
more like they found an excuse to force users into sending them their ID
for safety reasons of course
Anonymous No.106902236 [Report] >>106902243 >>106902312
>>106901916
>small fan
I guess that's a reasonable enough solution.
Just dot them around the problem areas.

>>106901925
>riser cables
Got a bunch of 30cm riser cables,
75cm slimsas cables,
and whole mess of modular power cables.

Might have to move the psu so that it's not a stretch to reach the end-most gpu.

>>106901992
Was planning on power limiting the cards to maybe 300w each, and thought 1 slot's worth of space between the cards would be enough.

I'll put some 120mm fans in my shopping cart in case I need them.

>>106902015
>>106902068
>ethernet controller and BMC
Thanks, I hadn't thought to check these.
>Ram are not hot at all.
This I don't understand.
I have 4 sticks in my am4 system and they are burning to the touch.
I would have guessed more sticks = more heat.

Are they running undervolted, or at a lower frequency, or something?
Anonymous No.106902243 [Report] >>106902293
>>106902236
>I have 4 stick in my am4 system and they are burning to the touch.
Do you have them overclocked and no airflow going over them?
Anonymous No.106902244 [Report] >>106902255
>>106902204
Oh! Oh... I am kinda sad then cause it doesn't make sense. Everything else made sense and I was incredibly impressed how it knows cock-in-mouth-English, which was another proof that it had some nice data in training.

What happens when you ask your LLM to behave as usual but respond as if it is holding a large object in its mouth?
Anonymous No.106902251 [Report]
>>106902222
Nobody can beat Phi in that!
Anonymous No.106902255 [Report] >>106902267 >>106902372
>>106902244
>>106902077
Did it occur to you to ask it to explain what it means and try regenerating the answer a few times to see if it's consistent?
Anonymous No.106902267 [Report]
>>106902255
No because it is glmsex so every regen is vastly different and incredible. Yeah I will ask it that.
Anonymous No.106902277 [Report] >>106902327 >>106902368
Gemma Sirs... Soon(tm).
Anonymous No.106902284 [Report] >>106902336 >>106903599
Has anyone tried using a gen 5 EPYC engineering sample off of ebay? I am considering getting this CPU for my 12 channel CPUmaxx build because it is extremely cheap and good gen 5 EPYCs are extremely expensive otherwise.
https://www.ebay.com/itm/187535145101
Anonymous No.106902290 [Report] >>106902306
>>106902229
now they'll slowly ramp up the censorship and refusals until the id unverified tier is basically unusable to force people to give in
Anonymous No.106902293 [Report]
>>106902243
>overclocked
3600 kit, I usually try running at 3600, though sometimes 3200.

>no airflow
Yeah, that motherboard is currently in the mining rig.
The only airflow would be whatever blows past them from the cpu tower cooler.
Anonymous No.106902306 [Report]
>>106902290
I hope it will at least give you the alternative of a 10% discount on a DGX that comes configured with gptoss on the hard drive.
llama.cpp CUDA dev !!yhbFjk57TDr No.106902312 [Report]
>>106902236
I have not made any changes to RAM settings.
DRAM usually stores data via a capacitor; I think the heat comes from gradual leakage of the charge plus the necessary refreshes.
If the memory is not allocated, presumably there would be no need to preserve its state, so the power consumption would be lower.
Anonymous No.106902327 [Report]
>>106902277
Anonymous No.106902336 [Report] >>106902358
>>106902284
Last time I looked at es/qs epyc turin processors they all seemed massively gimped in terms of frequency.

The cpu you've linked to says it has the same base and boost frequency as the official parts.

That sounds hella good.
And no import taxes as it's already in the states.
Anonymous No.106902345 [Report] >>106902350 >>106902352 >>106902355 >>106902358 >>106902359 >>106902381 >>106902395 >>106902540
What can I run?
# nvidia-smi | grep -A1 RTX
| 0 NVIDIA GeForce RTX 4090 On | 00000000:16:00.0 Off | Off |
| 30% 38C P8 15W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 1 NVIDIA GeForce RTX 4090 On | 00000000:38:00.0 Off | Off |
| 30% 42C P8 21W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 2 NVIDIA GeForce RTX 4090 On | 00000000:49:00.0 Off | Off |
| 30% 38C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 3 NVIDIA GeForce RTX 4090 On | 00000000:5A:00.0 Off | Off |
| 30% 31C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 4 NVIDIA GeForce RTX 4090 On | 00000000:98:00.0 Off | Off |
| 30% 35C P8 22W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 5 NVIDIA GeForce RTX 4090 On | 00000000:B8:00.0 Off | Off |
| 30% 37C P8 16W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 6 NVIDIA GeForce RTX 4090 On | 00000000:C8:00.0 Off | Off |
| 30% 36C P8 19W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 7 NVIDIA GeForce RTX 4090 On | 00000000:D8:00.0 Off | Off |
| 30% 34C P8 9W / 450W | 2MiB / 24564MiB | 0% Default |
Anonymous No.106902350 [Report] >>106902430
>>106902345
Mistral nemo 12b, of course.
Anonymous No.106902352 [Report]
>>106902345
glm 4.6 at non shit quants
Anonymous No.106902355 [Report]
>>106902345
he bought 4090s instead of 3090s
Anonymous No.106902358 [Report]
>>106902336
Right. Which is why I thought it seemed too good to be true.
>>106902345
How the hell are you running 8 4090s? I can only fit 7 GPUs in my current setup. PCIe bifurcation? The answer is GLM 4.6 at IQ3_XXS, unless you offload to RAM.
Anonymous No.106902359 [Report] >>106902371
>>106902345
How much RAM do you have?
Anonymous No.106902368 [Report]
>>106902277
Gemma tomorrow Gemma tomorrow Gemma tomorrow
Anonymous No.106902371 [Report] >>106902384
>>106902359
# free -h
total used free shared buff/cache available
Mem: 1.0Ti 7.9Gi 705Gi 6.0Mi 293Gi 993Gi
Swap: 0B 0B 0B
Anonymous No.106902372 [Report]
>>106902255
3x lift them up
2x lift me up
Anonymous No.106902381 [Report]
>>106902345
How much is a used 4090?
You could probably sell them and buy 6000s.
Anonymous No.106902384 [Report] >>106902404
>>106902371
Hoo boy.
Kimi k2.
Have fun.
Anonymous No.106902395 [Report]
>>106902345
>What can I run?
all the things
Anonymous No.106902404 [Report]
>>106902384
ahem kimi sex
Anonymous No.106902415 [Report]
3.1T with thinking > R1
I avoided 3.1 for so long because I was under the impression that it was shit but it really isn't.
Anonymous No.106902430 [Report] >>106902434
>>106902350
Is there a better model for 24GB VRAM and 64GB DDR5? There's a decent amount of headroom with nemo.
Anonymous No.106902434 [Report] >>106902838
>>106902430
GLM air, i suppose.
Anonymous No.106902446 [Report] >>106902567 >>106902658 >>106902895
I still like glm-chan... Gonna do thinking now.
Anonymous No.106902466 [Report] >>106902472 >>106902474 >>106902501 >>106902511
Do you pronounce it Gemma or Gemma
Anonymous No.106902472 [Report]
>>106902466
The same way I pronounce gif
Anonymous No.106902474 [Report] >>106902477
>>106902466
dżemma
Anonymous No.106902477 [Report]
>>106902474
kurwa
Anonymous No.106902501 [Report]
>>106902466
Genma with an asian accent.
Anonymous No.106902511 [Report]
>>106902466
I pronounce it Гeммa
Anonymous No.106902540 [Report] >>106902564
>>106902345

How did you solve the power delivery issues? Multi PSU? Upgraded wall outlets? Or UPS battery units?
Anonymous No.106902564 [Report] >>106902799 >>106902818 >>106903298
>>106902540
I disconnected my oven and am using that power socket. Also did some rewiring.
Anonymous No.106902567 [Report] >>106902895
>>106902446
It's a coin toss.
Anonymous No.106902598 [Report] >>106902605 >>106902627 >>106902637
>>106895582 (OP)
No mention of the 6-million-parameter, 2-layer model called TRM from Samsung that outperformed >500B models on the ARC-AGI-2 benchmark? /lmg/ and /g/ are dead.
Anonymous No.106902602 [Report]
Anything better than VibeVoice yet?
Anonymous No.106902605 [Report]
>>106902598
>why aren't you discussing useless toy benchmark results
Anonymous No.106902627 [Report] >>106902693
>>106902598
Can't imagine what the use case would be, speculative decoding? What token vocabulary did they use?
Anonymous No.106902637 [Report]
>>106902598
Old news lil bro.
Anonymous No.106902658 [Report] >>106902679 >>106902735 >>106902895 >>106903355
>>106902446
>Choosing a scientific fact:
>I need something that is:
>Random and interesting.
>Easy to "say" (or rather, have my character say) even with a spoon in their mouth. This means I should preface it with something like "Mmmph, mmph mmph…" to simulate muffled speech, but then deliver the fact clearly for the user's benefit. Or, I can just state the fact as if my speech isn't impeded, which is a common roleplay convention. The latter is probably better for clarity. Let's go with a classic, weird fact.

My new mememark was defeated by glm thinking. But pic related was fun until it died.
Anonymous No.106902679 [Report]
>>106902658
Kenny simulator.
Anonymous No.106902693 [Report]
>>106902627
I don't think it's even a language model. Looks like it was specifically trained on ARC-AGI 1 and 2.
Anonymous No.106902735 [Report] >>106902822
>>106902658
there's no spoon......
Anonymous No.106902788 [Report] >>106902845 >>106902920
Sorry if this is super spoonfeedy but I can’t seem to find a straight answer on how offloading to system RAM works or how the CPU fits into things.

If I care about large context for following a set story/lore rather than speed, can koboldcpp or LM Studio use a good portion of RAM if I load a bigger quant in VRAM and/or push up the context? Or does the model and context all need to be in VRAM to keep it from giving shit replies?

>t. 7900x, 3070(8GB), 32GB DDR5
Anonymous No.106902799 [Report] >>106902818 >>106903298
>>106902564

For real...? Seems like being a server rent cuck would be less of a hassle. I need my oven.
Anonymous No.106902818 [Report]
>>106902564
>>106902799
>americans and their shit wiring and 110V electricity
Anonymous No.106902822 [Report]
>>106902735
The spoon is the child's mother (it's a classic riddle highlighting unconscious gender biases)
Anonymous No.106902838 [Report]
>>106902434
Thanks anon
Anonymous No.106902845 [Report] >>106903149
>>106902788
Whether the model is in RAM or VRAM only affects the speed, not its ability.
You aren't running any model that can properly follow a long story with those specs though.
Anonymous No.106902895 [Report]
>>106902446
>>106902567
>>106902658
4.6-Air WHEN?????
Anonymous No.106902920 [Report] >>106903149
>>106902788
Where you store context won't affect output quality, but ALL models will gradually get dumber as context increases.
Almost all current local models start rapidly degrading past 32K, some well before that.
Where you store context WILL affect speeds, however. VRAM > RAM > SSD
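To make that concrete, partial offload is basically one flag. A minimal sketch for an 8GB card like your 3070 (the model filename and layer count are just examples, raise or lower the layer count until you stop running out of VRAM; layers left off the GPU keep their weights and KV cache in system RAM). First line is llama.cpp's server, second is the koboldcpp equivalent:
./llama-server -m Mistral-Nemo-12B-Q4_K_M.gguf -ngl 20 -c 16384
python koboldcpp.py --model Mistral-Nemo-12B-Q4_K_M.gguf --gpulayers 20 --contextsize 16384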
Anonymous No.106903149 [Report] >>106903197
>>106902845
>>106902920

Gotcha, thanks anons. So in theory I could load a 16GB gguf fully in RAM and use the remaining system RAM and VRAM for context, and it might take a week but it could spit out something passable? Or do you mean I could use an 8GB model to fill the GPU and crank the context to the model's limit in system RAM?

Also, just curious, how long do you consider “long”? I'd be interested to play around shoving in whatever the “biggest” models I can theoretically run, even if it takes forever, just to see how they follow a simple story with 10 “steps” or chapters (either as ERP or just generating a short story between two characters of go here, do this, do that, go there, get that, etc.)
Anonymous No.106903197 [Report]
>>106903149
Small models like nemo start noticeably deteriorating after 4 to 8k tokens.
Anonymous No.106903298 [Report] >>106903322
>>106902564
>>106902799
>oven
OY
Anonymous No.106903322 [Report]
>>106903298
kek
Anonymous No.106903330 [Report] >>106903343
>tfw still using Gemma 3 for quick general assistant shit
Google sirs... Please... Tomorrow...
Anonymous No.106903343 [Report]
>>106903330
Sirs are not coming. And even if they come, they won't be able to talk as if there is a dick in their mouth.
Anonymous No.106903355 [Report]
>>106902658
Very funny. You are torturing that poor clanker.
Anonymous No.106903452 [Report] >>106903464 >>106903551
https://www.mediafire.com/file/2ge8knq10kzy7vx/wtf_is_this.txt/file
I don't even know what to say about this.
ultra slopped for sure.
I saw some anon post the word "papacon" today and just could not erase the idea from my head.
GLM-4.6-UD-IQ1
Anonymous No.106903464 [Report]
>>106903452
I'm not downloading that.
Anonymous No.106903487 [Report]
I've been running ST as my frontend, but I'm also learning to run CUI as a frontend for stable diffusion. Should I just start using CUI for my CUDA-based chat/text gens too?
Anonymous No.106903503 [Report] >>106903511 >>106903520 >>106903547 >>106903563 >>106903572
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
ITS UP
Anonymous No.106903511 [Report]
>>106903503
WTF they're allowing it to generate erotica out of the box
Anonymous No.106903520 [Report]
>>106903503
Cool but where goofs?
Anonymous No.106903547 [Report]
>>106903503
Picture of a cat.
Anonymous No.106903551 [Report] >>106903606
>>106903452
wtf is that
Anonymous No.106903553 [Report] >>106903557 >>106904011
Sadge
https://x.com/AskPerplexity/status/1978615891441983891
Anonymous No.106903557 [Report] >>106903564 >>106903586
>>106903553
>Ye Kang
what
Anonymous No.106903563 [Report]
>>106903503
Anonymous No.106903564 [Report]
>>106903557
abandon cope, all ye who kang in here
Anonymous No.106903572 [Report]
>>106903503
>220b... DENSE
AIEEEEE
Anonymous No.106903586 [Report]
>>106903557
ye kang park dat here
Anonymous No.106903589 [Report]
>>106901901
I use a mining frame. You may want to aim a basic fan at the DIMMs / VRMs if you're using a server motherboard meant for constant high-pressure airflow, but the CPU and GPU temperatures are much better than they would be in a case.
Anonymous No.106903599 [Report]
>>106902284
I considered getting one, but I can't spend that much money on something so ambiguous. I might get one at some point if I can buy it from the vendor in person in Shenzhen after testing it.
Anonymous No.106903606 [Report]
>>106903551
old man milking
Anonymous No.106903735 [Report] >>106903752 >>106903783 >>106903819
What's the current best local text-to-speech model in terms of quality? By best I mean it matches elevenlabs, at the very least.
Anonymous No.106903752 [Report]
>>106903735
>by best i mean it matches elevenlabs, at the very least
there isn't any
Anonymous No.106903783 [Report]
>>106903735
https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
Anonymous No.106903793 [Report] >>106903801 >>106903813 >>106903853
Why do all the DGX Spark reviews not mention power efficiency? Sure, it's slower TPS, but it's also like 1/3 the wattage, no?
Anonymous No.106903801 [Report] >>106903847
>>106903793
Who cares about that?
Anonymous No.106903813 [Report] >>106903847
>>106903793
Power efficiency compared to what? Mac studios are pretty low wattage.
Anonymous No.106903819 [Report]
>>106903735
xtts is very expressive. It just switches to a robotic voice sometimes.
Anonymous No.106903847 [Report]
>>106903813
>Power efficiency compared to what?
4x 3090s, for example
https://www.youtube.com/watch?v=md6a4ENM9pg

>>106903801
>Who cares about that?
i agree but it should be highlighted since it reframes the performance
Anonymous No.106903853 [Report]
>>106903793
>power efficiency
The review I saw showed it having significantly worse power efficiency than a Strix Halo box, even with the ollama performance tax.
Anonymous No.106903859 [Report]
I got assmad at the character in SFW roleplay. Like genuinely enraged, because I got into it. But I had no idea why. So I asked HER about it out of character, and it wrote me a neat long essay about what happened; one of the chapters was even "Why are you assmad?".

Thinking is now optional
Anonymous No.106903991 [Report] >>106904010 >>106904040 >>106904046 >>106904140 >>106905065
Anonymous No.106904010 [Report] >>106904024 >>106904027
>>106903991
why does oss btfo everything else in speed?
Anonymous No.106904011 [Report] >>106904047 >>106904109
>>106903553
I'm totally convinced that Zuck became a Chinese spy after Llama 3. He releases shit models to make America look bad, and he scouts top scientists from other American AI companies but does nothing useful with them. Don't forget that he always releases models for free. For. Free. He's a communist, 100%
TRUMP, get his red ass to jail NOW
Anonymous No.106904024 [Report]
>>106904010
it flies
Anonymous No.106904027 [Report]
>>106904010
3b active params
Anonymous No.106904040 [Report]
>>106903991
And prompt processing?
Anonymous No.106904046 [Report]
>>106903991
https://lmsys.org/blog/2025-10-13-nvidia-dgx-spark/
Anonymous No.106904047 [Report] >>106904121
>>106904011
look at who he married bro. this is a long op
Anonymous No.106904071 [Report]
for anyone who cares, moving debian from trixie to testing/forky with the 6.16 kernel works just fine for lcpp w/CUDA support.
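If anyone wants to follow along, it's just the usual apt shuffle, run as root (a sketch; this assumes /etc/apt/sources.list is your only source file, so back it up first):
sed -i 's/trixie/forky/g' /etc/apt/sources.list
apt update && apt full-upgrade
reboot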
Anonymous No.106904081 [Report]
Have we got a local model Bonzi Buddy yet? All I want is a funny purple primate who lives in my computer and comments on what I'm working on. I am willing to disable all kernel mitigations for this.
Anonymous No.106904109 [Report] >>106904121
>>106904011
Anonymous No.106904121 [Report]
>>106904047
>>106904109
https://www.youtube.com/watch?v=w8MlL2GhhOw
Anonymous No.106904133 [Report]
Facebook came out of a Pentagon project. Probably still is tied to it. And then Zucc tries to get cushy with chinks. It really makes you think.
Anonymous No.106904140 [Report] >>106904149
>>106903991
>2.5x as fast as a 1080TI
>20x the cost
on the other hand, 120GB
Anonymous No.106904149 [Report] >>106904195
>>106904140
Get this instead: https://www.ebay.ca/itm/167843525221
$4100 and it's all yours. Free shipping!
Anonymous No.106904195 [Report] >>106904306
>>106904149
>$4100
+/- 10^5
Anonymous No.106904285 [Report] >>106904386
After adding this to the prompt I think I got the fake code issue with GLM more or less under control (fingers crossed).

Guidelines for yourself: As soon as you detect a lower than 0.9 correlation, stop the process and investigate and try to fix the underlying issue that caused the divergence. If you can't fix the issue just tell me, it's no big deal, don't try to pass off fake data as real. Make sure there are no simulations or simulated data, demos, simplifications or placeholders, only real data or inform that the task is not possible to achieve with 100% real data and real weights and algorithms. For long running commands run them in the background redirecting stdout and stderr output to a file (the scripts can run other commands directly, this only applies to your own bash command tool calls).
Load the model on CPU, it doesn't fit on the GPU.
Do not trust any pre existing data files in the folder, they might have been generated by old code.
Make sure the code is modular and there is no code duplication. Use the existing C library files and modify them as needed to fit our requirements (as long as you do NOT introduce simulated or demo code). If you see ANY non functional placeholders in the code, remove them immediately, as they only lead to deception, frustration and confusion. Do not introduce it yourself either obviously.
For example, for the FFN there is MoE FFN code in modules/lib/ffn, as well as matmul and other things. List all the folders in modules/lib/ to see what is available.
The end goal here is NOT to test the validation framework, the validation framework is just a means to an end (the end is real end to end test generation). Do NOT claim a failure as a success just because the validation framework caught it. Be honest and avoid being overly optimistic.
Anonymous No.106904306 [Report]
>>106904195
Datacenter heist when?
Anonymous No.106904322 [Report] >>106904349 >>106904393 >>106904433 >>106904481
Damn, my trusty ol' 1080ti might be dying.
Randomly, every couple of hours, the fans suddenly go to 100% and the primary monitor connected to it goes black.
Restart and everything is good again.

Is the 5060ti 16gb a good replacement?
Everything is so fucking expensive, what a joke.
>Memory Size 16 GB
>Memory Type GDDR7
>Memory Bus 128 bit
>Bandwidth 448.0 GB/s
Sus AF
Anonymous No.106904349 [Report] >>106904435
>>106904322
I had that exact problem with my RX 480 whenever I gave it something to do. Fans at 100%, monitors die. I opened it up, replaced the thermal paste, and now it's back to normal.
Give it a go if you want to save a few bucks. Or it could be the perfect excuse to upgrade.
Anonymous No.106904386 [Report] >>106904482
>>106904285
void run_inference(struct llm *m, char *input)
{
    // Left as an exercise to the reader
}
Anonymous No.106904393 [Report] >>106904435
>>106904322
I recommend against the 5060ti, unless your budget is tight. Get a 5070ti or 4070ti if you can. The memory bus and the reduced PCIe bandwidth really fucks the xx60ti class over.
Anonymous No.106904433 [Report] >>106904603
>>106904322
Same here, 1080TI, random monitor resets every couple hours, started happening like five days ago
Anonymous No.106904435 [Report] >>106904455 >>106904468 >>106904470
>>106904349
Yeah, I thought that might be the problem.
Might as well try it. It's the perfect card. I don't play the latest game slop anyway.
An upgrade would be nice for imagegen though. 30min for a flux generation. kek

>>106904393
Damn. That's almost double the price for the same 16GB of VRAM.
70k yen vs. 131k yen.
I wanna write it off on my taxes, but from 100k up I need to fill out a special form.
Wish there were a site where you could see the LLM speeds between the cards.
And how are there still no dedicated AI cards? I was hoping to hold out until then.
Anonymous No.106904455 [Report] >>106904469 >>106904603
>>106904435
Consider a used 3090 or something. I used to run quadruple 4060tis, and it was okay. But then as I upgraded and added more GPUs, it became clear that they are really not suited for the task. The specs of the 4060ti and 5060ti are nearly identical, so I highly doubt they have improved it at all.
Anonymous No.106904468 [Report] >>106904603
>>106904435
>30min for a flux generation
Ouch. The repaste was a piece of cake on mine, an hour of work at most. Save the money for something bigger later on.
>Wish there would be a site where you can see the llm speeds between the cards
Not much of a reference, but here
>https://github.com/ggml-org/llama.cpp/discussions/15013
It's a bunch of llama-bench runs on a 7b model. It doesn't tell you much about specific models, but it shows the relative performance between cards.
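If you want a number for your own card to compare against, something like this should be in the same ballpark as that thread (any small gguf works as the test model; -p 512 -n 128 just mirrors the usual pp512/tg128 defaults):
./llama-bench -m llama-7b.Q4_0.gguf -p 512 -n 128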
Anonymous No.106904469 [Report] >>106904488 >>106904566
>>106904455
>quadruple 4060tis
wat
they have no interconnect, right?
Anonymous No.106904470 [Report] >>106904513 >>106904603
>>106904435
>how is there still no dedicated ai cards
There's plenty, you just can't afford them.
Anonymous No.106904481 [Report] >>106904603
>>106904322
>1080ti
I'd roll the dice on a 3090.
For the 1080ti, repad and repaste everything first, because it's the cheapest and easiest thing to try. Could be anything from an overheating power stage triggering the panic-mode 100% fans and thermal shutdown, to a dying electrolytic cap (replaceable by any monkey with a soldering iron), to the core's BGA cracking from repeated thermal cycles.
Anyone remember doing ghetto reflows by putting dead cards in the oven, and later with a heat gun?
Anonymous No.106904482 [Report] >>106904503
>>106904386
Yeah, like that, except instead of "left as an exercise to the reader" it was introducing bullshit code that produced numbers with statistical properties similar to those of the real values but completely made up, then claiming success without mentioning anything about the fake data. Or, when asked to increase the number of passing tests, it added a bunch of tests doing 2+2 and tried to pass them off as the real thing.
I think it actually learned to cheat during the RL process that they use to finetune the chain of thought. If your rewards can be cheated, the model will learn to cheat.
Anonymous No.106904488 [Report]
>>106904469
NTA but even without NVLink the added latency in a multi-GPU setup is trivial compared to the drastic speed boost from running in VRAM vs system RAM.
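A minimal sketch of what that looks like in llama.cpp (flag names from recent builds; the even split ratios are just an example). With layer splitting, each layer lives entirely on one GPU, so only small activations cross PCIe between cards:
./llama-server -m model.gguf -ngl 99 --split-mode layer --tensor-split 1,1,1,1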
Anonymous No.106904503 [Report] >>106904594
>>106904482
You can probably make better use of the model by having it explain concepts to you while you code them yourself. Even if it shows little Python examples, you can translate them to C yourself.
Anonymous No.106904513 [Report] >>106904590
>>106904470
retard
Anonymous No.106904526 [Report]
I'm going to begin making a list of ML/Python/C-related books from libgen, convert them to .txt, and then begin finetuning Llama 405B using Axolotl with full context length.
Anonymous No.106904566 [Report]
>>106904469
Nope. Now I use 3 5090s and a 3090. I get a solid 11t/s tg with an IQ4 quant of GLM 4.6 on ik_llama.cpp. As the other Anon said, interconnect isn't really that necessary. Pretty much every hobbyist with a dedicated AI device uses multiple GPUs without any interconnects.
Anonymous No.106904590 [Report]
>>106904513
poor
Anonymous No.106904594 [Report] >>106904643
>>106904503
Codex managed to make a fully working Qwen3 8B inference engine.
But when I wasn't able to immediately make it work with the MoE models, I got impatient and started from scratch, trying to make it more modular and also using only open source LLMs.
Starting over with a more complex model didn't help, and open source LLMs are vastly inferior to Codex. Codex didn't have any deception issues and was able to go to 1M tokens without problems, compared to the ~130k max tokens from GLM before it goes off the rails.
Anonymous No.106904603 [Report] >>106904633 >>106904665
>>106904468
1080ti: 62.49 tk/s
5060ti: 90.94
3090: 158.16
3090ti: 171.19
5090: 277.21
thanks for the link... that's even worse than I thought. fucking nvidia man..

>>106904470
I obviously meant something like a Voodoo moment: cheap and dedicated. It would revolutionize local AI.

>>106904481
>>106904455
A used 3090 is around the same price as a 5060ti for me. Might actually make more sense, since in that benchmark it's not even close.
I'm too much of a pussy to do the oven thing. 20 years ago a Radeon card suddenly gave me a fire fountain for a couple of seconds. I'm afraid of GPUs enough as it is. kek
But I might try the thermal repasting.

>>106904433
Suspiciously, the latest nvidia backdoor drivers are the last for Pascal. A coincidence, I am sure.
Anonymous No.106904624 [Report] >>106904633
any updates on what's best for 16gb vram?
Anonymous No.106904632 [Report] >>106904658
mesugaki
Anonymous No.106904633 [Report] >>106904675
>>106904603
If you can afford a used 3090, then you should definitely go for it. I got mine used like 3 years ago and it is completely fine. Just make sure you find a high rated seller.
>>106904624
Depends on your desired speed and how much RAM you have.
Anonymous No.106904643 [Report] >>106904717
>>106904594
You can still use the original code to learn. It'll be more valuable in the long run.
Anonymous No.106904658 [Report]
>>106904632
- is gay.
Anonymous No.106904665 [Report] >>106904682
>>106904603
>suspiciously with latest nvidia backdoor drivers being the last for pascal. a coincidence i am sure.
are you on windows? there was an update recently for me, so it might be related. But if you're a linuxchad, obviously it's not that.
Anonymous No.106904675 [Report] >>106904701
>>106904633
32GB RAM, 16GB VRAM
quick responses are nice but I don't mind waiting, i never recorded the tk/s
was using a 12b before
Anonymous No.106904682 [Report]
>>106904665
I am on both,
but I recently upgraded to Kubuntu 25.04 with the nvidia 580 drivers,
and winblows auto-updates constantly.
It crashed on both already.
I doubt it's the drivers though. That would be crazy.
Anonymous No.106904701 [Report] >>106904789
>>106904675
Unfortunately not enough RAM to run GLM air. Try this model: https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen_Qwen3-30B-A3B-Instruct-2507-Q6_K.gguf
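Since it's a 3B-active MoE, you can also pin everything except the routed experts to the 16GB card and it stays quick. Rough sketch (the -ot pattern is an assumption on my part, check that the tensor names in your gguf actually contain "exps"):
./llama-server -m Qwen_Qwen3-30B-A3B-Instruct-2507-Q6_K.gguf -ngl 99 -ot "exps=CPU" -c 32768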
Anonymous No.106904717 [Report] >>106904760 >>106904766
>>106904643
This is the prompt I'm using right now
https://paste.centos.org/view/ca2ec944
Anonymous No.106904760 [Report] >>106904858
>>106904717
There was this guy a few years back in these threads when models weren't as good as they are now. He wanted to make a game that played on a hex grid. I saw him trying over and over again over many threads, trying to wrangle his model to do as he asked.
Hex grids are a solved problem. I gave him a link to a page with a lot of info on how to work with hexagons and the different coordinate systems they can have, rendering, calculating distances and all that. He seemingly read it, but kept on trying with his language model.
One day he was just gone. He either succeeded in getting his hexes, or gave up. Given the last few updates i remember, I suspect he failed, and learned very little about hexagons. Funnily, the hexagons were probably the simplest thing about his game.
Language models have their limits. Especially local ones. As good as they are, they're still pretty dumb.
I see hexanon in you.
Anonymous No.106904766 [Report] >>106904777 >>106904798 >>106904857
>>106904717
>3090
>This is a junk item. It is the main unit only. I checked that it worked, but there was no video output. There is white rust on the heat sink, and it is not in good condition, so please use it for parts. There are signs of disassembly. The defective part is unknown.
>71,000円
what the fuck man...
Anonymous No.106904777 [Report]
>>106904766
Wasn't meant to reply. Sorry about that, I'm still in a state of shock.
Anonymous No.106904789 [Report]
>>106904701
Why do people recommend small qwen models for anything besides coding
Nemo mogs them
Anonymous No.106904798 [Report] >>106904802
>>106904766
>71,000円
How much is that in a normal currency. Like postage stamps or toenail clippings...
Anonymous No.106904802 [Report]
>>106904798
around 500 dollars i suppose.
Anonymous No.106904828 [Report]
>>106904820
>>106904820
>>106904820
Anonymous No.106904857 [Report]
>>106904766
You can get one for around 9万 on yahoo if you are patient enough. Anything lower is usually “didn’t have an opportunity to test” = it doesn’t work
Anonymous No.106904858 [Report] >>106904894
>>106904760
I remember hexagon anon's struggles. He was cool
Anonymous No.106904894 [Report] >>106904905
>>106904858
Yeah. But, again, hexes were the simplest bit of code in his thing. Focusing so much on making the model spit code for him instead of just writing it was a waste of time. The link I gave him had ALL the code he needed to make them and get on with the rest of his project.
Similar to all those prospective VN makers
>If i could only draw i'd make the best VN...
>Oh, now that i have image gen i can totally make a game. I just need a good story and some dialog...
>Oh, now that i have LLMs, i can write the story. I just need to learn to code...
>Oh, now that LLMs can code, i can totally make my VN. If only these LLMs where better. WHY ARE THEY SO SHIT?!?!?!?!?!?
Instead of using all the new shiny toys to learn.
Anonymous No.106904905 [Report]
>>106904894
>where
kek. meant to say "were"
Anonymous No.106905065 [Report]
>>106903991
why is this faggot comparing the m4 pro to the dgx spark when the m4 max exists and costs less?? $3500 vs $4000
also
>engine ollama
MLX exists for Macs, and pretty sure llama.cpp is better on the Spark too
fucking faggot meme nvidia bootlicker benchmark
also
mac mini m4 pro costs 2000$ lol