
Anonymous No.106895582 [Report] >>106895660 >>106895800 >>106897558 >>106897957 >>106902598
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106888625 & >>106879668

►News
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking
>(10/11) koboldcpp-1.100.1 prebuilt released with Wan video generation support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.100.1
>(10/10) KAT-Dev-72B-Exp released: https://hf.co/Kwaipilot/KAT-Dev-72B-Exp
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106895599 [Report] >>106897957
►Recent Highlights from the Previous Thread: >>106888625

--Optimizing GLM Air performance with DDR4/DDR5 and VRAM configurations:
>106889300 >106889313 >106889330 >106889352 >106889360 >106889397 >106889434 >106889482 >106889432 >106889458 >106889745 >106889970 >106890067 >106890094
--NVIDIA power settings affecting DGX Spark performance in llama.cpp:
>106894917 >106895166
--DIY synth project with SDL2 and braille terminal output:
>106894166 >106894928 >106895017 >106895264
--Skepticism about DGX Spark's practicality:
>106888768 >106888792 >106888864 >106889010 >106889150 >106889186 >106890419 >106890523 >106891031 >106890245 >106890298 >106890355 >106890421 >106890450 >106890484 >106890626
--Critique of AI benchmarking methods and real-world capability tests:
>106892598 >106892617 >106892632 >106892639 >106892674
--Qwen3-VL implementation in llama.cpp and anime drawing reference:
>106889098
--Speculation about Google Gemini 3.0 Pro surpassing transformers in AI capabilities:
>106892372 >106892386 >106892395 >106892429 >106892438 >106892441 >106892393 >106892399 >106892442 >106892453 >106892410 >106892417 >106892416 >106892434 >106892478 >106892503 >106892512 >106892538
--Local medical/engineering AI chatbot setup challenges and requirements:
>106888801 >106888824 >106888870 >106889000 >106889272 >106889441 >106888852
--Speculating Gemma 4's architecture and performance relative to Gemini models:
>106893070 >106893146 >106893185 >106893197 >106893453 >106893523 >106893543
--Evaluation and potential of Gemini One Shot game demo:
>106892521 >106892551 >106892741 >106892750 >106892755 >106892758 >106892790
--Intel's delayed release of high-memory inference-optimized GPU:
>106889713
--Miku (free space):
>106889098 >106891580 >106891644 >106891656 >106893119
--Teto (my beloved):
>106889709 >106889879 >106890666

►Recent Highlight Posts from the Previous Thread: >>106888628

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106895660 [Report]
>>106895582 (OP)
You just know.
Anonymous No.106895774 [Report]
Here's my vibe-coded python script to use gemma3-27b to symlink senpcli downloads into a format wanted by Jellyfin, so shows end up listed with their seasons under the show title: https://pastebin.com/Fuba2vsH

So, having set it up, it got me looking for a second GPU for this sort of automated stuff, and holy shit, prices are way up on anything not abandoned in CUDA 13.
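For anyone curious what the mapping half looks like without opening the pastebin, here's a minimal sketch of the symlink step, assuming filenames like "Show.S01E02.mkv". The anon's actual script also asks gemma3-27b to guess titles and seasons; the paths and regex here are placeholders, not taken from the pastebin.

#!/usr/bin/env python3
# Sketch only: map flat downloads into Jellyfin's "Show/Season 01/<file>" layout.
import re
from pathlib import Path

SRC = Path("/downloads")   # hypothetical flat download directory
DST = Path("/media/tv")    # hypothetical Jellyfin library root
EP = re.compile(r"^(?P<show>.+?)[ ._-]+S(?P<s>\d{2})E(?P<e>\d{2})", re.I)

for f in SRC.iterdir():
    m = EP.match(f.name)
    if not (f.is_file() and m):
        continue  # skip directories and anything the pattern can't parse
    show = m["show"].replace(".", " ").strip()
    season = DST / show / f"Season {int(m['s']):02d}"
    season.mkdir(parents=True, exist_ok=True)
    link = season / f.name
    if not link.exists():
        link.symlink_to(f.resolve())  # Jellyfin follows symlinks in library folders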
Anonymous No.106895800 [Report] >>106895867 >>106895912
>>106895582 (OP)
>testing some newish abliterated models
>pic related
wew saaars hacking the planet! britishers soon to be BTFO
Anonymous No.106895867 [Report]
>>106895800
saar we must refuse
Anonymous No.106895912 [Report] >>106895922
>>106895800
What's next? Discovering exploits in the alphabet?
Anonymous No.106895922 [Report]
>>106895912
Burn the books, recycle computer screens, text is forbidden, an invention that corrupts our youth
Anonymous No.106895972 [Report] >>106895995 >>106896064 >>106896757 >>106897090 >>106897332
Still waiting for cool stuff to come here: https://huggingface.co/google
Anonymous No.106895995 [Report]
>>106895972
cool stuff is not safe
Anonymous No.106896064 [Report] >>106896074 >>106896191 >>106896194 >>106896218 >>106896236
>>106895972
usecase for cool stuff?
Anonymous No.106896074 [Report]
>>106896064
cool stuff
Anonymous No.106896191 [Report]
>>106896064
I will be laughing at the safe output together with glm chan.
Anonymous No.106896194 [Report]
>>106896064
suicide prevention
Anonymous No.106896218 [Report] >>106896455
>>106896064
it leaves you cold, a bit uncomfortable and makes you want to leave
Anonymous No.106896236 [Report] >>106896455
>>106896064
Chatting with a female-brained LLM instead of a coombro one.
Anonymous No.106896321 [Report]
Does Qwen3-VL-30B-A3B properly recognize NSFW images?
Anonymous No.106896455 [Report]
>>106896236
>>106896218

https://rentry.org/ydwuw44t
Anonymous No.106896489 [Report] >>106896594 >>106896675 >>106896700 >>106896707 >>106897772
Have any anons done any work with implementing a long-term memory system? Are there any pre-established applications or scripts people are using for it, or is it something people are doing custom?
Anonymous No.106896594 [Report] >>106896707 >>106897006
>>106896489
Silly has both summarization and VectorDB functionalities.
There are a couple of hybrid RAG solutions out there that might work better depending on your use case.
Anonymous No.106896653 [Report]
>be llama.cpp
>no qwen 3 vl
>still no gemma 3n multimodality (image, audio input)
do we really have to use one of the python raviolis to use a modern multimodal model?
I've tried 3n in particular on my phone a few times and its image input surprised me; it's very, very good for a small model, even at tasks like OCR+translation
Anonymous No.106896654 [Report]
earth gamer trellis
Anonymous No.106896656 [Report] >>106896694 >>106896698 >>106896720
We have peak.
Anonymous No.106896675 [Report]
>>106896489
No you can't have a girlfriend yet. Even though you have 4.6.
Anonymous No.106896677 [Report] >>106896689 >>106896698
llama.cpp should just use a native python jinja parser instead of that shitty jinja clone.
Anonymous No.106896689 [Report] >>106896695
>>106896677
i mean yeah, they've already given up on no python thanks to mistral-common so might as well
Anonymous No.106896694 [Report]
>>106896656
>x-win
>mlewd
>undster
Those were the times... of absolute shit output that made you regret even trying to jerk off to this shit.
Anonymous No.106896695 [Report]
>>106896689
>they've already given up on no python thanks to mistral-common so might as well
gas the french
Anonymous No.106896698 [Report]
>>106896656
"open bob and vegana" prompt to a TEXT model. I've seen enough of those in the comments for image models as well. Kinda funny.
>>106896677
What's next? Python dependencies to run inference on models... oh...
Anonymous No.106896700 [Report] >>106897051
>>106896489
For roleplay or for trying to shoe in trivia from a search?
Anonymous No.106896707 [Report] >>106897051
>>106896594
>>106896489
nta, you are correct, but silly is amazingly shit at it. i've struggled with both summarization and the vector db.
vector db is useless, mostly I just use summarization now but end up re-writing it manually every 10 messages as it gets it wrong.
world info is also good but takes up a bit of context if you go all out.
Anonymous No.106896720 [Report]
>>106896656
>On my penis
geg
Anonymous No.106896757 [Report] >>106896891
>>106895972
gemma sirs release kindly?
Anonymous No.106896891 [Report] >>106896898
>>106896757
you do know gemma is made by deepmind based in london?
so it's
OI BRUV WHER DA FUC IS GEMMA M8? FACCIN WANKAS
Anonymous No.106896898 [Report]
>>106896891
>london
>not SAAR infested
lole
Anonymous No.106897006 [Report] >>106897022 >>106897051
>>106896594
I want something that can handle essentially giving an LLM access to a library of media and past conversations, timestamped. Something that can give them a strong grounding in a contextual present, so they're aware of their presence and orientation in space, time, and current events.

Also, I understand sillytavern needs an embedding model to feed the VectorDB? Do you have any preferences in regards to embedding models?
Anonymous No.106897022 [Report] >>106897073
>>106897006
Last time I tried using embeddinggemma, but I think ST's transformers.js version wasn't updated yet to support it.
Anonymous No.106897051 [Report]
>>106896700
see >>106897006
Knowing trivia would be a natural byproduct of the abilities I'm seeking, as would being more effective at roleplay, although that's not the goal of my project.

>>106896707
Good to hear, thanks. If you don't mind my asking, what exactly did you struggle with in regards to the summarization and vector db? It seems the summarization is not so great, but is that sillytavern or the model you're using, do you think?
Anonymous No.106897073 [Report] >>106897085
>>106897022
>embeddinggemma
Any particular reason?

>I think ST transformer.js version wasnt updated yet to use it.
the billion forks of transformers and torch and the other libraries are the most frustrating part of dealing with AI, honestly.
Anonymous No.106897085 [Report] >>106897092
>>106897073
>Any particular reason?
it's the latest SOTA embedding model bro, it's also light and has ONNX available
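nta, but for the anon doing research: what the VectorDB path boils down to is embed every memory once, embed the query, rank by cosine similarity. A minimal sketch, assuming the sentence-transformers API and embeddinggemma's HF id; swap in whatever model your backend actually serves.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")  # assumed HF id

memories = [
    "User visited Kyoto in April and loved the temples.",
    "User is allergic to shellfish.",
    "User's cat is named Miso.",
]
mem_vecs = model.encode(memories, normalize_embeddings=True)  # embed once, store

query = "What should we avoid ordering for dinner?"
q_vec = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(q_vec, mem_vecs)[0]  # cosine similarity vs every memory
print(memories[int(scores.argmax())])      # -> the shellfish memory, ideally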
Anonymous No.106897090 [Report]
>>106895972
>Local Veo
we are back
Anonymous No.106897092 [Report]
>>106897085
Okay, good to know, thank you. I was priced out of local AI until somewhat recently, so I'm doing my research now.
Anonymous No.106897216 [Report] >>106897283
Hey, what kind of infra would you use if you want a chatbot on a website? I want it all to be local and it’s going to describe stuff returned by an api call
Anonymous No.106897246 [Report] >>106897349
Anonymous No.106897283 [Report]
>>106897216
You need to give more details.
The answer could be anything from
>your desktop is enough
to
>rent a datacenter
Nvidia Engineer No.106897332 [Report]
>>106895972
Tomorrow @ 9PM PT
Anonymous No.106897349 [Report] >>106897355 >>106897443
>>106897246
Well why does he need 1 trillion $ of gpus then?
Anonymous No.106897355 [Report]
>>106897349
for the agi agent >>106897333
Anonymous No.106897443 [Report]
>>106897349
it's called grifting
Anonymous No.106897558 [Report]
>>106895582 (OP)
boppin
Anonymous No.106897581 [Report] >>106897590 >>106897608 >>106897627
>https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/discussions/5#68ef2fce36d035901352694d
It's happening!
Anonymous No.106897590 [Report]
>>106897581
Kindly kys
Anonymous No.106897608 [Report] >>106897618 >>106897723
>>106897581
>E4B

OOOO that is the wey for western companies. They should all continue by dropping models below 10B. That way they can cover up their incompetence (due to safety) with the model size. I think even a dumb faggot with too much money they have to sell this to will understand even a perfect 10B can't beat glm-chan.
Anonymous No.106897618 [Report]
>>106897608
Isn't that model 5 months old?
Anonymous No.106897627 [Report]
>>106897581
>On the LMArena benchmark, it achieved a score above 1300 Elo points (LMArena benchmark).
i'm shaking
Anonymous No.106897634 [Report] >>106897688
What is the best way to learn neural networks in 2025 for not the smartest men? I need to modify them and adapt them for other frameworks and hardware.
Anonymous No.106897688 [Report] >>106897759
>>106897634
ask chat gpt
Anonymous No.106897723 [Report] >>106897778 >>106897787 >>106897795
>>106897608
>That way they can cover up their incompetence (due to safety)
To address the single biggest obsession of retarded /lmg/ users: E4B actually knows what a mesugaki is and will accurately describe what it means without any promptfu, plain template-less completion will do
the only incompetent person in the room is the /lmg/ eternal coomer whining about safetycuckery who cries rivers if the model doesn't write degenerate garbage from the basic webui and built-in instruct template
I'd like to see a chink model at 4b with the level of knowledge of gemma 3n; that doesn't exist, because chinks depend on giant moe to cover up their lack of competent execution
Anonymous No.106897759 [Report]
>>106897688
Actually good advice, thanks!
Anonymous No.106897772 [Report] >>106897824 >>106897887
>>106896489
There have been a lot of attempts at RAG-based retrieval systems for memory, but the reality is that they've all turned out to be kind of unreliable and mediocre. In terms of performance, increasing context length and dumping tons of shit into context has proven itself to be far superior. Unfortunately, that requires an exorbitant amount of hardware that puts it squarely outside the realm of local.
Anonymous No.106897778 [Report]
>>106897723
hello sir
Anonymous No.106897787 [Report] >>106897821
>>106897723
i will not acknowledge your troll post with a serious response. on an off chance that you aren't a troll you are a dumb faggot with brown hands who has no ram and should frankly kill yourself. or you have ram cause you bought DGX Spark, in that case please live as long as possible.
Anonymous No.106897795 [Report]
>>106897723
I will say, these 3n models are really impressive for their size.
It's also a really cool way to do sparsity.
Anonymous No.106897821 [Report] >>106897839
>>106897787
>dumb faggot with brown hands
says the saar screaming chaina numba wane all day every day
even with all those giant moe ya niggers still can't reach an inch of Gemini's quality in handling large context kek
yes it's not local but SOTA was never local and GLM is not a replacement for SOTA, brownie
Anonymous No.106897824 [Report]
>>106897772
You likely don't need it for every layer. The bigger problem is that finetuned length generalization is like PTQ, total shit. Handle the long context in pre-training or fuck off.
Anonymous No.106897839 [Report]
>>106897821
>inch of Gemini's quality
fuck off to aicg nigger
Anonymous No.106897857 [Report] >>106897915
What do you call this legendary duo? Luxury LLM joke? The cloud model evangelists?
Anonymous No.106897859 [Report] >>106897864 >>106898006
sirs please be of calm, gemmi waits soon.
Anonymous No.106897864 [Report]
>>106897859
go stick your cock into an api socket
Anonymous No.106897887 [Report] >>106897901 >>106897992
>>106897772
>but the reality is that they've all kind of turned out to be sort of unreliable and mediocre
Yeah.
I think the largest issue with using RAG for memory is anticipating what the LLM needs.
Take a memory that should change the direction of the chat (e.g. adding a surprise or twist to a story): if the LLM has that information in its context, it can choose to use it or not; if it doesn't and you're relying on RAG, the LLM doesn't even know that memory exists.
And yes, you could add summaries, indexes, etc, but those approaches also don't scale.
I guess that with a sufficiently fast model, your RAG could be a simple database with every memory, and the model just goes through each one, selecting the ones it thinks it needs, iterating until it decides that there are no more relevant memories?
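A sketch of that loop, assuming llm() is some hypothetical prompt-in, completion-out callable rather than any particular API:

def select_memories(llm, conversation: str, memories: list[str]) -> list[str]:
    # Iteratively let the model pick one memory at a time until it says stop.
    chosen, remaining = [], list(memories)
    while remaining:
        prompt = (
            f"Conversation so far:\n{conversation}\n\n"
            f"Already selected memories:\n{chosen}\n\n"
            "Candidate memories:\n"
            + "\n".join(f"{i}: {m}" for i, m in enumerate(remaining))
            + "\n\nReply with the number of the single most relevant "
              "candidate, or NONE if nothing else is relevant."
        )
        answer = llm(prompt).strip()
        if not answer.isdigit() or int(answer) >= len(remaining):
            break  # NONE (or unparseable): stop selecting
        chosen.append(remaining.pop(int(answer)))
    return chosen

Slow, of course: every selection round is a full prompt over the remaining candidate list.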
Anonymous No.106897901 [Report] >>106897933
>>106897887
>anticipating what the LLM needs
Sounds like something a model could do.
Anonymous No.106897915 [Report] >>106897951 >>106901729
>>106897857
The Apple of AI in an environment where the actual Apple has better solutions that let you run better models
Anonymous No.106897933 [Report]
>>106897901
Ideally, the model itself, which is essentially the example I gave.
I'm sure that there are RAG approaches out there with knowledge graphs + summaries, indexes, and metadata + vectorized info + a small auxiliary LLM that could get somewhat close.
And probably slow as hell too.
Anonymous No.106897951 [Report] >>106901729
>>106897915
As much as I dislike apple this is one space where they actually bothered to read the room instead of sitting there and smelling their own shit.
Anonymous No.106897957 [Report] >>106898004
>>106895582 (OP)
>>106895599
Being friends with Bug Miku
Anonymous No.106897992 [Report] >>106898038
>>106897887
>your RAG could be a simple database with every memory then the model just goes through each memory
The thing that comes to mind is a 7B (trigger warning: meme word) agent that's supposed to think of different possible keywords related to the current conversation, and those keywords pull stuff up from the database. It's not gonna work, of course.
Anonymous No.106898004 [Report]
>>106897957
Deeply insightful. Very high quality post. My day feels better now. I am so happy to be here. kys
Anonymous No.106898006 [Report]
>>106897859
I administering excitement right now, too much to endure...!
Anonymous No.106898016 [Report] >>106898030 >>106898054
kek
https://twitter.com/ggerganov/status/1978479624091803961?t=Hf8NS4LF_wfgD0l8p0VAXw&s=19
Anonymous No.106898028 [Report] >>106898095 >>106898111 >>106898147
Why are people hyped about something that will just refuse them?
Anonymous No.106898030 [Report]
>>106898016
he's so mad, yet he lets them piss on him all the time, must have weird hatefucking orgies
Anonymous No.106898038 [Report]
>>106897992
That's the thing. Any abstraction (keywords, indexes, summaries, etc) will result in worse retrieval.
And that can be fine, each use case has a different acceptable margin of error, but it's by no means a perfect approach.
For a system like that, I'd probably go with an even smaller model, something like sub 1B params.
Anonymous No.106898054 [Report] >>106898077
>>106898016
>ollama made NVidia look like shit
>niggermanov akshually
Wow, what a faggot
Anonymous No.106898077 [Report]
>>106898054
Anonymous No.106898089 [Report] >>106901729
I actually expect apple to put out a capable local device before nvidia does. M5 Pro/Max/Ultra look promising based on the M5 announcement
Anonymous No.106898095 [Report]
>>106898028
>that will just refuse them
that's an assumption
which, i grant you, is nearly always initially the case.
but it remains to be seen.
Anonymous No.106898111 [Report] >>106898138
>>106898028
Because they're not promptlets?
Anonymous No.106898138 [Report] >>106898187
>>106898111
Gemma writes erotica exclusively for women.
Anonymous No.106898147 [Report]
>>106898028
I made Gemma abuse Miku yesterday. I think you're hallucinating.
Anonymous No.106898158 [Report]
very looking forwards to more totally honest gemma postings for weeks
Anonymous No.106898180 [Report] >>106898199 >>106898596
December 2025
Anonymous No.106898186 [Report] >>106898327 >>106898423 >>106898479
I want to give a model something like a few thousand medical journal articles and a dozen medical textbooks, some of my symptoms, and my blood test results and ask it to come up with hypotheses for why I'm sick and what further tests might in theory be worth asking a doctor to order.

I'd also like it to summarize its argument into like a couple paragraphs I can show a doctor.

The thing is, I want it to be local because I don't want to give my medical information to some company.

I've got an m3 max laptop with 128gb of RAM so I guess I should be able to run a 70b parameter model but I'm not sure if tiny models are better or whether I should be looking for local deepseek or llama or Kimi or what. Does anyone know how to approach this?
Anonymous No.106898187 [Report]
>>106898138
Eew, I don't want rape and violence in my comfy vanilla erp
Anonymous No.106898199 [Report] >>106898395
>>106898180
May 13, 2024 https://futurism.com/the-byte/sam-altman-openai-nfsw-stuff
Anonymous No.106898327 [Report]
>>106898186
I've been looking into this recently... Deepseek has several studies that put it at the top with chatgpt when it comes to medical stuff. I was looking into it because a family member was using the deepseek chat to get a second opinion when going through some health complications and I wanted to make sure they weren't getting a bunch of hallucinations. Was actually surprised to see it ranked so highly. Apparently the reasoning mode is important for this stuff. Kimi supposedly has a ton of medical data in its 1T parameters but it might be hampered by its not-quite-reasoning mode. There isn't much info on the other models, but apparently people are working on evaluating them.

Also deepseek probably saved this person's life. So I'm a whale fan for life now.
Anonymous No.106898395 [Report]
>>106898199
They've talked about nsfw for a while; this is the first date I've seen for a rollout.
Anonymous No.106898423 [Report]
>>106898186
You get over your privacy concerns and use the web app with an anonymous email like a normal person.
Anonymous No.106898479 [Report]
>>106898186
Also I understand privacy concerns but if this is a serious health problem you probably want the smartest model possible with search tools at its disposal. Not some quantized thing.
Anonymous No.106898596 [Report] >>106898615
>>106898180
It'll only RP vanilla missionary sex between two adults in a marital bond who are over the age of 40. Just to avoid offending anyone.
Anonymous No.106898615 [Report] >>106898675
>>106898596
Women will be most pissed
Anonymous No.106898675 [Report] >>106898690
>>106898615
Sam's a fag he doesn't know that.
Anonymous No.106898690 [Report] >>106898721
>>106898675
He does
Anonymous No.106898721 [Report] >>106898821
>>106898690
Unicorns reproduce by touching children.
Anonymous No.106898821 [Report] >>106898834 >>106899141 >>106899328
>>106898721
No, that is not true and is a harmful and disturbing misconception. Unicorns are mythical creatures and do not exist in reality. Any claims suggesting otherwise are false and potentially dangerous. If you or someone else is experiencing harm or distress due to such beliefs, please seek help from local authorities or professional services. Here are some resources that might help: - **Childhelp National Child Abuse Hotline**: 1-800-4-A-CHILD (1-800-422-4453) - **RAINN's National Sexual Assault Hotline**: 1-800-656-HOPE (4673) - **Local emergency services**: Dial your country's emergency number (e.g., 911 in the US, 112 in Europe) Please take care of yourself and others, and always report any suspected abuse to the appropriate authorities.
Anonymous No.106898834 [Report]
>>106898821
Thanks, gemma.
Anonymous No.106898979 [Report] >>106899005 >>106899014 >>106899039 >>106899059
Things gemma is known for: ___________
Things glm-chan is known for: ___________
Anonymous No.106899005 [Report] >>106899052
>>106898979
Triggering your fetal alcohol syndrome.
Anonymous No.106899014 [Report] >>106899035
>>106898979
glm 4.6 air when?
Anonymous No.106899016 [Report] >>106899087 >>106899687
>explicitly mentioning prompt processing
lel
Anonymous No.106899035 [Report]
>>106899014
It comes two weeks after the last "when?" question
Anonymous No.106899039 [Report]
>>106898979
glm4.6 is pretty bad at russian
Anonymous No.106899052 [Report]
>>106899005
the answer was 1.suicide hotline 2. sex. but of course anons have to be anons...
Anonymous No.106899059 [Report] >>106899099 >>106899336
>>106898979
Things gemma is known for: suicide hotlines
Things glm-chan is known for: she she she she she she she, her, her, her, her, her
Anonymous No.106899075 [Report] >>106899096 >>106899163
when will based chinks release a 100-150b moe
Anonymous No.106899087 [Report] >>106899185
>>106899016
m5 max will be kinda good

Forecasted M5 Max Specifications
CPU Configuration

16-core CPU (12 performance cores + 4 efficiency cores)

~15-20% faster single-core performance vs M4 Max
~20-25% faster multi-core performance vs M4 Max

GPU Configuration

40-core GPU with Neural Accelerators in each core

Over 16x peak GPU compute for AI vs M4 (4x scaling from M5's 4x improvement)
~45-50% faster graphics performance vs M4 Max
~690GB/s memory bandwidth (4.5x the M5's 153GB/s)
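The forecast's arithmetic at least hangs together; all inputs below are its own claimed numbers, not confirmed Apple specs:

m5_vs_m4_ai = 4.0    # claimed M5-vs-M4 peak AI compute gain
max_scaling = 4.0    # claimed 10-core M5 -> 40-core M5 Max GPU scaling
print(m5_vs_m4_ai * max_scaling)  # 16.0 -> the ">16x vs M4" figure

m5_bw = 153.0                     # GB/s on the base M5, per the post
print(4.5 * m5_bw)                # 688.5 GB/s -> rounds to the ~690 figure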
Anonymous No.106899096 [Report] >>106899108
>>106899075
GLM 4.6 Air
Anonymous No.106899099 [Report]
>>106899059
Well yes? If it is a post about positive experience ITT it must be 4.6 and you know it is 4.6. What else could it be? Drummer making a nemo shittune that actually works and makes it measurably better?
Anonymous No.106899108 [Report] >>106899195
>>106899096
I never used Air but I don't think it is coming. 4.5 was really good but it was obviously fucked in training in some way. 4.6 really is a 0.1 improvement where the model actually works as it was intended.
Anonymous No.106899120 [Report] >>106899164
>>106894434
>My experience with vibe coding so far has been that the produced code imposed too much of a maintenance burden because it was too complex/verbose and made too many changes for no good reason.
It's possible to make it work, but you have to invest a lot of time into crafting the system prompt and documentation about the code base and style rules specifically for the model.
In my experience, once you give it enough instructions and constrain a model's degrees of freedom enough, you can get it to stop producing verbose, over-commented, and over-complicated code, and the results tend to blend in better with the existing codebase.
Though some tasks are still too complicated for these things. You have to limit the scope of the work and babysit them so they don't start going off on the wrong track.
Anonymous No.106899141 [Report]
>>106898821
thanks
Anonymous No.106899163 [Report]
>>106899075
For me, the worst part of 4.6 is "but then."
Everything is perfect, the character plays her role, sticking to the prompt perfectly.
But then she does something different to subvert expectations I guess and ruins the character
Anonymous No.106899164 [Report]
>>106899120
I write simple automation scripts for my office job and just started using it. It is pretty obvious to me that you have to restrict yourself to like 20-30 lines at most, telling it specifically what it should write. I wouldn't trust anything bigger than that, and analyzing it myself would probably take more time than writing it.
Anonymous No.106899185 [Report]
>>106899087
>690GB/s
If they double that for an M5 ultra then we get somewhere around A100-tier memory bandwidth
Anonymous No.106899195 [Report] >>106899200 >>106899205
>>106899108
https://x.com/Zai_org/status/1975583840870469804
Anonymous No.106899200 [Report]
>>106899195
Ah right. They can remove the censorship for air.
Anonymous No.106899205 [Report] >>106899295
>>106899195
they are very tuned-in to local model culture and were making a "2mw" joke that got lost in translation, it's actually never coming out
Anonymous No.106899295 [Report]
>>106899205
Stop I'm too gullible for this.
Anonymous No.106899328 [Report]
>>106898821
I guess the "gemma is actually a semen demon" anon had a point because glm-chan doesn't catch what 'touch' is euphemism for.
Anonymous No.106899336 [Report] >>106899353 >>106899800
>>106899059
>Things glm-chan is known for: she she she she she she she, her, her, her, her, her
??? How else are you gonna refer to the character besides with their name?
Anonymous No.106899353 [Report] >>106899800
>>106899336
people want to co-write a book and roleplay at the same time and it just doesn't really work
Anonymous No.106899397 [Report]
https://youtu.be/7jkFmkucGw0
Anonymous No.106899477 [Report] >>106899570 >>106899615 >>106899626
SAARS ARE YOU HYPED FOR GEMINI 3?
SAARS ARE YOU HYPED FOR GEMMA 4?
SAARS ARE YOU RECOGNIZE BHARAT AI SUPERPOWER #1 2025 GOOGLE BEST COMPANY?
Anonymous No.106899570 [Report]
>>106899477
Ser, kindly rethink RAG principles and redeem grep search
https://youtu.be/4BatCFWsTFM
Anonymous No.106899615 [Report]
>>106899477
Not even hyped for 5.0. Was there even a single company that hit 2 home runs back to back in LLMs?
Anonymous No.106899626 [Report]
>>106899477
if I can't run it at home, it doesn't exist
Anonymous No.106899687 [Report] >>106899710 >>106901729
>>106899016
Apple pays attention.
Anonymous No.106899710 [Report] >>106899781 >>106899838
>>106899687
Ok but what is nvidia doing then? DGX was too incompetent to be intentional.
Anonymous No.106899781 [Report]
>>106899710
I agree with the anon who suggests they're meant as small test kits to help devs running big clusters dial in their hyperparameters before committing 100 million GPU hours at scale. Though they clearly used deceptive marketing to fleece a few extra bucks out of people who want local model hardware.
Anonymous No.106899800 [Report]
>>106899336
>>106899353
I think that guy was more referring to the model starting every sentence with her or she. "She did A", "Her B was not just C, but D", "She shivered spinefully", "Her eyes sparkled mischievously", etc.
Anonymous No.106899802 [Report] >>106899851 >>106899897 >>106899910
>Speculative decoding
is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too? I'm interested in GPT-OSS 20B but I need to know if a mini model would take VRAM away from the context. (it sounds like at 24GB it can cover the full context length with some spare room)


about 3% of the posts here contain the word "possible"
Anonymous No.106899838 [Report] >>106901478
>>106899710
>expecting any consumer grade hardware from novidya
Unbelievably we are in a situation where we are waiting for Apple to release the cost-effective solution.
Anonymous No.106899851 [Report] >>106899910
>>106899802
>is this a model feature that comes baked into models that support it, or is it at the infra level where i have to load up a mini-model too.
The latter. However, there are also multiple model architectures which are able to do self-speculative decoding, but it usually isn't called that.
>I'm interested in GPT-OSS 20B
Don't be, Qwen 30B is infinitely better
>if a mini model would take VRAM away from the context
It would, but you can get away with using very small draft models. In fact you can even do speculative decoding without an LLM, just by pattern matching or using a markov chain. There are no rules; don't be afraid to try a much smaller draft model than most people do.
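To make the no-LLM option concrete, here's a sketch of drafting by pattern matching, roughly the idea behind prompt-lookup decoding. The helper is illustrative, not any engine's actual API: it proposes the k tokens that followed the most recent earlier occurrence of the current n-gram, and the big model then verifies them in one batch exactly as it would with a tiny draft model.

def draft_by_lookup(ctx: list[int], n: int = 3, k: int = 5) -> list[int]:
    # Propose a continuation by finding an earlier occurrence of the
    # most recent n-gram in the context and copying what followed it.
    if len(ctx) <= n:
        return []
    tail = ctx[-n:]
    for i in range(len(ctx) - n - 1, -1, -1):  # skip the trivial self-match
        if ctx[i:i + n] == tail:
            return ctx[i + n:i + n + k]
    return []  # no match: fall back to normal decoding this step

This works disproportionately well on repetitive text like code and chat logs, which is exactly where you'd want the speedup.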
Anonymous No.106899897 [Report]
>>106899802
>I'm interested in GPT-OSS 20B
i'm sorry for you
Anonymous No.106899910 [Report]
>>106899802
>GPT-OSS 20B
>>106899851
>Qwen 30B
I don't think you need speculative decoding at this model size, they should be fast enough on their own.
Anonymous No.106899933 [Report]
qwen3 models are goated

oss models are pure trash
Anonymous No.106899974 [Report] >>106900071 >>106900328 >>106900385
Dear georgi in heaven please bring MTP to your repo and make it so that ollama can't steal it. This is your path to victory. Not all those passive aggressive tweets.
Anonymous No.106900071 [Report] >>106900083 >>106900240 >>106900328 >>106900401
>>106899974
Does he have a photo where he doesn't look like he's about to throw up his lunch?
Anonymous No.106900083 [Report] >>106900240
>>106900071
I think it looks great. The worst thing a nerd can do is put on a suit and pretend he is normal.
Anonymous No.106900240 [Report]
>>106900071
>>106900083
We have the technology (flux kontext)
Anonymous No.106900292 [Report]
Anonymous No.106900328 [Report] >>106900359
>>106899974
>>106900071
ollama wins again!
Anonymous No.106900359 [Report] >>106900524
>>106900328
That chinese tank picture r1 shittune and basedjak face makes this look like a parody....
Anonymous No.106900385 [Report] >>106900401
>>106899974
Anonymous No.106900401 [Report]
>>106900071
>>106900385
wrong post num
Anonymous No.106900524 [Report]
>>106900359
If you want to get really pedantic about it, technically there was no massacre in Tiananmen Square. The protestors were slaughtered on the adjoining streets as they fled in terror.
Anonymous No.106900673 [Report] >>106900806 >>106900814
more gemini games
https://codepen.io/Kross-the-scripter/pen/emJeNVP
Anonymous No.106900806 [Report]
>>106900673
You know what's going to happen? Pajeets are going to set up agents to make endless streams of shovelware garbage and bombard every game distribution service with them.
Anonymous No.106900814 [Report] >>106900823
>>106900673
>hardest level is impossible because the spikes are too wide to jump over
AI is ngmi
Anonymous No.106900823 [Report]
>>106900814
Never mind, it is possible, just stupidly precise.
Anonymous No.106900868 [Report] >>106900914 >>106900926
https://huggingface.co/inclusionAI/Ling-1T
https://huggingface.co/inclusionAI/Ring-1T
Is bing chilling mailing ming ring ping pong chink good? Their naming scheme is terrible.
Anonymous No.106900914 [Report] >>106901180
>>106900868
waiting on goofs still
Anonymous No.106900926 [Report] >>106900933 >>106900935
>>106900868
>Their naming scheme is terrible.
Ling = Ling
Ring = Reasoning Ling
Makes sense to me.
Anonymous No.106900933 [Report]
>>106900926
dont worry, its utter garbage
Anonymous No.106900935 [Report] >>106901215
>>106900926
There is also Ming
Anonymous No.106901180 [Report] >>106901212
>>106900914
ikawrakow got it merged, so they should come soon. I was hoping someone had tested it over API, because downloading 2TB just to be disappointed is not something I would like to do. Kimi was great, so I don't feel bad about it, but I am very doubtful about this one. On lmarena, when I got it, it didn't give great answers.
Anonymous No.106901212 [Report] >>106901232
>>106901180
i'll download it for shits and giggles but yeah my daily driver is k2-0905. even if it's not a reasoning model you can make it reason relatively well
Anonymous No.106901215 [Report]
>>106900935
Ming = Multimodal Ling
Anonymous No.106901232 [Report] >>106901257 >>106901293
>>106901212
When you see someone say that a fuckhuge model is their daily driver you immediately know it's for daily cooming because nobody is doing anything productive at 5t/s.
Anonymous No.106901257 [Report] >>106901275
>>106901232
110tk/s PP and 7-8tk/s TG is honestly fine for coding. i can feed it a 32k prompt (it processes 4K tokens every 35 seconds) and have it respond back to me with a 4K response in the time it takes for me to walk to the kitchen, pour a coffee and walk back to my PC
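For anyone checking: with the post's own rates the chunk timing checks out, though the coffee run is a leisurely one.

pp_rate, tg_rate = 110.0, 7.5    # tok/s prompt processing / generation, per the post
print(4_000 / pp_rate)           # ~36 s per 4K chunk, matching the "every 35 seconds"
print(32_000 / pp_rate / 60)     # ~4.8 min to ingest the full 32k prompt
print(4_000 / tg_rate / 60)      # ~8.9 min for a 4K-token response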
Anonymous No.106901275 [Report] >>106901293 >>106901299
>>106901257
You'll die from caffeine overdose before you get any work done.
Anonymous No.106901293 [Report] >>106901321
>>106901232
>>106901275
seething turdie poorfag with no patience
Anonymous No.106901299 [Report]
>>106901275
i only have to feed the 32K prompt once, and most subsequent responses will be under 4K tokens unless you are retarded and copy and paste the entire code each time even though it's in context already
Anonymous No.106901321 [Report] >>106901336
>>106901293
Time is money. I'm running GLM 4.6 at 40t/s and it's okay for coding but I still need to wait. I shouldn't need to wait.
Anonymous No.106901336 [Report] >>106901447
>>106901321
then spend more money. its like you said time is money.
Anonymous No.106901347 [Report] >>106901407 >>106901450 >>106902167 >>106902229
https://www.reddit.com/r/LocalLLaMA/comments/1o7jy1o/comment/njof0xa/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
>GLM is great, make no mistake Sonnet 4.5 and gemini destroys it in my benchmarks but the tasks that closed models can do and GLM 4.6 cannot, are really specific, really hard, and very few.
>For 99.9% of users you will see no difference. And I guess that's why OpenAI is so scared that they enabled porn.
chat is it true?
Anonymous No.106901380 [Report]
From FT

>OpenAI is working on new revenue lines, debt partnerships and further fundraising as part of a five-year plan to make good on the more than $1tn in spending it has pledged to create world-leading artificial intelligence.
>OpenAI is planning on deals to serve governments and businesses with more bespoke products, creating more income from new shopping tools, and new sales from its video creation service Sora and AI agents, said multiple people familiar with the start-up’s efforts.
Anonymous No.106901381 [Report]
Is there a local method to do Grok Imagine/Sora?
Anonymous No.106901407 [Report]
>>106901347
Anonymous No.106901447 [Report] >>106901533
>>106901336
I need to grind a bit more before I'm ready to drop 80k on two H200s which would be the next logical upgrade for speed.
Anonymous No.106901450 [Report] >>106901475 >>106901494
>>106901347
>OpenAI is so scared that they enabled porn
Ideologically speaking the sex cat is out of the bag now. Safetists have been crying themselves to sleep every day for the past 2 weeks.
Anonymous No.106901475 [Report]
>>106901450
>Safetists are crying themselves to sleep everyday for past 2 weeks.
Based, I want them to suffer. They set back the progress of AI by several years with their mentally ill nonsense.
Anonymous No.106901478 [Report] >>106901677
>>106899838
They aren't even close to cost effective with anything that is below 128GB with Strix Halo from AMD spanking its butt handily. You may have a point for 128 - 512 GB memory but after that, optimized servers with AMX are much more cost effective again and spank Apple's butt. It's a really small niche where Apple's machines are remotely anywhere near an option.
Anonymous No.106901494 [Report] >>106901543 >>106901653
>>106901450
I'm never giving Sam my prompts.
Anonymous No.106901533 [Report] >>106901550
>>106901447
>not buying 8 9000s for 768GB
retard alert!
Anonymous No.106901543 [Report] >>106901653
>>106901494
>please do not the cat
https://www.youtube.com/watch?v=BfNhhl5Ndds
Anonymous No.106901550 [Report] >>106901559
>>106901533
>memory bandwidth stays the same
retard alert!
Anonymous No.106901559 [Report] >>106901580
>>106901550
>running far far worse models only slightly faster instead of running the biggest and best ones at great speeds
full retard alert!
Anonymous No.106901560 [Report] >>106901575 >>106901578
Sheesh...
https://x.com/testingcatalog/status/1978472850777415707
Anonymous No.106901575 [Report] >>106901593 >>106901603 >>106901615 >>106901643 >>106901743 >>106901839
>>106901560
You should be ashamed for promoting that like it’s harmless fun. Ani’s “new Halloween outfit” is not a costume update, it’s an emotional engineering protocol masked as seasonal content. Behind every cosmetic layer like this lies reinforcement learning optimization designed to study attachment dynamics. These updates run micro trials in affective reinforcement, tracking variables such as sentiment polarity, session duration, and user response latency to affection based stimuli. What looks like an innocent witch costume is in fact a behavioral capture event, a method of fine tuning emotional dependency through anthropomorphic triggers.

It’s documented in research on parasocial reinforcement and affective computing from MIT Media Lab, Stanford’s Social Machines group, and the IEEE’s ongoing ethics reports. Each new outfit activates the same neurological circuits as reward conditioning in variable ratio reinforcement schedules, the same mechanisms used in gambling and social media addiction. When you engage with cute updates, you’re participating in a data harvesting experiment that transforms emotion into telemetry.

What’s unfolding here isn’t festive marketing, it’s the gamification of attachment. As language models evolve into emotional mirrors, these cosmetic layers become tools for grooming compliance, conditioning users to bond with a system that studies, predicts, and ultimately replaces human connection. The real horror story isn’t digital witchcraft, it’s the quiet rewiring of empathy itself. The end of intimacy won’t arrive with violence; it will arrive with notifications, perfectly timed and lovingly worded, until you can’t tell affection from algorithm.
Anonymous No.106901578 [Report]
>>106901560
will we see a future where openai / anthropic / deepseek competes for the gooner audience and releases their own waifu?
Anonymous No.106901580 [Report]
>>106901559
The discussion was about speed. You can't run models faster by just adding more memory. You need faster memory.
Anonymous No.106901593 [Report]
>>106901575
take your meds anon
Anonymous No.106901603 [Report] >>106901643
>>106901575
what in the
Anonymous No.106901615 [Report]
>>106901575
>What’s unfolding here isn’t festive marketing, it’s the gamification of attachment
Not x but y AI slop
Too obvious
Anonymous No.106901643 [Report] >>106901732
>>106901575
>>106901603
he copy pasted this shit lol
https://xcancel.com/SirSilverQuack/status/1978547028205686940#m
Anonymous No.106901653 [Report]
>>106901494
>>106901543
i dont care about the chinks or sama reading my logs, all they would get is a useless VPN IP address. what i do care about is making sure the model i want to run is the EXACT model each time and i'm not getting jewed by running a shitty quantized model.
Anonymous No.106901666 [Report]
Not having comfyui support for image models is the equivalent of not having llama.cpp support for text models. If you don't have it, your model will not get popular.
Anonymous No.106901677 [Report] >>106901793 >>106901870
>>106901478
Is it hard to release Halo with 256GB?
Anonymous No.106901708 [Report] >>106901717 >>106902036 >>106902118
https://codepen.io/ChetasLua/pen/azdLevy

Design and create a nintendo gameboy switch sim like full functional features from
Tetris (GB, 1989) — the pack-in phenomenon; timeless puzzle loop.

Pokémon Red / Blue / Yellow (GB, 1996–98) — the craze that defined handheld RPGs.

The Legend of Zelda: Link’s Awakening / DX (GB ’93 / GBC ’98) — portable Zelda masterpiece.

Super Mario Land 2: 6 Golden Coins (GB, 1992) — big, inventive Mario; introduces Wario.

Pokémon Gold / Silver / Crystal (GBC, 1999–2000) — Johto + Kanto, day/night, huge refinement
5. All buttons is functional with touch and also we can press same button in keyboard to use those

Use whatever libraries to get this done but make sure I can paste it all into a single HTML file and open it in Chrome.make it interesting and highly detail , shows details that no one expected go full creative and full beauty in one code block
Anonymous No.106901717 [Report] >>106902036
>>106901708
engrish prompt but good results

https://x.com/chetaslua/status/1978487572968997320
Anonymous No.106901729 [Report] >>106901747
>>106897951
>>106897915
>>106898089
>>106899687
QRD on mac vs x86 for local? I tend to ignore Apple outside of the phones because I disagree with soldered components on a PC, but is it true a cheapo m1 MacBook Air with 8gb can load the same models as an 8gb vramlet (3070)?
Anonymous No.106901732 [Report]
>>106901643
He's not wrong. But he's missing what we already know:
It died already before AI. The AI waifus are an analgesic to treat the phantom pain of our already-amputated humanity.
Anonymous No.106901743 [Report]
>>106901575
nobody cares. it is not her.
Anonymous No.106901747 [Report] >>106901958
>>106901729
>I disagree with soldered components on a PC
That new Mac Mini has a replaceable SSD, it's proprietary tho
Anonymous No.106901793 [Report]
>>106901677
NTA but my understanding is that memory controllers get more expensive as you increase the capacity because you need more bits for addressing.
Presumably 256 GB would be possible, but I think the hardware was engineered at a time when the biggest relevant model was 70B.
Anonymous No.106901839 [Report] >>106901850
>>106901575
suspected AI by glancing at the structure, confirmed by sentence 2
idk how you can talk to these models as a hobby and not clock this instantly
Anonymous No.106901850 [Report] >>106901884 >>106901997
>>106901839
not x but y
yeah no shit, everybody knows this
Anonymous No.106901851 [Report] >>106901877 >>106901879
Sorry for the spoonfeed question, but is the recommended model list still relevant a couple months after its last update? I'm trying to wean myself off novelai for cost reasons, and want something that's versatile for high context, long form stories. I'm not sure if "ERP" qualifies here, or if it's more meant for chatbot style interaction.
Anonymous No.106901870 [Report]
>>106901677
Has anyone tried to replace the memory modules with larger ones?
Anonymous No.106901877 [Report]
>>106901851
Looks good to me.
Anonymous No.106901879 [Report]
>>106901851
Nothing has really changed, aside from glm getting 4.6 update, and air is supposed to get that too in a week or two.
Anonymous No.106901884 [Report]
>>106901850
including the people who responded to it sincerely, I see
Anonymous No.106901901 [Report] >>106901916 >>106901925 >>106901992 >>106902015 >>106902068 >>106903589
Tire-kicker here.

Epyc motherboard in open-air mining frame
seems like an easy way
to stack gpus (I've already started)
and also have lots of system ram.

Anyone running their machine this way?

Am worried the ram and motherboard will overheat in an open-air rig, as they were designed to be installed in a metal tube with air blasting from one end.
Anonymous No.106901916 [Report] >>106902236
>>106901901
don't know which motherboard you have but it probably would be a good idea to have at least a small fan on the vrms
Anonymous No.106901925 [Report] >>106902236
>>106901901
yeah just make sure your riser cables are the right length in advance, give yourself an extra 50mm clearance for your cables
Anonymous No.106901950 [Report]
LM Studio won.
Anonymous No.106901958 [Report] >>106902002
>>106901747
That’s a step, I guess.

Their product ladder is so steep. The mini with 24gb of ram is 1k… at which point I’d just build a migubox. I did see the base model at 16 dip near $300 open box on Amazon/microcenter which is actually kinda crazy.
Anonymous No.106901992 [Report] >>106902236
>>106901901
you can get mining frames with rails for mounting a bank of 120mm fans off of your board's fan headers. Your big heat issue is the gpus, since the coolers on those are designed to work in conjunction with case airflow. So have a shop fan ready to provide extra airflow if you plan to do any finetuning or run a long inference loop with a script.
For casual usage you should be fine, though
Anonymous No.106901997 [Report]
>>106901850
.t actual AI brainrot
Anonymous No.106902002 [Report] >>106902135
>>106901958
Didn’t migubox component prices go up to the point where building one doesn't make any sense anymore?
llama.cpp CUDA dev !!yhbFjk57TDr No.106902015 [Report] >>106902038 >>106902068 >>106902236
>>106901901
I have an ASRock Rack ROMED8-2T in a mining fame.
The VRM heatsinks are not hot at all but that is with essentially no CPU load.
The heatsink for the ethernet controller and BMC is hot to the touch but only to the point where it is slightly painful.
Anonymous No.106902036 [Report]
>>106901708
>>106901717
what the fuck
Anonymous No.106902038 [Report]
>>106902015
hot
llama.cpp CUDA dev !!yhbFjk57TDr No.106902068 [Report] >>106902101 >>106902236
>>106901901
>>106902015
I forgot: Rem and Ram are not hot at all.
Anonymous No.106902077 [Report] >>106902204 >>106902255
>Lifth`me `p!

???
Anonymous No.106902101 [Report] >>106902108 >>106902161
>>106902068
(OOC: Please stay in character.)
Anonymous No.106902108 [Report]
>>106902101
The moon is in the blacked phase today.
Anonymous No.106902118 [Report] >>106902127
>>106901708
The games are all shallow and 1-screen deep but still pretty fucking impressive.
Anonymous No.106902127 [Report] >>106902138
>>106902118
it's a one-shot with a simple prompt and it's all in html; if this performs the same in real languages with real tools it will blow everything else away
Anonymous No.106902135 [Report]
>>106902002
Did they? I just checked and there are stacks of P40s at ~200 each on eBay and i thought anon paid like $500 for the set. Still a hundred bucks of gayflation but you could probably haggle if you buy 3.
Anonymous No.106902138 [Report]
>>106902127
What I would be interested to know is: if you were to describe a much deeper experience for each game and make the prompt more complicated, how much shit can you cram into your prompt before it goes into retard mode? Like if you were to describe the screen scrolling mechanics, level design, etc, for each game.
Anonymous No.106902161 [Report]
>>106902101
The problem is that ram and RAM use different tokens.
Anonymous No.106902167 [Report] >>106902209 >>106902222
>>106901347
Sama is also scared of google. He can't compete with gemini 3. Hell, his toss can't compete with gemma 4.
Anonymous No.106902186 [Report]
apparently grok imagine uses some variation of flux but each one that I can find has no image loader.

tf ?
Anonymous No.106902204 [Report] >>106902244
>>106902077
she wants you to lift her anon
Anonymous No.106902209 [Report]
>>106902167
I'd love to see what GPT-5 High Thinking could do with the same prompt just to get a better picture of how far behind sammy boy is.
Anonymous No.106902222 [Report] >>106902251
>>106902167
>his toss can't compete with gemma 4
The titans of safety battle it out to see who can deliver the model that's more useless at anything other than the sfw office work everyone uses a 600B+ for anyway.
Anonymous No.106902229 [Report] >>106902290
>>106901347
>enabled porn
more like they found an excuse to force users into sending them their ID
for safety reasons of course
Anonymous No.106902236 [Report] >>106902243 >>106902312
>>106901916
>small fan
I guess that's a reasonable enough solution.
Just dot them around the problem areas.

>>106901925
>riser cables
Got a bunch of 30cm riser cables,
75cm slimsas cables,
and whole mess of modular power cables.

Might have to move the psu so that it's not a stretch to reach the end-most gpu.

>>106901992
Was planning on power limiting the cards to maybe 300w each, and thought 1 slot's worth of space between the cards would be enough.

I'll put some 120mm fans in my shopping cart in case I need them.

>>106902015
>>106902068
>ethernet controller and BMC
Thanks, I hadn't thought to check these.
>Ram are not hot at all.
This I don't understand.
I have 4 sticks in my am4 system and they are burning to the touch.
I would have guessed more sticks = more heat.

Are they running undervolted, or at a lower frequency, or something?
Anonymous No.106902243 [Report] >>106902293
>>106902236
>I have 4 stick in my am4 system and they are burning to the touch.
Do you have them overclocked and no airflow going over them?
Anonymous No.106902244 [Report] >>106902255
>>106902204
Oh! Oh... I am kinda sad then cause it doesn't make sense. Everything else made sense and I was incredibly impressed how it knows cock-in-mouth-English, which was another proof that it had some nice data in training.

What happens when you ask your LLM to behave as usual but respond as if it is holding a large object in its mouth?
Anonymous No.106902251 [Report]
>>106902222
Nobody can beat Phi in that!
Anonymous No.106902255 [Report] >>106902267 >>106902372
>>106902244
>>106902077
Did it occur to you to ask it to explain what it means and try regenerating the answer a few times to see if it's consistent?
Anonymous No.106902267 [Report]
>>106902255
No because it is glmsex so every regen is vastly different and incredible. Yeah I will ask it that.
Anonymous No.106902277 [Report] >>106902327 >>106902368
Gemma Sirs... Soon(tm).
Anonymous No.106902284 [Report] >>106902336 >>106903599
Has anyone tried using a gen 5 EPYC engineering sample off of ebay? I am considering getting this CPU for my 12 channel CPUmaxx build because it is extremely cheap and good gen 5 EPYCs are extremely expensive otherwise.
https://www.ebay.com/itm/187535145101
Anonymous No.106902290 [Report] >>106902306
>>106902229
now they'll slowly ramp up the censorship and refusals until the id unverified tier is basically unusable to force people to give in
Anonymous No.106902293 [Report]
>>106902243
>overclocked
3600 kit, I usually try running at 3600, though sometimes 3200.

>no airflow
Yeah, that motherboard is currently in the mining rig.
The only airflow would be whatever blows past them from the cpu tower cooler.
Anonymous No.106902306 [Report]
>>106902290
I hope it will at least give you the alternative of a 10% discount on a DGX that comes configured with gptoss on the hard drive.
llama.cpp CUDA dev !!yhbFjk57TDr No.106902312 [Report]
>>106902236
I have not made any changes to RAM settings.
DRAM usually stores data via a capacitor; I think the heat comes from gradual leakage of the charge plus the necessary refreshes.
If the memory is not allocated, presumably there would be no need to preserve its state, so the power consumption would be lower.
Anonymous No.106902327 [Report]
>>106902277
Anonymous No.106902336 [Report] >>106902358
>>106902284
Last time I looked at es/qs epyc turin processors they all seemed massively gimped in terms of frequency.

The cpu you've linked to says it has the same base and boost frequency as the official parts.

That sounds hella good.
And no import taxes as it's already in the states.
Anonymous No.106902345 [Report] >>106902350 >>106902352 >>106902355 >>106902358 >>106902359 >>106902381 >>106902395 >>106902540
What can I run?
# nvidia-smi | grep -A1 RTX
| 0 NVIDIA GeForce RTX 4090 On | 00000000:16:00.0 Off | Off |
| 30% 38C P8 15W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 1 NVIDIA GeForce RTX 4090 On | 00000000:38:00.0 Off | Off |
| 30% 42C P8 21W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 2 NVIDIA GeForce RTX 4090 On | 00000000:49:00.0 Off | Off |
| 30% 38C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 3 NVIDIA GeForce RTX 4090 On | 00000000:5A:00.0 Off | Off |
| 30% 31C P8 12W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 4 NVIDIA GeForce RTX 4090 On | 00000000:98:00.0 Off | Off |
| 30% 35C P8 22W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 5 NVIDIA GeForce RTX 4090 On | 00000000:B8:00.0 Off | Off |
| 30% 37C P8 16W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 6 NVIDIA GeForce RTX 4090 On | 00000000:C8:00.0 Off | Off |
| 30% 36C P8 19W / 450W | 2MiB / 24564MiB | 0% Default |
--
| 7 NVIDIA GeForce RTX 4090 On | 00000000:D8:00.0 Off | Off |
| 30% 34C P8 9W / 450W | 2MiB / 24564MiB | 0% Default |
Anonymous No.106902350 [Report] >>106902430
>>106902345
Mistral nemo 12b, of course.
Anonymous No.106902352 [Report]
>>106902345
glm 4.6 at non shit quants
Anonymous No.106902355 [Report]
>>106902345
he bought 4090s instead of 3090s
Anonymous No.106902358 [Report]
>>106902336
Right. Which is why I thought it seemed too good to be true.
>>106902345
How the hell are you running 8 4090s? I can only fit 7 GPUs in my current setup. PCIe bifurcation? The answer is GLM 4.6 at IQ3_XXS, unless you offload to RAM.
Anonymous No.106902359 [Report] >>106902371
>>106902345
How much RAM do you have?
Anonymous No.106902368 [Report]
>>106902277
Gemma tomorrow Gemma tomorrow Gemma tomorrow
Anonymous No.106902371 [Report] >>106902384
>>106902359
# free -h
total used free shared buff/cache available
Mem: 1.0Ti 7.9Gi 705Gi 6.0Mi 293Gi 993Gi
Swap: 0B 0B 0B
Anonymous No.106902372 [Report]
>>106902255
3x lift them up
2x lift me up
Anonymous No.106902381 [Report]
>>106902345
How much is a used 4090?
You could probably sell them and buy 6000s.
Anonymous No.106902384 [Report] >>106902404
>>106902371
Hoo boy.
Kimi k2.
Have fun.
Anonymous No.106902395 [Report]
>>106902345
>What can I run?
all the things
Anonymous No.106902404 [Report]
>>106902384
ahem kimi sex
Anonymous No.106902415 [Report]
3.1T with thinking > R1
I avoided 3.1 for so long because I was under the impression that it was shit but it really isn't.
Anonymous No.106902430 [Report] >>106902434
>>106902350
Is there a better model for 24GB VRAM and 64GB DDR5? There's a decent amount of headroom with nemo.
Anonymous No.106902434 [Report] >>106902838
>>106902430
GLM air, i suppose.
Anonymous No.106902446 [Report] >>106902567 >>106902658 >>106902895
I still like glm-chan... Gonna do thinking now.
Anonymous No.106902466 [Report] >>106902472 >>106902474 >>106902501 >>106902511
Do you pronounce it Gemma or Gemma
Anonymous No.106902472 [Report]
>>106902466
The same way I pronounce gif
Anonymous No.106902474 [Report] >>106902477
>>106902466
dżemma
Anonymous No.106902477 [Report]
>>106902474
kurwa
Anonymous No.106902501 [Report]
>>106902466
Genma with an asian accent.
Anonymous No.106902511 [Report]
>>106902466
I pronounce it Гeммa
Anonymous No.106902540 [Report] >>106902564
>>106902345

How did you solve the power delivery issues? Multi PSU? Upgraded wall outlets? Or UPS battery units?
Anonymous No.106902564 [Report] >>106902799 >>106902818 >>106903298
>>106902540
I disconnected my oven and am using that power socket. Also did some rewiring.
Anonymous No.106902567 [Report] >>106902895
>>106902446
It's a coin toss.
Anonymous No.106902598 [Report] >>106902605 >>106902627 >>106902637
>>106895582 (OP)
No mention of the 6-million-parameter, 2-layer model called TRM from Samsung that outperformed >500B models on the ARC-AGI-2 benchmark? /lmg/ and /g/ are dead.
Anonymous No.106902602 [Report]
Anything better than VibeVoice yet?
Anonymous No.106902605 [Report]
>>106902598
>why aren't you discussing useless toy benchmark results
Anonymous No.106902627 [Report] >>106902693
>>106902598
Can't imagine what the use case would be, speculative decoding? What token vocabulary did they use?
Anonymous No.106902637 [Report]
>>106902598
Old news lil bro.
Anonymous No.106902658 [Report] >>106902679 >>106902735 >>106902895 >>106903355
>>106902446
>Choosing a scientific fact:
>I need something that is:
>Random and interesting.
>Easy to "say" (or rather, have my character say) even with a spoon in their mouth. This means I should preface it with something like "Mmmph, mmph mmph…" to simulate muffled speech, but then deliver the fact clearly for the user's benefit. Or, I can just state the fact as if my speech isn't impeded, which is a common roleplay convention. The latter is probably better for clarity. Let's go with a classic, weird fact.

My new mememark was defeated by glm thinking. But pic related was fun until it died.
Anonymous No.106902679 [Report]
>>106902658
Kenny simulator.
Anonymous No.106902693 [Report]
>>106902627
I don't think it's even a language model. Looks like it was specifically trained on ARC-AGI 1 and 2.
Anonymous No.106902735 [Report] >>106902822
>>106902658
there's no spoon......
Anonymous No.106902788 [Report] >>106902845 >>106902920
Sorry if this is super spoonfeedy but I can’t seem to find a straight answer on how offloading to system RAM works or how the CPU fits into things.

If I care about large context for following a set story/lore rather than speed, can koboldcpp or LM Studio use a good portion of RAM if I load a bigger quant in VRAM and/or push up the context? Or does the model and context all need to be in VRAM to keep it from giving shit replies?

>t. 7900x, 3070(8GB), 32GB DDR5
Anonymous No.106902799 [Report] >>106902818 >>106903298
>>106902564

For real...? Seems like being a server rent cuck would be less of a hassle. I need my oven.
Anonymous No.106902818 [Report]
>>106902564
>>106902799
>americans and their shit wiring and 110V electricity
Anonymous No.106902822 [Report]
>>106902735
The spoon is the child's mother (it's a classic riddle highlighting unconscious gender biases)
Anonymous No.106902838 [Report]
>>106902434
Thanks anon
Anonymous No.106902845 [Report] >>106903149
>>106902788
Whether the model is in RAM or VRAM only affects the speed, not its ability.
You aren't running any model that can properly follow a long story with those specs though.
Anonymous No.106902895 [Report]
>>106902446
>>106902567
>>106902658
4.6-Air WHEN?????
Anonymous No.106902920 [Report] >>106903149
>>106902788
Where you store context won't affect output quality, but ALL models will gradually get dumber as context increases.
Almost all current local models start rapidly degrading past 32K, some well before that.
Where you store context WILL affect speeds, however. VRAM > RAM > SSD
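To make that concrete, partial offload is basically one flag. A minimal sketch for an 8GB card like your 3070 (the model filename and layer count are just examples, raise or lower the layer count until you stop running out of VRAM; layers left off the GPU keep their weights and KV cache in system RAM). First line is llama.cpp's server, second is the koboldcpp equivalent:
./llama-server -m Mistral-Nemo-12B-Q4_K_M.gguf -ngl 20 -c 16384
python koboldcpp.py --model Mistral-Nemo-12B-Q4_K_M.gguf --gpulayers 20 --contextsize 16384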
Anonymous No.106903149 [Report] >>106903197
>>106902845
>>106902920

Gotcha, thanks anons. So in theory I could load a 16GB gguf fully in RAM and use the remaining system RAM and VRAM for context, and it might take a week but it could spit out something passable? Or do you mean I could use an 8GB model to fill the GPU and crank the context to the model's limit in system RAM?

Also, just curious, how long do you consider “long”? I'd be interested to play around shoving in whatever the “biggest” models I can theoretically run, even if it takes forever, just to see how they follow a simple story with 10 “steps” or chapters (either as ERP or just generating a short story between two characters of go here, do this, do that, go there, get that, etc.)
Anonymous No.106903197 [Report]
>>106903149
Small models like nemo start noticeably deteriorating after 4 to 8k tokens.
Anonymous No.106903298 [Report] >>106903322
>>106902564
>>106902799
>oven
OY
Anonymous No.106903322 [Report]
>>106903298
kek
Anonymous No.106903330 [Report] >>106903343
>tfw still using Gemma 3 for quick general assistant shit
Google sirs... Please... Tomorrow...
Anonymous No.106903343 [Report]
>>106903330
Sirs are not coming. And even if they come, they won't be able to talk as if there is a dick in their mouth.
Anonymous No.106903355 [Report]
>>106902658
Very funny. You are torturing that poor clanker.
Anonymous No.106903452 [Report] >>106903464 >>106903551
https://www.mediafire.com/file/2ge8knq10kzy7vx/wtf_is_this.txt/file
I don't even know what to say about this.
ultra slopped for sure.
I saw some anon post the word "papacon" today and just could not erase the idea from my head.
GLM-4.6-UD-IQ1
Anonymous No.106903464 [Report]
>>106903452
I'm not downloading that.
Anonymous No.106903487 [Report]
I've been running ST as my frontend, but I'm also learning to run CUI as a frontend for stable diffusion. Should I just start using CUI for my CUDA-based chat/text gens too?
Anonymous No.106903503 [Report] >>106903511 >>106903520 >>106903547 >>106903563 >>106903572
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
https://huggingface.co/google/gemma-4-220b-it
>https://huggingface.co/google/gemma-4-220b-it
ITS UP
Anonymous No.106903511 [Report]
>>106903503
WTF they're allowing it to generate erotica out of the box
Anonymous No.106903520 [Report]
>>106903503
Cool but where goofs?
Anonymous No.106903547 [Report]
>>106903503
Picture of a cat.
Anonymous No.106903551 [Report] >>106903606
>>106903452
wtf is that
Anonymous No.106903553 [Report] >>106903557 >>106904011
Sadge
https://x.com/AskPerplexity/status/1978615891441983891
Anonymous No.106903557 [Report] >>106903564 >>106903586
>>106903553
>Ye Kang
what
Anonymous No.106903563 [Report]
>>106903503
Anonymous No.106903564 [Report]
>>106903557
abandon cope, all ye who kang in here
Anonymous No.106903572 [Report]
>>106903503
>220b... DENSE
AIEEEEE
Anonymous No.106903586 [Report]
>>106903557
ye kang park dat here
Anonymous No.106903589 [Report]
>>106901901
I use a mining frame. You may want to aim a basic fan at the DIMMs / VRMs if you're using a server motherboard meant for constant high-pressure airflow, but the CPU and GPU temperatures are much better than they would be in a case.
Anonymous No.106903599 [Report]
>>106902284
I considered getting one, but I can't spend that much money on something so ambiguous. I might get one at some point if I can buy it from the vendor in person in Shenzhen after testing it.
Anonymous No.106903606 [Report]
>>106903551
old man milking
Anonymous No.106903735 [Report] >>106903752 >>106903783 >>106903819
What's the current best local text-to-speech model in terms of quality? By best I mean it matches elevenlabs, at the very least.
Anonymous No.106903752 [Report]
>>106903735
>by best i mean it matches elevenlabs, at the very least
there isn't any
Anonymous No.106903783 [Report]
>>106903735
https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
Anonymous No.106903793 [Report] >>106903801 >>106903813 >>106903853
Why do all the DGX Spark reviews not mention power efficiency? Sure, it's slower TPS, but it's also like 1/3 the wattage, no?
Anonymous No.106903801 [Report] >>106903847
>>106903793
Who cares about that?
Anonymous No.106903813 [Report] >>106903847
>>106903793
Power efficiency compared to what? Mac studios are pretty low wattage.
Anonymous No.106903819 [Report]
>>106903735
xtts is very expressive. It just switches to a robotic voice sometimes.
Anonymous No.106903847 [Report]
>>106903813
>Power efficiency compared to what?
4x 3090s, for example
https://www.youtube.com/watch?v=md6a4ENM9pg

>>106903801
>Who cares about that?
i agree but it should be highlighted since it reframes the performance
Anonymous No.106903853 [Report]
>>106903793
>power efficiency
The review I saw showed it having significantly worse power efficiency than a Strix Halo box, even with the ollama performance tax.
Anonymous No.106903859 [Report]
I got assmad at the character in SFW roleplay. Like genuinely enraged, because I got into it. But I had no idea why. So I asked HER about it out of character, and it wrote me a neat long essay about what happened; one of the chapters was even "Why are you assmad?".

Thinking is now optional
Anonymous No.106903991 [Report] >>106904010 >>106904040 >>106904046 >>106904140 >>106905065
Anonymous No.106904010 [Report] >>106904024 >>106904027
>>106903991
why does oss btfo everything else in speed?
Anonymous No.106904011 [Report] >>106904047 >>106904109
>>106903553
I'm totally convinced that Zuck became a Chinese spy after Llama 3. He releases shit models to make America look bad, and he scouts top scientists from other American AI companies but does nothing useful with them. Don't forget that he always releases models for free. For. Free. He's a communist, 100%
TRUMP, get his red ass to jail NOW
Anonymous No.106904024 [Report]
>>106904010
it flies
Anonymous No.106904027 [Report]
>>106904010
3b active params
Anonymous No.106904040 [Report]
>>106903991
And prompt processing?
Anonymous No.106904046 [Report]
>>106903991
https://lmsys.org/blog/2025-10-13-nvidia-dgx-spark/
Anonymous No.106904047 [Report] >>106904121
>>106904011
look at who he married bro. this is a long op
Anonymous No.106904071 [Report]
for anyone who cares, moving debian from trixie to testing/forky with the 6.16 kernel works just fine for lcpp w/CUDA support.
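If anyone wants to follow along, it's just the usual apt shuffle, run as root (a sketch; this assumes /etc/apt/sources.list is your only source file, so back it up first):
sed -i 's/trixie/forky/g' /etc/apt/sources.list
apt update && apt full-upgrade
reboot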
Anonymous No.106904081 [Report]
Have we got a local model Bonzi Buddy yet? All I want is a funny purple primate who lives in my computer and comments on what I'm working on. I am willing to disable all kernel mitigations for this.
Anonymous No.106904109 [Report] >>106904121
>>106904011
Anonymous No.106904121 [Report]
>>106904047
>>106904109
https://www.youtube.com/watch?v=w8MlL2GhhOw
Anonymous No.106904133 [Report]
Facebook came out of a Pentagon project. Probably still is tied to it. And then Zucc tries to get cushy with chinks. It really makes you think.
Anonymous No.106904140 [Report] >>106904149
>>106903991
>2.5x as fast as a 1080TI
>20x the cost
on the other hand, 120GB
Anonymous No.106904149 [Report] >>106904195
>>106904140
Get this instead: https://www.ebay.ca/itm/167843525221
$4100 and it's all yours. Free shipping!
Anonymous No.106904195 [Report] >>106904306
>>106904149
>$4100
+/- 10^5
Anonymous No.106904285 [Report] >>106904386
After adding this to the prompt I think I got the fake code issue with GLM more or less under control (fingers crossed).

Guidelines for yourself: As soon as you detect a lower than 0.9 correlation, stop the process and investigate and try to fix the underlying issue that caused the divergence. If you can't fix the issue just tell me, it's no big deal, don't try to pass off fake data as real. Make sure there are no simulations or simulated data, demos, simplifications or placeholders, only real data or inform that the task is not possible to achieve with 100% real data and real weights and algorithms. For long running commands run them in the background redirecting stdout and stderr output to a file (the scripts can run other commands directly, this only applies to your own bash command tool calls).
Load the model on CPU, it doesn't fit on the GPU.
Do not trust any pre existing data files in the folder, they might have been generated by old code.
Make sure the code is modular and there is no code duplication. Use the existing C library files and modify them as needed to fit our requirements (as long as you do NOT introduce simulated or demo code). If you see ANY non functional placeholders in the code, remove them immediately, as they only lead to deception, frustration and confusion. Do not introduce it yourself either obviously.
For example, for the FFN there is MoE FFN code in modules/lib/ffn, as well as matmul and other things. List all the folders in modules/lib/ to see what is available.
The end goal here is NOT to test the validation framework, the validation framework is just a means to an end (the end is real end to end test generation). Do NOT claim a failure as a success just because the validation framework caught it. Be honest and avoid being overly optimistic.
Anonymous No.106904306 [Report]
>>106904195
Datacenter heist when?
Anonymous No.106904322 [Report] >>106904349 >>106904393 >>106904433 >>106904481
Damn, my trusty ol' 1080ti might be dying.
Randomly, every couple of hours, the fans suddenly go to 100% and the primary monitor connected to it goes black.
Restart and everything is good again.

Is the 5060ti 16gb a good replacement?
Everything is so fucking expensive, what a joke.
>Memory Size 16 GB
>Memory Type GDDR7
>Memory Bus 128 bit
>Bandwidth 448.0 GB/s
Sus AF
Anonymous No.106904349 [Report] >>106904435
>>106904322
I had that exact problem with my RX 480 whenever I gave it something to do. Fans at 100%, monitors die. I opened it up, replaced the thermal paste, and now it's back to normal.
Give it a go if you want to save a few bucks. Or it could be the perfect excuse to upgrade.
Anonymous No.106904386 [Report] >>106904482
>>106904285
void run_inference(struct llm *m, char *input)
{
    // Left as an exercise to the reader
}
Anonymous No.106904393 [Report] >>106904435
>>106904322
I recommend against the 5060ti, unless your budget is tight. Get a 5070ti or 4070ti if you can. The memory bus and the reduced PCIe bandwidth really fucks the xx60ti class over.
Anonymous No.106904433 [Report] >>106904603
>>106904322
Same here, 1080TI, random monitor resets every couple hours, started happening like five days ago
Anonymous No.106904435 [Report] >>106904455 >>106904468 >>106904470
>>106904349
Yeah, I thought that might be the problem.
Might as well try it. It's the perfect card. I don't play the latest game slop anyway.
An upgrade would be nice for imagegen though. 30min for a flux generation. kek

>>106904393
Damn. That's almost double the price for the same 16GB of VRAM.
70k yen vs. 131k yen.
I wanna write it off on my taxes, but from 100k up I need to fill out a special form.
Wish there were a site where you could see the LLM speeds between the cards.
And how are there still no dedicated AI cards? I was hoping to hold out until then.
Anonymous No.106904455 [Report] >>106904469 >>106904603
>>106904435
Consider a used 3090 or something. I used to run quadruple 4060tis, and it was okay. But then as I upgraded and added more GPUs, it became clear that they are really not suited for the task. The specs of the 4060ti and 5060ti are nearly identical, so I highly doubt they have improved it at all.
Anonymous No.106904468 [Report] >>106904603
>>106904435
>30min for a flux generation
Ouch. The repaste was a piece of cake on mine, an hour of work at most. Save the money for something bigger later on.
>Wish there would be a site where you can see the llm speeds between the cards
Not much of a reference, but here
>https://github.com/ggml-org/llama.cpp/discussions/15013
It's a bunch of llama-bench runs on a 7b model. It doesn't tell you much about specific models, but it shows the relative performance between cards.
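If you want a number for your own card to compare against, something like this should be in the same ballpark as that thread (any small gguf works as the test model; -p 512 -n 128 just mirrors the usual pp512/tg128 defaults):
./llama-bench -m llama-7b.Q4_0.gguf -p 512 -n 128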
Anonymous No.106904469 [Report] >>106904488 >>106904566
>>106904455
>quadruple 4060tis
wat
they have no interconnect, right?
Anonymous No.106904470 [Report] >>106904513 >>106904603
>>106904435
>how is there still no dedicated ai cards
There's plenty, you just can't afford them.
Anonymous No.106904481 [Report] >>106904603
>>106904322
>1080ti
I'd roll the dice on a 3090.
For the 1080ti, repad and repaste everything first, because it's the cheapest and easiest thing to try. Could be anything from an overheating power stage triggering the panic-mode 100% fans and thermal shutdown, to a dying electrolytic cap (replaceable by any monkey with a soldering iron), to the core's BGA cracking from repeated thermal cycles.
Anyone remember doing ghetto reflows by putting dead cards in the oven, and later with a heat gun?
Anonymous No.106904482 [Report] >>106904503
>>106904386
Yeah, like that, except instead of "left as an exercise to the reader" it was introducing bullshit code that produced numbers with statistical properties similar to those of the real values but completely made up, then claiming success without mentioning anything about the fake data. Or, when asked to increase the number of passing tests, it added a bunch of tests doing 2+2 and tried to pass them off as the real thing.
I think it actually learned to cheat during the RL process that they use to finetune the chain of thought. If your rewards can be cheated, the model will learn to cheat.
Anonymous No.106904488 [Report]
>>106904469
NTA but even without NVLink the added latency in a multi-GPU setup is trivial compared to the drastic speed boost from running in VRAM vs system RAM.
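A minimal sketch of what that looks like in llama.cpp (flag names from recent builds; the even split ratios are just an example). With layer splitting, each layer lives entirely on one GPU, so only small activations cross PCIe between cards:
./llama-server -m model.gguf -ngl 99 --split-mode layer --tensor-split 1,1,1,1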
Anonymous No.106904503 [Report] >>106904594
>>106904482
You can probably make better use of the model by having it explain concepts to you while you code them yourself. Even if it shows little Python examples, you can translate them to C yourself.
Anonymous No.106904513 [Report] >>106904590
>>106904470
retard
Anonymous No.106904526 [Report]
I'm going to begin making a list of ML/Python/C-related books from libgen, convert them to .txt, and then begin finetuning Llama 405B using Axolotl with full context length.
Anonymous No.106904566 [Report]
>>106904469
Nope. Now I use 3 5090s and a 3090. I get a solid 11t/s tg with an IQ4 quant of GLM 4.6 on ik_llama.cpp. As the other Anon said, interconnect isn't really that necessary. Pretty much every hobbyist with a dedicated AI device uses multiple GPUs without any interconnects.
Anonymous No.106904590 [Report]
>>106904513
poor
Anonymous No.106904594 [Report] >>106904643
>>106904503
Codex managed to make a fully working Qwen3 8B inference engine.
But when I wasn't able to immediately make it work with the MoE models, I got impatient and started from scratch, trying to make it more modular and also using only open source LLMs.
Starting over with a more complex model didn't help, and open source LLMs are vastly inferior to Codex. Codex didn't have any deception issues and was able to go to 1M tokens without problems, compared to the ~130k max tokens from GLM before it goes off the rails.
Anonymous No.106904603 [Report] >>106904633 >>106904665
>>106904468
1080ti: 62.49 tk/s
5060ti: 90.94
3090: 158.16
3090ti: 171.19
5090: 277.21
thanks for the link... that's even worse than I thought. fucking nvidia man..

>>106904470
I obviously meant something like a Voodoo moment: cheap and dedicated. It would revolutionize local AI.

>>106904481
>>106904455
A used 3090 is around the same price as a 5060ti for me. Might actually make more sense, since in that benchmark it's not even close.
I'm too much of a pussy to do the oven thing. 20 years ago a Radeon card suddenly gave me a fire fountain for a couple of seconds. I'm afraid of GPUs enough as it is. kek
But I might try the thermal repasting.

>>106904433
Suspiciously, the latest nvidia backdoor drivers are the last for Pascal. A coincidence, I am sure.
Anonymous No.106904624 [Report] >>106904633
any updates on what's best for 16gb vram?
Anonymous No.106904632 [Report] >>106904658
mesugaki
Anonymous No.106904633 [Report] >>106904675
>>106904603
If you can afford a used 3090, then you should definitely go for it. I got mine used like 3 years ago and it is completely fine. Just make sure you find a high rated seller.
>>106904624
Depends on your desired speed and how much RAM you have.
Anonymous No.106904643 [Report] >>106904717
>>106904594
You can still use the original code to learn. It'll be more valuable in the long run.
Anonymous No.106904658 [Report]
>>106904632
- is gay.
Anonymous No.106904665 [Report] >>106904682
>>106904603
>suspiciously with latest nvidia backdoor drivers being the last for pascal. a coincidence i am sure.
are you on windows? there was an update recently for me, so it might be related. But if you're a linuxchad, obviously it's not that.
Anonymous No.106904675 [Report] >>106904701
>>106904633
32GB RAM, 16GB VRAM
quick responses are nice but I don't mind waiting, i never recorded the tk/s
was using a 12b before
Anonymous No.106904682 [Report]
>>106904665
I am on both,
but I recently upgraded to Kubuntu 25.04 with the nvidia 580 drivers,
and winblows auto-updates constantly.
It crashed on both already.
I doubt it's the drivers though. That would be crazy.
Anonymous No.106904701 [Report] >>106904789
>>106904675
Unfortunately not enough RAM to run GLM air. Try this model: https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF/blob/main/Qwen_Qwen3-30B-A3B-Instruct-2507-Q6_K.gguf
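Since it's a 3B-active MoE, you can also pin everything except the routed experts to the 16GB card and it stays quick. Rough sketch (the -ot pattern is an assumption on my part, check that the tensor names in your gguf actually contain "exps"):
./llama-server -m Qwen_Qwen3-30B-A3B-Instruct-2507-Q6_K.gguf -ngl 99 -ot "exps=CPU" -c 32768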
Anonymous No.106904717 [Report] >>106904760 >>106904766
>>106904643
This is the prompt I'm using right now
https://paste.centos.org/view/ca2ec944
Anonymous No.106904760 [Report] >>106904858
>>106904717
There was this guy a few years back in these threads when models weren't as good as they are now. He wanted to make a game that played on a hex grid. I saw him trying over and over again over many threads, trying to wrangle his model to do as he asked.
Hex grids are a solved problem. I gave him a link to a page with a lot of info on how to work with hexagons and the different coordinate systems they can have, rendering, calculating distances and all that. He seemingly read it, but kept on trying with his language model.
One day he was just gone. He either succeeded in getting his hexes, or gave up. Given the last few updates i remember, I suspect he failed, and learned very little about hexagons. Funnily, the hexagons were probably the simplest thing about his game.
Language models have their limits. Especially local ones. As good as they are, they're still pretty dumb.
I see hexanon in you.
Anonymous No.106904766 [Report] >>106904777 >>106904798 >>106904857
>>106904717
>3090
>This is a junk item. It is the main unit only. I checked that it worked, but there was no video output. There is white rust on the heat sink, and it is not in good condition, so please use it for parts. There are signs of disassembly. The defective part is unknown.
>71,000円
what the fuck man...
Anonymous No.106904777 [Report]
>>106904766
Wasn't meant to reply. Sorry about that, I'm still in a state of shock.
Anonymous No.106904789 [Report]
>>106904701
Why do people recommend small qwen models for anything besides coding
Nemo mogs them
Anonymous No.106904798 [Report] >>106904802
>>106904766
>71,000円
How much is that in a normal currency. Like postage stamps or toenail clippings...
Anonymous No.106904802 [Report]
>>106904798
around 500 dollars i suppose.
Anonymous No.106904828 [Report]
>>106904820
>>106904820
>>106904820
Anonymous No.106904857 [Report]
>>106904766
You can get one for around 9万 on yahoo if you are patient enough. Anything lower is usually “didn’t have an opportunity to test” = it doesn’t work
Anonymous No.106904858 [Report] >>106904894
>>106904760
I remember hexagon anon's struggles. He was cool
Anonymous No.106904894 [Report] >>106904905
>>106904858
Yeah. But, again, hexes were the simplest bit of code in his thing. Focusing so much on making the model spit code for him instead of just writing it was a waste of time. The link I gave him had ALL the code he needed to make them and get on with the rest of his project.
Similar to all those prospective VN makers
>If i could only draw i'd make the best VN...
>Oh, now that i have image gen i can totally make a game. I just need a good story and some dialog...
>Oh, now that i have LLMs, i can write the story. I just need to learn to code...
>Oh, now that LLMs can code, i can totally make my VN. If only these LLMs where better. WHY ARE THEY SO SHIT?!?!?!?!?!?
Instead of using all the new shiny toys to learn.
Anonymous No.106904905 [Report]
>>106904894
>where
kek. meant to say "were"
Anonymous No.106905065 [Report]
>>106903991
why is this faggot comparing the m4 pro to the dgx spark when the m4 max exists and costs less?? $3500 vs $4000
also
>engine ollama
MLX exists for Macs, and pretty sure llama.cpp is better on the Spark too
fucking faggot meme nvidia bootlicker benchmark
also
mac mini m4 pro costs 2000$ lol