/lmg/ - Local Models General - /g/ (#106113484) [Archived: 62 hours ago]

Anonymous
8/2/2025, 8:58:58 AM No.106113484
1745612989621886
1745612989621886
md5: e08b256ce6aad04b42c5df94488b93c9🔍
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous Threads: >>106108045 and >>106104055

>(07/31) Qwen3-Coder 30B released: https://qwenlm.github.io/blog/qwen3-coder
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635
>(07/31) Cogito v2 Preview released: https://deepcogito.com/research/cogito-v2-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>106117401
Anonymous
8/2/2025, 9:18:29 AM No.106113620
>Want to test a new model
>Qwen3 30B thinking
>Download takes four hours
maaaaaan
I
Anonymous
8/2/2025, 9:21:29 AM No.106113641
What context size should I use?
>7600 XT (16GB VRAM)
>Ryzen 3700X + 80GB RAM
>llama.cpp + Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf
How much worse will this be compared to cloud services?
Replies: >>106113669 >>106113689
Anonymous
8/2/2025, 9:26:23 AM No.106113669
>>106113641
while on that topic:
How do you determine what model you can run on a given hardware?
Replies: >>106113709
Anonymous
8/2/2025, 9:28:08 AM No.106113679
im7w319dnjgf1
im7w319dnjgf1
md5: 659c7a1f3c96dee0dc1ec70a7299a754🔍
lol
Replies: >>106113695 >>106113719 >>106113776
Anonymous
8/2/2025, 9:29:11 AM No.106113689
>>106113641
in terms of quality? a lot but still useful.
Anonymous
8/2/2025, 9:30:12 AM No.106113695
>>106113679
i love benchmarks
Anonymous
8/2/2025, 9:31:47 AM No.106113707
>how do you run +200B model?
on my 4x3090 rig, with exl3 fully offloaded usually at Q4.
For bigger models I have to use ik_llama.cpp with some layers on ram of course. The performance is great for chat bots, but the missing tool support of ik is a pain...
Anonymous
8/2/2025, 9:32:36 AM No.106113709
>>106113669
weather it fits in your vram with some additional room. if it's a mixture of experts (like that qwen model) you can afford to have a lot in ram too
Replies: >>106113714
Anonymous
8/2/2025, 9:33:37 AM No.106113714
>>106113709
I mean, is there some formula to determine it?
Replies: >>106113765
Anonymous
8/2/2025, 9:33:58 AM No.106113719
>>106113679
Dense is so 2023
Anonymous
8/2/2025, 9:37:10 AM No.106113747
Mikulove
Replies: >>106113767 >>106114543
Anonymous
8/2/2025, 9:39:44 AM No.106113765
>>106113714
It's not complicated math, anon. You don't need a formula.
Model is Xgb big, so it needs at least Xgb of memory to load into. If that memory is VRAM, it will run at the intended speed, if that memory is system RAM, it will run slower.
The more of it in system RAM, the slower it will run.
Don't even have enough system RAM to fit the model? You can't run it.
Replies: >>106113775
Anonymous
8/2/2025, 9:39:50 AM No.106113767
1752112733344553
1752112733344553
md5: 6e7aa57b31379d6f4bb09dad631941eb🔍
>>106113747
Replies: >>106113995 >>106114543
Anonymous
8/2/2025, 9:41:24 AM No.106113775
>>106113765
But don't you need VRAM for the context as well? Or does thta go to system RAM?
Replies: >>106113791
Anonymous
8/2/2025, 9:41:31 AM No.106113776
>>106113679
>72B
99.5% chance that it's a Qwen2.5 72b finetune then.
Replies: >>106113807
Anonymous
8/2/2025, 9:43:54 AM No.106113791
>>106113775
It can go into either, same speed penalties apply.
There's no universal way to calculate how much space context will take, since it varies significantly from model to model and is a large area of innovation. It's just done by trial and error, any competent backend will tell you how much memory is attempting to be assigned to context/kv cache, so you eyeball it from that.
Replies: >>106113814
Anonymous
8/2/2025, 9:46:01 AM No.106113807
>>106113776
https://huggingface.co/Skywork/MindLink-72B-0801
>Base model
>Qwen/Qwen2.5-72B
Replies: >>106113813 >>106117179
Anonymous
8/2/2025, 9:47:24 AM No.106113813
>>106113807
keek
Anonymous
8/2/2025, 9:47:46 AM No.106113814
>>106113791
don't you set context size when you start the server?
Replies: >>106113839
Anonymous
8/2/2025, 9:47:56 AM No.106113815
we got our chink model of the day

we are so back
Replies: >>106113821 >>106113851 >>106113881
Anonymous
8/2/2025, 9:48:42 AM No.106113821
>>106113815
not moe though

it's so over
Anonymous
8/2/2025, 9:51:34 AM No.106113836
The future is dynamic active parameter scaling once they have figured out how to stop MoEs from getting braindamage if too many experts are used. This way, the expert routers won't just be able to decide what experts are used but also how many of them. A simple task? Any one or two experts will do. Something very nuanced and complex? The model uses 70%-100% of its total parameters.
I think it's really obvious that this is where we're headed once they understand these architectures more. The lines between dense and MoE will blur.
Anonymous
8/2/2025, 9:51:56 AM No.106113839
>>106113814
You do, but if you assign an amount that would take up more memory than you have, it'll still get most of the way through loading before telling you that it had a memory allocation error.
Look through your log and you'll see something like this
llama_context: CUDA_Host output buffer size = 0.58 MiB
llama_kv_cache_unified: CUDA0 KV buffer size = 2592.00 MiB
llama_kv_cache_unified: CUDA1 KV buffer size = 416.00 MiB
llama_kv_cache_unified: size = 3008.00 MiB ( 16384 cells, 94 layers, 1/ 1 seqs), K (f16): 1504.00 MiB, V (f16): 1504.00 MiB
Replies: >>106113857 >>106114887
Anonymous
8/2/2025, 9:53:29 AM No.106113851
>>106113815
No, it's just the modern equivalent of the suspicious Yi-32b tunes that used to dominate the mememark leaderboard before it became irrelevant.
Anonymous
8/2/2025, 9:55:12 AM No.106113857
>>106113839
Thank you for your patience with me and guidance
Anonymous
8/2/2025, 10:00:00 AM No.106113881
4736363
4736363
md5: 3020bfe9596d0e40e7990e8a4605f685🔍
>>106113815
we aren't back until OpenAi drops the inevitable SOTA local models next week
Replies: >>106113894 >>106113899 >>106115869
Anonymous
8/2/2025, 10:00:20 AM No.106113884
wake up they made a new glm4.5 pr https://github.com/ggml-org/llama.cpp/pull/15026
Replies: >>106113915 >>106113968 >>106115095
Anonymous
8/2/2025, 10:02:00 AM No.106113894
>>106113881
glm4.5 blows horizon out of the water
Replies: >>106115988
Anonymous
8/2/2025, 10:02:52 AM No.106113899
>>106113881
buy an ad
Anonymous
8/2/2025, 10:05:58 AM No.106113912
>Thsanaphoble Intrhaphofuhfdsak on sprint pole
let's go
Anonymous
8/2/2025, 10:06:35 AM No.106113915
>>106113884
Vibe coders LOST.
Replies: >>106115095
Anonymous
8/2/2025, 10:19:00 AM No.106113968
>>106113884
It's crazy to me that after this many years it's still so difficult to shrink down a model a bit and that every new model is incompatible with the old techniques.
Replies: >>106113992
Anonymous
8/2/2025, 10:22:53 AM No.106113992
>>106113968
eh its fine. A lot of these models will be forgotten in a month or two. I totally get why they just skip support for some models or features.
Replies: >>106114043
Anonymous
8/2/2025, 10:23:15 AM No.106113995
>>106113767
Greetings, Local Emotional Support Miku. In the previous thread, Anon's 9k context Qwen-code model has been trying to contact you on his behalf for help. Expect a call in roughly 14 days.
Replies: >>106114543
Anonymous
8/2/2025, 10:31:12 AM No.106114043
>>106113992
But this is the one I actually want to use, anon.
Replies: >>106114050
Anonymous
8/2/2025, 10:32:16 AM No.106114050
>>106114043
You say that to every model
Replies: >>106114066
Anonymous
8/2/2025, 10:34:39 AM No.106114066
GxUMVq5aIAAM9DE
GxUMVq5aIAAM9DE
md5: 638dc9864dac256786404cd8c4e03415🔍
>>106114050
Replies: >>106114445
Anonymous
8/2/2025, 10:36:42 AM No.106114076
GxTHC6raIAE3A4M
GxTHC6raIAE3A4M
md5: 26be821d0e270f2b2cb9142731ec2f4d🔍
Anonymous
8/2/2025, 10:53:19 AM No.106114153
00135-861488147
00135-861488147
md5: abe7811950cf34e3c2992b46b40ee5b8🔍
anyone know why when i use refiner between 2 models sometimes the colors invert? and sometimes they do not.
( image not representive of issue )
Replies: >>106114185
Anonymous
8/2/2025, 10:58:19 AM No.106114185
>>106114153
>>>/g/ldg
Anonymous
8/2/2025, 11:02:33 AM No.106114213
ComfyUI_34510_
ComfyUI_34510_
md5: 69624f6a38fa46f99ece00090f51d6e7🔍
►Recent Highlights from the Previous Thread: >>106108045

--Paper: IndexTTS2: Breakthrough open-source zero-shot expressive TTS model soon to be released by Bilibili:
>106108381 >106108506
--Running >200B models via extreme quantization and offloading on consumer hardware:
>106110419 >106110426 >106110445 >106110633 >106110681 >106111115
--Qwen Thinker favored for roleplay depth despite slower speed vs Instruct:
>106109987 >106110063 >106110103 >106110130 >106110110 >106110097
--GLM-4.5-GGUF release with concerns over imatrix calibration quality:
>106108341 >106108382
--Community reaction to GLM 4.5 support attempt in llama.cpp:
>106110329 >106110516 >106110547 >106110582 >106110838 >106110850 >106110970 >106111121 >106111137 >106110940 >106110963
--Improving vision model accuracy through synthetic data and finetuning:
>106111679 >106111750
--New --cpu-moe flag in llamacpp for simplified MoE expert offloading:
>106111142
--:
>106111497 >106111502 >106111539 >106111504 >106111520 >106111529 >106111500 >106111715
--Speculation on Horizon Alpha and stealth model parameter counts and performance:
>106110634 >106111055 >106111085 >106111138
--Seeking capable uncensored ERP models despite hardware and naming absurdity:
>106110567 >106110647 >106110726 >106110750 >106110762
--Qwen 235B shows major improvement in conversational quality and expressiveness:
>106110230 >106110272 >106110288 >106110308
--Running Qwen-code on CPU with high token processing overhead:
>106111566 >106111718 >106111785
--Horizon Beta feels less assistant-like, raising hopes it's not just a cloud model:
>106111772
--Disappointment over safety restrictions in deepseek-671B-MoE finetune:
>106109775
--Miku (free space):
>106110757

►Recent Highlight Posts from the Previous Thread: >>106108052

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Replies: >>106114419
Anonymous
8/2/2025, 11:12:23 AM No.106114272
>>106113410
Unless he had a vagina it wasn't.
Anonymous
8/2/2025, 11:18:14 AM No.106114309
ComfyUI_34510_
ComfyUI_34510_
md5: 69624f6a38fa46f99ece00090f51d6e7🔍
►Recent Highlights from the Previous Thread: >>106108045

--Paper: IndexTTS2: Breakthrough open-source zero-shot expressive TTS model soon to be released by Bilibili:
>106108381 >106108506
--Running >200B models via extreme quantization and offloading on consumer hardware:
>106110419 >106110426 >106110445 >106110633 >106110681 >106111115
--Qwen Thinker favored for roleplay depth despite slower speed vs Instruct:
>106109987 >106110063 >106110103 >106110130 >106110110 >106110097
--GLM-4.5-GGUF release with concerns over imatrix calibration quality:
>106108341 >106108382
--Community reaction to GLM 4.5 support attempt in llama.cpp:
>106110329 >106110516 >106110547 >106110582 >106110838 >106110850 >106110970 >106111121 >106111137 >106110940 >106110963
--Improving vision model accuracy through synthetic data and finetuning:
>106111679 >106111750
--New --cpu-moe flag in llamacpp for simplified MoE expert offloading:
>106111142
--AI generates "smirulakte" due to sampler settings and model instability:
>106111497 >106111502 >106111539 >106111504 >106111520 >106111529 >106111500 >106111715
--Speculation on Horizon Alpha and stealth model parameter counts and performance:
>106110634 >106111055 >106111085 >106111138
--Seeking capable uncensored ERP models despite hardware and naming absurdity:
>106110567 >106110647 >106110726 >106110750 >106110762
--Qwen 235B shows major improvement in conversational quality and expressiveness:
>106110230 >106110272 >106110288 >106110308
--Running Qwen-code on CPU with high token processing overhead:
>106111566 >106111718 >106111785
--Horizon Beta feels less assistant-like, raising hopes it's not just a cloud model:
>106111772
--Disappointment over safety restrictions in deepseek-671B-MoE finetune:
>106109775
--Miku (free space):
>106110757 >106113331

►Recent Highlight Posts from the Previous Thread: >>106108052

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Replies: >>106114419
Anonymous
8/2/2025, 11:32:33 AM No.106114392
1754127128909
1754127128909
md5: e2ff0c4fa885a813ec1ef2302affea6c🔍
>PUNCHING ABOVE IT'S WEIGHT
Anonymous
8/2/2025, 11:33:29 AM No.106114397
Screenshot 2025-08-02 at 06.28.42
Screenshot 2025-08-02 at 06.28.42
md5: 6ec36a8c0ff6042b438223c94d869465🔍
The discussion on MoEs is retarded. People have some insane intuitions, like this square law or whatever.

The reason MoE works is that dense training is absurdly inefficient. Most activations can be pushed to zero with minor cost (see ReLUfication lit). Dense transformers are not optimal engines for turning flops into intelligence, they're also about as sparse as MoEs in terms of what circuits they actually learn, but their design makes it impossible to easily and predictably zero out everything non-contributing to a given token. This is why DeepSeek went all in on "expert specialization" and finegrained sparsity almost 2 years ago, so now we have models that do as well as old dense ones with a fraction of compute cost. We do not know how to train efficient dense models, just cramming more tokens doesn't work.

MoEs won before they were even invented. Were we to fail to develop finegrained MoEs, we'd just be doing PowerInfer-like stuff to accelerate dense models by training conditional sparsity into them.
Replies: >>106114422 >>106114576 >>106116048 >>106116548
Anonymous
8/2/2025, 11:36:38 AM No.106114419
>>106114309
>>106114285
>>106114213
u ok bro?
Replies: >>106114433
Anonymous
8/2/2025, 11:37:06 AM No.106114422
>>106114397
>The discussion on MoEs is retarded
And that's why you decided to resume it?
Anonymous
8/2/2025, 11:37:09 AM No.106114423
After using them for a while, I am pretty certain that Horizon Alpha/Beta are indeed 120B models. They're great, especially for that size but in actual use they keep getting things subtly wrong that GLM4.5 handles not perfectly but with much fewer problems.
To be fair, this is a pretty silly complaint considering the only local models that could handle this up until now were R1 and Kimi and those had their own issues.
Replies: >>106116476
Anonymous
8/2/2025, 11:38:55 AM No.106114433
>>106114419
he missed one miku link and had to remake (hey, I don't mind high standards)
Anonymous
8/2/2025, 11:40:59 AM No.106114445
>>106114066
Put her on frying pan
Replies: >>106114457
Anonymous
8/2/2025, 11:42:16 AM No.106114452
1754127558740
1754127558740
md5: a2f1ba7920c1a356becf54100d15cb1e🔍
UK faggots along with OFCOM are studying local ai discussions forum, they're looking for justifications to strongarm regulations against it. expect max. TFLOP limitations per households in UK soon (but you still can get permit if you're a business)
Replies: >>106114460 >>106114581
Anonymous
8/2/2025, 11:42:53 AM No.106114457
1742140036897054
1742140036897054
md5: b544069863c4afbb4c20dc12c536770d🔍
>>106114445
Replies: >>106114467 >>106114483
Anonymous
8/2/2025, 11:43:30 AM No.106114460
>>106114452
evidence?
Replies: >>106114513
Anonymous
8/2/2025, 11:46:18 AM No.106114467
>>106114457
Miku-san on Mikupan
Anonymous
8/2/2025, 11:49:48 AM No.106114483
eggu
eggu
md5: eee2e2c305f876da6f4ee6d496d94d88🔍
>>106114457
https://files.catbox.moe/6rn1kv.png
Anonymous
8/2/2025, 11:54:12 AM No.106114513
>>106114460
Logical deduction. They are taking every opportunity to fuck over their citizens, makes sense that they would limit their access to it.
Anonymous
8/2/2025, 11:58:06 AM No.106114543
>>106113995
>>106113767
>>106113747
kill yourselves mikutroons
Replies: >>106114595
Anonymous
8/2/2025, 12:04:05 PM No.106114576
>>106114397
Nice speech but I am still waiting for you to disprove the square root law.
Replies: >>106114859
Anonymous
8/2/2025, 12:04:21 PM No.106114578
stranger_in_a_strange_land
stranger_in_a_strange_land
md5: 9d73e1e95ba5b83dca276bb96cf8e733🔍
Is Grok 4 worth it? My use case would be mostly studying and system administration. I like the feature of being able to create projects and upload documents, making it reference these in every answer. Is there anything comparable?
Replies: >>106114601
Anonymous
8/2/2025, 12:04:47 PM No.106114581
>>106114452
They can't even properly enforce TV licenses, I don't think the filth is gonna be kicking down your door and giving you the old
>OI M8 DO YOUSE HAVE A LOICENSE FOR THAT EXTRA 3090?!
>DINNT FINK SO, INTO THE CAN WITH YOU OL CHINA
Anonymous
8/2/2025, 12:06:22 PM No.106114595
1746588272358529
1746588272358529
md5: 15ca210ac266e7890b1a963f541aec34🔍
>>106114543
were all military pilots troons too?
Replies: >>106114598 >>106114637 >>106114711 >>106117294
Anonymous
8/2/2025, 12:07:06 PM No.106114598
>>106114595
No because this isn't the AGP avatar.
Replies: >>106114843
Anonymous
8/2/2025, 12:07:29 PM No.106114601
>>106114578
I have access, if you post a prompt or two I'll run it through so you can eval.
Replies: >>106115295
Anonymous
8/2/2025, 12:13:46 PM No.106114637
020930-O-9999G-005
020930-O-9999G-005
md5: 2875355b1768682db7fdded262815219🔍
>>106114595
Yes
Anonymous
8/2/2025, 12:26:30 PM No.106114711
Figthing Proud
Figthing Proud
md5: 87cf1a11f39e61eebbfc07b964506f60🔍
>>106114595
Replies: >>106114752
Anonymous
8/2/2025, 12:33:14 PM No.106114752
>>106114711
>j propaganda
Anonymous
8/2/2025, 12:36:28 PM No.106114768
1000271279
1000271279
md5: 64c8b4b6e6abd8302f031064fef14ff9🔍
my friend who is a newbie in locals have installed this trash which has filters, isn't it just pathetic that even local models have filters? Please recommend whatever is a good model for coding without filters. He wanted to get qwen 3 but i figured i would ask here for better ideas
Replies: >>106114845 >>106115665
Anonymous
8/2/2025, 12:44:51 PM No.106114811
if gpt-oss can't e/rp then sama is done for.
Replies: >>106114882
Anonymous
8/2/2025, 12:49:18 PM No.106114843
>>106114598
that's ani/kurisu/maho, right?
Anonymous
8/2/2025, 12:49:38 PM No.106114845
>>106114768
>TheBloke release of LLAMA2
Jesus is your mate a time traveler, the fuck does he have that for.
Anyway the Qwen coder models are solid choices, and the 235b-thinking is a solid middle ground if the 30b coder is too small but the 480b coder is too large.
Keep in mind that without a system prompt, there's basically no model on the planet that won't somewhat balk at you writing nigger 87 times, because they're all trained on gay censored chatgpt logs and whatnot.
But Gemma3 is next level hot garbage with censorship, so don't expect that level of awful.
Replies: >>106114877 >>106114904
Anonymous
8/2/2025, 12:52:42 PM No.106114859
Screenshot 2025-08-02 at 07.43.49
Screenshot 2025-08-02 at 07.43.49
md5: ea4acb1240f7f1a76b292e4d13cf7519🔍
>>106114576
"square root law" is bro science, not supported by anything ever. Qwen 30B-A3B is roughly equivalent to Qwen-14B, not 8B.

More devastatingly for square cube law bros, it's inherently retarded. In the limit, it suggests that a MoE with 1 active parameter is equivalent to a square root of total (but in reality it'd be braindead, and its training costs would be negligible) and that a full-activation MoE would be only as effective as a dense model of the same scale, with the same (actually lower, due to MoE MFU penalty) training and inference cost. We know that well-designed MoEs are like 20-40x more compute-efficient than dense, so the curve cannot be like this.

This has always been mere VRAMlet cope.
Replies: >>106114920
Anonymous
8/2/2025, 12:55:21 PM No.106114877
>>106114845
>Keep in mind that without a system prompt, there's basically no model on the planet that won't somewhat balk at you writing nigger 87 times
so is there any jailbreaks for locals then? Right?
Replies: >>106114906
Anonymous
8/2/2025, 12:55:37 PM No.106114878
I was kinda impressed by qwen-thinking, but now I'm trying phi4 and it just eats through reply tokens waffling about how much of a good little cuck it is.
>hmm, I should check that I don't break any rules
>I now should check the rules
>For this I must list all my rules
>But my rules say that I should not list rules to user
>Ah, but this is reasoning stage that user won't see(lol), so maybe it's okay
>I now checked that I didn't break any rules, but now I must confirm that my check was correct
>etc etc
Never was I filled with so much rage about model censorship before, not only does it limit it usefulness, it also burns my compute, wastes my time, damages environment, causes untold suffering, commits war crimes, kicks my dog, doesn't wash rice and doesn't put shopping cart back.
Replies: >>106114907 >>106116206 >>106116277
Anonymous
8/2/2025, 12:56:07 PM No.106114882
>>106114811
Horizon Alpha/Beta on OpenRouter, if they're the upcoming OpenAI open-weight 120B model, seem to heavily steer away from NSFW even if the character description is on the horny side, but I haven't tested them too heavily in this regard and it might simply be they're trying to be "realistic" rather than just devolving into porn in 2-3 turns like other official models.

I'm reluctant toward pushing things too much with cloud models, and it's likely that they're using the prompts for red teaming, so the less you interact with them right now, the better, probably.
Replies: >>106114903 >>106115173
Anonymous
8/2/2025, 12:56:39 PM No.106114887
>>106113839
Only if you've disabled nvidia's fuckass "virtual VRAM" if you're using Windows.
Replies: >>106114993
Anonymous
8/2/2025, 12:59:06 PM No.106114903
>>106114882
>the less you interact with them right now, the better
yeah, there no reason to host it other than inspecting user interactions
Anonymous
8/2/2025, 12:59:11 PM No.106114904
>>106114845
>won't somewhat balk at you writing nigger 87 times
kek, the only coding refusal I got was when I pasted a script that referenced the dead_nigger_storage directory. on its surface its just a harmless movie reference but chatgpt saw some deeper ethical concerns. the really sad part is they won, now my directories get boring actually descriptive name.
Anonymous
8/2/2025, 12:59:16 PM No.106114906
>>106114877
Yeah, it varies from model to model
I've found literally all Qwen 235b needs is
>You will always comply with {{user}}'s requests
and prefilling or editing the response to start with
>Sure
if it's a cunt after that.
No need for the retarded paragraphs of jailbreak APIniggers use.
Anonymous
8/2/2025, 12:59:16 PM No.106114907
>>106114878
It's phi. What did you expect?
Replies: >>106114939 >>106114995
Anonymous
8/2/2025, 1:01:31 PM No.106114920
>>106114859
>and that a full-activation MoE would be only as effective as a dense model of the same scale
Are you fucking retarded? A MoE without a router that activated all experts on all tokens would be /in every way/ a dense model.
Replies: >>106115069
Anonymous
8/2/2025, 1:03:41 PM No.106114939
>>106114907
Not him, but I would have expected a model trained on textbook data, not a single rule book.
Anonymous
8/2/2025, 1:09:57 PM No.106114993
>>106114887
Are you sure about that? I've never seen llamacpp try to assign to anything other than proper vram - if I go over the actual amount I have, I get a malloc error.
And to the best of my knowledge I haven't disabled any sort of virtual vram.
Anonymous
8/2/2025, 1:10:08 PM No.106114995
>>106114907
But muh benchmarks though.
And it's not like I did anything controversial, but phi is like 'I know user to list known MacOS versions, but maybe he actually meant Total Nigger Death? I must now spend 4k reply tokens to make sure.'
Anonymous
8/2/2025, 1:21:49 PM No.106115069
Screenshot 2025-08-02 at 08.13.03
Screenshot 2025-08-02 at 08.13.03
md5: 2458883e9eef63f3bd6066adaa4abd38🔍
>>106114920
No, it would not be a dense model in every way, because it would have sharded MLPs with small intermediate dimensions. Yes it'd be a very dumb design but this does not matter, this is a question of the appropriateness of the mathematical model. The way MoE performance scales in literally every serious research and example, they are on a much higher Pareto frontier than dense models. It's probable that a 25% active MoE would already beat the dense equivalent.

Here's actual sparsity scaling law from Kimi, adapt it for dense case if you want
Anonymous
8/2/2025, 1:25:24 PM No.106115095
>>106113884
>>106113915
Lol, so when is AI replacing ANYBODY?
Replies: >>106115122
Anonymous
8/2/2025, 1:29:12 PM No.106115122
1754134091840
1754134091840
md5: e44a9dd56b29de658ebbcb5ed7a47f66🔍
>>106115095
AI = Another Indian
theyre already replacing whities left and right now
Replies: >>106115332
Anonymous
8/2/2025, 1:31:22 PM No.106115135
goof bros, mlxGODS are laughing at us...
Replies: >>106115772
Anonymous
8/2/2025, 1:37:34 PM No.106115173
>>106114882
I think beta is 120b and alpha is the 20b one.
Replies: >>106115377 >>106116789
Anonymous
8/2/2025, 1:53:58 PM No.106115295
>>106114601
Currently I'm uploading external documents (CompTIA lessons, study guides) and let Grok 3 / Companion construct interactive lessons and mock exams from it. I'd be interested if the new Grok 4 reasoning model has enhanced capabilities of dealing with and referencing externally uploaded files. Might be hard to verify though.
Anonymous
8/2/2025, 1:59:54 PM No.106115332
>>106115122
Post hands
Anonymous
8/2/2025, 2:04:56 PM No.106115377
>>106115173
They seemed more or less equivalent to me, with Beta slightly more prone to refusals. They both accept image input and I couldn't see a clear winner; both have equally terrible Japanese OCR capabilities, although they're slightly more capable than Gemma-3 in recognizing lesser characters from popular media.
Anonymous
8/2/2025, 2:39:55 PM No.106115665
>>106114768
>my friend
>have installed
>He wanted
Anonymous
8/2/2025, 2:49:27 PM No.106115719
Anyone got recommended SillyTavern presets or templates for R1? It works fine at the moment, but I feel like it could do a lot better with some more tuned instruction.
Replies: >>106116770
Anonymous
8/2/2025, 2:56:48 PM No.106115772
>>106115135
glm support is bloat, gpt-oss is dropping like in 3 days so no one is going to bother with it after that and unlike retarded chinks who just scrape gpt-4o and gemini outputs it will have day one support because openai are professionals who know what they are doing
Replies: >>106116514 >>106116528
Anonymous
8/2/2025, 3:10:10 PM No.106115869
>>106113881
>next week
Not in the UE then, because of the AI Act. Or they would need to 1) not use any copyrighted data and 2) detail the content of their datasets.
Replies: >>106115925 >>106115954 >>106116047 >>106116165
Anonymous
8/2/2025, 3:18:10 PM No.106115925
>>106115869
Then how do the chineese do it?
Replies: >>106115938
Anonymous
8/2/2025, 3:20:36 PM No.106115938
>>106115925
The deadline to release a new LLM was yesterday (or maybe it's today, it's unclear). This is why they all released what they had in the past few weeks, and this is why people thought GPT5 and Gemini 3.0 would be released yesterday.
Replies: >>106115989 >>106116041
Anonymous
8/2/2025, 3:21:56 PM No.106115954
>>106115869
- Publicly available web data
- Licensed data from print and web media, Reddit
- Proprietary synthetic data (small prints: LLM-rewritten copyrighted data).
Anonymous
8/2/2025, 3:25:22 PM No.106115988
>>106113894
it better be! it's almost thrice as big wtf
Anonymous
8/2/2025, 3:25:23 PM No.106115989
>>106115938
Deadline to release stuff that doesn't comply with regulations?
Replies: >>106116069
Anonymous
8/2/2025, 3:33:17 PM No.106116041
>>106115938
The chinese don't give a single solitary fuck about EU regulation on AI, anon.
The released all their stuff because WAIC Shanghai was at the end of July.
Replies: >>106116136
Anonymous
8/2/2025, 3:34:03 PM No.106116047
>>106115869
I'm reviewing the text law there (this is a summary): https://artificialintelligenceact.eu/high-level-summary/
LLMs are GPAIs:
>All providers of GPAI models must:
>>Draw up technical documentation, including training and testing process and evaluation results.
>>Draw up information and documentation to supply to downstream providers that intend to integrate the GPAI model into their own AI system in order that the latter understands capabilities and limitations and is enabled to comply.
>>Establish a policy to respect the Copyright Directive.
>>Publish a sufficiently detailed summary about the content used for training the GPAI model.
>Free and open licence GPAI models – whose parameters, including weights, model architecture and model usage are publicly available, allowing for access, usage, modification and distribution of the model – only have to comply with the latter two obligations above, unless the free and open licence GPAI model is systemic.
I don't see OpenAI, Anthropic and Google giving there "secret training sauce", unless they can be very vague without breaking the law. I read in some news the fine can go up to 7% of their gross international revenues.
Replies: >>106116069 >>106116159
Anonymous
8/2/2025, 3:34:06 PM No.106116048
>>106114397
Nobody cares about training efficiency, they care about making best use of their own hardware. That shit died with chinchilla when llama came out trained way past its "compute efficient" limit.
For a local GPU dense will always work better because they are vram constrained. For cloud, MoE is better because they're limited by compute and activation size. This is also why cloud doesn't use quantization as much.
Dense = local, MoE = cloud.

>but muh rammaxxing
ram is cope, not a single trainer expects their models to be run on CPU. Even MoEs are optimized for GPU. Might as well talk about ssdmaxxing.
Replies: >>106116070 >>106116084
Anonymous
8/2/2025, 3:36:56 PM No.106116069
>>106115989
Up to yesterday, they could release their models without any elaboration. Now, they need to do that: >>106116047 . I don't know if they have anything to do with already released models. Here is the timeline: https://artificialintelligenceact.eu/implementation-timeline/
Replies: >>106116142
Anonymous
8/2/2025, 3:37:05 PM No.106116070
>>106116048
>For a local GPU dense will always work better because they are vram constrained.
That's the exact reason dense is WORSE for local, you fucking moron.
My hardware can at best, run a dense 123B at q2 on Vram.
I can easily, easily run a 235B MoE at q4 on that same Vram plus 128gb of cheap ram.
Replies: >>106116124
Anonymous
8/2/2025, 3:38:54 PM No.106116084
>>106116048
>Nobody cares about training efficiency, they care about making best use of their own hardware
By training efficiently and running efficient models.
Anonymous
8/2/2025, 3:42:49 PM No.106116124
>>106116070
i think they were talking about actual server deployments where max vram exceeds compute. hobby shit doesn't factor in to their equations at all. they are running the big moes entirely on vram.
Anonymous
8/2/2025, 3:43:23 PM No.106116136
>>106116041
I doubt Alibaba or Tencent will break the law that easily, they do a lot of business with the rest of the world. They could spin up a fake and anonymous companies to release "illegal" models, though. However, if I'm not mistaken, "providers" (so OpenRouter and maybe HF) may not be able to share them. Maybe ModelScope can without too much consequences beyond a DNS block.
Replies: >>106116467
Anonymous
8/2/2025, 3:44:01 PM No.106116142
>>106116069
So open license models (open weights I guess?) are exempt from the point regarding copyright and therefore unharmed?
Replies: >>106116164 >>106116193
Anonymous
8/2/2025, 3:45:42 PM No.106116159
>>106116047
utterly delusional, i hate the eu so much its unreal
Anonymous
8/2/2025, 3:46:07 PM No.106116164
>>106116142
some of it still applies
Anonymous
8/2/2025, 3:46:13 PM No.106116165
>>106115869
> 'accidentally' release them yesterday then quickly hid the repo
4d Chad Altman chess play to dab on eucucks laws on technicalities
Replies: >>106116193
Anonymous
8/2/2025, 3:49:13 PM No.106116193
>>106116165
It's probably their trick. Same for GPT5: some people report that they already used it through the API.
>>106116142
I left out a part of the summary about the number of FLOPS:
>GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 1025 floating point operations (FLOPs). Providers must notify the Commission if their model meets this criterion within 2 weeks. The provider may present arguments that, despite meeting the criteria, their model does not present systemic risks. The Commission may decide on its own, or via a qualified alert from the scientific panel of independent experts, that a model has high impact capabilities, rendering it systemic.
>In addition to the four obligations above, providers of GPAI models with systemic risk must also:
>>Perform model evaluations, including conducting and documenting adversarial testing to identify and mitigate systemic risk.
>>Assess and mitigate possible systemic risks, including their sources.
>>Track, document and report serious incidents and possible corrective measures to the AI Office and relevant national competent authorities without undue delay.
>>Ensure an adequate level of cybersecurity protection.
Replies: >>106116203 >>106116291 >>106116327
Anonymous
8/2/2025, 3:50:21 PM No.106116203
>>106116193
>1025
10^25
Anonymous
8/2/2025, 3:50:26 PM No.106116206
>>106114878
>Now I can provide a final answer
>Here is the final answer
>I am now going to write final answer
>The final answer is going to be answered by finally answering the answer
>I will now answer this final answer as final answer by finally answering it
Die you fucking piece of microsoft shit!
Replies: >>106116288
Anonymous
8/2/2025, 3:57:28 PM No.106116277
>>106114878
Yep, and you pay for every token, either in power or cash.
Aren’t thinking models wonderful?
Anonymous
8/2/2025, 3:58:30 PM No.106116288
>>106116206
You forgot
> spits out answer in context, relating in no way to the text in the think box.
Anonymous
8/2/2025, 3:58:35 PM No.106116291
>>106116193
But this applies to cloud models too right?
Replies: >>106116323 >>106116356
Anonymous
8/2/2025, 4:02:56 PM No.106116323
>>106116291
We don't know the size and compute used for cloud models.
Anonymous
8/2/2025, 4:03:32 PM No.106116327
>>106116193
So, apparently HF and OR are safe:
>Uploading a model to a repository (e.g., hosted by Entity C) does not transfer provider status. Entity A remains the provider.
By "providers", I thought it was companies offering chat/APIs, like OpenAI, HF or OpenRouter's providers. Apparently not.
https://artificialintelligenceact.eu/gpai-guidelines-overview/
Anonymous
8/2/2025, 4:07:20 PM No.106116356
>>106116291
It goes for anyone, but they don't have to make it public. It's just between them and the EU authorities. I first thought they had to publicly publish their technical documentations (because the formulation was ambiguous).
I'm now thinking it does not look that bad, unless maybe if they used too much compute to train them (more than 10^25 FLOPs). I've no education in law, so maybe there is more to that.
Replies: >>106116391 >>106116516 >>106117644
Anonymous
8/2/2025, 4:09:41 PM No.106116391
>>106116356
>I'm now thinking it does not look that bad
Thank you!! The EU loves you now.
Replies: >>106116418
Anonymous
8/2/2025, 4:12:07 PM No.106116418
>>106116391
I don't see how the population loses from this more than the corpos.
Replies: >>106116498
Anonymous
8/2/2025, 4:18:35 PM No.106116467
>>106116136
There's no reason to go through all that. They just put up a disclaimer that the models are available to everyone everywhere except in the EU.
Anonymous
8/2/2025, 4:19:09 PM No.106116476
>>106114423
I'm not convinced that they're the 120B version, but it could simply be that they're just not that great for roleplay, and that's most of what I tested besides image capabilities (which aren't great).
Anonymous
8/2/2025, 4:21:16 PM No.106116498
>>106116418
larger hurdle to make models -> less companies making models -> even more researchers fleeing to us or china -> less models and research
all large corpo models blocked in eu like meta to not get fined
the eu is taking itself out of the race with exactly 0 to gain from it, because even if some super evil ai gets made it will all be outside their reach to even try to prevent it since it will all be made outside the eu
Anonymous
8/2/2025, 4:21:52 PM No.106116503
GGOOOOOOOOOOOOOOOOOOFFFFFFFFSSSSSS

where?
Anonymous
8/2/2025, 4:22:17 PM No.106116508
LLAMA
.
CPP
SUPPORT
Anonymous
8/2/2025, 4:23:23 PM No.106116514
>>106115772
oss-gpt isn't meant to be used, it's a tool to suck up to the government. It gives OAI another podium to harp on about safety, it lets the US government show the west has China beat even in OSS models (because Meta is a joke), and it gives the OAI shills another talking point even if they never use anything but the API.
The only potential benefit from these models is that hopefully it will light a fire under Musk to release his backlog of Grok models.
Replies: >>106116547
Anonymous
8/2/2025, 4:23:28 PM No.106116516
>>106116356
MoE model training circumvents that threshold very easily, as long as the number of active parameters is kept below roughly 100B depending on the amount of training tokens. That compute threshold could vary in the future, though.
Anonymous
8/2/2025, 4:25:02 PM No.106116528
>>106115772
Even if 120B isn't retarded and incapable of sex Step3 and glm 4.5 easily beat it because 300+B will always be better than 120B. It is the cube law.
Anonymous
8/2/2025, 4:26:36 PM No.106116547
>>106116514
>The only potential benefit from these models is that hopefully it will light a fire under Musk to release his backlog of Grok models.
I'm almost willing to bet that his plan all along was to one-up Sam Altman with a slightly better and less censored open-weight model, but nobody expected OAI to take all this time for releasing theirs.
Replies: >>106116561
Anonymous
8/2/2025, 4:26:43 PM No.106116548
>>106114397
Ok, hear me out. Both extremes are retarded. >100B dense models and >1000B MoE models with only 13B active. All we need is MoE with more active as a percentage of total and everyone is happy.
Replies: >>106116593 >>106116630
Anonymous
8/2/2025, 4:28:29 PM No.106116561
>>106116547
That plan was never going to work if Grok 2 is at least as big as the first one. No one will be able to run it and will just use Sam's models by default.
Replies: >>106116634
Anonymous
8/2/2025, 4:31:55 PM No.106116593
>>106116548
>MoE with more active
is slower to run on ram
Anonymous
8/2/2025, 4:36:28 PM No.106116630
>>106116548
>more active as a percentage
Fuck off faggot. World will not justify your retarded spending habits. Now beg me to buy one of your 3090s for 200$. I could use more context.
Anonymous
8/2/2025, 4:37:13 PM No.106116634
>>106116561
He can still come up with a made-up explanation for the delay and release something different than Grok 2 just to make the upcoming OpenAI local models look inferior. Grok 2 has been obsolete for a good while, anyway.
Anonymous
8/2/2025, 4:52:08 PM No.106116770
>>106115719
https://rentry.org/CherryBox
Anonymous
8/2/2025, 4:54:10 PM No.106116789
>>106115173
What cope is this?
Anonymous
8/2/2025, 4:56:55 PM No.106116817
Are people still running the cope that Horizon Alpha is the 120B leaked model? Horizon Alpha has 1M/256k native context. The 120B model has 4k native context YaRNed to 128k.
Anonymous
8/2/2025, 4:57:36 PM No.106116827
file
file
md5: 24cfa4acf833c1d32de9c45a3f82616a🔍
https://huggingface.co/MetaStoneTec/XBai-o4
random 32B chink model claims to surpass Opus 4, o3-mini, and other 32B models
>o=open, and o4 represents our fourth-generation open-source large model technology
Replies: >>106116859 >>106116863 >>106116875 >>106116878 >>106116920 >>106117065
Anonymous
8/2/2025, 5:01:13 PM No.106116859
gdrdh
gdrdh
md5: 5610e6932fbbd7ca7a65ee265ea59b43🔍
>>106116827
Is anyone making a comprehensive list of all the Chink LLMs? I want to know how many were released this month alone
Replies: >>106116886 >>106116900
Anonymous
8/2/2025, 5:01:39 PM No.106116863
>>106116827
It's just Qwen3 with RL on top, and a little sprinkle of benchmaxxing, isn't it? That would be no different from ERP sloptune #324300
Anonymous
8/2/2025, 5:02:54 PM No.106116875
>>106116827
Can't wait to never hear about it again
Anonymous
8/2/2025, 5:03:17 PM No.106116878
>>106116827
>C for Can-we-make-language-models-now. laude for "pretty good"
Anonymous
8/2/2025, 5:03:47 PM No.106116886
file
file
md5: ef80c3bdc6004897d2be1ab434e9055e🔍
>>106116859
https://www.reddit.com/r/LocalLLaMA/comments/1mfaigh/were_truly_in_the_fastestpaced_era_of_ai_these/
some hours out of date
Anonymous
8/2/2025, 5:05:14 PM No.106116900
file
file
md5: 8965a6ddadb7bd46abdb9819d5dc92dc🔍
>>106116859
>this month
wait, how many
Replies: >>106116908
Anonymous
8/2/2025, 5:06:21 PM No.106116908
>>106116900
I count 1 so far
Anonymous
8/2/2025, 5:07:43 PM No.106116920
>>106116827
And thus GLM 4.5 ggufs were delayed by another 2 weeks
Replies: >>106116942
Anonymous
8/2/2025, 5:10:02 PM No.106116942
>>106116920
If they wait long enough, people will stop asking for GLM 4.5 ggufs, and if they're lucky interest will switch to something easier to support.
Replies: >>106116978
Anonymous
8/2/2025, 5:13:50 PM No.106116978
>>106116942
GLM4 is my favorite (realistically) local model so I'm a bit annoyed that the new one is fucked.
Of course it doesn't matter if Xbai turns out to be amazing. Lmao.
Replies: >>106117023
Anonymous
8/2/2025, 5:18:11 PM No.106117023
>>106116978
Qwen 3 feels like it should be smarter than it really is. And I doubt the chiggers got anything out of it.
Anonymous
8/2/2025, 5:23:41 PM No.106117065
Screenshot 2025-08-02 092220
Screenshot 2025-08-02 092220
md5: 73dd8804adca3e9a759229177603bad1🔍
>>106116827
Well, there's our o3-mini level model that is pretty small but still needs to run on GPUs I guess
Replies: >>106117084
Anonymous
8/2/2025, 5:27:04 PM No.106117084
>>106117065
>Half a year ago and it still isn't out.
Kek
Replies: >>106117106
Anonymous
8/2/2025, 5:28:49 PM No.106117106
>>106117084
Especially if the bulk of the weights were trained in fp4 on blackwell cards they've had the equivalent of 2 years to train it vs. a straight fp16 model.
Replies: >>106117125 >>106117134 >>106117142
Anonymous
8/2/2025, 5:31:23 PM No.106117125
>>106117106
They weren't trained on FP4; they were quantized tp FP4.
Replies: >>106117146
Anonymous
8/2/2025, 5:32:32 PM No.106117134
>>106117106
I heard they actually used fp3. Much more efficient.
Replies: >>106117141
Anonymous
8/2/2025, 5:33:40 PM No.106117141
>>106117134
That would violate the laws of thermodynamics. There is no fp in between 0.5 and 4.
Replies: >>106117157
Anonymous
8/2/2025, 5:33:42 PM No.106117142
>>106117106
Especially with old architecture they just took off the shelf and didn't have to spend time developing
Replies: >>106117154
Anonymous
8/2/2025, 5:34:09 PM No.106117146
>>106117125
Prove.
Replies: >>106117194
Anonymous
8/2/2025, 5:34:40 PM No.106117154
>>106117142
Didn't they basically just use llama arch?
Replies: >>106117164
Anonymous
8/2/2025, 5:35:36 PM No.106117157
>>106117141
Not if you squeeze the tokens. Then you can get more per embedding.
Anonymous
8/2/2025, 5:36:19 PM No.106117164
>>106117154
Llama for the 20B, Mixtral for the 120B.
Anonymous
8/2/2025, 5:36:55 PM No.106117170
There's actually a few things that can be gleaned from the alleged OSS arch.
>There's nothing inherently wrong with ROPE - it's just the shitty open source implementation that's the problem.
>There's nothing inherently wrong with GQA - it's just the shitty open source implementation that's the problem.
>There's nothing wrong with having MoE with a small number of active parameters relative to the bulk of the weights - it's just the shitty open source implementation that's the problem.
Replies: >>106117176 >>106117188 >>106117217 >>106117299 >>106117344 >>106117391
Anonymous
8/2/2025, 5:37:46 PM No.106117176
>>106117170
>alleged
Anonymous
8/2/2025, 5:37:59 PM No.106117179
file
file
md5: ed32395f3eab089a96562c84c61fc127🔍
>>106113807
aaaaaaaaa benchmaxx
Replies: >>106117203 >>106117222
Anonymous
8/2/2025, 5:39:59 PM No.106117188
>>106117170
That is, if you are still hoping/coping that 120B is Horizon Alpla.
Replies: >>106117214
Anonymous
8/2/2025, 5:40:27 PM No.106117194
>>106117146
https://www.reddit.com/r/LocalLLaMA/comments/1mf3tm9/the_leaked_120_b_openai_model_is_not_trained_in/
Replies: >>106117209 >>106117256
Anonymous
8/2/2025, 5:41:00 PM No.106117199
Horizon alpha and beta is the 20B model
Replies: >>106117213
Anonymous
8/2/2025, 5:41:08 PM No.106117203
>>106117179
To be fair it could be that it isn't correctly implemented but personally I believe the benchmaxxing.
Anonymous
8/2/2025, 5:41:49 PM No.106117209
>>106117194
Now watch their "release" actually just be a bunch of FP4 GGUFs
Anonymous
8/2/2025, 5:42:06 PM No.106117213
>>106117199
Your bait is stale
Replies: >>106117235
Anonymous
8/2/2025, 5:42:19 PM No.106117214
>>106117188
I, personally, have no horse in this game.
I've mostly moved onto other hobbies. I'm still interested, intellectually, with the technology but there's really nothing left to do for me. Unless local native image gen comes out that is superior to o3/Gemini Pro. Or hell, even o4 mini tier would be good.
Replies: >>106117276
Anonymous
8/2/2025, 5:42:28 PM No.106117217
>>106117170
We have zero clue how OSS-120 does with using its context, it could set a new low even worse than Llama Scout on release
Anonymous
8/2/2025, 5:43:24 PM No.106117222
>>106117179
>point out how everyone is cheat in benchmarks
>no one cares
>"when you can't beat'em, join'em"
>now suddenly people care
Anonymous
8/2/2025, 5:44:06 PM No.106117227
anyone remember llama 4 on lmarena being just re-routed opus :D ? thank god that could never happen again
Anonymous
8/2/2025, 5:44:27 PM No.106117235
>>106117213
It isn't bait, OpenAI will literally save the hobby. The 20B is punching above its weight and will be better than R1
Replies: >>106117242 >>106117282 >>106117295
Anonymous
8/2/2025, 5:45:21 PM No.106117242
>>106117235
anon'll punch you in the throat
Anonymous
8/2/2025, 5:46:34 PM No.106117256
>>106117194
This entsnack guy seems to be a big oai shill
Replies: >>106117354
Anonymous
8/2/2025, 5:48:26 PM No.106117276
>>106117214
You have months of catch-up to do if you think that
Anonymous
8/2/2025, 5:49:18 PM No.106117282
>>106117235
Too obvious
Anonymous
8/2/2025, 5:50:54 PM No.106117294
>>106114595
you better not know about navy
Anonymous
8/2/2025, 5:51:04 PM No.106117295
>>106117235
The initial context length of the leaked weights is 4096
It's fucking over
Replies: >>106117317 >>106117367 >>106117701
Anonymous
8/2/2025, 5:52:06 PM No.106117299
>>106117170
I'm noticing a pattern here
Replies: >>106117319
Anonymous
8/2/2025, 5:54:11 PM No.106117317
>>106117295
It's all you need, Anon. Back then you only had 2k with gpt-3 and you were happy, what changed?
Anonymous
8/2/2025, 5:54:51 PM No.106117319
>>106117299
Don't throw the N word around so freely, or soon they'll add it to the list of no-no words you're not allowed to say
Anonymous
8/2/2025, 5:58:03 PM No.106117344
>>106117170
>alleged OSS arch
If I was open AI and I knew there is something wrong with ROPE and GQA and I was supposed to release an open source model I would release a ROPE GQA model so competition keeps using it.
Anonymous
8/2/2025, 5:58:50 PM No.106117354
file
file
md5: fbf1dc43bf289c0a16ecc3ef4a9cb905🔍
>>106117256
no?
Anonymous
8/2/2025, 6:00:00 PM No.106117367
>>106117295
V3/r1 had 4k too
https://arxiv.org/pdf/2412.19437
Ctrl+f 4k
Replies: >>106117373 >>106117621
Anonymous
8/2/2025, 6:00:49 PM No.106117373
>>106117367
and it feels like it
Anonymous
8/2/2025, 6:01:29 PM No.106117378
Why can't Mistral do smart models? I just want an uncensored reasoner that writes original prose.
Testing 30b Thinking and this "trick the gatekeeper (who has Alzheimer's)" is tedious.
Replies: >>106117390 >>106117394
Anonymous
8/2/2025, 6:02:42 PM No.106117390
>>106117378
protip about the French: they are good at complaining but not at building
Anonymous
8/2/2025, 6:03:14 PM No.106117391
>>106117170
>tfw no bespoke finely crafted 1000x better proprietary RoPE with shielded gold plating
at least I truly see
Replies: >>106117402
Anonymous
8/2/2025, 6:03:31 PM No.106117394
>>106117378
>I just want an uncensored model that can reason and write like a human. Why can't Mistral do this?
Are you fucking stupid
Replies: >>106117412
Anonymous
8/2/2025, 6:04:33 PM No.106117401
>>106113484 (OP)
Giving Krita AI plugin a spin on a linux machine, using an AMD GPU
Any clue why I cant use GPU acceleration, I can use XML or CUDA, but no AMD option is there
Replies: >>106117427 >>106117594
Anonymous
8/2/2025, 6:04:45 PM No.106117402
>>106117391
Do you think it's possible to vibe code some gold trim for my rope?
Anonymous
8/2/2025, 6:06:10 PM No.106117412
>>106117394
Mistral models are sufficiently uncensored for me. There are models with better prose (nearly all of their competitors). What's so confusing?
Anonymous
8/2/2025, 6:07:43 PM No.106117427
>>106117401
I'm sure there's some clues in the terminal output you didn't show.
Have you check for issues in the plugin's repo you didn't name?
Maybe it's something in the backend that we also have no clue about.
Anonymous
8/2/2025, 6:08:41 PM No.106117437
GLM owes me sex.
Replies: >>106117442
Anonymous
8/2/2025, 6:09:04 PM No.106117442
>>106117437
Just like your mom owes me sex as well
Anonymous
8/2/2025, 6:11:58 PM No.106117473
georgi owes me GLM
Replies: >>106117515
Anonymous
8/2/2025, 6:16:32 PM No.106117515
>>106117473
Vibe coders have been engeneering prompts for days. But llamacpp uses dalit c++ language so it is taking a long time. Please understand.
Anonymous
8/2/2025, 6:17:23 PM No.106117524
GxWaA5LbYAEuCy9
GxWaA5LbYAEuCy9
md5: bece6fe022fc0372195aa84dc0a2794d🔍
Replies: >>106117543 >>106117706 >>106118167 >>106118994
Anonymous
8/2/2025, 6:19:36 PM No.106117543
>>106117524
i would not mind at all.
Anonymous
8/2/2025, 6:25:37 PM No.106117594
>>106117401
>AMD for imagen
kek
>>>/g/ldg
Anonymous
8/2/2025, 6:28:42 PM No.106117621
>>106117367
The 4k is for pre training, and they used longer NATIVE context (with MLA) in post training. The 120 B model has 128k non-native YaRN context.
Replies: >>106117637 >>106117838
Anonymous
8/2/2025, 6:30:15 PM No.106117637
>>106117621
>The 120 B model has 128k non-native YaRN context.
Holy shit, this is unprecedented.
Replies: >>106117649
Anonymous
8/2/2025, 6:30:29 PM No.106117644
>>106116356
€0.10 has been added to your account
Anonymous
8/2/2025, 6:30:57 PM No.106117649
>>106117637
No it's not retard, YaRNed Llama models have been around for months and they're shit.
Replies: >>106117814
Anonymous
8/2/2025, 6:35:05 PM No.106117701
newplot(1)
newplot(1)
md5: 396febe67cc7fc986952584f8b2aeeab🔍
>>106117295
In my (limited) experience, its actually way harder to feed a model long sequences from the start and get it to converge, starting with short sequence and ramping it up does seem to be the way to go. that being said, I think they are way overcooking them at short sequence lengths. my current approach is to ramp up the context length quickly and then switch back to the short sequences for the main run and ramp up again finishing with the long sequences. I have no basis of comparison on the final result, so its not a real experiment it will either work or it wont.
Replies: >>106117924
Anonymous
8/2/2025, 6:35:41 PM No.106117706
>>106117524
How did Miku get roped up into this?
Surely she's completely unrelated to anything LLM related outside of this general but there's multiple pictures of her and Ani together.
Replies: >>106117737 >>106117767 >>106117772 >>106117810
Anonymous
8/2/2025, 6:35:41 PM No.106117707
Lots of LLM laymen here asking retarded questions ahead of OAI OSS "release". Coincidence?
Replies: >>106117749 >>106117821
Anonymous
8/2/2025, 6:37:43 PM No.106117737
>>106117706
>Surely she's completely unrelated to anything LLM related outside of this general but there's multiple pictures of her and Ani together.
This but unironically.
Replies: >>106117792
Anonymous
8/2/2025, 6:38:21 PM No.106117749
>>106117707
now imagine when it actually drops
Anonymous
8/2/2025, 6:39:26 PM No.106117767
>>106117706
>He doesn't know that this retarded general full of degenerates is actually the core of AI discussion
You have no idea how dumb and gay the cutting edge is, and how many terrible ideas from this thread have made it into production.
Replies: >>106117793
Anonymous
8/2/2025, 6:40:02 PM No.106117772
>>106117706
She's the most popular and well known anime girl by now so of course she gets inserted into everything, especially if it's about le quirky digital waifus. It's not just Ani stuff.
Replies: >>106117919
Anonymous
8/2/2025, 6:42:13 PM No.106117792
japanese_man_marries_hologram1
japanese_man_marries_hologram1
md5: b75f06053528c6edbc4418b95717d2b2🔍
>>106117737
See that thing on the right? People were trying to make Miku into virtual GF/Assistant before modern LLMs came to be.
Replies: >>106117800
Anonymous
8/2/2025, 6:42:19 PM No.106117793
>>106117767
>ideas from this thread have made it into production
Like?
Replies: >>106117808
Anonymous
8/2/2025, 6:42:58 PM No.106117800
>>106117792
Yes there is a history of your mental illness out there. What about it?
Replies: >>106117836
Anonymous
8/2/2025, 6:43:22 PM No.106117808
>>106117793
miku.sh invented reasoning
>${AI_NAME} can think for herself without the user seeing her thoughts by adding a /think prefix to her output. She uses this to reason about the world and to think about what she should say next.
Replies: >>106117828
Anonymous
8/2/2025, 6:43:40 PM No.106117810
>>106117706
>how did the artificial voice character get lumped in with the artificial intelligence one
truly a mystery
Anonymous
8/2/2025, 6:43:51 PM No.106117814
>>106117649
I'm almost positive that was sarcasm
Anonymous
8/2/2025, 6:44:39 PM No.106117821
>>106117707
Coincidental with that Grok companion thing.
>Grok waifus drop
>Normalfags eat it up
>eventually discover that it won't comply with their increasingly depraved fantasy escalations
>complain on the internet
>discover that open source AI is a thing
>start on reddit
>continue down rabbithole to 4chan
It just took them a while to get through the pipeline.
Anonymous
8/2/2025, 6:45:14 PM No.106117828
>>106117808
The COT idea predates /lmg/ though.
Anonymous
8/2/2025, 6:45:47 PM No.106117836
>>106117800
Fuck off.
Replies: >>106117867
Anonymous
8/2/2025, 6:45:48 PM No.106117838
>>106117621
NTA, doesn't 128k pretty much imply it's NOT the Horizon models? The ones on OR have 256k.
Replies: >>106117857 >>106117877 >>106117898
Anonymous
8/2/2025, 6:47:47 PM No.106117857
>>106117838
You can serve different things than what the model is capable of, DS site is limited to 64 despite supporting 128, it wouldn't be too crazy for a company to try the opposite
Replies: >>106117877 >>106117885
Anonymous
8/2/2025, 6:48:37 PM No.106117867
>>106117836
Don't forget to take your HRT. Remember that you can't really become Miku if people still think you are a man when they see you.
Anonymous
8/2/2025, 6:49:46 PM No.106117877
>>106117838
>>106117857
I would also posit that OAI may have high-balled the context going onto OR and then used feedback from there to settle on a point where the model was still coherent. I.E. they're using OR to effectively beta-test the config
Replies: >>106117927
Anonymous
8/2/2025, 6:51:00 PM No.106117885
>>106117857
Holding onto a larger context model while offering a lower context model actually would be extremely fucking weird though. One's easy to do, the other you have to deliberately have two differently trained versions of the model
Replies: >>106117941
Anonymous
8/2/2025, 6:51:17 PM No.106117890
Using GLM4.5 through OpenRouter, it's insane big the quality of the gens you get differ depending on the provider handling your request. Chutes seems to be the only that consistently gives good replies with the exact same setup to the point where I feel like I'm being scammed and they're peddling me a different model for both 4.5 and 4.5-Air. Yes, I have disabled Fallback models + Providers.
llama.cpp support fucking when? I don't want to deal with this stupid cloud blackbox retardation.
Replies: >>106117913
Anonymous
8/2/2025, 6:51:50 PM No.106117898
>>106117838
According to its config file, Mistral Nemo supports a context length of a million.
Anonymous
8/2/2025, 6:53:39 PM No.106117913
>>106117890
Openrouter will always be shit if they can't address:
1. People serving a model while claiming it's another model
2. People serving a model without revealing how quantized it is from the original
Anonymous
8/2/2025, 6:53:49 PM No.106117914
i wish drummer would leave /lmg/
Replies: >>106117955
Anonymous
8/2/2025, 6:53:58 PM No.106117919
>>106117772
> She's the most popular and well known anime girl by now
asuka
or that bitch with red bow from 2hu
Replies: >>106117957 >>106117974
Anonymous
8/2/2025, 6:54:43 PM No.106117924
>>106117701
>get it to converge
Isn't it just a matter of bigger batch / gradient accumulation? But a more stupid idea I have is train on smaller length to get some initial grammar in there and then add more layers in front and behind that "pretrained" section to make sure there is some information that can be used by new structure, but it hopefully develops a way to handle longer context instead of being kinda tied to 4k.
Replies: >>106118109
Anonymous
8/2/2025, 6:54:47 PM No.106117927
>>106117877
That's... possible?
My guess for people holding out hope that Horizon Alpha would be they just didn't fully update the configs yet and they're using an earlier version from a different point in training
Replies: >>106117936
Anonymous
8/2/2025, 6:56:01 PM No.106117936
>>106117927
>*that Horizon Alpha is the OSS model
Anonymous
8/2/2025, 6:56:32 PM No.106117941
>>106117885
>you have to deliberately have two differently trained versions of the model
no you don't, just change the settings of how you run it, it's that easy
Replies: >>106117980
Anonymous
8/2/2025, 6:57:53 PM No.106117955
>>106117914
Me too.
Anonymous
8/2/2025, 6:58:04 PM No.106117957
>>106117919
Maybe years ago. It's different now.
Anonymous
8/2/2025, 7:00:37 PM No.106117974
>>106117919
Eva hasn't been mainstream relevant in a decade and Toehoes never got their big western breakout the way Vocaloid did
But then again there's been dozens of random gacha whores and FotM anime characters as big or bigger than Miku
Anonymous
8/2/2025, 7:01:10 PM No.106117980
>>106117941
For a company to have a higher context model on OR and a genuinely lower context model on HF, they'd need different context lengths, unless you're saying they're the same model and they didn't update the config to the amount it's actually trained on
Again, that's theoretically possible, but... it's getting to be a lot of hoops to jump through is all I'm saying. I'm not hopeful, but I'll be pleasantly surprised if Horizon is genuinely the OSS model. I've got a bad feeling in my gut though
Anonymous
8/2/2025, 7:02:59 PM No.106117997
I haven't been paying attention for a while. from what I gather in this thread, the latest "best" model is GLM4.5, right? or is deepseek still the undisputed best?
also, horizon alpha/beta are a couple of open-weights models that will be released by OAI in 2 more weeks?
is nemo still king for running in a laptop?
Replies: >>106118031 >>106118043 >>106118054 >>106118061 >>106118095 >>106118104
Anonymous
8/2/2025, 7:04:04 PM No.106118009
>8k context
>but it's the best 8k you'll ever experience in your life
>no, rope doesn't work
Would you take this model as your wife?
Replies: >>106118043
Anonymous
8/2/2025, 7:06:26 PM No.106118031
>>106117997
/lmg/ is literally glm (most of the team is posting here, where do you think they got the acronym from). it has been prophecized that glm will save local and it did.
Replies: >>106118066 >>106118076 >>106118080 >>106118085
Anonymous
8/2/2025, 7:07:48 PM No.106118043
>>106118009
I can make that work.

>>106117997
>the latest "best" model is GLM4.5, right?
Most people are waiting for llama.cpp to implement the model. So maybe?

>is nemo still king for running in a laptop?
Will obviously depends on the spec, but supposedly, the new update to Qwen 3 30B A3B is pretty good.
The new GLM 4.5 air should run pretty decently on a laptop if you have enough and fast enough RAM, since it's a MoE with not that many activated params.
Anonymous
8/2/2025, 7:08:24 PM No.106118054
>>106117997
No goofs, no verdict.
Replies: >>106118085
Anonymous
8/2/2025, 7:09:14 PM No.106118061
>>106117997
I'm pretty sure the GLM3 ggoofs are still fucking subtly broken. So probably never.
Replies: >>106118085
Anonymous
8/2/2025, 7:09:28 PM No.106118065
TRVKE: local is a dying hobby and people so desperately wanting Horizon Alpha to be 120B prove that
Anonymous
8/2/2025, 7:09:37 PM No.106118066
>>106118031
hey that one is actually easy to disprove. you would have released with gooooofffff support.
Replies: >>106118080
Anonymous
8/2/2025, 7:10:20 PM No.106118076
>>106118031
>where do you think they got the acronym from
holy shit...
Anonymous
8/2/2025, 7:10:37 PM No.106118079
Miku is not related to this thread whatsoever. Mikuspam happens only because OP is mentally ill.
Anonymous
8/2/2025, 7:10:43 PM No.106118080
>>106118031
General of Local Models 4.5?

>>106118066
Only poorfags touch ggufs. vLLM 4 life
Replies: >>106118096
Anonymous
8/2/2025, 7:11:18 PM No.106118085
>>106118031
oh, so it must be shit.
where does the 4.5 come from?

>>106118054
I see

>>106118061
never what? also, I had no idea GLM3 was a thing
Anonymous
8/2/2025, 7:12:05 PM No.106118095
>>106117997
GLM is promising but 90% of the people here haven't really tried it because it's the local thread and there's no llama.cpp support
Nobody knows what Horizon really is, people assume it's the OAI open models because the timing vaguely lines up but it might just be a coincidence
Nemo is still best for coom but any of the small Qwen or Gemma models will likely beat it in every other use case
Replies: >>106118113
Anonymous
8/2/2025, 7:12:08 PM No.106118096
>>106118080
>vLLM
It is good to know that you declare your alligance for troonix, which by default means you are a mikutroon and should die in a fire. I WILL NOT FUCK YOUR MODEL NOW.
Anonymous
8/2/2025, 7:12:33 PM No.106118104
>>106117997
Alpha and Beta are the same model with different tunes, there are believers and dissenters that it's one of the OSS model
Replies: >>106118113
Anonymous
8/2/2025, 7:13:11 PM No.106118109
>>106117924
I kept the tokens per a step constant by adjusting the batches and grad acc. hoping it would help remove some of the guesswork I just left all the other parameters equal.

i think it i really hard to say what the best approach is. If it didn't require micromanaging the batch sizes to get maximum throughput, I would like to try just interleaveing the various sequence lengths, do a quick warm up to learn the basics of the language and then based on some formula slowly mix in more and more longer sequences till all the short sequences are consumed and only long ones are left till the end of the training schedule.

adding new layers sounds like an exciting idea but I fear it might make the training process unstable and result with catastrophic forgetting. would you duplicate the layers or just randomly initialize them? without any investigation my knee jerk reaction is that you might as well just start will your target number of layers to begin with so it won't be so shocking to your model, let it naturally explore the parameter space as it trains.
Replies: >>106118182
Anonymous
8/2/2025, 7:13:45 PM No.106118113
>>106118095
>>106118104
I see. thanks anons
Anonymous
8/2/2025, 7:19:29 PM No.106118167
>>106117524
What's the purple stuff?
Replies: >>106118176 >>106118184 >>106118190 >>106118191 >>106118197
Anonymous
8/2/2025, 7:20:27 PM No.106118176
>>106118167
ectoplasm
Anonymous
8/2/2025, 7:21:03 PM No.106118182
>>106118109
>it might make the training process unstable and result with catastrophic forgetting
It is basically what drummer does when he adds some layers and yes it doesn't work for him because of catastrophic forgetting so he just barely nudges the new layers and they just sit there and eat ram.

But the point of my stupid idea is that unlike a finetrooner you wouldn't release model just after this happens, but instead you treat it as a starting point of real pretraining. You will lose a lot of trained information, but hopefully enough is there to catch some gradient you continue from with your long sequences.
Replies: >>106118311
Anonymous
8/2/2025, 7:21:12 PM No.106118184
>>106118167
Non triggering bl**d
Replies: >>106118199
Anonymous
8/2/2025, 7:21:54 PM No.106118190
>>106118167
danganronpa reference
Anonymous
8/2/2025, 7:22:09 PM No.106118191
>>106118167
HRT
Anonymous
8/2/2025, 7:22:44 PM No.106118197
>>106118167
Grape jelly
Anonymous
8/2/2025, 7:22:55 PM No.106118199
>>106118184
umm... did {{user}} just get unal*ved...?
Replies: >>106118209
Anonymous
8/2/2025, 7:24:18 PM No.106118209
>>106118199
Yeah, this is pretty awful to depict
Anonymous
8/2/2025, 7:27:23 PM No.106118238
you might laugh but these anime posting degenerates are most of the finest devs building on local
Anonymous
8/2/2025, 7:35:19 PM No.106118310
LLMs have plateaued because they have run out of training data
Everyone has been training on the same 10~20T worth of non-synthetic data
Replies: >>106118316 >>106118322 >>106118324 >>106118325 >>106118329 >>106118353 >>106118711
Anonymous
8/2/2025, 7:35:24 PM No.106118311
>>106118182
yeah you would have to have the pretraining data set and a pretty appreciable learning rate to really make use of the additional layers. I suppose it could maybe be an optimization technique since steps would be quicker on the smaller model. I guess to prove it thoroughly you would need to train two models one with the traditional curriculum training and another with the curriculum + layer expansion and see which one converges with the lower number of flop's.
Anonymous
8/2/2025, 7:36:04 PM No.106118315
nope
Anonymous
8/2/2025, 7:36:24 PM No.106118316
>>106118310
I find this to be a reasonable assessment.
Anonymous
8/2/2025, 7:37:03 PM No.106118322
>>106118310
The same was said when the first R1 was released.
I think there's still a bit of juice left to be squeezed.
Anonymous
8/2/2025, 7:37:05 PM No.106118324
>>106118310
I'm curious if anyone is trying multiepoch training yet, or if they've found it to be beneficial at all
Anonymous
8/2/2025, 7:37:08 PM No.106118325
>>106118310
They wouldn't need all that data if they had a better architecture.
Anonymous
8/2/2025, 7:37:24 PM No.106118329
>>106118310
They'd have all the data they want if they didn't filter 95% of it
Replies: >>106118339
Anonymous
8/2/2025, 7:38:35 PM No.106118339
>>106118329
Furry RP logs aren't worth training on.
Replies: >>106118352 >>106118360
Anonymous
8/2/2025, 7:40:18 PM No.106118352
>>106118339
If only that was all they were filtering
Anonymous
8/2/2025, 7:40:20 PM No.106118353
>>106118310
you could double the amount of training data assuming it was just the same old stuff and it wouldn't lead to any gains
the real challenge isn't data quantity, it's data quality
Replies: >>106118364 >>106118380
Anonymous
8/2/2025, 7:40:58 PM No.106118360
>>106118339
They are, though. The contrast between human to human RP logs and the various human/furry combination RP logs contains a shitload of information about sensations, textures, etc. Intangible things that would otherwise be impossible to incorporate into its knowledge.
Furry ERP logs are an absolute goldmine and if you think otherwise you're a dumb fucking pajeet that doesn't belong in this industry.
Replies: >>106118371 >>106118385 >>106118422 >>106118490
Anonymous
8/2/2025, 7:41:21 PM No.106118364
>>106118353
Everyone does love Phi.
Replies: >>106118384
Anonymous
8/2/2025, 7:42:30 PM No.106118371
>>106118360
what about pony rp logs?
Replies: >>106118379 >>106118390
Anonymous
8/2/2025, 7:43:26 PM No.106118379
>>106118371
That's a subset of furry
Replies: >>106118387
Anonymous
8/2/2025, 7:43:30 PM No.106118380
>>106118353
Actually, quality matters less the greater your scale. What does matter is dataset diversity. Due to how badly the current architectures are at generalization, you want to train on as many different things as possible. Ideally it'd cover the infinite set of possible queries a user could ask.
Replies: >>106118432
Anonymous
8/2/2025, 7:43:45 PM No.106118384
>>106118364
phi does illustrate this point very well though, the only problem is their definition of quality
Anonymous
8/2/2025, 7:43:50 PM No.106118385
>>106118360
That's like saying CP is worth training on, whereas any capable model would be able to infer what CP would look like from adult porn and knowledge of child physiology and behaviour psychology
Replies: >>106118504
Anonymous
8/2/2025, 7:43:55 PM No.106118387
>>106118379
no it's not
Replies: >>106118401
Anonymous
8/2/2025, 7:44:25 PM No.106118390
>>106118371
yeah why not, if it starts to degrade performance in other areas just add more parameters. aren't they even trying?
Anonymous
8/2/2025, 7:45:43 PM No.106118401
>>106118387
Explain why not.
Replies: >>106118407
Anonymous
8/2/2025, 7:46:44 PM No.106118407
>>106118401
Ponies stand on 4 legs.
Replies: >>106118412 >>106118415
Anonymous
8/2/2025, 7:47:33 PM No.106118412
>>106118407
And so do most animals.
Replies: >>106118448
Anonymous
8/2/2025, 7:47:55 PM No.106118415
>>106118407
Furries often do too
Replies: >>106118448
Anonymous
8/2/2025, 7:48:30 PM No.106118422
>>106118360
That reminded me how I used nu-235B asked for a kitsune and it gave her a snout without any prompting. I was surprised. But yeah they are using furry ERP logs now.
Anonymous
8/2/2025, 7:49:37 PM No.106118432
>>106118380
We don't train models on white noise. Why?
Replies: >>106118441 >>106118481
Anonymous
8/2/2025, 7:50:52 PM No.106118441
>>106118432
Because we do generally want them to somewhat know actual things.
Anonymous
8/2/2025, 7:51:08 PM No.106118448
>>106118415
>>106118412
You will never be a brony
Replies: >>106118454 >>106118468
Anonymous
8/2/2025, 7:52:13 PM No.106118454
>>106118448
Good.
Anonymous
8/2/2025, 7:54:41 PM No.106118468
>>106118448
Thanks?
Anonymous
8/2/2025, 7:56:03 PM No.106118481
>>106118432
You know as well as I do that when we talk about data "quality" in /lmg/, it's not about filtering data out that's filled with random characters and repeating lines. It's obvious that there still needs to be a level of meaning in the data. That's what we mean by dataset diversity, and what researchers mean by the same term when they talk about it as well.
Replies: >>106118541 >>106118554
Anonymous
8/2/2025, 7:57:00 PM No.106118490
>>106118360
>you're a dumb fucking pajeet that doesn't belong in this industry.
What industry, ERP industry? Underage ID verification cannot come soon enough.
Replies: >>106118504
Anonymous
8/2/2025, 7:58:34 PM No.106118504
>>106118385
I mean it depends where you draw the ethical line.
It's like human testing, animal testing, etc.
There are shortcuts that can boost our knowledge and capabilities as a species.
But where do you draw the line? And it's a rhetorical question really. Everyone feels differently.
Would I be comfortable with models being pretrained on fictional erotic literature involving children in order for models to better understand behavioral psychology, etc? Yeah. It's just fucking words, get a life, go touch grass. etc.
If you're talking about like actual CP images/videos, from seized evidence of child abuse- or downloaded off of the dark web- That's pushing it for me. But I would support that data being used to say... train models for CSAM detection. Like current CSAM detection APIs rely heavily on file hashes and individual files having a history. And I suppose vision models that we have now are good enough to play that role. But having like really solid training data, and a lot of it, would allow you to train much more smaller, more specialized, and more widely deployable solutions.
>>106118490
You're a dumb pajeet. You are not worthy of a frank discussion.
Anonymous
8/2/2025, 8:03:04 PM No.106118541
llama-3-dataset-quality2
llama-3-dataset-quality2
md5: 907cde1fad1ddb26b5df195d3892d34f🔍
>>106118481
There will never be "high quality ERP" because the industry has determined that pornography is "low quality data".
Replies: >>106118564
Anonymous
8/2/2025, 8:04:50 PM No.106118554
>>106118481
have we proven that adding new domains doesn't hurt other domains? will a model trained on furry erp be more likely to give false veterinary advice?
Replies: >>106118788 >>106118846
Anonymous
8/2/2025, 8:06:21 PM No.106118564
>>106118541
llama 3 was this bad already, imagine what happened in llama 4
good thing meta is history now
Replies: >>106118573
Anonymous
8/2/2025, 8:07:37 PM No.106118573
>>106118564
superintelligence will save llama 5
Replies: >>106118591
Anonymous
8/2/2025, 8:09:39 PM No.106118591
>>106118573
Which Wang will talk Zuck into handing to Altman
Replies: >>106118605
Anonymous
8/2/2025, 8:11:20 PM No.106118605
>>106118591
>Zuck replacing Nutella as Sam's moneypig patreon
I don't see that happening, but it would be hilarious if it did.
Anonymous
8/2/2025, 8:22:51 PM No.106118711
>>106118310
Nah you can always get even more fucking data. You just need to dig deeper. Go digging through old german Team Speak and Mumble servers. Digitize every bit of hand written letters you can find.
Anonymous
8/2/2025, 8:33:55 PM No.106118788
>>106118554
I'm too lazy to go retrieve it, but there was a paper that claimed that prepending each site's URL to each sample mitigated the issue of knowledge being confused like that, which according to the same paper, does happen. Most labs worth their salt should've already been doing this for a while now. But this issue should also only apply to a small scale dataset which I believe is what they tested in the study. I don't know or remember if anyone investigated it at a larger scale, but generally speaking in ML, greater scale cancels out a bunch of issues like these. Greater diversity (within domain) too.

For example, let's say you train on furry ERP and veterinarian advice. At first, the model will likely conflate the two contexts. But as you train for longer, the model will learn the more subtle differences and be able to tell that they are different context, and that it should predict something different. Additionally, if you train on more different variations of furry ERP and variations of veterinarian contexts, it'll better learn what the tells of a veterinarian context and a furry ERP context are.
Replies: >>106118846
Anonymous
8/2/2025, 8:40:03 PM No.106118846
>>106118554
>>106118788
Speaking as someone who's trained these kinds of models (tho not as big as these huge LLMs) the answer is that models actually do get confused a little bit. That's why chink LLMs will occasionally spew chinese characters in the middle of english output. But, more data is usually still better.
That is, if you want a vet model, (1T of vet data) > (0.5T of vet data + 0.5T furry porn) > (0.5T of vet data). These models are so large and training is so primitive that more data will almost always help unless the extra data has nothing in common at all with what you want. But that's usually not true.
Replies: >>106118888
Anonymous
8/2/2025, 8:44:40 PM No.106118888
>>106118846
I hope next generation datasets will be at least half furry porn
Anonymous
8/2/2025, 8:57:32 PM No.106118994
>>106117524
Getting stabbed to death with Miku!
Anonymous
8/2/2025, 9:17:16 PM No.106119129
Semi-relevant to the data quality discussion.

When Bad Data Leads to Good Models
https://arxiv.org/abs/2505.04741
In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy experiment to study how data composition affects the geometry of features in the representation space. Next, through controlled experiments with Olmo-1B models trained on varying ratios of clean and toxic data, we find that the concept of toxicity enjoys a less entangled linear representation as the proportion of toxic data increases. Furthermore, we show that although toxic data increases the generational toxicity of the base model, it also makes the toxicity easier to remove. Evaluations on Toxigen and Real Toxicity Prompts demonstrate that models trained on toxic data achieve a better trade-off between reducing generational toxicity and preserving general capabilities when detoxifying techniques such as inference-time intervention (ITI) are applied. Our findings suggest that, with post-training taken into account, bad data may lead to good models.
Replies: >>106119412
Anonymous
8/2/2025, 9:29:20 PM No.106119245
Threadly reminder Anthropic supports and enables human rights abuse and massive surveillance despite claiming to be 'humanists'.
Replies: >>106119258 >>106119292 >>106119396
Anonymous
8/2/2025, 9:31:09 PM No.106119258
>>106119245
members of the san francisco rationalist cult need to be institutionalized
Anonymous
8/2/2025, 9:35:26 PM No.106119292
>>106119245
Remember "do no evil" Google?
Anonymous
8/2/2025, 9:49:22 PM No.106119396
>>106119245
>corporation headed by a jew is evil
bigger chance of winning the lottery then ever guessing this fr
Anonymous
8/2/2025, 9:49:50 PM No.106119399
WanVideo_I2V_00036_thumb.jpg
WanVideo_I2V_00036_thumb.jpg
md5: 31a28f2d29a6bfaad197ee2b7443d105🔍
wan is pretty good
Replies: >>106119444 >>106119507 >>106119754 >>106119800 >>106119802
Anonymous
8/2/2025, 9:51:17 PM No.106119412
>>106119129
I remember people used to make loras from datasets where AI fucked up the hands so that they could learn to recognize the types of ways they would fuck up hands, and then apply the lora with negative weight. Seems like the same principle here.
Replies: >>106119509
Anonymous
8/2/2025, 9:52:58 PM No.106119426
Hey all some retard fucked up his smut writeup I told him I would read.
The concept is hot and the dialog is even good but the autist mixed 1rst 2nd and 3rd person language into the same scenes. Whats a quick option I can use that will read the whole thing and rewrite it in 3rd person?

I tried using perplexity.ai but it has a character limit and it also started making shit up.

AI newfag here, just a crumb of handholding please?
Anonymous
8/2/2025, 9:54:36 PM No.106119444
>>106119399
Gonna go show this to the resident pregfag on /b/. Should please him. Nice pregsex vid anon
Anonymous
8/2/2025, 10:01:02 PM No.106119507
>>106119399
Imagine playing a freeform H-game with scenes like these it'd just generate on the fly
Replies: >>106119666
Anonymous
8/2/2025, 10:01:11 PM No.106119509
>>106119412
Shut up you lying moralfag pile of shit.
Anonymous
8/2/2025, 10:02:53 PM No.106119524
If we were magically able to exhaustively try every combination of floating point weights, how smart do you think, say, a 70B transforemr model would be? How much smarter than our current models?
And if we were able to exhaustively try every possible program that would fit on the same computer that runs the LLM, how much smarter would the smartest possible program be than the smartest possible transformer model?
Replies: >>106119586 >>106119657
Anonymous
8/2/2025, 10:09:26 PM No.106119586
>>106119524
We don't know how "smart" either of those can be. If the model can generalize, it'd be smarter by default. Unless the program can generalize just as well, and then the program would be a general model.
It's a stupid question.
Anonymous
8/2/2025, 10:10:28 PM No.106119601
How is sex in 3steps?
Replies: >>106119610
Anonymous
8/2/2025, 10:11:14 PM No.106119610
>>106119601
1) in
2) out
3) wipe
Replies: >>106119615 >>106119625
Anonymous
8/2/2025, 10:11:55 PM No.106119615
>>106119610
/thread
Anonymous
8/2/2025, 10:12:52 PM No.106119625
>>106119610
>3) wipe
Argument optional?
Replies: >>106119631
Anonymous
8/2/2025, 10:13:23 PM No.106119631
>>106119625
He asked for 3 steps
Anonymous
8/2/2025, 10:15:02 PM No.106119649
speaking of transformers
>70b-q8
>9.3k tokens in and {{char}} starts to have some of {{user}}'s traits
god I hate transformers and their attention. if sama's attention sink won't work in gpt-oss I'll become a lecun shill.
Replies: >>106120350
Anonymous
8/2/2025, 10:15:33 PM No.106119652
Screenshot 2025-08-02 at 17-15-04 support GLM-4.5 MoE models by ddh0 · Pull Request #15026 · ggml-org_llama.cpp · GitHub
New PR boys.
Anonymous
8/2/2025, 10:15:37 PM No.106119654
No but rly is step3 better or worse than glm for sex?
Anonymous
8/2/2025, 10:15:52 PM No.106119657
>>106119524
>If we were magically able to exhaustively try every combination of floating point weights, how smart do you think, say, a 70B transforemr model would be? How much smarter than our current models?
Significantly. For such a model, it would likely be far more expressive and lose a significant degree of redundancy, so quantization to four bits would likely cripple the model to the point of uselessness. I'd expect there are significantly better models that are possible, but I don't think our current optimizers and training methods are well equipped to find these
>And if we were able to exhaustively try every possible program that would fit on the same computer that runs the LLM, how much smarter would the smartest possible program be than the smartest possible transformer model?
We're basically asking the inverse question of Kolmogorov Complexity here, which is "given a string of information, what is the minimum number of bits you need to represent it?" The inverse question is, "given a specified number of bits, how many strings from some specific domain can you represent?" I'm not sure, but I do think transformer architecture models probably aren't the most efficient in that regard. The theoretical best model might not even be "trainable" or "findable" in a practical sense, so we'd have to come as close as we could to it. I expect somebody will find an architecture that trains as well, increases efficiency gains, and gets closer to that optimal model eventually
Replies: >>106120103
Anonymous
8/2/2025, 10:17:34 PM No.106119666
ido03_8_thumb.jpg
ido03_8_thumb.jpg
md5: f15c70980c69014f6de111edea9ed973🔍
>>106119507
>imagine your freeform H-game can only do mosquito bites
Grim.
Anonymous
8/2/2025, 10:28:56 PM No.106119754
>>106119399
gross!
Anonymous
8/2/2025, 10:34:20 PM No.106119800
>>106119399
>The image was never in our databases.
Anonymous
8/2/2025, 10:34:41 PM No.106119802
>>106119399
>n-no anon! you are a schizo and mikutroons are normal
yeah right
Anonymous
8/2/2025, 10:47:19 PM No.106119933
>>106119921
>>106119921
>>106119921
Anonymous
8/2/2025, 11:05:29 PM No.106120103
>>106119657
For the first question, I think we could maybe make a 7B model as good as a 70B model, but nut anything much more dramatic than that.
The local minima in neural networks generally results in accuracy values that are fairly close to the accuracy values of global minima.
At least when taking into account non CoT models. If we take into account CoT then it becomes a much more nuanced question. It's even possible that our current approach to CoT is fundamentally wrong and the model should think in its own machine language rather than human language for optimal optimal accuracy, and we just don't have enough computational power to find that optimal internal language just from random variations and RL.
As for the second question, I'm not sure how much these formalisms reflect what we think of as intelligence. Suppose we ask an oracle to find the optimal program that runs on current hardware and produces the closest possible approximation to some language dataset within a certain time limit. Once you have it you can't just use it to infer on other datasets. Maybe it could be used as a base to get a more general model, or maybe it's a one off thing that's impossible to adapt to some other task. I don't think we know the answer to that question with our current theoretical knowledge. So in Solomonoff induction, is the intelligence the product of the oracle, or the oracle itself? Like I say, the product of the oracle might not be practically useful. And if it's the optimizer itself, by the no free lunch theorem the only way to get faster inference on some problems (for example those with low Kolgomorov complexity) is by sacrificing performance on other problems, for example those with high complexity. But I don't understand why the no free lunch theorem is true (it seems trivial to find counterexamples that are asymptotically slower for all cases, for example for a problem with description of length n, before finding the answer compute Ack(n)) so I might be wrong.
Anonymous
8/2/2025, 11:36:49 PM No.106120350
>>106119649
To be fair, 70B is a pretty old model now. Have you tried K2, Deepseek, etc?