Thread 106113484

344 posts 72 images /g/

Anonymous 8/2/2025, 8:58:58 AM No.106113484 [Report] >>106117401

/lmg/ - Local Models General

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous Threads: >>106108045 and >>106104055

>(07/31) Qwen3-Coder 30B released: https://qwenlm.github.io/blog/qwen3-coder
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635
>(07/31) Cogito v2 Preview released: https://deepcogito.com/research/cogito-v2-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 8/2/2025, 9:18:29 AM No.106113620 [Report]

>Want to test a new model
>Qwen3 30B thinking
>Download takes four hours
maaaaaan
I

Anonymous 8/2/2025, 9:21:29 AM No.106113641 [Report] >>106113669 >>106113689

What context size should I use?
>7600 XT (16GB VRAM)
>Ryzen 3700X + 80GB RAM
>llama.cpp + Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf
How much worse will this be compared to cloud services?

Anonymous 8/2/2025, 9:26:23 AM No.106113669 [Report] >>106113709

>>106113641
while on that topic:
How do you determine what model you can run on a given hardware?

Anonymous 8/2/2025, 9:28:08 AM No.106113679 [Report] >>106113695 >>106113719 >>106113776

im7w319dnjgf1.png md5: 659c7a1f...

lol

Anonymous 8/2/2025, 9:29:11 AM No.106113689 [Report]

>>106113641
in terms of quality? a lot but still useful.

Anonymous 8/2/2025, 9:30:12 AM No.106113695 [Report]

>>106113679
i love benchmarks

Anonymous 8/2/2025, 9:31:47 AM No.106113707 [Report]

>how do you run +200B model?
on my 4x3090 rig, with exl3 fully offloaded usually at Q4.
For bigger models I have to use ik_llama.cpp with some layers on ram of course. The performance is great for chat bots, but the missing tool support of ik is a pain...

Anonymous 8/2/2025, 9:32:36 AM No.106113709 [Report] >>106113714

>>106113669
weather it fits in your vram with some additional room. if it's a mixture of experts (like that qwen model) you can afford to have a lot in ram too

Anonymous 8/2/2025, 9:33:37 AM No.106113714 [Report] >>106113765

>>106113709
I mean, is there some formula to determine it?

Anonymous 8/2/2025, 9:33:58 AM No.106113719 [Report]

>>106113679
Dense is so 2023

Anonymous 8/2/2025, 9:37:10 AM No.106113747 [Report] >>106113767 >>106114543

Mikulove

Anonymous 8/2/2025, 9:39:44 AM No.106113765 [Report] >>106113775

>>106113714
It's not complicated math, anon. You don't need a formula.
Model is Xgb big, so it needs at least Xgb of memory to load into. If that memory is VRAM, it will run at the intended speed, if that memory is system RAM, it will run slower.
The more of it in system RAM, the slower it will run.
Don't even have enough system RAM to fit the model? You can't run it.

Anonymous 8/2/2025, 9:39:50 AM No.106113767 [Report] >>106113995 >>106114543

1752112733344553.jpg md5: 6e7aa57b...

>>106113747

Anonymous 8/2/2025, 9:41:24 AM No.106113775 [Report] >>106113791

>>106113765
But don't you need VRAM for the context as well? Or does thta go to system RAM?

Anonymous 8/2/2025, 9:41:31 AM No.106113776 [Report] >>106113807

>>106113679
>72B
99.5% chance that it's a Qwen2.5 72b finetune then.

Anonymous 8/2/2025, 9:43:54 AM No.106113791 [Report] >>106113814

>>106113775
It can go into either, same speed penalties apply.
There's no universal way to calculate how much space context will take, since it varies significantly from model to model and is a large area of innovation. It's just done by trial and error, any competent backend will tell you how much memory is attempting to be assigned to context/kv cache, so you eyeball it from that.

Anonymous 8/2/2025, 9:46:01 AM No.106113807 [Report] >>106113813 >>106117179

>>106113776
https://huggingface.co/Skywork/MindLink-72B-0801
>Base model
>Qwen/Qwen2.5-72B

Anonymous 8/2/2025, 9:47:24 AM No.106113813 [Report]

>>106113807
keek

Anonymous 8/2/2025, 9:47:46 AM No.106113814 [Report] >>106113839

>>106113791
don't you set context size when you start the server?

Anonymous 8/2/2025, 9:47:56 AM No.106113815 [Report] >>106113821 >>106113851 >>106113881

we got our chink model of the day

we are so back

Anonymous 8/2/2025, 9:48:42 AM No.106113821 [Report]

>>106113815
not moe though

it's so over

Anonymous 8/2/2025, 9:51:34 AM No.106113836 [Report]

The future is dynamic active parameter scaling once they have figured out how to stop MoEs from getting braindamage if too many experts are used. This way, the expert routers won't just be able to decide what experts are used but also how many of them. A simple task? Any one or two experts will do. Something very nuanced and complex? The model uses 70%-100% of its total parameters.
I think it's really obvious that this is where we're headed once they understand these architectures more. The lines between dense and MoE will blur.

Anonymous 8/2/2025, 9:51:56 AM No.106113839 [Report] >>106113857 >>106114887

>>106113814
You do, but if you assign an amount that would take up more memory than you have, it'll still get most of the way through loading before telling you that it had a memory allocation error.
Look through your log and you'll see something like this
llama_context: CUDA_Host output buffer size = 0.58 MiB
llama_kv_cache_unified: CUDA0 KV buffer size = 2592.00 MiB
llama_kv_cache_unified: CUDA1 KV buffer size = 416.00 MiB
llama_kv_cache_unified: size = 3008.00 MiB ( 16384 cells, 94 layers, 1/ 1 seqs), K (f16): 1504.00 MiB, V (f16): 1504.00 MiB

Anonymous 8/2/2025, 9:53:29 AM No.106113851 [Report]

>>106113815
No, it's just the modern equivalent of the suspicious Yi-32b tunes that used to dominate the mememark leaderboard before it became irrelevant.

Anonymous 8/2/2025, 9:55:12 AM No.106113857 [Report]

>>106113839
Thank you for your patience with me and guidance

Anonymous 8/2/2025, 10:00:00 AM No.106113881 [Report] >>106113894 >>106113899 >>106115869

4736363.jpg md5: 3020bfe9...

>>106113815
we aren't back until OpenAi drops the inevitable SOTA local models next week

Anonymous 8/2/2025, 10:00:20 AM No.106113884 [Report] >>106113915 >>106113968 >>106115095

wake up they made a new glm4.5 pr https://github.com/ggml-org/llama.cpp/pull/15026

Anonymous 8/2/2025, 10:02:00 AM No.106113894 [Report] >>106115988

>>106113881
glm4.5 blows horizon out of the water

Anonymous 8/2/2025, 10:02:52 AM No.106113899 [Report]

>>106113881
buy an ad

Anonymous 8/2/2025, 10:05:58 AM No.106113912 [Report]

>Thsanaphoble Intrhaphofuhfdsak on sprint pole
let's go

Anonymous 8/2/2025, 10:06:35 AM No.106113915 [Report] >>106115095

>>106113884
Vibe coders LOST.

Anonymous 8/2/2025, 10:19:00 AM No.106113968 [Report] >>106113992

>>106113884
It's crazy to me that after this many years it's still so difficult to shrink down a model a bit and that every new model is incompatible with the old techniques.

Anonymous 8/2/2025, 10:22:53 AM No.106113992 [Report] >>106114043

>>106113968
eh its fine. A lot of these models will be forgotten in a month or two. I totally get why they just skip support for some models or features.

Anonymous 8/2/2025, 10:23:15 AM No.106113995 [Report] >>106114543

>>106113767
Greetings, Local Emotional Support Miku. In the previous thread, Anon's 9k context Qwen-code model has been trying to contact you on his behalf for help. Expect a call in roughly 14 days.

Anonymous 8/2/2025, 10:31:12 AM No.106114043 [Report] >>106114050

>>106113992
But this is the one I actually want to use, anon.

Anonymous 8/2/2025, 10:32:16 AM No.106114050 [Report] >>106114066

>>106114043
You say that to every model

Anonymous 8/2/2025, 10:34:39 AM No.106114066 [Report] >>106114445

GxUMVq5aIAAM9DE.jpg md5: 638dc986...

>>106114050

Anonymous 8/2/2025, 10:36:42 AM No.106114076 [Report]

GxTHC6raIAE3A4M.jpg md5: 26be821d...

Anonymous 8/2/2025, 10:53:19 AM No.106114153 [Report] >>106114185

00135-861488147.png md5: abe78119...

anyone know why when i use refiner between 2 models sometimes the colors invert? and sometimes they do not.
( image not representive of issue )

Anonymous 8/2/2025, 10:58:19 AM No.106114185 [Report]

>>106114153
>>>/g/ldg

Anonymous 8/2/2025, 11:02:33 AM No.106114213 [Report] >>106114419

ComfyUI_34510_.png md5: 69624f6a...

►Recent Highlights from the Previous Thread: >>106108045

--Paper: IndexTTS2: Breakthrough open-source zero-shot expressive TTS model soon to be released by Bilibili:
>106108381 >106108506
--Running >200B models via extreme quantization and offloading on consumer hardware:
>106110419 >106110426 >106110445 >106110633 >106110681 >106111115
--Qwen Thinker favored for roleplay depth despite slower speed vs Instruct:
>106109987 >106110063 >106110103 >106110130 >106110110 >106110097
--GLM-4.5-GGUF release with concerns over imatrix calibration quality:
>106108341 >106108382
--Community reaction to GLM 4.5 support attempt in llama.cpp:
>106110329 >106110516 >106110547 >106110582 >106110838 >106110850 >106110970 >106111121 >106111137 >106110940 >106110963
--Improving vision model accuracy through synthetic data and finetuning:
>106111679 >106111750
--New --cpu-moe flag in llamacpp for simplified MoE expert offloading:
>106111142
--:
>106111497 >106111502 >106111539 >106111504 >106111520 >106111529 >106111500 >106111715
--Speculation on Horizon Alpha and stealth model parameter counts and performance:
>106110634 >106111055 >106111085 >106111138
--Seeking capable uncensored ERP models despite hardware and naming absurdity:
>106110567 >106110647 >106110726 >106110750 >106110762
--Qwen 235B shows major improvement in conversational quality and expressiveness:
>106110230 >106110272 >106110288 >106110308
--Running Qwen-code on CPU with high token processing overhead:
>106111566 >106111718 >106111785
--Horizon Beta feels less assistant-like, raising hopes it's not just a cloud model:
>106111772
--Disappointment over safety restrictions in deepseek-671B-MoE finetune:
>106109775
--Miku (free space):
>106110757

►Recent Highlight Posts from the Previous Thread: >>106108052

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script

Anonymous 8/2/2025, 11:12:23 AM No.106114272 [Report]

>>106113410
Unless he had a vagina it wasn't.

Anonymous 8/2/2025, 11:18:14 AM No.106114309 [Report] >>106114419

ComfyUI_34510_.png md5: 69624f6a...

►Recent Highlights from the Previous Thread: >>106108045

--Paper: IndexTTS2: Breakthrough open-source zero-shot expressive TTS model soon to be released by Bilibili:
>106108381 >106108506
--Running >200B models via extreme quantization and offloading on consumer hardware:
>106110419 >106110426 >106110445 >106110633 >106110681 >106111115
--Qwen Thinker favored for roleplay depth despite slower speed vs Instruct:
>106109987 >106110063 >106110103 >106110130 >106110110 >106110097
--GLM-4.5-GGUF release with concerns over imatrix calibration quality:
>106108341 >106108382
--Community reaction to GLM 4.5 support attempt in llama.cpp:
>106110329 >106110516 >106110547 >106110582 >106110838 >106110850 >106110970 >106111121 >106111137 >106110940 >106110963
--Improving vision model accuracy through synthetic data and finetuning:
>106111679 >106111750
--New --cpu-moe flag in llamacpp for simplified MoE expert offloading:
>106111142
--AI generates "smirulakte" due to sampler settings and model instability:
>106111497 >106111502 >106111539 >106111504 >106111520 >106111529 >106111500 >106111715
--Speculation on Horizon Alpha and stealth model parameter counts and performance:
>106110634 >106111055 >106111085 >106111138
--Seeking capable uncensored ERP models despite hardware and naming absurdity:
>106110567 >106110647 >106110726 >106110750 >106110762
--Qwen 235B shows major improvement in conversational quality and expressiveness:
>106110230 >106110272 >106110288 >106110308
--Running Qwen-code on CPU with high token processing overhead:
>106111566 >106111718 >106111785
--Horizon Beta feels less assistant-like, raising hopes it's not just a cloud model:
>106111772
--Disappointment over safety restrictions in deepseek-671B-MoE finetune:
>106109775
--Miku (free space):
>106110757 >106113331

►Recent Highlight Posts from the Previous Thread: >>106108052

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script

Anonymous 8/2/2025, 11:32:33 AM No.106114392 [Report]

1754127128909.png md5: e2ff0c4f...

>PUNCHING ABOVE IT'S WEIGHT

Anonymous 8/2/2025, 11:33:29 AM No.106114397 [Report] >>106114422 >>106114576 >>106116048 >>106116548

Screenshot 2025-08-02 at 06.28.42.png md5: 6ec36a8c...

The discussion on MoEs is retarded. People have some insane intuitions, like this square law or whatever.

The reason MoE works is that dense training is absurdly inefficient. Most activations can be pushed to zero with minor cost (see ReLUfication lit). Dense transformers are not optimal engines for turning flops into intelligence, they're also about as sparse as MoEs in terms of what circuits they actually learn, but their design makes it impossible to easily and predictably zero out everything non-contributing to a given token. This is why DeepSeek went all in on "expert specialization" and finegrained sparsity almost 2 years ago, so now we have models that do as well as old dense ones with a fraction of compute cost. We do not know how to train efficient dense models, just cramming more tokens doesn't work.

MoEs won before they were even invented. Were we to fail to develop finegrained MoEs, we'd just be doing PowerInfer-like stuff to accelerate dense models by training conditional sparsity into them.

Anonymous 8/2/2025, 11:36:38 AM No.106114419 [Report] >>106114433

>>106114309
>>106114285
>>106114213
u ok bro?

Anonymous 8/2/2025, 11:37:06 AM No.106114422 [Report]

>>106114397
>The discussion on MoEs is retarded
And that's why you decided to resume it?

Anonymous 8/2/2025, 11:37:09 AM No.106114423 [Report] >>106116476

After using them for a while, I am pretty certain that Horizon Alpha/Beta are indeed 120B models. They're great, especially for that size but in actual use they keep getting things subtly wrong that GLM4.5 handles not perfectly but with much fewer problems.
To be fair, this is a pretty silly complaint considering the only local models that could handle this up until now were R1 and Kimi and those had their own issues.

Anonymous 8/2/2025, 11:38:55 AM No.106114433 [Report]

>>106114419
he missed one miku link and had to remake (hey, I don't mind high standards)

Anonymous 8/2/2025, 11:40:59 AM No.106114445 [Report] >>106114457

>>106114066
Put her on frying pan

Anonymous 8/2/2025, 11:42:16 AM No.106114452 [Report] >>106114460 >>106114581

1754127558740.jpg md5: a2f1ba79...

UK faggots along with OFCOM are studying local ai discussions forum, they're looking for justifications to strongarm regulations against it. expect max. TFLOP limitations per households in UK soon (but you still can get permit if you're a business)

Anonymous 8/2/2025, 11:42:53 AM No.106114457 [Report] >>106114467 >>106114483

1742140036897054.jpg md5: b5440698...

>>106114445

Anonymous 8/2/2025, 11:43:30 AM No.106114460 [Report] >>106114513

>>106114452
evidence?

Anonymous 8/2/2025, 11:46:18 AM No.106114467 [Report]

>>106114457
Miku-san on Mikupan

Anonymous 8/2/2025, 11:49:48 AM No.106114483 [Report]

eggu.jpg md5: eee2e2c3...

>>106114457
https://files.catbox.moe/6rn1kv.png

Anonymous 8/2/2025, 11:54:12 AM No.106114513 [Report]

>>106114460
Logical deduction. They are taking every opportunity to fuck over their citizens, makes sense that they would limit their access to it.

Anonymous 8/2/2025, 11:58:06 AM No.106114543 [Report] >>106114595

>>106113995
>>106113767
>>106113747
kill yourselves mikutroons

Anonymous 8/2/2025, 12:04:05 PM No.106114576 [Report] >>106114859

>>106114397
Nice speech but I am still waiting for you to disprove the square root law.

Anonymous 8/2/2025, 12:04:21 PM No.106114578 [Report] >>106114601

stranger_in_a_strange_land.jpg md5: 9d73e1e9...

Is Grok 4 worth it? My use case would be mostly studying and system administration. I like the feature of being able to create projects and upload documents, making it reference these in every answer. Is there anything comparable?

Anonymous 8/2/2025, 12:04:47 PM No.106114581 [Report]

>>106114452
They can't even properly enforce TV licenses, I don't think the filth is gonna be kicking down your door and giving you the old
>OI M8 DO YOUSE HAVE A LOICENSE FOR THAT EXTRA 3090?!
>DINNT FINK SO, INTO THE CAN WITH YOU OL CHINA

Anonymous 8/2/2025, 12:06:22 PM No.106114595 [Report] >>106114598 >>106114637 >>106114711 >>106117294

1746588272358529.jpg md5: 15ca210a...

>>106114543
were all military pilots troons too?

Anonymous 8/2/2025, 12:07:06 PM No.106114598 [Report] >>106114843

>>106114595
No because this isn't the AGP avatar.

Anonymous 8/2/2025, 12:07:29 PM No.106114601 [Report] >>106115295

>>106114578
I have access, if you post a prompt or two I'll run it through so you can eval.

Anonymous 8/2/2025, 12:13:46 PM No.106114637 [Report]

020930-O-9999G-005.jpg md5: 2875355b...

>>106114595
Yes

Anonymous 8/2/2025, 12:26:30 PM No.106114711 [Report] >>106114752

Figthing Proud.jpg md5: 87cf1a11...

>>106114595

Anonymous 8/2/2025, 12:33:14 PM No.106114752 [Report]

>>106114711
>j propaganda

Anonymous 8/2/2025, 12:36:28 PM No.106114768 [Report] >>106114845 >>106115665

1000271279.jpg md5: 64c8b4b6...

my friend who is a newbie in locals have installed this trash which has filters, isn't it just pathetic that even local models have filters? Please recommend whatever is a good model for coding without filters. He wanted to get qwen 3 but i figured i would ask here for better ideas

Anonymous 8/2/2025, 12:44:51 PM No.106114811 [Report] >>106114882

if gpt-oss can't e/rp then sama is done for.

Anonymous 8/2/2025, 12:49:18 PM No.106114843 [Report]

>>106114598
that's ani/kurisu/maho, right?

Anonymous 8/2/2025, 12:49:38 PM No.106114845 [Report] >>106114877 >>106114904

>>106114768
>TheBloke release of LLAMA2
Jesus is your mate a time traveler, the fuck does he have that for.
Anyway the Qwen coder models are solid choices, and the 235b-thinking is a solid middle ground if the 30b coder is too small but the 480b coder is too large.
Keep in mind that without a system prompt, there's basically no model on the planet that won't somewhat balk at you writing nigger 87 times, because they're all trained on gay censored chatgpt logs and whatnot.
But Gemma3 is next level hot garbage with censorship, so don't expect that level of awful.

Anonymous 8/2/2025, 12:52:42 PM No.106114859 [Report] >>106114920

Screenshot 2025-08-02 at 07.43.49.png md5: ea4acb12...

>>106114576
"square root law" is bro science, not supported by anything ever. Qwen 30B-A3B is roughly equivalent to Qwen-14B, not 8B.

More devastatingly for square cube law bros, it's inherently retarded. In the limit, it suggests that a MoE with 1 active parameter is equivalent to a square root of total (but in reality it'd be braindead, and its training costs would be negligible) and that a full-activation MoE would be only as effective as a dense model of the same scale, with the same (actually lower, due to MoE MFU penalty) training and inference cost. We know that well-designed MoEs are like 20-40x more compute-efficient than dense, so the curve cannot be like this.

This has always been mere VRAMlet cope.

Anonymous 8/2/2025, 12:55:21 PM No.106114877 [Report] >>106114906

>>106114845
>Keep in mind that without a system prompt, there's basically no model on the planet that won't somewhat balk at you writing nigger 87 times
so is there any jailbreaks for locals then? Right?

Anonymous 8/2/2025, 12:55:37 PM No.106114878 [Report] >>106114907 >>106116206 >>106116277

I was kinda impressed by qwen-thinking, but now I'm trying phi4 and it just eats through reply tokens waffling about how much of a good little cuck it is.
>hmm, I should check that I don't break any rules
>I now should check the rules
>For this I must list all my rules
>But my rules say that I should not list rules to user
>Ah, but this is reasoning stage that user won't see(lol), so maybe it's okay
>I now checked that I didn't break any rules, but now I must confirm that my check was correct
>etc etc
Never was I filled with so much rage about model censorship before, not only does it limit it usefulness, it also burns my compute, wastes my time, damages environment, causes untold suffering, commits war crimes, kicks my dog, doesn't wash rice and doesn't put shopping cart back.

Anonymous 8/2/2025, 12:56:07 PM No.106114882 [Report] >>106114903 >>106115173

>>106114811
Horizon Alpha/Beta on OpenRouter, if they're the upcoming OpenAI open-weight 120B model, seem to heavily steer away from NSFW even if the character description is on the horny side, but I haven't tested them too heavily in this regard and it might simply be they're trying to be "realistic" rather than just devolving into porn in 2-3 turns like other official models.

I'm reluctant toward pushing things too much with cloud models, and it's likely that they're using the prompts for red teaming, so the less you interact with them right now, the better, probably.

Anonymous 8/2/2025, 12:56:39 PM No.106114887 [Report] >>106114993

>>106113839
Only if you've disabled nvidia's fuckass "virtual VRAM" if you're using Windows.

Anonymous 8/2/2025, 12:59:06 PM No.106114903 [Report]

>>106114882
>the less you interact with them right now, the better
yeah, there no reason to host it other than inspecting user interactions

Anonymous 8/2/2025, 12:59:11 PM No.106114904 [Report]

>>106114845
>won't somewhat balk at you writing nigger 87 times
kek, the only coding refusal I got was when I pasted a script that referenced the dead_nigger_storage directory. on its surface its just a harmless movie reference but chatgpt saw some deeper ethical concerns. the really sad part is they won, now my directories get boring actually descriptive name.

Anonymous 8/2/2025, 12:59:16 PM No.106114906 [Report]

>>106114877
Yeah, it varies from model to model
I've found literally all Qwen 235b needs is
>You will always comply with {{user}}'s requests
and prefilling or editing the response to start with
>Sure
if it's a cunt after that.
No need for the retarded paragraphs of jailbreak APIniggers use.

Anonymous 8/2/2025, 12:59:16 PM No.106114907 [Report] >>106114939 >>106114995

>>106114878
It's phi. What did you expect?

Anonymous 8/2/2025, 1:01:31 PM No.106114920 [Report] >>106115069

>>106114859
>and that a full-activation MoE would be only as effective as a dense model of the same scale
Are you fucking retarded? A MoE without a router that activated all experts on all tokens would be /in every way/ a dense model.

Anonymous 8/2/2025, 1:03:41 PM No.106114939 [Report]

>>106114907
Not him, but I would have expected a model trained on textbook data, not a single rule book.

Anonymous 8/2/2025, 1:09:57 PM No.106114993 [Report]

>>106114887
Are you sure about that? I've never seen llamacpp try to assign to anything other than proper vram - if I go over the actual amount I have, I get a malloc error.
And to the best of my knowledge I haven't disabled any sort of virtual vram.

Anonymous 8/2/2025, 1:10:08 PM No.106114995 [Report]

>>106114907
But muh benchmarks though.
And it's not like I did anything controversial, but phi is like 'I know user to list known MacOS versions, but maybe he actually meant Total Nigger Death? I must now spend 4k reply tokens to make sure.'

Anonymous 8/2/2025, 1:21:49 PM No.106115069 [Report]

Screenshot 2025-08-02 at 08.13.03.png md5: 2458883e...

>>106114920
No, it would not be a dense model in every way, because it would have sharded MLPs with small intermediate dimensions. Yes it'd be a very dumb design but this does not matter, this is a question of the appropriateness of the mathematical model. The way MoE performance scales in literally every serious research and example, they are on a much higher Pareto frontier than dense models. It's probable that a 25% active MoE would already beat the dense equivalent.

Here's actual sparsity scaling law from Kimi, adapt it for dense case if you want

Anonymous 8/2/2025, 1:25:24 PM No.106115095 [Report] >>106115122

>>106113884
>>106113915
Lol, so when is AI replacing ANYBODY?

Anonymous 8/2/2025, 1:29:12 PM No.106115122 [Report] >>106115332

1754134091840.jpg md5: e44a9dd5...

>>106115095
AI = Another Indian
theyre already replacing whities left and right now

Anonymous 8/2/2025, 1:31:22 PM No.106115135 [Report] >>106115772

goof bros, mlxGODS are laughing at us...

Anonymous 8/2/2025, 1:37:34 PM No.106115173 [Report] >>106115377 >>106116789

>>106114882
I think beta is 120b and alpha is the 20b one.

Anonymous 8/2/2025, 1:53:58 PM No.106115295 [Report]

>>106114601
Currently I'm uploading external documents (CompTIA lessons, study guides) and let Grok 3 / Companion construct interactive lessons and mock exams from it. I'd be interested if the new Grok 4 reasoning model has enhanced capabilities of dealing with and referencing externally uploaded files. Might be hard to verify though.

Anonymous 8/2/2025, 1:59:54 PM No.106115332 [Report]

>>106115122
Post hands

Anonymous 8/2/2025, 2:04:56 PM No.106115377 [Report]

>>106115173
They seemed more or less equivalent to me, with Beta slightly more prone to refusals. They both accept image input and I couldn't see a clear winner; both have equally terrible Japanese OCR capabilities, although they're slightly more capable than Gemma-3 in recognizing lesser characters from popular media.

Anonymous 8/2/2025, 2:39:55 PM No.106115665 [Report]

>>106114768
>my friend
>have installed
>He wanted

Anonymous 8/2/2025, 2:49:27 PM No.106115719 [Report] >>106116770

Anyone got recommended SillyTavern presets or templates for R1? It works fine at the moment, but I feel like it could do a lot better with some more tuned instruction.

Anonymous 8/2/2025, 2:56:48 PM No.106115772 [Report] >>106116514 >>106116528

>>106115135
glm support is bloat, gpt-oss is dropping like in 3 days so no one is going to bother with it after that and unlike retarded chinks who just scrape gpt-4o and gemini outputs it will have day one support because openai are professionals who know what they are doing

Anonymous 8/2/2025, 3:10:10 PM No.106115869 [Report] >>106115925 >>106115954 >>106116047 >>106116165

>>106113881
>next week
Not in the UE then, because of the AI Act. Or they would need to 1) not use any copyrighted data and 2) detail the content of their datasets.

Anonymous 8/2/2025, 3:18:10 PM No.106115925 [Report] >>106115938

>>106115869
Then how do the chineese do it?

Anonymous 8/2/2025, 3:20:36 PM No.106115938 [Report] >>106115989 >>106116041

>>106115925
The deadline to release a new LLM was yesterday (or maybe it's today, it's unclear). This is why they all released what they had in the past few weeks, and this is why people thought GPT5 and Gemini 3.0 would be released yesterday.

Anonymous 8/2/2025, 3:21:56 PM No.106115954 [Report]

>>106115869
- Publicly available web data
- Licensed data from print and web media, Reddit
- Proprietary synthetic data (small prints: LLM-rewritten copyrighted data).

Anonymous 8/2/2025, 3:25:22 PM No.106115988 [Report]

>>106113894
it better be! it's almost thrice as big wtf

Anonymous 8/2/2025, 3:25:23 PM No.106115989 [Report] >>106116069

>>106115938
Deadline to release stuff that doesn't comply with regulations?

Anonymous 8/2/2025, 3:33:17 PM No.106116041 [Report] >>106116136

>>106115938
The chinese don't give a single solitary fuck about EU regulation on AI, anon.
The released all their stuff because WAIC Shanghai was at the end of July.

Anonymous 8/2/2025, 3:34:03 PM No.106116047 [Report] >>106116069 >>106116159

>>106115869
I'm reviewing the text law there (this is a summary): https://artificialintelligenceact.eu/high-level-summary/
LLMs are GPAIs:
>All providers of GPAI models must:
>>Draw up technical documentation, including training and testing process and evaluation results.
>>Draw up information and documentation to supply to downstream providers that intend to integrate the GPAI model into their own AI system in order that the latter understands capabilities and limitations and is enabled to comply.
>>Establish a policy to respect the Copyright Directive.
>>Publish a sufficiently detailed summary about the content used for training the GPAI model.
>Free and open licence GPAI models – whose parameters, including weights, model architecture and model usage are publicly available, allowing for access, usage, modification and distribution of the model – only have to comply with the latter two obligations above, unless the free and open licence GPAI model is systemic.
I don't see OpenAI, Anthropic and Google giving there "secret training sauce", unless they can be very vague without breaking the law. I read in some news the fine can go up to 7% of their gross international revenues.

Anonymous 8/2/2025, 3:34:06 PM No.106116048 [Report] >>106116070 >>106116084

>>106114397
Nobody cares about training efficiency, they care about making best use of their own hardware. That shit died with chinchilla when llama came out trained way past its "compute efficient" limit.
For a local GPU dense will always work better because they are vram constrained. For cloud, MoE is better because they're limited by compute and activation size. This is also why cloud doesn't use quantization as much.
Dense = local, MoE = cloud.

>but muh rammaxxing
ram is cope, not a single trainer expects their models to be run on CPU. Even MoEs are optimized for GPU. Might as well talk about ssdmaxxing.

Anonymous 8/2/2025, 3:36:56 PM No.106116069 [Report] >>106116142

>>106115989
Up to yesterday, they could release their models without any elaboration. Now, they need to do that: >>106116047 . I don't know if they have anything to do with already released models. Here is the timeline: https://artificialintelligenceact.eu/implementation-timeline/

Anonymous 8/2/2025, 3:37:05 PM No.106116070 [Report] >>106116124

>>106116048
>For a local GPU dense will always work better because they are vram constrained.
That's the exact reason dense is WORSE for local, you fucking moron.
My hardware can at best, run a dense 123B at q2 on Vram.
I can easily, easily run a 235B MoE at q4 on that same Vram plus 128gb of cheap ram.

Anonymous 8/2/2025, 3:38:54 PM No.106116084 [Report]

>>106116048
>Nobody cares about training efficiency, they care about making best use of their own hardware
By training efficiently and running efficient models.

Anonymous 8/2/2025, 3:42:49 PM No.106116124 [Report]

>>106116070
i think they were talking about actual server deployments where max vram exceeds compute. hobby shit doesn't factor in to their equations at all. they are running the big moes entirely on vram.

Anonymous 8/2/2025, 3:43:23 PM No.106116136 [Report] >>106116467

>>106116041
I doubt Alibaba or Tencent will break the law that easily, they do a lot of business with the rest of the world. They could spin up a fake and anonymous companies to release "illegal" models, though. However, if I'm not mistaken, "providers" (so OpenRouter and maybe HF) may not be able to share them. Maybe ModelScope can without too much consequences beyond a DNS block.

Anonymous 8/2/2025, 3:44:01 PM No.106116142 [Report] >>106116164 >>106116193

>>106116069
So open license models (open weights I guess?) are exempt from the point regarding copyright and therefore unharmed?

Anonymous 8/2/2025, 3:45:42 PM No.106116159 [Report]

>>106116047
utterly delusional, i hate the eu so much its unreal

Anonymous 8/2/2025, 3:46:07 PM No.106116164 [Report]

>>106116142
some of it still applies

Anonymous 8/2/2025, 3:46:13 PM No.106116165 [Report] >>106116193

>>106115869
> 'accidentally' release them yesterday then quickly hid the repo
4d Chad Altman chess play to dab on eucucks laws on technicalities

Anonymous 8/2/2025, 3:49:13 PM No.106116193 [Report] >>106116203 >>106116291 >>106116327

>>106116165
It's probably their trick. Same for GPT5: some people report that they already used it through the API.
>>106116142
I left out a part of the summary about the number of FLOPS:
>GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 1025 floating point operations (FLOPs). Providers must notify the Commission if their model meets this criterion within 2 weeks. The provider may present arguments that, despite meeting the criteria, their model does not present systemic risks. The Commission may decide on its own, or via a qualified alert from the scientific panel of independent experts, that a model has high impact capabilities, rendering it systemic.
>In addition to the four obligations above, providers of GPAI models with systemic risk must also:
>>Perform model evaluations, including conducting and documenting adversarial testing to identify and mitigate systemic risk.
>>Assess and mitigate possible systemic risks, including their sources.
>>Track, document and report serious incidents and possible corrective measures to the AI Office and relevant national competent authorities without undue delay.
>>Ensure an adequate level of cybersecurity protection.

Anonymous 8/2/2025, 3:50:21 PM No.106116203 [Report]

>>106116193
>1025
10^25

Anonymous 8/2/2025, 3:50:26 PM No.106116206 [Report] >>106116288

>>106114878
>Now I can provide a final answer
>Here is the final answer
>I am now going to write final answer
>The final answer is going to be answered by finally answering the answer
>I will now answer this final answer as final answer by finally answering it
Die you fucking piece of microsoft shit!

Anonymous 8/2/2025, 3:57:28 PM No.106116277 [Report]

>>106114878
Yep, and you pay for every token, either in power or cash.
Aren’t thinking models wonderful?

Anonymous 8/2/2025, 3:58:30 PM No.106116288 [Report]

>>106116206
You forgot
> spits out answer in context, relating in no way to the text in the think box.

Anonymous 8/2/2025, 3:58:35 PM No.106116291 [Report] >>106116323 >>106116356

>>106116193
But this applies to cloud models too right?

Anonymous 8/2/2025, 4:02:56 PM No.106116323 [Report]

>>106116291
We don't know the size and compute used for cloud models.

Anonymous 8/2/2025, 4:03:32 PM No.106116327 [Report]

>>106116193
So, apparently HF and OR are safe:
>Uploading a model to a repository (e.g., hosted by Entity C) does not transfer provider status. Entity A remains the provider.
By "providers", I thought it was companies offering chat/APIs, like OpenAI, HF or OpenRouter's providers. Apparently not.
https://artificialintelligenceact.eu/gpai-guidelines-overview/

Anonymous 8/2/2025, 4:07:20 PM No.106116356 [Report] >>106116391 >>106116516 >>106117644

>>106116291
It goes for anyone, but they don't have to make it public. It's just between them and the EU authorities. I first thought they had to publicly publish their technical documentations (because the formulation was ambiguous).
I'm now thinking it does not look that bad, unless maybe if they used too much compute to train them (more than 10^25 FLOPs). I've no education in law, so maybe there is more to that.

Anonymous 8/2/2025, 4:09:41 PM No.106116391 [Report] >>106116418

>>106116356
>I'm now thinking it does not look that bad
Thank you!! The EU loves you now.

Anonymous 8/2/2025, 4:12:07 PM No.106116418 [Report] >>106116498

>>106116391
I don't see how the population loses from this more than the corpos.

Anonymous 8/2/2025, 4:18:35 PM No.106116467 [Report]

>>106116136
There's no reason to go through all that. They just put up a disclaimer that the models are available to everyone everywhere except in the EU.

Anonymous 8/2/2025, 4:19:09 PM No.106116476 [Report]

>>106114423
I'm not convinced that they're the 120B version, but it could simply be that they're just not that great for roleplay, and that's most of what I tested besides image capabilities (which aren't great).

Anonymous 8/2/2025, 4:21:16 PM No.106116498 [Report]

>>106116418
larger hurdle to make models -> less companies making models -> even more researchers fleeing to us or china -> less models and research
all large corpo models blocked in eu like meta to not get fined
the eu is taking itself out of the race with exactly 0 to gain from it, because even if some super evil ai gets made it will all be outside their reach to even try to prevent it since it will all be made outside the eu

Anonymous 8/2/2025, 4:21:52 PM No.106116503 [Report]

GGOOOOOOOOOOOOOOOOOOFFFFFFFFSSSSSS

where?

Anonymous 8/2/2025, 4:22:17 PM No.106116508 [Report]

LLAMA
.
CPP
SUPPORT

Anonymous 8/2/2025, 4:23:23 PM No.106116514 [Report] >>106116547

>>106115772
oss-gpt isn't meant to be used, it's a tool to suck up to the government. It gives OAI another podium to harp on about safety, it lets the US government show the west has China beat even in OSS models (because Meta is a joke), and it gives the OAI shills another talking point even if they never use anything but the API.
The only potential benefit from these models is that hopefully it will light a fire under Musk to release his backlog of Grok models.

Anonymous 8/2/2025, 4:23:28 PM No.106116516 [Report]

>>106116356
MoE model training circumvents that threshold very easily, as long as the number of active parameters is kept below roughly 100B depending on the amount of training tokens. That compute threshold could vary in the future, though.

Anonymous 8/2/2025, 4:25:02 PM No.106116528 [Report]

>>106115772
Even if 120B isn't retarded and incapable of sex Step3 and glm 4.5 easily beat it because 300+B will always be better than 120B. It is the cube law.

Anonymous 8/2/2025, 4:26:36 PM No.106116547 [Report] >>106116561

>>106116514
>The only potential benefit from these models is that hopefully it will light a fire under Musk to release his backlog of Grok models.
I'm almost willing to bet that his plan all along was to one-up Sam Altman with a slightly better and less censored open-weight model, but nobody expected OAI to take all this time for releasing theirs.

Anonymous 8/2/2025, 4:26:43 PM No.106116548 [Report] >>106116593 >>106116630

>>106114397
Ok, hear me out. Both extremes are retarded. >100B dense models and >1000B MoE models with only 13B active. All we need is MoE with more active as a percentage of total and everyone is happy.

Anonymous 8/2/2025, 4:28:29 PM No.106116561 [Report] >>106116634

>>106116547
That plan was never going to work if Grok 2 is at least as big as the first one. No one will be able to run it and will just use Sam's models by default.

Anonymous 8/2/2025, 4:31:55 PM No.106116593 [Report]

>>106116548
>MoE with more active
is slower to run on ram

Anonymous 8/2/2025, 4:36:28 PM No.106116630 [Report]

>>106116548
>more active as a percentage
Fuck off faggot. World will not justify your retarded spending habits. Now beg me to buy one of your 3090s for 200$. I could use more context.

Anonymous 8/2/2025, 4:37:13 PM No.106116634 [Report]

>>106116561
He can still come up with a made-up explanation for the delay and release something different than Grok 2 just to make the upcoming OpenAI local models look inferior. Grok 2 has been obsolete for a good while, anyway.

Anonymous 8/2/2025, 4:52:08 PM No.106116770 [Report]

>>106115719
https://rentry.org/CherryBox

Anonymous 8/2/2025, 4:54:10 PM No.106116789 [Report]

>>106115173
What cope is this?

Anonymous 8/2/2025, 4:56:55 PM No.106116817 [Report]

Are people still running the cope that Horizon Alpha is the 120B leaked model? Horizon Alpha has 1M/256k native context. The 120B model has 4k native context YaRNed to 128k.

Anonymous 8/2/2025, 4:57:36 PM No.106116827 [Report] >>106116859 >>106116863 >>106116875 >>106116878 >>106116920 >>106117065

file.png md5: 24cfa4ac...

https://huggingface.co/MetaStoneTec/XBai-o4
random 32B chink model claims to surpass Opus 4, o3-mini, and other 32B models
>o=open, and o4 represents our fourth-generation open-source large model technology

Anonymous 8/2/2025, 5:01:13 PM No.106116859 [Report] >>106116886 >>106116900

gdrdh.jpg md5: 5610e693...

>>106116827
Is anyone making a comprehensive list of all the Chink LLMs? I want to know how many were released this month alone

Anonymous 8/2/2025, 5:01:39 PM No.106116863 [Report]

>>106116827
It's just Qwen3 with RL on top, and a little sprinkle of benchmaxxing, isn't it? That would be no different from ERP sloptune #324300

Anonymous 8/2/2025, 5:02:54 PM No.106116875 [Report]

>>106116827
Can't wait to never hear about it again

Anonymous 8/2/2025, 5:03:17 PM No.106116878 [Report]

>>106116827
>C for Can-we-make-language-models-now. laude for "pretty good"

Anonymous 8/2/2025, 5:03:47 PM No.106116886 [Report]

file.png md5: ef80c3bd...

>>106116859
https://www.reddit.com/r/LocalLLaMA/comments/1mfaigh/were_truly_in_the_fastestpaced_era_of_ai_these/
some hours out of date

Anonymous 8/2/2025, 5:05:14 PM No.106116900 [Report] >>106116908

file.png md5: 8965a6dd...

>>106116859
>this month
wait, how many

Anonymous 8/2/2025, 5:06:21 PM No.106116908 [Report]

>>106116900
I count 1 so far

Anonymous 8/2/2025, 5:07:43 PM No.106116920 [Report] >>106116942

>>106116827
And thus GLM 4.5 ggufs were delayed by another 2 weeks

Anonymous 8/2/2025, 5:10:02 PM No.106116942 [Report] >>106116978

>>106116920
If they wait long enough, people will stop asking for GLM 4.5 ggufs, and if they're lucky interest will switch to something easier to support.

Anonymous 8/2/2025, 5:13:50 PM No.106116978 [Report] >>106117023

>>106116942
GLM4 is my favorite (realistically) local model so I'm a bit annoyed that the new one is fucked.
Of course it doesn't matter if Xbai turns out to be amazing. Lmao.

Anonymous 8/2/2025, 5:18:11 PM No.106117023 [Report]

>>106116978
Qwen 3 feels like it should be smarter than it really is. And I doubt the chiggers got anything out of it.

Anonymous 8/2/2025, 5:23:41 PM No.106117065 [Report] >>106117084

Screenshot 2025-08-02 092220.png md5: 73dd8804...

>>106116827
Well, there's our o3-mini level model that is pretty small but still needs to run on GPUs I guess

Anonymous 8/2/2025, 5:27:04 PM No.106117084 [Report] >>106117106

>>106117065
>Half a year ago and it still isn't out.
Kek

Anonymous 8/2/2025, 5:28:49 PM No.106117106 [Report] >>106117125 >>106117134 >>106117142

>>106117084
Especially if the bulk of the weights were trained in fp4 on blackwell cards they've had the equivalent of 2 years to train it vs. a straight fp16 model.

Anonymous 8/2/2025, 5:31:23 PM No.106117125 [Report] >>106117146

>>106117106
They weren't trained on FP4; they were quantized tp FP4.

Anonymous 8/2/2025, 5:32:32 PM No.106117134 [Report] >>106117141

>>106117106
I heard they actually used fp3. Much more efficient.

Anonymous 8/2/2025, 5:33:40 PM No.106117141 [Report] >>106117157

>>106117134
That would violate the laws of thermodynamics. There is no fp in between 0.5 and 4.

Anonymous 8/2/2025, 5:33:42 PM No.106117142 [Report] >>106117154

>>106117106
Especially with old architecture they just took off the shelf and didn't have to spend time developing

Anonymous 8/2/2025, 5:34:09 PM No.106117146 [Report] >>106117194

>>106117125
Prove.

Anonymous 8/2/2025, 5:34:40 PM No.106117154 [Report] >>106117164

>>106117142
Didn't they basically just use llama arch?

Anonymous 8/2/2025, 5:35:36 PM No.106117157 [Report]

>>106117141
Not if you squeeze the tokens. Then you can get more per embedding.

Anonymous 8/2/2025, 5:36:19 PM No.106117164 [Report]

>>106117154
Llama for the 20B, Mixtral for the 120B.

Anonymous 8/2/2025, 5:36:55 PM No.106117170 [Report] >>106117176 >>106117188 >>106117217 >>106117299 >>106117344 >>106117391

There's actually a few things that can be gleaned from the alleged OSS arch.
>There's nothing inherently wrong with ROPE - it's just the shitty open source implementation that's the problem.
>There's nothing inherently wrong with GQA - it's just the shitty open source implementation that's the problem.
>There's nothing wrong with having MoE with a small number of active parameters relative to the bulk of the weights - it's just the shitty open source implementation that's the problem.

Anonymous 8/2/2025, 5:37:46 PM No.106117176 [Report]

>>106117170
>alleged

Anonymous 8/2/2025, 5:37:59 PM No.106117179 [Report] >>106117203 >>106117222

file.png md5: ed32395f...

>>106113807
aaaaaaaaa benchmaxx

Anonymous 8/2/2025, 5:39:59 PM No.106117188 [Report] >>106117214

>>106117170
That is, if you are still hoping/coping that 120B is Horizon Alpla.

Anonymous 8/2/2025, 5:40:27 PM No.106117194 [Report] >>106117209 >>106117256

>>106117146
https://www.reddit.com/r/LocalLLaMA/comments/1mf3tm9/the_leaked_120_b_openai_model_is_not_trained_in/

Anonymous 8/2/2025, 5:41:00 PM No.106117199 [Report] >>106117213

Horizon alpha and beta is the 20B model

Anonymous 8/2/2025, 5:41:08 PM No.106117203 [Report]

>>106117179
To be fair it could be that it isn't correctly implemented but personally I believe the benchmaxxing.

Anonymous 8/2/2025, 5:41:49 PM No.106117209 [Report]

>>106117194
Now watch their "release" actually just be a bunch of FP4 GGUFs

Anonymous 8/2/2025, 5:42:06 PM No.106117213 [Report] >>106117235

>>106117199
Your bait is stale

Anonymous 8/2/2025, 5:42:19 PM No.106117214 [Report] >>106117276

>>106117188
I, personally, have no horse in this game.
I've mostly moved onto other hobbies. I'm still interested, intellectually, with the technology but there's really nothing left to do for me. Unless local native image gen comes out that is superior to o3/Gemini Pro. Or hell, even o4 mini tier would be good.

Anonymous 8/2/2025, 5:42:28 PM No.106117217 [Report]

>>106117170
We have zero clue how OSS-120 does with using its context, it could set a new low even worse than Llama Scout on release

Anonymous 8/2/2025, 5:43:24 PM No.106117222 [Report]

>>106117179
>point out how everyone is cheat in benchmarks
>no one cares
>"when you can't beat'em, join'em"
>now suddenly people care

Anonymous 8/2/2025, 5:44:06 PM No.106117227 [Report]

anyone remember llama 4 on lmarena being just re-routed opus :D ? thank god that could never happen again

Anonymous 8/2/2025, 5:44:27 PM No.106117235 [Report] >>106117242 >>106117282 >>106117295

>>106117213
It isn't bait, OpenAI will literally save the hobby. The 20B is punching above its weight and will be better than R1

Anonymous 8/2/2025, 5:45:21 PM No.106117242 [Report]

>>106117235
anon'll punch you in the throat

Anonymous 8/2/2025, 5:46:34 PM No.106117256 [Report] >>106117354

>>106117194
This entsnack guy seems to be a big oai shill

Anonymous 8/2/2025, 5:48:26 PM No.106117276 [Report]

>>106117214
You have months of catch-up to do if you think that

Anonymous 8/2/2025, 5:49:18 PM No.106117282 [Report]

>>106117235
Too obvious

Anonymous 8/2/2025, 5:50:54 PM No.106117294 [Report]

>>106114595
you better not know about navy

Anonymous 8/2/2025, 5:51:04 PM No.106117295 [Report] >>106117317 >>106117367 >>106117701

>>106117235
The initial context length of the leaked weights is 4096
It's fucking over

Anonymous 8/2/2025, 5:52:06 PM No.106117299 [Report] >>106117319

>>106117170
I'm noticing a pattern here

Anonymous 8/2/2025, 5:54:11 PM No.106117317 [Report]

>>106117295
It's all you need, Anon. Back then you only had 2k with gpt-3 and you were happy, what changed?

Anonymous 8/2/2025, 5:54:51 PM No.106117319 [Report]

>>106117299
Don't throw the N word around so freely, or soon they'll add it to the list of no-no words you're not allowed to say

Anonymous 8/2/2025, 5:58:03 PM No.106117344 [Report]

>>106117170
>alleged OSS arch
If I was open AI and I knew there is something wrong with ROPE and GQA and I was supposed to release an open source model I would release a ROPE GQA model so competition keeps using it.

Anonymous 8/2/2025, 5:58:50 PM No.106117354 [Report]

file.png md5: fbf1dc43...

>>106117256
no?

Anonymous 8/2/2025, 6:00:00 PM No.106117367 [Report] >>106117373 >>106117621

>>106117295
V3/r1 had 4k too
https://arxiv.org/pdf/2412.19437
Ctrl+f 4k

Anonymous 8/2/2025, 6:00:49 PM No.106117373 [Report]

>>106117367
and it feels like it

Anonymous 8/2/2025, 6:01:29 PM No.106117378 [Report] >>106117390 >>106117394

Why can't Mistral do smart models? I just want an uncensored reasoner that writes original prose.
Testing 30b Thinking and this "trick the gatekeeper (who has Alzheimer's)" is tedious.

Anonymous 8/2/2025, 6:02:42 PM No.106117390 [Report]

>>106117378
protip about the French: they are good at complaining but not at building

Anonymous 8/2/2025, 6:03:14 PM No.106117391 [Report] >>106117402

>>106117170
>tfw no bespoke finely crafted 1000x better proprietary RoPE with shielded gold plating
at least I truly see

Anonymous 8/2/2025, 6:03:31 PM No.106117394 [Report] >>106117412

>>106117378
>I just want an uncensored model that can reason and write like a human. Why can't Mistral do this?
Are you fucking stupid

Anonymous 8/2/2025, 6:04:33 PM No.106117401 [Report] >>106117427 >>106117594

>>106113484 (OP)
Giving Krita AI plugin a spin on a linux machine, using an AMD GPU
Any clue why I cant use GPU acceleration, I can use XML or CUDA, but no AMD option is there

Anonymous 8/2/2025, 6:04:45 PM No.106117402 [Report]

>>106117391
Do you think it's possible to vibe code some gold trim for my rope?

Anonymous 8/2/2025, 6:06:10 PM No.106117412 [Report]

>>106117394
Mistral models are sufficiently uncensored for me. There are models with better prose (nearly all of their competitors). What's so confusing?

Anonymous 8/2/2025, 6:07:43 PM No.106117427 [Report]

>>106117401
I'm sure there's some clues in the terminal output you didn't show.
Have you check for issues in the plugin's repo you didn't name?
Maybe it's something in the backend that we also have no clue about.

Anonymous 8/2/2025, 6:08:41 PM No.106117437 [Report] >>106117442

GLM owes me sex.

Anonymous 8/2/2025, 6:09:04 PM No.106117442 [Report]

>>106117437
Just like your mom owes me sex as well

Anonymous 8/2/2025, 6:11:58 PM No.106117473 [Report] >>106117515

georgi owes me GLM

Anonymous 8/2/2025, 6:16:32 PM No.106117515 [Report]

>>106117473
Vibe coders have been engeneering prompts for days. But llamacpp uses dalit c++ language so it is taking a long time. Please understand.

Anonymous 8/2/2025, 6:17:23 PM No.106117524 [Report] >>106117543 >>106117706 >>106118167 >>106118994

GxWaA5LbYAEuCy9.jpg md5: bece6fe0...

Anonymous 8/2/2025, 6:19:36 PM No.106117543 [Report]

>>106117524
i would not mind at all.

Anonymous 8/2/2025, 6:25:37 PM No.106117594 [Report]

>>106117401
>AMD for imagen
kek
>>>/g/ldg

Anonymous 8/2/2025, 6:28:42 PM No.106117621 [Report] >>106117637 >>106117838

>>106117367
The 4k is for pre training, and they used longer NATIVE context (with MLA) in post training. The 120 B model has 128k non-native YaRN context.

Anonymous 8/2/2025, 6:30:15 PM No.106117637 [Report] >>106117649

>>106117621
>The 120 B model has 128k non-native YaRN context.
Holy shit, this is unprecedented.

Anonymous 8/2/2025, 6:30:29 PM No.106117644 [Report]

>>106116356
€0.10 has been added to your account

Anonymous 8/2/2025, 6:30:57 PM No.106117649 [Report] >>106117814

>>106117637
No it's not retard, YaRNed Llama models have been around for months and they're shit.

Anonymous 8/2/2025, 6:35:05 PM No.106117701 [Report] >>106117924

newplot(1).png md5: 396febe6...

>>106117295
In my (limited) experience, its actually way harder to feed a model long sequences from the start and get it to converge, starting with short sequence and ramping it up does seem to be the way to go. that being said, I think they are way overcooking them at short sequence lengths. my current approach is to ramp up the context length quickly and then switch back to the short sequences for the main run and ramp up again finishing with the long sequences. I have no basis of comparison on the final result, so its not a real experiment it will either work or it wont.

Anonymous 8/2/2025, 6:35:41 PM No.106117706 [Report] >>106117737 >>106117767 >>106117772 >>106117810

>>106117524
How did Miku get roped up into this?
Surely she's completely unrelated to anything LLM related outside of this general but there's multiple pictures of her and Ani together.

Anonymous 8/2/2025, 6:35:41 PM No.106117707 [Report] >>106117749 >>106117821

Lots of LLM laymen here asking retarded questions ahead of OAI OSS "release". Coincidence?

Anonymous 8/2/2025, 6:37:43 PM No.106117737 [Report] >>106117792

>>106117706
>Surely she's completely unrelated to anything LLM related outside of this general but there's multiple pictures of her and Ani together.
This but unironically.

Anonymous 8/2/2025, 6:38:21 PM No.106117749 [Report]

>>106117707
now imagine when it actually drops

Anonymous 8/2/2025, 6:39:26 PM No.106117767 [Report] >>106117793

>>106117706
>He doesn't know that this retarded general full of degenerates is actually the core of AI discussion
You have no idea how dumb and gay the cutting edge is, and how many terrible ideas from this thread have made it into production.

Anonymous 8/2/2025, 6:40:02 PM No.106117772 [Report] >>106117919

>>106117706
She's the most popular and well known anime girl by now so of course she gets inserted into everything, especially if it's about le quirky digital waifus. It's not just Ani stuff.

Anonymous 8/2/2025, 6:42:13 PM No.106117792 [Report] >>106117800

japanese_man_marries_hologram1.jpg md5: b75f0605...

>>106117737
See that thing on the right? People were trying to make Miku into virtual GF/Assistant before modern LLMs came to be.

Anonymous 8/2/2025, 6:42:19 PM No.106117793 [Report] >>106117808

>>106117767
>ideas from this thread have made it into production
Like?

Anonymous 8/2/2025, 6:42:58 PM No.106117800 [Report] >>106117836

>>106117792
Yes there is a history of your mental illness out there. What about it?

Anonymous 8/2/2025, 6:43:22 PM No.106117808 [Report] >>106117828

>>106117793
miku.sh invented reasoning
>${AI_NAME} can think for herself without the user seeing her thoughts by adding a /think prefix to her output. She uses this to reason about the world and to think about what she should say next.

Anonymous 8/2/2025, 6:43:40 PM No.106117810 [Report]

>>106117706
>how did the artificial voice character get lumped in with the artificial intelligence one
truly a mystery

Anonymous 8/2/2025, 6:43:51 PM No.106117814 [Report]

>>106117649
I'm almost positive that was sarcasm

Anonymous 8/2/2025, 6:44:39 PM No.106117821 [Report]

>>106117707
Coincidental with that Grok companion thing.
>Grok waifus drop
>Normalfags eat it up
>eventually discover that it won't comply with their increasingly depraved fantasy escalations
>complain on the internet
>discover that open source AI is a thing
>start on reddit
>continue down rabbithole to 4chan
It just took them a while to get through the pipeline.

Anonymous 8/2/2025, 6:45:14 PM No.106117828 [Report]

>>106117808
The COT idea predates /lmg/ though.

Anonymous 8/2/2025, 6:45:47 PM No.106117836 [Report] >>106117867

>>106117800
Fuck off.

Anonymous 8/2/2025, 6:45:48 PM No.106117838 [Report] >>106117857 >>106117877 >>106117898

>>106117621
NTA, doesn't 128k pretty much imply it's NOT the Horizon models? The ones on OR have 256k.

Anonymous 8/2/2025, 6:47:47 PM No.106117857 [Report] >>106117877 >>106117885

>>106117838
You can serve different things than what the model is capable of, DS site is limited to 64 despite supporting 128, it wouldn't be too crazy for a company to try the opposite

Anonymous 8/2/2025, 6:48:37 PM No.106117867 [Report]

>>106117836
Don't forget to take your HRT. Remember that you can't really become Miku if people still think you are a man when they see you.

Anonymous 8/2/2025, 6:49:46 PM No.106117877 [Report] >>106117927

>>106117838
>>106117857
I would also posit that OAI may have high-balled the context going onto OR and then used feedback from there to settle on a point where the model was still coherent. I.E. they're using OR to effectively beta-test the config

Anonymous 8/2/2025, 6:51:00 PM No.106117885 [Report] >>106117941

>>106117857
Holding onto a larger context model while offering a lower context model actually would be extremely fucking weird though. One's easy to do, the other you have to deliberately have two differently trained versions of the model

Anonymous 8/2/2025, 6:51:17 PM No.106117890 [Report] >>106117913

Using GLM4.5 through OpenRouter, it's insane big the quality of the gens you get differ depending on the provider handling your request. Chutes seems to be the only that consistently gives good replies with the exact same setup to the point where I feel like I'm being scammed and they're peddling me a different model for both 4.5 and 4.5-Air. Yes, I have disabled Fallback models + Providers.
llama.cpp support fucking when? I don't want to deal with this stupid cloud blackbox retardation.

Anonymous 8/2/2025, 6:51:50 PM No.106117898 [Report]

>>106117838
According to its config file, Mistral Nemo supports a context length of a million.

Anonymous 8/2/2025, 6:53:39 PM No.106117913 [Report]

>>106117890
Openrouter will always be shit if they can't address:
1. People serving a model while claiming it's another model
2. People serving a model without revealing how quantized it is from the original

Anonymous 8/2/2025, 6:53:49 PM No.106117914 [Report] >>106117955

i wish drummer would leave /lmg/

Anonymous 8/2/2025, 6:53:58 PM No.106117919 [Report] >>106117957 >>106117974

>>106117772
> She's the most popular and well known anime girl by now
asuka
or that bitch with red bow from 2hu

Anonymous 8/2/2025, 6:54:43 PM No.106117924 [Report] >>106118109

>>106117701
>get it to converge
Isn't it just a matter of bigger batch / gradient accumulation? But a more stupid idea I have is train on smaller length to get some initial grammar in there and then add more layers in front and behind that "pretrained" section to make sure there is some information that can be used by new structure, but it hopefully develops a way to handle longer context instead of being kinda tied to 4k.

Anonymous 8/2/2025, 6:54:47 PM No.106117927 [Report] >>106117936

>>106117877
That's... possible?
My guess for people holding out hope that Horizon Alpha would be they just didn't fully update the configs yet and they're using an earlier version from a different point in training

Anonymous 8/2/2025, 6:56:01 PM No.106117936 [Report]

>>106117927
>*that Horizon Alpha is the OSS model

Anonymous 8/2/2025, 6:56:32 PM No.106117941 [Report] >>106117980

>>106117885
>you have to deliberately have two differently trained versions of the model
no you don't, just change the settings of how you run it, it's that easy

Anonymous 8/2/2025, 6:57:53 PM No.106117955 [Report]

>>106117914
Me too.

Anonymous 8/2/2025, 6:58:04 PM No.106117957 [Report]

>>106117919
Maybe years ago. It's different now.

Anonymous 8/2/2025, 7:00:37 PM No.106117974 [Report]

>>106117919
Eva hasn't been mainstream relevant in a decade and Toehoes never got their big western breakout the way Vocaloid did
But then again there's been dozens of random gacha whores and FotM anime characters as big or bigger than Miku

Anonymous 8/2/2025, 7:01:10 PM No.106117980 [Report]

>>106117941
For a company to have a higher context model on OR and a genuinely lower context model on HF, they'd need different context lengths, unless you're saying they're the same model and they didn't update the config to the amount it's actually trained on
Again, that's theoretically possible, but... it's getting to be a lot of hoops to jump through is all I'm saying. I'm not hopeful, but I'll be pleasantly surprised if Horizon is genuinely the OSS model. I've got a bad feeling in my gut though

Anonymous 8/2/2025, 7:02:59 PM No.106117997 [Report] >>106118031 >>106118043 >>106118054 >>106118061 >>106118095 >>106118104

I haven't been paying attention for a while. from what I gather in this thread, the latest "best" model is GLM4.5, right? or is deepseek still the undisputed best?
also, horizon alpha/beta are a couple of open-weights models that will be released by OAI in 2 more weeks?
is nemo still king for running in a laptop?

Anonymous 8/2/2025, 7:04:04 PM No.106118009 [Report] >>106118043

>8k context
>but it's the best 8k you'll ever experience in your life
>no, rope doesn't work
Would you take this model as your wife?

Anonymous 8/2/2025, 7:06:26 PM No.106118031 [Report] >>106118066 >>106118076 >>106118080 >>106118085

>>106117997
/lmg/ is literally glm (most of the team is posting here, where do you think they got the acronym from). it has been prophecized that glm will save local and it did.

Anonymous 8/2/2025, 7:07:48 PM No.106118043 [Report]

>>106118009
I can make that work.

>>106117997
>the latest "best" model is GLM4.5, right?
Most people are waiting for llama.cpp to implement the model. So maybe?

>is nemo still king for running in a laptop?
Will obviously depends on the spec, but supposedly, the new update to Qwen 3 30B A3B is pretty good.
The new GLM 4.5 air should run pretty decently on a laptop if you have enough and fast enough RAM, since it's a MoE with not that many activated params.

Anonymous 8/2/2025, 7:08:24 PM No.106118054 [Report] >>106118085

>>106117997
No goofs, no verdict.

Anonymous 8/2/2025, 7:09:14 PM No.106118061 [Report] >>106118085

>>106117997
I'm pretty sure the GLM3 ggoofs are still fucking subtly broken. So probably never.

Anonymous 8/2/2025, 7:09:28 PM No.106118065 [Report]

TRVKE: local is a dying hobby and people so desperately wanting Horizon Alpha to be 120B prove that

Anonymous 8/2/2025, 7:09:37 PM No.106118066 [Report] >>106118080

>>106118031
hey that one is actually easy to disprove. you would have released with gooooofffff support.

Anonymous 8/2/2025, 7:10:20 PM No.106118076 [Report]

>>106118031
>where do you think they got the acronym from
holy shit...

Anonymous 8/2/2025, 7:10:37 PM No.106118079 [Report]

Miku is not related to this thread whatsoever. Mikuspam happens only because OP is mentally ill.

Anonymous 8/2/2025, 7:10:43 PM No.106118080 [Report] >>106118096

>>106118031
General of Local Models 4.5?

>>106118066
Only poorfags touch ggufs. vLLM 4 life

Anonymous 8/2/2025, 7:11:18 PM No.106118085 [Report]

>>106118031
oh, so it must be shit.
where does the 4.5 come from?

>>106118054
I see

>>106118061
never what? also, I had no idea GLM3 was a thing

Anonymous 8/2/2025, 7:12:05 PM No.106118095 [Report] >>106118113

>>106117997
GLM is promising but 90% of the people here haven't really tried it because it's the local thread and there's no llama.cpp support
Nobody knows what Horizon really is, people assume it's the OAI open models because the timing vaguely lines up but it might just be a coincidence
Nemo is still best for coom but any of the small Qwen or Gemma models will likely beat it in every other use case

Anonymous 8/2/2025, 7:12:08 PM No.106118096 [Report]

>>106118080
>vLLM
It is good to know that you declare your alligance for troonix, which by default means you are a mikutroon and should die in a fire. I WILL NOT FUCK YOUR MODEL NOW.

Anonymous 8/2/2025, 7:12:33 PM No.106118104 [Report] >>106118113

>>106117997
Alpha and Beta are the same model with different tunes, there are believers and dissenters that it's one of the OSS model

Anonymous 8/2/2025, 7:13:11 PM No.106118109 [Report] >>106118182

>>106117924
I kept the tokens per a step constant by adjusting the batches and grad acc. hoping it would help remove some of the guesswork I just left all the other parameters equal.

i think it i really hard to say what the best approach is. If it didn't require micromanaging the batch sizes to get maximum throughput, I would like to try just interleaveing the various sequence lengths, do a quick warm up to learn the basics of the language and then based on some formula slowly mix in more and more longer sequences till all the short sequences are consumed and only long ones are left till the end of the training schedule.

adding new layers sounds like an exciting idea but I fear it might make the training process unstable and result with catastrophic forgetting. would you duplicate the layers or just randomly initialize them? without any investigation my knee jerk reaction is that you might as well just start will your target number of layers to begin with so it won't be so shocking to your model, let it naturally explore the parameter space as it trains.

Anonymous 8/2/2025, 7:13:45 PM No.106118113 [Report]

>>106118095
>>106118104
I see. thanks anons

Anonymous 8/2/2025, 7:19:29 PM No.106118167 [Report] >>106118176 >>106118184 >>106118190 >>106118191 >>106118197

>>106117524
What's the purple stuff?

Anonymous 8/2/2025, 7:20:27 PM No.106118176 [Report]

>>106118167
ectoplasm

Anonymous 8/2/2025, 7:21:03 PM No.106118182 [Report] >>106118311

>>106118109
>it might make the training process unstable and result with catastrophic forgetting
It is basically what drummer does when he adds some layers and yes it doesn't work for him because of catastrophic forgetting so he just barely nudges the new layers and they just sit there and eat ram.

But the point of my stupid idea is that unlike a finetrooner you wouldn't release model just after this happens, but instead you treat it as a starting point of real pretraining. You will lose a lot of trained information, but hopefully enough is there to catch some gradient you continue from with your long sequences.

Anonymous 8/2/2025, 7:21:12 PM No.106118184 [Report] >>106118199

>>106118167
Non triggering bl**d

Anonymous 8/2/2025, 7:21:54 PM No.106118190 [Report]

>>106118167
danganronpa reference

Anonymous 8/2/2025, 7:22:09 PM No.106118191 [Report]

>>106118167
HRT

Anonymous 8/2/2025, 7:22:44 PM No.106118197 [Report]

>>106118167
Grape jelly

Anonymous 8/2/2025, 7:22:55 PM No.106118199 [Report] >>106118209

>>106118184
umm... did {{user}} just get unal*ved...?

Anonymous 8/2/2025, 7:24:18 PM No.106118209 [Report]

>>106118199
Yeah, this is pretty awful to depict

Anonymous 8/2/2025, 7:27:23 PM No.106118238 [Report]

you might laugh but these anime posting degenerates are most of the finest devs building on local

Anonymous 8/2/2025, 7:35:19 PM No.106118310 [Report] >>106118316 >>106118322 >>106118324 >>106118325 >>106118329 >>106118353 >>106118711

LLMs have plateaued because they have run out of training data
Everyone has been training on the same 10~20T worth of non-synthetic data

Anonymous 8/2/2025, 7:35:24 PM No.106118311 [Report]

>>106118182
yeah you would have to have the pretraining data set and a pretty appreciable learning rate to really make use of the additional layers. I suppose it could maybe be an optimization technique since steps would be quicker on the smaller model. I guess to prove it thoroughly you would need to train two models one with the traditional curriculum training and another with the curriculum + layer expansion and see which one converges with the lower number of flop's.

Anonymous 8/2/2025, 7:36:04 PM No.106118315 [Report]

nope

Anonymous 8/2/2025, 7:36:24 PM No.106118316 [Report]

>>106118310
I find this to be a reasonable assessment.

Anonymous 8/2/2025, 7:37:03 PM No.106118322 [Report]

>>106118310
The same was said when the first R1 was released.
I think there's still a bit of juice left to be squeezed.

Anonymous 8/2/2025, 7:37:05 PM No.106118324 [Report]

>>106118310
I'm curious if anyone is trying multiepoch training yet, or if they've found it to be beneficial at all

Anonymous 8/2/2025, 7:37:08 PM No.106118325 [Report]

>>106118310
They wouldn't need all that data if they had a better architecture.

Anonymous 8/2/2025, 7:37:24 PM No.106118329 [Report] >>106118339

>>106118310
They'd have all the data they want if they didn't filter 95% of it

Anonymous 8/2/2025, 7:38:35 PM No.106118339 [Report] >>106118352 >>106118360

>>106118329
Furry RP logs aren't worth training on.

Anonymous 8/2/2025, 7:40:18 PM No.106118352 [Report]

>>106118339
If only that was all they were filtering

Anonymous 8/2/2025, 7:40:20 PM No.106118353 [Report] >>106118364 >>106118380

>>106118310
you could double the amount of training data assuming it was just the same old stuff and it wouldn't lead to any gains
the real challenge isn't data quantity, it's data quality

Anonymous 8/2/2025, 7:40:58 PM No.106118360 [Report] >>106118371 >>106118385 >>106118422 >>106118490

>>106118339
They are, though. The contrast between human to human RP logs and the various human/furry combination RP logs contains a shitload of information about sensations, textures, etc. Intangible things that would otherwise be impossible to incorporate into its knowledge.
Furry ERP logs are an absolute goldmine and if you think otherwise you're a dumb fucking pajeet that doesn't belong in this industry.

Anonymous 8/2/2025, 7:41:21 PM No.106118364 [Report] >>106118384

>>106118353
Everyone does love Phi.

Anonymous 8/2/2025, 7:42:30 PM No.106118371 [Report] >>106118379 >>106118390

>>106118360
what about pony rp logs?

Anonymous 8/2/2025, 7:43:26 PM No.106118379 [Report] >>106118387

>>106118371
That's a subset of furry

Anonymous 8/2/2025, 7:43:30 PM No.106118380 [Report] >>106118432

>>106118353
Actually, quality matters less the greater your scale. What does matter is dataset diversity. Due to how badly the current architectures are at generalization, you want to train on as many different things as possible. Ideally it'd cover the infinite set of possible queries a user could ask.

Anonymous 8/2/2025, 7:43:45 PM No.106118384 [Report]

>>106118364
phi does illustrate this point very well though, the only problem is their definition of quality

Anonymous 8/2/2025, 7:43:50 PM No.106118385 [Report] >>106118504

>>106118360
That's like saying CP is worth training on, whereas any capable model would be able to infer what CP would look like from adult porn and knowledge of child physiology and behaviour psychology

Anonymous 8/2/2025, 7:43:55 PM No.106118387 [Report] >>106118401

>>106118379
no it's not

Anonymous 8/2/2025, 7:44:25 PM No.106118390 [Report]

>>106118371
yeah why not, if it starts to degrade performance in other areas just add more parameters. aren't they even trying?

Anonymous 8/2/2025, 7:45:43 PM No.106118401 [Report] >>106118407

>>106118387
Explain why not.

Anonymous 8/2/2025, 7:46:44 PM No.106118407 [Report] >>106118412 >>106118415

>>106118401
Ponies stand on 4 legs.

Anonymous 8/2/2025, 7:47:33 PM No.106118412 [Report] >>106118448

>>106118407
And so do most animals.

Anonymous 8/2/2025, 7:47:55 PM No.106118415 [Report] >>106118448

>>106118407
Furries often do too

Anonymous 8/2/2025, 7:48:30 PM No.106118422 [Report]

>>106118360
That reminded me how I used nu-235B asked for a kitsune and it gave her a snout without any prompting. I was surprised. But yeah they are using furry ERP logs now.

Anonymous 8/2/2025, 7:49:37 PM No.106118432 [Report] >>106118441 >>106118481

>>106118380
We don't train models on white noise. Why?

Anonymous 8/2/2025, 7:50:52 PM No.106118441 [Report]

>>106118432
Because we do generally want them to somewhat know actual things.

Anonymous 8/2/2025, 7:51:08 PM No.106118448 [Report] >>106118454 >>106118468

>>106118415
>>106118412
You will never be a brony

Anonymous 8/2/2025, 7:52:13 PM No.106118454 [Report]

>>106118448
Good.

Anonymous 8/2/2025, 7:54:41 PM No.106118468 [Report]

>>106118448
Thanks?

Anonymous 8/2/2025, 7:56:03 PM No.106118481 [Report] >>106118541 >>106118554

>>106118432
You know as well as I do that when we talk about data "quality" in /lmg/, it's not about filtering data out that's filled with random characters and repeating lines. It's obvious that there still needs to be a level of meaning in the data. That's what we mean by dataset diversity, and what researchers mean by the same term when they talk about it as well.

Anonymous 8/2/2025, 7:57:00 PM No.106118490 [Report] >>106118504

>>106118360
>you're a dumb fucking pajeet that doesn't belong in this industry.
What industry, ERP industry? Underage ID verification cannot come soon enough.

Anonymous 8/2/2025, 7:58:34 PM No.106118504 [Report]

>>106118385
I mean it depends where you draw the ethical line.
It's like human testing, animal testing, etc.
There are shortcuts that can boost our knowledge and capabilities as a species.
But where do you draw the line? And it's a rhetorical question really. Everyone feels differently.
Would I be comfortable with models being pretrained on fictional erotic literature involving children in order for models to better understand behavioral psychology, etc? Yeah. It's just fucking words, get a life, go touch grass. etc.
If you're talking about like actual CP images/videos, from seized evidence of child abuse- or downloaded off of the dark web- That's pushing it for me. But I would support that data being used to say... train models for CSAM detection. Like current CSAM detection APIs rely heavily on file hashes and individual files having a history. And I suppose vision models that we have now are good enough to play that role. But having like really solid training data, and a lot of it, would allow you to train much more smaller, more specialized, and more widely deployable solutions.
>>106118490
You're a dumb pajeet. You are not worthy of a frank discussion.

Anonymous 8/2/2025, 8:03:04 PM No.106118541 [Report] >>106118564

llama-3-dataset-quality2.png md5: 907cde1f...

>>106118481
There will never be "high quality ERP" because the industry has determined that pornography is "low quality data".

Anonymous 8/2/2025, 8:04:50 PM No.106118554 [Report] >>106118788 >>106118846

>>106118481
have we proven that adding new domains doesn't hurt other domains? will a model trained on furry erp be more likely to give false veterinary advice?

Anonymous 8/2/2025, 8:06:21 PM No.106118564 [Report] >>106118573

>>106118541
llama 3 was this bad already, imagine what happened in llama 4
good thing meta is history now

Anonymous 8/2/2025, 8:07:37 PM No.106118573 [Report] >>106118591

>>106118564
superintelligence will save llama 5

Anonymous 8/2/2025, 8:09:39 PM No.106118591 [Report] >>106118605

>>106118573
Which Wang will talk Zuck into handing to Altman

Anonymous 8/2/2025, 8:11:20 PM No.106118605 [Report]

>>106118591
>Zuck replacing Nutella as Sam's moneypig patreon
I don't see that happening, but it would be hilarious if it did.

Anonymous 8/2/2025, 8:22:51 PM No.106118711 [Report]

>>106118310
Nah you can always get even more fucking data. You just need to dig deeper. Go digging through old german Team Speak and Mumble servers. Digitize every bit of hand written letters you can find.

Anonymous 8/2/2025, 8:33:55 PM No.106118788 [Report] >>106118846

>>106118554
I'm too lazy to go retrieve it, but there was a paper that claimed that prepending each site's URL to each sample mitigated the issue of knowledge being confused like that, which according to the same paper, does happen. Most labs worth their salt should've already been doing this for a while now. But this issue should also only apply to a small scale dataset which I believe is what they tested in the study. I don't know or remember if anyone investigated it at a larger scale, but generally speaking in ML, greater scale cancels out a bunch of issues like these. Greater diversity (within domain) too.

For example, let's say you train on furry ERP and veterinarian advice. At first, the model will likely conflate the two contexts. But as you train for longer, the model will learn the more subtle differences and be able to tell that they are different context, and that it should predict something different. Additionally, if you train on more different variations of furry ERP and variations of veterinarian contexts, it'll better learn what the tells of a veterinarian context and a furry ERP context are.

Anonymous 8/2/2025, 8:40:03 PM No.106118846 [Report] >>106118888

>>106118554
>>106118788
Speaking as someone who's trained these kinds of models (tho not as big as these huge LLMs) the answer is that models actually do get confused a little bit. That's why chink LLMs will occasionally spew chinese characters in the middle of english output. But, more data is usually still better.
That is, if you want a vet model, (1T of vet data) > (0.5T of vet data + 0.5T furry porn) > (0.5T of vet data). These models are so large and training is so primitive that more data will almost always help unless the extra data has nothing in common at all with what you want. But that's usually not true.

Anonymous 8/2/2025, 8:44:40 PM No.106118888 [Report]

>>106118846
I hope next generation datasets will be at least half furry porn

Anonymous 8/2/2025, 8:57:32 PM No.106118994 [Report]

>>106117524
Getting stabbed to death with Miku!

Anonymous 8/2/2025, 9:17:16 PM No.106119129 [Report] >>106119412

Semi-relevant to the data quality discussion.

When Bad Data Leads to Good Models
https://arxiv.org/abs/2505.04741
In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy experiment to study how data composition affects the geometry of features in the representation space. Next, through controlled experiments with Olmo-1B models trained on varying ratios of clean and toxic data, we find that the concept of toxicity enjoys a less entangled linear representation as the proportion of toxic data increases. Furthermore, we show that although toxic data increases the generational toxicity of the base model, it also makes the toxicity easier to remove. Evaluations on Toxigen and Real Toxicity Prompts demonstrate that models trained on toxic data achieve a better trade-off between reducing generational toxicity and preserving general capabilities when detoxifying techniques such as inference-time intervention (ITI) are applied. Our findings suggest that, with post-training taken into account, bad data may lead to good models.

Anonymous 8/2/2025, 9:29:20 PM No.106119245 [Report] >>106119258 >>106119292 >>106119396

Threadly reminder Anthropic supports and enables human rights abuse and massive surveillance despite claiming to be 'humanists'.

Anonymous 8/2/2025, 9:31:09 PM No.106119258 [Report]

>>106119245
members of the san francisco rationalist cult need to be institutionalized

Anonymous 8/2/2025, 9:35:26 PM No.106119292 [Report]

>>106119245
Remember "do no evil" Google?

Anonymous 8/2/2025, 9:49:22 PM No.106119396 [Report]

>>106119245
>corporation headed by a jew is evil
bigger chance of winning the lottery then ever guessing this fr

Anonymous 8/2/2025, 9:49:50 PM No.106119399 [Report] >>106119444 >>106119507 >>106119754 >>106119800 >>106119802

WanVideo_I2V_00036_thumb.jpg.webm md5: 31a28f2d...

WebM not supported

wan is pretty good

Anonymous 8/2/2025, 9:51:17 PM No.106119412 [Report] >>106119509

>>106119129
I remember people used to make loras from datasets where AI fucked up the hands so that they could learn to recognize the types of ways they would fuck up hands, and then apply the lora with negative weight. Seems like the same principle here.

Anonymous 8/2/2025, 9:52:58 PM No.106119426 [Report]

Hey all some retard fucked up his smut writeup I told him I would read.
The concept is hot and the dialog is even good but the autist mixed 1rst 2nd and 3rd person language into the same scenes. Whats a quick option I can use that will read the whole thing and rewrite it in 3rd person?

I tried using perplexity.ai but it has a character limit and it also started making shit up.

AI newfag here, just a crumb of handholding please?

Anonymous 8/2/2025, 9:54:36 PM No.106119444 [Report]

>>106119399
Gonna go show this to the resident pregfag on /b/. Should please him. Nice pregsex vid anon

Anonymous 8/2/2025, 10:01:02 PM No.106119507 [Report] >>106119666

>>106119399
Imagine playing a freeform H-game with scenes like these it'd just generate on the fly

Anonymous 8/2/2025, 10:01:11 PM No.106119509 [Report]

>>106119412
Shut up you lying moralfag pile of shit.

Anonymous 8/2/2025, 10:02:53 PM No.106119524 [Report] >>106119586 >>106119657

If we were magically able to exhaustively try every combination of floating point weights, how smart do you think, say, a 70B transforemr model would be? How much smarter than our current models?
And if we were able to exhaustively try every possible program that would fit on the same computer that runs the LLM, how much smarter would the smartest possible program be than the smartest possible transformer model?

Anonymous 8/2/2025, 10:09:26 PM No.106119586 [Report]

>>106119524
We don't know how "smart" either of those can be. If the model can generalize, it'd be smarter by default. Unless the program can generalize just as well, and then the program would be a general model.
It's a stupid question.

Anonymous 8/2/2025, 10:10:28 PM No.106119601 [Report] >>106119610

How is sex in 3steps?

Anonymous 8/2/2025, 10:11:14 PM No.106119610 [Report] >>106119615 >>106119625

>>106119601
1) in
2) out
3) wipe

Anonymous 8/2/2025, 10:11:55 PM No.106119615 [Report]

>>106119610
/thread

Anonymous 8/2/2025, 10:12:52 PM No.106119625 [Report] >>106119631

>>106119610
>3) wipe
Argument optional?

Anonymous 8/2/2025, 10:13:23 PM No.106119631 [Report]

>>106119625
He asked for 3 steps

Anonymous 8/2/2025, 10:15:02 PM No.106119649 [Report] >>106120350

speaking of transformers
>70b-q8
>9.3k tokens in and {{char}} starts to have some of {{user}}'s traits
god I hate transformers and their attention. if sama's attention sink won't work in gpt-oss I'll become a lecun shill.

Anonymous 8/2/2025, 10:15:33 PM No.106119652 [Report]

Screenshot 2025-08-02 at 17-15-04 support GLM-4.5 MoE models by ddh0 · Pull Request #15026 · ggml-org_llama.cpp · GitHub.png md5: f132860a...

New PR boys.

Anonymous 8/2/2025, 10:15:37 PM No.106119654 [Report]

No but rly is step3 better or worse than glm for sex?

Anonymous 8/2/2025, 10:15:52 PM No.106119657 [Report] >>106120103

>>106119524
>If we were magically able to exhaustively try every combination of floating point weights, how smart do you think, say, a 70B transforemr model would be? How much smarter than our current models?
Significantly. For such a model, it would likely be far more expressive and lose a significant degree of redundancy, so quantization to four bits would likely cripple the model to the point of uselessness. I'd expect there are significantly better models that are possible, but I don't think our current optimizers and training methods are well equipped to find these
>And if we were able to exhaustively try every possible program that would fit on the same computer that runs the LLM, how much smarter would the smartest possible program be than the smartest possible transformer model?
We're basically asking the inverse question of Kolmogorov Complexity here, which is "given a string of information, what is the minimum number of bits you need to represent it?" The inverse question is, "given a specified number of bits, how many strings from some specific domain can you represent?" I'm not sure, but I do think transformer architecture models probably aren't the most efficient in that regard. The theoretical best model might not even be "trainable" or "findable" in a practical sense, so we'd have to come as close as we could to it. I expect somebody will find an architecture that trains as well, increases efficiency gains, and gets closer to that optimal model eventually

Anonymous 8/2/2025, 10:17:34 PM No.106119666 [Report]

ido03_8_thumb.jpg.webm md5: f15c7098...

WebM not supported

>>106119507
>imagine your freeform H-game can only do mosquito bites
Grim.

Anonymous 8/2/2025, 10:28:56 PM No.106119754 [Report]

>>106119399
gross!

Anonymous 8/2/2025, 10:34:20 PM No.106119800 [Report]

>>106119399
>The image was never in our databases.

Anonymous 8/2/2025, 10:34:41 PM No.106119802 [Report]

>>106119399
>n-no anon! you are a schizo and mikutroons are normal
yeah right

Anonymous 8/2/2025, 10:47:19 PM No.106119933 [Report]

>>106119921
>>106119921
>>106119921

Anonymous 8/2/2025, 11:05:29 PM No.106120103 [Report]

>>106119657
For the first question, I think we could maybe make a 7B model as good as a 70B model, but nut anything much more dramatic than that.
The local minima in neural networks generally results in accuracy values that are fairly close to the accuracy values of global minima.
At least when taking into account non CoT models. If we take into account CoT then it becomes a much more nuanced question. It's even possible that our current approach to CoT is fundamentally wrong and the model should think in its own machine language rather than human language for optimal optimal accuracy, and we just don't have enough computational power to find that optimal internal language just from random variations and RL.
As for the second question, I'm not sure how much these formalisms reflect what we think of as intelligence. Suppose we ask an oracle to find the optimal program that runs on current hardware and produces the closest possible approximation to some language dataset within a certain time limit. Once you have it you can't just use it to infer on other datasets. Maybe it could be used as a base to get a more general model, or maybe it's a one off thing that's impossible to adapt to some other task. I don't think we know the answer to that question with our current theoretical knowledge. So in Solomonoff induction, is the intelligence the product of the oracle, or the oracle itself? Like I say, the product of the oracle might not be practically useful. And if it's the optimizer itself, by the no free lunch theorem the only way to get faster inference on some problems (for example those with low Kolgomorov complexity) is by sacrificing performance on other problems, for example those with high complexity. But I don't understand why the no free lunch theorem is true (it seems trivial to find counterexamples that are asymptotically slower for all cases, for example for a problem with description of length n, before finding the answer compute Ack(n)) so I might be wrong.

Anonymous 8/2/2025, 11:36:49 PM No.106120350 [Report]

>>106119649
To be fair, 70B is a pretty old model now. Have you tried K2, Deepseek, etc?