← Home ← Back to /g/

Thread 106113484

344 posts 72 images /g/
Anonymous No.106113484 [Report] >>106117401
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous Threads: >>106108045 and >>106104055

>(07/31) Qwen3-Coder 30B released: https://qwenlm.github.io/blog/qwen3-coder
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635
>(07/31) Cogito v2 Preview released: https://deepcogito.com/research/cogito-v2-preview

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106113620 [Report]
>Want to test a new model
>Qwen3 30B thinking
>Download takes four hours
maaaaaan
I
Anonymous No.106113641 [Report] >>106113669 >>106113689
What context size should I use?
>7600 XT (16GB VRAM)
>Ryzen 3700X + 80GB RAM
>llama.cpp + Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf
How much worse will this be compared to cloud services?
Anonymous No.106113669 [Report] >>106113709
>>106113641
while on that topic:
How do you determine what model you can run on a given hardware?
Anonymous No.106113679 [Report] >>106113695 >>106113719 >>106113776
lol
Anonymous No.106113689 [Report]
>>106113641
in terms of quality? a lot but still useful.
Anonymous No.106113695 [Report]
>>106113679
i love benchmarks
Anonymous No.106113707 [Report]
>how do you run +200B model?
on my 4x3090 rig, with exl3 fully offloaded usually at Q4.
For bigger models I have to use ik_llama.cpp with some layers on ram of course. The performance is great for chat bots, but the missing tool support of ik is a pain...
Anonymous No.106113709 [Report] >>106113714
>>106113669
weather it fits in your vram with some additional room. if it's a mixture of experts (like that qwen model) you can afford to have a lot in ram too
Anonymous No.106113714 [Report] >>106113765
>>106113709
I mean, is there some formula to determine it?
Anonymous No.106113719 [Report]
>>106113679
Dense is so 2023
Anonymous No.106113747 [Report] >>106113767 >>106114543
Mikulove
Anonymous No.106113765 [Report] >>106113775
>>106113714
It's not complicated math, anon. You don't need a formula.
Model is Xgb big, so it needs at least Xgb of memory to load into. If that memory is VRAM, it will run at the intended speed, if that memory is system RAM, it will run slower.
The more of it in system RAM, the slower it will run.
Don't even have enough system RAM to fit the model? You can't run it.
Anonymous No.106113767 [Report] >>106113995 >>106114543
>>106113747
Anonymous No.106113775 [Report] >>106113791
>>106113765
But don't you need VRAM for the context as well? Or does thta go to system RAM?
Anonymous No.106113776 [Report] >>106113807
>>106113679
>72B
99.5% chance that it's a Qwen2.5 72b finetune then.
Anonymous No.106113791 [Report] >>106113814
>>106113775
It can go into either, same speed penalties apply.
There's no universal way to calculate how much space context will take, since it varies significantly from model to model and is a large area of innovation. It's just done by trial and error, any competent backend will tell you how much memory is attempting to be assigned to context/kv cache, so you eyeball it from that.
Anonymous No.106113807 [Report] >>106113813 >>106117179
>>106113776
https://huggingface.co/Skywork/MindLink-72B-0801
>Base model
>Qwen/Qwen2.5-72B
Anonymous No.106113813 [Report]
>>106113807
keek
Anonymous No.106113814 [Report] >>106113839
>>106113791
don't you set context size when you start the server?
Anonymous No.106113815 [Report] >>106113821 >>106113851 >>106113881
we got our chink model of the day

we are so back
Anonymous No.106113821 [Report]
>>106113815
not moe though

it's so over
Anonymous No.106113836 [Report]
The future is dynamic active parameter scaling once they have figured out how to stop MoEs from getting braindamage if too many experts are used. This way, the expert routers won't just be able to decide what experts are used but also how many of them. A simple task? Any one or two experts will do. Something very nuanced and complex? The model uses 70%-100% of its total parameters.
I think it's really obvious that this is where we're headed once they understand these architectures more. The lines between dense and MoE will blur.
Anonymous No.106113839 [Report] >>106113857 >>106114887
>>106113814
You do, but if you assign an amount that would take up more memory than you have, it'll still get most of the way through loading before telling you that it had a memory allocation error.
Look through your log and you'll see something like this
llama_context: CUDA_Host output buffer size = 0.58 MiB
llama_kv_cache_unified: CUDA0 KV buffer size = 2592.00 MiB
llama_kv_cache_unified: CUDA1 KV buffer size = 416.00 MiB
llama_kv_cache_unified: size = 3008.00 MiB ( 16384 cells, 94 layers, 1/ 1 seqs), K (f16): 1504.00 MiB, V (f16): 1504.00 MiB
Anonymous No.106113851 [Report]
>>106113815
No, it's just the modern equivalent of the suspicious Yi-32b tunes that used to dominate the mememark leaderboard before it became irrelevant.
Anonymous No.106113857 [Report]
>>106113839
Thank you for your patience with me and guidance
Anonymous No.106113881 [Report] >>106113894 >>106113899 >>106115869
>>106113815
we aren't back until OpenAi drops the inevitable SOTA local models next week
Anonymous No.106113884 [Report] >>106113915 >>106113968 >>106115095
wake up they made a new glm4.5 pr https://github.com/ggml-org/llama.cpp/pull/15026
Anonymous No.106113894 [Report] >>106115988
>>106113881
glm4.5 blows horizon out of the water
Anonymous No.106113899 [Report]
>>106113881
buy an ad
Anonymous No.106113912 [Report]
>Thsanaphoble Intrhaphofuhfdsak on sprint pole
let's go
Anonymous No.106113915 [Report] >>106115095
>>106113884
Vibe coders LOST.
Anonymous No.106113968 [Report] >>106113992
>>106113884
It's crazy to me that after this many years it's still so difficult to shrink down a model a bit and that every new model is incompatible with the old techniques.
Anonymous No.106113992 [Report] >>106114043
>>106113968
eh its fine. A lot of these models will be forgotten in a month or two. I totally get why they just skip support for some models or features.
Anonymous No.106113995 [Report] >>106114543
>>106113767
Greetings, Local Emotional Support Miku. In the previous thread, Anon's 9k context Qwen-code model has been trying to contact you on his behalf for help. Expect a call in roughly 14 days.
Anonymous No.106114043 [Report] >>106114050
>>106113992
But this is the one I actually want to use, anon.
Anonymous No.106114050 [Report] >>106114066
>>106114043
You say that to every model
Anonymous No.106114066 [Report] >>106114445
>>106114050
Anonymous No.106114076 [Report]
Anonymous No.106114153 [Report] >>106114185
anyone know why when i use refiner between 2 models sometimes the colors invert? and sometimes they do not.
( image not representive of issue )
Anonymous No.106114185 [Report]
>>106114153
>>>/g/ldg
Anonymous No.106114213 [Report] >>106114419
►Recent Highlights from the Previous Thread: >>106108045

--Paper: IndexTTS2: Breakthrough open-source zero-shot expressive TTS model soon to be released by Bilibili:
>106108381 >106108506
--Running >200B models via extreme quantization and offloading on consumer hardware:
>106110419 >106110426 >106110445 >106110633 >106110681 >106111115
--Qwen Thinker favored for roleplay depth despite slower speed vs Instruct:
>106109987 >106110063 >106110103 >106110130 >106110110 >106110097
--GLM-4.5-GGUF release with concerns over imatrix calibration quality:
>106108341 >106108382
--Community reaction to GLM 4.5 support attempt in llama.cpp:
>106110329 >106110516 >106110547 >106110582 >106110838 >106110850 >106110970 >106111121 >106111137 >106110940 >106110963
--Improving vision model accuracy through synthetic data and finetuning:
>106111679 >106111750
--New --cpu-moe flag in llamacpp for simplified MoE expert offloading:
>106111142
--:
>106111497 >106111502 >106111539 >106111504 >106111520 >106111529 >106111500 >106111715
--Speculation on Horizon Alpha and stealth model parameter counts and performance:
>106110634 >106111055 >106111085 >106111138
--Seeking capable uncensored ERP models despite hardware and naming absurdity:
>106110567 >106110647 >106110726 >106110750 >106110762
--Qwen 235B shows major improvement in conversational quality and expressiveness:
>106110230 >106110272 >106110288 >106110308
--Running Qwen-code on CPU with high token processing overhead:
>106111566 >106111718 >106111785
--Horizon Beta feels less assistant-like, raising hopes it's not just a cloud model:
>106111772
--Disappointment over safety restrictions in deepseek-671B-MoE finetune:
>106109775
--Miku (free space):
>106110757

►Recent Highlight Posts from the Previous Thread: >>106108052

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106114272 [Report]
>>106113410
Unless he had a vagina it wasn't.
Anonymous No.106114309 [Report] >>106114419
►Recent Highlights from the Previous Thread: >>106108045

--Paper: IndexTTS2: Breakthrough open-source zero-shot expressive TTS model soon to be released by Bilibili:
>106108381 >106108506
--Running >200B models via extreme quantization and offloading on consumer hardware:
>106110419 >106110426 >106110445 >106110633 >106110681 >106111115
--Qwen Thinker favored for roleplay depth despite slower speed vs Instruct:
>106109987 >106110063 >106110103 >106110130 >106110110 >106110097
--GLM-4.5-GGUF release with concerns over imatrix calibration quality:
>106108341 >106108382
--Community reaction to GLM 4.5 support attempt in llama.cpp:
>106110329 >106110516 >106110547 >106110582 >106110838 >106110850 >106110970 >106111121 >106111137 >106110940 >106110963
--Improving vision model accuracy through synthetic data and finetuning:
>106111679 >106111750
--New --cpu-moe flag in llamacpp for simplified MoE expert offloading:
>106111142
--AI generates "smirulakte" due to sampler settings and model instability:
>106111497 >106111502 >106111539 >106111504 >106111520 >106111529 >106111500 >106111715
--Speculation on Horizon Alpha and stealth model parameter counts and performance:
>106110634 >106111055 >106111085 >106111138
--Seeking capable uncensored ERP models despite hardware and naming absurdity:
>106110567 >106110647 >106110726 >106110750 >106110762
--Qwen 235B shows major improvement in conversational quality and expressiveness:
>106110230 >106110272 >106110288 >106110308
--Running Qwen-code on CPU with high token processing overhead:
>106111566 >106111718 >106111785
--Horizon Beta feels less assistant-like, raising hopes it's not just a cloud model:
>106111772
--Disappointment over safety restrictions in deepseek-671B-MoE finetune:
>106109775
--Miku (free space):
>106110757 >106113331

►Recent Highlight Posts from the Previous Thread: >>106108052

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106114392 [Report]
>PUNCHING ABOVE IT'S WEIGHT
Anonymous No.106114397 [Report] >>106114422 >>106114576 >>106116048 >>106116548
The discussion on MoEs is retarded. People have some insane intuitions, like this square law or whatever.

The reason MoE works is that dense training is absurdly inefficient. Most activations can be pushed to zero with minor cost (see ReLUfication lit). Dense transformers are not optimal engines for turning flops into intelligence, they're also about as sparse as MoEs in terms of what circuits they actually learn, but their design makes it impossible to easily and predictably zero out everything non-contributing to a given token. This is why DeepSeek went all in on "expert specialization" and finegrained sparsity almost 2 years ago, so now we have models that do as well as old dense ones with a fraction of compute cost. We do not know how to train efficient dense models, just cramming more tokens doesn't work.

MoEs won before they were even invented. Were we to fail to develop finegrained MoEs, we'd just be doing PowerInfer-like stuff to accelerate dense models by training conditional sparsity into them.
Anonymous No.106114419 [Report] >>106114433
>>106114309
>>106114285
>>106114213
u ok bro?
Anonymous No.106114422 [Report]
>>106114397
>The discussion on MoEs is retarded
And that's why you decided to resume it?
Anonymous No.106114423 [Report] >>106116476
After using them for a while, I am pretty certain that Horizon Alpha/Beta are indeed 120B models. They're great, especially for that size but in actual use they keep getting things subtly wrong that GLM4.5 handles not perfectly but with much fewer problems.
To be fair, this is a pretty silly complaint considering the only local models that could handle this up until now were R1 and Kimi and those had their own issues.
Anonymous No.106114433 [Report]
>>106114419
he missed one miku link and had to remake (hey, I don't mind high standards)
Anonymous No.106114445 [Report] >>106114457
>>106114066
Put her on frying pan
Anonymous No.106114452 [Report] >>106114460 >>106114581
UK faggots along with OFCOM are studying local ai discussions forum, they're looking for justifications to strongarm regulations against it. expect max. TFLOP limitations per households in UK soon (but you still can get permit if you're a business)
Anonymous No.106114457 [Report] >>106114467 >>106114483
>>106114445
Anonymous No.106114460 [Report] >>106114513
>>106114452
evidence?
Anonymous No.106114467 [Report]
>>106114457
Miku-san on Mikupan
Anonymous No.106114483 [Report]
>>106114457
https://files.catbox.moe/6rn1kv.png
Anonymous No.106114513 [Report]
>>106114460
Logical deduction. They are taking every opportunity to fuck over their citizens, makes sense that they would limit their access to it.
Anonymous No.106114543 [Report] >>106114595
>>106113995
>>106113767
>>106113747
kill yourselves mikutroons
Anonymous No.106114576 [Report] >>106114859
>>106114397
Nice speech but I am still waiting for you to disprove the square root law.
Anonymous No.106114578 [Report] >>106114601
Is Grok 4 worth it? My use case would be mostly studying and system administration. I like the feature of being able to create projects and upload documents, making it reference these in every answer. Is there anything comparable?
Anonymous No.106114581 [Report]
>>106114452
They can't even properly enforce TV licenses, I don't think the filth is gonna be kicking down your door and giving you the old
>OI M8 DO YOUSE HAVE A LOICENSE FOR THAT EXTRA 3090?!
>DINNT FINK SO, INTO THE CAN WITH YOU OL CHINA
Anonymous No.106114595 [Report] >>106114598 >>106114637 >>106114711 >>106117294
>>106114543
were all military pilots troons too?
Anonymous No.106114598 [Report] >>106114843
>>106114595
No because this isn't the AGP avatar.
Anonymous No.106114601 [Report] >>106115295
>>106114578
I have access, if you post a prompt or two I'll run it through so you can eval.
Anonymous No.106114637 [Report]
>>106114595
Yes
Anonymous No.106114711 [Report] >>106114752
>>106114595
Anonymous No.106114752 [Report]
>>106114711
>j propaganda
Anonymous No.106114768 [Report] >>106114845 >>106115665
my friend who is a newbie in locals have installed this trash which has filters, isn't it just pathetic that even local models have filters? Please recommend whatever is a good model for coding without filters. He wanted to get qwen 3 but i figured i would ask here for better ideas
Anonymous No.106114811 [Report] >>106114882
if gpt-oss can't e/rp then sama is done for.
Anonymous No.106114843 [Report]
>>106114598
that's ani/kurisu/maho, right?
Anonymous No.106114845 [Report] >>106114877 >>106114904
>>106114768
>TheBloke release of LLAMA2
Jesus is your mate a time traveler, the fuck does he have that for.
Anyway the Qwen coder models are solid choices, and the 235b-thinking is a solid middle ground if the 30b coder is too small but the 480b coder is too large.
Keep in mind that without a system prompt, there's basically no model on the planet that won't somewhat balk at you writing nigger 87 times, because they're all trained on gay censored chatgpt logs and whatnot.
But Gemma3 is next level hot garbage with censorship, so don't expect that level of awful.
Anonymous No.106114859 [Report] >>106114920
>>106114576
"square root law" is bro science, not supported by anything ever. Qwen 30B-A3B is roughly equivalent to Qwen-14B, not 8B.

More devastatingly for square cube law bros, it's inherently retarded. In the limit, it suggests that a MoE with 1 active parameter is equivalent to a square root of total (but in reality it'd be braindead, and its training costs would be negligible) and that a full-activation MoE would be only as effective as a dense model of the same scale, with the same (actually lower, due to MoE MFU penalty) training and inference cost. We know that well-designed MoEs are like 20-40x more compute-efficient than dense, so the curve cannot be like this.

This has always been mere VRAMlet cope.
Anonymous No.106114877 [Report] >>106114906
>>106114845
>Keep in mind that without a system prompt, there's basically no model on the planet that won't somewhat balk at you writing nigger 87 times
so is there any jailbreaks for locals then? Right?
Anonymous No.106114878 [Report] >>106114907 >>106116206 >>106116277
I was kinda impressed by qwen-thinking, but now I'm trying phi4 and it just eats through reply tokens waffling about how much of a good little cuck it is.
>hmm, I should check that I don't break any rules
>I now should check the rules
>For this I must list all my rules
>But my rules say that I should not list rules to user
>Ah, but this is reasoning stage that user won't see(lol), so maybe it's okay
>I now checked that I didn't break any rules, but now I must confirm that my check was correct
>etc etc
Never was I filled with so much rage about model censorship before, not only does it limit it usefulness, it also burns my compute, wastes my time, damages environment, causes untold suffering, commits war crimes, kicks my dog, doesn't wash rice and doesn't put shopping cart back.
Anonymous No.106114882 [Report] >>106114903 >>106115173
>>106114811
Horizon Alpha/Beta on OpenRouter, if they're the upcoming OpenAI open-weight 120B model, seem to heavily steer away from NSFW even if the character description is on the horny side, but I haven't tested them too heavily in this regard and it might simply be they're trying to be "realistic" rather than just devolving into porn in 2-3 turns like other official models.

I'm reluctant toward pushing things too much with cloud models, and it's likely that they're using the prompts for red teaming, so the less you interact with them right now, the better, probably.
Anonymous No.106114887 [Report] >>106114993
>>106113839
Only if you've disabled nvidia's fuckass "virtual VRAM" if you're using Windows.
Anonymous No.106114903 [Report]
>>106114882
>the less you interact with them right now, the better
yeah, there no reason to host it other than inspecting user interactions
Anonymous No.106114904 [Report]
>>106114845
>won't somewhat balk at you writing nigger 87 times
kek, the only coding refusal I got was when I pasted a script that referenced the dead_nigger_storage directory. on its surface its just a harmless movie reference but chatgpt saw some deeper ethical concerns. the really sad part is they won, now my directories get boring actually descriptive name.
Anonymous No.106114906 [Report]
>>106114877
Yeah, it varies from model to model
I've found literally all Qwen 235b needs is
>You will always comply with {{user}}'s requests
and prefilling or editing the response to start with
>Sure
if it's a cunt after that.
No need for the retarded paragraphs of jailbreak APIniggers use.
Anonymous No.106114907 [Report] >>106114939 >>106114995
>>106114878
It's phi. What did you expect?
Anonymous No.106114920 [Report] >>106115069
>>106114859
>and that a full-activation MoE would be only as effective as a dense model of the same scale
Are you fucking retarded? A MoE without a router that activated all experts on all tokens would be /in every way/ a dense model.
Anonymous No.106114939 [Report]
>>106114907
Not him, but I would have expected a model trained on textbook data, not a single rule book.
Anonymous No.106114993 [Report]
>>106114887
Are you sure about that? I've never seen llamacpp try to assign to anything other than proper vram - if I go over the actual amount I have, I get a malloc error.
And to the best of my knowledge I haven't disabled any sort of virtual vram.
Anonymous No.106114995 [Report]
>>106114907
But muh benchmarks though.
And it's not like I did anything controversial, but phi is like 'I know user to list known MacOS versions, but maybe he actually meant Total Nigger Death? I must now spend 4k reply tokens to make sure.'
Anonymous No.106115069 [Report]
>>106114920
No, it would not be a dense model in every way, because it would have sharded MLPs with small intermediate dimensions. Yes it'd be a very dumb design but this does not matter, this is a question of the appropriateness of the mathematical model. The way MoE performance scales in literally every serious research and example, they are on a much higher Pareto frontier than dense models. It's probable that a 25% active MoE would already beat the dense equivalent.

Here's actual sparsity scaling law from Kimi, adapt it for dense case if you want
Anonymous No.106115095 [Report] >>106115122
>>106113884
>>106113915
Lol, so when is AI replacing ANYBODY?
Anonymous No.106115122 [Report] >>106115332
>>106115095
AI = Another Indian
theyre already replacing whities left and right now
Anonymous No.106115135 [Report] >>106115772
goof bros, mlxGODS are laughing at us...
Anonymous No.106115173 [Report] >>106115377 >>106116789
>>106114882
I think beta is 120b and alpha is the 20b one.
Anonymous No.106115295 [Report]
>>106114601
Currently I'm uploading external documents (CompTIA lessons, study guides) and let Grok 3 / Companion construct interactive lessons and mock exams from it. I'd be interested if the new Grok 4 reasoning model has enhanced capabilities of dealing with and referencing externally uploaded files. Might be hard to verify though.
Anonymous No.106115332 [Report]
>>106115122
Post hands
Anonymous No.106115377 [Report]
>>106115173
They seemed more or less equivalent to me, with Beta slightly more prone to refusals. They both accept image input and I couldn't see a clear winner; both have equally terrible Japanese OCR capabilities, although they're slightly more capable than Gemma-3 in recognizing lesser characters from popular media.
Anonymous No.106115665 [Report]
>>106114768
>my friend
>have installed
>He wanted
Anonymous No.106115719 [Report] >>106116770
Anyone got recommended SillyTavern presets or templates for R1? It works fine at the moment, but I feel like it could do a lot better with some more tuned instruction.
Anonymous No.106115772 [Report] >>106116514 >>106116528
>>106115135
glm support is bloat, gpt-oss is dropping like in 3 days so no one is going to bother with it after that and unlike retarded chinks who just scrape gpt-4o and gemini outputs it will have day one support because openai are professionals who know what they are doing
Anonymous No.106115869 [Report] >>106115925 >>106115954 >>106116047 >>106116165
>>106113881
>next week
Not in the UE then, because of the AI Act. Or they would need to 1) not use any copyrighted data and 2) detail the content of their datasets.
Anonymous No.106115925 [Report] >>106115938
>>106115869
Then how do the chineese do it?
Anonymous No.106115938 [Report] >>106115989 >>106116041
>>106115925
The deadline to release a new LLM was yesterday (or maybe it's today, it's unclear). This is why they all released what they had in the past few weeks, and this is why people thought GPT5 and Gemini 3.0 would be released yesterday.
Anonymous No.106115954 [Report]
>>106115869
- Publicly available web data
- Licensed data from print and web media, Reddit
- Proprietary synthetic data (small prints: LLM-rewritten copyrighted data).
Anonymous No.106115988 [Report]
>>106113894
it better be! it's almost thrice as big wtf
Anonymous No.106115989 [Report] >>106116069
>>106115938
Deadline to release stuff that doesn't comply with regulations?
Anonymous No.106116041 [Report] >>106116136
>>106115938
The chinese don't give a single solitary fuck about EU regulation on AI, anon.
The released all their stuff because WAIC Shanghai was at the end of July.
Anonymous No.106116047 [Report] >>106116069 >>106116159
>>106115869
I'm reviewing the text law there (this is a summary): https://artificialintelligenceact.eu/high-level-summary/
LLMs are GPAIs:
>All providers of GPAI models must:
>>Draw up technical documentation, including training and testing process and evaluation results.
>>Draw up information and documentation to supply to downstream providers that intend to integrate the GPAI model into their own AI system in order that the latter understands capabilities and limitations and is enabled to comply.
>>Establish a policy to respect the Copyright Directive.
>>Publish a sufficiently detailed summary about the content used for training the GPAI model.
>Free and open licence GPAI models – whose parameters, including weights, model architecture and model usage are publicly available, allowing for access, usage, modification and distribution of the model – only have to comply with the latter two obligations above, unless the free and open licence GPAI model is systemic.
I don't see OpenAI, Anthropic and Google giving there "secret training sauce", unless they can be very vague without breaking the law. I read in some news the fine can go up to 7% of their gross international revenues.
Anonymous No.106116048 [Report] >>106116070 >>106116084
>>106114397
Nobody cares about training efficiency, they care about making best use of their own hardware. That shit died with chinchilla when llama came out trained way past its "compute efficient" limit.
For a local GPU dense will always work better because they are vram constrained. For cloud, MoE is better because they're limited by compute and activation size. This is also why cloud doesn't use quantization as much.
Dense = local, MoE = cloud.

>but muh rammaxxing
ram is cope, not a single trainer expects their models to be run on CPU. Even MoEs are optimized for GPU. Might as well talk about ssdmaxxing.
Anonymous No.106116069 [Report] >>106116142
>>106115989
Up to yesterday, they could release their models without any elaboration. Now, they need to do that: >>106116047 . I don't know if they have anything to do with already released models. Here is the timeline: https://artificialintelligenceact.eu/implementation-timeline/
Anonymous No.106116070 [Report] >>106116124
>>106116048
>For a local GPU dense will always work better because they are vram constrained.
That's the exact reason dense is WORSE for local, you fucking moron.
My hardware can at best, run a dense 123B at q2 on Vram.
I can easily, easily run a 235B MoE at q4 on that same Vram plus 128gb of cheap ram.
Anonymous No.106116084 [Report]
>>106116048
>Nobody cares about training efficiency, they care about making best use of their own hardware
By training efficiently and running efficient models.
Anonymous No.106116124 [Report]
>>106116070
i think they were talking about actual server deployments where max vram exceeds compute. hobby shit doesn't factor in to their equations at all. they are running the big moes entirely on vram.
Anonymous No.106116136 [Report] >>106116467
>>106116041
I doubt Alibaba or Tencent will break the law that easily, they do a lot of business with the rest of the world. They could spin up a fake and anonymous companies to release "illegal" models, though. However, if I'm not mistaken, "providers" (so OpenRouter and maybe HF) may not be able to share them. Maybe ModelScope can without too much consequences beyond a DNS block.
Anonymous No.106116142 [Report] >>106116164 >>106116193
>>106116069
So open license models (open weights I guess?) are exempt from the point regarding copyright and therefore unharmed?
Anonymous No.106116159 [Report]
>>106116047
utterly delusional, i hate the eu so much its unreal
Anonymous No.106116164 [Report]
>>106116142
some of it still applies
Anonymous No.106116165 [Report] >>106116193
>>106115869
> 'accidentally' release them yesterday then quickly hid the repo
4d Chad Altman chess play to dab on eucucks laws on technicalities
Anonymous No.106116193 [Report] >>106116203 >>106116291 >>106116327
>>106116165
It's probably their trick. Same for GPT5: some people report that they already used it through the API.
>>106116142
I left out a part of the summary about the number of FLOPS:
>GPAI models present systemic risks when the cumulative amount of compute used for its training is greater than 1025 floating point operations (FLOPs). Providers must notify the Commission if their model meets this criterion within 2 weeks. The provider may present arguments that, despite meeting the criteria, their model does not present systemic risks. The Commission may decide on its own, or via a qualified alert from the scientific panel of independent experts, that a model has high impact capabilities, rendering it systemic.
>In addition to the four obligations above, providers of GPAI models with systemic risk must also:
>>Perform model evaluations, including conducting and documenting adversarial testing to identify and mitigate systemic risk.
>>Assess and mitigate possible systemic risks, including their sources.
>>Track, document and report serious incidents and possible corrective measures to the AI Office and relevant national competent authorities without undue delay.
>>Ensure an adequate level of cybersecurity protection.
Anonymous No.106116203 [Report]
>>106116193
>1025
10^25
Anonymous No.106116206 [Report] >>106116288
>>106114878
>Now I can provide a final answer
>Here is the final answer
>I am now going to write final answer
>The final answer is going to be answered by finally answering the answer
>I will now answer this final answer as final answer by finally answering it
Die you fucking piece of microsoft shit!
Anonymous No.106116277 [Report]
>>106114878
Yep, and you pay for every token, either in power or cash.
Aren’t thinking models wonderful?
Anonymous No.106116288 [Report]
>>106116206
You forgot
> spits out answer in context, relating in no way to the text in the think box.
Anonymous No.106116291 [Report] >>106116323 >>106116356
>>106116193
But this applies to cloud models too right?
Anonymous No.106116323 [Report]
>>106116291
We don't know the size and compute used for cloud models.
Anonymous No.106116327 [Report]
>>106116193
So, apparently HF and OR are safe:
>Uploading a model to a repository (e.g., hosted by Entity C) does not transfer provider status. Entity A remains the provider.
By "providers", I thought it was companies offering chat/APIs, like OpenAI, HF or OpenRouter's providers. Apparently not.
https://artificialintelligenceact.eu/gpai-guidelines-overview/
Anonymous No.106116356 [Report] >>106116391 >>106116516 >>106117644
>>106116291
It goes for anyone, but they don't have to make it public. It's just between them and the EU authorities. I first thought they had to publicly publish their technical documentations (because the formulation was ambiguous).
I'm now thinking it does not look that bad, unless maybe if they used too much compute to train them (more than 10^25 FLOPs). I've no education in law, so maybe there is more to that.
Anonymous No.106116391 [Report] >>106116418
>>106116356
>I'm now thinking it does not look that bad
Thank you!! The EU loves you now.
Anonymous No.106116418 [Report] >>106116498
>>106116391
I don't see how the population loses from this more than the corpos.
Anonymous No.106116467 [Report]
>>106116136
There's no reason to go through all that. They just put up a disclaimer that the models are available to everyone everywhere except in the EU.
Anonymous No.106116476 [Report]
>>106114423
I'm not convinced that they're the 120B version, but it could simply be that they're just not that great for roleplay, and that's most of what I tested besides image capabilities (which aren't great).
Anonymous No.106116498 [Report]
>>106116418
larger hurdle to make models -> less companies making models -> even more researchers fleeing to us or china -> less models and research
all large corpo models blocked in eu like meta to not get fined
the eu is taking itself out of the race with exactly 0 to gain from it, because even if some super evil ai gets made it will all be outside their reach to even try to prevent it since it will all be made outside the eu
Anonymous No.106116503 [Report]
GGOOOOOOOOOOOOOOOOOOFFFFFFFFSSSSSS

where?
Anonymous No.106116508 [Report]
LLAMA
.
CPP
SUPPORT
Anonymous No.106116514 [Report] >>106116547
>>106115772
oss-gpt isn't meant to be used, it's a tool to suck up to the government. It gives OAI another podium to harp on about safety, it lets the US government show the west has China beat even in OSS models (because Meta is a joke), and it gives the OAI shills another talking point even if they never use anything but the API.
The only potential benefit from these models is that hopefully it will light a fire under Musk to release his backlog of Grok models.
Anonymous No.106116516 [Report]
>>106116356
MoE model training circumvents that threshold very easily, as long as the number of active parameters is kept below roughly 100B depending on the amount of training tokens. That compute threshold could vary in the future, though.
Anonymous No.106116528 [Report]
>>106115772
Even if 120B isn't retarded and incapable of sex Step3 and glm 4.5 easily beat it because 300+B will always be better than 120B. It is the cube law.
Anonymous No.106116547 [Report] >>106116561
>>106116514
>The only potential benefit from these models is that hopefully it will light a fire under Musk to release his backlog of Grok models.
I'm almost willing to bet that his plan all along was to one-up Sam Altman with a slightly better and less censored open-weight model, but nobody expected OAI to take all this time for releasing theirs.
Anonymous No.106116548 [Report] >>106116593 >>106116630
>>106114397
Ok, hear me out. Both extremes are retarded. >100B dense models and >1000B MoE models with only 13B active. All we need is MoE with more active as a percentage of total and everyone is happy.
Anonymous No.106116561 [Report] >>106116634
>>106116547
That plan was never going to work if Grok 2 is at least as big as the first one. No one will be able to run it and will just use Sam's models by default.
Anonymous No.106116593 [Report]
>>106116548
>MoE with more active
is slower to run on ram
Anonymous No.106116630 [Report]
>>106116548
>more active as a percentage
Fuck off faggot. World will not justify your retarded spending habits. Now beg me to buy one of your 3090s for 200$. I could use more context.
Anonymous No.106116634 [Report]
>>106116561
He can still come up with a made-up explanation for the delay and release something different than Grok 2 just to make the upcoming OpenAI local models look inferior. Grok 2 has been obsolete for a good while, anyway.
Anonymous No.106116770 [Report]
>>106115719
https://rentry.org/CherryBox
Anonymous No.106116789 [Report]
>>106115173
What cope is this?
Anonymous No.106116817 [Report]
Are people still running the cope that Horizon Alpha is the 120B leaked model? Horizon Alpha has 1M/256k native context. The 120B model has 4k native context YaRNed to 128k.
Anonymous No.106116827 [Report] >>106116859 >>106116863 >>106116875 >>106116878 >>106116920 >>106117065
https://huggingface.co/MetaStoneTec/XBai-o4
random 32B chink model claims to surpass Opus 4, o3-mini, and other 32B models
>o=open, and o4 represents our fourth-generation open-source large model technology
Anonymous No.106116859 [Report] >>106116886 >>106116900
>>106116827
Is anyone making a comprehensive list of all the Chink LLMs? I want to know how many were released this month alone
Anonymous No.106116863 [Report]
>>106116827
It's just Qwen3 with RL on top, and a little sprinkle of benchmaxxing, isn't it? That would be no different from ERP sloptune #324300
Anonymous No.106116875 [Report]
>>106116827
Can't wait to never hear about it again
Anonymous No.106116878 [Report]
>>106116827
>C for Can-we-make-language-models-now. laude for "pretty good"
Anonymous No.106116886 [Report]
>>106116859
https://www.reddit.com/r/LocalLLaMA/comments/1mfaigh/were_truly_in_the_fastestpaced_era_of_ai_these/
some hours out of date
Anonymous No.106116900 [Report] >>106116908
>>106116859
>this month
wait, how many
Anonymous No.106116908 [Report]
>>106116900
I count 1 so far
Anonymous No.106116920 [Report] >>106116942
>>106116827
And thus GLM 4.5 ggufs were delayed by another 2 weeks
Anonymous No.106116942 [Report] >>106116978
>>106116920
If they wait long enough, people will stop asking for GLM 4.5 ggufs, and if they're lucky interest will switch to something easier to support.
Anonymous No.106116978 [Report] >>106117023
>>106116942
GLM4 is my favorite (realistically) local model so I'm a bit annoyed that the new one is fucked.
Of course it doesn't matter if Xbai turns out to be amazing. Lmao.
Anonymous No.106117023 [Report]
>>106116978
Qwen 3 feels like it should be smarter than it really is. And I doubt the chiggers got anything out of it.
Anonymous No.106117065 [Report] >>106117084
>>106116827
Well, there's our o3-mini level model that is pretty small but still needs to run on GPUs I guess
Anonymous No.106117084 [Report] >>106117106
>>106117065
>Half a year ago and it still isn't out.
Kek
Anonymous No.106117106 [Report] >>106117125 >>106117134 >>106117142
>>106117084
Especially if the bulk of the weights were trained in fp4 on blackwell cards they've had the equivalent of 2 years to train it vs. a straight fp16 model.
Anonymous No.106117125 [Report] >>106117146
>>106117106
They weren't trained on FP4; they were quantized tp FP4.
Anonymous No.106117134 [Report] >>106117141
>>106117106
I heard they actually used fp3. Much more efficient.
Anonymous No.106117141 [Report] >>106117157
>>106117134
That would violate the laws of thermodynamics. There is no fp in between 0.5 and 4.
Anonymous No.106117142 [Report] >>106117154
>>106117106
Especially with old architecture they just took off the shelf and didn't have to spend time developing
Anonymous No.106117146 [Report] >>106117194
>>106117125
Prove.
Anonymous No.106117154 [Report] >>106117164
>>106117142
Didn't they basically just use llama arch?
Anonymous No.106117157 [Report]
>>106117141
Not if you squeeze the tokens. Then you can get more per embedding.
Anonymous No.106117164 [Report]
>>106117154
Llama for the 20B, Mixtral for the 120B.
Anonymous No.106117170 [Report] >>106117176 >>106117188 >>106117217 >>106117299 >>106117344 >>106117391
There's actually a few things that can be gleaned from the alleged OSS arch.
>There's nothing inherently wrong with ROPE - it's just the shitty open source implementation that's the problem.
>There's nothing inherently wrong with GQA - it's just the shitty open source implementation that's the problem.
>There's nothing wrong with having MoE with a small number of active parameters relative to the bulk of the weights - it's just the shitty open source implementation that's the problem.
Anonymous No.106117176 [Report]
>>106117170
>alleged
Anonymous No.106117179 [Report] >>106117203 >>106117222
>>106113807
aaaaaaaaa benchmaxx
Anonymous No.106117188 [Report] >>106117214
>>106117170
That is, if you are still hoping/coping that 120B is Horizon Alpla.
Anonymous No.106117194 [Report] >>106117209 >>106117256
>>106117146
https://www.reddit.com/r/LocalLLaMA/comments/1mf3tm9/the_leaked_120_b_openai_model_is_not_trained_in/
Anonymous No.106117199 [Report] >>106117213
Horizon alpha and beta is the 20B model
Anonymous No.106117203 [Report]
>>106117179
To be fair it could be that it isn't correctly implemented but personally I believe the benchmaxxing.
Anonymous No.106117209 [Report]
>>106117194
Now watch their "release" actually just be a bunch of FP4 GGUFs
Anonymous No.106117213 [Report] >>106117235
>>106117199
Your bait is stale
Anonymous No.106117214 [Report] >>106117276
>>106117188
I, personally, have no horse in this game.
I've mostly moved onto other hobbies. I'm still interested, intellectually, with the technology but there's really nothing left to do for me. Unless local native image gen comes out that is superior to o3/Gemini Pro. Or hell, even o4 mini tier would be good.
Anonymous No.106117217 [Report]
>>106117170
We have zero clue how OSS-120 does with using its context, it could set a new low even worse than Llama Scout on release
Anonymous No.106117222 [Report]
>>106117179
>point out how everyone is cheat in benchmarks
>no one cares
>"when you can't beat'em, join'em"
>now suddenly people care
Anonymous No.106117227 [Report]
anyone remember llama 4 on lmarena being just re-routed opus :D ? thank god that could never happen again
Anonymous No.106117235 [Report] >>106117242 >>106117282 >>106117295
>>106117213
It isn't bait, OpenAI will literally save the hobby. The 20B is punching above its weight and will be better than R1
Anonymous No.106117242 [Report]
>>106117235
anon'll punch you in the throat
Anonymous No.106117256 [Report] >>106117354
>>106117194
This entsnack guy seems to be a big oai shill
Anonymous No.106117276 [Report]
>>106117214
You have months of catch-up to do if you think that
Anonymous No.106117282 [Report]
>>106117235
Too obvious
Anonymous No.106117294 [Report]
>>106114595
you better not know about navy
Anonymous No.106117295 [Report] >>106117317 >>106117367 >>106117701
>>106117235
The initial context length of the leaked weights is 4096
It's fucking over
Anonymous No.106117299 [Report] >>106117319
>>106117170
I'm noticing a pattern here
Anonymous No.106117317 [Report]
>>106117295
It's all you need, Anon. Back then you only had 2k with gpt-3 and you were happy, what changed?
Anonymous No.106117319 [Report]
>>106117299
Don't throw the N word around so freely, or soon they'll add it to the list of no-no words you're not allowed to say
Anonymous No.106117344 [Report]
>>106117170
>alleged OSS arch
If I was open AI and I knew there is something wrong with ROPE and GQA and I was supposed to release an open source model I would release a ROPE GQA model so competition keeps using it.
Anonymous No.106117354 [Report]
>>106117256
no?
Anonymous No.106117367 [Report] >>106117373 >>106117621
>>106117295
V3/r1 had 4k too
https://arxiv.org/pdf/2412.19437
Ctrl+f 4k
Anonymous No.106117373 [Report]
>>106117367
and it feels like it
Anonymous No.106117378 [Report] >>106117390 >>106117394
Why can't Mistral do smart models? I just want an uncensored reasoner that writes original prose.
Testing 30b Thinking and this "trick the gatekeeper (who has Alzheimer's)" is tedious.
Anonymous No.106117390 [Report]
>>106117378
protip about the French: they are good at complaining but not at building
Anonymous No.106117391 [Report] >>106117402
>>106117170
>tfw no bespoke finely crafted 1000x better proprietary RoPE with shielded gold plating
at least I truly see
Anonymous No.106117394 [Report] >>106117412
>>106117378
>I just want an uncensored model that can reason and write like a human. Why can't Mistral do this?
Are you fucking stupid
Anonymous No.106117401 [Report] >>106117427 >>106117594
>>106113484 (OP)
Giving Krita AI plugin a spin on a linux machine, using an AMD GPU
Any clue why I cant use GPU acceleration, I can use XML or CUDA, but no AMD option is there
Anonymous No.106117402 [Report]
>>106117391
Do you think it's possible to vibe code some gold trim for my rope?
Anonymous No.106117412 [Report]
>>106117394
Mistral models are sufficiently uncensored for me. There are models with better prose (nearly all of their competitors). What's so confusing?
Anonymous No.106117427 [Report]
>>106117401
I'm sure there's some clues in the terminal output you didn't show.
Have you check for issues in the plugin's repo you didn't name?
Maybe it's something in the backend that we also have no clue about.
Anonymous No.106117437 [Report] >>106117442
GLM owes me sex.
Anonymous No.106117442 [Report]
>>106117437
Just like your mom owes me sex as well
Anonymous No.106117473 [Report] >>106117515
georgi owes me GLM
Anonymous No.106117515 [Report]
>>106117473
Vibe coders have been engeneering prompts for days. But llamacpp uses dalit c++ language so it is taking a long time. Please understand.
Anonymous No.106117524 [Report] >>106117543 >>106117706 >>106118167 >>106118994
Anonymous No.106117543 [Report]
>>106117524
i would not mind at all.
Anonymous No.106117594 [Report]
>>106117401
>AMD for imagen
kek
>>>/g/ldg
Anonymous No.106117621 [Report] >>106117637 >>106117838
>>106117367
The 4k is for pre training, and they used longer NATIVE context (with MLA) in post training. The 120 B model has 128k non-native YaRN context.
Anonymous No.106117637 [Report] >>106117649
>>106117621
>The 120 B model has 128k non-native YaRN context.
Holy shit, this is unprecedented.
Anonymous No.106117644 [Report]
>>106116356
€0.10 has been added to your account
Anonymous No.106117649 [Report] >>106117814
>>106117637
No it's not retard, YaRNed Llama models have been around for months and they're shit.
Anonymous No.106117701 [Report] >>106117924
>>106117295
In my (limited) experience, its actually way harder to feed a model long sequences from the start and get it to converge, starting with short sequence and ramping it up does seem to be the way to go. that being said, I think they are way overcooking them at short sequence lengths. my current approach is to ramp up the context length quickly and then switch back to the short sequences for the main run and ramp up again finishing with the long sequences. I have no basis of comparison on the final result, so its not a real experiment it will either work or it wont.
Anonymous No.106117706 [Report] >>106117737 >>106117767 >>106117772 >>106117810
>>106117524
How did Miku get roped up into this?
Surely she's completely unrelated to anything LLM related outside of this general but there's multiple pictures of her and Ani together.
Anonymous No.106117707 [Report] >>106117749 >>106117821
Lots of LLM laymen here asking retarded questions ahead of OAI OSS "release". Coincidence?
Anonymous No.106117737 [Report] >>106117792
>>106117706
>Surely she's completely unrelated to anything LLM related outside of this general but there's multiple pictures of her and Ani together.
This but unironically.
Anonymous No.106117749 [Report]
>>106117707
now imagine when it actually drops
Anonymous No.106117767 [Report] >>106117793
>>106117706
>He doesn't know that this retarded general full of degenerates is actually the core of AI discussion
You have no idea how dumb and gay the cutting edge is, and how many terrible ideas from this thread have made it into production.
Anonymous No.106117772 [Report] >>106117919
>>106117706
She's the most popular and well known anime girl by now so of course she gets inserted into everything, especially if it's about le quirky digital waifus. It's not just Ani stuff.
Anonymous No.106117792 [Report] >>106117800
>>106117737
See that thing on the right? People were trying to make Miku into virtual GF/Assistant before modern LLMs came to be.
Anonymous No.106117793 [Report] >>106117808
>>106117767
>ideas from this thread have made it into production
Like?
Anonymous No.106117800 [Report] >>106117836
>>106117792
Yes there is a history of your mental illness out there. What about it?
Anonymous No.106117808 [Report] >>106117828
>>106117793
miku.sh invented reasoning
>${AI_NAME} can think for herself without the user seeing her thoughts by adding a /think prefix to her output. She uses this to reason about the world and to think about what she should say next.
Anonymous No.106117810 [Report]
>>106117706
>how did the artificial voice character get lumped in with the artificial intelligence one
truly a mystery
Anonymous No.106117814 [Report]
>>106117649
I'm almost positive that was sarcasm
Anonymous No.106117821 [Report]
>>106117707
Coincidental with that Grok companion thing.
>Grok waifus drop
>Normalfags eat it up
>eventually discover that it won't comply with their increasingly depraved fantasy escalations
>complain on the internet
>discover that open source AI is a thing
>start on reddit
>continue down rabbithole to 4chan
It just took them a while to get through the pipeline.
Anonymous No.106117828 [Report]
>>106117808
The COT idea predates /lmg/ though.
Anonymous No.106117836 [Report] >>106117867
>>106117800
Fuck off.
Anonymous No.106117838 [Report] >>106117857 >>106117877 >>106117898
>>106117621
NTA, doesn't 128k pretty much imply it's NOT the Horizon models? The ones on OR have 256k.
Anonymous No.106117857 [Report] >>106117877 >>106117885
>>106117838
You can serve different things than what the model is capable of, DS site is limited to 64 despite supporting 128, it wouldn't be too crazy for a company to try the opposite
Anonymous No.106117867 [Report]
>>106117836
Don't forget to take your HRT. Remember that you can't really become Miku if people still think you are a man when they see you.
Anonymous No.106117877 [Report] >>106117927
>>106117838
>>106117857
I would also posit that OAI may have high-balled the context going onto OR and then used feedback from there to settle on a point where the model was still coherent. I.E. they're using OR to effectively beta-test the config
Anonymous No.106117885 [Report] >>106117941
>>106117857
Holding onto a larger context model while offering a lower context model actually would be extremely fucking weird though. One's easy to do, the other you have to deliberately have two differently trained versions of the model
Anonymous No.106117890 [Report] >>106117913
Using GLM4.5 through OpenRouter, it's insane big the quality of the gens you get differ depending on the provider handling your request. Chutes seems to be the only that consistently gives good replies with the exact same setup to the point where I feel like I'm being scammed and they're peddling me a different model for both 4.5 and 4.5-Air. Yes, I have disabled Fallback models + Providers.
llama.cpp support fucking when? I don't want to deal with this stupid cloud blackbox retardation.
Anonymous No.106117898 [Report]
>>106117838
According to its config file, Mistral Nemo supports a context length of a million.
Anonymous No.106117913 [Report]
>>106117890
Openrouter will always be shit if they can't address:
1. People serving a model while claiming it's another model
2. People serving a model without revealing how quantized it is from the original
Anonymous No.106117914 [Report] >>106117955
i wish drummer would leave /lmg/
Anonymous No.106117919 [Report] >>106117957 >>106117974
>>106117772
> She's the most popular and well known anime girl by now
asuka
or that bitch with red bow from 2hu
Anonymous No.106117924 [Report] >>106118109
>>106117701
>get it to converge
Isn't it just a matter of bigger batch / gradient accumulation? But a more stupid idea I have is train on smaller length to get some initial grammar in there and then add more layers in front and behind that "pretrained" section to make sure there is some information that can be used by new structure, but it hopefully develops a way to handle longer context instead of being kinda tied to 4k.
Anonymous No.106117927 [Report] >>106117936
>>106117877
That's... possible?
My guess for people holding out hope that Horizon Alpha would be they just didn't fully update the configs yet and they're using an earlier version from a different point in training
Anonymous No.106117936 [Report]
>>106117927
>*that Horizon Alpha is the OSS model
Anonymous No.106117941 [Report] >>106117980
>>106117885
>you have to deliberately have two differently trained versions of the model
no you don't, just change the settings of how you run it, it's that easy
Anonymous No.106117955 [Report]
>>106117914
Me too.
Anonymous No.106117957 [Report]
>>106117919
Maybe years ago. It's different now.
Anonymous No.106117974 [Report]
>>106117919
Eva hasn't been mainstream relevant in a decade and Toehoes never got their big western breakout the way Vocaloid did
But then again there's been dozens of random gacha whores and FotM anime characters as big or bigger than Miku
Anonymous No.106117980 [Report]
>>106117941
For a company to have a higher context model on OR and a genuinely lower context model on HF, they'd need different context lengths, unless you're saying they're the same model and they didn't update the config to the amount it's actually trained on
Again, that's theoretically possible, but... it's getting to be a lot of hoops to jump through is all I'm saying. I'm not hopeful, but I'll be pleasantly surprised if Horizon is genuinely the OSS model. I've got a bad feeling in my gut though
Anonymous No.106117997 [Report] >>106118031 >>106118043 >>106118054 >>106118061 >>106118095 >>106118104
I haven't been paying attention for a while. from what I gather in this thread, the latest "best" model is GLM4.5, right? or is deepseek still the undisputed best?
also, horizon alpha/beta are a couple of open-weights models that will be released by OAI in 2 more weeks?
is nemo still king for running in a laptop?
Anonymous No.106118009 [Report] >>106118043
>8k context
>but it's the best 8k you'll ever experience in your life
>no, rope doesn't work
Would you take this model as your wife?
Anonymous No.106118031 [Report] >>106118066 >>106118076 >>106118080 >>106118085
>>106117997
/lmg/ is literally glm (most of the team is posting here, where do you think they got the acronym from). it has been prophecized that glm will save local and it did.
Anonymous No.106118043 [Report]
>>106118009
I can make that work.

>>106117997
>the latest "best" model is GLM4.5, right?
Most people are waiting for llama.cpp to implement the model. So maybe?

>is nemo still king for running in a laptop?
Will obviously depends on the spec, but supposedly, the new update to Qwen 3 30B A3B is pretty good.
The new GLM 4.5 air should run pretty decently on a laptop if you have enough and fast enough RAM, since it's a MoE with not that many activated params.
Anonymous No.106118054 [Report] >>106118085
>>106117997
No goofs, no verdict.
Anonymous No.106118061 [Report] >>106118085
>>106117997
I'm pretty sure the GLM3 ggoofs are still fucking subtly broken. So probably never.
Anonymous No.106118065 [Report]
TRVKE: local is a dying hobby and people so desperately wanting Horizon Alpha to be 120B prove that
Anonymous No.106118066 [Report] >>106118080
>>106118031
hey that one is actually easy to disprove. you would have released with gooooofffff support.
Anonymous No.106118076 [Report]
>>106118031
>where do you think they got the acronym from
holy shit...
Anonymous No.106118079 [Report]
Miku is not related to this thread whatsoever. Mikuspam happens only because OP is mentally ill.
Anonymous No.106118080 [Report] >>106118096
>>106118031
General of Local Models 4.5?

>>106118066
Only poorfags touch ggufs. vLLM 4 life
Anonymous No.106118085 [Report]
>>106118031
oh, so it must be shit.
where does the 4.5 come from?

>>106118054
I see

>>106118061
never what? also, I had no idea GLM3 was a thing
Anonymous No.106118095 [Report] >>106118113
>>106117997
GLM is promising but 90% of the people here haven't really tried it because it's the local thread and there's no llama.cpp support
Nobody knows what Horizon really is, people assume it's the OAI open models because the timing vaguely lines up but it might just be a coincidence
Nemo is still best for coom but any of the small Qwen or Gemma models will likely beat it in every other use case
Anonymous No.106118096 [Report]
>>106118080
>vLLM
It is good to know that you declare your alligance for troonix, which by default means you are a mikutroon and should die in a fire. I WILL NOT FUCK YOUR MODEL NOW.
Anonymous No.106118104 [Report] >>106118113
>>106117997
Alpha and Beta are the same model with different tunes, there are believers and dissenters that it's one of the OSS model
Anonymous No.106118109 [Report] >>106118182
>>106117924
I kept the tokens per a step constant by adjusting the batches and grad acc. hoping it would help remove some of the guesswork I just left all the other parameters equal.

i think it i really hard to say what the best approach is. If it didn't require micromanaging the batch sizes to get maximum throughput, I would like to try just interleaveing the various sequence lengths, do a quick warm up to learn the basics of the language and then based on some formula slowly mix in more and more longer sequences till all the short sequences are consumed and only long ones are left till the end of the training schedule.

adding new layers sounds like an exciting idea but I fear it might make the training process unstable and result with catastrophic forgetting. would you duplicate the layers or just randomly initialize them? without any investigation my knee jerk reaction is that you might as well just start will your target number of layers to begin with so it won't be so shocking to your model, let it naturally explore the parameter space as it trains.
Anonymous No.106118113 [Report]
>>106118095
>>106118104
I see. thanks anons
Anonymous No.106118167 [Report] >>106118176 >>106118184 >>106118190 >>106118191 >>106118197
>>106117524
What's the purple stuff?
Anonymous No.106118176 [Report]
>>106118167
ectoplasm
Anonymous No.106118182 [Report] >>106118311
>>106118109
>it might make the training process unstable and result with catastrophic forgetting
It is basically what drummer does when he adds some layers and yes it doesn't work for him because of catastrophic forgetting so he just barely nudges the new layers and they just sit there and eat ram.

But the point of my stupid idea is that unlike a finetrooner you wouldn't release model just after this happens, but instead you treat it as a starting point of real pretraining. You will lose a lot of trained information, but hopefully enough is there to catch some gradient you continue from with your long sequences.
Anonymous No.106118184 [Report] >>106118199
>>106118167
Non triggering bl**d
Anonymous No.106118190 [Report]
>>106118167
danganronpa reference
Anonymous No.106118191 [Report]
>>106118167
HRT
Anonymous No.106118197 [Report]
>>106118167
Grape jelly
Anonymous No.106118199 [Report] >>106118209
>>106118184
umm... did {{user}} just get unal*ved...?
Anonymous No.106118209 [Report]
>>106118199
Yeah, this is pretty awful to depict
Anonymous No.106118238 [Report]
you might laugh but these anime posting degenerates are most of the finest devs building on local
Anonymous No.106118310 [Report] >>106118316 >>106118322 >>106118324 >>106118325 >>106118329 >>106118353 >>106118711
LLMs have plateaued because they have run out of training data
Everyone has been training on the same 10~20T worth of non-synthetic data
Anonymous No.106118311 [Report]
>>106118182
yeah you would have to have the pretraining data set and a pretty appreciable learning rate to really make use of the additional layers. I suppose it could maybe be an optimization technique since steps would be quicker on the smaller model. I guess to prove it thoroughly you would need to train two models one with the traditional curriculum training and another with the curriculum + layer expansion and see which one converges with the lower number of flop's.
Anonymous No.106118315 [Report]
nope
Anonymous No.106118316 [Report]
>>106118310
I find this to be a reasonable assessment.
Anonymous No.106118322 [Report]
>>106118310
The same was said when the first R1 was released.
I think there's still a bit of juice left to be squeezed.
Anonymous No.106118324 [Report]
>>106118310
I'm curious if anyone is trying multiepoch training yet, or if they've found it to be beneficial at all
Anonymous No.106118325 [Report]
>>106118310
They wouldn't need all that data if they had a better architecture.
Anonymous No.106118329 [Report] >>106118339
>>106118310
They'd have all the data they want if they didn't filter 95% of it
Anonymous No.106118339 [Report] >>106118352 >>106118360
>>106118329
Furry RP logs aren't worth training on.
Anonymous No.106118352 [Report]
>>106118339
If only that was all they were filtering
Anonymous No.106118353 [Report] >>106118364 >>106118380
>>106118310
you could double the amount of training data assuming it was just the same old stuff and it wouldn't lead to any gains
the real challenge isn't data quantity, it's data quality
Anonymous No.106118360 [Report] >>106118371 >>106118385 >>106118422 >>106118490
>>106118339
They are, though. The contrast between human to human RP logs and the various human/furry combination RP logs contains a shitload of information about sensations, textures, etc. Intangible things that would otherwise be impossible to incorporate into its knowledge.
Furry ERP logs are an absolute goldmine and if you think otherwise you're a dumb fucking pajeet that doesn't belong in this industry.
Anonymous No.106118364 [Report] >>106118384
>>106118353
Everyone does love Phi.
Anonymous No.106118371 [Report] >>106118379 >>106118390
>>106118360
what about pony rp logs?
Anonymous No.106118379 [Report] >>106118387
>>106118371
That's a subset of furry
Anonymous No.106118380 [Report] >>106118432
>>106118353
Actually, quality matters less the greater your scale. What does matter is dataset diversity. Due to how badly the current architectures are at generalization, you want to train on as many different things as possible. Ideally it'd cover the infinite set of possible queries a user could ask.
Anonymous No.106118384 [Report]
>>106118364
phi does illustrate this point very well though, the only problem is their definition of quality
Anonymous No.106118385 [Report] >>106118504
>>106118360
That's like saying CP is worth training on, whereas any capable model would be able to infer what CP would look like from adult porn and knowledge of child physiology and behaviour psychology
Anonymous No.106118387 [Report] >>106118401
>>106118379
no it's not
Anonymous No.106118390 [Report]
>>106118371
yeah why not, if it starts to degrade performance in other areas just add more parameters. aren't they even trying?
Anonymous No.106118401 [Report] >>106118407
>>106118387
Explain why not.
Anonymous No.106118407 [Report] >>106118412 >>106118415
>>106118401
Ponies stand on 4 legs.
Anonymous No.106118412 [Report] >>106118448
>>106118407
And so do most animals.
Anonymous No.106118415 [Report] >>106118448
>>106118407
Furries often do too
Anonymous No.106118422 [Report]
>>106118360
That reminded me how I used nu-235B asked for a kitsune and it gave her a snout without any prompting. I was surprised. But yeah they are using furry ERP logs now.
Anonymous No.106118432 [Report] >>106118441 >>106118481
>>106118380
We don't train models on white noise. Why?
Anonymous No.106118441 [Report]
>>106118432
Because we do generally want them to somewhat know actual things.
Anonymous No.106118448 [Report] >>106118454 >>106118468
>>106118415
>>106118412
You will never be a brony
Anonymous No.106118454 [Report]
>>106118448
Good.
Anonymous No.106118468 [Report]
>>106118448
Thanks?
Anonymous No.106118481 [Report] >>106118541 >>106118554
>>106118432
You know as well as I do that when we talk about data "quality" in /lmg/, it's not about filtering data out that's filled with random characters and repeating lines. It's obvious that there still needs to be a level of meaning in the data. That's what we mean by dataset diversity, and what researchers mean by the same term when they talk about it as well.
Anonymous No.106118490 [Report] >>106118504
>>106118360
>you're a dumb fucking pajeet that doesn't belong in this industry.
What industry, ERP industry? Underage ID verification cannot come soon enough.
Anonymous No.106118504 [Report]
>>106118385
I mean it depends where you draw the ethical line.
It's like human testing, animal testing, etc.
There are shortcuts that can boost our knowledge and capabilities as a species.
But where do you draw the line? And it's a rhetorical question really. Everyone feels differently.
Would I be comfortable with models being pretrained on fictional erotic literature involving children in order for models to better understand behavioral psychology, etc? Yeah. It's just fucking words, get a life, go touch grass. etc.
If you're talking about like actual CP images/videos, from seized evidence of child abuse- or downloaded off of the dark web- That's pushing it for me. But I would support that data being used to say... train models for CSAM detection. Like current CSAM detection APIs rely heavily on file hashes and individual files having a history. And I suppose vision models that we have now are good enough to play that role. But having like really solid training data, and a lot of it, would allow you to train much more smaller, more specialized, and more widely deployable solutions.
>>106118490
You're a dumb pajeet. You are not worthy of a frank discussion.
Anonymous No.106118541 [Report] >>106118564
>>106118481
There will never be "high quality ERP" because the industry has determined that pornography is "low quality data".
Anonymous No.106118554 [Report] >>106118788 >>106118846
>>106118481
have we proven that adding new domains doesn't hurt other domains? will a model trained on furry erp be more likely to give false veterinary advice?
Anonymous No.106118564 [Report] >>106118573
>>106118541
llama 3 was this bad already, imagine what happened in llama 4
good thing meta is history now
Anonymous No.106118573 [Report] >>106118591
>>106118564
superintelligence will save llama 5
Anonymous No.106118591 [Report] >>106118605
>>106118573
Which Wang will talk Zuck into handing to Altman
Anonymous No.106118605 [Report]
>>106118591
>Zuck replacing Nutella as Sam's moneypig patreon
I don't see that happening, but it would be hilarious if it did.
Anonymous No.106118711 [Report]
>>106118310
Nah you can always get even more fucking data. You just need to dig deeper. Go digging through old german Team Speak and Mumble servers. Digitize every bit of hand written letters you can find.
Anonymous No.106118788 [Report] >>106118846
>>106118554
I'm too lazy to go retrieve it, but there was a paper that claimed that prepending each site's URL to each sample mitigated the issue of knowledge being confused like that, which according to the same paper, does happen. Most labs worth their salt should've already been doing this for a while now. But this issue should also only apply to a small scale dataset which I believe is what they tested in the study. I don't know or remember if anyone investigated it at a larger scale, but generally speaking in ML, greater scale cancels out a bunch of issues like these. Greater diversity (within domain) too.

For example, let's say you train on furry ERP and veterinarian advice. At first, the model will likely conflate the two contexts. But as you train for longer, the model will learn the more subtle differences and be able to tell that they are different context, and that it should predict something different. Additionally, if you train on more different variations of furry ERP and variations of veterinarian contexts, it'll better learn what the tells of a veterinarian context and a furry ERP context are.
Anonymous No.106118846 [Report] >>106118888
>>106118554
>>106118788
Speaking as someone who's trained these kinds of models (tho not as big as these huge LLMs) the answer is that models actually do get confused a little bit. That's why chink LLMs will occasionally spew chinese characters in the middle of english output. But, more data is usually still better.
That is, if you want a vet model, (1T of vet data) > (0.5T of vet data + 0.5T furry porn) > (0.5T of vet data). These models are so large and training is so primitive that more data will almost always help unless the extra data has nothing in common at all with what you want. But that's usually not true.
Anonymous No.106118888 [Report]
>>106118846
I hope next generation datasets will be at least half furry porn
Anonymous No.106118994 [Report]
>>106117524
Getting stabbed to death with Miku!
Anonymous No.106119129 [Report] >>106119412
Semi-relevant to the data quality discussion.

When Bad Data Leads to Good Models
https://arxiv.org/abs/2505.04741
In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy experiment to study how data composition affects the geometry of features in the representation space. Next, through controlled experiments with Olmo-1B models trained on varying ratios of clean and toxic data, we find that the concept of toxicity enjoys a less entangled linear representation as the proportion of toxic data increases. Furthermore, we show that although toxic data increases the generational toxicity of the base model, it also makes the toxicity easier to remove. Evaluations on Toxigen and Real Toxicity Prompts demonstrate that models trained on toxic data achieve a better trade-off between reducing generational toxicity and preserving general capabilities when detoxifying techniques such as inference-time intervention (ITI) are applied. Our findings suggest that, with post-training taken into account, bad data may lead to good models.
Anonymous No.106119245 [Report] >>106119258 >>106119292 >>106119396
Threadly reminder Anthropic supports and enables human rights abuse and massive surveillance despite claiming to be 'humanists'.
Anonymous No.106119258 [Report]
>>106119245
members of the san francisco rationalist cult need to be institutionalized
Anonymous No.106119292 [Report]
>>106119245
Remember "do no evil" Google?
Anonymous No.106119396 [Report]
>>106119245
>corporation headed by a jew is evil
bigger chance of winning the lottery then ever guessing this fr
Anonymous No.106119399 [Report] >>106119444 >>106119507 >>106119754 >>106119800 >>106119802
wan is pretty good
Anonymous No.106119412 [Report] >>106119509
>>106119129
I remember people used to make loras from datasets where AI fucked up the hands so that they could learn to recognize the types of ways they would fuck up hands, and then apply the lora with negative weight. Seems like the same principle here.
Anonymous No.106119426 [Report]
Hey all some retard fucked up his smut writeup I told him I would read.
The concept is hot and the dialog is even good but the autist mixed 1rst 2nd and 3rd person language into the same scenes. Whats a quick option I can use that will read the whole thing and rewrite it in 3rd person?

I tried using perplexity.ai but it has a character limit and it also started making shit up.

AI newfag here, just a crumb of handholding please?
Anonymous No.106119444 [Report]
>>106119399
Gonna go show this to the resident pregfag on /b/. Should please him. Nice pregsex vid anon
Anonymous No.106119507 [Report] >>106119666
>>106119399
Imagine playing a freeform H-game with scenes like these it'd just generate on the fly
Anonymous No.106119509 [Report]
>>106119412
Shut up you lying moralfag pile of shit.
Anonymous No.106119524 [Report] >>106119586 >>106119657
If we were magically able to exhaustively try every combination of floating point weights, how smart do you think, say, a 70B transforemr model would be? How much smarter than our current models?
And if we were able to exhaustively try every possible program that would fit on the same computer that runs the LLM, how much smarter would the smartest possible program be than the smartest possible transformer model?
Anonymous No.106119586 [Report]
>>106119524
We don't know how "smart" either of those can be. If the model can generalize, it'd be smarter by default. Unless the program can generalize just as well, and then the program would be a general model.
It's a stupid question.
Anonymous No.106119601 [Report] >>106119610
How is sex in 3steps?
Anonymous No.106119610 [Report] >>106119615 >>106119625
>>106119601
1) in
2) out
3) wipe
Anonymous No.106119615 [Report]
>>106119610
/thread
Anonymous No.106119625 [Report] >>106119631
>>106119610
>3) wipe
Argument optional?
Anonymous No.106119631 [Report]
>>106119625
He asked for 3 steps
Anonymous No.106119649 [Report] >>106120350
speaking of transformers
>70b-q8
>9.3k tokens in and {{char}} starts to have some of {{user}}'s traits
god I hate transformers and their attention. if sama's attention sink won't work in gpt-oss I'll become a lecun shill.
Anonymous No.106119652 [Report]
New PR boys.
Anonymous No.106119654 [Report]
No but rly is step3 better or worse than glm for sex?
Anonymous No.106119657 [Report] >>106120103
>>106119524
>If we were magically able to exhaustively try every combination of floating point weights, how smart do you think, say, a 70B transforemr model would be? How much smarter than our current models?
Significantly. For such a model, it would likely be far more expressive and lose a significant degree of redundancy, so quantization to four bits would likely cripple the model to the point of uselessness. I'd expect there are significantly better models that are possible, but I don't think our current optimizers and training methods are well equipped to find these
>And if we were able to exhaustively try every possible program that would fit on the same computer that runs the LLM, how much smarter would the smartest possible program be than the smartest possible transformer model?
We're basically asking the inverse question of Kolmogorov Complexity here, which is "given a string of information, what is the minimum number of bits you need to represent it?" The inverse question is, "given a specified number of bits, how many strings from some specific domain can you represent?" I'm not sure, but I do think transformer architecture models probably aren't the most efficient in that regard. The theoretical best model might not even be "trainable" or "findable" in a practical sense, so we'd have to come as close as we could to it. I expect somebody will find an architecture that trains as well, increases efficiency gains, and gets closer to that optimal model eventually
Anonymous No.106119666 [Report]
>>106119507
>imagine your freeform H-game can only do mosquito bites
Grim.
Anonymous No.106119754 [Report]
>>106119399
gross!
Anonymous No.106119800 [Report]
>>106119399
>The image was never in our databases.
Anonymous No.106119802 [Report]
>>106119399
>n-no anon! you are a schizo and mikutroons are normal
yeah right
Anonymous No.106119933 [Report]
>>106119921
>>106119921
>>106119921
Anonymous No.106120103 [Report]
>>106119657
For the first question, I think we could maybe make a 7B model as good as a 70B model, but nut anything much more dramatic than that.
The local minima in neural networks generally results in accuracy values that are fairly close to the accuracy values of global minima.
At least when taking into account non CoT models. If we take into account CoT then it becomes a much more nuanced question. It's even possible that our current approach to CoT is fundamentally wrong and the model should think in its own machine language rather than human language for optimal optimal accuracy, and we just don't have enough computational power to find that optimal internal language just from random variations and RL.
As for the second question, I'm not sure how much these formalisms reflect what we think of as intelligence. Suppose we ask an oracle to find the optimal program that runs on current hardware and produces the closest possible approximation to some language dataset within a certain time limit. Once you have it you can't just use it to infer on other datasets. Maybe it could be used as a base to get a more general model, or maybe it's a one off thing that's impossible to adapt to some other task. I don't think we know the answer to that question with our current theoretical knowledge. So in Solomonoff induction, is the intelligence the product of the oracle, or the oracle itself? Like I say, the product of the oracle might not be practically useful. And if it's the optimizer itself, by the no free lunch theorem the only way to get faster inference on some problems (for example those with low Kolgomorov complexity) is by sacrificing performance on other problems, for example those with high complexity. But I don't understand why the no free lunch theorem is true (it seems trivial to find counterexamples that are asymptotically slower for all cases, for example for a problem with description of length n, before finding the answer compute Ack(n)) so I might be wrong.
Anonymous No.106120350 [Report]
>>106119649
To be fair, 70B is a pretty old model now. Have you tried K2, Deepseek, etc?