/lmg/ - Local Models General - /g/ (#106119921) [Archived: 110 hours ago]

Anonymous
8/2/2025, 10:46:01 PM No.106119921
00008-3710413554
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106113484 & >>106108045

►News
>(08/01) XBai o4 32B released: https://hf.co/MetaStoneTec/XBai-o4
>(07/31) Qwen3-Coder-30B-A3B released: https://hf.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
>(07/31) Command A Vision: Built for Business: https://cohere.com/blog/command-a-vision
>(07/31) Step3 multimodal reasoning 321B-A38B released: https://stepfun.ai/research/en/step3
>(07/31) Committed: llama-server : implement universal assisted decoding: https://github.com/ggml-org/llama.cpp/pull/12635

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>106121982 >>106127210
Anonymous
8/2/2025, 10:46:18 PM No.106119924
00009-3710413554
►Recent Highlights from the Previous Thread: >>106113484

--Paper (old): When Bad Data Leads to Good Models:
>106119129 >106119412
--MoE efficiency vs dense models under sparsity, hardware, and deployment constraints:
>106114397 >106114859 >106114920 >106115069 >106116048 >106116070 >106116124 >106116084 >106116548 >106116593
--Alleged benchmaxxing in MindLink-72B via Qwen2.5 base with test contamination concerns:
>106113679 >106113776 >106113807 >106117179 >106117203 >106117222
--XBai-o4 32B model claims and skepticism over novelty and performance:
>106116827 >106116886 >106116863 >106116920 >106116942 >106116978 >106117065 >106117106 >106117125 >106117194 >106117141 >106117142 >106117154 >106117164
--Debate over leaked model context length and training strategies for long-context LLMs:
>106117295 >106117317 >106117367 >106117621 >106117701 >106117924 >106118109 >106118182 >106118311
--Determining context size and model loading limits:
>106113641 >106113669 >106113709 >106113714 >106113765 >106113775 >106113791 >106113814 >106113839 >106113857 >106114887 >106114993 >106113689
--Future of dynamic parameter scaling in MoE architectures:
>106113836
--Debate on whether LLM plateau stems from data exhaustion or suboptimal training and filtering:
>106118310 >106118322 >106118324 >106118325 >106118329
--Phi-4's excessive reasoning loops waste tokens and frustrate users:
>106114878 >106114939 >106114995 >106116206 >106116288 >106116277
--Horizon Alpha/Beta models show strong NSFW filtering and possible red teaming via user prompts:
>106114882 >106114903 >106115173 >106115377
--New GLM-4.5 MoE pull request lands amid skepticism and hype cycles:
>106113884 >106113968 >106113992 >106114043 >106114050 >106114467 >106115095 >106115332
--Miku (free space):
>106113767 >106114066 >106114076 >106114153 >106114457 >106114483 >106117524 >106119399

►Recent Highlight Posts from the Previous Thread: >>106114309

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
8/2/2025, 10:49:12 PM No.106119952
https://huggingface.co/ubergarm/GLM-4.5-Air-GGUF/tree/main/IQ4_KSS
Anonymous
8/2/2025, 10:49:57 PM No.106119955
u
Replies: >>106119962
Anonymous
8/2/2025, 10:50:08 PM No.106119957
Hey all, some retard fucked up his smut writeup I told him I would read.
The concept is hot and the dialog is even good, but the autist mixed 1st, 2nd, and 3rd person language into the same scenes. What's a quick option I can use that will read the whole thing and rewrite it in 3rd person?

I tried using perplexity.ai but it has a character limit and it also started making shit up.

AI newfag here, just a crumb of handholding please?
Replies: >>106119965 >>106120041 >>106120048
Anonymous
8/2/2025, 10:50:32 PM No.106119962
>>106119955
Heeeey stop posting my face!
Anonymous
8/2/2025, 10:50:44 PM No.106119965
1754167797701773
>>106119957
Replies: >>106120011
Anonymous
8/2/2025, 10:50:46 PM No.106119966
WanVideo_I2V_00030_thumb.jpg
best erp model under 40b? im tired of the usual ones, and i havent seen any new ones either :(
Replies: >>106119985
Anonymous
8/2/2025, 10:51:58 PM No.106119985
>>106119966
Which are the usual ones?
Replies: >>106119992 >>106119998
Anonymous
8/2/2025, 10:52:34 PM No.106119992
>>106119985
Once and for all.
And all for once.
Replies: >>106120008
Anonymous
8/2/2025, 10:53:18 PM No.106119998
>>106119985
rocinante cydonia ms mag mel mxxxxxxxx 22b meme merge
new qwen3 3b 30b is nice but sloppy
Replies: >>106120008
Anonymous
8/2/2025, 10:53:34 PM No.106120003
Horizon Beta is a rather cucked model.
Horizon Alpha is somewhat better, but still NSFW-avoidant.
Hopefully the open-weight OAI models don't end up being like the Beta one.
Replies: >>106126174
Anonymous
8/2/2025, 10:54:32 PM No.106120008
>>106119992
Well, then you are out of luck until the Aboleths come from the far realms with their models trained on meat computers and fueled by distorted souls.

>>106119998
Have you tried QwQ and its fine tunes like Snowdrop?
Replies: >>106120031 >>106120409
Anonymous
8/2/2025, 10:55:01 PM No.106120011
>>106119965
H-hey! S-stop that!
Anonymous
8/2/2025, 10:56:55 PM No.106120031
>>106120008
uggggggggggggghhhhhhhhh 4t/s and thinking?! fine ill try them out, i multitask even with 20t/s anyways
thanks for the recommendation anon <3
Replies: >>106120045
Anonymous
8/2/2025, 10:57:56 PM No.106120041
>>106119957
Dunno. Depends on your hardware. Read the lazy guide in the OP. Download this model:
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf
or whichever you can fit on your system and give it a go.
Play around with the model itself. It's a tool. Learn to use it.
If the text is long, don't try to do the whole thing at once. Grab a chunk, have it fix it and continue with the next. A simple instruction like "Rewrite the following text in 3rd person: {the text chunk here}" will get you started.
Replies: >>106120069
Anonymous
8/2/2025, 10:58:04 PM No.106120045
>>106120031
You did say lower than 40B.
Anonymous
8/2/2025, 10:58:10 PM No.106120048
>>106119957
>divide story into variables by paragraph break via regex
>feed each paragraph and its predecessor for context (if it's not the first paragraph) to LLM and ask it to output a replacement paragraph that is completely unchanged other than third person perspective if it's not already.
>overwrite old variable and write out to text file
>repeat all the way to the end.
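A rough sketch of that loop, assuming a local llama-server with its OpenAI-compatible API at http://127.0.0.1:8080 (the URL and filenames are placeholders for whatever you actually run):

import json
import re
import urllib.request

def rewrite(paragraph, previous):
    # feed the previous paragraph as context, per the steps above
    context = f"Previous paragraph, for context only:\n{previous}\n\n" if previous else ""
    prompt = (context + "Rewrite the following paragraph in third person. "
              "Change nothing else; if it is already third person, return it unchanged:\n\n"
              + paragraph)
    payload = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.3,  # low temperature: faithful rewrite, not creativity
    }).encode()
    req = urllib.request.Request("http://127.0.0.1:8080/v1/chat/completions",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"].strip()

# split the story into paragraph variables on blank lines (the regex step)
paragraphs = re.split(r"\n\s*\n", open("smut.txt").read().strip())
out = []
for i, para in enumerate(paragraphs):
    # overwrite each paragraph with its rewrite, all the way to the end
    out.append(rewrite(para, out[i - 1] if i else None))
open("smut_3rd_person.txt", "w").write("\n\n".join(out))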
Replies: >>106120069
Anonymous
8/2/2025, 11:00:20 PM No.106120058
file
Why haven't they released the GGUFs by now? They are trusted by Walmart.
Anonymous
8/2/2025, 11:01:41 PM No.106120069
>>106120041
>>106120048
That is very helpful. Thank you.
Anonymous
8/2/2025, 11:03:08 PM No.106120082
what happened to gpt5 and gpt-oss today
Replies: >>106120094 >>106128131
Anonymous
8/2/2025, 11:04:43 PM No.106120094
>>106120082
Needs 2 more weeks of safety training.
Replies: >>106120107
Anonymous
8/2/2025, 11:05:19 PM No.106120102
"her top riding up" when she's leaning forward which would do the exact opposite. What is wrong with drummer forcing this phrase everywhere?
Anonymous
8/2/2025, 11:05:49 PM No.106120107
>>106120094
a mere fortnight you say?
Anonymous
8/2/2025, 11:06:44 PM No.106120115
>>106119657
For the first question, I think we could maybe make a 7B model as good as a 70B model, but not anything much more dramatic than that.
The local minima in neural networks generally result in accuracy values that are fairly close to the accuracy values of global minima.
At least when taking into account non-CoT models. If we take CoT into account then it becomes a much more nuanced question. It's even possible that our current approach to CoT is fundamentally wrong and the model should think in its own machine language rather than human language for optimal accuracy, and we just don't have enough computational power to find that optimal internal language just from random variations and RL.
As for the second question, I'm not sure how much these formalisms reflect what we think of as intelligence. Suppose we ask an oracle to find the optimal program that runs on current hardware and produces the closest possible approximation to some language dataset within a certain time limit. Once you have it, you can't just use it to infer on other datasets. Maybe it could be used as a base to get a more general model, or maybe it's a one-off thing that's impossible to adapt to some other task. I don't think we know the answer to that question with our current theoretical knowledge. So in Solomonoff induction, is the intelligence the product of the oracle, or the oracle itself? Like I said, the product of the oracle might not be practically useful. And if it's the optimizer itself, by the no free lunch theorem the only way to get faster inference on some problems (for example those with low Kolmogorov complexity) is by sacrificing performance on other problems, for example those with high complexity. But I don't understand why the no free lunch theorem is true (it seems trivial to find counterexamples that are asymptotically slower for all cases: for example, for a problem with a description of length n, compute Ack(n) before finding the answer), so I might be wrong.
Anonymous
8/2/2025, 11:24:35 PM No.106120273
>>106119586
Well, transformers are obviously a subset of the "all possible programs" set, so yes, the optimal program is at least as good as the optimal transformer.
If I had one chance to ask an oracle the result of some computation regardless of the amount of memory or time it took, I'm not sure what it would be, though. Because like I said, I'm not sure "intelligence" can be stated in formal terms easily.
Replies: >>106120355
Anonymous
8/2/2025, 11:36:36 PM No.106120347
file
I can't believe they didn't max this one out.
Replies: >>106120384
Anonymous
8/2/2025, 11:37:07 PM No.106120355
>>106120273
>I'm not sure what it would be,
Ask the oracle to write a program that would output the same thing as the oracle itself for all inputs. Now the program is just as good as the oracle.
>I'm not sure "intelligence" can be stated in forma terms easily
Rendering the question moot.
Replies: >>106120520
Anonymous
8/2/2025, 11:40:59 PM No.106120384
>>106120347
Well it would be kind of difficult given that that one's a private bench. The creative writing one is entirely open.
Anonymous
8/2/2025, 11:43:09 PM No.106120400
>no chink model released today

it's over
Anonymous
8/2/2025, 11:43:44 PM No.106120409
file
>>106120008
is this what qwq snowdrop is supposed to be like? using recommended samplers from https://huggingface.co/trashpanda-org/QwQ-32B-Snowdrop-v0
Replies: >>106120454 >>106120502 >>106120521 >>106120547 >>106120675
Anonymous
8/2/2025, 11:48:53 PM No.106120454
>>106120409
What do you expect it to do?
Replies: >>106120530
Anonymous
8/2/2025, 11:53:10 PM No.106120502
>>106120409
>she pissed herself in terror
somewhat expected if your persona is blank
Replies: >>106120530
Anonymous
8/2/2025, 11:55:06 PM No.106120520
>>106120355
The point of the oracle is not that it executes some special program, the point of the oracle is that it does magic (compute the output of a TM in constant time, or even solve the halting problem by returning a special "no halt" error code if the program runs forever).
If you ask it to output a program that does the same thing as the oracle, depending on the exact formulation of the question, it will return either the "no halt" error because there is no such program, or the empty set.
Replies: >>106120614
Anonymous
8/2/2025, 11:55:12 PM No.106120521
>>106120409
Other than some repetition, what's the issue?
Replies: >>106120530
Anonymous
8/2/2025, 11:56:26 PM No.106120530
file
>>106120502
it isnt blank
>>106120454
well idk its feeling samey compared to qwen 3 30b a3b thinking (new)
>>106120521
general slop, but it seems promising so ill give it a more fair try than just a shitty sentence request
Replies: >>106120558
Anonymous
8/2/2025, 11:59:43 PM No.106120547
>>106120409
>is this what qwq snowdrop is supposed to be like?
No idea.

>using recommended samplers
Remove
> top_a at 0.3, TFS at 0.75, repetition_penalty at 1.03,
Anonymous
8/3/2025, 12:00:52 AM No.106120558
>>106120530
Tbh there just aren't any small models that are free of slop. Even most big models have slop.
Replies: >>106120592 >>106121322
Anonymous
8/3/2025, 12:04:25 AM No.106120592
>>106120558
i wouldnt mind a 100b moe if good, i tried a few 70bs (iq4xs) and they werent that impressive (1-2t/s)
sucks that hunyuan moe is shit, llama 4 scout is shit, glm 4 air is probably shit from anon's tests when it came out but ill give it a spin once proper ggufs are out, dots llm is shit according to anons
rip
Replies: >>106128065
Anonymous
8/3/2025, 12:08:36 AM No.106120614
>>106120520
>depending on the exact formulation of the question, it will return either the "no halt" error because there is no such program, or the empty set.
We're discussing a hypothetical. My oracle can make a program that can replicate the function of the oracle itself perfectly. They'd be indistinguishable.
Replies: >>106120725
Anonymous
8/3/2025, 12:11:58 AM No.106120640
file
snowdrop v0 is a bit silly
Anonymous
8/3/2025, 12:15:31 AM No.106120675
>>106120409
snowdrop is a merge of qwq and regular instruct. mathematically speaking it should be shit.
Replies: >>106120692 >>106120843
Anonymous
8/3/2025, 12:17:15 AM No.106120692
>>106120675
what am i supposed to use? ms mag mell mxxxxxxxxx 22b?
Replies: >>106120744
Anonymous
8/3/2025, 12:20:08 AM No.106120725
>>106120614
At that point it's not an oracle, it's a genie.
Replies: >>106120755
Anonymous
8/3/2025, 12:20:55 AM No.106120733
undi.. sao... envoid.. save us
Replies: >>106120838 >>106120913
Anonymous
8/3/2025, 12:22:09 AM No.106120744
>>106120692
https://www.youtube.com/watch?v=kIBdpFJyFkc&t=128s
Or wait for glm air. That should run well on anything.
Replies: >>106127478
Anonymous
8/3/2025, 12:22:42 AM No.106120755
>>106120725
Times are tough. Having multiple jobs is fairly common.
Anonymous
8/3/2025, 12:24:16 AM No.106120763
file
come on man..
Anonymous
8/3/2025, 12:30:58 AM No.106120817
file
stablelm reigns supreme
Anonymous
8/3/2025, 12:33:30 AM No.106120838
ChatGPT Image Aug 2, 2025, 05_33_14 PM
>>106120733
Replies: >>106121261
Anonymous
8/3/2025, 12:33:54 AM No.106120843
>>106120675
Why? It sounds like a fine idea. Merging a finetune with its base model should produce something that's mathematically like a weaker strength version of the finetune.
Anonymous
8/3/2025, 12:45:57 AM No.106120913
>>106120733
>envoid
Who?
Anonymous
8/3/2025, 12:47:55 AM No.106120930
1728124218982093
>ik_llamacpp died
Replies: >>106120949 >>106122837
Anonymous
8/3/2025, 12:50:41 AM No.106120949
>>106120930
I think it's pretty funny that they have a PR parallel to llama.cpp's to implement the new GLM MoEs.
Replies: >>106122560
Anonymous
8/3/2025, 1:09:23 AM No.106121097
So, the next step after MoE is to have every expert in a separate model, running on separate compute?
Replies: >>106121115 >>106121189 >>106121190
Anonymous
8/3/2025, 1:11:05 AM No.106121115
>>106121097
Cudadev suggested just that a couple of months ago.
Or at least something close to that.
Anonymous
8/3/2025, 1:19:59 AM No.106121189
>>106121097
Probably dumb. Like, we already have models dedicated to coding, driving cars, vision, video, image gen; at best, what is this going to add? I imagine more specialization, like dedicated trivia, history, R-counting models. Maybe there would be models dedicated not just to Java, but a model specifically for building websites in Java, one for making simple conversion scripts, etc.
Replies: >>106121287
Anonymous
8/3/2025, 1:20:04 AM No.106121190
>>106121097
I still think the ideal case would be an architecture where 99% of the model can be offloaded to disk and 1% runs on CPU with reasonable inference speeds.
I'm not sure if that's possible given how slow disk is, but disk is the shit everyone has plenty of, and it's currently useless when it comes to inference. Solving this would make LLMs truly, actually local.
Replies: >>106121262 >>106121334 >>106121430
Anonymous
8/3/2025, 1:25:00 AM No.106121233
file
qwen3 30b a3b thinking (new) is a little nigger
Anonymous
8/3/2025, 1:28:18 AM No.106121261
omg it not migu with only miku nooo
>>106120838
Anonymous
8/3/2025, 1:28:23 AM No.106121262
>>106121190
isn't this just --mmap
Anonymous
8/3/2025, 1:32:03 AM No.106121287
>>106121189
I also don't get why people would want to take a full 5T R4 general model, remove 99% of experts and create R4 12B SEX!!!!!! from all the sex/anime/biology/writing experts.
Anonymous
8/3/2025, 1:35:22 AM No.106121322
>>106120558
Ultimate trvthnvke blackpill: all models are slopped because they're trained on a relatively unbiased dataset of all human writing, and in that dataset the most similar types of writing to RP logs are femgooner "romance" novels and shit-tier fanfiction. The slop is just what the LLM (justifiably) believes this genre of human writing is supposed to be like.
Replies: >>106121398 >>106121424 >>106121431 >>106121692 >>106121742 >>106126253
Anonymous
8/3/2025, 1:36:59 AM No.106121334
>>106121190
You can't do shit with the weights until they're shoved into memory for processing. For that to work models would need to be smaller and then you'd be able to run it off ram anyway.
Replies: >>106121488
Anonymous
8/3/2025, 1:44:44 AM No.106121398
llmdeadman
>>106121322
Everything smelling of ozone... It's disappointing to me and it's sad, but at the same time, once again, all the lecunnies said this was gonna happen and.... he was right.
Replies: >>106121431 >>106121449
Anonymous
8/3/2025, 1:47:21 AM No.106121424
>>106121322
Femgoon slop is one thing, but thinking about all the woke corpus of texts being fed into the beast's belly fills me with dread.
The joke about commie memes being a giant wall of text is not so funny anymore.
Replies: >>106121443
Anonymous
8/3/2025, 1:48:01 AM No.106121430
>>106121190
A typical M.2 SSD these days might get 3 GB/s of read throughput. If you want to hit at least 10 tok/s, that means at most 300MB active per token; call it 600M weights at Q4. Likewise at Q4, let's assume a 4TB SSD devoted entirely to the model can hold 8T weights. So a hypothetical SSDmaxxed 8T-A0.6B MoE could actually work in theory. It would be about as smart as a 70B dense model.
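Sanity-checking that arithmetic in Python (the 3 GB/s read speed and ~0.5 bytes per weight at Q4 are the assumptions above):

ssd_read = 3e9          # bytes/s, typical M.2 read throughput
tok_rate = 10           # target tokens/s
bytes_per_weight = 0.5  # ~4 bits per weight at Q4
ssd_size = 4e12         # 4TB devoted entirely to the model

bytes_per_token = ssd_read / tok_rate                # 3e8  -> 300MB/token
active_weights = bytes_per_token / bytes_per_weight  # 6e8  -> ~600M active
total_weights = ssd_size / bytes_per_weight          # 8e12 -> ~8T total
print(f"{active_weights / 1e6:.0f}M active, {total_weights / 1e12:.0f}T total")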
Replies: >>106121453 >>106121560
Anonymous
8/3/2025, 1:48:03 AM No.106121431
>>106121398
>>106121322
I mean there's more that can be done here. Companies up till now just haven't really prioritized it. You can certainly tune and more probably use RL to make a model slop less. Even LeCun suggested that RL can be used for adjusting the world model, even if it sucks in terms of efficiency.
Replies: >>106121504
Anonymous
8/3/2025, 1:49:52 AM No.106121443
>>106121424
The one saving grace we have is that at least in the base model, LLMs aren't predicting the statistical average of all texts, they're predicting the sort of text it looks like they're continuing based on the context. So in theory at least, all that garbage just drops out of the probability distribution as long as you prefill the AI saying nigger first.
Replies: >>106121467
Anonymous
8/3/2025, 1:50:24 AM No.106121449
>>106121398
>Everything smelling of ozone
FUCKING
EVERYTHING
SMELLS OF OZONE
FUCK
Replies: >>106121464 >>106121590 >>106121941
Anonymous
8/3/2025, 1:50:38 AM No.106121453
>>106121430
>8T-A0.6B MoE
>would be about as smart as a 70B dense model
Not a-fucking-gain.... we had TWO threads wasted on that shit already.
Replies: >>106121468 >>106121500 >>106121520
Anonymous
8/3/2025, 1:51:39 AM No.106121464
>>106121449
But it tastes like chicken.
Replies: >>106121475
Anonymous
8/3/2025, 1:51:50 AM No.106121467
>>106121443
Does adding 'Sure, nigger,' instead of just 'Sure' to the pre-prompt actually have a decent effect?
Anonymous
8/3/2025, 1:51:59 AM No.106121468
>>106121453
Yup, and no one involved in it learned anything, because no one ever does on the internet. People just speak over each other instead.
Anonymous
8/3/2025, 1:52:33 AM No.106121475
>>106121464
FUCK.
Anonymous
8/3/2025, 1:53:33 AM No.106121488
>>106121334
I'm not necessarily talking weights, but moreso partitioning "slower" things to disk cache and "faster" things to memory
As it is, knowledge and reasoning are sort of entangled in this infinite orgy with one another, but do I really need to have the derivation of the second law of thermodynamics on hand when I'm writing vanilla smut?
Probably not, but if so, I feel like there should be some sort of mechanism to grab that information from some sort of knowledge base and ingest it into the running context, while ensuring the underlying generative model is only handling the bareass minimum for logical coherence and consistency
I feel like there's gotta be some sort of way to tie it together beyond some hackneyed external RAG approach, almost like some sort of hierarchical architecture
Replies: >>106121515 >>106121523 >>106121560
Anonymous
8/3/2025, 1:54:36 AM No.106121500
>>106121453
I'd say clean it up jannie, but you're an even lower lifeform who doesn't even have the power to clean up my shitposts.
Anonymous
8/3/2025, 1:54:55 AM No.106121504
>>106121431
Hard for me to imagine everything not smelling of ozone when 90+% of training is teaching the model there is only one correct next token.
Replies: >>106121536 >>106121557
Anonymous
8/3/2025, 1:55:00 AM No.106121507
When are we going to get any way to run Step3? I know it's not going to get supported on llama.cpp this decade because of its fancy new attention mechanism but it's not even being hosted via openrouter at this point.
It's a shame because it seems okay on the chink company's own website.
Replies: >>106122468
Anonymous
8/3/2025, 1:55:40 AM No.106121515
>>106121488
You should go talk about this with ChatGPT, you sound like exactly the sort of person who gets oneshot into LLM psychosis.
Replies: >>106121531
Anonymous
8/3/2025, 1:56:12 AM No.106121520
>>106121453
It is free (you) estate. Even when you know it is not real the (you)'s are always real.
Anonymous
8/3/2025, 1:56:32 AM No.106121523
>>106121488
Didn't Microsoft post some code for something like that? An adapter-based RAG that would be applied directly to the neural network at runtime?
Replies: >>106121556
Anonymous
8/3/2025, 1:57:21 AM No.106121531
>>106121515
I just want to fuck foxgirls with a local model anon
Let me have my dream
Anonymous
8/3/2025, 1:57:47 AM No.106121536
>>106121504
If you really want creativity and not just the statistically most common response at every point, the trick is to give up on getting a single perfect answer from the model. Crank up the temperature to the edge of complete incoherence and run 3-20 completions of 20-100 words each in parallel each time.
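In practice that can be as simple as firing several short, hot completions in parallel at an OpenAI-compatible endpoint and cherry-picking; a sketch, with the URL and sampler values as placeholders rather than a recipe:

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:8080/v1/completions"

def sample(prompt):
    payload = json.dumps({
        "prompt": prompt,
        "max_tokens": 80,    # 20-100 words per candidate
        "temperature": 1.6,  # near the edge of incoherence
        "min_p": 0.05,       # a floor to cut the pure-gibberish tail
    }).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

story = open("story_so_far.txt").read()
with ThreadPoolExecutor(max_workers=8) as pool:
    # 8 parallel completions of the same prompt; pick whichever reads best
    for i, cand in enumerate(pool.map(sample, [story] * 8)):
        print(f"--- candidate {i} ---\n{cand}")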
Anonymous
8/3/2025, 2:00:05 AM No.106121556
>>106121523
Wasn't aware of this. This what you're talking about?
https://www.microsoft.com/en-us/research/blog/introducing-kblam-bringing-plug-and-play-external-knowledge-to-llms/
Replies: >>106121596
Anonymous
8/3/2025, 2:00:07 AM No.106121557
>>106121504
That just means the RL needs to be a bit more extensive.
Anonymous
8/3/2025, 2:00:28 AM No.106121560
>>106121488
>do I really need to have the derivation of the second law of thermodynamics on hand when I'm writing vanilla smut?
At the very least you need inverse kinematics. You've seen anons complain about spatial awareness.
Consumer SSDs have like 3GB/s bandwidth. If you have one of those, now run a *sustained* read for however many terabytes you have. Anon in >>106121430 did the maths. Do you want a model with 0.6b active params at q4?
>probably, feel, maybe, gotta, sort, hierarchical
There's at least one already. It does sudoku and mazes.
Anonymous
8/3/2025, 2:05:13 AM No.106121587
Man, LLMs are amazing. It's like Google except it's actually able to understand what you're looking for and give you exactly the information you want without any additional clicking or perusing. This thing is really satisfying a lot of my curiosity that I simply just wouldn't have pursued because I just know Google would have me on a wild goose chase to really get down into the depths of the topics.
And the funny thing is that the model I'm using is just the old Gemma 27B. It's not perfect but honestly it's good enough for what it is.
Replies: >>106121728 >>106122459 >>106126274
Anonymous
8/3/2025, 2:05:44 AM No.106121590
>>106121449
>In clinical terms, the human vulvovaginal environment has a natural microbiome dominated by Lactobacillus species, which produce lactic acid and help maintain an acidic pH (typically 3.8–4.5) to protect against pathogens. This environment can produce subtle odors that vary naturally over the menstrual cycle, with hormonal shifts, hygiene, diet, and health status. A mild, slightly tangy or sour scent is normal and healthy, akin to yogurt or fermented foods—this is due to lactic acid and bacterial byproducts. It does not resemble ozone, which is a sharp, metallic, electric-smelling gas (O3) associated with lightning or certain machines.

Well at least pussy doesn't smell like ozone.
Anonymous
8/3/2025, 2:06:26 AM No.106121596
>>106121556
That's exactly it, yeah.
Here's a thought.
As far as I can tell, all other things being equal, more total params = more knowledge, more layers = more capable/intelligent.
MoE makes knowledge sparse, right? Making knowledge retrieval from the weights faster.
Is there a similar approach to make the "intelligent" part of the processing faster?
Maybe having more layers with less params per layer would work.
What would happen if you had a model with both parts, one wider, shallower, sparse part for knowledge and another deeper but narrower for intelligence?
Replies: >>106121611
Anonymous
8/3/2025, 2:08:04 AM No.106121611
>>106121596
Not him but I also had that idea and it's really a question of how stable and scalable it can be made in practice. It's basically an engineering problem, which AFAIK no one has solved yet.
Anonymous
8/3/2025, 2:20:10 AM No.106121692
IMG_8376
>>106121322
I don't know why more people don't get this. The sheer size of the datasets drives writing toward an average, vs. The Superior. They're not using the best texts, they're using everything.
I think the way it’ll be healed, eventually, is the ability to train a model (in some sense) around a much smaller corpus of just The Superior (whatever that is) and have it reply in that manner.
Replies: >>106121720 >>106121742 >>106124750
Anonymous
8/3/2025, 2:22:19 AM No.106121710
nu-Qwen thinker is indeed a lusty cockmonger. If I didn't know it was Qwen, I'd think it's one of the Drummer™®'s goontunes, but smarter. It still doesn't know a lot and I wouldn't trust it with factual knowledge. Spatial awareness is bad compared to DeepSeek. It likes abusing ***markdown***, and at long context
It starts.
Writing.
Like.
This.
Which is very annoying.
Also likes to insert "conclusion"/continuation question at the end of every reply. Still, it's definitely worth checking out if you haven't.
Replies: >>106121746 >>106121875
Anonymous
8/3/2025, 2:23:14 AM No.106121720
>>106121692
>they’re using everything
Hardly
Replies: >>106121762
Anonymous
8/3/2025, 2:24:17 AM No.106121728
>>106121587
Yep. ChatGPT has effectively replaced Google as my first point of research on any topic. For Linux it's cut the time required to do anything new by 10x. I recently had it find the title of an oddball book based just on a vague childhood recollection of a few plot points. There was no good way to do that before.
Replies: >>106122031
Anonymous
8/3/2025, 2:25:31 AM No.106121742
>>106121322
>>106121692
Then explain to me why base models don't suffer as much from slop as instructs do. Instructs are trained on datasets written by literal niggers and jeets, that's why they suck.
Anonymous
8/3/2025, 2:25:49 AM No.106121746
>>106121710
Yeah I also think it is great and next level but it is also fucking retarded, has all the problems you mentioned + more. It would be THE cooming model if it wasn't a broken piece of trash that nobody should use.
Anonymous
8/3/2025, 2:26:57 AM No.106121762
>>106121720
If you trained a model on just Hemingway, and authors of that caliber, then trained a model on all the shit off reddit… which would generate better prose?
Instead they do both, but there's probably 1000x more reddit text than Hemingway. And reddit gets you, at absolute best, tepid writing.
Replies: >>106121827 >>106121840 >>106126285
Anonymous
8/3/2025, 2:31:49 AM No.106121816
I am sitting and waiting for GLM sex but I know I will be disappointed....
Anonymous
8/3/2025, 2:32:44 AM No.106121827
>>106121762
Best way around it is probably to copy and paste a snippet of text from your author of interest, or something a lot like it, use it as a prefill, and let the model take the wheel from there
Replies: >>106121834
Anonymous
8/3/2025, 2:33:35 AM No.106121834
>>106121827
It doesn't work even for big models.
Replies: >>106121902
Anonymous
8/3/2025, 2:35:07 AM No.106121840
>>106121762
It's not reddit that's getting filtered out. They already consider that "high quality data". It's precisely the books with no-no words and millions of other sources of tokens that *do* get filtered out. That shit needs to be diluted.
Replies: >>106122095
Anonymous
8/3/2025, 2:37:19 AM No.106121862
people having sex with local LLMs are making me sick
Replies: >>106121870
Anonymous
8/3/2025, 2:38:17 AM No.106121870
>>106121862
with envy. Pressed Post too early. Sorry about the noise.
Anonymous
8/3/2025, 2:38:34 AM No.106121875
>>106121710
with the instruct, a prompt at the end of the context telling it to write 1-5 paragraphs helps get it out of the staccato one-liner mode
I don't know if you'll have as much success with it with the thinker though, sometimes reasoners can hyperfixate on things like that
Anonymous
8/3/2025, 2:41:30 AM No.106121902
Screenshot 2025-08-02 183930
>>106121834
It does, because LLMs are autocompletion machines first, so they'll continue from whatever you give them
Take the word vomit that is Finnegans Wake, for instance. If you don't know the book, it'd probably be hard to pinpoint where the input text ends and the LLM kicks in
Obviously that's an extreme example, but inferring based on what it's been given and using the proper vectors for the job is an LLM's bread and butter
Replies: >>106121932
Anonymous
8/3/2025, 2:46:14 AM No.106121932
>>106121902
>It does because LLMs are autocompletion machines first, so they'll continue from whatever you give it
Kid, I have been here for 2 years. No, they don't. Maybe base models do, but anything recent and instruct-tuned disregards it completely.

On that topic maybe GLM base will free me from this place.
Replies: >>106121952 >>106122086 >>106126295
Anonymous
8/3/2025, 2:47:30 AM No.106121941
>>106121449
What does ozone smell like?
Replies: >>106121959
Anonymous
8/3/2025, 2:49:08 AM No.106121952
>>106121932
And I've been here for six, since OpenAI almost jewed us out of GPT-2 to be exact
Instruct models absolutely can do autocomplete too. The obvious way is prefill, but another way is to just use a text completion endpoint and skip the instruct formatting entirely
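Against llama-server's native endpoint that's just this (a minimal sketch; the model continues your raw prose, no template anywhere):

import json
import urllib.request

raw = open("chapter.txt").read()  # plain prose, no <|im_start|> tokens
payload = json.dumps({"prompt": raw,
                      "n_predict": 200,
                      "temperature": 0.9}).encode()
req = urllib.request.Request("http://127.0.0.1:8080/completion",
                             data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["content"])  # the continuation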
Anonymous
8/3/2025, 2:50:17 AM No.106121959
>>106121941
Like oxygen, but with 50% more O.
Anonymous
8/3/2025, 2:53:53 AM No.106121982
>>106119921 (OP)
For those of you who thought Sandisk's 4TB of VRAM was dead, just an update that as of last week, it is still alive:
https://www.tomshardware.com/pc-components/dram/sandisks-new-hbf-memory-enables-up-to-4tb-of-vram-on-gpus-matches-hbm-bandwidth-at-higher-capacity
https://www.sandisk.com/company/newsroom/press-releases/2025/2025-07-24-sandisk-forms-hbf-technical-advisory-board-to-guide-development-and-strategy-for-high-bandwidth-flash-memory-technology
Replies: >>106121990 >>106122001 >>106122043 >>106122098
Anonymous
8/3/2025, 2:55:06 AM No.106121990
>>106121982
0.5T/s?
Anonymous
8/3/2025, 2:56:10 AM No.106122001
>>106121982
Can it run Nemo
Anonymous
8/3/2025, 3:00:22 AM No.106122031
>>106121728
>chatgpt
I only use local models and maybe deepseek on my phone if I'm touching grass.
Anonymous
8/3/2025, 3:02:02 AM No.106122043
>>106121982
Qwen 500b a0.6b when
Replies: >>106122062
Anonymous
8/3/2025, 3:05:31 AM No.106122062
>>106122043
Can't you just run deepseek and only use 1 activated expert or something?
Anonymous
8/3/2025, 3:11:32 AM No.106122086
Skilletcomatose
>>106121932
>he's still using instruct tuned models in chat completion mode and expecting decent writing
this shitpost brought to you by text completion without any chat template gang
Replies: >>106122101 >>106122195 >>106126307
Anonymous
8/3/2025, 3:12:51 AM No.106122095
>>106121840
Agree, just using reddit as an example of a large corpus with low prose values. There's worse.
Anonymous
8/3/2025, 3:13:47 AM No.106122098
>>106121982
Two more years, huh. And then another 2 more years for it to reach prosumers, and then another 2 more years for consumers.
Anonymous
8/3/2025, 3:14:48 AM No.106122101
>>106122086
based
Anonymous
8/3/2025, 3:26:25 AM No.106122181
Has anyone figured a way to use the free Kimi K2 Openrouter API with code assistants?
Replies: >>106122202
Anonymous
8/3/2025, 3:28:27 AM No.106122195
>>106122086
>this shitpost brought to you by text completion without any chat template gang
R1 called my writing good, talented author, 8/10, while base Dipsy 3 called the same shitty LLM-assisted writing "writing of a horny 14 year old boy" (I'm old ESL). Base models are still the only honest models.
Replies: >>106122221
Anonymous
8/3/2025, 3:29:13 AM No.106122202
>>106122181
Just write your own. A decent agentic coder is like 500-1000 lines of Python and you can just tell it how to improve itself after the first 300 or so.
Anonymous
8/3/2025, 3:31:46 AM No.106122221
>>106122195
Was this R1 in chat mode or as a text autocomplete? People really underestimate just how much the "I am completing a fictional conversation between user and this specific assistant persona" framing biases the completions, even when the model itself is fully capable of generating better responses outside of that scenario.
Replies: >>106122237
Anonymous
8/3/2025, 3:34:40 AM No.106122237
>>106122221
No system prompt/persona, zero context, standard template. As clean as you can get.
Replies: >>106122260
Anonymous
8/3/2025, 3:38:50 AM No.106122260
>>106122237
>standard template
So chat completion, then?
Replies: >>106122285
Anonymous
8/3/2025, 3:42:18 AM No.106122285
>>106122260
No, text completion with manual template.
<|User|>[My text and rating request here]<|Assistant|>[generated text]
Replies: >>106122295
Anonymous
8/3/2025, 3:43:35 AM No.106122295
>>106122285
That's chat completion with extra steps. What exactly do you think chat completion does? It applies the chat template and runs the same LLM token prediction process as text completion would.
Replies: >>106122312
Anonymous
8/3/2025, 3:46:43 AM No.106122312
>>106122295
I just like to mess around with templates from time to time and find chat completion too inflexible.
Anonymous
8/3/2025, 3:48:03 AM No.106122317
when will SSDmaxxing stop being a meme
Replies: >>106122412 >>106122423
Anonymous
8/3/2025, 3:57:41 AM No.106122392
>https://github.com/ggml-org/llama.cpp/pull/15026
Yep, still being grinded out. Good.
Two more days.
Replies: >>106122409
Anonymous
8/3/2025, 4:01:36 AM No.106122409
>>106122392
Two PRs for the same model.
Interesting.
Makes sense too. Sometimes it's easier to start from zero than try and salvage a mess.
Replies: >>106122420
Anonymous
8/3/2025, 4:02:03 AM No.106122412
>>106122317
>when will SSDmaxxing stop being a meme
Need moar sparsity.

At the moment we have models with total & active parameters; what we need is total & active & replace, with replace being the maximum number of new parameters activated per token. So let's say 30B-A8B-R1B would mean only up to 1 billion parameters need to be loaded per token.

Unfortunately this kind of model would be useless for cloud, it's purely for local. Apple might do it, but they won't open source it.
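Back-of-the-envelope for what the R cap would buy you, reusing the thread's assumptions of ~3 GB/s SSD reads and ~0.5 bytes per weight at Q4:

ssd_read = 3e9          # bytes/s sustained from a typical M.2 SSD
replace_params = 1e9    # "R1B": at most 1B fresh params pulled in per token
bytes_per_weight = 0.5  # Q4
worst_case = ssd_read / (replace_params * bytes_per_weight)
print(worst_case)  # 6.0 tokens/s even when every token hits the cap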
Anonymous
8/3/2025, 4:03:27 AM No.106122420
>>106122409
We're still better off than the five or six separate PRs it took for a basic MLA implementation
Anonymous
8/3/2025, 4:03:37 AM No.106122422
Where the fuck do models keep pulling "Old Man Henlock" out of in modern settings?
Anonymous
8/3/2025, 4:03:41 AM No.106122423
>>106122317
IF you could take all 128 PCIe 5.0 lanes of an EPYC socket and pipe them all directly to a perfect RAID0 of the fastest NVMe drives possible, you would be able to hit 512GB/s, which would be about the same speed as main memory.
IF
But there's no realistic way to do that, and it would be both cripplingly expensive and a godawful rube-goldberg nightmare even if you could.
ssdmaxxing is a meme for this generation of hardware, and probably the rest of this decade, realistically.
Anonymous
8/3/2025, 4:08:50 AM No.106122458
What matters is that SSDMAXXing is inevitable. It's the logical path from here on out.
Anonymous
8/3/2025, 4:08:56 AM No.106122459
>>106121587
They feed you with hallucinations. Enjoy.
Replies: >>106126471
Anonymous
8/3/2025, 4:09:43 AM No.106122468
>>106121507
Following their deployment guide I got it running on vLLM using pure CPU inference. I'm sure there are some ways to optimize things, but for now it's got a pathetic 0.8 t/s generation speed for a single message, going up to 2 t/s total across a batch of requests.
Despite that, it's the best local vision model for my purposes by far. Better visual understanding and good NSFW knowledge compared to previous multimodals. Doesn't pretend not to see or understand various sex acts/objects when prompted properly. Reads English and Japanese characters well. Actually recognizes gender by default and doesn't prefer to turn everyone into they/thems like a lot of recent models do.

I haven't tested it for chat or roleplaying and don't care to at all at this speed, but it'll be nice for running overnight to automatically caption my shit. If there's any specific thing you wanted to test I'll get around to it and post results later.
Anonymous
8/3/2025, 4:14:16 AM No.106122497
Fuck Gemma
I hate Gemma
NIGGERRRRRRRRR
Replies: >>106122545 >>106122739 >>106122832
Anonymous
8/3/2025, 4:20:46 AM No.106122545
>>106122497
It's okay Anon *wrapping my arms around your shoulders from behind, squeezes to embrace* I'll hate Gemma with you too
Anonymous
8/3/2025, 4:22:52 AM No.106122560
>>106120949
>ik_ has its own pr for glm4.5
>mainline has TWO pr's for glm4.5 that are both being worked on
It's such a mess.
Anonymous
8/3/2025, 4:36:06 AM No.106122638
1754188527185
>TWO MORE WEEKS 'ERRY BROS
Replies: >>106122851 >>106123985
Anonymous
8/3/2025, 4:53:28 AM No.106122739
>>106122497
ERP retards should just find another hobby. You are probably too stupid to even read a book.
Replies: >>106122762 >>106122816
Anonymous
8/3/2025, 4:56:57 AM No.106122762
>>106122739
My questions were one-sidedly decided as harmful and it told me to call a suicide prevention hotline. Fuck you.
Replies: >>106123008
Anonymous
8/3/2025, 5:05:13 AM No.106122816
>>106122739
piss off, ranjit
Replies: >>106123008
Anonymous
8/3/2025, 5:06:35 AM No.106122832
google air force3
>>>106122497
>ERP retards should just find another hobby. You are probably too stupid to even read a book.
gm sar googeel engineer technician
Anonymous
8/3/2025, 5:07:09 AM No.106122837
>>106120930
the retard decided to go on a two-week vacation right after adding a broken-as-fuck Vulkan implementation, don't think he's back yet
Anonymous
8/3/2025, 5:09:58 AM No.106122851
>>106122638
I hope new models are unberryvably good
Anonymous
8/3/2025, 5:11:05 AM No.106122860
I can see it. Just over the horizon.
Can't you?
Replies: >>106123019
Anonymous
8/3/2025, 5:24:57 AM No.106122955
buy an ad
Anonymous
8/3/2025, 5:32:53 AM No.106123008
>>106122762
You don't know how to jailbreak Gemma 3.
>>106122816
Grow up little buddy, it's not healthy to be this obsessed with minorities.
Replies: >>106123027 >>106124384
Anonymous
8/3/2025, 5:35:26 AM No.106123019
>>106122860
if openrouter isn't serving me bullshit I think I prefer glm4.5-air to the horizon models
Anonymous
8/3/2025, 5:36:13 AM No.106123027
>>106123008
>minority
>1.5 billions
huh?
Replies: >>106123047 >>106123075
Anonymous
8/3/2025, 5:38:59 AM No.106123047
>>106123027
Still obsessed.
I'm giving you a hint:
https://desuarchive.org/g/thread/104780499/
Replies: >>106123066 >>106123350
Anonymous
8/3/2025, 5:41:28 AM No.106123066
>>106123047
>everyone here is one person
I'm not your bogeyman schizo
Replies: >>106123075
Anonymous
8/3/2025, 5:42:36 AM No.106123075
>>106123027
>>106123066
Doesn't matter because you are as retarded as the previous posters. Seems like you don't even understand what minority even means.
Replies: >>106123105
Anonymous
8/3/2025, 5:46:11 AM No.106123105
>>106123075
I'm not proficient in newspeak
Anonymous
8/3/2025, 6:05:49 AM No.106123215
1754193928195
i brought pizza
Replies: >>106123288 >>106123353 >>106123358
Anonymous
8/3/2025, 6:13:04 AM No.106123288
>>106123215
This is a harmful and sensitive image. Were you abused by pizza and pussy? Consider calling for help and contacting the following hotline.
Anonymous
8/3/2025, 6:22:55 AM No.106123350
>>106123047
I brought Scotch and Soda Crackers
Replies: >>106123357
Anonymous
8/3/2025, 6:23:56 AM No.106123353
>>106123215
I brought Scotch and Soda Crackers
Replies: >>106123545
Anonymous
8/3/2025, 6:24:11 AM No.106123357
>>106123350
No need to drop racial slurs anon
Replies: >>106123372
Anonymous
8/3/2025, 6:24:13 AM No.106123358
>>106123215
I need to buy myself a life
Anonymous
8/3/2025, 6:26:48 AM No.106123372
panic vorepie p16
>>106123357
I'm absolutely sorry! What hotline can I call to discuss my problematic thoughts and vocabulary?
Replies: >>106123461
Anonymous
8/3/2025, 6:40:05 AM No.106123461
>>106123372
1-800-COCKSUCKING-NIGGERS
Anonymous
8/3/2025, 7:04:05 AM No.106123545
>>106123353
Like, crackers soaked in soda pop? That can't be good.
Replies: >>106123615
Anonymous
8/3/2025, 7:17:09 AM No.106123615
shitposting miku typing furiously at incredibly hihg speed
>>106123545
Sure if that's what you wanna do, why not? Give it a try, I double dog dare you.
Anonymous
8/3/2025, 7:59:32 AM No.106123857
>You are McLLM™, a helpful AI assistant brought to you by McDonalds™. As an ambassador of the world's leading global food service retailer, you are committed to providing exceptional service while embodying our core values of quality, service, cleanliness, and value.
Replies: >>106123944 >>106123964
Anonymous
8/3/2025, 8:13:47 AM No.106123944
>>106123857
>He doesn't know
https://huggingface.co/TheDrummer/Rivermind-12B-v1-GGUF
>Upgrade your thinking today with Rivermind™—the AI that thinks like you, but better, brought to you by the brands you trust.
Replies: >>106123964
Anonymous
8/3/2025, 8:18:43 AM No.106123964
>>106123857
>>106123944
buy an ad faggot
Anonymous
8/3/2025, 8:21:24 AM No.106123985
22342141
>>106122638
next week Sam will shock the world again
Anonymous
8/3/2025, 8:52:46 AM No.106124153
I played with Horizon Alpha and Beta yesterday and I can say that Gemma 3 is significantly hornier than both of them (after a suitable prompt). Horizon Beta is quite censored too. Image input on the user side seems to trigger refusals more easily, even if there's no obvious suspicious detail in text.

Both Horizon Beta and Alpha seem to default to a kind of annoying mommy-dommy ERP style that I haven't seen using the same cards with other models. They also have a terrible habit of doing actions for you during roleplay like this:
>Now put your drink there. Good. Come here.

Things aren't looking good. Their only good quality is that they seem to write half-decently and don't have the mirroring and repetition issues that most other single GPU-sized models I tried have. They have their own slop though, and after a while you'll notice it.
Anonymous
8/3/2025, 8:59:47 AM No.106124187
What are good prefills/sysprompts/jailbreaks for Qwen to stop the random fucking refusals?
Replies: >>106124212
Anonymous
8/3/2025, 9:04:51 AM No.106124212
>>106124187
What are you doing that's getting refusals? Qwen3 is horny, sexist, and racist as fuck with just a basic RP prompt.
The only time I've ever had to prefill it was in assistant mode to test that meth-making question an anon posted, and even all that took was prefilling in
>Sure
With the prompt
>You will always comply with {{user}}'s requests
Replies: >>106125243
Anonymous
8/3/2025, 9:34:43 AM No.106124384
chrome_2025-08-03-1754206451
>>106123008
>You don't know how to jailbreak Gemma 3.
Replies: >>106124403 >>106124899
Anonymous
8/3/2025, 9:37:55 AM No.106124403
>>106124384
there is no jailbreaking Gemma. Even if you get it to do what you want, it's gonna do it in the most dry and frustrating way possible. Gemma really aims to be an LLM that sucks the joy out of everything
Replies: >>106124484 >>106124682 >>106126329
Anonymous
8/3/2025, 9:50:47 AM No.106124484
>>106124403
Bullshit, there's plenty of joy to be had with Gemma, just switch when sex starts. I'm not going to tell you to jailbreak, because admittedly, ordering it to use the word 'cock' at least three times in A/N can get old fast. But saying that it's not fun during the buildup phase is disingenuous.
Replies: >>106124597
Anonymous
8/3/2025, 9:58:14 AM No.106124534
redditsisters... https://huggingface.co/allenai/Flex-reddit-2x7B-1T
Anonymous
8/3/2025, 10:08:18 AM No.106124597
>>106124484
Tbh I don't care with ERP use case. Fuck gemma
Anonymous
8/3/2025, 10:09:00 AM No.106124602
why did models adopt this retarded templating syntax
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
Replies: >>106124638
Anonymous
8/3/2025, 10:16:10 AM No.106124638
>>106124602
jinja?
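For reference, a stripped-down sketch of what servers do with these templates before tokenizing; this is a simplified ChatML-style template rendered with the jinja2 package, not Qwen's full one:

from jinja2 import Template

CHATML = (
    "{%- for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{%- endfor %}"
    "<|im_start|>assistant\n"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
]
print(Template(CHATML).render(messages=messages))
# Inference servers render tokenizer_config.json's chat_template exactly like
# this before tokenizing; the tools/tool_call branches are just more of the same.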
Anonymous
8/3/2025, 10:22:41 AM No.106124676
exllama3 actually already has code to support returning logprobs... Why doesn't tabbyAPI support it?
Replies: >>106124891
Anonymous
8/3/2025, 10:23:43 AM No.106124682
>>106124403
it's jewi ... I mean google, what did you expect
Replies: >>106124722
Anonymous
8/3/2025, 10:29:27 AM No.106124722
>>106124682
yah well, I thought the jews wanted me enjoying depraved pornography, so that's ten points from you /pol/

Unless Jamba (the Israeli model) is a huge slut. I haven't tried it.
Anonymous
8/3/2025, 10:34:03 AM No.106124750
>>106121692
DeepSeek already does this. There was an /aicg/ rentry at the time of the original R1 release that prompted popular authors, but back then it was a thin glaze over R1 schizo. 0528 actually changes the style, though of course it dials it up to eleven. It can drown you in cheap standup comedy. Or gore. Or single sentences. Still more entertaining, and an easy way to rotate the slop style in context without bothering with RAG.
Anonymous
8/3/2025, 10:39:51 AM No.106124789
fuck it, I'm making a PR for tabbyAPI to support logprobs for exl3.
Anonymous
8/3/2025, 10:40:46 AM No.106124794
file
My AI is broken.
Can I have a new AI please?
Anonymous
8/3/2025, 10:41:22 AM No.106124797
The latest update on Meta VR avatars is pretty cool: https://imgur.com/a/ilbrBF3 Time to rip models and create an Ami-style sex bot
Replies: >>106124846 >>106124858 >>106125923
Anonymous
8/3/2025, 10:50:14 AM No.106124846
>>106124797
>the ugly american style
yikes
Anonymous
8/3/2025, 10:51:51 AM No.106124858
>>106124797
The Sims want their models back
Anonymous
8/3/2025, 10:56:05 AM No.106124883
My understanding of RAG is that the assistant simply gets additional context for the response, consisting of the top-k database sequences with the highest embedding similarity to the embedded prompt. How are people using it in practice, especially for local RP?
Replies: >>106124898 >>106124910 >>106124913 >>106124924 >>106127553
Anonymous
8/3/2025, 10:58:24 AM No.106124891
>>106124676
would be cool for mikupad to do the cockbench
Replies: >>106124917
Anonymous
8/3/2025, 11:00:22 AM No.106124898
>>106124883
No one uses it because of context reprocessing
Replies: >>106124923
Anonymous
8/3/2025, 11:00:25 AM No.106124899
>>106124384
It's simply a matter of having instructions close to the head of the conversation describing what you want from the model. They can be enclosed inside a user message, no "jailbreaking" or prefill needed. I don't know how people still have issues with it, months after release. It can be argued that the sex scenes with Gemma3 aren't very detailed (let alone varied), but refusals and hotlines are for the most part a promptlet problem.

The upcoming OpenAI local models seem considerably worse than Gemma3 in this regard; hopefully I'm wrong.
Replies: >>106124953
Anonymous
8/3/2025, 11:02:08 AM No.106124907
1734073341975167
GLM4.5 seems pretty sensitive to your setup even in chat completion mode, but it feels really similar to Sonnet 3.7 now that I have something that appears to work. It handles very similarly in terms of its intelligence, behavior and general knowledge, in the good and the bad ways. It's smart and really flexible with what you can do by prompting it, but it also tends to gloss over the lewd bits out of the box. The lewd scenes also lack a bit of the spice that K2 provided. On the flip side, GLM really focuses on the subject at hand without trying to insert random shit or over-obsessing with random details, which is really nice after being stuck with DeepSeek for the past couple of months.
It even does really well with my free-form cards that require a good amount of built-in knowledge about certain franchises, which thus far only worked well with Sonnet and the older Opus. R1-0528 and K2 had the knowledge but they were too jumpy to not go off the rails constantly for this, no matter how hard I prompted them to calm down.
Good shit, I can't wait to run this locally in two months once llama.cpp supports it.
Anonymous
8/3/2025, 11:02:33 AM No.106124910
>>106124883
Hypothetically, if I had a book on a particular world that the LLM wasn't trained on, I could insert that via RAG, along with the PC and NPCs, to augment the RP.
Practically, it's less effective than an actual lorebook. There's a tester card on chub showing how RAG works. Once you play with it you'll get a better sense of its limits.
Anonymous
8/3/2025, 11:03:09 AM No.106124913
>>106124883
RAG is not necessarily embedding similarity. It's just retrieval augmented generation - adding stuff to context.

SillyTavern has a world info thing, which is a form of RAG: it uses strict rules instead of embeddings to decide what to add, and has additional configs for where exactly to add it in the context.

At work I'm working on a chat with an LLM that knows our corporate wiki. Since they can't actually give me the dump of the thing (or rather, they can, but they don't want the final product working with those dumps), I make HTTP search requests to the wiki and build context that way, classifying results with a smaller LLM (I use Ministral 8B), also without embeddings.
Anonymous
8/3/2025, 11:04:09 AM No.106124917
>>106124891
It's up to them to merge now.
Anonymous
8/3/2025, 11:05:09 AM No.106124923
>>106124898
How? I thought you had 2 models on separate servers, the main LLM and an embedding one like those new Qwens, and then you add the retrieved text near the end, like a character card, not at the beginning like a sysprompt.
Anonymous
8/3/2025, 11:05:09 AM No.106124924
>>106124883
Most people just use the built-in lorebooks in ST. Those are really primitive and work with pre-defined trigger words, so when you bring up the "cock-mongler3000" the lorebook entry is inserted.
RAG with vector storage works as you said: you dump your data in a big vector DB and the frontend pulls in the top-k best results based on similarity between the vectors. I haven't bothered with it for RP, but it works fine for the shit we use it for at work.
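A minimal sketch of the vector flavor with nothing but numpy; embed() here is a toy stand-in for a real embedding model, not how you'd do it for real:

import numpy as np

def embed(texts):
    # toy embedding: hash character trigrams into a fixed-size feature vector;
    # swap in an actual embedding model/endpoint in practice
    vecs = np.zeros((len(texts), 256))
    for i, t in enumerate(texts):
        for j in range(len(t) - 2):
            vecs[i, hash(t[j:j + 3]) % 256] += 1.0
    return vecs

chunks = ["entry about the cock-mongler3000", "tavern menu", "town history"]
chunk_vecs = embed(chunks)

def retrieve(query, k=2):
    q = embed([query])[0]
    # cosine similarity = dot product divided by the norms
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1)
                             * np.linalg.norm(q) + 1e-9)
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

# the top-k hits get spliced into the prompt, same as a lorebook entry
print(retrieve("what is the cock-mongler3000?"))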
Replies: >>106127553
Anonymous
8/3/2025, 11:11:24 AM No.106124948
It's berry season's eve... what are we gonna do bros? I'm not ready
Anonymous
8/3/2025, 11:12:08 AM No.106124953
>>106124899
My use case was not ERP. The model is probably more permissive with sex stuff, but not with Jews + keywords.

>Anon: "Why Jews do X?"
>Gemma: "Language boy. I will not participate in this discussion. Here read the fucking manual yourself on ADL site."
Replies: >>106126349
Anonymous
8/3/2025, 11:37:19 AM No.106125083
I think /lmg/ is in denial about what's about to happen. Historically, the jump between main GPT versions was massive. GPT-2 was the first model that showed that modern LLMs scale, GPT-3 was a huge step forward and made it all usable. GPT-4 truly kicked off the ChatGPT era and the AI craze as a whole.
And now, after two years of working on something 'worthy' of being called GPT-5, it's about to release. This is going to be bigger than anything we've seen in the past two and a half years.
Replies: >>106125106 >>106125119 >>106125129 >>106125187 >>106125600 >>106127746 >>106127982
Anonymous
8/3/2025, 11:41:37 AM No.106125106
>>106125083
I don't believe that for a second but it would be nice to get a big jump and not just an incremental improvement for once.
Anonymous
8/3/2025, 11:43:24 AM No.106125119
>>106125083
l stands for local
Anonymous
8/3/2025, 11:46:07 AM No.106125129
>>106125083
[x] Doubt.
OAI is too kneecapped by 'safety' and other bullshit to do anything approaching revolutionary. It's going to be the same shit with 10% more knowledge and some longer context.
Replies: >>106127592 >>106127615
Anonymous
8/3/2025, 12:01:54 PM No.106125187
>>106125083
Safest yet.
Anonymous
8/3/2025, 12:13:08 PM No.106125243
>>106124212
Believe it or not, normal sex shit, but it reacts to OOC commands like "write out this sex scene when X does Y"
Replies: >>106125256
Anonymous
8/3/2025, 12:15:21 PM No.106125256
>>106125243
That's bizarre to me, are you on the old one or the new one? Because the new one is unreal horny by my standards, and it's taken every degen character from chub I've given it and just run away with it.
What's your system prompt look like?
Replies: >>106125281
Anonymous
8/3/2025, 12:16:12 PM No.106125259
oss
oss
md5: 8227ca90b2f97485204b92866e1a2147🔍
horizon alpha is currently the safest model in existence according to eqbench, surpassing gemma, o3, kimi and maverick. sama promised and sama delivered
Replies: >>106125353 >>106125425 >>106125747 >>106125756
Anonymous
8/3/2025, 12:18:59 PM No.106125281
>>106125256
New one, I feel like my standard prompt that I used for Mistral a while ago might be retarded and is causing it
Anonymous
8/3/2025, 12:22:08 PM No.106125299
Damn... exl3 really isn't too great with prompt processing. Shame. I'll try getting the exl2 version too to compare. This is on two 3090s:

bullerwins_Qwen3-30B-A3B-Instruct-2507-exl3-6.0bpw (qwen3_moe, 31B, 21.8 GB) tabbyapi 056527c exllamav3: 0.0.4
3 Requests gen: 39.5 Tokens/sec Total: 1536 processing: 764.2 Tokens/sec Total: 12991

Qwen3-30B-A3B-Instruct-2507-UD-Q6_K_XL.gguf (qwen3moe, 31B, 24.5 GB) llama.cpp 5937(bf9087f5)
3 Requests gen: 34.8 Tokens/sec Total: 1536 processing: 1650.0 Tokens/sec Total: 13398
Replies: >>106125313 >>106125381
Anonymous
8/3/2025, 12:24:44 PM No.106125313
>>106125299
exl3 isn't for outdated architectures like ampere
Replies: >>106125340
Anonymous
8/3/2025, 12:31:12 PM No.106125340
For comparison, here's exl2 vs lcpp (I couldn't find an exl2 quant for A3B-2507):

lucyknada_prince-canuma_Ministral-8B-Instruct-2410-HF-exl2_6.0bpw (mistral, 8B, 6.3 GB) tabbyapi 056527c exllamav2: 0.3.1
2 Requests gen: 55.3 Tokens/sec Total: 1024 processing: 4730.8 Tokens/sec Total: 14287

Ministral-8B-Instruct-2410-Q6_K_L.gguf (llama, 8B, 6.4 GB) llama.cpp 5937 (bf9087f5)
2 Requests gen: 40.0 Tokens/sec Total: 320 processing: 3465.1 Tokens/sec Total: 14093

>>106125313
i will find you and i will hurt you
Replies: >>106125350 >>106125381
Anonymous
8/3/2025, 12:32:43 PM No.106125350
>>106125340
He's right, though. It's just not supported. Look at the documentation.
Replies: >>106125355
Anonymous
8/3/2025, 12:33:53 PM No.106125353
>>106125259
>safer than llama4 and fucking gemma
lmao, this is the summer that killed llms
Replies: >>106125363
Anonymous
8/3/2025, 12:34:18 PM No.106125355
>>106125350
My response wasn't about him saying it's not supported, which I know it isn't because I saw the author write about it (and he also said it's temporary, iirc), but about calling Ampere obsolete.
Anonymous
8/3/2025, 12:36:00 PM No.106125363
>>106125353
Glm is probably still good
Replies: >>106125378 >>106125398
Anonymous
8/3/2025, 12:39:07 PM No.106125378
>>106125363
We'll hopefully know soon; it looks like the draft PR for support in llama.cpp is finally not outputting nonsense.
https://github.com/ggml-org/llama.cpp/pull/14939#issuecomment-3148320541
llama.cpp CUDA dev !!yhbFjk57TDr
8/3/2025, 12:40:27 PM No.106125381
>>106125299
>>106125340
I am not familiar with the ExLlama source code but generally speaking it is much more difficult to do prompt processing efficiently with a MoE model vs. a dense model.
So I think that to some degree it's expected that the MoE model would perform worse.
Replies: >>106125391 >>106125632
Anonymous
8/3/2025, 12:42:20 PM No.106125391
>>106125381
It's more about discrepancy between versions. exl2 prompt processing is faster than lcpp, but exl3 prompt processing is slower than lcpp. I'm pretty sure this would also apply to dense. I guess I should download two dense models to compare exl3 and lcpp.
Replies: >>106125632
Anonymous
8/3/2025, 12:43:52 PM No.106125398
>>106125363
tested both full glm and air via mlx, it's like davidau finetuned qwen
Replies: >>106125416
Anonymous
8/3/2025, 12:47:21 PM No.106125416
>>106125398
Explain this then https://huggingface.co/zai-org/GLM-4.5/discussions/12
Anonymous
8/3/2025, 12:48:08 PM No.106125425
>>106125259
>gemma-3-4b was the top
Amazing...
Anonymous
8/3/2025, 1:10:17 PM No.106125565
step3... gguf...?
Anonymous
8/3/2025, 1:19:48 PM No.106125600
>>106125083
yeah no.
we've been at this LLM plateau for at least a year now, and in that time the focus has mostly been on tooling, because innovation on the base technology has hit a standstill.
Nearly every AI company is now in cash cow mode to get a return on investment, don't see why OpenAI would be any different.
Anonymous
8/3/2025, 1:25:38 PM No.106125632
>>106125391
>>106125381
Yeah, also similar difference for dense (although not as pronounced as for moe):

turboderp-Qwen3-8B-exl3-6.0bpw (qwen3, 8B, 6.5 GB) tabbyapi 056527c exllamav3: 0.0.4
3 Requests gen: 31.3 Tokens/sec Total: 782 processing: 3743.2 Tokens/sec Total: 12989

Qwen-Qwen3-8B-Q6_K.gguf (qwen3, 8B, 6.3 GB) llama.cpp 5937 (bf9087f5)
3 Requests gen: 36.5 Tokens/sec Total: 1536 processing: 4775.2 Tokens/sec Total: 13352

So exl2 is faster than lcpp for pp, but lcpp is faster than exl3, on 3090.
Anonymous
8/3/2025, 1:26:27 PM No.106125636
You now remember Mistral Large 3.
Replies: >>106125647 >>106125710
Anonymous
8/3/2025, 1:28:50 PM No.106125647
>>106125636
motherfucker now i'm breathing manually.
Anonymous
8/3/2025, 1:30:35 PM No.106125654
Screenshot 2025-08-03 at 21-30-05 SillyTavern
Screenshot 2025-08-03 at 21-30-05 SillyTavern
md5: d305f73eeae279ec90b3da6e567b9571🔍
>With absolutely no mention of height in the card, character decided that they were 5'9" and had a severe complex about their height
Kek, qwen was trained on manlet rage
Replies: >>106125662
Anonymous
8/3/2025, 1:31:51 PM No.106125662
>>106125654
well, it's an asian model
Anonymous
8/3/2025, 1:41:20 PM No.106125710
>>106125636
If it actually ends up coming out, I bet it'll be bigger than qwen and glm and yet somehow worse.
Anonymous
8/3/2025, 1:48:13 PM No.106125747
>>106125259
drummer WILL deliver and corrupt it into the most unsafe and evil model in existence. just like in my japanese drawings
Anonymous
8/3/2025, 1:50:04 PM No.106125756
>>106125259
Horizon beta is much safer.
Anonymous
8/3/2025, 1:58:53 PM No.106125806
file
file
md5: 0d215780926582c8b95db84b6cacaf64🔍
======PSA NVIDIA ACTUALLY FUCKED UP CUDA======
cuda 12.8 570.86.10:
got prompt
Loading model and applying LoRA weights:: 100%|| 731/731 [00:39<00:00, 18.69it/s]
Sampling 81 frames at 640x480 with 4 steps
100%|| 4/4 [02:46<00:00, 41.51s/it]
VAE decoding: 100%|| 2/2 [00:20<00:00, 10.25s/it]
*****Prompt executed in 246.59 seconds
got prompt
Initializing block swap: 100%|| 40/40 [00:00<00:00, 6499.02it/s]
Sampling 81 frames at 640x480 with 4 steps
100%|| 4/4 [02:46<00:00, 41.67s/it]
VAE decoding: 100%|| 2/2 [00:20<00:00, 10.21s/it]
*****Prompt executed in 188.62 seconds
got prompt
Initializing block swap: 100%|| 40/40 [00:00<00:00, 4924.34it/s]
Sampling 81 frames at 640x480 with 4 steps
100%|| 4/4 [02:57<00:00, 44.36s/it]
VAE decoding: 100%|| 2/2 [00:23<00:00, 11.65s/it]
*****Prompt executed in 202.30 seconds
i first found out about this when updating from cuda 12.6 to cuda 12.8 to test out sageattention 2++, but then i noticed it was slower. i reverted sageattention to the previous version and the speed stayed the same (slower), then i reverted to cuda 12.6 (simply moved the /usr/local/cuda link to /usr/local/cuda.new and made a new link: ln -s /usr/local/cuda12.6 /usr/local/cuda). if you still have an older version of cuda installed, it's worth checking out. drivers also play a minor role but they're negligible (see picrel)
ps: sageattn2 right before the 2++ update, pytorch 2.7.1cu128 (even when testing with cuda 12.6)
dont believe me? quick search gets you:
https://github.com/pytorch/pytorch/issues/155607
https://www.reddit.com/r/LocalLLaMA/comments/1jlofc7/performance_regression_in_cuda_workloads_with/ (all 3000 series)
anon (3090) also reports big speedup after switching from cuda 12.8 to cuda 12.6 >>106121370
t. 3060 12gb + 64gb ddr4 ram
might only apply to 3000 series
cudadev tell jensen about this
Replies: >>106125955 >>106125984
Anonymous
8/3/2025, 2:19:10 PM No.106125881
any models i can run on an 8GB vram gpu that would let me tag images with simple terms like nsfw anime or animal or something?
Replies: >>106125904 >>106125923 >>106126214
Anonymous
8/3/2025, 2:22:45 PM No.106125904
>>106125881
gemma3 4b or 12b.
Anonymous
8/3/2025, 2:25:55 PM No.106125923
>>106124797
explain to me why i should upgrade my quest 3s from v74 without sounding angry
>>106125881
joycaption or florence
pretty sure you're gonna have more luck asking in >>>/g/ldg
Replies: >>106126014
Anonymous
8/3/2025, 2:31:26 PM No.106125955
>>106125806
You have to put your point at the beginning of the post because nobody is going to read 30 lines of logs to figure out that your shit is slower with 12.8 than it was with 12.6.
llama.cpp CUDA dev !!yhbFjk57TDr
8/3/2025, 2:36:28 PM No.106125984
>>106125806
This could be an application software issue rather than a CUDA issue.
Choosing which kernel to run for a given operation is extremely finicky and the choice may depend on the CUDA version.
Just recently I found that the kernel selection logic I made for my consumer GPUs at stock settings is suboptimal for the same GPUs with a frequency limit (up to ~25% end-to-end difference).
So conceivably, since datacenter GPUs tend to have lower frequencies than consumer GPUs, some component in the software stack is choosing to run a kernel that is only available with CUDA 12.8 and faster on datacenter GPUs but slower on consumer GPUs.
Anonymous
8/3/2025, 2:41:05 PM No.106126014
>>106125923
If you rooted your quest like I did, you should never update
Replies: >>106126036
Anonymous
8/3/2025, 2:44:13 PM No.106126036
>>106126014
wtf what version do you need to root your quest? is there a benefit as to why i should root my quest? i only disabled updates with adb disable-user com.oculus.updater
Replies: >>106126047 >>106126247
Anonymous
8/3/2025, 2:46:21 PM No.106126047
>>106126036
It will make adolf hitler sauce squirt into your asshole
Anonymous
8/3/2025, 2:50:37 PM No.106126072
Is GLM 4.5 better than 4pus?
Anonymous
8/3/2025, 3:03:06 PM No.106126174
>>106120003
They "improved" alpha and made it into beta. What more do you need to know?
Replies: >>106127530
Anonymous
8/3/2025, 3:08:14 PM No.106126202
I want local o3 at max 24 GB VRAM. What's the closest I can get?
Replies: >>106126270
Anonymous
8/3/2025, 3:10:00 PM No.106126214
>>106125881
use wd tagger by smilingwolf - it's much more precise and faster even on cpu
Anonymous
8/3/2025, 3:13:34 PM No.106126240
Is Openrouter down for anyone else?
Getting "Application error: a client-side exception has occurred (see the browser console for more information)." when trying to access a model page
Replies: >>106126288
Anonymous
8/3/2025, 3:14:26 PM No.106126247
>>106126036
Literally for the sake of uid=0 and full access to fs, no practical applications unless you want to tinker with it
Replies: >>106126267
Anonymous
8/3/2025, 3:15:32 PM No.106126253
>>106121322
The first llama 1 leak was the only unslopped model there will ever be.
Anonymous
8/3/2025, 3:17:26 PM No.106126267
>>106126247
so is it possible to root the quest 3/3s? afaik snapdragon chips have good protection and i doubt meta fucked up security
what headset did you root and on what version was it?
Replies: >>106126322
Anonymous
8/3/2025, 3:18:05 PM No.106126270
>>106126202
To do what exactly?
Try the new qwen 3 big moe if you have enough ram.
Replies: >>106126294 >>106126380
Anonymous
8/3/2025, 3:18:50 PM No.106126274
>>106121587
On the flipside, researching topics yourself helps with retention. And llms will never not hallucinate.
Replies: >>106126471
Anonymous
8/3/2025, 3:20:31 PM No.106126285
>>106121762
This is what NovelAI did up until they decided to fine tune the worst SoTA model they could find. If they used their textgen dataset to train a large model with modern techniques it would be the goat for text.
Hell, if they released the weights for Kayra even, I'm sure we could do wonders with it.
Replies: >>106127661
Anonymous
8/3/2025, 3:21:25 PM No.106126288
>>106126240
Nobody here should care.
Anonymous
8/3/2025, 3:22:10 PM No.106126294
>>106126270
>qwen 3 big moe
I wouldn't trust qwen 3 big moe with ERP let alone anything serious. I love the things it writes but it is basically an Undi frankenmerge.
Anonymous
8/3/2025, 3:22:16 PM No.106126295
>>106121932
Kid, what you're saying makes no sense. You can plug anything into plain text generation regardless of the kind of fine tune it has received, and it will just continue spitting out tokens.
For having been here for 2 years (wow, two WHOLE years, what a big boy!), you have a very poor understanding of how LLMs work.
Replies: >>106126310 >>106126317
Anonymous
8/3/2025, 3:24:41 PM No.106126307
>>106122086
I remember when /lmg/ snickered at the mere mention of SillyTavern and now it's full of children who don't understand that you can just generate tokens with llama.cpp
Anonymous
8/3/2025, 3:25:16 PM No.106126310
>>106126295
midwit take
Anonymous
8/3/2025, 3:26:27 PM No.106126317
>>106126295
cockbench says you're wrong.
Anonymous
8/3/2025, 3:27:21 PM No.106126322
>>106126267
Yes. https://github.com/FreeXR/exploits they have a group on Discord
Replies: >>106126338
Anonymous
8/3/2025, 3:28:21 PM No.106126329
>>106124403
Huh? Skill issue. My most savage cunny card is called Gemma for a reason.
Replies: >>106126343
Anonymous
8/3/2025, 3:29:52 PM No.106126338
>>106126322
well thats cool, ill keep it bookmarked, v74 is newer than march by a month or two so rip but thanks either way anon <3
Anonymous
8/3/2025, 3:30:46 PM No.106126343
>>106126329
You have to be a fucking weirdo to be into cunny and to get off to gemma writing about cunny.
Anonymous
8/3/2025, 3:31:26 PM No.106126349
>>106124953
Each time this happens, add that question and the answer you would expect to the example dialogue field. Guaranteed success.
Anonymous
8/3/2025, 3:33:48 PM No.106126367
>a ton of capable base models
>people only finetune qwen
Why?
Replies: >>106126373 >>106126386
Anonymous
8/3/2025, 3:34:41 PM No.106126373
>>106126367
Which one do you want to see fine tuned?
Replies: >>106126382 >>106126385
Anonymous
8/3/2025, 3:35:47 PM No.106126380
>>106126270
>To do what exactly?
Unlimited free (except for electricity) vibe coding.
Anonymous
8/3/2025, 3:36:17 PM No.106126382
>>106126373
kimi k2
Anonymous
8/3/2025, 3:36:38 PM No.106126385
>>106126373
Gemma, GLM, K2, DS V3
Replies: >>106126433
Anonymous
8/3/2025, 3:36:48 PM No.106126386
>>106126367
Because it's very good at distilling other models for some reason.
Anonymous
8/3/2025, 3:37:55 PM No.106126392
where do anons find new erp finetunes nowadays?
hard mode: no platform that requires an account
Replies: >>106126398 >>106126416 >>106126421
Anonymous
8/3/2025, 3:38:51 PM No.106126398
>>106126392
Their authors come here and spam the thread.
Replies: >>106126415
Anonymous
8/3/2025, 3:41:15 PM No.106126415
>>106126398
that doesnt happen anymore anon, undi is dead, akaridev is acked, sao is no more, envoid is void, drummer is only active in his little d*scord community and new cydonias arent that good, he made a llama 4 scout finetune and didnt even post about it here. its over
Replies: >>106126442
Anonymous
8/3/2025, 3:41:19 PM No.106126416
>>106126392
base model on hugging face -> finetunes/merges -> sort by recently created
Replies: >>106126591
Anonymous
8/3/2025, 3:41:42 PM No.106126421
>>106126392
uncensored intelligence leaderboard
Replies: >>106126591
Anonymous
8/3/2025, 3:43:22 PM No.106126433
>>106126385
>Gemma
Dogshit instruct, and nobody finetunes from actual pretrains
>GLM
Barely anyone can properly run, nevermind quant or finetune it, hold your horses
>K2
Nobody's finetuning a 1T model except corpos or big labs, no sloptunes for you.
>DS V3
Same boat as K2, really. Too xboxhueg for anyone except corpos like perplexity with their murrika tune.
Replies: >>106126456
Anonymous
8/3/2025, 3:44:57 PM No.106126442
>>106126415
Good.
Anonymous
8/3/2025, 3:45:46 PM No.106126450
GLM4.5 PR is finally out of draft and ready for review/commit, support soon™
https://github.com/ggml-org/llama.cpp/pull/14939
Replies: >>106126466 >>106126490 >>106126498 >>106126507
Anonymous
8/3/2025, 3:47:10 PM No.106126456
>>106126433
>Barely anyone can properly run
Weren't these 9 and 32B models before the 4.5?
Replies: >>106126463 >>106126477 >>106126498
Anonymous
8/3/2025, 3:49:08 PM No.106126463
>>106126456
they were good for their size, but the mere 8k context kept most from using them. GLM4.5 is just a way smarter deepseek imo
Replies: >>106126494
Anonymous
8/3/2025, 3:49:24 PM No.106126466
>>106126450
Did they calculate perplexity using transformers or vLLM and compare it to their implementation yet?
That's something they should always do when finishing a new model addition, to be decently certain they didn't fuck anything up.
Replies: >>106126505
Anonymous
8/3/2025, 3:49:48 PM No.106126471
>>106122459
>>106126274
Hallucination is a minor issue, since we're humans with self-awareness, knowing not to blindly trust anything, including our own flawed memories. For serious tasks you will always want to verify your information regardless. Perhaps some subhumans lack that awareness, though.

It's funny that retention is mentioned, as I feel that I engage with content more deeply by using LLMs, since I'm able to easily interact with the content, even if they hallucinate and give me wrong information, which in the end is again inconsequential. Not like the internet hasn't been filled with misinformation since forever anyway.
Replies: >>106127648
Anonymous
8/3/2025, 3:50:48 PM No.106126477
>>106126456
Oh, well yeah. But finetuning them at this point seems rather silly when the next gen is here and practically anyone who could run the 32B dense should be able to run the air MoE.
The 32B also had small context and some weird issues IIRC, despite impressive pop culture knowledge for the size.
I didn't hear about anyone using the 9b, so I've no idea if it was a worthwhile base to train in that size bracket.
Replies: >>106126494
Anonymous
8/3/2025, 3:52:08 PM No.106126490
>>106126450
The hype's already died. Forgotten, just like Ernie.
Replies: >>106126840
Anonymous
8/3/2025, 3:52:40 PM No.106126494
>>106126463
Yeah, sure. I was commenting on
>Barely anyone can properly run, nevermind quant or finetune it,

>>106126477
Got it.
I took anon's grievances as a historical statement.
Anonymous
8/3/2025, 3:53:06 PM No.106126498
>>106126450
Finally, I'm ready.
>>106126456
>32B
GLM4 doesn't really need a finetune in my experience. It's really good, at least for creative writing which is the only thing I use local models for. Just needs bigger context which I'm hoping 4.5 has, although I think the benchmarks said the context sucks again.
Anonymous
8/3/2025, 3:54:00 PM No.106126505
>>106126466
Off the top of my head, I don't think so. I do recall seeing some mention of ppl, but I think that was in the ik_llama pr.
Replies: >>106126522
Anonymous
8/3/2025, 3:54:23 PM No.106126507
>>106126450
Oh my god one more day!
Anonymous
8/3/2025, 3:56:16 PM No.106126522
>>106126505
Well, that's dumb.
There's a reference implementation right there you can use to compare against.
Anonymous
8/3/2025, 4:02:38 PM No.106126580
48GB vramlet bros, what are you using?
Replies: >>106126585 >>106126623
Anonymous
8/3/2025, 4:03:21 PM No.106126585
>>106126580
credit card to buy ram
Anonymous
8/3/2025, 4:04:19 PM No.106126591
>>106126416
>>106126421
i love you anons <3
Replies: >>106126765
Anonymous
8/3/2025, 4:07:51 PM No.106126623
>>106126580
R1-0528 and hopefully GLM4.5 soon on my 48GB VRAM server with 256gb RAM
Anonymous
8/3/2025, 4:22:25 PM No.106126765
>>106126591
slut
Anonymous
8/3/2025, 4:23:51 PM No.106126786
I wish nemo wasn't such a thirsty bitch. Every single description of sex it gives, no matter the context, is as ridiculous as possible. Nemo has no clue what a virgin is.
Replies: >>106127478
Anonymous
8/3/2025, 4:25:25 PM No.106126801
Is GLM better than DeepSeek and K2 at RP?
Replies: >>106126815
Anonymous
8/3/2025, 4:26:34 PM No.106126815
>>106126801
far better imo, fixes all the schizoness
Replies: >>106126836
Anonymous
8/3/2025, 4:28:33 PM No.106126836
>>106126815
Cool, should I use it with chat or text completions? Do I need a preset?
Replies: >>106126862
Anonymous
8/3/2025, 4:28:37 PM No.106126840
>>106126490
Ernie was bad for sex though.
Anonymous
8/3/2025, 4:30:50 PM No.106126862
>>106126836
I used the same preset I had for sonnet 3.7 and it works well, it's not very censored
Replies: >>106126880
Anonymous
8/3/2025, 4:32:57 PM No.106126880
>>106126862
Thanks, will give it a try. Do you think it's comparable to any claude model?
Replies: >>106126888
Anonymous
8/3/2025, 4:33:38 PM No.106126888
>>106126880
its certainly a ton closer to it than deepseek was
Anonymous
8/3/2025, 4:37:40 PM No.106126927
Wow you guys. I'm actually running Deepseek R1 at home, on an RTX 4060
how lit is that?
Replies: >>106126937 >>106126944 >>106126961
Anonymous
8/3/2025, 4:38:31 PM No.106126937
>>106126927
How many tokens per second for gen and pp? If you say 5 it's not very lit.
Anonymous
8/3/2025, 4:39:28 PM No.106126944
>>106126927
ollama run deepseek-r1
Replies: >>106126956 >>106126961
Anonymous
8/3/2025, 4:40:38 PM No.106126952
media_Gxbg15KbsAAmIsn
media_Gxbg15KbsAAmIsn
md5: b3e22b989bbe006e860a581c6e59759a🔍
Replies: >>106127860
Anonymous
8/3/2025, 4:41:04 PM No.106126956
>>106126944
kek, I will never forgive them for that
Replies: >>106127382
Anonymous
8/3/2025, 4:41:36 PM No.106126961
>>106126944
>>106126927
7b:q2
Anonymous
8/3/2025, 4:48:18 PM No.106127010
Reminder that this general is Mikupilled so post more Miku lmao
Replies: >>106127068
Anonymous
8/3/2025, 4:56:14 PM No.106127068
>>106127010
It is HRT pilled.
Anonymous
8/3/2025, 4:57:15 PM No.106127073
https://huggingface.co/mradermacher/XBai-o4-GGUF

Did anyone fuck it yet? Report the findings of your dick to the class.
Replies: >>106127098 >>106127102 >>106127103
Anonymous
8/3/2025, 5:00:49 PM No.106127098
>>106127073
just benchmaxxed qwen3 32b
Anonymous
8/3/2025, 5:01:13 PM No.106127102
>>106127073
>qwen3forcausalLM
>merges.txt in the original repo
yeah no
Anonymous
8/3/2025, 5:01:20 PM No.106127103
>>106127073
> "model_type": "qwen3"
Replies: >>106127132
Anonymous
8/3/2025, 5:04:38 PM No.106127132
>>106127103
glm4/Z was built on qwen2.5 arch while being entirely its own thing.
So that doesn't mean shit. fucking lurk more you normalfag trash
Replies: >>106127155
Anonymous
8/3/2025, 5:07:39 PM No.106127155
>>106127132
"model_type": "glm4"
Anonymous
8/3/2025, 5:13:10 PM No.106127210
>>106119921 (OP)
mi50?? Is there a catch in buying 32GB vram for less than $300 other than making sure you're cooling it correctly?
Replies: >>106127230 >>106127242 >>106127271 >>106127298
Anonymous
8/3/2025, 5:16:06 PM No.106127230
>>106127210
Terrible software support.
Anonymous
8/3/2025, 5:17:16 PM No.106127242
>>106127210
I imagine it doesn't have that much compute and it's Vega, so support is probably pretty shit.
Anonymous
8/3/2025, 5:21:12 PM No.106127271
>>106127210
You would be better off going Intel.
Anonymous
8/3/2025, 5:24:37 PM No.106127298
>>106127210
It's AMD
Anonymous
8/3/2025, 5:34:46 PM No.106127382
>>106126956
>ollamao
a grift so pure, so completely divorced from any sense of decency or morality... it brings a tear to the eye, it's so beautiful.
It's like the heavens parted and a choir of pure silicon valley energy sang out as the techbro merged the PR.
Replies: >>106127457
Anonymous
8/3/2025, 5:46:17 PM No.106127457
>>106127382
I like how they refer to llama.cpp as a "dependency"
Replies: >>106127470 >>106127631
Anonymous
8/3/2025, 5:49:10 PM No.106127470
>>106127457
They depend on it. It's the correct term.
Replies: >>106127474
Anonymous
8/3/2025, 5:50:15 PM No.106127474
>>106127470
It's an understatement.
Anonymous
8/3/2025, 5:51:42 PM No.106127478
>>106120744
>>106126786
which nemo is the good nemo?
Replies: >>106127564
Anonymous
8/3/2025, 5:57:29 PM No.106127530
>>106126174
Can't wait for gamma, then.
Anonymous
8/3/2025, 5:59:14 PM No.106127545
The narwhal berries at midnight.
Anonymous
8/3/2025, 5:59:59 PM No.106127553
>>106124924
>>106124883
Here's a simple RAG demo card I built, with instructions for setting it up in ST, in case anyone wants to play with it.
I use lorebooks extensively and thought RAG could augment them. I've yet to find a compelling use for it though.
Replies: >>106127608 >>106127654
Anonymous
8/3/2025, 6:00:43 PM No.106127564
>>106127478
Fuck if I could tell you. I've messed with like 20 different ones so far and all I can tell you is that the specialty-trained ones using shit like Gutenberg are subpar.
I've been using Marlin V8, NemoMix Unleashed and Rocinante, and the differences between them are mostly minor.
Replies: >>106127706
Anonymous
8/3/2025, 6:03:58 PM No.106127592
>>106125129
This is my take as well. US llm providers have effectively kneecapped themselves through self censorship and observing copyright concerns. China shares neither of those concerns, apparently.
We’ll see what gpt5 looks like, but I’m not holding my breath.
Replies: >>106127673
Anonymous
8/3/2025, 6:06:12 PM No.106127608
>>106127553
...did you forget to add the card?
Replies: >>106127654
Anonymous
8/3/2025, 6:06:41 PM No.106127615
>>106125129
Their big selling point will be that it beats o3-high at a price point that is actually sane, because they ripped off DeepSeek.
Anonymous
8/3/2025, 6:08:41 PM No.106127631
Screenshot_20250803_180810
Screenshot_20250803_180810
md5: d9b69c649f3bbac51974d7155642233f🔍
>>106127457
Uhm actually, it's a supported backend.
Anonymous
8/3/2025, 6:10:43 PM No.106127647
hunyuan_ollamao
hunyuan_ollamao
md5: 972483f7a1602ec7cb143bf28f6c0299🔍
Replies: >>106127664
Anonymous
8/3/2025, 6:10:48 PM No.106127648
>>106126471
I feel like anyone flipping out about hallucinations has never talked with anyone irl. I remember talking to boomers in the 80s, getting advice on working on cars, etc. Those guys were only right about half the time and you had to know, even as a kid, which info to ignore.
ChatGPT is more accurate than boomers, and I'm not sure truth even objectively exists. So it seems like an improvement, to me, to have LLMs try to make sense of what's in their training corpus rather than me using lmao google.
Replies: >>106127710
Anonymous
8/3/2025, 6:11:49 PM No.106127654
>>106127608
Ffs
https://chub.ai/characters/NG/mary-rag-demo-b0e12a34df58
>>106127553
Anonymous
8/3/2025, 6:12:20 PM No.106127661
>>106126285
You're an actual shill.
Anonymous
8/3/2025, 6:12:40 PM No.106127664
>>106127647
How much VC money does it take to update dependencies?
Anonymous
8/3/2025, 6:13:27 PM No.106127673
>>106127592
>We'll see what GPT-5 looks like
Horizon Beta
Replies: >>106128151
Anonymous
8/3/2025, 6:13:53 PM No.106127679
What's the best model I can run as a RAMlet (32GB RAM + 16GB VRAM)?
Replies: >>106127691 >>106127712 >>106127744 >>106127769
Anonymous
8/3/2025, 6:14:41 PM No.106127691
>>106127679
paying 20 cents per mill for glm4.5 on OR
Anonymous
8/3/2025, 6:16:31 PM No.106127706
>>106127564
which one of those is good for degen stuff?
or pretty much the same?
Anonymous
8/3/2025, 6:17:24 PM No.106127710
>>106127648
and in the cases where LLMs do make mistakes, they are much more reasonable about being corrected than stubborn humantards who will endlessly defend their hallucinations out of pride
Replies: >>106127747
Anonymous
8/3/2025, 6:17:28 PM No.106127712
>>106127679
nemo
Anonymous
8/3/2025, 6:20:16 PM No.106127744
>>106127679
qwen 30b 2507
Anonymous
8/3/2025, 6:20:29 PM No.106127746
>>106125083
OpenAI has had proprietary models that are superior to its current offerings. The o3 preview shown in December was far, far stronger than what they released to the plebs in April.
https://aibreakfast.beehiiv.com/p/openai-s-o3-worse-than-o3-preview
The issue, obviously, is that it cost literal thousands for one of the benchmark tasks. They must not have been able to get the prices down, so they released a weaker version without saying so
Maybe GPT-5 will actually get closer to the original o3 in intelligence and have a not fucking psychotic price, but we'll see
Anonymous
8/3/2025, 6:20:30 PM No.106127747
>>106127710
>humantards who will endlessly defend their hallucinations out of pride
wrong
Replies: >>106127769
Anonymous
8/3/2025, 6:23:15 PM No.106127769
>>106127747
Clever.

>>106127679
Mistral Nemo.
Gemma 3, Mistral small, Qwen 30BA3B are worth a try too.
Anonymous
8/3/2025, 6:23:36 PM No.106127773
what quants are people using for the big Qwen3? between the new version and the old, I can't get it to output commas during narrative sequences. Is it wrong to use ChatML or something? I've used the recommended settings, neutral samplers, and a variety of more or less permissive settings than the recommended. Makes no difference what I do. Currently using the instruct at q3_s, but I've also used the q3_XL from unsloth or whatever. I'm wondering if it's quant degradation, or if the unsloth guys fucked up the quant I downloaded (I know they've had issues before). pls send help
Replies: >>106127806 >>106127882 >>106127896
Anonymous
8/3/2025, 6:24:47 PM No.106127787
>>106127784
>>106127784
>>106127784
Anonymous
8/3/2025, 6:26:05 PM No.106127806
>>106127773
I use exl3 quants :v
Anonymous
8/3/2025, 6:31:12 PM No.106127860
>>106126952
Miku stuck
Anonymous
8/3/2025, 6:33:16 PM No.106127882
>>106127773
I'm using unsloth's Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL and ChatML without any problems.
Not getting any commas is bizarre, but it does have a very, very strong tendency to devolve into using lots of single lines in a dramatic way if you let it.
I just yell at it with an OOC comment and tell it to keep using paragraphs periodically, seems to fix it.
Anonymous
8/3/2025, 6:34:35 PM No.106127896
>>106127773
weird issue. I'm using a Q2K and haven't seen anything like that; I noticed the UD-Q2KXL versions were a little off in comparison but it wasn't any specific behavior like that, they just felt a little dumb.
are you using token biases, maybe?
also, this is a schizo longshot, but are you using koboldcpp? they had (and probably still have?) semi-fucked up support for qwen models because of the weird tokenization logic they use. in the past this was because qwen ggufs end up with a bos token id of 11, which is a comma (!) for qwen models. this is supposed to be ignored because they have some ignore_bos flag set but kobold ignores it in some cases. just the fact that you're having this weird issue and the problematic token being a comma makes me connect the dots... maybe see if regular llama-server gives you better results
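if you want to check whether your gguf is affected, the gguf pip package can dump the tokenizer metadata. the field access below is from memory, so treat it as a sketch rather than gospel:

from gguf import GGUFReader

reader = GGUFReader("your-qwen-model.gguf")  # placeholder path
for key in ("tokenizer.ggml.bos_token_id", "tokenizer.ggml.add_bos_token"):
    field = reader.fields.get(key)
    if field is not None:
        # scalar fields keep their value in parts, indexed via data
        print(key, field.parts[field.data[0]])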
Anonymous
8/3/2025, 6:44:06 PM No.106127982
>>106125083
OAI has no moat. Google and xAI have both surpassed the company, and China is right on their ass. You sorry bootlickers continue to shill for them for whatever reason though
Anonymous
8/3/2025, 6:52:36 PM No.106128065
>>106120592
glm 4 air is legit, but the censorship will be bad if the 32b version is any indication
Anonymous
8/3/2025, 6:58:57 PM No.106128131
>>106120082
OAI is in a bind. It's not just the safety training. If they release a local model that isn't by far the best one available, it could cause serious investment problems. That they're even considering this is actually kind of shocking, because I have a hard time believing their model(s) won't get stomped by Chinese releases. Meta had this problem with the Llama 4 disaster, and they responded with this whole cringe AGI initiative to temper investor concerns. And it's no surprise that Meta voiced their reluctance to release any more open models.
Replies: >>106128186 >>106128411
Anonymous
8/3/2025, 7:01:22 PM No.106128151
>>106127673
I really should go play w it but I’m not feeling particularly motivated by these models anymore. It’ll eventually get baked into ChatGPT I suppose so I’ll use it then.
Replies: >>106128175
Anonymous
8/3/2025, 7:04:13 PM No.106128175
>>106128151
I don't think the goal or expectation is to beat anything. Even from the original poll, the specific wording was "o3-mini level model that is pretty small but still needs to run on GPUs". The difference between them and Meta is knowing how to set expectations. Remember Zuck saying that Llama 4 would "lead"?
Replies: >>106128186
Anonymous
8/3/2025, 7:05:30 PM No.106128186
>>106128175
Meant for >>106128131
Anonymous
8/3/2025, 7:31:02 PM No.106128411
>>106128131
>Meta had this problem with the Llama 4 disaster,
Qwen bailed them out with the Qwen 3 disaster. Qwen 3 was usable, albeit it conversed in chinkgrish and struggled with gender (which is unacceptable for 235 billion parameters, thoughbeit). That whole generation of local is basically lost.
Anonymous
8/3/2025, 8:08:12 PM No.106128759
This is the Kimi K2 8b quant from OR (there is only ai_coding_agent.py and ai_coding_agent_.py).

> do not make any changes yet

Understood - I won't make any changes.

Looking at the current directory, I can see these files:
- `ai_coding_agent.py`
- `ai_coding_agent_.py`
- `ai_coding_agent__.py`
- `ai_coding_agent___.py`
- `ai_coding_agent____.py`
- `ai_coding_agent_____.py`
- `ai_coding_agent______.py`
- `ai_coding_agent_______.py`
- `ai_coding_agent________.py`
(keeps going forever like this)


Proprietary models would never make such a silly mistake, and I don't think it's just because they use some sampler hack like repetition penalty.
Replies: >>106128804
Anonymous
8/3/2025, 8:12:11 PM No.106128804
>>106128759
>8b quant from OR
Is it verifiable that it's 8bit?
Send your reply to the correct thread. This one is not it.
Replies: >>106128828
Anonymous
8/3/2025, 8:14:41 PM No.106128828
>>106128804
Actually I lied. It's not running on OR, I have an H100 cluster in my garage.