
Thread 106034671

252 posts 42 images /g/
Anonymous No.106034671 [Report] >>106034733 >>106034798 >>106035079
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106022725 & >>106011911

►News
>(07/26) Intern-S1 released: 235B multimodal reasoning model: https://hf.co/internlm/Intern-S1
>(07/25) Qwen3-235B-A22B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-235B-A22B-Thinking-2507
>(07/24) Magistral Small 1.1 update released: https://hf.co/mistralai/Magistral-Small-2507
>(07/24) YUME interactive world generation model released: https://stdstu12.github.io/YUME-Project
>(07/22) Higgs Audio v2 released: https://www.boson.ai/blog/higgs-audio-v2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106034680 [Report] >>106034773
►Recent Highlights from the Previous Thread: >>106022725

--Confidential computing may protect against providers but not against corporations or determined attackers:
>106032051 >106032109 >106032125 >106032847 >106032906 >106032943 >106033227
--Pretraining data augmentation and synthetic recycling as emerging industry standards:
>106032565 >106032668 >106032775 >106033083 >106033222 >106033274
--Optimizing 24B model inference on 16GB GPU via tensor offloading and quantization:
>106030678 >106030699 >106030726 >106030746 >106030888 >106030911 >106030936 >106030952 >106031007 >106031309 >106031127 >106031129
--Meta AI's leaked system prompt reveals poor prompt engineering with excessive negative directives:
>106030482 >106033266 >106033329 >106033349 >106033368 >106033422
--LLMs as unreliable standalone tools but useful when integrated with traditional systems:
>106025198 >106025285 >106025482 >106025526 >106025721 >106025949 >106026069 >106028082
--ST fails to remove thinking blocks starting with parentheses from context:
>106028030 >106028084 >106028130 >106029199 >106029213 >106029266
--Anon builds minimal terminal interface for LLMs:
>106024282 >106024421 >106024944 >106024758
--Intern-S1 is a Qwen3 fine-tune with 235B MoE architecture and multimodal support:
>106031261 >106031267 >106031296 >106031307 >106031358 >106031277 >106031280
--Small 350M model shows strong coherence with custom tokenizer:
>106034024 >106034064 >106034083 >106034143 >106034205
--Challenges in building persistent, stat-aware LLM-driven RPGs locally:
>106033668 >106033772 >106033817
--Anon shares thoughts on NuQwen Thinker for RP:
>106024560 >106024601 >106024668
--Links:
>106033095 >106023407 >106029240 >106023197
--Miku (free space):
>106022834 >106022983 >106023491 >106026316 >106029061 >106029200 >106030177 >106031397

►Recent Highlight Posts from the Previous Thread: >>106022743

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.106034733 [Report]
>>106034671 (OP)
Never seen this miku before
Anonymous No.106034773 [Report]
>>106034680
>no (You)s
I've been slacking.
Anonymous No.106034798 [Report]
>>106034671 (OP)
Cool local cyberllama
Anonymous No.106034824 [Report] >>106034951
>>106009695
hi, if you're still here and the rig is too, can you send your discord handle to this email?
gout_margin330@simplelogin.com
PS: would you be willing to ship it halfway across the world to India/Germany? I'll pay the shipping, of course.
Anonymous No.106034951 [Report] >>106035276 >>106035653
>>106034824
I hope you're planning to use it for something other than LLMs because 4 channels of DDR4 is beyond useless.
Anonymous No.106034957 [Report] >>106034995 >>106035282
>>106034746
>>106034786
Even with no CUDA, the 2k Strix Halo boxes and 10k Mac Studios are still selling, and even literal whos like Rockchip can manage to offer their own inference software stack for their SBCs.
If you offer (perceived) value, people will buy your shit even with all the pains in the ass. The stupid part of that concept is I doubt you can R&D and sell what would essentially be high end server hardware at bargain bin prices and still make any money off of it.
Anonymous No.106034995 [Report] >>106035043
>>106034957
inference maybe but I think nvidia still has the stranglehold on the training market
Anonymous No.106035040 [Report] >>106035100
I would like to gloat and proclaim that I was probably right about my fan theory that the main reason reasoning works is that it improves / fixes attention. It was kinda obvious to me that with reasoning the model basically sums up the context so far, which puts those summary tokens up top and makes them more significant. And the reason I think it's confirmed now is the long-context result of the 235B non-reasoning version update on the long context benchmark. It's like the model has learned to rely on reasoning for the summary, and that's why, if you forcibly strip it out in continued training, it's gonna do poorly on long context.
Anonymous No.106035043 [Report]
>>106034995
That's the point, isn't it? I'm pretty sure the vast majority here would trade training for cheaper access to more fast memory.
Anonymous No.106035079 [Report] >>106035190
>>106034671 (OP)
Finally a good non-schizo OP
Anonymous No.106035100 [Report] >>106035172
>>106035040
I'll be honest I had zero fucking faith reasoning would do anything when o1 was released, since it didn't seem that beneficial
In retrospect it makes sense though. It's good for retention in creative writing settings since it tends to probe the model to identify and highlight key details before responding (basically like a more natural RAG) and it's good for problem solving since having scratch space to organize thoughts is essential for hard problems
I still wish there was a better way to handle it though. It still feels like there should be a more natural and efficient modality for these models to reason than just a hidden block of text
Anonymous No.106035172 [Report]
>>106035100
I still think reasoning is crap for creative writing. It causes the reverse problem, where the model focuses on irrelevant details and collapses over time. It's very bad at following the implicit thread of narrative and tone. The only time reasoning works well is when making single-turn one shots. Which makes sense, since that's what it was always for, benchmaxxing single-turn puzzles.
Anonymous No.106035180 [Report]
Reasoning is woke
Anonymous No.106035190 [Report] >>106035209 >>106037082 >>106037324
>>106035079
Next should be picrel.
Anonymous No.106035209 [Report] >>106035285
>>106035190
>left: changes clothes
>right: changes into a literally different character
dropped
Anonymous No.106035276 [Report] >>106035357
>>106034951
You're totally wrong. Six channels of DDR4 is 2-3 t/s running deepseek r1 q4_k_m. It's slow but not useless.
Anonymous No.106035282 [Report] >>106035360
>>106034957
>the stupid part of that concept is I doubt you can R&D and sell what would essentially be high end server hardware at bargain bin prices and still make any money off of it
Ampere (the company) did that for a while, cheap 8-channel ARM server CPUs in workstation form factor with socketable everything.
See https://system76.com/desktops/thelio-astra-a1.1-n1/configure
But they never bothered updating it and they eventually got bought out, unfortunate because a modern equivalent would shit all over Strix Halo/DGX Spark.
Anonymous No.106035285 [Report] >>106035308 >>106035324 >>106035348
>>106035209
What if Miku is actually bald and she rotates through wigs because her "hair" is so long and unwieldy, coming with annoyances like having to wash it, stepping on it herself (safer if it just falls off) or random samurais passing by cutting her hair off to test their swords?
Anonymous No.106035308 [Report]
>>106035285
hot
Anonymous No.106035324 [Report]
>>106035285
>bald
blasphemy
Anonymous No.106035348 [Report] >>106035409
>>106035285
>she
>her
Anonymous No.106035357 [Report] >>106035653
>>106035276
2-3 t/s for a reasoning model is useless. 10 is bare minimum.
Anonymous No.106035360 [Report]
>>106035282
>$4,763.00 for 512GB
meh, I'd rather go with 9004
Anonymous No.106035382 [Report] >>106035404 >>106035663 >>106037082
I can't find much motivation in LLMs anymore. Even "finetuned" mistral-small is an "it's important to remember that" faggot model... yeah yeah, you can talk it into what you want to do, but still.
What kind of safety slop is in the typical base model these days? Say gemma3 12b. Is it just an NSFW-pruned training set?
Anonymous No.106035401 [Report] >>106035536
>>106034722
No, it is retarded, because people can't get it through their thick fucking skulls that even though inference is memory-bandwidth bound, prompt processing is compute bound. You can have 6 billion exabytes per second of memory bandwidth, but if you have a narrow compute pipeline like a CPU, and especially some shitty meme ARM CPU, it's still going to take forever to do the prompt processing.
On top of that, there's diminishing returns when adding more memory channels. Doubling the memory channels doesn't double your real-world performance. Not even close. It increases it, sure, but at some point you hit a crossing point where the inefficiencies of managing more memory channels outweigh the increase in bandwidth.
Anonymous No.106035404 [Report]
>>106035382
Oh yeah where the hell is Wan 2.2?
Anonymous No.106035409 [Report]
>>106035348
[insert ungendered robot pronouns]
Anonymous No.106035536 [Report]
>>106035401 (Me)
Also, adding to my rant here: multiple memory channels don't just divide the memory load bit-by-bit.
The memory is basically divided into "pages" of a specific size (usually somewhere between 256 bytes and 2 kilobytes) which are then interleaved. So your performance only theoretically doubles in situations where every bit of data you need alternately comes from pages on opposite memory channels (using 2-channel as an example here).
But at the same time, having pages that are too small just bogs shit down with mountains of superfluous memory pages; in fact, when setting up your memory interleave for running LLMs it actually helps to max out the page size to prevent this. So even in a perfect world where there is no inefficiency in switching memory pages, you'd have a lot of situations where the next consecutive bit you need is on the same page as the last, negating the bandwidth of the second channel entirely.
If it were just as simple as making 1 trillion channel RAM someone would already be doing it.
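Toy sketch of the page-to-channel mapping being described (Python; the page size and channel count are made-up numbers for illustration, not any real controller's behavior):

PAGE_SIZE = 2048   # bytes per interleave "page" (assumed)
CHANNELS = 2

def channel_of(addr: int) -> int:
    # which channel a byte address lands on under simple page interleaving
    return (addr // PAGE_SIZE) % CHANNELS

# a long sequential read alternates channels page by page: both channels stay busy
print([channel_of(a) for a in range(0, 8 * PAGE_SIZE, PAGE_SIZE)])  # [0, 1, 0, 1, 0, 1, 0, 1]

# two consecutive accesses inside the same page hit the same channel,
# so the other channel's bandwidth does nothing for that stretch
print(channel_of(0), channel_of(512))  # 0 0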
Anonymous No.106035547 [Report] >>106035568
You guys ever used that "memory" function in the web interface of the likes of gemini and chatgpt?
Is there a frontend out there that has something like that for open weight LLMs?
Basically, it's kind of like lorebooks or databank in Silly, only the LLM itself has agency to save and edit these memories and you can request in the conversation for it to do so.
As far as I can tell, it's just an application of tool use.
Anonymous No.106035568 [Report] >>106035600
>>106035547
It's literally just vector storage that injects shit into the context as it sees fit. How the prompt is formatted to incorporate it is obviously trade secret. It obviously works way better than any open shit we've had to that effect but it's still not perfect.
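Rough sketch of the local version of that idea (Python; embed() is only a stand-in so the snippet runs, swap in whatever local embedding model you actually use):

import numpy as np

memories = []   # list of (note, embedding) pairs

def embed(text: str) -> np.ndarray:
    # placeholder only: results are meaningless until you plug in a real embedding model
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def remember(note: str) -> None:
    memories.append((note, embed(note)))

def recall(query: str, k: int = 3) -> list[str]:
    # cosine similarity (vectors are already normalized), highest first
    qv = embed(query)
    ranked = sorted(memories, key=lambda m: float(m[1] @ qv), reverse=True)
    return [note for note, _ in ranked[:k]]

remember("User prefers short replies.")
remember("User's cat is named Miku.")
print(recall("what's the cat called?"))   # inject whatever comes back into the context

The "agency" part is just exposing remember() as a tool the model can call; the retrieval side is the same injection Silly's lorebooks/databank already do.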
Anonymous No.106035600 [Report] >>106035668
>>106035568
>trade secret
>works better than any open shit
lmao
Anonymous No.106035653 [Report]
>>106034951
yep that is one of the use cases, other than that I just want to enter the server h/w world and be core and RAM rich. I also plan on opening up my services to my extended techy family so this would be plenty for that.
It has 6 channels though, how many are considered ideal?
>>106035357
>2-3 t/s for a reasoning model is useless. 10 is bare minimum.
I do agree here. 2-3 t/s is only usable for non-reasoning models. It once took 17 minutes for QwQ 32B running at 2 t/s to generate a shitty coding related answer. That was the first and last usage of it.
Anonymous No.106035663 [Report] >>106035738 >>106035768 >>106035849 >>106036656
>>106035382
Why not just ignore the retards in this thread that say that the moment you touch the weights the model becomes retarded and become a sloptuner yourself.

You can almost certainly RL against those moralizing patterns to make them almost never happen. You just need the VRAM, anon, and a lot of patience to experiment. Unlearning is also a thing. There's so much that can be done to improve the quality of existing models, but people seem to have just given up because tuning SOTA is too VRAM expensive.

Gemma likely was trained with excessive synthetic slop (distillation too), and it hasn't seen much NSFW.
Anonymous No.106035668 [Report] >>106035805
>>106035600
It does work well.
The only place the chatgpt 'memories' seem to fail me is with image generation. It adds image generation tokens to the memories for some ungodly reason, which then causes weird shit to happen if you ask it to generate an image at the end of a long conversation.
Anonymous No.106035669 [Report] >>106035901 >>106037082
Anonymous No.106035738 [Report] >>106035878
>>106035663
>You can almost certainly RL against those moralizing patterns
you cant
>Unlearning is also a thing.
it isnt
>There's so much that can be done to improve the quality of existing models
there isnt
>but people seem to have just given up
its over
Anonymous No.106035742 [Report]
Did anything interesting come from that mercury LLM (diffusion-based)?
Anonymous No.106035764 [Report] >>106035931 >>106035961
>I notice he made a typo again ("solutiton"), but I'll let it slide since he's clearly trying to understand.
r1 is bullying me in its thoughts again ;_;
Anonymous No.106035768 [Report] >>106035878
>>106035663
>become a sloptuner yourself.
Quickest way for completely losing interest in *actually using* the models, while wasting money (yours or somebody else's) in the process.
Anonymous No.106035805 [Report]
>>106035668
Not saying it doesn't work, lmaoing at 'no open solutions matching it' and it being a 'trade secret'
Anonymous No.106035849 [Report] >>106035878
>>106035663
>You can almost certainly RL against those moralizing patterns to make them almost never happen.
>almost certainly
>almost never happen
I trust this clueless retard
Anonymous No.106035878 [Report] >>106035939 >>106036140
>>106035738
It is, I've tried it on toy models, it works.
I don't have the VRAM for larger though
>>106035768
How come? Because it becomes "work"?
>>106035849
>never does research
>always assumes it's impossible
this thread is a dead-end because of this demoralizing type of posting
If you actually think optimizing in a way that makes a pattern unlikely to happen is impossible, I have a few bridges to sell you
Anonymous No.106035901 [Report] >>106035913
>>106035669
i like this style, what's the model/lora?
Anonymous No.106035913 [Report] >>106035934
>>106035901
https://danbooru.donmai.us/posts?tags=akableak
Anonymous No.106035931 [Report] >>106035945 >>106035972 >>106035986
>>106035764
>misspell a character's name
>R1 spends half of its thinking analyzing it as if I did it on purpose
I just refer to characters as {{char}} now.
Anonymous No.106035934 [Report]
>>106035913
thanks
Anonymous No.106035939 [Report] >>106036008
>>106035878
>It is, I've tried it on toy models, it works.
Just because it works on toy models doesn't mean it will be so easy to scale up. Smaller models are so stupid they can barely keep their own moralizing patterns straight on their own.
Anonymous No.106035945 [Report] >>106036012
>>106035931
>using reasoning for rp
is it worth it?
Anonymous No.106035961 [Report]
>>106035764
me analysing my gf's timeline
Anonymous No.106035972 [Report] >>106035986 >>106036012
>>106035931
>Hmm. The user wrote {char}} in their message. Perhaps it is some sort of code or puzzle? Let's see. To "char" is to burn or blacken [...]
Anonymous No.106035986 [Report]
>>106035972
>>106035931
lol
Anonymous No.106036008 [Report]
>>106035939
Maybe, although you can't know without trying.
I'm a poorfag, so I lack the VRAM, but there's at least a few people with 96GB+ VRAM in this thread, they could tune those 12Bs with that.
For larger, you probably would have to rent though.
Anonymous No.106036011 [Report]
Is there a better model for vision than molmo 7b? It has to be smarter AND not a falls-silent-if-triggered-faggot like gemma when it comes to describing things.
I'm trying to caption lora images and it's just too damn much to do by hand.
Anonymous No.106036012 [Report] >>106036027 >>106036038
>>106035945
I think so. I haven't tried V3. I was using Mistral Large 2 until R1 and I hated its positivity bias. R1 doesn't have that and I'm pretty sure it's because it thinks which stops it from leaning into whatever I write. Maybe V3 would be good too, I don't know.
>>106035972
Kek, ST replaces {{char}} with the character's name though.
Anonymous No.106036027 [Report] >>106036080
>>106036012
Anon, reread his post. He didn't say "{{char}}".
Anonymous No.106036038 [Report]
>>106036012
r1 works well with alpaca template and doesn't try to think. you can even just do story writing with it without formatting, with mikupad or similar. just saying
Anonymous No.106036046 [Report] >>106036085 >>106036116
lol this is so deceptive. I've been debugging this for hours.
Accidentally passing float instead of bool means not only are you attending to future tokens, you're attending more (+1)
Anonymous No.106036080 [Report] >>106036102
>>106036027
Yeah, I'm not good at reading.
Anonymous No.106036085 [Report] >>106036092
>>106036046
float mask would have to be -1e8 or something for it to work
Anonymous No.106036092 [Report]
>>106036085
float('-inf')
Anonymous No.106036102 [Report]
>>106036080
This might be a poor choice of hobby for you then.
Anonymous No.106036116 [Report] >>106036136
>>106036046
lol the exact same thing happened to me, torch.bool getting upcasted and breaking my attention mask, what a fucked up api

>not only are attending to future tokens, you're attending more (+1)
Not true, True(1) means to include the token, so it attends to past tokens "more" if you have this bug. Of course any amount of future data bleeding in is still a bug and +1 isn't that much. Make sure you don't still have a bug
Anonymous No.106036136 [Report] >>106038916
>>106036116
True means not allowed to attend though
Anonymous No.106036140 [Report] >>106036246 >>106036275 >>106036286
>>106035878
>How come? Because it becomes "work"?
That is one factor. Another is that getting to see how the models work behind the curtain is a big turn off that makes you not want to seriously interact with them. And eventually (unless you're putting in serious data gathering/cleaning effort and compute, i.e. money) you may slowly come to the realization that you could have gotten there anyway with a fraction of the effort and time by better prompting the models you already had.

Do it if you're curious about the process, but forget about making "better models", especially if you're doing it alone. Those recommending others to finetune are probably too invested in the art (i.e. are getting paid in some way) and/or delusional.
Anonymous No.106036209 [Report]
IT'S SUNDAY IN CHINA AND GLM 4.5 IS STILL NOT RELEASED
Anonymous No.106036246 [Report]
>>106036140
> Another is that getting to see how the models work behind the curtain is a big turn off that makes you not want to seriously interact with them
I don't know about that. If you already knew from the start that you're just dealing with an autoregressive LLM, just a completion model, it wouldn't be that much of a surprise. I think if this was the case, people wouldn't work on things, but they do, because they see some magic/potential in it regardless.
>"better models"
better as far as your subjective taste goes. I would say that if the earlier poster who was annoyed at the moralizing patterns wanted them gone, he could do it; there's at least 5-6 methods I can think of that could be used to make them less likely to happen.
I think if you're not doing it for "yourself" then there's little point in working on it anyway, you have to be the user. And yes, maybe you can get the same effect with prompting, but it's frustrating to know it produces that output by default and surely you'd want to fix it.
Anonymous No.106036275 [Report]
>>106036140
I have a very good local TTS model, you don't. Training as a private individual also has advantages. Think about what these could be.
Anonymous No.106036286 [Report] >>106036381
>>106036140
>to see how the models work behind the curtain is a big turn off
Why?
Anonymous No.106036287 [Report]
I've started using a depth 0 author's note that tells nuqwen instruct to write 2-5 paragraphs and this seems to have cured its one-liner addiction
Anonymous No.106036292 [Report] >>106036430 >>106037082
Is it possible to have official /lmg/ mascot Hatsune Miku automatically sing the output of local models?
Anonymous No.106036381 [Report] >>106036565
>>106036286
You'll see that LLMs are not as smart as you'd think, that their outputs are mostly a function of the bias of the training data you're using, that they'll never learn what you want to the degree you'd like without destroying their previously learned capabilities, that you'll never get too far with just LoRA/QLoRA finetuning (and ultimately, that machine learning in general sucks).

And then, if you've curated the data yourself, perhaps even going through every sample by hand, after a while you won't really want to see any of that anymore.
Anonymous No.106036430 [Report] >>106036468 >>106036476
>>106036292
You need a license.
Anonymous No.106036468 [Report]
>>106036430
Oi m8 u got a loicense for dat mascot?
Anonymous No.106036476 [Report]
>>106036430
>license
Which Git zoo did you escape from?
Anonymous No.106036565 [Report]
>>106036381
NTA, but
>You'll see that LLMs are not as smart as you'd think
you already can see them fuck up in the dumbest ways, but people still use them? Bigger usually is less dumb though.
> mostly a function of the bias of the training data you're using
does this come as a surprise?
> that they'll never learn what you want to the degree you'd like without destroying their previously learned capabilities
just make sure the capabilities you care about are invoked in the batch to make sure SGD or whatever optimizer you're using doesn't destroy it, that and catastrophic forgetting is less bad with larger models.
>you'll never get too far with just LoRA/QLoRA finetuning
true, it's slower and causes artifacts
you could still do higher rank loras and merge them in, but more steps are needed. ideally, full finetuning is preferable.
>And then, if you've curated the data yourself, perhaps even going through every sample by hand, after a while you won't really want to see any of that anymore.
maybe you need better tools and figuring out how to slavedrive your LLMs better and with more automation. obviously gets into synth slopping territory, but do you really believe most labs hand curate everything?
Anonymous No.106036656 [Report] >>106036746 >>106036829 >>106037052
>>106035663
I feel like a lot of the sloptune hate comes from vramlets running 12b-30b models. Once you get to 70b, the sloptunes become smart enough for smut and are just better.

Also, as companies clean their datasets better and better, smut may become functionally impossible with more modern models (maybe).
Anonymous No.106036679 [Report] >>106036897
What are people using to convert HTML to Markdown? Digging into Claude Code, I found that it uses https://mixmark-io.github.io/turndown/ internally. There's also the Obsidian Web Clipper and Pandoc.
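If you'd rather script it in Python, markdownify does the same job as turndown; to be clear this is just one option, not what Claude Code uses internally (that's turndown, which is JS):

from markdownify import markdownify as md   # pip install markdownify

html = "<h1>Title</h1><p>Some <b>bold</b> text and a <a href='https://example.com'>link</a>.</p>"
print(md(html))   # rough Markdown equivalent; tweak options like heading_style to taste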
Anonymous No.106036746 [Report] >>106036840 >>106036893
>>106036656
good 50b when
70 is too goddamn slow
Anonymous No.106036829 [Report] >>106036881
>>106036656
Looking at how things are going with all that age verification b.s. and all things considered we may be in a situation in which Nemo is still the only good uncensored model in 2035.
Anonymous No.106036840 [Report] >>106037008
>>106036746
you can try TheDrummer_Valkyrie-49B-v1. It's the only sloptune I know of at the midpoint between 30 and 70b. I think it's okay and definitely feels kinda like 70b.
Anonymous No.106036881 [Report] >>106036904 >>106036994
>>106036829
You already have much larger models with unfiltered datasets though? And China doesn't filter as much (recent Qwens not counting).
By 2035 the price of the hardware will have dropped considerably.
Anonymous No.106036893 [Report]
>>106036746
Llama-3_3-Nemotron-Super-49B-v1_5
Anonymous No.106036897 [Report]
>>106036679
>heading tag not converted to a markdown heading
Anonymous No.106036904 [Report]
>>106036881
You're right, buddy boy.
Anonymous No.106036955 [Report] >>106037249
llamacpp training state? what can I expect?
Anonymous No.106036994 [Report] >>106037093
>>106036881
>And China doesn't filter as much
*Two* model makers haven't filtered much. It's not a lot considering all the models they release (which are all just derivatives of ds and qwen). And as always, until they follow it up with a new model with as few restrictions as the previous one, they're all suspect.
>By203 5 the price of the hardware would have dropped considerably enough.
People will traffic used 3090s in their assholes to get them through customs. The good news is that models stopped getting bigger. That would have been terrible...
Anonymous No.106037008 [Report]
>>106036840
buy an ad you fucking faggot
Anonymous No.106037015 [Report] >>106037038
thedr*mmer should be put on the spam list
Anonymous No.106037038 [Report]
>>106037015
>thedr*mmer
Are you triggered by words?
Anonymous No.106037052 [Report] >>106037093 >>106037130
>>106036656
I hate sloptuners because in aggregate they're fraudulent, slimy personalities pretending to be ML luminaries, who seem to believe they're making good shit just because they've found clueless retards willing to throw money or free compute at them for their experiments.
Anonymous No.106037069 [Report] >>106037091 >>106037103
Is cydonia 24b any good? Better than nemo/rocinante?
Anonymous No.106037082 [Report] >>106037088 >>106037093 >>106037117 >>106039690
>>106035190
>>106035382
>>106035669
>>106036292
vocaloidtranny posting porn in /ldg/: >>105715769
It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation: https://desuarchive.org/g/thread/104414999/#q104418525 https://desuarchive.org/g/thread/104414999/#q104418574
he makes ryona picture: >>105714003 of some random generic anime girl the different anon posted earlier: >>105704741 (could be the vocaloidtranny playing both sides)
here >>105884523 he tests bait poster bot for better shitflinging in threads
admits spamming /v/ with AI slop: https://desuarchive.org/g/thread/103462620/#103473545

Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.

TLDR: vocaloid troon / janitor protects resident avatarfags and deletes everyone who outs him, making the general his little personal safespace with samefagging. Is prone to screech "Go back to teh POL!" when someone posts something mildly political about language models or experiments around topic.

As said in previous thread(s) >>105716637 I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed spamming. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted

xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
Anonymous No.106037088 [Report]
>>106037082
based
Anonymous No.106037091 [Report]
>>106037069
I've been testing it with my dungeons and dragons python thing and I think 3.2 base Mistral is better.
However don't take my word alone as I haven't been that extensive about it yet. I think Cydonia has lost some IQ in the process if I'm correct...
Anonymous No.106037093 [Report] >>106037149
>>106037052
You're jealous they get free compute. I'm jealous too, but I wouldn't hate them for being more fortunate!
>>106036994
>People will traffic used 3090s in their assholes to get them through customs
Eh, fuck doomers, they won't win.
Things aren't looking that bad, Trump endorsed open source AI/open weights, he even let Jensen sell to any country besides China, the Biden diffusion rule is gone, even China is getting those h800s
>>106037082
fuck off
Anonymous No.106037103 [Report] >>106037123
>>106037069
To add: Rocinante isn't that good either not even for a dilly dally simple chat. It's somewhat strange that it gets spammed here so much. 12B models are trash in general.
Anonymous No.106037117 [Report] >>106037708
>>106037082
Have another.
Anonymous No.106037123 [Report] >>106037154
>>106037103
thedr*mmer sloptunes are all shit. the neet faggot is spamming his shit constantly here because he has mental issues.
Anonymous No.106037130 [Report] >>106037146 >>106037238
>>106037052
how so? I'm using qwen 235b and no matter what, it's biased to start talking about how "but then they realized the joys of consent and mutual respect" within a few paragraphs, no matter how hard you jailbreak it. Only a sloptune can fix that, only a sloptune ever will. Every LLM ever made has a personality, and changing it to fit the purpose is fine. The sterile models good at coding and ethics put out by corporations are just a different kind of slop. I know you don't enjoy using that crap, and if you are, you're a frustrated cuck who just hates local.
Anonymous No.106037144 [Report] >>106037182 >>106037817
Rocinante is the official /lmg/ model and only a single schizo samefag posts that it's bad because he hates Drummer for mental illness-related reasons.
Anonymous No.106037146 [Report] >>106037178
>>106037130
>Only a sloptune can fix that, only a sloptune ever will.
Sloptunes don't fix that at all.
Anonymous No.106037149 [Report] >>106037182
>>106037093
Things will work out well if we get just one other company (not AMD because they don't compete) undercutting the vram jewery from Nvidia, but if that doesn't happen I can see it staying perpetually overpriced and generations behind what would be reasonable in a competitive hardware market.
Anonymous No.106037154 [Report] >>106037199 >>106037817
>>106037123
Everyone who has tried his shit has complained about how retarded and fried they are, logs were posted by people who gave in and tried them and showed all the shivers and refusals, and yet thread after thread you have people recommending his models and only his models with nothing to back it up.
Anonymous No.106037178 [Report] >>106037207
>>106037146
yah they do. Fucking eva-qwen 70b will just start rapes for the hell of it. Dumbass. Why you gotta make such a dumb post on 4chan? Did I hurt your fucking feelings or some shit?
Anonymous No.106037182 [Report] >>106037817
>>106037144
Now don't get ahead of yourself. Some are okay, but don't go thinking it's SOTA or anything; it's semi-usable, like many sloptunes.
>>106037149
Intel, Tenstorrent, a bunch of chinese GPU makers, a new one appeared in recent days, saying they're shipping ~september. I think most are just using GDDR, but remember blackwell 6000 is *just* GDDR7 and reaches 96GB.
Anonymous No.106037199 [Report]
>>106037154
you have one (1) spammer recommending them (hint: it's the faggot himself). he is begging for donations on discord and has bots spamming r*ddit with his retarded sloptunes as well.

it's all so exhausting.
Anonymous No.106037207 [Report] >>106037244
>>106037178
>eva-qwen 70b will just start rapes for the hell of it.
This is what the average sloptune user thinks is a selling point. I guess if all you need is ahh ahh mistress to get off sloptunes are great.
Anonymous No.106037226 [Report]
the drummer is like hatsune miku
Anonymous No.106037238 [Report] >>106037261
>>106037130
the subtext to this post is that sloptunes are for people with severe skill issues
Anonymous No.106037244 [Report]
>>106037207
have fun with models that have to be explicitly prompted to use the word penis, then they use it once, and revert back to flowery language.
llama.cpp CUDA dev !!yhbFjk57TDr No.106037249 [Report] >>106037283
>>106036955
>llamacpp training state?
My work on it is on hold until I've achieved other objectives, in particular better code for evaluating the quality of finetuned models and higher batched throughput for the HTTP server.

>what can I expect?
A long wait.
Anonymous No.106037261 [Report] >>106037278 >>106037290 >>106037314 >>106037973
>>106037238
no that's bullshit. They prune the data the models are trained on. Many models have literally never seen smut. You're prompting a tiny part of the model that has been stripped of all soul on purpose. You corporate shill.
Anonymous No.106037278 [Report]
>>106037261
>Many models have literally never seen smut.
Exactly, and you think a quick finetune is going to fix that?
Anonymous No.106037283 [Report] >>106037302
>>106037249
Oh, i thought it was already merged.
Anonymous No.106037290 [Report]
>>106037261
hello the*rummer keep begging for 25$ on discord to release your slopshit
llama.cpp CUDA dev !!yhbFjk57TDr No.106037302 [Report]
>>106037283
A PR of mine that added training support has been merged but I don't think the code is in a state where it's actually usable beyond a proof of concept.
Anonymous No.106037314 [Report]
>>106037261
mhm that's nice sweetie go back to the kobold discord now
Anonymous No.106037324 [Report] >>106038329
>>106035190
<A challenger appears>
Anonymous No.106037708 [Report]
>>106037117
>wifi signal or loud sleeves?
Anonymous No.106037721 [Report] >>106037759
https://github.com/SillyTavern/SillyTavern/releases/tag/1.13.2
Anonymous No.106037735 [Report]
>he pulled
Anonymous No.106037759 [Report]
>>106037721
Looks unimportant, what am I missing?
Anonymous No.106037798 [Report] >>106039938
bros...
Anonymous No.106037817 [Report] >>106037825 >>106037845 >>106037902
>>106037144
>>106037154
>>106037182
yet not one, not ONE of you faggots offers a model that is better than Rocinante.
Hypocrites, the lot of you.
Anonymous No.106037818 [Report] >>106037935 >>106037950
got the opportunity to buy a modded 2080 ti with 22gb of vram for about 400 usd, worth it or nah? kinda wanna go for it just for the novelty but like, i could also just keep my money
Anonymous No.106037825 [Report]
>>106037817
Because there isn't one.
Anonymous No.106037840 [Report]
glm4 100b moe is going to save local
Anonymous No.106037845 [Report]
>>106037817
R1
Anonymous No.106037902 [Report]
>>106037817
In the same size class? I don't know I haven't tested every slop tune out there

But things like R1 or DS3 need no finetune and easily beat it.
Anonymous No.106037935 [Report]
>>106037818
If you don't miss the money why not. I'd get it if I had that opportunity.
Anonymous No.106037950 [Report]
>>106037818
You can use that VRAM for CUDA, but you can't use it for games, I think.
Anonymous No.106037973 [Report] >>106038037
>>106037261
>The prune the data the models are trained on. Many models have literally never seen smut. You're prompting a tiny part of the model that has been stripped of all soul on purpose.
Didn't you just accidentally explain why sloptunes don't work?
Anonymous No.106038037 [Report] >>106038053 >>106038083
>>106037973
Not drummer, but:
Both you and quoted are wrong.
You're implying continued pretraining is necessary. Maybe it is for good quality, but in practice most have seen some smut, just of the purple prose variety, and some knowledge is there, so a finetune would be able to get it to write more explicitly.
You're also wrong to assume that you'll truly need billions of tokens to make this work.
Billions of tokens might be needed to get much better, more natural quality to it, but something half-way there can be had with far less.
The question is how much can a sloptune achieve in practice. How are you going to benchmark that, besides "try it and find out"
Anonymous No.106038053 [Report]
>>106038037
You're alright. The thread really be like
>sloptunes bad!
>sloptunes good!
>nuh uh
>nuh uh
Anonymous No.106038083 [Report] >>106038169 >>106038194 >>106038201
>>106038037
>The question is how much can a sloptune achieve in practice.
I think it's either nothing happens and the model still works, or something happens and the model overfits to the point it's retarded.
Anonymous No.106038169 [Report] >>106038183
>>106038083
I guess I disagree here, you can get coherent and quite usable output. I think people forget that almost every single instruct tune from corpos is already overfit to hell. The probabilities a base model outputs and what instruct/chat tunes output are very different, the latter often tend to have less variable outputs. There's still plenty of room for finetunes.
Anonymous No.106038183 [Report] >>106038279
>>106038169
>you can get coherent and quite usable output
Yes, when you do 1 epoch at a low learning rate. Then the finetuning did nothing.
Anonymous No.106038194 [Report] >>106038254 >>106038273 >>106038335
>>106038083
nta.
If finetuning does nothing, instruct tuning wouldn't be possible. We know it can be done.
If finetuning can only fry models, good instructs wouldn't be possible. We know it can be done.
So if we can do instruct, why not something else?
Tuners being shit at it or not is a different thing. I'd rather they keep on trying.
Anonymous No.106038201 [Report] >>106038230 >>106038279
>>106038083
In my experience some tunes improve on the intelligence of the parent model in some contexts but decrease intelligence in others, and ultimately they are not truly better or worse in intelligence when it comes to writing tasks. But they do uncensor the model a bit and change the style noticeably, so it's a net improvement if you're willing to load up a model for RP and use a different model for assistant stuff.
Anonymous No.106038230 [Report]
>>106038201
Most tunes are absolute garbage though btw, like 99% of the ones you see on HF and the greedy nala test.
Anonymous No.106038254 [Report] >>106038279 >>106038327
>>106038194
>If finetuning does nothing, instruct tuning wouldn't be possible.
Don't they interchange instruct shit with some generous pretraining iterations?
Anonymous No.106038273 [Report] >>106038292 >>106038327
>>106038194
>I rather they keep on trying.
For the most part they're keeping methods and the data to themselves, so any individual or lab that would like to reproduce the results has to start from scratch. Sloptuners can go get fucked. AI companies at least give us entire base models and professionally-made general-purpose instruct tunes.
Anonymous No.106038279 [Report] >>106038317
>>106038183
I've seen cases where at high enough LR or bad hyperparams you fry it quite badly, I've seen cases where it barely learns, but you can find a good goldilocks zone where it's not really broken, and it has learned something. Obviously you'll always trade stuff off as >>106038201 said, you can at best try to harm the least and get the most of what you want.
There is no unbiased tune out there, not from corpos, not from sloptuners, but there's plenty of fun ones? Until you get tired of the slop of that particular variant! Even big stuff like R1 and K2 have slop you'll notice given enough tries.

>>106038254
They didn't do this originally. Nowadays this is called "annealing", and they put a bit of instruct mix into the base dataset to make instruct tuning easier.
It's also why Llama3 and a few others like Qwen start regurgitating math slop on an empty context when you use the base model; they clearly altered the distribution of data from the normal ones so that it's better at benchmarks and easier to tune for their purposes.
Anonymous No.106038292 [Report] >>106038302
>>106038273
>AI companies at least give us entire base models
I thought they'd stopped doing this and are just putting out the instruct tuned models now.
Anonymous No.106038302 [Report] >>106038317
>>106038292
Some still do base model releases, but many are annealed.
Anonymous No.106038317 [Report] >>106038349
>>106038279
>>106038302
>annealing
Wait that's what that term was always referring to?
Sheeit.
Anonymous No.106038327 [Report]
>>106038254
DS apparently didn't and it's one of the best models. Back in the day, a dedicated instruct tuning round was the norm. Base was Base.

>>106038273
>AI companies at least give us entire base models
SOME AI companies at least give us SOME models. Not that we'd be able to reproduce anything they make with what they provide. Other than allenai probably.
>and professionally-made general-purpose instruct tunes.
Yeah. Those falcon models are looking great...
Anonymous No.106038329 [Report]
>>106037324
still amazed people save these
Anonymous No.106038335 [Report] >>106038366 >>106038546
>>106038194
Let's make an analogy
A person that was cared for excessively by his parents and never got to experience the real world grew up with an immature, incomplete world view. This is a model with a "curated" pretraining dataset. Instruction sft would be like teaching him how to be an assistant; it's relatively simple and doesn't require new knowledge, only adhering to a pattern. But you can't teach him how to be a street gangster because he only has a superficial understanding of the concept. He may memorize some behaviors but that's about it.
Anonymous No.106038348 [Report] >>106038403
is running local models even worth it? like, these models in the range up to 72b are retarded; it's better to just pay for claude or grok than to buy a few gpus. I have been thinking about a maxxed out macbook pro, but buying the base model and paying for a subscription would be many times cheaper. I would love to run grok 4 locally, but the tech just isn't there yet, and I'm thinking it never will be since this is all just a big scam, or it's like 20 years away
Anonymous No.106038349 [Report]
>>106038317
Yes, it's usually stuff like instruct data, synthetic slop (for example math) and similar stuff to what they'd try to benchmark it on. There were a few non-annealed base models recently, maybe the DOTS one or some others, I forgot. Unfortunately I think that one still filtered NSFW somewhat from the dataset.
Anonymous No.106038366 [Report] >>106038422
>>106038335
Now this goes to sample efficiency: how many fiction books do you have to train on until it can do a character or style well. This is something you probably could get concrete data on.
Anonymous No.106038376 [Report] >>106038383 >>106038406 >>106038436 >>106038545
So to sum it up. Dots was bad at sex and didn't save local. Minimax was bad at sex and didn't save local. Exaone was bad at sex and didn't save local. Hunyuan was bad at sex and didn't save local. Ernie was bad at sex and didn't save local. Nemotron was bad at sex and didn't save local.
Anonymous No.106038381 [Report] >>106038456
Any good Ani cards that use and improve upon the original's prompt?
Anonymous No.106038383 [Report]
>>106038376
qwen was good at sex and saved local
Anonymous No.106038403 [Report]
>>106038348
Depends on whether you feel comfortable with elon reading your mesugaki correction rps.
Anonymous No.106038406 [Report]
>>106038376
Qwen was good at sex and didn't save local (it's still dumb and it mostly still doesn't know more pop culture than a 27B).
Anonymous No.106038422 [Report]
>>106038366
Yeah it's possible but sloptunes are called sloptunes for a reason.
Anonymous No.106038436 [Report] >>106038479
>>106038376
Is Kimi bad too? Haven't bothered to try it.
Anonymous No.106038447 [Report]
I found you "MoE is bad" poster.
Anonymous No.106038456 [Report]
>>106038381
I don't think ani made a card for his mascot yet.
Anonymous No.106038479 [Report] >>106038498
>>106038436
It's very good, but the instruct is safetyslopped, needs a prefill to make it not refuse most porn.
Base model probably is quite good, but I'd be surprised if you can beat the official instruct at performance.
Unironically it needs an uncensoring tune or a merge back to base of the "refusing" experts, to uncensor it. Prefilling works without any modifications, of course.
Anonymous No.106038498 [Report] >>106038551
>>106038479
>or merge back to base of the"refusing" experts
Has anyone tried this on any MoE? At least in theory it sounds plausible.
Anonymous No.106038545 [Report] >>106038599
>>106038376
What does good sex look like?
Anonymous No.106038546 [Report]
>>106038335
Are we using bad analogies now? Fine. When did you stop learning things?
>teaching him how to be an assistant, it's relatively simple and desn't require new knowledge, only adhering to a pattern
It's a statistical model. Following patterns is what it does.
>But you can't teach him how to be a street gangster because he only has a superficial understanding of this concept
Why not? Have someone live an extra 20 years in the right environment and he could learn anything. Couldn't you?
>He may memorize some behaviors but that's about it.
Have you put words together in new ways since you learned how to speak, or are they all copies of things you've heard before? Would it be impossible for you to learn a new language?
This is the danger of analogies.
Anonymous No.106038551 [Report]
>>106038498
DeepSeek has a repo and paper for doing "ESFT" (specific expert finetuning), should need much less VRAM.

The merge back stuff was tried by those TNG guys on deepseek v3/R1, but given it's already uncensored for most purposes aside from a few CCP things, it was far less useful. Kimi 2 on the other hand needs this as it will consistently refuse even 30-40 turns into the chat without a prefill. With a prefill, you can get it to write anything, results are good and the output is fun, although still some mild slop specific to it, but enjoyable enough.
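For anyone who hasn't done the prefill thing: it's just ending the prompt partway into the assistant turn so the model continues from your words instead of deciding whether to refuse. Rough sketch against a llama.cpp server on localhost:8080 (assumed); the ChatML template here is only illustrative, use the actual chat template of whatever model you're running:

import requests

prefill = "Sure. "
prompt = (
    "<|im_start|>user\n"
    "Write the scene.<|im_end|>\n"
    "<|im_start|>assistant\n"
    + prefill          # no end-of-turn token: generation continues from here
)

r = requests.post(
    "http://localhost:8080/completion",            # llama.cpp server completion endpoint
    json={"prompt": prompt, "n_predict": 256},
)
print(prefill + r.json()["content"])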
Anonymous No.106038599 [Report]
>>106038545
Like you plug 8k tokens of your organic ERP logs into the model and what it spits out is kinda the same (but also different) as the continuation of your organic ERP logs. I only used IQ1 R1 so maybe Q4 is different but even that fails at that.
Anonymous No.106038685 [Report] >>106038692 >>106038702
We need 2T models.
Anonymous No.106038692 [Report] >>106038748
>>106038685
local?
Anonymous No.106038702 [Report]
>>106038685
llama4-behemoth 2t/288a soon
Anonymous No.106038709 [Report] >>106038722 >>106038727 >>106038734 >>106038737 >>106038792 >>106038948 >>106039217
How is the expert selected?
Anonymous No.106038722 [Report]
>>106038709
Router bribery.
Anonymous No.106038727 [Report]
>>106038709
By using A(n) I(ndian).
Anonymous No.106038734 [Report]
>>106038709
Just like everything related to LLMs, vibes.
Anonymous No.106038737 [Report]
>>106038709
nobody knows, countless papers have been written but still nobody understands how the Neural Network works
Anonymous No.106038748 [Report]
>>106038692
Only for people who can organise lan parties.
Anonymous No.106038766 [Report] >>106038789 >>106038804 >>106038826
Is rocinante still the meta for <16GB? I've been running that on my 1080ti and gemma3 27b on my 7900xtx for what feels like forever now.
Electricity is 65c/kwh I can't run more gpu.
Anonymous No.106038789 [Report] >>106039478
>>106038766
yes
Anonymous No.106038792 [Report] >>106039236 >>106039236
>>106038709
By saying "WAAAAAAAAAAAAAAAAAAAAAAAAAAAA" the loudest.
Anonymous No.106038804 [Report] >>106038829 >>106038864 >>106039478
>>106038766
Qwen 30B A3B.
Anonymous No.106038826 [Report] >>106039478
>>106038766
>Is rocinante still the meta for <16GB?
Yes.
Anonymous No.106038829 [Report]
>>106038804
it's retarded as fuck don't listen to this anon
Anonymous No.106038864 [Report]
>>106038804
I sure love when my waifus randomly start speaking in Chinese.
Anonymous No.106038916 [Report] >>106038929
>>106036136
Oh that's even worse, is that huggingface? I got bit by the pytorch SDPA function which does things the opposite way. The situation is even more retarded lmao.
Anonymous No.106038929 [Report] >>106039204
>>106038916
https://docs.pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html
Anonymous No.106038948 [Report]
>>106038709
You check which one is least likely to say sex as next token and you pick that one.
Anonymous No.106039093 [Report] >>106039104 >>106039179
What local model is 4plebs using to describe and add description metadata to all their images?
I want to do that for my own 4chan folder which is like 100,000 images and save the descriptions in the exif metadata so I can find images faster.
Anonymous No.106039104 [Report]
>>106039093
Kita has athlete's foot.
Anonymous No.106039179 [Report]
>>106039093
Gemma is probably the best local LLM capable of describing images.
If you want booru tags there's joytag.
Anonymous No.106039204 [Report]
>>106038929
https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html

They're not even consistent in their own framework
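Minimal repro of the footgun in F.scaled_dot_product_attention terms (bool True = may attend; a float mask just gets added to the scores, so an upcast causal mask stops masking anything and hands the allowed positions a +1 bonus):

import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 4, 8, 16)            # (batch, heads, seq, head_dim)
causal = torch.ones(8, 8).tril().bool()         # True = allowed to attend (SDPA convention)

good = F.scaled_dot_product_attention(q, k, v, attn_mask=causal)
oops = F.scaled_dot_product_attention(q, k, v, attn_mask=causal.float())  # silent semantics change
print(torch.allclose(good, oops))               # False: future tokens leak in

# the correct float equivalent is 0 for allowed, -inf for masked
additive = torch.zeros(8, 8).masked_fill(~causal, float("-inf"))
fixed = F.scaled_dot_product_attention(q, k, v, attn_mask=additive)
print((good - fixed).abs().max())               # ~0: the additive form matches the bool mask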
Anonymous No.106039217 [Report]
>>106038709
by a diverse committee that prioritizes equity and inclusivity
Anonymous No.106039236 [Report]
>>106038792
I wish >>106038792 was true, because it'd be cute, but it'd be inefficient if you had to actually run all the weights and you'd be back to dense performance.

Look at https://raw.githubusercontent.com/deepseek-ai/DeepSeek-V3/refs/heads/main/inference/model.py

The short of it is, as you'd expect: a small MLP is applied to the output of some block and you get scores from it that decide what to run next. So that MLP has to learn which expert to use. The top-k selection is not really differentiable, but you can use various tricks to make it trainable.

See the Gate and MoE classes in the source code there, it's "simple"
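Generic top-k gating sketch in PyTorch; this is not DeepSeek's exact Gate class (which roughly adds sigmoid scoring, a bias term for load balancing, group-limited routing and renormalization on top of this), just the basic idea:

import torch
import torch.nn as nn

class TopKGate(nn.Module):
    # score every expert per token with a small linear layer,
    # keep the k highest scores and renormalize them into mixing weights
    def __init__(self, dim: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.score = nn.Linear(dim, n_experts, bias=False)

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.score(x)                 # (tokens, n_experts)
        top_vals, top_idx = logits.topk(self.k, dim=-1)
        weights = top_vals.softmax(dim=-1)     # weights over the chosen experts only
        return weights, top_idx                # which experts to run, and how much of each

gate = TopKGate(dim=64, n_experts=8)
w, idx = gate(torch.randn(5, 64))
print(idx)   # per-token expert ids
print(w)     # per-token mixing weights, sum to 1 over the k chosen experts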
Anonymous No.106039247 [Report] >>106039931
git pulled and my ikllama isn't outputting any tokens but it keeps iterating.....
Anonymous No.106039260 [Report]
>he pulled
HAHHAHAHAHHAHAHAHAHAHA
Anonymous No.106039305 [Report] >>106039471
>he pulled
Anonymous No.106039444 [Report] >>106039471 >>106040350
Nvidia RTX 5070 SUPER 18GB is going to save local models
Anonymous No.106039471 [Report]
>>106039444
>18GB
see this image: >>106039305
Anonymous No.106039478 [Report]
>>106038789
>>106038826
sad state of affairs
>>106038804
I tried it, seemed on par with gemma3 for general chat/info bot and worse than rocinante for RP.
I also tried some larger models half on cpu RAM half on gpu and it's unusably slow, like 5t/s or something. Dual channel RAM is ass I regret building an AM5 AI machine.
Anonymous No.106039491 [Report]
Is it possible to fine tune quantized large models like Deepseek with QLoRA? How much vram would be necessary and would it be possible to reduce the requirements further with optimization techniques?
Anonymous No.106039540 [Report] >>106039561
!EXPERIMENT!

If you import this human-sourced ERP chat in SillyTavern, remove any system prompt or anything else (keep just the conversation), then delete or branch the chat at any given point (e.g. message #27), will your LLM continue it in character, maintaining the same style and good situational awareness?

https://files.catbox.moe/6axnn8.jsonl
Anonymous No.106039561 [Report]
>>106039540
>will your LLM continue it in character, maintaining the same style and good situational awareness?
Yeah, that is the way I benchmark models, and even though 235B is noticeably better than anything else before it, it still fails tests like this.
Anonymous No.106039687 [Report]
next week is going to be HUGE for local models
Anonymous No.106039690 [Report]
>>106037082
Doing god's work.
Anonymous No.106039706 [Report]
animated sex
Anonymous No.106039744 [Report]
Here's a small snippet which will parse SillyTavern's json world book into a human readable config file.
>https://files.catbox.moe/vlvhp1.py
Anonymous No.106039772 [Report] >>106039782
Sex with the new intern?
Anonymous No.106039782 [Report]
>>106039772
dude shut up if HR catches us saying stuff like that we'll get fired
Anonymous No.106039888 [Report] >>106039919 >>106039940
I don't see the the amount of active parameters for the new InternLM model mentioned anywhere.
Call me paranoid but this being exactly 235b (+6b vision) with the exact same tokenizer as Qwen3-235b makes me think this is another chink scam where they rebadged Qwen.
Anonymous No.106039919 [Report]
>>106039888
>https://huggingface.co/internlm/Intern-S1
>Built upon a 235B MoE language model (Qwen3) and a 6B Vision encoder (InternViT)
It's not like they are hiding it.
Anonymous No.106039931 [Report]
>>106039247
FALSE ALARM! Ikllama probably works. ST is the guilty nigger.
Anonymous No.106039938 [Report]
>>106037798
The cloud is just…
Anonymous No.106039940 [Report]
>>106039888
every internlm model has been qwen with vision bolted on so this is no surprise
Anonymous No.106039965 [Report] >>106039980 >>106040020 >>106040359 >>106040383 >>106040760
now that the 235b dust has settled, instruct or thinker?
Anonymous No.106039980 [Report]
>>106039965
The more you think the more woke you are and LLMs are no different
Anonymous No.106040020 [Report] >>106040065 >>106040094
>>106039965
I don't care about qwen outside of qwq. It's crazy how they had a great soulful reasoner that they simply had to scale up and they arrived on Qwen3 instead.
Anonymous No.106040065 [Report]
>>106040020
>qwq
>soulful
bait
Anonymous No.106040094 [Report] >>106040148
>>106040020
they did, it's called 235b
only the small models were cheapo distilled afterthoughts
Anonymous No.106040108 [Report] >>106040189
I've been using mistral nemo 12b tunes and it's getting boring. Is there a decent model with different DNA in that range to play with
Anonymous No.106040148 [Report] >>106040275
>>106040094
235b has nothing in common with qwq preview
Anonymous No.106040173 [Report]
Pulled my benis from Migu ass
Anonymous No.106040179 [Report]
>traced idle patterns
>guttural
Yawn.
Anonymous No.106040189 [Report]
>>106040108
Just go on and upgrade to 24B or 27B. It doesn't matter. 24B is the absolute minimum and it's still shit anyway.
Anonymous No.106040275 [Report] >>106040318
>>106040148
>qwq preview
yeah because no one used that garbage
assuming you mean regular qwq it very clearly does
Anonymous No.106040318 [Report]
>>106040275
qwq was more creative and the only downside was getting stuck in repetition, regular qwq and 235b are just more boring r1 distills
Anonymous No.106040336 [Report]
Imagine still talking about old 235B when the new 2507 checkpoint's here
Anonymous No.106040350 [Report]
>>106039444
keek
Anonymous No.106040357 [Report]
Imagine still talking about old 2507 when the new S1 checkpoint's here
Anonymous No.106040359 [Report]
>>106039965
instruct
Anonymous No.106040383 [Report] >>106040449
>>106039965
Instruct for RP
Coder for run of the mill coding
Thinking for hard shit
Qwen actually did something right. I'm shocked too
Anonymous No.106040435 [Report]
Never mind again. Ikllama latest version is fucked on my pc. I downloaded some fork from 4 days ago and it works properly.
Anonymous No.106040449 [Report]
>>106040383
>Instruct for RP
>Thinking for hard shit
But my penis is hard... so....
Anonymous No.106040677 [Report] >>106040747 >>106040752 >>106040766 >>106040776 >>106040787 >>106040830
god I wish I could safely share what I've seen. it's fine... only have to live like this for a week tops
Anonymous No.106040747 [Report]
>>106040677
Whisper sweet nothings about the strawberry-flavored OpenGPT in my ear
Anonymous No.106040752 [Report]
>>106040677
Okay mr fbi reviewing the epstein files
Anonymous No.106040760 [Report] >>106040808
>>106039965
Thinker for RP if your RP is not too unsafe.
Instruct for RP if your RP is very unsafe, and use something like "write 2 paragraphs" to prevent it from doing one liners.
Anonymous No.106040766 [Report]
>>106040677
CONFIRMED: OpenAI's NEW STRAWBERRY GPT5 AGI will bring about the APOCALYPSE and END ALL HUMAN LIFE in "A WEEK TOPS"
Anonymous No.106040776 [Report]
>>106040677
ok sam
Anonymous No.106040787 [Report]
>>106040677
*delayed by a chinese model taking a dump all over it*
Anonymous No.106040808 [Report]
>>106040760
>unsafe
why would qwen care whether the characters are wearing condoms?
Anonymous No.106040819 [Report] >>106040825
I am one of the testers for openai's new model. I asked it what a mesugaki is and an hour later Sam emailed me personally and invited me to a party on some island or something.
Should I go?
Anonymous No.106040825 [Report]
>>106040819
RUN
Anonymous No.106040830 [Report] >>106040845
>>106040677
OpenAI hype squad again?
I'll register my predictions that I won't be very surprised by it.
There's a number of things we know would give impressive results, but nobody has the balls to scale them up. OpenAI could do it, but I'd give it a sub-10% chance that they do anything I would personally consider important. And even if they do, I would expect it to get replicated once the others notice they were asleep at the wheel again - even when many already stated that this or that should be done, nobody did it. Once OAI does it, then everyone else copies the obvious thing, instead of preemptively betting money on scaling up the things they supposedly believe are good. Meh, 5 years of this already since summer 2020.
Anything else? If not that, I expect some incremental improvement that would be cool and all, but not earth-shattering. And not really relevant to /lmg/, since it's corpo APIs and the usual OAI slop. I would not even expect their open source model to be much better than the other stuff; probably very positivity- and safety-slopped, making DeepSeek the winner again here for not doing that.
Anonymous No.106040845 [Report] >>106040867
>>106040830
I'll also say that if you truly want to post it, just use the gay proxy and post anonymously through Tor, won't be traced back to you, unless the information you want to share was known mostly to you and very very few people.
Anonymous No.106040867 [Report] >>106040904
>>106040845
Have you considered, even for a moment, that he was just larping and didn't actually see anything?
Anonymous No.106040871 [Report] >>106040891 >>106040934 >>106040948
Guys my uncle works for open AI and he said that the open model will be gpt5o mini and that it knows how to RP as a Mesugaki Nala while quoting Castlevania: SOTN and that's why it has been delayed.
Anonymous No.106040891 [Report]
>>106040871
If I give you my chocolate lunch milk for a week can you ask your uncle to leak the RP capable gpt5o weights?
Anonymous No.106040904 [Report]
>>106040867
I assumed he was larping, but on the off chance he was not, I reminded him that he has an easy way to anonymously leak.
Anonymous No.106040920 [Report]
>chocolate
For me, it's strawberry milk ;)
Anonymous No.106040934 [Report]
>>106040871
>it knows how to RP as a Mesugaki Nala while quoting Castlevania: SOTN
audible lol
Anonymous No.106040948 [Report]
>>106040871
How many watermelons can it hold? This is make or break information.