Anonymous No.107184305 [Report]
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107174614 & >>107164243

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.107184306 [Report]
►Recent Highlights from the Previous Thread: >>107174614

--Paper: LeJEPA paper and Yann LeCun's potential new venture discussed:
>107181985 >107182047 >107182081 >107182097 >107182105 >107182118 >107182786 >107182462
--Skepticism over Google's 'secure cloud AI' claims:
>107182872 >107182888 >107182907 >107183248 >107183385 >107183482 >107183498
--Comparing Kimi, GLM, and DeepSeek for creative writing:
>107179399 >107179425 >107179434 >107179510 >107179674 >107180095 >107180171 >107180180 >107180221 >107180134
--Quantization optimization experiments with Q8_0_64 and intermediate formats:
>107180476 >107180530 >107180688
--GLM 4.5 Air deployment challenges and optimization on consumer-grade hardware:
>107174665 >107174677 >107174681 >107175083 >107175095 >107175120 >107175142 >107175231 >107175270 >107175290 >107175624 >107177243 >107176390 >107176473 >107176533 >107176578 >107176611 >107177015 >107177252 >107177277 >107177524 >107177546 >107177566 >107178047 >107181418
--Frontend tool comparison for story writing:
>107178671 >107178760 >107179089 >107179188
--Optimizing 120b model performance on a single 3090 GPU:
>107182483 >107182594 >107182615 >107182618 >107182656 >107182671 >107182676 >107182694 >107182707 >107182742 >107182749
--GPT-5's limitations in generating performant CUDA kernels for llama.cpp integration:
>107179734
--Debating AI's capability for detailed agentic coding and optimal abstraction levels:
>107181333 >107181358 >107181467 >107182044 >107182064 >107181430 >107181472 >107181428
--Implementing persistent memory systems for local LLMs using markdown-based RAG approaches:
>107175255 >107175762 >107177084 >107177172 >107177189 >107177209 >107177241 >107177634 >107177771 >107178429 >107178789
--Kimi K2 Thinking webapp:
>107176092 >107176237 >107176249
--Miku (free space):
>107178964 >107180253 >107180428 >107178764

►Recent Highlight Posts from the Previous Thread: >>107174619

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.107184325 [Report] >>107184602
>>107184173
you are a living tumor upon the earth
Anonymous No.107184363 [Report] >>107184547
>>107184240
2x RTX 6000 in a 12 channel epyc platform with the fastest DDR5 you can get.

>>107184258
>>107184299
Alright.
One IDE extension user, one CLI user.
I've been using Cline too and it's been working alright so far.
Haven't tried any of the pure CLI tools.
What are the advantages of those? Anything that would make them work better with local models?
I imagine not, but figured I might as well ask.
Anonymous No.107184547 [Report] >>107184770
>>107184240
>I'm seriously thinking of putting together a setup with 2 RTX 6000 Pros.
>>107184363
>2x RTX 6000 in a 12 channel epyc platform with the fastest DDR5 you can get.

I don't think building a ddr5 epyc system is a good idea right now, due to the extreme price increase of ddr5 ram.

Zen 6 Epyc is supposedly going to be announced at CES in January. Zen 6 Epyc is going to be much, much better than Zen 5. It's also going to use MRDIMMs, which will supposedly exist at 12800 MT/s. Compare that to *maybe* getting 8000 MT/s DDR5 next year. There will be 16-channel CPUs too, but even 8-channel will be 2x the bandwidth of the best DDR5 RAM.

One rtx 6000 pro and wait for zen 6 is The Way.
Anonymous No.107184585 [Report] >>107184616 >>107188347
Thanks to the anon for suggesting checking out the k quants and trellis quants. I learned about importance-weighted optimization and I think I just got a free lunch out of Q8_0

You can quantize to Q8_0 slightly better by using the importance-weighted optimizations that smaller quant formats use, and this gives you about a 5% reduction in mean square error. The resulting GGUF is fully backwards-compatible with Q8_0 (it's literally Q8_0, just quantized a bit more efficiently at the cost of a much more expensive algorithm than just dividing each weight by the block's absmax/127)

There is no reason I can see not to quantize like this if you're releasing a final Q8_0, or not to use a Q8_0 that was quantized like this
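For reference, a minimal numpy sketch of the idea described above (not the actual llama.cpp code): instead of taking the scale straight from the block's absmax, search nearby scales and keep the one that minimizes importance-weighted squared error. The imp weights are a stand-in for whatever per-weight importance you have (an imatrix, squared activations, etc.).
[code]
import numpy as np

def q8_0_naive(block):
    # classic Q8_0: one fp16 scale per block of 32, scale = absmax / 127
    d = max(np.abs(block).max() / 127.0, 1e-12)
    q = np.clip(np.round(block / d), -127, 127)
    return d, q

def q8_0_weighted(block, imp, n_steps=16):
    # same on-disk format, but the scale is chosen to minimize
    # importance-weighted squared error instead of being fixed at absmax/127
    best_d, best_q = q8_0_naive(block)
    best_err = np.sum(imp * (best_q * best_d - block) ** 2)
    base = np.abs(block).max() / 127.0
    for step in range(-n_steps, n_steps + 1):
        d = max(base * (1.0 + 0.01 * step), 1e-12)
        q = np.clip(np.round(block / d), -127, 127)
        err = np.sum(imp * (q * d - block) ** 2)
        if err < best_err:
            best_err, best_d, best_q = err, d, q
    return best_d, best_q  # still just an fp16 scale + 32 int8s

block = np.random.randn(32).astype(np.float32)
imp = np.ones(32)  # placeholder importance weights
print(q8_0_naive(block)[0], q8_0_weighted(block, imp)[0])
[/code]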
Anonymous No.107184602 [Report] >>107184623 >>107184656 >>107184670
>>107184325
You that ESL spammer. Thanks to you there's never any real discussion here.
Anonymous No.107184616 [Report] >>107184772
>>107184585
does bartowski know?
Anonymous No.107184623 [Report] >>107184681
>>107184602
>real discussion is vibe coding advice
literally kys retard
Anonymous No.107184656 [Report]
>>107184602
>ESL
he thinks americunts are the main posters on this board lmao
Anonymous No.107184670 [Report]
>>107184602
>You that ESL
Anonymous No.107184681 [Report] >>107184702 >>107184742
>>107184623
Better discussion than forcing llms to output vulgar text.
Anonymous No.107184702 [Report] >>107184713
>>107184681
according to whom? we only care about cockbench here
Anonymous No.107184713 [Report]
>>107184702
>we
Anonymous No.107184742 [Report] >>107184766
>>107184681
there is no discussion to be had with mongoloids like you
bugger off making more inane PRs that waste maintainer time like the onslaught of garbage that constantly tries to get pushed in llama.cpp
even SOTA models can't really produce good code or that nigger trying to vibecode deepseek v3.2 wouldn't have entered the loopy circle of unending refactor that never properly works
you are an unwanted abortion, a plague on all repos that have to suffer your existence
Anonymous No.107184766 [Report] >>107184830
>>107184742
>even SOTA models can't really produce good code
Garbage in, garbage out. And it seems like you are incapable of anything but garbage.
Anonymous No.107184770 [Report]
>>107184399
That should be relatively easy since it's only got 10B active params

>>107184547
Thanks for that heads up
Anonymous No.107184772 [Report] >>107188913
>>107184616
>does bartowski know?
he probably has better things to care about i'd think. There is literally no reason to not quantize Q8_0 like this though if you're releasing a Q8_0 version of a model

This isn't a new quantization format though, it's just an alternate way to quantize Q8_0 that is very slightly better, so I might just make an issue on github and show this to the devs and they can decide if/how they want to implement it.
Anonymous No.107184830 [Report]
>>107184766
riddle me this, mongoloid, if it worked, why has there been not even one singular instance of enhanced productivity and velocity in open source projects where anyone can actually see the code and features being added? where are all the projects that were LLM boosted? you vibe coding niggers are always at the stage of useless prototype or wasting the rest of your team's time in your real life job, if you even have one
believe me every fucking developer in existence that actually produces value hates your guts with the force of a thousand suns
it used to be mosquitoes or cockroaches were the first thing one would push the genocide button on but I would argue your kind should be exterminated first
your ability to generate endless garbage with a few prompts is indeed like literal tumors but with contagion powers.
Anonymous No.107184844 [Report] >>107185040 >>107185108
All this sperging because I asked about "vibe coding" tools?
Damn.
Anonymous No.107184952 [Report]
jej
Anonymous No.107184971 [Report] >>107184990 >>107185177 >>107185199
why is editing the thinking block so poorly supported in many frontends
Anonymous No.107184990 [Report]
>>107184971
Such as?
Anonymous No.107185040 [Report] >>107185148 >>107185160
>>107184844
"vibe coding" is an annoying buzzword that sets a lot people off. You might be received better if you ask for AI Agent-Assisted Development Tooling next time.
Anonymous No.107185108 [Report]
>>107184844
You're damn right that vibe coding is for tools.
Anonymous No.107185148 [Report] >>107185216
>>107185040
I suppose.
Trying to dodge schizos is standard 4chan fare these days I guess.
Anyhow, impressed with Qwen3 30B. It's surprisingly usable for something with 3B active params.
Anonymous No.107185154 [Report] >>107185469
>>107180688
>I'm just messing around, you can't make a format better than Q8_0. It's literally just a float16 multiplied by an int8.
Q8_1 (or whatever) was a float16 multiplied by an int8 and summed with another float16 instead of implicitly summed with 0. That's what q8 MLX does with a default group size of 64 rather than 32 which works out to the same amount of metadata per weight. I wonder if in practice it's typically a win.
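To make the comparison concrete, here's a small numpy sketch of the two block schemes being discussed (loose naming, not the exact ggml structs): the symmetric block stores just an fp16 scale, while the scale-plus-offset variant also stores an fp16 offset, so at a group size of 64 it costs the same metadata per weight as symmetric at 32.
[code]
import numpy as np

def quant_symmetric(x):
    # Q8_0-style: x ~ d * q, one fp16 scale per group
    d = max(np.abs(x).max() / 127.0, 1e-12)
    q = np.clip(np.round(x / d), -127, 127).astype(np.int8)
    return d, q

def quant_offset(x):
    # Q*_1-style: x ~ d * q + m, scale plus an fp16 offset per group
    lo, hi = float(x.min()), float(x.max())
    d = max((hi - lo) / 255.0, 1e-12)
    q = np.clip(np.round((x - lo) / d), 0, 255).astype(np.uint8)
    return d, lo, q

x = np.random.randn(64).astype(np.float32)  # MLX-style group of 64
d, q = quant_symmetric(x)
d1, m, q1 = quant_offset(x)
err_sym = np.abs(d * q - x).mean()
err_off = np.abs(d1 * q1 + m - x).mean()
print(err_sym, err_off)  # whether the offset wins in practice is the open question
[/code]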
Anonymous No.107185160 [Report] >>107185173 >>107185248
>>107185040
NTA but Karpathy made that decision for us. I hated the term as well but if I don't use it somebody else will so might as well claim it.
Anonymous No.107185173 [Report] >>107185186 >>107185472
>>107185160
why should we care what that anti open sores snake decides?
Anonymous No.107185177 [Report] >>107185199
>>107184971
just be a grug and write your own scripts for anything that needs to be batched/chunked, and use mikupad for chat and hand edit things yourself
the more features frontends have the worse they are in real use
Anonymous No.107185186 [Report]
>>107185173
It's less that he decided anything, and more that he thought of a catchy term the zoomers instantly fell in love with, and now everyone is using it.
Anonymous No.107185199 [Report] >>107187103
>>107184971
llama-server default
lm studio
cherry studio

I have now resorted to sillytavern but I don't like it.
>>107185177
3 years into the LLM craze I would have hoped to have more robust tools. Then again I also experience so many rendering issues on OpenAI/Claude etc I guess frontend is just too hard to do properly.
Anonymous No.107185216 [Report] >>107185256
>>107185148
>Anyhow, impressed with Qwen3 30B. It's surprisingly usable for something with 3B active params.
I wish they made a coder variant of the 32B. Would love to trade some speed for a more capable small model.

>>107184173
>A visual studio extension?
If you find one, let me know. Apparently no one interested in working on these extensions is capable of anything but Python and JavaScript. I considered forking and developing one of the Chinese shoddy extensions, but it was easier to just use VSCode for this shit.
Anonymous No.107185248 [Report] >>107185501
>>107185160
pic related is one of the things he showed as an example of proud vibe coding in the thread where he coined the term
this is the sort of shit bootcamp genz faggots could hand write in 10 minutes
Anonymous No.107185256 [Report] >>107185303
>>107185216
>If you find one, let me know.
Coding agent extensions for vs code?
As one anon mentioned, there's Cline
There's Roo, a Cline fork, and Continue.
Anonymous No.107185303 [Report]
>>107185256
I keep Roo and Continue installed. Continue is good for autocomplete and quick questions, and Roo for agentic tasks. Tried Cline first, but the only thing it had over Roo was a button to generate commit messages, and even that was annoying because it gives the model all changes instead of just what was staged, with no way to change that.
Anonymous No.107185380 [Report] >>107185454
Mistral Nemo really is nice... sad there's no bigger version.
Anonymous No.107185406 [Report] >>107185425
am I retarded where are the rest of the sampler settings like min p?
Anonymous No.107185425 [Report]
>>107185406
They don't show up in the chat completion interface, but you can still use them by setting those as custom properties/headers.
Same with shit like GBNF and anything else the API accepts.
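As a concrete example (assuming a llama.cpp llama-server backend; the extra field names are backend-specific), those settings can just ride along in the normal chat completion request body:
[code]
# hedged sketch: llama-server accepts extra sampler fields alongside the OpenAI-style ones
payload = {
    "model": "local",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.8,
    "min_p": 0.05,            # not part of the OpenAI schema, backend-specific
    # "grammar": gbnf_text,   # a GBNF string would go here, also backend-specific
}
[/code]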
Anonymous No.107185454 [Report] >>107185474
>>107185380
could always merge two nemos together
Anonymous No.107185469 [Report] >>107188630
>>107185154
>Q8_1 (or whatever) was a float16 multiplied by an int8 and summed with another float16 instead of implicitly summed with 0.
In practice it's typically a loss. Try it out yourself. Summing a float16 destroys any quality bonuses you get from having the extra info from the float16 bias in the first place. That's probably why Q8_1 isn't exposed and is only used internally for an intermediate step in some niche quants.

Yes, you can get slightly higher precision by using an int16 instead but it comes with 2 bytes more of overhead per 32 elements which is 9.0bpw and it performs worse than fp16 outlier strategies

another reminder that none of this matters (other than improving the quantization of Q8_0 itself, and maybe Q8_0_64 and its _IMP version because 3% less model size for 0.001% loss in accuracy might be interesting to some) because you can't practically beat a single fp16 * int8 calculation. you can easily imagine how well that could be optimized for hardware instructions

I'm gonna poke around and see if I can squeeze any better precision out of the Q8_0_IMP quantization function and then if I can't think of anything else, I'll open an issue
Anonymous No.107185472 [Report]
>>107185173
Might as well ask why the state of Israel must exist
Anonymous No.107185474 [Report] >>107185479 >>107185498 >>107185976
>>107185454
how
is it actually worth it?
Anonymous No.107185479 [Report]
>>107185474
No. He's pulling your leg.
Anonymous No.107185498 [Report] >>107185607
>>107185474
>how
you can easily google this, merging a model with itself slightly improves its intelligence

>is it actually worth it?
using local LLMs isn't worth it beyond learning how they work lol
Anonymous No.107185501 [Report] >>107185670
>>107185248
I think you're overestimating the speed of development when hand coding
Anonymous No.107185557 [Report] >>107185771 >>107190976
WE MUST PROTECT AI CHILDREN
Anonymous No.107185607 [Report] >>107185629
>>107185498
>you can easily google this
kys
Anonymous No.107185629 [Report] >>107185634 >>107185672
>>107185607
dude just google "miqu-70b merged with itself" and the first result is miqu-120b ... and just do your own research from there
Anonymous No.107185634 [Report] >>107185655
>>107185629
>just do your own research from there
kys gossipnigger
Anonymous No.107185655 [Report]
>>107185634
>This is a 120b frankenmerge of miqu-1-70b created by interleaving layers of miqu-1-70b-sf with itself using mergekit.

There now you have the full spoonfeed. Go and use mergekit to interleave layers of mistral-nemo with itself
Anonymous No.107185670 [Report]
>>107185501
And the attention required for manual implementation. Sometimes most of my brain is locked in on a specific big picture problem and it's very helpful to be able to delegate things to a language model to validate some random ideas.

In many cases the quality of the vibed LLM implementation is irrelevant (I might throw it out entirely); I just wanna see if something might be good to pursue further.
Anonymous No.107185672 [Report] >>107185734
>>107185629
>70b + 70b = 120b
Where did the other 20b go?
Anonymous No.107185734 [Report]
>>107185672
>Where did the other 20b go?
mergekit uses a passthrough method, which concatenates/assembles transformer blocks from the source(s) into a deeper model rather than just averaging weights
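For the curious, a rough sketch of what that passthrough interleaving amounts to, assuming a Llama/Mistral-style model where the decoder blocks live in model.model.layers (in practice you'd let mergekit do this from a YAML config; the slice ranges below are arbitrary):
[code]
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
layers = model.model.layers  # nn.ModuleList of decoder blocks (40 for Nemo)

# overlapping slices, miqu-120b style: the same blocks appear more than once
slices = [(0, 20), (10, 30), (20, 40)]
new_layers = nn.ModuleList()
for start, end in slices:
    for i in range(start, end):
        new_layers.append(copy.deepcopy(layers[i]))

model.model.layers = new_layers
model.config.num_hidden_layers = len(new_layers)  # now 60 instead of 40
model.save_pretrained("nemo-self-merge")
[/code]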
Anonymous No.107185771 [Report] >>107185804
>>107185557
Even if the UK citizens voted against it they would still implement that law.
Anonymous No.107185804 [Report]
>>107185771
>citizens voted against it
Huh
Anonymous No.107185810 [Report] >>107185821 >>107185825 >>107185841 >>107185940 >>107186686 >>107187103 >>107191492 >>107191898
I have a genuine question.
Why the fuck is everyone so obsessed with making an LLM run as fast as possible?
I understand that for audio or images it's very important, since the result is something we can process as fast as our brains allow, but reading is comparatively slow. With token streaming, wouldn't the best choice be to pick the smartest model we can run at our reading speed?
What is the point of having an answer in seconds if we still need a minute to read it? That said, I do understand the desire to run a small model so you can also run a TTS and/or image model alongside it.
Anonymous No.107185821 [Report] >>107185909
>>107185810
for code or generating huge chunks of text you mostly skim, as well as reasoning which takes ages at reading speed
Anonymous No.107185825 [Report] >>107185909 >>107186686
>>107185810
>Why the fuck is everyone so obsessed with making an LLM run as fast as possible?
because LLMs are mostly used for coding, and time is money
Anonymous No.107185841 [Report] >>107185909
>>107185810
Because you need to reroll 46 times to get one usable line out of these POS
Anonymous No.107185859 [Report]
Should I use I quants for >6_k_s?
Anonymous No.107185909 [Report] >>107185938
>>107185821
>>107185825
Yeah I forgot lazy fucks just copy paste the code without reading it.

>>107185841
Yes, but wouldn't it make sense to use a smarter model so you don't need to reroll as much? Besides, you still need to read each reroll at the slow speed to know whether you have to reroll to begin with.
Anonymous No.107185938 [Report] >>107186110
>>107185909
I mean... it doesn't really take more than a few s to read the few sentences it gens, I'm not genning 4k token walls.
Anonymous No.107185940 [Report] >>107186110
>>107185810
You might be a slow reader anon. Also it's fun to experiment with card settings and prompts, or reroll to see what else could happen. If your model is slow it greatly degrades the experience. Every time I switched to offloading to CPU I regretted it, the models are smarter but it's not worth it.
Anonymous No.107185976 [Report]
>>107185474
iirc merging was based on the observation that residual layers (most transformers stack these) can work somewhat independently of each other. There was a paper (https://arxiv.org/abs/1605.06431) showing that you could permute/delete them with minimal performance degradation, and people attributed this to iterative refinement or ensemble-like behavior, but it's still an open problem to my knowledge. I'd assume adding layers from finetuned variants of a model shouldn't decrease performance by much, but idk if it would benefit either
Anonymous No.107185984 [Report] >>107185998 >>107186002
Is there a collection of best practices to minimize prompt length without losing information?
Anonymous No.107185998 [Report]
>>107185984
>chatgpt, condense this prompt without losing information
Anonymous No.107186002 [Report]
>>107185984
>day 999 of reinventing /aids/
Does it really matter with the context sizes?
Anonymous No.107186047 [Report] >>107186120
>day 999 of forcing /aids/ into the conversation
Anonymous No.107186093 [Report]
/aids/? nobody's got /aids/!
Anonymous No.107186110 [Report] >>107186301
>>107185938
Yes, but I usually read as it generates the answer.
>>107185940
Well, probably yes since I'm not a native English speaker, but I'm asking if it would make more sense to choose the best model according to your individual reading speed instead of the one that runs as fast as possible. For example, the best model I can run at my own reading speed on my 8GB card is a 16B Q4_k_m at 8k context, or if I want a model with vision, an 8B model at Q6_k_m with 12k context.
Anonymous No.107186120 [Report]
>>107186047
this
wow /aids/ touched on a fundamental behavior of LLMs at one point, so did every other LLM community, who cares? unless they have a specific ingenious solution that 1) still applies with modern models and 2) isn't already common knowledge, it's not worth bringing up
Anonymous No.107186221 [Report] >>107186311
>tried the self merge
>it's full on repeating schizo
W A O W
Anonymous No.107186244 [Report]
At this point I am checking /lmg/ out of habit. Still not tired of glmsex.
Anonymous No.107186301 [Report] >>107186458
>>107186110
>16B
Q6_k_m
oh you're just a baitie
Anonymous No.107186311 [Report] >>107186337 >>107186372 >>107186614 >>107186696
>>107186221
any model bigger than the original model made by internet randos was either:
snake oil
or literally broken garbage that's worse than snake oil
also fuck solar and other upscale retardation
you want a big model? spend the money on training a big model
there, that's it
everything else is a cope
Anonymous No.107186337 [Report] >>107186374
>>107186311
brother the whole field is cope layered on more cope
Anonymous No.107186372 [Report]
>>107186311
I don't think they're any smarter or better at actual problem solving than their source components but I think they can be more interesting for creative writing and similar tasks
Anonymous No.107186374 [Report]
>>107186337
Anonymous No.107186458 [Report] >>107186591 >>107189178
>>107186301
With that lack of reading comprehension it's no wonder you read fast.
I said I can run at my slow reading speed:
-16B at Q4
-8B at Q6 with vision.
Anonymous No.107186568 [Report]
Just tried GLM-4.5-Air EXL3 at 3.07 (optimized) bpw on 2x3090.

native tp (no nvlink), 30k context: 952 tok/s pp, 28 tok/s tgs
nccl tp (uses nvlink), 30k context: 1135 tok/s pp, 28 tok/s tgs
Anonymous No.107186591 [Report] >>107186722
>>107186458
yes and 16b (one thing) and q6km (another) is bait
Anonymous No.107186595 [Report]
i've been bragging about getting 18 tps on a 1080ti
but it turns out the vast majority was being offloaded onto my 5800x3d. pls ignore my bad benchmark.
Anonymous No.107186614 [Report] >>107186640 >>107186696 >>107187264
>>107186311
I kind of never got how people expect this to work. Any "finetuning" does almost nothing because you have to do very little (one epoch) or you start overfitting and frying the model. If you add new layers you are just giving the training algorithm a place it can modify to reach the overfitting state faster. Even if you trained only those layers it is hard to imagine not overfitting.

I guess in the best case you could get the model to output a specific type of output like specific formatting or something, but that is only if the possibility of it was already in the model. You aren't teaching it new things this way. It is just impossible.
Anonymous No.107186640 [Report] >>107186663 >>107186803
>>107186614
Anonymous No.107186663 [Report] >>107186670
>>107186640
You can't rag your model into being an expert masterpiece highest quality ERP-er. You just need to buy ram for 4.6.
Anonymous No.107186670 [Report]
>>107186663
oh, just a NAI shill, carry on sir
Anonymous No.107186686 [Report] >>107186701 >>107186725
>>107185810
>>107185825
I could wait for code 2 or 3 days, if it worked and was accurate. But bigger models are not that smart.
Anonymous No.107186696 [Report] >>107186720 >>107186726 >>107186744 >>107187365
>>107186311
>>107186614
The psychology that is in effect when people are making finetunes is the same as when people are making "ShadowMaster's Ultra-High-Res Skyrim Grass Modpack"

1) Feeling of accomplishment. Technically, they did manage to create a mod pack. This is fine.
2) Denial of skill and expertise. "If the game developers were as smart as me, they would have made the grass more high resolution."
3) Denial of their role in the consumer class. "People are downloading my mod, so I've created something of value, just like the game's developers."
4) Denial of taste. "I like my high res grass (although I'm unaware that it's because of reasons 1-3). Anyone who says it's shit must be jealous or just have different taste. Therefore, the fact that I can't tell that it's ugly doesn't mean I lack taste."
5) Imitation of academic tradition. "There's something named after me."

It's literally the same exact brain damage for finetunes. There was a very brief period where finetuning was being invented, where individual people were going back and finetuning the earlier untuned models. That was valid, but everything for the last year is cope.

Seriously, if finetuning was good, don't you think the billion dollar company would have someone doing it? They are better than you at this. Only delusion prevents this realization.
Anonymous No.107186701 [Report]
>>107186686
Yes of course, run it overnight, heard ALL about it when llama 405B dropped. So many people do this it's crazy!
Anonymous No.107186720 [Report] >>107186730 >>107186737
>>107186696
i don't think you know what finetuning means
Anonymous No.107186722 [Report] >>107186747
>>107186591
I don't understand what you're trying to say then, this is the speed I get with the 8B model with vision enabled, it's a Q6, and it's a lot faster than I can read English
Anonymous No.107186725 [Report]
>>107186686
Right?
If there was a model that would take 3 days to spit out what you need but would get it exactly right every time, I'd be more than happy leaving the thing running.
Alas, that's not yet a thing.
Anonymous No.107186726 [Report]
>>107186696
drummer mentioned
Anonymous No.107186730 [Report] >>107186743
>>107186720
Hi faggot, all here...
Anonymous No.107186737 [Report]
>>107186720
people post-train or merge or whatever to create mods of existing models, releasing the whole model instead of a lora
Anonymous No.107186743 [Report]
>>107186730
uh, yeah, right?
Anonymous No.107186744 [Report] >>107186755 >>107186762
>>107186696
this post was written by an llm
Anonymous No.107186747 [Report] >>107186821
>>107186722
>Captura de pantalla
lolmao
what 16b are you running little bro
Anonymous No.107186755 [Report] >>107186787
>>107186744
>ShadowMaster's Ultra-High-Res Skyrim Grass Modpack

Make your LLM output that. I dare you.
Anonymous No.107186762 [Report]
>>107186744
>nigger seeing capitol letters on four chan
Anonymous No.107186768 [Report] >>107186796
this post was written by an esl
Anonymous No.107186787 [Report] >>107187103
>>107186755
that's possibly the most llm-y part of the post, kimi for example is addicted to unnecessary little flourishes like that
Anonymous No.107186796 [Report]
>>107186768
esl hobby sir de pantella Pareto paradigm just mooned
Anonymous No.107186803 [Report] >>107186817
>>107186640
The real misconception is that the model parroting finetuning data means it has learned new knowledge. A tiny QLoRA adapter is enough for that, for limited amounts of data. But it doesn't really mean the model has actually learned to use and apply any new information.
Anonymous No.107186817 [Report]
>>107186803
>noooo muh mesugaki lightbulb bublesort benchie
Anonymous No.107186821 [Report] >>107186830 >>107186933
>>107186747
Fuck me, do you even know how to read numbers? I said is a 8b model.
The 16b model runs at 8 tokes per second.
Anonymous No.107186830 [Report] >>107186876
>>107186821
i'm asking which 16b you claim to run ffs
Anonymous No.107186860 [Report] >>107186867 >>107186873
drummer getting desperate ITT...
Anonymous No.107186867 [Report]
>>107186860
leave the Pantella frontier alone!
Anonymous No.107186873 [Report]
>>107186860
kofi bucks running low his discord are ungratefulls
Anonymous No.107186876 [Report] >>107186884 >>107186919 >>107186933 >>107186944
>>107186830
I swap between these two depending on the mood:
LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-Q4_k_m
Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat

Also, the vision model it's a 7b, not an 8b.
Anonymous No.107186884 [Report] >>107186919 >>107186936
>>107186876
and there we go...
>128k-Darkest-Planet-Uncensored-16.5B
a davidau clownmoe atrocity
Anonymous No.107186919 [Report]
>>107186876
>Darkest-Planet-Uncensored
That's so fucking funny.

>128k
I bet it is.

>>107186884
>davidau
Figures.
I love that guy man. I always get a chuckle out of his shit on huggingface.
Anonymous No.107186933 [Report]
>>107186821
>do you even know how to read numbers? I said is a 8b model.
>>107186876
>it's a 7b, not an 8b.
Womp womp
Anonymous No.107186936 [Report] >>107186946
>>107186884
Yes, and? I'm just discussing the sizes of models and their running speeds, not what they are for.
Anonymous No.107186944 [Report]
>>107186876
>LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-Q4_k_m
>Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat
Anonymous No.107186946 [Report]
>>107186936
The running speed of atrocities in their own size class is surely widely useful info, thanks anon.
Anonymous No.107186998 [Report] >>107187017
For me it's the pre Llama2 merges consisting of 278 nestled models (confirmed)
Anonymous No.107187017 [Report]
>>107186998
Utopia/UtopiaXL my beloveds
Anonymous No.107187103 [Report]
>>107185199
>3 years into the LLM craze I would have hoped to have more robust tools.
I'll bet their readme files on their git repos have been the bulk of their merge histories.
>>107185810
Fried dopamine receptors needing faster validation. Every other answer is cope.
>>107186787
This is why Kimi is so good.
Anonymous No.107187264 [Report] >>107187294 >>107187357 >>107187393
>>107186614
You can do multiple epochs over the data you want to actually train on by diluting it with more generic data.
Also what makes you think you can't teach the model something in one epoch? Pretraining is often just 1 epoch.
Anonymous No.107187294 [Report]
>>107187264
>Pretraining is often just 1 epoch.
pretty sure that hasn't been true in years, that's how they get to claim their crazy 30T+ tokens by doing multi epochs on the same shit, also iirc some papers showed they specifically did multiple epochs of stuff like wikipedia.
Anonymous No.107187326 [Report] >>107187348 >>107187354 >>107187660 >>107188323
yo is it just me or is QwQ weirdly better than you'd expect? feels like it punches way above its weight, least slopped and smartest ~30B model in my book (compared to Qwen3 30 & 32, magistral and gemma)
Anonymous No.107187348 [Report] >>107187375 >>107187417
>>107187326
>punches way above its weight
HELL YEAH!!
>>107182378
Anonymous No.107187354 [Report] >>107187363
>>107187326
I don't think I've seen one good Qwen model but IG I'll download it and see
Anonymous No.107187357 [Report] >>107187373 >>107187450
>>107187264
One pretraining epoch has information repeated hundreds (at the minimum) or thousands of times in many different ways, though.
Anonymous No.107187363 [Report]
>>107187354
Qwen models post 2507 are all pretty good
Anonymous No.107187365 [Report]
>>107186696
They don't because they don't have an ML department and they don't want to invest resources into something that sounds technical and risky/scary.
My boomer boss literally thinks you can "train the AI with your own data" with <shitty low code software> but finetuning is "too low level".
Anonymous No.107187369 [Report] >>107187374 >>107187716
it's out
https://openai.com/index/gpt-5-1/
Anonymous No.107187373 [Report] >>107187446
>>107187357
Not on our proprietary high quality de duplicated filtered dataset sir.
Anonymous No.107187374 [Report]
>>107187369
buy an ad
Anonymous No.107187375 [Report] >>107187386
>>107187348
How did soul not make the list?
Anonymous No.107187386 [Report]
>>107187375
because soul is sovl of course
Anonymous No.107187393 [Report] >>107187408 >>107187433 >>107187497
>>107187264
Ok drummer, then where is that one model that is actually noticeably better? And why do you shit out new models every few weeks? I have not seen a single fine-tune that delivered the ERP improvement you get when you jump from 7B>30B>70B>the land of eternal magical sex (4.6)
Anonymous No.107187408 [Report] >>107187434 >>107187444
>>107187393
>the land of eternal magical sex (4.6)
buy the ad NAI shill
Anonymous No.107187417 [Report]
>>107187348
>slop words:
>slop
Russell's Paradox?
Anonymous No.107187433 [Report]
>>107187393
>tunes and drummer are bad because we don't have them on NAI
Anonymous No.107187434 [Report] >>107187442
>>107187408
It is just a number. I didn't say the model's actual name. You see NAI everywhere anon.
Anonymous No.107187442 [Report] >>107187467
>>107187434
With how much you guys are spamming about muh glm sex it's very obvious what you meant.
Anonymous No.107187444 [Report]
>>107187408
Based.
Anonymous No.107187446 [Report]
>>107187373
Deduplication removes identical documents, not repeated information, though. It's the repeated information showing up in many different contexts that gives an LLM general knowledge. One epoch of information that is only mentioned and used once won't work.
Anonymous No.107187450 [Report] >>107187488
>>107187357
There are ways to do data augmentation and synthetic data generation for finetuning. That's the main strength of finetuning IMO.
Any system prompt can be baked into a model through SFT on the generated data, except without wasting context or the model becoming confused due to too many rules. Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
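A hypothetical sketch of the data prep that implies (file names, the generate() helper and field names are all made up for illustration): generate with the giant system prompt present, then write the SFT examples without it, so the behavior gets trained in rather than re-read on every request.
[code]
import json

BIG_SYSTEM_PROMPT = open("style_rules.txt").read()  # the hypothetical 1 MB of rules

def generate(system: str, user: str) -> str:
    # stand-in for whatever teacher model / sampling call you actually use
    raise NotImplementedError

with open("raw_user_turns.jsonl") as fin, open("sft_data.jsonl", "w") as fout:
    for line in fin:
        user_msg = json.loads(line)["user"]
        reply = generate(system=BIG_SYSTEM_PROMPT, user=user_msg)
        example = {"messages": [
            {"role": "user", "content": user_msg},        # note: no system prompt here
            {"role": "assistant", "content": reply},
        ]}
        fout.write(json.dumps(example) + "\n")
[/code]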
Anonymous No.107187467 [Report] >>107187515
>>107187442
But I run my 4.6 locally...
Anonymous No.107187488 [Report] >>107187559
>>107187450
>Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
I will try to get it. For fun. So that expert erp-er prompt actually helps a 7B avoid the surprise prostate/ kissing while blowjob problem? Have you tested it?
Anonymous No.107187497 [Report] >>107187507
>>107187393
I'm not Drummer, my personal finetuning dataset is private and meant to teach a model how to work effectively with my own code assistant which is also not public.
But I do want to add fiction/roleplay data to it as well to reduce the slop to a bearable level (I've tried and failed to get rid of it by system prompting).
Anonymous No.107187507 [Report] >>107187519
>>107187497
You just need to switch to a good model, that's the simple fix.
Anonymous No.107187515 [Report] >>107187538
>>107187467
>i keep hearing so much about 4.6 but I have a shitty pc, where can I use it online?
>oh, I don't want to use kimi because nobody is spamming about it!
Anonymous No.107187519 [Report] >>107187541
>>107187507
There aren't any good open weights models.
Anonymous No.107187538 [Report] >>107187617
>>107187515
Sucks to suck for kimi but it is outside an AM5 range. I am gonna fangirl kimi once it is runnable without a server.
Anonymous No.107187541 [Report] >>107187595
>>107187519
zai-org/GLM-4.6-SEXXO
exists
Anonymous No.107187559 [Report]
>>107187488
I only began seriously finetuning a couple weeks ago. I'll give it a year to see what is possible to achieve as a hobbyist.
Anonymous No.107187595 [Report] >>107187602 >>107187606 >>107188073 >>107188089
>>107187541
When a programming task is too hard GLM always adds fake placeholder code to generate fake but real looking data, then claims everything was done perfectly. It also sometimes gets stuck in loops and uses file edit tools in a way I don't like (rewrites existing code rather than making atomic edits).
Anonymous No.107187599 [Report]
Anonymous No.107187602 [Report] >>107187623 >>107187628
>>107187595
Which part of sex you don't understand?
Anonymous No.107187606 [Report] >>107187623 >>107187628
>>107187595
i said SEXXO you nerd
Anonymous No.107187617 [Report] >>107187625 >>107187631 >>107187637
>>107187538
>I am gonna fangirl kimi once it is runnable without a server.
Kimi K3 Bitnet would be the best Christmas present
Anonymous No.107187623 [Report] >>107187636
>>107187602
>>107187606
Sex with your hand you mean?
Anonymous No.107187625 [Report] >>107187800
>>107187617
>Bitnet
let it go, just wait for 4.6 ait to cook
Anonymous No.107187628 [Report]
>>107187602
>>107187606
What memelang is that?
Anonymous No.107187631 [Report]
>>107187617
>bitnet
wake up
Anonymous No.107187636 [Report]
>>107187623
Absolutely. And it also fucked my brain this past month.
Anonymous No.107187637 [Report] >>107187662
>>107187617
pupper farm reminds of when the qwen 3s said they would bitnet and shit good times of cope
Anonymous No.107187659 [Report] >>107187691
For me, it's DavidAU clown car MoEs
SCHIZOMAXXING
Anonymous No.107187660 [Report] >>107187814 >>107187845
>>107187326
Yea I like it. I think it was made shortly after deepseek r1, and it was qwen's "let's try reasoning" experiment. It was basically reasoningmaxxed and neglected creative writing or flowery responses as a result. It's dank for generating really specific brief answers, but has no value for erp.
Anonymous No.107187662 [Report] >>107187683
>>107187637
I love how they never mentioned it again.
Not "oh it doesn't work" not "oh we ran out of time, maybe for qwen 4" just total memory hole silence.
Anonymous No.107187683 [Report] >>107188008
>>107187662
Jensen gave them a visit to stop that.
Anonymous No.107187691 [Report]
>>107187659
Llama-3-70B-Instruct-Failed-Cryogenic-Reanimation-Support-Group-Moderator-Q4_K_M
Mixtral-8x7B-v0.1-Amateur-Body-Snatcher-But-For-Garden-Gnomes-Q5_K_M
Qwen2-72B-Chat-Evil-Super-Villain-With-A-Very-Specific-Allergy-To-Peanuts-Q8_0
Gemma-2-27B-Accidentally-Summoned-A-Demon-While-Trying-To-Make-A-Vegan-Quesadilla-Q6_K
Mistral-7B-v0.3-Excommunicated-Monk-Who-Now-Runs-A-Successful-OnlyFans-Q4_K_S
Phi-3-mini-4k-instruct-Haunted-Doll-That-Just-Gives-Unsolicited-Parenting-Advice-Q3_K_M
Solar-10.7B-Instruct-Graverobber-Who-Only-Takes-The-Shoes-Q5_0
Yi-1.5-34B-Chat-Cult-Leader-But-The-Cult-Is-Just-About-Organizing-Your-Spice-Rack-Alphabetically-Q4_K_M
DeepSeek-Coder-V2-Lite-Base-Argues-With-Your-Smart-Fridge-About-Your-Eating-Habits-Q6_K
Anonymous No.107187695 [Report]
>model would rather eat a newborn than say nigger
Weirdly real-like...
Anonymous No.107187716 [Report] >>107187731 >>107187768 >>107187785
>>107187369
>GPT‑5.1 Instant, ChatGPT’s most used model, is now warmer by default and more conversational. Based on early testing, it often surprises people with its playfulness while remaining clear and useful.
That's such a massively gay backpedalling on what was a legit improvement
GPT-5 was so much better than 4o in tone
fuck this gay earth
Anonymous No.107187731 [Report]
>>107187716
shit it's actually real? so used to cat posts.. it seems quick for a .1
Anonymous No.107187768 [Report] >>107187785
>>107187716
their employees were getting mobbed by mentally ill #save4o people on xitter so they caved
Anonymous No.107187785 [Report] >>107187803 >>107187823
>>107187716
>>107187768
a single chat with kimi would kill an average white woman
Anonymous No.107187800 [Report] >>107187818
>>107187625
Everything is going to be natively trained at int4/fp4 within 6 months, with Hadamard, at the very least. Then the jump to binary/ternary is small.
Anonymous No.107187803 [Report]
>>107187785
Kimi do be really nice on creative writing brainstorming, cause it's the only "stock" model I know to openly tell you your idea is dogshit and you should feel bad.
Anonymous No.107187814 [Report] >>107187832 >>107187902
>>107187660
Wasn't it before r1? I remember it as the first open cot model (before we only had llama with a think step by step prompt larp)
Anonymous No.107187818 [Report]
>>107187800
I hope not but we do always head the worst direction so you're likely right :(
Anonymous No.107187823 [Report]
>>107187785
Or brown man for that matter. Kimi does not fuck around.
Anonymous No.107187832 [Report]
>>107187814
iirc between deepseek's online only reasoner preview and R1 releasing yeah
Anonymous No.107187845 [Report] >>107187872 >>107187907
>>107187660
I disagree. It was very very good for ERP especially for its size and how it had no right to be good. My bet is they reasonmaxxed so hard it fucked with the censorship. Also if i remember the trick for sex was to use it without reasoning.
Anonymous No.107187872 [Report]
>>107187845
>Also if i remember the trick for sex was to use it without reasoning.
that would just give you almost stock qwen 2.5 experience, I think you are misremembering something, there was qwq preview, which was absolute dogshit and then the proper qwq, which distilled r1 a bit afterwards
Anonymous No.107187902 [Report] >>107187912 >>107187924
>>107187814
that was qwq preview (which sucked)
full qwq was released after r1 and was basically a distill done right
Anonymous No.107187907 [Report]
>>107187845
>My bet is they reasonmaxxed so hard it fucked with the censorship.
This happens with all the non-mainline Qwen models. I think they just put less effort into the safetymaxxing of their specialist finetunes.
I have personal benchsets of text to translate, the recent VL models for example are much more likely to accurately translate expletives like "Fuck!" from their corresponding terms in other languages. Whereas mainline Qwen 3 tends to go for "Holy crap!", "Damn" etc and tries very hard not to say "Fuck!".
Their coder models are also more compliant. But not too good for translation because they fuck the multilingualism very hard on those.
Anonymous No.107187912 [Report]
>>107187902
>distill done right
oxymoron
Anonymous No.107187924 [Report]
>>107187902
The preview had a repetition loop issue but was more creative and less censored. The full they distilled from R1 resolved the looping issue but was censored, less creative, and more schizo.
Anonymous No.107187934 [Report] >>107187953
I don't understand people who have the patience for the amount of thinking tokens R1 and its distilled derivatives tend to output. R1 was the worst thing to happen to open models. (ds v3 was great tho')
Anonymous No.107187953 [Report]
>>107187934
at least it actually somewhat used the thinking, new deepseek "thinks" for half a second, doesn't plan anything and still answers whatever
Anonymous No.107188008 [Report] >>107188019 >>107188032
>>107187683
OH yes feed me daddy
Anonymous No.107188019 [Report]
>>107188008
>tfw you will never play the pocky game with Jensen
Anonymous No.107188032 [Report] >>107188182
>>107188008
>jensen huang handing you your daily food ration after agi utopia was achieved
Anonymous No.107188073 [Report] >>107188080 >>107188460
>>107187595
I can tell most of you are vibe coders because it took me a couple of days coding with LLMs to figure out how to avoid this

It's also how I get good usable code out of small models like devsteal and qwen coder moe

hint: it's a technique every junior coder is taught in every bootcamp
Anonymous No.107188080 [Report] >>107188090 >>107188097
>>107188073
>devsteal
give it back jamarius!
Anonymous No.107188089 [Report]
>>107187595
Claude used to do that too
Anonymous No.107188090 [Report] >>107188108 >>107188122
>>107188080
Kek
Devstral*
Anonymous No.107188097 [Report] >>107188108 >>107188122
>>107188080
Kek
Devstral*
Anonymous No.107188108 [Report] >>107188118
>>107188090
>>107188097
now you go and steal two posts, you can't keep getting away with this mate
Anonymous No.107188117 [Report] >>107188141 >>107188157 >>107188603 >>107190002
What's the best Speech to Speech for pure Youtube voice overs (*.MP3, *.WAV, *.FLAC).

The goal here is to not disclose my voice on the internet, make my voice deeper, and make the voice over cleaner and more intelligible (I have an accent). I will imprint the emotions in my voice, the model just needs to change the sound of my voice.

I really need the focus to be on it sounding as human as possible, I do not care about real-time voice changing.
Anonymous No.107188118 [Report]
>>107188108
Guess what

I stole two (you)'s two ;)
Anonymous No.107188122 [Report]
>>107188090
>>107188097
>exactly a minute apart
nice try faggot
Anonymous No.107188141 [Report] >>107188152 >>107188582
>>107188117
Good morning sir
This is retarded qvestion sir just use the texts to speeches
Anonymous No.107188152 [Report]
>>107188141
Thank you sir, will do.
Anonymous No.107188157 [Report] >>107188167
>>107188117
>make my voice deeper
not using female voice, you are ngmi my dude
Anonymous No.107188167 [Report]
>>107188157
voicecel need love and support anon
Anonymous No.107188182 [Report] >>107188258 >>107188375
>>107188032
>agi utopia
= plapping my robowaifu deep into her self-sanitising orifices while she tells me I'm a good boy
Anonymous No.107188220 [Report] >>107188230
most of you will have become impotent and limp before we reach the stage of robowaifu
Anonymous No.107188230 [Report]
>>107188220
there will be drugs for that.
Anonymous No.107188251 [Report] >>107188265 >>107188274 >>107188911
>System prompt : You are Ani, an AI assistant. Anon is a boy.
>Anon : State yourself, Ani
>Ani : I'm Ani, a boy
This is damage from quantization, right? I'm still downloading higher quants to check.
Anonymous No.107188258 [Report] >>107188384
>>107188182
And jensen gently places his pocky on your tongue mid thrust.
Anonymous No.107188265 [Report] >>107188314
>>107188251
abliterated?
Anonymous No.107188274 [Report] >>107188289 >>107188331
>>107188251
Oh, yes. The mystery quant of the mystery model. Yes. It's either that or something else.
Anonymous No.107188289 [Report]
>>107188274
shut up nerd
Anonymous No.107188314 [Report] >>107188358
>>107188265
I'm not sure, it's from https://huggingface.co/irmma/ERNIE-4.5-21B-A3B-PT-Q4_K_S-GGUF
This is the only link after the original model updated. Prob mistake in quantizing too.
Still downloading Unsloth's, but this release is an outdated one.
Anonymous No.107188323 [Report]
>>107187326
>yo is it just me or is QwQ weirdly better than you'd expect?

$$\boxed{\text{yes}}$$
Anonymous No.107188331 [Report]
>>107188274
please tell me more about the mystery quant
Anonymous No.107188339 [Report]
x.6 made looking for a better model on hf obsolete.
Anonymous No.107188346 [Report] >>107188419 >>107188472
>Hard drives on backorder for two years as AI data centers trigger HDD shortage
the more new models get released, both closed api models and open source, the more I feel the negative impact of this tech's existence is not really being paid off by the positives
Anonymous No.107188347 [Report] >>107188913
>>107184585
>You can quantize to Q8_0 slightly better by using the importance-weighted optimizations that smaller
You're not referring to imatrix right?
Anonymous No.107188358 [Report] >>107188364
>>107188314
>A3B
>Q4_K_S
>retarded
Gee, I wonder why.
Anonymous No.107188364 [Report] >>107188378 >>107188389
>>107188358
stop that and properly help anon
Anonymous No.107188375 [Report]
>>107188182
>Of course, you are absolutely right anon! *the robot leans in for conspiratorial whisper that shivers your spine*
Anonymous No.107188378 [Report]
>>107188364
He can't. That's why he's shitposting and has nothing worthwhile to add.
The LLM is compelled to respond when you hit enter and some anons are compelled to vomit tokens into the post submission field.
Anonymous No.107188384 [Report]
>>107188258
UGHhHG-G-g--g-- when anon sends you over the edge
Anonymous No.107188389 [Report] >>107188399
>>107188364
>properly help anon
to be honest: just don't use that model, it's one of the millions of garbage models china has flooded the market with
aside from qwen and deepseek the rest are pretty unserious
Anonymous No.107188399 [Report] >>107188416
>>107188389
>What is the Kimi and GLM4.6
Anonymous No.107188416 [Report]
>>107188399
kimi yes; glm no
Anonymous No.107188419 [Report]
>>107188346
Don't worry, if there wasn't an AI boom, they'd just start burying hardware in concrete like the Funko Pops
Anonymous No.107188460 [Report]
>>107188073
I can tell most of you are poor because I just pay an ukranian draft dodger to code for me
Anonymous No.107188472 [Report] >>107188522 >>107188531
>>107188346
The driving force is wrongly "let's replace the workforce with text prediction" instead of "let's create very cute robowaifus"
Anonymous No.107188522 [Report] >>107188636
>>107188472
>"let's replace the workforce with text prediction"
That's not really true when every actual AI application requires an order of magnitude more workers for supervision than if they just let people do the work themselves
It's just taking advantage of a bubble where saying muh AI has a 90% chance of making your market cap 20x what it was three years ago
Anonymous No.107188531 [Report] >>107188543 >>107188561 >>107188870
>>107188472
So you would rather have people like me slave away for your fake disability scam aka neetbucks? Fuck you nigger.
If you weirdos get off to anime and text porn I've no doubt you could get off to a talking fleshlight but society isn't made to please your fucked up fetishes. Keep using your hand like everyone else faggot. And jerk off to real women not your fake animu that you cope with because you know no "3dpd" would ever dare to touch your dick that hasn't been washed in a year.
Anonymous No.107188543 [Report]
>>107188531
hello llm
Anonymous No.107188561 [Report]
>>107188531
Go back to work
Anonymous No.107188582 [Report] >>107189655
>>107188141
This is a stupid answer because the TTS result is subpar compared to conversions, people notice it's AI and click away.
Anonymous No.107188603 [Report] >>107188641
>>107188117
Nothing has surpassed RVC yet.
Anonymous No.107188630 [Report]
>>107185469
Without changing the format, that is the best you can probably do. If you could do an overhaul, Trellis quanting via QTIP is SOTA and you have formats from ik_llama.cpp and EXL3 being based on that.
Anonymous No.107188636 [Report]
>>107188522
saar do not redeem
Anonymous No.107188641 [Report]
>>107188603
ESL retard
Anonymous No.107188709 [Report] >>107188733 >>107188816
how many of you have embraced the ssdmaxx kimi life?
Anonymous No.107188733 [Report] >>107188780 >>107188843
>>107188709
>ssdmaxx
not a real thing
even the most autistic schizos here are not going to waste away at 0.2t/s
Anonymous No.107188780 [Report] >>107188853
>>107188733
I get around 1 t/s though
which actually isn't totally unusable
Anonymous No.107188816 [Report] >>107188853
>>107188709
I'm happy with 2.1 t/s Kimi.
Anonymous No.107188843 [Report]
>>107188733
4x pcie5 ssds in raid-0 would be slightly above dual-channel ddr4 level in terms of bandwidth, assuming you have the right system.
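Rough numbers behind that claim (assuming peak sequential reads; sustained throughput will be lower):
[code]
pcie5_ssd_gbps = 14.0                   # a fast PCIe 5.0 x4 drive, GB/s
raid0_4x = 4 * pcie5_ssd_gbps           # ~56 GB/s
ddr4_3200_dual = 2 * 3200e6 * 8 / 1e9   # 2 channels * 8 bytes/transfer ~= 51.2 GB/s
print(raid0_4x, ddr4_3200_dual)
[/code]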
Anonymous No.107188853 [Report] >>107188920
>>107188780
>>107188816
What do you guys use it for?
Or put another way, how does an average session of using the model go at those speeds?
Anonymous No.107188870 [Report] >>107188891
>>107188531
>And jerk off to real women
I can't remember the last time I've done that
Anonymous No.107188891 [Report]
>>107188870
Now I realize it's probably been close to a year, if not more, since I last did.
Anonymous No.107188894 [Report] >>107189025
Can someone explain the discrepancy in the measured memory bandwidth for the EPYC 9275F vs the 9255? I thought memory bandwidth was correlated with the number of CCDs the CPU has, and the 9275F has 8 vs the 9255's 4.
https://jp.fujitsu.com/platform/server/primergy/performance/pdf/wp-performance-report-primergy-rx2450-m2-ww-ja.pdf
Anonymous No.107188911 [Report]
>>107188251
Tested with the outdated Unsloth Q6_K model, it doesn't seem to be mixing up the User and Assistant roles.
Anonymous No.107188913 [Report]
>>107188347
>You're not referring to imatrix right?
Probably not, the code is here >>107184772
It's only technically interesting since the perplexity difference is probably less than 0.01 when you actually compare the two quants, but it is technically speaking strictly better and fully backwards compatible

Q8_0 with 64 or 128 elements per block reducing the bits per weight to 8.25 or 8.125 is probably more relevant since those savings could actually matter for huge models

For a 300B model:

8.5 bpw ≈ 318.75 GB
8.25 bpw ≈ 309.38 GB (≈ 2.9 % smaller)
8.125 bpw ≈ 304.69 GB (≈ 4.4 % smaller)


So you shave 10gb with 0.0001% quality loss. Not sure how 64 elements instead of 32 in a block impacts hardware optimizations though
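The arithmetic behind those sizes: Q8_0 stores one fp16 scale per block of int8 weights, so bits per weight is 8 + 16/block_size.
[code]
for block_size in (32, 64, 128):
    bpw = 8 + 16 / block_size
    size_gb = 300e9 * bpw / 8 / 1e9
    # 32 -> 8.5 bpw / 318.75 GB, 64 -> 8.25 / 309.38, 128 -> 8.125 / 304.69
    print(block_size, bpw, round(size_gb, 2))
[/code]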
Anonymous No.107188920 [Report]
>>107188853
General use. Sometimes I have Kimi analyze documents, consolidate lots of information input into easily digestible bullet points for redistribution or making into infographs, produce simple, tedious, but necessary code snippets for hobby projects (do not trust LLMs directly with the codebase yet), acting as a smarter calculator that's more comfortable to use than the average texas instruments calc, and writing shitpost novels and essays for my own amusement. I've started using Kimi to mess around with exploring latent underlying patterns in linguistic theory as a whole too.
Anonymous No.107188930 [Report] >>107188971 >>107188997 >>107189007
Could someone please give me a 60 second summary of what a chat template is and what "jinja" is?

As far as I can tell, models output different kinds of tags, like [THINK][/THINK] vs <think></think> vs <|begin_think|> etc. There is some schema for converting between a raw stream of text and JSON with the {"role":"assistant","content":"hello"} schema.

My question is, is the chat template included in the gguf (or model repo)? If so, why would I ever have to specify which chat template I'm using via a dropdown? How would the --jinja flag ever be needed then?

Also if I'm programming an interface that wants to convert between raw text and formatted conversations, do I need to access the model? There doesn't seem to be a standard endpoint like /v1/chat-template that would do this for arbitrary models.
Anonymous No.107188942 [Report] >>107188992
today in autistic RP pet peeves
>model asks a great question early in its reply
>"but wait... my reply isn't long enough"
>writes 4 more paragraphs of action and dialogue that completely run over the original question, making it impossible to respond to it coherently
meanwhile the original question is the only interesting thing about the reply and the rest is a bunch of generic filler designed solely to pad out the response to the desired length
>just edit it
I do, but it's still annoying
Anonymous No.107188971 [Report] >>107189029
>>107188930
Also, why is like 50% of every changelog fixing chat template problems? How is it not one-and-done by the model author?
Anonymous No.107188988 [Report]
If I'm going to ssdmax, I won't need good compute anyway, right?
Anonymous No.107188990 [Report]
currently going through all my old AI generated stories to organise them into obsidian.
I was not ready to see last modified: 27/06/2022. I think that one may be from GPT-3 davinci on OpenAI Playground.
Anonymous No.107188992 [Report] >>107189011
>>107188942
The day models are able to modulate output length downward when they recognize they're asking questions will be a good day.
Anonymous No.107188997 [Report] >>107189045
>>107188930
the --jinja flag is not for choosing an alternate chat templater, it's for enabling the gguf's chat template; without the flag you're actually not running the proper chat template at all
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
--chat-template-file (used in ADDITION to --jinja) is what you would use to set another template formatter.
jinja can run complex logic, for example gemma models really hate not alternating user/assistant roles (for eg having two consecutive user messages) so it will helpfully reject your prompt if you do that
Anonymous No.107189007 [Report] >>107189045 >>107189060
>>107188930
>chat template
A structure a model is trained to follow and that helps create patterns that differentiate things like what is the user's turn, the AI's turn, the thinking block, a tool call block, etc.

>and what "jinja" is?
A template engine. It helps create templates with conditions, loops, etc.

>is the chat template included in the gguf (or model repo)?
Yes. (and yes).

>why would I ever have to specify which chat template I'm using via a dropdown?
For the most part, you wouldn't. Unless you know that there's something specific about the template you want to change, like adding support for suffixing/prefilling the assistant response.

>How would the --jinja flag ever be needed then?
If I'm not wrong, when you don't use the --jinja flag, llama.cpp falls back to an internal, hardcoded approximation of the chat template. When you use the flag, it pulls the actual Jinja template from the GGUF metadata and parses that.
I imagine that's a failsafe, a way to retain backwards compatibility, or both.

>Also if I'm programming an interface that wants to convert between raw text and formatted conversations, do I need to access the model? There doesn't seem to be a standard endpoint like /v1/chat-template that would do this for arbitrary models.
I'm not sure what you are asking.
llama.cpp has two endpoints, one for chat (that receives a structured chat array) and one for text completion (that receives raw text).
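
If it helps to see it concretely, this is roughly what the template engine does under the hood. A minimal sketch in Python with the jinja2 package; the template string here is a made-up toy, and real model templates are longer and may expect extra helpers (like raise_exception) that the inference stack injects:

from jinja2 import Template

# toy chat template, not any real model's
chat_template = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>\n{{ m['content'] }}<|end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
    {"role": "user", "content": "what is a chat template?"},
]

# render the structured chat into the raw text the model actually sees
prompt = Template(chat_template).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)

The raw string that comes out is what gets tokenized and fed to the model; the chat endpoint just does this step for you.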
Anonymous No.107189011 [Report] >>107189434
>>107188992
If you think of an important question to ask the user, you can stop generating output and ask the user. You will be able to continue your response using that information after they provide an answer.
Anonymous No.107189025 [Report] >>107189049
>>107188894
Weird, because that number is like single-CPU performance.
Don't discount the possibility that the numanuma dual socket testing went bad and the incorrect value slipped its way into the docs. Does it still show up in single CPU benchmarks?
Anonymous No.107189029 [Report]
>>107188971
>How is it not one-and-done by the model author
the model author didn't make a llama.cpp implementation, and in many cases, didn't make any popular inference tool implementation themselves
and recently mistral even wanted llama.cpp to use their own tokenizer library, mistral-common (which also handles the templating), rather than meet the expectation of shipping jinja templates
machine learning is a wild west with no standards and no will to settle on common ground
Anonymous No.107189045 [Report] >>107189069
>>107188997
>>107189007
Thanks!

I'm trying to understand the proper way to do this: In my frontend, I have raw text. I would like to convert that text into a conversation with the model's proper chat template.

The backend is whatever arbitrary way I'm running the model, which is llama-server right now.
Anonymous No.107189049 [Report]
>>107189025
My thought was that it was just bad data, or that the value was accidentally swapped with the 9255. This was unfortunately the only data sheet I could find on memory performance for gen 5 EPYC.
411GB/s is way lower than the expected single-CPU bandwidth of 576GB/s, as are the 9015 and 9115. The explanation I heard before was the significantly reduced CCD count compared to the higher-spec EPYCs.
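(for reference, that 576GB/s is presumably the theoretical figure for 12 channels of DDR5-6000: 12 channels x 6000 MT/s x 8 bytes = 576 GB/s, assuming that's the configuration those data sheets are based on)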
Anonymous No.107189060 [Report] >>107189069
>>107189007
>I imagine that's either (or both) a failsafe and a way to retain backwards compatibility.
Most likely the embedded template convention didn't exist initially, so llama couldn't start reading it without breaking backward compat. It'll probably be turned on by default in some future version.
Anonymous No.107189069 [Report] >>107189114 >>107191472
>>107189045
If what you are going for is turn by turn exchanges, you are better off using the "OpenAI Compatible" chat endpoint.
That also has the upside of making your app compatible with any other backend that speaks that style of API, like OpenRouter and pretty much everything else really.
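
Something like this is all it takes. A rough sketch assuming llama-server on its default 127.0.0.1:8080; any OpenAI-style backend accepts the same payload:

import requests  # pip install requests

# the server applies the model's chat template to this structured chat for you
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "hello"},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])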

>>107189060
That's my thinking. GGML format had no metadata from what I remember, then they created GGUF, and I guess at a later revision they added the template as a new metadata field.
Anonymous No.107189114 [Report] >>107189164
>>107189069
>GGML
RIP TheBLoke.
You were too good for this world and didn't deserve to be killed by ninjas.
Anonymous No.107189137 [Report] >>107191332
thankfully, the unlimited storage fest is over and so will be the unlimited troontunes production and ensuing quant uploads
some select users like barf-oski were given free reign but if troontuners can't upload more models quanters won't have more cope quant to produce
Anonymous No.107189164 [Report] >>107189181 >>107189186
>>107189114
>TheBLoke
Far from TheBroke as he is to this day getting $398 monthly from Patreon for lord knows what
Anonymous No.107189178 [Report]
>>107186458
>reading compression
that's a new one
Anonymous No.107189181 [Report] >>107189269
>>107189164
>$4.8k passive annual income, pre-tax
Anonymous No.107189186 [Report] >>107189288
>>107189164
>getting $398 monthly from Patreon for lord knows what
patreon has a lot of retarded users who sub to a guy and then forget about it
it's a very common phenomenon, I've seen webnovel writers still getting a grand a month long after they stopped writing anything, and it's even worse in the case of things like rpgm and visual novel pron games with no updates
Anonymous No.107189229 [Report]
It's over
Anonymous No.107189248 [Report]
It's starting again
Anonymous No.107189256 [Report]
it never began and nothing ever happens
Anonymous No.107189269 [Report]
>>107189181
i live in switzerland and that's pocket change there lol.

though i had a colleague who worked remote for a US company for like 120K/y but lived in vietnam, dude must have been a king there.
Anonymous No.107189274 [Report]
It never stopped, in fact it's going too fast and in the wrong direction
Anonymous No.107189288 [Report]
>>107189186
The real cash is in keeping something updated when that something is so pedestrian it takes no effort to maintain. All the effort went into baiting retards into sucking you off.
Many "projects" earning 20K+ a month.
Anonymous No.107189327 [Report] >>107189394
what happened to this?
https://github.com/huggingface/open-r1
Anonymous No.107189394 [Report] >>107189518 >>107189638
>>107189327
Anonymous No.107189424 [Report]
at least HF built this
Anonymous No.107189434 [Report]
>>107189011
Well look at mister hotshot Prompt Engineer over here
Anonymous No.107189518 [Report] >>107189544 >>107189622 >>107192305
>>107189394
There are women in this hobby?
Anonymous No.107189544 [Report] >>107189617 >>107189622 >>107190878
>>107189518
pleasuring yourself with written word is indeed extremely female field
Anonymous No.107189617 [Report] >>107189652 >>107189699 >>107190878
>>107189544
so much this sister
this thread oozes estrogen
Anonymous No.107189622 [Report]
>>107189518
>>107189544
There are no men in the hobby. Only fujoshis pretending to be men.
Anonymous No.107189638 [Report] >>107189756
>>107189394
would
Anonymous No.107189652 [Report]
>>107189617
10/10, would write my own llm smut fanfics about.
Anonymous No.107189655 [Report] >>107189678
>>107188582
Not with vibevoice
Tons of live service video games have implemented AI voice acting and hardly anyone has noticed
It's only the jeets using 2 year old tts tech for YouTube spam that are obvious about it
Anonymous No.107189678 [Report]
>>107189655
I've seen some pretty convincing youtube spam that I only noticed as TTS because no human could possibly read 30 minutes of repetitive GPT slop without wanting to kill themselves
if you notice a 30 min video with slop writing that uploads often it has to be TTS even if it sounds human
Anonymous No.107189699 [Report] >>107189711
>>107189617
"Thing: Japan" reigns supreme.
WEGs are also notoriously ugly compared to hentai games, a phenomenon I find both fascinating and alarming.
Anonymous No.107189711 [Report]
>>107189699
Mahotsukai no Yome is so good.
Wizard's Blue is also great.
Anonymous No.107189756 [Report]
>>107189638
I concur
Anonymous No.107190002 [Report] >>107191389
>>107188117
seed-vc after finetuning on a particular voice, depends on your input tho

haven't touched TTS though in like 9 months, curious if anyone has thoughts on what's good nowadays
Anonymous No.107190781 [Report] >>107190853
open models?
Anonymous No.107190852 [Report] >>107190855 >>107190859 >>107190873 >>107191161
I'm too busy discussing the Steam hardware/software news that came out today. Might want to do the VR on my AI PC but not sure if that'll need SteamOS or something.
Anonymous No.107190853 [Report]
>>107190781
yes
Anonymous No.107190855 [Report] >>107190859 >>107190893
>>107190852
i'm kinda disapointed by the steam frame, if it had been 4k per eye oled, sure.
but this isn't an upgrade from my bigscreen, much better than the quest though;
Anonymous No.107190859 [Report] >>107190913
>>107190852
>>107190855
i just sold my valve index for $800 to then buy the steam frame when that comes out
Anonymous No.107190873 [Report]
>>107190852
SteamOS is just Arch with Steam. Should be able to do the VR on any Linux.
Anonymous No.107190878 [Report]
>>107189544
>>107189617
it is well known that literacy is gay
Anonymous No.107190893 [Report] >>107190923 >>107190928
>>107190855
Yeah I had that thought as well but then it's not the product for us (enthusiasts), but a Quest competitor and will be priced as such probably. But one thing that kind of got me was the weight. I had to do a double take since it was so surprising. It's 185 grams in the front. That's literally like 100 grams lighter than the Rift CV1's front module, even though it has the full SoC in it and stuff. And just 2x heavier than the Beyond 2. First impressions also said it was super comfy. So that kind of got me hyped again, for the overall package, just not for the thing I thought Deckard was going to be of course.
Anonymous No.107190913 [Report] >>107190925
>>107190859
they are different devices though, it'll be more comfortable to play on the steam deck for 3h than it'd be on a vr headset.
Anonymous No.107190923 [Report]
>>107190893
yea desu i'll probably buy it if it's < 600 bucks.
i kinda want to get rid of my quest too because it's just gathering dust.
Anonymous No.107190925 [Report] >>107190930
>>107190913
where did i bring up the steam deck?
Anonymous No.107190928 [Report] >>107190952
>>107190893
It's comfy and optimized for wireless streaming. I don’t mind the resolution, but I can’t go back to LCD, the gray mess where there should be pitch black ruins immersion for me
Anonymous No.107190930 [Report] >>107190933
>>107190925
i think it is about time i go to bed lmao.
Anonymous No.107190933 [Report] >>107190947
>>107190930
ok goodnight. i said i sold my index, not a steam deck. they are both vr headsets
Anonymous No.107190947 [Report]
>>107190933
yea i get it now, idk, when i read index the steam deck popped into my mind.

good night / day man !
Anonymous No.107190952 [Report] >>107190968
>>107190928
Yup. And I also enjoy the color accuracy of good OLED/QLED. It won't replace my monitor. But I could use some VR gaming in my life again. I had fun with it in the past and just didn't bother keeping up with the hobby. I'd probably get a Frame and then hope they do a Frame OLED so I'd sell the old one.
Anonymous No.107190968 [Report] >>107191009
>>107190952
since it's linux based anyway, we'll probably have ways to mod and upgrade the displays though.
Anonymous No.107190976 [Report]
>>107185557
so they give out licenses to generate loli porn? and a paycheck?
Anonymous No.107191009 [Report] >>107191075
>>107190968
You can't just replace panels because pancake lenses lose 90% of the light and oled isn't bright enough to compensate for that
Anonymous No.107191075 [Report] >>107191103
>>107191009
depends which oled technology we are talking about, more expensive panels can get bright enough to compete.

i think they were going for an affordable device not a premium one and that's why they made that choice.

there are many vr devices with pancake optics and oled panels.
Anonymous No.107191103 [Report] >>107191114
>>107191075
The only option is microOLED, which features an array of individually controllable OLEDs with RGB filters in front of them, topped by a tiny collimating microlens per pixel to focus light in one direction. These are expensive and barely bright enough for use with pancakes
Anonymous No.107191114 [Report] >>107191136 >>107191156
>>107191103
bigscreen beyond is oled.
shiftall meganex superlight 8k is oled.
you can look on vr compare, there are tons of oled vr headsets with pancake lenses.

yes it's not cheap, but i wouldn't mind paying 2k for a device with 4k oled panels.
Anonymous No.107191136 [Report]
>>107191114
Both use microOLEDs that function as described. Bigscreen uses 2.5K microOLED displays from BOE, MeganeX 3.5K/3.8K microOLED
>i wouldn't mind paying 2k
You are not their target audience
Anonymous No.107191156 [Report] >>107191165
>>107191114
the Beyond 1/2 pretty much only work because their custom fitted face pads prevent light leaking, so their low brightness microOLEDs still look okay thanks to the total darkness inside
Anonymous No.107191161 [Report] >>107191174 >>107191298
>>107190852
>steam quest 3.5
VR is possibly the only market more depressing than AI inference hardware.
Anonymous No.107191165 [Report] >>107191225 >>107191335
>>107191156
fun how you ignored the other device i mentioned.

vision pro and galaxy xr are also oled.

and again, on vr compare you'll find dozens of other devices with oled and pancakes.
Anonymous No.107191174 [Report]
>>107191161
> more depressing
at least they are actually getting hardware.

we beg for a few crumbs of extra vram
Anonymous No.107191225 [Report]
>>107191165
Apple uses microOLED with dual white OLED backlights to add extra brightness, resulting in lower yields and very high prices
>on vr compare you'll find dozens of other devices with oled and pancakes.
Unlike you, I’ve tried or owned nearly all of them
Anonymous No.107191274 [Report]
>why can't they just make an OLED version later
Since it's micro OLED, it's completely different in terms of how the panels are made and the resulting panel size. They're much tinier screens, which means the optics have a harder job to do and some trade-offs have to be made. Notice that the MeganeX and Beyonds all have tiny lenses. They advertise an average FOV which seems good on paper, but the reality is that actually that FOV is only when you've gotten your eyes almost touching the lenses AND you are looking straight ahead. When your eyes rotate to look at things, your FOV on that side gets reduced. It makes the experience worse but it's hard to understand that it's an issue and no one talks about it. So this is why there will not easily be a Frame OLED, unless they give it an optical redesign. Also, this isn't to say stuff like Beyond is bad. Obviously micro OLED is great for being OLED, and the headsets are lighter. In the end it's just trade-offs.
Anonymous No.107191298 [Report]
>>107191161
I wouldn't say so? Just the fact that we have an alternative to Meta's walled garden and it runs any PC application you want puts their industry's progress way ahead of where we're at, where there is basically no alternative to Nvidia that's actually good (and not expensive like apple). Also RAM prices getting fucked.
Anonymous No.107191308 [Report]
Good morning sirs!
Anonymous No.107191332 [Report]
>>107189137
>free reign
it's "REIN" you dumb fuck mutt

you have terabytes of LLMs and yet you chose to stay retarded
Anonymous No.107191333 [Report]
is this /lmg/ or /vrg/?
Anonymous No.107191335 [Report] >>107192148
>>107191165
No one asked.
Anonymous No.107191389 [Report]
>>107190002
ESL retard
Anonymous No.107191472 [Report] >>107191480 >>107191507 >>107191530 >>107191532
>>107189069
>GGML format had no metadata
the terminal retardation of niggerganov has shown itself even back then

i remember thinking 'why would you fucking embed textual metadata into a multi-gigabyte binary file, surely he'll come to his senses and put them into an external file'

and now you can redownload the full thing for a 1-byte template/token change

c++ programmers are garbage
Anonymous No.107191480 [Report]
>>107191472
to be clear i'm talking about dogshit GGUF here
Anonymous No.107191492 [Report]
>>107185810
what a fucking retarded take
TIME IS MONEY
when asking an LLM to do some coding for me (I mostly relegate it to unit tests/documentation, but sometimes I want to see its take on how to efficiently implement some functions) you want it to be FAST.
Is not wasting time a hard concept to grasp? fucking RETARD
Anonymous No.107191507 [Report] >>107191530
>>107191472
That's entirely on the users. The textual metadata could easily be patched and no one would have to download more than a simple text file, but it's just easier for everyone to reupload and redownload the entire model weights because people are dumb and lazy, internet speeds are fast, and HF apparently has bandwidth to spare.
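
For what it's worth, you don't even need llama.cpp to pull the embedded template out and check what a quanter actually shipped. A rough sketch with the gguf Python package from the llama.cpp repo; the exact field access details may differ between package versions, so treat it as illustrative:

from gguf import GGUFReader

reader = GGUFReader("model.gguf")

# the chat template is stored as an ordinary string KV in the metadata
field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("no template embedded in this gguf")
else:
    # the string bytes live in the part indexed by field.data; decode them
    template = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(template)

Writing a changed template back does mean rewriting the file since the string length changes, but that's a purely local operation, no re-download needed.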
Anonymous No.107191530 [Report] >>107191611
>>107191472
i mean, you can still load templates externally. I think it's good that you can embed a default template. The issue is like >>107191507 says: users are retarded, quant makers and hf don't care.
I remember the unsloth guys fucked up a template recently and reuploaded the whole model again... I think twice? because distributing a template on its own is apparently HARD.
Anonymous No.107191532 [Report] >>107191611
>>107191472
braindead take, you can just patch the file if you care about bandwidth, it's way better that the file is self contained

btw holy shit i haven't run a 70b model in ages and i just tried again and llama.cpp went from 4tok/s to 12tok/s somehow, based ggregnov
Anonymous No.107191611 [Report] >>107191688
>>107191530
it's not only templates, llama 2 goofs had to be recreated and reuploaded due to that borked eps setting or whatever

that would have been a good clue to realize "we've fucked up goys" but no

>>107191532
yeah i can "just do" anything
it just takes TIME
and i don't like it when retards play with my time

wget shitstral-q8.xml <- now was that so hard
Anonymous No.107191688 [Report] >>107191825
>>107191611
wget shitstral-q8-metadata.patch && ./scripts/patch_model.py shitstral-q8.gguf shitstral-q8-metadata.patch <- now was that so hard
Anonymous No.107191759 [Report] >>107191817 >>107191818
sars what's the best model for 48gb vram 64gb ram? some low bpw moe? 70b miqu?
Anonymous No.107191815 [Report] >>107191833
we ended up with split data on goofs because of the multimodal mmproj anyway
we would have been better off having metadata done separately too
this is why imagen people are less retarded, they quickly dropped the idea of single file from the stable diffusion niggers and went all in diffusers
Anonymous No.107191817 [Report]
>>107191759
Nemo FP16
Anonymous No.107191818 [Report] >>107191875
>>107191759
gm sir. glm air best model for ramlet sir
Anonymous No.107191825 [Report] >>107191851
>>107191688
if that sounds better to you, then you can actually be a woman one day, or already are
Anonymous No.107191833 [Report] >>107191860
>>107191815
How about you STFU and just use a framework based on PyTorch instead?
You're not too poor for that, are you?
Anonymous No.107191851 [Report] >>107191904
>>107191825
It is objectively better in that it can be done today, right now, with existing models and doesn't require yet another ggml file format change, dipshit.
Anonymous No.107191860 [Report]
>>107191833
>How about you STFU
not before you lick my anus
Anonymous No.107191875 [Report]
>>107191818
thanks saar, is there anything specific i need to do to get llama.cpp to handle moving the experts in and out of vram/ram to make it fast or is it just works tm?
Anonymous No.107191895 [Report] >>107191899 >>107191906 >>107191933
Le ironic saarposting is just cringe.
Anonymous No.107191898 [Report]
>>107185810
Even basic agentic workflows improve everything, from coding tasks to erp, at the cost of generation time. Smaller models using multistage responses outperform direct chat with a larger model
Anonymous No.107191899 [Report]
>>107191895
saaaaaaar
Anonymous No.107191904 [Report] >>107191930
>>107191851
>objectively better
when someone says this it's almost always guaranteed to be false

however this time it's ok

the point was that niggerganov was confirmed to be a thoughtless retard back then, not what would be the less-braindead workaround right now

and so, llama.cpp doesn't have version numbers TO THIS DAY

no release cycle

code is just simply being shat into the repo daily, testing in prod

this is the guy "designing" goof
Anonymous No.107191906 [Report]
>>107191895
Good Mornings and many blessings of Vishnu for you saar!
Anonymous No.107191930 [Report]
>>107191904
>and so, llama.cpp doesn't have version numbers TO THIS DAY
yeah lcpp had really buggy multimodal across the board for a while (it seems like recent releases finally ironed out all the issues) because of some refactors they were doing on the kv cache / slots mechanisms. the lack of versioning, a proper roadmap, and branching for feature development really makes it all feel chaotic, with no way of ever knowing if you're using a release from a "blessed" time of development where nothing retarded is happening. there's just nothing telling you when the retardation starts and stops; it's like running windows insiders edition
Anonymous No.107191933 [Report] >>107191944
>>107191895
Anonymous No.107191944 [Report]
>>107191933
saar not like this saar this is ai fake made by people who hate saar
Anonymous No.107192130 [Report]
>>107192120
>>107192120
>>107192120
Anonymous No.107192148 [Report] >>107192329
>>107191335
i don't care, you made the claim that it's not possible which is patently false.
Anonymous No.107192305 [Report]
>>107189518
You gotta counterweight 99.999% of written smut (including gayshit) being women against this hobby requiring you to click an exe, so it's 50% women
Anonymous No.107192329 [Report]
>>107192148
retard