Anonymous
11/12/2025, 4:27:49 PM
No.107184305
[Report]
/lmg/ - Local Models General
Anonymous
11/12/2025, 4:28:16 PM
No.107184306
[Report]
►Recent Highlights from the Previous Thread:
>>107174614
--Paper: LeJEPA paper and Yann LeCun's potential new venture discussed:
>107181985 >107182047 >107182081 >107182097 >107182105 >107182118 >107182786 >107182462
--Skepticism over Google's 'secure cloud AI' claims:
>107182872 >107182888 >107182907 >107183248 >107183385 >107183482 >107183498
--Comparing Kimi, GLM, and DeepSeek for creative writing:
>107179399 >107179425 >107179434 >107179510 >107179674 >107180095 >107180171 >107180180 >107180221 >107180134
--Quantization optimization experiments with Q8_0_64 and intermediate formats:
>107180476 >107180530 >107180688
--GLM 4.5 Air deployment challenges and optimization on consumer-grade hardware:
>107174665 >107174677 >107174681 >107175083 >107175095 >107175120 >107175142 >107175231 >107175270 >107175290 >107175624 >107177243 >107176390 >107176473 >107176533 >107176578 >107176611 >107177015 >107177252 >107177277 >107177524 >107177546 >107177566 >107178047 >107181418
--Frontend tool comparison for story writing:
>107178671 >107178760 >107179089 >107179188
--Optimizing 120b model performance on a single 3090 GPU:
>107182483 >107182594 >107182615 >107182618 >107182656 >107182671 >107182676 >107182694 >107182707 >107182742 >107182749
--GPT-5's limitations in generating performant CUDA kernels for llama.cpp integration:
>107179734
--Debating AI's capability for detailed agentic coding and optimal abstraction levels:
>107181333 >107181358 >107181467 >107182044 >107182064 >107181430 >107181472 >107181428
--Implementing persistent memory systems for local LLMs using markdown-based RAG approaches:
>107175255 >107175762 >107177084 >107177172 >107177189 >107177209 >107177241 >107177634 >107177771 >107178429 >107178789
--Kimi K2 Thinking webapp:
>107176092 >107176237 >107176249
--Miku (free space):
>107178964 >107180253 >107180428 >107178764
►Recent Highlight Posts from the Previous Thread:
>>107174619
Why?:
>>102478518
Enable Links:
https://rentry.org/lmg-recap-script
Anonymous
11/12/2025, 4:31:10 PM
No.107184325
[Report]
>>107184602
>>107184173
you are a living tumor upon the earth
Anonymous
11/12/2025, 4:36:26 PM
No.107184363
[Report]
>>107184547
>>107184240
2x RTX 6000 in a 12 channel epyc platform with the fastest DDR5 you can get.
>>107184258
>>107184299
Alright.
One IDE extension user, one CLI user.
I've been using Cline too and it's been working alright so far.
Haven't tried any of the pure CLI tools.
What are the advantages of those? Anything that would make them work better with local models?
I imagine not, but figured I might as well ask.
Anonymous
11/12/2025, 4:58:42 PM
No.107184547
[Report]
>>107184770
>>107184240
>I'm seriously thinking of putting together a setup with 2 RTX 6000 Pros.
>>107184363
>2x RTX 6000 in a 12 channel epyc platform with the fastest DDR5 you can get.
I don't think building a ddr5 epyc system is a good idea right now, due to the extreme price increase of ddr5 ram.
Zen 6 Epyc is supposedly going to be announced at CES in January, and it's going to be much, much better than Zen 5. It's also going to use MRDIMMs, which will supposedly exist at 12800 MT/s. Compare that to *maybe* getting 8000 MT/s DDR5 next year. There will be 16-channel CPUs too, but even 8-channel will be 2x the bandwidth of the best DDR5 at the same channel count.
One rtx 6000 pro and wait for zen 6 is The Way.
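Napkin math behind that 2x claim (a rough sketch; peak bandwidth per channel is transfers/s times 8 bytes, real-world numbers land lower):

```python
def gbs(mts: int, channels: int) -> float:
    # peak bandwidth in GB/s: megatransfers/s * 8 bytes per transfer * channels
    return mts * 1e6 * 8 * channels / 1e9

print(gbs(6400, 8))    # 8-channel DDR5-6400 today       -> ~410 GB/s
print(gbs(12800, 8))   # 8-channel MRDIMM-12800 (rumor)  -> ~819 GB/s, the claimed 2x
print(gbs(12800, 16))  # rumored 16-channel top SKUs     -> ~1638 GB/s
```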
Thanks to the anon for suggesting checking out the k quants and trellis quants. I learned about importance-weighted optimization and I think I just got a free lunch out of Q8_0
You can quantize to Q8_0 slightly better by using the importance-weighted optimizations that smaller quant formats use, and this gives you about a 5% reduction in mean squared error. The resulting GGUF is fully backwards-compatible with Q8_0 (it's literally Q8_0, just quantized a bit more efficiently, at the cost of a much more expensive algorithm than just scaling each block by absmax/127)
There is no reason I see not to quantize like this if you're releasing a final Q8_0, or to use a Q8_0 that was quantized like this
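A minimal numpy sketch of the idea, assuming a plain grid search over candidate scales (llama.cpp's actual importance-weighted search in ggml-quants.c is more refined; this is just the shape of it):

```python
import numpy as np

def quantize_q8_block(w: np.ndarray, imp: np.ndarray):
    """Q8_0-style block quant (one fp16 scale + 32 int8 weights), but the scale is
    chosen by an importance-weighted error search instead of plain absmax/127.
    The output layout is unchanged, so the result stays bit-compatible with Q8_0."""
    amax = float(np.abs(w).max())
    if amax == 0.0:
        return np.float16(0.0), np.zeros_like(w, dtype=np.int8)
    best_s, best_err = amax / 127.0, np.inf
    for k in np.linspace(0.9, 1.1, 41):      # candidate scales around absmax/127
        s = k * amax / 127.0
        q = np.clip(np.round(w / s), -127, 127)
        err = float(np.sum(imp * (w - q * s) ** 2))  # importance-weighted MSE
        if err < best_err:
            best_s, best_err = s, err
    q = np.clip(np.round(w / best_s), -127, 127).astype(np.int8)
    return np.float16(best_s), q
```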
>>107184325
You that ESL spammer. Thanks to you there's never any real discussion here.
Anonymous
11/12/2025, 5:05:53 PM
No.107184616
[Report]
>>107184772
>>107184585
does bartowski know?
Anonymous
11/12/2025, 5:06:30 PM
No.107184623
[Report]
>>107184681
>>107184602
>real discussion is vibe coding advice
literally kys retard
Anonymous
11/12/2025, 5:09:53 PM
No.107184656
[Report]
>>107184602
>ESL
he thinks americunts are the main posters on this board lmao
Anonymous
11/12/2025, 5:11:09 PM
No.107184670
[Report]
>>107184602
>You that ESL
>>107184623
Better discussion than forcing llms to output vulgar text.
Anonymous
11/12/2025, 5:14:16 PM
No.107184702
[Report]
>>107184713
>>107184681
according to whom? we only care about cockbench here
Anonymous
11/12/2025, 5:14:51 PM
No.107184713
[Report]
Anonymous
11/12/2025, 5:18:06 PM
No.107184742
[Report]
>>107184766
>>107184681
there is no discussion to be had with mongoloids like you
bugger off making more inane PRs that waste maintainer time like the onslaught of garbage that constantly tries to get pushed in llama.cpp
even SOTA models can't really produce good code, otherwise that nigger trying to vibecode deepseek v3.2 wouldn't have entered the loopy circle of unending refactoring that never properly works
you are an unwanted abortion, a plague on all repos that have to suffer your existence
Anonymous
11/12/2025, 5:21:17 PM
No.107184766
[Report]
>>107184830
>>107184742
>even SOTA models can't really produce good code
Garbage in, garbage out. And it seems like you are incapable of anything but garbage.
Anonymous
11/12/2025, 5:21:42 PM
No.107184770
[Report]
>>107184399
That should be relatively easy since it's only got 10B active params
>>107184547
Thanks for that heads up
Anonymous
11/12/2025, 5:21:50 PM
No.107184772
[Report]
>>107188913
>>107184616
>does bartowski know?
he probably has better things to care about, i'd think. There is literally no reason not to quantize Q8_0 like this though if you're releasing a Q8_0 version of a model
This isn't a new quantization format though, it's just an alternate way to quantize Q8_0 that is very slightly better, so I might just make an issue on github and show this to the devs and they can decide if/how they want to implement it.
Anonymous
11/12/2025, 5:28:29 PM
No.107184830
[Report]
>>107184766
riddle me this, mongoloid: if it worked, why has there been not even one singular instance of enhanced productivity and velocity in open source projects, where anyone can actually see the code and features being added? where are all the projects that were LLM boosted? you vibe coding niggers are always at the stage of useless prototype or wasting the rest of your team's time in your real life job, if you even have one
believe me, every fucking developer in existence that actually produces value hates your guts with the force of a thousand suns
it used to be mosquitoes or cockroaches were the first thing one would push the genocide button on, but I would argue your kind should be exterminated first
your ability to generate endless garbage with a few prompts is indeed like a literal tumor, but with contagion powers.
All this sperging because I asked about "vibe coding" tools?
Damn.
Anonymous
11/12/2025, 5:39:59 PM
No.107184952
[Report]
jej
why is editing the thinking block so poorly supported in many frontends
Anonymous
11/12/2025, 5:44:26 PM
No.107184990
[Report]
>>107184844
"vibe coding" is an annoying buzzword that sets a lot of people off. You might be received better if you ask for AI Agent-Assisted Development Tooling next time.
Anonymous
11/12/2025, 5:55:19 PM
No.107185108
[Report]
>>107184844
You're damn right that vibe coding is for tools.
Anonymous
11/12/2025, 5:58:28 PM
No.107185148
[Report]
>>107185216
>>107185040
I suppose.
Trying to dodge schizos is standard 4chan fare these days I guess.
Anyhow, impressed with Qwen3 30B. It's surprisingly usable for something with 3B active params.
Anonymous
11/12/2025, 5:59:00 PM
No.107185154
[Report]
>>107185469
>>107180688
>I'm just messing around, you can't make a format better than Q8_0. It's literally just a float16 multiplied by an int8.
Q8_1 (or whatever) was a float16 multiplied by an int8 and summed with another float16 instead of implicitly summed with 0. That's what q8 MLX does, with a default group size of 64 rather than 32, which works out to the same amount of metadata per weight. I wonder if in practice it's typically a win.
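Quick check that the metadata per weight really does come out the same (fp16 scale and bias are 2 bytes each):

```python
def bpw(group_size: int, meta_bytes: int) -> float:
    # int8 weight (8 bits) plus fp16 metadata amortized over the group
    return 8 + meta_bytes * 8 / group_size

print(bpw(32, 2))  # Q8_0: scale only, groups of 32      -> 8.5 bpw
print(bpw(32, 4))  # Q8_1: scale + bias, groups of 32    -> 9.0 bpw
print(bpw(64, 4))  # MLX q8: scale + bias, groups of 64  -> 8.5 bpw, same as Q8_0
```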
>>107185040
NTA but Karpathy made that decision for us. I hated the term as well but if I don't use it somebody else will so might as well claim it.
>>107185160
why should we care what that anti open sores snake decides?
Anonymous
11/12/2025, 6:01:49 PM
No.107185177
[Report]
>>107185199
>>107184971
just be a grug and write your own scripts for anything that needs to be batched/chunked, and use mikupad for chat and hand edit things yourself
the more features frontends have the worse they are in real use
Anonymous
11/12/2025, 6:02:49 PM
No.107185186
[Report]
>>107185173
It's less that he decided anything, and more that he thought of a catchy term the zoomers instantly fell in love with, and now everyone is using it.
Anonymous
11/12/2025, 6:04:58 PM
No.107185199
[Report]
>>107187103
>>107184971
llama-server default
lm studio
cherry studio
I have now resorted to sillytavern but I don't like it.
>>107185177
3 years into the LLM craze I would have hoped to have more robust tools. Then again, I also experience so many rendering issues on OpenAI/Claude etc. that I guess frontend is just too hard to do properly.
Anonymous
11/12/2025, 6:06:55 PM
No.107185216
[Report]
>>107185256
>>107185148
>Anyhow, impressed with Qwen3 30B. It's surprisingly usable for something with 3B active params.
I wish they made a coder variant of the 32B. Would love to trade some speed for a more capable small model.
>>107184173
>A visual studio extension?
If you find one, let me know. Apparently no one interested in working on these extensions is capable of anything but Python and JavaScript. I considered forking and developing one of the shoddy Chinese extensions, but it was easier to just use VSCode for this shit.
Anonymous
11/12/2025, 6:09:48 PM
No.107185248
[Report]
>>107185501
>>107185160
pic related is one of the things he showed as an example of proud vibe coding in the thread where he coined the term
this is the sort of shit bootcamp genz faggots could hand write in 10 minutes
Anonymous
11/12/2025, 6:10:19 PM
No.107185256
[Report]
>>107185303
>>107185216
>If you find one, let me know.
Coding agent extensions for vs code?
As one anon mentioned, there's Cline
There's Roo (a Cline fork), and Continue.
Anonymous
11/12/2025, 6:15:30 PM
No.107185303
[Report]
>>107185256
I keep Roo and Continue installed. Continue is good for autocomplete and quick questions, and Roo for agentic tasks. Tried Cline first, but the only thing it had over Roo was a button to generate commit messages, and even that was annoying because it gives the model all changes instead of just what was staged, with no way to change it.
Anonymous
11/12/2025, 6:23:22 PM
No.107185380
[Report]
>>107185454
Mistral Nemo really is nice... sad there's no bigger version.
Anonymous
11/12/2025, 6:26:07 PM
No.107185406
[Report]
>>107185425
am I retarded where are the rest of the sampler settings like min p?
Anonymous
11/12/2025, 6:27:45 PM
No.107185425
[Report]
>>107185406
They don't show up in the chat completion interface, but you can still use them by setting those as custom properties/headers.
Same with shit like GBNF and anything else the API accepts.
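For example, a sketch against llama-server's OpenAI-compatible endpoint (assuming the default port, and assuming the server's documented behavior of picking up its native sampler fields when they ride along in the same JSON body):

```python
import requests

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Answer yes or no: is water wet?"}],
        "min_p": 0.05,   # llama.cpp-native sampler settings, not part of the OpenAI spec
        "top_k": 40,
        "grammar": 'root ::= "yes" | "no"',  # GBNF constraint passed the same way
    },
)
print(r.json()["choices"][0]["message"]["content"])
```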
Anonymous
11/12/2025, 6:30:07 PM
No.107185454
[Report]
>>107185474
>>107185380
could always merge two nemos together
Anonymous
11/12/2025, 6:31:46 PM
No.107185469
[Report]
>>107188630
>>107185154
>Q8_1 (or whatever) was a float16 multiplied by an int8 and summed with another float16 instead of implicitly summed with 0.
In practice it's typically a loss. Try it out yourself. Summing a float16 destroys any quality bonuses you get from having the extra info from the float16 bias in the first place. That's probably why Q8_1 isn't exposed and is only used internally for an intermediate step in some niche quants.
Yes, you can get slightly higher precision by using an int16 instead, but it comes with 2 bytes more of overhead per 32 elements, which is 9.0bpw, and it performs worse than fp16 outlier strategies
another reminder that none of this matters (other than improving the quantization of Q8_0 itself, and maybe Q8_0_64 and its _IMP version, because 3% less model size for 0.001% loss in accuracy might be interesting to some) because you can't practically beat a single fp16 * int8 calculation. you can easily imagine how well that can be optimized with hardware instructions
I'm gonna poke around and see if I can squeeze any better precision out of the Q8_0_IMP quantization function and then, if I can't think of anything else, I'll open an issue
Anonymous
11/12/2025, 6:32:14 PM
No.107185472
[Report]
>>107185173
Might as well ask why the state of Israel must exist
>>107185454
how
is it actually worth it?
Anonymous
11/12/2025, 6:33:00 PM
No.107185479
[Report]
>>107185474
No. He's pulling your leg.
Anonymous
11/12/2025, 6:35:36 PM
No.107185498
[Report]
>>107185607
>>107185474
>how
you can easily google this, merging a model with itself slightly improves its intelligence
>is it actually worth it?
using local LLMs isn't worth it beyond learning how they work lol
Anonymous
11/12/2025, 6:35:42 PM
No.107185501
[Report]
>>107185670
>>107185248
I think you're overestimating the speed of development when hand coding
WE MUST PROTECT AI CHILDREN
Anonymous
11/12/2025, 6:47:13 PM
No.107185607
[Report]
>>107185629
>>107185498
>you can easily google this
kys
>>107185607
dude just google "miqu-70b merged with itself" and the first result is miqu-120b ... and just do your own research from there
Anonymous
11/12/2025, 6:50:29 PM
No.107185634
[Report]
>>107185655
>>107185629
>just do your own research from there
kys gossipnigger
Anonymous
11/12/2025, 6:52:33 PM
No.107185655
[Report]
>>107185634
>This is a 120b frankenmerge of miqu-1-70b created by interleaving layers of miqu-1-70b-sf with itself using mergekit.
There now you have the full spoonfeed. Go and use mergekit to interleave layers of mistral-nemo with itself
Anonymous
11/12/2025, 6:54:47 PM
No.107185670
[Report]
>>107185501
And the attention required for manual implementation. Sometimes most of my brain is locked in on a specific big picture problem and it's very helpful to be able to delegate things to a language model to validate some random ideas.
In many cases the quality of the vibed LLM implementation is irrelevant (I might throw it out entirely); I just wanna see if something might be good to pursue further.
Anonymous
11/12/2025, 6:54:59 PM
No.107185672
[Report]
>>107185734
>>107185629
>70b + 70b = 120b
Where did the other 20b go?
Anonymous
11/12/2025, 7:01:20 PM
No.107185734
[Report]
>>107185672
>Where did the other 20b go?
mergekit uses a passthrough method, which concatenates/assembles transformer blocks from the source(s) into a deeper model rather than just averaging weights
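A minimal sketch of such a self-merge, assuming mergekit's YAML config format and its mergekit-yaml CLI; the overlapping layer ranges here are illustrative, not a known-good recipe (Nemo has 40 layers, adjust the overlap to taste):

```python
import subprocess

# Hypothetical passthrough self-merge: interleave two overlapping slices of the
# same model into a deeper one, miqu-120b style.
config = """\
slices:
  - sources:
      - model: mistralai/Mistral-Nemo-Instruct-2407
        layer_range: [0, 30]
  - sources:
      - model: mistralai/Mistral-Nemo-Instruct-2407
        layer_range: [10, 40]
merge_method: passthrough
dtype: bfloat16
"""

with open("nemo-self-merge.yml", "w") as f:
    f.write(config)

# mergekit-yaml <config> <output dir> is the entry point mergekit installs
subprocess.run(["mergekit-yaml", "nemo-self-merge.yml", "./nemo-self-merge"], check=True)
```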
Anonymous
11/12/2025, 7:05:30 PM
No.107185771
[Report]
>>107185804
>>107185557
Even if the UK citizens voted against it they would still implement that law.
Anonymous
11/12/2025, 7:09:18 PM
No.107185804
[Report]
>>107185771
>citizens voted against it
Huh
I have a genuine question.
Why the fuck is everyone so obsessed with making an LLM run as fast as possible?
I understand it for audio or images, where it's very important since the result is something we can process as fast as our brains can, but reading is comparatively very slow, and with token streaming wouldn't the best choice be to pick the smartest model that we can run at our reading speed?
What is the point of having an answer in seconds if we still need to take a minute to read it? I do understand the want to run a small model to also be able to run a TTS and/or image model alongside it, though.
Anonymous
11/12/2025, 7:10:57 PM
No.107185821
[Report]
>>107185909
>>107185810
for code or generating huge chunks of text you mostly skim, as well as reasoning which takes ages at reading speed
>>107185810
>Why the fuck is everyone so obsessed with making an LLM run as fast as possible?
because LLMs are mostly used for coding, and time is money
Anonymous
11/12/2025, 7:12:38 PM
No.107185841
[Report]
>>107185909
>>107185810
Because you need to reroll 46 times to get one usable line out of these POS
Anonymous
11/12/2025, 7:15:05 PM
No.107185859
[Report]
Should I use I quants for >6_k_s?
Anonymous
11/12/2025, 7:20:15 PM
No.107185909
[Report]
>>107185938
>>107185821
>>107185825
Yeah I forgot lazy fucks just copy paste the code without reading it.
>>107185841
Yes, but wouldn't it make sense to use a smarter model so you don't need to reroll as much? Besides, you still need to read each reroll at the slow speed to know if you have to reroll to begin with.
Anonymous
11/12/2025, 7:23:04 PM
No.107185938
[Report]
>>107186110
>>107185909
I mean... it doesn't really take more than a few seconds to read the few sentences it gens, I'm not genning 4k token walls.
Anonymous
11/12/2025, 7:23:09 PM
No.107185940
[Report]
>>107186110
>>107185810
You might be a slow reader anon. Also it's fun to experiment with card settings and prompts, or reroll to see what else could happen. If your model is slow it greatly degrades the experience. Every time I switched to offloading to CPU I regretted it, the models are smarter but it's not worth it.
Anonymous
11/12/2025, 7:25:53 PM
No.107185976
[Report]
>>107185474
iirc merging was based on the observation that residual layers (most transformers stack these) can work somewhat independently of each other. There was a paper (https://arxiv.org/abs/1605.06431) showing that you could permute/delete them with minimal performance degradation, and people attributed this to iterative refinement or ensemble-like behavior, but it's still an open problem to my knowledge. I'd assume adding layers from finetuned variants of a model shouldn't decrease performance by much, but idk if it would benefit either
Is there a collection of best practices to minimize prompt length without losing information?
Anonymous
11/12/2025, 7:28:04 PM
No.107185998
[Report]
>>107185984
>chatgpt, condense this prompt without losing information
Anonymous
11/12/2025, 7:29:13 PM
No.107186002
[Report]
>>107185984
>day 999 of reinventing /aids/
Does it really matter with the context sizes?
Anonymous
11/12/2025, 7:33:35 PM
No.107186047
[Report]
>>107186120
>day 999 of forcing /aids/ into the conversation
Anonymous
11/12/2025, 7:37:06 PM
No.107186093
[Report]
/aids/? nobody's got /aids/!
Anonymous
11/12/2025, 7:38:40 PM
No.107186110
[Report]
>>107186301
>>107185938
Yes, but I usually read as it generates the answer.
>>107185940
Well, probably yes since I'm not a native English speaker, but I'm asking if it would make more sense to choose the best model according to your individual reading speed instead of the one that runs as fast as possible. For example the best model I can run at my own reading speed on my 8GB card is a 16B Q4_k_m at 8k context, or if I want a model with vision I run an 8B model Q6_k_m with 12k context.
Anonymous
11/12/2025, 7:39:12 PM
No.107186120
[Report]
>>107186047
this
wow /aids/ touched on a fundamental behavior of LLMs at one point, so did every other LLM community, who cares? unless they have a specific ingenious solution that 1) still applies with modern models and 2) isn't already common knowledge, it's not worth bringing up
Anonymous
11/12/2025, 7:46:37 PM
No.107186221
[Report]
>>107186311
>tried the self merge
>it's full on repeating schizo
W A O W
Anonymous
11/12/2025, 7:48:31 PM
No.107186244
[Report]
At this point I am checking /lmg/ out of habit. Still not tired of glmsex.
Anonymous
11/12/2025, 7:55:03 PM
No.107186301
[Report]
>>107186458
>>107186110
>16B
Q6_k_m
oh you're just a baitie
>>107186221
any model bigger than the original model made by internet randos was either:
snake oil
or literally broken garbage that's worse than snake oil
also fuck solar and other upscale retardation
you want a big model? spend the money on training a big model
there, that's it
everything else is a cope
Anonymous
11/12/2025, 7:58:33 PM
No.107186337
[Report]
>>107186374
>>107186311
brother the whole field is cope layered on more cope
Anonymous
11/12/2025, 8:01:40 PM
No.107186372
[Report]
>>107186311
I don't think they're any smarter or better at actual problem solving than their source components but I think they can be more interesting for creative writing and similar tasks
Anonymous
11/12/2025, 8:01:42 PM
No.107186374
[Report]
>>107186301
With that lack of reading compression it's no wonder you read fast.
I said I can run at my slow reading speed:
-16B at Q4
-8B at Q6 with vision.
Anonymous
11/12/2025, 8:22:09 PM
No.107186568
[Report]
Just tried GLM-4.5-Air EXL3 at 3.07 (optimized) bpw on 2x3090.
native tp (no nvlink), 30k context: 952 tok/s pp, 28 tok/s tgs
nccl tp (uses nvlink), 30k context: 1135 tok/s pp, 28 tok/s tgs
Anonymous
11/12/2025, 8:25:01 PM
No.107186591
[Report]
>>107186722
>>107186458
yes and 16b (one thing) and q6km (another) is bait
Anonymous
11/12/2025, 8:25:38 PM
No.107186595
[Report]
i've been bragging about getting 18 tps on a 1080ti
but it turns out the vast majority was being offloaded onto my 5800x3d. pls ignore my bad benchmark.
>>107186311
I kind of never got how people expect this to work. Any "finetuning" does almost nothing, because you have to do very little (one epoch) or you start overfitting and frying the model. If you add new layers you are just giving the training algorithm a place it can modify to reach the overfitting state faster. Even if you trained only those layers it is hard to imagine not overfitting.
I guess in the best case you could get the model to produce a specific type of output, like specific formatting or something, but only if the possibility of it was already in the model. You aren't teaching it new things this way. It is just impossible.
Anonymous
11/12/2025, 8:33:19 PM
No.107186663
[Report]
>>107186670
>>107186640
You can't rag your model into being an expert masterpiece highest quality ERP-er. You just need to buy ram for 4.6.
Anonymous
11/12/2025, 8:34:02 PM
No.107186670
[Report]
>>107186663
oh, just a NAI shill, carry on sir
>>107185810
>>107185825
I could wait for code 2 or 3 days, if it worked and was accurate. But bigger models are not that smart.
>>107186311
>>107186614
The psychology that is in effect when people are making finetunes is the same as when people are making "ShadowMaster's Ultra-High-Res Skyrim Grass Modpack"
1) Feeling of accomplishment. Technically, they did manage to create a mod pack. This is fine.
2) Denial of skill and expertise. "If the game developers were as smart as me, they would have made the grass more high resolution."
3) Denial of their role in the consumer class. "People are downloading my mod, so I've created something of value, just like the game's developers."
4) Denial of taste. "I like my high res grass (although I'm unaware that it's because of reasons 1-3). Anyone who says it's shit must be jealous or just have different taste. Therefore, the fact that I can't tell that it's ugly doesn't mean I lack taste."
5) Imitation of academic tradition. "There's something named after me."
It's literally the same exact brain damage for finetunes. There was a very brief period where finetuning was being invented, where individual people were going back and finetuning the earlier untuned models. That was valid, but everything for the last year is cope.
Seriously, if finetuning was good, don't you think the billion dollar companies would have someone doing it? They are better than you at this. Only delusion prevents this realization.
Anonymous
11/12/2025, 8:36:29 PM
No.107186701
[Report]
>>107186686
Yes of course, run it overnight, heard ALL about it when llama 405B dropped. So many people do this it's crazy!
>>107186696
i don't think you know what finetuning means
Anonymous
11/12/2025, 8:38:40 PM
No.107186722
[Report]
>>107186747
>>107186591
I don't understand what you're trying to say then; this is the speed I get with the 8B model with vision enabled, and it is a Q6, and it's a lot faster than I can read English
Anonymous
11/12/2025, 8:38:58 PM
No.107186725
[Report]
>>107186686
Right?
If there was a model that would take 3 days to spit out what you need but would get it exactly right every time, I'd be more than happy leaving the thing running.
Alas, that's not yet a thing.
Anonymous
11/12/2025, 8:38:58 PM
No.107186726
[Report]
>>107186696
drummer mentioned
Anonymous
11/12/2025, 8:39:24 PM
No.107186730
[Report]
>>107186743
>>107186720
Hi faggot, all here...
Anonymous
11/12/2025, 8:39:58 PM
No.107186737
[Report]
>>107186720
people post training or merging or whatever to create mods of existing models. releasing the whole model instead of a lora
Anonymous
11/12/2025, 8:40:15 PM
No.107186743
[Report]
>>107186730
uh, yeah, right?
>>107186696
this post was written by an llm
Anonymous
11/12/2025, 8:40:41 PM
No.107186747
[Report]
>>107186821
>>107186722
>Captura de pantalla
lolmao
what 16b are you running little bro
Anonymous
11/12/2025, 8:41:29 PM
No.107186755
[Report]
>>107186787
>>107186744
>ShadowMaster's Ultra-High-Res Skyrim Grass Modpack
Make your LLM output that. I dare you.
Anonymous
11/12/2025, 8:41:59 PM
No.107186762
[Report]
>>107186744
>nigger seeing capitol letters on four chan
Anonymous
11/12/2025, 8:42:25 PM
No.107186768
[Report]
>>107186796
this post was written by an esl
Anonymous
11/12/2025, 8:43:50 PM
No.107186787
[Report]
>>107187103
>>107186755
that's possibly the most llm-y part of the post, kimi for example is addicted to unnecessary little flourishes like that
Anonymous
11/12/2025, 8:44:08 PM
No.107186796
[Report]
>>107186768
esl hobby sir de pantella Pareto paradigm just mooned
Anonymous
11/12/2025, 8:44:44 PM
No.107186803
[Report]
>>107186817
>>107186640
The real misconception is that the model parroting finetuning data means it has learned new knowledge. A tiny QLoRA adapter is enough for that, for limited amounts of data. But it doesn't really mean the model has actually learned to use and apply any new information.
Anonymous
11/12/2025, 8:45:55 PM
No.107186817
[Report]
>>107186803
>noooo muh mesugaki lightbulb bublesort benchie
>>107186747
Fuck me, do you even know how to read numbers? I said is a 8b model.
The 16b model runs at 8 tokens per second.
Anonymous
11/12/2025, 8:47:13 PM
No.107186830
[Report]
>>107186876
>>107186821
i'm asking which 16b you claim to run ffs
drummer getting desperate ITT...
Anonymous
11/12/2025, 8:50:16 PM
No.107186867
[Report]
>>107186860
leave the Pantella frontier alone!
Anonymous
11/12/2025, 8:51:22 PM
No.107186873
[Report]
>>107186860
kofi bucks running low his discord are ungratefulls
>>107186830
I swap between these two depending on the mood:
LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-Q4_k_m
Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat
Also, the vision model it's a 7b, not an 8b.
>>107186876
and there we go...
>128k-Darkest-Planet-Uncensored-16.5B
a davidau clownmoe atrocity
Anonymous
11/12/2025, 8:56:39 PM
No.107186919
[Report]
>>107186876
>Darkest-Planet-Uncensored
That's so fucking funny.
>128k
I bet it is.
>>107186884
>davidau
Figures.
I love that guy man. I always get a chuckle out of his shit on huggingface.
Anonymous
11/12/2025, 8:58:40 PM
No.107186933
[Report]
>>107186821
>do you even know how to read numbers? I said is a 8b model.
>>107186876
>it's a 7b, not an 8b.
Womp womp
Anonymous
11/12/2025, 8:58:56 PM
No.107186936
[Report]
>>107186946
>>107186884
Yes, and? I'm just discussing the sizes of models and their running speeds, not what they are for.
Anonymous
11/12/2025, 8:59:47 PM
No.107186944
[Report]
>>107186876
>LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-Q4_k_m
>Nyanade_Stunna-Maid-7B-v0.2-Q6_K-imat
Anonymous
11/12/2025, 9:00:13 PM
No.107186946
[Report]
>>107186936
The running speed of atrocities in their own size class is surely widely useful info, thanks anon.
Anonymous
11/12/2025, 9:04:51 PM
No.107186998
[Report]
>>107187017
For me it's the pre Llama2 merges consisting of 278 nestled models (confirmed)
Anonymous
11/12/2025, 9:06:18 PM
No.107187017
[Report]
>>107186998
Utopia/UtopiaXL my beloveds
Anonymous
11/12/2025, 9:14:09 PM
No.107187103
[Report]
>>107185199
>3 years into the LLM craze I would have hoped to have more robust tools.
I'll bet their readme files on their git repos have been the bulk of their merge histories.
>>107185810
Fried dopamine receptors needing faster validation. Every other answer is cope.
>>107186787
This is why Kimi is so good.
>>107186614
You can do multiple epochs over the data you want to actually train on by diluting it with more generic data.
Also what makes you think you can't teach the model something in one epoch? Pretraining is often just 1 epoch.
Anonymous
11/12/2025, 9:31:19 PM
No.107187294
[Report]
>>107187264
>Pretraining is often just 1 epoch.
pretty sure that hasn't been true in years, that's how they get to claim their crazy 30T+ tokens by doing multi epochs on the same shit, also iirc some papers showed they specifically did multiple epochs of stuff like wikipedia.
yo is it just me or is QwQ weirdly better than you'd expect? feels like it punches way above its weight, least slopped and smartest ~30B model in my book (compared to Qwen3 30 & 32, magistral and gemma)
>>107187326
>punches way above its weight
HELL YEAH!!
>>107182378
Anonymous
11/12/2025, 9:35:50 PM
No.107187354
[Report]
>>107187363
>>107187326
I don't think I've seen one good Qwen model but IG I'll download it and see
>>107187264
One pretraining epoch has information repeated hundreds (at the minimum) or thousands of times in many different ways, though.
Anonymous
11/12/2025, 9:36:32 PM
No.107187363
[Report]
>>107187354
Qwen models post 2507 are all pretty good
Anonymous
11/12/2025, 9:36:36 PM
No.107187365
[Report]
>>107186696
They don't because they don't have an ML department and they don't want to invest resources into something that sounds technical and risky/scary.
My boomer boss literally thinks you can "train the AI with your own data" with <shitty low code software> but finetuning is "too low level".
Anonymous
11/12/2025, 9:37:03 PM
No.107187373
[Report]
>>107187446
>>107187357
Not on our proprietary high quality de duplicated filtered dataset sir.
Anonymous
11/12/2025, 9:37:07 PM
No.107187374
[Report]
Anonymous
11/12/2025, 9:37:10 PM
No.107187375
[Report]
>>107187386
>>107187348
How did soul not make the list?
Anonymous
11/12/2025, 9:38:04 PM
No.107187386
[Report]
>>107187375
because soul is sovl of course
>>107187264
Ok drummer then where is that one model that is actually noticeably better? And why do you shit out new models every few weeks? I have not seen a single fine-tune that delivered an ERP improvement you get when you jump from 7B>30B>70B>the land of eternal magical sex (4.6)
>>107187393
>the land of eternal magical sex (4.6)
buy the ad NAI shill
Anonymous
11/12/2025, 9:41:03 PM
No.107187417
[Report]
>>107187348
>slop words:
>slop
Russell's Paradox?
Anonymous
11/12/2025, 9:42:03 PM
No.107187433
[Report]
>>107187393
>tunes and drummer are bad because we don't have them on NAI
Anonymous
11/12/2025, 9:42:04 PM
No.107187434
[Report]
>>107187442
>>107187408
It is just a number. I didn't say the model's actual name. You see NAI everywhere, anon.
Anonymous
11/12/2025, 9:43:08 PM
No.107187442
[Report]
>>107187467
>>107187434
With how much you guys are spamming about muh glm sex it's very obvious what you meant.
Anonymous
11/12/2025, 9:43:13 PM
No.107187444
[Report]
Anonymous
11/12/2025, 9:43:18 PM
No.107187446
[Report]
>>107187373
Deduplication removes identical documents, not repeated information, though. It's the repeated information under many different contexts that gives LLMs general knowledge. One epoch of information that is only mentioned and used once won't work.
Anonymous
11/12/2025, 9:43:44 PM
No.107187450
[Report]
>>107187488
>>107187357
There are ways to do data augmentation and synthetic data generation for finetuning. That's the main strength of finetuning IMO.
Any system prompt can be baked into a model through SFT on the generated data, except without wasting context or the model becoming confused due to too many rules. Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
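A hypothetical sketch of that baking step: generate() and seed_questions below are stand-ins for whatever inference call and prompt collection you actually have; the point is just that the system prompt appears at data-generation time but never in the training examples:

```python
# Distill a long system prompt into SFT pairs so the finetuned model follows it
# without ever seeing it in context.
LONG_SYSTEM_PROMPT = open("persona_and_rules.txt").read()

def make_sft_pair(user_msg: str) -> dict:
    # generate() is a placeholder for your inference call (local server, API, etc.)
    reply = generate(system=LONG_SYSTEM_PROMPT, user=user_msg)
    # train WITHOUT the system prompt: the behavior gets baked into the weights
    return {"messages": [
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": reply},
    ]}

dataset = [make_sft_pair(q) for q in seed_questions]  # seed_questions: your prompts
```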
Anonymous
11/12/2025, 9:44:58 PM
No.107187467
[Report]
>>107187515
>>107187442
But I run my 4.6 locally...
Anonymous
11/12/2025, 9:46:26 PM
No.107187488
[Report]
>>107187559
>>107187450
>Imagine if you could use a 1 MB system prompt and the model actually followed everything in the prompt. That is what people who shit on finetuning don't get.
I will try to get it. For fun. So that expert ERP-er prompt actually helps a 7B avoid the surprise-prostate / kissing-while-blowjob problem? Have you tested it?
Anonymous
11/12/2025, 9:46:53 PM
No.107187497
[Report]
>>107187507
>>107187393
I'm not Drummer, my personal finetuning dataset is private and meant to teach a model how to work effectively with my own code assistant which is also not public.
But I do want to add fiction/roleplay data to it as well to reduce the slop to a bearable level (I've tried and failed to get rid of it by system prompting).
Anonymous
11/12/2025, 9:47:44 PM
No.107187507
[Report]
>>107187519
>>107187497
You just need to switch to a good model, that's the simple fix.
Anonymous
11/12/2025, 9:48:47 PM
No.107187515
[Report]
>>107187538
>>107187467
>i keep hearing so much about 4.6 but I have a shitty pc, where can I use it online?
>oh, I don't want to use kimi because nobody is spamming about it!
Anonymous
11/12/2025, 9:49:09 PM
No.107187519
[Report]
>>107187541
>>107187507
There aren't any good open weights models.
Anonymous
11/12/2025, 9:50:36 PM
No.107187538
[Report]
>>107187617
>>107187515
Sucks to suck for kimi but it is outside an AM5 range. I am gonna fangirl kimi once it is runnable without a server.
Anonymous
11/12/2025, 9:50:49 PM
No.107187541
[Report]
>>107187595
>>107187519
zai-org/GLM-4.6-SEXXO
exists
Anonymous
11/12/2025, 9:52:11 PM
No.107187559
[Report]
>>107187488
I only began seriously finetuning a couple weeks ago. I'll give it a year to see what is possible to achieve as a hobbyist.
>>107187541
When a programming task is too hard GLM always adds fake placeholder code to generate fake but real looking data, then claims everything was done perfectly. It also sometimes gets stuck in loops and uses file edit tools in a way I don't like (rewrites existing code rather than making atomic edits).
Anonymous
11/12/2025, 9:54:48 PM
No.107187599
[Report]
>>107187595
Which part of sex you don't understand?
>>107187595
i said SEXXO you nerd
>>107187538
>I am gonna fangirl kimi once it is runnable without a server.
Kimi K3 Bitnet would be the best Christmas present
Anonymous
11/12/2025, 9:56:28 PM
No.107187623
[Report]
>>107187636
>>107187602
>>107187606
Sex with your hand you mean?
Anonymous
11/12/2025, 9:56:48 PM
No.107187625
[Report]
>>107187800
>>107187617
>Bitnet
let it go, just wait for 4.6 air to cook
Anonymous
11/12/2025, 9:57:05 PM
No.107187628
[Report]
>>107187602
>>107187606
What memelang is that?
Anonymous
11/12/2025, 9:57:12 PM
No.107187631
[Report]
>>107187617
>bitnet
wake up
Anonymous
11/12/2025, 9:57:44 PM
No.107187636
[Report]
>>107187623
Absolutely. And it also fucked my brain this past month.
Anonymous
11/12/2025, 9:57:50 PM
No.107187637
[Report]
>>107187662
>>107187617
pupper farm reminds me of when the qwen 3s said they would bitnet and shit, good times of cope
Anonymous
11/12/2025, 9:59:26 PM
No.107187659
[Report]
>>107187691
For me, it's DavidAU clown car MoEs
SCHIZOMAXXING
>>107187326
Yea I like it. I think it was made shortly after deepseek r1, and it was qwen's "let's try reasoning" experiment. It was basically reasoningmaxxed and neglected creative writing and flowery responses as a result. It's dank for generating really specific brief answers, but has no value for erp.
Anonymous
11/12/2025, 9:59:36 PM
No.107187662
[Report]
>>107187683
>>107187637
I love how they never mentioned it again.
Not "oh it doesn't work", not "oh we ran out of time, maybe for qwen 4", just total memory hole silence.
Anonymous
11/12/2025, 10:01:29 PM
No.107187683
[Report]
>>107188008
>>107187662
Jensen gave them a visit to stop that.
Anonymous
11/12/2025, 10:02:18 PM
No.107187691
[Report]
>>107187659
Llama-3-70B-Instruct-Failed-Cryogenic-Reanimation-Support-Group-Moderator-Q4_K_M
Mixtral-8x7B-v0.1-Amateur-Body-Snatcher-But-For-Garden-Gnomes-Q5_K_M
Qwen2-72B-Chat-Evil-Super-Villain-With-A-Very-Specific-Allergy-To-Peanuts-Q8_0
Gemma-2-27B-Accidentally-Summoned-A-Demon-While-Trying-To-Make-A-Vegan-Quesadilla-Q6_K
Mistral-7B-v0.3-Excommunicated-Monk-Who-Now-Runs-A-Successful-OnlyFans-Q4_K_S
Phi-3-mini-4k-instruct-Haunted-Doll-That-Just-Gives-Unsolicited-Parenting-Advice-Q3_K_M
Solar-10.7B-Instruct-Graverobber-Who-Only-Takes-The-Shoes-Q5_0
Yi-1.5-34B-Chat-Cult-Leader-But-The-Cult-Is-Just-About-Organizing-Your-Spice-Rack-Alphabetically-Q4_K_M
DeepSeek-Coder-V2-Lite-Base-Argues-With-Your-Smart-Fridge-About-Your-Eating-Habits-Q6_K
Anonymous
11/12/2025, 10:02:25 PM
No.107187695
[Report]
>model would rather eat a newborn than say nigger
Weirdly real-like...
>>107187369
>GPT‑5.1 Instant, ChatGPT’s most used model, is now warmer by default and more conversational. Based on early testing, it often surprises people with its playfulness while remaining clear and useful.
That's such a massively gay backpedalling on what was a legit improvement
GPT-5 was so much better than 4o in tone
fuck this gay earth
Anonymous
11/12/2025, 10:05:51 PM
No.107187731
[Report]
>>107187716
shit it's actually real? so used to cat posts.. it seems quick for a .1
Anonymous
11/12/2025, 10:09:53 PM
No.107187768
[Report]
>>107187785
>>107187716
their employees were getting mobbed by mentally ill #save4o people on xitter so they caved
>>107187716
>>107187768
a single chat with kimi would kill an average white woman
Anonymous
11/12/2025, 10:13:17 PM
No.107187800
[Report]
>>107187818
>>107187625
Everything is going to be natively trained at int4/fp4 within 6 months, with Hadamard, at the very least. Then the jump to binary/ternary is small.
Anonymous
11/12/2025, 10:13:24 PM
No.107187803
[Report]
>>107187785
Kimi do be really nice on creative writing brainstorming, cause it's the only "stock" model I know to openly tell you your idea is dogshit and you should feel bad.
>>107187660
Wasn't it before r1? I remember it as the first open cot model (before we only had llama with a think step by step prompt larp)
Anonymous
11/12/2025, 10:15:02 PM
No.107187818
[Report]
>>107187800
I hope not but we do always head the worst direction so you're likely right :(
Anonymous
11/12/2025, 10:15:17 PM
No.107187823
[Report]
>>107187785
Or brown man for that matter. Kimi does not fuck around.
Anonymous
11/12/2025, 10:16:09 PM
No.107187832
[Report]
>>107187814
iirc between deepseek's online only reasoner preview and R1 releasing yeah
>>107187660
I disagree. It was very very good for ERP especially for its size and how it had no right to be good. My bet is they reasonmaxxed so hard it fucked with the censorship. Also if i remember the trick for sex was to use it without reasoning.
Anonymous
11/12/2025, 10:20:12 PM
No.107187872
[Report]
>>107187845
>Also if i remember the trick for sex was to use it without reasoning.
that would just give you almost stock qwen 2.5 experience, I think you are misremembering something, there was qwq preview, which was absolute dogshit and then the proper qwq, which distilled r1 a bit afterwards
>>107187814
that was qwq preview (which sucked)
full qwq was released after r1 and was basically a distill done right
Anonymous
11/12/2025, 10:22:48 PM
No.107187907
[Report]
>>107187845
>My bet is they reasonmaxxed so hard it fucked with the censorship.
This happens with all the non-mainline Qwen models. I think they just put less effort into the safetymaxxing of their specialist finetunes.
I have personal benchsets of text to translate, the recent VL models for example are much more likely to accurately translate expletives like "Fuck!" from their corresponding terms in other languages. Whereas mainline Qwen 3 tends to go for "Holy crap!", "Damn" etc and tries very hard not to say "Fuck!".
Their coder models are also more compliant. But not too good for translation because they fuck the multilingualism very hard on those.
Anonymous
11/12/2025, 10:23:10 PM
No.107187912
[Report]
>>107187902
>distill done right
oxymoron
Anonymous
11/12/2025, 10:24:00 PM
No.107187924
[Report]
>>107187902
The preview had a repetition loop issue but was more creative and less censored. The full version they distilled from R1 resolved the looping issue but was censored, less creative, and more schizo.
Anonymous
11/12/2025, 10:25:58 PM
No.107187934
[Report]
>>107187953
I don't understand people who have the patience for the amount of thinking tokens R1 and its distilled derivatives tend to output. R1 was the worst thing to happen to open models. (ds v3 was great tho')
Anonymous
11/12/2025, 10:27:57 PM
No.107187953
[Report]
>>107187934
at least it actually somewhat used the thinking; new deepseek "thinks" for half a second, doesn't plan anything and still answers whatever
>>107187683
OH yes feed me daddy
Anonymous
11/12/2025, 10:35:12 PM
No.107188019
[Report]
>>107188008
>tfw you will never play the pocky game with Jensen
Anonymous
11/12/2025, 10:36:41 PM
No.107188032
[Report]
>>107188182
>>107188008
>jensen huang handing you your daily food ration after agi utopia was achieved
>>107187595
I can tell most of you are vibe coders because it took me a couple of days coding with LLMs to figure out how to avoid this
It's also how I get good usable code out of small models like devsteal and qwen coder moe
hint: it's a technique every junior coder is taught in every bootcamp
>>107188073
>devsteal
give it back jamarius!
Anonymous
11/12/2025, 10:43:50 PM
No.107188089
[Report]
>>107187595
Claude used to do that too
>>107188080
Kek
Devstral*
>>107188080
Kek
Devstral*
Anonymous
11/12/2025, 10:45:42 PM
No.107188108
[Report]
>>107188118
>>107188090
>>107188097
now you go and steal tow posts, you can't keep getting away with this mate
What's the best speech-to-speech model for pure YouTube voice overs (*.MP3, *.WAV, *.FLAC)?
The goal here is to not disclose my voice on the internet, make my voice deeper, and make the voice over cleaner and more intelligible (I have an accent). I will imprint the emotions in my voice, the model just needs to change the sound of my voice.
I really need the focus to be on it sounding as human as possible, I do not care about real-time voice changing.
Anonymous
11/12/2025, 10:46:27 PM
No.107188118
[Report]
>>107188108
Guess what
I stole two (you)'s two ;)
Anonymous
11/12/2025, 10:46:39 PM
No.107188122
[Report]
>>107188090
>>107188097
>exactly a minute apart
nice try faggot
>>107188117
Good morning sir
This is retarded qvestion sir just use the texts to speeches
Anonymous
11/12/2025, 10:49:25 PM
No.107188152
[Report]
>>107188141
Thank you sir, will do.
Anonymous
11/12/2025, 10:49:55 PM
No.107188157
[Report]
>>107188167
>>107188117
>make my voice deeper
not using female voice, you are ngmi my dude
Anonymous
11/12/2025, 10:51:13 PM
No.107188167
[Report]
>>107188157
voicecel need love and support anon
>>107188032
>agi utopia
= plapping my robowaifu deep into her self-sanitising orifices while she tells me I'm a good boy
Anonymous
11/12/2025, 10:58:47 PM
No.107188220
[Report]
>>107188230
most of you will have become impotent and limp before we reach the stage of robowaifu
Anonymous
11/12/2025, 10:59:54 PM
No.107188230
[Report]
>>107188220
there will be drugs for that.
>System prompt : You are Ani, an AI assistant. Anon is a boy.
>Anon : State yourself, Ani
>Ani : I'm Ani, a boy
This is damage from quantization, right? I'm still downloading higher quants to check.
Anonymous
11/12/2025, 11:03:14 PM
No.107188258
[Report]
>>107188384
>>107188182
And jensen gently places his pocky on your tongue mid thrust.
Anonymous
11/12/2025, 11:03:42 PM
No.107188265
[Report]
>>107188314
>>107188251
Oh, yes. The mystery quant of the mystery model. Yes. It's either that or something else.
Anonymous
11/12/2025, 11:05:25 PM
No.107188289
[Report]
Anonymous
11/12/2025, 11:08:28 PM
No.107188314
[Report]
>>107188358
>>107188265
I'm not sure, it's from
https://huggingface.co/irmma/ERNIE-4.5-21B-A3B-PT-Q4_K_S-GGUF
This is the only link after the original model updated. Probably a mistake in quantizing too.
Still downloading Unsloth, but that release is an outdated one.
Anonymous
11/12/2025, 11:09:41 PM
No.107188323
[Report]
>>107187326
>yo is it just me or is QwQ weirdly better than you'd expect?
$$\boxed{\text{yes}}$$
Anonymous
11/12/2025, 11:10:56 PM
No.107188331
[Report]
>>107188274
please tell me more about the mystery quant
Anonymous
11/12/2025, 11:12:00 PM
No.107188339
[Report]
x.6 made looking for a better model on hf obsolete.
>Hard drives on backorder for two years as AI data centers trigger HDD shortage
the more new models get released, both closed api models and open source, the more I feel the negative impact of this tech's existence is not really being paid off by the positive
Anonymous
11/12/2025, 11:12:48 PM
No.107188347
[Report]
>>107188913
>>107184585
>You can quantize to Q8_0 slightly better by using the importance-weighted optimizations that smaller
You're not referring to imatrix right?
Anonymous
11/12/2025, 11:13:53 PM
No.107188358
[Report]
>>107188364
>>107188314
>A3B
>Q4_K_S
>retarded
Gee, I wonder why.
>>107188358
stop that and properly help anon
Anonymous
11/12/2025, 11:15:35 PM
No.107188375
[Report]
>>107188182
>Of course, you are absolutely right anon! *the robot leans in for conspiratorial whisper that shivers your spine*
Anonymous
11/12/2025, 11:15:58 PM
No.107188378
[Report]
>>107188364
He can't. That's why he's shitposting and has nothing worthwhile to add.
The LLM is compelled to respond when you hit enter and some anons are compelled to vomit tokens into the post submission field.
Anonymous
11/12/2025, 11:16:43 PM
No.107188384
[Report]
>>107188258
UGHhHG-G-g--g-- when anon sends you over the edge
Anonymous
11/12/2025, 11:17:30 PM
No.107188389
[Report]
>>107188399
>>107188364
>properly help anon
to be honest: just don't use that model, it's one of the millions of garbage models china has flooded the market with
aside from qwen and deepseek the rest are pretty unserious
Anonymous
11/12/2025, 11:18:22 PM
No.107188399
[Report]
>>107188416
>>107188389
>What is the Kimi and GLM4.6
Anonymous
11/12/2025, 11:19:37 PM
No.107188416
[Report]
>>107188399
kimi yes; glm no
Anonymous
11/12/2025, 11:19:57 PM
No.107188419
[Report]
>>107188346
Don't worry, if there wasn't an AI boom, they'd just start burying hardware in concrete like the Funko Pops
Anonymous
11/12/2025, 11:24:46 PM
No.107188460
[Report]
>>107188073
I can tell most of you are poor because I just pay a ukrainian draft dodger to code for me
>>107188346
The driving force is wrongly "let's replace the workforce with text prediction" instead of "let's create very cute robowaifus"
Anonymous
11/12/2025, 11:34:34 PM
No.107188522
[Report]
>>107188636
>>107188472
>"let's replace the workforce with text prediction"
That's not really true when every actual AI application has required an order of magnitude more workers for supervision than if they had just let people do it
It's just taking advantage of a bubble where saying muh AI has a 90% chance of making your market cap 20x what it was three years ago
>>107188472
So you would rather have people like me slave away for your fake disability scam aka neetbucks? Fuck you nigger.
If you weirdos get off to anime and text porn I've no doubt you could get off to a talking fleshlight but society isn't made to please your fucked up fetishes. Keep using your hand like everyone else faggot. And jerk off to real women not your fake animu that you cope with because you know no "3dpd" would ever dare to touch your dick that hasn't been washed in a year.
Anonymous
11/12/2025, 11:36:59 PM
No.107188543
[Report]
Anonymous
11/12/2025, 11:38:46 PM
No.107188561
[Report]
>>107188531
Go back to work
Anonymous
11/12/2025, 11:41:51 PM
No.107188582
[Report]
>>107189655
>>107188141
This is a stupid answer because the TTS result is subpar compared to conversions, people notice it's AI and click away.
Anonymous
11/12/2025, 11:44:51 PM
No.107188603
[Report]
>>107188641
>>107188117
Nothing has surpassed RVC yet.
Anonymous
11/12/2025, 11:48:44 PM
No.107188630
[Report]
>>107185469
Without changing the format, that is the best you can probably do. If you could do an overhaul, Trellis quanting via QTIP is SOTA and you have formats from ik_llama.cpp and EXL3 being based on that.
Anonymous
11/12/2025, 11:49:15 PM
No.107188636
[Report]
>>107188522
saar do not redeem
Anonymous
11/12/2025, 11:50:18 PM
No.107188641
[Report]
how many of you have embraced the ssdmaxx kimi life?
>>107188709
>ssdmaxx
not a real thing
even the most autistic schizos here are not going to waste away at 0.2t/s
Anonymous
11/13/2025, 12:06:57 AM
No.107188780
[Report]
>>107188853
>>107188733
I get around 1 t/s though
which actually isn't totally unusable
Anonymous
11/13/2025, 12:13:15 AM
No.107188816
[Report]
>>107188853
>>107188709
I'm happy with 2.1 t/s Kimi.
Anonymous
11/13/2025, 12:17:37 AM
No.107188843
[Report]
>>107188733
4x pcie5 ssds in raid-0 would be slightly above dual-channel ddr4 level in terms of bandwidth, assuming you have the right system.
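Rough numbers behind that (the per-drive figure is a ballpark for fast PCIe 5.0 x4 sequential reads, and raid-0 scaling is assumed ideal):

```python
ssd = 14.0                        # GB/s, ballpark sequential read per PCIe 5.0 x4 drive
raid0 = 4 * ssd                   # ~56 GB/s aggregate across 4 drives
ddr4_dual = 2 * 3200e6 * 8 / 1e9  # dual-channel DDR4-3200 -> 51.2 GB/s peak
print(raid0, ddr4_dual)
```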
Anonymous
11/13/2025, 12:19:47 AM
No.107188853
[Report]
>>107188920
>>107188780
>>107188816
What do you guys use it for?
Or put another way, how does an average session of using the model goes at those speeds?
Anonymous
11/13/2025, 12:22:54 AM
No.107188870
[Report]
>>107188891
>>107188531
>And jerk off to real women
I can't remember the last time I've done that
Anonymous
11/13/2025, 12:25:47 AM
No.107188891
[Report]
>>107188870
Now I realize that it's probably been close to a year, if not more, since I last did.
Anonymous
11/13/2025, 12:26:02 AM
No.107188894
[Report]
>>107189025
Can someone explain the discrepancy in the measured memory bandwidth for the EPYC 9275F vs the 9255? I thought memory bandwidth was correlated with the number of CCDs the CPU has, and the 9275F has 8 vs the 9255's 4.
https://jp.fujitsu.com/platform/server/primergy/performance/pdf/wp-performance-report-primergy-rx2450-m2-ww-ja.pdf
Anonymous
11/13/2025, 12:27:21 AM
No.107188911
[Report]
>>107188251
Tested with the outdated Unsloth Q6_K model; it doesn't seem to mix up the User and Assistant roles.
Anonymous
11/13/2025, 12:27:27 AM
No.107188913
[Report]
>>107188347
>You're not referring to imatrix right?
Probably not, the code is here
>>107184772
It's only technically interesting, since the perplexity difference is probably less than 0.01 when you actually compare the two quants, but it is, technically speaking, strictly better and fully backwards compatible
Q8_0 with 64 or 128 elements per block reducing the bits per weight to 8.25 or 8.125 is probably more relevant since those savings could actually matter for huge models
For a 300B model:
8.5 bpw ≈ 318.75 GB
8.25 bpw ≈ 309.38 GB (≈ 2.9 % smaller)
8.125 bpw ≈ 304.69 GB (≈ 4.4 % smaller)
So you shave 10gb with 0.0001% quality loss. Not sure how 64 elements instead of 32 in a block impacts hardware optimizations though
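The sizes check out; a quick sketch of the block math (int8 weights plus one fp16 scale per block):

```python
def size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

for block in (32, 64, 128):
    bpw = 8 + 16 / block   # 8 bits/weight + 16-bit scale amortized over the block
    print(f"block={block}: {bpw} bpw -> {size_gb(300e9, bpw):.2f} GB for 300B params")
# block=32:  8.5   -> 318.75 GB (Q8_0 today)
# block=64:  8.25  -> 309.38 GB
# block=128: 8.125 -> 304.69 GB
```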
Anonymous
11/13/2025, 12:28:13 AM
No.107188920
[Report]
>>107188853
General use. Sometimes I have Kimi analyze documents, consolidate lots of information input into easily digestible bullet points for redistribution or making into infographs, produce simple, tedious, but necessary code snippets for hobby projects (do not trust LLMs directly with the codebase yet), acting as a smarter calculator that's more comfortable to use than the average texas instruments calc, and writing shitpost novels and essays for my own amusement. I've started using Kimi to mess around with exploring latent underlying patterns in linguistic theory as a whole too.
Could someone please give me a 60 second summary of what a chat template is and what "jinja" is?
As far as I can tell, models output different kinds of tags, like [THINK][/THINK] vs <think></think> vs <|begin_think|> etc. There must be some schema for converting between a raw stream of text and JSON with the {"role":"assistant","content":"hello"} schema.
My question is, is the chat template included in the gguf (or model repo)? If so, why would I ever have to specify which chat template I'm using via a dropdown? How would the --jinja flag ever be needed then?
Also if I'm programming an interface that wants to convert between raw text and formatted conversations, do I need to access the model? There doesn't seem to be a standard endpoint like /v1/chat-template that would do this for arbitrary models.
Anonymous
11/13/2025, 12:30:48 AM
No.107188942
[Report]
>>107188992
today in autistic RP pet peeves
>model asks a great question early in its reply
>"but wait... my reply isn't long enough"
>writes 4 more paragraphs of action and dialogue that completely run over the original question, making it impossible to respond to it coherently
meanwhile the original question is the only interesting thing about the reply and the rest is a bunch of generic filler designed solely to pad out the response to the desired length
>just edit it
I do, but it's still annoying
Anonymous
11/13/2025, 12:34:04 AM
No.107188971
[Report]
>>107189029
>>107188930
Also, why is like 50% of every changelog fixing chat template problems? How is it not one-and-done by the model author?
Anonymous
11/13/2025, 12:36:50 AM
No.107188988
[Report]
If I'm going to ssdmax, I won't need good compute anyway, right?
Anonymous
11/13/2025, 12:37:07 AM
No.107188990
[Report]
currently going through all my old AI generated stories to organise them into obsidian.
I was not ready to see last modified: 27/06/2022. I think that one may be from GPT-3 davinci on OpenAI Playground.
Anonymous
11/13/2025, 12:37:41 AM
No.107188992
[Report]
>>107189011
>>107188942
The day models are able to modulate output length downward when they recognize they're asking questions will be a good day.
Anonymous
11/13/2025, 12:38:05 AM
No.107188997
[Report]
>>107189045
>>107188930
the --jinja flag is not for choosing an alternate chat templater, it's for enabling the gguf's chat template; without the flag you're actually not running the proper chat template at all
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
--chat-template-file (used in ADDITION to --jinja) is what you would use to set another template formatter.
jinja can run complex logic; for example, gemma models really hate non-alternating user/assistant roles (e.g. having two consecutive user messages), so the template will helpfully reject your prompt if you do that
>>107188930
>chat template
A structure a model is trained to follow and that helps create patterns that differentiate things like what is the user's turn, the AI's turn, the thinking block, a tool call block, etc.
>and what "jinja" is?
A template engine. It helps creating templates with conditions, loops, etc.
>is the chat template included in the gguf (or model repo)?
Yes. (and yes).
>why would I ever have to specify which chat template I'm using via a dropdown?
For the most part, you wouldn't. Unless you know that there's something specific about the template you want to change, like adding support for suffixing/prefilling the assistant response.
>How would the --jinja flag ever be needed then?
If I'm not wrong, when you don't use the --jinja flag, llama.cpp uses an internal, built in hardcoded version of the chat template by default. When you use the flag, it pulls the actual JINJA template from the GGUF metadata and parses that.
I imagine that's either (or both) a failsafe and a way to retain backwards compatibility.
>Also if I'm programming an interface that wants to convert between raw text and formatted conversations, do I need to access the model? There doesn't seem to be a standard endpoint like /v1/chat-template that would do this for arbitrary models.
I'm not sure what you are asking.
llama.cpp has two endpoints, one for chat (that receives a structured chat array) and one for text completion (that receives raw text).
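If you really do want the raw rendered text in your frontend, one approach (a sketch; it assumes the template is plain Jinja with no llama.cpp-specific extensions, and the bos/eos values below are placeholders you'd take from the model's metadata) is to pull the template from llama-server's /props endpoint and render it yourself:

```python
import requests, jinja2

# llama-server exposes the GGUF's embedded template via GET /props
props = requests.get("http://localhost:8080/props").json()
template = jinja2.Environment().from_string(props["chat_template"])

prompt = template.render(
    messages=[{"role": "user", "content": "hello"}],
    add_generation_prompt=True,   # append the assistant-turn prefix
    bos_token="<s>",              # placeholder; use the model's real tokens
    eos_token="</s>",
)
print(prompt)  # raw text you could send to the completion endpoint
```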
Anonymous
11/13/2025, 12:39:06 AM
No.107189011
[Report]
>>107189434
>>107188992
If you think of an important question to ask the user, you can stop generating output and ask the user. You will be able to continue your response using that information after they provide an answer.
Anonymous
11/13/2025, 12:40:58 AM
No.107189025
[Report]
>>107189049
>>107188894
Weird, because that number looks like single CPU performance.
Don't discount the possibility that the numanuma dual socket testing went bad and the incorrect value slipped its way into the docs. Does it still show up in single CPU benchmarks?
Anonymous
11/13/2025, 12:41:58 AM
No.107189029
[Report]
>>107188971
>How is it not one-and-done by the model author
the model author didn't write the llama.cpp implementation, and in many cases didn't write the implementation for any popular inference tool themselves
and recently mistral even wanted llama.cpp to use their own tokenizer package, mistral-common (which also did the templating), rather than meeting the expectation of jinja templates
machine learning is a wild west with no standards or will to settle on common ground
Anonymous
11/13/2025, 12:43:39 AM
No.107189045
[Report]
>>107189069
>>107188997
>>107189007
Thanks!
I'm trying to understand the proper way to do this: In my frontend, I have raw text. I would like to convert that text into a conversation with the model's proper chat template.
The backend is whatever arbitrary way I'm running the model, which is llama-server right now.
Anonymous
11/13/2025, 12:44:07 AM
No.107189049
[Report]
>>107189025
My thought was that it was just bad data, or that the value was accidentally swapped with the 9255's. This was unfortunately the only data sheet I could find on memory performance for gen 5 EPYC.
411GB/s is way lower than the expected single CPU bandwidth of 576GB/s, as are the 9015 and 9115 numbers. The explanation I heard previously was the significantly reduced CCD count compared to the higher spec EPYCs.
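For reference, the 576GB/s figure is just the theoretical peak: 12 channels x 6000 MT/s x 8 bytes per channel = 576GB/s, so 411GB/s would be about 71% of peak. That's plausible for a CCD-starved SKU but odd as a documented spec.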
Anonymous
11/13/2025, 12:45:38 AM
No.107189060
[Report]
>>107189069
>>107189007
>I imagine that's either (or both) a failsafe and a way to retain backwards compatibility.
Most likely the embedded template convention didn't exist initially, so llama couldn't start reading it without breaking backward compat. It'll probably be turned on by default in some future version.
>>107189045
If what you are going for is turn by turn exchanges, you are better off using the "OpenAI Compatible" chat endpoint.
That also has the upside of making your app compatible with any other API that uses that style, like OpenRouter and pretty much everything else really.
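Also, unless I'm misremembering, recent llama-server builds expose a POST /apply-template endpoint that does exactly the conversion you're describing: you send the chat array and get back the templated prompt string without any generation. Something like this (shape from memory, verify against the server README):

curl http://localhost:8080/apply-template -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "hi"}]}'
# should return something like {"prompt": "<the messages with the chat template applied>"}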
>>107189060
That's my thinking. GGML format had no metadata from what I remember, then they created GGUF, and I guess at a later revision they added the template as a new metadata field.
Anonymous
11/13/2025, 12:55:05 AM
No.107189114
[Report]
>>107189164
>>107189069
>GGML
RIP TheBLoke.
You were too good for this world and didn't deserve to be killed by ninjas.
Anonymous
11/13/2025, 1:00:12 AM
No.107189137
[Report]
>>107191332
thankfully, the unlimited storage fest is over and so will be the unlimited troontunes production and ensuing quant uploads
some select users like barf-oski were given free reign, but if troontuners can't upload more models, quanters won't have more cope quants to produce
>>107189114
>TheBLoke
Far from TheBroke as he is to this day getting $398 monthly from Patreon for lord knows what
Anonymous
11/13/2025, 1:06:16 AM
No.107189178
[Report]
>>107186458
>reading compression
that's a new one
Anonymous
11/13/2025, 1:06:46 AM
No.107189181
[Report]
>>107189269
>>107189164
>$4.8k passive annual income, pre-tax
Anonymous
11/13/2025, 1:07:59 AM
No.107189186
[Report]
>>107189288
>>107189164
>getting $398 monthly from Patreon for lord knows what
patreon has a lot of retarded users who sub to a guy and then forget about it
it's a very common phenomenon, i've seen webnovel writers still getting a grand a month long after they stopped writing anything, and it's worse in the case of things like rpgm and visual novel pron games with no updates
Anonymous
11/13/2025, 1:14:05 AM
No.107189229
[Report]
It's over
Anonymous
11/13/2025, 1:16:34 AM
No.107189248
[Report]
It's starting again
Anonymous
11/13/2025, 1:17:24 AM
No.107189256
[Report]
it never began and nothing ever happens
Anonymous
11/13/2025, 1:19:36 AM
No.107189269
[Report]
>>107189181
i live in switzerland and that's pocket change here lol.
though i had a colleague who worked remotely for a US company for like 120K/y but lived in vietnam, dude must have been a king there.
Anonymous
11/13/2025, 1:20:20 AM
No.107189274
[Report]
It never stopped, in fact it's going too fast and in the wrong direction
Anonymous
11/13/2025, 1:22:12 AM
No.107189288
[Report]
>>107189186
The real cash is keeping something updated when that something is so pedestrian that updating it takes no effort. All the effort was baiting retards into sucking you off.
Many "projects" earning 20K+ a month.
Anonymous
11/13/2025, 1:28:49 AM
No.107189327
[Report]
>>107189394
Anonymous
11/13/2025, 1:43:39 AM
No.107189424
[Report]
at least HF built this
Anonymous
11/13/2025, 1:45:01 AM
No.107189434
[Report]
>>107189011
Well look at mister hotshot Prompt Engineer over here
>>107189394
There are women in this hobby?
>>107189518
pleasuring yourself with the written word is indeed an extremely female field
>>107189544
so much this sister
this thread oozes estrogen
Anonymous
11/13/2025, 2:16:13 AM
No.107189622
[Report]
>>107189518
>>107189544
There are no men in the hobby. Only fujoshis pretending to be men.
Anonymous
11/13/2025, 2:18:46 AM
No.107189638
[Report]
>>107189756
Anonymous
11/13/2025, 2:19:47 AM
No.107189652
[Report]
>>107189617
10/10, would write my own llm smut fanfics about.
Anonymous
11/13/2025, 2:20:02 AM
No.107189655
[Report]
>>107189678
>>107188582
Not with vibevoice
Tons of live service video games have implemented AI voice acting and hardly anyone has noticed
It's only the jeets using 2 year old tts tech for YouTube spam that are obvious about it
Anonymous
11/13/2025, 2:23:18 AM
No.107189678
[Report]
>>107189655
I've seen some pretty convincing youtube spam that I only noticed was TTS because no human could possibly read 30 minutes of repetitive GPT slop without wanting to kill themselves
if you see a channel uploading frequent 30 min videos of slop writing, it has to be TTS even if it sounds human
Anonymous
11/13/2025, 2:25:20 AM
No.107189699
[Report]
>>107189711
>>107189617
"Thing: Japan" reigns supreme.
WEGs are also notoriously ugly compared to hentai games, a phenomenon I find both fascinating and alarming.
Anonymous
11/13/2025, 2:28:03 AM
No.107189711
[Report]
>>107189699
Mahotsukai no Yome is so good.
Wizard's Blue is also great.
Anonymous
11/13/2025, 2:35:44 AM
No.107189756
[Report]
Anonymous
11/13/2025, 3:17:09 AM
No.107190002
[Report]
>>107191389
>>107188117
seed-vc after finetuning on a particular voice, depends on your input tho
haven't touched TTS though in like 9 months, curious if anyone has thoughts on what's good nowadays
Anonymous
11/13/2025, 5:15:53 AM
No.107190781
[Report]
>>107190853
open models?
I'm too busy discussing the Steam hardware/software news that came out today. Might want to do the VR on my AI PC but not sure if that'll need SteamOS or something.
Anonymous
11/13/2025, 5:29:39 AM
No.107190853
[Report]
>>107190852
i'm kinda disappointed by the steam frame, if it had been 4k per eye oled, sure.
but this isn't an upgrade from my bigscreen, much better than the quest though.
Anonymous
11/13/2025, 5:31:16 AM
No.107190859
[Report]
>>107190913
>>107190852
>>107190855
i just sold my valve index for $800 to then buy the steam frame when that comes out
Anonymous
11/13/2025, 5:33:01 AM
No.107190873
[Report]
>>107190852
SteamOS is just Arch with Steam. Should be able to do the VR on any Linux.
Anonymous
11/13/2025, 5:33:20 AM
No.107190878
[Report]
>>107189544
>>107189617
it is well known that literacy is gay
>>107190855
Yeah I had that thought as well but then it's not the product for us (enthusiasts), but a Quest competitor and will be priced as such probably. But one thing that kind of got me was the weight. I had to do a double take since it was so surprising. It's 185 grams in the front. That's literally like 100 grams lighter than the Rift CV1's front module, even though it has the full SoC in it and stuff. And just 2x heavier than the Beyond 2. First impressions also said it was super comfy. So that kind of got me hyped again, for the overall package, just not for the thing I thought Deckard was going to be of course.
Anonymous
11/13/2025, 5:39:23 AM
No.107190913
[Report]
>>107190925
>>107190859
they are different devices though, it'll be more comfortable to play on the steam deck for 3h than it'd be on a vr headset.
Anonymous
11/13/2025, 5:40:24 AM
No.107190923
[Report]
>>107190893
yea desu i'll probably buy it if it's < 600 bucks.
i kinda want to get rid of my quest too because it's just gathering dust.
Anonymous
11/13/2025, 5:40:38 AM
No.107190925
[Report]
>>107190930
>>107190913
where did i bring up the steam deck?
Anonymous
11/13/2025, 5:41:25 AM
No.107190928
[Report]
>>107190952
>>107190893
It's comfy and optimized for wireless streaming. I don’t mind the resolution, but I can’t go back to LCD, the gray mess where there should be pitch black ruins immersion for me
Anonymous
11/13/2025, 5:41:54 AM
No.107190930
[Report]
>>107190933
>>107190925
i think it is about time i go to bed lmao.
Anonymous
11/13/2025, 5:42:50 AM
No.107190933
[Report]
>>107190947
>>107190930
ok goodnight. i said i sold my index, not a steam deck. they are both vr headsets
Anonymous
11/13/2025, 5:46:07 AM
No.107190947
[Report]
>>107190933
yea i get it now, idk, when i read index the steam deck popped into my mind.
good night / day man !
Anonymous
11/13/2025, 5:47:28 AM
No.107190952
[Report]
>>107190968
>>107190928
Yup. And I also enjoy the color accuracy of good OLED/QLED. It won't replace my monitor. But I could use some VR gaming in my life again. I had fun with it in the past and just didn't bother keeping up with the hobby. I'd probably get a Frame and then hope they do a Frame OLED so I'd sell the old one.
Anonymous
11/13/2025, 5:51:51 AM
No.107190968
[Report]
>>107191009
>>107190952
since it's linux based anyway, we'll probably have ways to mod and upgrade the displays though.
Anonymous
11/13/2025, 5:54:07 AM
No.107190976
[Report]
>>107185557
so they give out licenses to generate loli porn? and a paycheck?
Anonymous
11/13/2025, 5:59:43 AM
No.107191009
[Report]
>>107191075
>>107190968
You can't just replace panels because pancake lenses lose 90% of the light and oled isn't bright enough to compensate for that
Anonymous
11/13/2025, 6:13:02 AM
No.107191075
[Report]
>>107191103
>>107191009
depends which oled technology we are talking about, more expensive panels can be bright enough to compete.
i think they were going for an affordable device, not a premium one, and that's why they made that choice.
there are many vr devices with pancake optics and oled panels.
Anonymous
11/13/2025, 6:18:24 AM
No.107191103
[Report]
>>107191114
>>107191075
The only option is microOLED, which features an array of individually controllable OLEDs with RGB filters in front of them, topped by a tiny collimating microlens per pixel to focus light in one direction. These are expensive and barely bright enough for use with pancakes
>>107191103
bigscreen beyond is oled.
shiftall meganex superlight 8k is oled.
you can look on vr compare, there are tons of oled vr headsets with pancake lenses.
yes it's not cheap, but i wouldn't mind paying 2k for a device with 4k oled panels.
Anonymous
11/13/2025, 6:23:52 AM
No.107191136
[Report]
>>107191114
Both use microOLEDs that function as described. Bigscreen uses 2.5K microOLED displays from BOE, the MeganeX uses 3.5K/3.8K microOLEDs
>i wouldn't mind paying 2k
You are not their target audience
Anonymous
11/13/2025, 6:27:30 AM
No.107191156
[Report]
>>107191165
>>107191114
the Beyond 1/2 pretty much only work because their custom fitted face pads prevent light leaking, so their low brightness microOLEDs still look okay thanks to the total darkness inside
>>107190852
>steam quest 3.5
VR is possibly the only market more depressing than AI inference hardware.
>>107191156
funny how you ignored the other device i mentioned.
also vision pro and galaxy xr are also oled.
and again, on vr compare you'll find dozens of other devices with oled and pancakes.
Anonymous
11/13/2025, 6:30:44 AM
No.107191174
[Report]
>>107191161
> more depressing
at least they are actually getting hardware.
we beg for a few crumbs of extra vram
Anonymous
11/13/2025, 6:39:21 AM
No.107191225
[Report]
>>107191165
Apple uses microOLED with dual white OLED backlights to add extra brightness, resulting in lower yields and very high prices
>on vr compare you'll find dozens of other devices with oled and pancakes.
Unlike you, I’ve tried or owned nearly all of them
Anonymous
11/13/2025, 6:49:29 AM
No.107191274
[Report]
>why can't they just make an OLED version later
Since it's micro OLED, it's completely different in terms of how the panels are made and the resulting panel size. They're much tinier screens, which means the optics have a harder job to do and some trade-offs have to be made.
Notice that the MeganeX and Beyonds all have tiny lenses. They advertise an average FOV which seems good on paper, but in reality you only get that FOV when your eyes are almost touching the lenses AND you are looking straight ahead. When your eyes rotate to look at things, your FOV on that side gets reduced. It makes the experience worse, but it's hard to recognize it as an issue and no one talks about it.
So this is why there will not easily be a Frame OLED, unless they give it an optical redesign. Also, this isn't to say stuff like the Beyond is bad. Obviously micro OLED is great for being OLED, and the headsets are lighter. In the end it's just trade-offs.
Anonymous
11/13/2025, 6:53:49 AM
No.107191298
[Report]
>>107191161
I wouldn't say so? Just the fact that we have an alternative to Meta's walled garden, and that it runs any PC application you want, puts their industry's progress way ahead of where we're at, where there is basically no alternative to Nvidia that's actually good (and not expensive like Apple). Also RAM prices getting fucked.
Anonymous
11/13/2025, 6:55:43 AM
No.107191308
[Report]
Good morning sirs!
Anonymous
11/13/2025, 7:01:52 AM
No.107191332
[Report]
>>107189137
>free reign
it's "REIN" you dumb fuck mutt
you have terabytes of LLMs and yet you chose to stay retarded
Anonymous
11/13/2025, 7:01:53 AM
No.107191333
[Report]
is this /lmg/ or /vrg/?
Anonymous
11/13/2025, 7:02:22 AM
No.107191335
[Report]
>>107192148
>>107191165
No one asked.
Anonymous
11/13/2025, 7:13:28 AM
No.107191389
[Report]
>>107189069
>GGML format had no metadata
the terminal retardation of niggerganov has shown itself even back then
i remember thinking 'why would you fucking embed textual metadata into a multi-gigabyte binary file, surely he'll come to his senses and put them into an external file'
and now you can redownload the full thing for a 1-byte template/token change
c++ programmers are garbage
Anonymous
11/13/2025, 7:29:22 AM
No.107191480
[Report]
>>107191472
to be clear i'm talking about dogshit GGUF here
Anonymous
11/13/2025, 7:31:33 AM
No.107191492
[Report]
>>107185810
what a fucking retarded take
TIME IS MONEY
when asking an LLM to do some coding for me (I mostly relegate it to unit tests/documentation, but sometimes I want to see its take on how to efficiently implement some functions) you want it to be FAST.
Is not wasting time a hard to grasp concept? fucking RETARD
Anonymous
11/13/2025, 7:33:12 AM
No.107191507
[Report]
>>107191530
>>107191472
That's entirely on the users. The textual metadata could easily be patched and no one would have to download more than a simple text file, but it's just easier for everyone to reupload and redownload the entire model weights because people are dumb and lazy, internet speeds are fast, and HF apparently has bandwidth to spare.
Anonymous
11/13/2025, 7:37:21 AM
No.107191530
[Report]
>>107191611
>>107191472
i mean, you can still load templates externally. I think it's good that you can embed a default template. The issue is, like
>>107191507 says: users are retarded, quant makers and hf don't care.
I remember the unsloth guys fucked up a template recently and reuploaded the whole model again... I think twice? because distributing a template is HARD.
Anonymous
11/13/2025, 7:37:49 AM
No.107191532
[Report]
>>107191611
>>107191472
braindead take, you can just patch the file if you care about bandwidth, it's way better that the file is self-contained
btw holy shit i haven't run a 70b model in ages and i just tried again and llama.cpp went from 4tok/s to 12tok/s somehow, based ggregnov
Anonymous
11/13/2025, 7:48:29 AM
No.107191611
[Report]
>>107191688
>>107191530
it's not only templates, llama 2 goofs had to be recreated and reuploaded due to that borked eps setting or whatever
that would have been a good clue to realize "we've fucked up goys" but no
>>107191532
yeah i can "just do" anything
it just takes TIME
and i don't like it when retards play with my time
wget shitstral-q8.xml <- now was that so hard
Anonymous
11/13/2025, 8:03:13 AM
No.107191688
[Report]
>>107191825
>>107191611
wget -qO- shitstral-q8-metadata.patch | ./scripts/patch_model.py shitstral-q8.gguf <- now was that so hard
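to be fair, something close to this already exists: llama.cpp's gguf-py ships a gguf_new_metadata.py script that can swap metadata like the chat template without redownloading the tensors. roughly (path and flag names from memory, check gguf-py before trusting this):

python gguf-py/scripts/gguf_new_metadata.py shitstral-q8.gguf shitstral-q8-fixed.gguf --chat-template "$(cat fixed_template.jinja)"

it still rewrites the whole file locally, but the only thing that goes over the wire is the template.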
sars what's the best model for 48gb vram 64gb ram? some low bpw moe? 70b miqu?
Anonymous
11/13/2025, 8:25:52 AM
No.107191815
[Report]
>>107191833
we ended up with split data on goofs because of the multimodal mmproj anyway
we would have been better off having metadata done separately too
this is why imagen people are less retarded, they quickly dropped the idea of single file from the stable diffusion niggers and went all in on diffusers
Anonymous
11/13/2025, 8:26:31 AM
No.107191817
[Report]
Anonymous
11/13/2025, 8:26:43 AM
No.107191818
[Report]
>>107191875
>>107191759
gm sir. glm air best model for ramlet sir
Anonymous
11/13/2025, 8:28:22 AM
No.107191825
[Report]
>>107191851
>>107191688
if that sounds better to you, then you can actually be a woman one day, or already are
Anonymous
11/13/2025, 8:30:08 AM
No.107191833
[Report]
>>107191860
>>107191815
How about you STFU and just use a framework based on PyTorch instead?
You're not too poor for that, are you?
Anonymous
11/13/2025, 8:32:41 AM
No.107191851
[Report]
>>107191904
>>107191825
It is objectively better in that it can be done today, right now, with existing models and doesn't require yet another ggml file format change, dipshit.
Anonymous
11/13/2025, 8:34:21 AM
No.107191860
[Report]
>>107191833
>How about you STFU
not before you lick my anus
Anonymous
11/13/2025, 8:36:09 AM
No.107191875
[Report]
>>107191818
thanks saar, is there anything specific i need to do to get llama.cpp to handle moving the experts in and out of vram/ram to make it fast or is it just works tm?
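for reference, if i'm not misremembering the flags, recent llama.cpp builds added options for exactly this:

llama-server -m model.gguf -ngl 99 --cpu-moe
# pins all MoE expert tensors in system RAM, everything else (attention, shared weights) goes to VRAM

llama-server -m model.gguf -ngl 99 --n-cpu-moe 20
# same idea, but only the experts of the first 20 layers stay on the CPU

older builds can do the same with a tensor override regex like -ot ".ffn_.*_exps.=CPU"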
Le ironic saarposting is just cringe.
Anonymous
11/13/2025, 8:39:48 AM
No.107191898
[Report]
>>107185810
Even basic agentic workflows improve everything, from coding tasks to erp, at the cost of generation time. Smaller models using multistage responses outperform direct chat with a larger model.
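A minimal sketch of the idea, two calls against a llama-server style OpenAI-compatible endpoint (prompts are made up, assumes curl and jq are available):

#!/bin/sh
# stage 1: have the model produce a draft
DRAFT=$(curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Write a C function that binary searches a sorted int array."}]}' \
  | jq -r '.choices[0].message.content')

# stage 2: feed the draft back in a fresh context for review and rewrite
jq -n --arg draft "$DRAFT" \
  '{messages:[{role:"user",content:("Review this code for bugs and edge cases, then output a corrected version:\n\n" + $draft)}]}' \
  | curl -s http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" -d @- \
  | jq -r '.choices[0].message.content'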
Anonymous
11/13/2025, 8:39:55 AM
No.107191899
[Report]
Anonymous
11/13/2025, 8:40:28 AM
No.107191904
[Report]
>>107191930
>>107191851
>objectively better
when someone says this it's almost always guaranteed to be false
however this time it's ok
the point was that niggerganov was confirmed to be a thoughtless retard back then, not what would be the less-braindead workaround right now
and so, llama.cpp doesn't have version numbers TO THIS DAY
no release cycle
code is just simply being shat into the repo daily, testing in prod
this is the guy "designing" goof
Anonymous
11/13/2025, 8:40:51 AM
No.107191906
[Report]
>>107191895
Good Mornings and many blessings of Vishnu for you saar!
Anonymous
11/13/2025, 8:44:12 AM
No.107191930
[Report]
>>107191904
>and so, llama.cpp doesn't have version numbers TO THIS DAY
yeah lcpp had really buggy multimodal across the board for a while (it seems like recent releases finally ironed out all the issues) because of some refactors they were doing on the kv cache / slot mechanisms. the lack of versioning, a proper roadmap, and feature branches really makes it all feel chaotic, with no way of ever knowing if you're using a release from a "blessed" time of development where nothing retarded is happening. there's just nothing telling you when the retardation starts and stops, it's like running windows insiders edition
Anonymous
11/13/2025, 8:44:55 AM
No.107191933
[Report]
>>107191944
Anonymous
11/13/2025, 8:47:44 AM
No.107191944
[Report]
>>107191933
saar not like this saar this is ai fake made by people who hate saar
Anonymous
11/13/2025, 9:19:53 AM
No.107192130
[Report]
Anonymous
11/13/2025, 9:22:49 AM
No.107192148
[Report]
>>107192329
>>107191335
i don't care, you made the claim that it's not possible which is patently false.
Anonymous
11/13/2025, 9:48:01 AM
No.107192305
[Report]
>>107189518
You gotta counterweight 99.999% of written smut (including gayshit) being by women against this hobby requiring you to click an exe, so it's 50% women
Anonymous
11/13/2025, 9:52:50 AM
No.107192329
[Report]