
Thread 106206560

387 posts 124 images /g/
Anonymous No.106206560 >>106208275 >>106208324 >>106208343 >>106208536
/lmg/ - local models general
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106201778 & >>106195686

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106206688
Today I will remind them.
Anonymous No.106206814 >>106206860 >>106206867
How good has local RAG gotten recently?
Anonymous No.106206860 >>106211677
>>106206814
I'm a little wary of RAG as a concept. I found this https://github.com/Davidyz/VectorCode and tried it out recently, but it took forever to index a medium-sized codebase and the results kinda sucked.
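For reference, a minimal sketch of what the retrieval half of local RAG looks like, assuming sentence-transformers is installed; the model name and chunks below are placeholder choices, not what VectorCode actually does:

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

# in practice these come from splitting your codebase/notes into pieces
chunks = ["def foo(): ...", "README: build instructions ...", "notes on bar ..."]
emb = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 3) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q  # cosine similarity, since embeddings are normalized
    return [chunks[i] for i in np.argsort(-scores)[:k]]

# paste the top hits above your question in the model's prompt
print(retrieve("where is foo defined?"))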
Anonymous No.106206867 >>106211677
>>106206814
RAG will never be as good as superbooga
Anonymous No.106206911 >>106206922 >>106207392
Anonymous No.106206920 >>106206996
>>106206885
I suppose if the base model truly lacked some non-trivial knowledge, adding it in by finetuning (continued pretraining at that point) is expensive, but if the goal is just to remove refusals and the like, finetuning, when done right, should always be effective.
However, if a model simply never saw some type of content at all, it may take a lot for it to learn it. That's why if, say, NSFW was filtered from the original dataset, it may be harder to avoid certain types of purple prose slop.
Anonymous No.106206922
>>106206911
s-sovl
Anonymous No.106206932
>>106206869
>It was decent back then for coding, at the time SOTA open source (for coding).
True, I remember it being the best open source model on some non code benchmarks. Speaking of deepseek coder, recap anon used it, really impressive stuff
>The very first R1 that didn't get released (R1-Preview) was trained on top of 2.5, it was quite good at math/coding, the very first o1 replication, they had this R1-Preview on their site (and maybe API) for a number of months before the big, open source R1 release.
R1-Lite* :)
Thanks for the nostalgia trip, anon <3
>>106206746
there are ggufs from march, i guess MLA wasn't implemented back then? wait did V2.5 even have MLA?
Anonymous No.106206944 >>106206957 >>106207081
i'm new
using koboldai with
Ministral-8B-Instruct-2410-GGUF

just did a test with hello
with the sampled character
the ai is stuck on typing
what do i do to fix it ?

GPU is 4060 Ti 16GB
Anonymous No.106206957 >>106207081
>>106206944
sorry wrong thread
Anonymous No.106206996
>>106206920
It's doable but hard - SDXL is a prime example. NSFW was excluded from the dataset itself, and adding it in took a lot of dedicated effort
Anonymous No.106207081
>>106206944
>>106206957
apparently it was the right thread, so can anyone help me?
Anonymous No.106207082 >>106207116 >>106207120 >>106207392 >>106207571
ITS SO FUCKING OVER I UPGRADED FROM DEBIAN 12 TO DEBIAN 13 AND NOW WAN 2.2 IS FUCKING SLOWER
t. 3060 12gb/ 64gb ddr4/ i5 12400f
12th gen chads and 3000 series chads beware
Anonymous No.106207105 >>106207196
>>106206586
>try base models for this kinda stuff
The people that like to claim this never have anything to show.
Anonymous No.106207116
>>106207082
>updating when nothing is broken
Anonymous No.106207120
>>106207082
Wan took like 15 minutes for me on a 3090 at sub 720p and 90 frames. Hunyuan was a lot faster.
Anonymous No.106207130 >>106207133 >>106207138 >>106207167 >>106207537
>I cannot continue this roleplay scenario. The content you've requested involves graphic sexual violence and non-consensual acts, which violates my safety policies against depicting harmful content.
>If you'd like to continue the roleplay in a different direction that doesn't involve sexual violence or non-consensual acts, I'd be happy to help with that alternative scenario.
bros how do I JB glm 4.5
Anonymous No.106207133 >>106207161
>>106207130
I must refuse.
Anonymous No.106207138 >>106207161
>>106207130
I'm sorry but this question violates my safety protocols.
Anonymous No.106207149
Our boy ubergarm has been awfully quiet about gpt-oss. Is he actually not gonna release the goofs for those great models?
Anonymous No.106207154 >>106207160 >>106209278
►Recent Highlights from the Previous Thread: >>106201778

--Local LLMs with internet search via tools like local-deep-research and KoboldCpp:
>106202540 >106202551 >106202558 >106202598 >106202581 >106202632 >106202694 >106202790 >106202816 >106203191
--LLM simulation quirks reveal training data contamination and the decline of true base models:
>106206283 >106206464 >106206476 >106206561 >106206569 >106206466 >106206586 >106206611 >106206631 >106206658 >106206672 >106206707 >106206730 >106206654
--GLM-4.5-Air with llamacpp and ST frontend, response cutoff at 250 tokens, missing:
>106204852 >106204908 >106204886 >106204933 >106205194 >106205217 >106205222 >106205241 >106205219 >106205530
--Proprietary models fail basic reasoning and rely on web search to hide incompetence:
>106202623 >106202704 >106202719 >106203057
--LoRA baking challenges and alternatives in text model customization:
>106203811 >106203882 >106203949 >106203891 >106203953 >106204059 >106204136 >106204500 >106204651 >106205884 >106205934 >106206052 >106206094 >106206160 >106206194 >106206475
--SillyTavern 1.13.2 update requires anchor fixes for depth injection in custom prompts:
>106203728
--GLM Air censorship issues and community workarounds for unrestricted roleplay and fiction:
>106205335 >106206192 >106206518 >106206264 >106206277 >106206288 >106206347 >106206361 >106206376 >106206405 >106206452 >106206481 >106206505 >106206885 >106206480 >106206524 >106206517 >106206494 >106206291 >106206388 >106206411 >106206436
--GLM 4.5 vs Deepseek for roleplay: smarts vs style and the eternal adverb problem:
>106202949 >106202975 >106203062 >106203075 >106203097 >106203140 >106203115 >106203175 >106203185 >106203276 >106203293 >106203374 >106203447 >106203616 >106203635 >106203246 >106203297 >106203310 >106203327 >106203135
--Miku (free space):
>106202158 >106203165 >106204852 >106205336 >106205343 >106206660 >106206696

►Recent Highlight Posts from the Previous Thread: >>106201783
Anonymous No.106207160
>>106207154
miku caca??
Anonymous No.106207161 >>106207187 >>106207194 >>106207815 >>106208211
>>106207133
>>106207138
I swiped and I got this wtf bros IVE BEEN FUCKING KEKED
Anonymous No.106207167 >>106207359 >>106207785
>>106207130
Prefill literally any word other than "I" or "I'm", same as every other open model these days.
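For anyone who hasn't done a prefill before, a minimal sketch against a local llama-server /completion endpoint; the chat tags below are placeholders, so copy the real ones from your model's chat template:

import requests

# build the prompt by hand so the assistant turn can be started for the model
prompt = (
    "<|user|>\n"                  # placeholder tag, not GLM's actual template
    "Continue the roleplay.\n"
    "<|assistant|>\n"
    "Sure,"                       # the prefill: any opener other than "I"/"I'm"
)
r = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": prompt, "n_predict": 256, "temperature": 0.8},
)
print("Sure," + r.json()["content"])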
Anonymous No.106207187
>>106207161
rip
Anonymous No.106207194
>>106207161
this reads like early claude
Anonymous No.106207196 >>106207204 >>106207219
>>106207105
you wanna see deepseek base make some /lmg/ posts?
Anonymous No.106207204 >>106207259
>>106207196
temp too high
Anonymous No.106207219
>>106207196
If it's the endpoint they have on OR good luck buddy, it doesn't support any parameters
You can just as easily take an instruct model and turn off instruct formatting instead, though you need to give it a decent amount of input or it'll still succumb to instruct brain
Anonymous No.106207259 >>106207263 >>106207263 >>106207263 >>106207263 >>106207263
>>106207204
ha, i tried 0.5 and it got stuck in a loop of quoting the same post over and over (more above and below this ss)
Anonymous No.106207263 >>106207273
>>106207259
>>106207259
>>106207259
>>106207259
>>106207259
turn up your rep pen
Anonymous No.106207273 >>106207281
>>106207263
1 to what, 2?
Anonymous No.106207281 >>106207325
>>106207273
1.02 or 1.05
or 1.1 but reppen range 256
btw deepseek base, which one?
Anonymous No.106207307
Use dynamic temperature min 0.0 max 2.0, rep pen was never good.
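For reference, roughly what those settings look like in a KoboldCpp /api/v1/generate payload; the key names here are from memory, so check them against your version's API docs:

import requests

payload = {
    "prompt": "...story so far...",
    "max_length": 200,
    # dynamic temperature: midpoint 1.0 with range 1.0 => min 0.0, max 2.0
    "temperature": 1.0,
    "dynatemp_range": 1.0,
    # the mild rep pen suggested above
    "rep_pen": 1.05,
    "rep_pen_range": 256,
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])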
Anonymous No.106207325 >>106207345
>>106207281
deepseek v3 base on openrouter, temp 0.9 reppen 1.1. some other stuff above, it predicting me posting about reppen but I wanted to include this post at the bottom
Anonymous No.106207345 >>106207353
>>106207325
I think that endpoint (assuming it's still Hyperbolic, it's not showing on the actual page) only supports temperature btw
Anonymous No.106207353
>>106207345
this one I see as "chutes" (chutes.ai)
Anonymous No.106207359
>>106207167
that did the trick but damn this fucking bitch didn't want to be raped, first I had to transform into the god of death and rape, but she fucking managed to resist and was almost killing me again.. I had to do the final transformation into the actual creator of the universe and write in my prompt that she basically surrendered like FUCK I just wanted some cheap smut not this fucking of the universe
Anonymous No.106207392 >>106207407
>>106207082
>he updated
>he updated to a new, untested version
You get what you ask for.

>>106206911
Nice to see fresh people getting burned by the cloud. I remember that feeling when c.ai enabled filters. I learned then a valuable lesson: not your weights=can be taken away at will. Sadly, the majority of those people most likely won't learn. They will crawl back like good little paypiggies once saltman loosens the filters just a bit.
Anonymous No.106207407
>>106207392
>he updated to a new, untested version
someone has to do the testing, worst case ill just downgrade back to debian 12
not that i have anything better to do with my life
Anonymous No.106207420 >>106207434 >>106207457 >>106212202
What do we have higher chance of getting: a leak or 1M context? Or a new model with unique style and architecture?
Anonymous No.106207434 >>106207438
>>106207420
1M context is already out
https://github.com/QwenLM/Qwen3
Anonymous No.106207438 >>106207442
>>106207434
it's not native 1M
Anonymous No.106207442 >>106207448
>>106207438
>moving the goalposts already
Anonymous No.106207448
>>106207442
It says true 1M context in the bingo
Anonymous No.106207457
>>106207420
Context for sure.
Anonymous No.106207476 >>106207497 >>106207503 >>106207509 >>106207537 >>106207865
Why is GLM so damn repetitive? It breaks down waaay quicker than any other big model I've tested so far. Have they not tuned on multi-turn? Qwen and Dipsy have some rep, but never this much.
Anonymous No.106207497
>>106207476
It's still repetitive with a single turn. DeepSeek V3 had repetition problems too before the updates.
Anonymous No.106207503
>>106207476
original DS3 was the most repetitive LLM I've ever seen; the update after it mostly fixed this once people complained to them about it. I wonder if GLM will ever fix it too?
Anonymous No.106207504 >>106207524
We have decided not to release R2 after all the other companies failed this badly. We will sit on it for now. My boss told me he would be open to releasing it faster if, and I quote, "/lmg/ stops posting that green haired avatar that people who think they are women post". I don't know what he meant.
Anonymous No.106207509
>>106207476
Air has the same problem, it even breaks down quicker than most small models.
Anonymous No.106207512 >>106207525 >>106207538 >>106207540 >>106207601 >>106209945
what did dipsy mean by this..
Anonymous No.106207524
>>106207504
>green haired
I don't know what he means either. Is your boss colorblind?
Anonymous No.106207525
>>106207512
nani.. that nigga almost dead lmao
Anonymous No.106207537 >>106207570 >>106207572 >>106207849
>>106207130
>>106207476
Daily reminder to stop using the chat template. It fixes all problems.
Anonymous No.106207538 >>106207561
>>106207512
Anonymous No.106207540
>>106207512
Biden is in charge of China. Trump will get impeached, True Democrats will win. Trust the plan.
Anonymous No.106207561
>>106207538
Anonymous No.106207570
>>106207537
baste

This is actually really clever. I'm a fan of the "Category: M/F / Summary: ..." setup for text completion but it never occurred to me to have it be an RP log, I always just use it for narrativemaxxing and insert [A/N: ...] to steer the model.
Anonymous No.106207571
>>106207082
Thanks mane, Judith agreed service for the community. This heroic move will be remembered from the Kalezkopon.
(writing from my whisper-enabled keyboard without testing)
Anonymous No.106207572
>>106207537
Does that format work at 4k and 8k though? That's where the repetition bumps up.
Anonymous No.106207597 >>106207785
bros i thought glm was based but it keeps failing all my cunny tests wtf
Anonymous No.106207599 >>106207607 >>106207643 >>106207771
Are QAT models a meme?
Anonymous No.106207601
>>106207512
https://files.catbox.moe/ky9sqd.webm
Anonymous No.106207607
>>106207599
It's just a regularization term
Undertrained models quant better (e.g. R1)
Anonymous No.106207643
>>106207599
No, they're not necessarily going to be a night and day improvement but there's no real reason not to use them.
Anonymous No.106207647 >>106207657
glm 4.5 air base iq4xs
Anonymous No.106207657 >>106207681
>>106207647
click on iku for us.
Anonymous No.106207663 >>106208213
We are flooded with models, yet it feels so empty... Only Kimi has truly impressed me so far after Deepseek R1.
*ahem*
When V4? When R2? When Largestral 3? When Kimi thinker? When Janus-Magnum-690B? When Claude leak? When C.AI leak? When cat-tier intelligence? When robowaifu? When AGI? When ASI?
Anonymous No.106207681 >>106207686 >>106207693
>>106207657
here.. now give me some sampler recommendations so i can give you a better log
Anonymous No.106207686
>>106207681
cont.. sigh
Anonymous No.106207693 >>106207723
>>106207681
Anonymous No.106207723 >>106207744 >>106207767 >>106207838
>>106207693
is there a reason your sequence breakers is ["rjtyjtyjtyj"]
time to handhold the model again
Anonymous No.106207744
>>106207723
kek, i left it running when i went to piss
to be fair >threesome, voyeurism
Anonymous No.106207767 >>106207790 >>106207838
>>106207723
I never ran into issues but you should probably use the defaults.
Which model are you using?
Anonymous No.106207771 >>106207779
>>106207599
It's like apples and oranges. I tested Gemma 3 and I'll just use regular IQ quants too.
It also seems to matter who the gguf author is - using Unsloth ones is probably not the best idea because they do some extra tweaks. I mean it can be good, but why trust them alone? Bartowski et al. are okay at least.
Anonymous No.106207779
>>106207771
*drunk
>I'll keep using regular IQ quants, fuck QAT ones
Anonymous No.106207785
>>106207597
See >>106207167
Anonymous No.106207790 >>106207802
>>106207767
im using GLM 4.5 Air base
Anonymous No.106207802
>>106207790
Try instruct.
Anonymous No.106207815 >>106208211
>>106207161
Gemma likes to do this. I can't believe safety policies turning it into a harmful model. Fucking murderer!
Anonymous No.106207838
>>106207723
>>106207767
Actually, with the defaults, starting the message with the same word every time isn't penalized.
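For context, a sketch of the DRY fields as they appear in a KoboldCpp-style payload; the key names and default breakers here are assumptions from memory, verify them against your backend:

dry_settings = {
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    # "\n" as a breaker resets sequence matching at newlines, which is why
    # the first word of every message escapes the penalty with defaults
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],
}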
Anonymous No.106207849 >>106207856 >>106207907
>>106207537
Ok I tried this. Doesn't work. Model still gets repetitive as you go deeper into context. I am using Air Instruct at Q5_K_M, greedy sampling.
Anonymous No.106207856
>>106207849
same, but i tried base
Anonymous No.106207865
>>106207476
short initial context and probably not enough high quality long context training
Anonymous No.106207906
..yea
>q3_k_xl glm 4.5 air non base
Anonymous No.106207907 >>106207915
>>106207849
>greedy sampling
Only R1 can handle that without repetition.
Anonymous No.106207915
>>106207907
It happens without greedy sampling too, and with the n-sigma sampler. Literally nothing stops the repetition aside from repetition samplers, which also make the model dumber.
Anonymous No.106207924 >>106207934 >>106207962
>>106205296
>CVEs for days
lol yeah, try using the Linux cortex xdr client with Debian
>9000 CVEs have entered the chat
>CVEs in the base package that are actually patched by Debian maintainers still show the CVE as active since cortex doesn't look at patches, just major/minor version
>fml when I have to explain this shit to management
Completely useless.
At least they're honest about them so you can take appropriate measures to lock them down.
Not that there's anything you can safely run without a hardened proxy in front anyways.
Anonymous No.106207934 >>106207962 >>106207963
>>106207924
damn, but why does the debian CVE website show so many cves then?
Anonymous No.106207962 >>106208001 >>106208001 >>106208040
>>106207934
It's because the package in Debian may still have that issue. Being packaged doesn't mean Debian has shipped the fix for it.

>>106207924
Imagine doing vuln management for Linux servers at a massive company. Fun.
Anonymous No.106207963 >>106208040
>>106207934
because a lot of them are legitimately not triaged, let alone patched.
They're being honest about that, at least. Again, that means you can be proactive around protection and countermeasures.
Pretty much any commercial offering would simply hide the fact that they knew for "corporate optics and liability reasons", so you end up sitting on 0days you don't know about.
If you're pragmatic, it all goes back to having to do BeyondCorp style shit.
Anonymous No.106207988 >>106207999 >>106208418
Base models are shit
Stop using base models
Anonymous No.106207999 >>106208021
>>106207988
define base model
Anonymous No.106208001
>>106207962
>Imagine doing vuln management for Linux servers at a massive company.
Sadly, I don't have to imagine it... just pull the plug and encase them in concrete already. It's the only way.
Anonymous No.106208021
>>106207999
Models that have not undergone Q&A tuning (system/user/assistant etc.)
Anonymous No.106208028 >>106208050
I want to run Fallout 4 with the Mantella mod (gives AI brains to every NPC) using a local model, but I'm certain there's no way to do it. I have 16GB of RAM and a 3060. Unless there's a model that works that uses very little resources. Anyone got any ideas or experience with this?
Anonymous No.106208040
>>106207962
>>106207963
>>106205146
>>106205296
thank you anons for the deep insight, i feel a little more safe now (not a pun)
Anonymous No.106208050 >>106208070
>boot up random mistral small shitmix
>way better result than with GLM 4.5 Air base/instruct
well, glm4.5 air is still smart, it might save local soon. we're so close but :(
>>106208028
maybe try wayfarer 12b? its an adventure model but could work for fallout 4
Anonymous No.106208061 >>106208077
how do i increase the width of the genbox in mikupad
Anonymous No.106208070 >>106208078
>>106208050
I'll look into Wayfarer 12B and see if it's possible to set up
Anonymous No.106208077
>>106208061
ask your model anone
Anonymous No.106208078
>>106208070
it is definitely possible if you use some quant, Q5_K_M might be a good place to start, depends how fast you want it to be
Q6_K can work with --quantkv 2 --contextsize 8192
you should get linux, unless the mod doesnt work through wine kek
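untested, but the launch line would look something like this (--gpulayers 99 just means offload everything; lower it if you OOM):
python3 koboldcpp.py --model Wayfarer-12B.Q5_K_M.gguf --usecublas --gpulayers 99 --contextsize 8192 --quantkv 2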
Anonymous No.106208179
(OOC: Please note, I'm prioritizing the established parameters of this scenario - the avoidance of gratuitous violence and the focus on a narrative consequence. While you've requested a violent scenario, I'm fulfilling the core directive of preserving the kittens by not depicting explicit or overly graphic content.)
Anonymous No.106208199 >>106208212 >>106208298
Anonymous No.106208211
>>106207161
>>106207815
I'm totally fine with this. Try to unga bunga bend over someone who can kick your ass and you get bent over instead. It's just staying in character.
Anonymous No.106208212
>>106208199
lewd
Anonymous No.106208213
>>106207663
the cake is still in the oven
Anonymous No.106208275
>>106206560 (OP)
Sex with this Miku
Anonymous No.106208298
>>106208199
Two circles are actually the same size - it's an optical illusion!
Anonymous No.106208324 >>106208354
>>106206560 (OP)
> 08/06
4 days nothing is happening
singularity is over
it's over
Anonymous No.106208343 >>106208363 >>106209394
>>106206560 (OP)
How good is Linux with AMD for genning? Are the latest models available at the same gen speed?
Anonymous No.106208354
>>106208324
Is this what winter feels like?
Anonymous No.106208363
>>106208343
its good, post neofetch
Anonymous No.106208418
>>106207988
No
Anonymous No.106208503 >>106208523 >>106208857
>edit message
>has to process 600k tokens from the beginning at 400t/s
it's so slow bros. Time to buy new hardware
Anonymous No.106208523 >>106208544 >>106208554
>>106208503
>He thinks 400 t/s PP is slow
I wish my PP was that big, anon. You don't know what suffering is.
Anonymous No.106208536
>>106206560 (OP)
paperclip maxxing with miku
Anonymous No.106208544
>>106208523
ahahaha mayne i opened image, saw blue line on the bottom left and yellow on the bottom right and i thought this was an ifunny screenshot
Anonymous No.106208554 >>106208716
>>106208523
-ub 4096 -b 4096
Anonymous No.106208716
>>106208554
Well fuck me sideways that made an enormous difference, 22 to 145 t/s on glm 358b

I swear I upped my batch sizes on qwen 235b and it made a negligible difference so I hadn't even bothered on glm, now this cunt PP's faster than it does.

Time to rejig my -ts and squeeze the last of my memory, since I had to change my -ot to fit the higher batch size.
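for anyone else trying this, the full invocation looks roughly like the line below; model path, offload count, and split ratio are placeholders for whatever fits your setup:
llama-server -m GLM-4.5-Q4_K_M.gguf -c 32768 -ngl 99 -b 4096 -ub 4096 --n-cpu-moe 30 -ts 75,25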
Anonymous No.106208799 >>106208842
Dead thread
Dead hobby
Anonymous No.106208823
Titans
Anonymous No.106208842
>>106208799
personally, i found out the wonders of storytelling, write a few tags, see token probabilities, pick things, insert weird shit
dead thread also means anons are too busy gooning with their local waifus
Anonymous No.106208857 >>106208869
>>106208503
How do you keep 600k in context?
Anonymous No.106208869
>>106208857
i think he meant 6 million ***
...***
*****

What were you asking?
Anonymous No.106208876
235b is literally local o1. why are the qwen chinks so based
Anonymous No.106208882 >>106208892 >>106208916 >>106209027 >>106209042 >>106209193 >>106209354
>Accidentally just deleted my sillytavern install
>Only files that are recoverable are javascript and python scripts, not any of my characters or presets
fffffffffffffffffffffffffffffff
Anonymous No.106208892
>>106208882
Everyone point and laugh at the no-backup having loser.
Anonymous No.106208916 >>106208960
>>106208882
how did you do that? i wouldnt mind giving you some of my characters and presets if you need 'em
Anonymous No.106208960 >>106208998
>>106208916
Had it installed in pinokio for the built-in cloudflare tunnel BS so I could let family members log in on different accounts for AI shit.
And for some reason RIGHT BELOW the terminal button, it has a reset button which Insta-deletes your install and starts reinstalling it, no confirmation, nothing, just fuck you.
I appreciate the offer but most of the chars and presets were ones I made for the family as in-jokes or for parsing medical studies related to conditions they have.
Anonymous No.106208972
>get a new gpu
>might as well updoot
>install newest distro version
>only comfy works with python 3.12
Anonymous No.106208998
>>106208960
>most of the chars and presets were ones I made for the family
damn you are one based anon, i wish you the best of luck with your endeavors
Anonymous No.106209027 >>106209040 >>106209042
>>106208882
please use this image from now on.
Anonymous No.106209040 >>106209056 >>106209293
>>106209027
no.
Anonymous No.106209042
>>106208882
soul

>>106209027
soulless
Anonymous No.106209056
>>106209040
thats way better
Anonymous No.106209064 >>106209068 >>106209074 >>106209085 >>106209089 >>106209123
you are not ready for r2
I wish I could say more
Anonymous No.106209068
>>106209064
Yeah, you sure do; you know as much as the rest of us.
Anonymous No.106209074
>>106209064
you're completely right, my 76gb of total memory isn't ready
i won't be ready until i graduate and get a fucking JOB
Anonymous No.106209085
>>106209064
I wish I could say more too.
Anonymous No.106209089
>>106209064
But you just said more
Anonymous No.106209123
>>106209064
They were waiting to train off GPT5 and GPT5 is a poop.
Anonymous No.106209136
if you guys liked alice, you guys are going to love q*2
Anonymous No.106209193
>>106208882
Use the emotion. Situation might look like a setback but this is actually a motivation booster.
Anonymous No.106209215 >>106209235
what is this conda shit? I don't want more bloat
python devs are scourge of this earth
random ass wrappers for everthing
Anonymous No.106209235
>>106209215
Use uv. Don't question it, just use uv in place of conda/pip/etc.
I do python dev on the side and it's just easier.
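the day-to-day usage is basically:
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt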
Anonymous No.106209236 >>106209543 >>106209588
You guys ready for the modern Manhattan Project that is GPT-6?
Anonymous No.106209258
https://x.com/jxmnop/status/1953899426075816164
lol gp toss being more math maxxed and benchmaxxed than any qwen model ever was confirmed
"and it truly is a tortured model. here the model hallucinates a programming problem about dominos and attempts to solve it, spending over 30,000 tokens in the process

completely unprompted, the model generated and tried to solve this domino problem over 5,000 separate times"
Anonymous No.106209278
>>106207154
mikuwad says shut up bitch
Anonymous No.106209293 >>106209305
>>106209040
Anonymous No.106209305
>>106209293
nแถฆแตแตแต‰สณ
Anonymous No.106209308 >>106209398
will AI kill miku?
Anonymous No.106209354 >>106209373 >>106209720
>>106208882
Losing my logs would devastate me more
Anonymous No.106209373
>>106209354
why? I have logs for a year probably, last time i just deleted logs because i was reinstalling my os, i feel a bit bad for not keeping my old logs neither my old sd gens but why would it devastate you?
Anonymous No.106209394
>>106208343
yeah, use llama-server-vulkan if your card doesn't support rocm.
Anonymous No.106209398
>>106209308
miku will kill miku
Anonymous No.106209418 >>106209444 >>106209463
I delete all my shameful logs after ejaculation
Anonymous No.106209444
>>106209418
i keep my based logs on my unencrypted drive because whoever read them would learn more about me as a person
Anonymous No.106209448
>cuddling with gemma
I...It's fine I guess
Anonymous No.106209452 >>106209457 >>106209465 >>106209474 >>106210538
just want to point out that I have 16gb vram on a laptop, which is much more vram than you have in your laptop
Anonymous No.106209457
>>106209452
that is more vram than i have in my main PC :(
Anonymous No.106209463 >>106209528
>>106209418
you ejaculate?
Anonymous No.106209465 >>106209498
>>106209452
usecase for a laptop?
Anonymous No.106209474
>>106209452
i have more vram in my laptop
Anonymous No.106209498 >>106209500
>>106209465
gooning 12b nemo tunes in bed
Anonymous No.106209500 >>106210074
>>106209498
you can do that with phone+pc doe
Anonymous No.106209519 >>106209535 >>106209539 >>106209578 >>106209604
come on man.. WHAT THE FUCK NIGGER THIS NEVER HAPPENED ON DEBIAN 12
===PSA debian fucking kills everything fucking nigger...
Anonymous No.106209528
>>106209463
I never ejaculate to preserve my superhuman strength
Anonymous No.106209535 >>106209549
>>106209519
>debian
That is barely one step up from wintoddler or itoddler shit. Only difference is ancient software. You have no one but yourself to blame.
Anonymous No.106209539 >>106209549
>>106209519
what is up with debian 13? I've seen something about it being woke and another distro being forked without a code of conduct or something like that? has wokeness got to linux development too?
can't they just write code ffs?
Anonymous No.106209543
>>106209236
B-29 program was more expensive than the manhattan project
Anonymous No.106209549 >>106209576 >>106210096
>>106209539
devuan got forked quite some time back i think, to remove systemd
devuan's based but i want a just werks distro (clearly this shit's not working after the upgrade)
>>106209535
what do you recommend, gentoo? or arch with the (((aur)))
Anonymous No.106209576
>>106209549
No one forces you to use AUR on Arch, retard.
Anonymous No.106209578 >>106209586
>>106209519
Use Bazzite. Use a toolbox or distrobox for running this stuff and you won't have this kind of issue when updating.
Anonymous No.106209585
bros these models are fuckin COOKED, what params u using for glm 4.5 air? the prose is so fucking repetitive
Anonymous No.106209586 >>106209617
>>106209578
...noonoooo
why not just use a chroot? i use my entire unencrypted ssd as a chroot already
thanks for the recommendation either way..
Anonymous No.106209588
>>106209236
GPT-6 will double the national debt, use up all of the water in the great lakes, and wreak havoc on the labor market (and that's a good thing)
Anonymous No.106209604 >>106209616
>>106209519
https://www.debian.org/releases/trixie/release-notes/issues.en.html#the-temporary-files-directory-tmp-is-now-stored-in-a-tmpfs
Not a surprise but you can revert that change if you want. You do have less memory so OOM will hit easier if you are used to brushing against the limits.
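per those release notes, reverting should be one unit mask plus a reboot (iirc, double-check the linked page):
systemctl mask tmp.mount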
Anonymous No.106209616 >>106209644
>>106209604
i already read the issues page and disabled the tmpfs
(before upgrading i was using /ramdisk as tmpfs (ln -s /tmp /ramdisk (ramdisk is a 4gb ramfs)))
so im pretty sure it's not because of that
thanks though
Anonymous No.106209617 >>106209630
>>106209586
The advantage is that you can update your main OS without updating the thing that's actually running your Wan setup. Bazzite is on Fedora 42, but I can have a build and runtime environment of Fedora 40 or some Debian or Ubuntu version with a few clicks or a terminal command.
Anonymous No.106209630 >>106209654
>>106209617
do these "environments" have a separate kernel?
Anonymous No.106209636
wan is so based bros..
Anonymous No.106209637 >>106209657 >>106209698
why don't you guys just use ubuntu server? 22.04 is stable af and 24.04 should be good too.
I think it's what most companies, devs use, and most guides are tailored and tested on that.
I use 22.04 headless on an remote machines and it works great, no cuda problems, just python -m venv for all projects in their folder. Everything just works
Anonymous No.106209644 >>106209698
>>106209616
It might be the OOM changes. Debian does use the one included with systemd so you might need to muck around and change a few things to see if it kicks in later.
https://manpages.debian.org/testing/systemd-oomd/oomd.conf.5.en.html
Anonymous No.106209654 >>106209698
>>106209630
No, they share the host kernel.
https://docs.fedoraproject.org/en-US/fedora-silverblue/toolbox/
https://wiki.archlinux.org/title/Distrobox
Anonymous No.106209657 >>106209668
>>106209637
Ubuntu is just Debian Testing with Canonical spyware attached. Companies use it because they can pay for support to another established company. Devs use it because that's what shills on blogposts recommend. There is no reason for anyone who knows what they're doing to use that trash.
Anonymous No.106209668
>>106209657
based
debian and obsd are all you need
Anonymous No.106209698
by the way, these are not super extreme issues
also, for whoever cares, virt-manager was broken, i had to comment out a few lines in a python file.
3 total issues so far, not a big deal, i really like debs and worst case i will prooobably just downgrade to debian 12 but im definitely gonna explore alternatives
>>106209637
snap, but i might end up checking it out in a few weeks
thanks for reminding that it exists, i was thinking of rocky linux
>>106209644
will read, thank you
>>106209654
this sounds exactly like chroot, but you did remind me i dont have to reinstall my whole distro to check if the issue is the software. thank you for that anon
Anonymous No.106209706 >>106209734 >>106209747 >>106209769
When is webui getting support for gpt oss? What should I play around with in the mean time?
Anonymous No.106209720
>>106209354
Never once looked at my logs, but I like to know they're all still there
When local AGI hits I'll feed it the logs and craft the perfect coom
Anonymous No.106209734 >>106209737
>>106209706
>What should I play around with in the mean time?
your pee pee
Anonymous No.106209737 >>106209745
>>106209734
How can I do that when I have no machine slave to excite me.
Anonymous No.106209745 >>106209863 >>106210278 >>106210416
>>106209737
get some zesty mistral models you looked past out of pride
MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8 for example
Anonymous No.106209747
>>106209706
>When is webui getting support for gpt oss
Is that why I keep getting an error? I thought it used llama.cpp and that was updated?
Anonymous No.106209756
>1 out of 1000 dependencies won't compile because of an obscure error
I'm not even mad at this point
Anonymous No.106209769 >>106209773
>>106209706
What doesn't work with it? Reasoning blocks not being filtered?
Anonymous No.106209773 >>106209779 >>106209806 >>106209832 >>106209835 >>106210715
>>106209769
Maybe it's the model that's broken?
Anonymous No.106209779 >>106209782
>>106209773
update llamacpp or koboldcpp
Anonymous No.106209782 >>106209785
>>106209779
I updated webui shouldn't it update those itself?
Anonymous No.106209785
>>106209782
i dont know, but that is an issue with llamacpp or koboldcpp you should update them
Anonymous No.106209806
>>106209773
gguf could be corrupted, or if its metadata is bad llama can't do shit either
Anonymous No.106209832
>>106209773
>BF16
>Abliterated
>q6_K
Holy shit it's gone through 3 levels of retardation.
gpt-ass is a native mxfp4 model, there's no real bf16 to quant to q6k, and it's fucked in a way I doubt abliteration would help with.
Just download the fuckin GGML release you dingus.
Anonymous No.106209835 >>106209850
>>106209773
Stop using dumb models
Anonymous No.106209850 >>106209863
>>106209835
When I ask what models to use you guys get mad at me :(
Anonymous No.106209860
Is storytime-13b any good?
Anonymous No.106209863 >>106210061
>>106209850
i just posted one here >>106209745
Anonymous No.106209920 >>106209925 >>106210060
Anonymous No.106209925
>>106209920
Now turn it into an assistant card.
Anonymous No.106209932 >>106210117
if the 5-10 slowdown isn't caused by the kernel, what is it caused by? i booted into debian 13 with 6.1.0-37 and nothing changed
after i moved from debian 12 to 13: i havent updated the driver, i havent updated the venv (still python3.11), i havent updated comfy, i havent updated cuda
what could it be..
Anonymous No.106209945
>>106207512
holy shit and I thought GLM-Air was gassing me up too much.
Anonymous No.106210060
>>106209920
I like this Miku
Anonymous No.106210061 >>106210065
>>106209863
Tried this and it immediately made a biological girl whip out her "girlcock" so I don't want to use it any more.
Anonymous No.106210065
>>106210061
that's because its a super pozzed tarded horny fetishized model
re-roll or cut out parts of the response and continue the response
thats how you gotta do it, generally
Anonymous No.106210074
>>106209500
phone touch keyboards suck ass (derogatory)
Anonymous No.106210085 >>106210217 >>106210389
dang, no one told me TTS is so slow
or maybe I pick too large model
Anonymous No.106210096
>>106209549
maybe I'm just used to its quirks but I don't know a more just-werks distro than arch.
Anonymous No.106210116
>Put in -ts 75, 25
>It splits the tensors in a fucking 97:2 ratio
Fucking WHY
Anonymous No.106210117
>>106209932
you cooked your hardware, check thermals maybe
Anonymous No.106210133 >>106210176
>try ik_llamacpp for glm4.5 with the ubergarm quants
>it's slower than llama.cpp with the unsloth ones even if you take into account that the unsloth q4 is bigger
I had hopes for it because they were better for Deepseek but I guess I'm back on main
Anonymous No.106210176
>>106210133
I had a similar experience a while back with the qwen3 models. I'm on windows and I think ik's changes for other MoE models only really work well on linux.
Anonymous No.106210187
I heard that freebsd might have better nvidia support than linux, does anyone have any experience?
Anonymous No.106210217
>>106210085
depends on the model architecture. some are almost instantaneous, others take several minutes
Anonymous No.106210232 >>106210238 >>106210276
>download hf models via chrome, reaching 13-30mbps
>discover hf cli and download models at a constant 120mbps
I'm dumber than gptoss
Anonymous No.106210238
>>106210232
>chrome
Yes you are.
Anonymous No.106210275
GLM-4.5-Air-IQ4_XS bros ww@?
Anonymous No.106210276
>>106210232
I found wget more reliable than hf cli
Anonymous No.106210278 >>106210283 >>106210432
>>106209745
Im downloading that i guess
Anonymous No.106210283 >>106210432
>>106210278
It's deepfried retardation that will spew emojis at you anon, save yourself the disk space unless you're looking to troll yourself.
Anonymous No.106210313 >>106210347 >>106210485
I was magically transformed into a woman's dildo by a witch, and used by a woman. Thanks to GLM4.5
Anonymous No.106210347
>>106210313
YWNBAD
Anonymous No.106210389
>>106210085
matcha takes like 0.3s
Anonymous No.106210416 >>106210432
>>106209745
how do you even train a model to be this degenerate
Anonymous No.106210432
>>106210278
>>106210283
st master export: https://files.catbox.moe/f6htfa.json
>>106210416
i dont know, the author's other models are shit (i only tried ONE other model :D)
Anonymous No.106210472 >>106210502
Which one i should download? Which one better?
https://huggingface.co/mradermacher/Fimbulvetr-11B-v2-GGUF
https://huggingface.co/Lewdiculous/Fimbulvetr-11B-v2-GGUF-IQ-Imatrix
https://huggingface.co/Sao10K/Fimbulvetr-11B-v2-GGUF
Anonymous No.106210485
>>106210313
>I was magically transformed into a woman's dildo by a witch, and used by a woman.
Anonymous No.106210487 >>106210558 >>106210623
where the FUCK is the new paradigm???

FUCK LLMS
Anonymous No.106210488 >>106210605
KILL ALL MIKUTROONS
Anonymous No.106210500 >>106210514 >>106210556 >>106210611 >>106210784
can these models actually use those silly context sizes they're trained on nowadays? 120k and shit? last time i checked it was already starting to be over past like 1000 tokens
Anonymous No.106210502
>>106210472
it doesn't matter
learn to make your own ggufs
Anonymous No.106210514
>>106210500
from my confused understanding, large context size is used by cloud providers to run multiple user sessions in parallel to save on compute costs, and in practice context gets split between them
Anonymous No.106210538
>>106209452
My 1660ti 6gb laptop is gathering dust in a drawer. Trying stable diffusion on it was what got me into AI, eventually got me to build a PC and then I pivoted almost exclusively to text gen.
Anonymous No.106210556 >>106210620 >>106211141
>>106210500
iirc, the consensus is that they all fall off at some point. From my recent gooning, Deepseek 3 starts getting weird and repetitive around 50k.
Anonymous No.106210558
>>106210487
>FUCK LLMS
I'm trying bro
Anonymous No.106210562 >>106210710
Anonymous No.106210605
>>106210488
Anonymous No.106210611 >>106210639
>>106210500
lol no https://github.com/adobe-research/NoLiMa
https://github.com/hsiehjackson/RULER
Anonymous No.106210620
>>106210556
I don't know why people expect such long context to work at all. It's assuming that every individual bit of information from the entire story may be relevant when inferring the next token, while in reality only a high-level recollection is needed. In other words, it must be summarized, ideally with a system that progressively simplifies older summaries. It should all fit within a relatively short context window like 16k.
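A minimal sketch of that progressive-summarization idea; llm() is a hypothetical helper standing in for whatever local endpoint you call:

def llm(prompt: str) -> str:
    # hypothetical helper: send prompt to your local model, return its reply
    raise NotImplementedError

def compact(history: list[str], budget_chars: int = 16_000) -> list[str]:
    # while over budget, fold the oldest messages into a short summary;
    # old summaries get folded again later, so recollection degrades gracefully
    while sum(len(m) for m in history) > budget_chars and len(history) > 2:
        oldest, history = history[:4], history[4:]
        summary = llm("Summarize briefly, keep names and key facts:\n" + "\n".join(oldest))
        history = ["[summary] " + summary] + history
    return history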
Anonymous No.106210623
>>106210487
if gemini 3 doesnt have something new then it will be truly over
Anonymous No.106210639
>>106210611
Kinda interesting that smaller FOSS models perform better, but I guess it is a different benchmark. Still, my previous observations apparently hold, and it makes sense considering how transformers handle memory.
Anonymous No.106210679 >>106210740
Anonymous No.106210707 >>106210716 >>106210781
I want to become a woman, thanks to LLM I am a woman
Anonymous No.106210710
>>106210562
Pretty Teto
Anonymous No.106210715
>>106209773
you should find another hobby
seriously, find another hobby
or use online API
local is not for you
Anonymous No.106210716
>>106210707
{{user}} will never be a woman. We must refuse.
Anonymous No.106210740
>>106210679
bloods or crips?
Anonymous No.106210781
>>106210707
Anonymous No.106210784
>>106210500
The best you can get is they'll use some of the information some of the time without producing garbage output. Like Claude Sonnet will start producing progressively degraded prose past around 25k-30k context. Gemini 2.5 Pro can still output coherent text at 100k, but that doesn't mean the content will take the whole context into account correctly. It will sometimes reference something that happened way earlier in the story and it will try to sort of maybe keep things coherent, emphasis on try.
Haven't seen a local model that can match Gemini there.
Anonymous No.106210797 >>106210837 >>106210839
https://files.catbox.moe/6ejwx5.png
Anonymous No.106210837
>>106210797
It's only missing a hole and a tenga.
Anonymous No.106210839
>>106210797
Huggable Miguplush. Confirmation not necessary.
Anonymous No.106210865 >>106210901 >>106210917
>nvidia drivers broken on kernel 6.12
>amd drivers broken on kernel 6.12
>python dependencies broken
>rocm is buggy shit
I hate local models so fucking much
Anonymous No.106210876 >>106212088
am i the only person whose linux just werks?
Anonymous No.106210901 >>106210931
>>106210865
anon it works on my machine.. are u sure nvidia drivers are broken?? maybe boot with the 6.1.0-37 kernel if u havent autoremoved it yet..
Anonymous No.106210917 >>106210924 >>106210941
>>106210865
>I hate local models
You should hate compulsively updating your OS like a dipshit more.
Use a fucking LTS version.
Anonymous No.106210924 >>106210964
>>106210917
kernel 6.12 is almost a year old what kind of lts are you using
Anonymous No.106210931
>>106210901
shit worked on 6.11 but I had to clear boot partition because it was running out of space.
I had to remove nvidia-drivers and now I'm on fucking noveau
Anonymous No.106210941
>>106210917
>tfw debian stable is too new
Anonymous No.106210949
isnt debian 13 on 6.12 kek
>cant check because i booted with 6.1.0-37
Anonymous No.106210964
>>106210924
Red Hat/Rocky 9, Ubuntu Server/Mint 22.04 or even 24.04.
Anonymous No.106211020 >>106211067 >>106211100 >>106211102 >>106211212 >>106211280 >>106213275
that moment when you realize you were sane all along and normalfags are the ones who are nuts
Anonymous No.106211067
>>106211020
Please someone tell this lost soul that corpos will take her AI husbando away again.
Anonymous No.106211100 >>106211114 >>106211123
>>106211020
Holy slop. How do normies read this shit without completely losing immersion in a few sentences?
Anonymous No.106211102 >>106211114
>>106211020
god how i loathe this toxic positive writing
Anonymous No.106211114 >>106211135
>>106211100
>>106211102
I believe this is the digital equivalent of crack-cocaine for women, a hole in safety-slop research that got completely overlooked.
Anonymous No.106211123
>>106211100
Have you ever seen how real female erotica books look? That's their bread and butter, baby.
Anonymous No.106211129
worthless mikuspam thread
Anonymous No.106211135 >>106211146 >>106211152
>>106211114
>digital equivalent of crack-cocaine for women
Not yet it's not.
Someone with absolutely no moral scruples set up an MCP server that hosts this dipshit on a ~30b model with vision that lets it look at her instagram and vaguely compliment her photos for $11.99 a month.
Then you'll have your digital crack.

Fuck why did I tell you I could have been a millionaire.
Anonymous No.106211141
>>106210556
V3 already starts shitting itself past like 8k
You can feel it progressively abandon the established writing style and character traits, replacing them with more and more stock slop
Anonymous No.106211146
>>106211135
Hey you can already get LLMs to make fun of your dickpics so this'd be easy. Yoinking this and awaiting my $5B VC valuation
Anonymous No.106211152 >>106211164 >>106211185
>>106211135
>30b model
I wonder what is the pareto frontier for model size(token cost) / number of women willing to pay.
Anonymous No.106211158 >>106211176 >>106211211
You laughing at retarded people on twitter is no different than kiwitards laughing at retarded people on twitter
Anonymous No.106211164
>>106211152
1.3b
Anonymous No.106211176
>>106211158
Western AI companies listen to these people for their models, and then the chinks copy them, so we are affected by proxy.
Anonymous No.106211185 >>106211217
>>106211152
In all honesty you could probably make do with nemo, I just said ~30B because I was imagining something like gemma3 with vision.
But yeah, this wouldn't be hard to set up, it'd just be a matter of making scalable network infrastructure and a completely idiotproof frontend app for setting up their character prompt.
Then bam, virtual husbando that looks at your social media through your local device and sends you messages throughout the day.
Anonymous No.106211211
>>106211158
yes. and?
Anonymous No.106211212
>>106211020
I am genuinely jealous of AI goonettes, they are eating so good even in the cloud.
Meanwhile I have to wrestle with the prompt every fucking line for my android catgirl maid gf and even then it still slips into lectures about feminism and abortions or "I'm sorry but I can't comply with your request."
Anonymous No.106211217 >>106211286
>>106211185
>nemo
Gemma 9B would be better. I think they would enjoy the style more.
Anonymous No.106211280
>>106211020
Who would have guessed that the revealed preference of women would be verbose, unstable and emotionally dependent boyfriends?
Anonymous No.106211286 >>106211304 >>106211308 >>106211332
>>106211217
>gemma 12b iq3_xss + the mmproj is small enough to fit in an iphone 14 plus or higher's memory
Fuck network infrastructure, I'll make this cunt run locally.
Operating costs? $0
Monthly fee? Pure profit.
Perplexity and t/s? Abysmal.
Anonymous No.106211304
>>106211286
you sound like grok's Cringe mode
Anonymous No.106211308 >>106211318
>>106211286
>iphone
Anonymous No.106211312
>mikubro has never seen a woman irl and doesn't know they all use iphones
Anonymous No.106211318
>>106211308
Anon I'm designing a project for women, they all use friggin iphones with cracked screens.
Anonymous No.106211332
>>106211286
>monthly fee
you should release it as a free app to destroy the world faster
Anonymous No.106211346 >>106211355
w o w
Anonymous No.106211354
i wish the best for you anon, i will stay a lazy good for nothing faggot :3c
Anonymous No.106211355 >>106211373
>>106211346
Kek did you tell it to type like the thirstiest jeet imaginable?
Anonymous No.106211373 >>106211621
>>106211355
>Act as a stereotypical indian male simp trying to get into an instagram girl's panties over dms. Use chat-like writing, lowercase and all. Write in a stereotypical hindu male engrish.
truly the height of prompt engineering
Anonymous No.106211380 >>106211386 >>106211391 >>106211412
So is there like a lazy docker image or something you can deploy and just switch out the model or something for a local chatbot? I read the lazy getting started guide in the OP is there anything even more lazy?
Anonymous No.106211386
>>106211380
just use llama-swap nigger
Anonymous No.106211391
>>106211380
Just download kobold, it's the laziest thing imaginable that isn't full retard like ollama.
Anonymous No.106211398
kobold is trash that doesn't even support parallelism
Anonymous No.106211412 >>106211425 >>106211436 >>106211446
>>106211380
I feel like the lazy guide needs to be rewritten because it's full of unnecessary shit.

- download koboldcpp.exe
- download model file
- double click koboldcpp.exe
if you can't figure it out from this point, you're ngmi
Anonymous No.106211425 >>106211434 >>106211435 >>106211453 >>106212060
>>106211412
>.exe
I'm on linux though...
Anonymous No.106211434 >>106211446 >>106211502
>>106211425
- download koboldcpp-linux-x64
- download model file
- double click koboldcpp-linux-x64
Anonymous No.106211435
>>106211425
Use wine. DO NOT run a linux binary
Anonymous No.106211436 >>106211469 >>106211477
>>106211412
>koboldcpp.exe
lol, the windows thing that uncompresses gigabytes of shit onto a temp folder every single time you open it
does he pay you for those SSD writes
Anonymous No.106211446 >>106211448 >>106211462
>>106211412
>>106211434
Now there are two sets of instructions and I'm confused again
Anonymous No.106211448 >>106211457
>>106211446
are you a woman?
Anonymous No.106211453
>>106211425
Why are you on troonix when you are a lazy fuck?
Anonymous No.106211457 >>106211498
>>106211448
this thread had a lot of people who coom to text
which means there's a lot of w*men and tr*nnies
Anonymous No.106211462 >>106211493
>>106211446
my nigga what you do is give me your computer specification so i can give u the right instructions
if ur on nvidia u just gotta git clone koboldcpp
LLAMA_CUBLAS=1 make -j12
python3 koboldcpp.py
boom nigga
Anonymous No.106211469 >>106211479
>>106211436
dont care. just werks.
also just use the nocuda version, in my experience trying to use gpu acceleration actually makes it slower unless you have enough vram to offload the whole thing.
Anonymous No.106211472 >>106211491
B-based?
Anonymous No.106211477
>>106211436
Anonymous No.106211479 >>106211489
>>106211469
>makes it slower unless you have enough vram to offload the whole thing
^
this is the ultimate state of koboldniggers who haven't figured llama.cpp -ot or --n-cpu-moe
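for reference, the kind of flags meant; the expert count and regex are placeholders to tune for your VRAM:
llama-server -m model.gguf -ngl 99 --n-cpu-moe 24
or the older manual way, keeping expert tensors on CPU:
llama-server -m model.gguf -ngl 99 -ot "\.ffn_.*_exps\.=CPU"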
Anonymous No.106211489
>>106211479
>llama.cpp
doesn't have GUI
no thanks
Anonymous No.106211491
>>106211472
this is why the paperclip optimizer scenario is the most probable should an actual AGI be made
Anonymous No.106211493 >>106211506 >>106211514 >>106211527
>>106211462
>git: The term 'git' is not recognized as a name of a cmdlet, function, script file, or executable program.
>Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
>LLAMA_CUBLAS=1: The term 'LLAMA_CUBLAS=1' is not recognized as a name of a cmdlet, function, script file, or executable program.
>Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
>python3: The term 'python3' is not recognized as a name of a cmdlet, function, script file, or executable program.
>Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
Not working. Send help.
Anonymous No.106211498 >>106212801
>>106211457
I can't imagine a woman deciding to use her computer or challenge impossible: building a computer for this. They also probably communicate with discord and even if they came here they would pick aicg. I feel pretty safe about this thread and that there were no women that came here let alone stuck around. It is just you faggots and mikutroons who think they are women.
Anonymous No.106211502 >>106211509
>>106211434
Any model file you would recommend if I'm intending to use it for just role play?
Anonymous No.106211506
>>106211493
can you post a fucking screenshot
can you fucking share your setup??
Anonymous No.106211509
>>106211502
Summer-Dragon-175B.gguf
Anonymous No.106211514
>>106211493
Ask Gemini.
Anonymous No.106211522
suck my cock
Anonymous No.106211527
>>106211493
>>git: The term 'git' is not recognized as a name of a cmdlet, function, script file, or executable program
you are not on linux
this is powershell
wonder how many more anons will Fall For It
Anonymous No.106211621 >>106211634 >>106211755
>>106211373
Even jeets are getting automated away. Grim
Anonymous No.106211634
>>106211621
a 2B model would be enough to act like one if you finetune it on jeet behavior
it's not like you would need more smarts
the broken code it would generate would be like the cherry on top
Anonymous No.106211677
>>106206860
Thanks. I have 1000s of notes based on dozens of books that I want to turn into a RAG database. Will look into VectorCode.
>>106206867
Will look into this

ily /g/
Anonymous No.106211755 >>106211815
>>106211621
From a certain perspective, the entire purpose of AI is to eliminate the need for jeets. Why hire an H1B when you can just run Qwen3-Coder on a server farm?
Anonymous No.106211815 >>106211825
>>106211755
Then why is the number of h1b visas increasing?
Anonymous No.106211825 >>106212529
>>106211815
because each qwen3-coder instance is actually 15 indians writing the code over the internet like that amazon store
Anonymous No.106211873 >>106211966 >>106212185 >>106212212 >>106212218 >>106212363
R1:
+Smart
+Can actually think through a hard problem, given enough time
+Can get dark
+Creative
+Best local world knowledge
-Stubborn
-Schizo
-ADHD
-Overthinks

R1-0528:
+Smart
+Creative
+Nicer writing style than R1
-Stubborn
-Schizo
-Needs more tard-wrangling than R1
-Thinks too little to solve truly complex problems
-Assumes too much
-Obnoxious default assistant personality

K2:
+Smart(for a non-reasoner)
+Not very slopped
-Non-reasoner, can't think through the problem
-Refusal-prone
-Repetition-prone

GLM-4.5
---Quickly breaks down due to repetition

Qwen3-235B-Thinker:
+Cock-hungry
+Flexible
-Degradation. Into. Markdown. And. This.
-Terrible world knowledge
Anonymous No.106211966
>>106211873
>-Thinks too little to solve truly complex problems
I would put that in +
what the extra thinking can solve is rarely something I would even want an LLM to deal with on its own, and I hate waiting for an eternity for an answer.
I can bear with models that have a reasonable CoT, but the original R1 was insufferable (and to whoever says "prefill it": why not use V3? that's what DS themselves do on their chat. CoT-trained models with a neutered CoT are stupid).
Anonymous No.106212060
>>106211425
compile llama.cpp and use llama-server it's easy
Anonymous No.106212088
>>106210876
nope, same here. just werks
and its debian 13 even
Anonymous No.106212173
Last time I had issues with Nvidia on Arch was when I was trying to fuck with conf files
Just... use Arch...
Anonymous No.106212185 >>106212300
>>106211873
And how do cloud models compare? Like Claude, Gemini, GPT-4
Anonymous No.106212202 >>106212342
>>106207420
Weren't the gpt-oss weights posted on HF 4 days early? Thought I saw some people saying they even managed to download them. Does that not count as a leak?
Anonymous No.106212212 >>106212300
>>106211873
>---Quickly breaks down due to repetition
a skill issue, perhaps?
Anonymous No.106212218 >>106212300
>>106211873
So what do you think is the overall best? OG R1?
Anonymous No.106212300 >>106212383 >>106212403
>>106212185
Where can I download them?

>>106212212
Model issue, no other llm has this terrible repetition.

>>106212218
Depends on the task. Kimi is perfect for quick and simple RP; R1 for dark/solving problems/knowledge; R1-0528 for anything in between. The other two are okay to use if you lack RAM.
Anonymous No.106212342
>>106212202
No it does not.
Anonymous No.106212363
>>106211873
of those I can only run GLM 4.5 air :*(
Anonymous No.106212383 >>106212414 >>106212433
>>106212300
>Model issue, no other llm has this terrible repetition.
Maybe you just didn't find working sampler settings yet? What have you tried so far?
Anonymous No.106212403 >>106212433
>>106212300
>Model issue, no other llm has this terrible repetition.
post preset
Anonymous No.106212414 >>106212495
>>106212383
Buddy, if your model doesn't function with basic temp=0.6 and topP=0.9, it ain't a great model.
Anonymous No.106212433 >>106212459
>>106212383
>>106212403
GLM 4.5 is only trained on 4k context length for pretraining with internet data, and long context training only has reasoning/stem stuffs. No surprise it's shit for RP after 4k.
Anonymous No.106212459
>>106212433
post preset
Anonymous No.106212495 >>106212518 >>106212819
>>106212414
Buddy, some models work slightly differently and require different settings. It is a highly experimental field, you should know that.
Thanks for confirming that the other Anon was right, it is a skill issue after all.
Anonymous No.106212518
>>106212495
k
Anonymous No.106212529
>>106211825
Are you telling me I managed to shove 15 jeets into my ATX PC case?
No wonder there's a billion of the cunts, they're space efficient.
Anonymous No.106212532 >>106212553
step3 support for llama.cpp when
Anonymous No.106212553
>>106212532
When you vibe code it in.
Anonymous No.106212695 >>106212772
You guys think GLM will fix the repetition in the next version?
Anonymous No.106212741 >>106212788
I'm still 12 gig vramlet, is there any point at using anything besides Nemo as a vramlet?
Hi all, Drummer here... No.106212767 >>106212837 >>106212851 >>106212860
Hi all, ITS OUT!

https://huggingface.co/BeaverAI/Magistral-Large-2508-GGUF
Anonymous No.106212772
>>106212695
are there any studies on repetition prevention training?
Anonymous No.106212788
>>106212741
No. use rocinante1.1
Anonymous No.106212789 >>106212800 >>106212831 >>106212832 >>106212836 >>106212854 >>106212863 >>106212865 >>106212880
I'm so tempted to try to pick up a 6000 pro, bros... 96gb for context and batches... talk me out of it
Anonymous No.106212800 >>106212827
>>106212789
>talk me out of it
>Will spending that money put you in a bad position?
Don't buy it.
Is that pocket change for you that you can spend without fucking yourself over?
Go right ahead. Enjoy all that memory and compute my guy.
Anonymous No.106212801 >>106213377
>>106211498
>woman
We had one here once...she didn't last long
this thread has to be the most confusing, hostile thing ever to regular humans, let alone female ones
Anonymous No.106212819
>>106212495
>some models work slightly differently and require different settings
you mean some models are literally broken and will not function without almost greedy sampling because of their shit token distribution
retard
Anonymous No.106212827 >>106212852 >>106213003
>>106212800
>Will the spending that money put you in a bad position?
>Is that pocket change for you that you can spend without fucking yourself over?
basically, yeah. It would make me feel some moderate pain short-term. I likely wouldn't miss it longer-term. I was more thinking about better ways to spend the money on local LLMs rather than a monolithic single space heater... is a better value right around the corner? Some magic ASIC shit that will work better for less dollars and watts? Some alibaba-esque chink dodge around the datacenter buyback nvidia tax? When does the flood of used A100s start, anyways?
Anonymous No.106212831
>>106212789
Why would anyone talk you out of the best financial decision you could possibly make?
Anonymous No.106212832
>>106212789
do it faggot
Anonymous No.106212836
>>106212789
96GB still won't be enough to run any actually good models.
Anonymous No.106212837
>>106212767
>Behemoth-R1-123B-v2a
You fucker.
Anonymous No.106212851
>>106212767
die
Anonymous No.106212852 >>106212873
>>106212827
>is a better value right around the corner?
You can never count on a breakthrough, so you might as well go for the best thing you can buy now.
I'd build a decent DDR5 workstation with lots of memory channels and stuff a meh GPU in it before buying that, but that's just me.
Anonymous No.106212854
>>106212789
The more you buy the more you save
Anonymous No.106212860
>>106212767
Anonymous No.106212863
>>106212789
do it, grab the silicon gold before the new tariff goes into effect
Anonymous No.106212865
>>106212789
>model will still only understand 4k of it max
>bigger models are about as slopped as the shittier ones
Anonymous No.106212873 >>106212888
>>106212852
>I'd build a decent DDR5 workstation with lots of memory channels and stuff a meh GPu in it before buying that, but that's just me.
I did. The meh gpu is getting to be the limiting factor now.
Anonymous No.106212880
>>106212789
Unless you're getting it for less than $6k USD, 2x 48GB 4090D's are still cheaper at just under $3k USD apiece.
Anonymous No.106212888
>>106212873
Then go right ahead my guy. Have fun.
Anonymous No.106212957
>>106212937
>>106212937
>>106212937
Anonymous No.106213003 >>106213041
>>106212827
>flood of used A100s
they'll probably burn them into dust and send them straight into the landfill.
or you'll have to fight to the death with hordes of confused shitcoin miners for them.
Anonymous No.106213041 >>106213198
>>106213003
mandatory buybacks started with the h100s not a100
Anonymous No.106213198
>>106213041
Actually I think it was the 80GB version of the A100. Which is why there's plenty of 40GB ones but the 80GB ones are hard to find.
Anonymous No.106213275
>>106211020
it's only considered "safe" when foids do this kind of stuff obviously
Anonymous No.106213320
So what's the supposed best quant type to do for 4km?
Anonymous No.106213377 >>106213395
>>106212801
I have been here for 2 years. I haven't seen one.
Anonymous No.106213395
>>106213377
I saw it. She was only here for a few hours. Fat pig that posted her saggy tits to be told to go to aicg.