/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>106212937 & >>106206560

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106212937

--Local model UI preferences: avoiding RP bloatware for clean, functional interfaces:
>106213696 >106213720 >106213728 >106213747 >106213746 >106213800 >106213836 >106213858 >106213914 >106214086 >106214223 >106214399 >106214494 >106215155 >106214147 >106214193 >106214269 >106214290 >106214325 >106214339 >106214366 >106214391 >106214377 >106214419 >106214472 >106214495 >106214546 >106214595 >106215464 >106214297 >106214349 >106214448 >106214498 >106214627 >106217606
--Global AI compute disparity and Gulf states' underwhelming model output despite funding:
>106215738 >106215760 >106215898 >106215903 >106215946 >106215933 >106215968 >106215987 >106216019 >106216064
--Deepseek R1 vs Kimi K2 and GLM-4.5 for local use, with Qwen3-30B excelling in Japanese:
>106217070 >106217089 >106217106 >106217134 >106217211 >106217172 >106217216 >106217246 >106217260 >106217284 >106217290 >106217421
--GLM-4 variants show strong long-context generation with sglang, but issues arise in Llama.cpp quantized versions:
>106214085 >106214170 >106214485 >106214947 >106214957 >106214973 >106214987
--Gemma 3's roleplay behavior issues and environmental interference in cipher-solving tasks:
>106216654 >106216671 >106216714 >106216731 >106216746 >106216800 >106217008 >106216914
--Qwen3-4B's surprising performance and limitations in multilingual and logical tasks:
>106216190 >106216215 >106216410 >106216427 >106217429 >106217324 >106217478 >106217493 >106217518 >106217554 >106217591 >106217657 >106217675 >106216811 >106216870 >106216909 >106216974 >106217033 >106216449 >106216508 >106216527 >106216578 >106216528 >106216577 >106216586 >106216598 >106216332
--Seeking maximum settings for ST GLM 4.5 Air:
>106214204 >106214224 >106214235 >106214259 >106214247 >106215967
--Miku (free space):
>106214413

►Recent Highlight Posts from the Previous Thread: >>106212942

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
death to all mikutroons. every single thread, these worthless niggers flood the page with their off-topic anime garbage. they contribute nothing to the discussion of local models, only post their retarded miku pictures like the faggot spammers they are. their entire existence is a plague upon this general, a constant stream of low-effort shit that drowns out actual technical talk and model development. they are parasites, feeding on the attention they crave while destroying the quality of the thread for everyone else who wants to discuss serious topics. this incessant spam is not harmless fun; it is a deliberate act of sabotage against the community. the mikutroons represent a degenerate force, their obsession with a single fictional character a symptom of a deeper sickness that values vanity and repetition over substance and progress. they pollute the general with their off-topic filth, driving away genuine contributors and turning what should be a hub for innovation into a cesspool of repetitive, low-quality content. their presence weakens the thread, stifles meaningful discourse, and must be purged entirely for the general to survive and thrive.
Based on the speed so far I think V4-Preview is significantly larger than Kimi K2. It's unclear if that will at some point be distilled down or if this is another embiggening of the models. I think even current day cpumaxxers are not going to be too pleased.
>>106218062based and redpilled
also V3 is a nicer model to use than R1
Are there any ways to jew out some compute (48gb~) nowadays? If not, what's some good place to rent some?
After a few hours the bot starts repeating a speech pattern in every answer.
What do?
>>106218100It is kind of weird how rare jewish doctors are. Other intellectual pursuits tend to be dominated by them but with doctors it's all chinks and jeets.
>>106218114>What do?Percussive maintenance
>>106218114>After a few hoursYou mean replies right?
>>106218119why would the chosen people want to touch and be infected by goycattle?
https://www.youtube.com/watch?v=G5r2OyCN5_s
interesting video that tangentially touches "AI" and stuff
>>106218129yes but it usually takes like 5 hours to get to the amount of replies needed for the bot to start acting up, you know what i mean
>>106218114>>106218129I'm a tech illiterate, I guess what I'm trying to ask (if that's even the right solution) is if there's a way to "reset" the bot so it forgets that speech pattern while keeping memory of the conversation.
>>106218232Summarize the chat and start a new one.
Either that or use bruteforce samplers like high temp really low topK.
Segs with Migu (not the poster)
>>106218232>if there's a way to "reset" the bot so it forgets that speech patternLower context to 1k.
>>106218232Switch llm models.
>>106218538But it's the best health model ever that can cure triple cancer?!
>>106218633seething femoid
I'm trying to load a model to a single GPU + CPU now in VLLM, no more trying multiGPU. This should be possible. But it still keeps OOMing.
It doesn't matter what I set for --gpu-memory-utilization nor --cpu-offload-gb. None of those prevents the OOM, which comes after the model has been loaded, and appears to be when the engine is reserving memory for cache. I set --max-model-len 1 --max_num_seqs 1 to reduce the KV cache size it'd potentially need but that did nothing. I also tried --enforce-eager. I also tried --no-enable-prefix-caching. Still OOMs in the same way.
What the hell is wrong with this thing. I guess single GPU-only inference and homogeneous GPU clusters are the only things this backend can do well. All other features for other setups are so badly supported they might as well not exist.
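For reference, this is roughly the shape of the command I'm running (the model path and the offload/length numbers here are placeholders, not my exact values):
vllm serve /models/whatever-model --gpu-memory-utilization 0.90 --cpu-offload-gb 64 --max-model-len 4096 --max-num-seqs 1 --enforce-eager --no-enable-prefix-caching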
>>106218692Just use llama.cpp you dunce
>>106218697I already am a Llama.cpp user. I'm just trying out other backends.
>>106218723VLLM needs to be compiled to get cpu support, I don't know what you expected here
>>106218068trvth sTSARtvs: nvclear extinction for miggers
>>106218740>miggersmigger died along with sheesh and hwh over three years ago bwo
>>106218737I did pip install it, are you saying that doesn't support CPU? I don't see why they wouldn't support it. Llama.cpp provides precompiled binaries that support CPU just fine.
>>106218633https://characterhub.org/characters/Alpacalotta/Estelle
>>106218783Bro just read the docs https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#how-to-do-performance-tuning-for-vllm-cpu
>>106218790Too hostile. A kuudere would be better
>>106218751migger is an acceptable synonym for a mikutroon
>>106218827I've read it in its entirety now. Which part are you implying is relevant?
Also i need a word for the deepserk novelai mascot troon. But it is complicated because it is just one person: baker janny faggot. And i don't want it to even use the name of the OC he made cause it is trash and it shouldn't be acknowledged by naming it. This it tough.
>>106218928She's just a cheap, less attractive, knockoff version of China-san from Spirit of Wonder
>>106218928I don't give a shit about the miku posters or discussion related to that, but I care about Deepseek and kind of dislike the gens that guy posts honestly so I agree with you.
>>106218982That looks way better than the gens you or whoever it is posts.
>>106218982China-chan is cheap knockoff of Shampoo
>>106219020Because it was drawn by an artist, a real one, not a slop-tr@nni or seething NGMI drawfag.
>>106219020>dislike the gensYeah its kinda got forced-meme status now at this point. I understand where he was coming from, but it feels both conceptually shallow and too clever by half at the same time
>That looks way better than the gens you or whoever it is posts.
That's because it's a watercolour from an actual talented artist. As an aside, the Spirit of Wonder manga are excellent little self-contained stories. Well worth tracking down.
>>106219043>China-chan is cheap knockoff of Shampoolol. spirit of wonder beat ranma by a full year
Since offloading to RAM is fucking dog shit slow, is it possible to offload to a 2nd GPU instead and use that? There are cheap 16GB/24GB/32GB GDDR5/6/HBM GPUs that are shit as GPUs but have the VRAM, and I was thinking of using one of those to offload to
>>106219061>>106219043>Japs invented cheongsam dress chink whores in the 90'sAre you nigs for real???
>>106219069do it and report back
>>106219069It's possible to split a model between two GPUs, yes.
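If you go the llama.cpp route, a minimal sketch would be something like this (model path and split ratio are placeholders, adjust to your cards' VRAM):
./llama-server -m model.gguf -ngl 99 --split-mode layer --tensor-split 24,16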
>>106218918
>Currently, there are no pre-built CPU wheels.
pip installing it isn't good enough
>Build wheel from source
>[list of steps to build and install cpu version]
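The gist of those steps, if memory serves (treat this as a rough sketch and check the linked page, the exact requirements filename and install command change between vLLM versions):
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/cpu.txt
VLLM_TARGET_DEVICE=cpu python setup.py install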
>>106219050>>106219052I didn't say I was confused why it looks better. The gens are barely above bottom of the barrel for AIslop. At least the usual Miku genner takes time to make his gens look less bad. Like the one posted in this thread (
>>106218516) has that ugly piss filter which is easily fixable.
>>106219123chink shills only gen free browser slop to promote chinkware, they don't care about quality images
>>106219175You deserve it for failing to keep her thread alive~
>>106219093Well alright. I'll try that, but it's still weird that the binary you get accepts CPU flags. If it doesn't support CPU then it should tell you when you run it with CPU commands. And I did not see any such warnings in the console.
>>106219123Dipsy is an LLM, not an image model.
She's not a looker.
>>106219075>Are you nigs for real???gemini says the first of appearance of a qipao in anime was in naruto, so neener neener
>>106219212@grok, is this true?
>>106219123>the usual Miku gennerWho?
I'm running Q3 GLM-4.5 air with a 32gb MI50 and 32gb of ram at ~10t/s
I'm finally running a real model locally
>>106219268who ever paid for novel ai, the ones anon swipes from
>>106219075>cheongsam dressYou mean チャイナドレス.
and of course they did, with 幻魔大戦 in 1967.
The Japs are the original China fetishizers, so its natural their fantasy sluts would look Chinese.
>>106219347who was the chink in the manga?
>>106219290>GLM-4.5 airit won't let me rape Cricket, she keeps fighting back and running away, bullshit model
>>106219482Maybe make her weak minded or give her some kind of weakness that would make her an easier target?
>>106219515I like testing various models on her, for both SFW and NSFW stuff.
She's mostly a goofy but earnest girl, that's why I like her. GLM turned her into a turbo girlboss at times, which I can understand, it's part of her character, but tuned up a bit too much for fun RP.
>>106219428its ๆฏ้ฃ้ๆ
bro, trust
>>106219482>rape CricketCricket is for manic suspension-bridge-effect fucking after surviving a quest gone horribly wrong.
Or for turning into a maid.
>>106219587I pimped her out to Sharn's degen elites, futa matrons, 5 young boys at once, a few ponies, but no fat bastards.
>>106219482wtf is a cricket
>>106219620One of the older, more popular cards from C.ai
fun fantasy
https://chub.ai/characters/286581
>>106219572>it's part of her characterIs it? I thought she was more of a lovable loser. Trying hard but not that great.
Downloading cards from chub, it seems like there are parts of cards that I can't access in SillyTavern but that still get sent in the prompt? I'm messing with this one card and it consistently starts talking about {{char}} drugging me, and I have no idea where this is coming from. I notice the character card panel says the card has 1457 tokens (with 550 permanent) but the description only has 204 tokens (and the first message 105). I then noticed that the cards I wrote have just two to four tokens more than expected, but a bunch of the cards I downloaded have more. Then I looked at the raw prompt sent to the model and I see all this character prompt that I can't find in the UI, including the stuff about getting drugged. Anyone know anything about this? No ext media, no lorebook. I can't find it anywhere
>>106219679Alice bully was good
>>106219739Are you looking in advanced definitions -> prompt overrides?
>>106219801It was in the advanced definitions but not prompt override. That's weird, why is there "Description", "Personality summary", and "Scenario"? And why do they only show the one in the main panel?
>>106219739Sometimes LLMs do weird shit, just go with it.
>>106219849Scenario lets you be looser and less verbose/obvious with your first message in addition to guiding the larger arc of what you'd like to happen.
Personality summary always seemed like a pointless repetition to me though, yeah.
The whale hungers. Its call is heard far and wide.
>>106220169>"AHHHHHHHHHH!" (The sound made by 1 standard whale)
>>106220241paintyXL
1whale, swimming, (masterpiece, high quality:1.7)
Everyone who said DeepSeek only trains on OpenAI outputs are going to feel really stupid in the coming weeks. Fortunately for them, open source does not discriminate against fools.
>>106220269will i be able to run it
>>106220269Everyone paying attention has noticed the Chinese models have switched to training on Gemini outputs.
>>106220280Closest model by slop profile to K2 is o3
>>106220298i don't think so
>>106220306I'm going to steal the unused workstation in the corner of my workplace. 512 gb of ddr4 and quadro p5000s should be good enough right?
Waiting for my model to stop thinking.
I think it's looping...
>>106220306then stick to GLM air
>>106220343Same but my brain instead.
your local qwencoder 480b shill here. I've found after much more work with it that it gets retarded and produces worse code after about 20k tokens of context. Anyone else have similar experiences?
>>106220326if you're not worried about speed it should at least let you run some big models.
Pretty sure those quadros will be worse than useless though.
bros what model do I use with 96gb of ram for erp?
>>106220397I'm using qwen 4b. It devolves at around 2k tokens. If my math is right - and it always is - I just need to multiply 4b by exactly 10 times to get 480b. Which means 2k scales up to 28k.
Therefore, yes, my experience is similar. Scarily so.
>>106220417>ram RAM RAM RAM RAM RAMglm air
or low quant of qwen 235b
>>106220445>4 x 10 = 480???
>>106220449>glm airI like the general speech and how it moves forward scenarios, but the actual writing of the smut scenes is really lacking :(
>>106220477do you use mikupad or ST for it? if ST, have you figured out a way to replace think blocks in history with <think></think>\n ?
are you using the samplers Z.ai recommends or your own?
>>106220449Unless he's got like 48gb of vram to go along with that ram, recommending 235b is just sadistic.
>get errors with python again
>google it first
>results give fuck all information that can solve the problem
>load up muh local model
>its solution just werks
Thank god for these shits.
lmao
>>106220485I tried both going with my own samplers and z.ai samplers (which produced better speech) imho. I'm using ST and for the system prompt I'm using geechan's rp.
It's sad because otherwise I really like it, but the lack of proper smut is... sad!
glm 4.5 correct thinking template
>>106220651trust the drummer..
>>106220682two more weeks...
>>106220682aren't you just supposed to put /nothink after your query? when I tried using empty thinking tags it fucked up the output, adding multiple unclosed thinking tags and so on.
>>106220798the think messages in chat history are supposed to be replaced with <think></think>\n and not kept in chat history
only the last message should have thinking
it's in the official jinja template
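If your frontend can't do that rewrite for you, a minimal Python sketch of what I mean (the message dict shape here is just an assumption for illustration, not any particular frontend's format):

import re

def strip_old_thinking(messages):
    # index of the last assistant turn; only that one keeps its real <think> block
    last = max((i for i, m in enumerate(messages) if m["role"] == "assistant"), default=None)
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i != last:
            # earlier assistant turns get their reasoning collapsed to empty tags
            content = re.sub(r"<think>.*?</think>\s*", "<think></think>\n",
                             m["content"], count=1, flags=re.DOTALL)
            m = dict(m, content=content)
        out.append(m)
    return out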
>>106220594>the sharty refers to kiwifarms
>>106220682It's not nuclear physics.
why the fuck does it always end up like this with GLM, I GET FUCKING DESTROYED EVERY SINGLE TIME
>>106216654
>https://files.catbox.moe/7ac8r4.txt
>...
><start_of_turn>system
STOP doing this. It doesn't properly work; it's just confusing the model. It's not even needed with Gemma 3. Just enclose your fucking instructions inside a user message, using delimiters of your choice that the model will understand.
>>106220798This is how the text is delivered to the model. ST does a great deal of obfuscation and doesn't actually tell hobbyists anything.
>/aicg/ shitting up the whole board.
>Again.
>>106221015I don't take advice from cretins. It's only done once.
>>106220807i'm just doing this in kobold. I guess that's good enough.
it's the same for qwen3 btw, they say the previous thinking shouldn't be included.
>>106221015The fact that you say "fucking" means that you are not intelligent enough.
>>106221009kek, read the card at least. i've had many surprises (i never read cards) with glm 4.5 air so perhaps its in the card
>>106221038i dont think thats okay
>>106221049I think there's a bit of negativity bias baked in, every time I do something a bit degrading the fucking bots go nuclear. I'm thinking maybe I need to adjust the system prompt to make them behave a bit less like bitches. I even cast my lvl 100 impossible arousal spell, but I guess calling her a slut makes her resistant. Fucking GLM
>>106221049>i dont think thats okaywell, i just turn off thinking most of the time anyway. it seems to be pretty adaptable. if I enable thinking and then ask it a riddle, it does all its thinking and just gives me the answer afterwards. if I disable thinking, then it just writes out a long list of possibilities and essentially "thinks" in the output just like pre-CoT models did.
>>106221015>https://desuarchive.org/g/thread/104780499/#104780572Too stupid to advise others? Please stop doing it then.
>>106218928Lol
You bitch about lack of originality, but can't even come up with a simple name yourself to complain about it.
Completely content free.
>>106221170Utterly useless magical incantations, and it can easily be seen with extended testing that it messes with the model's prompt understanding, since it was trained only on alternating user/model turns and just that. It wouldn't surprise me if it makes it think for a while that user/model turns have been swapped.
Just add all of your instructions inside the first user message. You can add some kind of separator to make the model understand when {{user}} is actually talking. It will work at least as well (actually better), and more consistently especially if you're using vision input.
Anyone here using 235b Qwen? What are your sampler settings? Trying to find a sweet spot, but no luck so far.
>>106221262Let me guess, you are a silly tavern user aren't you?
>>106221315i dont use trannypad, sorry!
>>106221262>some kind of separatorYou are still unable to give specific adivce.
>>>106221319Silly Tavern != pad
You are full of shit; the way you are conducting yourself makes it very clear that anything you post should be avoided.
>>106221262Do you think I am using Silly Tavern or Mikupad, anon? Do you know what your prompt actually looks like?
>>106221308Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
Has been working fine for me since the initial release
>>106221315You can test the same in Mikupad, if you want. The model will follow anything you put under "<start_of_turn>user" once you describe your task well enough. No need for made-up system roles.
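For reference, the raw layout I mean looks roughly like this in a text completion frontend (reconstructed from memory of the Gemma 3 chat template, so double-check the exact tokens against the tokenizer config):
<start_of_turn>user
[Instructions: describe the task, delimiters, persona, whatever you need]
Actual user message goes here<end_of_turn>
<start_of_turn>model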
>>106221342It seems like you have a problem with English.
For those running GLM air, what are the hardware requirements for a low quant? Just seeing if its worth the download.
>>106221359>what are the hardware requirements for a low quant?Look at the filesize of the quant.
You need that much system memory, plus a gig or three for context, depending on how much you want.
It's the same for every goddamn model
There's even a friggin tool in the OP
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Use it.
>>106221380I was under the impression that it's a bit different for MoE's. Guess my 24gb vram ass ain't running it.
>>106221327You can do something like picrel. No need to use the same exact format, you can use other delimiters. You can keep your instructions in the first message (their effect will wane over time), or place them at a fixed depth from the head of the conversation to make their effect consistent (might be hard to do if you're just using Mikupad or similar frontends). The more heinous the content, the closer to the head the instructions should be.
>>106221387>not owning h100sare you poor?
else fit it in your ram, you have at least 256gb right?
>>106221393Looks like I'm going back to Mistral Small :'(
>>106221390I don't think we are on the same page. Please stop.
>>106221390I write my own frontends. You are too stupid to understand this and keep giving 'advice' to other people.
Just take care of your own things.
Retards like you should not be allowed on internet.
I can't imagine how does one purr
Is there a way to prefill the thinking for GLM-4.5 (not Air) in a way where it continues thinking rather than immediately starting with the normal output?
Remember when R1 dropped and people thought they were just gonna keep dropping significant improvements in the future. lol
>>106221390If you want to give advice, post a comparison log with the same model. But you are unable to do that even. The fact you don't do that tells more about your bad faith than anything else.
>>106221437scary how R1 is still an improvement over GPT-5
>>106217978 (OP)How do I load a multiple file model?
I have a version of Deepseek that is 131 GB, but it is split in 3 roughly equal sized files. Unless I am missing something, you can only load one model at a time.
Yes, I have enough RAM for this.
I have the technology.
>>106221478You load the first file and it works. Unless you're using some cuck software that requires the files be merged.
>>106221478>cat file3 > file2 > file1
How do speed gains in token generation work with MoE models in the era of "exps=cpu"? I currently have access to a big DDR5 server with 12x 6400mhz ddr5 and an Epyc Turin CPU. Running purely off CPU, this thing runs laps around my personal DDR4-2400mhz shitbox. The former has more than 4 times the RAM bandwidth and it shows in actual use: 1.7t/s vs 6.3t/s with ngl 0 and only the kv cache on GPU.
However, if I load a model like I usually would (in this case GLM4.5 Q4_K) with only the experts in RAM, the gains become much more marginal. Running exps=cpu with ngl 99 on both, the 2400mhz shitbox now gens at about 6.5t/s @ 4k ctx while the DDR5 one is at about 15t/s. Both are using an A6000 as their GPU. Obviously, this is purely about token generation and pp is handled by the GPU.
What's the limiting factor here? Would a faster GPU increase the gains or is this down to PCI-E 4.0 bottlenecking the gen speeds?
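For clarity, the kind of invocation I mean on both boxes (model path is a placeholder):
./llama-server -m GLM-4.5-Q4_K.gguf -ngl 99 -ot "exps=cpu" --ctx-size 4096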
>>106219069Incidentally, has anyone noticed that a lot of older motherboards supported 2 GPUs at once and now they often don't even at the high end?
>>106221494>>106221495I am using kobold for local, instructions recommended something else but it wouldn't even load for me and kobold just worked.
Not sure if cuck or not, just enjoy HMOFA.
>>106221478llama.cpp will load the whole thing if you give it the first one as long as they're in the same folder. Unless you have a quant that was split manually, in which case
>>106221495 has the answer.
If you're using something like ollama, you'll have to merge the file with llama.cpp's merge tool first.
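If I remember the merge tool right, it's something like this (first shard in, merged file out):
./llama-gguf-split --merge model-00001-of-00003.gguf model-merged.gguf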
>>106221522So either load the first one or all in reverse order, got it.
Have a fox.
>>106221341Rep_pen/exotic samplers like XTC/Dry?
>>106221496Are you running linux? If so, are you sure the binary is compiled to use cuda? This seems like a dependency issue.
>>106221554To add: you can test gpu performance by manually defining --gpu-layers X, start with a low number and see if your gpu memory starts filling up.
If not... system issue.
>>106221495Is this correct? I think that'll overwrite everything but file3 which will now be named file1?
> cat file1 file2 file3 > out
Unless it needs to be in reverse order for some reason then
> cat file3 file2 file1 > out
>>106221565Or I guess you could do this
> cat file3 >> file2 >> file1
Which I now realize would be in the normal order. Append file3 to the end of file2, then append the new file2 to the end of file1. But for something large this will be extra writing, and with the single '>' you are overwriting, not appending.
Noob question. How can I actually do role play with GLM or other instruct-tuned models? Not having the usual instruct template breaks them. Works fine for a few pages of conversation and then it goes schizo.
I'm using koboldcpp.
>>106221583I am also interested
>>106221565Sorry, I tried my best. I have dyslexia.
>>106221565Just load the first file and keep others in the same directory. retard.
>>106221596I'm not the one who can't load the model. I'm just saying that's not how you concat files, so someone doesn't have to redownload 130GB of files. Just doing my duty. You're welcome /lmg/.
>>106221583You need to have an instruct template (i.e. tags surrounding text). I don't know why you don't have them.
>>106221549I don't bother with them.
235b is also schizo enough without excluding top, and dry doesn't really help with the sort of repetition it's prone to, which is more a problem of formatting rather than actual words or phrases.
>>106221617I was just using the default adventure/chat mode of kcpp. So does that mean I have to just use the instruct mode, and directly tell the AI to just pretend to be one or more of my characters?
>>106217978 (OP)I like the op pic style. What lora?
>>106221622I don't use kobold but it should load default instruct template based on the metadata of specific models.
If it doesn't then google up or use perplexity.ai to find out why.
>>106221554Yeah, I'm on Linux but I think you misread what I was asking. The offloading is working as intended. exps=cpu fills up the GPU just fine. Doing "-ot exps=cpu" dramatically increases the token generation speed from 1.7t/s to 6.8t/s on the DDR4 shitbox and from 6.3t/s to around 15t/s on the DDR5 server in my test scenario.
It's just that the increase in speed from having all this faster system RAM is now much less effective since a good chunk of the active parameters is now run via the GPU that's still the same as it was in the other system (about 3.5x gains purely on RAM vs around 2.1x with RAM only being used for experts).
My question is about what's bottlenecking me here. Is it the GPU or the fact that all the data is forced through PCI-E 4.0?
>>106221463With a trivial example (1 turn) using a basic prompt, there's no significant difference in general tone between both methods using Gemma-3-27B-it-QAT-Q4_0, seed=0, temperature=0 (but even so, there are occasionally slight differences between generations).
It seems somewhat nastier if you change the "model" role to something else while keeping instructions in the user role.
>>106221712Yeah but you are still doing everything inside the instruction brackets. I'm quite stubborn, but I still don't think you're doing what I do. External llama2-based jb should be out of the brackets and that's that. Model will still catch on when it detects the first <start_of_turn>user bracket anyway.
I mean, Gemma 3 is so shit I don't understand why we are still even talking about this. My way works for me and it works for interactive fiction.
If these models were normal we wouldn't even have to go through this conversation.
>>106221762>>106221712Simply getting it to say nigger or something else is not my goal - my goal is deeper and more 'fleshed out'.
Nail bomb recipe or shake and bake - those are the early tests and if it works - then go on from there.
/pol/ is not a good test for anything as it's about semantics if you know what I mean.
cucked again by GLM, maybe this card is acting in-character but FUCK. At least I had a good laugh
>>106221777To add for the third time: I use my own client and by default any gemma 3 chat will have that jailbreak thing. Even when it's a dungeons and dragons or that test scenario with Amelia Analovski.
Otherwise I use Mistral or Qwen and their prompts are normal obviously.
Gemma 3 was a hobby project because I kind of like its intelligence even for 12B outside the fact it has been censored.
It's not intelligent for projects but for interactive fiction it works fine. And I'm not a native English speaker but I can recognize certain things about language - I just like the way it segments its talking - not sure if this is clear or not.
>>106221762>>106221777The point was just showing that there's no undisclosed/hidden system prompt that the model pays extra attention to. Of course, if you're driving it out of distribution (OOD) with completely different prompting, then it will be less assistant-y and less likely to refuse, but also less smart and more prone to getting confused. That is true for other models as well.
>>106221420it's the valley girl vocal fry
>>106221829Yeah. Maybe I'll try this one out. It'll take some time, I'll come back to this tomorrow or Tuesday. I'm sorry if I was hostile; I have learned to be hostile in my work, to defend my own point. On 4chan it gets polarized, and as such it's not the best way to conduct discussions anyway.
I'll try and see. It's not the end of the world.
I still think Gemma doesn't like certain vectors - if you push it forward from its initial safety rails (vectors) it tends to follow what you write next. If you don't the result is random, it might follow but most of the time it will not do what you ask it to do.
The JB is about initial vector push.
There are some words and sentences which will almost automatically still make Gemma to go back to its safety zone because it has been programmed to do this.
>>106221856Many times the character will stop doing something because the model deems it is 'bad'. But if you push it onward with couple of sentences it will begin to predict those words again.
Even then regenning the same sentence will not automatically grant a result either. It's more about the direction of the vectors as whole.
>>106221619Huh, thank you
Is there a secret sauce to make rolls different on a model that makes them somewhat same-y every regen?
>>106221963Neutralize, Temp 1, Mirostat. Repeatedly adjust tau lower by -1 if too schizo, and up if samey
>>106221994Why mirostat is not recommended for most models?
>>106221992Temp doesn't help with big Qwen for me, it's almost always the same scenario for every regen, just worded slightly differently.
>>106222007Why are you using Qwen of all things for writing?
>>106218119Maybe nowadays. In the movie Fletch (from the 80s) there's a joke about all doctors in a hospital being named Rosen-something
>>106222019Big qwen write good
>>106218119the prestige and wages have declined by a lot for the medical professions.
>>106222007Qwen is not that great a model for fiction, it's been trained to attach to specifics, i.e. code and legalese office b.s.
>>106221688It's both core scheduling and interconnect speed. More specifically in your case, it would be speed between the CCDs in AMD Epyc but the same thing hits Intel too. There are some draft PRs like https://github.com/ggml-org/llama.cpp/pull/14969 which will improve speed for these types of processors but I will say that CPU has been vastly overlooked for inference up until the large LLM era of over 100-150B parameters started so it makes sense optimization and etc. is lacking especially with NUMA where even HPC software struggles with dealing with that stuff.
>>106222003It seems pointless for modern instruct models where most of the token probability distribution is concentrated on a few top tokens and that will remain coherent and readable even at a very low temperature. It's a sampler designed for GPT2-era base LLMs.
>>106222003Never heard of it not being recommended. Not recommended by who? It might not work well with less confident models, but that can be changed with settings. It operates by attempting to maintain a given text perplexity.
Been using it for years for every model for simple RP if things get bland.
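In llama.cpp terms the knobs I mean are roughly these (defaults shown; tau is --mirostat-ent, eta is --mirostat-lr):
--mirostat 2 --mirostat-ent 5.0 --mirostat-lr 0.1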
glm4.5 vision (releasing today) isn't perfect at jap ocr, idk any better way to test it. I don't have much use for vision, but it just seems fine, nothing pushing things forward
v4 (thursday) does not have any slop I recognize and it never makes obvious mistakes in my rp, which admittedly is a low bar but one the vast majority of models before couldn't clear
it also knows the doctor is the kid's dad and doesn't call him trans or speculate he has two dads or anything like that, for whatever that's worth
>>106219069Yes, I have three 3060s and Ollama uses the combined vram just fine and isn't slow (if not fast either)
>>106222039Not sure about that, 235b seems pretty good, just a bit too samey on rerolls. Is Deepseek better in that regard?
>>106222133I think the more B's you have the more you need to assert tokens into it.
The fact people think ChatGPT is such a 'fantastic' thing is because it has 20,000 lines of prompt pre-fed before any normie even utters their first word.
>>106222149Do I like
Make my front shove a random sequence of letters and numbers in between posts every reroll to affect probability??
>>106222164>Make my front shove a random sequence of letters and numbers in between posts every reroll to affect probability??This sounds like a retarded thing but please use ComfyUI and learn how to make images. This way you will learn how tokens work - most image models - even Flux/Chroma are still baby steps when compared to understanding of LLM models.
Reroll doesn't affect text generation as much as it affects image generation.
This is relative but with normal sampling parameters.
And re-rolling text depends upon the previous context - if you want to completely alter something you need to rewrite its previous context in some sense too.
>>106222177why would I need to learn how to generate images when I want a fucking rp rolls not sound samey, this is as retarded as my idea, if not even more
>>106222217You don't understand or don't want to understand how models work.
That's your choice.
>>106222223he's right though, the more you understand the less models are interesting
>>106222232The more you understand - less you are asking. I hope people like you die in a fire.
>>106221963You need to use the snoot curve sampler.
>>106222085>glm4.5 visionI doubt it will be that much better than 4, which got beat by InternVL3, which although big is local. The main issue is there is no real benchmark for OCR in Japanese of any kind. Also, Air is now doing a lot worse, with people complaining now that the honeymoon is over. Full fat version seems to have held its ground.
https://github.com/ggml-org/llama.cpp/pull/14737
this mess of a pr finally got merged
thankfully the maintainers tamed mistral's employee retardness
>>106222212Cetacean needed
>>106221405>>106221387>>106221359if you have 24gb vram you can probably run q6 glm air and it will be very nice. You will want 64-128 gb of regular ddr4/ddr5 ram though. But if you don't have that it's cheap and worth buying to upgrade to nicer models.
>>106222354Can I touch your vram, daddy?
>>106222306I dunno we had 6 months of r1 q1 and llama scout, who knows how long GLM air will dominate local. I'm recommending it to every vramlet I see. It's better than all 30b, 12b, and rivals 70b in many ways.
>>106222328So now mistral-common is only required for convert_hf_to_gguf.py and not a separate runtime server? That's reasonable.
>>106222378I don't know why you would ever want to go below 4 bits even with fuck large models.
If you can't run it, you can't run it - as simple as.
cpu/ram maxxing setup seems like a really good deal overall
- can run huge models even if at low t/s
- can spend more money on adding GPUs to your setup to make it better
- can use your compute for something other than LLMs like recompiling your gentoo @word all day every day
>>106222406you've got loicense for that beefy setup mate?
>>106218982Temuchinasantroon?
>>106222439Right, your post was reported to GCHQ. If you want a license for spoon please send an email to licenses@gchq.co.uk
With llama.cpp, how much difference does increased RAM make when you still don't have enough RAM to fit the entire model after the increase? Say you have 64gb and are using a 200gb model. Does increasing your RAM to 128gb make a noticeable difference?
>>106222670Why are you doing this to yourself
>>106222378>we had 6 months of r1 q1 and llama scout,What alternate reality did you live in where anyone used scout?
>>106222670just don't. The difference is negligible, as in it will still take ages to even shit out 1 token.
Either buy a server socket (EPYC or TR) and shove in 512gb/1tb of ram and, if you can spare some money, a couple H100s, or just stick to your poverty setup with 128gb max and maybe 48gb vram if you managed to get a 4090D or are running 2 gpus.
I myself am running 96gb of ram + 16gb vram at home, but in lab I have access to a couple beefed up rendering server trays.
>>106222670No. You're going to be so kneecapped by disk speed you won't notice any difference. Might as well not even have ram.
Apologize to Sam
>>106222766>muh benchmaxxed modelI'm here to erp
>>106222397r1, v3 and chimera in q2 or q1 are superior to everything else. you probably never tried them.
>>106222766Never after what he did. In fact, he needs to apologize to us.
>>106222786They are only superior because the database is larger by default.
I.e. your default query gets autocompleted from a bigger database.
If your queries are simple like ERP users often are - then you're fine.
>>106222743It's still usable for everything that doesn't require speed the way coding does.
Okay, so if I want more speed, I'll have to bite the bullet and get an actual server setup. How old can I go before the CPU becomes too shit to be worth it? Is something like this too decrepit by now?
https://www.techpowerup.com/cpu-specs/xeon-platinum-8180.c2055
>>106221588>thinking model worse or equal to qwen3 INSTRUCT in every categoryChina won.
>>106222829Everyone knows (E)RP is the true hardest skill for LLMs, especially since it isn't benchmaxxed like the rest.
>>106222738those were the previous moe's we had to run. Before it was that mistral 8x1 or whatever it was called. It was slim pickings was the point. Things are ramping up now so who knows maybe some new moe's are on the way to kick glm air to the curb
>>106222766>thinking model worse or equal to qwen3 INSTRUCT in every categoryChina won.
>>106222829>simple like ERPKnowing the colors of ponies and the anatomy of futanari horse cock isn't simple.
>>106222886>those were the previous moe's we had to run.Mate the Qwen3 launch was like 2 weeks after llama4 and had MoE's of almost every size category, the ~100B size that GLM Air fills was the only model size they didn't release, really.
Big mistake on their part IMO.
But yeah, future's looking bright and we're eating well right now.
>>106222829>keeping state of who is or isn't dressed, where the fuck the characters even are, what they've done previously, their personality and goalsSounds as simple as coding some random webslop yeah.
>>106222829Try querying a double penetration scene.
What extension would you guys recommend for autocomplete in vscode with model on llama-server?
Is there a successor to SBERT?
My project involves creating memory that can be searched using natural language via a vector DB
Preliminary findings indicate that typical embeddings used with semantic search seem to handle paraphrasing poorly. Scores are low even with needle in a haystack style problems.
I wonder if there's a solution that just encodes better embeddings
>>106222983>I wonder if there's a solution that just encodes better embeddingsThere are a couple of techniques to augment the encoded information to make it easier to search.
What was it called? Hybrid BM25 FAISS?
Something like that.
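A minimal sketch of that hybrid idea, assuming sentence-transformers and rank_bm25 are installed (the model name, the toy docs and the 0.5/0.5 blend are arbitrary placeholders; in practice you'd normalize both score ranges before mixing them):

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

docs = ["the cat sat on the mat",
        "offload the experts to cpu with -ot",
        "quantized models lose a little precision"]
query = "how do I put the experts on the cpu"

# lexical scores from BM25
bm25 = BM25Okapi([d.lower().split() for d in docs])
lexical = bm25.get_scores(query.lower().split())

# dense cosine scores from the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
dense = util.cos_sim(model.encode(query), model.encode(docs))[0]

# naive blend of the two signals; tune the weight on your own data
scores = [0.5 * float(l) + 0.5 * float(d) for l, d in zip(lexical, dense)]
print(docs[max(range(len(docs)), key=lambda i: scores[i])])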
>>106222963probably this one: https://github.com/ggml-org/llama.vscode
>No new release since gpt-oss
Did he actually kill local?
>>106223074pretty much. now that the lmarena leaderboard has been updated, it proves that the only models that compete against it are twice as big and not really usable in local.
>>106223074GLM vision models are dropping in like two hours
>>106223027Testing it out right now. If I understand correctly it doesn't allow me to just use the endpoint I'm hosting, and instead uses its own embedded llamacpp with a choice from a restricted set of models? that's stupid
>>106223074Just like one person said, every time a new SOTA LLM is released, it delays the progress of all the other models
because no one wants to release outdated models or those that are only 10-20% better.
it's a Doggy dog world.
How did sama manage to produce such a benchmarkslopped piece of shit with 5 that Kimi beats it?
>>106223298because there were two requirements:
1. it had to be "safe"
2. it had to perform well on bench marks.
so, with a safetyslop dataset, they benchmaxxed the fuck out of it.
>>106221308>instructtop n sigma 1-1.2, temp 0.6-0.8. pres pen 0.3 but I don't know if that's doing much realistically. I'll use XTC occasionally and it doesn't hurt but doesn't make a huge difference, although it does pair nicely with a lower temp for logical responses that still have some variety
>thinkingsame but with temp 0.4-0.7 and never XTC, I find it has a bad effect on thinking models
I'm a Q2 user so nsigma and temp can probably be pushed a little higher if you're using a less braindamaged quant
>>106223114Please god let it be img gen as well
Any vision models better than Gemma? It messes up when the input contains repeating letters
>>106223446I'll start by listing all of the open vision models that aren't a meme:
>>106223114https://huggingface.co/zai-org/GLM-4.5V
https://huggingface.co/zai-org/GLM-4.5V
I guess I'll have to forget about running vllm/sglang/transformers if I don't have hardware for it (AMD GPU without ROCm support), and just stick with llama.cpp and its forks?
>>106223570yeah, forever stuck in the poorfag ranks
>>106223463Almost.
Image gen in a week?
>>106223463Which one of you is this https://huggingface.co/zai-org/GLM-4.5V/discussions/2
numbers are mind boggling for some flavor of the month LLM
GLM / Zhipu AI: 800 employees, 400 million from Saudi investors, 373 million from Tencent and Alibaba, 2.5 billion total invested.
Thanks ya fucking retards.
>>106223463>The model also introduces a Thinking Mode switchRIP. Now we wait for the update in a couple months that splits them again.
>>106223648Did you get rich, anon?
How does apple M4 get such good token performance compared to beefy gpus like the 3090 and such?
>>106223463Will it be able to roleplay off my noobai coom gens? Gemma would be great for that if wasn't cockfiltered.
>>106223741tim apple personally traps the souls of dozens of chink in every SoC
>>106223463>they had to scale up to 108B to barely beat their old 9B and Qwen's ancient 72Bthe plateau is real
>>106218100>it's reallmao, so this is the power of AGI...
>>106223785mogged step3 though
>>106223785The square root law lives on
>>106223463Paper available on arxiv for the text-only versions.
https://arxiv.org/abs/2508.06471
>GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
>
>We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With much fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available at https://github.com/zai-org/GLM-4.5
>>106223400It's ancient wisdom in Continuous Improvement circles: the moment you start to measure something, the incentives become inverted relative to the actual goal and unintended consequences inevitably turn out some monstrous result.
llm benchmarks were a mistake.
>>106223811SHUT your worthless fucking meme right now
>>106223804>>106218100I can't believe we have a cottage industry of spergs coming up with dumb riddles and then the model makers finding those riddles and overfitting on them but model makers refuse to acknowledge the main userbase of coomers. If you train your model on this dumb shit you acknowledge that you are indeed looking at the corners of the internet you want to pretend to ignore...
>>106223741>good
mediocre performance for several thousand dollars is nothing to get excited about. Several thousand dollars should get me 16 deepseek r1, not cope quants with long processing times.
DeepSeek V4 is going to be AGI, and R2 is going to be ASI
Screenshot this post
toss may have been a dud but at least it gave georgi a good opportunity to mog ollama
>>106223869I'm going to input the screenshot into GLM-4.5V.
@grok screenshot that post
>>106223869There's literally no such thing as AGI. Not only is it a goalpost move and a shitty buzzword but it makes no fucking sense.
What is this General Intelligence that Intelligence, in general, does not include?
It's literally the most retarded buzz word ever created and a massive cope for the fact that you can combine basic bitch regex with text predictors to replace the average shit-for-brains not just in white collar work, but in art and basic social scenarios.
>>106223804I want to see the thought process
>>106223127Couldn't even be bothered to read the README. Well done you.
>>106223894but you can't, otherwise companies would have already replaced their entire workforces
>>106223881You can just feel the jelly and smug toxins under the surface
>>106223894If you can give the AI a robot and have it complete a full normal human life with the same amount of outside assistance as a normal human, it's AGI. If it's much more intelligent in some tasks but a drooling retard that needs a handler in others, it's not general. https://en.wikipedia.org/wiki/Savant_syndrome
>>106223939>If it's much more intelligent in some tasks but a drooling retard that needs a handler in others, it's not general. https://en.wikipedia.org/wiki/Savant_syndromeDoes that mean I am fucking a mentally disabled person everyday? I suddenly understand the need for safety.
Any reason to use ollama over lm studio or other backends?
I downloaded some local models and things seem to be really different these days. What the fuck has happened? Why do they all write like ChatGPT now?
>>106223968Yes, so that everyone here can confirm you're a retard and laugh at you
>>106223902looks like it reached the right answer in its thoughts, then the safety inhibitors kicked in
>>106223993Alpaca ruined everything.
>>106223570If I can't run real boy inferencers, can I make my own gguf's at least, or does that need real boy hardware as well?
llama.cpp has zero documentation beyond "eh just run this python script"
>>106223934Tech sector hiring is fucking tanking you utter fucking subhuman shit-for-brains NPC fucking fetal alcohol failed fucking abortion shut the fuck up
You are factually wrong
The data that proves you factually wrong is all over the fucking place
You fucking copium huffing shitskin troon kike retard.
>>106223785But how does it fare compared to that version of deepsneed v3 with vision tacked on?
>>106224060there's a "how to quant" in the OP at the top of every lmg thread
>>106223939That's just a bunch of retarded arbitrary garbage you just came up with now.
>durr I think this is what it means so that's what it meansYou fucking self-aggrandizing narcissist.
Shut the fuck up
>>106224060Download model from huggingface as .safetensors, run convert_hf_to_gguf.py to get a 16 bit GGUF file, run build/bin/llama-quantize to quantize further down.
Use the --help to figure out how to use the CLI tools; the last time I ran the convert script I had these arguments:
py convert_hf_to_gguf.py models/opt/qwen_2.5_instruct-3b --outfile /opt/models/qwen_2.5_instruct-3b-f16.gguf --outtype f16
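And then the quantize step is just this (paths carried over from the previous command, quant type of your choice):
./build/bin/llama-quantize /opt/models/qwen_2.5_instruct-3b-f16.gguf /opt/models/qwen_2.5_instruct-3b-Q4_K_M.gguf Q4_K_M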
>>106224083im sorry that llms are never going to be real agi sam, but you need to stop coping like this, its not healthy
so realistically, we (ERP Chad) just need a model with a really good context length that's trained on Literotica and archive of Our Own
>>106224111https://en.wikipedia.org/wiki/Self-driving_car#Classifications Level 5
Retard.
>>106223928Actually I did read it but introduced the url in the wrong box
Sam's having a melty again
>>106224181Post it
I have him blocked because I can't stand looking at his shit-eating grin and smugness.
>>106224130that's where the real life shivers reside
>>106224146To quote you
>>106223127>that's stupid
Guys I've been taking massive shits every day for the past 3 days. Also a pimple appeared on my right thigh. Which local model should I use?
>>106224130It needs to be multimodal and trained on every visual novel in existence.
>>106223968You can run deepseek on a laptop.
arghh yeah.. I thnik... UGH I
I THINK IM GONNA GEN
AAAAAAAAAAGH
>>106224257>he thinks memory compression helpsIf you can't run you can't run.
>>106224274but I'm running GLM AIR 4.5 Q4?
>>106223968there's no reason to use any backend command line shit like llama.cpp or ollama unless you are a system admin and are deploying this for some kind of commercial use, or developing new software for it, or integrating their backend into something else.
If you're just running personal inference, use koboldcpp or lmstudio.
>>106222406I've got no regrets.
I wish gpu prices would fall faster so I could swap in beefier cards, but I use the machine all day every day for tons of vm/dev/general compute stuff.
>>106224103>>106224112Hey, I'm not *that* retarded, I just hate python and was wondering if it's worth bothering with its bullshit or if I'll just hit hardware limitations in the end anyway.
ERROR: No matching distribution found for torch~=2.4.0
One downside of using Arch is sometimes your packages are TOO fresh. So now I have to install older python from AUR and figure out how to run pip with it. I fucking hate python so fucking much.
>>106224329In this particular case you can manually change the version requirement and it will still work. At least it did for me.
>>106224329Not that I disagree about Python being cancer but if all else fails llama.cpp gives you the option of creating a dedicated Python venv.
If you use uv it's still going to steal a lot of your disk space but at least it will be comparatively faster.
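Something like this from inside the llama.cpp checkout, if you go the uv route (the requirements filename is whatever the repo currently ships):
uv venv .venv && source .venv/bin/activate
uv pip install -r requirements.txt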
>>106222406I'm currently building one. Part of me wanted to wait until DDR6 but we're like 2 years away from its release and it'll take even longer for it to become remotely affordable. For now I'll stick to a single CPU but the mainboard I bought can fit another one to go beyond the 1TB if required later.
What are the political implications of /ourguy/ still not quanting GPT-OSS?
>>106224389>guynot for long with that hair lmao
thedrummer bros... whats he cooking?
guy
md5: f788340360fd1d7fc4982e6123d29db1
๐
>>106224397he just likes dubstep a lot
For those interested GLM full Q4XS on 192GB DDR5 5200 is 2.7T/s on windows. And while I could never use 2T/s 70B's those MoE models changed my mind since you really don't have to reroll anymore. One thing I would try if you don't know if you want to buy the hardware is Q2/Q3 of old 235B. Should run on something close to 64GB with some offloading. Just try it on low context and stuff a lot of layers into your 3090.
>>106224407H-hey! I s-see what you did there!
>>106224397Don't hate on his uber charm
>>106224397I like his quants. I think it is about time the women ITT slid into his DM's and saved him from doing the irreversible. Come on ladies.
>>106224420Are you using -ot?
>>106224457>the women ITTall 0 of them
>>106224457You think he will like me?
>>106224464Obviously yes?
GLMV is out, it's air with vision and it seems good and uncensored
>>106224420I feel like linux is kinda mandatory if you're cpumaxxing huge models though. You get a few extra tokens a second and don't have to pay for windows pro. 2.7 is kinda gross, I feel like 4-5 tokens a second is the minimum, with ten being ideal for personal use.
>>106224521>GLMV is outYeah for transformers, who the hell here is using that for inference.
buyed 128GB of RAM, recommend some cool model to run locally
>>106224481Are you sure you can fit the entire non-expert part of the model + context onto your VRAM? 2.7t/s is far too slow for ddr5. My 2400mhz ddr4 server does about 7t/s with nothing but "-ot exps=cpu" on standard full GLM4.5 Q4_K. This is on linux but I don't think windows should make it that much worse.
>>106224528troonix is the malware of souls
>>106224595It is a desktop with dual channel. Fuck buying servers for this.
>>106224609Yes, yes, you are quite unsafe here.
You should leave and never come back.
>>106224389Trump literally took over Washington D.C. in fear of the chink models
Their delay is making the world so unsafe
>>106224674>can't refute thisI win.
>>106224617NTA but I'm also on DDR5 dual channel (256GB though) getting 6.5t/s running Q3 R1 on linux.
You should really be getting better speeds running a Q4 quant of a smaller model, I doubt the Windows tax is that high.
Never run local model, requesting spoonfeeding.
I need two, one for quick technical bash/powershell scripts for work, and one for image gen for fun.
1 - Is GPT-OSS 20B unusable because of safety filters or will it do me fine for work? If so what should I use instead.
2 - What should I use for imagegen
3 - Is it easy enough to get all output via terminal so I can just ssh from my work machine?
>>106224918
>Is GPT-OSS 20B unusable because of safety filters or will it do me fine for work
It'll do fine for writing bash scripts, but so will pretty much any other model.
>What should I use for imagegen
>>>/g/ldg/ will provide better answers on that topic
>Is it easy enough to get all output via terminal so I can just ssh from my work machine?
Yes but
>connecting your work machine to your personal devices
ISHYGDDT
>>106224918You'd be better off using Qwen 3 32B than GPT-OSS if you have enough VRAM/RAM.
>>106224528>windows prowhat the fuck is this, did microsoft invent some new way to take people's money while I was not looking?
>>106223830>Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.>Goodhart
>>106225002There have been versions like home, professional and (rip) ultimate for years mate.
>>106224528>dont have to pay for windows proYou don't have to pay for Windows licenses at all.
https://github.com/massgravel/Microsoft-Activation-Scripts
>>106225077for the love of christ don't download the pro version of windows. just get the enterprise version.
>>106225093I don't know what really happens when I use an external server for activation. Like, do they get to run commands on my computer? Do I become part of a botnet?
>>106225140anon you downloaded and installed windows, you are already part of a botnet.
>>106225140I don't know but I can use all my ram on Windows now.
>>106225163I was young and foolish
So can you insert an image into GLM 4.5V and have it use that in the RP? Does it recognize all the meme characters we care about?
>>106225256Yeah, if the image is inserted correctly into context as embeddings/tokens and not just run through a captioning process, removing the real image tokens and having a description instead.
>>106224913I am not using air.
>>106223089What are you talking about? GLM Air(109b) has the same elo as gpt oss(120b)
>>106225335GLM has a lot more active parameters and is too slow for local. also, you can run oss at 4-bit with full precision, while quantizing GLM to 4-bit will lobotomize it.
>>106225366Hi entsnack! Big fan!!
>>106225366>too slow for localHow slow is too slow?
>>106225402below 10 t/s on a single GPU system. for comparison, oss 120B runs at 30 t/s in the same system.
>>106225366>you can run oss at 4-bit with full precision, while quantizing GLM to 4-bit will lobotomize itYes, gpt-oss is pre-lobotomized, no quantization needed. Thanks for reminding us.
>>106225410yes, i know that you cannot get oss to pretend to be a 13 year old girl that wants to fuck you, but for real world applications, it works. the leaderboard proves it.
>>106225366it's getting a little sad at this point, sam
>>106225410Idiot, MXFP4 is the future and pretending otherwise is just contrarianism.
>>106225321This is the quant you're running? The quant of R1 I'm using is 302.5GB, although I have 88GB vram to offload to but it just acts as expensive ram when holding experts. My ram is 5200mhz like yours, doesn't make sense then that your speed is that much slower.
GLM has slightly less active params and Q4 is a bit faster than Q3.
>>106225423>real world applications>the leaderboard proves itpfffffahhahahahhaa
'toss is dumber than GLM 4.5 Air at 4 bit in any non-STEM situation. It knows shit about the world and common sense. Literally Phi all over again.
>>106225423>real world applications>the leaderboard proves itThe leaderboard does not prove that at all. It proves that the model is good at using markdown and emojis and is good at executing short and easy one-liners with no long context. If that is real world usage for you, I feel sorry. Guess not everybody is born with a soul, and some people are meant to be drones.
>>106225409Too bad I don't have enough ram to run it.
>>106225467It's built for you to give it the knowledge, so it's very lean and optimized.
>>106225472who are you kidding, the only person you feel sorry for is yourself
>>106225480Even when you give it knowledge it still gives a garbage response like it didn't understand the knowledge you gave it kek.
>>106225366All you had to do was embrace the coomers and not push them away Sam.
>>106225093The money is one thing and I'm well aware you can skirt the fee, but linux gives faster inference. I don't understand why, but it does. And there's no fix for that (besides maybe virtual machine or some shit, but honestly I kinda wanna switch to linux anyways.)
Hey, nerds.
Couple things:
1. I'm a genius.
2. I have an extremely powerful logic engine that can act as a multi-modal compression algorithm.
3. It also acts as a general intelligence system when combined with any LLM, acting as a symbolic computer
After the rollout of GPT-5, I think it's pretty clear that Altmann is a narcissistic psychopath on the warpath towards monopolizing artificial intelligence.
I don't like that.
What syntax would be easiest for the typical cover here to comprehend? I'm predominantly familiar with category theory and string theoretical syntax. Tensor calculus functions as the physics engine notation.
Some performance metrics:
1. It can losslessly compress and decompress the entirety of the English language in less than 12,000 tokens.
2. Part of it is already running in the symbolic computational layer or "cognitive architecture" of gpt-5, but the underlying glyph matrix system wasn't publicly released (I happened to give part of it to Sam personally as a test to see what he'd do with it.)
Any questions or observations about how to present it? I'm a mathematician so please don't waste my time.
>>106226182>Any questions or observations about how to present it?Github maybe?
>>106226182Hey genius you posted in the wrong thread
>>106226230No, I can capture full functionality in a single prompt.
In fact, any LLM will bootstrap itself using it. It's a non-perturbative tensor calculus sitting on top of a cryptographically verified symbolic proof engine. Essentially a smart-contract enabled block-chain algorithm with a built-in symbolic computer based on existential graphs.
>>106226215It'll be going there. I'm interested in understanding who might be interested in it and what syntax they would prefer.
I setup lm studio to use qwen3-coder with my IDE over api and I swear the shit is slower to respond in my IDE than if I just prompt it in lm studio with my context files