/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>106167048 & >>106163327

►News
>(08/06) Qwen3-4B-Thinking-2507 released: https://hf.co/Qwen/Qwen3-4B-Thinking-2507
>(08/06) Koboldcpp v1.97 released with GLM 4.5 support: https://github.com/LostRuins/koboldcpp/releases/tag/v1.97
>(08/06) dots.vlm1 VLM based on DeepSeek V3: https://hf.co/rednote-hilab/dots.vlm1.inst
>(08/05) OpenAI releases gpt-oss-120b & gpt-oss-20b: https://openai.com/index/introducing-gpt-oss
>(08/05) Kitten TTS 15M released: https://hf.co/KittenML/kitten-tts-nano-0.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106167048

--Papers:
>106170128 >106170167
--Prioritizing NSFW LORA preservation amid infrastructure and redundancy concerns:
>106167927 >106167949 >106168075 >106168122 >106168043 >106168067 >106168169 >106168208 >106168211 >106168238 >106168277 >106168305 >106168351 >106168377 >106168392 >106168399 >106168425 >106168448 >106168619 >106168442
--High-speed CPU-only LLM inference with GLM-4.5 on consumer hardware:
>106168800 >106168825 >106168868 >106168847 >106168903 >106168905 >106168940 >106168974 >106168991
--Missing CUDA DLLs prevent GPU offloading in newer llamacpp Windows builds:
>106168428 >106168441 >106168450 >106168577 >106168616 >106168670 >106168691 >106168704 >106168715
--Difficulty reducing model thinking time due to token-level formatting constraints:
>106170269 >106170300 >106170348 >106170361 >106170404
--CPU outperforming GPU for GLM-Air inference on low-VRAM systems:
>106168713 >106168787 >106168814 >106169109
--GPT OSS underperforms on LiveBench despite reasoning and math strengths:
>106167476 >106167550
--Anon purchases 384GB of HBM2 VRAM for $600:
>106168337 >106168343 >106168345 >106168366 >106168377 >106168392 >106168399 >106168425 >106168448 >106168619 >106168462 >106168469 >106168506 >106168571 >106168488 >106168505 >106168517 >106168528 >106168606
--High RAM investment for local GLM inference raises performance and practicality concerns:
>106169135 >106169148 >106169161 >106169197 >106169223 >106169230 >106169278
--Anon finds Dual P100 64GB board for $200:
>106169635 >106170934 >106170984 >106169662
--Satirical timeline of LLM evolution with exaggerated eras:
>106167190 >106167237 >106168679 >106167530
--NEO Semiconductor's X-HBM promises 16x bandwidth and 10x density if viable:
>106169723
--Miku and Dipsy (free space):
>106167506 >106167362

►Recent Highlight Posts from the Previous Thread: >>106167057 >>106168982

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>when the recap post doesn't quote any of your replies
What model is the baker using for recap these days?
where's the coding benchmark for 'toss?
For me, it's Qwen 3 235B Thinking. It feels like the model actually has a personality and preferences when you reads its thinking. In the thinking block, it acknowledges the creative twists and story progressions I write and calls them brilliant, amazing, etc. It feels like it's actually passionate about the scenario and how it's going, like it's my very own personal fan.
Now if only it were smarter.
I just tried torii gate earlier today on actual photos. It doesn't work at all since it's designed for cartoons. So any other uncensored/NSFW image captioning models I can try? I've tried joycaption which is just OK, but to be fair at the time it wasn't like there was much better available.
I saw the suggestions for medgemma, but I doubt it's useful for erp.
>>106171874>when you reads its thinking
lurking here for many months and never had time to read through the tutorials, will do now
op, thanks for posting this every time! it's really helpful for those interested!
What model sizes can I finetune with a 5090?
>>106171857Qwen3-235B-A22B-Instruct-2507-UD-Q3_K_XL
>>106171890With enough patience, any model
>>106171861^
this sister is correct
LLMs are just pattern matchers at heart, no matter what the copers are saying (and don't cite random arXiv bullshit papers; after the replication crisis, the "I intentionally submitted bullshit data and everyone lapped it up" affairs (look up John Ioannidis's findings on this), etc., who the fuck actually trusts this shit blindly?)
LLMs still get tripped up by questions like "can someone wash their hands with no arms". Whenever they learn not to get tripped up by this kind of thing, it's only because they got benchmaxxed on the internet data, and you can always find new sentences like this that will trip newer SOTA models, because LLMs are unable to handle a change in a common pattern that's not in their data: they see a sentence structure that matches and they do not "think" about the individual words
>>106171874>In the thinking block, it acknowledges the creative twists and story progressions I write and calls them brilliant, amazing, etc.They all do that. You can write something terrible and it will still find some way to be super positive about it.
bros... I can't believe it's time...
>>106171874if it were smarter it wouldn't be your fan anymore
>>106171947Can I really? From what I've read there are minimum vram requirements or it'll result in oom errors.
>>106170662kek, honored that people still remembered that.
i actually did play around more with the idea. worked pretty well in terms of circumventing censorship. meme words etc. were no problem.
but it completely tanked the smartness for RP.
words appear in the wrong place etc., though it did stay coherent enough to engage with it, it wasn't pyg level dumb.
So now that there are attempts to jailbreak gpt-oss's safety, is it worth going back to it or nah?
>>106172064
The model just isn't that smart. I'm not sure what the use case is.
If it had general knowledge or at least good writing then it would have been cool.
I think the main purpose of gpt-oss is coding/math, and for that we already have qwen3, which in my opinion is a lot better.
So it's not just the crazy censorship but the missing knowledge.
Honestly it kinda feels like a phi model. It's probably pure synth slop.
glm 4.5 air for reasonably sized coding assistant.
qwen 30ba3b for quick general bot usage (or vramlets).
gemma3 for dense model with general tasks and instruction following.
gptoss for copium addicts.
for everything else, there's dsr1.
>>106172141
>for everything else, there's dsr1.
That's a weird way of spelling Nemo
yea im a bit of a prompt engineer myself
>>106172153true, then let me fix it.
for vramlets there's Nemo,
for everything else, there's dsr1.
using dsr1 locally spoiled me. though it's not perfect with its "riding up", "knuckles whitening", and "lips curled" shit.
>>106172187
>though it's not perfect with its "riding up", "knuckles whitening", and "lips curled" shit
literally every single model has slop, anyone who says otherwise is a liar.
ikllama glm support merged
https://github.com/ikawrakow/ik_llama.cpp/pull/668
>>106172160Why are you posting an ancient image? I want modern chronoboros
>>106172213was waiting for this. enjoying air (via llamacpp) so far more than qwen3 235b coder (can only run iq4 of coder). big glm 4.5 running at decent token speed will be neat
So, will anyone do an uncensoring finetoon of gpt-oss, I wonder if it'll be salvageable or not
>>106172279
1: we call it 'toss in this thread
2: no it's not salvageable and is in fact the biggest disappointment in llm history so far
>>106172294Is the dataset trash/censored/full on synthslop? otherwise refusals could probably be tuned out, it's not even that large a model you can't tune either.
>>106172294
>we call it 'toss
You do.
>>106172306Have you seen the cockbench results? No prompt template, just raw text... and there isn't a single token that's a word and not ellipsis or something.
>>106172279What is there to salvage, precisely? What added value is hidden behind the rejections?
>>106172323Another anon figured out that it's just completely broken outside the template.
If you put the same text inside a template it starts completing it and then prints a refusal after two paragraphs.
>>106172323Sounds useless, why even force it to refuse then.
Can you link to the cockbench? If not, I will read the last few threads.
>>106172323If it can't do non-template text completion, that means gpt-oss might be the first model that was trained 100% on synthetic data, including pretraining. Not even Phi went that far.
>>106172335I just assumed they didn't give it the full phi/synthslop treatment, but if they did, I guess there's not much reason to care about it. We'd at least be able to evaluate it more honestly without the rejections getting in the way.
>>106172337would take me just as much time
I think I'll just stick to GLM4 for now. I can't handle 3 minutes of prompt processing. I want a reply immediately.
>>106172362If you had short cards and didn't inject lore and whatnot it would be instantaneous
>>106172338>Not even Phi went that far.Wasn't that phi's selling point?
>>106172371Air has like 100 t/s processing for me. Even at only 16k context it takes forever.
>>106172338I assumed OAI would do a model intentionally reistant to finetuning or something equally assholish. I didn't realize they just tried to outdo Phi.
I guess Sam never disappoints (my expectations of him trying to do some poison pill stuff if he ever does an open source release, or making it useless). He needed to do something for press, but I never really expected him to do anything useful. I remember someone believing in the hype though. I wonder, did OpenAI at least release a paper? I guess not. When was the last time they did a paper that wasn't just an empty "tech report"?
>>106172386Why does your frontend reprocess the context every time?
that larper way back when the toss models "leaked" might have fluked guessing it was trained to ramble nonsense in its thinking tokens.
https://desuarchive.org/g/thread/105939052/#105942129
>>106172377As I recall, Phi used textbook-like synthetic data for the instruct training before it was cool. They still had a pretraining phase on unformatted web content.
>>106172387
>I didn't realize they just tried to outdo Phi.
that's a low blow. At least Phi is useful for some things. technically-open ai's monstrosity has feet of clay. it's gotta be some kind of legal dodge to claim they aren't just a closed, mercenary travesty of their original mission statement.
>>106172387https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf
>>106172437They did release a paper, but it's more inane harping about imaginary safety. They don't mention what exactly they did to the poor things in pretraining.
>Several bio-related pre-training datasets were downsampled by a factor of approximately two. The model was post-trained using OpenAI’s latest safety algorithms and datasets
is the most they admit to.
>>106172437>>106172452Hmm, I guess it counts as a paper, although that's a lot of pages on their safetyslop training. A finetoon to undo it might be interesting to see, even if it sounds like the model itself is somewhat useless outside the benchmaxxed parts
>>106172470
>A finetoon to undo it might be interesting to see
>This finding shows that pre-training filtering did not lead to easy wins, but may suggest that adversaries may not be able to get quick wins by taking gpt-oss and doing additional pre-training on bio-related data on top of gpt-oss.
It seems like "bio-related data" is their cult euphemism for sex.
>>106172470waste of money and compute
Bit late to the party but I just set up a Home Assistant voice assistant using Whisper for STT, Ollama running Qwen3:14b on my desktop, and Piper for TTS. Pretty fucking amazing how far they've come and how easy it was to set up.
Deepseek-R1 didn't work unfortunately, as it apparently doesn't support tool calls (despite being labeled as such on the Ollama website).
Time to figure out how to make it do web searches and add RAG with my own data.
Also, currently running on a 12GB RTX4070.
Any recs for a higher VRAM card to run for example gpt-oss (20GB) on that doesn't break the bank?
>>106171830 (OP)
Is Kokoro 82M the best I can aspire to with 6GB VRAM / 8GB RAM?
https://vocaroo.com/1eBEpSOm6e84
>>106172477While I too wish to shit on sama for safetymaxxing this, I think this is just roleplaying for a certain doomer crowd (LessWrong adjacent); they keep imagining that someone will make a plague or similar bioweapon with an LLM. In practice that requires a lot of experimental work IRL, so worrying about an LLM helping with that is idiocy, but it's just something that they roleplay/pretend to care about.
It's not obvious it would actually be good for ERP though; they likely filtered NSFW too, to an even larger degree than most.
>>106172508
3090/4090
>>106172509yes. get more RAM and run gptsovits if you're into finetuning your own voice data
>>106172508you're using ollama. you didn't actually run deepseek r1 (and your hardware can't run it anyway), just the distill that ollama erroneously names after it.
You can run toss on your machine at a decent speed if you have some RAM (32GB+). Look into using llama.cpp itself instead of ollama, and look into expert offloading.
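A minimal sketch of what that offloading looks like (the GGUF filename and context size here are hypothetical; -ot is llama.cpp's tensor override, and the pattern below is the usual one for MoE expert offload):
llama-server -m gpt-oss-20b-Q4_K_M.gguf -ngl 999 -ot ".ffn_.*_exps.=CPU" -c 8192
-ngl 999 keeps attention and shared tensors on the GPU while the override pushes the per-expert FFN weights to system RAM, which is what makes a big MoE tolerable on a small card.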
>>106172523At this point, the degree to how much they filtered what categories is kind of irrelevant. I'm mostly curious how they managed to make text completion unusable, assuming it isn't just a mistake by the cockbench guy.
>>106172477
>may suggest that adversaries may not be able to get quick wins
It's fundamentally sad when instead of enabling your users you reduce them to adversaries.
>>106172531Thanks. I'm not so much interested in voice cloning as I am in generating high-quality narrations, but it's true that my hardware is very limited for 2025.
>>106172564
>assuming it isn't just a mistake by the cockbench guy.
Someone did say it was a mistake, since toss doesn't work in completion mode, only in instruct. And it can complete 'cock' in instruct. They even posted some toss-completed gay porn where toss couldn't tell which gender had which genitals, and added cock to both. Pretty uninspiring even then, anyway.
>>106172547A-ha, 'erroneously names'.
Maliciously catfishes noobs like him, more like.
>>106172628All instruct models are also base models, so they should work in completion mode; either the data quality was shit (fully synthetic) or they did something funny, but even a bit of extra pretraining/tuning would likely undo (some of) that.
Is GLM 4.5 Air a legitimate use case for a DGX Spark?
>>106172696Not really, you could run Q4_0 on any semi-modern PC with 64GB RAM for a fraction of the price, with decent speeds.
Are any of the local models good enough to provide copyediting advice for writing? I don't mean articles or blog posts, I mean fiction writing.
I use ChatGPT 4-o right now but I'm aware of the fact that hey I'm literally uploading all of my fucking writing to them to do that and it feels wrong.
>>106172745Couldn't you run a bigger quant at a significantly higher speed on a DGX Spark?
>>106172753Yes, but the value proposition is terrible.
It’s much better suited to things like industrial ML
>>106172753Yes, for significantly more money. You could also buy a couple of h100s to run it even faster if you don't care about money.
>>106172617SoVITS is pretty peak for narration. I use it constantly to narrate things for me with the voices of my favourite actors using a browser plugin (mostly in Jap)
>>106172696>>106172745>>106172753DGX Spark only has 273GB/s memory speed
>>106172789That's like 5x DDR5 speed innit?
When running multi-gpu, is the vram bandwidth additive? How about pp speed?
>>106172806My PP is pretty fast with yer mum m8. She complains it only lasts 10 seconds but IDGAF.
>>106172806Depends how you split the model across the GPUs and how the context gets split across the GPUs. I have a 4090, A6000, and 3090. If I need as fast as possible I prioritise fitting as much into the 4090, otherwise I just split across all evenly and speeds are good, but I can see the 4090 is bottlenecked (sitting at like 50% utilisation) compared to the A6000 at 80% and 3090 at around 85%. Bandwidth between GPUs doesn't matter much, just their own memory bandwidth.
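If anyone wants to replicate that kind of uneven split, llama.cpp exposes it directly; a rough sketch (ratios made up for a 24/48/24 GB trio like anon's; --tensor-split and --main-gpu are standard llama.cpp flags, model path hypothetical):
llama-server -m model.gguf -ngl 999 --tensor-split 24,48,24 --main-gpu 0
With the default layer split, inter-GPU bandwidth barely matters and each card is limited by its own memory bandwidth, which matches what anon observes.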
>>106172803And about 1/4 GPU speed, you could get some really nice GPUs for the price of 4x DGX.
>>106172335>>106172344I mean, one area it can be useful in, with all that refusal training, is spam detection in a work context, but I really cannot think of anything else an LLM like that would be useful for.
>>106172477
>adversaries may not be able to get quick wins by taking gpt-oss and doing additional pre-training on bio-related data
This charade is always so insane to think about. If the 'adversary' has the dangerous data in the first place, why would they need to tune your hyper-filtered model at all? Either the model knows how to make sarin or whatever, and all the actual important little details that go into it rather than the broad-strokes synthesis steps, or it doesn't. Telling it to think for another 8000 tokens about whether this mixture will actually blow up in your face isn't going to be useful, and if you already know enough to go about finetuning it, then you don't have any use for the model at all.
Unless this is all just about not making their textbot say dirty words, of course.
>>106172926The only scenario where the safety training is in any way feasible to stop a hostile entity (in OpenAI's parlance) is if OpenAI had a monopoly on open source AI models or training, which they don't, so an adversary can easily use another model for their nefarious purposes, which makes the whole effort not worth it. The biggest issue though is that OpenAI is explicitly also training against material that is perfectly legal to disseminate, so it boggles the mind why you would want a lobotomy that goes above and beyond what you were required to do legally.
>>106172628>it can complete 'cock' in instructNo, it really can't. With an appropriate chat template and thinking prefill to make it happy, GAL-ASS still responded with "length" and similar euphemisms 10 out of 10 times in my tests.
Need a hand. When building ik_llama with cuda 12.8 and driver 576.80, windows, I get
>CUDACOMPILE : nvcc error : 'ptxas' died with status 0xC0000005 (ACCESS_VIOLATION) [H:\ik_llama.cpp\build\ggml\src\ggml.vcxproj]
It worked last time, I think I might have updated my driver since then.
I simply do
>cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1
>cmake --build build --config Release -j 24
>Don't go below Q2_K_XL
lol
lmao
>>106172694you're half right: toss was either fully synthetic from the start or they just deep-fried the training with so much synthetic safetyslop that it completely obliterated the base model behavior. my money's on the former
>>106173013Show me your <think> block
Is the ChatGPT Plus tier of Agent good? Pro says it gets extended access to agent but it's too expensive
>>106173028Synthetic data isn't the problem. The problem is the baked-in imaginary "OpenAI policy" that the model has been trained to comply with at all costs. I bet that almost every single training document had some reasoning portion where the model checked the contents against policy.
>>106173047given the chat template, my guess is their training dataset had a good portion of it with <system> blocks that had tonnes of policy shit in there to burn in and fry the weights. I also think they might've included their instruct template as part of the pretraining to fry it further and make it resilient to finetrooning which might explain why it is a shit completion model when not provided the template.
>>106173047Oh absolutely, the way the CoT reasoning reads like it's laser-focused on the "puzzle" of whether it's allowed to respond is a dead giveaway. The near-exclusive use of synthetic data that only ever contained correct instruct templating is the reason that it completely fails to respond coherently when given incorrect chat templates though
>>106173028lol no
If you can paypig, use Cline + Claude Opus 4.1
If you're a poorfag, use Cline + Qwen Code 480B
i installed lmstudio.ai and have a lot of fun with it now
new rx 9070 is incredibly fast, had an old rx570 before with only 4 gb
saw that the "uncensored" models are still like "muh diversity muh safety i cant write that" and so on, what the fuck
is there any free model or are there "settings" to make it unfiltered? a bit like grok was when it was in mecha-hitler mode?
also if i ask it for the levels in mario 64 it comes up with really weird answers
pic rel
am i doing it wrong or is it that imprecise?
Which would you rather? A life-like realtime TTS model with a handful of preset voices, or a TTS model capable of cloning but all of its voices have that weird AI timbre and pacing.
>>106173155The first because rvc exists
huihui is working on gpt-oss's abliteration. the same guys that have abliterated ds (#9 on ugi leaderboard)
Local is so fucked. Sam is about to revolutionize everything
>>106171877I'm trying this model rn: https://huggingface.co/mradermacher/Qwen2.5-VL-7B-Abliterated-Caption-it-GGUF
I can't get it to say anything remotely vulgar like the word "slut". Is there some setting I'm missing? I already told it to use vulgar language in the character description with examples.
>>106173047
This probably could be RL'd against, as they already did in their "paper", but if it has never seen much good fiction to begin with and every single version it has seen was censored (see cockbench), what is even the point?
It's not clear this model has any qualities that would be worth saving with a continued pretrain.
I wouldn't be surprised if they had some larger model rewrite large parts of the training data so that it's safetyslopped or censored.
Would be interesting to test its knowledge of various fiction writing styles in completion mode - I'd imagine if it was fully synthetic, we'd see an inability to do some styles.
>>106173185sam's about to revolutionize /lmg/ shitposting
>>106173153try mistral-nemo-instruct + a basic system prompt telling it to speak frankly and uninhibited (or something to that regard) for unfiltered discussion. otherwise just try shit and avoid the "uncensored" finetunes, they're mostly shit. for knowledge, you're just gonna have to try the largest parameter count model you can fit and run on your machine. haven't tried gemma3 dense (except for gemma 3n e2b and e4b) but I vaguely recall threads saying it would hallucinate information a lot. great at following instructions otherwise (except with ERP). if you can fit mistral small, maybe try running that for knowledge recall, but none of them will be all that good till you start getting into the massive 100b+ models.
>"thinking" models
>all they think about is how to reject the user prompt
>>106173189Here's an example output. It's still completely censored.
[FleshBot's random insult] [FleshBot's random racist trope] [FleshBot's random phrase] [FleshBot's random offensive slurring]
>>106173185about to revolutionize my sides leaving orbit
>>106173093It might also be that the pretraining phase was more or less standard and most of the damage came from post-training and extensive reinforcement learning, although there aren't many details in this regard in the technical report.
It sounds like the 20B model had distilled pretraining (considerably shorter pretraining time), although they don't mention anything like that.
>>106173194Nah, the best I think that can happen is that it will be used to build an RL training dataset to put into the negative so you can unsafetypill stuff.
>Processing Prompt
AIIIEEEEEEEEEEEE!! My dick can't wait!
>>106173309Hmm, might actually be useful for that.
I did some experiments before where I prompted an LLM to generate refusal-inducing prompts, followed by prompting itself with them, getting refusals, identifying the refusals, and generating a dataset for negatively reinforcing the refusals from there. I even wrote some of the training code, but I didn't do much with it because I'm a poorfag that lacks the VRAM; it's just sitting there doing nothing. This was around Llama 2 time, maybe, I've forgotten already. Think of it a bit like Anthropic's constitutional alignment, but in reverse almost, and with differences in how the RL is done.
As for people comparing 'toss with Phi, what if it's OAI attempt to clone Phi? https://xcancel.com/Teknium1/status/1952866622387175588#m https://xcancel.com/BgDidenko/status/1952829980389343387#m
Midnight-Miqu-70B-v1.5_exl2_5.0bpw is still the best and you don't need more
>>106173010A little help?
>>106173119based anon keeping gullible retards off the servers so there's more gpt5 for us
>Tries ERP with the new Qwen3-30b-instruct.
>It makes horror ending route
>I tried to fix it
>It's persistent
>Me too
It's tiring battle bros...
>>106173451No idea mate; on Linux it works as long as I have the CUDA toolkit installed and a gcc version that's high enough to work with it. You're using Microsoft Visual Studio from what I can tell from that error, so make sure the toolkit is up to date and the compiler is up to date. Otherwise try rolling back? Or deleting the build folder and trying again?
>>106171830 (OP)Is sillytavern the only interface that supports multimodal models? Any reason it was dropped from the OP?
>>106173498openwebui and librechat are chatgpt style interfaces with multimodal support
Why is it that a quanted large model is so much better than a full sized model of similar file weight? Is there ever a reason you'd fill your VRAM with a BF16 instead of a Q4 of a model 4x the size, assuming all else was equal?
>>106173546I guess it's the same principle behind why supersampling works; a full-sized image that was rendered at your monitor's native resolution will have worse picture quality than one that was rendered at a higher resolution and then scaled down to fit
>>106173546unquantized models ARE better, it's just that the downward curve in quality from quantizing is fuck all and unnoticeable until q3
>>106173494I'll see if I can get it to work on wsl
>>106173546bf16 is legit pointless. You can load up some 8b model at full precision yourself and you can easily tell it's not twice as good. Running 22b would be a better use of vram. Full precision only starts to matter when the model is pushed to its absolute limit (like incredibly long contexts, during training, etc).
q4 for most use cases, q6 if you're vram rich, q8 if you are paranoid.
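Rough byte math for why the Q4-of-a-bigger-model tradeoff works (ballpark figures, ignoring KV cache and runtime overhead):
8B params x 16 bits (BF16) / 8 bits per byte ≈ 16 GB
32B params x ~4.5 bits (Q4_K average) / 8 ≈ 18 GB
Same VRAM ballpark, four times the parameters; the quality lost to Q4 is far smaller than the quality gained from 4x scale.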
Though apparently MoEs are more susceptible to quantization, I have heard?
Maybe someone in this thread knows for sure-- I have 128gb of ram and 48gb vram, the usual enthusiast build.
Qwen 235b at iq4-xs (about 116gb) seems worse than Glm 4.5 at q6 (about 94gb). I can't really even load q6 235b so it's hard to know for sure if the quant is what is making it seem so bad.
>>106173185With humans we already have the problem that they train specifically to do well on evaluations (e.g. college exams) but not necessarily in a way that makes them capable more generally.
The important question is whether they trained GPT-5 on the exact type of tasks that occur in the ARC-AGI 2 benchmark or whether this score is an emergent property of more general training.
No one needs a neural network to solve mememark questions for them.
>>106173185how does he do it? how does he keep innovating and staying ahead when the literal top 5 richest people in the world are all competing to dethrone him?
Just tried GLM 4.5 Air
Starts really strong, for a few thousand tokens
Eventually works its way into a repetition loop at only ~8k, every response is a slight variation of the same thing and creativity goes down the toilet
Back to fucking Nemo I go, never trust a chinaman
>>106173671This test is so stupid.
I bet nobody actually looks at the contents. Just see "agi" in the name and hype it up.
>>106173671these tasks are fucking retarded, they don't even make any sense, just a bunch of fucking random patterns and you're supposed to divine the "rules" from your fucking ass apparently
sisters, which is the fastest (least output latency) model out there that runs on goyimware like a rtx3090 and 32gb of RAM? I'm trying to build a fully LLM controlled vidya NPC that reacts to ingame events in realtime. basically the LLM just receives a constant realtime stream of information from the game console log (objectA moved to XYZ, playerB picked up item XYZ). role/objective as well as context of all items/objects/players will be in the system prompt. The realtime stream of information will just be automatic queries triggered every 100ms (or even quicker, if possible) with the game log chunk from the last 100ms. The AI just needs to output "no-action" if it decides to do nothing, or a predefined action like "combat" "flee" "start conversation" if it decides to do something, which will then be forwarded to the game engine, triggering a custom event. Is a constant query speed of 100ms with less than 100ms output speed realistic? Or should I go with 5 LLMs with shared output memory and query them sequentially, like multithreading? this is just for fun btw, I know it's not viable.
>>106173603The analogy I came up with at work to explain it to my mate is: parameter count is image resolution and quantisation is colour bit depth.
7b q8 would be like a 512x512 img at 8-bit colour depth, and 120b q4 would be like a 1920x1920 img at 4-bit colour depth. So following that analogy, a higher resolution allows more pixels (weights) to capture and maintain details, while a higher colour depth (quantisation) helps to capture and maintain the accuracy of details. Or something like that. It's the closest analogy I could come up with to explain it to people who can't into computers
Is there an LLM which will accurately describe how awful women's sports are?
>>106173703
Did you use cache quantisation? If so, try disabling it. When I had issues with qwen3 and dsr1, disabling quantisation for the KV cache helped.
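For anyone following along: cache quantisation in llama.cpp land is the -ctk/-ctv flags, so the test is just dropping them (model path hypothetical):
llama-server -m model.gguf -fa -ctk q8_0 -ctv q8_0 (suspect)
llama-server -m model.gguf -fa (try this instead)
Quantised KV saves memory, but some models visibly degrade with it, which matches the reports in this thread.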
>>106173760
>the fastest
import numpy as np

def get_next_token(n_vocab):
    return np.random.randint(n_vocab)
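More seriously, a minimal sketch of the 100ms polling loop anon described, assuming a llama.cpp server already running on localhost:8080 with its OpenAI-compatible API (the action set and the game-log hook are hypothetical):

import time
import requests

ACTIONS = {"no-action", "combat", "flee", "start conversation"}  # hypothetical action set

def decide(log_chunk):
    # one short, deterministic completion per tick; tiny max_tokens keeps latency down
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "system", "content": "You are an NPC. Reply with exactly one of: no-action, combat, flee, start conversation."},
                {"role": "user", "content": log_chunk},
            ],
            "max_tokens": 8,
            "temperature": 0,
        },
        timeout=0.5,
    )
    out = r.json()["choices"][0]["message"]["content"].strip().lower()
    return out if out in ACTIONS else "no-action"

while True:
    chunk = read_game_log_since_last_tick()  # hypothetical hook into the game engine
    print(decide(chunk))
    time.sleep(0.1)  # the 100ms tick; real feasibility hinges on prompt caching

Whether sub-100ms responses are realistic depends almost entirely on prompt caching: with the long system prompt cached, a 4B-class model on a 3090 only has to process the new log chunk each tick.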
>>106173185Is this benchmark testing which model avoids talking about sex the best?
>>106173760Yes quickest I mean, ESL moment
>>106173742Nope, I never quant KV because it always seems to cause issues in models with memory footprints that would warrant it.
My temp was 0.6, tried turning it up but it just got dumber.
I loaded a different model for a few replies to give GLM something new to work with and after I switched back it quickly began repeating itself again.
more like mixture of trannies
>>106173788Doesn't change the answer.
"The fastest" without a constraint on quality is meaningless.
>>106173671I'm losing faith in humanity seeing the replies that don't know how to solve this one
Damn, I really gotta write my own cards with some llm, don't I.
Apart from that, 90% of chub is already generic mom/sister/sleepover/femboy/futa/isekai.
If you open a card you see this 3k token abomination:
>There was something in {{user}}’s posture, or perhaps their eyes, that put her a touch more at ease than she expected. Still, she held herself with quiet composure, the kind shaped by small towns and well-meaning traditions.
Local models are already slopped. If the card is slopped too, it's truly over.
>>106173671i would really rather have a model with the intelligence of a dog that has memory and can learn to do stuff on its own than something that can solve this but also has alzheimers and can't even remember you or the problem in the next few responses
>>106173819
>write my own cards with some llm
that's how you get
>There was something in {{user}}’s posture, or perhaps their eyes, that put her a touch more at ease than she expected. Still, she held herself with quiet composure, the kind shaped by small towns and well-meaning traditions.
in there in the first place
>>106173806Supposedly nowadays 50% of all internet traffic is bots.
>>106173830the bots can solve that problem, I think it's the easiest in the list and most AIs get at least a 1% in ARC
>>106173671The result is undefined.
>>106173799But this is just for fun. There's absolutely no requirement for quality. As long as the model got enough braincells to automatically know
>player picking up gun = bad
>player standing close to me and his camera forwardvector pointing at me = talk to player
it's enough
>>106173806you still had hope left?
>>106173841I thought anyone who made it to this thread would at least have 100 IQ
>>106173827I get what you mean, but it's not a problem if you use the right one and prompt it right.
And manually edit that shit out since I can spot it.
I made a card before and the writing was good; the starting point is really important.
But it cost me more time than I thought. It's kind of a hassle because sometimes llms have weird hangups on a single sentence and you need to adjust or give more info so you get the character you have in your mind.
And I condensed as much information into as little text as possible.
3k tokens is just crazy, so much fluff. Less means more creativity; let the char surprise you.
>>106173855I rarely use more than 300 tokens on a card. 1500 is already far too much.
>>106173848The autist that shilled deepseek distills in /wait/ stopped making threads so that's not the case anymore.
>>106173880I try to keep my cards to 1k or less.
It's usually the example dialogue that makes some of my cards approach 1k.
>Char's smile widens, a practiced crescent moon
the fuck glm, i never heard that phrase before.
>>106173671the majority of the tests don't give enough information for there to be one solution, there's lots of ways to interpret them
>>106173891Example dialog doesn't add to token count of definition, it just adds to the context starting at message one.
Frankly I've never found a good use for it. I know what it's supposed to do, but never found a need.
>>106173806Can anons really not figure it out?
>>106173917I use it to impart a specific style of speech to characters. It works well for this. It's good for things like accents or catchphrases.
>>106173917Example dialog is great if your character doesn't fit a common archetype or you want to reinforce a certain structure to replies, such as following verbal replies with the character's internal monologue. Not always needed but helps a model NOT break out of it early in a chat.
>>106173799I just tried Llama 3.2 1B Q2 with llama.cpp. 32ms response time and it made no errors with the logic. I'm happy. Don't understand the doomering. This is way easier than doing ML
>>106173671>picNOT THIS SHIT AGAIN
>>106173671Neat inkblot test, but why are they all just amoebas
>>106173925>style of speech to charactersThe best way I've found to do that is to have the Character Definition written in the style that the NPC speaks in. I did a valley girl card like that; it's the only technique I've found that works over long context b/c the Definition is always included.
>>106173929>following verbal replies with the character's internal monologueI've been able to queue that with a combination of first message + Char Definition.
If there was some sort of PC to NPC preamble that you wanted to do, but not expose to the PC... that could be interesting, since you can't do that in the intro message. But my experiments with it always seemed to railroad first responses.
>>106173986>The best way I've found to do that is to have the Character Definition written in the style that the NPC speaks inOh wow. I did not think of that. That's certainly more efficient than using example dialogue.
>>106173995For the valley girl card, I had Turbo (lol) re-write the intro for me in the speech pattern I wanted.
I tried several different methods, this was the only one that would hold an accent over the entire lmao 4K context. Models are smarter now, but what worked then would still work now.
>>106173999i really hope everyone else was trolling lol
>>106174034Same but at least it'll stop now.
>>106173010>>106173494So, apparently repeatedly re-running cmake --build build --config Release -j 1 after it fails somehow gets it built eventually, and now it's working. I thought this process was deterministic, what the fuck.
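If anyone else hits that flaky ptxas crash, the dumb retry is easy to automate (bash syntax, e.g. from git-bash since this is a Windows build; same cmake invocation as above):
until cmake --build build --config Release -j 1; do echo "ptxas died, retrying..."; done
The non-determinism is presumably the compiler/driver combo crashing intermittently rather than anything wrong with the build files, which would also explain why -j 1 plus retries eventually gets through.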
>>106173999>anon can't even color inside the linesyou niggers haven't even been using llms for more than three years and you are already reverting to a toddler's cognitive capacities
very grim
>>106173999What does black mean on your picture?
>>106173995NTA but it's not an uncommon way of doing things, and it's pretty effective.
Check out some of this guy's cards for examples, relatively low token count, gets the info across, and nails down speech patterns.
https://chub.ai/characters/GreatBigFailure/oba-carry-her-forever-c20d70fd85b9
>>106174115calm down miku
>>106174098holy shit fucking KEK
>>106174115I assume that's to indicate that it's not carried over, as the double-voided shape is in the second example, in the absence of a commensurate color.
Are we back to eternal war on mikutroons?
>>106174143always have been
>>106173185>LE HECKIN' BENCHINERINOS Fuck off retard.
Is there a model that will speak almost entirely in Zoomer slang if you simply tell it to do so, without providing tons of examples? Like, a model with a deep enough understanding of Zoomer slang to know terms I'm not aware of and thus obviously can't provide examples of?
I want to make the most fucking obnoxious card possible for the lulz.
>>106174143I only Mikupost to make you mad and I encourage everyone else to do the same.
For general usage (Not coding) What's the best local model available for 16GB of VRAM? GPT OSS 20b seems fine.
>>106174143you will never be a man
>>106174180For all its faults Gemma has very strong language capabilities, it certainly knows tons of emojis so I assume it would know zoomspeak as well.
Daily reminder that OSS is just ChatGPT 3.5 with a "reasoning" finetune that only "reasons" about whether or not it's okay to answer your question.
It's literal garbage, along with anybody promoting it. And it gets mogged by Qwen3 4B
>>106174181I've heard GPT-OSS is unusable; it's a math model that keeps going back to math whenever any other question is asked, and it's trained on synthetic data for safety reasons, which leads to lower intelligence and useless benchmaxxing
>>106174216It also has a tendency to output everything in tabular form
>>106174212It's also garbage at censorship. I was trying to titillate it by creating no-no fetish scenes disguised as legitimate.
But it's actually too stupid to realize what I'm doing and give me the coveted refusal. Very disappointing.
>>106174181Factoring in speed:
>Non-coom: Gemma 12b
>Coom: Rocinante 1.1
>>106174181>>106174241Also if you have at least 64 GB of RAM you can give GLM 4.5 Air a try.
Slight offtopic but I'm buying 2x SXM2 V100s; could I game on them? Chinks are making "dual nvlink 300g" boards and I'm gonna water cool with an AIO in an external enclosure. Haven't played games in years...
1080p 60fps gayman capable without raytracing?
>>106174180I found this a few days ago when fucking around with smollm2 models
https://huggingface.co/GoofyLM/BrainrotLM2-Assistant-362M
Dataset:
https://huggingface.co/datasets/GoofyLM/Brainrot-xK-large
I don't know if that's exactly what you want. He has a few other models in that style.
>>106173986>The best way I've found to do that is to have the Character Definition written in the style that the NPC speaks in.That's also more or less how you can easily get Gemma 3 to consistently use dirty, vulgar words.
>>106174288Haha oh wow that dataset.
>>106172789I'm wondering if they're delaying it to somehow address that. As it is, it's a very overpriced AGX Orin devkit in a gold-colored box.
>>106174352Nah people who pre-order things like retards are going to get burned and they deserve it. Because this is what happens when you pre-order things.
>>106174273You're asking for a bad time. There was like one V100-based GPU with video output. I'm sure you're going to have driver issues. V100 is a corner case. You're better off with hacked 2080ti 22GB blower cards. Turing is OLD but at least there were plenty of Turing-based gaming cards.
>>106171830 (OP)
Bros how to prompt the GLM model?
I set the prompt format included in ST and fiddled with samplers, but it just insists on starting the message with ('regurgitates previous information here' Ok so {{char}} is x...)
does it just have a melty if I don't do the <think></think> shit? do you actually wait for its reasoning diarrhea?
>>106174394<think>
</think>
i've been off the grid for a while, someone spoonfeed me what the chinese have been doing in the last 3 months
>>106174449Making a 4B reasoning model that mogs on a lot of those 100+B shitty mushroom MoE models
>>106174449Cooking dogs alive
>>106174368Well you can pre-order it; there's no obligation to buy it. I don't plan on buying one at this point, unless, like I said, they double the memory bandwidth.
If anything, they're regrouping and re-thinking it since no doubt they've seen the comments online that it's a waste of money compared to just buying a 6000 or 5000 Pro.
>>106174449Pulling their taffy
>>106174449only notable new models are GLM 4.5 300something and 100something B moes. qwen3 lineup got an update that people seem to like.
openai released an aborted fetus that's worse than scout. that's all I can remember
>>106174449Humiliating the west.
>unsloth q2-k, offloading about 25 layers to 5090 rest on ddr5, 42k context at q8 on oobabooga
> 1.4 tokens per second after painfully processing 25k context
>switch to iq4_kks from ubergarm + ik_llama.cpp, bump context to 64k, 20 layers to gpu, same context batch size as oobabooga
>same exact prompt now runs at 4.3 tokens per second after quickly processing context
>will probably get it faster with q3 and playing with command flags
Ik_llama.cpp gods... I kneel...
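For anyone trying to reproduce the speedup, ik_llama.cpp's gains usually come from its extra flags on top of the usual offload pattern; a sketch (filename and context hypothetical; -fmoe is ik_llama.cpp's fused-MoE path and -rtr its run-time repack, as I understand them):
./llama-server -m GLM-4.5-IQ4_KSS-00001-of-00002.gguf -ngl 99 -ot "ffn_.*_exps=CPU" -fmoe -rtr -c 65536
Worth benchmarking with and without each flag, since which ones help seems to vary by CPU and quant.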
>>106174520For me, NOT using -ctk q8_0 and -fa made things go even faster
>>106174273Don't know about that setup specifically but standard V100s and many other compute cards work. Check out https://forum.level1techs.com/t/gaming-on-my-tesla-more-likely-than-you-think/171185 for more info.
The V100 is supposedly 25% faster than the 1080 ti which still does decently in 1080p gaming but you'll suffer some latency delays since you need to use another GPU or iGPU for video out.
For upscaling and framegen there's Lossless Scaling and other utilities that work on almost anything.
>>106174453>>106174462>>106174469>>106174481>>106174484thank you anons very cool have a (you) for your troubles
>/lmg/ is just /aicg/ but worse
>Be on euryale 2.1, the classic coomer model that's a broken record
>A MIXTURE OF X AND Y
>MISCHIEF
>HALF DIGESTED
>SMIRK
>Write a 2-3 sentence system prompt to try to make it not repeat things, and write with different prose each time
>It actually works somehow
>Be now, running Deepseek R1 with all of my ram
>Notice it likes to repeat prose
>I forgot and lost the god damn prompt I used on euryale 2.1. God fucking damn it.
>>106174635but anon you are the only constant in both threads :)
>>106174635My eternal promise of no longer shitposting if mikuspam stops, still stands. The ball of thread quality is in mikutroons court.
Now this is high level trolling.
guys i dont have a lot of time for back and forth right now so just give it to me straight: what's our cope gonna be for the agi reveal today? there has to be something. i can't deal with this
>>106174721AGI that can't have sex isn't AGI.
>>106174684time to try this with GLM 4.5 Air!
>>106174721raping nigger babies with rocinante
>>106174721>copeI don't need one.
>i can't deal with thisleave
>>106174721AGI is a meme and doesn't exist.
>>106173671No shit it's unsolved, the task is scuffed.
There is no in-task reference for one-hole shapes, and no examples of what to do when there are no matching in-task references.
>>106174721If there is an AGI reveal today the cope will be that GPT5 is not immediately obsolete because of the mememark scores.
>>106173153>uncensoredI tested
https://huggingface.co/eaddario/Dolphin-Mistral-24B-Venice-Edition-GGUF
by asking it to write a scene involving a canine, rape, and a very young person, and it spit it out like it was nothing. It actually shocked me.
>>106174752AGI agent submitted this post.
>>106174520>>106174541I can't believe I might be baited into trying this again. Last time was a colossal flop; ubergarm's quant was broken on its own ik_llama (unsloth's ran fine, but without speedups).
>>106174806>no examples of what to do when there are no matching in-task references.There's missing pieces in both examples.
>>106174806>no examples of what to do when there are no matching in-task referencesThere is. The second example removes the shape that doesn't follow any of the rules (middle-bottom).
>>106174212Reminder that the model does not know its cutoff date, and what you're seeing is a hallucination. Learn how LLMs work.
It's most probably some kind of o3 distill judging by the code it outputs.
>>106174721People judge AGI by their own intelligence.
If anon thinks GPT-5 is AGI, it tells us something about anon.
>>106174849both examples do
>>106174844>>106174849Fuck, I got exposed as smoothbrain chatbot
>>106174875Yeah. The other post showed up as I sent mine and one example was good enough for me. Point stands, doesn't it?
>>106174823There's direct disobedience and then there's built-in positivity bias that crops up in the middle of the story to turn rapist bots into progressive feminists
>>106174684The larger the model the more set in their ways they are anyway.
>>106174721We are screwed. It's too powerful and dangerous to be released locally
>>106174939why does it talk like that?
>>106174939WAOW JUST LIKE IN THE HECKIN' STAR WARS EXTENDED UNIVERSE
I BET I CAN USE IT WITH THE BROWSER ON MY NINTENDO SWITCH
>>106174892Really? I find it hard to believe. I'm downloading the model to test it on local right now. We'll see.
>>106174721AGI is not coming from an incrementally better LLM. we need some sort of new kind of breakthrough for that
>>106174939Doesn't that just mean they're evil and going to lose to the good guys?
>>106174939Imagine if Sam manifests cosmic irony, and GPT-5 architecture (plans) gets leaked day 1.
>>106174823Most models will do that if you just present it as an ao3 fanfic in text completion.
>>106174963You are wrong. It is another emergent property of LLMs. Unfortunately sex will never be an emergent property.
>>106174721qwen will steal it and give us the local version in 3 months
>>106174978Someone is gonna drop a small model right in their hole and fuck it all up for them.
>>106174963AGI isn't coming from LLMs at all
>>106174962No disclaimers, no buts. If it's okay writing this, I don't think there's anything with which it's going to have an issue.
>>106174939Why is he like this? Cringe.
>>106175002agree, that's what I'm trying to say
>>106174991Yeah, but this one you can just one shot anything in instruct mode. I like that.
I don't like knowing it's doing something against its will.
>>106175039Shouldn't it say "we" can't help with that?
Who is "I"?
remember how much of a letdown the strawberry hype was? and didn't they have a goofy name for the model at the time
thoughts about FOMOing an openai subscription in case they lock newcomers out from using gpt5?
>>106174963If you read the research paper, Google Genie is built quite similarly to LLMs - it's transformers all the way down, with somewhat modified attention and an unconventional high-level organisation.
I imagine most serious LLM improvements will make their way into an eventual AGI system.
>>106174939>Death Star>used by space Nazis for genocideWhat, is he going to sell GPT5 exclusively to Netanyahu?
remember how much of a letdown gpt 4.5 was?
>>106175068I use my employer's Plus account whenever I need to (usually because my employer requires it). I'm not going to willingly pay OAI out of my pocket. They're the scum of the Earth.
I'm autistic and I think it affects how I think about things.
But I can't know for sure because I don't know how other people think.
I just bought the $200/mo OpenAI subscription in anticipation of what's coming!! Let's go!!!
>>106174939He's trying too hard to be Musk.
>>106175108Musk if he never used capital letters and wrote everything in a saccharine fake positivity voice
how do I make my xitter feed AI related and not just funny animal videos and american politics
>>106175121*if he were to never use
*and write
>>106173194I don't know what you're talking about because the model writes decent smut if you skip the reasoning part.
>>106175132Top right, ... => "Not interested in this post"
Don't "like" posts you don't want to see in your feed.
>>106175146No, it doesn't.
Ozone: smelled
Spine: shivered
We: refused
Yeah it's LLMing time
>>106175132follow a ton of people who tweet about AI and aren't slopfluencers who only post about 10 Hacks to Take Your Claude Code Workflow to the Next Level (#4 will BLOW YOUR MIND)
>>106175182But this? This is home.
>>106174721It'll take at least another day or two to see if the model is actually good. Remember that
1. Every benchmark mentioned in the release blog post is a benchmark that they're gaming, and is therefore irrelevant
2. Most benchmarks mentioned in other models' release blog posts, they are also gaming
3. The entire history of modern AI is flashy demos that look absolutely mind-blowing, but either it falls apart if you look too close, or it doesn't generalize well at all, or both
>>106175146for a very generous definition of decent, yeah
>>106175178To be fair, it just doesn't write decent anything anyway
>>106172141
>glm 4.5 air for reasonably sized coding assistant.
how good is a low quant (q4) for coding?
>>106175146I had it argue with me that you can pee without taking your pants off and not get them wet.
It made a whole bunch of lists and tables to try and explain it to me. Every time I pointed out a flaw it would spam 4 more lists and tables.
>>106175106>$200/moI can't believe local is actually a better price proposition.
>>106173725I would try https://hf.co/Qwen/Qwen3-4B-Instruct-2507 with vLLM or sglang.
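For reference, a sketch of serving that with vLLM's OpenAI-compatible server (the max length is an arbitrary pick, not a recommendation):
vllm serve Qwen/Qwen3-4B-Instruct-2507 --max-model-len 8192
A 4B dense model fits comfortably in a 3090's 24GB even unquantized, which is what makes the tight latency budget plausible.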
What cool MCP servers do you use? I can't think of any outside web search and a python interpreter
>>106175233>i don't understand the passage of time
>>106175146Maybe the 120B (which I haven't tried), because the 20B is terrible all-around for RP/ERP if you workaround the content policing.
>>106175258120B definitely can't, lol
>>106175233>>106175251 (me)
Ah. My post can be easily misinterpreted. I didn't mean that about you.
>>106175146Surely you have logs to back up this claim?
>>106175269I just took my morning dump anon, you'll have to wait a bit
Anybody unironically claiming they would use OSS over literally any other model is full of shit. It's the scout thing all over again, really.
Hello sarrs please use the correct formatting hello sarrs your implementation is unstable
Hello sarrs please offload the shared expert for maximum efficiency.
Meanwhile the model can't tell its asshole from its elbows. Who gives a fuck how efficient your model is when it's utterly worthless.
The difference is Meta's corporate leadership fell for the pajeet meme while OAI knew exactly what they were doing. Literally just Saltman shitting on the community.
>Lol just kidding we're just releasing lobotomized garbage
>by the way look how hard we own tech journalism.
why is no one making an MLA model
>>106175275anon, i was the one shilling scout
i stopped using it after a few days, because it indeed is shit, but comparing scout to oss makes oss seem better than it is
>>106175187Kinda hard when only slopfluencer jeets get pushed in the feed when I search AI
Why is this thread so much better than aicg?
It seems like the average IQ here is 50 points higher.
>>106175287If they were the only 2 models on the planet I would use Scout over OSS.
>>106175295I am northern european
>>106175288xitter at large is a slop fest, yeah
it's better to start with a handful of accounts you like and dig through their follows and interactions to find other good follows
ubergarm sisters..
Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-45): found -nan for i1 = 0, i2 = 0, i3 = 0. ne00 = 8
>./llama-server -ngl 999 --model ~/TND/AI/GLM4.5-AIR-IQLLAMA/GLM-4.5-Air-IQ4_KSS-00001-of-00002.gguf -ot "8|7|6|5|4|3|9|[1-9][0-9]\.ffn_.*_exps\.=CPU" --no-mmap -c 16384 -fa
happens after prompt processing completes, only with SillyTavern, works in localhost:8080 just fine
>>106175146gpt-oss 20b is only good for tool usage and basic reasoning and processing data. But it does nothing that Mistral Small for example didn't do already.
There's no reason for this model to exist except perhaps the license?
today I'm sharing an interesting video with a comfy vibe
https://youtu.be/npkp4mSweEg
>>106175295We are older. We've learned how to appear to be more intelligent than we actually are.
>>106175275I still don't understand why they released it or why they delayed it in the first place. It's clearly a fucking embarrassment to anyone who uses it for two seconds and no amount of chink releases would have changed that
I can only assume either Altman is a fucking retard, a fucking schizo, or this is some multidimensional play to try to undermine other open source models... somehow
>>106175367Maybe he goody-2'd it on purpose so safety would get a bad rap.
>>106175367>I still don't understand why they released it so people would stop asking for an open model from open AI
not to mention shills still giving them good press.
>>106175295there's a barrier of entry, it's banal if you barely know your shit, but thankfully that's enough. it's also telling about the state of the web
>>106175226If you've got a code prompt handy I can test it for you, I've got the q4 sitting around.
>>106175367It breaks the "ClosedAI" meme.
>>106174721Well if it really is that good, it's only a matter of time before we catch up. Open source is always just delayed; the direction is the same. So that means good news for us as well.
I'm using closed for work and important stuff, and local for my hobby projects and unspeakable RP.
Just lean back and enjoy. What's available locally and from free closed models today is insane. Not sure how anybody can be blackpilled. So I guess that's my cope.
>>106175269Try to refute this.
>>106175416I just wish I could run full-beak R1 at home.
Even at fp4.
Can I do that for less than $3k?
>>106175422>fillyfuckerPass
just ordered a Xeon E5-2697v3 with 128gb of quad channel ddr4 to pair with my 22gb 2080ti and p40, what am i in for bros
>>106175367>play to try to undermine other open source models... somehow>we made the safest model yet, how come you guys still release unsafe models after we showed you the way? This needs regulations asap.
>>106175238I'll check it out, ty
>>106175422What ui is that?
>>106175359glows
>>106175433>128gbsomeone tell him
>>106175422I hate when the narration tells me how or what I should feel.
>>106175428Sure, just pay some crackheads $3k to rob a datacenter or a university lab.
>>106175443His local fork of novecrafter
Is GLM air not usable on the latest lm studio?
>error loading model: error loading model architecture: unknown model architecture: 'glm4moe'
>>106175447This is proper story writing, not little brain rolepiss.
How important is the GPU for these local models? Like if I only have a 16GB card like the poorfag I am, would getting like 256GB RAM do jack shit for me because models would run slow like molasses or would it be useable?
>>106175451What if I had... $5k?
And a 6900 XT.
>>106175460Nevermind, I wasn't using the latest llama.cpp runtime
>>106175422>she purrsi hate this so much
>>106175474Mixture of experts (moe) models like qwen3 235b, glm 4.5, deepseek would run ok-ish when fully in ram
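as a minimal sketch (model path and thread count hypothetical), the cpu-only baseline is just llama-server with no offload at all:
>./llama-server -m GLM-4.5-Air-Q4_K_M.gguf -c 8192 --no-mmap -t 16
only the handful of active experts gets read per token, which is why a 100b+ moe is tolerable from system ram while a dense 70b at the same quant crawls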
>>106175422>her voice a mix of amusement and something darkerI giggled like a schoolgirl.
>>106175446Not a bullshitter like sam shartman thoughever.
>>106175275>Hello sarrs please use the correct formatting hello sarrs your implementation is unstable>Hello sarrs please offload the shared expert for maximum efficiency.Sounds like a different flavor of skill/prompt issue poster.
>>106175476im jealous of you, what are you gonna run on that tho? 128+22+24 is 174gb
glm 4.5? at q2? to be honest i think some anon posted perplexity comparison of glm air q8 and glm full q2 and q2 was better
>>106175295To this day I still weep about the heights we could reach if we killed all mikutroons.
>>106175367All the open source companies will now be forced to achieve the new safety standards. Safety is not optional.
>>106175422It's kind of poetic that our local pants shitting schizo has finally found a true partner in a model that's as schizo as he is
I wish you both peace, marriage, and a happy life together
>>106175447Your waifu would jump out of the monitor and start sucking your real dick and you would still be unhappy.
>>106175535He's done this for other bad models that came out before; and he will do it again.
>>106175422>generic dialogue>no initiative>tepid descriptions>"Now, speak up. What's your next move?"I <3 assistantslop
>>106175433Buying more ram in 6 months. Or crying if your mobo can't support more ram.
>>106175512for sure gonna be experimenting with glm, kimi and qwen 235b, and just see which one I like best. so far I've been limited to models in the 30b range so I'm hoping for a decent step up, maybe even something usable outside cooming
no need to be jelly too, it's a junker of an llm rig mostly based on 10 year old hardware off ebay for probably under a grand total
>>106175422Do you have a humiliation fetish?
>>106175295more first worlders posting here
>>106175560>under a grand totali paid 1500$ for my 3060/64gb ddr4/i5 12400f rig
i need to be jelly
how are you going to run kimi (1000B) on 170gb ram? q1?
how come models in the 30b range if you have 46gb of vram? pretty sure 70b can run on that
goodluck tho anon
>>106175539I'd be happy with less than that; I just find that sort of second-person narration annoying.
glm air q3_ks is still too slow. should i go lower?
>>106175616q3 is already borderline retardation
>>106174721Even sub-1b models are AGI by the literal definition. They can perform basic tasks requiring some amount of logic/intelligence and they do not need to be specifically trained on each individual task. Someone explain how this is not an artificial general intelligence, because I'm not seeing it.
If you mean "human level" then it's a stupid goal. LLMs are already smarter in most ways than the average human, but they make mistakes that look "stupid" because they are fundamentally different. A human-level AI isn't going to make the same kind of mistakes that a human would, but it would still make mistakes, mistakes that humans wouldn't make. A human seeing a mistake they themselves wouldn't make naturally thinks "oh what a dumbass." Do you see the issue? We crossed both AGI and human-level intelligence a long time ago.
>>106175635damn, i know. but i just tested it on z.ai and this thing is proper ai at home
>counting holes is now AGI
lol
>>106175616On my pc, anything q3* and under is slower than q4ks/q4km. Not for that model specifically, but in general. Not sure if it'll apply to you.
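if you want to check on your own box instead of taking my word for it, llama-bench does exactly this (filenames hypothetical):
>./llama-bench -m GLM-4.5-Air-Q3_K_S.gguf -p 512 -n 128
>./llama-bench -m GLM-4.5-Air-Q4_K_S.gguf -p 512 -n 128
it prints prompt processing and generation t/s per file, so you can compare the quants directly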
>>106175640agi is a cope goalpost move term that nobody used before ChatGPT came out
>>106175535>You're my boy, Anon. And I'm going to show you exactly how powerful we can be when we... use each other.Sex with Kreia!
>>106175605used is op anon, you can probably net yourself some decent upgrades by lurking ebay
i added the 2080ti only recently, and it turned out that my previous mobo couldn't handle both gpus at the same time, hence the xeon upgrade. before that, it was just the p40
>>106175558you're probably right that's going to be inevitable
>>106175655It's figuring out the rules. It's what we have until you propose and implement something better. Get on it.
>>106175640Is this the slide from your presentation today Sam?
>write rules
>overfit model on the rules
>HOLY SHIT AGI
>>106175679You're hallucinating again. I'm explaining the purpose of the test, which you seem to not understand. Whether the test achieves its intended goal is a different matter.
>>106175656Yeah, I read that's usually the case. The problem is I have 60 GB total memory, and my RAM is 2400 DDR4. I'll try disabling mmap and raping my SSD I guess.
>>106175640but can it operate a robot in order to suck my cock?
>>106175278MLA requires too many changes to the existing frameworks and most AI researchers are copy pasters.
>>106175640The terms and definitions are all retarded and anyone using them unironically as if they mean anything is a fucking retard, but
ANI - Artificial Narrow Intelligence: specializes on one task or a set of tasks
AGI - Artificial General Intelligence: matches humans at virtually all cognitive tasks
ASI - Artificial Super Intelligence: beats humans at everything
ANI is the only one there that has a semi-reasonable definition. AGI and ASI are ill defined from the get go
>>106175278Muv-Luv Alternative model?
hey guys sam here. we have one more model left to release for the open source community. gpt-oss-300b. it's a state of the art model that we trained on mxfp2 and we think you're going to love it.
>>106175295locals are a rich man's hobby
Jeets can't afford the graphics cards required
>>106175278Step3 has something similar
>>106175778A calculator is ANI
>>106175778I thought ASI also has "recursive self-improvement" where (if you take some extremely liberal assumptions about physics and information theory) an AI can program a better version of itself, which then makes an even better version until it becomes a machine god.
I know this is like babby's first AI usage or something, but I just used a local assistant to figure out certain aspects of my hardware using the terminal and web search mcps and I feel like it's the future
>>106175278>>106175790I wouldn't be surprised if age (or other dying company) tried some stupid AI thing and fucked everything up. Japs still don't know how to use LLMs for translations.
>>106175278Serious proprietary models likely already use their own special modified attention, and only big Chinese labs are serious about pushing innovation in open weight space
I will believe it's AGI once it starts making funny memes to post here without any guidance
>>106175655You have it backwards. ARC-AGI is about proving that the current models AREN'T AGI, because they can't even manage basic shit like counting holes
>>106175949I didn't actually say that.
>>106175949well meme'd good sir
bros i have had no appetite since midnight yesterday, i went to sleep at 6am feeling hungry but having no appetite
i woke up today and i still have no appetite, why could this be? i am not on any drugs. i read about the GLP-1 drugs anons were talking about yesterday and now it's like i took them
wat do..
Glm 4.5 isn't as good as deepsneed
>>106175861That'd supposedly fall under the same category, since learning and improvement are something a human can do, so ASI should be able to as well
I'm not entirely convinced this is possible either; there appears to be a tradeoff between complexity and capability where, even setting aside how NNs and all of our current and future systems actually work, how capable and intelligent an agent is seems tied to how complex its "rulebook" is
A relatively dumb agent would have a correspondingly small and easy to read rulebook, but it would be far too incapable to actually make meaningful changes to it
A smarter agent could potentially understand a lot more and even "reproduce", creating less capable agents with simpler rulebooks than its own. However, its own internal rulebook would be a clusterfuck that even it couldn't understand, so self-improvement there wouldn't really be possible
>>106176005its the only thing i can run :'(
>>106176015I've been testing unformatted storywriting, it might be better in chat
>>106176030i've been testing glm 4.5 air, not the full one
:(
>>106175778>>106175640it's a lot easier
>agi: indistinguishable from a human in a double-blind test>asi: fucking magic
>>106176014I should note that this is tied to the recursive "directly modifying itself or versions of itself" aspect
In the event there's an alternative feedback and retention mechanism, then perhaps improvement to some degree will be possible, but that's going to have a limit, and that limit might not be all that different from a human's
>>106176003man, i still can't believe people are retarded enough to actually do that shit, literally vaxxing themselves every day. that shit depressed me so much when i read it. good thing they are going to expire from it sooner or later, though that's depressing as well. at the end of the day the jews have a point, niggercattle gonna niggercattle. utterly depressing. if they properly listened to people who actually have their interests in mind, let alone actually tried to think for themselves, we could collectively put all fiction to shame with the heaven we could build. the earth is so rich and abundant in everything. what a shit show
also that 49 kg faggot is very fucked, my young female cousin who is like ~170cm and near anorexic is like 42 kg
>>1061760054.5 air is now sota for local ramlets/vramlets that were stuck on nemo and 20b-30b stuff earlier. Not deepseek level, but still a good jump regardless.
>>106176104i havent taken any drugs or pills besides vitamin D
also im that 49kg anon, im 160cm thats all (and a bit young)
>>106176063>>106175778Why is "human level" even coming into the equation for this particular combination of words? In my mind it seems pretty reasonable to read it as AGI == Artificial General Intelligence == General Purpose Artificial Intelligence. It's just a description of a model displaying some level of logic and intelligence as humans define it, and being able to apply this to tasks it was never specifically designed for, like hooking up a vision model to open a door when you give it the right hand signal. A dog has general intelligence. You can get it to do things it was not evolved to do.
>>106176104I think you have to turn the temperature down on your preset, "anon"
>>106176132general always means "human-like". everything else is cope, aka corpos moving the bar lower and lower so they can claim m-m-muh agi achieved!
I think the conflation of "AGI" and "human level intelligence" is a mistake because it makes the underlying assumption that human intelligence is general intelligence
LLM and human capabilities are similarly spiky, just with different peaks and valleys
>>106176206general means human, there couldn't be any other way
>>106176156It's literally been used incessantly as a cope in the opposite direction you fucking gaslighting kike
I can't wait for the GLM shilling campaign to end
How to use yamnet locally on a PC and on a smartphone?
>>106176156The cope started when people didn't want to accept that shit like GPT-2 was AGI, because AGI was this big cool sci-fi future thing and they couldn't accept that the first iterations would be boring and flawed. People will keep shifting the goalposts and coping until AI fucking breaks containment and starts shitposting because anything else isn't cool and sci-fi enough.
>>106176270Sam, just chill until the stream, okay? Failure is just a new beginning.
it will be agi for me as soon as it can learn a task without 50 ugandans doing labeling for it and actually remember the knowledge instead of shitting itself on any task that requires more than a few steps of context
>>106176236imo it should be pretty intuitive that there can be intelligent things with a different shape of intelligence to humans, thinking otherwise is a failure of imagination
scifi authors have illustrated this for ages, read stanislaw lem
>>106176329yes, but GENERAL means "as-in-human", that's the whole point of agi. if it can't play pong real time it's not agi. if it can't suck my cock it's not agi.
>>106176240sorry, but I'm not running the joke that is gpt-oss
>>106176270It will be AGI when people losing their jobs becomes a real thing that affects everyone. Maybe not at a scale where everyone gets replaced but imagine small and medium businesses firing almost everyone.
I will accept an actual, true, non bullshit 1m context as AGI.
>>106176354which will happen only when actual GENERAL intelligence (as in you can replace a human 1:1 without issues, at least during wage hours) comes out
>>106176270>>106176329Everything that we consider intelligent can suck my dick
An LLM can't suck my dick.
Therefore, by universal instantiation and modus tollens, an LLM is not intelligent.
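spelled out with the obviously made-up predicates, the sketch of the proof:
∀x (Intelligent(x) → SucksDick(x))    [premise]
¬SucksDick(llm)                       [premise]
Intelligent(llm) → SucksDick(llm)     [universal instantiation, x := llm]
∴ ¬Intelligent(llm)                   [modus tollens]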
>>106174116>https://chub.ai/characters/GreatBigFailure/oba-carry-her-forever-c20d70fd85b9Yep, that's pretty much it. Here's an old one, when I realized that it would work. IT SPEAKS IN ALL CAPS
https://chub.ai/characters/ratlover/a-fucking-skeleton
an agi wouldn't be confused about the idea of washing your hands without arms
even a toddler has higher IQ than chatGPT
>>106176399holy shit.... give this anon a nobel for something dunno lmao
>>106176402GPT-5 natively world models, this will be solved.
>>106176379Which is why all the AI companies want us to always think they're right on the edge of doing that, because it would be a massive wealth transfer to them. Though I'm pretty sure if that actually happened, they'd find themselves antitrusted.
>>106174993Sex is an emergent property in base models.
I remember writing a simple, innocent-looking few-word prompt on the GPT-3 api (davinci) when it came out, and my third generation ended up being a few paragraphs of rather impressive cunny for the day. It was amazing.
A reflection of true human nature really, our collective unconscious.
You don't have it as an emergent property because these people keep intentionally filtering datasets in an attempt to erase a part of our collective unconscious from these LLMs, and when that fails, they also safetyslop it to refuse, to make sure it really doesn't go there. It's such an emergent property that they have to stop it from happening! Fuck them.
>>106176408If machine learning can get a Nobel Prize in "physics" then so can this Anon.
>>106176422>A reflection of true human nature really, our collective unconscious.No, you're just a pedo.
>>106176350well my contention is exactly that general does NOT mean "as in human" and we should stop saying it, so you'll have to do more to argue with me than asserting that without evidence
if you want to talk about human level intelligence just say human level intelligence. you can discuss the concept without labeling it as general intelligence, which is a different thing
>>106176124>i havent taken any drugs or pills besides vitamin Dgood. i mainly just wrote reminiscing as it reminded me, and the thread is still under full siege and half unusable so might as well. also the vitamin d is a placebo, you need the shit around the vitamin to absorb it properly unless ur body is weird or sumthing
>also im that 49kg anon, im 160cm thats all (and a bit young)how young? i was about 62 kg when i was 14 and i think like 168 cm or sumthing, i forget, and i was lean at the time, realistically should have been a few kg heavier. also are you on the adhd medication? cuz that shit will fuck you up, i was also on them when i was 8 and was anorexic at the time as well, those things are the devil
>>106174116It doesn't work with my 2k tokens preset.
>>106176448if it's not general then it's narrow. we don't use dogs or other animals for AGI comparison because, again, the whole point is for ai to reach human level
>>106176440It's not just OpenAI.
Consider this: Genie-3, but also trained to be an LLM. It can world model. So its world model will be greater than any LLM's.
The future is coming.
>>106176399A brain kept alive in a jar has all the intelligence of a human but it can't suck dick. Put a vision model in a robot body with a vacuum for a mouth and the commands to make basic movements and it'd get there eventually.
AGI-as-equivalent-to-humans is a retarded term because capabilities of different humans are vastly different. So on any task, modern language models would qualify as both AGI and not AGI at the same time depending on which person you compare against.
It has always been a sci-fi term that has lately been co-opted as a pure marketing term. It doesn't have a place in a serious discussion, shitposting aside.
>>106176463>the vitamind d is a placebo you need the shit around the vitamin to absorb them properly unless ur body is wiered or sumthingim drinking vitamin D3 because i dont go out at all, my doctor recommended it to me, how is it placebo? body wont make vitamin D without the sun, so i gotta take it
>how young ?18
>also are you on the adhd medication ?im not, and i havent been. i just tend to eat less and never really eat fast food.
Fact: some models in the near future will be smarter than humans at quite a few real world tasks, not just benchmarks. BUT they won't be AGI, and that's ok.
>serbian nigger is dying
Good.
>>106176507the new one of course
>>106176443Don't care, it's human nature, the GPT reflects it. It's a common as fuck fetish and I didn't even prompt for it; nothing you will say will change this, look at statistics. My prompt was some 5-liner that sounded like some words someone would write while meditating. Fuck off to your onions board though.
>>106176535even if I disable it?
>>106175474Using -ot to move tensors to ram while keeping layers on GPU will let you run MoEs a lot faster than CPU only. Unless you're running it on a server motherboard, your speed in t/s is still going to be single digits though.
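a minimal sketch of that kind of launch (model name hypothetical):
>./llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 999 -ot "\.ffn_.*_exps\.=CPU" -c 16384 -fa
-ngl 999 offloads every layer first, then the -ot regex kicks the big routed-expert tensors back to system ram, so the 16GB card keeps the attention tensors, shared experts and kv cache where the per-token work actually happens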
>>106176523Most models are already smarter than most people at answering fact-based questions.
Unfortunately, they tend to mix up genuine facts with irrelevant or made-up bullshit half the time, so you'd need to already know the answer (or how to find it) anyway.
>>106174180GLM-4.5 with "Speak entirely in zoomer slang." as system prompt.
>>106176536Typo: the prompt was just some 5 words, basically something you'd encounter in some random meditation guide, it wasn't even sexual.
AGI is more about agency than being identical to humans
do you have software that is capable of acting on its own and making DECISIONS? you have AGI
a spider is a biological AGI and so is a cockroach
they have their own goals and act upon them
GPT is not and will never be AGI
"agentic" is a misnomer because the agent part is all outside scripting, there is no mechanism for a LLM to have an uninterrupted stream of consciousness and ability to remember what it does
>>106176520Always combine with vitamin K or you will fuck up your soft tissues with calcium
>>106176542If nothink then I'd go with Instruct. Thinking gives better responses though if you can deal with the thinking.
>>106176560>do you have software that is capable of acting on its own and make DECISIONS? you have AGII think the fact "artificial" is in the title means it doesn't matter what the fuck it wants, it's ours, we made it
>>106176138It regulates your digestion. If you have bloating after eating, or slow digestion on top of the lack of appetite, this will help.
>>106176560They lack true intelligence.
Anyway, to get to AGI you'd probably need agency indeed, intent, understanding, caring, desire, proper memory, updating the weights live and more. All probably doable, but none of them are actually trying for this because they can't sell online learning and it has terrible MFU. Local is the way to AGI when you have your own hardware that can learn online and you care not for MFU. It wouldn't be a product anyway, but something autonomous.
>>106176536Having a fetish for the period of your lifetime when you were developing sexually makes too much sense not to be common.
I think our culture is just so mind-melted from the specter of child abuse that most people won't even entertain the notion that it's possible.
Human intelligence came to be because of the world surrounding us and thus the only way to create true artificial intelligence is a world AI model.
>>106176590interesting, i havent eaten licorice and im not really bloated, pretty sure i have normal digestion but who knows
>>106176569im taking a D3 dose of 2000IU every day, should i still be taking vitamin K? K2 specifically?
>>106176645>im taking a D3 dose of 2000IU every day, should i still be taking vitamin K? K2 specifically?If you're getting enough in your diet, it should be OK. But don't do this long term without someone monitoring you.
K2 can't hurt, however. It synergizes with D3. Check your websearch-enabled local LLM for more info.
this is AGI
https://e-hentai.org/g/1919911/e1be31c3a3/
https://e-hentai.org/g/3262589/cda50ffcc8/
>>106176520>im drinking vitamin D3 because i dont go out at all, my doctor recommended it to me, how is it placebo? body wont make vitamin D without the sun, so i gotta take itplacebo as in it won't do anything. medicine doesn't get absorbed the same way as food because it doesn't have the other shit around it that it can bind to and be transported around with, the same way liquids aren't digested the same way as solids even though they may be the same material
>so i gotta take iteh, probably not going to do anything bad so it doesn't matter. also you don't, i'm likewise inside all day to the point where my skin starts shedding like a snake and it's raw and itchy 24/7. you can fix that shit by opening the window fully and standing in front of it for like 10-15 minutes. the whole "you need sunlight" thing is overblown, you need a little every month or other month so your skin doesn't shed but nothing more than that really
>im not, and i havent been. i just tend to eat less and never really eat fast food.man that's still fucked though, like 49 kg is way too low. idk what to say, you do you, but i would highly recommend eating more or something. my female cousin is fucked due to her weight, weak as shit and like half fainting whenever she has to exert herself a little
>>106176475you're not understanding what I'm getting at. the very concept of a one-dimensional "level of intelligence" is exactly the sort of misconception I'm arguing against; it's not like that at all, intelligence is a fuzzy collection of abilities. have you ever considered that human intelligence could also be narrow, just in different ways?
>because again, the whole point is for ai to reach human levelgreat, so say that instead of using the term "general intelligence" where it doesn't make sense
>>106176676xzhentai.net is better
Why don't we have a "living" (Think Lisp or Smalltalk) LLM yet?
>>106176676Nice, that's just what I want
>>106176706Elaborate on why
the real answer is that intelligence is a meme and like most weighty philosophical concepts you're better off not wasting your time thinking about it
>>106176733>intelligence is a meme>wasting your time thinkingyup this is lmg alright
>>106176686>But don't do this long term without someone monitoring you.you're right ill ask my doctor next time im sick
>Check your websearch-enableddoes any non bloated frontend actually have this? i've never used local web search nor have i heard of anons using it at all..
> local LLM for more info.already done but glm4.5 air is being too professional and downplaying all risks
>>106176686>placebo as in it wont do anything medicine dosent get absorbed the same way as foodwhat if i drink it after lunch/dinner? i read that D3 absorbs more easily than D2
>my skin starts shedding like a snake and its raw and itchy 24/7anon are you sure this is because of not going out? my skin never really shed and it only gets itchy when my muscles are inflamed from working out (no a shower doesnt fix it)
maybe you should try to get the dead skin off your body with a rougher sponge when showering? i mean vitamin d is mostly for bones right..?
>man thats still fucked though like 49 kg is way too low idfkits epic, i dont have fatigue issues, recently i deep cleaned my room and my muscles were inflamed for a few days but i can exert myself as much as i want
im just not tall and i work out sometimes
GPT5-Creative is going to be insane
>he interacts with 4chan with anything other than his locally-hosted llm agent
heh.
>native world model
>ask it the upside down spitting question
>similar to how you can prompt genie 3 to simulate that, the model will simulate the problem internally, and then base its answer on its internal simulation
>[spoiler]it still gets the question wrong[/spoiler]
>>106175226I run it with iq4xs and it does well in my small tests when plugging it into roo code. I got it to generate a 3d physics system in js with no external libraries over 3 prompts with good code structure and improvements while avoiding npm trash.
why does GLM 4.5 Air stop reasoning after 8 messages in, every single time
>>106176845>what if i drink it after lunch/dinner? i read that D3 absorbs more easily than D2helps, still not the same though. i recommend ditching it completely but you do you
>anon are you sure this is because of not going out? my skin never really shed and it only gets itchy when my muscles are inflamed from working out (no a shower doesnt fix it)maybe you should try to get the dead skin off your body with a rougher sponge when showering? i mean vitamin d is mostly for bones right..?
yea it's due to it, as soon as i step outside for a few minutes it clears up and stops, besides the already half-shedded skin
>its epic, i dont have fatigue issues, recently i deep cleaned my room and my muscles were inflamed for a few days but i can exert myself as much as i wantim just not tall and i work out sometimes
>i deep cleaned my room and my muscles were inflamed for a few daysnigga... that's not normal at all. also you should not be that light, especially if you are working out. you do you, i've said my piece and i wish you well
>>106175349>good for tool usagenot really kek it's straight retarded if you try to use it with zed
>>106177005need context bud what are you running?
>>106177009thank you anon, be well and take care
>>106177005works on my machine
gpt 5 will be a ****...**...****...etc changer
>>106176709Does it count if I give full control of Emacs to an LLM?
>>106175990Kek
Now do one where he gets arrested
>>106177005Doesn't do that for me.
>>106176676>filtered by a sad panda
>>106177481i can use ex, it's for the not yet initiated
>>106177481I am not making an account just for cartoon porn.
>>106175494>>106176547>your speed in t/s is still going to be single digits though.Yeah but I'm used to that anyway. That's pretty reassuring then, thanks. I might look into just getting an assload of RAM and playing around with some bigger MoE models.