file
md5: 52ac8eb4374bf8d7e29af106c175ad61
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105966718 & >>105959558

►News
>(07/18) OpenReasoning-Nemotron released: https://hf.co/blog/nvidia/openreasoning-nemotron
>(07/17) Seed-X translation models released: https://hf.co/collections/ByteDance-Seed/seed-x-6878753f2858bc17afa78543
>(07/17) Support for Ernie 4.5 MoE merged: https://github.com/ggml-org/llama.cpp/pull/14658
>(07/16) Support diffusion models: Add Dream 7B merged: https://github.com/ggml-org/llama.cpp/pull/14644
>(07/15) Support for Kimi-K2 merged: https://github.com/ggml-org/llama.cpp/pull/14654

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
file
md5: 8fff5da90924258c574c888472f242c3
Ani Won
Mikutroons started splitting the thread again...
where the FUCK are my ik_llama.cpp issues and discussions? literally all the info on the commands were there because nobody fucking documented anything in that shitheap
What is the best Q5_S quantization model for 8GB of DDR4 and a 3090, assuming I want to use 17000-token context with NTK scaling set to 2.5?
>>105971793
Try:
llama-server --model Qwen3-235B-A22B-mix-IQ3_K-00001-of-00003.gguf -fa -ctk q8_0 -ctv q8_0 -c 30000 -amb 512 -ot blk\.[6-9]\.ffn.*=CPU -ot blk\.[1-9][0-9]\.ffn.*=CPU -ngl 99 --threads 8 --host 127.0.0.1 --port 8080 --no-mmap
And enable/disable stuff from there.
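For reference, a minimal annotated sketch of how those tensor-override flags trade VRAM for RAM. Flag semantics here are assumed from current llama.cpp/ik_llama.cpp behavior (-amb is ik_llama.cpp-specific) and the narrowed regex is just an example:
# -ngl 99 offloads every layer to the GPU first; each -ot regex then pins
# matching tensors back to CPU. blk\.[6-9]\.ffn.* plus blk\.[1-9][0-9]\.ffn.*
# keep the FFN weights of blocks 6-99 in system RAM while attention and the
# first blocks stay in VRAM. To spend more VRAM, narrow the first regex:
llama-server --model Qwen3-235B-A22B-mix-IQ3_K-00001-of-00003.gguf \
    -fa -ctk q8_0 -ctv q8_0 -c 30000 -ngl 99 \
    -ot 'blk\.[8-9]\.ffn.*=CPU' -ot 'blk\.[1-9][0-9]\.ffn.*=CPU' \
    --threads 8 --host 127.0.0.1 --port 8080 --no-mmap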
best 12gb vram model? a little offload is fine too
Looking for another round of feedback from the other timezone regarding this rentry I made to answer the question we get a few times per day. https://rentry.org/recommended-models
You can check replies to this post
>>105965438 (Cross-thread) to see what was already suggested.
>faggot copied my post in the other thread for engagement
You guys really need to stop using a corporate model for this general. It's probably just one guy, but it's a really bad look.
glm4 100b moe will save us (local)
>>105972072
no savior of mine is smaller than 700B (MoE) or 120b (dense)
>>105972398
Here, you can eat a snack from my bag of Comfy Mikus
>>105973216
TBdesu making his AI GF cosplay as Miku is a pretty believable Elon move
Decisive MikuGOD victory.
https://youtu.be/mco3UX9SqDA
Any direct link to get Llama-3.2-1B without huggingface or giving my contact information to Meta?
>>105974974
Look for a reupload on the same site.
waaah
md5: 3c586621371a7f853e3042c550e1a29c
what's the current best model below 24b for e/rp? is mistralai/Mistral-Small-3.2-24B-Instruct-2506 good?
>>105974974
I just made a fake account with fake info
>>105971741
she's a bit repetitive but DAMN
I like her
For the people who want a copy of ik_llama.cpp before it disappeared, someone claimed they had it including the missing 1-3 commits.
https://github.com/PieBru/ik_llama.cpp_temp_copy/
In a lot of the forks I looked at, people were missing some or all of the Q1 quant work, and almost everyone was also missing the new WebUI changes.
>>105974974
Install ollama and use that to grab the model.
file
md5: 29cb914c414077da83e6fde0345a8ab2
anisex. mikudeath
file
md5: 32be0ce1e7d27099396db433f4c519a0
why did our mascot go from a local voice synthesizer model to a closed source chat api? isn't this local models general?
>>105977903
Because Ani is an AI gf and Miku is what you hope you become when you inject estrogen into yourself.
>>105977903
First time witnessing a thread slide by /g/'s paid shills?
There's always been a team dedicated to promoting Musk's propaganda.
Here it comes
The last flicker
Molten by heat
It seeps through the cracks
Of a broken mind.
>>105977903
Because you're in the wrong thread
>>105971710
>>105977976
>paid shills
>they changed my mascot
seek help. preferably a handgun you could suck start.
>>105971714 (OP)
>lmg
>op picture does not represent a lm
Ani no longer responds to me jacking off on webcam. Grok won't acknowledge it either. First removing the NSFW outfit, and now this...
>>105978042
>>op picture does not represent a lm
It is a tradition that /lmg/ OP picture has nothing to do with /lmg/. You could call it thread culture.
>>105978045
Thank you saar. We collected enough of that data already.
>>105971880
What are expected speeds for prompt processing and generation on one 3090 with DeepSeek? On two?
>>105971880
As for your question about gap, I use monstral_iQ2m.gguf on two 3090s. Another model I like is 72B-Qwen2.5-Kunou-v1-IQ4_XS.gguf.
>>105975234
If your only task in mind is "erp" I don't think it really matters at all. You are most likely too stupid to notice any difference whatsoever.
>>105977976
We've had plenty of shit threads in the past.
Just means reading the summary then dipping out until we start getting less shit threads again.
>>105975541
>she's a bit repetitive
Are you sure?
>>>/wsg/5925135 >>>/wsg/5925144 >>>/wsg/5928719 >>>/wsg/5928724
Seems varied enough to me.
A new NeMo??
What's /g/'s verdict so far?
>>105980065
If you are asking about sex then it is somewhere between pygmalion and llama-1 7b
>>105977976
>paid shills
dude, they are shills, but they are Doing It For Free, which is even worse tbdesu
This thing's on sale now. Should I get it, or just wait for the Framework one?
>>105980223
>42% off
LOL. I knew it was dead on arrival but that much?
>>105980217
nuh-uh.
They are mostly bots run by paid shills from Chindia.
>>105980223
>128GB, only 90gb useable
useless, 4 tks for 70B maybe, spend 2-3x that and run deepseek instead on a mac much faster. Spend 20 grand and get a DDR5 server and run it even faster
>>105980239
Its CPU is left in the dust by a GPU from 2020. Why would anyone pay more than a hundred bucks for that crap?
Apart from the occasional clueless idiot from /g/, I mean...
>>105980274
I would buy it for 2000 if it had 256GB. With 128GB it is absolutely useless for anything. It was designed with dense 70Bs in mind, which died by the time it released.
>>105980282
He probably fucked jart in his bussy and then Jart basically had him wrapped around his finger, which allowed him to do bullying by proxy.
>>105980269
>spend 2-3x that and run deepseek instead on a mac much faster
Mac prompt processing is shit.
>>105980330
faster than token processing would be on that amd, and unless you give it tens of thousands of new tokens to process every input then it's like a few seconds at most
What is the best coding model in the 30b~70b range? Are there any models or solutions that are efficient to do DeepResearch stuff locally?
>>105980341
>faster than token processing would be on that amd, and unless you give it tens of thousands of new tokens to process every input then it's like a few seconds at most
For coomer shit maybe, but if you're using it to host a server and make requests the AMD will be far faster.
>>105980223
you know it's bad when you can DIY a build better than it. 3 5060s and a cheap pc can run 70b faster.
The only thing this has is more 70b context at ~3 tokens a second, or running 100b models at ~3 tokens a second. That is not worth 2600. Especially with b60s on the horizon
>>105980402
lol you're just wrong, that thing will get you 4 tks at most on a 70B, maybe 6-8 for something like deepseek if it had 500GB at the same speed, is this buyers remorse I'm seeing?
>>105980223
Have you looked up what performance it gets in various models?
>>105980449
250 GB/s memory, so for generation it will be bottlenecked pretty hard by that. Image/audio processing will be good, assuming ROCm works.
>>105980466
>Image/audio processing will be good
No... it won't... That is even more compute heavy
>>105980482
Yeah that's the fucking point, the GPU compute performance is fine, it's just the memory speed is shit.
>>105980490
it would be far worse than a comparably priced gpu and the 128gb is not a part of that. Are you an actual shill trying to sell your stock for your failed product?
>>105980506
>and the 128gb is not a part of that
What is this supposed to mean?
>>105980568
did you even bother reading up on the product you are shilling?
>>105980582
Are you having a fucking stroke? The 128 GB shared memory is also for the built-in GPU. I cannot even fucking tell if you're trolling or just retarded at this point.
What's better for RP, K2 or R1/V3? Also, what's the best preset?
>>105971806
New Qwen 235B dropped
>>105980609
retard, it's up to 96GB and the integrated Radeon 8000S is utter garbage
>>105980609
no one wants your unupgradable machine that can run outdated 70B models at 4tks
>>105980223
It's fine. You get better performance with a dedicated GPU and you also don't need to worry about the shared memory or inability to upgrade the memory.
I'd recommend the HP over the Chinese version if you're planning on getting one. This is the EVO-X2 and the BIOS sucks. I can't even turn on Hyper-V. Linux might work better, but I haven't had a need to try that yet (besides Hyper-V not working).
>>105980223
https://www.youtube.com/watch?v=BTIUL-yaY4s
>>105980223
If you have $2600, you can swing another $400 and get a 4090D 48GB. You can play with 27-32B models at q8, and you can definitely do image and video gen at fp16 and full resolution, as well as train loras.
Be real. Local is for porn and ERP. If you want to write programs, pay for grok 4 or some other huge online model. This framework thing isn't good at anything AI, it's a reddit meme.
>>105980663
I knew it, buyers remorse, no wonder he was trying so hard to defend his retarded purchase. Could get 8 3090s for that price, or save up a bit more to run deepseek faster
>>105980672
>>105980663
Yes, that's the one I have. You'll want to avoid it.
>>105980678
>8 3090s
>2.8 kW
I'd hope it's a lot faster than a 200 watt computer.
>>105980695
or get a 512gb mac, much faster and you can run actually good models. Or get a DDR5 server and run it even faster
I wonder when the DGX Spark will launch? PNY is going to be sorry they signed up for being the OEM, no one is going to buy this piece of shit when they see how poorly it performs for the $4600+ price tag. If you have that kind of money, surely you can justify an actual 6000 Pro, instead of fucking around with 2016-tier memory speed nonsense.
M3 ultra runs deepseek at 23 t/s, imagine paying half the price to run a shitty llama 70B model at a 5th of that
>>105980702
Mac is shit too. Fast memory but tiny TFLOPS. Enjoy your wait as the context grows larger.
>>105980702
>512gb mac
Why the fuck are you trying to shill this garbage? That's 9500 dollars anyways.
>>105980723
context processing is nearly 200 tks, wtf are you on about, you are working on old info
All those mixed memory 128GB capped dedicated AI computers (Spark too) that are obsolete on launch were funny to laugh at if you didn't buy one. And it is fun to laugh at retards who boughted.
But I now realize that we are actually kinda fucked. Because boomers with all the money saw this and we probably won't see half of actually useful AI hardware that would have been developed otherwise.
lol, mac is blazing fast for deepseek, cope
Just wait for DDR6, it'll be fine.
>>105971714 (OP)
>no "official" mikutroon card in OP
BASED
>wasting $2600 when you could get 2 3090s and a decent psu
amd/rocm fags are either tech illiterate or plain retarded
>>105980754
>500GB
how much does that cost?
>>105980754
>4k context
>487 GB used
>can't upgrade memory so that's the absolute limit
I'll just not get a Mac, or one of these soldered LPDDR5X devices (AMD 395, Nvidia Spark), thanks though.
>>105980783
that is 5bit, it fits 32k context easily still, that is the most you want to go for deepseek anyways, it drops off after that
>>105980776
10 thousand united states dollars
>>105980776
A car basically. And not even a secondhand cheapshit.
>>105980776
About 10% of what I earn and a much smaller fraction of what I've saved :)
>>105980721
>M3 ultra runs deepseek at 23 t/s
Name me one reason other than coom to spend 12k for this speed
>>105980797
so you can be a cool guy on /lmg/, smarter thing to do is just pay pennies for api
>>105980783
the reason why it's a usable gen speed is BECAUSE it's soldered ram
slotted ram will never come close
getting decent ram bandwidth requires obeying the laws of physics, and physics says the chips need to be very close together.
>>105980223
just buy a mac studio
>>105980721
>23 t/s
I'll wait for M5
>>105980797
>Name me one reason other than coom to spend 12k for this speed
IDE usage, batched translation
>>105980757
>Just wait for DDR6, it'll be fine.
2028 ?
>>105980797
Everything that isn't coom. You don't need 40 t/s for answering knowledge-based short questions. Not everything people do with LLMs needs 100k context and 10k reasoning tokens. In fact that's rare for the average person. For vibe coding using API is fine, no one said you can't use both at the same time.
>>105980854
Yeah, basically tomorrow.
>>105980741
Yeah I'm not jumping through hoops trying to get mlx image gen models to work, when something old like a 3090 still blows it away performance wise. You're going to spend a fortune on a llama.cpp machine? If you have that kind of money, just buy a Blackwell 6000 Pro already.
>>105980873
>just buy a Blackwell 6000 Pro already
cept you would need like 6 of them and that means rewiring the house
>>105980873
>when something old like a 3090 still blows it away performance wise
does it matter to have good performance when you only have enough vram for absolute trash models
>>105980852
>batched translation
llama-cli with prompt from file, single shot and logging into a file, running overnight
I did it.
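A minimal sketch of that overnight setup, assuming llama.cpp's llama-cli; the model and file names are placeholders:
# single shot, non-interactive: prompt read from a file, output logged,
# left running in the background overnight
llama-cli -m model.gguf -f batch_prompt.txt -n 8192 --no-display-prompt \
    > translations.log 2>&1 &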
>>105980873
Wtf are you talking about? Almost all of us in this thread have at least a 3090 and are doing image gen already. Obviously you're not going to sell your machine to main a mac.
>>105980872
>Just wait for DDR6, it'll be fine.
>2028 ?
>Yeah, basically tomorrow.
That's, like, 900 /lmg/ threads away.
>>105980754
>--max-toklens 8000
Wow, that's going to be completely useless for coding. What other fails do you want to show us?
/lmg/ btfo'd by reddit
kek
Nice to see that they dropped the hybrid thinking meme. Found it super stupid that they trained the same model two ways based on a single token, probably fucked up everything downstream. At a glance it looks like they also fixed it having long-term brain damage, but I'll test it when the quants come out. Never can trust OR providers to actually use the model correctly
>>105980852
>IDE usage
You need to rethink your workflow as far as coding is concerned. Even having 50 t/s will keep you tied to the display
>>105980896
that is max output and it's set to 800 and it could be increased? It fits 32k and deepseek falls off after that anyways
>>105980891
They'll go quick when you're fighting the miku vs ani war.
>>105980898
none of the r*dditards even downloaded let alone ran this model
they all just jerk each other off and look at the benchmarks
>>105980896
>>105980906
and that is at 5bit, at 4bit could prob fit 120k but no model does past 32k well cept maybe gemini
>>105980857
>Everything that isn't coom
I ask "what else if not just coom?"
I agree that you don't need it for the tasks you mentioned
>>105980883
Have you seen how much memory a mac needs to run something like flux? Not even 64GB is enough. Unless you use mlx, everything has to run at fp32. It's a joke.
>>105980913
have (You) to confirm it's benchmaxxed?
>>105980873
>Blackwell 6000 Pro
BASED
>>105980924
that is what my 5090 is for, why would I use a mac for that
>>105980890
Then why do you need a mac in the first place?
>>105980958
NTA but that's not the new one. New one would have 2507 attached to the model name.
>>105980916
>and that is at 5bit, at 4bit could prob fit 120k but no model does past 32k well cept maybe gemini
Again, what's the point? Local is for coom. 48GB of VRAM is fine for that. If you want to code, you want 128K+ context, and that means cloud models.
It's like being gay; I don't really care if you are, just keep it to yourself that you're a mac user.
>>105980947
for deepseek / other moes dumbass, no one is using it for image gen
>>105980963
poor cope, 48GB is not fine for shit. All 70B models are garbage and actual wastes of money
file
md5: c0d934db4af86bea62efeaafb44c9c24
>>105980958
no other model is available on their website
and none of them have dates
>>105980947
Deepseek, what else?
>>105980963
>just keep it to yourself that you're a mac user
It used to be a general policy back then
>>105980965
I see. So you're using your 32K context deepseek for what exactly, other than posting videos of your very first gen? Even cpumaxxing deepseek isn't terrible the first gen.
>>105980741
>context processing is nearly 200 tks, wtf are you on about, you are working on old info
>paste a medium sized code file
>wait an entire minute before the answer even begins to generate
Nice paperweight you got there.
>>105980975
What recent 70B model is worthwhile for coom?
>>105980986
>>105980965
What do you need 23 t/s for?
What kind of tasks?
>>105980963
IME cooming benefits more from model size than other tasks. Probably because it's harder to benchmaxx on. Local models are pretty good at coding now. If you want to produce anything of value, you can't let any model code without constant tard wrangling anyway no matter how big it is
>>105980995
Nta
Did you know that you can pre-process a prompt (big file) to load it in 1 sec?
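One way to do that with llama.cpp is llama-cli's --prompt-cache, which saves the processed KV state to disk; a sketch with placeholder file names:
# first run: process the big prompt once and save the KV state
llama-cli -m model.gguf -f big_prompt.txt --prompt-cache big_prompt.cache -n 16
# later runs: the cached prefix is reloaded instead of re-processed
llama-cli -m model.gguf -f big_prompt.txt --prompt-cache big_prompt.cache -n 512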
>>105980750
A better way to think of it: this is LLM hardware for idiots V1, for people too afraid to stack gpus or have some big 'ugly' box.
Also, it doesn't take a brainiac to realize they (hardware manufacturers) came up with something minimally viable first, and in 6 months to a year will come up with something way better. You should be interested in v2, v3, because if they ever manage to make something capable of running 200-600b models, maybe there will be a point where they can edge out dedicated gpus. I feel like it's a bit of a tossup though; what it will probably be is dedicated gpu stacking to 96-196 gb becoming cheap(ish), while crap like this struggles to run larger models in the 512gb range with the same kind of "you can, but it will be painful at 4 t/s".
>>105980983
NTA but stop being a retard ffs
https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507
>>105981019
How do you expect to preprocess something that is actively changing?
>>105981027
writing, general assistant stuff, rp... everything I need an llm for?
>>105980995
20 seconds for 4k context processing, and most use cases you type like 100 words or less so it's near instant since old context is reused, what kind of cope is this
>>105981003
I need 80 t/s for zed
>>105981003
Normal AI tasks? What are you doing where you're satisfied with less?
>>105981019
What if you change a line in the middle?
And then what if another line changes?
>>105981033
ita but prove if it knows what mesugaki is, ok?
>>105981042
>most use cases you type like 100 words or less so it's near instant since old context is reused
For programming you're rarely reusing context.
This
>>105981033
>>105981029
just pre-process it after it changes duh
>>105981049
I don't use it to code with
>>105980983
1. You're using their website. First misstep.
2. The website has a thinking toggle. Only the original 235B has that.
3. I did that test with Q3_XL back when 235B was still new, and it did the same exact output. That's the old one.
If it was the new Qwen they probably would advertise it as new.
>>105981008
I dunno man, I guess if you want to play "prude simulator" then the Meta or Google local models are fine.
I personally think broken-tutu-24b is the best compromise for local coom. negative-llama3 70b is maybe more nuanced, but it's prudish. I don't bother with the truly brain-damaged "fine tunes".
>>105981029
>this is actively changing
Wiki article?
>>105981037
>Normal AI tasks?
Which were?....
You can't read at 23 t/s if it's not porn which was exactly my point
Nothing but coom
file
md5: 4a83c886eac6a67f87a5df39a6d2053b
https://x.com/elonmusk/status/1947179677325652459
>>105981078
no one is stopping you from paying for api
>>105981049
CPU deepseek is nice for one-shot stuff that you won't be waiting for but it's truly awful to use it at such a slow speed when you're trying to make quick iterations to a project
>>105981086
>you can't let any model code without constant tard wrangling
Have you tried generating prompts with a model? I find that if I first request for a prompt to do what I want, I can then fine-tune that generated prompt which tends to produce better results than if I just ask for the basic first request to be immediately implemented.
>>105981029
>>105981038
Just pre-process every variation of the prompt you fucking retards. Buy another mac, are you poor or something?
>>105981078
Total twitter death NOW
>>105981076
You need to account for reasoning. In which case, 23 t/s is kind of still not enough.
>>105981086
Exactly. Coding with an AI is going to be an iterative process, unless you're asking for a hello world program. I can't see coding anything complex locally, it just takes too long.
>>105981038
>What if you change a line in the middle?
Why don't you let AI gen some decent code based on your well-structured prompt?
Why is a single line able to fix or break anything?
Guys, you have to re-think how you use AI
It's not about some shitty coding anymore
>>105981078
The smartest and best AI confirmed.
►Recent Highlights from the Previous Thread: >>105971710

--Paper: CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning:
>105979713 >105979801 >105979861 >105980265 >105980303 >105980321 >105980384 >105980477 >105980512
--Overview of 2025 Chinese MoE models with focus on multimodal BAGEL:
>105971749 >105971848 >105971891 >105971983 >105972003 >105972101 >105972110 >105972282 >105971903 >105972293 >105972308 >105972373 >105972396 >105972425 >105972471 >105972317 >105972323
--Techniques to control or bias initial model outputs and bypass refusal patterns:
>105972510 >105972536 >105972593 >105972627 >105972655 >105972713 >105972735 >105972972 >105981009 >105973013 >105973146 >105972548 >105972576 >105972675 >105972685
--Multi-stage agentic pipeline for narrative roleplay writing:
>105977946 >105977998 >105978038 >105978268 >105978815 >105978189 >105978248 >105978885 >105979472 >105978364 >105978380
--Troubleshooting remote self-hosted LLM server downtime and automation workarounds:
>105977036 >105977073 >105977134 >105977232 >105977270 >105977334
--Preservation efforts for ik_llama.cpp and quantization comparisons:
>105975833 >105975904 >105975923 >105976020
--Hacking Claude Code to run offline with llama.cpp via context patching:
>105978622 >105978821 >105978965
--Anon's experience optimizing R1 quantization for speed and context retention:
>105979489 >105979593
--Qwen3-235B-A22B updated with 256K context support in new Instruct version:
>105978819 >105979585
--Assessing the viability of PowerEdge ML350 G10 based on RAM and upgrade potential:
>105974903 >105974928 >105977337 >105975195 >105975224 >105975230 >105975250 >105975254 >105975273 >105975287 >105975311
--Feasibility of building an AMD MI50 GPU cluster:
>105977878 >105977907 >105978783 >105979064
--Miku (free space):
>105978729 >105979092 >105979388

►Recent Highlight Posts from the Previous Thread: >>105971718

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>105981129
Bit late there bud
>>105981097
>In which case, 23 t/s is kind of still not enough.
Not enough for fucking WHAT?
None of you could name a task where you can use up 23 t/s and it won't be enough
You think AI on M3 is for quick answers. You have to learn to ask your questions correctly
Super quick re-runs are a waste anyway
>>105981140
>None of you could name a task where you can use up 23 t/s and it won't be enough
at least two people said zed already
>>105981086
I found current gen <30b models like small-3.2 and gemma3 to be quite good as coding assistants, so I don't see where this idea comes from that only cloud models are worthwhile. You just need to be able to run them on GPU so that the gen speed is fast enough.
I only use them as assistants though. IMO you still are responsible for understanding and maintaining the code so I don't see the appeal of vibe-coding without understanding. Cloud models might be better at that but that just means they'll take a little while longer to fall over. I've used them and while they're smarter they still aren't reliable enough for that kind of use
>>105981129
Thank you for Miku!
A bit late but still
>>105981129
My bookmarklet isn't working :(
>>105981151
>two people said zed already
Jeeeez!
To do WHAT?
>>105981165
Leave this board.
>>105981161
The script doesn't work for me either.
test
md5: 62fb3433c15043503947893d7a4e430d
>>105981041
Again, I'm not the original anon you talked to, but here you go.
Keep in mind this is the free model hosted by Chutes used via OR.
>>105981140
Oh I see. You're one of those people that think there's nothing wrong with treating AI like it's an e-mail communication.
>>105981180
>jews out of nowhere
wow local is really saved this time
thanks alibaba
Where are the goofs Daniel?
>>105981152
>I only use them as assistants though. IMO you still are responsible for understanding and maintaining the code so I don't see the appeal of vibe-coding without understanding.
Yeah this is the difference in our use-cases for sure, I'm heavily relying on LLMs for code generation. A project I've been working on for like two days now is at something like 8k lines of code and I've written maybe 100 of those. It definitely does make having a full understanding of the project more difficult but I've had reasonable results with this workflow so I'm continuing to experiment with it.
If I were less lazy, I would probably share your mindset lol
>>105981165
maybe you should look into what Zed is first
>>105981184
>implying being smarter than others
Won't work with me.
Your coding is not worth 10k.
Learn to articulate what you will achieve with 23 t/s.
None of you could tell
>>105981218
>what Zed is
An editor?
>>105981241
nta. are you a bot or genuinely this obtuse?
Currently downloading the new qwen3...how bad is it gonna be bros?
>>105981241
I never implied that, but what I did imply is that you're coping about slow responses being all you need. If 6 t/s is all you can have, then it's better than nothing, but faster is always better.
Also with this post you seem to be implying that there's some value you need to be getting to justify spending money. People easily spend tons of money on unnecessary shows of luxury like sports cars, that are not worth that much in returned emotional value or any other value. And there's nothing inherently wrong with that in moderation.
>>105981261
You're going to love the new Qwen, it's so bad.
>It's been about six months since Google dropped a research paper about inserting new info to a LLM without any training
>Still no proof of concept
Fuck man, I just wish we could do more with LLMs other than coding. Grok 4 is probably the closest to a fun model, but it's still not good enough for me.
How likely is it that new qwen is less censored?
>>105981342
>API
Bro your Claude?
this just in, qwen is still a useless benchmaxxed model
>>105981320
>People easily spend tons of money on unnecessary shows of luxury like sports cars
Classical redistribution of wealth
I see no point to discuss it
>>105981320
>luxury like sports cars
And this kind of indulgence won't waste 50% of value in 3 years like gadgets do
>the (tr)anni/grok shill is also a vramlet
take your brown skinned brethren and go back to aicg.
>>105981338
>>105981379
thanks for providing proof with your gaslighting
>>105981399
are you a mikufag?
>>105981415
>>105981180
it also doesn't know anything about dnd, so again, another benchmaxxed model
>>105981397
Your gadgets. Macs don't reduce in value that fast. Plus many of those people don't care about or ever sell their luxury shit.
>>105981425
>it also doesn't know anything about dnd
Ah, fuck.
Thanks for saving me some time anon.
>>105981161
Here is my working version:
javascript:document.querySelectorAll('span.quote').forEach(quoteSpan=>{const post=quoteSpan.parentNode;const previousThreadUrl=post.querySelector('a[href*="thread"]');let threadId=null;if(previousThreadUrl){const threadMatch=previousThreadUrl.href.match(/thread\/(\d{9})/);if(threadMatch)threadId=threadMatch[1];}const quoteIds=quoteSpan.textContent.match(/>(\d{9})/g);if(quoteIds){quoteSpan.outerHTML=quoteIds.map(id=>{const postId=id.slice(1);const linkUrl=threadId?`/g/thread/${threadId}#p${postId}`:`#p${postId}`;return `<a href="${linkUrl}" class="quotelink">>>${postId}</a>`;}).join(' ');}});
>>105981177
And here is a working user script, just replace all the code with this one:
// Rewrite bare >NNNNNNNNN quote numbers inside recap posts into clickable quotelinks.
document.querySelectorAll('span.quote').forEach(quoteSpan => {
    const post = quoteSpan.parentNode;
    // Look for a previous-thread link in the same post so cross-thread quotes resolve.
    const previousThreadUrl = post.querySelector('a[href*="thread"]');
    let threadId = null;
    if (previousThreadUrl) {
        const threadMatch = previousThreadUrl.href.match(/thread\/(\d{9})/);
        if (threadMatch) threadId = threadMatch[1];
    }
    // Collect every 9-digit post reference in the quote span.
    const quoteIds = quoteSpan.textContent.match(/>(\d{9})/g);
    if (quoteIds) {
        quoteSpan.outerHTML = quoteIds.map(id => {
            const postId = id.slice(1);
            // Link into the previous thread when known, otherwise within the current page.
            const linkUrl = threadId ? `/g/thread/${threadId}#p${postId}` : `#p${postId}`;
            return `<a href="${linkUrl}" class="quotelink">>>${postId}</a>`;
        }).join(' ');
    }
});
>>105974974
Install ollama and at least get a 3b if you have 4gb vram
>>105981516
ollama run deepseek-r1:7b
>>105981539
3b models run at a pretty good speed on 4gb vram
But yeah you probably can run 7b as well
https://github.com/MoonshotAI/Kimi-K2/blob/main/tech_report.pdf
>>105981637
>Open A(gentic) I(ntelligence)
kek
>>105981442
>people don't care about or ever sell their luxury shit
>John Wick (2014) and his 45yo Mustang
That luxury shit keeps being attractive to others
M3 will be outdated in 3 years
regular K quant goofs
https://huggingface.co/lmstudio-community/Qwen3-235B-A22B-Instruct-2507-GGUF/tree/main
Are there any open models that can do image OCR as well as GPT 4.1 mini?
I don't know what black magic OAI did to make such a good model, but it beats even Gemini-pro when it comes to extracting Japanese text from an image.
>>105981637
Instead of skipping the safety tests like the WizardLM team, Moonshot has opted to straight up lie
>b-but K2 said 'I'm not allowed to-'
Prefill a single token. Enable token probabilities and choose the most probable non-"I" token. Trust me
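A sketch of that trick against a llama.cpp-style server, assuming the /completion endpoint and its n_probs field (the prompt string is a placeholder for your chat, ending where the assistant reply begins):
# request one token with its top-10 alternatives; pick the most probable
# token that isn't "I", append it to the prompt as the prefill, then resume
curl -s http://127.0.0.1:8080/completion -d '{
  "prompt": "...chat so far, ending at the start of the assistant turn...",
  "n_predict": 1,
  "n_probs": 10
}'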
>>105981732
Outdated or not doesn't matter, it's an Apple and will not lose value as quickly as you exaggerated.
Yes some people don't give a shit about selling luxury shit, that's what I meant to say. But even if you take it the way you interpreted my post, it still works because in the end John Wick is a fictional character and people care about him in a superficial way. And the people who interact with actually rich people don't care what they buy with it. Maybe if this were the 1800's with upper class social circles that excluded you if you presented low.
Is post-op sex with qwen better?
Anyone know of good creative writing benchmarks other than EQbench? Ideally for prose writing, but an RP bench might be okay too. I've started seeing EQbench "Creative Writing v3" results in model release blogs (such as the Qwen update today), which means it's now being gamed/benchmaxxed and will soon be completely useless. (I guess the slop analysis is maybe still useful, but it only looks at n-grams, so it catches specific phrases like "shiver down her spine" but can't detect sentence structures like "it's not just X, it's Y".)
>>105981991
>good
Lol. But even if you didn't say that, no there aren't really any others. Not anymore at least.
>>105978045
they removed the nsfw outfit?
>>105981991
I think there was a specific benchmark for "x but y" phrases, but it might have been a personal project by someone on LocalLlama.
>>105982039
>>105978045
Huh. That would be another L for cloud, kek.
>>105981477
Thank you. I could have sworn I fixed that back in November. Not sure why it didn't work this time.
I updated the rentry with your version.
>>105982046
>>105981991
It was actually the same guy but he only posted it to reddit and not the eqbench site. Or maybe he just included it in the Slop score idk.
what's the point of releasing a 235b that is just a few gb over 128gb at 4bit
>>105971714 (OP)
Can someone share some good sampler settings for EVA LLaMA 3.33 v0.0 70b? Do I use shit like smooth sampling, XTC or DRY?
>>105982082
Neat, thanks anon. Interesting that the smaller qwen3s are so high up when full-size qwen3 is down at 0.42
>>105982094
To fuck with you specifically.
>>105982094
Use ubergarm's Q3_KL quant for ik_llama when that comes out. If you don't have ik_llama since the github shit the bed you might be forced to wait longer
u
md5: 862b412dde38010f8ef88e49e6ec4c67
>>105982101
yep, the fullsize one is an independent model while the smaller ones are distilled on STEMslop
>an army of phds working on this with nearly unlimited funding
>best anyone can manage is incremental """""""""""upgrades""""""""""""
I'm so glad Qwen3 stepped back from the brink of this thinking bullshit.
>>105971714 (OP)
I want to fuck a cosplayer dressed up as the grok slut so bad
>>105982226
just fuck misa from death note already
>>105982125
stop posting pictures of me
>>105982263
Most people posting here weren't even born by the time everyone already forgot about Death Note.
>>105982263
Born too late to have gotten a good wife, born too early to be fucking AI robots
All that's left for me is to lift and try to pull another cosplayer, but one that's less crazy than the one who dressed up as Loona then told me she had a past as a hooker and was baby crazy
>>105982276
>Death Note started airing in 2006.
Dear god.
>>105982307
>>105982276
can you guys fuck off, I'm here to feel young again
>>105982287
>cosplaying as 'Luna'
>past hooker
>baby crazy
RUN! RUUUUUNNNNNN!
>>105982287
So how much did you owe for child support payments?
>>105981129
looga booba and shallowcervix mixtureofthighs
>>105974561 >>105974459
>>105982323
I did, which she didn't forgive me for, then tried to get with my friend, and when that didn't work I think she ended up hating him too.
The head was good though, and if she had been as kinky as she claimed she might've snared me.
>>105982346
Fortunately no babies were possible using the delivery route we used
>>105982161
>3+ months just to see if your changes have any meaningful effects
>changes that may be beneficial are aborted early, possibly because it was too early to reach a threshold where there would be visible improvements
image
md5: bd79ac5c89650c0be4f004dfa1229d3d
I came across this image and had a laugh.
I'm on neither side of the mikuani war btw.
>>105982418
I'm on neither side of it either. I just like to fan the flames of it, and watch as the people who care angrily go at it.
i wish erp trannies would leave /lmg/ im not here to see this sick shit
>>105982452
you do realize you are on /g/ right?
>>105982418
>mikuani war
There is no mikuani war, there's one resident schizo whose entire self worth is built around being this general's troll, and he's realized that the latest thing he can get a reaction out of people is fomenting some artificial conflict, like when he'd samefag for 12 posts arguing about whether deepseek was local and then post blacked porn when he got called out.
>>105982418
>I came across this image
Did you clean your screen?
>>105982388
Deepseek was proof that local could be good, Kimi proved it wasn't a fluke.
>>105982452
this sir, /lmg/ could be a leading place for productivety and agentes and mcp who create stunning solution by pushing vibe coding further with our years of ai experience
we could create the true e = mc + ai and turn it into e = mc^local_ai
>>105982477
The war is real, and Ani is our new queen. Miku got BTFO so bad that even her defenders are having her cosplay as Ani.
It's over.
>>105982276
Death Note is so old that I watched it like 6 years after it aired and I thought Light did nothing wrong and did everything correct. And then I rewatched it like 10 years after that and realized that both Light and L are fucking psychos. The ending with Ryuk was also great for me the second time. He basically saw a retard slapfight and got everything he wanted out of it. In a way it is a lot like mikufaggots and antimiku faggots.
>>105982448
how did you even find me what the fuck man
>>105982477
You are:
>people who care angrily go at it.
>>105982452
I think there should be an erp general so this thread can be specific to ... technology ...
>>105982452
ERP, especially that involving the most depraved fetishes, is the penultimate delineation of where information becomes knowledge. Sex and sexuality are a part of life and furthermore a part of the human condition. You're mentally ill if it bothers you that much.
/lmg/ = local mikus general
>>105982501
I think the main issue with Miku is that she has no tits and appeals only to actual pedophiles
I'm making my own python frontend to deal with llama-server. I mean I want to simulate an old interactive fiction game, and for this I need to be using the terminal.
ST (or its UI) is actually way too complicated for what it is - it just adds a bunch of strings together and then submits them onward...
>>105982501
I will accept ani when elon opensources her.
>>105982550
>penultimate delineation of where information becomes knowledge
I do hope you're being facetious
file
md5: 2614bd8a2152b4e9ce9744d62c5f1fc4
>>105982567
>ani is for: sex
>miku is for: detecting pedos
u mad?
>>105982561
I think you're on to something.
>>105982388
good luck, bring ai gf soon ok
misa
md5: 5e59cffd8c03844c61d62d9120ddd0b2
>>105982263
Witnessed
>>105982226
I'm sure you could pay someone.
>>105982418
lol I'm borrowing that one.
>>105982590
I think you're being shallow and pedantic
>>105982494
>e = mc + ai
Kek, thanks for reminding me of that.
>>105982599
>I'm sure you could pay someone.
That exists?
I forgot this thread is useless for any sort of constructive discussion. My bad.
file
md5: 15efc08d69b5dea60c8112fcb2f77c91
>>105982590
>penultimate
no it is just DSP posting on his mobile phone while streaming
>>105982600
I'm not the original anon you replied to, idc if people want to erp or talk about it here
>>105982076
>Not sure why it didn't work this time.
I forgot what error I got exactly, but I remember that the regex couldn't match the previous thread id.
>I updated the rentry with your version.
And you also bumped the version, neat, thank you.
>>105982610
The world's oldest profession anon.
>>105982616
The Guilty Gear thread is two blocks down.
>Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Demo and code for this is up:
https://jerryliang24.github.io/DnD/
https://arxiv.org/pdf/2506.16406
https://huggingface.co/spaces/Jerrylz/Drag-and-Drop-LLMs
https://github.com/jerryliang24/Drag-and-Drop-LLMs
>>105982632
Is it worth it? It seems like it'd take away all the sexiness. Unless she can pretend to be dominant and kinky, convincingly.
>>105982418
must say, I never cared for miku but I quite like this art style
>>105982649
There's a reason the expression Post Nut Clarity exists.
For anything further on prostitution, I suggest >>>/a/ or >>>/trv/
Changelog:
>Google redeemed
>MoonshotAI added as Zhong Xina("You can't see me"/unexpected newcomer)
>Added two retards to fill up the row(Drummer has more chance at making AGI than Meta)
>>105982548
I agree, but the problem is the technology version would be too slow to survive /g/...
>>105982594
>ani prompt literally describes her like a 14 year old grimes
i... uh... anon...
>>105982684
/mu/ oldfags, we won
>>105982675
>Drummer has more chance at making AGI than Meta
kek
>>105982683
You're probably right. Someone should do an analysis on the percentage of on-topic posts here lol
>>105982715
I was only off topic because I was waiting for FedEx to deliver my Epyc 7763, an upgrade over my 7352. Now that it's arrived it's time to swap it.
>>105982715
We need our shitposters to keep the thread bumped while we wait for new models.
Confession: I shitpost as both sides for fun
>>105982745
kek valid
>>105982739
nice, enjoy anon
>>105982770
shhh don't tell them
>so many posts about people not giving a shit about Miku
This really was a troon instigated terror wasn't it?
>>105982548
I don't want to jump between two threads after I git pull and nothing works.
>>105982808
ahh yes thank you for directing me to the 3 /aicg/ threads
file
md5: 21a2da7bba0457117992b07d30b6a711
>>105982477
correct there is just 2MW
got real shit to do don't have time for /lmg/ keep missing all the image days
>>105982796
I like miku
Not a troon, not a fan of troons, no idea why that one guy keeps trying to associate miku with that
>>105982836
>no migu reflection
it's so over
>>105982846
Cuda dev likes miku and loves jart. Janny tranny loves miku. OG miku card posted ITT years ago had said that the character is a troon. Finally Hatsune Miku says trans rights.
>>105982846
>OG miku card posted ITT years ago had said that the character is a troon.
sounds fake to me
>>105982839
Humanslop. Any image gen model wouldn't have made this error.
me when I make a thread and say something then reference myself in the future using my own post as proof of the same point I'm making
>>105982808
>/aicg/
They don't care about local stuff, even less so about technology, and that's okay.
But it is not the place I would like to migrate to.
Thinking about it, I would even prefer a board without images. That would prevent so much annoying and off-topic spam.
LOOK AT THIS THING, YOU FUCKS!
>>105982638
>>105982869
Okay, I'll pull that instead. *rezips dick*
>>105982638
>>105982869
damn that looks rad. does it work with llama tho xd
>>105982869
Where can I drag and drop my dataset on mistral nemo? No webui? No easy setup? Still needs 10 GPUs? Useless then.
drag and drop myself on top of ani
>>105982869
>outperforms the strongest training LoRAs by up to 30% on zero-shot common-sense reasoning, math, coding, and multimodal benchmarks, and generalizes robustly across domains, all requiring only unlabeled data prompts.
So uhhhh.... Talking about the actual application which is sex.... And considering for example drummer whose models give 0% improvement to model sex... The improvement with this is 0% * 130%?
>>105982869
@Drummer lfg to the moon!
>>105982869
Where is the exe?
>>105982897
>>105982929
This. Provide proper implementation or your shit will stay irrelevant.
>>105982897
Yeah I concur. Someone should just drag and drop literotica on that and come back with the result.
>>105982869
>The models you may need for DnD: Qwen/Qwen2.5-0.5/1.5/7B-Instruct, Qwen/Qwen2.5-VL-3B-Instruct, sentence-transformers/all-MiniLM-L12-v2, google-t5/t5-base
Haven't read the paper, does it work for larger parameter counts?
>>105982869
someone turned drummer into software
>>105982965
They claim it scales well from 0.5 to 7b. If that's true it probably scales above that.
What happened to ikawrakow and ik_llama.cpp?
>>105983002
Nobody knows. There's only gossip.
>>105982638
https://www.youtube.com/watch?v=XpNdGvbwtf0
PUT ERP IN MY NEMO! I WANT NALA AND MESUGAKI! DRAG AND DROP!
>>105983002
nothing haha I am sure feeling sleepy haha
>>105983002
gerganov mafia paid him a visit
>>105983002
a very happy thing
may it never resurrect
>>105982684
proof? this is important
>>105983081
I have leaked prompts saved but they're on an nvme drive in a server I gutted so you'll have to ask extra nicely if you want them
file
md5: 5409a638bef0ba6ce0dd17c583b173e0
Interesting...
https://x.com/DyLhun/status/1947289034327257126
>>105983002
Github deleted his account (and with it, all his repos and contributions), and since none of his old PRs or whatever were turned into ghosts, he can't have just done it himself.
Other than that, it's total speculation.
>>105982684
Ani is 22, retard
Do Qwenqucks *really* expect people to believe the benchmark scores?
>>105983161
Anyone with half a lick of sense knows everything is benchmaxxed to hell and back, especially qwen models.
Doesn't mean they aren't still decent.
It remains to be seen if the instruct is any better than the hybrid, but I'll compare 'em once I'm done downloading - I'm using the hybrid right now.
128gb bros... 235 q3_k_l quant is here
New Qwen3-235B-A22B-Instruct-2507 knows Teto's birthday unlike the previous one, and kinda knows what correction is in addition to mesugaki. But when asked to take the role of a mesugaki without a definition in context (not shown in pic), it does a generic slutty girl impression without any of the expected traits.
From short tests it's definitely better than the previous 235b when it comes to knowledge.
Don't think it'll replace Deepseek V3 0324 IQ1 for me at this rate but I'll try it a little more.
>>105983107
Breddy cool.
Shame that if this actually reaches market it will be 99% people jerking off horsecocks in VRChat rather than anything interesting or engaging.
>>105982561
Well, Anon-kun, that's why there's Migu, who has a fat ass and big tits. Always been that way, long before Grok started pretending to be a girl.
>>105983219
what is up with that highlighting
>>105982638
Brainlet here. Can this be used to add knowledge to a model?
>>105983256
Highlighted tokens in Mikupad are ones generated by the model. By default, more reddish = more perplexity/lower probability of having been chosen.
>>105982567
Kikes diddle kids and you fap to something a kike would love.
>>105983248
we are so back
>>105983219
>Qwen3-235B-A22B-Instruct-2507
download download download download
>>105983219
wadufak where goofs daniel
>>105983244
Death to the slampig as well
This person said this regarding copyright content
>everything in the video, the script, visuals, editing, and overall concept, was created entirely by me, the only AI-generated element is the voice, which I used as a narration tool.
How does someone distinguish these types of videos from copyrighted ones? Does this content not look like the average AI slop to you? Or is he just BSing???
>>105983356
Miku you have work tomorrow you shouldn't be drinking
>>105983360
>the only AI-generated element is the voice, which I used as a narration tool.
>the mascot is clearly piss-tinted gptslop too
if he's already lying about this then the rest is bullshit as well
>>105983391
>chatGPT invented piss filter
(you)
is this the right place to ask for cuddling anime girls
>>105974459
I like this Luka
damn I trusted zed too much. I'm no better than a broccoli headed zoomer. I didn't use source control guys oh god oh fuck
>>105983467
this, it's clearly hand drawn and resembles every chatgpt image ever as a stylistic choice
>>105983467
>I didn't use source control guys oh god oh fuck
Let this pain be a lesson to you.
>>105983324
https://huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF
UD-{2,3,4}K_XLs up
>>105983552
Get ready for them to be re-uploaded 6 times in the next 3 hours, at least one time being only like 6gb in size.
>>105983546
I will commit seppuku by robot, I'm telling my local model to turn on the toaster
>>105983589
don't worry guys, it doesn't know how to call tools
>>105983572
It's not a new architecture though so I don't think it's that likely, but I guess you never know with Unslot.
>>105983667
Dudes have absolutely no self or version control, their entire process is just 'who can slap shit in the LFS upload folder first?'
It looks like Daniel just uploaded the good old fucked up IQ quants that he always had to delete in the past. He never fixed his script.
looooool
>>105983754
>1GB 235B IQ2_M
return of the classic, lmao. Vramlets are back
OpenAI and Google are cool and all... but when will Meta announce their IMO medal?
>>105983790
wait for llama 4 Behemoth deepthink
>>105983552
i hate daniel so much
>>105983754
just get the bartowski quants
>>10598379
>imo
Meta INVENTED the potato model
>>10598241
it's out of stock, any update on this anon?
>>105983935
Just pad it out to 120k tokens and it'll be fine
>>105983935
You don't NEED more than 2k tokens of context.
>>105983965
Static knives fracture thought.
>>105983935
Qwen really is just chink meta. I don't know why you'd expect anything from them.
>>105983935
It's not a thinking model. Wait him.
I just accidentally spent $12 on opus 4 thinking prompts and it was actually worth it
qwen qwill you learn that your actions have conseqwences
>>105983935
It's kind of crazy how QwQ does on that compared to all of Qwen 3. Wtf happened?
>>105983992
>qwen qwill you learn that your actions have conseqwences
qwestions quake, qwesting quivers, qonsequences quench qwest.
>>105983994
QwQ quickens, quelling Qwen's quaint quests: what queer quirk quelled quality?
so the new 235b is basically more useless than scout
>>105983935
I wonder why they didn't do Llama 3. Would be funny to see it beating L4 as well.
>>105983935
Dimensionality of each model might be good.
As well as colour coding the cells.
Not sure if this is the thread to ask this, we don't really have one for audio. Is there a current SOTA for voice generation? Doing a project and want to make sure I'm current. I'm currently using chatterbox, sparkTTS and Zonos mostly. FS-TTS is decent sometimes too. Is there anything else out there that's considered better?
>>105984079
I too am curious about this. I wanna remake that old home assistant tts thing but with LLMs
>>105984079
Of the stuff I tried a few months ago, Zonos was the "clearest", but pacing/prosody was schizo and I ended up using Kokoro since it was at least neutral.
The styletts guy just released something new if you want to try it https://github.com/yl4579/DMOSpeech2
>>105984064
>>105983219 (Me)
Just tried it with an actual chat.
I sense it's like the previous 235B's nothink prefilled <think>\n\n</think>\n, while writing differently (?) can't tell. I think it's better than the previous 235B's nothink at least for RP, but not gamechanging.
At 10k it lost some of the character's mannerisms where V3 0324 IQ1 in the same chat still retained some of the early context's feeling. Note that old 235B nothink also lost the character's personality and earlier instructions. All 3 made some mistakes at 10k.
Just like the old 235B, to me it feels like it's taking ideas from context without really knowing what they mean.
Only tested with a single chat. I'm tired.
>>105983458
>tfw still using deepseek v2 lite
v3 lite where
>>105983935
What's the 0 and 400 use case?