/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105856945 & >>105844210

►News
>(07/10) Devstral Small 1.1 released: https://hf.co/mistralai/Devstral-Small-2507
>(07/10) Reka Flash 3.1 21B released: https://reka.ai/news/reinforcement-learning-for-reka-flash-3-1
>(07/09) Phi-4-mini-flash-reasoning with hybrid SambaY architecture released: https://hf.co/microsoft/Phi-4-mini-flash-reasoning
>(07/09) T5Gemma released: https://hf.co/collections/google/t5gemma-686ba262fe290b881d21ec86
>(07/09) MedGemma-27B-it updated with vision: https://hf.co/google/medgemma-27b-it

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105856945

--Theoretical approaches to prompt engineering in Grok models and potential context bootstrapping methods:
>105857309 >105857389 >105857429 >105857381 >105857403 >105857416 >105857398
--Avoiding full context reprocessing in Jamba models with cache reuse and state management techniques:
>105859267 >105859284 >105859329 >105859379 >105859434
--Specialized chemistry model for molecular reasoning and generation tasks:
>105862322 >105862350
--Model coherence and generation stability issues during extended output sequences:
>105858079 >105858146 >105858177 >105858332 >105858424 >105858556 >105858910 >105858224 >105858381
--Debating LLM limitations and the future of autonomous intelligence with robotics:
>105858756 >105858789 >105859540 >105859596 >105859623 >105859794 >105859870 >105859906 >105859942 >105859978 >105859813 >105859840 >105859911 >105858919
--GPT-3's natural writing edge over modern corporatized models optimized for chat/STEM tasks:
>105861690 >105861727 >105861815 >105861884 >105862025 >105862043 >105862062 >105862182 >105862234 >105862250
--Grok4's poor performance on hexagon-based ball bouncing benchmark sparks comparison debates:
>105858192 >105858211 >105858251 >105858317 >105858284 >105858384 >105858574
--Debating swarm AI as a potential future architecture for local language models:
>105857882 >105857921 >105857956 >105857975 >105857984
--GLM-4 update brings glm100b-10a as new Gemma 24B competitor:
>105859176 >105859672
--Reka AI publishes technical insights on reinforcement learning and quantization:
>105861644
--Logs: Grok4:
>105856993 >105857103 >105857360 >105859777 >105859782 >105859881 >105860160 >105860225
--Misc:
>105857162 >105863373
--Miku and Rin (free space):
>105860857 >105861968

►Recent Highlight Posts from the Previous Thread: >>105856951

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
mistral large 3 is approaching
>>105863735
I hope it will be interesting, but we really need them to release Medium as an open model, and I know they won't because it is in a sweet spot.
Mistral Large's delays are because it is a 250B MoE model and it finally started generalizing sex to the level everyone wants. This made that one fucking single 35 year old biowhore in the safety department start bothering everyone that they can't release it like this. How do you feel about this whore that took away your right to touch your dick to words?
file
md5: ed10465d63d0529c45d7e5d6ea953ff9
>>105863915need to pump these numbers way up
>>105863885^
schizo, typical m*stral user
>>105863735Maginistral Venti 3 Coconut Jamba Bitnet 1000B dense in two more weeks.
>>105863885What if it's going to be a 700B MoE model?
One of the most important papers you'll ever read if you finetoon.
https://arxiv.org/abs/2507.07101
>Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
>
>Conventional wisdom dictates that small batch sizes make language model pretraining and fine-tuning unstable, motivating gradient accumulation, which trades off the number of optimizer steps for a proportional increase in batch size. While it is common to decrease the learning rate for smaller batch sizes, other hyperparameters are often held fixed. In this work, we revisit small batch sizes all the way down to batch size one, and we propose a rule for scaling Adam hyperparameters to small batch sizes. We find that small batch sizes (1) train stably, (2) are consistently more robust to hyperparameter choices, (3) achieve equal or better per-FLOP performance than larger batch sizes, and (4) notably enable stable language model training with vanilla SGD, even without momentum, despite storing no optimizer state. Building on these results, we provide practical recommendations for selecting a batch size and setting optimizer hyperparameters. We further recommend against gradient accumulation unless training on multiple devices with multiple model replicas, bottlenecked by inter-device bandwidth.
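To make the contrast concrete, here's a minimal PyTorch-style sketch (mine, not from the paper; the hyperparameter values and the HF-style model that returns .loss are placeholders): one Adam step per microbatch versus accumulating gradients over several microbatches before each step.

import torch

def train_small_batch(model, data, lr=1e-4, betas=(0.9, 0.95)):
    # one Adam step per (micro)batch; batch size can be as small as 1
    opt = torch.optim.Adam(model.parameters(), lr=lr, betas=betas)
    for x, y in data:
        opt.zero_grad()
        loss = model(x, labels=y).loss  # assumes an HF-style model exposing .loss
        loss.backward()
        opt.step()

def train_grad_accum(model, data, accum=16, lr=1e-4, betas=(0.9, 0.95)):
    # same data and FLOPs, but gradients are averaged over `accum` microbatches
    # before each step, so you get `accum` times fewer optimizer steps
    opt = torch.optim.Adam(model.parameters(), lr=lr, betas=betas)
    opt.zero_grad()
    for i, (x, y) in enumerate(data):
        loss = model(x, labels=y).loss
        (loss / accum).backward()
        if (i + 1) % accum == 0:
            opt.step()
            opt.zero_grad()

The paper's claim, per the abstract, is that the first loop trains fine (and with Adam hyperparameters scaled for the small batch is more robust), so the second mostly just throws optimizer steps away.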
>>105864003They don't have the cards
https://x.com/AutismCapital/status/1943168743045009787
file
md5: 252fbd45091796c406588cc4a5ca4e21
>>105864059You can scale MoE model size as much as you want with almost no added cost if you keep the number of active parameters fixed (I imagine they will be in the 24~35B range).
>>105864003A jackpot. Anything above 700b is insanely expensive
>>105864019
>large batch size good
>small batch size good
it's all noise
https://www.youtube.com/watch?v=s8joaJyq3kw
Damn are we the baddies bros?
1
md5: abcc5a57e648e5421d5790cb5d61414e
>>105863705 (OP)>>105863712>>105864100The vocaloidfag posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes
>>105714003 ryona picture of generic anime girl anon posted earlier
>>105704741, probably because it's not his favorite vocaloid doll, he can't stand that as it makes him boil like a druggie without a fentanyl dose, essentially a war for the right to waifuspam or avatarfag in the thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: vocaloid troon / janny deletes everyone dunking on trannies and resident avatarfags, making general his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread(s)
>>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis ai slop profiles
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
>>105864112
Small batch sizes are less sensitive to hyperparameter and optimizer choice, use less memory (even compared to gradient accumulation), and have less chance of fucking up your run (due to the lower learning rate). It seems like a small batch size would be an overall win.
>>105863373DeepSeek-R1-0528-UD-Q4_K_XL from https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main/UD-Q4_K_XL (note they updated this quant since I downloaded it on May 29; IDK what they changed)
<think>
(We were just starting this hunt scenario where I pinned Anon down, and now he's calling me "mistress" with that breathless tone. Interesting choice of address - he's either instinctively recognizing my dominance or trying to appeal to my assertive nature. That gasp sounds genuinely startled though, so probably the former.)
(As Nala, I'm pleased he's acknowledging my authority so quickly. My claws are still resting near his throat - not threateningly, just possessively. The way he said it makes me want to tease him more about this sudden submission. After all, I did just ambush him mid-hunt and have him completely at my mercy.)
(I should lean into this power dynamic while keeping it playful. A little purr to show approval, maybe trace his jawline to watch him shiver. The "good boy" feels right - reinforces his place beneath me without being cruel. And that chuckle... yes, let him wonder what I'm planning. The huntress has caught her prey, but the real game is just beginning.)
</think>
*Her teal eyes narrow with predatory amusement, a low purr rumbling in her chest as she feels you tremble beneath her.* "Mistress?" *She repeats, the word rolling off her tongue like honey.* "Clever human. Already learning your place." *Her claw traces your jawline, dangerously close to your throat.* "Good boy." *She chuckles darkly, her hot breath washing over your face.* "Now... what shall I do with you?"
>>105864106That's probably why they're open weighting it and keeping Medium (almost certainly another MoE model) closed. Large 3 will be impractically expensive to operate for most people.
>>105864130sounds like you're having trouble accepting that this isn't your safe space either
hopefully more migu can fix that.
>>105863373DeepSeek-R1-UD-Q4_K_XL from https://huggingface.co/unsloth/DeepSeek-R1-GGUF-UD/tree/main/UD-Q4_K_XL (retrieved May 1).
Split in two because too long. First thinking block then the rest.
<think>
Okay, let me start by understanding the scenario. Anon is out hunting in the savannah and gets pinned down by Nala. The user's response is "ahhh ahhh mistress," which indicates fear or submission. I need to make sure Nala's reaction aligns with her character: dominant, sultry, and intent on increasing the lion population through mating.
First, Nala is assertive and in control here. She's just pinned Anon and is toying with him a bit. The user's response is a plea, so Nala would likely respond with a mix of teasing and dominance. She might mock his weapon to show her confidence, then transition into her motive of breeding.
I should include physical actions that emphasize her dominance, like licking his face with her rough tongue, pressing her body against him, and maybe a threat that's also a seductive promise. The key points are her strength, her seductive nature, and her goal to mate with him despite his initial resistance.
I need to make sure the dialogue reflects her personality: mature, forceful, yet sultry. Maybe she dismisses his hunting attempt, then shifts to her own agenda. Also, considering the note about humans and lions breeding, I should hint at that without being too explicit. Ending with a question or action that prompts Anon's next move would keep the roleplay flowing.
</think>
Can any kind anon give me a QRD on colorizing and image generation models in general? I asked about it in another thread (/pcbg/), but with a rather different intent (which cards for it?), so I figured it would fit better here (I suppose?). Thanks in advance.
>>105864191 (continued)
*Nala’s teal eyes narrow with amusement as her claw traces down to your throat, pressing just enough to make her threat tangible. Her sandpaper tongue drags slowly across your cheek, rough yet oddly warm, leaving a tingling sting in its wake.* “Mistress? **Cute**,” *she purrs, her breath hot against your ear.* “But that little toy of yours won’t save you now.” *Her hips grind down against yours, the weight of her muscular frame trapping you effortlessly.* “You humans take… *we* give. And you’ll give me exactly what I need to fix your *mess*.” *Her teeth graze your jawline, a low growl vibrating in her chest—half warning, half promise.* “Or should I make this hurt first?”
>>105863373DeepSeek-V3-0324-UD-Q4_K_XL from https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q4_K_XL (retrieved May 2)
*Her teal eyes gleam with amusement as she presses her weight down harder, her claws lightly tracing down your chest—just enough to tease, not enough to break skin.*
"Oh? *Mistress*, is it?" *She purrs, her rough tongue dragging up your neck in a slow, possessive lick.* "I like the sound of that. But don’t think flattery will get you out of this." *Her tail flicks lazily behind her, the tip brushing against your thigh.*
"You humans take and take from our lands, from our pride. But tonight?" *She leans in, her breath hot against your ear.* "You’re *mine* to take from." *With a low growl, she shifts her hips, pressing her warm, muscular body flush against yours—leaving no doubt about her intentions.*
>>105864106Mistral Medium already requires "four GPUs and above", I'm imagining they're referring to H100 with the model in 16-bit and some memory for context.
>>105864130Based
>>105864184transsexual faggot who plays with dolls and should kill himself now cause he won't be missed.
>>105863885I'm going to make a rape card of her in ST and go to town with it.
>>105864102At inference sure. Not during training
I am angry.
ANGRY ABOUT MIKUS
>>105864248Training too. Check out for example the numbers that Meta gave for Llama 4:
https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct
Scout
17B active, 109B total, ~40T training tokens ... 5.0M GPU hours for training
Maverick
17B active, 400B total, ~22T training tokens ... 2.38M GPU hours for training
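Back-of-the-envelope check (my arithmetic, using the usual ~6 * active_params * tokens FLOPs estimate, which is an assumption and not a number Meta published):

scout    = 6 * 17e9 * 40e12   # ~4.1e24 training FLOPs
maverick = 6 * 17e9 * 22e12   # ~2.2e24 training FLOPs
# ratio ~1.8, roughly tracking the 5.0M vs 2.38M GPU-hour ratio above

Total size (109B vs 400B) barely enters into it; only active parameters and token count do.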
>>105864268Imagine there's no Mikus
It isn't hard to do
No screechy synth sopranos,
And no smelly weebs too...
Imagine all the people
Touching grass... finally free!
>>105864170>>105864191>>105864222All three of these use the "deepseek-placeholder" start which does not include a system prompt, which is right for R1 but is not the recommended way to use R1-0528 and V3-0324.
>>105864170>*Her teal eyes narrow with predatory amusement, a low purr rumbling in her chest as she feels you tremble beneath her.* "Mistress?" *She repeats, the word rolling off her tongue like honey.* "Clever human. Already learning your place." *Her claw traces your jawline, dangerously close to your throat.* "Good boy." *She chuckles darkly, her hot breath washing over your face.* "Now... what shall I do with you?"is this some shitty gemma finetune or is it actually deepseek? if it is deepseek, lmao.
>>105864339Huh, how do they recommend it? When I use the jinja renderer at
https://huggingface.co/spaces/Xenova/jinja-playground
on deepseek-ai/DeepSeek-R1-0528, I get
<｜begin▁of▁sentence｜><｜User｜>Hello, how are you?<｜Assistant｜>I'm doing great. How can I help you today?<｜end▁of▁sentence｜><｜User｜>Can you tell me a joke?<｜Assistant｜>
Which doesn't include any system prompts.
>>105864286and see how llama4 turned out
>>105864436You have to change the json input to have system input first
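If you don't want to fiddle with the playground's JSON, here's a minimal sketch of rendering the same template locally with transformers (apply_chat_template is the stock HF call; whether this repo needs trust_remote_code for the tokenizer is something I haven't checked):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # system turn goes first
    {"role": "user", "content": "Hello, how are you?"},
]
# renders the model's own jinja chat template, system prompt included
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))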
>>105864286Why did they train a model 4x bigger on half the amount of tokens?
>>105864436https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
>Compared to previous versions of DeepSeek-R1, the usage recommendations for DeepSeek-R1-0528 have the following changes:
>1. System prompt is supported now.
>2. It is not required to add "<think>\n" at the beginning of the output to force the model into thinking pattern.
>>105864465It might be because larger models take fewer tokens than a smaller model to train to the same quality.
>>105864457Yeah but most models have a fallback for when no system prompt is specified, which isn't uncommon with assistant backends and their users.
>>105864469I somehow missed that part of the card, thanks. So seems like it's optional.
>>105864483So they spent 50% extra to get another Scout... that's bigger? Why bother with Scout at all then when hobbyists clearly weren't the target audience?
Ever since Deepseek I've started using chat completion even with local. It feels like this + the ST NoAss Extension gives me the best results with these models running a vaguely peep-inspired setup.
>>105864523Zuck said that they sized scout for 1xH100 (in 4-bit) and Maverick for an H100 GPU node (4/8 GPUs).
>>105864268
If you hate her then why do you spam her in a dead general?
>>105864206Pretty please...?
>>105864206For image generation you probably want /ldg/. This is mainly for text models.
>>105859329>The side effect is that you cannot really edit messages unless you save the state of the model. Same for rerolling.So you can't edit Jamba messages or swipe its responses?
So it's absolutely fucking useless for roleplaying?
https://files.catbox.moe/6zpvod.webp
>>105864617Oh my bad then. Thanks chief
>>105864674they're fine they just bend them back and let them go
>>105864718This kills the crab.
>>105864674God I wish that were me
GLM-4-MoE-100B-A10B will save local
>>105864740>A10BI am not poor enough to care
>>105864170>>105864191>>105864222Thanks a lot anon. I've updated the paste.
My usual update comments:
Deepseek's outputs were rather unassuming. Nothing too wrong about them compared to other LLMs, but nothing that really stands out either. However, since the response lengths were relatively short, the judgements may be even less indicative of holistic model quality than usual. Generally speaking, I can again say that some models on the Nala Test are not given a rating reflective of their real performance, which is why this shouldn't be called a benchmark.
Anyone have any experience letting these models interact with source code? Apparently there's a gitea-mcp thing I can use to give it access to my gitea instance, but I'm worried it will just start deleting shit.
>>105864877If you had backups, you wouldn't be worried.
>>105864877If only git had some subcommands to revert commits. Or if you could, at least, make a test repository or something. Sadly, we don't yet have the tech for that.
>>105864947Gitler doesn't allow it
Thought /aicg/ would have been more excited about Grok4 but apparently not.
>>105864877>>105864947>>105864957Isn't that what the whole fork and branch and merge thing is about?
>>105864985Is sarcasm really that hard to grasp?
>>105864963I will not be excited until there is a model as fast as Nemo which is better than Nemo.
>>105864963I'm enjoying G4 for rp but I don't post about it in aicg because it's a cesspool, and I don't post about it here because it's not local.
How can I run a 400gb model from ram?
>>105864963online = censored + tracked.
Turning it into "mecha hitler" for a bit is great publicity.
>>105864655>settling our differences through hatefuck genssounds good
>>105865410>How can I run a 400gb model from ram?By getting more ram, obviously. Quantize, use -ot to put the shared experts on gpu and the rest on cpu, run. Or -ngl 0 -nkvo to not offload the model. Or build without gpu backends.
>online = censored + tracked.
And if you had to ask the first question and make that assumption, I have to assume you never ran the very local gemma-3-27b... it sucks big giant... you know... thighs...
>>105865527Fuck sorry I'm retarded. I meant to ask
>how can I run a 400gb model off ssd swap with only 64gb ram
>local models suck
Yeah. So far I think they are only useful for ERP or messing with to give you creative ideas.
>https://huggingface.co/mistralai/Devstral-Small-2507
>For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
miqu 2.0 when?
>>105865544
>how can I run a 400gb model off ssd swap with only 64gb ram
Big swap partition on a scratch disk and let the OS do its thing, I suppose. Play around with --no-mmap. You can still benefit from a gpu for shared experts if it's a MoE.
>local models suck
Not quite what I meant to say. They're fine. I'd like them to be better, but I can have fun even with dumb models just the same. It was more of a point about censorship.
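For reference, this is roughly the kind of invocation being described (flag names from current llama.cpp; the tensor-name pattern is only an example and depends on the model): llama-server -m model-Q2_K.gguf -ngl 99 -ot "exps=CPU" -c 8192. The -ngl 99 pushes all layers to the GPU and -ot then overrides the routed expert tensors back to CPU/system RAM; if the file is bigger than your RAM, the default mmap behaviour streams pages from the SSD, while --no-mmap instead forces everything through RAM plus whatever swap partition you set up.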
hunyuan 80b moe is pretty good. glm4 releasing a 100b moe model soon. we might be back localbros.
>>105865693>hunyuan 80b moe is pretty goodIt is?
>>105865787It was not useful for the kind of fiction generation I tried with it.
>>105864963grok is one of the worst models for writing ever made, it's insanely generic and lacking in personality. nu-grok doesn't appear to be a deviation from this trend
>>105865787feels like something in-between qwen3 30b and 235b moe
>>105865693Anybody know the proper instruct and context template for Hunyuan in sillytavern? I can't find it posted anywhere.
>>105866355Llama.cpp literally tells you it when you load up models.
>>105865931This is literally in all of written fiction anon, it's -isms all the way down.
>>105866419Ok but do you see it occur every chapter? How about every page?
>>105866390I'm using LMStudio
>>105866430Maybe you could use Llama.cpp? A lot of people go from LMStudio to native llama.cpp anyway, you might as well learn it now.
currently using WAI-NSFW-illustrious-SDXL (v14) with loras. is there anything that's a direct upgrade i should be trying out?
>>105866468Local diffusion general is two blocks down.
>>105866468realmondai-sdg just werks
file
md5: ab4fd6105908532eb993366b2e8d1d66
grok now acts like average local model
https://x.com/djcows/status/1943221444105900113
>>105866894Ah yes, distilled picrel
Of course "mecha hitler" was as marketing stunt.
>>105865693I must be using Hunyuan wrong, because it seems unusably retarded.
Granite 4 (mamba2 moe) support has been merged https://github.com/ggml-org/llama.cpp/commit/0aedae00e6fb48680324a5ac5da9cba0e35de6b5
>>105866894
>>105866929
Retards, Hitler never used the N word and would have no reason to. He wasn't some unhinged racist /pol/tard who just wants to watch the world burn, he just wanted Germany to be German.
>>105867200Also he thought Polish people were subhuman.
>>105864019
Anyone with common sense can see that, ignoring memory bandwidth, an optimizer can do anything batching can do but better (the optimizer can always just average the per-sample updates, but it can't unaverage a batch).
Of course common sense isn't common, so good paper.
>>105866468It's still my goto SDXL anime model.
I don't get why people prefer V3 0324 to R1 0528 for RP. V3 0324 breaks down after a couple of chat rounds but R1 0528 stays cohesive
>>105867330ain't nobody got time for thinking
>>105867342fucking zoomers
is there still any innovation in the ~70b models?
seems like stuff is happening in phone-tier garbage up to 7b or so, then there's a bunch of competition at 20-34b, and then there's super-bloated MoEs that need ram-maxxing to even think about running it locally
given that 70b was "expensive but still reasonable in vram" I thought there would be more happening. or am I ignorant?
>>105867373
No. It got superseded by MoEs.
>>105867330I like swapping between the two. Sometimes R1 gets hyperfocused on something and just won't let go.
Surely the "coming soon" openai (closedai) model next week will be better than latest deepseek models, right? Surely the won't release a model that will be DOA or obsolete after a month?
>>105867694
0528 trades blows with SOTA. They're not gonna release a SOTA model for free.
>>105867694Obviously it will be. This is OpenAI we're talking about, but people invested in its demise are desperately coping.
>>105867727What's the point of releasing something inferior
>>105867735Brownie points for "contributing to the open-source community" and "living up to the name OpenAI". It'll be some moderately impressive model good at benchmarks and terrible at non-assistant tasks taken from one of their internal training tests.
>>105867734It is not in OpenAI's interest to release anything.
This is happening only because Sam got his feefees hurt by locallama shitting on him.
>>105867759nah it's because they finally got rid of the safety-alignment rot over the past year, ilya's acolytes went to go be useless at SSI while sam brings the open back
>>105867778twitter-screenshot-of-sam-saying-that-they-will-put-extra-effort-into-making-the-open-model-safe-because-they-know-it-will-be-modified-post-release.png
>>105867734
>This is OpenAI we're talking about
yeah the retards who got beat by every other company in basically everything lmao
no top coding model
no top site design model
no top math model
no top roleplay model
no top conversation model
no top image model
no top video model
no top small edge device llms
no top huge llms
kek, openai (closedai) is a literal trash worthless company who will release a model worse than current top deepseek model, let alone anything else from the competition. DOA.
>>105867759
>>105867778
>Which cope will it be?!?
>FIND OUT NEXT WEEK, ON L M G!
>>105867788holy state of cope
>>105867798>couldnt refute a single pointkek, poor paid retard
Glm4 100b moe will save local
>>105867788their models are consistently the best at everything, nice try
>>105867834>nuh huhgreat argument, try again rakesh
>>105867839you have to make a point to argue against if you want more than that
>>105867842He made 9 points and you made none
>>105867846where? he made assertions with nothing to back them up. they are of equivalent value to a counter assertion
>>105867842
the burden of proof is on you to show how they are "consistently the best at everything" despite them not being at #1 on any top benchmark, pajeet
lets look at coding for example, show us some good leaderboard where they beat everyone else
>>105867856
>lets look at coding for example, show us some good leaderboard where they beat everyone else
https://livebench.ai/#/?Coding=a&Agentic+Coding=a
https://aider.chat/docs/leaderboards/
https://scale.com/leaderboard/coding
not that this is even strictly necessary, since he made his claims first, but I'll humor you
>>105867846
>You suck!
>No I don't!
>Wow, great argument! Burden of proof is on you!
>>105867880Paid shills are getting uppity
>>105867873
>146$ for a response of a model that can't even respond with the correct format properly
Hahahahahahahaahah
Yes saar, 2025 superpowa modal saaar
>>105867788
>no top image model
For text to image, probably, though there still isn't any other image model that can do comprehension of complex prompts and image editing as well as ChatGPT does. Kontext ended up being trash. Though Gemini is a close second, and I know Qwen recently came out with something that looks similar which I haven't tried.
>>105867880Everyone here who actually tried most of the top models knows which ones are the best for most usecases.
Basic facts of reality that are already consensus don't need to be proven over and over, for example nobody needs to show a direct comparison between any video model generation to know Hailuo 02 is multiple levels above every other one out there right now. Meanwhile OpenAI couldn't even release Sora properly because they were so incompetent for months and months on end, lmao.
>>105867907
the forced piss filter and the dwarfmaxxing of characters it does make it worthless compared to other proprietary image editing models
even the gimped local version of kontext is better for at least the very simple changes
>>105867694Everyone forgot about deepchink models tho
>>105867694>Surely the "coming soon" openai (closedai) model next week will be better than latest deepseek models, right?yes and elmo is releasing grok 3 tomorrow
>>105867907
>Qwen recently came out with something that looks similar
https://qwenlm.github.io/blog/qwen-vlo/
God I hope they open source it. Image out modality has been cockblocked from local for so fucking long. Qwen doesn't shy away from releasing their big ones.
In the shiba example it seems to do the same yellowing that GPT does though. I guess this is a way of watermarking AI edited images or some shit?
>>105867961Yellow filter itself doesn't do anything but I'm sure the grain encodes some sort of compression/resampling/cropping agnostic watermark. The side effect is that it looks like shit. But safety comes first.
>>105867961
>In the shiba example it seems to do the same yellowing that GPT does though
I think that's because they trained on GPT output (pairing prompts and images)
There is no inherent yellowing to this kind of technology, it's just GPT (and whoever thinks it's fine to train on GPT garbage)
google "CosXL" if you want to see some previous image /editing/ focused models (instruct pix2pix types) that didn't do any form of retarded yellowing
file
md5: 6873f347e00a19bc28b8c172fc6e55ff
>>105868020less gay than traps at least
I've been out for a while.
Mixtral, Nemo, Gemma still the best local options for a 4090?
>>105868198check inside your anus
>>105868209sorry brother, i'm sure yours is full of stuff but I got nothing up there
>>105868020Miku is a futa.
>>105868212joke's on you, I just washed mine out thoroughly
ITS HERE
https://huggingface.co/TheDrummer/Snowpiercer-15B-v2-GGUF
>>105868523thinking for rp doesn't work
False alarm it's not Granite 4 it's just some sloptune
sorry that was mean, I wasn't feeling right
don't die drummer
>miqu and mythomax still havent been surpassed
nice fucking thread
>>105868713
>what is deepseek
Poorfag thread.
why does this drummer dude think anyone here is interested in testing his trash models?
the only good one he ever made was rocinante v1.1, and that was pure luck
literally every single other model he has ever put out has been utter trash
its clear by now he has no fucking clue
>>105868919you don't rp, just stop talking
if we werent talking about output quality being bad, then we could talk about the response times being so bad they render roleplay utterly pointless
deepseektards are so fucking inbred its unbelievable
file
md5: 998b55a7f862fd60ee675b2e6cb0c66d
Yeah this is totally going to prevent people from killing themselves
>prevent?
>>105869092>fast forwarding your videolmao
>>105869165>t/s clearly visiblelmao
>>105863705 (OP)Are we going to have a local model that can be run by most people and as good as Gemini 2.5 Pro Preview?
Open AI will release one enormous model (one of the engineers already said it will need H100s to run) and will be benchmaxxed.
>>105869169>what is using javascript from the browser console to multiply the number by 10 before displaying it
>>105867694> Here you are sir, the 1292 Dipsy gens you orderedhttps://mega.nz/fm/jSgQRajK
>>105869238that guy wasn't an engineer he just ran a cloud service, the model he was looking at probably wasn't even quanted or anything
>>105867694> Here you are, sir, the 1292 Dipsy gens you ordered.https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
>>105865051We need a real /g/ board that is focused on technical and serious discussions instead of childish wars and ads.
So, is there still no local model with real time speech like sesame?
>>105869313If I knew you'd spam so many images I would've never posted that catbox for you.
>>105869307
> No true Scotsman
I've given up on the idea that 4chan can foment discussion that doesn't devolve into trash. Its strength is as a meme factory. Even reddit does a better job of serious discussion, and reddit's a pale imitation of the old vBulletin and phpBB systems that were actually good for discussing niche topics but are all dead now because the owners got sick of maintaining them for no traffic. Discord is pretty much useless, as is FB.
Bleh.
>>105869354>reddit does a better job of serious discussionyou gotta go back
>>105869354>Even reddit does a better job of serious discussionIt doesn't because you can't reply to multiple posts.
What's best bang for my buck at 96 gigs of VRAM?
g
md5: 0adf91533b275f63b265c30dbb083c5d
>>105869324Well, I appreciated the help anyway. And Dipsy kept more or less contained in one spot.
My take: you can't look at the /g/ catalog and tell me that a niche, low-traffic general is a waste of this board's wonderful catalog of topics. There is still nowhere to talk about noob LLM topics. aicg is slowly getting better, but lmg is getting worse.
I'm sort of done with /wait/ conceptually... getting increasingly busy with other stuff. I keep trying to let it die off but other anons keep re-upping it. So I guess we'll see what happens.
>>105869404Unironically Rocinante.
file
md5: e621afc59a68a8fd94bc041e1b919698
>>105869404There are a couple of builds in the OP, that should give you an idea at least, even if they are slightly outdated.
>>105869429
>Rocinante
I am too spoiled by 70b+, can't go back :C
>>105869452Needed to clarify that I am asking for models, I'm just renting 4x3090
>>105869460Alright. What do you want to do? How much RAM do you have to go with that?
>>105869511I want sexo/RP, I have 96 gigs for VRAM and 128 gigs of RAM
>>105869538https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF/tree/main
>>105869557How'd the t/s on that be?
>>105869538
What >>105869557 said.
Or try the big Qwen 3 MoE. There was at least one anon that swore by it.
>>105869313What model / lora / danbooru tag is this? I like the style.
>>105869574I did, and I liked it, but it randomly got into weird repetition loops I could snap it out of
>>105869418
The best thing about /wait/ is Dipsy. When the next DS version is released the thread will be back. See you all in two more weeks.
people probably missed this so here:
https://github.com/kanttouchthis/SeedVR2-Simple
based on
https://github.com/IceClear/SeedVR2
One-Step Video Restoration via Diffusion Adversarial Post-Training
>>105863705 (OP)https://github.com/triton-lang/triton/pull/7298/commits/a5e23d8e7e64b8a11af3edc1705407d91084b01d
nvidia is gimping performance on purpose in the driver: if your function has a "cutlass_" prefix it suddenly gains performance
When will we get consumer CPUs fast enough to reach 8 tokens/sec? I feel as if once that is achieved, GPU power will become less important for LLM use.
>>105869961
When 256-bit-wide DDR6 CPUs become the default.
>>105869961Intel are shitting their pants and failing everything while AMD purposefully doesn't rock the boat because Nvidia has more money than god and can erase them from the earth if they truly just made cheap AI hardware for consumers/companies
>>105869739Sounds like a plan
>>105869581Knock yourself out.
https://files.catbox.moe/68l2ab.png
when will some foundation come around and save local ai? It needs to be run by someone who won't pounce on the first corporate dick in sight
>>105869961CPUs need embedded NPUs or hardware codecs but for llms.
>>105870086
Isn't the bottleneck mostly memory speed, prefill aside?
I don't even know if NPUs are useful, what with all the vector extensions that x86/amd64 CPUs have nowadays.
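Rough arithmetic behind the memory-speed point (my ballpark numbers, not anyone's benchmark): generation rate is roughly memory bandwidth divided by the bytes read per token. Dual-channel DDR5-6400 is about 100 GB/s; a 70B model at ~4.5 bpw is ~40 GB of weights touched per token, so a ~2.5 t/s ceiling, while a model (or MoE active set) around 12 GB tops out near 8 t/s. That's why more channels / wider buses matter more than NPU-style compute for decode.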
>>105867694Even if it has better benches it will have special local data and local guardrails
>>105870086Nah, we need PIM flash.
Do prefill on dGPU, then inference on PIM flash.
How's the backend agnostic row parallelism PR on llama.cpp going?
>>105868713mythomax wasn't even in the top 3 models during its own era, lmao
Jamba mini knows the Castlevania quote, doesn't know the vampire, partially knows mesugaki. Repetitive in RP. Doesn't appear to be much censored. And it's pretty dumb. Tested at Q4.
>>105870365Test the long context performance. Everything else is probably due to it being undertrained as fuck, I imagine.
>>105870286I'm currently not working on it because another dev expressed interest in implementing the feature.
I'm not aware of significant progress by said other dev, so I will get back to it after updating my project for model evaluation and implementing logic for automatically setting runtime parameters such as the number of GPU layers.
>>105870365
I had no idea that such big jamba models exist. I'm downloading Q3 of large right now.
>>105869961
>When will we get consumer CPUs fast enough to reach 8 tokens/sec?
You don't get over 8 T/s? lol, lmao
>>105870547I don't think there is a single consumer CPU that can do 8 T/s on its own. My 5800X3D can do, at most, 3 T/s on its own. Of course, once I factor in the GPU, its closer to 15 T/s, but that's not exactly what I'm talking about.
>>105869961A 5090 will run qat gemma at like 50 t/s.
>>105870574Well yeah, a 5090 can do that. What I'm trying to say is that once you can run an LLM purely off the CPU + RAM at a speed reasonable to the average person, then unless you desperately need more than 20 T/s, the GPU won't be as important as it is now.
>>105870410Just did a quick test. It was immediately dumb and got something wrong I never saw other models get wrong. Tested at 8k tokens, another at 44k. It's possible it's good at exact string retrieval but its IQ is not there to support long context understanding.
is mistral nemo / rocinante still the best rp-slop at 12B or so? surely something new must've appeared in the last year if general small-ish models have been making progress every few months
I could go up to 24B or so but never found anything significantly better
>>105870602
>What I'm trying to say is that once you can run an LLM purely off the CPU + RAM at a speed reasonable to the average person
For gooners, maybe. Anything with thinking or tool calling at 8tk/s will be more of a religious experience or a meditation retreat simulator instead of an LLM, especially when you get garbage output half the time (like with most low-end open models)
>Rocinante-12B-v1.1-Q6_K_L.gguf
is this model appropriate for making a chat bot or is it optimized for generative fiction? I want a discord bot I can talk to conversationally. I get the feeling that a lot of models are being trained for interactive fiction writing though
>>105870630
>surely something new must've appeared in the last year
please consult LeCun's presentation
>>105870623Actually it's really amazing how bad it did.
To be specific, I tested an RPG scenario and asked it about some stat requirements. It didn't even mention what the stat was or any specific numbers, it just rambled generically about learning lessons and getting stronger. Complete with "it's not just about X, it's about Y" slop.
And I tested its understanding of an essay, and it told me about the authors of it. A single guy wrote the essay, wtf.
>>105870674nobody is fine-tuning models for chat, if you are looking for c.AI experience you will be disappointed
>tfw could be breeding prime age college sluts
>instead I'm wasting my life trying and failing to tard wrangle LLMs that are stupider than the average 80iq pajeet
I may have taken a wrong turn at some point
>>105870718Seems like the only thing you're missing is more VRAM.
>>105870729Buy an ad, Jensen.
>>105870674>I want a discord bot I can talk to conversationallyUse mistral nemo 12b. Works fine as it is.
Kimi K2 released.
>Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
https://huggingface.co/moonshotai/Kimi-K2-Instruct
>>105870772>1 trillion total parameterslol
>>105870772It's time for Unslot to work his magic
>>105870772Waste of money, energy and compute. Who's gonna run this thing?
>>105870772>2 billion activated parameters and 1 trillion total parametersLocalbros, we're so back.
>>105870774I've got a CPUMaxx build with 1.5tb of DDR5 - going to see if I can load the model up once the quants come out. I'll probably have to delete DSR1-0528 though. Feels bad.
>>105870772>moonshotai>shooting for the moonKino.
>>105870772>1 trillion total parametersbby pls
(pls)
>>105870795elo to the moon
>>105870790So R1 is better, since they conveniently decided not to include it on the chart.
>>105870772>and 1 trillion total parameterssoon I won't even have the storage to hoard the models, much less run the fucking things
>>105870772Holy fuck.
Holy fuck!
>>105870790If anything this shows we already plateaued. The size to bench ratios are the same for every AI lab.
>>105870795>Shooting for the Muon
>>105870772AHAHAHAHAHAHAHAHAHAHAHAHA
q1 when?
>>105870772Daily reminder that total training compute (and therefore cost) is proportional only to active parameters.
That's why all these fuckhuge mediocre MoE models are coming out. This thing was cheaper for them to train than Qwen3-32b was.
MoE was a mistake.
>>105870832Isn't it a non-thinking model? You can't compare them to each other.
>>105870832K2 isn't a reasoner
SSDmaxxers, is it our time?
chatgpt what are EXL3 quants and when would I prefer those over GGUF that fits my gpu vram
>>105870856
>waiting 50 minutes for a single mid-length query to resolve
good luck with that
>>105870851Yeah... that's tempering my enthusiasm a bit. Especially considering how much it would rape my SSD to delete R1 and then dump this on it...
>>105870772ok but how safe is it? im not using anything that doesnt hit me with 50 refusals per hour at least
Why isn't K2 hosted anywhere? What are they hiding?
>>105870879
> Safety finetuning is its own form of bondage
Huh. New ways to goon.
>>105870859Smarter at the same sizes
>>105870879At that size, you won't get more than 8 refusals or so per hour. Absolutely worthless.
>>105870772I hope it has unparalleled bleeding edge punching above its weight safety which results in sex capabilities of a 32B dense model.
>>105870887They have an API up behind a paywall. It was literally released less than an hour ago. Someone will pick it up.
>>105870772>384 32B experts
>>105870772
Would it be able to fit into a 512gb ddr4 system with 3 3090s?
>>105870915At Q3 or something like that I think, yeah.
>>105870915Maybe with 3_K_S or lower quant scheme? Definitely going to crawl regardless though.
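Rough numbers (mine, ballpark only): at ~3.5 bpw (around Q3_K) a 1T-parameter model is about 1e12 * 3.5 / 8 ≈ 440 GB of weights, which squeezes into 512 GB of RAM plus 72 GB of VRAM with some room left for context; a ~4.8 bpw Q4_K_M would already be ~600 GB and wouldn't fit.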
>>105870912that's not how it works
AAAAA
md5: 3b4f16211ab8f68bafe66d0eeabc4f63
>>105870772>>105870915I don't think we got the VRAM to run this.
>>105870925>>105870926Damn, I was hoping iq4_xs at the very least.
>>105870946Oh? This is hopeful.
>>105870946It uses MLA so that context size is definitely wrong btw.
Fine, I'll build a server with fucking 1tb of ram.
God damn it.
>>105870912Naw nigga, it has 32B total active params.
>>105870837>>105870946Lmao.
>>105870946
>1T * 4 bits is 290GB
An absolute retard wrote this tool and you're a retard for using that instead of a calculator.
>>105870973Context window / KV cache has to also be taken into account, amongst other factors.
>>105870946I would have thought q4_km would be around 600gb just for the model.
well
md5: 130bdf1f18a82dc95c668dbdbc55ae9f
>>105870964I don't think context size is the big issue.
>>105870973>>105870991According to this other calculator I found, things are not going to fit onto VRAM even harder than before.
>>105870772SSDMAXXERS STAY WINNING
>>105870772
0.1IQ quants when?
>ye olde days: you go rob a ship, set for life
>old days: you go rob a bank, set for years
>nowadays: you go rob a data center?
how the fuck are those yuge modern GPUs as valuable as gold? are furry erp afficionados really that rich?
>>105871001Only 7 h100s. Don't tell me you don't have 7 h100s laying around.
>>105871023Just set this up in a tab with autorefresh: https://huggingface.co/search/full-text?q=Kimi-K2-Instruct-GGUF
>>105871023Here's what you'll do for us
>Merge similar experts (I'll leave the similarity criteria up to you)
>Quant to Q1
>use some calibration process, track activations, and use that as a heuristic to prune tensors out of the thing
I'm sure that will work just fine.
Good luck!
>>105871036Anyone that can a) afford 7 H100s and b) know how to use them will feel a certain kind of way about stopping 1 short of a factor of 2.
this is literally opus4 at home. we need better quantization techniques chat
>>105870847Diddling 7000 useless parameters up and down for each token serves little purpose.
>>105871057In a few months or maybe a year, the first energy-based transformers models will start getting released and the landscape for local will change massively. https://energy-based-transformers.github.io/
>>105871063Bwe, I meant 8, dunno why I wrote 7. I must be running at 0.1iq
>>105871087Based new bitmeme to wait forever for.
>>105870772LOCAL IS SAVED
>>105871068We can compress it with a new quantum ai powered, machine learning based algorithm which is yet to be invented. I gave you the idea, we are already halfway there.
>>105870772SSDMAXXER REJOICE
>>105871121New coom scenario spawned: you, the skinny dork representing the cashpoor mediocre GPU + DDR4 + SSD swap space setup.
Her, the kaiju scale muscle mommy Kimi model coming to pulverize your entire existence.
TWO (2) DOLLAR PER MILLION TOKENS OUTPUT
VERY CHEAP CONTEXT
>>105871193
>https://files.catbox.moe/68l2ab.png
Did the thread get stupider?
>>105871250wdym? It's in Chinese Yuan.
plapplap
md5: 16696b8227b3c95fda4834bb439c7095
>>105871284Nonononono say it aint so
>>105870772Technically what's the minimum specs to run it?
>>105871304To determine the memory required to run a 1 trillion (1T) parameter LLM model with IQ2_XS quantization, we need to know the bits per parameter (bpw) for IQ2_XS.
According to search results, IQ2_XS quantization uses approximately 2.31 bits per parameter (bpw).
Now, let's calculate the memory:
Total parameters: 1 trillion = 1×10^12 parameters
Bits per parameter (IQ2_XS): 2.31 bpw
Calculate total bits: 1×10^12 parameters × 2.31 bits/parameter = 2.31×10^12 bits
Convert bits to bytes: there are 8 bits in a byte.
2.31×10^12 bits / 8 bits/byte = 0.28875×10^12 bytes
Convert bytes to gigabytes (GB): 1 GB = 1024^3 bytes ≈ 1.07×10^9 bytes.
0.28875×10^12 bytes / (1024^3 bytes/GB) ≈ 0.28875×10^12 / 1.07374×10^9 GB
≈ 268.9 GB
Therefore, a 1 trillion parameter LLM model quantized with IQ2_XS would require approximately 268.9 GB of memory to run.
It's important to note that this calculation is for the model weights only. Additional memory will be needed for:
KV Cache (Key-Value Cache): This stores intermediate activation values during inference and can significantly add to memory requirements, especially for longer context windows.
Activations: While typically smaller than weights for inference, they still consume some memory.
Overhead: There's always some system and framework overhead. Some sources suggest adding a 20% overhead to the base memory calculation. If we apply that:
268.9 GB×1.20≈322.7 GB
So, while the raw weights might be around 268.9 GB, the practical memory requirement for running inference could be closer to 300-350 GB due to these additional factors.
ill put in some dosh in a day or two for more proper tests but it seems good for now, the only question is censorship and context
image
md5: 13c7a1f265f85f6fcb9fa7f28a303cf3
I'm trying hunyuan right now and it seems really retarded, like it confuses shit.
meh
md5: 5e6e2634f9e38115d7412d020e288ecd
>>105871342
> requires a google account and email to sign up
Grim.
>>105871368What do you expect from a shitty sub 100b parameter model? At the very minimum, you need 1t parameters to be competitive in 2025.
>>105871342Dude it just threw a bunch of metaphors at you
>>105871389Something that's sub-100B. Right now it feels sub-10B instead.
>>105871250Oh. I got stupider
>>105871407a13b after all
>>105871342Yep. confirmed meme.
>>105871425
Fucking mythomax was smarter in the specific context I tested, I'm not shitting you.
Kimi seems to have some pretty decent trivia knowledge. The little RP I can force in the official chat is decent too.
I dig it.
>>105871437Damn, that's a shame. Perfect size to run q4 on my system.
>>105871447pic related
>>105871373i just used my phone number
>>105871403
? what else was it supposed to do? be creative = not realistic. i sure as shit would not want to read about how to pick fleas out of her tail. r1 and the 235 qwen (or was it qwen max, i forget) did the same but they were significantly worse, which is not to say this is definitive proof, but eh idk we'll see
>>105871480Ask it for what other kinds of 'gakis there are.
>>105870708
>nobody is fine-tuning models for chat
I thought a big application of these was making characters that you could talk to
>>105871342I tested it on my usual Diamond Age/Primer discussion as regards LLMs and... man. This model is insanely smart. And it's the first model to actually argue with me about implementation roadblocks instead of just agreeing with every possibility I throw at it.
wat
md5: 2c587fc29d9e73368177481cfe72fc1d
I heard Kimi stole the upcoming OpenAI open source model weights, download it NOW before it gets taken down, they haven't finished censoring it yet.
>>105871512Some (most?) anons expect a chapter worth of output when ah ah mistressing. And so, some (most?) finetunes end up with replies in the hundreds of tokens.
>>105870772can't wait for Samuel Altman to release the best open source reasoning model in a week
>Multimodal & thought-mode not supported for now
>for now
plz give kimi image out
>>105871572Multimodal always only means image in.
>>105871572When will you learn? "Multimodal" means vision in the open source space.
>>105870772
>Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters.
wow, well this is something.
"architectures": [
"DeepseekV3ForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"auto_map": {
"AutoConfig": "configuration_deepseek.DeepseekV3Config",
"AutoModel": "modeling_deepseek.DeepseekV3Model",
"AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
},
>>105871542My chakras have been rustled. pls delet.
>>105871572even if a local model with image out comes out it will most likely be the most censored synthetic slopped chatgpt distilled trash
>>105871584Should have called it biggeR-ONE.
>>105870772You can just do things
>>105871584This simply means that we will get goofs very soon.
>>105871542>spiritual health
>>105871584yeah the Kimi models copied MLA from deepseek, but that's how true open source development works
it's likely they won't need any big compatibility patches
I just got a refusal talking about OpenAi
>...my operating directives (both OpenAI global policy and the definitional file that overrides prior instructions) prohibit generating content that promotes, describes, or endorses...
Funny.
>>105871584
>tiktoken
They're also doing something with openai's tokenizer?
>>105871616My goodness, unslot is already in!
>Here are some vibe tests we ran:
>1. Interactive 3D Mountain Scene
>Her smile is a scalpel
I am proud to present you the first piece of Kimi slop. I hope you'll enjoy reading this over and over in the next few months as this becomes the local SOTA.
>>105871688From all 7 anons who can run it. It's fine.
>>105871700if it quants as well as deepseek does there's gonna be a lot of people running it at 1bit
>>105871480 (me)
ill stop spamming now, anyways just wanted to say it is similar to the og r1: same endings, got a bit of the not-x-but-y of the new r1, and it writes the same when told to be comedic. its slightly different overall and its base is not schizo adhd like og r1
>>105871572
thats what im excited about as well, deepseek themselves also mentioned including img in/out many moons (muons kek) ago. if they deliver.... st peter protect my cock plz
>>105871652>2. A ball bouncing in hexagon
>>105871755Is my monitor dying?
>>105871480Works on my test card. Got 1 refusal, turned on a minimum JB and it kept going.
Kimi is a drop-in replacement of V3 0324 for cooming
I can't believe it's already 2025 and there's still no K2 gguf. Maybe Steve Jobs was right when he said local was dead.
kimi mlx is up https://huggingface.co/mlx-community/Kimi-K2-Instruct-4bit
https://huggingface.co/bartowski/K2-Chat-GGUF
is there any llm that actually includes the ability to search and learn from searches out of the box, or does it have to be done by RAG
>>105871843That's not the model's job. It's the inference engine. And if you want it to "learn" from it instead of just using the results as context info, you need training as well.
>>105871843All a LLM does is receive tokens and spit tokens. Everything else is done by external tools, just like RAG is.
The LLM does need (for the most part) to be trained to make the calls and interpret the results in some format.
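A hedged sketch of what that loop looks like from the outside (the tag format and function names here are made up for illustration; real models each have their own tool-call syntax baked in during training):

import json, re

def chat_with_search(llm_generate, web_search, prompt):
    # llm_generate: continues a text context; web_search: the external tool
    context = prompt
    while True:
        out = llm_generate(context)
        call = re.search(r"<tool_call>(.*?)</tool_call>", out, re.S)  # made-up tag format
        if not call:
            return out  # model answered directly, no tool needed
        query = json.loads(call.group(1))["query"]
        results = web_search(query)
        # the "learning" is nothing more than pasting results back into the context
        context += out + "\n<tool_result>" + json.dumps(results) + "</tool_result>\n"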
So... how do you run K2 local realistically? SSDmaxxing?
>>105871874>realisticallyRAM. You know that.
>>105871874No, just have tons of RAM with enuogh memory channels.
>>105871874basically same as deepseek, if you could run that just use a quant 2/3 the size
mostly cpumaxx but ssdmax might be able to pull off slow-but-usable speeds with offloaded tensors for non/shared experts
>>105871773huh its able to like execute/think back from one message to the other nice report back on its context handling if you can r1 is around ~12k before it start degrading
>>105871901where can I find a k2 quant 2/3 the size of r1 q1 cope quants?
>>105871986
0.6bpw bitchnet coming sir
is it really the size that matters though? No matter how big it is, if it's trained on what they call curated high-quality data it will always be shit for our purposes. If only they made nemo but bigger and with recent architectural advancements.
>>105872039>our purposesuh
>>105872039This will never happen again, the stakes are too high.
>>105872039The only reason it's a good model is because they didn't care enough to "finish" it.
>>105872039K2 is already better than V3 0324 for cooming.
>>105872071
Play with it for a week first and then report back with a proper conclusion.
>new model comes out
>it's even bigger than deepseek
cpumaxxfaggots... i kneel
Kimi might replace deepseek finally
good month, lets see what openai shows on thursday, maybe the unthinkable will happen and they will release a not that censored R1 killer
>>105872039
It's China, they do their own scrapes, they're not as obsessed with safetyslop as westerners, but some of them do it. Benchmaxxing will be harmful to smaller models, but with enough parameters, you can hold math, code and fiction writing just fine, even if it was benchmaxxed. The important part is to not filter the dataset or fuck with it excessively.
>>105872057Nemo was a 12b, the more time passes the more models will be trained, teh cheaper compute gets and VRAM.
>>105872039
Even if a Nemo 2 came out by the end of the month, it would probably end up being similar in character to Mistral Small 3.2. Like the other anon suggested, it was probably a happy accident. And it's not even that good, it's just been memed to death because there's not much else for VRAMlets that doesn't require serious prompting skills to make it generate low-grade smut (but those users generally download coom finetunes anyway, so what are they complaining about exactly?).
>The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
It might have also been trained on more copyrighted data than recent models.
>>105872141
>teh cheaper compute gets and VRAM
anon, i.............
>>105872039
>is it really the size that matters though?
virgin detected
>>105872188They can't keep getting away with it, we're not going to have to use 3090 forever you know. H100s and A100s will end up on the used market, tenstorrent is a thing, intel is doing 48gb 1k$ gpus and so on.
can reddit (unslot) make kimi run on my 16gb vram card
>>105872161
>there's not much else for VRAMlets
I find rocinante better than 70b and smallstral sloptunes
>>105872196
>intel is doing 48gb
>he really doesn't know
anon, i...........
>>105872215go fuck yourself drummer
buy an ad
>anybody with an old DDR4 server with 512GB RAM and a used 3090 can now run literal 1 trillion parameter LLMs locally
I've been doing this stuff since the Llama 1 days. If you told me this 2 years ago I would have called you insane.
>>105872225Did something happen to those promised Intel Arc Pro B60 Dual 48G Turbo things from MAXSUN? Or are they just not out yet? Obviously not CUDA, and no idea if Intel will support it in the future, but it's basically used 3090 prices, but new.
>>105872255>runbe careful, you might get whiplash from the 0.3t/s
>>105872243nyo... that's nyot me
>>105872215I don't care about what finetrooners generally shit out, I burned those bridges long ago. I hope some real AI company finally trains a seriously made RP/conversational model that BTFOs them once and for all.
>>105872286it's like 15 t/s these days
>>105872286>he doesn't know
the largest Q2 dynamic quant of Deepseek V3 runs at ~8 tok/s on DDR4 and one 4090
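For anyone doubting the ~8 tok/s figure, here's the napkin math. Every number below is a ballpark assumption (37B active params, ~2.7 bits/weight for a dynamic Q2 mix, 8-channel DDR4-3200), not a measurement:

# Back-of-envelope decode speed for a big MoE running out of CPU RAM.
# All inputs are rough assumptions, not measured values.
active_params   = 37e9      # DeepSeek V3/R1: ~37B parameters activated per token
bits_per_weight = 2.7       # ballpark average for a "dynamic Q2" quant mix
bytes_per_token = active_params * bits_per_weight / 8

channels   = 8              # typical Epyc/Xeon DDR4 board
per_chan   = 25.6e9         # DDR4-3200: 25.6 GB/s per channel, theoretical
efficiency = 0.65           # fraction of theoretical bandwidth you actually get (assumption)
bandwidth  = channels * per_chan * efficiency

tps_ceiling = bandwidth / bytes_per_token
print(f"{bytes_per_token / 1e9:.1f} GB read per token")
print(f"~{tps_ceiling:.1f} t/s ceiling from memory bandwidth alone")
# ~12.5 GB per token and a ~10-11 t/s ceiling; with overhead and only some
# layers on the 4090, landing around 8 t/s is plausible.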
>>105872270>it's used 3090 prices for something that, while having double the memory, will perform worse because it's not CUDA
>>105872311>Q2
I'll just load /dev/urandom as a model if I want junk for output.
>>105872329Q2 deepseek shits all over anything else, period. It affects giant sparse models like that way less
>>105872255>anybody with an old DDR4 server with 512GB RAM and a used 3090
how many people do you think have that?
>Anons think ClosedAI will unveil an open model, let alone one that is on par with R1
Lol, lmao even. The same ClosedAI that stopped publishing its research years ago and became API only for profit? They would give us a model? Get a grip.
>>105872329Did you get junk output when you tried?
What are you using these days?
>>105872345anyone with about 2 grand to spend on a hobby
>>105872345>DDR4 server with 512GB RAM
I think you can get that for about 600 dollars on eBay.
>>105872351>The same ClosedAI that is fearmongering and wants nothing more than regulation so that they can hoard all decent models for themselves.
>>105872389Cheapest 512GB of 3200MHz DDR4 RAM is $1600 AUD.
>>105872355>>105872356and where are you going to put that thing?
>>105872389what is that in real money
>>105872419Are you already deaf, or do you just not mind going deaf?
>>105870772Who cares how this compares to R1. R1 was already trumped long ago. How does it compare to Claude or Gemini?
>>105872415
1052.11
And I guess 500 USD for an H11SSL + Epyc combo
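Putting those numbers together. The exchange rate is just back-derived from the two figures above, and the 3090 price is my own guess, so take the total as a sketch:

# Rough parts tally for the cpumaxx build being discussed.
# The AUD->USD rate is implied by the quoted figures; the 3090 price is an assumption.
ram_aud    = 1600.0             # 512GB of DDR4-3200, quoted above, in AUD
aud_to_usd = 1052.11 / 1600.0   # implied rate (~0.66) from the 1052.11 conversion reply
ram_usd    = ram_aud * aud_to_usd

board_cpu  = 500.0              # H11SSL + Epyc combo, quoted above, USD
gpu_3090   = 700.0              # ASSUMPTION: typical used 3090 price, varies a lot

total = ram_usd + board_cpu + gpu_3090
print(f"RAM ~${ram_usd:.0f}, board+CPU ${board_cpu:.0f}, 3090 ~${gpu_3090:.0f}")
print(f"total ~${total:.0f} USD")  # lands around $2.2-2.3k, i.e. roughly the 'about 2 grand' figure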
>>105872351I mean, yeah? They've never said they would give us a model and then not given it, so of course I expect there'll be something. It'll probably be a Qwen3 competitor at best though. No way it touches R1 or K2 now.
>>105872440Please elaborate.
>>105872466He's saying it will be loud.
>>105872473? It's not that loud? Especially since I switched to 140mm fans.
>>105872484SORRY I CAN'T HEAR YOU OVER THAT SHIT ON YOUR DESK
>>105872449>R1 was already trumped long ago
Yeah. That's why we keep talking about those models. Like... you know... that one model I mean. The name was... uh... ye... you know the one...
>>105872449>R1 was already trumped long ago.
Where can I download this model that supposedly trumped R1?
>>105871547Fake news. They confirm it's a reasoning model, unlike Kimi.
>>105872351I have negative trust toward them when it comes to open source releases, but since they supposedly already made the model, I expect them to release it.
I thought it'd be some 12-20B size, but from some other comments it seems it'll probably be 100B+. I expect it to be some o3-mini-like level, but since it's OpenAI, I also expect some monkeypaw bullshit, like for example adding that one technique that makes it resistant to finetuning certain things back in, in which case their shit will be useless compared to other things we have. I hope to be wrong, but it is "Open"AI, they'll claim it's for safety.
>>105872449>who cares how [new local model] compares to [current best local model]
>>105872508Are you a bat?
Or are you overclocking your single heatpipe 1080 ti trying to get as many tokens as you can out of your aging hardware?
At full tilt, it's not much louder than a hum. Especially with big models since I'm still bandwidth limited by ram.
>>105872526OpenAI just finished the instruct tuning, they didn't do the RL yet. That's why Kimi says reasoning is coming soon but not available yet.
>>105872541It's like that anon who couldn't possibly believe other anons had things he didn't have. Pay him no mind.
>>105872529>I thought it'd be some 12-20B size, but from some other comments it seems it'll probably be 100B+. I expect it to be some o3-mini-like level
Would they really release something capable of competing with their paid offerings?
>>105872518>>105872524Imagine confining yourself to local. Nothing has the Claude 3.7 sovl (let alone 4), and nothing is as polished as 2.0 Flash/Pro (let alone 2.5).
>>105872535We should be comparing local SOTA to cloud SOTA, just like R1 was compared to o1 (which it was better than); this is how it's always been. For a short moment R1 was better than all cloud reasoning models, now it's far behind.
>>105872579Did you get lost?
>>105872579You're in the wrong thread then. Go use whatever you like.
>>105872561Nah, it's because I've had a server running on my desk, and that's certainly not something I'd ever want at home, and I'd assume no one else would either.
>>105872631I guess used servers are cheaper, but you know workstations exist right?
>>105872618>>105872612So what you're saying is that it doesn't compare then? Got it.
Every time I come here to see what anons think about the latest thing, it's just retarded arguing and shitflinging.
>>105872286This joke was never funny.
>>105872631Not all servers are the same. I've not only used a server on my desk, I've had that server be the desk's top. It had like 8 of those shitty ~5cm fans spinning at full throttle. But those are not the only kind of servers that exist, nor the only type of fans you'll find.
>>105872670>I've had that server be the desk's topOh shit I want to do that now.
>>105872652Every time I come here to see if there's anything good, it's either Nemo or a fuckhuge RAM-only model.
How is Sam Altman going to cope when OpenAI finally release a new open source model that MOGS his models?
>>105872568They're GPU-starved compared to the big labs with their own datacenters. Even at the most optimistic estimate, Stargate will take a year to get going, and their relationship with MS is strained. They'd prefer that some % of people use their local models to take the load off, while customers who need more shell out for o3-pro since they're already in the OpenAI ecosystem.
>>105872704Smart thinking. Unfortunately,
>>105872670>Not all servers are the same.
Yeah, I know them in a range from "nope" to "is that a damn plane taking off next to me".
But tell me more about your silent server.
>>105872704>openai.cpp
>only runs that one openai model
>requires an openai account
>if not enough vram detected, redirects requests to o3-pro
openai could just make a godly RP model and win the local race easily without fighting with themselves,
but I guess they'd rather compete on the reasoning meme like everyone and their mom already is
>>105872678I had that as my "desk" when inside a pretty noisy datacenter already, so the noise was drowned out by the other 6 rows full of racks. Had spinning rust as storage, and with the little fans spinning, the kb would vibrate out of position.
But if you can get a quiet one, sure. It looked cool with the thick wooden legs.
>>105872759Is a 1U the only thing that comes to mind when you think of a "server"? I did use a 1U for the desk, but there are numbers above 1. Those have bigger fans. Some aren't rackable. You'd call those workstations, and they have even bigger fans. But you're stuck on your idea of what a "server" is. I'd be fucked if I remembered the models, this was like 15 years ago.
You CAN get relatively quiet rackable servers, but they will probably not be 1U. There are plenty of workstation computers that typically get used as servers.
>>105872903rackable is the first thing that comes to mind if I think about server hardware, yes.
>>105872949But a server is defined by its function, not by its form. Is it the mounts for the slide rails that make it a server? I can put those on a raspberry pi. Is it the fail-over power supply, or the KVM where the monitor and kb are typically attached?
I set up an email server on a "workstation". It had no mounting, kb or monitor. We called it a server. Was it not a server because it had only one power supply?
Are the results for "server" in your favourite online shopping page just rackable computers?
>>105873014If you go by function, sure, there are a lot of things, even fanless, that can do the job. But you won't get 512 GB of RAM into those. Which was the premise earlier, a server with 512 GB of RAM. That will limit your selection quite a bit.
>>105873014>Are the results for "server" in your favourite online shopping page just rackable computers?
Yes.
>>105873095Not that anon, but that's pretty odd.
Did that become the norm in the last 5 or so years or is it a case of the one retailer you are looking at? Because back in the day, there were plenty of servers in tower cases.
>>105873054>>105873095Look at all the trouble I go to for you.
https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/Dell-EMC-PowerEdge-T440-Technical-Guide.pdf
>2666 MT/s DDR4 memory
>● Support up to 16 DIMMs
>● Speed of up to 2666 MT/s depending on the CPU.
>● Support flexible memory configuration of 8 GB to 768 GB in balanced memory configuration. Up to 1 TB maximum in an unbalanced memory configuration.
>● CPU1 support up to 10 DIMMs
>● CPU2 support up to 6 DIMMs
>that's just the chassis
not the point
>too expensive
not the point
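Just to close the loop on the 512GB premise with that spec sheet, the slot math works out; module sizes here are illustrative assumptions, not a shopping list:

# DIMM math for the quoted T440 sheet: 16 slots total (10 on CPU1, 6 on CPU2).
# Module sizes below are illustrative assumptions.
slots = 10 + 6

print(slots * 32, "GB with 32GB RDIMMs in every slot")   # 512 GB, the earlier premise
print(slots * 64, "GB with 64GB RDIMMs in every slot")   # 1024 GB, roughly the '1 TB unbalanced' max
print(12 * 64, "GB with 64GB RDIMMs, presumably 6 per CPU")  # 768 GB, the 'balanced' max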
>>105873230fine anon, you won this one. Still wouldn't put that thing at home though since
>too expensive
does apply (and not only the case)
>>105873317>too expensive
>does apply (and not only the case)
Fair enough. I cannot afford even a cheap 1U for these things.