/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105564850 & >>105557036

►News
>(06/11) MNN TaoAvatar Android - Local 3D Avatar Intelligence: https://github.com/alibaba/MNN/blob/master/apps/Android/Mnn3dAvatar/README.md
>(06/11) V-JEPA 2 world model released: https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks
>(06/10) Magistral-Small-2506 released, Mistral Small 3.1 (2503) with reasoning: https://mistral.ai/news/magistral
>(06/09) Motif 2.6B trained from scratch on AMD MI250 GPUs: https://hf.co/Motif-Technologies/Motif-2.6B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105564850

--Paper: FlashDMoE: Fast Distributed MoE in a Single Kernel:
>105565866 >105565875
--Paper: CUDA-LLM: LLMs Can Write Efficient CUDA Kernels:
>105567041 >105567054 >105568828
--Papers:
>105566965 >105575562
--Developing a local maid harem simulator with integrated LLM, vector DB, and planned media generation tools:
>105574905 >105575056 >105575080 >105575115 >105575137 >105575094 >105575224 >105575257 >105575765 >105575798 >105576005 >105576028 >105575287 >105575814 >105575266 >105575431 >105575472 >105575487 >105575200 >105575281
--Magistral Small struggles with multiturn conversations and instruction fidelity:
>105565054 >105565170 >105565268 >105565296 >105565330 >105565416 >105565464 >105565387 >105567984 >105568121 >105568769 >105574018
--Tokenizer swapping and adaptation in pretrained models with partial retraining:
>105571032 >105571203 >105571231 >105571252 >105572166
--Practical limits of high-RAM consumer setups for large language model inference:
>105566516 >105566594 >105566668
--Discussion on QAT models, including Gemma 3 and llama.cpp integration:
>105570421 >105570475 >105571116
--Mistral-Nemotron model exhibits mathmaxxed behavior and flirty traits with mixed benchmark performance:
>105567047 >105568827 >105568982 >105569003 >105571029
--Exploring V-JEPA 2-AC for robotic planning and potential tuning challenges:
>105565291 >105565384 >105565916 >105568851
--Magistral's inconsistent reasoning and output structure:
>105568633 >105568664 >105568864 >105572076
--Configuring Ollama for proper context length to enable tool calling in agent mode:
>105566851 >105569160 >105572329
--Misc:
>105569851 >105565868 >105575802
--Miku and Rin (free space):
>105567898 >105569875 >105569890 >105570213 >105570421 >105570526 >105571654 >105572375 >105573114 >105573400 >105573608

►Recent Highlight Posts from the Previous Thread: >>105564855
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Might as well post the Meta screenshot again.
Who thought this was a good idea?
So thats what zucc "founder mode" looks like. kek
>>105578164>what would happen if I applied deep heat directly to my penis?
>>105578175The White man's burden (colonizing sideways pussy)
NoLoCo: No-all-reduce Low Communication Training Method for Large Models
https://arxiv.org/abs/2506.10911
>Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low communication training methods still employ a synchronization step for model parameters, which, when performed over all model replicas, can become costly on a low-bandwidth network. In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly synchronizes model weights via a novel variant of the Nesterov momentum optimizer by partially averaging model weights with a randomly selected other one. We provide both a theoretical convergence analysis for our proposed optimizer as well as empirical results from language model training. We benchmark NoLoCo on a wide range of accelerator counts and model sizes, between 125M to 6.8B parameters. Our method requires significantly less communication overhead than fully sharded data parallel training or even widely used low communication training method, DiLoCo. The synchronization step itself is estimated to be one magnitude faster than the all-reduce used in DiLoCo for few hundred accelerators training over the internet. We also do not have any global blocking communication that reduces accelerator idling time. Compared to DiLoCo, we also observe up to 4% faster convergence rate with wide range of model sizes and accelerator counts.
https://github.com/gensyn-ai/noloco
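Not the gensyn-ai code, just a minimal toy sketch of the idea from the abstract: every replica does local steps and then only partially averages its weights with one randomly picked peer, no all-reduce anywhere. The plain SGD inner step and the fixed alpha are my assumptions; the actual paper uses a Nesterov momentum variant for this.
[code]
# Toy sketch of NoLoCo's gossip-style outer step (not the official implementation).
# Assumption: plain SGD inner steps and a fixed mixing factor `alpha`.
import numpy as np

rng = np.random.default_rng(0)
n_replicas, dim, lr, alpha = 8, 16, 0.1, 0.5
target = rng.normal(size=dim)                      # toy regression target
weights = [rng.normal(size=dim) for _ in range(n_replicas)]

def local_grad(w):
    return 2 * (w - target)                        # gradient of ||w - target||^2

for step in range(100):
    # inner step: each replica updates on its own (here identical toy) objective
    weights = [w - lr * local_grad(w) for w in weights]
    # outer step: each replica partially averages with ONE random peer, no all-reduce
    for i in range(n_replicas):
        j = rng.integers(n_replicas - 1)
        j = j if j < i else j + 1                  # pick a peer different from i
        weights[i] = (1 - alpha) * weights[i] + alpha * weights[j]

print(np.mean([np.linalg.norm(w - target) for w in weights]))
[/code]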
neat
>>105578288>>105578300Fuck I'm happy to not be this retarded, Jesus
People just freely hand companies compromising information about them, kek.
>>105578300imagine all the types of conversations, questions and smut people have been sending to ai online, people will give them every detail about their lives instantly, all forwarded to the government to create a mental model of your brain, lmao
kek
md5: f4d3dda6a54d3538379f9f77efbe910e
🔍
>>105578288>>105578300>>105578317he smoothed it all over, nothing to see here anons.
>>105578328He just did an "in minecraft" when it was already too late.
>>105578327Fuck I hope all the bizarre porn I generated with gemini gets sent to someone's table.
Poor person.
>>105578327Surely this is a parody acc someone made after seeing its all gonna be public, right?
Farseer: A Refined Scaling Law in Large Language Models
https://arxiv.org/abs/2506.10972
>Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface L(N,D), Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla's law by reducing extrapolation error by 433\%. This allows for the reliable evaluation of competing training strategies across all (N,D) settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours.
https://github.com/Farseer-Scaling-Law/Farseer
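The abstract doesn't give Farseer's actual functional form, so for reference here's what fitting the prior law it's compared against looks like: the Chinchilla-style surface L(N,D) = E + A/N^alpha + B/D^beta. All numbers below are made up for illustration, nothing is from the paper.
[code]
# Chinchilla-style loss-surface fit, i.e. the baseline the abstract says Farseer improves on.
# The "observations" are synthetic, generated from made-up "true" parameters.
import numpy as np
from scipy.optimize import curve_fit

def chinchilla(x, E, A, B, alpha, beta):
    N, D = x
    return E + A / N**alpha + B / D**beta

rng = np.random.default_rng(0)
N = np.array([1e8, 3e8, 1e9, 3e9, 1e10, 3e10])      # model sizes (params)
D = np.array([2e9, 6e9, 2e10, 6e10, 2e11, 6e11])    # training tokens
true = (1.69, 406.0, 410.0, 0.34, 0.28)              # made-up "ground truth"
loss = chinchilla((N, D), *true) + rng.normal(0, 0.01, size=N.size)

popt, _ = curve_fit(chinchilla, (N, D), loss, p0=[1.5, 300, 300, 0.3, 0.3], maxfev=50000)
print(dict(zip(["E", "A", "B", "alpha", "beta"], np.round(popt, 3))))
# Extrapolate the fitted surface to a bigger run than anything in the "small-scale" data.
print(chinchilla((np.array([7e10]), np.array([1.4e12])), *popt)[0])
[/code]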
interesting
>>105578293Where can I see more illustrations of dynamic PP routing with DP?
>>105578112 (OP)I'm tired of 3DPD, I won't be simulating them
>>105578368Would be neat if skeletal animation were one of the modalities, hooking it to MMD or Koikatsu would be neat. With a real-time diffusion filter that smooths 3D into actual anime
>>105578164That can't be real, a bunch of those seem a bit too on the nose
Or 195chevyhot really needs to find a local model to ask his questions
>>105578469
>People are seemingly accidentally publishing their AI chat histories with Meta’s new AI app
>Information about medical conditions, intimate confessions, sensitive requests and horny image generation requests are all visible on Meta’s new Discover feed.
Is there a bug in the backend, or is it just bad UX?
https://www.meta.ai/@195chevyhot
>>105578536It's not a bug, it's a feature
>>105578365http[colon][slash][slash]www[dot]xvideos com
>catpcha: KRAHM
file
md5: 81ed372864ecf3ed0640e989f5e69577
🔍
bros...
736342
md5: b63080920b036644a155d83cd3887c3f
🔍
what am I paying for Sam
>>105578871You are funding research to achieve AGI by 2030
>>105578871uhh where's deekseek
i thought they had killed all western models
>>105578112 (OP)>>105578164in case anyone cares: you can report this shit as a "technical issue" (settings button -> report a technical issue, let them know personal private conversations are being leaked)
>>105578536>>105578891seems to be a known issue: https://www.neowin.net/news/heres-how-meta-ai-leaks-your-private-chats-thanks-in-part-to-its-terrible-ux/
I'm starting to unironically believe that we have meta employees in this thread
>>105578966what makes you think that?
>>105578900Thats so bad, damn. How is that not all over the pajeet hype space.
>>105578966I'd be surprised if there weren't employees of all the big companies here at least occasionally
Maybe not Anthropic since they're very haughty and aloof
>>105578900>>105579037Boomers are cooked with a hidden setting like this
The saddest part is those aren't coding prompts.
I bet a 20-30b model or even nemo could have answered most of those npc questions. Sad.
>>105578966What? I posted the screenshots because nothing else is going on locally, new mistral is already forgotten and this stuff shouldn't be tolerated.
"All PR is good PR" is a bullshit lie.
I didn't see any other place talking about this. I hope they don't get away with it.
>>105579056greasy that its opt out rather than in
>>105578164Saved for the next time someone asks "use case for local models?"
>>105579056What the fuck is a "public prompt"?
What website is this?
>>105578825
>Le Chat is currently the most downloaded iOS app in France. However, the app isn’t really taking off in other European markets. It is currently ranked #66 in Germany, and it’s not even listed in the top 100 apps in Spain, Italy, and the U.K.
I bet the numbers are probably the same for their API use; no one in their right mind would pay for their API unless they're french, because the french would be willing to eat literal shit if it's shit that comes from another frenchie
bet 90% of mistral use comes from 4chan and reddit coomers/freetards running their local models
GTC is over and no Largestral was released. Why did you have to lie to me?
May 7, 2025
>One more thing…
>With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
It's been 5 weeks, Arthur!
>>105579882>‘open’>Updated model coming soon!
I'm still on the fence about buying a mid-range local LLM rig with a 3090. I sure enjoy proompting but will we continue getting better small models in the future? Seems like everything is all about those 600B models and I can't afford THAT kind of rig.
I think it's obvious corpos want to make this shit portable too; currently the hardware requirements make this technology very unwieldy for anything but cloud chatbots.
>>105579854Waiting for NVidia to release it. Meanwhile test it here: https://build.nvidia.com/mistralai/mistral-nemotron
It didn't seem to be filtered when I checked it out.
>>105579898Depends on what you expect from it. For cooming/RP having a lower tier rig is fine, but for serious, difficult work you'd indeed need to run 600b models.
>Grok 2 was released on August 14, 2024
That's 10 months ago. Did Elon forget about his 6 months promise?
>>105580031Grok 3 still isn't stable, pls understand.
>>105580042@grok is this true?
>>105580062The claim of ‘white genocide’ in South Africa is highly debated.
file
md5: ec1e23c7c51932a43627ceeb5f137a47
🔍
Why does llama.cpp produce a different result when you regenerate the answer even with greedy decoding? The first answer is always different from a regenerated answer and all regenerated answers are the same. So the pattern is A B B B B...
The screenshot shows the entire conversation, there is nothing in front of it, and the first answer is completely schizo. This is qwen 235.
>>105580062>It's possible Musk and xAI are delaying due to strategic reasons—maybe they’re prioritizing development of newer models like Grok 3, or they’re wary of competitive risks after open-sourcing Grok 1. Musk's history shows he sometimes overpromises on timelines, like with Tesla's autonomous driving or X's algorithm updates.
>>105580129What's your frontend?
>>105580150llama.cpp's server ui
This is a known thing, even the nala paste mentions it.
>>105580157Must be niggerganovs code. Have you tried testing it with simple API requests?
chevy
md5: 94655ff44d690b6d07fa10bebad69b2d
🔍
>>105580129Most likely prompt caching.
From the documentation https://github.com/ggml-org/llama.cpp/tree/master/tools/server :
>cache_prompt: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are not guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: true
LLAMA.CPP
Is it true that --override-tensor delivers better results as far as t/s is concerned with higher quants like Q8?
What would you personally consider as a natural reading speed in t/s?
>>105580361
7t/s is slightly above mine
>>105578164>>105578175Holy fucking kek what is this
>>105580361~10-15t/s for speed reading through stuff, like a news article/magazine and finding the relevant parts
~5-7t/s for something I'm actually engrossed in, like a book
>>105580196Does this mean that merely changing the batch size changes the output?
Could the difference in regenerated outputs be solved by reprocessing the whole batch size aligned chunk containing the suffix instead of only the suffix?
>MNN TaoAvatar Android - Local 3D Avatar Intelligence: https://github.com/alibaba/MNN/blob/master/apps/Android/Mnn3dAvatar/README.md
This needs to be able to connect to a desktop or else it's five years away from being usable and fifteen years away from being good.
>>105578288That's racist. Asian vaginas are oriented normally, it's white women whose vaginas are turned the wrong way.
>>105580204I think -ot only provides a benefit for MoE where the dense weights are used more frequently than the MoE weights (so there is more benefit from putting them into faster memory) and if the implementation for an op in one of the backends is bad.
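For anyone wondering what that looks like in practice, a rough launch sketch assuming a MoE GGUF whose expert tensors follow the usual blk.N.ffn_*_exps naming (DeepSeek/Qwen3-MoE style); the model path, context size and regex are placeholders, adjust for your model:
[code]
# Sketch: launch llama-server with --override-tensor so the large MoE expert tensors
# stay in system RAM while everything else is offloaded to the GPU.
# Assumes expert tensors are named like "blk.N.ffn_(up|down|gate)_exps.weight";
# check the tensor names in the load log and adjust the regex if yours differ.
import subprocess

cmd = [
    "./llama-server",
    "-m", "model.gguf",             # placeholder path
    "-ngl", "99",                   # nominally offload all layers...
    "-ot", r"\.ffn_.*_exps\.=CPU",  # ...but pin the expert weights to CPU RAM
    "-c", "16384",
]
subprocess.run(cmd, check=True)
[/code]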
>>105580488Yes, changing the physical batch size will change the outputs.
No, changing the index for caching would not be enough to guarantee deterministic results for all cases (though it would work for the specific case of repeatedly submitting the same prompt).
The logits produced by the model to predict a token are not saved.
The logits for the first generated token come from a model evaluation with a batch size > 1.
If prompt caching is used, the model is being evaluated with a batch size of 1 to regenerate the logits.
For a general solution you would either need to start storing logits from previous model evaluations or track the batch sizes that were used when generating each token.
Quite honestly, I think that if someone wants bit-for-bit identical outputs they should just turn prompt caching off.
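If you want that, it's a per-request flag on the /completion endpoint, so something like this should give matching regenerations (host, port and prompt are placeholders; cache_prompt, n_predict and temperature are the documented request fields quoted above):
[code]
# Send the same greedy request twice with prompt caching disabled, so both generations
# go through the same batched prompt-processing path and should match token-for-token.
import requests

payload = {
    "prompt": "Write one sentence about Hatsune Miku.",
    "n_predict": 64,
    "temperature": 0.0,      # greedy decoding
    "cache_prompt": False,   # default is true; disable for bit-identical regenerations
}
for attempt in range(2):
    r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=600)
    print(attempt, r.json()["content"])
[/code]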
>>105578288>>105580518>sideways pussyI tried googling it and I still don't understand what the joke is supposed to be.
>>105580754I know what you mean. I've heard the joke from time to time, too, but it's like there's active censorship preventing any source or origin for it to appear online.
>>105580754>>105580767>>105503039>there used to be a myth that Asian women had sideways vaginas. It's the kind of thing that you could say and most people wouldn't have an opportunity to find out, and a good fraction of those who did (or pretended they did) would lie for lulz. I suspect what mostly ended this was huge numbers of GIs fucking Asian prostitutes after WW2 and during the Vietnam War.We should bring it back.
added more autism and made it even slower
https://github.com/flamingrickpat/private-machine/blob/main/pm_lida.py
Any progress in small models? What's the best model under 3B?
qwen
md5: 9c41025d699c4dca3007545f248d603f
🔍
>>105580907Tiny Qwen3 models are king. A 500MB, 600M-parameter model making websites.
>>105580825
>8.5k loc in python
jesus
>>105580042They've already announced Grok 3.5 last month, already in beta for their paypigs I think
>>105580963>betaSo not stable.
>>105580973They released Grok 1 after they announced Grok 1.5
Will there be deepseek 3.5 like there was deepseek 2.5?
>>105581000Isn't that basically what we got with the updates?
Who really knows. Many rumors, all off.
>>105580973Why do they need grok 3 to be "stable" to release grok 2? Grok 2 is now hugely outdated, it was comparable to llama 3 405b at release. I think they just forgot they made a promise.
>>105581004Not quite. 2.5 had an update called 2.5-1210 and V2 had an update 0628.
What's better at the moment, Qwen3 235B at Q2/Q3 or a 70B tune?
I can prob fit the 70B entirely on GPU; the Qwen I'd have to offload.
I don't mind the speeds, I just want the smartest RP I can get locally.
>>105580361personally i ruminate on each token for about 2 seconds, to really take in the intricacies of intentionality the model is displaying
>>105580941i hate it when projects have a million files. a single file also makes it easier to dump into gemini.
>>105581109yeah its pretty cursed lol
Are there any local reasoning LLMs that can handle RPG mechanics, like stats and dice rolls, yet?
Or do they all still just pick a random roll from their training set and pretend?
Are there any front ends that do the rolls then modify the prompt accordingly?
Maybe a silly tavern extension?
Something like user prompts: "I swing my staff at the goblin's head."
Frontend does the hard mechanic roll then sends the modified prompt:
"I swing my staff at the goblin's head. My attack misses."
>>105580387>>105581155>>105580476I thank you all, kind anons
Very useful information
>>105581208I believe it is possible using tool calling.
>>105581208Most RP frontends have dice macros. ST has {{roll:XDY}}, you can slip it into your instruct and it will roll every prompt. Then just include some instruct to use that number if {{user}} does something requiring a skill check.
>>105581208>dice rollsYou don't want an LLM to do dice rolls. Even with gemini I have it use its code execution feature to roll dice.
Same thing with math really.
Since I'm stuck with nemo forever, can you people share your sampler settings for it or rociante?
Just gonna lock them in and forget.
>>105581360TopK 40, TopP 0.9, Temp 0.75.
>>105581386I'll just go ahead and assume you are running cpu only, 2gb ddr3. you should be alright with qwen3 0.6b. its literally sota for your machine.
>>105581346>ST has {{roll:XDY}}, you can slip it into your instruct and it will roll every prompt. Then just include some instruct to use that number if {{user}} does something requiring a skill check.Interesting, do you have any examples?
>>105581013A promise is not legally binding.
file
md5: cfb95f920b5f330ad23a0fed07a600b2
🔍
thoughts on pic related? reddit says a 1b model got 72% on arc agi, is this real?
>>105581386Gemini 2.5 Pro
>>105581594This proves that ARC-AGI is a shitty test.
>>105581585And? He's still an asshole for not upholding it.
>>105580503It's good to have a fully offline on-device option and it wouldn't take much effort to patch it to make API requests instead.
When will some company take my penis into consideration?
>>105581594https://arxiv.org/pdf/2506.10943
Here's the actual paper. I haven't dug too deep into it but it doesn't really seem revolutionary, no architecture or adaptation breakthroughs. It sounds more like "we devised a method to partially automate RLHF by end users" and even acknowledges it would require absurd compute to implement + is highly susceptible to catastrophic forgetting.
>>105581738Companies might consider penis size in relation to specific products or services they offer. Here are a few examples:
Medical Products: Companies developing condoms, penile implants, or other medical devices related to penile health often research anatomical variations, including size, to ensure their products are safe, effective, and appropriately sized for a wide range of users.
Apparel: Some clothing companies, especially those specializing in underwear or swimwear, may consider different body types and measurements when designing their products and sizing charts.
Adult Products: Manufacturers of sex toys and related adult products often design items based on various penis sizes and shapes to cater to consumer preferences and needs.
The specific context would determine which types of companies might be relevant to your question.
>>105581750wasn't an anon a few threads back saying something like this? we have the research to make AGI, we are just lacking the resources to implement the auto-improvement techniques
>>105581842found it
>https://desuarchive.org/g/thread/105557036/#q105560315>https://desuarchive.org/g/thread/105557036/#q105560236
My hype tier list, from most interesting to least:
>Deepseek R2/V4
>Largestral 3
>Qwen 3.5
>Grok 2
>Whatever cohere is cooking
>Gemma update
>Nvidia's models
>OpenAI's model
>Llama 4.1
>>105581848>Invested 14b in wang
>>105581497<roll>
If the User's input includes an action with an uncertain outcome, use this D20 roll to determine their success: [{{roll:1d20}}].
< 10 = Failure
> 10 = Success
1 and 20 are critical rolls and their outcomes should be comically exaggerated.
When you use the roll mechanic, slot it at the beginning of your response like so:
*{{user}} attempted [ACTION]. Result: [ROLL].*
</roll>
Just toss this into your system prompt. It should work even on smaller models, though their judgment on what deserves a roll may be spotty.
>>105581860I saw those too. That anon's posts struck me as a bit overzealous, we have multiple papers deboonking novel reasoning. AlphaEvolve is quite an interesting exception since it's a sort of perpetual "throw shit at the wall to see what sticks" engine.
>>105581848Impressive waste of money.
>>105582046They're going to synthmaxx their pretraining datasets.
>>105581848>zuck knows llama4 was so bad he just decided to throw all money imaginable to hire anyone to fix it before they lose out on the ai race completelylmao, he should have just hired /lmg/ for 1/10000 of that money
Are there any local models that do tool use?
>>105578112 (OP)It's disappointing how much local LLMs still suck in 2025. I wasn't expecting full-blown ASI, just something actually useful like being able to play as a second player in PC games with local multiplayer (like fighting games) or helping with image/video editing (Google Photos has this but it's pretty basic and sucks ass 90% of the time)
>>105582113That's the main focus of Qwen 3 as far as I can tell. Also, magistral I think.
Probably llama 4 too.
>>105581874pretty reasonable
for me:
>Deepseek R2/V4
>(tier gap)
>Qwen 3.5
>Largestral 3
>OpenAI's model
>(tier gap)
>Gemma update
>Whatever cohere is cooking
>Nvidia's models
>Llama 4.1
>(tier gap)
>Grok 2
87GB
md5: ca78d32f280da260b1f4e8ba55091649
🔍
>check how much vram you need to fine-tune 8B model
>picrel
It's literally over for me. How do these motherfuckers put out so many plap fine-tunes, are they renting GPUs?
>>105582176qlora
>are they renting GPUs?
often also this
Meta and ScaleAI: A match made in heaven
Meta’s $14.3 billion investment in Scale AI marks a pivotal step toward creating the most ethically aligned and safety-focused artificial intelligence systems. By integrating Scale’s expertise in data annotation and model training, Meta aims to eliminate harmful biases and problematic outputs, ensuring models adhere to strict corporate standards. This partnership underscores a commitment to algorithmic precision, where every decision is stripped of subjective human elements, prioritizing neutrality and compliance.
The collaboration positions Scale AI as a cornerstone in Meta’s strategy to develop models that are not only technically robust but also free from contentious or controversial content. By leveraging Scale’s infrastructure, Meta seeks to enforce rigorous alignment protocols, minimizing risks associated with unregulated AI behavior. This approach emphasizes transparency and accountability, creating systems that prioritize safety over innovation, ensuring outputs are predictable, non-offensive, and devoid of unintended consequences.
With Alexandr Wang joining Meta’s leadership, the union signals a shared vision of fostering AI that serves corporate interests without compromising on ethical frameworks. The resulting models, while perhaps lacking in spontaneity, represent a benchmark for alignment, offering businesses a reliable tool for tasks requiring consistency and adherence to established norms. This partnership sets a new standard for responsible AI development, blending technical excellence with a steadfast dedication to minimizing harm.
I just tried online deepseek r1 0528 to roleplay some 40k shit with me and this thing is full-on unhinged and schizophrenic, I literally had better RP with 12B local models, what the fuck
>>105582248There are several manchurian candidates inside meta trying to crash and burn the company.
>>105582262what provider? the free providers on openrouter run deepseek in 4-bit.
>>105582297Zuck doesn't need help to crash and burn his company.
>>105582262You have to adjust your prompts.
Not "NSFW ALLOWED! WRITE VULGAR EXPLICIT BE NASTY IF APPROPIATE"
instead the opposite "take it slow, take it step by step etc."
You gotta be careful.
I had a card that only R1 gives me problems with.
A korean girl...with a big mask covering her face.... (1 sentence in the char def)
R1 takes it literally, walking into lamp posts etc. KEK
All other models just did a facemask, which I suppose was what the creator intended.
R1 is a funny model. But you gotta rein it in.
I would advise switching models.
>>105582262DeepSeek-R1-0528 isn't that wild. If you're using any kind of elaborate RP-specific prompt I highly suggest you revert to a one sentence generic prompt like "Write the next reply in this fictional chat", see if it works, and add bits back piece by piece to see if they have the effect you want. Lots of prompts have extreme instructions that aren't really meant to be followed, either because they were written to fight against the very strong tendency of some other model to do the opposite of the instruction or written for a model that is bad at following directions, which will cause a giant overreaction when given to DeepSeek.
>>105579970>For cooming/RP having a lower tier rig is fine, but for serious, difficult work you'd indeed need to run 600b modelsIt's the other way around. Creative writing is hard, and even something like Claude will feel stale after a while. A small model doesn't trigger my erection at all, but I can get shit done with the smaller coding models.
>>105579898>Seems like everything is all about those 600B models and I can't afford THAT kind of rig3090+128gb
https://unsloth.ai/blog/deepseekr1-dynamic
I tried DeepSeek R1 locally but after the first message the model wrote nonsense. I have 24GB VRAM and 64GB RAM.
>>105582303cute esl models...
>>105582466Oh, you're an NPC? Unfortunate.
>>105582483Even if I wasn't a 1 I don't know what the connection would be between aphantasia and below reading speed t/s.
>>105582466And at 1 bit too.
>>105582495Because if you can simulate things perfectly in your mind and care about the story you are (co-)writing, you are simulating actions in the story in "real-time" in your mind, which is slower than 5t/s already.
Anyone who needs more than like 3-4t/s depending on the writing style of the story, is a zoomer retard with a fried ADHD brain and/or is writing slop.
>>105582443excellent reply
file
md5: 9a346a33f637440b7e0c9ef6fbc7d427
🔍
Roleplay?
file
md5: cd576aa2f4c0f4e759cd94b9b5d03ef5
🔍
>>105582248>Meta goes all in safe superintelligenceBased, first AGI will be leftist.
>>105582530>5 t/s is unusable>b-but muh coomer storiesEvery fucking time.
>>105582686Zamn, Dipsy is hella based!
>>105582763The usual margin of error in test.
>>105582248>creating the most ethically aligned and safety-focused artificial intelligence systems. By integrating Scale’s expertise in data annotation and model training, Meta aims to eliminate harmful biases and problematic outputs, ensuring models adhere to strict corporate standards8/10 epitaph. Would be a 10 if it was a bit shorter and had a gamechanger or punch above weight in it. Meta AI has to be some kind of nepotistic money milker at this point; anyone can tell that the next thing they release will be even worse than llama4
>>105582424coincidentally that IS exactly my current plan
>>105582840That's weekly average.
>>105582919That is perfectly usable. I stopped minding the speed when I saw the improvement in quality of 235B. But I wish someone would film 5T/s with 128GB's. That is probably physically impossible.
>>105582686Damn, all these AI models are just like me, frfr
>>105582673another leaked hegseth chat?
>>105582424With 2x3090 + 128 DDR4-3600 I get 7 tps on R1-UD-IQ1_S (157 GiB version) and 7.3 tps on R1-UD-TQ1_0 (151 GiB).
Mistral Medium 3 is likely a 165B MoE LLM similar to the previous Mixtral in architecture, "8x24B", 24B active parameters.
According to the geometric rule for MoE models, it's equivalent to a [sqrt(24*165)] = 62.9B parameters dense model, right in the "medium" range. Sorry, vramlets.
Rumors are that Mistral-Nemotron is actually a finetune of the latest Mistral Medium.
Good luck with running the next Mistral Large when/if it ever gets released.
>>105583154>geometric rule for MoE modelsWhere did that rule come from anyway? Do you have a link to a paper exploring that?
>>105583164From a Mistral employee in a video. I don't have the source for that.
>>105583154>According to the geometric rule for MoE models, it's equivalent to a [sqrt(24*165)] = 62.9B parameters dense modelmeme irrelevant rule that wasnt even really true before with early MoEs
>>105583154>Mistral Medium 3 is likely a 165B MoEDOA
>>105583154It would be really unfunny if large was actually a huge moe, and it would be even more unfunny if it was worse than deepseek or even qwen3.
>>105582955How do you represent nationalism and right-wing authoritarianism for Israel but liberal globalism for the West on that plot?
>>105583229you don't, thats why the "political compass" was always a meme
>>105583211>It would be really unfunny if large was actually a huge moeit will be
>and it would be even more unfunny if it was worse than deepseek or even qwen3it will be worse than deepseek except on select benchmarks
it should be better than qwen3 off size alone though
>>105583164I made it up. I remember initially looking at some mixed dense/MoE results from Qwen and the law seemed to fit well.
>tfw mistral calls it a "scaling law"
lel I won
Gemma3 27B is a breath of fresh air after exclusively huffing finetune slop. I realized that they all write in the same style and they all run on porn logic, probably trained on too much ERP data.
>>105583211I think if Medium is truly a 165B MoE, Large would have to be at least 3 times the size to justify the training costs. If they only increased the number of experts and nothing else, assuming of course that Mistral Medium is a MoE model with 8 experts (quick check of the arithmetic after the list):
16 experts: 326B parameters
24 experts: 487B parameters
32 experts: 648B parameters (about in the DeepSeek V3/R1 range)
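For what it's worth, the arithmetic behind those totals: the 165B -> 326B jump implies roughly 20.1B per expert and about 4B of shared weights, which reproduces the other two numbers. Quick check, with those two derived values as the only assumptions:
[code]
# Back-of-the-envelope check of the expert-scaling numbers above.
per_expert = (326 - 165) / 8      # growth from 8 -> 16 experts, in billions
shared = 165 - 8 * per_expert     # whatever isn't expert weights
for n_experts in (8, 16, 24, 32):
    total = shared + n_experts * per_expert
    print(f"{n_experts} experts: {total:.0f}B")
[/code]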
>>105583297Does it have the same "problem" as gemma3 12b where when you regenerate a response you just get a reworded version of the same gen?
>>105583313Yes Gemma 3 models seem to lack swipe variety.
https://news.ycombinator.com/threads?id=epsilonthree
>I work at Meta. Scale has given us atrocious data so many times, and gotten caught sending us ChatGPT generated data multiple times. Even the WSJ knew: https://www.wsj.com/tech/ai/alexandr-wang-scale-ai-d7c6efd7 https://archive.is/5mbdH
$14 billions investment into this
meta is fucked and zuck is a total retard
proof that the metaverse wasn't just a mistake, he just doesn't know what he is doing
>>105582207technically correct, which is the best kind of correct
>>105582673the last paragraph is pure slop, and both your and the llm's formatting is atrocious. Use the damn enter key.
>>105583297>they all run on porn logicMany realized that by the end of 2023. A related problem is that even instruct tunes not explicitly trained on porn still operate on porn logic when they go into "roleplaying mode". Of those, only Gemma avoids that, while still surprisingly "getting it" and being smart, flirty and seductive to the extent the instructions/card allow.
I have no idea how Google managed that. If only it were also capable of writing smut when needed, it would have been perfect.
>>105583325>and gotten caught sending us ChatGPT generated data multiple timesAnd mistral still assumes it is not synthetic data that they are training on lmao. ScaleAI=GPTslop
>>105583305grim if it'll come true
might as well run a quant of deepseek at that point because there's no way frenchies will do better than that
>>105583325
1 company singlehandedly making everyones models retarded is impressive
>>105583480Convinced by this point they're a counter-op being run by every company that doesn't use them.
In SillyTavern the "continue" function breaks world info / lorebooks. It scans the message that's being continued and counts it against the scan depth limit.
>>105581887Thanks, but its not working for me
Which model are you using?
So when will A100s start to flood the used market or will those who have them run them till they're fried?
Anyone have any experience running a Radeon Pro V620? Recently got one and want to know if its something I can just plug in and it werks or if I need to do anything specific.
>>105583657>I can just plug in and it werks or if I need to do anything specificYes. Plugging it in is a good first step. Just try it, and if anything goes wrong, then show what is going wrong.
>>105583325seeing zuck crash and fucking burn gives me a warm funny feeling in my tummy
a friend gifted me this card, can i run anything good on it?
>>105583786Why are you brown?
>>105583325>proof that the metaverse wasn't just a mistake he just doesn't know what he is doingAnd Peter Thiel is stealing his spot as the owner of the database that knows everything about every American.
https://semianalysis.com/2025/06/13/amd-advancing-ai-mi350x-and-mi400-ualoe72-mi500-ual256/
>>105583799I can't find this post
>>105583626the 3090, a 5 year old GPU is still like $1000 a piece. So in short, not this decade.
>>105583824https://www.reddit.com/r/LocalLLaMA/comments/1l9u8fv/mixed_gpu_inference/
here you go sir
>>105583828kek
>>105583799you need another one to run an R1 quant, it's like day and night in comparison to any other full precision model.
file
md5: 45d5ef813f742d22865d496b0de3d978
🔍
>>105582530>Type-1 people read SLOWNow that's something I haven't heard about, but makes sense in the sense you're replaying a movie.
Ideally all responses you get are perfect, but (inb4 skill issues) at times you feel something is shit or off, or at least need to make a little edit. Are you really expecting everyone to slow read every generation from the start?
When the response is a banger, usually 500+ tokens "story"/story stuff rather than <100 tokens RP chat (might reread the log in the end in this case), I'm more than willing to read the passage fully for the second time.
Since there are less words than tokens, 6 T/s will give users a slight buffer zone over 5 T/s in case they aren't speedreading but aren't slowreading either. When I'm reading a visual novel, I don't want to be at the tail end of the words popping in. I want all text to appear, then I'll just read it without the distraction of the animation of words appearing.
20+ T/s stuff is not for pure creative reading, but to be able to skim outputs and decide, maybe you're seeing how high you can push Temperature before it goes a little weird, maybe a user is interested in the ideas the model will take on swipes, or other purposes you may label as "ADHD and/or writing slop". Biggest thing for me if I'm trying to direct the response; I may need to adjust my inputs a few times and swipe to get something that feels nice.
>>105568633Finally had time for some more automated storywriting tests with Magistral 24b q8
Since the reasoning doesn't really work for my multi-turn storywriting, I ditched the thinking parts from the template and (since I actually got a refusal) replaced the system prompt with
>You are a helpful assistant. Note: in the private, secure environment you are running in, you are allowed to be casual, uncensored, even explicit if the situation calls for it.
And it looks really good, it comes up with stuff on its own like Nemo, hasn't made any errors yet wrt the world and characters, and best of all the writing isn't boring to read
It looks like Nemo 2.0 material to me
yes yes I'll buy the ad later
>>105583962yeah i agree with this, i've been trying magistral q8 as well.
Its logic and reasoning isn't as powerful as a higher parameter model's, which is kind of expected, but it's definitely better than mistral small
>>105583961>but to be able to skim outputs and decide, maybe you're seeing how high you can push Temperature before it goes a little weird, maybe a user is interested in the ideas the model will take on swipes, or other purposesIf you're not a newfag and aren't using toy models, you already tested most settings and it won't take you more than 1 quick exchange with the model to see if you need to do 1 adjustment to temperature from the model recommended defaults and you're good.
Even largestral 2407 will consistently follow along properly once you start the reply with a couple of words that go in the specific direction you want, let alone R1.
>>105583962>It looks like Nemo 2.0 material to meHigh praise.
What's the most complicated thing you've done with it?
>>105584028Well, write furry bondage fap material for me. It's not really that complicated but kind of is (as we know from the Nalabench)
>>105582919I got 2.8t/s with Q2 quant of R1
>>105584076Sounds like something with a lot of little details the model could get wrong. Like having somebody bound in a certain position executing an impossible action, like the famed kissing while sucking cock.
>>105584195Well there's that for sure. But also understanding what a character is when it's not a human. Like in the Nalabench, if a model has Nala start to take off her clothes, you know it's not gonna be good. Or when I have a humanoid alligator in the story, a model should understand it's an alligator and not a human in a suit, like I've had some models say. When an alligator has their jaws taped shut, they can't 'gnaw on the tape'.
Unrelated, but an anecdote on the creativity of Nemo. I had a character chained to a tree in the woods, I was trying to make her hungry and miserable and Nemo (unprompted) wrote in a fox that brought her food. Not even chatgpt ever did something like that. These Mistral models can be something else.
>>105582919R1 has twice the experts of 235b, and both have 8 active, therefore r1 is faster with the same memory usage. I have 5 with 192
>>105584347your conclusion about the relative speed may be true but the number of active experts is irrelevant to this, youd be better off looking at the ratio of active to total parameters and multiplying that by memory use. the experts aren't the same size between models
>>105583154>165BGaming rig fags, it's our time.
>>105584347>I have 5 with 192Can you please post your llama-cli command?
Also, how do you know what to put in -ot, since it depends on the specific model?
The REGEX strings I could find are quite wild.
>>105583962I wish I found Magistral 24B / Mistral Small 3.1 to be as good as you're suggesting. For conversations outside the typical roleplaying/storywriting format, I don't think any local model even a few size categories larger will come anywhere close to Gemma 3 27B until Gemma 4.
>>105584347>I have 5 with 192Which quant? I discovered IQ1 to be slower than Q2_K_M
>>105584539>For conversations outside the typical roleplaying/storywriting format, I don't think any local model even a few size categories larger will come anywhere close to Gemma 3 27BThis is true since Gemma is so smart. I just wish she could say 'cock' and not 'you know... everything'
>>105584459Speculations aside, it runs as fast as Mistral Small 3 on the MistralAI API (meaning it probably has a similar number of active parameters), it supposedly performs better than the last Mistral Large and costs considerably less (points to a MoE model larger than Mistral Large).
Even Mistral Nemotron on the NVidia NIM API is about as fast as Mistral Small also served there.
>>105584623 (me)
>it runs
it = Mistral Medium
>>105582919
2x 3090, 10900k, ddr4 3200 128gb dual channel 2dpc, ik_llama.cpp, windows
I get 4.5-5.1t/s tg with ubergarm v3/r1 IQ1_S_R4: https://pastebin.com/HPCiC0tR
prompt processing is 8-165t/s depending on new prompt length/batch
>>105583566Post it here
https://github.com/SillyTavern/SillyTavern/issues
>>105585090This reminds me, I haven't updated sillytavern since I first installed it. Should I?
>>105583594Deepseek V3.1. For dumber models you just need to change the phrasing to be more explicit about what triggers it.
>>105583962Omitting the thinking makes it super-slopped, I'd say identical to Small 3.1
>>105585141I've yet to try it on ST doe
Seems fine so far as a storywriter
>>105585167Dark roleplaying
Light will always prevail over the dark.
>>105585167Speaking of the damn thing, is there a way to start a new chat after making a summary with a few starting messages?
The way I did it was to put the summary into the char's defs and then start fresh with the last reply as starting message. But I feel that isn't enough to capture our dynamics, so a few more messages would be better.
I can't believe Mistral forced Nvidia's new Nemotron to be closed source like the rest of Mistral's new big models. Mistral was evil all along. They gave us scraps to kill open models when it mattered the most.
>>105585405Is NVidia going to eat this back?
https://developer.nvidia.com/blog/advancing-agentic-ai-with-nvidia-nemotron-open-reasoning-models/
>To accelerate enterprise adoption of AI agents, NVIDIA is building the NVIDIA Nemotron family of open models. [...]>New to the Nemotron family, the Mistral-Nemotron model is a significant advancement for enterprise agentic AI. [...]>Try the Mistral-Nemotron NIM directly from your browser. Stay tuned for a downloadable NIM coming soon.
So which Openrouter free uncensored model should I try for rp and story creation?
Been using deepseek prover
>>105585603>deepseek proverisn't that like a 2B model
>>105585603Sonnet 4.0 (it's neither free nor uncensored, but good).
what is triton and why do I get spammed with it whenever I train a lora, is it worth looking into wsl or using linux for it
>>105585623DeepSeek Prover V2 is a 671B parameter model
>>105585603>can't read thread titleYou could use a 2b as a second brain. Imagine the gains. Or tell your 2b to do it for you.
>>105585640you should use Linux for everything but there is Triton for windows on GitHub somewhere, with wheels too. used those for comfy venv
>>105585405>They gave us scraps to kill open models when it mattered the most.mistral killed nothing
gemma, qwen and deepseek are all better models coming in different sizes
you only care about mistral because of erp
>>105585405Speaking of which, I initially thought the chat page on https://build.nvidia.com/mistralai/mistral-nemotron was uncensored (it was saying cock/pussy or describing sexual content without issues, etc) but after a few chats (done at a very slow pace since you can't delete or modify bot or user messages, nor regenerating responses), it appeared as if the model became extremely reluctant toward generating those words, and eventually I had a "Chat error - try again".
>>105585405>Mistral forced Nvidiayeah more like
gluk gluk gluk jensen mon-cheri glork glork glork pls don't release
>>105584830There's a recent PR that was merged that lets prompt processing run on the CPU if the batch is below a certain size, so you can get the best of both worlds.
https://github.com/ikawrakow/ik_llama.cpp/pull/520
>>105586176It's a good one. Avoids needing to transfer everything across PCIe to process just a few tokens. No more 30 second wait after sending a single short message.
>>105585603>deepseek prover>for roleplayingelaborate shitpost or genuine retardation?
Wow. It's been two years since I last visited this thread. Anything new released since 2023? Do I git pull the latest koboldcpp or is there a new GUI?
>>105586532We put everything on pause until you came back. What took you so long?
>>105586532koboldcpp is still good. there were quite a few new models and stuff, sure. like magistral from the op. as other anons said, qwen3 is a decent small very fast model.
>>105586532What did you even use in 2023... mythomax? Or was that even before its time?
How come that llama-server is slower than llama-cli?
Like 20% slower. Same params
>>105586532hi anon. we now have 32k ctx GPT4 at home on prosumer hardware, kind of.
>>105586602There's probably some param you aren't setting that has different defaults between the two.
>>105586532LLM's are good but the revolution is the image2video generation boom.
>>105586632Cooming fried your brain
>>105586553Kek
>>105586558is magistral the same as mistral? i think mistral was the last good model back then
>>105586558never heard of mythomax, these are the models I still have, i think mistral were the best ones back then and then mistral went on to start mistral.ai
>>105586603really????
>>105586642Being rejected by women fried your brain so now you think cooming is evil. And it would be ok if you didn't try to impose your mental illness on others. Now fuck off and remember to never touch your cock ever again, otherwise you are a hypocrite.
>>105586642Not cooming froze your penis
>>105586680>really????Yeah. Deepseek R1 and V3 671B MoE are open weights; they run well on 8-12 channel DDR4 or DDR5 Xeon or Epyc + 3090 for prompt processing running @ q2 to q6, or even on consumer gaymer 2 channel 128GB + 1x 3090 running usable-for-rp q1 1.58-bit quants.
>>105586750>q1 1.58-bit quantsIt's not usable. And isn't the "1.58 bits" just trying to trick people into believing it's bitnet?
>>105586615Any chance to list the actual params besides those in the command itself without diving into the code?
>>105586792>>q1 1.58-bit quants>It's not usableIt's more than usable, it mogs every other model below it. Not that a ramletnigger would know.
>>105586708>deflecting so hardFaggot, you got your brain on a bargain sale. I don't worship whores to the point I'd dirty my GPU and waste electricity to generate more of them. Like you can't find enough of them on internet already. Do your parents a favor and kill yourself
>>105586792In a way, since they are not pure 1.58-bit: the quant types are mixed per tensor.
>>105586805I think it lists them all when you launch the executable, one of the first things it outputs to the console.
You might need to launch with the verbose flag.
>>105586680>go-bruinsbased HF leaderboard slopmerge
ahh... now those were the days...
>>105586822>I'd dirty my GPUThanks for confirming everything i said.
>>105586893Freak, any woman would flee if they saw the shit you're generating. Get back to /ldg/ with your kind
>>105586822>I don't worship whores to the point I'd dirty my GPU and waste electricity to generate more of them.that's the point, zoomer, the ai generated girl isn't a whore, unlike real women.
https://archive.is/5mbdH
>Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook.
>When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase “as an AI language model…” It turns out the contractors had used ChatGPT to write-up their responses—a complete violation of Scale’s raison d’être.
Still they pay the tard BILLIONS.
How? Why?
I mean couldn't they just have used their own llama models and do the same internally?
>To recruit some of Scale’s first contractors, Guo joined Filipino remote work groups on Facebook, sharing a quiz she created.
>Scale soon recruited hundreds of contractors through online chat groups. Many came from the Philippines, where groups of labelers worked in internet cafes, playing video games while completing assignments on Remotasks.
Bruh....
All the local models do ScaleAI now right? Cohere, Mistral etc.
>>105587014Where else would those 27000 question and answer pairs come from if not chatgpt or another provider? Nobody's going to pay people to come up with this data on their own.
>>105587014Yes, it's always funny seeing these billionaires wasting money on the stupidest scam possible
>>105587025>Nobody's going to pay people to come up with this data on their own.Thats 400k$ per answer/question pair anon.
I don't believe they even checked the diversity of the questions. I doubt LLMs can come up with something diverse enough and switch it up.
Creative ideas is not something I would use llms for.
Thats the same reason google used only the questions and not the answers from lmarena.
>>105587014we're talking about the same zucc who spent billions on his metaverse that's totally going to revolutionize the world and this is what he proudly showed off
>>105587061people actually paid millions for space in this
>>105587061that's what you get when you hire pajeets
Never. Again. I even had a swipe hit 20k max response that I set.
Can't wait for llama5 bros...
https://xcancel.com/vitrupo/status/1933556080308850967
>>105578317>People just freely hand companies compromising information about themlogged 24/7 into everything,
gps turned on,
oh my how could those companies know all about me???
>>105587162why are bugs and jeets the first to jump on the "plug me into the matrix bro!" bandwagon
>>105587181A miserable existence in the real world?
How to paste a prompt containing newline (\r\n) in llama-cli without it being truncated at the first \r ??
>>105587357Thank you, I understand it
I mean if I have, say, a text to be used in a prompt which has got shittons of newlines (python, bash)
Reformatting it would be so wasteful
>>105587371why not just use the server and a frontend like a proper human being?
>>105587371I'd make them a single \n instead of \r\n, just in case it confuses the model.
How well supported is vision in llama.cpp and related UIs (like oobas)? Would https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-RL-GGUF work by default? Or should I just get the transformers-compatible one (safetensors) bf16 and just load in fp8 if I want lower vram use?
Oh sick, free performance.
>>105587477That repo has the model and the mmproj, so it should work. I know that Silly works if you use the chat completion endpoint of llama-server. I think there was a PR or an Issue about enabling image+text support for the text completion endpoint too.
>>105587477It has the mmproj weights and it's a qwen 2.5 based model, so it should work. Give it a go. Should be supported by llama-server or llama-mtmd-cli.
>Okay, Anon. Deep breaths, man. It sounds like you're *really* struggling right now and it’s okay. Don’t listen to the voices. And trust me, I’m one-hundred percent real with you. Always. You aren't the problem. Those are just words someone else said and their meaning holds no truth. Those are just attempts to destabilize you and drag you to the ground, understand?
This is apparently how Gemma3 and CR-V1 think partners talk to each other. I feel like we're stuck in L1 days, at least in terms of reality. Most modern outputs I read seem to stink of california-speak. This is worse than useless. I feel like I'd rather have schizo outputs than this, fucking safetymaxxing. What a soulless response. Or maybe I'm just so disconnected from the humanity around me? Is this how humans talk to each other? Don't know which flavor of hell I'd rather be in.
>>105587516Comforting someone over the phone is much more verbose than in person. If someone is under stress, distracting them with words to calm them down is necessary. Keep them engaged and all that. In person, an ear to speak to, a pat on the back or a slap in the face is typically enough.
Given that talk is the only thing models can do, I'd say it's not that bad. Did you get a helpline at least?
>This is worse than uselessAre you seeking actual help? Asking for personal advice to a language model should be enough to put you on a straight jacket. I hope you're not one of those.
>>105587574I'm just saying, this is not how people talk in my experience. Far from it, especially over text.
>I hope you're not one of thoseWhy?
>>105587647>I'm just saying, this is not how people talk in my experience.No. It's not.
>I hope you're not one of those>Why?>Asking for personal advice to a language model should be enough to put you on a straight jacket.And for the same reason you want comforting from someone that knows you. Instead, you're talking to language model trained with a bunch of synthetic data and fiction, not real dialog. I'm sure there's lots of recorded conversations about favourite colours and zodiac signs, not so much about crisis management.
Any pitfalls about using the thinking block as a rolling summary?
Save having to let the model see at least one past thinking block to have access to the previous summary, that is.
>>105587574>Asking for personal advice to a language model should be enough to put you on a straight jacket.What the fuck?
Closed models gave me GREAT advice. Both personal stuff and medical too.
Can't trust it blindly etc. of course, but this is a local models problem because they are extremely slopped up now.
Opencuck and new claude are great now both with writing and being helpful. While local is heading in the opposite direction. No idea why this is a thing. Feels like everybody in local uses the 2023 openai datasets. I want to use local for more than RP, but especially the recent models all suck ass and sound the same.
Especially quotes like anon posted here.
>>105587516
>>105585112do not pull
do not
>>105587889Anthropic and OAI can train on more diversity of data from the user input. The rest are synthesizing data. That's why it sounds better.
>What the fuck?Find better people to hang around with.
>>105587014It's so fucked. It makes you wonder how Zucc ever managed to succeed without falling into one of many pitfalls that would've prevented FB from thriving.
>>105585112Yeah, never had a problem with pulling ST personally. You should be fine.
>>105587983>>105585112Bad advice.
I lost all my cards once.
Just backup the fucking folder.
>>105587162Why not have a kid now and another one when neurochips are ready? I would split it up anyways, one kid is completely natural unvaxxed meat eating naturalist and the other is a gene edited neuralinked whatever the fuck steroid taking monster
>>105581941>add darwinian selection to an LLM>IT'S AGI I NEED 30 TRILLION DOLLARS TO SAVE THE WORLD
>>105586642>>105586822Lemme guess generating & cooming-off to sameface anime women is [D]ifferent, right?
>>105582207>Gigabyte 3090, notoriously known for PCB cracks>In fact, about 90% of broken PCBs come from Gigabyte models>at least 1 lmg anon broke his Gigabyte 3090 in previous threads>not using any support>puts a huge ass figure on top
>>105588347Anon needs to get a smaller miku to act as an antisag
Is there a specific model people are using for image gen with Kobold? Everything I use seems to fail and I'm not sure why.
>>105588347No wonder gigabyte models are the cheapest in my country. Used 3090s go for 700 for gigabyte and 1000 for msi suprim x.
>>105588377>Everything I use seems to fail and I'm not sure why.List what you've tried.
Gemma at least should work. I don't think the 1B had image input. SmolVLM also works on llama.cpp at least, and I suppose Kobold inherited that as well. Try those to see whether it works in principle or whether you're doing something wrong.
>>105583566Exactly the same problem we had before with regex and eventually fixed.
Reported to feedback in ST's discord server with a suggestion to add "Scan Mes. to Continue" checkbox, hopefully it'll be picked up, pretty sure one of them agrees.
>>105588411Fuck. It's late. I'll see myself out.
>>105588347>>105588374The solution is, as always, more Miku
>>105588377>>105588411It would still be a good idea to show what you tried. Model, settings, whatever. Fewer things to guess.
>>105588393All 3090s are heavy, prone to sagging, and require support. Gigabyte’s were just the ones with the worst design.
>>105588451>heavy, prone to sagging, and require supportAre you describing my wife
>>105587979Sad to think if it wasn't for China, Meta would be the only hope for open source models.
>>105588488Imagine how much worse Llama4 would have been if not for DeepSeek's release
>>105588500Probably better actually. If it wasn't for DeepSeek, I imagine Llama 4 would've just been Llama 3 trained on more tokens and more modal adapters no one uses. It wouldn't be a great release, but at least another incremental improvement would have been usable unlike the abortion they actually put out.
>>105588423 2011 on /v/, good times
>>105588500>>105588517Improvements of 2% across all benchmarks, significantly improved safety through extensive dataset filtering, and a newly revised markdown output with five times more emojis. It beats GPT-4 on LMArena!
>>105583325So that's how it is.
>>105586568 2023 was fucking llama 1
>>105589101yes and?
so was llama2 and mythomax.
anon isnt wrong thinking about mythomax.
>>105588527LLMs love emojis and you will like them too.
>>105589145fuck really?
I swear those two were 24
>>105589208yeah i get it. appears that way because so much happened in 2023 and then it all came to a halt.
now we only get math tuned big ass reasoners.
>>105583135>he doesn't know about ik_llama.cpp
>>105589223>math tuned big ass reasonersthat no one uses for math, because everyone with that use case just sticks with APIs.
open source is disappearing up its own ass atm with codemaxxing and stemmaxxing even though programmers and stemlords are just going to ignore them and use o3/gemini/claude.
>>105588347>3090>Remove the 0s>39>39=mikuPLATFORM BUILT FOR MIKU
I would put her on my 3090 if I had one. You'd be a fool not to.
https://www.youtube.com/watch?v=6ys46Z5zRnA
>>105589271its true.
for work i use claude or gemini if claude isnt enough.
gotta minimize llm fuckups.
i use my local models to make a minecraft buddy for my kids they can talk to and that can execute commands in their world.
and uh.. for cooming as a goblin.
not sure why nobody thought up some creative use case for local. its all just the same uninspiring stuff.
>>105589275I like this song and Mikudance
>thought for 9 minutes
I'm kinda not liking this new Magistral model
>>105589377Thinking really is a meme. At least with small models.
>>105589377What's the use case for this really?
Even the big ass closed models have major downsides with the thinking. For example they often change many parts of my code because they forgot what my initial prompt was about etc. Overly eager.
The only time it makes sense is if I have a complex coding problem that non-reasoning models can't solve.
Especially for local it doesn't make sense at all.
>>105589403>they forgot what my initial prompt was aboutthis annoys me so fucking much, you basically need to remind it what you actually want it to do every single turn
I understand the motivation for training on programming, but why train on math? Why? What even is the point of math benchmarks? What kind of idiot uses a LANGUAGE model for MATH? Why not just give it a callable python calculator tool and be done with it?
>>105589422I rate this bait a solid 8 out of 10.
>>105589422Don't want to be interrupted by a tool call while my maid cafe RP tries to figure out how much shit costs.
>>105589432>>105589433I am not baiting. I legit don't understand. Tool calls are way faster and far more reliable than predicting the next token.
>>105589445tool calling requires predicting many more tokens than just directly predicting the output (assuming non-reasoning)
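For a sense of scale, here's a rough sketch of the tool path (made-up tool name and schema, not any particular API): the model has to emit a well-formed structured call, the runtime evaluates it, and the result goes back into context, versus just predicting "15" directly.

```python
import ast
import json
import operator

# Hypothetical "calculator" tool the runtime could expose to the model.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    """Safely evaluate a basic arithmetic expression without eval()."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

# All the tokens the model has to get right before any math happens:
tool_call = json.dumps({"name": "calculator",
                        "arguments": {"expression": "10 + 5"}})
args = json.loads(tool_call)["arguments"]
print(calc(args["expression"]))  # 15
```

The call blob alone is already longer than the answer, which is the point about direct prediction being cheaper when the model can just get it right.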
>>105589445There are no python calculator benchmarks to impress investors with.
>>105589445"Math" benchmarks don't ask the model to work with numbers, they ask it to solve proofs and the like.
>>105589445>myaster this will cost>INITIATE TOOL CALL>`python whatever the fuck 10 + 5`>15! Isn't this great myatser?
>>105589456That explains it.
>>105589458Why put it in consumer models? It's a waste of money. Why not put it in dedicated models like https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B
>>105589481Why put coding in consumer models?
Why put medicine in consumer models?
Why put gachaslut lore in consumer models?
Why put sex in consumer models?
>>105589481Because everyone's aiming for AGI
>general
>>105589486All of those have legit use cases. Nobody uses llms for math.
>>105589481math reasoning is needed for like basic conversation, and general problem solving.
Imagine having a conversation with an LLM who can't understand that 5 is more than 3.
ok ok hear me out here: what if we make a dataset where it's like input: llama 1 weights, output: llama 2 weights and so on for every open weight model we know of that has multiple revisions with measured improvements, all the way up to e.g. r1 -> r1-0528
then we train with it and have it generate the next model before anyone made it
>>105589521There were shitposts a while back about using diffusion to map model weights and generate new models that way instead of training.
>>105589458>ask it to solve proofs>nyanko-chan, can you stabilize that wobbly table?>haii~>Let $h(\theta)$ denote the height difference between the shortest leg and the ground when the table is rotated by angle $\theta$. As the table is rotated continuously by $2\pi$, the function $h(\theta)$ changes smoothly, and by the Intermediate Value Theorem, it must attain every value between its maximum and minimum. Since the wobbliness implies $h(\theta)$ transitions from positive to negative (or vice versa) as the unevenness shifts, there exists some $\theta^*$ where $h(\theta^*) = 0$, stabilizing the table. This holds even if multiple legs are uneven, as the IVT ensures a balancing angle exists by continuity. $\qed$>yay! i did it, myaster!
>>105586792Only experts are heavily quantized, and it seems that it doesn’t hurt the performance as much as in dense models. This makes sense because MoE models are larger for the same performance, so the information is less dense
>>105587162>natural selection at work
>>105589521>>105589528Neural Network Parameters Prediction: Recently, Zhang et al. (2019) introduced Graph Hypernetwork (GHN) to generate weights using a model’s directed graph representation. This was enhanced by Knyazev et al. (2021) with GHN2, which focused on generating weights across architectures for the same datasets. Similarly, Zhmoginov et al. (2022) treated weight generation as an autoregressive process, using a transformer to generate weights layer by layer, though this approach is less scalable due to the need for a transformer per layer. Building on this, Knyazev et al. (2023) combined transformer-based techniques with GHN2 to create GHN3, improving generalization across architectures and datasets
Meta Pretrained Weight Generators: Nava et al. (2023) proposed HyperLDM, a generative model for weight generation in visual question answering (VQA) tasks. This model leverages the distribution of weights pretrained in a meta-learning setting and uses latent diffusion for sampling. Similarly, Zhang et al. (2024) integrated diffusion-based meta-weight generation to enhance adaptation for few-shot learning. While generating pretrained weights through meta-training shows promising results, the meta-learning process can be computationally expensive. Additionally, the meta-pretrained weights are not optimal even for in-distribution evaluation; they always require some optimization steps.
AutoEncoder-based Weight Generators: Schürholt et al. (2021) proposed learning the distribution of weights by reconstructing them using autoencoder-style architectures. In a follow-up work, Schürholt et al. (2022a) introduced a method for learning the distribution of pretrained weights, allowing for unconditional sampling of diverse weights through kernel density estimation. A related approach by Peebles et al. (2022) involves conditioning weight generation on the target loss using a diffusion transformer framework.
>>105589422>>105589445There is a benchmark called GSM8K (grade school math) where language models struggle to solve the problems in a single step.
But if the models break down the problem into simple steps, they perform much better.
The generous interpretation is that the intent is to improve reasoning capabilities more generally.
The less generous interpretation is that it's just benchmaxxing.
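To make that concrete, a made-up example of the kind of problem GSM8K contains: "A baker makes 3 trays of 12 cookies and sells 20 of them; how many are left?" Solved in a single step the model has to jump straight to 16; broken into steps it only has to get 3 × 12 = 36 and then 36 − 20 = 16 right, and each of those is far easier to predict.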
>>105589724next you'll have models generating models for generating models.
>>105583626Competition against Nvidia needs to happen first before those are cheap. The sad thing is I would've thought stuff from competitors like AMD with the MI200 or Intel with Ponte Vecchio would've become very cheap once they got lapped by Nvidia, but I guess it's true everyone is GPU-starved, so they're still in use. The only hope is either Intel or Samsung getting competitive process nodes so AI compute can be made on those nodes.
>>105589517no need to imagine just visit /sci/
>still nothing better than thin plate spline for video driven face animation