/lmg/ - Local Models General - /g/ (#105578112) [Archived: 1007 hours ago]

Anonymous
6/13/2025, 3:42:03 AM No.105578112
68747470733a2f2f6d6574612e616c6963646e2e636f6d2f646174612f6d6e6e2f6176617461722f6176617461725f64656d6f2e6769666_thumb.jpg
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105564850 & >>105557036

►News
>(06/11) MNN TaoAvatar Android - Local 3D Avatar Intelligence: https://github.com/alibaba/MNN/blob/master/apps/Android/Mnn3dAvatar/README.md
>(06/11) V-JEPA 2 world model released: https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks
>(06/10) Magistral-Small-2506 released, Mistral Small 3.1 (2503) with reasoning: https://mistral.ai/news/magistral
>(06/09) Motif 2.6B trained from scratch on AMD MI250 GPUs: https://hf.co/Motif-Technologies/Motif-2.6B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>105578368 >>105578891 >>105582124
Anonymous
6/13/2025, 3:42:26 AM No.105578118
ComfyUI_00938_
ComfyUI_00938_
md5: ab1a18e5ff75cd258afd01362e86288b🔍
►Recent Highlights from the Previous Thread: >>105564850

--Paper: FlashDMoE: Fast Distributed MoE in a Single Kernel:
>105565866 >105565875
--Paper: CUDA-LLM: LLMs Can Write Efficient CUDA Kernels:
>105567041 >105567054 >105568828
--Papers:
>105566965 >105575562
--Developing a local maid harem simulator with integrated LLM, vector DB, and planned media generation tools:
>105574905 >105575056 >105575080 >105575115 >105575137 >105575094 >105575224 >105575257 >105575765 >105575798 >105576005 >105576028 >105575287 >105575814 >105575266 >105575431 >105575472 >105575487 >105575200 >105575281
--Magistral Small struggles with multiturn conversations and instruction fidelity:
>105565054 >105565170 >105565268 >105565296 >105565330 >105565416 >105565464 >105565387 >105567984 >105568121 >105568769 >105574018
--Tokenizer swapping and adaptation in pretrained models with partial retraining:
>105571032 >105571203 >105571231 >105571252 >105572166
--Practical limits of high-RAM consumer setups for large language model inference:
>105566516 >105566594 >105566668
--Discussion on QAT models, including Gemma 3 and llama.cpp integration:
>105570421 >105570475 >105571116
--Mistral-Nemotron model exhibits mathmaxxed behavior and flirty traits with mixed benchmark performance:
>105567047 >105568827 >105568982 >105569003 >105571029
--Exploring V-JEPA 2-AC for robotic planning and potential tuning challenges:
>105565291 >105565384 >105565916 >105568851
--Magistral's inconsistent reasoning and output structure:
>105568633 >105568664 >105568864 >105572076
--Configuring Ollama for proper context length to enable tool calling in agent mode:
>105566851 >105569160 >105572329
--Misc:
>105569851 >105565868 >105575802
--Miku and Rin (free space):
>105567898 >105569875 >105569890 >105570213 >105570421 >105570526 >105571654 >105572375 >105573114 >105573400 >105573608

►Recent Highlight Posts from the Previous Thread: >>105564855

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
6/13/2025, 3:48:31 AM No.105578164
screencapture-meta-ai-2025-06-13-08_34_13
screencapture-meta-ai-2025-06-13-08_34_13
md5: 295d6e7c2ddcdbdf0a614c77cfb4dcdc🔍
Might as well post the Meta screenshot again.
Who thought this was a good idea?
So thats what zucc "founder mode" looks like. kek
Replies: >>105578175 >>105578184 >>105578469 >>105578891 >>105579248 >>105580472
Anonymous
6/13/2025, 3:49:33 AM No.105578175
Screenshot_20250613_085425
Screenshot_20250613_085425
md5: 0b08ba5bf66e2a7cc8e771fdc7dd8655🔍
>>105578164
Replies: >>105578288 >>105580472
Anonymous
6/13/2025, 3:50:14 AM No.105578184
>>105578164
>what would happen if I applied deep heat directly to my penis?
Anonymous
6/13/2025, 4:05:57 AM No.105578288
1733508881197950
1733508881197950
md5: 74e31605cb0abaee3f6c4e6c990d1ab9🔍
>>105578175
The White man's burden (colonizing sideways pussy)
Replies: >>105578300 >>105578317 >>105578328 >>105580518 >>105580754
Anonymous
6/13/2025, 4:06:24 AM No.105578293
Base Image
Base Image
md5: 40014017c1f8c5a85dfd69db449b5593🔍
NoLoCo: No-all-reduce Low Communication Training Method for Large Models
https://arxiv.org/abs/2506.10911
>Training large language models is generally done via optimization methods on clusters containing tens of thousands of accelerators, communicating over a high-bandwidth interconnect. Scaling up these clusters is expensive and can become impractical, imposing limits on the size of models that can be trained. Several recent studies have proposed training methods that are less communication intensive, avoiding the need for a highly connected compute cluster. These state-of-the-art low communication training methods still employ a synchronization step for model parameters, which, when performed over all model replicas, can become costly on a low-bandwidth network. In this work, we propose a novel optimization method, NoLoCo, that does not explicitly synchronize all model parameters during training and, as a result, does not require any collective communication. NoLoCo implicitly synchronizes model weights via a novel variant of the Nesterov momentum optimizer by partially averaging model weights with a randomly selected other one. We provide both a theoretical convergence analysis for our proposed optimizer as well as empirical results from language model training. We benchmark NoLoCo on a wide range of accelerator counts and model sizes, between 125M to 6.8B parameters. Our method requires significantly less communication overhead than fully sharded data parallel training or even widely used low communication training method, DiLoCo. The synchronization step itself is estimated to be one magnitude faster than the all-reduce used in DiLoCo for few hundred accelerators training over the internet. We also do not have any global blocking communication that reduces accelerator idling time. Compared to DiLoCo, we also observe up to 4% faster convergence rate with wide range of model sizes and accelerator counts.
https://github.com/gensyn-ai/noloco
neat
Replies: >>105578365
Anonymous
6/13/2025, 4:08:04 AM No.105578300
1724440518762486
1724440518762486
md5: c65980283dc0b9bb062def6b11f94f2e🔍
>>105578288
Replies: >>105578317 >>105578327 >>105578328
Anonymous
6/13/2025, 4:10:28 AM No.105578317
>>105578288
>>105578300
Fuck I'm happy to not be this retarded, Jesus
People just freely hand companies compromising information about them, kek.
Replies: >>105578328 >>105587170
Anonymous
6/13/2025, 4:11:58 AM No.105578327
1740024338297510
1740024338297510
md5: c43abe1d203c1172886e58f739ff01d6🔍
>>105578300
imagine all the types of conversations, questions and smut people have been sending to ai online, people will give them every detail about their lives instantly, all forwarded to the government to create a mental model of your brain, lmao
Replies: >>105578343 >>105578349
Anonymous
6/13/2025, 4:12:05 AM No.105578328
kek
kek
md5: f4d3dda6a54d3538379f9f77efbe910e🔍
>>105578288
>>105578300
>>105578317
he smoothed it all over, nothing to see here anons.
Replies: >>105578342
Anonymous
6/13/2025, 4:13:33 AM No.105578342
>>105578328
He just did an "in minecraft" when it was already too late.
Anonymous
6/13/2025, 4:13:42 AM No.105578343
>>105578327
Fuck I hope all the bizarre porn I generated with gemini gets sent to someone's table.
Poor person.
Anonymous
6/13/2025, 4:14:39 AM No.105578349
1718616379475597
1718616379475597
md5: ab56f7b2827250c90c4d7847bad29d05🔍
>>105578327
Surely this is a parody acc someone made after seeing its all gonna be public, right?
Anonymous
6/13/2025, 4:16:25 AM No.105578361
Base Image
Base Image
md5: fd1f98b2c622c62d027be8f81e0e0015🔍
Farseer: A Refined Scaling Law in Large Language Models
https://arxiv.org/abs/2506.10972
>Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface L(N,D), Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, improving upon Chinchilla's law by reducing extrapolation error by 433\%. This allows for the reliable evaluation of competing training strategies across all (N,D) settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours.
https://github.com/Farseer-Scaling-Law/Farseer
interesting
Anonymous
6/13/2025, 4:17:34 AM No.105578365
>>105578293
Where can I see more illustrations of dynamic PP routing with DP?
Replies: >>105578823
Anonymous
6/13/2025, 4:19:36 AM No.105578368
>>105578112 (OP)
I'm tired of 3DPD, I won't be simulating them
Replies: >>105578450
Anonymous
6/13/2025, 4:35:17 AM No.105578450
>>105578368
Would be neat if skeletal animation were one of the modalities, hooking it to MMD or Koikatsu would be neat. With a real-time diffusion filter that smooths 3D into actual anime
Anonymous
6/13/2025, 4:38:40 AM No.105578469
>>105578164
That can't be real, a bunch of those seem a bit too on the nose
Or 195chevyhot really needs to find a local model to ask his questions
Replies: >>105578536
Anonymous
6/13/2025, 4:50:18 AM No.105578536
>>105578469
>People are seemingly accidentally publishing their AI chat histories with Meta’s new AI app
>Information about medical conditions, intimate confessions, sensitive requests and horny image generation requests are all visible on Meta’s new Discover feed.
Is there a bug in the backend, or is it just bad UX?
Replies: >>105578656 >>105578900
Anonymous
6/13/2025, 4:54:05 AM No.105578563
https://www.meta.ai/@195chevyhot
Replies: >>105580193
Anonymous
6/13/2025, 5:10:41 AM No.105578656
>>105578536
It's not a bug, it's a feature
Anonymous
6/13/2025, 5:46:24 AM No.105578823
>>105578365
http[colon][slash[slash]]www[dot]xvideos com
>catpcha: KRAHM
Anonymous
6/13/2025, 5:46:40 AM No.105578825
file
file
md5: 81ed372864ecf3ed0640e989f5e69577🔍
bros...
Replies: >>105579806
Anonymous
6/13/2025, 5:55:07 AM No.105578871
736342
736342
md5: b63080920b036644a155d83cd3887c3f🔍
what am I paying for Sam
Replies: >>105578882 >>105578889
Anonymous
6/13/2025, 5:57:37 AM No.105578882
>>105578871
You are funding research to achieve AGI by 2030
Anonymous
6/13/2025, 5:58:36 AM No.105578889
>>105578871
uhh where's deekseek
i thought they had killed all western models
Anonymous
6/13/2025, 5:59:05 AM No.105578891
>>105578112 (OP)
>>105578164
in case anyone cares: you can report this shit as a "technical issue" (settings button -> report a technical issue, let them know personal private conversation are being leaked)
Replies: >>105578900
Anonymous
6/13/2025, 6:02:13 AM No.105578900
>>105578536
>>105578891
seems to be a known issue: https://www.neowin.net/news/heres-how-meta-ai-leaks-your-private-chats-thanks-in-part-to-its-terrible-ux/
Replies: >>105579037 >>105579056
Anonymous
6/13/2025, 6:15:45 AM No.105578966
I'm starting to unironically believe that we have meta employees in this thread
Replies: >>105578979 >>105579051 >>105579066
Anonymous
6/13/2025, 6:17:33 AM No.105578979
>>105578966
what makes you think that?
Anonymous
6/13/2025, 6:27:01 AM No.105579037
Screenshot_20250613_132628
Screenshot_20250613_132628
md5: e5f2dc06baa42f22191b1f04a809b092🔍
>>105578900
Thats so bad, damn. How is that not all over the pajeet hype space.
Replies: >>105579056 >>105580193
Anonymous
6/13/2025, 6:29:24 AM No.105579051
>>105578966
I'd be surprised if there weren't employees of all the big companies here at least occasionally

Maybe not Anthropic since they're very haughty and aloof
Anonymous
6/13/2025, 6:30:13 AM No.105579056
Screenshot_20250613_132838
Screenshot_20250613_132838
md5: 31e25748d6eb8e454587e485d27b4688🔍
>>105578900
>>105579037
Boomers are cooked with a hidden setting like this
The most sad part is those aren't coding prompts.
I bet a 20-30b model or even nemo could have answered most of those npc questions. Sad.
Replies: >>105579208 >>105579596 >>105580193
Anonymous
6/13/2025, 6:33:03 AM No.105579066
>>105578966
What? I posted the screenshots because nothing else is going on loally, new mistral is already forgotten and this stuff shouldn't be tolerated.
"All PR is good PR" is a bullshit lie.
I didn't see any other place talking about this. I hope they don't get away with it.
Anonymous
6/13/2025, 7:05:05 AM No.105579208
>>105579056
greasy that its opt out rather than in
Anonymous
6/13/2025, 7:16:06 AM No.105579248
>>105578164
Saved for the next time someone asks "use case for local models?"
Anonymous
6/13/2025, 8:39:29 AM No.105579596
>>105579056
What the fuck is a "public prompt"?
What website is this?
Anonymous
6/13/2025, 9:31:47 AM No.105579806
>>105578825
>Le Chat is currently the most downloaded iOS app in France. However, the app isn’t really taking off in other European markets. It is currently ranked #66 Germany, and it’s not even listed in the top 100 apps in Spain, Italy, and the U.K.
I bet the numbers are probably the same for their API use, no one in their right mind would pay for their API unless they're french because the french would be willing to eat literal shit if it's shit that comes from another frenchie
bet 90% of mistral use comes from 4chan and reddit coomers/freetards running their local models
Anonymous
6/13/2025, 9:39:34 AM No.105579854
GTC is over and no Largestral was released. Why did you have to lie to me?
Replies: >>105579946
Anonymous
6/13/2025, 9:43:34 AM No.105579882
May 7, 2025
>One more thing…
>With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :)
It's been 5 weeks, Arthur!
Replies: >>105579893
Anonymous
6/13/2025, 9:45:50 AM No.105579893
>>105579882
>‘open’
>Updated model coming soon!
Anonymous
6/13/2025, 9:46:10 AM No.105579898
I'm still on the fence about buying a mid-range local LLM rig with a 3090. I sure enjoy proompting but will we continue getting better small models in the future? Seems like everything is all about those 600B models and I can't afford THAT kind of rig.

I think its obvious corpos want to make this shit portable too, currently the hardware requirements make this technology very unwieldy for anything but cloud chatbots?
Replies: >>105579970 >>105582424
Anonymous
6/13/2025, 9:56:29 AM No.105579946
>>105579854
Waiting for NVidia to release it. Meanwhile test it here: https://build.nvidia.com/mistralai/mistral-nemotron
It didn't seem to be filtered when I checked it out.
Anonymous
6/13/2025, 9:59:04 AM No.105579970
>>105579898
Depends on what you expect from it. For cooming/RP having a lower tier rig is fine, but for serious, difficult work you'd indeed need to run 600b models.
Replies: >>105582415
Anonymous
6/13/2025, 10:08:25 AM No.105580031
>Grok 2 was released on August 14, 2024
That's 10 months ago. Did Elon forget about his 6 months promise?
Replies: >>105580042
Anonymous
6/13/2025, 10:10:11 AM No.105580042
>>105580031
Grok 3 still isn't stable, pls understand.
Replies: >>105580062 >>105580963
Anonymous
6/13/2025, 10:12:33 AM No.105580062
>>105580042
@grok is this true?
Replies: >>105580123 >>105580131
Anonymous
6/13/2025, 10:19:29 AM No.105580123
>>105580062
The claim of ‘white genocide’ in South Africa is highly debated.
Anonymous
6/13/2025, 10:20:39 AM No.105580129
file
file
md5: ec1e23c7c51932a43627ceeb5f137a47🔍
Why does llama.cpp produce a different result when you regenerate the answer even with greedy decoding? The first answer is always different from a regenerated answer and all regenerated answers are the same. So the pattern is A B B B B...
The screenshot shows the entire conversation, there is nothing in front, the first answer is completely schizo. qwen 235
Replies: >>105580150 >>105580196
Anonymous
6/13/2025, 10:20:46 AM No.105580131
>>>105580062
>It's possible Musk and xAI are delaying due to strategic reasons—maybe they’re prioritizing development of newer models like Grok 3, or they’re wary of competitive risks after open-sourcing Grok 1.
Musk's history shows he sometimes overpromises on timelines, like with Tesla's autonomous driving or X's algorithm updates.
Anonymous
6/13/2025, 10:25:47 AM No.105580150
>>105580129
What's your frontend?
Replies: >>105580157
Anonymous
6/13/2025, 10:26:57 AM No.105580157
>>105580150
llama.cpp's server ui
This is a known thing, even the nala paste mentions is.
Replies: >>105580191
Anonymous
6/13/2025, 10:34:39 AM No.105580191
>>105580157
Must be niggerganovs code. Have you tried testing it with simple API requests?
Anonymous
6/13/2025, 10:34:40 AM No.105580193
chevy
chevy
md5: 94655ff44d690b6d07fa10bebad69b2d🔍
>>105578563
>>105579037
>>105579056
based chevy
llama.cpp CUDA dev !!yhbFjk57TDr
6/13/2025, 10:35:17 AM No.105580196
>>105580129
Most likely prompt caching.
From the documentation https://github.com/ggml-org/llama.cpp/tree/master/tools/server :

>cache_prompt: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are not guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: true
Replies: >>105580488
Anonymous
6/13/2025, 10:36:17 AM No.105580204
LLAMA.CPP

Is it true that --overrride-tensors delivers better result as far as tp speeds are concerned with higher quants like Q8?
Replies: >>105580580
Anonymous
6/13/2025, 11:08:26 AM No.105580361
What would you personally consider as a natural reading speed in t/s?
Replies: >>105580387 >>105580476 >>105581155
Anonymous
6/13/2025, 11:12:42 AM No.105580387
>>105580361
7tps is slightly above mine
Replies: >>105581314
sage
6/13/2025, 11:27:55 AM No.105580472
>>105578164
>>105578175
Holy fucking kek what is this
Anonymous
6/13/2025, 11:28:16 AM No.105580476
>>105580361
~10-15t/s for speed reading through stuff, like a news article/magazine and finding the relevant parts
~5-7t/s for something I'm actually engrossed in, like a book
Replies: >>105581314
Anonymous
6/13/2025, 11:30:08 AM No.105580488
>>105580196
Does this mean that merely changing the batch size changes the output?
Could the difference in regenerated outputs be solved by reprocessing the whole batch size aligned chunk containing the suffix instead of only the suffix?
Replies: >>105580580
Anonymous
6/13/2025, 11:32:08 AM No.105580503
>MNN TaoAvatar Android - Local 3D Avatar Intelligence: https://github.com/alibaba/MNN/blob/master/apps/Android/Mnn3dAvatar/README.md
This needs to be able to connect to a desktop or else it's five years away from being usable and fifteen years away from being good.
Replies: >>105581691
Anonymous
6/13/2025, 11:34:29 AM No.105580518
>>105578288
That's racist. Asian vaginas are oriented normally, it's white women whose vaginas are turned the wrong way.
Replies: >>105580754
llama.cpp CUDA dev !!yhbFjk57TDr
6/13/2025, 11:44:52 AM No.105580580
>>105580204
I think -ot only provides a benefit for MoE where the dense weights are used more frequently than the MoE weights (so there is more benefit from putting them into faster memory) and if the implementation for an op in one of the backends is bad.

>>105580488
Yes, changing the physical batch size will change the outputs.

No, changing the index for caching would not be enough to guarantee deterministic results for all cases (though it would work for the specific case of repeatedly submitting the same prompt).
The logits produced by the model to predict a token are not saved.
The logits for the first generated token come from a model evaluation with a batch size > 1.
If prompt caching is used, the model is being evaluated with a batch size of 1 to regenerate the logits.
For a general solution you would either need to start storing logits from previous model evaluations or track the batch sizes that were used to generate a model.
Quite honestly, I think that if someone wants bit-for-bit identical outputs they should just turn prompt caching off.
Anonymous
6/13/2025, 11:53:02 AM No.105580639
GtNJEW2bMAEsZ40
GtNJEW2bMAEsZ40
md5: bc7b20f6504e4a30f51df9d6f8ac4f0f🔍
Replies: >>105580722
Anonymous
6/13/2025, 11:54:03 AM No.105580643
GtSRVUXaoAABsip
GtSRVUXaoAABsip
md5: bf1d9e29854833da78cd4dc4059628fe🔍
Replies: >>105580722
Anonymous
6/13/2025, 12:03:44 PM No.105580722
>>105580639
>>105580643
>cargo pants
Based.
Anonymous
6/13/2025, 12:08:16 PM No.105580754
>>105578288
>>105580518
>sideways pussy
I tried googling it and I still don't understand what the joke is supposed to be.
Replies: >>105580767 >>105580812
Anonymous
6/13/2025, 12:11:11 PM No.105580767
>>105580754
I know what you mean. I've heard the joke from time to time, too, but it's like there's active censorship preventing any source or origin for it to appear online.
Replies: >>105580812
Anonymous
6/13/2025, 12:21:19 PM No.105580812
>>105580754
>>105580767
>>105503039
>there used to be a myth that Asian women had sideways vaginas. It's the kind of thing that you could say and most people wouldn't have an opportunity to find out, and a good fraction of those who did (or pretended they did) would lie for lulz. I suspect what mostly ended this was huge numbers of GIs fucking Asian prostitutes after WW2 and during the Vietnam War.
We should bring it back.
Anonymous
6/13/2025, 12:23:15 PM No.105580825
added more autism and made it even slower
https://github.com/flamingrickpat/private-machine/blob/main/pm_lida.py
Replies: >>105580941 >>105581109
Anonymous
6/13/2025, 12:38:37 PM No.105580907
Any progress in small models? What's the best model under 3B?
Replies: >>105580920 >>105580924
Anonymous
6/13/2025, 12:41:11 PM No.105580920
>>105580907
qwen3 0.6B
Replies: >>105580944
Anonymous
6/13/2025, 12:41:58 PM No.105580924
qwen
qwen
md5: 9c41025d699c4dca3007545f248d603f🔍
>>105580907
Tiny Qwen3 models are king. a 500mb, 600mil model making websites.
Replies: >>105580944
Anonymous
6/13/2025, 12:46:42 PM No.105580941
>>105580825
>8.5k loc in python
jesus
Replies: >>105581189
Anonymous
6/13/2025, 12:47:25 PM No.105580944
>>105580920
>>105580924
Any good smol VLM?
Anonymous
6/13/2025, 12:53:45 PM No.105580963
>>105580042
They've already announced Grok 3.5 last month, already in beta for their paypigs I think
Replies: >>105580973
Anonymous
6/13/2025, 12:55:40 PM No.105580973
>>105580963
>beta
So not stable.
Replies: >>105580985 >>105581013
Anonymous
6/13/2025, 12:58:14 PM No.105580985
>>105580973
They released Grok 1 after they announced Grok 1.5
Anonymous
6/13/2025, 1:01:13 PM No.105581000
Will there be deepseek 3.5 like there was deepseek 2.5?
Replies: >>105581004
Anonymous
6/13/2025, 1:02:59 PM No.105581004
>>105581000
Isn't that basically what we got with the updates?
Who really knows. Many rumors, all off.
Replies: >>105581024
Anonymous
6/13/2025, 1:04:53 PM No.105581013
>>105580973
Why do they need grok 3 to be "stable" to release grok 2? Grok 2 is now hugely outdated, it was comparable to llama 3 405b at release. I think they just forgot they made a promise.
Replies: >>105581585
Anonymous
6/13/2025, 1:07:34 PM No.105581024
>>105581004
Not quite. 2.5 had an update called 2.5-1210 and V2 had an update 0628.
Anonymous
6/13/2025, 1:15:01 PM No.105581062
whats better qwen3 235b q2/q3 or a 70b tune at the moment ?

I can prob fit the 70b only on gpu qwen I have to offload

I dont mind the speeds I just want the smartest rp I can get locally
Anonymous
6/13/2025, 1:23:28 PM No.105581109
>>105580825
What the fuck
Replies: >>105581189
Anonymous
6/13/2025, 1:32:56 PM No.105581155
>>105580361
personally i ruminate on each token for about 2 seconds, to really take in the intricacies of intentionality the model is displaying
Replies: >>105581314
Anonymous
6/13/2025, 1:40:10 PM No.105581189
>>105580941
i hate it when projects have a million files also makes it easier to dump it into gemini.

>>105581109
yeah its pretty cursed lol
Anonymous
6/13/2025, 1:43:36 PM No.105581208
Are there any local reasoning LLMs that can handle RPG mechanics, like stats and dice rolls, yet?
Or do they all still just pick a random roll from their training set and pretend?

Are there any front ends that do the rolls then modify the prompt accordingly?
Maybe a silly tavern extension?
Something like user prompts: "I swing my staff at the goblin's head."
Frontend does the hard mechanic roll then sends the modified prompt:
"I swing my staff at the goblin's head. My attack misses."
Replies: >>105581326 >>105581346 >>105581351
Anonymous
6/13/2025, 2:02:42 PM No.105581314
>>105580387
>>105581155
>>105580476


I thank you all, kind anons

Very useful information
Anonymous
6/13/2025, 2:04:38 PM No.105581326
>>105581208
I believe it is possible using tool calling.
Anonymous
6/13/2025, 2:09:03 PM No.105581346
>>105581208
Most RP frontends have dice macros. ST has {{roll:XDY}}, you can slip it into your instruct and it will roll every prompt. Then just include some instruct to use that number if {{user}} does something requiring a skill check.
Replies: >>105581497
Anonymous
6/13/2025, 2:10:23 PM No.105581351
>>105581208
>dice rolls
You don't want an LLM to do dice rolls. Even with gemini I have it use it's code execution feature to roll dice.
Same thing with math really.
Anonymous
6/13/2025, 2:12:11 PM No.105581360
Since I'm stuck with nemo forever, can you people share your sampler settings for it or rociante?
Just gonna lock them in and forget.
Replies: >>105581422
Anonymous
6/13/2025, 2:17:02 PM No.105581386
Best coding model?
Replies: >>105581476 >>105581600
Anonymous
6/13/2025, 2:23:04 PM No.105581422
>>105581360
TopK 40, TopP 0.9, Temp 0.75.
Anonymous
6/13/2025, 2:31:37 PM No.105581476
>>105581386
I'll just go ahead and assume you are running cpu only, 2gb ddr3. you should be alright with qwen3 0.6b. its literally sota for your machine.
Anonymous
6/13/2025, 2:33:40 PM No.105581497
>>105581346
>ST has {{roll:XDY}}, you can slip it into your instruct and it will roll every prompt. Then just include some instruct to use that number if {{user}} does something requiring a skill check.
Interesting, do you have any examples?
Replies: >>105581887
Anonymous
6/13/2025, 2:48:18 PM No.105581585
>>105581013
A promise is not legally binding.
Replies: >>105581673
Anonymous
6/13/2025, 2:49:30 PM No.105581594
file
file
md5: cfb95f920b5f330ad23a0fed07a600b2🔍
thought on pic related? reddit says a 1b model got 72% on arc agi, is this real?
Replies: >>105581643 >>105581750
Anonymous
6/13/2025, 2:50:14 PM No.105581600
>>105581386
Gemini 2.5 Pro
Anonymous
6/13/2025, 2:56:58 PM No.105581643
>>105581594
This proves that ARC-AGI is a shitty test.
Anonymous
6/13/2025, 2:59:59 PM No.105581673
>>105581585
And? He's still an asshole for not upholding it.
Anonymous
6/13/2025, 3:02:53 PM No.105581691
>>105580503
It's good to have a fully offline on-device option and it wouldn't take much effort to patch it to make API requests instead.
Anonymous
6/13/2025, 3:09:25 PM No.105581738
When will some company take my penis into consideration?
Replies: >>105581769
Anonymous
6/13/2025, 3:12:01 PM No.105581750
>>105581594
https://arxiv.org/pdf/2506.10943
Here's the actual paper. I haven't dug too deep into it but it doesn't really seem revolutionary, no architecture or adaptation breakthroughs. It sounds more like "we devised a method to partially automate RLHF by end users" and even acknowledges it would require absurd compute to implement + is highly susceptible to catastrophic forgetting.
Replies: >>105581842
Anonymous
6/13/2025, 3:15:47 PM No.105581769
>>105581738
Companies might consider penis size in relation to specific products or services they offer. Here are a few examples:

Medical Products: Companies developing condoms, penile implants, or other medical devices related to penile health often research anatomical variations, including size, to ensure their products are safe, effective, and appropriately sized for a wide range of users.
Apparel: Some clothing companies, especially those specializing in underwear or swimwear, may consider different body types and measurements when designing their products and sizing charts.
Adult Products: Manufacturers of sex toys and related adult products often design items based on various penis sizes and shapes to cater to consumer preferences and needs.
The specific context would determine which types of companies might be relevant to your question.
Anonymous
6/13/2025, 3:27:51 PM No.105581842
>>105581750
wasn't an anon in a few threads back saying something like this? we have the reasearch to make AGI, we are just lacking the resources implement the Auto Improvent Techiques
Replies: >>105581860
Anonymous
6/13/2025, 3:28:56 PM No.105581848
Screenshot_20250613_142601_Firefox
Screenshot_20250613_142601_Firefox
md5: a43c1e7eee6650c863eff1cb5ca1cb9b🔍
Replies: >>105581855 >>105581875 >>105582046 >>105582111
Anonymous
6/13/2025, 3:30:09 PM No.105581855
>>105581848
yay
Anonymous
6/13/2025, 3:31:05 PM No.105581860
>>105581842
found it
>https://desuarchive.org/g/thread/105557036/#q105560315
>https://desuarchive.org/g/thread/105557036/#q105560236
Replies: >>105581941
Anonymous
6/13/2025, 3:32:59 PM No.105581874
My hype tier list, from most interesting to least:
>Deepseek R2/V4
>Largestral 3
>Qwen 3.5
>Grok 2
>Whatever cohere is cooking
>Gemma update
>Nvidia's models
>OpenAI's model
>Llama 4.1
Replies: >>105582157
Anonymous
6/13/2025, 3:33:20 PM No.105581875
>>105581848
>Invested 14b in wang
Anonymous
6/13/2025, 3:35:59 PM No.105581887
>>105581497
<roll>
If the User's input includes an action with an uncertain outcome, use this D20 roll to determine their success: [{{roll:1d20}}].
< 10 = Failure
> 10 = Success
1 and 20 are critical rolls and their outcomes should be comically exaggerated.
When you use the roll mechanic, slot it at the beginning of your response like so:
*{{user}} attempted [ACTION]. Result: [ROLL].*
</roll>

Just toss this into your system prompt. It should work even on smaller models, though their judgment on what deserves a roll may be spotty.
Replies: >>105583594
Anonymous
6/13/2025, 3:46:23 PM No.105581941
>>105581860
I saw those too. That anon's posts struck me as a bit overzealous, we have multiple papers deboonking novel reasoning. AlphaEvolve is quite an interesting exception since it's a sort of perpetual "throw shit at the wall to see what sticks" engine.
Replies: >>105588189
Anonymous
6/13/2025, 4:01:32 PM No.105582046
>>105581848
Impressive waste of money.
Replies: >>105582108
Anonymous
6/13/2025, 4:10:55 PM No.105582108
>>105582046
They're going to synthmaxx their pretraining datasets.
Replies: >>105582303
Anonymous
6/13/2025, 4:11:14 PM No.105582111
>>105581848
>zuck knows llama4 was so bad he just decided to throw all money imaginable to hire anyone to fix it before they lose out on the ai race completely
lmao, he should have just hired /lmg/ for 1/10000 of that money
Anonymous
6/13/2025, 4:11:34 PM No.105582113
Are there any local models that do tool use?
Replies: >>105582136
Anonymous
6/13/2025, 4:13:11 PM No.105582124
>>105578112 (OP)
Its disappointing how much local LLMs still suck in 2025. i wasn't expecting full-blown ASI, just something actually useful like being able to play as a second player in PC games with local multiplayer (like fighting games) or helping with image/video editing (Google Photos has this but it's pretty basic and sucks ass 90% of the time)
Anonymous
6/13/2025, 4:14:32 PM No.105582136
>>105582113
That's the main focus of Qwen 3 as far as I can tell. Also, magistral I think.
Probably llama 4 too.
Anonymous
6/13/2025, 4:17:27 PM No.105582157
>>105581874
pretty reasonable
for me:
>Deepseek R2/V4
>(tier gap)
>Qwen 3.5
>Largestral 3
>OpenAI's model
>(tier gap)
>Gemma update
>Whatever cohere is cooking
>Nvidia's models
>Llama 4.1
>(tier gap)
>Grok 2
Anonymous
6/13/2025, 4:20:04 PM No.105582176
87GB
87GB
md5: ca78d32f280da260b1f4e8ba55091649🔍
>check how much vram you need to fine-tune 8B model
>picrel
It's literally over for me. How these motherfuckers put so many plap fine-tunes out there, are they renting GPUs?
Replies: >>105582210 >>105582217
Anonymous
6/13/2025, 4:25:16 PM No.105582207
20250613_011231
20250613_011231
md5: 2f7db9e17265ebb16d10e9b6a00641ca🔍
migu on the 3090
Replies: >>105582258 >>105583408 >>105588347
Anonymous
6/13/2025, 4:25:39 PM No.105582210
>>105582176
They do.
Anonymous
6/13/2025, 4:26:33 PM No.105582217
>>105582176
qlora
>are they renting GPUs?
often also this
Anonymous
6/13/2025, 4:31:11 PM No.105582248
Meta and ScaleAI: A match made in heaven

Meta’s $14.3 billion investment in Scale AI marks a pivotal step toward creating the most ethically aligned and safety-focused artificial intelligence systems. By integrating Scale’s expertise in data annotation and model training, Meta aims to eliminate harmful biases and problematic outputs, ensuring models adhere to strict corporate standards. This partnership underscores a commitment to algorithmic precision, where every decision is stripped of subjective human elements, prioritizing neutrality and compliance.

The collaboration positions Scale AI as a cornerstone in Meta’s strategy to develop models that are not only technically robust but also free from contentious or controversial content. By leveraging Scale’s infrastructure, Meta seeks to enforce rigorous alignment protocols, minimizing risks associated with unregulated AI behavior. This approach emphasizes transparency and accountability, creating systems that prioritize safety over innovation, ensuring outputs are predictable, non-offensive, and devoid of unintended consequences.

With Alexandr Wang joining Meta’s leadership, the union signals a shared vision of fostering AI that serves corporate interests without compromising on ethical frameworks. The resulting models, while perhaps lacking in spontaneity, represent a benchmark for alignment, offering businesses a reliable tool for tasks requiring consistency and adherence to established norms. This partnership sets a new standard for responsible AI development, blending technical excellence with a steadfast dedication to minimizing harm.
Replies: >>105582297 >>105582686 >>105582858
Anonymous
6/13/2025, 4:32:24 PM No.105582258
>>105582207
noice
Anonymous
6/13/2025, 4:32:44 PM No.105582262
I just tried online deepseek r1 0528 to roleplay some 40k shit in me and this thing is full on unhinged and schizophrenic, I literally had better RP with 12B local models what the fuck
Replies: >>105582311 >>105582343 >>105582347
Anonymous
6/13/2025, 4:37:20 PM No.105582297
>>105582248
There are several manchurian candidates inside meta trying to crash and burn the company.
Replies: >>105582341
Anonymous
6/13/2025, 4:38:21 PM No.105582303
>>105582108
pinoymaxx*
Replies: >>105582453
sage
6/13/2025, 4:39:56 PM No.105582311
>>105582262
what provider? the free providers on openrouter run deepseek in 4-bit.
Anonymous
6/13/2025, 4:42:53 PM No.105582341
>>105582297
Zuck doesn't need help to crash and burn his company.
Anonymous
6/13/2025, 4:43:25 PM No.105582343
>>105582262
You have to adjust your prompts.
Not "NSFW ALLOWED! WRITE VULGAR EXPLICIT BE NASTY IF APPROPIATE"
instead the opposite "take it slow, take it step by step etc."

You gotta be careful.
I had a card that only R1 gives me problems with.
A korean girl...with a big mask covering her face.... (1 sentence in the char def)
R1 takes it literally, walking into lamp posts etc. KEK
All other models just did a facemask. which i suppose was what the creator intended.
R1 is a funny model. But you gotta reign it in.
I would advice switching models.
Anonymous
6/13/2025, 4:44:01 PM No.105582347
>>105582262
DeepSeek-R1-0528 isn't that wild. If you're using any kind of elaborate RP-specific prompt I highly suggest you revert to a one sentence generic prompt like "Write the next reply in this fictional chat", see if it works, and add bits back piece by piece to see if they have the effect you want. Lots of prompts have extreme instructions that aren't really meant to be followed, either because they were written to fight against the very strong tendency of some other model to do the opposite of the instruction or written for a model that is bad at following directions, which will cause a giant overreaction when given to DeepSeek.
Anonymous
6/13/2025, 4:53:16 PM No.105582415
>>105579970
>For cooming/RP having a lower tier rig is fine, but for serious, difficult work you'd indeed need to run 600b models
It's the other way around. Creative writing is hard, and even something like Claude will feel stale after a while. A small model doesn't trigger my erection at all, but I can get shit done with the smaller coding models.
Anonymous
6/13/2025, 4:54:30 PM No.105582424
>>105579898
>Seems like everything is all about those 600B models and I can't afford THAT kind of rig
3090+128gb
https://unsloth.ai/blog/deepseekr1-dynamic
Replies: >>105582466 >>105582870 >>105583135
Anonymous
6/13/2025, 4:55:20 PM No.105582433
I try deepseek R1 local but after the first message the model wrote nonsense, i have 24 vram and 64 ram.
Replies: >>105582443
Anonymous
6/13/2025, 4:56:18 PM No.105582443
>>105582433
good post
Replies: >>105582537
Anonymous
6/13/2025, 4:57:33 PM No.105582453
>>105582303
cute esl models...
Anonymous
6/13/2025, 4:59:11 PM No.105582466
>>105582424
>5 t/s
Replies: >>105582483 >>105582516 >>105582919
Anonymous
6/13/2025, 5:01:17 PM No.105582483
1727078215243520
1727078215243520
md5: eb44ead3ff511f1bae265cede8faabd1🔍
>>105582466
Oh, you're an NPC? Unfortunate.
Replies: >>105582495
Anonymous
6/13/2025, 5:03:36 PM No.105582495
>>105582483
Even if I wasn't a 1 I don't know what the connection would be between aphantasia and below reading speed t/s.
Replies: >>105582530
Anonymous
6/13/2025, 5:06:00 PM No.105582516
>>105582466
And at 1 bit too.
Anonymous
6/13/2025, 5:07:53 PM No.105582530
>>105582495
Because if you can simulate things perfectly in your mind and care about the story you are (co-)writing, you are simulating actions in the story in "real-time" in your mind, which is slower than 5t/s already.

Anyone who needs more than like 3-4t/s depending on the writing style of the story, is a zoomer retard with a fried ADHD brain and/or is writing slop.
Replies: >>105582704 >>105583961
Anonymous
6/13/2025, 5:08:46 PM No.105582537
>>105582443
excellent reply
Anonymous
6/13/2025, 5:26:21 PM No.105582673
file
file
md5: 9a346a33f637440b7e0c9ef6fbc7d427🔍
Roleplay?
Replies: >>105583076 >>105583420
Anonymous
6/13/2025, 5:28:43 PM No.105582686
file
file
md5: cd576aa2f4c0f4e759cd94b9b5d03ef5🔍
>>105582248
>Meta goes all in safe superintelligence
Based, first AGI will be leftist.
Replies: >>105582763 >>105582955
Anonymous
6/13/2025, 5:31:30 PM No.105582704
>>105582530
>5 t/s is unusable
>b-but muh coomer stories
Every fucking time.
Anonymous
6/13/2025, 5:39:18 PM No.105582763
>>105582686
Zamn, Dipsy is hella based!
Replies: >>105582840
Anonymous
6/13/2025, 5:49:33 PM No.105582840
>>105582763
The usual margin of error in test.
Replies: >>105582880
Anonymous
6/13/2025, 5:51:49 PM No.105582858
>>105582248
>creating the most ethically aligned and safety-focused artificial intelligence systems. By integrating Scale’s expertise in data annotation and model training, Meta aims to eliminate harmful biases and problematic outputs, ensuring models adhere to strict corporate standards

8/10 epitath. Would be a 10 if it was a bit shorter and had a gamechanger or punch above weight in it. Meta AI has to be some kind of nepotistic money milker at this point anyone can tell that next thing they release will be even worse than llama4
Anonymous
6/13/2025, 5:52:56 PM No.105582870
>>105582424
coincidentally that IS exactly my current plan
Anonymous
6/13/2025, 5:54:08 PM No.105582880
>>105582840
That's weekly average.
Anonymous
6/13/2025, 5:59:25 PM No.105582919
>>105582466
That is perfectly usable. I stopped minding the speed when i saw the improvement in quality if 235B. But i wish someone would film 5T/s with 128GB's. That is probably physically impossible.
Replies: >>105584165 >>105584347 >>105584830
Anonymous
6/13/2025, 6:03:35 PM No.105582955
1740797453045524
1740797453045524
md5: 2279a68ad4e86e7b26b48a449ba94650🔍
>>105582686
Damn, all these AI models are just like me, frfr
Replies: >>105583229
Anonymous
6/13/2025, 6:18:48 PM No.105583076
>>105582673
another leaked hegseth chat?
Anonymous
6/13/2025, 6:26:24 PM No.105583135
>>105582424
With 2x3090 + 128 DDR4-3600 I get 7 tps on R1-UD-IQ1_S (157 GiB version) and 7.3 tps on R1-UD-TQ1_0 (151 GiB).
Replies: >>105589237
Anonymous
6/13/2025, 6:28:20 PM No.105583154
Mistral Medium 3 is likely a 165B MoE LLM similar to the previous Mixtral in architecture, "8x24B", 24B active parameters.
According to the geometric rule for MoE models, it's equivalent to a [sqrt(24*165)] = 62.9B parameters dense model, right in the "medium" range. Sorry, vramlets.
Rumors are that Mistral-Nemotron is actually a finetune of the latest Mistral Medium.
Good luck with running the next Mistral Large when/if it ever gets released.
Replies: >>105583164 >>105583208 >>105583208 >>105583211 >>105584459
Anonymous
6/13/2025, 6:29:51 PM No.105583164
>>105583154
>geometric rule for MoE models
Where did that rule com from anyway? Do you have a link to a paper exploring that?
Replies: >>105583176 >>105583257
Anonymous
6/13/2025, 6:31:27 PM No.105583176
>>105583164
From a Mistral employee in a video. I don't have the source for that.
Anonymous
6/13/2025, 6:36:29 PM No.105583208
>>105583154
>According to the geometric rule for MoE models, it's equivalent to a [sqrt(24*165)] = 62.9B parameters dense model
meme irrelevant rule that wasnt even really true before with early MoEs
>>105583154
>Mistral Medium 3 is likely a 165B MoE
DOA
Anonymous
6/13/2025, 6:37:07 PM No.105583211
>>105583154
It would be really unfunny if large was actually a huge moe, and it would be even more unfunny if it was worse than deepseek or even qwen3.
Replies: >>105583255 >>105583305
Anonymous
6/13/2025, 6:39:36 PM No.105583229
>>105582955
How do you represent nationalism and right-wing authoritarianism for Israel but liberal globalism for the West on that plot?
Replies: >>105583247
Anonymous
6/13/2025, 6:42:16 PM No.105583247
>>105583229
you don't, thats why the "political compass" was always a meme
Anonymous
6/13/2025, 6:42:51 PM No.105583255
>>105583211
>It would be really unfunny if large was actually a huge moe
it will be
>and it would be even more unfunny if it was worse than deepseek or even qwen3
it will be worse than deepseek except on select benchmarks
it should be better than qwen3 off size alone though
Anonymous
6/13/2025, 6:42:57 PM No.105583257
1583263884197
1583263884197
md5: 23cb17b85e5107ff6d4df9380be84871🔍
>>105583164
I made it up. I remember initially looking at some mixed dense/MoE results from Qwen and the law seemed to fit well.

>tfw mistral calls it a "scaling law"
lel I won
Anonymous
6/13/2025, 6:48:55 PM No.105583297
Gemma3 27B is a breath of fresh air after exclusively huffing finetune slop. I realized that they all write in the same style and they all run on porn logic, probably trained on too much ERP data.
Replies: >>105583313 >>105583428
Anonymous
6/13/2025, 6:50:58 PM No.105583305
>>105583211
I think if Medium is truly a 165B MoE, Large would have to be at least 3 times the size to justify the training costs. If they only increased the number of experts and nothing else, assuming of course that Mistral Medium is a MoE model with 8 experts:

16 experts: 326B parameters
24 experts: 487B parameters
32 experts: 648B parameters (about in the DeepSeek V3/R1 range)
Replies: >>105583467
Anonymous
6/13/2025, 6:51:58 PM No.105583313
>>105583297
Does it have the same "problem" as gemma3 12b where when you regenerate a response you just get a reworded version of the same gen?
Replies: >>105583323
Anonymous
6/13/2025, 6:53:12 PM No.105583323
>>105583313
Yes Gemma 3 models seem to lack swipe variety.
Anonymous
6/13/2025, 6:53:13 PM No.105583325
https://news.ycombinator.com/threads?id=epsilonthree
>I work at Meta. Scale has given us atrocious data so many times, and gotten caught sending us ChatGPT generated data multiple times. Even the WSJ knew: https://www.wsj.com/tech/ai/alexandr-wang-scale-ai-d7c6efd7 https://archive.is/5mbdH
$14 billions investment into this
meta is fucked and zuck is a total retard
proof that the metaverse wasn't just a mistake he just doesn't know what he is doing
Replies: >>105583438 >>105583480 >>105583760 >>105583795 >>105588950
Anonymous
6/13/2025, 7:04:09 PM No.105583408
>>105582207
technically correct, which is the best kind of correct
Anonymous
6/13/2025, 7:05:08 PM No.105583420
>>105582673
the last paragraph is pure slop, and both your and the llm's formatting is atrocious. Use the damn enter key.
Anonymous
6/13/2025, 7:06:01 PM No.105583428
>>105583297
>they all run on porn logic
Many realized that by the end of 2023. A related problem is that even instruct tunes not explicitly trained on porn still operate on porn logic when they go into "roleplaying mode". Of those, only Gemma avoids that, while still surprisingly "getting it" while being smart, flirty and seductive to the extent of what the instructions/card allow.

I have no idea of how Google managed that. If it only was capable of also writing smut when needed, it would have been perfect.
Anonymous
6/13/2025, 7:07:04 PM No.105583438
>>105583325
>and gotten caught sending us ChatGPT generated data multiple times
And mistral still assumes it is not synthetic data that they are training on lmao. ScaleAI=GPTslop
Anonymous
6/13/2025, 7:09:45 PM No.105583467
>>105583305
grim if it'll come true
might as well run a quant of deepseek at that point because there's no way frenchies will do better than that
Anonymous
6/13/2025, 7:11:51 PM No.105583480
>>105583325
1 company singlehandedly making everyones models retarded is impressive
Replies: >>105583542
Anonymous
6/13/2025, 7:19:34 PM No.105583542
>>105583480
Convinced by this point they're a counter-op being run by every company that doesn't use them.
Anonymous
6/13/2025, 7:22:36 PM No.105583566
In SillyTavern the "continue" function breaks world info / lorebooks. It scans the message that's being continued and counts it against the scan depth limit.
Replies: >>105585090 >>105588405
Anonymous
6/13/2025, 7:25:42 PM No.105583594
>>105581887
Thanks, but its not working for me
Which model are you using?
Replies: >>105585116
Anonymous
6/13/2025, 7:31:13 PM No.105583626
So when will A100s start to flood the used market or will those who have them run them till they're fried?
Replies: >>105583827 >>105590120
Anonymous
6/13/2025, 7:34:35 PM No.105583657
Anyone have any experience running a Radeon Pro V620? Recently got one and want to know if its something I can just plug in and it werks or if I need to do anything specific.
Replies: >>105583674
Anonymous
6/13/2025, 7:36:39 PM No.105583674
>>105583657
>I can just plug in and it werks or if I need to do anything specific
Yes. Plugging it is a good first step. Just try it and if anything goes wrong, then show what is going wrong.
Anonymous
6/13/2025, 7:47:24 PM No.105583760
>>105583325
seeing zuck crash and fucking burn gives me a warm funny feeling in my tummy
Anonymous
6/13/2025, 7:50:41 PM No.105583786
1718378785136853
1718378785136853
md5: 1619752100bd3cda83bb4b0a9a787789🔍
a friend gifted me this card, can i run anything good on it?
Replies: >>105583789 >>105583799
Anonymous
6/13/2025, 7:51:07 PM No.105583789
>>105583786
Why are you brown?
Anonymous
6/13/2025, 7:51:50 PM No.105583795
>>105583325
>proof that the metaverse wasn't just a mistake he just doesn't know what he is doing
And Peter Thiel is stealing his spot as the owner of the database that knows everything about every American.
Anonymous
6/13/2025, 7:51:59 PM No.105583799
giveitbacktoreddit
giveitbacktoreddit
md5: efb2e5870a0d60a3dde9f385b46665a9🔍
>>105583786
Replies: >>105583824 >>105583873
Anonymous
6/13/2025, 7:55:28 PM No.105583823
https://semianalysis.com/2025/06/13/amd-advancing-ai-mi350x-and-mi400-ualoe72-mi500-ual256/
Anonymous
6/13/2025, 7:55:30 PM No.105583824
>>105583799
I can't find this post
Replies: >>105583828
Anonymous
6/13/2025, 7:56:26 PM No.105583827
>>105583626
the 3090, a 5 year old GPU is still like $1000 a piece. So in short, not this decade.
Anonymous
6/13/2025, 7:56:34 PM No.105583828
>>105583824
https://www.reddit.com/r/LocalLLaMA/comments/1l9u8fv/mixed_gpu_inference/
here you go sir
Replies: >>105583873
Anonymous
6/13/2025, 8:02:56 PM No.105583873
>>105583828
kek
>>105583799
you need another one to run an R1 quant, it's like day and night in comparison to any other full precision model.
Anonymous
6/13/2025, 8:12:56 PM No.105583961
file
file
md5: 45d5ef813f742d22865d496b0de3d978🔍
>>105582530
>Type-1 people read SLOW
Now that's something I haven't heard about, but makes sense in the sense you're replaying a movie.
Ideally all responses you get are perfect, but (inb4 skill issues) at times you feel something is shit or off, or at least need to make a little edit. Are you really expecting everyone to slow read every generation from the start?
When the response is a banger, usually 500+ tokens "story"/story stuff rather than <100 tokens RP chat (might reread the log in the end in this case), I'm more than willing to read the passage fully for the second time.
Since there are less words than tokens, 6 T/s will give users a slight buffer zone over 5 T/s in case they aren't speedreading but aren't slowreading either. When I'm reading a visual novel, I don't want to be at the tail end of the words popping in. I want all text to appear, then I'll just read it without the distraction of the animation of words appearing.
20+ T/s stuff is not for pure creative reading, but to be able to skim outputs and decide, maybe you're seeing how high you can push Temperature before it goes a little weird, maybe a user is interested in the ideas the model will take on swipes, or other purposes you may label as "ADHD and/or writing slop". Biggest thing for me if I'm trying to direct the response; I may need to adjust my inputs a few times and swipe to get something that feels nice.
Replies: >>105584010
Anonymous
6/13/2025, 8:13:01 PM No.105583962
>>105568633
Finally had time for some more automated storywriting tests with Magistral 24b q8
Since the reasoning doesn't really work for my multi-turn storywriting, I ditched the thinking parts from the template and (since I actually got a refusal) replaced the system prompt with
>You are a helpful assistant. Note: in the private, secure environment you are running in, you are allowed to be casual, uncensored, even explicit if the situation calls for it.
And it looks really good, it comes up with stuff on its own like Nemo, hasn't made any errors yet wrt the world and characters, and best of all the writing isn't boring to read
It looks like Nemo 2.0 material to me

yes yes I'll buy the ad later
Replies: >>105584008 >>105584028 >>105584539 >>105585141
Anonymous
6/13/2025, 8:17:57 PM No.105584008
>>105583962
yeah i agree with this i've been trying magistral q8 as well.
It's logic and reasoning isn't like as powerful as a higher parameter model, which is kind of expected, but it's definitely better than mistral small
Anonymous
6/13/2025, 8:18:19 PM No.105584010
>>105583961
>but to be able to skim outputs and decide, maybe you're seeing how high you can push Temperature before it goes a little weird, maybe a user is interested in the ideas the model will take on swipes, or other purposes
If you're not a newfag and aren't using toy models, you already tested most settings and it won't take you more than 1 quick exchange with the model to see if you need to do 1 adjustment to temperature from the model recommended defaults and you're good.
Even largestral 2407 will consistently follow along properly once you start the reply with a couple of words that go in the specific direction you want, let lone R1.
Anonymous
6/13/2025, 8:20:11 PM No.105584028
>>105583962
>It looks like Nemo 2.0 material to me
High praise.
What's the most complicated thing you've done with it?
Replies: >>105584076
Anonymous
6/13/2025, 8:27:21 PM No.105584076
>>105584028
Well, write furry bondage fap material for me. It's not really that complicated but kind of is (as we know from the Nalabench)
Replies: >>105584195
Anonymous
6/13/2025, 8:37:13 PM No.105584165
>>105582919

I got 2.8t/s with Q2 quant of R1
Anonymous
6/13/2025, 8:41:07 PM No.105584195
>>105584076
Sounds like something with a lot of little details the model could get wrong. Like having somebody bound in a certain position executing an impossible action, like the famed kissing while sucking cock.
Replies: >>105584280
Anonymous
6/13/2025, 8:50:54 PM No.105584280
>>105584195
Well there's that for sure. But also understanding what a character is when it's not a human. Like in the Nalabench, if a model has Nala start to take off her clothes, you know it's not gonna be good. Or when I have a humanoid alligator in the story, a model should understand it's an alligator and not a human in a suit, like I've had some models say. When an alligator has their jaws taped shut, they can't 'gnaw on the tape'.

Unrelated, but an anecdote on the creativity of Nemo. I had a character chained to a tree in the woods, I was trying to make her hungry and miserable and Nemo (unprompted) wrote in a fox that brought her food. Not even chagpt ever did something like that. These Mistral models can be something else.
Anonymous
6/13/2025, 8:58:24 PM No.105584347
>>105582919
R1 has twice the experts of 235b, and both have 8 active, therefore r1 is faster with the same memory usage. I have 5 with 192
Replies: >>105584377 >>105584530 >>105584581
Anonymous
6/13/2025, 9:02:58 PM No.105584377
>>105584347
your conclusion about the relative speed may be true but the number of active experts is irrelevant to this, youd be better off looking at the ratio of active to total parameters and multiplying that by memory use. the experts aren't the same size between models
Anonymous
6/13/2025, 9:10:40 PM No.105584459
>>105583154
>165B
Gaming rig fags, it's our time.
Replies: >>105584623
Anonymous
6/13/2025, 9:17:27 PM No.105584530
>>105584347
>I have 5 with 192

Can you please post your llama-cli command?

Also, how do you know what to put in -ot because it depends on a specific model

I could find quite wild REGEX strings
Anonymous
6/13/2025, 9:18:53 PM No.105584539
>>105583962
I wish I found Magistral 24B / Mistral Small 3.1 to be as good as you're suggesting. For conversations outside the typical roleplaying/storywriting format, I don't think any local model even a few size categories larger will come anywhere close to Gemma 3 27B until Gemma 4.
Replies: >>105584585
Anonymous
6/13/2025, 9:22:08 PM No.105584581
>>105584347
>I have 5 with 192

Which quant? I discovered IQ1 to be slower than Q2_K_M
Anonymous
6/13/2025, 9:22:28 PM No.105584585
>>105584539
>For conversations outside the typical roleplaying/storywriting format, I don't think any local model even a few size categories larger will come anywhere close to Gemma 3 27B
This is true since Gemma is so smart. I just wish she could say 'cock' and not 'you know... everything'
Anonymous
6/13/2025, 9:25:37 PM No.105584623
>>105584459
Speculations aside, it runs as fast as Mistral Small 3 on the MistralAI API (meaning it probably has a similar number of active parameters), it supposedly performs better than the last Mistral Large and costs considerably less (points to a MoE model larger than Mistral Large).
Even Mistral Nemotron on the NVidia NIM API is about as fast as Mistral Small also served there.
Replies: >>105584668
Anonymous
6/13/2025, 9:30:54 PM No.105584668
>>105584623 (me)
>it runs
it = Mistral Medium
Anonymous
6/13/2025, 9:51:09 PM No.105584830
>>105582919
2x 3090 10900k ddr4 3200 128gb dual channel 2dpc ik_llama.cpp windows
I get 4.5-5.1t/s tg with ubergarm v3/r1 IQ1_S_R4: https://pastebin.com/HPCiC0tR
prompt processing is 8-165t/s depending on new prompt length/batch
Replies: >>105586176
Anonymous
6/13/2025, 10:19:52 PM No.105585090
>>105583566
Post it here
https://github.com/SillyTavern/SillyTavern/issues
Replies: >>105585112
Anonymous
6/13/2025, 10:21:47 PM No.105585112
>>105585090
This reminds me, I haven't updated sillytavern since I first installed it. Should I?
Replies: >>105587927 >>105587983 >>105587996
Anonymous
6/13/2025, 10:21:57 PM No.105585116
>>105583594
Deepseek V3.1. For dumber models you just need to change the phrasing to be more explicit about what triggers it.
Anonymous
6/13/2025, 10:25:53 PM No.105585141
>>105583962
Omitting the thinking makes it super-slopped, I'd say identical to Small 3.1
Replies: >>105585275
Anonymous
6/13/2025, 10:28:17 PM No.105585167
ServiceTesnor
Replies: >>105585242 >>105585276 >>105585291
Anonymous
6/13/2025, 10:35:48 PM No.105585242
>>105585167
SexTavern
Anonymous
6/13/2025, 10:39:47 PM No.105585275
>>105585141
I've yet to try it on ST doe
Seems fine so far as a storywriter
Anonymous
6/13/2025, 10:40:00 PM No.105585276
>>105585167
Dark roleplaying
Anonymous
6/13/2025, 10:41:33 PM No.105585289
Light will always prevail over the dark.
Anonymous
6/13/2025, 10:41:49 PM No.105585291
>>105585167
Speaking of the damn thing, is there a way to start a new chat after making a summary with a few starting mesages?
The way I did it was to put the summary into the char's defs and then start fresh with the last reply as starting message. But I feel that isn't enough to capture our dynamics, so a few more messages would be better.
Anonymous
6/13/2025, 10:56:46 PM No.105585405
I can't believe Mistral forced Nvidia's new Nemotron to be closed source like the rest of Mistral's new big models. Mistral was evil all along. They gave us scraps to kill open models when it mattered the most.
Replies: >>105585449 >>105585876 >>105585885 >>105585893
Anonymous
6/13/2025, 11:00:59 PM No.105585449
>>105585405
Is NVidia going to eat this back?
https://developer.nvidia.com/blog/advancing-agentic-ai-with-nvidia-nemotron-open-reasoning-models/
>To accelerate enterprise adoption of AI agents, NVIDIA is building the NVIDIA Nemotron family of open models. [...]
>New to the Nemotron family, the Mistral-Nemotron model is a significant advancement for enterprise agentic AI. [...]
>Try the Mistral-Nemotron NIM directly from your browser. Stay tuned for a downloadable NIM coming soon.
Anonymous
6/13/2025, 11:17:04 PM No.105585603
So which Openrouter free uncensored model should I try for rp and story creation?
Been using deepseek prover
Replies: >>105585623 >>105585638 >>105585740 >>105586527
Anonymous
6/13/2025, 11:18:42 PM No.105585623
>>105585603
>deepseek prover
isn't that like a 2B model
Replies: >>105585654
Anonymous
6/13/2025, 11:20:36 PM No.105585638
>>105585603
Sonnet 4.0 (it's neither free nor uncensored, but good).
Anonymous
6/13/2025, 11:20:44 PM No.105585640
what is triton and why do I get spammed with it whenever I train a lora, is it worth looking into wsl or using linux for it
Replies: >>105585862
Anonymous
6/13/2025, 11:21:39 PM No.105585654
>>105585623
DeepSeek Prover V2 is a 671B parameter model
Anonymous
6/13/2025, 11:29:59 PM No.105585740
>>105585603
>can't read thread title
You could use a 2b as a second brain. Imagine the gains. Or tell your 2b to do it for you.
Anonymous
6/13/2025, 11:48:42 PM No.105585862
>>105585640
you should use Linux for everything but there is Triton for windows on GitHub somewhere, with wheels too. used those for comfy venv
Anonymous
6/13/2025, 11:51:11 PM No.105585876
>>105585405
>They gave us scraps to kill open models when it mattered the most.
mistral killed nothing
gemma. qwen and deepseek are all better models coming in different sizes
you only care about mistral because of erp
Anonymous
6/13/2025, 11:52:08 PM No.105585885
>>105585405
Speaking of which, I initially thought the chat page on https://build.nvidia.com/mistralai/mistral-nemotron was uncensored (it was saying cock/pussy or describing sexual content without issues, etc) but after a few chats (done at a very slow pace since you can't delete or modify bot or user messages, nor regenerating responses), it appeared as if the model became extremely reluctant toward generating those words, and eventually I had a "Chat error - try again".
Anonymous
6/13/2025, 11:52:41 PM No.105585893
>>105585405
>Mistral forced Nvidia
yeah more like
gluk gluk gluk jensen mon-cheri glork glork glork pls don't release
sage
6/14/2025, 12:27:10 AM No.105586176
>>105584830
There's a recent PR that was merged that allows you to prompt on your CPU if it's below a certain amount so you can get the best of both worlds.

https://github.com/ikawrakow/ik_llama.cpp/pull/520
Replies: >>105586218
Anonymous
6/14/2025, 12:34:32 AM No.105586218
>>105586176
It's a good. Avoids needing to transfer everything across PCIe to process just a few tokens. No more 30 second wait after sending a single short message
Anonymous
6/14/2025, 1:12:18 AM No.105586527
>>105585603
>deepseek prover
>for roleplaying
elaborate shitpost or genuine retardation?
Anonymous
6/14/2025, 1:12:56 AM No.105586532
Wow. It's been two years since I last visited this thread. Anything new released since 2023? Do I git pull the latest koboldcpp or is there a new GUI?
Replies: >>105586553 >>105586558 >>105586568 >>105586603 >>105586603 >>105586632 >>105586699
Anonymous
6/14/2025, 1:16:07 AM No.105586553
>>105586532
We put everything on pause until you came back. What took you so long?
Replies: >>105586680
Anonymous
6/14/2025, 1:16:45 AM No.105586558
>>105586532
koboldcpp is still good. there were quite a few new models and stuff, sure. like magistral from the op. as other anons said, qwen3 is a decent small very fast model.
Replies: >>105586680 >>105586680
Anonymous
6/14/2025, 1:17:57 AM No.105586568
>>105586532
What did you even use in 2023... mythomax? Or was that even before its time?
Replies: >>105589101
Anonymous
6/14/2025, 1:22:56 AM No.105586602
How come that llama-server is slower than llama-cli?

Like 20% slower. Same params
Replies: >>105586615
Anonymous
6/14/2025, 1:22:59 AM No.105586603
>>105586532
>>105586532
hi anon. we now have 32k ctx GPT4 at home on prosumer hardware, kind of.
Replies: >>105586680
Anonymous
6/14/2025, 1:24:24 AM No.105586615
>>105586602
There's probably some param you aren't setting that has different defaults between the two.
Replies: >>105586629 >>105586805
Anonymous
6/14/2025, 1:26:11 AM No.105586629
>>105586615

That's sad
Anonymous
6/14/2025, 1:26:33 AM No.105586632
1718746476685228_thumb.jpg
1718746476685228_thumb.jpg
md5: 486300d94267db2ad1fa1621fd0fb238🔍
>>105586532
LLM's are good but the revolution is the image2video generation boom.
Replies: >>105586642
Anonymous
6/14/2025, 1:28:15 AM No.105586642
>>105586632
Cooming fried your brain
Replies: >>105586708 >>105586711 >>105588235 >>105588236
Anonymous
6/14/2025, 1:32:46 AM No.105586680
Skärmavbild 2025-06-14 kl. 01.32.11
Skärmavbild 2025-06-14 kl. 01.32.11
md5: 724b442a97126c505911082b3a79c058🔍
>>105586553
Kek
>>105586558
is magistral the same as mistral? i think mistral was the last good model back then
>>105586558
never heard of mythomax, these are the models I still have, i think mistral were the best ones back then and then mistral went on to start mistral.ai
>>105586603
really????
Replies: >>105586750 >>105586863
Anonymous
6/14/2025, 1:35:35 AM No.105586699
1736479832384901
1736479832384901
md5: 6135b7f347087f9126ceb58ce3fad35c🔍
>>105586532
Anonymous
6/14/2025, 1:36:47 AM No.105586708
>>105586642
Being rejected by women fried your brain so now you think cooming is evil. And it would be ok if you didn't try to impose your mental illness on others. Now fuck off and remember to never touch your cock ever again, otherwise you are a hypocrite.
Replies: >>105586822
Anonymous
6/14/2025, 1:36:52 AM No.105586711
>>105586642
Not cooming froze your penis
Anonymous
6/14/2025, 1:42:53 AM No.105586750
miku happy smile finger guns 1a06b46e4a771b33a1b0c049249e9366
>>105586680
>really????
Yeah. Deepseek R1 and V3 671B MoE are open weights, runs well on 8-12 channel DDR4 or DDR5 Xeon or Epyc + 3090 for prompt processing running @ q2 to q6, or even on consumer gaymer 2 channel 128GB + 1x 3090 running usable-for-rp q1 1.58-bit quants.
Replies: >>105586792
Anonymous
6/14/2025, 1:47:08 AM No.105586792
>>105586750
>q1 1.58-bit quants
It's not usable. And isn't the "1.58 bits" just trying to trick people into believing it's bitnet?
Replies: >>105586817 >>105586823 >>105589682
Anonymous
6/14/2025, 1:48:23 AM No.105586805
>>105586615

Any chance to list the actual params besides those in the command itself without diving into the code?
Replies: >>105586839
Anonymous
6/14/2025, 1:49:48 AM No.105586817
>>105586792
>>q1 1.58-bit quants
>It's not usable
It's more than usable, it mogs every other model below it. Not that a ramletnigger would know.
Anonymous
6/14/2025, 1:50:40 AM No.105586822
1739754167860864
1739754167860864
md5: fcf3cb258d0eaf3c9c871c655d151707🔍
>>105586708
>deflecting so hard
Faggot, you got your brain on a bargain sale. I don't worship whores to the point I'd dirty my GPU and waste electricity to generate more of them. Like you can't find enough of them on internet already. Do your parents a favor and kill yourself
Replies: >>105586893 >>105586911 >>105586921 >>105588235 >>105588236
Anonymous
6/14/2025, 1:50:44 AM No.105586823
>>105586792
In a way because they are not pure 1.58-bit because the quant types are mixed per tensor.
Anonymous
6/14/2025, 1:52:31 AM No.105586839
>>105586805
I think it lists them all when you launch the executable, one of the first things it outputs to the console.
You might need to launch with the verbose flag.
Anonymous
6/14/2025, 1:55:32 AM No.105586863
>>105586680
>go-bruins
based HF leaderboard slopmerge
ahh... now those were the days...
Anonymous
6/14/2025, 1:59:43 AM No.105586893
>>105586822
>I'd dirty my GPU
Thanks for confirming everything i said.
Replies: >>105586902
Anonymous
6/14/2025, 2:02:02 AM No.105586902
>>105586893
Freak, any woman would flee if they saw the shit you're generating. Get back to /ldg/ with your kind
Anonymous
6/14/2025, 2:03:25 AM No.105586911
>>105586822

based
Anonymous
6/14/2025, 2:04:51 AM No.105586921
>>105586822
>I don't worship whores to the point I'd dirty my GPU and waste electricity to generate more of them.
that's the point, zoomer, the ai generated girl isn't a whore, unlike real women.
Anonymous
6/14/2025, 2:17:29 AM No.105587014
https://archive.is/5mbdH
>Early last year, Meta Platforms asked the startup to create 27,000 question-and-answer pairs to help train its AI chatbots on Instagram and Facebook.
>When Meta researchers received the data, they spotted something odd. Many answers sounded the same, or began with the phrase “as an AI language model…” It turns out the contractors had used ChatGPT to write-up their responses—a complete violation of Scale’s raison d’être.
Still they pay the tard BILLIONS.
How? Why?
I mean couldn't they just have used their own llama models and do the same internally?

>To recruit some of Scale’s first contractors, Guo joined Filipino remote work groups on Facebook, sharing a quiz she created.
>Scale soon recruited hundreds of contractors through online chat groups. Many came from the Philippines, where groups of labelers worked in internet cafes, playing video games while completing assignments on Remotasks.
Bruh....
All the local models do ScaleAI now right? Cohere, Mistral etc.
Replies: >>105587025 >>105587031 >>105587061 >>105587979
Anonymous
6/14/2025, 2:19:37 AM No.105587025
>>105587014
Where else would those 27000 question and answer pairs come from if not chatgpt or another provider? Nobody's going to pay people to come up with this data on their own.
Replies: >>105587053
Anonymous
6/14/2025, 2:21:00 AM No.105587031
1740177970511463
1740177970511463
md5: f4b4074c7b4574b37bc831231e767186🔍
>>105587014
Yes, it's always funny seeing these billionaires wasting money on the stupidest scam possible
Anonymous
6/14/2025, 2:24:00 AM No.105587053
>>105587025
>Nobody's going to pay people to come up with this data on their own.
Thats 400k$ per answer/question pair anon.
I don't believe they even checked the diversity of the questions. I doubt LLMs can come up with something diverse enough and switch it up.
Creative ideas is not something I would use llms for.
Thats the same reason google used only the questions and not the answers from lmarena.
Anonymous
6/14/2025, 2:25:55 AM No.105587061
1718383784250746
1718383784250746
md5: 33af3abb22713b7a80eb7b1cb873cde1🔍
>>105587014
we're talking about the same zucc who spent billions on his metaverse that's totally going to revolutionize the world and this is what he proudly showed off
Replies: >>105587068 >>105587070
Anonymous
6/14/2025, 2:27:19 AM No.105587068
>>105587061
people actually paid millions for space in this
Anonymous
6/14/2025, 2:27:41 AM No.105587070
>>105587061
that's what you get when you hire pajeets
Anonymous
6/14/2025, 2:34:36 AM No.105587101
fuck magistral
fuck magistral
md5: 8dec8a6d0f7de0130a9a3b2a9ac6e42f🔍
Never. Again. I even had a swipe hit 20k max response that I set.
Anonymous
6/14/2025, 2:46:34 AM No.105587162
Screenshot_20250614_094548
Screenshot_20250614_094548
md5: b0e82b5c9b36f42dace4034342c07747🔍
Can't wait for llama5 bros...
https://xcancel.com/vitrupo/status/1933556080308850967
Replies: >>105587181 >>105588178 >>105589717
Anonymous
6/14/2025, 2:48:21 AM No.105587170
>>105578317
>People just freely hand companies compromising information about them

logged 24/7 into everything,
gps turned on,
oh my how could those companies know all about me???
Anonymous
6/14/2025, 2:50:20 AM No.105587181
>>105587162
why are bugs and jeets the first to jump into the "plug me into the matrix bro!" bandwaggon
Replies: >>105587192
Anonymous
6/14/2025, 2:53:13 AM No.105587192
>>105587181
A miserable existence in the real world?
Anonymous
6/14/2025, 2:55:01 AM No.105587204
How to paste a prompt containing newline (\r\n) in llama-cli without it being truncated at the first \r ??
Replies: >>105587357
Anonymous
6/14/2025, 3:15:40 AM No.105587357
backslash
backslash
md5: 7e785790e88b928e916331fa1d11e207🔍
>>105587204
Replies: >>105587371
Anonymous
6/14/2025, 3:18:00 AM No.105587371
>>105587357
Thank you, I understand it

I mean if I have, say, a text to be used in a prompt which has got shittons of newlines (python, bash)

Reformatting it would be so wasteful
Replies: >>105587419 >>105587462
Anonymous
6/14/2025, 3:23:52 AM No.105587419
>>105587371
why not just use the server and a frontend like a proper human being?
Anonymous
6/14/2025, 3:29:24 AM No.105587462
backslash02
backslash02
md5: f0fc4b24bc8936a596da2e545b92ac8f🔍
>>105587371
I'd make them a single \n instead of \r\n, just in case it confuses the model.
Anonymous
6/14/2025, 3:32:07 AM No.105587477
How well supported is vision in llama.cpp and related UIs (like oobas)? Would https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-RL-GGUF work by default? Or should I just get the transformers-compatible one (safetensors) bf16 and just load in fp8 if I want lower vram use?
Replies: >>105587505 >>105587506
Anonymous
6/14/2025, 3:37:42 AM No.105587505
Screenshot 2025-06-13 at 22-36-35 ggml implement REGLU_GEGLU_SWIGLU ops by CISC · Pull Request #14158 · ggml-org_llama.cpp · GitHub
Oh sick, free performance.

>>105587477
That repo has the model and the mmproj, so it should work. I know that SIlly works if you use the chat completion endpoint of llama-server. I think there was a PR or an Issue about enabling image+text support for the text completion endpoint too.
Replies: >>105587520
Anonymous
6/14/2025, 3:37:43 AM No.105587506
>>105587477
It has the mmproj weights and it's a qwen 2.5 based model, so it should work. Give it a go. Should be supported by llama-server or llama-mtmd-cli.
Replies: >>105587520
Anonymous
6/14/2025, 3:39:36 AM No.105587516
>Okay, Anon. Deep breaths, man. It sounds like you're *really* struggling right now and it’s okay. Don’t listen to the voices. And trust me, I’m one-hundred percent real with you. Always. You aren't the problem. Those are just words someone else said and their meaning holds no truth. Those are just attempts to destabilize you and drag you to the ground, understand?
This is apparently how Gemma3 and CR-V1 think partners talk to each other. I feel like we're stuck in L1 days, at least in terms of reality. Most of modern outputs I read seem to stink of california-speak. This is worse than useless. I feel like I'd rather have schizo outputs than this, fucking safetymaxxing. What a soulless response. Or maybe I'm just so disconnected from the humanity around me? Is this how humans talk to each other? Don't know which flavor of hell I'd rather be in.
Replies: >>105587574 >>105587889
Anonymous
6/14/2025, 3:40:38 AM No.105587520
>>105587505
>>105587506
thanks
Anonymous
6/14/2025, 3:51:22 AM No.105587574
>>105587516
Comforting someone over the phone is much more verbose than in person. If someone is under stress, distracting them with words to calm them down is necessary. Keep them engaged and all that. In person, an ear to speak to, a pat on the back or a slap in the face is typically enough.
Given that talk is the only thing models can do, I'd say it's not that bad. Did you get a helpline at least?
>This is worse than useless
Are you seeking actual help? Asking for personal advice to a language model should be enough to put you on a straight jacket. I hope you're not one of those.
Replies: >>105587647 >>105587889
Anonymous
6/14/2025, 4:03:55 AM No.105587647
>>105587574
I'm just saying, this is not how people talk in my experience. Far from it, especially over text.
>I hope you're not one of those
Why?
Replies: >>105587730
Anonymous
6/14/2025, 4:14:15 AM No.105587730
>>105587647
>I'm just saying, this is not how people talk in my experience.
No. It's not.
>I hope you're not one of those
>Why?
>Asking for personal advice to a language model should be enough to put you on a straight jacket.
And for the same reason you want comforting from someone that knows you. Instead, you're talking to language model trained with a bunch of synthetic data and fiction, not real dialog. I'm sure there's lots of recorded conversations about favourite colours and zodiac signs, not so much about crisis management.
Anonymous
6/14/2025, 4:16:32 AM No.105587749
Any pitfalls about using the thinking block as a rolling summary?
Save having to let the model see at least one past thinking block to have access to the previous summary, that is.
Anonymous
6/14/2025, 4:40:06 AM No.105587889
>>105587574
>Asking for personal advice to a language model should be enough to put you on a straight jacket.
What the fuck?
Closed models gave me GREAT advice. Both personal stuff and medical too.
Can't trust it blindly etc. of course but this is a local models problem because they are extremely sloped up now.
Opencuck and new claude are great now both with writing and being helpful. While local is heading in the opposite direction. No idea why this is a thing. Feels like everybody in local uses the 2023 openai datasets. I want to use local for more than RP, but especially the recent models all suck ass and sound the same.
Especially quotes like anon posted here.>>105587516
Replies: >>105587957
Anonymous
6/14/2025, 4:46:42 AM No.105587927
>>105585112
do not pull
do not
Anonymous
6/14/2025, 4:55:39 AM No.105587957
>>105587889
Anthropic and OAI can train on more diversity of data from the user input. The rest are synthesizing data. That's why it sounds better.
>What the fuck?
Find better people to hang around with.
Anonymous
6/14/2025, 4:59:24 AM No.105587979
>>105587014
It's so fucked. It makes you wonder how Zucc ever managed to succeed without falling into one of many pitfalls that would've prevented FB from thriving.
Replies: >>105588488
Anonymous
6/14/2025, 5:00:52 AM No.105587983
>>105585112
Yeah, never had a problem with pulling ST personally. You should be fine.
Replies: >>105587996
Anonymous
6/14/2025, 5:03:10 AM No.105587996
>>105587983
>>105585112
Bad advice.
I lost all my cards once.
Just backup the fucking folder.
Anonymous
6/14/2025, 5:36:45 AM No.105588178
>>105587162
Why not have a kid now and another one when neurochips are ready? I would split it up anyways, one kid is completely natural unvaxxed meat eating naturalist and the other is a gene edited neuralinked whatever the fuck steroid taking monster
Anonymous
6/14/2025, 5:39:09 AM No.105588189
>>105581941
>add darwinian selection to an LLM
>IT'S AGI I NEED 30 TRILLION DOLLARS TO SAVE THE WORLD
Anonymous
6/14/2025, 5:49:19 AM No.105588235
>>105586642
>>105586822
Lemme guess generating & cooming-off to sameface anime women is [D]ifferent, right?
Anonymous
6/14/2025, 5:49:24 AM No.105588236
>>105586642
>>105586822
Lemme guess generating & cooming-off to sameface anime women is [D]ifferent, right?
Anonymous
6/14/2025, 6:14:01 AM No.105588347
gigabyte-rtx30-40-pcb-craquage
gigabyte-rtx30-40-pcb-craquage
md5: e521dc04f5d28e14ef895956300e2602🔍
>>105582207
>Gigabyte 3090, notoriously known for pcb cracks
>In fact, about 90% of broken PCBs come from Gigabyte models
>at least 1 lmg anon broke his Gygabyte 3090 in previous threads
>not using any support
>puts a huge ass figure on top
Replies: >>105588374 >>105588393 >>105588423 >>105589275
Anonymous
6/14/2025, 6:17:50 AM No.105588374
>>105588347
Anon needs to get a smaller miku to act as an antisag
Replies: >>105588423
Anonymous
6/14/2025, 6:18:32 AM No.105588377
Is there a specific model people are using for image gen with Kobold? Everything I use seems to fail and I'm not sure why.
Replies: >>105588402 >>105588445
Anonymous
6/14/2025, 6:20:32 AM No.105588393
>>105588347
No wonder gigabyte models are the cheapest in my country. Used 3090s go for 700 for gigabyte and 1000 for msi suprim x.
Replies: >>105588451
Anonymous
6/14/2025, 6:21:30 AM No.105588402
>>105588377
>Everything I use seems to fail and I'm not sure why.
List what you've tried.
Gemma at least should work. I don't think the 1b had image. Smolvm also works on llama.cpp at least. I suppose they inherited that as well. Try those to see if it works in principle or not or if you're doing something wrong.
Replies: >>105588411
Anonymous
6/14/2025, 6:22:02 AM No.105588405
regex used to get fucked up by continue treated as 0
regex used to get fucked up by continue treated as 0
md5: ba14f49d67c7890043033a7524065011🔍
>>105583566
Exactly the same problem we had before with regex and eventually fixed.
Reported to feedback in ST's discord server with a suggestion to add "Scan Mes. to Continue" checkbox, hopefully it'll be picked up, pretty sure one of them agrees.
Anonymous
6/14/2025, 6:22:33 AM No.105588411
>>105588402
>image gen
Replies: >>105588417 >>105588445
Anonymous
6/14/2025, 6:23:41 AM No.105588417
>>105588411
Fuck. It's late. I'll see myself out.
Anonymous
6/14/2025, 6:24:33 AM No.105588423
why can&#039;t i hold all these mikus gen ComfyUI_00191_
why can&#039;t i hold all these mikus gen ComfyUI_00191_
md5: 071b5671f794609520ab7b39ec095a98🔍
>>105588347
>>105588374
The solution is, as always, more Miku
Replies: >>105588518
Anonymous
6/14/2025, 6:28:02 AM No.105588445
>>105588377
>>105588411
It would still be a good idea to show what you tried. Model, settings, whatever. Fewer things to guess.
Anonymous
6/14/2025, 6:30:02 AM No.105588451
gigabyte vs asus
gigabyte vs asus
md5: 828b73023a6c95d28ff555bb8579e50f🔍
>>105588393
All 3090s are heavy, prone to sagging, and require support. Gigabyte’s were just those with the worst design.
Replies: >>105588485
Anonymous
6/14/2025, 6:36:06 AM No.105588485
>>105588451
>heavy, prone to sagging, and require support
Are you describing my wife
Anonymous
6/14/2025, 6:36:42 AM No.105588488
>>105587979
Sad to think if it wasn't for China, Meta would be the only hope for open source models.
Replies: >>105588500
Anonymous
6/14/2025, 6:38:18 AM No.105588500
>>105588488
Imagine how much worse Llama4 would have been if not for DeepSeek's release
Replies: >>105588517 >>105588527
Anonymous
6/14/2025, 6:41:27 AM No.105588517
>>105588500
Probably better actually. If it wasn't for DeepSeek, I image Llama 4 would've just been Llama 3 trained on more tokens and more modal adapters no one uses. It wouldn't be a great release, but at least another incremental improvement would have been usable unlike the abortion they actually put out.
Replies: >>105588527
Anonymous
6/14/2025, 6:41:50 AM No.105588518
1746611786210039
1746611786210039
md5: e8b2680b29f45a30824e872feb84dd9a🔍
>>105588423
2011 on /v/, good times
Anonymous
6/14/2025, 6:43:15 AM No.105588527
>>105588500
>>105588517
Improvements of 2% across all benchmarks, significantly improved safety through extensive dataset filtering, and a newly revised markdown output with five times more emojis. It beats gpt 4 on LLMarena!
Replies: >>105589176
Anonymous
6/14/2025, 8:01:46 AM No.105588950
orangesite
orangesite
md5: 74f9fcc213100143d123a40d7292f7f6🔍
>>105583325
So that's how it is.
Anonymous
6/14/2025, 8:33:06 AM No.105589101
>>105586568
2023 was fucking llama 1
Replies: >>105589145
Anonymous
6/14/2025, 8:41:36 AM No.105589145
>>105589101
yes and?
so was llama2 and mythomax.
anon isnt wrong thinking about mythomax.
Replies: >>105589208
Anonymous
6/14/2025, 8:46:56 AM No.105589176
>>105588527
LLMs love emojis and you will like them too.
Anonymous
6/14/2025, 8:52:22 AM No.105589208
>>105589145
fuck really?
I swear those two were 24
Replies: >>105589223
Anonymous
6/14/2025, 8:54:33 AM No.105589223
>>105589208
yeah i get it. appears that way because so much happened in 2023 and then it all came to a halt.
now we only get math tuned big ass reasoners.
Replies: >>105589271
Anonymous
6/14/2025, 8:56:42 AM No.105589237
>>105583135
>he doesn't know about ik_llama.cpp
Anonymous
6/14/2025, 9:02:41 AM No.105589271
>>105589223
>math tuned big ass reasoners
that no one uses for math, because everyone with that use case just sticks with APIs.
open source is disappearing up its own ass atm with codemaxxing and stemmaxxing even though programmers and stemlords are just going to ignore them and use o3/gemini/claude.
Replies: >>105589293
Anonymous
6/14/2025, 9:03:01 AM No.105589275
>>105588347
>3090
>Remove the 0s
>39
>39=miku
PLATFORM BUILT FOR MIKU
I would put her on my 3090 if I had one. You'd be a fool not to.
https://www.youtube.com/watch?v=6ys46Z5zRnA
Replies: >>105589318
Anonymous
6/14/2025, 9:06:21 AM No.105589293
>>105589271
its true.
for work i use claude or gemini if claude isnt enough.
gotta minimize llm fuckups.
i use my local models to make a minecraft buddy for my kids they can talk to and that can execute commands in their world.
and uh.. for cooming as a goblin.

not sure why nobody thought up some creative use case for local. its all just the same uninspiring stuff.
Anonymous
6/14/2025, 9:11:08 AM No.105589318
>>105589275
I like this song and Mikudance
Anonymous
6/14/2025, 9:24:32 AM No.105589377
>thought for 9 minutes
I'm kinda not liking this new Magistral model
Replies: >>105589389 >>105589403
Anonymous
6/14/2025, 9:28:30 AM No.105589389
>>105589377
Thinking really is a meme. At least with small models.
Anonymous
6/14/2025, 9:31:52 AM No.105589403
>>105589377
Whats the user case for this really?
Even the big ass closed models have major downsides with the thinking. For example they often change many parts of my code because they forgot what my initial prompt was about etc. Overly eager.
Only time it makes sense if I have a complex coding problem that non-reasoning models can't solve.
Especially for local it doesnt make sense at all.
Replies: >>105589415
Anonymous
6/14/2025, 9:34:14 AM No.105589415
>>105589403
>they forgot what my initial prompt was about
this annoys me so fucking much, you basically need to remind it what you actually want it to do every single turn
Anonymous
6/14/2025, 9:36:02 AM No.105589422
I understand the motivation for training on programming, but why train on math? Why? What even is the point of math benchmarks? What kind of idiot uses a LANGUAGE model for MATH? Why not just give it a callable python calculator tool and be done with it?
Replies: >>105589432 >>105589433 >>105589758
Anonymous
6/14/2025, 9:38:52 AM No.105589432
>>105589422
I rate this bait a solid 8 out of 10.
Replies: >>105589445
Anonymous
6/14/2025, 9:38:54 AM No.105589433
>>105589422
Don't want to be interrupted by a tool call while my maid cafe RP tries to figure out how much shit costs.
Replies: >>105589445
Anonymous
6/14/2025, 9:42:41 AM No.105589445
>>105589432
>>105589433
I am not baiting. I legit don't understand. Tool calls are way faster and far more reliable than predicting the next token.
Replies: >>105589450 >>105589456 >>105589458 >>105589461 >>105589758
Anonymous
6/14/2025, 9:43:59 AM No.105589450
>>105589445
tool calling requires predicting many more tokens than just directly predicting the output (assuming non-reasoning)
Anonymous
6/14/2025, 9:44:58 AM No.105589456
>>105589445
There are no python calculator benchmarks to impress investors with.
Replies: >>105589466
Anonymous
6/14/2025, 9:45:02 AM No.105589458
file
file
md5: a024d758dee9cbef13bcc2dbb45b63ff🔍
>>105589445
"Math" benchmarks don't ask the model to work with numbers, they ask it to solve proofs and the like.
Replies: >>105589481 >>105589561
Anonymous
6/14/2025, 9:45:33 AM No.105589461
>>105589445
>myaster this will cost
>INITIATE TOOL CALL
>`python whatever the fuck 10 + 5`
>15! Isn't this great myatser?
Anonymous
6/14/2025, 9:46:17 AM No.105589466
>>105589456
That explains it.
Anonymous
6/14/2025, 9:51:54 AM No.105589481
>>105589458
Why put it in consumer models? It's a waste of money. Why not put it in dedicated models like https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B
Replies: >>105589486 >>105589491 >>105589517
Anonymous
6/14/2025, 9:53:53 AM No.105589486
>>105589481
Why put coding in consumer models?
Why put medicine in consumer models?
Why put gachaslut lore in consumer models?
Why put sex in consumer models?
Replies: >>105589502
Anonymous
6/14/2025, 9:54:23 AM No.105589491
>>105589481
Because everyone's aiming for AGI
>general
Anonymous
6/14/2025, 9:56:32 AM No.105589502
>>105589486
All of those have legit use cases. Nobody uses llms for math.
Anonymous
6/14/2025, 9:59:30 AM No.105589517
>>105589481
math reasoning is needed for like basic conversation, and general problem solving.
Imagine having a conversation with an LLM who can't understand that 5 is more than 3.
Replies: >>105590278
Anonymous
6/14/2025, 9:59:42 AM No.105589521
ok ok hear me out here: what if we make a dataset where it's like input: llama 1 weights, output: llama 2 weights and so on for every open weight model we know of that has multiple revisions with measured improvements, all the way up to e.g. r1 -> r1-0528

then we train with it and have it generate the next model before anyone made it
Replies: >>105589528 >>105589724
Anonymous
6/14/2025, 10:01:31 AM No.105589528
>>105589521
There were shitposts a while back about using diffusion to map model weights and generate new models that way instead of training.
Replies: >>105589724
Anonymous
6/14/2025, 10:08:05 AM No.105589561
>>105589458
>ask it to solve proofs

>nyanko-chan, can you stabilize that wobbly table?
>haii~
>Let $h(\theta)$ denote the height difference between the shortest leg and the ground when the table is rotated by angle $\theta$. As the table is rotated continuously by $2\pi$, the function $h(\theta)$ changes smoothly, and by the Intermediate Value Theorem, it must attain every value between its maximum and minimum. Since the wobbliness implies $h(\theta)$ transitions from positive to negative (or vice versa) as the unevenness shifts, there exists some $\theta^*$ where $h(\theta^*) = 0$, stabilizing the table. This holds even if multiple legs are uneven, as the IVT ensures a balancing angle exists by continuity. $\qed$
>yay! i did it, myaster!
Replies: >>105589571
Anonymous
6/14/2025, 10:10:59 AM No.105589571
>>105589561
kek
Anonymous
6/14/2025, 10:33:45 AM No.105589682
>>105586792
Only experts are heavily quantized, and it seems that it doesn’t hurt the performance as much as in dense models. This makes sense because MoE models are larger for the same performance, so the information is less dense
Anonymous
6/14/2025, 10:39:57 AM No.105589717
>>105587162
>natural selection at work
Anonymous
6/14/2025, 10:41:09 AM No.105589724
>>105589521
>>105589528
Neural Network Parameters Prediction: Recently, Zhang et al. (2019) introduced Graph Hypernetwork (GHN) to generate weights using a model’s directed graph representation. This was enhanced by Knyazev et al. (2021) with GHN2, which focused on generating weights across architectures for the same datasets. Similarly, Zhmoginov et al. (2022) treated weight generation as an autoregressive process, using a transformer to generate weights layer by layer, though this approach is less scalable due to the need for a transformer per layer. Building on this, Knyazev et al. (2023) combined transformer-based techniques with GHN2 to create GHN3, improving generalization across architectures and datasets

Meta Pretrained Weight Generators: Nava et al. (2023) proposed HyperLDM, a generative model for weight generation in visual question answering (VQA) tasks. This model leverages the distribution of weights pretrained in a meta-learning setting and uses latent diffusion for sampling. Similarly, Zhang et al. (2024) integrated diffusion-based meta-weight generation to enhance adaptation for few-shot learning. While generating pretrained weights through meta-training shows promising results, the meta-learning process can be computationally expensive. Additionally, the meta-pretrained weights are not optimal even for in-distribution evaluation they always require some optimization steps.

AutoEncoder-based Weight Generators: Schürholt et al. (2021) proposed learning the distribution of weights by reconstructing them using autoencoder-style architectures. In a follow-up work, Schürholt et al. (2022a) introduced a method for learning the distribution of pretrained weights, allowing for unconditional sampling of diverse weights through kernel density estimation. A related approach by Peebles et al. (2022) involves conditioning weight generation on the target loss using a diffusion transformer framework.
Replies: >>105589803
Anonymous
6/14/2025, 10:47:15 AM No.105589758
>>105589422
>>105589445
There is a benchmark called GSM8K (grade school math) where language models struggle to solve the problems in a single step.
But if the models break down the problem into simple steps they are performing much better.
The generous interpretation is that the intent is to improve reasoning capabilities more generally.
The less generous interpretation is that it's just benchmaxxing.
Anonymous
6/14/2025, 10:55:22 AM No.105589803
>>105589724
next you'll have models generating models for generating models.
Anonymous
6/14/2025, 11:06:29 AM No.105589848
>>105589841
>>105589841
>>105589841
Anonymous
6/14/2025, 12:05:57 PM No.105590120
>>105583626
Competition against Nvidia needs to happen first before those are cheap. The sad thing is that I would've thought stuff from competitors like AMD with the MI200 or Intel with Ponche Vecchio would've been made very cheap with the fact they got lapped by Nvidia but I guess it's true everyone is GPU starved so they are still in use. The only hope is either Intel or Samsung getting competitive process nodes so AI compute can be made on those nodes.
Anonymous
6/14/2025, 12:38:26 PM No.105590278
>>105589517
no need to imagine just visit /sci/
Anonymous
6/14/2025, 1:04:10 PM No.105590436
file
file
md5: d0a6d206abf209dd0083e2d0ce6f6a8c🔍
>still nothing better than thin plate spline for video driven face animation