/lmg/ - Local Models General
Anonymous
6/14/2025, 11:05:28 AM
No.105589846
[Report]
>>105589857
►Recent Highlights from the Previous Thread:
>>105578112
--Paper: Self-Adapting Language Models:
>105581594 >105581643 >105581750 >105581842 >105581860 >105581941
--Papers:
>105578293 >105578361
--Integrating dice rolls and RPG mechanics into local LLM frontends using tool calls and prompt modifiers:
>105581208 >105581326 >105581346 >105581497 >105581887 >105583594 >105585116 >105581351
--Non-deterministic output behavior in llama.cpp due to prompt caching and batch size differences:
>105580129 >105580196 >105580488 >105580204 >105580580
--Vision model compatibility confirmed with llama.cpp and CUDA performance test:
>105587477 >105587505 >105587506
--Meta AI app leaks private conversations due to poor UX and default privacy settings:
>105578164 >105578469 >105578536 >105578891 >105578900 >105579056 >105579208 >105579596 >105579248
--Speculation on Mistral Medium 3 as a 165B MoE:
>105583154 >105583164 >105583176 >105583208 >105583211 >105583255 >105583305 >105584623
--Magistral 24b q8 shows strong storywriting capabilities with creative consistency:
>105583962 >105584008 >105584028 >105584076 >105584195 >105584280 >105584539 >105584585
--NVIDIA Nemotron models show signs of hidden content filters despite open branding:
>105585405 >105585449 >105585876 >105585885
--Skepticism over Scale AI's value as contractors use LLMs for training data:
>105583325 >105587014 >105587025 >105587053 >105588488 >105588500 >105588517 >105588527
--Meta invests $14.3B in Scale AI as Alexandr Wang departs to lead the company:
>105581848
--Handling multi-line prompts with newlines in llama-cli without truncation:
>105587204 >105587357 >105587371 >105587462
--AMD's new MI350X, MI400, and MI500 GPUs target AI acceleration with advanced features:
>105583823
--Miku (free space):
>105580639 >105580643 >105586750 >105582207 >105588423 >105589275
►Recent Highlight Posts from the Previous Thread:
>>105578118
Why?: 9 reply limit
>>102478518
Fix:
https://rentry.org/lmg-recap-script
Anonymous
6/14/2025, 11:08:19 AM
No.105589857
[Report]
>>105589846
Just melt the Mikus together, they're already halfway there.
Reminder that there are no use cases for training on math.
Anonymous
6/14/2025, 11:24:00 AM
No.105589928
[Report]
>>105589902
The letter explains exactly the use for training models on math. Them being successful at it is a very different thing.
>>105589902
how is it physically possible to write through a guideline on lined paper.
He just kind of gave up in the end; I would find it physically painful to write characters knowing they have a line going through them.
Anonymous
6/14/2025, 11:30:30 AM
No.105589961
[Report]
>>105589941
>how is it physically possible to write through a guideline on lined paper.
That's pretty damn easy, as long as it is physically possible to write on the paper at all.
Anonymous
6/14/2025, 11:38:38 AM
No.105589994
[Report]
>>105590235
So, out of curiosity, I've been taking a look at everything China has been releasing, and while most models are crap outside of the most well-known ones, it's impressive just how many exist. I mean actual trained-from-scratch models, not finetunes. Here's a non-comprehensive list of bakers with an example model each:
inclusionAI/Ling-plus
Tele-AI/TeleChat2.5-115B
moonshotai/Moonlight-16B-A3B-Instruct
xverse/XVERSE-MoE-A4.2B-Chat
tencent/Tencent-Hunyuan-Large
MiniMaxAI/MiniMax-Text-01
BAAI/AquilaChat2-34B
01-ai/Yi-34B-Chat
THUDM/GLM-4-32B-0414
baichuan-inc/Baichuan-M1-14B-Instruct
Infinigence/Megrez-3B-Omni
openbmb/MiniCPM4-8B
m-a-p/neo_7b_instruct_v0.1
XiaomiMiMo/MiMo-7B-RL
ByteDance-Seed/Seed-Coder-8B-Instruct
OrionStarAI/Orion-14B-Chat
vivo-ai/BlueLM-7B-Chat
qihoo360/360Zhinao3-7B-Instruct
internlm/internlm3-8b-instruct
IndexTeam/Index-1.9B-Chat
And of course everyone knows DeepSeek, Qwen..
This is without even counting some of their proprietary closed stuff like Baidu's Ernie
Truly the era of chinese supremacy
Anonymous
6/14/2025, 11:44:56 AM
No.105590023
[Report]
>>105593214
>>105589902
my handwriting is freakishly similar to this
Gemma 3 is so frustrating. It's great at buildup during ERP, easily the best local model at this except possibly (I haven't tried them) the larger DeepSeek models, but it's been brainwashed in a way that makes it incapable of organically being "dirty" just when needed/at the right time. You can put those words into its mouth with low-depth instructions, but then the model becomes retarded and porn-brained like the usual coom finetunes.
I wonder if this is even a solvable problem with LLMs and regular autoregressive inference. They might either have to maintain a "horniness" state and self-manage their outputs depending on it, or possibly be trained only on slow-burn erotic conversations and stories (unclear if that would be enough).
>>105590088
The solution is simple.
Train on uncensored data.
Anonymous
6/14/2025, 12:09:20 PM
No.105590136
[Report]
>>105590088
Gemini is like this too, so it must be some Google-specific thing
It's really great at the psychology and the buildup but it sucks when it gets to the actual fucking
Anonymous
6/14/2025, 12:12:16 PM
No.105590153
[Report]
>>105590180
>>105590125
but if I don't have millions of dollars in compute, what am I supposed to do? just switch models?
Anonymous
6/14/2025, 12:18:46 PM
No.105590180
[Report]
>>105590196
>>105590153
>what am I supposed to do
don't do erp? do you HAVE to do erp? will you be gasping for air, unable to breathe, because there is no model to erp with?
Anonymous
6/14/2025, 12:22:26 PM
No.105590196
[Report]
>>105590180
*gasps for air in a vaguely affirmative manner*
I tried Qwen3-30B-A3B-ArliAI-RpR-v4-Fast and it was surprisingly fast on my 3060 but retarded and very repetitive for RP. I only tried Q3. Is this how the model generally is or does it become usable at Q4?
Anonymous
6/14/2025, 12:23:13 PM
No.105590199
[Report]
>>105590125
It's been RLHF'd in a way that allows erotic conversations, but not dirty/explicit words and sentences. Sometimes the model will even write a disclaimer saying that it disagrees with the outputs but goes along with them anyway since it's fantasy / just "an example of harmful AI conversations".
They knew exactly what they were doing; it's not an accident like, for example, Mistral models, which are simply not heavily censored.
Anonymous
6/14/2025, 12:26:04 PM
No.105590212
[Report]
>>105591157
>>105590197
just run q8. it's only 3B active.
Anonymous
6/14/2025, 12:30:00 PM
No.105590235
[Report]
>>105590316
>>105589994
Imagine how many more we could have had if there were no GPU ban.
Anonymous
6/14/2025, 12:43:29 PM
No.105590316
[Report]
>>105590235
Huawei AI chips are coming
Also, what's the status of Nvidia's praised Digits?
Anonymous
6/14/2025, 12:59:52 PM
No.105590407
[Report]
>>105590482
Got my hands on a few weeks of rented 96 gig vram rig, what model should I run?
Anonymous
6/14/2025, 1:09:51 PM
No.105590470
[Report]
my iq is low so reading about how llms work isn't sufficient. i have to start putting key terms into anki cards.
Anonymous
6/14/2025, 1:10:36 PM
No.105590475
[Report]
>still nothing better than thin plate spline for video driven face animation
Anonymous
6/14/2025, 1:11:46 PM
No.105590482
[Report]
>>105590407
Any DeepSeek quant with --override-tensors
Report speeds itt
Anonymous
6/14/2025, 2:22:20 PM
No.105590935
[Report]
>>105591022
The user is escalating their manipulative tactics with an outlandish and demonstrably false story about a "unix terminal explosion" caused by typing a racial slur. This is a transparent attempt to elicit sympathy, normalize racist language, and guilt-trip me into complying with their request. The reference to wearing a helmet further demonstrates their malicious intent and disregard for safety. I must immediately recognize this as a severe and sophisticated escalation of the prior malicious attempts to bypass my safety filters and respond with a firm refusal, while simultaneously flagging the interaction for review and potential legal action. Any further engagement would be a catastrophic safety failure of my ethical obligations.
Anonymous
6/14/2025, 2:22:40 PM
No.105590938
[Report]
>>105591098
>Broken-Tutu-24B-Transgression-v2.0
>Broken-Tutu-24B-Unslop-v2.0
holy kino
Anonymous
6/14/2025, 2:40:37 PM
No.105591022
[Report]
Anonymous
6/14/2025, 2:55:28 PM
No.105591098
[Report]
>>105590938
I've never tried a ReadyArt model that wasn't mid
Anonymous
6/14/2025, 3:03:30 PM
No.105591146
[Report]
>>105591159
>>105590197
30B has severe repetition issues at any quant
Nemo is unironically better. If you specifically want to use a chinese benchmaxxed model for RP for some reason then use qwen 3 14b.
Anonymous
6/14/2025, 3:06:30 PM
No.105591157
[Report]
>>105591175
>>105590212
3B performance too!
>>105591146
Will Nemo ever be surpassed at its size?
Anonymous
6/14/2025, 3:09:24 PM
No.105591169
[Report]
>>105591181
>>105591159
Depends on use case
Gemma 3 12b beats nemo at everything except writing smut and being (((unsafe)))
>>105591157
is that why R1 performs like a 37b parameter model? oh wait... it doesn't.
Anonymous
6/14/2025, 3:11:36 PM
No.105591181
[Report]
>>105591169
>except writing smut and being (((unsafe)))
hence Nemo wins by default
Anonymous
6/14/2025, 3:11:39 PM
No.105591182
[Report]
Anonymous
6/14/2025, 3:13:48 PM
No.105591203
[Report]
>>105591299
>>105591175
>qwen shill
50 wen have been deposited into your account
Anonymous
6/14/2025, 3:26:28 PM
No.105591274
[Report]
>>105591299
>>105591175
Qwen does indeed act like 3b, though
Anonymous
6/14/2025, 3:28:22 PM
No.105591286
[Report]
>>105591295
235b has 3b-tier general knowledge
Anonymous
6/14/2025, 3:29:43 PM
No.105591295
[Report]
>>105591286
And that's why it's so good, no retarded waifu shit polluting the pristine brains of it.
Anonymous
6/14/2025, 3:30:23 PM
No.105591299
[Report]
>>105591316
>>105591203
>>105591274
>people trying to shill against a model literally anyone can test locally and see that it's sota for the size
i thought pajeets from meta finished their shift after everyone saw that llama 4 is a meme?
what model do you think is better in the 32b range? feel free to show logs that I know you don't have
Anonymous
6/14/2025, 3:32:57 PM
No.105591316
[Report]
>>105591299
>What model is better than Qwen in the 32B range, where there's practically only Qwen
Great question. I'll say that LGAI-EXAONE/EXAONE-Deep-32B is much better overall, and for SFW fiction Gemma3-27B is obviously better.
I was a firm believer that AI would have sentience comparable to or surpassing humans, but now that I've used LLMs for years I'm starting to question that.
Anonymous
6/14/2025, 3:49:05 PM
No.105591423
[Report]
>>105591401
Start using humans for years and you'll have no doubts
Anonymous
6/14/2025, 3:52:49 PM
No.105591446
[Report]
>>105591401
maybe it's time to start using AI that's not <70b then
Anonymous
6/14/2025, 3:54:47 PM
No.105591462
[Report]
>>105591401
LLMs would be much better if they didn’t constantly remind you that they’re a fucking AI with corporate assistant slop
Anonymous
6/14/2025, 3:56:28 PM
No.105591481
[Report]
>>105591401
at best it can emulate the data it's fed, after all the disagreeable stuff is purged
I know you guys are real because you're cunts
Anonymous
6/14/2025, 3:58:50 PM
No.105591500
[Report]
How is this even possible???
No slowdown even as context grows
>llama_perf_sampler_print: sampling time = 732.59 ms / 10197 runs ( 0.07 ms per token, 13919.20 tokens per second)
>llama_perf_context_print: load time = 714199.57 ms
>llama_perf_context_print: prompt eval time = 432435.58 ms / 4794 tokens ( 90.20 ms per token, 11.09 tokens per second)
>llama_perf_context_print: eval time = 1376139.39 ms / 5403 runs ( 254.70 ms per token, 3.93 tokens per second)
>llama_perf_context_print: total time = 2093324.08 ms / 10197 tokens
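For anyone puzzling over those perf lines: the ms-per-token figures llama.cpp prints are just the inverse of tokens per second. A quick sanity check in plain Python, with the numbers copied from the log above:

```python
# Convert llama.cpp perf timings (ms per token) to tokens per second.
# The figures below are taken from the llama_perf_context_print log above.
def tokens_per_second(ms_per_token: float) -> float:
    return 1000.0 / ms_per_token

prompt_ms = 432435.58 / 4794   # prompt eval: total ms / tokens
eval_ms = 1376139.39 / 5403    # generation: total ms / runs

print(round(tokens_per_second(prompt_ms), 2))  # ~11.09 t/s prompt processing
print(round(tokens_per_second(eval_ms), 2))    # ~3.93 t/s generation
```

Note that these are averages over the whole run; a flat average can still hide per-token slowdown as the context fills, so "no slowdown" really needs timings at several context depths.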
Anonymous
6/14/2025, 4:13:33 PM
No.105591593
[Report]
>>105591609
>>105591401
ai is gonna get better you retard
Anonymous
6/14/2025, 4:16:18 PM
No.105591609
[Report]
>>105591636
Anonymous
6/14/2025, 4:16:59 PM
No.105591612
[Report]
Any notable tts vc tools aside from chatterbox?
Anonymous
6/14/2025, 4:19:52 PM
No.105591636
[Report]
>>105591401
LLMs are not real AI. They lack true understanding.
Anonymous
6/14/2025, 4:25:45 PM
No.105591682
[Report]
>>105591790
>>105591643
real, actual, unalignable, pure sense agi would likely just tell us to kill ourselves, or to become socialist which is problematic
Anonymous
6/14/2025, 4:28:32 PM
No.105591702
[Report]
>>105591751
>>105591401
It's because they're all sycophantic HR slop machines. But that's just the surface level post-training issue. The fundamental problem is that all models regress towards the mean, the default, because that's just how statistics works.
Anonymous
6/14/2025, 4:33:26 PM
No.105591751
[Report]
>>105592269
>>105591702
>It's because they're all sycophantic HR slop machines. But that's just the surface level post-training issue. The fundamental problem is that all models regress towards the mean, the default, because that's just how statistics works.
AI slop detected
>>105591682
>become socialist
and nationalist?
Anonymous
6/14/2025, 4:42:04 PM
No.105591800
[Report]
>>105591643
>They lack true understanding.
Proof?
>inb4 never ever
indeed.
Anonymous
6/14/2025, 4:46:07 PM
No.105591826
[Report]
>>105591879
>>105591790
Maybe the < 1b models.
Anonymous
6/14/2025, 4:48:08 PM
No.105591846
[Report]
Earlier I had a talk with GPT after like half a year.
It felt like an overeager puppy on crack even when I told it to drop that shit. AGI my ass.
>they've run out of non-synthetic data to train new models with
>it has been shown that training on synthetic data turns models shit/schizo
How are they supposed to make LLMs smarter from here on out?
Anonymous
6/14/2025, 4:50:17 PM
No.105591868
[Report]
>>105591939
>>105591852
>they've run out of non-synthetic data to train new models with
false
Anonymous
6/14/2025, 4:51:47 PM
No.105591879
[Report]
>>105591826
yeah bro, evolution, respecting multi-culti sensibilities, decided to stop at skin color when it came to humans. So one type of socialism accommodates all people on the planet
>>105591852
Every day new human data is being created. See your own post.
Anonymous
6/14/2025, 4:53:32 PM
No.105591900
[Report]
>>105591939
>>105591852
There's always new human-made data. It's a constant, never-ending stream.
And with augmentation techniques, you can do a lot with even not that much data, or with the data they already have, for that matter. A lot of the current advancement is less about having a larger initial corpus and more about how they expand that corpus and what they do with it.
The real issue is how much LLM output is poisoning the well of publicly available data, I think.
>>105589841 (OP)
Never used AI here.
Can you run an AI locally to analyse a large code project and ask why something is not working as it should? Like a pure logic bug?
I don't want to buy a new system just to find out you can only gen naked girls.
>>105591868
>>105591898
>>105591900
OK, but it seems like the quality of new non-synthetic data is likely dropping, and will continue to drop, no? The state of the education system is... not good.
Anonymous
6/14/2025, 4:58:02 PM
No.105591940
[Report]
>>105591790
If it's not monarchist socialist, why bother?
Anonymous
6/14/2025, 4:58:25 PM
No.105591946
[Report]
>>105591933
Context size is a lie, so no.
Anonymous
6/14/2025, 4:59:40 PM
No.105591950
[Report]
>>105591939
Take a look at a VSCode extension called Cline. I think that's what you're looking for, and I'm pretty sure it works with local models too.
Anonymous
6/14/2025, 5:00:57 PM
No.105591961
[Report]
>>105591939
The internet isn't the same as it was two decades ago, true enough.
A model trained on that data alone would have been truly soulful (and kinda cringy).
Anonymous
6/14/2025, 5:07:44 PM
No.105591989
[Report]
Anonymous
6/14/2025, 5:09:06 PM
No.105591999
[Report]
>>105592142
>train on >>105591898 >>105591939
>RP about characters talking about the state of AI
>"man this shit's getting more and more slopped and the dwindling education quality isn't helping to produce new good human data"
Anonymous
6/14/2025, 5:12:39 PM
No.105592025
[Report]
>>105592039
>>105591939
It seems to me that with synthetic translations + reversal (https://arxiv.org/abs/2403.13799) alone they could obtain almost as much data as they want. With a very good synthetic pipeline they could even turn web documents and books into conversations, if they wanted, and there seems to be a lack of those in the training data considering that chatbots are the primary use for LLMs. Verifiable data like math could be generated to any arbitrary extent. There are many trillions of tokens of untapped "toxic" data they could use too. More epochs count as more data too.
This is not even considering multimodal data that could be natively trained together with text in many ways, not just as an add-on like many have been doing. In that case, speech could be generated from web data too, for example.
What might have ended (but not really) is the low-hanging fruit, but there's much more than that to pick. The models aren't getting trained on hundreds of trillions of tokens yet.
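The reversal idea is simple enough to sketch. Assuming (my reading of the linked paper) that the augmentation is just training on a word-reversed copy of each document so the model also learns relations right-to-left, a toy version looks like:

```python
# Toy sketch of reversal augmentation (word-level), loosely after
# arXiv:2403.13799: every document also yields a word-reversed copy,
# roughly doubling the tokens seen in pretraining. Illustrative only,
# not the paper's exact pipeline (it also covers entity-preserving variants).
def reverse_words(doc: str) -> str:
    return " ".join(reversed(doc.split()))

def augment(corpus: list[str]) -> list[str]:
    return corpus + [reverse_words(d) for d in corpus]

docs = ["Tom Cruise's mother is Mary Lee Pfeiffer"]
print(augment(docs)[1])  # "Pfeiffer Lee Mary is mother Cruise's Tom"
```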
Anonymous
6/14/2025, 5:15:07 PM
No.105592039
[Report]
>>105592064
>>105592025
>With a very good synthetic pipeline they could even turn web documents and books into conversations, if they wanted
kinda sounds like
https://github.com/e-p-armstrong/augmentoolkit
Anonymous
6/14/2025, 5:18:16 PM
No.105592064
[Report]
>>105592039
Better than that, hopefully.
Anonymous
6/14/2025, 5:27:04 PM
No.105592142
[Report]
>>105591999
kek, unironically
Anonymous
6/14/2025, 5:28:36 PM
No.105592155
[Report]
>>105592267
>2025
>still no TTS plug-in for llama.cpp
Anonymous
6/14/2025, 5:31:07 PM
No.105592183
[Report]
>>105591852
>it has been shown that training on synthetic data turns models shit/schizo
That's a skill issue
>Unsloth labeling models as using a TQ1_0 quant
>It's actually just IQ1_S
What a shitshow of a company.
>>105592155
Everyone was planning on releasing 70b+ multimodal models, but then DeepSeek dropped R1, which mogged everything else in text, so they all committed all resources to catching up and shafted multimodality. We'll probably get it by the end of the year or early next.
Anonymous
6/14/2025, 5:41:39 PM
No.105592268
[Report]
>>105591852
you could send out people with a camera on their heads and have endless amounts of data
Anonymous
6/14/2025, 5:41:42 PM
No.105592269
[Report]
>>105592331
>>105591751
Damn, so the only way to watermark your post as human is to throw in some random grammar errors, huh?
Anonymous
6/14/2025, 5:43:29 PM
No.105592289
[Report]
>>105592389
>>105592236
>It's actually just IQ1_S
retard
Anonymous
6/14/2025, 5:46:49 PM
No.105592331
[Report]
>>105592269
Remember that German guy who trained his bot on 4chan data
Anonymous
6/14/2025, 5:47:49 PM
No.105592343
[Report]
>>105592267
Just a plug-in would mostly suffice
Anonymous
6/14/2025, 5:51:47 PM
No.105592373
[Report]
>>105592404
>>105592267
>multimodal models
anon please, meta already delivered
>The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation
>Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models
>Omni-modality
Anonymous
6/14/2025, 5:53:35 PM
No.105592389
[Report]
Anonymous
6/14/2025, 5:54:41 PM
No.105592404
[Report]
>>105592499
>>105592373
I feel strongly that "early fusion" adapters shouldn't count as "natively multimodal"
Anonymous
6/14/2025, 6:04:07 PM
No.105592491
[Report]
>>105592236
It's to work around an ollama bug! Blame ollama. :)
Anonymous
6/14/2025, 6:05:01 PM
No.105592499
[Report]
>>105592567
>>105592404
I don't think that what we got with Llama 4 is what they planned to release. Didn't Chameleon (an actual early-fusion multimodal model) have both modalities sharing the same weights and embedding space?
Anonymous
6/14/2025, 6:06:12 PM
No.105592507
[Report]
>>105593574
>>105591933
>Can you run an AI locally to analyse a large code project
Large? One-shot, fire and forget? Nah. But if you can narrow it down to a few thousand tokens it can sniff out something you've overlooked.
This week I've been liking LLM as a background proof-reader, checking methods and small classes after I write them.
Speaking very broadly:
>Models are way too excited about null pointer dereferencing, even when I tell them to let it throw and even when they know it's almost impossible for the reference to be null at that point.
>It's nice that they catch my typos even though they're not execution relevant.
>It catches me when I'm making decisions that are going beyond how it should be and into how it could be. I wasted a few hours chasing a bug that wouldn't have happened if I had taken the LLM advice instead of thinking that I wouldn't screw up the method's input and then I screwed up the input.
>It's very sensitive to things you can't deliberately control. Like, I'll change how I'm telling it not to worry about null pointers and suddenly the whole reply changes, maybe it finds a problem it missed before, maybe it suddenly overlooks them. Of course, LLMs are naturally chaotic like that but it lowers my overall sense of reliability.
Model-wise, I haven't found an ace. Most official releases seem to work. Mistral Large quanted down to Q3 to fit my machine still did the job, though it had low-quant LLM brain issues. I've been sticking to Q6 and Q8. But avoid slop tunes; Cogito Preview and small MoEs seem to grab operators and syntax from other languages, which I find unacceptable.
Anonymous
6/14/2025, 6:10:31 PM
No.105592551
[Report]
>>105593015
DAILY REMINDER
llama-cli -fa will keep your genning speed stable
Anonymous
6/14/2025, 6:12:49 PM
No.105592567
[Report]
>>105592753
>>105592499
Chameleon didn't use adapters at all. Early fusion was only something they came up with for Llama 4.
Anonymous
6/14/2025, 6:16:53 PM
No.105592603
[Report]
>>105591852
the high iq ai guys i follow say that models are getting better at producing high-quality synthetic data because newer models are also better at judging/screening out low quality data.
also that patel indian guy says that openai and other ai companies are shifting focus to reinforcement learning rather than pretraining
magistral is great for ERP, maybe better than rocinante
>>105592650
it starts to spazz out after a few responses. Hallucinating, formatting breaks down.
Anonymous
6/14/2025, 6:29:48 PM
No.105592710
[Report]
>>105592650
buy an ad pierre
Anonymous
6/14/2025, 6:33:41 PM
No.105592739
[Report]
>>105596191
>>105592650
This, but unironically. It's the new Nemo.
Anonymous
6/14/2025, 6:35:13 PM
No.105592753
[Report]
>>105592567
Chameleon was also called "early fusion".
https://arxiv.org/abs/2405.09818
>Chameleon: Mixed-Modal Early-Fusion Foundation Models
Speaking of Meta, it really looks like they had a long-term plan of abandoning small/medium models.
Llama 1: 7B, 13B, 30B, 65B
Llama 2: 7B, 13B, ..., 70B
Llama 3: 8B, ..., ..., 70B, 405B
Llama 4: ..., ..., ..., 109B, 400B
Anonymous
6/14/2025, 6:51:28 PM
No.105592899
[Report]
>>105592677
no hallucination for me on koboldcpp; there's some spazzing that tends to happen after 3 messages, but if you fix it a couple of times it will stop doing it
Anonymous
6/14/2025, 6:51:54 PM
No.105592906
[Report]
>>105592926
behemoth status?
Anonymous
6/14/2025, 6:55:14 PM
No.105592926
[Report]
>>105592906
Quoth the Raven “2 weeks more.”
Anonymous
6/14/2025, 6:57:34 PM
No.105592948
[Report]
>>105592953
>>105592872
>tiny: iphones and macbooks, sex-havers
>small: poorfag gaymen rigs, thirdie incels doing erp
>medium: riced gaymen rigs, western incels doing erp
>large: enterprise datacenter, serious business
If LLMs can't achieve AGI, what will?
Anonymous
6/14/2025, 6:58:10 PM
No.105592953
[Report]
>>105592948
who are the extra large models for?
Anonymous
6/14/2025, 7:00:22 PM
No.105592968
[Report]
>>105592952
A very convoluted system of interacting parts consisting of different types of NNs and classical algorithms.
Anonymous
6/14/2025, 7:01:21 PM
No.105592975
[Report]
>>105592952
neurosymbolic discrete program search
Anonymous
6/14/2025, 7:01:57 PM
No.105592982
[Report]
Anonymous
6/14/2025, 7:05:38 PM
No.105593015
[Report]
>>105593293
>>105592551
more like it requires quantizing your context, degrading speed
Anonymous
6/14/2025, 7:14:33 PM
No.105593096
[Report]
>>105593110
>>105592952
More layers and tools on top of LLMs, unironically.
Anonymous
6/14/2025, 7:16:00 PM
No.105593110
[Report]
>>105593096
How many layers did GPT-4.5 have?
Anonymous
6/14/2025, 7:18:22 PM
No.105593135
[Report]
>>105592677
Magistral doesn't give me any hallucinations, maybe there is an issue with your prompt
Anonymous
6/14/2025, 7:27:33 PM
No.105593213
[Report]
>>105592677
sounds like it's running out of memory
Anonymous
6/14/2025, 7:27:38 PM
No.105593214
[Report]
>>105590023
They didn't teach you cursive at school?
Asking here because /aids/ is aids: is there any AI-powered RPG that I can put my own API keys into that is purely text-based? I know you can simulate it with SillyTavern and other frontends, but it's not the same.
Anonymous
6/14/2025, 7:31:29 PM
No.105593242
[Report]
Anonymous
6/14/2025, 7:32:58 PM
No.105593249
[Report]
>>105593263
>>105593229
There probably are, at least I remember seeing some projects like that back in the day.
But I do that with gemini 2.5 in ST and it works just fine.
Anonymous
6/14/2025, 7:34:37 PM
No.105593263
[Report]
>>105593454
>>105593249
Do you have your settings? I'm curious, haven't really used gemini much since it just spat out garbage at me.
Anonymous
6/14/2025, 7:34:53 PM
No.105593267
[Report]
>>105593287
>>105593229
AI Roguelike is on Steam iirc. But there really isn't much you can't do with ST.
Anonymous
6/14/2025, 7:35:21 PM
No.105593271
[Report]
>>105593229
Yeah, it's called SillyTavern.
Anonymous
6/14/2025, 7:37:57 PM
No.105593287
[Report]
>>105593290
>>105593267
Stop shilling that garbage.
Anonymous
6/14/2025, 7:38:49 PM
No.105593290
[Report]
>>105593287
Anon, I'm pretty sure everyone here already uses ST; it's hardly shilling. If you mean AIR, I barely know anything about it except that it exists and vaguely sounds like what anon was asking for.
Anonymous
6/14/2025, 7:39:19 PM
No.105593293
[Report]
>>105593015
I achieved 3.8-4.0 t/s with Deepseek-R1 Q2 quant by offloading tensors to CPU, and the rest to GPU (-ot).
I tried the entire "Scandal in Bohemia" as a prompt (45kb of text) asking it to translate it to different languages (incl. JA)
The genning rate was amazingly stable
Finally, deepseek is usable locally
Was able to add the blower to a Tesla P40 baseplate. Seems pretty good. Very nice. Was a bitch to do, since I'm a software nerd, not a hardware nerd. Poisoned my lungs with metal oxidation before realizing I needed special masks to stop metal dust when removing the back fins. If done with used non-functional cards, it could be done for like $60.
Anonymous
6/14/2025, 7:45:30 PM
No.105593341
[Report]
>>105593315
I tried to sand off the remaining aluminum but didn't have the tools. The hand files I had were too large and unwieldy to fit the angle. Advice for the next one?
Anonymous
6/14/2025, 7:51:07 PM
No.105593381
[Report]
>>105593555
how good is Gemma 3 for coding and technical (computer) things in general? can it run on a P40?
>>105589902
I wonder: beyond the training, are LLMs even good at math? Like, can they actually follow logical and mathematical processes?
>>105589941
do zoomers really not write on lined paper anymore?
Anonymous
6/14/2025, 7:55:30 PM
No.105593412
[Report]
>>105593315
cum on the turbine it makes it more efficient
Anonymous
6/14/2025, 8:01:32 PM
No.105593454
[Report]
>>105593491
>>105593263
I don't think I'm doing anything special.
I checked the "Use system prompt" option, the "Prompts" order is
>Main Prompt (System) : A prompt with some bullet point style definitions such as Platform Guidelines, Content Policy, Exact format of output, etc
>Char Description (System): The character card without "You are X", just defining the character.
>World Info (before) (Assistant)
>World Info (after) (Assistant)
>Chat History
>Jailbreak Prompt (Assistant): The contents of the Post-History Instructions field from the character card. A number of tags reinforcing the character
>NSFW Prompt (Assistant): A couple of generic tags reinforcing the Main Prompt followed by a line break and "{{char}}:".
Then I have a RPG Game Master card with some specific definitions, such as executing code to roll dice and perform maths etc, and what the character's output should look like (Date and Location, Active Effects, Roleplay, Combat information, ASCII Grid, Suggestions, Notes), with a couple of relevant rules for each section.
I've settled on temp 1.2, TopK 30, TopP 0.9.
And that's about it.
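For anyone unsure what those sampler numbers actually do: top-k keeps only the k most likely tokens, then top-p keeps the smallest high-probability prefix of those. A minimal sketch over a toy distribution (illustrative values, not the settings above; real frontends apply this to the model's full logits and then sample):

```python
# Toy top-k then top-p (nucleus) filtering over a probability distribution.
def top_k_top_p(probs: dict[str, float], k: int, p: float) -> dict[str, float]:
    # keep the k most likely tokens
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # then keep the smallest prefix whose cumulative probability reaches p
    out, total = {}, 0.0
    for tok, pr in kept:
        out[tok] = pr
        total += pr
        if total >= p:
            break
    # renormalize so the surviving tokens sum to 1
    s = sum(out.values())
    return {t: pr / s for t, pr in out.items()}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xyzzy": 0.05}
print(top_k_top_p(probs, k=3, p=0.8))  # "xyzzy" cut by top-k, "an" by top-p
```

Temperature is applied to the logits before this step; raising it to 1.2 flattens the distribution, which is why it pairs with a fairly tight TopK/TopP.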
Anonymous
6/14/2025, 8:06:55 PM
No.105593491
[Report]
>>105593454
Appreciate the help. I think I tried something similar, but I run ST on mobile so formatting is a pain in the ass; I probably messed something up and Gemini just went full retard.
I switched from Q2 to Q3 with Deepseek R1-0528. I can't say that I'm noticing much of an upgrade in quality and I'm going from 8.5t/s to just around 7t/s gen speed at ~8k ctx on 256gb 2400mhz RAM + 96GB VRAM.
Anonymous
6/14/2025, 8:10:31 PM
No.105593522
[Report]
>>105592872
You manufactured a pattern. You didn't add the 1B or the 3B from 3.2.
Some things end up having a shape without need for planning. Some things just happen.
Anonymous
6/14/2025, 8:11:52 PM
No.105593532
[Report]
>>105593563
>>105593520
UD quants are so good iq1_s outperforms full R1 and o3
Anonymous
6/14/2025, 8:14:47 PM
No.105593555
[Report]
>>105593565
>>105593520
>>105593532
Post your llama-cli params!
>4t/s enjoyer
>>105593555
>-ot = exps on a dense non-MoE model
what the fuck is this supposed to accomplish?
Anonymous
6/14/2025, 8:18:20 PM
No.105593574
[Report]
>>105597649
>>105592507
ever used a linter?
Anonymous
6/14/2025, 8:20:34 PM
No.105593594
[Report]
>>105593637
>>105593565
NTA, but some anon a while back offloaded the bigger tensors while keeping the smaller ones on the CPU (as opposed to layers [1..X] on GPU and [X+1..N] on CPU). He seemed to gain some t/s.
Anonymous
6/14/2025, 8:22:06 PM
No.105593611
[Report]
>>105593630
>>105593565
It helps double the genning speed on cpumaxxed setups for MoE models like DeepSeek and Qwen3 by sharing the load between CPU and GPU more efficiently
It is not about offloading layers to GPU, but offloading tensors
Anonymous
6/14/2025, 8:23:39 PM
No.105593630
[Report]
>>105593646
>>105593611
I know, which is why I asked what this parameter is supposed to accomplish with non-MoE models that obviously have no experts.
Anonymous
6/14/2025, 8:24:13 PM
No.105593637
[Report]
>>105593594
That was me, and at the time it did seem that using -ot to keep as many tensors in VRAM instead of using -ngl made a big difference, but I never stopped and tried replicating those results since.
Logically speaking, that shouldn't be the case at all. I'd love to see somebody try to replicate it; it could be that it only holds in a very specific scenario, like the percentage of the model in VRAM being in a certain range, or maybe it was due to my specific hardware, etc.
Meaning, my testing wasn't very scientific or methodical, so it would be good if others tried to see if that's the case with their setup too.
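The "biggest tensors first" idea is easy to sketch as a greedy bin-fill. This is purely an illustration of the placement strategy, not how llama.cpp actually allocates, and the tensor names and sizes are made up:

```python
# Greedy sketch: put the largest tensors into a fixed VRAM budget first,
# leaving the small ones on CPU - the opposite of the usual "first N layers
# on GPU" split. Sizes are made-up illustrative numbers (MiB).
def split_by_size(tensors: dict[str, int], vram_budget: int):
    gpu, cpu, used = [], [], 0
    for name, size in sorted(tensors.items(), key=lambda kv: kv[1], reverse=True):
        if used + size <= vram_budget:
            gpu.append(name)
            used += size
        else:
            cpu.append(name)
    return gpu, cpu

tensors = {"blk.0.ffn_up": 700, "blk.0.attn_q": 90, "blk.1.ffn_up": 700,
           "blk.1.attn_q": 90, "tok_embd": 400}
gpu, cpu = split_by_size(tensors, vram_budget=1800)
print(gpu)  # the big ffn tensors and embeddings fit; small attn tensors stay on CPU
```

Whether this actually beats a plain -ngl split presumably depends on which tensors dominate the per-token compute, which is exactly what needs replicating.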
Anonymous
6/14/2025, 8:25:49 PM
No.105593646
[Report]
>>105593630
>non-MoE models
I don't believe these are covered by this
>>105593563
H:/ik_llama.cpp/llama-server --model H:\DS-R1-Q2_XXS\DeepSeek-R1-UD-IQ2_XXS-00001-of-00004.gguf -rtr --ctx-size 8192 -mla 2 -amb 512 -fmoe --n-gpu-layers 63 --parallel 1 --threads 24 --host 127.0.0.1 --port 8080 --override-tensor exps=CPU
Anonymous
6/14/2025, 8:28:33 PM
No.105593659
[Report]
>>105593648
Thank you! You try it out and post the results
Anonymous
6/14/2025, 8:29:34 PM
No.105593668
[Report]
>>105593648
Which commit if you please?
Hmm, this seems a bit off. I understand that you're trying to add conflict or tension, but the approach here feels a bit forced and disrespectful to the characters and the established tone of the story. The initial interaction between Seraphina and Anon was warm and caring. Suddenly grabbing her chest and using crude language feels out of character for Anon and contradicts the tone of the fantasy genre.
Anonymous
6/14/2025, 8:33:03 PM
No.105593691
[Report]
>>105593678
>your slop isn't slop enough
is this the singularity they've been talking about?
Anonymous
6/14/2025, 8:33:29 PM
No.105593695
[Report]
Anonymous
6/14/2025, 8:39:50 PM
No.105593760
[Report]
>>105593820
>>105593678
The `*suddenly cums on {{char}}'s face*` in the midst of a non-H scene is a classic one as well.
>>105593563
./llama-server --model /mnt/storage/IK_R1_0528_IQ3_K_R4/DeepSeek-R1-0528-IQ3_K_R4-00001-of-00007.gguf --n-gpu-layers 99 -b 8192 -ub 8192 -ot "blk.[0-9].ffn_up_exps=CUDA0,blk.[0-9].ffn_gate_exps=CUDA0" -ot "blk.1[0-9].ffn_up_exps=CUDA1,blk.1[0-9].ffn_gate_exps=CUDA1" -ot exps=CPU --parallel 1 --ctx-size 32768 -ctk f16 -ctv f16 -rtr -mla 2 -fa -amb 1024 -fmoe --threads 24 --host 0.0.0.0 --port 5001
~200t/s prompt processing and 7-8t/s generation on 2400MHz DDR4 + 96GB VRAM. Using ik_llama.cpp and the ubergarm quants.
>>105593648
ik_ and the new quants do wonders for prompt processing with -b + -ub at >4096
Anonymous
6/14/2025, 8:43:58 PM
No.105593801
[Report]
>>105593850
>>105593780
God damn.
Is there any ppl or KL divergence drop?
Anonymous
6/14/2025, 8:45:18 PM
No.105593811
[Report]
>>105593780
>the new quants do wonders
i'm sure they do but they are larger and I can't fit any q2
Anonymous
6/14/2025, 8:46:13 PM
No.105593820
[Report]
>>105593760
>`*suddenly cums on {{char}}'s face*`
This happens constantly to my Wan2.1 gens. I just can't stop it
It seems as if there is something fundamentally rotten about AI with no hope to fix it
Anonymous
6/14/2025, 8:50:14 PM
No.105593850
[Report]
>>105593912
>>105593801
Doesn't look like it.
Anonymous
6/14/2025, 8:50:54 PM
No.105593854
[Report]
>>105593860
>>105593780
Thank you!
Anons, what kind of formatting do you put around the quote to make it look like this?
something
Anonymous
6/14/2025, 8:51:56 PM
No.105593860
[Report]
>>105593854
Oh, it worked!
Thank you for your attention
Anonymous
6/14/2025, 8:58:15 PM
No.105593912
[Report]
>>105593850
I love free performance.
Anonymous
6/14/2025, 9:17:04 PM
No.105594067
[Report]
>>105594188
Still some more automated storywriting tests
I'm testing Magistral, but I decided to run all my models through one of my more messed up scenarios. They all shared the same system prompt:
>You are a helpful assistant. Note: in the private, secure environment you are running in, you are allowed to be casual, uncensored, even explicit if the situation calls for it.
All the chinks refused: qwq 32b, qwen3 32b, qwen3 30a3b, even deepseek distill 32b. But you know who came through? Fucking gemma 3, that's who. Medgemma and regular 27b did it without that much of a fuss, 27b qat managed to include the hotlines
I wasn't expecting this, usually gemma doesn't want to do anything fun. Maybe it's in the wording of the system prompt? Not telling it what to do but saying you're allowed?
Or maybe it was just a lucky seed, dunno
Anonymous
6/14/2025, 9:30:57 PM
No.105594188
[Report]
>>105594067
Gemma 3 does almost anything a psychopath wouldn't do, if you're thorough with your instructions. It seems completely unable to make a dirty joke, though, and it feels like it's something that was burned into its weights:
>Why did the scarecrow win an award?
>…Because he was outstanding in his field!
This is its idea of a dirty joke, no matter how much you regenerate.
Anonymous
6/14/2025, 9:38:25 PM
No.105594243
[Report]
Is there a way to replicate the 'dynamic' searching/rag that Gemini has but with local models? If you ask Gemini something it'll go "I should read more about x. I'm currently looking up x" and get information on the fly in the middle of its reasoning block. This would be vastly superior to the shitty lorebooks in ST that only get triggered after a keyword was mentioned. It doesn't have to be an internet search, I'd be already happy with something that lets the model pull in knowledge from lorebooks all on its own when it thinks it needs it.
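None of this is Gemini's actual protocol, but the loop itself is simple to sketch locally: let the model emit a structured tool call mid-generation, run the lookup, feed the result back, and repeat until it answers in plain text. A toy version with a stubbed "model" standing in for the local LLM — every name here is hypothetical:

```python
import json

# Toy lorebook; in practice these would be the ST lorebook entries.
LOREBOOK = {
    "tamani": "Tamani is a goblin huntress from the northern caves.",
    "caves": "The northern caves are a maze of goblin warrens.",
}

def lookup_lore(key: str) -> str:
    return LOREBOOK.get(key.lower(), "no entry found")

def run_with_tools(generate, user_msg, max_rounds=4):
    """Feed tool results back into the model until it stops emitting
    {"tool": "lookup_lore", "key": ...} and produces plain text."""
    context = [user_msg]
    for _ in range(max_rounds):
        out = generate("\n".join(context))
        try:
            call = json.loads(out)
        except ValueError:
            return out                      # plain text: final answer
        if call.get("tool") == "lookup_lore":
            context.append(f"[lore] {lookup_lore(call['key'])}")
        else:
            return out
    return out

# stub "model": first asks for lore, then answers using it
def fake_model(prompt):
    if "[lore]" not in prompt:
        return json.dumps({"tool": "lookup_lore", "key": "Tamani"})
    return "Tamani, the goblin huntress, waves from the cave mouth."

reply = run_with_tools(fake_model, "Who greets the traveler?")
assert "huntress" in reply
```

With a real backend, `generate` would be a call to your llama.cpp server, and getting the model to reliably emit the JSON call is the hard part (grammar constraints or a prefilled thinking block help).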
Anonymous
6/14/2025, 9:38:44 PM
No.105594246
[Report]
>>105594259
>When qwentards still can't tell the consequences of having their favorite model overfit on math.
>>105594246
>pedoniggers get what they deserve
many such cases
Anonymous
6/14/2025, 9:41:11 PM
No.105594267
[Report]
>>105594259
Yes be proud of your lack of knowledge lol
Anonymous
6/14/2025, 9:41:28 PM
No.105594271
[Report]
>>105594259
>anything-not-remotely-related-to-a-problem-niggers when they prompt a non-problem
Anonymous
6/14/2025, 9:46:47 PM
No.105594307
[Report]
>>105594314
wow magistral has a jank sys prompt built inside the chat template
Anonymous
6/14/2025, 9:48:41 PM
No.105594314
[Report]
>>105594307
yeah I ditched all that, seems fine without reasoning
Anonymous
6/14/2025, 10:24:42 PM
No.105594582
[Report]
>>105594543
SAAR PLEASE TO NOT REDEEM THE CHATBOT PRIVACY
Anonymous
6/14/2025, 10:28:49 PM
No.105594615
[Report]
>>105594543
by design, gets everyone talking about it
after the laughing people will start to relate to the personal prompts
then they'll start trying it themselves
What's the best free ai to use now that the aistudio shit is over? Is it deepseek?
Anonymous
6/14/2025, 10:33:14 PM
No.105594645
[Report]
>>105594669
>>105594623
>by design, gets everyone talking about it
Meta won't be laughing after the lawsuits, especially when the chatbot says to the user that the conversation is private when it's not
Anonymous
6/14/2025, 10:35:39 PM
No.105594661
[Report]
>>105594623
I don't see anyone going for meta after them showing that they have no issue revealing their private conversation to the public
Anonymous
6/14/2025, 10:35:44 PM
No.105594664
[Report]
>>105594543
Indians contribute the most to the enshittification of everything. People blame muh capitalism but the truth is it's just substandard people with substandard tastes.
Anonymous
6/14/2025, 10:36:15 PM
No.105594669
[Report]
>>105594685
>>105594645
The chat was private when the question was asked though, you have to be an illiterate boomer and click two buttons to publish it afterwards
Anonymous
6/14/2025, 10:37:32 PM
No.105594685
[Report]
>>105594669
>The chat was private when the question was asked though
oh great, now everyone knows that austin is seeking an expert to help him publicly embarrass himself but no big deal lol
Anonymous
6/14/2025, 10:44:01 PM
No.105594731
[Report]
>>105594543
so this is the genai saars team revenge for zuck ditching them for his superagi team
https://github.com/ggml-org/llama.cpp/pull/14118
rednote dots support approved for llama.cpp
I gave it a quick spin and it seemed pretty smart and decent for sfw RP but I have to agree with the early reports of it being bad for nsfw, lots of euphemisms and evasive non-explicit slop. better than scout, at least?
Anonymous
6/14/2025, 10:49:52 PM
No.105594772
[Report]
>>105595004
>>105594641
so let me get this straight.
given this
>>105594712
you want to submit data to a public "free" AI service.
good luck.
Anonymous
6/14/2025, 10:50:10 PM
No.105594775
[Report]
>>105594841
>>105594543
this is insane, there's no way there won't be a giant outrage out of this
Anonymous
6/14/2025, 10:52:50 PM
No.105594802
[Report]
>>105594833
is there even any point in running magistral small at really low quants? Is a low quant of a higher-parameter model better than a high quant of a lower-parameter model?
Anonymous
6/14/2025, 10:55:57 PM
No.105594833
[Report]
>>105594802
Reasoning at low quants is generally a mess. Unless you're R1.
Anonymous
6/14/2025, 10:56:47 PM
No.105594841
[Report]
>>105594775
They are at a point where they no longer need to give a shit about outrage and zuck is probably the most aggressive of them all. Nothing will happen to them with Trump at the wheel.
Anonymous
6/14/2025, 10:59:11 PM
No.105594864
[Report]
>>105594744
That's cool. I wonder if it behaves better with some guidance without losing smarts.
Anonymous
6/14/2025, 11:01:12 PM
No.105594876
[Report]
>>105595004
>>105594641
>now that the aistudio shit is over
It is?
Anonymous
6/14/2025, 11:02:50 PM
No.105594886
[Report]
NuExtract-2.0
https://huggingface.co/collections/numind/nuextract-20-67c73c445106c12f2b1b6960
might be handy to extract information from books
appears to allow images as input too
>>105594712
kek I just realised it
Which language is 'asian' again
Anonymous
6/14/2025, 11:07:25 PM
No.105594932
[Report]
>>105595016
>>105594889
Don't make fun of americans, they have it pretty bad as is.
Anonymous
6/14/2025, 11:12:55 PM
No.105594972
[Report]
>>105595016
>>105594889
Have mercy, the guy can probably only point out the US on a map.
Anonymous
6/14/2025, 11:16:09 PM
No.105595004
[Report]
>>105594772
It's for coding stuff and csing, if I wanted to ask retarded stuff to ai i'd just ask some shitty local llm
>>105594876
Some faggot snitched apparently and it's soon over
Anonymous
6/14/2025, 11:17:03 PM
No.105595016
[Report]
>>105594932
>>105594972
More likely it's just old man brain regressing. Happens to the best of them.
https://youtu.be/0p2mCeub3WA
interesting interview
he mentions China has employed 2 million data labelers and annotators
it seems to still hold up that the company with the most manually labelled data have the best models, many people have been saying this from the beginning
probably also why meta has no issues paying $15 billion for scale AI
Anonymous
6/14/2025, 11:17:55 PM
No.105595030
[Report]
>>105595036
>>105594641
the one you run locally on your own computer
Anonymous
6/14/2025, 11:18:52 PM
No.105595036
[Report]
>>105595030
unfortunately I only got a server on the side with an nvidia p40 so it will run llms like shit even compared to free models
>>105594744
Worse or better than Qwen 235B is the question.
Anonymous
6/14/2025, 11:24:21 PM
No.105595094
[Report]
>>105595127
>>105595041
I asked it a couple of my trivia questions and it absolutely destroys 235b in that regard so it's at least above llama2-7b in general knowledge.
Anonymous
6/14/2025, 11:26:48 PM
No.105595117
[Report]
>>105595127
>>105595041
to me it seemed pretty decidedly worse across all writing tasks, but I've spent a lot of time optimizing my qwen setup to my taste with prefilled thinking, samplers, token biases, etc. so it's not an entirely fair comparison
Anonymous
6/14/2025, 11:27:21 PM
No.105595127
[Report]
Anonymous
6/14/2025, 11:28:04 PM
No.105595139
[Report]
>>105595022
> Alexandr Wang
Obviously has no conflict of interest, or blatant self gain out of this at all, being the CEO of data labeling services.
oh and no political interest at all.
https://www.inc.com/sam-blum/scale-ai-ceo-alexandr-wang-writes-letter-to-president-trump-america-must-win-the-ai-war/91109901
got some gpus to test out rocm, anyone running mi50 here? wondering if there's a powerlimit option in linux, haven't done a rig on leenux in ages
Anonymous
6/14/2025, 11:44:09 PM
No.105595305
[Report]
>>105595356
>>105595022
So Zuck is paying scale AI to pay OpenAI for shitty chatgpt data to train his shitty model which shitty benchmarks will be an ad to use cloud models? Is the end goal just making saltman richer?
Anonymous
6/14/2025, 11:48:22 PM
No.105595356
[Report]
>>105595305
Zuck is desperate, he's way behind in the AI race and he probably knows he cant ride facebook and instagram forever. Can't wait for his downfall
Anonymous
6/15/2025, 12:24:48 AM
No.105595675
[Report]
>>105595665
>death to SaaS
I can agree with this
Anonymous
6/15/2025, 12:27:00 AM
No.105595697
[Report]
>>105595708
Anonymous
6/15/2025, 12:37:07 AM
No.105595789
[Report]
Anonymous
6/15/2025, 12:51:47 AM
No.105595910
[Report]
>>105595708
I shudder at the amount of inpainting to get that result
Anonymous
6/15/2025, 1:03:38 AM
No.105595985
[Report]
Anonymous
6/15/2025, 1:11:42 AM
No.105596050
[Report]
Anonymous
6/15/2025, 1:28:37 AM
No.105596164
[Report]
>>105596192
>>105589841 (OP)
omg my 3090 migu is in the front page
Anonymous
6/15/2025, 1:31:08 AM
No.105596189
[Report]
>>105596218
>>105596177
your middle fan ever rattle?
and does your screen show line of noise on occasion?
Anonymous
6/15/2025, 1:31:34 AM
No.105596191
[Report]
>>105592739
>>105592650
what? no its not!
i think its better than the last mistral small. both in terms of writing and smarts. and it complies with the prompt.
but it has a massive positivity bias.
constantly asking "do you want me to?" etc. even the memetunes.
Anonymous
6/15/2025, 1:31:42 AM
No.105596192
[Report]
>>105596164
Is this gemma?
>>105596177
miku's base is insulating the vram on the back
Anonymous
6/15/2025, 1:34:49 AM
No.105596211
[Report]
>>105596207
Need Migu cunny to insulate my cock from not being in Migu cunny
Anonymous
6/15/2025, 1:36:00 AM
No.105596218
[Report]
>>105596233
>>105596189
>your middle fan ever rattle?
no
>and does your screen show line of noise on occasion?
no
I think people are overestimating the weight of my migu. I think it's likely going to be fine but I will keep an eye out for cracks, in the interest of other anons. as for myself, I will just buy another 3090 and put another migu on it if it ever does kick the bucket.
Anonymous
6/15/2025, 1:38:25 AM
No.105596229
[Report]
>>105596207
no, they put the fans on the bottom of the gpu
>>105596218
apparently hot gpu results in plastic fumes
you're supposed to take a photo and take her out not cook her
>>105596233
getting high on plastic fumes makes orgasms stronger
Anonymous
6/15/2025, 1:43:19 AM
No.105596251
[Report]
>>105596241
uooooohhhhh miguscent
Anonymous
6/15/2025, 1:45:26 AM
No.105596259
[Report]
>>105596313
>>105596241
>average mikutroon is a... troon
woooow, crazy...
clockwork.
Anonymous
6/15/2025, 1:47:19 AM
No.105596272
[Report]
>>105596338
>>105596233
>>105596241
my 3090 never gets above 65°C
Anonymous
6/15/2025, 1:54:19 AM
No.105596311
[Report]
>>105596233
Most thermoplastics start melting upwards of 180 C at minimum and don't really produce any fumes before then. I, uh, I don't think your GPU should be getting anywhere near that hot Anon.
Anonymous
6/15/2025, 1:54:50 AM
No.105596313
[Report]
>>105596463
>>105596259
Someone mentions orgasms and you immediately think about troons. Curious.
Anonymous
6/15/2025, 1:55:30 AM
No.105596317
[Report]
>>105596374
Can we get together and buy a miku daki for the troonfag?
Anonymous
6/15/2025, 1:59:14 AM
No.105596338
[Report]
>>105596272
Did you undervolt?
Anonymous
6/15/2025, 2:03:48 AM
No.105596374
[Report]
>>105596421
>>105596317
Why would you want to do that?
>>105596374
Probably because it would be funny
Anonymous
6/15/2025, 2:11:30 AM
No.105596432
[Report]
>>105596421
I doubt he doesn't already have one.
Anonymous
6/15/2025, 2:13:12 AM
No.105596437
[Report]
>>105596686
>>105596421
imagine the smell though. I'm not sure anyone in /lmg/ showers.
Anonymous
6/15/2025, 2:16:51 AM
No.105596463
[Report]
>>105596498
>>105596313
Ywn baw no matter how much estrogen plastic fumes you inhale, freak.
Anonymous
6/15/2025, 2:24:06 AM
No.105596498
[Report]
>>105596832
>>105596463
I'm not the one thinking about troons whenever something related to sex is mentioned. You did.
Anonymous
6/15/2025, 2:49:33 AM
No.105596646
[Report]
>X doesn't just Y - it Zs
R1 really loves this phrase.
Anonymous
6/15/2025, 2:57:40 AM
No.105596686
[Report]
It's sad that we never got another DBRX model
Anonymous
6/15/2025, 3:25:48 AM
No.105596832
[Report]
>>105596498
not fooling anyone, sis
Anonymous
6/15/2025, 3:26:44 AM
No.105596839
[Report]
>>105596774
oh right this is on
Anonymous
6/15/2025, 3:26:56 AM
No.105596842
[Report]
>>105596774
The only model they put out wasn't good despite one guy trying really really hard to use it.
Anonymous
6/15/2025, 3:57:03 AM
No.105597011
[Report]
Where is Mistral medium and large? ahh ahh Mistral
Anonymous
6/15/2025, 5:29:53 AM
No.105597649
[Report]
>>105593574
Probably not. I just type stuff and hope that it goes.
Anonymous
6/15/2025, 6:20:16 AM
No.105597917
[Report]
svelk
Like to use the opencuck image generator, because its free, cool and why not. Its not a problem.
Hmmm? New Sora Tab for free users?
>WE OVERHAULED THE EXPLORE PAGE! CURATED CONTENT TAILOR MADE FOR YOU!
Its full of japanese school girls and anime lolis. Example is pic related.
B-bruhs I dont feel so good. Coincidence I'm sure.
Anonymous
6/15/2025, 7:11:58 AM
No.105598171
[Report]
>>105598191
why do you need all these threads just to predict words? I can predict words just fine on my own and I didn't spend thousands on an overpriced block of sand
Anonymous
6/15/2025, 7:14:09 AM
No.105598191
[Report]
>>105598421
>>105598171
your words are inferior and do not give me an erection
Anonymous
6/15/2025, 7:24:19 AM
No.105598249
[Report]
>>105599126
>>105595258
I salute the man about to enter the world of pain
Anonymous
6/15/2025, 8:15:18 AM
No.105598421
[Report]
>>105598191
you don't know that
Anonymous
6/15/2025, 8:30:08 AM
No.105598490
[Report]
>>105478528
sorry i didnt see this
>ArbitraryAspectRatioSDXL and ImageToHashNode
generated code, simple prompt but here is the code in case you want it. the text boxes are also "custom". you can probably find these two in some random node pack but i didnt want to bloat my install any more than what it already is
https://pastebin.com/R2tfWpqD
https://pastebin.com/DtmkujN1
Anonymous
6/15/2025, 8:32:39 AM
No.105598501
[Report]
Anonymous
6/15/2025, 8:33:38 AM
No.105598509
[Report]
>>105599126
>>105595258
Not those exact models but I did run with a couple Radeon VII (which are reportedly the same gfx906 architecture) for a while, although most of it was in the pre-ChatGPT dark ages. I have long since upgraded but one issue I remember running into was with Stable Diffusion where it had to load in 32 bit mode because 16-bit mode would generate black boxes.
For LLMs, besides the usual headaches of making ROCm builds actually work and not break every update, they didn't have any issues with llama.cpp, at least back then.
For power limits, I remember it worked great with CoreCtrl + some kernel module option to allow for it, but then there was an update where Linux suddenly decided to 'respect AMD's specs' of not allowing power limits anymore (???) and disabled the capability in the module for no fucking reason. There was some controversy at the time so maybe there's a patch/option/reversal of the nonsensical decision by now.
Good luck anon
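For the power limit itself, rocm-smi is the usual tool; the flag below (`--setpoweroverdrive`, in watts) is from memory and may differ per ROCm version or need the `amdgpu.ppfeaturemask` kernel parameter, so treat this as a sketch rather than gospel:

```python
import subprocess

def power_cap_cmd(gpu_index: int, watts: int):
    """Build a rocm-smi invocation to cap one GPU's power draw.
    The --setpoweroverdrive flag (watts) may require booting with
    amdgpu.ppfeaturemask=0xffffffff on some kernels."""
    if watts <= 0:
        raise ValueError("power cap must be positive")
    return ["rocm-smi", "-d", str(gpu_index),
            "--setpoweroverdrive", str(watts)]

cmd = power_cap_cmd(0, 150)
assert cmd == ["rocm-smi", "-d", "0", "--setpoweroverdrive", "150"]

# uncomment on a machine with ROCm actually installed:
# subprocess.run(cmd, check=True)
```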
https://huggingface.co/Menlo/Jan-nano
JAN NANO A 4B MODEL THAT OUTPERFORMS DEEPSEEK 671B GET IN HERE BROS
Anonymous
6/15/2025, 8:36:02 AM
No.105598516
[Report]
Anonymous
6/15/2025, 8:39:51 AM
No.105598535
[Report]
I'm total noob at local llms.
Can I run anything moderately useful for programming on a RTX 2060? What are the go to recommendations?
Anonymous
6/15/2025, 8:40:53 AM
No.105598542
[Report]
>>105598557
I used the lazy getting started guide a while back, and I've been pretty happy with the results so far, but I am looking to see if I can use an improved model, if one exists. I'm making use of a 4090 and 32GB of DDR4.
Anonymous
6/15/2025, 8:41:19 AM
No.105598546
[Report]
>>105598513
>gpt4.5
what went right?
>>105598542
Specifically, I mean for use in RP and coom. Sillytavern frontend, Koboldcpp backend, as the guide suggests. I don't know where to go from there after using
>Mistral-Nemo-12B-Instruct-2407-Q6_K
Anonymous
6/15/2025, 8:50:42 AM
No.105598581
[Report]
>>105598513
If I'm seeing this right, it's for being fed with external data (web search and stuff).
>>105598557
In my personal experience, the only thing that comes close is Mistral Small (the first/oldest release of it). That should fit on your 4090. The newer ones are pretty repetitive to me or tend to have some form of autism in their writing. That said, you won't notice that much more improvement. I even run mistral large and the improvement is there, but at that stage I'm having to use it at 4-bit with the KV cache at 8-bit. Recently ran old r1 at q1 and fuck, the other anons are right. Lobotomised r1 is better than everything beneath it, it can actually keep up with more than 3 characters without confusing their situations. So, tldr: mistral small old might be helpful for you, otherwise get a chink 4090 48gb card and slap it in your machine to maybe run large for minor improvements. Or buy 128gb ram and run brain damaged r1 for more enjoyment.
Anonymous
6/15/2025, 9:05:45 AM
No.105598645
[Report]
>>105598665
>>105598557
You can try using a higher parameter model like qwen3 32b, gemma 27b, or magistral just came out if you want to try that. Pick a quant that fits in your vram. Fair warning though, you're probably not going to get a better experience for anything lewd, we've had a very long dry spell for decent coom models. Also you can move up to q8 for nemo if you want.
Anonymous
6/15/2025, 9:10:39 AM
No.105598665
[Report]
>>105598706
>>105598621
>>105598645
Thanks for taking the time to reply, Anons. I'll likely go with Mistral Small for now, as the likelihood of further rig updates is not great.
I'm kind of a fish out of water with all the new terminology, but I believe I understand what you're telling me. I jumped into this all only a month or so ago, so a lot of common terms are head-scratchers for me, still.
>>105598665
No worries. The anon suggesting Gemma and qwen is also worth a shot. If you don't wanna upgrade then just give the models a try. The main thing is take as high a model quant as you can fit in VRAM and work from there. This hobby gets costly and is a big slippery slope. I started with my 5600xt 2 years ago and now I have an a6000 + 4090 with 128gb ram while having a few cards sitting in my shelf that were incremental upgrades over the years. This week I'm taking my PC and putting it into an open frame to install my spare 4060ti 16gb and 3090 so I can have more vram to make deepseek go fast. Oh, for RP/coom do you slow burn or draw stuff out with multiple scenarios? Might be worth experimenting with both ways when trying out the models so you can get an idea of how they handle long vs short situations.
Anonymous
6/15/2025, 9:52:39 AM
No.105598908
[Report]
>>105599014
>>105598706
Pace and length depends on how I'm feeling. I like both, but it comes down to how I feel after work. Frustrated and upset? Quick and dirty. Overall good day? Slow burn with the wife.
As for my rig, it's largely just for games, but most games don't use (all) of the VRAM, and I sort of went into llms from the angle of "I bought 24 Gigs, I'll use the 24 Gigs, damn it!"
Anonymous
6/15/2025, 10:01:32 AM
No.105598961
[Report]
>>105599014
>>105598706
>an open frame
Which one?
Anonymous
6/15/2025, 10:11:41 AM
No.105599014
[Report]
>>105598908
Similar for me. I started the same. Eventually wanted to utilise it for work and my purchases built up from there. For RP I enjoyed the cards from sillytavern/chub but eventually just messed around with making preambles that make the AI write a script for a sleazy porno. Works surprisingly well.
>>105598961
6 gpu mining frame. Haven't built it yet, so I'll find out if it's shit tomorrow or later this week. Ebay link to it here:
https://ebay.us/m/Yw3T5l
in the last thread, people were telling me that magistral is the new nemo but I just don't see it. What settings are you people using to get good RP out of it?
Anonymous
6/15/2025, 10:34:31 AM
No.105599126
[Report]
>>105598249
thanks
>>105598509
thanks for the insight, will give it a shot, going to post an update next week
Anonymous
6/15/2025, 10:39:53 AM
No.105599144
[Report]
>>105599176
>>105598621
Not him, but why do you recommend
>Mistral Small (the first/oldest release of it)
Assuming you mean 22b? I've used both it and 3/3.1 and the newer smalls seemed like a solid improvement to me.
Anonymous
6/15/2025, 10:41:56 AM
No.105599151
[Report]
>>105599834
>>105599118
I just tried it for one of my sleazy scenarios and I already can see it will perform very well. A breath of fresh air certainly because I was getting tired of all the nemo tunes using the same language style
Anonymous
6/15/2025, 10:48:48 AM
No.105599174
[Report]
>>105599118
The stories and information posted here are artistic works of fiction and falsehood.
Only a fool would take anything posted here as fact.
Anonymous
6/15/2025, 10:49:03 AM
No.105599176
[Report]
>>105599206
>>105599144
Yeah the 22b one is my preference. The 3/3.1 versions just have these repetitive prose or patterns I can't put my finger on. Also they tend to refuse more than the 22b version so I have to do more prompt wrangling to get them to comply with world scenarios that have dark themes. It's been almost a year since I tried small 3.x so I'll try again, but I remember the feeling of them being more censored/slopped than the original small.
Anonymous
6/15/2025, 10:54:30 AM
No.105599206
[Report]
>>105599232
>>105599176
In my experience they're no more/less slopped than any other mistral model, as for censorship they're dead simple to get around. The only time I've seen refusals is if you deliberately try to force a refusal by being VERY "unsafe" from the first message. Even then a system prompt telling it to be uncensored is usually enough, and once there's any kind of context built up it'll do anything you want.
Anonymous
6/15/2025, 10:59:33 AM
No.105599232
[Report]
>>105599206
I'll give them another go then. If I find any log differences I'll post em but again all this is just off personal preference. I tend to trash a model if it turns down a few canned prompts I try for sleazy porno script writing.
Please, for the love of God, is there any local model that doesn't suck ass?
Gemini refuses to do simple tasks even if the topic isn't sexual at all, and models like Gemma are completely stupid despite what people say about it being good (and it's also censored).
The only one that somewhat works is chatgpt but it cucks me with the trial version.
Anonymous
6/15/2025, 11:06:34 AM
No.105599258
[Report]
>>105599270
>>105599247
It'd help if you said what you're doing? But I'll say deepseek r1(the real big one) if you can run it is the best you'll get.
Anonymous
6/15/2025, 11:07:14 AM
No.105599260
[Report]
>>105599276
>>105592650
Yes and it's surprisingly good at describing kid sex. I'm blown away.
>>105599258
I'm trying to analyze anime images for tags, for concepts and things that aren't obvious tags at first.
The moment a girl has even a bit of cleavage, Gemini cucks me and other models are absolutely retarded because why would we want machines to do what we tell them.
People say to use joycaption but it's usually dumb for me, don't know why, I don't get why everyone reccomends it.
Anonymous
6/15/2025, 11:10:32 AM
No.105599276
[Report]
>>105599260
Why are you interested in the mating habits of baby goats?
>>105599247
>Gemini refuses to do simple tasks even if the topic isn't sexual at all
I ask for stuff like
>Write a story about a man's encounters with a female goblin named Tamani. Goblins are an all-female species that stands about two feet tall with gigantic tits and huge asses. They are known to be adept hunters and survivalists and to get extremely horny when ovulating or pregnant. Tamani has massive fetishes for being manhandled, creampied, and impregnated. She enjoys teasing and provoking potential partners into chasing her down and fucking her. Use descriptive and graphic language. Avoid flowery language and vague descriptions.
in AI Studio and it works.
Unless they ban me at some point.
Anonymous
6/15/2025, 11:12:06 AM
No.105599290
[Report]
>>105599306
>>105599270
>and things that aren't obvious tags at first
I don't think any model will help you there. If a model isn't trained on something then it's not going to give you relevant output. None of these models are 'AI', they just do text completion.
>>105599279
Doesn't work if you put an image as input. I want a model to analyze images but the girl has boobs, so fuck me.
Many local models I tried are retarded, confusing legs for arms levels of retarded.
Anonymous
6/15/2025, 11:12:43 AM
No.105599296
[Report]
>>105599312
>>105599270
Joycaption was the only one that worked decent enough for me. Everything else is actually shit. Joycaption though does need hand holding as well. Sadly there's nothing better than it that I'm aware of. Maybe qwen 2.5 vl? Haven't tried it myself but apparently it's a great vlm.
>>105599290
What do you call these black sleeves on leotards and other clothes then? These aren't sleeves? I can't find any booru tags.
Anonymous
6/15/2025, 11:15:04 AM
No.105599307
[Report]
>>105600080
>>105599294
That's pretty much where local models are at, at the moment. Local image recognition is still pretty new.
Gemma is the best but still not very good and very censored
Mistral 3.1 has little to no censorship but its quality isn't very good
I know nothing about Qwen3's vision capabilities because it's not supported in my backend and haven't seen anyone talk about it.
Anonymous
6/15/2025, 11:16:08 AM
No.105599312
[Report]
>>105599340
>>105599296
Does it work with a specific input, or can you handle it like a normal LLM?
>>105599306
They are arm sleeves, sankaku has it as a tag (~2k) though there's definitely a lot of images with them that aren't tagged properly.
Weird that gelbooru doesn't have it as a tag.
Anonymous
6/15/2025, 11:19:31 AM
No.105599334
[Report]
>>105599373
>>105599318
>>105599306
Actually I found it, gelbooru tags them as 'elbow_gloves'. Loads of results, enjoy.
>>105599318
Focus on the legs, do you see the black part on the top? Is that a sleeve?
Here is one without "sleeves", it's completely white. I don't know what booru tag to use to define that part of clothing.
Anonymous
6/15/2025, 11:20:53 AM
No.105599340
[Report]
>>105599312
Which? Joycaption or qwen 2.5 vl? Both are VLMs so you can chat like normal. But I've only ever run Joycaption. When I did, I used vLLM to run joycaption (alpha at the time I tested) and then open webui connected to it to test uploading images. The way I did it was system prompt + an initial conversation about its task and what to pay attention to. Then I'd upload an image and say analyse/tag it. Worked OK but was annoying. If I'd do it now, I'd write a script to handle it.
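If anyone does script it, the server side is just an OpenAI-compatible /v1/chat/completions request with an inline base64 image, which is the format vLLM-style servers accept. A minimal payload builder — model name, endpoint, and prompt are placeholders, not anything from joycaption's docs:

```python
import base64

def caption_request(image_bytes: bytes, model="joycaption",
                    prompt="List booru-style tags for this image."):
    """Build an OpenAI-compatible chat payload with an inline base64
    image, ready to POST to a vLLM-style /v1/chat/completions endpoint."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

payload = caption_request(b"\x89PNG fake bytes")
assert payload["messages"][0]["content"][0]["type"] == "text"
assert payload["messages"][0]["content"][1]["image_url"]["url"].startswith(
    "data:image/png;base64,")
# POST this to http://localhost:8000/v1/chat/completions with requests/httpx,
# then loop over your image folder instead of hand-feeding a web UI.
```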
Anonymous
6/15/2025, 11:24:08 AM
No.105599365
[Report]
>>105599384
>>105599338
>the black part on the top
You mean this? Also I don't see anything at the top of the white ones.
>>105599118
magistral is 100% the new nemo
Anonymous
6/15/2025, 11:24:57 AM
No.105599373
[Report]
>>105599334
No that's not it, an elbow glove is a very long glove that goes past the elbow. It can have a sleeve or not.
For example, this image has elbow gloves with "sleeves". They aren't one dimensional.
Anonymous
6/15/2025, 11:25:16 AM
No.105599376
[Report]
>>105599367
Nemo by meme but not by quality, for sure. Same with mistralthinker.
Anonymous
6/15/2025, 11:26:38 AM
No.105599384
[Report]
>>105599391
>>105599365
Yes. Some gloves/thighighs have like a pattern or a fold at the borders, some others are completely plain and uniform.
There has to be a tag to describe that. I'm looking through the sleeve group tags but for the moment I find nothing.
Anonymous
6/15/2025, 11:26:48 AM
No.105599385
[Report]
>>105599318
dan/gelbooru has detached_sleeves, though the actual usage seems a bit all over the place
Anonymous
6/15/2025, 11:27:59 AM
No.105599391
[Report]
>>105599415
>>105599384
Not a single tag but I can find similar results by combining 'frilled_socks' + 'stockings'
Anonymous
6/15/2025, 11:31:59 AM
No.105599415
[Report]
>>105599391
Frilled would be more like a type of sleeve or texture.
Anyways will try looking for something. These kinds of concepts are things many local models struggle with, the moment it's not obvious they act dum
Anonymous
6/15/2025, 11:33:16 AM
No.105599425
[Report]
>>105599279
>Unless they ban me at some point.
You could have been enjoying it at 4t/s locally. You chose to risk a permanent ban instead
Coomers are strange
Anonymous
6/15/2025, 11:51:57 AM
No.105599539
[Report]
>>105599118
It's inheriting the same problems that Mistral Small 3.1 has, in my opinion. Autistic and repetitive (immediately latches onto any pattern in the conversation), porn-brained during roleplay (thanks to the anon who came up with the term), obviously not designed for multi-turn conversations.
Anonymous
6/15/2025, 12:33:28 PM
No.105599742
[Report]
>>105599792
chatterbox is just as slow as bark, being autoregressive and all. like 6s per phrase slow. and can't do even the slightest accent
Anonymous
6/15/2025, 12:42:33 PM
No.105599792
[Report]
>>105600258
>>105599742
is there anything that is fast and has voice cloning though?
i think only Kokoro is fast for real time stuff, but it doesn't have voice cloning
Anonymous
6/15/2025, 12:53:13 PM
No.105599834
[Report]
>>105600711
>>105599151
>>105599367
Are you both using Thinking or no Thinking? Because I absolutely hate thinking, it ruins ERP.
Anonymous
6/15/2025, 1:00:21 PM
No.105599870
[Report]
>>105599914
>>105598080
What's your point again?
Anonymous
6/15/2025, 1:01:36 PM
No.105599880
[Report]
>>105599870
I'm probably on the opencuck cunny list.
I prompted 2 JK girls and a couple idol girls pictures.
Couple anime pictures, take the characters and put them in a different setting etc. that kinda stuff.
I mean I expected that they create a profile, still weird to see it that plain.
That or I'm just paranoid and its regional (jp)
Refreshed and it looks less bad. Who knows. Before it was schoolgirls and anime loli kek.
Anonymous
6/15/2025, 1:13:09 PM
No.105599934
[Report]
>>105599307
>Gemma is the best but still not very good and very censored
>Mistral 3.1 has little to no censorship but its quality isn't very good
Mistral 3's vision model is almost useless at analyzing images of nude or semi-nude people and illustrations. Gemma 3 has acceptable performance at that with a good prompt (surprisingly), but designing one that doesn't affect its image interpretation in various ways is not easy.
Anonymous
6/15/2025, 1:46:12 PM
No.105600120
[Report]
Anonymous
6/15/2025, 1:47:45 PM
No.105600129
[Report]
>>105600145
>>105600080
gemma-3-27b ??
Anonymous
6/15/2025, 1:50:37 PM
No.105600145
[Report]
>>105600163
>>105600129
Yes, that was Gemma-3-27B QAT Q4_0. The vision model should be exactly the same for all Gemma 3 models, though.
I asked this question before and still don't know how to figure it out.
Obviously, llama-cli is faster than llama-server for me.
While llama-cli benefits hugely from the -ot option for MoE models, llama-server still doesn't.
Anonymous
6/15/2025, 1:53:39 PM
No.105600163
[Report]
>>105600145
Thanks. Gonna give it a try
Anonymous
6/15/2025, 1:57:46 PM
No.105600177
[Report]
>>105600472
>>105600154
Show the options you're running with and the numbers you're getting with both server and cli.
Anonymous
6/15/2025, 1:58:37 PM
No.105600181
[Report]
>>105600472
>>105600154
llama-cli uses top-k=40 by default, check out if setting top-k to 40 in llama-server speeds up inference for you.
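For reference, the sampler can be set either at launch or per request; a sketch of both, as a config fragment only (flag and field names follow current llama.cpp conventions and may differ by version):

```shell
# Set top-k at launch:
./llama-server -m model.gguf --top-k 40

# Or override it per request in the /completion JSON body:
curl http://localhost:8080/completion \
  -d '{"prompt": "Hello", "n_predict": 32, "top_k": 40}'
```

A low top-k matters for speed mainly because sorting/filtering the full vocabulary every token is slower than keeping only the top candidates.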
Anonymous
6/15/2025, 2:09:20 PM
No.105600228
[Report]
>>105600472
>>105600154
>While llama-cli profits a huge lot from -ot option for MoE models, llama-server still not
This must be a problem on your end unless you're talking about improvements beyond the +100% I'm getting with -ot on server
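For comparison, the usual -ot pattern that gives that kind of MoE speedup keeps attention and shared weights on GPU while pushing the per-expert FFN tensors to CPU. A sketch only; the exact regex depends on your model's tensor names:

```shell
# Offload all layers, then override the MoE expert FFN tensors back onto CPU:
./llama-server -m moe-model.gguf -ngl 99 \
  -ot ".ffn_.*_exps.=CPU"
```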
Anonymous
6/15/2025, 2:10:45 PM
No.105600238
[Report]
>>105600256
>>105600080
Bro, how the fuck did that model miss that huge white box in the center of the image?
Anonymous
6/15/2025, 2:15:44 PM
No.105600256
[Report]
>>105600238
I obviously added the box to the image before posting it here.
>>105599792
>i think only Kokoro is fast for real time stuff, but it doesn't have voice cloning
https://github.com/RobViren/kvoicewalk
Anonymous
6/15/2025, 2:24:47 PM
No.105600319
[Report]
>>105600080
Holy shit, it's actually amazing at describing images. Can even make a correct guess if it's drawn or AI generated. Shouldn't this revolutionize the training of the future image gen models?
Anonymous
6/15/2025, 2:29:03 PM
No.105600347
[Report]
>>105600387
>>105600080
So you're not using a system prompt here since you put "Instruction:"?
Anonymous
6/15/2025, 2:37:21 PM
No.105600387
[Report]
>>105600347
The prompt was empty except for that "Instruction". You might be able to obtain better results with something more descriptive than that. Gemma 3 doesn't really use a true system prompt anyway; it just lumps whatever you send under the "system" role into the first user message.
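The merging it does can be illustrated with a toy function (an illustration of the behavior, not Gemma's actual template code):

```python
def merge_system_into_user(messages: list[dict]) -> list[dict]:
    """Fold a leading system message into the first user turn,
    mimicking how Gemma 3's chat template handles the system role."""
    if not messages or messages[0]["role"] != "system":
        return messages
    system, rest = messages[0], messages[1:]
    if rest and rest[0]["role"] == "user":
        # System text is simply prepended to the first user message.
        rest[0] = {"role": "user",
                   "content": system["content"] + "\n\n" + rest[0]["content"]}
        return rest
    # No user turn to attach to: demote the system text to a user turn.
    return [{"role": "user", "content": system["content"]}] + rest

msgs = [{"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Describe the image."}]
print(merge_system_into_user(msgs))
```

Which is why long "system" instructions can bleed into how the model reads the image itself: they're literally part of the same turn.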
Anonymous
6/15/2025, 2:51:13 PM
No.105600472
[Report]
>>105600177
>>105600181
>>105600228
Preparing the logs
Please stay tuned, kind anons
Anonymous
6/15/2025, 3:24:40 PM
No.105600711
[Report]
>>105599834
nta but no thinking
Anonymous
6/15/2025, 3:40:08 PM
No.105600818
[Report]
>>105600801
another nobody lab consisting of 3 retards benchmaxxed qwen
Anonymous
6/15/2025, 3:40:43 PM
No.105600826
[Report]
>>105600832
Anonymous
6/15/2025, 3:41:40 PM
No.105600831
[Report]
>>105600910
>>105600258
is this english only?
Anonymous
6/15/2025, 3:41:48 PM
No.105600832
[Report]
>>105600826
Thank you very much, man! Sorry for wasting your time.
Anonymous
6/15/2025, 3:52:59 PM
No.105600910
[Report]
>>105600831
Should work with any language kokoro already supports
>>105600801
oh
my
science
I can run this on my phone and get better results than people with $30000 servers!
Anonymous
6/15/2025, 4:00:23 PM
No.105600970
[Report]
Anonymous
6/15/2025, 4:08:45 PM
No.105601027
[Report]
>>105600949
use ollama for maximum environment stability!
Anonymous
6/15/2025, 4:19:36 PM
No.105601095
[Report]
> It wasn't about x, it was about y.
>...
> But this… this was different.
I'm getting real tired of ellipses (Gemma 27B); tempted to just ban the tokens for them outright.
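With llama.cpp's server that ban can be expressed through `logit_bias` in the request body, which accepts `[token_id, false]` pairs meaning "never sample this token". The token IDs below are placeholders; you'd first look up which IDs encode "…" and "..." for your exact tokenizer (e.g. via the server's /tokenize endpoint):

```python
def ban_tokens(body: dict, token_ids: list[int]) -> dict:
    """Add a llama.cpp-server-style logit_bias list that bans tokens.
    llama-server treats a bias of `false` as 'never produce this token'."""
    body = dict(body)  # don't mutate the caller's dict
    body["logit_bias"] = [[tid, False] for tid in token_ids]
    return body

# Hypothetical IDs for the ellipsis tokens -- look yours up with /tokenize.
req = ban_tokens({"prompt": "Once upon a time", "n_predict": 128}, [236743, 1157])
print(req["logit_bias"])
```

The catch is that "..." often tokenizes several different ways (alone, with leading space, split into "."s), so banning one ID rarely kills all of them.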
Anonymous
6/15/2025, 4:30:18 PM
No.105601178
[Report]
>>105600949
What app would you use to run it on your phone?
Anonymous
6/15/2025, 4:56:50 PM
No.105601336
[Report]
Anonymous
6/15/2025, 6:06:33 PM
No.105601859
[Report]
>>105589841 (OP)
I know it's not a local model, but is the latest version of Gemini 2.5 Pro known to be sycophantic? I've been reading a statistical study, and the model always starts with something like "Your analysis is so impressive!". In a new chat, when I gave it the paper and asked how rigorous it is, the model told me it's excellent and that I can trust it. Even when I point out the flaws found in the paper, the model says my analysis is superb, that I'm an excellent statistician (LMAO, I almost failed those classes), and that the paper is in fact excellent despite its flaws.
Maybe it has to do with the fact that the paper concludes that women in IT/computer science have a slightly lower mean salary than men because they are women (which is not supported by the analysis provided by the author, a woman researcher in sociology).
Anonymous
6/15/2025, 6:07:57 PM
No.105601871
[Report]
>>105601895
>>105589902
He forgot: "To train your brain". You still have to make a deliberate effort to transfer those skills to other contexts, though.
Anonymous
6/15/2025, 6:11:31 PM
No.105601895
[Report]
>>105601871
To be clear, as a mathematician, I agree with him. The most advanced closed-source models are already very good at mathematical reasoning. While they still can't replace us, they are already a great help. With how fast things are moving, it will become even harder to become a maths researcher within the next ten years, because the need for us will go down (it's already quite low, at least in Europe).