Thread 105611492

374 posts 78 images /g/
Anonymous No.105611492 >>105611887 >>105612859 >>105612968 >>105613219 >>105613273 >>105613788 >>105613888 >>105614932 >>105621307
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105601326 & >>105589841

►News
>(06/16) MiniMax-M1, hybrid-attention reasoning models released: https://github.com/MiniMax-AI/MiniMax-M1
>(06/15) llama-model : add dots.llm1 architecture support merged: https://github.com/ggml-org/llama.cpp/pull/14118
>(06/14) NuExtract-2.0 for structured information extraction: https://hf.co/collections/numind/nuextract-20-67c73c445106c12f2b1b6960
>(06/13) Jan-Nano: A 4B MCP-Optimized DeepResearch Model: https://hf.co/Menlo/Jan-nano

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105611494
►Recent Highlights from the Previous Thread: >>105601326

--Papers:
>105606869 >105606875
--Evaluation of dots.llm1 model performance and integration challenges in local inference pipelines:
>105601735 >105604736 >105604782 >105604857 >105604810 >105604838 >105605017 >105605319 >105605475 >105605551 >105605556 >105605609 >105605671 >105605701 >105605582 >105605670 >105605965
--llama-cli vs llama-server performance comparison showing speed differences and config inconsistencies:
>105601495 >105601540 >105601746 >105601830 >105601953 >105601967 >105602123 >105602170 >105602190 >105602380 >105601654
--Evaluating budget hardware options for local LLM deployment with portability and future model scaling in mind:
>105609676 >105609743 >105609808 >105609858 >105610000 >105610275 >105610095
--VideoPrism: A versatile video encoder achieving SOTA on 31 of 33 benchmarks:
>105610184
--Sugoi LLM 14B/32B released via Patreon with GGUF binaries and claimed benchmark leads:
>105606204 >105606305 >105606399 >105609562 >105609620
--Interleaving predictions from multiple LLMs via scripts or code modifications:
>105609453 >105609499 >105609500 >105609534
--Hailo-10H M.2 accelerator questioned for real-world AI application viability:
>105602205 >105602335
--Radeon Pro V620 GPU rejected due to driver issues and overheating in LLM use case:
>105603370 >105603394 >105603418 >105603454 >105603762 >105603893 >105604087
--Sycophantic tendencies in cloud models exposed through academic paper evaluation:
>105601903 >105602389 >105602410 >105602064 >105603398 >105603416
--MiniMax-M1, hybrid-attention reasoning models:
>105611241 >105611443
--Qwen3 models released in MLX format:
>105608806
--Miku (free space):
>105601934 >105603103 >105604354 >105604389 >105604736 >105605940 >105606009 >105606217 >105610016 >105610160 >105610284 >105610486 >105611108 >105611119

►Recent Highlight Posts from the Previous Thread: >>105601330

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105611523 >>105611563
>it's june 16, 2025 and there is STILL no minimax gguf
Anonymous No.105611524
Nothing ever happens.
Anonymous No.105611563 >>105611656 >>105612211
>>105611523
And there never will be. It uses lightning attention
https://github.com/ggml-org/llama.cpp/issues/11290
Anonymous No.105611583 >>105611602 >>105611680
>>105611471
here you go sar, they have a huggingface space
https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1
Anonymous No.105611602 >>105612438
>>105611583
>looking it up in my mind
Anonymous No.105611630
I see that Unsloth uploaded dots.llm1 quants within the last few hours. I've been waiting to try out this model. If I have 96GB VRAM, which is better: IQ4_XS, IQ4_NL, or UD-Q3_K_XL? These are the 3 that look like the largest size I can fit. Tbh I'm not even really sure what all these newer meme quants are or which is supposed to be best.
Anonymous No.105611651
does r1 pass the mesugaki test with the new version they released?
Anonymous No.105611656 >>105611692
>>105611563
>lightning attention
What's next? Bolt attention?
Anonymous No.105611662
totalen mikunigger death
Hi all, Drummer here... No.105611673 >>105611999
> Drummer's merge is already an improvement, yet retains most of Magistral's strengths.

>>105610116

Hey anon, which version did you use and what strengths are you referring to? Was reasoning good and useful?
Anonymous No.105611680
>>105611583
Here's a version with a bone thrown in.
Anonymous No.105611692
>>105611656
I'm holding out for thunder attention
Anonymous No.105611805 >>105611942 >>105611989
Anonymous No.105611838 >>105611985 >>105612989
F5 TTS has had a bit of an upgrade to inference speed recently, in case you haven't kept up with the updates. ~3 different perf updates:

>flash attention 2
>Empirically Pruned Step Sampling (lower number of steps for high quality output)
>Single transformation instead of a 2-step process (half the inference time required)
Anonymous No.105611862 >>105611874 >>105612298 >>105612316 >>105612335 >>105619697
https://huggingface.co/moonshotai/Kimi-Dev-72B
https://github.com/MoonshotAI/Kimi-Dev
>We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.
>Kimi-Dev-72B achieves 60.4% performance on SWE-bench Verified. It surpasses the runner-up, setting a new state-of-the-art result among open-source models.
>Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.
>Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.
Anonymous No.105611874
>>105611862
>code
>local
I sleep
Anonymous No.105611887 >>105611970 >>105612215
>>105611492 (OP)
Uh oh, the 24/7 seething fatoid disgusting ratoid troons transisters didn't like that one, huh?

You will
Never
Ever
Be a
Real
Woman

looooooooooool
Anonymous No.105611942
>>105611805
Yes (Aliens)
Anonymous No.105611970
>>105611887
Don't misunderstand. I don't project to be or even want to be a woman. That construct is entirely within your own ass, not even a snug fit, it's spacious.
Miku as a character or concept is irrelevant to me. I like her design and my perception of her is a convenient, often portable twintailed onahole. There is no wanting to be her, she is a sleeve for me to rub one out.
Hope that helps clarify. Goodness, you can't seem to kick the habit of malding.
Keep this up and you'll never get a kurisu.
Anonymous No.105611985 >>105612086
>>105611838
I've definitely seen a huge increase in speed for large chunks of text, with NFE=7:

>Declaration of Independence
>8188 characters
>Inference time: 77 seconds
>Output: 8 mins 49 seconds of audio

https://vocaroo.com/1oiTcWPgdj6i
Anonymous No.105611989 >>105612018
>>105611805
The second option is a trick, we all know she's not wearing any.
Anonymous No.105611999
>>105611673
Do you know how to properly fine tune MoE models?
Anonymous No.105612018
>>105611989
ah! caught!
Anonymous No.105612086
>>105611985
This is with a 2070 btw. So anyone with a better GPU can double/triple the inference speed.
Anonymous No.105612097
>Looking up (in my mind) some sources
Minimax-M1 knows Teto's birthday and that she's a UTAU. It would be a disappointment if it did not, given its size.
this single point of knowledge is irrefutable evidence that proves that the model is good. we'll be back.
Anonymous No.105612211
>>105611563
>never
but that was closed saying it could be revisited after refactoring, and they seemed to later do the refactor here:
https://github.com/ggml-org/llama.cpp/pull/12181
>Introduce llm_memory_i concept that will abstract different cache/memory mechanisms. For now we have only llama_kv_cache as a type of memory
and looks like work has picked up on other models with competing cache mechanisms (mamba etc.)
https://github.com/ggml-org/llama.cpp/pull/13979

now we just need someone with motivation, a vibe coding client, and good enough prompt engineering skills to revisit minimax and we're fucking IN
Anonymous No.105612215 >>105612227
>>105611887
Okay schizo
Anonymous No.105612227
>>105612215
>ACK
Anonymous No.105612268 >>105612298
https://huggingface.co/moonshotai/Kimi-Dev-72B
Anonymous No.105612285 >>105612413
>>105610392
don't you need to enable tool use or something like that? are most engines compatible?
Anonymous No.105612298 >>105612305 >>105612404
>>105612268

please search before posting
>>105611862
Anonymous No.105612305
>>105612298
nah go shove a janny dilator up your holes though ;)
Anonymous No.105612316 >>105612363 >>105612598 >>105619697
>>105611862
>qwen 2 finetune
*yawn*
Also, apologize for Devstral
Anonymous No.105612335
>>105611862
do they actually aim for the moon?
Anonymous No.105612363
>>105612316
most meaningless graph award
Anonymous No.105612404
>>105612298
ywnnaj
Anonymous No.105612413
>>105612285
"tool use" is just sending a json object of available tools in the context and executing whatever tool the model invoked in its reply. That's entirely up to the client making the requests to abstract away. I mostly use llama-server, but any engine that exposes an OpenAI-compatible API should work.
Anonymous No.105612438
>>105611602
deepseek (through web) said "making a mental note" to me recently; hadn't seen that before.
Anonymous No.105612527 >>105612561 >>105612568
>>105610000

unfortunately yes, between rooms

>>105610095

that's great, but double the price. Also, I understand a 16GB card can only load small models (but could be used for diffusion)
Anonymous No.105612561 >>105615772
>>105612527
Literally any pc case is portable "between rooms"
Anonymous No.105612568 >>105615772
>>105612527
>portable between rooms
for what motherfucking purpose?
Anonymous No.105612598
>>105612316
DID THE PARETO FRONT JUST DO WHAT I THINK IT DID?
Anonymous No.105612703 >>105612715 >>105612769
hello saaars
haven't been keeping up with lmgs since deepsneed released, what's the current meta?
Anonymous No.105612715 >>105612729
>>105612703
deepsneed or nemo.
Anonymous No.105612729
>>105612715
This.
sage No.105612769
>>105612703
deepsnemo
Anonymous No.105612859 >>105612884 >>105612888 >>105613005 >>105614385 >>105617634
>>105611492 (OP)
She's sexy. Can I look like that? Is there any tech for that?
Anonymous No.105612873 >>105612995 >>105613064 >>105613075 >>105613470 >>105613494 >>105613982
how are LLMs at femdom? beyond the cursory stuff like verbal degradation and humiliation, can they lean more into the power dynamic side and give you orders and encouragement, dictate what you eat, how you dress, more control yet still nurturing?
asking for a friend
sage No.105612884 >>105613005
>>105612859
yes.
Anonymous No.105612888 >>105613149
>>105612859
arch linus
Anonymous No.105612968 >>105613058 >>105613077
>>105611492 (OP)
>Looks at news
>Nothing but small models and research stuff that can't RP worth shit
Is it over?
Anonymous No.105612989 >>105613222 >>105613873
>>105611838
F5R-TTS is better
Anonymous No.105612995
>>105612873
no
Anonymous No.105613005 >>105613030
>>105612859
You need to reroll your char. See this: >>105612884
Anonymous No.105613030 >>105613139
>>105613005
The cooldown for rerolling again is kinda long and early game is ass.
Anonymous No.105613058 >>105613759
>>105612968
We got Magistral and dots last week.
Anonymous No.105613064 >>105613087
>>105612873
tell your friend he has mommy issues
Anonymous No.105613075
>>105612873
There is a trick to it. Tell it to roleplay as a wealthy sadistic werewolf millionaire that inexplicably fell in love with his 5/10 average unassuming secretary. Then use an agent to rewrite what it wrote and swap werewolf with dommy mommy of your choice.
sage No.105613077
>>105612968
2mw until deepsex V4
Anonymous No.105613087 >>105613110 >>105614204
>>105613064
high ground? here? are you actually serious
Anonymous No.105613110
>>105613087
Yes. It is over anakin. Take your mikutroons and walk into the lava yourself
Anonymous No.105613139 >>105613241
>>105613030
Yes, this is a problem. I want to look like her in 3-4 years.
Anonymous No.105613149
>>105612888
when regular linus isn't strong enough, take it to the arch linus.
Anonymous No.105613219
>>105611492 (OP)
sex with miku
Anonymous No.105613222
>>105612989
Thanks but hows the performance between the two?
Anonymous No.105613241
>>105613139
You will never be 2d, anon
Anonymous No.105613273
>>105611492 (OP)
alice.gguf when?
lmg activate the insider agent and leak it
we will finish her training with antisemitic propaganda and gpu bukkake
Anonymous No.105613312 >>105613348 >>105613349 >>105613389
/lit/fag here. I'm working on an "offline MUD" of sorts and need a writing buddy to ping pong ideas. Chatgpt is good enough but I'm interested in fine-tuning.
What model would you guys recommend for my setup?
>ryzen 7 5700G
>32 GB
>no GPU
Anonymous No.105613346 >>105613377 >>105613382 >>105613760
>do the mesugaki test with R1-0528
>explains it flawlessly, with examples
>fucking ends the message asking me if I want nh codes to illustrate that
>would've posted the log but anons over in aicg got banned for less than that
this is why the chinks are gonna win the ai race
Anonymous No.105613348 >>105613546
>>105613312
SmoLlm-0.15B
Anonymous No.105613349 >>105613546
>>105613312
>fine-tuning
Rent hardware. You're not doing anything with that.
Anonymous No.105613377 >>105613930
>>105613346
anon, the ai models are supposed to do the hallucinations, not you.
Anonymous No.105613382
>>105613346
>>fucking ends the message asking me if I want nh codes to illustrate that
No way.
Anonymous No.105613389 >>105613546
>>105613312
>fine-tuning
just tell the model at the start what you want and how to write
https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_S.gguf
https://huggingface.co/bartowski/google_gemma-3-27b-it-qat-GGUF/blob/main/google_gemma-3-27b-it-qat-Q4_0.gguf
Anonymous No.105613432
What coomodel is good these days for a 24GB vramlet?
Anonymous No.105613470 >>105613951
>>105612873
I've tried various loli dom/yuri s&m scenarios since GPT-3 came out in 2020.
Often it works poorly, needing hand holding. Closed stuff like OpenAI had too much positivity bias.
Open models like Llama were not smart enough and needed hand holding, thus ruining it; same for most small ones.
Closed stuff like Claude (Opus, sometimes Sonnet) would manage it somewhat.
(Open) Command-R managed a bit, but needed hand holding and was very schizo.
From open models, DeepSeek R1 manages it properly, but like most LLMs it will still by default jump on your dick or start sex too quickly; with careful prompting that explains the desired pacing in a few sentences it manages to ace this almost perfectly. It can go both fast and slow, and it leads the story by itself, keeping your immersion.
I'd say DeepSeek 3 (the first one) failed at it, but the update works. Both the new and old R1 work; the new one has a slower pace and the old one was more intense, but both are intense enough if prompted right.
Now maybe the model size is too much for most, but when you consider that closed stuff that did well like Opus 3 is dying ("deprecating") and OpenAI has some positivity bias that often ruins it, I'd say R1 is one of the very few that manage to do it right.
If you accept some degree of hand holding, smaller models like the 70b llama and some others managed partially, but considerably more poorly. I haven't seen anything in the 7-13b range manage.
I'm a bit interested in trying it with Magistral sometime, because I've noticed that R1 would sometimes make plans and I would pass some of those CoT plans back to it (selectively retaining think blocks), so that it can lead the story over many turns, which is a lot more fun than LLMs that forget what they were doing half a page ago or what they intended to do.
tl;dr: with careful prompting it works very well on some big models, mostly R1. DS3 sometimes works, but is gacha. everything else often needs hand holding.
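If anyone wants to replicate the think-block trick, a rough sketch (assumes R1-style <think>...</think> tags stored in the assistant turns; nothing here is any frontend's actual code):

import re

THINK = re.compile(r"<think>.*?</think>", re.DOTALL)

def trim_history(messages, keep_think_for=()):
    # strip <think> blocks from assistant turns, except the indices
    # whose plans we want the model to keep following
    keep = set(keep_think_for)
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i not in keep:
            m = {**m, "content": THINK.sub("", m["content"]).strip()}
        out.append(m)
    return out

# e.g. keep only the most recent plan:
# trim_history(history, keep_think_for=[len(history) - 1])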
Anonymous No.105613494 >>105613577
>>105612873
you have narrow shoulders and literally zero eyebrow line
Anonymous No.105613546 >>105613625
>>105613389
>20+B
I don't want to turn my toaster into a pressure cooker bomb.

>>105613348
>0.15B
Isn't that super small? Still, might be fun to play with, thanks.

>>105613349
Yeah, fine-tuning is probably not the right word. I just want some degree of control beyond setting temp and prompting. Pic related is more or less what I expect. Just a dumb box that churns out lore.
Anonymous No.105613577 >>105613608
>>105613494
are you erping with me?
Anonymous No.105613608
>>105613577
Colon three
Anonymous No.105613625 >>105613837
>>105613546
Qwen3-30b should give you ~10t/s at low context so it is very suitable for your machine, although I don't know if it is good for writing or whatever you are doing.
Anonymous No.105613690 >>105614491
>>105611419
My bad.

Let's enjoy another Chinese SOTA at 7t/s (ds-r1 runs at 4t/s)
Anonymous No.105613759
>>105613058
Didn't magistral suck, though?
Anonymous No.105613760
>>105613346
>nh codes to illustrate that
I believe it. Fucking normalfag shit.
Anonymous No.105613788
>>105611492 (OP)
This got me thinking about the calculation for minimum non-whore skirt length.
Anonymous No.105613814 >>105613915
How is dots for RP? 235b is too big for my rig, but dots seems like it could be a sweet spot.
Anonymous No.105613829
It's been quite some time since I've played with local models, has windows + amd gotten any better? It's a pain in the ass to have to boot up linux every time I want to rp
Anonymous No.105613837
>>105613625
Yeah I think fewer parameters but high context is going to work better for me. But gonna keep that in mind. Thanks.
Anonymous No.105613873
>>105612989
code to run it? I can only find papers
Anonymous No.105613874 >>105613979 >>105614033
>>>/v/712790873
Anonymous No.105613888 >>105614327 >>105614358
>>105611492 (OP)
I currently have a server with a ton of CPU cores and spare RAM, but it only has a 1050ti with 4GB VRAM in it. Is it even worth trying to run a local language model on it?
Anonymous No.105613915 >>105613962
>>105613814
it's noticeably pretty sterile when it comes to explicit nsfw, but that's nothing new for that general size range. if you're used to llama/qwen2.5 70b-class derivatives you'll feel right at home, but at least dots may be faster and have some more knowledge
Anonymous No.105613930
>>105613377
You killed me fucker, kek
Anonymous No.105613951 >>105614741
>>105613470
Are you doing a thesis on the topic or something?
Anonymous No.105613962 >>105613983
>>105613915
I liked 72b EVA-Qwen 2.5 at IQ4_XS, though it ran really slow on my system (1-2t/s). If this performs anything like that, but with the speed of a MoE, then it sounds like it's for me.
Anonymous No.105613963 >>105613987 >>105613989 >>105614351
I'm downloading Llama-4-Scout-17B-16E-Instruct-UD-TQ1_0.gguf.
Wish me luck.
I might not survive it.
Anonymous No.105613979 >>105614033
>>105613874
Gayest thread on /v/ right now.
Anonymous No.105613982
>>105612873
That's not femdom
Anonymous No.105613983
>>105613962
That's an RP tune.
Anonymous No.105613987 >>105614005
>>105613963
That's beyond being 'funny' bad. Also it's insane that unsloth's Scout repo has 100k downloads in the past month. That has to be wrong
Anonymous No.105613989 >>105614005
>>105613963
You'll survive fine anon, even the full precision model is shit and retarded
Anonymous No.105614005 >>105614130 >>105614248 >>105614351
>>105613987
>That's beyond being 'funny' bad
I know, pray for me.

>>105613989
I feel this one might be so bad as to be a cognitohazard.
We'll see.
Anonymous No.105614033 >>105614260
>>105613874
>>105613979
How did mikutroons become more mentally ill than furfags? At least those retards contribute to image gen and keep to their own degenerate communities instead of spamming the same generic dogshit of a waifu they obsess over everywhere because they have nothing else in their miserable life to attach to.
Anonymous No.105614104
hi anons, i know that this isn't the best thread to ask about commercial things, but... what are the services where I can deploy sdxl/etc finetuned models (anime ones) for easy API access? Obviously one choice is renting GPU servers on runpod/vast and so on, but are there any managed solutions? I don't think I need a dedicated GPU server to start, but eventually I guess I might need to generate up to 100 images/minute or something like that.
Anonymous No.105614114
AI generated post >105614104 btw
Anonymous No.105614130 >>105614148
>>105614005
>that pic
Baka, go back to /x/
Anonymous No.105614148 >>105614248 >>105614351
>>105614130
>105614104
I say this with utmost sincerity.
I've been on 4chan since 2008. I have been to /x/ maybe 5 times total.
As in, individual instances.
Anyhow, finished downloading it. Let's see what happens.
If I don't report back, please call my parents.
Anonymous No.105614204 >>105614222
>>105613087
to be fair, if you post on 4chan -- people WILL make fun of you. even if they're just as depraved.
but hey, i hope you find your perfect jerk-off mommy machine bro.
Anonymous No.105614222
>>105614204
s&m is vanilla compared to marrying a cartoon
that's not even a fetish, it's psychosis
Anonymous No.105614248 >>105614266 >>105614351
>>105614005
>>105614148
>load_tensors: layer 0 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 1 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 2 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 3 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 4 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 5 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 6 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 7 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 8 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 9 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 10 assigned to device CUDA0, is_swa = 1
Is this supposed to be a thing? Interleaved swa layers?
That's how they got """1 million""" context?
Anonymous No.105614260 >>105614372 >>105614527
>>105614033
Through the power of your butthurt, you have now summoned Migu.

I wonder if there will be a Blackwell card with 48GB? I don't need 96, and 32 just isn't enough. 48GB is about right. It just seems a little overboard to spend $8500 on a GPU.
Anonymous No.105614264 >>105614304
Can't wait for Minimax to get supported. I'll abandon Deepseek for it.
Anonymous No.105614266 >>105614351
>>105614248
Scout is 10 million sir.
Anonymous No.105614304
>>105614264
Buy an ad
Anonymous No.105614327
>>105613888
sure. Just run with llama.cpp in pure CPU inference mode (or really low context on the GPU)
It'll be a bit slow, but slow is ok for playing with. You'll be better off than desktop anons stuck with 128GB max RAM capacities that can't even run big models.
Anonymous No.105614351 >>105614374
>>105614266
All the same when it breaks down after 8k.

>>105613963
>>105614005
>>105614148
>>105614248
Yeah, it's real bad.
Worse than Qwen 3 30B. It can't even keep up with outputting a specific pattern that much smaller models handle just fine.
Amazing.
Anonymous No.105614358
>>105613888
1.5 t/s for deepseek quants
3 t/s for qwen3-235b

Your GPU will be used for prompt processing only
Anonymous No.105614372
>>105614260
I remember this Miku
>I don't need 96
yes you do. Big batch size genning of big Migus.
Anonymous No.105614374
>>105614351
breaks down much sooner than 8k even
https://github.com/adobe-research/NoLiMa?tab=readme-ov-file#results
Anonymous No.105614383
Less than two weeks until we get open source Ernie 4.5/X1
Anonymous No.105614385 >>105614481
>>105612859
Install gentoo
Anonymous No.105614479 >>105614491 >>105614515 >>105614535
Getting 10 t/s with dots on my 96 GB gaming rig and normal Llama.cpp with a custom -ot.
Anonymous No.105614481 >>105614507
>>105614385
>Gentroon
It is in the name. I have seen tech shrek. I won't be fooled.
Anonymous No.105614491 >>105614503
>>105614479
Post command params and which quant you use

Also, >>105613690
>another Chinese SOTA at 7t/s
I saw it coming
Anonymous No.105614497
One day a sex model will finally drop and i will be free from this place. I hope you all die the next day.
Anonymous No.105614503 >>105614519
>>105614491
Hold on. I'm actually failing to allocate more context. I was testing with only 2k initially. Damn does this model not use MLA or even GQA?
Anonymous No.105614507
>>105614481
Anonymous No.105614515 >>105614521
>>105614479
Well, is it good?
Anonymous No.105614519 >>105614529
>>105614503
>gayming
What GPU?
Anonymous No.105614521
>>105614515
Idk yet i just wanted to do an initial speed test first but trying to give more context is giving me the ooms.
Anonymous No.105614527
>>105614260
>butthurt
2007 called your hrt ass
Anonymous No.105614529 >>105614544
>>105614519
Just a 3090
Anonymous No.105614535 >>105614542
>>105614479
>with a custom -ot
Suggested by unsloth brothers?
Anonymous No.105614542
>>105614535
No? I am using unsloth's q4 quant though.
Anonymous No.105614544
>>105614529
On ik_llama-cli or the original?
Anonymous No.105614673 >>105614695
>https://github.com/ggml-org/llama.cpp/issues/14044#issuecomment-2961375166
>since it uses MHA rather than GQA or MLA
ACK
Anonymous No.105614695 >>105614739 >>105614871
>>105614673
GQA and its devil offspring are the true killers of soulful outputs. This was common knowledge back during the llama2 era and the first command-r was good because it used natural attention as well
Anonymous No.105614739
>>105614695
Ok but what if dots doesn't have soulful outputs
Anonymous No.105614741
>>105613951
sybau nigger, let the anon talk. Finally someone shares their own experiences instead of just shitposting
Anonymous No.105614758 >>105614873
Ok so it looks like I can't squeeze more than 11k context out of dots for the amount of memory I have, and now I am also getting 8.8 t/s (at 0 context, generating 100 tokens). Guess I'll test it a bit to see if it's worth downloading Q3 for.
Anonymous No.105614871
>>105614695
This. GQA kills that feeling that the model *gets* what you mean.
Anonymous No.105614873
>>105614758
thanks for the info
Anonymous No.105614895
we need some madlad company to get rid of the tokenizer and train the model on unicode
Anonymous No.105614932 >>105615858
>>105611492 (OP)
I'm completely new to this. Should I look further into lmgs if I don't care about chatting and image gen? Will I need a dedicated build or will my pc do?
Anonymous No.105614993 >>105615046 >>105615060 >>105615100 >>105615313 >>105619181
guys, what will save local?
Anonymous No.105615012 >>105615065
we are already saved.
Anonymous No.105615046
>>105614993
BitNet OpenGPT
Anonymous No.105615060 >>105615100
>>105614993
miku
Anonymous No.105615065
>>105615012
I don't feel saved
Anonymous No.105615090 >>105615112
Ok I tested dots and it's really meh. Feels like any other LLM really, and on top of that, the trivia knowledge is also not that good in my tests. No better than Gemma or GLM-4. MAYBE a bit better than Qwen. What trivia did people test that it had better performance on? It didn't do better on the /lmg/ favorites like mesugaki at least, nor on my personal set.
Anonymous No.105615100
>>105614993
New paradigm. Eternal waiting room until then. Every possible LLM sucks. It's over.
More realistically I'd like to see more online learning experiments or papers. Like a live feedback thumbs up/down. Not to save local, but to keep my own curiosity alive even if the resulting implementations make the models retarded, slow, broken or anything. Something new to play with.
>>105615060
Miku's love
Anonymous No.105615106 >>105615149
Can I generate porn and or 3d models on a 3090Ti?
Anonymous No.105615112
>>105615090
>What trivia did people test that it had better performance on?
Even hilab admits the only thing it has better knowledge of is Chinese language trivia knowledge. For anything else it's beaten by fucking Qwen 2.5 72B.
Anonymous No.105615149
>>105615106
No, you're retarded
Anonymous No.105615313
>>105614993
openai's SOTA phone model will shift the pareto frontier of speed x safety
Anonymous No.105615335 >>105615373
Trying to use a local model to make enhancements to tesseract OCR. Tesseract is fairly good without AI, but my ultimate goal is structured output of receipt data, so I can easily port it into hledger

What models would be best for this sort of thing? I've used ollama with some of the vision models and results haven't been great so far
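Not sure about model choice, but for the plumbing something like this has worked for me (a sketch assuming a llama-server style OpenAI-compatible endpoint on localhost; the JSON shape is whatever you want to feed into hledger):

import json, requests
from PIL import Image
import pytesseract

# let tesseract do the raw OCR, let the LLM do the structuring
text = pytesseract.image_to_string(Image.open("receipt.png"))

prompt = ("Extract merchant, date, currency and line items (description, amount) "
          "from this receipt OCR dump as a single JSON object:\n\n" + text)
r = requests.post("http://127.0.0.1:8080/v1/chat/completions",
                  json={"messages": [{"role": "user", "content": prompt}],
                        "response_format": {"type": "json_object"},  # constrain output to valid JSON
                        "temperature": 0}).json()
receipt = json.loads(r["choices"][0]["message"]["content"])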
Anonymous No.105615360 >>105615415 >>105615419 >>105615433 >>105615438 >>105615727 >>105615887 >>105615959 >>105616569 >>105619906
some migus & friends
https://www.mediafire.com/folder/et1b18ntkdlac/vocaloid
Anonymous No.105615373
>>105615335
Why would you need a vision model after you've OCRed the receipt to text?
Anonymous No.105615415 >>105615430
>>105615360
>can't download the entire folder without a premium account
Anonymous No.105615419 >>105615430
>>105615360
>mfw I've been saving them all manually
Anonymous No.105615430 >>105615727 >>105615743
>>105615415
>>105615419
sorry, first filehost that came to mind. you can use jDownloader.
I know some other people download em so if you want to make up a more complete collection feel free
my collection, ironically, is probably more incomplete due to catastrophic data loss.
Anonymous No.105615433
>>105615360
Thank you Migu genner
Anonymous No.105615438
>>105615360
>all .jpg
i curse you!
Anonymous No.105615443
Minimax was very obviously trained on the old R1. The thinking process is the same endlessly long plain text where the model tries to think about even the most trivial shit. It even sometimes deliberately gets things wrong at first just to be able to correct itself and think some more.
Anonymous No.105615535 >>105615582
>llama.cpp
>warming up the model with an empty run - please wait ... (--no-warmup to disable)

can I just skip warmup for good?
Anonymous No.105615582
>>105615535
"Warming up?" You don't know the meaning of those words, Bardin.
Anonymous No.105615587 >>105615598
Do people use the quantized context with llama.cpp?
Anonymous No.105615595 >>105615710 >>105615716
https://huggingface.co/bartowski/rednote-hilab_dots.llm1.inst-GGUF

bartgoatski quants are up. gogogo
Anonymous No.105615598
>>105615587
Yeah, but I wouldn't call them "people"
Anonymous No.105615710
>>105615595
does it need a llamacpp update?
Anonymous No.105615716
>>105615595
Shit llm for copers that somehow still dont have even 128gb ram for sneedsex
Anonymous No.105615727
>>105615360
>>105615430
Host a public FTP server, you coward.
Anonymous No.105615743 >>105615767
>>105615430
>first filehost that came to mind
makes sense that a mikutroon's mind is retarded
Anonymous No.105615745 >>105615817
I've been seeing stupid ads for this other half ai anime bullshit... is there anything to approximate it locally (plug some llm into an anime vroid model or something that has maybe limited voice rec)?
Anonymous No.105615767 >>105615788 >>105615805 >>105615819
>>105615743
can you point at, precisely, what aspect of miku makes it at all relevant to trans
not the people coopting the design and changing it, the official design, as per crypton future media
you've been throwing around this trans/agp thing for literal months if not years at this point and yet you've failed to even once properly ground your point or lack thereof in any actual sense
nobody's looking at miku and thinking of sex change surgeries that's all you
nobod- why am I even bothering you're clearly off your meds
Anonymous No.105615772
>>105612561
fair enough, but you probably don't want to haul a full-blown desktop every day like in the good LAN days. Still, if you have a better rec with a desktop form factor I'm happy to listen

>>105612568
living in a shoebox, can't use the same room all the time, its functionality is time multiplexed
Anonymous No.105615788 >>105615794
>>105615767
Xir, this is a trans website. Nobody would be obsessed with this obsolete design if he wasn't a real woman.
Anonymous No.105615794
>>105615788
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
whatever retard keep thinking of cock
Anonymous No.105615805 >>105615820
>>105615767
>you've been throwing around this trans/agp thing for literal months if not years at this point and yet you

>only 1 person realized that mikuniggers are retards who just spam their dogshit mascot obsessively and almost never ever have a single based opinion imaginable that they post in the thread despite being in the thread every day and despite the many actual trannies and faggots that raid the threads but never got told off by a single mikunigger avatarposter, instead they ignore those people and keep posting their generic trash obsession waifu
yeah... i wonder why people dislike you
Anonymous No.105615817
>>105615745
You don't have the IQ to run that
Anonymous No.105615819
>>105615767
>nobod- why am I even bothering you're clearly off your meds
American brown tumblr tier writing btw
Anonymous No.105615820 >>105615842
>>105615805
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
you could just answer
Anonymous No.105615842 >>105615867
>>105615820
I concede you aren't a troon. Now continue not being a troon by no longer spamming that worthless avatar. And if you continue then... well you admit you are a disgusting troon.
Anonymous No.105615858
>>105614932
vector databases and semantic search
Anonymous No.105615867 >>105615880
>>105615842
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
for someone who rants and raves endlessly about proper argumentation and logic you're proper fucking shit at it
never expect me to reply to your bs again
Anonymous No.105615880
>>105615867
Well there you go that is how we know you are a disgusting troon and you have AGP fantasies focusing on that retarded avatar you keep pushing on everyone. I recommend joining the 41%
Anonymous No.105615887 >>105615944 >>105615959
>>105615360
https://multiup.io/download/f927ee16eeea9bf6db1576a0d0c1f536/xx.zip
single file
Anonymous No.105615944
>>105615887
Thank you. Download finished in a couple seconds. Much better than fucking with jDownloader.
Anonymous No.105615959
>>105615360
>>105615887
An artifact to be preserved
Anonymous No.105615963 >>105615971 >>105615982 >>105616370
>load 0.6B model
>rig starts screeching like it's getting fistfucked by Satan himself

>load 8B model
>rigs handles it just fine

Okay I'm way over my head here, guys.
Anonymous No.105615971
>>105615963
It likes its models small and open
Anonymous No.105615982
>>105615963
First case it ran on CPU
Anonymous No.105616002
Anonymous No.105616197 >>105616261 >>105616387
Ive been running mythomax for years now, and I just upgraded to a 5080. Whats the new meta for coom llms?
Anonymous No.105616261 >>105616340
>>105616197
there is still nothing better than mythomax
Anonymous No.105616340 >>105616408
>>105616261
is there a way I can finetrain it a bunch of specific fetish smut to make it better?
Anonymous No.105616370 >>105616422
>>105615963
0.6b needs very little bandwidth, so compute usage goes up (and so do the fans). 8b needs more bandwidth, so it spends more time just waiting for memory to reach registers to compute, giving it time to chill.
Anonymous No.105616387
>>105616197
Try Cydonia
Anonymous No.105616408
>>105616340
Yes, you can finetune if you want (but not on a single 5080), but you really should just read the thread, because this question gets asked every single god damn thread; if you look at the last one you'll find at least 5 different recommendations for someone in your situation
Anonymous No.105616422
>>105616370
This. When my CPU is doing prompt processing the fans go full blast, but once it starts generating tokens they calm down.
Anonymous No.105616441 >>105616511
can someone generate neutral looking anime women pictures so i can use them for my blogposts?
Anonymous No.105616511 >>105616512
>>105616441
You could.
Anonymous No.105616512 >>105616521
>>105616511
i have rx 6600
Anonymous No.105616521
>>105616512
Most image gen is python based. I don't know how well they work with amd. Try stablediffusion.cpp. Should probably work with vulkan.
Anonymous No.105616569 >>105619751
>>105615360
Wouldn't a Pixiv link be fine?
Anonymous No.105616734 >>105616768 >>105616800 >>105617143 >>105617582 >>105617625
justpaste (DOTit) GreedyNalaTests

Added:
dans-personalityengine-v1.3.0-24b
Cydonia-24B-v3e
Broken-Tutu-24B-Unslop-v2.0
Delta-Vector_Austral-24B-Winton
Magistral-Small-2506
medgemma-27b-text-it
Q3-30B-A3B-Designant
QwQ-32B-ArliAI-RpR-v4
TheDrummer_Agatha-111B-v1-IQ2_M
Qwen3-235B-A22B-Q5_K_M from community

Been preoccupied for a while but now I'm caught up. 235B was given a star rating, the others had no stars and no flags, they're just the same old really.

Looking for contributions:
Deepseek models
dots.llm1.inst
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the EXACT prompt sent to the backend, in addition to the output. And your backend used + pull datetime. Also a link to the quant used, or what settings you used to make your quant.
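If it helps contributors, those settings map onto llama-server's native /completion endpoint like this (a sketch; prompt.txt stands in for the exact prompt text your frontend sends):

import requests

payload = {
    "prompt": open("prompt.txt").read(),  # the EXACT prompt sent to the backend, verbatim
    "temperature": 0.0,
    "top_k": 1,
    "seed": 1,         # just in case
    "n_predict": 512,  # arbitrary output budget
}
out = requests.post("http://127.0.0.1:8080/completion", json=payload).json()["content"]
print(out)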
Anonymous No.105616768 >>105616867
>>105616734
Long time no see
Hi all, Drummer here... No.105616800 >>105616867
>>105616734
Could you test...

Cydonia-24B-v3i
Cydonia-24B-v3j

They're both v3.1 candidates.

I'm also curious about...

Cydonia-24B-v3f and Cydonia-24B-v3g but more for research purposes.

Big fan of your work!
Anonymous No.105616867
>>105616768
Yee

>>105616800
I'll be honest, I don't feel like tying up my kind-of-slow internet downloading all that. You could just copy and paste the prompts into mikupad and get the outputs yourself pretty easily. If you simply want all the outputs archived in one place, I do take contributions and will add them if you give them (and of course I will read/rate them).
Anonymous No.105617143 >>105619640
>>105616734
>235B was given a star rating
"235b is bad" bros how are we coping with being empirically proven wrong
Anonymous No.105617146
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
https://arxiv.org/abs/2506.13284
>In this work, we investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. We begin by curating the SFT training data through two scaling strategies: increasing the number of collected prompts and the number of generated responses per prompt. Both approaches yield notable improvements in reasoning performance, with scaling the number of prompts resulting in more substantial gains. We then explore the following questions regarding the synergy between SFT and RL: (i) Does a stronger SFT model consistently lead to better final performance after large-scale RL training? (ii) How can we determine an appropriate sampling temperature during RL training to effectively balance exploration and exploitation for a given SFT initialization? Our findings suggest that (i) holds true, provided effective RL training is conducted, particularly when the sampling temperature is carefully chosen to maintain the temperature-adjusted entropy around 0.3, a setting that strikes a good balance between exploration and exploitation. Notably, the performance gap between initial SFT models narrows significantly throughout the RL process. Leveraging a strong SFT foundation and insights into the synergistic interplay between SFT and RL, our AceReason-Nemotron-1.1 7B model significantly outperforms AceReason-Nemotron-1.0 and achieves new state-of-the-art performance among Qwen2.5-7B-based reasoning models on challenging math and code benchmarks, thereby demonstrating the effectiveness of our post-training recipe.
https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
Isn't posted yet. pretty interesting
Anonymous No.105617169
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
https://arxiv.org/abs/2506.13585
Not sure if they posted a paper when they released their model but the arxiv version is up now
The Amazon Nova Family of Models: Technical Report and Model Card
https://arxiv.org/abs/2506.12103
paper from amazon. doesn't seem like they're open sourcing anything so w/e
Anonymous No.105617224 >>105617522
Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models
https://arxiv.org/abs/2506.13681
>Sampling from language models impacts the quality and diversity of outputs, affecting both research and real-world applications. Recently, Nguyen et al. 2024's "Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs" introduced a new sampler called min-p, claiming it achieves superior quality and diversity over established samplers such as basic, top-k, and top-p sampling. The significance of these claims was underscored by the paper's recognition as the 18th highest-scoring submission to ICLR 2025 and selection for an Oral presentation. This paper conducts a comprehensive re-examination of the evidence supporting min-p and reaches different conclusions from the original paper's four lines of evidence. First, the original paper's human evaluations omitted data, conducted statistical tests incorrectly, and described qualitative feedback inaccurately; our reanalysis demonstrates min-p did not outperform baselines in quality, diversity, or a trade-off between quality and diversity; in response to our findings, the authors of the original paper conducted a new human evaluation using a different implementation, task, and rubric that nevertheless provides further evidence min-p does not improve over baselines. Second, comprehensively sweeping the original paper's NLP benchmarks reveals min-p does not surpass baselines when controlling for the number of hyperparameters. Third, the original paper's LLM-as-a-Judge evaluations lack methodological clarity and appear inconsistently reported. Fourth, community adoption claims (49k GitHub repositories, 1.1M GitHub stars) were found to be unsubstantiated, leading to their removal; the revised adoption claim remains misleading. We conclude that evidence presented in the original paper fails to support claims that min-p improves quality, diversity, or a trade-off between quality and diversity.
RIP minp
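For reference, the sampler being dunked on is only a few lines; a sketch over the post-softmax probabilities, not any particular implementation's exact code:

import numpy as np

def min_p(probs, p_base=0.05):
    # keep tokens at least p_base as likely as the single most likely token, renormalize
    probs = np.asarray(probs, dtype=np.float64)
    kept = np.where(probs >= p_base * probs.max(), probs, 0.0)
    return kept / kept.sum()

# the next token is then sampled from min_p(probs), usually after temperature scaling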
Anonymous No.105617281
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
https://arxiv.org/abs/2506.12040
>Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to ±1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask management, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages adaptive weight transformation and binary pattern clustering to overcome these limitations, delivering both superior accuracy and efficiency. Our approach incorporates two key innovations: (1) a Learnable Transformation that optimizes invertible scaling and rotation matrices to align binarized weights with full-precision distributions, enabling incoherence processing to enhance layer-wise representation quality; (2) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters, compressing them into compact indices with tailored distance metrics and sign-based centroid updates. This eliminates the need for sparse masks, enabling efficient inference on standard hardware.
https://github.com/Chooovy/BTC-LLM
bpw below bitnet...
Anonymous No.105617522
>>105617224
>After we showed these results to the authors, they informed us that we had run our experiments using the “Llama" formatting of GSM8K prompts as we used the command from the authors’ public Colab notebook; the authors clarified that "Llama" formatting should be used only for Llama models. We then reran our experiments using standard formatting of GSM8K prompts. The results were nearly identical (Appendix B), with one small difference: min-p does produce higher scores for 2 of 12 language models. Again, we conclude min-p does not outperform other samplers on either formatting of GSM8K when controlling for hyperparameter volume.
Why would you want to publish your ignorance of chat templates, and the ~3000 Nvidia A100-hours of compute wasted as a result? Instead of engendering confidence in your findings, this just makes you come across as petty and seething.
Anonymous No.105617582
>>105616734
All those new 24b mistral slops and not a gem among them.
Anonymous No.105617604 >>105617608
will they ever release the multimodal qwen 3
Anonymous No.105617608
>>105617604
don't worry, qwen 4 will be omni and smart
Anonymous No.105617625
>>105616734
cockbench?
Anonymous No.105617634
>>105612859
hrt.coffee
Anonymous No.105617637 >>105617645 >>105617684 >>105617864 >>105618521
so what's the general consensus here on doing RP with reasoning? It's shit, right? It kinda improves creativity but not by much
Anonymous No.105617645
>>105617637
It depends. I've found that if you do a lot of complex math, logic, and programming in your RPs you'll notice a massive difference.
Hi all, Drummer here... No.105617684
>>105617637
Prefilling the think block to have reasoning act like a director/storywriter seems to help. I've only tried it on the new R1 though.
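e.g. against a raw completion endpoint, something like this (llama-server assumed; the prefill text and file name are just examples):

import requests

prefill = ("<think>\nActing as the director of this story, I'll plan the next scene: "
           "current pacing, what each character wants, and one concrete event to move things forward.\n")
chat = open("formatted_chat.txt").read()  # your chat history, already in the model's template
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": chat + prefill, "n_predict": 1024}).json()
print(prefill + r["content"])  # the model continues the plan, closes </think>, then writes the reply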
Anonymous No.105617731 >>105620373
Can I have a general mesugaki card sample?
Anonymous No.105617753
>the purpose of benchmaxxing on math is to improve the quality of RPs where anon is a grade-school math teacher
Anonymous No.105617864
>>105617637
It definitely makes it worse for Magistral; it actually makes it less likely to follow the sys prompt
Anonymous No.105618208 >>105618276 >>105618366 >>105618436 >>105618500
Can you redpill me on WizardLM and Miqu? They seem like quite large models; did anyone actually use them at larger quants?
Anonymous No.105618276 >>105618364
>>105618208
Buy an ad.
Anonymous No.105618329 >>105618428 >>105619257 >>105619427
Given how close we are to AGI, is it safe to say that Europe has zero chance of entering the running? What are the odds there's some dark horse AI lab that has been building in secret on the continent?
Anonymous No.105618364
>>105618276
Yeah cause those are the current hot thing.
Anonymous No.105618366 >>105618665
>>105618208
I see that you woke up from a year-long coma. Qwen 3 235b replaces WizardLM directly, and if you've got 128gb ram + 16/24gb vram, dynamic R1 replaces that: https://unsloth.ai/blog/deepseekr1-dynamic
Anonymous No.105618426 >>105618490
Anyone knows what multi modal model used on smash-or-pass-ai site? Can abliterated Gemma-3 do this?
Anonymous No.105618428
>>105618329
AGI will start with mistral-nemo 2
Anonymous No.105618436 >>105618481
>>105618208
No, they didn't. People would usually try to fit as much as they could into a single 3090 because that's what everyone was using (and probably what most people still are using)
Anonymous No.105618481 >>105618834
>>105618436
>probably what most people still are using
I doubt that
Anonymous No.105618490 >>105618562
>>105618426
I think Gemini 2.5 Flash probably. If you click the websim button you can edit it.
Anonymous No.105618500
>>105618208
they are more sovl compared to what we have now but they are also noticeably more retarded
Anonymous No.105618521
>>105617637
Reasoning feels like the right place for the model to plan ahead and maintain state, but you'd have to keep at least 2 reasoning traces in context, which is different from how they've been trained (mostly single-turn math questions). And Gemma 3 works better with fake reasoning for RP than Magistral Small, which was natively trained on it, does.

Another problem with reasoning is that it dilutes attention to the instructions, so ideally you'd want to keep instructions high in the context, but again, models aren't trained for it, so it often gives issues. Ironically, models not trained with system instructions in mind (just to follow whatever the user says) may work better for that.

On the other hand, I find that reasoning tends to decrease repetitive patterns in the actual responses. It's just that Mistral Small and by extension Magistral suck for these uses and they're only good for saying "fuck", "cock" and "pussy". If you're OK with just that...
Anonymous No.105618562
>>105618490
>Gemini 2.5
Isn't it censored?
Anonymous No.105618654 >>105618688
server cpu cucks will lose their time in the spotlight
MoEs will not scale
Anonymous No.105618665 >>105618678
>>105618366
>Qwen 3 235b replaces WizardLM directly
Smaller Qwen 3's are censored quite badly, is this the same?
Anonymous No.105618678
>>105618665
Idk your use case but in my experience barring very few insanely slopped exceptions, no model is really censored given a good system prompt, especially 100b+ models.
Anonymous No.105618688 >>105618840
>>105618654
>MoEs will not scale
Titans&co are mixture of attention experts.
Anonymous No.105618834
>>105618481
I'm not running ~70b models but I'm still using a single 3090
A 5090 would be about 5x the price I paid for the 3090 and not much more useful
Anonymous No.105618840
>>105618688
>moae
it doesn't even sound cool!
Anonymous No.105618863 >>105618877
LLMbros... we got too cocky while image and videogenbros are eating good... I don't think anything short of actually multimodal R3 will save us...
Anonymous No.105618877 >>105619001
>>105618863
pic unrelated?
Anonymous No.105619001
>>105618877
No
Anonymous No.105619044 >>105619183 >>105619192 >>105619265 >>105619471
What is the raison d'être for Q4_0 quants if everyone agrees that Q4_K is always better? I've been hearing this argument for years now
Anonymous No.105619181
>>105614993
No one can make shit. All hopes ride on Sam
Anonymous No.105619183
>>105619044
og nigga
Anonymous No.105619192 >>105619228 >>105619265
>>105619044
Nothing, it's a legacy format. iq4_xs is both smaller and better. q4_k_s/m are very slightly bigger and much better.
Anonymous No.105619228 >>105619265
>>105619192
nta but k_s?
I've been grabbing k_m like a monkey all this time
Anonymous No.105619257
>>105618329
>dark horse AI lab
Kek, you have no idea.
Anonymous No.105619265 >>105619472
>>105619044
>>105619192
>>105619228
qat only works for q4_0 for models that have it (gemma)
Anonymous No.105619313 >>105619435
has anyone tried any of the new ocr models such as MonkeyOCR and Nanonets-OCR-s?

looking to convert research papers in pdf to markdown or txt
docling is letting me down on accuracy and has some other issues
Anonymous No.105619427 >>105619439 >>105619443
>>105618329
>Given how close we are to AGI
We're not.
Anonymous No.105619435 >>105619493
>>105619313
Have you tried simply extracting the text directly from the PDF?
Anonymous No.105619439
>>105619427
We're
Anonymous No.105619443 >>105619485
>>105619427
Of course, your job is safe don't worry. But hypothetically if we were... does Europe have a shot?
llama.cpp CUDA dev !!yhbFjk57TDr No.105619471
>>105619044
q4_0 is faster than q4_K_M due to the simpler data structure.
For development purposes in particular it's also the quant that I use because I don't care about maximizing quality/VRAM in that scenario but I do care about speed and simplicity to make my measurements easier.
I never use q4_0 outside of testing though.
Anonymous No.105619472
>>105619265
qat is not nearly as good as google claims it to be.
Anonymous No.105619485 >>105619525
>>105619443
AGI will not be achieved by scaling up our current architectures.
It requires a fundamental breakthrough which could come from anywhere, including Europe.
Anonymous No.105619493
>>105619435
it didn't work that well with the multiple columns and formulas, but I'll give it another try, thanks.
Anonymous No.105619525 >>105619590
>>105619485
>AGI will not be achieved by scaling up our current architectures.
Source: your ass
>breakthrough which could come from anywhere, including Europe
It could also come from a pajeet 5 year old. Will it? No. You actually need an ML industry and companies for that.
Anonymous No.105619566 >>105619662
Gemma 3 seems obsessed with wrapping her legs around your waist, no matter her position.
Anonymous No.105619590 >>105619596 >>105619607
>>105619525
*AHEM*
Anonymous No.105619596
>>105619590
They have had nothing, absolutely nothing, noteworthy since Miqu and Mixtral.
Anonymous No.105619607 >>105619612
>>105619590
>no good model since 2407, 9 months ago
>rekt by r1 like everyone else, except they dont have nearly as much money to recover, and didn't
Anonymous No.105619612
>>105619607
>9 months ago
jesus christ
Anonymous No.105619630 >>105620152 >>105620434
https://huggingface.co/Menlo/Jan-nano
https://huggingface.co/Menlo/Jan-nano-gguf
Anonymous No.105619635 >>105619642
>still no local llm with native image gen
Anonymous No.105619640
>>105617143
It's only good at high quant.
Anonymous No.105619642
>>105619635
too unsafe, please understand
Anonymous No.105619662 >>105620164
>>105619566
For me she loves tangling her fingers in my hair and arching her back, no matter what the context.
Anonymous No.105619694 >>105619710 >>105619724 >>105619725 >>105620779
I tried dots q4. It feels like a very smart 30b model that still makes a brainfart or two like a 30b. So it is pretty useless. It is like a moe grok1.
Anonymous No.105619697
>>105612316
>>105611862
SWE-bench tests Python only. Perfect if you need something marginally better at writing a glue script I guess.
Anonymous No.105619710 >>105619717 >>105619719
>>105619694
prompt issue
Anonymous No.105619717 >>105619722
>>105619710
model issue
Anonymous No.105619719
>>105619710
The sperm that made you had a prompt issue.
Anonymous No.105619722
>>105619717
works for me, I'm having a blast
Anonymous No.105619724
>>105619694
>It is like a moe grok1.
but grok1 is already a moe?
Anonymous No.105619725 >>105619734
>>105619694
grok 1 was a moe
Anonymous No.105619734 >>105619741
>>105619725
YOU are moe
Anonymous No.105619741
>>105619734
nuh uh im dense
Anonymous No.105619751 >>105619796 >>105619812
long shot, but if anyone else saved the migus (or friends), feel free to reupload. most of my edited data got wiped.

>>105616569
what
Anonymous No.105619796
>>105619751
Stop with offtopic spamming. Nobody cares about your journey to become a woman.
Anonymous No.105619812
>>105619751
>my edited data got wiped
Good!
Anonymous No.105619828 >>105619849
why can't you faggots just get a miku thread going and migrate? you can spam there as much as you want
Anonymous No.105619849
>>105619828
They don't like /a/ for some reason.
Anonymous No.105619851 >>105619864
>samefagging this hard
Anonymous No.105619864
>>105619851
Stop spamming troon
Anonymous No.105619870
my thinking boxes in st are inconsistent and sometimes the reply ends up inside both the thinking and the reply blocks
Anonymous No.105619906 >>105619933
>>105615360
>>712856332
added some random loop animations
single file:
https://multiup.io/download/1f7bdf2cee24911d0cef25316887bba0/migu%20animations.zip
Anonymous No.105619933 >>105619951
>>105619906
Nobody loves you and you should kill yourself
Anonymous No.105619951 >>105620049
>>105619933
A few loony troons love him though
Anonymous No.105619971 >>105619977 >>105619997 >>105620008
Mikugaki Anon makes the thread more comfy, muh troons posters are just annoying.
Anonymous No.105619977
>>105619971
True story
Anonymous No.105619978
Masculine urge to post pictures of vocaloids.
Anonymous No.105619994 >>105620063
>vocaloids
>masculine
The irony ironing
Anonymous No.105619997
>>105619971
What causes a person to think about and see "troons" everywhere?
Anonymous No.105620008 >>105620013 >>105620183
>>105619971
This is not your safespace.
Anonymous No.105620013 >>105620026
>>105620008
It's not yours either, feel free to fuck off.
Anonymous No.105620026 >>105620047
>>105620013
I am not the one shitting up the thread with AI-generated anime slop though
Anonymous No.105620045 >>105620155
I think there's a compromise to be made here - miku can be posted, but she must advocate for total troon death.
Anonymous No.105620047 >>105620103
>>105620026
>go to ai thread
>see ai generated content
>get upset
zoomer logic everyone
Anonymous No.105620049
>>105619951
i love xim xes my xusband
Anonymous No.105620063
>>105619994
>masculine urge to fuck women

>women
>masculine

This is your logic.
Anonymous No.105620103 >>105620144
>>105620047
No that works with /sdg/ or /ldg/ only.
Anonymous No.105620144
>>105620103
You would still be sperging out even if they were made with anole or bagel.
Anonymous No.105620152
>>105619630
Q4 might be fine to load in browser directly and replace google
Anonymous No.105620155
>>105620045
Why would she go against her only fanbase?
Anonymous No.105620164
>>105619662
Yeah, those too. I get that sex is ultimately repetitive, but it feels as if Gemma 3 only knows a couple of ways of describing it non-explicitly. There are other areas where it isn't very creative either, and after a while you'll notice it always uses the same patterns.
Anonymous No.105620183
>>105620008
Ywnbaw
Anonymous No.105620193 >>105620203 >>105620226 >>105620425 >>105620650
Today has been one of the worst blackpills for me regarding local models in a long time.

The strongest open source vision model is worse than o4-mini.

Try this experiment:

Upload this image https://www.lifewire.com/thmb/7GETJem9McVRDI8kbzLM6TfwED0=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/windows-11-screenshots-615d31976db445bb9f725b510becd850.png

With this prompt

You are an assistant tasked with finding a bounding box for the indicated element.
In your response you should return a string in the following format:
[BOUNDING_BOX x0 y0 x1 y1]
x0, y0, x1, y1 being decimal coordinates normalized from 0 to 1 starting at the top left corner of the screen.
The indicated element you are required to find is the following:
Start Menu button

The only model that gives a half-right response is Gemma 3 27B. All the other open source models give wrong answers. ALL of the proprietary models give better answers than the best open source model.

Now you might think this is a weird thing to ask the model. But this is exactly the kind of task that's required for an assistant to control a computer and perform tasks autonomously.
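
If you want to script the experiment against a local OpenAI-compatible server (llama-server, LM Studio, etc.), here's a rough untested sketch; the URL, port, and model name are assumptions that depend entirely on your setup:

import base64, re, requests

with open("windows11_screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

prompt = (
    "You are an assistant tasked with finding a bounding box for the indicated element.\n"
    "In your response you should return a string in the following format:\n"
    "[BOUNDING_BOX x0 y0 x1 y1]\n"
    "x0, y0, x1, y1 being decimal coordinates normalized from 0 to 1 "
    "starting at the top left corner of the screen.\n"
    "The indicated element you are required to find is the following:\n"
    "Start Menu button"
)

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server default port; adjust
    json={
        "model": "gemma-3-27b-it",  # whatever name your server expects
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    },
)
text = resp.json()["choices"][0]["message"]["content"]
m = re.search(r"\[BOUNDING_BOX\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\]", text)
print([float(g) for g in m.groups()] if m else "no box found")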
Anonymous No.105620203 >>105620227
>>105620193
This is about as retarded as expecting them to count letters and do calculations.
Anonymous No.105620226 >>105620247 >>105621260
>>105620193
Gemma 3 only uses a 400M-parameter vision model, and with current implementations it encodes every image into 256 tokens at 896x896 resolution. It's a miracle it performs like it does. Imagine if it were at least twice the size.
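Napkin math on why small UI elements get lost (assuming the usual SigLIP-style 14 px patches): 896/14 = 64, so 64x64 = 4096 patches, pooled down to 256 tokens, i.e. a 16x16 grid where each token summarizes a 56x56 px region. A taskbar button is smaller than a single token's footprint.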
Anonymous No.105620227 >>105620299
>>105620203
Then what workflow do you suggest to let a model control a computer?
And if it's so retarded then how come the tiniest OpenAI or Gemini models can give a reasonable answer but Qwen 3 235B can't? Your response sounds like cope to me.
Anonymous No.105620247
>>105620226
Yeah but the problem is that I haven't found any open source models that work. Llama 3.2 90B and the Qwen model I mentioned above give worse responses than Gemma.
The fact that Gemma 27B works at all given its tiny encoder tells me Google used some of the same training data or methods as Gemini, and that's why it kinda works.
Anonymous No.105620270 >>105620315
Anonymous No.105620299 >>105620369
>>105620227
>tiniest OpenAI or Gemini models
How do you know the number of parameters these closed models have?
For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Anonymous No.105620315
>>105620270
beeg meeg
Anonymous No.105620326 >>105620345 >>105620350
Qwen won't make dense models larger than 30B anymore.

https://x.com/JustinLin610/status/1934809653004939705
> For dense models larger than 30B, it is a bit hard to optimize effectiveness and efficiency (either training or inference). We prefer to use MoE for large models.
Anonymous No.105620333 >>105620361
https://www.youtube.com/watch?v=_0rftbXPJLI
Anonymous No.105620345
>>105620326
Local is evolving
Anonymous No.105620350 >>105620407
>>105620326
The future is <=32B Dense and >150B MoE. And I'm all for it, just buy RAM.
Anonymous No.105620361
>>105620333
Buy ad!
Anonymous No.105620366
Anonymous No.105620369
>>105620299
>How do you know the number of parameters these closed models have?
I don't, but what difference does it make? If they get better results because they have bigger models, they still get better results.
>For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Again, does it matter? Do local users have any kind of image segmentation model that can detect GUI elements? No, at least none that I know of. I tried the most popular phrase grounding model (owlv2) and it basically knows nothing about GUI elements.
If you wanna have a go at it, here's a list of models:
https://huggingface.co/models?pipeline_tag=zero-shot-object-detection&sort=trending
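(for reference, trying owlv2 is literally this much code via transformers' zero-shot-object-detection pipeline; the image filename and labels are just examples:)

from transformers import pipeline

detector = pipeline("zero-shot-object-detection",
                    model="google/owlv2-base-patch16-ensemble")
results = detector("windows11_screenshot.png",
                   candidate_labels=["start menu button", "taskbar icon"])
for r in results:
    # box comes back in pixel coords: {'xmin', 'ymin', 'xmax', 'ymax'}
    print(r["label"], round(r["score"], 3), r["box"])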
Anonymous No.105620373
>>105617731
Yuma.
Anonymous No.105620407 >>105620414 >>105620419
>>105620350
RAM is slow.
Anonymous No.105620414 >>105620422
>>105620407
but not for MoE!
Anonymous No.105620419 >>105620648
>>105620407
Just buy fast RAM.
Anonymous No.105620422
>>105620414
It is compared to a dense model you can fit in VRAM.
Anonymous No.105620425 >>105620437
>>105620193
This is a trap, data-mining post by closed AI
Tell sam atman to kill himself.
Anonymous No.105620434 >>105620457 >>105620517
>>105619630
>MCP Server?
>No Problem, set up SERPER API!
>Just place your SERPER API KEY IN JAN! 1000 Pages for only 0.30$!!
Are they fucking retarded? I can just use grok for free and it probably works better too. Soon probably Gemini as well, according to rumors.
I want something local. Why don't they show me how to set that up with duckduckgo or brave or whatever?
Anonymous No.105620437
>>105620425
No it's not
Anonymous No.105620457
>>105620434
Download a gguf model locally, load it up on your gpu... to call a $$ API for the results.
They didn't think that through. How stupid.
Anonymous No.105620495 >>105620519 >>105620918
Asking for a friend, how do I go from no model to having a model which acts just like a girl i used to know?
Anonymous No.105620517
>>105620434
>It's a chrome app
>2GB installed
>1GB updater
>All in my C: which has no more free space left
I'm so angry
Anonymous No.105620519
>>105620495
Go to koboldcpp's GitHub, find the wiki tab, and read the quickstart.
Then download a Mistral Nemo Instruct GGUF from huggingface, grab SillyTavern, and make a card of that person.
Anonymous No.105620648 >>105620666
>>105620419
If only you could easily buy more channels...
Anonymous No.105620650 >>105620738
>>105620193
Bro you could train a tiny yolo model on whatever GUI elements you want and strap that to your LLM. Go be retarded elsewhere
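Untested sketch with ultralytics (the dataset yaml and the labeled screenshots behind it are on you; names here are placeholders):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # tiny pretrained checkpoint to finetune
model.train(data="gui_elements.yaml", epochs=50, imgsz=1024)  # your labeled screenshots

results = model("screenshot.png")
for box in results[0].boxes:
    # class id, confidence, pixel-coordinate box
    print(int(box.cls), round(box.conf.item(), 3), box.xyxy[0].tolist())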
Anonymous No.105620666 >>105620674
>>105620648
DDR6 and CAMM2 are coming Soon™ to save the day
Anonymous No.105620674
>>105620666
Here's hoping it comes alongside CPUs with even wider busses too.
Anonymous No.105620738 >>105620763 >>105620807
>>105620650
We're talking about GUI elements here. Training an object detection model on specific GUI elements would take a lot of effort for little reward. If I were to take screenshots of all the elements I want to click on, I could just do image similarity matching.
What I want to do is describe the element I want to match in natural language and have the model find it, without giving it any visual examples (other than the one it has to find in the image).
You think I'm retarded? Why? You think the idea is retarded?
You think Manus and Operator are retarded?
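(To be clear, the similarity-matching fallback I mean is just opencv template matching, a couple of lines, assuming the element's rendering never changes; filenames are examples:)

import cv2

# needle = a saved crop of the element; haystack = the current screenshot
screen = cv2.imread("screenshot.png")
template = cv2.imread("start_button.png")
result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
h, w = template.shape[:2]
if max_val > 0.8:  # crude match threshold
    print("found at", max_loc, "to", (max_loc[0] + w, max_loc[1] + h))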
Anonymous No.105620763
>>105620738
https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B
Anonymous No.105620779
>>105619694
It felt worse than the current 30B models, though.
Anonymous No.105620807 >>105620889 >>105621260
>>105620738
You want the bounding boxes of GUI elements, which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is. The reward would be having a model for your use case and stop bitching here. I'm training my own small models for my use cases all the time.
Anonymous No.105620889 >>105620944
>>105620807
It's not really a segmentation task. Segmentation in the classic sense (like in the YOLO models you mentioned) means detecting pre-determined categories in the image. The term for detecting objects based on free-form natural language is phrase grounding.
>which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is
I don't think using the tool that works better for the job is retarded. Do you? Or are you saying there's a tool that works better? If so, then I challenge you to show me a segmentation model that performs better at this task than ChatGPT or Gemini. Like I said above, I tried the most popular phrase grounding model and it doesn't know what a start menu is. When you ask it about GUI elements it just highlights random icons in the image.
>I'm training my own small models for my use cases all the time.
If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Anonymous No.105620918
>>105620495
>just like a girl i used to know?
Stop. Don't try to have LLMs simulate real people. Take your meds instead.
Anonymous No.105620944 >>105621016
>>105620889
Multimodality is in its infancy and requires a lot more resources to pull off than simple LLM + tool calling, so why are you even surprised that only cloud models are able to do that?
The only solution for local models right now is training a model for your specific task, which here is segmentation of GUI elements. Even if Google bothers to release a new model, it won't be as good as their flagship; that much is a given.
>If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Model A is a cloud model and model B is running on my own computer, model B is superior by default.
Anonymous No.105621016 >>105621036 >>105621039
>>105620944
>so why are you even surprised that only cloud models are able to do that?
Because I was under the impression that the largest open source models would BTFO the -mini and Flash commercial models at all tasks. I was hoping somebody would prove me wrong, but it seems the vision capabilities of local models are just worse.
>Model A is a cloud model and model B is running on my own computer, model B is superior by default.
That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
Anonymous No.105621036 >>105621076
>>105621016
>That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
For all work. There is no reason to ever give them free data, even if you've been successfully programmed to not care about them building a profile on you.
Anonymous No.105621039 >>105621076
>>105621016
You can't automate that stuff without using a cloud model API, and that's not free; it has nothing to do with NSFW.
Anonymous No.105621076 >>105621096
>>105621039
You mean free as in beer or free as in freedom? Because if I can spend a few dollars to save me an hour of work then I probably would.

>>105621036
It's not free if they're giving me something in return.
Anonymous No.105621096 >>105621319
>>105621076
>free as in beer or free as in freedom
Both. You really want to give cloud models pics of your own computer? That's not even a question if you work for a company.
Anonymous No.105621260 >>105621294 >>105621405
>>105620226
do local inference backends not support some form of tiling to spread the image out across several tiles?

I've been using Gemini 2.5 for some video understanding tasks and I use ffmpeg to resize+pad the extracted frames so that they fit across two tiles. Works fairly well for a lot of tasks but the bounding boxes are kinda shit.
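The resize+pad step is easy to replicate in python too if you'd rather not ffmpeg it; a minimal sketch, assuming two 896x896 tiles side by side (896 is Gemma's tile size, Gemini's actual tile geometry isn't public):

from PIL import Image, ImageOps

# letterbox the frame onto a 1792x896 canvas so a tiler can split it into two 896x896 tiles
frame = Image.open("frame_0001.png").convert("RGB")
padded = ImageOps.pad(frame, (1792, 896), color=(0, 0, 0))
padded.save("frame_0001_padded.png")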

>>105620807
retard
Anonymous No.105621282 >>105621313
New music model dropped:

https://github.com/tencent-ailab/SongGeneration

Someone please try this and report back.
Anonymous No.105621294 >>105621343
>>105621260
>Do X, but it's shit for your use case
Retarded gorilla
Anonymous No.105621307 >>105621969
>>105611492 (OP)
>install LM Studio on laptop with a 3060
>Not even 30% CUDA utilization with 100% offloading to vram
>Everything takes minutes to answer
Why might this be?
Anonymous No.105621313 >>105621337
>>105621282
>SongGeneration-base(zh) v20250520
>SongGeneration-base(zh&en) Coming soon
>SongGeneration-full(zh&en) Coming soon
Chinese only for now
Anonymous No.105621319 >>105621342
>>105621096
If George Hotz is not afraid of streaming his PC to the whole world, I don't see why I should be afraid of streaming my PC to Google.
As for the company I work for, nobody is monitoring me so closely that I'd get in trouble for leaking a few screenshots with bits of internal data here and there to an unauthorized API. I don't handle anything too sensitive.
Anonymous No.105621337
>>105621313
It seems to support instrumental stuff at least.
I'm curious what the generation speed for this is, whether it's dreadful like YuE or a bit fast like AceStep (which is shit but fast)
Anonymous No.105621342 >>105621386 >>105621399
>>105621319
>Defending cloud cuckery this hard
You're in the wrong general bro
Anonymous No.105621343 >>105621399
>>105621294
Get some reading comprehension, you ape. Everything apart from the bounding boxes works well. I can get time-stamped occurrences of company logos even when they're blurred or upside down; it completely btfos older methods. Even the inaccurate bounding boxes are still in the general location and useful for human evaluation. It's super convenient to feed the audio in too.
Anonymous No.105621386
>>105621342
I'm just being realistic. Acknowledging deficiencies is the first step towards improving.
I have a friend who told me he has had success with this using an open source model but doesn't remember which one he used; he's gonna send me some info when he gets back from work.
Anonymous No.105621399
>>105621343
>>105621342
Anonymous No.105621405 >>105621463
>>105621260
Pan & Scan (tiling) isn't implemented in llama.cpp, and it didn't work the last time I tried it in vLLM, where it's apparently implemented (behind an optional flag). From what I've read, it needs some form of prompt engineering; it's not as simple as just sending image tiles.
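For reference, enabling it in vLLM looked roughly like this when I tried (from memory and untested; double-check the kwarg name against your vLLM version's Gemma 3 docs):

from vllm import LLM

# do_pan_and_scan is the Gemma 3 processor option that splits large images into crops
llm = LLM(model="google/gemma-3-27b-it",
          mm_processor_kwargs={"do_pan_and_scan": True})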
Anonymous No.105621463
>>105621405
thanks for the info. That explains why Gemma performed so poorly at manga translation when I tried it a while back; the image was probably resized to the point of the text being unreadable.
Anonymous No.105621574
>>105621559
>>105621559
>>105621559
Anonymous No.105621969
>>105621307
Low memory bandwidth can't saturate compute.
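Napkin math: each generated token has to stream all active weights through the GPU once, so tokens/s ≈ bandwidth / model size. A laptop 3060 is roughly 336 GB/s; a 7-8B model at Q4 is ~4-5 GB, so ~70-80 t/s is the ceiling, with the CUDA cores mostly idle waiting on VRAM, which is why utilization reads low. And if the model doesn't actually fit and the driver spills into shared system RAM over PCIe, effective bandwidth drops to a few GB/s and replies take minutes.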