/lmg/ - Local Models General - /g/ (#105611492) [Archived: 948 hours ago]

Anonymous
6/16/2025, 5:43:49 PM No.105611492
1723669949923791
md5: 064c7d78fc96d0f0052777152b6a0830
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105601326 & >>105589841

►News
>(06/16) MiniMax-M1, hybrid-attention reasoning models released: https://github.com/MiniMax-AI/MiniMax-M1
>(06/15) llama-model : add dots.llm1 architecture support merged: https://github.com/ggml-org/llama.cpp/pull/14118
>(06/14) NuExtract-2.0 for structured information extraction: https://hf.co/collections/numind/nuextract-20-67c73c445106c12f2b1b6960
>(06/13) Jan-Nano: A 4B MCP-Optimized DeepResearch Model: https://hf.co/Menlo/Jan-nano

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>105611887 >>105612859 >>105612968 >>105613219 >>105613273 >>105613788 >>105613888 >>105614932 >>105621307
Anonymous
6/16/2025, 5:44:04 PM No.105611494
Lovely Miku General
md5: cf8bf57614332b59dc9c40e7570009db
►Recent Highlights from the Previous Thread: >>105601326

--Papers:
>105606869 >105606875
--Evaluation of dots.llm1 model performance and integration challenges in local inference pipelines:
>105601735 >105604736 >105604782 >105604857 >105604810 >105604838 >105605017 >105605319 >105605475 >105605551 >105605556 >105605609 >105605671 >105605701 >105605582 >105605670 >105605965
--llama-cli vs llama-server performance comparison showing speed differences and config inconsistencies:
>105601495 >105601540 >105601746 >105601830 >105601953 >105601967 >105602123 >105602170 >105602190 >105602380 >105601654
--Evaluating budget hardware options for local LLM deployment with portability and future model scaling in mind:
>105609676 >105609743 >105609808 >105609858 >105610000 >105610275 >105610095
--VideoPrism: A versatile video encoder achieving SOTA on 31 of 33 benchmarks:
>105610184
--Sugoi LLM 14B/32B released via Patreon with GGUF binaries and claimed benchmark leads:
>105606204 >105606305 >105606399 >105609562 >105609620
--Interleaving predictions from multiple LLMs via scripts or code modifications:
>105609453 >105609499 >105609500 >105609534
--Hailo-10H M.2 accelerator questioned for real-world AI application viability:
>105602205 >105602335
--Radeon Pro V620 GPU rejected due to driver issues and overheating in LLM use case:
>105603370 >105603394 >105603418 >105603454 >105603762 >105603893 >105604087
--Sycophantic tendencies in cloud models exposed through academic paper evaluation:
>105601903 >105602389 >105602410 >105602064 >105603398 >105603416
--MiniMax-M1, hybrid-attention reasoning models:
>105611241 >105611443
--Qwen3 models released in MLX format:
>105608806
--Miku (free space):
>105601934 >105603103 >105604354 >105604389 >105604736 >105605940 >105606009 >105606217 >105610016 >105610160 >105610284 >105610486 >105611108 >105611119

►Recent Highlight Posts from the Previous Thread: >>105601330

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
6/16/2025, 5:47:26 PM No.105611523
>it's june 16, 2025 and there is STILL no minimax gguf
Replies: >>105611563
Anonymous
6/16/2025, 5:47:40 PM No.105611524
Nothing ever happens.
Anonymous
6/16/2025, 5:50:51 PM No.105611563
>>105611523
And there never will be. It uses lightning attention
https://github.com/ggml-org/llama.cpp/issues/11290
Replies: >>105611656 >>105612211
Anonymous
6/16/2025, 5:52:46 PM No.105611583
1726164673361456
md5: 5d26df94f01bd526701143d787512de6
>>105611471
here you go sar, they have a huggingface space
https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1
Replies: >>105611602 >>105611680
Anonymous
6/16/2025, 5:54:03 PM No.105611602
>>105611583
>looking it up in my mind
Replies: >>105612438
Anonymous
6/16/2025, 5:56:23 PM No.105611630
I see that Unsloth uploaded dots.llm1 quants within the last few hours. I've been waiting to try out this model. If I have 96GB VRAM, which is better: IQ4_XS, IQ4_NL, or UD-Q3_K_XL? These are the 3 that look like the largest size I can fit. Tbh I'm not even really sure what all these newer meme quants are or which is supposed to be best.
Anonymous
6/16/2025, 5:58:40 PM No.105611651
does r1 pass the mesugaki test with the new version they released?
Anonymous
6/16/2025, 5:59:12 PM No.105611656
>>105611563
>lightning attention
What's next? Bolt attention?
Replies: >>105611692
Anonymous
6/16/2025, 6:00:02 PM No.105611662
totalen mikunigger death
Hi all, Drummer here...
6/16/2025, 6:01:14 PM No.105611673
> Drummer's merge is already an improvement, yet retains most of Magistral's strengths.

>>105610116

Hey anon, which version did you use and what strengths are you referring to? Was reasoning good and useful?
Replies: >>105611999
Anonymous
6/16/2025, 6:01:42 PM No.105611680
Screenshot 2025-06-16 at 13-00-40 MiniMax M1 - a Hugging Face Space by MiniMaxAI
>>105611583
Here's a version with a bone thrown in.
Anonymous
6/16/2025, 6:03:07 PM No.105611692
>>105611656
I'm holding out for thunder attention
Anonymous
6/16/2025, 6:14:35 PM No.105611805
qte
md5: 566dfea476b42df878517f4ab96d3e71
Replies: >>105611942 >>105611989
Anonymous
6/16/2025, 6:17:55 PM No.105611838
F5 TTS has had a bit of an upgrade to inference speed recently, if you haven't kept up with the updates. ~3 different perf updates:

>flash atten2
>Empirically Pruned Step Sampling (lower number of steps for high quality output)
>Single transformation instead of a 2-step one (half the inference time required)
Replies: >>105611985 >>105612989
Anonymous
6/16/2025, 6:20:05 PM No.105611862
https://huggingface.co/moonshotai/Kimi-Dev-72B
https://github.com/MoonshotAI/Kimi-Dev
>We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.
>Kimi-Dev-72B achieves 60.4% performance on SWE-bench Verified. It surpasses the runner-up, setting a new state-of-the-art result among open-source models.
>Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.
>Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.
Replies: >>105611874 >>105612298 >>105612316 >>105612335 >>105619697
Anonymous
6/16/2025, 6:21:22 PM No.105611874
>>105611862
>code
>local
I sleep
Anonymous
6/16/2025, 6:22:33 PM No.105611887
1740034581240832
md5: 29931d4451b69913ae7c0a220096163a
>>105611492 (OP)
Uh oh, the 24/7 seething fatoid disgusting ratoid troons transisters didn't like that one, huh?

You will
Never
Ever
Be a
Real
Woman

looooooooooool
Replies: >>105611970 >>105612215
Anonymous
6/16/2025, 6:27:00 PM No.105611942
>>105611805
Yes (Aliens)
Anonymous
6/16/2025, 6:29:22 PM No.105611970
>>105611887
Don't misunderstand. I don't project to be or even want to be a woman. That construct is entirely within your own ass, not even a snug fit, it's spacious.
Miku as a character or concept is irrelevant to me. I like her design and my perception of her is a convenient, often portable twintailed onahole. There is no wanting to be her, she is a sleeve for me to rub one out.
Hope that helps clarify. Goodness, you can't seem to kick the habit of malding.
Keep this up and you'll never get a kurisu.
Anonymous
6/16/2025, 6:30:13 PM No.105611985
>>105611838
I've definitely seen a huge increase in speed for large chunks of text, with NFE=7:

>Declaration of Independence
>8188 characters
>Inference time: 77 seconds
>Output Time: 8 mins 49 seconds of audio

https://vocaroo.com/1oiTcWPgdj6i
Replies: >>105612086
Anonymous
6/16/2025, 6:30:18 PM No.105611989
>>105611805
The second option is a trick, we all know she's not wearing any.
Replies: >>105612018
Anonymous
6/16/2025, 6:30:54 PM No.105611999
>>105611673
Do you know how to properly fine tune MoE models?
Anonymous
6/16/2025, 6:32:29 PM No.105612018
>>105611989
ah! caught!
Anonymous
6/16/2025, 6:40:14 PM No.105612086
>>105611985
This is with a 2070 btw, so anyone with a better GPU can double/triple the inference speed.
Anonymous
6/16/2025, 6:40:52 PM No.105612097
file
md5: 6c52dd43bbc39d54f336c57b7ae74333
>Looking up (in my mind) some sources
Minimax-M1 knows Teto's birthday and that she's an UTAU. It would be a disappointment if it did not given its size.
this single point of knowledge is irrefutable evidence that proves that the model is good. we'll be back.
Anonymous
6/16/2025, 6:51:15 PM No.105612211
>>105611563
>never
but that was closed saying it could be revisited after refactoring, and they seemed to later do the refactor here:
https://github.com/ggml-org/llama.cpp/pull/12181
>Introduce llm_memory_i concept that will abstract different cache/memory mechanisms. For now we have only llama_kv_cache as a type of memory
and looks like work has picked up on other models with competing cache mechanisms (mamba etc.)
https://github.com/ggml-org/llama.cpp/pull/13979

now we just need someone with motivation, a vibe coding client, and good enough prompt engineering skills to revisit minimax and we're fucking IN
Anonymous
6/16/2025, 6:51:42 PM No.105612215
17441668868331400484052251153607
md5: 20f01356f2f67cc650be53d9c8b833ee
>>105611887
Okay schizo
Replies: >>105612227
Anonymous
6/16/2025, 6:52:56 PM No.105612227
>>105612215
>ACK
Anonymous
6/16/2025, 6:57:39 PM No.105612268
https://huggingface.co/moonshotai/Kimi-Dev-72B
Replies: >>105612298
Anonymous
6/16/2025, 6:59:14 PM No.105612285
>>105610392
don't you need to enable tool use or something like that? are most engines compatible?
Replies: >>105612413
Anonymous
6/16/2025, 7:00:13 PM No.105612298
>>105612268

please search before posting
>>105611862
Replies: >>105612305 >>105612404
Anonymous
6/16/2025, 7:00:59 PM No.105612305
>>105612298
nah go shove a janny dilator up your holes though ;)
Anonymous
6/16/2025, 7:02:27 PM No.105612316
open_performance_white
md5: e7faeb506e5064ea3fac6ba415c2edbd
>>105611862
>qwen 2 finetune
*yawn*
Also, apologize for Devstral
Replies: >>105612363 >>105612598 >>105619697
Anonymous
6/16/2025, 7:03:48 PM No.105612335
>>105611862
do they actually aim for the moon?
Anonymous
6/16/2025, 7:07:24 PM No.105612363
>>105612316
most meaningless graph award
Anonymous
6/16/2025, 7:12:10 PM No.105612404
>>105612298
ywnnaj
Anonymous
6/16/2025, 7:13:37 PM No.105612413
>>105612285
"tool use" is just sending a json object of available tools in the context and executing whatever tool the model invoked in its reply. That's entirely up to the client making the requests to abstract away. I mostly use llama-server, but any engine that exposes an OpenAI-compatible API should work.
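A minimal client-side sketch of that loop. The function names and the canned server reply below are illustrative; a real client POSTs messages plus the tools array to the server's OpenAI-compatible /v1/chat/completions endpoint and reads tool_calls out of the response:

```python
import json

# Tools are advertised to the model as JSON schemas in the request body.
tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time for a timezone",
        "parameters": {
            "type": "object",
            "properties": {"tz": {"type": "string"}},
            "required": ["tz"],
        },
    },
}]

# Local implementations the client dispatches to (stub for illustration).
def get_time(tz):
    return f"12:00 in {tz}"

TOOL_IMPLS = {"get_time": get_time}

def dispatch(assistant_message):
    """Execute whatever tool the model invoked and build the follow-up messages."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        out = TOOL_IMPLS[fn["name"]](**json.loads(fn["arguments"]))
        results.append({"role": "tool",
                        "tool_call_id": call["id"],
                        "content": out})
    return results

# Canned reply in the OpenAI-compatible shape, standing in for the server response.
reply = {"role": "assistant", "tool_calls": [
    {"id": "call_0", "type": "function",
     "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'}}]}

print(dispatch(reply))  # tool messages to append to the context before re-requesting
```

The model never executes anything itself; the client appends the tool results as "tool" role messages and requests another completion.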
Anonymous
6/16/2025, 7:16:27 PM No.105612438
>>105611602
deepseek (through web) said "making a mental note" to me recently, hadn't seen that before.
Anonymous
6/16/2025, 7:24:48 PM No.105612527
>>105610000

unfortunately yes, between rooms

>>105610095

that's great, but double the price. Also, I understand a 16GB card can only load small models (but could be used for diffusion)
Replies: >>105612561 >>105612568
Anonymous
6/16/2025, 7:27:53 PM No.105612561
>>105612527
Literally any pc case is portable "between rooms"
Replies: >>105615772
Anonymous
6/16/2025, 7:28:53 PM No.105612568
>>105612527
>portable between rooms
for what motherfucking purpose?
Replies: >>105615772
Anonymous
6/16/2025, 7:31:29 PM No.105612598
>>105612316
DID THE PARETO FRONT JUST DO WHAT I THINK IT DID?
Anonymous
6/16/2025, 7:40:31 PM No.105612703
hello saaars
haven't been keeping up with lmgs since deepsneed released, what's the current meta?
Replies: >>105612715 >>105612769
Anonymous
6/16/2025, 7:41:23 PM No.105612715
>>105612703
deepsneed or nemo.
Replies: >>105612729
Anonymous
6/16/2025, 7:42:24 PM No.105612729
>>105612715
This.
sage
6/16/2025, 7:45:12 PM No.105612769
>>105612703
deepsnemo
Anonymous
6/16/2025, 7:52:21 PM No.105612859
>>105611492 (OP)
She's sexy. Can I look like that? Is there any tech for that?
Replies: >>105612884 >>105612888 >>105613005 >>105614385 >>105617634
Anonymous
6/16/2025, 7:53:42 PM No.105612873
how are LLMs at femdom? beyond the cursory stuff like verbal degredation and humiliation, can they lean more into the power dynamic side and give you orders and encouragement, dictate what you eat, how you dress, more control yet still nurturing
asking for friend
Replies: >>105612995 >>105613064 >>105613075 >>105613470 >>105613494 >>105613982
sage
6/16/2025, 7:54:46 PM No.105612884
3795479221
md5: 117eb281b7c6c519828d07e890ef6cb8
>>105612859
yes.
Replies: >>105613005
Anonymous
6/16/2025, 7:55:01 PM No.105612888
>>105612859
arch linus
Replies: >>105613149
Anonymous
6/16/2025, 8:01:34 PM No.105612968
>>105611492 (OP)
>Looks at news
>Nothing but small models and research stuff that can't RP worth shit
Is it over?
Replies: >>105613058 >>105613077
Anonymous
6/16/2025, 8:03:50 PM No.105612989
>>105611838
F5R-TTS is better
Replies: >>105613222 >>105613873
Anonymous
6/16/2025, 8:04:26 PM No.105612995
>>105612873
no
Anonymous
6/16/2025, 8:05:07 PM No.105613005
>>105612859
You need to reroll your char. See this: >>105612884
Replies: >>105613030
Anonymous
6/16/2025, 8:07:34 PM No.105613030
>>105613005
The cooldown for rerolling again is kinda long and early game is ass.
Replies: >>105613139
Anonymous
6/16/2025, 8:11:13 PM No.105613058
>>105612968
We got Magistral and dots last week.
Replies: >>105613759
Anonymous
6/16/2025, 8:11:33 PM No.105613064
>>105612873
tell your friend he has mommy issues
Replies: >>105613087
Anonymous
6/16/2025, 8:12:18 PM No.105613075
>>105612873
There is a trick to it. Tell it to roleplay as a wealthy sadistic werewolf millionaire that inexplicably fell in love with his 5/10 average unassuming secretary. Then use an agent to rewrite what it wrote and swap the werewolf for the dommy mommy of your choice.
sage
6/16/2025, 8:12:24 PM No.105613077
>>105612968
2mw until deepsex V4
Anonymous
6/16/2025, 8:13:15 PM No.105613087
>>105613064
high ground? here? are you actually serious
Replies: >>105613110 >>105614204
Anonymous
6/16/2025, 8:15:11 PM No.105613110
>>105613087
Yes. It is over anakin. Take your mikutroons and walk into the lava yourself
Anonymous
6/16/2025, 8:17:02 PM No.105613139
>>105613030
Yes, this is a problem. I want to look like her in 3-4 years.
Replies: >>105613241
Anonymous
6/16/2025, 8:17:37 PM No.105613149
>>105612888
when regular linus isn't strong enough, take it to the arch linus.
Anonymous
6/16/2025, 8:24:38 PM No.105613219
>>105611492 (OP)
sex with miku
Anonymous
6/16/2025, 8:24:52 PM No.105613222
>>105612989
Thanks but hows the performance between the two?
Anonymous
6/16/2025, 8:26:33 PM No.105613241
>>105613139
You will never be 2d, anon
Anonymous
6/16/2025, 8:29:40 PM No.105613273
>>105611492 (OP)
alice.gguf when?
lmg activate the insider agent and leak it
we will finish her training with antisemitic propaganda and gpu bukkake
Anonymous
6/16/2025, 8:32:44 PM No.105613312
glitchgirl
md5: 8214aa58b481f001313ead9c9e098925
/lit/fag here. I'm working on an "offline MUD" of sorts and need a writing buddy to ping pong ideas. Chatgpt is good enough but I'm interested in fine-tuning.
What model would you guys recommend for my setup?
>ryzen 7 5700G
>32 GB
>no GPU
Replies: >>105613348 >>105613349 >>105613389
Anonymous
6/16/2025, 8:35:52 PM No.105613346
>do the mesugaki test with R1-0528
>explains it flawlessly, with examples
>fucking ends the message asking me if I want nh codes to illustrate that
>would've posted the log but anons over in aicg got banned for less than that
this is why the chinks are gonna win the ai race
Replies: >>105613377 >>105613382 >>105613760
Anonymous
6/16/2025, 8:35:55 PM No.105613348
>>105613312
SmoLlm-0.15B
Replies: >>105613546
Anonymous
6/16/2025, 8:35:57 PM No.105613349
>>105613312
>fine-tuning
Rent hardware. You're not doing anything with that.
Replies: >>105613546
Anonymous
6/16/2025, 8:38:44 PM No.105613377
>>105613346
anon, the AI models are supposed to do the hallucinations, not you.
Replies: >>105613930
Anonymous
6/16/2025, 8:38:50 PM No.105613382
>>105613346
>>fucking ends the message asking me if I want nh codes to illustrate that
No way.
Anonymous
6/16/2025, 8:39:15 PM No.105613389
>>105613312
>fine-tuning
just tell the model at the start what you want and how to write
https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_S.gguf
https://huggingface.co/bartowski/google_gemma-3-27b-it-qat-GGUF/blob/main/google_gemma-3-27b-it-qat-Q4_0.gguf
Replies: >>105613546
Anonymous
6/16/2025, 8:42:57 PM No.105613432
What coomodel is good these days for a 24GB vramlet?
Anonymous
6/16/2025, 8:45:47 PM No.105613470
>>105612873
I've tried various loli dom/yuri s&m scenarios since GPT-3 came out in 2020.
Often it works poorly and needs hand holding. Closed stuff like OpenAI had too much positivity bias.
Open models like Llama were not smart enough and needed hand holding, thus ruining it; same for most small ones.
Closed stuff like Claude (Opus, sometimes Sonnet) would manage it somewhat.
(Open) Command-R managed a bit, but needed hand holding and was very schizo.
From open models, DeepSeek R1 manages it properly, but like most LLMs it will still by default jump on your dick or start sex too quickly. With careful prompting that explains the desired pacing in a few sentences it manages to ace this almost perfectly: it can go both fast and slow, and it leads the story by itself, thus keeping your immersion.
I'd say DeepSeek 3 (the first one) failed at it, but the update works. Both the new and old R1 work; the new has a slower pace and the old one was more intense, but both are intense enough if prompted right.
Now maybe the model size is too much for most, but when you consider that closed stuff that did well like Opus 3 is dying ("deprecating") and OpenAI has a positivity bias that often ruins it, I'd say R1 is one of the very few that manage to do it right.
If you accept some degree of hand holding, smaller models like the 70b Llama and some others managed partially, but considerably more poorly. I haven't seen anything in the 7-13b range manage.
I'm a bit interested in trying it with Magistral sometime, because I've noticed that R1 would sometimes make plans, and I would pass some of those CoT plans back to it (selectively retaining think blocks) so that it can lead the story over many turns, which is a lot more fun than LLMs that forget what they were doing half a page ago or what they intended to do.
tl;dr: with careful prompting it works very well on some big models, mostly R1. DS3 sometimes works, but is gacha. Everything else often needs hand holding.
Replies: >>105613951
Anonymous
6/16/2025, 8:48:09 PM No.105613494
>>105612873
you have narrow shoulders and literally zero eyebrow line
Replies: >>105613577
Anonymous
6/16/2025, 8:53:18 PM No.105613546
Screenshot 2025-06-16 152848
md5: 44648593e503ccdf3f1057a90936e18f
>>105613389
>20+B
I don't want to turn my toaster into a pressure cooker bomb.

>>105613348
>0.15B
Isn't that super small? Still, might be fun to play with, thanks.

>>105613349
Yeah probably fine-tuning is not the right word. I just want some degree of control beyond setting temp and prompting. Pic related is more or less what I expect. Just a dumb box that churns out lore.
Replies: >>105613625
Anonymous
6/16/2025, 8:56:50 PM No.105613577
>>105613494
are you erping with me?
Replies: >>105613608
Anonymous
6/16/2025, 8:59:50 PM No.105613608
>>105613577
Colon three
Anonymous
6/16/2025, 9:01:36 PM No.105613625
>>105613546
Qwen3-30b should give you ~10t/s at low context so it is very suitable for your machine, although I don't know if it is good for writing or whatever you are doing.
Replies: >>105613837
Anonymous
6/16/2025, 9:08:38 PM No.105613690
>>105611419
My bad.

Let's enjoy another Chinese SOTA at 7t/s (ds-r1 runs at 4t/s)
Replies: >>105614491
Anonymous
6/16/2025, 9:15:27 PM No.105613759
>>105613058
Didn't magistral suck, though?
Anonymous
6/16/2025, 9:15:28 PM No.105613760
>>105613346
>nh codes to illustrate that
I believe it. Fucking normalfag shit.
Anonymous
6/16/2025, 9:18:17 PM No.105613788
seated_pantyshot_angle
md5: 1676181f190559b500e86a19f0111b1a
>>105611492 (OP)
This got me thinking about the calculation for minimum non-whore skirt length.
Anonymous
6/16/2025, 9:21:18 PM No.105613814
How is dots for RP? 235b is too big for my rig, but dots seems like it could be a sweet spot.
Replies: >>105613915
Anonymous
6/16/2025, 9:23:12 PM No.105613829
kiryu-on-autism
md5: b36839487877768757e270a903462b60
It's been quite some time since I've played with local models, has windows + amd gotten any better? It's a pain in the ass to have to boot up linux every time I want to rp
Anonymous
6/16/2025, 9:23:57 PM No.105613837
exosuit
md5: 92b235587833d7e9dae666a6e72feb2f
>>105613625
Yeah I think fewer parameters but high context is going to work better for me. But gonna keep that in mind. Thanks.
Anonymous
6/16/2025, 9:26:59 PM No.105613873
>>105612989
code to run it? I can only find papers
Anonymous
6/16/2025, 9:27:11 PM No.105613874
>>>/v/712790873
Replies: >>105613979 >>105614033
Anonymous
6/16/2025, 9:28:29 PM No.105613888
>>105611492 (OP)
I currently have a server with a ton of CPU cores and spare RAM, but it only has a 1050ti with 4GB VRAM in it. Is it even worth trying to run a local language model on it?
Replies: >>105614327 >>105614358
Anonymous
6/16/2025, 9:31:41 PM No.105613915
>>105613814
it's noticeably pretty sterile when it comes to explicit nsfw, but that's nothing new for that general size range. if you're used to llama/qwen2.5 70b-class derivatives you'll feel right at home, but at least dots may be faster and have some more knowledge
Replies: >>105613962
Anonymous
6/16/2025, 9:33:43 PM No.105613930
>>105613377
You killed me fucker, kek
Anonymous
6/16/2025, 9:35:33 PM No.105613951
>>105613470
Are you doing a thesis on the topic or something?
Replies: >>105614741
Anonymous
6/16/2025, 9:36:47 PM No.105613962
>>105613915
I liked 72b EVA-Qwen 2.5 at IQ4_XS, though it ran really slow on my system (1-2t/s). If this performs anything like that, but with the speed of a MoE, then it sounds like it's for me.
Replies: >>105613983
Anonymous
6/16/2025, 9:36:54 PM No.105613963
I'm downloading Llama-4-Scout-17B-16E-Instruct-UD-TQ1_0.gguf.
Wish me luck.
I might not survive it.
Replies: >>105613987 >>105613989 >>105614351
Anonymous
6/16/2025, 9:38:52 PM No.105613979
>>105613874
Gayest thread on /v/ right now.
Replies: >>105614033
Anonymous
6/16/2025, 9:39:05 PM No.105613982
>>105612873
That's not femdom
Anonymous
6/16/2025, 9:39:19 PM No.105613983
>>105613962
That's an RP tune.
Anonymous
6/16/2025, 9:39:38 PM No.105613987
Screenshot 2025-06-16 133922
md5: a517be8fb15c6767a3dc0b307fa77f5e
>>105613963
That's beyond being 'funny' bad. Also it's insane that unsloth's Scout repo has 100k downloads in the past month. That has to be wrong
Replies: >>105614005
Anonymous
6/16/2025, 9:39:43 PM No.105613989
>>105613963
You'll survive fine anon, even the full precision model is shit and retarded
Replies: >>105614005
Anonymous
6/16/2025, 9:41:46 PM No.105614005
demons in microstructures
md5: a63ca6a41f6fcbede08e84bcb3ec6639
>>105613987
>That's beyond being 'funny' bad
I know, pray for me.

>>105613989
I feel this one might be so bad as to be a cognitohazard.
We'll see.
Replies: >>105614130 >>105614248 >>105614351
Anonymous
6/16/2025, 9:44:42 PM No.105614033
>>105613874
>>105613979
How did mikutroons become more mentally ill than furfags? At least those retards contribute to image gen and keep to their own degenerate communities instead of spamming the same generic dogshit of a waifu they obsess over everywhere because they have nothing else in their miserable life to attach to.
Replies: >>105614260
Anonymous
6/16/2025, 9:52:34 PM No.105614104
hi anons, i know that this isn't the best thread to ask about commercial things, but... what are the services where I can deploy sdxl/etc finetuned models (anime ones) for easy API access? Obviously one choice is renting GPU servers on runpod/vast and so on, but are there any managed solutions? I don't think I need a dedicated GPU server to start, but eventually I guess I might need to generate up to 100 images/minute or something like that.
Anonymous
6/16/2025, 9:53:41 PM No.105614114
AI generated post >105614104 btw
Anonymous
6/16/2025, 9:54:58 PM No.105614130
>>105614005
>that pic
Baka, go back to /x/
Replies: >>105614148
Anonymous
6/16/2025, 9:57:19 PM No.105614148
Caenemung
md5: 315e5d64caccb234599bca17f099f43b
>>105614130
>105614104
I say this with utmost sincerity.
I've been on 4chan since 2008. I have been to /x/ maybe 5 times total.
As in, individual instances.
Anyhow, finished downloaded it. Let's see what happens.
If I don't report back, please call my parents.
Replies: >>105614248 >>105614351
Anonymous
6/16/2025, 10:02:53 PM No.105614204
20241007_021413
md5: a60a792ae918ecdc2aacda0d7c1b3947
>>105613087
to be fair, if you post on 4chan -- people WILL make fun of you. even if they're just as depraved.
but hey, i hope you find your perfect jerk-off mommy machine bro.
Replies: >>105614222
Anonymous
6/16/2025, 10:04:39 PM No.105614222
>>105614204
s&m is vanilla compared to marrying a cartoon
that's not even a fetish, it's psychosis
Anonymous
6/16/2025, 10:06:43 PM No.105614248
>>105614005
>>105614148
>load_tensors: layer 0 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 1 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 2 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 3 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 4 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 5 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 6 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 7 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 8 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 9 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 10 assigned to device CUDA0, is_swa = 1
Is this supposed to be a thing? Interleaved swa layers?
That's how they got """1 million""" context?
Replies: >>105614266 >>105614351
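Reading the pattern off the load log above: every 4th layer (3, 7, 11, ...) keeps full global attention and the rest use a sliding window, so only a quarter of the layers' KV cache grows with the full context — presumably part of how the headline context number is kept tractable. A quick sketch of the pattern as logged (the 3:1 ratio is read off the log, not taken from the model config):

```python
# Reproduce the is_swa pattern from the llama.cpp load log:
# three sliding-window-attention layers, then one full-attention layer, repeating.
def is_swa(layer: int) -> int:
    return int(layer % 4 != 3)  # layers 3, 7, 11, ... are global (is_swa = 0)

pattern = [is_swa(i) for i in range(11)]
print(pattern)  # matches layers 0-10 in the log above
```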
Anonymous
6/16/2025, 10:08:13 PM No.105614260
_ec22462d-fa92-4d3c-ba24-88fb265ee83d
md5: 4c2d9f4ecfacf22e48ddfd8fcfdf0ce5
>>105614033
Through the power of your butthurt, you have now summoned Migu.

I wonder if there will be a Blackwell card with 48GB? I don't need 96, and 32 just isn't enough. 48GB is about right. It just seems a little overboard to spend $8500 on a GPU.
Replies: >>105614372 >>105614527
Anonymous
6/16/2025, 10:08:49 PM No.105614264
Can't wait for Minimax to get supported. I'll abandon Deepseek for it.
Replies: >>105614304
Anonymous
6/16/2025, 10:08:58 PM No.105614266
>>105614248
Scout is 10 million sir.
Replies: >>105614351
Anonymous
6/16/2025, 10:12:38 PM No.105614304
>>105614264
Buy an ad
Anonymous
6/16/2025, 10:15:22 PM No.105614327
>>105613888
sure. Just run with llama.cpp in pure CPU inference mode (or really low context on the GPU)
It'll be a bit slow, but slow is ok for playing with. You'll be better off than desktop anons stuck with 128GB max RAM capacities that can't even run big models.
Anonymous
6/16/2025, 10:17:55 PM No.105614351
>>105614266
All the same when it breaks down after 8k.

>>105613963
>>105614005
>>105614148
>>105614248
Yeah, it's real bad.
Worse than Qwen 3 30B. It can't even keep up with outputting a specific pattern that much smaller models handle just fine.
Amazing.
Replies: >>105614374
Anonymous
6/16/2025, 10:18:40 PM No.105614358
>>105613888
1.5 t/s for deepseek quants
3 t/s for qwen3-235b

Your GPU will be used for prompt processing only
Anonymous
6/16/2025, 10:19:53 PM No.105614372
>>105614260
I remember this Miku
>I don't need 96
yes you do. Big batch size genning of big Migus.
Anonymous
6/16/2025, 10:20:20 PM No.105614374
>>105614351
breaks down much sooner than 8k even
https://github.com/adobe-research/NoLiMa?tab=readme-ov-file#results
Anonymous
6/16/2025, 10:21:18 PM No.105614383
Less than two weeks until we get open source Ernie 4.5/X1
Anonymous
6/16/2025, 10:21:31 PM No.105614385
>>105612859
Install gentoo
Replies: >>105614481
Anonymous
6/16/2025, 10:33:28 PM No.105614479
Getting 10 t/s with dots on my 96 GB gaming rig and normal Llama.cpp with a custom -ot.
Replies: >>105614491 >>105614515 >>105614535
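For reference, the usual shape of a custom -ot (tensor override) for MoE models like dots on a single consumer GPU looks like this; the quant filename and the exact regex are illustrative, and -ngl/-c should be tuned to your VRAM:

```shell
# Offload all layers to the GPU, but override the per-layer FFN expert tensors
# back to CPU RAM via a tensor-override regex (-ot). Attention and shared
# weights stay on the GPU, which is where the speedup comes from.
./llama-server \
  -m dots.llm1.inst-Q4_K_M.gguf \
  -ngl 99 \
  -ot "ffn_.*_exps=CPU" \
  -c 8192 -fa
```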
Anonymous
6/16/2025, 10:33:51 PM No.105614481
>>105614385
>Gentroon
It is in the name. I have seen tech shrek. I won't be fooled.
Replies: >>105614507
Anonymous
6/16/2025, 10:35:31 PM No.105614491
>>105614479
Post command params and which quant you use

Also, >>105613690
>another Chinese SOTA at 7t/s
I saw it coming
Replies: >>105614503
Anonymous
6/16/2025, 10:36:38 PM No.105614497
One day a sex model will finally drop and i will be free from this place. I hope you all die the next day.
Anonymous
6/16/2025, 10:36:54 PM No.105614503
>>105614491
Hold on. I'm actually failing to allocate more context. I was testing with only 2k initially. Damn does this model not use MLA or even GQA?
Replies: >>105614519
Anonymous
6/16/2025, 10:37:26 PM No.105614507
tenor.gif_thumb.jpg
md5: 2369ccdada7e9015236ef098f34e60ea
>>105614481
Anonymous
6/16/2025, 10:38:23 PM No.105614515
>>105614479
Well, is it good?
Replies: >>105614521
Anonymous
6/16/2025, 10:38:58 PM No.105614519
>>105614503
>gayming
What GPU?
Replies: >>105614529
Anonymous
6/16/2025, 10:39:09 PM No.105614521
>>105614515
Idk yet i just wanted to do an initial speed test first but trying to give more context is giving me the ooms.
Anonymous
6/16/2025, 10:40:04 PM No.105614527
>>105614260
>butthurt
2007 called your hrt ass
Anonymous
6/16/2025, 10:40:09 PM No.105614529
>>105614519
Just a 3090
Replies: >>105614544
Anonymous
6/16/2025, 10:40:32 PM No.105614535
>>105614479
>with a custom -ot
Suggested by unsloth brothers?
Replies: >>105614542
Anonymous
6/16/2025, 10:41:30 PM No.105614542
>>105614535
No? I am using unsloth's q4 quant though.
Anonymous
6/16/2025, 10:41:43 PM No.105614544
>>105614529
On ik_llama-cli or the original?
Anonymous
6/16/2025, 10:57:47 PM No.105614673
>https://github.com/ggml-org/llama.cpp/issues/14044#issuecomment-2961375166
>since it uses MHA rather than GQA or MLA
ACK
Replies: >>105614695
Anonymous
6/16/2025, 11:01:00 PM No.105614695
>>105614673
GQA and its devil offspring are the true killers of soulful outputs. This was common knowledge back during the llama2 era and the first command-r was good because it used natural attention as well
Replies: >>105614739 >>105614871
Anonymous
6/16/2025, 11:07:26 PM No.105614739
>>105614695
Ok but what if dots doesn't have soulful outputs
Anonymous
6/16/2025, 11:07:28 PM No.105614741
>>105613951
sybau nigger, let the anon talk. Finally someone shares their own experiences instead of just shitposting
Anonymous
6/16/2025, 11:10:12 PM No.105614758
Ok so it looks like I can't squeeze more than 11k context out of dots for the amount of memory I have, and now I am also getting 8.8 t/s (at 0 context, generating 100 tokens). Guess I'll test it a bit to see if it's worth downloading Q3 for.
Replies: >>105614873
Anonymous
6/16/2025, 11:25:40 PM No.105614871
>>105614695
This. GQA kills that feeling that the model *gets* what you mean.
Anonymous
6/16/2025, 11:25:48 PM No.105614873
>>105614758
thanks for the info
Anonymous
6/16/2025, 11:28:53 PM No.105614895
we need some madlad company to get rid of the tokenizer and train the model on unicode
Anonymous
6/16/2025, 11:33:45 PM No.105614932
>>105611492 (OP)
I'm completely new to this. Should I look further into lmgs if I don't care about chatting and image gen? Will I need a dedicated build or will my pc do?
Replies: >>105615858
Anonymous
6/16/2025, 11:45:04 PM No.105614993
guys, what will save local?
Replies: >>105615046 >>105615060 >>105615100 >>105615313 >>105619181
Anonymous
6/16/2025, 11:47:42 PM No.105615012
we are already saved.
Replies: >>105615065
Anonymous
6/16/2025, 11:52:55 PM No.105615046
>>105614993
BitNet OpenGPT
Anonymous
6/16/2025, 11:55:26 PM No.105615060
>>105614993
miku
Replies: >>105615100
Anonymous
6/16/2025, 11:55:52 PM No.105615065
>>105615012
I don't feel saved
Anonymous
6/16/2025, 11:59:26 PM No.105615090
Ok I tested dots and it's really meh. Feels like any other LLM really and on top of that, the trivia knowledge is also not really that good in my tests. No better than Gemma or GLM-4. MAYBE a bit better than Qwen. What trivia did people test that it had better performance on? It didn't do better on the /lmg/ favorites like mesugaki at least, nor on my personal set.
Replies: >>105615112
Anonymous
6/17/2025, 12:01:02 AM No.105615100
annoyed angry miku pointing gen ComfyUI 2025-06-16-15_00011_(1)
>>105614993
New paradigm. Eternal waiting room until then. Every possible LLM sucks. It's over.
More realistically I'd like to see more online learning experiments or papers. Like a live feedback thumbs up/down. Not to save local, but to keep my own curiosity alive even if the resulting implementations make the models retarded, slow, broken or anything. Something new to play with.
>>105615060
Miku's love
Anonymous
6/17/2025, 12:02:15 AM No.105615106
Can I generate porn and or 3d models on a 3090Ti?
Replies: >>105615149
Anonymous
6/17/2025, 12:02:46 AM No.105615112
>>105615090
>What trivia did people test that it had better performance on?
Even hilab admits the only thing it has better knowledge of is Chinese language trivia knowledge. For anything else it's beaten by fucking Qwen 2.5 72B.
Anonymous
6/17/2025, 12:08:34 AM No.105615149
>>105615106
No, you're retarded
Anonymous
6/17/2025, 12:27:05 AM No.105615313
>>105614993
openai's SOTA phone model will shift the pareto frontier of speed x safety
Anonymous
6/17/2025, 12:30:25 AM No.105615335
Trying to use a local model to make enhancements to tesseract OCR. Tesseract is fairly good without AI, but my ultimate goal is structured output of receipt data, so I can easily port it into hledger

What models would be best for this sort of thing? I've used ollama with some of the vision models and results haven't been great so far
Replies: >>105615373
Anonymous
6/17/2025, 12:34:54 AM No.105615360
some migus & friends
https://www.mediafire.com/folder/et1b18ntkdlac/vocaloid
Replies: >>105615415 >>105615419 >>105615433 >>105615438 >>105615727 >>105615887 >>105615959 >>105616569 >>105619906
Anonymous
6/17/2025, 12:36:33 AM No.105615373
>>105615335
Why would you need a vision model after you've OCRed the receipt to text?
Anonymous
6/17/2025, 12:41:35 AM No.105615415
>>105615360
>can't download the entire folder without a premium account
Replies: >>105615430
Anonymous
6/17/2025, 12:42:10 AM No.105615419
>>105615360
>mfw I've been saving them all manually
Replies: >>105615430
Anonymous
6/17/2025, 12:43:29 AM No.105615430
file
>>105615415
>>105615419
sorry, first filehost that came to mind. you can use jDownloader.
I know some other people download em so if you want to make up a more complete collection feel free
my collection, ironically, is probably more incomplete due to catastrophic data loss.
Replies: >>105615727 >>105615743
Anonymous
6/17/2025, 12:43:53 AM No.105615433
>>105615360
Thank you Migu genner
Anonymous
6/17/2025, 12:44:56 AM No.105615438
>>105615360
>all .jpg
i curse you!
Anonymous
6/17/2025, 12:45:25 AM No.105615443
Minimax was very obviously trained on the old R1. The thinking process is the same endlessly long plain text where the model tries to think about even the most trivial shit. It even sometimes deliberately gets things wrong at first just to be able to correct itself and think some more.
Anonymous
6/17/2025, 12:59:36 AM No.105615535
>llama.cpp
>warming up the model with an empty run - please wait ... (--no-warmup to disable)

can I just skip warmup for good?
Replies: >>105615582
Anonymous
6/17/2025, 1:06:27 AM No.105615582
>>105615535
"Warming up?" You don't know the meaning of those words, Bardin.
Anonymous
6/17/2025, 1:07:42 AM No.105615587
Do people use the quantized context with llama.cpp?
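If you mean the KV-cache quantization flags, it's just a pair of options (model path is a placeholder; note that quantizing the V cache requires flash attention to be enabled):

```shell
# q8_0 KV cache roughly halves context memory vs the default f16,
# usually with negligible quality loss; -fa is needed for the V cache
./llama-server -m model.gguf -fa \
    --cache-type-k q8_0 --cache-type-v q8_0
```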
Replies: >>105615598
Anonymous
6/17/2025, 1:08:28 AM No.105615595
https://huggingface.co/bartowski/rednote-hilab_dots.llm1.inst-GGUF

bartgoatski quants are up. gogogo
Replies: >>105615710 >>105615716
Anonymous
6/17/2025, 1:08:39 AM No.105615598
>>105615587
Yeah, but I wouldn't call them "people"
Anonymous
6/17/2025, 1:25:12 AM No.105615710
>>105615595
does it need a llamacpp update?
Anonymous
6/17/2025, 1:25:44 AM No.105615716
>>105615595
Shit llm for copers that somehow still dont have even 128gb ram for sneedsex
Anonymous
6/17/2025, 1:27:19 AM No.105615727
>>105615360
>>105615430
Host a public FTP server, you coward.
Anonymous
6/17/2025, 1:28:50 AM No.105615743
>>105615430
>first filehost that came to mind
makes sense that a mikutroon's mind is retarded
Replies: >>105615767
Anonymous
6/17/2025, 1:29:03 AM No.105615745
Ive been seeing stupid ads for this other half ai anime bullshit... is there anything to approximate it locally (plug some llm into like an anime vroid model or something that has maybe limited voice rec)?
Replies: >>105615817
Anonymous
6/17/2025, 1:32:31 AM No.105615767
>>105615743
can you point at, precisely, what aspect of miku makes it at all relevant to trans
not the people coopting the design and changing it, the official design, as per crypton future media
you've been throwing around this trans/agp thing for literal months if not years at this point and yet you've failed to even once properly ground your point or lack thereof in any actual sense
nobody's looking at miku and thinking of sex change surgeries that's all you
nobod- why am I even bothering you're clearly off your meds
Replies: >>105615788 >>105615805 >>105615819
Anonymous
6/17/2025, 1:33:34 AM No.105615772
>>105612561
fair enough, but you probably dont want to haul a full blown desktop around every day like in the good lan days. Still, if you have a better rec with a desktop form factor Im happy to listen

>>105612568
living in a shoebox, cant use the same room all the time, its functionality is time multiplexed
Anonymous
6/17/2025, 1:36:14 AM No.105615788
>>105615767
Xir, this is a trans website. Nobody would be obsessed with this obsolete design if he wasn't a real woman.
Replies: >>105615794
Anonymous
6/17/2025, 1:36:56 AM No.105615794
>>105615788
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
whatever retard keep thinking of cock
Anonymous
6/17/2025, 1:38:07 AM No.105615805
>>105615767
>you're been throwing around this trans/agp thing for literal months if not years at this point and yet you

>only 1 person realized that mikuniggers are retards who just spam their dogshit mascot obsessively and almost never ever have a single based opinion imaginable that they post in the thread despite being in the thread every day and despite the many actual trannies and faggots that raid the threads but never got told off by a single mikunigger avatarposter, instead they ignore those people and keep posting their generic trash obsession waifu
yeah... i wonder why people dislike you
Replies: >>105615820
Anonymous
6/17/2025, 1:39:34 AM No.105615817
>>105615745
You don't have the IQ to run that
Anonymous
6/17/2025, 1:39:38 AM No.105615819
>>105615767
>nobod- why am I even bothering you're clearly off your meds
American brown tumblr tier writing btw
Anonymous
6/17/2025, 1:39:38 AM No.105615820
>>105615805
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
you could just answer
Replies: >>105615842
Anonymous
6/17/2025, 1:43:12 AM No.105615842
>>105615820
I concede you aren't a troon. Now continue not being a troon by no longer spamming that worthless avatar. And if you continue then... well you admit you are a disgusting troon.
Replies: >>105615867
Anonymous
6/17/2025, 1:46:01 AM No.105615858
>>105614932
vector databases and semantic search
Anonymous
6/17/2025, 1:47:06 AM No.105615867
>>105615842
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
for someone who rants and raves endlessly about proper argumentation and logic you're proper fucking shit at it
never expect me to reply to your bs again
Replies: >>105615880
Anonymous
6/17/2025, 1:48:59 AM No.105615880
>>105615867
Well there you go that is how we know you are a disgusting troon and you have AGP fantasies focusing on that retarded avatar you keep pushing on everyone. I recommend joining the 41%
Anonymous
6/17/2025, 1:50:03 AM No.105615887
>>105615360
https://multiup.io/download/f927ee16eeea9bf6db1576a0d0c1f536/xx.zip
single file
Replies: >>105615944 >>105615959
Anonymous
6/17/2025, 1:56:42 AM No.105615944
>>105615887
Thank you. Download finished in a couple seconds. Much better than fucking with jDownloader.
Anonymous
6/17/2025, 1:58:41 AM No.105615959
>>105615360
>>105615887
An artifact to be preserved
Anonymous
6/17/2025, 1:59:53 AM No.105615963
1722130950934-1
>load 0.6B model
>rig starts screeching like it's getting fistfucked by Satan himself

>load 8B model
>rigs handles it just fine

Okay I'm way over my head here, guys.
Replies: >>105615971 >>105615982 >>105616370
Anonymous
6/17/2025, 2:00:44 AM No.105615971
>>105615963
It likes its models small and open
Anonymous
6/17/2025, 2:02:46 AM No.105615982
>>105615963
First case it ran on CPU
Anonymous
6/17/2025, 2:06:02 AM No.105616002
__hatsune_miku_and_kasane_teto_vocaloid_and_1_more_drawn_by_rtm1016__2548c07e192f6bfb5b2c0097987e1f44
Anonymous
6/17/2025, 2:35:21 AM No.105616197
Ive been running mythomax for years now, and I just upgraded to a 5080. Whats the new meta for coom llms?
Replies: >>105616261 >>105616387
Anonymous
6/17/2025, 2:45:00 AM No.105616261
>>105616197
there is still nothing better than mythomax
Replies: >>105616340
Anonymous
6/17/2025, 2:56:16 AM No.105616340
>>105616261
is there a way I can finetrain it a bunch of specific fetish smut to make it better?
Replies: >>105616408
Anonymous
6/17/2025, 3:00:05 AM No.105616370
>>105615963
0.6b needs very little bandwidth, so compute usage goes up (and so do the fans). 8b needs more bandwidth, so it spends more time just waiting for memory to reach registers to compute, giving it time to chill.
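The rough math behind that: token generation is bandwidth-bound, so the speed ceiling is just memory bandwidth over model size. A back-of-envelope sketch (the bandwidth and model-size numbers are rough assumptions, not measurements):

```python
def decode_tok_per_s(model_gb, bandwidth_gb_s):
    """Back-of-envelope decode speed ceiling for a dense model.

    Each generated token streams roughly the whole set of weights
    once, so decode is memory-bandwidth bound; prompt processing,
    by contrast, is compute bound (hence the fan noise).
    """
    return bandwidth_gb_s / model_gb

# ~936 GB/s is the commonly quoted 3090 memory bandwidth
tiny = decode_tok_per_s(1.2, 936.0)   # 0.6B at fp16: ceiling so high the GPU is compute-busy
big = decode_tok_per_s(16.0, 936.0)   # 8B at fp16: ~58 t/s ceiling, cores mostly waiting
```

The small model's ceiling is hundreds of t/s, so the cores run flat out; the big model stalls on memory and the chip stays cooler.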
Replies: >>105616422
Anonymous
6/17/2025, 3:02:52 AM No.105616387
>>105616197
Try Cydonia
Anonymous
6/17/2025, 3:06:12 AM No.105616408
>>105616340
Yes, you can finetune if you want (but not on a single 5080), but you really should just read a thread, because this question gets asked every single god damn thread. If you look at the last thread you'd find at least 5 different recommendations for someone in your situation
Anonymous
6/17/2025, 3:08:06 AM No.105616422
>>105616370
This. When my CPU is doing prompt processing the fans go full blast, but once it starts generating tokens they calm down.
Anonymous
6/17/2025, 3:11:19 AM No.105616441
file
can someone generate neutral looking anime women pictures so i can use them for my blogposts?
Replies: >>105616511
Anonymous
6/17/2025, 3:20:27 AM No.105616511
>>105616441
You could.
Replies: >>105616512
Anonymous
6/17/2025, 3:20:47 AM No.105616512
>>105616511
i have rx 6600
Replies: >>105616521
Anonymous
6/17/2025, 3:22:56 AM No.105616521
>>105616512
Most image gen is python based. I don't know if they work with amd. Try stablediffusion.cpp. Should probably work with vulkan.
Anonymous
6/17/2025, 3:29:38 AM No.105616569
>>105615360
Wouldn't a Pixiv link be fine?
Replies: >>105619751
Anonymous
6/17/2025, 3:52:53 AM No.105616734
1597786378292
justpaste (DOTit) GreedyNalaTests

Added:
dans-personalityengine-v1.3.0-24b
Cydonia-24B-v3e
Broken-Tutu-24B-Unslop-v2.0
Delta-Vector_Austral-24B-Winton
Magistral-Small-2506
medgemma-27b-text-it
Q3-30B-A3B-Designant
QwQ-32B-ArliAI-RpR-v4
TheDrummer_Agatha-111B-v1-IQ2_M
Qwen3-235B-A22B-Q5_K_M from community

Been preoccupied for a while but now I'm caught up. 235B was given a star rating, the others had no stars and no flags, they're just the same old really.

Looking for contributions:
Deepseek models
dots.llm1.inst
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the EXACT prompt sent to the backend, in addition to the output. And your backend used + pull datetime. Also a link to the quant used, or what settings you used to make your quant.
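For would-be contributors, a llama-cli invocation matching those settings could look like this (model path and prompt file are placeholders — use the exact prompt from the paste):

```shell
# temp 0 + top-k 1 makes decoding greedy/deterministic; seed 1 just in case
./llama-cli -m dots.llm1.inst-Q4_K_M.gguf \
    --temp 0 --top-k 1 --seed 1 \
    -f nala_prompt.txt -n 300
```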
Replies: >>105616768 >>105616800 >>105617143 >>105617582 >>105617625
Anonymous
6/17/2025, 3:57:18 AM No.105616768
>>105616734
Long time no see
Replies: >>105616867
Hi all, Drummer here...
6/17/2025, 4:01:38 AM No.105616800
>>105616734
Could you test...

Cydonia-24B-v3i
Cydonia-24B-v3j

They're both v3.1 candidates.

I'm also curious about...

Cydonia-24B-v3f and Cydonia-24B-v3g but more for research purposes.

Big fan of your work!
Replies: >>105616867
Anonymous
6/17/2025, 4:12:42 AM No.105616867
>>105616768
Yee

>>105616800
I'll be honest, I don't feel like spending my kind of slow internet downloading all that. You could just copy and paste the prompts into mikupad and get the outputs yourself pretty easily. If you simply just want all the outputs archived in one place, I do take contributions and will add them if you give them (and of course I will read/rate them).
Anonymous
6/17/2025, 4:55:17 AM No.105617143
>>105616734
>235B was given a star rating
"235b is bad" bros how are we coping with being empirically proven wrong
Replies: >>105619640
Anonymous
6/17/2025, 4:55:49 AM No.105617146
Base Image
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
https://arxiv.org/abs/2506.13284
>In this work, we investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. We begin by curating the SFT training data through two scaling strategies: increasing the number of collected prompts and the number of generated responses per prompt. Both approaches yield notable improvements in reasoning performance, with scaling the number of prompts resulting in more substantial gains. We then explore the following questions regarding the synergy between SFT and RL: (i) Does a stronger SFT model consistently lead to better final performance after large-scale RL training? (ii) How can we determine an appropriate sampling temperature during RL training to effectively balance exploration and exploitation for a given SFT initialization? Our findings suggest that (i) holds true, provided effective RL training is conducted, particularly when the sampling temperature is carefully chosen to maintain the temperature-adjusted entropy around 0.3, a setting that strikes a good balance between exploration and exploitation. Notably, the performance gap between initial SFT models narrows significantly throughout the RL process. Leveraging a strong SFT foundation and insights into the synergistic interplay between SFT and RL, our AceReason-Nemotron-1.1 7B model significantly outperforms AceReason-Nemotron-1.0 and achieves new state-of-the-art performance among Qwen2.5-7B-based reasoning models on challenging math and code benchmarks, thereby demonstrating the effectiveness of our post-training recipe.
https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
Isn't posted yet. pretty interesting
Anonymous
6/17/2025, 4:59:10 AM No.105617169
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
https://arxiv.org/abs/2506.13585
Not sure if they posted a paper when they released their model but the arxiv version is up now
The Amazon Nova Family of Models: Technical Report and Model Card
https://arxiv.org/abs/2506.12103
paper from amazon. doesn't seem like they're open sourcing anything so w/e
Anonymous
6/17/2025, 5:06:46 AM No.105617224
Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models
https://arxiv.org/abs/2506.13681
>Sampling from language models impacts the quality and diversity of outputs, affecting both research and real-world applications. Recently, Nguyen et al. 2024's "Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs" introduced a new sampler called min-p, claiming it achieves superior quality and diversity over established samplers such as basic, top-k, and top-p sampling. The significance of these claims was underscored by the paper's recognition as the 18th highest-scoring submission to ICLR 2025 and selection for an Oral presentation. This paper conducts a comprehensive re-examination of the evidence supporting min-p and reaches different conclusions from the original paper's four lines of evidence. First, the original paper's human evaluations omitted data, conducted statistical tests incorrectly, and described qualitative feedback inaccurately; our reanalysis demonstrates min-p did not outperform baselines in quality, diversity, or a trade-off between quality and diversity; in response to our findings, the authors of the original paper conducted a new human evaluation using a different implementation, task, and rubric that nevertheless provides further evidence min-p does not improve over baselines. Second, comprehensively sweeping the original paper's NLP benchmarks reveals min-p does not surpass baselines when controlling for the number of hyperparameters. Third, the original paper's LLM-as-a-Judge evaluations lack methodological clarity and appear inconsistently reported. Fourth, community adoption claims (49k GitHub repositories, 1.1M GitHub stars) were found to be unsubstantiated, leading to their removal; the revised adoption claim remains misleading. We conclude that evidence presented in the original paper fails to support claims that min-p improves quality, diversity, or a trade-off between quality and diversity.
RIP minp
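For anyone who hasn't looked at what's actually being argued over, min-p is about ten lines. A from-scratch sketch (not the paper's reference implementation):

```python
import math

def min_p_filter(logits, p_base=0.1):
    """Zero out tokens whose probability is below p_base * max prob.

    The cutoff scales with the model's confidence: a peaky
    distribution prunes almost everything, a flat one keeps
    more candidates. That's the whole trick being debated.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = p_base * max(probs)
    return [p if p >= threshold else 0.0 for p in probs]

filtered = min_p_filter([2.0, 1.0, -1.0], p_base=0.5)
```

With p_base=0.5 on that toy distribution only the top token survives; renormalize and sample from what's left.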
Replies: >>105617522
Anonymous
6/17/2025, 5:15:00 AM No.105617281
Base Image
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
https://arxiv.org/abs/2506.12040
>Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to ±1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask management, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages adaptive weight transformation and binary pattern clustering to overcome these limitations, delivering both superior accuracy and efficiency. Our approach incorporates two key innovations: (1) a Learnable Transformation that optimizes invertible scaling and rotation matrices to align binarized weights with full-precision distributions, enabling incoherence processing to enhance layer-wise representation quality; (2) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters, compressing them into compact indices with tailored distance metrics and sign-based centroid updates. This eliminates the need for sparse masks, enabling efficient inference on standard hardware.
https://github.com/Chooovy/BTC-LLM
bpw below bitnet...
Anonymous
6/17/2025, 5:59:26 AM No.105617522
>>105617224
>After we showed these results to the authors, they informed us that we had run our experiments using the “Llama" formatting of GSM8K prompts as we used the command from the authors’ public Colab notebook; the authors clarified that "Llama" formatting should be used only for Llama models. We then reran our experiments using standard formatting of GSM8K prompts. The results were nearly identical (Appendix B), with one small difference: min-p does produce higher scores for 2 of 12 language models. Again, we conclude min-p does not outperform other samplers on either formatting of GSM8K when controlling for hyperparameter volume.
Why would you want to publish your ignorance of chat templates and wasting of ~3000 Nvidia A100-hours of compute as a result? Instead of engendering confidence in your findings, this just makes you come across as petty and seething.
Anonymous
6/17/2025, 6:08:05 AM No.105617582
>>105616734
All those new 24b mistral slops and not a gem among them.
Anonymous
6/17/2025, 6:11:57 AM No.105617604
will they ever release the multimodal qwen 3
Replies: >>105617608
Anonymous
6/17/2025, 6:12:28 AM No.105617608
>>105617604
don't worry, qwen 4 will be omni and smart
Anonymous
6/17/2025, 6:14:31 AM No.105617625
>>105616734
cockbench?
Anonymous
6/17/2025, 6:15:30 AM No.105617634
>>105612859
hrt.coffee
Anonymous
6/17/2025, 6:16:27 AM No.105617637
so what's the general consensus here on doing RP with reasoning? It's shit, right? It kinda improves creativity but not by much
Replies: >>105617645 >>105617684 >>105617864 >>105618521
Anonymous
6/17/2025, 6:18:08 AM No.105617645
>>105617637
It depends. I've found that if you do a lot of complex math, logic, and programming in your RPs you'll notice a massive difference.
Hi all, Drummer here...
6/17/2025, 6:23:25 AM No.105617684
>>105617637
Prefilling the think block to have reasoning act like a director/storywriter seems to help. I've only tried it on the new R1 though.
Anonymous
6/17/2025, 6:32:22 AM No.105617731
Can I have a general mesugaki card sample?
Replies: >>105620373
Anonymous
6/17/2025, 6:35:44 AM No.105617753
__kaai_yuki_and_hiyama_kiyoteru_vocaloid_drawn_by_naoto_sakurai__1827a0f038230efcfe7748f02f9382be
>the purpose of benchmaxxing on math is to improve the quality of RPs where anon is a grade-school math teacher
Anonymous
6/17/2025, 6:50:22 AM No.105617864
>>105617637
It definitely makes it worse for magistral, it actually makes it less likely to follow the sys prompt
Anonymous
6/17/2025, 7:43:47 AM No.105618208
Can you redpill me on WizardLM and Miqu? They seem quite large models, did anyone actually use them at larger quants?
Replies: >>105618276 >>105618366 >>105618436 >>105618500
Anonymous
6/17/2025, 7:56:20 AM No.105618276
>>105618208
Buy an ad.
Replies: >>105618364
Anonymous
6/17/2025, 8:05:17 AM No.105618329
Given how close we are to AGI, is it safe to say that Europe has zero chance of entering the running? What are the odds there's some dark horse AI lab that has been building in secret on the continent?
Replies: >>105618428 >>105619257 >>105619427
Anonymous
6/17/2025, 8:12:08 AM No.105618364
>>105618276
Yeah cause those are the current hot thing.
Anonymous
6/17/2025, 8:12:55 AM No.105618366
>>105618208
I see that you woke up from a year-long coma. Qwen 3 235b replaces WizardLM directly, and if you got 128gb ram + 16/24gb vram, dynamic R1 replaces that https://unsloth.ai/blog/deepseekr1-dynamic
Replies: >>105618665
Anonymous
6/17/2025, 8:23:11 AM No.105618426
Anyone knows what multi modal model used on smash-or-pass-ai site? Can abliterated Gemma-3 do this?
Replies: >>105618490
Anonymous
6/17/2025, 8:23:28 AM No.105618428
>>105618329
AGI will start with mistral-nemo 2
Anonymous
6/17/2025, 8:24:05 AM No.105618436
>>105618208
No, they didn't. People would usually try to fit as much as they could into a single 3090 because that's what everyone was using (and probably what most people still are using)
Replies: >>105618481
Anonymous
6/17/2025, 8:32:55 AM No.105618481
>>105618436
>probably what most people still are using
I doubt that
Replies: >>105618834
Anonymous
6/17/2025, 8:34:02 AM No.105618490
>>105618426
I think Gemini 2.5 Flash probably. If you click the websim button you can edit it.
Replies: >>105618562
Anonymous
6/17/2025, 8:35:55 AM No.105618500
>>105618208
they are more sovl compared to what we have now but they are also noticeably more retarded
Anonymous
6/17/2025, 8:39:12 AM No.105618521
>>105617637
Reasoning feels like the right place for the model to plan ahead and maintain state, but you'd have to keep at least 2 reasoning traces in context, which is different from how they've been trained (mostly single-turn math questions). And Gemma 3 works better with fake reasoning for RP than Magistral Small (natively trained on it) does.

Another problem of reasoning is that it dilutes attention to the instructions, so ideally you'd want to keep instructions high in the context, but again models aren't trained for it, so it often gives issues. Ironically, models not trained with system instructions in mind (just to follow whatever user says) may work better for that.

On the other hand, I find that reasoning tends to decrease repetitive patterns in the actual responses. It's just that Mistral Small and by extension Magistral suck for these uses and they're only good for saying "fuck", "cock" and "pussy". If you're OK with just that...
Anonymous
6/17/2025, 8:45:16 AM No.105618562
>>105618490
>Gemini 2.5
Isn't it censored?
Anonymous
6/17/2025, 8:59:19 AM No.105618654
server cpu cucks will lose their time in the spotlight
MoEs will not scale
Replies: >>105618688
Anonymous
6/17/2025, 9:01:39 AM No.105618665
>>105618366
>Qwen 3 235b replaces WizardLM directly
Smaller Qwen 3's are censored quite badly, is this the same?
Replies: >>105618678
Anonymous
6/17/2025, 9:03:17 AM No.105618678
>>105618665
Idk your use case but in my experience barring very few insanely slopped exceptions, no model is really censored given a good system prompt, especially 100b+ models.
Anonymous
6/17/2025, 9:04:26 AM No.105618688
>>105618654
>MoEs will not scale
Titans&co are mixture of attention experts.
Replies: >>105618840
Anonymous
6/17/2025, 9:33:42 AM No.105618834
>>105618481
I'm not running ~70b models but I'm still using a single 3090
A 5090 would be about 5x the price I paid for the 3090 and not much more useful
Anonymous
6/17/2025, 9:34:58 AM No.105618840
>>105618688
>moae
it doesn't even sound cool!
Anonymous
6/17/2025, 9:37:46 AM No.105618863
chroma-unlocked-v37-Q8_0.gguf
LLMbros... we got too cocky while image and videogenbros are eating good... I don't think anything short of actually multimodal R3 will save us...
Replies: >>105618877
Anonymous
6/17/2025, 9:39:59 AM No.105618877
>>105618863
pic unrelated?
Replies: >>105619001
Anonymous
6/17/2025, 10:00:31 AM No.105619001
>>105618877
No
Anonymous
6/17/2025, 10:09:59 AM No.105619044
What is the raison d'etre for Q4_0 quants if everyone agrees that Q4_K is always better? I've been hearing this argument for years now
Replies: >>105619183 >>105619192 >>105619265 >>105619471
Anonymous
6/17/2025, 10:39:01 AM No.105619181
>>105614993
No one can make shit. All hopes ride on Sam
Anonymous
6/17/2025, 10:39:05 AM No.105619183
>>105619044
og nigga
Anonymous
6/17/2025, 10:40:20 AM No.105619192
quantcompgraph
>>105619044
Nothing, it's a legacy format. iq4_xs is both smaller and better. q4_k_s/m are very slightly bigger and much better.
Replies: >>105619228 >>105619265
Anonymous
6/17/2025, 10:44:50 AM No.105619228
>>105619192
nta but k_s?
I've been grabbing k_m like a monkey all this time
Replies: >>105619265
Anonymous
6/17/2025, 10:49:45 AM No.105619257
>>105618329
>dark horse AI lab
Kek, you have no idea.
Anonymous
6/17/2025, 10:51:13 AM No.105619265
>>105619044
>>105619192
>>105619228
qat only works with q4_0, and only for models that actually have it (gemma)
Replies: >>105619472
Anonymous
6/17/2025, 11:00:30 AM No.105619313
has anyone tried any of the new ocr models such as MonkeyOCR and Nanonets-OCR-s?

looking to convert research papers in pdf to markdown or txt
docling is letting me down in accuracy and some other issues
Replies: >>105619435
Anonymous
6/17/2025, 11:18:51 AM No.105619427
>>105618329
>Given how close we are to AGI
We're not.
Replies: >>105619439 >>105619443
Anonymous
6/17/2025, 11:20:24 AM No.105619435
>>105619313
Have you tried simply extracting the text directly from the PDF?
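Before reaching for a VLM, plain extraction is worth a shot, e.g. with poppler's pdftotext (filenames are placeholders):

```shell
# -layout tries to preserve column structure in multi-column papers
pdftotext -layout paper.pdf paper.txt
```

Formulas will still come out mangled, but for body text it's faster and more reliable than any OCR pass.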
Replies: >>105619493
Anonymous
6/17/2025, 11:21:35 AM No.105619439
>>105619427
We're
Anonymous
6/17/2025, 11:22:21 AM No.105619443
>>105619427
Of course, your job is safe don't worry. But hypothetically if we were... does Europe have a shot?
Replies: >>105619485
llama.cpp CUDA dev !!yhbFjk57TDr
6/17/2025, 11:25:39 AM No.105619471
>>105619044
q4_0 is faster than q4_K_M due to the simpler data structure.
For development purposes in particular it's also the quant that I use because I don't care about maximizing quality/VRAM in that scenario but I do care about speed and simplicity to make my measurements easier.
I never use q4_0 outside of testing though.
Anonymous
6/17/2025, 11:25:48 AM No.105619472
>>105619265
qat is not nearly as good as google claims it to be.
Anonymous
6/17/2025, 11:27:41 AM No.105619485
>>105619443
AGI will not be achieved by scaling up our current architectures.
It requires a fundamental breakthrough which could come from anywhere, including Europe.
Replies: >>105619525
Anonymous
6/17/2025, 11:28:33 AM No.105619493
>>105619435
it didn't work that well with the multiple columns and formulas, but I'll give it another try, thanks.
Anonymous
6/17/2025, 11:33:24 AM No.105619525
>>105619485
>AGI will not be achieved by scaling up our current architectures.
Source: your ass
>breakthrough which could come from anywhere, including Europe
It could also come from a pajeet 5 year old. Will it? No. You actually need a ML industry and companies for that.
Replies: >>105619590
Anonymous
6/17/2025, 11:40:57 AM No.105619566
Gemma 3 seems obsessed with wrapping her legs around your waist, no matter her position.
Replies: >>105619662
Anonymous
6/17/2025, 11:44:38 AM No.105619590
1730378102308121
>>105619525
*AHEM*
Replies: >>105619596 >>105619607
Anonymous
6/17/2025, 11:46:01 AM No.105619596
>>105619590
They have had nothing, absolutely nothing, noteworthy since Miqu and Mixtral.
Anonymous
6/17/2025, 11:46:59 AM No.105619607
>>105619590
>no good model since 2407, 9 months ago
>rekt by r1 like everyone else, except they dont have nearly as much money to recover, and didn't
Replies: >>105619612
Anonymous
6/17/2025, 11:47:58 AM No.105619612
1733193251086829
>>105619607
>9 months ago
jesus christ
Anonymous
6/17/2025, 11:51:45 AM No.105619630
file
https://huggingface.co/Menlo/Jan-nano
https://huggingface.co/Menlo/Jan-nano-gguf
Replies: >>105620152 >>105620434
Anonymous
6/17/2025, 11:53:04 AM No.105619635
>still no local llm with native image gen
Replies: >>105619642
Anonymous
6/17/2025, 11:54:15 AM No.105619640
>>105617143
It's only good at high quant.
Anonymous
6/17/2025, 11:54:25 AM No.105619642
>>105619635
too unsafe, please understand
Anonymous
6/17/2025, 11:57:54 AM No.105619662
>>105619566
For me she loves tangling her fingers in my hair and arching her back, no matter what the context.
Replies: >>105620164
Anonymous
6/17/2025, 12:03:59 PM No.105619694
I tried dots q4. It feels like a very smart 30b model that still makes a brainfart or two like a 30b. So it is pretty useless. It is like a moe grok1.
Replies: >>105619710 >>105619724 >>105619725 >>105620779
Anonymous
6/17/2025, 12:04:20 PM No.105619697
>>105612316
>>105611862
SWE-bench tests Python only. Perfect if you need something marginally better at writing a glue script I guess.
Anonymous
6/17/2025, 12:06:56 PM No.105619710
>>105619694
prompt issue
Replies: >>105619717 >>105619719
Anonymous
6/17/2025, 12:08:17 PM No.105619717
>>105619710
model issue
Replies: >>105619722
Anonymous
6/17/2025, 12:08:39 PM No.105619719
>>105619710
The sperm that made you had a prompt issue.
Anonymous
6/17/2025, 12:09:06 PM No.105619722
>>105619717
works for me, I'm having a blast
Anonymous
6/17/2025, 12:09:25 PM No.105619724
>>105619694
>It is like a moe grok1.
but grok1 is already a moe?
Anonymous
6/17/2025, 12:09:38 PM No.105619725
>>105619694
grok 1 was a moe
Replies: >>105619734
Anonymous
6/17/2025, 12:11:14 PM No.105619734
>>105619725
YOU are moe
Replies: >>105619741
Anonymous
6/17/2025, 12:11:58 PM No.105619741
>>105619734
nuh uh im dense
Anonymous
6/17/2025, 12:12:54 PM No.105619751
long shot but if anyone else saved migus (or friends) down feel free to reupload. most of my edited data got wiped.

>>105616569
what
Replies: >>105619796 >>105619812
Anonymous
6/17/2025, 12:21:11 PM No.105619796
>>105619751
Stop with offtopic spamming. Nobody cares about your journey to become a woman.
Anonymous
6/17/2025, 12:23:00 PM No.105619812
>>105619751
>my edited data got wiped
Good!
Anonymous
6/17/2025, 12:24:43 PM No.105619828
why can't you faggots just get a miku thread going and migrate? you can spam there as much as you want
Replies: >>105619849
Anonymous
6/17/2025, 12:27:41 PM No.105619849
>>105619828
They don't like /a/ for some reason.
Anonymous
6/17/2025, 12:27:58 PM No.105619851
>samefagging this hard
Replies: >>105619864
Anonymous
6/17/2025, 12:30:02 PM No.105619864
>>105619851
Stop spamming troon
Anonymous
6/17/2025, 12:31:40 PM No.105619870
my thinking boxes in st are inconsistent and sometimes the reply ends up inside both the thinking and the reply blocks
Anonymous
6/17/2025, 12:37:23 PM No.105619906
well duh_thumb.jpg
>>105615360
>>712856332
added some random loop animations
single file:
https://multiup.io/download/1f7bdf2cee24911d0cef25316887bba0/migu%20animations.zip
Replies: >>105619933
Anonymous
6/17/2025, 12:43:33 PM No.105619933
>>105619906
Nobody loves you and you should kill yourself
Replies: >>105619951
Anonymous
6/17/2025, 12:48:15 PM No.105619951
>>105619933
A few loony troons love him though
Replies: >>105620049
Anonymous
6/17/2025, 12:52:16 PM No.105619971
Mikugaki Anon makes the thread more comfy, muh troons posters are just annoying.
Replies: >>105619977 >>105619997 >>105620008
Anonymous
6/17/2025, 12:53:33 PM No.105619977
>>105619971
True story
Anonymous
6/17/2025, 12:53:35 PM No.105619978
Masculine urge to post pictures of vocaloids.
Anonymous
6/17/2025, 12:56:10 PM No.105619994
>vocaloids
>masculine
The irony ironing
Replies: >>105620063
Anonymous
6/17/2025, 12:56:32 PM No.105619997
>>105619971
What causes a person to think about and see "troons" everywhere?
Anonymous
6/17/2025, 12:59:08 PM No.105620008
>>105619971
This is not your safespace.
Replies: >>105620013 >>105620183
Anonymous
6/17/2025, 1:00:02 PM No.105620013
>>105620008
It's not yours either, feel free to fuck off.
Replies: >>105620026
Anonymous
6/17/2025, 1:01:52 PM No.105620026
>>105620013
I am not the one shitting up thread with anime ai generated slop though
Replies: >>105620047
Anonymous
6/17/2025, 1:05:17 PM No.105620045
I think there's a compromise to be made here - miku can be posted, but she must advocate for total troon death.
Replies: >>105620155
Anonymous
6/17/2025, 1:05:45 PM No.105620047
>>105620026
>go to ai thread
>see ai generated content
>get upset
zoomer logic everyone
Replies: >>105620103
Anonymous
6/17/2025, 1:06:21 PM No.105620049
>>105619951
i love xim xes my xusband
Anonymous
6/17/2025, 1:08:29 PM No.105620063
>>105619994
>masculine urge to fuck women

>women
>masculine

This is your logic.
Anonymous
6/17/2025, 1:16:15 PM No.105620103
>>105620047
No that works with /sdg/ or /ldg/ only.
Replies: >>105620144
Anonymous
6/17/2025, 1:22:40 PM No.105620144
>>105620103
You would still be sperging out even if they were made with anole or bagel.
Anonymous
6/17/2025, 1:23:56 PM No.105620152
>>105619630
Q4 might be fine to load in browser directly and replace google
Anonymous
6/17/2025, 1:25:07 PM No.105620155
>>105620045
Why would she go against her only fanbase?
Anonymous
6/17/2025, 1:26:06 PM No.105620164
>>105619662
Yeah, those too. I get that sex ultimately is repetitive, but it feels as if Gemma 3 only knows a couple of ways of describing it non-explicitly. There are other areas too where it isn't very creative, and after a while you'll notice it always uses the same patterns.
Anonymous
6/17/2025, 1:29:59 PM No.105620183
>>105620008
Ywnbaw
Anonymous
6/17/2025, 1:31:32 PM No.105620193
Today has been one of the worst blackpills for me regarding local models in a long time.

The strongest open source vision model is worse than o4-mini.

Try this experiment:

Upload this image https://www.lifewire.com/thmb/7GETJem9McVRDI8kbzLM6TfwED0=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/windows-11-screenshots-615d31976db445bb9f725b510becd850.png

With this prompt

You are an assistant tasked with finding a bounding box for the indicated element.
In your response you should return a string in the following format:
[BOUNDING_BOX x0 y0 x1 y1]
x0, y0, x1, y1 being decimal coordinates normalized from 0 to 1 starting at the top left corner of the screen.
The indicated element you are required to find is the following:
Start Menu button

The only model that gives a half right response is Gemma 3 27B. All the other open source models give wrong answers. ALL of the proprietary models give better answers than the best open source model.

Now you might think this is a weird thing to ask the model. But this is exactly the kind of tasks that are required for an assistant to control a computer and perform tasks autonomously.
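If anyone wants to script this, parsing the reply and mapping it back to pixels is the easy part. A sketch, assuming the model actually emits the tag in the format the prompt asks for:

```python
import re

def parse_bbox(response: str, img_w: int, img_h: int):
    """Extract the [BOUNDING_BOX x0 y0 x1 y1] tag from a model response
    and convert the normalized 0..1 coords to pixel coords.
    Returns None if no tag is found."""
    m = re.search(
        r"\[BOUNDING_BOX\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\]",
        response)
    if m is None:
        return None
    x0, y0, x1, y1 = (float(g) for g in m.groups())
    return (round(x0 * img_w), round(y0 * img_h),
            round(x1 * img_w), round(y1 * img_h))
```

The hard part is, as said, getting a local model to put sane numbers in the tag in the first place.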
Replies: >>105620203 >>105620226 >>105620425 >>105620650
Anonymous
6/17/2025, 1:32:38 PM No.105620203
>>105620193
This is about as retarded as expecting them to count letters and do calculations.
Replies: >>105620227
Anonymous
6/17/2025, 1:36:12 PM No.105620226
>>105620193
Gemma 3 only uses a 400M-parameter vision model, and with current implementations it encodes every image as 256 tokens at 896x896 resolution. It's a miracle it performs like it does. Imagine if it had at least twice the size.
Replies: >>105620247 >>105621260
Anonymous
6/17/2025, 1:36:17 PM No.105620227
>>105620203
Then what workflow do you suggest to let a model control a computer?
And if it's so retarded then how come the tiniest OpenAI or Gemini models can give a reasonable answer but Qwen 3 235B can't? Your response sounds like cope to me.
Replies: >>105620299
Anonymous
6/17/2025, 1:39:14 PM No.105620247
>>105620226
Yeah but the problem is that I haven't found any open source models that work. Llama 3.2 90B and the Qwen model I mentioned above give worse responses than Gemma.
The fact that Gemma 27B works given the tiny size tells me Google has used some of the same training data or methods they used for Gemini and that's why it kinda works.
Anonymous
6/17/2025, 1:43:40 PM No.105620270
Gtm44Jia0AcwMyM
Replies: >>105620315
Anonymous
6/17/2025, 1:49:04 PM No.105620299
>>105620227
>tiniest OpenAI or Gemini models
How do you know the number of parameters these closed models have?
For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Replies: >>105620369
Anonymous
6/17/2025, 1:51:26 PM No.105620315
>>105620270
beeg meeg
Anonymous
6/17/2025, 1:52:55 PM No.105620326
no-qwen-above-30b
Qwen won't make dense models larger than 30B anymore.

https://x.com/JustinLin610/status/1934809653004939705
> For dense models larger than 30B, it is a bit hard to optimize effectiveness and efficiency (either training or inference). We prefer to use MoE for large models.
Replies: >>105620345 >>105620350
Anonymous
6/17/2025, 1:53:52 PM No.105620333
https://www.youtube.com/watch?v=_0rftbXPJLI
Replies: >>105620361
Anonymous
6/17/2025, 1:56:16 PM No.105620345
>>105620326
Local is evolving
Anonymous
6/17/2025, 1:56:36 PM No.105620350
>>105620326
The future is <=32B Dense and >150B MoE. And I'm all for it, just buy RAM.
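Back-of-envelope for why MoE on RAM is tolerable: decode is bandwidth-bound, so tokens/s tops out around bandwidth divided by the bytes of active weights streamed per token. Quick sketch; all the concrete numbers below are made up for illustration:

```python
def est_tokens_per_sec(active_params_b: float, bytes_per_weight: float,
                       bandwidth_gbs: float) -> float:
    """Rough upper bound on decode speed: every generated token has to
    stream all *active* weights from memory once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return bandwidth_gbs * 1e9 / bytes_per_token

# hypothetical ~20B-active MoE at ~4.5 bits/weight (~0.56 bytes):
# dual-channel DDR5 (~90 GB/s) vs the same weights in VRAM (~1000 GB/s)
print(est_tokens_per_sec(20, 0.56, 90))    # ~8 t/s, usable
print(est_tokens_per_sec(20, 0.56, 1000))  # ~89 t/s
```

A dense model of the same total size would stream every weight per token, which is why >150B dense on RAM is hopeless but MoE isn't.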
Replies: >>105620407
Anonymous
6/17/2025, 1:57:46 PM No.105620361
>>105620333
Buy an ad!
Anonymous
6/17/2025, 1:59:03 PM No.105620366
migu_general
Anonymous
6/17/2025, 1:59:33 PM No.105620369
>>105620299
>How do you know the number of parameters these closed models have?
I don't, but what difference does it make? If they have better quality results because they have bigger models they still have better quality results.
>For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Again, does it matter? Do local users have any kind of image segmentation model that can detect GUI elements? No, at least not that I know of. I tried the most popular phrase grounding model (owlv2) and it basically knows nothing about GUI elements.
If you wanna have a go at it here's a list of models
https://huggingface.co/models?pipeline_tag=zero-shot-object-detection&sort=trending
Anonymous
6/17/2025, 2:00:04 PM No.105620373
>>105617731
Yuma.
Anonymous
6/17/2025, 2:05:50 PM No.105620407
>>105620350
RAM is slow.
Replies: >>105620414 >>105620419
Anonymous
6/17/2025, 2:06:33 PM No.105620414
>>105620407
but not for MoE!
Replies: >>105620422
Anonymous
6/17/2025, 2:07:30 PM No.105620419
>>105620407
Just buy fast RAM.
Replies: >>105620648
Anonymous
6/17/2025, 2:07:37 PM No.105620422
>>105620414
It is compared to a dense model you can fit in VRAM.
Anonymous
6/17/2025, 2:07:53 PM No.105620425
>>105620193
This is a trap, data-mining post by closed AI
Tell sam atman to kill himself.
Replies: >>105620437
Anonymous
6/17/2025, 2:09:10 PM No.105620434
>>105619630
>MCP Server?
>No Problem, set up SERPER API!
>Just place your SERPER API KEY IN JAN! 1000 Pages for only 0.30$!!
Are they fucking retarded? I can just use grok for free and it probably works better too. Soon probably Gemini too, according to rumors.
I want something local. Why don't they show me how to set that up with duckduckgo or brave or whatever.
Replies: >>105620457 >>105620517
Anonymous
6/17/2025, 2:09:30 PM No.105620437
>>105620425
No it's not
Anonymous
6/17/2025, 2:11:15 PM No.105620457
>>105620434
Download a gguf model locally, load it up on your gpu....to call a $$ api for the results.
They didn't think that through. How stupid.
Anonymous
6/17/2025, 2:18:08 PM No.105620495
Asking for a friend, how do I go from no model to having a model which acts just like a girl i used to know?
Replies: >>105620519 >>105620918
Anonymous
6/17/2025, 2:22:00 PM No.105620517
>>105620434
>It's a chrome app
>2GB installed
>1GB updater
>All in my C: which has no more free space left
I'm so angry
Anonymous
6/17/2025, 2:22:08 PM No.105620519
>>105620495
Go to koboldcpp's github, find the wiki tab, and read the quickstart.
Then download mistral nemo instruct gguf from huggingface and silly tavern and make a card of that person.
Anonymous
6/17/2025, 2:45:26 PM No.105620648
>>105620419
If only you could easily buy more channels...
Replies: >>105620666
Anonymous
6/17/2025, 2:45:38 PM No.105620650
>>105620193
Bro you could train a tiny yolo model on whatever GUI elements you want and strap that to your LLM. Go be retarded elsewhere
Replies: >>105620738
Anonymous
6/17/2025, 2:49:39 PM No.105620666
>>105620648
DDR6 and CAMM2 are coming Soon™ to save the day
Replies: >>105620674
Anonymous
6/17/2025, 2:51:35 PM No.105620674
>>105620666
Here's hoping it comes alongside CPUs with even wider busses too.
Anonymous
6/17/2025, 3:01:40 PM No.105620738
>>105620650
We're talking about GUI elements here. Training an object detection model on specific GUI elements would take a lot of effort for little reward. If I were to take screenshots of all the elements I want to click on I could just do image similarity matching.
What I want to do is describe the element I want to match in natural language and have the model find it, without giving it any visual examples (other than the one it has to find in the image).
You think I'm retarded? Why? You think the idea is retarded?
You think Manus and Operator are retarded?
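For what it's worth, the image-similarity fallback really is simple if you have reference crops. A brute-force SSD matcher in plain numpy (toy sketch: grayscale only, exact-scale only, and far too slow for full screenshots):

```python
import numpy as np

def match_template(screen: np.ndarray, template: np.ndarray):
    """Return (x, y) of the top-left corner where `template` best matches
    `screen`, by brute-force sum of squared differences.
    Both inputs are 2D float arrays (grayscale)."""
    sh, sw = screen.shape
    th, tw = template.shape
    best, best_xy = None, None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            ssd = np.sum((screen[y:y + th, x:x + tw] - template) ** 2)
            if best is None or ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy
```

Which is exactly why it's not interesting: it needs a pixel-level example per element instead of a sentence.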
Replies: >>105620763 >>105620807
Anonymous
6/17/2025, 3:03:49 PM No.105620763
>>105620738
https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B
Anonymous
6/17/2025, 3:05:52 PM No.105620779
>>105619694
It felt worse than the current 30B models, though.
Anonymous
6/17/2025, 3:09:28 PM No.105620807
>>105620738
You want the bounding boxes of GUI elements, which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is. The reward would be having a model for your use case and stop bitching here. I'm training my own small models for my use cases all the time.
Replies: >>105620889 >>105621260
Anonymous
6/17/2025, 3:21:11 PM No.105620889
>>105620807
It's not really a segmentation task. Segmentation in the classic sense (like in the YOLO models you mentioned) means detecting pre-determined categories in the image. The term they use for detecting objects based on free form natural language is phrase grounding.
>which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is
I don't think using the tool that works better for the job is retarded. Do you? Or are you saying there's a tool that works better? If so, then I challenge you to show me a segmentation model that performs better at this task than ChatGPT or Gemini. Like I said above, I tried the most popular phrase grounding model and it doesn't know what a start menu is. When you ask it about GUI elements it will just highlight random icons in the image.
>I'm training my own small models for my use cases all the time.
If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Replies: >>105620944
Anonymous
6/17/2025, 3:24:50 PM No.105620918
>>105620495
>just like a girl i used to know?
Stop. Don't try to have LLMs simulate real people. Take your meds instead.
Anonymous
6/17/2025, 3:28:18 PM No.105620944
>>105620889
Multimodality is in its infancy and requires a lot more resources to pull off than simple LLM + tool calling, so why are you even surprised that only cloud models are able to do that?
The only solution now for local models is training a model for a specific task, which here is segmentation on GUI elements. Even if Google bothers to release a new model it won't be as good as their flagship, that much is a given.
>If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Model A is a cloud model and model B is running on my own computer, model B is superior by default.
Replies: >>105621016
Anonymous
6/17/2025, 3:37:36 PM No.105621016
>>105620944
>so why are you even surprised that only cloud models are able to do that?
Because I was under the impression that the largest open source models would BTFO the -mini and Flash commercial models in all tasks. I was hoping somebody would prove me wrong, but it seems that the vision capabilities in local models are just worse.
>Model A is a cloud model and model B is running on my own computer, model B is superior by default.
That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
Replies: >>105621036 >>105621039
Anonymous
6/17/2025, 3:40:09 PM No.105621036
>>105621016
>That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
For all work. There is no reason to ever give them free data, even if you've been successfully programmed to not care about them building a profile on you.
Replies: >>105621076
Anonymous
6/17/2025, 3:40:24 PM No.105621039
>>105621016
You can't automate that stuff without using your cloud model API and that's not free, it has nothing to do with nsfw
Replies: >>105621076
Anonymous
6/17/2025, 3:44:48 PM No.105621076
>>105621039
You mean free as in beer or free as in freedom? Because if I can spend a few dollars to save me an hour of work then I probably would.

>>105621036
It's not free if they're giving me something in return.
Replies: >>105621096
Anonymous
6/17/2025, 3:47:25 PM No.105621096
>>105621076
>free as in beer or free as in freedom
Both. You really want to give cloud models pics of your own computer? That's not even a question if you work for a company
Replies: >>105621319
Anonymous
6/17/2025, 4:11:55 PM No.105621260
>>105620226
do local inference backends not support some form of tiling to spread the image out across several tiles?

I've been using Gemini 2.5 for some video understanding tasks and I use ffmpeg to resize+pad the extracted frames so that they fit across two tiles. Works fairly well for a lot of tasks but the bounding boxes are kinda shit.
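The tiling itself is trivial; the missing piece in local backends is presumably the prompt/position handling around it. Naive pad-and-split sketch (896 chosen to match Gemma's input size, purely illustrative):

```python
import numpy as np

def tile_image(img: np.ndarray, tile: int = 896):
    """Zero-pad an HxWxC image up to a multiple of `tile`, then split it
    into tile x tile crops (the rough idea behind pan-and-scan tiling)."""
    h, w = img.shape[:2]
    ph = (tile - h % tile) % tile  # bottom padding
    pw = (tile - w % tile) % tile  # right padding
    padded = np.pad(img, ((0, ph), (0, pw), (0, 0)))
    return [padded[y:y + tile, x:x + tile]
            for y in range(0, padded.shape[0], tile)
            for x in range(0, padded.shape[1], tile)]
```

Each tile would then be encoded separately, which is where the extra prompt engineering comes in: the model has to be told how the tiles fit together.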

>>105620807
retard
Replies: >>105621294 >>105621405
Anonymous
6/17/2025, 4:13:56 PM No.105621282
New music model dropped:

https://github.com/tencent-ailab/SongGeneration

Someone please try this and report back.
Replies: >>105621313
Anonymous
6/17/2025, 4:15:08 PM No.105621294
>>105621260
>Do X, but it's shit for your use case
Retarded gorilla
Replies: >>105621343
Anonymous
6/17/2025, 4:16:38 PM No.105621307
>>105611492 (OP)
>install LM Studio on laptop with a 3060
>Not even 30% CUDA utilization with 100% offloading to vram
>Everything takes minutes to answer
Why might this be?
Replies: >>105621969
Anonymous
6/17/2025, 4:17:27 PM No.105621313
>>105621282
>SongGeneration-base(zh) v20250520
>SongGeneration-base(zh&en) Coming soon
>SongGeneration-full(zh&en) Coming soon
Chinese only for now
Replies: >>105621337
Anonymous
6/17/2025, 4:18:46 PM No.105621319
>>105621096
If George Hotz is not afraid of streaming his PC to the whole world, I don't see why I should be afraid of streaming my PC to Google.
As for the company I work for, nobody is monitoring me so closely that I'd get in trouble for leaking a few screenshots with bits of internal data here and there to a non-authorized API. I don't handle anything too sensitive.
Replies: >>105621342
Anonymous
6/17/2025, 4:20:32 PM No.105621337
>>105621313
It seems to support instrumental stuff at least.
I'm curious what the generation speed is for this, whether it's dreadful like YuE or a bit fast like AceStep (which is shit but fast)
Anonymous
6/17/2025, 4:21:14 PM No.105621342
>>105621319
>Defending cloud cuckery this hard
You're in the wrong general bro
Replies: >>105621386 >>105621399
Anonymous
6/17/2025, 4:21:18 PM No.105621343
>>105621294
Get some reading comprehension you ape. Everything apart from the bounding boxes works well. I can get time stamped occurrences of company logos even when they're blurred/upside down, it completely btfos older methods. Even the inaccurate bounding boxes are still in the general location and useful for human evaluation. It's super convenient to feed the audio in too.
Replies: >>105621399
Anonymous
6/17/2025, 4:28:19 PM No.105621386
>>105621342
I'm just being realistic. Acknowledging deficiencies is the first step towards improving.
I have a friend who told me he has had success with this using an open source model but doesn't remember which one he used, he's gonna send me some info when he gets back from work.
Anonymous
6/17/2025, 4:29:31 PM No.105621399
>>105621343
>>105621342
Anonymous
6/17/2025, 4:30:28 PM No.105621405
>>105621260
Pan&Scan (tiling) isn't implemented in llama.cpp and it didn't work last time I tried it in vLLM where apparently it's implemented (with an optional flag to enable it). From what I've read, it needs some form of prompt engineering; it's not as simple as just sending image tiles.
Replies: >>105621463
Anonymous
6/17/2025, 4:38:33 PM No.105621463
>>105621405
thanks for the info. That explains why Gemma performed so poorly for manga translation when I tried a while back, the image was probably resized to the point of the text being unreadable.
Anonymous
6/17/2025, 4:50:28 PM No.105621574
>>105621559
>>105621559
>>105621559
Anonymous
6/17/2025, 5:38:16 PM No.105621969
>>105621307
Low memory bandwidth can't saturate compute.