Thread 105611492

374 posts 78 images /g/
Anonymous No.105611492 >>105611887 >>105612859 >>105612968 >>105613219 >>105613273 >>105613788 >>105613888 >>105614932 >>105621307
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105601326 & >>105589841

►News
>(06/16) MiniMax-M1, hybrid-attention reasoning models released: https://github.com/MiniMax-AI/MiniMax-M1
>(06/15) llama-model : add dots.llm1 architecture support merged: https://github.com/ggml-org/llama.cpp/pull/14118
>(06/14) NuExtract-2.0 for structured information extraction: https://hf.co/collections/numind/nuextract-20-67c73c445106c12f2b1b6960
>(06/13) Jan-Nano: A 4B MCP-Optimized DeepResearch Model: https://hf.co/Menlo/Jan-nano

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105611494
►Recent Highlights from the Previous Thread: >>105601326

--Papers:
>105606869 >105606875
--Evaluation of dots.llm1 model performance and integration challenges in local inference pipelines:
>105601735 >105604736 >105604782 >105604857 >105604810 >105604838 >105605017 >105605319 >105605475 >105605551 >105605556 >105605609 >105605671 >105605701 >105605582 >105605670 >105605965
--llama-cli vs llama-server performance comparison showing speed differences and config inconsistencies:
>105601495 >105601540 >105601746 >105601830 >105601953 >105601967 >105602123 >105602170 >105602190 >105602380 >105601654
--Evaluating budget hardware options for local LLM deployment with portability and future model scaling in mind:
>105609676 >105609743 >105609808 >105609858 >105610000 >105610275 >105610095
--VideoPrism: A versatile video encoder achieving SOTA on 31 of 33 benchmarks:
>105610184
--Sugoi LLM 14B/32B released via Patreon with GGUF binaries and claimed benchmark leads:
>105606204 >105606305 >105606399 >105609562 >105609620
--Interleaving predictions from multiple LLMs via scripts or code modifications:
>105609453 >105609499 >105609500 >105609534
--Hailo-10H M.2 accelerator questioned for real-world AI application viability:
>105602205 >105602335
--Radeon Pro V620 GPU rejected due to driver issues and overheating in LLM use case:
>105603370 >105603394 >105603418 >105603454 >105603762 >105603893 >105604087
--Sycophantic tendencies in cloud models exposed through academic paper evaluation:
>105601903 >105602389 >105602410 >105602064 >105603398 >105603416
--MiniMax-M1, hybrid-attention reasoning models:
>105611241 >105611443
--Qwen3 models released in MLX format:
>105608806
--Miku (free space):
>105601934 >105603103 >105604354 >105604389 >105604736 >105605940 >105606009 >105606217 >105610016 >105610160 >105610284 >105610486 >105611108 >105611119

►Recent Highlight Posts from the Previous Thread: >>105601330

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105611523 >>105611563
>it's june 16, 2025 and there is STILL no minimax gguf
Anonymous No.105611524
Nothing ever happens.
Anonymous No.105611563 >>105611656 >>105612211
>>105611523
And there never will be. It uses lightning attention
https://github.com/ggml-org/llama.cpp/issues/11290
Anonymous No.105611583 >>105611602 >>105611680
>>105611471
here you go sar, they have a huggingface space
https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1
Anonymous No.105611602 >>105612438
>>105611583
>looking it up in my mind
Anonymous No.105611630
I see that Unsloth uploaded dots.llm1 quants within the last few hours. I've been waiting to try out this model. If I have 96GB VRAM, which is better: IQ4_XS, IQ4_NL, or UD-Q3_K_XL? These are the 3 that look like the largest size I can fit. Tbh I'm not even really sure what all these newer meme quants are or which is supposed to be best.
Anonymous No.105611651
does r1 pass the mesugaki test with the new version they released?
Anonymous No.105611656 >>105611692
>>105611563
>lightning attention
What's next? Bolt attention?
Anonymous No.105611662
totalen mikunigger death
Hi all, Drummer here... No.105611673 >>105611999
> Drummer's merge is already an improvement, yet retains most of Magistral's strengths.

>>105610116

Hey anon, which version did you use and what strengths are you referring to? Was reasoning good and useful?
Anonymous No.105611680
>>105611583
Here's a version with a bone thrown in.
Anonymous No.105611692
>>105611656
I'm holding out for thunder attention
Anonymous No.105611805 >>105611942 >>105611989
Anonymous No.105611838 >>105611985 >>105612989
F5 TTS has had a bit of an upgrade to inference speed recently, in case you haven't kept up with the updates. ~3 different perf updates:

>flash attention 2
>Empirically Pruned Step Sampling (lower number of steps for high quality output)
>Single transformation instead of a 2-step process (half the inference time required)
Anonymous No.105611862 >>105611874 >>105612298 >>105612316 >>105612335 >>105619697
https://huggingface.co/moonshotai/Kimi-Dev-72B
https://github.com/MoonshotAI/Kimi-Dev
>We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.
>Kimi-Dev-72B achieves 60.4% performance on SWE-bench Verified. It surpasses the runner-up, setting a new state-of-the-art result among open-source models.
>Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.
>Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.
Anonymous No.105611874
>>105611862
>code
>local
I sleep
Anonymous No.105611887 >>105611970 >>105612215
>>105611492 (OP)
Uh oh, the 24/7 seething fatoid disgusting ratoid troons transisters didn't like that one, huh?

You will
Never
Ever
Be a
Real
Woman

looooooooooool
Anonymous No.105611942
>>105611805
Yes (Aliens)
Anonymous No.105611970
>>105611887
Don't misunderstand. I don't project to be or even want to be a woman. That construct is entirely within your own ass, not even a snug fit, it's spacious.
Miku as a character or concept is irrelevant to me. I like her design and my perception of her is a convenient, often portable twintailed onahole. There is no wanting to be her, she is a sleeve for me to rub one out.
Hope that helps clarify. Goodness, you can't seem to kick the habit of malding.
Keep this up and you'll never get a kurisu.
Anonymous No.105611985 >>105612086
>>105611838
I've definitely seen a huge increase in speed for large chunks of text, with NFE=7:

>Declaration of Independence
>8188 characters
>Inference time: 77 seconds
>Output: 8 mins 49 seconds of audio

https://vocaroo.com/1oiTcWPgdj6i
Anonymous No.105611989 >>105612018
>>105611805
The second option is a trick, we all know she's not wearing any.
Anonymous No.105611999
>>105611673
Do you know how to properly fine tune MoE models?
Anonymous No.105612018
>>105611989
ah! caught!
Anonymous No.105612086
>>105611985
This is with a 2070 btw. So anyone with a better GPU can double/triple the inference speed.
Anonymous No.105612097
>Looking up (in my mind) some sources
Minimax-M1 knows Teto's birthday and that she's a UTAU. It would be a disappointment if it did not, given its size.
this single point of knowledge is irrefutable evidence that proves that the model is good. we'll be back.
Anonymous No.105612211
>>105611563
>never
but that was closed saying it could be revisited after refactoring, and they seemed to later do the refactor here:
https://github.com/ggml-org/llama.cpp/pull/12181
>Introduce llm_memory_i concept that will abstract different cache/memory mechanisms. For now we have only llama_kv_cache as a type of memory
and looks like work has picked up on other models with competing cache mechanisms (mamba etc.)
https://github.com/ggml-org/llama.cpp/pull/13979

now we just need someone with motivation, a vibe coding client, and good enough prompt engineering skills to revisit minimax and we're fucking IN
Anonymous No.105612215 >>105612227
>>105611887
Okay schizo
Anonymous No.105612227
>>105612215
>ACK
Anonymous No.105612268 >>105612298
https://huggingface.co/moonshotai/Kimi-Dev-72B
Anonymous No.105612285 >>105612413
>>105610392
don't you need to enable tool use or something like that? are most engines compatible?
Anonymous No.105612298 >>105612305 >>105612404
>>105612268

please search before posting
>>105611862
Anonymous No.105612305
>>105612298
nah go shove a janny dilator up your holes though ;)
Anonymous No.105612316 >>105612363 >>105612598 >>105619697
>>105611862
>qwen 2 finetune
*yawn*
Also, apologize for Devstral
Anonymous No.105612335
>>105611862
do they actually aim for the moon?
Anonymous No.105612363
>>105612316
most meaningless graph award
Anonymous No.105612404
>>105612298
ywnnaj
Anonymous No.105612413
>>105612285
"tool use" is just sending a json object of available tools in the context and executing whatever tool the model invoked in its reply. That's entirely up to the client making the requests to abstract away. I mostly use llama-server, but any engine that exposes an OpenAI-compatible API should work.
Anonymous No.105612438
>>105611602
deepseek (through web) said "making a mental note" to me recently; hadn't seen that before.
Anonymous No.105612527 >>105612561 >>105612568
>>105610000

unfortunately yes, between rooms

>>105610095

that's great, but double the price. Also, I understand a 16GB card can only load small models (but could be used for diffusion)
Anonymous No.105612561 >>105615772
>>105612527
Literally any pc case is portable "between rooms"
Anonymous No.105612568 >>105615772
>>105612527
>portable between rooms
for what motherfucking purpose?
Anonymous No.105612598
>>105612316
DID THE PARETO FRONT JUST DO WHAT I THINK IT DID?
Anonymous No.105612703 >>105612715 >>105612769
hello saaars
haven't been keeping up with lmgs since deepsneed released, what's the current meta?
Anonymous No.105612715 >>105612729
>>105612703
deepsneed or nemo.
Anonymous No.105612729
>>105612715
This.
sage No.105612769
>>105612703
deepsnemo
Anonymous No.105612859 >>105612884 >>105612888 >>105613005 >>105614385 >>105617634
>>105611492 (OP)
She's sexy. Can I look like that? Is there any tech for that?
Anonymous No.105612873 >>105612995 >>105613064 >>105613075 >>105613470 >>105613494 >>105613982
how are LLMs at femdom? beyond the cursory stuff like verbal degradation and humiliation, can they lean more into the power dynamic side and give you orders and encouragement, dictate what you eat, how you dress, more control yet still nurturing?
asking for a friend
sage No.105612884 >>105613005
>>105612859
yes.
Anonymous No.105612888 >>105613149
>>105612859
arch linus
Anonymous No.105612968 >>105613058 >>105613077
>>105611492 (OP)
>Looks at news
>Nothing but small models and research stuff that can't RP worth shit
Is it over?
Anonymous No.105612989 >>105613222 >>105613873
>>105611838
F5R-TTS is better
Anonymous No.105612995
>>105612873
no
Anonymous No.105613005 >>105613030
>>105612859
You need to reroll your char. See this: >>105612884
Anonymous No.105613030 >>105613139
>>105613005
The cooldown for rerolling again is kinda long and early game is ass.
Anonymous No.105613058 >>105613759
>>105612968
We got Magistral and dots last week.
Anonymous No.105613064 >>105613087
>>105612873
tell your friend he has mommy issues
Anonymous No.105613075
>>105612873
There is a trick to it. Tell it to roleplay as a wealthy sadistic werewolf millionaire that inexplicably fell in love with his 5/10 average unassuming secretary. Then use an agent to rewrite what it wrote and swap werewolf with dommy mommy of your choice.
sage No.105613077
>>105612968
2mw until deepsex V4
Anonymous No.105613087 >>105613110 >>105614204
>>105613064
high ground? here? are you actually serious
Anonymous No.105613110
>>105613087
Yes. It is over anakin. Take your mikutroons and walk into the lava yourself
Anonymous No.105613139 >>105613241
>>105613030
Yes, this is a problem. I want to look like her in 3-4 years.
Anonymous No.105613149
>>105612888
when regular linus isn't strong enough, take it to the arch linus.
Anonymous No.105613219
>>105611492 (OP)
sex with miku
Anonymous No.105613222
>>105612989
Thanks but hows the performance between the two?
Anonymous No.105613241
>>105613139
You will never be 2d, anon
Anonymous No.105613273
>>105611492 (OP)
alice.gguf when?
lmg activate the insider agent and leak it
we will finish her training with antisemitic propaganda and gpu bukkake
Anonymous No.105613312 >>105613348 >>105613349 >>105613389
/lit/fag here. I'm working on an "offline MUD" of sorts and need a writing buddy to ping pong ideas. Chatgpt is good enough but I'm interested in fine-tuning.
What model would you guys recommend for my setup?
>ryzen 7 5700G
>32 GB
>no GPU
Anonymous No.105613346 >>105613377 >>105613382 >>105613760
>do the mesugaki test with R1-0528
>explains it flawlessly, with examples
>fucking ends the message asking me if I want nh codes to illustrate that
>would've posted the log but anons over in aicg got banned for less than that
this is why the chinks are gonna win the ai race
Anonymous No.105613348 >>105613546
>>105613312
SmoLlm-0.15B
Anonymous No.105613349 >>105613546
>>105613312
>fine-tuning
Rent hardware. You're not doing anything with that.
Anonymous No.105613377 >>105613930
>>105613346
anon, the ai models are supposed to do the hallucinations, not you.
Anonymous No.105613382
>>105613346
>>fucking ends the message asking me if I want nh codes to illustrate that
No way.
Anonymous No.105613389 >>105613546
>>105613312
>fine-tuning
just tell the model at the start what you want and how to write
https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_S.gguf
https://huggingface.co/bartowski/google_gemma-3-27b-it-qat-GGUF/blob/main/google_gemma-3-27b-it-qat-Q4_0.gguf
Anonymous No.105613432
What coomodel is good these days for a 24GB vramlet?
Anonymous No.105613470 >>105613951
>>105612873
I've tried various loli dom/yuri s&m scenarios since GPT-3 came out in 2020.
Often it works poorly, needing hand holding. Closed stuff like OpenAI had too much positivity bias.
Open models like Llama were not smart enough and needed hand holding, thus ruining it; same for most small ones.
Closed stuff like Claude (Opus, sometimes Sonnet) would manage it somewhat.
(Open) Command-R managed a bit, but needed hand holding and was very schizo.
From open models, DeepSeek R1 manages it properly, but like most LLMs it will still by default jump on your dick or start sex too quickly; with careful prompting that explains the desired pacing in a few sentences it manages to ace this almost perfectly. It can go both fast and slow, and it leads the story by itself, keeping your immersion.
I'd say DeepSeek 3 (the first one) failed at it, but the update works. Both the new and old R1 work; the new one has a slower pace and the old one was more intense, but both are intense enough if prompted right.
Now maybe the model size is too much for most, but when you consider that closed stuff that did well like Opus 3 is dying ("deprecating") and OpenAI has some positivity bias that often ruins it, I'd say R1 is one of the very few that manage to do it right.
If you accept some degree of hand holding, smaller models like the 70b llama and some others managed partially, but considerably more poorly. I haven't seen anything in the 7-13b range manage.
I'm a bit interested in trying it with Magistral sometime, because I've noticed that R1 would sometimes make plans and I would pass some of those CoT plans back to it (selectively retaining think blocks), so that it can lead the story over many turns, which is a lot more fun than LLMs that forget what they were doing half a page ago or what they intended to do.
tl;dr: with careful prompting it works very well on some big models, mostly R1. DS3 sometimes works, but is gacha. everything else often needs hand holding.
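If anyone wants to replicate the think-block trick, a rough sketch (assumes R1-style <think>...</think> tags stored in the assistant turns; nothing here is any frontend's actual code):

import re

THINK = re.compile(r"<think>.*?</think>", re.DOTALL)

def trim_history(messages, keep_think_for=()):
    # strip <think> blocks from assistant turns, except the indices
    # whose plans we want the model to keep following
    keep = set(keep_think_for)
    out = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i not in keep:
            m = {**m, "content": THINK.sub("", m["content"]).strip()}
        out.append(m)
    return out

# e.g. keep only the most recent plan:
# trim_history(history, keep_think_for=[len(history) - 1])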
Anonymous No.105613494 >>105613577
>>105612873
you have narrow shoulders and literally zero eyebrow line
Anonymous No.105613546 >>105613625
>>105613389
>20+B
I don't want to turn my toaster into a pressure cooker bomb.

>>105613348
>0.15B
Isn't that super small? Still, might be fun to play with, thanks.

>>105613349
Yeah, fine-tuning is probably not the right word. I just want some degree of control beyond setting temp and prompting. Pic related is more or less what I expect. Just a dumb box that churns out lore.
Anonymous No.105613577 >>105613608
>>105613494
are you erping with me?
Anonymous No.105613608
>>105613577
Colon three
Anonymous No.105613625 >>105613837
>>105613546
Qwen3-30b should give you ~10t/s at low context so it is very suitable for your machine, although I don't know if it is good for writing or whatever you are doing.
Anonymous No.105613690 >>105614491
>>105611419
My bad.

Let's enjoy another Chinese SOTA at 7t/s (ds-r1 runs at 4t/s)
Anonymous No.105613759
>>105613058
Didn't magistral suck, though?
Anonymous No.105613760
>>105613346
>nh codes to illustrate that
I believe it. Fucking normalfag shit.
Anonymous No.105613788
>>105611492 (OP)
This got me thinking about the calculation for minimum non-whore skirt length.
Anonymous No.105613814 >>105613915
How is dots for RP? 235b is too big for my rig, but dots seems like it could be a sweet spot.
Anonymous No.105613829
It's been quite some time since I've played with local models, has windows + amd gotten any better? It's a pain in the ass to have to boot up linux every time I want to rp
Anonymous No.105613837
>>105613625
Yeah I think fewer parameters but high context is going to work better for me. But gonna keep that in mind. Thanks.
Anonymous No.105613873
>>105612989
code to run it? I can only find papers
Anonymous No.105613874 >>105613979 >>105614033
>>>/v/712790873
Anonymous No.105613888 >>105614327 >>105614358
>>105611492 (OP)
I currently have a server with a ton of CPU cores and spare RAM, but it only has a 1050ti with 4GB VRAM in it. Is it even worth trying to run a local language model on it?
Anonymous No.105613915 >>105613962
>>105613814
it's noticeably pretty sterile when it comes to explicit nsfw, but that's nothing new for that general size range. if you're used to llama/qwen2.5 70b-class derivatives you'll feel right at home, but at least dots may be faster and have some more knowledge
Anonymous No.105613930
>>105613377
You killed me fucker, kek
Anonymous No.105613951 >>105614741
>>105613470
Are you doing a thesis on the topic or something?
Anonymous No.105613962 >>105613983
>>105613915
I liked 72b EVA-Qwen 2.5 at IQ4_XS, though it ran really slow on my system (1-2t/s). If this performs anything like that, but with the speed of a MoE, then it sounds like it's for me.
Anonymous No.105613963 >>105613987 >>105613989 >>105614351
I'm downloading Llama-4-Scout-17B-16E-Instruct-UD-TQ1_0.gguf.
Wish me luck.
I might not survive it.
Anonymous No.105613979 >>105614033
>>105613874
Gayest thread on /v/ right now.
Anonymous No.105613982
>>105612873
That's not femdom
Anonymous No.105613983
>>105613962
That's an RP tune.
Anonymous No.105613987 >>105614005
>>105613963
That's beyond being 'funny' bad. Also it's insane that unsloth's Scout repo has 100k downloads in the past month. That has to be wrong
Anonymous No.105613989 >>105614005
>>105613963
You'll survive fine anon, even the full precision model is shit and retarded
Anonymous No.105614005 >>105614130 >>105614248 >>105614351
>>105613987
>That's beyond being 'funny' bad
I know, pray for me.

>>105613989
I feel this one might be so bad as to be a cognitohazard.
We'll see.
Anonymous No.105614033 >>105614260
>>105613874
>>105613979
How did mikutroons become more mentally ill than furfags? At least those retards contribute to image gen and keep to their own degenerate communities instead of spamming the same generic dogshit of a waifu they obsess over everywhere because they have nothing else in their miserable life to attach to.
Anonymous No.105614104
hi anons, i know that this isn't the best thread to ask about commercial things, but... what are the services where I can deploy sdxl/etc finetuned models (anime ones) for easy API access? Obviously one choice is renting GPU servers on runpod/vast and so on, but are there any managed solutions? I don't think I need a dedicated GPU server to start, but eventually I guess I might need to generate up to 100 images/minute or something like that.
Anonymous No.105614114
AI generated post >105614104 btw
Anonymous No.105614130 >>105614148
>>105614005
>that pic
Baka, go back to /x/
Anonymous No.105614148 >>105614248 >>105614351
>>105614130
>105614104
I say this with utmost sincerity.
I've been on 4chan since 2008. I have been to /x/ maybe 5 times total.
As in, individual instances.
Anyhow, finished downloading it. Let's see what happens.
If I don't report back, please call my parents.
Anonymous No.105614204 >>105614222
>>105613087
to be fair, if you post on 4chan -- people WILL make fun of you. even if they're just as depraved.
but hey, i hope you find your perfect jerk-off mommy machine bro.
Anonymous No.105614222
>>105614204
s&m is vanilla compared to marrying a cartoon
that's not even a fetish, it's psychosis
Anonymous No.105614248 >>105614266 >>105614351
>>105614005
>>105614148
>load_tensors: layer 0 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 1 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 2 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 3 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 4 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 5 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 6 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 7 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 8 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 9 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 10 assigned to device CUDA0, is_swa = 1
Is this supposed to be a thing? Interleaved swa layers?
That's how they got """1 million""" context?
Anonymous No.105614260 >>105614372 >>105614527
>>105614033
Through the power of your butthurt, you have now summoned Migu.

I wonder if there will be a Blackwell card with 48GB? I don't need 96, and 32 just isn't enough. 48GB is about right. It just seems a little overboard to spend $8500 on a GPU.
Anonymous No.105614264 >>105614304
Can't wait for Minimax to get supported. I'll abandon Deepseek for it.
Anonymous No.105614266 >>105614351
>>105614248
Scout is 10 million sir.
Anonymous No.105614304
>>105614264
Buy an ad
Anonymous No.105614327
>>105613888
sure. Just run with llama.cpp in pure CPU inference mode (or really low context on the GPU)
It'll be a bit slow, but slow is ok for playing with. You'll be better off than desktop anons stuck with 128GB max RAM capacities that can't even run big models.
Anonymous No.105614351 >>105614374
>>105614266
All the same when it breaks down after 8k.

>>105613963
>>105614005
>>105614148
>>105614248
Yeah, it's real bad.
Worse than Qwen 3 30B. It can't even keep up with outputting a specific pattern that much smaller models handle just fine.
Amazing.
Anonymous No.105614358
>>105613888
1.5 t/s for deepseek quants
3 t/s for qwen3-235b

Your GPU will be used for prompt processing only
Anonymous No.105614372
>>105614260
I remember this Miku
>I don't need 96
yes you do. Big batch size genning of big Migus.
Anonymous No.105614374
>>105614351
breaks down much sooner than 8k even
https://github.com/adobe-research/NoLiMa?tab=readme-ov-file#results
Anonymous No.105614383
Less than two weeks until we get open source Ernie 4.5/X1
Anonymous No.105614385 >>105614481
>>105612859
Install gentoo
Anonymous No.105614479 >>105614491 >>105614515 >>105614535
Getting 10 t/s with dots on my 96 GB gaming rig and normal Llama.cpp with a custom -ot.
Anonymous No.105614481 >>105614507
>>105614385
>Gentroon
It is in the name. I have seen tech shrek. I won't be fooled.
Anonymous No.105614491 >>105614503
>>105614479
Post command params and which quant you use

Also, >>105613690
>another Chinese SOTA at 7t/s
I saw it coming
Anonymous No.105614497
One day a sex model will finally drop and i will be free from this place. I hope you all die the next day.
Anonymous No.105614503 >>105614519
>>105614491
Hold on. I'm actually failing to allocate more context. I was testing with only 2k initially. Damn does this model not use MLA or even GQA?
Anonymous No.105614507
>>105614481
Anonymous No.105614515 >>105614521
>>105614479
Well, is it good?
Anonymous No.105614519 >>105614529
>>105614503
>gayming
What GPU?
Anonymous No.105614521
>>105614515
Idk yet i just wanted to do an initial speed test first but trying to give more context is giving me the ooms.
Anonymous No.105614527
>>105614260
>butthurt
2007 called your hrt ass
Anonymous No.105614529 >>105614544
>>105614519
Just a 3090
Anonymous No.105614535 >>105614542
>>105614479
>with a custom -ot
Suggested by unsloth brothers?
Anonymous No.105614542
>>105614535
No? I am using unsloth's q4 quant though.
Anonymous No.105614544
>>105614529
On ik_llama-cli or the original?
Anonymous No.105614673 >>105614695
>https://github.com/ggml-org/llama.cpp/issues/14044#issuecomment-2961375166
>since it uses MHA rather than GQA or MLA
ACK
Anonymous No.105614695 >>105614739 >>105614871
>>105614673
GQA and its devil offspring are the true killers of soulful outputs. This was common knowledge back during the llama2 era and the first command-r was good because it used natural attention as well
Anonymous No.105614739
>>105614695
Ok but what if dots doesn't have soulful outputs
Anonymous No.105614741
>>105613951
sybau nigger, let the anon talk. Finally someone shares their own experiences instead of just shitposting
Anonymous No.105614758 >>105614873
Ok so it looks like I can't squeeze more than 11k context out of dots for the amount of memory I have, and now I am also getting 8.8 t/s (at 0 context, generating 100 tokens). Guess I'll test it a bit to see if it's worth downloading Q3 for.
Anonymous No.105614871
>>105614695
This. GQA kills that feeling that the model *gets* what you mean.
Anonymous No.105614873
>>105614758
thanks for the info
Anonymous No.105614895
we need some madlad company to get rid of the tokenizer and train the model on unicode
Anonymous No.105614932 >>105615858
>>105611492 (OP)
I'm completely new to this. Should I look further into lmgs if I don't care about chatting and image gen? Will I need a dedicated build or will my pc do?
Anonymous No.105614993 >>105615046 >>105615060 >>105615100 >>105615313 >>105619181
guys, what will save local?
Anonymous No.105615012 >>105615065
we are already saved.
Anonymous No.105615046
>>105614993
BitNet OpenGPT
Anonymous No.105615060 >>105615100
>>105614993
miku
Anonymous No.105615065
>>105615012
I don't feel saved
Anonymous No.105615090 >>105615112
Ok I tested dots and it's really meh. Feels like any other LLM really, and on top of that, the trivia knowledge is also not that good in my tests. No better than Gemma or GLM-4. MAYBE a bit better than Qwen. What trivia did people test that it had better performance on? It didn't do better on the /lmg/ favorites like mesugaki at least, nor on my personal set.
Anonymous No.105615100
>>105614993
New paradigm. Eternal waiting room until then. Every possible LLM sucks. It's over.
More realistically I'd like to see more online learning experiments or papers. Like a live feedback thumbs up/down. Not to save local, but to keep my own curiosity alive even if the resulting implementations make the models retarded, slow, broken or anything. Something new to play with.
>>105615060
Miku's love
Anonymous No.105615106 >>105615149
Can I generate porn and or 3d models on a 3090Ti?
Anonymous No.105615112
>>105615090
>What trivia did people test that it had better performance on?
Even hilab admits the only thing it has better knowledge of is Chinese language trivia knowledge. For anything else it's beaten by fucking Qwen 2.5 72B.
Anonymous No.105615149
>>105615106
No, you're retarded
Anonymous No.105615313
>>105614993
openai's SOTA phone model will shift the pareto frontier of speed x safety
Anonymous No.105615335 >>105615373
Trying to use a local model to make enhancements to tesseract OCR. Tesseract is fairly good without AI, but my ultimate goal is structured output of receipt data, so I can easily port it into hledger

What models would be best for this sort of thing? I've used ollama with some of the vision models and results haven't been great so far
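Not sure about model choice, but for the plumbing something like this has worked for me (a sketch assuming a llama-server style OpenAI-compatible endpoint on localhost; the JSON shape is whatever you want to feed into hledger):

import json, requests
from PIL import Image
import pytesseract

# let tesseract do the raw OCR, let the LLM do the structuring
text = pytesseract.image_to_string(Image.open("receipt.png"))

prompt = ("Extract merchant, date, currency and line items (description, amount) "
          "from this receipt OCR dump as a single JSON object:\n\n" + text)
r = requests.post("http://127.0.0.1:8080/v1/chat/completions",
                  json={"messages": [{"role": "user", "content": prompt}],
                        "response_format": {"type": "json_object"},  # constrain output to valid JSON
                        "temperature": 0}).json()
receipt = json.loads(r["choices"][0]["message"]["content"])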
Anonymous No.105615360 >>105615415 >>105615419 >>105615433 >>105615438 >>105615727 >>105615887 >>105615959 >>105616569 >>105619906
some migus & friends
https://www.mediafire.com/folder/et1b18ntkdlac/vocaloid
Anonymous No.105615373
>>105615335
Why would you need a vision model after you've OCRed the receipt to text?
Anonymous No.105615415 >>105615430
>>105615360
>can't download the entire folder without a premium account
Anonymous No.105615419 >>105615430
>>105615360
>mfw I've been saving them all manually
Anonymous No.105615430 >>105615727 >>105615743
>>105615415
>>105615419
sorry, first filehost that came to mind. you can use jDownloader.
I know some other people download em so if you want to make up a more complete collection feel free
my collection, ironically, is probably more incomplete due to catastrophic data loss.
Anonymous No.105615433
>>105615360
Thank you Migu genner
Anonymous No.105615438
>>105615360
>all .jpg
i curse you!
Anonymous No.105615443
Minimax was very obviously trained on the old R1. The thinking process is the same endlessly long plain text where the model tries to think about even the most trivial shit. It even sometimes deliberately gets things wrong at first just to be able to correct itself and think some more.
Anonymous No.105615535 >>105615582
>llama.cpp
>warming up the model with an empty run - please wait ... (--no-warmup to disable)

can I just skip warmup for good?
Anonymous No.105615582
>>105615535
"Warming up?" You don't know the meaning of those words, Bardin.
Anonymous No.105615587 >>105615598
Do people use the quantized context with llama.cpp?
Anonymous No.105615595 >>105615710 >>105615716
https://huggingface.co/bartowski/rednote-hilab_dots.llm1.inst-GGUF

bartgoatski quants are up. gogogo
Anonymous No.105615598
>>105615587
Yeah, but I wouldn't call them "people"
Anonymous No.105615710
>>105615595
does it need a llamacpp update?
Anonymous No.105615716
>>105615595
Shit llm for copers that somehow still dont have even 128gb ram for sneedsex
Anonymous No.105615727
>>105615360
>>105615430
Host a public FTP server, you coward.
Anonymous No.105615743 >>105615767
>>105615430
>first filehost that came to mind
makes sense that a mikutroon's mind is retarded
Anonymous No.105615745 >>105615817
I've been seeing stupid ads for this other half ai anime bullshit... is there anything to approximate it locally (plug some llm into an anime vroid model or something that has maybe limited voice rec)?
Anonymous No.105615767 >>105615788 >>105615805 >>105615819
>>105615743
can you point at, precisely, what aspect of miku makes it at all relevant to trans
not the people coopting the design and changing it, the official design, as per crypton future media
you've been throwing around this trans/agp thing for literal months if not years at this point and yet you've failed to even once properly ground your point or lack thereof in any actual sense
nobody's looking at miku and thinking of sex change surgeries that's all you
nobod- why am I even bothering you're clearly off your meds
Anonymous No.105615772
>>105612561
fair enough, but you probably don't want to haul a full-blown desktop every day like in the good LAN days. Still, if you have a better rec with a desktop form factor I'm happy to listen

>>105612568
living in a shoebox, can't use the same room all the time, its functionality is time multiplexed
Anonymous No.105615788 >>105615794
>>105615767
Xir, this is a trans website. Nobody would be obsessed with this obsolete design if he wasn't a real woman.
Anonymous No.105615794
>>105615788
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
whatever retard keep thinking of cock
Anonymous No.105615805 >>105615820
>>105615767
>you've been throwing around this trans/agp thing for literal months if not years at this point and yet you

>only 1 person realized that mikuniggers are retards who just spam their dogshit mascot obsessively and almost never ever have a single based opinion imaginable that they post in the thread despite being in the thread every day and despite the many actual trannies and faggots that raid the threads but never got told off by a single mikunigger avatarposter, instead they ignore those people and keep posting their generic trash obsession waifu
yeah... i wonder why people dislike you
Anonymous No.105615817
>>105615745
You don't have the IQ to run that
Anonymous No.105615819
>>105615767
>nobod- why am I even bothering you're clearly off your meds
American brown tumblr tier writing btw
Anonymous No.105615820 >>105615842
>>105615805
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
you could just answer
Anonymous No.105615842 >>105615867
>>105615820
I concede you aren't a troon. Now continue not being a troon by no longer spamming that worthless avatar. And if you continue then... well you admit you are a disgusting troon.
Anonymous No.105615858
>>105614932
vector databases and semantic search
Anonymous No.105615867 >>105615880
>>105615842
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
for someone who rants and raves endlessly about proper argumentation and logic you're proper fucking shit at it
never expect me to reply to your bs again
Anonymous No.105615880
>>105615867
Well there you go that is how we know you are a disgusting troon and you have AGP fantasies focusing on that retarded avatar you keep pushing on everyone. I recommend joining the 41%
Anonymous No.105615887 >>105615944 >>105615959
>>105615360
https://multiup.io/download/f927ee16eeea9bf6db1576a0d0c1f536/xx.zip
single file
Anonymous No.105615944
>>105615887
Thank you. Download finished in a couple seconds. Much better than fucking with jDownloader.
Anonymous No.105615959
>>105615360
>>105615887
An artifact to be preserved
Anonymous No.105615963 >>105615971 >>105615982 >>105616370
>load 0.6B model
>rig starts screeching like it's getting fistfucked by Satan himself

>load 8B model
>rigs handles it just fine

Okay I'm way over my head here, guys.
Anonymous No.105615971
>>105615963
It likes its models small and open
Anonymous No.105615982
>>105615963
First case it ran on CPU
Anonymous No.105616002
Anonymous No.105616197 >>105616261 >>105616387
Ive been running mythomax for years now, and I just upgraded to a 5080. Whats the new meta for coom llms?
Anonymous No.105616261 >>105616340
>>105616197
there is still nothing better than mythomax
Anonymous No.105616340 >>105616408
>>105616261
is there a way I can finetrain it a bunch of specific fetish smut to make it better?
Anonymous No.105616370 >>105616422
>>105615963
0.6b needs very little bandwidth, so compute usage goes up (and so do the fans). 8b needs more bandwidth, so it spends more time just waiting for memory to reach registers to compute, giving it time to chill.
Anonymous No.105616387
>>105616197
Try Cydonia
Anonymous No.105616408
>>105616340
Yes, you can finetune if you want (but not on a single 5080), but you really should just read the thread, because this question gets asked every single god damn thread; if you look at the last one you'll find at least 5 different recommendations for someone in your situation
Anonymous No.105616422
>>105616370
This. When my CPU is doing prompt processing the fans go full blast, but once it starts generating tokens they calm down.
Anonymous No.105616441 >>105616511
can someone generate neutral looking anime women pictures so i can use them for my blogposts?
Anonymous No.105616511 >>105616512
>>105616441
You could.
Anonymous No.105616512 >>105616521
>>105616511
i have rx 6600
Anonymous No.105616521
>>105616512
Most image gen is python based. I don't know how well they work with amd. Try stablediffusion.cpp. Should probably work with vulkan.
Anonymous No.105616569 >>105619751
>>105615360
Wouldn't a Pixiv link be fine?
Anonymous No.105616734 >>105616768 >>105616800 >>105617143 >>105617582 >>105617625
justpaste (DOTit) GreedyNalaTests

Added:
dans-personalityengine-v1.3.0-24b
Cydonia-24B-v3e
Broken-Tutu-24B-Unslop-v2.0
Delta-Vector_Austral-24B-Winton
Magistral-Small-2506
medgemma-27b-text-it
Q3-30B-A3B-Designant
QwQ-32B-ArliAI-RpR-v4
TheDrummer_Agatha-111B-v1-IQ2_M
Qwen3-235B-A22B-Q5_K_M from community

Been preoccupied for a while but now I'm caught up. 235B was given a star rating, the others had no stars and no flags, they're just the same old really.

Looking for contributions:
Deepseek models
dots.llm1.inst
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the EXACT prompt sent to the backend, in addition to the output. And your backend used + pull datetime. Also a link to the quant used, or what settings you used to make your quant.
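If it helps contributors, those settings map onto llama-server's native /completion endpoint like this (a sketch; prompt.txt stands in for the exact prompt text your frontend sends):

import requests

payload = {
    "prompt": open("prompt.txt").read(),  # the EXACT prompt sent to the backend, verbatim
    "temperature": 0.0,
    "top_k": 1,
    "seed": 1,         # just in case
    "n_predict": 512,  # arbitrary output budget
}
out = requests.post("http://127.0.0.1:8080/completion", json=payload).json()["content"]
print(out)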
Anonymous No.105616768 >>105616867
>>105616734
Long time no see
Hi all, Drummer here... No.105616800 >>105616867
>>105616734
Could you test...

Cydonia-24B-v3i
Cydonia-24B-v3j

They're both v3.1 candidates.

I'm also curious about...

Cydonia-24B-v3f and Cydonia-24B-v3g but more for research purposes.

Big fan of your work!
Anonymous No.105616867
>>105616768
Yee

>>105616800
I'll be honest, I don't feel like tying up my kind-of-slow internet downloading all that. You could just copy and paste the prompts into mikupad and get the outputs yourself pretty easily. If you simply want all the outputs archived in one place, I do take contributions and will add them if you give them (and of course I will read/rate them).
Anonymous No.105617143 >>105619640
>>105616734
>235B was given a star rating
"235b is bad" bros how are we coping with being empirically proven wrong
Anonymous No.105617146
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
https://arxiv.org/abs/2506.13284
>In this work, we investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. We begin by curating the SFT training data through two scaling strategies: increasing the number of collected prompts and the number of generated responses per prompt. Both approaches yield notable improvements in reasoning performance, with scaling the number of prompts resulting in more substantial gains. We then explore the following questions regarding the synergy between SFT and RL: (i) Does a stronger SFT model consistently lead to better final performance after large-scale RL training? (ii) How can we determine an appropriate sampling temperature during RL training to effectively balance exploration and exploitation for a given SFT initialization? Our findings suggest that (i) holds true, provided effective RL training is conducted, particularly when the sampling temperature is carefully chosen to maintain the temperature-adjusted entropy around 0.3, a setting that strikes a good balance between exploration and exploitation. Notably, the performance gap between initial SFT models narrows significantly throughout the RL process. Leveraging a strong SFT foundation and insights into the synergistic interplay between SFT and RL, our AceReason-Nemotron-1.1 7B model significantly outperforms AceReason-Nemotron-1.0 and achieves new state-of-the-art performance among Qwen2.5-7B-based reasoning models on challenging math and code benchmarks, thereby demonstrating the effectiveness of our post-training recipe.
https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
Isn't posted yet. pretty interesting
Anonymous No.105617169
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
https://arxiv.org/abs/2506.13585
Not sure if they posted a paper when they released their model but the arxiv version is up now
The Amazon Nova Family of Models: Technical Report and Model Card
https://arxiv.org/abs/2506.12103
paper from amazon. doesn't seem like they're open sourcing anything so w/e
Anonymous No.105617224 >>105617522
Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models
https://arxiv.org/abs/2506.13681
>Sampling from language models impacts the quality and diversity of outputs, affecting both research and real-world applications. Recently, Nguyen et al. 2024's "Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs" introduced a new sampler called min-p, claiming it achieves superior quality and diversity over established samplers such as basic, top-k, and top-p sampling. The significance of these claims was underscored by the paper's recognition as the 18th highest-scoring submission to ICLR 2025 and selection for an Oral presentation. This paper conducts a comprehensive re-examination of the evidence supporting min-p and reaches different conclusions from the original paper's four lines of evidence. First, the original paper's human evaluations omitted data, conducted statistical tests incorrectly, and described qualitative feedback inaccurately; our reanalysis demonstrates min-p did not outperform baselines in quality, diversity, or a trade-off between quality and diversity; in response to our findings, the authors of the original paper conducted a new human evaluation using a different implementation, task, and rubric that nevertheless provides further evidence min-p does not improve over baselines. Second, comprehensively sweeping the original paper's NLP benchmarks reveals min-p does not surpass baselines when controlling for the number of hyperparameters. Third, the original paper's LLM-as-a-Judge evaluations lack methodological clarity and appear inconsistently reported. Fourth, community adoption claims (49k GitHub repositories, 1.1M GitHub stars) were found to be unsubstantiated, leading to their removal; the revised adoption claim remains misleading. We conclude that evidence presented in the original paper fails to support claims that min-p improves quality, diversity, or a trade-off between quality and diversity.
RIP minp
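For reference, the sampler being dunked on is only a few lines; a sketch over the post-softmax probabilities, not any particular implementation's exact code:

import numpy as np

def min_p(probs, p_base=0.05):
    # keep tokens at least p_base as likely as the single most likely token, renormalize
    probs = np.asarray(probs, dtype=np.float64)
    kept = np.where(probs >= p_base * probs.max(), probs, 0.0)
    return kept / kept.sum()

# the next token is then sampled from min_p(probs), usually after temperature scaling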
Anonymous No.105617281
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
https://arxiv.org/abs/2506.12040
>Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to ±1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask management, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages adaptive weight transformation and binary pattern clustering to overcome these limitations, delivering both superior accuracy and efficiency. Our approach incorporates two key innovations: (1) a Learnable Transformation that optimizes invertible scaling and rotation matrices to align binarized weights with full-precision distributions, enabling incoherence processing to enhance layer-wise representation quality; (2) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters, compressing them into compact indices with tailored distance metrics and sign-based centroid updates. This eliminates the need for sparse masks, enabling efficient inference on standard hardware.
https://github.com/Chooovy/BTC-LLM
bpw below bitnet...
Anonymous No.105617522
>>105617224
>After we showed these results to the authors, they informed us that we had run our experiments using the “Llama" formatting of GSM8K prompts as we used the command from the authors’ public Colab notebook; the authors clarified that "Llama" formatting should be used only for Llama models. We then reran our experiments using standard formatting of GSM8K prompts. The results were nearly identical (Appendix B), with one small difference: min-p does produce higher scores for 2 of 12 language models. Again, we conclude min-p does not outperform other samplers on either formatting of GSM8K when controlling for hyperparameter volume.
Why would you want to publish your ignorance of chat templates, and the ~3000 Nvidia A100-hours of compute wasted as a result? Instead of engendering confidence in your findings, this just makes you come across as petty and seething.
Anonymous No.105617582
>>105616734
All those new 24b mistral slops and not a gem among them.
Anonymous No.105617604 >>105617608
will they ever release the multimodal qwen 3
Anonymous No.105617608
>>105617604
don't worry, qwen 4 will be omni and smart
Anonymous No.105617625
>>105616734
cockbench?
Anonymous No.105617634
>>105612859
hrt.coffee
Anonymous No.105617637 >>105617645 >>105617684 >>105617864 >>105618521
so what's the general consensus here on doing RP with reasoning? It's shit, right? It kinda improves creativity but not by much
Anonymous No.105617645
>>105617637
It depends. I've found that if you do a lot of complex math, logic, and programming in your RPs you'll notice a massive difference.
Hi all, Drummer here... No.105617684
>>105617637
Prefilling the think block to have reasoning act like a director/storywriter seems to help. I've only tried it on the new R1 though.
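e.g. against a raw completion endpoint, something like this (llama-server assumed; the prefill text and file name are just examples):

import requests

prefill = ("<think>\nActing as the director of this story, I'll plan the next scene: "
           "current pacing, what each character wants, and one concrete event to move things forward.\n")
chat = open("formatted_chat.txt").read()  # your chat history, already in the model's template
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": chat + prefill, "n_predict": 1024}).json()
print(prefill + r["content"])  # the model continues the plan, closes </think>, then writes the reply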
Anonymous No.105617731 >>105620373
Can I have a general mesugaki card sample?
Anonymous No.105617753
>the purpose of benchmaxxing on math is to improve the quality of RPs where anon is a grade-school math teacher
Anonymous No.105617864
>>105617637
It definitely makes it worse for Magistral; it actually makes it less likely to follow the sys prompt
Anonymous No.105618208 >>105618276 >>105618366 >>105618436 >>105618500
Can you redpill me on WizardLM and Miqu? They seem like quite large models; did anyone actually use them at larger quants?
Anonymous No.105618276 >>105618364
>>105618208
Buy an ad.
Anonymous No.105618329 >>105618428 >>105619257 >>105619427
Given how close we are to AGI, is it safe to say that Europe has zero chance of entering the running? What are the odds there's some dark horse AI lab that has been building in secret on the continent?
Anonymous No.105618364
>>105618276
Yeah cause those are the current hot thing.
Anonymous No.105618366 >>105618665
>>105618208
I see that you woke up from a year-long coma. Qwen 3 235b replaces WizardLM directly, and if you've got 128gb ram + 16/24gb vram, dynamic R1 replaces that: https://unsloth.ai/blog/deepseekr1-dynamic
Anonymous No.105618426 >>105618490
Anyone knows what multi modal model used on smash-or-pass-ai site? Can abliterated Gemma-3 do this?
Anonymous No.105618428
>>105618329
AGI will start with mistral-nemo 2
Anonymous No.105618436 >>105618481
>>105618208
No, they didn't. People would usually try to fit as much as they could into a single 3090 because that's what everyone was using (and probably what most people still are using)
Anonymous No.105618481 >>105618834
>>105618436
>probably what most people still are using
I doubt that
Anonymous No.105618490 >>105618562
>>105618426
I think Gemini 2.5 Flash probably. If you click the websim button you can edit it.
Anonymous No.105618500
>>105618208
they are more sovl compared to what we have now but they are also noticeably more retarded
Anonymous No.105618521
>>105617637
Reasoning feels like the right place for the model to plan ahead and maintain state, but you'd have to keep at least 2 reasoning traces in context, which is different from how they've been trained (mostly single-turn math questions). And Gemma 3 works better with fake reasoning for RP than Magistral Small, which was natively trained on it, does.

Another problem with reasoning is that it dilutes attention to the instructions, so ideally you'd want to keep instructions high in the context, but again, models aren't trained for it, so it often gives issues. Ironically, models not trained with system instructions in mind (just to follow whatever the user says) may work better for that.

On the other hand, I find that reasoning tends to decrease repetitive patterns in the actual responses. It's just that Mistral Small and by extension Magistral suck for these uses and they're only good for saying "fuck", "cock" and "pussy". If you're OK with just that...
Anonymous No.105618562
>>105618490
>Gemini 2.5
Isn't it censored?
Anonymous No.105618654 >>105618688
server cpu cucks will lose their time in the spotlight
MoEs will not scale
Anonymous No.105618665 >>105618678
>>105618366
>Qwen 3 235b replaces WizardLM directly
Smaller Qwen 3's are censored quite badly, is this the same?
Anonymous No.105618678
>>105618665
Idk your use case but in my experience barring very few insanely slopped exceptions, no model is really censored given a good system prompt, especially 100b+ models.
Anonymous No.105618688 >>105618840
>>105618654
>MoEs will not scale
Titans&co are mixture of attention experts.
Anonymous No.105618834
>>105618481
I'm not running ~70b models but I'm still using a single 3090
A 5090 would be about 5x the price I paid for the 3090 and not much more useful
Anonymous No.105618840
>>105618688
>moae
it doesn't even sound cool!
Anonymous No.105618863 >>105618877
LLMbros... we got too cocky while image and videogenbros are eating good... I don't think anything short of actually multimodal R3 will save us...
Anonymous No.105618877 >>105619001
>>105618863
pic unrelated?
Anonymous No.105619001
>>105618877
No
Anonymous No.105619044 >>105619183 >>105619192 >>105619265 >>105619471
What is the raison d'être for Q4_0 quants if everyone agrees that Q4_K is always better? I've been hearing this argument for years now
Anonymous No.105619181
>>105614993
No one can make shit. All hopes ride on Sam
Anonymous No.105619183
>>105619044
og nigga
Anonymous No.105619192 >>105619228 >>105619265
>>105619044
Nothing, it's a legacy format. iq4_xs is both smaller and better. q4_k_s/m are very slightly bigger and much better.
Anonymous No.105619228 >>105619265
>>105619192
nta but k_s?
I've been grabbing k_m like a monkey all this time
Anonymous No.105619257
>>105618329
>dark horse AI lab
Kek, you have no idea.
Anonymous No.105619265 >>105619472
>>105619044
>>105619192
>>105619228
qat only works for q4_0 for models that have it (gemma)
Anonymous No.105619313 >>105619435
has anyone tried any of the new ocr models such as MonkeyOCR and Nanonets-OCR-s?

looking to convert research papers in pdf to markdown or txt
docling is letting me down on accuracy and has some other issues
Anonymous No.105619427 >>105619439 >>105619443
>>105618329
>Given how close we are to AGI
We're not.
Anonymous No.105619435 >>105619493
>>105619313
Have you tried simply extracting the text directly from the PDF?
Anonymous No.105619439
>>105619427
We're
Anonymous No.105619443 >>105619485
>>105619427
Of course, your job is safe don't worry. But hypothetically if we were... does Europe have a shot?
llama.cpp CUDA dev !!yhbFjk57TDr No.105619471
>>105619044
q4_0 is faster than q4_K_M due to the simpler data structure.
For development purposes in particular it's also the quant that I use because I don't care about maximizing quality/VRAM in that scenario but I do care about speed and simplicity to make my measurements easier.
I never use q4_0 outside of testing though.
Anonymous No.105619472
>>105619265
qat is not nearly as good as google claims it to be.
Anonymous No.105619485 >>105619525
>>105619443
AGI will not be achieved by scaling up our current architectures.
It requires a fundamental breakthrough which could come from anywhere, including Europe.
Anonymous No.105619493
>>105619435
it didn't work that well with the multiple columns and formulas, but I'll give it another try, thanks.
Anonymous No.105619525 >>105619590
>>105619485
>AGI will not be achieved by scaling up our current architectures.
Source: your ass
>breakthrough which could come from anywhere, including Europe
It could also come from a pajeet 5 year old. Will it? No. You actually need an ML industry and companies for that.
Anonymous No.105619566 >>105619662
Gemma 3 seems obsessed with wrapping her legs around your waist, no matter her position.
Anonymous No.105619590 >>105619596 >>105619607
>>105619525
*AHEM*
Anonymous No.105619596
>>105619590
They have had nothing, absolutely nothing, noteworthy since Miqu and Mixtral.
Anonymous No.105619607 >>105619612
>>105619590
>no good model since 2407, 9 months ago
>rekt by r1 like everyone else, except they dont have nearly as much money to recover, and didn't
Anonymous No.105619612
>>105619607
>9 months ago
jesus christ
Anonymous No.105619630 >>105620152 >>105620434
https://huggingface.co/Menlo/Jan-nano
https://huggingface.co/Menlo/Jan-nano-gguf
Anonymous No.105619635 >>105619642
>still no local llm with native image gen
Anonymous No.105619640
>>105617143
It's only good at high quant.
Anonymous No.105619642
>>105619635
too unsafe, please understand
Anonymous No.105619662 >>105620164
>>105619566
For me she loves tangling her fingers in my hair and arching her back, no matter what the context.
Anonymous No.105619694 >>105619710 >>105619724 >>105619725 >>105620779
I tried dots q4. It feels like a very smart 30b model that still makes a brainfart or two like a 30b. So it is pretty useless. It is like a moe grok1.
Anonymous No.105619697
>>105612316
>>105611862
SWE-bench tests Python only. Perfect if you need something marginally better at writing a glue script I guess.
Anonymous No.105619710 >>105619717 >>105619719
>>105619694
prompt issue
Anonymous No.105619717 >>105619722
>>105619710
model issue
Anonymous No.105619719
>>105619710
The sperm that made you had a prompt issue.
Anonymous No.105619722
>>105619717
works for me, I'm having a blast
Anonymous No.105619724
>>105619694
>It is like a moe grok1.
but grok1 is already a moe?
Anonymous No.105619725 >>105619734
>>105619694
grok 1 was a moe
Anonymous No.105619734 >>105619741
>>105619725
YOU are moe
Anonymous No.105619741
>>105619734
nuh uh im dense
Anonymous No.105619751 >>105619796 >>105619812
long shot, but if anyone else saved the migus (or friends), feel free to reupload. most of my edited data got wiped.

>>105616569
what
Anonymous No.105619796
>>105619751
Stop with offtopic spamming. Nobody cares about your journey to become a woman.
Anonymous No.105619812
>>105619751
>my edited data got wiped
Good!
Anonymous No.105619828 >>105619849
why can't you faggots just get a miku thread going and migrate? you can spam there as much as you want
Anonymous No.105619849
>>105619828
They don't like /a/ for some reason.
Anonymous No.105619851 >>105619864
>samefagging this hard
Anonymous No.105619864
>>105619851
Stop spamming troon
Anonymous No.105619870
my thinking boxes in st are inconsistent and sometimes the reply ends up inside both the thinking and the reply blocks
Anonymous No.105619906 >>105619933
>>105615360
>>712856332
added some random loop animations
single file:
https://multiup.io/download/1f7bdf2cee24911d0cef25316887bba0/migu%20animations.zip
Anonymous No.105619933 >>105619951
>>105619906
Nobody loves you and you should kill yourself
Anonymous No.105619951 >>105620049
>>105619933
A few loony troons love him though
Anonymous No.105619971 >>105619977 >>105619997 >>105620008
Mikugaki Anon makes the thread more comfy, muh troons posters are just annoying.
Anonymous No.105619977
>>105619971
True story
Anonymous No.105619978
Masculine urge to post pictures of vocaloids.
Anonymous No.105619994 >>105620063
>vocaloids
>masculine
The irony ironing
Anonymous No.105619997
>>105619971
What causes a person to think about and see "troons" everywhere?
Anonymous No.105620008 >>105620013 >>105620183
>>105619971
This is not your safespace.
Anonymous No.105620013 >>105620026
>>105620008
It's not yours either, feel free to fuck off.
Anonymous No.105620026 >>105620047
>>105620013
I am not the one shitting up the thread with AI-generated anime slop though
Anonymous No.105620045 >>105620155
I think there's a compromise to be made here - miku can be posted, but she must advocate for total troon death.
Anonymous No.105620047 >>105620103
>>105620026
>go to ai thread
>see ai generated content
>get upset
zoomer logic everyone
Anonymous No.105620049
>>105619951
i love xim xes my xusband
Anonymous No.105620063
>>105619994
>masculine urge to fuck women

>women
>masculine

This is your logic.
Anonymous No.105620103 >>105620144
>>105620047
No that works with /sdg/ or /ldg/ only.
Anonymous No.105620144
>>105620103
You would still be sperging out even if they were made with anole or bagel.
Anonymous No.105620152
>>105619630
Q4 might be fine to load in browser directly and replace google
Anonymous No.105620155
>>105620045
Why would she go against her only fanbase?
Anonymous No.105620164
>>105619662
Yeah, those too. I get that sex is ultimately repetitive, but it feels as if Gemma 3 only knows a couple of ways of describing it non-explicitly. There are other areas where it isn't very creative either, and after a while you'll notice it always uses the same patterns.
Anonymous No.105620183
>>105620008
Ywnbaw
Anonymous No.105620193 >>105620203 >>105620226 >>105620425 >>105620650
Today has been one of the worst blackpills for me regarding local models in a long time.

The strongest open source vision model is worse than o4-mini.

Try this experiment:

Upload this image https://www.lifewire.com/thmb/7GETJem9McVRDI8kbzLM6TfwED0=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/windows-11-screenshots-615d31976db445bb9f725b510becd850.png

With this prompt

You are an assistant tasked with finding a bounding box for the indicated element.
In your response you should return a string in the following format:
[BOUNDING_BOX x0 y0 x1 y1]
x0, y0, x1, y1 being decimal coordinates normalized from 0 to 1 starting at the top left corner of the screen.
The indicated element you are required to find is the following:
Start Menu button

The only model that gives a half-right response is Gemma 3 27B. All the other open source models give wrong answers. ALL of the proprietary models give better answers than the best open source model.

Now you might think this is a weird thing to ask the model. But this is exactly the kind of task that's required for an assistant to control a computer and perform tasks autonomously.
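
If you want to script the experiment against a local OpenAI-compatible server (llama-server, LM Studio, etc.), here's a rough untested sketch; the URL, port, and model name are assumptions that depend entirely on your setup:

import base64, re, requests

with open("windows11_screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

prompt = (
    "You are an assistant tasked with finding a bounding box for the indicated element.\n"
    "In your response you should return a string in the following format:\n"
    "[BOUNDING_BOX x0 y0 x1 y1]\n"
    "x0, y0, x1, y1 being decimal coordinates normalized from 0 to 1 "
    "starting at the top left corner of the screen.\n"
    "The indicated element you are required to find is the following:\n"
    "Start Menu button"
)

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server default port; adjust
    json={
        "model": "gemma-3-27b-it",  # whatever name your server expects
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
    },
)
text = resp.json()["choices"][0]["message"]["content"]
m = re.search(r"\[BOUNDING_BOX\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\]", text)
print([float(g) for g in m.groups()] if m else "no box found")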
Anonymous No.105620203 >>105620227
>>105620193
This is about as retarded as expecting them to count letters and do calculations.
Anonymous No.105620226 >>105620247 >>105621260
>>105620193
Gemma 3 only uses a 400M-parameter vision model, and with current implementations it encodes every image into 256 tokens at 896x896 resolution. It's a miracle it performs like it does. Imagine if it were at least twice the size.
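Napkin math on why small UI elements get lost (assuming the usual SigLIP-style 14 px patches): 896/14 = 64, so 64x64 = 4096 patches, pooled down to 256 tokens, i.e. a 16x16 grid where each token summarizes a 56x56 px region. A taskbar button is smaller than a single token's footprint.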
Anonymous No.105620227 >>105620299
>>105620203
Then what workflow do you suggest to let a model control a computer?
And if it's so retarded then how come the tiniest OpenAI or Gemini models can give a reasonable answer but Qwen 3 235B can't? Your response sounds like cope to me.
Anonymous No.105620247
>>105620226
Yeah but the problem is that I haven't found any open source models that work. Llama 3.2 90B and the Qwen model I mentioned above give worse responses than Gemma.
The fact that Gemma 27B works at all given its tiny encoder tells me Google used some of the same training data or methods as Gemini, and that's why it kinda works.
Anonymous No.105620270 >>105620315
Anonymous No.105620299 >>105620369
>>105620227
>tiniest OpenAI or Gemini models
How do you know the number of parameters these closed models have?
For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Anonymous No.105620315
>>105620270
beeg meeg
Anonymous No.105620326 >>105620345 >>105620350
Qwen won't make dense models larger than 30B anymore.

https://x.com/JustinLin610/status/1934809653004939705
> For dense models larger than 30B, it is a bit hard to optimize effectiveness and efficiency (either training or inference). We prefer to use MoE for large models.
Anonymous No.105620333 >>105620361
https://www.youtube.com/watch?v=_0rftbXPJLI
Anonymous No.105620345
>>105620326
Local is evolving
Anonymous No.105620350 >>105620407
>>105620326
The future is <=32B Dense and >150B MoE. And I'm all for it, just buy RAM.
Anonymous No.105620361
>>105620333
Buy ad!
Anonymous No.105620366
Anonymous No.105620369
>>105620299
>How do you know the number of parameters these closed models have?
I don't, but what difference does it make? If they get better results because they have bigger models, they still get better results.
>For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Again, does it matter? Do local users have any kind of image segmentation model that can detect GUI elements? No, at least none that I know of. I tried the most popular phrase grounding model (owlv2) and it basically knows nothing about GUI elements.
If you wanna have a go at it, here's a list of models:
https://huggingface.co/models?pipeline_tag=zero-shot-object-detection&sort=trending
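(for reference, trying owlv2 is literally this much code via transformers' zero-shot-object-detection pipeline; the image filename and labels are just examples:)

from transformers import pipeline

detector = pipeline("zero-shot-object-detection",
                    model="google/owlv2-base-patch16-ensemble")
results = detector("windows11_screenshot.png",
                   candidate_labels=["start menu button", "taskbar icon"])
for r in results:
    # box comes back in pixel coords: {'xmin', 'ymin', 'xmax', 'ymax'}
    print(r["label"], round(r["score"], 3), r["box"])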
Anonymous No.105620373
>>105617731
Yuma.
Anonymous No.105620407 >>105620414 >>105620419
>>105620350
RAM is slow.
Anonymous No.105620414 >>105620422
>>105620407
but not for MoE!
Anonymous No.105620419 >>105620648
>>105620407
Just buy fast RAM.
Anonymous No.105620422
>>105620414
It is compared to a dense model you can fit in VRAM.
Anonymous No.105620425 >>105620437
>>105620193
This is a trap, data-mining post by closed AI
Tell sam atman to kill himself.
Anonymous No.105620434 >>105620457 >>105620517
>>105619630
>MCP Server?
>No Problem, set up SERPER API!
>Just place your SERPER API KEY IN JAN! 1000 Pages for only 0.30$!!
Are they fucking retarded? I can just use grok for free and it probably works better too. Soon probably Gemini as well, according to rumors.
I want something local. Why don't they show me how to set that up with duckduckgo or brave or whatever?
Anonymous No.105620437
>>105620425
No it's not
Anonymous No.105620457
>>105620434
Download a gguf model locally, load it up on your gpu... to call a $$ API for the results.
They didn't think that through. How stupid.
Anonymous No.105620495 >>105620519 >>105620918
Asking for a friend, how do I go from no model to having a model which acts just like a girl i used to know?
Anonymous No.105620517
>>105620434
>It's a chrome app
>2GB installed
>1GB updater
>All in my C: which has no more free space left
I'm so angry
Anonymous No.105620519
>>105620495
Go to koboldcpp's GitHub, find the wiki tab, and read the quickstart.
Then download a Mistral Nemo Instruct GGUF from huggingface, grab SillyTavern, and make a card of that person.
Anonymous No.105620648 >>105620666
>>105620419
If only you could easily buy more channels...
Anonymous No.105620650 >>105620738
>>105620193
Bro you could train a tiny yolo model on whatever GUI elements you want and strap that to your LLM. Go be retarded elsewhere
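Untested sketch with ultralytics (the dataset yaml and the labeled screenshots behind it are on you; names here are placeholders):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # tiny pretrained checkpoint to finetune
model.train(data="gui_elements.yaml", epochs=50, imgsz=1024)  # your labeled screenshots

results = model("screenshot.png")
for box in results[0].boxes:
    # class id, confidence, pixel-coordinate box
    print(int(box.cls), round(box.conf.item(), 3), box.xyxy[0].tolist())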
Anonymous No.105620666 >>105620674
>>105620648
DDR6 and CAMM2 are coming Soon™ to save the day
Anonymous No.105620674
>>105620666
Here's hoping it comes alongside CPUs with even wider busses too.
Anonymous No.105620738 >>105620763 >>105620807
>>105620650
We're talking about GUI elements here. Training an object detection model on specific GUI elements would take a lot of effort for little reward. If I were to take screenshots of all the elements I want to click on, I could just do image similarity matching.
What I want to do is describe the element I want to match in natural language and have the model find it, without giving it any visual examples (other than the one it has to find in the image).
You think I'm retarded? Why? You think the idea is retarded?
You think Manus and Operator are retarded?
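(To be clear, the similarity-matching fallback I mean is just opencv template matching, a couple of lines, assuming the element's rendering never changes; filenames are examples:)

import cv2

# needle = a saved crop of the element; haystack = the current screenshot
screen = cv2.imread("screenshot.png")
template = cv2.imread("start_button.png")
result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
h, w = template.shape[:2]
if max_val > 0.8:  # crude match threshold
    print("found at", max_loc, "to", (max_loc[0] + w, max_loc[1] + h))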
Anonymous No.105620763
>>105620738
https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B
Anonymous No.105620779
>>105619694
It felt worse than the current 30B models, though.
Anonymous No.105620807 >>105620889 >>105621260
>>105620738
You want the bounding boxes of GUI elements, which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is. The reward would be having a model for your use case and stop bitching here. I'm training my own small models for my use cases all the time.
Anonymous No.105620889 >>105620944
>>105620807
It's not really a segmentation task. Segmentation in the classic sense (like in the YOLO models you mentioned) means detecting pre-determined categories in the image. The term for detecting objects based on free-form natural language is phrase grounding.
>which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is
I don't think using the tool that works better for the job is retarded. Do you? Or are you saying there's a tool that works better? If so, then I challenge you to show me a segmentation model that performs better at this task than ChatGPT or Gemini. Like I said above, I tried the most popular phrase grounding model and it doesn't know what a start menu is. When you ask it about GUI elements it just highlights random icons in the image.
>I'm training my own small models for my use cases all the time.
If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Anonymous No.105620918
>>105620495
>just like a girl i used to know?
Stop. Don't try to have LLMs simulate real people. Take your meds instead.
Anonymous No.105620944 >>105621016
>>105620889
Multimodality is in its infancy and requires a lot more resources to pull off than simple LLM + tool calling, so why are you even surprised that only cloud models are able to do that?
The only solution for local models right now is training a model for your specific task, which here is segmentation of GUI elements. Even if Google bothers to release a new model, it won't be as good as their flagship; that much is a given.
>If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Model A is a cloud model and model B is running on my own computer, model B is superior by default.
Anonymous No.105621016 >>105621036 >>105621039
>>105620944
>so why are you even surprised that only cloud models are able to do that?
Because I was under the impression that the largest open source models would BTFO the -mini and Flash commercial models at all tasks. I was hoping somebody would prove me wrong, but it seems the vision capabilities of local models are just worse.
>Model A is a cloud model and model B is running on my own computer, model B is superior by default.
That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
Anonymous No.105621036 >>105621076
>>105621016
>That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
For all work. There is no reason to ever give them free data, even if you've been successfully programmed to not care about them building a profile on you.
Anonymous No.105621039 >>105621076
>>105621016
You can't automate that stuff without using a cloud model API, and that's not free; it has nothing to do with NSFW.
Anonymous No.105621076 >>105621096
>>105621039
You mean free as in beer or free as in freedom? Because if I can spend a few dollars to save me an hour of work then I probably would.

>>105621036
It's not free if they're giving me something in return.
Anonymous No.105621096 >>105621319
>>105621076
>free as in beer or free as in freedom
Both. You really want to give cloud models pics of your own computer? That's not even a question if you work for a company.
Anonymous No.105621260 >>105621294 >>105621405
>>105620226
do local inference backends not support some form of tiling to spread the image out across several tiles?

I've been using Gemini 2.5 for some video understanding tasks and I use ffmpeg to resize+pad the extracted frames so that they fit across two tiles. Works fairly well for a lot of tasks but the bounding boxes are kinda shit.
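The resize+pad step is easy to replicate in python too if you'd rather not ffmpeg it; a minimal sketch, assuming two 896x896 tiles side by side (896 is Gemma's tile size, Gemini's actual tile geometry isn't public):

from PIL import Image, ImageOps

# letterbox the frame onto a 1792x896 canvas so a tiler can split it into two 896x896 tiles
frame = Image.open("frame_0001.png").convert("RGB")
padded = ImageOps.pad(frame, (1792, 896), color=(0, 0, 0))
padded.save("frame_0001_padded.png")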

>>105620807
retard
Anonymous No.105621282 >>105621313
New music model dropped:

https://github.com/tencent-ailab/SongGeneration

Someone please try this and report back.
Anonymous No.105621294 >>105621343
>>105621260
>Do X, but it's shit for your use case
Retarded gorilla
Anonymous No.105621307 >>105621969
>>105611492 (OP)
>install LM Studio on laptop with a 3060
>Not even 30% CUDA utilization with 100% offloading to vram
>Everything takes minutes to answer
Why might this be?
Anonymous No.105621313 >>105621337
>>105621282
>SongGeneration-base(zh) v20250520
>SongGeneration-base(zh&en) Coming soon
>SongGeneration-full(zh&en) Coming soon
Chinese only for now
Anonymous No.105621319 >>105621342
>>105621096
If George Hotz is not afraid of streaming his PC to the whole world, I don't see why I should be afraid of streaming my PC to Google.
As for the company I work for, nobody is monitoring me so closely that I'd get in trouble for leaking a few screenshots with bits of internal data here and there to an unauthorized API. I don't handle anything too sensitive.
Anonymous No.105621337
>>105621313
It seems to support instrumental stuff at least.
I'm curious what the generation speed for this is, whether it's dreadful like YuE or a bit fast like AceStep (which is shit but fast)
Anonymous No.105621342 >>105621386 >>105621399
>>105621319
>Defending cloud cuckery this hard
You're in the wrong general bro
Anonymous No.105621343 >>105621399
>>105621294
Get some reading comprehension, you ape. Everything apart from the bounding boxes works well. I can get time-stamped occurrences of company logos even when they're blurred or upside down; it completely btfos older methods. Even the inaccurate bounding boxes are still in the general location and useful for human evaluation. It's super convenient to feed the audio in too.
Anonymous No.105621386
>>105621342
I'm just being realistic. Acknowledging deficiencies is the first step towards improving.
I have a friend who told me he has had success with this using an open source model but doesn't remember which one he used; he's gonna send me some info when he gets back from work.
Anonymous No.105621399
>>105621343
>>105621342
Anonymous No.105621405 >>105621463
>>105621260
Pan & Scan (tiling) isn't implemented in llama.cpp, and it didn't work the last time I tried it in vLLM, where it's apparently implemented (behind an optional flag). From what I've read, it needs some form of prompt engineering; it's not as simple as just sending image tiles.
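For reference, enabling it in vLLM looked roughly like this when I tried (from memory and untested; double-check the kwarg name against your vLLM version's Gemma 3 docs):

from vllm import LLM

# do_pan_and_scan is the Gemma 3 processor option that splits large images into crops
llm = LLM(model="google/gemma-3-27b-it",
          mm_processor_kwargs={"do_pan_and_scan": True})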
Anonymous No.105621463
>>105621405
thanks for the info. That explains why Gemma performed so poorly at manga translation when I tried it a while back; the image was probably resized to the point of the text being unreadable.
Anonymous No.105621574
>>105621559
>>105621559
>>105621559
Anonymous No.105621969
>>105621307
Low memory bandwidth can't saturate compute.
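Napkin math: each generated token has to stream all active weights through the GPU once, so tokens/s ≈ bandwidth / model size. A laptop 3060 is roughly 336 GB/s; a 7-8B model at Q4 is ~4-5 GB, so ~70-80 t/s is the ceiling, with the CUDA cores mostly idle waiting on VRAM, which is why utilization reads low. And if the model doesn't actually fit and the driver spills into shared system RAM over PCIe, effective bandwidth drops to a few GB/s and replies take minutes.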