/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>105601326 & >>105589841

►News
>(06/16) MiniMax-M1, hybrid-attention reasoning models released: https://github.com/MiniMax-AI/MiniMax-M1
>(06/15) llama-model : add dots.llm1 architecture support merged: https://github.com/ggml-org/llama.cpp/pull/14118
>(06/14) NuExtract-2.0 for structured information extraction: https://hf.co/collections/numind/nuextract-20-67c73c445106c12f2b1b6960
>(06/13) Jan-Nano: A 4B MCP-Optimized DeepResearch Model: https://hf.co/Menlo/Jan-nano

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105601326

--Papers:
>105606869 >105606875

--Evaluation of dots.llm1 model performance and integration challenges in local inference pipelines:
>105601735 >105604736 >105604782 >105604857 >105604810 >105604838 >105605017 >105605319 >105605475 >105605551 >105605556 >105605609 >105605671 >105605701 >105605582 >105605670 >105605965

--llama-cli vs llama-server performance comparison showing speed differences and config inconsistencies:
>105601495 >105601540 >105601746 >105601830 >105601953 >105601967 >105602123 >105602170 >105602190 >105602380 >105601654

--Evaluating budget hardware options for local LLM deployment with portability and future model scaling in mind:
>105609676 >105609743 >105609808 >105609858 >105610000 >105610275 >105610095

--VideoPrism: A versatile video encoder achieving SOTA on 31 of 33 benchmarks:
>105610184

--Sugoi LLM 14B/32B released via Patreon with GGUF binaries and claimed benchmark leads:
>105606204 >105606305 >105606399 >105609562 >105609620

--Interleaving predictions from multiple LLMs via scripts or code modifications:
>105609453 >105609499 >105609500 >105609534

--Hailo-10H M.2 accelerator questioned for real-world AI application viability:
>105602205 >105602335

--Radeon Pro V620 GPU rejected due to driver issues and overheating in LLM use case:
>105603370 >105603394 >105603418 >105603454 >105603762 >105603893 >105604087

--Sycophantic tendencies in cloud models exposed through academic paper evaluation:
>105601903 >105602389 >105602410 >105602064 >105603398 >105603416

--MiniMax-M1, hybrid-attention reasoning models:
>105611241 >105611443

--Qwen3 models released in MLX format:
>105608806

--Miku (free space):
>105601934 >105603103 >105604354 >105604389 >105604736 >105605940 >105606009 >105606217 >105610016 >105610160 >105610284 >105610486 >105611108 >105611119

►Recent Highlight Posts from the Previous Thread: >>105601330

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>it's june 16, 2025 and there is STILL no minimax gguf
>>105611523
And there never will be. It uses lightning attention
https://github.com/ggml-org/llama.cpp/issues/11290
>>105611471
here you go sar, they have a huggingface space
https://huggingface.co/spaces/MiniMaxAI/MiniMax-M1
>>105611583
>looking it up in my mind
I see that Unsloth uploaded dots.llm1 quants within the last few hours. I've been waiting to try out this model. If I have 96GB VRAM, which is better: IQ4_XS, IQ4_NL, or UD-Q3_K_XL? These are the 3 that look like the largest size I can fit. Tbh I'm not even really sure what all these newer meme quants are or which is supposed to be best.
does r1 pass the mesugaki test with the new version they released?
>>105611563
>lightning attention
What's next? Bolt attention?
>Drummer's merge is already an improvement, yet retains most of Magistral's strengths.
>>105610116
Hey anon, which version did you use and what strengths are you referring to? Was reasoning good and useful?
>>105611583
Here's a version with a bone thrown in.
>>105611656
I'm holding out for thunder attention
F5 TTS has had a bit of an upgrade to inference speed recently, in case you haven't kept up with the updates. Roughly three different perf improvements:
>flash attention 2
>Empirically Pruned Step Sampling (lower number of steps for high quality output)
>single transformation instead of a 2-step one (halves the inference time)
https://huggingface.co/moonshotai/Kimi-Dev-72B
https://github.com/MoonshotAI/Kimi-Dev
>We introduce Kimi-Dev-72B, our new open-source coding LLM for software engineering tasks. Kimi-Dev-72B achieves a new state-of-the-art on SWE-bench Verified among open-source models.
>Kimi-Dev-72B achieves 60.4% performance on SWE-bench Verified. It surpasses the runner-up, setting a new state-of-the-art result among open-source models.
>Kimi-Dev-72B is optimized via large-scale reinforcement learning. It autonomously patches real repositories in Docker and gains rewards only when the entire test suite passes. This ensures correct and robust solutions, aligning with real-world development standards.
>Kimi-Dev-72B is available for download and deployment on Hugging Face and GitHub. We welcome developers and researchers to explore its capabilities and contribute to development.
>>105611862
>code
>local
I sleep
>>105611492 (OP)
Uh oh, the 24/7 seething fatoid disgusting ratoid troons transisters didn't like that one, huh?
You will
Never
Ever
Be a
Real
Woman
looooooooooool
>>105611887
Don't misunderstand. I don't project to be or even want to be a woman. That construct is entirely within your own ass, not even a snug fit, it's spacious.
Miku as a character or concept is irrelevant to me. I like her design and my perception of her is a convenient, often portable twintailed onahole. There is no wanting to be her, she is a sleeve for me to rub one out.
Hope that helps clarify. Goodness, you can't seem to kick the habit of malding.
Keep this up and you'll never get a kurisu.
>>105611838
I've definitely seen a huge increase in speed for large chunks of text. With NFE=7:
>Declaration of Independence
>8188 characters
>Inference time: 77 seconds
>Output: 8 mins 49 seconds of audio
https://vocaroo.com/1oiTcWPgdj6i
>>105611805
The second option is a trick, we all know she's not wearing any.
>>105611673
Do you know how to properly fine tune MoE models?
>>105611985
This is with a 2070 btw. So anyone with a better GPU can double/triple the inference speeds.
>Looking up (in my mind) some sources
Minimax-M1 knows Teto's birthday and that she's a UTAU. It would be a disappointment if it did not, given its size.
this single point of knowledge is irrefutable evidence that proves that the model is good. we'll be back.
>>105611563
>never
but that was closed saying it could be revisited after refactoring, and they seemed to later do the refactor here:
https://github.com/ggml-org/llama.cpp/pull/12181
>Introduce llm_memory_i concept that will abstract different cache/memory mechanisms. For now we have only llama_kv_cache as a type of memory
and looks like work has picked up on other models with competing cache mechanisms (mamba etc.)
https://github.com/ggml-org/llama.cpp/pull/13979
now we just need someone with motivation, a vibe coding client, and good enough prompt engineering skills to revisit minimax and we're fucking IN
https://huggingface.co/moonshotai/Kimi-Dev-72B
>>105610392
don't you need to enable tool use or something like that? are most engines compatible?
>>105612268
please search before posting
>>105611862
>>105612298
nah go shove a janny dilator up your holes though ;)
>>105611862
>qwen 2 finetune
*yawn*
Also, apologize for Devstral
>>105611862
do they actually aim for the moon?
>>105612316
most meaningless graph award
>>105612285
"tool use" is just sending a json object of available tools in the context and executing whatever tool the model invoked in its reply. That's entirely up to the client making the requests to abstract away. I mostly use llama-server, but any engine that exposes an OpenAI-compatible API should work.
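If you want to see the shape of it, roughly this (a minimal sketch against llama-server's OpenAI-compatible endpoint; the URL, model name and the get_weather tool are all placeholders, and iirc llama-server needs --jinja for tool calls):

import json, requests

# advertise the available tools to the model
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

r = requests.post("http://localhost:8080/v1/chat/completions", json={
    "model": "local",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
})
msg = r.json()["choices"][0]["message"]

# the client, not the engine, executes whatever the model invoked,
# then sends the result back in a "tool" role message
for call in msg.get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    print(call["function"]["name"], args)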
>>105611602
deepseek (through web) said "making a mental note" to me recently, had not seen that before.
>>105610000
unfortunately yes, between rooms
>>105610095
that's great, but double the price. Also, I understand a 16GB card can only load small models (but could be used for diffusion)
>>105612527
Literally any pc case is portable "between rooms"
>>105612527
>portable between rooms
for what motherfucking purpose?
>>105612316
DID THE PARETO FRONT JUST DO WHAT I THINK IT DID?
hello saaars
haven't been keeping up with lmgs since deepsneed released, what's the current meta?
>>105612703
deepsneed or nemo.
>>105611492 (OP)
She's sexy. Can I look like that? Is there any tech for that?
how are LLMs at femdom? beyond the cursory stuff like verbal degradation and humiliation, can they lean more into the power dynamic side and give you orders and encouragement, dictate what you eat, how you dress, more control yet still nurturing
asking for friend
>>105611492 (OP)
>Looks at news
>Nothing but small models and research stuff that can't RP worth shit
Is it over?
>>105611838
F5R-TTS is better
>>105612859
You need to reroll your char. See this:
>>105612884
>>105613005
The cooldown for rerolling again is kinda long and early game is ass.
>>105612968
We got Magistral and dots last week.
>>105612873
tell your friend he has mommy issues
>>105612873
There is a trick to it. Tell it to roleplay as a wealthy sadistic werewolf millionaire that inexplicably fell in love with his 5/10 average unassuming secretary. Then use an agent to rewrite what it wrote and swap werewolf with dommy mommy of your choice.
>>105612968
2mw until deepsex V4
>>105613064
high ground? here? are you actually serious
>>105613087
Yes. It is over anakin. Take your mikutroons and walk into the lava yourself
>>105613030
Yes, this is a problem. I want to look like her in 3-4 years.
>>105612888
when regular linus isn't strong enough, take it to the arch linus.
>>105612989
Thanks but how's the performance between the two?
>>105613139
You will never be 2d, anon
>>105611492 (OP)
alice.gguf when?
lmg activate the insider agent and leak it
we will finish her training with antisemitic propaganda and gpu bukkake
/lit/fag here. I'm working on an "offline MUD" of sorts and need a writing buddy to ping pong ideas. Chatgpt is good enough but I'm interested in fine-tuning.
What model would you guys recommend for my setup?
>ryzen 7 5700G
>32 GB
>no GPU
>do the mesugaki test with R1-0528
>explains it flawlessly, with examples
>fucking ends the message asking me if I want nh codes to illustrate that
>would've posted the log but anons over in aicg got banned for less than that
this is why the chinks are gonna win the ai race
>>105613312
>fine-tuning
Rent hardware. You're not doing anything with that.
>>105613346
anon, the ai models are supposed to do the hallucinations, not you.
>>105613346
>>fucking ends the message asking me if I want nh codes to illustrate that
No way.
>>105613312
>fine-tuning
just tell the model at the start what you want and how to write
https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/blob/main/Qwen3-30B-A3B-Q4_K_S.gguf
https://huggingface.co/bartowski/google_gemma-3-27b-it-qat-GGUF/blob/main/google_gemma-3-27b-it-qat-Q4_0.gguf
What coomodel is good these days for a 24GB vramlet?
>>105612873
I've tried various loli dom/yuri s&m scenarios since GPT-3 came out in 2020.
Often it works poorly, needing hand holding. Closed stuff like OpenAI had too much positivity bias.
Open models like Llama were not smart enough and needed hand holding, thus ruining it, same for most small ones.
Closed stuff like Claude (Opus, sometimes Sonnet) would manage it somewhat.
(Open) Command-R managed a bit, but needed hand holding and was very schizo.
From open models, DeepSeek R1 manages it properly, but like most LLMs it will still by default jump on your dick or start sex too quickly. With careful prompting that explains the desired pacing in a few sentences it manages to ace this almost perfectly. It can go both fast and slow and it leads the story by itself, thus keeping your immersion.
I'd say DeepSeek 3 (the first one) failed at it, but the update works. Both R1's, new and old, work; the new one has a slower pace, the old one was more intense, but both are intense enough if prompted right.
Now maybe the model size is too much for most, but when you consider that closed stuff that did well like Opus 3 is dying ("deprecating") and OpenAI has some positivity bias that often ruins it, I'd say R1 is one of the very few that manage to do it right.
If you accept some degree of hand holding, smaller models like the 70b llama and some others managed partially, but considerably more poorly. I haven't seen anything in the 7-13b range manage.
I'm a bit interested in trying it with Magistral sometime, because I've noticed that R1 would sometimes make plans and I would pass some of those CoT plans back to it (selectively retaining think blocks), so that it can lead the story over many turns, which is a lot more fun than LLMs that forget what they were doing half a page ago or what they intended to do.
tl;dr: with careful prompting it works very well on some big models, mostly R1. DS3 sometimes works, but is gacha. everything else often needs hand holding.
>>105612873
you have narrow shoulders and literally zero eyebrow line
>>105613389
>20+B
I don't want to turn my toaster into a pressure cooker bomb.
>>105613348
>0.15B
Isn't that super small? Still, might be fun to play with, thanks.
>>105613349
Yeah probably fine-tuning is not the right word. I just want some degree of control beyond setting temp and prompting. Pic related is more or less what I expect. Just a dumb box that churns out lore.
>>105613494
are you erping with me?
>>105613546
Qwen3-30b should give you ~10t/s at low context so it is very suitable for your machine, although I don't know if it is good for writing or whatever you are doing.
>>105611419
My bad.
Let's enjoy another Chinese SOTA at 7t/s (ds-r1 runs at 4t/s)
>>105613058
Didn't magistral suck, though?
>>105613346
>nh codes to illustrate that
I believe it. Fucking normalfag shit.
>>105611492 (OP)
This got me thinking about the calculation for minimum non-whore skirt length.
How is dots for RP? 235b is too big for my rig, but dots seems like it could be a sweet spot.
It's been quite some time since I've played with local models, has windows + amd gotten any better? It's a pain in the ass to have to boot up linux every time I want to rp
>>105613625
Yeah I think fewer parameters but high context is going to work better for me. But gonna keep that in mind. Thanks.
>>105612989
code to run it? I can only find papers
>>105611492 (OP)
I currently have a server with a ton of CPU cores and spare RAM, but it only has a 1050ti with 4GB VRAM in it. Is it even worth trying to run a local language model on it?
>>105613814
it's noticeably pretty sterile when it comes to explicit nsfw, but that's nothing new for that general size range. if you're used to llama/qwen2.5 70b-class derivatives you'll feel right at home, but at least dots may be faster and have some more knowledge
>>105613377
You killed me fucker, kek
>>105613470
Are you doing a thesis on the topic or something?
>>105613915
I liked 72b EVA-Qwen 2.5 at IQ4_XS, though it ran really slow on my system (1-2t/s). If this performs anything like that, but with the speed of a MoE, then it sounds like it's for me.
I'm downloading Llama-4-Scout-17B-16E-Instruct-UD-TQ1_0.gguf.
Wish me luck.
I might not survive it.
>>105613874
Gayest thread on /v/ right now.
>>105612873
That's not femdom
>>105613962
That's an RP tune.
>>105613963
That's beyond being 'funny' bad. Also it's insane that unsloth's Scout repo has 100k downloads in the past month. That has to be wrong
>>105613963
You'll survive fine anon, even the full precision model is shit and retarded
>>105613987
>That's beyond being 'funny' bad
I know, pray for me.
>>105613989
I feel this one might be so bad as to be a cognitohazard.
We'll see.
>>105613874
>>105613979
How did mikutroons become more mentally ill than furfags? At least those retards contribute to image gen and keep to their own degenerate communities instead of spamming the same generic dogshit of a waifu they obsess over everywhere because they have nothing else in their miserable life to attach to.
hi anons, i know that this isn't the best thread to ask about commercial things, but... what are the services where I can deploy sdxl/etc finetuned models (anime ones) for easy API access? Obviously one choice is renting GPU servers on runpod/vast and so on, but are there any managed solutions? I don't think I need a dedicated GPU server to start, but eventually I guess I might need to generate up to 100 images/minute or something like that.
AI generated post >105614104 btw
>>105614130
>that pic
Baka, go back to /x/
>>105614130>105614104I say this with utmost sincerity.
I've been on 4chan since 2008. I have been to /x/ maybe 5 times total.
As in, individual instances.
Anyhow, finished downloading it. Let's see what happens.
If I don't report back, please call my parents.
>>105613087
to be fair, if you post on 4chan -- people WILL make fun of you. even if they're just as depraved.
but hey, i hope you find your perfect jerk-off mommy machine bro.
>>105614204s&m is vanilla compared to marrying a cartoon
that's not even a fetish, it's psychosis
>>105614005
>>105614148
>load_tensors: layer 0 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 1 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 2 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 3 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 4 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 5 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 6 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 7 assigned to device CUDA0, is_swa = 0
>load_tensors: layer 8 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 9 assigned to device CUDA0, is_swa = 1
>load_tensors: layer 10 assigned to device CUDA0, is_swa = 1
Is this supposed to be a thing? Interleaved swa layers?
That's how they got """1 million""" context?
>>105614033
Through the power of your butthurt, you have now summoned Migu.
I wonder if there will be a Blackwell card with 48GB? I don't need 96, and 32 just isn't enough. 48GB is about right. It just seems a little overboard to spend $8500 on a GPU.
Can't wait for Minimax to get supported. I'll abandon Deepseek for it.
>>105614248
Scout is 10 million sir.
>>105613888
sure. Just run with llama.cpp in pure CPU inference mode (or really low context on the GPU)
It'll be a bit slow, but slow is ok for playing with. You'll be better off than desktop anons stuck with 128GB max RAM capacities that can't even run big models.
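For pure CPU that's something like this (a sketch; model path, thread count and context are whatever fits your box):

./llama-cli -m some-model-Q4_K_M.gguf -ngl 0 -t 16 -c 4096

-ngl 0 keeps all layers in system RAM; you can bump it up a little if you want the 1050ti to hold a few layers and help with prompt processing.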
>>105614266
All the same when it breaks down after 8k.
>>105613963
>>105614005
>>105614148
>>105614248
Yeah, it's real bad.
Worse than Qwen 3 30B. It can't even keep up with outputting a specific pattern that much smaller models can just fine.
Amazing.
>>105613888
1.5 t/s for deepseek quants
3 t/s for qwen3-235b
Your GPU will be used for prompt processing only
>>105614260
I remember this Miku
>I don't need 96
yes you do. Big batch size genning of big Migus.
>>105614351
breaks down much sooner than 8k even
https://github.com/adobe-research/NoLiMa?tab=readme-ov-file#results
Less than two weeks until we get open source Ernie 4.5/X1
>>105612859
Install gentoo
Getting 10 t/s with dots on my 96 GB gaming rig and normal Llama.cpp with a custom -ot.
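For reference, the usual shape of a MoE offload override is something like this (a sketch, not necessarily the exact command used above; the filename is a placeholder and the regex has to match the expert tensor names in your quant):

./llama-server -m dots.llm1.inst-Q4_K_M.gguf -c 8192 -ngl 99 -ot "\.ffn_.*_exps\.=CPU"

i.e. -ngl pushes every layer to the GPU first, then -ot (--override-tensor) pins the fat ffn_*_exps expert tensors back to system RAM, so attention and shared weights stay in VRAM while the sparsely-activated experts run on CPU.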
>>105614385
>Gentroon
It is in the name. I have seen tech shrek. I won't be fooled.
>>105614479
Post command params and which quant you use
Also,
>>105613690
>another Chinese SOTA at 7t/s
I saw it coming
One day a sex model will finally drop and i will be free from this place. I hope you all die the next day.
>>105614491
Hold on. I'm actually failing to allocate more context. I was testing with only 2k initially. Damn does this model not use MLA or even GQA?
>>105614479
Well, is it good?
>>105614503
>gayming
What GPU?
>>105614515
Idk yet i just wanted to do an initial speed test first but trying to give more context is giving me the ooms.
>>105614260
>butthurt
2007 called your hrt ass
>>105614479
>with a custom -ot
Suggested by unsloth brothers?
>>105614535
No? I am using unsloth's q4 quant though.
>>105614529
On ik_llama-cli or the original?
>https://github.com/ggml-org/llama.cpp/issues/14044#issuecomment-2961375166
>since it uses MHA rather than GQA or MLA
ACK
>>105614673
GQA and its devil offspring are the true killers of soulful outputs. This was common knowledge back during the llama2 era and the first command-r was good because it used natural attention as well
>>105614695
Ok but what if dots doesn't have soulful outputs
>>105613951
sybau nigger, let the anon talk. Finally someone shares their own experiences instead of just shitposting
Ok so it looks like I can't squeeze more than 11k context out of dots for the amount of memory I have, and now I am also getting 8.8 t/s (at 0 context, generating 100 tokens). Guess I'll test it a bit to see if it's worth downloading Q3 for.
>>105614695
This. GQA kills that feeling that the model *gets* what you mean.
>>105614758
thanks for the info
we need some madlad company to get rid of the tokenizer and train the model on unicode
>>105611492 (OP)
I'm completely new to this. Should I look further into lmgs if I don't care about chatting and image gen? Will I need a dedicated build or will my pc do?
guys, what will save local?
>>105614993
BitNet OpenGPT
>>105615012
I don't feel saved
Ok I tested dots and it's really meh. Feels like any other LLM really and on top of that, the trivia knowledge is also not really that good in my tests. No better than Gemma or GLM-4. MAYBE a bit better than Qwen. What trivia did people test that it had better performance on? It didn't do better on the /lmg/ favorites like mesugaki at least, nor on my personal set.
>>105614993New paradigm. Eternal waiting room until then. Every possible LLM sucks. It's over.
More realistically I'd like to see more online learning experiments or papers. Like a live feedback thumbs up/down. Not to save local, but to keep my own curiosity alive even if the resulting implementations make the models retarded, slow, broken or anything. Something new to play with.
>>105615060
Miku's love
Can I generate porn and or 3d models on a 3090Ti?
>>105615090
>What trivia did people test that it had better performance on?
Even hilab admits the only thing it has better knowledge of is Chinese language trivia knowledge. For anything else it's beaten by fucking Qwen 2.5 72B.
>>105615106
No, you're retarded
>>105614993
openai's SOTA phone model will shift the pareto frontier of speed x safety
Trying to use a local model to make enhancements to tesseract OCR. Tesseract is fairly good without AI, but my ultimate goal is structured output of receipt data, so I can easily port it into hledger
What models would be best for this sort of thing? I've used ollama with some of the vision models and results haven't been great so far
some migus & friends
https://www.mediafire.com/folder/et1b18ntkdlac/vocaloid
>>105615335
Why would you need a vision model after you've OCRed the receipt to text?
>>105615360
>can't download the entire folder without a premium account
>>105615360
>mfw I've been saving them all manually
>>105615415
>>105615419
sorry, first filehost that came to mind. you can use jDownloader.
I know some other people download em so if you want to make up a more complete collection feel free
my collection, ironically, is probably more incomplete due to catastrophic data loss.
>>105615360
Thank you Migu genner
>>105615360
>all .jpg
i curse you!
Minimax was very obviously trained on the old R1. The thinking process is the same endlessly long plain text where the model tries to think about even the most trivial shit. It even sometimes deliberately gets things wrong at first just to be able to correct itself and think some more.
>llama.cpp
>warming up the model with an empty run - please wait ... (--no-warmup to disable)
can I just skip warmup for good?
>>105615535
"Warming up?" You don't know the meaning of those words, Bardin.
Do people use the quantized context with llama.cpp?
https://huggingface.co/bartowski/rednote-hilab_dots.llm1.inst-GGUF
bartgoatski quants are up. gogogo
>>105615587
Yeah, but I wouldn't call them "people"
>>105615595
does it need a llamacpp update?
>>105615595
Shit llm for copers that somehow still dont have even 128gb ram for sneedsex
>>105615360
>>105615430
Host a public FTP server, you coward.
>>105615430
>first filehost that came to mind
makes sense that a mikutroon's mind is retarded
I've been seeing stupid ads for this other half ai anime bullshit... is there anything to approximate it locally (plug in some llm to like an anime vroid model or something that has maybe limited voice rec)?
>>105615743
can you point at, precisely, what aspect of miku makes it at all relevant to trans
not the people coopting the design and changing it, the official design, as per crypton future media
you've been throwing around this trans/agp thing for literal months if not years at this point and yet you've failed to even once properly ground your point or lack thereof in any actual sense
nobody's looking at miku and thinking of sex change surgeries that's all you
nobod- why am I even bothering you're clearly off your meds
>>105612561
fair enough, but you probably dont want to haul a full blown desktop like in the good lan days every day. Still, if you have a better rec with desktop form factor Im happy to listen
>>105612568
living in a shoebox, cant use the same room all the time, its functionality is time multiplexed
>>105615788
Xir, this is a trans website. Nobody would be obsessed with this obsolete design if he wasn't a real woman.
>>105615788
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
whatever retard keep thinking of cock
>>105615767
>you're been throwing around this trans/agp thing for literal months if not years at this point and yet you
>only 1 person realized that mikuniggers are retards who just spam their dogshit mascot obsessively and almost never ever have a single based opinion imaginable that they post in the thread despite being in the thread every day and despite the many actual trannies and faggots that raid the threads but never got told off by a single mikunigger avatarposter, instead they ignore those people and keep posting their generic trash obsession waifu
yeah... i wonder why people dislike you
>>105615745
You don't have the IQ to run that
>>105615767
>nobod- why am I even bothering you're clearly off your meds
American brown tumblr tier writing btw
>>105615805
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
you could just answer
>>105615820
I concede you aren't a troon. Now continue not being a troon by no longer spamming that worthless avatar. And if you continue then... well you admit you are a disgusting troon.
>>105614932
vector databases and semantic search
>>105615842
to use your own phrasing and rhetoric
"moving goalposts, deflection, bad argument, etc. etc."
for someone who rants and raves endlessly about proper argumentation and logic you're proper fucking shit at it
never expect me to reply to your bs again
>>105615867
Well there you go, that is how we know you are a disgusting troon and you have AGP fantasies focusing on that retarded avatar you keep pushing on everyone. I recommend joining the 41%
>>105615360
https://multiup.io/download/f927ee16eeea9bf6db1576a0d0c1f536/xx.zip
single file
>>105615887
Thank you. Download finished in a couple seconds. Much better than fucking with jDownloader.
>>105615360
>>105615887
An artifact to be preserved
>load 0.6B model
>rig starts screeching like it's getting fistfucked by Satan himself
>load 8B model
>rigs handles it just fine
Okay I'm way over my head here, guys.
>>105615963
It likes its models small and open
>>105615963
First case it ran on CPU
Ive been running mythomax for years now, and I just upgraded to a 5080. What's the new meta for coom llms?
>>105616197
there is still nothing better than mythomax
>>105616261
is there a way I can fine-tune it on a bunch of specific fetish smut to make it better?
>>105615963
0.6b needs very little bandwidth, so compute usage goes up (and so do the fans). 8b needs more bandwidth, so it spends more time just waiting for memory to reach registers to compute, giving it time to chill.
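Back-of-envelope, since every active weight has to be read once per token: t/s ≈ memory bandwidth / weight bytes (numbers here are illustrative). An 8b at ~Q4 is roughly 4.5 GB of weights, so ~50 GB/s of desktop DDR4 caps it near 11 t/s with the cores mostly waiting; a 0.6b at ~Q4 is ~0.35 GB, so the same bandwidth allows ~140 t/s and the bottleneck (and the fan noise) shifts to compute.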
>>105616340
Yes you can finetune if you want (but not on a single 5080), but you really should just read a thread because this question gets asked every single god damn thread and if you look at the last thread you would be able to find at least 5 different recommendations for someone in your situation
>>105616370
This. When my CPU is doing prompt processing the fans go full blast, but once it starts generating tokens they calm down.
can someone generate neutral looking anime women pictures so i can use them for my blogposts?
>>105616511
i have rx 6600
>>105616512
Most image gen is python based. I don't know if it works with amd. Try stablediffusion.cpp. Should probably work with vulkan.
>>105615360
Wouldn't Pixiv links be fine?
justpaste (DOTit) GreedyNalaTests
Added:
dans-personalityengine-v1.3.0-24b
Cydonia-24B-v3e
Broken-Tutu-24B-Unslop-v2.0
Delta-Vector_Austral-24B-Winton
Magistral-Small-2506
medgemma-27b-text-it
Q3-30B-A3B-Designant
QwQ-32B-ArliAI-RpR-v4
TheDrummer_Agatha-111B-v1-IQ2_M
Qwen3-235B-A22B-Q5_K_M from community
Been preoccupied for a while but now I'm caught up. 235B was given a star rating, the others had no stars and no flags, they're just the same old really.
Looking for contributions:
Deepseek models
dots.llm1.inst
>From neutralized samplers, use temperature 0, top k 1, seed 1 (just in case). Copy the EXACT prompt sent to the backend, in addition to the output. And your backend used + pull datetime. Also a link to the quant used, or what settings you used to make your quant.
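For contributors, the request itself is trivial; against llama-server's native /completion endpoint it's roughly this (a sketch; port and prompt are placeholders, everything else left at neutral defaults):

import requests

payload = {
    "prompt": "<the EXACT prompt sent to the backend goes here>",
    "temperature": 0,  # greedy decoding
    "top_k": 1,        # belt and suspenders
    "seed": 1,         # just in case
    "n_predict": 300,
}
out = requests.post("http://localhost:8080/completion", json=payload).json()
print(out["content"])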
>>105616734
Long time no see
>>105616734
Could you test...
Cydonia-24B-v3i
Cydonia-24B-v3j
They're both v3.1 candidates.
I'm also curious about...
Cydonia-24B-v3f and Cydonia-24B-v3g but more for research purposes.
Big fan of your work!
>>105616768
Yee
>>105616800
I'll be honest, I don't feel like spending my kind of slow internet downloading all that. You could just copy and paste the prompts into mikupad and get the outputs yourself pretty easily. If you simply just want all the outputs archived in one place, I do take contributions and will add them if you give them (and of course I will read/rate them).
>>105616734
>235B was given a star rating
"235b is bad" bros how are we coping with being empirically proven wrong
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
https://arxiv.org/abs/2506.13284
>In this work, we investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models. We begin by curating the SFT training data through two scaling strategies: increasing the number of collected prompts and the number of generated responses per prompt. Both approaches yield notable improvements in reasoning performance, with scaling the number of prompts resulting in more substantial gains. We then explore the following questions regarding the synergy between SFT and RL: (i) Does a stronger SFT model consistently lead to better final performance after large-scale RL training? (ii) How can we determine an appropriate sampling temperature during RL training to effectively balance exploration and exploitation for a given SFT initialization? Our findings suggest that (i) holds true, provided effective RL training is conducted, particularly when the sampling temperature is carefully chosen to maintain the temperature-adjusted entropy around 0.3, a setting that strikes a good balance between exploration and exploitation. Notably, the performance gap between initial SFT models narrows significantly throughout the RL process. Leveraging a strong SFT foundation and insights into the synergistic interplay between SFT and RL, our AceReason-Nemotron-1.1 7B model significantly outperforms AceReason-Nemotron-1.0 and achieves new state-of-the-art performance among Qwen2.5-7B-based reasoning models on challenging math and code benchmarks, thereby demonstrating the effectiveness of our post-training recipe.
https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B
Isn't posted yet. pretty interesting
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
https://arxiv.org/abs/2506.13585
Not sure if they posted a paper when they released their model but the arxiv version is up now
The Amazon Nova Family of Models: Technical Report and Model Card
https://arxiv.org/abs/2506.12103
paper from amazon. doesn't seem like they're open sourcing anything so w/e
Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models
https://arxiv.org/abs/2506.13681
>Sampling from language models impacts the quality and diversity of outputs, affecting both research and real-world applications. Recently, Nguyen et al. 2024's "Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs" introduced a new sampler called min-p, claiming it achieves superior quality and diversity over established samplers such as basic, top-k, and top-p sampling. The significance of these claims was underscored by the paper's recognition as the 18th highest-scoring submission to ICLR 2025 and selection for an Oral presentation. This paper conducts a comprehensive re-examination of the evidence supporting min-p and reaches different conclusions from the original paper's four lines of evidence. First, the original paper's human evaluations omitted data, conducted statistical tests incorrectly, and described qualitative feedback inaccurately; our reanalysis demonstrates min-p did not outperform baselines in quality, diversity, or a trade-off between quality and diversity; in response to our findings, the authors of the original paper conducted a new human evaluation using a different implementation, task, and rubric that nevertheless provides further evidence min-p does not improve over baselines. Second, comprehensively sweeping the original paper's NLP benchmarks reveals min-p does not surpass baselines when controlling for the number of hyperparameters. Third, the original paper's LLM-as-a-Judge evaluations lack methodological clarity and appear inconsistently reported. Fourth, community adoption claims (49k GitHub repositories, 1.1M GitHub stars) were found to be unsubstantiated, leading to their removal; the revised adoption claim remains misleading. We conclude that evidence presented in the original paper fails to support claims that min-p improves quality, diversity, or a trade-off between quality and diversity.
RIP minp
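For anyone who never looked under the hood before mourning it: min-p is a two-line idea, keep only tokens at least min_p times as probable as the top token, then renormalize. A sketch with numpy (logits being the raw next-token scores):

import numpy as np

def min_p_sample(logits: np.ndarray, min_p: float = 0.05) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax
    keep = probs >= min_p * probs.max()       # threshold scales with model confidence
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()                      # renormalize survivors
    return int(np.random.choice(len(probs), p=probs))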
BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
https://arxiv.org/abs/2506.12040
>Binary quantization represents the most extreme form of large language model (LLM) compression, reducing weights to ±1 for maximal memory and computational efficiency. While recent sparsity-aware binarization methods achieve sub-1-bit compression by pruning redundant binary weights, they suffer from three critical challenges: performance deterioration, computational complexity from sparse mask management, and limited hardware compatibility. In this paper, we present BTC-LLM, a novel sub-1-bit LLM quantization framework that leverages adaptive weight transformation and binary pattern clustering to overcome these limitations, delivering both superior accuracy and efficiency. Our approach incorporates two key innovations: (1) a Learnable Transformation that optimizes invertible scaling and rotation matrices to align binarized weights with full-precision distributions, enabling incoherence processing to enhance layer-wise representation quality; (2) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters, compressing them into compact indices with tailored distance metrics and sign-based centroid updates. This eliminates the need for sparse masks, enabling efficient inference on standard hardware.
https://github.com/Chooovy/BTC-LLM
bpw below bitnet...
>>105617224
>After we showed these results to the authors, they informed us that we had run our experiments using the "Llama" formatting of GSM8K prompts as we used the command from the authors' public Colab notebook; the authors clarified that "Llama" formatting should be used only for Llama models. We then reran our experiments using standard formatting of GSM8K prompts. The results were nearly identical (Appendix B), with one small difference: min-p does produce higher scores for 2 of 12 language models. Again, we conclude min-p does not outperform other samplers on either formatting of GSM8K when controlling for hyperparameter volume.
Why would you want to publish your ignorance of chat templates and wasting of ~3000 Nvidia A100-hours of compute as a result? Instead of engendering confidence in your findings, this just makes you come across as petty and seething.
>>105616734
All those new 24b mistral slops and not a gem among them.
will they ever release the multimodal qwen 3
>>105617604
don't worry, qwen 4 will be omni and smart
so what's the general consensus here on doing RP with reasoning? It's shit, right? It kinda improves creativity but not by much
>>105617637
It depends. I've found that if you do a lot of complex math, logic, and programming in your RPs you'll notice a massive difference.
>>105617637
Prefilling the think block to have reasoning act like a director/storywriter seems to help. I've only tried it on the new R1 though.
Can I have a general mesugaki card sample?
>the purpose of benchmaxxing on math is to improve the quality of RPs where anon is a grade-school math teacher
>>105617637
It definitely makes it worse for magistral, it actually makes it less likely to follow the sys prompt
Can you redpill me on WizardLM and Miqu? They seem like quite large models, did anyone actually use them at larger quants?
Given how close we are to AGI, is it safe to say that Europe has zero chance of entering the running? What are the odds there's some dark horse AI lab that has been building in secret on the continent?
>>105618276
Yeah cause those are the current hot thing.
>>105618208
I see that you woke up from a year old coma. Qwen 3 235b replaces WizardLM directly, and if you got 128gb ram + 16/24gb vram, dynamic R1 replaces that: https://unsloth.ai/blog/deepseekr1-dynamic
Anyone know what multimodal model is used on the smash-or-pass AI site? Can abliterated Gemma-3 do this?
>>105618329
AGI will start with mistral-nemo 2
>>105618208
No, they didn't. People would usually try to fit as much as they could into a single 3090 because that's what everyone was using (and probably what most people still are using)
>>105618436
>probably what most people still are using
I doubt that
>>105618426
I think Gemini 2.5 Flash probably. If you click the websim button you can edit it.
>>105618208
they are more sovl compared to what we have now but they are also noticeably more retarded
>>105617637
Reasoning feels like the right place for the model to plan ahead and maintain state, but you'd have to keep at least 2 reasoning traces in context, which is different from how they've been trained (mostly single-turn math questions). And Gemma 3 works better with fake reasoning for RP than Magistral Small, which was natively trained on it, does.
Another problem of reasoning is that it dilutes attention to the instructions, so ideally you'd want to keep instructions high in the context, but again models aren't trained for it, so it often gives issues. Ironically, models not trained with system instructions in mind (just to follow whatever the user says) may work better for that.
On the other hand, I find that reasoning tends to decrease repetitive patterns in the actual responses. It's just that Mistral Small and by extension Magistral suck for these uses and they're only good for saying "fuck", "cock" and "pussy". If you're OK with just that...
>>105618490
>Gemini 2.5
Isn't it censored?
server cpu cucks will lose their time in the spotlight
MoEs will not scale
>>105618366
>Qwen 3 235b replaces WizardLM directly
Smaller Qwen 3's are censored quite badly, is this the same?
>>105618665
Idk your use case but in my experience, barring very few insanely slopped exceptions, no model is really censored given a good system prompt, especially 100b+ models.
>>105618654
>MoEs will not scale
Titans&co are mixture of attention experts.
>>105618481
I'm not running ~70b models but I'm still using a single 3090
A 5090 would be about 5x the price I paid for the 3090 and not much more useful
>>105618688
>moae
it doesn't even sound cool!
LLMbros... we got too cocky while image and videogenbros are eating good... I don't think anything short of actually multimodal R3 will save us...
>>105618863
pic unrelated?
What is the raison d'être for Q4_0 quants if everyone agrees that Q4_K is always better? I've heard this argument for years now
>>105614993
No one can make shit. All hopes ride on Sam
>>105619044
Nothing, it's a legacy format. iq4_xs is both smaller and better. q4_k_s/m are very slightly bigger and much better.
>>105619192
nta but k_s?
I've been grabbing k_m like a monkey all this time
>>105618329
>dark horse AI lab
Kek, you have no idea.
>>105619044
>>105619192
>>105619228
qat only works for q4_0 for models that have it (gemma)
has anyone tried any of the new ocr models such as MonkeyOCR and Nanonets-OCR-s?
looking to convert research papers in pdf to markdown or txt
docling is letting me down in accuracy and some other issues
>>105618329
>Given how close we are to AGI
We're not.
>>105619313
Have you tried simply extracting the text directly from the PDF?
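e.g. with pypdf, assuming the papers have a text layer at all (a sketch; won't help with scanned pages, and multi-column layouts often come out scrambled):

from pypdf import PdfReader

reader = PdfReader("paper.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text[:500])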
>>105619427
Of course, your job is safe don't worry. But hypothetically if we were... does Europe have a shot?
>>105619044
q4_0 is faster than q4_K_M due to the simpler data structure.
For development purposes in particular it's also the quant that I use because I don't care about maximizing quality/VRAM in that scenario but I do care about speed and simplicity to make my measurements easier.
I never use q4_0 outside of testing though.
>>105619265
qat is not nearly as good as google claims it to be.
>>105619443
AGI will not be achieved by scaling up our current architectures.
It requires a fundamental breakthrough which could come from anywhere, including Europe.
>>105619435
it didn't work that well with the multiple columns and formulas, but I'll give it another try, thanks.
>>105619485
>AGI will not be achieved by scaling up our current architectures.
Source: your ass
>breakthrough which could come from anywhere, including Europe
It could also come from a pajeet 5 year old. Will it? No. You actually need an ML industry and companies for that.
Gemma 3 seems obsessed with wrapping her legs around your waist, no matter her position.
>>105619590
They have absolutely nothing noteworthy since Miqu and Mixtral.
>>105619590
>no good model since 2407, 9 months ago
>rekt by r1 like everyone else, except they dont have nearly as much money to recover, and didn't
>>105619607
>9 months ago
jesus christ
https://huggingface.co/Menlo/Jan-nano
https://huggingface.co/Menlo/Jan-nano-gguf
>still no local llm with native image gen
>>105617143
It's only good at high quant.
>>105619635
too unsafe, please understand
>>105619566
For me she loves tangling her fingers in my hair and arching her back, no matter what the context.
I tried dots q4. It feels like a very smart 30b model that still makes a brainfart or two like a 30b. So it is pretty useless. It is like a moe grok1.
>>105612316
>>105611862
SWE-bench tests Python only. Perfect if you need something marginally better at writing a glue script I guess.
>>105619710
The sperm that made you had a prompt issue.
>>105619717
works for me, I'm having a blast
>>105619694
>It is like a moe grok1.
but grok1 is already a moe?
>>105619694
grok 1 was a moe
>>105619734
nuh uh im dense
long shot but if anyone else saved migus (or friends) down feel free to reupload. most of my edited data got wiped.
>>105616569
what
>>105619751
Stop with offtopic spamming. Nobody cares about your journey to become a woman.
>>105619751
>my edited data got wiped
Good!
why can't you faggots just get a miku thread going and migrate? you can spam there as much as you want
>>105619828
They don't like /a/ for some reason.
>>105619851
Stop spamming troon
my thinking boxes in st are inconsistent and sometimes the reply ends up inside both the thinking and the reply blocks
>>105615360
>>712856332
added some random loop animations
single file:
https://multiup.io/download/1f7bdf2cee24911d0cef25316887bba0/migu%20animations.zip
>>105619906
Nobody loves you and you should kill yourself
>>105619933
A few loony troons love him though
Mikugaki Anon makes the thread more comfy, muh troons posters are just annoying.
Masculine urge to post pictures of vocaloids.
>vocaloids
>masculine
The irony ironing
>>105619971
What causes a person to think about and see "troons" everywhere?
>>105619971
This is not your safespace.
>>105620008
It's not yours either, feel free to fuck off.
>>105620013
I am not the one shitting up the thread with anime ai generated slop though
I think there's a compromise to be made here - miku can be posted, but she must advocate for total troon death.
>>105620026
>go to ai thread
>see ai generated content
>get upset
zoomer logic everyone
>>105619951
i love xim xes my xusband
>>105619994
>masculine urge to fuck women
>women
>masculine
This is your logic.
>>105620045
No that works with /sdg/ or /ldg/ only.
>>105620103
You would still be sperging out even if they were made with anole or bagel.
>>105619630
Q4 might be fine to load in browser directly and replace google
>>105620045
Why would she go against her only fanbase?
>>105619662
Yeah, those too. I get that sex ultimately is repetitive, but it feels as if Gemma 3 only knows a couple ways of describing it non-explicitly. There are other areas too where it isn't very creative and after a while you'll notice it always uses the same patterns.
Today has been one of the worst blackpills for me regarding local models in a long time.
The strongest open source vision model is worse than o4-mini.
Try this experiment:
Upload this image https://www.lifewire.com/thmb/7GETJem9McVRDI8kbzLM6TfwED0=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/windows-11-screenshots-615d31976db445bb9f725b510becd850.png
With this prompt
You are an assistant tasked with finding a bounding box for the indicated element.
In your response you should return a string in the following format:
[BOUNDING_BOX x0 y0 x1 y1]
x0, y0, x1, y1 being decimal coordinates normalized from 0 to 1 starting at the top left corner of the screen.
The indicated element you are required to find is the following:
Start Menu button
The only model that gives a half right response is Gemma 3 27B. All the other open source models give wrong answers. ALL of the proprietary models give better answers than the best open source model.
Now you might think this is a weird thing to ask the model. But this is exactly the kind of task that's required for an assistant to control a computer and perform tasks autonomously.
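On the scoring side, mapping a reply in that format back to pixel coordinates is trivial (a sketch; width/height are the screenshot's dimensions):

import re

def parse_bbox(reply: str, width: int, height: int):
    # expects "[BOUNDING_BOX x0 y0 x1 y1]" with normalized 0..1 coordinates
    m = re.search(r"\[BOUNDING_BOX\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\]", reply)
    if not m:
        return None
    x0, y0, x1, y1 = map(float, m.groups())
    return (int(x0 * width), int(y0 * height), int(x1 * width), int(y1 * height))

print(parse_bbox("[BOUNDING_BOX 0.0 0.95 0.04 1.0]", 1500, 840))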
>>105620193
This is about as retarded as expecting them to count letters and do calculations.
>>105620193
Gemma 3 only uses a 400M-parameter vision model and with current implementations it encodes all images in 256 tokens at 896x896 resolution. It's a miracle it performs like it does. Imagine if it had at least twice the size.
>>105620203
Then what workflow do you suggest to let a model control a computer?
And if it's so retarded then how come the tiniest OpenAI or Gemini models can give a reasonable answer but Qwen 3 235B can't? Your response sounds like cope to me.
>>105620226
Yeah but the problem is that I haven't found any open source models that work. Llama 3.2 90B and the Qwen model I mentioned above give worse responses than Gemma.
The fact that Gemma 27B works given the tiny size tells me Google has used some of the same training data or methods they used for Gemini and that's why it kinda works.
>>105620227
>tiniest OpenAI or Gemini models
How do you know the number of parameters these closed models have?
For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Qwen won't make dense models larger than 30B anymore.
https://x.com/JustinLin610/status/1934809653004939705
> For dense models larger than 30B, it is a bit hard to optimize effectiveness and efficiency (either training or inference). We prefer to use MoE for large models.
https://www.youtube.com/watch?v=_0rftbXPJLI
>>105620326
Local is evolving
>>105620326
The future is <=32B Dense and >150B MoE. And I'm all for it, just buy RAM.
>>105620299
>How do you know the number of parameters these closed models have?
I don't, but what difference does it make? If they have better quality results because they have bigger models they still have better quality results.
>For that matter, how do you know that they aren't just secretly calling a tool for image segmentation and hiding it from you?
Again, does it matter? Do local users have any kind of image segmentation model that can detect GUI elements? No, at least none that I know of. I tried the most popular phrase grounding model (owlv2) and it basically knows nothing about GUI elements.
If you wanna have a go at it here's a list of models
https://huggingface.co/models?pipeline_tag=zero-shot-object-detection&sort=trending
>>105620407
but not for MoE!
>>105620407
Just buy fast RAM.
>>105620414
It is compared to a dense model you can fit in VRAM.
>>105620193
This is a trap, data-mining post by closed AI
Tell sam atman to kill himself.
>>105619630
>MCP Server?
>No Problem, set up SERPER API!
>Just place your SERPER API KEY IN JAN! 1000 Pages for only 0.30$!!
Are they fucking retarded? I can just use grok for free and it probably works better too. Soon probably gemini, according to rumors.
I want something local. Why don't they show me how to set that up with duckduckgo or brave or whatever.
>>105620434
Download a gguf model locally, load it up on your gpu... to call a $$ api for the results.
They didn't think that through. How stupid.
Asking for a friend, how do I go from no model to having a model which acts just like a girl i used to know?
>>105620434
>It's a chrome app
>2GB installed
>1GB updater
>All in my C: which has no more free space left
I'm so angry
>>105620495
Go to koboldcpp's github, find the wiki tab, and read the quickstart.
Then download mistral nemo instruct gguf from huggingface and silly tavern and make a card of that person.
>>105620419
If only you could easily buy more channels...
>>105620193
Bro you could train a tiny yolo model on whatever GUI elements you want and strap that to your LLM. Go be retarded elsewhere
>>105620648
DDR6 and CAMM2 are coming Soon™ to save the day
>>105620666
Here's hoping it comes alongside CPUs with even wider busses too.
>>105620650
We're talking about GUI elements here. Training an object detection model on specific GUI elements would take a lot of effort for little reward. If I were to take screenshots of all the elements I want to click on I could just do image similarity matching.
What I want to do is describe the element I want to match in natural language and have the model find it, without giving it any visual examples (other than the one it has to find in the image).
You think I'm retarded? Why? You think the idea is retarded?
You think Manus and Operator are retarded?
>>105620738
https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B
>>105619694
It felt worse than the current 30B models, though.
>>105620738
You want the bounding boxes of GUI elements, which is a segmentation task, and you want that from a fucking LLM. If that's not retarded, I don't know what is. The reward would be having a model for your use case and you'd stop bitching here. I'm training my own small models for my use cases all the time.
>>105620807
It's not really a segmentation task. Segmentation in the classic sense (like in the YOLO models you mentioned) means detecting pre-determined categories in the image. The term they use for detecting objects based on free form natural language is phrase grounding.
>which is a segmentation task and you want that from a fucking LLM. If that's not retarded, I don't know what is
I don't think using the tool that works better for the job is retarded. Do you? Or are you saying there's a tool that works better? If so, then I challenge you to show me a segmentation model that performs better at doing this task than ChatGPT or Gemini. Like I said above, I tried the most popular phrase grounding model and it doesn't know what a start menu is. When you ask it about GUI elements it will just randomly highlight random icons in the image.
>I'm training my own small models for my use cases all the time.
If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
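For reference, the phrase grounding baseline in question is only a few lines to try via transformers' zero-shot-object-detection pipeline (a sketch; model id and labels are just examples, and as said above, it knows nothing about GUI elements):

from PIL import Image
from transformers import pipeline

detector = pipeline("zero-shot-object-detection",
                    model="google/owlv2-base-patch16-ensemble")
image = Image.open("screenshot.png")
for r in detector(image, candidate_labels=["start menu button", "taskbar icon"]):
    print(r["label"], round(r["score"], 3), r["box"])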
>>105620495
>just like a girl i used to know?
Stop. Don't try to have LLMs simulate real people. Take your meds instead.
>>105620889
Multimodality is in its infancy and requires a lot more resources to pull off than simple LLM + tool calling, so why are you even surprised that only cloud models are able to do that?
The only solution now for local models is training a model for a specific task, which here is segmentation on GUI elements. Even if Google bothers to release a new model it won't be as good as their flagship, that much is a given.
>If model A is capable of doing diverse tasks just by prompting it and model B requires finetuning for each specific task, then model A is superior.
Model A is a cloud model and model B is running on my own computer, model B is superior by default.
>>105620944
>so why are you even surprised that only cloud models are able to do that?
Because I was under the impression that the largest open source models would BTFO the -mini and Flash commercial models in all tasks. I was hoping somebody would prove me wrong but it seems that the vision capabilities in local models are just worse.
>Model A is a cloud model and model B is running on my own computer, model B is superior by default.
That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
>>105621016
>That is the case when you are using them for NSFW stuff. For automating boring rote work, not so much.
For all work. There is no reason to ever give them free data, even if you've been successfully programmed to not care about them building a profile on you.
>>105621016
You can't automate that stuff without using your cloud model API and that's not free, it has nothing to do with nsfw
>>105621039
You mean free as in beer or free as in freedom? Because if I can spend a few dollars to save me an hour of work then I probably would.
>>105621036
It's not free if they're giving me something in return.
>>105621076
>free as in beer or free as in freedom
Both. You really want to give cloud models pics of your own computer? That's not even a question if you work for a company
>>105620226
do local inference backends not support some form of tiling to spread the image out across several tiles?
I've been using Gemini 2.5 for some video understanding tasks and I use ffmpeg to resize+pad the extracted frames so that they fit across two tiles. Works fairly well for a lot of tasks but the bounding boxes are kinda shit.
>>105620807
retard
New music model dropped:
https://github.com/tencent-ailab/SongGeneration
Someone please try this and report back.
>>105621260
>Do X, but it's shit for your use case
Retarded gorilla
>>105611492 (OP)
>install LM Studio on laptop with a 3060
>Not even 30% CUDA utilization with 100% offloading to vram
>Everything takes minutes to answer
Why might this be?
>>105621282
>SongGeneration-base(zh) v20250520
>SongGeneration-base(zh&en) Coming soon
>SongGeneration-full(zh&en) Coming soon
Chinese only for now
>>105621096
If George Hotz is not afraid of streaming his PC to the whole world, I don't see why I should be afraid of streaming my PC to Google.
As for the company I work for, nobody is monitoring me so closely that I'd get in trouble for leaking a few screenshots with bits of internal data here and there to a non-authorized API. I don't handle anything too sensitive.
>>105621313
It seems to support instrumental stuff at least.
I am curious what the generation speed for this is, whether it's dreadful like YuE or a bit fast like AceStep (which is shit but fast)
>>105621319
>Defending cloud cuckery this hard
You're in the wrong general bro
>>105621294
Get some reading comprehension you ape. Everything apart from the bounding boxes works well. I can get time stamped occurrences of company logos even when they're blurred/upside down, it completely btfos older methods. Even the inaccurate bounding boxes are still in the general location and useful for human evaluation. It's super convenient to feed the audio in too.
>>105621342
I'm just being realistic. Acknowledging deficiencies is the first step towards improving.
I have a friend who told me he has had success with this using an open source model but doesn't remember which one he used, he's gonna send me some info when he gets back from work.
>>105621260
Pan&Scan (tiling) isn't implemented in llama.cpp and it didn't work last time I tried it in vLLM where apparently it's implemented (with an optional flag to enable it). From what I've read, it needs some form of prompt engineering; it's not as simple as just sending image tiles.
>>105621405
thanks for the info. That explains why Gemma performed so poorly for manga translation when I tried it a while back, the image was probably resized to the point of the text being unreadable.
>>105621307
Low memory bandwidth can't saturate compute.