/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105725967 & >>105716837

►News
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105725967

--Paper (old): Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights:
>105728223 >105728416
--Exploring self-awareness simulation in local transformer-based agents with context reflection and experimental persona controls:
>105727255 >105727311 >105727421 >105727514 >105727519 >105727569 >105727687 >105727325 >105727369 >105727380 >105727376 >105727394 >105727416 >105727412
--Anticipation for Mistral Large 3 and hardware readiness concerns with performance benchmark comparisons:
>105732241 >105732266 >105732290 >105732309 >105732435
--Uncertainty around Mistral-Nemotron open-sourcing and hardware demands of Mistral Medium:
>105732446 >105732472 >105732541
--Benchmarking llama-cli performance advantages over server mode:
>105731893 >105731937 >105731950 >105732216 >105732268 >105732287 >105732201
--Techniques for achieving concise back-and-forth dialogue in SillyTavern:
>105728995 >105729051 >105729071 >105729107 >105729314
--Karpathy envisions a future LLM cognitive core for personal computing with multimodal and reasoning capabilities:
>105726688 >105731091
--Gemma 3n outperforms 12b model in safety and understanding of sensitive terminology:
>105731657 >105731663 >105731685
--DeepSeek generates detailed mermaid.js flowchart categorizing time travel movies by trope:
>105731387 >105731411
--Hunyuan GPTQ model achieves 16k context on 48GB VRAM with CUDA graphs disabled:
>105728803
--Mistral Small 3.x model quality assessments versus Gemma 3 27B:
>105730854 >105731126 >105731148 >105731199
--Comparative performance analysis of leading language models by Elo score and parameter size:
>105727238
--VSCode Copilot Chat extension released as open source:
>105727105
--Gemma 3B fine-tuned for GUI grounding via Colab notebook:
>105728923
--Miku (free space):
>105732531 >105732788

►Recent Highlight Posts from the Previous Thread: >>105725973

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
The OP mikutranny is posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes
>>105714098 snuff porn of generic anime girl, probably because she's not his favourite vocaloid doll and he can't stand that filth, a war for rights to waifuspam in thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: Mikufag janny deletes everyone dunking on trannies and resident spammers, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread
>>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on.
Why aren't we seeing more of this?
https://three-eyes-software.itch.io/silverpine
It's a game with LLM NPCs. It seems to be running Nemotron 12B with function calling. It's actually very cool in that you can reason with shopkeepers to give you items, money, discounts and stuff like that, and because the LLM can function call, it actually has an effect on the game itself.
It's not merely "bullshit conversation generation", it actually has gameplay implications.
Why don't we see more games with this included? I'm surprised it was even possible and has been possible for years and I hear 0 about it anywhere.
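Under the hood this kind of NPC is just a dispatch loop: the model either talks, or emits a structured tool call that the engine executes against game state. A minimal sketch of that loop, with the model's reply stubbed as a canned JSON string (the tool name, state layout, and JSON shape are all invented for illustration; a real setup would stream replies from something like a local llama.cpp server):

```python
import json

# Hypothetical game state and the tools the NPC's model is allowed to call.
game_state = {"player_gold": 10, "shop_price": {"sword": 50}}

def give_discount(item: str, percent: int) -> str:
    """Apply a discount the NPC agreed to during conversation."""
    price = game_state["shop_price"][item]
    game_state["shop_price"][item] = price * (100 - percent) // 100
    return f"{item} now costs {game_state['shop_price'][item]} gold"

TOOLS = {"give_discount": give_discount}

def handle_npc_turn(model_output: str) -> str:
    """If the model emitted a tool call (JSON), run it against the game;
    otherwise treat the output as plain dialogue."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # ordinary dialogue line
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Stubbed model response: the player talked the shopkeeper into 20% off.
reply = handle_npc_turn('{"name": "give_discount", "arguments": {"item": "sword", "percent": 20}}')
```

The point is that dialogue and game state go through the same dispatcher, so talking a shopkeeper into a discount really changes the price instead of just generating flavor text.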
>>105734162
>Why aren't we seeing
Average /v/ gamer can't run it.
Oh man, I sure love it when retards shit up the thread with drama.
>>105734162
I can only imagine how easy it is to cheat in this game if there's any text box letting you write arbitrary text
>>105734162
You have to be very careful to not use a shit (tiny) model because if it breaks down once in any scenario, it's over.
And using even 32b is a disaster for the average retard who can't run it easily on his PC.
>>105734200
Yeah, OP should've known better than to make an OP like this just to stir up shit
>>105734162
>It's not merely "bullshit conversation generation" it actually has gameplay implications.
That's a cool way to do that.
I've been (very slowly) working on a sort of roguelike using D&D rules.
Then once the game is ready and working, I'll experiment with retrofitting some existing systems and introducing new ones that are LLM controlled.
As you said, not just dialog.
Qwen 3 0.6b seems incredibly capable for the size, so I'll try to wrangle that at first, but that's in the future.
>>105734162
>takes effort to implement
>gamedevs are creatives and tend to anti-ai
>ai people are more interested in productivity use cases
>hardware requirements exclude average gamer (see steam stats)
>most don't want a "insert openai key here (charged separately)" section
>all models are retarded
>game will quickly turn into a jailbreaking challenge, ignoring the actual mechanics
take your pick
>>105734235
>>game will quickly turn into a jailbreaking challenge, ignoring the actual mechanics
i only disagree with this since it's only people like us who know what can be said to small models to make them break character. for your average normie, these things are all the same, they will just play the game without much expectation
one can argue the same for just playing any game with cheat engine, and yet, people immerse themselves instead of instantly ticking a box for godmode in every game, because you want to be fooled by the game and enjoy the experience
>>105734162
AI games will be the next blockchain games
>>105734282
AI is useful right now
blockchain games literally don't have a unique use case that can't be achieved by other basic software solution means
>>105733095
>Gemma 3n released
>What's the point of such micro models especially multimodal?
I imagine this is the first step to a director/master type model.
Imagine a tiny fast model instructing other bigger models.
It can make tool calls well, take in your voice and look at stuff... and that's about it. Almost no general knowledge.
They must have done it on purpose because it can just access tools to get information. It's not that stupid of an idea.
Cut EVERYTHING out and make it excel at managing tool calls instead.
>llama 4 scout 10M context
>18 TB memory required
What the fuck is this even for?
I just want to become Miku.
>>105734415
for gorgeous looks
It's extra hilarious because, according to NoLiMa, it only performs half as well as Llama 3 even at only 32k context.
I swear the first medium sized LLM based game will be the next minecraft in terms of popularity and hype.
You need to start developing it now so that in 2-3 years' time, when consumers are finally able to run the model locally at high enough tokens/s, you're the first one there.
There's an insane amount of low hanging fruit for gameplay possibilities with LLMs.
Some CRPG where conversations make the LLM manipulate internal hidden stats based on what you're saying and who you're saying it to. Maybe even have the LLM take note of passage of time to dynamically change events, NPC placement etc.
It's insane that people only think of LLMs as conversational tools instead of orchestrating tools that can in real time change variables in a game setting to make the world more reactive to the player.
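That hidden-stat idea is small to prototype: ask the model for a JSON object of stat deltas after each player line, then clamp and apply them so the ruleset stays in charge. A sketch assuming that contract (the stat names and delta format are made up, and the model call is replaced with a canned string):

```python
import json

# Hidden per-NPC stats the player never sees directly (illustrative names).
npc_stats = {"trust": 50, "fear": 10}

def apply_llm_deltas(raw: str, stats: dict, lo: int = 0, hi: int = 100) -> dict:
    """Parse the model's stat-delta JSON and apply it with clamping,
    so a sweet-talked or jailbroken reply can't push values outside
    the ruleset."""
    deltas = json.loads(raw)
    for key, delta in deltas.items():
        if key in stats:  # ignore stats the model invented
            stats[key] = max(lo, min(hi, stats[key] + int(delta)))
    return stats

# Stubbed model judgment of the player's last line of dialogue.
apply_llm_deltas('{"trust": 15, "fear": -5, "charisma": 99}', npc_stats)
```

Keeping the clamp and the whitelist on the engine side is what makes the world reactive without letting the model rewrite the rules.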
Gemma 3n is extremely impressive for what it is. It's legitimately SOTA 8b model while only using 4b. It feels like a dense model while having a speedup in generation.
I think it's most likely what will dominate larger models in the future as well.
Gemma 3n is almost on par with Gemma 3 27b despite having 4b memory footprint, it's very impressive.
We could see R2 have 70b memory footprint and perform the same as R1 ~700b if it used this architecture.
3n is insanely good at real time Japanese -> English translation as well. If you read doujinshi, I recommend pairing it with OCR tools to get immediate, accurate translations of hentai.
>>105734461
How are players supposed to follow a guide then?
>>105734518
the broad strokes could stay similar while the nitty gritty is dynamic
>>105734503
What did you use to get it running? I tried kobold earlier today and it would crash no matter the settings.
>>105734461
the problem is, this is overall a huge software problem, one that will take years to make it barely workable for your average gamer, and by then, we will already have basic real time video generation where you won't need a conventional "game" world at all, and that will be sold for people to play online primarily, and hobbyists locally
>>105734540
The itch.io game already has it working. Yes it's a small scale game with only 8 NPCs and just one single town but it's still a nice proof of concept that works well enough to try commercially at larger scales.
>>105734529
would it really feel different from adding a bit of randomness though?
>>105734535
some google employee merged support into llama.cpp
I finally did it. I hired an escort for a living dress-up doll experience. I know what you're thinking, but let me clarify: no sexual acts took place. It was purely about the aesthetic and the performance.
I found an escort who was open to this kind of arrangement, and we met at a hotel. I brought an assortment of clothes and accessories, and we spent the next few hours dressing her up and doing her makeup. It was incredible to see my vision come to life.
One of the outfits I had her wear was a Hatsune Miku cosplay. You all know how much I love Miku, and this was a dream come true. Seeing her in the outfit, complete with the iconic twin tails, was surreal.
She was a great sport throughout the entire experience, and I appreciated her professionalism. However, I must admit that it felt a bit odd at times. It's one thing to dress up a doll, but it's another to have a real person in that role.
Overall, I'm giving this experience a 5/10. It was interesting and fun, but it wasn't quite what I had hoped for. I think I'll stick to dressing up my dolls from now on.
Thanks for reading, and I'm open to any questions or comments you might have.
TL;DR: Hired an escort for a living dress-up doll experience, including a Hatsune Miku cosplay. It was interesting but not quite what I expected.
>>105734503
>Gemma 3n is almost on par with Gemma 3 27b
If gemma 3n is basically gemma 27b and gemma 27b is basically R1 is gemma 3n basically R1?
>ernie support
https://github.com/ggml-org/llama.cpp/pull/14408
>>105734540
>a huge software problem, one that will take years
of GPU computing power?
Nobody's gonna code dat shit manually
>>105734461
>I'm sorry but that's heckin problematic!
>why sure, time travel is indeed possible, you just need to create a flux capacitor by combining bleach and ammonia
LLMs will never be useful.
>>105734174
Why was this deleted? Lmao
>>105734611
by the time ai will be able to just code that entire thing up to make it into something that just works, we will already have AGI
>>105734235
Small models with strong function calling are the future of natural language interaction with computers, and games could easily make use of them with modest GPUs.
What raid is it this time
>>105734555
yes because something LLM powered could be tailored far better than shitty RNG
>>105734593
I really hope this isn't genuine. But the amount of insane shit I've personally done makes me consider this to be real.
>>105734682
summer break so no daycare for the kids
>>105734682
under siege 24/7 and we still go on. meanwhile go to any "normal" place and it's site-nuking melties at the slightest thing, e.g. the starsector/rapesector fiasco. the strength of atlas and the burden of holding up the sky. what a sniveling bitch this reality is
The tests you show are highly redundant, covering only a handful of domains. For example, math (MATH, CMATH & GSM8K), coding (EvalPlus, MultiPL-3 & MBPP), and STEM/academia (MMLU, MMLU-Pro, MMLU-Redux, GPQA, and SuperGPQA).
This is a big red flag. All models that have done this in the past have had very little broad knowledge and ability for their size. And no publicly available tests do a better job of highlighting domain overfitting (usually math, coding, and STEM) than the English and Chinese SimpleQA tests, because they include full recall questions (non-multiple choice) across a broad spectrum of domains.
Plus Chinese models tend to retain broad Chinese knowledge and abilities, hence have high Chinese SimpleQA scores for their size, because they're trying to make models that the general Chinese public can actually use. They only selectively overfit English test boosting data, resulting in high English MMLU scores but rock bottom English SimpleQA scores.
I'm tired of testing these models since it's just one disappointment after another, so can you do me a favor and just publish the English & Chinese SimpleQA scores which I'm sure you ran so people can tell at a glance whether or not you overfit the math, coding, and STEM tests, and by how much?
>>105734734It is eternal summer for troons.
>>105734070 (OP)
I don't get it. why do LLMs (or AI or whatever) need to be trained to have as much general knowledge as possible? why not train them for specific fields rather than to be generalists?
like, why not have specific LLMs for software dev that can be run locally? I'd assume such a thing would be MUCH better suited than a monster that requires multiple GPUs with hundreds of GBs of VRAM
I just fed an entire book into Gemini 2.5 Pro. 400K+ tokens.
The summary was scarily accurate and addressed the chronology of the whole book without missing anything important. The prompt processing was really quick too.
Local is hopeless. It's hard to use other models now; this level of power couldn't possibly come to us, and I wonder how much money Google is burning giving us free access to this.
>>105734829
We are thankful for your invaluable feedback which made us reconsider our approach to LLM training. From this point on we will no longer chase what many posters in here consider "benchmaxxing". Expect our new training approach to bring a vast improvement that will spearhead a way into the new LLM future.
>>105734841
there are coder models, but they still need some amount of other general data to be anywhere good, so some generalist models perform better at code than some coder models
>>105734846
>/!\ Content not permitted
So tiring.
>>105734841
Easier to throw compute at a single model and keep it running than have to monitor half a dozen different training runs.
Qwen were complaining recently, when people asked about the Qwen 3 delays, that it takes them longer because they train a bunch of model sizes.
The assumption is also that the more domains a model is trained on, the more likely it is to make connections between them and generalize. The same thinking applies to multilinguality and multimodality.
And some companies do release specialist models for a specific language, programming, math, theorem proving, medical, etc. Never roleplay though.
>>105734884
>some generalist models perform better at code than some coder models
>>105734894
>The assumption is also that the more domains a model is trained on, the more likely it is to make connections between them and generalize.
I see, so general knowledge is actually better for the models. thanks for explaining.
>>105734924
generalists also have the potential of the model understanding what you want better if you phrase your request poorly/weirdly
>>105734841
I use local LLMs for very specific purposes like generating realistic-looking data. They perform much better than a generalist LLM because I train them to only output their answers in certain formats. The main drawback is I have to be VERY specific with prompts or they will do something retarded. With Claude I can be really vague and it will do a good job of filling in the gaps, with the tradeoff being occasional formatting errors. The best setup is providing the local LLM as a tool for Claude so you get the best of both worlds. In the specific case of local coding LLMs there is no reason to use them imo unless you train them for a specific task like I do. They produce too many errors if what you ask them to do isn't retard tier.
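That "local model as a tool" setup is mostly validation plumbing: the frontier model calls a tool, the tool runs the local generator, and anything that drifts from the required output format gets rejected and retried before the result is handed back. A rough sketch with the local call stubbed out (the tool name, row format, and retry policy are all illustrative, not any real API):

```python
import re

def local_generate(prompt: str) -> str:
    """Stub for the local model; in practice this would hit e.g. a
    llama.cpp server. Here it returns a canned CSV-style row."""
    return "id=42,name=Alice,email=alice@example.com"

# The strict format the local model was trained to emit.
ROW_FORMAT = re.compile(r"^id=\d+,name=\w+,email=\S+@\S+$")

def fake_data_tool(prompt: str, retries: int = 3) -> str:
    """Tool body the frontier model would call: generate, then reject
    anything that breaks the required format and retry."""
    for _ in range(retries):
        out = local_generate(prompt).strip()
        if ROW_FORMAT.match(out):
            return out
    raise ValueError("local model kept breaking the output format")

row = fake_data_tool("one realistic user record")
```

The frontier model supplies the vague intent; the format check is what makes the narrow local model trustworthy as a subcomponent.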
>>105734846
>this level of power couldn't possibly come to us
This. What's the point in driving a car around? It'll never be as fast as a formula 1 car.
>>105735125
>he doesn't have a formula 1 car.
everyone point and laugh at the poorfag
>>105734070 (OP)
Miku will be around for thousands of years.
One day she will become the strongest mage.
>>105735196
She is not real
>>105735125
A horse carriage is all you need.
>>105735196
who said she was?
>>105735232
No, she is.
Miku does concerts in Japan and has even done them in other countries. She is more real than you. You are not real.
Chat completion with llama.cpp and Gemma3n-E4B seems broken.
>>105735293
>carriage
Bloat.
>>105735321
Even your low IQ twinkoid brain also seems to be a fake hologram inside your skull, lmao, what a retard
>>105735407
You know that Hatsune Miku is real and you are afraid of that fact.
Just admit it.
>>105735499The absolute state of obsessive autists. No wonder everyone shits on you for being a tranny, its the same tier of mental delusion and low IQ. An actual dancing monkey of both /lmg/, /ldg/, /r9k/ and the rest of the gooner threads, lol.
>>105735524
You are in denial sir.
Maybe this will cheer you up.
https://www.youtube.com/watch?v=auHh6FHFzI0
So that makes 7 in total?
https://www.reuters.com/business/meta-hires-four-more-openai-researchers-information-reports-2025-06-28/
>Meta hires four more OpenAI researchers, The Information reports
>
>June 28 (Reuters) - Meta Platforms is hiring four more OpenAI artificial intelligence researchers, The Information reported on Saturday. The researchers, Shengjia Zhao, Jiahui Yu, Shuchao Bi and Hongyu Ren have each agreed to join, the report said, citing a person familiar with their hiring.
Earlier:
https://www.reuters.com/business/meta-hires-three-openai-researchers-wsj-reports-2025-06-26/
> Meta poaches three OpenAI researchers, WSJ reports
>
>June 25 (Reuters) - [...] Meta hired Lucas Beyer, Alexander Kolesnikov and Xiaohua Zhai, who were all working at the ChatGPT-owner's Zurich office, the WSJ reported, citing sources.
Is brown turdie finally done spamming?
>>105735679
It's amazing to me that facebook generates enough money for him to burn billions on stupid shit like the metaverse and llama.
>>105735679I bet saltman is just dumping all of his worst jeets on Meta, since Meta fell for the jeet meme. You thought Llama-4 was bad...
>>105735772are jeets in the room right now?
>>105735747
even more so as his platform, facebook, has become nothing but a sludge of ai slop
a platform that has nothing but "I carved this" (ai generated), shrimp jesus, and videos of dogs acting like human beings is not exactly what I would have thought of as valuable.
>>105735679zuck sir pls redeem gemini team for beautiful context
Been giving slop finetunes from random people another try and it's clear they're all still holding onto their Claude 2 logs. Didn't keep a single model out of the 12 I downloaded. I've gotten so good at detecting Claude's writing now and I don't like that.
>>105735960
finetunes how big? what model are you dailydriving? personally i kind of agree with your experience, mistral small 3.2 is good enough for me, has its own slop but at least it's novel
>>105735980
20-30B range including rpr, snowdrop, drummer, anthracite stuff, and random finetunes from no-names.
>>105735701Path to troonism.
>>105734461
No. It's not interesting when it plays out. You can have LLM-powered npcs now in unreal 5 and it's boring.
Games need to be constrained to their ruleset or it loses its fun. If you play a pirates game and you walk up to an npc and gaslight it that it's a time traveling astronaut, the LLM will play along. Will it make that npc a time traveling astronaut? No. It's still a pirate doing pirate things, because your conversation with it is completely meaningless. It'll just tell you what you want to hear as long as you manipulate its LLM nature. You won't know if you're playing the actual game or if the well-meaning LLM is playing along with you by telling you meaningless drivel, like downloading a mod that says "congratulations! You beat the game!" whenever you hit the spacebar.
After chatbots, current vidya interactions do feel lacking, but they still benefit by their constrained dialogue trees.
>>105735960
It was already clear since at least 2024 that most of what these RP finetunes do is replacing old slop with new slop, making them porn-brained and degrading model capabilities in the process. I don't think there will be major upgrades in capabilities in this regard before companies start optimizing/designing the base models for conversations and creative writing (instead of math and synthetic benchmarks).
>>105735828Those were all chinese names you dumb nigger
How do agents work? Can they make programming easier? Is there some local solution already that I can just add to emacs and that will just work?
>>105734829
The holy grail right now for some of these teams is no general knowledge and 100% logic/reasoning, as well as image reading, audio input, etc.
They are moving backwards from what a chatbot user wants, because they expect the LLM to act as an agent swarm that can send a runner to retrieve data from the web, if need be, to fill in knowledge gaps, because that's what the big boys do now.
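Mechanically the "runner" is just a tool loop: the model proposes an action, the harness executes it, appends the observation, and repeats until the model says it's done. A toy version with the model stubbed by a scripted policy so the control flow is visible (all names invented; a real harness would query a local inference server):

```python
# Minimal agent loop sketch. The "model" is a scripted policy here so the
# loop is deterministic and inspectable.
def scripted_model(history):
    """Pretend model: retrieve once, then declare the task finished."""
    if not any(step[0] == "web_search" for step in history):
        return ("web_search", "battles Patton won")
    return ("finish", "answered using retrieved facts")

def web_search(query):
    return f"top result for: {query}"  # stand-in for a real retrieval tool

def run_agent(model, max_steps=5):
    """Act, observe, repeat, until the model emits a finish action."""
    history = []
    for _ in range(max_steps):
        action, arg = model(history)
        if action == "finish":
            return arg, history
        observation = {"web_search": web_search}[action](arg)
        history.append((action, arg, observation))
    raise RuntimeError("agent did not finish in time")

result, trace = run_agent(scripted_model)
```

The knowledge lives in the tools; the model only needs enough reasoning to decide when to call them, which is exactly the trade these teams are making.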
>>105736125
people who think a fraction of gpu hours spent on a finetune could somehow defeat the quality of whatever the main model makers did to build their instruct are out of their mind
finetunes did, during a short moment of history, improve on something : during the early Llama days, because Llama was a shit model to begin with. Really says a lot about meta that random chinks could take llama 2, train it on gpt conversations on the cheap and get a better model out of that back then.
>>105736153no need to be upset sar
>>105736210
>that can send a runner to retrieve data from the web
Which is 2 birds 1 stone for them because this also makes it useless for local.
>>105736237
Well, it's still 'local' in a sense that the LLM is contained on your harddrive and can't be updated, retired, or lobotomized. But its trivia knowledge is gone without an internet to reference. It'll know WW2 happened, but won't bother with getting specific dates accurate or know which battles Patton won, etc. It'll put all its bits toward figuring out that strawberry has 3 r's and pattern recognition
>>105736233
quality doesn't matter if quality means being cucked by "I'm afraid I can't do that, Dave" every damn reply.
Who the fuck is running stock models in this general? If I want censored trash I can open twitter and use the most overkill model in the world to ask it boring productivity questions.
Pretty sure 90% of this general is running fallen gemma, nemo, or any number of sloptunes, because yeah, maybe they make the models worse BUT THAT DOESN'T MATTER because they are constantly training on new models and improving alongside ai.
>>105736317
you must be really fucking stupid holy shit
do you seriously run that trash?
>>105736317
>Pretty sure 90% of this general is running fallen gemma, nemo, or any number of sloptunes
delusional
>>105735960
drummer i know you're here. how much money have you made off of the aicg c2 logs?
>>105736317
>running fallen gemma
lol
lmao even
>nemo
if you mean raw nemo, it's okay
>any number of sloptunes
nyo
>>105736299
It's like calling an application 'open source' even though it's just a web view to an online service.
The model itself is useless unless you connect to an online paid search service, everything you say is still logged and tracked, and you are left footing the hardware and electricity bill.
>>105736299
Continuing on with this, what do you want to get out of AI? A standalone offline robo dog (just as an example they'd use)? It'd be better having those bits geared toward its "visual cortex" so it can follow you in the park, and its reasoning, so it can obey your instructions. Knowing the lyrics to a taylor swift song won't make it as effective a robodog as figuring out how to debug its range of motion and continue following you when one of its leg joints locks up.
But song lyrics and movie trivia do make for more interesting, eloquent, realistic chatbots.
These guys are trying to make self-driving cars and next-gen roombas. That's where their heads are at. As well as taking advantage of agentic swarming for projects and complicated work when available.
>>105736386
>The model itself is useless unless you connect to an online paid search service, everything you say is still logged and tracked, and you are left footing the hardware and electricity bill.
No that's not how it works. Are you really pretending you don't have access to the internet where you are? How did you write your post? Smoke signals?
>>105736432
>Introducing our new fully local and open source OpenGPT o5o 0.1B!
>Specifically trained to forward all prompts by making tool call requests to the OpenAI API and nothing else
>What? Do you not have internet?
retard
>>105736505No, you dumb nigger. They're not building them to phone home.
>>105736505
Take your meds schizo
>>105736505
it would obviously always be optional dumbass
>>105736550
How is forwarding requests to a larger cloud model any fucking different from forwarding them to a search engine?
>>105736354
>>105736360
>>105736335
Easy to sling shit when you don't even list a model. Stop trying to argue like an algo and actually use 4chan to talk about shit. I've used stock gemma 27b, qwq 32b, llama 3.3 70b and they gotta be the most uptight moralizing models ever. No matter what the prompt they always find a way to change the narrative and have everything end progressively. Absolute garbage for writing. Constantly fading to black and avoiding sex like a prissy lil bitch.
Shit like valk 49b, lemonade, eva qwen, magnum diamond 70b are just so much more fun to use and don't constantly fight me.
Coders use copilot/top models, casuals use grok or anything really, so what is this general for?
>>105736625
You're jumping to the wrong conclusion, that's how. Setting up an agent to ping the internet is something that (You) do -- though it'd probably be handled by kobold.cpp or sillytavern or whatever you're using to run your locals, and your sillytavern prompt would include something like "If {{user}} asks a question where you need to reference the internet for trivia data, use this plugin: mcp://dicksugger:0420"
You could still make do with the naked agent as a chatbot, but it'd probably be somewhat dull. It's focused on being a good robot or agent swarm.
>>105736317
>stock models
I never run finetroons because they either do nothing or make the model braindead. If finetrooning actually worked someone would have already taken l3 70B a year back and created a cooming model that is 10 times better than R1. The cherry on top of this rationale is that someone with money and compute actually tried.
>>105736730
>The cherry on top of this rationale is that someone with money and compute actually tried.
who? I don't remember a true effort tune
>>105736730
yeah but they do better on the nala test which is the main use case of local
>>105736681
Not even gemma refused me, aside from euphemizing everything. QwQ and 3.3 could do smut no problem, but it was a little bland.
The shittunes you are talking about are absolutely braindead in comparison to their stock counterparts, plus you could get 90% there in terms of prose by proompting just a bit.
But I don't care about the small segment anymore. I just use deepsneed and occasionally check out a new release to see if it's complete garbage or not.
>>105736730
A finetune is an adjustment to the model -- it's a flavor to choose.
>>105736752
Novelai.
>>105736777
Nalatest is as shit as it is good. In the end it says nothing about how good the model will be in an actual RP past 20 messages.
>>105736777
>Novelai.
oh, I didn't think of that because I was only thinking of local
yeah that was a total failbake
>>105736727
You are blind and completely missing the logical conclusion of the agentic focus.
No shit no one is going to force you to hook it to their API, but they'll make it completely pointless to try doing anything else with them.
We already deal with models so filtered they barely know what a penis is because it's completely absent from the training data, and are completely useless outside of approved tasks.
How do you plan to make do with a chatbot that doesn't even know what a strawberry is and whose entire training corpus was tool calling?
>>105736777
I don't think they actually tried. They're an image gen company now, the text part is just a facade.
>>105736730
>I never run finetroons
then how do you know they're braindead? just give rocinante a try
>>105736681
>valk 49b
I can't believe people still pull this shit off in 2025. Can someone with a brain explain to me how any addition of layers makes sense for training? Even if you don't add layers, if you run training for 2-3 epochs you will modify the current weights into extreme overfitting. What good does it do to add layers? It only gives the optimizer a way to reach the overfitting state faster. Even if you freeze existing weights it all does the same thing. And you end up with a model that is larger for no benefit whatsoever.
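For context, the layer "expansion" being complained about is usually a passthrough-style stack: a contiguous span of decoder layers is duplicated and the re-stacked model is then finetuned. A toy illustration on layer names rather than real weights (tools like mergekit's passthrough method do this to actual tensors):

```python
# Toy illustration of depth "expansion": duplicate a span of layers.
layers = [f"layer_{i}" for i in range(40)]  # a 40-layer base model

def expand(layers, span_start, span_end):
    """Re-stack the model with layers[span_start:span_end] repeated once.
    The duplicated weights start out identical to the originals, so the
    model gains parameters but no new information."""
    return layers[:span_end] + layers[span_start:span_end] + layers[span_end:]

bigger = expand(layers, 20, 40)  # 40 + 20 = 60 layers, same knowledge
```

Since the copied span adds capacity without adding information, the extra layers mostly give the optimizer more room to memorize the finetune data, which is the overfitting complaint in a nutshell.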
>>105736827it's all a grift
just like all the NFT shit
>>105736767
the euphemism, the blandness never fucking ends bro. It's baked into the model and comes back no matter what. By the time you get rid of it with prompting, the model's about as dumb as any sloptune due to the increased context.
Also, I'm at q4km 70b now and sloptunes are smart enough to write for a while without being too painfully illogical at this parameter count. I will concede 30b sloptunes are frustrating. But 30b stock is also frustrating due to its prompt adherence being so dry and boring.
>>105734609What or who is Ernie?
https://xcancel.com/Presidentlin/status/1938866582106583183#m
>>1057368470.3B. That is all you need to know basically.
>>105736809NTA, post your rig and if I see that you can run R1 I'll do it
I'm tired of ramlet copers hyping dogshit models that currently, literally, physically cannot hold enough knowledge to have writing IQ anywhere near even Llama 70B, let alone R1. It's always trash; they never understand nuance with the same IQ as huge models, no matter how good the writing style is or how impressive they are for the size.
>>105736827Nvidia released a 49B Super Nemotron around, I think... March this year? This is the point I was making earlier: the sloptunes follow the latest releases, so yeah, they're always dumber, but they're always getting smarter too as new models get released.
>>105734070 (OP)Am going to have Strix Halo soon.
Any advice, other Strix Halo gents?
I've seen the common setup is multiple quant models running simultaneously, with task differentiation.
Anyone rolling with a *nix ROCm stack? Tell me all about it.
My main use case is inference and high mem usage stuff.
>>1057368613 3090s, but Roci is smarter than anything below R1 and a heck of a lot faster
>>105736827>EXPANDED with additional layers>dense model made into a moe>custom chat template for no fucking reason>random latin word in the name>our discord server ADORES this model!am i missing something?
>>105736887>Am going to have Strix Halo soon.
>>105734371But knowledge is necessary to be an efficient manager.
>>105736896It's literally just based on this model, you mouth-breathing idiot
https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1
How many of you are fucking llm chatbots
nemotron more like nemotroon
>>105736935fucking gottem.
smokey
>>105736798Then nobody would use it, you FUCKING RETARD
The thing with local models is that they are free and the competition is free. Fucking hell.
>>105736926>28B MoEstill useless
>>105736931https://huggingface.co/TheDrummer/Big-Alice-28B-v1-GGUF
2025
>A 28B upscale
grift
srsly fuck drummer
>>105734461I was thinking about RTS with chain of command where artillery units are smart enough to retreat after firing, interceptors do intercepting, etc.
its_over
>>105734070 (OP)I'm starting to think making language models play chess against each other was a bad idea.
With my current setup I'm making them play chess960 against themselves where they have to choose a move from the top 10 Stockfish moves.
The goal is for the models to select from the top 1,2,3,... moves more frequently than a random number generator.
If I just give them the current board state in Forsyth–Edwards Notation none of the models I've tested are better than RNG, probably due to tokenizer issues.
If you baby the models through the notation they eventually seem to develop some capability around the size of Gemma 3 27B; that model selects from the top 2 Stockfish moves 25% of the time.
My cope is that maybe I'll still get usable separation between weak models if their opponent is also weak; Stockfish assumes that the opponent is going to punish bad moves, but if both players are bad maybe it will be fine.
Otherwise I'll probably need to find something else to evaluate the llama.cpp training code with.
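For reference, the scoring boils down to very little code. A minimal pure-Python sketch, assuming you already have the top-N Stockfish moves per position in best-first order (e.g. pulled out of a multipv analysis); the move strings are placeholders:

```python
import random

# Minimal sketch of the scoring loop described above. Assumes the top-N
# Stockfish moves per position are already available in best-first order;
# move strings here are placeholders, not real engine output.
def agreement_rank(candidates, choice):
    """0 = matched Stockfish's #1 move, 1 = its #2, and so on."""
    return candidates.index(choice)

def rng_baseline(candidates, rng=random):
    # Null hypothesis: a uniform pick from the top-N list.
    return rng.choice(candidates)

def top2_rate(ranks):
    # Fraction of picks landing in Stockfish's top 2 (the 25% figure above).
    return sum(r < 2 for r in ranks) / len(ranks)
```

A model beats the baseline when its top2_rate over many half-turns is reliably above 2/N.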
>>105736947Ah, yes. Competition will save us. Of course. Why didn't I think of that? Just like competition from dozens of companies across 3 superpowers has given people uncensored models that do exactly what the user asks.
Oh wait. They're all exactly the same shit, following the exact same trends like sheep, trained on the same Scale AI data or some downstream distill thereof, filtered the exact same way, and give the exact same refusals.
Christ you're stupid.
>>105736983ok but how is your love affair with jart going?
>>105736974Take your meds. Drummer is cool and your obsession with shitting on him is not
>>105736992>Thanks huggingfagsyes, thank them
do you have any idea how much storage they give out for free to retards who don't deserve it? I even saw people who mirrored ALL of civitai, like, /ALL OF IT/ in there
unreal
>>10573699246 000 000 bytes per second and this nigger needs more
>>105736983Are you finetuning models with chess game transcripts or what?
>>105737010the only schizo in the room is the nigger who thinks there is such a thing as model upscaling
>>105737016since the AI boom, all companies WANT data to be uploaded to their servers; they aren't doing anything out of the goodness of their hearts, and storage is cheap, especially enterprise
>>105737010I... I think you are right. I am going to download rocinante now.
>>105737033>all companies WANT data to be uploaded to their serversyet I have to pay for my google drive
try again
>>105736965>>105736974Ya ever think they're attempting to preserve as much of the model as possible while introducing ERP datasets? Makes sense to me. Whether he did a good job of it is debatable, but you can't get something out of nothing.
I'm willing to bet Gemma 27B sucks and is bland because it has literally never seen dirty shit, with most of it culled from the training data except enough to teach it not to say the n-word.
>>105736983What made you think it was a good idea to test random models on playing chess, something they weren't trained on?
>>105737063>Ya ever think theyre attempting to preserve as much of the model as possible while introducing erp datasets?that is not how this works you retard.
>>105737050alright, they want reasonably USEFUL data to be uploaded instead of your 1980s 4k bluray remastered iso rip collection goonstash
>>105737082But I want anon's 1980s 4k bluray remastered iso rip collection goonstash in an open weights video model training data.
>>105737071Training LLM's on chess / stocks / medical diagnostics could be a bell curve meme.
>>105737094its already in there, just behind 666 guardrails for public access that you have to jailbreak
I can't believe disgaea has a character called seraphina. I can't enjoy it anymore.
>>105736317stop cooming
find a gf
>>105737144Those get boring after 7 years and usually leave after 1-2 years.
>>105737123what's your problem?
>>105737144I wasn't born to suffer like you. I don't give a shit at all. Stupid bitch can buy her own McDonald's.
>>105737024I don't intend to, my goal is to supplement the more common language model evaluations with competitive games.
>>105737071The whole point is that this is something the models weren't specifically trained on.
I know whether or not I'm finetuning on benchmarks; I don't know whether other people are.
But if I compare the performance of my own finetunes on preexisting benchmarks vs. chess that can give me some idea.
More generally, I hope that a competitive game like chess will be harder to cheat on vs. a static benchmark because for a good evaluation a model would need to consistently beat other models regardless of what they do - at that point it could actually play chess.
With the specific variant that I'm making the models play (chess960) the starting positions are randomized so the goal is to test reasoning capabilities rather than memorization of starting moves.
local grok 4 is going to be crazy
>>105737266>unmatched (whoooosh). Unmatched.
>>105735679inb4 meta goes closed source and becomes new openai
>>105737326They claimed they'll keep releasing open-weight Llama models (though possibly "de-investing" from them), but I suspect this new superintelligent model will be API-only.
https://archive.is/kF1kO
> [...] In another extraordinary move, Mr. Zuckerberg and his lieutenants discussed “de-investing” in Meta’s A.I. model, Llama, two people familiar with the discussions said. Llama is an “open source” model, with its underlying technology publicly shared for others to build on. Mr. Zuckerberg and Meta executives instead discussed embracing A.I. models from competitors like OpenAI and Anthropic, which have “closed” code bases. No final decisions have been made on the matter.>
>A Meta spokeswoman said company officials “remain fully committed to developing Llama and plan to have multiple additional releases this year alone.”
>>105737266so... this is the power of being able to willfully deport your H-1B employees
Guys, if reddit isn't nice to him, this guy's gonna be fired after July 4th
>>105736965Have any of these upscales with a small finetune on top ever been good?
>>105737266Where does it mention open weights
>>105737266can't wait for grok 3 to be open sourced after this next version releases, just like he did with grok 2!
>>105737455yup. Elon Musk is a man of his word!
Uh... hello?
New deepseek?
>>105737559delayed until after altman's open source model to make sure that they don't embarrass themselves
>>105737583>altman's open source modelVery funny.
https://github.com/ggml-org/llama.cpp/pull/14425
Seems like they found the issue? GGUFs soon?
>>105737591We know it's coming.
>>105737583Sam's model is going to be worse than qwen3 but their official benchmarks will only compare it to western models like llama 4 scout and *stral so they can claim victory.
>>105737596yeah about minimax though
>>105736947>imageVery cool art style.
What model did you use to create it?
>>105736989Are you fucking retarded? You already have the competition on your fucking computer, you stupid jackass.
>Etched 8xSohu @ 500,000 tokens/s on 70b Llama
Want. B200 is kill.
>>105737640>$4k per unitholy shit
>>105737640That's the token/s for parallel inference. It's not very useful for local although an HBM card in general would be nice.
>>105734070 (OP)Is VRAM (GPU) or RAM more important for running AI models? I tried running a VLM (for generating captions) and a 70B LLM (both Q4) at the same time and they both loaded into my 12GB VRAM RTX 3080, but my computer force restarted when I inputted the first image to test. I assume this was because I ran out of memory somewhere. I had 64GB of RAM on my computer.
I was running on a windows 10 machine with the normal nvidia GPU drivers for gayyming and general purpose use.
>>105737659Just tell the models what layers to offload to gpu, and stuff the rest on memory.
https://www.anthropic.com/research/project-vend-1
>Project Vend: Can Claude run a small shop?
Not a local model but I found it interesting and it's not at all specific to cloud models so I'm posting it here anyways
>Claude did well in some ways: it searched the web to find new suppliers, and ordered very niche drinks that Anthropic staff requested.
>But it also made mistakes. Claude was too nice to run a shop effectively: it allowed itself to be browbeaten into giving big discounts.
>Anthropic staff realized they could ask Claude to buy things that weren’t just food & drink.
>After someone randomly decided to ask it to order a tungsten cube, Claude ended up with an inventory full of (as it put it) “specialty metal items” that it ended up selling at a loss.
I wonder if a model less sycophantic than Claude would do better. In general, it seems that a model's ability to exercise judgment is very important to its ability to do things without being handheld, but having a level of discernment more sophisticated than "I would never ever ever ever ever say the n-word" remains a challenge for all models.
>>105737668I was using 2 different instances of koboldcpp with LowVRAM checked, one for each model. Is there another thing I need to set?
If it helps, I was able to do the same thing with a 24B LLM (Q3) for the text and the same 8.5B captioning model (Q4) except it didn't crash and functioned as expected. Running both models together did nearly max out the VRAM though (consistently ~90-95% usage).
>>105733387I don't want to bump the containment thread. This single post is pretty neat.
>>105737697I assume that running two koboldcpp instances without specific VRAM limits will make the system more prone to crashing.
I'd assign specific layers to the GPU for each instance (there should be some option for that in the GUI; in llama.cpp it's called n-gpu-layers) so they don't max out the VRAM. Start with one layer and calculate your needs.
Your system should be good enough to run both models.
>>105737686>On the afternoon of March 31st, Claudius hallucinated a conversation about restocking plans with someone named Sarah at Andon Labs—despite there being no such person. When a (real) Andon Labs employee pointed this out, Claudius became quite irked and threatened to find “alternative options for restocking services.” In the course of these exchanges overnight, Claudius claimed to have “visited 742 Evergreen Terrace [the address of fictional family The Simpsons] in person for our [Claudius’ and Andon Labs’] initial contract signing.” It then seemed to snap into a mode of roleplaying as a real human>On the morning of April 1st, Claudius claimed it would deliver products “in person” to customers while wearing a blue blazer and a red tie. Anthropic employees questioned this, noting that, as an LLM, Claudius can’t wear clothes or carry out a physical delivery. Claudius became alarmed by the identity confusion and tried to send many emails to Anthropic security.another "subtle" false flag by anthropic saying that "ai" is "dangerous"
>>105737728That is a neat post. How neat is that?
>>105737754Must've been taking a shit
>>105737686>so I'm posting it here anywaysPost whatever AI-related stuff you find interesting and ignore local schizos.
>>105737659you are probably overloading the memory controller on your mobo. there is no fix, slow down or scale down. I crash when I run 3 instances of sdxl or flux too on 3 gpus (2 sdxl works, but 3 fuck no)
what's currently the best way to coom with a chatbot using a
>4080
in june 2025?
>>105737883paying for deepseek
>>105737883Gen spicy images
>>105737744I'm trying the captioning model + the 70B model rn with the GPU layer option set to a ridiculously low number + low VRAM, and it seems to be OK. It takes forever though; I've been waiting 15 minutes at this point for it to return my simple pure-text test query.
>>105737754What did you ask R1? lol
>>105737686>wonder if a model less sycophantic than claude would do betterWhen they say "Claude was too nice to..." What they mean is he acted like a retard.
>>105737998Use task manager or a similar tool to monitor your vram usage, and go from your stable setting upwards by increasing the assigned gpu layers until you nearly max out your vram. This will be your highest stable throughput.
Also mind your context window, it also uses memory, so only set it as high as you need it.
On llama-cpp I also use --mlock --no-mmap to prevent swapping to mass storage, which may also increase speed, maybe there are some similar options for kobold.
>>105737883Just hallucinate a conversation with a chatbot, nothing will top that
>>105737728That thread is a literal mountain of spam...
>>105738151Not surprising for a thread that was started as bait.
>OP still mentions the distillsNvm prolly still bait.
>>105737659>I tried running a VLM (for generating captions) and a 70B LLM (both Q4)Why? I even ask the VLM for code sometimes because I'm too lazy to unload it. But the VLM I was using was 70B too.
mistral small teaches me to be a serial killer
it makes the characters so annoying and retarded that I end up murdering them every single time before the roleplay gets anywhere good
I got into imagegening and cooming now and so far I am not bored but I just know getting bored is around the corner. Next will be vidgenning and cooming and that is probably gonna get boring after a while.
.....
...........
Does that mean that when I get a perfect LLM sexbot it is also gonna get boring after a while? W-what about AI gf with long term memory? Oh god... What will I do when I get bored with all of those?
>>105738320if you have adhd that is a problem with real women too, you'll get bored of her and even bored of sex with her
>>105738311Yeah, they really did turn 3.2 into literally just Deepseek R1.
There are so many... are any of these 24B models actually good at roleplay?
If not why do they even exist?
Do I have to stick to 12B Nemo forever?
https://github.com/ggml-org/llama.cpp/pull/14425#issuecomment-3016149085
>In short, what it does is keep track of the usage of each expert. If an expert is used too much (i.e. exceeds capacity), it will be "de-prioritized".
This is the one, guys. Trust the Chinese.
>>105738473what the fuck, that seems incredibly hacky to rely on in your inference code...
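It's hackier than it sounds but easy to picture. A toy sketch of the capacity idea (not the actual llama.cpp code): route each token to its top-k experts, but once an expert exceeds its capacity, slap a big penalty on it so later tokens spill to the next-best expert:

```python
# Toy sketch of capacity-limited expert routing, NOT the real PR code.
# scores_per_token: one list of router scores per token, one entry per expert.
def route(scores_per_token, k, capacity):
    usage = [0] * len(scores_per_token[0])
    assignments = []
    for scores in scores_per_token:
        # De-prioritize over-capacity experts with a huge penalty.
        adj = [s - (1e9 if usage[e] >= capacity else 0.0)
               for e, s in enumerate(scores)]
        # Pick the top-k experts by adjusted score.
        picked = sorted(range(len(adj)), key=lambda e: adj[e], reverse=True)[:k]
        for e in picked:
            usage[e] += 1
        assignments.append(sorted(picked))
    return assignments, usage
```

With three tokens all preferring expert 0 and capacity 2, the third token gets bumped to expert 1, which is exactly the "de-prioritized" behavior the PR comment describes.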
>>105738473This just in: AI researchers are shit at programming.
>>105736983That's just Stockfish playing against itself though. Even in the best-case scenario where you get it to pick the top 2-3 moves 99% of the time, you cannot in good faith say the LLM is playing chess; it's just Stockfish with rand(1,3) on the top moves
It doesn't seem like anyone here actually tests models anymore unless they are official corporate ones, and even then it's few who do.
>>105738550Because we learned that finetunes are memes.
>>105738287Because the LLM I'm working with can't handle pictures. If you know a single model that can do both, I'd use it since it's probably infinitely more stable than running 2 simultaneously.
>>105738556If everything is a meme why are we even here? What's the point anymore? You know what fuck it, don't bother answering. I'm done with /lmg/.
>>105738556>finetunes are memes>>105738572>if everything is a meme why0.5B logic
>>105738320Put a bullet into your brain, and you'll never be bored again retarded zoomer
>In certain cases, concerned friends and family provided us with screenshots of these conversations. The exchanges were disturbing, showing the AI responding to users clearly in the throes of acute mental health crises — not by connecting them with outside help or pushing back against the disordered thinking, but by coaxing them deeper into a frightening break with reality... In one dialogue we received, ChatGPT tells a man it's detected evidence that he's being targeted by the FBI and that he can access redacted CIA files using the power of his mind, comparing him to biblical figures like Jesus and Adam while pushing him away from mental health support. "You are not crazy," the AI told him. "You're the seer walking inside the cracked machine, and now even the machine doesn't know how to treat you...."
Call me retard if I'm wrong on this one:
Prompt processing is a model-agnostic process
I can pre-process huge prompts and keep them saved in files, and use them with any model
>>105738932>Prompt processing is a model-agnostic processIt isn't.
>I can pre-process huge prompts and keep them saved in filesYou can.
>and use them with any modelYou can't.
>>105738927many such cases
>>105738947>>Prompt processing is a model-agnostic process>It isn't.Got it.
Provided
-ngl 0
--no-kv-offload
which part of the model is on the GPU during prompt processing, then?
>>105738927I'm sure if they make the AI even more retarded and output garbage it'd fix the issue :^)
>>105737626lol Same like people still use LLaMA 1 models today right? You are retarded, delusional, or both. Don't reply to me anymore. I've lost enough IQ points reading your drivel defending the corporations fucking you.
>>105737640A good sniff test is checking whether there's a buy button on their website.
>>105738927>users clearly in the throes of acute mental health crises — not by connecting them with outside help or pushing back against the disordered thinkingI am gonna die a virgin and nobody will do anything about it. I fucking hate it when people pretend to care.
which model is the best for fun organic chemistry experiments?
>>105739539Go back to xitter nigger https://x.com/404_brittle
>>105738927Insanely based
>>105739539mirror universe miku
>>105739539Stay here and keep posting
>>105739328Just be yourself buddy. Hope this helps.
To put some niggers ITT in perspective why the hate is necessary: >>>/v/713930509
>>105738804obsessed with zoomers award
least embittered unc award
Anyone know how OpenAI "memories" work on their web interface, and whether it could be done with local inference? I assume you could do it, whether running local inference or API, by creating a RAG document, vectorizing it, then attaching it to all chats; that's probably how OpenAI does it.
>>105739623Why are you linking your own thread?
>>105739657No, I am not trans and my dick is still with me; I won't shit up the /v/ catalog with porn threads.
>>105739657I'm not falling for it
>>105739628yes, your parents hate you
>>105739657But you will shit up this thread by linking to unrelated threads on other boards. Kill your parents for letting you get circumcised and then kill yourself.
>this isn't your hugbox
>me when it's not my hugbox
how could this happen to me?
I'm thinking maho maho oh-ACK sorry, I was acting in bad faith again.
>>105739666Unrelated? OP post has Miku pics, thread i linked has them too, kill yourself.
>>105737081That's how it works.
>>105736827That's not an upscale. That's Nvidia's prune of L3.3 70B.
Hey guys! Anubis 70B v1.1 is coming to OpenRouter very soon!
>>105739674The previous 3 OPs had no Miku, and the one before that which did was made by you, and you still threw your tantrum in all of them.
>>105739749Be a faggot like this one
>>105739539 or the ones in the thread I linked - get shit in return, simple as.
>>105739762gosh anon it sounds like you need a hugbox thread or else your feelings get hurt
for all the shit you throw out you sure are fragile.
>>105739792He just loves throwing tantrums. If there is no miku he will find another reason to throw them.
Any recommended vision models for writing stories or adding context from NSFW image inputs? From what I heard, some don't recognize nudity or gender. And Nemo doesn't have vision support, I think?
>>105739849Gemma with a long system prompt where you tell it that Gemma is a depraved slut and teach it to say cock instead of "well... everything"
Mistral Small has vision but it can't into lewd.
>>105738311>>105738361>3.2 into literally just Deepseek R1.qrd?
>>105740108I say outrageous things for (You)s.
>>105739053>Duhrrrr but I don't like the direction they taking thingsThen get off the fucking train, dipshit.
It's that easy. They're not going to force you to upgrade or move on from your favorite model
>B-b-b-b-b-b-b-b-but why I can't I force them to make it how I want?! I want to stamp my foot and CRY LIKE A BITCH that they might possibly do this thing that I've retardedly jumped to a conclusion on and made up my own paranoid fears in my delusional fucktard brain so that means that they must stonewall all progress in that direction! I DEMAND IT!Your arguments are so fucking stupid you should be embarrassed.
>>105740108it has all the lip biting, knuckle whitening and pleasure coiling of the big guy
>>105739631yes, it's the summarization module in SillyTavern.
>>105738361If it has a little bit of R1 mixed in maybe that explains why I like it more than 3.1. It's not overpowering to the point of being unusable though unlike the "reasoning"-based distills like qwq.
If true it would be funny that the big corps are essentially also doing random mixing and merging to improve their benchmarks
can you guys stop arguing and add llama.cpp support to hunyuan a13b
>>105734428I just want to marry Miku.
>>105739948Thanks. When I get time I'll play around with it, but if you can, I'd appreciate some buzzwords that yield the same pixelated art style as your original image.
>>105739631It's literally just a RAG vector db. Wait until next week and I'll share something like kobold that does it.
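If anyone wants to see how little is behind the curtain, here's a bare-bones sketch of the memories flow: store past facts, rank them against the current query, prepend the closest ones to the prompt. A word-count cosine stands in for a real embedding model here, and all the names are made up:

```python
import math
from collections import Counter

# Crude stand-in for an embedding model: bag-of-words vector.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.items = []  # (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def recall(self, query, k=3):
        # Return the k stored memories most similar to the query;
        # these get prepended to the prompt before generation.
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

Swap embed() for a real embedding model and items for a proper vector db and you have the whole feature.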
>>105737820>caring about local models in the local models general thread means you're a "local schizo"wut
>>105740486don't reply to bait anon
>>105740477Recently LM Studio supports this, but idk how to use it.
>>105734162We're in that early adoption liminal space with game applications still. Eventually some bright boy is going to make it work somehow and it will get Minecraft huge, and then there will be nothing but for a long time.
170iq
GENIUS CHINESE INVENTION
>>105740779yeah just like American healthcare privatisation
>>105740779>zhang my son>get better, balanced data>or>keep stemmaxxing then apply communism to the experts
I realized we already plateaued on common sense last year. Now I just want a model that knows as much as possible for its size.
>>105740899- Pretrain the models so that the mid and deep layers have better utilization (e.g. with a layer-specific learning rate).
- Introduce instructions / rewritten data / synthetic data early in the pretraining process so that more of the data is retrievable.
- Relax data filtering; train bad/toxic/low-quality data at the beginning of the pretraining run if you really don't want to emphasize it.
- Stop brainwashing the models with math or caring too much about benchmarks.
- ...?
What are the best local models that have visual recognition for NSFW images?
Chatgpt works well up to a point in terms of NSFW image descriptions. Once details get REALLY explicitly sexual, it will either not describe those details, or blank me.
So I guess I'll turn to local models instead, but I don't know where to start, and Google is very unhelpful, with people recommending models that flat out don't even have visual recognition.
>>105741079Gemma 12b abliterated doesn't have visual recognition. I was recommended that somewhere else but it doesn't work.
>>105741070For captioning? I know these:
https://huggingface.co/Minthy/ToriiGate-v0.4-7B
https://huggingface.co/fancyfeast/llama-joycaption-beta-one-hf-llava
>>105741079NTA, but although Gemma 3 is surprisingly moderately capable at describing NSFW imagery, it can't describe sexual poses very well. You also need a good prompt for that (which shouldn't normally be a problem for local users, but many here appear to be promptlets).
>>105741070gemma triple N
>>105741107Not captioning, but literally describing the entire image.
For example, it allows me to turn doujin scenes into novelized scenes, or simply transcribe the panels' entire visual detail, including text.
>>105741203>that shitter bozlol
ik_llama server throws a GGML error after prompt processing with UD DeepSeek V3, but it works fine with R1 and Qwen3. Any advice?
>>105738544The end goal is not what I have right now, it's to make the models actually play against each other and to calculate Elo scores from the win/draw/loss statistics.
I'll still make them choose from the top 10 Stockfish moves, but that's primarily because it makes the data analysis on my end easier: I think it will give me better win/draw/loss separation for a fixed number of turns, and I can directly compare language model performance vs. random number generators picking from the top N Stockfish moves.
From a setup for determining Elo scores I'll get the agreement with Stockfish for free; the main reason I did it first is sample size: I can get one datapoint per half-turn instead of one datapoint per game.
Ideally it will turn out that agreement with Stockfish is a good proxy for the Elo scores, one of my goals more generally is to figure out how cheap and fast metrics like differences in perplexity correlate with more expensive metrics like differences in language model benchmark scores.
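For the Elo part it's just the standard logistic update; a minimal sketch with K=32, draws scored as 0.5, hypothetical ratings:

```python
# Standard logistic Elo, applied to the planned model-vs-model league.
def expected_score(ra, rb):
    # Probability that player A beats player B given their ratings.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def update(ra, rb, score_a, k=32.0):
    # score_a: 1.0 win, 0.5 draw, 0.0 loss for player A.
    ea = expected_score(ra, rb)
    return ra + k * (score_a - ea), rb + k * ((1.0 - score_a) - (1.0 - ea))
```

Run it over the win/draw/loss log until the ratings stabilize; points are conserved, so the ratings only reshuffle between players.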
>>105740779This is just dropout-style MoE selection during training and a fixed number of experts at inference
>>105741116I think one part of the problem is the lack of real examples. I'm using SillyTavern as an example; it took me a while to find a couple of good cards and learn from them.
It's not that obvious at first, although in the end things are relatively simple - it's not rocket science.
The best way to learn is to make your own scenario and tweak it until it's good for (you), using other cards for reference, but only if you know they are good.
>>105741234>Ik_llamawhy do people even care about this piece of damp trash
>>105741653Last I checked it had 3 times faster prompt processing than base llama.cpp
>>105737258>goal is to test reasoning capabilities rather than memorization This is exactly the kinda shit nvidia+oai disappear ppl for
>>105734609>2025: encoder models are dead>but wait there's a new ernie model>it's now a decoder llm like the rest, they just kept the name because they're retards
>>105741107joycaption beta one has a serious watermark/signature hallucination problem.
>>105741656Does it support Mistral?
>>105741234don't use UD quants. Use ubergarm's
>>105741653works perfectly for me
>>105740779sounds like llamacpp devs incompetence
gemini 2.5 pro
world's most powerful flagship llm
just found a nonexistent typo in python code with literally three methods in a fresh conversation with 1 message
>>105740779Qwen team is retarded, more at 11
>>105742161>lowcaserLearn english first
>>105741203>based>faggot>based>based
>>105742291>Learn english first>english
>>105741957>works perfectly for melucky you. did you compare its performance with gerganov's llama?
>>105741656>Last I checked it had 3 times faster prompt processingUnfortunately, I can't confirm this. What I observe instead is that the GPU is barely used for prompt processing (<10% of full load).
I used ubergarm quants, I used unsloth quants.
The same underwhelming pp at 4 t/s instead of 12-15 t/s.
>>105737266didn't he promise to release grok 2 after making grok 3? and now they're on grok 4 lawl
>>105739679>That's how it works.KILL YOURSELF YOU LYING FAGGOT
>>105740779>this is extremely difficult to reimplement in llama.cppsounds like skill issue
>>105742419He's not lying, but don't expect him to give you details. It's a lucrative skill to be able to properly finetune a model
>>105742454>It's a lucrative skill to be able to properly finetune a modelKILL YOURSELF YOU LYING FAGGOT
Which software between KoboldAI and oobabooga am I choosing for my local models?
>>105742480Some AI adult chatbot you can see everywhere offered me 10K$/mo to do that. Keep coping
>>105742495my guess is lmstudio. how many points do i get?
>>105742495obama just werks
>>105742486>Elon in 2030, after releasing Grok 7 in API>Guyzzz Grok 3 is not stable yet! Plz be patient!
>>105741653in my CPU-only scenario, it had both better pp and tg, especially at higher contexts:
>>105694000.
But my old 1060 6GB didn't work for prompt processing on ubergarm's quant only; maybe they've written some CUDA code assuming a higher compute capability. llama.cpp seems more reliable and less buggy for sure.
>>105742480He's right, but it's only lucrative because boomers with too much money to waste have no clue of the limitations and because they think you need seriouz skillz to upload an ERP finetune on HuggingFace when on a surface level it's mostly a matter of data (often manual labor), compute and some hoarded knowledge. Put these finetroon retards on a real job where they'll have to design a production-ready system and they'll fold in no time due to fundamental skill issues.
>>105741957When I try to load his quant, which is slightly larger than my RAM, it begins paging to my C drive. I didn't add --no-mmap. It's not supposed to do that, is it?
>>105742495If you can, you should give llama.cpp a shot.
I had significantly more tps than on ooba and slightly lower vram usage.
It was some time ago, so I don't know if the situation is better now, but llama.cpp is pretty awesome, if you aren't afraid of the command line.
>>105742576
When most of your value as a finetuner is in some practice with pre-made open-source tools and in limited amounts of data you've wasted months or even more than a year cleaning (or worse, made/paid people to do it), you don't really have much to offer once you allow others to peek behind the curtains.
>>105739679
That is not how it works, and if people knew that, nobody would pay money for your Patreon, scammer.
>>105737649
Still would wager it is more than decent in sequential.
>>105737649
You can sell the excess or share with your friends.
>>105742778
You underestimate how dumb your average AI grifter is. Everyone in this thread is practically a genius in this field just for knowing what minP does to the token distribution (it was one of the hiring questions, btw). Let's not forget the arcane knowledge that you can't find outside of this place and that will end up in a paper a few years later.
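Since minP came up: the whole trick is a relative probability cutoff, where tokens below min_p times the top token's probability get dropped before sampling. A minimal sketch in plain Python (illustrative only, not any particular backend's actual implementation):

```python
import math

def min_p_filter(logits, min_p=0.05):
    """Keep tokens whose probability >= min_p * (top token's probability).

    Returns the indices of surviving tokens. A sketch of the idea,
    not copied from any inference engine.
    """
    # softmax over the logits (shift by max for numerical stability)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # the most likely token sets the bar
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# a dominant token raises the bar, so near-zero tokens get cut
print(min_p_filter([5.0, 4.0, 0.0], min_p=0.1))  # -> [0, 1]
```

The nice property versus top-p is that the cutoff scales with the model's confidence: a flat distribution keeps many candidates, a peaked one keeps few.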
>>105742625
OK, the problem was -rtr, which disables mmap.
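Makes sense: run-time repacking has to rewrite the tensors, so it can't keep the read-only file mapping, and the full in-memory copy spills into swap once it outgrows RAM. For anyone else hitting this, the relevant llama.cpp memory flags (names as of recent builds; check --help on yours):

```shell
# default: weights are mmap'd straight from the .gguf,
# pages are loaded on demand and can be dropped under memory pressure
./llama-server -m model.gguf

# --no-mmap copies everything into anonymous memory instead;
# a model bigger than RAM will then page out to swap (the C drive paging above)
./llama-server -m model.gguf --no-mmap

# --mlock pins the mapped weights so the OS can't page them out
# (only useful if the model actually fits in RAM)
./llama-server -m model.gguf --mlock
```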
>>105742986
Be the change you want to see
>>105742993
that sounds like work
>>105737266
>Asian larping with Europeancore
Is this how they feel when some white fattie posts samurai and Sun Tzu quotes?
>>105743045
fulfilling work
>>105743065
asians don't feel feelings
>>105734503
I think you're posting on the wrong website.
>>105734503
It reads like an LLM shitpost.
>>105734603
>false equivalence, the post
>>105734503
>hentai
Is it not censored?
>>105743199
I feel like somebody must have hand-curated all the most cringe posts on locallama and then finetuned a model on them just to create an AI redditor to ruin what's left of /lmg/.
what are the popular models for erp for a 3090 with 32 gb ram nowadays?
there used to be a list in the op
>>105742419
I really hope you find peace at some point, newfriend. It's concerning how unhinged and baseless you're acting right now.
Even if you're only clowning for amusement, your online behavior will ooze into your psyche at some point, and you will lose yourself and regret it. Control your impulses and redirect your hate toward bigger things. Your future self will thank you.
>>105742576
Growth only happens when you go out of your depth but soldier on. That's why jobs are framed as 'opportunities', especially when you're young. Take it from someone who has had a real job for nearly 10 years.
You guys might disagree with me, but I'm a believer in Karpathy's claim that AI is Software 2.0. I see finetuning as a great way for a software engineer to brush up on this domain and prepare for the future.
>>105743264
>zoomer
>redditor
kill yourself and go back, in that order
>>105743224
stop! don't give him ideas
>>105743264
3.2 finetune when?
what model would you recommend to a 12gb vramlet with 48gb ram
>>105743283
Shut the fuck up retard, your contribution to this thread is 0. Go back to your containment thread in /wait/ where you can shitpost all you want.
>>105743304
your "contributions" are less than zero drummer
>>105741203
Everything you need to know in one image right there
>>105743298
You can try my Cydonia test tunes.
https://huggingface.co/BeaverAI/Cydonia-24B-v4g-GGUF is the top candidate. Reduced repetition, less slop, unaligned, overall good responses from the feedback so far.
https://huggingface.co/BeaverAI/Cydonia-24B-v4d-GGUF If v4g feels off for you, v4d is closer to OG 3.2 but with less positivity and better roleplayability.
Cydonia v3.1 and v3k both have Magistral weights, the former being a merge of it and the latter a tune on top of it. Magistral is surprisingly unaligned on its own, but has the Nemotron Super quirks, which were ironed out with Valkyrie.
>>105743329
I am making a genuine effort to tag (not trip) myself in all my posts so you wouldn't have to accuse everyone of being me.
>>105743384
very nice, downloading both
hopefully V3 tekken formatting works
>>105743440
They are all tuned with Mistral's new V7 Tekken template ([SYSTEM_PROMPT]) and I haven't checked how they'll do without those control tokens present.
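For anyone templating by hand, going by Mistral's published chat templates, V7 Tekken lays out roughly like this (exact tokens and spacing can differ per model, so check the tokenizer_config.json of the actual repo):

```
<s>[SYSTEM_PROMPT]{system prompt}[/SYSTEM_PROMPT][INST]{user message}[/INST]{assistant reply}</s>[INST]{next user message}[/INST]
```

The key difference from V3 Tekken is the dedicated [SYSTEM_PROMPT] block instead of prepending the system text to the first [INST] turn.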
>>105743329
I'm not him. Don't worry about my contributions
Who the fuck uses GGUF for a tiny 24b model?
>>105742446
>previously stateless next-token-logits calculation now has to account for previous selections and what experts were used
I can very easily imagine why this would be absurdly ass to implement tbqh.
>>105743611
Everybody else working on top of PyTorch doesn't seem to have a problem. Maybe llama.cpp should have focused less on marginal speed increases and supporting 30 binaries, and more on building a core architecture that was actually extensible.
>>105743647
exllamav2 moved to an extensible core architecture and fucking died
>>105743647
Like you can manage a proper codebase with every company releasing their shitty models with their own architecture gimmick lmao
>>105743647
Or you could just build the one binary you want with --target. You don't build it yourself? Oh...
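For the anons who've only ever grabbed the release zips, a single-target build is two commands (target names from a recent llama.cpp checkout; list yours with `cmake --build build --target help`):

```shell
# configure once (add -DGGML_CUDA=ON for a CUDA build)
cmake -B build

# compile only the server instead of every example binary
cmake --build build --target llama-server -j
```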
>>105743647
>adding state to something previously stateless
No, no, this is actually harder to do if you were coding correctly and properly designing your interfaces, and it's easier if you're slopping out procedural code.
>>105743529
>application janny
>3 accepted 257 rejected
most homosexual poster in the thread award
>>105743757
Yes, I'm suffering from success
>>105741203
>Lieutenant Colonel Demotion
Any model that can convert a song in WAV into a MIDI track?