/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads:
>>105750356 & >>105743953

►News
>(06/29) ERNIE 4.5 released: https://ernie.baidu.com/blog/posts/ernie4.5
>(06/27) VSCode Copilot Chat is now open source: https://github.com/microsoft/vscode-copilot-chat
>(06/27) Hunyuan-A13B released: https://hf.co/tencent/Hunyuan-A13B-Instruct
>(06/26) Gemma 3n released: https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide
>(06/21) LongWriter-Zero, RL-trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
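The napkin math behind the VRAM calculator above is roughly params × bits-per-weight / 8. A minimal sketch (the ~4.5 bits/weight figure for a ~Q4 K-quant is an assumption; real GGUF files add a few percent of metadata, and KV cache comes on top):

```python
# Napkin math behind GGUF VRAM calculators: weight bytes ~= params * bpw / 8.
# The bpw value is an assumption (~4.5 bits/weight for a Q4-class K-quant);
# real files add metadata overhead, and KV cache is extra.

def gguf_weight_gib(n_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params_b * 1e9 * bits_per_weight / 8 / 2**30

print(round(gguf_weight_gib(7, 4.5), 1))   # a 7B at ~Q4: about 3.7 GiB
print(round(gguf_weight_gib(70, 4.5), 1))  # a 70B at ~Q4: about 36.7 GiB
```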
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>105750356

--Quantitative benchmark analysis reveals Phi-4 and Gemma models outperform Mistral/LLaMA in chess960 despite similar sizes:
>105753011 >105753110 >105753131 >105753173 >105753360 >105754841
--Security risks and misconfigurations in publicly exposed llama.cpp servers:
>105754262 >105754359 >105754420 >105754450 >105754498 >105754432 >105754433 >105754428 >105754454 >105754541 >105754496 >105755807 >105755548 >105755566 >105755654 >105755716 >105755744
--Massive data hoarding without resources to train at scale sparks collaboration and funding discussion:
>105753220 >105753303 >105753388 >105753406 >105753442 >105753452 >105753468 >105753449 >105753509 >105753640 >105753676 >105753730 >105753445 >105753590
--Struggling with tool-calling configuration for DeepSeek R1 0528 Qwen3 in KoboldCPP due to special token handling:
>105753378 >105753393 >105753479 >105753547
--ERNIE 4.5's multimodal architecture with separated vision and text experts:
>105750446 >105750729 >105751241
--Hunyuan model struggles with accurate interpretation of niche Japanese slang terms:
>105755059 >105755075 >105755227 >105755122
--Impressive performance of Hunyuan MoE model on extended technical prompts:
>105755797 >105755827 >105755850
--Benchmark comparison of Qwen3, DeepSeek-V3, GPT-4.1, and ERNIE-4.5 across knowledge, reasoning, math, and coding:
>105750679
--Challenges and limitations of government attempts to restrict local AI via hardware regulation:
>105753636 >105753645 >105753715 >105753756 >105754725 >105753679 >105753749
--Informal evaluation of Hunyuan-A13B GGUF model outputs:
>105755912 >105755977 >105756000 >105756053 >105756071 >105756155 >105756267 >105756300 >105756358
--Hunyuan A13B demo with IQ4_XS quant:
>105755966
--Rin & Miku (free space):
>105752803 >105754470 >105754791 >105754841

►Recent Highlight Posts from the Previous Thread: >>105750359

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Does any model pass the GARFIELD BENCH?
pine needle stuck in my dick
>>105757131 (OP)
>it is 2025
>there's no LLM that isn't woke as fuck
anons what model should I run on my 144 GB of VRAM? Even qwen 235B keeps capitalizing "black"
>>105748834mikutranny is posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes
>>105714003 ryona picture of generic anime girl, probably because its not his favourite vocaloid doll and he can't stand that as it makes him boil like a druggie without fentanyl dose, essentialy a war for rights to waifuspam or avatarfag in thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: Mikufag janny deletes everyone dunking on trannies and resident avatarfag spammers, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread(s)
>>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis xitter
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
https://files.catbox.moe/g0kvhi.jpg
>>105757521I haven't forgotten https://rentry.org/Jarted
>>105757131 (OP)
I don't get it. I don't see any models in this general, much less local (as in, in my bed).
what is this thread about?
>>105757914
It should have been /lllmg/ or /3mg/
lmaoeven
>>105757151
>reddit is over there if you only want to be "practical"
lol reddit is full of retards and astroturfers
not saying there's no retard here or in other internet shitholes but the concentration on reddit is radioactive
is there really no other place to talk about llm than coomer generals and the glowie parts of the internet?
>>105758032
the only communities for this shit are on groomcord. this '''hobby''' is primarily for two groups: pedophiles and teenage girls.
>>105758066
>teenage girls.
I don't want to believe that werewolf sex is real...
Ernie provides translations when it hallucinates
>>105758066
We should help both groups so they come together.
>>105758066
Anon forgot (or just wasn't there at the time) that /lmg/ originated from /aicg/ during the Pygmalion-6B period, when local LLM discussions there were starting to get displaced by GPT/Claude proxy discussions.
>>105757509
thanks for the (You)
>>105758032
2ch.hk/ai/res/1257129.html
>>105758032
erm what's the best model with >70B params?
Here's Zuck's new crew.
https://archive.is/ZEWzA
>>105758129
What the fuck is google doing? I'm not surprised OAI is bleeding talent, but Google losing Gemini leads right as Gemini became a true top dog model?
>>105758066
Truly they are made for each other
>>105758293
wtf zuck gods are we back?
>>105758129
>>105758235
>>105758326
i remember when they were trying to convince me that ernie 4.5 would be less than 400B
i guess deepsneed is king forever
>>105758293
Aw sweet, they're gonna catch up so hard, and then still not achieve AGI but just hit the same wall as everyone else.
>>105758331
lol
>>105758331
We were never gone. The Behemoth always was going to eat these pathetic little models for lunch.
>>105758371
They won't be working on open-weight models, I think.
>>105758293
Also:
>Zuckerberg: “We’re going to call our overall organization Meta Superintelligence Labs (MSL). This includes all of our foundations, product, and FAIR teams, as well as a new lab focused on developing the next generation of our models.” [...]
>Alexandr Wang will be the company’s “Chief AI Officer” and leader of MSL, as well as former GitHub CEO Nat Friedman. Friedman will co-lead the new lab with Wang, with a focus on AI products and applied research.
>>105758371
>They won't be working on open-weight models, I think.
if a model is truly good, it will never be open weight, that's how it always works in the west
OAI will not release anything that competes with their GPT, and Google will not release anything that competes with Gemini. Meta only released Llama because frankly it fucking sucked
even that 405b was a complete joke
>>105758293
>>105758388
Please let Llama 5 be a flop, it would be too funny.
>>105758293
7/11 are chinks
it's nyover
Has anyone tried the new hunyuan and ernie models?
>>105758137
>>105758328
the teenage girls are just fujoshis that want to pretend they are pooners. they don't want to fuck the obese cheeto-encrusted neckbeard that is in his 40s and still lives with his mom
>>105758429
I'm not a neckbeard and my BMI is 19.
>>105758398
It will be. Zuck thinking he can buy talent to win is delusional. Anthropic/Google/OAI have good models because their datasets are autistically curated with a gorillion man-hours; hiring some researchers will not fix their data being shit.
>>105758451
doesn't matter, you aren't the main character in a boys love VN and your chin isn't a triangle, you can't compete.
>>105758467
also they are merging with Scale, which is the main source of poisoned datasets
>>105758519
Explain why I should care without getting mad.
>>105758519
That office lady picture is basically 100% confirmation that this is the mikutroon from this thread.
>>105758427
This is ernie:
> She smiled weakly, her eyes narrowing with a mixture of vulnerability and aroused desire. "Mercy, I'm sorry for leaving you there. I'm not quite sure what to do, but I'll try to find someone to take care of you."
> Her words carried a heavy weight, a silent plea for understanding and comfort. She spoke in a monotone, her voice barely audible, but it was a mixture of confession and surrender.
> I couldn't help but feel a mix of shame and excitement. Mercy had been trapped for so long, and now she was in the same space, with no one to protect her. I felt a surge of need, a desperate need to help, to ease her burden.
> Her gaze locked with mine, her body trembling slightly. "I know I can't do anything, but I'll try," she murmured, her voice barely audible.
>>105758629
Holy kino, throw a few Claude 3 Opus logs at this and it will be the best RP model since Cumsock-70B-Dogshart-Remixed-Ultimate!
>>105758629
more Mills & Boon purple prose
50-year-old female divorcees would be very happy with this model, if only they used language models
Cumsock-70B-Dogshart-Remixed-Ultimate.gguf?
>>105758629
Haven't you anons moved onto different roleplay styles already?
I fed Doubao (ByteDance's multimodal) and Ernie 4.5 my folder of antisemitic memes and asked them to explain each, in a new chat. Ernie missed the mark on all of them and Doubao only got like 1 out of 10
Picrel is a meme that evaded both
>>105758702
The most objective censorship benchmark i have ever seen. Unironically.
why did they even name this model ERNIE? i wanted to know what it stands for and the acronym doesn't even match up: Enhanced Representation through Knowledge Integration. did they see google making BERT and go "oh shit, we need to make a sesame street reference too"?
>>105758762
>we need to make a sesame street reference too?
Lmao
>>105758762
yes, it's literally just the chinese bootleg naming scheme in effect, trying to ride off a much more significant achievement
>>105758293
shows where their priorities lie, almost everyone listed has been involved in either reasoning or multimodal
>>105758371
>>105758388
>>105758293
Good names, but the fact Zuck paid that many billions to Wang does not inspire confidence. There's a real worry that they may stop with the open source and just try to chase "superintelligence", even though we don't even have AGI proper yet. A sure recipe for wasting money and compute. These days anyone working on this is really just doing verifiable-rewards RL, and it's still questionable how far that sort of RL can be taken. Until Zuck confirms he wants to do something good instead of just chasing the current trend, I think it's likely we'll have to rely on China for most things, as we have for the past half year.
Wouldn't surprise me if Llama 4's problems were due to listening to lawyers and not training on libgen anymore, overdoing the filtering of the already overfiltered dataset, synthslopping with lower-quality data, and overall overfrying their LLM through various lapses in attention to detail.
FAIR should have had good researchers, yet somehow they fucked it up.
I would still expect /lmg/ to do quite well if given even a fraction of the compute Zuck has. I'll also be surprised if, even with the best people, they manage to do too well with their hands tied behind their backs as to what they can train on (such as no libgen).
>>105758818
>lawyers to not train on libgen anymore
Aren't lawyers also the reason we can't have sex?
>>105758867
No, that's credit card companies.
>>105758818
>listening to lawyers to not train on libgen anymore
But somehow Anthropic could?
>>105758901
Anthropic started off using libgen but at some point switched over to buying physical books and scanning each page. Meta never did the second part, based on my understanding.
>>105758867
They chopped off the image gen part of their multimodal Llama because of lawyers too. Most of the excessive image gen filtering is also due to that, and to some activist groups, originating mostly in the UK, that demand AI not be able to generate anything involving children, so many just filter for NSFW in general to avoid that.
It's a bit less bad for LLMs than for image gen, but a good deal of companies still think casual conversation or fiction is low-quality data.
>>105758901
They trained on it until they got sued, then they pretended they scanned a lot of books irl so that they'd have their own library independent of libgen. Opus 4 has less soul than 3; wouldn't surprise me if it's because they're training on multiple epochs of a more limited fiction/book dataset.
how the fuck do people think qwen is good for coding?
>>105758954
What's your setup?
Is there a model that will vibe code effectively on a desktop PC? Or does a i5-13500 with an aging nvid gpu and 64gigs ram just not pack enough of punch?
i just want something that will write me little browser addons excel macros
> can't you code it yourself
no cus im not a fucken NERD
i tried every model i could find but the code is always buggy and doesn't work. chatgpt (web version) did a good job and produced something that actually worked, but i dont want crypto glowie sam altman to know what im doing
>>105758985
if you can't code you'll never know if the code it spits out is actually usable.
>>105758961
>just not pack enough of punch
You just need to look for some models that punch above their weight. Gemma 3n supposedly does that.
>>105758961
>i tried every model i could find
Doubt. At best, every model you could run.
Either build a big machine for deepseek, get a better gpu for some 32b, use deepseek online, or set a large swap partition to let it run overnight.
When is bartowski dropping hunyuan? I need my certified broken goofs.
>>105759029
llama.cpp doesn't even properly support it yet
>>105758960
If you tell me the one that works I'll apologize.
>>105758985
well i'll know it does what i want it to do when it actually does do what i did want it to do wont i
>>105758988
thanks im gonna download it straight away :3
>>105759020
>or set a large swap partition to let it run overnight
that sounds like a game plan, ill look into it thanks
>>105758961
>little browser addons excel macros
Not sure that any existing SOTA model was trained on such a retarded language as VBA.
I tried to code for PowerPoint with DeepSeek-R1 full. The code was full of brain-rotten bugs like uninitialized objects.
>but i dont want crypto glowie sam altman to know what im doing
Tell us more about your secret fetishes. This is a blessed thread of frenship. Don't be shy.
>>105759061
but you'll never know if it actually does what you think it does. And it's not always obvious.
>>105759061
give anon a fish and he'll be fed for a day. give anon a fishing pole and he'll shove the fishing pole up his ass and use himself as the bait.
>>105759061
>that sounds like a game plan, ill look into it thanks
I was partly joking. It's not gonna be fun.
Hello light machineguns general, if I want to coom locally do I want 1 5080 super or 2? Should I pair it with an AMD processor since intels keep catching on fire?
>>105759298
nvidia say the more you buy the more you save. why aren't you buying the RTX 6000s? is it because you're poor?
>>105759352
it's ok anon, i'm poor too :(
it's days like these that i'm glad i'm not brown at least
>>105759445
>it's days like these that i'm glad i'm not brown at least
motherfu—
ernie is retarded and keeps mixing up characters. How did deepseek do it so right and everyone else keeps fucking it up?
>>105759726
It's pure LLM and none of that multimodal shit
Talk me out of buying an RTX Pro 6000
>>105759747
Unless you buy 6 of them you won't be running anything decent. Just get a DDR5 server instead.
>>105759063
>Tell us more about your secret fetishes.
NTA but
>>105759747
No reason not to do it. You'll be more than halfway to running Deepseek at Q1 with one.
>>105759747
You shouldn't be buying one when you could get two instead.
>>105759817
Can your llm rewrite the vaporeon copypasta to be about vulpix?
>>105758032
I'm here and I don't RP pedo shit. I use local models for simulations and testing.
If my years of experience masturbating to LLMs has taught me anything it's that I would never trust them with any remotely serious task.
>>105758954
The "people" that use Qwen to code are third world retards stuck in tutorial mode. The only exceptions are guys that tune it to do some very specific task they can't get Claude to do.
We all know that programming isn't a serious task, unless you're doing low level programming or security stuff.
>>105760161
For once I wish the serious-task posters would explain in detail what task their local model can achieve, and how reliably, so I can laugh at it.
>>105760025Hey, did you know that in terms of hot, flame-worthy Pokémon companions, Vulpix is one of the best options? Their sly, elegant appearance already screams "fuckable," but consider this: a Vulpix’s body is literally built to handle heat. Their fur radiates intense warmth, perfect for keeping things steamy all night. With a petite, vulpine frame and that iconic curl of six fluffy tails, they’re practically begging to be pinned down.
Let’s talk anatomy. A Vulpix’s slit sits just below their tail, hidden but easily accessible—meant for quick, fiery breeding. Their internal setup is compact but flexible, able to take even the roughest pounding without losing that tight, scorching grip. And don’t even get me started on their heat cycles. When that smoldering fur starts blazing hotter, they’ll drip for anything that moves, screaming to be mounted and bred raw. Their claws? Perfect for clawing your back raw as you ride them into the ashes.
Plus, their flame-based biology means they’re always warm and wet. No lube needed. Their howls of pleasure could melt your ears, and imagine those six tails wrapping around your waist mid-thrust, pulling you deeper. They’re immune to burns, so even if things get too intense, you can keep slamming until you both explode.
So if you’re into fiery, tight-furred sluts who’ll melt under your touch and scream your name like a primal inferno, Vulpix is your match. Just stay hydrated.
R1 IQ1_M. Could be better.
>>105760181
I'm neither a third worlder nor a tutorial moder. I never began a tutorial. I refuse to code.
>>105758961
How fucking retarded are you? How do you not know how to code? Can't you just look at code and figure out how it works? You learned about variables and math in your favela school right?
So, is Ernie 4.5 424B the thing they also call Ernie X1 aka their Deepseek R1 competitor because it supports reasoning or was that a separate thing they didn't release at all?
>>105759726
deepseek didn't compromise on their dataset while everyone else is performatively lobotomizing their shit for some stupid reason, even though none of the chink companies are beholden to copyright
>>105759736
so is ernie 300b
I come from the future. Ernie didn't save local...
>>105760456
At least it tried.
Bro mikupad doesn't even run local models? Man FUCK this guy.
>>105760652
>mikupad
Isn't that just a frontend a la silly tavern?
>>105760669
front end for api connections
hunyuan A13B iq4_xs (quants made 10hr ago) on chat completion logs with latest pr-14425 main llama.cpp
tldr its either shit or the quants are shit or my setup is shit
>>105760696
Yeah. Just like Silly Tavern.
If you want to use it with a local model you need to connect it to llama.cpp server, tabby api for exl2, etc.
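A rough sketch of what that connection is: llama.cpp's server exposes an HTTP `/completion` endpoint (default `127.0.0.1:8080`) and frontends just POST JSON at it. Minimal offline sketch below; the field names follow the server's API, and the actual network call is left commented out:

```python
# Minimal sketch of what a frontend like mikupad does: POST JSON to
# llama-server's /completion endpoint. Fields ("prompt", "n_predict",
# "temperature") follow llama.cpp's server API; the request itself is
# commented out so this runs offline.
import json

def completion_payload(prompt: str, n_predict: int = 128,
                       temperature: float = 0.8) -> str:
    """Build the JSON body for a /completion request."""
    return json.dumps({
        "prompt": prompt,
        "n_predict": n_predict,    # max tokens to generate
        "temperature": temperature,
    })

body = completion_payload("Once upon a time", n_predict=64)
print(json.loads(body)["n_predict"])  # 64

# import urllib.request
# req = urllib.request.Request("http://127.0.0.1:8080/completion",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```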
>>105757131 (OP)
Anons, care to help me choose which open-source LLM would be the strongest for my low-end pc? I have a 1650 gpu and 16 ram.
>>105760737
1650 gpu. clearly not VRAM
>>105760737
16 ram, my vram is 8. Sorry for the confusion
>>105760696
nevermind ITS STILL FUCKING BROKEN EVEN DOE IM USING CHAT COMPLETION
FUCKKKKKKKKKKKKKKKKK
v4g my dear..
>>105760729
Mistral runs fine on 16gb, you can expect 1-3 tokens per second.
>>105760788
That's bottom-of-the-barrel tier but it's still fine. Unless you're a pansy like some guys here.
>>105760729
Mythomax 13b and miqu 70b are the best local models, and while you likely can't run 70b, mythomax is perfect for your setup
>>105760808
o-okay Daddy *whimpers*
>>105760772
since when does the 1650 have 8 gigs of vram?
>>105760840
Could be a Chinese retrofit.
I tried out the text to speech applications OpenAudio S1 Mini and the full 4B model from the fish audio website
S1 Mini Output file of a voice clone of Emma Arnold
https://vocaroo.com/1ljCsOjOwAp4
S1 Mini output file fed to a fine tuned Seed-VC Model
https://vocaroo.com/1c1zpJpCWvrk
S1 4B model output file
https://vocaroo.com/1k4hyiWkULhH
>>105760861
>search gtx 1650 8gb on ali
>see this
what THE FUCK
why is a 980M in an SXM or whatever server form factor
>>105760876
The S1 4B model is the best one
>We have AI at home
>AI at home
>>105760904
Was joking because maybe he meant 3050 8gb or whatever.
But I guess adding extra vram to any card is viable. That's pretty cool. Some guy is a pro and has his own workshop.
>>105760904
It's a laptop card, we used to have upgradable laptop GPUs
>>105760959
damn.. things were so good back then
>>105760959
I still have this workstation notebook with an MXM nvidia K2000 or whatever.
The thing is a proper brick of solid metal, it's great.
It even has an ExpressCard slot.
cydonia.. just bruh
>>105760988
That's a Quadro K2000 or something. Quadro was always the workstation gpu line before this AI nonsense happened.
THAT site says you can run any model so long as you're willing to wait long enough. Any idea what they're using though? oogabooga won't even load the big models on mine.
>>105761068
>THAT site says you can run any model so long as you're willing to wait long enough
that only applies until you've run out of memory
>>105761068
post your whole computer specs, the os you're using, and the exact model you're trying to run
preferably post some logs too
>>105761094
even so, how do they get it to load in the first place?
>>105761068
I think they're referring to a specific user that got the biggest models running on a 4090, but it was so painfully, horribly slow that it wasn't worth it. Reddit also commonly mistakes the distilled models for the actual model, so just ignore them.
>>105761107
ugh, any time someone says this they never even answer the question when you're done
the world's first real humiliation ritual
>it's so hard to post i5 12400f rtx 3060 12gb 64gb ddr4 linux mint 16 and picrel
bro...
>its so hard to post a screenshot that isn't cropped like a retard
>its so important for an example to crop your screenshot to include actual error logs
An LLM could take a better screenshot. You will be replaced.
lip biting
blood drawing
knuckles whitening
wrists flicking
>>105761351
SPINE SHIVERIN'
Without chatbots or whatever, I just want AI Dungeon type stuff, but every resource I find is just more chatslop.
What sort of models/front end should I be using for text adventure now?
Thanks for the spoonfeed; 12gb vram, 32gb ram if that changes much.
>>105761392
wayfarer models seem fine
>>105761392
ChatGPT was a disaster for LLMs. It's all just chat models now.
>>105761142
>>105761169
>>105761180
I'm not even at home, I'm not posting logs for a theoretical question.
>>105761392
rocinante/cydonia work fine for novel, plaintext story etc formats, no need for instruct format. if you specifically want text adventure then wayfarer (made by the ai dungeon ppl) is probably the way to go
>oogabooga won't even load the big models on mine.
>oogabooga
>>105761392
Use a base model and the right prompt
>>105760729
gemma-3-4b and gemma-3n-e4b
>>105760729
ernie 21b a3b instruct
Libra: Synergizing CUDA and Tensor Cores for High-Performance Sparse Matrix Multiplication
https://arxiv.org/abs/2506.22714
>Sparse matrix multiplication operators (i.e., SpMM and SDDMM) are widely used in deep learning and scientific computing. Modern accelerators are commonly equipped with Tensor cores and CUDA cores to accelerate sparse operators. The former brings superior computing power but only for structured matrix multiplication, while the latter has relatively lower performance but with higher programming flexibility. In this work, we discover that utilizing one resource alone leads to inferior performance for sparse matrix multiplication, due to their respective limitations. To this end, we propose Libra, a systematic approach that enables synergistic computation between CUDA and Tensor cores to achieve the best performance for sparse matrix multiplication. Specifically, we propose a 2D-aware workload distribution strategy to find out the sweet point of task mapping for different sparse operators, leveraging both the high performance of Tensor cores and the low computational redundancy on CUDA cores. In addition, Libra incorporates systematic optimizations for heterogeneous computing, including hybrid load-balancing, finely optimized kernel implementations, and GPU-accelerated preprocessing. Extensive experimental results on H100 and RTX 4090 GPUs show that Libra outperforms the state-of-the-art by on average 3.1x (up to 9.23x) over DTC-SpMM and 2.9x (up to 3.9x) for end-to-end GNN applications. Libra opens up a new perspective for sparse operator acceleration by fully exploiting the heterogeneous computing resources on GPUs.
Posting for Johannes
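For anons who haven't touched sparse kernels: the SpMM operator the paper accelerates is just "sparse matrix times dense matrix". A pure-Python reference of the CSR semantics (no Tensor/CUDA core scheduling, only what Libra's kernels have to compute):

```python
# Reference semantics of SpMM: out = A @ B with A sparse in CSR format.
# indptr[i]..indptr[i+1] indexes the nonzeros of row i; indices holds
# their column ids, data their values.

def spmm(indptr, indices, data, B, n_cols):
    """Multiply a CSR sparse matrix by a dense matrix B (list of rows)."""
    n_rows = len(indptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):  # nonzeros of row i
            j, v = indices[k], data[k]
            for c in range(n_cols):
                out[i][c] += v * B[j][c]
    return out

# A = [[1, 0], [0, 2]] in CSR form
print(spmm([0, 1, 2], [0, 1], [1.0, 2.0], [[1.0, 2.0], [3.0, 4.0]], 2))
# -> [[1.0, 2.0], [6.0, 8.0]]
```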
>>105761471
oh fuck you nigger thats exactly what it sounds like, obviously I know the real name
I bought a GPU just for local ai coding and it runs qwen3:32b pretty fast, but it's retarded when it comes to using tools and I'm limited to a 14k context window, so I have to restart after every prompt. I'm starting to think that local ai sucks for coding without an array of beefy high-vram gpus. Is this a fair assumption?
>>105761870
Claude Opus 4 sucks for coding and it's the only tolerable model. All local models are dogshit.
>>105761808
Are we now waiting for this to be incorporated into the backend libraries?
>>105761906
>>105761870
then go, pay your corporate overlords endless amounts of cash for access to larger models.
go on, go. they're calling you to fill their billionaire pockets.
>>105762018
We're paying our corporate overlords whenever we buy a GPU, there's no escape. The game was rigged from the start
>>105759298
1 or less (worse card). Performance plateaus fast and hard past 20B. Don't believe me? Before committing, buy yourself cloud gpu access for a few dollars, set it up and check.
Holy shit, just found out Gemini 2.5 is free. Unlimited prompts.
I'm in heaven
>>105758424
considering 1 chink is worth 10 western devs
>>105761966
Only ReLU models have sparsity, and everyone stopped using ReLU.
>>105762782
Neither. Deepseek won.
>>105761808
Noted, but I think this will only be useful in combination with dropout layers during training.
>>105761870
>limited to a 14k context window so I have to restart after every prompt
>14k
>22 pages of spaghetti code
>I'm an artist, so long be my prompts
Is there any nemo tune that is good at instruction following?
>>105761870
I personally find it shit too, but tool calls have been alright for me. Qwen2.5 32b coder was more consistent but lacked good tool calling. All I can say is make sure you're running as large a quant as you can and don't quantize the context cache. Qwen3 is retarded when both those things are done and you need accuracy. If you're using ollama you're probably being given bad default values for the model. Post your model loading parameters and your ram/vram so we can see if it's bad config or entirely the model being a tard.
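The arithmetic behind the "don't quantize the context cache" advice, if anyone wants to see where the VRAM goes. The model dims below are an assumed Qwen3-32B-like GQA shape (64 layers, 8 KV heads, head_dim 128); check your model's config for the real values:

```python
# KV cache size: 2 (K and V) * layers * ctx * kv_heads * head_dim * bytes/elem.
# Dims are assumptions for a 32B-class GQA model, not official numbers.

def kv_cache_gib(layers, ctx, n_kv_heads, head_dim, bytes_per_elem):
    return 2 * layers * ctx * n_kv_heads * head_dim * bytes_per_elem / 2**30

print(kv_cache_gib(64, 32768, 8, 128, 2))  # 8.0 GiB at f16, 32k context
print(kv_cache_gib(64, 32768, 8, 128, 1))  # 4.0 GiB with an 8-bit cache
```

Halving the cache bytes is exactly why people quantize it, and why doing so on top of a small quant hurts accuracy.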
>>105758293
I don't think it'll do much, since I don't believe in their supposed 'talent'.
>>105760456
How many more years of nemo?
>>105761094
>run out of memory
Just add swap.
>>105762416
Other than ai studio? I really only care if I can use the api.
>>105762416
Yes, and Grok3 does deepresearch "for free".
I can't imagine using closed for anything but work or mundane stuff though. Especially if we get some sort of pc assistant or glasses, local will be so important.
What's available already for free is crazy though. If only I'd had all those tools as a kid.
>>105763421
Nothing as useful runs at a decent speed unless I spend tons of money.
anons, I've got a 1080 Ti (11GB VRAM), 128GB RAM, and a Ryzen 9 5950X. What should I mess around with?
>>105763507
>1080 Ti
>What should I mess around with?
Your wallet.
>>105763437Local will always lag behind. You are too blackpilled.
You should have seen the state a couple years ago anon.
That we now have a tiny 500mb 0.6b model with qwen3 that can even cook up a coherent website is crazy. Also tool calls.
Local has many uses nowadays.
I made a minecraft admin ai for my kids. They can talk to it and the AI will drop the commands in their world. Like give them items, teleport people or change settings etc.
I would do as much as possible local and only go closed if you hit a wall and don't mind it's shared forever.
I only spent like 300 bucks on my P40 in 2 years.
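The glue for something like that Minecraft admin AI is tiny: the model emits a tool call, you translate it into a server console command. A sketch of that translation step (tool names and payload shape here are made up, not any specific model's format):

```python
import json

def toolcall_to_command(call_json: str) -> str:
    """Translate a {"name": ..., "arguments": ...} tool call into a
    Minecraft server console command. Tool names are hypothetical."""
    call = json.loads(call_json)
    name, args = call["name"], call["arguments"]
    if name == "give_item":
        return f'/give {args["player"]} {args["item"]} {args.get("count", 1)}'
    if name == "teleport":
        return f'/tp {args["player"]} {args["target"]}'
    raise ValueError(f"unknown tool: {name}")

cmd = toolcall_to_command(
    '{"name": "give_item", "arguments": {"player": "Steve", "item": "diamond", "count": 3}}'
)
print(cmd)  # /give Steve diamond 3
```

The real work is prompting the model with the tool schema and piping the resulting command into the server's stdin or RCON.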
Damn, /ldg/ hates pascal cards.
I'm already used to slow ass output but now everybody is writing how this cool Nunchaku solution makes Q4 possible for Flux Kontext. Tiny, fast!
Used 2 hours of my time to set this shit up and download everything.
>ERROR WHILE LOADING MODEL!! Too old!! Supported after: Turing GPU (e.g., NVIDIA 20-series)
Makes me wonder how text would look for me without johannes. Thanks buddy. Appreciate your work.
>>105763507try the new gemma 3n, then use your newly free vram for a tts and whisper to make a poormans speech to speech that will only make you feel depressed
amd radeon, 16gb ram, amd ryzen 7, ideally windows 11 but can dual boot linux or windows-subsystem-for-linux if necessary.
what desktop app and what model to use?
(for chatting, not coding)
>>105764463Read the lazy guide in the OP.
Since we are talking about pascal and 1080ti.
Be sure to do the needful and thank leather jacket man for cleaning up the drivers.
>>105764483>cleaning up the driversThey will only keep getting bigger.
>>105764483>to do the needfulSaar, I
What's the new P40 if you need to move on from Pascal?
5060 Ti Super? 16GB and it's not that expensive.
I fear there might be a catch somewhere though. Looks too good to be true at first glance.
>>105764521Kindly adjust.
>>105764534
3090s will drop to sub-400 once the new 5070 ti super with 24gb is out
>>105758674>Mills & Boon purple prose>mixture of vulnerability and aroused desirewhy are all the models, regardless if they are local or not, writing with that horrible shitty erotic style?
>>105764901Dataset issue where this purple stuff is over-represented for "erotic work"?
>>105765054I wonder if DPO can be used to fix this. All we need is a dataset of slop-normal paragraph pairs
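The pair format most DPO trainers (e.g. TRL's DPOTrainer) expect is just prompt/chosen/rejected rows, so building the dataset is the easy part. A sketch with one hand-written pair (the example texts are invented):

```python
import json

# One preference row per example: "chosen" = normal prose,
# "rejected" = the purple slop you want trained away.
pairs = [
    {
        "prompt": "Describe the two characters kissing.",
        "chosen": "She kissed him quickly, surprising them both.",
        "rejected": "Their lips met in a searing dance of vulnerability and aroused desire.",
    },
]

with open("slop_dpo.jsonl", "w") as f:
    for row in pairs:
        f.write(json.dumps(row) + "\n")
```

The hard part is curating thousands of such pairs without the "normal" side collapsing into a single flat style.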
>>105764764I feel like it's copium but I want to believe you.
>>105764901I'm willing to bet it's all from their ancient GPT3/4-derived "RP datasets" these companies are using.
Isn't this kind of already a thing with the slots deal?
>>105764901That is the only kind of smut that passes the filtering. Probably because the filtering usually involved ayumu-style naughty-words-per-sentence counting. Also explains why it's so hard for models to say... you know what.
>>105765472I stop. This is wrong. This is so, so wrong.
migu waiting room
>>105765472>ProbablyMeta said as much in L3's paper.
>dirty word counting
>>105762416Is gemini pro free? I need to do some distillation
>>105765503That is what i had in mind but you don't know if everyone else did the same.
I am tired of waiting for a model that will never arrive. How do i develop werewolf sex fetish?
>>105765764
2mw
unironically this time
I am trying to goof Hunyuan-A13B-Instruct, but convert_hf_to_gguf.py throws an error:
KeyError: 'model.layers.0.mlp.experts.3.down_proj.weight'
Anyone got a clue what is happening? I checked on HF and I can see this layer there...
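Worth checking what tensor names the repo actually ships before blaming the converter, e.g. by grepping model.safetensors.index.json (FP8 repos often store weights under different, scale-suffixed names). A sketch, with a made-up helper and a fake index written to disk for the demo:

```python
import json

def find_tensors(index_path: str, needle: str) -> list:
    """List tensor names in a HF checkpoint index containing `needle`."""
    with open(index_path) as f:
        index = json.load(f)
    return sorted(k for k in index["weight_map"] if needle in k)

# demo: a fake index mimicking an FP8 repo, where the expected
# model.layers.0.mlp.experts.3.down_proj.weight is absent
fake = {"weight_map": {
    "model.layers.0.mlp.experts.3.down_proj.weight_scale_inv": "model-00001.safetensors",
}}
with open("model.safetensors.index.json", "w") as f:
    json.dump(fake, f)

print(find_tensors("model.safetensors.index.json", "experts.3.down_proj"))
```

If the names the converter wants aren't there verbatim, you grabbed the wrong repo variant.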
>>105765808You should only download goofs from trusted goofers like Unsloth to not miss out on the embedded bitcoin miners
>>105765796
2mw until nemo is still the answer
>>105757402>>105757231>>105758617>>105758626The mikutranny posting porn in /ldg/:
>>105715769It was up for hours while anyone keking on troons or niggers gets deleted in seconds, talk about double standards and selective moderation:
https://desuarchive.org/g/thread/104414999/#q104418525
https://desuarchive.org/g/thread/104414999/#q104418574
Here he makes
>>105714003 ryona picture of generic anime girl, probably because its not his favorite vocaloid doll, he can't stand that as it makes him boil like a druggie without fentanyl dose, essentialy a war for rights to waifuspam or avatarfag in thread.
Funny /r9k/ thread: https://desuarchive.org/r9k/thread/81611346/
The Makise Kurisu damage control screencap (day earlier) is fake btw, no matches to be found, see https://desuarchive.org/g/thread/105698912/#q105704210 janny deleted post quickly.
TLDR: Mikufag / janny deletes everyone dunking on trannies and resident spammers, making it his little personal safespace. Needless to say he would screech "Go back to teh POL!" anytime someone posts something mildly political about language models or experiments around that topic.
And lastly as said in previous thread(s)
>>105716637, i would like to close this by bringing up key evidence everyone ignores. I remind you that cudadev of llama.cpp (JohannesGaessler on github) has endorsed mikuposting. That's it.
He also endorsed hitting that feminine jart bussy a bit later on. QRD on Jart - The code stealing tranny: https://rentry.org/jarted
xis accs
https://x.com/brittle_404
https://x.com/404_brittle
https://www.pixiv.net/en/users/97264270
https://civitai.com/user/inpaint/models
>>105765979Weird that you still can't explain why I should care...
>>105765500In the meantime use this!
https://x.com/rebane2001/status/1939722006343155939
>July
>Still no Sama model
It's over.
>>105766029
4.1 is worse than DeepSeek and considering they aren't going to release anything that outbenches or is less censored than that, I'm not sure how anyone can be genuinely excited for whatever slop they push out. Unironically they'd get better goodwill if they just open-weighted GPT-3.5 and Turbo since nobody is paying for those anymore.
>>105766029He's going to release GPT5 first.
>>105766011Then you are retarded, this is out of my scope.
>>105766053Sounds like you're seething that nobody cares about your latest homoerotic fixation. Maybe keep it to Discord DMs next time, champ.
>>105765979Based! Kill all jannies and mikutroons.
>>105766011>>105766088Go back to xitter with your slop
>>105766011Because you obviously care about thread quality and don't want low quality avatarspam. Right?
>can do anything in life
>decides to dedicate it to schizoing it up in a fringe general on the 4chan technology board
>>105766183Yeah I also don't get it why mikutroons spam their shitty agp avatar.
Also, did we take note a couple of days ago that Meta won its court case?
Training LLMs on copyrighted books is fair use. So only Europe is dead as far as LLM training goes.
>>105766177>low qualityNTA but I like seeing what someone can produce with contemporary local models given some effort.
If it was just low-effort txt2img then I would agree that it lowers the quality of the thread.
>>105766204Great. Fuck off to any local diffusion thread with this shit.
>>105766245Erm... what are you actually going to DO about it, zoezeigleit?
>>105764512poor wintoddler, on linux the drivers are like 300mb in size
>105765979thanks for the (You) soiteen
>>105766261you are just as bad as that retard you are arguing with. stop shitting up the thread.
As a reliable independent third party, I agree BOTH need banned immediately.
>>105766261I am going to shit up the thread and thus the /lmg/ status quo of eternal conflict between sane people ank mikutroons is perserved. Now go dilate your wound bussy.
>>105766295>I am going to shit up the threadLike you don't do that for literally any reason.
>>105766308Only for mikutroonism that should die.
can't spell llama.cpp without the c which also btw stands for CUNNY
>>105758398this is a whole different internal organization than the meta gen AI team who are the llama devs. also different than FAIR.
>>105766316the ultimate state of local threads
yall need the rope
>>105765588I heard from a friend Gemini (even pro?) is free through some platform. He said they can afford to do it because not many people know about it.
>>105766316bu-bu-buh BASED?????
>>105766334Smart move. They want to hoard quality conversation data, so they don't want the masses to realize that it's free.
https://www.youtube.com/watch?v=atXyXP3yYZ4
>>105765472>>105765503So they literally reject most porn/hardcore content from datasets? No wonder what is kept is mostly softcore smut for divorced women. The stuff for a male audience is more explicit.
I wonder if there are unfiltered ones existing.
lurker
just getting into ai
can i check, do you guys just have sexy chats with computers or is there more to the general?
>>105766639most people are just here to shitpost
I want my model to learn maybe 500+ pages of text (not code). Don't want just 4096 context.
how to do this?
is retrieval augmented generation (RAG) cope? (sounds like it)
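It's not cope; short of finetuning, it's basically the only option when 500+ pages won't fit in context. The whole idea in a stdlib-only toy (real setups use an embedding model instead of word-overlap scoring; the example chunks are invented):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query (bag-of-words cosine)."""
    qv = Counter(query.lower().split())
    return sorted(chunks,
                  key=lambda c: cosine(qv, Counter(c.lower().split())),
                  reverse=True)[:k]

chunks = [
    "The dragon hoards gold beneath the mountain.",
    "Taxes in the kingdom are collected each spring.",
    "The dragon breathes fire when disturbed.",
]
top = retrieve("what does the dragon do", chunks)
# only the retrieved chunks go into the model's context window
prompt = "Context:\n" + "\n".join(top) + "\n\nQuestion: what does the dragon do?"
print(top)
```

Split the 500 pages into chunks, retrieve the few relevant ones per question, and the model never needs the whole book in context.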
>>105765808Update:
I was in the FP8 repo...
Worked fine in the original one.
>>105766284Just ignore it.
If you are too much of a snowflake for that, do two clicks and hide the posts or use a filter.
>>105766545I think the datasets themselves are unfiltered, but get filtered when they are used by almost all model providers
Mistral Large 3? Mistral-Nemotron open source?
What are these french fucks doing?
>>105765503fuck if they all do that no wonder they're all shit at anything nsfw, I guess this is doomed
>>105766864Updated model coming soon!
>>105766876why do you want your investor friendly assistant to use dirty words?
>>105766864>With the launches of Mistral Small in March and Mistral Medium today, it’s no secret that we’re working on something ‘large’ over the next few weeks. With even our medium-sized model being resoundingly better than flagship open source models such as Llama 4 Maverick, we’re excited to ‘open’ up what’s to come :) >May 7, 2025just a few more weeks... let's say, hmm, 2?
>>105766864Small 3.2 is the new nemo
>>105766864>Mistral Large 3?Probably 500B+ parameters MoE.
>Mistral-NemotronAlmost certainly ~150B parameters MoE.
>open source?Maybe, maybe not.
>What are these french fucks doing?Mistral Large is probably not done training yet and they aren't sure yet if it makes financial sense to open-weight a variation of the model they're using for their LeChat (i.e. Mistral Medium). I guess NVidia got the memo after they wrote that blog post a few weeks ago.
>>105766975>Probably 500B+ parameters MoE.I would prefer it to be Deepseek-sized but I'd take it.
>>105766900maybe, just maybe, I'd want them to use a mixture of dirty and safe words
I guess at least it makes most fiction summaries and paragraphs easy to spot, it's always written in the most sappy and over the top way
>>105766975>I guess NVidia got the memo after they wrote that blog post a few weeks ago.What happened? Missed that.
small 3.2 doesn't suck, it's pretty good actually when used with V3 tekken.
cydonia v4g and v4d are based on 3.2 small and they're okay
>>105767094https://developer.nvidia.com/blog/advancing-agentic-ai-with-nvidia-nemotron-open-reasoning-models/
>Advancing Agentic AI with NVIDIA Nemotron Open Reasoning Models>
>Enterprises need advanced reasoning models with full control running on any platform to maximize their agents’ capabilities. To accelerate enterprise adoption of AI agents, NVIDIA is building the NVIDIA Nemotron family of open models. [...]>New to the Nemotron family, the Mistral-Nemotron model is a significant advancement for enterprise agentic AI. Mistral-Nemotron is a turbo model, offering significant compute efficiency combined with high accuracy to meet the demanding needs of enterprise-ready AI agents. [...]>Try the Mistral-Nemotron NIM directly from your browser. Stay tuned for a downloadable NIM coming soon.
>finetune using idiot praises mistral
yup mistral sucks
>>105767172I'm using vanilla small 3.2 and it's good
sars if you redeem the shared parameters you can get lighting speeds of your Llama-4 scout model
>>105767172mistral makes the only half decent open weight models that aren't 600B or censored to shit
>>105767296I tried it on OR and it wasn't very impressive for RP, pretty sloppy and generic
>>105767374>mistral makes the only half decent open weight models that aren't 600B or censored to shitKind of sad it's true since I expected more european companies than just one to not be complete shit at this point.
For open-webui/web searches what'd be the best model if I only have 12GB VRAM? It couldn't still be nemo right?
>>105765503I wonder if it is the third bulletpoint that is actually more damaging. Maybe the model could even learn to call a cock a cock from context, but if everything has to be not too far from the regular distribution of tokens you will always get the assistant persona looking for a single objective truth, which kills the fun.
>>105767416How censored was it?
Let's go mistral! Mistral sucks!
>>105767477You said you would leave drummerfaggot.
>>105767527No they didn't?
Are hunyuan ggoofs broken already?
>>105767537>Drummerfaggot>theyTell me this thread doesn't gave a troon infestation.... I am so tired of this newspeak.
It's always Nemo.
The one and only good VRAMlet model that's still going to be used years from now.
>>105767574But the context sucks and it always seems to drift toward the same personalities.
>>105767456If not for creative shit, you can maybe try qwen 3 14b or gemma 3 12b
>>105767456Your best bet is probably Gemma 3 or Qwen 3
>>105767545The never have been function yet
>>105767699Did you generate this post using a broken quant?
What's the difference between regular Wayfarer and Wayfarer Eris Noctis?
>>105767858>Wayfarer Eris Noctisis a merge of Wayfarer and another model called Eris Noctis, which itself is a merge of multiple merges
>>105767839Anons have said before they have bots shitposting her using LLMs.
**[spoiler]Probably[/spoiler]**
>>105767871>let's just merge models until something happens
>>105767444Mistral's already started to show signs of lobotomized datasets, the EU is oppressive with copyright shit because of privacy law and it's surprising they haven't gotten totally slammed for it yet. Likely nobody else has the combination of connections and compute to get away with making a competitive product.
>>105767892i want to merge my cock with your mouth
>>105767946Fully transform copyrighted data into synthetic data changing style and form, train a model on that data.
>>105767839What I actually wanted to say is
>>Are hunyuan ggoofs broken already?>They never have been functioning yet
>>105767946copyright and privacy laws have nothing to do with each other.
If any of you faggot wants to try Hunyuan-A13B, here you go:
https://huggingface.co/FgRegistr/Hunyuan-A13B-Instruct-GGUF
Only works with https://github.com/ggml-org/llama.cpp/pull/14425 applied!
--flash-attn --cache-type-k q8_0 --cache-type-v q8_0 --temp 0.6 --presence-penalty 0.7 --min-p 0.1
>>105767167How do you do, fellow /lmg/sters?
>>105767892Well it did work one time
>>105768115i already tried 2 quants yesterday, from your experience if you use chat completion in ST and put a long complex card, does it first respond then think if think is prefilled?
Gemini-cli is pretty cool, I want a local gemini-cli.
>>105767871I see.
...so which one's better for adventures with bad-ends?
>>105767456Mistral Small 3.2 at iq4xs is the best model even on 8gb (3-4 t/s with all the optimizations) so for 12gb it's a nobrainer, alternatively some 32B model. if not v3.2, I had great experiences with 2501 version, the sex is great though the format can deteriorate.
>>105767574Nemo is like CharacterAI from 4 years ago, gives you the ah ah mistress but dumb as fish.
>>105767574Is it possible to improve nemo other than through sloptunes?
>>105768261>sloptunespresumably that means sloppy fine tuning?
Assuming this "Nemo still undefeated" is not just trolling, do you just pre-fill all the stuff it forgot in a 10k prompt?
>>105768324The datasets they use are so poorly curated, they end up adding more slop and refusals than they remove.
>>105768364>10k promptThat long of a prefill makes it braindead for me.
>>105768164I didn't use SillyTavern, but even with longer contexts I don't see the behavior you described. It sounds very much like you are using a wrong chat template.
The correct template is:
<|startoftext|>You are a helpful assistant.<|extra_4|>What is the capital of France?<|extra_0|><think>The user is asking for the capital of France. This is a factual question. I know this information.</think>The capital of France is Paris.<|eos|><|startoftext|>What about Chile?<|extra_0|>
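If you want to build that string programmatically, something like this (helper name is made up; the special-token strings are taken verbatim from the template above):

```python
def hunyuan_prompt(system, turns):
    """turns: list of (user_msg, assistant_msg) pairs; pass None as the
    assistant_msg for the final, pending turn."""
    out = "<|startoftext|>" + system + "<|extra_4|>"
    first = True
    for user, assistant in turns:
        if not first:
            out += "<|startoftext|>"
        out += user + "<|extra_0|>"
        if assistant is not None:
            out += assistant + "<|eos|>"
        first = False
    return out

p = hunyuan_prompt(
    "You are a helpful assistant.",
    [("What is the capital of France?",
      "<think>The user is asking for the capital of France.</think>The capital of France is Paris."),
     ("What about Chile?", None)],
)
print(p)
```

Anything that puts the think block before `<|extra_0|>` instead of after it would produce exactly the respond-then-think behavior described.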
>>105768255I'm sleeping on 3.2 because exllamav2 no longer works well with AMD
>>105766029believe in Sam
>>105768619If maverick is on the moon, o3 is on pluto.
>>105768619oai open model maybe in testing
>>105768556
>>105768677>General-purpose model with built-in tool-calling support.Isn't that basically all models now?
>>105768677on the one hand the OAI open model is strongly rumored to be a reasoner and this is not
on the other hand it's exactly as dry and safe as I would expect an OAI open release to be
>>105768619What a strange benchmark: o3 so far above everyone else (big doubt), gemini 2.5-pro and r1 almost identical (also doubt, even if both are good), qwen3 above old r1, old r1 and ds3 matched (they're not), qwq3 above sonnet-3.7, nope. only llama4 at the bottom eh? almost as if
>>105756358 was right "I just realized that the only use for l4 is being in marketing brochures of all other companies."
>>105768693Releasing a reasoner model would be a bad move because it raises hardware requirements. Running reasoner models on CPU fucking sucks
huawei pangu pro 72b-a16b
https://gitcode.com/ascend-tribe/pangu-pro-moe-model
https://huggingface.co/IntervitensInc/pangu-pro-moe-model
https://arxiv.org/pdf/2505.21411
>>105768435What? I guess all this Nemo shilling is just a Nemo tune spamming /lmg/.
>>105768837That's exactly why it would make sense for them to do though.
>>105768845nice, multilingual too
>>105768845>72b-a16bThe only thing worse than fuck huge moe models are medium size moes. This could have just been a 24b
>>105764901It's kind of funny that the only experience of erp/erotic fiction many people have is this purple style.
I wonder how many will think it's normal and will start writing like that themselves.
>>105768877instruct results
>>105768845Why the fuck is it just random people uploading it?
>>105768906>gitcode.com/ascend-tribefun game desu
>>105768798Judging by the term "ELO" and the scores, it looks like a chess benchmark and OpenAI put extensive effort into training o1 for chess.
But I agree, this benchmark seems flawed, at least in parts.
But at this point that post is as good as any other graph or table everyone markets their models with.
>>105768862Vanilla my boy.
>download llamafile
>download gguf and put it in the same dir
>shit just werks, auto-installs dependencies, detects and chooses what's best for the machine, no need to fuck around with cublas or HIP or whatever
What's the point of koboldcpp or oobabooga or anything again?
>>105768845My prediction is that out of hunminmaxierniepangu wave this one is the worst.
>>105768978In what way does koboldcpp not work like that?
>>105768978It is less dirty than llamafile or ollama.
>>105768906Gitcode repo is the one linked in the paper.
The paper talks about using Ascend GPUs.
Ascend Tribe profile says:
>Ascend Tribe>Huawei open-sources Pangu and Ascend-based model reasoning technology, opening a new chapter in the Ascend ecosystem
>>105769004The fact that the releases page has a bunch of different architecture-dependent binaries to download is already a big L. Wtf is oldpc.exe? And when you use it, you have to pick CBLAS or CuBLAS and batch size and other insignificant things yourself. Koboldcpp is for retards who don't know what a CLI is.
>>105768978So what's this about? A separate executable for every model? That sounds dumb.
>>105769067>I can't figure out what "OLD PC" means>I need the program to decide the batch size and everything else for me, but it's everyone else that's retarded>You can't run it on the command line because the github has prebuilt binaries for the disabledPretty weak bait desu senpai
>>105768619Qwen3-32B is pretty high there.
>>105768901Qwen3 is so stemmaxxed it's crazy
>>105769067bro, just download koboldcpp.exe and run it, it's not that complicated
>>105769369I have never asked an LLM to solve math, nor a local model to write code
Two questions.
Gf wants to try a local model. What would be the most user friendly open source ui? I dabbled with Jan, it looked simple enough. Opinions?
Also, she wants something that does speech-to-text. Is that a thing? Are there models that do that? On what kind of software?
Pic related, the controlnet is her.
>>105769352I quite like Qwen, they always deliver solid local models.
They only seem to lack when it comes to erotic roleplaying, which I don't do that much anyway.
>>105769566>Gf wants to try a local model. What would be the most user friendly open source ui? I dabbled with Jan, it looked simple enough. Opinions?Jan is simple if all you want to do is upload documents and chat to a model. Lack of configuration options gets frustrating.
>Also, she wants something that does speech-to-text. Is that a thing? Are there models that do that?whisper
>On what kind of software?whisper.cpp
>>105769582thanks m8
whisper only does "live" STT during chats, no? She would like to process a long audio file and get a text in the end.
>>105769566>Gf wants to try a local model. What would be the most user friendly open source ui? I dabbled with Jan, it looked simple enough. Opinions?lmstudio or llamacpp + sillytavern
>>105769629No. I regularly provide whisper with entire movies to get it to generate subtitles for me.
>>105769582NTA, but can whisper.cpp even output an srt file with silence/music cut out? Cause I'm just getting a continuous file that's like "00:00:00 -> 00:01.36: Yeah." Yes, I'm using VAD.
to the qwen users here:
https://huggingface.co/dnotitia
if you ever had the chinese character bias pop up while using the model, this fixes it, even if you use qwen as a chinese to english translator (having any chinese character in the context can madly trigger this issue) you will never see it gen a hanzi again
smoothie qwen is da bomb
>>105769735If you're using whisper v3, try going back to v2. It was much better at handling silences. No amount of processing or workarounds have made v3/turbo as good as v2 in that regard for me.
>>105769752Is this a bona fide shill? Like you can just grammar that shit out.
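If you'd rather handle it at inference time than with a grammar, you can also ban every token containing CJK codepoints via logit bias. A sketch of the detection step (the vocab/ban-list part is hypothetical, and the range list covers only the main blocks, not every CJK extension):

```python
# Unicode ranges for the main CJK blocks; NOT exhaustive.
CJK_RANGES = [
    (0x4E00, 0x9FFF),   # CJK Unified Ideographs
    (0x3400, 0x4DBF),   # Extension A
    (0x3040, 0x30FF),   # Hiragana + Katakana
    (0xF900, 0xFAFF),   # Compatibility Ideographs
]

def has_cjk(s: str) -> bool:
    return any(lo <= ord(ch) <= hi for ch in s for lo, hi in CJK_RANGES)

def banned_token_ids(vocab: dict) -> list:
    """Hypothetical: given {token_id: decoded_text}, return ids to ban
    (e.g. to feed into a server's logit_bias option at -inf)."""
    return [tid for tid, text in vocab.items() if has_cjk(text)]

print(has_cjk("hello 你好"))  # True
print(has_cjk("hello"))       # False
```

The tradeoff vs a grammar is doing the scan once at load time instead of constraining every decode step.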
>>105768241Wayfarer Eris Noctis is supposed to have more narrative styles and better lore adherence, it also claims to have a 1 million token context window.
For what its worth, it certainly seems to be better for porn than base wayfarer. The whole LE BAD THING HAPPENS gimmick gets old fast tho
>>105769807>Like you can just grammar that shit out.you are a shill for wanting a model that doesn't require slowing down token generation111!!!!!1!!!1!!1!!1!1!
the ultimate state of 4chan
>>105769821>it also claims to have a 1 million token context window.so does standard nemo's config file
>>105769806No, I mean like whisper.cpp literally gives a continuous file. See how there are no gaps. WhisperX can do it properly, but I'd like whisper.cpp to work since it can do Vulkan (I got an AMD card).
>>105769849whisperX uses wav2vec2 on top of whisper to align the audio to timestamps, plain whisper timestamps are garbage.
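As a stopgap you can at least post-filter the SRT and drop the filler cues. A rough sketch (pure heuristic; proper silence handling needs VAD or forced alignment, and the filler pattern here is just an example):

```python
import re

# drop cues that are empty or just filler words like "Yeah." / "Um"
FILLER = re.compile(r"^[\s.]*$|^(yeah|uh|um)[.!?]?$", re.IGNORECASE)

def clean_srt(srt: str) -> str:
    blocks = [b for b in srt.strip().split("\n\n") if b.strip()]
    kept = []
    for b in blocks:
        lines = b.splitlines()
        text = " ".join(lines[2:]).strip()  # lines[0]=index, lines[1]=timestamps
        if not FILLER.match(text):
            kept.append(b)
    # renumber the surviving cues
    out = []
    for i, b in enumerate(kept, 1):
        lines = b.splitlines()
        out.append("\n".join([str(i)] + lines[1:]))
    return "\n\n".join(out) + "\n"

sample = """1
00:00:00,000 --> 00:01:36,000
Yeah.

2
00:01:36,000 --> 00:01:40,000
Welcome back to the show."""
print(clean_srt(sample))
```

This won't fix the bad timestamps themselves, just hide the worst of them.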
>>105769849Hey, what model are you using? Does ggml-large-v3-turbo-q8_0 do french well?