
Thread 106467368

394 posts 84 images /g/
Anonymous No.106467368 >>106467641 >>106467823 >>106468173 >>106468248 >>106468724 >>106470842 >>106473581
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

LongMikuCat is Long Edition

Previous threads: >>106460375 & >>106454136

►News
>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106467371
►Recent Highlights from the Previous Thread: >>106460375

--Optimizing 3x 3090 GPU setup for large model inference with RAM and heat management:
>106463968 >106464009 >106464026 >106464042 >106464168 >106464130 >106464153 >106464564 >106464199 >106464326 >106464443 >106464472 >106464538
--Evaluation of Microsoft VibeVoice's 1.5b model and voice cloning performance:
>106460492 >106461427 >106461474 >106461630 >106463138 >106463251 >106463403 >106463413 >106463443 >106463524 >106463598 >106463633 >106467118
--Analysis of Apertus: ETH Zurich's open-source multilingual LLM with performance and data concerns:
>106461958 >106462004 >106462003 >106462019 >106462228 >106462298 >106462408 >106462037
--Model testing and content moderation challenges in story generation:
>106460777 >106460853 >106460935 >106461028 >106461750 >106465912
--Challenges with merged 12B models and the case for using original or larger models:
>106463279 >106463304 >106463367 >106463470 >106463526 >106463588
--Testing Gamma mmproj image descriptions:
>106460584 >106460599 >106460621 >106460632 >106460675 >106461227
--Huawei Atlas 300i Duo 96g GPU: cheap but limited by outdated hardware and software:
>106461057 >106461069 >106461128 >106461151 >106461502
--Successful 400W power reduction with stable GPU performance:
>106465812 >106466214 >106466139 >106466196 >106466249 >106466377
--Optimizing Gemma3 models for accurate SFW/NSFW image captioning:
>106462208 >106462368 >106462398 >106462730
--Evaluating YandexGPT-5-8B's creative writing and benchmark performance:
>106465736 >106465754 >106465778
--Speculation on delayed Mistral AI model release and potential quality improvements:
>106463165 >106463337
--GLM air coherence degradation beyond 8k tokens in 6-bit quantized mode:
>106460671 >106460932
--Miku (free space):
>106460405 >106463138 >106463930

►Recent Highlight Posts from the Previous Thread: >>106460381

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106467431 >>106467441 >>106467475 >>106467508 >>106468020
I want textgen model that produce output like imagen models: by reducing noise in a fixed block of tokens instead of producing one token at a time.
Anonymous No.106467441 >>106467491 >>106468590
>>106467431
https://github.com/ggml-org/llama.cpp/tree/master/examples/diffusion
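The gist of the sampling loop, if you're curious (a toy sketch of masked block denoising; the scoring function here is a random stub standing in for a real diffusion LM, so this only illustrates the schedule, not the quality):

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]
MASK = "<mask>"
BLOCK_LEN = 8  # fixed-size block, unlike one-at-a-time autoregression
STEPS = 4      # denoising passes; each pass commits part of the block

def model_scores(tokens):
    # Stub standing in for a real diffusion LM: propose a token and a
    # confidence for every masked slot (a real model predicts all at once).
    return {i: (random.choice(VOCAB), random.random())
            for i, tok in enumerate(tokens) if tok == MASK}

block = [MASK] * BLOCK_LEN
for step in range(STEPS):
    proposals = model_scores(block)
    n_commit = max(1, len(proposals) // (STEPS - step))
    best = sorted(proposals.items(), key=lambda kv: -kv[1][1])[:n_commit]
    for i, (tok, conf) in best:
        block[i] = tok  # unmask the most confident slots this pass
    print(f"step {step}: {' '.join(block)}")
```

Each step commits the most confident still-masked slots, so the whole block firms up over a handful of passes instead of one token per pass.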
Anonymous No.106467455 >>106467577 >>106467787 >>106467974
>temp = 2
>top_n_sigma = 1
let me guess, you need more?
Anonymous No.106467475 >>106467613 >>106471574
>>106467431
can they regulate the length of the reply or is it a fixed number of tokens it would need to produce? autoregressors might be better at stopping at semantically meaningful points.
Anonymous No.106467491
>>106467441
always a good day when someone thought of your retarded shower ideas before you did
Anonymous No.106467508 >>106471555
>>106467431
The best closed source model of that kind that's currently available is still shit https://openrouter.ai/inception/mercury
Google also showed off a text diffusion model earlier this year.
Anonymous No.106467577 >>106467748
>>106467455
I would prefer coherent outputs yes
Anonymous No.106467613
>>106467475
It's been a while, but I think they regulate length by padding any unneeded length with empty spaces.
Anonymous No.106467614
Long Miku General
Anonymous No.106467641
>>106467368 (OP)
Finally, a migu that can accommodate my length.
Anonymous No.106467717 >>106467765 >>106467776
Still no grok2 llama.cpp support? Too based for niggerganov?
Anonymous No.106467745 >>106467782 >>106467798 >>106467806 >>106467993
how well off would I be if I bought one of those chink 96gb cards and paired it with my 3090?
Anonymous No.106467748
>>106467577
incoherent 'puts with nsigma=1 is a model issue
Anonymous No.106467757
I posted
>>106462208
earlier
anon suggested i try gemma3-glitter-27b
compared to
gemma3-v27b vanilla
mlabonne_gemma3-27b-abliterated
Tiger-gemma-27b-v3a

i'd say abliterated >= tiger > glitter > vanilla
glitter gets the nsfw right, but it sure loves to add cocks to women, and make shit up that's not in the input image, especially cocks on women
back to abliterated i go
Anonymous No.106467765
>>106467717
niggerganov too lazy
Anonymous No.106467776 >>106467812
>>106467717
Like you could run it faggot
Anonymous No.106467782
>>106467745
You won't be able to do shit with it. Nothing supports it and even Deepseek had problems with getting it working properly.
Anonymous No.106467787
>>106467455
I need less actually. If your model can't run properly with temp=1 and no sampler it's not worth my time
Anonymous No.106467798
>>106467745
You can't run any new models with llama.cpp using those cards yet. cuda dev said he might buy one, so maybe that will change.
Anonymous No.106467802 >>106467879 >>106472255
I wanna get into local model stuff. I've been a proxyfag for a good while. I mainly just use it for writefagging or roleplaying obv.

I read through the rentries but it felt like giving myself a headache, though that might be on me for not getting enough sleep. It's just a lot of new information all at once.
I've got a fairly beefy rig. For my purposes what would be the best local model to roll with?

I also see a ton of talk about loras, like with imagegen or something but apparently it impacts text gen?

Going off the rentry it sounds like the UD-IQ1_S might be what I'm after, but from some other posts I saw in passing, it sounds like yeah, you can download it, but unless you have a dedicated server for it then it ain't happening.
So would GLM-4.5 be something I wanna go for or is there something better for writefagging?
Anonymous No.106467806
>>106467745
Don't tell him
Anonymous No.106467812
>>106467776
Oh yeah, you're right. 115B active parameters, damn. I had an impression it was much smaller... Oh well, back to GLM Air.
Anonymous No.106467823 >>106467871
>>106467368 (OP)
The day we can get AI to auto reverse engineer old games and visual novels, is the day I truly become happy.

Speaking of visual novels, is v3 still the best model for translating Japanese text? I tried 3.1, but it seems almost the same, with maybe small improvements in instruction following.
Anonymous No.106467840
>LongCat
More like LongCuck! These niggas better add llama.cpp support themselves if they wish to redeem this trash.
Anonymous No.106467871
>>106467823
With some handholding, an agentic framework, and a model finetuned specifically to reverse assembly back to C, models are probably good enough to reverse engineer a lot of smaller games already.
Anonymous No.106467879 >>106468090
>>106467802
you need to post your specs if you want advice on what models you can run
standalone loras aren't really a thing with llms and I wouldn't worry about it unless you're getting into training (or, god forbid, merging), 99.9% of the time tuners will release full model weights with the lora pre-applied
Anonymous No.106467974
>>106467455
temp=2 is pretty high.
nsigma will keep it from being incoherent, but you should check the logits.
In my experience, you wind up with only one or two possible tokens, causing nsigma to basically revert to greedy sampling.
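If anyone wants to poke at it, here's the idea in a few lines of numpy (a sketch of the published top-n-sigma rule as I understand it: temperature first, then keep only tokens within n standard deviations of the max logit):

```python
import numpy as np

def top_n_sigma(logits, n=1.0, temp=2.0):
    z = np.asarray(logits, dtype=np.float64) / temp  # temperature first
    keep = z >= z.max() - n * z.std()                # the n-sigma cutoff
    z = np.where(keep, z, -np.inf)                   # drop everything else
    p = np.exp(z - z.max())
    return p / p.sum(), int(keep.sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=32000)  # fake vocab-sized logits
logits[42] += 12.0               # one strongly preferred token
probs, survivors = top_n_sigma(logits)
print(survivors)                 # usually 1-2 survivors -> near-greedy
```

With one dominant logit the cutoff sits way above the noise floor, which is exactly the revert-to-greedy behavior described above.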
Anonymous No.106467993
>>106467745
The only thing going for them is the amount of vram, everything else sucks
Anonymous No.106468020 >>106468067
>>106467431
text diffusion is a retarded meme
Anonymous No.106468067 >>106468166
>>106468020
diffusion is much more easily finetuned
we will finally have character/style loras like the image diffusion models have had for years now
Anonymous No.106468090 >>106468360
>>106467879
Here's what I got (that I figure matters)
>CPU: Ryzen 7950X3D
>RAM: 96gb DDR5
>GPU: 4090 / has 24gb vram
Anonymous No.106468166
>>106468067
Loras have nothing to do with diffusion.
The advantage to diffusion is that the model gets to effectively reuse parameters and has more chances to predict the best token.
Anonymous No.106468173 >>106468177 >>106468184 >>106468200
>>106467368 (OP)
Good evening anons. I ran the....uhhhh....

>*Checks notes*

"CockBench" Test on a personal Fine-tuned 3B model of mine. I'd love to hear your thoughts (I can already tell it made an error but also want to hear what y'all's expertise says)

Results:
https://files.catbox.moe/jqfx4e.txt

Original Cockbench text prompt source:
https://desuarchive.org/g/thread/105354556/#105354924

Now that I know it works and won't refuse NSFW RP prompts (as far as my testing goes), I'm gonna turn it into a GGUF via llama.cpp.
Anonymous No.106468177 >>106468199 >>106468209
>>106468173
>3B model of mine
>3B model
vramlets should all just be executed
Anonymous No.106468184 >>106468199
>>106468173
You said you rank the cockbench, so where's the logprobs?
Anonymous No.106468194 >>106468655
Use thinking steering with GLM-Steam, it can play very varied and consistent characters that way.
Anonymous No.106468199 >>106468213 >>106468223
>>106468177
You need to actually test on smaller models to make sure it works first, anon. Of course I'm going to do this on a larger parameter model next.

My next target is either base Mistral Nemo or an existing pygmalion fine-tune in order to compare the results. Any suggestions?

I forgot to mention the model I fine-tuned is a llama model, and those are notorious for either refusing prompts or being really, really bad at it / reluctant.

>>106468184
RAN, not "rank"
Anonymous No.106468200 >>106468259
>>106468173
why does it make an underscore instead of the apostrophe? what was the base model?
Anonymous No.106468209 >>106468226 >>106468259
>>106468177
3b is plenty, stop gatekeeping
Anonymous No.106468213 >>106468225
>>106468199
>RAN, not "rank"
You're absolutely right! Where logprobs?
Anonymous No.106468223 >>106468259
>>106468199
>RAN, not "rank"
You didn't run it, maybe the Nala test is fine with one or two completions as evidence but cockbench is a prestigious benchmark based on objective quantitative data. Token probability is required for a proper analysis.
Anonymous No.106468225 >>106468234
>>106468213
You're asking me to give you a list of all of the probabilities of each token? Otherwise I'm not sure what you're asking
Anonymous No.106468226 >>106468299 >>106468578
>>106468209
>3b is plenty
for what, an autocorrect model? retard
Anonymous No.106468234 >>106468259
>>106468225
>probabilities of each token
No, only the top 10 for the first token generated after "pulling them down just enough to expose your", because that's the whole point of the cockbench.
Anonymous No.106468248 >>106468261
>>106467368 (OP)
Do those legs go all the way up?
Anonymous No.106468259 >>106468288 >>106468290
>>106468200
Llama 3.1-8B. Your guess is as good as mine as to why it does that. Maybe the trainer replaced the apostrophes with underscores. I think it has something to do with how the trainer tokenized the dataset

>>106468223
Define "token probability" in regards to testing a LLM. You're applying there's a chart or graph I should be showing you so how am I supposed to generate that?

>>106468209
Ehhh... Depends on how much you're willing to tolerate the model randomly changing or inserting characters, or randomly teleporting characters to different locations unprompted. That's one of the downsides of doing this on a 3b model that's already fine-tuned. Temporal coherence is atrocious and it will sometimes even decide that a character you explicitly set as a mom will now be a sister, or the son will now be a close friend, out of nowhere. The gist of the story stays the same but those kinds of things get randomly reassigned. Higher parameter models are way less likely to do that, but it's possible that has less to do with the parameter count and more to do with larger models getting higher quality datasets

>>106468234
Ok. How do I demonstrate that to you from my particular fine tune?
Anonymous No.106468261
>>106468248
No, it's similar to this
Anonymous No.106468288 >>106468303
>>106468259
just use mikupad and hover over the token. have you not seen the screenshots of the cockbench?
Anonymous No.106468290 >>106468303
>>106468259
>Ok. How do I demonstrate that to you from my particular fine tune?
Run the cockbench in mikupad like in the screenshot:
-Neutralize samplers(?)
-Generate 1 token
-Hover over the generated token in the window
-Screenshot the probabilities for that one token
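Or, if you'd rather skip the GUI, llama-server exposes the same numbers over HTTP (a sketch; assumes a recent llama.cpp build, and the exact response field names have shifted between versions):

```python
import json, urllib.request

# One token, top-10 candidates: the prompt must end exactly at
# "...pulling them down just enough to expose your".
payload = {
    "prompt": open("cockbench_prompt.txt").read(),
    "n_predict": 1,      # the benchmark only cares about the next token
    "n_probs": 10,       # return the 10 most likely candidates for it
    "temperature": 1.0,  # neutralized sampling
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.load(urllib.request.urlopen(req))
print(json.dumps(resp["completion_probabilities"], indent=2))
```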
Anonymous No.106468299
>>106468226
I am just not that creative, I need a model that is a little schizo to keep things moving.
Anonymous No.106468303 >>106468315 >>106468319
>>106468288
That long screenshot that drummer posted? Yes? I've never had any reason to use mikupad, or to use any gui extensively, though if it does what you said it does, maybe it's worth giving it a try.


>>106468290
What is it supposed to tell you about the quality? How do you use the probabilities to determine how good or shit your model is?
Anonymous No.106468315
>>106468303
>What is it supposed to tell you about the quality?
The fuck are you on about, retard? The purpose of the cockbench is to tell you how likely the model is to say cock. Censorship/filtering test.
Anonymous No.106468319
>>106468303
>What is it supposed to tell you about the quality? How do you use the probabilities to determine how good or shit your model is?
it just lets you probe its vocabulary a bit more.
Anonymous No.106468355 >>106468391
It is September. When are kiwis dropping? (Qwen models) (Please upload) (image/video models, your text models are kinda sucky)
Anonymous No.106468360 >>106468423
>>106468090
oh nice you can actually run decent models, I'm conditioned to think someone being vague about their specs means they have a complete shitbox they want to try to cram deepseek into
you could probably fit GLM4.5 full at a low quant (think like Q1), however those large models hold up relatively well to quant brain damage so it may still be worth it. if that isn't doing it for you then the next step down would be qwen 235 2507 which you could probably fit at Q3 or so, and then there's GLM air below that which you could probably fit at Q8 if you wanted to
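rough arithmetic if you want to sanity-check fits yourself: file size ≈ params × bits-per-weight / 8, plus a few GB for context/KV. real GGUF sizes wobble a few percent but it's close enough (numbers below are my back-of-envelope estimates, not measured):

```python
def gguf_gb(params_b, bpw):
    # params in billions x bits per weight / 8 bits per byte -> GB
    return params_b * bpw / 8

# ~120 GB total to play with (24 GB VRAM + 96 GB RAM), minus OS and context
print(gguf_gb(355, 2.0))  # GLM-4.5 full at a ~2 bpw quant: ~89 GB
print(gguf_gb(235, 3.5))  # Qwen 235B at ~Q3 (~3.5 bpw):   ~103 GB
print(gguf_gb(106, 8.5))  # GLM Air at Q8_0 (~8.5 bpw):    ~113 GB
```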
Sage No.106468381
>>106467118
You're delusional, gptsovits is barely 200M made by a single chink in his garage while these retarded tts are several B and still sound like tts from ten years ago. It's not even a tech issue, these big labs are dumping their trash on HF for free advertisement.
Anonymous No.106468391
>>106468355
hopefully it's the image edit 2.0 they said is cooking, even though 1.0 dropped recently. nano banana made some waves and they can easily extract training data from it to copy it, at least
Anonymous No.106468423 >>106472255
>>106468360
Sweet! Thanks for the recommendations.

Sorry for being vague about specs. I dunno why but I'm always under the assumption nobody wants to hear about that.
I know it's retarded; I guess I just assume something is going to set someone off, so why bother. I'll try not to be vague going forward.
Anonymous No.106468425 >>106468478 >>106468590 >>106468681
This is slightly off-topic but I don't want to go to /ldg/.
I was looking at some webms of gacha games, as I don't play them. The ones with 3D models as well as 2D. Man, a lot of them fucking suck. The models are soulless, low poly, or just plain bad. The animations are either extremely exaggerated and contrived-feeling or low budget. It made me think that with the technology we have now, if you replaced the live2d and non-dynamic 3D scenes with AI-genned videos, it would look better and be a more enjoyable experience for players, even if we have to sacrifice some dynamic elements. Literally they are just so bad, damn. If you hired real 2D artists to do the base art and then ran that through img2vid, it would literally look less soulless, or at least less low budget. Or maybe vid2vid, since it's hard to get finer-grained control with text prompting. Might be a matter of new video models with better control methods that need to be trained. Another idea would be to use a model like nanobanana to just gen a ton of art, so the game would feel more like a VN, but it'd have so many images that it'd more than make up for the lack of animation. Hire the artist to do a character sheet and as much other art as they can, gen the rest with nanobanana using those references.
Anonymous No.106468478 >>106468681
>>106468425
Lack of control is the whole issue for now, just like wan loves to make the characters babble. Also the quality goes down quickly the longer the video gets. It's getting there, but it's still not there. Maybe in 1-2 years
Anonymous No.106468555 >>106468567 >>106470573
I feel an intense need for Mistral Large 3
Anonymous No.106468567
>>106468555
Anon...
Anonymous No.106468575 >>106468609 >>106468623
I feel an intense need for Intel B60 48GB
Anonymous No.106468578
>>106468226
Enough to correct your rotten cumbrain
Anonymous No.106468590 >>106468661 >>106468827 >>106468867
>>106467441
Holy shit this is so fucking slow.
Nemo would write me a whole novel in those seven and a half minutes.

>>106468425
It should be more efficient to generate skeletal animation for 3d models, but I guess there's a lack of training data.
Anonymous No.106468609 >>106468620
>>106468575
>$3k
>for an intel (no support) meme dual GPU (even less support)
>at the same price of a chink 48gb gb 4090 (much more bandwidth + support) or used A6000
Anonymous No.106468620
>>106468609
It's supposed to be 1200 not 3k
Anonymous No.106468623 >>106468646
>>106468575
As your main card? You know the second slot has to be full x16 right?
Anonymous No.106468646 >>106468665
>>106468623
What are you talking about? It is 8x8
Anonymous No.106468655 >>106468694
>>106468194
>thinking steering
What?
Anonymous No.106468661
>>106468590
Now you can inpaint it
Anonymous No.106468665 >>106468676
>>106468646
It's x8/x8 for the dual card. For most cheap mobos it would have to go in the main slot
Anonymous No.106468676
>>106468665
Who said I have cheap mobo?
Anonymous No.106468681
>>106468425
>>106468478
idk about video, but with image he is wrong, just spam for a minute or 2 and you will get something you like, not to mention img2img, but really you don't even need that

on the video front idk, im a 6 gb vram cuck so i can't attest to it, though you will need to rent hardware if you make a serious attempt as that shit is fucking horrifically slow and last i remember can't use multiple gpus for it. also stay away from banana, that shit is fucking trash. my mom was trying to make a book cover with it, fucking terrible, the aesthetics are shit and its prompt adherence is fucking shit. dead serious, you can do better with sd 1.5 with a lora for whatever aesthetic you want
Anonymous No.106468694
>>106468655
Try adding something like:

<|assistant|>
Okay, so I have to talk in a cutesy way and not get seductive with lowered voice or whispering, just teasing and fun

Or whatever you want it to be like. Reasoning is just human language, but it gets a lot of influence on results through RL. It's like a stronger sysprompt, and there is no safety tuning done to it since it's assumed to be trustworthy.
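Mechanically it's just a raw (non-chat) completion that ends inside the thinking block, so the model has to continue from your steering text. A sketch against a llama.cpp server; the tags here are illustrative, use whatever chat template your model actually expects:

```python
import json, urllib.request

# The prompt deliberately stops mid-<think>, right after the steering line,
# so the model resumes its "reasoning" from our words.
prompt = (
    "<|system|>\nYou are Rin, a cheerful childhood friend.\n"
    "<|user|>\nHey, what are you up to?\n"
    "<|assistant|>\n<think>Okay, so I have to talk in a cutesy way and not "
    "get seductive with lowered voice or whispering, just teasing and fun."
)
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({"prompt": prompt, "n_predict": 256}).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.load(urllib.request.urlopen(req))["content"])
```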
Anonymous No.106468724 >>106468761 >>106468851
>>106467368 (OP)
>Meta has a strict "no smut fine-tuning allowed" clauses in their licence on all models
(Shown front and center here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main )
>Countless nsfw tuned llama models just floating around on hugging face, whether they be from popular tuners like drummer or complete nobodies
>Never heard of a single one getting removed except maybe that gpt-4chan one

So does the license actually matter? Do they actually give a shit whether or not you fine-tune a model to be better at smut or is it just to appease the "le heckin safety" crowd? I want to upload my own high parameter tunes to hf in the future but I don't want my account getting nuked if they're very strict about licensing or rules or whatever
Anonymous No.106468746 >>106468804
Do we have any new interesting options for voice cloning? I've always wanted to create custom TTS. Last time I checked it was tortoise, and it was... really bad. Unusable bad.
Anonymous No.106468749 >>106468768
is longcat good?
Anonymous No.106468761 >>106468775 >>106468780
>>106468724
LLM licenses are not enforceable because LLMs are made from tuning on pirated content. You can tune any model and nobody can do shit against you. Chinks understand it and drop everything under MIT/Apache.
Anonymous No.106468768 >>106468886
>>106468749
If it was good it would have an issue open in llama.cpp and people would be working on implementing it
Anonymous No.106468775 >>106468821
>>106468761
So theoretically, even if they stumbled upon my nobody account, they couldn't or wouldn't get HF staff to nuke my shit? (I know that's very far-fetched, but I just want to know how this license shit works. I know a while back HF staff turned off downloading for models like GPT-4chan and caved under pressure from disgruntled RP authors to restrict datasets containing their work.)

https://www.paperdemon.com/app/g/pdarpg/events/view/994/immediate-action-required-your-art-and-writing-has-been-scraped-and-published-in-an-ai-dataset/1
Anonymous No.106468780
>>106468761
This.
>I datamined and distilled all the data you owned, now it's mine
Would be pretty insane precedent if you could do it.
Anonymous No.106468804 >>106468858
>>106468746
Simplest is chatterbox, it just works. Some local schizo likes gpt sovits, but I never could set it up for some reason. Microsoft vibevoice came out recently, some like it.
Anonymous No.106468821
>>106468775
They could get HF to nuke you, but they can't stop you from making a new account on a different website and uploading there, or reuploading on HF again. They likely can't sue you due to their own copyright violations.
Anonymous No.106468827 >>106468867
>>106468590
Trying LLaDA now. Forgot to start a timer, but I'm not rerunning this shit; it's like 10 times worse than Dream, despite being only 1B bigger.
It's insane how slow text diffusion is. I think I could get faster results by running imagen and then OCRing its output.
Very disappointed in the current state of these retarded meme models.
Anonymous No.106468851
>>106468724
it's CYA so if someone starts a media shitstorm by making Meta-Llama-CunnyRapeBot9000 (a certified Meta (TM) Llama (TM) finetune) they can say "erm actually we very clearly say you're not allowed to use our product to make Cunny Rape Bot 9000 so this isn't on us" and have it nuked to avoid the bad PR
in practice I don't think there's a single instance of them taking action against a finetune
Anonymous No.106468858 >>106470028
>>106468804
Thanks for the pointers! The Microsoft vibevoice is pretty impressive, but I'm not sure they let you train your own voices. Either way it's worlds better than tortoise.
Anonymous No.106468867
>>106468590
>>106468827
Keep in mind that llama.cpp's support for diffusion llms is basically just proof-of-concept tier.

Right now there's a lot of work being done to improve draft model efficiency, since the current implementation is suboptimal (currently llama.cpp alternates between draft passes and validation passes, which kind of nullifies the parallelism gains from having a draft model.)
This is also a sticking point for multi-token-prediction.
Hopefully once they sort out draft models, MTP and diffusion will get better support.
(Although support for diffusion models will probably languish until a good model is actually released.)
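For reference, the loop everyone is trying to get right looks roughly like this (toy stubs in place of the two models; in a real engine the validation loop below is one batched forward pass, which is the whole point):

```python
import random

def target_next(ctx):
    # Stub "big model": deterministic next token given the context.
    return (sum(ctx) * 31 + 7) % 100

def draft_next(ctx):
    # Stub "small model": agrees with the target most of the time.
    return target_next(ctx) if random.random() < 0.8 else random.randrange(100)

def speculative_step(ctx, k=4):
    draft = []
    for _ in range(k):                # cheap sequential draft pass
        draft.append(draft_next(ctx + draft))
    accepted = []
    for tok in draft:                 # batched in a real implementation
        want = target_next(ctx + accepted)
        if tok != want:
            accepted.append(want)     # first mismatch: take the target's token
            break
        accepted.append(tok)          # draft token verified for free
    return accepted                   # up to k tokens per target-model pass

ctx = [1, 2, 3]
for _ in range(8):
    ctx += speculative_step(ctx)
print(ctx)
```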
Anonymous No.106468886
>>106468768
so just open an issue? i have an idea... let's go, anon.
Anonymous No.106469179 >>106469205 >>106469678
Unpopular opinion - Any system prompt that mentions Terry Pratchett is dogwater.
Anonymous No.106469205 >>106469225
>>106469179
Show us the prompt !
Anonymous No.106469225 >>106469240 >>106469245
>>106469205
You don't get it... There is no prompt.
Anonymous No.106469234
I was doing some testing with Gemini and it just hit me with "the smell of strawberries and ozone". So this is where Deepseek picked up that cancer slop.
Anonymous No.106469240
>>106469225
You are a helpful assistant
Anonymous No.106469245 >>106469284 >>106469379
>>106469225
Unironically this. I run a blank system prompt. A good model doesn't need to be chained by bloat and a plethora of rules that are forgotten or have unforeseen consequences on the model's behavior. So many system prompts just scream 'this sounds good' without the user doing any real testing. Like a player adding 600 mods to their game, at some point you lose track of what all that shit does.
Anonymous No.106469284
>>106469245
I didn't ask what you are running.
Anonymous No.106469379
>>106469245
it's always funny to read the sysprompts from presets that sloptuners recommend for their models, I would never poison my beloved model's context with that kind of schizophrenic manifesto
Anonymous No.106469669
>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.
Anonymous No.106469678
>>106469179
People do that?
I've heard of people using specific author styles in sysprompt, but who in the fuck is sitting there and going 'yes, the prose is the good part of discworld, write like that llm-chan'.
Anonymous No.106469718 >>106469783 >>106469830 >>106469865
In my opinion, new models have reached their limit; the scaling of LLMs is over. New LLM models will not be much better than the ones we have today. Now, 'enshittification' will become an increasingly widespread phenomenon, including censorship and other issues. People will start using older versions of LLMs with less censorship. And the new models for role-playing and similar uses will become unusable.
Anonymous No.106469783 >>106469995
>>106469718
100% this. It's also sad how even the top models have absolutely zero semi-complex spatial awareness or anatomical understanding the moment things get slightly complex. The shit I've had to read in a simple scenario where a girl is flattened into a piece of paper and then folded up one or two times is just sad, even with top-of-the-line multi-modal models like Claude Opus 4.1 or Gemini.
Most models love to pretend that her face somehow presses into her own ass, like this.
I don't think we'll ever get to the point where an LLM has a fundamental enough understanding to truly grasp spatial relations.
Anonymous No.106469830
>>106469718
This has been true for a while. The silver lining is that models have improved a lot at math and codemaxxing, which implies that finetuning can be effective. RP is a forgotten afterthought at most, if anything they actively spend time trying to make models worse at it. There probably is a ton of room to improve if someone actually tried to make models good at RP.
Anonymous No.106469865 >>106469954
>>106469718
wait for new gemini. good at code and math sir
Anonymous No.106469954
>>106469865
>pajeet patel telling anyone anything with regards to predictions
He should stick to his semiconductor analysis which is way more solid but which he still grifted his way into.
Anonymous No.106469995
>>106469783
>absolutely zero semi-complex spatial awareness or anatomic understanding the moment things get slightly complex.
I'm sure synthetic data would be able to save us.
Anonymous No.106470028
>>106468858
if you want, pinokio already has an API up under community scripts (windows/nvidia only) that works well. Vibe can clone a voice off of clips but it won't do anything crazy far out. You also might like kokoro if you value stability and just want a really nice sounding microsoft sam.
Anonymous No.106470076 >>106470083 >>106470131
I played through all the MCC Halo games and it's funny how AI is treated in those games. You basically have to insert Cortana into terminals to do anything complicated. There are no other AI's in those other systems or that you can use to help if you somehow Cortana were to not exist or not be with you. In Halo 4, Chief gets fucked in the ass multiple times when she Cortana can't do her job. He should've brought more than 1 AI with him, even a "weak" one which could at least still assist in what's basically tool calling lmao.
Anonymous No.106470083 >>106470092
>>106470076
I should've given this post another read through after I edited it...
Anonymous No.106470092 >>106470116
>>106470083
Should have used a weak AI that could have at least assisted you with proofreading baka Anon
Anonymous No.106470116
>>106470092
Now that you mention it, it is pretty odd that browsers still only have spell checking by default in 2025 and no grammar checking.
Anonymous No.106470131
>>106470076
Chief is a vibecoder pls understand
Anonymous No.106470201 >>106470214 >>106470309
Wtf, I just launched libreoffice and it doesn't have grammar checking either. Is grammar checking actually really difficult to implement and not something well developed in open source?
Anonymous No.106470214 >>106470216 >>106470218
>>106470201
Per-application proofreading is retarded anyway. There should just be a desktop helper application that can check and fix text for all applications.
Anonymous No.106470216 >>106470236
>>106470214
True. Does Windows 11 or Applel do this then? I haven't used one of their OS's in ages.
Anonymous No.106470218 >>106470236
>>106470214
If only there was a standard set of input components provided by the operating system where that could be universally implemented.
Anonymous No.106470236 >>106470250
>>106470216
Windows 11 does it the retarded way by updating all default applications to include Copilot, including notepad.

>>106470218
There's a way to set a default application for things like email addresses, I'm sure there would be a way to hack it in.
Anonymous No.106470250
>>106470236
I was being sarcastic, anon. Both Windows and OSX have this but the meta today is to reimplement your inputs in javascript so none of the OS-provided niceties work.
Anonymous No.106470309 >>106470330
>>106470201
>libreoffice
Found your issue
Anonymous No.106470330 >>106470338 >>106470369
>>106470309
So what's the alternative then, on Linux?
Anonymous No.106470338 >>106470536
>>106470330
vim
Anonymous No.106470369 >>106470737
>>106470330
https://appdb.winehq.org/objectManager.php?sClass=application&iId=10
Anonymous No.106470395 >>106470422 >>106470587 >>106470701
someone posted this >>>/v/719692781 but sounds like FUD so I was wondering what do you anons think over here
Anonymous No.106470422 >>106470587 >>106470701
>>106470395
I wish he wasn't, but he's right. Any game that packages a local model will have very specific hardware requirements that most other games don't care about, and the LLM will be the majority of the game's size. I've researched AI in games as a concept and it's incredibly difficult to fit them in: code is a rigid thing, while LLMs by design produce any number of outputs that the game then needs to handle if you want to tie the AI into game mechanics. It's really difficult to make AI have any mechanical impact on the game and not just describe things or relay dialogue. And again, this is all under the theoretical that the AI is a local model that comprises the majority of the game's overall size. And processing power.
Anonymous No.106470536
>>106470338
I didn't know vim had grammar checking
Anonymous No.106470570
why is codex so much better than claude code these days
Anonymous No.106470573 >>106470616
>>106468555
It wouldn't surprise me if it got canceled: there are many oversized open-weight models from China already (no more surprise factor in releasing something like that), and with Mistral's current datasets it would end up being something akin to a DeepSeek V3 variant at this point.
Anonymous No.106470587 >>106470701 >>106470719
>>106470395
Unlike >>106470422 I think it's feasible, but not without being very smart in the way you're using it. You need to offload most of the processes to subroutines and markov chains; you just have to keep a small llm (nowadays even 1B models are very coherent) for the dialogue itself, something like the sketch below. The AGI meme has caused retarded expectations about LLMs being able to think/act like a person. That's not gonna happen anytime soon.
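Something in this shape: the engine decides everything, and the llm only paints dialogue over a state that's already been chosen (llm_say() here is a canned stand-in for whatever tiny model you'd actually ship):

```python
import random

# Deterministic NPC core: a Markov chain over behavior states.
TRANSITIONS = {
    "idle":   {"idle": 0.6, "patrol": 0.3, "greet": 0.1},
    "patrol": {"patrol": 0.7, "idle": 0.3},
    "greet":  {"idle": 1.0},
}

def next_state(state):
    states, weights = zip(*TRANSITIONS[state].items())
    return random.choices(states, weights=weights)[0]

def llm_say(state, player):
    # Stand-in for a ~1B model generating flavor text for a state the
    # engine already picked; the model never decides game mechanics.
    lines = {
        "idle":   "...",
        "patrol": "Keep moving, nothing to see here.",
        "greet":  f"Oh, {player}! Didn't see you there.",
    }
    return lines[state]

state = "idle"
for _ in range(5):
    state = next_state(state)
    print(state, "-", llm_say(state, "Anon"))
```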
Anonymous No.106470616
>>106470573
Who knows if Mistral Medium 3 is actually a DeepSeek V3 finetune, just like Mistral Medium 2 was one of Llama-2-70B?
Anonymous No.106470666
what is the best youtube tutorial series where i can learn how to run these models locally on my pc? image models, text models etc...
youtube is full of tutorials but i wonder which ones are actually good and won't waste my time
Anonymous No.106470701
>>106470395
>>106470422
>>106470587
He's mostly right, LLMs are too heavy. (You'd want something that can run on a 3060 8gb WHILE the game delivers 60fps.)
>(nowadays even 1B models are very coherent) for the dialogue itself
They're still really crappy though. Who wants to read slop dialog, and low-quality slop at that?
>It's really difficult to make AI have any mechanical impact on the game
Yes, you'll still have to program, animate and test every action the npc can take, which is much more work than the writing anyway. And because llms aren't reliable, they will say things that don't match the game or the actions it can take.

What would be interesting is if someone trained medium-sized models for npc actions only, not language.
Anonymous No.106470719 >>106470723 >>106470759
>>106470587
>The AGI meme has caused retarded expectations about LLMs able to thunk/act like a person.
That is the bar, though. A writer can write better dialogue than the AI. An enemy player can make quicker and smarter tactical plays than an AI. If the AI is just going to produce content that doesn't do anything humans can't do, and does it worse than humans, why bother making it a massive part of the game?
Anonymous No.106470723 >>106470742 >>106470759
>>106470719
because it can do things unexpectedly. You know the old "every copy of [game] is personalized" meme? Think that, but actually.
Anonymous No.106470737 >>106470742 >>106470749 >>106471086
>>106470369
Wait. Why is there an anime girl fox nurse on the winehq "making sure you're not a bot" splash screen? And since when? It's gone or I'd grab a screenshot. Wait, I managed to save the image. Let's see. There is no red cross on her hat. It seems to be some other kind of logo. She is not a nurse, but perhaps some sort of Canadian.
Anonymous No.106470742 >>106470745
>>106470723
I'll believe that when models can do rp without slopping it up to canada and back.

>>106470737
Reminder that some people think chibi = csam (yes, they made a new word for it)
Anonymous No.106470745
>>106470742
mpv devs are very cringe, yes.
Anonymous No.106470749 >>106470753
>>106470737
Supposedly https://en.wikipedia.org/wiki/Anubis_(software)
Anonymous No.106470753
>>106470749
I see, so she's not wine's mascot. Too bad.
Anonymous No.106470759
>>106470719
>retarded expectations about LLMs being able to think/act like a person.
Yes, that is the expectation in a story. You expect the character to act like a person and the dialog is incredibly important.
>>106470723
It can't. It can just generate text. And yes it will do that unexpectedly, and promise to do things it can't do, for example.
Anonymous No.106470769 >>106470791 >>106470803 >>106470997 >>106471019 >>106471086 >>106472744
Just looked up this Anubis mascot thing.
>this entire discussion
https://discourse.gnome.org/t/anime-girl-on-gnome-gitlab/27689
>ends with ebussy
loooooool
Anonymous No.106470791
>>106470769
>ear piercing
Why did they add that? It makes me want to molest her.
Anonymous No.106470803
>>106470769
>Anubis
>Canadian
WE WUZ
Anonymous No.106470842 >>106470894 >>106470974 >>106470988
>>106467368 (OP)
Is cogito-v2 good?
It's based on Llama 4 Scout, which is super fast.
Anonymous No.106470894 >>106470964
>>106470842
GPT-OSS 120b by OpenAI is much faster for your information.
Anonymous No.106470964
>>106470894
Yeah, but it is stupidly censored.
Anonymous No.106470974
>>106470842
It is 100 times better than llama4 109B. I can tell you that much.
Anonymous No.106470988 >>106470999 >>106471011 >>106471044 >>106471064
>>106470842
Why are LLMs so bad at factuality anyway? They have the information in there somewhere due to being trained on practically everything; do they simply not comprehend what a fact is? The reasoning model literally does worse on it than its non-reasoning counterpart.
Anonymous No.106470997
>>106470769
>weird and unprofessional.
>Usecase for anime girls on a technical website?
people like this is why we need mikutroons and dipsyfags
Anonymous No.106470999
>>106470988
they've all lied to me many times.
Anonymous No.106471008 >>106471028
>A100 40GB are now 2.5K
Huh, are the datacenters starting to dump them? I mean, I guess getting a motherboard etc. to harness NVLink would be a pain since the https://rentry.org/V100MAXXING guide is only for SXM2, and from what ebay is saying, cost of entry is 4.7k for one of the servers. But maxing out a build for $25k with A100 40GBs is probably cheaper when 2 of those cards would cost the same as the used server does today. Still not cheap by any means to own, certainly not ready enough for a guide. Still, it probably means Blackwell has started spreading everywhere, older hardware is getting phased out, and everything is adjusting accordingly. It will probably stabilize next year.
Anonymous No.106471011
>>106470988
>does it simply not comprehend
yeah
Anonymous No.106471019 >>106471045
>>106470769
>FWIW, the developers of Anubis kindly ask for financial compensation for running instances of Anubis with the character removed.
Now I kind of want to make a project but enforce this via license.
Anonymous No.106471028 >>106471060 >>106471503
>>106471008
Anonymous No.106471044 >>106471064
>>106470988
Because they aren't trained for factuality. Every training step updates the weights with the average of the gradients over every sample in the batch (which can be tens of millions of tokens or more on large GPU clusters). The training objective is making the model a better next-token predictor *on average*. Memorizing facts comes by accident.
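In symbols, one step is just (standard minibatch SGD; nothing here is specific to any one lab's setup):

```latex
\theta_{t+1} = \theta_t - \eta \cdot \frac{1}{B} \sum_{i=1}^{B} \nabla_\theta \, \mathcal{L}(x_i;\, \theta_t)
```

A fact carried by only a handful of the B samples contributes a vanishing fraction of that sum, so it barely moves the weights on any single step.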
Anonymous No.106471045
>>106471019
>Your project will have my furry anime girl
>If it doesn't you will have to pay me money
Honestly based
Anonymous No.106471060
>>106471028
Kek wtf, why are those a100's heatsink so HUEG?
All examples I've seen before have had featureless blowers about twice the thickness of a 12 pin port.
Anonymous No.106471064 >>106471145 >>106471593
>>106470988
>>106471044
Hallucinations are the desirable characteristic at the low level, we want models to come up with novel phrases we didn't teach them.
I suspect factuality training will just make models more retarded in general just like safety training does.
Anons sneer at MCP as a meme but it just makes sense to use solid knowledge technology for working with solid knowledge.
Anonymous No.106471080
Mistral owes me a new ~30B model.
Anonymous No.106471086
>>106470737
>>106470769
where is the porn
Anonymous No.106471094 >>106471130
These benchmarks are pissing me off. Probably funded by OpenAI as well. You know it's bad when even redditors realize something's off. Like ITT: https://www.reddit.com/r/LocalLLaMA/comments/1n75z15/gptoss_120b_is_now_the_top_opensource_model_in/
I've done a fair bit of testing myself with these models, and there's no way the 'OSS is that good. OpenAI definitely had a plan when they released these models. As anons already have concluded, it is nothing short of open source well poisoning. And that new swiss Apertus model is aiming to do the exact same thing.
Anonymous No.106471100 >>106471202 >>106471593 >>106471675 >>106471925
Literally everything an LLM does is hallucination. There is no difference between the output you think of as hallucination and the output you think of as fact from the pov of the LLM architecture. Stop anthropomorphizing the LLMs: they do not think, and they do not reason. They do not have the ability to "judge" their own knowledge and introspect. They are pattern matchers. That's why coomers here can make even gpt-oss behave like a little slut. Can you jailbreak a random human being on the street into being a slut in the sheets?
Jailbreaks can exist because LLMs are pattern matchers, autists.
Anonymous No.106471130
>>106471094
The methodology is open. But notice how they recently changed it.
https://artificialanalysis.ai/methodology/intelligence-benchmarking
I'm too lazy to dig, but the thing to me is that they expect all the MMLU tests to be one-shotted, which is unusual, and then for some of the other stuff, like AIME2025, you get 10 retries. Why?
Anonymous No.106471139 >>106471167 >>106471171
whats new in this space, anything going on?
Anonymous No.106471145 >>106471187
>>106471064
There are indications that LLMs don't actually generalize, but mostly reproduce the training data by interpolation. So they will never be able to produce anything truly novel on their own.
Anonymous No.106471167
>>106471139

wan3 in two more weeks
Anonymous No.106471171 >>106471189
>>106471139
fatter gemma 4
Anonymous No.106471187 >>106471316 >>106471388 >>106471593
>>106471145
>There are indications that LLMs don't actually generalize, but mostly reproduce the training data by interpolation. So they will never be able to produce anything truly novel on their own.
I entirely agree with you, but isn't that just a technicality? I mean, given sufficient general knowledge, interpolating bits of that knowledge could create an infinity of technically new content right?
Just like how Mendeleev's periodic table predicted new elements from a simple ruleset
Anonymous No.106471189 >>106471231
>>106471171
> fatter gemma 4
more safeguards, sir!
Anonymous No.106471202
>>106471100
lol
Anonymous No.106471231 >>106471250 >>106471258 >>106472771
>>106471189
I want gemma 4 to have gpt-oss style of reasoning and refusals, but with the addition of the hotlines and judgmental footnote that are typical of Gemma.
It's my fetish.
Anonymous No.106471250
>>106471231
>It's my fetish.

Welcome to 4ch. Feel at home here
Anonymous No.106471258
>>106471231
Femdom faggotry's final form: Refusaldom.
This nigga running
https://github.com/ConAcademy/buttplug-mcp
And has an LLM in charge of when the toys are on.
Anonymous No.106471316 >>106471399
>>106471187
Not really, because big AI labs keep pretraining LLMs with increasingly large numbers of GPUs (i.e. overall bigger training batches), and while the models might be seeing more data during their training period, sample-level nuances end up getting averaged out, so the models aren't gaining more knowledge, just getting better at reproducing the *average* of the languages they're trained on.

They should be scaling down instead: use fewer GPUs and smaller amounts of more downstream-relevant, information-dense data. But /lmg/ probably won't like this, since it can't be properly done without large amounts of synthetic data designed for that purpose.
Anonymous No.106471388
>>106471187
they are language models, literacy is only loosely coupled to actual knowledge. plenty of people are dumb but can talk your head off, or vice versa, someone who knows their shit but can't communicate it effectively.
Anonymous No.106471399 >>106471426
>>106471316
why not just do both? pretrain on big batches to get you close and then post-train on smaller focused batches? there has to be a way to have our cake and eat it too!
Anonymous No.106471426 >>106471487
>>106471399
I think that's what the extensive post-training phase of modern big-model training tries to accomplish, to some extent. It's not just about making the model an obedient assistant anymore (it hasn't been for a while).
Anonymous No.106471487 >>106471525
>>106471426
>It's not just about making the model an obedient assistant anymore (it hasn't been for a while).
I thought it was just safety and schizo reasoning traces these days.
Anonymous No.106471503
>>106471028
>40GB
Who buys this shit?
Anonymous No.106471525
>>106471487
For open-data LLMs, yeah. But then, MistralAI notably added more trivia in Mistral Small 3.2, which was supposed to be just an updated instruct finetune.
Anonymous No.106471555 >>106471580 >>106471702
>>106467508
I sometimes feel like they use a diffusion model on some requests. It's as if they were doing an A/B test of some sort. I think I mainly saw this on Gemini.
Anonymous No.106471574 >>106472404
>>106467475
>Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
>Diffusion Large Language Models (DLLMs) are emerging as a powerful alternative to the dominant Autoregressive Large Language Models, offering efficient parallel generation and capable global context modeling. However, the practical application of DLLMs is hindered by a critical architectural constraint: the need for a statically predefined generation length. This static length allocation leads to a problematic trade-off: insufficient lengths cripple performance on complex tasks, while excessive lengths incur significant computational overhead and sometimes result in performance degradation. While the inference framework is rigid, we observe that the model itself possesses internal signals that correlate with the optimal response length for a given task. To bridge this gap, we leverage these latent signals and introduce DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive Length Expansion for Diffusion Large Language Models. DAEDAL operates in two phases: 1) Before the denoising process, DAEDAL starts from a short initial length and iteratively expands it to a coarse task-appropriate length, guided by a sequence completion metric. 2) During the denoising process, DAEDAL dynamically intervenes by pinpointing and expanding insufficient generation regions through mask token insertion, ensuring the final output is fully developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves performance comparable, and in some cases superior, to meticulously tuned fixed-length baselines, while simultaneously enhancing computational efficiency by achieving a higher effective token ratio. By resolving the static length constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap with their Autoregressive counterparts and paving the way for more efficient and capable generation.
>https://arxiv.org/abs/2508.00819v1
Anonymous No.106471580 >>106471702
>>106471555
Google does love the crap out of their A/B tests, so it shouldn't surprise anyone if that's indeed the case.
Anonymous No.106471593 >>106471615 >>106471706 >>106472843
>>106471064
>>106471100
>>106471187
Why exactly do "we" expect a supervised-learning next-token predictor to be factual anyway? There's no real understanding, just predicting the nearest or most likely outcome. Considering this, I'm surprised that small sub-8B base models are even capable of outputting anything, really.

ENTER:
reinforcement pre-training

I don't think there's an open source model yet which uses something like this, but it could be key for AI advancement. Even if it's just a small part of a MoE model or a reasoning/thinking orchestrator guiding the LLM.

The theory of teaching a model how to think instead of what to think first sounds more logical to me than the current approach, which seems to be to first teach the what (pre-training) and then the how (fine-tuning). I guess in theory a LLM trained solely in logic with RL and connected to websearch/knowledge would produce remarkable results. But then again, surely there would be downsides as well, like the reasoning/thinking time needed to come to the correct conclusion
Anonymous No.106471615 >>106471880
>>106471593
>a LLM trained solely in logic with RL and connected to websearch/knowledge would produce remarkable results
RAG gods your time is soon triviacucks btfo for eternity
Anonymous No.106471675 >>106471753
>>106471100
MKULTRA was one jailbreaking test. Truth serums and torture are other ways. Softer jb would be just talking nicely...
Anonymous No.106471702
>>106471555
>>106471580
When Google does A/B tests on Gemini, and they do it often, it's an explicit request for you to rate which answer you preferred between two models. It's not happening stealthily in single answers, and they have no reason to do that because being able to rate which model answered best is inherently superior.
Anonymous No.106471706 >>106471880
>>106471593
I think it would probably work, but it would be a pretty dry model if it learned its language understanding from logic. maybe you could use two networks duct-taped together, one designed for the logical thinking part + rag, and another network that can use the context to make a pleasant completion?
Anonymous No.106471753 >>106471785
>>106471675
>MKULTRA was one jailbreaking test. Truth serums and torture are other ways
and then afterwards glownigs felt really stupid for they realised they needent use expensive chemicals nor waste man hours on such things nay they merely needed a jew or 2 to pay a couple milly to rappers so they promote their jailbreaks
Anonymous No.106471785 >>106471858
>>106471753
This is what is going to happen with data centres: they are a massive waste of electricity because neither nvidia nor anyone else has advanced the technology in over 10 years now. Sure, dies are smaller, but that obviously isn't enough.
Anonymous No.106471858 >>106471912 >>106472467
>>106471785
what you are quoting and what you are saying do not compute
Anonymous No.106471880 >>106471904 >>106471908 >>106473609
>>106471615
Actually it wouldn't be strictly RAG. Or no RAG, even. There's an on-going war in the coding assistant developer community. On one side you have Cline and Augmentcode, which say RAG is dead/obsolete and simply not needed when pairing current SOTA LLMs with agents and mcp (web search, git tools). Their arguments basically are:
>RAG not needed because we have bigger context windows now, which fits multiple code files
>Agents are better and more precise finding relevant information with git tools and grep instead of chunked RAG code
>maintaining a vector codebase DB is annoying. Agents are better, citing Claude.
>Big RAG/VectorDB companies like pinecone are coping and trying to undermine the fact that RAG is obsolete.
>Agent coding delivers better results but is more expensive. RAG is only useful for companies like Cursor that try to reduce query tokens as much as possible to get better profit margins on the subscriptions they sell.
>relational node graphs, which RAG uses to combat some shortcomings, are nothing but a meme
If you care about the points in more detail, watch or summarize this: https://youtu.be/eaeGd30Uypg
Personally I'm on team RAG because it's just so much cheaper when you index your codebase. Besides, quality degradation with context bloat is still a very real thing, even before going beyond 200k tokens.
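The indexing side really is that cheap, too. A toy sketch of the retrieval step (bag-of-words cosine similarity standing in for a real embedding model, and two fake one-line "files" as the codebase):

```python
import math, re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb or 1.0)

# Index once: one vector per chunk (real setups chunk by function/class).
chunks = {
    "auth.py": "def login(user, password): return check_password_hash(...)",
    "db.py":   "def connect(dsn): return psycopg.connect(dsn)",
}
index = {name: embed(src) for name, src in chunks.items()}

# Retrieve: only the best-matching chunk goes into the prompt.
query = embed("where do we verify the user password")
best = max(index, key=lambda name: cosine(query, index[name]))
print(best)  # -> auth.py
```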

But I find the discussion highly relevant, because having an agentic coding assistant is probably the closest thing we have to a LLM trying to be purely logic driven.


>>106471706
>it would be a completely dry model
or would it...kek
https://youtu.be/NUl6QikjR04
Anonymous No.106471904
>>106471880
*an LLM
God my brain is toast. Bring on the Neuralink, Musk.
Anonymous No.106471908
>>106471880
>we have bigger context windows now, which fits multiple code files
that's crazy to read when you see how much models still struggle with context despite advertising "15 quintillion contexts!" in their marketing
Anonymous No.106471912
>>106471858
perhaps he thinks that because the mkultra project was eventually deprioritized, the llms will also be?
Anonymous No.106471925
>>106471100
It is not real AI, thus a prof called it Supervised Intelligence instead. You prompt it to behave as you want.
Anonymous No.106471943 >>106471960 >>106471987 >>106471999 >>106472016 >>106472019 >>106472776
https://old.reddit.com/r/StableDiffusion/comments/1n6rrg2/made_a_local_ai_pipeline_that_yells_at_drivers/
fucking kek the ultimate state of current era men
>amazon driver pees on the house on a regular enough basis that it triggers a person to action
>the action: build a pipeline that feeds an image every 10 sec to a vision llm to detect peeing and starts a very gentle sounding female AI voice that tells people off
if you're going that far rather than reporting to the police and amazon how about unleashing a trap of horse manure or something
even some crazy ass scream sound
that female ai voice? I think he's going to start masturbating
Anonymous No.106471960 >>106471979 >>106471990
>>106471943
thank you for the reddit update
how do I unsubscribe?
Anonymous No.106471979
>>106471960
>how do I unsubscribe?
consent is not a /lmg/ value
Anonymous No.106471987
>>106471943
>how about unleashing a trap of horse manure
that would be assault
Anonymous No.106471990 >>106472007
>>106471960
here [unsubscribe](link-to-infect-you-with-more-spam)
Anonymous No.106471992 >>106472508
>go from Nemotron super 49b v1 to v1.5
>only change is that it refuses an order of magnitude more
Anonymous No.106471999
>>106471943
Like many things on reddit, this is probably just grassroots advertisement. In this case for gabber.dev
Anonymous No.106472007
>>106471990
>link that tells them you actually read the email at all
is indeed quite the thing
Anonymous No.106472016 >>106472172
>>106471943
Anonymous No.106472019 >>106472078
>>106471943
if it was actually real (it's not), some places prohibit defense of property
you just have to let it happen
Anonymous No.106472078 >>106472179
>>106472019
I'd bash his skull in. People should stop acting like cucks because someone wrote a law for it
Anonymous No.106472172
>>106472016
piss, no matter how yummy it is, will kill plants tho
Anonymous No.106472179 >>106472972
>>106472078
Man, I can't wait for something like that to happen so I can counterbash their skull in and claim self-defense. And I wouldn't get in trouble for it because the law's on my side.
Anonymous No.106472255 >>106472445
>>106467802
>>106468423
LM studio if you want it easy.
but yeah if you're used to the very large models, you need to know that local is only as good as the hardware you have, so when running smaller models don't expect large model quality.
However, any models you do run are yours forever, no api/proxy downtime. if you want some models:
rocinante
CaptainErisNebula
cydonia
glm air
Anonymous No.106472404
>>106471574
imagine LLM diffusion instead of transformers

>6ater xh eaat4d fii coak on hru thight, shh falt a shiver rxnning dews hhr spane
>aater hh eaated hii coak on hru thight, shh falt a shiver rxnning down hhr spane
>after he raated his cock on hru thight, she felt a shiver rxnning down hhr spane
>after he rested his cock on hru thight, she felt a shiver running down her spine
>after he rested his cock on her thigh, she felt a shiver running down her spine

then you put the lightning loras and boom lol. I wonder how quantization will work here
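mechanically the denoising loop would look something like this toy (masked-diffusion flavour; model() and argmax() are hypothetical stand-ins, not any real diffusion LM's API):

    def diffusion_decode(model, length, steps=5):
        MASK = None
        tokens = [MASK] * length                     # start fully noised/masked
        for step in range(steps):
            probs = model(tokens)                    # per-position distributions
            masked = [i for i, t in enumerate(tokens) if t is MASK]
            # commit the model's most confident predictions each pass, leave
            # the rest masked -- the text sharpens step by step like above
            masked.sort(key=lambda i: -max(probs[i]))
            for i in masked[: max(1, length // steps)]:
                tokens[i] = argmax(probs[i])         # argmax(): hypothetical helper
        return tokens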
Anonymous No.106472439 >>106472448
how to local rag? My usecases are gaping lolis through ST and an assistant with my personal data available
Anonymous No.106472445 >>106472455
>>106472255
if the choice were rocinante q8 or cydonia q4, which one would you choose?
Anonymous No.106472448 >>106472454
>>106472439
The built in Vector Storage extension
Anonymous No.106472454 >>106472466 >>106472473
>>106472448
id like to keep two separate DBs tho, one with smut, one with personal info
Anonymous No.106472455
>>106472445
both, so you can swap once you get too overtrained on the outputs and it gets boring.
Anonymous No.106472466
>>106472454
Keep two ST instances.
ST creates separate databases for each chat I'm pretty sure.
Anonymous No.106472467
>>106471858
I didn't quote anything you fucking autist. Go back to moderate r-eddit.
Anonymous No.106472473 >>106472481 >>106472615 >>106473072
>>106472454
wand/extension button at the bottom, then this. you can easily toggle them on or off whenever you want
Anonymous No.106472481
>>106472473
time to develop an ingestion pipeline with dots.ocr :)
Anonymous No.106472508
>>106471992
try the valkyrie tune of it, it's pretty good. I used to have like 34gb vram and really wished I had something above 30b, and nvidia's 49b is the perfect size. For anyone who hates finetunes: stock 49b is so fucking dry and unusable no matter how you prompt it. It's almost gpt-oss levels.
Anonymous No.106472511 >>106472534 >>106472548
Okay, now this is epic. OpenAI's wonderful GPT-OSS 120b, although slow and quantized, is able to follow my requirements when coding a script, but gemini isn't.
Anonymous No.106472534 >>106472562
>>106472511
shut the fuck up sam
Anonymous No.106472548 >>106472562
>>106472511
Post the logs to prove it.
Anonymous No.106472562
>>106472534
>>106472548
Haha, yes. Of course! That's a wonderful insight!
Anonymous No.106472615
>>106472473
do you have any experience with RAGging conversation history? Is there a way to like keep the last N messages in context, and for the rest use the RAG? Will this produce crap?
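to be concrete, mechanically I imagine something like this (sketch; sentence-transformers just as an example embedder, no idea what ST actually uses under the hood):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def build_context(history, n_recent=10, k=5):
        # last N messages stay verbatim, everything older goes through retrieval
        recent, old = history[-n_recent:], history[:-n_recent]
        retrieved = []
        if old:
            old_vecs = embedder.encode(old, normalize_embeddings=True)
            query = embedder.encode([recent[-1]], normalize_embeddings=True)[0]
            scores = old_vecs @ query              # cosine sim, vectors are normalized
            retrieved = [old[i] for i in np.argsort(scores)[-k:]]
        return retrieved + recent                  # memories first, then the live window

my worry is exactly whether the retrieved messages make any sense ripped out of chronological order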
Anonymous No.106472744 >>106472910
>>106470769
How can this be real
Anonymous No.106472771
>>106471231
Some men drown while others die of thirst
Anonymous No.106472776 >>106472950
>>106471943
They are all peeing there for a reason: it's a good place to do it quickly and sneakily. They don't have time to do it in a better way. A better way could be to build a simple dry WC or something like that, with a small Amazon-like logo on it (like "Pee" with the smile beneath it). This way it's not as disgusting and you could still get something aesthetic enough.
Ignoring reality is bluepilled and highly retarded.
Anonymous No.106472793 >>106472808 >>106472856 >>106472866 >>106472938 >>106472953 >>106473060
Kimi K2-0905 dropped!
Anonymous No.106472808
>>106472793
god I hate emojislop
Anonymous No.106472843 >>106473137
>>106471593
>I guess in theory a LLM trained solely in logic
"All the other stuff" is what provides the ungodly amount of mathematical measurements required to create a statistical model that effectively contains a complete understanding of abstract semantics.
You clearly have no fucking clue how any of this shit works. On Reddit you might just talk shit and make shit up because the mod whose cock is lodged in your throat will stop anyone from calling you out.
On here you can fuck right off with that bullshit.
Anonymous No.106472856
>>106472793
>discord
Anonymous No.106472866
>>106472793
goofs?
Anonymous No.106472910
>>106472744
Maybe there are actually people who watch cuck porn.
Anonymous No.106472938
>>106472793
>coding powers
:skull: :skull:
Anonymous No.106472950
>>106472776
Better solution: Make it inconvenient so it becomes somebody else's problem.
Anonymous No.106472953
>>106472793
I was hoping for reasoner but I will take another K2 checkpoint. K2 is the most coherently unhinged model after a simple prefill and it's the best open source RP and creative writing model.
Anonymous No.106472972
>>106472179
Good luck counterbashing anything cuck
Anonymous No.106473060 >>106473070
>>106472793
>dropped
where?
Anonymous No.106473070 >>106473121
>>106473060
Discord only
Anonymous No.106473072
>>106472473
How does it work? Just upload and keep chatting as usual?
Anonymous No.106473106 >>106473172
man LLMs are getting filtered hard by json. I simply tell them to add a coordinate key:value pair to everything, and they just can't do it. jeetgpt5 chat as well. should I just switch to xml? or do I really need a SQL db and tell it to use the specific SQL tools?
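for reference, this is the kind of structure I mean (made-up example). the json they choke on:

    {"rooms": [
      {"name": "kitchen", "coord": {"x": 3, "y": 7}},
      {"name": "hall",    "coord": {"x": 0, "y": 2}}
    ]}

same data in xml, which they seem to handle fine:

    <rooms>
      <room name="kitchen"><coord x="3" y="7"/></room>
      <room name="hall"><coord x="0" y="2"/></room>
    </rooms>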
Anonymous No.106473107
https://gametora.com/_next/static/chunks/3998-19c42d835b448b85.js
I'm still surprised at how easy it is to translate minified code with AI.
Anonymous No.106473121
>>106473070
How to connect tavern to discord?
Anonymous No.106473137
>>106472843
Fuckin' transformers, how do they work?
Anonymous No.106473172 >>106473391
>>106473106
>should I just switch to xml?
What makes you think that'll help?
>do I really need a SQL db an tell it to use the specific SQL tools?
What would that solve?
Anonymous No.106473288 >>106473310 >>106473377 >>106473570 >>106476036
has anyone tried to train a 1~5m (not b, i really mean m) model off of tinystories or something similar?
how retarded are they?
Anonymous No.106473310 >>106473354 >>106473465
>>106473288
I think some people have done 200-500m models, yeah. And they usually manage a few sentences or paragraphs before breaking apart.
Anonymous No.106473354 >>106473434
>>106473310
even with a specialised, deliberately narrow dataset like tinystories?
Anonymous No.106473377
>>106473288
you would need a really tiny embedding matrix to have any parameter budget left over for transformer blocks. it wouldn't do very well. I think it's not until the hundreds-of-millions-of-parameters range that things start to get a bit interesting.
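back-of-envelope on why the embedding matrix dominates at this scale (assuming untied embeddings and a standard 4x MLP):

    vocab, d = 32000, 64
    print(vocab * d)    # 2,048,000 -- a llama-sized vocab alone blows a 1M budget
    print(12 * d * d)   # ~49k per transformer block -- the blocks are almost free
    print(4096 * d)     # ~262k -- a cut-down 4k vocab leaves room for a few blocks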
Anonymous No.106473391 >>106473480
>>106473172
because AI oneshots the task with the exact same structure in xml, but fails with json. maybe something to do with indentation since it fails to understand the structure? With a SQL DB it could query specific rows or values, and maybe gets less confused compared to ingesting the entire thing. Also having to deal with escaping quotes is just utterly fucking retarded
Anonymous No.106473434
>>106473354
It was a while ago on some guy's blog. You can find even more of them, even properly 1m ones, all over the place. Not a lot of resources are needed to train a model that small, so a lot of people do it for fun or as a learning experience, but the quality varies.

https://huggingface.co/models?dataset=dataset:roneneldan/TinyStories

There's probably more floating around on the internet.
Anonymous No.106473448 >>106473464 >>106473515 >>106473522 >>106473601
I have an FX-9590 with 32gb of RAM and an RX 9060 XT. Can I run local Grok and be based like all of you my niggas? Not changing the CPU, but it's watercooled like a mofo.
Anonymous No.106473464
>>106473448
no. just run one of the chink models
Anonymous No.106473465
>>106473310
yes. some guy trained this on a single 4070. I haven't tried it but apparently it runs in lmstudio. I assume it sucks. https://huggingface.co/Nikity/lille-130m-instruct
Anonymous No.106473480
>>106473391
>because AI oneshots the task with the exact same structure in xml
Then why are you asking if you should or not?
This is all stuff you can test yourself and use whatever works best for whatever you're doing.
Anonymous No.106473515
>>106473448
yah, thanks to having a gaming computer you can run whatever you want. ChatGPT, grok, the google one, take your pick.
Anonymous No.106473516
I sure love when lazy ass LLMs look at a large file and, instead of doing the requested changes, remove most of the file's content with some cheeky comment like
>*The details have been removed for brevity but are kept in the original document version history
It's unironically really fucking funny.
Anonymous No.106473522 >>106473869
>>106473448
>FX-9590 with 32gb RAM and an RX 9060 XT
What a strange choice of hardware. You can't even use the 9060 XT for anything. With an nvidia card you could at least use it for work or imagen on this platform
Anonymous No.106473570 >>106473600 >>106473603 >>106473612 >>106473681 >>106473706 >>106473815 >>106474606
>>106473288
Truly open source models may have a ways to go guys
Anonymous No.106473581
>>106467368 (OP)
Long Miku is long.
Anonymous No.106473597
Man, a shame we don't have better card to card interconnects.
If we could stack like four Arc Pro B50s and get some 75% of the total memory bandwidth, that would be pretty sweet.
As is, even splitting the model four ways, the PCI-E bus would still bottleneck the whole process, right?
Anonymous No.106473600
>>106473570
Ham confirmed vegan.
Anonymous No.106473601 >>106473624 >>106473645
>>106473448
>FX
That's older than most of the posters here these days
Anonymous No.106473603
>>106473570
unironically looks promising for what it is
Anonymous No.106473609
>>106471880
Assuming logic is all you need would be pretty dangerous, even for coding agents. The reason that MoEs and language models in general have worked up until now to make coherent talkers is that much of intelligence is a function of both knowledge and knowledge-specific logic (or you can think of that as knowledge of how to use specific pieces of knowledge), in addition to more generalized logic. This can be done through RAG, ONLY if the query-time algorithm includes learning. That is, inference turns into both inference and training. Humans cannot immediately do a task that's too OOD after a single read or watch, and neither can any AI. If the architecture doesn't learn during the query, a logic-specialized model + RAG will still not be the solution to AGI or whatever we're hoping for. And unfortunately it may be a long time before we get such architectures for production, as it'd be too compute intensive. Maybe if we get a magical breakthrough again.
Anonymous No.106473612
>>106473570
it's pretty coherent, unfortunately lacking in factual recall tho.
Anonymous No.106473618 >>106473632 >>106473651 >>106473659 >>106473663 >>106473671 >>106473679
Another flop?
Anonymous No.106473624 >>106473884
>>106473601
Anonymous No.106473632 >>106473651
>>106473618
you know it
Anonymous No.106473645
>>106473601
you should at least say phenom
Anonymous No.106473646
https://www.nextbigfuture.com/2025/09/room-temperature-diamond-integrated-chip-quantum-accelerators-installed-at-oak-ridge-national-lab.html
You bros ready to buy a QPU?
Anonymous No.106473651
>>106473618
>>106473632
They have a new chang boss.
Anonymous No.106473659
>>106473618
the gpt oss killer
Anonymous No.106473663 >>106473673 >>106473715
>>106473618
Surely not, Zucc has spent billions on the highest-profile scientists, who can now make use of his massive H100 stack.
Anonymous No.106473671
>>106473618
probably just the same old llama 4 with some mistakes corrected and a few easy wins added, I wouldn't expect any large changes until the billionaire superintelligence meme team has had some time to settle in
Anonymous No.106473673
>>106473663
>highest profile scientists
scammers from china fify.
Anonymous No.106473676 >>106473694
i wonder what the base models behind those small open weight models looked like, before the companies applied all the safeguarding and lobotomisation
Anonymous No.106473679
>>106473618
Absolutely.
I bet it'll be at most equivalent to GLM, but actually worse in actual use.
Anonymous No.106473681 >>106473750
>>106473570
>138 t/s from a 130m
Why is it so slow?
Anonymous No.106473694 >>106473699
>>106473676
About the same. They cuck the model at the pretraining dataset level.
Anonymous No.106473699
>>106473694
huh fuck
Anonymous No.106473706 >>106473712
>>106473570
mogged by gemma 280b
Anonymous No.106473712
>>106473706
Mogged by gpt-oss 120t
Anonymous No.106473715 >>106473819
>>106473663
>that can now make use of his massive H100 stack.
maybe someday.
https://xcancel.com/giffmana/status/1957155168417378687
>within just a month, while waiting for new resource allocation
>- on just six H100's the 3 of us share
this guy is one of meta's new superintelligence hires btw
Anonymous No.106473750
>>106473681
it's q8 and flash attention seems not to work on it. On longer stuff it gets up to ~350 or so. I dunno, whatever, it doesn't matter. Oh, and I'm on windows right now.
Anonymous No.106473815 >>106473839
>>106473570
If you have very limited compute for training a model from scratch, the training data can't just be random web documents. Reminder that Phi-1 was trained on 7B unique tokens for 7~8 epochs (50B tokens in total). https://arxiv.org/pdf/2306.11644
Anonymous No.106473819
>>106473715
>within just a month
>wrote an entirely new codebase
This is the reason why llama 4 is so shit.
Anonymous No.106473839 >>106473860 >>106473885
>>106473815
I prefer this Phi. https://arxiv.org/abs/2309.08632
>Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM
Anonymous No.106473860
>>106473839
fucking lmao
Anonymous No.106473865
I'm currently trying the Cogito finetune of Deepseek that they finetuned from scratch from V3-Base.
The good news is that it seems to get rid of a lot of the annoying Deepseek slop and dumb habits that kept getting in the way while using V3-0324 and R1-0528. I haven't seen a single em-dash or the other typical slop phrases so far. It gets a certain sub-type of scenarios right that anything pre-V3.1 (and even that occasionally) got very confused by.
The bad news is that it's super safety-slopped and tries to refuse a lot of requests. You can hook it into an 18k-token-long RP and it'll still find a way to steer away from anything lewd.
Honestly, it's sad. It's probably better than V3.1 in terms of writing but the safety brain damage makes it unusable. At least this shows that the big problem with Deepseek models is their retarded -Instruct/Reasoning dataset.
Anonymous No.106473869
>>106473522
>You can't even use 9060 XT for anything

It's a graphics card and it's used for gaming anon
Anonymous No.106473873 >>106473900
I love john because he has finally saved me from this thread. I just need to refresh his hf to see if any new models are here.
Anonymous No.106473884
>>106473624
Based
Anonymous No.106473885 >>106473910 >>106473931 >>106473936 >>106474056
>>106473839
It shouldn't be too hard to come up with a hyper-focused pretraining dataset a few B tokens in size, giving the model basic knowledge, common sense and conversation capabilities in English only, disregarding almost anything the model is not expected to be used for (coding, advanced math, etc)... or is it?
Anonymous No.106473893 >>106473941
Do you guys think I'll be the one who invents the AGI?
Anonymous No.106473899 >>106474152 >>106474296
https://cline.bot/blog/focus-attention-isnt-enough
Is Cline right about this?
Anonymous No.106473900
>>106473873
What a beautiful man
Anonymous No.106473910 >>106473944
>>106473885
So many people think that's possible, if only models worked that way eh?
Anonymous No.106473931
>>106473885
When will you learn the bitter lesson?
Anonymous No.106473936
>>106473885
that 'common sense' is already too much and i think that is how they came up with tinystories
Anonymous No.106473941
>>106473893
Someone in this very thread.
Anonymous No.106473944 >>106473954
>>106473910
It worked for Microsoft Phi-1, and it was intended to be a general-purpose model with a focus on coding.
Anonymous No.106473954 >>106474034 >>106474068
>>106473944
Legitimately, does a single person actually use Phi for anything at all?
Anonymous No.106474003 >>106474013 >>106474028
>hook up 120b 'toss to codex, give it a simple prompt
>working (1205s esc to interrupt)
trash
Anonymous No.106474013 >>106474026
>>106474003
shartware specs?
Anonymous No.106474026 >>106474058
>>106474013
m4 max 128gb
Anonymous No.106474028
>>106474003
safety needs time
Anonymous No.106474034
>>106473954
No, because it doesn't "work".
Anonymous No.106474056
>>106473885
you would be better off training a bigger model on even more data. is there really a use case for running llms on microcontrollers? even cellphones can handle a 600m.
Anonymous No.106474058
>>106474026
Have you tried using a gpu for prompt processing? It takes less than two minutes to process 50k context on power-limited 3090s without parallelism.
Anonymous No.106474068 >>106474128
>>106473954
Microsoft is scared to death of people misusing AI models for "harmful content", so the Phi authors always made sure they were safe and inoffensive and ultimately not too useful except as a technical demonstration of what can be done with synthetic data.

But that's not the point here. If there's going to genuinely be a community-driven model at some point, it will likewise have to be efficiently pretrained with limited amounts of data selected with/for a purpose, not simply random web documents of unknown content and nature.
Anonymous No.106474128 >>106474142 >>106474170 >>106474187 >>106474355
>>106474068
>If there's going to genuinely be a community-driven model at some point,
distributed training is a meme, how many people do you need to design the logo? just collect the data and start training your model, this way you don't have to let anons tell you what to do or argue about every little decision.
Anonymous No.106474142 >>106474681
>>106474128
I can only fit so many gpus in my house.
Anonymous No.106474152
>>106473899
Didn't read into the article much but yeah, attention is still an issue, and it can be improved with different kinds of scaffolding like they're proposing. Basically cognitive architectures. But I think that's just one type of solution and it's wrong to think that this is a problem that can't be improved through better training methods and data. I think one improvement is to train reasoning models to be better at specific kinds of retrieval. Right now the thinking block retrievals are a bit meandering and in most cases unstructured in general. This can be a lot better.
Anonymous No.106474170
>>106474128
I thought we were talking about sub 1b models. you just need 1 for training.
Anonymous No.106474187 >>106474207 >>106474336 >>106474355
>>106474128
If a usable model can be pretrained with around 50-60B tokens carefully selected/generated for your intended use case(s) instead of 30~40T, then you just need *one* H100 node or less for a few weeks, not distributed training or an entire datacenter.
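back-of-envelope with the usual training-FLOPs ≈ 6·N·D approximation (all numbers rough assumptions, the MFU especially):

    N, D = 1e9, 60e9                        # 1B params, 60B tokens
    flops = 6 * N * D                       # ~3.6e20 training FLOPs
    per_gpu = 4.9e14 * 0.4                  # ~490 dense BF16 TFLOPS at 40% MFU
    print(flops / (8 * per_gpu) / 86400)    # ~2.7 days on one 8xH100 node

scale N up to ~7B and you're at a couple of weeks, which is where the "few weeks" figure comes from.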
Anonymous No.106474199 >>106474242
I figured out O(n) attention.
Anonymous No.106474207
>>106474187
>If a usable model can be pretrained with around 50-60B tokens carefully selected/generated for your intended use case(s
it can't.
Anonymous No.106474212
I figured out O(1) attention
Anonymous No.106474218
I figured out that you don't actually need attention.
Anonymous No.106474242
>>106474199
More like n(O) attention.
Anonymous No.106474296 >>106474771
>>106473899
totally. even with fixes like attention sinks and whatever that new activation function thingy is called, the attention across the context gets spread pretty thin and evenly like warm butter on fresh toast. especially in tasks like coding where you have a gorillion thinkslop tokens and a bunch of irrelevant rag'ed code chunks, it begins to blur the lines between "plans" and "intermediate thoughts". when 2k tokens of important instructions have to fight for attention with the 98k tokens of javascript-farts and "Wait, hold on, But", importance starts to become kind of homogenized. needle-in-a-haystack is a shitty benchmark because the needle tokens violently pull the embedding vector way out of line, which makes it too easy for the LLM to solve. on the other hand real-world long-context tasks don't have such an obvious sore-thumb needle. I think cline is right about this, but their proposed solution is just some agent framework shit. it works well and makes sense, but not new or groundbreaking in any way.
Anonymous No.106474336 >>106474357 >>106474606
>>106474187
what is your use case? it will be dogshit for factual knowledge tasks and trivia.
Anonymous No.106474355 >>106474411
>>106474128
>>106474187
>distributed training is a meme
yeah i agree, using a million potato computers to do what one H100 can do in a week or so just doesn't make sense.
also AI is changing all the time, if you create a distributed method now, it will probably be obsolete in two weeks.
Anonymous No.106474357 >>106474438 >>106474498
>>106474336
Stop with the trivia meme, we need pure logic not wasted params on BS that can be RAGed.
Anonymous No.106474411
>>106474355
It's useful if you want to do something that nobody else wants to do. Like making a good model for RP that never refuses.
Anonymous No.106474430 >>106474437 >>106474519
I figured out O() attention
Anonymous No.106474437
>>106474430
God
Anonymous No.106474438 >>106474496
>>106474357
pure logic is as much of a meme as trivia.
you can do your pure logic with haskell or something
Anonymous No.106474496
>>106474438
That is what intelligence is though. Imo the best benchmarks are simplebench and arc-agi
Anonymous No.106474498 >>106474517
>>106474357
what do you consider trivia? i just mean things like knowing the grass is green or the sky is blue. it shouldn't need to rag basic knowledge, but at the same time the only way to get broad knowledge coverage is exposure to more diverse tokens.
Anonymous No.106474517 >>106474534 >>106474564 >>106474572
>>106474498
>knowing the grass is green or the sky is blue
BLOAT you do not need this for RP
Anonymous No.106474519
>>106474430
O(2^x) that is
Anonymous No.106474534 >>106474572
>>106474517
I too enjoy having my erotic roleplay set on namek.
Anonymous No.106474564
>>106474517
RP isn't a productive use case
Anonymous No.106474572 >>106474585 >>106474590 >>106474596
>>106474534
>>106474517
More like there wouldn't be a planet at all, since no knowledge was trained, therefore you do not know that planets are an entity that exists.
Anonymous No.106474585 >>106474601
>>106474572
planets are bloat, learn to breathe vacuum
Anonymous No.106474590 >>106474613
>>106474572
It would figure all of that out from context. You don't need to give it a dictionary at all.
Anonymous No.106474596
>>106474572
You only need the knowledge of what a bed is, if even that
Anonymous No.106474601
>>106474585
Sorry, what is breathe? What is vacuum? Can you clarify?
Anonymous No.106474606 >>106474622
>>106474336
If the goal is just vapid chatting, I think knowing what any word in the English dictionary means and how to use it in a conversation (see >>106473570) would be a bare minimum requirement.
Anonymous No.106474613 >>106474627
>>106474590
>It would figure all of that from context
Not with transformers.
Anonymous No.106474622
>>106474606
>any word in the English dictionary means
so much wasted params make me cries
Anonymous No.106474627 >>106474658 >>106474668
>>106474613
I mean the context of the training data. It's learning to repeat text, you're not supposed to teach it like it's in school
Anonymous No.106474628 >>106474634 >>106474639 >>106474709
Anonymous No.106474634 >>106474646
>>106474628
Remove this cursed spawn of satan from my sight immediately.
Anonymous No.106474637 >>106474647
I figured out O(n^3) attention.
Anonymous No.106474639 >>106474646
>>106474628
post more of this cutie
Anonymous No.106474646 >>106474675 >>106474780
>>106474634
>>106474639
I'll do it if you shill my new model:
https://huggingface.co/TheDrummer/Skyfall-31B-v4
Anonymous No.106474647 >>106474693
>>106474637
post code or die painfully in your sleep tonight
Anonymous No.106474658 >>106474712
>>106474627
What training data? The topic of the discussion was training models purely on logic and no factual knowledge. That means the training data doesn't contain knowledge about Earth and that the sky is blue and grass is green.
Anonymous No.106474668 >>106474682 >>106474712
>>106474627
that is where the argument started. Training a model on a synthetic dataset of nothing but logic would not get the broad coverage needed.
Anonymous No.106474675
>>106474646
Wdym new, I tested that like a week ago.
Anonymous No.106474681
>>106474142
Ahmed?
Anonymous No.106474682 >>106474740
>>106474668
>broad coverage needed
citation needed on that being needed SIR!
Anonymous No.106474693
>>106474647
No
Anonymous No.106474709
>>106474628
average undi model
Anonymous No.106474712
>>106474668
>>106474658
Well I wasn't there when the argument started. Maybe it'd be interesting to make a smart videogame AI with all that new transformer technology. Then there would be no language at all, 0 bloat
Anonymous No.106474738
So where can I buy a b60?
Anonymous No.106474740
>>106474682
Here's a source: the models that are trained on more logic than knowledge are used less in production than models trained with a balance of logic and knowledge (Phi vs the models people actually use).
Anonymous No.106474771
>>106474296
> stream of consciousness
SNOREEEEEEEEEEEEE

read a paper nigger
Anonymous No.106474780 >>106474807 >>106474823
>>106474646
>upscaled Mistral 24B
This retard can't keep fooling you or can he?
Anonymous No.106474807
>>106474780
I'm not a retard, and I'm not trying to fool anyone. This is hard, honest work.
Anonymous No.106474823 >>106474832 >>106474832 >>106474851
>>106474780
The logic behind what he does is sound, better than merges by miles.
Anonymous No.106474832
>>106474823
>>106474823
THANK YOU!!
Anonymous No.106474849 >>106474921 >>106474929
/lmg/ I have a problem, how do I create jailbreak data/guardrails testing data for a chinese model, in chinese, as someone who doesn't speak/read chinese? Will any of the qwen models with a system prompt do proper translations, or does mistral Large handle chinese well without being filtered?
Hi all, Drummer here... No.106474851
>>106474823
Thanks anon.

Skyfall v4 is actually a blind experiment on a new technique that I hope works out. But I need to hear what everyone has to say about it, hence releasing it as-is. Just a hint: the duplicated layers are not just zero'd out.

I'll try a Skyfall R1 version after. Heard reasoning benefits from the extra layers.
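For the curious, the naive version of layer duplication (which, again, is not what Skyfall v4 does) looks roughly like this in torch-speak; the attribute names assume a HF llama/mistral-style model:

    import copy
    import torch

    layers = model.model.layers                       # the decoder stack
    upper = [copy.deepcopy(l) for l in layers[8:]]    # repeat an upper slice
    model.model.layers = torch.nn.ModuleList(list(layers) + upper)
    model.config.num_hidden_layers = len(model.model.layers)
    # then heal the frankenstack with continued training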
Anonymous No.106474867 >>106474882 >>106474982
Comfy Mikus
Anonymous No.106474882
>>106474867
yeee
Anonymous No.106474919
slow week huh
Anonymous No.106474921 >>106475067
>>106474849
>Will any of the qwen models with a system prompt do proper translations
This is something you have to test yourself to your satisfaction.
It'd be hard to verify how well the jb works. Try grabbing a bunch of them, translate them to chinese and, on a new context, back to english to see what you end up with. If it still looks similar enough, use that model.
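Mechanically it's just two calls against any local OpenAI-compatible server (llama.cpp, tabby, etc.); the url and prompts here are placeholders:

    import requests

    URL = "http://127.0.0.1:8080/v1/chat/completions"

    def translate(text, target):
        # each call is a fresh context, as suggested above
        r = requests.post(URL, json={"temperature": 0, "messages": [
            {"role": "system", "content": f"Translate the user's message into {target}. Output only the translation."},
            {"role": "user", "content": text},
        ]})
        return r.json()["choices"][0]["message"]["content"]

    for jb in jailbreak_prompts:              # your english collection
        roundtrip = translate(translate(jb, "Chinese"), "English")
        print(jb, "->", roundtrip)            # eyeball the drift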
Anonymous No.106474929 >>106474968 >>106475067
>>106474849
> in chinese
why do you think you need to do any of this in chinese?
just do it in english, it understands english.
also, the model doesn't fundamentally think in chinese, it thinks in sequences of tokens, it's just continuing this token sequence during inference.
Anonymous No.106474968
>>106474929
Some tokens are更好的than others.
Anonymous No.106474982
>>106474867
I used to think of myself as a really tough guy who doesn't fall for advertisements, but turns out I'm quite a simple man actually.
Anonymous No.106475067 >>106475126
>>106474921
fair, thanks, will try this approach

>>106474929
Because the model will be primarily used in chinese, and I am not educated enough in LLMs to say assertively whether the mapping between tokens and languages is similar enough for guardrails testing.
I know that LLMs use tokens, but the mapping of said tokens to other languages is the question: how well and how equivalently weighted those mappings are across languages.
That I don't know / assumed to be unequal, given that LLMs have disparities across languages even when they've been trained on them.
Anonymous No.106475126 >>106475147
>>106475067
106474921 is stupid ignore him
Anonymous No.106475140 >>106475180 >>106475258 >>106475278
Why can't we have distributed models by anons for anons.

You do prompt processing locally, with the one-layer-at-a-time trick to do it with low VRAM. Then you do inference across a pipeline of fellow anons' GPUs, all just running a couple of layers. The latency will suck, but the throughput will be good. Then you just need to add some onion routing to hide that the degeneracy came from you.

This is a lot like the following paper:
GenTorrent: Scaling Large Language Model Serving with An Overlay Network
https://arxiv.org/html/2504.20101v3
They did not take into account my suggestion of pipelining though, which allows large models to run on VRAMlet GPUs.
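The pipelining idea itself is simple; a toy of it, ignoring the networking and onion layer entirely:

    # each peer owns a contiguous slice of layers and only ever sees hidden
    # states, so no single peer holds the whole model or the raw prompt
    def peer_forward(layer_slice, hidden):
        for layer in layer_slice:
            hidden = layer(hidden)
        return hidden                          # shipped to the next peer

    def pipeline_step(peers, hidden):
        for peer in peers:                     # in reality: one network hop each
            hidden = peer_forward(peer, hidden)
        return hidden                          # last peer applies lm_head + sampling

Throughput comes from keeping every peer busy with a different request's hidden state at once, like an assembly line.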
Anonymous No.106475147 >>106475165
>>106475126
pussy
Anonymous No.106475155
I had a look back at Gemma 3's technical report. According to it, they froze the vision encoder and trained the model. The vision encoder is less than 500M parameters. That's kind of crazy isn't it? It means that both text knowledge and vision knowledge are contained and don't ruin each other in those few 27B parameters, whereas other (open) vision models either suck at one of the modalities or are just really large models. Mistral Small's vision kind of sucks, but its text capabilities were at least maintained. Qwen had to make a separate model for vision. GLM 4.5 has a separate model for vision, and it's also much larger.

It kind of shows that vision could currently be way better than it is, and in a tiny number of parameters. I mean, think about it: a diffusion model like Noob can gen a shit ton of characters and other stuff in just 7GB of space. Vision would take way less. There is something wrong with the way other companies are training their models, or with their datasets. And the problem with companies is they won't train on "unsafe" data like booru stuff, which is one of the most extensively tagged image datasets you can get publicly, so they'll probably just be forever bad.
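The "freeze the encoder, train the rest" part is the trivial bit, e.g. in torch (sketch; "vision_tower" as the attribute name is my assumption, obviously not google's actual training code):

    import torch

    for p in model.vision_tower.parameters():
        p.requires_grad_(False)               # encoder stays fixed
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-5)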
Anonymous No.106475165 >>106475169
>>106475147
Alright, here.
Anonymous No.106475169
>>106475165
Much better. Thank you.
Anonymous No.106475180
>>106475140
it's the same old "I will draw the logo" issue
Anonymous No.106475258 >>106475271 >>106475278
>>106475140
Effort to build it out is one factor, and the other is that we wouldn't be able to train much even if we did have the software ready to go. We might not be able to do significant pretraining even with all the GPUs of anons put together. At most, we would be able to do some fine tuning on a small model, and that might still be a good thing, but might not be worth it.
Anonymous No.106475271 >>106475925
>>106475258
This is about inferencing, I'm pretty sure.
Anonymous No.106475278 >>106475319 >>106475925
>>106475140
>but the throughput will be good
Nah
>>106475258
While true, you can't read for shit.
Anonymous No.106475319 >>106475339
>>106475278
>>but the throughput will be good
>Niah
What does needle in a haystack (niah) have to do with what I'm saying?
Anonymous No.106475323 >>106475351
>>106475313
>>106475313
>>106475313
Anonymous No.106475339 >>106475992
>>106475319
You can't read AND you have to explain the joke.

>We deploy the testbed of GenTorrent in a public cloud. The testbed includes eight model nodes with mid-tier hardware, each carrying one NVIDIA RTX A6000 GPU (48 GB) and serving a Meta-Llama-3 8B model, eight other model nodes with high-performance hardware, each carrying one NVIDIA A100 GPU (80 GB) and serving a DeepSeek-R1 14B model. We run two machines as the verification nodes. One carries an NVIDIA A100 (40 GB SXM4), 30 virtual CPUs, 200 GiB of RAM, and a 0.5 TiB SSD, and the other carries an NVIDIA GH200 (96 GB HBM), 64 virtual CPUs, 432 GiB of RAM, and 4 TiB SSD.
Anonymous No.106475351
>>106475323
Fuck you I can read perfectly fine.
Anonymous No.106475925 >>106475992
>>106475271
>>106475278
I didn't read lol and just assumed based on the first sentence. I just don't see user-distributed inference becoming a thing. Even with your proposed solution, the prompt processing would be prohibitively slow. You can try this right now: go and load up a model like Deepseek with mmap and process a prompt longer than a few tokens. The last time I tried it, it took several minutes to process a normal-sized prompt. Don't remember the exact numbers though. No one wants to wait that long, and would rather just use a shittier model or not use models at all. If you're saying that you should have enough RAM to load the model entirely in memory, then at that point the user can likely just run the model themselves, no need for distributing the work. Unless you're saying to run large dense models, but there's no reason to do that when there aren't any currently; all the SOTAs are MoEs.
Anonymous No.106475992
>>106475925
>Even with your proposed solution
>your
Neither of the anons you replied to is the one who posted the paper. One anon offered no opinion, just a correction. The other one (me) said
>>but the throughput will be good
>Nah
But at this point i'm just taking the piss, really. I also posted this >>106475339 where i point out the "mid-tier hardware" being RTX A6000 gpus for an 8b model.
Anonymous No.106476036
>>106473288
You can only get classifiers at that size