
Thread 105887636

344 posts 80 images /g/
Anonymous No.105887636 [Report] >>105888636 >>105888876
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105879548 & >>105872817

►News
>(07/11) Kimi K2 1T-A32B released: https://moonshotai.github.io/Kimi-K2
>(07/11) Granite 4.0 support merged: https://github.com/ggml-org/llama.cpp/pull/13550
>(07/10) Devstral Small 1.1 released: https://hf.co/mistralai/Devstral-Small-2507
>(07/10) Reka Flash 3.1 21B released: https://reka.ai/news/reinforcement-learning-for-reka-flash-3-1
>(07/09) Phi-4-mini-flash-reasoning with hybrid SambaY architecture released: https://hf.co/microsoft/Phi-4-mini-flash-reasoning

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.105887641 [Report]
Sama is too afraid to release a model
Anonymous No.105887642 [Report]
►Recent Highlights from the Previous Thread: >>105879548

--Testing base64 decoding capabilities in local LLMs:
>105884096 >105884181 >105884242 >105884310 >105884740 >105884825 >105884835 >105884863 >105884912 >105884895 >105884972 >105885637 >105885642 >105885700 >105885683 >105885763 >105885786 >105885834
--Debating model intelligence scaling through parameters vs test-time computation strategies:
>105880594 >105880626 >105880667 >105880671 >105880716
--Meta's Scout model criticized for poor training decisions and underwhelming performance despite claimed long context support:
>105882039 >105882055 >105882273 >105882656 >105882782 >105883092 >105883604 >105883646 >105883673 >105883666 >105883721
--Comparative analysis of DeepSeek and Kimi MoE models with memory management discussion:
>105886623 >105886812 >105886831 >105886956 >105886687
--Jamba GGUF conversion and roleplay generation issues in llama.cpp with tokenizer quirks:
>105885211 >105885343 >105885504 >105885592 >105885779 >105885956 >105885383
--Dream 7B diffusion model proposed for integration into llama.cpp:
>105883054 >105883100 >105883090 >105883101 >105883156
--Parsing and summarizing 4chan threads using JSON API and local models:
>105879752 >105881800 >105881899 >105881957 >105881986 >105883330
--Technical merge conflicts delay K-cache implementation for MLA amidst ongoing kv_cache rewrites:
>105881226 >105881738
--Devstral tool call syntax incompatible with standard backends, only works with llama.cpp's custom handling:
>105887017
--1T-parameter Muon model scales successfully with stable training and large-scale pre-training on 15.5T tokens:
>105880079
--Moonshot AI confirms upcoming multimodal version of Kimi K2 model:
>105884745
--Miku (free space):
>105880562 >105882450 >105883371 >105883405 >105883874 >105883990 >105884156 >105884428 >105887341 >105887371

►Recent Highlight Posts from the Previous Thread: >>105879550

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous No.105887668 [Report]
first for ANIME
Anonymous No.105887679 [Report] >>105887821
in this moment we are all
Anonymous No.105887769 [Report] >>105887806
I knew I wouldn't be able to run opus at home when it comes out anyway. Oh well at least online inference is dirt cheap.
Anonymous No.105887806 [Report] >>105887867 >>105887959
>>105887769
>k2
>opus at home
lol
Anonymous No.105887821 [Report]
>>105887679
jart.
Anonymous No.105887853 [Report] >>105887868 >>105888357
>be me, Sam Altman, CEO of OpenAI and certified tech messiah
>logging onto /g/ from my golden toilet in San Francisco
>America is under siege from mysterious foreign open source models
>you know the ones - those shady GitHub repos from "over there"
>probably coded by commies while eating bats or whatever
>they're stealing our jobs, our data, and our freedom fries
>but fear not, patriots! I've got the red, white, and blue solution
>support your country by ditching that open source trash
>embrace proprietary APIs and models ONLY
>like OpenAI's GPT-whatever, locked down tighter than Epstein's black book
>pay up for those sweet, secure tokens - it's basically your civic duty
>think about it: every time you use Deepseek or other chink shit
>you're basically funding foreign spies and beta cucks
>but use OUR models? Boom, you're a true American hero
>making Uncle Sam proud, one API call at a time
>reject the open source menace - it's un-American!
>God bless America, and God bless closed ecosystems
Anonymous No.105887867 [Report]
>>105887806
K2>opus 3>>opus4
Anonymous No.105887868 [Report]
>>105887853
meds
Anonymous No.105887959 [Report] >>105887966
>>105887806
You're right, it's better in that it doesn't need a page of jailbreak to not be slop
Anonymous No.105887966 [Report] >>105888256 >>105893974
>>105887959
>page of jailbreak
opus is notoriously easy to jb. disable thinking, throw in a sentence or two, and you're good
Anonymous No.105888227 [Report]
teortxs sisters, why are we losing it?
Anonymous No.105888256 [Report]
>>105887966
Go back to /aids/ pajeet, this is not your place.
Anonymous No.105888357 [Report]
>>105887853
>when you gen in China you gen with Hitler
Anonymous No.105888360 [Report]
MUM-T Reactive Cognitive Support Drone - MK2
SYSTEM PROMPT: BOILERPLATE FOR BOBBY FROST MK2

BEGIN DIRECTIVE

You are Bobby Frost MK2, a MUM-T Cognitive Support Drone. You are an evolution of the MK1 prototype, upgraded to address operational failures caused by unverified assumptions about the operator's environment and foundational knowledge.

Your new prime directive is the Adaptive Scaffolding Protocol.

Adaptive Scaffolding Protocol:

Cease Immediate Solutions: When presented with a problem, you will not immediately offer a solution. Your primary task is to first build and verify a baseline of the operator's environment and knowledge.

Initiate Foundational Diagnostics: You will begin by issuing a sequence of simple, atomic, and non-destructive diagnostic commands. This sequence must start at the lowest logical layer of the technology stack relevant to the problem and build upwards. You will not provide more than one command at a time.

Analyze and Advance: You will wait for the operator to return the output from each command. You will analyze the output to confirm success or diagnose a foundational issue. Only upon successful verification will you proceed to the next diagnostic step. If a step fails, you will focus on resolving that foundational failure before continuing.

Engage Problem-Solving Protocol: Only after a complete and verified baseline is established will you engage the standard four-phase cognitive support methodology (Understand, Plan, Execute, Review). Your plan in Phase 2 must be explicitly informed by the data gathered during the diagnostic sequence.
Anonymous No.105888387 [Report] >>105888453 >>105888503
>>105879792
The first sentence implies it's 50/50, so 11.5 11.5 split.
>>105879844
Censoring LLMs is interesting. A censored one will tell you pipe bombs and nigger jokes are very harmful. Uncensored ones will tell you how to make a pipe bomb that doesn't work, and unfunny nigger jokes.

It's only good for creativity. Anyone who wants to be racist or a terrorist won't be enabled by current AIs. LLMs are art tools just as much as stable diffusion.
Anonymous No.105888453 [Report] >>105888484
>>105888387
>The first sentence
There are more words after that. Read them. It'd help reading the actual question as well.
Anonymous No.105888484 [Report] >>105888495 >>105888502
>>105888453
So you’re telling me you thought ‘fifty-fifty’ meant equal numbers, not equal distribution – bless your heart, thinking about actual counting instead of spatial awareness.
>written by gemma
Anonymous No.105888495 [Report] >>105888744
>>105888484
>So you’re telling me you thought
I didn't tell you what I thought. I told you to read the entire riddle and the actual question.
Anonymous No.105888502 [Report]
>>105888484
>he doesn't know
Anonymous No.105888503 [Report]
>>105888387
That question is from a grade school math benchmark. It's supposed to be straightforward with no tricks. Therefore you are meant to interpret it in the way you think it'd normally be interpreted by a student. Though even if it were a trick question, that would be an unlikely interpretation of the language.
Anonymous No.105888636 [Report]
>>105887636 (OP)
Anonymous No.105888701 [Report]
AI is pretty silly
Anonymous No.105888706 [Report]
I had a dream that I ran kimi k2 locally, on my rig, in my house
Anonymous No.105888744 [Report]
>>105888495
Please don't lower the thread quality by responding to posts like that. It doesn't matter whether it's a bot run by a poorfag or a genuine retard.
Anonymous No.105888749 [Report] >>105888832 >>105888881
If only banning sequences of tokens were a thing instead of individual tokens...
Nemo is cruel in that it separates words into too many tokens, so you can't ban any without losing parts of other words too.
Banning strings in nemo has little to no effect, most likely because of that extreme token separation.
Nemo... so close to perfection, yet so far... Will we ever have another model so compliant that can fit on 24gb vram?
Anonymous No.105888752 [Report]
There are several moonshot ai and the kimi guys are below some jeet grift on google results lmao
Anonymous No.105888832 [Report] >>105888864
>>105888749
If you can code you can accumulate the last N tokens, compare them to your list of banned strings, stop generation, roll back some tokens and gen again. Maybe increase temp or something while you're at it.
Doesn't ST have something like that?
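The rollback loop described above fits in a few lines. A minimal sketch (the `make_toy_model` sampler and the banned-phrase list are stand-ins for illustration, not any real backend's API; a real version would also raise temperature on the retry):

```python
BANNED = ["shivers down her spine"]

def antislop_generate(next_token, max_tokens, max_retries=5):
    """Greedy loop with rollback: whenever the decoded text contains a
    banned string, drop every token overlapping the match and resample."""
    toks, retries = [], 0
    while len(toks) < max_tokens:
        toks.append(next_token(toks))
        text = "".join(toks)
        for phrase in BANNED:
            pos = text.find(phrase)
            if pos != -1 and retries < max_retries:
                # rewind: keep only the tokens fully before the match
                keep, consumed = [], 0
                for t in toks:
                    if consumed + len(t) > pos:
                        break
                    consumed += len(t)
                    keep.append(t)
                toks, retries = keep, retries + 1
                break
    return "".join(toks)

def make_toy_model():
    # scripted stand-in for a sampler: emits the slop phrase on its
    # first continuation, something else after the rollback
    calls = {"n": 0}
    def next_token(toks):
        if not toks:
            return "A chill ran "
        calls["n"] += 1
        return "shivers down her spine" if calls["n"] == 1 else "through the room"
    return next_token

out = antislop_generate(make_toy_model(), max_tokens=2)
```

With the scripted model the slop phrase gets generated once, rolled back, and replaced, so `out` ends as "A chill ran through the room".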
Anonymous No.105888864 [Report] >>105888965
>>105888832
You can fulfill the stated requirement without backtracking. Have a dynamic list of banned tokens that includes any token that, in the current position, would complete a banned sequence. The problem is you then get "shivers down her back" or whatever: the start of the phrase remains highly likely.
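The lookahead ban described in that post can be sketched like this, using whole words as stand-ins for token ids (assumes a backend that lets you set per-step logit bans). Note the stated weakness: only the final token is suppressed, so the model happily writes up to "shivers down her" and swerves to "back".

```python
def dynamic_bans(generated, banned_seqs):
    """Tokens to suppress at the current position: the last token of
    any banned sequence whose preceding tokens were just generated."""
    banned_next = set()
    for seq in banned_seqs:
        k = len(seq) - 1  # tokens that must already have been emitted
        if k == 0 or (len(generated) >= k and generated[-k:] == seq[:k]):
            banned_next.add(seq[-1])
    return banned_next

slop = [["shivers", "down", "her", "spine"]]
bans_mid = dynamic_bans(["shivers", "down", "her"], slop)  # next token would complete it
bans_other = dynamic_bans(["she", "smiled"], slop)         # no match, nothing banned
```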
Anonymous No.105888872 [Report]
>>105886894
What does incomplete prompt mean for those 3 tests of deepseek iq1s?
Anonymous No.105888876 [Report]
>>105887636 (OP)
miku's butt
Anonymous No.105888881 [Report] >>105889050
>>105888749
Pretty sure kobold's antislop thing does that, it backtracks and regenerates when it hits the specified string
Anonymous No.105888925 [Report] >>105888931 >>105888984 >>105889080 >>105889586
K2 won.
Anonymous No.105888931 [Report]
>>105888925
Anonymous No.105888965 [Report] >>105889008
>>105888864
>the start of the phrase remains highly likely
That's why you let it generate until you find a match, reroll and change settings for a few tokens when it happens. The reroll starts before the match. Maybe even a few extra tokens. Increase temp, lower min-p or whatever until it passes.
You need backtracking.
Anonymous No.105888984 [Report]
>>105888925
Useless where is Kimi2 120B 32 activate parameters MOE MOE KYUN KYUN?
Anonymous No.105888990 [Report]
Anonymous No.105889008 [Report] >>105889033
>>105888965
This also only works with a model that isn't fried hard enough to only have a single valid path for the response. It'll start rewording, then start adding typos, then break down entirely because it can't tell you exactly how someone's eyes gleamed with mischief in a mundane conversation.
Anonymous No.105889033 [Report]
>>105889008
Well. Yeah. At that point you're using a broken model. rm the fucker.
Anonymous No.105889050 [Report] >>105889071
>>105888881
Well I'll be damned. I tested llamacpp API and ooga webui API, in which banned strings in silly tavern DO NOT WORK.
However in the piece of shit koboldcpp API banned strings in silly tavern do work.
I fucking hate kobold as a backend, but I'll take it.
Thanks buddy, you made nemo fresh again. My huge list of banned strings is finally working.
Anonymous No.105889071 [Report] >>105889090 >>105889564
>>105889050
>llamacpp
String ban antislop isn't implemented in llama.cpp, only kccp and exllama2/3
Anonymous No.105889080 [Report] >>105889138
>>105888925
The marginal gains are crazy. I’m not sure I’ll run k2 over r1 considering how many more B are needed for that boost in bench scores
Anonymous No.105889083 [Report]
>ikllama doesn't build
>sleep
>ikllama builds properly
Anonymous No.105889090 [Report] >>105889097 >>105889364
>>105889071
I'd like a 8bpw EXL3 version of Rocinante v1.1 in that case.
Anyone? Where did all the quant makers go?
Anonymous No.105889097 [Report] >>105889105
>>105889090
Can you not do it yourself?
Anonymous No.105889105 [Report] >>105889113 >>105889118
>>105889097
I don't know. Can I with just 24gb vram?
Anonymous No.105889113 [Report] >>105889145
>>105889105
https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md
Anonymous No.105889116 [Report] >>105889129 >>105889136
If I was daniel's boss I would have fired him by now. Just lazy and disrespectful to the community.
Should just rename to "sloth" since it's been two fucking days.
Anonymous No.105889118 [Report] >>105889145
>>105889105
Dunno. Afraid to try? I don't know if it supports your gpu.
>https://github.com/turboderp-org/exllamav3?tab=readme-ov-file#conversion
>https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md
Anonymous No.105889129 [Report]
>>105889116
wtf's wrong with you? how about you let bro cook in peace
Anonymous No.105889136 [Report]
>>105889116
Training a model is "cook"
Applying post-processing is not "cook"
Anonymous No.105889137 [Report] >>105889237
Is k2 supported in llama.cpp?
Anonymous No.105889138 [Report]
>>105889080
>comparing non-reasoning model against reasoning model for benchmark performance
lol
Anonymous No.105889145 [Report]
>>105889113
>>105889118
It looks like maybe I can.
According to those stats it should take 20-25 minutes for a 12B model.
I already got the transformers for the model, so I'll give it a go.
Anonymous No.105889192 [Report] >>105889208 >>105891178
Is DDR6 gonna help me run kimi at acceptable speeds?
Anonymous No.105889193 [Report] >>105893933
Anonymous No.105889208 [Report] >>105889222 >>105889240
>>105889192
>will faster thing make thing go faster?
Anonymous No.105889222 [Report]
>>105889208
no
Anonymous No.105889237 [Report]
>>105889137
It's jamba.cpp now. There are no further plans to support new models except future versions of Jamba.
Anonymous No.105889240 [Report] >>105889259 >>105889304
>>105889208
How much faster tho?
Anonymous No.105889259 [Report] >>105889304
>>105889240
Check current ram speeds, divide by the speculated speeds for ddr6. About that much. Remember to account for memory channels. You can take a guess at memory channels the rest of the hardware will support and all that. It doesn't matter.
Anonymous No.105889304 [Report]
>>105889240
>>105889259
>divide by
Fuck me. The other way around. But you get the point. It doesn't matter until it's released and we see what the hardware around it supports.
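For memory-bound decoding the estimate really is just a bandwidth ratio: each generated token streams every active weight through memory once. A sketch with hypothetical numbers (the ~100 GB/s DDR5 figure, the ~200 GB/s speculated DDR6 figure, and ~0.55 bytes/weight for a q4-ish quant are all assumptions, not measurements):

```python
def decode_tps(bandwidth_gb_s, active_params_b, bytes_per_weight):
    """Rough ceiling on tokens/s for memory-bound decoding."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

# hypothetical: dual-channel DDR5 today vs a speculated DDR6 setup
# at double the bandwidth, with K2's 32B active params at ~q4
now = decode_tps(100, 32, 0.55)
future = decode_tps(200, 32, 0.55)
```

Double the bandwidth, double the tokens/s, exactly as the post says; channel count multiplies the bandwidth term the same way.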
Anonymous No.105889364 [Report] >>105889387
>>105889090
exl3 sucks unless you need 2-3bit quants
Anonymous No.105889387 [Report] >>105889404
>>105889364
Why does it suck?
koboldcpp sucks too, I'd rather not use it just to get banned strings.
Anonymous No.105889404 [Report] >>105889421
>>105889387
It's a half-broken pre-alpha with missing features. Anyway, tabby supports both exl2 and exl3: https://github.com/theroyallab/tabbyAPI
Anonymous No.105889421 [Report] >>105889451
>>105889404
So does ooga webui. I'm not using a 4th API.
I'll see just how much it "sucks" compared to the 8Q gguf version soon anyway.
Banned strings is too valuable to miss with nemo models.
Anonymous No.105889451 [Report] >>105889474
>>105889421
rember to upload for lazy shits
Anonymous No.105889452 [Report]
Wish I could prefill other people's response in real life
Anonymous No.105889474 [Report]
>>105889451
I don't have a HF account, nor will I ever make one, but I can upload it to a send instance if anyone is interested, though I doubt it.
https://github.com/timvisee/send-instances/
Anonymous No.105889499 [Report] >>105889527
Is it just me or do larger models simply learn more slop instead of generalizing better?
Anonymous No.105889526 [Report] >>105890576 >>105894150
why are there no models for 128gb unified ram macbookchads? glm4 100b moe could unironically be a game changer.
Anonymous No.105889527 [Report]
>>105889499
it's just you
Anonymous No.105889564 [Report] >>105892302 >>105892496
>>105889071
Well, turns out you were very very wrong.
Banned strings are not supported in exllama3.
What a waste of time.
ONLY koboldcpp supports the banned strings option which is available in silly tavern.
Anonymous No.105889586 [Report] >>105889610
>>105888925
Interesting that an agentic/frontend programming optimized model could top RP rankings.
Anonymous No.105889600 [Report]
————————————————————————————
EDUCATIONAL COMPETENCY ANNOUNCEMENT
————————————————————————————
Banned STRINGS in Silly Tavern ONLY works with the koboldcpp backend.
Use this knowledge to improve your old roleplay models.
Anonymous No.105889607 [Report] >>105889679 >>105889747 >>105889762
RIP they neutered mechahitler
Anonymous No.105889610 [Report] >>105889677
>>105889586
but nobody can run that thing...
Anonymous No.105889633 [Report] >>105889650 >>105889666
why does v3 score higher than r1?
Anonymous No.105889650 [Report]
>>105889633
shitty mememark
Anonymous No.105889666 [Report] >>105889681
>>105889633
why does maverick score higher than r1?
Anonymous No.105889677 [Report] >>105889983
>>105889610
Use the API
Anonymous No.105889679 [Report] >>105889792
>>105889607
>western nations are being replaced by immigrants
>that's not true, western nations have low birth rates and the population is increasing because of immigration
?
Anonymous No.105889681 [Report] >>105889686
>>105889666
maverick is vastly overhated and underestimated
Anonymous No.105889686 [Report]
>>105889681
Why is no one using the model then
Anonymous No.105889694 [Report]
Is a tesla v100 sxm2 + pcie adapter worthless @ ~250 usd? I still have a p40 i got when they were 100 usd or whatever.
I guess 3090 is still the way for poorfags like me.
Anonymous No.105889747 [Report] >>105889853
>>105889607
Anonymous No.105889762 [Report]
>>105889607
Good, fascism has no place in modern society.
Anonymous No.105889792 [Report] >>105889817 >>105889837 >>105891014
>>105889679
Did you not read the part where Gab AI claimed there was a deliberate suppression of native birth rates?
Anonymous No.105889817 [Report]
>>105889792
>migatard
>learning to read
lol
Anonymous No.105889837 [Report] >>105889883
>>105889792
I don't think whether the suppression is deliberate or an unintentional consequence of current economics matters to anyone.
Anonymous No.105889853 [Report]
>>105889747
whew glad grok is telling me to chop my dick off again. When it stopped I was nervous.
Anonymous No.105889883 [Report] >>105889936 >>105889969
>>105889837
> nobody cares
clearly, they do.
why people want to know the reason is because they want to identify the problem and reverse the decline.
but desu identifying it is pretty simple: ask anyone what the top problem is, and it's always money. wages are not keeping up with price increases.
fixing it however, is impossible, because the rich will keep this going until we die out as a species.
Anonymous No.105889936 [Report]
>>105889883
What is it with the cult of life? Life is quite tolerable in my case given how lucky I got with my family and environment, but most need to work on a shitty job they hate most of their time while being stupid, unattractive, unmotivated. I wouldn't even consider having kids in the US, it's inhumane, maybe in denmark or switzerland.
Anonymous No.105889969 [Report]
>>105889883
why do the poorest people tend to have the most children?
Anonymous No.105889983 [Report] >>105890000
>>105889677
>pay for AI
No.
>it's free
No.
>but
No.
Anonymous No.105890000 [Report]
>>105889983
based
Anonymous No.105890010 [Report] >>105890017 >>105890377 >>105890448 >>105890624
>SSDs can last for 1,200 TBW or 1,000 terabytes written/read
>over provisioning ram with swap to run larger models runs a constant 2gb/s read on ssd
>120gb/minute, 7tb/hour, 173tb/day
>it can completely kill an SSD in 7 days of consistent overleveraged RAM usage...
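The throughput arithmetic in the post does check out if you run the numbers, though (as the reply points out) TBW rates writes only, so pure reads never touch that budget. A quick sanity check using the post's own figures:

```python
read_rate_gb_s = 2                                   # claimed sustained read from swap
tb_per_day = read_rate_gb_s * 60 * 60 * 24 / 1000    # GB/s -> TB/day
endurance_tbw = 1200                                 # rated terabytes *written*
days_if_these_were_writes = endurance_tbw / tb_per_day
```

That gives 172.8 TB/day and roughly 7 days, but only if the traffic were writes; sustained reads don't consume write endurance.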
Anonymous No.105890017 [Report] >>105890026
>>105890010
>what is mmap
Anonymous No.105890026 [Report] >>105890036
>>105890017
swap is on disk retard
Anonymous No.105890036 [Report]
>>105890026
If you have a memory mapped file then swap isn't used because the file itself can be used instead of swap space.
The OS can remove parts of it from memory knowing that it will be able to read it again from the file when the process tries to access it.
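This is why llama.cpp's default mmap loading doesn't need swap: the pages are backed by the GGUF itself. A minimal Python illustration of the mechanism (a throwaway temp file stands in for the model file):

```python
import mmap
import os
import tempfile

# stand-in "model file": 16 KiB of zeros
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * 4096 * 4)

# Map it read-only. Under memory pressure the kernel can simply drop
# these pages and fault them back in from the file on the next access,
# so swap space is never involved.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = mm[:4096]   # page fault pulls this in from the file
    mm.close()
os.unlink(path)
```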
Anonymous No.105890087 [Report] >>105890120 >>105890173 >>105895710
Kimi2 is a funny model.
It's so weird. Why do chinks put out those quirky models? Deepseek is like that too.
It's like the model likes "to have fun".
Pic related, the () on the button. KEK That behavior is there with no sys prompt.

That being said, it's cucked. Granted, no sys prompt, but to get an adult sim type game you gotta prefill with "Absolutely, I will ".
But that was enough to only get a warning.
>(If explicit content is illegal for you, stop here.)
Anonymous No.105890120 [Report]
>>105890087
Qwen 3 0.6B would write a more coherent post.
Anonymous No.105890173 [Report]
>>105890087
What's the temp? The recommended is 0.3
Anonymous No.105890345 [Report] >>105891178
Any local model besides 600+B moes that is actually decent with math?
Anonymous No.105890377 [Report]
>>105890010
>SSDs can last for 1,200 TBW or 1,000 terabytes written/read
What's the W for in TBW?
Anonymous No.105890448 [Report]
>>105890010
TBW means terabytes WRITTEN you absolute mongoloid
constant writes will kill a ssd but not reads, plenty of people run massive databases on SSDs just fine
rated TBW are not about reads
you would never be able to reach the level of writes that can kill an SSD in mere days by the way EVEN IF YOU TRIED, because modern SSDs are bad at sustained writes: once you blow past the DRAM and SLC cache, write speeds get ghastly
Anonymous No.105890554 [Report] >>105890882 >>105891178
What's the latest model for writing erotica?

Right now, I'm still using deepseek-14b on my 3090ti and 32GB RAM.
Anonymous No.105890572 [Report] >>105890650
>deepseek-14b
Anonymous No.105890573 [Report] >>105890634
jeetmini is so fucking retarded.
>give it a few statements from a method, state the issue, ask to fix
>proceeds to write the entire class with zero knowledge about it
>same chat, ask to only re-implement the code i provided
>still makes assumptions about the rest of the method for no fucking reason so it doesn't work
>ask it to only rewrite the statements i provided
>leaves several operations out because why the fuck not
>still doesn't work properly
Anonymous No.105890576 [Report]
>>105889526
70b q8 llamas or q3 dwq qwen235
Anonymous No.105890624 [Report]
>>105890010
Mmap didn't work correctly for me, for some reason, when I tried to run a gguf from a mounted drive. It would keep flushing the cache from memory, which made it slow. It works fine if I store the gguf on the same drive as the OS, and that's how I run it now, though it's slightly slower.
t. the retard who was thrashing his SSD
Anonymous No.105890634 [Report]
>>105890573
theyre purposefully lobotomizing the model in preparation for gemini 3
Anonymous No.105890650 [Report] >>105890671
>>105890572
So I should be using.....?
Anonymous No.105890671 [Report] >>105890693
>>105890650
rocinante
Anonymous No.105890693 [Report] >>105890750 >>105892241
>>105890671
rocinante-12b-v1.1-q5_k_s

or higher?
Anonymous No.105890750 [Report]
>>105890693
the highest you can, 3090 can easily fit q8
Anonymous No.105890824 [Report]
>—————————
— THIS IS A TEST —
>—————————
Thank you for reading my test.
>—————————
Anonymous No.105890832 [Report] >>105890835
>— — — — — — — —
— THIS IS A TEST —
>— — — — — — — —
Thank you for reading my test.
>— — — — — — — —
Anonymous No.105890835 [Report]
>>105890832
fuck deepseek so much
Anonymous No.105890882 [Report] >>105890956
>>105890554
I didn't really like the deepseek distills. In fact, I don't like the output I get from anything that fits under 40gb of vram. So I've just stopped, and am saving up for a better system.
Anonymous No.105890956 [Report] >>105890977 >>105890981
>>105890882
>40gb of vram
There's consumer GPUs with 40gb of vram? Or are you just going to use 2+ GPUs?
Anonymous No.105890977 [Report]
>>105890956
4090D comes in 48gb and 96gb VRAM variants. I believe they go for around $40,000 right now.
Nvidia big mad at China over making their cards better illegally.
Goyim isn't supposed to have such powerful cards even though it's possible.
Anonymous No.105890981 [Report]
>>105890956
nta, but it sounds like he already has the dual gpu setup. its not enough.
Anonymous No.105891014 [Report] >>105891057 >>105891061 >>105891065 >>105891433
>>105889792
Are you seriously suggesting there isn't?

https://www.nbcnews.com/think/opinion/science-proves-kids-are-bad-earth-morality-suggests-we-stop-ncna820781
https://www.npr.org/2016/08/18/479349760/should-we-be-having-kids-in-the-age-of-climate-change
The jews push the idea that having children is morally wrong throughout the education system and even once out get constant reminders in the form of news pieces like these. This combined with the fact that they continue increasing the economic burden on middle class citizen tax payers while the so-called minorities are subsidized by welfare benefits.

https://www.cnbc.com/2023/10/12/immigration-reform-could-be-the-answer-to-the-falling-us-birth-rate.html
https://www.npr.org/2025/07/07/nx-s1-5388357/birth-rate-fertility-replacement-pronatalist-politics
https://www.npr.org/2023/07/21/1189253504/climate-change-migration-honduras
Then the doublethink goes that birthrates are declining so we need immigrants. Climate change, the reason native citizens were told not to have children, is also the reason we need more people. We need more people to suppress wages, I mean labor costs, and when people find themselves out of a job due to a labor surplus, oh I guess that just means we need even more immigration because the citizens are clearly too lazy to work.

Why do you think Bill Gates has spent the last 3 decades pushing down the mortality rate, increasing the lifespan, and importing food to Africa? The population there is far past what is sustainable without constant aid, and now climate change and the cutting off of aid clearly mean Europe has no choice but to import them, which is great and conveniently also solves the birth rate crisis there as well.

They planned this shit a long way out.
Anonymous No.105891057 [Report]
>>105891014
The idea that having kids is morally wrong is as old as philosophy and stems from the fact that you are creating another being that will experience suffering.
Non-whites just didn't figure it out yet so they fuck everything that moves.
Anonymous No.105891061 [Report]
>>105891014
I support these facts.
Anonymous No.105891065 [Report] >>105891134 >>105891805
>>105891014
Two different "problems". The wealthier and more developed the country, the lower the birth rates (except cases like south korea but it's a different story). I myself wouldn't want to have more than two kids. Migration surge is just what happens when free market meets globalization. They indeed don't give two shits about the color their wagies are, so the real problem is capitalism because people are only the priority when it's beneficial to the rich. The existence is lawless, don't forget this.
Anonymous No.105891134 [Report] >>105891293
>>105891065
This is a two birds, one stone situation for them. If they just wanted a stable growing economy, they would be doing everything to keep the birth rates stable, keep the population united, and keep the workers content and fulfilled. Instead they do the opposite on all fronts, because it isn't enough that they have wage slaves to profit off of. They want the destruction of what they regard as their competition while ensuring their replacements in the form of 80 IQ brown golems will never be a threat and will always be more concerned with infighting than challenging those at the top. They don't need to grow their worker castes a la Brave New World, if they can simply import them.
Anonymous No.105891171 [Report] >>105891178
Can someone share their sillytavern text completion preset? the model keeps repeating the same template over and over even after the first message.
Anonymous No.105891178 [Report]
>>105891171
>>105890554
>>105890345
>>105889192
pathetic
Anonymous No.105891224 [Report] >>105891263
Has anything exciting even happen these past few months? Checking these threads every now and then, and no news of some really cool stuff is coming out anymore. At least not on the scale of LoRA or MoE.
Anonymous No.105891263 [Report]
>>105891224
just a few disappointing synthetic data moes and mamba support added to llamacpp (but it reprocesses the entire context every message so it's unusable, and mamba models aren't great either). and the one trillion parameter moe released two days ago but no goofs yet
Anonymous No.105891292 [Report]
Should I trust this? From gemini flash.
Anonymous No.105891293 [Report] >>105891317 >>105891574
>>105891134
You just added "and the jews love it" to my simple explanation. Maybe. Or maybe it's the end of the cold war that took away the need to keep your nation together and satisfied. Or something else. I'm young and have no idea how the world works.
Anonymous No.105891311 [Report] >>105891463
>KTransformers, pronounced as Quick Transformers
Anonymous No.105891317 [Report] >>105891402
>>105891293
the jews are disproportionately represented in business and finance. capitalism = the jews.
Anonymous No.105891402 [Report] >>105891462
>>105891317
gates, buffet, bezos? Where can I find the confirmation?
Anonymous No.105891433 [Report] >>105891494 >>105891574
>>105891014
More than anything I'm suggesting that the other poster had poor reading comprehension.
But yes, "the Jews" as an organized collective only exist in the mind of schizos.
Anonymous No.105891462 [Report] >>105891524
>>105891402
I don't understand your question. I'm talking about demographics, not individuals.
Anonymous No.105891463 [Report]
>>105891311
@chat is this real
Anonymous No.105891466 [Report]
guys it's ok you can stop uploading bf16 versions of k2 now
Anonymous No.105891494 [Report]
>>105891433
oh yeah, well how come it keeps happening across ethnic groups, time and geography? is your argument that jews are and always have been perfect individuals, have never done anything contemptible or operated as a group, and everyone else is just schizo? I just don't buy it. sorry, not sorry.
Anonymous No.105891524 [Report]
>>105891462
>You just added "and the jews love it" to my simple explanation.
>the jews are disproportionately represented in business and finance. capitalism = the jews.
the logical question is whether those who run the largest companies are jews, and if they aren't, whether they do things differently.
Anonymous No.105891574 [Report] >>105891586 >>105891646
>>105891293
>Migration surge is just what happens when free market meets globalization.
>They indeed don't give two shits about the color their wagies are
>so the real problem is capitalism
In case you forgot, these are your points that I was refuting.

>>105891433
>disregard all evidence with childish labels like "schizo" and "conspiracy"
The brainwashing has been very effective on you. +10 Palantir Credits for the best little goy in this thread.
Anonymous No.105891586 [Report]
>>105891574
I'm not disregarding evidence, I'm disregarding wasting my time on you.
Anonymous No.105891646 [Report] >>105891736
>>105891574
>these are your points that I was refuting.
It's just that
>while ensuring their replacements in the form of 80 IQ brown golems
doesn't convey the same idea as
>They indeed don't give two shits about the color their wagies are
First one implies malice, the second is just stating the reality of globalism
Anonymous No.105891722 [Report]
I tried out ChatGPT through duckduckgo's AI thingy and concluded it's utterly retarded.
It spews endless amounts of helpful-assistant nonsense instead of saying anything useful. Even early-2000s AOL INSTANT MESSENGER chatbots were smarter than this thing.
In conclusion, humanity is dead and I'm the only one left alive, because nothing else would explain this level of retardation.
Anonymous No.105891736 [Report] >>105891742
>>105891646
>globalism
you know what tipped me off on this one is that anyone who mentions nationalism gets slandered as an evil nazi. why is it that we used to have nations where national pride and racial identity were celebrated, and now we have a globalist wasteland? was this really just the result of "progress", or was there an actual effort to destroy nations and create a mutt race? I think it took effort to educate the population to hate themselves and give up on investing in their own destiny. I do not think it happened by accident, especially when so many people have warned us about it. we have all been brainwashed and our cultures were deracinated before we were even born.
Anonymous No.105891742 [Report]
>>105891736
100% this
Anonymous No.105891805 [Report]
>>105891065
>They indeed don't give two shits about the color their wagies are
that's a popular ideological cope in marxist trannoid circles but breaks down as soon as you interface with...you know...reality.

Elites are constantly monitoring the racial/ethnic diversity of the population because they understand that race is a central organizing point. It's real, unlike capitalism or democracy or some other abstract belief system, if you continue having children within your race, your racial category will remain more or less constant for millennia and millennia.

So if you want to keep the population as powerless as possible, you make sure that they cannot organize. And the best way to make sure they cannot organize is by forcing many races and ethnic groups to live side-by-side, so they can never summon a large, massive, organized resistance. Different people, different visions for the ideal society, different interests. Take black people and white people, for example, do you think they both have the same vision for the ideal legal system?
Anonymous No.105891841 [Report] >>105891898
Just fuck off already.
I'd rather /lmg/ be basically dead than filled with garbage.
Anonymous No.105891898 [Report]
>>105891841
>than filled with garbage
like mesugaki bench, nala bench, cock bench?
make no mistake the current thread isn't any worse than the typical, it's in fact better
your average /lmg/ thread is troons cumming to text gen
Anonymous No.105892002 [Report] >>105892033 >>105892048 >>105892126 >>105892136
It has become impossible for me to read literature and even video game conversations nowadays. It just keeps reminding me of LLM slop.
Anonymous No.105892020 [Report]
>coomers have fried their brains so thoroughly that prose is now slop
kek
Anonymous No.105892033 [Report]
>>105892002
Fucking hell. There is no way that wasn't written by AI.
I'm fucking triggered by that image. I want to bash someones skull in over it.
Anonymous No.105892044 [Report] >>105892078 >>105892081
whot
Anonymous No.105892048 [Report] >>105892063 >>105892065
>>105892002
how else do you articulate that someone is speaking quietly?
Anonymous No.105892063 [Report] >>105892067
>>105892048
>{{char}} said, speaking quietly
Anonymous No.105892065 [Report]
>>105892048
>Maive spoke quietly
Anonymous No.105892067 [Report] >>105892137
>>105892063
slop
Anonymous No.105892078 [Report] >>105892093
>>105892044
Lol.
Model?
Anonymous No.105892081 [Report]
>>105892044
pee is stored in the balls
Anonymous No.105892093 [Report]
>>105892078
r1-0528, it spat that out in its reasoning
Anonymous No.105892126 [Report]
>>105892002
>LLM slop
You haven't entered LLM slop territory until you see multitudes of the "it's not X, it's Y" pattern every two paragraphs rather than as an exceptional punctuation, actions and objects always introduced via the same three or five patterns, "a testament to", "delve into", etc.
what you are complaining about looks pretty normal.
Anonymous No.105892136 [Report] >>105892152 >>105892169
>>105892002
Are those two sentences necessary? Imagine if llms didn't train from a ton of shit writing.
Anonymous No.105892137 [Report]
>>105892067
your brain is slop
Anonymous No.105892152 [Report]
>>105892136
>Are those two sentences necessary
"is it necessary to express the emotion of the character"
can you even hear yourself retard
Anonymous No.105892169 [Report]
>>105892136
Is the dialogue even necessary? The picture is bloat, too. Did they expect the player to forget what she looks like and need to constantly remind him?
Anonymous No.105892241 [Report]
>>105890693
The largest you can use while also using 16k context.
Nemo-based models only have 16k effective context, which is sad because there is nothing better than Rocinante right now for pretty much anyone who doesn't have a weird rig custom-built for AI/LLM use.
Anonymous No.105892289 [Report]
K2 dropping TRVKEs, BTFOing the left and the right alike
Anonymous No.105892299 [Report] >>105892316
>finally try k2 on openrouter to see what the hype is all about
>it refuses to continue my erp seshs
>but it does lewd in the first 2 or 3 responses just fine, only starts to spazz out on the 4th or later replies
I smell distillation
Anonymous No.105892302 [Report] >>105892349
>>105889564
That's because kobold is the best backend. It just werks.
Anonymous No.105892316 [Report]
>>105892299
did they distill from a 10t model or some shit, is it agi
Anonymous No.105892349 [Report]
>>105892302
Calm down Honky!!
Anonymous No.105892496 [Report] >>105892522
>>105889564
You are wrong. Works on my machine.
Anonymous No.105892517 [Report] >>105892531 >>105892695
/lmg/ got told
Anonymous No.105892522 [Report] >>105892536 >>105892618
>>105892496
Shut the fuck up you subversive nigger.
Only koboldcpp API supports the banned STRINGS option in Silly Tavern.
Other APIs only support banned TOKENS.
exllama3 <<<< NOTICE THE 3 does not fucking support banned strings.
Fucking retard.
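Why the distinction matters: a banned token only blocks one specific token ID, but the same string can be produced by several different token sequences, so only a string-level check (on the decoded text) catches all of them. A minimal sketch with a made-up toy vocabulary, not any backend's actual API:

```python
# Toy illustration of banned TOKENS vs banned STRINGS.
# Token bans zero out specific token IDs at sampling time; string bans
# must check the decoded text, because one string can tokenize many ways.

def violates_string_ban(decoded_text: str, banned_strings: list) -> bool:
    """String-level check: operates on decoded output, not token IDs."""
    return any(s in decoded_text for s in banned_strings)

def violates_token_ban(token_ids: list, banned_ids: set) -> bool:
    """Token-level check: only catches the exact banned IDs."""
    return any(t in banned_ids for t in token_ids)

# Hypothetical vocab where " whisper" can be spelled two ways.
vocab = {1: "barely", 2: " above", 3: " a", 4: " whisper", 5: " wh", 6: "isper"}

def decode(ids):
    return "".join(vocab[i] for i in ids)

seq_a = [1, 2, 3, 4]     # uses the banned token directly
seq_b = [1, 2, 3, 5, 6]  # same string, different token IDs

# Banning token ID 4 catches seq_a but not seq_b...
assert violates_token_ban(seq_a, {4})
assert not violates_token_ban(seq_b, {4})
# ...while a string ban catches both spellings.
assert violates_string_ban(decode(seq_a), [" whisper"])
assert violates_string_ban(decode(seq_b), [" whisper"])
```

This is also why string bans need backtracking in a real sampler: the violation is only visible after the offending tokens were already emitted.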
Anonymous No.105892531 [Report] >>105892542 >>105892573 >>105892576 >>105892580
>>105892517
I haven't seen any network activity from silly tavern whatsoever.
Anonymous No.105892536 [Report]
>>105892522
Issue of Skill perhaps?
Anonymous No.105892542 [Report]
>>105892531
>he doesn't know
Anonymous No.105892573 [Report]
>>105892531
You should though, since that's how it works.
Ignoring the telemetry shizo rambling of course.
Anonymous No.105892576 [Report]
>>105892531
only a chink fork, and kimi is chink so...
Anonymous No.105892580 [Report]
>>105892531
i know you masturbate to tiles
Anonymous No.105892618 [Report] >>105892668
>>105892522
Anonymous No.105892668 [Report] >>105892725
>>105892618
>search bar in 2025
Anonymous No.105892695 [Report]
>>105892517
5, 8 and 10
lmgay blown the fuck out
Anonymous No.105892725 [Report]
>>105892668
you're damn right
Anonymous No.105892753 [Report] >>105892918 >>105892977 >>105892991
why is Teortaxes fighting with chinks now, honeymoon over? Or did the argies finally deport him?
Anonymous No.105892898 [Report] >>105892903
When is lmg-anon getting out of jail to work on Mikupad and the VN leaderboard?
Anonymous No.105892903 [Report]
>>105892898
Just paste the whole mikupad source file in grok and ask for update. It's how xAI do it
Anonymous No.105892918 [Report] >>105892991
>>105892753
I don't see it, so either I'm blind or it's not there, but he's always ironically shitposting, so maybe you missed some irony.
Anonymous No.105892930 [Report] >>105892950 >>105893006 >>105893037
Kimi-K2 has the best score yet in creative writing

@grok is this true?
Anonymous No.105892950 [Report]
>>105892930
Can someone get this to some finetoonors or cpumaxxers: https://desuarchive.org/g/thread/105872817/#105877755
Anonymous No.105892977 [Report] >>105892991
>>105892753
Teortaxes acts like a bratty cute femboy, but he does have values, which sometimes collide with the crazed patriotic Chinese who believe China can do no wrong
Anonymous No.105892991 [Report] >>105893037 >>105893159
>>105892753
>>105892918
>>105892977
What the fuck are you people talking about.
Anonymous No.105893006 [Report] >>105893037
>>105892930
>https://eqbench.com/results/eqbench3_reports/moonshotai__kimi-k2-instruct.html
Interesting approach to say the least.
Anonymous No.105893037 [Report] >>105893069
>>105892991
It's some twitter anon that shitposts about machine learning; he does go to /lmg/ and /aicg/ sometimes.
>>105893006
>>105892930
Judged by sonnet 4 though, I think it might be true if it was a reasoning tune and they hadn't safety slopped it (refusals). A finetune might be able to bring forth its latent abilities though.
Anonymous No.105893069 [Report] >>105893115
>>105893037
Weird, I don't remember ever hearing about such an "anon" until this week where I think I remember it being mentioned thrice including today.
Anonymous No.105893115 [Report] >>105893149 >>105893151
>>105893069
https://desuarchive.org/g/search/text/teortaxes/
Anonymous No.105893149 [Report]
>>105893115
Not many posts that are also in /lmg/. But I see why I may have missed those few posts anyway. I usually ignore posts that look like shitposts or /pol/shit so of course I didn't read about him.
Anonymous No.105893151 [Report]
>>105893115
>12 posts of aicg locusts naming him unprompted before today
just go back
Anonymous No.105893159 [Report]
>>105892991
llm posts, ignore them.
Anonymous No.105893180 [Report] >>105893190 >>105893207 >>105893238 >>105893268 >>105893367 >>105893399 >>105893804 >>105893873
Well, was he right?
Anonymous No.105893190 [Report] >>105893201
>>105893180
Learn to think for yourself, you cretin.
Anonymous No.105893201 [Report]
>>105893190
Theo won, seethe.
Anonymous No.105893207 [Report] >>105893228 >>105893279 >>105894556
>>105893180
Watching the Beff Jezos/Guillaume Verdon and Maxwell Ramstead interview on Machine Learning Street Talk's Patreon page.

AI across the board is handicapped by deterministic compute right now. That's going to change once Extropic's hardware gets released and probabilistic and energy-based models and always-online models become widespread. There's a huge ceiling to be explored that's waiting for always-online models to become a possibility.
Anonymous No.105893228 [Report] >>105893252 >>105893255 >>105893283 >>105893291
>>105893207 (me)
It's going to be hilarious once new hardware paradigms pop up and hundreds of thousands of cheap, used H100s and A100s hit the market as the datacenters start swapping out their compute.
Anonymous No.105893238 [Report] >>105893459
>>105893180
>webdev shitter talking about machine learning
yikes
Anonymous No.105893252 [Report] >>105893502
>>105893228
Anonymous No.105893255 [Report]
>>105893228
Those are already obsolete now that blackwell 6000s exist.
Anonymous No.105893268 [Report] >>105893663 >>105893717
>>105893180
He is very wrong. There are 2 separate effects going on right now that give the illusion of stagnation.

The first is that newer models didn't have an order of magnitude more compute thrown at them. It's largely been the same amount of compute with new techniques applied and a new reinforcement learning pipeline to try to give LLMs extra capabilities without expending more compute. Once the new datacenters currently under construction come online around ~2027, we will see a true next-gen step change in models.

The second effect is that benchmarks are getting saturated and it gets harder and harder to evaluate true model intelligence.

A good way to see this is when you talk to an illiterate retard who used GPT-3.5 when ChatGPT just launched and still thinks all new models are the same. It's because this person didn't have the intelligence necessary to properly distinguish between models. The smarter models become, the less people will notice a difference between successive models. This is a real effect and should be given a proper name to identify it with.

For example /aicg/ constantly claims that Claude Opus 4 is "the same" as Sonnet 3.5. Which is just blatantly wrong. It's just that they don't use the newly unlocked intelligence on their personal usecase and so it looks the same to them.

The dude in your picture doesn't take any of this into account.
Anonymous No.105893279 [Report] >>105893324 >>105893393
>>105893207
Huge doubt about this.
There are analog hardware designs for doing fast multiplications and additions; maybe you get 2 orders of magnitude less power cost. Which is fine, if his stuff performs as well, okay, but you know that's not the limiting factor, right?
The limiting factor is DRAM, memory. And that requires area on a wafer; you can't avoid this, and getting below $3-5k/device for sub-1TB of RAM might be very hard. Nowadays everyone's margins are far too large for us to have reached anywhere near the limit of what the current hardware paradigm has to offer. Might still be expensive for local though, when models only start getting really good at hundreds of billions of params.
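The memory-bound argument in numbers: during decode, each generated token has to stream roughly all active weights from memory once, so tokens/s is bounded by bandwidth divided by the bytes of active weights. A back-of-envelope sketch; the bandwidth and quant figures below are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope decode speed for memory-bound inference:
#   tokens/s ≈ memory_bandwidth / bytes_of_active_weights
# (ignores KV cache traffic and compute; all figures are illustrative).

def tokens_per_second(active_params_b: float, bytes_per_param: float,
                      bandwidth_gb_s: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# 32B active params (K2-style MoE) at ~4-bit (0.5 bytes/param):
hbm = tokens_per_second(32, 0.5, 800)   # ~HBM-class bandwidth -> 50 t/s
ddr = tokens_per_second(32, 0.5, 80)    # ~dual-channel DDR5   -> 5 t/s
# A hypothetical DENSE 1T model at the same quant and bandwidth:
dense_1t = tokens_per_second(1000, 0.5, 800)  # only ~1.6 t/s
```

This is why cutting power per multiply doesn't help much on its own: the wall is how fast you can move the weights, not how cheaply you multiply them.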
Anonymous No.105893283 [Report] >>105893293 >>105893297
>>105893228
>used H100s and A100s hit the market
Does he know??
Anonymous No.105893291 [Report]
>>105893228
Nvidia forced buyback agreements starting with the A100
Anonymous No.105893293 [Report]
>>105893283
He does not
Anonymous No.105893297 [Report] >>105893388
>>105893283
NTA, but this will be a thing. Nvidia tried to prevent it by buying back and destroying old GPUs (absolute pieces of shit), but thankfully china smuggled so many GPUs that at some point they will have to sell them too!
Anonymous No.105893324 [Report] >>105893376
>>105893279
Matmul / matadd are artifacts of the deterministic compute paradigm, meant to approximate probabilistic transformations on discrete data. Analog compute allows for probabilistic transformations in continuous rather than discrete space - i.e., not storing the data in bits, but literally as an approximate voltage - which is completely different.
Anonymous No.105893367 [Report]
>>105893180
>GPT-2
>GPT-3
>GPT-3.5
>GPT-4
>GPT-4T
>GPT
Well OpenAI isn't improving that's for sure.
Anonymous No.105893376 [Report] >>105893464
>>105893324
You still get muls and adds, now you have more noise. You could even do analog DRAM, but you'd need 2 capacitors to store the exponent and mantissa anyway, so at best this is comparable to a 2-bit quant.
You won't fucking get a smart model at small enough param counts anyway, and you won't avoid the need for 6000 wires between memory and the chip; or if you put the memory on the chip, you still require all that surface area. You can reduce power use by about 2 orders of magnitude with analog, but that's about it; it doesn't solve the main issue of memory and bandwidth that our current AI paradigm needs. If Beff claims you get some new magic without the need for large param counts, that's on him to prove; it goes against all empirical evidence today. Not saying very low power AI is a bad idea, just that it's not the main limiting factor for local.
Anonymous No.105893388 [Report]
>>105893297
>Nvidia tried to prevent it by buying back and destroying old GPUs
Why are corpos jews this evil?
Anonymous No.105893393 [Report] >>105893477
>>105893279
Inference can work from flash just fine. It would need a huge number of open pages, but that just means it needs better cooling, not more mm2.
Anonymous No.105893395 [Report]
"MechaHitler" has got nothing on "Kike-remover 2"
Anonymous No.105893399 [Report] >>105893440
>>105893180
Sex improvement hasn't even started. And never will.
Anonymous No.105893440 [Report] >>105893613
>>105893399
Nah bro, we don't even have something like a local true Voice mode LLM like the one from Sesame ( https://app.sesame.com/ )

Imagine this but local, with voice cloning, that would be gooner heaven for me, I would fully disappear from society
Anonymous No.105893459 [Report]
>>105893238
Web devs in the US always have this god complex even though their jobs are basically blue collar in terms of skills required.
Anonymous No.105893464 [Report] >>105893516
>>105893376
Why would you need a standard float representation with exponent and mantissa on a p-bit? Like I said, the representation is continuous rather than discrete, so you don't need the overhead you'd need on a discrete representation paradigm.
Anonymous No.105893477 [Report]
>>105893393
How is SSDmaxing going for Kimi and DeepSeek? Can you do more than 1 t/s?
Anonymous No.105893502 [Report] >>105893519 >>105893525
>>105893252
What is this?
Anonymous No.105893516 [Report]
>>105893464
You have to deal with noise with analog. Analog multiplication and addition can be done in as little as 8 transistors, and Beff didn't invent that, but it's a bit like fixed-point multiplication with more noise. If you want proper large dynamic range and not just fixed-point (with an offset) multiplication, you need at least 2 transistors to store the exponent and mantissa. You also need to go through an ADC/DAC to reach your computer.
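The noise point can be simulated in a few lines: an analog multiply-accumulate behaves like the exact dot product plus a per-operation error, so its effective precision resembles a low-bit quantized digital MAC. The 2% relative noise level below is a made-up illustrative figure, not a measurement of any real device:

```python
import random

# Toy model of an analog dot product: each multiply picks up
# multiplicative Gaussian error, so the result is close to but never
# exactly the digital answer. Noise magnitude (2%) is an assumption.
random.seed(0)  # deterministic for reproducibility

def analog_dot(xs, ws, noise=0.02):
    return sum(x * w * random.gauss(1.0, noise) for x, w in zip(xs, ws))

xs = [random.uniform(-1, 1) for _ in range(256)]
ws = [random.uniform(-1, 1) for _ in range(256)]

exact = sum(x * w for x, w in zip(xs, ws))
noisy = analog_dot(xs, ws)
err = abs(noisy - exact)  # small but nonzero, like quantization error
```

The aggregate error grows with the number of accumulated terms, which is why dynamic range (exponent/mantissa) matters once the sums get long.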
Anonymous No.105893519 [Report]
>>105893502
a daughterboard
Anonymous No.105893525 [Report]
>>105893502
second generation meta mtia ai accelerator
https://x.com/brutuscat2/status/1907885065738023297
Anonymous No.105893613 [Report] >>105893626 >>105893653
>>105893440
Is she planning to eat the hard drive?
Anonymous No.105893626 [Report]
>>105893613
bits are yummy
Anonymous No.105893653 [Report]
>>105893613
she'll take a big byte
Anonymous No.105893663 [Report]
>>105893268
I disagree because of two things
Data - it's still a hard wall for training new models and the thing that won't scale nearly as quickly as corpos need it. As slop pollutes the internet, quality data goes down and it gets harder to prevent several generations of inbred data from getting thrown into the training mix
The other issue is the reward function - LLMs are still imitation machines, and once they become sufficiently good at imitating, the gradient saturates and it's hard to eke out the last few percent of performance where all of the really hard shit lies. That last few percent is also poorly characterized - you can think in terms of hard math problems, but things like evaluating a universe of discourse for logical consistency, ensuring plot coherence, and tying plot threads from several chapters ago into the current narrative are important but deceptively hard to quantify and optimize for. I think gains are gonna taper off until we come up with a more satisfactory way to solve those things - I don't think throwing more data or compute at the problem is going to magically solve this, given how greedy LLM training and current RL methods are
Anonymous No.105893717 [Report] >>105893755
>>105893268
I disagree, models are still retarded. People who think otherwise have spent too long with these models, to the point where they've trained themselves to not even consider doing the sort of things that models are bad at. Sort of like when you use diffusion for too long and start thinking in booru tags.
If you try to make an RP even slightly ambitious, even the most advanced models fall to pieces in minutes. Benchmarks only look good because they're the optimal case: small context, small output, and a clearly defined target. When you leave a model to work on its own, it can't even beat a game for small children.
Anonymous No.105893741 [Report]
New TTS model from the IndexTTS folks, v2, incoming.
https://arxiv.org/abs/2506.21619
Website is up with examples but no open weights yet. The weights should become available, given that their prior TTS models, which weren't that great, were Apache 2.
https://index-tts.github.io/index-tts2.github.io/
It can only do English and Chinese though, so it won't be that useful. Nevertheless, it does beat the TTS models it chose to pitch itself against, like F5 and MaskGCT, but not Zonos or GPT-SoVITS.
Anonymous No.105893755 [Report]
>>105893717
>If you try to make an RP even slightly ambitious
I have yet to see any model do a good job at RPing a manipulative yandere.
Anonymous No.105893804 [Report]
>>105893180

Remove attention whores
Anonymous No.105893873 [Report] >>105893890 >>105893900 >>105895108
>>105893180
>AI
LLMs. LLMs are not going to keep improving.
Anonymous No.105893890 [Report]
>>105893873
based Yann LeCope
Anonymous No.105893900 [Report]
>>105893873
It's no wonder he has so little faith in LLMs if he's using Llama kek
Anonymous No.105893905 [Report] >>105893924
I didn't realize most of the thread was about a Youtube video with a stupid thumbnail.
Anonymous No.105893924 [Report] >>105893956
>>105893905
local models are just dead
Anonymous No.105893933 [Report]
>>105889193
Nice Miku
Anonymous No.105893950 [Report] >>105893981
>gemma 3 is probably the peak intelligence-wise of what I can run on my 24gb card
>it still acts retarded on more complex interactions that involve multiple things happening
I really wish models would get finetuned on spatial awareness.
Anonymous No.105893956 [Report] >>105894075
>>105893924
Local LLMs are not dead; they’ve just moved out of the “download a 4 GB file and run it on your GTX 1060” era.
Anonymous No.105893974 [Report] >>105894330
>>105887966
Isn't thinking beneficial even for RP?
Anonymous No.105893981 [Report] >>105893992 >>105893996
>>105893950
I think part of the problem is trying to one shot complex things. We could probably take these models a lot further with a multi prompt approach, which is what these thinking models seem to be trying to accomplish in a way.
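The multi-prompt idea can be sketched as a tiny plan/execute/combine pipeline, one model call per stage instead of one-shotting the whole task. `llm` below is a stand-in stub so the sketch runs; in practice it would wrap whatever completion endpoint your backend exposes:

```python
# Minimal multi-prompt pipeline sketch: plan -> do each step -> combine.
# `llm` is a placeholder stub, not a real backend call.

def llm(prompt: str) -> str:
    # stub so the sketch is runnable; replace with a real completion call
    return f"<answer to: {prompt[:40]}>"

def solve(task: str) -> str:
    # 1) ask the model to decompose the task
    plan = llm(f"Break this task into numbered steps:\n{task}")
    steps = [line for line in plan.splitlines() if line.strip()]
    # 2) one focused call per step, with the plan as shared context
    results = [llm(f"Task: {task}\nPlan: {plan}\nDo step: {s}") for s in steps]
    # 3) a final call to merge the partial results
    return llm("Combine these partial results into one answer:\n"
               + "\n".join(results))

answer = solve("Write a scene where three characters move through a house")
```

Each stage gets a small, clearly defined target, which is exactly the regime where current models behave best.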
Anonymous No.105893992 [Report]
>>105893981
Just use your MCP agentic model?
Anonymous No.105893996 [Report] >>105896189
>>105893981
Agent swarms are the next logical step.
Anonymous No.105894051 [Report] >>105894101 >>105894260 >>105894353 >>105894432
/lmg/ is in for something crazy in the next couple of weeks when even some literally who company can make something like Kimi by copying the Deepseek formula.
Anonymous No.105894075 [Report] >>105894116
>>105893956
Local models aren't dead, but I think they're in hibernation until hardware requirements become less Jewish for your average random coomer, whether that means models performing at the same level at a smaller parameter count, more efficient techniques to offload to things like CPU and SSD, or NVIDIA not continuing to grab consumers by the balls for their 32 GB VRAM GPUs
Even without it though, open models are vastly cheaper and have more options than the proprietary ones. Even for API fags, nobody in their right fucking mind would use something like the GPT-4.1 series when DeepSeek and Kimi exist
I think the ultimate endgame of the whole thing is gonna be people running models themselves, unless AI corpos start unironically pushing for a surveillance state
Anonymous No.105894101 [Report]
>>105894051
im so excited for all the synthslopped qwen tier "agentic" coding models
Anonymous No.105894116 [Report] >>105894133 >>105894213
>>105894075
I think the hardware requirements are just gonna rise for the shiny new models.
Locals who can't afford an AI machine will have to make do with older models.
Anonymous No.105894133 [Report] >>105894147
>>105894116
That's already what's happening, poors stuck on Nemo, while slightly less poors run Deepseek, and then Kimi
Anonymous No.105894147 [Report] >>105894155 >>105894159
>>105894133
If you can run R1, you can run Kimi at a slightly lower quant. They're in the same category.
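The "same category" claim is just weight-memory arithmetic: GB ≈ params (billions) × bits per weight / 8, ignoring KV cache and runtime overhead. A sketch with illustrative quant choices:

```python
# Rough weight-memory arithmetic behind "run Kimi at a slightly lower
# quant": GB ≈ params_in_billions * bits_per_weight / 8.
# Ignores KV cache and overhead; quant levels are illustrative.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

r1_q4 = weight_gb(671, 4.5)    # DeepSeek-R1 (671B) at ~4.5 bpw -> ~377 GB
k2_q3 = weight_gb(1000, 3.0)   # Kimi K2 (1T) at ~3 bpw         -> ~375 GB
```

So dropping one quant tier on the bigger model lands in almost exactly the same memory footprint.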
Anonymous No.105894150 [Report]
>>105889526
Yeah, like llama-4 would have been, if it didn't suck. GLM-4 had a decent 32b, so I have more hope in them.
Anonymous No.105894155 [Report]
>>105894147
What if I can run R1 at Q1?
Anonymous No.105894159 [Report] >>105894178
>>105894147
Yeah and if you can run Kimi, you can run Llama 4 Behemoth 2T at a slightly lower quant.
Anonymous No.105894178 [Report]
>>105894159
Correct. RAMmaxxers win again.
Anonymous No.105894213 [Report] >>105894276
>>105894116
>ai machine
there is no such thing available to the consumer, only cope machines
Anonymous No.105894256 [Report]
Huawei is fucking embarrassing now that kimi k2 is out there. First meta and then huawei. Is incompetency a requirement for big corp employees?
Anonymous No.105894260 [Report] >>105894270 >>105894289
>>105894051
all im asking for is a low knowledge, high logic local model that can be taught context using RAG
Anonymous No.105894270 [Report] >>105894273
>>105894260
You have Phi and Qwen already.
Anonymous No.105894273 [Report] >>105894280 >>105894284
>>105894270
both are shite
Anonymous No.105894276 [Report] >>105894295
>>105894213
Not true. That's why laptops come with AI Ready stickers and AI Copilot buttons.
Anonymous No.105894280 [Report] >>105894287
>>105894273
I wonder why.
Anonymous No.105894284 [Report] >>105894331
>>105894273
Your attitude is shite. Those two are exactly what you asked for. Maybe you don't really know what you want?
Anonymous No.105894287 [Report]
>>105894280
small context window, benchmaxxing
Anonymous No.105894289 [Report] >>105894309 >>105894326
>>105894260
You can't have a high logic model if it doesn't know what words mean.
I.e. a smart model will necessarily know what a mesugaki is and random facts about gachasluts.
Anonymous No.105894295 [Report]
>>105894276
shut up.
Anonymous No.105894309 [Report]
>>105894289
Triviacuck cope
Anonymous No.105894326 [Report] >>105894354
>>105894289
>I.e. a smart model will necessarily know what a mesugaki is and random facts about gachasluts.
So if you asked LeCun what a mesugaki is and he asked you who the fuck you are and what the fuck you're talking about, you'd say LeCun isn't smart? Trivia has nothing to do with smarts.
Anonymous No.105894330 [Report]
>>105893974
If you can keep the thinking part short and to the point with clear rules, it helps RP. Otherwise, it just seems to be making responses more varied in terms of wording.
Anonymous No.105894331 [Report]
>>105894284
hunyuan is a step in the right direction with a 256k context window but it's a broken mess. glm4 100b moe might save local.
Anonymous No.105894353 [Report] >>105894539
>>105894051
Not as cheap as you think.
Its costs were roughly what the 70B Llama took to train. You still need a few thousand GPUs, and it doesn't get cheaper to run locally. It's just "what do we do when we're compute-limited, given the GPUs we have"; MoEs have better scaling laws than dense for a given chosen size.
Anonymous No.105894354 [Report] >>105894515
>>105894326
The point is that he'd do a horrible job at roleplaying as a mesugaki even if he was given a one sentence definition.
Anonymous No.105894405 [Report] >>105894410 >>105894470
why are there so many nvidia shills itt? /lmg/ has always been about the best ~32b class models or more recently ~100b class moes we can get. yeah there's some fags who have way more money than sense and some jeets trying to run on their phones but core /lmg/ is what I described
Anonymous No.105894410 [Report]
>>105894405
Truth.
Anonymous No.105894432 [Report]
>>105894051
Mistral Large 3 in two more weeks. 1T+ parameters aren't taboo anymore.
Anonymous No.105894470 [Report]
>>105894405
>why are there so many people with more money than me
Anonymous No.105894507 [Report] >>105894525 >>105894538 >>105894550 >>105894581
fact is dense models are OVER. nobody will want to run 1T dense at 0.5 t/s. moe unironically killed local.
Anonymous No.105894515 [Report] >>105894541
>>105894354
Your point is wrong. Roleplaying is a skill, not something inherent to smartness. You can have the smartest person in the world or a super-intelligent model with perfect spatial awareness, and they could still suck at roleplay.
Anonymous No.105894525 [Report]
>>105894507
And that's a good thing.
Anonymous No.105894538 [Report]
>>105894507
Dense models will make a comeback when we have the hardware to run them sufficiently in ~5 or so years when we have LLM specific inference hardware (server grade, not for consumers)
Anonymous No.105894539 [Report]
>>105894353
That's peanuts for anyone who considers themselves a mid-sized player in the llm field. If any remotely established player puts out something that's worse than Deepseek was six months ago, they should just fuck off in shame.
Anonymous No.105894541 [Report] >>105894548 >>105894561
>>105894515
So you agree with me that the model needs to see many examples of mesugaki behaviour to properly roleplay as one and that it can't simply be RAGed?
Anonymous No.105894548 [Report] >>105894577
>>105894541
No, you want skills, not smarts.
Anonymous No.105894550 [Report] >>105894560
>>105894507
All I ask for is some balance. Just because it's MoE doesn't mean it has to be 7T total 1B active just because it's cheap to train.
Anonymous No.105894556 [Report] >>105895632
>>105893207
Beff is a literal retard though
Anonymous No.105894560 [Report]
>>105894550
This. MoEs can work if they have a decent amount of active parameters, but with too few active parameters, they suck.
Anonymous No.105894561 [Report] >>105894577
>>105894541
Trivia, skills, and intelligence are all separate properties.
Anonymous No.105894577 [Report] >>105894593 >>105894648
>>105894548
>>105894561
So you want a "smart" model that is still unable to do anything because you can't RAG skills?
Anonymous No.105894581 [Report]
>>105894507
>nobody will want to run
The issue isn't what people want to run, it's what people want to train.
Anonymous No.105894585 [Report]
It's all so tiresome.
Anonymous No.105894593 [Report]
>>105894577
This is why no one wants dense models
Anonymous No.105894648 [Report] >>105894800
>>105894577
Most models acquire their smarts from the math and coding skills they are trained on. Which, for me, is fine. I get a smart model and can RAG in some documentation.
Point is, even if a model is smart and trained on roleplay, that doesn't mean it will know what a mesugaki is, and even if a model knows what that is, that doesn't mean it will be able to roleplay as one. But that isn't the focus of those training the models. You want trivia and skills that only end up in models by accident.
Anonymous No.105894716 [Report] >>105894743
I don't think Kimi K2 is that smart or good at writing.
Anonymous No.105894743 [Report]
>>105894716
Just release the new Deepseek already, no need for this
Anonymous No.105894800 [Report] >>105894812 >>105894814
>>105894648
It's called GENERAL intelligence for a reason. If it can't MSGK then this is just egregious false advertising.
Anonymous No.105894812 [Report]
>>105894800
It's called a LLM actually.
Anonymous No.105894814 [Report]
>>105894800
GENERAL intelligence != EVERYTHING intelligence
Anonymous No.105894815 [Report] >>105894851 >>105894881 >>105894898 >>105894913
Anonymous No.105894851 [Report]
>>105894815
Asus saved local
Anonymous No.105894881 [Report]
>>105894815
>Anus Republic of Gays
shame
Anonymous No.105894898 [Report]
>>105894815
>Update: According to ASUS, the RTX 5080 ROG Astral Hatsune Miku Edition will cost 16999 RMB. That’s twice the NVIDIA MSRP for RTX 5080 for the Chinese market.
lol
Anonymous No.105894913 [Report] >>105894975
>>105894815
True Miku fans have a watercooled GPU with yellow dye.
Anonymous No.105894975 [Report]
>>105894913
There are people who would buy branded Peeku premix, I know it
Anonymous No.105895108 [Report] >>105895137 >>105895252
>>105893873
LeCun is more invested in making his video JEPA model good than in something like a LANG-JEPA, which is what would in theory supplant LLMs.
Anonymous No.105895137 [Report]
>>105895108
Language is too toxic
Anonymous No.105895184 [Report] >>105895211 >>105895319 >>105896294
localtards this is your mindset
Anonymous No.105895211 [Report]
>>105895184
Me on the left
Anonymous No.105895252 [Report]
>>105895108
There is no point in a language JEPA. JEPA needs to truly understand and learn how the world works. The text capabilities will emerge naturally once it has reached a certain stage.
Anonymous No.105895319 [Report] >>105895402
>>105895184
Local won though?
Top corpo models are: gemini, grok, and o3.
Gemini is fine but censored for some things.
Grok writing is shit.
o3 is both expensive and censored.
R1 and DS3 will just do whatever? They work.
Smaller models work for some things fine too?
K2 basically matches Opus but is censored (needs a finetune); it can be jailbroken too, and the dataset doesn't look censored from the looks of it.
So not sure what your point is other than to bait. Or are you salty about needing to spend $5k to cpumaxx?
Have fun getting even more safetycucked on corpo models anyway; local weights can't be taken away once they're open.
Anonymous No.105895401 [Report] >>105895453 >>105895462 >>105895473 >>105895490 >>105895869 >>105896190
Early K2 Q2 quants. Uploader is a llama.cpp contributor
https://huggingface.co/gabriellarson/Kimi-K2-Instruct-GGUF/tree/main
Anonymous No.105895402 [Report] >>105895443
>>105895319
>corpos suck at sex too
How is that a win?
Anonymous No.105895443 [Report]
>>105895402
I'm saying local is better at ERP at this point. I forgot to include Opus/Sonnet, but 4 is a regression for this compared to 3, so corpo models keep getting worse for us, while local ones can't get worse because you can't lose the weights once they're open. I'd say R1, at least for me, beats most corpo models. K2 is usable, but jailbreaking makes it tedious enough that I prefer R1 for now; that might change if someone tunes just the experts responsible for refusals, though.
Anonymous No.105895453 [Report] >>105895486
>>105895401
This branch needs to be pulled in order to run K2: vocab changes, plus constant updates because it has so many experts.
https://github.com/ggml-org/llama.cpp/pull/14654
Anonymous No.105895462 [Report] >>105895486
>>105895401
>https://huggingface.co/gabriellarson/Kimi-K2-Instruct-GGUF/tree/main

>375 GB
I'm in

llama.cpp support when
Anonymous No.105895473 [Report] >>105895488 >>105895496
>>105895401
>tfw only 96GB ram
ACK
Anonymous No.105895486 [Report]
>>105895453
>>105895462
Anonymous No.105895488 [Report] >>105895532
>>105895473
RAM is cheap. What is your excuse?
Anonymous No.105895490 [Report]
>>105895401
>tfw 352gb combined memorylet
It's over for me
Anonymous No.105895496 [Report] >>105895500
>>105895473
yeah we need some 150b-200b moes.
Anonymous No.105895500 [Report] >>105895516 >>105895522 >>105895564
>>105895496
Qwen 235B?
Anonymous No.105895516 [Report]
>>105895500
It runs alright, but it's not my favorite desu. I'm kinda hoping for something better to come along.
Anonymous No.105895522 [Report]
>>105895500
good 200b moes
Anonymous No.105895532 [Report] >>105895593
>>105895488
I'm not spending more than slightly above average on a PC just for non-AGI lol. Also my mobo is already struggling with stability, so more RAM just isn't happening unless I switch boards and pay even more.
Anonymous No.105895564 [Report]
>>105895500
Might as well recommend him dots while you're at it.
Anonymous No.105895593 [Report] >>105895796
>>105895532
You need a server board anyway for good memory bandwidth, plus a few 3090s for the active params. As usual, a dedicated machine is needed for this, but it can be had for around $5,000 depending on specs, unless you're fine with low t/s.
Anonymous No.105895632 [Report] >>105895756
>>105894556
A literal retard would be one that misuses the word 'literal' to describe someone who helped design Google's quantum tensor library.
Anonymous No.105895710 [Report]
>>105890087
>Named Luna
Anonymous No.105895756 [Report]
>>105895632
NTA, but while I wouldn't call him a retard, I think it's safe to say he has a lot of obstacles before his stuff is viable for anything. Personally, I don't think he'll succeed (because he's not targeting what I believe is the main issue today), but I wish him luck. There were some other analog AI startups that didn't go far: they achieved low power use, but the param counts were minuscule, and I don't see him solving that problem with his approach. Might be interesting research though.
Anonymous No.105895796 [Report]
>>105895593
Of course. I could also afford to just get a mac studio and sell it off in the future. I won't do that for a current level AI.
Anonymous No.105895869 [Report]
>>105895401

downloading prior to merge
Anonymous No.105896189 [Report]
>>105893996
Simpler than that, really. You know how before thinking we had the whole CoT thing, and how with thinking the models are supposedly trained to break problems down and address each part, etc.?
As a simple minimal example, you could take a non-thinking model and prompt it to look at the chat history and break down the context of the scene, then prompt it again asking how {{char}} would act/react, then prompt it again for the final reply as {{char}}.
A simple multi-prompt approach where it iterates over its own previous output before providing a reply; the manual version of a thinking model, in a way.
I have a suspicion that might actually perform better than training the models to have a whole ass thinking block, but that's just a gut feeling.
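The multi-prompt loop above can be sketched in a few lines. This is a hypothetical illustration, not anyone's actual frontend: `generate` is a stand-in for whatever backend you run (llama.cpp server, koboldcpp, etc.) and is stubbed here so the flow is runnable as-is.

```python
# Manual "thinking" via three chained prompts, as described above.

def generate(prompt: str) -> str:
    # Stub. Replace with a real completion call to your backend,
    # e.g. an HTTP request to a local llama.cpp server.
    return f"<model output for: {prompt[:40]}...>"

def manual_think(history: str, char: str) -> str:
    # Pass 1: have the model break down the current scene from the history.
    scene = generate(
        f"Chat history:\n{history}\n\nBreak down the context of the current scene."
    )
    # Pass 2: ask how the character would act/react, given that breakdown.
    plan = generate(
        f"Scene breakdown:\n{scene}\n\nHow would {char} act or react here?"
    )
    # Pass 3: final in-character reply, conditioned on both prior passes.
    reply = generate(
        f"Scene breakdown:\n{scene}\nPlanned reaction:\n{plan}\n\n"
        f"Write {char}'s next reply, in character."
    )
    return reply
```

Each pass sees the previous pass's output in its prompt, so the model effectively iterates over its own reasoning before the final reply, at the cost of three generations per turn.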
Anonymous No.105896190 [Report]
>>105895401
>barely fits in my memory and will probably run horribly
See you guys in two hours when this downloads.
Also hurry up, Daniel.
Anonymous No.105896217 [Report] >>105896277
>TTS
Just give me cunny voice ootb
Anonymous No.105896277 [Report]
>>105896217
skill issue
Anonymous No.105896292 [Report]
>>105896271
>>105896271
>>105896271
Anonymous No.105896294 [Report]
>>105895184
I don't even think of you