/lmg/ - Local Models General
Anonymous
9/12/2025, 7:39:33 PM
No.106566844
[Report]
►Recent Highlights from the Previous Thread:
>>106559371
--Paper: ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms:
>106561145 >106561161
--Troubleshooting llama-server.exe performance with multi-GPU configurations:
>106563691 >106563763 >106563772 >106563861 >106563838 >106563879 >106563891 >106563919 >106563941 >106563960 >106564017 >106564070 >106564107 >106564154 >106564411 >106564784
--Qwen3-Next model efficiency and performance analysis:
>106560211 >106560245 >106560248 >106560269 >106560274 >106560283 >106560310 >106560291 >106560294 >106560322 >106560314 >106560338 >106560356 >106560302 >106563929
--Optimizing MoE models via selective tensor offloading:
>106559871 >106559925 >106559938 >106559943 >106559962 >106559979 >106559984 >106560000 >106560056
--Role of LLMs in TTS and image generation via autoregression and embeddings:
>106562827 >106562864 >106562981 >106563064
--Qwen3 model's verbose reasoning issues during roleplay testing:
>106561341 >106561358 >106561391
--Public server setup for Qwen3-Next-80B-A3B-Instruct with 65t/s performance:
>106563343
--TTS phonemizer bottlenecks and optimization:
>106562423 >106562450 >106562542 >106562586 >106562603 >106562493 >106563141 >106562482 >106562515 >106562543 >106562579 >106562608 >106562763 >106564046 >106564012 >106564024
--California bill to regulate AI companion chatbots:
>106563074 >106563109 >106563402 >106563820 >106563680 >106564086 >106563394
--OpenAI optimization techniques boost local transformer efficiency:
>106563608
--FTC probes major tech firms' AI chatbots for child safety risks:
>106562092
--Specialized small LLMs vs multi-purpose models:
>106564203 >106564224 >106564273 >106564280 >106564230 >106564323 >106564409 >106564560 >106564600 >106564607
--Miku (free space):
>106559401 >106562108 >106562161 >106562252
►Recent Highlight Posts from the Previous Thread:
>>106559374
Why?:
>>102478518
Enable Links:
https://rentry.org/lmg-recap-script
Ramlets, how are we doing today?
Anonymous
9/12/2025, 7:44:50 PM
No.106566895
[Report]
Mikulove
>>106566876
I compress my ram - gives approx. 2 times more memory.
Anonymous
9/12/2025, 7:46:16 PM
No.106566908
[Report]
>>106566903
I sparsify my ram - gives 2 times more speed.
Anonymous
9/12/2025, 7:46:53 PM
No.106566920
[Report]
>>106566903
I overclock my ram.
It costs twice as much because it breaks.
Anonymous
9/12/2025, 7:47:04 PM
No.106566923
[Report]
>>106566952
>>106566778
If you're using it on API, why are you using Air instead of the big one?
Anonymous
9/12/2025, 7:47:08 PM
No.106566924
[Report]
Anonymous
9/12/2025, 7:48:04 PM
No.106566936
[Report]
Back in the MS-DOS days, I actually had a driver that compressed my RAM. It broke some things though.
https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/
https://services.google.com/fh/files/blogs/vaultgemma_tech_report.pdf
https://huggingface.co/google/vaultgemma-1b
The future of Google LLMs: models that know nothing about rare information. They use huge batch sizes, among other things, to mitigate memorization.
>What does this mean in practice? Informally speaking, because we provide protection at the sequence level, if information relating to any (potentially private) fact or inference occurs in a single sequence, then VaultGemma essentially does not know that fact: the response to any query will be statistically similar to the result from a model that never trained on the sequence in question. However, if many training sequences contain information relevant to a particular fact, then in general VaultGemma will be able to provide that information.
>
> [...] Sequence-level DP provably bounds the influence of any single training sequence (example) on the final model. We prompted the model with a 50-token prefix from a training document to see if it would generate the corresponding 50-token suffix. VaultGemma 1B shows no detectable memorization of its training data and successfully demonstrates the efficacy of DP training.
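If you want to poke at their claim yourself, the 50-token prefix/suffix probe is easy to replicate. A minimal sketch with transformers, assuming the HF checkpoint above loads as a standard causal LM; greedy decoding and the exact-match check are my guesses at their protocol, not necessarily what they did:
[code]
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("google/vaultgemma-1b")
model = AutoModelForCausalLM.from_pretrained(
    "google/vaultgemma-1b", torch_dtype=torch.bfloat16, device_map="auto")

def is_memorized(doc_text: str) -> bool:
    ids = tok(doc_text, return_tensors="pt").input_ids[0]
    if len(ids) < 100:
        return False  # need a 50-token prefix plus a 50-token suffix
    prefix, suffix = ids[:50], ids[50:100]
    out = model.generate(prefix.unsqueeze(0).to(model.device),
                         max_new_tokens=50, do_sample=False)  # greedy decode
    return torch.equal(out[0, 50:100].cpu(), suffix)  # verbatim continuation = memorized
[/code]
Run it over a pile of training-adjacent documents and count the hits; on a non-DP model you would expect at least some exact continuations.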
>>106566923
Maybe he means he tried it, and found it lacking (in addition to the obese one).
>>106566876
As a 24GB vramlet I just went back to gemma3 qat and it's still the goat for ST-style RP. The writing is always fresh and pleasant to read, with outstanding vocabulary. Fucked around with offloading glm4.5-air q3 and other drummer finetunes and they seemed broken and inconsistent in their responses. Google needs to release an 80b moe gemma 4
Anonymous
9/12/2025, 8:01:51 PM
No.106567050
[Report]
>>106567073
Is this legit or over-hyped?
I find it hard to believe that just 32B is enough to match GPT4 and 200B checkpoints.
With just 32B you could run it locally on your PC and have a private GPT4... sounds too good to be true.
Anonymous
9/12/2025, 8:04:34 PM
No.106567073
[Report]
>>106567050
Hard to say, can't see the image any longer.
Is this legit or over-hyped?
I find it hard to believe that just 32B is enough to match GPT4 and 200B checkpoints.
With just 32B you could run it locally on most PCs, with a decent quantization that doesn't sacrifice much, and have a private GPT4 with no quota limitations... sounds too good to be true.
Anonymous
9/12/2025, 8:10:48 PM
No.106567127
[Report]
>>106567118
Sounds like a typical marketing department sales pitch.
Anonymous
9/12/2025, 8:13:57 PM
No.106567150
[Report]
>>106567118
>sounds too good to be true
Congratulations, you have a brain.
Anonymous
9/12/2025, 8:19:22 PM
No.106567193
[Report]
>>106567118
>reasoning
has always been and always will be a meme
Anonymous
9/12/2025, 8:45:57 PM
No.106567440
[Report]
>>106567105
>>106566972
Welcome to /lmg/ thread google stealth marketing engineer technician saars. Please kindly inform us if the next gemma will be as cucked as the last one so I can decide if I should post funny jeet pictures or not.
Anonymous
9/12/2025, 8:47:50 PM
No.106567457
[Report]
>>106566972
> glm4.5-air
It works well for me; it's currently the best model for vramlets, you must be doing something wrong. This model is ultra sensitive to temperature at long context: once you get too deep into the context, it's time to lower the temperature.
>>106566836 (OP)
quoteth the raven.....
Anonymous
9/12/2025, 9:13:02 PM
No.106567662
[Report]
>>106567715
>>106565629
It's been stuck for four hours...
Anonymous
9/12/2025, 9:20:56 PM
No.106567707
[Report]
>>106568235
>>106566944
If this works the way it looks like it works, based on what you quoted and the image, then the model would theoretically be equivalent to stuff like Phi. Maybe a bit better. But ultimately it will have trouble with real world user queries since it would lack OOD skills and knowledge. This technique can only create models for extremely narrow use cases, not general assistant models. So if they do it for all future models, Google would be shooting themselves in the foot and losing all the market share they just spent tons of effort to claw back.
Anonymous
9/12/2025, 9:21:24 PM
No.106567715
[Report]
>>106567662
It's probably for the better.
WSL is a janky half-measure.
If you want to run windows on your main gaming/home office PC but linux for LLM stuff, just get a second system and run linux properly.
>>106566952
GLM-chan is NOT obese!
Anonymous
9/12/2025, 9:32:37 PM
No.106567806
[Report]
>>106568384
>>106567118
Both can be true
30Bs are seeing massive improvements in abilities, but that's in coding, physics, chemical equations, etc. And who gives a fuck about that. It's a glorified wikipedia. Good for grunt work.
For stuff like writing and other more complex tasks, size is still king and may be for a long time. My LLM needs to know every dragon ball z character and understand that yes, piccolo <could> wipe his ass with goku's face if he wanted to. If you want nuance and complexity, simple trivia is not gonna do it for ya.
Anonymous
9/12/2025, 9:33:10 PM
No.106567814
[Report]
>>106567795
(stomp stomp stomp)
Anonymous
9/12/2025, 9:41:05 PM
No.106567898
[Report]
>>106567970
>>106567118
I'll ignore all the rest of the shit in the post. What caught my attention was
>2000 tokens/s on cerebras
>most reasoning models crawl at 200 tok/sec
What the fuck does reasoning have to do with the token generation speed?
And why the fuck are you paying attention to that retard?
Anonymous
9/12/2025, 9:45:32 PM
No.106567937
[Report]
>>106567628
ggufs nevermore
Anonymous
9/12/2025, 9:48:32 PM
No.106567970
[Report]
>>106567898
Assuming that's total throughput over multiple concurrent requests, that guy has skill issues with the other models or cerebras is shit.
Anonymous
9/12/2025, 9:49:12 PM
No.106567977
[Report]
>>106568544
>>106567795
>>106566952
This is just AIR-CHAN, GLM-CHAN wont even fit in the frame.
Anonymous
9/12/2025, 10:15:22 PM
No.106568235
[Report]
>>106568661
>>106567707
They were already bragging about how memorization in Gemma 3 was lower than in Gemma 2, so I think that's the direction things are going.
Anonymous
9/12/2025, 10:24:54 PM
No.106568337
[Report]
>>106568567
>>106566944
>adding noise to the model so it doesn't precisely reproduce input data
>reduces training stability, increases computation costs
>today’s private training methods produce models with utility comparable to that of non-private models from roughly 5 years ago, highlighting the important gap our work will help the community systematically close.
Okay, so adding noise to the model makes it significantly worse (as one would expect). They seem to think that's avoidable, but I don't know how.
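For anyone unfamiliar, the "noise" here is the DP-SGD recipe: clip each example's gradient, then add Gaussian noise before the update, which is exactly where the stability and compute costs come from. A toy sketch; clip_norm and noise_multiplier are made-up values, not whatever Google actually used:
[code]
import torch

def dp_sgd_step(model, loss_fn, examples, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in examples:  # per-example gradients: the compute cost
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        grads = [p.grad for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, (clip_norm / (norm + 1e-6)).item())  # bound each example's influence
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * clip_norm * noise_multiplier  # the "noise"
        p.grad = (s + noise) / len(examples)
    optimizer.step()
[/code]
The clipping caps how much any single sequence can move the weights; the noise is what makes the guarantee provable, and both fight against training stability.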
Anonymous
9/12/2025, 10:25:14 PM
No.106568344
[Report]
>>106566944
>make your model retarded
>brag about it
Anonymous
9/12/2025, 10:27:37 PM
No.106568369
[Report]
>>106567118
I think the "punching above its weight" and "trading blows with GPT4" stuff started in 2024.
Anonymous
9/12/2025, 10:28:02 PM
No.106568374
[Report]
>>106572145
macchads just cant stop winning
Anonymous
9/12/2025, 10:28:38 PM
No.106568384
[Report]
>>106567806
>piccolo <could> wipe his ass with goku's face if he wanted to
That is only at the start of dbz.
Anonymous
9/12/2025, 10:29:58 PM
No.106568398
[Report]
is next better than 235B for sex?
Anonymous
9/12/2025, 10:31:40 PM
No.106568414
[Report]
>>106572145
macchads just cant stop winning/2
Anonymous
9/12/2025, 10:32:38 PM
No.106568426
[Report]
>>106572145
macchads just cant stop winning/3
Anonymous
9/12/2025, 10:47:29 PM
No.106568544
[Report]
>>106567977
Fake news! GLM-chan is a slender young lady.
Anonymous
9/12/2025, 10:49:17 PM
No.106568567
[Report]
>>106568337
Differential Privacy is an area of research. Researchers work on it, promise "it'll be good soon", and push out papers. Everyone not working in that field ignores them, unless they need to bring them up for compliance, like "we're working on DP, don't worry".
Anonymous
9/12/2025, 10:53:37 PM
No.106568601
[Report]
Macfags stopped bragging about glm. It's the one thing they had. They will stop bragging about 80b-a3b soon enough. It's the one thing they'll have for a few days.
Then it's back to just sucking cock.
Anonymous
9/12/2025, 11:01:20 PM
No.106568659
[Report]
>>106568674
anyone else testing qwen 80b right now? it feels scarily close to 235b: long context, barely hallucinates, incredibly fast. their new moe architecture must be getting close to the closed models' SOTA.
(testing the mlx version)
Anonymous
9/12/2025, 11:01:56 PM
No.106568661
[Report]
>>106568688
>>106568235
That's weird since Gemma 3 still knows a ton of trivia compared to other similarly sized models. If it truly didn't memorize things, then it should be worse than even Qwen. Also that graph is weird. How does 4B have the exact same memorization rate as 9B and 27B?
Anonymous
9/12/2025, 11:03:22 PM
No.106568674
[Report]
>>106568697
>>106568659
It's honestly been garbage for me, outputs seem comparable to or even a little worse than 30B A3B
>>106568645
>needs a lora to do it
Local nano banana when?
Anonymous
9/12/2025, 11:05:18 PM
No.106568688
[Report]
>>106568747
>>106568661
you don't need or want the model to memorize single examples verbatim. its supposed to generalize. if the information is presented many times it will get integrated, just not from a single example. its just to prevent personal information from getting memorized.
Anonymous
9/12/2025, 11:05:27 PM
No.106568690
[Report]
>>106568680
If you praise china hard enough, two more weeks.
If you dont 4 more months
Anonymous
9/12/2025, 11:06:20 PM
No.106568697
[Report]
>>106568680
>need a lora
wrong mentality, reframe: you can train the model to do new things if desired.
Also, it can already do this without one; the lora just enhances the faithfulness of the result (fabric texture on the fumo, painting style).
Anonymous
9/12/2025, 11:08:24 PM
No.106568713
[Report]
>>106568718
>>106568700
But I don't want to browse, search for, and maintain a library of loras like I did during the SD days. Not again...
Anonymous
9/12/2025, 11:09:29 PM
No.106568718
[Report]
>>106568713
Yeah me neither
But I also understand that no model, ever, will be able to do enough for what I want to see.
Being able to teach a model new stuff is important to me.
Granted this isn't super complex stuff and I can understand why you'd want it out of the box.
Anonymous
9/12/2025, 11:12:32 PM
No.106568743
[Report]
>>106568753
>>106568700
Let me guess you need 100gb of vram to train a Qwen lora?
>>106568688
So is it a bad thing then? Won't the model have more room to learn things when it's not spending time memorizing everything word for word? I mean, could this explain why Gemma seems to know so much for its parameter size?
Anonymous
9/12/2025, 11:13:33 PM
No.106568753
[Report]
>>106568743
yeah it's a dire situation there, I had to use prosumer hardware.
still, the results were fantastic even on a tiny dataset; the model's understanding is clever even if the base results are dog.
Anonymous
9/12/2025, 11:18:17 PM
No.106568779
[Report]
Anonymous
9/12/2025, 11:19:28 PM
No.106568789
[Report]
>>106572155
/lmg/ still struggling to cope with the fact that apple won local
Anonymous
9/12/2025, 11:19:48 PM
No.106568796
[Report]
>>106568925
>>106568645
How well does that lora work with fancy outfits?
Anonymous
9/12/2025, 11:20:35 PM
No.106568803
[Report]
>>106568747
I think it is a valid idea. it shouldn't hurt models at the trillions of training tokens scale. anything important will be seen multiple times across multiple examples. it just won't be able to reproduce some random person's medical information or ssn.
Anonymous
9/12/2025, 11:21:10 PM
No.106568808
[Report]
>>106568747
They possibly have very good/long post-training and include general trivia there.
Anonymous
9/12/2025, 11:22:07 PM
No.106568821
[Report]
>>106568866
my toaster is already creeping up on the 6 year mark
Planning my next machine to buy this black friday, will prolly get an rtx 5090, a ryzen 9950X3D cpu and 256gb of ddr5 memory. what are some of the more memory-hungry models I should try then?
Anonymous
9/12/2025, 11:27:53 PM
No.106568863
[Report]
>>106568879
What are some must have extensions for SillyTavern?
Anonymous
9/12/2025, 11:28:10 PM
No.106568866
[Report]
why is training a model to make it specialized so fucking hard?
Anonymous
9/12/2025, 11:29:43 PM
No.106568879
[Report]
>>106568892
Anonymous
9/12/2025, 11:31:36 PM
No.106568892
[Report]
>>106568898
>>106568879
I already have one. She doesn't like AI or me using it but she respects my drive to learn and be skilled with computers.
Anonymous
9/12/2025, 11:31:59 PM
No.106568897
[Report]
>>106568875
its easier now than it was 10 years ago.
Anonymous
9/12/2025, 11:32:11 PM
No.106568898
[Report]
>>106568892
does she know you are having sex with it?
Anonymous
9/12/2025, 11:33:09 PM
No.106568907
[Report]
>>106569204
>>106568645
>spyware UI
any other options?
Anonymous
9/12/2025, 11:34:15 PM
No.106568916
[Report]
>>106568923
qwen goofs???????????????
Anonymous
9/12/2025, 11:35:11 PM
No.106568923
[Report]
>>106568936
>>106568796
dunno, you can crank an arbitrary resolution though
Anonymous
9/12/2025, 11:35:56 PM
No.106568931
[Report]
>>106566944
safetymaxxing niggers
Anonymous
9/12/2025, 11:36:48 PM
No.106568936
[Report]
>>106568962
>>106568923
>mlx
I said goofs nigger, im not downloading lm studio or vllm for the awq variant
Anonymous
9/12/2025, 11:38:02 PM
No.106568947
[Report]
can I run glm4.5v in llama.cpp? I cant find gguf for it
Anonymous
9/12/2025, 11:38:10 PM
No.106568949
[Report]
>>106568875
If you're trying to teach it domain-specific information, it's pretty much impossible with a tiny LoRA, and otherwise requires a huge learning rate to burn the information into the weights, heavily affecting previous knowledge in the process.
Using summarized information might work better/faster than entire documents, but good luck training the model in a way that makes it truly understand the information without just parroting it (verbatim, even).
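For a sense of scale, this is roughly what "a tiny LoRA" looks like with HF PEFT; the model name, rank, and target modules here are placeholders, not a recommendation:
[code]
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")  # placeholder model
cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                 target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, cfg)
model.print_trainable_parameters()  # typically well under 1% of weights train,
                                    # which is why new facts barely stick
[/code]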
Anonymous
9/12/2025, 11:39:39 PM
No.106568962
[Report]
>>106568998
>>106568936
wait 2 more weeks then faggot
Anonymous
9/12/2025, 11:42:47 PM
No.106568998
[Report]
qwen next is officially goated
Anonymous
9/12/2025, 11:43:54 PM
No.106569014
[Report]
>>106569044
>>106568680
judging by the google studio api outages the chinese are already working on it
Anonymous
9/12/2025, 11:44:39 PM
No.106569021
[Report]
>>106569008
what happens at 100%?
Anonymous
9/12/2025, 11:45:01 PM
No.106569026
[Report]
>>106569008
>80b
>worse than 30b coder
go fuck your herpes infested goat
Anonymous
9/12/2025, 11:45:06 PM
No.106569029
[Report]
>>106569072
>>106569008
why is the thinking variant so much worse than the regular chat version?
Anonymous
9/12/2025, 11:46:04 PM
No.106569044
[Report]
>>106569014
But distilling never gets the performance of the original...
Anonymous
9/12/2025, 11:48:17 PM
No.106569072
[Report]
>>106569104
>>106569029
it hasnt been trained on the secret benchmaxx
Anonymous
9/12/2025, 11:48:34 PM
No.106569075
[Report]
>>106569111
yeah im starting to think its over
Anonymous
9/12/2025, 11:48:59 PM
No.106569082
[Report]
>>106570295
>>106569008
>officially goated
>Lost to Qwen3-Coder-30B
Dense bros we can't stop winning
Anonymous
9/12/2025, 11:51:18 PM
No.106569104
[Report]
>>106569072
wouldn't that kind of defeat the point if they have to train it specifically for the benchmark? That's like one of the big flaws with test-driven development, where you make your program fit your test rather than your actual problem.
Anonymous
9/12/2025, 11:52:20 PM
No.106569111
[Report]
>>106569075
GLM4.5 AIR BROS, WE CANT STOP WINNING!!!
Anonymous
9/12/2025, 11:57:44 PM
No.106569153
[Report]
>pull vllm
>follow the same steps I did last time that I wrote down for successfully compiling it, which I had to change and come up with new steps for as the ones previous stopped working at some point
>doesn't work anymore either, giving different errors
Sigh...
Anonymous
9/13/2025, 12:04:18 AM
No.106569204
[Report]
>>106569281
>>106568907
>>spyware UI
Uhh what??
Anonymous
9/13/2025, 12:06:38 AM
No.106569224
[Report]
so... has anyone modified an onahole to interact with an LLM yet?
Anonymous
9/13/2025, 12:08:05 AM
No.106569237
[Report]
Groksirs when is we getting supports?
https://github.com/ggml-org/llama.cpp/pull/15539
@CUDAdev sir please do the needful Vishnu bless yo
>vllm supports gguf guys!!!
>try loading a gguf
>it just errors out
My ass it's supported. Now I'm downloading some safetensors to try again and see if it's a model issue or my build is just fucked for some reason.
Anonymous
9/13/2025, 12:13:02 AM
No.106569281
[Report]
>>106571956
>>106569204
sends your data to jewgle on startup. the API nodes and the electron build have it but they are optional. the manager calls home
Anonymous
9/13/2025, 12:13:30 AM
No.106569285
[Report]
>>106568925
>/gig/ on /lmg/
Weird colab
Anonymous
9/13/2025, 12:15:17 AM
No.106569299
[Report]
>>106569319
https://www.trendforce.com/news/2025/09/11/news-kioxia-reportedly-eyes-2027-launch-for-nvidia-partnered-ai-ssds-with-100x-speed-boost/
>Kioxia, in partnership with NVIDIA, is developing next-generation SSDs aimed at AI servers, targeting commercialization by 2027 with read speeds nearly 100 times faster than current models, reaching up to 200 million IOPS using PCIe 7.0 technology. These SSDs, designed to partially replace HBM as GPU memory expanders, reflect the growing AI-driven demand in storage, with projections indicating that AI-related NAND will comprise 34% of the global NAND market by 2029, adding $29 billion in total addressable market (TAM). As a result, a U.S. investment firm warns of a potential NAND shortage starting in 2026, exacerbated by increased adoption of QLC eSSDs, Nearline SSDs, and high-bandwidth flash in response to tightening HDD supplies and AI infrastructure needs.
SSDmaxxers, 2027 will be your year!
Anonymous
9/13/2025, 12:17:26 AM
No.106569319
[Report]
>>106569299
>Kioxia, in partnership with NVIDIA
doa
Anonymous
9/13/2025, 12:19:48 AM
No.106569331
[Report]
>>106569357
>>106569268
When I tried it, I couldn't get a single moe gguf to load. I was expecting it to be slow and unoptimized, but it didn't even load.
Anonymous
9/13/2025, 12:20:43 AM
No.106569335
[Report]
>>106569357
>>106569268
just use llama for goofs bro
Anonymous
9/13/2025, 12:23:10 AM
No.106569349
[Report]
>>106569357
>>106569268
Support for gguf on vllm is ass anyway
Anonymous
9/13/2025, 12:24:17 AM
No.106569356
[Report]
>>106569268
>he expects corposlop pyshit to "just work" without suffering through dependency hell
>>106569268
Ok I just tried loading a small safetensors model and it also failed. Searching the error on the github issues gives 0 hits.
Wtf is wrong with vllm man.
I suppose GPU would probably work fine as I can just download the prebuilt wheels, but the CPU build is not in good shape.
>>106569331
Thanks.
I think CPU AND goof support simply cannot be expected to be stable on vllm, let alone GPU + CPU inference, which isn't currently supported.
>>106569335
>>106569349
It seems even safetensors don't work on my build kek. They don't truly "support" goofs or CPU either.
you will give me ze best GERMAN low latency TTS or STS model right now!
I'm tired of my models turning into drooling retards when trying to pronounce 25 aka FÜNFUNDZWANZIG!
>fünfffffuuffuhffffzwwffggg 'hick!
Don't make me use jewgle voices...(fuck elevenlabs, they aren't even that good).
Anonymous
9/13/2025, 12:27:23 AM
No.106569379
[Report]
>>106569407
Did Drummer finally troon out?
Anonymous
9/13/2025, 12:27:56 AM
No.106569385
[Report]
>>106569553
>>106569357
Thier github is practically useless and it seems like all support happens through their discord.
What error did you get? Try using the V0 engine. They rushed getting V1 out and making it the default while it was still a broken pile of shit missing more than half the features of V0.
Anonymous
9/13/2025, 12:28:23 AM
No.106569391
[Report]
>>106569520
>>106569367
Just directly pass your ipa transcription?
Anonymous
9/13/2025, 12:29:12 AM
No.106569402
[Report]
>>106569367
VibeVoice
>low latency
oh...
Anonymous
9/13/2025, 12:29:24 AM
No.106569407
[Report]
>>106569413
>>106569379
What's with this tech to troon pipeline?
Anonymous
9/13/2025, 12:31:07 AM
No.106569413
[Report]
>>106569471
>>106569407
>What's with this tech to troon pipeline?
The terminally online mostly fit into two groups, the mentally ill and assholes. If you are weak to praise and groupthink, the former is where you will stay. If you just want to solve problems, you are going to argue, try things out, fix it, come back, and then call everyone a dumbass.
Anonymous
9/13/2025, 12:35:02 AM
No.106569441
[Report]
>>106569488
How do you guys usually write character prompts? Do you just write a paragraph describing them, or something more structured?
Anonymous
9/13/2025, 12:37:35 AM
No.106569471
[Report]
>>106569492
>>106569413
your logic is a self report
anyway, maybe focus less on people, or do you keep your nose firmly buried up everyone's ass
Anonymous
9/13/2025, 12:38:01 AM
No.106569474
[Report]
>>106569564
Since I'm an esl I'd like to know if this heuristics-driven prosody (kokoro) sounds acceptable to americans:
https://vocaroo.com/17z5mdm2a0yU
The sample is complex on purpose so I can test a bunch of heuristics at once: "At 11:30 p.m. on Jan. 3, 2024, the project's lead (Dr. O'Neil) announced, "We'll ship v2.1 to the U.S. and EU by Q2," but a $1250.50 shortfall, a 5% processing fee, and three critical API regressions forced the team to triage, reassign tasks, and reconvene Monday—prioritizing user-facing fixes over backend refactors to preserve product quality."
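For context, the heuristics under test here amount to pre-TTS text normalization. A crude sketch of the kind of rules involved; the abbreviation table and regexes are illustrative, not kokoro's actual code:
[code]
import re

ABBREV = {"Dr.": "Doctor", "Jan.": "January", "p.m.": "P M", "U.S.": "U S"}

def normalize(text: str) -> str:
    for abbrev, spoken in ABBREV.items():
        text = text.replace(abbrev, spoken)
    text = re.sub(r"\$(\d+)\.(\d{2})", r"\1 dollars and \2 cents", text)  # $1250.50
    text = re.sub(r"(\d+)%", r"\1 percent", text)                         # 5%
    text = re.sub(r"\bv(\d+)\.(\d+)\b", r"version \1 point \2", text)     # v2.1
    return text

print(normalize("a $1250.50 shortfall, a 5% fee, ship v2.1 to the U.S."))
[/code]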
Anonymous
9/13/2025, 12:39:28 AM
No.106569488
[Report]
>>106569441
Everyone here just asks GPT5 to write it for them and improve it. Nobody uses local models for roleplay, GPT5 is the current meta.
>>106569471
What isnt a self report? im writing my own opinion, what else am i supposed to write or think
Anonymous
9/13/2025, 12:41:13 AM
No.106569503
[Report]
>>106569614
The hell kinda quant method are you supposed to use again? I've seen conflicting reports.
Anonymous
9/13/2025, 12:41:53 AM
No.106569510
[Report]
Anonymous
9/13/2025, 12:43:32 AM
No.106569520
[Report]
>>106569557
>>106569391
Local or private, show me a conversational STT>TTS method with decent enough latency. Best I found was some hugging face space from the transformers.js dev. But it was kinda meh and engrish only, and it had no dialogue turn system or whatever that interruption mechanic is called. I really cba developing this as I'm more interested in the backend stuff. I'd just use the openAI realtime API for prototyping, but fuck me, those prices are surreal.
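For reference, the bare-bones version of that loop looks something like this with faster-whisper plus any OpenAI-compatible server; the endpoint, model names, and the complete absence of barge-in handling are all assumptions/omissions:
[code]
import requests
from faster_whisper import WhisperModel

stt = WhisperModel("small", device="cuda", compute_type="float16")

def turn(wav_path: str, base_url: str = "http://localhost:8080/v1") -> str:
    segments, _ = stt.transcribe(wav_path)               # speech -> text
    user_text = " ".join(seg.text for seg in segments)
    r = requests.post(f"{base_url}/chat/completions", json={
        "model": "local",
        "messages": [{"role": "user", "content": user_text}]})
    return r.json()["choices"][0]["message"]["content"]  # feed this to your TTS
[/code]
The hard part, as you say, is the interruption mechanic and keeping end-to-end latency low, which this sketch does nothing about.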
Anonymous
9/13/2025, 12:46:27 AM
No.106569553
[Report]
>>106569630
>>106569385
>discord
Ugh.
The error is "TypeError: Cannot instantiate typing.Literal". I guess I could ask my llm about it to see if it possibly has any solutions.
How do I use the V0 engine? I tried the environment variable but it doesn't seem to do anything?
Anonymous
9/13/2025, 12:46:43 AM
No.106569557
[Report]
>>106569617
>>106569520
Check this
https://github.com/Open-LLM-VTuber/Open-LLM-VTuber it has a barge-in system which is the interruption mechanic you're looking for
Anonymous
9/13/2025, 12:47:24 AM
No.106569564
[Report]
Anonymous
9/13/2025, 12:50:09 AM
No.106569594
[Report]
>>106569357
I was actually fucking with the cpu install just to see if I could give next a little test or two, but I could smell it being a migraine the minute I started running into weird dependency mismatches. I'd honestly rather wait the multiple weeks it'll take to get support in llamacpp, only to test it and go "yeah, it's pretty shit for writing" anyway. Shame, because small active parameter models are great for cheap context and being relatively fast off the bat. Even jamba with more active parameters is still pretty snappy if you put enough of it into vram, but sperganov has yet to fix the constant reprocessing on new messages for it, or for a couple of other models where whatever the fuck they coded causes this.
Anonymous
9/13/2025, 12:52:43 AM
No.106569614
[Report]
>>106569646
>>106569503
IQ2 is the new meta, really. You will not notice any difference even when using smaller models.
Anonymous
9/13/2025, 12:53:08 AM
No.106569617
[Report]
>>106569672
>>106569557
yeah that looks promising, will give it a shot. thanks, pedo.
Anonymous
9/13/2025, 12:55:14 AM
No.106569630
[Report]
>>106569666
>>106569553
>How do I use the V0 engine? I tried the environment variable but it doesn't seem to do anything?
Read the output on startup, it should tell you which engine is being used.
>TypeError: Cannot instantiate typing.Literal
See if there are any hints in the stack trace before this part. For me, the only success I had when vllm decided to throw errors was upgrading or downgrading (at the cost of model support) the vllm version. Using the V0 engine solved a lot of trouble for me, but once they hit v10, I gave up on messing with it.
Anonymous
9/13/2025, 12:57:09 AM
No.106569646
[Report]
>>106569614
He says, as he spins on his heel, that q8 kv cache is disastrous for models or something
Anonymous
9/13/2025, 12:59:19 AM
No.106569666
[Report]
>>106569630
Yeah I think I'll just stop here if my LLM doesn't solve it. Don't feel like trying out various versions.
And honestly I have a feeling the CPU performance is worse than Llama.cpp's anyway, but it'd be nice to actually confirm.
Anonymous
9/13/2025, 1:00:09 AM
No.106569672
[Report]
Anonymous
9/13/2025, 1:18:07 AM
No.106569780
[Report]
>>106568925
>>106568645
damn, wasn't qwen-image and qwen-image edit supposed to be a slopped failure that's not worth it to run?
Is there some kind of model that can act as a sort of twitch chat for when you're playing games by yourself? Like you give it a video feed and it reacts in some way. Just so that it's not so lonely.
Anonymous
9/13/2025, 1:27:28 AM
No.106569822
[Report]
>>106569829
>>106569817
im gonna use this idea to become a billionaire
thanks
Anonymous
9/13/2025, 1:28:50 AM
No.106569829
[Report]
>>106569822
You'd be lucky to make lunch money. The only billionaire is the owner of whatever streaming platform you use.
Anonymous
9/13/2025, 1:29:49 AM
No.106569832
[Report]
>>106569878
>finally found a model with little censoring and pretty comptetent logic
>leaks chatml all over the place half the time
sigh
Anonymous
9/13/2025, 1:32:37 AM
No.106569839
[Report]
>>106569817
Having used 30b+ models, it depends. If you start off in a prose setting, then ask it to interject with something like a chat/review section (eg: character reads chat mid-story), it will fuck it up. Off the bat with examples, maybe. As for giving an llm a video feed, I don't think that's feasible at the moment unless you have a shitload of vram or some kind of highly specialized and hand written pipeline
Anonymous
9/13/2025, 1:33:25 AM
No.106569846
[Report]
>>106569817
>Just so that it's not so lonely
This is a general for pretend sex with computers and yet this post is one of the most pathetic things I've ever read
Just became a 128gb ramGOD with 24gb vram. What's the best I can run now?
Anonymous
9/13/2025, 1:36:34 AM
No.106569868
[Report]
>>106569856
>ram
>What's the best I can run now
You mean crawl?
>>106569817
for the millionth time, no we cant build screenreading twitch yet, no we dont know how neuro does it and it cannot be done locally for any sane price
Anonymous
9/13/2025, 1:37:38 AM
No.106569874
[Report]
>>106569856
Probably glm 4.5 at iq3, with a whopping 9 t/s on empty context
Anonymous
9/13/2025, 1:38:16 AM
No.106569876
[Report]
Anonymous
9/13/2025, 1:38:26 AM
No.106569878
[Report]
>>106569832
What model and how does it "leak" "chatml" "all" "over" "the" "place"?
Anonymous
9/13/2025, 1:40:07 AM
No.106569888
[Report]
Anonymous
9/13/2025, 1:43:15 AM
No.106569905
[Report]
>>106569920
>>106569856
>128gb ramGOD
sounds like you are a channellet with 2 channels at most
Anonymous
9/13/2025, 1:48:48 AM
No.106569920
[Report]
>>106569905
Times are changing, old man. I could barely fit a llama2 13b at 4k context, and now I can run 100b moes and 32b dense models with a bare minimum of 16k context, yet I still haven't bothered buying new hardware
Anonymous
9/13/2025, 1:49:30 AM
No.106569923
[Report]
>>106569955
>>106569817
Could you get a primitive version by sending screenshots through a multimodal model?
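Something like this sketch, against a multimodal-capable OpenAI-compatible server; the URL, model name, and prompt are placeholders, and the latency is whatever your hardware gives you:
[code]
import base64, requests
from mss import mss
from mss.tools import to_png

def react_to_screen(base_url: str = "http://localhost:8080/v1") -> str:
    with mss() as grabber:
        shot = grabber.grab(grabber.monitors[1])  # primary monitor
        png = to_png(shot.rgb, shot.size)         # raw PNG bytes
    b64 = base64.b64encode(png).decode()
    r = requests.post(f"{base_url}/chat/completions", json={
        "model": "local",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "React to this gameplay like a chat viewer."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}]})
    return r.json()["choices"][0]["message"]["content"]
[/code]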
Anonymous
9/13/2025, 1:55:33 AM
No.106569955
[Report]
>>106569923
If you don't mind minute+ long latency.
>>106569869
>no we dont know how neuro does it
It's still hilarious that some random guy built a better AI application than every major corporation with trillions in VC cash between them
>>106570004
Did you forget Ani exists?
Anonymous
9/13/2025, 2:06:26 AM
No.106570016
[Report]
>>106570004
>It's still hilarious that some random guy built a better utilization for AI than trillions in VC cash between every major corporation
Not really, if you dig into anything you will realize its a very small group of people actually doing anything at all. Sometimes its just one hyperfocused dude who does nothing but that for years cause of autism.
Anonymous
9/13/2025, 2:06:28 AM
No.106570017
[Report]
>>106570013
I wish I could
Anonymous
9/13/2025, 2:07:32 AM
No.106570027
[Report]
>>106570013
Someone post the mouth animations
Anonymous
9/13/2025, 2:08:44 AM
No.106570036
[Report]
>>106570048
>>106569869
Uhh, techlet?
>stream has a few minutes long delay (this is what most parasites do normally even)
>selected twitch chat entries are redirected to llm
It's not rocket science.
He wrote a backend that controls the character and the llm and integrates them together, but I can assure you I could make a demo if I had more interest.
>>106570036
>I can assure you I could make a demo if I had more interest.
That means you cant, and no one else has cracked it as well and made it available.
Anonymous
9/13/2025, 2:10:47 AM
No.106570054
[Report]
>>106570278
>I could
lol
Anonymous
9/13/2025, 2:15:48 AM
No.106570087
[Report]
The new fiscal quarter starts in October. As usual, this will be when companies start pushing out new models to look good.
Two more weeks and the big releases start.
Anonymous
9/13/2025, 2:16:51 AM
No.106570094
[Report]
>>106570100
>>106570048
>no one else has cracked it as good and made it available.
What is the incentive to put in that much work just to make it available because you want it? Even if I put in that much effort, I would just make a Neuro knockoff and try to make money off it.
Anonymous
9/13/2025, 2:18:24 AM
No.106570100
[Report]
>>106570094
Okay thats fair, but still, if you can clone it and make money, why not? how come none of the 'i made my own neuro' projects are close to his?
Anonymous
9/13/2025, 2:19:43 AM
No.106570107
[Report]
My implementation is cool she's just on the Canadian Github
Anonymous
9/13/2025, 2:20:22 AM
No.106570114
[Report]
>>106570048
You are just too stupid and/or underage even. Jesus christ, these ERPers shouldn't even be allowed to post in this thread.
Anonymous
9/13/2025, 2:52:58 AM
No.106570266
[Report]
I just did a test of GPU performance with vllm and llama.cpp. With Qwen 4B, original BF16 safetensors, I got around 71 t/s on vllm with empty context, and 45 t/s on llama.cpp with a BF16 GGUF. At 8k context, Llama.cpp got 44 t/s, and vllm got 60 t/s. I also tried an F16 GGUF and it was like 2 t/s faster. These results suggest that at least on my system with Qwen 4B on full precision, there is something bottlenecking Llama.cpp. Maybe it'd be closer with a larger parameter model, but I don't have the VRAM to hold the full precision weights.
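For anyone wanting to reproduce numbers like these: both servers expose an OpenAI-compatible endpoint, so a crude decode-speed check looks like this (the port and model name are assumptions, and on long prompts this conflates prompt processing with decode):
[code]
import time, requests

def tokens_per_second(base_url="http://localhost:8000/v1", n=256):
    t0 = time.time()
    r = requests.post(f"{base_url}/completions", json={
        "model": "qwen3-4b", "prompt": "Once upon a time",
        "max_tokens": n, "temperature": 0})
    generated = r.json()["usage"]["completion_tokens"]
    return generated / (time.time() - t0)

print(f"{tokens_per_second():.1f} t/s")
[/code]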
Anonymous
9/13/2025, 2:54:58 AM
No.106570278
[Report]
>>106570054
Problem nigger?
Anonymous
9/13/2025, 2:57:02 AM
No.106570295
[Report]
>>106572734
>>106569082
So, a general instruct model lost at coding to a model that was specialized for coding, and that's supposed to be a mark against the general instruct model?
Anonymous
9/13/2025, 3:03:58 AM
No.106570343
[Report]
>>106570369
>>106569869
Nah you coomers are braindead. There are bazillions of projects like these on github
https://github.com/kimjammer/Neuro
Anonymous
9/13/2025, 3:10:29 AM
No.106570386
[Report]
>>106572714
>>106566836 (OP)
Question about the -ts option in llama.cpp: when setting a split, should I account for the fact that my main (first) GPU already has vram in use from other programs and windows? Or does llama.cpp take that into consideration and balance it properly? Are there any other options that will just split the vram evenly between two cards without having to adjust numbers with -ts? I find myself using odd -ts number combos to get an almost even vram usage split, and I don't know why.
For example, currently -ts 24,15 splits it almost evenly between my cards, which makes no sense to me considering my 1st card is using vram for other programs and windows. I just don't like having to re-load the model over and over, trying different numbers until I find a combo that splits it properly.
Anonymous
9/13/2025, 3:11:19 AM
No.106570396
[Report]
>>106570405
What if my computer is over a decade old with no vram? Are local llms the hobby for me?
Anonymous
9/13/2025, 3:12:14 AM
No.106570405
[Report]
>>106570369
Wait, so it is possible? Why were anons being mean to me? Are they trying to keep this tech all to themselves?
Anonymous
9/13/2025, 3:24:38 AM
No.106570488
[Report]
>>106570796
>>106570369
>ElevenLabs voice synthesis
Anonymous
9/13/2025, 3:50:41 AM
No.106570646
[Report]
>>106570480
You talked to clueless retards. Very few here know more than edging to chub cards
Anonymous
9/13/2025, 3:50:55 AM
No.106570647
[Report]
>>106570480
they're all tsun with little dere here
Anonymous
9/13/2025, 4:15:12 AM
No.106570796
[Report]
>>106570807
>>106570488
It's the best and will continue to be the best
Anonymous
9/13/2025, 4:16:07 AM
No.106570807
[Report]
>>106570796
China will rip off vibe voice and make it better.
I believe
Finally a model that passes this test and it's only 1.7B and open sourced. wild
>>106570867
Didn't it get the 8th character wrong?
Anonymous
9/13/2025, 4:38:51 AM
No.106570901
[Report]
>>106570943
>>106570867
the model alone or the whole stack?
Anonymous
9/13/2025, 4:39:13 AM
No.106570902
[Report]
>>106570892
well fuck, guess there's always next model
Anonymous
9/13/2025, 4:48:10 AM
No.106570943
[Report]
>>106570901
just the model
Anonymous
9/13/2025, 4:51:31 AM
No.106570964
[Report]
>>106571077
What is a good model for being negative and critical? I hate how they agree with everything. I want to be told I'm wrong or being an idiot.
Anonymous
9/13/2025, 5:09:39 AM
No.106571070
[Report]
>>106571360
For those of you who use the models for anything other than porn, what is the best way to let the model browse the web to search for info?
In my opinion the difference nowadays between proprietary models and local is mostly in the tooling integration rather than the actual models.
>>106570964
Kimi k2 is the GOAT
>>106571077
>Kimi k2
Is kimi k2 local? can you run it?
Anonymous
9/13/2025, 5:14:49 AM
No.106571094
[Report]
>>106571102
>>106571090
No but I understand that one or two people here can run it :)
nta btw
Anonymous
9/13/2025, 5:14:57 AM
No.106571095
[Report]
Anonymous
9/13/2025, 5:16:10 AM
No.106571099
[Report]
Anonymous
9/13/2025, 5:16:36 AM
No.106571102
[Report]
>>106571094
>one or two people here can run it :)
I wish i was one of them.
Anonymous
9/13/2025, 5:16:48 AM
No.106571105
[Report]
>>106571123
>>106571077
Can it still talk about medical or mental stuff or does it just shut down?
Anonymous
9/13/2025, 5:19:24 AM
No.106571123
[Report]
>>106571105
post your full medical history and social security number and i'll ask my buddy kimi
Anonymous
9/13/2025, 5:40:59 AM
No.106571237
[Report]
>>106571216
Can it do Japanese sex, moaning, and blowjob noises?
If no, it's worthless
Anonymous
9/13/2025, 5:47:49 AM
No.106571266
[Report]
>>106571243
VibeVoice can, but no api support yet
Anonymous
9/13/2025, 6:02:45 AM
No.106571337
[Report]
>>106571379
>>106571243
>braindead coomer
Anonymous
9/13/2025, 6:04:34 AM
No.106571347
[Report]
>>106571466
There arent any coomers here. We are all using this technology safely and to enhance our lives and work abilities.
Anonymous
9/13/2025, 6:08:31 AM
No.106571360
[Report]
>>106571070
>what is the best way to let the model browse the web to search for info?
MCP
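A minimal sketch of what that looks like with the reference MCP Python SDK; the duckduckgo_search backend and the tool shape are my assumptions, not a specific recommended stack:
[code]
from mcp.server.fastmcp import FastMCP
from duckduckgo_search import DDGS

mcp = FastMCP("web-search")

@mcp.tool()
def search(query: str, max_results: int = 5) -> list[dict]:
    """Return title/href/body dicts for the top web results."""
    with DDGS() as ddgs:
        return list(ddgs.text(query, max_results=max_results))

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio; point your client's MCP config at this script
[/code]
Most frontends that do tool calling (and the big proprietary clients) can attach a server like this to any model that was trained for function calls.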
Anonymous
9/13/2025, 6:12:20 AM
No.106571379
[Report]
>>106571337
gooners are the reason AI has advanced so much
a 4chan holo gooner invented chain of thought
Anonymous
9/13/2025, 6:27:17 AM
No.106571466
[Report]
>>106571553
>>106571347
>There arent any coomers here
Sorry I was offline for a bit, I'm back now.
Anonymous
9/13/2025, 6:39:30 AM
No.106571553
[Report]
>>106571961
>>106571466
show me your coom
Anonymous
9/13/2025, 6:57:48 AM
No.106571656
[Report]
KH music came on in my playlist and I remembered the lyrics poster :)
Anonymous
9/13/2025, 7:09:06 AM
No.106571715
[Report]
>>106570892
Come on now, let the man rest
I made a Miku for you guys. Feel free to use it at your leisure.
Can you guys give me a list of safetymaxxing companies so I know to ignore their model releases?
Anonymous
9/13/2025, 7:34:12 AM
No.106571849
[Report]
>>106571876
Anonymous
9/13/2025, 7:35:13 AM
No.106571852
[Report]
>>106571836
Pretty much everyone except Mistral and the Chinese.
Anonymous
9/13/2025, 7:35:14 AM
No.106571853
[Report]
>>106571876
Textless, exploitable version.
Anonymous
9/13/2025, 7:35:37 AM
No.106571854
[Report]
Anonymous
9/13/2025, 7:36:16 AM
No.106571856
[Report]
>>106571876
Exploitable transparency version.
Enjoy your images of official /lmg/ mascot Hatsune Miku!
Anonymous
9/13/2025, 7:41:07 AM
No.106571876
[Report]
>>106571885
Anonymous
9/13/2025, 7:42:11 AM
No.106571883
[Report]
stay, cute normal poster
Anonymous
9/13/2025, 7:42:44 AM
No.106571885
[Report]
>>106571876
I'm sorry for contributing OC. Really, I am.
I'll go back to enjoying my chat with official /lmg/ roleplaying model Rocinante 1.1, the best roleplaying model made by the best finetuner, TheDrummer!
Anonymous
9/13/2025, 7:48:16 AM
No.106571916
[Report]
>>106571940
Anonymous
9/13/2025, 7:51:45 AM
No.106571940
[Report]
>>106571916
Yeah I'm happy with how that artist tag blend turned out.
The key is Namori. Namori tag makes anything cute.
Anonymous
9/13/2025, 7:55:08 AM
No.106571956
[Report]
>>106569281
fork it and edit out the homing beacon
Anonymous
9/13/2025, 7:56:20 AM
No.106571961
[Report]
Anonymous
9/13/2025, 8:06:12 AM
No.106572018
[Report]
>>106572081
>>106570867
wasn't the mokuro manga thing able to do this already?
>https://github.com/kha-white/mokuro
Anonymous
9/13/2025, 8:21:38 AM
No.106572081
[Report]
>>106572018
where are you supposed to get the high quality raws for this though
Anonymous
9/13/2025, 8:40:27 AM
No.106572139
[Report]
>>106572166
Anonymous
9/13/2025, 8:43:21 AM
No.106572145
[Report]
>>106568374
>>106568414
>>106568426
You do know transformers still exists, has all the model support, and is where everything lands first, right? Most of the internet only mentions GGUF because people don't want to waste space downloading the raw model, and they use AWQ for non-fine-grained 4/8 bit inference because most people don't overspend for the amount of compute they get and are running <1-2k USD builds for these models.
Anonymous
9/13/2025, 8:45:44 AM
No.106572155
[Report]
>>106568789
Apple didn't win jack shit when it is slower per dollar than AMD's Strix Halo and harder to use overall for anything <=128 GB of RAM. Maybe their matmul implementation in the A19/M5 is worth a shit, but I am leaning towards no unless proven otherwise, given how shit Apple is at AI.
Anonymous
9/13/2025, 8:48:29 AM
No.106572165
[Report]
w y w a
y d h m s
p
o b
d g
Anonymous
9/13/2025, 8:49:02 AM
No.106572166
[Report]
>>106572139
b-but I make up for it with my prompting...
Anonymous
9/13/2025, 8:50:17 AM
No.106572171
[Report]
my prompts turn 30B models into 500B behemoths
Anonymous
9/13/2025, 9:17:59 AM
No.106572287
[Report]
>>106570867
Haha, nice to see my image still floating around.
The 8th character, like other people said, and also the KA hiragana towards the end.
Damn, 2 years and they all still struggle.
In 2023 I thought we would have a local gaming buddy by now, one I could have in the background translating games with an overlay.
At least drummer finetunes are good enough for lunatranslator. That works pretty well most of the time.
I remember the old ATLAS translations back in the 00s. kek
Anonymous
9/13/2025, 9:57:55 AM
No.106572459
[Report]
>>106572468
>>106570867
It failed though. There is one big error and three small ones.
Anonymous
9/13/2025, 9:59:15 AM
No.106572468
[Report]
>>106572459
You're absolutely right anon! It really is a testament to your genius to point this out!
Anonymous
9/13/2025, 10:21:03 AM
No.106572569
[Report]
>>106572592
what did they mean by this
>>106572569
>everyone picking up mi50s and v100s despite AMD and Nvidia dropping them in the next major releases of their ML software stacks
I don't get it at all. Even if you had to pay double the price, it is still worth having software support over trying to hack things together after that point and praying the Vulkan backend gets super optimized one day so you can keep using your card.
Anonymous
9/13/2025, 10:29:12 AM
No.106572601
[Report]
>>106572669
>>106572592
i meant the little green display but yes the gpu choice is also questionable
Anonymous
9/13/2025, 10:34:58 AM
No.106572637
[Report]
>>106572669
>>106572592
What could be the reasons for updating your drivers? The last time I assembled my LLM machine was last year and I had to downgrade drivers for better power savings; it still works to this day. The only things I've heard about these drivers are that they become slower in newer versions and that power efficiency on idle has been broken for more than a year now
Anonymous
9/13/2025, 10:39:40 AM
No.106572653
[Report]
>>106572669
And when it comes to AMD drivers, if you find a version that somewhat works, you'd better never touch it again
Anonymous
9/13/2025, 10:41:59 AM
No.106572669
[Report]
>>106572729
>>106572601
Oh didn't notice. Yeah, won't comment on that. I still think microATX is way too small to fit cards like that even with blowers but I guess that's why noise is never a factor to consider.
>>106572637
Depends on what card you have. Ada and now Blackwell are still getting performance improvements and fixes. If you locked your hardware stack now especially on Blackwell, you're missing out on Nvidia actually providing value in unbreaking shit, although to be fair, it's shit they broke in the first place. CUDA also does get a bunch of API changes between major releases.
>>106572653
For AMD, you especially want to run nightly ROCm if you can build it yourself.
Of course, that's from a developer/tinkerer standpoint. If you want shit to just work, then okay, you do you in order to keep software stability at all costs.
Anonymous
9/13/2025, 10:47:35 AM
No.106572696
[Report]
>>106572729
just don't use AYYMD and you will be happy
llama.cpp CUDA dev
!!yhbFjk57TDr
9/13/2025, 10:52:56 AM
No.106572714
[Report]
>>106570386
Unless someone changed it when I wasn't looking, the default in llama.cpp is to use the GPU with index 0 as "main GPU".
Note that the order in which llama.cpp/ggml receives GPUs is not necessarily consistent with the one reported elsewhere.
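A quick way to see the order the CUDA runtime reports (which is usually what llama.cpp/ggml enumerates), assuming torch is installed:
[code]
import torch

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
# nvidia-smi orders by PCI bus; the CUDA runtime may not. Setting
# CUDA_VISIBLE_DEVICES=1,0 swaps which physical card becomes index 0 (the main GPU).
[/code]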
>>106572592
Essentially all GPUs you buy are a depreciating asset.
Even if you have to replace them earlier and they end up as e-waste, that may have been a better deal than buying and later selling a more expensive GPU.
Though as long as there are drivers I intend to maintain llama.cpp/ggml support for Pascal and Vega (Mi50).
Anonymous
9/13/2025, 10:56:20 AM
No.106572729
[Report]
>>106572669
Have you ever experienced a t/s increase after updating nvidia drivers?
>>106572696
People who buy AMD are either desperate enough or in it for the ride. Someone has to finish off that lunatic extra phantasm, you know
Anonymous
9/13/2025, 10:57:55 AM
No.106572734
[Report]
>>106570295
>So, a general instruct model lost at coding to a model that was specialized for coding, and that's supposed to be a mark against the general instruct model?
the general instruct usually did better than the previous coder-focused model, yes.
Qwen 3 instructs (the general instruct, not coder) are better than 2.5 coder.
A new model being worse than a previous model is a sign that the tech is stalling.