/lmg/ - Local Models General - /g/ (#105689385) [Archived: 758 hours ago]

Anonymous
6/24/2025, 2:23:43 PM No.105689385
2025-06-18 21-25-18_thumb.jpg
2025-06-18 21-25-18_thumb.jpg
md5: e12d51415d22d45bae3bee92b05ed528🔍
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>105681538 & >>105671827

►News
>(06/21) LongWriter-Zero, RL trained ultra-long text generation: https://hf.co/THU-KEG/LongWriter-Zero-32B
>(06/20) Magenta RealTime open music generation model released: https://hf.co/google/magenta-realtime
>(06/20) Mistral-Small-3.2 released: https://hf.co/mistralai/Mistral-Small-3.2-24B-Instruct-2506
>(06/19) Kyutai streaming speech-to-text released: https://kyutai.org/next/stt
>(06/17) Hunyuan3D-2.1 released: https://hf.co/tencent/Hunyuan3D-2.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Replies: >>105691554 >>105691774 >>105692936 >>105694943 >>105696371 >>105696891
Anonymous
6/24/2025, 2:24:04 PM No.105689390
1111
1111
md5: 7fcb224f351951a0b8c966d3a4f08f85🔍
►Recent Highlights from the Previous Thread: >>105681538

--Paper: Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights:
>105686014 >105686064 >105686080 >105686529
--Papers:
>105686227 >105687733
--Challenges of using LLMs as video game agents in Pokémon environments:
>105685606 >105685624 >105685632 >105685679 >105685728 >105685856 >105685965 >105686068 >105686194 >105688488 >105688498 >105688505 >105688507 >105685653
--DeepSeek-R1 671B performance comparison on low-end hardware using different llama.cpp backends:
>105688247 >105688269 >105688291
--Discussion around LlamaBarn, Ollama's divergence from llama.cpp, and usability improvements in model serving tools:
>105682647 >105682703 >105682731 >105682745 >105682833 >105682846 >105683347 >105682882 >105683117 >105683331 >105683363 >105683401 >105683503 >105687438 >105688703 >105688849
--Comparison of voice cloning tools and techniques for improved emotional and audio fidelity:
>105685897 >105685934 >105685961
--LLM deployment options for RTX A5000 clusters using quantization and pipeline parallelism:
>105687473 >105687524 >105687643
--LLMauthorbench dataset for studying code authorship attribution across models:
>105688324
--Consciousness localization problem under computationalism and the Universal Dovetailer framework:
>105684402 >105684720 >105684889 >105684897 >105684904 >105685022 >105685354 >105685358 >105685366 >105685372 >105685516 >105685576 >105685434 >105685674 >105685791
--Behavioral quirks and prompt sensitivity of Mistral Small 3.2 variants explored through dream sequences:
>105682349 >105682382 >105682432 >105682499 >105682533 >105684446
--Mistral Saba deprecation signals potential evolution toward Mistral Small 3.2 architecture:
>105688925
--Rin-chan and Mikuplush (free space):
>105683160 >105685322 >105686106 >105688300 >105688383 >105688993 >105689241

►Recent Highlight Posts from the Previous Thread: >>105681543

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Anonymous
6/24/2025, 2:27:08 PM No.105689419
petition for mods to "FUZZ" /lmg/ threads: make every pic posted in threads with lmg in the subject a black square.
Replies: >>105689552
Anonymous
6/24/2025, 2:28:24 PM No.105689431
1739973508802713
1739973508802713
md5: d563db30ef52a0e28a84761a4e5f7b82🔍
►Recent Highlights from the Previous Thread: >>105681538

--Total migger death

--Total tranny death

--Total nigger death

--Total kike death

►Recent Highlights from the Previous Thread: >>105681538

Why?: Because you will never be a woman
Fix: Consider suicide
Replies: >>105689454 >>105689647 >>105690155
Anonymous
6/24/2025, 2:28:32 PM No.105689432
ITT: OP continues to be bewildered for being called out as the troon he is after posting this gay video in OP
Replies: >>105689454
Anonymous
6/24/2025, 2:28:59 PM No.105689435
>>105689394
I do wonder what kind of images they trained it on.
It's so good that it can recognize a pussy slip in an otherwise completely normal image. When it sees that it spergs out but if you censor it it describes the image as usual.
Replies: >>105689545
Anonymous
6/24/2025, 2:29:41 PM No.105689440
file
file
md5: 06afd3f84ac8d20e7a1d92ea23c1537e🔍
>>105689416
>Culture warriors can fuck off to Twitter.
Replies: >>105691711
Anonymous
6/24/2025, 2:30:30 PM No.105689454
>>105689431
>>105689432
Just fuck off already.
Replies: >>105689462 >>105689489
Anonymous
6/24/2025, 2:30:58 PM No.105689458
>samefagging this hard
oh he mad
Anonymous
6/24/2025, 2:31:19 PM No.105689462
1729631715991430
1729631715991430
md5: 022838879d638ea43c47b3112ba23fb5🔍
>>105689454
You are the unwanted one here, troony, back to discord
Anonymous
6/24/2025, 2:32:11 PM No.105689469
Tuesday my beloved
Anonymous
6/24/2025, 2:32:12 PM No.105689470
Are ~30B models any good for coding small (up to 4k lines) projects (in Lua, though)? I only have 12GB of VRAM.
Replies: >>105689484 >>105689492
Anonymous
6/24/2025, 2:35:01 PM No.105689484
>>105689470
qwen2.5 coder is exceptionally good for its size; a qwen3 version is supposedly in the works. You probably won't get it to one-shot a 4k-line script, but it will write individual functions just fine if you tell it what you want.
I never tried it with Lua though.
Replies: >>105689548
Anonymous
6/24/2025, 2:35:33 PM No.105689489
>>105689454
Why are you angry at me? I didn't force you to cut your dick off. You did it.
Replies: >>105689538
Anonymous
6/24/2025, 2:35:53 PM No.105689492
>>105689470
Qwen 3 30B should work reasonably well. Small models are decent for short fragments like that, and Lua should be represented well enough in the training data for it to avoid too many errors, as long as you don't go below Q4.
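If it helps, a rough llama.cpp launch for a 12GB card might look something like this (the quant file name and layer count are placeholders; raise --n-gpu-layers until you run out of VRAM):

./llama.cpp/build/bin/llama-server \
--model Qwen3-30B-A3B-Q4_K_M.gguf \
--n-gpu-layers 28 \
--ctx-size 8192 \
--flash-attn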
Replies: >>105689548
Anonymous
6/24/2025, 2:43:01 PM No.105689538
>>105689489
kek
Anonymous
6/24/2025, 2:43:36 PM No.105689545
gem-expl-02
gem-expl-02
md5: 13682486de7bafbcb12eb2563e81ec6b🔍
>>105689435
To clarify, the original image in that screenshot wasn't censored. I just censored it before posting it here.
The Gemma Team clearly trained it on a decent amount of NSFW photos and illustrations, as well as some medical imagery (which they probably increased in MedGemma 4B).
Replies: >>105689572 >>105689638 >>105691042
Anonymous
6/24/2025, 2:44:08 PM No.105689548
>>105689484
>>105689492
Thanks. What about Devstral and GLM-4?
Replies: >>105689590 >>105689754
Anonymous
6/24/2025, 2:44:25 PM No.105689552
>>105689419
I once wrote a userscript that did something similar, but it stopped working; I will post it here if I ever make it again.
It basically blurred every image in /lmg/ and would unblur it when clicked (for the rare instances that something relevant was being posted).

It would be better if jannies did something about the degeneracy but they won't.
Anonymous
6/24/2025, 2:47:49 PM No.105689572
>>105689545
I know yours wasn't; I'm just saying that they must have a pretty good dataset if it can recognize lewd details in an otherwise normal image.
Anonymous
6/24/2025, 2:50:16 PM No.105689590
>>105689548
Devstral should be ok but GLM4 is kind of dated at this point being from last year.
Replies: >>105689604
Anonymous
6/24/2025, 2:51:57 PM No.105689604
file
file
md5: 19ba8a4228d57526fb291ee589f7c563🔍
>>105689590
it's from two months ago bro
Replies: >>105689631
Anonymous
6/24/2025, 2:55:40 PM No.105689631
>>105689604
Time flies.
Anonymous
6/24/2025, 2:56:24 PM No.105689638
>>105689545
how do you send an image in sillytavern so it remains in context? it always tries to send it outside the convo with a stupid prompt from an extension
Replies: >>105689674
Anonymous
6/24/2025, 2:57:21 PM No.105689647
>>105689431
That must have really struck a nerve,
Anonymous
6/24/2025, 3:01:10 PM No.105689674
>>105689638
If you attach images to user or assistant messages, they'll remain in the conversation log. If you attach them to system messages, they won't. I'm not doing anything special there.
Replies: >>105689686
Anonymous
6/24/2025, 3:02:35 PM No.105689686
>>105689674
huh okay i must be retarded then
Anonymous
6/24/2025, 3:03:14 PM No.105689695
It is kind of crazy how even aicg has more interesting offtopic anime girl pictures than this place. This constant spam of the same mediocre staple green haired girl design is so tiresome. And then the faggot has to show everyone he has a collection of dolls. I mean holy fucking shit OG 4chan would convince you to kill yourself on a livestream you disgusting faggot.
Replies: >>105689783
Anonymous
6/24/2025, 3:08:51 PM No.105689746
kek
Anonymous
6/24/2025, 3:09:38 PM No.105689754
>>105689548
I had troubles with repetition using glm4 compared to qwen2.5 coder. I didn't really mess around with samplers though.
Anonymous
6/24/2025, 3:13:01 PM No.105689783
>>105689695
you have no idea what "og 4chan" was or wasn't since that was literally years before you were even born you mentally ill zoomer
Replies: >>105689808
Anonymous
6/24/2025, 3:15:08 PM No.105689808
>>105689783
i am sorry your dad raped you when you were a kid. now you have to play with dolls and push your mental illness onto others. but it is never too late to kill yourself so please do it. we believe in you.
Anonymous
6/24/2025, 3:52:23 PM No.105690146
tetolove
tetolove
md5: 0bd53f9263838b4d169471a8def2cde2🔍
>>105689241
friend
Replies: >>105690777 >>105691774
Anonymous
6/24/2025, 3:53:31 PM No.105690155
>>105689431
>dead
>troon mod deletes this
>why do you keep talking about troons
It is a mystery
Replies: >>105690256
Anonymous
6/24/2025, 3:56:09 PM No.105690177
gemma3-suggestions
gemma3-suggestions
md5: 44d4234f65050414cc120d09514285d0🔍
It's that time of the year again.
https://x.com/osanseviero/status/1937453755261243600

>Feedback requested!
>
>What do you want for our next Gemma models? Which sizes should we target and why?
Replies: >>105690204 >>105690212 >>105690233 >>105690247 >>105690248 >>105690267 >>105690399 >>105690518 >>105691275
Anonymous
6/24/2025, 3:58:40 PM No.105690204
>>105690177
1B
Anonymous
6/24/2025, 3:59:20 PM No.105690212
>>105690177
I don't care about gemma, this shit is so fucking cucked
Replies: >>105690248
Anonymous
6/24/2025, 4:01:12 PM No.105690233
>>105690177
1000b
itd be funny i think
Anonymous
6/24/2025, 4:02:46 PM No.105690247
1722109990338196
1722109990338196
md5: 78e528a36b86e1adca20e575e7596282🔍
>>105690177
always hilarious seeing every retarded soifaggot in existance begging for the smallest trash possible, not knowing what distillation is

thankfully when scam fagman asked the same thing and wanted to astroturf a low phone model release, there were too many people who voted that counteracted the sois and the bots, forcing him to not release the dogshit mobile model and have to work on something bigger
Replies: >>105690336 >>105690423
Anonymous
6/24/2025, 4:02:53 PM No.105690248
>>105690177
Ask for 100B dense

>>105690212
They're useful for non-ERP tasks
Anonymous
6/24/2025, 4:03:47 PM No.105690256
>>105690155
>be an annoying faggot
>janny deletes my retard spam
>this makes me impotently fill my diaper in rage
>I'm totally in the right btw
Replies: >>105690265 >>105690346
Anonymous
6/24/2025, 4:04:47 PM No.105690265
>>105690256
>a miggerspammer talks about being annoying and spamming
pottery
Replies: >>105690283
Anonymous
6/24/2025, 4:05:11 PM No.105690267
>>105690177
>What do you want for our next Gemma models
Release the older smaller gemini models for fuck sake.
Anonymous
6/24/2025, 4:06:56 PM No.105690283
Ironic
Ironic
md5: f8fc81ba370b24e1f32cdd985df07e00🔍
>>105690265
Anonymous
6/24/2025, 4:13:22 PM No.105690336
>>105690247
that guy Simon is a literal co-creator of Django btw and he doesn't know what distillation is
Replies: >>105690369 >>105692893
Anonymous
6/24/2025, 4:15:05 PM No.105690346
>>105690256
>be an annoying faggot
>janny deletes my retard spam
This whole "discussion" is because janny doesn't delete your retard spam you disgusting troon.
Anonymous
6/24/2025, 4:17:21 PM No.105690369
>>105690336
>python scripter
>clueless retard
I am Joe's complete lack of surprise.
Anonymous
6/24/2025, 4:20:42 PM No.105690399
>>105690177
8x8B moe
45B dense fallback just in case they fuck up their moe run
Anonymous
6/24/2025, 4:22:51 PM No.105690423
>>105690247
Just drink straight from the gemini tap if you want to distill
Replies: >>105690529 >>105690642
Anonymous
6/24/2025, 4:31:12 PM No.105690502
I no longer feel safe in this thread.
Replies: >>105690513
Anonymous
6/24/2025, 4:32:13 PM No.105690513
>>105690502
Good. There are many hugboxes out there for you instead.
Anonymous
6/24/2025, 4:32:38 PM No.105690518
>>105690177
Nobody is going to dare ask for less censorship on twitter.
Anonymous
6/24/2025, 4:33:28 PM No.105690529
>>105690423
True distillation requires logits. Training on generated datasets is not true distillation.
Replies: >>105690541
Anonymous
6/24/2025, 4:35:01 PM No.105690541
>>105690529
Exactly.
People calling fine-tunes/pretrains "distills" is fucked.
Is that Deepseek's fault?
Replies: >>105690614
Anonymous
6/24/2025, 4:43:44 PM No.105690614
>>105690541
Meta started calling it that first.
Replies: >>105690826
Anonymous
6/24/2025, 4:47:39 PM No.105690642
>>105690423
Gemma 3 models were already pre- and post-trained with knowledge distillation, although the technical report didn't go into much depth on that.

https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
> All Gemma 3 models are trained with knowledge distillation (Hinton et al., 2015)

> 2.2. Pre-training
>We follow a similar recipe as in Gemma 2 for pre-training with knowledge distillation.
>[...]
>Distillation. We sample 256 logits per token, weighted by teacher probabilities. The student learns the teacher’s distribution within these samples via cross-entropy loss. The teacher’s target distribution is set to zero probability for non-sampled logits, and renormalized.

>3. Instruction-Tuning
>[...]
>Techniques. Our post-training approach relies on an improved version of knowledge distillation (Agarwal et al., 2024; Anil et al., 2018; Hinton et al., 2015) from a large IT teacher, along with a RL finetuning phase based on improved versions of BOND (Sessa et al., 2024), WARM (Ramé et al., 2024b), and WARP (Ramé et al., 2024a).
Anonymous
6/24/2025, 5:06:50 PM No.105690777
>>105690146
no fucking shot you got the chubmaxxed teto let's go
Replies: >>105690807
Anonymous
6/24/2025, 5:12:06 PM No.105690807
>>105690777
C-can i come out now? Is it a trans ally thread again?
Replies: >>105690822
Anonymous
6/24/2025, 5:14:26 PM No.105690822
>>105690807
at this point you're being such a massive homo that you could fuck every anon's mother and still be gay
Anonymous
6/24/2025, 5:14:48 PM No.105690826
>>105690614
Well, then fuck them for starting this mess.
Anonymous
6/24/2025, 5:27:18 PM No.105690949
Trans lives matter.
Replies: >>105691427
Anonymous
6/24/2025, 5:30:48 PM No.105690984
we need a new big release to make this thread interesting again
Replies: >>105691032 >>105691097
Anonymous
6/24/2025, 5:35:14 PM No.105691032
>>105690984
Plenty interesting for me now
Anonymous
6/24/2025, 5:36:27 PM No.105691042
>>105689545
Which lora and model do you use for those KC pics?
Replies: >>105691079
Anonymous
6/24/2025, 5:41:30 PM No.105691079
>>105691042
No LoRA, just this: https://civitai.com/models/997160
Anonymous
6/24/2025, 5:43:14 PM No.105691097
>>105690984
a "powerful" reasoning agentic edge device tool calling gemini distilled safety tested math and science able coding proficient model coming right up for you sir
Replies: >>105691131
Anonymous
6/24/2025, 5:47:02 PM No.105691131
>>105691097
elo moon status?
Replies: >>105691181 >>105691330
Anonymous
6/24/2025, 5:48:42 PM No.105691150
Anybody used chatllm.cpp before?
It supports the kimi-vl model.
Replies: >>105691439
Anonymous
6/24/2025, 5:52:39 PM No.105691181
>>105691131
the pareto frontier... its moved!
Anonymous
6/24/2025, 6:01:18 PM No.105691275
>>105690177
Native in and out omnimodal that can make violent and sexual imagery.
Replies: >>105691354
Anonymous
6/24/2025, 6:06:34 PM No.105691330
1737239588386432
1737239588386432
md5: 0e644007d846f4ac7712f6f1f8b8fcc8🔍
>>105691131
here you go sir!
Replies: >>105691344 >>105691351 >>105691419 >>105695682
Anonymous
6/24/2025, 6:08:09 PM No.105691344
>>105691330
lmaooooo
Anonymous
6/24/2025, 6:08:45 PM No.105691351
>>105691330
Fuck, I actually laughed.
Anonymous
6/24/2025, 6:09:00 PM No.105691354
>>105691275
I **cannot** and **will not** generate violent and sexual imagery, or any content that is sexually suggestive, or exploits, abuses or endangers children. If you are struggling with harmful thoughts or urges, or are concerned about the creation or consumption of pornography, please reach out for help.
Anonymous
6/24/2025, 6:16:20 PM No.105691419
>>105691330
how do gain install for ollama? pls tell afap
Anonymous
6/24/2025, 6:17:10 PM No.105691427
>>105690949
Hence why they sterilize zirselves.
Anonymous
6/24/2025, 6:18:11 PM No.105691439
>>105691150
Considering how often llama.cpp subtly fucks model outputs, I wouldn't trust such a small project to not give degraded outputs in some way.
Replies: >>105691580
Anonymous
6/24/2025, 6:20:45 PM No.105691463
file
file
md5: 056eb322be793700bb7c77622676e6d2🔍
I don't know what to make of this.
Replies: >>105691481 >>105692615
Anonymous
6/24/2025, 6:22:45 PM No.105691481
>>105691463
What do you mean?
The fact that quantization fucks around with logits in non-obvious or inconsistent ways?
Anonymous
6/24/2025, 6:31:41 PM No.105691554
>>105689385 (OP)
My boyfriend told me this place is based but I see it's full of racist and especially transphobic chuds? Y'all need to do a lot better.
Replies: >>105691574 >>105691594 >>105691614 >>105691711 >>105692069
Anonymous
6/24/2025, 6:34:07 PM No.105691574
>>105691554
It never gets better. Just more and more troon redditors.
Anonymous
6/24/2025, 6:34:56 PM No.105691580
>>105691439
Are there any cases of llama.cpp fucking the outputs of an unquanted model when compared to the logits out of the reference implementation?
Replies: >>105691643
Anonymous
6/24/2025, 6:36:34 PM No.105691594
>>105691554
So true sister slayyyy
Anonymous
6/24/2025, 6:38:33 PM No.105691614
>>105691554
(you)
Anonymous
6/24/2025, 6:38:34 PM No.105691615
https://huggingface.co/qiuqiu666/activity/community
Anonymous
6/24/2025, 6:41:04 PM No.105691639
1723196040717075_thumb.jpg
1723196040717075_thumb.jpg
md5: fd0923a933046ab2384b10bd2ea521df🔍
News for the handful of other people frequenting /lmg/ who are doing robotics stuff:

Google is releasing an on-device version of their gemini robotics VLA that they've had behind closed doors for a while.

https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/

It's not really clear exactly how open this whole thing is. To get model access you have to submit a form to request to be part of their "trusted tester program". Not sure if it's going to be a heavily gatekept and vetted thing, or if it'll be like the old llama access requests where it was just a formality and everyone got blanket rubber stamp approved.
Replies: >>105691672 >>105691677 >>105691721 >>105692142 >>105692217
Anonymous
6/24/2025, 6:41:16 PM No.105691643
>>105691580
Nearly every model release with a new architecture?
Replies: >>105691719
Anonymous
6/24/2025, 6:44:00 PM No.105691671
https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/
>A federal judge in San Francisco ruled late on Monday that Anthropic's use of books without permission to train its artificial intelligence system was legal under U.S. copyright law.
>Alsup also said, however, that Anthropic's copying and storage of more than 7 million pirated books in a "central library" infringed the authors' copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement.
>U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work.
Replies: >>105691690
Anonymous
6/24/2025, 6:44:10 PM No.105691672
>>105691639
That's some cool stuff.
Robot gfs jerking you off when
Replies: >>105691734
Anonymous
6/24/2025, 6:44:24 PM No.105691677
>>105691639
Dexterity: Perform expert ministrations on the manhood
Replies: >>105691715
Anonymous
6/24/2025, 6:45:58 PM No.105691690
>>105691671
>training is okay
>storage is not
What's the reasoning?
Replies: >>105691760 >>105691810 >>105691865
Anonymous
6/24/2025, 6:48:02 PM No.105691711
>>105691554
>t. >>105689440
Anonymous
6/24/2025, 6:48:48 PM No.105691715
1741091646400316_thumb.jpg
1741091646400316_thumb.jpg
md5: 6826836ead7210b35ea7782f624381c9🔍
>>105691677
Replies: >>105691734 >>105691958 >>105692217
Anonymous
6/24/2025, 6:49:06 PM No.105691719
>>105691643
I might have lost the plot.
Explain to me what that has to do with my query, please.
Replies: >>105691728
Anonymous
6/24/2025, 6:49:22 PM No.105691721
file
file
md5: 46cff0da52aac4c691d7653acdab5d75🔍
>>105691639
sir
Replies: >>105691748
Anonymous
6/24/2025, 6:50:24 PM No.105691728
>>105691719
>Nearly every model release with a new architecture
>cases of llama.cpp fucking the outputs of an unquanted model when compared to the loggits out of the reference implementation?
Anonymous
6/24/2025, 6:51:00 PM No.105691734
1745843817588107_thumb.jpg
1745843817588107_thumb.jpg
md5: ee7898f3688b930b22acb6cdd8a1ceab🔍
>>105691672
Probably not for a while; current models don't have anywhere near the speed or precision to do that sort of act to any acceptable standard.

I did a teleop version a while ago as a proof of concept. Can't say I would recommend it if you value your safety.

>>105691715
they put those warning labels on android girls for a reason you know
Anonymous
6/24/2025, 6:52:12 PM No.105691748
file
file
md5: aee472ee4245d9a8dd46dd306e91ac9b🔍
>>105691721
English is a shit language with an even shittier spelling system, more at eleven.
Anonymous
6/24/2025, 6:53:40 PM No.105691760
>>105691690
please cool it with the antisemitic remarks
Anonymous
6/24/2025, 6:55:42 PM No.105691774
>>105689385 (OP)
>>105690146
I'm a straight man and find these adorable.
Where can I get the Miku and Teto plushies from the images/video in the two threads?
Replies: >>105691782 >>105691816
Anonymous
6/24/2025, 6:57:09 PM No.105691782
>>105691774
>I'm a straight man
Ah yes, the casual statement a straight man makes
Replies: >>105691797
Anonymous
6/24/2025, 6:58:27 PM No.105691797
>>105691782
Well you assume everyone who likes those plushies are a tranny, so what can I possibly say to explain that I'm not a dick cutter?
At least I have my foreskin.
Replies: >>105691807 >>105691819
Anonymous
6/24/2025, 7:00:23 PM No.105691807
>>105691797
Do you think that's why he always talks about dick cutting? Is he angry because jews mutilated his dick?
Anonymous
6/24/2025, 7:00:49 PM No.105691810
>>105691690
most judges don't actually understand how the technology they're passing rulings on works; the best they can do is equate it to the physical things they grew up with

in their mind:
training = going to the library and writing a book report
storage = stealing the books and keeping them at home
Replies: >>105691818 >>105691825
Anonymous
6/24/2025, 7:02:00 PM No.105691816
>>105691774
Cheap from China ebay. that Teto though has gone up to an insane price. got mine when it was around $50 shipped
Replies: >>105691839
Anonymous
6/24/2025, 7:02:05 PM No.105691818
>>105691810
>in their mind
reminder that thinking is woke and censorship
Anonymous
6/24/2025, 7:02:09 PM No.105691819
>>105691797
>more antisemitic chuds in the thread
Replies: >>105691829
Anonymous
6/24/2025, 7:02:49 PM No.105691825
>>105691810
Looks like the argument is simply that they should've bought the books that are still available for purchase.
Anonymous
6/24/2025, 7:03:27 PM No.105691829
>>105691819
>mention foreskin
>instantly someone calls it antisemetic
LMAO
Anonymous
6/24/2025, 7:05:10 PM No.105691839
>>105691816
Which china ebay? There's like 3 or 4 of them now. I remember checking one out once but it was all blocked from viewing without an account, so I never made one.
Replies: >>105691916
Anonymous
6/24/2025, 7:08:45 PM No.105691865
>>105691690
Training is OK: There's nothing wrong with reading a book and learning from it, even if you're a sequence of numbers.
Storage is not: The books being read must be acquired reasonably.

So the correct way to train your AI is to make a robot that can go into a library, take books off of shelves, and OCR the pages like Johnny 5 and use that data stream to update the model. And if you buy books you can have Johnny 5 read them as much as you like. But somewhere along the way, the Ferengi must get or have gotten their latinum.
Replies: >>105692000
Anonymous
6/24/2025, 7:13:58 PM No.105691916
>>105691839
>Which china ebay?
Meant regular eBay, shipped from China.
Anonymous
6/24/2025, 7:15:37 PM No.105691934
buying 7 million books at very generous 100 dollar average = 700 mil
scale ai (they wont even own them lmoa) = 14 billion
very beautiful smarts zuck
Anonymous
6/24/2025, 7:17:34 PM No.105691958
>>105691715
>Didn't even use the /lmg/ effortpost version...
Faggot.
Anonymous
6/24/2025, 7:23:31 PM No.105692000
>>105691865
>a robot that can go into a library, take books off of shelves, and OCR the pages like Johnny 5
Rule of cool as seen in law, case 1:
Anonymous
6/24/2025, 7:28:42 PM No.105692033
>>105688247
update, got the ubergarm quants.
Bad news: it doesn't work on my GPU, i'm getting a CUDA error. I did not have such issue on unsloth quant with either backend.
Good news: even with pp on CPU, ubergarm+ik_llama.cpp is faster at 0 context than unsloth+llama.cpp!
| model | size | params | backend | ngl | fa | mla | amb | mmap | rtr | fmoe | test | t/s |
| ----------------------------------- | ---------: | ---------: | ---------- | --: | -: | --: | ----: | ---: | --: | ---: | ------------: | ---------------: |
| deepseek2 671B IQ2_K_R4 - 2.375 bpw | 219.02 GiB | 672.05 B | CUDA | 0 | 1 | 3 | 512 | 0 | 1 | 1 | pp512 | 9.03 ± 0.73 |
| deepseek2 671B IQ2_K_R4 - 2.375 bpw | 219.02 GiB | 672.05 B | CUDA | 0 | 1 | 3 | 512 | 0 | 1 | 1 | tg128 | 2.53 ± 0.02 |

Next is testing at different context depths.
>>105688269
apparently llama-bench in ik_llama.cpp doesn't have --n-depth implemented; they have some other tool, llama-sweep-bench, but i don't know if you can use it to run just a couple of tests (pp512 at 1k, 2k, 4k, 8k depth) instead of its continuous sweeping. Maybe i could just port the n-depth bit to ik_llama.
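For reference, on mainline the tests i want look roughly like this (assuming your llama-bench build has the -d flag; the model path is just an example):

./llama.cpp/build/bin/llama-bench \
-m DeepSeek-R1-0528-UD-Q2_K_XL-00001-of-00006.gguf \
-p 512 -n 128 -r 1 \
-d 1024,2048,4096,8192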
Replies: >>105692048 >>105692197 >>105694000
Anonymous
6/24/2025, 7:31:30 PM No.105692045
https://github.com/ggml-org/llama.cpp/pull/14363
llama : add high-throughput mode #14363

Some nice perf gains
Replies: >>105692065
Anonymous
6/24/2025, 7:31:56 PM No.105692048
file
file
md5: e7ee3e416ccaaac99bc932ad2592512a🔍
>>105692033
>ik_llama.cpp doesn't have --n-depth implemented
Somebody just copy the useful stuff from niggerakow's fork and merge it into llama.cpp so we can be done with it.

Funny captcha
Replies: >>105692077 >>105692086
Anonymous
6/24/2025, 7:33:52 PM No.105692065
>>105692045
Useless for single user stuff.
Replies: >>105694013
Anonymous
6/24/2025, 7:34:05 PM No.105692069
>>105691554
Kys
Anonymous
6/24/2025, 7:35:09 PM No.105692077
>>105692048
Can't. niggerakow would shit and piss himself over attribution or whatever.
Anonymous
6/24/2025, 7:35:44 PM No.105692086
>>105692048
>VRAM0
the horror
Anonymous
6/24/2025, 7:44:16 PM No.105692142
>>105691639
It's pretty cute when it misses picking up the pear and then tries again, reminds me of something an animal would do
Anonymous
6/24/2025, 7:48:02 PM No.105692176
Does anybody else have the impression during RP that Mistral Small 3.2 used data distilled either from Gemini (indirectly) or Gemma 3 (more directly)? Sometimes I feel like I have to check if I'm actually using MS3.2 instead of Gemma-3-27B.
Replies: >>105692196
Anonymous
6/24/2025, 7:49:38 PM No.105692196
>>105692176
Mistral does not use synthetic data, at all.
Replies: >>105692215 >>105692226
Anonymous
6/24/2025, 7:49:54 PM No.105692197
>>105692033
Weird that I'm trying this, and ik_llama is about 1/3 slower for me than mainline llamacpp on r1.
Replies: >>105694062 >>105694719
Anonymous
6/24/2025, 7:52:38 PM No.105692215
>>105692196
lol
Anonymous
6/24/2025, 7:52:42 PM No.105692217
>>105691639
>Can you
It should respond by printing "yes," not by committing actions.
This will lead to surprisingly benign comments causing catastrophes.

>Robot, can you move your arm?
>>105691715 Can I? LOOK AT ME GO, MEATBAGS! I'm being entertaining! Beep booppidy doo dah diddly!
Anonymous
6/24/2025, 7:54:09 PM No.105692226
>>105692196
That's the base model, there's no way the instruct version doesn't use synthetic data.
Anonymous
6/24/2025, 8:04:50 PM No.105692315
i will never stop feeling safe in this thread
Replies: >>105692400
Anonymous
6/24/2025, 8:17:16 PM No.105692400
>>105692315
im going to rape you through the internet
Anonymous
6/24/2025, 8:23:22 PM No.105692454
why does dipsy love to newline so much in text completion? I don't have any instructions beside tags placed at the top of the prompt
Anonymous
6/24/2025, 8:41:01 PM No.105692615
>>105691463
i think it means that those quants are working quite well even at that level of quantization, and that it's quite uncensored
Anonymous
6/24/2025, 9:14:35 PM No.105692893
>>105690336
python users are one of the cancers of the earth
Replies: >>105692932
Anonymous
6/24/2025, 9:15:17 PM No.105692899
Undibros...
Replies: >>105692922
Anonymous
6/24/2025, 9:18:01 PM No.105692922
>>105692899
what, did something happen to the undster?
Anonymous
6/24/2025, 9:18:53 PM No.105692932
>>105692893
>python
Better than being me, being a Java oldfag.
I dunno. Should I take the C# or C++ or Zig pill?
Replies: >>105692953
Anonymous
6/24/2025, 9:19:37 PM No.105692936
>>105689385 (OP)
i think its finally time to end /lmg/
nothing ever happens.
always nothingburgers.
too much money to buy cards.
just why bother? there's no future in this junk.
Replies: >>105692944 >>105692962
Anonymous
6/24/2025, 9:20:22 PM No.105692944
>>105692936
stop being a poorfag then
Anonymous
6/24/2025, 9:20:56 PM No.105692953
>>105692932
C/C++ will be the most useful to learn in general.
You'll learn things that you can apply everywhere.
From there you could go for Zig.
Replies: >>105693027
Anonymous
6/24/2025, 9:22:12 PM No.105692962
1738318067723929
1738318067723929
md5: f5c4696115286a13ec1a321ab1f4cd4a🔍
>>105692936
There are plenty of things to do. Local is in dire need of agentic frameworks
Replies: >>105692985 >>105693849
Anonymous
6/24/2025, 9:24:10 PM No.105692985
>>105692962
>Local is in dire need of agentic frameworks
What exactly are you hoping for? Agentic frameworks should work regardless of where the model is hosted.
Replies: >>105693006
Anonymous
6/24/2025, 9:26:03 PM No.105693006
>>105692985
Most of them rely on powerful cloud models to do the heavy lifting, which isn't an option locally
Replies: >>105693025
Anonymous
6/24/2025, 9:28:03 PM No.105693025
>>105693006
So you just want frameworks that assume the model being used is 8b with 4k context and don't ask anything too complex?
Replies: >>105693060
Anonymous
6/24/2025, 9:28:08 PM No.105693027
>>105692953
I did use C++ for a while a long ass time ago, before they started adding numbers ("c++0x? heh heh heh it's leet for cocks") and whatever crazy shit came with the numbers. But from the sound of it they just added more foot guns and not things that prevent cyber exploitation on every typo till there were so many that it caused Rust to happen. Makes me reluctant to put my toe back into the water.

And every time I think I'll try learning a new language every one of them seem like they deliberately have made a list of all the things that would be good, tore it in half at random, became excellent at one half, and completely shit all over the other half.
Anonymous
6/24/2025, 9:30:09 PM No.105693041
In that case, go for Zig for sure.
Anonymous
6/24/2025, 9:31:52 PM No.105693060
>>105693025
Retard, to solve that you just need very small specialized LLMs with structured outputs around a local model of ~30B. The complex task can then be divided into smaller tasks without having to one-shot it with Claude or Gemini
Anonymous
6/24/2025, 9:47:20 PM No.105693189
>he doesn't enable an extra expert on his moe for spice
ngmi
Replies: >>105693216
Anonymous
6/24/2025, 9:50:45 PM No.105693216
>>105693189
MoE models nowadays have a shared expert that's always evaluated, right?
Has anybody tried freezing the model and only fine tuning that expert (and whatever dense part the model might have sans the router I guess) for "creativity"?
I wonder what that would do.
Replies: >>105693250 >>105693267
Anonymous
6/24/2025, 9:54:22 PM No.105693250
>>105693216
>MoE models nowadays have a shared expert that's always evaluated, right?
DS and Llama do, Qwen don't.
Anonymous
6/24/2025, 9:55:43 PM No.105693267
>>105693216
The latest Qwen 3 MoE models don't use shared experts. However, they were trained in a way that promotes "expert specialization" (in a non-interpretable way, I suppose).
Anonymous
6/24/2025, 10:26:13 PM No.105693514
LocalLLaMA back in business
https://old.reddit.com/r/LocalLLaMA/comments/1ljlr5b/subreddit_back_in_business/
Replies: >>105693581 >>105693662 >>105693715 >>105694075 >>105697179
Anonymous
6/24/2025, 10:34:37 PM No.105693581
>>105693514
>I'm also a moderator of Lifeprotips, doesn't mean I share life advice in Chatgpt sub but the policy is simple if not open source= remove
cloudbros...
Replies: >>105693662
Anonymous
6/24/2025, 10:44:05 PM No.105693662
>>105693581
>>105693514
They're making reddit great again?
Anonymous
6/24/2025, 10:46:55 PM No.105693688
another thread discussing reddit on /lmg/
it really is over isnt it
Anonymous
6/24/2025, 10:50:15 PM No.105693709
When am I supposed to use other values than defaults for these?

--batch-size N logical maximum batch size (default: 2048)
--ubatch-size N physical maximum batch size (default: 512)
--threads-batch N number of threads to use during batch and prompt processing (default: same as --threads)


Now, I get around 12 tkn/s for pp
Replies: >>105693753 >>105693791
Anonymous
6/24/2025, 10:50:27 PM No.105693715
file
file
md5: b434bb9bcb3793569ff8a8e7c912d2f7🔍
>>105693514
of course
Anonymous
6/24/2025, 10:51:53 PM No.105693725
1741599179559
1741599179559
md5: c2f34dfd4fe19d4234dd90e46aa6f427🔍
What would be the best model for generating specific sequences of JSON? I'd like it to output logs according to scenarios I explain to it like "user authenticates, user does XYZ, user session end" and have it create the appropriate artifacts. Should I start with Qwen coder and create a LoRA with the data I have? 4090 btw
Replies: >>105693741
Anonymous
6/24/2025, 10:53:53 PM No.105693741
>>105693725
Qwen should be good at it.
Read
>https://github.com/ggml-org/llama.cpp/blob/master/grammars/json.gbnf
that might be useful.
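A rough sketch of wiring that up (model file and prompt are placeholders):

./llama.cpp/build/bin/llama-cli \
--model Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf \
--grammar-file llama.cpp/grammars/json.gbnf \
-p "Generate JSON log events for: user authenticates, user does XYZ, user session ends"

The grammar constrains output to valid JSON, so you may not even need a LoRA if a few example logs in the prompt pin down the fields.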
Replies: >>105693904
Anonymous
6/24/2025, 10:55:13 PM No.105693753
>>105693709
Generally speaking, leaving them at the defaults is already close to optimal for normal models, but there have been times where I saw improvements; it depends on the model. Probably also on your system. You can only know by doing some benchmarking.
Replies: >>105693941
Anonymous
6/24/2025, 10:58:13 PM No.105693780
Screenshot 2025-06-24 225133
Screenshot 2025-06-24 225133
md5: 70f3000cefae126e7c93db2d03635303🔍
ahh i am doing

./llama.cpp/build/bin/llama-cli \
--rpc "$RPC_SERVERS" \
--model models/unsloth/DeepSeek-R1-0528-GGUF/UD-Q2_K_XL/DeepSeek-R1-0528-UD-Q2_K_XL-00001-of-00006.gguf \
--cache-type-k q4_0 \
--threads -1 \
--n-gpu-layers 99 \
--prio 3 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
-ot ".ffn_.*_exps.=CPU" \
-no-cnv \
--prompt "<|User|> blabal <|Assistant|>

and i get 1t/s on 12 A5000, that bad or?
Replies: >>105693806 >>105693814 >>105693828 >>105693920 >>105693933 >>105693968
Anonymous
6/24/2025, 10:59:23 PM No.105693791
>>105693709
The best value depends on the combination of backend and hardware, IIRC.
In 99% of cases (a newish NVIDIA GPU) the default is fine.
You can increase it to speed up pp if you have spare VRAM.
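e.g. something along these lines if you have the VRAM to spare (the numbers are just a starting point to benchmark against the defaults):

./llama.cpp/build/bin/llama-server \
--model your-model.gguf \
--batch-size 4096 \
--ubatch-size 1024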
Replies: >>105693941
Anonymous
6/24/2025, 11:02:09 PM No.105693806
>>105693780
I would have thought you were SSDMAXXing
Anonymous
6/24/2025, 11:02:52 PM No.105693814
>>105693780
That's worse than running on a CPU; something is definitely wrong. If those A5000s are split across many machines through RPC, then it's going to be slow because the protocol is not very optimized.
Anonymous
6/24/2025, 11:05:02 PM No.105693828
>>105693780
>on 12 A5000
You are running most of the model on your CPU, which is bottlenecked by RAM bandwidth.
Adjust your -ot to make use of those GPUs anon.
Replies: >>105693870
Anonymous
6/24/2025, 11:06:38 PM No.105693849
>>105692962
>Local is in dire need of agentic frameworks
There are a gorillion "agentic frameworks" out there that work with local. Far too many in fact, and most of them are just trying to race to become a de facto standard while being absolute shit.
Anonymous
6/24/2025, 11:09:06 PM No.105693870
Screenshot 2025-06-24 230847
Screenshot 2025-06-24 230847
md5: d4f0c5950aa6c842465e9164e81029d0🔍
>>105693828
i thought -ot makes it all faster?
Replies: >>105693890 >>105693900
Anonymous
6/24/2025, 11:10:18 PM No.105693890
file
file
md5: e91e6ef1bd1e3b3ee9d69a3b94750d0f🔍
>>105693870
you have 12 not 1
Replies: >>105693987 >>105694096
Anonymous
6/24/2025, 11:11:33 PM No.105693900
>>105693870
-ot moves the expert tensors to run on CPU, aka live in RAM.
If you don't have enough VRAM to fit them, then yeah, it'll make things faster.
In your case, you want to only move the ones that don't fit in VRAM since you have so much of it and can probably fit most of them in there.
Replies: >>105693919
Anonymous
6/24/2025, 11:11:48 PM No.105693904
>>105693741
Cool. Thanks man
Replies: >>105693919
Anonymous
6/24/2025, 11:13:26 PM No.105693919
>>105693900
>-ot ".ffn_.*_exps.=CPU"
Or more specifically, -ot ffn etc etc does.
You'll have to craft a -ot mask that moves the tensors you want to where you want them.
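As a sketch (block ranges and devices are illustrative, match them to your layer count and free VRAM; the broad CPU pattern goes last so the more specific GPU patterns win):

-ot "blk\.([0-9]|1[0-9])\.ffn_.*_exps\.=CUDA0" \
-ot "blk\.(2[0-9]|3[0-9])\.ffn_.*_exps\.=CUDA1" \
-ot "ffn_.*_exps\.=CPU"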

>>105693904
BNF is awesome.
Replies: >>105693987
Anonymous
6/24/2025, 11:13:41 PM No.105693920
>>105693780
>--threads -1

You should avoid doing this. Limit it to the exact number of your PHYSICAL (not hyper-threaded logical) cores
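On Linux, something like

lscpu | grep -E 'Socket|Core'

shows the socket and cores-per-socket counts; sockets × cores per socket is the number to pass to --threads.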

I get 4 tkn/s on RTX 3090 with exactly this quant. I hope you are using the original llama.cpp, not ik_llama fork
Replies: >>105693987
Anonymous
6/24/2025, 11:14:45 PM No.105693933
>>105693780
>-ot ".ffn_.*_exps.=CPU"

This part is fine
Anonymous
6/24/2025, 11:14:48 PM No.105693934
I wonder how much the slopification also leads to model retardation. Since they follow patterns established by themselves, does it see the shit it's outputting (overuse of italics, overuse of expressions like "It's not - it's", etc) and decide that since it's obviously writing shit anyway, why put any effort into completing its task?

Just annoyed because I asked Gemma 3 to perform literary analysis, and it puked out some shitty tropes instead of paying attention to the actual text.
Replies: >>105699292
Anonymous
6/24/2025, 11:15:50 PM No.105693941
>>105693753
>>105693791

Thank you all
Anonymous
6/24/2025, 11:18:17 PM No.105693968
>>105693780

# Run the command
CUDA_VISIBLE_DEVICES="0," \
numactl --physcpubind=0-7 --membind=0 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-cli" \
--model "$model" \
--threads 8 \
--ctx-size 100000 \
--cache-type-k q4_0 \
--flash-attn \
$model_parameters \
--n-gpu-layers 99 \
--no-warmup \
--color \
--override-tensor ".ffn_.*_exps.=CPU" \
$log_option \
--single-turn \
--file "$tmp_file"


llama_perf_sampler_print: sampling time = 275.04 ms / 29130 runs ( 0.01 ms per token, 105910.71 tokens per second)
llama_perf_context_print: load time = 1871167.51 ms
llama_perf_context_print: prompt eval time = 1661405.80 ms / 26486 tokens ( 62.73 ms per token, 15.94 tokens per second)
llama_perf_context_print: eval time = 756450.27 ms / 2643 runs ( 286.21 ms per token, 3.49 tokens per second)
llama_perf_context_print: total time = 2629007.70 ms / 29129 tokens
Replies: >>105693987 >>105694144 >>105694434 >>105698422
Anonymous
6/24/2025, 11:20:34 PM No.105693987
Screenshot 2025-06-24 231318
Screenshot 2025-06-24 231318
md5: 05124766a9a93b340a49171b35918bbb🔍
>>105693890
ok well i am retarded. fully removing -ot seems to make it too big for 288gb vram

>>105693920
thanks, i put in my 48 cores

>>105693919
any way i can see which -ot offload is better/worse besides testing?

>>105693968
whats that numa stuff?
i can only get the other cpus in a private network connected with a switch, so thats why i use the rpc server.
Replies: >>105694020 >>105694032 >>105694045
Anonymous
6/24/2025, 11:22:18 PM No.105694000
deepseek-iq2
deepseek-iq2
md5: 91cb375e48f57e109721df899e2e9c07🔍
>>105692033
probably final update: performance comparison at different context depths. Only ran with `--repetitions 1`, as it already takes a long time as it is.
unsloth+llama.cpp pp512 uses GPU (1060 6GB), ubergarm+ik_llama.cpp pp512 uses CPU only. Both tg128 are CPU only.
At 8k context you can see a big difference, 3x pp and 2x tg with ik_llama.
Interesting point: running `llama-server` with the same flags as `llama-bench` doesn't throw CUDA error and pp on GPU works just fine...
Anyways, this is the kind of performance that you can expect for 400€ total worth of hardware, not great, but not terrible either considering the cost.
bonus: quick and dirty patch adding `-d, --n-depth` support to ik_llama, to compare results with llama.cpp: https://files.catbox.moe/e64yat.patch
Replies: >>105694047 >>105694101
Anonymous
6/24/2025, 11:23:32 PM No.105694013
>>105692065
Isn't it useful when you want to make captions? llama.cpp is the easiest way to run a vision model.
Replies: >>105694048
Anonymous
6/24/2025, 11:24:03 PM No.105694020
>>105693987
>any way i can see which -ot offload is better/worse besides testing?
As far as I know, not really.
I think the devs are working on a way to automatically set that, but that's not yet ready.
Replies: >>105694032
Anonymous
6/24/2025, 11:25:21 PM No.105694032
>>105693987
>>105694020
But it's basically a question of looking at each tensor size in the terminal and using -ot to only move as few as you must to RAM.
Anonymous
6/24/2025, 11:27:07 PM No.105694045
>>105693987
>whats that numa stuff?

You do not need to bother if you have a single CPU.

I have two on an HP Z840, and thus have to take care of where the model will be placed (it must be close to the CPU it will be run on, obviously).

numactl allows you to define which cores to use. Interestingly, the neighboring CPU, if used, only slowed everything down.

The process is VERRRRY memory-intensive, and avoiding bottlenecks and collisions is a MUST
Anonymous
6/24/2025, 11:27:53 PM No.105694047
>>105694000
>Interesting point: running `llama-server` with the same flags as `llama-bench` doesn't throw CUDA error and pp on GPU works just fine...
nevermind, it shat itself on second assistant response with the same CUDA error, have to use `CUDA_VISIBLE_DEVICES=none`.
Anonymous
6/24/2025, 11:28:00 PM No.105694048
>>105694013
vLLM is easier and supports more/better models, assuming you have new enough hardware to be supported by their quant types.
Replies: >>105694098
Anonymous
6/24/2025, 11:29:40 PM No.105694062
>>105692197
>ik_llama is about 1/3 slower for me than mainline llamacpp on r1

Same here. ik_llama sucks big time. No magic.
Replies: >>105694719
Anonymous
6/24/2025, 11:31:21 PM No.105694075
...
...
md5: 8a005d6d13449f4060cb195adba60d6e🔍
>>105693514
see you sisters on the discord
Replies: >>105694160
Anonymous
6/24/2025, 11:33:43 PM No.105694096
Screenshot 2025-06-24 233111
Screenshot 2025-06-24 233111
md5: 7bc0b7b786f3e1893c68404746544be4🔍
>>105693890
how do i find out the tensor size? i can only find this.

i mean i think the vram should be enough to offload nothing? even on HF they say 251 gb vram for this model. are there any other stuff i can check before i play with offloading tensors?
Replies: >>105694121 >>105694144 >>105694168 >>105694200
Anonymous
6/24/2025, 11:34:05 PM No.105694098
>>105694048
vLLM and sglang are full of bugs, it's impossible to run InternVL3 on them.
Anonymous
6/24/2025, 11:34:11 PM No.105694101
>>105694000
>400€ total worth of hardware
Impressive for that much.
>`-d, --n-depth` support to ik_llama
I thought that's what -gp (not -pg) is for. e.g.: -gp 4096,128 tests "tg128@pp4096"
Replies: >>105694179
Anonymous
6/24/2025, 11:36:28 PM No.105694121
file
file
md5: dbeeb168e0e131ff355b43c96390fcce🔍
>>105694096
go to repo where you got the gguf and click this
it's a file explorer of sorts, will show you what's inside
Replies: >>105694168
Anonymous
6/24/2025, 11:38:26 PM No.105694144
>>105694096
>>105693968

nta

use CUDA_VISIBLE_DEVICES="0," to use a single GPU out of your harem, then the suggested --override-tensor ".ffn_.*_exps.=CPU" will work too

At low context sizes, and with --no-kv-offload, you will use less than 12gb vram
Replies: >>105694265
Anonymous
6/24/2025, 11:39:50 PM No.105694160
>>105694075
A LocalLLaMA Discord previously existed but eventually disappeared. That might have been some time after TheBloke also vanished, so quite some time ago.
Anonymous
6/24/2025, 11:40:18 PM No.105694168
>>105694096
>>105694121
>how do i find out the tensor size?
llama-gguf <file> r
if you don't wish to rely on third party services
Replies: >>105694265
Anonymous
6/24/2025, 11:41:09 PM No.105694179
>>105694101
>I though that's what -gp (not -pg) is for
very possible, but it was still missing pp tests like "pp512@d4096" afaik.
Anonymous
6/24/2025, 11:41:38 PM No.105694188
Another thought about ik_llama (latest commit)

mlock fails to pin the model in RAM, which results in a long start time
Replies: >>105694253
Anonymous
6/24/2025, 11:42:56 PM No.105694200
>>105694096
Launch with --verbose and use -ot to do whatever and it'll output the name and size of all tensors.
Replies: >>105694265
Anonymous
6/24/2025, 11:48:48 PM No.105694253
>>105694188
mlock never ever worked for me in either backend (loading models from an NFS mount); maybe it's a bug or unimplemented feature in the Linux kernel. i always run with `--no-mmap` to guarantee it doesn't swap out.
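one thing i still need to rule out is the locked-memory limit; on many distros RLIMIT_MEMLOCK is tiny by default, so pinning a multi-hundred-GB model just fails:

ulimit -l    # max locked memory per process in kB, often 64 or 8192 by default

raising it means editing /etc/security/limits.conf (or just keeping `--no-mmap`).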
Replies: >>105694272 >>105694434
Anonymous
6/24/2025, 11:50:18 PM No.105694265
Screenshot 2025-06-24 234620
Screenshot 2025-06-24 234620
md5: 89fc9cb6d3215cf207ada04f42aca517🔍
>>105694200
>>105694168
>>105694144
thanks all, ill try with just the ffn_up and check performance.
Anonymous
6/24/2025, 11:51:27 PM No.105694272
>>105694253
doesnt work for me in windows either unless you have a few more gb free on top of the actual full size of everything
Anonymous
6/25/2025, 12:08:24 AM No.105694431
Screenshot 2025-06-25 000704
Screenshot 2025-06-25 000704
md5: 5b93984eb3b8f4fa0db076f9fa64a8ef🔍
ahoy we have a liftoff. just with

./llama.cpp/build/bin/llama-cli \
--rpc "$RPC_SERVERS" \
--model models/unsloth/DeepSeek-R1-0528-GGUF/UD-Q2_K_XL/DeepSeek-R1-0528-UD-Q2_K_XL-00001-of-00006.gguf \
--cache-type-k q4_0 \
--threads 48 \
--n-gpu-layers 99 \
--prio 3 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
-ot ".ffn_(up)_exps.=CPU" \
-no-cnv

but i still have about 4gb free per gpu, i can probably only offload the last 20 or so layers.
ill report back
Replies: >>105694487 >>105694501
Anonymous
6/25/2025, 12:09:14 AM No.105694434
>>105694253
I run the original llama.cpp >>105693968
and I do not have to set anything. It caches the model by itself which gives 15-second restarts
Anonymous
6/25/2025, 12:15:01 AM No.105694487
>>105694431
>--threads 48
It seems as if this fixed the problem

>--prio 3
I saw no change with or without

your prompt_eval is lower than mine (16t/s), and the genning speed is just the same.

Keep optimizing and please report the results
Replies: >>105694515 >>105694544
Anonymous
6/25/2025, 12:16:40 AM No.105694501
>>105694431
It cannot be that you use a bunch of GPUs for prompt evaluation and it is still so low

Something is botched
Replies: >>105694515 >>105694533 >>105694544
Anonymous
6/25/2025, 12:18:32 AM No.105694515
>>105694501
maybe the rpc is really fucked. but no clue how to benchmark that.

>>105694487
"\.(2[5-9]|[3-9][0-9]|[0-9][0-9][0-9])\.ffn_up_exps.=CPU"

trying to only offload up after gate 25. hopefully the regex works.

ill report results.


ahhh maybe flash-attention is missing?
Replies: >>105694568
Anonymous
6/25/2025, 12:21:08 AM No.105694533
>>105694501
Does prompt evaluation get any faster from having more GPUs in series?
I understand that generation doesn't.
Replies: >>105694621
Anonymous
6/25/2025, 12:21:53 AM No.105694544
Screenshot 2025-06-25 002022
Screenshot 2025-06-25 002022
md5: dacf480901db610a0945efa3c083b154🔍
>>105694501
>>105694487
its still offloading a lot to the cpu it seems. but now less than before.
Anonymous
6/25/2025, 12:24:24 AM No.105694568
>>105694515
>flash-attention

This will reduce VRAM usage and keep the genning speed stable
Replies: >>105694828
Anonymous
6/25/2025, 12:29:17 AM No.105694621
Screenshot from 2025-06-25 00-28-17
Screenshot from 2025-06-25 00-28-17
md5: 1fb8474ab92ecbfe4e797c4caf6cf4a0🔍
>>105694533
Look at what your GPUs are doing during prompt processing
Replies: >>105694636 >>105694653
Anonymous
6/25/2025, 12:30:54 AM No.105694636
>>105694621
And you will see the real pp speed with much bigger prompts like mine (20k tkn)
Anonymous
6/25/2025, 12:32:36 AM No.105694651
contextshift or swa? i may be stupid but swa just seems like it fucks up your context size, is it at least way faster or something?
Anonymous
6/25/2025, 12:32:54 AM No.105694653
>>105694621
I see only one of your 2 GPUs being used, so I'll guess the answer is no?
Replies: >>105694714
Anonymous
6/25/2025, 12:33:24 AM No.105694656
Screenshot 2025-06-25 003220
Screenshot 2025-06-25 003220
md5: 82e43590a618353e6e8457cbfadf4f3e🔍
?????????
why the fuck did it try to allocate that?
Anonymous
6/25/2025, 12:39:54 AM No.105694714
>>105694653
This picture is to show that during prompt processing the dedicated GPU will run close to 100%.

My second GPU is M2000 - small and retarded, used for the display only, so I could have the entire RTX 3090 for AI stuff

Since bigger prompts are processed in batches, I would think that it can be distributed among several GPUs
Anonymous
6/25/2025, 12:40:41 AM No.105694719
>>105694062
>>105692197
sad to hear. it's the complete opposite for me.

INFO [ print_timings] prompt eval time = 62198.27 ms / 8268 tokens ( 7.52 ms per token, 132.93 tokens per second) | tid="135607548014592" id_slot=0 id_task=111190 t_prompt_processing=62198.269 n_prompt_tokens_processed=8268 t_token=7.522770803096275 n_tokens_second=132.9297443952982
Replies: >>105694758 >>105694804
Anonymous
6/25/2025, 12:44:12 AM No.105694758
>>105694719
>prompt eval time = 62198.27 ms / 8268 tokens ( 7.52 ms per token, 132.93 tokens per second)

jeeeez...

Where is the genning speed?
Anonymous
6/25/2025, 12:48:53 AM No.105694804
>>105694719
Could you please post your complete llama-cli command including the model, and, if possible, the commit of ik_llama used

The pp speed in your case is staggering
Replies: >>105696513
Anonymous
6/25/2025, 12:51:47 AM No.105694824
What goes into prompt processing? Are all model weights involved in that?
Replies: >>105695915
Anonymous
6/25/2025, 12:52:05 AM No.105694828
Screenshot 2025-06-25 005036
Screenshot 2025-06-25 005036
md5: 8b85b6cc134a9c84e73801f3d615ffd7🔍
>>105694568
BOYS, we are getting there!

now how do i make sure the model gets more evenly distributed among my gpus? some have 8gb vram free, some only 1
Replies: >>105694834 >>105694863 >>105694890
Anonymous
6/25/2025, 12:52:56 AM No.105694834
>>105694828
An even longer -ot argument.
Replies: >>105694997
Anonymous
6/25/2025, 12:56:19 AM No.105694863
>>105694828
That's usable speed
Replies: >>105694997
Anonymous
6/25/2025, 12:59:52 AM No.105694890
>>105694828
-ot for each blk., per gpu. lots of argument lines but easiest to keep track
Replies: >>105694997
Anonymous
6/25/2025, 1:08:59 AM No.105694943
>>105689385 (OP)
I will jailbreak your anus
Replies: >>105695033
Anonymous
6/25/2025, 1:16:33 AM No.105694997
Screenshot 2025-06-25 011557
Screenshot 2025-06-25 011557
md5: a7cb11dcd73fba168491ac96390a09fa🔍
>>105694834
>>105694863
>>105694890

with
-ot "\.(3[0-9]|4[0-9]|5[0-9]|6[0-9]|7[0-9]|8[0-9]|9[0-9]|[0-9][0-9][0-9])\.ffn_up_exps.=CPU
Replies: >>105695000 >>105695037 >>105695042
Anonymous
6/25/2025, 1:17:23 AM No.105695000
>>105694997
kek
Anonymous
6/25/2025, 1:21:15 AM No.105695033
>>105694943
please be rough daddy~ *blushes*
Anonymous
6/25/2025, 1:21:59 AM No.105695037
>>105694997
Replace CPU in "exps.=CPU" with CUDA0 for your first GPU then another -ot for CUDA1 for the second etc to control which tensors go on the GPUs, then put -ot exps=CPU at the end so all the leftover tensors go to ram.
Replies: >>105695123
Anonymous
6/25/2025, 1:22:50 AM No.105695042
>>105694997
Are you using the example launch params on ubergarm's model page? Did you compile ikllama with arguments to reduce parallel to 1? Are you adding the mla param? Your speeds are really off.
Replies: >>105695123
Anonymous
6/25/2025, 1:33:37 AM No.105695123
>>105695037
huh? sorry do you mean like instead of CPU use
-ot "\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)\.ffn_up_exps.=CUDA0" \
-ot "\.(16|17|18|19|20|21|22|23|24|25|26|27|28|29|30)\.ffn_up_exps.=CUDA1" \
-ot "\.(31|32|33|34|35|36|37|38|39|40|41|42|43|44|45)\.ffn_up_exps.=CUDA2" \
-ot "ffn_up_exps.=CPU"

how do i do that with rpc resources?


>>105695042
i am using base llama.cpp, does the fork have better performance?
Replies: >>105695182 >>105695906
Anonymous
6/25/2025, 1:34:12 AM No.105695128
in general have local models been a success or failure?
Replies: >>105695159
Anonymous
6/25/2025, 1:34:26 AM No.105695129
Is it just me or is R1 and V3 absolute crap for ERP? I don't know what the fuck you guys are doing but I don't get it to roleplay nicely at all.

Gemma 3 is the only local model that actually roleplays reasonably.

I just HAVE to assume I'm retarded because how the fuck can a 27B model be consistently better than some ~700B model that is hyped everywhere.
Replies: >>105695343 >>105695356
Anonymous
6/25/2025, 1:37:52 AM No.105695159
>>105695128
a failure, it's over, etc.
Anonymous
6/25/2025, 1:41:30 AM No.105695182
>>105695123
Leave the last one as -ot exps=CPU.
>how do i do that with rpc resources?
I don't know but try exps=RPC[address]
Replies: >>105695506 >>105695631
Anonymous
6/25/2025, 2:04:11 AM No.105695343
>>105695129
skill issue with the model and another skill issue in giving people enough details to diagnose your initial skill issue
Anonymous
6/25/2025, 2:05:30 AM No.105695356
1750809385551
1750809385551
md5: a5de399d9793afc5e3f2f58644aa29c5🔍
>>105695129
>ollmao run deepsneed:8b
>UUUHHHH GUYS ITS SHIT HURR DURR HURRR HURR
bait used to be believable
Anonymous
6/25/2025, 2:29:43 AM No.105695506
>>105695182
actually how do i calculate the size the tensor will take up on the GPU?
Replies: >>105695533
Anonymous
6/25/2025, 2:30:07 AM No.105695510
How the fuck do I stop models from hitting me with the usual slop of

>Gives me a perfect line, in tone of the character
>Follows up with "But their heart and mind is more important" disclaimer

It pisses me off.

If I notice a girls huge tits, why does my bot, NO MATTER the bot or model always give me that type of response. My prompt must be fucked (basic roleplay default prompt in ST)
Replies: >>105695665
Anonymous
6/25/2025, 2:33:43 AM No.105695533
>>105695506
Offload fewer tensors to the GPUs if you're OOMing, add more if you're not. All tensors of a given type are the same size, but how big they are depends on the quantization.
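If you want a ballpark instead of trial and error: for R1 each routed-expert tensor is roughly hidden_size x moe_intermediate x n_experts = 7168 x 2048 x 256 ≈ 3.8B params per layer per tensor type, so a bit over 2 GB each at a ~4.5 bpw quant and proportionally less at IQ2-ish sizes (back-of-envelope numbers, not exact). The per-device buffer sizes llama.cpp prints at load time are the easiest sanity check.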
Anonymous
6/25/2025, 2:53:21 AM No.105695631
Screenshot 2025-06-25 024831
Screenshot 2025-06-25 024831
md5: 69c83ceb6753e8bdf078c4c18fe3e337🔍
>>105695182
i tried, didn't work. i tried balancing the blks fairly across the devices like

"\.blk\.[0-4]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=CUDA0"
"\.blk\.[5-9]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=CUDA1"
"\.blk\.1[0-4]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=CUDA2"
"\.blk\.1[5-9]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=CUDA3"
"\.blk\.2[0-4]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.28:50052]"
"\.blk\.2[5-9]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.28:50053]"
"\.blk\.3[0-4]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.28:50054]"
"\.blk\.3[5-9]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.28:50055]"
"\.blk\.4[0-4]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.40:50052]"
"\.blk\.4[5-9]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.40:50053]"
"\.blk\.5[0-4]\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\..*=RPC[10.0.0.40:50054]"

"(^output\.|^token_embd\.|\.blk\.(5[5-9]|60)\.(ffn_down_exps|ffn_gate_exps|ffn_up_exps|attn_output)\.).*=RPC[10.0.0.40:50055]"

"(\.blk\..*\.(ffn_.*shexp|attn_k_b|attn_kv_a|attn_q_|attn_v_b|.*norm)\.|.*norm\.).*=CPU"

this, then i went even simpler with exactly 3 blocks per device and the rest on cpu


-ot ".*=CPU"

which then didn't use Cuda at all????

i mean looking at this i could fit at least 30 GB more in the vram.
Replies: >>105695667 >>105695668
Anonymous
6/25/2025, 2:58:40 AM No.105695665
>>105695510
model issue
Anonymous
6/25/2025, 2:58:59 AM No.105695667
>>105695631
Stop using black magic fool
Replies: >>105695683
Anonymous
6/25/2025, 2:59:07 AM No.105695668
>>105695631
Might as well do this now:
"blk\.(0|1|2|3|4)\..*exps=CUDA0
"blk\.(5|6|7|8|9)\..*exps=CUDA1...
Also RPC can be wack
Replies: >>105695683
Anonymous
6/25/2025, 3:01:20 AM No.105695682
>>105691330
keks
Anonymous
6/25/2025, 3:01:20 AM No.105695683
file
file
md5: 2075a58a15f06bc08e1ef1abbb6c8acf🔍
>>105695668
>>105695667
this i get with

-ot "\.(3[4-9]|4[0-9]|5[0-9]|6[0-9]|7[0-9]|8[0-9]|9[0-9]|[0-9][0-9][0-9])\.ffn_up_exps.=CPU" \


it's usable. and i will freeze the -ot optimizations for now and try ik_llama.cpp instead.

with the same settings, maybe it gives me 10T/s. that would be cool.
Replies: >>105695741
Anonymous
6/25/2025, 3:11:34 AM No.105695741
>>105695683
don't split exps down/gate/up. keep them together on the same device with simply blk\.(numbers)\..*exps for the ones to send to CPU for fastest speed.
and don't touch attn_output, or any attention tensors.
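so basically a single rule of the shape (range here is just an example):
-ot "blk\.(3[0-9]|4[0-9]|5[0-9])\..*exps=CPU"
with -ngl 99 handling everything else on the GPUs.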
Anonymous
6/25/2025, 3:39:43 AM No.105695881
Screenshot 2025-06-25 033900
Screenshot 2025-06-25 033900
md5: de625ba2f1ec3f0faca1ae6d4f0dffd6🔍
BRO I AM GOONA CRIPPLE YOUR FACE. WHY DO I NEED 50 DIFFERENT FLAVOURS OF GGUF FOR EVERY SHIT
Anonymous
6/25/2025, 3:42:35 AM No.105695906
>>105695123
Machine specs:
OS: EndeavourOS x86_64
Motherboard: B450M Pro4-F
Kernel: Linux 6.15.2-arch1-1
CPU: AMD Ryzen 3 3300X (8) @ 4.35 GHz
GPU 1: NVIDIA GeForce RTX 4090 [Discrete]
GPU 2: NVIDIA GeForce RTX 3090 [Discrete]
GPU 3: NVIDIA RTX A6000
Memory: 2.89 GiB / 125.72 GiB (2%)

ik_llama.cpp build command:
cmake -B build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=1 -DLLAMA_SERVER_SSL=ON
cmake --build build --config Release -j 8

These are the scripts I use to run deepseek. I get 16 tk/s prompt processing and 5.4 tk/s gen, dropping to around 4.5 tk/s at about 10k context, but it stays constant from there. I can do proper tests if necessary, but it's good enough for RP and faster than slow reading speed. My speed before adding the rtx3090 was 15 tk/s pp and 4.4 dropping to 4 tk/s gen.

The main difference was the MLA and parallel params, which literally cut VRAM usage down to a third and let me offload more tensors. Also, don't use -ctv (quantised value cache) since I got garbage outputs with it. The MLA and threads params were what shot my speed up from 1-ish to 4+. Tried regular llama.cpp again last night and the speed is about a quarter of ikllama's.

https://pastebin.com/Yde41zyL
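(for anyone skimming: the params in question are the ik_llama flags of the form
-mla 3 -fa -fmoe -amb 512 --parallel 1
as seen in the other launch commands itt; my exact values are in the pastebin.)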
Replies: >>105696219
Anonymous
6/25/2025, 3:43:50 AM No.105695915
>>105694824
https://www.youtube.com/watch?v=wjZofJX0v4M
https://www.youtube.com/watch?v=KJtZARuO3JY
Anonymous
6/25/2025, 4:04:55 AM No.105696010
Screenshot 2025-06-24 215322
Screenshot 2025-06-24 215322
md5: c8cdb58a4d9c42919d662e2500d968e1🔍
Why is the pro so much faster even for models that fit inside the 5090s vram? Exclusive driver features?
Replies: >>105696048 >>105696237 >>105696333 >>105697419 >>105698454
Anonymous
6/25/2025, 4:14:03 AM No.105696048
>>105696010
should be faster but not that much faster. smells like testing error
Anonymous
6/25/2025, 4:23:50 AM No.105696087
I'm still using monstral for RP on 48gb vram, anything newer? cos that must be like 6 months old now
Replies: >>105696107
Anonymous
6/25/2025, 4:27:25 AM No.105696107
>>105696087
try command A (very easy to jail break) or a finetune like agatha
Anonymous
6/25/2025, 4:50:56 AM No.105696219
>>105695906
I have a similar setup and same speed on ik_llama, but about 7t/s on regular llama. Read somewhere that DGGML_SCHED_MAX_COPIES=1 tanked speeds, compiling with DGGML_SCHED_MAX_COPIES=2 brought them back.
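i.e. take the cmake line you posted and just flip that one variable, something like:
cmake -B build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_SCHED_MAX_COPIES=2 -DLLAMA_SERVER_SSL=ON
cmake --build build --config Release -j 8
(untested on your exact setup, but worth a try.)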
Replies: >>105696567
Anonymous
6/25/2025, 4:55:20 AM No.105696237
>>105696010
Maybe they fucked up the context size?
Anonymous
6/25/2025, 4:56:18 AM No.105696242
models
models
md5: 1796798e9d6f231337aab7206ee37ae4🔍
Are there models better than these ones right now that I can use? I want ERP coomer models. The nemo one is my go-to when I want fast tokens since it fits in vram, but the quality is shittier when compared to QwQ snowdrop.
Running nvshitia 3070 8gb vram & 64gb ram, with amd 3600 cpu.
trashpanda-org_QwQ-32B-Snowdrop-v0-IQ4_XS
NemoReRemix-12B-Q3_K_XL
Replies: >>105696422 >>105696433 >>105696473
Anonymous
6/25/2025, 5:15:11 AM No.105696333
>>105696010
InternLM is the only one that could fit entirely inside all 3 cards, and depending on context size it could still spill over into system RAM; in fact it's very likely that was the case.
I love GN and I know they don't usually do ML benches but this was an extremely amateur effort.
Replies: >>105697419
Anonymous
6/25/2025, 5:21:35 AM No.105696371
1737167924630942
1737167924630942
md5: 46231bffc4128f568f4ee6c44b1a2764🔍
>>105689385 (OP)
Anonymous
6/25/2025, 5:29:32 AM No.105696422
>>105696242
>qwq snowdrop
It feels like it's more stupid than regular qwq, and it doesn't seem trained on my particular kinks, so it doesn't really offer a better experience than just wrangling regular qwq.
Replies: >>105696473
Anonymous
6/25/2025, 5:32:01 AM No.105696433
>>105696242
Rocinante 12B, cydonia 24B and erase these trashes
Anonymous
6/25/2025, 5:39:12 AM No.105696473
>>105696422
It's not as smart but it is more fun. At least that has been my experience with it.
>>105696242
Try GLM4, I thought it was an upgrade to qwq and it doesn't need thinking.
Replies: >>105696542
Anonymous
6/25/2025, 5:44:48 AM No.105696513
>>105694804
CUDA_VISIBLE_DEVICES="0,1,2,3" ./llama-server \
--attention-max-batch 512 \
--batch-size 4096 \
--ubatch-size 4096 \
--cache-type-k f16 \
--ctx-size 32768 \
--mla-use 3 \
--flash-attn \
--fused-moe \
--model models/DeepSeek-R1-0528-IQ4_KS_R4/DeepSeek-R1-0528-IQ4_KS_R4-00001-of-00009.gguf \
-ngl 99 \
-sm layer \
-ot "blk\.3\.ffn_up_exps=CUDA0, blk\.3\.ffn_gate_exps=CUDA0" \
-ot "blk\.4\.ffn_up_exps=CUDA0, blk\.4\.ffn_gate_exps=CUDA0" \
-ot "blk\.5\.ffn_up_exps=CUDA1, blk\.5\.ffn_gate_exps=CUDA1" \
-ot "blk\.6\.ffn_up_exps=CUDA1, blk\.6\.ffn_gate_exps=CUDA1" \
-ot "blk\.7\.ffn_up_exps=CUDA1, blk\.7\.ffn_gate_exps=CUDA1" \
-ot "blk\.8\.ffn_up_exps=CUDA2, blk\.8\.ffn_gate_exps=CUDA2" \
-ot "blk\.9\.ffn_up_exps=CUDA2, blk\.9\.ffn_gate_exps=CUDA2" \
-ot "blk\.10\.ffn_up_exps=CUDA2, blk\.10\.ffn_gate_exps=CUDA2" \
-ot "blk\.11\.ffn_up_exps=CUDA3, blk\.11\.ffn_gate_exps=CUDA3" \
-ot "blk\.12\.ffn_up_exps=CUDA3, blk\.12\.ffn_gate_exps=CUDA3" \
-ot "blk\.13\.ffn_up_exps=CUDA3, blk\.13\.ffn_gate_exps=CUDA3" \
--override-tensor exps=CPU,attn_kv_b=CPU \
--no-mmap \
--threads 24
Replies: >>105696562 >>105697166
Anonymous
6/25/2025, 5:49:17 AM No.105696542
>>105696473
I had more repetition problems with glm4 compared to qwq.
Anonymous
6/25/2025, 5:49:52 AM No.105696546
diabetic_miku
diabetic_miku
md5: a11cb01a76c2cffbaa085c2feac0be97🔍
>SWA
Turned this shit off by accident. Quality went up 100%.
Anonymous
6/25/2025, 5:52:02 AM No.105696558
Actually, now that I think about it, can't I just train a lora for my kinks?
There's only 2 problems that I know of, one: I'm computer illiterate; two: there's like one or two pieces of literature that are adjacent to my kinks, but never quite accurately capture it. And I can't really use the big hosted solutions to generate synthetic data since they're not trained for it...
Replies: >>105696564
Anonymous
6/25/2025, 5:52:52 AM No.105696562
>>105696513
cmake -B build -DGGML_CUDA=ON -DGGML_RPC=OFF -DGGML_BLAS=OFF -DGGML_CUDA_IQK_FORCE_BF16=1 -DGGML_CUDA_F16=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_MIN_BATCH_OFFLOAD=32
Replies: >>105697166
Anonymous
6/25/2025, 5:53:40 AM No.105696564
>>105696558
>computer illiterate
You'll feel right at home there: >>>/g/aicg
Replies: >>105696582
Anonymous
6/25/2025, 5:54:32 AM No.105696567
>>105696219
was that the compile flag for llama.cpp or ik? Can you share your llama-server command params for llama.cpp so I can compare speeds on my machine with ik?
Anonymous
6/25/2025, 5:57:09 AM No.105696582
>>105696564
I fucking hate cloud shit though
Replies: >>105697266
Anonymous
6/25/2025, 6:57:10 AM No.105696874
Hello fellow retards. I am trying to get this shit running locally. So far I have text_generation_webui running on a local server that has an RTX 3080 Ti in it. I grabbed the "mythomax-l2-13b" model (thanks to ChatGPT fucking me up the ass when I was trying to figure this shit out on my own). It's trying to tell me about installing Chroma DB for persistent memory, but I don't fucking understand shit. Help this retarded nigger out, please. I don't even have a character thrown in yet. I am interested in using this shit for roleplay because I'm a worthless virgin who wants to talk to anime girls, and I was hoping to enable emotion recognition and mood shifts.
Replies: >>105696891
Anonymous
6/25/2025, 7:00:34 AM No.105696891
>>105696874
>>105689385 (OP)
>►Getting Started
Replies: >>105696908 >>105696911
Anonymous
6/25/2025, 7:05:29 AM No.105696908
>>105696891
nothing under there is going to tell him how to set up chroma with ooba
Replies: >>105696964
Anonymous
6/25/2025, 7:05:34 AM No.105696911
>>105696891
>Hey, I have a question about enabling persistent memory past the context, so my character remembers the night before and previous messages. I also want to know about emotional recognition and mood shifts. Does someone know how to do this?
>dude read the getting started
I need this shit spoonfed because I'm a raging baboon and wasted the past 8 hours trying to get this working. I'm not trying to coom. I'm trying to have an AI girlfriend so I don't kill myself.
Replies: >>105696944
Anonymous
6/25/2025, 7:12:00 AM No.105696944
>>105696911
you should still read the getting started because both your model and ui are trash and not good for an ai girlfriend
then you'd be using silly and could use its memory solution, and we wouldn't have to play 20 questions figuring out what part of a simple install process you didn't understand
Replies: >>105697000
Anonymous
6/25/2025, 7:14:57 AM No.105696957
now that&#039;s more like it
now that&#039;s more like it
md5: d8d1418321b86e73a8d93bb00b1ee242🔍
I don't know why I spent so long fucking around with ollama as my backend to SillyTavern. I guess my friend telling me about being able to switch models on the fly really hooked me, but my god what was I doing not using Kobold? I haven't been able to get the kind of fever dream nightmare porn I've been missing this entire time just because Ollama just would operate like shit across every possible configuration I tried on ST. If I tried to use Universal-Super-Creative to get shit like this with Ollama I would get nothing but a text string of complete nonsense dogshit wordsalad instead of the vile flowing conciousness meatfucking I've been craving.
Anonymous
6/25/2025, 7:15:04 AM No.105696958
best model for RAG?
Replies: >>105696979 >>105697002
Anonymous
6/25/2025, 7:15:39 AM No.105696964
>>105696908
If he is too stupid to google "how to set up chroma with ooba" and follow a plebbit tutorial, there is not much I can or want to do.
>captcha
>4TTRP
Replies: >>105696983
Anonymous
6/25/2025, 7:18:51 AM No.105696979
>>105696958
I tried a few, some quite fat but I always come back to "snowflake-arctic-embed-l-v2.0"
Qwen3-Embedding-4B is okay too, but it is significantly bigger than Arctic.
Anonymous
6/25/2025, 7:19:27 AM No.105696983
Untitled
Untitled
md5: e4260eda402a6520d7e426f143f9e980🔍
>>105696964
I think for your purposes, sillytavern would be easier to deal with.
Replies: >>105697000
Anonymous
6/25/2025, 7:22:14 AM No.105697000
>>105696944
My bad for using the general dedicated to the discussion of local language models to discuss how to use local language models effectively. I'm struggling navigating the command line on fucking ubuntu server; switching to something else when I can barely wrap my head around wget is not what I was hoping to do. Regardless, looks like I can follow >>105696983's image (is that fucking Gemini?) to either get myself more fucked up, or maybe get it working altogether. If you don't hear from me, imagine it worked.
Replies: >>105697009 >>105697022 >>105697037
Anonymous
6/25/2025, 7:22:27 AM No.105697002
>>105696958
sfw shit? use gemmy3
Anonymous
6/25/2025, 7:23:31 AM No.105697009
>>105697000
>My bad for using the general dedicated to the discussion of local language models
discussion not tech support
Anonymous
6/25/2025, 7:26:49 AM No.105697022
>>105697000
By the way, I'm pretty sure that gemini's response is wrong. In any case, I really wouldn't use text generation webui for what you're doing. You'll have an easier time running sillytavern on top, since it has easily installed extensions (some are even built in) for persistent memory, summarization, and dynamic images based on the character's current emotion.
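Off the top of my head the built-ins are Summarize for long-term memory, Vector Storage for retrieval, and Character Expressions for the emotion sprites, all under the Extensions panel (names from memory, double check in the UI).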
Replies: >>105697041 >>105697043
Anonymous
6/25/2025, 7:28:42 AM No.105697037
>>105697000
I know it's overwhelming when you first try to get into it.
But anyway, AI can not be your girlfriend. It just doesn't work that way.
Anonymous
6/25/2025, 7:28:56 AM No.105697041
>>105697022
...Alright. Thank you for helping me. I'll try and figure out moving the files over to sillytavern. I'll google it.
Anonymous
6/25/2025, 7:29:25 AM No.105697043
>>105697022
Hey, different guy here piling onto the tech support minute.
Is there a way to have SillyTavern highlight my messages/edits vs generated text? That was one of the features I remember from NovelAI (I think it was NovelAI) that I really liked, because it made it clear what in the text was my tard wrangling.
Replies: >>105697063
Anonymous
6/25/2025, 7:31:36 AM No.105697059
I've got some questions regarding DeepSeek-R1-0528. I'm a newb.

1. How censored is the model? (criteria: will it answer if I ask it to make a plan to genocide Indians?)
2. Is there any model on huggingface that's trained to be superior?
3. Is there something like a "VPS" so I can run it on my control? (I don't have a strong enough PC)
Replies: >>105697074
Anonymous
6/25/2025, 7:32:01 AM No.105697063
>>105697043
I know I said to switch over to sillytavern for that other anon, but I don't really rp, so I'm not too experienced a st user.
Anonymous
6/25/2025, 7:33:29 AM No.105697074
>>105697059
1. It'll refuse.
2. There are grifts trained to be "superior".
3. Yeah, you can rent them.
Replies: >>105697111
Anonymous
6/25/2025, 7:40:17 AM No.105697111
>>105697074
1. Damn. AI Studio does answer it.
2. Why grifts? None you recommend? Specially for censorship.
3. Any recommendations?
Replies: >>105697160
Anonymous
6/25/2025, 7:48:10 AM No.105697160
>>105697111
It'll refuse in the same way that most models will refuse any "problematic" request unless you explicitly tell it to answer.
You can't really fine-tune the censorship out of a model. It needs to be done during pre-training.
And I run everything locally, so I don't have any experience with renting. If you're fine with slow speeds, you can run a low quant of it using cpu.
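A bare cpu-only run is just something like this (model path borrowed from another anon's command in this thread, adjust threads and context to your box):
./ik_llama.cpp/build/bin/llama-cli --model models/ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4/DeepSeek-R1-0528-IQ2_K_R4-00001-of-00005.gguf --n-gpu-layers 0 --threads 24 --ctx-size 8192
with --n-gpu-layers 0 everything stays in ram, so expect low single-digit t/s.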
Replies: >>105697176 >>105697219
Anonymous
6/25/2025, 7:49:09 AM No.105697166
>>105696513
>>105696562

Thank you, anon
Anonymous
6/25/2025, 7:51:20 AM No.105697176
>>105697160
Thanks anon.

If anyone else has any renting deepseek experience, pls help.
Anonymous
6/25/2025, 7:51:56 AM No.105697179
>>105693514
the best part is that it appears Reddit gave the sub to some turbojanny who didn't even make a thread on "redditrequest" like everyone else has to.

It just goes to show (again) that there is a handful of moderators who get all the subs, and they likely all know the admins.
To who they probably speak directly on discord to get handed the subs.

This guy here, was the first to actually request it but he didn't get it, and there were several more requests after him:
https://old.reddit.com/r/redditrequest/comments/1lhsjz1/rlocalllama/
And here's the guy who ended up getting it without even requesting it officially lol:
https://old.reddit.com/user/HOLUPREDICTIONS/submitted/
Hopefully someone calls him out on it, I would but I was banned from reddit.
Replies: >>105697220 >>105697296
Anonymous
6/25/2025, 7:57:08 AM No.105697219
>>105697160
Running shit on a VPS is like running them locally, just that the machine is far, far away from you.
Replies: >>105697266
Anonymous
6/25/2025, 7:57:14 AM No.105697220
>>105697179
To whom*
Anonymous
6/25/2025, 8:05:09 AM No.105697266
>>105697219
>>105696582
Anonymous
6/25/2025, 8:09:54 AM No.105697296
>>105697179
I don't think the new one will be dedicated enough to limit spam from grifters and karma farmers, it's already worse than it's ever been in recent times and I can already imagine how it will be in just a couple weeks.
Anonymous
6/25/2025, 8:14:00 AM No.105697310
every hour, someone on huggingface releases a merge. Usually, they're a merge of merged models - so the merge might consist of 3-4 models and each model in the merge is a merge of another 3-4 models.

From such a method, can you ever really get anything exceptional? I see so many and I'm just starting to think that it's worth dismissing them completely out of hand, despite the claims that you read on the model card. There might be slight improvements (usually there are not) but over the course of a long chat, it's barely noticeable
Replies: >>105697330 >>105697368
Anonymous
6/25/2025, 8:17:03 AM No.105697330
>>105697310
>every hour, someone on huggingface releases a merge
that sounds like the opening to a tearjerker charity drive
>to help, call 1-800-HUG, and stop the senseless merging
Anonymous
6/25/2025, 8:27:10 AM No.105697368
>>105697310
I'm confident that almost nobody is doing merges because they think they're useful to others.
Anonymous
6/25/2025, 8:32:46 AM No.105697393
dd
dd
md5: 0b3db732f236eed6aca249e00b47a47d🔍
>{{random::arg1::arg2}}
I made a quest table using random macro. There's about 20 or so different entries for now. This allows me to randomize the theme every time I do a new chat. It's all happening in the 'first message' slot.
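For reference the table is literally just the macro inline in the first message, along the lines of:
Today's quest: {{random::clear the rats out of the cellar::escort the caravan through the pass::investigate the abandoned mill}}
and ST rolls one of the entries each new chat (entries here are obviously just examples).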

Is there a way to create the string but hide it from the user? He doesn't even need to know what the quest is.
Replies: >>105697460
llama.cpp CUDA dev !!yhbFjk57TDr
6/25/2025, 8:37:07 AM No.105697419
>>105696010
I don't know what LMStudio specifically does by default, but all of the currently existing code for automatically setting the number of GPU layers is very bad.
It's using heuristics for how much VRAM will be used but those heuristics have to be very conservative in order not to OOM, so a lot of performance is left on the table.
Steve's commentary suggests that he is not aware of what the software does internally when there is insufficient VRAM.
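(The practical workaround is to set the layer count by hand, e.g. passing something like -ngl 35 to llama-server and nudging it up until you're just under the VRAM limit, rather than trusting the automatic estimate.)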

>>105696333
I'm a long time viewer of GN and left a comment offering to help them with their benchmarking methodology.
Replies: >>105697910 >>105698779
Anonymous
6/25/2025, 8:45:35 AM No.105697460
>>105697393
Whatever, to answer my own question: HTML comments do work.
><!-- comment -->
Block out a comment using these in the first comment and it's visible in the terminal but won't appear in ST.
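e.g. wrapping the whole quest line like
<!-- Quest: {{random::arg1::arg2}} -->
still gets sent to the model but stays hidden in the chat window.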
Replies: >>105697469
Anonymous
6/25/2025, 8:46:00 AM No.105697464
Screenshot 2025-06-25 083613
Screenshot 2025-06-25 083613
md5: 38d80513b55c0179d5324ad068f0d636🔍
ahhhhhhh. well i am gonna try the /ubergarm/DeepSeek-R1-0528-GGUF quants first. maybe it fucks something up with the unsloth ones.
Anonymous
6/25/2025, 8:46:01 AM No.105697465
Any good guide on how to make AI less generic and reddit-like when running a fantasy or sci-fi quest?
Replies: >>105697472
Anonymous
6/25/2025, 8:46:59 AM No.105697469
>>105697460
*first message not first comment
Anonymous
6/25/2025, 8:47:24 AM No.105697472
>>105697465
top nsigma
Replies: >>105697501
Anonymous
6/25/2025, 8:53:47 AM No.105697501
>>105697472
? Why would I want to 'top' (I know this is a tombler word for sex) a 'sigma' (I assume nsigma is neo-sigma) male (I am not gay)?
Anonymous
6/25/2025, 8:58:35 AM No.105697526
Screenshot 2025-06-25 085707
Screenshot 2025-06-25 085707
md5: cc2dc0dd70b96c0f592d62096397e140🔍
can i check at runtime which layers get written to which cuda device?
Replies: >>105697575
Anonymous
6/25/2025, 9:11:07 AM No.105697575
>>105697526
--verbose
Anonymous
6/25/2025, 9:12:07 AM No.105697583
Screenshot 2025-06-25 091011
Screenshot 2025-06-25 091011
md5: a5bfcc830cd61ba8ac4a280a9c727f8c🔍
what the helly?

cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF -DGGML_RPC=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1

./ik_llama.cpp/build/bin/llama-cli \
--rpc "$RPC_SERVERS" \
--model models/ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4/DeepSeek-R1-0528-IQ2_K_R4-00001-of-00005.gguf \
--threads 48 \
--n-gpu-layers 99 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--flash-attn \
--ctx-size 16384 \
--parallel 1 \
-mla 3 -fa \
-amb 512 \
-fmoe \
-ctk q8_0 \
-ot "\.(3[3-9]|4[0-9]|5[0-9]|6[0-9]|7[0-9]|8[0-9]|9[0-9]|[0-9][0-9][0-9])\.ffn_up_exps.=CPU" \
Replies: >>105697697 >>105697723
Anonymous
6/25/2025, 9:19:46 AM No.105697627
ms32-creative-writing
ms32-creative-writing
md5: 9f3c745a01a401e83e46ee61d87de3fa🔍
The EQBench author added Mistral Small 3.2 to the Creative Writing bench. https://eqbench.com/creative_writing.html
Contrary to my expectations, the "slop profile" of Mistral Small 3.2 is apparently the closest to DeepSeek-V3-0324.
Replies: >>105697672
Anonymous
6/25/2025, 9:22:42 AM No.105697641
https://fortune.com/2025/06/20/hugging-face-thomas-wolf-ai-yes-men-on-servers-no-scientific-breakthroughs/
>“In science, asking the question is the hard part, it’s not finding the answer,” Wolf said. “Once the question is asked, often the answer is quite obvious, but the tough part is really asking the question, and models are very bad at asking great questions.”
>Wolf said he initially found the piece inspiring but started to doubt Amodei’s idealistic vision of the future after the second read.
>“It was saying AI is going to solve cancer, and it’s going to solve mental health problems—it’s going to even bring peace into the world. But then I read it again and realized there’s something that sounds very wrong about it, and I don’t believe that,” he said.
>“Models are just trying to predict the most likely thing,” Wolf explained. “But in almost all big cases of discovery or art, it’s not really the most likely art piece you want to see, but it’s the most interesting one.”
Wow, huggingface bro is totally mogging the anthropic retards, based and brain pilled.
LLMs are useful tools but they are not actual intelligence and will never become intelligence.
Replies: >>105697661
Anonymous
6/25/2025, 9:27:05 AM No.105697661
>>105697641
>AI is going to solve cancer, and it’s going to solve mental health problems—it’s going to even bring peace into the world.
Whoever believed that is just naive.
Anonymous
6/25/2025, 9:30:16 AM No.105697672
brainlessturd
brainlessturd
md5: 5952561126cdf9f5987602db199c9270🔍
>>105697627
>llm as judge for a benchmark in the creative writing field
this shit is so retarded bro just stop
llms are fuzzy finders, they can understand the most broken of writing, and even when commenting negatively on it they can still act quite sycophantic
ex prompt from chatgpt:
https://rentry.org/5f7xrz9y
anyone who takes benches like eqbench seriously is a brainless turd, a waste of space, a waste of oxygen, a waste of food and a literal oven dodger
Replies: >>105697712 >>105697741
Anonymous
6/25/2025, 9:35:48 AM No.105697695
Screenshot 2025-06-25 093458
Screenshot 2025-06-25 093458
md5: a85e06f9e8fa88695a9704e620f79bb9🔍
WTF? HOW THE FUCK IS THAT SHIT SOO SLOW
Anonymous
6/25/2025, 9:36:24 AM No.105697697
>>105697583
What was your "best effort"?
Replies: >>105697706
Anonymous
6/25/2025, 9:37:17 AM No.105697706
>>105697697
with llama.cpp around 7.5 T/S
Replies: >>105697723
Anonymous
6/25/2025, 9:38:11 AM No.105697712
>>105697672
lmao, said exactly like someone who hasn't spent a single fucking second reading the about page
>asks the llm known for glazing the shit out of its users
>is surprised when it glazes the shit out of its users
benchmarks have to be objective and perfect!!! ignore everything with minor flaws in methodology!!! reeee
Replies: >>105697724
Anonymous
6/25/2025, 9:41:39 AM No.105697723
>>105697706

7.5 t/s with gerganov's llama,
And this >>105697583 with ik_llama?

You finally broke it lol
Replies: >>105697740
Anonymous
6/25/2025, 9:42:14 AM No.105697724
>>105697712
https://arxiv.org/pdf/2506.11440
it's not just about the glazing
LLMs aren't even able to tell when something that should be there isn't there
they rarely manage to spot excessively repetitive writing etc when tasked with judging
you are a subhuman mongoloid who belongs to the oven if you believe in benchmarks
Replies: >>105697776
Anonymous
6/25/2025, 9:45:55 AM No.105697740
>>105697723
yeah, i don't know what's wrong. i assume the PRC is fucked
Replies: >>105698103
Anonymous
6/25/2025, 9:46:39 AM No.105697741
>>105697672
Isn't the fact that there's an apparently massive difference within the same benchmark from 3.1 to 3.2 at least interesting? Or that it seems to be borrowing slop from DeepSeek V3 rather than Gemini/Gemma like others mentioned earlier?
Replies: >>105697753
Anonymous
6/25/2025, 9:48:34 AM No.105697753
>>105697741
It is, just ignore the elo to the moon bs and focus on what you can actually see yourself, so the slop profile info basically
Anonymous
6/25/2025, 9:52:18 AM No.105697776
>>105697724
>https://arxiv.org/pdf/2506.11440
ok interesting paper but wtf does this have to do with anything

slop/repetition is a metric on eqbench btw
and it goes down with better models and up with worse models... lol
even if it's a coincidence or they aren't controlling for it directly you can literally fucking filter it out yourself blindass
Anonymous
6/25/2025, 9:58:04 AM No.105697822
1000019992
1000019992
md5: 613629a52da69548ea6843b0ed53579d🔍
thoughts on minimax and polaris?
Anonymous
6/25/2025, 10:12:35 AM No.105697910
DER RÜTTLER
DER RÜTTLER
md5: a402936775587fe375411595b4e0fff5🔍
>>105697419
Quite strange that such a comment is being filtered, but GN can find it and make it visible if they check their "spam folder" for YouTube comments.
Replies: >>105698779
Anonymous
6/25/2025, 10:45:31 AM No.105698103
>>105697740
Did you try to achieve the max speed just on a single local GPU? You throw too many variables into the equation at once
Replies: >>105698194
Anonymous
6/25/2025, 11:01:29 AM No.105698194
>>105698103
with
./ik_llama.cpp/build/bin/llama-cli \
--model models/ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4/DeepSeek-R1-0528-IQ2_K_R4-00001-of-00005.gguf \
--threads 48 \
--n-gpu-layers 40 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
--flash-attn \
-mla 3 -fa \
-amb 512 \
-fmoe \
-ctk q8_0 \
-ot "blk\.(1|2|3|4)\.ffn_.*=CUDA0" \
-ot "blk\.(5|6|7|8)\.ffn_.*=CUDA1" \
-ot "blk\.(9|10|11|12)\.ffn_.*=CUDA2" \
-ot "blk\.(13|14|15|16)\.ffn_.*=CUDA3" \
--override-tensor exps=CPU \

i get around 3.5T/s
Replies: >>105698422 >>105698577
Anonymous
6/25/2025, 11:41:43 AM No.105698422
>>105698194
>i get around 3.5T/s

I get 4 tkn/s on a single RTX 3090 and gerganov's llama-cli

and it is still 3.5 tkn/s with 20k+ of context >>105693968
I mean a REAL LOADED context
Replies: >>105698577 >>105698591
Anonymous
6/25/2025, 11:48:48 AM No.105698454
file
file
md5: acc4ba2e1ad86a20953bc6b47865c3e4🔍
>>105696010
>my post worked
The benchmark is obviously botched but at least he tried.
Anonymous
6/25/2025, 12:09:28 PM No.105698577
>>105698422
>>105698194
Why -cli and not server? Just curious about your setup, this is not a critique.
Replies: >>105698591 >>105698645
Anonymous
6/25/2025, 12:11:18 PM No.105698591
Screenshot 2025-06-25 120811
Screenshot 2025-06-25 120811
md5: fa52245a6b8167565c5bf3d423e9a653🔍
>>105698422
./ik_llama.cpp/build/bin/llama-cli \
--rpc "$RPC_SERVERS" \
--model models/ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4/DeepSeek-R1-0528-IQ2_K_R4-00001-of-00005.gguf \
--threads 48 \
--n-gpu-layers 99 \
--temp 0.6 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
--flash-attn \
-mla 3 -fa \
-amb 512 \
-fmoe \
-ctk q8_0 \
-ot "blk\.(1|2|3|4|5|6)\.ffn_.*=CUDA0" \
-ot "blk\.(7|8|9|10)\.ffn_.*=CUDA1" \
-ot "blk\.(11|12|13|14)\.ffn_.*=CUDA2" \
-ot "blk\.(15|16|17|18)\.ffn_.*=CUDA3" \
-ot "blk\.(19|20|21|22)\.ffn_.*=RPC[10.0.0.28:50052]" \
-ot "blk\.(23|24|25|26)\.ffn_.*=RPC[10.0.0.28:50053]" \
-ot "blk\.(27|28|29|30)\.ffn_.*=RPC[10.0.0.28:50054]" \
-ot "blk\.(31|32|33|34)\.ffn_.*=RPC[10.0.0.28:50055]" \
-ot "blk\.(35|36|37|38)\.ffn_.*=RPC[10.0.0.40:50052]" \
-ot "blk\.(39|40|41|42)\.ffn_.*=RPC[10.0.0.40:50053]" \
-ot "blk\.(43|44|45|46)\.ffn_.*=RPC[10.0.0.40:50054]" \
-ot "blk\.(47|48|49|50)\.ffn_.*=RPC[10.0.0.40:50055]" \
--override-tensor exps=CPU \
--prompt


i am getting 5.5T/s, 2 T/s worse than llama.cpp. also the ubergarm/DeepSeek-R1-0528-GGUF/IQ2_K_R4 quants are yapping, holy hell. couldn't even 0-shot a working flappy bird clone, meanwhile /unsloth/DeepSeek-R1-0528-GGUF/UD-Q2_K_XL has no problems.

>>105698577
i am using the cli with a prompt to get some testing done because i dont have a client ready yet.
Replies: >>105698605
Anonymous
6/25/2025, 12:13:16 PM No.105698605
>>105698591
I see.
I'm hoping to get to programming my own client with the help of chatgpt.
So far SillyTavern has been great but I want to test persistent locations and something what tracks user's location behind the scenes. LLM is used to generate and bolster location descriptions.
I'm sure this has been done million times by now but it's new to me.
Anonymous
6/25/2025, 12:18:28 PM No.105698626
1739039461311126
1739039461311126
md5: 7d7f2d0166f87bcd27f587a2d1e1a21f🔍
bros please help my ESL ass, what does dipsy mean by this?
Anonymous
6/25/2025, 12:21:50 PM No.105698645
>>105698577
>Why -cli and not server?
For some strange reason, in my setup, the server is 40% slower than -cli. The server starts at 4 tkn/s, but quickly falls to 2 tkn/s. I tried different commits, same behavior
Replies: >>105698699 >>105698742
Anonymous
6/25/2025, 12:31:29 PM No.105698699
d
d
md5: 791ed2c12c8b09b845fd867acd8f6f9e🔍
>>105698645
Huh. Have you double checked your system's power settings? I'm just guessing here but there has to be some reason.
Replies: >>105699216
Anonymous
6/25/2025, 12:39:14 PM No.105698742
>>105698645
To add: compare their terminal output too.
The server could be using less of your vram by default unless you're explicitly setting --gpu-layers, for example (so check gpu vram usage in task manager as well).
Replies: >>105699216
Anonymous
6/25/2025, 12:40:39 PM No.105698751
file
file
md5: 6fc70cc13f212dddf6b2e6bf04e8f118🔍
NOOO AI! Not that realistic! ;_;
Replies: >>105698796
Anonymous
6/25/2025, 12:46:29 PM No.105698779
>>105697419
>>105697910
If you actually care about getting involved you'd be way better off sending an email than relying on youtube to work properly
Replies: >>105699023
Anonymous
6/25/2025, 12:48:23 PM No.105698796
>>105698751
Wait why would she not have a hymen if she is undead?
Replies: >>105698891
Anonymous
6/25/2025, 12:49:34 PM No.105698798
whats the best webdev pycuck model to run on 48gb, ive tried devstral on q8_0 and its shite
Replies: >>105698815
Anonymous
6/25/2025, 12:53:11 PM No.105698815
>>105698798
Don't people still say Qwen is great for dev-cucking some short snippets.
Anonymous
6/25/2025, 12:54:02 PM No.105698820
Is there some tool that abstracts away all the fucking around with prompt formatting with llama-server?
Replies: >>105698860
Anonymous
6/25/2025, 1:00:34 PM No.105698860
>>105698820
ChatGPT.
Anonymous
6/25/2025, 1:05:21 PM No.105698884
how difficult is training a lora and do you need really high vram requirements?
What about a lora vs a fine tune?
I want to try add some question and answer style text blocks to mistral large quants to both increase knowledge and reinforce the answering style, I have 48gb vram
Anonymous
6/25/2025, 1:06:36 PM No.105698891
file
file
md5: 2332da3936312f31a7886691498f4fd6🔍
>>105698796
Anonymous
6/25/2025, 1:11:01 PM No.105698928
>>105698912
>>105698912
>>105698912
llama.cpp CUDA dev !!yhbFjk57TDr
6/25/2025, 1:25:52 PM No.105699023
>>105698779
I was not able to find an email for general contact on the GN website and they said in the video to leave a comment, so that is how I'll try contacting them first.
They were responding to comments in the first hour or so after the video was published, if I was simply too late I'll find another way.
Anonymous
6/25/2025, 2:01:15 PM No.105699216
>>105698699
>>105698742
I'm on Linux. And the only thing I changed was -server instead of -cli.

I noticed than the CPU core were running at approx 80% (I isolated 8 cores for the purpose) in case of the server, while they've been at 100% in case of CLI.

GPU load is the same in both cases.

I see no reason why there should be a difference
Anonymous
6/25/2025, 2:12:56 PM No.105699292
>>105693934
I think you don't understand what LLMs are. They don't "decide" and they don't perform "analysis."
Anonymous
6/25/2025, 2:20:09 PM No.105699343
llama.cpp is a guaranteed blue screen for me. Even for very small models that take up a fraction of my vram it bsods when I unload the model. Am I missing out on anything if I use ollama?
Replies: >>105699359 >>105699450
Anonymous
6/25/2025, 2:23:53 PM No.105699359
>>105699343
why not kobold?
.t henk
Replies: >>105699406
Anonymous
6/25/2025, 2:29:51 PM No.105699406
>>105699359
It's based on llama.cpp so it should have a similar issue right?
Replies: >>105699435
Anonymous
6/25/2025, 2:32:14 PM No.105699435
>>105699406
You never know, ollama uses a lot of llama.cpp code too
Anonymous
6/25/2025, 2:34:37 PM No.105699450
>>105699343
If it doesn't find a connected uranium centrifuge to blow up, the system crashes.
It's a known issue, will be fixed soon.