
Thread 106364639

382 posts 70 images /g/
Anonymous No.106364639 >>106364672 >>106364684 >>106364734 >>106364785 >>106364805 >>106367419
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106358752 & >>106351514

►News
>(08/23) Grok 2 finally released: https://hf.co/xai-org/grok-2
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106364646
►Recent Highlights from the Previous Thread: >>106358752

--Grok-2 release and licensing limitations prevent local use and model distillation:
>106360573 >106360583 >106361215 >106361234 >106361251 >106361491 >106361508 >106361292 >106363242 >106361355 >106361361 >106361382 >106361396 >106361554 >106361358 >106361677
--Achieving ultra-fast local ERP inference with aggressive quantization and high-memory setups:
>106359914 >106359949 >106359984 >106359998 >106359974 >106359978 >106360195 >106360219 >106360243 >106362566 >106362839 >106362874 >106360238 >106360272 >106360289 >106360298 >106362206 >106363781 >106362235 >106359989 >106360005 >106360022 >106360042 >106359993
--Quantization tradeoffs: Q4_K_M often sufficient, but higher quants better if resources allow:
>106363201 >106363258 >106363281 >106363328 >106363371
--Command A Reasoning released with strong safety and competitive performance:
>106358780 >106358832 >106358856
--MoE models require neutral prompting to avoid schizophrenic behavior:
>106359448 >106359923 >106360407
--Timeline chart of LLM evolution from LLaMA2 to projected Chinese dominance era:
>106358892 >106358922 >106358959 >106359070 >106359105 >106359241 >106359351 >106359450 >106359474 >106359780
--Skepticism over Elon's claim that Grok 3 will open-source in six months:
>106362417 >106362439 >106362483 >106362602 >106363177 >106363241 >106363972
--Investigate prompt and tokenization differences causing qwen3-30b looped thinking on llama.cpp:
>106362795 >106362812 >106362823 >106362840
--DeepSeek-V3.1 tradeoffs: better context handling but more autistic behavior:
>106362644 >106362661 >106362676 >106362699
--Lightweight TTS options for low VRAM and fast LLM inference:
>106360462 >106360524 >106360638
--Miku (free space):
>106358887 >106362792

►Recent Highlight Posts from the Previous Thread: >>106358757

Why?: >>102478518 (Dead)
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106364672 >>106364684 >>106364755
>>106364639 (OP)
I like this Miku
Anonymous No.106364684 >>106364844
>>106364672
>>106364639 (OP)
Isn't this from the cover art for Heinlein's "Stranger in a Strange Land"?
Anonymous No.106364734
>>106364639 (OP)
>Nemotron Nano 2
Why has no one quantized it? Is it a failure?
Anonymous No.106364755
>>106364672
I like the recap miku better, primarily because of the "o-oe-oo". Love that shit
Anonymous No.106364785 >>106364801 >>106364850 >>106364866 >>106367573
>>106364639 (OP)
Hello, I am new to local LLMs - what is the purpose of them?
I entertain the thought of having a sub 70b LLM, but I just cannot fathom a use-case for it.
Can you guys let me know what they could be used for?
Anonymous No.106364801
>>106364785
Advanced masturbation
Anonymous No.106364805 >>106364847
>>106364639 (OP)
stranger in a strange land sucked
mike sucks
Anonymous No.106364844
>>106364684
Yes... the origin of the word grok.
Anonymous No.106364847 >>106364899
>>106364805
>stranger in a strange land sucked
It went real quick from an interesting premise to an author's barely disguised fetish.

>mike sucks
you suck
Anonymous No.106364850
>>106364785
the 3 Cs of LLMs
>coding
>cooming
>coming up with a third C
Anonymous No.106364855
Anonymous No.106364865
LLMs are useless
Anonymous No.106364866 >>106366501 >>106369275
>>106364785
RLM simulator
My friends don't watch that many movies
Redditoors and letterbox niggers suck ass
X is jeetslop

So i can only talk to LLMs about the movies i like.
I also use coding models for making small games and apps for myself.
Anonymous No.106364876
LLMs are useful
Anonymous No.106364899 >>106364909
>>106364847
That's pretty much every Heinlein book.
>dude dies
>has his brain transplanted into voluptuous female body
>explores his new sexuality
or
>dude has himself cloned with his Y chromosome replaced with another copy of his X
>...
>fucks his clone
Sci-fi at its finest.
Anonymous No.106364900 >>106366046 >>106367183
Anonymous No.106364909
>>106364899
repressed tranny
Anonymous No.106364945 >>106364952 >>106367586
Deepseek coin when? I need funds for gpus
Anonymous No.106364952
>>106364945
Just ask the model what shitcoin to buy
Anonymous No.106365043 >>106365112 >>106366052
audio input when? video input when? image generation when? llama.cpp is lagging behind
Anonymous No.106365112
>>106365043
As long as the underlying modality is LLM tokens, they will all be shit
Anonymous No.106365481
grok 2 killed local
Anonymous No.106365538
i want bytedance seed inside me
Anonymous No.106365569 >>106365576 >>106365583 >>106366515
Is there anything better than GLM-4.5 Air that I can run on a Ryzen 9800 with 64 GB DDR5 along with an 8GB 3060 Ti?
GLM Q3 seems to be the sweet spot so far. It makes both my CPU and GPU work and just barely fits into memory with 32k context, while running at an enjoyable speed.

I think I'm giving up on small 8GB models that fit into VRAM. Any good MoE models other than GLM?
Are there any GLM mixes or variants trained on roleplay slop, or is it too early for that?
Anonymous No.106365576
>>106365569
GLM Air is probably the best thing your rig can run; all the better MoEs are much larger, and all the smaller MoEs aren't very good.
Anonymous No.106365583 >>106365589
>>106365569
arent you satisfied enough with GLM sex?????
Anonymous No.106365589
>>106365583
It's great. But I'm always keeping my eyes open for better models.
Anonymous No.106365592 >>106365613 >>106365670 >>106366493
Has anyone managed to install
>https://github.com/KittenML/KittenTTS
Did
>pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl
But it complains about no matching distribution found for misaki (which I have installed anyway). I'm not that technical with python dependencies, and especially wheels are a bit of a mystery to me.
Anonymous No.106365613 >>106365636
>>106365592
Why do you want it anyway
Anonymous No.106365617 >>106365623 >>106365624 >>106365634 >>106367605
why are local llms so fucking wordy? I ask it to check grammar in one sentence and I get a whole fucking essay.
Anonymous No.106365623 >>106365636 >>106365637 >>106365648
>>106365617
You're supposed to use another LLM to summarize it
Anonymous No.106365624 >>106365637
>>106365617
*why are llms
Anonymous No.106365634 >>106365662
>>106365617
Just write JSON instructions for repeated tasks,
that's what I do
>Grammar checks
>One word answers
>Bullet points
>Rp
>story writing
>criticism

etc
Anonymous No.106365635 >>106365774 >>106366445 >>106368339 >>106368366
mistralbros we are so back
Anonymous No.106365636
>>106365623
>>106365613
Get fucked, cretin.
Anonymous No.106365637
>>106365624
>>106365623
is that how all the online llms do it? That's crazy.
Anonymous No.106365648 >>106365659
>>106365623
And then have a third LLM grade its performance
Anonymous No.106365659
>>106365648
fourth llm generates an image for your query
Anonymous No.106365662 >>106365668
>>106365634
so something like
You are a meticulous grammar assistant who extracts grammar mistakes from provided text in structured JSON format. Use the following JSON template: [ { "mistake_id": 1, "mistake_text": "betwen", "corrected": "between" } ] Here is the provided text:

mostly checking if it can build something like languagetool
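For reference, a minimal sketch of how that prompt could be wired up against a local server (assumes a llama.cpp llama-server listening on 127.0.0.1:8080 with its OpenAI-compatible chat endpoint; the port, prompt wording and test sentence are just illustrative):

import json, urllib.request

SYSTEM = ('You are a meticulous grammar assistant who extracts grammar mistakes from provided text. '
          'Reply ONLY with JSON in this format: '
          '[ { "mistake_id": 1, "mistake_text": "betwen", "corrected": "between" } ]')

def check_grammar(text):
    payload = {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": text},
        ],
        "temperature": 0,  # keep structured output as deterministic as possible
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return json.loads(reply)  # raises if the model ignored the format

print(check_grammar("There is a difference betwen these two words."))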
Anonymous No.106365668 >>106365718
>>106365662
schizo prompt
llms aren't trained with or
Anonymous No.106365670 >>106366331
>>106365592
this works for me
https://github.com/clowerweb/kitten-tts-web-demo
Anonymous No.106365682 >>106365699
Mistral small 3.2 is unironically better than Gemma for RP
Not just for sex. Gemma is fucking terrible at understanding subtext and struggles with character development: if you define X trait in a character card as a starting point, that character will ALWAYS be defined by X, no matter the context. Also, if you make a character angry it will NEVER fucking calm down; it will act like a hysterical 45 year old HR worker who just saw a 30 year old man kiss an 18 year old woman, until the heat death of the universe.
tl;dr total frog victory
Anonymous No.106365699 >>106365707 >>106365731 >>106368179
>>106365682
mistral small 3.2 gets mogged by Rocinante, it's time to upgrade anon
Anonymous No.106365707
>>106365699
I used to use nemo/rocinante, small is much better, even 3/3.1 were better.
Anonymous No.106365709 >>106365732 >>106366442 >>106366457
Total noob here. For Grok 2 it says

>Use the command below to launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).

To run this thing locally I need, say, eight NVIDIA A100s ($15,000 each)? $120,000 for the GPUs alone?
Anonymous No.106365718 >>106365768
>>106365668
Just copying stuff from Google, no idea how this works. So far it works fine but sometimes fucks up by inserting random bullshit.
Anonymous No.106365731 >>106365739 >>106365741
>>106365699
>user: hello
>char: walks over and grabs your cock, "fuck me, anon. Make me yours"
wow, great erp!! So immersive.

I can name 4 nemo finetroons that beat out rocinante by a mile.
Anonymous No.106365732 >>106365773
>>106365709
did you expect elon musk was using consumer cards or something?
Anonymous No.106365739
>>106365731
go ahead anon humor me with that massive cock of yours
Anonymous No.106365741 >>106365762 >>106366365
>>106365731
Not him, but what are some good non-drummer nemo tunes?
Anonymous No.106365762 >>106365767 >>106365783
>>106365741
Bigger-Personality
Irix-12B-Model_Stock
MN-12B-Mag-Mell-R1
patricide
Golden-Curry
Anonymous No.106365767
>>106365762
>curry mentioned
I LOVE INDIA
Anonymous No.106365768
>>106365718
Well then read a few pages on how these instructions work.
Anonymous No.106365773
>>106365732
No. I just wanted to know whether I, a total noob, understand this correctly. Reading the previous thread it seems so. How sad.
Anonymous No.106365774
>>106365635
Can't wait for them to not release it
Anonymous No.106365783 >>106365794
>>106365762
>frankenmerges
>coomtunes
>model cards recommending temp 1.25 for nemo
Anonymous No.106365784 >>106365787
mistral 2508 bros???????
Anonymous No.106365787
>>106365784
There's a week left in august. But I'd wait at least one extra week afterwards, to be safe.
Anonymous No.106365794 >>106365916
>>106365783
...and yet better than rocinante
Anonymous No.106365797 >>106366725
Quick initial impression: I like the way GLM-4.5 (355B-A32B) writes more than ERNIE-4.5-300B-A47B, in case you were about to download one of these and were deciding which to try first.
Anonymous No.106365812 >>106365815 >>106366182
>have long script ai helped me make
>ask it to help me clean it up, reorganize it
>it nails it
>ask it a question about copying an existing feature to another part
>it explodes
mistral small 24b. it just doesn't want to do it no matter how i ask. dling devstral to try, but that's based on the old mistral 3.1 small. i'm amused it could nail something first pass, then just shit the bed like this
Anonymous No.106365815 >>106365830 >>106365931
>>106365812
Mistral models don't really excel in coding, especially the small ones. You should be using qwen 32b-coder in that size range.
Anonymous No.106365830 >>106365837
>>106365815
yeah i used qwen 2.5 coder in the past and it's pretty good. i need to see what the equiv of 3 is for similar size. i just like mistral small 24b (with thinking) as a smart small model that fits in my vram. it's always been pretty good. but now i just watched it impress me, then shit itself. it's even having trouble with some basic html stuff
Anonymous No.106365837 >>106365847 >>106365862 >>106365868
>>106365830
>i need to see what the equiv of 3 is for similar size.
https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
Anonymous No.106365847
>>106365837
anon im in love thank you
Anonymous No.106365862
>>106365837
thanks, dling now. how would you compare this to qwen 2.5 coder or older codestral 22b? i was using the 32b qwen
Anonymous No.106365868 >>106365884
>>106365837
In the one test I ever tried Devstral-Small-2507 and 2505 shat on Qwen3-Coder-30B-A3B-Instruct.
Anonymous No.106365884
>>106365868
and?? tell us more
Anonymous No.106365916
>>106365794
Bait used to be believable
Anonymous No.106365931 >>106365947 >>106366387
>>106365815
>You should be using qwen 32b-coder

I see Qwen coder being recommended again and again ad nauseam

is it really that good? Don't tell me "for its size and speed"
Anonymous No.106365947 >>106365973 >>106365979
>>106365931
>Don't tell me "for its size and speed"
Are you a retard or something? If you want to run something locally and you don't have a cpumaxx rig with well over 128GB of memory then yes, it's about the best you can get. If your dad works at nvidia and can buy you a few H100s for your birthday then go run deepsneed locally, or pay for access to one of the big flagship models hosted by corpos.
Anonymous No.106365973 >>106365978
>>106365947
>you don't have a cpumaxx rig

I do have a cpumax rig
So, what's the point?
Anonymous No.106365978
>>106365973
Then you should be capable of doing your own research and testing
Anonymous No.106365979
>>106365947
>or pay for access to one of the big flagship models hosted by corpos

You can use ds for free online
Anonymous No.106366017 >>106366028 >>106366043 >>106366069 >>106366074 >>106366176 >>106366938
The LLM industry is built on a house of cards and it's just a matter of time before it crumbles. Most "best practices" or "recommendations" are a lie. Scaling never actually worked as claimed and if anything made the problem worse. It's going to crash so hard I'm actually worried for the economy.
Anonymous No.106366028
>>106366017
LLMs help me coom and I can do that with models that already exist on my hard drive so I dont care
Anonymous No.106366043
>>106366017
>I'm actually worried

A desperate attempt of attention seeking

You won
Anonymous No.106366046 >>106366126
>>106364900
So GGUF is the sound she makes when the drink goes down the wrong pipe?
Anonymous No.106366052
>>106365043
Lagging behind what?
Anonymous No.106366060
We already reached the endgame(Nemo). What are you still doing here?
Anonymous No.106366069
>>106366017
Consider this: the amount of information that LLMs can store in their weights is limited by their parameter size, around 2~4 bits of information per parameter (in the information-theoretical sense), depending on who you ask. There's no way they're actually learning efficiently from tens of trillions of tokens of data with current large-scale pretraining methods.
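Back-of-envelope version of that, with purely illustrative numbers (the 2~4 bits/param figure is the one above; the corpus size is just a typical modern pretraining budget):

params = 30e9            # a 30B-parameter model
bits_per_param = 3       # middle of the quoted 2~4 bits/param range
train_tokens = 15e12     # ballpark modern pretraining corpus

capacity_bits = params * bits_per_param
print(capacity_bits / 8 / 1e9)       # ~11 GB of storable information
print(capacity_bits / train_tokens)  # ~0.006 bits retained per training token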
Anonymous No.106366074
>>106366017
>It's going to crash so hard I'm actually worried for the economy.
There are so many reasons you should be worried about the economy that LLMs shouldn't even make the top 50, anon. Get some perspective.
Anonymous No.106366126 >>106366168
>>106366046
that drink isn't for her, it's for you.
Anonymous No.106366168 >>106366171
>>106366126
I'm not thirsty.
Anonymous No.106366171
>>106366168
you don't have a choice
Anonymous No.106366176
>>106366017
Just short big tech if you believe so. There's money to be made if you think something will crash.
Anonymous No.106366182
>>106365812
AI labs tunnel vision hard on zero-shot problems instead of multi-turn. That's why. Just start over and make a new prompt for your new problem.
Anonymous No.106366205 >>106366372
Are there any slop scoring models, for data filtering and such? Sure, counting words and n-grams would get you far, but what about deeper stuff? I bet even a classifier model that tells apart male and female prose would be useful.
Anonymous No.106366305
Is there a way to roughly calculate what speed you can expect on a cpu + gpu build for MoE models running -ot exps=cpu?
In particular, what would happen if you upgraded the GPU part of your server with something that's a lot faster like a 5090 or switched out the CPU+RAM part for one of those server boards with lots of channels.
By splitting your model across both RAM and GPU it obviously stops scaling linearly with the bandwidth of either, which is confusing. Is there a bottleneck hidden here somewhere? Like, even if you had a 12-channel DDR5 server, would it not make much of a difference in token generation speed between a 3090 and a 5090, despite the latter having twice the memory bandwidth, because of something retarded like PCI-E bandwidth getting in the way?
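For what it's worth, the crude way I'd estimate it (a sketch only: assumes token generation is purely memory-bandwidth-bound, routed experts stream from system RAM, everything else from VRAM, and ignores compute and PCI-E entirely; every number below is an illustrative guess, not a measurement):

def est_tokens_per_sec(bytes_cpu, bytes_gpu, ram_bw, vram_bw):
    # time per token = bytes streamed from each memory pool / that pool's bandwidth
    return 1.0 / (bytes_cpu / ram_bw + bytes_gpu / vram_bw)

active_bytes = 12e9 * 4.5 / 8       # ~12B active params at ~4.5 bits/weight
print(est_tokens_per_sec(
    bytes_cpu=0.8 * active_bytes,   # guess: 80% of the active weights live in RAM
    bytes_gpu=0.2 * active_bytes,
    ram_bw=80e9,                    # dual-channel DDR5-ish
    vram_bw=936e9,                  # 3090-class GPU
))                                  # ~14 t/s, dominated by the RAM term

If that's roughly right, the RAM term swamps everything for token generation, which would explain why 3090 vs 5090 barely matters there while more memory channels do; prompt processing is compute-bound and is a different story.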
Anonymous No.106366331 >>106366357
>>106365670
Instead of kitten, I implemented a Piper solution for my client. Took 15 minutes to set it up in the python script.
>https://github.com/OHF-Voice/piper1-gpl
It's fun but the voices are so monotonous that it will get pretty dull fast.
Anonymous No.106366345 >>106366528 >>106367123
.
Anonymous No.106366357
>>106366331
To add: there is something more relaxing about being able to hear the voice instead of just staring at the wall of text. Maybe I'm somewhat dyslexic or it's an ergonomic issue...
Anonymous No.106366365 >>106366402
>>106365741
>good
>tunes
lol, lmao even
lay off the troonshine and learn to prompt
Anonymous No.106366372 >>106366414
>>106366205
It won't get you anywhere. We've come far since early kaioken's slop issues, and I barely encounter those early phrases outside of their appropriate context now. But the current iteration of slop is no less annoying. It's the inherent repetitiveness of the patterns that's making the vein pop, not their fotm class.
Anonymous No.106366387
>>106365931
>Don't tell me "for its size and speed"
It's really "for its size and speed" otherwise you would just go paypig for Claude nigger
with that said, all Qwen code models are best in class in their size range. That includes the 480B-A35B if you can run that. Qwen makes models for almost all sizes except the gigawhale range (where DeepSeek and Kimi K2 are only game in town anyway)
Anonymous No.106366402
>>106366365
tongue my anus
Anonymous No.106366414
>>106366372
Maybe what I mean isn't slop but the opposite of show-not-tell; both have a lot in common. So I thought maybe approaches like style clustering could help filter it out, but I don't think there are any embedding models for this.
Anonymous No.106366442
>>106365709
This is all bloat they rushed out to catch up with the big players. Requirements for Grok 3 may be less demanding.
Anonymous No.106366445
>>106365635
Gemini 2.5 is still #1 jai hind
Anonymous No.106366457
>>106365709
If anyone cared to implement this trash fire in llama cpp, the requirements would be lowered with quantization, but yes, these requirements are typical of any unquanted API model and all of us are in fact coping hard running local models quanted because even a small model wouldn't fit our GPUs without quant
desu I would be more interested in an open source release of the mini versions of Grok
Anonymous No.106366465 >>106366471
If one more person criticizes my ability to prompt I'm going to cry
Anonymous No.106366471 >>106366524
>>106366465
promptlet
Anonymous No.106366493 >>106366531 >>106366658
>>106365592
Will the suckers ever learn that conda is an essential tool...


install conda on your computer then run these commands one by one

conda create -n "py310" python=3.10
conda activate py310
pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl --no-cache-dir


--no-cache-dir is not needed, but it will force pip to download all stuff instead of using cached data.
When the installation finishes, start python and paste this when you see the python prompt '>>>':

from kittentts import KittenTTS
m = KittenTTS("KittenML/kitten-tts-nano-0.2")

audio = m.generate("This high quality TTS model works without a GPU", voice='expr-voice-2-f')

# available_voices : [ 'expr-voice-2-m', 'expr-voice-2-f', 'expr-voice-3-m', 'expr-voice-3-f', 'expr-voice-4-m', 'expr-voice-4-f', 'expr-voice-5-m', 'expr-voice-5-f' ]

# Save the audio
import soundfile as sf
sf.write('output.wav', audio, 24000)



Look for the file output.wav
Anonymous No.106366501 >>106369364
>>106364866
don't we have /tv/ here or something
>inb4 4chins suck more ass than redditors
prompt issue
Anonymous No.106366515 >>106368591
>>106365569
Give me your llama.cpp args, because somehow I can't even fit a single layer of GLM-Air-chan into my 8G of VRAM; it always complains about being out of memory even if I set -ngl 0 and -c 512
Anonymous No.106366518 >>106367088 >>106367644
Leak please, I want a bingo
Anonymous No.106366524
>>106366471
Anonymous No.106366528
>>106366345
It's so over
Anonymous No.106366531 >>106366637
>>106366493
nta, my conda experience was waiting for 10 hours for its retarded package manager to fail. It doesn't even tell you why it's hanging, fucking disrespectful software. Also venv creates the environment in whatever directory I want; I don't understand why anyone would use anaconda.
Anonymous No.106366559 >>106366660
I use Anubis-70B-v1-IQ4_XS.gguf [llama.cpp] for ERP. Sell me on a better model that works well on two 3090s.
Anonymous No.106366567 >>106366579 >>106366590 >>106366681 >>106370898
===FRIENDLY REMINDER===
DOWNGRADE YOUR CUDA IF YOU ARE USING RTX 3000 SERIES CARDS
NVIDIA IS RUINING PERFORMANCE ON THOSE WITH THE NEW VERSIONS OF CUDA
I accidentally updated torch yesterday from CUDA 12.6 to CUDA 12.8 and it ran like shit and even OOMed. After going back to CUDA 12.6 the issues were gone.
Anonymous No.106366579
>>106366567
Never upgrading unless absolutely necessary chads, where we at? Still at 12.5 here.
Anonymous No.106366585
forever cuda 12.4
Anonymous No.106366590
>>106366567
>old software
>new version
>better, faster
>modern software
>more workarounds, slower, more bugs
Why is this?
Anonymous No.106366637
>>106366531
My experience so far has been rather positive. Python is bloatware, so what.

One way or another you will have to isolate your installation in a separate venv. I had cases where one module installed torch 2.8.0 only for another module to downgrade it to 2.7-something, and the whole thing did not fly because of this.

Python's native venv would do the trick as well
Anonymous No.106366642 >>106366720 >>106366897
https://github.com/LostRuins/koboldcpp/releases/tag/v1.98
Anonymous No.106366658 >>106366659 >>106366691
>>106366493
Thanks, I'll take a look at it.
As fun as python can be for retards I hate the bloat. Install one thing and you'll need 20 extra modules.
Anonymous No.106366659
>>106366658

Godspeed, anon
Anonymous No.106366660
>>106366559
Alpaca.
Anonymous No.106366681
>>106366567
windows doesn't have this problem
Anonymous No.106366691 >>106366716 >>106366741 >>106366775
>>106366658
How is this bloat? Those modules are explicitly required by the module you need. Sometimes, rarely, modules will require more than they actually need (like requiring stuff for training when you just need to infer), but that's not a Python "bloat" problem, just the authors of those modules being careless.
Anonymous No.106366716 >>106366751
>>106366691
>How is this bloat?

if you create a venv, all stuff needed will be copied in there. You'll end up with multiple copies of torch etc.
Anonymous No.106366720 >>106366763
>>106366642
Alright how do I into thinking budget for seed 36b?
Anonymous No.106366725
>>106365797
I agree with this. ERNIE-4.5 isn't that impressive as a text model.
Anonymous No.106366732 >>106366853 >>106366855 >>106366906
https://github.com/ggml-org/llama.cpp/pull/15524
AMD prompt processing speed more than doubling on vulkan for MoE models
Anonymous No.106366741 >>106366751
>>106366691
The only design decision of python that I think is really bad is allowing a >= version specification in the requirements. It's the only way you can get an error under normal circumstances
Anonymous No.106366751 >>106366786 >>106366795
>>106366716
OK, I agree that this can be seen as bloat, but if you make symlinks to already installed stuff instead, edits in one venv can affect another one. Choosing between the two, separation and disk space, I'd choose to spend the disk space.

>>106366741
>different OSes
>different python versions
>also implying author won't just reupload a different thing under the same version
Anonymous No.106366763
>>106366720

It's all in the system prompt, anon. Just add it to your system prompt in this format:

"You are an intelligent assistant with reflective ability. In the process of thinking and reasoning, you need to strictly follow the thinking budget, which is {{thinking_budget}}. That is, you need to complete your thinking within {{thinking_budget}} tokens and start answering the user's questions."
Anonymous No.106366767 >>106366938
Thoughts on JEPA?
Anonymous No.106366775
>>106366691
I meant in general, dumbass.
Anonymous No.106366786
>>106366751
>different OSes
>different python versions
could be useful
>also implying author won't just reupload a different thing under the same version
what kind of idiot would do this
Anonymous No.106366795
>>106366751
>OK I agree that this can be seen as bloat, but if you make symlinks to already installed stuff instead, edits in one venv can affect another one. Choosing between two separation and disk space, I'd choose disk space.

This unironically
Anonymous No.106366836
>i'm a nigbophile
Anonymous No.106366853
>>106366732
so many delicious pulls blue balling me
Anonymous No.106366855
>>106366732
And that's a good thing, since rocm is fucking dead in the water
Anonymous No.106366897 >>106366901
>>106366642
The maintainer wants to fuck shortstack dragon lizards.
Anonymous No.106366901
>>106366897
If you don't then you don't belong here
Anonymous No.106366906 >>106366927 >>106366930
>>106366732
man the radeon vii is so funny, they made a consumer gaming card with insane memory bandwidth, twice as much vram as anyone needed, but disappointing gaming performance. there was literally no good use case for the thing on release, but if they did an experiment like that today they'd have made the perfect local llm card for people who don't want to build a home datacenter
Anonymous No.106366927 >>106366941 >>106366978
>>106366906
Why does AMD manage to suck so much at GPUs (besides the obvious cousin at nvidia) compared to CPUs? They should just sell off the GPU division to someone competent.
Anonymous No.106366930
>>106366906
AMD has since become 100% controlled opposition, so that'll never happen. You think it's a coincidence that AMD drops out of high-end at the same time that nvidia releases their worst generation of cards, ever?
Anonymous No.106366938 >>106367082
>>106366017
>pic related
>>106366767
The currently released JEPA models like V-JEPA 2 are primarily for robotics and still need improvements before they can actually be used for serious tasks. The LANG-JEPA model that the regular Joe would care about and use is not ready and there aren't any regular updates on the status of that model.
Anonymous No.106366941
>>106366927
The US government should buy it
Anonymous No.106366969
Those retards are basically the ollama of quanters. I have no idea what they are doing with this shit, but you can quant all attention layers to at least 5bpw, shared experts to 4.5bpw, and routed experts to 2.1-2.3bpw and get the same file size.
Anonymous No.106366978
>>106366927
>They should just sell off GPU division to someone competent.
It's not native to them. They just bought ATI.
Anonymous No.106367082 >>106367086
>>106366938
lecunny btfo by an auto-regressive llm
Anonymous No.106367086
>>106367082
btw this is a fresh chat
there is no previous context
Anonymous No.106367088 >>106367112
>>106366518
>we are 2/3 through the year
>nobody, not even proprietary, has figured out how to do long(1M+) context properly
it's over
Anonymous No.106367112 >>106367125 >>106367141 >>106367143 >>106367157 >>106367178 >>106367190
>>106367088
Consider the following: There are no 1M token long documents to train on.
Anonymous No.106367123
>>106366345
>2023->2024
Llama1->Miqu
>2024->2025
Miqu->R1

This gives me hope for the future.
Anonymous No.106367125 >>106367136 >>106367392
>>106367112
lotr trilogy is ~500k words so around ~600k tokens
Anonymous No.106367136
>>106367125
nta

count all characters and divide by 4 for English texts.

Will be different for non-English text

The Bible is 3.11m letters. idk if spaces are included
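Plugging that into the rule of thumb above (chars/4 is only a rough English average, so treat this as ballpark):

chars = 3_110_000     # the Bible, per the figure above
print(chars / 4)      # ≈ 780k tokens

So even the entire Bible doesn't get you to 1M tokens, which is kind of the point.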
Anonymous No.106367141 >>106367147 >>106367186
>>106367112
WRONG!
Anonymous No.106367143
>>106367112
Consider the following: Just concat multiple smaller documents together.
Most uses for long context are having it work on many source files from a code base, analyzing lots of web search results, or writing novels.
Anonymous No.106367147 >>106367161
>>106367141
You see those multiple colors? Those are series with multiple books, not a single document.
Anonymous No.106367157 >>106367399
>>106367112
Codebases, anon, codebases! Take for example llama.cpp.
Anonymous No.106367161
>>106367147
The context of the previous book matters for the next book.
Anonymous No.106367178
>>106367112
just give it the linux source code
Anonymous No.106367183
>>106364900
Ha! I can tell that's qwen because the dumb thing takes "can of beer" to be can + beer, ending up somehow with a transparent can showing the beer inside.
Anonymous No.106367186
>>106367141

None of them is a coherent flow of consciousness

Just some nonsensical collection of pseudoscience
Anonymous No.106367190
>>106367112
What about all the IM logs that are totally private?
Anonymous No.106367193 >>106367213 >>106367228
It's up!
https://huggingface.co/collections/sugoitoolkit/sugoillm-68aa6049fbd744558d952925
Anonymous No.106367213
>>106367193
Buy an ad nigger
Anonymous No.106367228
>>106367193
>2 months old
>gguf only
>zero information
>Qwen/Qwen2.5-32B
You can fuck right off
Anonymous No.106367304 >>106367313 >>106367326
K2 reasoner... never...
Anonymous No.106367313
>>106367304
Obsoleted by Sugoi anyway
Anonymous No.106367326 >>106367340 >>106367375 >>106368320
>>106367304
Why would you need a reasoner?
Anonymous No.106367340
>>106367326
Reasoners do creative writing better, RP better and code better.
Anonymous No.106367360 >>106367367
I find it funny when reasoning models realizes that they're wrong when writing the final output
Anonymous No.106367367
>>106367360
me at exams
Anonymous No.106367375
>>106367326
Reasoners benchmaxx better
Anonymous No.106367381 >>106367453
Under 70b bros.

What are you using to goon? I've been messing around with Gemma 3 (27b). I have no idea how, but a few months back I remember it being awful. But now (maybe my card is better, prompts are better, who knows) it seems ridiculously good. Like, actually impressive levels for ERP (good at remembering my card's details, knowing what I want the card to achieve, etc.).

Makes me wanna reconsider Qwen models now (also used to be garbage). I tried out that Qwen 30b MoE instruct one but it was just too inconsistent in quality (speed was 10/10 though)
Anonymous No.106367392 >>106367402
>>106367125
That's three documents.
Anonymous No.106367399 >>106367441
>>106367157
Consist of multiple documents.
Anonymous No.106367400 >>106367421 >>106367463 >>106369024
fuck reasoning
and fuck all the retards who think there's actual reasoning under the hood
absolutely retarded models
Anonymous No.106367402
>>106367392
Document != context
Say I want to research something and the LLM decides to use search; it could return 50 search results (50 documents), but they all have to be in the context
Anonymous No.106367419
>>106364639 (OP)
>the scent of ozone is suddenly chocked out by the sharp tang of ozone
models are devolving.... it's over...
Anonymous No.106367421 >>106367448
>>106367400
I'd probably fail that too. Why does that dog have an extra leg?
Anonymous No.106367441
>>106367399
Documents related to each other. Without one, the others don't function; therefore they should be combined into one context.
Anonymous No.106367448 >>106367481 >>106367626
>>106367421
https://www.bbc.com/news/uk-wales-68017390
and no, you wouldn't "probably fail that"; if you saw that dog you would notice something is fucking wrong with it
the point of this sort of image serving as a test, though, is that because that sorta shit isn't benchmaxxed yet, you can see the true nature of LLMs as retarded pattern matchers in action, and no amount of minutes spent in muh reasoning tokens can fix it
LLMs are absolutely incapable of reasoning. it's all a simulation that works when you test them on something that was benchmaxxed (like all the math problems that are more than 50% of the datasets used to train modern LLMs); when you're trying to make them reason about something they weren't benchmaxxed on.. you see the reasoning for what it truly is
BULLSHIT
Anonymous No.106367453
>>106367381
unironically rocinante
Anonymous No.106367463 >>106367479
>>106367400
Fuck vision in general, can't OCR, can't count, doesn't know any people/characters, and is cucked more than text
Anonymous No.106367464 >>106367550
loli footjobs....
Anonymous No.106367479 >>106367507
>>106367463
Oldschool vision is still good
https://github.com/roboflow/supervision
Anonymous No.106367481 >>106367494
>>106367448
If you give humans the same question, a decent fraction of them will give similar answers. "He told me to think hard about it, so it can't be the obvious answer."
Anonymous No.106367494 >>106367534
>>106367481
Didn't it occur to you that the "think hard" part was only added precisely because the model got it wrong in the previous round?
Anonymous No.106367507
>>106367479
Isn't that an ultralytics wrapper which is already a yolo wrapper?
Anonymous No.106367510
le sigh
they can't do it, no matter how you phrase it
Anonymous No.106367528
btw GPT-5 counts it right but thinks it's an illusion
I 100% believe the reasoning is at fault, not the vision part. LLMs can't reason about things they weren't benchmaxxed to reason about.
Anonymous No.106367534 >>106367549
>>106367494
My default assumption with any partial images is that the missing part runs counter to whatever the poster wants to say.
Not a perfect heuristic, as this case shows, but it tends to be accurate.
Anonymous No.106367549
>>106367534
>the missing part runs counter to whatever the poster wants to say.
I linked to the article that has the original picture. Any nigger can go and test it for themselves if they think I'm bullshitting the prompts. The unbelievable thing is how much faith people put in these bullshit generators. In a just world scam altman would get his comeuppance for the mountain of lies he spouted to scam investments.
Anonymous No.106367550
>>106367464
foot lolipops
Anonymous No.106367557 >>106367595
mr cunn will save us right after he defeats wang in a battle to the death for zucks gpus
Anonymous No.106367573
>>106364785
There is no use. Don't even know why they are released. Why are you here? You like posting proprietary/IP/personal data to someone else that you do not know? You like someone else fidgeting with your settings and not even revealing what they did? Your model + your info = your control. Not your server, not your data.
Anonymous No.106367586
>>106364945
Comput3 has B200s now.
Anonymous No.106367595 >>106367609 >>106367636
>>106367557
>mr cunn
>wang
>zuck
Anonymous No.106367605
>>106365617
Tell it to be brief, simple as.
>If you don't have 10 custom GPT/json files, ngmi lads.
Anonymous No.106367609 >>106367614
>>106367595
Stop doxxing meta GenAI employees
Anonymous No.106367614
>>106367609
You can't dox public figures
Anonymous No.106367626
>>106367448
Yeah, if you saw it in real life. Not just a random picture.
LLMs won't have the kind of human level reasoning needed to truly hit AGI until they start getting slapped into robots with pseudo-senses to experience the real world.
Anonymous No.106367636
>>106367595
Dam Son...
Anonymous No.106367644 >>106367738
>>106366518
Imagine if we ever got a Sonnet 3.5 leak. Man. Of course that would never happen.
Anonymous No.106367738
>>106367644
Sonnet was shit until 3.7 though
Anonymous No.106367777
claude is overrated
I'd rather a Gemini leak
Anonymous No.106367802
dense :<
moe :3
Anonymous No.106367860 >>106367865 >>106367886 >>106367931
Why is Gemma 27b so much slower on my 4090 than other models of similar size (Mistral Small, Qwen, even the old grandpappy Command R)?

I really don't get it
Anonymous No.106367865 >>106367884 >>106368340
>>106367860
someone explained it in the last thread: part of the model has to be run on the cpu because there's no cuda kernel for it
Anonymous No.106367884 >>106368031
>>106367865
oh shit really? Has nobody found a fix? The model is super good besides that
Anonymous No.106367886 >>106367894 >>106368318
>>106367860
In addition to what the other poster said about CUDA support, Gemma also uses a SHITLOAD of memory for context, make sure that it isn't overflowing into your RAM.
Anonymous No.106367894 >>106367918
>>106367886
MAX ram usage peaked out at 11GB (I have 32)
Anonymous No.106367918 >>106367941
>>106367894
Anon, the model alone is about 11GB at Q3. Is it filling up your entire 24GB of VRAM and putting another 11GB in system RAM? If so, that's why it's slow. Any dense model spilling into system ram will have a huge speed penalty.
Anonymous No.106367931 >>106367955
>>106367860
Gemma is a bit slow because it's thicc...
Anonymous No.106367941
>>106367918
VRAM usage didn't cross over 24GB when I fully offloaded it (at Q4 as well) with 14k context. I'm using GLM 32B right now (same size, Q4, also at 16k context now) and I'm getting around 35 t/s
Anonymous No.106367955 >>106368013
>>106367931
She does enjoy her curries
Anonymous No.106368013 >>106368029
>>106367955
I LOVE INDIA!!!!
Anonymous No.106368029
>>106368013
A country so great its biggest export is its own people
Anonymous No.106368031
>>106367884
vibe code a fix anon
Anonymous No.106368116 >>106368141 >>106368262
I found out that Apple did a Mixtral 8x7B finetune earlier this year for a research project

>Recent advances in large language models have demonstrated impressive capabilities in task-oriented applications, yet building emotionally intelligent chatbots that can engage in natural, strategic conversations remains a challenge. We present a novel approach called SAGE that uses latent variables to control long-horizon behavior in dialogue generation. At the core of our method is the State-Action Chain (SAC), which augments standard language model fine-tuning by introducing latent variables that encapsulate emotional states and conversational strategies between dialogue turns. During inference, these variables are generated before each response, enabling coarse-grained control over dialogue progression while maintaining natural interaction patterns. We also introduce a self-improvement pipeline that leverages dialogue tree search, LLM-based reward modeling, and targeted fine-tuning to optimize conversational trajectories. Our experimental results show that models trained with this approach demonstrate improved performance in emotional intelligence metrics while maintaining strong capabilities on LLM benchmarks. The discrete nature of our latent variables facilitates search-based strategies and provides a foundation for future applications of reinforcement learning to dialogue systems, where learning can occur at the state level rather than the token level.
https://arxiv.org/abs/2503.03040
apple/sage-ft-mixtral-8x7b

I only tried the Q2 which is naturally dumb and crazy, but the writing style has a flair unlike anything else. It's worth further testing.
Anonymous No.106368141 >>106368177
>>106368116
>but the writing style has a flair unlike anything else
Just means it has a different slop profile. Mixtrals are incredibly dumb by modern standards, may as well use Nemo.
Anonymous No.106368172 >>106368180 >>106368185 >>106368191 >>106368225
How can I make prompt processing faster on a CPU only server without upgrading hardware?
Anonymous No.106368177
>>106368141
Maybe our local finetuners can make use of Apple's research to improve their tune.
Anonymous No.106368179
>>106365699
glm air at a low quant mogs both
I can't go back to sub 70b models now
Anonymous No.106368180
>>106368172
call api on cloud
Anonymous No.106368185 >>106368221
>>106368172
>prompt processing
>on a CPU
oof
go dig up a 1060 out of the trash or something, it's still going to be faster
Anonymous No.106368191
>>106368172
ngram speculative decoding might help
Anonymous No.106368200
How big does a GPU for prompt processing need to be?
Does it need to be able to hold the model?
Anonymous No.106368221 >>106368233
>>106368185
not them but i do have a 1070ti how much am i gonna get from that
Anonymous No.106368225
>>106368172
Prefix caching on vllm
Anonymous No.106368233
>>106368221
about tree fiddy
Anonymous No.106368262 >>106368519
>>106368116
Is it safe?
Anonymous No.106368275 >>106368305
Mistral Small finetune tier list

Worth using
>TDE
>Magnum Diamond

Decent
>Any of Gryphs ones (Pantheon dude)
>Cydonia (I find them the worst out of the commonly used ones, but people must like em for a reason so I'll mention them. They're just unremarkable to me and seem to handle the things that Drummer proudly highlights about his models worse than the others)

Avoid
>Any Drummer shit
>That Broken Tutu garbage
Anonymous No.106368301 >>106368313 >>106368322 >>106368331 >>106368344
Mistral Small finetune tier list

Worth using
>

Decent
>

Avoid
>Mistral Small finetunes
Anonymous No.106368305
>>106368275
>Any Drummer shit that's not his basic Cydonia model (I tried the R1 Cydonia and it sucked)
Anonymous No.106368313 >>106368341
>>106368301
Name a better small local model for ERP

>inb4 reasoning models
>inb4 Qwen or QWQ

The only one that outright beats it in ERP is Gemma, and those models have far more issues that outweigh the positives.
Anonymous No.106368318 >>106368340 >>106368350
>>106367886
>Gemma also uses a SHITLOAD of memory for context
1/ with iSWA turned on, it uses less than any other model (maybe gpt-oss uses less? it's the only other model I've seen with iSWA and it did use very little for context)
llama.cpp should have that by default (--swa-full is a turn-the-feature-off flag)
2/ of the Gemma 3 family, ironically, 27B uses less VRAM for its context than 12B (attention head size differences), so having a GPU that can fit 27b feels like getting the better deal over running the smaller gemma
Anonymous No.106368320 >>106368465
>>106367326
reasoning can be good for rp consistency and more realistic responses (outside of erp though)
Anonymous No.106368322 >>106368341
>>106368301
Fuck off
Anonymous No.106368331
>>106368301
trvke
Anonymous No.106368339
>>106365635
>proprietary
mistral y u do this
Anonymous No.106368340
>>106368318
this makes more sense to me >>106367865

It's just painfully slow on kobold at least
Anonymous No.106368341 >>106368381
>>106368313
I'm not saying small is bad, I'm saying that the finetunes are garbage. At the best of times they're a placebo and at other times they just cut the model's intelligence in half. It takes very little effort to make it write a sex scene, or any depraved shit you can think of. It doesn't need a finetune.
>>106368322
*smooch*
Anonymous No.106368344
>>106368301
Mistral models worth using:
Anonymous No.106368350 >>106368372
>>106368318
Fucking with SWA means no contextshift, so fuck that.
Anonymous No.106368366 >>106368508
>>106365635
redpill me on medium 2508.

How big's it gonna be (around the 30B range, or 70B, seeing as Large is like 120B)?

I swear Mistral is the only thing keeping achievable LLMs alive at this moment. Now everyone jerks off over 3 t/s q2 MoE garbage that they still need a server to run
Anonymous No.106368372 >>106368395
>>106368350
>means no contextshift
why would you want this cancer? I turn it off
if you run out of context summarize and start anew, or delete some messages in the history
Anonymous No.106368381 >>106368407
>>106368341
See, I don't get it when people say this. I've tried them all, non-finetunes too, and there's a clear difference to me and many others, which is why people even waste time on them.

Such a weird thing to be contrarian over when we can all download them for free and see for ourselves and literally compare outputs between the models.
Anonymous No.106368385 >>106368407
what does it mean when you swipe multiple times and get virtually the exact same fucking response? What samplers do I change to fix this?
Anonymous No.106368395
>>106368372
>why would you want this cancer?
So I don't have to summarize all the fucking time and have small details constantly being lost.
Anonymous No.106368407 >>106368453
>>106368381
Yeah, I've done just that and there's basically none that resulted in a better experience.
>>106368385
Assuming your other sliders are mostly neutral, Temperature
But as you increase it the model becomes gradually more retarded
Anonymous No.106368453 >>106368462 >>106368471 >>106368491
>>106368407
>increasing temperature makes the model more retarded
So these are the geniuses behind the brigade against finetunes.
Anonymous No.106368462 >>106368565
>>106368453
>no argument
as usual
Anonymous No.106368465
>>106368320
It's not. Just compare V3 to R1. R1 loses the plot fast when it's time for some spatial reasoning.
Anonymous No.106368471 >>106368485
>>106368453
>finetrooner doesn't know how the most basic sampler works
Anonymous No.106368485 >>106368498 >>106368505
>>106368471
It adds creativity.
Anonymous No.106368491
>>106368453
>increasing temperature doesn't make the model more retarded
so these are the geniuses defending sloptunes
Anonymous No.106368498 >>106369231
>>106368485
it adds retardation
there's a reason why all recommendations from all model trainers is to use a lower temperature for coding
Anonymous No.106368501 >>106368517
God people have become so fucking awful.
People deserve the inevitable pajeetification of the world.
Anonymous No.106368505
>>106368485
lol
Anonymous No.106368508 >>106368693
>>106368366
Medium is almost certainly another MoE in the 100-200B range
They will also never release it, unless another Miqu happens
Anonymous No.106368515 >>106368559
I did it yay
Anonymous No.106368517
>>106368501
>doesn't quote the posts he thinks are stupid because he's afraid of being called out
coward
Anonymous No.106368519
>>106368262
From its model card
>Not suitable for high-stakes, safety-critical deployment without further evaluation

Also the paper does not say they did anything for safety
Anonymous No.106368538 >>106368547 >>106368548 >>106368571
I just realized fuckhuge MoE's have another huge advantage - they reduce the popularity of finetroons and the population of finetrooners.
Anonymous No.106368547
>>106368538
That's why they keep tuning the same old mistral and qwen models over and over again.
Anonymous No.106368548
>>106368538
extremely based indeed
Anonymous No.106368559 >>106368643
>>106368515
Good job
Anonymous No.106368565 >>106368593 >>106368598
>>106368462
You never presented one.

But go on.

Explain what temperature even does and then directly point the link to "retardation".

Your word, not mine. I'll wait and I will keep reminding you of your post when the goalpost shift inevitably comes. Here's your initial comment:

>Assuming your other sliders are mostly neutral, Temperature
>But as you increase it the model becomes gradually more retarded

You guys have no fucking idea how models work yet you speak with so much authority on them, it's laughable.

>*stubs out his high temperature cigar ashes on your 5ft2 frame*
Anonymous No.106368571 >>106368583 >>106368678
>>106368538
That doesn't make sense. The only people that use finetunes are people that can't run huge models and hope the tune makes the small model better.
Anonymous No.106368583
>>106368571
>listening to people who edge for a week while their 0.25 t/s Q2 MoE generates moon runes for them
lmao
Anonymous No.106368591 >>106368606
>>106366515
Are you using base llama.cpp?
Last time I checked, you need ik_llama for GLM.
https://github.com/ikawrakow/ik_llama.cpp

Can't help you with llama args because I've always used kobold.cpp. I use a fork of kobold.cpp which includes ik_llama's optimizations and model support:
https://github.com/Nexesenex/croco.cpp

The relevant settings in the GUI:
GPU Layers: 999 (just a number high enough to cover every layer)
Use contextshift, fastforwarding, flashattention, quantize KV cache: 1 - q8_0
Override tensors: (this is the key)
.ffn_.*_exps.=CPU
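If you'd rather stay on plain llama.cpp, the rough equivalent of those settings is something like this (a sketch; flag spellings drift between releases, so check llama-server --help, and the model path and context size are placeholders):

./llama-server -m GLM-4.5-Air-Q3_K_S.gguf -ngl 999 -c 16384 -fa -ctk q8_0 -ctv q8_0 -ot ".ffn_.*_exps.=CPU"

The -ot/--override-tensor regex is the important part: it keeps the routed expert tensors in system RAM while everything else goes to the GPU.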
Anonymous No.106368593
>>106368565
>Explain what temperature even does
it controls how random the next token choice is (choice which get filtered by other samplers first like top_k, top_p, min_p and whatever UnIQuE snake oil troonery du jour you prefer)
high temperature makes the less likely picks more likely to happen, and since LLMs are pattern matchers without any real intelligence, the less likely pick can be something incredibly stupid
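To make it concrete, a toy sketch of what the temperature knob does to the token distribution (pure Python, made-up logits for three candidate tokens):

import math

def token_probs(logits, temperature):
    # divide logits by T before the softmax; T > 1 flattens the distribution
    # (unlikely tokens gain probability), T < 1 sharpens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]            # toy logits
print(token_probs(logits, 0.7))     # peaked: the top token dominates
print(token_probs(logits, 1.0))     # the model's native distribution
print(token_probs(logits, 2.0))     # flat: the dumb picks get real probability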
Anonymous No.106368598 >>106368711
>>106368565
Ask your temp=5 model to educate you, I don't care if you stay retarded.
Anonymous No.106368606
>>106368591
>Last time I checked, you need ik_llama for GLM.
That was never the case. In fact, GLM ran like shit on ik_ for the longest time because they had something about their attention fucked up that made GLM run horribly.
Anonymous No.106368625 >>106368635
ik_llama.cpp bros we fucking won
Anonymous No.106368631 >>106368653
ik is a sad meme that is diverging more and more from mainline llama.cpp which in turn will make it more difficult to merge new model and feature support in
this drama whore needs to go back to obscurity where it belongs
Anonymous No.106368635
>>106368625
That's actually pretty dope.
Anonymous No.106368643 >>106368658
>>106368559
I'm tired. I must remember to never argue with "AI". Bullshit technology.
Anonymous No.106368647 >>106368664 >>106368665 >>106368800
Can a 24GB VRAM + 32GB RAM (DDR5) run GLM Air.

Kinda wanna try it out
Anonymous No.106368649 >>106368659
God I can't wait until the Israel/Iran war resumes.
Anonymous No.106368653
>>106368631
maybe once llama.cpp doesn't have shit prompt processing speed on deepseek and other mla models
Anonymous No.106368658
>>106368643
scam altman and dario really convinced much of the world that these things are something they aren't and will never be
the average tech bro is so buckbroken by this shit it's unreal
LLMs are like a religion
Anonymous No.106368659
>>106368649
What does that have to do with LLMs
Anonymous No.106368664 >>106368695
>>106368647
>32gb ram
no, non-retarded quants (q4 and up) only fit on 64gb
Anonymous No.106368665 >>106368675 >>106368695 >>106368735
>>106368647
Q3KM is less than 50gb, so yeah I guess.
It'll be really fucking fast too.
Anonymous No.106368675 >>106368682
>>106368665
>It'll be really fucking fast too.
Not with half of it offloaded to dual channel ram
Anonymous No.106368677
is there a perplexity chart for GLM air, including the trannygram goofs?
Anonymous No.106368678
>>106368571
I would think that when you see the cancerummer release mistral small shittune number 49 you either give up on hobby or buy more ram and run 235B Q2 (it is objectively better than everything drummer shat out). But even in the scenario you presented drummer has a huge problem if he doesn't get any new models to slap his label on. That problem is that people start to notice his grift is worthless. Even the most retarded people will realize finetrooning does nothing after they use like 4-5 finetroons of the same model. The only reason drummer exists is he gets to slap his label on new models that get released and people confuse the base model being different with what drummer did. Btw he should die in a fire.
Anonymous No.106368682 >>106368728
>>106368675
It's a MoE with not that many activated parameters. And in his case almost half of the experts will be in vram.
It'll be fast as fuck.
Anonymous No.106368693
>>106368508
another MoE in the 70b-100b range instead of more small 20b slop or fuckhuge monster models requiring a whole server to run would be nice
too bad nobody cares about the medium tier anymore
Anonymous No.106368695 >>106368710 >>106368751
>>106368664
Would a Q3 be more retarded than the models I usually use (mistral small, nemo)?
>>106368665
Q3KM is 56 GB (my max memory is 56)
Anonymous No.106368710 >>106368721
>>106368695
Q3KS is between 48 and 49 gbs IIRC.
And the last layer of the GGUF is ignored, so that's a couple hundred MBs that don't get loaded into memory.
Anonymous No.106368711 >>106368729 >>106368736
>>106368598
I did it for the keks (topk=20 to keep it somewhat on the rails): https://rentry.org/dpavrps2
enlightening
Anonymous No.106368721
>>106368710
yea that's the one I figured you meant (using that now).

It's 48GB.

Well, i'll test. Really curious to see how it runs (if it runs).
Anonymous No.106368728 >>106368748
>>106368682
"as fuck" seems to translate to approximately 10 t/s or so for moetards
Anonymous No.106368729
>>106368711
High temp + low topk is the way of kings.
Anonymous No.106368735
>>106368665
>really fucking fast
As soon as you reach a decent context size (say 16k) your fast as fuck turns into 1-3tk/s which is excrutiating.
I really do think anyone that tells people to use these big moes either has the patience of a saint or doesn't actually use them.
Anonymous No.106368736
>>106368711
>https://rentry.org/dpavrps2
sounds about as coherent as the finetrooner high temperature enjoyer we were arguing with
maybe his brain is high temperature too
Anonymous No.106368748
>>106368728
That's faster than typical reading speed.
Anonymous No.106368751 >>106368763
>>106368695
consider that you also need VRAM for the context bruh
Anonymous No.106368763
>>106368751
Yea I know, I'm more just curious to try it; if it doesn't work then it's back to my good old sloptunes if need be
Anonymous No.106368765
I used to think top_p (or actually min_p but I got over this meme) at 5% or more is a lot. But if you think about it that means it only works for 1 out of 20 tokens at most.
Anonymous No.106368770 >>106368780 >>106368802 >>106368814
My favorite way to play with OSS models is to fire up a high powered cloud instance for a couple hours and it costs like $10 total. It's cheaper than spending $100k on the machine I'm using, and it's cheaper than spending tokens on model providers.

Is this basic linux system administration too far beyond many people? Does knowing basic ssh make you skilled in 2025?
Anonymous No.106368780
>>106368770
Are you dumb? Just pay for the official API
Anonymous No.106368800 >>106368951
>>106368647
I have the same amount of memory and I can fit iq3_xs on it. Even at that quant it still beats everything else I've used. Maybe some 70Bs come close but I haven't tried enough of them yet.
Anonymous No.106368802 >>106368824
>>106368770
>a couple hours
>$10
>cheap
What? That's like 10 times more expensive than straight up paying for fp8 K2. You're running it on somebody else's computer anyway.
Anonymous No.106368814 >>106368849
>>106368770
>and it's cheaper than spending tokens on model providers.
You might want to check those prices again.
Anonymous No.106368824
>>106368802
>You're running it on somebody else's computer anyway.
Exactly.
At that point, just pay for deepseek's API.
Anonymous No.106368849 >>106368857 >>106368859 >>106368889
>>106368814
What happens when OSS models are good enough for every use, so paying for APIs becomes worthless, and the AI infra buildout has made GPU compute time worth zero because there's too much excess capacity? How does anybody make money in AI after that?
Anonymous No.106368857
>>106368849
As long as "bigger is better" holds true, that will not happen. We have "good enough" compared to 2 years ago, but we still want better
Anonymous No.106368859
>>106368849
Burn GPUs until prices return to profitable levels again
It's what they do with dairy products
Anonymous No.106368863 >>106368891
I tried running Qwen3-30B-A3B-Thinking-2507_Q8 on ollama and llama.cpp (with CLI for minimal overhead) to test which is faster and ollama is 3-4 times faster than llama.cpp, which is unexpected given I've heard llama.cpp is either just as fast or faster. Any ideas? I tried playing with settings a little (number of threads, number of GPU layers, context length, etc.) and I can never get llama.cpp to be as fast as ollama. I'm not sure what I have to tweak exactly. If llama.cpp is faster in theory, I'd like to switch to it, but it's clearly not the case here.
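For anyone else hitting this: the two usual culprits for a gap that big are a llama.cpp build without GPU support, or layers not being offloaded (ollama picks a layer split automatically, while llama-cli/llama-server only offload what you pass via -ngl). A minimal sketch of the same check through the llama-cpp-python bindings, with a placeholder model path:

from llama_cpp import Llama  # assumes a GPU-enabled build of llama-cpp-python

llm = Llama(
    model_path="Qwen3-30B-A3B-Thinking-2507-Q8_0.gguf",  # placeholder path
    n_gpu_layers=-1,   # -1 = offload as many layers as possible
    n_ctx=8192,
)
out = llm("Write a haiku about llamas.", max_tokens=32)
print(out["choices"][0]["text"])

If the tokens/s here roughly matches ollama, the difference was offload settings rather than the engine itself.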
Anonymous No.106368889 >>106368942 >>106369543 >>106369560
>>106368849
Your entire premise is retarded.
Only a small number of hobbyists and researchers use OSS models to begin with. And almost nobody is running models like K2 locally. Average people aren't going to stop paying for Claude and Gemini APIs en masse because they can spend hours in the terminal to get a brain-damaged version running slowly with an OSS model.
Even if API usage declines due to an AI bubble pop, excess compute is the opposite of worthless. That's AWS and Azure's entire business model.
Anonymous No.106368891 >>106369140
>>106368863
Keep using ollama until you decide you no longer want to be a retard when it comes to this hobby.
Anonymous No.106368907 >>106368948
re: high temperature discussion, it reminds me of one of the more interesting model capabilities I have ever seen where llama 405b could deliberately output a completely random token sequence like you would see when using a really high temp (including those placeholder control tokens which it had realistically never seen before, like it was actually randomly sampling) and then at will pull itself out of it and regain perfect coherence. it could do this at perfectly normal temps
I think I have logs of this somewhere but I saw someone vouch for it on xitter before too. it's possible to push other models into random token mode but I've never seen one able to slip so easily in and out of it at will.
Anonymous No.106368942 >>106369018 >>106369250
>>106368889
There are vast industries that AI can't touch yet because they can't exfiltrate data to third parties.
Lawyers and doctors with client and patient info come to mind.
On prem AI solutions are the future, there's a massive market for making local models good.
Would you trust Sam Twinkman with your medical info?
Anonymous No.106368948
>>106368907
Sounds like sentience.
Anonymous No.106368951 >>106369109
>>106368800
When you say exact memory, you mean both VRAM and RAM right? What t/s you getting and at what context? How long can your chats go before the memory becomes an issue (i.e you need to summarize and flush the chat)

Cheers
Anonymous No.106369018
>>106368942
Or the AI companies will successfully lobby for changes to the relevant laws so they can vacuum up that data, too.
Anonymous No.106369024
>>106367400
>machine
>optical illusion
????????
Anonymous No.106369109 >>106369245 >>106370067
>>106368951
Just try it out, anon. Most people in this thread ask a model "how many 'r's in Strawberry", laugh at the result and then go back to Openrouter or an API.
As someone with 24gb vram+64gb ram I don't think it's usable beyond a few k tokens.
Anonymous No.106369140
>>106368891
Everyone's gotta learn somewhere
Anonymous No.106369231 >>106369436
>>106368498
I hope that's bait. You can't possibly be that retarded
Anonymous No.106369245 >>106369293
>>106369109
> I don't think it's usable beyond a few k tokens.
What the hell does that mean lmao
Anonymous No.106369250
>>106368942
There was a video about some doctor using chatGPT to write summaries from patient files...
That was a while ago.
Anonymous No.106369275 >>106369344
>>106364866
which model do you use for that anon?
Anonymous No.106369293 >>106370067
>>106369245
Context, anon. A few thousand tokens worth of filled up context.
Anonymous No.106369302 >>106369347
is the lazy getting started guide in OP still relevant? it's from 2024
Anonymous No.106369344
>>106369275
Mistral.
It's the most knowledgeable model when it comes to Western media.
And if it's lacking anything, you can always feed it a page or 2 with all the details and information
Anonymous No.106369347 >>106369357 >>106369574
>>106369302
Yes nothing changed
Anonymous No.106369357
>>106369347
thank you
Anonymous No.106369364
>>106366501
i miss /bcsg/
Anonymous No.106369436
>>106369231
I can never be as retarded as the high temperature enjoyer
Anonymous No.106369484 >>106369533 >>106369557
When doing math or coding, is there any reason why a sampler that always picks the top option wouldn't be ideal?
Anonymous No.106369533
>>106369484
Sometimes it's wrong there and right on a subsequent try. Just had this happen today.
Anonymous No.106369543
>>106368889
I suspect that OS models will make things difficult for the big Western AI companies short term, but that's more in the sense of other services being able to use those models to offer their services for a lot cheaper than the big boys. Hell, OpenAI and Google have already started doing it, and Anthropic is probably going to have to sooner rather than later
I think there is going to be a point where hardware advances to the point API services becomes antiquated and everyone can and does run everything locally, but that's true endgame for after we've hit the wall, and probably not within the next 10 years at least
Anonymous No.106369557 >>106369719
>>106369484
In theory that would be the correct approach for anything that needs 100% accuracy, assuming the training is on par, but in practice it degrades the model's output
>https://arxiv.org/html/2407.10457v1
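For reference, "always picks the top option" is just argmax over the next-token distribution instead of drawing from it; a toy sketch with made-up logits:

import math, random

logits = [2.0, 1.5, 0.3, -1.0]            # made-up next-token scores
m = max(logits)
probs = [math.exp(x - m) for x in logits]
total = sum(probs)
probs = [p / total for p in probs]

greedy = probs.index(max(probs))                                   # deterministic: always token 0 here
sampled = random.choices(range(len(probs)), weights=probs, k=1)[0] # varies run to run
print(greedy, sampled)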
Anonymous No.106369560 >>106369577
>>106368889
I suspect that OS models will make things difficult for the big Western AI companies short term, but that's more in the sense of other services being able to use those models to offer their services for a lot cheaper than the big boys. Hell, OpenAI and Google have already started cutting their costs to their bone, and Anthropic is probably going to have to sooner rather than later
I think there is going to be a point where hardware advances to the point API services becomes antiquated and everyone can and does run everything locally, but that's true endgame for after we've hit the wall, and probably not within the next 10 years at least
Anonymous No.106369574 >>106369592
>>106369347
one more question, can I still use nemo 12B with an AMD card or do I have to use something else?
Anonymous No.106369577 >>106369664
>>106369560
That's what you hope the endgame will be. The trend I see is towards more always-online thin clients and everything as a subscription service.
Anonymous No.106369592 >>106369610
>>106369574
or if anyone could suggest a similar model which would work with AMD? It's 16gb
Anonymous No.106369610 >>106369617
>>106369592
Why would you think you can't use nemo on an amd card?
Anonymous No.106369617 >>106369653
>>106369610
I'm low IQ and read Nvidia in the name but I just realized it's because they made it. Please excuse me I am stupid. Now I'm trying to figure out how to download the model from HuggingFace. Pay me no mind.
Anonymous No.106369653 >>106369665
>>106369617
This just triggers my feeding fetish. Also if you're just looking to coom, maybe try a finetune instead of just vanilla nemo
Anonymous No.106369664
>>106369577
I do, just because there are too many reasons to think the "one centralized provider" approach isn't going to work out in the long run. It's already debatable how much these companies actually make, and the network strain as more and more applications become AI-focused is unimaginable
I think there could be AI focused applications that are API based too (like now) but those are going to be the products, not the raw materials
Anonymous No.106369665 >>106369696 >>106369718
>>106369653
Yes that is precisely my intention. Any suggestions? And would you like to.. feed me directions towards how I could "download the gguf" I would love you long time indeed.
Anonymous No.106369696 >>106369728
>>106369665
https://huggingface.co/bartowski/TheDrummer_Cydonia-24B-v4.1-GGUF/resolve/main/TheDrummer_Cydonia-24B-v4.1-Q4_K_M.gguf

Is a pretty popular one. But Q4_K_M plus context may be too big for your 16 GB card if you're running entirely off it. Replace Q4_K_M with IQ4_XS in the URL if that's the case.

I don't really use the small models, so I don't know what's good. If you can, try running a medium-sized MoE instead. It'll be slower depending on your hardware, but I think 10 tokens/s is achievable on most people's computers, and that's good enough for that use case.
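If grabbing it through a browser is annoying, the huggingface_hub package can pull it too (repo and filename taken straight from the URL above):

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/TheDrummer_Cydonia-24B-v4.1-GGUF",
    filename="TheDrummer_Cydonia-24B-v4.1-Q4_K_M.gguf",
)
print(path)  # local cache path to point kobold/llama.cpp at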
Anonymous No.106369718 >>106369728
>>106369665
https://huggingface.co/bartowski/Rocinante-12B-v1.1-GGUF/resolve/main/Rocinante-12B-v1.1-Q8_0.gguf

And that's a nemo based finetune.
Anonymous No.106369719 >>106369762
>>106369557
Did you even read the paper?
>In summary: 1) Greedy decoding generally proves more effective for most tasks. 2) In the case of AlpacaEval, which comprises relatively simpler open-ended creative tasks, sampling tends to generate better responses.
Anonymous No.106369728 >>106369744
>>106369696
>>106369718
<3 thank you oldGOD
I guess I'll try IQ4_XS first.
Is that last one better?
Anonymous No.106369744
>>106369728
The first one is a bit better but will be slower on your machine
Anonymous No.106369762
>>106369719
they have high temperature brains
we must forgive them
Anonymous No.106369768 >>106369848
>it's 2075
>Sam Altman asks investors for another trillion dollars because AGI is two weeks away
>GPU poorfags are still rping on Nemo because nothing better has been developed so far for smaller models
>GPT-43-omni writes a bestseller book, only two shivers down the spine per page
the future is bright
Anonymous No.106369773 >>106369781 >>106369786
why the fuck is kobold trying to connect to the internet?
Anonymous No.106369781
>>106369773
tool calling
Anonymous No.106369786 >>106369791
>>106369773
to upload your logs to the archive we all read
Anonymous No.106369791 >>106369816
>>106369786
STOP
Anonymous No.106369816
>>106369791
>ah ah mistress
anon...
Anonymous No.106369820 >>106369862
ok so I have it loaded in kobold, how do I plug it into sillytavern? you guys are so cool btw, I bet you guys are awesome irl
Anonymous No.106369823 >>106369833 >>106369834
Is there a use case for 2-bit and 1-bit quants?
It was pretty interesting to see a 27B model run on my older laptop with 8 GB RAM and 4 GB VRAM at 0.1 t/s
Anonymous No.106369833 >>106369888
>>106369823
2bit DS is still usable for RP and maybe translations.
Anonymous No.106369834
>>106369823
Bragging on /lmg/
Anonymous No.106369848
>>106369768
The catch - it's not GPT-43, it's GPT-4.3
Reception to 5 was so bad that they never went above 4.9 again
Anonymous No.106369849 >>106369872
bitch who's this adam guy you kept mentioning
Anonymous No.106369861
>>106369841
>>106369841
>>106369841
Anonymous No.106369862 >>106369879
>>106369820
Anonymous No.106369872
>>106369849
Adam Wang
Anonymous No.106369879 >>106369898
>>106369862
connrefused?!?!
love u tho
ill ask grok too
Anonymous No.106369888
>>106369833
>translations
Definitely not. If you just want the gist of things it's okay, I guess.
Anonymous No.106369898 >>106369942
>>106369879
Change llama.cpp to kobold bro
Anonymous No.106369942 >>106369947
>>106369898
ya I did, port was wrong too
we are so back !!!!
Anonymous No.106369947 >>106369957
>>106369942
>we
How many of you are behind that keyboard?
Anonymous No.106369957 >>106369974 >>106370038
IT WORKS
but...
>>106369947
i'm the fed??? you guys tell me to use this model and I'M the fed???
Anonymous No.106369974 >>106369985
>>106369957
Keep in mind most finetuners don't bother with unaligning assistant-related tasks. You'll want to do it in a rp-style way.
Anonymous No.106369985 >>106370005 >>106370151
>>106369974
I have to roleplay with my graphics card to convince it to say nigger?
Anonymous No.106370005
>>106369985
Foreplay is important.
Anonymous No.106370016
this IQ4 XS model runs pretty speedy on my 7900 GRE, any recommendations for something more lewd and less sterile?
Anonymous No.106370038 >>106370047
>>106369957
You don't want something like this, do you anon?
Anonymous No.106370047
>>106370038
plz anon
Anonymous No.106370067
>>106369293
>>106369109
I've loaded it up.

How much should I offload to the GPU and how much to the CPU in kobold?
Anonymous No.106370151
>>106369985
All AI use is roleplaying. Online models are just roleplaying as a really boring character called an AI assistant.
Anonymous No.106370898
>>106366567
What's the point in updating something that works?