Thread 106358752

345 posts 82 images /g/

Anonymous 8/23/2025, 5:55:42 PM No.106358752 [Report] >>106358780 >>106360462 >>106362544

/lmg/ - Local Models General

1743052378919146.jpg md5: e709f99d...

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106351514 & >>106345562

►News
>(08/21) Command A Reasoning released: https://hf.co/CohereLabs/command-a-reasoning-08-2025
>(08/20) ByteDance releases Seed-OSS-36B models: https://github.com/ByteDance-Seed/seed-oss
>(08/19) DeepSeek-V3.1-Base released: https://hf.co/deepseek-ai/DeepSeek-V3.1-Base
>(08/18) Nemotron Nano 2 released: https://research.nvidia.com/labs/adlr/NVIDIA-Nemotron-Nano-2
>(08/15) Ovis2.5 MLLMs released: https://huggingface.co/collections/AIDC-AI/ovis25-689ec1474633b2aab8809335

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 8/23/2025, 5:56:14 PM No.106358757 [Report] >>106364579

image_2025-08-15_085459386.png md5: c6093d86...

►Recent Highlights from the Previous Thread: >>106351514

--Qwen VL blocks Mao commemorative tea image due to political content moderation:
>106352603 >106352638 >106352653 >106352678 >106352695 >106352729 >106352741 >106352766 >106352778 >106352788 >106352794 >106352824 >106354537 >106354560
--GPU frequency locking affects code path performance and can't be queried:
>106351737 >106351762 >106351867 >106351875 >106351889 >106351911
--Frontend differences affecting token generation speed on same backend:
>106353506 >106353548 >106353898 >106354113 >106353905 >106354026
--Reasoning pre-fill exploits model trust bias for stronger output control:
>106354146 >106354174 >106354426 >106354778 >106354793 >106354614
--Meta partners with Midjourney, sparking criticism and speculation:
>106352643 >106352648 >106352649 >106354887 >106355765
--Avoid FP16 CUDA flags to prevent numerical overflow in quantized models:
>106356396 >106356788
--Qwen models overusing "not x but y" phrasing:
>106353981 >106353997 >106354008 >106354031 >106354058 >106354159 >106354182 >106356075
--GPU memory fault due to excessive GPU offload layers and poor memory management:
>106352359 >106352374 >106352413 >106352428 >106352463 >106352578 >106352673
--Maximize VRAM usage during fine-tuning for optimal throughput:
>106355943 >106356138 >106356180 >106356282
--Anons deploy local LLMs for gaming, finance, automation, and adult content:
>106354780 >106354986 >106355189 >106355209 >106355240
--OpenAI's India expansion mirrors past tech offshoring trends:
>106353105 >106353224 >106353263
--Seed 36B model support merged:
>106354673 >106355049 >106357911
--Illegal GPU memory access likely caused by index calculation bugs, not VRAM capacity:
>106352021 >106352040
--Copyright lawsuit accuses Meta of using pirated adult films for AI training:
>106352956
--Miku (free space):

►Recent Highlight Posts from the Previous Thread: >>106351520

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 8/23/2025, 5:58:06 PM No.106358772 [Report] >>106358795 >>106358804 >>106364258

Local AI is as good as dead if we don't get a local equivalent to Genie 3 by the end of this year.

Anonymous 8/23/2025, 5:58:41 PM No.106358780 [Report] >>106358809 >>106358832 >>106358856

>>106358752 (OP)
>Command A Reasoning released
How is it?

Anonymous 8/23/2025, 6:00:02 PM No.106358795 [Report]

>>106358772
whats the plan then genius

Anonymous 8/23/2025, 6:00:07 PM No.106358797 [Report]

loli feet

Anonymous 8/23/2025, 6:01:01 PM No.106358804 [Report]

>>106358772
How have you pushed local models in order to realize this claim?

Anonymous 8/23/2025, 6:01:39 PM No.106358809 [Report]

>>106358780
Cohere has completely committed to slopping up and safety cucking their shit.

Anonymous 8/23/2025, 6:04:55 PM No.106358831 [Report] >>106358886 >>106358890 >>106358902 >>106359465

How come ST doesn't have some simple tool calling yet that lets the model roll a die or something dynamically? Why are local models so far behind?

Anonymous 8/23/2025, 6:05:00 PM No.106358832 [Report] >>106358878

250815_blog_command-a-reasoning_Agentic-Benchmarks-1.png md5: e7b4d48b...

>>106358780
It is absolutely safe.

Anonymous 8/23/2025, 6:06:45 PM No.106358856 [Report]

>>106358780
It competes with gpt-oss

Anonymous 8/23/2025, 6:08:50 PM No.106358878 [Report]

>>106358832
You are absolutely right! Bringing safety to all is a part of my core programming.

Anonymous 8/23/2025, 6:09:09 PM No.106358884 [Report]

>>106355818
Hoping someone could patch command-a-reasoning-08-2025 into ST. Model works over the API trial key.
"thinking": {
  "type": "disabled",   # enabled by default
  "token_budget": 500   # no error on disabled, no max, unlimited when not specified
}
"message": {
  "role": "assistant",
  "content": [
    {
      "type": "thinking",
      "thinking": "stuff here"
    },
    {
      "type": "text",
      "text": "final response here"
    }
  ]
}
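
For anyone wiring this into a frontend, here is a minimal sketch of how the two fragments above could be handled. It only uses the field names quoted in the post ("thinking", "type", "token_budget", and the "thinking"/"text" content blocks). The surrounding request fields ("model", "messages") and both function names are assumptions for illustration, not a confirmed client API, and nothing here talks to the network.

```python
def build_chat_payload(user_text, thinking_enabled=True, token_budget=None):
    """Build a request body with the `thinking` field quoted in the post.

    The "model" and "messages" keys are assumed; check the provider docs.
    """
    thinking = {"type": "enabled" if thinking_enabled else "disabled"}
    if token_budget is not None:
        # Per the post: omitted budget means unlimited thinking tokens.
        thinking["token_budget"] = token_budget
    return {
        "model": "command-a-reasoning-08-2025",
        "messages": [{"role": "user", "content": user_text}],
        "thinking": thinking,
    }


def split_message(message):
    """Separate thinking blocks from text blocks in a response message."""
    thinking, text = [], []
    for block in message.get("content", []):
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text.append(block.get("text", ""))
    return "\n".join(thinking), "\n".join(text)


# Response shape copied from the post.
example_message = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "stuff here"},
        {"type": "text", "text": "final response here"},
    ],
}
reasoning, reply = split_message(example_message)
```

A frontend would show `reply` in the chat and put `reasoning` in a collapsible block, the same way it already does for other reasoning models.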

Anonymous 8/23/2025, 6:09:29 PM No.106358886 [Report]

>>106358831
SillyTavern is not a local model, it's a user interface.

Anonymous 8/23/2025, 6:09:32 PM No.106358887 [Report]

1754493464792375.png md5: 29095566...

Anonymous 8/23/2025, 6:09:45 PM No.106358890 [Report]

>>106358831
The bloated broken mess that is ServiceTesnor is single-handedly holding back local.

Anonymous 8/23/2025, 6:09:57 PM No.106358892 [Report] >>106358922 >>106359070 >>106359098 >>106359105 >>106359184 >>106359351 >>106359434 >>106359780 >>106360557 >>106363056

LLM-history-fancy.png md5: 29b23a9f...

Little update

Anonymous 8/23/2025, 6:10:39 PM No.106358902 [Report]

>>106358831
>How come ST doesn't have some simple tool calling yet that lets the model roll a die or something dynamically?
It literally does. Ask gpt to look up the documentation

Anonymous 8/23/2025, 6:13:12 PM No.106358922 [Report] >>106358959

>>106358892
Seems like you ran out of colours. Mentioning individual dev is also nasty and irrelevant.

Anonymous 8/23/2025, 6:16:21 PM No.106358959 [Report] >>106358980

>>106358922
>Mentioning individual dev is also nasty and irrelevant.
Which dev?

Anonymous 8/23/2025, 6:17:41 PM No.106358974 [Report]

command-a-reasoning really was the final punch in the dick of densessisies

Anonymous 8/23/2025, 6:18:14 PM No.106358980 [Report] >>106358995 >>106359017

>>106358959
are you incapable of reading you even quoted the name

Anonymous 8/23/2025, 6:19:14 PM No.106358995 [Report]

>>106358980
moetards are really trying to play up a cohere model failing as a win for themselves?
oh no no no

Anonymous 8/23/2025, 6:21:38 PM No.106359017 [Report] >>106359056

>>106358980
What? Bro, how many B is your brain and at what quant is it running?

Anonymous 8/23/2025, 6:25:19 PM No.106359056 [Report]

>>106359017
You are absolutely right to question this

Anonymous 8/23/2025, 6:27:19 PM No.106359070 [Report] >>106359241

>>106358892
DS V3 0324 so good it was mentioned twice

Anonymous 8/23/2025, 6:28:34 PM No.106359090 [Report] >>106359097 >>106359107 >>106359119 >>106359132 >>106359164 >>106359212

Best way to make a disappointment build?

Anonymous 8/23/2025, 6:29:09 PM No.106359097 [Report]

>>106359090
Buy a prebuilt

Anonymous 8/23/2025, 6:29:10 PM No.106359098 [Report]

>>106358892
Retard

Anonymous 8/23/2025, 6:29:43 PM No.106359104 [Report] >>106359114 >>106359123 >>106359128 >>106360625

crazy how we're still stuck with sillytavern in 2025 when it's essentially stuck as a cobbled together piece of shit from 2023 for all eternity

Anonymous 8/23/2025, 6:29:45 PM No.106359105 [Report]

>>106358892
Thanks

Anonymous 8/23/2025, 6:29:53 PM No.106359107 [Report]

>>106359090
Buy a premade HP and realise it has only two RAM sockets.

Anonymous 8/23/2025, 6:30:26 PM No.106359114 [Report]

>>106359104
this but llms in general

Anonymous 8/23/2025, 6:31:05 PM No.106359119 [Report]

>>106359090
Buy enough RAM to run deepseek and realize it's slower than the slowest cloud provider

Anonymous 8/23/2025, 6:32:11 PM No.106359123 [Report]

>>106359104
Try doing better

Anonymous 8/23/2025, 6:32:26 PM No.106359124 [Report]

Cloud will always be cheaper and faster than local because you aren't running your local model 24/7

Anonymous 8/23/2025, 6:32:49 PM No.106359128 [Report]

>>106359104
Vibe code your own interface. You're sending formatted strings to model and back. All you need to know is how to implement tags for each model you are using and how to keep every string in order. It's that simple.
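
A rough sketch of what "formatted strings with tags, kept in order" looks like in practice. The tags below are ChatML-style and are an assumption for illustration. Every model family has its own template, so the `CHATML` dict would need swapping per model. `render_prompt` is a made-up helper name.

```python
# Per-role wrappers for one model family (ChatML-style tags, assumed here).
CHATML = {
    "system": "<|im_start|>system\n{content}<|im_end|>\n",
    "user": "<|im_start|>user\n{content}<|im_end|>\n",
    "assistant": "<|im_start|>assistant\n{content}<|im_end|>\n",
    # Left open so the model writes the next assistant turn.
    "generation_prefix": "<|im_start|>assistant\n",
}


def render_prompt(messages, template):
    """Turn an ordered list of {role, content} dicts into one prompt string."""
    parts = []
    for message in messages:
        wrapper = template[message["role"]]
        parts.append(wrapper.format(content=message["content"]))
    parts.append(template["generation_prefix"])
    return "".join(parts)


chat = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "hi"},
]
prompt = render_prompt(chat, CHATML)
```

The resulting string goes to whatever raw text-completion endpoint the backend exposes. The reply gets appended to `chat` as an assistant message and the loop repeats.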

Anonymous 8/23/2025, 6:33:12 PM No.106359132 [Report]

>>106359090
Forgo the build and give away your privacy, autonomy and personal information to use the cloud instead.

Anonymous 8/23/2025, 6:34:27 PM No.106359148 [Report] >>106359205 >>106362246

People who still use shit cards from 2023 think they're in their rights to criticize 2025 models

Anonymous 8/23/2025, 6:35:24 PM No.106359164 [Report]

>>106359090
>RTX 5070 12 GB
>slow 16GB DDR4 RAM
>Intel i3-14100

Anonymous 8/23/2025, 6:37:53 PM No.106359184 [Report] >>106359241

>>106358892
Jamba sisters...

Anonymous 8/23/2025, 6:40:04 PM No.106359205 [Report]

>>106359148
bro I haven't touched a card with less than 3k tokens in a year and a half

Anonymous 8/23/2025, 6:40:35 PM No.106359212 [Report]

>>106359090
spend about $15k on hardware and run the best local model you can find

Anonymous 8/23/2025, 6:42:35 PM No.106359234 [Report] >>106359246

>>536373993
>>536373993
>>536373993
Apologize to rentry

Anonymous 8/23/2025, 6:43:19 PM No.106359241 [Report]

>>106359070
Will correct it in the next version.

>>106359184
Will add as a note in summer flood.

Anonymous 8/23/2025, 6:43:45 PM No.106359246 [Report]

>>106359234
>>>/vg/536373993
>>>/vg/536373993
>>>/vg/536373993
oops

Anonymous 8/23/2025, 6:44:44 PM No.106359256 [Report] >>106359261 >>106359263 >>106359275

GooH6mwWIAEGDaG.jpg md5: 28112a26...

>Try Qwen +200b
>Purple prose schizo
>GLM, and Deepseek are MoEs
>Kimi too big for local
I await my Magnum v5.

Anonymous 8/23/2025, 6:45:48 PM No.106359261 [Report]

>>106359256
This brings memories. Funny how that image slipped past filters.

Anonymous 8/23/2025, 6:46:01 PM No.106359263 [Report] >>106359285

>>106359256
kimi and qwen are also moes

Anonymous 8/23/2025, 6:46:53 PM No.106359275 [Report] >>106359285

>>106359256
Qwen is a MoE too...

Anonymous 8/23/2025, 6:48:00 PM No.106359285 [Report]

>>106359263
>>106359275
Fuck me, I saw instruct and thought it wasn't. This explains everything.

Anonymous 8/23/2025, 6:48:40 PM No.106359290 [Report] >>106359299 >>106359316

If dense is so good
Why aren't more people training them

Anonymous 8/23/2025, 6:49:31 PM No.106359299 [Report] >>106359311 >>106359315

>>106359290
because expensive, and as always once something becomes anywhere big it's race to the bottom time

Anonymous 8/23/2025, 6:50:51 PM No.106359311 [Report] >>106359315

>>106359299
Why don't people that want dense models train their own?

Anonymous 8/23/2025, 6:51:14 PM No.106359315 [Report] >>106359396

>>106359311
refer to >>106359299
>because expensive

Anonymous 8/23/2025, 6:51:15 PM No.106359316 [Report]

>>106359290
Everyone wants to be the next deepseek now

Anonymous 8/23/2025, 6:54:57 PM No.106359351 [Report] >>106359363 >>106359364 >>106359373 >>106359450

>>106358892
Is this a joke? DeepSeek was never good.

Anonymous 8/23/2025, 6:56:30 PM No.106359363 [Report]

>>106359351
Fuck off :D

Anonymous 8/23/2025, 6:56:40 PM No.106359364 [Report]

>>106359351
just let it go sam

Anonymous 8/23/2025, 6:57:48 PM No.106359373 [Report]

>>106359351
Sam, it's been almost nine months. Please settle down.

Anonymous 8/23/2025, 6:59:59 PM No.106359396 [Report]

>>106359315
wtf are you poor?

Anonymous 8/23/2025, 7:03:31 PM No.106359425 [Report]

I was drunk last night and downloaded GPT Ass. Jesus, I promptly deleted it today.

Anonymous 8/23/2025, 7:04:07 PM No.106359434 [Report]

>>106358892
hybrid reasoners work fine, look at GLM

Anonymous 8/23/2025, 7:05:45 PM No.106359448 [Report] >>106359923 >>106360407

e107305d653cfc62117d586b54381714.jpg md5: cedf50cb...

Is there a trick to prompting moes that I'm not aware of? GLM, Deepseek, and Qwen3 are all schizo when I use them.

Anonymous 8/23/2025, 7:05:49 PM No.106359450 [Report] >>106359474

>>106359351
Openai was never good

Anonymous 8/23/2025, 7:07:45 PM No.106359465 [Report]

>>106358831
You need to write an extension to give it tools.
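
For reference, the logic behind a dice tool is small. This is a generic sketch and not SillyTavern's extension API (its extensions are JavaScript). It assumes the model emits a tool call as a JSON object with "name" and "arguments" keys, which is a common convention but varies by backend. `roll_dice`, `TOOLS` and `handle_tool_call` are hypothetical names.

```python
import json
import random


def roll_dice(sides=6, count=1, rng=None):
    """Roll `count` dice with `sides` faces and return rolls plus total."""
    rng = rng or random
    if sides < 2 or count < 1:
        raise ValueError("sides must be >= 2 and count must be >= 1")
    rolls = [rng.randint(1, sides) for _ in range(count)]
    return {"rolls": rolls, "total": sum(rolls)}


# Registry of tools the model is allowed to call.
TOOLS = {"roll_dice": roll_dice}


def handle_tool_call(raw_call, rng=None):
    """Parse a JSON tool call, run it, and return a JSON result string."""
    call = json.loads(raw_call)
    func = TOOLS.get(call.get("name"))
    if func is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    try:
        result = func(rng=rng, **call.get("arguments", {}))
    except (TypeError, ValueError) as exc:
        # Bad or unexpected arguments from the model end up here.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


# Example: the model asks for 2d20.
reply = handle_tool_call('{"name": "roll_dice", "arguments": {"sides": 20, "count": 2}}')
```

The JSON string returned by `handle_tool_call` is what gets fed back to the model as the tool result, so the roll is actually random instead of hallucinated.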

Anonymous 8/23/2025, 7:08:33 PM No.106359470 [Report]

Untitled.jpg md5: 8c516feb...

WTF cheater

Anonymous 8/23/2025, 7:09:04 PM No.106359474 [Report]

>>106359450
o3 was good. Was expensive too. But good.

Anonymous 8/23/2025, 7:20:08 PM No.106359573 [Report]

>compute_imatrix: 1500.86 seconds per pass - ETA 973 hours 53.38 minutes
O-oh...

Anonymous 8/23/2025, 7:22:38 PM No.106359595 [Report] >>106359604 >>106359613

playing games with reasoning models sure is time consuming

Anonymous 8/23/2025, 7:23:49 PM No.106359604 [Report]

>>106359595
Hybrid reasoners are perfect for that.

Anonymous 8/23/2025, 7:24:35 PM No.106359613 [Report]

>>106359595
Prefill the reasoning with relevant information.
Hell, inject lorebook entries in the reasoning block even.
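
A minimal sketch of the prefill idea, assuming the model wraps its reasoning in `<think>` tags and the backend accepts a raw prompt it will continue from. Both are assumptions, since tag names and endpoints differ per model and server. `build_prefilled_prompt` and the example facts are made up for illustration.

```python
def build_prefilled_prompt(prompt_so_far, facts, think_open="<think>\n"):
    """Append an opened reasoning block that already lists the key facts.

    The block is left unclosed on purpose so the model continues the
    reasoning from the injected notes instead of starting from scratch.
    """
    notes = "".join(f"- {fact}\n" for fact in facts)
    return prompt_so_far + think_open + "Relevant facts I must respect:\n" + notes


# `prompt_so_far` is the already-templated chat ending at the assistant turn.
prompt_so_far = "<|im_start|>assistant\n"
lorebook_hits = ["The party has 40 gold.", "The bridge collapsed yesterday."]
prompt = build_prefilled_prompt(prompt_so_far, lorebook_hits)
```

Because the facts sit inside the reasoning block, the model tends to treat them as its own conclusions, which can also cut down how long it thinks before answering.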

Anonymous 8/23/2025, 7:29:18 PM No.106359645 [Report]

Is it just me or are inline latex single dollar signs not rendering on DS webapp

Anonymous 8/23/2025, 7:35:00 PM No.106359709 [Report] >>106359723

I notice the same model takes twice as long responding to my prompt on Open WebUI as in the Ollama interface. Most of the time is loading time as I wait for the first word to appear. This happens even if I run the same prompt back-to-back in a different chat, so it's not Open WebUI loading the model for the first time. I know Open WebUI adds overhead, but this is suspicious. Anything I can check in my settings?

Anonymous 8/23/2025, 7:36:52 PM No.106359723 [Report] >>106359750

>>106359709
stop using ollama

Anonymous 8/23/2025, 7:39:44 PM No.106359750 [Report] >>106359766 >>106359787 >>106359824

>>106359723
Suggestions for alternatives? I want something with features like chat history, markdown, etc. and not just a command terminal.

Anonymous 8/23/2025, 7:41:58 PM No.106359766 [Report] >>106362389

>>106359750
llamacpp has its own embedded webui.

Anonymous 8/23/2025, 7:43:13 PM No.106359780 [Report]

>>106358892
>Next up - the AI ice age

Anonymous 8/23/2025, 7:44:15 PM No.106359787 [Report] >>106362389

>>106359750
troonkupad

Anonymous 8/23/2025, 7:47:02 PM No.106359808 [Report] >>106359817 >>106359826 >>106359847 >>106359898 >>106362568

Is Miku trans?

Anonymous 8/23/2025, 7:47:34 PM No.106359817 [Report]

>>106359808
Yes

Anonymous 8/23/2025, 7:47:59 PM No.106359824 [Report] >>106362389

>>106359750
llama.cpp server and any frontend that works for (you). llama has its own webchat but that's very bare. SillyTavern or whatever else is out there works well.

Anonymous 8/23/2025, 7:48:38 PM No.106359826 [Report]

>>106359808
Absolutely

Anonymous 8/23/2025, 7:49:12 PM No.106359837 [Report] >>106359906 >>106359908 >>106362860

Anyone using any good Mistral Small models? I've been pretty much exclusively using Magnum Diamond (Cydonia is meh, decent but I think there are other better Mistral Small models).

I really wanna try out the Qwen shit but I can never really get it to work well. Feels like it's really poor at RP (probably prompt issue or some shit). Got 24GB VRAM, 32GB RAM so i'm pretty limited on the shit I can run

Anonymous 8/23/2025, 7:49:30 PM No.106359844 [Report] >>106359854 >>106359899

So now that the dust has settled
What went wrong with DeepSeek 3.1?

Anonymous 8/23/2025, 7:49:38 PM No.106359847 [Report]

>>106359808
She is

Anonymous 8/23/2025, 7:50:39 PM No.106359854 [Report]

>>106359844
Lack of sex modality

Anonymous 8/23/2025, 7:52:25 PM No.106359877 [Report]

Wtf is going on in /aicg/? I come back after a few hours and every post is deleted.

Anonymous 8/23/2025, 7:52:29 PM No.106359879 [Report]

file.png md5: 294f7edc...

muh blackx rights

Anonymous 8/23/2025, 7:54:29 PM No.106359898 [Report]

>>106359808
Stop replying to yourself

Anonymous 8/23/2025, 7:54:32 PM No.106359899 [Report]

>>106359844
People expecting to RP with it using schizo cards.

Anonymous 8/23/2025, 7:55:31 PM No.106359906 [Report]

>>106359837
Devatral

Anonymous 8/23/2025, 7:55:40 PM No.106359908 [Report] >>106360623 >>106363134

>>106359837
Qwen is not that great at writing. Mistral 3.2 is ok. Cydonia is somewhat strange it's not bad though.
Try Gemma 3 or Gemma 3 Glitter specifically. I really like its output (relative) but it's annoying if you are pushing its censorship limits. That works too but you need to groom it first can't just blurt out something or otherwise it'll display suicide hotline disclaimer with phone numbers lel

Anonymous 8/23/2025, 7:55:59 PM No.106359914 [Report] >>106359949 >>106359974 >>106359978 >>106359989 >>106359993

1738785005961858.png md5: a62d4473...

Newfag here, I'm trying to build fast local models for erp conversations, what are some models that are on par with qwen-flash's speed? Because those 1-3s delays in most LLMs are a huge turn-off for me. We are talking about around 300-500ms with like 100-300 input tokens.

Also in picrel the numbers of gigabytes in parentheses are the memory needed right? How tf are you supposed to have 200GB in your local machine?

Anonymous 8/23/2025, 7:56:57 PM No.106359923 [Report]

>>106359448
https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune
Recommended settings.

Anonymous 8/23/2025, 7:59:49 PM No.106359944 [Report]

cdf614e2f036e2e4c9366f973120dc02.jpg md5: cdf614e2...

Smell will be the next big modality

Anonymous 8/23/2025, 8:01:10 PM No.106359949 [Report] >>106359984

>>106359914
>How tf are you supposed to have 200GB in your local machine?
A lot of money
Personally I'm eyeing a 96gb DDR5 kit

Anonymous 8/23/2025, 8:04:54 PM No.106359974 [Report]

>>106359914
>How tf are you supposed to have 200GB in your local machine?
people offload the model to system ram out of desperation. I suppose you could stack some of those workstation cards like the Blackwell rtx 6000 but it would get cost prohibitive pretty quickly.

Anonymous 8/23/2025, 8:05:08 PM No.106359976 [Report] >>106360038 >>106360066 >>106360527

https://rentry.org/lmg-lazy-spoonfeed-guide
?

Anonymous 8/23/2025, 8:05:16 PM No.106359978 [Report] >>106360195

>>106359914
>even Q1_S is extremely capable
does that mean 1 bit?

Anonymous 8/23/2025, 8:06:16 PM No.106359984 [Report] >>106359998

>>106359949
what would you run with that?

Anonymous 8/23/2025, 8:06:45 PM No.106359989 [Report] >>106360005

>>106359914
>How tf are you supposed to have 200GB in your local machine?
disk space + RAM + VRAM ≥ 250GB
Most of lmg ERPs at 1 t/s.

Anonymous 8/23/2025, 8:07:11 PM No.106359993 [Report] >>106362341

>>106359914
>How tf are you supposed to have 200GB in your local machine?
https://www.amazon.com/Crucial-5600MHz-5200MHz-Compatible-CP2K64G56C46U5/dp/B0DSR5P84D
https://www.amazon.com/G-SKILL-4x64GB-CL36-44-44-96-Desktop-Computer/dp/B0FFKFCLLL
Like this.

Anonymous 8/23/2025, 8:07:58 PM No.106359998 [Report]

>>106359984
Cope quants of bigger models than glm45 air

Anonymous 8/23/2025, 8:09:20 PM No.106360005 [Report] >>106360019 >>106360022 >>106360042

>>106359989
c'mon, you need at least 5t/s for it to be halfway enjoyable. I can run larger models at 1t/s but I never would, what do you do while it's spitting out the response?

Anonymous 8/23/2025, 8:10:53 PM No.106360019 [Report]

>>106360005
>what do you do while it's spitting out the response?
Masturbating and shit posting.

Anonymous 8/23/2025, 8:11:01 PM No.106360022 [Report]

>>106360005
I get 3t/s and I am sure something is fucked with my config + I am on windows.

Anonymous 8/23/2025, 8:11:43 PM No.106360038 [Report] >>106360075

>>106359976
>Pub: 23 Aug 2025 18:04 UTC
>Views: 0
>wixmp.com
Did it have anything useful before?

Anonymous 8/23/2025, 8:12:03 PM No.106360042 [Report]

>>106360005
i just switch tabs and do something else

Anonymous 8/23/2025, 8:14:28 PM No.106360066 [Report] >>106360173

>>106359976
recommending ooba as a first thing to start with is diabolical (no one is going to call this shit text gen ui, fuck off)
llama.cpp has release builds every hour or so anyway if you are running windows, and if you are running linux and can't compile a program then this might not be a hobby for you anyway
overall pretty dogshit guide, if it were really spoonfeeding then it would go from a to z through every part but it's not even half assed, more like quarter assed
a list of recommended programs would be better + a small glossary and that's it

Anonymous 8/23/2025, 8:15:33 PM No.106360075 [Report]

>>106360038
It's just the old spoonfeed guide (https://rentry.org/lmg-spoonfeed-guide) with model recommendations replaced with a link to the rentry in the OP.

Anonymous 8/23/2025, 8:25:00 PM No.106360173 [Report] >>106360265

>>106360066
>a list of recommended programs would be better + a small glossary and that's it
That's already covered by the links in the OP template.

>recommending ooba as a first thing to start with is diabolical
Ooba is still used, and rewriting it to walk through llama.cpp instead is too much for a lazy guide.

>overall pretty dogshit guide, if it were really spoonfeeding then it would go from a to z through every part but it's not even half assed, more like quarter assed
It touches on and mentions most things someone starting out will need to know and gets them running. People have to put in some effort themselves too. If someone asks how to run local and you give them a 30 page document, they won't even bother.

It's better than the current getting started guide, no?

Anonymous 8/23/2025, 8:26:25 PM No.106360195 [Report] >>106360219 >>106360238

>>106359978
Allowed quantization types:
2 or Q4_0 : 4.34G, +0.4685 ppl @ Llama-3-8B
3 or Q4_1 : 4.78G, +0.4511 ppl @ Llama-3-8B
38 or MXFP4_MOE : MXFP4 MoE
8 or Q5_0 : 5.21G, +0.1316 ppl @ Llama-3-8B
9 or Q5_1 : 5.65G, +0.1062 ppl @ Llama-3-8B
19 or IQ2_XXS : 2.06 bpw quantization
20 or IQ2_XS : 2.31 bpw quantization
28 or IQ2_S : 2.5 bpw quantization
29 or IQ2_M : 2.7 bpw quantization
24 or IQ1_S : 1.56 bpw quantization
31 or IQ1_M : 1.75 bpw quantization
36 or TQ1_0 : 1.69 bpw ternarization
37 or TQ2_0 : 2.06 bpw ternarization
10 or Q2_K : 2.96G, +3.5199 ppl @ Llama-3-8B
21 or Q2_K_S : 2.96G, +3.1836 ppl @ Llama-3-8B
23 or IQ3_XXS : 3.06 bpw quantization
26 or IQ3_S : 3.44 bpw quantization
27 or IQ3_M : 3.66 bpw quantization mix
12 or Q3_K : alias for Q3_K_M
22 or IQ3_XS : 3.3 bpw quantization
11 or Q3_K_S : 3.41G, +1.6321 ppl @ Llama-3-8B
12 or Q3_K_M : 3.74G, +0.6569 ppl @ Llama-3-8B
13 or Q3_K_L : 4.03G, +0.5562 ppl @ Llama-3-8B
25 or IQ4_NL : 4.50 bpw non-linear quantization
30 or IQ4_XS : 4.25 bpw non-linear quantization
15 or Q4_K : alias for Q4_K_M
14 or Q4_K_S : 4.37G, +0.2689 ppl @ Llama-3-8B
15 or Q4_K_M : 4.58G, +0.1754 ppl @ Llama-3-8B
17 or Q5_K : alias for Q5_K_M
16 or Q5_K_S : 5.21G, +0.1049 ppl @ Llama-3-8B
17 or Q5_K_M : 5.33G, +0.0569 ppl @ Llama-3-8B
18 or Q6_K : 6.14G, +0.0217 ppl @ Llama-3-8B
7 or Q8_0 : 7.96G, +0.0026 ppl @ Llama-3-8B
1 or F16 : 14.00G, +0.0020 ppl @ Mistral-7B
32 or BF16 : 14.00G, -0.0050 ppl @ Mistral-7B
0 or F32 : 26.00G @ 7B
COPY : only copy tensors, no quantizing
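
The bpw (bits per weight) figures above convert to a rough file size with simple arithmetic: parameters times bits per weight, divided by 8 for bytes. A small sketch follows. The function name is made up, and the result is only a lower-bound estimate because real GGUF files keep some tensors at higher precision than the nominal quant type.

```python
def estimate_size_gib(n_params, bits_per_weight):
    """Rough model size in GiB: params * bpw / 8 bytes, ignoring metadata."""
    return n_params * bits_per_weight / 8 / 2**30


# Illustrative inputs: a 671B-parameter model at the IQ1_S rate (1.56 bpw)
# and an 8B-parameter model at the IQ4_NL rate (4.50 bpw) from the list above.
big_iq1_s = estimate_size_gib(671e9, 1.56)
small_iq4_nl = estimate_size_gib(8e9, 4.50)
```

Add the KV cache and compute buffers on top of that number before deciding whether something fits in RAM + VRAM.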

Anonymous 8/23/2025, 8:28:18 PM No.106360219 [Report] >>106360243

>>106360195
>we have bitnet at home

Anonymous 8/23/2025, 8:28:55 PM No.106360226 [Report]

file.png md5: f9791543...

i am running a chroot inside a chroot, and i am running things off of different partitions
HOW THE FUCK DO I HIDE THIS SHIT INSIDE MY FILE PICKER AND INSIDE MY FILE MANAGER
HOW TO FUCKING HIDE IT FUCK FUCK FUCK FUCK FUCK!!!!!!!! FUUUUUUUCCCCCCCCCCCCCKKKKKKKKKKKKKKKKKKKKK

Anonymous 8/23/2025, 8:30:20 PM No.106360238 [Report] >>106360272 >>106360289 >>106360289 >>106360298 >>106361847 >>106362195 >>106362206 >>106362235

>>106360195
Thanks, I found a 2 year old version of that chart.
Q1 sounds like there's no way it can be good desu

Anonymous 8/23/2025, 8:30:31 PM No.106360243 [Report] >>106362541

>>106360219
Not quite. Bitnet needs the training to be quantization aware to be near-lossless as they claim. This is not it.

Anonymous 8/23/2025, 8:33:16 PM No.106360265 [Report]

>>106360173
>It's better than the current getting started guide, no?
not really
i could try my hand at writing a guide, but i was here since llama2 released, so i'm not sure what are the pain points for new people of various literacy levels
this shit isn't really rocket science though, i'm sure that a moderately non-retarded person could figure it out in an afternoon or two on their own
you can't really save the lowest common denominator from their own stupidity

Anonymous 8/23/2025, 8:33:59 PM No.106360272 [Report]

llama_quantize.png md5: 2569fd73...

>>106360238
It's the output of llama-quantize without parameters. You should have it on your pc already.
>Q1 sounds like there's no way it can be good desu
Some people are desperate and will do it anyway.

Anonymous 8/23/2025, 8:35:56 PM No.106360289 [Report]

>>106360238
>>106360238
Q1 can be good for HUGE HUGE HUGE models, like deepseek R1, ymmv

Anonymous 8/23/2025, 8:36:28 PM No.106360298 [Report] >>106360369

>>106360238
Any Q1 Deepseek is better than any dense model you have ever tried. The problem of Q1 is that it is essentially enforced greedy sampling. All rerolls are almost the same.

Anonymous 8/23/2025, 8:43:33 PM No.106360369 [Report]

>>106360298
Interesting. So it's hard baked. Hardtack.

Anonymous 8/23/2025, 8:49:29 PM No.106360407 [Report]

>>106359448
to some degree the model is going to act the way it wants to act no matter what, but imo those models need a more restrained prompt than the ones that people used to use for roleplay with dry models, e.g. you don't really want to be encouraging them to use a flashy personality-maxxed hentai writing style or telling them to be extremely creative and unpredictable etc. they do much better with a more neutral prompt

Anonymous 8/23/2025, 8:55:28 PM No.106360462 [Report] >>106360524

>>106358752 (OP)
Do any of you use TTS programs? I'm not looking for the best - I'm looking for fast and low vram, because I want as much of my vram as possible to be dedicated to the LLM, not to the TTS.

Anonymous 8/23/2025, 9:02:37 PM No.106360524 [Report] >>106360565 >>106360638

>>106360462
I used piper for a bit. Had to make the glue between my editor and piper, but it worked. It's stupid fast. Kokorotts is fast too. Not as fast, but I think it sounds a little better. Haven't tried kittentts. It has the smallest models of all three, so it should be faster than piper. One of these days i'll integrate it in my stuff.

Anonymous 8/23/2025, 9:02:55 PM No.106360527 [Report] >>106360554

>>106359976
I agree with the other anon that recommending ooba is dumb.
In my opinion LM Studio (not open source I know) is the easiest to get started with because it uses llama.cpp and it doesn't require any python stuff that filters so many people. Kobold is a distant second.
Anyone serious about this uses llama.cpp and it's not even mentioned.

Anonymous 8/23/2025, 9:06:35 PM No.106360554 [Report]

>>106360527
>Anyone serious about this uses llama.cpp and it's not even mentioned.
It's a spoonfeed guide. Whoever needs to read that, is not serious yet. llama.cpp is mentioned in the OP that retards cannot be bothered to read, so whatever.

Anonymous 8/23/2025, 9:06:59 PM No.106360557 [Report]

>>106358892
real nice

Anonymous 8/23/2025, 9:08:10 PM No.106360565 [Report]

>>106360524
Nice, thanks for the suggestions

Anonymous 8/23/2025, 9:08:23 PM No.106360566 [Report]

I think I moved on from ooba when i got a bug where you couldn't interrupt generation and you had to wait for it to finish. I also still remember how retard forcibly changed API to openAI API and removed the old one. And openAI implementation was obviously bugged so there was no way to use it.

Anonymous 8/23/2025, 9:09:22 PM No.106360573 [Report] >>106360583 >>106360588

So, meta is not releasing Llama 4 Behemoth, right?

Anonymous 8/23/2025, 9:10:25 PM No.106360583 [Report] >>106361215

>>106360573
Right after Grok 2

Anonymous 8/23/2025, 9:11:18 PM No.106360588 [Report] >>106360600

>>106360573
they realized that drummer already has tunes that are named behemoth so they decided to scrap the model to not cause any confusion

Hi all, Drummer here... 8/23/2025, 9:12:51 PM No.106360600 [Report]

>>106360588
damn this drummer guy is pretty badass

WANG 8/23/2025, 9:12:54 PM No.106360601 [Report] >>106360663 >>106360793 >>106360889

Screenshot 2025-08-23 125031.png md5: f784922f...

hi shitfuckers
just wanted to tell you all that me and sam-chan's plan to hollow meta from the inside out has been going smoothly
convincing meta to abandon open source? easy
making lecunny ledone? yawn
convincing zuck to use gpt-5 after spending millions on his "superintelligence" (lol) team? well let's just say zuck is even more of a bottom than sam is
i've also been putting the plans in motion to get that chink ban underway. enjoy your deepsneed while it lasts, because when i'm done the only chinese letters you'll see will be the digits after a dollar sign
lol. good luck, and for those of you who are interested in resources about openness, remember that by reading this you have acknowledged that the wang-sama foundation legally owns your car and your daughter's virginity

Anonymous 8/23/2025, 9:15:19 PM No.106360623 [Report] >>106360682

>>106359908
Isn't Gemma meant to be super retarded when it comes to roleplaying though (misremembering basic details etc)

I remember trying it before, Drummer's one and the abliterated one or something? Both sucked.

Anonymous 8/23/2025, 9:15:28 PM No.106360625 [Report]

>>106359104
this but also cumfartui for diffusion

Anonymous 8/23/2025, 9:16:45 PM No.106360638 [Report]

>>106360524
Nta but thanks, I'll test kittentts and will integrate that to my client. Todo list grows but no work gets done lol

Anonymous 8/23/2025, 9:17:44 PM No.106360651 [Report]

file.png md5: ab442f0e...

drummer why is gemma r1 12b so shit? i swear to god i pulled out a steam deck and then it started doing this 2 messages later when i told it to suck my dick
glm4.5 air chan would never do this

Anonymous 8/23/2025, 9:18:52 PM No.106360663 [Report]

>>106360601
buy an ad

Anonymous 8/23/2025, 9:20:49 PM No.106360682 [Report] >>106363134

>>106360623
No, in my experience even 12b gemma 3 excels and is comparable to larger 24b mistral. I mean I use it for d&d rp and it can cite how much gold my partner has etc. I don't have any complex rules and have tried to make every system prompt rule as concise as possible.
Retardation comes more from its censorship in sexual content but this can be avoided with jailbreak thing plus gemma 3 glitter is somewhat better in this sense.
try it out and if you don't like it into the trash it goes

Anonymous 8/23/2025, 9:21:52 PM No.106360692 [Report] >>106360702

https://youtu.be/mjB6HDot1Uk?t=428

Anonymous 8/23/2025, 9:22:28 PM No.106360702 [Report]

>>106360692
youtube slop

Anonymous 8/23/2025, 9:31:03 PM No.106360793 [Report]

>>106360601
>sam-chan
>not sama-chama
anon...

Anonymous 8/23/2025, 9:39:21 PM No.106360889 [Report] >>106361058 >>106361170

>>106360601
>lecunny ledone
qrd on this?

Anonymous 8/23/2025, 9:46:05 PM No.106360964 [Report] >>106361016 >>106361294

I have 128 GB RAM and 24 GB VRAM. What model that fits is the best for making small scripts? I could run Qwen 235B at like Q2 to Q3, or GLM 4 Air at Q6. Does the quantization hurt Qwen too much for coding or is still the best even when lobotomized?

Anonymous 8/23/2025, 9:50:45 PM No.106361016 [Report] >>106361034

>>106360964
Devstral FP16

Anonymous 8/23/2025, 9:52:22 PM No.106361034 [Report]

>>106361016
That would be really slow though since no more than half of the model could fit in VRAM.

Anonymous 8/23/2025, 9:54:45 PM No.106361058 [Report] >>106361136 >>106363476

1736744801054204.png md5: 502e84d7...

>>106360889

Anonymous 8/23/2025, 10:01:18 PM No.106361136 [Report]

>>106361058
it know

Anonymous 8/23/2025, 10:02:36 PM No.106361151 [Report] >>106361204

has lecun made a statement on genie3 yet? google just went ahead and did what he was dreaming of with that

Anonymous 8/23/2025, 10:04:03 PM No.106361170 [Report]

>>106360889
He reports to Wang now

Anonymous 8/23/2025, 10:04:38 PM No.106361174 [Report] >>106361184 >>106361187

onmeth.png md5: d14c56ef...

>Deepseek R1 400b
>prompt it with: Write uniquely to the tone of {{char}}'s personality.
>Magical shit like this happens

Anonymous 8/23/2025, 10:05:09 PM No.106361184 [Report]

>>106361174
400b?

Anonymous 8/23/2025, 10:05:16 PM No.106361187 [Report]

>>106361174
>Deepseek R1 400b
is this some pruned shit?

Anonymous 8/23/2025, 10:06:03 PM No.106361204 [Report] >>106361208

>>106361151
Do we even know what architecture it uses and what methods they used to train it?

Anonymous 8/23/2025, 10:06:31 PM No.106361207 [Report] >>106361212

https://huggingface.co/xai-org/grok-2

Anonymous 8/23/2025, 10:06:38 PM No.106361208 [Report] >>106361228

>>106361204
No, closed source has finally achieved its moat. All it needed was to fully abandon LLMs.

Anonymous 8/23/2025, 10:07:15 PM No.106361212 [Report]

file.png md5: 335d8f79...

DEEPSEEK WHY WHY!?!?!?!
>>106361207
HOLY SHIT
HE DELIVERED
THANK YOU MUSK SAMA
I APOLOGIZE

Anonymous 8/23/2025, 10:07:32 PM No.106361215 [Report] >>106361224 >>106361234 >>106361251 >>106361256 >>106361270 >>106361355 >>106361358 >>106361677 >>106362380

>>106360583
where is behemoth?
https://huggingface.co/xai-org/grok-2

Anonymous 8/23/2025, 10:08:17 PM No.106361224 [Report]

>>106361215
>500GB
I sleep

Anonymous 8/23/2025, 10:08:29 PM No.106361228 [Report]

>>106361208
If the details are that light then I imagine Lecunny would be incentivized to not make a post about it since he either knows too much and would get into trouble, or he'd have to "speculate".

Anonymous 8/23/2025, 10:09:37 PM No.106361234 [Report]

1732453400163950.png md5: 0c16ec5e...

>>106361215
The fuck is this structure?

Anonymous 8/23/2025, 10:11:09 PM No.106361251 [Report] >>106361491

>>106361215
>Use the command below to launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).
does this mean its 8bit?
500gb but 8bit? that means.. 1trillion parameters? BROS BROS?!??!?! BROS!@#?%!#%^)!@$*^()!#@$&*^!)($^ BROS FUCKING BROS HOLY SHIT !!!!

Anonymous 8/23/2025, 10:11:23 PM No.106361253 [Report] >>106361264 >>106361266 >>106361276 >>106361422

Is grok 2 potentially fuckable? As in is it worth it over r1 and glm4.5 for sex?

Anonymous 8/23/2025, 10:11:34 PM No.106361256 [Report] >>106361272

Screenshot 2025-08-23 141118.png md5: 2b46f8c1...

>>106361215

Anonymous 8/23/2025, 10:12:09 PM No.106361264 [Report]

>>106361253
Nah. Grok 3 was okay at erotic creative writing though.

Anonymous 8/23/2025, 10:12:30 PM No.106361266 [Report]

>>106361253
Ask the cloudcucks instead

Anonymous 8/23/2025, 10:12:56 PM No.106361270 [Report] >>106361280 >>106361292 >>106363242

>>106361215
>it's real
Well, /lmg/? Your apology to Elon sir?

Anonymous 8/23/2025, 10:13:09 PM No.106361272 [Report]

>>106361256
I recognize that profile picture

Anonymous 8/23/2025, 10:13:25 PM No.106361276 [Report] >>106361340

>>106361253
no, it was hardly even a good option at the time

Anonymous 8/23/2025, 10:13:38 PM No.106361280 [Report]

>>106361270
He was late by a few days, but it could be the unpaid intern's fault

Anonymous 8/23/2025, 10:14:59 PM No.106361292 [Report] >>106363198

>>106361270
I still think Elon is a despicable piece of shit. This model is late, 2 generations behind and quite frankly worthless. I would only apologize if it made me cum but that is not gonna happen.

Anonymous 8/23/2025, 10:15:00 PM No.106361293 [Report] >>106361308 >>106361309

Grok2 had native image editing, didn't it?

Anonymous 8/23/2025, 10:15:04 PM No.106361294 [Report] >>106361310

>>106360964
235B is less quantized than Q1 of R1, and even at Q2 or Q3 it's still better than dense 70B models and more than capable of making small scripts.

Anonymous 8/23/2025, 10:16:36 PM No.106361308 [Report]

>>106361293
I don't think so. Didn't it call flux on the backend?

Anonymous 8/23/2025, 10:16:38 PM No.106361309 [Report]

>>106361293
Grok2 called out to Flux iirc

Anonymous 8/23/2025, 10:16:39 PM No.106361310 [Report] >>106361336 >>106361352

>>106361294
What about Air though?

Anonymous 8/23/2025, 10:19:11 PM No.106361336 [Report]

>>106361310
Never tried it. But you can download both and ask them to do some script and keep the one that does better.

Anonymous 8/23/2025, 10:19:30 PM No.106361340 [Report] >>106361348 >>106361422

>>106361276
So there is zero reason to use it over Deepseek. No one sane is going to actually use it. And we will now get some loud obnoxious saars running around the internet saying that Elon is a friend of open source because of it.

I hope Elon gets cancer soon.

Anonymous 8/23/2025, 10:20:31 PM No.106361348 [Report] >>106361363 >>106361381

>>106361340
go cry on blue cry, more open weights is always a good thing

Anonymous 8/23/2025, 10:20:58 PM No.106361352 [Report]

>>106361310
air is good

Anonymous 8/23/2025, 10:21:24 PM No.106361355 [Report] >>106361361

>>106361215
>b. Restrictions:
>You may not use the Materials, derivatives, or outputs (including generated data) to train, create, or improve any foundational, large language, or general-purpose AI models, except for modifications or fine-tuning of Grok 2 permitted under and in accordance with the terms of this Agreement.
Ewww

Anonymous 8/23/2025, 10:21:54 PM No.106361358 [Report] >>106361853

>>106361215
So it's basically mixtral but 500gb.
"hidden_size": 8192,
"intermediate_size": 32768,
"moe_intermediate_size": 16384,
"num_experts_per_tok": 2,
"num_local_experts": 8,
"num_hidden_layers": 64,

Anonymous 8/23/2025, 10:22:07 PM No.106361361 [Report] >>106361382

>>106361355
No one should want to anyway kek.

Anonymous 8/23/2025, 10:22:18 PM No.106361363 [Report]

>>106361348
Post output

Anonymous 8/23/2025, 10:22:56 PM No.106361368 [Report]

ollama run grok 2

Anonymous 8/23/2025, 10:23:31 PM No.106361378 [Report]

ollama run you're mum

Anonymous 8/23/2025, 10:23:49 PM No.106361381 [Report]

>>106361348
Indeed, only because then losing to Grok in the benchmarks is more embarrassing.
For being fuckable? You'd be stupid not to prefer the fucking wildly hallucinating Gemma mini models over Grok.

Anonymous 8/23/2025, 10:23:54 PM No.106361382 [Report] >>106361396

>>106361361
But if Grok 3 and 4 ever get released, they'll likely have the same license.

Anonymous 8/23/2025, 10:25:15 PM No.106361396 [Report] >>106361554

>>106361382
Are Grok 3 and 4 good enough to warrant distilling them though?

Anonymous 8/23/2025, 10:28:00 PM No.106361422 [Report] >>106361469

>>106361253
Grok 2 was the one that had the engineers on twitter complaining about how much positivity bias leaked into it from contaminated training data.

>>106361340
If he really wanted to show up Altman, he could have easily released both Grok 2 and 3, and even a gpt-oss-sized distill just to rub it in. They probably could have knocked out the distills in a week.

Anonymous 8/23/2025, 10:31:54 PM No.106361460 [Report] >>106361464

Grok 2 saved local

Anonymous 8/23/2025, 10:32:15 PM No.106361464 [Report]

>>106361460
*safed

Anonymous 8/23/2025, 10:32:47 PM No.106361469 [Report] >>106361527

>>106361422
Sir they are working on Grok 5 AGI Companions, they are rightly focusing their attention where it's needed.

Anonymous 8/23/2025, 10:34:24 PM No.106361491 [Report] >>106361508

>>106361251
Weird if true. HF says the tensors are at BF16.

Anonymous 8/23/2025, 10:36:32 PM No.106361508 [Report]

>>106361491
dont trust HF autodetect for anything, its always wrong
if its BF16 even better, only 250b model thats nice
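
The size-to-parameter arithmetic in this exchange can be sketched as below. It assumes the "500GB" figure means roughly 500 * 10^9 bytes of weights and nothing else in the checkpoint (both are assumptions, not something the model card excerpt confirms). Under those assumptions 8-bit weights would mean about 500B parameters rather than a trillion, and BF16 about 250B.

```python
# Rough parameter count implied by a checkpoint size at a given weight precision.
# Assumption: the whole checkpoint is weights stored at one uniform bit width.

def params_from_size(size_bytes, bits_per_weight):
    """Number of weights that fit in size_bytes at bits_per_weight each."""
    return size_bytes * 8 / bits_per_weight

# The ~500GB figure from the thread, taken as 500 * 10^9 bytes (assumption).
CHECKPOINT_BYTES = 500e9

for name, bits in (("8-bit", 8), ("BF16", 16)):
    billions = params_from_size(CHECKPOINT_BYTES, bits) / 1e9
    print(f"{name}: ~{billions:.0f}B parameters")
```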

Anonymous 8/23/2025, 10:38:41 PM No.106361527 [Report] >>106361545 >>106361575

>>106361469
If they dumped both 2 and 3 at the same time, they wouldn't have people nagging them to do another release in 6 months because they would have already gotten it out of the way.

Anonymous 8/23/2025, 10:40:16 PM No.106361545 [Report]

>>106361527
Please understand, safety checking needs long time.

Anonymous 8/23/2025, 10:40:52 PM No.106361554 [Report]

>>106361396
Grok 4 is a SOTA model. Grok 1 and Grok 2 were them just dipping their toes in the water. 3 is when they really started doing decent.

Anonymous 8/23/2025, 10:44:45 PM No.106361575 [Report]

>>106361527
Elon dumps something when his ego needs a stroke, so my guess is it'll probably come when / if OpenAI does another "open" release

Anonymous 8/23/2025, 10:54:34 PM No.106361648 [Report] >>106361673 >>106361681 >>106361688 >>106361743

wahaha cry.jpg md5: ad065cfd...

Petra why are you bullying the facehuggers, you know they're sensitive

Anonymous 8/23/2025, 10:55:49 PM No.106361657 [Report] >>106361680

>>106358189
I like how these faggots are acting all uppity as if markdown rendering is some arcane secret only they control. Anyone could vibecode a clone over a weekend these days

Anonymous 8/23/2025, 10:57:48 PM No.106361673 [Report]

>>106361648
It's really funny

Anonymous 8/23/2025, 10:58:04 PM No.106361677 [Report]

>>106361215
>Usage: Serving with SGLang
https://github.com/sgl-project/sglang/pull/9532/files
Is 'xai_temperature' something like dynamic temperature?

Anonymous 8/23/2025, 10:58:49 PM No.106361680 [Report]

>>106361657
The issue isn't rendering an alternative but hosting that shit

Anonymous 8/23/2025, 10:58:50 PM No.106361681 [Report] >>106361706

>>106361648
How does he not get banned anyway?

Anonymous 8/23/2025, 10:59:27 PM No.106361688 [Report]

petra.png md5: a63e12a1...

>>106361648
You harbour sin brother

Anonymous 8/23/2025, 11:02:16 PM No.106361706 [Report]

file.png md5: 3dd10e19...

>>106361681
hf jannies are based
picrel is from when gpt oss released and i dropped gamer word and whatever else

Anonymous 8/23/2025, 11:05:26 PM No.106361740 [Report] >>106361783

file.png md5: d1e24427...

HAPPENING! CONFIRMED TO BE 260B/A30B
QWEN3 235B BUT SEX AND STUPED
BASED BASED BASED

Anonymous 8/23/2025, 11:06:21 PM No.106361743 [Report] >>106361759

>>106361648
if the companies who create new models and publish them on huggingface realize that the main userbase of open models uses them for porn and obscene purposes, they're more likely to try to pander to us in the future

Anonymous 8/23/2025, 11:08:04 PM No.106361759 [Report]

>>106361743
I'd say the opposite is far more likely, which he would like since he's been trying to kill the thread for a while now.

Anonymous 8/23/2025, 11:09:53 PM No.106361783 [Report] >>106361801

>>106361740
8 experts 2 active. Therefore slow as shit

Anonymous 8/23/2025, 11:12:12 PM No.106361801 [Report] >>106361853

>>106361783
what about the common/shared ones?

Anonymous 8/23/2025, 11:12:16 PM No.106361802 [Report] >>106361864

Hey /lmg/ scholars, what makes LLMs so sensitive to quantization degradation? I'm quantizing small transformers models (T5, ViT, Bert...) to UINT8 ONNX and get literally 0 degradation over the full FP32 safetensors (and sometimes a very small improvement due to regularization). Why is that so hard to achieve with LLMs?

Anonymous 8/23/2025, 11:12:30 PM No.106361807 [Report] >>106362129

firefox-2025-08-23_17-11-06.jpg md5: 2bdfadc4...

saw the shitposts in that thread and fucking knew it came from here lol

hf admins literally posting as well there so ill await my ban

Anonymous 8/23/2025, 11:16:32 PM No.106361847 [Report]

dither-3596767975.jpg md5: 551fc245...

>>106360238
>Q1 sounds like there's no way it can be good desu
's fine

Anonymous 8/23/2025, 11:17:12 PM No.106361853 [Report]

>>106361801
Look at the config. >>106361358
I don't think it has shared experts
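
For a rough sense of scale, the config values quoted in >>106361358 can be plugged into a back-of-envelope count. This is only a sketch: it assumes each expert is a gated FFN with three weight matrices of shape hidden_size x moe_intermediate_size (the config excerpt does not say this), and it ignores attention, embeddings, and whatever `intermediate_size` is used for, so the real totals will be higher.

```python
# Back-of-envelope expert parameter count from the config values quoted in the thread.
# Assumption (not confirmed by the config excerpt): every expert is a gated FFN with
# three hidden_size x moe_intermediate_size matrices. Attention, embeddings and any
# dense FFN are deliberately left out.

CONFIG = {
    "hidden_size": 8192,
    "moe_intermediate_size": 16384,
    "num_experts_per_tok": 2,
    "num_local_experts": 8,
    "num_hidden_layers": 64,
}

def expert_params(cfg, experts_per_layer):
    """Expert weights across all layers for the given number of experts per layer."""
    per_expert = 3 * cfg["hidden_size"] * cfg["moe_intermediate_size"]
    return per_expert * experts_per_layer * cfg["num_hidden_layers"]

total_expert_params = expert_params(CONFIG, CONFIG["num_local_experts"])
active_expert_params = expert_params(CONFIG, CONFIG["num_experts_per_tok"])

print(f"expert weights, all experts:   {total_expert_params / 1e9:.1f}B")
print(f"expert weights, routed/token:  {active_expert_params / 1e9:.1f}B")
```

With 2 of 8 experts routed per token, a quarter of the expert weights are touched on each token under these assumptions.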

Anonymous 8/23/2025, 11:18:23 PM No.106361864 [Report] >>106361925 >>106362792

>>106361802
depends on usecase
8 bit int is barely quanted anons itt are nutting to Q1
masturbation requires novelty = nuanced token distribution
experiment with QAT

Anonymous 8/23/2025, 11:26:09 PM No.106361925 [Report] >>106361995

>>106361864
You got a point. I forgot you guys run sub-Q8 models

Anonymous 8/23/2025, 11:34:55 PM No.106361990 [Report] >>106361996 >>106362014 >>106362027

>some Ukrainian guy who cited my paper died in Russian strikes last month
welp

Anonymous 8/23/2025, 11:35:28 PM No.106361995 [Report]

>>106361925
big-MoE changed the computational game a bit but I'd say 4-6bit quants are most widely used

Anonymous 8/23/2025, 11:35:52 PM No.106361996 [Report]

>>106361990
lmao that sounds funny, post more info please

Anonymous 8/23/2025, 11:37:32 PM No.106362014 [Report]

>>106361990
that's horrible to hear, alpindale

Anonymous 8/23/2025, 11:38:47 PM No.106362027 [Report] >>106362056 >>106362065 >>106362091 >>106362102

>>106361990
What's the point of writing papers?

Anonymous 8/23/2025, 11:41:35 PM No.106362056 [Report]

>>106362027
To get citations

Anonymous 8/23/2025, 11:42:21 PM No.106362065 [Report]

>>106362027
Showing your peers the size of your dick

Anonymous 8/23/2025, 11:44:47 PM No.106362091 [Report]

>>106362027
publish or perish

Anonymous 8/23/2025, 11:46:07 PM No.106362102 [Report] >>106362115

>>106362027
publishing papers makes you a "researcher" and eligible for free money from universities if you're part of their sekrit club of academics
this way you can live off your degree without getting a real job or doing anything productive

Anonymous 8/23/2025, 11:47:25 PM No.106362115 [Report]

>>106362102
>eligible for free money from universities if you're part of their sekrit club of academics
Lmg, please elaborate

Anonymous 8/23/2025, 11:48:58 PM No.106362129 [Report] >>106362162 >>106362209 >>106362282

>>106361807
>lmao what a good meme
>that will be $0.16
Do cloudcucks really?

Anonymous 8/23/2025, 11:52:48 PM No.106362162 [Report] >>106362277

>>106362129
put it another way, if I'm paying $0.16 for cloudshit, I do want my dick sucked, metaphorically or not.

Anonymous 8/23/2025, 11:56:30 PM No.106362195 [Report]

>>106360238
>Q1 sounds like there's no way it can be good desu
all the people who are positive about q1 are hard copers

Anonymous 8/23/2025, 11:57:23 PM No.106362206 [Report] >>106362235 >>106363781

>>106360238
q1 is cope, you need at least dynamic q2 for a close to lossless experience with big moe models such as deepseek r1 0528

Anonymous 8/23/2025, 11:57:43 PM No.106362209 [Report]

>>106362129
Wait till they start asking for tips.

Anonymous 8/24/2025, 12:00:19 AM No.106362235 [Report]

>>106360238

Listen to what this anon said >>106362206

Anonymous 8/24/2025, 12:01:12 AM No.106362246 [Report] >>106362266

>>106359148
What has changed with cards?

Anonymous 8/24/2025, 12:03:04 AM No.106362266 [Report] >>106362289

>>106362246
Models are now powerful enough to take all the schizo ramblings in your card literally

Anonymous 8/24/2025, 12:04:37 AM No.106362277 [Report]

>>106362162
Unfortunately they make it reluctant to suck my dick when I want it to (ERP) and overly eager to suck my dick when I don't want it to (coding).

Anonymous 8/24/2025, 12:05:20 AM No.106362282 [Report] >>106362388 >>106362392

>>106362129
its not opus and im not spending that much - its the new 235b qwen3 with the cost multiplied by a random big number

retard discord users tend to like the responses more if they see it costs money and its claude - human psychology

Anonymous 8/24/2025, 12:06:03 AM No.106362289 [Report] >>106362315

>>106362266
>Models are now powerful enough to take all the schizo ramblings in your card literally
So whats the effective limit for tokens now?

Anonymous 8/24/2025, 12:08:35 AM No.106362315 [Report] >>106362351

>>106362289
it's not about the amount of tokens, it's what you do with them

Anonymous 8/24/2025, 12:11:40 AM No.106362341 [Report]

>>106359993
>2 memory channels
LOL

Anonymous 8/24/2025, 12:12:33 AM No.106362351 [Report] >>106362845 >>106362855

>>106362315
>it's what you do with them
Yeah? whats the smallest card you've seen work well? how many tokens?

Anonymous 8/24/2025, 12:15:47 AM No.106362380 [Report] >>106362563

>>106361215
the discussion thread is lol :)

Anonymous 8/24/2025, 12:16:38 AM No.106362388 [Report]

>>106362282
based

Anonymous 8/24/2025, 12:16:42 AM No.106362389 [Report]

>>106359766
>>106359787
>>106359824
Thanks, llama.cpp + OpenWebUI is way faster. Maybe I'll check out other frontends later. I'm new at this and just used ollama + OpenWebUI because that's the advice that seemed most common online.

Anonymous 8/24/2025, 12:17:02 AM No.106362392 [Report]

>>106362282
kek

Anonymous 8/24/2025, 12:19:49 AM No.106362417 [Report] >>106362439 >>106362441 >>106362442 >>106362476 >>106362483 >>106362602 >>106363177 >>106363241 >>106363972

Grok.jpg md5: b26799c9...

Elon claims Grok 3 will be open sourced "in about 6 months."
https://x.com/elonmusk/status/1959379349322313920

Anonymous 8/24/2025, 12:21:33 AM No.106362430 [Report]

What is the use case for grok 2 when deepseek and qwen3 coder 480b exist?

Anonymous 8/24/2025, 12:22:47 AM No.106362439 [Report]

>>106362417
Seems like you need to x2 every timeframe he gives.

Anonymous 8/24/2025, 12:22:54 AM No.106362441 [Report]

>>106362417
I hereby formally apologize to Elon.

Anonymous 8/24/2025, 12:22:55 AM No.106362442 [Report]

>>106362417
Would be nice. Grok 3 is an okay creative writing model

Anonymous 8/24/2025, 12:26:42 AM No.106362476 [Report] >>106362485

>>106362417
When will Ani be opensourced?

Anonymous 8/24/2025, 12:27:26 AM No.106362483 [Report]

>>106362417
i kneel. fucking BASED

Anonymous 8/24/2025, 12:27:40 AM No.106362485 [Report]

>>106362476
I will open her source

Anonymous 8/24/2025, 12:33:24 AM No.106362541 [Report] >>106362566 >>106362624 >>106362874

>>106360243
>Bitnet needs the training to be quantization aware to be near-lossless as they claim.
Pre-training.

What was QAT is now QApT. QAT is now trash thanks to Google and Gemma 3 poisoning the well.

Anonymous 8/24/2025, 12:34:16 AM No.106362544 [Report]

>>106358752 (OP)
Reminder Miku is canonically skinny and has a flat chest

Anonymous 8/24/2025, 12:36:41 AM No.106362563 [Report]

Grok2.png md5: 5be475a7...

>>106362380
You're not kidding

Anonymous 8/24/2025, 12:37:00 AM No.106362566 [Report] >>106362839

>>106362541
>QAT is now trash thanks to Google and Gemma 3 poisoning the well.

what happened?

Anonymous 8/24/2025, 12:37:04 AM No.106362568 [Report]

>>106359808
Miku as character? No.
mikuspammers definitely are, though

Anonymous 8/24/2025, 12:40:54 AM No.106362602 [Report]

>>106362417
Elon sir delivered!

Anonymous 8/24/2025, 12:42:47 AM No.106362624 [Report]

>>106362541
>Pre-training.
Distinction without a difference and a stupid naming convention. It's training.

Anonymous 8/24/2025, 12:46:05 AM No.106362644 [Report] >>106362661 >>106362669

https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF/discussions/2#68a9cfca361af4a168b42b74
In case anyone else tried to make DS 3.1 reasoning work with ST chat completion.

Anonymous 8/24/2025, 12:48:30 AM No.106362661 [Report] >>106362676 >>106362699

>>106362644
So, is it worth it to make the jump from R1/V3?

Anonymous 8/24/2025, 12:49:49 AM No.106362669 [Report]

>>106362644
>ubergarm
I have seen this name somewhere...

Anonymous 8/24/2025, 12:51:14 AM No.106362676 [Report]

>>106362661
It can deal better with longer context, but it's more autistic so you have to be more explicit about what you want it to do.

Anonymous 8/24/2025, 12:54:04 AM No.106362694 [Report] >>106362765

u.png md5: 862b412d...

Anonymous 8/24/2025, 12:55:01 AM No.106362699 [Report]

>>106362661
Not if your primary use case is Vocaloid/UTAU birthday asking at IQ1KT. V3 0324 is better here

Anonymous 8/24/2025, 1:00:30 AM No.106362735 [Report] >>106362747

/pol/ favorite celebrity did something, now the serbian is going to be like a kid on a sugarrush all weekend

Anonymous 8/24/2025, 1:03:27 AM No.106362747 [Report]

>>106362735
As one of the prime shitposters I can confirm that I am not feeling like shitposting that much now.

Anonymous 8/24/2025, 1:07:22 AM No.106362765 [Report]

>>106362694
will he quant the grok?

Anonymous 8/24/2025, 1:11:58 AM No.106362792 [Report]

1589887234978.gif md5: 7f20500f...

>>106361864
>experiment with QAT
Was MXFP4 really a mistake?

Anonymous 8/24/2025, 1:12:52 AM No.106362795 [Report] >>106362807 >>106362812 >>106362823 >>106362840

>qwen3-30b-a3b-thinking-2507-q8 slower on ollama but thinks efficiently
>qwen3-30b-a3b-thinking-2507-q8 faster on llama.cpp but keeps repeating itself in the thinking block so the speed gains are negated
What's going on? Why is the same model with the same quantization behaving differently on ollama and llama.cpp? What should I tweak to make the llama.cpp model behave more like the ollama model and reduce overthinking to actually benefit from the faster inference?

Anonymous 8/24/2025, 1:14:13 AM No.106362807 [Report]

>>106362795
ollama is slow trash and you didn't set the samplers correctly on llama.cpp, causing repetition

Anonymous 8/24/2025, 1:14:29 AM No.106362812 [Report]

>>106362795
You need to inspect the prompt, the hyperparameters, and the launch parameters the backends are getting and compare them.
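
A minimal sketch of that comparison for the sampler side, assuming you have already dumped each backend's effective settings into a dict. The parameter names and values below are placeholders for illustration, not the actual defaults of either backend.

```python
# Compare two dicts of sampler / launch settings and report only what differs.
# The keys and values used in the example are hypothetical placeholders.

def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

backend_a = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
backend_b = {"temperature": 0.8, "top_p": 0.95}

for key, (left, right) in diff_settings(backend_a, backend_b).items():
    print(f"{key}: {left!r} vs {right!r}")
```

A key missing on one side shows up as `None`, which is usually the interesting case since it means that backend is falling back to its own default.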

Anonymous 8/24/2025, 1:15:51 AM No.106362823 [Report]

>>106362795
Get a grip on your inference infrastructure and understand what it's actually doing under the hood. Log the raw text input and diff it
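
Something like the following would do for the "diff it" step, assuming you have captured the raw text each backend actually feeds the model. The two prompt strings here are made up for illustration; how you log them depends on the backend.

```python
import difflib

def diff_prompts(prompt_a, prompt_b, name_a="ollama", name_b="llama.cpp"):
    """Unified diff of two raw prompt strings; empty string if they are identical."""
    lines_a = prompt_a.splitlines(keepends=True)
    lines_b = prompt_b.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(lines_a, lines_b, fromfile=name_a, tofile=name_b)
    )

# Hypothetical captured prompts: one template opens a thinking block, the other does not.
ollama_prompt = "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n<think>\n"
llamacpp_prompt = "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n"

print(diff_prompts(ollama_prompt, llamacpp_prompt))
```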

Anonymous 8/24/2025, 1:15:53 AM No.106362825 [Report]

Btw I heard that ikllama KT quants are bad at the moment. But what is the problem with them specifically? I wanted to do an R1 IQ2_KT quant since I tried exl3 70B like that and I was surprised how good it was.

Anonymous 8/24/2025, 1:17:20 AM No.106362839 [Report]

>>106362566
>what happened?
Google says they use QAT for Gemma 3, when it's just quantization aware fine-tuning.

Anonymous 8/24/2025, 1:17:57 AM No.106362840 [Report]

>>106362795
Does the llama.cpp log have a warning about a double BOS token?
The GGUF file can define that one should be added and if you then also add one in your prompt you can end up with two.
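
A small guard against that case could look like this. It is a sketch only: `bos_token` and `backend_adds_bos` are hypothetical inputs you would fill in from your model's metadata and your backend's behaviour, and the `<s>` string is just an example token.

```python
# Avoid sending two BOS tokens: if the backend prepends one on its own,
# drop the one that the prompt template already put at the start.
# bos_token and backend_adds_bos are assumptions supplied by the caller.

def strip_duplicate_bos(prompt, bos_token, backend_adds_bos):
    """Return prompt without its leading BOS text when the backend adds BOS itself."""
    if backend_adds_bos and bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt

print(repr(strip_duplicate_bos("<s>[INST] hi [/INST]", "<s>", backend_adds_bos=True)))
print(repr(strip_duplicate_bos("<s>[INST] hi [/INST]", "<s>", backend_adds_bos=False)))
```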

Anonymous 8/24/2025, 1:18:13 AM No.106362845 [Report] >>106362856

>>106362351
>how many tokens
Seven

Anonymous 8/24/2025, 1:19:40 AM No.106362855 [Report]

>>106362351
>You're X.
You don't need more

Anonymous 8/24/2025, 1:19:47 AM No.106362856 [Report]

>>106362845
We have sex. You are a pony.

Anonymous 8/24/2025, 1:20:40 AM No.106362860 [Report]

>>106359837
Broken-tutu-24b, turn off all samplers, they adversely affect output, causing bad repetition.

Anonymous 8/24/2025, 1:23:00 AM No.106362874 [Report]

>>106362541
>distillation
>omni
>QAT
They keep watering down established terms.

Anonymous 8/24/2025, 1:54:30 AM No.106363056 [Report]

Timeline.png md5: 42c3f64f...

>>106358892
>Adding cuck and shot to the timeline.
A bit vulgar and unneeded, but more importantly they weren't there initially. Why even include them on the timeline?

Anonymous 8/24/2025, 2:05:12 AM No.106363134 [Report] >>106363427

>>106359908
>>106360682
do you use 27b?

It's so fucking slow, even compared to 32b models for me.

Anonymous 8/24/2025, 2:10:36 AM No.106363177 [Report]

file.jpg md5: 8c6efd38...

>>106362417
kek at how fast trannies itt change their flip-flops

Anonymous 8/24/2025, 2:14:19 AM No.106363198 [Report]

>>106361292
>I still think Elon is a despicable piece of shit
no truer words have ever been spoken.

Anonymous 8/24/2025, 2:14:38 AM No.106363201 [Report] >>106363258 >>106363281 >>106363371

Can someone explain quants to me?

Is it true Q4 K_M is all you really need? I usually go for the highest that my GPU can handle but I literally can't see a difference between Q4 K_M and Q5 K_M but i've not tested long enough to know.

Anonymous 8/24/2025, 2:21:01 AM No.106363235 [Report] >>106363245 >>106363291 >>106363294

1744388702202956.png md5: 94a2b36b...

song of the day featuring miku and teto, tenntekomai girl
https://www.nicovideo.jp/watch/sm45323744

Anonymous 8/24/2025, 2:22:06 AM No.106363241 [Report]

>>106362417
safetykeks btfo
I hope he buys meta too and fixes the shit out of their models

Anonymous 8/24/2025, 2:22:12 AM No.106363242 [Report]

>>106361270
I will always be glad that Elon kick-started the space industry after it was stagnant for decades. But my god he can't help himself from burning bridges and flying off the handle for no good reason. Hopefully he someday learns how to keep his shit together because eventually he will run out of bridges to burn.

Anonymous 8/24/2025, 2:22:43 AM No.106363245 [Report] >>106363260 >>106363368 >>106363848

>>106363235
Is this Japanese youtube?

Anonymous 8/24/2025, 2:24:12 AM No.106363258 [Report] >>106363328

mmlu_vs_quants.png md5: 036662f3...

>>106363201
The smaller the more degradation, generally.
Basically, since you are losing numerical precision of the numbers that are being used in the calculations, each "internal nudge" towards the final output is that little bit more different ("inaccurate") compared to full precision.
Something like that.
How noticeable the degradation is, or how much it matters, depends on a lot.
The heuristic is: use the largest bpw (correlated with file size) that you can run at speeds you are comfortable with, at the context size you need.
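
That heuristic can be sketched as a memory-only check, assuming weight size is roughly parameters * bpw / 8 bytes and that you already have your own estimate for context (KV cache) overhead. The candidate bpw values and the budget below are illustrative numbers, not measurements of any real quant type, and speed is left out entirely.

```python
# Pick the highest bits-per-weight option whose weights plus context overhead
# fit in a memory budget. All concrete numbers in the example are illustrative.

def weight_bytes(n_params, bpw):
    """Approximate size of the weights in bytes at a given bits-per-weight."""
    return n_params * bpw / 8

def pick_quant(n_params, candidates, budget_bytes, context_bytes):
    """Return the name of the largest-bpw candidate that fits, or None."""
    fitting = [
        (bpw, name)
        for name, bpw in candidates.items()
        if weight_bytes(n_params, bpw) + context_bytes <= budget_bytes
    ]
    return max(fitting)[1] if fitting else None

candidates = {"~4.5 bpw": 4.5, "~5.5 bpw": 5.5, "~6.5 bpw": 6.5, "~8.5 bpw": 8.5}
choice = pick_quant(
    n_params=24e9,        # hypothetical 24B model
    candidates=candidates,
    budget_bytes=24e9,    # hypothetical 24 GB of memory
    context_bytes=4e9,    # hypothetical context overhead
)
print(choice)
```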

Anonymous 8/24/2025, 2:24:29 AM No.106363260 [Report]

>>106363245
yep! and if you've ever seen those videos where viewers' comments are scrolling from right to left, across the main screen of the video, that's where it comes from

Anonymous 8/24/2025, 2:26:07 AM No.106363281 [Report] >>106363328

>>106363201
>Can someone explain quants to me?
Accuracy goes down as the quantization becomes more aggressive. Generally, bigger models handle low bit quants better than smaller ones. That's it.
>Is it true Q4 K_M is all you really need?
If q4km is good enough for you, use that. If you can manage to run something bigger and tolerate the speed, use that instead. If you need more memory, use a lower bit quant.
It depends on the problem you're trying to solve and your expectations. This is not me asking what the problem is nor what your expectations are. It's something you have to evaluate yourself.

Anonymous 8/24/2025, 2:27:09 AM No.106363291 [Report]

create machines guy head bopping banging dancing animation_thumb.jpg.webm md5: 56cbf704...

>>106363235

Anonymous 8/24/2025, 2:27:44 AM No.106363294 [Report] >>106363433

>>106363235
When is crypton going to give up and let Synth V make a Miku voicebank? She sounds awful compared to Teto.

Anonymous 8/24/2025, 2:31:40 AM No.106363328 [Report]

>>106363258
>>106363281
nah I get that stuff, I just read that most graphs show for the standard models (not speaking the crazier sized ones, moreso in the sub 34b range) that Q4_M is sort of the sweet spot or some shit but I have no idea so figured one of you guys may know more.

I'll stick to Q4s for a while, see how they feel.

Anonymous 8/24/2025, 2:35:51 AM No.106363368 [Report]

>>106363245
unironically better for my eyes than american jewtube
now if only I knew more japanese...

Anonymous 8/24/2025, 2:36:06 AM No.106363371 [Report] >>106363415

color_depth_llm_comparison.jpg md5: fcee3aea...

>>106363201
quantization is a mapping of the model's weights down to a smaller size.
weights are basically floating point numbers.
basically it is like images, the lower the amount of bits you can store for the image, the less accurate the picture will be to the original.
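
The image analogy can be tried out on numbers directly. The sketch below uses plain uniform min-max quantization over one list of random values, which is a deliberately simple stand-in and not how GGUF K-quants or other real schemes work (those use per-block scales and more). It only shows that round-trip error grows as the bit count drops.

```python
import random

def quantize_roundtrip(values, bits):
    """Map values onto 2**bits evenly spaced levels between min and max, then back."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    if scale == 0:
        return list(values)
    return [lo + round((v - lo) / scale) * scale for v in values]

def mean_abs_error(values, bits):
    """Average absolute difference between original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

# Fake "weights": normally distributed floats with a fixed seed for repeatability.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]

for bits in (8, 4, 2):
    print(f"{bits}-bit mean abs error: {mean_abs_error(weights, bits):.4f}")
```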

Anonymous 8/24/2025, 2:40:47 AM No.106363415 [Report] >>106363418 >>106363423

>>106363371
We never needed more than 4bit color.

Anonymous 8/24/2025, 2:41:29 AM No.106363418 [Report]

>>106363415
I like to give my models 6 bits, as a treat.

Anonymous 8/24/2025, 2:41:44 AM No.106363423 [Report] >>106363429

>>106363415
the greens got duller

Anonymous 8/24/2025, 2:42:15 AM No.106363427 [Report]

>>106363134
I've never gotten Gemma to work too well for me either, for similar reasons. I prefer its writing style to Mistral Small's, but having a 24GB card I struggle to find a reason to pick Gemma over Mistral when one is so much quicker.

Anonymous 8/24/2025, 2:42:36 AM No.106363429 [Report]

>>106363423
You noticed that but not the blues turning gray?

Anonymous 8/24/2025, 2:42:53 AM No.106363433 [Report]

tetter.jpg md5: e505e8f9...

>>106363294
Most SV Tetos while vastly more natural sounding compared to the old sovl UTAU and vocaloid, sound the same. Might be that few producers make an effort with tuning to make her sound different.
Gets boring. Kino exception: https://www.youtube.com/watch?v=ekrAP7mzKa0
New vocaloid versions are meh imo, trying too hard to sound "real". Vocaloid V2-4 Miku variations sound different and have the soul of imperfection.
https://www.youtube.com/watch?v=rQRlSJJ0OrI

Anonymous 8/24/2025, 2:46:17 AM No.106363452 [Report] >>106363479 >>106363495 >>106363504 >>106363737

63643.jpg md5: 69dff652...

GPT OSS VS Grok 2 VS maverick
Who is the king of local?

Anonymous 8/24/2025, 2:49:09 AM No.106363476 [Report]

>>106361058
>Of course! This is a great question that gets to the heart of
you edited this right? I hope so

Anonymous 8/24/2025, 2:49:27 AM No.106363479 [Report] >>106363518

>>106363452
me :D

Anonymous 8/24/2025, 2:51:06 AM No.106363495 [Report]

>>106363452
GPT OSS

Anonymous 8/24/2025, 2:52:14 AM No.106363504 [Report]

>>106363452
Drummer

Anonymous 8/24/2025, 2:54:14 AM No.106363518 [Report] >>106363525

>>106363479
I didn't vote for you.

Anonymous 8/24/2025, 2:55:24 AM No.106363525 [Report] >>106363540

>>106363518
Alright, rank them.
1. GPT OSS
2. Grok 2
3. Llama 4 Maverick
4. meee

Anonymous 8/24/2025, 2:58:04 AM No.106363540 [Report] >>106363743

>>106363525
(You) > Grok 2 > Llama 4 Maverick > GPT OSS
Omitting any Chinese options is cheating though.

Anonymous 8/24/2025, 3:24:05 AM No.106363737 [Report]

>>106363452
glm 4.5 air

Anonymous 8/24/2025, 3:24:32 AM No.106363743 [Report]

>>106363540
i know my worth (less than chinese models)

Anonymous 8/24/2025, 3:26:03 AM No.106363757 [Report] >>106363767

Screenshot 2025-08-23 at 18.25.38.png md5: 313878fa...

>being excited for >this

Anonymous 8/24/2025, 3:27:49 AM No.106363767 [Report] >>106363777

>>106363757
He also said grok 3 in six months, which means this is just precedent-setting. I'm more excited for OpenAI getting their lunch eaten from all angles than for the actual releases themselves.

Anonymous 8/24/2025, 3:28:59 AM No.106363777 [Report] >>106363794 >>106363861

>>106363767
Dropping a model not a single person will ever use isn't eating anyone's lunch.

Anonymous 8/24/2025, 3:29:25 AM No.106363781 [Report] >>106363832

>>106362206
>dynamic q2 for a close to lossless experience
Really? Q2? How much better is that than Q1?

Anonymous 8/24/2025, 3:31:05 AM No.106363794 [Report] >>106363850

>>106363777
Why won't anyone use it?

Anonymous 8/24/2025, 3:35:53 AM No.106363832 [Report]

>>106363781
twice as many Qs bro

Anonymous 8/24/2025, 3:38:32 AM No.106363848 [Report]

>>106363245
No. YouTube is western nicodou

Anonymous 8/24/2025, 3:38:48 AM No.106363850 [Report] >>106363858

>>106363794
It's old, big, and dumb. Much like your mother. They didn't even have the decency to release the base model.

Anonymous 8/24/2025, 3:39:39 AM No.106363858 [Report]

>>106363850
That sucks

Anonymous 8/24/2025, 3:39:47 AM No.106363859 [Report]

how much ram does mistral small 24b take up at 128k context?
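
Rough back-of-envelope for the KV cache part of that question, as a sketch only. The layer count, KV head count, and head dimension below are assumptions about the Mistral Small 24B config (check the model's config.json before trusting them), and the function name is made up for illustration. Weights and compute buffers come on top of this number.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    Each token stores one K and one V vector per layer, each holding
    n_kv_heads * head_dim elements (hence the leading factor of 2).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# ASSUMED model shape for Mistral Small 24B (not stated in the thread).
N_LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 128
N_CTX = 128 * 1024  # 128k tokens

GIB = 2**30

# fp16 cache: 2 bytes per element.
kv_f16 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, bytes_per_elem=2)
print(f"fp16 KV cache at 128k: {kv_f16 / GIB:.1f} GiB")

# Same cache at 1 byte per element, as a crude stand-in for an 8-bit cache
# (real quantized cache formats carry a little extra for scales).
kv_8bit = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, bytes_per_elem=1)
print(f"~8-bit KV cache at 128k: {kv_8bit / GIB:.1f} GiB")
```

Under those assumptions the cache alone is about 20 GiB at fp16, so total memory is that plus whatever the quantized weights take.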

Anonymous 8/24/2025, 3:39:58 AM No.106363861 [Report] >>106363868

>>106363777
It makes OpenAI's future retiring of 4o instead of open-sourcing it look very weak, especially after their last open source shitshow. It causes a public loss of confidence, which is a useful antidote to their arrogance.

Anonymous 8/24/2025, 3:40:56 AM No.106363868 [Report] >>106363903

>>106363861
They'll say it's too dangerous to release, and for all I know they actually believe that.

Anonymous 8/24/2025, 3:45:21 AM No.106363903 [Report] >>106363917 >>106364354 >>106364450

>>106363868
The engineers and safety researchers might believe it, but I don't believe for a minute that sam does.

Anonymous 8/24/2025, 3:47:30 AM No.106363917 [Report]

>>106363903
Well, companies only open source last gen technology. And GPT-4o is still current gen for OpenAI :)

Anonymous 8/24/2025, 3:55:00 AM No.106363972 [Report]

>>106362417
Gotta hand it to this,
He delivers (eventually).

Anonymous 8/24/2025, 4:01:16 AM No.106364011 [Report] >>106364019 >>106364650

grok-2 gguf status

Anonymous 8/24/2025, 4:02:47 AM No.106364019 [Report]

>>106364011
sir sglang is all you need sir

Anonymous 8/24/2025, 4:50:22 AM No.106364258 [Report]

>>106358772
With an RTX 5090, what could I make as far as video?

Anonymous 8/24/2025, 4:51:47 AM No.106364271 [Report] >>106364467

wtf.jpg md5: 6604e3a7...

Can anyone explain to me how koboldcpp works with the offloading shit?

Why does it automatically start reducing the layers when I take something easy like say a 24b Mistral Small model up to 24k context. Does that mean my VRAM isn't enough or something? Because when I just manually set it to 43/43 it works fine, even quicker I think.

Should I ignore that Auto Offload Layer shit entirely?

Anonymous 8/24/2025, 5:03:31 AM No.106364354 [Report]

>>106363903
They kneecapped Toss, there's no way they'll ever release 4o

Anonymous 8/24/2025, 5:18:39 AM No.106364439 [Report]

Did anyone manage to get any TTS models working with RDNA3/ROCm on Arch? I need someone to explain how like I'm a fucking retard. Every attempt I've made has failed despite using the rocm torch packages, onnx and whatever else. I always end up with dependency conflicts. I fucking hate python environments and pip packages so much.

Anonymous 8/24/2025, 5:21:33 AM No.106364450 [Report]

>>106363903
>but I don't believe for a minute that sam does.
The fact that Sam released the stinking pile of shit that was OSS makes me 50/50 whether he was trying to poison future open models from other companies with the approach that takes a sledgehammer to intelligence in the name of "safety", or whether he's genuinely schizo and believes peasants don't deserve what amounts to private internet access

Anonymous 8/24/2025, 5:25:08 AM No.106364467 [Report]

>>106364271
>Auto Offload
jank; ignore. Use that which makes it go faster through trial and error, then save the good config.
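
If you want a starting point for the trial and error, here is a crude sketch of the arithmetic. This is NOT koboldcpp's actual auto-offload heuristic; the function name, the even per-layer split, the overhead reserve, and every number in the example are assumptions for illustration. Read the real values off your own model file and GPU.

```python
def estimate_gpu_layers(vram_gib, model_gib, n_layers, kv_gib, overhead_gib=1.5):
    """Guess how many layers fit in VRAM.

    Assumes weights and KV cache are spread evenly across layers and that
    overhead_gib is reserved for compute buffers, the desktop, etc.
    """
    usable = vram_gib - overhead_gib
    if usable <= 0:
        return 0
    per_layer_gib = (model_gib + kv_gib) / n_layers
    return min(n_layers, int(usable // per_layer_gib))


# Hypothetical numbers: a ~14 GiB quant with 40 layers and ~3.75 GiB of
# fp16 KV cache (roughly 24k context under an assumed 40-layer, 8-KV-head,
# 128-dim model shape).
print(estimate_gpu_layers(vram_gib=24, model_gib=14, n_layers=40, kv_gib=3.75))
print(estimate_gpu_layers(vram_gib=16, model_gib=14, n_layers=40, kv_gib=3.75))
```

If an estimate like this says everything fits and the manual full offload runs fine, the automatic setting is probably just being conservative. Treat the result as a first guess, then nudge the layer count up or down and keep whatever config is fastest.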

Anonymous 8/24/2025, 5:44:21 AM No.106364579 [Report]

1750864005822845.jpg md5: 76340d16...

>>106358757
>Copyright lawsuit accuses Meta of using pirated adult films for AI training:
Kek

Anonymous 8/24/2025, 5:47:44 AM No.106364598 [Report]

I'd use Grok 2 as my Ani's brain. SADly I'm vramlet and ramlet

Anonymous 8/24/2025, 5:56:46 AM No.106364650 [Report]

>>106364011
not needed

Anonymous 8/24/2025, 5:57:12 AM No.106364653 [Report]

>>106364639
>>106364639
>>106364639