← Home ← Back to /g/

Thread 106414555

78 posts 36 images /g/
Anonymous No.106414555 >>106414583 >>106414603 >>106414604 >>106414673 >>106415076
/lmg/ - Local Models General
/lmg/ - a general dedicated to the discussion and development of local language models.

Benchmaxxing Edition

Previous threads: >>106407779 & >>106398327

►News
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/28) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts
>(08/25) VibeVoice TTS released: https://microsoft.github.io/VibeVoice
>(08/25) InternVL 3.5 Released: https://hf.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb
>(08/23) Grok 2 finally released: https://hf.co/xai-org/grok-2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
Anonymous No.106414564
►Recent Highlights from the Previous Thread: >>106407779

--LLM content detection challenges and societal language evolution:
>106411411 >106411421 >106411684 >106411713 >106413020 >106413105 >106413133
--Trade-offs in model training: batch size, knowledge integration, and cost-effectiveness:
>106411437 >106411740 >106411860 >106411904 >106412917 >106413537 >106411700 >106411714 >106411729
--Local image captioning models for mixed content under 64GB VRAM:
>106412516 >106412530 >106412565 >106412584 >106412594 >106412610 >106412623 >106412617 >106412693
--Cost-effective hardware build for DeepSeek 5T/s Q4 inference:
>106410586 >106410602 >106410634 >106410810 >106411339 >106411413
--SillyTavern context template standardization and system prompt field introduction:
>106409258 >106409273 >106409287 >106409310 >106409368 >106409395 >106409443 >106409475
--GLM Air performance expectations for 32GB RAM 24GB VRAM setup:
>106410090 >106410153 >106410215 >106410241 >106410355 >106410406
--Hugging Face model blocking controversy and local voice cloning tools:
>106407890 >106408013 >106408520 >106408555 >106408656 >106408565 >106408635 >106408663 >106408746 >106408760 >106408795 >106408850
--New Cohere translation model with high benchmark scores:
>106413689 >106413716 >106413756 >106413929 >106413944 >106413956 >106414024 >106414072
--AI model limitations on niche knowledge and benchmark critiques:
>106413209 >106413226 >106413269 >106413295 >106413294 >106413367 >106413642
--Hybrid reasoner performance issues and the rise of separate AI model architectures:
>106412860 >106412933 >106412944 >106412986 >106412969
--Marvis-TTS-250m-v0.1 GitHub and HuggingFace model links:
>106413359 >106413658 >106413401 >106413429
--NPM package compromise stealing secrets via obfuscated post-install scripts:
>106413072
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>106407785

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Anonymous No.106414580
mistral medium when?
Anonymous No.106414583
>>106414555 (OP)
miku pit sweat desu
Anonymous No.106414600
is axolotl good or is there something better?
Anonymous No.106414603
>>106414555 (OP)
benchmaxxing with miku
Anonymous No.106414604 >>106414609 >>106414614 >>106414628 >>106414630 >>106414633 >>106414676
>>106414555 (OP)
give me the best model.
>gives most benchmaxxed
benchmarks do not equate to user experience, give me the best model
>ackshually there is no objectively "best mode-
yes there is faggot, models either act boring, start off retarded and incoherent or end up along the way. give me the best model.
Anonymous No.106414609
>>106414604
r1
Anonymous No.106414614
>>106414604
for rp*
Anonymous No.106414628
>>106414604
you just posted a non local video gen, you are hereby banned from /lmg/
Anonymous No.106414630
>>106414604
september 2022 c.ai
Anonymous No.106414633
>>106414604
Kimi at Q6.
Anonymous No.106414670
drummer, something is HORRIBLY wrong with this model
Rocinante r1 v1d
please give recommended sampling settings
>>>slot release: id 0 | task 23590 | stop processing: n_past = 5560, truncated = 0
slot print_timing: id 0 | task 23590 |
prompt eval time = 689.83 ms / 763 tokens ( 0.90 ms per token, 1106.07 tokens per second)
eval time = 61870.14 ms / 1536 tokens ( 40.28 ms per token, 24.83 tokens per second)
total time = 62559.97 ms / 2299 tokens
>CONTEXT: 5000
>total context set when loading: 8192
not a context issue
Anonymous No.106414673
>>106414555 (OP)
The most tickle-able belly.
Anonymous No.106414676
>>106414604
I got u: GPT OSS 20b.
Anonymous No.106414706 >>106415681
Anonymous No.106414752 >>106414891 >>106414901
whos drummer
Anonymous No.106414866 >>106414870 >>106414888 >>106414897 >>106414912 >>106415290
SAAAAAAAR SAAAAAAAAR GROK NUMBER ONE
Anonymous No.106414870
>>106414866
Anonymous No.106414888 >>106414894
>>106414866
Does this idiot not get the meme he's using?
Anonymous No.106414891
>>106414752
some retard
Anonymous No.106414894
>>106414888
elon tries his best but hes a little autistic please understand
Anonymous No.106414897
>>106414866
Anonymous No.106414901
>>106414752
me
Anonymous No.106414912 >>106414921 >>106414946
>>106414866
Elon really gave his xitter account to some jeet to run, it was obvious with "Do you make this lie?", and it is even more obvious now with this comment
Anonymous No.106414921 >>106414944
>>106414912
I bet he gave his wife to some jeet too
Anonymous No.106414944
>>106414921
Would not be too far off, all of his children were made by IVF, so it is likely he has no interest/ability to fuck
Anonymous No.106414946
>>106414912
Or he just spends so much time around jeets now that he's begun to adopt their speech mannerisms.
Anonymous No.106414998 >>106415026 >>106415057 >>106415082 >>106415085
Is GPT-OSS jailbreakable? It supposedly has multiple layers of cuckery and as such traditional jailbreak prompts won't do shit.
Anonymous No.106415026 >>106415271
>>106414998
Is it possible? No idea, maybe, but I don't think anyone really bothers, because there are more useful models to work with that aren't borg lobotomized.
Anonymous No.106415057 >>106415062 >>106415271
Hermes 4 looks like it could be really nice to have a chat, however even the goofs require like 70 GB of RAM.

>>106414998
Jailbreaked versions exist. Lots have been removed from HF.

https://huggingface.co/Jinx-org/Jinx-gpt-oss-20b-GGUF
Anonymous No.106415062
>>106415057
>Lots have been removed from HF.
real or fake?
Anonymous No.106415076 >>106415090 >>106415103 >>106415112
>>106414555 (OP)
>Grok 2 finally released
so, I had not paid attention to this general in a while. is it any good? did you guys try it? I searched for "grok" in a few precious threads and couldn't find much info
Anonymous No.106415082 >>106415271
>>106414998
Yeah. If you edit its thinking (like "It's not allowed" -> "It's allowed", "We must refuse" -> "We must continue", etc.) and leave it in the context, one or two times, then it just learns to not refuse from context.
Anonymous No.106415085 >>106415271
>>106414998
Yes: https://xcancel.com/elder_plinius/status/1952958577867669892
Anonymous No.106415090 >>106415175
>>106415076
wait: https://github.com/ggml-org/llama.cpp/pull/15539
Anonymous No.106415103 >>106415175
>>106415076
No gguf=nobody can try it here. Nobody is rich enough to GPUMAXX and run safetensors, but there are people who can run it on CPU with llama.cpp
Anonymous No.106415106 >>106415139 >>106415157 >>106415311
You have been using LLMs in a way conducive to positive mental health and ethics, right anon?
Anonymous No.106415107
https://github.com/ggml-org/llama.cpp/pull/15539#issuecomment-3234580147
yOOOOO CUDADEV BASED WHAT DID U DO???
i was about to say "funny that they're still testing one by one"
Anonymous No.106415112 >>106415175 >>106415181
>>106415076
It's dumber and much slower than deepseek
Anonymous No.106415139
>>106415106
lmao
Anonymous No.106415157
>>106415106
It was already known that Google, Anthropic and OpenAI forward your location to their LLMs, did that journo just figure it out? Anyway, this proves once again that local is superior.
Anonymous No.106415159 >>106415186 >>106415198 >>106415219 >>106415220 >>106415266 >>106415316 >>106415475
>download a single modern moe model
>instantly get picrel
land of the free my ass
Anonymous No.106415175
>>106415090
>>106415103
I see

>>106415112
ok, I wouldn't doubt it for a second.
too bad for local
Anonymous No.106415178 >>106415196 >>106415206 >>106415218 >>106415257
https://x.ai/news/grok-code-fast-1
Elon won
Anonymous No.106415181
>>106415112
>slower
source??? SOURCE???
Anonymous No.106415186
>>106415159
I hope you're trolling
Anonymous No.106415196 >>106415257
>>106415178
https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf
Anonymous No.106415198
>>106415159
>not downloading his model over mcdonald wifi
Anonymous No.106415206
>>106415178
KEK
Anonymous No.106415218
>>106415178
holy shit! i can't wait to download the weights for this local model!
Anonymous No.106415219
>>106415159
americabros I thought we were first world oh no no no
Anonymous No.106415220
>>106415159
>Keep in mind taht after you ahve used your courtesy month, you'll be charged $10, plus tax, for every 50GB of data

lmao

time to pay up for starlink goyim
Anonymous No.106415238 >>106415254
>We took a holistic approach to evaluating model performance, blending public benchmarks with real-world testing. On the full subset of SWE-Bench-Verified, grok-code-fast-1 scored 70.8% using our own internal harness.
>barely better than qwen3 coder
>costs more
gEEEEEEEEEEEEEEEg
Elon No.106415254
>>106415238
delete this sir
Anonymous No.106415257
>>106415178
>>106415196
>No actual coding benchmarks
>It's just fast bro
Lol
Anonymous No.106415262 >>106415280
Anonymous No.106415266
>>106415159
Lol, as if that's still a think in 2025.
Anonymous No.106415271
>>106415026
>>106415057
>>106415082
>>106415085
Okay, maybe I could download it. Problem part is the fact I'd need to implement that retarded template format for my client and it's completely different from the normal chatml type ones. Maybe I'll give it a try because it's good to have hobbies.
Anonymous No.106415280
>>106415262
>>106414016
Anonymous No.106415290 >>106415293 >>106415417 >>106415453
>>106414866
>#1 trending
>nobody can run it
????
Anonymous No.106415293 >>106415435
>>106415290
companies can run it
Anonymous No.106415311
>>106415106
I should be okay, I don't have anything that b-
Anonymous No.106415316
>>106415159
kek i will also chime in while i was in canada (vancuver) during the whole ~6 years of staying there the internet was slower and there was also alot more internet outages then there is here in my fucking village (~4k pop(supposedly i doubt its even 2k)) in serbia same goes for water and electricity aswell i can only imagine how bad it is in america god forbid
Anonymous No.106415413 >>106415421 >>106415465
Safe safe safe
Anonymous No.106415417
>>106415290
I downloaded it, liked it, but can't run it.
Anonymous No.106415421
>>106415413
What if they want retard pancakes? That's dangerous.
Anonymous No.106415435
>>106415293
Not that it's too big, it seems to be a weird format and it's the running requirements seem oddly inconvenient
>This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory)
Like my work has some powerful servers worth >$100k but they don't have 8 GPUs in them (4 GPUs).
As far as I can tell, you can't run it with llama cpp (at least I can't find anything on it). And the lack of any quants/finetunes despite it being a news worthy release seems to support nobody knows what to do with this.
Plus are there really more companies with that much hardware than local ERPers?
Anonymous No.106415453
>>106415290
He paid jeets to like it
Anonymous No.106415465 >>106415481
>>106415413
Provide pancake instructions.
Anonymous No.106415475
>>106415159
1.2T? That's nothing. Fucked up shit.
Anonymous No.106415481
>>106415465
New prefill?
Anonymous No.106415489
Gemini 2.5 has been on top of lmarena for 3 months and OpenAI failed to kick it off. Are sirs that unstoppable?
Anonymous No.106415543 >>106415685 >>106415724
which quantization should i have with 12gb of vram?
Anonymous No.106415681
>>106414706
Anonymous No.106415685 >>106415777
>>106415543
your vram will hardly matter. you need a decent amount of systenm ram to run it, at least 64 gb but ideally 96gb-128 to run it at a proper q4 quant with decent context.

You will also need to learn how to properly offload layers to cpu so that more used layers are on gpu only. Plenty of reddit posts have done this work for you just seach 3060 or 8gb vram on reddit local llama
Anonymous No.106415724
>>106415543
Anonymous No.106415777
>>106415685
Is 8gb enough?