►Recent Highlights from the Previous Thread: >>106278215
--Running Qwen3-235B on 3090 with AutoRound and avoiding shared memory pitfalls:
>106278283 >106278326 >106278391 >106278545 >106278550 >106278693 >106278757 >106279781 >106279825 >106281329 >106281388 >106279868 >106279926 >106280022 >106280015 >106280136 >106280374 >106280389 >106280393 >106280443 >106280450 >106280451
--Is the golden age of local LLMs over due to stagnation and lack of open foundational models?:
>106285989 >106286015 >106286167 >106286190 >106286198 >106286025 >106286036 >106286058 >106286074 >106286109 >106286146 >106286028 >106286087 >106286243
--High RAM setups vs corporate VDI centralization for local LLM use:
>106278462 >106278506 >106278598 >106278739 >106278777 >106278823 >106278830 >106278847 >106278903 >106278913 >106278946 >106279046 >106278985 >106279019 >106279007
--Dynamic MoE inference and efficiency improvements in llama.cpp development:
>106280498 >106280521 >106280633 >106280644 >106280671 >106280560 >106280745 >106280818
--Modular small models with dynamic routing as an alternative to large monolithic models:
>106278501 >106278547 >106278625 >106278657
--DeepSeek's Huawei bet delays Nvidia comeback, pushing users to alternative models:
>106285826 >106285840 >106285914 >106285933 >106286011 >106285850 >106285873 >106285902
--Grok 2 open-source announcement sparks anticipation and skepticism:
>106285546 >106285608 >106285635 >106285912
--Splitting and hosting large quantized models on Hugging Face with storage limitations:
>106284117 >106284123 >106284283 >106284780 >106284798 >106285085
--Lack of reliable LoRAs for consistent 16x16 pixel art generation:
>106284801 >106285420 >106285537 >106285647 >106285821
--Miku (free space):
>106278466 >106278492 >106279395 >106279412 >106282523 >106282599 >106282785 >106283278 >106283542
►Recent Highlight Posts from the Previous Thread: >>106278217
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script