►Recent Highlights from the Previous Thread: >>107044779
--Paper: INT vs. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats:
>107048819 >107050729 >107051225 >107051397 >107051579 >107051763 >107051785 >107052024 >107052042
--Kimi Linear release and model size vs performance tradeoffs:
>107052386 >107052523 >107052534 >107052587 >107052868 >107053037 >107053119 >107053253 >107053271 >107053372 >107053399 >107053296 >107052943 >107052960
--Brumby-14B-Base's power retention architecture:
>107053745 >107053782 >107053793 >107053806 >107053815 >107054051 >107054141 >107054191 >107054161 >107054205 >107054237 >107054228
--MiniMax M2's choice of full attention after efficient attention fell short of real-world expectations:
>107055069
--Optimizing VibeVoice-Large-Q8 with selective quantization and performance tweaks:
>107046566 >107046649
--Input text recovery from hidden states:
>107053293 >107053393
--CUDA toolkit installation headaches and alternatives:
>107045283 >107045326 >107045351 >107045445 >107045512 >107045605 >107049390 >107049857
--Mixed experiences and optimization tips for GLM-4.6 usage:
>107051344 >107051367 >107052899 >107053125 >107051379 >107051387 >107053864
--GLM-4.6 excels in code planning and tool stability:
>107046842 >107046900 >107046932 >107046939 >107047296
--Evaluating Mamba-based LLMs: context length claims vs practical performance:
>107044925 >107045236 >107045252 >107045278
--Qwen3VL support added to llama.cpp:
>107054671 >107054693
--LLM preference inconsistency under contextual shifts:
>107049878 >107049939 >107049985
--Exploring transformer token prediction theory and Suno AI's limitations:
>107047458 >107048117 >107048175 >107048207 >107048762
--Logs:
>107046612 >107046642 >107048277 >107056280
--Miku (free space):
>107047069 >107049649 >107051768 >107051786 >107053223 >107053796 >107055480
►Recent Highlight Posts from the Previous Thread: >>107044782
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script