►Recent Highlights from the Previous Thread: >>106293952
--Paper (old): NoLiMa: Long-Context Evaluation Beyond Literal Matching:
>106296664 >106296755 >106296775 >106297038 >106297035
--FP8 and 8-bit quantization degrade coding performance compared to FP16:
>106302809 >106302818 >106302838 >106302819 >106302849 >106302857 >106302868 >106302894 >106303047 >106302881 >106302907 >106302936 >106303114
--Creating DPO datasets from human-written NSFW stories:
>106302049 >106302064 >106302106 >106302172 >106302244 >106302360 >106302456 >106302636
--Local MoE VLMs lag behind cloud counterparts:
>106295622 >106295765 >106295840 >106295922 >106300768 >106300901 >106301033 >106301074 >106301104 >106301180 >106301307 >106301362 >106301369 >106301408 >106301393 >106301189
--Modern aligned models reduce need for complex sampling; min-p debate:
>106298467 >106298643 >106298755 >106299150 >106299218 >106299294 >106299400 >106299475 >106299509 >106299426 >106299450 >106299481 >106299591
--Open TTS models remain limited but improving:
>106294668 >106294724 >106294733 >106294765 >106294807 >106295276 >106295284 >106294821 >106294825 >106298039 >106298987
--Challenges in using small LLMs for uncensored, dynamic NPC dialogue in games:
>106294169 >106294184 >106294284 >106294340 >106294385 >106294203 >106294363 >106294373 >106294438 >106294437 >106294471 >106294635 >106294769 >106294206 >106294266 >106294291 >106294309
--Dilemma over $5k hardware investment amid upcoming GPU releases:
>106300115 >106300186 >106300203 >106300209 >106300232 >106300261 >106300273 >106300438 >106300264 >106300295
--LLMs may inherit hidden behavioral traits from training data without explicit exposure:
>106297307 >106297935
--GLM-4.5 attention fix improves performance in ik_llama.cpp:
>106295849 >106295985
--Miku (free space):
>106294340 >106300131 >106300178 >106301406
►Recent Highlight Posts from the Previous Thread: >>106293959
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script