►Recent Highlights from the Previous Thread: >>106335536

--Optimizing GLM-4.5 MoE inference speed via quant and offload tuning in llama.cpp:
>106335633 >106335669 >106335686 >106335702 >106335719 >106335721 >106335704 >106335823 >106336163 >106336177 >106336221 >106336229 >106336236 >106336398
--dots.ocr preprocessing essential for accurate document understanding in local models:
>106338159 >106338172 >106338188 >106338181 >106338215 >106338210 >106338337 >106338374 >106338523 >106338576 >106338590
--Cohere's new 256K reasoning model faces skepticism over licensing and safety alignment:
>106336632 >106336642 >106336651 >106336656 >106336675 >106336680 >106336692 >106336690 >106336733 >106336750 >106336775 >106336818 >106336861 >106336737 >106336758 >106336923 >106337358 >106337460 >106337748 >106337789 >106337814 >106337848 >106337871
--New 3.1 model criticized for blandness and overfitting on synthetic safety data:
>106336831 >106336893 >106336909 >106336979 >106337037 >106337046 >106337093 >106337128 >106337099 >106337246 >106336996 >106337236 >106337264 >106336977 >106337079 >106337003 >106338206
--Linux vs Windows power reporting and inference efficiency on RTX 3090:
>106336491 >106336561 >106336576 >106336655 >106336874 >106336990 >106337011 >106337060 >106336671
--GPT-5 inflated EQ-Bench scores by ignoring word limits in prompts:
>106335810
--Skepticism toward NVIDIA's AI roadmap and social media hype around small model agents:
>106337495 >106337644 >106337664 >106337510 >106337570 >106337595 >106337614 >106337665 >106337728 >106337732 >106338079 >106337772 >106337818 >106337918 >106337963 >106338350 >106338382 >106338412 >106338500
--UE8M0 FP8 as a new data format for upcoming Chinese AI chips:
>106337941 >106337976 >106338002 >106338175 >106338316
--Miku (free space):
>106336448

►Recent Highlight Posts from the Previous Thread: >>106335541

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script