►Recent Highlights from the Previous Thread: >>106119921
--Post-MoE architectures and disk-offloaded inference for scalable local LLMs:
>106121097 >106121115 >106121190 >106121262 >106121334 >106121488 >106121523 >106121556 >106121596 >106121611 >106121430 >106121560
--Horizon Alpha claimed as safest model; GLM-4.5 integration progressing in llama.cpp:
>106125259 >106125378 >106125398 >106125416
--exl3 underperforms on 3090s compared to exl2 and llama.cpp in prompt processing:
>106125299 >106125313 >106125340 >106125350 >106125355 >106125381 >106125391 >106125632
--CUDA 12.8 causes performance regression on consumer GPUs vs 12.6:
>106125806 >106125955 >106125984
--Qwen's dominance in finetuning due to practical and technical constraints on alternatives:
>106126367 >106126373 >106126382 >106126385 >106126433 >106126456 >106126463 >106126477 >106126494 >106126386
--GLM-4.5 support PR for llama.cpp out of draft; validation and context length concerns raised:
>106126450 >106126466 >106126505 >106126522 >106126498
--Practical RAG implementations for local roleplay and knowledge retrieval:
>106124883 >106124923 >106124910 >106124913 >106124924
--SSDmaxxing remains unviable due to hardware and cost constraints:
>106122317 >106122412 >106122423
--Step3 vision model runs on CPU with strong multimodal performance but lacks framework support:
>106121507 >106122468
--SanDisk's 4TB VRAM flash memory tech still in development, facing timeline and performance questions:
>106121982 >106121990 >106122001 >106122043 >106122062 >106122098
--Development of GLM-4.5 MoE support in llama.cpp progresses with multiple competing PRs:
>106122392 >106122409 >106122420
--Theoretical limits of model scaling and the nature of intelligence in optimization:
>106120115
--Miku (free space):
>106120838 >106121261 >106123615 >106126952
►Recent Highlight Posts from the Previous Thread: >>106119924
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script