8/9/2025, 4:55:52 PM
►Recent Highlights from the Previous Thread: >>106195686
--Performance comparison between exllama and llama.cpp for parallel and multi-user inference:
>106199569 >106199579 >106199594 >106199638 >106199669 >106199726 >106199737 >106199763 >106199783 >106199790 >106199791 >106199675 >106199693 >106199729 >106199780 >106199785
--Struggling with GLM-4.5-Air roleplay quality despite template fixes:
>106196301 >106196331 >106196398
--PyTorch 2.8.0 and 2.9.0-dev slower than 2.7.1 on CUDA 12.8:
>106195727
--High-speed local inference on 4090 with 192GB RAM and GGUF model:
>106196335
--ik_llama.cpp shows poor prompt processing despite MoE optimizations on GLM 4.5 Air:
>106196063 >106196144 >106196670
--Real-time OCR translation for Japanese doujins using VLM tools:
>106198427 >106198438 >106198486 >106198709 >106199058
--Poor performance of unsloth's UD Q2_K_XL compared to Bart's quants in coherence and memory retention:
>106197204 >106197260
--High VRAM demands make non-quantized models impractical without cpumaxx or distributed workarounds:
>106199691 >106199751 >106199771 >106199806
--The -amb flag does affect VRAM usage, especially for large models like DeepSeek:
>106197040 >106197069 >106197142
--Advanced prompting techniques to bypass Gemma-4B restrictions:
>106197654 >106197666 >106198920 >106199050
--LLM file distribution challenges and optimization tips in a torrenting context:
>106196903 >106196968 >106196977 >106196981
--Comparison of LLM inference backends and their trade-offs in support, speed, and quantization:
>106199642
--Prefill inefficiency wastes tokens; split prompting as workaround:
>106195800 >106195827 >106195887
--VLM1 is a SOTA open-source vision model based on DeepseekV3:
>106196750
--GLM 4.5 Air q3_K_XL inference logs with user error noted:
>106195786
--Miku (free space):
>106195786 >106196681 >106200382 >106201057
►Recent Highlight Posts from the Previous Thread: >>106195692
Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
8/9/2025, 3:33:26 PM
>>106200991
>You could make and sell the perfect husbandobot and still have it refuse 80% of the shit anons in this thread are into
>Tfw into relatively vanilla maledom and bdsm that coincides with a happy marriage and kids.
I'm untouchable.