7/5/2025, 4:33:54 PM
>>105806508
For the purposes of quantization quality loss, it's not a 37B either. Modern quant schemes quantize each tensor and each expert differently, so in effect we're allocating precision according to how undertrained each expert/tensor is, which is probably made possible by inherent deficiencies in MoE architectures and training methods. Judging from the benchmarks above, a 100B would suffer far worse quality loss at the same quant level, so it really does seem to behave like a 600B, or close to it, as far as quantization quality loss is concerned.
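A minimal sketch of the general idea being described (mixed per-tensor quantization): measure how badly each tensor tolerates a low-bit round-trip, then give the most sensitive tensors more bits. This is not any particular library's implementation; the function names, the 3/4/6-bit schedule, and the reconstruction-error heuristic (a stand-in proxy for "undertrainedness") are all illustrative assumptions.

import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization round-trip for one tensor (toy scheme)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-levels, levels) * scale

def assign_bits(tensors: dict) -> dict:
    """Rank tensors by relative reconstruction error at a low baseline bit-width
    and hand out more bits to the ones that degrade the most (toy schedule)."""
    sensitivity = {
        name: np.mean((w - quantize_dequantize(w, 3)) ** 2) / (np.mean(w ** 2) + 1e-12)
        for name, w in tensors.items()
    }
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n = len(ranked)
    # Most sensitive third gets 6 bits, middle third 4, the rest stay at 3.
    return {name: (6 if i < n // 3 else 4 if i < 2 * n // 3 else 3)
            for i, name in enumerate(ranked)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical "expert" tensors: half get a few large outliers injected,
    # which inflates their low-bit reconstruction error.
    tensors = {}
    for i in range(6):
        w = rng.normal(size=(64, 64))
        if i % 2 == 0:
            w.flat[:8] += 25.0  # outliers -> harder to quantize cheaply
        tensors[f"expert_{i}.ffn_up"] = w
    print(assign_bits(tensors))

Run as-is, the outlier-laden tensors land in the high-bit bucket, which mirrors the point in the post: precision ends up distributed unevenly across experts/tensors rather than uniformly over the whole model.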