7/5/2025, 4:33:54 PM
>>105806508
For the purposes of quantization quality loss, it's not a 37B either. Modern quant schemes quantize each tensor and each expert differently, so in effect we're allocating precision according to how undertrained each expert/tensor is, which is probably made possible by inherent deficiencies in MoE architectures and training methods. Judging from the benchmarks above, a 100B would suffer far worse quality loss at the same quant level, so it really does seem to behave like a 600B, or close to it, as far as quantization quality loss is concerned.
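A minimal sketch of the general idea being described (mixed per-tensor quantization): measure how badly each tensor tolerates a low-bit round-trip, then give the most sensitive tensors more bits. This is not any particular library's implementation; the function names, the 3/4/6-bit schedule, and the reconstruction-error heuristic (a stand-in proxy for "undertrainedness") are all illustrative assumptions.

import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization round-trip for one tensor (toy scheme)."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-levels, levels) * scale

def assign_bits(tensors: dict) -> dict:
    """Rank tensors by relative reconstruction error at a low baseline bit-width
    and hand out more bits to the ones that degrade the most (toy schedule)."""
    sensitivity = {
        name: np.mean((w - quantize_dequantize(w, 3)) ** 2) / (np.mean(w ** 2) + 1e-12)
        for name, w in tensors.items()
    }
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n = len(ranked)
    # Most sensitive third gets 6 bits, middle third 4, the rest stay at 3.
    return {name: (6 if i < n // 3 else 4 if i < 2 * n // 3 else 3)
            for i, name in enumerate(ranked)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical "expert" tensors: half get a few large outliers injected,
    # which inflates their low-bit reconstruction error.
    tensors = {}
    for i in range(6):
        w = rng.normal(size=(64, 64))
        if i % 2 == 0:
            w.flat[:8] += 25.0  # outliers -> harder to quantize cheaply
        tensors[f"expert_{i}.ffn_up"] = w
    print(assign_bits(tensors))

Run as-is, the outlier-laden tensors land in the high-bit bucket, which mirrors the point in the post: precision ends up distributed unevenly across experts/tensors rather than uniformly over the whole model.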