
/g/ - /lmg/ - Local Models General
Anonymous No.106363258
>>106363201
The smaller the quant, the more degradation, generally.
Basically, since you're losing numerical precision in the numbers used for the calculations, each "internal nudge" towards the final output ends up that little bit different ("inaccurate") compared to full precision, and those small errors add up across the layers.
Something like that.
How noticeable the degradation is, or how much it matters, will depend on a lot.
The heuristic is: use the largest bpw (correlated with file size) that you can run at speeds you're comfortable with, at the context size you need.
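If you want to see the "each nudge is a bit off" thing concretely, here's a toy sketch (plain NumPy, nothing to do with llama.cpp's actual quant formats): round-to-nearest quantize the weights of a small stack of layers at a few bit widths and measure how far the output drifts from the full-precision path.

# Toy demo, not real inference code: compare a full-precision forward pass
# against one where every weight matrix is round-to-nearest quantized.
import numpy as np

def quantize(w, bits):
    # symmetric round-to-nearest onto integers in [-(2^(bits-1)-1), 2^(bits-1)-1]
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
layers = [rng.normal(0, 0.02, (256, 256)) for _ in range(24)]
x = rng.normal(0, 1, 256)

for bits in (8, 6, 4, 3, 2):
    ref, out = x.copy(), x.copy()
    for w in layers:
        ref = np.tanh(ref @ w)                  # full precision path
        out = np.tanh(out @ quantize(w, bits))  # quantized path
    drift = np.linalg.norm(ref - out) / np.linalg.norm(ref)
    print(f"{bits} bits -> relative output drift {drift:.3f}")

The drift number grows as the bits drop, and it grows faster the deeper the stack gets, which is basically the degradation you notice at low quants.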
/g/ - /lmg/ - Local Models General
Anonymous No.105808470
>>105806508
For the purposes of understanding quantization quality loss, it's not a 37B either. Since modern quants quantize each tensor and expert differently, we're essentially allotting bits according to how undertrained each expert/tensor is, which is (probably) made possible by inherent deficiencies in MoE architectures and training methods. From the benchmarks above, a 100B would suffer way more quality loss, so it really does seem like it's effectively a 600B, or close to it, for the purposes of quantization quality loss.
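To make the per-tensor/per-expert bit allocation idea concrete, here's a rough sketch (the sensitivity scores and the greedy loop are made up for illustration, not how llama.cpp/imatrix actually picks quant types): tensors that hurt the output more when squeezed get more bits, and the rarely-hit/undertrained experts absorb the low-bpw budget.

# Hypothetical sketch: assign per-tensor bit widths under an average-bpw budget.
import numpy as np

def assign_bits(sensitivity, budget_bpw, choices=(2, 3, 4, 5, 6, 8)):
    # Greedy: start every tensor at the lowest width, then spend the remaining
    # budget on the most sensitive tensors first.
    n = len(sensitivity)
    bits = np.full(n, choices[0], dtype=float)
    order = np.argsort(-np.asarray(sensitivity))  # most sensitive first
    for idx in order:
        for b in choices:
            trial = bits.copy()
            trial[idx] = b
            if trial.mean() <= budget_bpw:
                bits[idx] = b
    return bits

# e.g. 8 experts: a couple get hit constantly (sensitive), the rest are undertrained
sens = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05]
print(assign_bits(sens, budget_bpw=4.0))

With these made-up numbers the two heavily used experts land at 8 bits, the next one at 6, and the rest sit at 2, while the average still comes out at the 4 bpw budget.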