>>107129880
It's not even clear fp16 weights exist anywhere for the thinking model. It's perfectly possible all the RL happened at int4. Who knows though, because this fucking industry has made the term "training" entirely fucking meaningless.
>Quantization-Aware Training (QAT) during the post-training phase
Blah.
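For reference, QAT during post-training usually means something like the sketch below: the forward pass sees weights snapped to the int4 grid, while a higher-precision master copy still takes the gradient updates (straight-through estimator). The bit width, symmetric per-tensor scaling, and the QATLinear name are purely illustrative, not from any model card, and whether that master copy ever gets kept or released is exactly the part nobody tells you.
[code]
import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for int4
    scale = w.abs().max().clamp(min=1e-8) / qmax    # illustrative per-tensor scale
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward uses the int4-snapped values; backward treats the rounding as
    # identity, so gradients keep updating the full-precision master weights.
    return w + (w_q - w).detach()

class QATLinear(torch.nn.Linear):
    """Linear layer whose weights are fake-quantized on every forward pass."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quantize(self.weight), self.bias)
[/code]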