>>106492824
despite being, like qwen, benchmaxxed on stem/code stuff, they're only slightly better than that old 8B qwen in nothink mode (and the current 2507 4b is a better model imho)
what is the point of this kind of 2.5b active param moe
I don't get it