
>>106492824
despite being, like qwen, benchmaxxed on STEM/code stuff, they're only slightly better than that old 8B qwen in no-think mode (and the current 2507 4B is a better model imho)
what is the point of this kind of 2.5B active param MoE?
I don't get it
