>>106390642
apparently it's a real model, I had never read the DS report before and didn't know they had unreleased models like these
https://arxiv.org/html/2412.19437v2
"At the small scale, we train a baseline MoE model comprising 15.7B total parameters on 1.33T tokens"
the shitotron report is referring to this model's benchmarks from ds3's TR when they talk about a "deepseek v3 small"