>>105631287
>>105631628
>>105631953
It has been obvious for a while: the Chinese don't have as many GPUs as zucc. The scaling curves for training dense models are worse and the costs are a lot higher. Zuck could do it because he was aiming for local use.
Meanwhile the whale, for 2x the 70b's compute, can get close enough to the big boys. The catch? Huge memory requirements.
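To put rough numbers on that tradeoff, a minimal sketch (assuming the usual ~6N FLOPs-per-token rule and DeepSeek-V3-ish counts of roughly 671B total / 37B active params, both ballpark assumptions): per-token compute follows the active parameters, but the memory needed to host the thing follows the total.
[code]
# Rough back-of-envelope: MoE trades memory for compute.
# Parameter counts are assumed ballpark figures (a 70B dense model vs a
# DeepSeek-V3-style MoE with ~671B total / ~37B active), illustration only.

def flops_per_token(active_params):
    # ~6*N FLOPs per training token for a transformer with N active params
    return 6 * active_params

def weight_memory_gb(total_params, bytes_per_param=2):
    # weights only at fp16/bf16; ignores KV cache, activations, optimizer state
    return total_params * bytes_per_param / 1e9

dense_total = dense_active = 70e9      # dense: every weight is used on every token
moe_total, moe_active = 671e9, 37e9    # MoE: only the routed experts are active

for name, total, active in [("dense 70B", dense_total, dense_active),
                            ("big MoE  ", moe_total, moe_active)]:
    print(f"{name}: {flops_per_token(active):.1e} FLOPs/token, "
          f"{weight_memory_gb(total):.0f} GB of weights to host")
[/code]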
If Chinese labs had as many GPUs to throw at the problem as zucc does, we might see some "for local" projects; otherwise they'll just go for MoE to maximize performance. On top of that, a 70b is unlikely to have as much trivia knowledge as a big MoE, though it might be more consistent in some ways.