>Supposedly, because of the DynaMoE architecture, this model can actually be quanted so that only certain parts of the model run at a time. In their own words:
>this is a merged model. They took a bunch of existing big models and turned them into a MoE
I hope /ourguy/ is going to sue.