Anonymous
8/21/2025, 10:54:55 PM
No.106339908
>>106339878
Supposedly, because of the DynaMoE architecture, this model can be quanted and run with only certain parts of the model loaded at a time. In their own words:
> Run a (very) small part of the model with 64GB of RAM/VRAM (when quantized - quants coming soon), or the whole thing with 1TB. It’s that scalable.
https://huggingface.co/posts/ccocks-deca/499605656909204
Downside is that the devs literally haven't implemented support for their own model in vLLM or Transformers. Guess that's just a massive fuck you, not even just to the poors, but to everybody.
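For anyone wondering what "run only a small part of the model" would even mean in practice, here's a rough sketch of partial loading, i.e. pulling just the shared weights plus a handful of experts out of a safetensors shard. This is NOT the devs' loader, and the tensor naming pattern is a guess, since nothing is implemented in Transformers or vLLM to check against.

# Minimal sketch of selective expert loading. The ".experts.N." key pattern,
# the expert indices, and the shard filename are all assumptions, not the
# actual DynaMoE layout.
import re
import torch
from safetensors import safe_open

WANTED_EXPERTS = {0, 1, 2, 3}                    # hypothetical: keep only these experts
EXPERT_KEY = re.compile(r"\.experts\.(\d+)\.")   # assumed naming convention

def load_partial(shard_path: str) -> dict:
    tensors = {}
    with safe_open(shard_path, framework="pt", device="cpu") as f:
        for name in f.keys():
            m = EXPERT_KEY.search(name)
            # keep all shared/router weights; keep expert weights only if selected
            if m is None or int(m.group(1)) in WANTED_EXPERTS:
                tensors[name] = f.get_tensor(name)
    return tensors

# e.g. partial = load_partial("model-00001-of-000xx.safetensors")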