>>106339905
Might be interesting. It's MoE, so maybe it has some tricks up its sleeve for inference. Will be interesting to see if we can run a 1TB MoE at any kind of usable speed. If it turns out to run at 0.01 tokens/second I'll hate this dev though.
https://huggingface.co/posts/ccocks-deca/499605656909204