>>106316852
You're ignoring that this is for MoEs on purpose. The most crucial layers will be on the GPU, and even at 'shit' DDR5 speeds it's going to be plenty fast for the less frequently hit expert layers. I'm guessing GLM full will do several tokens a second at least, maybe more if the draft layers get implemented. Not to mention this gives enough VRAM for a dense 70B, or even pushing a dense 100B. Bang for buck is good too: 300 for some extra RAM is cheap compared to GPU stacking. So why not, as I see it.
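
If anyone wants to eyeball it before buying, here's the napkin math behind the "several tokens a second" guess. Every number below is an assumption (active param count, quant size, bandwidth, how much ends up in VRAM), not a measurement, so treat it as a ballpark sketch:

# napkin math for MoE decode speed with experts sitting in system RAM
# every number is a guess / assumption, not a benchmark
active_params   = 32e9   # GLM full is roughly 32B active params per token
bytes_per_param = 0.55   # ~Q4-ish GGUF, a bit over half a byte per weight
ram_bandwidth   = 90e9   # dual-channel DDR5-6000, ~90 GB/s realistic
vram_fraction   = 0.30   # guess at the share of active weights already on the GPU

ram_bytes_per_token = active_params * bytes_per_param * (1 - vram_fraction)
tok_per_s = ram_bandwidth / ram_bytes_per_token
print(f"best-case decode: ~{tok_per_s:.1f} t/s")   # lands around 7 t/s upper bound

And the GPU/RAM split itself is already doable today, iirc llama.cpp's -ot/--override-tensor lets you pin just the expert tensors to CPU while -ngl keeps the attention/shared layers on the card.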

I'm gonna wait and see in any case, let some of the more insane people do it first. I feel like Intel will be priced higher at release for a bit, and there will be a bit of a shitstorm here over it next week.