Anonymous
8/20/2025, 1:17:24 AM
No.103674078
>>103670986
That makes sense when you can offload 'everything' to VRAM, but how does it work for these MoE models where you can't? Does it selectively offload the active parameters or what? I assume the effect isn't linear and there's a certain number of layers that you have to offload before you get a 'good' speed increase.
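To show what I mean by "not linear", here's a toy sketch, not a measurement. It assumes per-token time is just the sum of per-layer times, that every layer costs the same, and that a GPU layer is 10x faster than a CPU layer. The function name, the layer count, and the millisecond figures are all made up for illustration, and it ignores transfer overhead and MoE expert routing entirely.

```python
def tokens_per_second(n_gpu_layers, n_layers=48,
                      cpu_ms_per_layer=4.0, gpu_ms_per_layer=0.4):
    """Toy model: per-token latency is the sum of per-layer times.

    All defaults are hypothetical numbers, not benchmarks.
    """
    if not 0 <= n_gpu_layers <= n_layers:
        raise ValueError("n_gpu_layers must be between 0 and n_layers")
    cpu_layers = n_layers - n_gpu_layers
    # Slow CPU layers dominate the total until almost none are left.
    ms_per_token = (cpu_layers * cpu_ms_per_layer
                    + n_gpu_layers * gpu_ms_per_layer)
    return 1000.0 / ms_per_token


if __name__ == "__main__":
    # Print the modeled speed at a few offload levels.
    for n in (0, 12, 24, 36, 48):
        print(f"{n:2d} layers on GPU: {tokens_per_second(n):.2f} tok/s")
```

Under those assumptions the curve is hyperbolic: most of the gain only shows up once the last few CPU layers are gone, because whatever is left on the CPU dominates the time per token.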