>>106372102
I messed up and loaded part of it onto my 4090.
The GPUs are only at 50% utilization during inference, so it does seem like a memory bottleneck.