>>106381744
it's not that bad when you're running CUDA on the GPUs. you can set the batch-size limit that decides whether tokens get processed on the CPU or on the GPU. for me, any batch smaller than 512 tokens gets processed on the CPU.
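
rough sketch of how a cutoff like that behaves, in case it helps. this is not any real backend's code or flags: `GPU_MIN_BATCH`, `pick_device` and `split_into_batches` are names made up for illustration, and 512 is just the cutoff I see on my setup.

```python
# Hypothetical sketch: route each batch to CPU or GPU by its token count.
# All names here are illustrative assumptions, not a real library's API.
GPU_MIN_BATCH = 512  # assumed cutoff, taken from the behavior described above


def pick_device(n_tokens: int, gpu_min_batch: int = GPU_MIN_BATCH) -> str:
    """Return "gpu" for batches at or above the cutoff, otherwise "cpu"."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return "gpu" if n_tokens >= gpu_min_batch else "cpu"


def split_into_batches(n_prompt_tokens: int, batch_size: int) -> list[int]:
    """Split a prompt into batch sizes; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(n_prompt_tokens, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    # A 1300-token prompt at batch size 512 splits into 512, 512, 276.
    for n in split_into_batches(1300, 512):
        print(n, pick_device(n))
```

so with that cutoff the two full 512-token batches would go to the GPU, and the 276-token leftover (plus any one-token-at-a-time generation) would stay on the CPU.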