What's the best PyTorch and CUDA version combination for speed right now? Has anyone tried going down that rabbit hole?