>>106013245
It is a Windows issue in the sense that kernel launch overhead is much higher on Windows than on Linux.
So whether or not CUDA graphs work correctly has a bigger impact on end-to-end performance there.
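
Rough back-of-the-envelope version of that argument, as a sketch only. It assumes the workload is launch-bound (many short kernels, so CPU launch overhead is not hidden behind GPU work) and that a working CUDA graph replays the whole step with a single launch. The overhead values and kernel count are made-up placeholders, not measurements, and `step_time_us` is just a hypothetical helper name.

```python
def step_time_us(n_kernels, gpu_time_us, launch_overhead_us, use_graph):
    """Toy cost model: total GPU time plus per-launch CPU overhead.

    Assumption: with a working CUDA graph the whole step costs one launch;
    without it, every kernel pays the launch overhead separately.
    """
    launches = 1 if use_graph else n_kernels
    return gpu_time_us + launches * launch_overhead_us


# Placeholder numbers for illustration only (NOT measured values).
N_KERNELS = 1000          # kernels per inference step
GPU_TIME_US = 2000        # actual GPU compute per step
LINUX_OVERHEAD_US = 5     # hypothetical per-launch overhead
WINDOWS_OVERHEAD_US = 20  # hypothetical, assumed higher than Linux

# Extra time per step when graphs do NOT work, for each OS.
penalty_linux = (
    step_time_us(N_KERNELS, GPU_TIME_US, LINUX_OVERHEAD_US, use_graph=False)
    - step_time_us(N_KERNELS, GPU_TIME_US, LINUX_OVERHEAD_US, use_graph=True)
)
penalty_windows = (
    step_time_us(N_KERNELS, GPU_TIME_US, WINDOWS_OVERHEAD_US, use_graph=False)
    - step_time_us(N_KERNELS, GPU_TIME_US, WINDOWS_OVERHEAD_US, use_graph=True)
)

print(f"penalty without graphs, Linux:   {penalty_linux} us/step")
print(f"penalty without graphs, Windows: {penalty_windows} us/step")
```

In this model the penalty for broken graphs scales linearly with the per-launch overhead, which is why the same bug would hurt more on the OS with the higher overhead.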