I finally got a qwen-image LoRA training session going. Stupid me overlooked the note about not being able to use the Comfy version of the VAE; once I switched to the original, it worked.
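In case anyone else trips on the same thing, this is roughly what the relevant model lines look like now. A sketch only, assuming a diffusion-pipe-style TOML config; the path and the `type` value here are illustrative, not copied from my actual file:

```toml
[model]
type = 'qwen_image'
# Must point at the original Qwen-Image release, not the ComfyUI repack of the VAE.
diffusers_path = '/models/Qwen-Image'  # hypothetical path
```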
I'm able to set `block_to_swap = 0` and comment out `transformer_dtype = 'float8'`, but then it takes almost 46GB of VRAM.
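For reference, the two memory knobs from the same config. Only the two key/value pairs quoted above come from my run; the comments are my understanding of the trade-off, not documentation:

```toml
# 0 = keep every transformer block on the GPU; raising this swaps blocks
# out to system RAM to save VRAM at the cost of speed.
block_to_swap = 0

# Leaving this line active loads the transformer in fp8 and saves a lot of
# VRAM; with it commented out the transformer runs at full precision,
# which is what pushed usage to ~46GB for me.
# transformer_dtype = 'float8'
```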