7/11/2025, 5:56:45 AM
Can anyone explain how these tensors work in a VAE?
For SD 1.5 and SDXL models:
The compressed tensor is [[1, 4, 160, 106]] and the output is [[1, 1280, 848, 3]].
I fully understand the output: batch size, height, width, color channels.
As for the compressed image: 1 is the batch size, and 160 and 106 are the compressed resolution (1280 and 848 divided by 8). But I can't make sense of the "4". (3 colors plus alpha, maybe?)
It gets even weirder with SD3, SD3.5 and Flux, which have [[1, 16, 160, 106]]. Why did it become 16 now?
The VAEs of video models like Hunyuan and WAN have [[1, 16, 1, 160, 106]]. I assume the 1 in the middle is the frame count (I am testing with images), though it's interesting that the output is still [[1, 1280, 848, 3]]. But I still don't know what the 4 and 16 are.
Also, do the "4"s in SD 1.5 and SDXL represent the same thing, considering that forcing the SD 1.5 VAE on SDXL doesn't give great results? (Pic related)
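To make the shape arithmetic above concrete, here is a rough Python sketch of it. The function name latent_shape and its parameters are made up for illustration. The downscale factor of 8 and the channel counts (4 and 16) are just the values observed in the shapes above, passed in as inputs rather than explained.

```python
def latent_shape(output_shape, channels, downscale=8, frames=None):
    """Expected compressed shape for an output of (batch, height, width, color).

    channels is the second dimension seen in the compressed tensor (4 or 16
    in the shapes quoted above); frames is only set for the video VAEs.
    """
    batch, height, width, _color = output_shape
    spatial = (height // downscale, width // downscale)
    if frames is None:
        return (batch, channels, *spatial)
    return (batch, channels, frames, *spatial)


output = (1, 1280, 848, 3)
print(latent_shape(output, channels=4))             # SD 1.5 / SDXL: (1, 4, 160, 106)
print(latent_shape(output, channels=16))            # SD3 / SD3.5 / Flux: (1, 16, 160, 106)
print(latent_shape(output, channels=16, frames=1))  # Hunyuan / WAN: (1, 16, 1, 160, 106)
```

This only reproduces the numbers; it says nothing about what the values stored in those 4 or 16 channels mean.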