cont of cont
i think it's because of the opacity of the type, now that i gave it a think
an __m256 is not guaranteed to be equivalent to two __m128s
the compiler can't make that assumption
so it emits an instruction
this should be a macro that goes through a union
if the __m256 is 32-byte aligned then its halves are 16-byte aligned, kinda by definition
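rough sketch of the union idea in plain C11, so it builds without AVX. `vec128`, `vec256`, `is_aligned` and `high_lane0` are made-up stand-ins for illustration, not the real `__m128`/`__m256` types or any intrinsic. it only shows the layout and alignment argument, and says nothing about what a given compiler actually emits for the real vector types.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-byte vector of 4 floats. */
typedef struct { alignas(16) float f[4]; } vec128;

/* Hypothetical stand-in for a 32-byte vector, viewed either as
   8 floats or as two 16-byte halves sharing the same storage. */
typedef union {
    alignas(32) float f[8];
    vec128 half[2]; /* half[0] = low 16 bytes, half[1] = high 16 bytes */
} vec256;

static_assert(sizeof(vec128) == 16, "vec128 must be 16 bytes");
static_assert(sizeof(vec256) == 32, "vec256 must be 32 bytes");
static_assert(alignof(vec256) == 32, "vec256 must be 32-byte aligned");

/* Returns 1 if p is aligned to n bytes (n must be a power of two). */
static int is_aligned(const void *p, size_t n) {
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Reads the first float of the high half through the union view. */
static float high_lane0(const vec256 *v) {
    return v->half[1].f[0];
}
```

the high half sits at offset 16 from a 32-byte aligned base, so its address is a multiple of 16, which is the "by definition" part. reading through a union like this is fine in C; in C++ it would be a different story.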