Use movaps instead of movups to spill 16-byte vector values when default alignment...