More refactoring of basic SSE arithmetic instructions. Make room for 256-bit instructions.