Improve 256-bit shuffle splitting to allow 2 sources in each 128-bit lane. As long...
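
As a rough illustration of the "2 sources in each 128-bit lane" idea, here is a minimal standalone sketch. It is not LLVM code. It assumes the usual two-input shuffle-mask convention (indices 0..N-1 pick from the first input, N..2N-1 from the second, -1 is undefined), and it assumes that "source" means a 128-bit lane of either input. The helper name `eachHalfUsesAtMostTwoLanes` is hypothetical, and the precise condition applied by the change may differ from this check.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not part of LLVM): returns true when each 128-bit
// half of a 256-bit shuffle result reads from at most two distinct 128-bit
// input lanes. `mask` holds one entry per result element; -1 means undefined.
bool eachHalfUsesAtMostTwoLanes(const std::vector<int> &mask) {
  const std::size_t n = mask.size();
  assert(n >= 2 && n % 2 == 0 && "expected an even number of elements");
  const std::size_t half = n / 2; // elements per 128-bit lane

  for (std::size_t lane = 0; lane < 2; ++lane) {
    // Lane ids: 0 and 1 are the lanes of input A, 2 and 3 those of input B.
    std::set<int> sources;
    for (std::size_t i = 0; i < half; ++i) {
      const int m = mask[lane * half + i];
      if (m < 0)
        continue; // undefined element, places no constraint
      assert(static_cast<std::size_t>(m) < 2 * n && "mask index out of range");
      sources.insert(m / static_cast<int>(half));
    }
    if (sources.size() > 2)
      return false; // this half would need three or more source lanes
  }
  return true;
}
```

For an 8-element (v8i32 or v8f32) shuffle, the mask {0, 1, 8, 9, 4, 5, 12, 13} passes this check, since each half mixes one lane of A with one lane of B, while {0, 4, 8, 12, 1, 5, 9, 13} does not, since each half touches all four input lanes.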