Generate better code for v8i16 shuffles on SSE2