Added VPERM optimization for AVX2 shuffles