[x86] Implement a faster vector population count based on the PSHUFB
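
For illustration only, here is a minimal scalar sketch of how a PSHUFB-based population count is usually structured. It assumes the common nibble-lookup approach: split each byte into two 4-bit halves, use PSHUFB as a 16-entry table lookup on each half, and add the two results. The names `NIBBLE_POPCNT`, `pshufb_model`, and `popcnt_bytes_model` are hypothetical and do not come from this change. The code models the instruction in portable C rather than reproducing the actual lowering.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit counts for every 4-bit value. In the vector version this table
   would live in a single 16-byte register. (Hypothetical name.) */
static const uint8_t NIBBLE_POPCNT[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Scalar model of PSHUFB over 16 byte lanes: each control byte selects
   a table byte by its low 4 bits, or produces 0 if its top bit is set. */
static void pshufb_model(const uint8_t table[16], const uint8_t ctrl[16],
                         uint8_t out[16]) {
    for (size_t i = 0; i < 16; ++i)
        out[i] = (ctrl[i] & 0x80) ? 0 : table[ctrl[i] & 0x0F];
}

/* Per-byte population count of a 16-byte vector, assuming the
   nibble-lookup scheme described above. */
static void popcnt_bytes_model(const uint8_t in[16], uint8_t out[16]) {
    uint8_t lo[16], hi[16], cnt_lo[16], cnt_hi[16];

    /* Split every byte into its low and high nibble. */
    for (size_t i = 0; i < 16; ++i) {
        lo[i] = in[i] & 0x0F;
        hi[i] = (uint8_t)(in[i] >> 4);
    }

    /* Two table lookups, one per nibble vector. */
    pshufb_model(NIBBLE_POPCNT, lo, cnt_lo);
    pshufb_model(NIBBLE_POPCNT, hi, cnt_hi);

    /* The byte's bit count is the sum of its two nibble counts (max 8). */
    for (size_t i = 0; i < 16; ++i)
        out[i] = (uint8_t)(cnt_lo[i] + cnt_hi[i]);
}
```

Wider element types (16, 32, or 64 bits) would then need the per-byte counts summed horizontally within each element. That step is not shown here.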