[AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
author Sanjay Patel <spatel@rotateright.com>
Thu, 2 Apr 2015 20:21:52 +0000 (20:21 +0000)
committer Sanjay Patel <spatel@rotateright.com>
Thu, 2 Apr 2015 20:21:52 +0000 (20:21 +0000)
commit 5b93ab6cde250b3c6470cf49daa28e54848a86c5
tree c56e6b347ad0acb0fd5c6a1e49e079f6e5c47fff
parent 19443c1bcb863ba186abfe0bda3a1603488d17f7
[AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector

Without this patch, we split the 256-bit vector into halves and produced something like:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0
        vxorps  %xmm1, %xmm1, %xmm1
        vblendps $15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]

Now, we eliminate the xor and blend because those zeros come for free with the vmovd, which already zeroes every element above the one it writes:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0
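
For reference, a minimal C sketch of the kind of source pattern this lowering
targets (the function name and the use of intrinsics are illustrative only,
not taken from the patch or its tests; compile with -mavx):

    #include <immintrin.h>
    #include <stdint.h>

    /* Load an i16 and insert it into element 0 of a zeroed 256-bit vector.
       With this patch the backend can emit just movzwl + vmovd, since vmovd
       already zeroes the rest of the register, instead of a separate
       vxorps + vblendps pair. */
    __m256i insert_i16_into_zero_v16i16(const uint16_t *p) {
        return _mm256_insert_epi16(_mm256_setzero_si256(), *p, 0);
    }

The same reasoning applies to the i8 case mentioned in the subject line.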

This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233941 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/vector-shuffle-256-v16.ll
test/CodeGen/X86/vector-shuffle-256-v32.ll