author Chandler Carruth <chandlerc@gmail.com>
Fri, 3 Oct 2014 21:38:49 +0000 (21:38 +0000)
committer Chandler Carruth <chandlerc@gmail.com>
Fri, 3 Oct 2014 21:38:49 +0000 (21:38 +0000)
commit 91ea3e41ae46348d520e9cdf8123748d01b2a46a
tree 1c46a7f4385502e0f2873ed9d35b86e2f67b7b67
parent 69ee7cb4c3a7736574587d007b8002c5aa02914e
[x86] Adjust the patterns for lowering X86vzmovl nodes which don't
perform a load to use blendps rather than movss when it is available.

For non-loads, blendps is *much* faster. It can execute on two ports in
Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes
one of the "regressions" from aggressively taking the "insertion" path
in the new vector shuffle lowering.

This does highlight one problem with blendps -- it isn't commuted as
heavily as it should be. That's future work though.
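
As a rough illustration of why the two instructions are interchangeable for
the register-to-register case, the sketch below models their lane semantics
in portable C++. It is not the TableGen pattern from this commit: the names
Vec4, movss_rr, blendps, vzmovl_via_movss and vzmovl_via_blendps are
hypothetical helpers, and the assumption is that the zero-extending move is
formed by merging the source's low lane into an all-zero vector.

```cpp
#include <array>

// Hypothetical 4 x float model of an XMM register (lane 0 is the low lane).
using Vec4 = std::array<float, 4>;

// Model of register-to-register movss: copy the low lane of src into dst,
// leaving dst's upper three lanes unchanged.
Vec4 movss_rr(const Vec4 &dst, const Vec4 &src) {
  return Vec4{src[0], dst[1], dst[2], dst[3]};
}

// Model of blendps: for each lane i, take b[i] when bit i of imm is set,
// otherwise keep a[i].
Vec4 blendps(const Vec4 &a, const Vec4 &b, unsigned imm) {
  Vec4 r{};
  for (unsigned i = 0; i < 4; ++i)
    r[i] = ((imm >> i) & 1u) ? b[i] : a[i];
  return r;
}

// Assumed shape of the lowering: low lane of v, upper lanes zeroed.
Vec4 vzmovl_via_movss(const Vec4 &v) {
  const Vec4 zero{0.0f, 0.0f, 0.0f, 0.0f};
  return movss_rr(zero, v);
}

// Same result using a blend with immediate 1 (select only lane 0 from v).
Vec4 vzmovl_via_blendps(const Vec4 &v) {
  const Vec4 zero{0.0f, 0.0f, 0.0f, 0.0f};
  return blendps(zero, v, 1u);
}
```

In this model both helpers return the same vector for any input, so the
choice between them comes down to which instruction executes on more ports,
as described above.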

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219022 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/X86/X86InstrInfo.td
lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/combine-or.ll
test/CodeGen/X86/sse41.ll
test/CodeGen/X86/vec_set-3.ll
test/CodeGen/X86/vector-shuffle-128-v4.ll
test/CodeGen/X86/vector-shuffle-256-v4.ll