X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available.
author     Benjamin Kramer <benny.kra@googlemail.com>
           Sat, 22 Dec 2012 16:07:56 +0000 (16:07 +0000)
committer  Benjamin Kramer <benny.kra@googlemail.com>
           Sat, 22 Dec 2012 16:07:56 +0000 (16:07 +0000)
commit     2f8a6cdfa3bc0bfa4532da89e574666c5251cdb5
tree       7b8d1f46fdf06a86b5ac8ed24ebcc10a3dede709
parent     17347912b46213658074416133396caffd034e0c

pmuludq is slow, but it turns out that all the unpacking and packing of the
scalarized mul is even slower. 10% speedup on loop-vectorized paq8p.
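The message states the trade-off but not how a 32-bit lane multiply is built from PMULUDQ. The following is a minimal, portable C++ model of one standard way to do it on SSE2. It is an illustration, not the code in X86ISelLowering.cpp. `V4i32`, `pmuludq`, `oddLanesToEven` and `mulV4i32ViaPmuludq` are hypothetical names that model registers and instructions as plain arrays. PMULUDQ multiplies only the unsigned 32-bit lanes 0 and 2 and produces two 64-bit products. A full `mul <4 x i32>` therefore needs two of them: one on the even lanes, and one after shuffling the odd lanes into even positions (PSHUFD with immediate 0xF5 is one way to do that shuffle).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of one 128-bit SSE2 register as four 32-bit lanes.
using V4i32 = std::array<uint32_t, 4>;

// Models PMULUDQ: unsigned multiply of lanes 0 and 2 of each operand,
// yielding two full 64-bit products.
std::array<uint64_t, 2> pmuludq(const V4i32& a, const V4i32& b) {
  return {uint64_t(a[0]) * b[0], uint64_t(a[2]) * b[2]};
}

// Models PSHUFD with immediate 0xF5: lanes become {1, 1, 3, 3}, which moves
// the odd lanes into the even positions that PMULUDQ reads.
V4i32 oddLanesToEven(const V4i32& v) {
  return {v[1], v[1], v[3], v[3]};
}

// Lane-wise 32-bit multiply built from two PMULUDQs (sketch, not LLVM's code).
V4i32 mulV4i32ViaPmuludq(const V4i32& a, const V4i32& b) {
  auto evens = pmuludq(a, b);                                 // a0*b0, a2*b2
  auto odds = pmuludq(oddLanesToEven(a), oddLanesToEven(b));  // a1*b1, a3*b3
  // Keep the low 32 bits of each 64-bit product and interleave them back
  // into lane order (the role an unpack/shuffle plays on real hardware).
  return {uint32_t(evens[0]), uint32_t(odds[0]),
          uint32_t(evens[1]), uint32_t(odds[1])};
}
```

In two's complement, the low 32 bits of a product are the same for signed and unsigned operands. That is why the unsigned PMULUDQ is sufficient for `mul <4 x i32>` regardless of signedness.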

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170985 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/sse2-mul.ll [new file with mode: 0644]