combine consecutive subvector 16-byte loads into one 32-byte load
author Sanjay Patel <spatel@rotateright.com>
Tue, 16 Dec 2014 16:30:01 +0000 (16:30 +0000)
committer Sanjay Patel <spatel@rotateright.com>
Tue, 16 Dec 2014 16:30:01 +0000 (16:30 +0000)
commit 8fe9488a40dd2f569549a0c395b8559e84367ee6
tree 7c906d1432fdba53c165f216e889fc241724f442
parent d69e4e2945cdb1eb28d578545fd33e260c15dc2c

This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ).
When two consecutive 16-byte loads are merged into one 32-byte vector,
we can use a single 32-byte load instead.
We don't do this for SandyBridge / IvyBridge because 32-byte memops are slower on those chips.
We also don't use 32-byte *integer* loads on a machine that only has AVX1 (btver2):
there is no support for 32-byte integer math ops there, so those operands would have to be
split in half anyway.
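
The conditions above can be summarized as a small decision function. This is only an
illustrative sketch in C++: the names CpuFeatures, ElemKind and useSingle32ByteLoad are
hypothetical and do not appear in the LLVM sources. The actual change is expressed as
patterns in the .td files listed below.

```cpp
// Hypothetical model of the heuristic described in this commit message.
// None of these identifiers come from LLVM; they exist only for illustration.
enum class ElemKind { FloatingPoint, Integer };

struct CpuFeatures {
  bool hasAVX;            // 32-byte (256-bit) floating-point vector ops
  bool hasAVX2;           // 32-byte (256-bit) integer vector ops
  bool slow32ByteMemOps;  // true for SandyBridge / IvyBridge per the text above
};

// Returns true when two consecutive 16-byte loads that form one 32-byte
// vector should be emitted as a single 32-byte load.
constexpr bool useSingle32ByteLoad(CpuFeatures cpu, ElemKind kind) {
  return cpu.hasAVX &&             // need 32-byte vector registers at all
         !cpu.slow32ByteMemOps &&  // skip chips with slow 32-byte memops
         (kind == ElemKind::FloatingPoint ||
          cpu.hasAVX2);            // integer data is only worth it with AVX2
}
```

Under this model, an AVX1-only machine with fast 32-byte memops (the btver2 case) would
merge floating-point loads but keep integer loads split.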

Differential Revision: http://reviews.llvm.org/D6492

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224344 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/X86/X86InstrInfo.td
lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/unaligned-32-byte-memops.ll