[X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.

This patch disables the target-specific combine on X86ISD::INSERTPS DAG nodes
if the optimization level is CodeGenOpt::None.

The backend currently implements a target-specific combine rule that converts
a vector load used by an INSERTPS DAG node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus an INSERTPSrr).
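
To make the combine concrete, here is a minimal C model of the two INSERTPS
forms, written from the instruction's documented imm8 layout (bits 7:6 CountS,
bits 5:4 CountD, bits 3:0 ZMask). It is a sketch, not LLVM code: all function
names are hypothetical. It shows why the rewrite relies on the memory form:
INSERTPSrm reads a single float from memory and ignores CountS, so the combine
points the scalar load at element CountS of the vector (byte offset 4 * CountS).

```c
#include <assert.h>
#include <stdint.h>

/* imm8 fields of INSERTPS. */
static unsigned count_s(uint8_t imm) { return (imm >> 6) & 3u; } /* source lane      */
static unsigned count_d(uint8_t imm) { return (imm >> 4) & 3u; } /* destination lane */

/* ZMask (bits 3:0): zero every destination lane whose bit is set. */
static void apply_zmask(float dst[4], uint8_t imm) {
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i))
            dst[i] = 0.0f;
}

/* Model of INSERTPSrr: the source lane is selected by CountS. */
void insertps_rr(float dst[4], const float src[4], uint8_t imm) {
    dst[count_d(imm)] = src[count_s(imm)];
    apply_zmask(dst, imm);
}

/* Model of INSERTPSrm: one float is read from memory; CountS is ignored. */
void insertps_rm(float dst[4], const float *mem, uint8_t imm) {
    dst[count_d(imm)] = *mem;
    apply_zmask(dst, imm);
}

/* Hypothetical model of the combine: narrow the vector load to the scalar at
   lane CountS and use the memory form instead of vector load + INSERTPSrr. */
void insertps_combined(float dst[4], const float vec[4], uint8_t imm) {
    insertps_rm(dst, &vec[count_s(imm)], imm);
}

/* Asserts that both lowerings produce identical lanes for the same inputs. */
void check_combine(const float a[4], const float vec[4], uint8_t imm) {
    float r1[4], r2[4];
    for (unsigned i = 0; i < 4; ++i)
        r1[i] = r2[i] = a[i];
    insertps_rr(r1, vec, imm);
    insertps_combined(r2, vec, imm);
    for (unsigned i = 0; i < 4; ++i)
        assert(r1[i] == r2[i]);
}
```

For example, with imm = 64 (CountS = 1, CountD = 0, ZMask = 0) both paths write
element 1 of the vector into element 0 of the destination. The equivalence
only holds because the combined path is expected to become INSERTPSrm.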

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.
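
The interaction described above can be summarized as a small C decision
model. It is a sketch under stated assumptions, not LLVM code: `opt_level`,
`lowering`, `lower_insertps` and `is_miscompile` are hypothetical names, and
the model assumes CountS (bits 7:6 of the INSERTPS immediate) is non-zero; for
CountS == 0 the unfolded scalar form happens to be correct. The `guarded` flag
switches between the old behaviour (combine always runs) and the patched one
(combine skipped when the optimization level is CodeGenOpt::None).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not LLVM types. */
typedef enum { OPT_NONE, OPT_DEFAULT } opt_level;

typedef enum {
    VLOAD_INSERTPSrr,  /* full vector load + INSERTPSrr: correct               */
    INSERTPSrm_FOLDED, /* scalar load folded into INSERTPSrm: CountS ignored   */
    SLOAD_INSERTPSrr   /* scalar load + INSERTPSrr: CountS reads a zeroed lane */
} lowering;

/* ISel only folds a load into a memory operand when optimizing. */
static bool can_fold_load(opt_level opt) { return opt != OPT_NONE; }

/* Lowering of an INSERTPS whose second operand is a vector load.
   guarded == false models the old code, guarded == true the patched code. */
lowering lower_insertps(opt_level opt, bool guarded) {
    assert(opt == OPT_NONE || opt == OPT_DEFAULT);
    bool run_combine = !guarded || opt != OPT_NONE;
    if (!run_combine)
        return VLOAD_INSERTPSrr;  /* the vector load is left untouched */
    /* The combine has already narrowed the load to a scalar load. */
    return can_fold_load(opt) ? INSERTPSrm_FOLDED : SLOAD_INSERTPSrr;
}

/* With CountS != 0, only the unfolded scalar form yields a wrong result. */
bool is_miscompile(lowering l) { return l == SLOAD_INSERTPSrr; }
```

The model makes the invariant explicit: the combine is only sound when the
load it produces is guaranteed to be folded, which is never the case at -O0.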

In the example below:

//
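#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_insert_ps */

__m128 test(__m128 a, __m128 *b) {
  /* imm8 = 1 << 6 = 64: CountS = 1, CountD = 0, ZMask = 0 */
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}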
//

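Before this patch, at -O0, the backend would have canonicalized the load from
'b' into a scalar load plus scalar_to_vector. Later on, ISel would have selected
an INSERTPSrr, leaving the insertps mask in an inconsistent state: CountS (bits
7:6 of the immediate) still selects element 1 of %xmm1, but the scalar load has
already read that element (offset 4) into element 0 and MOVSS zeroes the other
elements, so element 0 of the result is 0.0 instead of element 1 of '*b':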

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the combine is skipped at -O0, so the vector load is no longer
narrowed to a scalar load and INSERTPSrr reads the intended element. The new
codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].
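
The two -O0 listings can be checked with a lane-level C model. This is a
hypothetical sketch: `movss_load`, `movaps_load`, `insertps_rr` and the
sequence functions are illustrative helpers that follow the documented
instruction semantics (MOVSS from memory zeroes elements 1-3; register-form
INSERTPS reads element CountS of its source), and the array `b` stands for
the memory addressed by %rdi.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* movss mem, xmm: load one float into lane 0 and zero lanes 1-3. */
static void movss_load(float xmm[4], const float *mem) {
    xmm[0] = *mem;
    xmm[1] = xmm[2] = xmm[3] = 0.0f;
}

/* movaps mem, xmm: load all four lanes. */
static void movaps_load(float xmm[4], const float mem[4]) {
    memcpy(xmm, mem, 4 * sizeof(float));
}

/* Register-form insertps: dst[CountD] = src[CountS], then apply ZMask. */
static void insertps_rr(float dst[4], const float src[4], uint8_t imm) {
    dst[(imm >> 4) & 3u] = src[(imm >> 6) & 3u];
    for (unsigned i = 0; i < 4; ++i)
        if (imm & (1u << i))
            dst[i] = 0.0f;
}

/* Old -O0 code: movss 4(%rdi), %xmm1 ; insertps $64, %xmm1, %xmm0 */
void old_O0_sequence(float xmm0[4], const float b[4]) {
    float xmm1[4];
    movss_load(xmm1, &b[1]);      /* 4(%rdi) is element 1 of *b */
    insertps_rr(xmm0, xmm1, 64);  /* reads xmm1[1], which movss zeroed */
}

/* New -O0 code: movaps (%rdi), %xmm1 ; insertps $64, %xmm1, %xmm0 */
void new_O0_sequence(float xmm0[4], const float b[4]) {
    float xmm1[4];
    movaps_load(xmm1, b);
    insertps_rr(xmm0, xmm1, 64);  /* reads xmm1[1] == b[1] */
}

/* Runs both sequences on the same inputs and checks lane 0 of each result. */
void compare_O0_sequences(const float a[4], const float b[4]) {
    float old_r[4], new_r[4];
    memcpy(old_r, a, sizeof old_r);
    memcpy(new_r, a, sizeof new_r);
    old_O0_sequence(old_r, b);
    new_O0_sequence(new_r, b);
    assert(new_r[0] == b[1]);  /* correct: element 1 of *b */
    assert(old_r[0] == 0.0f);  /* wrong: the zeroed lane of xmm1 */
}
```

Lanes 1-3 of %xmm0 are the same in both sequences; only lane 0 differs.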

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226277 91177308-0d34-0410-b5e6-96231b3b80d8

author	Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>
	Fri, 16 Jan 2015 14:55:26 +0000 (14:55 +0000)
committer	Andrea Di Biagio <Andrea_DiBiagio@sn.scee.net>
	Fri, 16 Jan 2015 14:55:26 +0000 (14:55 +0000)
commit	ac7b9c828fba1c7676102d4aac43ec1b1ce97c25
tree	3c840ff0feb2f1f097b8c9ee3ede2da41a094cca
parent	ca2812cfc6130fb5c8672fd8ff6256433f497aa3

lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/insertps-O0-bug.ll	[new file with mode: 0644]