Programming Languages Research Group: Git

author	Mehdi Amini <mehdi.amini@apple.com>
	Sat, 17 Jan 2015 01:35:56 +0000 (01:35 +0000)
committer	Mehdi Amini <mehdi.amini@apple.com>
	Sat, 17 Jan 2015 01:35:56 +0000 (01:35 +0000)
commit	5eed637b34df7a601b8231c6373d4b8237317fd8
tree	7d422492fed17f4b2590f9b51fdd19c75ccfdac0
parent	f2a51a78f59cff657805a3b0c6dc3efd78c67bf2

Improve DAG combine pass on certain IR vector patterns

Loading two 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode for certain IR patterns.

In particular, the IR optimizer folded the pair 2f32 + 2f32 -> 4f32 and
4f32 + 4f32 (undef) -> 8f32 into a single 2f32 + 2f32 -> 8f32 shuffle. That
form seems more canonical, but the backend then generated much worse code for
it, because the movq/movhpd combination no longer matched.
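The fold described above can be modeled with plain integer shuffle masks. The sketch below is a simplified model, not LLVM's InstCombine code: `Mask` and `composeShuffleMasks` are hypothetical names, and -1 stands for an undef lane. It folds an outer shuffle, whose first operand is an inner shuffle of (a, b) and whose second operand is undef, into one shuffle of (a, b).

```cpp
#include <cassert>
#include <vector>

// Simplified model of shufflevector masks: each entry indexes the
// concatenation of the two operands, and -1 marks an undef lane.
using Mask = std::vector<int>;

// Hypothetical helper: folds outer(inner(a, b), undef) into a single
// shuffle of (a, b). Entries of `inner` index concat(a, b); entries of
// `outer` index concat(inner result, undef).
Mask composeShuffleMasks(const Mask &inner, const Mask &outer) {
  const int innerWidth = static_cast<int>(inner.size());
  Mask folded;
  folded.reserve(outer.size());
  for (int idx : outer) {
    assert(idx < 2 * innerWidth && "index out of range for a two-operand shuffle");
    if (idx < 0 || idx >= innerWidth)
      folded.push_back(-1);          // undef lane, or a lane of the undef operand
    else
      folded.push_back(inner[idx]);  // look through to the inner shuffle's source lane
  }
  return folded;
}
```

Assuming the inner shuffle uses mask `{0, 1, 2, 3}` (2f32 + 2f32 -> 4f32) and the outer one uses `{0, 1, 2, 3, -1, -1, -1, -1}` (4f32 + undef -> 8f32), the folded mask is `{0, 1, 2, 3, -1, -1, -1, -1}` applied directly to the two 2f32 inputs: the single 2f32 + 2f32 -> 8f32 shuffle whose upper half is undef.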

The problem lay in the BUILD_VECTOR optimization path. The type legalizer
promoted the 2f32 inputs to 4f32, which eventually resulted in a BUILD_VECTOR
of two 4f32 values into an 8f32. Recognizing that each input was half the
output size, the BUILD_VECTOR path concatenated them and then produced a
shuffle. However, the resulting concat + shuffle was more complex than
necessary: when the upper half of the output is undef, we want to generate
shuffle + concat instead, i.e. shuffle the two 4f32 inputs into one 4f32 and
concatenate that with undef.
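A minimal sketch of that rewrite, again on plain integer masks with -1 for undef. `narrowShuffleOfConcat` is a hypothetical helper, not the DAGCombiner.cpp code; it only decides whether the pattern applies and computes the narrower mask. It relies on the fact that the lanes of concat(A, B) are numbered exactly like the operand lanes of a two-input shuffle of A and B.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Simplified shuffle-mask model: each entry indexes the concatenation of the
// two shuffle operands, and -1 marks an undef lane.
using Mask = std::vector<int>;

// Hypothetical helper for the rewrite
//   shuffle(concat(A, B), undef, M)  ->  concat(shuffle(A, B, M'), undef)
// where A and B have N lanes each and M has 2N lanes. Returns the N-lane
// mask M', or std::nullopt when the upper half of the output is not undef.
std::optional<Mask> narrowShuffleOfConcat(const Mask &mask) {
  assert(mask.size() % 2 == 0 && "mask must cover concat(A, B)");
  const int n = static_cast<int>(mask.size() / 2);

  // Upper output lanes must be undef: either -1 or a lane of the undef
  // second operand (index >= 2N).
  for (int i = n; i < 2 * n; ++i)
    if (mask[i] >= 0 && mask[i] < 2 * n)
      return std::nullopt;

  // Lanes 0..N-1 of concat(A, B) are A's lanes and N..2N-1 are B's, which is
  // how a two-input shuffle of (A, B) numbers its lanes. The lower half of M
  // therefore carries over unchanged, except that reads from the undef
  // operand become -1.
  Mask narrow(n);
  for (int i = 0; i < n; ++i)
    narrow[i] = mask[i] >= 2 * n ? -1 : mask[i];
  return narrow;
}
```

With 4-lane inputs A and B (the promoted 2f32 values), an illustrative mask `{0, 1, 4, 5, -1, -1, -1, -1}` (assumed here, not taken from the commit) narrows to `{0, 1, 4, 5}`: the two low lanes of A followed by the two low lanes of B in one 128-bit vector, which is then concatenated with undef. Shuffling first keeps the interesting work in a 128-bit vector, the kind of shuffle that can then lower to the movq + movhpd sequence.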

This change teaches the VECTOR_SHUFFLE combine step to recognize this
suboptimal pattern and rewrite it. I put the fix there rather than in the
BUILD_VECTOR path so that it also applies if the same pattern arises for other
reasons.

With this change, the optimizer produces the optimal movq + movhpd sequence
for all three variations of this IR, including in AVX2 mode.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
test/CodeGen/X86/vector-shuffle-256-v8.ll