AMDGPU: fix overlapping copies in copyPhysReg
author Nicolai Haehnle <nhaehnle@gmail.com>
Sat, 19 Dec 2015 01:16:06 +0000 (01:16 +0000)
committer Nicolai Haehnle <nhaehnle@gmail.com>
Sat, 19 Dec 2015 01:16:06 +0000 (01:16 +0000)
commit 710bb5a59841c1a3c79e32ee374b5d0448bbf9b7
tree 5d793c76c3e3f5e99f838fd0a0bd762a9a53eed1
parent 7ed616c150739cd644dd8ec9de80f7d9f5326aa4

Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and picks an arbitrary order when there is no
overlap. (The latter forces some trivial adjustments to test cases.)
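
The ordering rule can be sketched with a small stand-alone model. This is
an illustration only, not the code in SIInstrInfo.cpp: RegFile and
copyAggregate are hypothetical names, and choosing the direction by
comparing the start indices of destination and source is an assumption
about what the "simplest possible thing" looks like.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model: a register file of 32-bit lanes. An aggregate
// register is a run of consecutive lanes starting at some index.
using RegFile = std::array<unsigned, 16>;

// Copy `count` lanes from `src` to `dst`, one lane per move.
// Assumed rule: go forward when dst <= src, backward otherwise, so that a
// source lane is always read before an overlapping move overwrites it.
// When the ranges do not overlap, either order is correct and the
// direction is simply whatever this comparison yields.
void copyAggregate(RegFile &regs, std::size_t dst, std::size_t src,
                   std::size_t count) {
  const bool forward = dst <= src;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t lane = forward ? i : count - 1 - i;
    regs[dst + lane] = regs[src + lane];
  }
}
```

With lanes 1..4 copied to lanes 2..5, a forward copy would smear lane 1
across the whole destination; the backward order preserves all four values.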

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256072 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Target/AMDGPU/SIInstrInfo.cpp
test/CodeGen/AMDGPU/ctpop64.ll
test/CodeGen/AMDGPU/flat-address-space.ll