From: Chandler Carruth <chandlerc@gmail.com>
Date: Wed, 26 Sep 2012 07:41:40 +0000 (+0000)
Subject: Revert the business end of r164636 and try again. I'll come in again. ;]
X-Git-Url: http://plrg.eecs.uci.edu/git/?a=commitdiff_plain;h=b3dca3f50e649caa29f97736f0fa4b2fe1097034;p=oota-llvm.git

Revert the business end of r164636 and try again. I'll come in again. ;]

This should really, really fix PR13916. For real this time. The
underlying bug is... a bit more subtle than I had imagined.

The setup is a code pattern that leads to an @llvm.memcpy call with two
equal pointers to an alloca in the source and dest. Now, not any pattern
will do. The alloca needs to be formed just so, and both pointers should
be wrapped in different bitcasts etc. When this precise pattern hits,
a funny sequence of events transpires. First, we correctly detect the
potential for overlap, and correctly optimize the memcpy. The first
time. However, we do simplify the set of users of the alloca, and that
causes us to run the alloca back through the SROA pass in case there are
knock-on simplifications. At this point, a curious thing has happened.
If we happen to have an i8 alloca, we have direct i8 pointer values. So
we don't bother creating a cast, we rewrite the arguments to the memcpy
to dircetly refer to the alloca.

Now, in an unrelated area of the pass, we have clever logic which
ensures that when visiting each User of a particular pointer derived
from an alloca, we only visit that User once, and directly inspect all
of its operands which refer to that particular pointer value. However,
the mechanism used to detect memcpy's with the potential to overlap
relied upon getting visited once per *Use*, not once per *User*. This is
always true *unless* the same exact value is both source and dest. It
turns out that almost nothing actually produces that pattern though.

We can hand craft test cases that more directly test this behavior of
course, and those are included. Also, note that there is a significant
missed optimization here -- we prove in many cases that there is
a non-volatile memcpy call with identical source and dest addresses. We
shouldn't prevent splitting the alloca in that case, and in fact we
should just remove such memcpy calls eagerly. I'll address that in
a subsequent commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164669 91177308-0d34-0410-b5e6-96231b3b80d8
---

diff --git a/lib/Transforms/Scalar/SROA.cpp b/lib/Transforms/Scalar/SROA.cpp
index 04e350c25fc..c03b37da523 100644
--- a/lib/Transforms/Scalar/SROA.cpp
+++ b/lib/Transforms/Scalar/SROA.cpp
@@ -662,11 +662,14 @@ private:
     bool Inserted = false;
     llvm::tie(PMI, Inserted)
       = MemTransferPartitionMap.insert(std::make_pair(&II, NewIdx));
-    if (!Inserted && Offsets.IsSplittable) {
+    if (Offsets.IsSplittable &&
+        (!Inserted || II.getRawSource() == II.getRawDest())) {
       // We've found a memory transfer intrinsic which refers to the alloca as
-      // both a source and dest. We refuse to split these to simplify splitting
-      // logic. If possible, SROA will still split them into separate allocas
-      // and then re-analyze.
+      // both a source and dest. This is detected either by direct equality of
+      // the operand values, or when we visit the intrinsic twice due to two
+      // different chains of values leading to it. We refuse to split these to
+      // simplify splitting logic. If possible, SROA will still split them into
+      // separate allocas and then re-analyze.
       Offsets.IsSplittable = false;
       P.Partitions[PMI->second].IsSplittable = false;
       P.Partitions[NewIdx].IsSplittable = false;
@@ -2228,10 +2231,7 @@ private:
     // alloca that should be re-examined after rewriting this instruction.
     if (AllocaInst *AI
           = dyn_cast<AllocaInst>(OtherPtr->stripInBoundsOffsets()))
-      // Don't revisit the alloca if both sides of the memory transfer are
-      // referring to the same alloca.
-      if (AI != &NewAI)
-        Pass.Worklist.insert(AI);
+      Pass.Worklist.insert(AI);
 
     if (EmitMemCpy) {
       Value *OurPtr
diff --git a/test/Transforms/SROA/basictest.ll b/test/Transforms/SROA/basictest.ll
index f7a6e0e1e83..e31315f9bfd 100644
--- a/test/Transforms/SROA/basictest.ll
+++ b/test/Transforms/SROA/basictest.ll
@@ -856,26 +856,45 @@ entry:
   ret i8 %result
 }
 
-%test22.struct = type { i8 }
+%PR13916.struct = type { i8 }
+
+define void @PR13916.1() {
+; Ensure that we handle overlapping memcpy intrinsics correctly, especially in
+; the case where there is a directly identical value for both source and dest.
+; CHECK: @PR13916.1
+; FIXME: We shouldn't leave this alloca around.
+; CHECK: alloca
+; CHECK: ret void
 
-define void @test22() {
-; CHECK: @test22
-; CHECK-NOT: alloca
+entry:
+  %a = alloca i8
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %a, i8* %a, i32 1, i32 1, i1 false)
+  %tmp2 = load i8* %a
+  ret void
+}
+
+define void @PR13916.2() {
+; Check whether we continue to handle them correctly when they start off with
+; different pointer value chains, but during rewriting we coalesce them into the
+; same value.
+; CHECK: @PR13916.2
+; FIXME: We shouldn't leave this alloca around.
+; CHECK: alloca
 ; CHECK: ret void
-; PR13916
 
 entry:
-  %A = alloca %test22.struct
+  %a = alloca %PR13916.struct, align 1
   br i1 undef, label %if.then, label %if.end
 
-if.then:                                          ; preds = %entry
-  %tmp = bitcast %test22.struct* %A to i8*
-  %tmp1 = bitcast %test22.struct* %A to i8*
-  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 1, i32 1, i1 false)
-  unreachable
+if.then:
+  %tmp0 = bitcast %PR13916.struct* %a to i8*
+  %tmp1 = bitcast %PR13916.struct* %a to i8*
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp0, i8* %tmp1, i32 1, i32 1, i1 false)
+  br label %if.end
 
-if.end:                                           ; preds = %entry
-  %tmp2 = load %test22.struct* %A
+if.end:
+  %gep = getelementptr %PR13916.struct* %a, i32 0, i32 0
+  %tmp2 = load i8* %gep
   ret void
 }