Add support for llvm.vectorizer metadata

author Paul Redmond <paul.redmond@intel.com>

Tue, 28 May 2013 20:00:34 +0000 (20:00 +0000)

committer Paul Redmond <paul.redmond@intel.com>

Tue, 28 May 2013 20:00:34 +0000 (20:00 +0000)
diff --git a/docs/LangRef.rst b/docs/LangRef.rst

index e902159bf08b6a8acff26c17ffba0754aa9b7e9d..72648edbcb28e36648af3f1803fe6eeaf73a9b78 100644 (file)
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -2554,8 +2554,8 @@ Examples:
  It is sometimes useful to attach information to loop constructs. Currently,
  loop metadata is implemented as metadata attached to the branch instruction
  in the loop latch block. This type of metadata refer to a metadata node that is
-guaranteed to be separate for each loop. The loop-level metadata is prefixed
-with ``llvm.loop``.
+guaranteed to be separate for each loop. The loop identifier metadata is 
+specified with the name ``llvm.loop``.
  
  The loop identifier metadata is implemented using a metadata that refers to
  itself to avoid merging it with any other identifier metadata, e.g.,
@@ -2569,32 +2569,17 @@ constructs:
      !0 = metadata !{ metadata !0 }
      !1 = metadata !{ metadata !1 }
  
+The loop identifier metadata can be used to specify additional per-loop
+metadata. Any operands after the first operand can be treated as user-defined
+metadata. For example the ``llvm.vectorizer.unroll`` metadata is understood
+by the loop vectorizer to indicate how many times to unroll the loop:
  
-'``llvm.loop.parallel``' Metadata
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: llvm
  
-This loop metadata can be used to communicate that a loop should be considered
-a parallel loop. The semantics of parallel loops in this case is the one
-with the strongest cross-iteration instruction ordering freedom: the
-iterations in the loop can be considered completely independent of each
-other (also known as embarrassingly parallel loops).
-
-This metadata can originate from a programming language with parallel loop
-constructs. In such a case it is completely the programmer's responsibility
-to ensure the instructions from the different iterations of the loop can be
-executed in an arbitrary order, in parallel, or intertwined. No loop-carried
-dependency checking at all must be expected from the compiler.
-
-In order to fulfill the LLVM requirement for metadata to be safely ignored,
-it is important to ensure that a parallel loop is converted to
-a sequential loop in case an optimization (agnostic of the parallel loop
-semantics) converts the loop back to such. This happens when new memory
-accesses that do not fulfill the requirement of free ordering across iterations
-are added to the loop. Therefore, this metadata is required, but not
-sufficient, to consider the loop at hand a parallel loop. For a loop
-to be parallel,  all its memory accessing instructions need to be
-marked with the ``llvm.mem.parallel_loop_access`` metadata that refer
-to the same loop identifier metadata that identify the loop at hand.
+      br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
+    ...
+    !0 = metadata !{ metadata !0, metadata !1 }
+    !1 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 2 }
  
  '``llvm.mem``'
  ^^^^^^^^^^^^^^^
@@ -2606,29 +2591,28 @@ for optimizations are prefixed with ``llvm.mem``.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  
  For a loop to be parallel, in addition to using
-the ``llvm.loop.parallel`` metadata to mark the loop latch branch instruction,
+the ``llvm.loop`` metadata to mark the loop latch branch instruction,
  also all of the memory accessing instructions in the loop body need to be
  marked with the ``llvm.mem.parallel_loop_access`` metadata. If there
  is at least one memory accessing instruction not marked with the metadata,
-the loop, despite it possibly using the ``llvm.loop.parallel`` metadata,
-must be considered a sequential loop. This causes parallel loops to be
+the loop must be considered a sequential loop. This causes parallel loops to be
  converted to sequential loops due to optimization passes that are unaware of
  the parallel semantics and that insert new memory instructions to the loop
  body.
  
  Example of a loop that is considered parallel due to its correct use of
-both ``llvm.loop.parallel`` and ``llvm.mem.parallel_loop_access``
+both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
  metadata types that refer to the same loop identifier metadata.
  
  .. code-block:: llvm
  
     for.body:
-   ...
-   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !0
+     ...
+     %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
  
     for.end:
     ...
@@ -2644,27 +2628,73 @@ the loop identifier metadata node directly:
     ...
  
     inner.for.body:
-   ...
-   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop.parallel !1
+     ...
+     %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
  
     inner.for.end:
-   ...
-   %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
-   ...
-   br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop.parallel !2
+     ...
+     %0 = load i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+     ...
+     br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
  
     outer.for.end:                                          ; preds = %for.body
     ...
-   !0 = metadata !{ metadata !1, metadata !2 } ; a list of parallel loop identifiers
-   !1 = metadata !{ metadata !1 } ; an identifier for the inner parallel loop
-   !2 = metadata !{ metadata !2 } ; an identifier for the outer parallel loop
+   !0 = metadata !{ metadata !1, metadata !2 } ; a list of loop identifiers
+   !1 = metadata !{ metadata !1 } ; an identifier for the inner loop
+   !2 = metadata !{ metadata !2 } ; an identifier for the outer loop
+
+'``llvm.vectorizer``'
+^^^^^^^^^^^^^^^^^^^^^
+
+Metadata prefixed with ``llvm.vectorizer`` is used to control per-loop
+vectorization parameters such as vectorization factor and unroll factor.
+
+``llvm.vectorizer`` metadata should be used in conjunction with ``llvm.loop``
+loop identification metadata.
+
+'``llvm.vectorizer.unroll``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata instructs the loop vectorizer to unroll the specified
+loop exactly ``N`` times.
+
+The first operand is the string ``llvm.vectorizer.unroll`` and the second
+operand is an integer specifying the unroll factor. For example:
+
+.. code-block:: llvm
+
+   !0 = metadata !{ metadata !"llvm.vectorizer.unroll", i32 4 }
+
+Note that setting ``llvm.vectorizer.unroll`` to 1 disables unrolling of the
+loop.
+
+If ``llvm.vectorizer.unroll`` is set to 0 then the amount of unrolling will be
+determined automatically.
+
+'``llvm.vectorizer.width``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata forces the loop vectorizer to widen scalar values to a vector
+width of ``N`` rather than computing the width using a cost model.
+
+The first operand is the string ``llvm.vectorizer.width`` and the second
+operand is an integer specifying the width. For example:
+
+.. code-block:: llvm
+
+   !0 = metadata !{ metadata !"llvm.vectorizer.width", i32 4 }
+
+Note that setting ``llvm.vectorizer.width`` to 1 disables vectorization of the
+loop.
  
+If ``llvm.vectorizer.width`` is set to 0 then the width will be determined
+automatically.
  
  Module Flags Metadata
  =====================
diff --git a/include/llvm/Analysis/LoopInfo.h b/include/llvm/Analysis/LoopInfo.h

index 783e347522d42761624096990d655bc370931f8c..7b3eed7350648952a22990ad6e9f550294521fc9 100644 (file)
--- a/include/llvm/Analysis/LoopInfo.h
+++ b/include/llvm/Analysis/LoopInfo.h
@@ -50,6 +50,7 @@ inline void RemoveFromVector(std::vector<T*> &V, T *N) {
  class DominatorTree;
  class LoopInfo;
  class Loop;
+class MDNode;
  class PHINode;
  class raw_ostream;
  template<class N, class M> class LoopInfoBase;
@@ -391,6 +392,22 @@ public:
    /// iterations.
    bool isAnnotatedParallel() const;
  
+  /// Return the llvm.loop loop id metadata node for this loop if it is present.
+  ///
+  /// If this loop contains the same llvm.loop metadata on each branch to the
+  /// header then the node is returned. If any latch instruction does not
+  /// contain llvm.loop or if multiple latches contain different nodes then
+  /// 0 is returned.
+  MDNode *getLoopID() const;
+  /// Set the llvm.loop loop id metadata for this loop.
+  ///
+  /// The LoopID metadata node will be added to each terminator instruction in
+  /// the loop that branches to the loop header.
+  ///
+  /// The LoopID metadata node should have one or more operands and the first
+  /// operand should be the node itself.
+  void setLoopID(MDNode *LoopID) const;
+
    /// hasDedicatedExits - Return true if no exit block for the loop
    /// has a predecessor that is outside the loop.
    bool hasDedicatedExits() const;
diff --git a/lib/Analysis/LoopInfo.cpp b/lib/Analysis/LoopInfo.cpp

index f1ad6506e4ba13fe4fba64057e890573683cd4cb..f1f02a8c0a11c62b9300f219d013b4684e2b74c2 100644 (file)
--- a/lib/Analysis/LoopInfo.cpp
+++ b/lib/Analysis/LoopInfo.cpp
@@ -50,6 +50,9 @@ INITIALIZE_PASS_BEGIN(LoopInfo, "loops", "Natural Loop Information", true, true)
  INITIALIZE_PASS_DEPENDENCY(DominatorTree)
  INITIALIZE_PASS_END(LoopInfo, "loops", "Natural Loop Information", true, true)
  
+// Loop identifier metadata name.
+static const char* LoopMDName = "llvm.loop";
+
  //===----------------------------------------------------------------------===//
  // Loop implementation
  //
@@ -234,14 +237,62 @@ bool Loop::isSafeToClone() const {
    return true;
  }
  
-bool Loop::isAnnotatedParallel() const {
+MDNode *Loop::getLoopID() const {
+  MDNode *LoopID = 0;
+  if (isLoopSimplifyForm()) {
+    LoopID = getLoopLatch()->getTerminator()->getMetadata(LoopMDName);
+  } else {
+    // Go through each predecessor of the loop header and check the
+    // terminator for the metadata.
+    BasicBlock *H = getHeader();
+    for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
+      TerminatorInst *TI = (*I)->getTerminator();
+      MDNode *MD = 0;
+
+      // Check if this terminator branches to the loop header.
+      for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
+        if (TI->getSuccessor(i) == H) {
+          MD = TI->getMetadata(LoopMDName);
+          break;
+        }
+      }
+      if (!MD)
+        return 0;
  
-  BasicBlock *latch = getLoopLatch();
-  if (latch == NULL)
-    return false;
+      if (!LoopID)
+        LoopID = MD;
+      else if (MD != LoopID)
+        return 0;
+    }
+  }
+  if (!LoopID || LoopID->getNumOperands() == 0 ||
+      LoopID->getOperand(0) != LoopID)
+    return 0;
+  return LoopID;
+}
  
-  MDNode *desiredLoopIdMetadata =
-    latch->getTerminator()->getMetadata("llvm.loop.parallel");
+void Loop::setLoopID(MDNode *LoopID) const {
+  assert(LoopID && "Loop ID should not be null");
+  assert(LoopID->getNumOperands() > 0 && "Loop ID needs at least one operand");
+  assert(LoopID->getOperand(0) == LoopID && "Loop ID should refer to itself");
+
+  if (isLoopSimplifyForm()) {
+    getLoopLatch()->getTerminator()->setMetadata(LoopMDName, LoopID);
+    return;
+  }
+
+  BasicBlock *H = getHeader();
+  for (block_iterator I = block_begin(), IE = block_end(); I != IE; ++I) {
+    TerminatorInst *TI = (*I)->getTerminator();
+    for (unsigned i = 0, ie = TI->getNumSuccessors(); i != ie; ++i) {
+      if (TI->getSuccessor(i) == H)
+        TI->setMetadata(LoopMDName, LoopID);
+    }
+  }
+}
+
+bool Loop::isAnnotatedParallel() const {
+  MDNode *desiredLoopIdMetadata = getLoopID();
  
    if (!desiredLoopIdMetadata)
        return false;
diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp

index f6c4dea04c11d164ef6d1b3716ee89a1683c9b61..3f664de7601a7d9d8884b243e0d643beb99bb581 100644 (file)
--- a/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -119,11 +119,11 @@ static const unsigned TinyTripCountUnrollThreshold = 128;
  /// than this number of comparisons.
  static const unsigned RuntimeMemoryCheckThreshold = 8;
  
-/// We use a metadata with this name  to indicate that a scalar loop was
-/// vectorized and that we don't need to re-vectorize it if we run into it
-/// again.
-static const char*
-AlreadyVectorizedMDName = "llvm.vectorizer.already_vectorized";
+/// Maximum simd width.
+static const unsigned MaxVectorWidth = 64;
+
+/// Maximum vectorization unroll count.
+static const unsigned MaxUnrollFactor = 16;
  
  namespace {
  
@@ -768,6 +768,127 @@ private:
    const TargetLibraryInfo *TLI;
  };
  
+/// Utility class for getting and setting loop vectorizer hints in the form
+/// of loop metadata.
+struct LoopVectorizeHints {
+  /// Vectorization width.
+  unsigned Width;
+  /// Vectorization unroll factor.
+  unsigned Unroll;
+
+  LoopVectorizeHints(const Loop *L)
+  : Width(VectorizationFactor)
+  , Unroll(VectorizationUnroll)
+  , LoopID(L->getLoopID()) {
+    getHints(L);
+    // The command line options override any loop metadata except for when
+    // width == 1 which is used to indicate the loop is already vectorized.
+    if (VectorizationFactor.getNumOccurrences() > 0 && Width != 1)
+      Width = VectorizationFactor;
+    if (VectorizationUnroll.getNumOccurrences() > 0)
+      Unroll = VectorizationUnroll;
+  }
+
+  /// Return the loop vectorizer metadata prefix.
+  static StringRef Prefix() { return "llvm.vectorizer."; }
+
+  MDNode *createHint(LLVMContext &Context, StringRef Name, unsigned V) {
+    SmallVector<Value*, 2> Vals;
+    Vals.push_back(MDString::get(Context, Name));
+    Vals.push_back(ConstantInt::get(Type::getInt32Ty(Context), V));
+    return MDNode::get(Context, Vals);
+  }
+
+  /// Mark the loop L as already vectorized by setting the width to 1.
+  void setAlreadyVectorized(Loop *L) {
+    LLVMContext &Context = L->getHeader()->getContext();
+
+    Width = 1;
+
+    // Create a new loop id with one more operand for the already_vectorized
+    // hint. If the loop already has a loop id then copy the existing operands.
+    SmallVector<Value*, 4> Vals(1);
+    if (LoopID)
+      for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i)
+        Vals.push_back(LoopID->getOperand(i));
+
+    Twine Name = Prefix() + "width";
+    Vals.push_back(createHint(Context, Name.str(), Width));
+
+    MDNode *NewLoopID = MDNode::get(Context, Vals);
+    // Set operand 0 to refer to the loop id itself.
+    NewLoopID->replaceOperandWith(0, NewLoopID);
+
+    L->setLoopID(NewLoopID);
+    if (LoopID)
+      LoopID->replaceAllUsesWith(NewLoopID);
+
+    LoopID = NewLoopID;
+  }
+
+private:
+  MDNode *LoopID;
+
+  /// Find hints specified in the loop metadata.
+  void getHints(const Loop *L) {
+    if (!LoopID)
+      return;
+  
+    // First operand should refer to the loop id itself.
+    assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
+    assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
+  
+    for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) {
+      const MDString *S = 0;
+      SmallVector<Value*, 4> Args;
+
+      // The expected hint is either a MDString or a MDNode with the first
+      // operand a MDString.
+      if (const MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i))) {
+        if (!MD || MD->getNumOperands() == 0)
+          continue;
+        S = dyn_cast<MDString>(MD->getOperand(0));
+        for (unsigned i = 1, ie = MD->getNumOperands(); i < ie; ++i)
+          Args.push_back(MD->getOperand(i));
+      } else {
+        S = dyn_cast<MDString>(LoopID->getOperand(i));
+        assert(Args.size() == 0 && "too many arguments for MDString");
+      }
+
+      if (!S)
+        continue;
+
+      // Check if the hint starts with the vectorizer prefix.
+      StringRef Hint = S->getString();
+      if (!Hint.startswith(Prefix()))
+        continue;
+      // Remove the prefix.
+      Hint = Hint.substr(Prefix().size(), StringRef::npos);
+  
+      if (Args.size() == 1)
+        getHint(Hint, Args[0]);
+    }
+  }
+
+  // Check string hint with one operand.
+  void getHint(StringRef Hint, Value *Arg) {
+    const ConstantInt *C = dyn_cast<ConstantInt>(Arg);
+    if (!C) return;
+    unsigned Val = C->getZExtValue();
+
+    if (Hint == "width") {
+      assert(isPowerOf2_32(Val) && Val <= MaxVectorWidth &&
+             "Invalid width metadata");
+      Width = Val;
+    } else if (Hint == "unroll") {
+      assert(isPowerOf2_32(Val) && Val <= MaxUnrollFactor &&
+             "Invalid unroll metadata");
+      Unroll = Val;
+    } else
+      DEBUG(dbgs() << "LV: ignoring unknown hint " << Hint << "\n");
+  }
+};
+
  /// The LoopVectorize Pass.
  struct LoopVectorize : public LoopPass {
    /// Pass identification, replacement for typeid
@@ -806,6 +927,13 @@ struct LoopVectorize : public LoopPass {
      DEBUG(dbgs() << "LV: Checking a loop in \"" <<
            L->getHeader()->getParent()->getName() << "\"\n");
  
+    LoopVectorizeHints Hints(L);
+
+    if (Hints.Width == 1) {
+      DEBUG(dbgs() << "LV: Not vectorizing.\n");
+      return false;
+    }
+
      // Check if it is legal to vectorize the loop.
      LoopVectorizationLegality LVL(L, SE, DL, DT, TTI, AA, TLI);
      if (!LVL.canVectorize()) {
@@ -833,10 +961,10 @@ struct LoopVectorize : public LoopPass {
  
      // Select the optimal vectorization factor.
      LoopVectorizationCostModel::VectorizationFactor VF;
-    VF = CM.selectVectorizationFactor(OptForSize, VectorizationFactor);
+    VF = CM.selectVectorizationFactor(OptForSize, Hints.Width);
      // Select the unroll factor.
-    unsigned UF = CM.selectUnrollFactor(OptForSize, VectorizationUnroll,
-                                        VF.Width, VF.Cost);
+    unsigned UF = CM.selectUnrollFactor(OptForSize, Hints.Unroll, VF.Width,
+                                        VF.Cost);
  
      if (VF.Width == 1) {
        DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
@@ -851,6 +979,9 @@ struct LoopVectorize : public LoopPass {
      InnerLoopVectorizer LB(L, SE, LI, DT, DL, TLI, VF.Width, UF);
      LB.vectorize(&LVL);
  
+    // Mark the loop as already vectorized to avoid vectorizing again.
+    Hints.setAlreadyVectorized(L);
+
      DEBUG(verifyFunction(*L->getHeader()->getParent()));
      return true;
    }
@@ -1318,11 +1449,6 @@ InnerLoopVectorizer::createEmptyLoop(LoopVectorizationLegality *Legal) {
    BasicBlock *ExitBlock = OrigLoop->getExitBlock();
    assert(ExitBlock && "Must have an exit block");
  
-  // Mark the old scalar loop with metadata that tells us not to vectorize this
-  // loop again if we run into it.
-  MDNode *MD = MDNode::get(OldBasicBlock->getContext(), None);
-  OldBasicBlock->getTerminator()->setMetadata(AlreadyVectorizedMDName, MD);
-
    // Some loops have a single integer induction variable, while other loops
    // don't. One example is c++ iterators that often have multiple pointer
    // induction variables. In the code below we also support a case where we
@@ -2516,13 +2642,6 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
    BasicBlock *PreHeader = TheLoop->getLoopPreheader();
    BasicBlock *Header = TheLoop->getHeader();
  
-  // If we marked the scalar loop as "already vectorized" then no need
-  // to vectorize it again.
-  if (Header->getTerminator()->getMetadata(AlreadyVectorizedMDName)) {
-    DEBUG(dbgs() << "LV: This loop was vectorized before\n");
-    return false;
-  }
-
    // Look for the attribute signaling the absence of NaNs.
    Function &F = *Header->getParent();
    if (F.hasFnAttribute("no-nans-fp-math"))
diff --git a/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll b/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll

index 47a5e7aee4c1baa03c6e1369410d2ce3219c0f96..30579cebb1bd5768da8733a7b43adab294a328f3 100644 (file)
--- a/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
+++ b/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
@@ -21,7 +21,7 @@ for.end.us:                                       ; preds = %for.body3.us
    %indvars.iv.next34 = add i64 %indvars.iv33, 1
    %lftr.wideiv35 = trunc i64 %indvars.iv.next34 to i32
    %exitcond36 = icmp eq i32 %lftr.wideiv35, %m
-  br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop.parallel !5
+  br i1 %exitcond36, label %for.end15, label %for.body3.lr.ph.us, !llvm.loop !5
  
  for.body3.us:                                     ; preds = %for.body3.us, %for.body3.lr.ph.us
    %indvars.iv29 = phi i64 [ 0, %for.body3.lr.ph.us ], [ %indvars.iv.next30, %for.body3.us ]
@@ -35,7 +35,7 @@ for.body3.us:                                     ; preds = %for.body3.us, %for.
    %indvars.iv.next30 = add i64 %indvars.iv29, 1
    %lftr.wideiv31 = trunc i64 %indvars.iv.next30 to i32
    %exitcond32 = icmp eq i32 %lftr.wideiv31, %m
-  br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop.parallel !4
+  br i1 %exitcond32, label %for.end.us, label %for.body3.us, !llvm.loop !4
  
  for.body3.lr.ph.us:                               ; preds = %for.end.us, %entry
    %indvars.iv33 = phi i64 [ %indvars.iv.next34, %for.end.us ], [ 0, %entry ]
diff --git a/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll b/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll

index f904a8e0b1173b4d1e3a968015661b94a58a55c1..2c47fcb4d38902cbd66b6911da6ae3b2b210ac9d 100644 (file)
--- a/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
+++ b/test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
@@ -35,7 +35,7 @@ for.body:                                         ; preds = %for.body.for.body_c
    %indvars.iv.next.reload = load i64* %indvars.iv.next.reg2mem
    %lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
    %exitcond = icmp eq i32 %lftr.wideiv, 512
-  br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop.parallel !3
+  br i1 %exitcond, label %for.end, label %for.body.for.body_crit_edge, !llvm.loop !3
  
  for.body.for.body_crit_edge:                      ; preds = %for.body
    %indvars.iv.next.reload2 = load i64* %indvars.iv.next.reg2mem
diff --git a/test/Transforms/LoopVectorize/X86/parallel-loops.ll b/test/Transforms/LoopVectorize/X86/parallel-loops.ll

index 3f1a071e69fa842f4801c093cb6935b1b993c97f..681a815d32a4a6bb1222a07051f6fef498aab8d3 100644 (file)
--- a/test/Transforms/LoopVectorize/X86/parallel-loops.ll
+++ b/test/Transforms/LoopVectorize/X86/parallel-loops.ll
@@ -65,7 +65,7 @@ for.body:                                         ; preds = %for.body, %entry
    store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
    %lftr.wideiv = trunc i64 %indvars.iv.next to i32
    %exitcond = icmp eq i32 %lftr.wideiv, 512
-  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !3
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
  
  for.end:                                          ; preds = %for.body
    ret void
@@ -98,7 +98,7 @@ for.body:                                         ; preds = %for.body, %entry
    store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
    %lftr.wideiv = trunc i64 %indvars.iv.next to i32
    %exitcond = icmp eq i32 %lftr.wideiv, 512
-  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop.parallel !6
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
  
  for.end:                                          ; preds = %for.body
    ret void
diff --git a/test/Transforms/LoopVectorize/metadata-unroll.ll b/test/Transforms/LoopVectorize/metadata-unroll.ll

new file mode 100644 (file)

index 0000000..0112fee
--- /dev/null
+++ b/test/Transforms/LoopVectorize/metadata-unroll.ll
@@ -0,0 +1,41 @@
+; RUN: opt < %s  -loop-vectorize -force-vector-width=4 -dce -instcombine -S | FileCheck %s
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.8.0"
+
+@a = common global [2048 x i32] zeroinitializer, align 16
+
+; This is the loop.
+;  for (i=0; i<n; i++){
+;    a[i] += i;
+;  }
+;CHECK: @inc
+;CHECK: load <4 x i32>
+;CHECK: load <4 x i32>
+;CHECK: add nsw <4 x i32>
+;CHECK: add nsw <4 x i32>
+;CHECK: store <4 x i32>
+;CHECK: store <4 x i32>
+;CHECK: ret void
+define void @inc(i32 %n) nounwind uwtable noinline ssp {
+  %1 = icmp sgt i32 %n, 0
+  br i1 %1, label %.lr.ph, label %._crit_edge
+
+.lr.ph:                                           ; preds = %0, %.lr.ph
+  %indvars.iv = phi i64 [ %indvars.iv.next, %.lr.ph ], [ 0, %0 ]
+  %2 = getelementptr inbounds [2048 x i32]* @a, i64 0, i64 %indvars.iv
+  %3 = load i32* %2, align 4
+  %4 = trunc i64 %indvars.iv to i32
+  %5 = add nsw i32 %3, %4
+  store i32 %5, i32* %2, align 4
+  %indvars.iv.next = add i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp eq i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %._crit_edge, label %.lr.ph, !llvm.loop !0
+
+._crit_edge:                                      ; preds = %.lr.ph, %0
+  ret void
+}
+
+!0 = metadata !{metadata !0, metadata !1}
+!1 = metadata !{metadata !"llvm.vectorizer.unroll", i32 2}
diff --git a/test/Transforms/LoopVectorize/metadata-width.ll b/test/Transforms/LoopVectorize/metadata-width.ll

new file mode 100644 (file)

index 0000000..b06d442
--- /dev/null
+++ b/test/Transforms/LoopVectorize/metadata-width.ll
@@ -0,0 +1,31 @@
+; RUN: opt < %s  -loop-vectorize -force-vector-unroll=1 -dce -instcombine -S | FileCheck %s
+
+target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; CHECK: @test1
+; CHECK: store <8 x i32>
+; CHECK: ret void
+define void @test1(i32* nocapture %a, i32 %n) #0 {
+entry:
+  %cmp4 = icmp sgt i32 %n, 0
+  br i1 %cmp4, label %for.body, label %for.end
+
+for.body:                                         ; preds = %entry, %for.body
+  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
+  %arrayidx = getelementptr inbounds i32* %a, i64 %indvars.iv
+  %0 = trunc i64 %indvars.iv to i32
+  store i32 %0, i32* %arrayidx, align 4
+  %indvars.iv.next = add i64 %indvars.iv, 1
+  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
+  %exitcond = icmp eq i32 %lftr.wideiv, %n
+  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
+
+for.end:                                          ; preds = %for.body, %entry
+  ret void
+}
+
+attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
+!0 = metadata !{metadata !0, metadata !1}
+!1 = metadata !{metadata !"llvm.vectorizer.width", i32 8}
diff --git a/test/Transforms/LoopVectorize/vectorize-once.ll b/test/Transforms/LoopVectorize/vectorize-once.ll

index f289ded25de1ab3f8a6a3df086765a94fddb7b78..2b8f3fd31f7df2df57118962c06612e3332f7169 100644 (file)
--- a/test/Transforms/LoopVectorize/vectorize-once.ll
+++ b/test/Transforms/LoopVectorize/vectorize-once.ll
@@ -11,7 +11,7 @@ target triple = "x86_64-apple-macosx10.8.0"
  ; This test checks that we add metadata to vectorized loops
  ; CHECK: _Z4foo1Pii
  ; CHECK: <4 x i32>
-; CHECK: llvm.vectorizer.already_vectorized
+; CHECK: llvm.loop
  ; CHECK: ret
  
  ; This test comes from the loop:
@@ -40,10 +40,10 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit:            ; preds = %for.body.i, %entry
    ret i32 %__init.addr.0.lcssa.i
  }
  
-; This test checks that we don't vectorize loops that are marked with the "already vectorized" metadata.
+; This test checks that we don't vectorize loops that are marked with the "width" == 1 metadata.
  ; CHECK: _Z4foo2Pii
  ; CHECK-NOT: <4 x i32>
-; CHECK: llvm.vectorizer.already_vectorized
+; CHECK: llvm.loop
  ; CHECK: ret
  define i32 @_Z4foo2Pii(i32* %A, i32 %n) #0 {
  entry:
@@ -59,7 +59,7 @@ for.body.i:                                       ; preds = %entry, %for.body.i
    %add.i = add nsw i32 %0, %__init.addr.05.i
    %incdec.ptr.i = getelementptr inbounds i32* %__first.addr.04.i, i64 1
    %cmp.i = icmp eq i32* %incdec.ptr.i, %add.ptr
-  br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.vectorizer.already_vectorized !3
+  br i1 %cmp.i, label %_ZSt10accumulateIPiiET0_T_S2_S1_.exit, label %for.body.i, !llvm.loop !0
  
  _ZSt10accumulateIPiiET0_T_S2_S1_.exit:            ; preds = %for.body.i, %entry
    %__init.addr.0.lcssa.i = phi i32 [ 0, %entry ], [ %add.i, %for.body.i ]
@@ -68,5 +68,9 @@ _ZSt10accumulateIPiiET0_T_S2_S1_.exit:            ; preds = %for.body.i, %entry
  
  attributes #0 = { nounwind readonly ssp uwtable "fp-contract-model"="standard" "no-frame-pointer-elim" "no-frame-pointer-elim-non-leaf" "realign-stack" "relocation-model"="pic" "ssp-buffers-size"="8" }
  
-!3 = metadata !{}
+; CHECK: !0 = metadata !{metadata !0, metadata !1}
+; CHECK: !1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
+; CHECK: !2 = metadata !{metadata !2, metadata !1}
  
+!0 = metadata !{metadata !0, metadata !1}
+!1 = metadata !{metadata !"llvm.vectorizer.width", i32 1}
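
Note on hint parsing. The following is a minimal stand-alone sketch of how the new LoopVectorizeHints::getHints/getHint pair appears to treat a single hint: strip the "llvm.vectorizer." prefix, then accept "width" or "unroll" values that are powers of two within MaxVectorWidth or MaxUnrollFactor. It is not LLVM code. HintModel, isValid and apply are hypothetical names, and the sketch returns false where the patch asserts. It also accepts 0 (meaning "decide automatically") because the LangRef text in this commit documents that value; whether the patch's own isPowerOf2_32 assertion lets 0 through depends on that helper, so treat this as an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone model of the hint handling in this patch.
// It does not use any LLVM types.
const unsigned MaxVectorWidth = 64;  // mirrors the constant added by the patch
const unsigned MaxUnrollFactor = 16; // mirrors the constant added by the patch

struct HintModel {
  unsigned Width;  // 0 is assumed to mean "let the cost model decide"
  unsigned Unroll; // 0 is assumed to mean "choose automatically"

  HintModel() : Width(0), Unroll(0) {}

  static bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

  // Accept 0 (documented as "automatic") or a power of two no larger than Max.
  static bool isValid(unsigned V, unsigned Max) {
    return V == 0 || (isPowerOf2(V) && V <= Max);
  }

  // Returns true if Name is a recognized vectorizer hint with a valid value;
  // unknown or invalid hints leave the state untouched.
  bool apply(const std::string &Name, unsigned Val) {
    const std::string Prefix = "llvm.vectorizer.";
    if (Name.compare(0, Prefix.size(), Prefix) != 0)
      return false; // not a vectorizer hint
    const std::string Hint = Name.substr(Prefix.size());
    if (Hint == "width" && isValid(Val, MaxVectorWidth)) {
      Width = Val;
      return true;
    }
    if (Hint == "unroll" && isValid(Val, MaxUnrollFactor)) {
      Unroll = Val;
      return true;
    }
    return false;
  }
};
```

For instance, applying ("llvm.vectorizer.width", 8) and then ("llvm.vectorizer.unroll", 2) to a fresh HintModel leaves Width at 8 and Unroll at 2, while a width of 3 or a name such as "llvm.loop" is rejected.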
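
Note on precedence. The LoopVectorizeHints constructor in this commit starts from the command-line option values, overlays any metadata hints, and then lets an explicitly given option win again, except that a metadata width of 1 is kept because it marks the loop as already vectorized. The sketch below restates that order with plain values. OptionModel, ResolvedHints and resolveHints are hypothetical names introduced for illustration, and the option defaults used in the usage note are assumptions rather than values shown in the patch.

```cpp
#include <cassert>

// Hypothetical model of a command-line option: its value and how many times
// it was given on the command line.
struct OptionModel {
  unsigned Value;
  unsigned NumOccurrences;
};

struct ResolvedHints {
  unsigned Width;
  unsigned Unroll;
};

// Restates the order used by the constructor in the patch:
//   1. start from the option values,
//   2. overlay metadata hints when present,
//   3. let explicitly given options override, except when Width == 1.
ResolvedHints resolveHints(OptionModel WidthOpt, OptionModel UnrollOpt,
                           bool HasWidthMD, unsigned WidthMD,
                           bool HasUnrollMD, unsigned UnrollMD) {
  ResolvedHints R = {WidthOpt.Value, UnrollOpt.Value};
  if (HasWidthMD)
    R.Width = WidthMD;
  if (HasUnrollMD)
    R.Unroll = UnrollMD;
  if (WidthOpt.NumOccurrences > 0 && R.Width != 1)
    R.Width = WidthOpt.Value;
  if (UnrollOpt.NumOccurrences > 0)
    R.Unroll = UnrollOpt.Value;
  return R;
}
```

With a width option of 4 given once, an unroll option left at an assumed default of 0, and unroll metadata of 2 (the setup of metadata-unroll.ll), the result is Width 4 and Unroll 2. With width metadata of 1 and the same width option, Width stays 1, so the loop is skipped.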