From 91a44dd9ccd8ec3a10fa35315c381cffade91d5b Mon Sep 17 00:00:00 2001
From: Eli Friedman <eli.friedman@gmail.com>
Date: Fri, 12 Aug 2011 21:50:54 +0000
Subject: [PATCH] Some reorganization of atomic docs.  Added explicit section
 for NonAtomic.  Added example for illegal non-atomic operation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137520 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/Atomics.html | 143 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 111 insertions(+), 32 deletions(-)
diff --git a/docs/Atomics.html b/docs/Atomics.html
index 967ebdddb1a..357f43167bf 100644
--- a/docs/Atomics.html
+++ b/docs/Atomics.html
@@ -14,8 +14,8 @@
 
 <ol>
   <li><a href="#introduction">Introduction</a></li>
-  <li><a href="#loadstore">Load and store</a></li>
-  <li><a href="#otherinst">Other atomic instructions</a></li>
+  <li><a href="#outsideatomic">Optimization outside atomic</a></li>
+  <li><a href="#atomicinst">Atomic instructions</a></li>
   <li><a href="#ordering">Atomic orderings</a></li>
   <li><a href="#iropt">Atomics and IR optimization</a></li>
   <li><a href="#codegen">Atomics and Codegen</a></li>
@@ -75,51 +75,84 @@ instructions has been clarified in the IR.</p>
 
 <!-- *********************************************************************** -->
 <h2>
-  <a name="loadstore">Load and store</a>
+  <a name="outsideatomic">Optimization outside atomic</a>
 </h2>
 <!-- *********************************************************************** -->
 
 <div>
 
 <p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of 
-   optimizations, but can have unintuitive results in a concurrent environment.
-   For a frontend writer, the rule is essentially that all memory accessed 
-   with basic loads and stores by multiple threads should be protected by a
-   lock or other synchronization; otherwise, you are likely to run into
-   undefined behavior. (Do not use volatile as a substitute for atomics; it
-   might work on some platforms, but does not provide the necessary guarantees
-   in general.)</p>
+   optimizations, but can lead to undefined results in a concurrent environment;
+   see <a href="#o_notatomic">NotAtomic</a>. This section specifically goes
+   into the one optimizer restriction which applies in concurrent environments;
+   it gets an extended description because any optimization dealing with
+   stores needs to be aware of it.</p>
 
 <p>From the optimizer's point of view, the rule is that if there
    are not any instructions with atomic ordering involved, concurrency does
    not matter, with one exception: if a variable might be visible to another
    thread or signal handler, a store cannot be inserted along a path where it
-   might not execute otherwise. For example, suppose LICM wants to take all the
-   loads and stores in a loop to and from a particular address and promote them
-   to registers. LICM is not allowed to insert an unconditional store after
-   the loop with the computed value unless a store unconditionally executes
-   within the loop. Note that speculative loads are allowed; a load which
+   might not execute otherwise.  Take the following example:</p>
+
+<pre>
+/* C code, for readability; run through clang -O2 -S -emit-llvm to get
+   equivalent IR */
+int x;
+void f(int* a) {
+  for (int i = 0; i &lt; 100; i++) {
+    if (a[i])
+      x += 1;
+  }
+}
+</pre>
+
+<p>The following is equivalent in non-concurrent situations:</p>
+
+<pre>
+int x;
+void f(int* a) {
+  int xtemp = x;
+  for (int i = 0; i &lt; 100; i++) {
+    if (a[i])
+      xtemp += 1;
+  }
+  x = xtemp;
+}
+</pre>
+
+<p>However, LLVM is not allowed to transform the former to the latter: it could
+   introduce undefined behavior if another thread can access x at the same time.
+   (This example is of particular interest because LLVM performed this
+   transformation before the concurrency model was implemented.)</p>
+
+<p>Note that speculative loads are allowed; a load which
    is part of a race returns <code>undef</code>, but does not have undefined
    behavior.</p>
 
-<p>For cases where simple loads and stores are not sufficient, LLVM provides
-   atomic loads and stores with varying levels of guarantees.</p>
 
 </div>
 
 <!-- *********************************************************************** -->
 <h2>
-  <a name="otherinst">Other atomic instructions</a>
+  <a name="atomicinst">Atomic instructions</a>
 </h2>
 <!-- *********************************************************************** -->
 
 <div>
 
+<p>For cases where simple loads and stores are not sufficient, LLVM provides
+   various atomic instructions. The exact guarantees provided depend on the
+   ordering; see <a href="#ordering">Atomic orderings</a>.</p>
+
+<p><code>load atomic</code> and <code>store atomic</code> provide the same
+   basic functionality as non-atomic loads and stores, but provide additional
+   guarantees in situations where threads and signals are involved.</p>
+
 <p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
    atomic load followed by an atomic store (where the store is conditional for
-   <code>cmpxchg</code>), but no other memory operation can happen between
-   the load and store.  Note that our cmpxchg does not have quite as many
-   options for making cmpxchg weaker as the C++0x version.</p>
+   <code>cmpxchg</code>), but no other memory operation can happen on any thread
+   between the load and store.  Note that LLVM's cmpxchg does not provide quite
+   as many options as the C++0x version.</p>
 
 <p>A <code>fence</code> provides Acquire and/or Release ordering which is not
    part of another operation; it is normally used along with Monotonic memory
@@ -146,6 +179,54 @@ instructions has been clarified in the IR.</p>
    each level includes all the guarantees of the previous level except for
    Acquire/Release.</p>
 
+<!-- ======================================================================= -->
+<h3>
+     <a name="o_notatomic">NotAtomic</a>
+</h3>
+
+<div>
+
+<p>NotAtomic is the obvious case: a load or store which is not atomic. (This
+   isn't really a level of atomicity, but is listed here for comparison.) This
+   is essentially a regular load or store. If code accesses a memory location
+   from multiple threads at the same time and at least one access is a store,
+   the resulting loads return <code>undef</code>.</p>
+
+<dl>
+  <dt>Relevant standard</dt>
+  <dd>This is intended to match shared variables in C/C++, and to be used
+      in any other context where memory access is necessary and
+      a race is impossible.</dd>
+  <dt>Notes for frontends</dt>
+  <dd>The rule is essentially that all memory accessed with basic loads and
+      stores by multiple threads should be protected by a lock or other
+      synchronization; otherwise, you are likely to run into undefined
+      behavior. If your frontend is for a "safe" language like Java,
+      use Unordered to load and store any shared variable.  Note that NotAtomic
+      volatile loads and stores are not properly atomic; do not try to use
+      them as a substitute. (Per the C/C++ standards, volatile does provide
+      some limited guarantees around asynchronous signals, but atomics are
+      generally a better solution.)</dd>
+  <dt>Notes for optimizers</dt>
+  <dd>Introducing loads to shared variables along a codepath where they would
+      not otherwise exist is allowed; introducing stores to shared variables
+      is not. See <a href="#outsideatomic">Optimization outside
+      atomic</a>.</dd>
+  <dt>Notes for code generation</dt>
+  <dd>The one interesting restriction here is that a store is not allowed to
+      write to bytes outside of the bytes relevant to that store.  This is mostly
+      relevant to unaligned stores: it is not allowed in general to convert
+      an unaligned store into two aligned stores of the same width as the
+      unaligned store. Backends are also expected to generate an i8 store
+      as an i8 store, and not an instruction which writes to surrounding
+      bytes.  (If you are writing a backend for an architecture which cannot
+      satisfy these restrictions and cares about concurrency, please send an
+      email to llvmdev.)</dd>
+</dl>
+
+</div>
+
+
 <!-- ======================================================================= -->
 <h3>
      <a name="o_unordered">Unordered</a>
@@ -379,24 +460,22 @@ instructions has been clarified in the IR.</p>
 <ul>
   <li>isSimple(): A load or store which is not volatile or atomic.  This is
       what, for example, memcpyopt would check for operations it might
-      transform.
+      transform.</li>
   <li>isUnordered(): A load or store which is not volatile and at most
       Unordered. This would be checked, for example, by LICM before hoisting
-      an operation.
+      an operation.</li>
   <li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
       that they return true for any operation which is volatile or at least
-      Monotonic.
+      Monotonic.</li>
   <li>Alias analysis: Note that AA will return ModRef for anything Acquire or
-      Release, and for the address accessed by any Monotonic operation.
+      Release, and for the address accessed by any Monotonic operation.</li>
 </ul>
 
-<p>There are essentially two components to supporting atomic operations. The
-   first is making sure to query isSimple() or isUnordered() instead
-   of isVolatile() before transforming an operation.  The other piece is
-   making sure that a transform does not end up replacing, for example, an 
-   Unordered operation with a non-atomic operation.  Most of the other 
-   necessary checks automatically fall out from existing predicates and
-   alias analysis queries.</p>
+<p>To support optimizing around atomic operations, make sure you are using
+   the right predicates; everything should work if that is done.  If your
+   pass should optimize some atomic operations (Unordered operations in
+   particular), make sure it doesn't replace an atomic load or store with
+   a non-atomic operation.</p>
 
 <p>Some examples of how optimizations interact with various kinds of atomic
    operations:
-- 
2.34.1