Merging r259645:

[oota-llvm.git] / docs / Atomics.rst
diff --git a/docs/Atomics.rst b/docs/Atomics.rst

index 58d1a26d5441530970f999cc8a42e1f8d7049635..79ab74792dd476fd83fdfc83044d29b57d88d6ee 100644 (file)
--- a/docs/Atomics.rst
+++ b/docs/Atomics.rst
@@ -18,8 +18,8 @@ clarified in the IR.
  The atomic instructions are designed specifically to provide readable IR and
  optimized code generation for the following:
  
  The atomic instructions are designed specifically to provide readable IR and
  optimized code generation for the following:
  
-* The new C++0x ``<atomic>`` header.  (`C++0x draft available here
-  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here
+* The new C++11 ``<atomic>`` header.  (`C++11 draft available here
+  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
    <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
  
  * Proper semantics for Java-style memory, for both ``volatile`` and regular
    <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
  
  * Proper semantics for Java-style memory, for both ``volatile`` and regular
@@ -115,7 +115,10 @@ memory operation can happen on any thread between the load and store.
  A ``fence`` provides Acquire and/or Release ordering which is not part of
  another operation; it is normally used along with Monotonic memory operations.
  A Monotonic load followed by an Acquire fence is roughly equivalent to an
  A ``fence`` provides Acquire and/or Release ordering which is not part of
  another operation; it is normally used along with Monotonic memory operations.
  A Monotonic load followed by an Acquire fence is roughly equivalent to an
-Acquire load.
+Acquire load, and a Monotonic store following a Release fence is roughly
+equivalent to a Release store. SequentiallyConsistent fences behave as both
+an Acquire and a Release fence, and offer some additional complicated
+guarantees, see the C++11 standard for details.
  
  Frontends generating atomic instructions generally need to be aware of the
  target to some degree; atomic instructions are guaranteed to be lock-free, and
  
  Frontends generating atomic instructions generally need to be aware of the
  target to some degree; atomic instructions are guaranteed to be lock-free, and
@@ -170,7 +173,7 @@ Notes for code generation
    also expected to generate an i8 store as an i8 store, and not an instruction
    which writes to surrounding bytes.  (If you are writing a backend for an
    architecture which cannot satisfy these restrictions and cares about
    also expected to generate an i8 store as an i8 store, and not an instruction
    which writes to surrounding bytes.  (If you are writing a backend for an
    architecture which cannot satisfy these restrictions and cares about
-  concurrency, please send an email to llvmdev.)
+  concurrency, please send an email to llvm-dev.)
  
  Unordered
  ---------
  
  Unordered
  ---------
@@ -221,7 +224,7 @@ essentially guarantees that if you take all the operations affecting a specific
  address, a consistent ordering exists.
  
  Relevant standard
  address, a consistent ordering exists.
  
  Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those
+  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
    standards for the exact definition.
  
  Notes for frontends
    standards for the exact definition.
  
  Notes for frontends
@@ -251,8 +254,8 @@ Acquire provides a barrier of the sort necessary to acquire a lock to access
  other memory with normal loads and stores.
  
  Relevant standard
  other memory with normal loads and stores.
  
  Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be
-  used for C++0x/C1x ``memory_order_consume``.
+  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
+  used for C++11/C11 ``memory_order_consume``.
  
  Notes for frontends
    If you are writing a frontend which uses this directly, use with caution.
  
  Notes for frontends
    If you are writing a frontend which uses this directly, use with caution.
@@ -281,7 +284,7 @@ Release is similar to Acquire, but with a barrier of the sort necessary to
  release a lock.
  
  Relevant standard
  release a lock.
  
  Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_release``.
+  This corresponds to the C++11/C11 ``memory_order_release``.
  
  Notes for frontends
    If you are writing a frontend which uses this directly, use with caution.
  
  Notes for frontends
    If you are writing a frontend which uses this directly, use with caution.
@@ -307,7 +310,7 @@ AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
  barrier (for fences and operations which both read and write memory).
  
  Relevant standard
  barrier (for fences and operations which both read and write memory).
  
  Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_acq_rel``.
+  This corresponds to the C++11/C11 ``memory_order_acq_rel``.
  
  Notes for frontends
    If you are writing a frontend which uses this directly, use with caution.
  
  Notes for frontends
    If you are writing a frontend which uses this directly, use with caution.
@@ -330,7 +333,7 @@ and Release semantics for stores. Additionally, it guarantees that a total
  ordering exists between all SequentiallyConsistent operations.
  
  Relevant standard
  ordering exists between all SequentiallyConsistent operations.
  
  Relevant standard
-  This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and
+  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
    the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
  
  Notes for frontends
    the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
  
  Notes for frontends
@@ -368,6 +371,11 @@ Predicates for optimizer writers to query:
    that they return true for any operation which is volatile or at least
    Monotonic.
  
    that they return true for any operation which is volatile or at least
    Monotonic.
  
+* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on
+  orderings. They can be useful for passes that are aware of atomics, for
+  example to do DSE across a single atomic access, but not across a
+  release-acquire pair (see MemoryDependencyAnalysis for an example of this)
+
  * Alias analysis: Note that AA will return ModRef for anything Acquire or
    Release, and for the address accessed by any Monotonic operation.
  
  * Alias analysis: Note that AA will return ModRef for anything Acquire or
    Release, and for the address accessed by any Monotonic operation.
  
@@ -389,7 +397,9 @@ operations:
  
  * DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
    be DSE'ed in some cases, but it's tricky to reason about, and not especially
  
  * DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
    be DSE'ed in some cases, but it's tricky to reason about, and not especially
-  important.
+  important. It is possible in some case for DSE to operate across a stronger
+  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
+  MemoryDependencyAnalysis (which is also used by other passes like GVN).
  
  * Folding a load: Any atomic load from a constant global can be constant-folded,
    because it cannot be observed.  Similar reasoning allows scalarrepl with
  
  * Folding a load: Any atomic load from a constant global can be constant-folded,
    because it cannot be observed.  Similar reasoning allows scalarrepl with
@@ -400,7 +410,8 @@ Atomics and Codegen
  
  Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
  On architectures which use barrier instructions for all atomic ordering (like
  
  Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
  On architectures which use barrier instructions for all atomic ordering (like
-ARM), appropriate fences are split out as the DAG is built.
+ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
+``setInsertFencesForAtomic()`` was used.
  
  The MachineMemOperand for all atomic operations is currently marked as volatile;
  this is not correct in the IR sense of volatile, but CodeGen handles anything
  
  The MachineMemOperand for all atomic operations is currently marked as volatile;
  this is not correct in the IR sense of volatile, but CodeGen handles anything
@@ -415,11 +426,6 @@ error when given an operation which cannot be implemented.  (The LLVM code
  generator is not very helpful here at the moment, but hopefully that will
  change.)
  
  generator is not very helpful here at the moment, but hopefully that will
  change.)
  
-The implementation of atomics on LL/SC architectures (like ARM) is currently a
-bit of a mess; there is a lot of copy-pasted code across targets, and the
-representation is relatively unsuited to optimization (it would be nice to be
-able to optimize loops involving cmpxchg etc.).
-
  On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
  generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
  fences generate an ``MFENCE``, other fences do not cause any code to be
  On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
  generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
  fences generate an ``MFENCE``, other fences do not cause any code to be
@@ -435,3 +441,19 @@ operation. Loads and stores generate normal instructions.  ``cmpxchg`` and
  ``atomicrmw`` can be represented using a loop with LL/SC-style instructions
  which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
  on ARM, etc.).
  ``atomicrmw`` can be represented using a loop with LL/SC-style instructions
  which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
  on ARM, etc.).
+
+It is often easiest for backends to use AtomicExpandPass to lower some of the
+atomic constructs. Here are some lowerings it can do:
+
+* cmpxchg -> loop with load-linked/store-conditional
+  by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``,
+  ``emitStoreConditional()``
+* large loads/stores -> ll-sc/cmpxchg
+  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
+* strong atomic accesses -> monotonic accesses + fences
+  by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()``
+  and ``emitTrailingFence()``
+* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
+  by overriding ``expandAtomicRMWInIR()``
+
+For an example of all of these, look at the ARM backend.