[x86] Fix yet another bug in the new vector shuffle lowering's handling

diff --git a/docs/Atomics.rst b/docs/Atomics.rst

index 705d73fbaba43c62b9dbd80bedd20e3cd77bea30..58d1a26d5441530970f999cc8a42e1f8d7049635 100644
--- a/docs/Atomics.rst
+++ b/docs/Atomics.rst
@@ -24,10 +24,10 @@ optimized code generation for the following:
  
  * Proper semantics for Java-style memory, for both ``volatile`` and regular
    shared variables. (`Java Specification
-  <http://java.sun.com/docs/books/jls/third_edition/html/memory.html>`_)
+  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)
  
  * gcc-compatible ``__sync_*`` builtins. (`Description
-  <http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html>`_)
+  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)
  
  * Other scenarios with atomic semantics, including ``static`` variables with
    non-trivial constructors in C++.
@@ -110,8 +110,7 @@ where threads and signals are involved.
  
  ``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
  atomic store (where the store is conditional for ``cmpxchg``), but no other
-memory operation can happen on any thread between the load and store.  Note that
-LLVM's cmpxchg does not provide quite as many options as the C++0x version.
+memory operation can happen on any thread between the load and store.
  
  A ``fence`` provides Acquire and/or Release ordering which is not part of
  another operation; it is normally used along with Monotonic memory operations.
@@ -178,10 +177,10 @@ Unordered
  
  Unordered is the lowest level of atomicity. It essentially guarantees that races
  produce somewhat sane results instead of having undefined behavior.  It also
-guarantees the operation to be lock-free, so it do not depend on the data being
-part of a special atomic structure or depend on a separate per-process global
-lock.  Note that code generation will fail for unsupported atomic operations; if
-you need such an operation, use explicit locking.
+guarantees the operation to be lock-free, so it does not depend on the data
+being part of a special atomic structure or depend on a separate per-process
+global lock.  Note that code generation will fail for unsupported atomic
+operations; if you need such an operation, use explicit locking.
  
  Relevant standard
    This is intended to match the Java memory model for shared variables.
@@ -211,7 +210,7 @@ Notes for code generation
    never stored.  A normal load or store instruction is usually sufficient, but
    note that an unordered load or store cannot be split into multiple
    instructions (or an instruction which does multiple memory operations, like
-  ``LDRD`` on ARM).
+  ``LDRD`` on ARM without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).
  
  Monotonic
  ---------
@@ -430,10 +429,9 @@ other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``.  Depending
  on the users of the result, some ``atomicrmw`` operations can be translated into
  operations like ``LOCK AND``, but that does not work in general.
  
-On ARM, MIPS, and many other RISC architectures, Acquire, Release, and
-SequentiallyConsistent semantics require barrier instructions for every such
+On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
+and SequentiallyConsistent semantics require barrier instructions for every such
  operation. Loads and stores generate normal instructions.  ``cmpxchg`` and
  ``atomicrmw`` can be represented using a loop with LL/SC-style instructions
  which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
-on ARM, etc.). At the moment, the IR does not provide any way to represent a
-weak ``cmpxchg`` which would not require a loop.
+on ARM, etc.).
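
Note on the last hunk: the LL/SC loop it describes for ``cmpxchg`` and
``atomicrmw`` has a rough source-level analogue in a retry loop around a weak
compare-exchange. The sketch below is only an illustration and is not part of
this patch; ``fetch_max`` is a hypothetical helper name, and the memory
orderings shown are one reasonable choice, not a requirement.

```cpp
#include <atomic>

// Hypothetical helper (not from the patch): atomically raise `target` to at
// least `candidate` and return the value observed before any update.
int fetch_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // compare_exchange_weak is allowed to fail spuriously (for example when a
    // store-conditional loses its reservation), so it is retried in a loop.
    // On failure it writes the current value of `target` into `observed`.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
        // Empty body: the loop condition re-checks with the refreshed value.
    }
    return observed;
}
```

Calling ``fetch_max(v, 7)`` on an atomic holding 3 returns 3 and leaves 7 in
``v``; a later ``fetch_max(v, 5)`` returns 7 and stores nothing.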