+<!-- _______________________________________________________________________ -->
+<h4>
+<a name="i_fence">'<tt>fence</tt>' Instruction</a>
+</h4>
+
+<div>
+
+<h5>Syntax:</h5>
+<pre>
+ fence [singlethread] <ordering> <i>; yields {void}</i>
+</pre>
+
+<h5>Overview:</h5>
+<p>The '<tt>fence</tt>' instruction is used to introduce happens-before edges
+between operations.</p>
+
+<h5>Arguments:</h5> <p>'<code>fence</code>' instructions take an <a
+href="#ordering">ordering</a> argument which defines what
+<i>synchronizes-with</i> edges they add. They can only be given
+<code>acquire</code>, <code>release</code>, <code>acq_rel</code>, and
+<code>seq_cst</code> orderings.</p>
+
+<h5>Semantics:</h5>
+<p>A fence <var>A</var> which has (at least) <code>release</code> ordering
+semantics <i>synchronizes with</i> a fence <var>B</var> with (at least)
+<code>acquire</code> ordering semantics if and only if there exist atomic
+operations <var>X</var> and <var>Y</var>, both operating on some atomic object
+<var>M</var>, such that <var>A</var> is sequenced before <var>X</var>,
+<var>X</var> modifies <var>M</var> (either directly or through some side effect
+of a sequence headed by <var>X</var>), <var>Y</var> is sequenced before
+<var>B</var>, and <var>Y</var> observes <var>M</var>. This provides a
+<i>happens-before</i> dependency between <var>A</var> and <var>B</var>. Rather
+than an explicit <code>fence</code>, one (but not both) of the atomic operations
+<var>X</var> or <var>Y</var> might provide a <code>release</code> or
+<code>acquire</code> (resp.) ordering constraint and still
+<i>synchronize-with</i> the explicit <code>fence</code> and establish the
+<i>happens-before</i> edge.</p>
+
+<p>A <code>fence</code> which has <code>seq_cst</code> ordering, in addition to
+having both <code>acquire</code> and <code>release</code> semantics specified
+above, participates in the global program order of other <code>seq_cst</code>
+operations and/or fences.</p>
+
+<p>The optional "<a href="#singlethread"><code>singlethread</code></a>" argument
+specifies that the fence only synchronizes with other fences in the same
+thread. (This is useful for interacting with signal handlers.)</p>
+
+<h5>Example:</h5>
+<pre>
+ fence acquire <i>; yields {void}</i>
+ fence singlethread seq_cst <i>; yields {void}</i>
+</pre>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+<a name="i_cmpxchg">'<tt>cmpxchg</tt>' Instruction</a>
+</h4>
+
+<div>
+
+<h5>Syntax:</h5>
+<pre>
+ cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <ordering> <i>; yields {ty}</i>
+</pre>
+
+<h5>Overview:</h5>
+<p>The '<tt>cmpxchg</tt>' instruction is used to atomically modify memory.
+It loads a value in memory and compares it to a given value. If they are
+equal, it stores a new value into the memory.</p>
+
+<h5>Arguments:</h5>
+<p>There are three arguments to the '<code>cmpxchg</code>' instruction: an
+address to operate on, a value to compare to the value currently be at that
+address, and a new value to place at that address if the compared values are
+equal. The type of '<var><cmp></var>' must be an integer type whose
+bit width is a power of two greater than or equal to eight and less than
+or equal to a target-specific size limit. '<var><cmp></var>' and
+'<var><new></var>' must have the same type, and the type of
+'<var><pointer></var>' must be a pointer to that type. If the
+<code>cmpxchg</code> is marked as <code>volatile</code>, then the
+optimizer is not allowed to modify the number or order of execution
+of this <code>cmpxchg</code> with other <a href="#volatile">volatile
+operations</a>.</p>
+
+<!-- FIXME: Extend allowed types. -->
+
+<p>The <a href="#ordering"><var>ordering</var></a> argument specifies how this
+<code>cmpxchg</code> synchronizes with other atomic operations.</p>
+
+<p>The optional "<code>singlethread</code>" argument declares that the
+<code>cmpxchg</code> is only atomic with respect to code (usually signal
+handlers) running in the same thread as the <code>cmpxchg</code>. Otherwise the
+cmpxchg is atomic with respect to all other code in the system.</p>
+
+<p>The pointer passed into cmpxchg must have alignment greater than or equal to
+the size in memory of the operand.
+
+<h5>Semantics:</h5>
+<p>The contents of memory at the location specified by the
+'<tt><pointer></tt>' operand is read and compared to
+'<tt><cmp></tt>'; if the read value is the equal,
+'<tt><new></tt>' is written. The original value at the location
+is returned.
+
+<p>A successful <code>cmpxchg</code> is a read-modify-write instruction for the
+purpose of identifying <a href="#release_sequence">release sequences</a>. A
+failed <code>cmpxchg</code> is equivalent to an atomic load with an ordering
+parameter determined by dropping any <code>release</code> part of the
+<code>cmpxchg</code>'s ordering.</p>
+
+<!--
+FIXME: Is compare_exchange_weak() necessary? (Consider after we've done
+optimization work on ARM.)
+
+FIXME: Is a weaker ordering constraint on failure helpful in practice?
+-->
+
+<h5>Example:</h5>
+<pre>
+entry:
+ %orig = atomic <a href="#i_load">load</a> i32* %ptr unordered <i>; yields {i32}</i>
+ <a href="#i_br">br</a> label %loop
+
+loop:
+ %cmp = <a href="#i_phi">phi</a> i32 [ %orig, %entry ], [%old, %loop]
+ %squared = <a href="#i_mul">mul</a> i32 %cmp, %cmp
+ %old = cmpxchg i32* %ptr, i32 %cmp, i32 %squared <i>; yields {i32}</i>
+ %success = <a href="#i_icmp">icmp</a> eq i32 %cmp, %old
+ <a href="#i_br">br</a> i1 %success, label %done, label %loop
+
+done:
+ ...
+</pre>
+
+</div>
+
+<!-- _______________________________________________________________________ -->
+<h4>
+<a name="i_atomicrmw">'<tt>atomicrmw</tt>' Instruction</a>
+</h4>
+
+<div>
+
+<h5>Syntax:</h5>
+<pre>
+ atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering> <i>; yields {ty}</i>
+</pre>
+
+<h5>Overview:</h5>
+<p>The '<tt>atomicrmw</tt>' instruction is used to atomically modify memory.</p>
+
+<h5>Arguments:</h5>
+<p>There are three arguments to the '<code>atomicrmw</code>' instruction: an
+operation to apply, an address whose value to modify, an argument to the
+operation. The operation must be one of the following keywords:</p>
+<ul>
+ <li>xchg</li>
+ <li>add</li>
+ <li>sub</li>
+ <li>and</li>
+ <li>nand</li>
+ <li>or</li>
+ <li>xor</li>
+ <li>max</li>
+ <li>min</li>
+ <li>umax</li>
+ <li>umin</li>
+</ul>
+
+<p>The type of '<var><value></var>' must be an integer type whose
+bit width is a power of two greater than or equal to eight and less than
+or equal to a target-specific size limit. The type of the
+'<code><pointer></code>' operand must be a pointer to that type.
+If the <code>atomicrmw</code> is marked as <code>volatile</code>, then the
+optimizer is not allowed to modify the number or order of execution of this
+<code>atomicrmw</code> with other <a href="#volatile">volatile
+ operations</a>.</p>
+
+<!-- FIXME: Extend allowed types. -->
+
+<h5>Semantics:</h5>
+<p>The contents of memory at the location specified by the
+'<tt><pointer></tt>' operand are atomically read, modified, and written
+back. The original value at the location is returned. The modification is
+specified by the <var>operation</var> argument:</p>
+
+<ul>
+ <li>xchg: <code>*ptr = val</code></li>
+ <li>add: <code>*ptr = *ptr + val</code></li>
+ <li>sub: <code>*ptr = *ptr - val</code></li>
+ <li>and: <code>*ptr = *ptr & val</code></li>
+ <li>nand: <code>*ptr = ~(*ptr & val)</code></li>
+ <li>or: <code>*ptr = *ptr | val</code></li>
+ <li>xor: <code>*ptr = *ptr ^ val</code></li>
+ <li>max: <code>*ptr = *ptr > val ? *ptr : val</code> (using a signed comparison)</li>
+ <li>min: <code>*ptr = *ptr < val ? *ptr : val</code> (using a signed comparison)</li>
+ <li>umax: <code>*ptr = *ptr > val ? *ptr : val</code> (using an unsigned comparison)</li>
+ <li>umin: <code>*ptr = *ptr < val ? *ptr : val</code> (using an unsigned comparison)</li>
+</ul>
+
+<h5>Example:</h5>
+<pre>
+ %old = atomicrmw add i32* %ptr, i32 1 acquire <i>; yields {i32}</i>
+</pre>
+
+</div>
+