X-Git-Url: http://plrg.eecs.uci.edu/git/?a=blobdiff_plain;f=docs%2FAtomics.html;h=2358f4d2ef2268ccfa1ffb456d7b6be699c68cdf;hb=16eeb6f5ebc978b03745177b9ac82684ab1c6932;hp=967ebdddb1a3e7e37d540d1c5880495103c04cb6;hpb=9a5ffbfa6eb2234b9d7bdde0360200823a74130e;p=oota-llvm.git
diff --git a/docs/Atomics.html b/docs/Atomics.html
index 967ebdddb1a..2358f4d2ef2 100644
--- a/docs/Atomics.html
+++ b/docs/Atomics.html
@@ -4,7 +4,7 @@
The basic 'load' and 'store' allow a variety of
- optimizations, but can have unintuitive results in a concurrent environment.
- For a frontend writer, the rule is essentially that all memory accessed
- with basic loads and stores by multiple threads should be protected by a
- lock or other synchronization; otherwise, you are likely to run into
- undefined behavior. (Do not use volatile as a substitute for atomics; it
- might work on some platforms, but does not provide the necessary guarantees
- in general.)
From the optimizer's point of view, the rule is that if there
are not any instructions with atomic ordering involved, concurrency does
not matter, with one exception: if a variable might be visible to another
thread or signal handler, a store cannot be inserted along a path where it
- might not execute otherwise. For example, suppose LICM wants to take all the
- loads and stores in a loop to and from a particular address and promote them
- to registers. LICM is not allowed to insert an unconditional store after
- the loop with the computed value unless a store unconditionally executes
- within the loop. Note that speculative loads are allowed; a load which
+ might not execute otherwise. Take the following example:
+
+/* C code, for readability; run through clang -O2 -S -emit-llvm to get
+   equivalent IR */
+int x;
+void f(int* a) {
+  for (int i = 0; i < 100; i++) {
+    if (a[i])
+      x += 1;
+  }
+}
+
+The following is equivalent in non-concurrent situations:
+
+int x;
+void f(int* a) {
+  int xtemp = x;
+  for (int i = 0; i < 100; i++) {
+    if (a[i])
+      xtemp += 1;
+  }
+  x = xtemp;
+}
+
+However, LLVM is not allowed to transform the former to the latter: it could
+ indirectly introduce undefined behavior if another thread can access x at
+ the same time. (This example is particularly of interest because before the
+ concurrency model was implemented, LLVM would perform this
+ transformation.)
+
+Note that speculative loads are allowed; a load which
is part of a race returns undef, but does not have undefined
behavior.
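
For comparison, here is a minimal sketch (not part of this patch) of how the counter from the example could be written with C11 atomics, so that concurrent access is well defined. The names `x_atomic`, `f_atomic`, and `read_counter` are hypothetical, and the sketch assumes relaxed ordering is enough for a plain counter; `memory_order_relaxed` corresponds to LLVM's Monotonic ordering.

```c
#include <stdatomic.h>

/* Hypothetical atomic variant of the counter `x` from the example above. */
static atomic_int x_atomic;

/* Each increment is an explicit atomic read-modify-write, so a store to the
   counter only happens on iterations where a[i] is nonzero. */
void f_atomic(const int *a, int n) {
  for (int i = 0; i < n; i++) {
    if (a[i])
      atomic_fetch_add_explicit(&x_atomic, 1, memory_order_relaxed);
  }
}

/* Hypothetical helper: read the current counter value. */
int read_counter(void) {
  return atomic_load_explicit(&x_atomic, memory_order_relaxed);
}
```

Because every access to the counter is atomic, hoisting the update out of the loop into an unconditional plain store, as in the second listing, is not an option for the optimizer here.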
-For cases where simple loads and stores are not sufficient, LLVM provides
- atomic loads and stores with varying levels of guarantees.
+For cases where simple loads and stores are not sufficient, LLVM provides
+ various atomic instructions. The exact guarantees provided depend on the
+ ordering; see Atomic orderings
+
+load atomic and store atomic provide the same
+ basic functionality as non-atomic loads and stores, but provide additional
+ guarantees in situations where threads and signals are involved.
cmpxchg and atomicrmw are essentially like an
atomic load followed by an atomic store (where the store is conditional for
- cmpxchg), but no other memory operation can happen between
- the load and store. Note that our cmpxchg does not have quite as many
- options for making cmpxchg weaker as the C++0x version.
+ cmpxchg), but no other memory operation can happen on any thread
+ between the load and store. Note that LLVM's cmpxchg does not provide quite
+ as many options as the C++0x version.
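
The "atomic load followed by a conditional atomic store" reading of cmpxchg, and the unconditional read-modify-write of atomicrmw, can be sketched at the C level as follows. This is an illustrative sketch rather than anything taken from the patch: `atomic_store_max` and `atomic_increment` are hypothetical helpers, and the orderings chosen are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically raise *target to at least `value`.
   Returns true if this call performed the store. */
bool atomic_store_max(atomic_int *target, int value) {
  int observed = atomic_load_explicit(target, memory_order_relaxed);
  while (observed < value) {
    /* Compare-and-exchange: the store happens only if *target still holds
       `observed`. On failure, `observed` is refreshed with the current
       contents and the loop re-checks the condition. */
    if (atomic_compare_exchange_strong_explicit(target, &observed, value,
                                                memory_order_seq_cst,
                                                memory_order_relaxed))
      return true;
  }
  return false;
}

/* Unconditional read-modify-write; returns the value held before the add. */
int atomic_increment(atomic_int *target) {
  return atomic_fetch_add_explicit(target, 1, memory_order_seq_cst);
}
```

The retry loop exists only because the value may change between the initial load and the compare-and-exchange; the compare-and-exchange itself cannot be interleaved with another thread's access to the same location.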
A fence provides Acquire and/or Release ordering which is not
part of another operation; it is normally used along with Monotonic memory
@@ -144,7 +178,55 @@ instructions has been clarified in the IR.
In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength;
each level includes all the guarantees of the previous level except for
- Acquire/Release.
+ Acquire/Release. (See also LangRef.)
+
+
+NotAtomic is the obvious, a load or store which is not atomic. (This isn't
+ really a level of atomicity, but is listed here for comparison.) This is
+ essentially a regular load or store. If there is a race on a given memory
+ location, loads from that location return undef.
+ +cmpxchg
and
+ and stores. No fences are required. cmpxchg
and
atomicrmw
are required to appear as a single operation.
-There are essentially two components to supporting atomic operations. The
- first is making sure to query isSimple() or isUnordered() instead
- of isVolatile() before transforming an operation. The other piece is
- making sure that a transform does not end up replacing, for example, an
- Unordered operation with a non-atomic operation. Most of the other
- necessary checks automatically fall out from existing predicates and
- alias analysis queries.
+To support optimizing around atomic operations, make sure you are using
+ the right predicates; everything should work if that is done. If your
+ pass should optimize some atomic operations (Unordered operations in
+ particular), make sure it doesn't replace an atomic load or store with
+ a non-atomic operation.
Some examples of how optimizations interact with various kinds of atomic operations: