From 91a44dd9ccd8ec3a10fa35315c381cffade91d5b Mon Sep 17 00:00:00 2001
From: Eli Friedman
The basic 'load' and 'store' allow a variety of
- optimizations, but can have unintuitive results in a concurrent environment.
- For a frontend writer, the rule is essentially that all memory accessed
- with basic loads and stores by multiple threads should be protected by a
- lock or other synchronization; otherwise, you are likely to run into
- undefined behavior. (Do not use volatile as a substitute for atomics; it
- might work on some platforms, but does not provide the necessary guarantees
- in general.)
From the optimizer's point of view, the rule is that if there are not any
instructions with atomic ordering involved, concurrency does not matter, with
one exception: if a variable might be visible to another thread or signal
handler, a store cannot be inserted along a path where it
-   might not execute otherwise. For example, suppose LICM wants to take all the
-   loads and stores in a loop to and from a particular address and promote them
-   to registers. LICM is not allowed to insert an unconditional store after
-   the loop with the computed value unless a store unconditionally executes
-   within the loop. Note that speculative loads are allowed; a load which
+   might not execute otherwise. Take the following example:
+/* C code, for readability; run through clang -O2 -S -emit-llvm to get
+   equivalent IR */
+int x;
+void f(int* a) {
+  for (int i = 0; i < 100; i++) {
+    if (a[i])
+      x += 1;
+  }
+}
+The following is equivalent in non-concurrent situations:
+int x;
+void f(int* a) {
+  int xtemp = x;
+  for (int i = 0; i < 100; i++) {
+    if (a[i])
+      xtemp += 1;
+  }
+  x = xtemp;
+}
+However, LLVM is not allowed to transform the former to the latter: the
+   latter stores to x even when every a[i] is zero, a store the original
+   never performs, so it could introduce undefined behavior if another thread
+   can access x at the same time. (This example is particularly of interest
+   because before the concurrency model was implemented, LLVM would perform
+   this transformation.)
+Note that speculative loads are allowed; a load which
is part of a race returns undef, but does not have undefined
behavior.
-For cases where simple loads and stores are not sufficient, LLVM provides
-   atomic loads and stores with varying levels of guarantees.
+For cases where simple loads and stores are not sufficient, LLVM provides
+   various atomic instructions. The exact guarantees provided depend on the
+   ordering; see Atomic orderings.
+load atomic and store atomic provide the same
+   basic functionality as non-atomic loads and stores, but provide additional
+   guarantees in situations where threads and signals are involved.
cmpxchg and atomicrmw are essentially like an
atomic load followed by an atomic store (where the store is conditional for
-   cmpxchg), but no other memory operation can happen between
-   the load and store. Note that our cmpxchg does not have quite as many
-   options for making cmpxchg weaker as the C++0x version.
+   cmpxchg), but no other memory operation can happen on any thread
+   between the load and store. Note that LLVM's cmpxchg does not provide quite
+   as many options as the C++0x version.
A fence provides Acquire and/or Release ordering which is not
part of another operation; it is normally used along with Monotonic memory
@@ -146,6 +179,54 @@ instructions has been clarified in the IR.
+NotAtomic is the obvious one: a load or store which is not atomic. (This isn't
+   really a level of atomicity, but is listed here for comparison.) This is
+   essentially a regular load or store. If code accesses a memory location
+   from multiple threads at the same time, and at least one of those accesses
+   is a store, the resulting loads return 'undef'.
-There are essentially two components to supporting atomic operations. The
-   first is making sure to query isSimple() or isUnordered() instead
-   of isVolatile() before transforming an operation. The other piece is
-   making sure that a transform does not end up replacing, for example, an
-   Unordered operation with a non-atomic operation. Most of the other
-   necessary checks automatically fall out from existing predicates and
-   alias analysis queries.
+To support optimizing around atomic operations, make sure you are using
+   the right predicates; everything should work if that is done. If your
+   pass should optimize some atomic operations (Unordered operations in
+   particular), make sure it doesn't replace an atomic load or store with
+   a non-atomic operation.
Some examples of how optimizations interact with various kinds of atomic operations: