The function of the data layout string may not be what you expect. Notably, + this is not a specification from the frontend of what alignment the code + generator should use.

Instead, if specified, the target data layout is required to match what the + ultimate code generator expects. This string is used by the + mid-level optimizers to + improve code, and this only works if it matches what the ultimate code + generator uses. If you would like to generate IR that does not embed this + target-specific detail into the IR, then you don't have to specify the + string. This will disable some optimizations that require precise layout + information, but this also prevents those optimizations from introducing + target specificity into the IR.

Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior @@ -1406,11 +1477,11 @@ to implement type-based alias analysis.

Volatile Memory Accesses -

+ -

Certain memory accesses, such as loads, stores, and

+ +

+ Memory Model for Concurrent Operations +

+ +

The LLVM IR does not define any way to start parallel threads of execution +or to register signal handlers. Nonetheless, there are platform-specific +ways to create them, and we define LLVM IR's behavior in their presence. This +model is inspired by the C++0x memory model.

+ +

For a more informal introduction to this model, see the +LLVM Atomic Instructions and Concurrency Guide. + +

We define a happens-before partial order as the least partial order +that

Is a superset of single-thread program order, and
When a synchronizes-with b, includes an edge from + a to b. Synchronizes-with pairs are introduced + by platform-specific techniques, like pthread locks, thread + creation, thread joining, etc., and by atomic instructions. + (See also Atomic Memory Ordering Constraints). +

+ +

Note that program order does not introduce happens-before edges +between a thread and signals executing inside that thread.
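As a rough illustration of the definition above, happens-before can be modeled as the transitive closure of the program-order and synchronizes-with edges. This is only a sketch in Python, not part of the IR definition: events are plain strings, the relation is kept strict (no reflexive pairs), and the function and event names are hypothetical.

```python
def happens_before(program_order, synchronizes_with):
    """Return the least transitive relation containing both edge sets.

    Each argument is a set of (earlier, later) pairs of event names.
    """
    hb = set(program_order) | set(synchronizes_with)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so that hb can grow inside the loops.
        for (a, b) in list(hb):
            for (c, d) in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb


# Hypothetical two-thread example: thread 1 stores and unlocks,
# thread 2 locks and loads; the unlock synchronizes-with the lock.
po = {("t1.store", "t1.unlock"), ("t2.lock", "t2.load")}
sw = {("t1.unlock", "t2.lock")}
hb = happens_before(po, sw)
```

In this example the store in thread 1 happens before the load in thread 2, but not the other way around.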

+ +

Every (defined) read operation (load instructions, memcpy, atomic +loads/read-modify-writes, etc.) R reads a series of bytes written by +(defined) write operations (store instructions, atomic +stores/read-modify-writes, memcpy, etc.). For the purposes of this section, +initialized globals are considered to have a write of the initializer which is +atomic and happens before any other read or write of the memory in question. +For each byte of a read R, R_byte may see +any write to the same byte, except:

+ +

If write₁ happens before + write₂, and write₂ happens + before R_byte, then R_byte + does not see write₁. +
If R_byte happens before + write₃, then R_byte does not + see write₃. +

+ +

Given that definition, R_byte is defined as follows: +

If R is volatile, the result is target-dependent. (Volatile + is supposed to give guarantees which can support + sig_atomic_t in C/C++, and may be used for accesses to + addresses which do not behave like normal memory. It does not generally + provide cross-thread synchronization.) +
Otherwise, if there is no write to the same byte that happens before + R_byte, R_byte returns + undef for that byte. +
Otherwise, if R_byte may see exactly one write, + R_byte returns the value written by that + write.
Otherwise, if R is atomic, and all the writes + R_byte may see are atomic, it chooses one of the + values written. See the Atomic Memory Ordering + Constraints section for additional constraints on how the choice + is made. +
Otherwise R_byte returns undef.

+ +

R returns the value composed of the series of bytes it read. +This implies that some bytes within the value may be undef +without the entire value being undef. Note that this only +defines the semantics of the operation; it doesn't mean that targets will +emit more than one instruction to read the series of bytes.

+ +

Note that in cases where none of the atomic intrinsics are used, this model +places only one restriction on IR transformations on top of what is required +for single-threaded execution: introducing a store to a byte which might not +otherwise be stored is not allowed in general. (Specifically, in the case +where another thread might write to and read from an address, introducing a +store can change a load that may see exactly one write into a load that may +see multiple writes.)
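The per-byte rules above can be sketched as a small executable model. The Python below is a simplification under stated assumptions: events are identified by name, the happens-before relation is passed in as an already transitively closed set of pairs, and an atomic read with several visible atomic writes returns the set of candidate values instead of choosing one. All names (`Write`, `may_see`, `read_byte`) are hypothetical.

```python
from collections import namedtuple

Write = namedtuple("Write", ["name", "value", "atomic"])
UNDEF = "undef"
TARGET_DEPENDENT = "target-dependent"


def may_see(read, writes, hb):
    """Return the writes that one byte of the read may see."""
    visible = []
    for w in writes:
        # Exception: the read happens before the write.
        if (read, w.name) in hb:
            continue
        # Exception: another write lies between w and the read.
        hidden = any((w.name, w2.name) in hb and (w2.name, read) in hb
                     for w2 in writes if w2 is not w)
        if not hidden:
            visible.append(w)
    return visible


def read_byte(read, writes, hb, read_atomic=False, read_volatile=False):
    """Apply the rules for R_byte in the order they are listed."""
    if read_volatile:
        return TARGET_DEPENDENT
    if not any((w.name, read) in hb for w in writes):
        return UNDEF
    visible = may_see(read, writes, hb)
    if len(visible) == 1:
        return visible[0].value
    if read_atomic and all(w.atomic for w in visible):
        return {w.value for w in visible}  # any one of these may be chosen
    return UNDEF


init = Write("init", 0, True)
w1 = Write("w1", 1, True)
# w1 is ordered between the initializer and the read "r".
hb_ordered = {("init", "w1"), ("w1", "r"), ("init", "r")}
# w1 is not ordered with respect to the read "r".
hb_racy = {("init", "w1"), ("init", "r")}
```

With `hb_racy`, a non-atomic read sees two writes and yields `undef`. This is the situation the note on introduced stores describes: adding a store can turn a load that may see exactly one write into a load that may see several.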

+ + + +

+ + +

+ Atomic Memory Ordering Constraints +

+ +

Atomic instructions (cmpxchg, +atomicrmw, +fence, +atomic load, and +atomic store) take an ordering parameter +that determines which other atomic instructions on the same address they +synchronize with. These semantics are borrowed from Java and C++0x, +but are somewhat more colloquial. If these descriptions aren't precise enough, +check those specs (see spec references in the +atomics guide). +fence instructions +treat these orderings somewhat differently since they don't take an address. +See that instruction's documentation for details.

+ +

For a simpler introduction to the ordering constraints, see the +LLVM Atomic Instructions and Concurrency Guide.

+ +

unordered: The set of values that can be read is governed by the happens-before +partial order. A value cannot be read unless some operation wrote it. +This is intended to provide a guarantee strong enough to model Java's +non-volatile shared variables. This ordering cannot be specified for +read-modify-write operations; it is not strong enough to make them atomic +in any interesting way.
monotonic: In addition to the guarantees of unordered, there is a single +total order for modifications by monotonic operations on each +address. All modification orders must be compatible with the happens-before +order. There is no guarantee that the modification orders can be combined to +a global total order for the whole program (and this often will not be +possible). The read in an atomic read-modify-write operation +(cmpxchg and +atomicrmw) +reads the value in the modification order immediately before the value it +writes. If one atomic read happens before another atomic read of the same +address, the later read must see the same value or a later value in the +address's modification order. This disallows reordering of +monotonic (or stronger) operations on the same address. If an +address is written monotonically by one thread, and other threads +monotonically read that address repeatedly, the other threads must +eventually see the write. This corresponds to the C++0x/C1x +memory_order_relaxed.
acquire: In addition to the guarantees of monotonic, +a synchronizes-with edge may be formed with a release +operation. This is intended to model C++'s memory_order_acquire.
release: In addition to the guarantees of monotonic, if this operation +writes a value which is subsequently read by an acquire operation, +it synchronizes-with that operation. (This isn't a complete +description; see the C++0x definition of a release sequence.) This corresponds +to the C++0x/C1x memory_order_release.
acq_rel (acquire+release): Acts as both an +acquire and release operation on its address. +This corresponds to the C++0x/C1x memory_order_acq_rel.
seq_cst (sequentially consistent): In addition to the guarantees of acq_rel +(acquire for an operation which only reads, release +for an operation which only writes), there is a global total order on all +sequentially-consistent operations on all addresses, which is consistent with +the happens-before partial order and with the modification orders of +all the affected addresses. Each sequentially-consistent read sees the last +preceding write to the same address in this global order. This corresponds +to the C++0x/C1x memory_order_seq_cst and Java volatile.

+ +
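One of the monotonic guarantees, that a later atomic read of an address must see the same value or a later value in that address's modification order, can be checked mechanically. The Python sketch below is a toy checker, not an implementation of the memory model: it assumes the values in the modification order are distinct, and the function name is hypothetical.

```python
def reads_respect_modification_order(modification_order, reads_in_hb_order):
    """Check a sequence of atomic reads of one address.

    modification_order: values in the address's modification order
                        (assumed distinct for this sketch).
    reads_in_hb_order:  values observed by reads, each of which happens
                        before the next.
    """
    position = {value: i for i, value in enumerate(modification_order)}
    last = -1
    for value in reads_in_hb_order:
        # A value cannot be read unless some operation wrote it.
        if value not in position:
            return False
        # A later read may not observe an earlier modification.
        if position[value] < last:
            return False
        last = position[value]
    return True
```

For a modification order `[0, 1, 2]`, the read sequence `[0, 0, 2]` is allowed, while `[1, 0]` is not, because the second read would go backwards in the modification order.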

If an atomic operation is marked singlethread, +it only synchronizes with or participates in modification and seq_cst +total orderings with other operations running in the same thread (for example, +in signal handlers).

+ +

+ -

Type System

The LLVM type system is one of the most important features of the intermediate representation. Being typed enables a number of optimizations @@ -1437,13 +1689,12 @@ synchronization behavior.

and transformations that are not feasible to perform on normal three address code representations.

- -

Type -Classifications

+ Type Classifications +

The types fall into a few useful classifications:

@@ -1465,7 +1716,6 @@ Classifications

pointer, vector, structure, - union, array, label, metadata. @@ -1475,7 +1725,9 @@ Classifications

primitive label, void, + integer, floating point, + x86mmx, metadata. @@ -1484,8 +1736,6 @@ Classifications

function, pointer, structure, - packed structure, - union, vector, opaque. @@ -1500,19 +1750,21 @@ Classifications

+ Primitive Types +

The primitive types are the fundamental building blocks of the LLVM system.

- -

Integer Type

+ Integer Type +

Overview:

The integer type is a very simple type that simply specifies an arbitrary @@ -1546,9 +1798,11 @@ Classifications

Floating Point Types

+ Floating Point Types +

@@ -1564,9 +1818,28 @@ Classifications -

Void Type

+ X86mmx Type +

+ +

Overview:

The x86mmx type represents a value held in an MMX register on an x86 machine. The operations allowed on it are quite limited: parameters and return values, load and store, and bitcast. User-specified MMX instructions are represented as intrinsic or asm calls with arguments and/or results of this type. There are no arrays, vectors or constants of this type.

+ +

Syntax:

+  x86mmx
+

+ +

+ + +

+ Void Type +

Overview:

The void type does not represent any value and has no size.

@@ -1579,9 +1852,11 @@ Classifications

Label Type

+ Label Type +

Overview:

The label type represents code labels.

@@ -1594,9 +1869,11 @@ Classifications

Metadata Type

+ Metadata Type +

Overview:

The metadata type represents embedded metadata. No derived types may be @@ -1610,11 +1887,14 @@ Classifications

+ -

Derived Types

+ Derived Types +

The real power in LLVM comes from the derived types in the system. This is what allows a programmer to represent arrays, functions, pointers, and other @@ -1623,25 +1903,26 @@ Classifications

possible to have a two dimensional array, using an array as the element type of another array.

- -

Aggregate Types

+ Aggregate Types +

Aggregate Types are a subset of derived types that can contain multiple member types. Arrays, - structs, vectors and - unions are aggregate types.

+ structs, and vectors are + aggregate types.

Array Type

+ Array Type +

Overview:

The array type is a very simple derived type that arranges elements @@ -1697,16 +1978,16 @@ Classifications

Function Type

+ Function Type +

Overview:

The function type can be thought of as a function signature. It consists of a return type and a list of formal parameter types. The return type of a - function type is a scalar type, a void type, a struct type, or a union - type. If the return type is a struct type then all struct elements must be - of first class types, and the struct must have at least one element.

+ function type is a first class type or a void type.

Syntax:

@@ -1752,15 +2033,15 @@ Classifications

Structure Type

+ Structure Type +

Overview:

The structure type is used to represent a collection of data members together - in memory. The packing of the field types is defined to match the ABI of the - underlying processor. The elements of a structure may be any type that has a - size.

+ in memory. The elements of a structure may be any type that has a size.

Structures in memory are accessed using 'load' and 'store' by getting a pointer to a field @@ -1768,116 +2049,84 @@ Classifications

Structures in registers are accessed using the 'extractvalue' and 'insertvalue' instructions.

+ +

Structures may optionally be "packed" structures, which indicate that the + alignment of the struct is one byte, and that there is no padding between + the elements. In non-packed structs, padding between field types is inserted + as defined by the TargetData string in the module, which is required to match + what the underlying code generator expects.
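The difference between packed and non-packed layout can be sketched with a small offset calculation. The Python below is illustrative only: the field sizes and ABI alignments are passed in by hand (in practice they come from the target data layout), and `struct_layout` is a hypothetical helper, not an LLVM API.

```python
def struct_layout(fields, packed=False):
    """Compute (field offsets, total size, alignment) in bytes.

    fields: list of (size, abi_alignment) pairs, one per element.
    """
    offset = 0
    struct_align = 1
    offsets = []
    for size, align in fields:
        if packed:
            align = 1  # packed structs have no padding between elements
        # Round the running offset up to the element's alignment.
        offset = (offset + align - 1) // align * align
        offsets.append(offset)
        offset += size
        struct_align = max(struct_align, align)
    # Tail padding so that the size is a multiple of the alignment.
    total = (offset + struct_align - 1) // struct_align * struct_align
    return offsets, total, struct_align


# Assumption for this example: i8 is 1 byte with 1-byte alignment and
# i32 is 4 bytes with 4-byte alignment.
I8, I32 = (1, 1), (4, 4)
packed_layout = struct_layout([I8, I32], packed=True)
normal_layout = struct_layout([I8, I32])
```

Under these assumed alignments, the packed struct `<{ i8, i32 }>` occupies 5 bytes with the i32 at offset 1, while the non-packed `{ i8, i32 }` occupies 8 bytes with the i32 at offset 4.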

+ +

Structures can either be "literal" or "identified". A literal structure is + defined inline with other types (e.g. {i32, i32}*) whereas identified + types are always defined at the top level with a name. Literal types are + uniqued by their contents and can never be recursive or opaque since there is + no way to write one. Identified types can be recursive, can be opaque, and are + never uniqued. +

Syntax:

-  { <type list> }
+  %T1 = type { <type list> }     ; Identified normal struct type
+  %T2 = type <{ <type list> }>   ; Identified packed struct type

- +

Examples:

- + + -

{ i32, i32, i32 } A triple of three i32 values

{ float, i32 (i32) * } A pair, where the first element is a float and the second element is a pointer to a function that takes an i32, returning an i32.

- -

- - -

Packed Structure Type -

- -

Overview:

The packed structure type is used to represent a collection of data members - together in memory. There is no padding between fields. Further, the - alignment of a packed structure is 1 byte. The elements of a packed - structure may be any type that has a size.

- -

Structures are accessed using 'load and - 'store' by getting a pointer to a field with - the 'getelementptr' instruction.

- -

Syntax:

-  < { <type list> } >
-

- -

Examples:

- - - - - - + +

< { i32, i32, i32 } > A triple of three i32 values

-< { float, i32 (i32)* } > A pair, where the first element is a float and the - second element is a pointer to a - function that takes an i32, returning - an i32. <{ i8, i32 }> A packed struct known to be 5 bytes in size.

- + -

Union Type

+ Opaque Structure Types +

Overview:

A union type describes an object with size and alignment suitable for - an object of any one of a given set of types (also known as an "untagged" - union). It is similar in concept and usage to a - struct, except that all members of the union - have an offset of zero. The elements of a union may be any type that has a - size. Unions must have at least one member - empty unions are not allowed. -

- -

The size of the union as a whole will be the size of its largest member, - and the alignment requirements of the union as a whole will be the largest - alignment requirement of any member.

- -

Union members are accessed using 'load and - 'store' by getting a pointer to a field with - the 'getelementptr' instruction. - Since all members are at offset zero, the getelementptr instruction does - not affect the address, only the type of the resulting pointer.

Opaque structure types are used to represent named structure types that do + not have a body specified. This corresponds (for example) to the C notion of + a forward declared structure.

Syntax:

-  union { <type list> }
+  %X = type opaque
+  %52 = type opaque

Examples:

- - - - - + +

union { i32, i32*, float } A union of three types: an i32, a pointer to - an i32, and a float.

- union { float, i32 (i32) * } A union, where the first element is a float and the - second element is a pointer to a - function that takes an i32, returning - an i32. opaque An opaque type.

+ + -

Pointer Type

+ Pointer Type +

Overview:

The pointer type is used to specify memory locations. @@ -1919,9 +2168,11 @@ Classifications

Vector Type

+ Vector Type +

Overview:

A vector type is a simple derived type that represents a vector of elements. @@ -1935,8 +2186,9 @@ Classifications

< <# elements> x <elementtype> > -

The number of elements is a constant integer value; elementtype may be any - integer or floating point type.

The number of elements is a constant integer value larger than 0; elementtype + may be any integer or floating point type. Vectors of size zero are not + allowed, and pointers are not allowed as the element type.

Examples:

@@ -1956,94 +2208,25 @@ Classifications - -

Opaque Type

- -

Overview:

Opaque types are used to represent unknown types in the system. This - corresponds (for example) to the C notion of a forward declared structure - type. In LLVM, opaque types can eventually be resolved to any type (not just - a structure type).

- -

Syntax:

-  opaque
-

- -

Examples:

- - - - -

opaque An opaque type.

- -

- - -

- Type Up-references

- -

Overview:

An "up reference" allows you to refer to a lexically enclosing type without - requiring it to have a name. For instance, a structure declaration may - contain a pointer to any of the types it is lexically a member of. Example - of up references (with their equivalent as named type declarations) - include:

- -

-   { \2 * }                %x = type { %x* }
-   { \2 }*                 %y = type { %y }*
-   \1*                     %z = type %z*
-

- -

An up reference is needed by the asmprinter for printing out cyclic types - when there is no declared name for a type in the cycle. Because the - asmprinter does not want to print out an infinite type string, it needs a - syntax to handle recursive types that have no names (all names are optional - in llvm IR).

- -

Syntax:

-   \<level>
-

- -

The level is the count of the lexical type that is being referred to.

- -

Examples:

- - - - - - - - - -

\1* Self-referential pointer.

{ { \3*, i8 }, i32 } Recursive structure where the upref refers to the out-most - structure.

Constants

LLVM has several different basic types of constants. This section describes them all and their syntax.

- -

Simple Constants

+ Simple Constants +

Boolean constants

they match the long double format on your target. All hexadecimal formats are big-endian (sign bit at the left).

There are no constants of type x86mmx.

Complex Constants -

+ -

Complex constants are a (potentially recursive) combination of simple constants and smaller complex constants.

@@ -2115,14 +2299,6 @@ Classifications

the number and types of elements must match those specified by the type. -

Union constants

Union constants are represented with notation similar to a structure with - a single element - that is, a single typed element surrounded - by braces ({})). For example: "{ i32 4 }". The - union type can be initialized with a single-element - struct as long as the type of the struct element matches the type of - one of the union members.

Array constants

Array constants are represented with notation similar to array type definitions (a comma separated list of elements, surrounded by square @@ -2158,11 +2334,11 @@ Classifications

Global Variable and Function Addresses -

+ -

The addresses of global variables and functions are always implicitly valid @@ -2180,13 +2356,16 @@ Classifications

Undefined Values

+ Undefined Values +

+ +

The string 'undef' can be used anywhere a constant is expected, and indicates that the user of the value may receive an unspecified bit-pattern. - Undefined values may be of any type (other than label or void) and be used - anywhere a constant is permitted.

+ Undefined values may be of any type (other than 'label' + or 'void') and be used anywhere a constant is permitted.

Undefined values are useful because they indicate to the compiler that the program is well defined no matter what value is used. This gives the @@ -2205,7 +2384,7 @@ Safe:

This is safe because all of the output bits are affected by the undef bits. -Any output bit can have a zero or one depending on the input bits.

+ Any output bit can have a zero or one depending on the input bits.

   %A = or %X, undef
@@ -2219,13 +2398,14 @@ Unsafe:

These logical operations have bits that are not always affected by the input. -For example, if "%X" has a zero bit, then the output of the 'and' operation will -always be a zero, no matter what the corresponding bit from the undef is. As -such, it is unsafe to optimize or assume that the result of the and is undef. -However, it is safe to assume that all bits of the undef could be 0, and -optimize the and to 0. Likewise, it is safe to assume that all the bits of -the undef operand to the or could be set, allowing the or to be folded to --1.

+ For example, if %X has a zero bit, then the output of the + 'and' operation will always be a zero for that bit, no matter what + the corresponding bit from the 'undef' is. As such, it is unsafe to + optimize or assume that the result of the 'and' is 'undef'. + However, it is safe to assume that all bits of the 'undef' could be + 0, and optimize the 'and' to 0. Likewise, it is safe to assume that + all the bits of the 'undef' operand to the 'or' could be + set, allowing the 'or' to be folded to -1.
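The reasoning about 'and' and 'or' can be checked by brute force on a small bit width. The Python sketch below enumerates every bit pattern the 'undef' operand could take and collects the possible results; the 4-bit width, the sample value of %X, and the helper name are assumptions made for illustration.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def possible_results(op, x):
    """All results of op(x, undef) as undef ranges over every bit pattern."""
    return {op(x, u) & MASK for u in range(1 << WIDTH)}


X = 0b1010  # sample value with two zero bits
and_results = possible_results(lambda a, b: a & b, X)
or_results = possible_results(lambda a, b: a | b, X)
add_results = possible_results(lambda a, b: a + b, X)
```

Here `and_results` contains 0 but never a value with a bit set where `X` has a zero, so folding the 'and' to 0 is fine while folding it to 'undef' is not. Likewise `or_results` contains the all-ones value (-1) but not every value. By contrast, `add_results` covers all 16 patterns, which is why an operation of that kind may be folded to 'undef'.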

   %A = select undef, %X, %Y
@@ -2241,13 +2421,14 @@ Unsafe:
   %C = undef

This set of examples show that undefined select (and conditional branch) -conditions can go "either way" but they have to come from one of the two -operands. In the %A example, if %X and %Y were both known to have a clear low -bit, then %A would have to have a cleared low bit. However, in the %C example, -the optimizer is allowed to assume that the undef operand could be the same as -%Y, allowing the whole select to be eliminated.

- +

This set of examples shows that undefined 'select' (and conditional + branch) conditions can go either way, but they have to come from one + of the two operands. In the %A example, if %X and + %Y were both known to have a clear low bit, then %A would + have to have a cleared low bit. However, in the %C example, the + optimizer is allowed to assume that the 'undef' operand could be the + same as %Y, allowing the whole 'select' to be + eliminated.
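One way to picture the 'select' cases is to treat each operand as the set of values it might take, and to accept a fold only if every result of the replacement is also a possible result of the original. The Python below is a sketch of that idea; the set-based encoding, the 3-bit width, and the names are assumptions for illustration, not how the optimizer is implemented.

```python
WIDTH = 3
ALL_VALUES = set(range(1 << WIDTH))  # what an 'undef' value may be
UNDEF_BOOL = {True, False}           # what an 'undef' condition may be


def select_results(cond, if_true, if_false):
    """Possible results of a select whose operands are sets of values."""
    results = set()
    if True in cond:
        results |= if_true
    if False in cond:
        results |= if_false
    return results


def is_valid_fold(original, replacement):
    """A fold is acceptable if it only produces results the original could."""
    return replacement <= original


# %A = select undef, %X, %Y with %X = 4 and %Y = 6 (both have a clear low bit).
a_results = select_results(UNDEF_BOOL, {4}, {6})

# %C = select %X, %Y, undef with %Y = 5, for either value of the condition.
c_when_true = select_results({True}, {5}, ALL_VALUES)
c_when_false = select_results({False}, {5}, ALL_VALUES)
```

In the %A case the result is always 4 or 6, so replacing it with either operand is fine but replacing it with 'undef' is not. In the %C case, 5 is a possible result whichever way the condition goes, so the whole 'select' can be replaced by %Y.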

   %A = xor undef, undef
@@ -2268,16 +2449,17 @@ Safe:
   %F = undef

This example points out that two undef operands are not necessarily the same. -This can be surprising to people (and also matches C semantics) where they -assume that "X^X" is always zero, even if X is undef. This isn't true for a -number of reasons, but the short answer is that an undef "variable" can -arbitrarily change its value over its "live range". This is true because the -"variable" doesn't actually have a live range. Instead, the value is -logically read from arbitrary registers that happen to be around when needed, -so the value is not necessarily consistent over time. In fact, %A and %C need -to have the same semantics or the core LLVM "replace all uses with" concept -would not hold.

This example points out that two 'undef' operands are not + necessarily the same. This can surprise people who assume that + "X^X" is always zero, even if X is undefined (the behavior + also matches C semantics). This isn't true for a number of reasons, but the + short answer is that an 'undef' "variable" can arbitrarily change + its value over its "live range". This is true because the variable doesn't + actually have a live range. Instead, the value is logically read + from arbitrary registers that happen to be around when needed, so the value + is not necessarily consistent over time. In fact, %A and %C + need to have the same semantics or the core LLVM "replace all uses with" + concept would not hold.
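The point that each 'undef' use is independent can be illustrated by enumerating both operands separately. The Python sketch below uses a 2-bit width and a hypothetical helper; it is a model of the reasoning, not of any LLVM data structure.

```python
WIDTH = 2
UNDEF = set(range(1 << WIDTH))  # every bit pattern an undef may take


def xor_results(lhs, rhs):
    """Possible results of xor when each operand is a set of values."""
    return {a ^ b for a in lhs for b in rhs}


two_undefs = xor_results(UNDEF, UNDEF)  # %A = xor undef, undef
same_value = xor_results({3}, {3})      # X ^ X for one concrete X
```

Because the two 'undef' operands vary independently, `two_undefs` covers every value, so the xor may be treated as 'undef'. Only when both operands are the same concrete value is the result forced to zero.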

   %A = fdiv undef, %X
@@ -2288,17 +2470,17 @@ b: unreachable

These examples show the crucial difference between an undefined -value and undefined behavior. An undefined value (like undef) is -allowed to have an arbitrary bit-pattern. This means that the %A operation -can be constant folded to undef because the undef could be an SNaN, and fdiv is -not (currently) defined on SNaN's. However, in the second example, we can make -a more aggressive assumption: because the undef is allowed to be an arbitrary -value, we are allowed to assume that it could be zero. Since a divide by zero -has undefined behavior, we are allowed to assume that the operation -does not execute at all. This allows us to delete the divide and all code after -it: since the undefined operation "can't happen", the optimizer can assume that -it occurs in dead code. -

+ value and undefined behavior. An undefined value (like + 'undef') is allowed to have an arbitrary bit-pattern. This means that + the %A operation can be constant folded to 'undef', because + the 'undef' could be an SNaN, and fdiv is not (currently) + defined on SNaN's. However, in the second example, we can make a more + aggressive assumption: because the undef is allowed to be an + arbitrary value, we are allowed to assume that it could be zero. Since a + divide by zero has undefined behavior, we are allowed to assume that + the operation does not execute at all. This allows us to delete the divide and + all code after it. Because the undefined operation "can't happen", the + optimizer can assume that it occurs in dead code.

 a:  store undef -> %X
@@ -2308,17 +2490,20 @@ a: <deleted>
 b: unreachable

These examples reiterate the fdiv example: a store "of" an undefined value -can be assumed to not have any effect: we can assume that the value is -overwritten with bits that happen to match what was already there. However, a -store "to" an undefined location could clobber arbitrary memory, therefore, it -has undefined behavior.

These examples reiterate the fdiv example: a store of an + undefined value can be assumed to not have any effect; we can assume that the + value is overwritten with bits that happen to match what was already there. + However, a store to an undefined location could clobber arbitrary + memory; therefore, it has undefined behavior.

Trap Values

+ Trap Values +

+ +

Trap values are similar to undef values, however instead of representing an unspecified bit pattern, they represent the @@ -2370,14 +2555,19 @@ has undefined behavior.

terminator instruction if the terminator instruction has multiple successors and the instruction is always executed when control transfers to one of the successors, and - may not be executed when control is transfered to another. + may not be executed when control is transferred to another. + +

Additionally, an instruction also control-depends on a terminator + instruction if the set of instructions it otherwise depends on would be + different if the terminator had transferred control to a different + successor.

Dependence is transitive.

Whenever a trap value is generated, all values which depend on it evaluate - to trap. If they have side effects, the evoke their side effects as if each + to trap. If they have side effects, they evoke their side effects as if each operand with a trap value were undef. If they have externally-visible side effects, the behavior is undefined.

@@ -2397,11 +2587,11 @@ entry:
   %narrowaddr = bitcast i32* @g to i16*
   %wideaddr = bitcast i32* @g to i64*
-  %trap3 = load 16* %narrowaddr         ; Returns a trap value.
-  %trap4 = load i64* %widaddr           ; Returns a trap value.
+  %trap3 = load i16* %narrowaddr        ; Returns a trap value.
+  %trap4 = load i64* %wideaddr          ; Returns a trap value.
-  %cmp = icmp i32 slt %trap, 0          ; Returns a trap value.
-  %br i1 %cmp, %true, %end              ; Branch to either destination.
+  %cmp = icmp slt i32 %trap, 0          ; Returns a trap value.
+  br i1 %cmp, label %true, label %end   ; Branch to either destination.
 true:
   volatile store i32 0, i32* @g         ; This is control-dependent on %cmp, so
@@ -2414,17 +2604,34 @@ end:
                                         ; control-dependent on %cmp, so this
                                         ; always results in a trap value.
-  volatile store i32 0, i32* @g         ; %end is control-equivalent to %entry
-                                        ; so this is defined (ignoring earlier
+  volatile store i32 0, i32* @g         ; This would depend on the store in %true
+                                        ; if %cmp is true, or the store in %entry
+                                        ; otherwise, so this is undefined behavior.
+
+  br i1 %cmp, label %second_true, label %second_end
+                                        ; The same branch again, but this time the
+                                        ; true block doesn't have side effects.
+
+second_true:
+  ; No side effects!
+  ret void
+
+second_end:
+  volatile store i32 0, i32* @g         ; This time, the instruction always depends
+                                        ; on the store in %end. Also, it is
+                                        ; control-equivalent to %end, so this is
+                                        ; well-defined (again, ignoring earlier
                                         ; undefined behavior in this example).

Addresses of Basic - Blocks

+ Addresses of Basic Blocks +

+ +

blockaddress(@function, %block)

@@ -2433,33 +2640,33 @@ end: the address of the entry block is illegal.

This value only has defined behavior when used as an operand to the - 'indirectbr' instruction or for comparisons - against null. Pointer equality tests between labels addresses is undefined - behavior - though, again, comparison against null is ok, and no label is - equal to the null pointer. This may also be passed around as an opaque - pointer sized value as long as the bits are not inspected. This allows - ptrtoint and arithmetic to be performed on these values so long as - the original value is reconstituted before the indirectbr.

+ 'indirectbr' instruction, or for + comparisons against null. Pointer equality tests between label addresses + result in undefined behavior (though, again, comparison against null + is ok, and no label is equal to the null pointer). This may be passed around + as an opaque pointer-sized value as long as the bits are not inspected. This + allows ptrtoint and arithmetic to be performed on these values so + long as the original value is reconstituted before the indirectbr + instruction.

Finally, some targets may provide defined semantics when - using the value as the operand to an inline assembly, but that is target - specific. -

Finally, some targets may provide defined semantics when using the value as + the operand to an inline assembly, but that is target specific.

Constant Expressions -

+ Constant Expressions +

Constant expressions are used to allow expressions involving other constants to be used as constants. Constant expressions may be of any first class type and may involve any LLVM operation that does not have side effects (e.g. load and call are not - supported). The following is the syntax for constant expressions:

+ supported). The following is the syntax for constant expressions:

trunc (CST to TYPE)

+ -

Other Values

- +

Inline Assembler Expressions -

+ -

LLVM supports inline assembler expressions (as opposed to Module-Level Inline Assembly) through the use of @@ -2637,17 +2846,16 @@ call void asm alignstack "eieio", ""() documented here. Constraints on what can be done (e.g. duplication, moving, etc.) need to be documented. This is probably best done by reference to another document that covers inline asm from a holistic perspective.

Inline Asm Metadata -

+ -

The call instructions that wrap inline asm nodes may have a "!srcloc" MDNode - attached to it that contains a constant integer. If present, the code - generator will use the integer as the location cookie value when report + attached to it that contains a list of constant integers. If present, the + code generator will use the integer as the location cookie value when reporting errors through the LLVMContext error reporting mechanisms. This allows a front-end to correlate backend errors that occur with inline asm back to the source code that produced it. For example:

@@ -2659,16 +2867,19 @@ call void asm sideeffect "something bad", ""(), !srcloc !42

It is up to the front-end to make sense of the magic numbers it places in the - IR.

+ IR. If the MDNode contains multiple constants, the code generator will use + the one that corresponds to the line of the asm that the error occurs on.

- -

Metadata Nodes and Metadata - Strings

+ +

+ Metadata Nodes and Metadata Strings +

+ +

LLVM IR allows metadata to be attached to instructions in the program that can convey extra information about the code to the optimizers and code @@ -2693,25 +2904,107 @@ call void asm sideeffect "something bad", ""(), !srcloc !42

Metadata can be used as function arguments. Here the llvm.dbg.value function uses two metadata arguments.

-       call void @llvm.dbg.value(metadata !24, i64 0, metadata !25)
-

+call void @llvm.dbg.value(metadata !24, i64 0, metadata !25)
+

Metadata can be attached to an instruction. Here metadata !21 is attached to the add instruction using the !dbg identifier.

-      %indvar.next = add i64 %indvar, 1, !dbg !21
-

+%indvar.next = add i64 %indvar, 1, !dbg !21
+

+ +

More information about specific metadata nodes recognized by the optimizers + and code generator is found below.

+ +

+ '`tbaa`' Metadata +

+ +

In LLVM IR, memory does not have types, so LLVM's own type system is not + suitable for doing TBAA. Instead, metadata is added to the IR to describe + a type system of a higher level language. This can be used to implement + typical C/C++ TBAA, but it can also be used to implement custom alias + analysis behavior for other languages.

+ +

The current metadata format is very simple. TBAA metadata nodes have up to + three fields, e.g.:

+ +

+!0 = metadata !{ metadata !"an example type tree" }
+!1 = metadata !{ metadata !"int", metadata !0 }
+!2 = metadata !{ metadata !"float", metadata !0 }
+!3 = metadata !{ metadata !"const float", metadata !2, i64 1 }
+

+ +

The first field is an identity field. It can be any value, usually + a metadata string, which uniquely identifies the type. The most important + name in the tree is the name of the root node. Two trees with + different root node names are entirely disjoint, even if they + have leaves with common names.

+ +

The second field identifies the type's parent node in the tree, or + is null or omitted for a root node. A type is considered to alias + all of its descendants and all of its ancestors in the tree. Also, + a type is considered to alias all types in other trees, so that + bitcode produced from multiple front-ends is handled conservatively.

+ +

If the third field is present, it's an integer which if equal to 1 + indicates that the type is "constant" (meaning + pointsToConstantMemory should return true; see + other useful + AliasAnalysis methods).
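The following sketch shows how such nodes might be attached to memory accesses. It assumes the attachment is written with a !tbaa identifier on the load or store, and reuses the example tree above; the function and value names are hypothetical.

```llvm
define i32 @tbaa_example(i32* %ip, float* %fp) {
entry:
  ; "float" and "int" are siblings under the same root: neither is an
  ; ancestor or a descendant of the other, so under the rules above these
  ; two accesses are not considered to alias.
  store float 0.0, float* %fp, align 4, !tbaa !2
  %v = load i32* %ip, align 4, !tbaa !1
  ret i32 %v
}

!0 = metadata !{ metadata !"an example type tree" }
!1 = metadata !{ metadata !"int", metadata !0 }
!2 = metadata !{ metadata !"float", metadata !0 }
```

An access tagged with !0 (the root) would instead be treated as aliasing both, since the root is an ancestor of every type in its tree.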

+ +

+ '`fpaccuracy`' Metadata +

+ +

fpaccuracy metadata may be attached to any instruction of floating + point type. It expresses the maximum relative error of the result of + that instruction, in ULPs. ULP is defined as follows:

+ +

+If x is a real number that lies between two finite consecutive floating-point +numbers a and b, without being equal to one of them, then ulp(x) = |b - a|, +otherwise ulp(x) is the distance between the two non-equal finite +floating-point numbers nearest x. Moreover, ulp(NaN) is NaN. +

+ +

The maximum relative error may be any rational number. The metadata node + shall consist of a pair of unsigned integers respectively representing + the numerator and denominator. For example, 2.5 ULP:

+ +

+!0 = metadata !{ i32 5, i32 2 }
+
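A minimal sketch of attaching this node to an instruction, assuming the attachment is written with an !fpaccuracy identifier; the function name is hypothetical.

```llvm
define float @relaxed_div(float %a, float %b) {
entry:
  ; The result of this division may be off by up to 5/2 = 2.5 ULPs.
  %q = fdiv float %a, %b, !fpaccuracy !0
  ret float %q
}

!0 = metadata !{ i32 5, i32 2 }
```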

+ +

Intrinsic Global Variables -

+ - +

LLVM has a number of "magic" global variables that contain data that affect code generation or other IR semantics. These are documented here. All globals of this sort should have a section specified as "llvm.metadata". This @@ -2719,95 +3012,113 @@ section and all globals that start with "llvm." are reserved for use by LLVM.

The '`llvm.used`' Global Variable -

+ -

The @llvm.used global is an array with i8* element type which has appending linkage. This array contains a list of pointers to global variables and functions which may optionally have a pointer cast formed of bitcast or getelementptr. For example, a legal use of it is:

-  @X = global i8 4
-  @Y = global i32 123
+@X = global i8 4
+@Y = global i32 123
 
-  @llvm.used = appending global [2 x i8*] [
-     i8* @X,
-     i8* bitcast (i32* @Y to i8*)
-  ], section "llvm.metadata"
+@llvm.used = appending global [2 x i8*] [
+   i8* @X,
+   i8* bitcast (i32* @Y to i8*)
+], section "llvm.metadata"

If a global variable appears in the @llvm.used list, then the -compiler, assembler, and linker are required to treat the symbol as if there is -a reference to the global that it cannot see. For example, if a variable has -internal linkage and no references other than that from the @llvm.used -list, it cannot be deleted. This is commonly used to represent references from -inline asms and other things the compiler cannot "see", and corresponds to -"attribute((used))" in GNU C.

+ compiler, assembler, and linker are required to treat the symbol as if there + is a reference to the global that it cannot see. For example, if a variable + has internal linkage and no references other than that from + the @llvm.used list, it cannot be deleted. This is commonly used to + represent references from inline asms and other things the compiler cannot + "see", and corresponds to "attribute((used))" in GNU C.

On some targets, the code generator must emit a directive to the assembler or -object file to prevent the assembler and linker from molesting the symbol.

+ object file to prevent the assembler and linker from molesting the + symbol.

-The 'llvm.compiler.used' Global Variable -

+ + The '`llvm.compiler.used`' Global Variable + +

The @llvm.compiler.used directive is the same as the -@llvm.used directive, except that it only prevents the compiler from -touching the symbol. On targets that support it, this allows an intelligent -linker to optimize references to the symbol without being impeded as it would be -by @llvm.used.

+ @llvm.used directive, except that it only prevents the compiler from + touching the symbol. On targets that support it, this allows an intelligent + linker to optimize references to the symbol without being impeded as it would + be by @llvm.used.

This is a rare construct that should only be used in rare circumstances, and -should not be exposed to source languages.

+ should not be exposed to source languages.

The '`llvm.global_ctors`' Global Variable -

+ -

+ +

 %0 = type { i32, void ()* }
 @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor }]

The @llvm.global_ctors array contains a list of constructor functions and associated priorities. The functions referenced by this array will be called in ascending order of priority (i.e. lowest first) when the module is loaded. The order of functions with the same priority is not defined. -

+ +

The @llvm.global_ctors array contains a list of constructor + functions and associated priorities. The functions referenced by this array + will be called in ascending order of priority (i.e. lowest first) when the + module is loaded. The order of functions with the same priority is not + defined.
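As an illustration of the ordering rule, the sketch below registers two constructors with different priorities. The function names and the priority values 100 and 200 are arbitrary choices for this example.

```llvm
%0 = type { i32, void ()* }

; Listed out of order on purpose: the priority field decides the call
; order, so @init_early (100) runs before @init_late (200).
@llvm.global_ctors = appending global [2 x %0] [
  %0 { i32 200, void ()* @init_late },
  %0 { i32 100, void ()* @init_early }
]

define internal void @init_early() {
entry:
  ret void
}

define internal void @init_late() {
entry:
  ret void
}
```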

The '`llvm.global_dtors`' Global Variable -

+ + +

 %0 = type { i32, void ()* }
 @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor }]

The @llvm.global_dtors array contains a list of destructor functions and associated priorities. The functions referenced by this array will be called in descending order of priority (i.e. highest first) when the module is loaded. The order of functions with the same priority is not defined. -

The @llvm.global_dtors array contains a list of destructor functions + and associated priorities. The functions referenced by this array will be + called in descending order of priority (i.e. highest first) when the module + is loaded. The order of functions with the same priority is not defined.

Instruction Reference

The LLVM instruction set consists of several different classifications of instructions: terminator @@ -2816,13 +3127,12 @@ should not be exposed to source languages.

memory instructions, and other instructions.

- -

Terminator -Instructions

+ Terminator Instructions +

As mentioned previously, every basic block in a program ends with a "Terminator" instruction, which indicates which @@ -2831,22 +3141,22 @@ Instructions

control flow, not values (the one exception being the 'invoke' instruction).

There are seven different terminator instructions: the - 'ret' instruction, the - 'br' instruction, the - 'switch' instruction, the - ''indirectbr' Instruction, the - 'invoke' instruction, the - 'unwind' instruction, and the - 'unreachable' instruction.

- -

The terminator instructions are: + 'ret', + 'br', + 'switch', + 'indirectbr', + 'invoke', + 'unwind', + 'resume', and + 'unreachable'.

'ret' -Instruction

+ '`ret`' Instruction +

Syntax:

@@ -2892,13 +3202,16 @@ Instruction

'br' Instruction

+ '`br`' Instruction +

Syntax:

-  br i1 <cond>, label <iftrue>, label <iffalse>
  br label <dest>          ; Unconditional branch
+  br i1 <cond>, label <iftrue>, label <iffalse>
+  br label <dest>          ; Unconditional branch

Overview:

@@ -2933,11 +3246,11 @@ IfUnequal:

'`switch`' Instruction -

+ -

Syntax:

@@ -2988,11 +3301,11 @@ IfUnequal:
 
 
 
-
+
    'indirectbr' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3036,11 +3349,11 @@ IfUnequal:
 
 
 
-
+
   'invoke' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3058,6 +3371,17 @@ IfUnequal:
    instruction, control is interrupted and continued at the dynamically nearest
    "exception" label.
 
+The 'exception' label is a
+   landing pad for the
+   exception. As such, the 'exception' label is required to have the
+   "landingpad" instruction, which contains
+   the information about the behavior of the program after unwinding
+   happens, as its first non-PHI instruction. The restrictions on the
+   "landingpad" instruction tightly couple it to the
+   "invoke" instruction, so that the important information contained
+   within the "landingpad" instruction can't be lost through normal
+   code motion.
+
 Arguments:
 This instruction requires several arguments:
 
@@ -3126,10 +3450,11 @@ that the invoke/unwind semantics are likely to change in future versions.
 
 
 
- 'unwind'
-Instruction 
+
+  'unwind' Instruction
+
 
-
+
 
 Syntax:
 @@ -3155,33 +3480,72 @@ that the invoke/unwind semantics are likely to change in future versions.
 
 
 
-
-
- 'unreachable'
-Instruction 
+ 
+ 
+
+  'resume' Instruction
+
 
-
+
 
 Syntax:
 -  unreachable
+  resume <type> <value>
 
 
 Overview:
-The 'unreachable' instruction has no defined semantics.  This
-   instruction is used to inform the optimizer that a particular portion of the
-   code is not reachable.  This can be used to indicate that the code after a
-   no-return function cannot be reached, and other facts.
+The 'resume' instruction is a terminator instruction that has no
+   successors.
 
-Semantics:
-The 'unreachable' instruction has no defined semantics.
+Arguments:
+The 'resume' instruction requires one argument, which must have the
+   same type as the result of any 'landingpad' instruction in the same
+   function.
 
-
+Semantics:
+The 'resume' instruction resumes propagation of an existing
+   (in-flight) exception whose unwinding was interrupted with
+   a landingpad instruction.
+
+Example:
++  resume { i8*, i32 } %exn
+
+
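To show where 'resume' typically sits, here is a sketch of a cleanup landing pad that runs some cleanup code and then continues unwinding. The functions @may_throw and @run_cleanup are hypothetical, and the personality function is assumed to be the C++ one.

```llvm
declare void @may_throw()
declare void @run_cleanup()
declare i32 @__gxx_personality_v0(...)

define void @with_cleanup() {
entry:
  invoke void @may_throw() to label %cont unwind label %lpad

cont:
  ret void

lpad:
  ; The landingpad is the first non-PHI instruction of the unwind target.
  %exn = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
           cleanup
  call void @run_cleanup()
  ; The operand has the same type as the landingpad result.
  resume { i8*, i32 } %exn
}
```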
+
+
+
+
+
+  'unreachable' Instruction
+
+
+
+
+Syntax:
++  unreachable
+
+
+Overview:
+The 'unreachable' instruction has no defined semantics.  This
+   instruction is used to inform the optimizer that a particular portion of the
+   code is not reachable.  This can be used to indicate that the code after a
+   no-return function cannot be reached, and other facts.
+
+Semantics:
+The 'unreachable' instruction has no defined semantics.
+
+
+
+
 
 
- Binary Operations 
+
+  Binary Operations
+
 
-
+
 
 Binary operators are used to do most of the computation in a program.  They
    require two operands of the same type, execute an operation on them, and
@@ -3191,14 +3555,12 @@ Instruction 
 
 There are several different binary operators:
 
-
-
 
-
+
   'add' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3239,11 +3601,11 @@ Instruction 
 
 
 
-
+
   'fadd' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3269,11 +3631,11 @@ Instruction 
 
 
 
-
+
    'sub' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3321,11 +3683,11 @@ Instruction 
 
 
 
-
+
    'fsub' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3357,11 +3719,11 @@ Instruction 
 
 
 
-
+
   'mul' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3407,11 +3769,11 @@ Instruction 
 
 
 
-
+
   'fmul' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3437,14 +3799,16 @@ Instruction 
 
 
 
- 'udiv' Instruction
-
+
+  'udiv' Instruction
+
 
-
+
 
 Syntax:
 -  <result> = udiv <ty> <op1>, <op2>   ; yields {ty}:result
+  <result> = udiv <ty> <op1>, <op2>         ; yields {ty}:result
+  <result> = udiv exact <ty> <op1>, <op2>   ; yields {ty}:result
 
 
 Overview:
@@ -3463,6 +3827,11 @@ Instruction 
 
 Division by zero leads to undefined behavior.
 
+If the exact keyword is present, the result value of the
+   udiv is a trap value if %op1 is not a
+  multiple of %op2 (as such, "((a udiv exact b) mul b) == a").
+
+
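A small sketch of the exact rule with constant operands; the function name is hypothetical.

```llvm
define i32 @exact_udiv() {
entry:
  %a = udiv exact i32 12, 4     ; 12 is a multiple of 4: yields 3
  %b = udiv exact i32 13, 4     ; 13 is not a multiple of 4: trap value
  %back = mul i32 %a, 4         ; 12 again, matching the identity above
  ret i32 %back
}
```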
 Example:
    <result> = udiv i32 4, %var          ; yields {i32}:result = 4 / %var
@@ -3471,10 +3840,11 @@ Instruction 
 
 
 
- 'sdiv' Instruction
- 
+
+  'sdiv' Instruction
+
 
-
+
 
 Syntax:
 @@ -3513,10 +3883,11 @@ Instruction 
 
 
 
- 'fdiv'
-Instruction 
+
+  'fdiv' Instruction
+
 
-
+
 
 Syntax:
 @@ -3542,10 +3913,11 @@ Instruction 
 
 
 
- 'urem' Instruction
-
+
+  'urem' Instruction
+
 
-
+
 
 Syntax:
 @@ -3579,11 +3951,11 @@ Instruction 
 
 
 
-
+
   'srem' Instruction
-
+
 
-
+
 
 Syntax:
 @@ -3603,9 +3975,10 @@ Instruction 
 
 Semantics:
 This instruction returns the remainder of a division (where the result
-   has the same sign as the dividend, op1), not the modulo
-   operator (where the result has the same sign as the divisor, op2) of
-   a value.  For more information about the difference,
+   is either zero or has the same sign as the dividend, op1), not the
+   modulo operator (where the result is either zero or has the same sign
+   as the divisor, op2) of a value.
+   For more information about the difference,
    see The
    Math Forum. For a table of how this is implemented in various languages,
    please see 
@@ -3629,10 +4002,11 @@ Instruction 
 
 
 
-
-  'frem' Instruction 
+
+  'frem' Instruction
+
 
-
+
 
 Syntax:
 @@ -3659,11 +4033,14 @@ Instruction 
 
 
 
+
+
 
- Bitwise Binary
-Operations 
+
+  Bitwise Binary Operations
+
 
-
+
 
 Bitwise binary operators are used to do various forms of bit-twiddling in a
    program.  They are generally very efficient instructions and can commonly be
@@ -3671,17 +4048,19 @@ Operations 
    same type, execute an operation on them, and produce a single value.  The
    resulting value is the same type as its operands.
 
-
-
 
- 'shl'
-Instruction 
+
+  'shl' Instruction
+
 
-
+
 
 Syntax:
 -  <result> = shl <ty> <op1>, <op2>   ; yields {ty}:result
+  <result> = shl <ty> <op1>, <op2>           ; yields {ty}:result
+  <result> = shl nuw <ty> <op1>, <op2>       ; yields {ty}:result
+  <result> = shl nsw <ty> <op1>, <op2>       ; yields {ty}:result
+  <result> = shl nuw nsw <ty> <op1>, <op2>   ; yields {ty}:result
 
 
 Overview:
@@ -3701,6 +4080,14 @@ Instruction 
    vectors, each vector element of op1 is shifted by the corresponding
    shift amount in op2.
 
+If the nuw keyword is present, then the shift produces a 
+   trap value if it shifts out any non-zero bits.  If
+   the nsw keyword is present, then the shift produces a
+   trap value if it shifts out any bits that disagree
+   with the resultant sign bit.  As such, NUW/NSW have the same semantics as
+   they would if the shift were expressed as a mul instruction with the same
+   nsw/nuw bits in (mul %op1, (shl 1, %op2)).
+
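The sketch below works through both flags on i8 values, where the bit patterns are easy to check by hand. The function name is hypothetical.

```llvm
define i8 @shl_flags() {
entry:
  ; 64 is 01000000: the bit shifted out is 0, so nuw holds and the
  ; result is 10000000.
  %a = shl nuw i8 64, 1
  ; -128 is 10000000: a set bit is shifted out, so this is a trap value.
  %b = shl nuw i8 -128, 1
  ; The bit shifted out (0) disagrees with the resultant sign bit (1), so
  ; this is a trap value, just as (mul nsw i8 64, 2) would overflow.
  %c = shl nsw i8 64, 1
  ; -64 is 11000000: the bit shifted out (1) agrees with the resultant
  ; sign bit (1), so the result is -128.
  %d = shl nsw i8 -64, 1
  ret i8 %d
}
```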
 Example:
    <result> = shl i32 4, %var   ; yields {i32}: 4 << %var
@@ -3713,14 +4100,16 @@ Instruction 
 
 
 
- 'lshr'
-Instruction 
+
+  'lshr' Instruction
+
 
-
+
 
 Syntax:
 -  <result> = lshr <ty> <op1>, <op2>   ; yields {ty}:result
+  <result> = lshr <ty> <op1>, <op2>         ; yields {ty}:result
+  <result> = lshr exact <ty> <op1>, <op2>   ; yields {ty}:result
 
 
 Overview:
@@ -3740,6 +4129,11 @@ Instruction 
    vectors, each vector element of op1 is shifted by the corresponding
    shift amount in op2.
 
+If the exact keyword is present, the result value of the
+   lshr is a trap value if any of the bits
+   shifted out are non-zero.
+
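A short sketch of the exact rule for a logical shift, using constant operands; the function name is hypothetical.

```llvm
define i32 @exact_lshr() {
entry:
  %a = lshr exact i32 8, 2     ; low two bits of 8 are 00: yields 2
  %b = lshr exact i32 9, 2     ; low two bits of 9 are 01: trap value
  ret i32 %a
}
```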
+
 Example:
    <result> = lshr i32 4, 1   ; yields {i32}:result = 2
@@ -3753,13 +4147,16 @@ Instruction

'ashr' -Instruction

+ '`ashr`' Instruction +

+ +

Syntax:

-  <result> = ashr <ty> <op1>, <op2>   ; yields {ty}:result
+  <result> = ashr <ty> <op1>, <op2>         ; yields {ty}:result
+  <result> = ashr exact <ty> <op1>, <op2>   ; yields {ty}:result

Overview:

@@ -3780,6 +4177,10 @@ Instruction

the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the exact keyword is present, the result value of the + ashr is a trap value if any of the bits + shifted out are non-zero.
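The same rule applied to negative operands, as a sketch; the function name is hypothetical.

```llvm
define i32 @exact_ashr() {
entry:
  ; -8 ends in the bits 000, so no non-zero bits are shifted out and the
  ; sign-extending shift yields -2.
  %a = ashr exact i32 -8, 2
  ; -7 ends in the bits 01, so a non-zero bit is shifted out: trap value.
  %b = ashr exact i32 -7, 2
  ret i32 %a
}
```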

Example:

   <result> = ashr i32 4, 1   ; yields {i32}:result = 2
@@ -3793,10 +4194,11 @@ Instruction

'and' -Instruction

+ '`and`' Instruction +

Syntax:

@@ -3853,9 +4255,11 @@ Instruction

'or' Instruction

+ '`or`' Instruction +

Syntax:

@@ -3914,10 +4318,11 @@ Instruction

'xor' -Instruction

+ '`xor`' Instruction +

Syntax:

@@ -3977,12 +4382,14 @@ Instruction

+ -

Vector Operations -

+ -

LLVM supports several instructions to represent vector operations in a target-independent manner. These instructions cover the element-access and @@ -3991,14 +4398,12 @@ Instruction

will want to use target-specific intrinsics to take full advantage of a specific target.

@@ -4113,24 +4518,24 @@ Instruction

+ -

Aggregate Operations -

+ -

LLVM supports several instructions for working with aggregate values.

- -

'`extractvalue`' Instruction -

+ -

Syntax:

@@ -4143,10 +4548,18 @@ Instruction

Arguments:

The first operand of an 'extractvalue' instruction is a value - of struct, union or + of struct or array type. The operands are constant indices to specify which value to extract in a similar manner as indices in a 'getelementptr' instruction.

The major differences to getelementptr indexing are:

Since the value being indexed is not a pointer, the first index is + omitted and assumed to be zero.
At least one index must be specified.
Not only struct indices but also array indices must be in + bounds.
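To make the first difference concrete, the sketch below reads the same nested element once from an aggregate value and once through a pointer. The function and value names are hypothetical.

```llvm
define float @first_array_elt({i32, [2 x float]} %agg, {i32, [2 x float]}* %p) {
entry:
  ; extractvalue: struct field 1, then array element 0. There is no
  ; leading index, because %agg is a value rather than a pointer.
  %a = extractvalue {i32, [2 x float]} %agg, 1, 0

  ; getelementptr: the extra leading i32 0 steps through the pointer first.
  %addr = getelementptr {i32, [2 x float]}* %p, i32 0, i32 1, i32 0
  %b = load float* %addr

  %sum = fadd float %a, %b
  ret float %sum
}
```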

Semantics:

The result is the value at the position in the aggregate specified by the @@ -4160,15 +4573,15 @@ Instruction

'`insertvalue`' Instruction -

+ -

Syntax:

-  <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>    ; yields <aggregate type>
+  <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}*    ; yields <aggregate type>

Overview:

@@ -4177,11 +4590,11 @@ Instruction

Arguments:

The first operand of an 'insertvalue' instruction is a value - of struct, union or + of struct or array type. The second operand is a first-class value to insert. The following operands are constant indices indicating the position at which to insert the value in a similar manner as indices in a - 'getelementptr' instruction. The + 'extractvalue' instruction. The value to insert must have the same type as the value identified by the indices.

@@ -4192,33 +4605,33 @@ Instruction

Example:

-  %agg1 = insertvalue {i32, float} undef, i32 1, 0         ; yields {i32 1, float undef}
-  %agg2 = insertvalue {i32, float} %agg1, float %val, 1    ; yields {i32 1, float %val}
+  %agg1 = insertvalue {i32, float} undef, i32 1, 0              ; yields {i32 1, float undef}
+  %agg2 = insertvalue {i32, float} %agg1, float %val, 1         ; yields {i32 1, float %val}
+  %agg3 = insertvalue {i32, {float}} undef, float %val, 1, 0    ; yields {i32 undef, {float %val}}

Memory Access and Addressing Operations -

+ -

A key design point of an SSA-based representation is how it represents memory. In LLVM, no memory locations are in SSA form, which makes things very simple. This section describes how to read, write, and allocate memory in LLVM.

- -

'`alloca`' Instruction -

+ -

Syntax:

@@ -4265,15 +4678,16 @@ Instruction

'load' -Instruction

+ '`load`' Instruction +

Syntax:

-  <result> = load <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]
-  <result> = volatile load <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]
+  <result> = load [volatile] <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]
+  <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment>
   !<index> = !{ i32 1 }

@@ -4288,6 +4702,19 @@ Instruction

number or order of execution of this load with other volatile operations.

If the load is marked as atomic, it takes an extra + ordering and optional singlethread + argument. The release and acq_rel orderings are + not valid on load instructions. Atomic loads produce defined results when they may see multiple atomic + stores. The type of the pointee must be an integer type whose bit width + is a power of two greater than or equal to eight and less than or equal + to a target-specific size limit. align must be explicitly + specified on atomic loads, and the load has undefined behavior if the + alignment is not set to a value which is at least the size in bytes of + the pointee. !nontemporal does not have any defined semantics + for atomic loads.
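A sketch of the atomic form, using the syntax shown above. The function names are hypothetical, and the monotonic ordering is assumed to be one of the orderings defined in the Atomic Memory Ordering Constraints section.

```llvm
define i32 @read_flag(i32* %flag) {
entry:
  ; i32 is 4 bytes, so the explicit alignment must be at least 4.
  %v = load atomic i32* %flag acquire, align 4
  ret i32 %v
}

define i32 @read_flag_in_handler(i32* %flag) {
entry:
  ; singlethread limits the ordering to code running on the same thread,
  ; such as a signal handler.
  %v = load atomic volatile i32* %flag singlethread monotonic, align 4
  ret i32 %v
}
```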

The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted align argument means that the operation has the preferential @@ -4323,15 +4750,16 @@ Instruction

'store' -Instruction

+ '`store`' Instruction +

Syntax:

-  store <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]                   ; yields {void}
-  volatile store <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]          ; yields {void}
+  store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]                   ; yields {void}
+  store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment>             ; yields {void}

Overview:

@@ -4347,6 +4775,19 @@ Instruction

order of execution of this store with other volatile operations.

If the store is marked as atomic, it takes an extra + ordering and optional singlethread + argument. The acquire and acq_rel orderings aren't + valid on store instructions. Atomic loads produce defined results when they may see multiple atomic + stores. The type of the pointee must be an integer type whose bit width + is a power of two greater than or equal to eight and less than or equal + to a target-specific size limit. align must be explicitly + specified on atomic stores, and the store has undefined behavior if the + alignment is not set to a value which is at least the size in bytes of + the pointee. !nontemporal does not have any defined semantics + for atomic stores.
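A sketch of an atomic store used to publish data; the globals @data and @ready and the function name are hypothetical.

```llvm
@data  = global i32 0
@ready = global i32 0

define void @publish() {
entry:
  ; Ordinary store of the payload.
  store i32 42, i32* @data, align 4
  ; Release store of the flag; align is explicit and at least the 4-byte
  ; size of i32.
  store atomic i32 1, i32* @ready release, align 4
  ret void
}
```

A reader that observes 1 in @ready through an acquire load is then expected to also see 42 in @data.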

The optional constant "align" argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted "align" argument means that the operation has the preferential @@ -4384,11 +4825,220 @@ Instruction

- 'getelementptr' Instruction +

+'`fence`' Instruction +

+ +

Syntax:

+  fence [singlethread] <ordering>                   ; yields {void}
+

+ +

Overview:

The 'fence' instruction is used to introduce happens-before edges +between operations.

+ +

Arguments:

'fence' instructions take an ordering argument which defines what +synchronizes-with edges they add. They can only be given +acquire, release, acq_rel, and +seq_cst orderings.

+ +

Semantics:

A fence A which has (at least) release ordering +semantics synchronizes with a fence B with (at least) +acquire ordering semantics if and only if there exist atomic +operations X and Y, both operating on some atomic object +M, such that A is sequenced before X, +X modifies M (either directly or through some side effect +of a sequence headed by X), Y is sequenced before +B, and Y observes M. This provides a +happens-before dependency between A and B. Rather +than an explicit fence, one (but not both) of the atomic operations +X or Y might provide a release or +acquire (resp.) ordering constraint and still +synchronize-with the explicit fence and establish the +happens-before edge.

+ +

A fence which has seq_cst ordering, in addition to +having both acquire and release semantics specified +above, participates in the global program order of other seq_cst +operations and/or fences.

+ +

The optional "singlethread" argument +specifies that the fence only synchronizes with other fences in the same +thread. (This is useful for interacting with signal handlers.)

+ +

Example:

+  fence acquire                          ; yields {void}
+  fence singlethread seq_cst             ; yields {void}
+
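The semantics paragraph above can be read against the following sketch, in which the comments name the roles A, B, X, Y and M. The globals and function names are hypothetical, and monotonic is assumed to be one of the orderings defined in the Atomic Memory Ordering Constraints section.

```llvm
@data = global i32 0
@flag = global i32 0     ; the atomic object M

define void @producer() {
entry:
  store i32 42, i32* @data, align 4
  fence release                                        ; fence A
  store atomic i32 1, i32* @flag monotonic, align 4    ; X modifies M
  ret void
}

define i32 @consumer() {
entry:
  %f = load atomic i32* @flag monotonic, align 4       ; Y observes M
  fence acquire                                        ; fence B
  ; Only if %f is 1 does A synchronize with B, which makes the store of
  ; 42 happen before this load.
  %d = load i32* @data, align 4
  ret i32 %d
}
```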

+ +

+ + +

+'`cmpxchg`' Instruction +

+ +

Syntax:

+  cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <ordering>                   ; yields {ty}
+

+ +

Overview:

The 'cmpxchg' instruction is used to atomically modify memory. +It loads a value in memory and compares it to a given value. If they are +equal, it stores a new value into the memory.

+ +

Arguments:

There are three arguments to the 'cmpxchg' instruction: an +address to operate on, a value to compare to the value currently at that +address, and a new value to place at that address if the compared values are +equal. The type of '<cmp>' must be an integer type whose +bit width is a power of two greater than or equal to eight and less than +or equal to a target-specific size limit. '<cmp>' and +'<new>' must have the same type, and the type of +'<pointer>' must be a pointer to that type. If the +cmpxchg is marked as volatile, then the +optimizer is not allowed to modify the number or order of execution +of this cmpxchg with other volatile +operations.

+ + + +

The ordering argument specifies how this +cmpxchg synchronizes with other atomic operations.

+ +

The optional "singlethread" argument declares that the +cmpxchg is only atomic with respect to code (usually signal +handlers) running in the same thread as the cmpxchg. Otherwise the +cmpxchg is atomic with respect to all other code in the system.

+ +

The pointer passed into cmpxchg must have alignment greater than or equal to +the size in memory of the operand. + +

Semantics:

The contents of memory at the location specified by the +'<pointer>' operand are read and compared to +'<cmp>'; if the read value is equal, +'<new>' is written. The original value at the location +is returned. +

A successful cmpxchg is a read-modify-write instruction for the +purpose of identifying release sequences. A +failed cmpxchg is equivalent to an atomic load with an ordering +parameter determined by dropping any release part of the +cmpxchg's ordering.

+ + + +

Example:

+entry:
+  %orig = load atomic i32* %ptr unordered, align 4              ; yields {i32}
+  br label %loop
+
+loop:
+  %cmp = phi i32 [ %orig, %entry ], [%old, %loop]
+  %squared = mul i32 %cmp, %cmp
+  %old = cmpxchg i32* %ptr, i32 %cmp, i32 %squared seq_cst               ; yields {i32}
+  %success = icmp eq i32 %cmp, %old
+  br i1 %success, label %done, label %loop
+
+done:
+  ...
+

+ +

+ + +

+'`atomicrmw`' Instruction +

+ +

Syntax:

+  atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering>                   ; yields {ty}
+

+ +

Overview:

The 'atomicrmw' instruction is used to atomically modify memory.

+ +

Arguments:

There are three arguments to the 'atomicrmw' instruction: an +operation to apply, an address whose value to modify, and an argument to the +operation. The operation must be one of the following keywords:

xchg
add
sub
and
nand
or
xor
max
min
umax
umin

+ +

The type of '<value>' must be an integer type whose +bit width is a power of two greater than or equal to eight and less than +or equal to a target-specific size limit. The type of the +'<pointer>' operand must be a pointer to that type. +If the atomicrmw is marked as volatile, then the +optimizer is not allowed to modify the number or order of execution of this +atomicrmw with other volatile + operations.

+ + + +

Semantics:

The contents of memory at the location specified by the +'<pointer>' operand are atomically read, modified, and written +back. The original value at the location is returned. The modification is +specified by the operation argument:

+ +

xchg: *ptr = val
add: *ptr = *ptr + val
sub: *ptr = *ptr - val
and: *ptr = *ptr & val
nand: *ptr = ~(*ptr & val)
or: *ptr = *ptr | val
xor: *ptr = *ptr ^ val
max: *ptr = *ptr > val ? *ptr : val (using a signed comparison)
min: *ptr = *ptr < val ? *ptr : val (using a signed comparison)
umax: *ptr = *ptr > val ? *ptr : val (using an unsigned comparison)
umin: *ptr = *ptr < val ? *ptr : val (using an unsigned comparison)

+ +

Example:

+  %old = atomicrmw add i32* %ptr, i32 1 acquire                        ; yields {i32}
+
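As a further sketch, one of the comparison-based operations can be used to raise a stored value to a minimum. The function and parameter names are hypothetical.

```llvm
define i32 @raise_to_at_least(i32* %ptr, i32 %floor) {
entry:
  ; Atomically performs *ptr = *ptr > floor ? *ptr : floor with a signed
  ; comparison, and yields the value that was stored before the update.
  %old = atomicrmw max i32* %ptr, i32 %floor seq_cst
  ret i32 %old
}
```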

+ +

+ '`getelementptr`' Instruction +

+ +

Syntax:

@@ -4410,15 +5060,15 @@ Instruction

indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, - vectors, structs and unions. Note that subsequent types being indexed into + vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.

The type of each index argument depends on the type it is indexing into. - When indexing into a (optionally packed) structure or union, only i32 + When indexing into a (optionally packed) structure, only i32 integer constants are allowed. When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be - constant.

+ constant. These integers are treated as signed values where relevant.

For example, let's consider a C code fragment and how it gets compiled to LLVM:

@@ -4484,18 +5134,20 @@ entry: base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely - precise arithmetic are not an in bounds address of that allocated - object. The in bounds addresses for an allocated object are all - the addresses that point into the object, plus the address one byte past - the end.

+ precise signed arithmetic are not an in bounds address of that + allocated object. The in bounds addresses for an allocated object + are all the addresses that point into the object, plus the address one + byte past the end.

If the inbounds keyword is not present, the offsets are added to - the base address with silently-wrapping two's complement arithmetic, and - the result value of the getelementptr may be outside the object - pointed to by the base pointer. The result value may not necessarily be - used to access memory though, even if it happens to point into allocated - storage. See the Pointer Aliasing Rules - section for more information.

+ the base address with silently-wrapping two's complement arithmetic. If the + offsets have a different width from the pointer, they are sign-extended or + truncated to the width of the pointer. The result value of the + getelementptr may be outside the object pointed to by the base + pointer. The result value may not necessarily be used to access memory + though, even if it happens to point into allocated storage. See the + Pointer Aliasing Rules section for more + information.
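The sign-extension rule matters for indices narrower than the pointer, as in this sketch. The function name is hypothetical, and the byte offsets assume i32 occupies 4 bytes.

```llvm
define i32* @previous_element(i32* %p) {
entry:
  ; The i8 index -1 is sign-extended to the pointer width, so the offset
  ; is -1 * 4 bytes rather than 255 * 4 bytes.
  %q = getelementptr i32* %p, i8 -1
  ret i32* %q
}
```

Without inbounds the address computation itself simply wraps, but whether %q may be used to access memory is still governed by the pointer aliasing rules.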

The getelementptr instruction is often confusing. For some more insight into how it works, see the getelementptr FAQ.

@@ -4514,23 +5166,25 @@ entry:

- -

Conversion Operations

+ +

+ Conversion Operations +

+ +

The instructions in this category are the conversion instructions (casting) which all take a single operand and a type. They perform various bit conversions on the operand.

@@ -4974,21 +5643,24 @@ entry:

+ -

Other Operations

+ Other Operations +

The instructions in this category are the "miscellaneous" instructions, which defy better classification.

- -

'icmp' Instruction -

+ '`icmp`' Instruction +

Syntax:

@@ -5087,10 +5759,11 @@ entry:

'fcmp' Instruction -

+ '`fcmp`' Instruction +

Syntax:

@@ -5207,11 +5880,11 @@ entry:

'`phi`' Instruction -

+ -

Syntax:

@@ -5255,11 +5928,11 @@ Loop:       ; Infinite loop that counts from 0 on up...

'`select`' Instruction -

+ -

Syntax:

@@ -5298,11 +5971,11 @@ Loop:       ; Infinite loop that counts from 0 on up...

'`call`' Instruction -

+ -

Syntax:

@@ -5407,11 +6080,11 @@ freestanding environments and non-C-based languages.

'`va_arg`' Instruction -

+ -

Syntax:

@@ -5452,11 +6125,96 @@ freestanding environments and non-C-based languages.

+ +

+ '`landingpad`' Instruction +

+ +

Syntax:

+  <resultval> = landingpad <somety> personality <type> <pers_fn> <clause>+
+  <resultval> = landingpad <somety> personality <type> <pers_fn> cleanup <clause>*
+
+  <clause> := catch <type> <value>
+  <clause> := filter <array constant type> <array constant>
+

+ +

Overview:

The 'landingpad' instruction is used by + LLVM's exception handling + system to specify that a basic block is a landing pad, that is, a block where + the exception lands and which corresponds to the code found in the + catch portion of a try/catch sequence. It + defines values supplied by the personality function (pers_fn) upon + re-entry to the function. The resultval has the + type somety.

+ +

Arguments:

This instruction takes a pers_fn value. This is the personality + function associated with the unwinding mechanism. The optional + cleanup flag indicates that the landing pad block is a cleanup.

+ +

A clause begins with the clause type — catch + or filter — and contains the global variable representing the + "type" that may be caught or filtered respectively. Unlike the + catch clause, the filter clause takes an array constant as + its argument. Use "[0 x i8**] undef" for a filter which cannot + throw. The 'landingpad' instruction must contain at least + one clause or the cleanup flag.

+ +

Semantics:

The 'landingpad' instruction defines the values which are set by the + personality function (pers_fn) upon re-entry to the function, and + therefore the "result type" of the landingpad instruction. As with + calling conventions, how the personality function results are represented in + LLVM IR is target specific.

+ +

The clauses are applied in order from top to bottom. If two + landingpad instructions are merged together through inlining, the + clauses from the calling function are appended to the list of clauses.

+ +

The landingpad instruction has several restrictions:

+ +

A landing pad block is a basic block which is the unwind destination of an + 'invoke' instruction.
A landing pad block must have a 'landingpad' instruction as its + first non-PHI instruction.
There can be only one 'landingpad' instruction within the landing + pad block.
A basic block that is not a landing pad block may not include a + 'landingpad' instruction.
All 'landingpad' instructions in a function must have the same + personality function.

+ +

Example:

+  ;; A landing pad which can catch an integer.
+  %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
+           catch i8** @_ZTIi
+  ;; A landing pad that is a cleanup.
+  %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
+           cleanup
+  ;; A landing pad which can catch an integer and can only throw a double.
+  %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
+           catch i8** @_ZTIi
+           filter [1 x i8**] [@_ZTId]
+

+ +

+ -

Intrinsic Functions

LLVM supports the notion of an "intrinsic function". These functions have well known names and semantics and are required to follow certain @@ -5499,14 +6257,12 @@ freestanding environments and non-C-based languages.

To learn how to add an intrinsic function, please see the Extending LLVM Guide.

- -

Variable Argument Handling Intrinsics -

+ -

Variable argument support is defined in LLVM with the va_arg instruction and these three @@ -5548,15 +6304,13 @@ declare void @llvm.va_copy(i8*, i8*) declare void @llvm.va_end(i8*) -

@@ -5643,12 +6397,14 @@ declare void @llvm.va_end(i8*)

+ -

Accurate Garbage Collection Intrinsics -

+ -

LLVM support for Accurate Garbage Collection (GC) requires the implementation and generation of these @@ -5663,14 +6419,12 @@ LLVM.

The garbage collection intrinsics only operate on objects in the generic address space (address space zero).

@@ -5760,24 +6514,24 @@ LLVM.

+ -

Code Generator Intrinsics -

+ -

These intrinsics are provided by LLVM to expose special features that may only be implemented with code generator support.

-  declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>)
+  declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>)

Overview:

@@ -5918,8 +6672,10 @@ LLVM.

address is the address to be prefetched, rw is the specifier determining if the fetch should be for a read (0) or write (1), and locality is a temporal locality specifier ranging from (0) - no - locality, to (3) - extremely local keep in cache. The rw - and locality arguments must be constant integers.

+ locality, to (3) - extremely local keep in cache. The cache type + specifies whether the prefetch is performed on the data (1) or instruction (0) + cache. The rw, locality and cache type arguments + must be constant integers.

Semantics:

This intrinsic does not modify the behavior of the program. In particular, @@ -5930,11 +6686,11 @@ LLVM.

'`llvm.pcmarker`' Intrinsic -

+ -

Syntax:

@@ -5961,11 +6717,11 @@ LLVM.

'`llvm.readcyclecounter`' Intrinsic -

+ -

Syntax:

@@ -5987,26 +6743,26 @@ LLVM.

This is an overloaded intrinsic. You can use llvm.pow on any @@ -6338,24 +7094,124 @@ LLVM.

- -

+ +

Semantics:

This function returns the same values as the libm log functions + would, and handles error conditions in the same way.

Semantics:

This function returns the same values as the libm fma functions + would.

+ + +

+ Bit Manipulation Intrinsics +

+ +

LLVM provides intrinsics for a few important bit manipulation operations. + These allow efficient code generation for some algorithms.

+ + +

+ '`llvm.bswap.*`' Intrinsics +

+ +

Syntax:

This is an overloaded intrinsic function. You can use bswap on any integer @@ -6386,15 +7242,16 @@ LLVM.

'`llvm.ctpop.*`' Intrinsic -

+ -

Syntax:

This is an overloaded intrinsic. You can use llvm.ctpop on any integer bit - width. Not all targets support all bit widths however.

+ width, or on any vector with integer elements. Not all targets support all + bit widths or vector types, however.

   declare i8 @llvm.ctpop.i8(i8  <src>)
@@ -6402,6 +7259,7 @@ LLVM.
   declare i32 @llvm.ctpop.i32(i32 <src>)
   declare i64 @llvm.ctpop.i64(i64 <src>)
   declare i256 @llvm.ctpop.i256(i256 <src>)
+  declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)

Overview:

@@ -6410,23 +7268,26 @@ LLVM.

Arguments:

The only argument is the value to be counted. The argument may be of any - integer type. The return type must match the argument type.

+ integer type, or a vector with integer elements. + The return type must match the argument type.

Semantics:

The 'llvm.ctpop' intrinsic counts the 1's in a variable.

The 'llvm.ctpop' intrinsic counts the 1's in a variable, or within each + element of a vector.
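The population-count semantics, including the per-element behavior for vectors, can be illustrated with a short C model. The function names are hypothetical and the loop is written for clarity rather than speed; it is a sketch of the described behavior, not how any backend implements the intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Count the 1 bits in the low `bits` bits of `value`
   (models a scalar ctpop on an integer of that width, up to 64 bits). */
unsigned ctpop_model(uint64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    unsigned count = 0;
    for (unsigned i = 0; i < bits; ++i)
        count += (unsigned)((value >> i) & 1u);
    return count;
}

/* Vector form for <2 x i32>: each element is counted independently,
   and the result has the same shape as the argument. */
void ctpop_v2i32_model(const uint32_t src[2], uint32_t dst[2]) {
    for (int i = 0; i < 2; ++i)
        dst[i] = ctpop_model(src[i], 32);
}
```

Under this model, the vector `<7, 8>` maps to `<3, 1>`.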

'`llvm.ctlz.*`' Intrinsic -

+ -

Syntax:

This is an overloaded intrinsic. You can use llvm.ctlz on any - integer bit width. Not all targets support all bit widths however.

+ integer bit width, or any vector whose elements are integers. Not all + targets support all bit widths or vector types, however.

   declare i8 @llvm.ctlz.i8 (i8  <src>)
@@ -6434,6 +7295,7 @@ LLVM.
   declare i32 @llvm.ctlz.i32(i32 <src>)
   declare i64 @llvm.ctlz.i64(i64 <src>)
   declare i256 @llvm.ctlz.i256(i256 <src>)
+  declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>)

Overview:

@@ -6442,25 +7304,28 @@ LLVM.

Arguments:

The only argument is the value to be counted. The argument may be of any - integer type. The return type must match the argument type.

+ integer type, or any vector type with integer element type. + The return type must match the argument type.

Semantics:

The 'llvm.ctlz' intrinsic counts the leading (most significant) - zeros in a variable. If the src == 0 then the result is the size in bits of + zeros in a variable, or within each element of the vector if the operation + is of vector type. If the src == 0 then the result is the size in bits of the type of src. For example, llvm.ctlz(i32 2) = 30.
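A minimal C sketch of the leading-zero count as described here, with a zero input yielding the bit width. The name `ctlz_model` is hypothetical, and widths above 64 bits are outside what this model covers.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading (most significant) zero bits in the low `bits` bits of
   `value`. A zero input returns `bits`, matching the rule stated above. */
unsigned ctlz_model(uint64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    unsigned count = 0;
    for (unsigned i = bits; i-- > 0; ) {
        if ((value >> i) & 1u)
            break;          /* found the highest set bit */
        ++count;
    }
    return count;
}
```

With this model, `ctlz_model(2, 32)` gives 30, agreeing with the example in the text, and `ctlz_model(0, 32)` gives 32. For a vector operand the same computation would apply to each element separately.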

'`llvm.cttz.*`' Intrinsic -

+ -

Syntax:

This is an overloaded intrinsic. You can use llvm.cttz on any - integer bit width. Not all targets support all bit widths however.

+ integer bit width, or any vector of integer elements. Not all targets + support all bit widths or vector types, however.

   declare i8 @llvm.cttz.i8 (i8  <src>)
@@ -6468,6 +7333,7 @@ LLVM.
   declare i32 @llvm.cttz.i32(i32 <src>)
   declare i64 @llvm.cttz.i64(i64 <src>)
   declare i256 @llvm.cttz.i256(i256 <src>)
+  declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>)

Overview:

@@ -6476,32 +7342,36 @@ LLVM.

Arguments:

The only argument is the value to be counted. The argument may be of any - integer type. The return type must match the argument type.

+ integer type, or a vector with integer element type. The return type + must match the argument type.

Semantics:

The 'llvm.cttz' intrinsic counts the trailing (least significant) - zeros in a variable. If the src == 0 then the result is the size in bits of + zeros in a variable, or within each element of a vector. + If the src == 0 then the result is the size in bits of the type of src. For example, llvm.cttz(2) = 1.
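The trailing-zero count can be modeled the same way. This is a sketch under the semantics given above (a zero input yields the bit width); `cttz_model` is a hypothetical name, and only widths up to 64 bits are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing (least significant) zero bits in the low `bits` bits of
   `value`. A zero input returns `bits`. */
unsigned cttz_model(uint64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    unsigned count = 0;
    /* The bound check comes first, so the shift amount stays below 64. */
    while (count < bits && ((value >> count) & 1u) == 0)
        ++count;
    return count;
}
```

Here `cttz_model(2, 32)` gives 1, as in the example in the text, and `cttz_model(0, 32)` gives 32.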

+ -

Arithmetic with Overflow Intrinsics -

+ -

Syntax:

This is an overloaded intrinsic. You can use llvm.umul.with.overflow @@ -6772,12 +7652,14 @@ LLVM.

+ -

Half Precision Floating Point Intrinsics -

+ -

Half precision floating point is a storage-only format. This means that it is a dense encoding (in memory) but does not support computation in the @@ -6791,14 +7673,15 @@ LLVM.

float if needed, then converted to i16 with llvm.convert.to.fp16, then storing as an i16 value.

- 'llvm.convert.to.fp16' Intrinsic -

+ + '`llvm.convert.to.fp16`' Intrinsic + +

Syntax:

@@ -6829,11 +7712,13 @@ LLVM.

- 'llvm.convert.from.fp16' Intrinsic -

+ + '`llvm.convert.from.fp16`' Intrinsic + +

Syntax:

@@ -6863,12 +7748,14 @@ LLVM.

+ -

Debugger Intrinsics -

+ -

The LLVM debugger intrinsics (which all start with llvm.dbg. prefix), are described in @@ -6878,11 +7765,11 @@ LLVM.

Exception Handling Intrinsics -

+ -

The LLVM exception handling intrinsics (which all start with llvm.eh. prefix), are described in @@ -6892,13 +7779,13 @@ LLVM.

- Trampoline Intrinsic -

+ Trampoline Intrinsics +

This intrinsic makes it possible to excise one parameter, marked with +

These intrinsics make it possible to excise one parameter, marked with the nest attribute, from a function. The result is a callable function pointer lacking the nest parameter - the caller does not need to @@ -6915,30 +7802,31 @@ LLVM.

   %tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86
   %tramp1 = getelementptr [10 x i8]* %tramp, i32 0, i32 0
-  %p = call i8* @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8* nest , i32, i32)* @f to i8*), i8* %nval)
+  call void @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
+  %p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
   %fp = bitcast i8* %p to i32 (i32, i32)*

The call %val = call i32 %fp(i32 %x, i32 %y) is then equivalent to %val = call i32 @f(i8* %nval, i32 %x, i32 %y).

- -

- 'llvm.init.trampoline' Intrinsic -

+ + '`llvm.init.trampoline`' Intrinsic + +

Syntax:

-  declare i8* @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>)
+  declare void @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>)

Overview:

This fills the memory pointed to by tramp with code and returns a - function pointer suitable for executing it.

This fills the memory pointed to by tramp with executable code, + turning it into a trampoline.

Arguments:

The llvm.init.trampoline intrinsic takes three arguments, all @@ -6952,514 +7840,71 @@ LLVM.

Semantics:

The block of memory pointed to by tramp is filled with target - dependent code, turning it into a function. A pointer to this function is - returned, but needs to be bitcast to an appropriate - function pointer type before being called. The new function's signature - is the same as that of func with any arguments marked with - the nest attribute removed. At most one such nest argument - is allowed, and it must be of pointer type. Calling the new function is - equivalent to calling func with the same argument list, but - with nval used for the missing nest argument. If, after - calling llvm.init.trampoline, the memory pointed to - by tramp is modified, then the effect of any later call to the - returned function pointer is undefined.

- -

- - -

- Atomic Operations and Synchronization Intrinsics -

- -

These intrinsic functions expand the "universal IR" of LLVM to represent - hardware constructs for atomic operations and memory synchronization. This - provides an interface to the hardware, not an interface to the programmer. It - is aimed at a low enough level to allow any programming models or APIs - (Application Programming Interfaces) which need atomic behaviors to map - cleanly onto it. It is also modeled primarily on hardware behavior. Just as - hardware provides a "universal IR" for source languages, it also provides a - starting point for developing a "universal" atomic operation and - synchronization IR.

- -

These do not form an API such as high-level threading libraries, - software transaction memory systems, atomic primitives, and intrinsic - functions as found in BSD, GNU libc, atomic_ops, APR, and other system and - application libraries. The hardware interface provided by LLVM should allow - a clean implementation of all of these APIs and parallel programming models. - No one model or paradigm should be selected above others unless the hardware - itself ubiquitously does so.

- -

- - -

- 'llvm.memory.barrier' Intrinsic -

Syntax:

-  declare void @llvm.memory.barrier(i1 <ll>, i1 <ls>, i1 <sl>, i1 <ss>, i1 <device>)
-

- -

Overview:

The llvm.memory.barrier intrinsic guarantees ordering between - specific pairs of memory access types.

- -

Arguments:

The llvm.memory.barrier intrinsic requires five boolean arguments. - The first four arguments enables a specific barrier as listed below. The - fifth argument specifies that the barrier applies to io or device or uncached - memory.

- -

ll: load-load barrier
ls: load-store barrier
sl: store-load barrier
ss: store-store barrier
device: barrier applies to device and uncached memory also.

- -

Semantics:

This intrinsic causes the system to enforce some ordering constraints upon - the loads and stores of the program. This barrier does not - indicate when any events will occur, it only enforces - an order in which they occur. For any of the specified pairs of load - and store operations (f.ex. load-load, or store-load), all of the first - operations preceding the barrier will complete before any of the second - operations succeeding the barrier begin. Specifically the semantics for each - pairing is as follows:

- -

ll: All loads before the barrier must complete before any load - after the barrier begins.
ls: All loads before the barrier must complete before any - store after the barrier begins.
ss: All stores before the barrier must complete before any - store after the barrier begins.
sl: All stores before the barrier must complete before any - load after the barrier begins.

- -

These semantics are applied with a logical "and" behavior when more than one - is enabled in a single memory barrier intrinsic.

- -

Backends may implement stronger barriers than those requested when they do - not support as fine grained a barrier as requested. Some architectures do - not need all types of barriers and on such architectures, these become - noops.

- -

Example:

-%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
-%ptr      = bitcast i8* %mallocP to i32*
-            store i32 4, %ptr
-
-%result1  = load i32* %ptr      ; yields {i32}:result1 = 4
-            call void @llvm.memory.barrier(i1 false, i1 true, i1 false, i1 false)
-                                ; guarantee the above finishes
-            store i32 8, %ptr   ; before this begins
-

- + dependent code, turning it into a function. Then tramp needs to be + passed to llvm.adjust.trampoline to get a pointer + which can be bitcast (to a new function) and + called. The new function's signature is the same as that of + func with any arguments marked with the nest attribute + removed. At most one such nest argument is allowed, and it must be of + pointer type. Calling the new function is equivalent to calling func + with the same argument list, but with nval used for the missing + nest argument. If, after calling llvm.init.trampoline, the + memory pointed to by tramp is modified, then the effect of any later call + to the returned function pointer is undefined.

- 'llvm.atomic.cmp.swap.*' Intrinsic -

+ + '`llvm.adjust.trampoline`' Intrinsic + +

Syntax:

This is an overloaded intrinsic. You can use llvm.atomic.cmp.swap on - any integer bit width and for different address spaces. Not all targets - support all bit widths however.

-  declare i8 @llvm.atomic.cmp.swap.i8.p0i8(i8* <ptr>, i8 <cmp>, i8 <val>)
-  declare i16 @llvm.atomic.cmp.swap.i16.p0i16(i16* <ptr>, i16 <cmp>, i16 <val>)
-  declare i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* <ptr>, i32 <cmp>, i32 <val>)
-  declare i64 @llvm.atomic.cmp.swap.i64.p0i64(i64* <ptr>, i64 <cmp>, i64 <val>)
+  declare i8* @llvm.adjust.trampoline(i8* <tramp>)

Overview:

This loads a value in memory and compares it to a given value. If they are - equal, it stores a new value into the memory.

This performs any required machine-specific adjustment to the address of a + trampoline (passed as tramp).

Arguments:

The llvm.atomic.cmp.swap intrinsic takes three arguments. The result - as well as both cmp and val must be integer values with the - same bit width. The ptr argument must be a pointer to a value of - this integer type. While any bit width integer may be used, targets may only - lower representations they support in hardware.

tramp must point to a block of memory which already has trampoline code + filled in by a previous call to llvm.init.trampoline + .

Semantics:

This entire intrinsic must be executed atomically. It first loads the value - in memory pointed to by ptr and compares it with the - value cmp. If they are equal, val is stored into the - memory. The loaded value is yielded in all cases. This provides the - equivalent of an atomic compare-and-swap operation within the SSA - framework.

- -

Examples:

-%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
-%ptr      = bitcast i8* %mallocP to i32*
-            store i32 4, %ptr
-
-%val1     = add i32 4, 4
-%result1  = call i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* %ptr, i32 4, %val1)
-                                          ; yields {i32}:result1 = 4
-%stored1  = icmp eq i32 %result1, 4       ; yields {i1}:stored1 = true
-%memval1  = load i32* %ptr                ; yields {i32}:memval1 = 8
-
-%val2     = add i32 1, 1
-%result2  = call i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* %ptr, i32 5, %val2)
-                                          ; yields {i32}:result2 = 8
-%stored2  = icmp eq i32 %result2, 5       ; yields {i1}:stored2 = false
-
-%memval2  = load i32* %ptr                ; yields {i32}:memval2 = 8
-

- -

- - -

- 'llvm.atomic.swap.*' Intrinsic -

Syntax:

- -

This is an overloaded intrinsic. You can use llvm.atomic.swap on any - integer bit width. Not all targets support all bit widths however.

- -

-  declare i8 @llvm.atomic.swap.i8.p0i8(i8* <ptr>, i8 <val>)
-  declare i16 @llvm.atomic.swap.i16.p0i16(i16* <ptr>, i16 <val>)
-  declare i32 @llvm.atomic.swap.i32.p0i32(i32* <ptr>, i32 <val>)
-  declare i64 @llvm.atomic.swap.i64.p0i64(i64* <ptr>, i64 <val>)
-

- -

-  declare i8 @llvm.atomic.load.add.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.add.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.add.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.add.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.sub.i8.p0i32(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.sub.i16.p0i32(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.sub.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.sub.i64.p0i32(i64* <ptr>, i64 <delta>)
-

- -

Overview:

This intrinsic subtracts delta to the value stored in memory at - ptr. It yields the original value at ptr.

- -

Arguments:

- -

Semantics:

This intrinsic does a series of operations atomically. It first loads the - value stored at ptr. It then subtracts delta, stores the - result to ptr. It yields the original value stored - at ptr.

- -

Examples:

-%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
-%ptr      = bitcast i8* %mallocP to i32*
-            store i32 8, %ptr
-%result1  = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 4)
-                                ; yields {i32}:result1 = 8
-%result2  = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 2)
-                                ; yields {i32}:result2 = 4
-%result3  = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 5)
-                                ; yields {i32}:result3 = 2
-%memval1  = load i32* %ptr      ; yields {i32}:memval1 = -3
-

- -

- - -

- 'llvm.atomic.load.and.*' Intrinsic
- 'llvm.atomic.load.nand.*' Intrinsic
- 'llvm.atomic.load.or.*' Intrinsic
- 'llvm.atomic.load.xor.*' Intrinsic
-

- -

Syntax:

These are overloaded intrinsics. You can - use llvm.atomic.load_and, llvm.atomic.load_nand, - llvm.atomic.load_or, and llvm.atomic.load_xor on any integer - bit width and for different address spaces. Not all targets support all bit - widths however.

- -

-  declare i8 @llvm.atomic.load.and.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.and.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.and.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.and.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.or.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.or.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.or.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.or.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.nand.i8.p0i32(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.nand.i16.p0i32(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.nand.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.nand.i64.p0i32(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.xor.i8.p0i32(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.xor.i16.p0i32(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.xor.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.xor.i64.p0i32(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.max.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.max.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.max.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.max.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.min.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.min.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.min.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.min.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.umax.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.umax.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.umax.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.umax.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

-  declare i8 @llvm.atomic.load.umin.i8.p0i8(i8* <ptr>, i8 <delta>)
-  declare i16 @llvm.atomic.load.umin.i16.p0i16(i16* <ptr>, i16 <delta>)
-  declare i32 @llvm.atomic.load.umin.i32.p0i32(i32* <ptr>, i32 <delta>)
-  declare i64 @llvm.atomic.load.umin.i64.p0i64(i64* <ptr>, i64 <delta>)
-

- -

Overview:

These intrinsics takes the signed or unsigned minimum or maximum of - delta and the value stored in memory at ptr. It yields the - original value at ptr.

- -

Arguments:

- -

+ -

Syntax:

@@ -7485,11 +7930,11 @@ LLVM.

'`llvm.lifetime.end`' Intrinsic -

+ -

Syntax:

@@ -7514,15 +7959,15 @@ LLVM.

'`llvm.invariant.start`' Intrinsic -

+ -

Syntax:

-  declare {}* @llvm.invariant.start(i64 <size>, i8* nocapture <ptr>) readonly
+  declare {}* @llvm.invariant.start(i64 <size>, i8* nocapture <ptr>)

Overview:

@@ -7542,11 +7987,11 @@ LLVM.

'`llvm.invariant.end`' Intrinsic -

+ -

Syntax:

@@ -7568,24 +8013,24 @@ LLVM.

+ -

General Intrinsics -

+ -

This class of intrinsics is designed to be generic and has no specific purpose.

- -

'`llvm.var.annotation`' Intrinsic -

+ -

Syntax:

@@ -7603,17 +8048,17 @@ LLVM.
 Semantics:
 This intrinsic allows annotation of local variables with arbitrary strings.
    This can be useful for special purpose optimizations that want to look for
-   these annotations.  These have no other defined use, they are ignored by code
+   these annotations.  These have no other defined use; they are ignored by code
    generation and optimization.

'`llvm.annotation.*`' Intrinsic -

+ -

Syntax:

This is an overloaded intrinsic. You can use 'llvm.annotation' on @@ -7639,17 +8084,17 @@ LLVM.

Semantics:

This intrinsic allows annotations to be put on arbitrary expressions with arbitrary strings. This can be useful for special purpose optimizations that - want to look for these annotations. These have no other defined use, they + want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.

'`llvm.trap`' Intrinsic -

+ -

Syntax:

@@ -7670,11 +8115,11 @@ LLVM.

'`llvm.stackprotector`' Intrinsic -

+ -

Syntax:

@@ -7697,18 +8142,18 @@ LLVM.
    the AllocaInst stack slot to be before local variables on the
    stack. This is to ensure that if a local variable on the stack is
    overwritten, it will destroy the value of the guard. When the function exits,
-   the guard on the stack is checked against the original guard. If they're
+   the guard on the stack is checked against the original guard. If they are
    different, then the program aborts by calling the __stack_chk_fail()
    function.

'`llvm.objectsize`' Intrinsic -

+ -

Syntax:

@@ -7717,25 +8162,28 @@ LLVM.

Overview:

The llvm.objectsize intrinsic is designed to provide information - to the optimizers to discover at compile time either a) when an - operation like memcpy will either overflow a buffer that corresponds to - an object, or b) to determine that a runtime check for overflow isn't - necessary. An object in this context means an allocation of a - specific class, structure, array, or other object.

The llvm.objectsize intrinsic is designed to provide information to + the optimizers to determine at compile time whether a) an operation (like + memcpy) will overflow a buffer that corresponds to an object, or b) that a + runtime check for overflow isn't necessary. An object in this context means + an allocation of a specific class, structure, array, or other object.

Arguments:

The llvm.objectsize intrinsic takes two arguments. The first +

The llvm.objectsize intrinsic takes two arguments. The first argument is a pointer to or into the object. The second argument - is a boolean 0 or 1. This argument determines whether you want the - maximum (0) or minimum (1) bytes remaining. This needs to be a literal 0 or + is a boolean 0 or 1. This argument determines whether you want the + maximum (0) or minimum (1) bytes remaining. This needs to be a literal 0 or 1; variables are not allowed.

Semantics:

The llvm.objectsize intrinsic is lowered to either a constant - representing the size of the object concerned or i32/i64 -1 or 0 - (depending on the type argument if the size cannot be determined - at compile time.

+ representing the size of the object concerned, or i32/i64 -1 or 0, + depending on the type argument, if the size cannot be determined at + compile time.
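One way to read the two "unknown" results is sketched below in C. The model assumes, as an interpretation of the text rather than something it states outright, that an unknown size lowers to -1 when the maximum is requested (type 0) and to 0 when the minimum is requested (type 1). All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the i64 value an objectsize query lowers to.
   Assumption: unknown size gives -1 for "maximum" and 0 for "minimum". */
uint64_t objectsize_model(bool size_known, uint64_t size, bool want_min) {
    if (size_known)
        return size;
    return want_min ? 0 : UINT64_MAX;   /* UINT64_MAX is -1 as an i64 */
}

/* Case (a): with an upper bound on the remaining bytes, a copy of n bytes
   provably overflows when n exceeds that bound. */
bool definitely_overflows(uint64_t max_remaining, uint64_t n) {
    return n > max_remaining;
}

/* Case (b): with a lower bound on the remaining bytes, a runtime check is
   unnecessary when n fits within that bound. */
bool runtime_check_unneeded(uint64_t min_remaining, uint64_t n) {
    return n <= min_remaining;
}

/* Small demonstration of both cases and of the unknown-size fallbacks. */
void objectsize_examples(void) {
    /* Known 16-byte object, 32-byte copy: overflow is provable. */
    assert(definitely_overflows(objectsize_model(true, 16, false), 32));
    /* Known 16-byte object, 8-byte copy: no runtime check needed. */
    assert(runtime_check_unneeded(objectsize_model(true, 16, true), 8));
    /* Unknown size: neither conclusion can be drawn. */
    assert(!definitely_overflows(objectsize_model(false, 0, false), 32));
    assert(!runtime_check_unneeded(objectsize_model(false, 0, true), 32));
}
```

Under these assumptions the fallback values are conservative: -1 never proves an overflow, and 0 never proves a nonempty copy safe.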

+ +

@@ -7748,7 +8196,7 @@ LLVM.

Chris Lattner
- The LLVM Compiler Infrastructure
+ The LLVM Compiler Infrastructure
Last modified: $Date$

LLVM Language Reference Manual

+ Well-Formedness +

+ Module Structure +

Linkage Types -

Calling Conventions -

Visibility Styles -

Named Types -

Global Variables -

Functions -

Syntax:

Aliases -

Named Metadata -

+ Parameter Attributes +

Garbage Collector Names -

Function Attributes -

Module-Level Inline Assembly -

Data Layout -

Pointer Aliasing Rules -

Volatile Memory Accesses -

+ Memory Model for Concurrent Operations +

+ Atomic Memory Ordering Constraints +

+ Type Classifications +

+ Primitive Types +

+ Integer Type +

Overview:

+ Floating Point Types +

+ X86mmx Type +

Overview:

Syntax:

+ Void Type +

Overview:

+ Label Type +

Overview:

+ Metadata Type +

Overview:

+ Derived Types +

+ Aggregate Types +

+ Array Type +

Overview:

+ Function Type +

Overview:

Syntax:

+ Structure Type +

Overview:

Syntax:

Examples:

Overview:

Syntax:

Examples:

+ Opaque Structure Types +

Overview:

Syntax:

Examples:

+ Pointer Type +

Overview:

+ Vector Type +

Overview:

Examples:

Overview:

Syntax:

Examples:

Overview:

Syntax:

Examples:

+ Simple Constants +

Complex Constants -

Global Variable and Function Addresses -

+ Undefined Values +

+ Trap Values +

+ Addresses of Basic Blocks +

+ Constant Expressions +

Inline Assembler Expressions -

Inline Asm Metadata -

+ Metadata Nodes and Metadata Strings +

+ 'tbaa' Metadata +

+ 'fpaccuracy' Metadata +

Intrinsic Global Variables -

The 'llvm.used' Global Variable -

+ + The 'llvm.compiler.used' Global Variable + +

The 'llvm.global_ctors' Global Variable -

+ '`tbaa`' Metadata +

+ '`fpaccuracy`' Metadata +

The '`llvm.used`' Global Variable -

+ + The '`llvm.compiler.used`' Global Variable + +

The '`llvm.global_ctors`' Global Variable -

The '`llvm.global_dtors`' Global Variable -

+ '`ret`' Instruction +

+ '`br`' Instruction +

'`switch`' Instruction -

'`indirectbr`' Instruction -

'`invoke`' Instruction -

+ '`unwind`' Instruction +

+ '`resume`' Instruction +

+ '`unreachable`' Instruction +

'`add`' Instruction -

'`fadd`' Instruction -

'`sub`' Instruction -

'`fsub`' Instruction -

'`mul`' Instruction -

'`fmul`' Instruction -

+ '`udiv`' Instruction +

+ '`sdiv`' Instruction +

+ '`fdiv`' Instruction +

+ '`urem`' Instruction +

'`srem`' Instruction -

+ '`frem`' Instruction +

+ '`shl`' Instruction +

+ '`lshr`' Instruction +